[ 
https://issues.apache.org/jira/browse/MAHOUT-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pat Ferrel updated MAHOUT-1568:
-------------------------------

    Description: 
Implement mechanisms to read and write data from/to flexible stores. These will 
support tuple streams and DRMs, with extensions that allow keeping user-defined 
values for IDs. The mechanism can in some sense replace Sequence Files for 
import/export and will make the operation much easier for the user, in many 
cases by directly consuming their input files.

Start with text-delimited files for input/output in the Spark version of 
ItemSimilarity.
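
To make the idea of keeping user-defined IDs concrete, here is a minimal 
sketch in plain Scala (no Spark or Mahout dependency). It is not the proposed 
implementation: the names IdDictionary and DelimitedIndexer, the two-column 
"userID<delimiter>itemID" line layout, and the skipping of malformed lines are 
all assumptions made for illustration. It shows external string IDs being 
mapped to dense Int indices on read, as a matrix would need, and mapped back 
to the original IDs on write.

```scala
import java.util.regex.Pattern
import scala.collection.mutable

// Hypothetical helper, not a Mahout class: a two-way mapping between
// external string IDs and dense Int indices.
final class IdDictionary {
  private val toIndex = mutable.HashMap.empty[String, Int]
  private val toId = mutable.ArrayBuffer.empty[String]

  // Return the index for an ID, assigning the next free index on first sight.
  def indexOf(id: String): Int =
    toIndex.getOrElseUpdate(id, { toId += id; toId.size - 1 })

  // Recover the original external ID for an index.
  def idOf(index: Int): String = toId(index)

  def size: Int = toId.size
}

// Hypothetical reader/writer for delimited text, assuming each line holds
// "userID<delimiter>itemID" in its first two fields.
object DelimitedIndexer {
  // Parse lines into (rowIndex, columnIndex) pairs, filling the dictionaries.
  def read(lines: Seq[String], delimiter: String,
           rows: IdDictionary, columns: IdDictionary): Seq[(Int, Int)] =
    lines.flatMap { line =>
      line.split(Pattern.quote(delimiter)).map(_.trim) match {
        case Array(user, item, _*) if user.nonEmpty && item.nonEmpty =>
          Some((rows.indexOf(user), columns.indexOf(item)))
        case _ => None // skip blank or malformed lines
      }
    }

  // Turn index pairs back into delimited text using the original IDs.
  def write(cells: Seq[(Int, Int)], delimiter: String,
            rows: IdDictionary, columns: IdDictionary): Seq[String] =
    cells.map { case (r, c) => rows.idOf(r) + delimiter + columns.idOf(c) }
}
```

In this sketch the dictionaries travel with the data, so output written after 
a computation can use the same IDs the user supplied in the input files.
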

A proposal is running with ItemSimilarity on Spark and is documented on the 
GitHub wiki here: https://github.com/pferrel/harness/wiki

Comments are appreciated.

  was:
Implement mechanisms to read and write data from/to flexible stores. These will 
support tuples streams and drms but with extensions that allow keeping user 
defined values for IDs. The mechanism in some sense can replace Sequence Files 
for import/export and will make the operation much easier for the user. In many 
cases directly consuming their input files.

Start with text delimited files for input/output in the Spark version of 
ItemSimilarity

A proposal is running with ItemSimilarity on Spark which and is documented on 
the github wiki here: https://github.com/pferrel/harness/wiki

Comments are appreciated


> Build an I/O model that can replace sequence files for import/export
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-1568
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1568
>             Project: Mahout
>          Issue Type: New Feature
>          Components: CLI
>         Environment: Scala, Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>
> Implement mechanisms to read and write data from/to flexible stores. These 
> will support tuple streams and DRMs, with extensions that allow keeping 
> user-defined values for IDs. The mechanism can in some sense replace Sequence 
> Files for import/export and will make the operation much easier for the user, 
> in many cases by directly consuming their input files.
> Start with text-delimited files for input/output in the Spark version of 
> ItemSimilarity.
> A proposal is running with ItemSimilarity on Spark and is documented on the 
> GitHub wiki here: https://github.com/pferrel/harness/wiki
> Comments are appreciated.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
