Thanks. Yeah, it's weird going from Java to Scala. Everything makes sense until all of a sudden it doesn't. I appreciate the pointers!
> Subject: Re: [jira] [Created] (MAHOUT-1568) Build an I/O model that can
> replace sequence files for import/export
> From: [email protected]
> Date: Sun, 1 Jun 2014 19:53:06 -0700
> To: [email protected]
>
> Well it does run so I need to clean that stuff up anyway.
>
> The use of Traits is very powerful but is nothing like Python or Ruby mixins.
> Took me a lot of head scratching to get it straight and these are about as
> simple as you can get.
>
> The key thing to look at is the reader and writer methods and the
> cooccurrence stuff. They are good examples of using Scala collection classes
> to do distributed computing. Sebastian originally did most of that and it
> should give you a leg up to understanding how Scala relates to Spark and why
> it makes things so much easier than java+mapreduce.
>
> On Jun 1, 2014, at 6:58 PM, Andrew Palumbo <[email protected]> wrote:
>
> Cool- I was just going through it to get familiar with the DSL (and really
> scala in general at this point) and the read/write traits that you were
> talking about... Just looking at the code really- I don't have any need to
> build it right now. Wanted to make sure I wasn't totally off...
>
> Thanks
>
> > Subject: Re: [jira] [Created] (MAHOUT-1568) Build an I/O model that can
> > replace sequence files for import/export
> > From: [email protected]
> > Date: Sun, 1 Jun 2014 17:57:47 -0700
> > To: [email protected]
> >
> > Sorry, wasn't expecting someone to build it. Don't know if the packaging is
> > right yet and it's about a month behind on the trunk.
> >
> > You pull the repo at the same level as the major pieces like
> > math-scala, into MAHOUT_HOME, apply the MAHOUT-1464 patch, but all you need is
> > org.apache.mahout.cf.CooccurrenceAnalysis from the patches. Your version
> > should work. Then build the snapshot mahout, go into harness and 'mvn
> > install -DskipTests'. Since the packaging may not be right I haven't
> > integrated it with the mahout poms.
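[Editor's note: the trait-based reader/writer pattern discussed above might look roughly like the following. This is a minimal, self-contained sketch with made-up names (`Reader`, `Writer`, `TupleStore`), not the actual harness code; it only illustrates why Scala traits differ from Python or Ruby mixins: they are checked at compile time and can declare abstract members that the mixing class must supply.]

```scala
// Hypothetical sketch of a trait-as-mixin reader/writer pattern.
// Each trait declares one abstract method and derives bulk behavior
// from it, the way a mixin contributes behavior to a class.

trait Reader[T] {
  // Abstract member: the concrete class decides how one line is parsed.
  def parse(line: String): T
  def readAll(lines: Seq[String]): Seq[T] = lines.map(parse)
}

trait Writer[T] {
  def format(t: T): String
  def writeAll(ts: Seq[T]): Seq[String] = ts.map(format)
}

// One class mixes in both behaviors via `with`.
class TupleStore extends Reader[(String, String)] with Writer[(String, String)] {
  def parse(line: String): (String, String) = {
    val Array(user, item) = line.split(",")
    (user, item)
  }
  def format(t: (String, String)): String = s"${t._1},${t._2}"
}

object TraitDemo extends App {
  val store  = new TupleStore
  val tuples = store.readAll(Seq("u1,iphone", "u2,galaxy"))
  println(store.writeAll(tuples)) // round-trips the input lines
}
```

Note the compile-time guarantee: forgetting `parse` in `TupleStore` is a compile error, not a runtime failure as it would be with a duck-typed mixin.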
> >
> > I'll merge it with the trunk tomorrow.
> >
> > On Jun 1, 2014, at 1:57 PM, Andrew Palumbo <[email protected]> wrote:
> >
> > Hi Pat,
> >
> > Does Harness compile against the mahout trunk + MAHOUT-1464.patch
> > (cooccurrence)? I have a patched-up branch of the mahout trunk with
> > basically a gutted MAHOUT-1464.patch- just something that defines
> > org.apache.mahout.cf.CooccurrenceAnalysis and compiles (so I wouldn't be
> > able to run Harness right now anyways). I think the changes from
> > MAHOUT-1529 are causing my problems- the errors are from DrmLike stuff:
> >
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/IndexedDataset.scala:40: error: not found: type DrmLike
> > [INFO] case class IndexedDataset(matrix: DrmLike[Int], rowIDs: BiMap[String,Int], columnIDs: BiMap[String,Int]) {
> > [INFO]                                   ^
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:105: error: not found: type DrmRdd
> > [INFO]     }).asInstanceOf[DrmRdd[Int]]
> > [INFO]                     ^
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:107: error: not found: type CheckpointedDrmBase
> > [INFO]     val drmInteractions = new CheckpointedDrmBase[Int](indexedInteractions, numRows, numColumns)
> > [INFO]                               ^
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:145: error: not found: type DrmLike
> > [INFO]     val matrix: DrmLike[Int] = indexedDataset.matrix
> >
> > Thanks,
> >
> > Andy
> >
> >> Date: Sun, 1 Jun 2014 17:27:01 +0000
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: [jira] [Created] (MAHOUT-1568) Build an I/O model that can
> >> replace sequence files for import/export
> >>
> >> Pat Ferrel created MAHOUT-1568:
> >> ----------------------------------
> >>
> >> Summary: Build an I/O model that can replace sequence files for
> >> import/export
> >> Key: MAHOUT-1568
> >> URL:
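[Editor's note: `not found: type` errors like the ones above typically mean the type's package was moved (the message suspects MAHOUT-1529's reorganization) so an old import or unqualified name no longer resolves. A generic, self-contained illustration of that failure mode and its fix follows; the package and type names here are invented for the example, not Mahout's actual packages.]

```scala
// Invented packages to illustrate "not found: type" after a package move.

package newpkg {
  // Suppose a refactor moved DrmLike here from its old home.
  trait DrmLike[K] { def nrow: Long }
  class DenseDrm[K](val nrow: Long) extends DrmLike[K]
}

package drivers {
  // Without this import, referring to DrmLike below fails with:
  //   error: not found: type DrmLike
  // The fix is updating the import to the type's new package.
  import newpkg.DrmLike

  case class IndexedDataset[K](matrix: DrmLike[K]) {
    def rows: Long = matrix.nrow
  }
}
```

Under this reading, the harness errors would clear once `IndexedDataset.scala` and `ReaderWriter.scala` import `DrmLike` and friends from wherever the post-MAHOUT-1529 trunk defines them.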
> >> https://issues.apache.org/jira/browse/MAHOUT-1568
> >> Project: Mahout
> >> Issue Type: New Feature
> >> Components: CLI
> >> Environment: Scala, Spark
> >> Reporter: Pat Ferrel
> >> Assignee: Pat Ferrel
> >>
> >> Implement mechanisms to read and write data from/to flexible stores. These
> >> will support tuple streams and DRMs, with extensions that allow
> >> keeping user-defined values for IDs. The mechanism can in some sense
> >> replace Sequence Files for import/export and will make the operation much
> >> easier for the user, in many cases directly consuming their input files.
> >>
> >> Start with text-delimited files for input/output in the Spark version of
> >> ItemSimilarity.
> >>
> >> A proposal is running with ItemSimilarity on Spark and is documented
> >> on the github wiki here: https://github.com/pferrel/harness/wiki
> >>
> >> Comments are appreciated
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.2#6252)
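[Editor's note: a rough sketch of what "directly consuming their input files" with "user defined values for IDs" could look like. Reading text-delimited (user,item) lines and assigning each string ID a contiguous integer index, while keeping dictionaries to restore the original IDs on export, mirrors what the BiMap-backed import described above would do. The object and method names below are hypothetical, not the harness implementation.]

```scala
import scala.collection.mutable

// Hypothetical text-delimited import: each input line is "userID,itemID".
// String IDs are mapped to contiguous Int indices in first-seen order, and
// the two dictionaries are returned so the user-defined IDs survive a
// round trip through the integer-keyed matrix representation.
object DelimitedImport {
  def index(lines: Seq[String], delim: String = ","):
      (Seq[(Int, Int)], Map[String, Int], Map[String, Int]) = {
    val rowIDs = mutable.LinkedHashMap.empty[String, Int]
    val colIDs = mutable.LinkedHashMap.empty[String, Int]
    val tuples = lines.map { line =>
      val Array(user, item) = line.split(delim).map(_.trim)
      val r = rowIDs.getOrElseUpdate(user, rowIDs.size)
      val c = colIDs.getOrElseUpdate(item, colIDs.size)
      (r, c)
    }
    (tuples, rowIDs.toMap, colIDs.toMap)
  }
}
```

For example, `DelimitedImport.index(Seq("u1,iphone", "u2,iphone", "u1,galaxy"))` produces the integer tuples plus dictionaries mapping u1/u2 and iphone/galaxy back to their row and column indices, so export can emit the original string IDs instead of opaque integers.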
