Thanks. Yeah, it's weird going from Java to Scala. Everything makes sense until all of a sudden it doesn't. I appreciate the pointers!
> Subject: Re: [jira] [Created] (MAHOUT-1568) Build an I/O model that can
> replace sequence files for import/export
> From: [email protected]
> Date: Sun, 1 Jun 2014 19:53:06 -0700
> To: [email protected]
>
> Well it does run so I need to clean that stuff up anyway.
>
> The use of Traits is very powerful but is nothing like Python or Ruby mixins.
> Took me a lot of head scratching to get it straight and these are about as
> simple as you can get.
>
> The key thing to look at is the reader and writer methods and the
> cooccurrence stuff. They are good examples of using Scala collection classes
> to do distributed computing. Sebastian originally did most of that and it
> should give you a leg up to understanding how Scala relates to Spark and why
> it makes things so much easier than java+mapreduce.
>
> On Jun 1, 2014, at 6:58 PM, Andrew Palumbo <[email protected]> wrote:
>
> Cool- I was just going through it to get familiar with the DSL (and really
> scala in general at this point) and the read/write traits that you were
> talking about... Just looking at the code really- I don't have any need to
> build it right now. Wanted to make sure I wasn't totally off...
>
> Thanks
>
> > Subject: Re: [jira] [Created] (MAHOUT-1568) Build an I/O model that can
> > replace sequence files for import/export
> > From: [email protected]
> > Date: Sun, 1 Jun 2014 17:57:47 -0700
> > To: [email protected]
> >
> > Sorry, wasn't expecting someone to build it. Don't know if the packaging is
> > right yet and it's about a month behind on the trunk.
> >
> > You pull the repo at the same level as the major pieces like
> > math-scala, into MAHOUT_HOME, apply the MAHOUT-1464 patch, but all you need is
> > org.apache.mahout.cf.CooccurrenceAnalysis from the patches. Your version
> > should work. Then build the snapshot mahout, go into harness and 'mvn
> > install -DskipTests'. Since the packaging may not be right I haven't
> > integrated it with the mahout poms.
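[Editor's note: the trait-based reader/writer pattern discussed above might look roughly like the following. This is a minimal, self-contained sketch with made-up names (`Reader`, `Writer`, `TupleStore`), not the actual harness code; it only illustrates why Scala traits differ from Python or Ruby mixins: they are checked at compile time and can declare abstract members that the mixing class must supply.]

```scala
// Hypothetical sketch of a trait-as-mixin reader/writer pattern.
// Each trait declares one abstract method and derives bulk behavior
// from it, the way a mixin contributes behavior to a class.

trait Reader[T] {
  // Abstract member: the concrete class decides how one line is parsed.
  def parse(line: String): T
  def readAll(lines: Seq[String]): Seq[T] = lines.map(parse)
}

trait Writer[T] {
  def format(t: T): String
  def writeAll(ts: Seq[T]): Seq[String] = ts.map(format)
}

// One class mixes in both behaviors via `with`.
class TupleStore extends Reader[(String, String)] with Writer[(String, String)] {
  def parse(line: String): (String, String) = {
    val Array(user, item) = line.split(",")
    (user, item)
  }
  def format(t: (String, String)): String = s"${t._1},${t._2}"
}

object TraitDemo extends App {
  val store  = new TupleStore
  val tuples = store.readAll(Seq("u1,iphone", "u2,galaxy"))
  println(store.writeAll(tuples)) // round-trips the input lines
}
```

Note the compile-time guarantee: forgetting `parse` in `TupleStore` is a compile error, not a runtime failure as it would be with a duck-typed mixin.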
> >
> > I'll merge it with the trunk tomorrow.
> >
> > On Jun 1, 2014, at 1:57 PM, Andrew Palumbo <[email protected]> wrote:
> >
> > Hi Pat,
> >
> > Does Harness compile against the mahout trunk + MAHOUT-1464.patch
> > (cooccurrence)? I have a patched-up branch of the mahout trunk with
> > basically a gutted MAHOUT-1464.patch- just something that defines
> > org.apache.mahout.cf.CooccurrenceAnalysis and compiles (so I wouldn't be
> > able to run Harness right now anyways). I think the changes from
> > MAHOUT-1529 are causing my problems- the errors are from DrmLike stuff:
> >
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/IndexedDataset.scala:40: error: not found: type DrmLike
> > [INFO] case class IndexedDataset(matrix: DrmLike[Int], rowIDs: BiMap[String,Int], columnIDs: BiMap[String,Int]) {
> > [INFO]                                   ^
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:105: error: not found: type DrmRdd
> > [INFO]     }).asInstanceOf[DrmRdd[Int]]
> > [INFO]                     ^
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:107: error: not found: type CheckpointedDrmBase
> > [INFO]     val drmInteractions = new CheckpointedDrmBase[Int](indexedInteractions, numRows, numColumns)
> > [INFO]                               ^
> > [ERROR] /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:145: error: not found: type DrmLike
> > [INFO]     val matrix: DrmLike[Int] = indexedDataset.matrix
> >
> > Thanks,
> >
> > Andy
> >
> >> Date: Sun, 1 Jun 2014 17:27:01 +0000
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: [jira] [Created] (MAHOUT-1568) Build an I/O model that can
> >> replace sequence files for import/export
> >>
> >> Pat Ferrel created MAHOUT-1568:
> >> ----------------------------------
> >>
> >> Summary: Build an I/O model that can replace sequence files for
> >> import/export
> >> Key: MAHOUT-1568
> >> URL:
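[Editor's note: `not found: type` errors like the ones above typically mean the type's package was moved (the message suspects MAHOUT-1529's reorganization) so an old import or unqualified name no longer resolves. A generic, self-contained illustration of that failure mode and its fix follows; the package and type names here are invented for the example, not Mahout's actual packages.]

```scala
// Invented packages to illustrate "not found: type" after a package move.

package newpkg {
  // Suppose a refactor moved DrmLike here from its old home.
  trait DrmLike[K] { def nrow: Long }
  class DenseDrm[K](val nrow: Long) extends DrmLike[K]
}

package drivers {
  // Without this import, referring to DrmLike below fails with:
  //   error: not found: type DrmLike
  // The fix is updating the import to the type's new package.
  import newpkg.DrmLike

  case class IndexedDataset[K](matrix: DrmLike[K]) {
    def rows: Long = matrix.nrow
  }
}
```

Under this reading, the harness errors would clear once `IndexedDataset.scala` and `ReaderWriter.scala` import `DrmLike` and friends from wherever the post-MAHOUT-1529 trunk defines them.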
> >> https://issues.apache.org/jira/browse/MAHOUT-1568
> >> Project: Mahout
> >> Issue Type: New Feature
> >> Components: CLI
> >> Environment: Scala, Spark
> >> Reporter: Pat Ferrel
> >> Assignee: Pat Ferrel
> >>
> >> Implement mechanisms to read and write data from/to flexible stores. These
> >> will support tuple streams and DRMs, with extensions that allow
> >> keeping user-defined values for IDs. The mechanism can in some sense
> >> replace Sequence Files for import/export and will make the operation much
> >> easier for the user, in many cases directly consuming their input files.
> >>
> >> Start with text-delimited files for input/output in the Spark version of
> >> ItemSimilarity.
> >>
> >> A proposal is running with ItemSimilarity on Spark and is documented
> >> on the github wiki here: https://github.com/pferrel/harness/wiki
> >>
> >> Comments are appreciated
> >>
> >> --
> >> This message was sent by Atlassian JIRA
> >> (v6.2#6252)
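[Editor's note: a rough sketch of what "directly consuming their input files" with "user defined values for IDs" could look like. Reading text-delimited (user,item) lines and assigning each string ID a contiguous integer index, while keeping dictionaries to restore the original IDs on export, mirrors what the BiMap-backed import described above would do. The object and method names below are hypothetical, not the harness implementation.]

```scala
import scala.collection.mutable

// Hypothetical text-delimited import: each input line is "userID,itemID".
// String IDs are mapped to contiguous Int indices in first-seen order, and
// the two dictionaries are returned so the user-defined IDs survive a
// round trip through the integer-keyed matrix representation.
object DelimitedImport {
  def index(lines: Seq[String], delim: String = ","):
      (Seq[(Int, Int)], Map[String, Int], Map[String, Int]) = {
    val rowIDs = mutable.LinkedHashMap.empty[String, Int]
    val colIDs = mutable.LinkedHashMap.empty[String, Int]
    val tuples = lines.map { line =>
      val Array(user, item) = line.split(delim).map(_.trim)
      val r = rowIDs.getOrElseUpdate(user, rowIDs.size)
      val c = colIDs.getOrElseUpdate(item, colIDs.size)
      (r, c)
    }
    (tuples, rowIDs.toMap, colIDs.toMap)
  }
}
```

For example, `DelimitedImport.index(Seq("u1,iphone", "u2,iphone", "u1,galaxy"))` produces the integer tuples plus dictionaries mapping u1/u2 and iphone/galaxy back to their row and column indices, so export can emit the original string IDs instead of opaque integers.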
