OK, I’ve integrated it with the Mahout structure and it’s now a pull request 
here: https://github.com/apache/mahout/pull/11 . There is much more to do, but 
it does work if you run the driver test in ItemSimilarityDriverTest. This isn’t 
a ScalaTest; it runs the driver from the drivers dir.

#11 includes #12 (https://github.com/apache/mahout/pull/12), which is ready to 
push once I get a clean compile with tests. So to experiment, all you need is 
#11. Make sure to skip tests.

BTW, re: Scala, make sure to look at the package objects in the various 
package.scala files. I don’t know how to view these as Scaladocs, but they 
often contain important and useful helper functions. Check out how package 
objects are used in Scala.
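To make that concrete, here is a minimal, generic sketch of a package object (the names are hypothetical, not from the Mahout sources): helpers defined in a package.scala are visible to every class in that package without an import.

```scala
// Contents of a hypothetical package.scala. Declaring `package object drivers`
// makes everything inside it available, without an import, to all classes in
// the `drivers` package. These names are illustrative only, not Mahout code.
package object drivers {
  // A shared helper, e.g. for parsing delimited input lines.
  def splitLine(line: String, delimiter: String = ","): Array[String] =
    line.split(delimiter).map(_.trim)
}
```

Any class declared inside `package drivers` can then call `splitLine(...)` directly, which is why these files are easy to overlook when browsing the code.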

On Jun 1, 2014, at 8:11 PM, Andrew Palumbo <[email protected]> wrote:

Thanks.  Yeah, it's weird going from Java to Scala.  Everything makes sense 
until all of a sudden it doesn't.  I appreciate the pointers!

> Subject: Re: [jira] [Created] (MAHOUT-1568) Build an I/O model that can 
> replace sequence files for import/export
> From: [email protected]
> Date: Sun, 1 Jun 2014 19:53:06 -0700
> To: [email protected]
> 
> Well it does run so I need to clean that stuff up anyway.
> 
> The use of traits is very powerful, but it is nothing like Python or Ruby 
> mixins. It took me a lot of head-scratching to get it straight, and these are 
> about as simple as you can get.
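For anyone reading along, a minimal generic sketch of a trait mixin (hypothetical names, not the PR's reader/writer traits): a trait can carry concrete methods that depend on an abstract member, and unlike a Ruby or Python mixin, the compiler checks at compile time that whoever mixes it in actually supplies that member.

```scala
// Generic sketch of trait mixins; names are hypothetical, not from the PR.
trait Reader {
  def source: Seq[String]                          // abstract: the mixer supplies it
  def readAll(): Seq[String] = source.map(_.trim)  // concrete: uses the abstract member
}

trait Writer {
  def write(lines: Seq[String]): String = lines.mkString("\n")
}

// The compiler verifies that `source` is provided and fixes the trait
// linearization order at compile time.
class Pipeline(val source: Seq[String]) extends Reader with Writer
```

Dropping `val source` from `Pipeline` would be a compile error rather than a runtime surprise, which is the key difference from dynamic-language mixins.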
> 
> The key things to look at are the reader and writer methods and the 
> cooccurrence stuff. They are good examples of using Scala collection classes 
> to do distributed computing. Sebastian originally did most of that, and it 
> should give you a leg up in understanding how Scala relates to Spark and why 
> it makes things so much easier than Java + MapReduce. 
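As a hedged illustration of that point (plain Scala collections, not the actual cooccurrence code): the map/groupBy/reduce idioms used on a local collection carry over almost verbatim to a Spark RDD, which is a big part of why this style reads so much shorter than Java MapReduce.

```scala
// Count how often each item was interacted with, from (user, item) pairs.
// The same chain of transformations would look nearly identical on a Spark
// RDD (map + reduceByKey); this is a generic sketch, not Mahout code.
def countItems(interactions: Seq[(String, String)]): Map[String, Int] =
  interactions
    .map { case (_, item) => (item, 1) }                       // emit (item, 1)
    .groupBy { case (item, _) => item }                        // shuffle-like grouping
    .map { case (item, pairs) => (item, pairs.map(_._2).sum) } // sum per item
```

The whole "job" is one expression; the equivalent Hadoop MapReduce version would need separate Mapper and Reducer classes plus driver boilerplate.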
> 
> On Jun 1, 2014, at 6:58 PM, Andrew Palumbo <[email protected]> wrote:
> 
> Cool - I was just going through it to get familiar with the DSL (and really 
> Scala in general at this point) and the read/write traits that you were 
> talking about... Just looking at the code, really - I don't have any need to 
> build it right now.  Wanted to make sure I wasn't totally off...
> 
> Thanks  
> 
>> Subject: Re: [jira] [Created] (MAHOUT-1568) Build an I/O model that can 
>> replace sequence files for import/export
>> From: [email protected]
>> Date: Sun, 1 Jun 2014 17:57:47 -0700
>> To: [email protected]
>> 
>> Sorry, I wasn’t expecting someone to build it. I don’t know if the packaging 
>> is right yet, and it's about a month behind the trunk.
>> 
>> You pull the repo into MAHOUT_HOME, at the same level as the major pieces 
>> like math-scala, and apply the MAHOUT-1464 patch. All you need from the 
>> patch is org.apache.mahout.cf.CooccurrenceAnalysis, so your version should 
>> work. Then build the snapshot Mahout, go into harness, and run ‘mvn 
>> install -DskipTests’. Since the packaging may not be right, I haven’t 
>> integrated it with the Mahout poms. 
>> 
>> I’ll merge it with the trunk tomorrow.
>> 
>> On Jun 1, 2014, at 1:57 PM, Andrew Palumbo <[email protected]> wrote:
>> 
>> Hi Pat,
>> 
>> Does Harness compile against the Mahout trunk + MAHOUT-1464.patch 
>> (cooccurrence)?  I have a patched-up branch of the Mahout trunk with 
>> basically a gutted MAHOUT-1464.patch - just something that defines 
>> org.apache.mahout.cf.CooccurrenceAnalysis and compiles (so I wouldn't be 
>> able to run Harness right now anyway).  I think the changes from 
>> MAHOUT-1529 are causing my problems - the errors are from DrmLike stuff:
>> 
>> 
>> [ERROR] 
>> /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/IndexedDataset.scala:40:
>>  error: not found: type DrmLike
>> [INFO] case class IndexedDataset(matrix: DrmLike[Int], rowIDs: 
>> BiMap[String,Int], columnIDs: BiMap[String,Int]) {
>> [INFO]                                   ^
>> [ERROR] 
>> /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:105:
>>  error: not found: type DrmRdd
>> [INFO]         }).asInstanceOf[DrmRdd[Int]]
>> [INFO]                         ^
>> [ERROR] 
>> /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:107:
>>  error: not found: type CheckpointedDrmBase
>> [INFO]       val drmInteractions = new 
>> CheckpointedDrmBase[Int](indexedInteractions, numRows, numColumns)
>> [INFO]                                 ^
>> [ERROR] 
>> /home/andy/sandbox/harness/src/main/scala/org/apache/mahout/drivers/ReaderWriter.scala:145:
>>  error: not found: type DrmLike
>> [INFO]       val matrix: DrmLike[Int] = indexedDataset.matrix
>> 
>> Thanks,
>> 
>> Andy      
>> 
>> 
>>> Date: Sun, 1 Jun 2014 17:27:01 +0000
>>> From: [email protected]
>>> To: [email protected]
>>> Subject: [jira] [Created] (MAHOUT-1568) Build an I/O model that can replace 
>>> sequence files for import/export
>>> 
>>> Pat Ferrel created MAHOUT-1568:
>>> ----------------------------------
>>> 
>>>           Summary: Build an I/O model that can replace sequence files for 
>>> import/export
>>>               Key: MAHOUT-1568
>>>               URL: https://issues.apache.org/jira/browse/MAHOUT-1568
>>>           Project: Mahout
>>>        Issue Type: New Feature
>>>        Components: CLI
>>>       Environment: Scala, Spark
>>>          Reporter: Pat Ferrel
>>>          Assignee: Pat Ferrel
>>> 
>>> 
>>> Implement mechanisms to read and write data from/to flexible stores. These 
>>> will support tuple streams and DRMs, with extensions that allow keeping 
>>> user-defined values for IDs. The mechanism can, in some sense, replace 
>>> sequence files for import/export and will make the operation much easier 
>>> for the user, in many cases directly consuming their input files.
>>> 
>>> Start with text-delimited files for input/output in the Spark version of 
>>> ItemSimilarity.
>>> 
>>> A prototype of this proposal is running with ItemSimilarity on Spark and is 
>>> documented on the GitHub wiki here: https://github.com/pferrel/harness/wiki
>>> 
>>> Comments are appreciated
>>> 
>>> 
>>> 
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
>>                                        
>> 
>                                         
> 
                                          
