Re: cf/couccurence code

2014-07-09 Thread Pat Ferrel
Hmm, that doesn’t seem like a good idea. Since there is precedent, and for the sake of argument, I’ll go ahead and do it, but: 1) it means the wrong module will fail a build test when the error is not in the test; 2) it is a kind of lie about the dependencies of a module. A consumer would think

Re: cf/couccurence code

2014-07-09 Thread Anand Avati
Pat, I agree that the proposal is not ideal, and your points are of course valid. All I'm saying is that solving the code vs. test module problem is a separate issue, not a non-issue. However, it is independent of the question of the right location for the cf code. Here's a PR for just the code move:

Re: cf/couccurence code

2014-07-08 Thread Anand Avati
I'm not completely sure how to address this (code and tests in separate modules) as I write, but I will give it a shot soon. On Mon, Jul 7, 2014 at 9:18 AM, Pat Ferrel pat.fer...@gmail.com wrote: OK, I’m spending more time on this than I have to spare. The test class extends

Re: cf/couccurence code

2014-07-08 Thread Pat Ferrel
I already did the code and tests in separate modules; that works but is not a good way to go, imo. If there are tests that will work in math-scala then we can put the code in math-scala. I couldn’t find a way to do it. On Jul 8, 2014, at 4:40 PM, Anand Avati av...@gluster.org wrote: I'm not

Re: cf/couccurence code

2014-07-08 Thread Anand Avati
If that is the case, why not commit that much already (i.e., separate modules for code and test), since that has been the norm thus far (see DSSVD, DSPCA etc.)? Fixing code vs. test modules could be a separate task/activity (which I'm happy to pick up) on which the cf code move need not depend.

Re: cf/couccurence code

2014-06-30 Thread Ted Dunning
This makes reasonable sense. The CF stuff does *use* math a fair bit but could be said not to *be* math in itself. On the other hand, the core/math split in Mahout itself was motivated by the need to isolate the Hadoop dependencies. I am not clear that the same is true here. Is there an

Re: cf/couccurence code

2014-06-30 Thread Pat Ferrel
No inherent need. The original question was about Spark dependencies brought up by Anand. Math and Cooccurrence are not Spark dependent, and anything that does file I/O is. Math-scala does not have Spark in the pom; spark and the I/O and CLI stuff do. Speaking for Sebastian and Dmitriy (with some

Re: cf/couccurence code

2014-06-30 Thread Ted Dunning
On Mon, Jun 30, 2014 at 8:36 AM, Pat Ferrel pat.fer...@gmail.com wrote: Speaking for Sebastian and Dmitriy (with some ignorance) I think the idea was to isolate things with Spark dependencies something like we did before with Hadoop. Go ahead and speak for me as well here! I think isolating

Re: cf/couccurence code

2014-06-30 Thread Pat Ferrel
No argument, just trying to decide whether to create core-scala or keep dumping anything not Spark dependent in math-scala. On Jun 30, 2014, at 9:32 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Mon, Jun 30, 2014 at 8:36 AM, Pat Ferrel pat.fer...@gmail.com wrote: Speaking for Sebastian and

Re: cf/couccurence code

2014-06-25 Thread Pat Ferrel
Seems like the cf stuff, as well as other algos that are consumers of “math-scala” but are not really math, should perhaps go in a new “core” project. If so, the pom should probably be pretty similar to math-scala so that any Spark dependencies are noticed. Keeping them in a scala only

cf/couccurence code

2014-06-19 Thread Anand Avati
Hi Pat and others, I see that cf/CooccuranceAnalysis.scala is currently under spark. Is there a specific reason? I see that the code itself is completely spark agnostic. I tried moving the code under math-scala/src/main/scala/org/apache/mahout/math/cf/ with the following trivial patch: diff --git

Re: cf/couccurence code

2014-06-19 Thread Sebastian Schelter
Hi Anand, Yes, this should not contain anything spark-specific. +1 for moving it. --sebastian On 06/19/2014 08:38 PM, Anand Avati wrote: Hi Pat and others, I see that cf/CooccuranceAnalysis.scala is currently under spark. Is there a specific reason? I see that the code itself is completely

Re: cf/couccurence code

2014-06-19 Thread Pat Ferrel
Actually it has several Spark deps, like having a SparkContext, a SparkConf, and an RDD for file I/O. Please look before you vote. I’ve been waving this flag for a while: I/O is not engine neutral. On Jun 19, 2014, at 11:41 AM, Sebastian Schelter s...@apache.org wrote: Hi Anand, Yes, this should

Re: cf/couccurence code

2014-06-19 Thread Dmitriy Lyubimov
Pat, it is -- or it is simply missing. If you are trying to load a matrix from a text file, there's simply no mapping to a text file format -- but it could be created, I suppose. If you are trying to load something other than a matrix, then it is not an issue of I/O but simply the fact that you are

Re: cf/couccurence code

2014-06-19 Thread Pat Ferrel
Sorry, in the car. Cf/cooccurrence is not Spark dependent; it can be moved. I thought the reference was to the PR I just did for ItemSimilarityJob, which has the deps I mentioned. Sent from my iPhone. On Jun 19, 2014, at 12:01 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: Pat, it is -- or it

Re: cf/couccurence code

2014-06-19 Thread Pat Ferrel
Not sure if the previous mail got through; I'm in a car. No Spark deps in cf/cooccurrence; it can be moved. The deps are in the I/O code in ItemSimilarityJob, the subject of the PR just before your first email. Sorry for the confusion. Sent from my iPhone. On Jun 19, 2014, at 12:06 PM, Anand Avati