[jira] [Commented] (MAHOUT-1567) Add online sparse dictionary learning (dimensionality reduction)

2014-06-02 Thread Ted Dunning (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015387#comment-14015387
 ] 

Ted Dunning commented on MAHOUT-1567:
-

A sequential implementation would still be interesting.  This kind of thing can 
often be leveraged into a parallel implementation by doing a sequential pass 
first and then learning exceptions in parallel.
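As a reference point, here is what such a sequential pass could look like: a 
minimal, self-contained Scala sketch following Mairal et al.'s Algorithm 1 from 
the paper cited in the issue description below (plain dense arrays for brevity, 
names and simplifications are mine, not Maciej's implementation; the issue's 
point is a sparse variant of exactly this loop).

{code:scala}
// Hedged sketch, not Mahout code: one sequential pass of online dictionary
// learning. d holds k atoms of length dim; samples is a stream of inputs.
object OnlineDictLearningSketch {

  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0; var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  // Sparse coding: a few sweeps of coordinate descent with soft-thresholding,
  // minimizing 0.5*||x - D alpha||^2 + lambda*||alpha||_1.
  def sparseCode(x: Array[Double], d: Array[Array[Double]], lambda: Double): Array[Double] = {
    val k = d.length
    val alpha = new Array[Double](k)
    for (_ <- 0 until 10; j <- 0 until k) {
      var num = dot(d(j), x)
      for (l <- 0 until k if l != j) num -= alpha(l) * dot(d(j), d(l))
      val nrm2 = math.max(dot(d(j), d(j)), 1e-12)
      alpha(j) = math.signum(num) * math.max(math.abs(num) - lambda, 0.0) / nrm2
    }
    alpha
  }

  // Sequential driver: accumulate sufficient statistics A = sum(alpha alpha'),
  // B = sum(x alpha'), then refresh each atom by block coordinate descent.
  def fit(samples: Iterator[Array[Double]], d: Array[Array[Double]], lambda: Double): Unit = {
    val k = d.length
    val dim = d(0).length
    val a = Array.fill(k, k)(0.0)   // A, k x k
    val b = Array.fill(k, dim)(0.0) // B, stored atom-major
    for (x <- samples) {
      val alpha = sparseCode(x, d, lambda)
      for (j <- 0 until k; l <- 0 until k) a(j)(l) += alpha(j) * alpha(l)
      for (j <- 0 until k; i <- 0 until dim) b(j)(i) += alpha(j) * x(i)
      for (j <- 0 until k if a(j)(j) > 1e-12) {
        val u = new Array[Double](dim)
        for (i <- 0 until dim) {
          var dAj = 0.0
          for (l <- 0 until k) dAj += d(l)(i) * a(l)(j)
          u(i) = d(j)(i) + (b(j)(i) - dAj) / a(j)(j)
        }
        val scale = math.max(1.0, math.sqrt(dot(u, u))) // keep ||atom|| <= 1
        for (i <- 0 until dim) d(j)(i) = u(i) / scale
      }
    }
  }
}
{code}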



 Add online sparse dictionary learning (dimensionality reduction)
 

 Key: MAHOUT-1567
 URL: https://issues.apache.org/jira/browse/MAHOUT-1567
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
Reporter: Maciej Kula

 I have recently implemented a sparse online dictionary learning algorithm, 
 with an emphasis on learning very high-dimensional and very sparse 
 dictionaries. It is based on J. Mairal et al., 'Online Dictionary Learning for 
 Sparse Coding' (http://www.di.ens.fr/willow/pdfs/icml09.pdf). It's an online 
 variant of low-rank matrix factorization, suitable for sparse binary matrices 
 (such as implicit feedback matrices).
 I would be very happy to bring this up to the Mahout standard and contribute 
 to the main codebase --- is this something you would in principle be 
 interested in having?
 The code (as well as some examples) is here: 
 https://github.com/maciejkula/dictionarylearning



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-06-02 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015639#comment-14015639
 ] 

Dmitriy Lyubimov commented on MAHOUT-1365:
--

[~ssc] Since you've done this before, can you please eyeball this and make a 
suggestion?

My current implementation proceeds with computations based on formula (7) in 
the pdf, which in its turn is derived directly from both papers. (We ignore 
baseline confidence, which I denote as c_0, in which case the expression under 
inversion comes apart as V'V, which is common and tiny for all item vectors and 
so is computed once and broadcast, plus an individual item correction 
U'D^(i)U, which takes only the rows of U where confidence is non-trivial 
(c != c_0).)

That means, in effect, that every row of U has to send a message to every row 
of V for which c != c_0. I previously did this with Pregel. It turns out that 
in Spark, Bagel is a moot point since it simply uses groupBy underneath rather 
than a custom multicast communication. Still, if I did it today I would have to 
do a coGroup or something to achieve a similar effect. The question is whether 
there is a neat way to translate this into our current set of linear algebra 
primitives, or whether this would be our first case where we would have to 
create a method that is in part tightly coupled to Spark. Any thoughts?
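
For readers without the attached pdf, a hedged reconstruction of the 
decomposition being described (my notation, written for the item-side solve; 
the comment's V'V is the same trick for the common Gram matrix on the 
user-side solve, and this is not necessarily formula (7) verbatim). With 
baseline confidence c_0 and L2 regularization lambda as in both papers, the 
regularized normal equations for item vector v_i are

$$ v_i = \left( c_0\, U^\top U + U^\top D^{(i)} U + \lambda I \right)^{-1} U^\top C^{(i)}\, p_i,
\qquad D^{(i)} = \operatorname{diag}\left( c_{ui} - c_0 \right), $$

so c_0 U'U is identical for every item and can be computed once and broadcast, 
while D^{(i)} is nonzero only for users with non-trivial confidence 
(c_{ui} != c_0); the correction U'D^{(i)}U therefore touches only those rows 
of U.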

 Weighted ALS-WR iterator for Spark
 --

 Key: MAHOUT-1365
 URL: https://issues.apache.org/jira/browse/MAHOUT-1365
 Project: Mahout
  Issue Type: Task
Reporter: Dmitriy Lyubimov
Assignee: Dmitriy Lyubimov
 Fix For: 1.0

 Attachments: distributed-als-with-confidence.pdf


 Given preference P and confidence C distributed sparse matrices, compute the 
 ALS-WR solution for implicit feedback (Spark Bagel version).
 Following the Hu-Koren-Volinsky method (stripping off any concrete methodology 
 to build the C matrix), with a parameterized test for convergence.
 The computational scheme follows the ALS-WR method (which should be slightly 
 more efficient for sparser inputs).
 The best performance will be achieved if non-sparse anomalies are prefiltered 
 (eliminated), such as an anomalously active user who doesn't represent a 
 typical user anyway.
 The work is going on here: 
 https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am 
 porting away our (A1) implementation, so there are a few issues associated 
 with that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MAHOUT-1365) Weighted ALS-WR iterator for Spark

2014-06-02 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015639#comment-14015639
 ] 

Dmitriy Lyubimov edited comment on MAHOUT-1365 at 6/2/14 5:49 PM:
--

[~ssc] Since you've done this before, can you please eyeball this and make a 
suggestion?

My current implementation proceeds with computations based on formula (7) in 
the pdf, which in its turn is derived directly from both papers. (We ignore 
baseline confidence, which I denote as c_0, in which case the expression under 
inversion comes apart as V'V, which is common and tiny for all item vectors and 
so is computed once and broadcast, plus an individual item correction 
U'D^(i)U, which takes only the rows of U where confidence is non-trivial 
(c != c_0).)

That means, in effect, that every row of U has to send a message to every row 
of V for which c != c_0. I previously did this with Pregel. It turns out that 
in Spark, Bagel is a moot point since it simply uses groupBy underneath rather 
than a custom multicast communication. Still, if I did it today I would have to 
do a coGroup or something to achieve a similar effect. The question is whether 
there is a neat way to translate this into our current set of linear algebra 
primitives, or whether this would be our first case where we would have to 
create a method that is in part tightly coupled to Spark. Any thoughts?
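
On the coGroup question: a hedged sketch of the message-routing step in plain 
Spark RDD terms (hypothetical layouts and names, not actual MAHOUT-1365 code), 
where each U row is shipped only to the items with non-trivial confidence and 
the per-item correction U'D^(i)U is reduced from those contributions.

{code:scala}
import org.apache.spark.SparkContext._  // pair-RDD ops (join, reduceByKey)
import org.apache.spark.rdd.RDD

// uRows: userId -> row of U; conf: userId -> (itemId, c_ui), kept only where c_ui != c0
def itemCorrections(uRows: RDD[(Int, Array[Double])],
                    conf: RDD[(Int, Seq[(Int, Double)])],
                    c0: Double): RDD[(Int, Array[Array[Double]])] =
  uRows.join(conf).flatMap { case (_, (u, items)) =>
    items.map { case (item, c) =>
      // this user's contribution to U' D^(i) U, with D^(i) = diag(c - c0)
      (item, Array.tabulate(u.length, u.length)((p, q) => (c - c0) * u(p) * u(q)))
    }
  }.reduceByKey { (m1, m2) =>
    Array.tabulate(m1.length, m1.length)((p, q) => m1(p)(q) + m2(p)(q))
  }
{code}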



 Weighted ALS-WR iterator for Spark
 --

 Key: MAHOUT-1365
 URL: https://issues.apache.org/jira/browse/MAHOUT-1365
 Project: Mahout
  Issue Type: Task
Reporter: Dmitriy Lyubimov
Assignee: Dmitriy Lyubimov
 Fix For: 1.0

 Attachments: distributed-als-with-confidence.pdf


 Given preference P and confidence C distributed sparse matrices, compute the 
 ALS-WR solution for implicit feedback (Spark Bagel version).
 Following the Hu-Koren-Volinsky method (stripping off any concrete methodology 
 to build the C matrix), with a parameterized test for convergence.
 The computational scheme follows the ALS-WR method (which should be slightly 
 more efficient for sparser inputs).
 The best performance will be achieved if non-sparse anomalies are prefiltered 
 (eliminated), such as an anomalously active user who doesn't represent a 
 typical user anyway.
 The work is going on here: 
 https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am 
 porting away our (A1) implementation, so there are a few issues associated 
 with that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


1490 pull request

2014-06-02 Thread Saikat Kanjilal
Dmitry,

While I wait for you to review my pull request, there are various areas I can 
look at in the interim:

1) Refining the blog to show the current APIs that we are tinkering with
2) Adding more APIs around the DoubleDataFrameVector while you're reviewing the 
initial set; I have some ideas around which ones might be next that I can put in
3) Starting to figure out how to potentially leverage the UnsafeUtil class you 
introduced to hook into the various vector class APIs that I created

Would love to hear your thoughts on the next set of immediate needs. Thanks in 
advance.

[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015667#comment-14015667
 ] 

Pat Ferrel commented on MAHOUT-1464:


[~ssc] Should I reassign this to me for now so we can get it committed?

 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Sebastian Schelter
The important thing here is that we test the code on a sufficiently large
dataset on a real cluster. Take that on, if you want!
On 02.06.2014 20:08, Pat Ferrel (JIRA) j...@apache.org wrote:


 [
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015667#comment-14015667
 ]

 Pat Ferrel commented on MAHOUT-1464:
 

 [~ssc] Should I reassign this to me for now so we can get it committed?

  Cooccurrence Analysis on Spark
  --
 
  Key: MAHOUT-1464
  URL: https://issues.apache.org/jira/browse/MAHOUT-1464
  Project: Mahout
   Issue Type: Improvement
   Components: Collaborative Filtering
  Environment: hadoop, spark
 Reporter: Pat Ferrel
 Assignee: Sebastian Schelter
  Fix For: 1.0
 
  Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
 run-spark-xrsj.sh
 
 
  Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
 that runs on Spark. This should be compatible with Mahout Spark DRM DSL so
 a DRM can be used as input.
  Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
 has several applications including cross-action recommendations.



 --
 This message was sent by Atlassian JIRA
 (v6.2#6252)



[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015788#comment-14015788
 ] 

Pat Ferrel commented on MAHOUT-1464:


Looks like DrmLike may have been refactored since this patch was written.

[~dlyubimov] The following patch code has an error at elem, saying Missing 
parameter type 'elem'. Looking at the scaladocs, I tracked back to the DrmLike 
trait and see no way to call .mapBlock on it. Has something been refactored 
here? The .nonZeroes() is a java sparse vector iterator, I think. This worked 
about a month ago, so I thought you might have an idea of how things have 
changed?

{code:scala}
  def computeIndicators(drmBtA: DrmLike[Int], numUsers: Int,
                        maxInterestingItemsPerThing: Int,
                        bcastNumInteractionsB: Broadcast[Vector],
                        bcastNumInteractionsA: Broadcast[Vector],
                        crossCooccurrence: Boolean = true) = {
    drmBtA.mapBlock() {
      case (keys, block) =>

        val llrBlock = block.like()
        val numInteractionsB: Vector = bcastNumInteractionsB
        val numInteractionsA: Vector = bcastNumInteractionsA

        for (index <- 0 until keys.size) {

          val thingB = keys(index)

          // PriorityQueue to select the top-k items
          val topItemsPerThing = new mutable.PriorityQueue[(Int, Double)]()(orderByScore)

          block(index, ::).nonZeroes().foreach { elem => // ! Error: Missing parameter type 'elem'
            val thingA = elem.index
            val cooccurrences = elem.get
{code}
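
For context on the scoring these loops feed into: a self-contained sketch of 
the LLR score over Dunning's 2x2 contingency table, matching mahout-math's 
LogLikelihood formulation (names here are illustrative, not the patch's exact 
code).

{code:scala}
// k11 = # interactions with both A and B, k12 = # with B but not A,
// k21 = # with A but not B, k22 = # with neither.
def xLogX(x: Double): Double = if (x <= 0.0) 0.0 else x * math.log(x)

def entropy(elems: Double*): Double = xLogX(elems.sum) - elems.map(xLogX).sum

def logLikelihoodRatio(k11: Double, k12: Double, k21: Double, k22: Double): Double = {
  val rowEntropy = entropy(k11 + k12, k21 + k22)
  val colEntropy = entropy(k11 + k21, k12 + k22)
  val matEntropy = entropy(k11, k12, k21, k22)
  // guard against tiny negative values from floating point
  math.max(0.0, 2.0 * (rowEntropy + colEntropy - matEntropy))
}
{code}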

 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015806#comment-14015806
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

I think this has nothing to do with Spark or the scala bindings.

.nonZeroes() is a mahout-math (java) method which produces a java iterator, 
which is then implicitly converted to a scala iterator (since .foreach is a 
scala operator).

Is JavaConversions._ still imported?
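
A minimal sketch of the conversion in question (assuming mahout-math's Vector 
API; illustrative only): with the import in scope, the java Iterable gains a 
scala foreach, and if type inference still fails, annotating the closure 
parameter is the usual workaround.

{code:scala}
import scala.collection.JavaConversions._   // java.lang.Iterable -> scala Iterable
import org.apache.mahout.math.{DenseVector, Vector}

val v: Vector = new DenseVector(Array(0.0, 1.5, 0.0, 2.0))

// compiles because the implicit conversion supplies a scala foreach:
v.nonZeroes().foreach { elem =>
  println(s"${elem.index} -> ${elem.get}")
}

// if the parameter type still can't be inferred, annotate it explicitly:
// v.nonZeroes().foreach { (elem: Vector.Element) => ... }
{code}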

 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Dmitriy Lyubimov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015810#comment-14015810
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
--

If you want me to verify this, please convert it to a pull request so I can 
painlessly sync to exactly what you are testing.

 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015824#comment-14015824
 ] 

Pat Ferrel commented on MAHOUT-1464:


import scala.collection.JavaConversions._

is included. I'll pare back to just this ticket and send a PR

 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2

2014-06-02 Thread Brian Salgado (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015952#comment-14015952
 ] 

Brian Salgado commented on MAHOUT-1329:
---

Hi Gokhan,

I've followed the instructions mentioned above, built the master branch of 
https://github.com/apache/mahout locally, and used the dependencies below when 
running a Mahout job, but I am still experiencing the error below.

Dependencies:
mahout-core-1.0-SNAPSHOT.jar
mahout-math-1.0-SNAPSHOT.jar
hadoop-annotations-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-common-2.2.0.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.2.0.jar
commons-cli-2.0-mahout.jar

Wanted to ask for help in case I have missed something.

Regards,
Brian

 Mahout for hadoop 2
 ---

 Key: MAHOUT-1329
 URL: https://issues.apache.org/jira/browse/MAHOUT-1329
 Project: Mahout
  Issue Type: Task
  Components: build
Affects Versions: 0.9
Reporter: Sergey Svinarchuk
Assignee: Gokhan Capan
  Labels: patch
 Fix For: 1.0

 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 
 1329.patch


 Update Mahout to work with Hadoop 2.x, targeting this for Mahout 1.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (MAHOUT-1329) Mahout for hadoop 2

2014-06-02 Thread Brian Salgado (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015952#comment-14015952
 ] 

Brian Salgado edited comment on MAHOUT-1329 at 6/2/14 10:06 PM:


Hi Gokhan,

I've followed the instructions mentioned above, built the master branch of 
https://github.com/apache/mahout locally, and used the dependencies below when 
running a Mahout job, but I am still experiencing the error below.

[Dependencies]
mahout-core-1.0-SNAPSHOT.jar
mahout-math-1.0-SNAPSHOT.jar
hadoop-annotations-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-common-2.2.0.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.2.0.jar
commons-cli-2.0-mahout.jar

[Error]
14/06/02 13:50:30 INFO Configuration.deprecation: mapred.output.dir is 
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:166)
at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:579)
at com.bestbuy.recs.mahout.algorithms.BinaryStep.run(BinaryStep.java:82)

Wanted to ask for help in case I have missed something.

Regards,
Brian



 Mahout for hadoop 2
 ---

 Key: MAHOUT-1329
 URL: https://issues.apache.org/jira/browse/MAHOUT-1329
 Project: Mahout
  Issue Type: Task
  Components: build
Affects Versions: 0.9
Reporter: Sergey Svinarchuk
Assignee: Gokhan Capan
  Labels: patch
 Fix For: 1.0

 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 
 1329.patch


 Update Mahout to work with Hadoop 2.x, targeting this for Mahout 1.0.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Problems with mapBlock()

2014-06-02 Thread Pat Ferrel
Having similar problems. I updated the imports in your old patch for 
MAHOUT-1464; do you have a new one? I just sent D a pointer to my repo for 
MAHOUT-1464 here: https://github.com/pferrel/mahout/tree/MAHOUT-1464

Side note: I can't change the branch in a GitHub-generated PR. It always wants 
to merge with apache:master. Is that expected?

On Jun 1, 2014, at 11:15 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:

Imports have changed with abstraction migration. Check the updated docs.
On May 31, 2014 11:21 PM, Sebastian Schelter s...@apache.org wrote:

 I've updated the codebase to work on the cooccurrence analysis algo, but I
 always run into this error now:
 
 error: value mapBlock is not a member of org.apache.mahout.math.drm.
 DrmLike[Int]
 
 I have the feeling that an implicit conversion might be missing, but I
 couldn't figure out where to put it without producing even more errors.
 
 --sebastian
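
For reference against the error above, a hedged guess at the post-refactoring 
import preamble (drawn from the updated bindings docs as I understand them; 
verify against the current codebase), since mapBlock is added to DrmLike via 
implicit decorators rather than being declared on the trait itself.

{code:scala}
import org.apache.mahout.math._
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._   // brings mapBlock into scope on DrmLike[K]
{code}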
 



[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016013#comment-14016013
 ] 

ASF GitHub Bot commented on MAHOUT-1464:


GitHub user dlyubimov reopened a pull request:

https://github.com/apache/mahout/pull/8

MAHOUT-1464 Cooccurrence Analysis on Spark

Grabbed Pat's branch. Submitting as a PR (WIP at this point). 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dlyubimov/mahout MAHOUT-1464

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mahout/pull/8.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8


commit 70654fa58dd4b801c551429945fa2f1377a60b2e
Author: pferrel p...@occamsmachete.com
Date:   2014-06-02T21:11:55Z

starting to merge the cooccurrence stuff, import errors

commit fc5fb6ac37e4c12d25c35ddb7912a32aac06e449
Author: pferrel p...@occamsmachete.com
Date:   2014-06-02T21:33:45Z

tried changing the imports in CooccurrenceAnalysis.scala to no avail

commit 242aed0e0921afe9a87ee8973ba8077cbe65fffa
Author: Dmitriy Lyubimov dlyubi...@apache.org
Date:   2014-06-02T22:42:57Z

Compilation fixes, updates for MAHOUT-1529 changes




 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016012#comment-14016012
 ] 

ASF GitHub Bot commented on MAHOUT-1464:


Github user dlyubimov closed the pull request at:

https://github.com/apache/mahout/pull/8


 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

2014-06-02 Thread Pat Ferrel (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016102#comment-14016102
 ] 

Pat Ferrel commented on MAHOUT-1464:


My problem is that my cluster is on 1.2.1, and to upgrade, everything I run on 
it has to go to H2. Oh bother.

I think the best thing is to commit this and see if someone will run one of the 
several included tests on a cluster. It works locally and seems to work 
clustered, but the write fails. The write is not part of the core code.

Anyway, unless someone vetoes, I'll commit it once I get at least one 
build-integrated test included.

 Cooccurrence Analysis on Spark
 --

 Key: MAHOUT-1464
 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
 Project: Mahout
  Issue Type: Improvement
  Components: Collaborative Filtering
 Environment: hadoop, spark
Reporter: Pat Ferrel
Assignee: Sebastian Schelter
 Fix For: 1.0

 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
 MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh


 Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
 runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
 can be used as input. 
 Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
 several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout

2014-06-02 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016135#comment-14016135
 ] 

Nishkam Ravi commented on MAHOUT-1565:
--

Has this PR been merged yet?

 add MR2 options to MAHOUT_OPTS in bin/mahout
 

 Key: MAHOUT-1565
 URL: https://issues.apache.org/jira/browse/MAHOUT-1565
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 1.0, 0.9
Reporter: Nishkam Ravi
 Fix For: 1.0

 Attachments: MAHOUT-1565.patch


 MR2 options are missing in MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add 
 those options.



--
This message was sent by Atlassian JIRA
(v6.2#6252)