[jira] [Commented] (MAHOUT-1567) Add online sparse dictionary learning (dimensionality reduction)
[ https://issues.apache.org/jira/browse/MAHOUT-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015387#comment-14015387 ] Ted Dunning commented on MAHOUT-1567: - A sequential implementation would still be interesting. This kind of thing can often be leveraged into a parallel implementation by doing a sequential pass first and learning exceptions in parallel later. Add online sparse dictionary learning (dimensionality reduction) Key: MAHOUT-1567 URL: https://issues.apache.org/jira/browse/MAHOUT-1567 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Reporter: Maciej Kula I have recently implemented a sparse online dictionary learning algorithm, with an emphasis on learning very high-dimensional and very sparse dictionaries. It is based on J. Mairal et al. 'Online Dictionary Learning for Sparse Coding' (http://www.di.ens.fr/willow/pdfs/icml09.pdf). It's an online variant of low-rank matrix factorization, suitable for sparse binary matrices (such as implicit feedback matrices). I would be very happy to bring this up to the Mahout standard and contribute it to the main codebase --- is this something you would in principle be interested in having? The code (as well as some examples) is here: https://github.com/maciejkula/dictionarylearning -- This message was sent by Atlassian JIRA (v6.2#6252)
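For context, the Mairal et al. scheme the ticket references alternates a sparse-coding step with a block-coordinate dictionary update driven by two accumulated sufficient statistics. Below is a minimal NumPy sketch of that loop; the ISTA sparse coder, the function names, and all parameters are illustrative assumptions, not code from the linked repository.

```python
# Illustrative sketch of online dictionary learning (Mairal et al., ICML '09).
# All names and the ISTA solver are assumptions for illustration only.
import numpy as np

def sparse_code(D, x, lam=0.1, n_iter=50):
    """Approximate argmin_a 0.5*||x - D a||^2 + lam*||a||_1 via ISTA."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12      # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ a - x)                  # gradient of the least-squares term
        z = a - g / L
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a

def online_dictionary_learning(X, k, lam=0.1, n_epochs=3, seed=0):
    """Learn a dictionary D (m x k) from rows of X, one sample at a time."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    D = rng.standard_normal((m, k))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, k))                       # running statistic: sum of a a'
    B = np.zeros((m, k))                       # running statistic: sum of x a'
    for _ in range(n_epochs):
        for x in X:
            a = sparse_code(D, x, lam)
            A += np.outer(a, a)
            B += np.outer(x, a)
            for j in range(k):                 # block coordinate descent on columns
                if A[j, j] > 1e-10:
                    D[:, j] += (B[:, j] - D @ A[:, j]) / A[j, j]
                    D[:, j] /= max(1.0, np.linalg.norm(D[:, j]))
    return D
```

The `A` and `B` matrices are the paper's sufficient statistics; projecting each atom onto the unit ball after its update is what keeps the online update stable.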
[jira] [Commented] (MAHOUT-1365) Weighted ALS-WR iterator for Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015639#comment-14015639 ] Dmitriy Lyubimov commented on MAHOUT-1365: -- [~ssc] Since you've done this before, can you please eyeball this and make a suggestion? my current implementation proceeds with computations based on formula (7) in the pdf, which in its turn is derived directly from both papers. (We ignore baseline confidence, which i denote as c_0, in which case the expression under inversion comes apart as V'V, which is common and tiny for all item vectors, so it is just computed once and broadcast; and then an individual item correction U'D^(i)U which takes only rows of U where confidence is non-trivial (c != c_0).) That kind of means that every U row has to send a message to every V for which c != c_0. I have previously done it with pregel. It turns out that in spark, Bagel is a moot point since it is simply using groupBy underneath rather than custom multicast communication. Still, if i did it today, I would have to do a coGroup or something to achieve a similar effect. Question is whether there's a neat way to translate it into our current set of linear algebra primitives, or whether this would be our first case where we would have to create a method that is in part tightly coupled to Spark. Any thoughts? Weighted ALS-WR iterator for Spark -- Key: MAHOUT-1365 URL: https://issues.apache.org/jira/browse/MAHOUT-1365 Project: Mahout Issue Type: Task Reporter: Dmitriy Lyubimov Assignee: Dmitriy Lyubimov Fix For: 1.0 Attachments: distributed-als-with-confidence.pdf Given preference P and confidence C distributed sparse matrices, compute the ALS-WR solution for implicit feedback (Spark Bagel version). Following the Hu-Koren-Volinsky method (stripping off any concrete methodology to build the C matrix), with a parameterized test for convergence.
The computational scheme follows the ALS-WR method (which should be slightly more efficient for sparser inputs). The best performance will be achieved if non-sparse anomalies are prefiltered (eliminated), such as an anomalously active user who doesn't represent a typical user anyway. The work is going on here: https://github.com/dlyubimov/mahout-commits/tree/dev-0.9.x-scala. I am porting away our (A1) implementation, so there are a few issues associated with that.
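The decomposition described in the comment above (a common c_0-scaled Gram matrix computed once and broadcast, plus a per-item correction built only from rows with non-trivial confidence) can be sketched in NumPy as follows. This is an illustrative sketch of the standard implicit-feedback normal equations, not the Mahout implementation; formula (7) of the attached pdf is not reproduced here, and all names are assumptions.

```python
# Hedged sketch of the implicit-feedback item solve with a baseline
# confidence c0. Not Mahout code; names are illustrative assumptions.
import numpy as np

def solve_items(U, P, C, c0=1.0, lam=0.01):
    """Solve for item factors V given user factors U.

    P: n_users x n_items preference matrix (0/1); C: matching confidence
    matrix. Only users whose confidence differs from the baseline c0
    contribute a per-item correction, so the common c0 * U'U term is
    computed once (the part the comment proposes to broadcast)."""
    n_users, k = U.shape
    n_items = P.shape[1]
    G = c0 * (U.T @ U)                      # common Gram term, computed once
    I_k = np.eye(k)
    V = np.zeros((n_items, k))
    for i in range(n_items):
        nz = np.nonzero(C[:, i] != c0)[0]   # users with non-trivial confidence
        Ui = U[nz]
        d = C[nz, i] - c0                   # diagonal of the D^(i) correction
        M = G + Ui.T @ (d[:, None] * Ui) + lam * I_k
        rhs = U.T @ (C[:, i] * P[:, i])
        V[i] = np.linalg.solve(M, rhs)
    return V
```

Because `G + Ui' diag(d) Ui` equals `U' diag(C[:, i]) U` exactly, the per-item system matches the full weighted normal equations while touching only the non-trivial-confidence rows.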
[jira] [Comment Edited] (MAHOUT-1365) Weighted ALS-WR iterator for Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015639#comment-14015639 ] Dmitriy Lyubimov edited comment on MAHOUT-1365 at 6/2/14 5:49 PM: -- [~ssc] Since you've done this before, can you please eyeball this and make a suggestion? my current implementation proceeds with computations based on formula (7) in the pdf, which in its turn is derived directly from both papers. (We ignore baseline confidence, which i denote as c_0, in which case the expression under inversion comes apart as V'V, which is common and tiny for all item vectors, so it is just computed once and broadcast; and then an individual item correction U'D^(i)U which takes only rows of U where confidence is non-trivial (c != c_0).) That kind of means that every U row has to send a message to every V for which c != c_0. I have previously done it with pregel. It turns out that in spark, Bagel is a moot point since it is simply using groupBy underneath rather than custom multicast communication. Still, if i did it today, I would have to do a coGroup or something to achieve a similar effect. Question is whether there's a neat way to translate it into our current set of linear algebra primitives, or whether this would be our first case where we would have to create a method that is in part tightly coupled to Spark. Any thoughts?
1490 pull request
Dmitry, While I wait for you to review my pull request there are various areas I can look at in the interim:
1) Refining the blog to show the current APIs that we are tinkering with
2) Adding more APIs around the DoubleDataFrameVector while you're reviewing the initial set; I have some ideas around which ones might be next that I can put in
3) Start figuring out how to potentially leverage the UnsafeUtil class you introduced to hook into the various vector class APIs that I created
Would love to hear your thoughts on the next set of immediate needs. Thanks in advance.
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015667#comment-14015667 ] Pat Ferrel commented on MAHOUT-1464: [~ssc] Should I reassign this to me for now so we can get this committed? Cooccurrence Analysis on Spark -- Key: MAHOUT-1464 URL: https://issues.apache.org/jira/browse/MAHOUT-1464 Project: Mahout Issue Type: Improvement Components: Collaborative Filtering Environment: hadoop, spark Reporter: Pat Ferrel Assignee: Sebastian Schelter Fix For: 1.0 Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that runs on Spark. This should be compatible with the Mahout Spark DRM DSL so a DRM can be used as input. Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has several applications including cross-action recommendations.
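Since the ticket is essentially RowSimilarityJob's LLR scoring ported to Spark, here is a standalone sketch of the log-likelihood ratio test on a 2x2 cooccurrence table. It follows the textbook entropy formulation (structured like Mahout's LogLikelihood utility, but written independently; names are assumptions).

```python
# Illustrative sketch of Dunning's log-likelihood ratio for a 2x2
# cooccurrence table; not Mahout's actual LogLikelihood code.
import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """k11: both events cooccur, k12/k21: one event only, k22: neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))
```

A perfectly correlated table such as `llr(10, 0, 0, 10)` scores high, while an independent table such as `llr(5, 5, 5, 5)` scores 0; the top-k items per row by this score are the "indicators" the patch computes.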
Re: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
The important thing here is that we test the code on a sufficiently large dataset on a real cluster. Take that on, if you want! On 02.06.2014 at 20:08, Pat Ferrel (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015667#comment-14015667 ] Pat Ferrel commented on MAHOUT-1464: [~ssc] Should I reassign this to me for now so we can get this committed?
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015788#comment-14015788 ] Pat Ferrel commented on MAHOUT-1464: Looks like DrmLike may have been refactored since this patch was written. [~dlyubimov] The following patch code has an error at elem, saying Missing parameter type 'elem'. Looking at the scaladocs I tracked back to the DrmLike trait and see no way to .mapBlock on it. Has something been refactored here? The .nonZeroes() is a java sparse vector iterator, I think. This worked about a month ago, so I thought you might have an idea how things have changed?
{code:scala}
def computeIndicators(drmBtA: DrmLike[Int], numUsers: Int, maxInterestingItemsPerThing: Int,
    bcastNumInteractionsB: Broadcast[Vector], bcastNumInteractionsA: Broadcast[Vector],
    crossCooccurrence: Boolean = true) = {
  drmBtA.mapBlock() { case (keys, block) =>
    val llrBlock = block.like()
    val numInteractionsB: Vector = bcastNumInteractionsB
    val numInteractionsA: Vector = bcastNumInteractionsA
    for (index <- 0 until keys.size) {
      val thingB = keys(index)
      // PriorityQueue to select the top-k items
      val topItemsPerThing = new mutable.PriorityQueue[(Int, Double)]()(orderByScore)
      block(index, ::).nonZeroes().foreach { elem => //! Error: Missing parameter type 'elem'
        val thingA = elem.index
        val cooccurrences = elem.get
{code}
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015806#comment-14015806 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- I think this has nothing to do with anything in Spark or the scala bindings. The .nonZeroes() is a mahout-math method (java) which produces a java iterator, which is then implicitly converted to a scala iterator (since .foreach is a scala operator). Is JavaConversions._ still imported?
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015810#comment-14015810 ] Dmitriy Lyubimov commented on MAHOUT-1464: -- if you want me to verify this, please convert to a pull request so i can painlessly sync to exactly what you are testing.
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015824#comment-14015824 ] Pat Ferrel commented on MAHOUT-1464: import scala.collection.JavaConversions._ is included. I'll pare back to just this ticket and send a PR.
[jira] [Commented] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015952#comment-14015952 ] Brian Salgado commented on MAHOUT-1329: --- Hi Gokhan, I've followed the instructions mentioned above, built the master branch of https://github.com/apache/mahout locally, and used the dependencies below when running a Mahout job, but I am still experiencing the error below. Dependencies: mahout-core-1.0-SNAPSHOT.jar mahout-math-1.0-SNAPSHOT.jar hadoop-annotations-2.2.0.jar hadoop-auth-2.2.0.jar hadoop-common-2.2.0.jar hadoop-mapreduce-client-common-2.2.0.jar hadoop-mapreduce-client-core-2.2.0.jar hadoop-yarn-api-2.2.0.jar hadoop-yarn-client-2.2.0.jar hadoop-yarn-common-2.2.0.jar hadoop-yarn-server-common-2.2.0.jar commons-cli-2.0-mahout.jar Wanted to ask for help in case I have missed something. Regards, Brian Mahout for hadoop 2 --- Key: MAHOUT-1329 URL: https://issues.apache.org/jira/browse/MAHOUT-1329 Project: Mahout Issue Type: Task Components: build Affects Versions: 0.9 Reporter: Sergey Svinarchuk Assignee: Gokhan Capan Labels: patch Fix For: 1.0 Attachments: 1329-2.patch, 1329-3-additional.patch, 1329-3.patch, 1329.patch Update mahout to work with hadoop 2.X, targeting this for Mahout 1.0.
[jira] [Comment Edited] (MAHOUT-1329) Mahout for hadoop 2
[ https://issues.apache.org/jira/browse/MAHOUT-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015952#comment-14015952 ] Brian Salgado edited comment on MAHOUT-1329 at 6/2/14 10:06 PM: Hi Gokhan, I've followed the instructions mentioned above, built the master branch of https://github.com/apache/mahout locally, and used the dependencies below when running a Mahout job, but I am still experiencing the error below.
[Dependencies] mahout-core-1.0-SNAPSHOT.jar mahout-math-1.0-SNAPSHOT.jar hadoop-annotations-2.2.0.jar hadoop-auth-2.2.0.jar hadoop-common-2.2.0.jar hadoop-mapreduce-client-common-2.2.0.jar hadoop-mapreduce-client-core-2.2.0.jar hadoop-yarn-api-2.2.0.jar hadoop-yarn-client-2.2.0.jar hadoop-yarn-common-2.2.0.jar hadoop-yarn-server-common-2.2.0.jar commons-cli-2.0-mahout.jar
[Error]
14/06/02 13:50:30 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
Exception in thread main java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:166)
at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:579)
at com.bestbuy.recs.mahout.algorithms.BinaryStep.run(BinaryStep.java:82)
Wanted to ask for help in case I have missed something. Regards, Brian
Re: Problems with mapBlock()
Having similar problems. I updated the imports in your old patch for MAHOUT-1464, do you have a new one? I just sent D a pointer to my repo for MAHOUT-1464 here: https://github.com/pferrel/mahout/tree/MAHOUT-1464 Side note, I can’t change the branch in a github generated PR. It always wants to merge with apache:master. Is that expected? On Jun 1, 2014, at 11:15 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Imports have changed with the abstraction migration. Check the updated docs. On May 31, 2014 11:21 PM, Sebastian Schelter s...@apache.org wrote: I've updated the codebase to work on the cooccurrence analysis algo, but I always run into this error now: error: value mapBlock is not a member of org.apache.mahout.math.drm.DrmLike[Int] I have the feeling that an implicit conversion might be missing, but I couldn't figure out where to put it without producing even more errors. --sebastian
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016013#comment-14016013 ] ASF GitHub Bot commented on MAHOUT-1464: GitHub user dlyubimov reopened a pull request: https://github.com/apache/mahout/pull/8 MAHOUT-1464 Cooccurrence Analysis on Spark Grabbed Pat's branch. submitting as PR (WIP at this point). You can merge this pull request into a Git repository by running: $ git pull https://github.com/dlyubimov/mahout MAHOUT-1464 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/mahout/pull/8.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8 commit 70654fa58dd4b801c551429945fa2f1377a60b2e Author: pferrel p...@occamsmachete.com Date: 2014-06-02T21:11:55Z starting to merge the cooccurrence stuff, import errors commit fc5fb6ac37e4c12d25c35ddb7912a32aac06e449 Author: pferrel p...@occamsmachete.com Date: 2014-06-02T21:33:45Z tried changing the imports in CooccurrenceAnalysis.scala to no avail commit 242aed0e0921afe9a87ee8973ba8077cbe65fffa Author: Dmitriy Lyubimov dlyubi...@apache.org Date: 2014-06-02T22:42:57Z Compilation fixes, updates for MAHOUT-1529 changes
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016012#comment-14016012 ] ASF GitHub Bot commented on MAHOUT-1464: Github user dlyubimov closed the pull request at: https://github.com/apache/mahout/pull/8
[jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016102#comment-14016102 ] Pat Ferrel commented on MAHOUT-1464: My problem is that my cluster is on 1.2.1, and to upgrade, everything I run on it has to go to H2. Oh bother. I think the best thing is to commit this and see if someone will run one of the several included tests on a cluster. It works locally and seems to work clustered, but the write fails. The write is not part of the core code. Anyway, unless someone vetoes, I'll commit it once I get at least one build-integrated test included.
[jira] [Commented] (MAHOUT-1565) add MR2 options to MAHOUT_OPTS in bin/mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14016135#comment-14016135 ] Nishkam Ravi commented on MAHOUT-1565: -- Has this PR been merged yet? add MR2 options to MAHOUT_OPTS in bin/mahout Key: MAHOUT-1565 URL: https://issues.apache.org/jira/browse/MAHOUT-1565 Project: Mahout Issue Type: Improvement Affects Versions: 1.0, 0.9 Reporter: Nishkam Ravi Fix For: 1.0 Attachments: MAHOUT-1565.patch MR2 options are missing from MAHOUT_OPTS in bin/mahout and bin/mahout.cmd. Add those options.