[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030843#comment-14030843 ]
ASF GitHub Bot commented on MAHOUT-1464:
----------------------------------------
GitHub user pferrel opened a pull request:
https://github.com/apache/mahout/pull/18
MAHOUT-1464
The numNonZeroElementsPerColumn additions did not account for negative
values; they counted only the positive non-zero values. Fixed this in both the
in-core and distributed cases.
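The bug described above comes down to the predicate used when counting: `> 0` skips negative entries, while `!= 0` counts every non-zero entry regardless of sign. A minimal sketch on a plain dense `double[][]` (rows by columns) makes the difference concrete; the class and method names are hypothetical and this is not Mahout's MatrixOps code.

```java
import java.util.Arrays;

// Hypothetical sketch, not Mahout's MatrixOps: counts per column on a dense
// row-major double[][] to contrast the buggy and fixed predicates.
final class NonZeroCounts {

  // Buggy predicate: only strictly positive values are counted.
  static int[] positiveOnly(double[][] rows, int numCols) {
    int[] counts = new int[numCols];
    for (double[] row : rows) {
      for (int c = 0; c < numCols; c++) {
        if (row[c] > 0.0) {
          counts[c]++;
        }
      }
    }
    return counts;
  }

  // Fixed predicate: any value other than 0.0 counts, whatever its sign.
  static int[] perColumn(double[][] rows, int numCols) {
    int[] counts = new int[numCols];
    for (double[] row : rows) {
      for (int c = 0; c < numCols; c++) {
        if (row[c] != 0.0) {
          counts[c]++;
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    double[][] m = {
        {1.0, 0.0, -2.0},
        {0.0, -1.0, 3.0},
        {4.0, 0.0, 0.0}
    };
    System.out.println(Arrays.toString(positiveOnly(m, 3))); // [2, 0, 1]
    System.out.println(Arrays.toString(perColumn(m, 3)));    // [2, 1, 2]
  }
}
```

Column 1 holds a single `-1.0`, so the positive-only version reports 0 where the correct count is 1.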
I added a Functions.notEqual function to Functions.java. It may be possible
to do this by composing the existing functions, but it wasn't obvious how, so I
wrote one. The test is in MatrixOpsSuite, where it is used.
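One way to read the role of a `notEqual` function: applied with `b = 0`, it maps each entry to a 0/1 indicator, and summing those indicators over a column yields the non-zero count, negatives included. The sketch below uses the JDK's `DoubleUnaryOperator` as a stand-in for Mahout's own function types; the actual signature of `Functions.notEqual` in the PR is not shown here and may differ, and `IndicatorFunctions`/`aggregateColumn` are hypothetical names.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: a notEqual(b) indicator built on DoubleUnaryOperator,
// standing in for Mahout's function types. Not the PR's actual code.
final class IndicatorFunctions {

  // Returns f(a) = 1.0 if a != b, else 0.0.
  static DoubleUnaryOperator notEqual(final double b) {
    return a -> a != b ? 1.0 : 0.0;
  }

  // Applies f to every element of one column and sums the results.
  static double aggregateColumn(double[][] rows, int col, DoubleUnaryOperator f) {
    double sum = 0.0;
    for (double[] row : rows) {
      sum += f.applyAsDouble(row[col]);
    }
    return sum;
  }

  public static void main(String[] args) {
    double[][] m = {{-3.0}, {0.0}, {5.0}};
    // notEqual(0.0) turns each entry into an indicator, so the sum is the
    // number of non-zero entries in column 0, the negative one included.
    System.out.println(aggregateColumn(m, 0, notEqual(0.0))); // 2.0
  }
}
```

Returning a function object rather than a boolean lets the indicator plug into the same map-then-sum style of aggregation used for other element-wise functions.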
The distributed case was much simpler.
Changed tests to include negative values.
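The PR does not spell out how the distributed version works. A common pattern for this kind of count on row-partitioned data is to count each partition independently and then merge the partial count vectors by element-wise addition, which is the sort of reduction a Spark aggregate or accumulator performs. The sketch below is a hypothetical illustration of that pattern, with a `List` of row blocks standing in for RDD partitions; it is not the SparkEngine implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a partition-then-merge count; a List of row blocks
// stands in for distributed partitions. Not Mahout's SparkEngine code.
final class PartitionedCounts {

  // Per-partition step: count entries that are != 0.0 in each column.
  static int[] countPartition(double[][] partition, int numCols) {
    int[] counts = new int[numCols];
    for (double[] row : partition) {
      for (int c = 0; c < numCols; c++) {
        if (row[c] != 0.0) {
          counts[c]++;
        }
      }
    }
    return counts;
  }

  // Merge step: element-wise sum of two partial count vectors.
  static int[] merge(int[] left, int[] right) {
    int[] out = new int[left.length];
    for (int c = 0; c < left.length; c++) {
      out[c] = left[c] + right[c];
    }
    return out;
  }

  // Driver: count each partition, then fold the partials together with merge.
  static int[] perColumn(List<double[][]> partitions, int numCols) {
    int[] total = new int[numCols];
    for (double[][] p : partitions) {
      total = merge(total, countPartition(p, numCols));
    }
    return total;
  }

  public static void main(String[] args) {
    List<double[][]> parts = Arrays.asList(
        new double[][] {{1.0, 0.0, -2.0}},
        new double[][] {{0.0, -1.0, 3.0}, {4.0, 0.0, 0.0}});
    System.out.println(Arrays.toString(perColumn(parts, 3))); // [2, 1, 2]
  }
}
```

Because `merge` is associative and commutative, the partials can be combined in any order, which is what makes this safe to run as a distributed reduction.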
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pferrel/mahout mahout-1464
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/mahout/pull/18.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18
----
commit 107a0ba9605241653a85b113661a8fa5c055529f
Author: pferrel <[email protected]>
Date: 2014-06-04T19:54:22Z
added Sebastian's CooccurrenceAnalysis patch and updated it to use the current
Mahout DSL
commit 16c03f7fa73c156859d1dba3a333ef9e8bf922b0
Author: pferrel <[email protected]>
Date: 2014-06-04T21:32:18Z
added Sebastian's MurmurHash changes
Signed-off-by: pferrel <[email protected]>
commit c6adaa44c80bba99d41600e260bbb1ad5c972e69
Author: pferrel <[email protected]>
Date: 2014-06-05T16:52:23Z
MAHOUT-1464 import cleanup, minor changes to examples for running on Spark
Cluster
commit 1d66e5726e71e297ef4a7a27331463ba363098c0
Author: pferrel <[email protected]>
Date: 2014-06-06T20:19:32Z
scalatest for cooccurrence cross and self along with other
CooccurrenceAnalysis methods
commit 766db0f9e7feb70520fbd444afcb910788f01e76
Author: pferrel <[email protected]>
Date: 2014-06-06T20:20:46Z
Merge branch 'master' into mahout-1464
commit e492976688cb8860354bb20a362d370405f560e1
Author: pferrel <[email protected]>
Date: 2014-06-06T20:50:07Z
cleaned up test comments
commit a49692eb1664de4b15de1864b95701a6410c80c8
Author: pferrel <[email protected]>
Date: 2014-06-06T21:09:55Z
got those cursed .DS_Stores out of the branch and put an exclude in
.gitignore
commit 268290d28d4f83cc47a7e6baebc5eb4c53d7c8da
Author: pferrel <[email protected]>
Date: 2014-06-07T21:50:04Z
Merge branch 'master' into mahout-1464
commit 63b10704390e18f513cca30596b1d25e146a6edd
Author: pferrel <[email protected]>
Date: 2014-06-08T15:26:36Z
Merge branch 'master' into mahout-1464
commit ac00d7655c4cba5f6c6dcb4882be95656b17a834
Author: pferrel <[email protected]>
Date: 2014-06-09T14:11:43Z
Merge branch 'master' into mahout-1464
commit fb008efeae3d5f6f6ba350fbc2ef3944da1dcaef
Author: pferrel <[email protected]>
Date: 2014-06-12T02:17:27Z
added 'colCounts' to a drm using the SparkEngine and MatrixOps, which, when
used in cooccurrence, fixes the problem with non-boolean preference values
commit 5b04cb31403e2521d9874ad5e14f28cd0af26c26
Author: pferrel <[email protected]>
Date: 2014-06-12T02:18:29Z
Merge branch 'master' into mahout-1464
commit e451a2a596f5ceda8d1b4990e97ad3d5673fdb5f
Author: pferrel <[email protected]>
Date: 2014-06-12T16:02:26Z
fixed some things from Dmitriy's comments, the primary one being that the
SparkEngine accumulator was doing >= 0 instead of > 0
commit 411e0e92b4721626b736d66c292926fa4fdbb530
Author: pferrel <[email protected]>
Date: 2014-06-12T17:43:21Z
changing the name of drm.colCounts to drm.getNumNonZeroElements
commit 9655fd70f69ed97eb2d6765928a0a1f7dd760281
Author: pferrel <[email protected]>
Date: 2014-06-12T18:32:03Z
meant to say changing drm.colCounts to drm.numNonZeroElementsPerColumn
commit a2001375d46c5946b671f89f5a7cff2e6a094ea8
Author: pferrel <[email protected]>
Date: 2014-06-12T18:34:32Z
Merge branch 'master' into mahout-1464
commit 2db06b5566c8dcccb382733613b2fab6c223b5de
Author: pferrel <[email protected]>
Date: 2014-06-12T18:51:54Z
typo
commit 0b689b8b879c4ac03b71cf504a9d0d78ffa6bfa5
Author: pferrel <[email protected]>
Date: 2014-06-12T20:03:45Z
clean up test
commit 32afbe5e552ab94979dd545d14cda17ebc9c018e
Author: pferrel <[email protected]>
Date: 2014-06-12T23:42:08Z
one more fat finger error
commit b91e5e98c47829a5cc099289f83e99e6bf317dd6
Author: pferrel <[email protected]>
Date: 2014-06-13T16:18:33Z
did not account for negative values in the purely mathematical MatrixOps
and SparkEngine versions of numNonZeroElementsPerColumn, so fixed this and added
them to tests
commit 9f6fd902f95c7daf687ecb59698f78217dbf6b6b
Author: pferrel <[email protected]>
Date: 2014-06-13T16:43:46Z
merging master to run new tests
----
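The 'colCounts' commit in the log (later renamed to numNonZeroElementsPerColumn) says the per-column count fixes cooccurrence for non-boolean preference values. The underlying point: with 0/1 data, a column sum equals the number of users who touched an item, but with ratings it does not. A minimal illustration, assuming rows are users and columns are items; the data and the `ColumnCountVsSum` class are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration: column sums vs. per-column non-zero counts once
// preference values are not boolean. Rows are users, columns are items.
final class ColumnCountVsSum {

  // Sum of the raw values in each column.
  static double[] columnSums(double[][] rows, int numCols) {
    double[] sums = new double[numCols];
    for (double[] row : rows) {
      for (int c = 0; c < numCols; c++) {
        sums[c] += row[c];
      }
    }
    return sums;
  }

  // Replaces every non-zero value (positive or negative) with 1.0.
  static double[][] binarize(double[][] rows) {
    double[][] out = new double[rows.length][];
    for (int r = 0; r < rows.length; r++) {
      out[r] = new double[rows[r].length];
      for (int c = 0; c < rows[r].length; c++) {
        out[r][c] = rows[r][c] != 0.0 ? 1.0 : 0.0;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // 3 users x 2 items with 1-5 star ratings (0.0 means no rating).
    double[][] ratings = {
        {5.0, 0.0},
        {3.0, 1.0},
        {0.0, 4.0}
    };
    System.out.println(Arrays.toString(columnSums(ratings, 2)));           // [8.0, 5.0]
    System.out.println(Arrays.toString(columnSums(binarize(ratings), 2))); // [2.0, 2.0]
  }
}
```

Each item was rated by 2 users, which is the quantity a cooccurrence score needs; the raw sums (8.0 and 5.0) would overstate it.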
> Cooccurrence Analysis on Spark
> ------------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Assignee: Pat Ferrel
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with the Mahout Spark DRM DSL so a
> DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422. Cross-cooccurrence has
> several applications, including cross-action recommendations.
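The issue asks for cooccurrence analysis scored with LLR. For a pair of items A and B, the score is computed from a 2x2 contingency table built from the cooccurrence count and the per-column non-zero counts, which is why numNonZeroElementsPerColumn matters. The sketch below follows the entropy form that Mahout's `LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22)` uses, as far as I know; the class name, `score` helper, and example counts are hypothetical. For cross-cooccurrence, A and B would come from different action matrices, but the scoring is the same.

```java
// Hypothetical sketch of a 2x2 log-likelihood ratio (G^2) score, modeled on
// the entropy formulation in Mahout's LogLikelihood class; names illustrative.
final class LlrSketch {

  private static double xLogX(long x) {
    return x == 0 ? 0.0 : x * Math.log(x);
  }

  // Unnormalized Shannon entropy: xLogX(sum) minus the sum of xLogX(element).
  private static double entropy(long... elements) {
    long sum = 0;
    double result = 0.0;
    for (long e : elements) {
      result += xLogX(e);
      sum += e;
    }
    return xLogX(sum) - result;
  }

  // k11: both events, k12: second without first, k21: first without second,
  // k22: neither.
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    if (rowEntropy + columnEntropy < matrixEntropy) {
      return 0.0; // guard against floating-point round-off
    }
    return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
  }

  // Builds the table from: users with both A and B (cooccurrence count),
  // users with A and users with B (per-column non-zero counts), total users.
  static double score(long both, long countA, long countB, long numUsers) {
    long k11 = both;
    long k12 = countB - both;                      // B without A
    long k21 = countA - both;                      // A without B
    long k22 = numUsers - countA - countB + both;  // neither
    return logLikelihoodRatio(k11, k12, k21, k22);
  }

  public static void main(String[] args) {
    // 20 users; A and B are each touched by the same 10 users.
    System.out.println(score(10, 10, 10, 20)); // about 27.73 (40 ln 2)
    // One user per cell: no association, score is (numerically) about 0.
    System.out.println(score(1, 2, 2, 4));
  }
}
```

If the counts fed into `score` were column sums of ratings instead of non-zero counts, `k12`, `k21`, and `k22` would be wrong (possibly negative), which is the failure the colCounts change addresses.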
--
This message was sent by Atlassian JIRA
(v6.2#6252)