Not sure how this relates to the PR. If you look here you can see all the PR files and diffs from master. Comments can be attached to the files in question. https://github.com/apache/mahout/pull/12/files
iterateNonZero is not in question afaik, and is used in a couple places. If someone wants to write an alternative I’ll be happy to change things.

On Jun 12, 2014, at 10:06 AM, Sebastian Schelter <s...@apache.org> wrote:

Ok, but the current implementation still gives the correct number, as it checks for accidental zeros.

I think we should add some custom implementations here to not have to go through the non-zeroes iterator.

--sebastian

On 06/12/2014 07:00 PM, Ted Dunning wrote:
> The reason is that sparse implementations may have recorded a non-zero that
> later got assigned a zero, but they didn't bother to remove the memory cell.
>
> On Thu, Jun 12, 2014 at 9:50 AM, Sebastian Schelter <s...@apache.org> wrote:
>
>> I'm a bit lost in this discussion. Why do we assume that
>> getNumNonZeroElements() on a Vector only returns an upper bound? The code
>> in AbstractVector clearly returns the non-zeros only:
>>
>> int count = 0;
>> Iterator<Element> it = iterateNonZero();
>> while (it.hasNext()) {
>>   if (it.next().get() != 0.0) {
>>     count++;
>>   }
>> }
>> return count;
>>
>> On the other hand, the internal code seems broken here, why does
>> iterateNonZero potentially return 0's?
>>
>> --sebastian
>>
>> On 06/12/2014 06:38 PM, ASF GitHub Bot (JIRA) wrote:
>>
>>> [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029345#comment-14029345 ]
>>>
>>> ASF GitHub Bot commented on MAHOUT-1464:
>>> ----------------------------------------
>>>
>>> Github user dlyubimov commented on the pull request:
>>>
>>> https://github.com/apache/mahout/pull/12#issuecomment-45915940
>>>
>>> fix header to say MAHOUT-1464, then hit close and reopen, it will
>>> restart the echo.
>>>
>>> Cooccurrence Analysis on Spark
>>>> ------------------------------
>>>>
>>>> Key: MAHOUT-1464
>>>> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>>>> Project: Mahout
>>>> Issue Type: Improvement
>>>> Components: Collaborative Filtering
>>>> Environment: hadoop, spark
>>>> Reporter: Pat Ferrel
>>>> Assignee: Pat Ferrel
>>>> Fix For: 1.0
>>>>
>>>> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
>>>> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
>>>> run-spark-xrsj.sh
>>>>
>>>>
>>>> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
>>>> that runs on Spark. This should be compatible with Mahout Spark DRM DSL so
>>>> a DRM can be used as input.
>>>> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
>>>> has several applications including cross-action recommendations.
>>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.2#6252)
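
For reference, the point made in the thread above (a sparse vector can keep a memory cell for an entry that was later assigned zero, so counting stored cells gives only an upper bound, while checking each value gives the exact count) can be sketched in a few lines of Java. This is a toy illustration only: the class SparseVectorSketch and its methods set, storedCells and numNonZeroElements are hypothetical names made up for this example, and the HashMap-backed storage is an assumption, not Mahout's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sparse vector; a hypothetical illustration, not Mahout's implementation. */
public class SparseVectorSketch {
    // index -> value; only cells that were ever assigned are stored
    private final Map<Integer, Double> cells = new HashMap<>();

    /** Assigns a value. Assigning 0.0 overwrites the value but keeps the cell. */
    public void set(int index, double value) {
        cells.put(index, value);
    }

    /** Number of stored cells: only an upper bound on the non-zero count. */
    public int storedCells() {
        return cells.size();
    }

    /** Exact count: walks the stored cells and skips accidental zeros. */
    public int numNonZeroElements() {
        int count = 0;
        for (double value : cells.values()) {
            if (value != 0.0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SparseVectorSketch v = new SparseVectorSketch();
        v.set(3, 1.5);
        v.set(7, 2.0);
        v.set(7, 0.0); // cell 7 stays allocated but now holds zero
        System.out.println(v.storedCells());        // prints 2
        System.out.println(v.numNonZeroElements()); // prints 1
    }
}
```

The exact count has to visit every stored cell, which is the cost a custom per-implementation version would presumably try to avoid.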