[ 
https://issues.apache.org/jira/browse/MAHOUT-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189151#comment-16189151
 ] 

ASF GitHub Bot commented on MAHOUT-2019:
----------------------------------------

GitHub user pferrel opened a pull request:

    https://github.com/apache/mahout/pull/342

    MAHOUT-2019 Sparse speedup

    ### Purpose of PR:
    To review an apparent speedup of spark-itemsimilarity and the underlying
    SimilarityAnalysis.cooccurrence, obtained by using iterateNonZero instead
    of the previous for loops in SparseRowMatrix.
    
    For discussion only at present
    
    MAHOUT-2019
    
https://issues.apache.org/jira/projects/MAHOUT/issues/MAHOUT-2019?filter=allopenissues&orderby=priority+DESC%2C+updated+DESC
    
    ### Important ToDos
    Please mark each with an "x"
    - [x] A JIRA ticket exists (if not, please create this 
first)[https://issues.apache.org/jira/browse/ZEPPELIN/]
    - [x] Title of PR is "MAHOUT-XXXX Brief Description of Changes" where XXXX 
is the JIRA number.
    - [ ] Created unit tests where appropriate
    - [ ] Added licenses correct on newly added files
    - [ ] Assigned JIRA to self
    - [ ] Added documentation in scala docs/java docs, and to website
    - [ ] Successfully built and ran all unit tests, verified that all tests 
pass locally.
    
    If all of these things aren't complete, but you still feel it is
    appropriate to open a PR, please add [WIP] after MAHOUT-XXXX before the
    descriptions- e.g. "MAHOUT-XXXX [WIP] Description of Change"
    
    Does this change break earlier versions?
    
    Is this the beginning of a larger project for which a feature branch should 
be made?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pferrel/mahout sparse-speedup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #342
    
----
commit 26a2efa65e9f09df358e1021ebf45e3735e2ec6c
Author: pferrel <p...@occamsmachete.com>
Date:   2017-10-02T18:39:54Z

    minimum speedup fix

commit 9330a2ed6d1211459c57863a5d664377c55aa747
Author: pferrel <p...@occamsmachete.com>
Date:   2017-10-02T19:27:47Z

    minimum speedup fix with cast exception check

commit 722bd11f01e7250f99f21f17ec7211bf5abb2089
Author: pferrel <p...@occamsmachete.com>
Date:   2017-10-02T20:33:07Z

    added cast exception logging to SparseRowMatrix

commit 02700ef13c44e403cba58288dcbab5cfabed8585
Author: pferrel <p...@occamsmachete.com>
Date:   2017-10-02T20:35:14Z

    Merge branch 'master' into sparse-speedup

----


> SparseRowMatrix assign ops use for loops instead of iterateNonZero and so 
> can be optimized
> -------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-2019
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2019
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.13.0
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>             Fix For: 0.13.1
>
>
> DRMs get blockified into SparseRowMatrix instances if the density is low. But 
> SRM inherits the implementation of methods like "assign" from AbstractMatrix, 
> which uses nested for loops to traverse rows. For multiplying 2 matrices that 
> are extremely sparse, the kind of data you see in collaborative filtering, 
> this is extremely wasteful of execution time. Better to use a sparse vector's 
> iterateNonZero Iterator for some function types.
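
The difference between the two traversals described above can be sketched roughly as follows. This is not Mahout code: SparseAssignSketch, SparseRow, assignDense and assignNonZero are hypothetical names, and the sparse row is a plain TreeMap rather than a Mahout Vector. The sketch also assumes that "some function types" means functions with f(0) == 0, since only those leave the unstored zero cells unchanged.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration only; none of these types are Mahout's.
final class SparseAssignSketch {

    /** Stand-in for a sparse row: column index -> stored non-zero value. */
    static final class SparseRow {
        final int cardinality;
        final TreeMap<Integer, Double> entries = new TreeMap<>();

        SparseRow(int cardinality) { this.cardinality = cardinality; }

        double get(int col) { return entries.getOrDefault(col, 0.0); }

        void set(int col, double value) {
            if (value == 0.0) entries.remove(col); else entries.put(col, value);
        }
    }

    /** Nested for loops: touches every cell. Returns the number of cells visited. */
    static long assignDense(SparseRow[] rows, DoubleUnaryOperator f) {
        long visited = 0;
        for (SparseRow row : rows) {
            for (int col = 0; col < row.cardinality; col++) {
                row.set(col, f.applyAsDouble(row.get(col)));
                visited++;
            }
        }
        return visited;
    }

    /** Non-zero iteration: touches stored entries only. Requires f(0) == 0. */
    static long assignNonZero(SparseRow[] rows, DoubleUnaryOperator f) {
        if (f.applyAsDouble(0.0) != 0.0) {
            throw new IllegalArgumentException("f(0) must be 0 to skip zero cells");
        }
        long visited = 0;
        for (SparseRow row : rows) {
            Iterator<Map.Entry<Integer, Double>> it = row.entries.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Integer, Double> e = it.next();
                double v = f.applyAsDouble(e.getValue());
                // Drop entries that became zero so the row stays sparse.
                if (v == 0.0) it.remove(); else e.setValue(v);
                visited++;
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        SparseRow[] rows = { new SparseRow(1000), new SparseRow(1000) };
        rows[0].set(3, 1.5);
        rows[0].set(500, 2.0);
        rows[1].set(42, 4.0);

        System.out.println("sparse visits: " + assignNonZero(rows, x -> x * 2));
        System.out.println("dense visits:  " + assignDense(rows, x -> x * 2));
    }
}
```

With very sparse rows, the dense loop does work proportional to rows times columns, while the non-zero loop does work proportional to the number of stored entries. A function such as x -> x + 1 would be rejected by assignNonZero here and would still need the dense path.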



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
