[GitHub] spark pull request #21695: Maintining an order

nagpall Mon, 02 Jul 2018 02:07:05 -0700

GitHub user nagpall opened a pull request:

    https://github.com/apache/spark/pull/21695


    Maintining an order

    ## What is the problem?
    In both IndexedRowMatrix.computeSVD and IndexedRowMatrix.multiply, the row 
indices are dropped before the corresponding RowMatrix methods are called.
    For IndexedRowMatrix.multiply, I have observed that ordering within a 
partition is preserved, but that it appears to get mixed up between partitions. 
For example, for:
    
    part1Index1 part1Vector1
    part1Index2 part1Vector2
    part2Index1 part2Vector1
    part2Index2 part2Vector2
    
    I got:
    
    part2Index1 part1Vector1
    part2Index2 part1Vector2
    part1Index1 part2Vector1
    part1Index2 part2Vector2
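
The mix-up above can be reproduced with a small, purely illustrative model in plain Python. It is not Spark's actual execution path. It assumes that the indices and the vectors are separated and later re-paired by position, and that the two sides are traversed in a different partition order. All names here (`partitions`, `index_stream`, `vector_stream`, `mixed`) are hypothetical.

```python
# Hypothetical model: each partition is a list of (index, vector) pairs.
partitions = [
    [("part1Index1", "part1Vector1"), ("part1Index2", "part1Vector2")],
    [("part2Index1", "part2Vector1"), ("part2Index2", "part2Vector2")],
]

# Step 1: split indices from vectors, as happens when the indices are dropped.
indices = [[index for index, _ in part] for part in partitions]
vectors = [[vector for _, vector in part] for part in partitions]

# Step 2 (assumption): the two sides are read back in different partition
# order. Order inside each partition is untouched.
index_stream = [i for part in reversed(indices) for i in part]
vector_stream = [v for part in vectors for v in part]

# Step 3: re-pair by position. Nothing ties an index to its own vector anymore.
mixed = list(zip(index_stream, vector_stream))

for index, vector in mixed:
    print(index, vector)
```

Within each partition the relative order survives, but every index ends up attached to a vector from the other partition, which is the pattern reported above.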
    
    You can find more details here:
    https://issues.apache.org/jira/browse/SPARK-8614
    
    ## What changes were proposed in this pull request?
    Instead of converting the IndexedRowMatrix to a RowMatrix and losing the 
indices, we keep it as an IndexedRowMatrix: each row's index and vector are 
extracted, the vector is multiplied by the matrix, and the result is placed 
back at the same index.
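
The idea can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in this patch and not Spark's API: rows are modeled as `(index, vector)` tuples grouped into partitions, and the helper names `multiply_row` and `multiply_indexed` are hypothetical.

```python
from typing import List, Tuple

IndexedRow = Tuple[int, List[float]]  # (row index, row vector)


def multiply_row(row: List[float], matrix: List[List[float]]) -> List[float]:
    """Multiply a 1 x n row vector by an n x k matrix given as a list of rows."""
    num_cols = len(matrix[0])
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(num_cols)
    ]


def multiply_indexed(
    partitions: List[List[IndexedRow]], matrix: List[List[float]]
) -> List[List[IndexedRow]]:
    """Multiply every row by `matrix`, carrying each index along with its row."""
    return [
        [(index, multiply_row(vector, matrix)) for index, vector in part]
        for part in partitions
    ]


partitions = [
    [(0, [1.0, 0.0]), (1, [0.0, 1.0])],
    [(2, [2.0, 0.0]), (3, [0.0, 2.0])],
]
matrix = [[1.0, 2.0], [3.0, 4.0]]

# Process the partitions in reverse to mimic an arbitrary partition order.
result = multiply_indexed(list(reversed(partitions)), matrix)

# Because the index never leaves its row, a lookup by index is still correct.
by_index = {index: vector for part in result for index, vector in part}
print(by_index[0], by_index[3])
```

Since the index travels with its row through the multiplication, the order in which partitions are processed no longer affects which index a result belongs to.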
    
    ## How was this patch tested?
    With these changes, all unit tests pass for the mllib module.
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nagpall/spark patch-spark-8614

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21695
    
----
commit d833d1e2020dd45e063aeb56f7649f766a4a1635
Author: Anuj Nagpal <ajnagpalmnit@...>
Date:   2018-07-02T08:57:12Z

    Maintining an order

----

