GitHub user danielyli opened a pull request:

    https://github.com/apache/spark/pull/17767

    Als refactor

    ## What changes were proposed in this pull request?
    
    This is a non-feature-changing refactoring of the ALS code (specifically, 
the `org.apache.spark.ml.recommendation` package), done to improve code 
maintainability and to add significant documentation to the existing code.  My 
motivation for this PR is that I've been working on an online streaming ALS 
implementation [[SPARK-6407](https://issues.apache.org/jira/browse/SPARK-6407)] 
(PR coming soon), and I've been refactoring the package to help me understand 
the existing code before adding to it.  I've also tried my best to include a 
fair bit of Scaladocs and inline comments where I felt they would have helped 
when I was reading the code.
    
    I've done a fair bit of rebasing and sausage making to make the commits 
easy to follow, since no one likes to stare at a 2,700-line PR.  Please let me 
know if I can make anything clearer.  I'd be happy to answer any questions.
    
    In a few places, you'll find a `PLEASE_ADVISE(danielyli):` tag in the code. 
 These are questions I had in the course of the refactoring.  I'd appreciate if 
the relevant folks could help me with these.  Thanks.
    
    ## How was this patch tested?
    
    As this is a non-feature-changing refactoring, existing tests were used.  
All existing ALS tests pass.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/danielyli/spark als-refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17767
    
----
commit deca4db3f234ea60c1494265d4f3ac9375869dd6
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-05T21:55:21Z

    Split `ALS.scala` into multiple files
    
    This commit moves the classes `ALS` and `ALSModel` and the traits
    `ALSParams` and `ALSModelParams` into their own files.

commit 4086bc9d0c7689e0d2047ac17ada29fe236eb6e6
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-05T22:20:32Z

    Move solver classes into their own file
    
    This commit puts the classes `LeastSquaresNESolver`, `CholeskySolver`,
    `NNLSSolver`, and `NormalEquation` into a mixin in a separate file in
    order to reduce the size and improve the readability of `ALS.scala`.

commit 8aaa533df6f3c9a4b4e8c5d5023f831daf06fa9e
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-05T22:30:38Z

    Minor cleanup of imports
    
      *  import java.util.Arrays
      *  import scala.collection.mutable.ArrayBuilder

commit b68680025e71ebd422087ef95d5ecb7af40fa26d
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-05T22:48:50Z

    Create a package object to hold small type and class definitions

commit 83f849ee45fd7c80a1a50fcf12da1eb99d8b6346
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T02:17:14Z

    Refactor `RatingBlock`-related code
    
    This commit moves the following classes and methods into new files,
    separating and encapsulating them as appropriate:
    
      *  RatingBlock
      *  RatingBlockBuilder
      *  UncompressedInBlock
      *  UncompressedInBlockBuilder
      *  KeyWrapper
      *  UncompressedInBlockSort
      *  LocalIndexEncoder
      *  partitionRatings
      *  makeBlocks
    
    In the course of this refactoring we create a new class, `RatingBlocks`,
    to hold the user/item in/out block data and associated logic.

commit 819a00f7fe7384e588ce78cb65e9413ac6588401
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T07:08:43Z

    Pull out `RatingBlock` from `RatingBlocks` into its own file
    
    This commit puts the `RatingBlock` class into a mixin for the
    `RatingBlocks` companion object to extend.  This is done purely to
    increase readability by reducing the file size of `RatingBlocks.scala`.

commit 56d10ba1fa627f343e67525e2a3b08e7287bfe2f
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T08:50:06Z

    Tighten access modifiers where appropriate and make case classes `final`

commit b861d18784ba4ce688d3eacaea10169c9ce2d091
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T09:38:54Z

    Improve code hygiene of `RatingBlocks`
    
    Among other things, `while` loops that used manually incremented
    counters have been changed to `for` loops to increase readability.
    Performance should be nominally affected.

commit 5dfee79a1280d0a72bbe7b8596cdf86654fa0fbc
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T09:57:11Z

    Spruce up `ALS#fit`
    
    This commit adds vertical whitespace to improve readability.

commit 056d6d0ecc962f94c83f43a6384607bf8833d083
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-25T23:31:54Z

    Mark `RatingBlocks` constructor as `private`

commit 34df11247ec3fdcf29e220bedcdf28b58d1ac4ec
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-25T23:32:44Z

    Add Scaladocs to `RatingBlocks.scala`

commit 31b0dcd843d8edc16c6d2bf982d3de753b6dc066
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T09:59:24Z

    Spruce up `ALS.train` and `ALS.initFactors`
    
      *  Add descriptive Scaladoc giving high-level overview of `ALS.train`
         algorithm
    
      *  Refactor out duplicated code into variables
    
      *  Add comments to clarify code where appropriate
    
      *  Change `while` loops that used manually incremented counters to
         `for` loops to increase readability.  Performance should be
         nominally affected.
    
      *  Clarify names of variables where appropriate
    
      *  Add vertical whitespace to improve readability

commit 42243fef420de6d7db28194f0dc804284182dc64
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-06T10:11:46Z

    Spruce up `ALS.computeFactors`
    
      *  Refactor out duplicated code into variables
    
      *  Change `while` loops that used manually incremented counters to
         `for` loops to increase readability.  Performance should be
         nominally affected.
    
      *  Clarify names of variables where appropriate
    
      *  Add vertical whitespace to improve readability

commit ac01f7c19513a35d838df1db11c91a0db6abb00f
Author: Daniel Li <d...@danielyli.com>
Date:   2017-04-25T08:02:40Z

    Update Scaladoc for `ALS` class

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to