Daniel Li created SPARK-20486:
---------------------------------

             Summary: Encapsulate ALS in-block and out-block data structures and methods into a separate class
                 Key: SPARK-20486
                 URL: https://issues.apache.org/jira/browse/SPARK-20486
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
    Affects Versions: 2.1.0
            Reporter: Daniel Li
            Priority: Trivial


The in-block and out-block data structures in the ALS code are currently 
computed within the {{ALS.train}} method itself.  I propose moving this code, 
along with its helper functions, into a separate class that encapsulates the 
creation of the blocks.  This has the added benefit of letting us attach 
comprehensive Scaladoc to the new class, explaining in detail how this core 
part of the algorithm works.

Proposal:

{code}
private[recommendation] final case class RatingBlocks[ID](
  userIn: RDD[(Int, InBlock[ID])],
  userOut: RDD[(Int, OutBlock)],
  itemIn: RDD[(Int, InBlock[ID])],
  itemOut: RDD[(Int, OutBlock)]
)

private[recommendation] object RatingBlocks {
  def create[ID: ClassTag: Ordering](
      ratings: RDD[Rating[ID]],
      numUserBlocks: Int,
      numItemBlocks: Int,
      storageLevel: StorageLevel = StorageLevel.MEMORY_AND_DISK): RatingBlocks[ID] = {
    // In-block and out-block code currently in `ALS.train` goes here
  }

  private[this] def partitionRatings[ID: ClassTag](...) = { ... }

  private[this] def makeBlocks[ID: ClassTag](...) = { ... }
}
{code}
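
For readers unfamiliar with the blocks, here is a rough, Spark-free sketch of the idea as I understand it; it is not the code in {{ALS.train}}. Local collections stand in for RDDs, and every name ({{BlockSketch}}, {{blockOf}}, {{inBlocks}}, {{outBlocks}}) is hypothetical. It assumes {{Int}} IDs and a simple modulo partitioner, and it keeps raw IDs where the real implementation uses compressed local indices.

```scala
// Plain-Scala sketch (no Spark): local collections stand in for RDDs.
// All names here are hypothetical and only illustrate the idea.
object BlockSketch {
  final case class Rating(user: Int, item: Int, rating: Float)

  // Assumed partitioner: non-negative modulo of the ID.
  def blockOf(id: Int, numBlocks: Int): Int = {
    val m = id % numBlocks
    if (m < 0) m + numBlocks else m
  }

  // "In-block" of a source block: the ratings whose source ID falls in that
  // block, grouped by source ID.
  def inBlocks(
      ratings: Seq[Rating],
      numSrcBlocks: Int,
      src: Rating => Int): Map[Int, Map[Int, Seq[Rating]]] =
    ratings
      .groupBy(r => blockOf(src(r), numSrcBlocks))
      .map { case (block, rs) => block -> rs.groupBy(src) }

  // "Out-block" of a source block: for each destination block, the sorted
  // source IDs whose factors that destination block needs.
  def outBlocks(
      ratings: Seq[Rating],
      numSrcBlocks: Int,
      numDstBlocks: Int,
      src: Rating => Int,
      dst: Rating => Int): Map[Int, Map[Int, Seq[Int]]] =
    ratings
      .groupBy(r => blockOf(src(r), numSrcBlocks))
      .map { case (srcBlock, rs) =>
        srcBlock -> rs
          .groupBy(r => blockOf(dst(r), numDstBlocks))
          .map { case (dstBlock, rs2) =>
            dstBlock -> rs2.map(src).distinct.sorted
          }
      }
}
```

Calling {{outBlocks(ratings, numUserBlocks, numItemBlocks, _.user, _.item)}} gives the user out-blocks; swapping the two accessors and the two block counts gives the item side, which mirrors the four fields of the proposed {{RatingBlocks}}.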


