[GitHub] spark issue #21195: [Spark-23975][ML] Add support of array input for all clu...

MrBago Wed, 02 May 2018 15:28:35 -0700

Github user MrBago commented on the issue:

    https://github.com/apache/spark/pull/21195
  
    Thanks Lu!
    
    I had a pass over this PR and it looks pretty straightforward. One thing I 
noticed is that there are two patterns that we keep repeating. I think we 
should add private APIs for these patterns and delegate to those.
    
    The first pattern is the schema-validation method defined in terms of 
typeCandidates. I suggest we add something like 
`validateVectorCompatibleColumn` to `DatasetUtils`. Besides helping with code 
reuse, this API would give us one place to change if we ever decide, for 
example, to support Arrays of Ints.
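
    As a rough sketch of the kind of helper I have in mind (the object name 
`DatasetUtilsSketch`, the exact list of candidate types, and the error text 
are all placeholders I am assuming here, not existing Spark APIs):

```scala
import org.apache.spark.ml.linalg.SQLDataTypes.VectorType
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType, FloatType, StructType}

// Hypothetical stand-in for DatasetUtils; not part of Spark as written here.
object DatasetUtilsSketch {

  // Assumed set of "vector compatible" types. Supporting Arrays of Ints later
  // would mean adding one entry here. Because this is an equality check,
  // array columns declared with containsNull = true are rejected.
  private val typeCandidates: Seq[DataType] = Seq(
    VectorType,
    ArrayType(DoubleType, containsNull = false),
    ArrayType(FloatType, containsNull = false))

  // Fails with IllegalArgumentException if the column is missing or its type
  // is not one of the candidates above.
  def validateVectorCompatibleColumn(schema: StructType, colName: String): Unit = {
    val actual = schema(colName).dataType
    val expected = typeCandidates.map(_.catalogString).mkString("[", ", ", "]")
    require(
      typeCandidates.contains(actual),
      s"Column $colName must be of one of the types $expected " +
        s"but was actually of type ${actual.catalogString}.")
  }
}
```

    Each estimator's schema validation could then delegate to this one method 
instead of repeating the typeCandidates check.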
    
    The second pattern is going from a DataFrame and a column name to an 
`RDD[OldVector]`. Let's add a method that does this, maybe with a signature 
like `(DataFrame, String) => RDD[OldVector]`.
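
    A minimal sketch of such a method, assuming the column holds either ML 
vectors or arrays of doubles/floats (the names `ColumnToOldVectorSketch` and 
`columnToOldVector` are hypothetical, and the error handling is only 
illustrative):

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical helper object; not an existing Spark API.
object ColumnToOldVectorSketch {

  // (DataFrame, String) => RDD[OldVector]
  def columnToOldVector(df: DataFrame, colName: String): RDD[OldVector] = {
    df.select(colName).rdd.map {
      // ML vector column: convert to the old mllib vector type.
      case Row(v: Vector) =>
        OldVectors.fromML(v)
      // Array column: elements arrive boxed, so widen floats to doubles.
      case Row(values: scala.collection.Seq[_]) =>
        val doubles = values.map {
          case d: Double => d
          case f: Float => f.toDouble
          case other =>
            throw new IllegalArgumentException(s"Unsupported array element: $other")
        }
        OldVectors.dense(doubles.toArray)
      case other =>
        throw new IllegalArgumentException(s"Unsupported value in column $colName: $other")
    }
  }
}
```

    Callers would then write something like 
`ColumnToOldVectorSketch.columnToOldVector(dataset.toDF(), "features")` 
rather than repeating the select-and-map logic in each algorithm.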


