[ https://issues.apache.org/jira/browse/SPARK-15027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265238#comment-15265238 ]
Xiangrui Meng commented on SPARK-15027:
---------------------------------------

It might be tricky to use Dataset because of encoders and generic ID types. But if we use DataFrame as input and output, it seems feasible. It would be great if you could take a look.

> ALS.train should use DataFrame instead of RDD
> ---------------------------------------------
>
>                 Key: SPARK-15027
>                 URL: https://issues.apache.org/jira/browse/SPARK-15027
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, PySpark
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> We should also update `ALS.train` to use `Dataset/DataFrame` instead of `RDD`
> to be consistent with other APIs under spark.ml, and doing so also leaves room
> for Tungsten-based optimization.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)