Try this:

import sqlContext.implicits._  // needed for toDF(); sqlContext is a SQLContext created from sc

val ratings = purchase.map { line =>
  line.split(',') match {
    case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
  }
}.toDF("user", "item", "rate")

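For completeness, here is a minimal end-to-end sketch of the same idea, assuming Spark 1.3 with a plain SQLContext. The object name AlsFromTextFile, the parseRating helper, and the rank value are made up for illustration; the 0.01 regularization value is borrowed from the old ALS.train call in the question. Because the DataFrame column is named "rate" while ALS looks for a "rating" column by default, the sketch calls setRatingCol("rate").

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; the name is an assumption for illustration.
object AlsFromTextFile {

  // Hypothetical helper: parses one "user,item,rate" line into a tuple.
  // A line with a different number of fields throws a MatchError.
  def parseRating(line: String): (Int, Int, Float) =
    line.split(',') match {
      case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AlsFromTextFile")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Brings the implicit RDD-to-DataFrame conversion (toDF) into scope.
    import sqlContext.implicits._

    val purchase = sc.textFile(args(0))
    val ratings = purchase.map(parseRating).toDF("user", "item", "rate")

    val als = new ALS()
      .setRank(10)           // illustrative value, not from the thread
      .setRegParam(0.01)     // same value as lambda in the old ALS.train call
      .setRatingCol("rate")  // the default rating column name is "rating"

    val model = als.fit(ratings)
    sc.stop()
  }
}
```

Renaming the third column to "rating" in toDF would make the setRatingCol call unnecessary.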
Doc for DataFrames: http://spark.apache.org/docs/latest/sql-programming-guide.html

-Xiangrui

On Mon, Mar 16, 2015 at 9:08 AM, jaykatukuri <jkatuk...@apple.com> wrote:
> Hi all,
> I am trying to use the new ALS implementation under
> org.apache.spark.ml.recommendation.ALS.
>
> The new method to invoke for training seems to be
>
>   override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel
>
> How do I create a DataFrame from a ratings data set that is on HDFS?
>
> The method in the old ALS implementation under
> org.apache.spark.mllib.recommendation.ALS was
>
>   def train(
>       ratings: RDD[Rating],
>       rank: Int,
>       iterations: Int,
>       lambda: Double,
>       blocks: Int,
>       seed: Long
>   ): MatrixFactorizationModel
>
> My code to run the old ALS train method is as below:
>
>   val sc = new SparkContext(conf)
>
>   val pfile = args(0)
>   val purchase = sc.textFile(pfile)
>   val ratings = purchase.map(_.split(',') match {
>     case Array(user, item, rate) =>
>       Rating(user.toInt, item.toInt, rate.toInt)
>   })
>
>   val model = ALS.train(ratings, rank, numIterations, 0.01)
>
> Now, for the new ALS fit method, I am trying to use the code below,
> but I am getting a compilation error:
>
>   val als = new ALS()
>     .setRank(rank)
>     .setRegParam(regParam)
>     .setImplicitPrefs(implicitPrefs)
>     .setNumUserBlocks(numUserBlocks)
>     .setNumItemBlocks(numItemBlocks)
>
>   val sc = new SparkContext(conf)
>
>   val pfile = args(0)
>   val purchase = sc.textFile(pfile)
>   val ratings = purchase.map(_.split(',') match {
>     case Array(user, item, rate) =>
>       Rating(user.toInt, item.toInt, rate.toInt)
>   })
>
>   val model = als.fit(ratings.toDF())
>
> I get an error that the method toDF() is not a member of
> org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
>
> Appreciate the help!
>
> Thanks,
> Jay
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DataFrame-for-using-ALS-under-org-apache-spark-ml-recommendation-ALS-tp22083.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org