Hi all,
I am trying to use the new ALS implementation in
org.apache.spark.ml.recommendation.ALS.
The method to invoke for training seems to be
override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel.
How do I create a DataFrame from a ratings data set stored on HDFS?
By contrast, the training method in the old ALS implementation in
org.apache.spark.mllib.recommendation.ALS was:
def train(
    ratings: RDD[Rating],
    rank: Int,
    iterations: Int,
    lambda: Double,
    blocks: Int,
    seed: Long
): MatrixFactorizationModel
My code for calling the old ALS.train method is:
val sc = new SparkContext(conf)
val pfile = args(0)
val purchase = sc.textFile(pfile)
val ratings = purchase.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toInt)
})
val model = ALS.train(ratings, rank, numIterations, 0.01)
Now, for the new ALS fit method, I am trying to run the code below,
but I get a compilation error:
val als = new ALS()
  .setRank(rank)
  .setRegParam(regParam)
  .setImplicitPrefs(implicitPrefs)
  .setNumUserBlocks(numUserBlocks)
  .setNumItemBlocks(numItemBlocks)
val sc = new SparkContext(conf)
val pfile = args(0)
val purchase = sc.textFile(pfile)
val ratings = purchase.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toInt)
})
val model = als.fit(ratings.toDF())
I get an error that the method toDF() is not a member of
org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
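A minimal sketch, assuming Spark 1.3's SQLContext API: in that version toDF() is not a method of RDD itself but comes from an implicit conversion that is only in scope after `import sqlContext.implicits._`, so the missing import is a likely cause of the error above. The names PurchaseRating and PurchaseALS and the rank/regParam values are hypothetical; the case class fields are named user, item and rating to match the ml ALS default column names, and the class is defined at the top level rather than inside main so that Spark SQL can infer its schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.recommendation.ALS

// Hypothetical case class; its field names match the ml ALS default
// column names "user", "item" and "rating". Defined at the top level
// so Spark SQL can derive a DataFrame schema from it.
case class PurchaseRating(user: Int, item: Int, rating: Float)

object PurchaseRating {
  // Parses one "user,item,rating" line. A line that does not split
  // into exactly three comma-separated fields throws a MatchError.
  def parse(line: String): PurchaseRating = {
    val Array(user, item, rate) = line.split(',')
    PurchaseRating(user.trim.toInt, item.trim.toInt, rate.trim.toFloat)
  }
}

// Hypothetical driver object.
object PurchaseALS {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PurchaseALS"))
    val sqlContext = new SQLContext(sc)
    // Brings the implicit RDD-to-DataFrame conversion (toDF) into scope.
    import sqlContext.implicits._

    // args(0) is the ratings file path, e.g. on HDFS.
    val ratings = sc.textFile(args(0)).map(PurchaseRating.parse).toDF()

    val als = new ALS()
      .setRank(10)       // placeholder value
      .setRegParam(0.01) // placeholder value
    val model = als.fit(ratings) // an ALSModel

    sc.stop()
  }
}
```

If you would rather keep the old org.apache.spark.mllib.recommendation.Rating, note that its fields are user, product and rating, so toDF() yields a column named product; calling setItemCol("product") on the ALS instance should point it at that column.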
I'd appreciate any help!
Thanks,
Jay
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DataFrame-for-using-ALS-under-org-apache-spark-ml-recommendation-ALS-tp22083.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.