Re: RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS

Xiangrui Meng Mon, 16 Mar 2015 11:39:01 -0700

Try this:

// toDF needs the SQLContext implicits in scope:
//   import sqlContext.implicits._
val ratings = purchase.map { line =>
  line.split(',') match { case Array(user, item, rate) =>
    (user.toInt, item.toInt, rate.toFloat)
  }
}.toDF("user", "item", "rate")
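
The pattern match above throws a MatchError on any line that does not split into exactly three fields, and toInt/toFloat throw on non-numeric text. If the file on HDFS might contain such lines, a small parser along these lines could be used instead. This is only a sketch: parseRating is a hypothetical helper name, not part of Spark, and it assumes the same "user,item,rate" layout as the question.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark): parse one "user,item,rate" line.
// Returns None for a wrong field count or for non-numeric fields.
def parseRating(line: String): Option[(Int, Int, Float)] =
  line.split(',') match {
    case Array(user, item, rate) =>
      Try((user.trim.toInt, item.trim.toInt, rate.trim.toFloat)).toOption
    case _ => None
  }

// With an RDD[String] named purchase, flatMap drops the unparseable lines:
//   val parsed = purchase.flatMap(line => parseRating(line))
```

Dropping bad lines silently is a design choice; counting or logging them may be preferable for real data.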


Doc for DataFrames:
http://spark.apache.org/docs/latest/sql-programming-guide.html
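
For completeness, here is a rough end-to-end sketch of how the pieces might fit together, assuming the Spark 1.3-era API. The object name AlsDataFrameSketch and the train signature are made up for illustration; path, rank and regParam are placeholders supplied by the caller. Two things worth noting: toDF only becomes available after importing the SQLContext implicits, and since the column is named "rate" here, the estimator has to be pointed at it with setRatingCol (it looks for "rating" by default, as far as I know).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.sql.SQLContext

// Illustrative wrapper only; the names here are not from the original thread.
object AlsDataFrameSketch {
  def train(sc: SparkContext, path: String, rank: Int, regParam: Double): ALSModel = {
    val sqlContext = new SQLContext(sc)
    // Brings toDF into scope for RDDs of tuples and case classes.
    import sqlContext.implicits._

    // Parse "user,item,rate" lines and name the resulting columns.
    val ratings = sc.textFile(path).map { line =>
      line.split(',') match { case Array(user, item, rate) =>
        (user.toInt, item.toInt, rate.toFloat)
      }
    }.toDF("user", "item", "rate")

    val als = new ALS()
      .setRank(rank)
      .setRegParam(regParam)
      .setRatingCol("rate") // column is "rate" here, not the default name

    als.fit(ratings)
  }
}
```

Alternatively, naming the third column "rating" in toDF should make the setRatingCol call unnecessary.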

-Xiangrui

On Mon, Mar 16, 2015 at 9:08 AM, jaykatukuri <jkatuk...@apple.com> wrote:
> Hi all,
> I am trying to use the new ALS implementation under
> org.apache.spark.ml.recommendation.ALS.
>
>
>
> The new method to invoke for training seems to be
> override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel.
>
> How do I create a DataFrame from a ratings data set that is on HDFS?
>
>
> In contrast, the method in the old ALS implementation under
> org.apache.spark.mllib.recommendation.ALS was
>  def train(
>       ratings: RDD[Rating],
>       rank: Int,
>       iterations: Int,
>       lambda: Double,
>       blocks: Int,
>       seed: Long
>     ): MatrixFactorizationModel
>
> My code to run the old ALS train method is as below:
>
> val sc = new SparkContext(conf)
>
> val pfile = args(0)
> val purchase = sc.textFile(pfile)
> val ratings = purchase.map(_.split(',') match {
>   case Array(user, item, rate) =>
>     Rating(user.toInt, item.toInt, rate.toInt)
> })
>
> val model = ALS.train(ratings, rank, numIterations, 0.01)
>
>
> Now, for the new ALS fit method, I am trying to use the below code to run,
> but getting a compilation error:
>
> val als = new ALS()
>   .setRank(rank)
>   .setRegParam(regParam)
>   .setImplicitPrefs(implicitPrefs)
>   .setNumUserBlocks(numUserBlocks)
>   .setNumItemBlocks(numItemBlocks)
>
> val sc = new SparkContext(conf)
>
> val pfile = args(0)
> val purchase = sc.textFile(pfile)
> val ratings = purchase.map(_.split(',') match {
>   case Array(user, item, rate) =>
>     Rating(user.toInt, item.toInt, rate.toInt)
> })
>
> val model = als.fit(ratings.toDF())
>
> I get an error that the method toDF() is not a member of
> org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
>
> Appreciate the help !
>
> Thanks,
> Jay
>
>
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DataFrame-for-using-ALS-under-org-apache-spark-ml-recommendation-ALS-tp22083.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
