As I mentioned, that *train* method returns the user and item factor
RDDs, as opposed to an ALSModel instance, so you need to construct the
model yourself. This is exactly why it's marked *DeveloperApi*: it is
not user-friendly and not strictly part of the ML pipeline approach.

If you really want to use it, this should work:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.ml.recommendation.ALS.Rating

val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
val sc = new SparkContext(conf)
val sql = new SQLContext(sc)
// Name,Value1,Value2 from the CSV, as Rating[String] records.
val rdd = sc.parallelize(Seq(
  Rating[String]("foo", "1", 4.0f),
  Rating[String]("foo", "2", 2.0f),
  Rating[String]("bar", "1", 5.0f),
  Rating[String]("bar", "3", 1.0f)
))
val als = new ALS()
// Pass training params explicitly so they stay in sync with the ALS
// instance above (here just rank, as an example).
val (userFactors, itemFactors) = ALS.train(rdd, rank = als.getRank)

import sql.implicits._
val userDF = userFactors.toDF("id", "features")
val itemDF = itemFactors.toDF("id", "features")
val model = new ALSModel(als.uid, als.getRank, userDF, itemDF)
  .setParent(als)
  .setUserCol("user")
  .setItemCol("item")

val pred = model.transform(rdd.toDF("user", "item", "rating"))
pred.show()   // show() prints the DataFrame itself and returns Unit


Note that you will need to be careful to keep the parameters of ALS.train,
the ALS instance, and the ALSModel in sync. Note also that ml.ALS only
supports *transform* (which makes predictions for a set of user and item
columns in a DataFrame); it doesn't yet support the other predict methods
available in mllib.ALS.
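If you'd rather stick with mllib.ALS and its predict methods, the approach
suggested earlier in this thread still applies: map String ids to Ints
first. A minimal sketch of that indexing step in plain Scala (the names and
sample values here are hypothetical; on an RDD you would use distinct() and
zipWithIndex() the same way, then collect the map and broadcast it):

```scala
// Hypothetical sample of String user ids (the "Name" column).
val names = Seq("foo", "bar", "foo", "baz")

// Assign each distinct name a stable Int id. On a Spark RDD this would be
// names.distinct().zipWithIndex(), collected into a broadcast map.
val idByName: Map[String, Int] = names.distinct.zipWithIndex.toMap

// Keep the reverse mapping so predictions can be reported by name again.
val nameById: Map[Int, String] = idByName.map(_.swap)

println(idByName("foo"))              // 0
println(nameById(idByName("bar")))    // bar
```

The Int ids can then be fed into mllib's Rating(user: Int, product: Int,
rating: Double), and the reverse map used to translate predictions back.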


On Mon, 7 Mar 2016 at 21:25 Shishir Anshuman <shishiranshu...@gmail.com>
wrote:

> Hello Nick,
>
> I used *ml* instead of *mllib* for ALS and Rating. But now it gives me an
> error when using *predict()* from
> *org.apache.spark.mllib.recommendation.MatrixFactorizationModel.*
>
> I have attached the code and the error screenshot.
>
> Thank you.
>
> On Mon, Mar 7, 2016 at 12:40 PM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> As you've pointed out, Rating requires user and item ids in Int form. So
>> you will need to map String user ids to integers.
>>
>> See this thread for example:
>> https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJgQjQ9GhGqpg1=hvxpfrs+59elfj9f7knhp8nyqnh1ut_6...@mail.gmail.com%3E
>> .
>>
>> There is a DeveloperApi method
>> in org.apache.spark.ml.recommendation.ALS that takes Rating with generic
>> type (can be String) for user id and item id. However that is a little more
>> involved, and for larger scale data will be a lot less efficient.
>>
>> Something like this for example:
>>
>> import org.apache.spark.ml.recommendation.ALS
>> import org.apache.spark.ml.recommendation.ALS.Rating
>>
>> val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
>> val sc = new SparkContext(conf)
>> // Name,Value1,Value2.
>> val rdd = sc.parallelize(Seq(
>>   Rating[String]("foo", "1", 4.0f),
>>   Rating[String]("foo", "2", 2.0f),
>>   Rating[String]("bar", "1", 5.0f),
>>   Rating[String]("bar", "3", 1.0f)
>> ))
>> val (userFactors, itemFactors) = ALS.train(rdd)
>>
>>
>> As you can see, you just get the factor RDDs back, and if you want an
>> ALSModel you will have to construct it yourself.
>>
>>
>> On Sun, 6 Mar 2016 at 18:23 Shishir Anshuman <shishiranshu...@gmail.com>
>> wrote:
>>
>>> I am new to Apache Spark, and I want to implement the Alternating Least
>>> Squares algorithm. The data set is stored in a CSV file in the format:
>>> *Name,Value1,Value2*.
>>>
>>> When I read the CSV file, I get a
>>> *java.lang.NumberFormatException* (from forInputString), because the
>>> Rating class needs the parameters in the format *(user: Int, product:
>>> Int, rating: Double)* and the first column of my file contains *Name*.
>>>
>>> Please suggest a way to overcome this issue.
>>>
>>
>
