As I mentioned, that *train* method returns the user and item factor RDDs rather than an ALSModel instance, so you need to construct the model manually yourself. This is exactly why it's marked *DeveloperApi*: it is not user-friendly and not strictly part of the ML pipeline approach.
If you really want to use it, this should work:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.recommendation.{ALS, ALSModel}
import org.apache.spark.ml.recommendation.ALS.Rating

val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
val sc = new SparkContext(conf)
val sql = new SQLContext(sc)
// Name,Value1,Value2.
val rdd = sc.parallelize(Seq(
  Rating[String]("foo", "1", 4.0f),
  Rating[String]("foo", "2", 2.0f),
  Rating[String]("bar", "1", 5.0f),
  Rating[String]("bar", "3", 1.0f)
))
val als = new ALS()
// note: have not synced up training params with the ALS instance params above
val (userFactors, itemFactors) = ALS.train(rdd)

import sql.implicits._
val userDF = userFactors.toDF("id", "features")
val itemDF = itemFactors.toDF("id", "features")
val model = new ALSModel(als.uid, als.getRank, userDF, itemDF)
  .setParent(als)
  .setUserCol("user")
  .setItemCol("item")
val pred = model.transform(rdd.toDF("user", "item", "rating"))
pred.show()

Note that you will need to be careful to sync up the parameters between ALS.train, the ALS instance, and the ALSModel. Note also that ml.ALS only supports *transform* (which makes predictions for a set of user and item columns in a DataFrame), and doesn't yet support the other predict methods available in mllib.ALS.

On Mon, 7 Mar 2016 at 21:25 Shishir Anshuman <shishiranshu...@gmail.com> wrote:

> Hello Nick,
>
> I used *ml* instead of *mllib* for ALS and Rating. But now it gives me an
> error while using *predict()* from
> *org.apache.spark.mllib.recommendation.MatrixFactorizationModel*.
>
> I have attached the code and the error screenshot.
>
> Thank you.
>
> On Mon, Mar 7, 2016 at 12:40 PM, Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>
>> As you've pointed out, Rating requires user and item ids in Int form, so
>> you will need to map String user ids to integers.
>>
>> See this thread for example:
>> https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAJgQjQ9GhGqpg1=hvxpfrs+59elfj9f7knhp8nyqnh1ut_6...@mail.gmail.com%3E
>> .
>>
>> There is a DeveloperApi method in org.apache.spark.ml.recommendation.ALS
>> that takes Rating with a generic type (which can be String) for the user
>> id and item id. However, that is a little more involved, and for
>> larger-scale data it will be a lot less efficient.
>>
>> Something like this, for example:
>>
>> import org.apache.spark.ml.recommendation.ALS
>> import org.apache.spark.ml.recommendation.ALS.Rating
>>
>> val conf = new SparkConf().setAppName("ALSWithStringID").setMaster("local[4]")
>> val sc = new SparkContext(conf)
>> // Name,Value1,Value2.
>> val rdd = sc.parallelize(Seq(
>>   Rating[String]("foo", "1", 4.0f),
>>   Rating[String]("foo", "2", 2.0f),
>>   Rating[String]("bar", "1", 5.0f),
>>   Rating[String]("bar", "3", 1.0f)
>> ))
>> val (userFactors, itemFactors) = ALS.train(rdd)
>>
>> As you can see, you just get the factor RDDs back, and if you want an
>> ALSModel you will have to construct it yourself.
>>
>> On Sun, 6 Mar 2016 at 18:23 Shishir Anshuman <shishiranshu...@gmail.com>
>> wrote:
>>
>>> I am new to Apache Spark, and I want to implement the Alternating Least
>>> Squares algorithm. The data set is stored in a csv file in the format
>>> *Name,Value1,Value2*.
>>>
>>> When I read the csv file, I get a
>>> *java.lang.NumberFormatException.forInputString* error because the
>>> Rating class needs the parameters in the format *(user: Int, product:
>>> Int, rating: Double)*, and the first column of my file contains *Name*.
>>>
>>> Please suggest me a way to overcome this issue.
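To make the Int-mapping approach suggested above concrete: the core step is just assigning each distinct String id a stable Int index before constructing mllib Ratings. A minimal plain-Scala sketch (the names and values here are illustrative, not from the original data set):

```scala
object IdMapping {
  def main(args: Array[String]): Unit = {
    // mllib's Rating needs (user: Int, product: Int, rating: Double),
    // so map each distinct String name to a stable Int index first.
    val names = Seq("foo", "foo", "bar", "bar")
    val userIdMap: Map[String, Int] = names.distinct.zipWithIndex.toMap

    // Int ids to use when building Rating instances.
    val intUserIds = names.map(userIdMap)

    // Keep the reverse mapping to translate predictions back to names.
    val indexToName: Map[Int, String] = userIdMap.map(_.swap)

    println(userIdMap)
    println(intUserIds)
    println(indexToName)
  }
}
```

On an RDD you would build the same map with distinct().zipWithIndex() and either collect it (if small) or join it against the ratings.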