SchemaRDD subclasses RDD in 1.2, but DataFrame is no longer an RDD in 1.3. We should allow DataFrames in ALS.train. I will submit a patch. You can use `ALS.train(training.rdd, ...)` for now as a workaround. -Xiangrui
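The failure and the workaround can be illustrated with a minimal, self-contained sketch. These are plain-Python stand-ins, not the real pyspark classes: the mock mirrors the `isinstance` check in `pyspark/mllib/recommendation.py` (`_prepare`) and shows why a 1.3 DataFrame fails it while `training.rdd` passes.

```python
class RDD:
    """Stand-in for pyspark.rdd.RDD (mock, not real pyspark)."""
    def __init__(self, rows):
        self.rows = rows

class DataFrame:
    """Stand-in for pyspark.sql.DataFrame: in Spark 1.3 it no longer
    subclasses RDD, but it still exposes the underlying RDD as .rdd."""
    def __init__(self, rows):
        self.rdd = RDD(rows)

def als_train(ratings):
    # Mirrors the check in pyspark/mllib/recommendation.py (_prepare).
    assert isinstance(ratings, RDD), "ratings should be RDD"
    return "model"

training = DataFrame([(1, 10, 4.0), (1, 20, 3.5)])

try:
    als_train(training)           # DataFrame is not an RDD in 1.3
except AssertionError as e:
    print(e)                      # prints: ratings should be RDD

print(als_train(training.rdd))    # workaround passes; prints: model
```

The same `.rdd` attribute exists on the real pyspark DataFrame, which is what makes `ALS.train(training.rdd, ...)` a drop-in workaround until ALS.train accepts DataFrames directly.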
On Tue, Apr 21, 2015 at 10:51 AM, Joseph Bradley <jos...@databricks.com> wrote:
> Hi Ayan,
>
> If you want to use DataFrame, then you should use the Pipelines API
> (org.apache.spark.ml.*), which takes DataFrames:
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.recommendation.ALS
>
> In the examples/ directory for ml/, you can find a MovieLensALS example.
>
> Good luck!
> Joseph
>
> On Tue, Apr 21, 2015 at 4:58 AM, ayan guha <guha.a...@gmail.com> wrote:
>>
>> Hi
>>
>> I am getting an error in mllib.ALS.train when passing a
>> DataFrame (do I need to convert the DF to an RDD?)
>>
>> Code:
>> training = ssc.sql("select userId,movieId,rating from ratings where
>> partitionKey < 6").cache()
>> print type(training)
>> model = ALS.train(training,rank,numIter,lmbda)
>>
>> Error:
>> <class 'pyspark.sql.dataframe.DataFrame'>
>>
>> Traceback (most recent call last):
>>   File "D:\Project\Spark\code\movie_sql.py", line 109, in <module>
>>     bestConf = getBestModel(sc,ssc,training,validation,validationNoRating)
>>   File "D:\Project\Spark\code\movie_sql.py", line 54, in getBestModel
>>     model = ALS.train(trainingRDD,rank,numIter,lmbda)
>>   File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 139, in train
>>     model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
>>   File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 127, in _prepare
>>     assert isinstance(ratings, RDD), "ratings should be RDD"
>> AssertionError: ratings should be RDD
>>
>> It was working fine in 1.2.0 (till last night :)).
>>
>> Any solution? I am thinking of mapping the training DataFrame back to an
>> RDD, but I will lose the schema information.
>>
>> Best,
>> Ayan
>>
>> On Mon, Apr 20, 2015 at 10:23 PM, ayan guha <guha.a...@gmail.com> wrote:
>>>
>>> Hi,
>>> Just upgraded to Spark 1.3.1.
>>>
>>> I am getting a warning:
>>>
>>> Warning (from warnings module):
>>>   File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\sql\context.py", line 191
>>>     warnings.warn("inferSchema is deprecated, please use createDataFrame instead")
>>> UserWarning: inferSchema is deprecated, please use createDataFrame instead
>>>
>>> However, the documentation still says to use inferSchema.
>>> Here: http://spark.apache.org/docs/latest/sql-programming-guide.html in
>>> section
>>>
>>> Also, I am getting an error in mllib.ALS.train when passing a
>>> DataFrame (do I need to convert the DF to an RDD?)
>>>
>>> Code:
>>> training = ssc.sql("select userId,movieId,rating from ratings where
>>> partitionKey < 6").cache()
>>> print type(training)
>>> model = ALS.train(training,rank,numIter,lmbda)
>>>
>>> Error:
>>> <class 'pyspark.sql.dataframe.DataFrame'>
>>> Rank:8 Lmbda:1.0 iteration:10
>>>
>>> Traceback (most recent call last):
>>>   File "D:\Project\Spark\code\movie_sql.py", line 109, in <module>
>>>     bestConf = getBestModel(sc,ssc,training,validation,validationNoRating)
>>>   File "D:\Project\Spark\code\movie_sql.py", line 54, in getBestModel
>>>     model = ALS.train(trainingRDD,rank,numIter,lmbda)
>>>   File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 139, in train
>>>     model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
>>>   File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 127, in _prepare
>>>     assert isinstance(ratings, RDD), "ratings should be RDD"
>>> AssertionError: ratings should be RDD
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>
>>
>> --
>> Best Regards,
>> Ayan Guha

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org