Hi,

I just upgraded to Spark 1.3.1 and I am getting a warning:

    Warning (from warnings module):
      File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\sql\context.py", line 191
        warnings.warn("inferSchema is deprecated, please use createDataFrame instead")
    UserWarning: inferSchema is deprecated, please use createDataFrame instead

However, the documentation still says to use inferSchema. Here: http://spark.apache.org/docs/latest/sql-programming-guide.htm in section

Also, I am getting an error in the mllib.ALS.train function when passing a DataFrame (do I need to convert the DF to an RDD?)

Code:

    training = ssc.sql("select userId,movieId,rating from ratings where partitionKey < 6").cache()
    print type(training)
    model = ALS.train(training,rank,numIter,lmbda)

Error:

    <class 'pyspark.sql.dataframe.DataFrame'>
    Rank:8 Lmbda:1.0 iteration:10
    Traceback (most recent call last):
      File "D:\Project\Spark\code\movie_sql.py", line 109, in <module>
        bestConf = getBestModel(sc,ssc,training,validation,validationNoRating)
      File "D:\Project\Spark\code\movie_sql.py", line 54, in getBestModel
        model = ALS.train(trainingRDD,rank,numIter,lmbda)
      File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 139, in train
        model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations,
      File "D:\spark\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\spark-1.3.1-bin-hadoop2.6\python\pyspark\mllib\recommendation.py", line 127, in _prepare
        assert isinstance(ratings, RDD), "ratings should be RDD"
    AssertionError: ratings should be RDD

--
Best Regards,
Ayan Guha
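
On the DataFrame-versus-RDD question, here is a minimal sketch of one way the conversion might look. It assumes the DataFrame exposes its rows through the rdd attribute and that the three selected columns come back in the order userId, movieId, rating. The helper names to_rating_tuple and train_from_dataframe are hypothetical, made up for illustration, and are not part of pyspark.

```python
def to_rating_tuple(row):
    """Turn one (userId, movieId, rating) row into an (int, int, float) triple."""
    # Assumption: the row unpacks positionally in the order of the SELECT list.
    user_id, movie_id, rating = row
    return (int(user_id), int(movie_id), float(rating))


def train_from_dataframe(training_df, rank, num_iter, lmbda):
    """Hypothetical wrapper: convert the DataFrame to an RDD, then train ALS."""
    # Imported lazily so to_rating_tuple can be tried without a Spark install.
    from pyspark.mllib.recommendation import ALS

    # ALS.train asserts that its input is an RDD, so pass the DataFrame's
    # underlying RDD (mapped to plain tuples) rather than the DataFrame itself.
    ratings_rdd = training_df.rdd.map(to_rating_tuple)
    return ALS.train(ratings_rdd, rank, num_iter, lmbda)
```

With something like this, the call in the code above would become model = train_from_dataframe(training, rank, numIter, lmbda). The explicit int/float casts are a precaution in case the SQL column types differ from what ALS expects.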