Using different Spark jars than the ones on the cluster

2015-03-18 Thread jaykatukuri
Hi all, I am trying to run my job, which needs spark-sql_2.11-1.3.0.jar. The cluster that I am running on is still on spark-1.2.0. I tried the following: spark-submit --class class-name --num-executors 100 --master yarn application_jar --jars hdfs:///path/spark-sql_2.11-1.3.0.jar

RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS

2015-03-16 Thread jaykatukuri
Hi all, I am trying to use the new ALS implementation under org.apache.spark.ml.recommendation.ALS. The method to invoke for training seems to be override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel. How do I create a DataFrame object from a ratings data set that is stored on HDFS?
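
For reference, one way this is commonly done in Spark 1.3 is to parse the file into an RDD of a case class and convert it with toDF(). The following is only a rough sketch, not a confirmed answer from this thread. It assumes the ratings file holds comma-separated user,item,rating lines. The Rating case class, the RatingsToDataFrame object, and the path hdfs:///path/to/ratings are hypothetical names introduced for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.recommendation.ALS

// Hypothetical record type. It is defined at the top level so that Spark
// can derive the DataFrame schema from it by reflection.
case class Rating(user: Int, item: Int, rating: Float)

object RatingsToDataFrame {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RatingsToDataFrame"))
    val sqlContext = new SQLContext(sc)
    // Brings the RDD-to-DataFrame conversion (toDF) into scope.
    import sqlContext.implicits._

    // Assumption: each line looks like "userId,itemId,rating".
    // The path is a placeholder, not taken from the original post.
    val ratings = sc.textFile("hdfs:///path/to/ratings")
      .map(_.split(","))
      .map(f => Rating(f(0).toInt, f(1).toInt, f(2).toFloat))
      .toDF()

    // Point ALS at the columns named by the case class fields.
    val als = new ALS()
      .setUserCol("user")
      .setItemCol("item")
      .setRatingCol("rating")

    // Fit with default parameters; the result is an ALSModel.
    val model = als.fit(ratings)

    sc.stop()
  }
}
```

The column names come from the case class fields, so they have to match whatever is passed to setUserCol, setItemCol, and setRatingCol. If the file uses a different delimiter or has a header line, the parsing step would need to change accordingly.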