Using different Spark jars than the ones on the cluster

2015-03-18 Thread jaykatukuri
Hi all,
I am trying to run my job, which needs spark-sql_2.11-1.3.0.jar.
The cluster that I am running on is still on spark-1.2.0.

I tried the following:

spark-submit --class class-name --num-executors 100 --master yarn \
  application_jar --jars hdfs:///path/spark-sql_2.11-1.3.0.jar \
  hdfs:///input_data

But this did not work: I get an error that it is not able to find a
class/method that is in spark-sql_2.11-1.3.0.jar:

org.apache.spark.sql.SQLContext.implicits()Lorg/apache/spark/sql/SQLContext$implicits$

The question in general is: how do we use a different version of the Spark jars
(spark-core, spark-sql, spark-ml, etc.) than the ones running on the cluster?
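
For reference, one approach I have been looking at (I have not verified it
yet) is forcing my jars ahead of the cluster's with Spark's experimental
user-classpath-first settings. A sketch, assuming the Spark 1.3
spark.driver.userClassPathFirst and spark.executor.userClassPathFirst flags
(earlier YARN releases had spark.yarn.user.classpath.first instead):

spark-submit --class class-name --num-executors 100 --master yarn \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  --jars hdfs:///path/spark-sql_2.11-1.3.0.jar \
  application_jar hdfs:///input_data

One detail I noticed while writing this: spark-submit treats everything after
the application jar as arguments to the application itself, so in my original
command the --jars option that followed application_jar was probably never
seen by spark-submit at all.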

Thanks,
Jay








RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS

2015-03-16 Thread jaykatukuri
Hi all,
I am trying to use the new ALS implementation under
org.apache.spark.ml.recommendation.ALS.



The new method to invoke for training seems to be:

  override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel

How do I create a DataFrame object from a ratings data set that is on HDFS?


The method in the old ALS implementation under
org.apache.spark.mllib.recommendation.ALS, by contrast, was:

  def train(
      ratings: RDD[Rating],
      rank: Int,
      iterations: Int,
      lambda: Double,
      blocks: Int,
      seed: Long
  ): MatrixFactorizationModel

My code to run the old ALS train method is as below:

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val sc = new SparkContext(conf)

val pfile = args(0)
val purchase = sc.textFile(pfile)
// Parse each "user,item,rate" line into an mllib Rating.
val ratings = purchase.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toInt)
})

val model = ALS.train(ratings, rank, numIterations, 0.01)


Now, for the new ALS fit method, I am trying to run the code below, but I am
getting a compilation error:

val als = new ALS()
  .setRank(rank)
  .setRegParam(regParam)
  .setImplicitPrefs(implicitPrefs)
  .setNumUserBlocks(numUserBlocks)
  .setNumItemBlocks(numItemBlocks)

val sc = new SparkContext(conf)

val pfile = args(0)
val purchase = sc.textFile(pfile)
// Rating here is org.apache.spark.ml.recommendation.ALS.Rating.
val ratings = purchase.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toInt)
})

val model = als.fit(ratings.toDF())

I get an error that the method toDF() is not a member of
org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
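
One thing I have not tried yet is bringing the SQLContext implicit conversions
into scope, which I believe is what adds toDF() to RDDs of case classes. A
sketch of what I plan to try, assuming the Spark 1.3 sqlContext.implicits._
import and the ml Rating case class (whose rating field is a Float):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.recommendation.ALS.Rating

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
// Brings the RDD-to-DataFrame conversions (including toDF()) into scope.
import sqlContext.implicits._

val ratings = sc.textFile(args(0)).map(_.split(',') match {
  case Array(user, item, rate) =>
    // Rating is a case class, so an RDD of it should convert to a DataFrame.
    Rating(user.toInt, item.toInt, rate.toFloat)
})

val model = als.fit(ratings.toDF())

If that works, the compile error above would just mean the implicits were
missing from scope, not that the RDD itself is the wrong input.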

Appreciate the help!

Thanks,
Jay





