Decision Tree Model

2015-10-01 Thread hishamm
Hi,

I am using Spark 1.4.0 with Python and decision trees to perform machine
learning classification.

I evaluate the model by generating the predictions and zipping them with the
test data, as follows:


    predictions = tree_model.predict(test_data.map(lambda a: a.features))
    labels = test_data.map(lambda a: a.label).zip(predictions)
    correct = 100 * (labels.filter(lambda (v, p): v == p).count() /
                     float(test_data.count()))

I always get this error in the zipping phase:

    Can not deserialize RDD with different number of items in pair: (3, 2)
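
From what I understand, zip() requires the two RDDs to have the same
partitioning and the same number of elements in each partition, and the
serializer's batching can break that guarantee here. A rough sketch that
pairs labels and predictions by explicit index instead (using the same
tree_model and test_data as above):

    # Pair labels and predictions by explicit position instead of zip().
    # zipWithIndex() yields (value, index); swap to (index, value) and join.
    preds_by_idx = predictions.zipWithIndex().map(lambda x: (x[1], x[0]))
    labels_by_idx = test_data.map(lambda a: a.label) \
                             .zipWithIndex().map(lambda x: (x[1], x[0]))
    labels = labels_by_idx.join(preds_by_idx).values()
    correct = 100 * (labels.filter(lambda vp: vp[0] == vp[1]).count() /
                     float(test_data.count()))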


To avoid zipping, I tried to do it in a different way, as follows:

    labels = test_data.map(lambda a: (a.label, tree_model.predict(a.features)))
    correct = 100 * (labels.filter(lambda (v, p): v == p).count() /
                     float(test_data.count()))

However, I always get this error:

    in __getnewargs__(self)
        250         # This method is called when attempting to pickle SparkContext, which is always an error:
        251         raise Exception(
    --> 252             "It appears that you are attempting to reference SparkContext from a broadcast "
        253             "variable, action, or transforamtion. SparkContext can only be used on the driver, "
        254             "not in code that it run on workers. For more information, see SPARK-5063."

    Exception: It appears that you are attempting to reference SparkContext from
    a broadcast variable, action, or transforamtion. SparkContext can only be
    used on the driver, not in code that it run on workers. For more
    information, see SPARK-5063.


Is the DecisionTreeModel part of the SparkContext?!
I found that in Scala the second approach works with no problem.
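
My current understanding, which may be wrong: in PySpark, DecisionTreeModel is
a thin wrapper around a JVM model and calls back through the SparkContext, so
predict() can only run on the driver, while the Scala model is a plain local
object that can be shipped to workers. If that is right, only the driver-side
form is available in Python:

    # Driver-side prediction: predict() on a whole RDD of feature vectors
    # goes through the JVM model once; no SparkContext is touched on workers.
    predictions = tree_model.predict(test_data.map(lambda a: a.features))
    # Then pair with labels by index, as in the sketch above, instead of zip().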


So, how can I solve these two problems?

Thanks and Regards,
Hisham

spark-submit in deployment mode with the --jars option

2015-06-28 Thread hishamm
Hi,

I want to deploy my application on a standalone cluster. 
spark-submit acts in a strange way: when I deploy the application in
*client* mode, everything works well and my application can see the
additional jar files.

Here is the command:
   spark-submit --master spark://1.2.3.4:7077 --deploy-mode client \
     --supervise --jars $(echo /myjars/*.jar | tr ' ' ',') \
     --class com.algorithm /my/path/algorithm.jar

However, when I submit the command in *cluster* deployment mode, the
driver cannot see the additional jars, and I always get
java.lang.ClassNotFoundException.

Here is the command:
   spark-submit --master spark://1.2.3.4:7077 --deploy-mode cluster \
     --supervise --jars $(echo /myjars/*.jar | tr ' ' ',') \
     --class com.algorithm /my/path/algorithm.jar


Am I missing something?
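
From what I can tell, in standalone cluster mode the driver is launched on one
of the worker machines, so the local paths passed to --jars must resolve on
that machine, not just on the machine running spark-submit. A rough workaround
sketch, with worker1 and worker2 as placeholder hostnames, is to stage the
jars at an identical path on every node before submitting:

   # Copy the jars to the same path on every worker; in cluster mode the
   # driver may be started on any of them. Hostnames are placeholders.
   for host in worker1 worker2; do scp /myjars/*.jar "$host":/myjars/; done

   spark-submit --master spark://1.2.3.4:7077 --deploy-mode cluster \
     --supervise --jars $(echo /myjars/*.jar | tr ' ' ',') \
     --class com.algorithm /my/path/algorithm.jar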

thanks,
Hisham