Re: question about setting SPARK_CLASSPATH in spark-env.sh

2014-06-18 Thread santhoma
Thanks. I hope this problem will go away once I upgrade to Spark 1.0, where we
can ship cluster-wide classpath entries using the spark-submit command.
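
(For reference: in Spark 1.0 the extra jars can be listed on the spark-submit
command line with --jars, or set programmatically. A minimal Java sketch of the
programmatic form, assuming a hypothetical app name and jar path:)

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class JarsExample {
    public static void main(String[] args) {
        // Jars listed here are shipped to every executor for this application,
        // so they do not have to be pre-installed on each node
        // (equivalent in spirit to `spark-submit --jars /opt/libs/mylib.jar`).
        SparkConf conf = new SparkConf()
                .setAppName("jars-example")                       // hypothetical
                .setJars(new String[] {"/opt/libs/mylib.jar"});   // hypothetical path

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code ...
        sc.stop();
    }
}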





Re: question about setting SPARK_CLASSPATH in spark-env.sh

2014-06-18 Thread santhoma
By the way, any idea how to sync the Spark config directory with the other
nodes in the cluster?

~santhosh





question about setting SPARK_CLASSPATH in spark-env.sh

2014-06-17 Thread santhoma
Hi, 

This is about Spark 0.9.
I have a 3-node Spark cluster. I want to add a locally available jar file
(present on all nodes) to the SPARK_CLASSPATH variable in
/etc/spark/conf/spark-env.sh so that all nodes can access it.

The question is: should I edit spark-env.sh on all nodes to add the jar,
or is it enough to add it only on the master node from which I submit the jobs?

thanks
Santhosh
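
(Not an answer to the question itself, but a quick way to check whether the
jar actually resolves on the workers is to probe a class from inside a task.
A minimal sketch against the Spark 0.9 Java API; the master URL, Spark home,
and probe class name are hypothetical.)

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ClasspathCheck {
    public static void main(String[] args) {
        // Spark 0.9-style constructor: master, app name, Spark home, jars to ship.
        // No jars are shipped here, so the probe relies purely on SPARK_CLASSPATH.
        JavaSparkContext sc = new JavaSparkContext(
                "spark://master:7077", "classpath-check",
                "/usr/lib/spark", new String[0]);

        // Try to resolve a class from the locally installed jar inside a task.
        // If SPARK_CLASSPATH is set only on the master, this fails on the workers.
        final String probeClass = "com.example.SomeClassFromTheJar"; // hypothetical
        List<String> results = sc.parallelize(Arrays.asList(1, 2, 3), 3)
                .map(new Function<Integer, String>() {
                    public String call(Integer i) {
                        try {
                            Class.forName(probeClass);
                            return "OK";
                        } catch (ClassNotFoundException e) {
                            return "MISSING";
                        }
                    }
                })
                .collect();
        System.out.println(results);
        sc.stop();
    }
}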





Re: Configuring distributed caching with Spark and YARN

2014-04-01 Thread santhoma
I think with addJar() there is no 'caching', in the sense that the files will
be copied every time, for each job.
With the Hadoop distributed cache, on the other hand, files are copied only
once, and a symlink to the cached file is created for subsequent runs:
https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/filecache/DistributedCache.html

Also, the Hadoop distributed cache can copy an archive file to the node and
unzip it automatically into the current working directory. The advantage here
is that the copying is very fast.

I am still looking for a similar mechanism in Spark.
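
(For comparison, Spark does have SparkContext.addFile() plus SparkFiles.get(),
though like addJar() it ships the file once per application rather than
maintaining a node-level cache across jobs. A minimal Java sketch, with a
hypothetical master URL and file path:)

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkFiles;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class AddFileExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                "spark://master:7077", "addfile-example");   // hypothetical master URL

        // Ship a side file to every executor for this application.
        sc.addFile("hdfs:///data/lookup.dat");               // hypothetical path

        List<String> paths = sc.parallelize(Arrays.asList(1, 2, 3))
                .map(new Function<Integer, String>() {
                    public String call(Integer i) {
                        // Resolves to the local copy on whichever worker runs the task.
                        return SparkFiles.get("lookup.dat");
                    }
                })
                .collect();

        for (String p : paths) {
            System.out.println(p);
        }
        sc.stop();
    }
}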






Re: How to set environment variable for a Spark job

2014-03-26 Thread santhoma
OK, it was working.
I printed System.getenv(..) for both env variables and they gave the correct
values.

However, it did not give me the intended result. My intention was to load a
native library from LD_LIBRARY_PATH, but it looks like the library is loaded
from the value of -Djava.library.path.

The value of this property comes out as
-Djava.library.path=/opt/cloudera/parcels/CDH-5.0.0-0.cdh5b2.p0.27/lib/spark/lib:/opt/cloudera/parcels/CDH-5.0.0-0.cdh5b2.p0.27/lib/hadoop/lib/native

Any idea how to append my custom path to it programmatically?
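
(One thing to note: java.library.path is read once at JVM startup, so appending
to the property afterwards generally has no effect on System.loadLibrary(). The
usual options are either to add the extra directory to the executor JVM options
before launch, or to bypass the search path and load the library by absolute
path. A minimal sketch of the latter; the library path is hypothetical.)

public class NativeLoadExample {
    static {
        // java.library.path is captured when the JVM starts, so setting the
        // system property later does not change where loadLibrary() searches.
        // Loading by absolute path sidesteps the search path entirely.
        System.load("/opt/myapp/native/libmylib.so");   // hypothetical path
    }

    public static void main(String[] args) {
        // Native methods backed by libmylib.so can be called from here on.
    }
}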





Re: Java API - Serialization Issue

2014-03-25 Thread santhoma
This worked great. Thanks a lot





Re: Java API - Serialization Issue

2014-03-24 Thread santhoma
I am also facing the same problem. I have implemented Serializable for my own
code, but the exception is thrown from third-party libraries over which I have
no control.

Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task not serializable: java.io.NotSerializableException: (lib class name here)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

Is it mandatory that classes in the dependent jars implement Serializable as
well?
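
(Only the objects captured by a task's closure have to be serializable, so a
common workaround when a third-party class cannot be made Serializable is to
construct it inside the task rather than capturing it. A minimal sketch;
LegacyClient below is a hypothetical stand-in for the third-party class.)

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

public class ThirdPartyClientExample {
    // Hypothetical stand-in for a third-party class that is not Serializable.
    static class LegacyClient {
        String lookup(String key) { return "value-for-" + key; }
    }

    static JavaRDD<String> enrich(JavaRDD<String> keys) {
        // The client is created inside the task, on the worker, instead of being
        // captured in the closure, so it never has to be serialized at all.
        return keys.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
            public Iterable<String> call(Iterator<String> partition) {
                LegacyClient client = new LegacyClient();   // one instance per partition
                List<String> out = new ArrayList<String>();
                while (partition.hasNext()) {
                    out.add(client.lookup(partition.next()));
                }
                return out;
            }
        });
    }
}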





Re: java.io.NotSerializableException of dependent Java lib.

2014-03-24 Thread santhoma
Can someone answer this question, please?
Specifically, whether classes in the dependent jars need to implement
Serializable?


