Re: question about setting SPARK_CLASSPATH IN spark_env.sh
Thanks. I hope this problem will go away once I upgrade to Spark 1.0, where we can send the cluster-wide classpaths using the spark-submit command.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/question-about-setting-SPARK-CLASSPATH-IN-spark-env-sh-tp7809p7822.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
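[For reference, in Spark 1.0+ a per-job dependency jar can indeed be shipped at submit time. A minimal sketch; the class name, master URL, and jar paths below are placeholders, not from this thread:

```shell
# --jars copies the listed jars to the cluster and adds them to the
# executor classpath for this job (no cluster-wide spark-env.sh edit needed).
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --jars /path/to/dependency.jar \
  /path/to/myapp.jar
```
]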
Re: question about setting SPARK_CLASSPATH IN spark_env.sh
By the way, any idea how to sync the Spark config dir with the other nodes in the cluster? ~santhosh
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/question-about-setting-SPARK-CLASSPATH-IN-spark-env-sh-tp7809p7853.html
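[Spark itself does not sync the conf dir; a common approach is a plain rsync loop from the master. A sketch, assuming passwordless SSH and placeholder hostnames:

```shell
# Push the master's Spark config to every worker node.
for host in worker1 worker2 worker3; do
  rsync -av /etc/spark/conf/ "$host":/etc/spark/conf/
done
```
]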
question about setting SPARK_CLASSPATH IN spark_env.sh
Hi, this is about Spark 0.9. I have a 3-node Spark cluster. I want to add a locally available jar file (present on all nodes) to the SPARK_CLASSPATH variable in /etc/spark/conf/spark-env.sh so that all nodes can access it. The question is: should I edit spark-env.sh on all nodes to add the jar? Or is it enough to add it only on the master node, from where I am submitting jobs? Thanks, Santhosh
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/question-about-setting-SPARK-CLASSPATH-IN-spark-env-sh-tp7809.html
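[In 0.9, spark-env.sh is sourced locally on whichever node launches a daemon or executor, so the setting (and the jar) generally needs to exist on every node, not just the master. A sketch of the line in question; the jar path is a placeholder:

```shell
# In /etc/spark/conf/spark-env.sh on EACH node (Spark 0.9-era):
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/opt/libs/mylib.jar"
```
]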
Re: Configuring distributed caching with Spark and YARN
I think with addJar() there is no 'caching', in the sense that files will be copied every time, per job. Whereas with the Hadoop distributed cache, files are copied only once, and a symlink is created to the cached file for subsequent runs: https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/filecache/DistributedCache.html Also, the Hadoop distributed cache can copy an archive file to the node and unzip it automatically into the current working dir. The advantage here is that the copying will be very fast. Still looking for similar mechanisms in Spark.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Configuring-distributed-caching-with-Spark-and-YARN-tp1074p3566.html
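[For comparison, the Hadoop archive-unpacking behavior described above can be driven from the command line when a job uses ToolRunner/GenericOptionsParser. A sketch; the jar, class, and archive names are placeholders:

```shell
# -archives uploads the zip to the distributed cache once, unpacks it on
# each node, and the '#deps' fragment symlinks it as 'deps' in the task's
# working directory.
hadoop jar myjob.jar com.example.MyJob \
  -archives hdfs:///tmp/dependencies.zip#deps \
  input output
```
]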
Re: How to set environment variable for a spark job
OK, it was working. I printed System.getenv(..) for both env variables and they gave the correct values. However, it did not give me the intended result. My intention was to load a native library from LD_LIBRARY_PATH, but it looks like the library is loaded from the value of -Djava.library.path. The value of this property is coming out as -Djava.library.path=/opt/cloudera/parcels/CDH-5.0.0-0.cdh5b2.p0.27/lib/spark/lib:/opt/cloudera/parcels/CDH-5.0.0-0.cdh5b2.p0.27/lib/hadoop/lib/native Any idea how to append my custom path to it programmatically?
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-environment-variable-for-a-spark-job-tp3180p3249.html
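[A commonly used workaround from this era is to append to java.library.path at runtime and then reset the JVM's cached copy via reflection. A sketch, not an official API; the directory name is a placeholder, and the reflection step relies on a private ClassLoader field that may be inaccessible on newer JDKs (the property update itself always works):

```java
import java.io.File;
import java.lang.reflect.Field;

public class LibPath {
    // Appends a directory to java.library.path and tries to clear the JVM's
    // cached copy so the next System.loadLibrary() re-reads the property.
    static void appendLibraryPath(String dir) {
        String current = System.getProperty("java.library.path", "");
        System.setProperty("java.library.path",
                current.isEmpty() ? dir : current + File.pathSeparator + dir);
        try {
            // Pre-Java 9 trick: null out ClassLoader.sys_paths to force
            // lazy re-initialization from the (now updated) property.
            Field sysPaths = ClassLoader.class.getDeclaredField("sys_paths");
            sysPaths.setAccessible(true);
            sysPaths.set(null, null);
        } catch (Exception e) {
            // On newer JDKs the field is missing or inaccessible; the
            // property is still set, but a running JVM may keep using its
            // cached native search path.
        }
    }

    public static void main(String[] args) {
        appendLibraryPath("/opt/native/lib");
        System.out.println(System.getProperty("java.library.path"));
    }
}
```
Note that with Spark this must run before the first System.loadLibrary() call in the JVM; otherwise prepending the path to LD_LIBRARY_PATH on the executor nodes before the JVM starts is the more reliable route.]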
Re: Java API - Serialization Issue
This worked great. Thanks a lot.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Java-API-Serialization-Issue-tp1460p3178.html
Re: Java API - Serialization Issue
I am also facing the same problem. I have implemented Serializable for my code, but the exception is thrown from third-party libraries over which I have no control:
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: (lib class name here)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
Is it mandatory that Serializable be implemented for the dependent jars as well?
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Java-API-Serialization-Issue-tp1460p3086.html
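[Dependent jars do not have to implement Serializable wholesale. A common pattern is to hold the non-serializable object in a transient field of a small serializable wrapper and re-create it lazily after deserialization on the executor. A sketch in plain Java; ThirdPartyClient stands in for the library class and is not a real API:

```java
import java.io.Serializable;

// Stand-in for a third-party class that does not implement Serializable.
class ThirdPartyClient {
    String greet(String name) { return "hello " + name; }
}

// Serializable wrapper: the client is transient, so it is skipped when the
// closure is serialized and rebuilt on first use after deserialization.
class ClientHolder implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient ThirdPartyClient client;

    ThirdPartyClient get() {
        if (client == null) {
            client = new ThirdPartyClient(); // re-created on the worker
        }
        return client;
    }
}

public class SerializableWrapperDemo {
    public static void main(String[] args) throws Exception {
        ClientHolder holder = new ClientHolder();
        // Simulate Spark shipping the closure: serialize, then deserialize.
        java.io.ByteArrayOutputStream bos = new java.io.ByteArrayOutputStream();
        new java.io.ObjectOutputStream(bos).writeObject(holder);
        ClientHolder back = (ClientHolder) new java.io.ObjectInputStream(
                new java.io.ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(back.get().greet("spark"));
    }
}
```
Registering the class with Kryo is another option when the object's state must actually travel with the task.]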
Re: java.io.NotSerializableException Of dependent Java lib.
Can someone answer this question, please? Specifically, about the Serializable implementation of dependent jars?
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-io-NotSerializableException-Of-dependent-Java-lib-tp1973p3087.html