Re: little confused about SPARK_JAVA_OPTS alternatives
Hi Andrew,

I'm actually using spark-submit, and I tried using spark.executor.extraJavaOptions to configure the Tachyon client to connect to the Tachyon HA master, but the configuration settings were not picked up. On the other hand, when I set the same Tachyon configuration parameters through SPARK_JAVA_OPTS or conf/java-opts, it worked.

My guess is that the Tachyon client classes are already loaded into the JVM, and since they are mostly singletons the system properties are not being refreshed. Let me know if you need more info. I have logs from both runs and I can try different settings on my Spark cluster (I am running Spark on Mesos in fine-grained mode).

Best regards
Lukasz
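To illustrate the class-loading behaviour Lukasz suspects, here is a minimal Scala sketch (ClientConf and Demo are made-up names, not Tachyon's actual classes): a singleton that snapshots a system property when it is first loaded will not see a value set afterwards with System.setProperty.

object ClientConf {
  // the property is read exactly once, when the singleton object is initialized
  val zookeeperAddress: String = System.getProperty("tachyon.zookeeper.address", "<not set>")
}

object Demo {
  def main(args: Array[String]): Unit = {
    println(ClientConf.zookeeperAddress)                        // "<not set>" unless -D was passed at launch
    System.setProperty("tachyon.zookeeper.address", "zk1:2181") // too late: the singleton is already initialized
    println(ClientConf.zookeeperAddress)                        // still "<not set>"
  }
}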
Re: little confused about SPARK_JAVA_OPTS alternatives
Well, even before spark-submit, the standard way of setting Spark configurations is to create a new SparkConf, set the values in the conf, and pass this to the SparkContext in your application. It's true that this involves "hard-coding" these configurations in your application, but these configurations are intended to be application-level settings anyway, rather than cluster-wide settings. Environment variables are not really ideal for this purpose, though they are an easy way to change these settings quickly.
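A minimal sketch of what this looks like in application code (the app name, master URL, and values are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// application-level settings go on the SparkConf rather than into environment variables
val conf = new SparkConf()
  .setAppName("my-app")                  // placeholder
  .setMaster("spark://master:7077")      // placeholder standalone master URL
  .set("spark.akka.frameSize", "128")    // example application-level settings
  .set("spark.akka.timeout", "300")
val sc = new SparkContext(conf)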
Re: little confused about SPARK_JAVA_OPTS alternatives
Thanks for the detailed answer Andrew, that's helpful.

I think the main thing that's bugging me is that there is no simple way for an admin to always set something on the executors for a production environment (an akka timeout comes to mind). Yes, I could use spark-defaults for that, although that means everything must be submitted through spark-submit, which is fairly new and I am not sure how much we will use it yet. I will look into that some more.
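For reference, a cluster-wide default like the timeout mentioned above would live in conf/spark-defaults.conf on the machine that runs spark-submit, along these lines (a sketch; the values are placeholders and they only apply to applications launched through spark-submit):

spark.akka.timeout     300
spark.akka.frameSize   128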
Re: little confused about SPARK_JAVA_OPTS alternatives
For a JVM application it's not very appealing to me to use spark-submit: my application uses Hadoop, so I should use "hadoop jar", and my application uses Spark, so it should use "spark-submit". If I add a piece of code that uses some other system there will be yet another suggested way to launch it. That's not very scalable, since I can only launch it one way in the end...
Re: little confused about SPARK_JAVA_OPTS alternatives
Hi Koert and Lukasz,

The recommended way of not hard-coding configurations in your application is through conf/spark-defaults.conf, as documented here: http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties. However, this is only applicable to spark-submit, so it may not be useful to you.

Depending on how you launch your Spark applications, you can work around this by manually specifying these configs as -Dspark.x=y in your java command to launch Spark. This is actually how SPARK_JAVA_OPTS used to work before 1.0. Note that spark-submit does essentially the same thing, but sets these properties programmatically by reading from the conf/spark-defaults.conf file and calling System.setProperty("spark.x", "y").

Note that spark.executor.extraJavaOptions is not intended for Spark configuration (see http://spark.apache.org/docs/latest/configuration.html). SPARK_DAEMON_JAVA_OPTS, as you pointed out, is for Spark daemons like the standalone master, worker, and the history server; it is also not intended for Spark configurations to be picked up by Spark executors and drivers. In general, any reference to "java opts" in any variable or config refers to Java options, as the name implies, not Spark configuration. Unfortunately, it just so happened that we used to mix the two in the same environment variable before 1.0.

Is there a reason you're not using spark-submit? Is it for legacy reasons? As of 1.0, most changes to launching Spark applications will be done through spark-submit, so you may miss out on relevant new features or bug fixes.

Andrew
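A concrete sketch of the -Dspark.x=y approach described above (the jar and class names are placeholders):

java -Dspark.akka.frameSize=128 -Dspark.akka.timeout=300 -cp my-app.jar:lib/* com.example.MyApp

As Andrew notes, spark-submit achieves the same effect programmatically, by reading conf/spark-defaults.conf and calling System.setProperty("spark.akka.frameSize", "128") and so on before the application starts.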
Re: little confused about SPARK_JAVA_OPTS alternatives
Still struggling with SPARK_JAVA_OPTS being deprecated. I am using Spark standalone.

For example, say I have an akka timeout setting that I would like to be applied to every piece of the Spark framework (so the Spark master, Spark workers, Spark executor sub-processes, spark-shell, etc.). I used to do that with SPARK_JAVA_OPTS. Now I am unsure.

SPARK_DAEMON_JAVA_OPTS works for the master and workers, but not for the spark-shell, I think? I tried using SPARK_DAEMON_JAVA_OPTS, and it does not seem that useful. For example, for a worker it does not apply the settings to the executor sub-processes, while SPARK_JAVA_OPTS does do that. So it seems like SPARK_JAVA_OPTS is my only way to change settings for the executors, yet it's deprecated?
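For context, the deprecated approach described above would sit in conf/spark-env.sh, roughly like this (a sketch; the timeout value is a placeholder):

# conf/spark-env.sh (pre-1.0 style, now deprecated)
SPARK_JAVA_OPTS="-Dspark.akka.timeout=300"          # used to reach master, workers, executors and spark-shell
SPARK_DAEMON_JAVA_OPTS="-Dspark.akka.timeout=300"   # only the standalone master/worker daemons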
Re: little confused about SPARK_JAVA_OPTS alternatives
Hi,

I tried to use SPARK_JAVA_OPTS in spark-env.sh, as well as the conf/java-opts file, to set additional Java system properties. In this case I could connect to Tachyon without any problem.

However, when I tried setting the executor and driver extraJavaOptions in spark-defaults.conf, it didn't work.

I suspect the root cause may be the following: SparkSubmit doesn't fork an additional JVM to actually run either the driver or executor process, and the additional system properties are set after the JVM is created and other classes are loaded. It may happen that the Tachyon CommonConf class is already loaded, and since it's a singleton it won't pick up any changes to the system properties.

Please let me know what you think. Can I use conf/java-opts, since it's not really documented anywhere?

Best regards
Lukasz
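For reference, conf/java-opts appears to be just a plain file of extra JVM flags that the launch scripts append to the java command line; in this case it would contain something like the following (a sketch; the ZooKeeper hosts are the ones from the earlier message and are placeholders for your own):

-Dtachyon.usezookeeper=true
-Dtachyon.zookeeper.address=hadoop-zoo-1:2181,hadoop-zoo-2:2181,hadoop-zoo-3:2181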
Re: little confused about SPARK_JAVA_OPTS alternatives
Hi,

I'm facing a similar problem. According to http://tachyon-project.org/Running-Spark-on-Tachyon.html, in order to allow the Tachyon client to connect to a Tachyon master in HA mode you need to pass two system properties:

-Dtachyon.zookeeper.address=zookeeperHost1:2181,zookeeperHost2:2181
-Dtachyon.usezookeeper=true

Previously I was doing this with SPARK_JAVA_OPTS. Now I am trying it this way in spark-defaults.conf:

...
spark.executor.extraJavaOptions -Dtachyon.max.columns=1 -Dtachyon.usezookeeper=true -Dtachyon.zookeeper.address=hadoop-zoo-1:2181,hadoop-zoo-2:2181,hadoop-zoo-3:2181
...

However, I am getting an exception that the connection string (the ZK string) is not set:

14/06/11 06:32:15 INFO : initialize(tachyon-ft://hadoop-ha-1:19998/tmp/users.txt, Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml). Connecting to Tachyon: tachyon-ft://hadoop-ha-1:19998/tmp/users.txt
14/06/11 06:32:15 INFO : Trying to connect master @ hadoop-ha-1/15.253.91.167:19998
14/06/11 06:32:15 WARN : tachyon.home is not set. Using /mnt/tachyon_default_home as the default value.
Exception in thread "main" java.lang.NullPointerException: connectionString cannot be null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
        at org.apache.curator.ensemble.fixed.FixedEnsembleProvider.<init>(FixedEnsembleProvider.java:39)
        at org.apache.curator.framework.CuratorFrameworkFactory$Builder.connectString(CuratorFrameworkFactory.java:176)
        at org.apache.curator.framework.CuratorFrameworkFactory.newClient(CuratorFrameworkFactory.java:91)
        at org.apache.curator.framework.CuratorFrameworkFactory.newClient(CuratorFrameworkFactory.java:76)
        at tachyon.LeaderInquireClient.<init>(LeaderInquireClient.java:48)
        at tachyon.LeaderInquireClient.getClient(LeaderInquireClient.java:57)
        at tachyon.master.MasterClient.getMasterAddress(MasterClient.java:96)

Any help appreciated, it's really a blocker for me.
Re: little confused about SPARK_JAVA_OPTS alternatives
Just wondering - how are you launching your application? If you want to set values like this, the right way is to add them to the SparkConf when you create a SparkContext:

val conf = new SparkConf().set("spark.akka.frameSize", "1").setAppName(...).setMaster(...)
val sc = new SparkContext(conf)

- Patrick

On Wed, May 14, 2014 at 9:09 AM, Koert Kuipers wrote:
> I have some settings that I think are relevant for my application. They are
> spark.akka settings, so I assume they are relevant for both the executors and my
> driver program.
>
> I used to do:
> SPARK_JAVA_OPTS="-Dspark.akka.frameSize=1"
>
> Now this is deprecated. The alternatives mentioned are:
> * some spark-submit settings, which are not relevant to me since I do not use
> spark-submit (I launch spark jobs from an existing application)
> * spark.executor.extraJavaOptions to set -X options. I am not sure what -X
> options are, but it doesn't sound like what I need, since it's only for
> executors
> * SPARK_DAEMON_OPTS to set java options for standalone daemons (i.e. master,
> worker); that sounds like I should not use it, since I am trying to change
> settings for an app, not a daemon.
>
> Am I missing the correct setting to use?
> Should I do -Dspark.akka.frameSize=1 on my application launch directly,
> and then also set spark.executor.extraJavaOptions? So basically repeat it?
Re: little confused about SPARK_JAVA_OPTS alternatives
Hey Patrick,

I have a SparkConf I can add them to. I was looking for a way to do this where they are not hardwired within Scala, which is what SPARK_JAVA_OPTS used to do. I guess if I just set -Dspark.akka.frameSize=1 on my java app launch, then it will get picked up by the SparkConf too, right?
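A minimal sketch of that idea (the jar, class name, and master URL are placeholders): as far as I know, a plain new SparkConf() also loads any spark.* Java system properties, so a -D flag on the application's own launch command should be visible without hard-coding the value in Scala.

java -Dspark.akka.frameSize=1 -cp my-app.jar com.example.MyApp

// inside the application
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()            // loadDefaults = true: reads spark.* system properties
  .setAppName("my-app")
  .setMaster("spark://master:7077")   // placeholder master URL
val sc = new SparkContext(conf)
println(sc.getConf.get("spark.akka.frameSize"))   // "1"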