Emre,

As you are keeping the properties file external to the JAR, you need to make sure to submit the properties file as an additional --files argument (or whatever the necessary CLI switch is) so all the executors get a copy of the file along with the JAR.
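Something along these lines ought to do it (a rough, untested sketch -- the path comes from your earlier mail, and I haven't double-checked the exact spark-submit flags):

spark-submit --class com.myModule ... --files /home/emre/data/myModule.properties mymodule.jar

and then on the executor side, inside your tasks (Java):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;
import org.apache.spark.SparkFiles;

// --files copies the file into each executor's working directory;
// SparkFiles.get() resolves the local path by the file's base name.
Properties props = new Properties();
InputStream in = new FileInputStream(SparkFiles.get("myModule.properties"));
try {
    props.load(in);
} finally {
    in.close();
}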
If you know you are going to just put the properties file on HDFS, then why don't you define a custom system setting like "properties.url" and pass it along? (This is for spark-shell, the only CLI string I have available at the moment:)

spark-shell --jars $JAR_NAME \
  --conf 'properties.url=hdfs://config/stuff.properties' \
  --conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'

... then load the properties file during initialization by examining the "properties.url" system setting. (There are a couple of rough sketches of what I mean at the very bottom of this mail, below the quoted thread.)

I'd still strongly recommend Typesafe Config, as it makes this a lot less painful, and I know for certain you can place your *.conf at a URL (using -Dconfig.url=...), though it probably won't work with an HDFS URL.

On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas <gerard.m...@gmail.com> wrote:

> +1 for TypeSafe config
> Our practice is to include all spark properties under a 'spark' entry in
> the config file alongside job-specific configuration.
>
> A config file would look like:
>
> spark {
>   master = ""
>   cleaner.ttl = 123456
>   ...
> }
> job {
>   context {
>     src = "foo"
>     action = "barAction"
>   }
>   prop1 = "val1"
> }
>
> Then, to create our Spark context, we transparently pass the spark section
> to a SparkConf instance. This idiom will instantiate the context with the
> spark-specific configuration:
>
> sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
>
> And we can make use of the config object everywhere else.
>
> We use the override model of the Typesafe config: reasonable defaults go
> in the reference.conf (within the jar), environment-specific overrides go
> in the application.conf (alongside the job jar), and hacks are passed with
> -Dprop=value :-)
>
> -kr, Gerard.
>
> On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:
>
>> I've decided to try
>>
>> spark-submit ... --conf
>> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>>
>> But when I try to retrieve the value of propertiesFile via
>>
>> System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));
>>
>> I get NULL:
>>
>> propertiesFile : null
>>
>> Interestingly, when I run spark-submit with --verbose, I see that it prints:
>>
>> spark.driver.extraJavaOptions ->
>> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>>
>> I couldn't understand why I couldn't get to the value of "propertiesFile"
>> by using the standard System.getProperty method. (I can use new
>> SparkConf().get("spark.driver.extraJavaOptions") and manually parse it to
>> retrieve the value, but I'd like to know why I cannot retrieve that value
>> using the System.getProperty method.)
>>
>> Any ideas?
>>
>> If I can achieve what I've described above properly, I plan to pass a
>> properties file that resides on HDFS, so that it will be available to my
>> driver program wherever that program runs.
>>
>> --
>> Emre
>>
>> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <charles.fed...@gmail.com> wrote:
>>
>>> I haven't actually tried mixing non-Spark settings into the Spark properties.
>>> Instead I package my properties into the jar and use the Typesafe
>>> Config[1] - v1.2.1 - library (along with Ficus[2], which is Scala
>>> specific) to get at my properties:
>>>
>>> Properties file: src/main/resources/integration.conf
>>>
>>> (Below, $ENV might be set to either "integration" or "prod"[3].)
>>>
>>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>>   --conf 'config.resource=$ENV.conf' \
>>>   --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>>
>>> Since the properties file is packaged up with the JAR, I don't have to
>>> worry about sending the file separately to all of the slave nodes.
>>> Typesafe Config is written in Java, so it will work even if you're not
>>> using Scala. (Typesafe Config also has the advantage of being extremely
>>> easy to integrate with code that is using Java Properties today.)
>>>
>>> If you instead want to send the file separately from the JAR and you use
>>> the Typesafe Config library, you can specify "config.file" instead of
>>> "config.resource"; though I'd point you to [3] below if you want to make
>>> your development life easier.
>>>
>>> 1. https://github.com/typesafehub/config
>>> 2. https://github.com/ceedubs/ficus
>>> 3. http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>>>
>>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <emre.sev...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm using Spark 1.2.1 and have a module.properties file, and in it I
>>>> have non-Spark properties as well as Spark properties, e.g.:
>>>>
>>>> job.output.dir=file:///home/emre/data/mymodule/out
>>>>
>>>> I'm trying to pass it to spark-submit via:
>>>>
>>>> spark-submit --class com.myModule --master local[4] --deploy-mode
>>>> client --verbose --properties-file /home/emre/data/mymodule.properties
>>>> mymodule.jar
>>>>
>>>> And I thought I could read the value of my non-Spark property, namely
>>>> job.output.dir, by using:
>>>>
>>>> SparkConf sparkConf = new SparkConf();
>>>> final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>>>>
>>>> But it gives me an exception:
>>>>
>>>> Exception in thread "main" java.util.NoSuchElementException: job.output.dir
>>>>
>>>> Is it not possible to mix Spark and non-Spark properties in a single
>>>> .properties file, then pass it via --properties-file, and then get the
>>>> values of those non-Spark properties via SparkConf?
>>>>
>>>> Or is there another object / method to retrieve the values for those
>>>> non-Spark properties?
>>>>
>>>> --
>>>> Emre Sevinç
>>>
>>
>> --
>> Emre Sevinc
>
>
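P.S. Here is a rough, untested sketch (Java) of the kind of initialization I mean for the "properties.url" idea above. I'm writing the Hadoop FileSystem calls from memory, so treat it as a starting point rather than working code:

import java.io.InputStream;
import java.net.URI;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// -Dproperties.url=hdfs://... arrives via the extraJavaOptions settings shown above.
String url = System.getProperty("properties.url");
Properties props = new Properties();
FileSystem fs = FileSystem.get(URI.create(url), new Configuration());
InputStream in = fs.open(new Path(url));
try {
    props.load(in);
} finally {
    in.close();
}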
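P.P.S. If you go the Typesafe Config route from my earlier mail (quoted above), the loading side is just ConfigFactory.load(), which honors -Dconfig.resource / -Dconfig.file / -Dconfig.url automatically. Roughly (the "job.output.dir" key is just the one from your example):

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

// Loads reference.conf from the jar, then overlays whatever file
// -Dconfig.resource / -Dconfig.file / -Dconfig.url points at.
Config config = ConfigFactory.load();
String outputDir = config.getString("job.output.dir");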