+1 for Typesafe Config. Our practice is to include all Spark properties under a 'spark' entry in the config file, alongside the job-specific configuration:
A config file would look like:

    spark {
      master = ""
      cleaner.ttl = 123456
      ...
    }
    job {
      context {
        src = "foo"
        action = "barAction"
      }
      prop1 = "val1"
    }

Then, to create our Spark context, we transparently pass the 'spark' section to a SparkConf instance. This idiom instantiates the context with the Spark-specific configuration:

    sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))

and we can make use of the config object everywhere else. We use the override model of Typesafe Config: reasonable defaults go in reference.conf (within the jar), environment-specific overrides go in application.conf (alongside the job jar), and hacks are passed with -Dprop=value :-)

-kr, Gerard.

On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:

> I've decided to try
>
>     spark-submit ... --conf "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>
> But when I try to retrieve the value of propertiesFile via
>
>     System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));
>
> I get null:
>
>     propertiesFile : null
>
> Interestingly, when I run spark-submit with --verbose, I see that it prints:
>
>     spark.driver.extraJavaOptions -> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>
> I couldn't understand why I couldn't get the value of "propertiesFile" using the
> standard System.getProperty method. (I can use new
> SparkConf().get("spark.driver.extraJavaOptions") and parse it manually to
> retrieve the value, but I'd like to know why System.getProperty doesn't work.)
>
> Any ideas?
>
> If I can achieve what I've described above, I plan to pass a properties file
> that resides on HDFS, so that it will be available to my driver program
> wherever that program runs.
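A likely reason System.getProperty returns null here: in client deploy mode the driver JVM is already running by the time spark.driver.extraJavaOptions is read, so the -D flag never becomes a system property of the driver (passing --driver-java-options on the spark-submit command line, or setting the option in spark-defaults.conf, is the usual way to get it in before launch). The manual parse Emre mentions could be sketched as below; the class name and the hard-coded option string are illustrative, the string standing in for the value of new SparkConf().get("spark.driver.extraJavaOptions"):

```java
import java.util.HashMap;
import java.util.Map;

public class ExtraJavaOptionsParser {

    /** Extracts -Dkey=value pairs from a JVM options string. */
    static Map<String, String> parseDFlags(String javaOptions) {
        Map<String, String> props = new HashMap<>();
        for (String token : javaOptions.trim().split("\\s+")) {
            if (token.startsWith("-D")) {
                // Split "key=value" on the first '=' only, so values may contain '='.
                String[] kv = token.substring(2).split("=", 2);
                props.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // In a real job this string would come from:
        //   new SparkConf().get("spark.driver.extraJavaOptions")
        String opts = "-DpropertiesFile=/home/emre/data/myModule.properties -Xmx512m";
        System.out.println(parseDFlags(opts).get("propertiesFile"));
        // prints /home/emre/data/myModule.properties
    }
}
```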
>
> --
> Emre
>
> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <charles.fed...@gmail.com> wrote:
>
>> I haven't actually tried mixing non-Spark settings into the Spark
>> properties. Instead I package my properties into the jar and use the
>> Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
>> specific) to get at my properties:
>>
>> Properties file: src/main/resources/integration.conf
>>
>> (below, $ENV might be set to either "integration" or "prod"[3])
>>
>>     ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>       --conf 'config.resource=$ENV.conf' \
>>       --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>
>> Since the properties file is packaged up with the JAR, I don't have to
>> worry about sending the file separately to all of the slave nodes. Typesafe
>> Config is written in Java, so it will work even if you're not using Scala.
>> (Typesafe Config also has the advantage of being extremely easy to
>> integrate with code that is using Java Properties today.)
>>
>> If you instead want to send the file separately from the JAR and you use
>> the Typesafe Config library, you can specify "config.file" instead of
>> "config.resource"; though I'd point you to [3] below if you want to make
>> your development life easier.
>>
>> 1. https://github.com/typesafehub/config
>> 2. https://github.com/ceedubs/ficus
>> 3.
>> http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>>
>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <emre.sev...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I'm using Spark 1.2.1 and have a module.properties file, and in it I
>>> have non-Spark properties as well as Spark properties, e.g.:
>>>
>>>     job.output.dir=file:///home/emre/data/mymodule/out
>>>
>>> I'm trying to pass it to spark-submit via:
>>>
>>>     spark-submit --class com.myModule --master local[4] --deploy-mode client \
>>>       --verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar
>>>
>>> And I thought I could read the value of my non-Spark property, namely
>>> job.output.dir, by using:
>>>
>>>     SparkConf sparkConf = new SparkConf();
>>>     final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>>>
>>> But it gives me an exception:
>>>
>>>     Exception in thread "main" java.util.NoSuchElementException: job.output.dir
>>>
>>> Is it not possible to mix Spark and non-Spark properties in a single
>>> .properties file, pass it via --properties-file, and then get the values
>>> of those non-Spark properties via SparkConf?
>>>
>>> Or is there another object / method to retrieve the values of those
>>> non-Spark properties?
>>>
>>> --
>>> Emre Sevinç
>
> --
> Emre Sevinc
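On the original question: as far as I know, spark-submit's --properties-file loader keeps only keys starting with "spark." (warning about and dropping the rest), which is why sparkConf.get("job.output.dir") throws NoSuchElementException. One workaround is to read the same file directly with plain java.util.Properties and keep SparkConf for Spark keys only. A minimal stdlib sketch; the class name and the inline file contents are illustrative, and in a real job you would pass a Reader over the actual file instead of a StringReader:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

public class MixedPropertiesDemo {

    /** Loads a .properties file that mixes Spark and job-specific keys. */
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new RuntimeException("could not read properties", e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for: new FileReader("/home/emre/data/mymodule.properties")
        String file = "spark.master=local[4]\n"
                    + "job.output.dir=file:///home/emre/data/mymodule/out\n";
        Properties props = load(new StringReader(file));
        // Non-Spark keys are read here directly, not via SparkConf:
        System.out.println(props.getProperty("job.output.dir"));
        // prints file:///home/emre/data/mymodule/out
    }
}
```

This sidesteps spark-submit's filtering entirely, at the cost of shipping the file (or an HDFS path to it) to wherever the driver runs.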