+1 for Typesafe Config
Our practice is to include all Spark properties under a 'spark' entry in
the config file, alongside the job-specific configuration.

A config file would look like:
spark {
    master = ""
    cleaner.ttl = 123456
    ...
}
job {
    context {
        src = "foo"
        action = "barAction"
    }
    prop1 = "val1"
}

Then, to create our Spark context, we transparently pass the 'spark' section
to a SparkConf instance. This idiom instantiates the context with the
Spark-specific configuration:

sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))

And we can make use of the config object everywhere else.
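
For reference, configToStringSeq is just a small helper of ours; roughly, it
looks something like this (a sketch assuming Scala, Typesafe Config 1.2.x and
Spark 1.x):

    import com.typesafe.config.{Config, ConfigFactory}
    import org.apache.spark.{SparkConf, SparkContext}
    import scala.collection.JavaConverters._

    // Flatten a Config into (key, value) pairs that SparkConf.setAll accepts.
    def configToStringSeq(config: Config): Seq[(String, String)] =
      config.entrySet().asScala.toSeq.map { entry =>
        entry.getKey -> entry.getValue.unwrapped().toString
      }

    val config = ConfigFactory.load()
    val sparkConfig = new SparkConf()
    // getConfig("spark") strips the prefix and atPath("spark") puts it back,
    // so the keys come out as "spark.master", "spark.cleaner.ttl", etc.
    sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
    val sc = new SparkContext(sparkConfig)

    // Job-specific settings stay on the config object:
    val src = config.getString("job.context.src")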

We use the override model of Typesafe Config: reasonable defaults go in
reference.conf (packaged within the jar), environment-specific overrides go
in application.conf (alongside the job jar), and hacks are passed with
-Dprop=value :-)
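
To illustrate the precedence (the values here are made up),
ConfigFactory.load() resolves system properties first, then application.conf,
then reference.conf:

    import com.typesafe.config.ConfigFactory

    // reference.conf (in the jar):          spark { cleaner.ttl = 3600 }
    // application.conf (next to the jar):   spark { cleaner.ttl = 7200 }
    // command line:                         -Dspark.cleaner.ttl=60

    val config = ConfigFactory.load()
    config.getInt("spark.cleaner.ttl")  // 60: -D beats application.conf, which beats reference.conf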


-kr, Gerard.


On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:

> I've decided to try
>
>   spark-submit ... --conf
> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>
> But when I try to retrieve the value of propertiesFile via
>
>    System.err.println("propertiesFile : " +
> System.getProperty("propertiesFile"));
>
> I get NULL:
>
>    propertiesFile : null
>
> Interestingly, when I run spark-submit with --verbose, I see that it
> prints:
>
>   spark.driver.extraJavaOptions ->
> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>
> I couldn't understand why I couldn't get to the value of "propertiesFile"
> by using the standard System.getProperty method. (I can use new
> SparkConf().get("spark.driver.extraJavaOptions") and manually parse it to
> retrieve the value, but I'd like to know why I cannot retrieve that value
> using the System.getProperty method.)
>
> Any ideas?
>
> If I can achieve what I've described above properly, I plan to pass a
> properties file that resides on HDFS, so that it will be available to my
> driver program wherever that program runs.
>
> --
> Emre
>
>
>
>
> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <charles.fed...@gmail.com>
> wrote:
>
>> I haven't actually tried mixing non-Spark settings into the Spark
>> properties. Instead I package my properties into the jar and use the
>> Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
>> specific) to get at my properties:
>>
>> Properties file: src/main/resources/integration.conf
>>
>> (below $ENV might be set to either "integration" or "prod"[3])
>>
>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>     --conf 'config.resource=$ENV.conf' \
>>     --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>
>> Since the properties file is packaged up with the JAR I don't have to
>> worry about sending the file separately to all of the slave nodes. Typesafe
>> Config is written in Java so it will work if you're not using Scala. (The
>> Typesafe Config also has the advantage of being extremely easy to integrate
>> with code that is using Java Properties today.)
>>
>> If you instead want to send the file separately from the JAR and you use
>> the Typesafe Config library, you can specify "config.file" instead of
>> ".resource"; though I'd point you to [3] below if you want to make your
>> development life easier.
>>
>> 1. https://github.com/typesafehub/config
>> 2. https://github.com/ceedubs/ficus
>> 3.
>> http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>>
>>
>>
>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <emre.sev...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm using Spark 1.2.1 and have a module.properties file, and in it I
>>> have non-Spark properties, as well as Spark properties, e.g.:
>>>
>>>    job.output.dir=file:///home/emre/data/mymodule/out
>>>
>>> I'm trying to pass it to spark-submit via:
>>>
>>>    spark-submit --class com.myModule --master local[4] --deploy-mode
>>> client --verbose --properties-file /home/emre/data/mymodule.properties
>>> mymodule.jar
>>>
>>> And I thought I could read the value of my non-Spark property, namely,
>>> job.output.dir by using:
>>>
>>>     SparkConf sparkConf = new SparkConf();
>>>     final String validatedJSONoutputDir =
>>> sparkConf.get("job.output.dir");
>>>
>>> But it gives me an exception:
>>>
>>>     Exception in thread "main" java.util.NoSuchElementException:
>>> job.output.dir
>>>
>>> Is it not possible to mix Spark and non-Spark properties in a single
>>> .properties file, then pass it via --properties-file and then get the
>>> values of those non-Spark properties via SparkConf?
>>>
>>> Or is there another object / method to retrieve the values for those
>>> non-Spark properties?
>>>
>>>
>>> --
>>> Emre Sevinç
>>>
>>
>
>
> --
> Emre Sevinc
>
