Emre,

Since you are keeping the properties file external to the JAR, make sure you
also submit it with --files (or whatever the exact CLI switch is) so that all
the executors get a copy of the file along with the JAR.
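
For example, something along these lines (untested; the class, file, and jar
names are just placeholders):

spark-submit --class com.myModule \
    --files /home/emre/data/myModule.properties \
    myModule.jar

The executors can then open the file by its bare name in their working
directory (or look it up via SparkFiles.get).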

If you know you are going to put the properties file on HDFS anyway, why not
define a custom system property like "properties.url" and pass it along:

(this example is for spark-shell, the only CLI invocation I have handy at the
moment:)

spark-shell --jars $JAR_NAME \
    --driver-java-options '-Dproperties.url=hdfs://config/stuff.properties' \
    --conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'

... then load the properties file during initialization by reading the
properties.url system property.
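
A rough sketch of that load step (untested; assumes sc is your SparkContext,
as in spark-shell, and uses the Hadoop FileSystem API so hdfs:// URLs work):

  import java.net.URI
  import java.util.Properties
  import org.apache.hadoop.fs.{FileSystem, Path}

  // location we passed in via -Dproperties.url
  val propsUrl = System.getProperty("properties.url")

  // open it through the Hadoop FileSystem API and load it as plain Properties
  val props = new Properties()
  val fs = FileSystem.get(URI.create(propsUrl), sc.hadoopConfiguration)
  val in = fs.open(new Path(propsUrl))
  try props.load(in) finally in.close()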

I'd still strongly recommend Typesafe Config, as it makes this a lot less
painful. I know for certain you can place your *.conf at a URL (via
-Dconfig.url=), though that probably won't work with an HDFS URL.
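
With Typesafe Config that would look something like (untested; the URL is a
placeholder):

spark-shell --jars $JAR_NAME \
    --driver-java-options '-Dconfig.url=http://some-host/stuff.conf' \
    --conf 'spark.executor.extraJavaOptions=-Dconfig.url=http://some-host/stuff.conf'

and then a plain ConfigFactory.load() in your code picks it up.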


On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas <gerard.m...@gmail.com> wrote:

> +1 for Typesafe Config.
> Our practice is to include all Spark properties under a 'spark' entry in
> the config file, alongside job-specific configuration:
>
> A config file would look like:
> spark {
>      master = ""
>      cleaner.ttl = 123456
>      ...
> }
> job {
>     context {
>         src = "foo"
>         action = "barAction"
>     }
>     prop1 = "val1"
> }
>
> Then, to create our Spark context, we transparently pass the spark section
> to a SparkConf instance. This idiom instantiates the context with the
> Spark-specific configuration:
>
>
> sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
>
> And we can make use of the config object everywhere else.
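>
> (configToStringSeq is our own little helper, not part of Spark or Typesafe
> Config; roughly it looks like this:)
>
>   import com.typesafe.config.Config
>   import scala.collection.JavaConverters._
>
>   def configToStringSeq(config: Config): Seq[(String, String)] =
>     config.entrySet().asScala.toSeq.map { e =>
>       // entrySet gives fully qualified keys, e.g. "spark.cleaner.ttl"
>       e.getKey -> e.getValue.unwrapped().toString
>     }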
>
> We use the override model of Typesafe Config: reasonable defaults go in
> reference.conf (within the jar), environment-specific overrides go in
> application.conf (alongside the job jar), and hacks are passed with
> -Dprop=value :-)
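>
> For example, a submit with the environment override plus an ad-hoc hack
> might look like this (names are illustrative, and it assumes you launch
> from the directory holding application.conf):
>
>   spark-submit --class com.example.MyJob \
>       --driver-java-options '-Dconfig.file=application.conf -Djob.prop1=hacked' \
>       my-job.jar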
>
>
> -kr, Gerard.
>
>
> On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc <emre.sev...@gmail.com>
> wrote:
>
>> I've decided to try
>>
>>   spark-submit ... --conf
>> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>>
>> But when I try to retrieve the value of propertiesFile via
>>
>>    System.err.println("propertiesFile : " +
>> System.getProperty("propertiesFile"));
>>
>> I get NULL:
>>
>>    propertiesFile : null
>>
>> Interestingly, when I run spark-submit with --verbose, I see that it
>> prints:
>>
>>   spark.driver.extraJavaOptions ->
>> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>>
>> I couldn't figure out why I can't get the value of "propertiesFile" using
>> the standard System.getProperty method. (I can use new
>> SparkConf().get("spark.driver.extraJavaOptions") and manually parse it to
>> retrieve the value, but I'd like to know why I cannot retrieve that value
>> via System.getProperty.)
>>
>> Any ideas?
>>
>> If I can achieve what I've described above properly, I plan to pass a
>> properties file that resides on HDFS, so that it will be available to my
>> driver program wherever that program runs.
>>
>> --
>> Emre
>>
>>
>>
>>
>> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <charles.fed...@gmail.com
>> > wrote:
>>
>>> I haven't actually tried mixing non-Spark settings into the Spark
>>> properties. Instead I package my properties into the jar and use the
>>> Typesafe Config[1] library (v1.2.1), along with the Scala-specific
>>> Ficus[2], to get at my properties:
>>>
>>> Properties file: src/main/resources/integration.conf
>>>
>>> (below $ENV might be set to either "integration" or "prod"[3])
>>>
>>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>>     --conf 'config.resource=$ENV.conf' \
>>>     --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>>
>>> Since the properties file is packaged up with the JAR, I don't have to
>>> worry about sending the file separately to all of the slave nodes.
>>> Typesafe Config is written in Java, so it will work even if you're not
>>> using Scala. (Typesafe Config also has the advantage of being extremely
>>> easy to integrate with code that uses Java Properties today.)
>>>
>>> If you instead want to send the file separately from the JAR and you use
>>> the Typesafe Config library, you can specify "config.file" instead of
>>> "config.resource"; though I'd point you to [3] below if you want to make
>>> your development life easier.
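>>>
>>> On the code side, loading is just ConfigFactory.load(), which honors
>>> -Dconfig.resource and -Dconfig.file; roughly (the key name here is only
>>> an example):
>>>
>>>   import com.typesafe.config.ConfigFactory
>>>
>>>   // merges reference.conf with whatever -Dconfig.resource/-Dconfig.file points at
>>>   val config = ConfigFactory.load()
>>>   val outputDir = config.getString("job.output.dir")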
>>>
>>> 1. https://github.com/typesafehub/config
>>> 2. https://github.com/ceedubs/ficus
>>> 3.
>>> http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>>>
>>>
>>>
>>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <emre.sev...@gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm using Spark 1.2.1 and have a module.properties file, and in it I
>>>> have non-Spark properties, as well as Spark properties, e.g.:
>>>>
>>>>    job.output.dir=file:///home/emre/data/mymodule/out
>>>>
>>>> I'm trying to pass it to spark-submit via:
>>>>
>>>>    spark-submit --class com.myModule --master local[4] --deploy-mode
>>>> client --verbose --properties-file /home/emre/data/mymodule.properties
>>>> mymodule.jar
>>>>
>>>> And I thought I could read the value of my non-Spark property, namely,
>>>> job.output.dir by using:
>>>>
>>>>     SparkConf sparkConf = new SparkConf();
>>>>     final String validatedJSONoutputDir =
>>>> sparkConf.get("job.output.dir");
>>>>
>>>> But it gives me an exception:
>>>>
>>>>     Exception in thread "main" java.util.NoSuchElementException:
>>>> job.output.dir
>>>>
>>>> Is it not possible to mix Spark and non-Spark properties in a single
>>>> .properties file, then pass it via --properties-file and then get the
>>>> values of those non-Spark properties via SparkConf?
>>>>
>>>> Or is there another object / method to retrieve the values for those
>>>> non-Spark properties?
>>>>
>>>>
>>>> --
>>>> Emre Sevinç
>>>>
>>>
>>
>>
>> --
>> Emre Sevinc
>>
>
>
