I've decided to try

  spark-submit ... --conf \
    "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"

But when I try to retrieve the value of propertiesFile via

   System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));

I get null:

   propertiesFile : null

Interestingly, when I run spark-submit with --verbose, I see that it prints:

  spark.driver.extraJavaOptions -> -DpropertiesFile=/home/emre/data/myModule.properties

I don't understand why I can't get at the value of "propertiesFile" using
the standard System.getProperty method. (I can use new
SparkConf().get("spark.driver.extraJavaOptions") and parse it manually to
retrieve the value, but I'd like to know why System.getProperty doesn't
work.)
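
For reference, here is roughly what that manual parsing looks like (a
rough sketch; it assumes the options in spark.driver.extraJavaOptions are
space-separated):

    SparkConf conf = new SparkConf();
    // e.g. "-DpropertiesFile=/home/emre/data/myModule.properties"
    String extraJavaOptions = conf.get("spark.driver.extraJavaOptions", "");
    String propertiesFile = null;
    for (String opt : extraJavaOptions.split("\\s+")) {
        if (opt.startsWith("-DpropertiesFile=")) {
            propertiesFile = opt.substring("-DpropertiesFile=".length());
        }
    }
    System.err.println("propertiesFile : " + propertiesFile);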

Any ideas?

If I can get this working properly, I plan to pass a properties file that
resides on HDFS, so that it will be available to my driver program
wherever that program runs.
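
For the HDFS case I have something along these lines in mind (a rough
sketch using the standard Hadoop FileSystem API; the path is just an
example):

    import java.io.InputStream;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Properties props = new Properties();
    Path path = new Path("hdfs:///data/myModule.properties"); // example path
    try (FileSystem fs = FileSystem.get(new Configuration());
         InputStream in = fs.open(path)) {
        props.load(in);
    }
    String jobOutputDir = props.getProperty("job.output.dir");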

--
Emre




On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <charles.fed...@gmail.com>
wrote:

> I haven't actually tried mixing non-Spark settings into the Spark
> properties. Instead I package my properties into the jar and use the
> Typesafe Config[1] library (v1.2.1), along with Ficus[2], which is
> Scala-specific, to get at my properties:
>
> Properties file: src/main/resources/integration.conf
>
> (below $ENV might be set to either "integration" or "prod"[3])
>
> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>     --conf 'config.resource=$ENV.conf' \
>     --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>
> Since the properties file is packaged up with the JAR, I don't have to
> worry about sending the file separately to all of the slave nodes.
> Typesafe Config is written in Java, so it will work even if you're not
> using Scala. (Typesafe Config also has the advantage of being extremely
> easy to integrate with code that is using Java Properties today.)
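>
> From Java, reading a value is something like this (a minimal sketch; the
> key name is just illustrative):
>
>     import com.typesafe.config.Config;
>     import com.typesafe.config.ConfigFactory;
>
>     // ConfigFactory.load() honors -Dconfig.resource=... set at JVM start.
>     Config config = ConfigFactory.load();
>     String outputDir = config.getString("job.output.dir"); // illustrative key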
>
> If you instead want to send the file separately from the JAR and you use
> the Typesafe Config library, you can specify "config.file" instead of
> "config.resource", though I'd point you to [3] below if you want to make
> your development life easier.
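>
> For the file-based variant, the option would look roughly like this (the
> path is just an example):
>
>     --conf 'spark.executor.extraJavaOptions=-Dconfig.file=/root/integration.conf'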
>
> 1. https://github.com/typesafehub/config
> 2. https://github.com/ceedubs/ficus
> 3. http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>
>
>
> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <emre.sev...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm using Spark 1.2.1 and have a mymodule.properties file that contains
>> both non-Spark and Spark properties, e.g.:
>>
>>    job.output.dir=file:///home/emre/data/mymodule/out
>>
>> I'm trying to pass it to spark-submit via:
>>
>>    spark-submit --class com.myModule --master local[4] \
>>      --deploy-mode client --verbose \
>>      --properties-file /home/emre/data/mymodule.properties mymodule.jar
>>
>> And I thought I could read the value of my non-Spark property, namely
>> job.output.dir, by using:
>>
>>     SparkConf sparkConf = new SparkConf();
>>     final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>>
>> But it gives me an exception:
>>
>>     Exception in thread "main" java.util.NoSuchElementException: job.output.dir
>>
>> Is it not possible to mix Spark and non-Spark properties in a single
>> .properties file, then pass it via --properties-file and then get the
>> values of those non-Spark properties via SparkConf?
>>
>> Or is there another object / method to retrieve the values for those
>> non-Spark properties?
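>>
>> (Otherwise I suppose I could load the file myself with plain
>> java.util.Properties, roughly like this, but then I'd be bypassing
>> spark-submit's --properties-file handling entirely:)
>>
>>     import java.io.FileInputStream;
>>     import java.util.Properties;
>>
>>     Properties props = new Properties();
>>     try (FileInputStream in =
>>             new FileInputStream("/home/emre/data/mymodule.properties")) {
>>         props.load(in);
>>     }
>>     final String validatedJSONoutputDir = props.getProperty("job.output.dir");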
>>
>>
>> --
>> Emre Sevinç
>>
>


