Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-18 Thread Emre Sevinc
Thanks to everyone for suggestions and explanations.

Currently I've started to experiment with the following scenario, that
seems to work for me:

- Put the properties file on a web server so that it is centrally available
- Pass it to the Spark driver program via --conf 'propertiesFile=http://myWebServer.com/mymodule.properties'
- And then load the configuration using Apache Commons Configuration:

import org.apache.commons.configuration.PropertiesConfiguration;

PropertiesConfiguration config = new PropertiesConfiguration();
config.load(System.getProperty("propertiesFile"));

Using the method described above, I no longer need to compile my properties
file statically into the über JAR. I can modify the file on the web server,
and when I submit my application via spark-submit, passing the URL of the
properties file, the driver program reads the contents of that file once,
retrieves the values of the keys, and continues.
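
(For reference, a minimal Scala sketch of this flow, assuming the URL reaches
the driver JVM as a -DpropertiesFile system property, for example via
spark-submit's --driver-java-options, and assuming Commons Configuration 1.x
is on the classpath; the key name below is just an example from this thread:)

    import org.apache.commons.configuration.PropertiesConfiguration

    // resolves the HTTP URL once at driver startup
    val config = new PropertiesConfiguration()
    config.load(System.getProperty("propertiesFile"))
    val outputDir = config.getString("job.output.dir")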

PS: I've opted for Apache Commons Configuration because it is already among
the many dependencies in my pom.xml and I did not want to pull in another
library, even though the Typesafe Config library seems to be a powerful and
flexible choice, too.

--
Emre



On Tue, Feb 17, 2015 at 6:12 PM, Charles Feduke charles.fed...@gmail.com
wrote:

 Emre,

 As you are keeping the properties file external to the JAR you need to
 make sure to submit the properties file as an additional --files (or
 whatever the necessary CLI switch is) so all the executors get a copy of
 the file along with the JAR.

 If you know you are going to just put the properties file on HDFS then why
 don't you define a custom system setting like properties.url and pass it
 along:

 (this is for Spark shell, the only CLI string I have available at the
 moment:)

 spark-shell --jars $JAR_NAME \
 --conf 'properties.url=hdfs://config/stuff.properties' \
 --conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'

 ... then load the properties file during initialization by examining the
 properties.url system setting.

 I'd still strongly recommend Typesafe Config as it makes this a lot less
 painful, and I know for certain you can place your *.conf at a URL (using
 the -Dconfig.url=) though it probably won't work with an HDFS URL.



 On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas gerard.m...@gmail.com
 wrote:

 +1 for TypeSafe config
 Our practice is to include all spark properties under a 'spark' entry in
 the config file alongside job-specific configuration:

 A config file would look like:
 spark {
   master = 
   cleaner.ttl = 123456
   ...
 }
 job {
   context {
     src = foo
     action = barAction
   }
   prop1 = val1
 }

 Then, to create our Spark context, we transparently pass the spark
 section to a SparkConf instance.
 This idiom will instantiate the context with the spark specific
 configuration:


 sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))

 And we can make use of the config object everywhere else.

 We use the override model of the typesafe config: reasonable defaults go
 in the reference.conf (within the jar). Environment-specific overrides go
 in the application.conf (alongside the job jar) and hacks are passed with
 -Dprop=value :-)


 -kr, Gerard.


 On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:

 I've decided to try

   spark-submit ... --conf
 spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties

 But when I try to retrieve the value of propertiesFile via

    System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));

 I get NULL:

propertiesFile : null

 Interestingly, when I run spark-submit with --verbose, I see that it
 prints:

   spark.driver.extraJavaOptions -
 -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties

 I couldn't understand why I couldn't get to the value of
 "propertiesFile" by using the standard System.getProperty method. (I can use
 new SparkConf().get("spark.driver.extraJavaOptions") and manually parse
 it to retrieve the value, but I'd like to know why I cannot retrieve that
 value using the System.getProperty method.)

 Any ideas?

 If I can achieve what I've described above properly, I plan to pass a
 properties file that resides on HDFS, so that it will be available to my
 driver program wherever that program runs.

 --
 Emre




 On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke 
 charles.fed...@gmail.com wrote:

 I haven't actually tried mixing non-Spark settings into the Spark
 properties. Instead I package my properties into the jar and use the
 Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
 specific) to get at my properties:

 Properties file: src/main/resources/integration.conf

 (below $ENV might be set to either integration or prod[3])

 ssh -t root@$HOST /root/spark/bin/spark-shell --jars /root/$JAR_NAME \
 --conf 'config.resource=$ENV.conf' \
 --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Charles Feduke
Emre,

As you are keeping the properties file external to the JAR you need to make
sure to submit the properties file as an additional --files (or whatever
the necessary CLI switch is) so all the executors get a copy of the file
along with the JAR.

If you know you are going to just put the properties file on HDFS then why
don't you define a custom system setting like properties.url and pass it
along:

(this is for Spark shell, the only CLI string I have available at the
moment:)

spark-shell --jars $JAR_NAME \
--conf 'properties.url=hdfs://config/stuff.properties' \
--conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'

... then load the properties file during initialization by examining the
properties.url system setting.
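
(A minimal Scala sketch of that initialization step, assuming the Hadoop client
libraries are on the classpath so an hdfs:// URL can be opened; this is only an
illustration, not a specific API:)

    import java.util.Properties
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    // read the location from the -Dproperties.url system property set above
    val url = sys.props.getOrElse("properties.url", sys.error("properties.url not set"))
    // open it through the Hadoop FileSystem API, which understands hdfs:// paths
    val fs = FileSystem.get(new java.net.URI(url), new Configuration())
    val in = fs.open(new Path(url))
    val props = new Properties()
    try props.load(in) finally in.close()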

I'd still strongly recommend Typesafe Config as it makes this a lot less
painful, and I know for certain you can place your *.conf at a URL (using
the -Dconfig.url=) though it probably won't work with an HDFS URL.


On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas gerard.m...@gmail.com wrote:

 +1 for TypeSafe config
 Our practice is to include all spark properties under a 'spark' entry in
 the config file alongside job-specific configuration:

 A config file would look like:
 spark {
   master = 
   cleaner.ttl = 123456
   ...
 }
 job {
   context {
     src = foo
     action = barAction
   }
   prop1 = val1
 }

 Then, to create our Spark context, we transparently pass the spark section
 to a SparkConf instance.
 This idiom will instantiate the context with the spark specific
 configuration:


 sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))

 And we can make use of the config object everywhere else.

 We use the override model of the typesafe config: reasonable defaults go
 in the reference.conf (within the jar). Environment-specific overrides go
 in the application.conf (alongside the job jar) and hacks are passed with
 -Dprop=value :-)


 -kr, Gerard.


 On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:

 I've decided to try

   spark-submit ... --conf
 spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties

 But when I try to retrieve the value of propertiesFile via

    System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));

 I get NULL:

propertiesFile : null

 Interestingly, when I run spark-submit with --verbose, I see that it
 prints:

   spark.driver.extraJavaOptions -
 -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties

 I couldn't understand why I couldn't get to the value of "propertiesFile"
 by using the standard System.getProperty method. (I can use new
 SparkConf().get("spark.driver.extraJavaOptions") and manually parse it
 to retrieve the value, but I'd like to know why I cannot retrieve that
 value using the System.getProperty method.)

 Any ideas?

 If I can achieve what I've described above properly, I plan to pass a
 properties file that resides on HDFS, so that it will be available to my
 driver program wherever that program runs.

 --
 Emre




 On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke charles.fed...@gmail.com
  wrote:

 I haven't actually tried mixing non-Spark settings into the Spark
 properties. Instead I package my properties into the jar and use the
 Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
 specific) to get at my properties:

 Properties file: src/main/resources/integration.conf

 (below $ENV might be set to either integration or prod[3])

 ssh -t root@$HOST /root/spark/bin/spark-shell --jars /root/$JAR_NAME \
 --conf 'config.resource=$ENV.conf' \
 --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'

 Since the properties file is packaged up with the JAR I don't have to
 worry about sending the file separately to all of the slave nodes. Typesafe
 Config is written in Java so it will work if you're not using Scala. (The
 Typesafe Config also has the advantage of being extremely easy to integrate
 with code that is using Java Properties today.)

 If you instead want to send the file separately from the JAR and you use
 the Typesafe Config library, you can specify config.file instead of
 .resource; though I'd point you to [3] below if you want to make your
 development life easier.

 1. https://github.com/typesafehub/config
 2. https://github.com/ceedubs/ficus
 3.
 http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/



 On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc emre.sev...@gmail.com
 wrote:

 Hello,

 I'm using Spark 1.2.1 and have a module.properties file, and in it I
 have non-Spark properties, as well as Spark properties, e.g.:

job.output.dir=file:///home/emre/data/mymodule/out

 I'm trying to pass it to spark-submit via:

spark-submit --class com.myModule --master local[4] --deploy-mode
 client --verbose --properties-file /home/emre/data/mymodule.properties
 mymodule.jar

 And I 

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Gerard Maas
+1 for TypeSafe config
Our practice is to include all spark properties under a 'spark' entry in
the config file alongside job-specific configuration:

A config file would look like:
spark {
  master = 
  cleaner.ttl = 123456
  ...
}
job {
  context {
    src = foo
    action = barAction
  }
  prop1 = val1
}

Then, to create our Spark context, we transparently pass the spark section
to a SparkConf instance.
This idiom will instantiate the context with the spark specific
configuration:

sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))

And we can make use of the config object everywhere else.
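
(A sketch of that idiom in full, in Scala; configToStringSeq is not shown in
the thread, so the helper below is only a guess at its shape:)

    import com.typesafe.config.{Config, ConfigFactory}
    import org.apache.spark.{SparkConf, SparkContext}
    import scala.collection.JavaConverters._

    // flatten a Config subtree into fully-qualified (key, value) pairs
    def configToStringSeq(config: Config): Seq[(String, String)] =
      config.entrySet().asScala.toSeq.map(e => e.getKey -> e.getValue.unwrapped().toString)

    val config = ConfigFactory.load()
    val sparkConfig = new SparkConf()
    // atPath("spark") re-prefixes the keys, so the pairs come out as spark.master, spark.cleaner.ttl, ...
    sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
    val sc = new SparkContext(sparkConfig)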

We use the override model of the typesafe config: reasonable defaults go in
the reference.conf (within the jar). Environment-specific overrides go in
the application.conf (alongside the job jar) and hacks are passed with
-Dprop=value :-)
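
(For illustration, that layering might look like this; the file contents below
are made up:)

    # reference.conf (packaged inside the jar): reasonable defaults
    job {
      context.src = foo
      prop1 = defaultVal
    }

    # application.conf (alongside the job jar): environment-specific overrides
    job.prop1 = stagingVal

    # one-off hack, highest priority, passed on the command line:
    #   -Djob.prop1=debugVal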


-kr, Gerard.


On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc emre.sev...@gmail.com wrote:

 I've decided to try

   spark-submit ... --conf
 spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties

 But when I try to retrieve the value of propertiesFile via

    System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));

 I get NULL:

propertiesFile : null

 Interestingly, when I run spark-submit with --verbose, I see that it
 prints:

   spark.driver.extraJavaOptions -
 -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties

 I couldn't understand why I couldn't get to the value of "propertiesFile"
 by using the standard System.getProperty method. (I can use new
 SparkConf().get("spark.driver.extraJavaOptions") and manually parse it
 to retrieve the value, but I'd like to know why I cannot retrieve that
 value using the System.getProperty method.)

 Any ideas?

 If I can achieve what I've described above properly, I plan to pass a
 properties file that resides on HDFS, so that it will be available to my
 driver program wherever that program runs.

 --
 Emre




 On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke charles.fed...@gmail.com
 wrote:

 I haven't actually tried mixing non-Spark settings into the Spark
 properties. Instead I package my properties into the jar and use the
 Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
 specific) to get at my properties:

 Properties file: src/main/resources/integration.conf

 (below $ENV might be set to either integration or prod[3])

 ssh -t root@$HOST /root/spark/bin/spark-shell --jars /root/$JAR_NAME \
 --conf 'config.resource=$ENV.conf' \
 --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'

 Since the properties file is packaged up with the JAR I don't have to
 worry about sending the file separately to all of the slave nodes. Typesafe
 Config is written in Java so it will work if you're not using Scala. (The
 Typesafe Config also has the advantage of being extremely easy to integrate
 with code that is using Java Properties today.)

 If you instead want to send the file separately from the JAR and you use
 the Typesafe Config library, you can specify config.file instead of
 .resource; though I'd point you to [3] below if you want to make your
 development life easier.

 1. https://github.com/typesafehub/config
 2. https://github.com/ceedubs/ficus
 3.
 http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/



 On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc emre.sev...@gmail.com
 wrote:

 Hello,

 I'm using Spark 1.2.1 and have a module.properties file, and in it I
 have non-Spark properties, as well as Spark properties, e.g.:

job.output.dir=file:///home/emre/data/mymodule/out

 I'm trying to pass it to spark-submit via:

spark-submit --class com.myModule --master local[4] --deploy-mode
 client --verbose --properties-file /home/emre/data/mymodule.properties
 mymodule.jar

 And I thought I could read the value of my non-Spark property, namely,
 job.output.dir by using:

 SparkConf sparkConf = new SparkConf();
 final String validatedJSONoutputDir =
 sparkConf.get("job.output.dir");

 But it gives me an exception:

 Exception in thread "main" java.util.NoSuchElementException:
 job.output.dir

 Is it not possible to mix Spark and non-Spark properties in a single
 .properties file, then pass it via --properties-file and then get the
 values of those non-Spark properties via SparkConf?

 Or is there another object / method to retrieve the values for those
 non-Spark properties?


 --
 Emre Sevinç




 --
 Emre Sevinc



Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Emre Sevinc
I've decided to try

  spark-submit ... --conf
spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties

But when I try to retrieve the value of propertiesFile via

   System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));

I get NULL:

   propertiesFile : null

Interestingly, when I run spark-submit with --verbose, I see that it prints:

  spark.driver.extraJavaOptions -
-DpropertiesFile=/home/emre/data/belga/schemavalidator.properties

I couldn't understand why I couldn't get to the value of "propertiesFile"
by using the standard System.getProperty method. (I can use new
SparkConf().get("spark.driver.extraJavaOptions") and manually parse it
to retrieve the value, but I'd like to know why I cannot retrieve that
value using the System.getProperty method.)

Any ideas?

If I can achieve what I've described above properly, I plan to pass a
properties file that resides on HDFS, so that it will be available to my
driver program wherever that program runs.

--
Emre




On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke charles.fed...@gmail.com
wrote:

 I haven't actually tried mixing non-Spark settings into the Spark
 properties. Instead I package my properties into the jar and use the
 Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
 specific) to get at my properties:

 Properties file: src/main/resources/integration.conf

 (below $ENV might be set to either integration or prod[3])

 ssh -t root@$HOST /root/spark/bin/spark-shell --jars /root/$JAR_NAME \
 --conf 'config.resource=$ENV.conf' \
 --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'

 Since the properties file is packaged up with the JAR I don't have to
 worry about sending the file separately to all of the slave nodes. Typesafe
 Config is written in Java so it will work if you're not using Scala. (The
 Typesafe Config also has the advantage of being extremely easy to integrate
 with code that is using Java Properties today.)

 If you instead want to send the file separately from the JAR and you use
 the Typesafe Config library, you can specify config.file instead of
 .resource; though I'd point you to [3] below if you want to make your
 development life easier.

 1. https://github.com/typesafehub/config
 2. https://github.com/ceedubs/ficus
 3.
 http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/



 On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc emre.sev...@gmail.com
 wrote:

 Hello,

 I'm using Spark 1.2.1 and have a module.properties file, and in it I have
 non-Spark properties, as well as Spark properties, e.g.:

job.output.dir=file:///home/emre/data/mymodule/out

 I'm trying to pass it to spark-submit via:

spark-submit --class com.myModule --master local[4] --deploy-mode
 client --verbose --properties-file /home/emre/data/mymodule.properties
 mymodule.jar

 And I thought I could read the value of my non-Spark property, namely,
 job.output.dir by using:

 SparkConf sparkConf = new SparkConf();
 final String validatedJSONoutputDir = sparkConf.get("job.output.dir");

 But it gives me an exception:

 Exception in thread "main" java.util.NoSuchElementException:
 job.output.dir

 Is it not possible to mix Spark and non-Spark properties in a single
 .properties file, then pass it via --properties-file and then get the
 values of those non-Spark properties via SparkConf?

 Or is there another object / method to retrieve the values for those
 non-Spark properties?


 --
 Emre Sevinç




-- 
Emre Sevinc


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Corey Nolet
We've been using Commons Configuration to pull our properties out of
properties files and system properties (prioritizing system properties over
the others), and we add those properties to our Spark conf explicitly. We use
ArgotParser to get the command-line argument for which property file to
load. We also implicitly added an extra parse-args method to our SparkConf.
In our main method, we do something like this:

val sparkConf = SparkConfFactory.newSparkConf.parseModuleArts(args)
val sparkContext = new SparkContext(sparkConf)

Now all of our externally parsed properties are in the same SparkConf, so
we can pull them off anywhere in the program that has access to an
RDD/SparkContext or the SparkConf directly.
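
(A hypothetical Scala sketch of what such an implicit parse-args extension
could look like; SparkConfFactory and the method name below are placeholders,
not the poster's actual code:)

    import java.io.FileInputStream
    import java.util.Properties
    import org.apache.spark.SparkConf
    import scala.collection.JavaConverters._

    object SparkConfImplicits {
      implicit class RichSparkConf(val conf: SparkConf) extends AnyVal {
        // load the properties file named on the command line into the SparkConf,
        // then let system properties override the values from the file
        def parseModuleArgs(args: Array[String]): SparkConf = {
          val propsFile = args.headOption.getOrElse(sys.error("expected a properties file path"))
          val props = new Properties()
          val in = new FileInputStream(propsFile)
          try props.load(in) finally in.close()
          props.asScala.foreach { case (k, v) => conf.set(k, v) }
          sys.props.filterKeys(props.containsKey).foreach { case (k, v) => conf.set(k, v) }
          conf
        }
      }
    }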

On Mon, Feb 16, 2015 at 10:42 AM, Sean Owen so...@cloudera.com wrote:

 How about system properties? or something like Typesafe Config which
 lets you at least override something in a built-in config file on the
 command line, with props or other files.

 On Mon, Feb 16, 2015 at 3:38 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  Sean,
 
  I'm trying this as an alternative to what I currently do. Currently I
 have
  my module.properties file for my module in the resources directory, and
 that
  file is put inside the über JAR file when I build my application with
 Maven,
  and then when I submit it using spark-submit, I can read that
  module.properties file via the traditional method:
 
 
 
  properties.load(MyModule.class.getClassLoader().getResourceAsStream("module.properties"));
 
  and everything works fine. The disadvantage is that in order to make any
  changes to that .properties file effective, I have to re-build my
  application. Therefore I'm trying to find a way to be able to send that
  module.properties file via spark-submit and read the values in it, so
 that I
  will not be forced to build my application every time I want to make a
  change in the module.properties file.
 
  I've also checked the --files option of spark-submit, but I see that it is
  for sending the listed files to executors (correct me if I'm wrong); what
  I'm after is being able to pass dynamic properties (key/value pairs) to the
  Driver program of my Spark application. And I still could not find out how
  to do that.
 
  --
  Emre
 
 
 
 
 
  On Mon, Feb 16, 2015 at 4:28 PM, Sean Owen so...@cloudera.com wrote:
 
  Since SparkConf is only for Spark properties, I think it will in
  general only pay attention to and preserve spark.* properties. You
  could experiment with that. In general I wouldn't rely on Spark
  mechanisms for your configuration, and you can use any config
  mechanism you like to retain your own properties.
 
  On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc emre.sev...@gmail.com
  wrote:
   Hello,
  
   I'm using Spark 1.2.1 and have a module.properties file, and in it I
   have
   non-Spark properties, as well as Spark properties, e.g.:
  
  job.output.dir=file:///home/emre/data/mymodule/out
  
   I'm trying to pass it to spark-submit via:
  
  spark-submit --class com.myModule --master local[4] --deploy-mode
   client
   --verbose --properties-file /home/emre/data/mymodule.properties
   mymodule.jar
  
   And I thought I could read the value of my non-Spark property, namely,
   job.output.dir by using:
  
   SparkConf sparkConf = new SparkConf();
   final String validatedJSONoutputDir =
   sparkConf.get("job.output.dir");

   But it gives me an exception:

   Exception in thread "main" java.util.NoSuchElementException:
   job.output.dir
  
   Is it not possible to mix Spark and non-Spark properties in a single
   .properties file, then pass it via --properties-file and then get the
   values
   of those non-Spark properties via SparkConf?
  
   Or is there another object / method to retrieve the values for those
   non-Spark properties?
  
  
   --
   Emre Sevinç
 
 
 
 
  --
  Emre Sevinc

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Sean Owen
Since SparkConf is only for Spark properties, I think it will in
general only pay attention to and preserve spark.* properties. You
could experiment with that. In general I wouldn't rely on Spark
mechanisms for your configuration, and you can use any config
mechanism you like to retain your own properties.
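
(A quick way to see this behaviour, assuming you rename the custom key with a
spark. prefix so that spark-submit keeps it; the key name below is
hypothetical:)

    import org.apache.spark.SparkConf

    // --properties-file contains, for example:
    //   spark.job.output.dir=file:///home/emre/data/mymodule/out   (kept: spark.* prefix)
    //   job.output.dir=...                                          (ignored by spark-submit)
    val conf = new SparkConf()
    val kept = conf.get("spark.job.output.dir")
    val missing = conf.get("job.output.dir", "not found")   // falls back to the default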

On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc emre.sev...@gmail.com wrote:
 Hello,

 I'm using Spark 1.2.1 and have a module.properties file, and in it I have
 non-Spark properties, as well as Spark properties, e.g.:

job.output.dir=file:///home/emre/data/mymodule/out

 I'm trying to pass it to spark-submit via:

spark-submit --class com.myModule --master local[4] --deploy-mode client
 --verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar

 And I thought I could read the value of my non-Spark property, namely,
 job.output.dir by using:

 SparkConf sparkConf = new SparkConf();
 final String validatedJSONoutputDir = sparkConf.get("job.output.dir");

 But it gives me an exception:

 Exception in thread "main" java.util.NoSuchElementException:
 job.output.dir

 Is it not possible to mix Spark and non-Spark properties in a single
 .properties file, then pass it via --properties-file and then get the values
 of those non-Spark properties via SparkConf?

 Or is there another object / method to retrieve the values for those
 non-Spark properties?


 --
 Emre Sevinç

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
Sean,

I'm trying this as an alternative to what I currently do. Currently I have
my module.properties file for my module in the resources directory, and
that file is put inside the über JAR file when I build my application with
Maven, and then when I submit it using spark-submit, I can read that
module.properties file via the traditional method:


properties.load(MyModule.class.getClassLoader().getResourceAsStream("module.properties"));

and everything works fine. The disadvantage is that in order to make any
changes to that .properties file effective, I have to re-build my
application. Therefore I'm trying to find a way to be able to send that
module.properties file via spark-submit and read the values in it, so that
I will not be forced to build my application every time I want to make a
change in the module.properties file.

I've also checked the --files option of spark-submit, but I see that it
is for sending the listed files to executors (correct me if I'm wrong);
what I'm after is being able to pass dynamic properties (key/value pairs)
to the Driver program of my Spark application. And I still could not find
out how to do that.

--
Emre





On Mon, Feb 16, 2015 at 4:28 PM, Sean Owen so...@cloudera.com wrote:

 Since SparkConf is only for Spark properties, I think it will in
 general only pay attention to and preserve spark.* properties. You
 could experiment with that. In general I wouldn't rely on Spark
 mechanisms for your configuration, and you can use any config
 mechanism you like to retain your own properties.

 On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc emre.sev...@gmail.com
 wrote:
  Hello,
 
  I'm using Spark 1.2.1 and have a module.properties file, and in it I have
  non-Spark properties, as well as Spark properties, e.g.:
 
 job.output.dir=file:///home/emre/data/mymodule/out
 
  I'm trying to pass it to spark-submit via:
 
 spark-submit --class com.myModule --master local[4] --deploy-mode
 client
  --verbose --properties-file /home/emre/data/mymodule.properties
 mymodule.jar
 
  And I thought I could read the value of my non-Spark property, namely,
  job.output.dir by using:
 
  SparkConf sparkConf = new SparkConf();
  final String validatedJSONoutputDir =
  sparkConf.get("job.output.dir");

  But it gives me an exception:

  Exception in thread "main" java.util.NoSuchElementException:
  job.output.dir
 
  Is it not possible to mix Spark and non-Spark properties in a single
  .properties file, then pass it via --properties-file and then get the
 values
  of those non-Spark properties via SparkConf?
 
  Or is there another object / method to retrieve the values for those
  non-Spark properties?
 
 
  --
  Emre Sevinç




-- 
Emre Sevinc


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Charles Feduke
I haven't actually tried mixing non-Spark settings into the Spark
properties. Instead I package my properties into the jar and use the
Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
specific) to get at my properties:

Properties file: src/main/resources/integration.conf

(below $ENV might be set to either integration or prod[3])

ssh -t root@$HOST /root/spark/bin/spark-shell --jars /root/$JAR_NAME \
--conf 'config.resource=$ENV.conf' \
--conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'

Since the properties file is packaged up with the JAR I don't have to worry
about sending the file separately to all of the slave nodes. Typesafe
Config is written in Java so it will work if you're not using Scala. (The
Typesafe Config also has the advantage of being extremely easy to integrate
with code that is using Java Properties today.)

If you instead want to send the file separately from the JAR and you use
the Typesafe Config library, you can specify config.file instead of
.resource; though I'd point you to [3] below if you want to make your
development life easier.

1. https://github.com/typesafehub/config
2. https://github.com/ceedubs/ficus
3.
http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
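
(For completeness, a minimal Scala sketch of reading such a packaged config,
assuming -Dconfig.resource=$ENV.conf or -Dconfig.file=... has been set as shown
above; the key name below is made up. Ficus layers Scala-friendly readers, e.g.
config.as[String](...), on top of the same Config object:)

    import com.typesafe.config.ConfigFactory

    // picks up reference.conf plus whatever config.resource/config.file points at
    val config = ConfigFactory.load()
    val outputDir = config.getString("job.output.dir")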



On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc emre.sev...@gmail.com wrote:

 Hello,

 I'm using Spark 1.2.1 and have a module.properties file, and in it I have
 non-Spark properties, as well as Spark properties, e.g.:

job.output.dir=file:///home/emre/data/mymodule/out

 I'm trying to pass it to spark-submit via:

spark-submit --class com.myModule --master local[4] --deploy-mode
 client --verbose --properties-file /home/emre/data/mymodule.properties
 mymodule.jar

 And I thought I could read the value of my non-Spark property, namely,
 job.output.dir by using:

 SparkConf sparkConf = new SparkConf();
 final String validatedJSONoutputDir = sparkConf.get("job.output.dir");

 But it gives me an exception:

 Exception in thread "main" java.util.NoSuchElementException:
 job.output.dir

 Is it not possible to mix Spark and non-Spark properties in a single
 .properties file, then pass it via --properties-file and then get the
 values of those non-Spark properties via SparkConf?

 Or is there another object / method to retrieve the values for those
 non-Spark properties?


 --
 Emre Sevinç