Re: Building spark in Eclipse Kepler

2014-08-07 Thread Sean Owen
(Don't use gen-idea, just open it directly as a Maven project in IntelliJ.)

On Thu, Aug 7, 2014 at 4:53 AM, Ron Gonzalez
zlgonza...@yahoo.com.invalid wrote:
 So I downloaded the community edition of IntelliJ and ran sbt/sbt gen-idea.
 I then imported the pom.xml file.
 I'm still getting all sorts of errors from IntelliJ about unresolved 
 dependencies.
 Any suggestions?

 Thanks,
 Ron


 On Wednesday, August 6, 2014 12:29 PM, Ron Gonzalez 
 zlgonza...@yahoo.com.INVALID wrote:



 Hi,
   I'm trying to get the Apache Spark trunk compiling in Eclipse, but I
 can't seem to get it going. In particular, I've tried sbt/sbt eclipse, but it
 doesn't seem to create the Eclipse pieces for yarn and the other projects.
 Running mvn eclipse:eclipse on yarn fails, and so does running sbt/sbt
 eclipse just for yarn. Is there some documentation available for Eclipse?
 I've gone through the docs on the site, but to no avail.
   Any tips?

 Thanks,
 Ron




Re: Documentation confusing or incorrect for decision trees?

2014-08-07 Thread Sean Owen
It's definitely just a typo. The ordered categories are A, C, B, so the
other split can't be A | B, C. Just open a PR.

On Thu, Aug 7, 2014 at 2:11 AM, Matt Forbes m...@tellapart.com wrote:
 I found the section on ordering categorical features really interesting,
 but the A, B, C example seemed inconsistent. Am I interpreting this passage
 wrong, or are there typos? Aren't the split candidates A | C, B and A, C |
 B ?

 For example, for a binary classification problem with one categorical
 feature with three categories A, B and C with corresponding proportion of
 label 1 as 0.2, 0.6 and 0.4, the categorical features are ordered as A
 followed by C followed B or A, B, C. The two split candidates are A | C, B
 and A , B | C where | denotes the split.
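
 For concreteness, a minimal spark-shell sketch of the ordering rule under
 discussion (proportions taken from the passage above; an illustration only,
 not the MLlib implementation):

 scala> val proportions = Map("A" -> 0.2, "B" -> 0.6, "C" -> 0.4)
 scala> val ordered = proportions.toSeq.sortBy(_._2).map(_._1)  // Seq(A, C, B)
 scala> (1 until ordered.size).map(ordered.splitAt)  // (A | C,B) and (A,C | B)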




[SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-07 Thread Patrick Wendell
Hi All,

I've packaged and published a snapshot release of Spark 1.1 for testing.
This is being distributed to the community for QA and preview purposes. It
is not yet an official RC for voting. Going forward, we'll do preview
releases like this for testing ahead of official votes.

The tag of this release is v1.1.0-snapshot1 (commit d428d8):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d428d88418d385d1d04e1b0adcb6b068efe9c7b0

The release files, including signatures, digests, etc can be found at:
http://people.apache.org/~pwendell/spark-1.1.0-snapshot1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1025/
https://repository.apache.org/content/repositories/orgapachespark-1024/

NOTE: Due to SPARK-2899, docs are not yet available for this release. Docs
will be posted ASAP.

To learn more about Apache Spark, please see
http://spark.apache.org/
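
For reference, a minimal sketch of verifying one of the artifacts with
standard GnuPG commands (the artifact file name below is illustrative):

$ curl -O https://people.apache.org/keys/committer/pwendell.asc
$ gpg --import pwendell.asc
$ gpg --verify spark-1.1.0-snapshot1.tgz.asc spark-1.1.0-snapshot1.tgz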


Re: [SNAPSHOT] Snapshot1 of Spark 1.1.0 has been posted

2014-08-07 Thread Patrick Wendell
Minor correction: the encoded URL in the staging repo link was wrong.
The correct repo is:
https://repository.apache.org/content/repositories/orgapachespark-1025/
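
For anyone pulling the staging artifacts into a build for QA, a minimal
Maven sketch (the repository id is arbitrary):

<repositories>
  <repository>
    <id>spark-staging-1025</id>
    <url>https://repository.apache.org/content/repositories/orgapachespark-1025/</url>
  </repository>
</repositories>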


On Wed, Aug 6, 2014 at 11:23 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hi All,

 I've packaged and published a snapshot release of Spark 1.1 for testing. This 
 is being distributed to the community for QA and preview purposes. It is not 
 yet an official RC for voting. Going forward, we'll do preview releases like 
 this for testing ahead of official votes.

 The tag of this release is v1.1.0-snapshot1 (commit d428d8):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d428d88418d385d1d04e1b0adcb6b068efe9c7b0

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-snapshot1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1025/

 NOTE: Due to SPARK-2899, docs are not yet available for this release. Docs 
 will be posted ASAP.

 To learn more about Apache Spark, please see
 http://spark.apache.org/




Re: Building spark in Eclipse Kepler

2014-08-07 Thread Madhu
Ron,

I was able to build core in Eclipse following these steps:

https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-Eclipse

I was working only on core, so I know that works in Eclipse Juno.
I haven't tried yarn or other Eclipse releases.
Are you able to build *core* in Eclipse Kepler?

In my view, tool independence is a good thing.
I'll do what I can to support Eclipse.
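
For reference, a minimal sketch of generating the Eclipse metadata for just
the core module, assuming the build exposes the eclipse task per project (the
project name is an assumption):

$ sbt/sbt "project core" eclipse

Then import it via File > Import > Existing Projects into Workspace.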



--
Madhu
https://www.linkedin.com/in/msiddalingaiah




Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Madhu
How long does it take to get a spark context?
I found that if you don't have a network connection (reverse DNS lookup, most
likely), it can take up to 30 seconds to start up locally. I think a hosts file
entry is sufficient.
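
For reference, a minimal sketch of such an entry (dev-box is a placeholder
for whatever the hostname command reports on your machine):

# /etc/hosts
127.0.0.1   localhost dev-box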



--
Madhu
https://www.linkedin.com/in/msiddalingaiah




Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Dmitriy Lyubimov
Thanks.

Let me check this hypothesis (I have a DHCP connection on a private net, so
I'm not sure there's a reverse DNS entry).


On Thu, Aug 7, 2014 at 10:29 AM, Madhu ma...@madhu.com wrote:

 How long does it take to get a spark context?
 I found that if you don't have a network connection (reverse DNS lookup,
 most likely), it can take up to 30 seconds to start up locally. I think a
 hosts file entry is sufficient.



 --
 Madhu
 https://www.linkedin.com/in/msiddalingaiah





Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Cody Koeninger
Just wanted to check in on this, see if I should file a bug report
regarding the mesos argument propagation.


On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger c...@koeninger.org wrote:

 1. I've tried with and without escaping equals sign, it doesn't affect the
 results.

 2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for getting
 system properties set in the local shell (although not for executors).

 3. We're using the default fine-grained mesos mode, not setting
 spark.mesos.coarse, so it doesn't seem immediately related to that ticket.
 Should I file a bug report?


 On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 The third issue may be related to this:
 https://issues.apache.org/jira/browse/SPARK-2022

 We can take a look at this during the bug fix period for the 1.1
 release next week. If we come up with a fix we can backport it into
 the 1.0 branch also.

 On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Thanks for digging around here. I think there are a few distinct issues.
 
  1. Properties containing the '=' character need to be escaped.
  I was able to load properties fine as long as I escape the '='
  character. But maybe we should document this:
 
  == spark-defaults.conf ==
  spark.foo a\=B
  == shell ==
   scala> sc.getConf.get("spark.foo")
  res2: String = a=B
 
  2. spark.driver.extraJavaOptions, when set in the properties file,
  doesn't affect the driver when running in client mode (always the case
  for mesos). We should probably document this. In this case you need to
  either use --driver-java-options or set SPARK_SUBMIT_OPTS (see the
  example below).
 
  3. Arguments aren't propagated on Mesos (this might be because of the
  other issues, or a separate bug).
 
  - Patrick
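
  As a usage note on point 2, a minimal sketch of the two workarounds (the
  property is the one from the examples further down; the printed value is
  what one would expect once the option is applied):

  $ ./bin/spark-shell --driver-java-options "-Dfoo.bar.baz=23"
  scala> System.getProperty("foo.bar.baz")
  res0: String = 23

  $ SPARK_SUBMIT_OPTS="-Dfoo.bar.baz=23" ./bin/spark-shell
  scala> System.getProperty("foo.bar.baz")
  res0: String = 23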
 
  On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger c...@koeninger.org
 wrote:
  In addition, spark.executor.extraJavaOptions does not seem to behave
 as I
  would expect; java arguments don't seem to be propagated to executors.
 
 
  $ cat conf/spark-defaults.conf
 
  spark.master
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
  spark.executor.extraJavaOptions -Dfoo.bar.baz=23
  spark.driver.extraJavaOptions -Dfoo.bar.baz=23
 
 
  $ ./bin/spark-shell
 
   scala> sc.getConf.get("spark.executor.extraJavaOptions")
   res0: String = -Dfoo.bar.baz=23

   scala> sc.parallelize(1 to 100).map{ i => (
        |  java.net.InetAddress.getLocalHost.getHostName,
        |  System.getProperty("foo.bar.baz")
        | )}.collect
 
  res1: Array[(String, String)] = Array((dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
  (dn-02.mxstg,null), ...
 
 
 
  Note that this is a mesos deployment, although I wouldn't expect that
 to
  affect the availability of spark.driver.extraJavaOptions in a local
 spark
  shell.
 
 
  On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger c...@koeninger.org
 wrote:
 
  Either whitespace or equals sign are valid properties file formats.
  Here's an example:
 
  $ cat conf/spark-defaults.conf
  spark.driver.extraJavaOptions -Dfoo.bar.baz=23
 
  $ ./bin/spark-shell -v
  Using properties file: /opt/spark/conf/spark-defaults.conf
  Adding default property:
 spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
 
 
   scala> System.getProperty("foo.bar.baz")
  res0: String = null
 
 
  If you add double quotes, the resulting string value will have double
  quotes.
 
 
   $ cat conf/spark-defaults.conf
   spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

   $ ./bin/spark-shell -v
   Using properties file: /opt/spark/conf/spark-defaults.conf
   Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"

   scala> System.getProperty("foo.bar.baz")
   res0: String = null
 
 
  Neither one of those affects the issue; the underlying problem in my
 case
  seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
  SPARK_JAVA_OPTS environment variables, but nothing parses
  spark-defaults.conf before the java process is started.
 
  Here's an example of the process running when only
 spark-defaults.conf is
  being used:
 
  $ ps -ef | grep spark
 
  514   5182  2058  0 21:05 pts/2    00:00:00 bash ./bin/spark-shell -v

  514   5189  5182  4 21:05 pts/2    00:00:22 /usr/local/java/bin/java
  -cp ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
  -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
  org.apache.spark.deploy.SparkSubmit spark-shell -v --class
  org.apache.spark.repl.Main
 
 
  Here's an example of it when the command line --driver-java-options is
  used (and thus things work):
 
 
  $ ps -ef | grep spark
  514   5392  2058  0 21:15 pts/2    00:00:00 bash ./bin/spark-shell -v
  --driver-java-options -Dfoo.bar.baz=23
 
  514   5399  5392 80 21:15 

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Marcelo Vanzin
Andrew has been working on a fix:
https://github.com/apache/spark/pull/1770

On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger c...@koeninger.org wrote:
 Just wanted to check in on this, see if I should file a bug report
 regarding the mesos argument propagation.


 On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger c...@koeninger.org wrote:

 1. I've tried with and without escaping equals sign, it doesn't affect the
 results.

 2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for getting
 system properties set in the local shell (although not for executors).

 3. We're using the default fine-grained mesos mode, not setting
 spark.mesos.coarse, so it doesn't seem immediately related to that ticket.
 Should I file a bug report?


 On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 The third issue may be related to this:
 https://issues.apache.org/jira/browse/SPARK-2022

 We can take a look at this during the bug fix period for the 1.1
 release next week. If we come up with a fix we can backport it into
 the 1.0 branch also.

 On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell pwend...@gmail.com
 wrote:
  Thanks for digging around here. I think there are a few distinct issues.
 
  1. Properties containing the '=' character need to be escaped.
  I was able to load properties fine as long as I escape the '='
  character. But maybe we should document this:
 
  == spark-defaults.conf ==
  spark.foo a\=B
  == shell ==
  scala> sc.getConf.get("spark.foo")
  res2: String = a=B
 
  2. spark.driver.extraJavaOptions, when set in the properties file,
  doesn't affect the driver when running in client mode (always the case
  for mesos). We should probably document this. In this case you need to
  either use --driver-java-options or set SPARK_SUBMIT_OPTS.
 
  3. Arguments aren't propagated on Mesos (this might be because of the
  other issues, or a separate bug).
 
  - Patrick
 
  On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger c...@koeninger.org
 wrote:
  In addition, spark.executor.extraJavaOptions does not seem to behave
 as I
  would expect; java arguments don't seem to be propagated to executors.
 
 
  $ cat conf/spark-defaults.conf
 
  spark.master
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
  spark.executor.extraJavaOptions -Dfoo.bar.baz=23
  spark.driver.extraJavaOptions -Dfoo.bar.baz=23
 
 
  $ ./bin/spark-shell
 
   scala> sc.getConf.get("spark.executor.extraJavaOptions")
   res0: String = -Dfoo.bar.baz=23

   scala> sc.parallelize(1 to 100).map{ i => (
        |  java.net.InetAddress.getLocalHost.getHostName,
        |  System.getProperty("foo.bar.baz")
        | )}.collect
 
  res1: Array[(String, String)] = Array((dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
  (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
  (dn-02.mxstg,null), ...
 
 
 
  Note that this is a mesos deployment, although I wouldn't expect that
 to
  affect the availability of spark.driver.extraJavaOptions in a local
 spark
  shell.
 
 
  On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger c...@koeninger.org
 wrote:
 
  Either whitespace or equals sign are valid properties file formats.
  Here's an example:
 
  $ cat conf/spark-defaults.conf
  spark.driver.extraJavaOptions -Dfoo.bar.baz=23
 
  $ ./bin/spark-shell -v
  Using properties file: /opt/spark/conf/spark-defaults.conf
  Adding default property:
 spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
 
 
   scala> System.getProperty("foo.bar.baz")
  res0: String = null
 
 
  If you add double quotes, the resulting string value will have double
  quotes.
 
 
   $ cat conf/spark-defaults.conf
   spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

   $ ./bin/spark-shell -v
   Using properties file: /opt/spark/conf/spark-defaults.conf
   Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"

   scala> System.getProperty("foo.bar.baz")
   res0: String = null
 
 
  Neither one of those affects the issue; the underlying problem in my
 case
  seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
  SPARK_JAVA_OPTS environment variables, but nothing parses
  spark-defaults.conf before the java process is started.
 
  Here's an example of the process running when only
 spark-defaults.conf is
  being used:
 
  $ ps -ef | grep spark
 
   514   5182  2058  0 21:05 pts/2    00:00:00 bash ./bin/spark-shell -v

   514   5189  5182  4 21:05 pts/2    00:00:22 /usr/local/java/bin/java
   -cp ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
   -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
   org.apache.spark.deploy.SparkSubmit spark-shell -v --class
   org.apache.spark.repl.Main
 
 
  Here's an example of it when the command line --driver-java-options is
  used (and thus things work):
 
 
  $ ps -ef | grep spark

Re: Unit test best practice for Spark-derived projects

2014-08-07 Thread Patrick Wendell
In the past I've found if I do a jstack when running some tests, it
sits forever inside of a hostname resolution step or something. I
never narrowed it down, though.
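
For reference, a minimal sketch with the standard JDK tools (the pid is
illustrative, borrowed from the ps listings earlier in this digest):

$ jps -l
5189 org.apache.spark.deploy.SparkSubmit
$ jstack 5189 | grep -B2 -A8 InetAddress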

- Patrick

On Thu, Aug 7, 2014 at 10:45 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
 Thanks.

 Let me check this hypothesis (I have a DHCP connection on a private net, so
 I'm not sure there's a reverse DNS entry).


 On Thu, Aug 7, 2014 at 10:29 AM, Madhu ma...@madhu.com wrote:

 How long does it take to get a spark context?
 I found that if you don't have a network connection (reverse DNS lookup,
 most likely), it can take up to 30 seconds to start up locally. I think a
 hosts file entry is sufficient.



 --
 Madhu
 https://www.linkedin.com/in/msiddalingaiah







Re: Building spark in Eclipse Kepler

2014-08-07 Thread Ron Gonzalez
So I opened it as a Maven project (using the top-level pom.xml file), but
rebuilding the project ends up with all sorts of errors about unresolved
dependencies.

Thanks,
Ron


On Wednesday, August 6, 2014 11:15 PM, Sean Owen so...@cloudera.com wrote:
 


(Don't use gen-idea, just open it directly as a Maven project in IntelliJ.)


On Thu, Aug 7, 2014 at 4:53 AM, Ron Gonzalez
zlgonza...@yahoo.com.invalid wrote:
 So I downloaded the community edition of IntelliJ and ran sbt/sbt gen-idea.
 I then imported the pom.xml file.
 I'm still getting all sorts of errors from IntelliJ about unresolved 
 dependencies.
 Any suggestions?

 Thanks,
 Ron


 On Wednesday, August 6, 2014 12:29 PM, Ron Gonzalez 
 zlgonza...@yahoo.com.INVALID wrote:



 Hi,
   I'm trying to get the Apache Spark trunk compiling in Eclipse, but I
 can't seem to get it going. In particular, I've tried sbt/sbt eclipse, but it
 doesn't seem to create the Eclipse pieces for yarn and the other projects.
 Running mvn eclipse:eclipse on yarn fails, and so does running sbt/sbt
 eclipse just for yarn. Is there some documentation available for Eclipse?
 I've gone through the docs on the site, but to no avail.
   Any tips?

 Thanks,
 Ron


Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Gary Malouf
Can this be cherry-picked for 1.1 if everything works out? In my opinion,
it could qualify as a bug fix.


On Thu, Aug 7, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com wrote:

 Andrew has been working on a fix:
 https://github.com/apache/spark/pull/1770

 On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger c...@koeninger.org wrote:
  Just wanted to check in on this, see if I should file a bug report
  regarding the mesos argument propagation.
 
 
  On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger c...@koeninger.org
 wrote:
 
  1. I've tried with and without escaping equals sign, it doesn't affect
 the
  results.
 
  2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for getting
  system properties set in the local shell (although not for executors).
 
  3. We're using the default fine-grained mesos mode, not setting
  spark.mesos.coarse, so it doesn't seem immediately related to that
 ticket.
  Should I file a bug report?
 
 
  On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  The third issue may be related to this:
  https://issues.apache.org/jira/browse/SPARK-2022
 
  We can take a look at this during the bug fix period for the 1.1
  release next week. If we come up with a fix we can backport it into
  the 1.0 branch also.
 
  On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell pwend...@gmail.com
  wrote:
   Thanks for digging around here. I think there are a few distinct
 issues.
  
   1. Properties containing the '=' character need to be escaped.
   I was able to load properties fine as long as I escape the '='
   character. But maybe we should document this:
  
   == spark-defaults.conf ==
   spark.foo a\=B
   == shell ==
    scala> sc.getConf.get("spark.foo")
   res2: String = a=B
  
   2. spark.driver.extraJavaOptions, when set in the properties file,
   doesn't affect the driver when running in client mode (always the case
   for mesos). We should probably document this. In this case you need
 to
   either use --driver-java-options or set SPARK_SUBMIT_OPTS.
  
   3. Arguments aren't propagated on Mesos (this might be because of the
   other issues, or a separate bug).
  
   - Patrick
  
   On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger c...@koeninger.org
  wrote:
   In addition, spark.executor.extraJavaOptions does not seem to behave
  as I
   would expect; java arguments don't seem to be propagated to
 executors.
  
  
   $ cat conf/spark-defaults.conf
  
   spark.master
  
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
   spark.executor.extraJavaOptions -Dfoo.bar.baz=23
   spark.driver.extraJavaOptions -Dfoo.bar.baz=23
  
  
   $ ./bin/spark-shell
  
    scala> sc.getConf.get("spark.executor.extraJavaOptions")
    res0: String = -Dfoo.bar.baz=23

    scala> sc.parallelize(1 to 100).map{ i => (
         |  java.net.InetAddress.getLocalHost.getHostName,
         |  System.getProperty("foo.bar.baz")
         | )}.collect
  
   res1: Array[(String, String)] = Array((dn-01.mxstg,null),
   (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
   (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
   (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
   (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
   (dn-02.mxstg,null), ...
  
  
  
   Note that this is a mesos deployment, although I wouldn't expect
 that
  to
   affect the availability of spark.driver.extraJavaOptions in a local
  spark
   shell.
  
  
   On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger c...@koeninger.org
 
  wrote:
  
   Either whitespace or equals sign are valid properties file formats.
   Here's an example:
  
   $ cat conf/spark-defaults.conf
   spark.driver.extraJavaOptions -Dfoo.bar.baz=23
  
   $ ./bin/spark-shell -v
   Using properties file: /opt/spark/conf/spark-defaults.conf
   Adding default property:
  spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
  
  
    scala> System.getProperty("foo.bar.baz")
   res0: String = null
  
  
   If you add double quotes, the resulting string value will have
 double
   quotes.
  
  
    $ cat conf/spark-defaults.conf
    spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

    $ ./bin/spark-shell -v
    Using properties file: /opt/spark/conf/spark-defaults.conf
    Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"

    scala> System.getProperty("foo.bar.baz")
    res0: String = null
  
  
   Neither one of those affects the issue; the underlying problem in
 my
  case
   seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
   SPARK_JAVA_OPTS environment variables, but nothing parses
   spark-defaults.conf before the java process is started.
  
   Here's an example of the process running when only
  spark-defaults.conf is
   being used:
  
   $ ps -ef | grep spark
  
    514   5182  2058  0 21:05 pts/2    00:00:00 bash ./bin/spark-shell -v

    514   5189  5182  4 21:05 pts/2    00:00:22 /usr/local/java/bin/java
    -cp

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Andrew Or
Thanks Marcelo, I have moved the changes to a new PR to describe the
problems more clearly: https://github.com/apache/spark/pull/1845

@Gary Yeah, the goal is to get this into 1.1 as a bug fix.


2014-08-07 17:30 GMT-07:00 Gary Malouf malouf.g...@gmail.com:

  Can this be cherry-picked for 1.1 if everything works out? In my opinion,
  it could qualify as a bug fix.


 On Thu, Aug 7, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

  Andrew has been working on a fix:
  https://github.com/apache/spark/pull/1770
 
  On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger c...@koeninger.org
 wrote:
   Just wanted to check in on this, see if I should file a bug report
   regarding the mesos argument propagation.
  
  
   On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger c...@koeninger.org
  wrote:
  
   1. I've tried with and without escaping equals sign, it doesn't affect
  the
   results.
  
   2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for
 getting
   system properties set in the local shell (although not for executors).
  
   3. We're using the default fine-grained mesos mode, not setting
   spark.mesos.coarse, so it doesn't seem immediately related to that
  ticket.
   Should I file a bug report?
  
  
   On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com
   wrote:
  
   The third issue may be related to this:
   https://issues.apache.org/jira/browse/SPARK-2022
  
   We can take a look at this during the bug fix period for the 1.1
   release next week. If we come up with a fix we can backport it into
   the 1.0 branch also.
  
   On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
Thanks for digging around here. I think there are a few distinct
  issues.
   
1. Properties containing the '=' character need to be escaped.
I was able to load properties fine as long as I escape the '='
character. But maybe we should document this:
   
== spark-defaults.conf ==
spark.foo a\=B
== shell ==
     scala> sc.getConf.get("spark.foo")
res2: String = a=B
   
2. spark.driver.extraJavaOptions, when set in the properties file,
    doesn't affect the driver when running in client mode (always the
 case
for mesos). We should probably document this. In this case you need
  to
either use --driver-java-options or set SPARK_SUBMIT_OPTS.
   
3. Arguments aren't propagated on Mesos (this might be because of
 the
other issues, or a separate bug).
   
- Patrick
   
On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger 
 c...@koeninger.org
   wrote:
In addition, spark.executor.extraJavaOptions does not seem to
 behave
   as I
would expect; java arguments don't seem to be propagated to
  executors.
   
   
$ cat conf/spark-defaults.conf
   
spark.master
   
  
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
spark.executor.extraJavaOptions -Dfoo.bar.baz=23
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
   
   
$ ./bin/spark-shell
   
     scala> sc.getConf.get("spark.executor.extraJavaOptions")
     res0: String = -Dfoo.bar.baz=23

     scala> sc.parallelize(1 to 100).map{ i => (
          |  java.net.InetAddress.getLocalHost.getHostName,
          |  System.getProperty("foo.bar.baz")
          | )}.collect
   
res1: Array[(String, String)] = Array((dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
(dn-02.mxstg,null), ...
   
   
   
Note that this is a mesos deployment, although I wouldn't expect
  that
   to
affect the availability of spark.driver.extraJavaOptions in a
 local
   spark
shell.
   
   
On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger 
 c...@koeninger.org
  
   wrote:
   
Either whitespace or equals sign are valid properties file
 formats.
Here's an example:
   
$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
   
$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property:
   spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
   
   
     scala> System.getProperty("foo.bar.baz")
res0: String = null
   
   
If you add double quotes, the resulting string value will have
  double
quotes.
   
   
     $ cat conf/spark-defaults.conf
     spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"

     $ ./bin/spark-shell -v
     Using properties file: /opt/spark/conf/spark-defaults.conf
     Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"

     scala> System.getProperty("foo.bar.baz")
     res0: String = null
   
   
Neither one of those affects the issue; the underlying problem in
  my
   case
seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
SPARK_JAVA_OPTS environment 

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Jun Feng Liu
Anyone know the answer?
 
Best Regards
 
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liuj...@cn.ibm.com


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China 
 

 



Jun Feng Liu/China/IBM 
2014/08/07 15:37

To
dev@spark.apache.org, 
cc

Subject
Fine-Grained Scheduler on Yarn





Hi, there

Just became aware that right now Spark only supports a fine-grained scheduler
on Mesos, with MesosSchedulerBackend. The YARN scheduler sounds like it only
works in a coarse-grained model. Is there any plan to implement a fine-grained
scheduler for YARN? Or is there a technical issue blocking us from doing that?
 
Best Regards
 
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liuj...@cn.ibm.com


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China 
 

 


Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Andrew Or
@Cody I took a quick glance at the Mesos code and it appears that we
currently do not even pass extra java options to executors except in coarse
grained mode, and even in this mode we do not pass them to executors
correctly. I have filed a related JIRA here:
https://issues.apache.org/jira/browse/SPARK-2921. This is a somewhat
serious limitation and we will try to fix this for 1.1.

-Andrew


2014-08-07 19:42 GMT-07:00 Andrew Or and...@databricks.com:

 Thanks Marcelo, I have moved the changes to a new PR to describe the
 problems more clearly: https://github.com/apache/spark/pull/1845

 @Gary Yeah, the goal is to get this into 1.1 as a bug fix.


 2014-08-07 17:30 GMT-07:00 Gary Malouf malouf.g...@gmail.com:

  Can this be cherry-picked for 1.1 if everything works out? In my opinion,
  it could qualify as a bug fix.


 On Thu, Aug 7, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

  Andrew has been working on a fix:
  https://github.com/apache/spark/pull/1770
 
  On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger c...@koeninger.org
 wrote:
   Just wanted to check in on this, see if I should file a bug report
   regarding the mesos argument propagation.
  
  
   On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger c...@koeninger.org
  wrote:
  
   1. I've tried with and without escaping equals sign, it doesn't
 affect
  the
   results.
  
   2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for
 getting
   system properties set in the local shell (although not for
 executors).
  
   3. We're using the default fine-grained mesos mode, not setting
   spark.mesos.coarse, so it doesn't seem immediately related to that
  ticket.
   Should I file a bug report?
  
  
   On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
   The third issue may be related to this:
   https://issues.apache.org/jira/browse/SPARK-2022
  
   We can take a look at this during the bug fix period for the 1.1
   release next week. If we come up with a fix we can backport it into
   the 1.0 branch also.
  
   On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
Thanks for digging around here. I think there are a few distinct
  issues.
   
1. Properties containing the '=' character need to be escaped.
I was able to load properties fine as long as I escape the '='
character. But maybe we should document this:
   
== spark-defaults.conf ==
spark.foo a\=B
== shell ==
    scala> sc.getConf.get("spark.foo")
res2: String = a=B
   
2. spark.driver.extraJavaOptions, when set in the properties file,
 doesn't affect the driver when running in client mode (always the
 case
for mesos). We should probably document this. In this case you
 need
  to
either use --driver-java-options or set SPARK_SUBMIT_OPTS.
   
3. Arguments aren't propagated on Mesos (this might be because of
 the
other issues, or a separate bug).
   
- Patrick
   
On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger 
 c...@koeninger.org
   wrote:
In addition, spark.executor.extraJavaOptions does not seem to
 behave
   as I
would expect; java arguments don't seem to be propagated to
  executors.
   
   
$ cat conf/spark-defaults.conf
   
spark.master
   
  
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
spark.executor.extraJavaOptions -Dfoo.bar.baz=23
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
   
   
$ ./bin/spark-shell
   
    scala> sc.getConf.get("spark.executor.extraJavaOptions")
    res0: String = -Dfoo.bar.baz=23

    scala> sc.parallelize(1 to 100).map{ i => (
         |  java.net.InetAddress.getLocalHost.getHostName,
         |  System.getProperty("foo.bar.baz")
         | )}.collect
   
res1: Array[(String, String)] = Array((dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
(dn-02.mxstg,null), ...
   
   
   
Note that this is a mesos deployment, although I wouldn't expect
  that
   to
affect the availability of spark.driver.extraJavaOptions in a
 local
   spark
shell.
   
   
On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger 
 c...@koeninger.org
  
   wrote:
   
Either whitespace or equals sign are valid properties file
 formats.
Here's an example:
   
$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
   
$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property:
   spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
   
   
    scala> System.getProperty("foo.bar.baz")
res0: String = null
   
   
If you add double quotes, the resulting string value will have
  double
quotes.
   
   
$ cat conf/spark-defaults.conf

Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Patrick Wendell
Andrew - I think your JIRA may duplicate existing work:
https://github.com/apache/spark/pull/1513


On Thu, Aug 7, 2014 at 7:55 PM, Andrew Or and...@databricks.com wrote:
 @Cody I took a quick glance at the Mesos code and it appears that we
 currently do not even pass extra java options to executors except in coarse
 grained mode, and even in this mode we do not pass them to executors
 correctly. I have filed a related JIRA here:
 https://issues.apache.org/jira/browse/SPARK-2921. This is a somewhat
 serious limitation and we will try to fix this for 1.1.

 -Andrew


 2014-08-07 19:42 GMT-07:00 Andrew Or and...@databricks.com:

 Thanks Marcelo, I have moved the changes to a new PR to describe the
 problems more clearly: https://github.com/apache/spark/pull/1845

 @Gary Yeah, the goal is to get this into 1.1 as a bug fix.


 2014-08-07 17:30 GMT-07:00 Gary Malouf malouf.g...@gmail.com:

  Can this be cherry-picked for 1.1 if everything works out? In my opinion,
  it could qualify as a bug fix.


 On Thu, Aug 7, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com
 wrote:

  Andrew has been working on a fix:
  https://github.com/apache/spark/pull/1770
 
  On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger c...@koeninger.org
 wrote:
   Just wanted to check in on this, see if I should file a bug report
   regarding the mesos argument propagation.
  
  
   On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger c...@koeninger.org
  wrote:
  
   1. I've tried with and without escaping equals sign, it doesn't
 affect
  the
   results.
  
   2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for
 getting
   system properties set in the local shell (although not for
 executors).
  
   3. We're using the default fine-grained mesos mode, not setting
   spark.mesos.coarse, so it doesn't seem immediately related to that
  ticket.
   Should I file a bug report?
  
  
   On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
   The third issue may be related to this:
   https://issues.apache.org/jira/browse/SPARK-2022
  
   We can take a look at this during the bug fix period for the 1.1
   release next week. If we come up with a fix we can backport it into
   the 1.0 branch also.
  
   On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
Thanks for digging around here. I think there are a few distinct
  issues.
   
1. Properties containing the '=' character need to be escaped.
I was able to load properties fine as long as I escape the '='
character. But maybe we should document this:
   
== spark-defaults.conf ==
spark.foo a\=B
== shell ==
    scala> sc.getConf.get("spark.foo")
res2: String = a=B
   
2. spark.driver.extraJavaOptions, when set in the properties file,
 doesn't affect the driver when running in client mode (always the
 case
for mesos). We should probably document this. In this case you
 need
  to
either use --driver-java-options or set SPARK_SUBMIT_OPTS.
   
3. Arguments aren't propagated on Mesos (this might be because of
 the
other issues, or a separate bug).
   
- Patrick
   
On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger 
 c...@koeninger.org
   wrote:
In addition, spark.executor.extraJavaOptions does not seem to
 behave
   as I
would expect; java arguments don't seem to be propagated to
  executors.
   
   
$ cat conf/spark-defaults.conf
   
spark.master
   
  
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
spark.executor.extraJavaOptions -Dfoo.bar.baz=23
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
   
   
$ ./bin/spark-shell
   
    scala> sc.getConf.get("spark.executor.extraJavaOptions")
    res0: String = -Dfoo.bar.baz=23

    scala> sc.parallelize(1 to 100).map{ i => (
         |  java.net.InetAddress.getLocalHost.getHostName,
         |  System.getProperty("foo.bar.baz")
         | )}.collect
   
res1: Array[(String, String)] = Array((dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
(dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
(dn-02.mxstg,null), ...
   
   
   
Note that this is a mesos deployment, although I wouldn't expect
  that
   to
affect the availability of spark.driver.extraJavaOptions in a
 local
   spark
shell.
   
   
On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger 
 c...@koeninger.org
  
   wrote:
   
Either whitespace or equals sign are valid properties file
 formats.
Here's an example:
   
$ cat conf/spark-defaults.conf
spark.driver.extraJavaOptions -Dfoo.bar.baz=23
   
$ ./bin/spark-shell -v
Using properties file: /opt/spark/conf/spark-defaults.conf
Adding default property:
   spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
   
   
    scala> System.getProperty("foo.bar.baz")
res0: 

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Patrick Wendell
The current YARN mode is equivalent to what is called fine-grained mode in
Mesos. The scheduling of tasks happens totally inside of the Spark driver.


On Thu, Aug 7, 2014 at 7:50 PM, Jun Feng Liu liuj...@cn.ibm.com wrote:

 Anyone know the answer?

 Best Regards


 Jun Feng Liu
 IBM China Systems & Technology Laboratory in Beijing

 Phone: 86-10-82452683
 E-mail: liuj...@cn.ibm.com

 BLD 28,ZGC Software Park
 No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
 China





  Jun Feng Liu/China/IBM

 2014/08/07 15:37
   To
 dev@spark.apache.org,
 cc
   Subject
 Fine-Grained Scheduler on Yarn



 Hi, there

 Just became aware that right now Spark only supports a fine-grained
 scheduler on Mesos, with MesosSchedulerBackend. The YARN scheduler sounds
 like it only works in a coarse-grained model. Is there any plan to implement
 a fine-grained scheduler for YARN? Or is there a technical issue blocking us
 from doing that?

 Best Regards


 Jun Feng Liu
 IBM China Systems & Technology Laboratory in Beijing

 Phone: 86-10-82452683
 E-mail: liuj...@cn.ibm.com

 BLD 28,ZGC Software Park
 No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
 China







Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Patrick Wendell
Hey sorry about that - what I said was the opposite of what is true.

The current YARN mode is equivalent to coarse-grained Mesos. There is no
fine-grained scheduling on YARN at the moment. I'm not sure YARN supports
scheduling in units other than containers. Fine-grained scheduling requires
scheduling at the granularity of individual cores.
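
In practice that means executor resources on YARN are fixed up front when
the containers are requested, e.g. (a minimal sketch; the flag values and
application jar are illustrative):

$ ./bin/spark-submit --master yarn-client \
    --num-executors 4 \
    --executor-cores 2 \
    --executor-memory 2g \
    my-app.jar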


On Thu, Aug 7, 2014 at 9:43 PM, Patrick Wendell pwend...@gmail.com wrote:

 The current YARN mode is equivalent to what is called fine-grained mode in
 Mesos. The scheduling of tasks happens totally inside of the Spark driver.


 On Thu, Aug 7, 2014 at 7:50 PM, Jun Feng Liu liuj...@cn.ibm.com wrote:

 Anyone know the answer?

 Best Regards


 Jun Feng Liu
 IBM China Systems & Technology Laboratory in Beijing

 Phone: 86-10-82452683
 E-mail: liuj...@cn.ibm.com

 BLD 28,ZGC Software Park
 No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
 China





  Jun Feng Liu/China/IBM

 2014/08/07 15:37
   To
 dev@spark.apache.org,
 cc
   Subject
 Fine-Grained Scheduler on Yarn



 Hi, there

 Just became aware that right now Spark only supports a fine-grained
 scheduler on Mesos, with MesosSchedulerBackend. The YARN scheduler sounds
 like it only works in a coarse-grained model. Is there any plan to implement
 a fine-grained scheduler for YARN? Or is there a technical issue blocking us
 from doing that?

 Best Regards


 Jun Feng Liu
 IBM China Systems & Technology Laboratory in Beijing

 Phone: 86-10-82452683
 E-mail: liuj...@cn.ibm.com

 BLD 28,ZGC Software Park
 No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
 China








Re: replacement for SPARK_JAVA_OPTS

2014-08-07 Thread Andrew Or
Ah, great to know this is already being fixed. Thanks Patrick, I have
marked my JIRA as a duplicate.


2014-08-07 21:42 GMT-07:00 Patrick Wendell pwend...@gmail.com:

 Andrew - I think your JIRA may duplicate existing work:
 https://github.com/apache/spark/pull/1513


 On Thu, Aug 7, 2014 at 7:55 PM, Andrew Or and...@databricks.com wrote:
  @Cody I took a quick glance at the Mesos code and it appears that we
  currently do not even pass extra java options to executors except in
 coarse
  grained mode, and even in this mode we do not pass them to executors
  correctly. I have filed a related JIRA here:
  https://issues.apache.org/jira/browse/SPARK-2921. This is a somewhat
  serious limitation and we will try to fix this for 1.1.
 
  -Andrew
 
 
  2014-08-07 19:42 GMT-07:00 Andrew Or and...@databricks.com:
 
  Thanks Marcelo, I have moved the changes to a new PR to describe the
  problems more clearly: https://github.com/apache/spark/pull/1845
 
  @Gary Yeah, the goal is to get this into 1.1 as a bug fix.
 
 
  2014-08-07 17:30 GMT-07:00 Gary Malouf malouf.g...@gmail.com:
 
   Can this be cherry-picked for 1.1 if everything works out? In my
   opinion, it could qualify as a bug fix.
 
 
  On Thu, Aug 7, 2014 at 5:47 PM, Marcelo Vanzin van...@cloudera.com
  wrote:
 
   Andrew has been working on a fix:
   https://github.com/apache/spark/pull/1770
  
   On Thu, Aug 7, 2014 at 2:35 PM, Cody Koeninger c...@koeninger.org
  wrote:
Just wanted to check in on this, see if I should file a bug report
regarding the mesos argument propagation.
   
   
On Thu, Jul 31, 2014 at 8:35 AM, Cody Koeninger 
 c...@koeninger.org
   wrote:
   
1. I've tried with and without escaping equals sign, it doesn't
  affect
   the
results.
   
2. Yeah, exporting SPARK_SUBMIT_OPTS from spark-env.sh works for
  getting
system properties set in the local shell (although not for
  executors).
   
3. We're using the default fine-grained mesos mode, not setting
spark.mesos.coarse, so it doesn't seem immediately related to that
   ticket.
Should I file a bug report?
   
   
On Thu, Jul 31, 2014 at 1:33 AM, Patrick Wendell 
 pwend...@gmail.com
  
wrote:
   
The third issue may be related to this:
https://issues.apache.org/jira/browse/SPARK-2022
   
We can take a look at this during the bug fix period for the 1.1
release next week. If we come up with a fix we can backport it
 into
the 1.0 branch also.
   
On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell 
  pwend...@gmail.com
wrote:
 Thanks for digging around here. I think there are a few
 distinct
   issues.

 1. Properties containing the '=' character need to be escaped.
 I was able to load properties fine as long as I escape the '='
 character. But maybe we should document this:

 == spark-defaults.conf ==
 spark.foo a\=B
 == shell ==
  scala> sc.getConf.get("spark.foo")
 res2: String = a=B

 2. spark.driver.extraJavaOptions, when set in the properties
 file,
  doesn't affect the driver when running in client mode (always the
  case
 for mesos). We should probably document this. In this case you
  need
   to
 either use --driver-java-options or set SPARK_SUBMIT_OPTS.

 3. Arguments aren't propagated on Mesos (this might be because
 of
  the
 other issues, or a separate bug).

 - Patrick

 On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger 
  c...@koeninger.org
wrote:
 In addition, spark.executor.extraJavaOptions does not seem to
  behave
as I
 would expect; java arguments don't seem to be propagated to
   executors.


 $ cat conf/spark-defaults.conf

 spark.master

   
  
 
 mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
 spark.executor.extraJavaOptions -Dfoo.bar.baz=23
 spark.driver.extraJavaOptions -Dfoo.bar.baz=23


 $ ./bin/spark-shell

  scala> sc.getConf.get("spark.executor.extraJavaOptions")
  res0: String = -Dfoo.bar.baz=23

  scala> sc.parallelize(1 to 100).map{ i => (
       |  java.net.InetAddress.getLocalHost.getHostName,
       |  System.getProperty("foo.bar.baz")
       | )}.collect

 res1: Array[(String, String)] = Array((dn-01.mxstg,null),
 (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
 (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
 (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
 (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
 (dn-02.mxstg,null), ...



 Note that this is a mesos deployment, although I wouldn't
 expect
   that
to
 affect the availability of spark.driver.extraJavaOptions in a
  local
spark
 shell.


 On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger 
  c...@koeninger.org
   
wrote:

 Either whitespace or equals sign are valid properties file
  formats.
 Here's 

Re: Fine-Grained Scheduler on Yarn

2014-08-07 Thread Jun Feng Liu
Thanks for the reply. Would it be possible to adjust resources based on the
number of containers, e.g. allocate more containers when the driver needs
more resources, and give resources back by deleting containers once parts of
the job already have enough cores/memory?
 
Best Regards
 
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liuj...@cn.ibm.com


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China 
 

 



Patrick Wendell pwend...@gmail.com 
2014/08/08 13:10

To
Jun Feng Liu/China/IBM@IBMCN, 
cc
dev@spark.apache.org dev@spark.apache.org
Subject
Re: Fine-Grained Scheduler on Yarn






Hey sorry about that - what I said was the opposite of what is true.

The current YARN mode is equivalent to coarse-grained Mesos. There is no
fine-grained scheduling on YARN at the moment. I'm not sure YARN supports
scheduling in units other than containers. Fine-grained scheduling 
requires scheduling at the granularity of individual cores.


On Thu, Aug 7, 2014 at 9:43 PM, Patrick Wendell pwend...@gmail.com 
wrote:
The current YARN mode is equivalent to what is called fine-grained mode in
Mesos. The scheduling of tasks happens totally inside of the Spark driver.


On Thu, Aug 7, 2014 at 7:50 PM, Jun Feng Liu liuj...@cn.ibm.com wrote:
Anyone know the answer?
Best Regards 
  
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liuj...@cn.ibm.com 


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China 
 

  



Jun Feng Liu/China/IBM 
2014/08/07 15:37 


To
dev@spark.apache.org, 
cc

Subject
Fine-Grained Scheduler on Yarn







Hi, there 

Just became aware that right now Spark only supports a fine-grained scheduler
on Mesos, with MesosSchedulerBackend. The YARN scheduler sounds like it only
works in a coarse-grained model. Is there any plan to implement a fine-grained
scheduler for YARN? Or is there a technical issue blocking us from doing that?
Best Regards 
  
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liuj...@cn.ibm.com 


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China