GitHub user witgo opened a pull request: https://github.com/apache/spark/pull/590
Improved build configuration Ⅱ

@berngp I merged your code into this PR.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/witgo/spark improved_build

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #590

----
commit 4e96c0153063b35fc03e497f28292a97832e81d4
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date:   2014-04-15T21:03:30Z

Add YARN/Stable compiled classes to the CLASSPATH.

The change adds `./yarn/stable/target/<scala-version>/classes` to the
_Classpath_ when a _dependencies_ assembly is available in the assembly
directory.

Why is this change necessary?
To ease the development of features and bug-fixes for Spark-YARN.

[ticket: X] : NA

Author   : bernardo.gomezpala...@gmail.com
Reviewer : ?
Testing  : ?

commit 1342886a396be00eda9449c6d84155dfecf954c8
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date:   2014-04-15T21:46:44Z

The `spark-class` shell now ignores non jar files in the assembly directory.

Why is this change necessary?
While developing in Spark I found myself rebuilding either the
dependencies assembly or the full spark assembly. I kept running into
the case of having both the dep-assembly and full-assembly in the same
directory and getting an error when I called either `spark-shell` or
`spark-submit`.

Quick fix: rename either of them to a .bkp file, depending on the
development workflow you are executing at the moment, and have
`spark-class` ignore non-jar files. Another option could be to move the
"offending" jar to a different directory, but in my opinion keeping them
there is a bit tidier.

e.g.

```
ll ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar.bkp
```

[ticket: X] : ?
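
For illustration only, the jar-only filtering described in this commit
could look roughly like the sketch below. It is not the actual
`spark-class` change: the function name `find_assembly_jars` and the
`spark-assembly*.jar` glob are assumptions made for the example.

```shell
# Hypothetical helper (not taken from spark-class): print the assembly
# jars found in a directory, one per line. Only names ending in ".jar"
# match the glob, so backups such as "*.jar.bkp" are skipped.
find_assembly_jars() {
  dir="$1"
  for f in "$dir"/spark-assembly*.jar; do
    # When nothing matches, the shell leaves the pattern unexpanded;
    # the regular-file check filters that case out.
    [ -f "$f" ] || continue
    printf '%s\n' "$f"
  done
}

# Example call (path taken from the listing above):
#   find_assembly_jars ./assembly/target/scala-2.10
```

With the directory listing above, such a helper would print only the
`-deps.jar` file, because the `.jar.bkp` name does not end in `.jar`.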
commit ddf2547aa2aea8155f8d6c0386e2cb37bcf61537
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date:   2014-04-15T21:53:23Z

The `spark-shell` option `--log-conf` also enables SPARK_PRINT_LAUNCH_COMMAND.

Why is this change necessary?
When enabling `--log-conf` through `spark-shell`, you are most likely
also interested in the full invocation of the java command, including
the _classpath_ and extended options.

e.g.

```
INFO: Base Directory set to /Users/bernardo/work/github/berngp/spark
INFO: Spark Master is yarn-client
INFO: Spark REPL options -Dspark.logConf=true
Spark Command: /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/bin/java -cp :/Users/bernardo/work/github/berngp/spark/conf:/Users/bernardo/work/github/berngp/spark/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/repl/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/mllib/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/bagel/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/graphx/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/streaming/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/tools/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/catalyst/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/core/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/sql/hive/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/yarn/stable/target/scala-2.10/classes:/Users/bernardo/work/github/berngp/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-deps.jar:/usr/local/Cellar/hadoop/2.2.0/libexec/etc/hadoop -XX:ErrorFile=/tmp/spark-shell-hs_err_pid.log -XX:HeapDumpPath=/tmp/spark-shell-java_pid.hprof -XX:-HeapDumpOnOutOfMemoryError -XX:-PrintGC -XX:-PrintGCDetails -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution -XX:-PrintAdaptiveSizePolicy -XX:GCLogFileSize=1024K -XX:-UseGCLogFileRotation -Xloggc:/tmp/spark-shell-gc.log -XX:+UseConcMarkSweepGC -Dspark.cleaner.ttl=10000 -Dspark.driver.host=33.33.33.1 -Dspark.logConf=true -Djava.library.path= -Xms400M -Xmx400M org.apache.spark.repl.Main
```

[ticket: X] : ?

commit 22045394955992c2c8dfe0e1040c6bb972be6ce4
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date:   2014-04-15T22:15:23Z

Root is now Spark and qualify the assembly if it was built with YARN.

Why is this change necessary?
Renamed the SBT "root" project to "spark" to enhance readability.

Currently the assembly name is qualified with the Hadoop version but
does not indicate whether YARN was enabled. This change qualifies the
assembly name so that it is easy to identify whether YARN was enabled.

e.g.

```
./make-distribution.sh --hadoop 2.3.0 --with-yarn
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0-yarn.jar
```

vs

```
./make-distribution.sh --hadoop 2.3.0
ls -l ./assembly/target/scala-2.10
spark-assembly-1.0.0-SNAPSHOT-hadoop2.3.0.jar
```

[ticket: X] : ?

commit 889bf4ed742ed3d06cb62276ef554f2f37b53ee6
Author: Bernardo Gomez Palacio <bernardo.gomezpala...@gmail.com>
Date:   2014-04-16T00:08:27Z

Upgrade the Maven Build to YARN 2.3.0.

Upgraded to YARN 2.3.0, removed unnecessary `relativePath` values, and
removed the incorrect version for the "org.apache.hadoop:hadoop-client"
dependency in yarn/pom.xml.

commit 460510a4ddf7082b24baeecbff33bfaee6438ea7
Author: witgo <wi...@qq.com>
Date:   2014-04-29T17:15:58Z

merge https://github.com/berngp/spark/commits/feature/small-shell-changes

commit f1c7535fe6e97e1d5ebf8adcac01d82c794a01f8
Author: witgo <wi...@qq.com>
Date:   2014-04-29T17:48:01Z

Improved build configuration Ⅱ

----

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---