Merge branch 'tp32'
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/92a2640b
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/92a2640b
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/92a2640b

Branch: refs/heads/master
Commit: 92a2640b6ae5191144e847696f933b4aa98e99a1
Parents: f5687ee a86097d
Author: Daniel Kuppitz <daniel_kupp...@hotmail.com>
Authored: Tue Dec 12 14:08:31 2017 -0700
Committer: Daniel Kuppitz <daniel_kupp...@hotmail.com>
Committed: Tue Dec 12 14:08:31 2017 -0700

----------------------------------------------------------------------
 docs/src/recipes/olap-spark-yarn.asciidoc | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/92a2640b/docs/src/recipes/olap-spark-yarn.asciidoc
----------------------------------------------------------------------
diff --cc docs/src/recipes/olap-spark-yarn.asciidoc
index 85bfe18,634adeb..429d282
--- a/docs/src/recipes/olap-spark-yarn.asciidoc
+++ b/docs/src/recipes/olap-spark-yarn.asciidoc
@@@ -94,14 -94,14 +94,15 @@@ $ . bin/spark-yarn.s
  ----
  hadoop = System.getenv('HADOOP_HOME')
  hadoopConfDir = System.getenv('HADOOP_CONF_DIR')
- archivePath = "/tmp/spark-gremlin.zip"
- ['bash', '-c', "rm $archivePath 2>/dev/null; cd ext/spark-gremlin/lib && zip $archivePath *.jar"].execute()
+ archive = 'spark-gremlin.zip'
+ archivePath = "/tmp/$archive"
+ ['bash', '-c', "rm -f $archivePath; cd ext/spark-gremlin/lib && zip $archivePath *.jar"].execute().waitFor()
  conf = new PropertiesConfiguration('conf/hadoop/hadoop-gryo.properties')
 -conf.setProperty('spark.master', 'yarn-client')
 -conf.setProperty('spark.yarn.dist.archives', "$archivePath")
 -conf.setProperty('spark.yarn.appMasterEnv.CLASSPATH', "./$archive/*:$hadoopConfDir")
 -conf.setProperty('spark.executor.extraClassPath', "./$archive/*:$hadoopConfDir")
 +conf.setProperty('spark.master', 'yarn')
 +conf.setProperty('spark.submit.deployMode', 'client')
 +conf.setProperty('spark.yarn.archive', "$archivePath")
 +conf.setProperty('spark.yarn.appMasterEnv.CLASSPATH', "./__spark_libs__/*:$hadoopConfDir")
 +conf.setProperty('spark.executor.extraClassPath', "./__spark_libs__/*:$hadoopConfDir")
  conf.setProperty('spark.driver.extraLibraryPath', "$hadoop/lib/native:$hadoop/lib/native/Linux-amd64-64")
  conf.setProperty('spark.executor.extraLibraryPath', "$hadoop/lib/native:$hadoop/lib/native/Linux-amd64-64")
  conf.setProperty('gremlin.spark.persistContext', 'true')
@@@ -121,14 -121,13 +122,14 @@@ Explanatio
  ~~~~~~~~~~~
  
  This recipe does not require running the `bin/hadoop/init-tp-spark.sh` script described in the
- http://tinkerpop.apache.org/docs/x.y.z/reference/#sparkgraphcomputer[reference documentation] and thus is also
+ link:http://tinkerpop.apache.org/docs/x.y.z/reference/#sparkgraphcomputer[reference documentation] and thus is also
  valid for cluster users without access permissions to do so.
 -Rather, it exploits the `spark.yarn.dist.archives` property, which points to an archive with jars on the local file
 +
 +Rather, it exploits the `spark.yarn.archive` property, which points to an archive with jars on the local file
  system and is loaded into the various YARN containers. As a result the `spark-gremlin.zip` archive becomes available
 -as the directory named `spark-gremlin.zip` in the YARN containers. The `spark.executor.extraClassPath` and
 -`spark.yarn.appMasterEnv.CLASSPATH` properties point to the files inside this archive.
 -This is why they contain the `./spark-gremlin.zip/*` item. Just because a Spark executor got the archive with
 +as the directory named `+__spark_libs__+` in the YARN containers. The `spark.executor.extraClassPath` and
 +`spark.yarn.appMasterEnv.CLASSPATH` properties point to the jars inside this directory.
 +This is why they contain the `+./__spark_lib__/*+` item. Just because a Spark executor got the archive with
  jars loaded into its container, does not mean it knows how to access them.
  
  Also the `HADOOP_GREMLIN_LIBS` mechanism is not used because it can not work for Spark on YARN as implemented (jars
@@@ -152,7 -151,7 +153,7 @@@ as long as you do not use the `spark-su
  runtime dependencies listed in the `Gremlin-Plugin-Dependencies` section of the manifest file in the `spark-gremlin`
  jar.
  
 -You may not like the idea that the Hadoop and Spark jars from the TinkerPop distribution differ from the versions in
 +You may not like the idea that the Hadoop and Spark jars from the Tinkerpop distribution differ from the versions in
  your cluster. If so, just build TinkerPop from source with the corresponding dependencies changed in the various `pom.xml`
 -files (e.g. `spark-core_2.10-1.6.1-some-vendor.jar` instead of `spark-core_2.10-1.6.1.jar`). Of course, TinkerPop will
 +files (e.g. `spark-core_2.11-2.2.0-some-vendor.jar` instead of `spark-core_2.11-2.2.0.jar`). Of course, TinkerPop will
- only build for exactly matching or slightly differing artifact versions.
+ only build for exactly matching or slightly differing artifact versions.
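
For context beyond the diff: the `conf` properties assembled in the recipe would typically be handed to `GraphFactory` so that traversals run through `SparkGraphComputer` on YARN. Below is a minimal sketch of that usage, not part of the commit, assuming a Gremlin Console session with the spark-gremlin plugin active and `conf` built as in the new version of the recipe; the traversal itself is only an illustration.

----
// Sketch only: reuses the PropertiesConfiguration `conf` built above.
graph = GraphFactory.open(conf)                          // opens the graph defined by conf/hadoop/hadoop-gryo.properties
g = graph.traversal().withComputer(SparkGraphComputer)   // route OLAP traversals through Spark on YARN
g.V().count()                                            // illustrative traversal; any OLAP traversal works here
----

Since `gremlin.spark.persistContext` is set to `true`, the Spark context should be reused across traversals, so only the first traversal pays the YARN application startup cost.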