[ https://issues.apache.org/jira/browse/SPARK-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao updated SPARK-16540:
--------------------------------
    Description: 
Currently, when running Spark on YARN, jars specified with \--jars or \--packages are added twice: once to Spark's own file server and once to YARN's distributed cache. This can be seen from the log.

For example:

{code}
./bin/spark-shell --master yarn-client --jars 
examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
{code}

If the jar to be added is the scopt jar, it is added twice:

{noformat}
...
16/07/14 15:06:48 INFO Server: Started @5603ms
16/07/14 15:06:48 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
16/07/14 15:06:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://192.168.0.102:4040
16/07/14 15:06:48 INFO SparkContext: Added JAR 
file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
 at spark://192.168.0.102:63996/jars/scopt_2.11-3.3.0.jar with timestamp 
1468480008637
16/07/14 15:06:49 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/07/14 15:06:49 INFO Client: Requesting a new application from cluster with 1 
NodeManagers
16/07/14 15:06:49 INFO Client: Verifying our application has not requested more 
than the maximum memory capability of the cluster (8192 MB per container)
16/07/14 15:06:49 INFO Client: Will allocate AM container, with 896 MB memory 
including 384 MB overhead
16/07/14 15:06:49 INFO Client: Setting up container launch context for our AM
16/07/14 15:06:49 INFO Client: Setting up the launch environment for our AM 
container
16/07/14 15:06:49 INFO Client: Preparing resources for our AM container
16/07/14 15:06:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
16/07/14 15:06:50 INFO Client: Uploading resource 
file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_libs__6486179704064718817.zip
 -> 
hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_libs__6486179704064718817.zip
16/07/14 15:06:51 INFO Client: Uploading resource 
file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
 -> 
hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/scopt_2.11-3.3.0.jar
16/07/14 15:06:51 INFO Client: Uploading resource 
file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_conf__326416236462420861.zip
 -> 
hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_conf__.zip
...
{noformat}
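
The same duplication can be reproduced programmatically. A minimal sketch, assuming a Spark 2.x build with YARN support on the classpath and the same scopt jar as in the command above (the app name is made up for illustration):

{code}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: with a yarn-client master, the jar below is registered with the
// driver's file server (the "Added JAR ... at spark://..." log line) and is
// also uploaded by yarn.Client to .sparkStaging (the "Uploading resource ..."
// log line), i.e. it is shipped twice.
val conf = new SparkConf()
  .setAppName("jar-double-add-repro")
  .setMaster("yarn-client")
  .setJars(Seq("examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar"))
val sc = new SparkContext(conf)
sc.listJars().foreach(println)  // prints the spark:// URL served by the file server
sc.stop()
{code}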

Adding these jars to Spark's file server is actually unnecessary, since YARN's distributed cache already ships them to the containers. This problem exists in both client and cluster modes; it was introduced by SPARK-15782, which fixed \--packages not working in spark-shell.
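
A hypothetical sketch of that direction (not the actual patch; the helper name and guard are illustrative only) would gate the file-server registration on the master:

{code}
import org.apache.spark.SparkContext

// Hypothetical helper illustrating the direction described above: skip the
// driver's file server for user jars when running on YARN, because yarn.Client
// already ships them through the YARN distributed cache.
def registerUserJars(sc: SparkContext, jars: Seq[String]): Unit = {
  if (!sc.master.startsWith("yarn")) {
    // Non-YARN deployments: executors fetch jars from spark://<driver>/jars/...
    jars.foreach(sc.addJar)
  }
  // On YARN the same jars land in hdfs://.../.sparkStaging/<appId>/ instead,
  // so a second copy on the file server is unnecessary.
}
{code}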

  was:
Currently, when running Spark on YARN, jars specified with \--jars or \--packages are added twice: once to Spark's own file server and once to YARN's distributed cache. This can be seen from the log.

For example:

{code}
./bin/spark-shell --master yarn-client --jars 
examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
{code}

If the jar to be added is the scopt jar, it is added twice:

{noformat}
16/07/14 15:06:47 INFO SparkContext: Running Spark version 2.1.0-SNAPSHOT
16/07/14 15:06:48 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
16/07/14 15:06:48 INFO SecurityManager: Changing view acls to: sshao
16/07/14 15:06:48 INFO SecurityManager: Changing modify acls to: sshao
16/07/14 15:06:48 INFO SecurityManager: Changing view acls groups to:
16/07/14 15:06:48 INFO SecurityManager: Changing modify acls groups to:
16/07/14 15:06:48 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(sshao); groups 
with view permissions: Set(); users  with modify permissions: Set(sshao); 
groups with modify permissions: Set()
16/07/14 15:06:48 INFO Utils: Successfully started service 'sparkDriver' on 
port 63996.
16/07/14 15:06:48 INFO SparkEnv: Registering MapOutputTracker
16/07/14 15:06:48 INFO SparkEnv: Registering BlockManagerMaster
16/07/14 15:06:48 INFO DiskBlockManager: Created local directory at 
/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/blockmgr-1082e581-85c4-47f2-b897-7d385494205e
16/07/14 15:06:48 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/07/14 15:06:48 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/14 15:06:48 INFO log: Logging initialized @5507ms
16/07/14 15:06:48 INFO Server: jetty-9.2.16.v20160414
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@7c5d1d25{/jobs,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@550e9be6{/jobs/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@3f96f020{/jobs/job,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@32eae6f2{/jobs/job/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@26ca61bf{/stages,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@73a0f2b{/stages/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@fa5f81c{/stages/stage,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@6a1d526c{/stages/stage/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@1f2f0109{/stages/pool,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@483b0690{/stages/pool/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@687e6293{/storage,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@6870c3c2{/storage/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@fb0a08c{/storage/rdd,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@1faf386c{/storage/rdd/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@4debbf0{/environment,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@6a5e167a{/environment/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@60e06f7d{/executors,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@66a5755{/executors/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@771a7d53{/executors/threadDump,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@6d6d480c{/executors/threadDump/json,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@e95595b{/static,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@5a917723{/,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@7e4579c7{/api,null,AVAILABLE}
16/07/14 15:06:48 INFO ContextHandler: Started 
o.e.j.s.ServletContextHandler@796f632b{/stages/stage/kill,null,AVAILABLE}
16/07/14 15:06:48 INFO ServerConnector: Started 
ServerConnector@889a8a8{HTTP/1.1}{0.0.0.0:4040}
16/07/14 15:06:48 INFO Server: Started @5603ms
16/07/14 15:06:48 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
16/07/14 15:06:48 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://192.168.0.102:4040
16/07/14 15:06:48 INFO SparkContext: Added JAR 
file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
 at spark://192.168.0.102:63996/jars/scopt_2.11-3.3.0.jar with timestamp 
1468480008637
16/07/14 15:06:49 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/07/14 15:06:49 INFO Client: Requesting a new application from cluster with 1 
NodeManagers
16/07/14 15:06:49 INFO Client: Verifying our application has not requested more 
than the maximum memory capability of the cluster (8192 MB per container)
16/07/14 15:06:49 INFO Client: Will allocate AM container, with 896 MB memory 
including 384 MB overhead
16/07/14 15:06:49 INFO Client: Setting up container launch context for our AM
16/07/14 15:06:49 INFO Client: Setting up the launch environment for our AM 
container
16/07/14 15:06:49 INFO Client: Preparing resources for our AM container
16/07/14 15:06:49 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive 
is set, falling back to uploading libraries under SPARK_HOME.
16/07/14 15:06:50 INFO Client: Uploading resource 
file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_libs__6486179704064718817.zip
 -> 
hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_libs__6486179704064718817.zip
16/07/14 15:06:51 INFO Client: Uploading resource 
file:/Users/sshao/projects/apache-spark/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar
 -> 
hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/scopt_2.11-3.3.0.jar
16/07/14 15:06:51 INFO Client: Uploading resource 
file:/private/var/folders/tb/8pw1511s2q78mj7plnq8p9g40000gn/T/spark-a446300b-84bf-43ff-bfb1-3adfb0571a42/__spark_conf__326416236462420861.zip
 -> 
hdfs://localhost:8020/user/sshao/.sparkStaging/application_1468468348998_0009/__spark_conf__.zip
...
{noformat}

Adding these jars to Spark's file server is actually unnecessary, since YARN's distributed cache already ships them to the containers. This problem exists in both client and cluster modes; it was introduced by SPARK-15782, which fixed \--packages not working in spark-shell.


> Jars specified with --jars will be added twice when running on YARN
> -------------------------------------------------------------------
>
>                 Key: SPARK-16540
>                 URL: https://issues.apache.org/jira/browse/SPARK-16540
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, YARN
>    Affects Versions: 2.0.0
>            Reporter: Saisai Shao
>


