[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009843#comment-15009843 ]
Jo Voordeckers edited comment on SPARK-11327 at 11/17/15 11:41 PM:
-------------------------------------------------------------------

I've done more forensics on this: SPARK_EXECUTOR_OPTS gets populated with all the args in MesosClusterDispatcher, but the driver launched via SparkSubmit (executed from the dispatcher) doesn't care about SPARK_EXECUTOR_OPTS. As a result, neither the driver nor any jobs spawned from it see those properties. Appending the args onto the SparkSubmit call in the MesosClusterDispatcher, as my fix does, makes sure the driver gets the right properties and that all properties are propagated all the way down to jobs spawned from the driver.

I see two solutions here: either something along the lines of my patch, or making SparkSubmit aware of SPARK_EXECUTOR_OPTS. The latter involves parsing all variations of -Dfoo=bar and setting those system properties from inside the driver process, which I think is somewhat nasty and error-prone. (Rough, illustrative sketches of both options are at the bottom of this message.)

> spark-dispatcher doesn't pass along some spark properties
> ---------------------------------------------------------
>
>                 Key: SPARK-11327
>                 URL: https://issues.apache.org/jira/browse/SPARK-11327
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>            Reporter: Alan Braithwaite
>
> I haven't figured out exactly what's going on yet, but there's something in the spark-dispatcher which is failing to pass along properties to the spark-driver when using spark-submit in a clustered mesos docker environment. Most importantly, it's not passing along spark.mesos.executor.docker.image...
>
> cli:
> {code}
> docker run -t -i --rm --net=host --entrypoint=/usr/local/spark/bin/spark-submit docker.example.com/spark:2015.10.2 --conf spark.driver.memory=8G --conf spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 --master mesos://spark-dispatcher.example.com:31262 --deploy-mode cluster --properties-file /usr/local/spark/conf/spark-defaults.conf --class com.example.spark.streaming.MyApp http://jarserver.example.com:8000/sparkapp.jar zk1.example.com:2181 spark-testing my-stream 40
> {code}
>
> submit output:
> {code}
> 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://compute1.example.com:31262.
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server at http://compute1.example.com:31262/v1/submissions/create:
> {
>   "action" : "CreateSubmissionRequest",
>   "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ],
>   "appResource" : "http://jarserver.example.com:8000/sparkapp.jar",
>   "clientSparkVersion" : "1.5.0",
>   "environmentVariables" : {
>     "SPARK_SCALA_VERSION" : "2.10",
>     "SPARK_CONF_DIR" : "/usr/local/spark/conf",
>     "SPARK_HOME" : "/usr/local/spark",
>     "SPARK_ENV_LOADED" : "1"
>   },
>   "mainClass" : "com.example.spark.streaming.MyApp",
>   "sparkProperties" : {
>     "spark.serializer" : "org.apache.spark.serializer.KryoSerializer",
>     "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : "/usr/local/lib/libmesos.so",
>     "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs",
>     "spark.eventLog.enabled" : "true",
>     "spark.driver.maxResultSize" : "0",
>     "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER",
>     "spark.mesos.deploy.zookeeper.url" : "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181",
>     "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar",
>     "spark.driver.supervise" : "false",
>     "spark.app.name" : "com.example.spark.streaming.MyApp",
>     "spark.driver.memory" : "8G",
>     "spark.logConf" : "true",
>     "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher",
>     "spark.mesos.executor.docker.image" : "docker.example.com/spark-prod:2015.10.2",
>     "spark.submit.deployMode" : "cluster",
>     "spark.master" : "mesos://compute1.example.com:31262",
>     "spark.executor.memory" : "8G",
>     "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs",
>     "spark.mesos.docker.executor.network" : "HOST",
>     "spark.mesos.executor.home" : "/usr/local/spark"
>   }
> }
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server:
> {
>   "action" : "CreateSubmissionResponse",
>   "serverSparkVersion" : "1.5.0",
>   "submissionId" : "driver-20151026220353-0011",
>   "success" : true
> }
> 15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created as driver-20151026220353-0011. Polling submission state...
> 15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20151026220353-0011 in mesos://compute1.example.com:31262.
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server at http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011.
> 15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server:
> {
>   "action" : "SubmissionStatusResponse",
>   "driverState" : "QUEUED",
>   "serverSparkVersion" : "1.5.0",
>   "submissionId" : "driver-20151026220353-0011",
>   "success" : true
> }
> 15/10/26 22:03:53 INFO RestSubmissionClient: State of driver driver-20151026220353-0011 is now QUEUED.
> 15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
> {
>   "action" : "CreateSubmissionResponse",
>   "serverSparkVersion" : "1.5.0",
>   "submissionId" : "driver-20151026220353-0011",
>   "success" : true
> }
> {code}
>
> driver log:
> {code}
> 15/10/26 22:08:08 INFO SparkContext: Running Spark version 1.5.0
> 15/10/26 22:08:08 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
> 15/10/26 22:08:08 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
> 15/10/26 22:08:08 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[GetGroups])
> 15/10/26 22:08:08 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
> 15/10/26 22:08:08 DEBUG KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
> 15/10/26 22:08:08 DEBUG Groups: Creating new Groups object
> 15/10/26 22:08:08 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
> 15/10/26 22:08:08 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
> 15/10/26 22:08:08 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 15/10/26 22:08:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/10/26 22:08:08 DEBUG PerformanceAdvisory: Falling back to shell based
> 15/10/26 22:08:08 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
> 15/10/26 22:08:08 DEBUG Shell: Failed to detect a valid hadoop home directory
> java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
>     at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
>     at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
>     at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
>     at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:130)
>     at org.apache.hadoop.security.Groups.<init>(Groups.java:94)
>     at org.apache.hadoop.security.Groups.<init>(Groups.java:74)
>     at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:303)
>     at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
>     at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
>     at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:804)
>     at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
>     at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
>     at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
>     at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2084)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
>     at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:847)
>     at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
>     at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:134)
>     at com.example.spark.streaming.MyApp.main(MyApp.java:63)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 15/10/26 22:08:08 DEBUG Shell: setsid exited with exit code 0
> 15/10/26 22:08:08 DEBUG Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
> 15/10/26 22:08:08 DEBUG UserGroupInformation: hadoop login
> 15/10/26 22:08:08 DEBUG UserGroupInformation: hadoop login commit
> 15/10/26 22:08:08 DEBUG UserGroupInformation: using local user:UnixPrincipal: root
> 15/10/26 22:08:08 DEBUG UserGroupInformation: Using user: "UnixPrincipal: root" with name root
> 15/10/26 22:08:08 DEBUG UserGroupInformation: User entry: "root"
> 15/10/26 22:08:08 DEBUG UserGroupInformation: UGI loginUser:root (auth:SIMPLE)
> 15/10/26 22:08:08 INFO SparkContext: Spark configuration:
> spark.app.name=MyApp
> spark.deploy.zookeeper.dir=/spark_mesos_dispatcher
> spark.driver.maxResultSize=0
> spark.driver.memory=8192M
> spark.eventLog.dir=hdfs://hdfsha.example.com/spark/logs
> spark.eventLog.enabled=true
> spark.executor.memory=8G
> spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
> spark.history.fs.logDirectory=hdfs://hdfsha.example.com/spark/logs
> spark.jars=file:/var/lib/mesos/sandbox/sparkapp.jar
> spark.logConf=true
> spark.master=mesos://zk://zk1.example.com:2181/mesos
> spark.mesos.deploy.recoveryMode=ZOOKEEPER
> spark.mesos.deploy.zookeeper.url=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181
> spark.mesos.docker.executor.network=HOST
> spark.mesos.executor.home=/usr/local/spark
> spark.serializer=org.apache.spark.serializer.KryoSerializer
> spark.submit.deployMode=client
> 15/10/26 22:08:08 INFO SecurityManager: Changing view acls to: root
> 15/10/26 22:08:08 INFO SecurityManager: Changing modify acls to: root
> 15/10/26 22:08:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
> 15/10/26 22:08:08 DEBUG SSLOptions: No SSL protocol specified
> 15/10/26 22:08:08 DEBUG SSLOptions: No SSL protocol specified
> 15/10/26 22:08:08 DEBUG SSLOptions: No SSL protocol specified
> {code}
>
> The timestamps are different because I don't know in advance which machine the driver is going to be scheduled on, so once I knew, I did a docker start -ai <id> and got the logs that way.
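
To make the two options in my comment above concrete, here is a minimal, hypothetical sketch of the first approach. The function name and signature are mine, not the actual MesosClusterDispatcher code; the point is only that every spark property rides along as an explicit --conf argument on the spark-submit command the dispatcher builds for the driver, instead of being buried in SPARK_EXECUTOR_OPTS:

{code}
// Hypothetical sketch, not the actual patch. SparkSubmit parses --conf
// pairs into the driver's SparkConf, so properties passed this way also
// reach anything the driver launches afterwards.
def buildDriverCommand(
    sparkProperties: Map[String, String],
    mainClass: String,
    appJar: String,
    appArgs: Seq[String]): Seq[String] = {
  // One "--conf key=value" pair per property, e.g.
  // --conf spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2
  val confArgs = sparkProperties.toSeq.flatMap { case (key, value) =>
    Seq("--conf", s"$key=$value")
  }
  Seq("./bin/spark-submit", "--class", mainClass) ++ confArgs ++ (appJar +: appArgs)
}
{code}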
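
For contrast, a hypothetical sketch of the second option, where the driver process itself would parse SPARK_EXECUTOR_OPTS. The naive whitespace split below already mishandles quoted values containing spaces or embedded '=' signs, which is exactly why this route feels nasty and error-prone:

{code}
// Hypothetical sketch of the alternative: apply each -Dfoo=bar token
// from SPARK_EXECUTOR_OPTS as a system property inside the driver.
def applyExecutorOpts(): Unit = {
  val opts = sys.env.getOrElse("SPARK_EXECUTOR_OPTS", "")
  // Naive tokenization: breaks on quoted values with spaces.
  opts.split("""\s+""").filter(_.startsWith("-D")).foreach { token =>
    token.stripPrefix("-D").split("=", 2) match {
      case Array(key, value) => sys.props(key) = value // -Dfoo=bar
      case Array(key)        => sys.props(key) = ""    // bare -Dfoo
    }
  }
}
{code}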