[ https://issues.apache.org/jira/browse/SPARK-11327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Braithwaite updated SPARK-11327:
-------------------------------------

Description:

I haven't figured out exactly what's going on yet, but something in the spark-dispatcher is failing to pass properties along to the spark-driver when using spark-submit in a clustered Mesos Docker environment. Most importantly, it's not passing along spark.mesos.executor.docker.image.

cli:

{code}
docker run -t -i --rm --net=host \
  --entrypoint=/usr/local/spark/bin/spark-submit \
  docker.example.com/spark:2015.10.2 \
  --conf spark.driver.memory=8G \
  --conf spark.mesos.executor.docker.image=docker.example.com/spark:2015.10.2 \
  --master mesos://spark-dispatcher.example.com:31262 \
  --deploy-mode cluster \
  --properties-file /usr/local/spark/conf/spark-defaults.conf \
  --class com.example.spark.streaming.MyApp \
  http://jarserver.example.com:8000/sparkapp.jar \
  zk1.example.com:2181 spark-testing my-stream 40
{code}

submit output:

{code}
15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request to launch an application in mesos://compute1.example.com:31262.
15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending POST request to server at http://compute1.example.com:31262/v1/submissions/create:
{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "zk1.example.com:2181", "spark-testing", "requests", "40" ],
  "appResource" : "http://jarserver.example.com:8000/sparkapp.jar",
  "clientSparkVersion" : "1.5.0",
  "environmentVariables" : {
    "SPARK_SCALA_VERSION" : "2.10",
    "SPARK_CONF_DIR" : "/usr/local/spark/conf",
    "SPARK_HOME" : "/usr/local/spark",
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.example.spark.streaming.MyApp",
  "sparkProperties" : {
    "spark.serializer" : "org.apache.spark.serializer.KryoSerializer",
    "spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY" : "/usr/local/lib/libmesos.so",
    "spark.history.fs.logDirectory" : "hdfs://hdfsha.example.com/spark/logs",
    "spark.eventLog.enabled" : "true",
    "spark.driver.maxResultSize" : "0",
    "spark.mesos.deploy.recoveryMode" : "ZOOKEEPER",
    "spark.mesos.deploy.zookeeper.url" : "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181",
    "spark.jars" : "http://jarserver.example.com:8000/sparkapp.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "com.example.spark.streaming.MyApp",
    "spark.driver.memory" : "8G",
    "spark.logConf" : "true",
    "spark.deploy.zookeeper.dir" : "/spark_mesos_dispatcher",
    "spark.mesos.executor.docker.image" : "docker.example.com/spark-prod:2015.10.2",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "mesos://compute1.example.com:31262",
    "spark.executor.memory" : "8G",
    "spark.eventLog.dir" : "hdfs://hdfsha.example.com/spark/logs",
    "spark.mesos.docker.executor.network" : "HOST",
    "spark.mesos.executor.home" : "/usr/local/spark"
  }
}
15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151026220353-0011",
  "success" : true
}
15/10/26 22:03:53 INFO RestSubmissionClient: Submission successfully created as driver-20151026220353-0011. Polling submission state...
15/10/26 22:03:53 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20151026220353-0011 in mesos://compute1.example.com:31262.
15/10/26 22:03:53 DEBUG RestSubmissionClient: Sending GET request to server at http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011.
15/10/26 22:03:53 DEBUG RestSubmissionClient: Response from the server:
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "QUEUED",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151026220353-0011",
  "success" : true
}
15/10/26 22:03:53 INFO RestSubmissionClient: State of driver driver-20151026220353-0011 is now QUEUED.
15/10/26 22:03:53 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "serverSparkVersion" : "1.5.0",
  "submissionId" : "driver-20151026220353-0011",
  "success" : true
}
{code}
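The status endpoint shown in the debug log above can also be polled directly, which makes it easy to re-check the driver state without re-running spark-submit. This is only a minimal sketch (Scala), not part of my tooling; the host, port, and submission id are simply the ones from the log above, and JSON parsing is omitted:

{code}
// Sketch: poll the dispatcher's REST status endpoint for a submission.
// The URL is the one from the spark-submit debug log above.
object PollSubmissionStatus {
  def main(args: Array[String]): Unit = {
    val url = "http://compute1.example.com:31262/v1/submissions/status/driver-20151026220353-0011"
    // The endpoint returns a small JSON document (SubmissionStatusResponse)
    // containing driverState, submissionId, and success.
    val response = scala.io.Source.fromURL(url).mkString
    println(response)
  }
}
{code}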
driver log:

{code}
15/10/26 22:08:08 INFO SparkContext: Running Spark version 1.5.0
15/10/26 22:08:08 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
15/10/26 22:08:08 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
15/10/26 22:08:08 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, sampleName=Ops, always=false, type=DEFAULT, valueName=Time, value=[GetGroups])
15/10/26 22:08:08 DEBUG MetricsSystemImpl: UgiMetrics, User and group related metrics
15/10/26 22:08:08 DEBUG KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
15/10/26 22:08:08 DEBUG Groups: Creating new Groups object
15/10/26 22:08:08 DEBUG NativeCodeLoader: Trying to load the custom-built native-hadoop library...
15/10/26 22:08:08 DEBUG NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
15/10/26 22:08:08 DEBUG NativeCodeLoader: java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
15/10/26 22:08:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/26 22:08:08 DEBUG PerformanceAdvisory: Falling back to shell based
15/10/26 22:08:08 DEBUG JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
15/10/26 22:08:08 DEBUG Shell: Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
        at org.apache.hadoop.util.Shell.checkHadoopHome(Shell.java:302)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:327)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
        at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:130)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:94)
        at org.apache.hadoop.security.Groups.<init>(Groups.java:74)
        at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:303)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:804)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2084)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2084)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:310)
        at org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:847)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
        at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:134)
        at com.example.spark.streaming.MyApp.main(MyApp.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/10/26 22:08:08 DEBUG Shell: setsid exited with exit code 0
15/10/26 22:08:08 DEBUG Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
15/10/26 22:08:08 DEBUG UserGroupInformation: hadoop login
15/10/26 22:08:08 DEBUG UserGroupInformation: hadoop login commit
15/10/26 22:08:08 DEBUG UserGroupInformation: using local user:UnixPrincipal: root
15/10/26 22:08:08 DEBUG UserGroupInformation: Using user: "UnixPrincipal: root" with name root
15/10/26 22:08:08 DEBUG UserGroupInformation: User entry: "root"
15/10/26 22:08:08 DEBUG UserGroupInformation: UGI loginUser:root (auth:SIMPLE)
15/10/26 22:08:08 INFO SparkContext: Spark configuration:
spark.app.name=MyApp
spark.deploy.zookeeper.dir=/spark_mesos_dispatcher
spark.driver.maxResultSize=0
spark.driver.memory=8192M
spark.eventLog.dir=hdfs://hdfsha.example.com/spark/logs
spark.eventLog.enabled=true
spark.executor.memory=8G
spark.executorEnv.MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
spark.history.fs.logDirectory=hdfs://hdfsha.example.com/spark/logs
spark.jars=file:/var/lib/mesos/sandbox/sparkapp.jar
spark.logConf=true
spark.master=mesos://zk://zk1.example.com:2181/mesos
spark.mesos.deploy.recoveryMode=ZOOKEEPER
spark.mesos.deploy.zookeeper.url=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181,zk4.example.com:2181,zk5.example.com:2181
spark.mesos.docker.executor.network=HOST
spark.mesos.executor.home=/usr/local/spark
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.submit.deployMode=client
15/10/26 22:08:08 INFO SecurityManager: Changing view acls to: root
15/10/26 22:08:08 INFO SecurityManager: Changing modify acls to: root
15/10/26 22:08:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/10/26 22:08:08 DEBUG SSLOptions: No SSL protocol specified
15/10/26 22:08:08 DEBUG SSLOptions: No SSL protocol specified
15/10/26 22:08:08 DEBUG SSLOptions: No SSL protocol specified
{code}

Note that the driver's effective configuration above is missing spark.mesos.executor.docker.image (and other keys from the submission request, such as spark.driver.supervise), and spark.submit.deployMode has flipped from cluster to client.

The timestamps on the two logs differ because I don't know in advance which machine the driver will be scheduled on; once it was placed, I ran docker start -ai <id> on that host and captured the logs that way.
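To make the missing keys obvious, a check along these lines inside the driver works. This is only a sketch; it assumes an existing SparkContext named sc (as created in MyApp's main), and the keys are taken from the sparkProperties of the CreateSubmissionRequest above:

{code}
// Sketch: report which of the submitted properties actually reached the driver.
// Assumes an existing SparkContext `sc`.
val expected = Seq(
  "spark.mesos.executor.docker.image",
  "spark.mesos.docker.executor.network",
  "spark.driver.supervise",
  "spark.submit.deployMode")
for (key <- expected) {
  sc.getConf.getOption(key) match {
    case Some(value) => println(s"$key = $value")
    case None        => println(s"$key MISSING in driver config")
  }
}
{code}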
> spark-dispatcher doesn't pass along some spark properties
> ---------------------------------------------------------
>
>                 Key: SPARK-11327
>                 URL: https://issues.apache.org/jira/browse/SPARK-11327
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>            Reporter: Alan Braithwaite
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org