[jira] [Updated] (SPARK-1359) SGD implementation is not efficient
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-1359:
-----------------------------------
    Remaining Estimate: (was: 168h)
     Original Estimate: (was: 168h)

> SGD implementation is not efficient
> -----------------------------------
>
>                 Key: SPARK-1359
>                 URL: https://issues.apache.org/jira/browse/SPARK-1359
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Xiangrui Meng
>            Priority: Major
>
> The SGD implementation samples a mini-batch to compute the stochastic
> gradient. This is not efficient because examples are provided via an iterator
> interface. We have to scan all of them to obtain a sample.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
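A minimal sketch of the inefficiency the issue describes: when examples are only reachable through an iterator, drawing a mini-batch (here via single-pass Bernoulli sampling) still has to visit every example. This is illustrative code, not MLlib's actual implementation; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;
import java.util.stream.IntStream;

public class MiniBatchSample {
    // Keep each example with probability `fraction`. Because the data is an
    // iterator, the full scan below is unavoidable -- this is the cost the
    // issue is pointing at: every mini-batch touches every example.
    static <T> List<T> sampleMiniBatch(Iterator<T> examples, double fraction, long seed) {
        Random rng = new Random(seed);
        List<T> batch = new ArrayList<>();
        while (examples.hasNext()) {          // full scan of the input
            T example = examples.next();
            if (rng.nextDouble() < fraction) {
                batch.add(example);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<Integer> data = IntStream.rangeClosed(1, 1000).iterator();
        List<Integer> batch = sampleMiniBatch(data, 0.1, 42L);
        // Batch size is random but concentrates near fraction * n = 100.
        System.out.println(batch.size());
    }
}
```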
[jira] [Updated] (SPARK-1359) SGD implementation is not efficient
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-1359:
-----------------------------------
    Remaining Estimate: 168h
     Original Estimate: 168h
[jira] [Updated] (SPARK-1359) SGD implementation is not efficient
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-1359:
-----------------------------------
    Remaining Estimate: (was: 504h)
     Original Estimate: (was: 504h)
[jira] [Updated] (SPARK-1359) SGD implementation is not efficient
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-1359:
-----------------------------------
    Remaining Estimate: 504h
     Original Estimate: 504h
[jira] [Commented] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049380#comment-16049380 ]

Michael Gummelt commented on SPARK-20812:
-----------------------------------------
The dispatcher won't know about the secrets themselves; it will only know the name of the secret that the user provides. The Mesos plugin that handles secrets (in DC/OS, the secret store) is what actually starts the task with the secret mounted. As for the secret store specifically, that's a DC/OS feature, not an Apache feature, so let's discuss it elsewhere.

> Add Mesos Secrets support to the spark dispatcher
> -------------------------------------------------
>
>                 Key: SPARK-20812
>                 URL: https://issues.apache.org/jira/browse/SPARK-20812
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>    Affects Versions: 2.3.0
>            Reporter: Michael Gummelt
>
> Mesos 1.4 will support secrets. In order to support sending keytabs through
> the Spark Dispatcher, or any other secret, we need to integrate this with the
> Spark Dispatcher.
>
> The integration should include support for both file-based and env-based
> secrets.
[jira] [Updated] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-20812:
------------------------------------
    Description:
        Mesos 1.4 will support secrets. In order to support sending keytabs through the Spark Dispatcher, or any other secret, we need to integrate this with the Spark Dispatcher.

        The integration should include support for both file-based and env-based secrets.

  was:
        Mesos 1.3 supports secrets. In order to support sending keytabs through the Spark Dispatcher, or any other secret, we need to integrate this with the Spark Dispatcher.

        The integration should include support for both file-based and env-based secrets.
[jira] [Updated] (SPARK-20434) Move Hadoop delegation token code from yarn to core
[ https://issues.apache.org/jira/browse/SPARK-20434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-20434:
------------------------------------
    Summary: Move Hadoop delegation token code from yarn to core  (was: Move Kerberos delegation token code from yarn to core)

> Move Hadoop delegation token code from yarn to core
> ---------------------------------------------------
>
>                 Key: SPARK-20434
>                 URL: https://issues.apache.org/jira/browse/SPARK-20434
>             Project: Spark
>          Issue Type: Task
>          Components: Mesos, Spark Core, YARN
>    Affects Versions: 2.1.0
>            Reporter: Michael Gummelt
>
> This is to enable kerberos support for other schedulers, such as Mesos.
[jira] [Created] (SPARK-21000) Add labels support to the Spark Dispatcher
Michael Gummelt created SPARK-21000:
---------------------------------------

             Summary: Add labels support to the Spark Dispatcher
                 Key: SPARK-21000
                 URL: https://issues.apache.org/jira/browse/SPARK-21000
             Project: Spark
          Issue Type: New Feature
          Components: Mesos
    Affects Versions: 2.2.1
            Reporter: Michael Gummelt


Labels can be used for tagging drivers with arbitrary data, which can then be used by an organization's tooling.
[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023383#comment-16023383 ]

Michael Gummelt commented on SPARK-4899:
----------------------------------------
Thanks Kamal. I responded to the thread, which I'll copy here:

bq. Restarting the agent without checkpointing enabled will kill the executor, but that still shouldn't cause the Spark job to fail, since Spark jobs should tolerate executor failures.

So I'm fine with adding checkpointing support, but I'm not sure it actually solves any problem.

> Support Mesos features: roles and checkpoints
> ---------------------------------------------
>
>                 Key: SPARK-4899
>                 URL: https://issues.apache.org/jira/browse/SPARK-4899
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>    Affects Versions: 1.2.0
>            Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
>
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all
> the work that was happening on the box
>
> Some of these may require a Mesos upgrade past our current 0.18.1
[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021766#comment-16021766 ]

Michael Gummelt commented on SPARK-4899:
----------------------------------------
[~drcrallen] Can you link me to the conversation you had with Tim? I can't find it on the mailing list.
[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints
[ https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021750#comment-16021750 ]

Michael Gummelt commented on SPARK-4899:
----------------------------------------
These are two separate features, which need two separate JIRAs. Role support already exists, though, so this issue should either be renamed or closed in favor of a JIRA just for checkpointing.
[jira] [Updated] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
[ https://issues.apache.org/jira/browse/SPARK-20812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-20812:
------------------------------------
    Description:
        Mesos 1.3 supports secrets. In order to support sending keytabs through the Spark Dispatcher, or any other secret, we need to integrate this with the Spark Dispatcher.

        The integration should include support for both file-based and env-based secrets.

  was: Mesos 1.3 supports secrets. In order to support sending keytabs through the Spark Dispatcher, or any other secret, we need to integrate this with the Spark Dispatcher.
[jira] [Created] (SPARK-20812) Add Mesos Secrets support to the spark dispatcher
Michael Gummelt created SPARK-20812:
---------------------------------------

             Summary: Add Mesos Secrets support to the spark dispatcher
                 Key: SPARK-20812
                 URL: https://issues.apache.org/jira/browse/SPARK-20812
             Project: Spark
          Issue Type: New Feature
          Components: Mesos
    Affects Versions: 2.3.0
            Reporter: Michael Gummelt


Mesos 1.3 supports secrets. In order to support sending keytabs through the Spark Dispatcher, or any other secret, we need to integrate this with the Spark Dispatcher.
[jira] [Commented] (SPARK-16627) --jars doesn't work in Mesos mode
[ https://issues.apache.org/jira/browse/SPARK-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016253#comment-16016253 ]

Michael Gummelt commented on SPARK-16627:
-----------------------------------------
I'm not completely sure, but I believe the dispatcher is correctly setting {{spark.jars}}; due to SPARK-10643, however, the driver does not recognize the remote jar URL.

> --jars doesn't work in Mesos mode
> ---------------------------------
>
>                 Key: SPARK-16627
>                 URL: https://issues.apache.org/jira/browse/SPARK-16627
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>            Reporter: Michael Gummelt
>
> Definitely doesn't work in cluster mode. Might not work in client mode
> either.
[jira] [Commented] (SPARK-12559) Cluster mode doesn't work with --packages
[ https://issues.apache.org/jira/browse/SPARK-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016243#comment-16016243 ]

Michael Gummelt commented on SPARK-12559:
-----------------------------------------
I changed the title from "Standalone cluster mode" to "cluster mode", since --packages doesn't work with any cluster mode.

> Cluster mode doesn't work with --packages
> -----------------------------------------
>
>                 Key: SPARK-12559
>                 URL: https://issues.apache.org/jira/browse/SPARK-12559
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.3.0
>            Reporter: Andrew Or
>
> From the mailing list:
> {quote}
> Another problem I ran into that you also might is that --packages doesn't
> work with --deploy-mode cluster. It downloads the packages to a temporary
> location on the node running spark-submit, then passes those paths to the
> node that is running the Driver, but since that isn't the same machine, it
> can't find anything and fails. The driver process *should* be the one
> doing the downloading, but it isn't. I ended up having to create a fat JAR
> with all of the dependencies to get around that one.
> {quote}
> The problem is that we currently don't upload jars to the cluster. It seems
> to fix this we either (1) do upload jars, or (2) just run the packages code
> on the driver side. I slightly prefer (2) because it's simpler.
[jira] [Updated] (SPARK-12559) Cluster mode doesn't work with --packages
[ https://issues.apache.org/jira/browse/SPARK-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gummelt updated SPARK-12559:
------------------------------------
    Summary: Cluster mode doesn't work with --packages  (was: Standalone cluster mode doesn't work with --packages)
[jira] [Commented] (SPARK-20447) spark mesos scheduler suppress call
[ https://issues.apache.org/jira/browse/SPARK-20447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983314#comment-15983314 ]

Michael Gummelt commented on SPARK-20447:
-----------------------------------------
The scheduler doesn't support suppression, no, but it does reject offers for 120s, and this is configurable:
https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L375

With Mesos' 1s offer cycle, this should allow offers to be sent to 119 other frameworks before being re-offered to Spark. Is this not sufficient?

> spark mesos scheduler suppress call
> -----------------------------------
>
>                 Key: SPARK-20447
>                 URL: https://issues.apache.org/jira/browse/SPARK-20447
>             Project: Spark
>          Issue Type: Wish
>          Components: Mesos
>    Affects Versions: 2.1.0
>            Reporter: Pavel Plotnikov
>            Priority: Minor
>
> The spark mesos scheduler never sends the suppress call to Mesos to exclude
> the application from the Mesos batch allocation process (HierarchicalDRF
> allocator) when the Spark application is idle and there are no tasks in the
> queue. As a result, the application receives a 0% cluster share because of
> dynamic resource allocation, while other applications that need additional
> resources can't receive an offer because they have a bigger cluster share,
> significantly more than 0%.
>
> About the suppress call:
> http://mesos.apache.org/documentation/latest/app-framework-development-guide/
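The 120s rejection window referenced in the comment above is driven by a configurable setting. As a hedged sketch (the property name is drawn from the Spark-on-Mesos documentation of this era, not from this issue; confirm it against your Spark version), a spark-defaults.conf entry to lengthen the window might look like:

```
# Sketch: lengthen how long Spark's Mesos scheduler filters out offers it
# has declined. The property name is an assumption based on the Mesos
# integration docs; verify before relying on it.
spark.mesos.rejectOfferDuration    240s
```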
[jira] [Created] (SPARK-20434) Move Kerberos delegation token code from yarn to core
Michael Gummelt created SPARK-20434:
---------------------------------------

             Summary: Move Kerberos delegation token code from yarn to core
                 Key: SPARK-20434
                 URL: https://issues.apache.org/jira/browse/SPARK-20434
             Project: Spark
          Issue Type: Task
          Components: Mesos, Spark Core, YARN
    Affects Versions: 2.1.0
            Reporter: Michael Gummelt


This is to enable kerberos support for other schedulers, such as Mesos.
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969379#comment-15969379 ]

Michael Gummelt commented on SPARK-20328:
-----------------------------------------
Ah, yes, of course. Thanks.

> HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
> -----------------------------------------------------------------
>
>                 Key: SPARK-20328
>                 URL: https://issues.apache.org/jira/browse/SPARK-20328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.1.1, 2.1.2
>            Reporter: Michael Gummelt
>
> In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a
> MapReduce {{JobConf}} out of the Hadoop {{Configuration}}:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138
>
> Semantically, this is a problem because a HadoopRDD does not represent a
> Hadoop MapReduce job. Practically, this is a problem because this line:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194
> results in this MapReduce-specific security code being called:
> https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130,
> which assumes the MapReduce master is configured (e.g. via
> {{yarn.resourcemanager.*}}). If it isn't, an exception is thrown.
>
> So I'm seeing this exception thrown as I'm trying to add Kerberos support
> for the Spark Mesos scheduler:
> {code}
> Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer
>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>         at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
> {code}
> I have a workaround where I set a YARN-specific configuration variable to
> trick {{TokenCache}} into thinking YARN is configured, but this is obviously
> suboptimal.
>
> The proper fix to this would likely require significant {{hadoop}}
> refactoring to make split information available without going through
> {{JobConf}}, so I'm not yet sure what the best course of action is.
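The "YARN-specific configuration variable" workaround mentioned in the issue description can be illustrated as below, with a strong caveat: the issue does not name the variable it sets, so the property here is my assumption about one plausible way to satisfy {{TokenCache}}'s renewer lookup, not a confirmed fix. The application class, jar, and principal are likewise invented placeholders.

```
# Hypothetical illustration only: give the MapReduce security code a
# "master Kerberos principal" so TokenCache finds a renewer. The property
# name, principal, class, and jar below are assumed examples, not taken
# from the issue.
spark-submit \
  --conf spark.hadoop.yarn.resourcemanager.principal=yarn/_HOST@EXAMPLE.COM \
  --class com.example.MyApp \
  my-app.jar
```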
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969341#comment-15969341 ]

Michael Gummelt commented on SPARK-16742:
-----------------------------------------
[~jerryshao] No, but you can look at our solution here:
https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129

> Kerberos support for Spark on Mesos
> -----------------------------------
>
>                 Key: SPARK-16742
>                 URL: https://issues.apache.org/jira/browse/SPARK-16742
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>            Reporter: Michael Gummelt
>
> We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be
> contributing it to Apache Spark soon.
>
> Mesosphere design doc:
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6
>
> Mesosphere code:
> https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa
[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969341#comment-15969341 ]

Michael Gummelt edited comment on SPARK-16742 at 4/14/17 6:01 PM:
------------------------------------------------------------------
[~jerryshao] No, but you can look at our solution here:
https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129

The code we upstream will be quite different, but the delegation token handling will be similar.

was (Author: mgummelt):
[~jerryshao] No, but you can look at our solution here:
https://github.com/mesosphere/spark/commit/0a2cc4248039ca989e177e96e92a594a025661fe#diff-79391110e9f26657e415aa169a004998R129
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968469#comment-15968469 ]

Michael Gummelt commented on SPARK-20328:
-----------------------------------------
bq. I have no idea what that means.

I'm pretty sure a delegation token is just another way for a subject to authenticate. The driver uses the delegation token provided to it by {{spark-submit}} to authenticate; this is what I mean by "the driver is already logged in via the delegation token". Since the driver is authenticated, it can request further delegation tokens. My point is that it shouldn't need to, because that code is not "delegating" the tokens to any other process, which is the only thing delegation tokens are needed for.

But this is neither here nor there. I think I know what I have to do.
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968432#comment-15968432 ]

Michael Gummelt commented on SPARK-20328:
-----------------------------------------
bq. It depends. e.g. on YARN, when you submit in cluster mode, the driver is running in the cluster and all it has are delegation tokens. (The TGT is only available to the launcher process.)

Right, but my understanding is that the driver is already logged in via the delegation token provided to it by the {{spark-submit}} process (via {{amContainer.setTokens}}), so it wouldn't need to fetch further delegation tokens.
[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968416#comment-15968416 ] Michael Gummelt edited comment on SPARK-20328 at 4/14/17 12:02 AM: --- bq. It shouldn't need to do it, not for the reasons you mention, but because Spark already has the necessary credentials available (either a TGT, or a valid delegation token for HDFS). But it shouldn't need delegation tokens at all, right? The authentication of the currently logged-in user, whether through the OS or through Kerberos, should be sufficient. was (Author: mgummelt): bq. It shouldn't need to do it, not for the reasons you mention, but because Spark already has the necessary credentials available (either a TGT, or a valid delegation token for HDFS). But it shouldn't need delegation tokens at all, right?
[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968411#comment-15968411 ] Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:59 PM: --- bq. The Mesos backend (I mean the code in Spark, not the Mesos service) can set the configs in the SparkContext's "hadoopConfiguration" object, can't it? I suppose this would work. It would rely on the assumption that the Mesos scheduler backend is started before the HadoopRDD is created, which happens to be true, but ideally we wouldn't have to rely on that ordering. Right now I'm just setting it in {{SparkSubmit}}, but that's not great either. I filed a Hadoop ticket for the {{FileInputFormat}} issue and linked it here. was (Author: mgummelt): > The Mesos backend (I mean the code in Spark, not the Mesos service) can set > the configs in the SparkContext's "hadoopConfiguration" object, can't it? I suppose this would work. It would rely on the assumption that the Mesos scheduler backend is started before the HadoopRDD is created, which happens to be true, but ideally we wouldn't have to rely on that ordering. Right now I'm just setting it in {{SparkSubmit}}, but that's not great either. I filed a Hadoop ticket for the {{FileInputFormat}} issue and linked it here.
[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968416#comment-15968416 ] Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:59 PM: --- bq. It shouldn't need to do it, not for the reasons you mention, but because Spark already has the necessary credentials available (either a TGT, or a valid delegation token for HDFS). But it shouldn't need delegation tokens at all, right? was (Author: mgummelt): > It shouldn't need to do it, not for the reasons you mention, but because Spark > already has the necessary credentials available (either a TGT, or a valid > delegation token for HDFS). But it shouldn't need delegation tokens at all, right?
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968416#comment-15968416 ] Michael Gummelt commented on SPARK-20328: - > It shouldn't need to do it, not for the reasons you mention, but because Spark > already has the necessary credentials available (either a TGT, or a valid > delegation token for HDFS). But it shouldn't need delegation tokens at all, right?
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968411#comment-15968411 ] Michael Gummelt commented on SPARK-20328: - > The Mesos backend (I mean the code in Spark, not the Mesos service) can set > the configs in the SparkContext's "hadoopConfiguration" object, can't it? I suppose this would work. It would rely on the assumption that the Mesos scheduler backend is started before the HadoopRDD is created, which happens to be true, but ideally we wouldn't have to rely on that ordering. Right now I'm just setting it in {{SparkSubmit}}, but that's not great either. I filed a Hadoop ticket for the {{FileInputFormat}} issue and linked it here.
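The suggestion above, seeding the config from the Mesos backend rather than {{SparkSubmit}}, might look like the following sketch. The helper and the {{spark.mesos.*}} key are invented for illustration; only {{sc.hadoopConfiguration}} is a real Spark API, and the approach still relies on the backend starting before the first {{HadoopRDD}} computes its partitions.

```scala
// Hypothetical sketch: the Mesos scheduler backend seeding the Hadoop
// configuration itself. The spark.mesos.* key and this helper are invented
// for illustration, and yarn.resourcemanager.principal is assumed to be the
// key TokenCache's renewer lookup consults.
import org.apache.spark.SparkContext

def seedRenewerPrincipal(sc: SparkContext): Unit = {
  val key = "yarn.resourcemanager.principal"
  if (sc.hadoopConfiguration.get(key) == null) {
    // Hypothetical user-facing setting for the renewer principal.
    val principal = sc.getConf.get("spark.mesos.kerberos.renewer.principal",
      "mesos/_HOST@EXAMPLE.COM")
    sc.hadoopConfiguration.set(key, principal)
  }
}
```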
[jira] [Comment Edited] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968388#comment-15968388 ] Michael Gummelt edited comment on SPARK-20328 at 4/13/17 11:27 PM: --- Hey [~vanzin], thanks for the response. Everything you said is correct, but I want to clarify one thing: > You just need to make the Mesos backend in Spark do that automatically for > the submitting user. The problem can't be solved in the Mesos backend. When I fetch delegation tokens for transmission to Executors in the Mesos backend, there's no problem. I can set whatever renewer I want. The problem is that there's a second location where delegation tokens are fetched: {{HadoopRDD}}. This is entirely separate from the fetching that the scheduler backends do (either Mesos or YARN). {{HadoopRDD}} tries to fetch split data, and ultimately calls into {{TokenCache}} in the hadoop library, which fetches delegation tokens with the renewer set to the YARN ResourceManager's principal: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213. I'm currently solving this by setting that config var in {{SparkSubmit}}. The big question I have, which I suppose is more for the {{hadoop}} team, is why in the world {{FileInputFormat}} is fetching delegation tokens at all. AFAICT, it's not sending those tokens to any other process. It's just fetching split data directly from the NameNodes, and there should be no delegation required. was (Author: mgummelt): Hey [~vanzin], thanks for the response. Everything you said is correct, but I want to clarify one thing: > You just need to make the Mesos backend in Spark do that automatically for > the submitting user. The problem can't be solved in the Mesos backend. When I fetch delegation tokens for transmission to Executors in the Mesos backend, there's no problem. I can set whatever renewer I want. The problem is that there's a second location where delegation tokens are fetched: {{HadoopRDD}}. This is entirely separate from the fetching that the scheduler backends do (either Mesos or YARN). {{HadoopRDD}} tries to fetch split data, and ultimately calls into {{TokenCache}} in the hadoop library, which fetches delegation tokens with the renewer set to the YARN ResourceManager's principal: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213. I'm currently solving this by setting that config var in {{SparkSubmit}}. The big question I have, which I suppose is more for the {{hadoop}} team, is why in the world is {{FileInputFormat}} fetching delegation tokens. AFAICT, they're not sending those tokens to any other process. They're just fetching split data directly from the Name Nodes, and there should be no delegation required.
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968388#comment-15968388 ] Michael Gummelt commented on SPARK-20328: - Hey [~vanzin], thanks for the response. Everything you said is correct, but I want to clarify one thing: > You just need to make the Mesos backend in Spark do that automatically for > the submitting user. The problem can't be solved in the Mesos backend. When I fetch delegation tokens for transmission to Executors in the Mesos backend, there's no problem. I can set whatever renewer I want. The problem is that there's a second location where delegation tokens are fetched: {{HadoopRDD}}. This is entirely separate from the fetching that the scheduler backends do (either Mesos or YARN). {{HadoopRDD}} tries to fetch split data, and ultimately calls into {{TokenCache}} in the hadoop library, which fetches delegation tokens with the renewer set to the YARN ResourceManager's principal: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java#L213 The big question I have, which I suppose is more for the {{hadoop}} team, is why in the world {{FileInputFormat}} is fetching delegation tokens at all. AFAICT, it's not sending those tokens to any other process. It's just fetching split data directly from the NameNodes, and there should be no delegation required. 
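In contrast to {{TokenCache}}'s hard-wired renewer, the following sketch shows how a scheduler backend can fetch HDFS delegation tokens itself with a renewer of its own choosing, via the standard {{FileSystem.addDelegationTokens}} API; the renewer principal is a placeholder, and this is illustrative rather than Spark's actual Mesos code.

```scala
// Sketch: fetching HDFS delegation tokens directly, choosing the renewer,
// instead of letting TokenCache insist on the YARN ResourceManager principal.
// The renewer principal below is a placeholder.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

val hadoopConf = new Configuration()
val fs = FileSystem.get(hadoopConf)

val creds = new Credentials()
// Any principal the backend controls can serve as the renewer here.
fs.addDelegationTokens("mesos/renewer@EXAMPLE.COM", creds)

// The collected tokens can then be serialized and shipped to executors.
creds.getAllTokens.forEach(t => println(t.getKind))
```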
[jira] [Updated] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-20328: Description: In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138 Semantically, this is a problem because a HadoopRDD does not represent a Hadoop MapReduce job. Practically, this is a problem because this line: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194 results in this MapReduce-specific security code being called: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130, which assumes the MapReduce master is configured (e.g. via {{yarn.resourcemanager.*}}). If it isn't, an exception is thrown. So I'm seeing this exception thrown as I'm trying to add Kerberos support for the Spark Mesos scheduler: {code} Exception in thread "main" java.io.IOException: Can't get Master Kerberos principal for use as renewer at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:205) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202) {code} I have a workaround where I set a YARN-specific configuration variable to trick {{TokenCache}} into thinking YARN is configured, but this is obviously suboptimal. 
The proper fix to this would likely require significant {{hadoop}} refactoring to make split information available without going through {{JobConf}}, so I'm not yet sure what the best course of action is.
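The workaround mentioned in the description can be sketched as a {{spark-submit}} invocation. This is a hypothetical example, not taken from the ticket: the exact property the reporter set is not stated, but {{yarn.resourcemanager.principal}} is the value {{TokenCache}} consults when resolving a renewer, and any {{spark.hadoop.*}} setting is copied into the Hadoop {{Configuration}}:

```shell
# Hypothetical sketch of the workaround: supply a YARN RM principal so
# that TokenCache can resolve a renewer instead of throwing
# "Can't get Master Kerberos principal for use as renewer".
# The principal value itself is never used by Mesos; the master URL,
# class, and jar names below are placeholders.
spark-submit \
  --master mesos://master.example.com:5050 \
  --conf spark.hadoop.yarn.resourcemanager.principal="yarn/_HOST@EXAMPLE.COM" \
  --class org.example.MyJob \
  my-job.jar
```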
[jira] [Commented] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
[ https://issues.apache.org/jira/browse/SPARK-20328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968232#comment-15968232 ] Michael Gummelt commented on SPARK-20328: - cc [~colorant] [~hfeng] [~vanzin] > HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs > - > > Key: SPARK-20328 > URL: https://issues.apache.org/jira/browse/SPARK-20328 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0, 2.1.1, 2.1.2 >Reporter: Michael Gummelt > > In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a > MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138 > Semantically, this is a problem because a HadoopRDD does not represent a > Hadoop MapReduce job. Practically, this is a problem because this line: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194 > results in this MapReduce-specific security code being called: > https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130, > which assumes the MapReduce master is configured. If it isn't, an exception > is thrown. > So I'm seeing this exception thrown as I'm trying to add Kerberos support for > the Spark Mesos scheduler. I have a workaround where I set a YARN-specific > configuration variable to trick {{TokenCache}} into thinking YARN is > configured, but this is obviously suboptimal. > The proper fix to this would likely require significant {{hadoop}} > refactoring to make split information available without going through > {{JobConf}}, so I'm not yet sure what the best course of action is. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20328) HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs
Michael Gummelt created SPARK-20328: --- Summary: HadoopRDDs create a MapReduce JobConf, but are not MapReduce jobs Key: SPARK-20328 URL: https://issues.apache.org/jira/browse/SPARK-20328 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.1.0, 2.1.1, 2.1.2 Reporter: Michael Gummelt In order to obtain {{InputSplit}} information, {{HadoopRDD}} creates a MapReduce {{JobConf}} out of the Hadoop {{Configuration}}: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L138 Semantically, this is a problem because a HadoopRDD does not represent a Hadoop MapReduce job. Practically, this is a problem because this line: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala#L194 results in this MapReduce-specific security code being called: https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/security/TokenCache.java#L130, which assumes the MapReduce master is configured. If it isn't, an exception is thrown. So I'm seeing this exception thrown as I'm trying to add Kerberos support for the Spark Mesos scheduler. I have a workaround where I set a YARN-specific configuration variable to trick {{TokenCache}} into thinking YARN is configured, but this is obviously suboptimal. The proper fix to this would likely require significant {{hadoop}} refactoring to make split information available without going through {{JobConf}}, so I'm not yet sure what the best course of action is.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963601#comment-15963601 ] Michael Gummelt commented on SPARK-16742: - bq. So, assuming that Mesos is configured properly, then it should be OK for Spark code to distribute user credentials. Right. It's just a matter of the cluster admin syncing Mesos credentials and Kerberos credentials properly. In summary, it's simpler in YARN because YARN is Kerberos-aware, whereas Mesos isn't. bq. That sounds like you might need the current code that distributes keytabs and logs in the cluster to make even client mode work in this setup. Since client mode requires network access to the Mesos master, we generally assume that the user is on the same network as their datacenter, and can thus kinit against the KDC. > Kerberos support for Spark on Mesos > --- > > Key: SPARK-16742 > URL: https://issues.apache.org/jira/browse/SPARK-16742 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Michael Gummelt > > We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be > contributing it to Apache Spark soon. > Mesosphere design doc: > https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6 > Mesosphere code: > https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963583#comment-15963583 ] Michael Gummelt commented on SPARK-16742: - bq. That sounds problematic. The way YARN works is that it actually authenticates the user. Are you saying that Mesos doesn't do user authentication? AFAICT, YARN doesn't authenticate the Linux user. The KDC authenticates the Kerberos principal, and YARN maps this principal to a Linux user via {{hadoop.security.auth_to_local}}. So if a user authenticated to the KDC via a principal "Joe", and the {{auth_to_local}} rule maps "Joe" to "root", then "Joe" can launch processes as "root", even though he never provided "root" credentials. It's up to the cluster administrator to properly set up this Kerberos -> Linux mapping. It's a similar story with Mesos. Mesos doesn't authenticate the Linux user. It authenticates the Mesos principal, and this principal is allowed to launch processes only as certain Linux users. It's up to the cluster admin to set up this mapping appropriately. The big difference is that, by default, YARN will map the Kerberos principal to the Linux user with the same name, so there's no problem, whereas Mesos will allow the driver to launch executors as any user that their Mesos principal is allowed to launch processes as. So it's up to the admin to only provide users with consistent Mesos and Kerberos credentials. bq. Are you saying that for YARN or Mesos? When YARN runs in Kerberos mode, Kerberos dictates the user. I'm talking about YARN. See the above comment. If {{auth_to_local}} is used like I think it is, then that's what ultimately determines the Linux user, not just Kerberos. bq. The use case you mention ("user starting an application in cluster mode with no kerberos credentials") sounds actually worrying I actually said a "user might not be kinit'd". They may, however, have access to the keytab. But since they're not on the same network as the KDC, they can't authenticate directly, even though they do have the creds.
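For readers unfamiliar with {{hadoop.security.auth_to_local}}, the "Joe" -> "root" mapping described in the comment above would look something like this in {{core-site.xml}}. The realm and rule here are hypothetical, shown only to illustrate the syntax:

```xml
<!-- Hypothetical rule: principals named Joe@EXAMPLE.COM become the
     local user "root"; everything else falls through to the default
     short-name mapping. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](Joe@EXAMPLE\.COM)s/.*/root/
    DEFAULT
  </value>
</property>
```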
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963469#comment-15963469 ] Michael Gummelt commented on SPARK-16742: - [~jerryshao] Great! The current RPC used in Mesos is very simple. The executor just periodically requests the latest credentials from the driver, which uses the keytab to periodically renew. We can swap in a different mechanism once that exists. I left a comment on your design doc.
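The polling scheme described in the comment above (the driver renews tokens from the keytab, and each executor periodically pulls the latest ones) can be modeled in a few lines. This is an illustrative sketch in plain Python, not Spark's actual Scala RPC code; every name here is made up for the example:

```python
import threading

class Driver:
    """Stands in for the Spark driver: a keytab-based login would
    periodically refresh this token cache."""
    def __init__(self):
        self._lock = threading.Lock()
        self._epoch = 0
        self._tokens = {"hdfs": "token-epoch-0"}

    def renew(self):
        # In the real implementation this step would use the keytab to
        # obtain fresh delegation tokens.
        with self._lock:
            self._epoch += 1
            self._tokens = {"hdfs": "token-epoch-%d" % self._epoch}

    def fetch_tokens(self):
        # The endpoint an executor's RPC request would hit.
        with self._lock:
            return dict(self._tokens)

class Executor:
    """Periodically requests the latest credentials from the driver."""
    def __init__(self, driver):
        self.driver = driver
        self.tokens = driver.fetch_tokens()

    def poll_once(self):
        # One iteration of the periodic polling loop.
        self.tokens = self.driver.fetch_tokens()

driver = Driver()
executor = Executor(driver)
driver.renew()        # driver-side keytab renewal
executor.poll_once()  # executor picks up the refreshed token
print(executor.tokens["hdfs"])  # token-epoch-1
```

The point of the design is that tokens only ever travel over the driver-executor RPC channel, never through a shared file, which is why the at-rest access-control concerns discussed elsewhere in this thread don't apply.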
[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963446#comment-15963446 ] Michael Gummelt edited comment on SPARK-16742 at 4/10/17 8:35 PM: -- [~vanzin] bq. The most basic feature needed for any kerberos-related work is user isolation (different users cannot mess with each others' processes). I was under the impression that Mesos supported that. Mesos of course supports configuring the Linux user that a process runs as. But in Spark, this isn't currently derived from the Kerberos principal. It's configured by the user. The scheduler's *Mesos* principal, along with ACLs configured in Mesos, is what determines which Linux users are allowed. That's why I was asking about {{hadoop.security.auth_to_local}}, to understand how YARN determines what Linux user to run executors as. It would be a vulnerability, for example, if the Linux user for the executors is simply derived from that of the driver, because two human users running as the same Linux user, but logged in via different Kerberos principals, would be able to see each others' tokens. bq. I don't know where this notion that cluster mode requires you to distribute keytabs comes from As you said, it's mostly the renewal use case that requires distributing the keytab, but that's not all. In many Mesos setups, and certainly in DC/OS, the submitting user might not already be kinit'd. They may be running from outside the datacenter entirely, without network access to the KDC. You're right that we could implement cluster mode in some form, but I'd rather keep the initial PR small. I hope that's acceptable.
[jira] [Commented] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962450#comment-15962450 ] Michael Gummelt commented on SPARK-16742: - Also, note that the above Mesos implementation is not dependent on Mesos in any way. It just uses Spark's existing RPC mechanisms to transmit delegation tokens. I see that there's a related effort here to standardize this RPC mechanism: https://issues.apache.org/jira/browse/SPARK-19143. We'd be more than happy to adopt that standard once it exists. But hopefully our one-off RPC that we're currently using is acceptable in the interim.
[jira] [Comment Edited] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962440#comment-15962440 ] Michael Gummelt edited comment on SPARK-16742 at 4/10/17 5:28 AM: -- Hi [~vanzin], [~ganger85] and Stratio are pulling back their Mesos Kerberos implementation for now, and we at Mesosphere are about to submit a PR to upstream our implementation. I have a few questions I'd like to run by you to make sure that PR goes smoothly. 1) I've been following your comments on this Spark Standalone Kerberos PR: https://github.com/apache/spark/pull/17530. It looks like your concern is that in *cluster mode*, the keytab is written to a file on the host running the driver, and is owned by the user of the Spark Worker, which will be the same for each job. So jobs submitted by multiple users will be able to read each other's keytabs. In *client mode*, it looks like the delegation tokens are written to a file (HADOOP_TOKEN_FILE_LOCATION) on the host running the executor, which suffers from the same problem as the keytab in cluster mode. The problem is then that a Kerberos-authenticated user submitting their job would be unaware that their credentials are being leaked to other users. Is this an accurate description of the issue? 2) I understand that YARN writes delegation tokens via {{amContainer.setTokens()}}, which ultimately results in the delegation token being written to a file owned by the submitting user. However, since the "submitting user" is a Kerberos user, not a Unix user, I'm assuming that {{hadoop.security.auth_to_local}} is what maps the Kerberos user to the Unix user who runs the ApplicationMaster and owns that file. Is that correct? To avoid the shared-file problem for delegation tokens, our Mesos implementation currently has the Executor issue an RPC call to fetch the delegation token from the driver.
There is therefore no need for at-rest access control, and if in-motion interception is in the user's threat model, they can be sure to run Spark with SSL. We avoid the shared-file problem for keytabs entirely, because there's no need to distribute the keytab, at least in client mode. Unlike YARN, the driver and the equivalent of the "ApplicationMaster" in Mesos are one and the same: they both exist in the same process, the {{spark-submit}} process. We're probably going to punt on cluster mode for now, just for simplicity, but we should be able to solve this in cluster mode as well, because unlike standalone, and much like YARN, Mesos controls what user the driver runs as. What do you think of the above approach? If you see any blockers, I would very much appreciate teasing those out now rather than during the PR. Thanks!
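As context for the {{hadoop.security.auth_to_local}} question above, the principal-to-user mapping can be sketched in a few lines. This is a simplified illustration of the rule idea, not Hadoop's actual KerberosName implementation; the realm, the rules, and the function name are made up for the example:

```python
import re

# Assumed local realm, for illustration only.
LOCAL_REALM = "EXAMPLE.COM"

def auth_to_local(principal, rules):
    """Map a Kerberos principal like 'nn/host1@EXAMPLE.COM' to a local
    Unix user, loosely following the auth_to_local idea: try each rule
    in order; 'DEFAULT' strips the realm from local-realm principals."""
    name, _, realm = principal.partition("@")
    short_name = name.split("/")[0]
    for rule in rules:
        if rule == "DEFAULT":
            if realm == LOCAL_REALM:
                return short_name
            continue
        pattern, local_user = rule
        if re.fullmatch(pattern, principal):
            return local_user
    raise ValueError("no auth_to_local rule matched " + principal)

# NameNode service principals map to the 'hdfs' system user; everything
# else in the local realm falls through to DEFAULT.
rules = [(r"nn/.*@EXAMPLE\.COM", "hdfs"), "DEFAULT"]
```

Under these illustrative rules, a principal like {{alice@EXAMPLE.COM}} maps to local user {{alice}}, while {{nn/host1@EXAMPLE.COM}} maps to {{hdfs}}, which is the kind of mapping that determines file ownership for the ApplicationMaster.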
[jira] [Commented] (SPARK-20054) [Mesos] Detectability for resource starvation
[ https://issues.apache.org/jira/browse/SPARK-20054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935605#comment-15935605 ] Michael Gummelt commented on SPARK-20054: - Sounds like this could be solved just by having some better logging? Something that indicates the driver is waiting for more registered executors? > [Mesos] Detectability for resource starvation > - > > Key: SPARK-20054 > URL: https://issues.apache.org/jira/browse/SPARK-20054 > Project: Spark > Issue Type: Improvement > Components: Mesos, Scheduler >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Kamal Gurala >Priority: Minor > > We currently use Mesos 1.1.0 for our Spark cluster in coarse-grained mode. We > recently had a production issue wherein our Spark frameworks accepted resources > from the Mesos master, so executors were started and the Spark driver was aware > of them, but the driver didn't schedule any tasks, and nothing happened for a > long time because the minimum registered resources threshold wasn't met. The > cluster is usually under-provisioned because not all the jobs need to run at the > same time. The held resources were never offered back to the master for > re-allocation, bringing the entire cluster to a halt until we manually intervened. > We use DRF for Mesos and FIFO for Spark, and at any point in time there could be > 10-15 Spark frameworks running on the under-provisioned cluster. > The ask is to have better recoverability or detectability for a scenario where > individual Spark frameworks hold onto resources but never launch any tasks, or to > have these frameworks release those resources after a fixed amount of time. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
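The better-logging suggestion could be as little as a periodic warning while the driver sits below its minimum registered-resources threshold. A minimal sketch with illustrative names, not Spark's actual scheduler code:

```python
import logging

logger = logging.getLogger("starvation-check")

def check_registration(registered_cores, expected_cores, waited_secs):
    """Warn while the driver is still waiting on executor registration,
    so resource starvation is visible in the logs instead of silent."""
    if registered_cores < expected_cores:
        logger.warning(
            "Only %d/%d cores registered after %.0fs; driver is waiting "
            "for more executors before scheduling tasks",
            registered_cores, expected_cores, waited_secs)
        return False
    return True
```

Calling this on a timer from the scheduling loop would make the "holding resources but launching nothing" state described above detectable from the driver logs alone.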
[jira] [Commented] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores
[ https://issues.apache.org/jira/browse/SPARK-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891332#comment-15891332 ] Michael Gummelt commented on SPARK-19373: - [~skonto] Either decline or hoard. > Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at > acquired cores rather than registered cores > --- > > Key: SPARK-19373 > URL: https://issues.apache.org/jira/browse/SPARK-19373 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.3, 2.0.2, 2.1.0 >Reporter: Michael Gummelt >Assignee: Michael Gummelt > Fix For: 2.1.1, 2.2.0 > > > We're currently using `totalCoresAcquired` to account for registered > resources, which is incorrect. That variable measures the number of cores > the scheduler has accepted. We should be using `totalCoreCount` like the > other schedulers do. > Fixing this is important for locality, since users often want to wait for all > executors to come up before scheduling tasks to ensure they get a node-local > placement. > original PR to add support: https://github.com/apache/spark/pull/8672/files -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores
[ https://issues.apache.org/jira/browse/SPARK-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-19373: Affects Version/s: 1.6.3 2.0.2 > Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at > acquired cores rather than registered cores > --- > > Key: SPARK-19373 > URL: https://issues.apache.org/jira/browse/SPARK-19373 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.6.3, 2.0.2, 2.1.0 >Reporter: Michael Gummelt > > We're currently using `totalCoresAcquired` to account for registered > resources, which is incorrect. That variable measures the number of cores > the scheduler has accepted. We should be using `totalCoreCount` like the > other schedulers do. > Fixing this is important for locality, since users often want to wait for all > executors to come up before scheduling tasks to ensure they get a node-local > placement. > original PR to add support: https://github.com/apache/spark/pull/8672/files -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores
[ https://issues.apache.org/jira/browse/SPARK-19373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888692#comment-15888692 ] Michael Gummelt commented on SPARK-19373: - This change makes it so that the user can instruct the driver to wait for all executors to register before scheduling tasks. The TaskSchedulerImpl understands locality, so it can then make the optimal placement. Otherwise, tasks are scheduled as soon as the first executor is registered, which of course might not be node-local for the first task. However, this still assumes that executors will be scheduled on the correct nodes, which isn't guaranteed unless you're launching executors on every node in your cluster. For the best locality functionality, we need to integrate task locality information with dynamic allocation, so that the driver can dynamically spin up executors on the needed nodes. That is outside the scope of this JIRA, though. > Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at > acquired cores rather than registered cores > --- > > Key: SPARK-19373 > URL: https://issues.apache.org/jira/browse/SPARK-19373 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > > We're currently using `totalCoresAcquired` to account for registered > resources, which is incorrect. That variable measures the number of cores > the scheduler has accepted. We should be using `totalCoreCount` like the > other schedulers do. > Fixing this is important for locality, since users often want to wait for all > executors to come up before scheduling tasks to ensure they get a node-local > placement. > original PR to add support: https://github.com/apache/spark/pull/8672/files -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
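The accepted-versus-registered distinction behind this fix can be modeled in a few lines. This is a simplified sketch; the real check lives in Spark's scheduler backends, and these names are illustrative:

```python
def sufficient_resources_registered(total_core_count, max_cores, min_ratio):
    """Return True once enough executor cores have actually *registered*
    with the driver. Gating on cores merely accepted from Mesos offers
    (the old totalCoresAcquired behavior) would claim readiness before
    any executor had come up, defeating the node-local placement wait."""
    return total_core_count >= max_cores * min_ratio
```

With a ratio of 1.0 the driver holds off scheduling until every expected core has registered, which is what lets TaskSchedulerImpl pick node-local placements; with the bug, a fully "accepted" but unregistered cluster would pass the check immediately.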
[jira] [Updated] (SPARK-19702) Add Suppress/Revive support to the Mesos Spark Dispatcher
[ https://issues.apache.org/jira/browse/SPARK-19702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-19702: Description: Due to the problem described here: https://issues.apache.org/jira/browse/MESOS-6112, running more than 5 Mesos frameworks concurrently can result in starvation. For example, running 10 dispatchers could result in 5 of them getting all the offers, even if they have no jobs to launch. We must increase the refuse_seconds timeout to solve this problem. Another option would have been to implement suppress/revive, but that can cause starvation due to the unreliability of Mesos RPC calls. (was: Due to the problem described here: https://issues.apache.org/jira/browse/MESOS-6112, running more than 5 Mesos frameworks concurrently can result in starvation. For example, running 10 dispatchers could result in 5 of them getting all the offers, even if they have no jobs to launch. We must implement explicit SUPPRESS and REVIVE calls in the Spark Dispatcher to solve this problem.) > Add Suppress/Revive support to the Mesos Spark Dispatcher > - > > Key: SPARK-19702 > URL: https://issues.apache.org/jira/browse/SPARK-19702 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > > Due to the problem described here: > https://issues.apache.org/jira/browse/MESOS-6112, running more than 5 Mesos > frameworks concurrently can result in starvation. For example, running 10 > dispatchers could result in 5 of them getting all the offers, even if they > have no jobs to launch. We must increase the refuse_seconds > timeout to solve this problem. Another option would have been to implement > suppress/revive, but that can cause starvation due to the unreliability of > Mesos RPC calls. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19702) Increase refuse_seconds timeout in the Mesos Spark Dispatcher
[ https://issues.apache.org/jira/browse/SPARK-19702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-19702: Summary: Increase refuse_seconds timeout in the Mesos Spark Dispatcher (was: Add Suppress/Revive support to the Mesos Spark Dispatcher) > Increase refuse_seconds timeout in the Mesos Spark Dispatcher > -- > > Key: SPARK-19702 > URL: https://issues.apache.org/jira/browse/SPARK-19702 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.1.0 >Reporter: Michael Gummelt > > Due to the problem described here: > https://issues.apache.org/jira/browse/MESOS-6112, running more than 5 Mesos > frameworks concurrently can result in starvation. For example, running 10 > dispatchers could result in 5 of them getting all the offers, even if they > have no jobs to launch. We must increase the refuse_seconds > timeout to solve this problem. Another option would have been to implement > suppress/revive, but that can cause starvation due to the unreliability of > Mesos RPC calls. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19702) Add Suppress/Revive support to the Mesos Spark Dispatcher
Michael Gummelt created SPARK-19702: --- Summary: Add Suppress/Revive support to the Mesos Spark Dispatcher Key: SPARK-19702 URL: https://issues.apache.org/jira/browse/SPARK-19702 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 2.1.0 Reporter: Michael Gummelt Due to the problem described here: https://issues.apache.org/jira/browse/MESOS-6112, running more than 5 Mesos frameworks concurrently can result in starvation. For example, running 10 dispatchers could result in 5 of them getting all the offers, even if they have no jobs to launch. We must implement explicit SUPPRESS and REVIVE calls in the Spark Dispatcher to solve this problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19703) Add Suppress/Revive support to the Mesos Spark Driver
Michael Gummelt created SPARK-19703: --- Summary: Add Suppress/Revive support to the Mesos Spark Driver Key: SPARK-19703 URL: https://issues.apache.org/jira/browse/SPARK-19703 Project: Spark Issue Type: New Feature Components: Mesos Affects Versions: 2.1.0 Reporter: Michael Gummelt Due to the problem described here: https://issues.apache.org/jira/browse/MESOS-6112, running more than 5 Mesos frameworks concurrently can result in starvation. For example, running 10 jobs could result in 5 of them getting all the offers, even after they've launched all their executors. This leads to starvation of the other jobs. We must implement explicit SUPPRESS and REVIVE calls in the Spark Driver to solve this problem. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
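The refuse_seconds mitigation ultimately chosen in SPARK-19702 amounts to declining offers with a long filter when a framework has no pending work, so the Mesos allocator stops re-offering the same resources to an idle framework. A schematic sketch, not the actual scheduler code; the 120-second value is an assumption for illustration:

```python
LONG_REFUSE_SECONDS = 120.0  # assumed value; what Spark settled on may differ
SHORT_REFUSE_SECONDS = 1.0

def handle_offers(offers, have_pending_work, decline):
    """Decline offers we cannot use. With no pending work, attach a long
    refuse_seconds filter so the allocator does not starve other
    frameworks by repeatedly re-offering resources to this one."""
    refuse = SHORT_REFUSE_SECONDS if have_pending_work else LONG_REFUSE_SECONDS
    for offer_id in offers:
        decline(offer_id, refuse)

# Record what an idle dispatcher would decline, and for how long.
declined = []
handle_offers(["offer-1", "offer-2"], False,
              lambda oid, secs: declined.append((oid, secs)))
```

Explicit SUPPRESS/REVIVE would achieve the same goal more directly, but as the description notes, a lost REVIVE call can itself starve the framework, which is why the timeout approach was preferred.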
[jira] [Commented] (SPARK-19479) Spark Mesos artifact split causes spark-core dependency to not pull in mesos impl
[ https://issues.apache.org/jira/browse/SPARK-19479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855091#comment-15855091 ] Michael Gummelt commented on SPARK-19479: - Yea, sorry for the inconvenience, but I announced this on the dev list. Search for "Mesos is now a maven module". If I were you, I would create an email filter for "Mesos" on the user/dev lists. This is what I do. > Spark Mesos artifact split causes spark-core dependency to not pull in mesos > impl > - > > Key: SPARK-19479 > URL: https://issues.apache.org/jira/browse/SPARK-19479 > Project: Spark > Issue Type: Bug > Components: Mesos, Spark Core >Affects Versions: 2.1.0 >Reporter: Charles Allen > > https://github.com/apache/spark/pull/14637 ( > https://issues.apache.org/jira/browse/SPARK-16967 ) forked off the mesos impl > into its own artifact, but the release notes do not call this out. This broke > our deployments because we depend on packaging with spark-core, which no > longer had any mesos awareness. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-16742) Kerberos support for Spark on Mesos
Michael Gummelt commented on SPARK-16742  Re: Kerberos support for Spark on Mesos Thomas Graves Yea, I'm pretty sure we're going to change that to use delegation tokens like the existing solutions. This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)
[jira] (SPARK-16742) Kerberos support for Spark on Mesos
Michael Gummelt commented on SPARK-16742  Re: Kerberos support for Spark on Mesos As an update, we (Mesosphere) are working with Stratio on a joint solution. Stratio will submit a WIP PR soon, and we'll have a design discussion in this JIRA issue.
[jira] (SPARK-16784) Configurable log4j settings
Michael Gummelt updated an issue  Spark / SPARK-16784 Configurable log4j settings Change By: Michael Gummelt Affects Version/s: 2.1.0
[jira] (SPARK-16784) Configurable log4j settings
Michael Gummelt commented on SPARK-16784  Re: Configurable log4j settings This actually doesn't seem to work for executors. I have a file log4j.properties.debug with the following content:

log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=INFO
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

And I've run my job as follows:

root@ip-10-0-6-74:/opt/spark/dist# ./bin/spark-shell --keytab $(pwd)/hadoop-install/keytabs/nn.10.0.2.keytab --principal nn/10.0.2.103@LOCAL --master mesos://leader.mesos:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.7-2.1.0-hadoop-2.7 --conf spark.mesos.executor.home=/opt/spark/dist --conf spark.mesos.uris=http://mgummelt-mesos.s3.amazonaws.com/log4j.properties.debug --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug"

I've verified that /mnt/mesos/sandbox/log4j.properties.debug exists in the executor's file system, and that the executor process is run with -Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug. But debug logging is not enabled, and the executors print:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/30 02:43:34 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 9@ip-10-0-2-159.us-west-2.compute.internal
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for TERM
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for HUP
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for INT
[jira] (SPARK-16784) Configurable log4j settings
Michael Gummelt edited a comment on SPARK-16784  Re: Configurable log4j settings This actually doesn't seem to work for executors. I have a file {{log4j.properties.debug}} with the following content:
{code}
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=INFO
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
{code}
And I've run my job as follows:
{code}
root@ip-10-0-6-74:/opt/spark/dist# ./bin/spark-shell --keytab $(pwd)/hadoop-install/keytabs/nn.10.0.2.keytab --principal nn/10.0.2.103@LOCAL --master mesos://leader.mesos:5050 --conf spark.mesos.executor.docker.image=mesosphere/spark:1.0.7-2.1.0-hadoop-2.7 --conf spark.mesos.executor.home=/opt/spark/dist --conf spark.mesos.uris=http://mgummelt-mesos.s3.amazonaws.com/log4j.properties.debug --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug"
{code}
I've verified that {{/mnt/mesos/sandbox/log4j.properties.debug}} exists in the executor's file system, and that the executor process is run with {{-Dlog4j.configuration=/mnt/mesos/sandbox/log4j.properties.debug}}. But debug logging is not enabled, and the executors print:
{code}
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/01/30 02:43:34 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 9@ip-10-0-2-159.us-west-2.compute.internal
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for TERM
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for HUP
17/01/30 02:43:34 INFO SignalUtils: Registered signal handler for INT
{code}
[jira] [Created] (SPARK-19373) Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores
Michael Gummelt created SPARK-19373: --- Summary: Mesos implementation of spark.scheduler.minRegisteredResourcesRatio looks at acquired cores rather than registered cores Key: SPARK-19373 URL: https://issues.apache.org/jira/browse/SPARK-19373 Project: Spark Issue Type: Bug Components: Mesos Affects Versions: 2.1.0 Reporter: Michael Gummelt We're currently using `totalCoresAcquired` to account for registered resources, which is incorrect. That variable measures the number of cores the scheduler has accepted. We should be using `totalCoreCount` like the other schedulers do. Fixing this is important for locality, since users often want to wait for all executors to come up before scheduling tasks to ensure they get a node-local placement. original PR to add support: https://github.com/apache/spark/pull/8672/files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10643) Support remote application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-10643: Summary: Support remote application download in client mode spark submit (was: Support HDFS application download in client mode spark submit) > Support remote application download in client mode spark submit > --- > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. > {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. 
> Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677467#comment-15677467 ] Michael Gummelt commented on SPARK-10643: - It's not just HDFS. HTTP urls fail as well: {code} ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local http://mgummelt-mesos.s3.amazonaws.com/spark-examples_2.11-2.0.0.jar Warning: Skip remote jar http://mgummelt-mesos.s3.amazonaws.com/spark-examples_2.11-2.0.0.jar. java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:225) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:686) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} But the docs say this is supported: "hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected" > Support HDFS application download in client mode spark submit > - > > Key: SPARK-10643 > URL: https://issues.apache.org/jira/browse/SPARK-10643 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Reporter: Alan Braithwaite >Priority: Minor > > When using mesos with docker and marathon, it would be nice to be able to > make spark-submit deployable on marathon and have that download a jar from > HDFS instead of having to package the jar with the docker. 
> {code} > $ docker run -it docker.example.com/spark:latest > /usr/local/spark/bin/spark-submit --class > com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar > Warning: Skip remote jar hdfs://hdfs/tmp/application.jar. > java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:173) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > Although I'm aware that we can run in cluster mode with mesos, we've already > built some nice tools surrounding marathon for logging and monitoring. > Code in question: > https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
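The "Skip remote jar" warning in both traces comes from client-mode spark-submit adding only local jars to the driver classpath and skipping any remote URI. Until that is supported natively, one workaround is to fetch the jar to a local path first and submit that. A rough, hypothetical sketch — the class and method names are illustrative, not part of Spark:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative helper: copy a remote (or file:) URI to a local temp jar. */
public final class JarFetcher {

    public static Path fetchToLocal(URI uri) throws IOException {
        Path local = Files.createTempFile("app-", ".jar");
        // Works for http:, https:, ftp:, and file: URLs; an hdfs: URI would
        // need a Hadoop FileSystem client instead of URL.openStream().
        try (InputStream in = uri.toURL().openStream()) {
            Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
        }
        return local;
    }

    public static void main(String[] args) throws IOException {
        // Print the local path, which can then be passed to spark-submit.
        System.out.println(fetchToLocal(URI.create(args[0])));
    }
}
```

A fix inside SparkSubmit itself would presumably take the same shape: download the remote jar, then treat it as a local one.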
[jira] [Commented] (SPARK-18232) Support Mesos CNI
[ https://issues.apache.org/jira/browse/SPARK-18232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667791#comment-15667791 ] Michael Gummelt commented on SPARK-18232: - [~rxin] Fix Version should be 2.1.0, right? > Support Mesos CNI > - > > Key: SPARK-18232 > URL: https://issues.apache.org/jira/browse/SPARK-18232 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Michael Gummelt >Assignee: Michael Gummelt > Fix For: 2.2.0 > > > Add the ability to launch containers attached to a CNI network: > http://mesos.apache.org/documentation/latest/cni/ > This allows for user-pluggable network isolation, including IP-per-container. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18232) Support Mesos CNI
Michael Gummelt created SPARK-18232: --- Summary: Support Mesos CNI Key: SPARK-18232 URL: https://issues.apache.org/jira/browse/SPARK-18232 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Michael Gummelt Add the ability to launch containers attached to a CNI network: http://mesos.apache.org/documentation/latest/cni/ This allows for user-pluggable network isolation, including IP-per-container. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622992#comment-15622992 ] Michael Gummelt commented on SPARK-16522: - This JIRA was for a bug in Mesos. If you're getting this error w/ Standalone, it's likely a different bug, and you should submit a separate JIRA. > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui >Assignee: Sun Rui > Fix For: 2.0.1, 2.1.0 > > > Spark applications running on Mesos throw exception upon exit as follows: > {noformat} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > 
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.
[jira] [Updated] (SPARK-17454) Use Mesos disk resources
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-17454: Summary: Use Mesos disk resources (was: Add option to specify Mesos resource offer constraints) > Use Mesos disk resources > > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister > > Currently the driver will accept offers from Mesos which have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of attributes on the offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17454) Add option to specify Mesos resource offer constraints
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517007#comment-15517007 ] Michael Gummelt commented on SPARK-17454: - So you're trying to only launch executors on nodes with a sufficient amount of disk space? > Add option to specify Mesos resource offer constraints > -- > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister > > Currently the driver will accept offers from Mesos which have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of attributes on the offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16742: Description: We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon. Mesosphere design doc: https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6 Mesosphere code: https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa was: We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon. https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa > Kerberos support for Spark on Mesos > --- > > Key: SPARK-16742 > URL: https://issues.apache.org/jira/browse/SPARK-16742 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Michael Gummelt > > We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be > contributing it to Apache Spark soon. > Mesosphere design doc: > https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6 > Mesosphere code: > https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16742: Description: We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon. https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa was: We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon. https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa > Kerberos support for Spark on Mesos > --- > > Key: SPARK-16742 > URL: https://issues.apache.org/jira/browse/SPARK-16742 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Michael Gummelt > > We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be > contributing it to Apache Spark soon. > https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-16742) Kerberos support for Spark on Mesos
[ https://issues.apache.org/jira/browse/SPARK-16742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16742: Description: We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon. https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa was:We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be contributing it to Apache Spark soon. > Kerberos support for Spark on Mesos > --- > > Key: SPARK-16742 > URL: https://issues.apache.org/jira/browse/SPARK-16742 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Michael Gummelt > > We at Mesosphere have written Kerberos support for Spark on Mesos. We'll be > contributing it to Apache Spark soon. > https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17454) Add option to specify Mesos resource offer constraints
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15482682#comment-15482682 ] Michael Gummelt commented on SPARK-17454: - As of Spark 2.0, Mesos mode supports spark.executor.cores. And the scheduler doesn't reserve any disk. It just writes to the local workspace. Do you have a need for disk reservation? > Add option to specify Mesos resource offer constraints > -- > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister > > Currently the driver will accept offers from Mesos which have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of attributes on the offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
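To make the requested constraint idea concrete: a check like the one proposed would compare each incoming offer against per-resource minimums before accepting it. A simplified, hypothetical sketch — plain maps stand in for the actual Mesos resource protos and Spark's scheduler types:

```java
import java.util.Map;

/** Hypothetical, simplified offer filter: accept a Mesos offer only if it
 *  carries at least the required amount of each scalar resource
 *  (e.g. cpus, mem, disk). */
public final class OfferFilter {

    public static boolean meetsConstraints(Map<String, Double> offered,
                                           Map<String, Double> required) {
        // Every required resource must be present in the offer in at least
        // the requested quantity; missing resources count as zero.
        return required.entrySet().stream().allMatch(e ->
            offered.getOrDefault(e.getKey(), 0.0) >= e.getValue());
    }
}
```

This mirrors how spark.mesos.constraints matches offer attributes; the difference here is matching scalar resource quantities instead.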
[jira] [Created] (SPARK-17419) Mesos virtual network support
Michael Gummelt created SPARK-17419: --- Summary: Mesos virtual network support Key: SPARK-17419 URL: https://issues.apache.org/jira/browse/SPARK-17419 Project: Spark Issue Type: Task Components: Mesos Reporter: Michael Gummelt http://mesos.apache.org/documentation/latest/cni/ This will enable launching executors into virtual networks for isolation and security. It will also enable container per IP. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-17067) Revocable resource support
[ https://issues.apache.org/jira/browse/SPARK-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-17067: Description: Blocked by https://issues.apache.org/jira/browse/MESOS-4392 > Revocable resource support > -- > > Key: SPARK-17067 > URL: https://issues.apache.org/jira/browse/SPARK-17067 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Michael Gummelt > > Blocked by https://issues.apache.org/jira/browse/MESOS-4392 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-11183) enable support for mesos 0.24+
[ https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt closed SPARK-11183. --- Resolution: Done > enable support for mesos 0.24+ > -- > > Key: SPARK-11183 > URL: https://issues.apache.org/jira/browse/SPARK-11183 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Ioannis Polyzos > > In mesos 0.24, the mesos leader info in ZK was changed to JSON; this results in > Spark failing to run on 0.24+. > References : > https://issues.apache.org/jira/browse/MESOS-2340 > > http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E > https://github.com/mesos/elasticsearch/issues/338 > https://github.com/spark-jobserver/spark-jobserver/issues/267 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-6679) java.lang.ClassNotFoundException on Mesos fine grained mode and input replication
[ https://issues.apache.org/jira/browse/SPARK-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt closed SPARK-6679. -- Resolution: Won't Fix fine-grained is deprecated > java.lang.ClassNotFoundException on Mesos fine grained mode and input > replication > - > > Key: SPARK-6679 > URL: https://issues.apache.org/jira/browse/SPARK-6679 > Project: Spark > Issue Type: Bug > Components: Mesos, Streaming >Affects Versions: 1.3.0 >Reporter: Ondřej Smola > > Spark Streaming 1.3.0, Mesos 0.21.1 - Only when using fine grained mode and > receiver input replication (StorageLevel.MEMORY_ONLY_2, > StorageLevel.MEMORY_AND_DISK_2). When using coarse grained mode it works. > When not using replication (StorageLevel.MEMORY_ONLY ...) it works. Error: > {code} > 15/03/26 14:50:00 ERROR TransportRequestHandler: Error while invoking > RpcHandler#receive() on RPC id 7178767328921933569 > java.lang.ClassNotFoundException: org/apache/spark/storage/StorageLevel > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:344) > at > org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) > at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at > org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68) > at > org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:88) > at > org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:65) > at > org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124) > at > 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319) > at > io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787) > at > io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130) > at > io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) > at > io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354) > at > io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA 
[jira] [Closed] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
[ https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt closed SPARK-5197. -- Resolution: Won't Fix fine-grained is deprecated > Support external shuffle service in fine-grained mode on mesos cluster > -- > > Key: SPARK-5197 > URL: https://issues.apache.org/jira/browse/SPARK-5197 > Project: Spark > Issue Type: Improvement > Components: Deploy, Mesos, Shuffle >Reporter: Jongyoul Lee > > I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, > which already offers resources dynamically and returns them automatically when a > task finishes. It doesn't, however, have a mechanism to support an external > shuffle service the way YARN does with its AuxiliaryService. Because Mesos > doesn't support an AuxiliaryService, we need a different way to do this. > - Launching the shuffle service like a Spark job on the same cluster > -- Pros > --- Supports multi-tenant environments > --- Almost the same approach as YARN's > -- Cons > --- Must manage a long-running 'background' job - the service - while Mesos runs > --- Requires every slave - or host - to have one shuffle service running at all times > - Launching jobs within the shuffle service > -- Pros > --- Easy to implement > --- No need to check whether a shuffle service exists > -- Cons > --- Multiple shuffle services exist in a multi-tenant environment > --- Must manage the shuffle service port dynamically in a multi-user environment > In my opinion, the first one is the better way to support an external shuffle > service. Please leave comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17320) Spark Mesos module not building on PRs
Michael Gummelt created SPARK-17320: --- Summary: Spark Mesos module not building on PRs Key: SPARK-17320 URL: https://issues.apache.org/jira/browse/SPARK-17320 Project: Spark Issue Type: Task Components: Mesos Affects Versions: 2.0.0 Reporter: Michael Gummelt Fix For: 2.0.1 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-16627) --jars doesn't work in Mesos mode
[ https://issues.apache.org/jira/browse/SPARK-16627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt closed SPARK-16627. --- Resolution: Won't Fix > --jars doesn't work in Mesos mode > - > > Key: SPARK-16627 > URL: https://issues.apache.org/jira/browse/SPARK-16627 > Project: Spark > Issue Type: Bug > Components: Mesos >Reporter: Michael Gummelt > > Definitely doesn't work in cluster mode. Might not work in client mode > either. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17240) SparkConf is Serializable but contains a non-serializable field
[ https://issues.apache.org/jira/browse/SPARK-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437436#comment-15437436 ] Michael Gummelt commented on SPARK-17240: - cc [~vanzin] > SparkConf is Serializable but contains a non-serializable field > --- > > Key: SPARK-17240 > URL: https://issues.apache.org/jira/browse/SPARK-17240 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1 >Reporter: Michael Gummelt > Fix For: 2.0.1 > > > This commit: > https://github.com/apache/spark/commit/5da6c4b24f512b63cd4e6ba7dd8968066a9396f5 > Added ConfigReader to SparkConf. SparkConf is Serializable, but ConfigReader > is not, which results in the following exception: > {code} > java.io.NotSerializableException: > org.apache.spark.internal.config.ConfigReader > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at org.apache.spark.util.Utils$.serialize(Utils.scala:134) > at > org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:111) > at > org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:170) > at > 
org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:126) > at > org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:265) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) > at > org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) > at org.spark_project.jetty.server.Server.handle(Server.java:499) > at > org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311) > at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) > at > org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17240) SparkConf is Serializable but contains a non-serializable field
Michael Gummelt created SPARK-17240: --- Summary: SparkConf is Serializable but contains a non-serializable field Key: SPARK-17240 URL: https://issues.apache.org/jira/browse/SPARK-17240 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.1 Reporter: Michael Gummelt Fix For: 2.0.1 This commit: https://github.com/apache/spark/commit/5da6c4b24f512b63cd4e6ba7dd8968066a9396f5 Added ConfigReader to SparkConf. SparkConf is Serializable, but ConfigReader is not, which results in the following exception: {code} java.io.NotSerializableException: org.apache.spark.internal.config.ConfigReader at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at org.apache.spark.util.Utils$.serialize(Utils.scala:134) at org.apache.spark.scheduler.cluster.mesos.ZookeeperMesosClusterPersistenceEngine.persist(MesosClusterPersistenceEngine.scala:111) at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.submitDriver(MesosClusterScheduler.scala:170) at org.apache.spark.deploy.rest.mesos.MesosSubmitRequestServlet.handleSubmit(MesosRestServer.scala:126) at org.apache.spark.deploy.rest.SubmitRequestServlet.doPost(RestSubmissionServer.scala:265) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) 
at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:812) at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:587) at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127) at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061) at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.spark_project.jetty.server.Server.handle(Server.java:499) at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:311) at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:257) at org.spark_project.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544) at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) at org.spark_project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
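The failure mode above is generic Java serialization behavior, not anything Mesos-specific. A minimal sketch (hypothetical `ReaderLike`/`ConfLike` stand-ins, not Spark's actual classes) shows why a `Serializable` class holding a non-serializable field throws `NotSerializableException`, and one common fix: marking the field `transient` so it is skipped during serialization (and reinitialized lazily after deserialization).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for ConfigReader: an ordinary class that is NOT Serializable.
class ReaderLike {
    String get(String key) { return System.getProperty(key, ""); }
}

// Stand-in for SparkConf: Serializable, but holds a non-serializable field.
// Without `transient`, writeObject would throw
// java.io.NotSerializableException: ReaderLike.
class ConfLike implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient ReaderLike reader = new ReaderLike();
    private final String appName;
    ConfLike(String appName) { this.appName = appName; }
    String appName() { return appName; }
}

public class SerializationSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(new ConfLike("demo")); // succeeds: reader skipped
        ConfLike back = (ConfLike) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        // prints "demo"; the transient reader field is null after deserialization
        System.out.println(back.appName());
    }
}
```

Note the trade-off: a `transient` field comes back as `null`, so the owning class must recreate it on first use after deserialization.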
[jira] [Commented] (SPARK-10401) spark-submit --unsupervise
[ https://issues.apache.org/jira/browse/SPARK-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428572#comment-15428572 ] Michael Gummelt commented on SPARK-10401: - This should probably be a separate JIRA, but I'm just adding a note here that {{--kill}} doesn't seem to kill the job immediately. It invokes Mesos' {{killTask}} function, which runs a {{docker stop}} for docker images. This sends a SIGTERM, which seems to be ignored, then sends a SIGKILL after 10s, which ultimately kills the job. I'd like to find out why the SIGTERM is ignored. > spark-submit --unsupervise > --- > > Key: SPARK-10401 > URL: https://issues.apache.org/jira/browse/SPARK-10401 > Project: Spark > Issue Type: New Feature > Components: Deploy, Mesos >Affects Versions: 1.5.0 >Reporter: Alberto Miorin > > When I submit a streaming job with the option --supervise to the new mesos > spark dispatcher, I cannot decommission the job. > I tried spark-submit --kill, but dispatcher always restarts it. > Driver and Executors are both Docker containers. > I think there should be a subcommand spark-submit --unsupervise -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17067) Revocable resource support
[ https://issues.apache.org/jira/browse/SPARK-17067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421698#comment-15421698 ] Michael Gummelt commented on SPARK-17067: - Add revocable resource support: http://mesos.apache.org/documentation/latest/oversubscription/ This will allow higher-priority jobs (or other, non-Spark services) to preempt lower-priority jobs. > Revocable resource support > -- > > Key: SPARK-17067 > URL: https://issues.apache.org/jira/browse/SPARK-17067 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Michael Gummelt >
[jira] [Created] (SPARK-17067) Revocable resource support
Michael Gummelt created SPARK-17067: --- Summary: Revocable resource support Key: SPARK-17067 URL: https://issues.apache.org/jira/browse/SPARK-17067 Project: Spark Issue Type: Improvement Components: Mesos Reporter: Michael Gummelt
[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417857#comment-15417857 ] Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:11 PM: -- {{log4j.debug=true}} only results in log4j printing its internal debugging messages (e.g. config file location, appenders, etc.). It doesn't turn on debug logging for the application. was (Author: mgummelt): {{log4j.debug=true}} only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
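For contrast with {{log4j.debug}} (which only surfaces log4j's own configuration diagnostics), application-level debug logging in log4j 1.x is enabled in the configuration file itself. A minimal sketch of such a {{log4j.properties}}, along the lines of Spark's bundled template:

```properties
# Turn on DEBUG for the application's loggers.
# (log4j.debug=true, by contrast, only makes log4j print its own
# internal configuration diagnostics.)
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

In client mode this file can simply be edited in place; the difficulty described in this issue is getting an equivalent file to the driver in cluster mode.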
[jira] [Commented] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417856#comment-15417856 ] Michael Gummelt commented on SPARK-16784: - `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15417857#comment-15417857 ] Michael Gummelt edited comment on SPARK-16784 at 8/11/16 8:10 PM: -- {{log4j.debug=true}} only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. was (Author: mgummelt): `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt reopened SPARK-16784: - `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application. > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-16784) Configurable log4j settings
[ https://issues.apache.org/jira/browse/SPARK-16784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16784: Comment: was deleted (was: `log4j.debug=true` only results in log4j printing its debugging messages. It doesn't turn on debug logging for the application.) > Configurable log4j settings > --- > > Key: SPARK-16784 > URL: https://issues.apache.org/jira/browse/SPARK-16784 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Michael Gummelt > > I often want to change the logging configuration on a single spark job. This > is easy in client mode. I just modify log4j.properties. It's difficult in > cluster mode, because I need to modify the log4j.properties in the > distribution in which the driver runs. I'd like a way of setting this > dynamically, such as a java system property. Some brief searching showed > that log4j doesn't seem to accept such a property, but I'd like to open up > this idea for further comment. Maybe we can find a solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17002) Document that spark.ssl.protocol. is required for SSL
Michael Gummelt created SPARK-17002: --- Summary: Document that spark.ssl.protocol. is required for SSL Key: SPARK-17002 URL: https://issues.apache.org/jira/browse/SPARK-17002 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.0.0, 1.6.2 Reporter: Michael Gummelt cc [~jlewandowski] I was trying to start the Spark master. When setting {{spark.ssl.enabled=true}}, but failing to set {{spark.ssl.protocol}}, I get this none-too-helpful error message: {code} 16/08/10 15:17:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mgummelt); users with modify permissions: Set(mgummelt) 16/08/10 15:17:50 WARN SecurityManager: Using 'accept-all' trust manager for SSL connections. Exception in thread "main" java.security.KeyManagementException: Default SSLContext is initialized automatically at sun.security.ssl.SSLContextImpl$DefaultSSLContext.engineInit(SSLContextImpl.java:749) at javax.net.ssl.SSLContext.init(SSLContext.java:282) at org.apache.spark.SecurityManager.(SecurityManager.scala:284) at org.apache.spark.deploy.master.Master$.startRpcEnvAndEndpoint(Master.scala:1121) at org.apache.spark.deploy.master.Master$.main(Master.scala:1106) at org.apache.spark.deploy.master.Master.main(Master.scala) {code} We should document that {{spark.ssl.protocol}} is required, and throw a more helpful error message when it isn't set. In fact, we should remove the `getOrElse` here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SecurityManager.scala#L285, since the following line fails when the protocol is set to "Default" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
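A hedged sketch of the proposed fail-fast behavior (a hypothetical helper, not Spark's actual `SecurityManager` code): instead of a `getOrElse` fallback that blows up later with an obscure `KeyManagementException`, validate the required key up front and raise an actionable error naming the missing setting.

```java
import java.util.Map;

public class RequiredConf {
    // Hypothetical helper: fetch a required config value or fail with a
    // message that tells the operator exactly which key to set.
    static String required(Map<String, String> conf, String key) {
        String v = conf.get(key);
        if (v == null || v.isEmpty()) {
            throw new IllegalArgumentException(
                key + " must be set when spark.ssl.enabled=true");
        }
        return v;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("spark.ssl.enabled", "true");
        try {
            required(conf, "spark.ssl.protocol");
        } catch (IllegalArgumentException e) {
            // prints: spark.ssl.protocol must be set when spark.ssl.enabled=true
            System.out.println(e.getMessage());
        }
    }
}
```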
[jira] [Reopened] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt reopened SPARK-16522: - Reopening so we can track this until it's merged into the 2.0 branch. Also changed the fix version to 2.0.1 > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui >Assignee: Sun Rui > Fix For: 2.0.1 > > > Spark applications running on Mesos throw exception upon exit as follows: > {noformat} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > 
org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark
[jira] [Updated] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Gummelt updated SPARK-16522: Fix Version/s: (was: 2.1.0) 2.0.1 > [MESOS] Spark application throws exception on exit > -- > > Key: SPARK-16522 > URL: https://issues.apache.org/jira/browse/SPARK-16522 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui >Assignee: Sun Rui > Fix For: 2.0.1 > > > Spark applications running on Mesos throw exception upon exit as follows: > {noformat} > 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: 
Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > ... 4 more > Exception in thread "Thread-47" org.apache.spark.SparkException: Error > notifying standalone scheduler's driver endpoint > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555) > at > org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495) > Caused by: org.apache.spark.SparkException: Error sending message [message = > RemoveExecutor(1,Executor finished with state FINISHED)] > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412) > ... 
2 more > Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > ... 4 more > Caused by: org.apache.spark.SparkException: Could not find > CoarseGrainedScheduler. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225) >
[jira] [Commented] (SPARK-16967) Collect Mesos support code into a module/profile
[ https://issues.apache.org/jira/browse/SPARK-16967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414004#comment-15414004 ] Michael Gummelt commented on SPARK-16967: - Will do > Collect Mesos support code into a module/profile > > > Key: SPARK-16967 > URL: https://issues.apache.org/jira/browse/SPARK-16967 > Project: Spark > Issue Type: Task > Components: Mesos, Spark Core >Affects Versions: 2.0.0 >Reporter: Sean Owen >Priority: Critical > > CC [~mgummelt] [~tnachen] [~skonto] > I think this is fairly easy and would be beneficial as more work goes into > Mesos. It should separate into a module like YARN does, just on principle > really, but because it also means anyone that doesn't need Mesos support can > build without it. > I'm entirely willing to take a shot at this.
[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos
[ https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413999#comment-15413999 ] Michael Gummelt commented on SPARK-12909: - I agree. I just spoke with Reynold about this. I'll create the module before the next big feature. > Spark on Mesos accessing Secured HDFS w/Kerberos > > > Key: SPARK-12909 > URL: https://issues.apache.org/jira/browse/SPARK-12909 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Greg Senia > > Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data It > seems like this is not possible based on email chains and forum articles? If > these are true how hard would it be to get this implemented I'm willing to > try to help. > https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html > https://www.mail-archive.com/user@spark.apache.org/msg31326.html
[jira] [Commented] (SPARK-12909) Spark on Mesos accessing Secured HDFS w/Kerberos
[ https://issues.apache.org/jira/browse/SPARK-12909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412624#comment-15412624 ] Michael Gummelt commented on SPARK-12909: - DC/OS Spark has this functionality, and we'll be upstreaming it to Apache Spark soon. > Spark on Mesos accessing Secured HDFS w/Kerberos > > > Key: SPARK-12909 > URL: https://issues.apache.org/jira/browse/SPARK-12909 > Project: Spark > Issue Type: New Feature > Components: Mesos >Reporter: Greg Senia > > Ability for Spark on Mesos to use a Kerberized HDFS FileSystem for data It > seems like this is not possible based on email chains and forum articles? If > these are true how hard would it be to get this implemented I'm willing to > try to help. > https://community.hortonworks.com/questions/5415/spark-on-yarn-vs-mesos.html > https://www.mail-archive.com/user@spark.apache.org/msg31326.html
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412461#comment-15412461 ] Michael Gummelt commented on SPARK-11638: - [~radekg] > The only advantage we had was using the same configuration inside of the > docker container. You mean you want to run the spark driver in a docker container? Which configuration did you have to change? I can look more into this, but I need a clear "It's easier/better to do X in bridge mode than in host mode". > So with the HTTP API, Spark would still require the heavy libmesos in order > to work with Mesos? No. The HTTP API will remove the libmesos dependency, which is nice. It's not an urgent priority though. > Run Spark on Mesos with bridge networking > - > > Key: SPARK-11638 > URL: https://issues.apache.org/jira/browse/SPARK-11638 > Project: Spark > Issue Type: Improvement > Components: Mesos, Spark Core >Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0 >Reporter: Radoslaw Gruchalski > Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, > 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch > > > h4. Summary > Provides {{spark.driver.advertisedPort}}, > {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and > {{spark.replClassServer.advertisedPort}} settings to enable running Spark in > Mesos on Docker with Bridge networking. Provides patches for Akka Remote to > enable Spark driver advertisement using alternative host and port. > With these settings, it is possible to run Spark Master in a Docker container > and have the executors running on Mesos talk back correctly to such Master. > The problem is discussed on the Mesos mailing list here: > https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E > h4. 
Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door > In order for the framework to receive orders in the bridged container, Mesos > in the container has to register for offers using the IP address of the > Agent. Offers are sent by Mesos Master to the Docker container running on a > different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} > would advertise itself using the IP address of the container, something like > {{172.x.x.x}}. Obviously, Mesos Master can't reach that address, it's a > different host, it's a different machine. Mesos 0.24.0 introduced two new > properties for {{libprocess}} - {{LIBPROCESS_ADVERTISE_IP}} and > {{LIBPROCESS_ADVERTISE_PORT}}. This allows the container to use the Agent's > address to register for offers. This was provided mainly for running Mesos in > Docker on Mesos. > h4. Spark - how does the above relate and what is being addressed here? > Similar to Mesos, out of the box, Spark does not allow to advertise its > services on ports different than bind ports. Consider following scenario: > Spark is running inside a Docker container on Mesos, it's a bridge networking > mode. Assuming a port {{}} for the {{spark.driver.port}}, {{6677}} for > the {{spark.fileserver.port}}, {{6688}} for the {{spark.broadcast.port}} and > {{23456}} for the {{spark.replClassServer.port}}. If such task is posted to > Marathon, Mesos will give 4 ports in range {{31000-32000}} mapping to the > container ports. Starting the executors from such container results in > executors not being able to communicate back to the Spark Master. > This happens because of 2 things: > Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} > transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port > different to what it bound to. The settings discussed are here: > https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. 
> These do not exist in Akka {{2.3.x}}. Spark driver will always advertise > port {{}} as this is the one {{akka-remote}} is bound to. > Any URIs the executors contact the Spark Master on, are prepared by Spark > Master and handed over to executors. These always contain the port number > used by the Master to find the service on. The services are: > - {{spark.broadcast.port}} > - {{spark.fileserver.port}} > - {{spark.replClassServer.port}} > all above ports are by default {{0}} (random assignment) but can be specified > using Spark configuration ( {{-Dspark...port}} ). However, they are limited > in the same way as the {{spark.driver.port}}; in the above example, an > executor should not contact the file server on port {{6677}} but rather on > the respective 31xxx assigned by Mesos. > Spark currently does not allow any of that. > h4. Taking on the problem, step 1: Spark Driver > As mentione
[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411315#comment-15411315 ] Michael Gummelt commented on SPARK-16944: - Yea, we typically call it "delay scheduling". It was first written about by the Spark/Mesos researchers: http://elmeleegy.com/khaled/papers/delay_scheduling.pdf Spark already has `spark.locality.wait`, but that only controls how long the task scheduler will wait for an executor with the preferred locality to come up. We need a similar concept for waiting for offers to come in, so we can place the executor correctly in the first place. > [MESOS] Improve data locality when launching new executors when dynamic > allocation is enabled > - > > Key: SPARK-16944 > URL: https://issues.apache.org/jira/browse/SPARK-16944 > Project: Spark > Issue Type: New Feature > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Sun Rui > > Currently Spark on Yarn supports better data locality by considering the > preferred locations of the pending tasks when dynamic allocation is enabled. > Refer to https://issues.apache.org/jira/browse/SPARK-4352. It would be better > that Mesos can also support this feature. > I guess that some logic existing in Yarn could be reused by Mesos.
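The "delay scheduling" idea described above can be sketched as a small placement rule over offers (hypothetical types, not Spark's scheduler API): take an offer on a preferred host immediately, but only fall back to an arbitrary offer once a configured wait, analogous to `spark.locality.wait`, has elapsed.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class OfferPlacement {
    // Decide whether to accept one of the currently offered hosts.
    static Optional<String> choose(List<String> offerHosts,
                                   Set<String> preferredHosts,
                                   long waitedMs,
                                   long localityWaitMs) {
        for (String host : offerHosts) {
            if (preferredHosts.contains(host)) {
                return Optional.of(host);          // locality match: take it now
            }
        }
        if (waitedMs >= localityWaitMs && !offerHosts.isEmpty()) {
            return Optional.of(offerHosts.get(0)); // waited long enough: give up on locality
        }
        return Optional.empty();                   // decline and wait for a better offer
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("a", "b"), Set.of("b"), 0, 3000));   // Optional[b]
        System.out.println(choose(List.of("a"), Set.of("b"), 0, 3000));        // Optional.empty
        System.out.println(choose(List.of("a"), Set.of("b"), 5000, 3000));     // Optional[a]
    }
}
```

The point of the issue is that Spark's task scheduler already applies this rule to executors it has, but the Mesos backend would need an analogous rule when deciding which offers to use to *place* executors in the first place.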
[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411309#comment-15411309 ]

Michael Gummelt commented on SPARK-16944:
-

Since Mesos is offer based, it's up to the Spark scheduler itself to choose which offers have the best locality. In YARN, I think they tell the resource manager about preferences.
[jira] [Commented] (SPARK-16944) [MESOS] Improve data locality when launching new executors when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411307#comment-15411307 ]

Michael Gummelt commented on SPARK-16944:
-

I think we can improve both with and without dynamic allocation. In both modes, Mesos is only looking at locality after it's already placed the executors.
[jira] [Commented] (SPARK-11638) Run Spark on Mesos with bridge networking
[ https://issues.apache.org/jira/browse/SPARK-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411277#comment-15411277 ]

Michael Gummelt commented on SPARK-11638:
-

This JIRA is complex and a lot of it is out of date. Can someone briefly explain to me what the problem is? Why do you want bridge networking?

> Run Spark on Mesos with bridge networking
> -
>
> Key: SPARK-11638
> URL: https://issues.apache.org/jira/browse/SPARK-11638
> Project: Spark
> Issue Type: Improvement
> Components: Mesos, Spark Core
> Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2, 1.6.0
> Reporter: Radoslaw Gruchalski
> Attachments: 1.4.0.patch, 1.4.1.patch, 1.5.0.patch, 1.5.1.patch, 1.5.2.patch, 1.6.0.patch, 2.3.11.patch, 2.3.4.patch
>
> h4. Summary
> Provides {{spark.driver.advertisedPort}}, {{spark.fileserver.advertisedPort}}, {{spark.broadcast.advertisedPort}} and {{spark.replClassServer.advertisedPort}} settings to enable running Spark in Mesos on Docker with bridge networking. Provides patches for Akka Remote to enable Spark driver advertisement using an alternative host and port.
> With these settings, it is possible to run the Spark Master in a Docker container and have the executors running on Mesos talk back correctly to such a Master.
> The problem is discussed on the Mesos mailing list here: https://mail-archives.apache.org/mod_mbox/mesos-user/201510.mbox/%3CCACTd3c9vjAMXk=bfotj5ljzfrh5u7ix-ghppfqknvg9mkkc...@mail.gmail.com%3E
>
> h4. Running Spark on Mesos - LIBPROCESS_ADVERTISE_IP opens the door
> In order for the framework to receive offers in the bridged container, Mesos in the container has to register for offers using the IP address of the Agent. Offers are sent by the Mesos Master to the Docker container running on a different host, an Agent. Normally, prior to Mesos 0.24.0, {{libprocess}} would advertise itself using the IP address of the container, something like {{172.x.x.x}}.
> Obviously, the Mesos Master can't reach that address; it's a different host, a different machine. Mesos 0.24.0 introduced two new properties for {{libprocess}}: {{LIBPROCESS_ADVERTISE_IP}} and {{LIBPROCESS_ADVERTISE_PORT}}. These allow the container to use the Agent's address to register for offers. This was provided mainly for running Mesos in Docker on Mesos.
>
> h4. Spark - how does the above relate and what is being addressed here?
> Similar to Mesos, out of the box, Spark does not allow advertising its services on ports different from its bind ports. Consider the following scenario: Spark is running inside a Docker container on Mesos, in bridge networking mode. Assume a port {{}} for {{spark.driver.port}}, {{6677}} for {{spark.fileserver.port}}, {{6688}} for {{spark.broadcast.port}} and {{23456}} for {{spark.replClassServer.port}}. If such a task is posted to Marathon, Mesos will assign 4 ports in the range {{31000-32000}}, mapped to the container ports. Starting the executors from such a container results in executors not being able to communicate back to the Spark Master.
> This happens because of two things:
> The Spark driver is effectively an {{akka-remote}} system with {{akka.tcp}} transport. {{akka-remote}} prior to version {{2.4}} can't advertise a port different from the one it is bound to. The settings discussed are here: https://github.com/akka/akka/blob/f8c1671903923837f22d0726a955e0893add5e9f/akka-remote/src/main/resources/reference.conf#L345-L376. These do not exist in Akka {{2.3.x}}. The Spark driver will always advertise port {{}}, as this is the one {{akka-remote}} is bound to.
> Any URIs the executors contact the Spark Master on are prepared by the Spark Master and handed over to the executors. These always contain the port number used by the Master to find the service on.
> The services are:
> - {{spark.broadcast.port}}
> - {{spark.fileserver.port}}
> - {{spark.replClassServer.port}}
> All of the above ports default to {{0}} (random assignment) but can be specified using Spark configuration ({{-Dspark...port}}). However, they are limited in the same way as {{spark.driver.port}}; in the above example, an executor should not contact the file server on port {{6677}} but rather on the respective 31xxx port assigned by Mesos.
> Spark currently does not allow any of that.
>
> h4. Taking on the problem, step 1: Spark Driver
> As mentioned above, the Spark Driver is based on {{akka-remote}}. To take on the problem, the {{akka.remote.netty.tcp.bind-hostname}} and {{akka.remote.netty.tcp.bind-port}} settings are a must. Spark does not compile with Akka 2.4.x yet.
> What we want is a backport of the mentioned {{akka-remote}} settings to the {{2.3.x}} line. These patches are attached to this ticket: {{2.3.4.patch}} and {{2.3.11.patch}}.
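To make the port mapping concrete, here is a minimal configuration sketch of a bridged deployment once advertised ports exist. The `advertisedPort` keys are the settings *proposed* by this ticket, not part of stock Spark at the time of writing, and the concrete port values (7077, 31000-31003, etc.) are example numbers a Mesos/Marathon port mapping might hand out, not values taken from the ticket.

```properties
# Container-side bind ports (inside the Docker bridge network).
spark.driver.port                     7077
spark.fileserver.port                 6677
spark.broadcast.port                  6688
spark.replClassServer.port            23456

# Host-side ports assigned by Mesos/Marathon. These are the proposed
# settings: advertised to executors in place of the bind ports above.
spark.driver.advertisedPort           31000
spark.fileserver.advertisedPort       31001
spark.broadcast.advertisedPort        31002
spark.replClassServer.advertisedPort  31003
```

The executors would then dial the 31xxx host ports, which Docker's bridge maps back to the container-side bind ports, mirroring what {{LIBPROCESS_ADVERTISE_IP}}/{{LIBPROCESS_ADVERTISE_PORT}} do for Mesos itself.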
[jira] [Created] (SPARK-16927) Mesos Cluster Dispatcher default properties
Michael Gummelt created SPARK-16927:
-

Summary: Mesos Cluster Dispatcher default properties
Key: SPARK-16927
URL: https://issues.apache.org/jira/browse/SPARK-16927
Project: Spark
Issue Type: New Feature
Components: Mesos
Affects Versions: 2.0.0
Reporter: Michael Gummelt

Add the capability to set default driver properties for all jobs submitted through the dispatcher.
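One plausible shape for this feature is a prefixed configuration key that the dispatcher strips and forwards to every submitted driver. The {{spark.mesos.dispatcher.driverDefault.[PropertyName]}} prefix below matches what later Spark "Running on Mesos" documentation describes, but treat the exact key and values here as an illustrative assumption rather than part of this ticket:

```properties
# Hypothetical dispatcher defaults: each key under the driverDefault prefix
# sets the corresponding property on every driver submitted via the dispatcher.
spark.mesos.dispatcher.driverDefault.spark.executor.memory  4g
spark.mesos.dispatcher.driverDefault.spark.cores.max        8
```

A driver that explicitly sets one of these properties at submit time would be expected to override the dispatcher-level default.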