[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17357
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r143407147

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -23,14 +23,15 @@
 import org.apache.commons.lang3.StringUtils
 import org.apache.spark.{SecurityManager, SparkConf}
 import org.apache.spark.deploy.{DependencyUtils, SparkHadoopUtil, SparkSubmit}
+import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.RpcEnv
 import org.apache.spark.util.{ChildFirstURLClassLoader, MutableURLClassLoader, Utils}

 /**
  * Utility object for launching driver programs such that they share fate with the Worker process.
  * This is used in standalone cluster mode only.
  */
-object DriverWrapper {
+object DriverWrapper extends Logging {
--- End diff --

I cannot see why that would be a problem. Could you also describe why you think it might be one?
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r143096475

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -23,14 +23,15 @@
 import org.apache.commons.lang3.StringUtils
 import org.apache.spark.{SecurityManager, SparkConf}
 import org.apache.spark.deploy.{DependencyUtils, SparkHadoopUtil, SparkSubmit}
+import org.apache.spark.internal.Logging
 import org.apache.spark.rpc.RpcEnv
 import org.apache.spark.util.{ChildFirstURLClassLoader, MutableURLClassLoader, Utils}

 /**
  * Utility object for launching driver programs such that they share fate with the Worker process.
  * This is used in standalone cluster mode only.
  */
-object DriverWrapper {
+object DriverWrapper extends Logging {
--- End diff --

This `DriverWrapper` actually runs within the same JVM as the driver, and initializes the log4j instance earlier. Will this be a problem?
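For context on the concern above: mixing a logging trait into `DriverWrapper` means the first log call made by the wrapper initializes the logging backend before the user's driver class runs. The following is a minimal, simplified sketch of that lazy-initialization pattern, not Spark's actual `org.apache.spark.internal.Logging` trait, and it uses `java.util.logging` only to stay self-contained:

```scala
// Simplified illustration only; Spark's real Logging trait and log4j setup differ.
trait SimpleLogging {
  // Initialized on first use: whichever class logs first decides when the backend is configured.
  @transient private lazy val log: java.util.logging.Logger =
    java.util.logging.Logger.getLogger(getClass.getName)

  protected def logInfo(msg: => String): Unit = log.info(msg)
}

object DriverWrapperSketch extends SimpleLogging {
  def main(args: Array[String]): Unit = {
    // This call forces logger initialization before the user's main class is even loaded.
    logInfo("Launching user driver class")
  }
}
```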
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r135699350

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL_(IP|HOSTNAME) environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(x => x._1.matches("SPARK_LOCAL_(IP|HOSTNAME"))
--- End diff --

Thanks!
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r135627170

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL_(IP|HOSTNAME) environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(x => x._1.matches("SPARK_LOCAL_(IP|HOSTNAME"))
--- End diff --

Should it be `"SPARK_LOCAL_(IP|HOSTNAME)"`?
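The missing closing parenthesis is not just cosmetic: `String.matches` compiles the pattern, and an unclosed group fails with a `PatternSyntaxException` at runtime. A small self-contained check of both variants (the sample map and names are illustrative only, not from the PR):

```scala
import scala.util.Try

object RegexCheck extends App {
  val envs = Map(
    "SPARK_LOCAL_IP"   -> "10.0.0.1",
    "SPARK_LOCAL_DIRS" -> "/tmp",
    "PATH"             -> "/usr/bin")

  // Unclosed group: compiling "SPARK_LOCAL_(IP|HOSTNAME" throws PatternSyntaxException.
  println(Try(envs.filterNot(_._1.matches("SPARK_LOCAL_(IP|HOSTNAME"))))
  // Failure(java.util.regex.PatternSyntaxException: Unclosed group ...)

  // Corrected pattern drops only SPARK_LOCAL_IP / SPARK_LOCAL_HOSTNAME, keeps the rest.
  println(envs.filterNot(_._1.matches("SPARK_LOCAL_(IP|HOSTNAME)")))
  // Map(SPARK_LOCAL_DIRS -> /tmp, PATH -> /usr/bin)
}
```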
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r135485503

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL_(IP|HOSTNAME) environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(x => x._1.matches("SPARK_LOCAL_(IP|HOSTNAME"))
--- End diff --

@jiangxb1987 Hi, I have changed it to filter only SPARK_LOCAL_HOSTNAME and SPARK_LOCAL_IP.
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r135187410

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(_._1.startsWith("SPARK_LOCAL"))
--- End diff --

Alright, I will check how it is used across the project. Just noted: in `LocalDirsSuite`, the comments in `test("SPARK_LOCAL_DIRS override also affects driver")` seem to corroborate my intention here.
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r135032686

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(_._1.startsWith("SPARK_LOCAL"))
--- End diff --

Could you please check the code to ensure that `SPARK_LOCAL_DIRS` has not been used as a global config?
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r134986472

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -38,8 +39,10 @@ object DriverWrapper {
      */
     case workerUrl :: userJar :: mainClass :: extraArgs =>
       val conf = new SparkConf()
-      val rpcEnv = RpcEnv.create("Driver",
-        Utils.localHostName(), 0, conf, new SecurityManager(conf))
+      val host: String = Utils.localHostName()
--- End diff --

Please correct me, but AFAIU `spark.driver.host` may not be specified on each node of the cluster, and its value might be global (i.e. cluster-wide). Since the driver is launched on a random node during a failover, its value cannot be pre-assigned and has to be picked up from that node's local environment. About `spark.driver.port` I am not sure, but it might work even if it is a global, cluster-wide constant.
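To make the trade-off concrete, here is a hedged sketch (not the PR's actual code, and `DriverHostSketch` / `resolveDriverHost` are illustrative names) of the two strategies being discussed: honoring a cluster-wide `spark.driver.host` when one is set, versus falling back to the hostname of whichever node the wrapper actually runs on after a failover:

```scala
import java.net.InetAddress

// Sketch only: approximates the behaviour discussed in the review thread.
object DriverHostSketch extends App {
  // Roughly what Utils.localHostName() returns when no SPARK_LOCAL_* overrides are set on this node.
  def localHostName(): String = InetAddress.getLocalHost.getHostName

  // spark.driver.host cannot be pre-assigned cluster-wide, because after a failover the driver
  // may be relaunched on any worker; fall back to the node the wrapper is running on.
  def resolveDriverHost(sparkProperties: Map[String, String]): String =
    sparkProperties.getOrElse("spark.driver.host", localHostName())

  println(resolveDriverHost(Map.empty))                                    // this node's hostname
  println(resolveDriverHost(Map("spark.driver.host" -> "driver.example"))) // driver.example
}
```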
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r134985214

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(_._1.startsWith("SPARK_LOCAL"))
--- End diff --

You are right, but shouldn't all SPARK_LOCAL* properties be picked up from the local environment of the node where the driver is going to be started? Not filtering them would mean that these local properties are common to all nodes. But for this particular bug, `SPARK_LOCAL_DIRS` is not required to be filtered.
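A rough sketch of why forwarding these variables matters. This is a simplified approximation of the precedence `SPARK_LOCAL_HOSTNAME` / `SPARK_LOCAL_IP` have when a node advertises its address; Spark's real `Utils.localHostName()` is more involved, and the sample values are made up:

```scala
import java.net.InetAddress

object ForwardedEnvSketch extends App {
  // Simplified: hostname override wins, then IP override, then the node's own hostname.
  def advertisedHost(env: Map[String, String]): String =
    env.get("SPARK_LOCAL_HOSTNAME")
      .orElse(env.get("SPARK_LOCAL_IP"))
      .getOrElse(InetAddress.getLocalHost.getHostName)

  // If the submitter's environment is forwarded verbatim, a driver relaunched on a
  // different worker still advertises the submitter's address.
  val submitterEnv = Map("SPARK_LOCAL_IP" -> "10.0.0.5", "SPARK_LOCAL_DIRS" -> "/mnt/tmp")
  println(advertisedHost(submitterEnv))                                           // 10.0.0.5
  println(advertisedHost(submitterEnv.filterNot(_._1.startsWith("SPARK_LOCAL")))) // this node's hostname
}
```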
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r131582810

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(_._1.startsWith("SPARK_LOCAL"))
--- End diff --

I guess the driver might not use `SPARK_LOCAL_DIRS`. But yes, we may only need to filter out `SPARK_LOCAL_IP` and `SPARK_LOCAL_HOSTNAME`.
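Since the eventual fix narrows the filter to just these two variables, an explicit set is one way to express that while avoiding regex pitfalls entirely. This is a sketch of the idea only, with hypothetical names (`ExplicitFilterSketch`, `filterEnv`), not the code that was merged:

```scala
object ExplicitFilterSketch extends App {
  // Only the variables that must come from the remote node's own environment;
  // SPARK_LOCAL_DIRS and everything else pass through untouched.
  private val nodeLocalOnly = Set("SPARK_LOCAL_IP", "SPARK_LOCAL_HOSTNAME")

  def filterEnv(env: Map[String, String]): Map[String, String] =
    env.filterNot { case (name, _) => nodeLocalOnly.contains(name) }

  println(filterEnv(Map(
    "SPARK_LOCAL_IP"   -> "10.0.0.5",
    "SPARK_LOCAL_DIRS" -> "/mnt/tmp",
    "JAVA_HOME"        -> "/usr/lib/jvm/java-8")))
  // Map(SPARK_LOCAL_DIRS -> /mnt/tmp, JAVA_HOME -> /usr/lib/jvm/java-8)
}
```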
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r131544143

--- Diff: core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala ---
@@ -139,7 +139,9 @@ private[rest] class StandaloneSubmitRequestServlet(
     val driverExtraLibraryPath = sparkProperties.get("spark.driver.extraLibraryPath")
     val superviseDriver = sparkProperties.get("spark.driver.supervise")
     val appArgs = request.appArgs
-    val environmentVariables = request.environmentVariables
+    // Filter SPARK_LOCAL environment variables from being set on the remote system.
+    val environmentVariables =
+      request.environmentVariables.filterNot(_._1.startsWith("SPARK_LOCAL"))
--- End diff --

I think this would also affect `SPARK_LOCAL_DIRS`; is that expected?
[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17357#discussion_r131537192

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala ---
@@ -38,8 +39,10 @@ object DriverWrapper {
      */
     case workerUrl :: userJar :: mainClass :: extraArgs =>
       val conf = new SparkConf()
-      val rpcEnv = RpcEnv.create("Driver",
-        Utils.localHostName(), 0, conf, new SecurityManager(conf))
+      val host: String = Utils.localHostName()
--- End diff --

I think it is reasonable to use `Utils.localHostName()`. But what is `spark.driver.host` for?