[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22887

@gjhkael can you clarify further what the undesirable behavior is, and what behavior you are looking for?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22887

> So it's reasonable for users to expect that, if they set hadoop config via the SQL SET command, it should override the one in spark-defaults.conf.

I agree with that. But the previous explanation seemed to be saying that's the undesired behavior. Maybe I'm just having trouble understanding what @gjhkael wrote.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22887

The Spark SQL SET command can't update any static config or Spark core configs, but I think hadoop configs are different. They are not static, since users can update them via `SparkContext.hadoopConfiguration`. `SparkSession.SessionState.newHadoopConf()` is a mechanism to allow users to set hadoop configs per-session in Spark SQL. So it's reasonable for users to expect that, if they set a hadoop config via the SQL SET command, it should override the one in `spark-defaults.conf`.

Looking back at `appendS3AndSparkHadoopConfigurations`, it has 2 parameters: a spark conf and a hadoop conf. The spark conf comes from `spark-defaults.conf` plus any user-provided configs when building the `SparkContext`; the user-provided configs override `spark-defaults.conf`. The hadoop conf is either an empty config (if `appendS3AndSparkHadoopConfigurations` is called from `SparkHadoopUtil.newHadoopConfiguration`), or one from `SparkSession.SessionState.newHadoopConf()` (if it is called from `HadoopTableReader`). For the first case, there is nothing to worry about. For the second case, I think the hadoop config should take priority, as it contains the configs specified by users at runtime.
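The precedence being argued for here can be sketched with plain Scala maps, with no Spark dependency. The object and method below are hypothetical stand-ins for the real `SparkHadoopUtil` logic, only modeling the merge order the comment above describes: entries already present in the session-level hadoop conf should survive the copy from the spark conf.

```scala
// Minimal model of the merge order under discussion. The spark conf carries
// "spark.hadoop.*" entries (e.g. from spark-defaults.conf); the hadoop conf
// may already carry per-session values set via the SQL SET command.
object ConfPrecedenceSketch {
  def appendSparkHadoopConfigs(sparkConf: Map[String, String],
                               hadoopConf: collection.mutable.Map[String, String]): Unit = {
    for ((key, value) <- sparkConf if key.startsWith("spark.hadoop.")) {
      // Strip the "spark.hadoop." prefix, as the real code does when copying
      // spark conf entries into a Hadoop Configuration.
      val hadoopKey = key.stripPrefix("spark.hadoop.")
      // The behavior this thread argues for: a value the user already set in
      // the session-level hadoop conf takes priority, so only fill in keys
      // that are still missing.
      if (!hadoopConf.contains(hadoopKey)) hadoopConf(hadoopKey) = value
    }
  }

  def main(args: Array[String]): Unit = {
    val sparkConf = Map(
      "spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" -> "2")
    // Session-level conf after the user ran:
    //   SET mapreduce.input.fileinputformat.split.maxsize=1
    val sessionHadoopConf = collection.mutable.Map(
      "mapreduce.input.fileinputformat.split.maxsize" -> "1")
    appendSparkHadoopConfigs(sparkConf, sessionHadoopConf)
    // The per-session value survives instead of being clobbered by the default.
    println(sessionHadoopConf("mapreduce.input.fileinputformat.split.maxsize"))
  }
}
```

With the overwriting behavior the PR complains about, the last line would print `2`; with the fill-only-missing-keys order sketched here, it prints `1`.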
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22887

OK, that makes sense in that I understand what you're saying, but I'm not sure it's what you actually want. Why shouldn't "set spark.hadoop.*" override spark-defaults.conf? But, in any case, it seems like the patch for SPARK-26060 (#23031) should take care of this (by raising an error).
Github user gjhkael commented on the issue: https://github.com/apache/spark/pull/22887

@vanzin @cloud-fan The simplest description: a 'spark.hadoop.xxx' value that the user sets through the SET command will not override the same configuration set in the spark-defaults.conf file. I don't know if that description makes sense?
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22887

> I think this is what this PR tries to fix?

To be fair I'm not sure I fully understand the PR description. But I know that the previous patch (which I commented on) broke the functionality I described - not in the context of SQL, but in the context of everything else in Spark that calls that code.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22887

> Basically, if my "core-site.xml" says "mapreduce.input.fileinputformat.split.maxsize" is 2, and my Spark conf says "spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" is 3, then the value from the config generated by the method you're changing must be 3.

I think this is what this PR tries to fix? The `hadoopConf` parameter of `appendS3AndSparkHadoopConfigurations` is either an empty one, or one from `spark.SessionState.newHadoopConf()`, which contains the user-provided hadoop configs.
Github user gjhkael commented on the issue: https://github.com/apache/spark/pull/22887

@vanzin Thanks for your review. I added a new commit to let the user's "set" command take effect. Let me know if you have an easier way. Thanks.
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22887

Sorry, this is a breaking change. It changes the behavior from "I can currently override any Hadoop configs, even final ones, using spark.hadoop.*" to "I can never do that". If there's an issue with the SQL "set" command that needs to be addressed, this is the wrong place to do it.

Basically, if my "core-site.xml" says "mapreduce.input.fileinputformat.split.maxsize" is 2, and my Spark conf says "spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" is 3, then the value from the config generated by the method you're changing must be 3.
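This expectation can be sketched the same way with plain Scala maps (the object and method names are hypothetical, not Spark's actual API): values loaded from core-site.xml form the base layer, and `spark.hadoop.*` entries from the spark conf must override them on collision.

```scala
// Minimal model of the behavior the comment above says must be preserved:
// spark.hadoop.* entries in the spark conf win over what a fresh Hadoop
// Configuration picked up from core-site.xml.
object SparkHadoopOverrideSketch {
  def newHadoopConf(coreSite: Map[String, String],
                    sparkConf: Map[String, String]): Map[String, String] = {
    // Select the spark.hadoop.* entries and drop the prefix.
    val overrides = for ((k, v) <- sparkConf if k.startsWith("spark.hadoop."))
      yield k.stripPrefix("spark.hadoop.") -> v
    // `++` lets the right-hand side win on key collisions, so the spark conf
    // overrides core-site.xml.
    coreSite ++ overrides
  }

  def main(args: Array[String]): Unit = {
    val coreSite  = Map("mapreduce.input.fileinputformat.split.maxsize" -> "2")
    val sparkConf = Map("spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" -> "3")
    // Per the comment above: the generated config must say 3, not 2.
    println(newHadoopConf(coreSite, sparkConf)("mapreduce.input.fileinputformat.split.maxsize"))
  }
}
```

Note this and the session-level expectation earlier in the thread are not in conflict: both say the base defaults lose to whatever the user set more specifically, they just disagree on which layer counts as "more specific".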
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22887

Looks reasonable, cc @gatorsmile
Github user cxzl25 commented on the issue: https://github.com/apache/spark/pull/22887

A user-set hadoop conf can't overwrite spark-defaults.conf: **SparkHadoopUtil.get.appendS3AndSparkHadoopConfigurations** overwrites the user-set spark.hadoop values with the default configuration (sparkSession.sparkContext.conf).

@gengliangwang @cloud-fan @gatorsmile Could you please give some comments when you have time? Thanks so much.

https://github.com/apache/spark/blob/80813e198033cd63cc6100ee6ffe7d1eb1dff27b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L85-L89

## test:

### spark-defaults.conf
```
spark.hadoop.mapreduce.input.fileinputformat.split.maxsize 2
```

### spark-shell
```scala
val hadoopConfKey = "mapreduce.input.fileinputformat.split.maxsize"
spark.conf.get("spark.hadoop." + hadoopConfKey) // 2

var hadoopConf = spark.sessionState.newHadoopConf
hadoopConf.get(hadoopConfKey) // 2

spark.conf.set(hadoopConfKey, 1) // set 1
hadoopConf = spark.sessionState.newHadoopConf
hadoopConf.get(hadoopConfKey) // 1

// org.apache.spark.sql.hive.HadoopTableReader appends the conf
org.apache.spark.deploy.SparkHadoopUtil.get.appendS3AndSparkHadoopConfigurations(spark.sparkContext.getConf, hadoopConf)

// org.apache.spark.sql.hive.HadoopTableReader _broadcastedHadoopConf
hadoopConf.get(hadoopConfKey) // 2
```
Github user gjhkael commented on the issue: https://github.com/apache/spark/pull/22887

> can you explain more about why you make the change?

Some hadoop configurations are set in spark-defaults.conf because we want them to be global, but in some cases a user needs to override one of those configurations and that doesn't work, because the sparkContext's conf fills the hadoopConf again right before this hadoop conf is broadcast.

> Did you try `spark.SessionState.newHadoopConf()`?

We hit this problem in Spark SQL, not through the DataFrame API.
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/22887

Hi @gjhkael, can you explain more about why you make the change? Did you try `spark.SessionState.newHadoopConf()`?
Github user gjhkael commented on the issue: https://github.com/apache/spark/pull/22887

test this please
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22887

Can one of the admins verify this patch?