[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-12-04 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22887
  
@gjhkael can you clarify further what the undesirable behavior is, and what 
behavior you are looking for?


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-28 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22887
  
> So it's reasonable for users to expect that, if they set a hadoop config via the SQL SET command, it should override the one in spark-defaults.conf.

I agree with that. But the previous explanation seemed to be saying that's the undesired behavior. Maybe I'm just having trouble understanding what @gjhkael wrote.


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22887
  
Spark SQL's SET command can't update any static config or Spark core configs, but I think hadoop configs are different. They are not static, since users can update them via `SparkContext.hadoopConfiguration`. `SparkSession.SessionState.newHadoopConf()` is a mechanism to allow users to set hadoop configs per session in Spark SQL.

So it's reasonable for users to expect that, if they set a hadoop config via the SQL SET command, it should override the one in `spark-defaults.conf`.

Looking back at `appendS3AndSparkHadoopConfigurations`, it has 2 parameters: the spark conf and the hadoop conf. The spark conf comes from `spark-defaults.conf` plus any user-provided configs when building the `SparkContext`; the user-provided configs override `spark-defaults.conf`. The hadoop conf is either an empty config (if `appendS3AndSparkHadoopConfigurations` is called from `SparkHadoopUtil.newHadoopConfiguration`) or the one from `SparkSession.SessionState.newHadoopConf()` (if `appendS3AndSparkHadoopConfigurations` is called from `HadoopTableReader`).

For the first case there is nothing to worry about. For the second case, I think the hadoop conf should take priority, as it contains the configs specified by users at runtime.
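
A minimal Scala sketch of the priority being argued for here, using a hypothetical helper (`appendSparkHadoopConfigs` below is not Spark's actual method): `spark.hadoop.*` entries from the SparkConf only fill in keys that the per-session hadoop conf has not already set.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

object HadoopConfPriority {
  // Hypothetical helper, not Spark's implementation: copy spark.hadoop.*
  // entries from the SparkConf (spark-defaults.conf plus builder-time
  // settings) into the Hadoop conf, but never clobber a key the user has
  // already set in the per-session Hadoop conf (e.g. via the SQL SET command).
  def appendSparkHadoopConfigs(sparkConf: SparkConf, hadoopConf: Configuration): Unit = {
    for ((key, value) <- sparkConf.getAll if key.startsWith("spark.hadoop.")) {
      val hadoopKey = key.stripPrefix("spark.hadoop.")
      if (hadoopConf.get(hadoopKey) == null) {
        hadoopConf.set(hadoopKey, value)
      }
    }
  }
}
```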


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-27 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22887
  
Ok, that makes sense, as in I understand what you're saying, but I'm not sure it's what you actually want?

Why shouldn't "set spark.hadoop.*" override spark-defaults.conf?

But, in any case, it seems like the patch for SPARK-26060 (#23031) should 
take care of this (by raising an error).


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-27 Thread gjhkael
Github user gjhkael commented on the issue:

https://github.com/apache/spark/pull/22887
  
@vanzin @cloud-fan 
The simplest description: a 'spark.hadoop.xxx' value that the user sets through the SET command will not override the same configuration set in the spark-defaults.conf file. I don't know if that description makes sense?


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-27 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22887
  
> I think this is what this PR tries to fix?

To be fair I'm not sure I fully understand the PR description. But I know 
that the previous patch (which I commented on) broke the functionality I 
described - not in the context of SQL, but in the context of everything else in 
Spark that calls that code.


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22887
  
> Basically, if my "core-size.xml" says 
"mapreduce.input.fileinputformat.split.maxsize" is 2, and my Spark conf says 
"spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" is 3, then the 
value from the config generated by the method you're changing must be 3.

I think this is what this PR tries to fix? the `hadoopConf` parameter of 
`appendS3AndSparkHadoopConfigurations` is either an empty one, or a one from 
`spark.SessionState.newHadoopConf()` which contains user-provided hadoop conf.


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-26 Thread gjhkael
Github user gjhkael commented on the issue:

https://github.com/apache/spark/pull/22887
  
@vanzin Thanks for your review. I added a new commit to make the user's "set" command take effect. Let me know if there is an easier way. Thanks.


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-26 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22887
  
Sorry, this is a breaking change. It changes the behavior from "I can 
currently override any Hadoop configs, even final ones, using spark.hadoop.*" 
to "I can never do that".

If there's an issue with the SQL "set" command that needs to be addressed, 
this is the wrong place to do it.

Basically, if my "core-size.xml" says 
"mapreduce.input.fileinputformat.split.maxsize" is 2, and my Spark conf says 
"spark.hadoop.mapreduce.input.fileinputformat.split.maxsize" is 3, then the 
value from the config generated by the method you're changing must be 3.
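
A minimal spark-shell sketch of that expectation, assuming a core-site.xml on the classpath that sets the key to 2; `SparkHadoopUtil.get.newConfiguration` is used here only to illustrate the non-SQL code path, where the `spark.hadoop.*` value should win:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.deploy.SparkHadoopUtil

// Assumed setup: core-site.xml on the classpath sets
// mapreduce.input.fileinputformat.split.maxsize to 2.
val sparkConf = new SparkConf()
  .set("spark.hadoop.mapreduce.input.fileinputformat.split.maxsize", "3")

// Build a Hadoop conf the way non-SQL callers do.
val hadoopConf = SparkHadoopUtil.get.newConfiguration(sparkConf)

// The spark.hadoop.* override must survive: this should return "3".
hadoopConf.get("mapreduce.input.fileinputformat.split.maxsize")
```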



---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-11 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22887
  
looks reasonable, cc @gatorsmile 


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-11-09 Thread cxzl25
Github user cxzl25 commented on the issue:

https://github.com/apache/spark/pull/22887
  
A user-set hadoop conf can't overwrite spark-defaults.conf.

**SparkHadoopUtil.get.appendS3AndSparkHadoopConfigurations** overwrites the user-set spark.hadoop.* value with the default configuration (`sparkSession.sparkContext.conf`).

@gengliangwang @cloud-fan @gatorsmile  
Could you please give some comments when you have time?
Thanks so much.


https://github.com/apache/spark/blob/80813e198033cd63cc6100ee6ffe7d1eb1dff27b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L85-L89

## test:

### spark-defaults.conf
```
spark.hadoop.mapreduce.input.fileinputformat.split.maxsize  2
```

### spark-shell

```scala

val hadoopConfKey = "mapreduce.input.fileinputformat.split.maxsize"
spark.conf.get("spark.hadoop." + hadoopConfKey) // 2
var hadoopConf = spark.sessionState.newHadoopConf()
hadoopConf.get(hadoopConfKey) // 2

spark.conf.set(hadoopConfKey, 1) // set 1
hadoopConf = spark.sessionState.newHadoopConf()
hadoopConf.get(hadoopConfKey) // 1

// org.apache.spark.sql.hive.HadoopTableReader appends the SparkConf here
org.apache.spark.deploy.SparkHadoopUtil.get.appendS3AndSparkHadoopConfigurations(spark.sparkContext.getConf, hadoopConf)

// org.apache.spark.sql.hive.HadoopTableReader _broadcastedHadoopConf
hadoopConf.get(hadoopConfKey) // 2

```









---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-10-30 Thread gjhkael
Github user gjhkael commented on the issue:

https://github.com/apache/spark/pull/22887
  
> can you explain more about why you made this change?

Some hadoop configurations are set in spark-defaults.conf because we want them to be global, but in some cases a user needs to override such a configuration and the override does not take effect, because the SparkContext's conf is applied to the hadoopConf again right before this hadoop conf is broadcast.

> Did you try `spark.SessionState.newHadoopConf()`?

We hit this problem in spark-sql, not through the DataFrame API.
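
A small sketch of the spark-sql usage described here, purely as an illustration (the table name is made up; the key is the one used elsewhere in this thread): a cluster-wide default in spark-defaults.conf that the user tries to override per session with the SET command.

```scala
// spark-defaults.conf (assumed cluster-wide default):
//   spark.hadoop.mapreduce.input.fileinputformat.split.maxsize  2

// In a Spark SQL session the user overrides it with the SET command.
spark.sql("SET mapreduce.input.fileinputformat.split.maxsize=1")

// Expected: the next Hive table scan uses maxsize 1.
// Observed (the reported bug): appendS3AndSparkHadoopConfigurations re-applies
// the SparkConf value before the hadoop conf is broadcast, so the scan uses 2.
spark.sql("SELECT * FROM some_hive_table").count()
```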




---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-10-30 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22887
  
Hi @gjhkael,
can you explain more about why you made this change?
Did you try `spark.SessionState.newHadoopConf()`?


---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-10-30 Thread gjhkael
Github user gjhkael commented on the issue:

https://github.com/apache/spark/pull/22887
  
test this please




---




[GitHub] spark issue #22887: [SPARK-25880][CORE] user set's hadoop conf should not ov...

2018-10-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22887
  
Can one of the admins verify this patch?


---



