Repository: spark
Updated Branches:
  refs/heads/branch-1.6 e4227cb3e -> faf094c7c
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

https://issues.apache.org/jira/browse/SPARK-12654

The bug here is that WholeTextFileRDD.getPartitions calls getConf, and inside getConf, if cloneConf=true, a new Hadoop Configuration is created. That Configuration is then used to create a new JobContext. The new JobContext will copy credentials around, but credentials are only present in a JobConf, not in a plain Hadoop Configuration. So when the Hadoop configuration is cloned, it is changed from a JobConf into a Configuration, dropping the credentials that were there. NewHadoopRDD works because it uses the conf passed in for getPartitions (not getConf).

Author: Thomas Graves <tgra...@staydecay.corp.gq1.yahoo.com>

Closes #10651 from tgravescs/SPARK-12654.

(cherry picked from commit 553fd7b912a32476b481fd3f80c1d0664b6c6484)
Signed-off-by: Tom Graves <tgra...@yahoo-inc.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/faf094c7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/faf094c7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/faf094c7

Branch: refs/heads/branch-1.6
Commit: faf094c7c35baf0e73290596d4ca66b7d083ed5b
Parents: e4227cb
Author: Thomas Graves <tgra...@apache.org>
Authored: Fri Jan 8 14:38:19 2016 -0600
Committer: Tom Graves <tgra...@yahoo-inc.com>
Committed: Fri Jan 8 14:38:42 2016 -0600

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/faf094c7/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
index 86f38ae..c8b4f30 100644
--- a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
@@ -24,6 +24,7 @@ import scala.reflect.ClassTag
 
 import org.apache.hadoop.conf.{Configurable, Configuration}
 import org.apache.hadoop.io.Writable
+import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.mapreduce._
 import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}
 
@@ -95,7 +96,13 @@ class NewHadoopRDD[K, V](
       // issues, this cloning is disabled by default.
       NewHadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
         logDebug("Cloning Hadoop Configuration")
-        new Configuration(conf)
+        // The Configuration passed in is actually a JobConf and possibly contains credentials.
+        // To keep those credentials properly we have to create a new JobConf not a Configuration.
+        if (conf.isInstanceOf[JobConf]) {
+          new JobConf(conf)
+        } else {
+          new Configuration(conf)
+        }
       }
     } else {
       conf
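The fix hinges on a general point: cloning an object through its base-class copy constructor silently drops subclass state. Below is a minimal, self-contained Scala script sketching that pattern with stand-in `Configuration`/`JobConf` classes (these are hypothetical stand-ins, NOT the real Hadoop classes, which carry credentials via a separate `Credentials` object); the `cloneConf` helper mirrors the runtime-type check the patch adds.

```scala
import scala.collection.mutable

// Stand-in for a Hadoop-style Configuration: a bag of string properties.
class Configuration(other: Configuration = null) {
  val props = mutable.Map[String, String]()
  if (other != null) props ++= other.props
  // Note: this copy constructor knows nothing about subclass fields.
}

// Stand-in for a JobConf: adds credentials that a plain Configuration lacks.
class JobConf(other: Configuration = null) extends Configuration(other) {
  val credentials = mutable.Map[String, Array[Byte]]()
  other match {
    case jc: JobConf => credentials ++= jc.credentials // only recoverable from a JobConf
    case _           =>                                // cloning from a Configuration loses them
  }
}

// The pattern the patch uses: preserve the runtime type when cloning.
def cloneConf(conf: Configuration): Configuration =
  if (conf.isInstanceOf[JobConf]) new JobConf(conf) else new Configuration(conf)

val jc = new JobConf()
jc.credentials("hdfs-delegation-token") = Array[Byte](1, 2, 3)

val lossy = new Configuration(jc) // a plain Configuration: credentials are gone
val kept  = cloneConf(jc)         // still a JobConf: credentials survive

println(lossy.isInstanceOf[JobConf])                                              // false
println(kept.asInstanceOf[JobConf].credentials.contains("hdfs-delegation-token")) // true
```

This is why `NewHadoopRDD` was unaffected when cloning was off: the original `JobConf` was used directly, so its credentials were never copied through the lossy base-class path.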