Repository: spark
Updated Branches:
  refs/heads/branch-1.6 e4227cb3e -> faf094c7c
[SPARK-12654] sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

https://issues.apache.org/jira/browse/SPARK-12654

The bug here is that WholeTextFileRDD.getPartitions calls getConf, and inside getConf, if cloneConf=true, a new Hadoop Configuration is created. That Configuration is then used to create a new JobContext. The new JobContext will copy credentials around, but credentials are only present in a JobConf, not in a plain Hadoop Configuration. So when the Hadoop configuration is cloned, it is changed from a JobConf into a Configuration, dropping the credentials that were there. NewHadoopRDD works because it uses the conf passed in for getPartitions (not getConf).

Author: Thomas Graves <tgra...@staydecay.corp.gq1.yahoo.com>

Closes #10651 from tgravescs/SPARK-12654.

(cherry picked from commit 553fd7b912a32476b481fd3f80c1d0664b6c6484)
Signed-off-by: Tom Graves <tgra...@yahoo-inc.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/faf094c7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/faf094c7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/faf094c7

Branch: refs/heads/branch-1.6
Commit: faf094c7c35baf0e73290596d4ca66b7d083ed5b
Parents: e4227cb
Author: Thomas Graves <tgra...@apache.org>
Authored: Fri Jan 8 14:38:19 2016 -0600
Committer: Tom Graves <tgra...@yahoo-inc.com>
Committed: Fri Jan 8 14:38:42 2016 -0600

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/faf094c7/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
index 86f38ae..c8b4f30 100644
--- a/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
+++ b/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala
@@ -24,6 +24,7 @@ import scala.reflect.ClassTag
 
 import org.apache.hadoop.conf.{Configurable, Configuration}
 import org.apache.hadoop.io.Writable
+import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.mapreduce._
 import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit, FileSplit}
 
@@ -95,7 +96,13 @@ class NewHadoopRDD[K, V](
       // issues, this cloning is disabled by default.
       NewHadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
         logDebug("Cloning Hadoop Configuration")
-        new Configuration(conf)
+        // The Configuration passed in is actually a JobConf and possibly contains credentials.
+        // To keep those credentials properly we have to create a new JobConf not a Configuration.
+        if (conf.isInstanceOf[JobConf]) {
+          new JobConf(conf)
+        } else {
+          new Configuration(conf)
+        }
       }
     } else {
       conf
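The fix hinges on a general point: cloning an object through its base-class copy constructor silently drops subclass state. Below is a minimal, self-contained Scala script sketching that pattern with stand-in `Configuration`/`JobConf` classes (these are hypothetical stand-ins, NOT the real Hadoop classes, which carry credentials via a separate `Credentials` object); the `cloneConf` helper mirrors the runtime-type check the patch adds.

```scala
import scala.collection.mutable

// Stand-in for a Hadoop-style Configuration: a bag of string properties.
class Configuration(other: Configuration = null) {
  val props = mutable.Map[String, String]()
  if (other != null) props ++= other.props
  // Note: this copy constructor knows nothing about subclass fields.
}

// Stand-in for a JobConf: adds credentials that a plain Configuration lacks.
class JobConf(other: Configuration = null) extends Configuration(other) {
  val credentials = mutable.Map[String, Array[Byte]]()
  other match {
    case jc: JobConf => credentials ++= jc.credentials // only recoverable from a JobConf
    case _           =>                                // cloning from a Configuration loses them
  }
}

// The pattern the patch uses: preserve the runtime type when cloning.
def cloneConf(conf: Configuration): Configuration =
  if (conf.isInstanceOf[JobConf]) new JobConf(conf) else new Configuration(conf)

val jc = new JobConf()
jc.credentials("hdfs-delegation-token") = Array[Byte](1, 2, 3)

val lossy = new Configuration(jc) // a plain Configuration: credentials are gone
val kept  = cloneConf(jc)         // still a JobConf: credentials survive

println(lossy.isInstanceOf[JobConf])                                              // false
println(kept.asInstanceOf[JobConf].credentials.contains("hdfs-delegation-token")) // true
```

This is why `NewHadoopRDD` was unaffected when cloning was off: the original `JobConf` was used directly, so its credentials were never copied through the lossy base-class path.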