I am trying the very same thing to configure the min split size with Spark
1.3.1, and I get a compilation error.
Code:
val hadoopConfiguration = new Configuration(sc.hadoopConfiguration)
hadoopConfiguration.set(mapreduce.input.fileinputformat.split.maxsize, 67108864)
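(The compilation error here is most likely just that Configuration.set expects
both the key and the value as Strings, so the quoted form should compile; same
property name and 64 MB value as above:)

// Configuration.set(String, String) - both arguments quoted
hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")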
You can indeed override the Hadoop configuration at a per-RDD level, though
it is a little more verbose, as in the example below, and you effectively
need to make a copy of the Hadoop Configuration:
val thisRDDConf = new Configuration(sc.hadoopConfiguration)
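Continuing from that copy, a rough sketch of the rest of the approach,
assuming a text input read through newAPIHadoopFile (the path and the split
size are just placeholders):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// The setting only affects this copy; sc.hadoopConfiguration stays untouched
thisRDDConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")

// Hand the per-RDD configuration to the read explicitly
val lines = sc.newAPIHadoopFile(
  "/path/to/input",            // placeholder path
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text],
  thisRDDConf
).map { case (_, text) => text.toString }

Any RDD created without passing thisRDDConf keeps using the SparkContext-wide
configuration, which is what makes this a per-RDD override.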
Currently it's pretty hard to control the Hadoop Input/Output formats used
in Spark. The convention seems to be to add extra parameters to all
methods, and then somewhere deep inside the code (for example in
PairRDDFunctions.saveAsHadoopFile) all these parameters get translated into
settings on the Hadoop JobConf.
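For illustration, one place this shows up today is that saveAsHadoopFile does
accept a pre-built JobConf, so you can at least set the Hadoop properties
yourself before the call; a rough sketch (the RDD contents and path are made
up):

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{JobConf, TextOutputFormat}

// Hypothetical pair RDD just to illustrate the call
val pairs = sc.parallelize(Seq(1L -> "a", 2L -> "b"))
  .map { case (k, v) => (new LongWritable(k), new Text(v)) }

// Build the JobConf up front instead of threading extra parameters through each method
val jobConf = new JobConf(sc.hadoopConfiguration)
jobConf.set("mapreduce.output.fileoutputformat.compress", "false")

pairs.saveAsHadoopFile(
  "/path/to/output",           // placeholder path
  classOf[LongWritable],
  classOf[Text],
  classOf[TextOutputFormat[LongWritable, Text]],
  jobConf)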