Re: hadoop input/output format advanced control

2015-06-26 Thread ๏̯͡๏
I am trying the very same thing, to configure the min split size with Spark 1.3.1, and I get a compilation error.

Code:
val hadoopConfiguration = new Configuration(sc.hadoopConfiguration)
hadoopConfiguration.set(mapreduce.input.fileinputformat.split.maxsize, 67108864)
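
A likely cause of the compilation error, assuming the snippet above is exactly what was compiled: Hadoop's Configuration.set takes two String arguments, so the property name has to be a quoted string literal, and a numeric value has to be passed as a string or through setLong. The sketch below is only an illustration; the object and method names (SplitSizeConf, withMaxSplitSize) are made up for this example and do not come from the thread. Note also that the key in the snippet is the max split size; the corresponding min key in the Hadoop 2 naming is mapreduce.input.fileinputformat.split.minsize.

```scala
import org.apache.hadoop.conf.Configuration

// Hypothetical helper, named for illustration only.
object SplitSizeConf {
  // Property name as a quoted string literal (unquoted, it is parsed as Scala code).
  val MaxSplitKey = "mapreduce.input.fileinputformat.split.maxsize"

  // Returns a copy of `base` with the max split size set; `base` itself is not modified.
  def withMaxSplitSize(base: Configuration, bytes: Long): Configuration = {
    val conf = new Configuration(base) // copy constructor
    conf.setLong(MaxSplitKey, bytes)   // same effect as conf.set(MaxSplitKey, bytes.toString)
    conf
  }
}
```

With a SparkContext in scope, this would be called as SplitSizeConf.withMaxSplitSize(sc.hadoopConfiguration, 67108864L).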

Re: hadoop input/output format advanced control

2015-03-24 Thread Nick Pentreath
You can indeed override the Hadoop configuration at a per-RDD level - though it is a little more verbose, as in the below example, and you need to effectively make a copy of the Hadoop Configuration:

val thisRDDConf = new Configuration(sc.hadoopConfiguration)
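
The archived message is cut off after its first line of code. As a rough sketch of how a copied configuration can be applied to a single RDD, the example below passes the copy as the optional last argument of SparkContext.newAPIHadoopFile. The choice of TextInputFormat, the split-size property, and the names PerRddConf and readWithMaxSplit are assumptions made for illustration, not a reconstruction of the original example.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper, named for illustration only.
object PerRddConf {
  def readWithMaxSplit(sc: SparkContext,
                       path: String,
                       maxSplitBytes: Long): RDD[(LongWritable, Text)] = {
    // Copy the shared configuration so the override applies to this RDD only.
    val thisRDDConf = new Configuration(sc.hadoopConfiguration)
    thisRDDConf.setLong("mapreduce.input.fileinputformat.split.maxsize", maxSplitBytes)

    // The copied configuration is handed to the input format for this one read.
    sc.newAPIHadoopFile(
      path,
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      thisRDDConf)
  }
}
```

Because only the copy is modified, other RDDs created from the same SparkContext keep using the settings in sc.hadoopConfiguration.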

hadoop input/output format advanced control

2015-03-23 Thread Koert Kuipers
Currently it's pretty hard to control the Hadoop Input/Output formats used in Spark. The convention seems to be to add extra parameters to all methods, and then somewhere deep inside the code (for example in PairRDDFunctions.saveAsHadoopFile) all these parameters get translated into settings on the