see also:
https://github.com/apache/spark/pull/6848
On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
  "67108864")
sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get
  + "/*", classOf[Text], classOf[Text])
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
  "67108864")
sc.hadoopConfiguration(hadoopConf)
or
sc.hadoopConfiguration = hadoopConf
threw error.
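The error is expected: SparkContext exposes hadoopConfiguration as a val, so there is no setter to reassign it; you can only mutate the Configuration object it already refers to. A minimal plain-Scala sketch of the distinction (no Spark; FakeContext and the mutable map are stand-ins for SparkContext and Hadoop's Configuration):

```scala
import scala.collection.mutable

class FakeContext {
  // analogous to SparkContext.hadoopConfiguration: declared as a val,
  // so `sc.hadoopConfiguration = ...` has no setter and will not compile
  val hadoopConfiguration: mutable.Map[String, String] = mutable.Map.empty
}

object NoSetterDemo {
  def main(args: Array[String]): Unit = {
    val sc = new FakeContext
    // sc.hadoopConfiguration = mutable.Map()  // does not compile: reassignment to val
    // Mutating the object the val points at is fine:
    sc.hadoopConfiguration("mapreduce.input.fileinputformat.split.maxsize") = "67108864"
    println(sc.hadoopConfiguration("mapreduce.input.fileinputformat.split.maxsize"))
  }
}
```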
On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu wrote:
I can do this
val hadoopConf = new Configuration(sc.hadoopConfiguration)
hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
  "67108864")
sc.newAPIHadoopFile(
  path + "/*.avro",
  classOf[AvroKeyInputFormat[GenericRecord]],
  classOf[AvroKey[GenericRecord]],
  classOf[NullWritable],
  hadoopConf)
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
  "67108864")
sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get
  + "/*", classOf[Text], classOf[Text])
works
On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu yuzhih...@gmail.com wrote:
There isn't a setter for sc.hadoopConfiguration.
You can directly change the value of the parameter in sc.hadoopConfiguration.
However, see the note in the scaladoc:
* '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
* plan to set some global configurations for all Hadoop RDDs.
sequenceFile() calls hadoopFile(), where:
val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
You can set the parameter in sc.hadoopConfiguration before calling
sc.sequenceFile().
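That works because the broadcast above copies the live configuration at the moment the RDD is created, so values set before the call are captured by that RDD. A hedged plain-Scala sketch of this call-time snapshot (no Spark; FakeContext, the mutable map, and the immutable copy stand in for SparkContext, Configuration, and the SerializableConfiguration broadcast):

```scala
import scala.collection.mutable

class FakeContext {
  val hadoopConfiguration: mutable.Map[String, String] = mutable.Map.empty

  // stands in for sequenceFile()/hadoopFile(): snapshot the live config,
  // like broadcast(new SerializableConfiguration(hadoopConfiguration))
  def sequenceFile(path: String): Map[String, String] =
    hadoopConfiguration.toMap // immutable copy = the "broadcast" snapshot
}

object SnapshotDemo {
  def main(args: Array[String]): Unit = {
    val sc = new FakeContext
    sc.hadoopConfiguration("mapreduce.input.fileinputformat.split.maxsize") = "67108864"
    val rddConf = sc.sequenceFile("/data/*") // snapshot taken here
    // later mutations do not affect the RDD that was already created
    sc.hadoopConfiguration("mapreduce.input.fileinputformat.split.maxsize") = "1"
    println(rddConf("mapreduce.input.fileinputformat.split.maxsize"))
  }
}
```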
Cheers
On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote: