Re: Fine control with sc.sequenceFile
see also: https://github.com/apache/spark/pull/6848

On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
> sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
> sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get + "/*", classOf[Text], classOf[Text])
>
> works
>
> -- Deepak
Re: Fine control with sc.sequenceFile
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
    sc.hadoopConfiguration(hadoopConf)

or

    sc.hadoopConfiguration = hadoopConf

threw an error.

-- Deepak

On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu yuzhih...@gmail.com wrote:
> sequenceFile() calls hadoopFile() where:
>
>     val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))
>
> You can set the parameter in sc.hadoopConfiguration before calling sc.sequenceFile().
>
> Cheers
Re: Fine control with sc.sequenceFile
    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
    sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get + "/*", classOf[Text], classOf[Text])

works.

-- Deepak

On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu yuzhih...@gmail.com wrote:
> There isn't a setter for sc.hadoopConfiguration.
> You can directly change the value of a parameter in sc.hadoopConfiguration.
> However, see the note in the scaladoc:
>
>    * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
>    * plan to set some global configurations for all Hadoop RDDs.
>
> Cheers
Re: Fine control with sc.sequenceFile
There isn't a setter for sc.hadoopConfiguration.
You can directly change the value of a parameter in sc.hadoopConfiguration.
However, see the note in the scaladoc:

   * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to modify it unless you
   * plan to set some global configurations for all Hadoop RDDs.

Cheers

On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
>     sc.hadoopConfiguration(hadoopConf)
>
> or
>
>     sc.hadoopConfiguration = hadoopConf
>
> threw an error.
>
> -- Deepak
Re: Fine control with sc.sequenceFile
sequenceFile() calls hadoopFile() where:

    val confBroadcast = broadcast(new SerializableConfiguration(hadoopConfiguration))

You can set the parameter in sc.hadoopConfiguration before calling sc.sequenceFile().

Cheers

On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
> I can do this:
>
>     val hadoopConf = new Configuration(sc.hadoopConfiguration)
>     hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864")
>     sc.newAPIHadoopFile(
>       path + "/*.avro",
>       classOf[AvroKeyInputFormat[GenericRecord]],
>       classOf[AvroKey[GenericRecord]],
>       classOf[NullWritable],
>       hadoopConf)
>
> But I can't do the same with
>
>     sc.sequenceFile(path, classOf[Text], classOf[Text])
>
> How can I achieve the same with sequenceFile?
>
> -- Deepak
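
Given the scaladoc note quoted in this thread about not modifying the shared sc.hadoopConfiguration, one possible alternative is to read the sequence files through sc.newAPIHadoopFile with the new-API SequenceFileInputFormat, passing a private copy of the configuration, much like the Avro example in the original question. The sketch below is not from the thread: the object name LocalSplitConf and both helper names are made up for illustration, and it assumes Text keys and values as in the thread.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the names are illustrative, not part of Spark.
object LocalSplitConf {
  val SplitMaxSizeKey = "mapreduce.input.fileinputformat.split.maxsize"

  // Copy the base configuration and set the max split size on the copy only,
  // so the base (e.g. sc.hadoopConfiguration) is left untouched.
  def withMaxSplitSize(base: Configuration, maxSplitBytes: Long): Configuration = {
    val conf = new Configuration(base)
    conf.setLong(SplitMaxSizeKey, maxSplitBytes)
    conf
  }

  // Read Text/Text sequence files with the new-API input format and a
  // per-call configuration instead of the shared sc.hadoopConfiguration.
  def readTextSequenceFile(sc: SparkContext, path: String, maxSplitBytes: Long): RDD[(Text, Text)] =
    sc.newAPIHadoopFile(
      path,
      classOf[SequenceFileInputFormat[Text, Text]],
      classOf[Text],
      classOf[Text],
      withMaxSplitSize(sc.hadoopConfiguration, maxSplitBytes))
}
```

Usage would look like LocalSplitConf.readTextSequenceFile(sc, path + "/*", 67108864L). The setting then applies only to this RDD, at the cost of going through newAPIHadoopFile rather than sc.sequenceFile.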