Re: Fine control with sc.sequenceFile

2015-06-29 Thread Koert Kuipers
see also:
https://github.com/apache/spark/pull/6848

On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

 sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")

 sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get
 + "/*", classOf[Text], classOf[Text])

 works
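
 For completeness, a quick sanity check (just a sketch; rdd is a hypothetical name
 for the result of the call above): a smaller maxsize should translate into more
 input splits, and therefore more partitions.

 val rdd = sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get
   + "/*", classOf[Text], classOf[Text])
 println(rdd.partitions.length)  // should grow as maxsize shrinks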

 On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu yuzhih...@gmail.com wrote:

 There isn't a setter for sc.hadoopConfiguration.
 You can directly change the value of the parameter in sc.hadoopConfiguration.

 However, see the note in scaladoc:
    * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to
    * modify it unless you plan to set some global configurations for all
    * Hadoop RDDs.
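
 If touching the shared configuration is a concern, one alternative (only a sketch,
 not from this thread) is to read the sequence file through newAPIHadoopFile with a
 per-call Configuration, the same way the Avro read further down is done. It assumes
 the new-API SequenceFileInputFormat and a placeholder path:

 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.io.Text
 import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat

 // work on a local copy, so sc.hadoopConfiguration itself stays untouched
 val localConf = new Configuration(sc.hadoopConfiguration)
 localConf.set("mapreduce.input.fileinputformat.split.maxsize", "67108864") // 64 MB

 val rdd = sc.newAPIHadoopFile(
   path,
   classOf[SequenceFileInputFormat[Text, Text]],
   classOf[Text],
   classOf[Text],
   localConf)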

 Cheers

 On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
 wrote:

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")


 sc.hadoopConfiguration(hadoopConf)

 or

 sc.hadoopConfiguration = hadoopConf

 threw an error.

 On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu yuzhih...@gmail.com wrote:

 sequenceFile() calls hadoopFile() where:
 val confBroadcast = broadcast(new
 SerializableConfiguration(hadoopConfiguration))

 You can set the parameter in sc.hadoopConfiguration before calling
 sc.sequenceFile().

 Cheers

 On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
 wrote:

 I can do this

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")

 sc.newAPIHadoopFile(

   path + "/*.avro",

   classOf[AvroKeyInputFormat[GenericRecord]],

   classOf[AvroKey[GenericRecord]],

   classOf[NullWritable],

   hadoopConf)


 But I can't do the same with

 sc.sequenceFile(path, classOf[Text], classOf[Text])
 How can I achieve the same with sequenceFile?
 --
 Deepak





 --
 Deepak





 --
 Deepak




Re: Fine control with sc.sequenceFile

2015-06-28 Thread ๏̯͡๏
val hadoopConf = new Configuration(sc.hadoopConfiguration)

hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
"67108864")


sc.hadoopConfiguration(hadoopConf)

or

sc.hadoopConfiguration = hadoopConf

threw an error.

On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu yuzhih...@gmail.com wrote:

 sequenceFile() calls hadoopFile() where:
 val confBroadcast = broadcast(new
 SerializableConfiguration(hadoopConfiguration))

 You can set the parameter in sc.hadoopConfiguration before calling
 sc.sequenceFile().

 Cheers

 On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
 wrote:

 I can do this

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")

 sc.newAPIHadoopFile(

   path + "/*.avro",

   classOf[AvroKeyInputFormat[GenericRecord]],

   classOf[AvroKey[GenericRecord]],

   classOf[NullWritable],

   hadoopConf)


 But I can't do the same with

 sc.sequenceFile(path, classOf[Text], classOf[Text])
 How can I achieve the same with sequenceFile?
 --
 Deepak





-- 
Deepak


Re: Fine control with sc.sequenceFile

2015-06-28 Thread ๏̯͡๏
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
"67108864")

sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith(_)).get
+ "/*", classOf[Text], classOf[Text])

works

On Sun, Jun 28, 2015 at 9:46 PM, Ted Yu yuzhih...@gmail.com wrote:

 There isn't a setter for sc.hadoopConfiguration.
 You can directly change the value of the parameter in sc.hadoopConfiguration.

 However, see the note in scaladoc:
    * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to
    * modify it unless you plan to set some global configurations for all
    * Hadoop RDDs.

 Cheers

 On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
 wrote:

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")


 sc.hadoopConfiguration(hadoopConf)

 or

 sc.hadoopConfiguration = hadoopConf

 threw an error.

 On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu yuzhih...@gmail.com wrote:

 sequenceFile() calls hadoopFile() where:
 val confBroadcast = broadcast(new
 SerializableConfiguration(hadoopConfiguration))

 You can set the parameter in sc.hadoopConfiguration before calling
 sc.sequenceFile().

 Cheers

 On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
 wrote:

 I can do this

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")

 sc.newAPIHadoopFile(

   path + "/*.avro",

   classOf[AvroKeyInputFormat[GenericRecord]],

   classOf[AvroKey[GenericRecord]],

   classOf[NullWritable],

   hadoopConf)


 But I can't do the same with

 sc.sequenceFile(path, classOf[Text], classOf[Text])
 How can I achieve the same with sequenceFile?
 --
 Deepak





 --
 Deepak





-- 
Deepak


Re: Fine control with sc.sequenceFile

2015-06-28 Thread Ted Yu
There isn't a setter for sc.hadoopConfiguration.
You can directly change the value of the parameter in sc.hadoopConfiguration.

However, see the note in scaladoc:
   * '''Note:''' As it will be reused in all Hadoop RDDs, it's better not to
   * modify it unless you plan to set some global configurations for all
   * Hadoop RDDs.

Cheers

On Sun, Jun 28, 2015 at 9:34 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")


 sc.hadoopConfiguration(hadoopConf)

 or

 sc.hadoopConfiguration = hadoopConf

 threw an error.

 On Sun, Jun 28, 2015 at 9:32 PM, Ted Yu yuzhih...@gmail.com wrote:

 sequenceFile() calls hadoopFile() where:
 val confBroadcast = broadcast(new
 SerializableConfiguration(hadoopConfiguration))

 You can set the parameter in sc.hadoopConfiguration before calling
 sc.sequenceFile().

 Cheers

 On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
 wrote:

 I can do this

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")

 sc.newAPIHadoopFile(

   path + "/*.avro",

   classOf[AvroKeyInputFormat[GenericRecord]],

   classOf[AvroKey[GenericRecord]],

   classOf[NullWritable],

   hadoopConf)


 But I can't do the same with

 sc.sequenceFile(path, classOf[Text], classOf[Text])
 How can I achieve the same with sequenceFile?
 --
 Deepak





 --
 Deepak




Re: Fine control with sc.sequenceFile

2015-06-28 Thread Ted Yu
sequenceFile() calls hadoopFile() where:
val confBroadcast = broadcast(new
SerializableConfiguration(hadoopConfiguration))

You can set the parameter in sc.hadoopConfiguration before calling
sc.sequenceFile().

Cheers

On Sun, Jun 28, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:

 I can do this

 val hadoopConf = new Configuration(sc.hadoopConfiguration)

 hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize",
 "67108864")

 sc.newAPIHadoopFile(

   path + "/*.avro",

   classOf[AvroKeyInputFormat[GenericRecord]],

   classOf[AvroKey[GenericRecord]],

   classOf[NullWritable],

   hadoopConf)


 But I can't do the same with

 sc.sequenceFile(path, classOf[Text], classOf[Text])
 How can I achieve the same with sequenceFile?
 --
 Deepak