Sent: Wednesday, December 17, 2014 11:04 AM
To: Shuai Zheng; 'Sun, Rui'; user@spark.apache.org
Subject: RE: Control default partition when load a RDD from HDFS
Why wouldn't it be a good option to create an RDD for each 200 MB file and then
apply the pre-calculations before merging them? I think
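A minimal sketch of that per-file approach in Scala (the directory path and the preCalculate function are hypothetical placeholders; sc is the usual SparkContext, e.g. in spark-shell):

import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder for whatever per-record pre-calculation is needed
def preCalculate(line: String): String = line

val fs = FileSystem.get(sc.hadoopConfiguration)
// List the individual ~200 MB input files (directory is hypothetical)
val files = fs.listStatus(new Path("hdfs:///data/input")).map(_.getPath.toString)

// One RDD per file, pre-calculated independently, then merged with union
val perFile = files.map(f => sc.textFile(f).map(preCalculate))
val merged = sc.union(perFile.toSeq)

Note that union just concatenates partitions, so each file's pre-calculated data stays in its own partition(s) after the merge.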
From: Shuai Zheng [mailto:szheng.c...@gmail.com]
Sent: Wednesday, December 17, 2014 4:01 PM
To: 'Sun, Rui'; user@spark.apache.org
Subject: RE: Control default partition when load a RDD from HDFS
Nice, that is the answer I want.
Thanks!
From: Sun, Rui [mailto:rui@intel.com]
Sent: Wednesday, December 17, 2014 1:30 AM
To: Shuai Zheng; user@spark.apache.org
Subject: RE: Control default partition when load a RDD from HDFS
Hi, Shuai,
How did you turn off file splitting in Hadoop? I guess you might have
implemented a customized FileInputFormat which overrides isSplitable() to
return false. If you do have such a FileInputFormat, you can simply pass it as a
constructor parameter to HadoopRDD or NewHadoopRDD in Spark.
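For what it's worth, a minimal sketch of that idea (the class name and path are hypothetical; this uses sc.newAPIHadoopFile, which constructs a NewHadoopRDD internally rather than calling the constructor directly):

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.JobContext
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Never split a file: each file becomes exactly one split, hence one partition
class NonSplittableTextInputFormat extends TextInputFormat {
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false
}

val rdd = sc.newAPIHadoopFile(
  "hdfs:///data/input",
  classOf[NonSplittableTextInputFormat],
  classOf[LongWritable],
  classOf[Text])

With this, rdd.partitions.length should equal the number of input files, so each ~200 MB file maps to exactly one partition.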