RE: Control default partition when load a RDD from HDFS

2014-12-18 Thread Shuai Zheng
Sent: Wednesday, December 17, 2014 11:04 AM To: Shuai Zheng; 'Sun, Rui'; user@spark.apache.org Subject: RE: Control default partition when load a RDD from HDFS Why is it not a good option to create an RDD for each 200 MB file and then apply the pre-calculations before merging them? I think
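A minimal sketch of that per-file approach, assuming Scala and hypothetical HDFS paths: each file gets its own RDD, the pre-calculation is applied to it, and the per-file results are merged with union afterwards. The path list, the coalesce(1) step, and the length-sum aggregation are illustrative assumptions, not code from the thread.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddPerFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-per-file"))

    // Hypothetical list of the ~1000 input paths.
    val paths: Seq[String] = (1 to 1000).map(i => f"hdfs:///data/input/file-$i%04d.txt")

    // One RDD per file; the pre-calculation runs on each file before merging.
    val perFile: Seq[RDD[(String, Long)]] = paths.map { p =>
      // coalesce(1) pulls the whole file into one partition, so the
      // pre-calculation sees the complete file even if HDFS split it.
      sc.textFile(p).coalesce(1).mapPartitions { lines =>
        // Placeholder pre-calculation: one summary record per file.
        Iterator((p, lines.map(_.length.toLong).sum))
      }
    }

    // Merge the pre-calculated per-file RDDs for the combined calculation.
    val merged: RDD[(String, Long)] = sc.union(perFile)
    println(merged.count())
  }
}

One possible trade-off with this layout is that a union over roughly 1000 small RDDs adds scheduling overhead compared with a single RDD that keeps one partition per file, which is what the rest of the thread discusses.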

RE: Control default partition when load a RDD from HDFS

2014-12-17 Thread Shuai Zheng
Nice, that is the answer I want. Thanks! From: Sun, Rui [mailto:rui@intel.com] Sent: Wednesday, December 17, 2014 1:30 AM To: Shuai Zheng; user@spark.apache.org Subject: RE: Control default partition when load a RDD from HDFS Hi, Shuai, How did you turn off the file split
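For reference, a sketch of one way to turn off file splitting when loading from HDFS, assuming Scala and the new Hadoop API: subclass TextInputFormat so that isSplitable returns false, which makes each file a single input split and therefore a single RDD partition. The class and path names here are illustrative; the actual code used in the thread is not visible in the truncated snippet.

import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.JobContext
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// A TextInputFormat that never splits a file, so each HDFS file becomes
// exactly one input split and hence one RDD partition.
class NonSplittableTextInputFormat extends TextInputFormat {
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false
}

object LoadWithoutSplits {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("load-without-splits"))

    val rdd = sc.newAPIHadoopFile(
      "hdfs:///data/input/*",
      classOf[NonSplittableTextInputFormat],
      classOf[LongWritable],
      classOf[Text])

    // With splitting disabled, the partition count equals the file count.
    println(rdd.partitions.length)
  }
}

An alternative sometimes used instead of a custom input format is to set the Hadoop split-size configuration (for example mapreduce.input.fileinputformat.split.minsize) large enough that no file is ever split; which option was actually used here is not shown in the archive snippet.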

RE: Control default partition when load a RDD from HDFS

2014-12-17 Thread Diego García Valverde
[mailto:szheng.c...@gmail.com] Sent: Wednesday, December 17, 2014 16:01 To: 'Sun, Rui'; user@spark.apache.org Subject: RE: Control default partition when load a RDD from HDFS Nice, that is the answer I want. Thanks! From: Sun, Rui [mailto:rui@intel.com] Sent: Wednesday, December 17, 2014 1:30

Control default partition when load a RDD from HDFS

2014-12-16 Thread Shuai Zheng
Hi All, My application loads 1000 files, each from 200 MB to a few GB, and combines them with other data to do the calculation. Some pre-calculation must be done at the file level; after that, the results need to be combined for further calculation. In Hadoop, it is simple because I can
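As a rough illustration of the scenario (not the original poster's code): if each HDFS file ends up in exactly one partition, the per-file pre-calculation can be expressed with mapPartitions and the results combined afterwards. The paths and the aggregation below are placeholder assumptions, and the sketch assumes splitting has been disabled as discussed elsewhere in the thread.

import org.apache.spark.{SparkConf, SparkContext}

object FileLevelPrecalc {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("file-level-precalc"))

    // Assumes file splitting is turned off, so each of the ~1000 files
    // maps to exactly one partition.
    val lines = sc.textFile("hdfs:///data/input/*")

    // Per-file pre-calculation: each partition holds one complete file.
    val perFile = lines.mapPartitions { it =>
      // Placeholder aggregation: total character count of the file.
      Iterator(it.map(_.length.toLong).sum)
    }

    // Combine the per-file results for the further calculation.
    println(perFile.reduce(_ + _))
  }
}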

RE: Control default partition when load a RDD from HDFS

2014-12-16 Thread Sun, Rui
From: Shuai Zheng [mailto:szheng.c...@gmail.com] Sent: Wednesday, December 17, 2014 4:16 AM To: user@spark.apache.org Subject: Control default partition when load a RDD from HDFS Hi All, My application loads 1000 files, each from 200 MB to a few GB, and combines them with other data to do