Sent: Wednesday, December 17, 2014 11:04 AM
To: Shuai Zheng; 'Sun, Rui'; user@spark.apache.org
Subject: RE: Control default partition when load a RDD from HDFS
Why wouldn't it be a good option to create an RDD for each 200 MB file and then
apply the pre-calculations before merging them? I think
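That per-file pattern, in rough outline: build one dataset per file, run the pre-calculation on each, then merge the per-file results. A minimal sketch in plain Python (no Spark; `pre_calculate`, `merge`, and the file contents are hypothetical stand-ins for the real steps):

```python
# Hypothetical stand-in for the proposed Spark pattern: one dataset per
# file, a pre-calculation applied per file, then the per-file results
# merged for the further calculation.
from functools import reduce

def pre_calculate(records):
    # Hypothetical per-file pre-calculation: here, summing one file's values.
    return sum(records)

def merge(results):
    # Combine the per-file results into one value for the next stage.
    return reduce(lambda a, b: a + b, results)

# Stand-ins for the contents of three HDFS files.
files = {
    "part-0001": [1, 2, 3],
    "part-0002": [4, 5],
    "part-0003": [6],
}

per_file = [pre_calculate(records) for records in files.values()]
total = merge(per_file)
print(per_file, total)  # [6, 9, 6] 21
```

In Spark itself, the analogous structure would be one `sc.textFile` (or similar) call per file path, the pre-calculation as transformations on each RDD, and `sc.union` over the per-file results before the combined stage.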
From: Sun, Rui [mailto:rui@intel.com]
Sent: Wednesday, December 17, 2014 1:30 AM
To: Shuai Zheng; user@spark.apache.org
Subject: RE: Control default partition when load a RDD from HDFS
Hi, Shuai,
How did you turn off the file split
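On the "turn off the file split" point: one common way to keep each HDFS file in a single partition is to raise Hadoop's minimum split size above the largest file, so the input format never splits. A sketch only — the `spark.hadoop.*` pass-through and the old-API property name `mapred.min.split.size` are assumptions, `your-app.jar` is a placeholder, and the value is 10 GB in bytes:

```
# Assumed configuration sketch: force splits >= 10 GB so each file
# (200 MB to a few GB) lands in one partition. Property name is for the
# old mapred API that sc.textFile uses; adjust for the mapreduce.* API.
spark-submit \
  --conf spark.hadoop.mapred.min.split.size=10737418240 \
  your-app.jar
```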
From: Shuai Zheng [mailto:szheng.c...@gmail.com]
Sent: Wednesday, December 17, 2014 4:01 PM
To: 'Sun, Rui'; user@spark.apache.org
Subject: RE: Control default partition when load a RDD from HDFS
Nice, that is the answer I want.
Thanks!
From: Shuai Zheng [mailto:szheng.c...@gmail.com]
Sent: Wednesday, December 17, 2014 4:16 AM
To: user@spark.apache.org
Subject: Control default partition when load a RDD from HDFS
Hi All,
My application loads 1000 files, each from 200 MB to a few GB, and combines
them with other data to do calculation.
Some pre-calculation must be done at the per-file level; after that, the
results need to be combined for further calculation.
In Hadoop, it is simple because I can