Number of files to load

2015-05-05 Thread Rendy Bambang Junior
Let's say I am storing my data in HDFS with the folder structure and file partitioning below: /analytics/2015/05/02/partition-2015-05-02-13-50- Note that a new file is created every 5 minutes. As per my understanding, storing 5-minute files means we could not create an RDD more granular than
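
To make the layout above concrete, here is a minimal sketch of how one might list the files covering a time window. It assumes the timestamp in each file name marks the start of its 5-minute slot; the helper name `five_minute_globs` and the `root` parameter are hypothetical, and the trailing `*` stands in for the part of the file name that is cut off in the message.

```python
from datetime import datetime, timedelta


def five_minute_globs(start, end, root="/analytics"):
    """Return one glob pattern per 5-minute slot overlapping [start, end).

    Hypothetical helper: assumes each file name carries the start time of
    its 5-minute slot, as the example path in the message suggests.
    """
    # Round the start time down to the nearest 5-minute boundary.
    slot = start.replace(minute=start.minute - start.minute % 5,
                         second=0, microsecond=0)
    step = timedelta(minutes=5)
    globs = []
    while slot < end:
        # The trailing "*" covers the unknown remainder of the file name.
        name = slot.strftime("%Y/%m/%d/partition-%Y-%m-%d-%H-%M-*")
        globs.append(f"{root}/{name}")
        slot += step
    return globs


if __name__ == "__main__":
    window = five_minute_globs(datetime(2015, 5, 2, 13, 52),
                               datetime(2015, 5, 2, 14, 0))
    # SparkContext.textFile accepts a comma-separated list of paths and
    # glob patterns, so the joined string could be passed to it directly.
    print(",".join(window))
```

With this layout, the number of files to load grows by one per 5 minutes of window, so a one-hour window maps to 12 patterns.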

Re: Number of files to load

2015-05-05 Thread Jonathan Coveney
> As per my understanding, storing 5-minute files means we could not create an RDD more granular than 5 minutes.

This depends on the file format. Many file formats are splittable (like Parquet), meaning that you can seek into various points of the file.

2015-05-05 12:45 GMT-04:00 Rendy Bambang Junior