Let's say I am storing my data in HDFS with a folder structure and file
partitioning as below:
/analytics/2015/05/02/partition-2015-05-02-13-50-
Note that a new file is created every 5 minutes.
As per my understanding, storing 5-minute files means we could not create
an RDD more granular than 5 minutes.
This depends on the file format. Many file formats are splittable (like
Parquet), meaning that a reader can seek to various points in the file, so
a single file can be read as several partitions rather than as one unit.
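
To make "splittable" a bit more concrete, here is a rough sketch in plain
Python of how one file could be carved into byte ranges that are read
independently. This is not Spark or Hadoop code: the function name
byte_range_splits and the 300 MB / 128 MB sizes are made up for
illustration, and the real input-format logic has more details (record
boundaries, block locations, and so on).

```python
def byte_range_splits(file_size, split_size):
    """Return (offset, length) pairs that together cover file_size bytes.

    Hypothetical helper for illustration only; it ignores record
    boundaries, which a real splittable reader has to respect.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024

# Assumed sizes: one 300 MB file, read in 128 MB pieces.
splits = byte_range_splits(300 * MB, 128 * MB)
for offset, length in splits:
    print("start at %d MB, read %d MB" % (offset // MB, length // MB))
```

Each range could then back its own partition, which is why one 5-minute
file does not have to map to exactly one partition. A non-splittable file
(for example, a gzip-compressed text file) would have to be read as a
single range from offset 0.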
2015-05-05 12:45 GMT-04:00 Rendy Bambang Junior