Assuming you always read the data together, one large file is fine; that
is a basic HDFS use case.
On Tue, 5 Nov 2019 at 4:28 am, Yaniv Harpaz wrote:
> It depends on your usage (when and how you read the data).
> Are the smaller files you were thinking about also larger than the HDFS
> block size?
> I would not go for
It depends on your usage (when and how you read the data).
Are the smaller files you were thinking about also larger than the HDFS
block size?
I would not go for anything smaller than a block.
Usually (if relevant to the way you read the data) the partitioning helps
determine that.
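
To make the block-size point concrete, here is a rough back-of-the-envelope
sketch. It only counts files and HDFS blocks for a given layout. The 128 MiB
block size and the helper name layout() are assumptions for illustration;
check dfs.blocksize on your own cluster.

```python
# Assumed HDFS block size (128 MiB); the real value comes from dfs.blocksize.
BLOCK_SIZE = 128 * 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def layout(total_bytes, file_bytes, block_bytes=BLOCK_SIZE):
    """Hypothetical helper: return (number of files, number of blocks)
    when total_bytes of data is stored as files of at most file_bytes each."""
    full_files, remainder = divmod(total_bytes, file_bytes)
    n_files = full_files + (1 if remainder else 0)
    # Every full-size file occupies ceil(file size / block size) blocks.
    blocks = full_files * ceil_div(file_bytes, block_bytes)
    # A smaller trailing file, if any, still needs at least one block.
    if remainder:
        blocks += ceil_div(remainder, block_bytes)
    return n_files, blocks


TEN_GIB = 10 * 1024**3
print(layout(TEN_GIB, TEN_GIB))            # one large file
print(layout(TEN_GIB, 128 * 1024**2))      # block-sized files
print(layout(TEN_GIB, 16 * 1024**2))       # files smaller than a block
```

Under these assumptions, one 10 GiB file and a set of block-sized files end up
with the same number of blocks, while 16 MiB files multiply both the file count
and the block count, which means more NameNode metadata and more, smaller
pieces to read.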
Yaniv Harpaz
[ yaniv.har
Hi,
How do we choose between a single large Avro file (much larger than the HDFS
block size) and multiple smaller Avro files (close to the HDFS block size)?
Since Avro is splittable, is there even a need to split a very large Avro
file into smaller files?
I’m assuming that a single large avro file can