I ran some experiments, and it seems I understood the code correctly:
A partition that consists of Hive buckets is read bucket file by
bucket file, so the sort order within the bucket files is lost.
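To make the observation concrete, here is a toy Python model, not Spark code. Each list is a hypothetical bucket file that is already sorted by id, for example the same bucket from two different date partitions. Reading the files one after another gives rows that are no longer sorted as a whole. For contrast only, a k-way merge with `heapq.merge` keeps the order. That merge is an assumption used for illustration and is not necessarily the alternative idea mentioned in the next sentence. The file contents and function names are made up.

```python
import heapq

# Hypothetical bucket files: each list stands for one bucket file
# that is already sorted by id (e.g. bucket 0 of two date partitions).
bucket_files = [
    [1, 4, 7],
    [2, 3, 9],
]


def read_file_by_file(files):
    """Concatenate the files in order, like a file-by-file scan."""
    rows = []
    for f in files:
        rows.extend(f)
    return rows


def read_with_merge(files):
    """K-way merge of already-sorted files; keeps the overall sort."""
    return list(heapq.merge(*files))


concatenated = read_file_by_file(bucket_files)
merged = read_with_merge(bucket_files)
print(concatenated)                          # [1, 4, 7, 2, 3, 9]
print(merged)                                # [1, 2, 3, 4, 7, 9]
print(concatenated == sorted(concatenated))  # False
```

Each file on its own stays sorted; only the concatenation across files breaks the order.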
Does anyone have an opinion about my alternative idea of reading from
Hi Spark users,
I'm currently investigating Spark's bucketing and partitioning
capabilities, and I have some questions:
Let /T/ be a table that is bucketed and sorted by /T.id/ and partitioned
by /T.date/. Before persisting, /T/ has been repartitioned by /T.id/ to
get only one file per