Re: Grouping on bucketed and sorted columns

2016-09-02 Thread Fridtjof Sander
I managed to run some experimental evaluation, and it seems I understood the code correctly: a partition that consists of Hive buckets is read bucket file by bucket file, which loses the internal sort order. Does anyone have an opinion about my alternative idea of reading from
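To illustrate why a file-by-file read loses the ordering, here is a minimal pure-Python sketch. It assumes a partition made of three hypothetical bucket files, each already sorted by id. Concatenating them in file order is not globally sorted. For contrast, it also shows a k-way merge with `heapq.merge`, which does preserve the order. That merge is only one possible technique and not necessarily the alternative the message goes on to propose.

```python
import heapq

# Hypothetical bucket files of one partition: each file is sorted by id
# internally, but ids are spread across files, so the files interleave.
bucket_files = [
    [0, 4, 8],   # bucket 0
    [1, 5, 9],   # bucket 1
    [2, 6, 10],  # bucket 2
]

def read_file_by_file(files):
    """Concatenate the bucket files in order, as a file-by-file scan would."""
    rows = []
    for f in files:
        rows.extend(f)
    return rows

def read_merged(files):
    """k-way merge of the individually sorted files (illustrative contrast)."""
    return list(heapq.merge(*files))

def is_sorted(rows):
    return all(a <= b for a, b in zip(rows, rows[1:]))

concatenated = read_file_by_file(bucket_files)
merged = read_merged(bucket_files)
print(concatenated, is_sorted(concatenated))  # [0, 4, 8, 1, 5, 9, 2, 6, 10] False
print(merged, is_sorted(merged))              # [0, 1, 2, 4, 5, 6, 8, 9, 10] True
```

Each file keeps its own order, but the concatenated result does not. Any operator that relies on the input being sorted by id would therefore have to re-sort it.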

Grouping on bucketed and sorted columns

2016-08-31 Thread Fridtjof Sander
Hi Spark users, I'm currently investigating Spark's bucketing and partitioning capabilities and I have some questions: Let /T/ be a table that is bucketed and sorted by /T.id/ and partitioned by /T.date/. Before persisting, /T/ has been repartitioned by /T.id/ to get only one file per
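The layout described for /T/ can be sketched in plain Python: rows are grouped into one directory per date and, within each date, into buckets by id, with each bucket sorted by id. This is a simplified model under stated assumptions, not Spark's implementation. The bucket count and sample rows are hypothetical. Plain modulo stands in for Spark's actual bucket hash, which is Murmur3-based. Each (date, bucket) pair is modeled as a single file.

```python
from collections import defaultdict

NUM_BUCKETS = 3  # hypothetical bucket count

def bucket_of(row_id, num_buckets=NUM_BUCKETS):
    # Simplified stand-in for Spark's bucket hash (Spark uses a Murmur3-based
    # hash); plain modulo keeps the example deterministic and easy to follow.
    return row_id % num_buckets

def layout(rows, num_buckets=NUM_BUCKETS):
    """Map (date, bucket) -> the rows of that bucket file, sorted by id."""
    files = defaultdict(list)
    for row in rows:
        files[(row["date"], bucket_of(row["id"], num_buckets))].append(row)
    for key in files:
        files[key].sort(key=lambda r: r["id"])  # "sorted by T.id" within a bucket
    return dict(files)

# Hypothetical sample rows of T.
rows = [
    {"id": 5, "date": "2016-08-30"},
    {"id": 2, "date": "2016-08-30"},
    {"id": 3, "date": "2016-08-31"},
    {"id": 8, "date": "2016-08-30"},
    {"id": 0, "date": "2016-08-31"},
]

for (date, bucket), bucket_rows in sorted(layout(rows).items()):
    print(date, bucket, [r["id"] for r in bucket_rows])
# 2016-08-30 2 [2, 5, 8]
# 2016-08-31 0 [0, 3]
```

Because bucketing is by id, all rows with a given id in a date partition land in the same bucket. Grouping by id can then, in principle, work bucket by bucket without a shuffle.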