alamb commented on PR #7743:
URL: https://github.com/apache/arrow-datafusion/pull/7743#issuecomment-1751421496
> I actually think the above is correct behavior. The table is not globally
> sorted, but rather each individual file is sorted. Each time you insert, at
> least one new file is inserted. In the above result we see two independently
> sorted chunks, which means each insert created one new sorted file.

Yes, I think you are right. However, when I ran `EXPLAIN` on the query, I
expected to see no sorts in the plan (since each file is sorted, the files can
just be merged with `SortPreservingMerge`):
```
❯ explain select * from output order by time;
+---------------+----------------------------------------------------------------+
| plan_type     | plan                                                           |
+---------------+----------------------------------------------------------------+
| logical_plan  | Sort: output.time ASC NULLS LAST                               |
|               |   TableScan: output projection=[time]                          |
| physical_plan | SortPreservingMergeExec: [time@0 ASC NULLS LAST]               |
|               |   SortExec: expr=[time@0 ASC NULLS LAST]                       |
|               |     ParquetExec: file_groups={16 groups:                       |
|               |       [[private/tmp/output/FtsEcvDwXi7JVaVq_6.parquet,         |
|               |         private/tmp/output/FtsEcvDwXi7JVaVq_12.parquet],       |
|               |        [private/tmp/output/1PHmXyyoDVGbi7oo_5.parquet,         |
|               |         private/tmp/output/1PHmXyyoDVGbi7oo_12.parquet],       |
|               |        [private/tmp/output/1PHmXyyoDVGbi7oo_4.parquet,         |
|               |         private/tmp/output/1PHmXyyoDVGbi7oo_13.parquet],       |
|               |        [private/tmp/output/FtsEcvDwXi7JVaVq_13.parquet,        |
|               |         private/tmp/output/FtsEcvDwXi7JVaVq_7.parquet],        |
|               |        [private/tmp/output/1PHmXyyoDVGbi7oo_11.parquet,        |
|               |         private/tmp/output/1PHmXyyoDVGbi7oo_6.parquet], ...]}, |
|               |       projection=[time]                                        |
|               |                                                                |
+---------------+----------------------------------------------------------------+
2 rows in set. Query took 0.003 seconds.
```
But now that I look at that plan, perhaps the issue is that there is more
than one file in each group, so the sort order can't be maintained 🤔
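
To illustrate that suspicion, here is a minimal sketch in plain Python (not DataFusion code; `file_a`, `file_b`, and `is_sorted` are made-up names and values for illustration). It assumes the files in a group are read one after another, and shows that the resulting stream is not sorted even though each file is, while one sorted stream per file can simply be merged:

```python
import heapq
from itertools import chain


def is_sorted(values):
    """Return True if the sequence is in non-decreasing order."""
    values = list(values)
    return all(a <= b for a, b in zip(values, values[1:]))


# Hypothetical contents of two individually sorted files.
file_a = [1, 4, 7]
file_b = [2, 3, 9]

# One file group holding both files: reading them back to back
# produces a stream that is no longer sorted.
group_stream = list(chain(file_a, file_b))

# One file per group: each stream is sorted on its own, so a
# k-way merge yields sorted output without a full sort.
merged = list(heapq.merge(file_a, file_b))

print(is_sorted(group_stream))  # False
print(is_sorted(merged))        # True
```

If that is what is happening, it would explain why the plan keeps a `SortExec` below the merge whenever a group contains more than one file.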