lalaorya opened a new issue, #15665:
URL: https://github.com/apache/datafusion/issues/15665
### Describe the bug
When using the LIMIT clause, the simple `LIMIT N` syntax (such as `LIMIT 10`) works normally, but the syntax with a non-zero offset (such as `LIMIT 10,20`) fails with an internal error.
1. Create a table `tbl` with data (using a custom union provider that combines a MemTable and a ListingTable).
2. Execute the following query (works normally):
```sql
SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 0,10
```
3. Execute the following query (fails; a minimal reproduction sketch follows this list):
```sql
SELECT * FROM tbl WHERE appId = 'xxx' LIMIT 10,20
```
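Since the custom union provider is not easy to share, here is a minimal sketch of the shape of my setup, using a plain `MemTable` in place of the provider and placeholder data, so it may not produce exactly the plans shown below (it also assumes a tokio runtime):

```rust
use std::sync::Arc;

use datafusion::arrow::array::{Int64Array, StringArray};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::error::Result;
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() -> Result<()> {
    // Placeholder schema and rows standing in for the real table that is
    // served by the custom union provider.
    let schema = Arc::new(Schema::new(vec![
        Field::new("_l2_timestamp", DataType::Int64, false),
        Field::new("appId", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int64Array::from(vec![1_i64, 2, 3])),
            Arc::new(StringArray::from(vec!["xxx", "xxx", "yyy"])),
        ],
    )?;
    let table = MemTable::try_new(schema, vec![vec![batch]])?;

    let ctx = SessionContext::new();
    ctx.register_table("tbl", Arc::new(table))?;

    // LIMIT 0,10 runs fine for me; the non-zero offset form below is the one
    // that fails. "appId" is double-quoted to preserve the mixed-case name.
    let df = ctx
        .sql(r#"SELECT * FROM tbl WHERE "appId" = 'xxx' LIMIT 10,20"#)
        .await?;
    df.show().await?;
    Ok(())
}
```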
### Error Information
```
Internal error: GlobalLimitExec requires a single input partition.
This was likely caused by a bug in DataFusion's code and we would welcome
that you file an bug report in our issue tracker
```
### Execution Plan Comparison
#### Successful Query Execution Plan
```
[grpc-flight] physical plan after generate
CoalesceBatchesExec: target_batch_size=8192, fetch=20
  FilterExec: appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199279329000 AND _l2_timestamp@0 < 1744199879329000
    UnionExec
      SortExec: expr=[_l2_timestamp@0 DESC NULLS LAST], preserve_partitioning=[false]
        MemoryExec: partitions=1, partition_sizes=[6]
      ParquetExec: file_groups={1 group: [[113bf01e-4ec6-42b8-9fbb-f8846bc51664/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199703383977.1744199797977616.16.7315697162543697288.parquet, 113bf01e-4ec6-42b8-9fbb-f8846bc51664/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744065150679723.1744199685314003.205345.7315697162543697221.parquet]]}, projection=[_l2_timestamp, appId, message, namespace], output_ordering=[_l2_timestamp@0 DESC NULLS LAST], predicate=appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199279329000 AND _l2_timestamp@0 < 1744199879329000, pruning_predicate=appId_null_count@2 != appId_row_count@3 AND appId_min@0 <= 6eb17b3d80344184bb4c5592 AND 6eb17b3d80344184bb4c5592 <= appId_max@1 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_max@4 >= 1744199279329000 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_min@7 < 1744199879329000, required_guarantees=[appId in (6eb17b3d80344184bb4c5592)]
```
#### Failed Query Execution Plan
```
[grpc-flight] physical plan after generate
GlobalLimitExec: skip=10, fetch=20
  CoalesceBatchesExec: target_batch_size=8192, fetch=30
    FilterExec: appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199369830000 AND _l2_timestamp@0 < 1744199969830000
      UnionExec
        SortExec: expr=[_l2_timestamp@0 DESC NULLS LAST], preserve_partitioning=[false]
          MemoryExec: partitions=1, partition_sizes=[3]
        ParquetExec: file_groups={1 group: [[b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199879085286.1744199936023233.10.7315697162543697422.parquet, b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744199818030386.1744199863928697.8.7315697162543697371.parquet, b5107ed9-933b-40e1-b87b-81d3da109d5f/schema=schema_key/$$/dev/app/appId=6eb17b3d80344184bb4c5592/1744065150679723.1744199797977616.205361.7315697162543697335.parquet]]}, projection=[_l2_timestamp, appId, message, namespace], output_ordering=[_l2_timestamp@0 DESC NULLS LAST], predicate=appId@1 = 6eb17b3d80344184bb4c5592 AND _l2_timestamp@0 >= 1744199369830000 AND _l2_timestamp@0 < 1744199969830000, pruning_predicate=appId_null_count@2 != appId_row_count@3 AND appId_min@0 <= 6eb17b3d80344184bb4c5592 AND 6eb17b3d80344184bb4c5592 <= appId_max@1 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_max@4 >= 1744199369830000 AND _l2_timestamp_null_count@5 != _l2_timestamp_row_count@6 AND _l2_timestamp_min@7 < 1744199969830000, required_guarantees=[appId in (6eb17b3d80344184bb4c5592)]
```
#### Questions and Clarifications
From the execution plans, I can see that the offset LIMIT query adds `GlobalLimitExec: skip=10, fetch=20`, but execution then fails with "GlobalLimitExec requires a single input partition".
I'm not sure if this is a bug or a limitation in my usage pattern. Could
someone explain:
1. What "GlobalLimitExec requires a single input partition" means in this
context?
2. Is there a specific pattern I should follow when using LIMIT with an
offset in DataFusion?
3. Are there any configuration settings or query modifications that could
help resolve this issue?
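For question 3, the following is the kind of change I could make on my side, based on my (possibly wrong) understanding that the physical optimizer normally merges a multi-partition input into a single partition before a `GlobalLimitExec`. The helper name `apply_offset_limit` is mine; I don't know whether adding this to my plan-generation path is the intended fix:

```rust
use std::sync::Arc;

use datafusion::physical_plan::coalesce_partitions::CoalescePartitionsExec;
use datafusion::physical_plan::limit::GlobalLimitExec;
use datafusion::physical_plan::ExecutionPlan;

/// Hypothetical helper for my plan-generation code: ensure the input to
/// GlobalLimitExec has exactly one output partition before applying the
/// skip/fetch of `LIMIT <skip>,<fetch>`.
fn apply_offset_limit(
    input: Arc<dyn ExecutionPlan>,
    skip: usize,
    fetch: Option<usize>,
) -> Arc<dyn ExecutionPlan> {
    // Merge all partitions into one when the input (e.g. the UnionExec in the
    // failing plan) produces more than one partition.
    let single_partition: Arc<dyn ExecutionPlan> =
        if input.properties().output_partitioning().partition_count() > 1 {
            Arc::new(CoalescePartitionsExec::new(input))
        } else {
            input
        };
    Arc::new(GlobalLimitExec::new(single_partition, skip, fetch))
}
```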
Any insights would be appreciated.
### To Reproduce
_No response_
### Expected behavior
_No response_
### Additional context
- DataFusion version: 45.0.0
- Operating System: linux x86_64