[
https://issues.apache.org/jira/browse/DRILL-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17680906#comment-17680906
]
ASF GitHub Bot commented on DRILL-8372:
---------------------------------------
paul-rogers commented on code in PR #2728:
URL: https://github.com/apache/drill/pull/2728#discussion_r1087483902
##########
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/limit/LimitRecordBatch.java:
##########
@@ -75,7 +75,7 @@ public IterOutcome innerNext() {
upStream = next(incoming);
}
// If EMIT that means leaf operator is UNNEST, in this case refresh the limit states and return EMIT.
- if (upStream == EMIT) {
+ if (upStream == EMIT || upStream == NONE) {
Review Comment:
This doesn't seem to be quite the right solution. This block of code handles
a very specific case: an UNNEST.
Expanding the diff to show the surrounding code, look at the check at the top of the loop:
```java
if (!first && !needMoreRecords(numberOfRecords)) {
```
With a LIMIT 0, we hit the limit on the first batch, but because of the `!first`
term this check is skipped on the first batch. I'm not quite sure why the
`!first` is in place; the commit history might tell us. Perhaps the right answer
is something like:
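```java
if (!needMoreRecords(numberOfRecords)) {
  // Limit already reached (e.g. LIMIT 0): emit an empty batch and
  // release the incoming batch's buffers.
  outgoingSv.setRecordCount(0);
  VectorAccessibleUtilities.clear(incoming);
  return super.innerNext();
}
if (!first) {
  ...
```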
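To make the ordering problem concrete, here is a toy model of the two guards. It is a minimal sketch, not Drill code: `GuardSketch`, `originalGuardFires`, and `proposedGuardFires` are hypothetical names, and `needMoreRecords` is assumed to mean "records left to emit > 0", which may not match the real method.

```java
// Hypothetical toy model of the LIMIT guard ordering; NOT Drill code.
class GuardSketch {
  // Assumed stand-in for LimitRecordBatch.needMoreRecords(): true while
  // there are still records left to emit. The real method may differ.
  static boolean needMoreRecords(int recordsLeft) {
    return recordsLeft > 0;
  }

  // Original guard: while first == true, the && short-circuits and the
  // limit check never fires.
  static boolean originalGuardFires(boolean first, int recordsLeft) {
    return !first && !needMoreRecords(recordsLeft);
  }

  // Proposed ordering: check the limit before looking at `first`.
  static boolean proposedGuardFires(int recordsLeft) {
    return !needMoreRecords(recordsLeft);
  }

  public static void main(String[] args) {
    int recordsLeft = 0;  // LIMIT 0
    boolean first = true; // processing the first batch
    System.out.println("original fires: " + originalGuardFires(first, recordsLeft));
    System.out.println("proposed fires: " + proposedGuardFires(recordsLeft));
  }
}
```

Running `main` prints `original fires: false` and `proposed fires: true`: for LIMIT 0 on the first batch, only the reordered check notices that the limit is already reached.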
I suspect that the logic actually needs more analysis. What does it do on
the first batch now? What does `super.innerNext()` do, and do we want that if
we've reached the limit?
Generally, the debugger is the best way to sort this out. Try a LIMIT 0, a
LIMIT n where n is less than the size of the first batch, a LIMIT n where n is
between one and two batch sizes (batch size < n < 2 * batch size), and so on.
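A toy model of those cases may help decide what to expect in the debugger for each one. It is a sketch under simplifying assumptions: `LimitCases` and `rowsPerBatch` are hypothetical, every batch has the same made-up size of 4096 rows, and it models only row counts, not buffer handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how many rows a LIMIT passes through per batch; NOT Drill code.
class LimitCases {
  // Returns the rows taken from each incoming batch, stopping at the batch
  // where the limit is reached.
  static List<Integer> rowsPerBatch(int limit, int[] batchSizes) {
    List<Integer> taken = new ArrayList<>();
    int remaining = limit;
    for (int size : batchSizes) {
      int n = Math.min(remaining, size);
      taken.add(n);
      remaining -= n;
      if (remaining == 0) {
        break; // limit reached: later batches are not needed
      }
    }
    return taken;
  }

  public static void main(String[] args) {
    int batch = 4096; // assumed batch size, for illustration only
    int[] batches = {batch, batch, batch};
    int[] limits = {0, batch / 2, batch + batch / 2};
    for (int limit : limits) {
      System.out.println("LIMIT " + limit + " -> " + rowsPerBatch(limit, batches));
    }
  }
}
```

This prints `LIMIT 0 -> [0]`, `LIMIT 2048 -> [2048]` and `LIMIT 6144 -> [4096, 2048]`. The LIMIT 0 case stands out: the limit is reached on the very first batch before any row passes through, which is exactly where the `!first` guard skips the check.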
> Unfreed buffers when running a LIMIT 0 query over delimited text
> ----------------------------------------------------------------
>
> Key: DRILL-8372
> URL: https://issues.apache.org/jira/browse/DRILL-8372
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Text & CSV
> Affects Versions: 1.21.0
> Reporter: James Turton
> Assignee: James Turton
> Priority: Major
> Fix For: 1.21.0
>
>
> With the following data layout
>
> {code:java}
> /tmp/foo/bar:
> large_csv.csvh
> /tmp/foo/boo:
> large_csv.csvh
> {code}
> a LIMIT 0 query over it results in unfreed buffer errors.
> {code:java}
> apache drill (dfs.tmp)> select * from `foo` limit 0;
> Error: SYSTEM ERROR: IllegalStateException: Allocator[op:0:0:4:EasySubScan] closed with outstanding buffers allocated (3).
> Allocator(op:0:0:4:EasySubScan) 1000000/299008/3182592/10000000000 (res/actual/peak/limit)
> child allocators: 0
> ledgers: 3
> ledger[113] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 262144, references: 1, life: 277785186322881..0, allocatorManager: [109, life: 277785186258906..0] holds 1 buffers.
> DrillBuf[142], udle: [110 0..262144]
> ledger[114] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 32768, references: 1, life: 277785186463824..0, allocatorManager: [110, life: 277785186414654..0] holds 1 buffers.
> DrillBuf[143], udle: [111 0..32768]
> ledger[112] allocator: op:0:0:4:EasySubScan), isOwning: true, size: 4096, references: 1, life: 277785186046095..0, allocatorManager: [108, life: 277785185921147..0] holds 1 buffers.
> DrillBuf[141], udle: [109 0..4096]
> reservations: 0
> {code}
>
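The error in the description comes from a check at close time: the EasySubScan operator's allocator still held three buffers when it was closed. The toy model below mimics that check. It is a hypothetical sketch (`ToyAllocator` and `ToyBuffer` are not Drill classes, and Drill's real allocator does far more), meant only to illustrate why releasing the incoming batch, for example via `VectorAccessibleUtilities.clear(incoming)`, matters before the operators close.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy allocator that mimics a leak check on close; NOT Drill's allocator API.
class ToyAllocator implements AutoCloseable {
  private final String name;
  // Buffers handed out and not yet released (identity-based, since
  // ToyBuffer does not override equals/hashCode).
  private final Set<ToyBuffer> outstanding = new HashSet<>();

  ToyAllocator(String name) {
    this.name = name;
  }

  ToyBuffer buffer(int size) {
    ToyBuffer buf = new ToyBuffer(this, size);
    outstanding.add(buf);
    return buf;
  }

  int outstandingCount() {
    return outstanding.size();
  }

  // Closing with unreleased buffers is treated as an error.
  @Override
  public void close() {
    if (!outstanding.isEmpty()) {
      throw new IllegalStateException("Allocator[" + name
          + "] closed with outstanding buffers allocated ("
          + outstanding.size() + ").");
    }
  }

  static final class ToyBuffer {
    private final ToyAllocator owner;
    final int size;

    private ToyBuffer(ToyAllocator owner, int size) {
      this.owner = owner;
      this.size = size;
    }

    // Returns the buffer to its allocator.
    void release() {
      owner.outstanding.remove(this);
    }
  }

  public static void main(String[] args) {
    ToyAllocator scan = new ToyAllocator("op:0:0:4:EasySubScan");
    // Same sizes as the three leaked buffers in the log above.
    ToyBuffer a = scan.buffer(4096);
    ToyBuffer b = scan.buffer(262144);
    ToyBuffer c = scan.buffer(32768);
    try {
      scan.close(); // nothing released the batch: leak detected
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    a.release();
    b.release();
    c.release();
    scan.close(); // succeeds once every buffer has been released
    System.out.println("closed cleanly");
  }
}
```

Running `main` first prints `Allocator[op:0:0:4:EasySubScan] closed with outstanding buffers allocated (3).` and then `closed cleanly`.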