[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346224#comment-16346224 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1090

> Sort incorrectly limits batch size to 65535 records rather than 65536
> ---------------------------------------------------------------------
>
>                 Key: DRILL-6080
>                 URL: https://issues.apache.org/jira/browse/DRILL-6080
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.13.0
>
>
> Drill places an upper limit of 64K on the number of rows in a batch. That is
> 65,536 decimal. When we index records, the indexes run from 0 to 64K-1, or 0
> to 65,535.
> The sort code incorrectly uses {{Character.MAX_VALUE}} as the maximum row
> count. So, if an incoming batch uses the full 64K size, sort ends up
> splitting batches unnecessarily.
> The fix is to instead use the correct constant {{ValueVector.MAX_ROW_COUNT}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343613#comment-16343613 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on the issue:

    https://github.com/apache/drill/pull/1090

+1
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342362#comment-16342362 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1090

Squashed and rebased. Reran the unit tests; they passed up to the Mongo tests, which failed for an unrelated reason.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16342334#comment-16342334 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164282435

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
--- End diff --

Thanks Vlad. I double-checked everything.

`testLargeBatch()` passes to `runJumboBatchTest()` the number of records per batch. It passes the maximum, which is `ValueVector.MAX_ROW_COUNT`:

```
runJumboBatchTest(fixture, ValueVector.MAX_ROW_COUNT);
```

Then, `runJumboBatchTest()` sets up a test of *n* rows with *m* rows per batch. (This allows testing multiple batches, each with a small number of rows.) Here, we want one batch with the maximum number of rows, so we pass:

```
DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
DataValidator validator = new DataValidator(rowCount, ValueVector.MAX_ROW_COUNT);
```

That is, a row count of 64K (passed in) in batches of 64K. The result is a single batch of the maximum possible row count.

Next, `DataGenerator` records these two numbers. It was a bit forgiving, adjusting the requested batch size down to the allowed maximum:

```
this.batchSize = Math.min(batchSize, ValueVector.MAX_ROW_COUNT);
```

I see that this causes confusion, and is probably a bit too lenient for a test. So, I've enforced the limit instead:

```
Preconditions.checkArgument(batchSize > 0 && batchSize <= ValueVector.MAX_ROW_COUNT);
this.batchSize = batchSize;
```

I made the same change to the `DataValidator` class.

We also discussed the SV4 class. What may not be entirely obvious there is that the length of an SV4 can often exceed the batch size limit. We have two kinds of indirection vectors (AKA "selection vectors"). The SV2 is an indirection on top of a single batch (and so has a length limited to `ValueVector.MAX_RECORD_COUNT`). But the SV4 "glues together" multiple batches, and so has a length limited only by the maximum vector length. (Hard coded at 2 GB, but ideally limited to 16 MB.)

Further, as your work showed, we probably don't want a huge SV4 for other reasons. A primary use of the SV4 is to merge a collection of batches to produce a set of merged output batches. You showed that merge performance suffers above a certain input width, and that width is far below the theoretical maximum.

So, I *think* I've got everything covered. If I'm overlooking anything, please do point it out so I can fix it.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340616#comment-16340616 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on the issue:

    https://github.com/apache/drill/pull/1090

LGTM, please rebase + squash to resolve conflicts.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340615#comment-16340615 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164040082

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
--- End diff --

It is more of a code-maintenance question: changing the maximum value in DataGenerator/DataValidator requires a similar change here.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340473#comment-16340473 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164022284

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
--- End diff --

That is true. But what advantage is there to passing an arbitrarily large number that doesn't represent a valid Drill batch size? The reader would see it as an error until they go and look at the code to see that the potential error is not really an error.

The question is: is this a bug that needs to be fixed, or just a coding preference that we can let go for now?
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340469#comment-16340469 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164021818

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
@@ -31,8 +31,9 @@
   private int length;

   public SelectionVector4(ByteBuf vector, int recordCount, int batchRecordCount) throws SchemaChangeException {
-    if (recordCount > Integer.MAX_VALUE /4) {
-      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. You requested an allocation of %d bytes.", recordCount * 4));
+    if (recordCount > Integer.MAX_VALUE / 4) {
+      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. " +
+          "You requested an allocation of %d bytes.", recordCount * 4));
--- End diff --

Sounds like two issues. First, while I pointed out opportunities to make this code consistent with work elsewhere, the code as it stands has worked for the last two years. Second, if it helps move this PR ahead for @ilooner, I can back out the formatting changes to this file so that it drops out of the PR.

That said, our general policy has been to include code cleanup within other commits rather than incurring the cost and delay of two commits for each bit of work (one for code cleanup, another for substantive changes).

Besides this issue, anything else needed?
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340405#comment-16340405 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r164015707

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
--- End diff --

Correct, but won't DataGenerator/DataValidator apply `Math.min(batchSize, ValueVector.MAX_ROW_COUNT)` and limit the size to the maximum possible anyway?
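The two validation styles debated in this thread can be contrasted in a standalone sketch. The helper below is hypothetical (it is not the actual `DataGenerator`); only the constant value 65536 comes from the discussion. `Math.min` silently clamps an out-of-range batch size, while a precondition-style check fails fast and surfaces the caller's mistake.

```java
// Hypothetical helper contrasting lenient clamping with strict checking.
public class BatchSizeCheck {
    static final int MAX_ROW_COUNT = 1 << 16; // 65536, as ValueVector.MAX_ROW_COUNT

    static int clamped(int batchSize) {
        // Lenient: any oversized request is quietly reduced to the maximum.
        return Math.min(batchSize, MAX_ROW_COUNT);
    }

    static int checked(int batchSize) {
        // Strict: an invalid request is the caller's bug, so surface it.
        if (batchSize <= 0 || batchSize > MAX_ROW_COUNT) {
            throw new IllegalArgumentException("batchSize: " + batchSize);
        }
        return batchSize;
    }

    public static void main(String[] args) {
        System.out.println(clamped(Integer.MAX_VALUE)); // 65536, silently adjusted
        System.out.println(checked(65536));             // 65536, accepted as-is
    }
}
```

The clamp answers vrozov's question (yes, the size is limited either way), but only the strict form makes a bad test argument visible.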
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340290#comment-16340290 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163999199

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
@@ -31,8 +31,9 @@
   private int length;

   public SelectionVector4(ByteBuf vector, int recordCount, int batchRecordCount) throws SchemaChangeException {
-    if (recordCount > Integer.MAX_VALUE /4) {
-      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. You requested an allocation of %d bytes.", recordCount * 4));
+    if (recordCount > Integer.MAX_VALUE / 4) {
+      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. " +
+          "You requested an allocation of %d bytes.", recordCount * 4));
--- End diff --

Do you mean `Integer.MAX_VALUE / 4 + 1`? I'd prefer to avoid format-only changes, especially if the original code is broken.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16340282#comment-16340282 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1090

@vrozov @paul-rogers Is there a way we can accelerate this PR? I keep hitting the DRILL-6086 issue in my builds.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336950#comment-16336950 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163456760

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
@@ -31,8 +31,9 @@
   private int length;

   public SelectionVector4(ByteBuf vector, int recordCount, int batchRecordCount) throws SchemaChangeException {
-    if (recordCount > Integer.MAX_VALUE /4) {
-      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. You requested an allocation of %d bytes.", recordCount * 4));
+    if (recordCount > Integer.MAX_VALUE / 4) {
+      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. " +
+          "You requested an allocation of %d bytes.", recordCount * 4));
--- End diff --

This code was original; I only cleaned it up.

It is true that the `ValueVector` code will fail for an allocation above 2 GB. But that code is not used for the SV4. (Strangely, a selection *vector* is not a value *vector*...) So the SV4 must impose its own limit. The limit is, essentially, 32K batches * 64K records per batch. (Given how we do the math, it could be 64K batches, since we mask off the high bits after the shift in generated code.)

I'm guessing the original thinking was: we can allocate vectors up to 2 GB, so we should only allow SV4s of the same size. With 4 bytes per SV4 entry, we can allocate up to `(Integer.MAX_VALUE + 1) / 4` entries. Code like this is hard to reason about because 1) it is pre-existing and 2) it is broken.

As we know, any allocation above 16 MB may fail due to memory fragmentation, so allowing a 2 GB size is wildly optimistic on a busy system, or on one that has been running long enough that all direct memory resides on the Netty free list.
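The shift-and-mask addressing described in the comment above can be sketched in isolation. These are hypothetical helpers, not Drill's actual `SelectionVector4` API; the sketch only illustrates the stated scheme of a 4-byte entry holding a batch index in its high 16 bits and a record index in its low 16 bits, which is why the addressable space is on the order of 32K-64K batches of 64K rows each.

```java
// Hypothetical sketch of a 4-byte SV4 entry: high 16 bits select a batch,
// low 16 bits select a record within that batch.
public class Sv4EntryDemo {
    static int encode(int batchIndex, int recordIndex) {
        return (batchIndex << 16) | (recordIndex & 0xFFFF);
    }
    static int batchOf(int entry)  { return entry >>> 16; }  // shift, as in generated code
    static int recordOf(int entry) { return entry & 0xFFFF; } // mask off the high bits

    public static void main(String[] args) {
        int entry = encode(3, 65535);        // batch 3, last row of a full 64K batch
        System.out.println(batchOf(entry));  // 3
        System.out.println(recordOf(entry)); // 65535
    }
}
```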
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336952#comment-16336952 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163457037

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
@@ -100,8 +101,8 @@ public boolean next() {
       return false;
     }
-    start = start+length;
-    int newEnd = Math.min(start+length, recordCount);
+    start = start + length;
--- End diff --

You may be right. I just added some spaces...
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336951#comment-16336951 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163456045

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
--- End diff --

Well... As it turns out, `ValueVector.MAX_ROW_COUNT` is 64K, which is the maximum size an SV2 can address. (An SV2 entry is 16 bits wide.) `Integer.MAX_VALUE` is 2^31 - 1, which would require a 32-bit SV2, which we don't have. So using `Integer.MAX_VALUE` would cause the test to fail, as the sorter cannot sort batches larger than 64K...

Previously we used `Character.MAX_VALUE`, but it is not intuitively obvious that our batch size should be correlated with the size of Java's UTF-16 character encoding... And, in fact, the original bug is that they are not correlated: `Character.MAX_VALUE` is 65535, while `ValueVector.MAX_ROW_COUNT` is 65536. As a result, we were not testing the full-batch corner case.
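The off-by-one at the heart of DRILL-6080 fits in two constants. This standalone snippet (not Drill code) shows that `Character.MAX_VALUE` is the largest 16-bit *index* (65535), while the row *count* of a full 64K batch is 65536; capping row count at the former makes a genuinely full batch look oversized, which is exactly why sort split such batches.

```java
// Two constants that differ by one: max index vs. max count.
public class OffByOneDemo {
    public static void main(String[] args) {
        int maxIndex = Character.MAX_VALUE;  // 65535: highest addressable index
        int maxRowCount = 1 << 16;           // 65536: value of ValueVector.MAX_ROW_COUNT
        // A full 64K batch exceeds a cap expressed as the max index:
        System.out.println(maxRowCount > maxIndex); // true -> batch gets split
    }
}
```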
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336949#comment-16336949 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163455571

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
+    DataValidator validator = new DataValidator(rowCount, ValueVector.MAX_ROW_COUNT);
     runLargeSortTest(fixture, dataGen, validator);
-    System.out.println(timer.elapsed(TimeUnit.MILLISECONDS));
+    //System.out.println(timer.elapsed(TimeUnit.MILLISECONDS));
--- End diff --

Removed all the timing & debugging code to avoid the need for commented-out lines. Logging in tests is a no-op; we simply discard the logs.
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335119#comment-16335119 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r162772482

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
@@ -100,8 +101,8 @@ public boolean next() {
       return false;
     }
-    start = start+length;
-    int newEnd = Math.min(start+length, recordCount);
+    start = start + length;
--- End diff --

This code does not look right to me. It tries to enforce the invariant that `start + length <= recordCount`, but based on the check on line 96, the invariant is not enforced in other places, so it is not clear why it needs to be enforced here. If the invariant does need to be enforced, would it be better to use:

```
start += length;
length = Math.min(length, recordCount - start);
```
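The paging fix suggested in the review above can be exercised in a minimal stand-in class. `WindowPager` is hypothetical (it is not Drill's `SelectionVector4`); it only demonstrates the proposed `next()` logic, in which each call advances the window and clamps its length so that `start + length` never exceeds `recordCount`.

```java
// Hypothetical pager demonstrating the suggested next() fix.
public class WindowPager {
    final int recordCount;
    int start = 0;
    int length;

    WindowPager(int recordCount, int window) {
        this.recordCount = recordCount;
        this.length = Math.min(window, recordCount);
    }

    boolean next() {
        if (start + length >= recordCount) {
            return false;                               // already at the last window
        }
        start += length;                                // advance to the next window
        length = Math.min(length, recordCount - start); // clamp the final window
        return true;
    }

    public static void main(String[] args) {
        WindowPager p = new WindowPager(10, 4);
        do {
            System.out.println(p.start + ".." + (p.start + p.length));
        } while (p.next());
        // windows: 0..4, 4..8, 8..10 -- the invariant holds throughout
    }
}
```

For 10 records in windows of 4, the final window shrinks to 2 records instead of overrunning the record count.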
[jira] [Commented] (DRILL-6080) Sort incorrectly limits batch size to 65535 records rather than 65536
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335121#comment-16335121 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163093639

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
@@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
   public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
     timer.reset();
-    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
-    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
+    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
--- End diff --

Would it be better to use

```
DataGenerator dataGen = new DataGenerator(fixture, rowCount, Integer.MAX_VALUE);
DataValidator validator = new DataValidator(rowCount, Integer.MAX_VALUE);
```

My understanding is that the idea is to use the maximum batch size in the test.
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335120#comment-16335120 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r162771968

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/selection/SelectionVector4.java ---
    @@ -31,8 +31,9 @@
       private int length;

       public SelectionVector4(ByteBuf vector, int recordCount, int batchRecordCount) throws SchemaChangeException {
    -    if (recordCount > Integer.MAX_VALUE /4) {
    -      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. You requested an allocation of %d bytes.", recordCount * 4));
    +    if (recordCount > Integer.MAX_VALUE / 4) {
    +      throw new SchemaChangeException(String.format("Currently, Drill can only support allocations up to 2gb in size. "
    +          + "You requested an allocation of %d bytes.", recordCount * 4));
    --- End diff --

    My understanding is that Java will use `int` to compute `recordCount * 4` and overflow.
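The overflow vrozov points out is a general Java pitfall: `recordCount * 4` is evaluated in 32-bit `int` arithmetic before any widening, so for a `recordCount` just past the guard it wraps silently. A minimal sketch (variable names are illustrative, not Drill's actual code):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // One past the guard value used in the SelectionVector4 check.
        int recordCount = Integer.MAX_VALUE / 4 + 1; // 536,870,912

        // int * int overflows before assignment to long...
        long wrong = recordCount * 4;  // wraps to Integer.MIN_VALUE
        // ...while widening one operand first yields the true byte count.
        long right = recordCount * 4L;

        System.out.println(wrong); // -2147483648
        System.out.println(right); // 2147483648
    }
}
```

So even the error message's `%d bytes` value would print a negative number unless one operand is widened to `long` first.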
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335118#comment-16335118 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r163093794

    --- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/managed/TestSortImpl.java ---
    @@ -466,10 +469,10 @@ public void runLargeSortTest(OperatorFixture fixture, DataGenerator dataGen,
       public void runJumboBatchTest(OperatorFixture fixture, int rowCount) {
         timer.reset();
    -    DataGenerator dataGen = new DataGenerator(fixture, rowCount, Character.MAX_VALUE);
    -    DataValidator validator = new DataValidator(rowCount, Character.MAX_VALUE);
    +    DataGenerator dataGen = new DataGenerator(fixture, rowCount, ValueVector.MAX_ROW_COUNT);
    +    DataValidator validator = new DataValidator(rowCount, ValueVector.MAX_ROW_COUNT);
         runLargeSortTest(fixture, dataGen, validator);
    -    System.out.println(timer.elapsed(TimeUnit.MILLISECONDS));
    +    //System.out.println(timer.elapsed(TimeUnit.MILLISECONDS));
    --- End diff --

    Please remove or convert to logging.
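The "convert to logging" suggestion would look roughly like the sketch below. Drill itself uses slf4j; `java.util.logging` stands in here only so the snippet runs with no extra dependencies, and the message text and `elapsedMs` placeholder are illustrative:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class TimerLogging {
    private static final Logger logger =
        Logger.getLogger(TimerLogging.class.getName());

    public static void main(String[] args) {
        // Placeholder for timer.elapsed(TimeUnit.MILLISECONDS) in the test.
        long elapsedMs = 1234;

        // Before: System.out.println(elapsedMs) always prints, cluttering test output.
        // After: emitted only when the logger is configured for FINE or lower.
        logger.log(Level.FINE, "Jumbo batch sort took {0} ms", elapsedMs);
        System.out.println("done");
    }
}
```

The advantage over a commented-out `println` is that the timing stays available when wanted, without dead code in the test.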
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329799#comment-16329799 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1090

    Pushed a new commit that fixes an indexing error. Also removes the "unsafe" methods in favor of the "set" methods in DrillBuf as Vlad requested.

    @vrozov, please take another look.
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16325385#comment-16325385 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1090#discussion_r161384235

    --- Diff: exec/memory/base/src/main/java/io/netty/buffer/DrillBuf.java ---
    @@ -851,48 +851,52 @@ public void print(StringBuilder sb, int indent, Verbosity verbosity) {
        * Write an integer to the buffer at the given byte index, without
        * bounds checks.
        *
    -   * @param offset byte (not int) offset of the location to write
    +   * @param bufOffset byte (not int) offset of the location to write
        * @param value the value to write
        */
    -  public void unsafePutInt(int offset, int value) {
    -    PlatformDependent.putInt(addr + offset, value);
    +  public void unsafePutInt(int bufOffset, int value) {
    +    assert unsafeCheckIndex(bufOffset, 4);
    --- End diff --

    I don't think that `unsafePutXXX()` provides any performance benefit over the existing `setXXX()` variants after DRILL-6004 was merged, except that the unsafe variants use `assert` while the existing ones use a `final static boolean` flag that is functionally and performance-wise equivalent to a Java `assert`. It is necessary to agree on a single mechanism to enable bounds checking and use it in all methods.
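The two bounds-check mechanisms being compared can be sketched side by side. This is a standalone illustration over a plain `int[]` rather than Drill's off-heap `DrillBuf`, and the flag's system-property name is hypothetical:

```java
public class BoundsCheck {
    // Flag-style check: read once at class load; with a constant false,
    // the JIT can eliminate the branch entirely (property name is illustrative).
    static final boolean BOUNDS_CHECKING =
        Boolean.getBoolean("example.bounds.checking");

    static void putIntAsserted(int[] buf, int offset, int value) {
        // assert-style check: active only when the JVM runs with -ea.
        assert offset >= 0 && offset < buf.length : "index " + offset;
        buf[offset] = value;
    }

    static void putIntFlagged(int[] buf, int offset, int value) {
        // flag-style check: compiled away when BOUNDS_CHECKING is false.
        if (BOUNDS_CHECKING && (offset < 0 || offset >= buf.length)) {
            throw new IndexOutOfBoundsException("index " + offset);
        }
        buf[offset] = value;
    }

    public static void main(String[] args) {
        int[] buf = new int[4];
        putIntAsserted(buf, 2, 42);
        putIntFlagged(buf, 3, 7);
        System.out.println(buf[2] + " " + buf[3]); // 42 7
    }
}
```

Since both approaches cost nothing when disabled, the review's point stands: the choice between them is a consistency question, not a performance one.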
[ https://issues.apache.org/jira/browse/DRILL-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324927#comment-16324927 ]

ASF GitHub Bot commented on DRILL-6080:
---------------------------------------

GitHub user paul-rogers opened a pull request:

    https://github.com/apache/drill/pull/1090

    DRILL-6080: Sort incorrectly limits batch size to 65535 records

    Sort incorrectly limits batch size to 65535 records rather than 65536. This PR also includes a few code cleanup items.

    Also fixes DRILL-6086: Overflow in offset vector in row set writer.

    @vrozov, please review.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/paul-rogers/drill DRILL-6080

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1090.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1090

----

commit c1d3402a619f3355e47e845aae245fd0f96e2189
Author: Paul Rogers
Date:   2018-01-11T00:04:53Z

    DRILL-6080: Sort incorrectly limits batch size to 65535 records

    Sort incorrectly limits batch size to 65535 records rather than 65536. This PR also includes a few code cleanup items.

    Fix for overflow in offset vector in row set writer