[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector

2017-06-21 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5602:
---
Summary: Repeated List Vector fails to initialize the offset vector  (was: 
Vector corruption when allocating a repeated, variable-width vector)

> Repeated List Vector fails to initialize the offset vector
> --
>
> Key: DRILL-5602
> URL: https://issues.apache.org/jira/browse/DRILL-5602
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>
> The query in DRILL-5513 highlighted a problem described in DRILL-5594: that 
> the external sort did not properly allocate its spill batch vectors, and 
> instead allowed them to grow by doubling. While fixing that issue, a new 
> issue became clear.
> The method that allocates a repeated map vector has a serious bug, as 
> described in DRILL-5530: value vectors do not zero-fill the first allocation 
> for a vector (though subsequent reallocs are zero-filled).
> If the code worked correctly, here is the behavior when writing to the first 
> element of the list:
> * Access the offset vector at offset 0. Should be 0.
> * Write the new value at that offset. Since the first offset is 0, the first 
> value is written at 0 in the value vector.
> * Write into offset 1 the value at offset 0 plus the length of the new value.
> But the offset vector is not initialized to zero. Instead, offset 0 contains 
> the value 16 million. Now:
> * Access the offset vector at offset 0. Value is 16 million.
> * Write the new value at that offset, that is, at position 16 million. This 
> requires growing the value vector from its present size to about 16 MB.
> The problem is here in {{RepeatedMapVector}}:
> {code}
>   public void allocateOffsetsNew(int groupCount) {
>     offsets.allocateNew(groupCount + 1);
>   }
> {code}
> Notice that there is no code to set the value at offset 0.
> Then, in the {{UInt4Vector}}:
> {code}
>   public void allocateNew(final int valueCount) {
>     allocateBytes(valueCount * 4);
>   }
>   private void allocateBytes(final long size) {
>     ...
>     data = allocator.buffer(curSize);
>     ...
> {code}
> The above eventually calls the Netty memory allocator, which explicitly 
> states that, for performance reasons, it does not zero-fill its buffers.
> The code works in small tests because the new buffer comes from Java direct 
> memory, which *does* zero-fill the buffer.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector

2017-06-21 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5602:
---
Description: 
The code that allocates a new {{RepeatedListVector}} does not initialize the 
first offset to zero as required:

{code}
  @Override
  public void allocateNew(int valueCount, int innerValueCount) {
    clear();
    getOffsetVector().allocateNew(valueCount + 1);
    getMutator().reset();
  }
{code}

Since Netty does not zero-fill the buffers it allocates, the first offset can 
contain garbage, and the result is vector corruption.

If the code worked correctly, here is the behavior when writing to the first 
element of the list:

* Access the offset vector at offset 0. Should be 0.
* Write the new value at that offset. Since the first offset is 0, the first 
value is written at 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains 
the value 16 million. Now:

* Access the offset vector at offset 0. Value is 16 million.
* Write the new value at that offset, that is, at position 16 million. This 
requires growing the value vector from its present size to about 16 MB.
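
The write steps above can be sketched in isolation. The following is a minimal, 
self-contained Java illustration, not Drill code: the class OffsetVectorSketch, 
its methods, and the plain int[] standing in for the offset vector are all 
hypothetical, and the "dirty" allocator only simulates recycled memory by 
pre-filling the array with 16 million. It suggests why, when entries are written 
in order, only offset 0 needs explicit initialization: every later offset is 
written before it is read.

```java
import java.util.Arrays;

// Hypothetical stand-in for a repeated vector's offset handling (not Drill code).
class OffsetVectorSketch {

  /** Simulates an allocator that hands back recycled, non-zeroed memory. */
  public static int[] allocateDirty(int slots, int garbage) {
    int[] offsets = new int[slots];
    Arrays.fill(offsets, garbage);
    return offsets;
  }

  /** Allocates valueCount + 1 offsets, then explicitly zeroes the first one. */
  public static int[] allocateWithFirstOffsetZeroed(int valueCount, int garbage) {
    int[] offsets = allocateDirty(valueCount + 1, garbage);
    offsets[0] = 0; // the step the ticket says is missing
    return offsets;
  }

  /**
   * Follows the three steps from the description: read offsets[index],
   * treat it as the write position, store the end in offsets[index + 1].
   * Returns the position at which the value would be written.
   */
  public static int write(int[] offsets, int index, int length) {
    int start = offsets[index];
    offsets[index + 1] = start + length;
    return start;
  }

  public static void main(String[] args) {
    int garbage = 16_000_000;

    int[] dirty = allocateDirty(5, garbage);
    System.out.println("dirty start = " + write(dirty, 0, 10));

    int[] clean = allocateWithFirstOffsetZeroed(4, garbage);
    System.out.println("clean start = " + write(clean, 0, 10));
  }
}
```

Running main prints a start position of 16000000 for the dirty allocation and 0 
once the first offset has been zeroed.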


  was:
The query in DRILL-5513 highlighted a problem described in DRILL-5594: that the 
external sort did not properly allocate its spill batch vectors, and instead 
allowed them to grow by doubling. While fixing that issue, a new issue became 
clear.

The method that allocates a repeated map vector has a serious bug, as 
described in DRILL-5530: value vectors do not zero-fill the first allocation 
for a vector (though subsequent reallocs are zero-filled).

If the code worked correctly, here is the behavior when writing to the first 
element of the list:

* Access the offset vector at offset 0. Should be 0.
* Write the new value at that offset. Since the first offset is 0, the first 
value is written at 0 in the value vector.
* Write into offset 1 the value at offset 0 plus the length of the new value.

But the offset vector is not initialized to zero. Instead, offset 0 contains 
the value 16 million. Now:

* Access the offset vector at offset 0. Value is 16 million.
* Write the new value at that offset, that is, at position 16 million. This 
requires growing the value vector from its present size to about 16 MB.

The problem is here in {{RepeatedMapVector}}:

{code}
  public void allocateOffsetsNew(int groupCount) {
    offsets.allocateNew(groupCount + 1);
  }
{code}

Notice that there is no code to set the value at offset 0.

Then, in the {{UInt4Vector}}:

{code}
  public void allocateNew(final int valueCount) {
    allocateBytes(valueCount * 4);
  }

  private void allocateBytes(final long size) {
    ...
    data = allocator.buffer(curSize);
    ...
{code}

The above eventually calls the Netty memory allocator, which explicitly states 
that, for performance reasons, it does not zero-fill its buffers.

The code works in small tests because the new buffer comes from Java direct 
memory, which *does* zero-fill the buffer.
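
The difference between fresh direct memory and a recycled buffer can be shown 
without Netty. The sketch below is an illustration under stated assumptions: 
ZeroFillSketch and its one-slot RecyclingPool are hypothetical and only mimic 
an allocator that reuses memory without clearing it. The only real API used is 
java.nio.ByteBuffer, whose allocateDirect is documented to zero-initialize a 
new buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical demonstration; RecyclingPool is NOT Netty's allocator.
class ZeroFillSketch {

  /** True if every byte is zero (absolute reads; position is untouched). */
  public static boolean isAllZero(ByteBuffer buf) {
    for (int i = 0; i < buf.capacity(); i++) {
      if (buf.get(i) != 0) {
        return false;
      }
    }
    return true;
  }

  /** One-slot pool that returns a released buffer without erasing it. */
  public static final class RecyclingPool {
    private ByteBuffer cached;

    public ByteBuffer acquire(int size) {
      if (cached != null && cached.capacity() >= size) {
        ByteBuffer reused = cached;
        cached = null;
        reused.clear(); // resets position and limit only; bytes are kept
        return reused;
      }
      return ByteBuffer.allocateDirect(size); // fresh memory, zero-filled
    }

    public void release(ByteBuffer buf) {
      cached = buf;
    }
  }

  public static void main(String[] args) {
    RecyclingPool pool = new RecyclingPool();

    ByteBuffer first = pool.acquire(64);
    System.out.println("fresh buffer zeroed: " + isAllZero(first));

    first.putInt(0, 16_000_000); // leave stale data behind
    pool.release(first);

    ByteBuffer second = pool.acquire(64);
    System.out.println("recycled buffer zeroed: " + isAllZero(second));
  }
}
```

A small test that allocates only once sees the zero-filled case, which is 
consistent with the bug staying hidden until buffers start being reused.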



> Repeated List Vector fails to initialize the offset vector
> --
>
> Key: DRILL-5602
> URL: https://issues.apache.org/jira/browse/DRILL-5602
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>





[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector

2017-06-22 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5602:
---
Issue Type: Sub-task  (was: Bug)
Parent: DRILL-5601

> Repeated List Vector fails to initialize the offset vector
> --
>
> Key: DRILL-5602
> URL: https://issues.apache.org/jira/browse/DRILL-5602
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>





[jira] [Updated] (DRILL-5602) Repeated List Vector fails to initialize the offset vector

2017-06-22 Thread Paul Rogers (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-5602:
---
Issue Type: Bug  (was: Sub-task)
Parent: (was: DRILL-5601)

> Repeated List Vector fails to initialize the offset vector
> --
>
> Key: DRILL-5602
> URL: https://issues.apache.org/jira/browse/DRILL-5602
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.10.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: 1.11.0
>
>


