[ 
https://issues.apache.org/jira/browse/DRILL-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058715#comment-16058715
 ] 

Paul Rogers commented on DRILL-5602:
------------------------------------

The problem also exists in the other forms of {{allocateNew}}. From 
{{VarCharVector}}:

{code}
  @Override
  public boolean allocateNewSafe() {
      ...
      data = allocator.buffer(requestedSize);
      allocationSizeInBytes = requestedSize;
      offsetVector.allocateNew();
    ...
    data.readerIndex(0);
    offsetVector.zeroVector();
    return true;
  }
{code}

Again, the offsets buffer is not initialized. Perhaps code that uses this form 
does the required initialization. It would be better to do it once in the 
vector, rather than in each piece of code that allocates vectors.
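
As a rough illustration of "do it in the vector", here is a minimal sketch. It is not Drill code: {{OffsetVectorSketch}} and {{dirtyBuffer}} are hypothetical names, and a plain {{java.nio.ByteBuffer}} pre-filled with junk stands in for a recycled allocator buffer. The only point is that the allocation method itself writes the zero at offset 0, so callers cannot forget to.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for an offset vector; not Drill's UInt4Vector.
final class OffsetVectorSketch {
  private ByteBuffer data;

  // Simulates an allocator that returns a buffer holding stale bytes.
  static ByteBuffer dirtyBuffer(int bytes) {
    byte[] raw = new byte[bytes];
    Arrays.fill(raw, (byte) 0x7F);
    return ByteBuffer.wrap(raw);
  }

  // Allocates room for valueCount + 1 four-byte offsets and initializes
  // offset 0 here, inside the vector, rather than in every caller.
  void allocateNew(int valueCount) {
    data = dirtyBuffer((valueCount + 1) * 4);
    data.putInt(0, 0);
  }

  int getOffset(int index) {
    return data.getInt(index * 4);
  }

  public static void main(String[] args) {
    OffsetVectorSketch offsets = new OffsetVectorSketch();
    offsets.allocateNew(8);
    System.out.println("offset 0 = " + offsets.getOffset(0));
    System.out.println("offset 1 still stale: " + (offsets.getOffset(1) != 0));
  }
}
```

The sketch zeroes only offset 0, on the assumption that every later offset is written before it is read. Zero-filling the whole buffer would be the more conservative choice.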

> Vector corruption when allocating a repeated, variable-width vector
> -------------------------------------------------------------------
>
>                 Key: DRILL-5602
>                 URL: https://issues.apache.org/jira/browse/DRILL-5602
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.11.0
>
>
> The query in DRILL-5513 highlighted a problem described in DRILL-5594: the 
> external sort did not properly allocate its spill batch vectors and instead 
> allowed them to grow by doubling. While fixing that issue, a new issue 
> became clear.
> The method to allocate a repeated map vector has a serious bug, as 
> described in DRILL-5530: value vectors do not zero-fill the first allocation 
> for a vector (though subsequent reallocs are zero-filled).
> If the code worked correctly, here is the behavior when writing to the first 
> element of the list:
> * Access the offset vector at offset 0. Should be 0.
> * Write the new value at that offset. Since the first offset is 0, the first 
> value is written at 0 in the value vector.
> * Write into offset 1 the value at offset 0 plus the length of the new value.
> But the offset vector is not initialized to zero. Instead, offset 0 contains 
> the value 16 million. Now:
> * Access the offset vector at offset 0. Value is 16 million.
> * Write the new value at that offset, that is, at position 16 million. This 
> requires growing the value vector from its present size to 16 MB.
> The problem is here in {{RepeatedMapVector}}:
> {code}
>   public void allocateOffsetsNew(int groupCount) {
>     offsets.allocateNew(groupCount + 1);
>   }
> {code}
> Notice that there is no code to set the value at offset 0.
> Then, in the {{UInt4Vector}}:
> {code}
>   public void allocateNew(final int valueCount) {
>     allocateBytes(valueCount * 4);
>   }
>   private void allocateBytes(final long size) {
>     ...
>     data = allocator.buffer(curSize);
>     ...
> {code}
> The above eventually calls the Netty memory allocator, which explicitly 
> states that, for performance reasons, it does not zero-fill its buffers.
> The code works in small tests because the new buffer comes from Java direct 
> memory, which *does* zero-fill the buffer.
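
The failure described above can be reproduced in miniature. The sketch below is an illustration only: {{StaleOffsetDemo}} and {{requiredCapacityForFirstWrite}} are hypothetical names, a heap {{ByteBuffer}} stands in for the offset vector, and the stale value of 16 * 1024 * 1024 is an assumed stand-in for the "16 million" seen in the issue. It follows the three steps listed in the description and reports how large the value buffer would have to be after the first write.

```java
import java.nio.ByteBuffer;

// Hypothetical demo of the first-write protocol; not Drill code.
final class StaleOffsetDemo {

  // Performs the offset bookkeeping for writing the first value and returns
  // the capacity the value buffer needs to hold that value.
  static int requiredCapacityForFirstWrite(ByteBuffer offsets, int valueLength) {
    int start = offsets.getInt(0);   // step 1: read offset 0 (should be 0)
    int end = start + valueLength;   // step 2: value occupies [start, end)
    offsets.putInt(4, end);          // step 3: offset 1 = offset 0 + length
    return end;
  }

  public static void main(String[] args) {
    // ByteBuffer.allocate zero-fills, like the fresh buffers in small tests.
    ByteBuffer fresh = ByteBuffer.allocate(8);

    // Simulated recycled buffer: offset 0 holds an assumed stale value.
    ByteBuffer recycled = ByteBuffer.allocate(8);
    recycled.putInt(0, 16 * 1024 * 1024);

    System.out.println("fresh:    " + requiredCapacityForFirstWrite(fresh, 10));
    System.out.println("recycled: " + requiredCapacityForFirstWrite(recycled, 10));
  }
}
```

With the zero-filled buffer, a 10-byte value needs 10 bytes. With the stale offset, the same write needs a buffer of roughly 16 MB, which matches the growth described in the issue.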



