Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18260#discussion_r121307570
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java ---
    @@ -366,55 +364,22 @@ public double getDouble(int rowId) {
         }
       }
     
    -  //
    -  // APIs dealing with Arrays
    -  //
    -
    -  @Override
    -  public int getArrayLength(int rowId) {
    -    return arrayLengths[rowId];
    -  }
    -  @Override
    -  public int getArrayOffset(int rowId) {
    -    return arrayOffsets[rowId];
    -  }
    -
    -  @Override
    -  public void putArray(int rowId, int offset, int length) {
    -    arrayOffsets[rowId] = offset;
    -    arrayLengths[rowId] = length;
    -  }
    -
       @Override
       public void loadBytes(ColumnVector.Array array) {
         array.byteArray = byteData;
         array.byteArrayOffset = array.offset;
       }
     
    -  //
    -  // APIs dealing with Byte Arrays
    -  //
    -
    -  @Override
    -  public int putByteArray(int rowId, byte[] value, int offset, int length) {
    -    int result = arrayData().appendBytes(length, value, offset);
    -    arrayOffsets[rowId] = result;
    -    arrayLengths[rowId] = length;
    -    return result;
    -  }
    -
       // Spilt this function out since it is the slow path.
       @Override
       protected void reserveInternal(int newCapacity) {
         if (this.resultArray != null || DecimalType.isByteArrayDecimalType(type)) {
    -      int[] newLengths = new int[newCapacity];
    -      int[] newOffsets = new int[newCapacity];
    -      if (this.arrayLengths != null) {
    -        System.arraycopy(this.arrayLengths, 0, newLengths, 0, capacity);
    -        System.arraycopy(this.arrayOffsets, 0, newOffsets, 0, capacity);
    +      // need 2 ints as offset and length for each array.
    +      if (intData == null || intData.length < newCapacity * 2) {
    +        int[] newData = new int[newCapacity * 2];
    --- End diff --
    
    `newCapacity` here can be at most `MAX_CAPACITY`. When `newCapacity` is more than `MAX_CAPACITY / 2`, `newCapacity * 2` would overflow or exceed `MAX_CAPACITY`, so it seems this allocation would cause a problem?
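
    For illustration, a minimal sketch (not the Spark code) of the overflow case, assuming `MAX_CAPACITY` is close to `Integer.MAX_VALUE` (the exact constant used below is hypothetical):

    ```java
    // Minimal sketch of the doubled-allocation concern; not the Spark code.
    public class DoubledCapacityDemo {
      // Hypothetical stand-in for ColumnVector.MAX_CAPACITY, assumed here to be
      // close to Integer.MAX_VALUE.
      static final int MAX_CAPACITY = Integer.MAX_VALUE - 8;

      public static void main(String[] args) {
        // Any newCapacity above Integer.MAX_VALUE / 2 makes newCapacity * 2
        // wrap around to a negative int.
        int newCapacity = Integer.MAX_VALUE / 2 + 1;
        int doubled = newCapacity * 2;
        System.out.println(doubled);   // prints -2147483648
        // `new int[doubled]` would throw NegativeArraySizeException here.
        // Even for newCapacity between MAX_CAPACITY / 2 and Integer.MAX_VALUE / 2,
        // the doubled size exceeds MAX_CAPACITY (and typically the JVM's maximum
        // array length), so the allocation would fail anyway.
      }
    }
    ```

    If so, the reserve path may need to guard `newCapacity` against `MAX_CAPACITY / 2` before doubling.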

