Gene Pang created SPARK-48019:
---------------------------------

             Summary: ColumnVectors with dictionaries and nulls are not read/copied correctly
                 Key: SPARK-48019
                 URL: https://issues.apache.org/jira/browse/SPARK-48019
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3
            Reporter: Gene Pang


`ColumnVectors` have bulk APIs such as `getInts` and `getFloats`, which return 
a primitive array with the contents of the vector. When the `ColumnVector` has 
a dictionary, the values are decoded with the dictionary before the primitive 
array is filled in.

However, `ColumnVectors` can contain `null`s. For a `null` entry, the 
dictionary id is irrelevant and may be invalid, so the dictionary should not be 
used to decode it. Decoding such an entry anyway can throw an 
`ArrayIndexOutOfBoundsException`.
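
To make the failure mode concrete, here is a minimal sketch of null-aware 
dictionary decoding. It does not use Spark's actual classes: the class 
`DictionaryDecodeSketch`, the flat `dictionary`, `dictionaryIds` and `isNull` 
arrays, and this `getInts` signature are all hypothetical stand-ins for 
illustration, and leaving `null` slots at the default value `0` is an 
assumption, not a statement about how Spark resolves the issue.

```java
import java.util.Arrays;

// Hypothetical stand-in for a dictionary-encoded int column (not Spark's classes).
final class DictionaryDecodeSketch {

    // Decodes `count` entries starting at `rowId`. Null rows never touch the
    // dictionary, so an invalid dictionary id on a null row cannot throw.
    static int[] getInts(int[] dictionary, int[] dictionaryIds, boolean[] isNull,
                         int rowId, int count) {
        int[] result = new int[count];
        for (int i = 0; i < count; i++) {
            int row = rowId + i;
            if (!isNull[row]) {
                result[i] = dictionary[dictionaryIds[row]];
            }
            // Null rows keep the default value 0 (an assumption of this sketch).
        }
        return result;
    }

    public static void main(String[] args) {
        int[] dictionary = {10, 20, 30};
        int[] dictionaryIds = {0, -1, 2};          // -1 is invalid, but row 1 is null
        boolean[] isNull = {false, true, false};
        // prints [10, 0, 30]
        System.out.println(Arrays.toString(
            getInts(dictionary, dictionaryIds, isNull, 0, 3)));
    }
}
```

Without the `isNull` check, row 1 would index the dictionary at `-1` and throw 
an `ArrayIndexOutOfBoundsException`.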

In addition to the possible exception, copying a `ColumnarArray` is not 
correct. A `ColumnarArray` wraps a `ColumnVector`, so it can contain `null` 
values. However, `copy()` for primitive types does not take the null-ness of 
the entries into account and blindly copies all the primitive values, so the 
null entries are lost.
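
The copy problem can be illustrated the same way. The sketch below is not 
Spark's `copy()` implementation: `NullAwareCopySketch`, `copyInts`, and the 
flat `values`/`isNull` arrays are hypothetical, and copying into a boxed 
`Integer[]` is just one assumed way to keep `null` entries distinguishable 
from real values.

```java
import java.util.Arrays;

// Hypothetical illustration of a copy that preserves null entries.
final class NullAwareCopySketch {

    // Copies `length` entries starting at `offset` into a boxed array, so
    // that null rows stay null instead of becoming an arbitrary primitive.
    static Integer[] copyInts(int[] values, boolean[] isNull, int offset, int length) {
        Integer[] copy = new Integer[length];
        for (int i = 0; i < length; i++) {
            int row = offset + i;
            copy[i] = isNull[row] ? null : Integer.valueOf(values[row]);
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] values = {1, 999, 3};                // 999 is a leftover value on a null row
        boolean[] isNull = {false, true, false};
        // prints [1, null, 3]
        System.out.println(Arrays.toString(copyInts(values, isNull, 0, 3)));
    }
}
```

A plain primitive copy of `values` would yield `[1, 999, 3]`, which is the 
loss of null entries described above.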



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

