Gene Pang created SPARK-48019:
---------------------------------

             Summary: ColumnVectors with dictionaries and nulls are not read/copied correctly
                 Key: SPARK-48019
                 URL: https://issues.apache.org/jira/browse/SPARK-48019
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3
            Reporter: Gene Pang

`ColumnVector`s have APIs like `getInts`, `getFloats` and so on. These return a primitive array with the contents of the vector. When the `ColumnVector` has a dictionary, the values are decoded with the dictionary before filling in the primitive array. However, a `ColumnVector` can contain `null`s, and for a `null` entry the dictionary id is irrelevant and may be invalid, so the dictionary should not be used for those entries. Decoding them anyway can cause an `ArrayIndexOutOfBoundsException`.

In addition to the possible exception, copying a `ColumnarArray` is not correct. A `ColumnarArray` contains a `ColumnVector`, so it can contain `null` values. However, `copy()` for primitive types does not take the null-ness of the entries into account and blindly copies all the primitive values, so the null entries are lost.
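To make the two failure modes concrete, here is a minimal, self-contained sketch. It does not use Spark's classes: `DictionaryNullSketch`, its three methods, and the plain `dictionary` / `dictIds` / `isNull` arrays are hypothetical stand-ins for a dictionary-encoded int vector, and the null-aware variants only illustrate the general idea, not necessarily how Spark does or will handle it.

```java
import java.util.Arrays;

// Hypothetical, simplified model of a dictionary-encoded int column.
// This is NOT Spark's ColumnVector; all names here are illustrative.
final class DictionaryNullSketch {

    // Decodes every slot through the dictionary, including null slots.
    // A null slot may hold a garbage id, so this can throw
    // ArrayIndexOutOfBoundsException.
    static int[] decodeIgnoringNulls(int[] dictionary, int[] dictIds, boolean[] isNull) {
        int[] out = new int[dictIds.length];
        for (int i = 0; i < dictIds.length; i++) {
            out[i] = dictionary[dictIds[i]];
        }
        return out;
    }

    // Skips the dictionary lookup for null slots and leaves the default 0 there.
    static int[] decodeNullAware(int[] dictionary, int[] dictIds, boolean[] isNull) {
        int[] out = new int[dictIds.length];
        for (int i = 0; i < dictIds.length; i++) {
            if (!isNull[i]) {
                out[i] = dictionary[dictIds[i]];
            }
        }
        return out;
    }

    // Copies into a boxed array so that null slots remain null in the copy.
    static Integer[] copyPreservingNulls(int[] dictionary, int[] dictIds, boolean[] isNull) {
        Integer[] out = new Integer[dictIds.length];
        for (int i = 0; i < dictIds.length; i++) {
            out[i] = isNull[i] ? null : Integer.valueOf(dictionary[dictIds[i]]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] dictionary = {10, 20, 30};
        int[] dictIds = {0, -1, 2};            // slot 1 is null; its id is invalid
        boolean[] isNull = {false, true, false};

        System.out.println(Arrays.toString(decodeNullAware(dictionary, dictIds, isNull)));     // [10, 0, 30]
        System.out.println(Arrays.toString(copyPreservingNulls(dictionary, dictIds, isNull))); // [10, null, 30]

        try {
            decodeIgnoringNulls(dictionary, dictIds, isNull);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("null slot decoded through dictionary: " + e.getMessage());
        }
    }
}
```

The primitive result of `decodeNullAware` still cannot represent nulls on its own, which is why a copy that only keeps the primitive array loses them; in this sketch the copy uses a boxed `Integer[]` purely to keep the example short.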