[jira] [Updated] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
[ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gene Pang updated SPARK-48019:
------------------------------
Description:

{{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}}, and so on. Those return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, the values are decoded with the dictionary before filling in the primitive array.

However, {{ColumnVectors}} can have nulls, and for those {{null}} entries the dictionary id is irrelevant and can even be invalid. The dictionary should not be used for the {{null}} entries of the vector. Sometimes this can cause an {{ArrayIndexOutOfBoundsException}}.

In addition to the possible exception, copying a {{ColumnarArray}} is not correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain {{null}} values. However, the {{copy()}} for primitive types does not take the null-ness of the entries into account and blindly copies all the primitive values. That means the null entries get lost.

was:
`ColumnVectors` have APIs like `getInts`, `getFloats` and so on. Those return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, the values are decoded with the dictionary before filling in the primitive array.

However, `ColumnVectors` can have `null`s, and for those `null` entries, the dictionary id is irrelevant, and can also be invalid. The dictionary should not be used for the `null` entries of the vector. Sometimes, this can cause an `ArrayIndexOutOfBoundsException`.

In addition to the possible Exception, copying a `ColumnarArray` is not correct. A `ColumnarArray` contains a `ColumnVector` so it can contain `null` values. However, the `copy()` for primitive types does not take into account the null-ness of the entries, and blindly copies all the primitive values. That means the null entries get lost.
> ColumnVectors with dictionaries and nulls are not read/copied correctly
> -----------------------------------------------------------------------
>
>                 Key: SPARK-48019
>                 URL: https://issues.apache.org/jira/browse/SPARK-48019
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.3
>            Reporter: Gene Pang
>            Priority: Major
>
> {{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}}, and so on. Those
> return a primitive array with the contents of the vector. When the
> ColumnVector has a dictionary, the values are decoded with the dictionary
> before filling in the primitive array.
>
> However, {{ColumnVectors}} can have nulls, and for those {{null}} entries
> the dictionary id is irrelevant and can even be invalid. The dictionary
> should not be used for the {{null}} entries of the vector. Sometimes this
> can cause an {{ArrayIndexOutOfBoundsException}}.
>
> In addition to the possible exception, copying a {{ColumnarArray}} is not
> correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain
> {{null}} values. However, the {{copy()}} for primitive types does not take
> the null-ness of the entries into account and blindly copies all the
> primitive values. That means the null entries get lost.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
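The dictionary bug described above can be illustrated with a small stand-in. This is plain Python, not Spark's actual Java `ColumnVector` code; the function names and the null placeholder are made up for illustration. The point is that a null slot still holds a physical dictionary id, which may be garbage, so decoding must check the null mask first:

```python
def get_ints_buggy(ids, is_null, dictionary):
    # Mirrors the reported bug: every slot is decoded through the
    # dictionary, including null slots whose ids may be out of range.
    return [dictionary[i] for i in ids]  # can raise IndexError

def get_ints_fixed(ids, is_null, dictionary, null_placeholder=0):
    # Skip the dictionary for null slots; their ids are meaningless.
    return [null_placeholder if null else dictionary[i]
            for i, null in zip(ids, is_null)]

dictionary = [10, 20, 30]
ids = [0, 99, 2]               # id 99 is garbage left in a null slot
is_null = [False, True, False]

print(get_ints_fixed(ids, is_null, dictionary))  # [10, 0, 30]
```

Calling `get_ints_buggy` on the same inputs raises `IndexError` (the analogue of the `ArrayIndexOutOfBoundsException` in the report), because the garbage id 99 is looked up in a 3-entry dictionary.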
[jira] [Created] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly
Gene Pang created SPARK-48019:
---------------------------------

             Summary: ColumnVectors with dictionaries and nulls are not read/copied correctly
                 Key: SPARK-48019
                 URL: https://issues.apache.org/jira/browse/SPARK-48019
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3
            Reporter: Gene Pang

`ColumnVectors` have APIs like `getInts`, `getFloats`, and so on. Those return a primitive array with the contents of the vector. When the ColumnVector has a dictionary, the values are decoded with the dictionary before filling in the primitive array.

However, `ColumnVectors` can have `null`s, and for those `null` entries the dictionary id is irrelevant and can even be invalid. The dictionary should not be used for the `null` entries of the vector. Sometimes this can cause an `ArrayIndexOutOfBoundsException`.

In addition to the possible exception, copying a `ColumnarArray` is not correct. A `ColumnarArray` contains a `ColumnVector`, so it can contain `null` values. However, the `copy()` for primitive types does not take the null-ness of the entries into account and blindly copies all the primitive values. That means the null entries get lost.
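The second problem in the report, the lossy `copy()`, can be sketched the same way. Again this is an illustrative plain-Python stand-in, not `ColumnarArray.copy()` itself: a correct copy has to carry the null mask along with the primitive values, while a copy of the values alone silently turns nulls into ordinary entries:

```python
def copy_primitives_buggy(values, is_null):
    # Mirrors the reported bug: only the primitive values are copied.
    # The null mask is dropped, so every slot looks non-null afterwards.
    return list(values), [False] * len(values)

def copy_primitives_fixed(values, is_null):
    # Copy the values AND the null mask so null entries survive the copy.
    return list(values), list(is_null)

values = [10, 0, 30]            # slot 1 holds a meaningless filler value
is_null = [False, True, False]

_, mask = copy_primitives_buggy(values, is_null)
print(mask)  # [False, False, False]  <- the null entry was lost

_, mask = copy_primitives_fixed(values, is_null)
print(mask)  # [False, True, False]   <- null-ness preserved
```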
[jira] [Created] (SPARK-47826) Add VariantVal for PySpark
Gene Pang created SPARK-47826:
---------------------------------

             Summary: Add VariantVal for PySpark
                 Key: SPARK-47826
                 URL: https://issues.apache.org/jira/browse/SPARK-47826
             Project: Spark
          Issue Type: Sub-task
          Components: PySpark
    Affects Versions: 4.0.0
            Reporter: Gene Pang
             Fix For: 4.0.0

Add a `VariantVal` implementation for PySpark. It includes convenience methods to convert the Variant to a string or to a Python object, so that users can more easily work with Variant data.
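The kind of convenience API the ticket describes might look roughly like the sketch below. This is a hypothetical plain-Python stand-in, not PySpark's actual `VariantVal`: the real class wraps Spark's binary Variant encoding, whereas this toy wraps an already-parsed value, and the method names are only illustrative:

```python
import json

class VariantVal:
    """Illustrative stand-in for a Variant wrapper (not the PySpark class)."""

    def __init__(self, value):
        self._value = value

    def toPython(self):
        # Convenience: return the Variant as a plain Python object.
        return self._value

    def __str__(self):
        # Convenience: render the Variant as a JSON string.
        return json.dumps(self._value)

v = VariantVal({"a": 1, "b": [2, 3]})
print(v.toPython())  # {'a': 1, 'b': [2, 3]}
print(str(v))        # {"a": 1, "b": [2, 3]}
```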
[jira] [Created] (SPARK-20814) Mesos scheduler does not respect spark.executor.extraClassPath configuration
Gene Pang created SPARK-20814:
---------------------------------

             Summary: Mesos scheduler does not respect spark.executor.extraClassPath configuration
                 Key: SPARK-20814
                 URL: https://issues.apache.org/jira/browse/SPARK-20814
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 2.2.0
            Reporter: Gene Pang

When Spark executors are deployed on Mesos, the Mesos scheduler no longer respects the "spark.executor.extraClassPath" configuration parameter.

MesosCoarseGrainedSchedulerBackend used to use the environment variable "SPARK_CLASSPATH" to add the value of "spark.executor.extraClassPath" to the executor classpath. However, "SPARK_CLASSPATH" was deprecated, and was removed in this commit [https://github.com/apache/spark/commit/8f0490e22b4c7f1fdf381c70c5894d46b7f7e6fb#diff-387c5d0c916278495fc28420571adf9eL178]. This effectively broke the ability for users to specify "spark.executor.extraClassPath" for Spark executors deployed on Mesos.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)