[jira] [Updated] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-26 Thread Gene Pang (Jira)


 [ https://issues.apache.org/jira/browse/SPARK-48019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gene Pang updated SPARK-48019:
--
Description: 
{{ColumnVectors}} have APIs like {{getInts}}, {{getFloats}}, and so on. These 
return a primitive array with the contents of the vector. When the 
{{ColumnVector}} has a dictionary, the values are decoded with the dictionary 
before filling in the primitive array.

However, {{ColumnVectors}} can have nulls, and for those {{null}} entries the 
dictionary id is irrelevant and may even be invalid. The dictionary should not 
be consulted for the {{null}} entries of the vector; doing so can cause an 
{{ArrayIndexOutOfBoundsException}}.
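
The decoding issue can be sketched in plain Python (this is an illustrative model, not the actual Spark {{ColumnVector}} API): when materializing the vector into a primitive array, the null mask must be checked before the dictionary is indexed, because the dictionary id stored in a null slot may be out of range.

```python
def get_ints(num_rows, is_null, dictionary_ids, dictionary):
    """Decode a dictionary-encoded int vector, skipping null entries.

    is_null[i] says whether row i is null; for such rows the stored
    dictionary id is garbage and must not be used as an index.
    """
    out = [0] * num_rows  # primitive array; the value in null slots is unspecified
    for i in range(num_rows):
        if not is_null[i]:  # only consult the dictionary for non-null rows
            out[i] = dictionary[dictionary_ids[i]]
    return out
```

With this guard, a row that is null but carries an invalid id (e.g. 999 against a 2-entry dictionary) no longer raises an out-of-bounds error.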

In addition to the possible exception, copying a {{ColumnarArray}} is not 
correct. A {{ColumnarArray}} contains a {{ColumnVector}}, so it can contain 
{{null}} values. However, {{copy()}} for primitive types does not take the 
nullness of the entries into account and blindly copies all the primitive 
values, so the null entries are lost.
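
A null-aware copy can be sketched the same way (again a plain-Python model, not the Spark API): the copy has to carry the null mask along rather than copying the primitive values alone.

```python
def copy_to_objects(values, is_null):
    """Null-aware copy of a primitive-backed array.

    Map null slots to None instead of emitting whatever stale primitive
    value happens to sit in that slot, which is what a values-only copy
    would do.
    """
    return [None if is_null[i] else values[i] for i in range(len(values))]
```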



> ColumnVectors with dictionaries and nulls are not read/copied correctly
> ---
>
> Key: SPARK-48019
> URL: https://issues.apache.org/jira/browse/SPARK-48019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Gene Pang
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48019) ColumnVectors with dictionaries and nulls are not read/copied correctly

2024-04-26 Thread Gene Pang (Jira)
Gene Pang created SPARK-48019:
-

 Summary: ColumnVectors with dictionaries and nulls are not 
read/copied correctly
 Key: SPARK-48019
 URL: https://issues.apache.org/jira/browse/SPARK-48019
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.3
Reporter: Gene Pang








[jira] [Created] (SPARK-47826) Add VariantVal for PySpark

2024-04-11 Thread Gene Pang (Jira)
Gene Pang created SPARK-47826:
-

 Summary: Add VariantVal for PySpark
 Key: SPARK-47826
 URL: https://issues.apache.org/jira/browse/SPARK-47826
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Gene Pang
 Fix For: 4.0.0


Add a `VariantVal` implementation for PySpark. It includes convenience methods 
to convert the Variant to a string or to a Python object, so that users can 
work with Variant data more easily.
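
A toy sketch of the convenience surface described above (the class and method names here are illustrative stand-ins, not the actual PySpark API): a wrapper that holds a variant value and exposes the two conversions, to a Python object and to a string.

```python
import json


class VariantVal:
    """Illustrative stand-in for a Variant value wrapper: holds an
    already-decoded value and exposes the two convenience conversions
    the issue describes."""

    def __init__(self, value):
        self._value = value

    def to_python(self):
        # Return the underlying Python object (dict, list, scalar, ...)
        return self._value

    def __str__(self):
        # Render the variant as a JSON string
        return json.dumps(self._value)
```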

 






[jira] [Created] (SPARK-20814) Mesos scheduler does not respect spark.executor.extraClassPath configuration

2017-05-19 Thread Gene Pang (JIRA)
Gene Pang created SPARK-20814:
-

 Summary: Mesos scheduler does not respect 
spark.executor.extraClassPath configuration
 Key: SPARK-20814
 URL: https://issues.apache.org/jira/browse/SPARK-20814
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 2.2.0
Reporter: Gene Pang


When Spark executors are deployed on Mesos, the Mesos scheduler no longer 
respects the "spark.executor.extraClassPath" configuration parameter.

MesosCoarseGrainedSchedulerBackend used to use the environment variable 
"SPARK_CLASSPATH" to add the value of "spark.executor.extraClassPath" to the 
executor classpath. However, "SPARK_CLASSPATH" was deprecated and later removed 
in this commit: 
[https://github.com/apache/spark/commit/8f0490e22b4c7f1fdf381c70c5894d46b7f7e6fb#diff-387c5d0c916278495fc28420571adf9eL178].

This effectively broke the ability for users to specify 
"spark.executor.extraClassPath" for Spark executors deployed on Mesos.


