[jira] [Assigned] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper
[ https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41677: Assignee: Apache Spark (was: Yang Jie) > Protobuf serializer for StreamingQueryProgressWrapper > - > > Key: SPARK-41677 > URL: https://issues.apache.org/jira/browse/SPARK-41677 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper
[ https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41677: Assignee: Yang Jie (was: Apache Spark) > Protobuf serializer for StreamingQueryProgressWrapper > - > > Key: SPARK-41677 > URL: https://issues.apache.org/jira/browse/SPARK-41677 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0
[jira] [Commented] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
[ https://issues.apache.org/jira/browse/SPARK-42143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679416#comment-17679416 ] Apache Spark commented on SPARK-42143: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39686 > Handle null string values in > RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo > > > Key: SPARK-42143 > URL: https://issues.apache.org/jira/browse/SPARK-42143 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Commented] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
[ https://issues.apache.org/jira/browse/SPARK-42143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679415#comment-17679415 ] Apache Spark commented on SPARK-42143: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39686 > Handle null string values in > RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo > > > Key: SPARK-42143 > URL: https://issues.apache.org/jira/browse/SPARK-42143 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Assigned] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
[ https://issues.apache.org/jira/browse/SPARK-42143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42143: Assignee: Gengliang Wang (was: Apache Spark) > Handle null string values in > RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo > > > Key: SPARK-42143 > URL: https://issues.apache.org/jira/browse/SPARK-42143 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Assigned] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
[ https://issues.apache.org/jira/browse/SPARK-42143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42143: Assignee: Apache Spark (was: Gengliang Wang) > Handle null string values in > RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo > > > Key: SPARK-42143 > URL: https://issues.apache.org/jira/browse/SPARK-42143 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major
[jira] [Assigned] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
[ https://issues.apache.org/jira/browse/SPARK-42143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-42143: -- Assignee: Gengliang Wang > Handle null string values in > RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo > > > Key: SPARK-42143 > URL: https://issues.apache.org/jira/browse/SPARK-42143 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Resolved] (SPARK-41777) Add Integration Tests
[ https://issues.apache.org/jira/browse/SPARK-41777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41777. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39637 [https://github.com/apache/spark/pull/39637] > Add Integration Tests > - > > Key: SPARK-41777 > URL: https://issues.apache.org/jira/browse/SPARK-41777 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0 > > > This requires us to add PyTorch as a testing dependency.
[jira] [Assigned] (SPARK-41777) Add Integration Tests
[ https://issues.apache.org/jira/browse/SPARK-41777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41777: Assignee: Rithwik Ediga Lakhamsani > Add Integration Tests > - > > Key: SPARK-41777 > URL: https://issues.apache.org/jira/browse/SPARK-41777 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > > This requires us to add PyTorch as a testing dependency.
[jira] [Assigned] (SPARK-41593) Implement logging from the executor nodes
[ https://issues.apache.org/jira/browse/SPARK-41593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-41593: Assignee: Rithwik Ediga Lakhamsani > Implement logging from the executor nodes > - > > Key: SPARK-41593 > URL: https://issues.apache.org/jira/browse/SPARK-41593 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major
[jira] [Resolved] (SPARK-41593) Implement logging from the executor nodes
[ https://issues.apache.org/jira/browse/SPARK-41593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-41593. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39299 [https://github.com/apache/spark/pull/39299] > Implement logging from the executor nodes > - > > Key: SPARK-41593 > URL: https://issues.apache.org/jira/browse/SPARK-41593 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Assignee: Rithwik Ediga Lakhamsani >Priority: Major > Fix For: 3.4.0
[jira] [Assigned] (SPARK-40264) Add helper function for DL model inference in pyspark.ml.functions
[ https://issues.apache.org/jira/browse/SPARK-40264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40264: Assignee: Lee Yang > Add helper function for DL model inference in pyspark.ml.functions > -- > > Key: SPARK-40264 > URL: https://issues.apache.org/jira/browse/SPARK-40264 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.2.2 >Reporter: Lee Yang >Assignee: Lee Yang >Priority: Minor > > Add a helper function to create a pandas_udf for inference on a given DL > model, where the user provides a predict function that is responsible for > loading the model and inferring on a batch of numpy inputs.
[jira] [Resolved] (SPARK-40264) Add helper function for DL model inference in pyspark.ml.functions
[ https://issues.apache.org/jira/browse/SPARK-40264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40264. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39628 [https://github.com/apache/spark/pull/39628] > Add helper function for DL model inference in pyspark.ml.functions > -- > > Key: SPARK-40264 > URL: https://issues.apache.org/jira/browse/SPARK-40264 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.2.2 >Reporter: Lee Yang >Assignee: Lee Yang >Priority: Minor > Fix For: 3.4.0 > > > Add a helper function to create a pandas_udf for inference on a given DL > model, where the user provides a predict function that is responsible for > loading the model and inferring on a batch of numpy inputs.
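The SPARK-40264 description above outlines the core pattern: the user supplies a factory that loads the model once, and the helper applies the returned predict function to batches of inputs. A minimal plain-Python sketch of that pattern follows; it deliberately avoids Spark and NumPy, and all names here (`batched_inference`, `make_predict_fn`) are illustrative, not Spark's actual API.

```python
# Sketch of the user-provided-predict-function pattern described in
# SPARK-40264: the factory runs once (expensive model load), and the
# resulting predict function is applied batch by batch.
def batched_inference(make_predict_fn, batch_size):
    predict_fn = None  # loaded lazily, once per worker

    def infer(rows):
        nonlocal predict_fn
        if predict_fn is None:
            predict_fn = make_predict_fn()  # one-time model load
        results = []
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            results.extend(predict_fn(batch))  # inference on one batch
        return results

    return infer

# Hypothetical stand-in for a real model: doubles each input value.
def make_predict_fn():
    return lambda batch: [x * 2 for x in batch]

infer = batched_inference(make_predict_fn, batch_size=3)
```

In the real helper, `infer` would be wrapped as a pandas_udf so Spark feeds it column batches; the one-time load matters because deep-learning model initialization is far more expensive than per-batch inference.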
[jira] [Commented] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
[ https://issues.apache.org/jira/browse/SPARK-42142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679411#comment-17679411 ] Apache Spark commented on SPARK-42142: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39685 > Handle null string values in CachedQuantile/ExecutorSummary/PoolData > > > Key: SPARK-42142 > URL: https://issues.apache.org/jira/browse/SPARK-42142 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Commented] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
[ https://issues.apache.org/jira/browse/SPARK-42142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679410#comment-17679410 ] Apache Spark commented on SPARK-42142: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39685 > Handle null string values in CachedQuantile/ExecutorSummary/PoolData > > > Key: SPARK-42142 > URL: https://issues.apache.org/jira/browse/SPARK-42142 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Assigned] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
[ https://issues.apache.org/jira/browse/SPARK-42142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42142: Assignee: Gengliang Wang (was: Apache Spark) > Handle null string values in CachedQuantile/ExecutorSummary/PoolData > > > Key: SPARK-42142 > URL: https://issues.apache.org/jira/browse/SPARK-42142 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Assigned] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
[ https://issues.apache.org/jira/browse/SPARK-42142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42142: Assignee: Apache Spark (was: Gengliang Wang) > Handle null string values in CachedQuantile/ExecutorSummary/PoolData > > > Key: SPARK-42142 > URL: https://issues.apache.org/jira/browse/SPARK-42142 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major
[jira] [Commented] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
[ https://issues.apache.org/jira/browse/SPARK-42143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679409#comment-17679409 ] Gengliang Wang commented on SPARK-42143: I am working on this one > Handle null string values in > RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo > > > Key: SPARK-42143 > URL: https://issues.apache.org/jira/browse/SPARK-42143 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Assigned] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42140: Assignee: (was: Apache Spark) > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Assigned] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42140: Assignee: Apache Spark > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major
[jira] [Commented] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679408#comment-17679408 ] Apache Spark commented on SPARK-42140: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39684 > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Assigned] (SPARK-42056) Add missing options for Protobuf functions.
[ https://issues.apache.org/jira/browse/SPARK-42056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42056: Assignee: Raghu Angadi > Add missing options for Protobuf functions. > --- > > Key: SPARK-42056 > URL: https://issues.apache.org/jira/browse/SPARK-42056 > Project: Spark > Issue Type: Improvement > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.0 > > > We should be able to pass options for both {{from_protobuf()}} and > {{to_protobuf()}}. > Currently there are some gaps: > * In Scala {{to_protobuf()}} does not have a way to pass options. > * In Scala {{from_protobuf()}} that takes Java class name does not allow > options. > * In Python, {{from_protobuf()}} that uses Java class name does not > propagate options. > * In Python {{to_protobuf()}} does not pass options.
[jira] [Resolved] (SPARK-42056) Add missing options for Protobuf functions.
[ https://issues.apache.org/jira/browse/SPARK-42056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42056. -- Resolution: Fixed Issue resolved by pull request 39550 [https://github.com/apache/spark/pull/39550] > Add missing options for Protobuf functions. > --- > > Key: SPARK-42056 > URL: https://issues.apache.org/jira/browse/SPARK-42056 > Project: Spark > Issue Type: Improvement > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Fix For: 3.4.0 > > > We should be able to pass options for both {{from_protobuf()}} and > {{to_protobuf()}}. > Currently there are some gaps: > * In Scala {{to_protobuf()}} does not have a way to pass options. > * In Scala {{from_protobuf()}} that takes Java class name does not allow > options. > * In Python, {{from_protobuf()}} that uses Java class name does not > propagate options. > * In Python {{to_protobuf()}} does not pass options.
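The gaps SPARK-42056 lists are all variants of one problem: an options map that is accepted by some overloads but dropped before it reaches the underlying converter. The plain-Python sketch below illustrates the intended plumbing only; the function bodies and the `_convert` helper are hypothetical stand-ins, not Spark's actual implementation, and the real signatures live in `pyspark.sql.protobuf.functions`.

```python
# Sketch of symmetric options plumbing: both directions accept an
# optional dict and forward it unchanged to the conversion layer,
# so no overload silently discards the caller's options.
def _convert(data, message_name, options):
    # Hypothetical stand-in for the real protobuf conversion; it just
    # records what it received so the pass-through can be observed.
    return {"data": data, "message": message_name,
            "options": dict(options or {})}

def from_protobuf(data, message_name, options=None):
    return _convert(data, message_name, options)

def to_protobuf(data, message_name, options=None):
    return _convert(data, message_name, options)
```

The design point is that every overload, in every language binding, takes the same trailing options parameter and forwards it, which is what the resolved ticket added.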
[jira] [Commented] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
[ https://issues.apache.org/jira/browse/SPARK-42142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679406#comment-17679406 ] Gengliang Wang commented on SPARK-42142: I am working on this one > Handle null string values in CachedQuantile/ExecutorSummary/PoolData > > > Key: SPARK-42142 > URL: https://issues.apache.org/jira/browse/SPARK-42142 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Assigned] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
[ https://issues.apache.org/jira/browse/SPARK-42142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-42142: -- Assignee: Gengliang Wang > Handle null string values in CachedQuantile/ExecutorSummary/PoolData > > > Key: SPARK-42142 > URL: https://issues.apache.org/jira/browse/SPARK-42142 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Gengliang Wang >Priority: Major
[jira] [Commented] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679405#comment-17679405 ] Gengliang Wang commented on SPARK-42140: PairStrings can be null > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Resolved] (SPARK-42138) Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-42138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-42138. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39680 [https://github.com/apache/spark/pull/39680] > Handle null string values in > JobData/TaskDataWrapper/ExecutorStageSummaryWrapper > > > Key: SPARK-42138 > URL: https://issues.apache.org/jira/browse/SPARK-42138 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0
[jira] [Comment Edited] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679374#comment-17679374 ] Yang Jie edited comment on SPARK-42140 at 1/21/23 6:30 AM: --- PairStrings should not be null String, can they be special cases? [~Gengliang.Wang] was (Author: luciferyang): RuntimeInfo and PairStrings should not be null String, can they be special cases? [~Gengliang.Wang] > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
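The "handle null string values" sub-tasks in this thread all stem from the same constraint: proto3 string fields cannot hold null, so a serializer that stores Java/Scala strings must encode null explicitly and restore it on read. A minimal plain-Python sketch of one such round-trip scheme follows; it is an illustration of the general technique, with hypothetical function names, not the serializer code these tickets actually changed.

```python
# Sketch of null-safe string round-tripping for a proto3-style store:
# null is written as the empty string plus a presence flag, and the
# flag decides on read whether to restore None or keep the stored value.
def serialize_string(value):
    # Returns (stored_value, was_present); proto3 would carry both
    # as separate fields since its strings cannot be null.
    return ("" if value is None else value, value is not None)

def deserialize_string(stored, present):
    # Distinguishes a genuinely empty string from an absent (null) one.
    return stored if present else None
```

Without the presence flag, null and "" collapse into the same wire value, which is exactly the ambiguity the wrappers named in these tickets (ApplicationInfoWrapper, CachedQuantile, StageData, and so on) had to resolve for each nullable string field.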
[jira] [Commented] (SPARK-42141) Handle null string values in ApplicationInfo/ApplicationAttemptInfo
[ https://issues.apache.org/jira/browse/SPARK-42141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679402#comment-17679402 ] Yang Jie commented on SPARK-42141: -- Too small, merge into SPARK-42140 > Handle null string values in ApplicationInfo/ApplicationAttemptInfo > --- > > Key: SPARK-42141 > URL: https://issues.apache.org/jira/browse/SPARK-42141 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Resolved] (SPARK-42141) Handle null string values in ApplicationInfo/ApplicationAttemptInfo
[ https://issues.apache.org/jira/browse/SPARK-42141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-42141. -- Resolution: Duplicate > Handle null string values in ApplicationInfo/ApplicationAttemptInfo > --- > > Key: SPARK-42141 > URL: https://issues.apache.org/jira/browse/SPARK-42141 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Commented] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679401#comment-17679401 ] Yang Jie commented on SPARK-42140: -- working on this > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Updated] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42140: - Summary: Handle null string values in ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper (was: Handle null string values in ApplicationEnvironmentInfoWrapper) > Handle null string values in > ApplicationEnvironmentInfoWrapper/ApplicationInfoWrapper > - > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Updated] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfoWrapper
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42140: - Summary: Handle null string values in ApplicationEnvironmentInfoWrapper (was: Handle null string values in ApplicationEnvironmentInfo/RuntimeInfo/PairStrings/ExecutorResourceRequest/TaskResourceRequest) > Handle null string values in ApplicationEnvironmentInfoWrapper > -- > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major
[jira] [Assigned] (SPARK-42144) Handle null string values in StageData/StreamBlockData/StreamingQueryData
[ https://issues.apache.org/jira/browse/SPARK-42144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42144: Assignee: (was: Apache Spark) > Handle null string values in StageData/StreamBlockData/StreamingQueryData > - > > Key: SPARK-42144 > URL: https://issues.apache.org/jira/browse/SPARK-42144 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor
[jira] [Commented] (SPARK-42144) Handle null string values in StageData/StreamBlockData/StreamingQueryData
[ https://issues.apache.org/jira/browse/SPARK-42144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679397#comment-17679397 ] Apache Spark commented on SPARK-42144: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39683 > Handle null string values in StageData/StreamBlockData/StreamingQueryData > - > > Key: SPARK-42144 > URL: https://issues.apache.org/jira/browse/SPARK-42144 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor
[jira] [Assigned] (SPARK-42144) Handle null string values in StageData/StreamBlockData/StreamingQueryData
[ https://issues.apache.org/jira/browse/SPARK-42144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42144: Assignee: Apache Spark > Handle null string values in StageData/StreamBlockData/StreamingQueryData > - > > Key: SPARK-42144 > URL: https://issues.apache.org/jira/browse/SPARK-42144 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor
[jira] [Commented] (SPARK-42144) Handle null string values in StageData/StreamBlockData/StreamingQueryData
[ https://issues.apache.org/jira/browse/SPARK-42144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679389#comment-17679389 ] Yang Jie commented on SPARK-42144: -- working on this > Handle null string values in StageData/StreamBlockData/StreamingQueryData > - > > Key: SPARK-42144 > URL: https://issues.apache.org/jira/browse/SPARK-42144 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor
[jira] [Updated] (SPARK-41415) SASL Request Retries
[ https://issues.apache.org/jira/browse/SPARK-41415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan updated SPARK-41415: Fix Version/s: 3.2.4 3.3.2 > SASL Request Retries > > > Key: SPARK-41415 > URL: https://issues.apache.org/jira/browse/SPARK-41415 > Project: Spark > Issue Type: Task > Components: Shuffle >Affects Versions: 3.2.4 >Reporter: Aravind Patnam >Assignee: Aravind Patnam >Priority: Major > Fix For: 3.2.4, 3.3.2, 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42145) Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper
[ https://issues.apache.org/jira/browse/SPARK-42145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-42145. -- Resolution: Duplicate > Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper > > > Key: SPARK-42145 > URL: https://issues.apache.org/jira/browse/SPARK-42145 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42145) Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper
[ https://issues.apache.org/jira/browse/SPARK-42145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679388#comment-17679388 ] Yang Jie commented on SPARK-42145: -- Too small; this will be completed together with SPARK-42139 > Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper > > > Key: SPARK-42145 > URL: https://issues.apache.org/jira/browse/SPARK-42145 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42139) Handle null string values in SQLExecutionUIData/SQLPlanMetric/SparkPlanGraphWrapper
[ https://issues.apache.org/jira/browse/SPARK-42139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42139: - Summary: Handle null string values in SQLExecutionUIData/SQLPlanMetric/SparkPlanGraphWrapper (was: Handle null string values in SQLExecutionUIData/SQLPlanMetric) > Handle null string values in > SQLExecutionUIData/SQLPlanMetric/SparkPlanGraphWrapper > --- > > Key: SPARK-42139 > URL: https://issues.apache.org/jira/browse/SPARK-42139 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42139) Handle null string values in SQLExecutionUIData/SQLPlanMetric
[ https://issues.apache.org/jira/browse/SPARK-42139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42139: Assignee: (was: Apache Spark) > Handle null string values in SQLExecutionUIData/SQLPlanMetric > - > > Key: SPARK-42139 > URL: https://issues.apache.org/jira/browse/SPARK-42139 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42139) Handle null string values in SQLExecutionUIData/SQLPlanMetric
[ https://issues.apache.org/jira/browse/SPARK-42139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679386#comment-17679386 ] Apache Spark commented on SPARK-42139: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/39682 > Handle null string values in SQLExecutionUIData/SQLPlanMetric > - > > Key: SPARK-42139 > URL: https://issues.apache.org/jira/browse/SPARK-42139 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42139) Handle null string values in SQLExecutionUIData/SQLPlanMetric
[ https://issues.apache.org/jira/browse/SPARK-42139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42139: Assignee: Apache Spark > Handle null string values in SQLExecutionUIData/SQLPlanMetric > - > > Key: SPARK-42139 > URL: https://issues.apache.org/jira/browse/SPARK-42139 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
[ https://issues.apache.org/jira/browse/SPARK-42134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-42134. Fix Version/s: 3.3.2 3.4.0 Assignee: Peter Toth Resolution: Fixed > Fix getPartitionFiltersAndDataFilters() to handle filters without referenced > attributes > --- > > Key: SPARK-42134 > URL: https://issues.apache.org/jira/browse/SPARK-42134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Fix For: 3.3.2, 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18011) SparkR serialize "NA" throws exception
[ https://issues.apache.org/jira/browse/SPARK-18011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679385#comment-17679385 ] Apache Spark commented on SPARK-18011: -- User 'joveyuan-db' has created a pull request for this issue: https://github.com/apache/spark/pull/39681 > SparkR serialize "NA" throws exception > -- > > Key: SPARK-18011 > URL: https://issues.apache.org/jira/browse/SPARK-18011 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Miao Wang >Priority: Major > Labels: bulk-closed > > For some versions of R, if Date has "NA" field, backend will throw negative > index exception. > To reproduce the problem: > {code} > > a <- as.Date(c("2016-11-11", "NA")) > > b <- as.data.frame(a) > > c <- createDataFrame(b) > > dim(c) > 16/10/19 10:31:24 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.NegativeArraySizeException > at org.apache.spark.api.r.SerDe$.readStringBytes(SerDe.scala:110) > at org.apache.spark.api.r.SerDe$.readString(SerDe.scala:119) > at org.apache.spark.api.r.SerDe$.readDate(SerDe.scala:128) > at org.apache.spark.api.r.SerDe$.readTypedObject(SerDe.scala:77) > at org.apache.spark.api.r.SerDe$.readObject(SerDe.scala:61) > at > org.apache.spark.sql.api.r.SQLUtils$$anonfun$bytesToRow$1.apply(SQLUtils.scala:161) > at > org.apache.spark.sql.api.r.SQLUtils$$anonfun$bytesToRow$1.apply(SQLUtils.scala:160) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.Range.foreach(Range.scala:160) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at org.apache.spark.sql.api.r.SQLUtils$.bytesToRow(SQLUtils.scala:160) > at > org.apache.spark.sql.api.r.SQLUtils$$anonfun$5.apply(SQLUtils.scala:138) > at > 
org.apache.spark.sql.api.r.SQLUtils$$anonfun$5.apply(SQLUtils.scala:138) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:372) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:99) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
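The crash above originates in a length-prefixed string read: an NA serialized by SparkR arrives as a negative length, and allocating a buffer of that size throws NegativeArraySizeException. A minimal Python analogue of such a guarded read (illustrative only, not Spark's actual Scala SerDe code):

```python
import io
import struct

def read_string_bytes(stream):
    """Read a 4-byte big-endian length, then that many bytes as UTF-8.

    Hypothetical analogue of SerDe.readStringBytes: a serialized NA shows
    up as a negative length, which a naive fixed-size allocation turns into
    the NegativeArraySizeException in the trace above. Checking length < 0
    first returns None (NA) instead of attempting the allocation.
    """
    (length,) = struct.unpack(">i", stream.read(4))
    if length < 0:
        return None  # treat negative length as NA
    return stream.read(length).decode("utf-8")

# A normal 3-byte string followed by an NA marker (length -1)
buf = io.BytesIO(struct.pack(">i", 3) + b"foo" + struct.pack(">i", -1))
```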
[jira] [Resolved] (SPARK-41987) createDataFrame supports column with map type.
[ https://issues.apache.org/jira/browse/SPARK-41987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41987. --- Resolution: Resolved > createDataFrame supports column with map type. > -- > > Key: SPARK-41987 > URL: https://issues.apache.org/jira/browse/SPARK-41987 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Priority: Major > > Currently, Connect API createDataFrame does not support create dataframe with > map type. > For example, > {code:java} > >>> df = spark.createDataFrame( > ... [(1, ["foo", "bar"], {"x": 1.0}), (2, [], {}), (3, None, None)], > ... ("id", "an_array", "a_map") > ... ) > {code} > The above code want create a dataframe with column 'a_map' which is map type. > But pyarrow recognize {"x": 1.0} as a struct not map. > pyarrow supports map with format [('x', 1.0)] > Because the data frame's schema is not correct, so the other sequence > operator will be impacted. > For example: > {code:java} > df.select("id", "a_map", posexplode_outer("an_array")).show() > {code} > Expected: > {code:java} > +---+--+++ > | id| a_map| pos| col| > +---+--+++ > | 1|{x -> 1.0}| 0| foo| > | 1|{x -> 1.0}| 1| bar| > | 2|{}|null|null| > | 3| null|null|null| > +---+--+++ > {code} > Got: > {code:java} > +---+--+++ > | id| a_map| pos| col| > +---+--+++ > | 1| {1.0}| 0| foo| > | 1| {1.0}| 1| bar| > | 2|{null}|null|null| > | 3| null|null|null| > +---+--+++ > > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42139) Handle null string values in SQLExecutionUIData/SQLPlanMetric
[ https://issues.apache.org/jira/browse/SPARK-42139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679384#comment-17679384 ] Yang Jie commented on SPARK-42139: -- working on this one > Handle null string values in SQLExecutionUIData/SQLPlanMetric > - > > Key: SPARK-42139 > URL: https://issues.apache.org/jira/browse/SPARK-42139 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41845) Fix `count(expr("*"))` function
[ https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-41845. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39622 [https://github.com/apache/spark/pull/39622] > Fix `count(expr("*"))` function > --- > > Key: SPARK-41845 > URL: https://issues.apache.org/jira/browse/SPARK-41845 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 801, in pyspark.sql.connect.functions.count > Failed example: > df.select(count(expr("*")), count(df.alphabets)).show() > Expected: > +++ > |count(1)|count(alphabets)| > +++ > | 4| 3| > +++ > Got: > +++ > |count(alphabets)|count(alphabets)| > +++ > | 3| 3| > +++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42099) Make `count(*)` work correctly
[ https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-42099: - Assignee: Ruifeng Zheng > Make `count(*)` work correctly > -- > > Key: SPARK-42099 > URL: https://issues.apache.org/jira/browse/SPARK-42099 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > > cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect() > {code:java} > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `*` cannot be resolved. Did you mean one of the following? [`alphabets`] > Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS > count(alphabets)#35L] > +- Project [alphabets#30 AS alphabets#32] >+- LocalRelation [alphabets#30] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-41845) Fix `count(expr("*"))` function
[ https://issues.apache.org/jira/browse/SPARK-41845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-41845: - Assignee: Ruifeng Zheng > Fix `count(expr("*"))` function > --- > > Key: SPARK-41845 > URL: https://issues.apache.org/jira/browse/SPARK-41845 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Ruifeng Zheng >Priority: Major > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 801, in pyspark.sql.connect.functions.count > Failed example: > df.select(count(expr("*")), count(df.alphabets)).show() > Expected: > +++ > |count(1)|count(alphabets)| > +++ > | 4| 3| > +++ > Got: > +++ > |count(alphabets)|count(alphabets)| > +++ > | 3| 3| > +++ > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42099) Make `count(*)` work correctly
[ https://issues.apache.org/jira/browse/SPARK-42099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-42099. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39622 [https://github.com/apache/spark/pull/39622] > Make `count(*)` work correctly > -- > > Key: SPARK-42099 > URL: https://issues.apache.org/jira/browse/SPARK-42099 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0 > > > cdf.select(CF.count("*"), CF.count(cdf.alphabets)).collect() > {code:java} > pyspark.sql.connect.client.SparkConnectAnalysisException: > [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name > `*` cannot be resolved. Did you mean one of the following? [`alphabets`] > Plan: 'Aggregate [unresolvedalias('count('*), None), count(alphabets#32) AS > count(alphabets)#35L] > +- Project [alphabets#30 AS alphabets#32] >+- LocalRelation [alphabets#30] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
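The semantic distinction both tickets preserve — `count(*)`/`count(1)` counts every row while `count(col)` skips NULLs — reduces to the following plain-Python sketch (not the Connect implementation), using rows mirroring the example's expected output of 4 vs 3:

```python
# Four rows of an "alphabets" column, one of which is NULL
rows = [("foo",), ("bar",), ("baz",), (None,)]

# count(*) / count(1): every row counts, NULL or not
count_star = len(rows)

# count(alphabets): NULL values are excluded
count_col = sum(1 for (v,) in rows if v is not None)
```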
[jira] [Created] (SPARK-42145) Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper
Yang Jie created SPARK-42145: Summary: Handle null string values in SparkPlanGraphNode/SparkPlanGraphClusterWrapper Key: SPARK-42145 URL: https://issues.apache.org/jira/browse/SPARK-42145 Project: Spark Issue Type: Sub-task Components: SQL, Web UI Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-41677) Protobuf serializer for StreamingQueryProgressWrapper
[ https://issues.apache.org/jira/browse/SPARK-41677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reopened SPARK-41677: -- Restore this one > Protobuf serializer for StreamingQueryProgressWrapper > - > > Key: SPARK-41677 > URL: https://issues.apache.org/jira/browse/SPARK-41677 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42144) Handle null string values in StageData/StreamBlockData/StreamingQueryData
[ https://issues.apache.org/jira/browse/SPARK-42144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42144: - Summary: Handle null string values in StageData/StreamBlockData/StreamingQueryData (was: Handle null string values in StageData/StreamBlockData) > Handle null string values in StageData/StreamBlockData/StreamingQueryData > - > > Key: SPARK-42144 > URL: https://issues.apache.org/jira/browse/SPARK-42144 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42144) Handle null string values in StageData/StreamBlockData/StreamingQueryData
[ https://issues.apache.org/jira/browse/SPARK-42144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-42144: - Component/s: SQL Web UI > Handle null string values in StageData/StreamBlockData/StreamingQueryData > - > > Key: SPARK-42144 > URL: https://issues.apache.org/jira/browse/SPARK-42144 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42144) Handle null string values in StageData/StreamBlockData
Yang Jie created SPARK-42144: Summary: Handle null string values in StageData/StreamBlockData Key: SPARK-42144 URL: https://issues.apache.org/jira/browse/SPARK-42144 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42143) Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo
Yang Jie created SPARK-42143: Summary: Handle null string values in RDDStorageInfo/RDDDataDistribution/RDDPartitionInfo Key: SPARK-42143 URL: https://issues.apache.org/jira/browse/SPARK-42143 Project: Spark Issue Type: Sub-task Components: Spark Core, Web UI Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42142) Handle null string values in CachedQuantile/ExecutorSummary/PoolData
Yang Jie created SPARK-42142: Summary: Handle null string values in CachedQuantile/ExecutorSummary/PoolData Key: SPARK-42142 URL: https://issues.apache.org/jira/browse/SPARK-42142 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42137) Enable spark.kryo.unsafe by default
[ https://issues.apache.org/jira/browse/SPARK-42137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42137: - Assignee: Dongjoon Hyun > Enable spark.kryo.unsafe by default > --- > > Key: SPARK-42137 > URL: https://issues.apache.org/jira/browse/SPARK-42137 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42137) Enable spark.kryo.unsafe by default
[ https://issues.apache.org/jira/browse/SPARK-42137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42137. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39679 [https://github.com/apache/spark/pull/39679] > Enable spark.kryo.unsafe by default > --- > > Key: SPARK-42137 > URL: https://issues.apache.org/jira/browse/SPARK-42137 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42141) Handle null string values in ApplicationInfo/ApplicationAttemptInfo
Yang Jie created SPARK-42141: Summary: Handle null string values in ApplicationInfo/ApplicationAttemptInfo Key: SPARK-42141 URL: https://issues.apache.org/jira/browse/SPARK-42141 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfo/RuntimeInfo/PairStrings/ExecutorResourceRequest/TaskResourceRequest
[ https://issues.apache.org/jira/browse/SPARK-42140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679374#comment-17679374 ] Yang Jie commented on SPARK-42140: -- RuntimeInfo and PairStrings should never contain null Strings; can they be treated as special cases? [~Gengliang.Wang] > Handle null string values in > ApplicationEnvironmentInfo/RuntimeInfo/PairStrings/ExecutorResourceRequest/TaskResourceRequest > --- > > Key: SPARK-42140 > URL: https://issues.apache.org/jira/browse/SPARK-42140 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Web UI >Affects Versions: 3.4.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
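The constraint behind this whole family of "Handle null string values" sub-tasks is that protobuf string fields cannot hold null. A hypothetical helper pair (not Spark's actual serializer code) showing the usual write/read mapping and its cost:

```python
def to_stored(value):
    # Protobuf string fields cannot be null: map None to "" on write.
    return "" if value is None else value

def from_stored(stored):
    # Map "" back to None on read. This loses the ""-vs-null distinction,
    # which is why fields that are known never to be null (the RuntimeInfo
    # and PairStrings case raised in the comment) might warrant special
    # handling rather than the round-trip below.
    return None if stored == "" else stored
```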
[jira] [Created] (SPARK-42140) Handle null string values in ApplicationEnvironmentInfo/RuntimeInfo/PairStrings/ExecutorResourceRequest/TaskResourceRequest
Yang Jie created SPARK-42140: Summary: Handle null string values in ApplicationEnvironmentInfo/RuntimeInfo/PairStrings/ExecutorResourceRequest/TaskResourceRequest Key: SPARK-42140 URL: https://issues.apache.org/jira/browse/SPARK-42140 Project: Spark Issue Type: Sub-task Components: Spark Core, Web UI Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42139) Handle null string values in SQLExecutionUIData/SQLPlanMetric
Yang Jie created SPARK-42139: Summary: Handle null string values in SQLExecutionUIData/SQLPlanMetric Key: SPARK-42139 URL: https://issues.apache.org/jira/browse/SPARK-42139 Project: Spark Issue Type: Sub-task Components: SQL, Web UI Affects Versions: 3.4.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42138) Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-42138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679359#comment-17679359 ] Apache Spark commented on SPARK-42138: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/39680 > Handle null string values in > JobData/TaskDataWrapper/ExecutorStageSummaryWrapper > > > Key: SPARK-42138 > URL: https://issues.apache.org/jira/browse/SPARK-42138 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42138) Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-42138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42138: Assignee: Apache Spark (was: Gengliang Wang) > Handle null string values in > JobData/TaskDataWrapper/ExecutorStageSummaryWrapper > > > Key: SPARK-42138 > URL: https://issues.apache.org/jira/browse/SPARK-42138 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42138) Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
[ https://issues.apache.org/jira/browse/SPARK-42138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42138: Assignee: Gengliang Wang (was: Apache Spark) > Handle null string values in > JobData/TaskDataWrapper/ExecutorStageSummaryWrapper > > > Key: SPARK-42138 > URL: https://issues.apache.org/jira/browse/SPARK-42138 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42138) Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper
Gengliang Wang created SPARK-42138: -- Summary: Handle null string values in JobData/TaskDataWrapper/ExecutorStageSummaryWrapper Key: SPARK-42138 URL: https://issues.apache.org/jira/browse/SPARK-42138 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40817) Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-40817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-40817: -- Fix Version/s: 3.2.4 > Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode > --- > > Key: SPARK-40817 > URL: https://issues.apache.org/jira/browse/SPARK-40817 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.0.0, 3.1.3, 3.3.0, 3.2.2, 3.4.0 > Environment: Spark 3.1.3 > Kubernetes 1.21 > Ubuntu 20.04.1 >Reporter: Anton Ippolitov >Assignee: Anton Ippolitov >Priority: Major > Fix For: 3.2.4, 3.3.2, 3.4.0 > > Attachments: image-2022-10-17-10-44-46-862.png > > > I discovered that remote URIs in {{spark.jars}} get discarded when launching > Spark on Kubernetes in cluster mode via spark-submit. > h1. Reproduction > Here is an example reproduction with S3 being used for remote JAR storage: > I first created 2 JARs: > * {{/opt/my-local-jar.jar}} on the host where I'm running spark-submit > * {{s3://$BUCKET_NAME/my-remote-jar.jar}} in an S3 bucket I own > I then ran the following spark-submit command with {{spark.jars}} pointing to > both the local JAR and the remote JAR: > {code:java} > spark-submit \ > --master k8s://https://$KUBERNETES_API_SERVER_URL:443 \ > --deploy-mode cluster \ > --name=spark-submit-test \ > --class org.apache.spark.examples.SparkPi \ > --conf > spark.jars=/opt/my-local-jar.jar,s3a://$BUCKET_NAME/my-remote-jar.jar \ > --conf spark.kubernetes.file.upload.path=s3a://$BUCKET_NAME/my-upload-path/ > \ > [...] > /opt/spark/examples/jars/spark-examples_2.12-3.1.3.jar > {code} > Once the driver and the executors started, I confirmed that there was no > trace of {{my-remote-jar.jar}} anymore. For example, looking at the Spark > History Server, I could see that {{spark.jars}} got transformed into this: > !image-2022-10-17-10-44-46-862.png|width=991,height=80! > There was no mention of {{my-remote-jar.jar}} on the classpath or anywhere > else. 
> Note that I ran all tests with Spark 3.1.3; however, the code that handles > these dependencies appears to be the same in more recent versions of Spark. > h1. Root cause description > The issue appears to come from [this > logic|https://github.com/apache/spark/blob/d1f8a503a26bcfb4e466d9accc5fa241a7933667/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L163-L186] > in {{BasicDriverFeatureStep.getAdditionalPodSystemProperties()}}. > Specifically, this logic takes all URIs in {{spark.jars}}, [filters them down to local > URIs,|https://github.com/apache/spark/blob/d1f8a503a26bcfb4e466d9accc5fa241a7933667/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L165] > [uploads|https://github.com/apache/spark/blob/d1f8a503a26bcfb4e466d9accc5fa241a7933667/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L173] > those local files to {{spark.kubernetes.file.upload.path}}, and then > [*replaces*|https://github.com/apache/spark/blob/d1f8a503a26bcfb4e466d9accc5fa241a7933667/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala#L182] > the value of {{spark.jars}} with the newly uploaded JARs. By overwriting the > previous value of {{spark.jars}}, it drops every remote JAR that was > previously specified there. > Consequently, when the Spark driver starts afterwards, it only downloads JARs > from {{spark.kubernetes.file.upload.path}}. > h1. Possible solution > A possible fix would be to not fully overwrite the value of > {{spark.jars}}, but to keep the remote URIs in it. 
> The new logic would look something like this: > {code:java} > Seq(JARS, FILES, ARCHIVES, SUBMIT_PYTHON_FILES).foreach { key => > val uris = conf.get(key).filter(uri => > KubernetesUtils.isLocalAndResolvable(uri)) > // Save remote URIs > val remoteUris = conf.get(key).filter(uri => > !KubernetesUtils.isLocalAndResolvable(uri)) > val value = { > if (key == ARCHIVES) { > uris.map(UriBuilder.fromUri(_).fragment(null).build()).map(_.toString) > } else { > uris > } > } > val resolved = KubernetesUtils.uploadAndTransformFileUris(value, > Some(conf.sparkConf)) > if (resolved.nonEmpty) { > val resolvedValue = if (key == ARCHIVES) { > uris.zip(resolved).map { case (uri, r) => > UriBuilder.fromUri(r).fragment(new > java.net.URI(uri).getFragment).build().toString > } > } else { > resolved > } > // don't forg
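The essence of the proposed fix (partition the configured URIs into local and remote, upload only the local ones, then recombine) can be sketched independently of Spark. This is an illustrative stand-in, not the actual Kubernetes submission code; `is_local_and_resolvable` and the `upload` callback are simplified placeholders for `KubernetesUtils.isLocalAndResolvable` and the real upload step.

```python
def is_local_and_resolvable(uri):
    # Simplified stand-in: treat bare paths and file:// URIs as local,
    # and anything with another scheme (s3a://, hdfs://, http://, ...) as remote.
    return "://" not in uri or uri.startswith("file://")

def resolve_jars(uris, upload):
    """Upload local JARs, but keep remote URIs instead of dropping them."""
    local = [u for u in uris if is_local_and_resolvable(u)]
    remote = [u for u in uris if not is_local_and_resolvable(u)]
    uploaded = [upload(u) for u in local]
    # The reported bug corresponds to returning only `uploaded`,
    # which loses `remote`; the fix appends the remote URIs back.
    return uploaded + remote
```

With the reproduction's configuration, `resolve_jars(["/opt/my-local-jar.jar", "s3a://$BUCKET_NAME/my-remote-jar.jar"], upload)` would keep both entries, whereas the buggy behavior keeps only the uploaded copy of the local JAR.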
[jira] [Assigned] (SPARK-40817) Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-40817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40817: - Assignee: Anton Ippolitov > Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode > --- > > Key: SPARK-40817 > URL: https://issues.apache.org/jira/browse/SPARK-40817 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit > Affects Versions: 3.0.0, 3.1.3, 3.3.0, 3.2.2, 3.4.0 > Environment: Spark 3.1.3 > Kubernetes 1.21 > Ubuntu 20.04.1 > Reporter: Anton Ippolitov > Assignee: Anton Ippolitov > Priority: Major > Fix For: 3.3.2, 3.4.0 > > Attachments: image-2022-10-17-10-44-46-862.png -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40817) Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode
[ https://issues.apache.org/jira/browse/SPARK-40817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40817. --- Fix Version/s: 3.3.2 3.4.0 Resolution: Fixed > Remote spark.jars URIs ignored for Spark on Kubernetes in cluster mode > --- > > Key: SPARK-40817 > URL: https://issues.apache.org/jira/browse/SPARK-40817 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit > Affects Versions: 3.0.0, 3.1.3, 3.3.0, 3.2.2, 3.4.0 > Environment: Spark 3.1.3 > Kubernetes 1.21 > Ubuntu 20.04.1 > Reporter: Anton Ippolitov > Priority: Major > Fix For: 3.3.2, 3.4.0 > > Attachments: image-2022-10-17-10-44-46-862.png -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42137) Enable spark.kryo.unsafe by default
[ https://issues.apache.org/jira/browse/SPARK-42137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42137: Assignee: (was: Apache Spark) > Enable spark.kryo.unsafe by default > --- > > Key: SPARK-42137 > URL: https://issues.apache.org/jira/browse/SPARK-42137 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42137) Enable spark.kryo.unsafe by default
[ https://issues.apache.org/jira/browse/SPARK-42137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42137: Assignee: Apache Spark > Enable spark.kryo.unsafe by default > --- > > Key: SPARK-42137 > URL: https://issues.apache.org/jira/browse/SPARK-42137 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42137) Enable spark.kryo.unsafe by default
[ https://issues.apache.org/jira/browse/SPARK-42137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679340#comment-17679340 ] Apache Spark commented on SPARK-42137: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/39679 > Enable spark.kryo.unsafe by default > --- > > Key: SPARK-42137 > URL: https://issues.apache.org/jira/browse/SPARK-42137 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42137) Enable spark.kryo.unsafe by default
Dongjoon Hyun created SPARK-42137: - Summary: Enable spark.kryo.unsafe by default Key: SPARK-42137 URL: https://issues.apache.org/jira/browse/SPARK-42137 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42130) Handle null string values in AccumulableInfo and ProcessSummary
[ https://issues.apache.org/jira/browse/SPARK-42130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-42130. Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39666 [https://github.com/apache/spark/pull/39666] > Handle null string values in AccumulableInfo and ProcessSummary > --- > > Key: SPARK-42130 > URL: https://issues.apache.org/jira/browse/SPARK-42130 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.4.0 > > > Use optional string for string fields so that we can serialize/deserialize > null string correctly -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
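The fix described above relies on field presence: a protobuf `optional string` leaves an unset field off the wire, so a null string can be distinguished from an empty one after a round trip. The idea can be illustrated with a minimal, hypothetical dict-based serializer (not Spark's actual protobuf code):

```python
def serialize(record):
    # Presence-based serialization: omit fields whose value is None,
    # the way an unset protobuf `optional string` is omitted on the wire.
    return {k: v for k, v in record.items() if v is not None}

def deserialize(wire, fields):
    # Absent fields come back as None; an explicitly empty string survives.
    return {f: wire.get(f) for f in fields}
```

The round trip keeps `None` and `""` distinct, which is exactly what plain (non-optional) string fields cannot do.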
[jira] [Updated] (SPARK-41916) Address General Fixes
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41916: - Description: We want the distributor to have the ability to run multiple torchrun processes per task if task.gpu.amount > 1. We want to add a check to see if `import torch` doesn't raise an ImportError since the TorchDistributor requires torch. If it raises an ImportError, we will give the user more details. was:We want the distributor to have the ability to run multiple torchrun processes per task if task.gpu.amount > 1. > Address General Fixes > - > > Key: SPARK-41916 > URL: https://issues.apache.org/jira/browse/SPARK-41916 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > We want the distributor to have the ability to run multiple torchrun > processes per task if task.gpu.amount > 1. > We want to add a check to see if `import torch` doesn't raise an ImportError > since the TorchDistributor requires torch. If it raises an ImportError, we > will give the user more details. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
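The `import torch` check described above is a fail-fast dependency guard. A generic sketch (the function name and message wording are illustrative, not the actual TorchDistributor code):

```python
import importlib

def require_module(name, hint):
    # Fail fast with an actionable message when a required dependency is
    # missing, instead of surfacing a bare ImportError mid-run.
    try:
        return importlib.import_module(name)
    except ImportError as e:
        raise ImportError(f"{name} is required but not installed. {hint}") from e
```

For the TorchDistributor case this would be called as `require_module("torch", "Install it with `pip install torch`.")` before any distributed training starts.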
[jira] [Updated] (SPARK-41916) Address General Fizes
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41916: - Summary: Address General Fizes (was: Address `spark.task.resource.gpu.amount > 1`) > Address General Fizes > - > > Key: SPARK-41916 > URL: https://issues.apache.org/jira/browse/SPARK-41916 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > We want the distributor to have the ability to run multiple torchrun > processes per task if task.gpu.amount > 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41776) Implement support for PyTorch Lightning
[ https://issues.apache.org/jira/browse/SPARK-41776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani resolved SPARK-41776. -- Resolution: Fixed Not needed, since we are now using `torch.distributed.run` > Implement support for PyTorch Lightning > --- > > Key: SPARK-41776 > URL: https://issues.apache.org/jira/browse/SPARK-41776 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > This requires us to just call train() on each spark task separately without > much preprocessing or postprocessing because PyTorch Lightning handles that > by itself. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41916) Address `spark.task.resource.gpu.amount > 1`
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41916: - Description: We want the distributor to have the ability to run multiple torchrun processes per task if task.gpu.amount > 1. (was: We want the distributor to have the ability to run multiple torchrun processes per task if task.gpu.amount > 1 + address formatting comments on https://github.com/apache/spark/pull/39188#discussion_r1068903058) > Address `spark.task.resource.gpu.amount > 1` > > > Key: SPARK-41916 > URL: https://issues.apache.org/jira/browse/SPARK-41916 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > We want the distributor to have the ability to run multiple torchrun > processes per task if task.gpu.amount > 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41916) Address General Fixes
[ https://issues.apache.org/jira/browse/SPARK-41916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rithwik Ediga Lakhamsani updated SPARK-41916: - Summary: Address General Fixes (was: Address General Fizes) > Address General Fixes > - > > Key: SPARK-41916 > URL: https://issues.apache.org/jira/browse/SPARK-41916 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.4.0 >Reporter: Rithwik Ediga Lakhamsani >Priority: Major > > We want the distributor to have the ability to run multiple torchrun > processes per task if task.gpu.amount > 1. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16484) Incremental Cardinality estimation operations with Hyperloglog
[ https://issues.apache.org/jira/browse/SPARK-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16484: Assignee: (was: Apache Spark) > Incremental Cardinality estimation operations with Hyperloglog > -- > > Key: SPARK-16484 > URL: https://issues.apache.org/jira/browse/SPARK-16484 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Yongjia Wang > Priority: Major > Labels: bulk-closed > > Efficient cardinality estimation is very important, and SparkSQL has had > approxCountDistinct based on Hyperloglog for quite some time. However, there > isn't a way to do incremental estimation. For example, if we want to get > updated distinct counts of the last 90 days, we need to redo the aggregation > for the entire window over and over again. The more efficient way is to > serialize the counter for smaller time windows (such as hourly) so the > counts can be updated incrementally for any time window. > With the support of custom UDAFs, the Binary DataType, and the HyperloglogPlusPlus > implementation in the current Spark version, it is easy enough to extend the > functionality to include incremental counting, and even other general set > operations such as intersection and set difference. The Spark API is already as > elegant as it can be, but it still takes quite some effort to do a custom > implementation of the aforementioned operations, which are likely to be in > high demand. I have searched but failed to find a usable existing > solution or any ongoing effort for this. The closest I found is the following, > but it does not work with Spark 1.6 due to API changes. > https://github.com/collectivemedia/spark-hyperloglog/blob/master/src/main/scala/org/apache/spark/sql/hyperloglog/aggregates.scala > I wonder whether it is worth integrating such operations into SparkSQL. The only > problem I see is that it depends on the serialization of a specific HLL implementation > and introduces compatibility issues. But as long as the user is aware of this > issue, it should be fine. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
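The incremental scheme described above rests on one property: sketches are mergeable, so a sketch can be stored per small window (e.g. hourly) and combined by register-wise max to answer any larger window without re-scanning the data. A toy register sketch illustrates this (this is not Spark's HyperLogLogPlusPlus; register count and hashing are arbitrary choices for the demo, and the estimation formula is omitted):

```python
import hashlib

NUM_REGISTERS = 256  # toy size; a real HLL picks this from the target precision

def sketch(items):
    regs = [0] * NUM_REGISTERS
    for item in items:
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        idx = h % NUM_REGISTERS           # which register this item updates
        rest = h // NUM_REGISTERS
        # rank = 1 + number of trailing zero bits in the remaining hash
        rank = 1
        while rest % 2 == 0 and rank < 56:
            rest //= 2
            rank += 1
        regs[idx] = max(regs[idx], rank)  # each register keeps its max rank
    return regs

def merge(a, b):
    # Register-wise max: merge(sketch(A), sketch(B)) == sketch(A | B).
    # This is what makes per-hour sketches composable into any time window.
    return [max(x, y) for x, y in zip(a, b)]
```

Because `max` is idempotent and associative, merging also handles overlapping windows correctly, which is why set-union semantics (and, with more work, intersection via inclusion-exclusion) fall out of the same representation.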
[jira] [Commented] (SPARK-16484) Incremental Cardinality estimation operations with Hyperloglog
[ https://issues.apache.org/jira/browse/SPARK-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679252#comment-17679252 ] Apache Spark commented on SPARK-16484: -- User 'RyanBerti' has created a pull request for this issue: https://github.com/apache/spark/pull/39678 > Incremental Cardinality estimation operations with Hyperloglog > -- > > Key: SPARK-16484 > URL: https://issues.apache.org/jira/browse/SPARK-16484 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Yongjia Wang > Priority: Major > Labels: bulk-closed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-16484) Incremental Cardinality estimation operations with Hyperloglog
[ https://issues.apache.org/jira/browse/SPARK-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16484: Assignee: Apache Spark > Incremental Cardinality estimation operations with Hyperloglog > -- > > Key: SPARK-16484 > URL: https://issues.apache.org/jira/browse/SPARK-16484 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Yongjia Wang > Assignee: Apache Spark > Priority: Major > Labels: bulk-closed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40303) The performance will be worse after codegen
[ https://issues.apache.org/jira/browse/SPARK-40303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-40303: - Assignee: Yuming Wang > The performance will be worse after codegen > --- > > Key: SPARK-40303 > URL: https://issues.apache.org/jira/browse/SPARK-40303 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Attachments: TestApiBenchmark.scala, TestApis.java, > TestParameters.java > > > {code:scala} > import org.apache.spark.benchmark.Benchmark > val dir = "/tmp/spark/benchmark" > val N = 200 > val columns = Range(0, 100).map(i => s"id % $i AS id$i") > spark.range(N).selectExpr(columns: _*).write.mode("Overwrite").parquet(dir) > // Seq(1, 2, 5, 10, 15, 25, 40, 60, 100) > Seq(60).foreach{ cnt => > val selectExps = columns.take(cnt).map(_.split(" ").last).map(c => > s"count(distinct $c)") > val benchmark = new Benchmark("Benchmark count distinct", N, minNumIters = > 1) > benchmark.addCase(s"$cnt count distinct with codegen") { _ => > withSQLConf( > "spark.sql.codegen.wholeStage" -> "true", > "spark.sql.codegen.factoryMode" -> "FALLBACK") { > spark.read.parquet(dir).selectExpr(selectExps: > _*).write.format("noop").mode("Overwrite").save() > } > } > benchmark.addCase(s"$cnt count distinct without codegen") { _ => > withSQLConf( > "spark.sql.codegen.wholeStage" -> "false", > "spark.sql.codegen.factoryMode" -> "NO_CODEGEN") { > spark.read.parquet(dir).selectExpr(selectExps: > _*).write.format("noop").mode("Overwrite").save() > } > } > benchmark.run() > } > {code} > {noformat} > Java HotSpot(TM) 64-Bit Server VM 1.8.0_281-b09 on Mac OS X 10.15.7 > Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz > Benchmark count distinct: Best Time(ms) Avg Time(ms) > Stdev(ms)Rate(M/s) Per Row(ns) Relative > > 60 count distinct with codegen 628146 628146 >0 0.0 314072.8 1.0X > 60 count distinct without codegen147635 147635 >0 0.0 73817.5 4.3X > 
{noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
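The report above times the same query under two configurations and reports the best time of each plus a relative factor, as the `Relative` column shows. The pattern of such an A/B harness can be sketched generically (a hypothetical helper, not Spark's `Benchmark` class):

```python
import time

def benchmark(cases, num_iters=3):
    # Run each named thunk several times and keep the best wall-clock time,
    # then report each case relative to the first (baseline) case, mirroring
    # the "Best Time" and "Relative" columns of the Spark benchmark output.
    best = {}
    for name, fn in cases.items():
        times = []
        for _ in range(num_iters):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        best[name] = min(times)
    baseline = next(iter(best.values()))
    return {name: (t, baseline / t) for name, t in best.items()}
```

A call like `benchmark({"with codegen": run_with_codegen, "without codegen": run_without_codegen})` (the thunks are assumptions here) would give the baseline a relative factor of 1.0 and a factor above 1.0 to any faster case, matching the 1.0X/4.3X pattern in the results above.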
[jira] [Resolved] (SPARK-40303) The performance will be worse after codegen
[ https://issues.apache.org/jira/browse/SPARK-40303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-40303. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 39671 [https://github.com/apache/spark/pull/39671] > The performance will be worse after codegen > --- > > Key: SPARK-40303 > URL: https://issues.apache.org/jira/browse/SPARK-40303 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.0 > Reporter: Yuming Wang > Assignee: Yuming Wang > Priority: Major > Fix For: 3.4.0 > > Attachments: TestApiBenchmark.scala, TestApis.java, > TestParameters.java -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42136) Refactor BroadcastHashJoinExec output partitioning generation
[ https://issues.apache.org/jira/browse/SPARK-42136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679220#comment-17679220 ] Apache Spark commented on SPARK-42136: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/38038 > Refactor BroadcastHashJoinExec output partitioning generation > - > > Key: SPARK-42136 > URL: https://issues.apache.org/jira/browse/SPARK-42136 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42043) Basic Scala Client Result Implementation
[ https://issues.apache.org/jira/browse/SPARK-42043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679219#comment-17679219 ] Apache Spark commented on SPARK-42043: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/39677 > Basic Scala Client Result Implementation > - > > Key: SPARK-42043 > URL: https://issues.apache.org/jira/browse/SPARK-42043 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Adding the basic scala client Result implementation. Add some tests to verify > the result can be received correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42136) Refactor BroadcastHashJoinExec output partitioning generation
[ https://issues.apache.org/jira/browse/SPARK-42136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42136: Assignee: Apache Spark > Refactor BroadcastHashJoinExec output partitioning generation > - > > Key: SPARK-42136 > URL: https://issues.apache.org/jira/browse/SPARK-42136 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42136) Refactor BroadcastHashJoinExec output partitioning generation
[ https://issues.apache.org/jira/browse/SPARK-42136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42136: Assignee: (was: Apache Spark) > Refactor BroadcastHashJoinExec output partitioning generation > - > > Key: SPARK-42136 > URL: https://issues.apache.org/jira/browse/SPARK-42136 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42136) Refactor BroadcastHashJoinExec output partitioning generation
Peter Toth created SPARK-42136: -- Summary: Refactor BroadcastHashJoinExec output partitioning generation Key: SPARK-42136 URL: https://issues.apache.org/jira/browse/SPARK-42136 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Peter Toth -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42135) Scala Client Proper logging for the client
Zhen Li created SPARK-42135: --- Summary: Scala Client Proper logging for the client Key: SPARK-42135 URL: https://issues.apache.org/jira/browse/SPARK-42135 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Introduce proper logging for the client and change [https://github.com/apache/spark/pull/39541/files/2a589543bdec80f4cf806af0a8566d2de8c04140#r1082062813] to use the client logging. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
[ https://issues.apache.org/jira/browse/SPARK-42134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42134: Assignee: Apache Spark > Fix getPartitionFiltersAndDataFilters() to handle filters without referenced > attributes > --- > > Key: SPARK-42134 > URL: https://issues.apache.org/jira/browse/SPARK-42134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
[ https://issues.apache.org/jira/browse/SPARK-42134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679156#comment-17679156 ] Apache Spark commented on SPARK-42134: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/39676 > Fix getPartitionFiltersAndDataFilters() to handle filters without referenced > attributes > --- > > Key: SPARK-42134 > URL: https://issues.apache.org/jira/browse/SPARK-42134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
[ https://issues.apache.org/jira/browse/SPARK-42134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42134: Assignee: (was: Apache Spark) > Fix getPartitionFiltersAndDataFilters() to handle filters without referenced > attributes > --- > > Key: SPARK-42134 > URL: https://issues.apache.org/jira/browse/SPARK-42134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Peter Toth >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
Peter Toth created SPARK-42134: -- Summary: Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes Key: SPARK-42134 URL: https://issues.apache.org/jira/browse/SPARK-42134 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Peter Toth -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
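For context on the class of bug SPARK-42134 targets: when splitting scan predicates into partition filters (usable for partition pruning) and data filters, a predicate that references no attributes at all (e.g. {{rand() < 0.5}}) has an empty reference set and must not be misclassified as a partition filter. Below is a minimal Python sketch of that splitting rule; the `Filter` model and function name are hypothetical, and Spark's actual `getPartitionFiltersAndDataFilters()` is implemented in Scala over Catalyst expressions.

```python
from dataclasses import dataclass

@dataclass
class Filter:
    # Hypothetical, simplified stand-in for a Catalyst predicate:
    # the attributes it references and whether it is deterministic.
    expr: str
    references: frozenset
    deterministic: bool

def split_filters(filters, partition_cols):
    """Split predicates into (partition_filters, data_filters).

    A predicate qualifies for partition pruning only if it is
    deterministic, references at least one attribute, and every
    referenced attribute is a partition column. A predicate with an
    empty reference set (such as rand() < 0.5) falls through to the
    data filters.
    """
    partition_filters, data_filters = [], []
    for f in filters:
        if f.deterministic and f.references and f.references <= partition_cols:
            partition_filters.append(f)
        else:
            data_filters.append(f)
    return partition_filters, data_filters
```

The key guard is the `f.references` non-emptiness check: without it, a no-reference predicate vacuously satisfies "all references are partition columns" and gets pushed into partition pruning, which is the misbehavior this issue addresses.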
[jira] [Assigned] (SPARK-42129) Upgrade rocksdbjni to 7.9.2
[ https://issues.apache.org/jira/browse/SPARK-42129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42129: - Assignee: Yang Jie > Upgrade rocksdbjni to 7.9.2 > --- > > Key: SPARK-42129 > URL: https://issues.apache.org/jira/browse/SPARK-42129 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > https://github.com/facebook/rocksdb/releases/tag/v7.9.2 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org