[jira] [Created] (SPARK-33733) PullOutNondeterministic should check and collect deterministic field
ulysses you created SPARK-33733: --- Summary: PullOutNondeterministic should check and collect deterministic field Key: SPARK-33733 URL: https://issues.apache.org/jira/browse/SPARK-33733 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: ulysses you The `deterministic` field is wider than `NonDeterministic`; we should keep the same range between pull-out and check analysis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
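The distinction the ticket draws can be illustrated outside Catalyst. Below is a minimal Python sketch (the classes are invented for illustration and are not Spark's actual Scala `Expression` hierarchy): an expression can be non-deterministic without being an instance of a `Nondeterministic` class, because the `deterministic` flag propagates up from children — which is why a rule should check the field rather than the type.

```python
class Expression:
    """Illustrative stand-in for a Catalyst-like expression node (not Spark code)."""
    def __init__(self, children=()):
        self.children = list(children)

    @property
    def deterministic(self):
        # The `deterministic` field is the "wider" property: it is false
        # whenever any child is non-deterministic, not only on
        # Nondeterministic nodes themselves.
        return all(c.deterministic for c in self.children)

class Nondeterministic(Expression):
    # Shadows the property: leaves of this class are never deterministic.
    deterministic = False

class Add(Expression):
    """A deterministic operator; its determinism depends on its children."""

rand = Nondeterministic()
expr = Add(children=[rand])   # e.g. rand() + 1

# expr is NOT a Nondeterministic instance, yet it is non-deterministic,
# so matching only isinstance(..., Nondeterministic) would miss it.
print(isinstance(expr, Nondeterministic), expr.deterministic)  # False False
```

Keeping pull-out and check-analysis on the same `deterministic` field avoids the two rules disagreeing about expressions like `expr` above.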
[jira] [Resolved] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+
[ https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33732. --- Fix Version/s: 3.1.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/30700 > Kubernetes integration tests don't work with Minikube 1.9+ > > > Key: SPARK-33732 > URL: https://issues.apache.org/jira/browse/SPARK-33732 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.1.0 > > > Kubernetes integration tests don't work with Minikube 1.9+. > This is because the location of apiserver.crt and apiserver.key has changed.
[jira] [Resolved] (SPARK-33714) Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33714. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30676 [https://github.com/apache/spark/pull/30676] > Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework > -- > > Key: SPARK-33714 > URL: https://issues.apache.org/jira/browse/SPARK-33714 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.2.0 > > > Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework
[jira] [Assigned] (SPARK-33714) Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33714: --- Assignee: Terry Kim > Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework > -- > > Key: SPARK-33714 > URL: https://issues.apache.org/jira/browse/SPARK-33714 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > > Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework
[jira] [Resolved] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33558. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30685 [https://github.com/apache/spark/pull/30685] > Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests > -- > > Key: SPARK-33558 > URL: https://issues.apache.org/jira/browse/SPARK-33558 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > Extract ALTER TABLE .. ADD PARTITION tests to a common place to run them > for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific > test suites.
[jira] [Assigned] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33558: --- Assignee: Maxim Gekk > Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests > -- > > Key: SPARK-33558 > URL: https://issues.apache.org/jira/browse/SPARK-33558 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. ADD PARTITION tests to a common place to run them > for v1 and v2 datasources. Some tests can be placed in v1- and v2-specific > test suites.
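The unification described in SPARK-33558 follows a standard pattern: shared test methods live in one reusable suite and run unchanged against each backend. A hedged Python sketch using `unittest` (the `V1Backend`/`V2Backend` names are invented for illustration; Spark's actual shared suites are Scala traits):

```python
import unittest

class AddPartitionTestsMixin:
    """Shared tests, run unchanged against each datasource backend."""
    def test_add_partition(self):
        backend = self.make_backend()
        backend.add_partition("p=1")
        self.assertIn("p=1", backend.partitions)

class V1Backend:
    """Toy stand-in for a v1 datasource catalog (illustrative only)."""
    def __init__(self):
        self.partitions = []
    def add_partition(self, spec):
        self.partitions.append(spec)

class V2Backend(V1Backend):
    """Toy stand-in for a v2 datasource catalog (illustrative only)."""

# Each concrete suite only supplies the backend; the tests are inherited.
class V1AddPartitionSuite(AddPartitionTestsMixin, unittest.TestCase):
    def make_backend(self):
        return V1Backend()

class V2AddPartitionSuite(AddPartitionTestsMixin, unittest.TestCase):
    def make_backend(self):
        return V2Backend()
```

Backend-specific behavior then goes into extra methods on the concrete suites, matching the ticket's note that some tests stay in v1- or v2-only suites.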
[jira] [Resolved] (SPARK-33724) Allow decommissioning script location to be configured
[ https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33724. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30694 [https://github.com/apache/spark/pull/30694] > Allow decommissioning script location to be configured > -- > > Key: SPARK-33724 > URL: https://issues.apache.org/jira/browse/SPARK-33724 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Trivial > Fix For: 3.2.0 > > > Some people don't use the Spark image tool and instead do custom volume > mounts to make Spark available. As such, the hard-coded path does not work > well for them.
[jira] [Commented] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+
[ https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247009#comment-17247009 ] Apache Spark commented on SPARK-33732: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/30700 > Kubernetes integration tests don't work with Minikube 1.9+ > > > Key: SPARK-33732 > URL: https://issues.apache.org/jira/browse/SPARK-33732 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Kubernetes integration tests don't work with Minikube 1.9+. > This is because the location of apiserver.crt and apiserver.key has changed.
[jira] [Assigned] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+
[ https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33732: Assignee: Kousuke Saruta (was: Apache Spark) > Kubernetes integration tests don't work with Minikube 1.9+ > > > Key: SPARK-33732 > URL: https://issues.apache.org/jira/browse/SPARK-33732 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Kubernetes integration tests don't work with Minikube 1.9+. > This is because the location of apiserver.crt and apiserver.key has changed.
[jira] [Assigned] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+
[ https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33732: Assignee: Apache Spark (was: Kousuke Saruta) > Kubernetes integration tests don't work with Minikube 1.9+ > > > Key: SPARK-33732 > URL: https://issues.apache.org/jira/browse/SPARK-33732 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > Kubernetes integration tests don't work with Minikube 1.9+. > This is because the location of apiserver.crt and apiserver.key has changed.
[jira] [Commented] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+
[ https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247008#comment-17247008 ] Apache Spark commented on SPARK-33732: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/30700 > Kubernetes integration tests don't work with Minikube 1.9+ > > > Key: SPARK-33732 > URL: https://issues.apache.org/jira/browse/SPARK-33732 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Kubernetes integration tests don't work with Minikube 1.9+. > This is because the location of apiserver.crt and apiserver.key has changed.
[jira] [Created] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+
Kousuke Saruta created SPARK-33732: -- Summary: Kubernetes integration tests don't work with Minikube 1.9+ Key: SPARK-33732 URL: https://issues.apache.org/jira/browse/SPARK-33732 Project: Spark Issue Type: Improvement Components: Kubernetes, Tests Affects Versions: 3.1.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta Kubernetes integration tests don't work with Minikube 1.9+. This is because the location of apiserver.crt and apiserver.key has changed.
[jira] [Updated] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33730: - Description: We should use warnings properly per [https://docs.python.org/3/library/warnings.html#warning-categories] In particular, - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the places we should show the warnings to end-users by default. - we should __maybe__ think about customizing stacklevel ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas does. - ... Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code:java} pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict and value is not None. 
value will be ignored.") pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use approx_count_distinct instead.", DeprecationWarning) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/functions.py:warnings.warn( pyspark/sql/pandas/group_ops.py:warnings.warn( pyspark/sql/session.py:warnings.warn("Fall back to non-hive support because failing to access HiveConf, " {code} PySpark prints warnings via {{print}} in some places as well. We should also see if we should switch to {{warnings.warn}}. was: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the places we should show the warnings to end-users by default. - we should __maybe__ think about customizing stacklevel (https://docs.python.org/3/library/warnings.html#warnings.warn) like pandas does. - ... Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code} pyspark/cloudpickle/cloudpickle.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. 
The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn(
[jira] [Updated] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33730: - Description: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the places we should show the warnings to end-users by default. - we should __maybe__ think about customizing stacklevel (https://docs.python.org/3/library/warnings.html#warnings.warn) like pandas does. - ... Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code} pyspark/cloudpickle/cloudpickle.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict and value is not None. 
value will be ignored.") pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use approx_count_distinct instead.", DeprecationWarning) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/functions.py:warnings.warn( pyspark/sql/pandas/group_ops.py:warnings.warn( pyspark/sql/session.py:warnings.warn("Fall back to non-hive support because failing to access HiveConf, " {code} PySpark prints warnings via {{print}} in some places as well. We should also see if we should switch to {{warnings.warn}}. was: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the places we should show the warnings to end-users by default. Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code} pyspark/cloudpickle/cloudpickle.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. 
The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn(
[jira] [Updated] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33730: - Description: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the places we should show the warnings to end-users by default. Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code} pyspark/cloudpickle/cloudpickle.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict and value is not None. 
value will be ignored.") pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use approx_count_distinct instead.", DeprecationWarning) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/functions.py:warnings.warn( pyspark/sql/pandas/group_ops.py:warnings.warn( pyspark/sql/session.py:warnings.warn("Fall back to non-hive support because failing to access HiveConf, " {code} PySpark prints warnings via {{print}} in some places as well. We should also see if we should switch to {{warnings.warn}}. was: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, we should use {{FutureWarning}} instead of {{DeprecationWarning}} if we aim to show the warnings by default. Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code} pyspark/cloudpickle/cloudpickle.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. 
The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn(
[jira] [Updated] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33730: - Description: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, we should use {{FutureWarning}} instead of {{DeprecationWarning}} if we aim to show the warnings by default. Current warnings are a bit messy and somewhat arbitrary. To be more explicit, we'll have to fix: {code} pyspark/cloudpickle/cloudpickle.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/context.py:warnings.warn( pyspark/ml/classification.py:warnings.warn("weightCol is ignored, " pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be removed in future versions. Use " pyspark/mllib/classification.py:warnings.warn( pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd are false. The model does nothing.") pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/mllib/regression.py:warnings.warn( pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " pyspark/rdd.py:warnings.warn( pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") pyspark/shuffle.py:warnings.warn("Please install psutil to have better " pyspark/sql/catalog.py:warnings.warn( pyspark/sql/catalog.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/column.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/context.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn( pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict and value is not None. 
value will be ignored.") pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians instead.", DeprecationWarning) pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use approx_count_distinct instead.", DeprecationWarning) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/conversion.py:warnings.warn(msg) pyspark/sql/pandas/functions.py:warnings.warn( pyspark/sql/pandas/group_ops.py:warnings.warn( pyspark/sql/session.py:warnings.warn("Fall back to non-hive support because failing to access HiveConf, " {code} PySpark prints warnings via {{print}} in some places as well. We should also see if we should switch to {{warnings.warn}}. was: We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, we should use {{FutureWarning}} instead of {{DeprecationWarning}} if we aim to show the warnings by default. Current warnings are a bit messy and somewhat arbitrary. > Standardize warning types > - > > Key: SPARK-33730 > URL: https://issues.apache.org/jira/browse/SPARK-33730 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > We should use warnings properly per > https://docs.python.org/3/library/warnings.html#warning-categories > In particular, we should use {{FutureWarning}} instead of > {{DeprecationWarning}} if we aim to show the warnings by default. > Current warnings are a bit messy and somewhat arbitrary. 
> To be more explicit, we'll have to fix: > {code} > pyspark/cloudpickle/cloudpickle.py:warnings.warn( > pyspark/context.py:warnings.warn( > pyspark/context.py:warnings.warn( > pyspark/ml/classification.py:warnings.warn("weightCol is > ignored, " > pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will > be removed in future versions. Use " > pyspark/mllib/classification.py:warnings.warn( > pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd > are false. The model does nothing.") > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " > pyspark/rdd.py:
[jira] [Commented] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246997#comment-17246997 ] Hyukjin Kwon commented on SPARK-33730: -- [~zero323] would you be interested in this? > Standardize warning types > - > > Key: SPARK-33730 > URL: https://issues.apache.org/jira/browse/SPARK-33730 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Major > > We should use warnings properly per > https://docs.python.org/3/library/warnings.html#warning-categories > In particular, we should use {{FutureWarning}} instead of > {{DeprecationWarning}} if we aim to show the warnings by default. > Current warnings are a bit messy and somewhat arbitrary. > To be more explicit, we'll have to fix: > {code} > pyspark/cloudpickle/cloudpickle.py:warnings.warn( > pyspark/context.py:warnings.warn( > pyspark/context.py:warnings.warn( > pyspark/ml/classification.py:warnings.warn("weightCol is > ignored, " > pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will > be removed in future versions. Use " > pyspark/mllib/classification.py:warnings.warn( > pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd > are false. 
The model does nothing.") > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " > pyspark/rdd.py:warnings.warn( > pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") > pyspark/shuffle.py:warnings.warn("Please install psutil to have > better " > pyspark/sql/catalog.py:warnings.warn( > pyspark/sql/catalog.py:warnings.warn( > pyspark/sql/column.py:warnings.warn( > pyspark/sql/column.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/dataframe.py:warnings.warn( > pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict > and value is not None. value will be ignored.") > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees > instead.", DeprecationWarning) > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians > instead.", DeprecationWarning) > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use > approx_count_distinct instead.", DeprecationWarning) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/functions.py:warnings.warn( > pyspark/sql/pandas/group_ops.py:warnings.warn( > pyspark/sql/session.py:warnings.warn("Fall back to non-hive > support because failing to access HiveConf, " > {code} > PySpark prints warnings via {{print}} in some places as well. We should > also see if we should switch to {{warnings.warn}}. 
[jira] [Created] (SPARK-33731) Standardize exception types
Hyukjin Kwon created SPARK-33731: Summary: Standardize exception types Key: SPARK-33731 URL: https://issues.apache.org/jira/browse/SPARK-33731 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.1.0 Reporter: Hyukjin Kwon We should: - have a better hierarchy for exception types - or at least use the default exception types correctly instead of throwing a plain Exception.
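A sketch of what such a hierarchy could look like (all class names here are hypothetical, not an actual PySpark API): a common base class lets callers catch everything from the library at once, while mixing in built-in types keeps generic handlers working:

```python
# Hypothetical exception hierarchy; names are illustrative only.
class PySparkError(Exception):
    """Base class so callers can catch all library errors in one clause."""

class AnalysisError(PySparkError):
    """Raised when a query cannot be analyzed."""

class IllegalArgumentError(PySparkError, ValueError):
    """Bad user input: also a ValueError, so generic handlers still match."""

def require_positive(n: int) -> int:
    if n <= 0:
        # A typed error instead of `raise Exception("...")`
        raise IllegalArgumentError(f"expected a positive value, got {n}")
    return n
```

With this shape, `except PySparkError:` catches library errors without swallowing unrelated bugs the way `except Exception:` does.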
[jira] [Created] (SPARK-33730) Standardize warning types
Hyukjin Kwon created SPARK-33730: Summary: Standardize warning types Key: SPARK-33730 URL: https://issues.apache.org/jira/browse/SPARK-33730 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.1.0 Reporter: Hyukjin Kwon We should use warnings properly per https://docs.python.org/3/library/warnings.html#warning-categories In particular, we should use {{FutureWarning}} instead of {{DeprecationWarning}} if we aim to show the warnings by default. Current warnings are a bit messy and somewhat arbitrary.
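Per the linked warning-categories docs, {{FutureWarning}} is intended for end users and is shown by default, while {{DeprecationWarning}} targets developers and is hidden by default outside {{__main__}} code. A minimal sketch of the proposed convention (the function name is illustrative):

```python
import warnings

def deprecated_api():
    # FutureWarning: visible to end users by default.
    # DeprecationWarning would be hidden by default outside __main__,
    # which is why it is the wrong category for user-facing deprecations.
    warnings.warn(
        "deprecated_api is deprecated; use its replacement instead.",
        FutureWarning,
        stacklevel=2,
    )
    return 42
```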
[jira] [Assigned] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33727: Assignee: Holden Karau > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Holden Karau >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
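Independent of the image-build change in the linked PRs, the failure mode (one keyserver intermittently returning an error) suggests a fallback pattern like the sketch below. The server names and the {{fetch}} callable are illustrative, not the actual fix; in a Dockerfile, {{fetch}} would wrap {{apt-key adv --keyserver <server> --recv-key <key>}}:

```python
def fetch_with_fallback(fetch, key, servers):
    """Try each keyserver in order and return the first that succeeds.

    `fetch` is any callable(server, key) -> bool reporting success.
    """
    for server in servers:
        if fetch(server, key):
            return server
    raise RuntimeError("all keyservers failed for key " + key)
```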
[jira] [Resolved] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33727. -- Fix Version/s: 3.0.2 3.1.0 Resolution: Fixed Issue resolved by pull request 30696 [https://github.com/apache/spark/pull/30696] > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Holden Karau >Priority: Major > Fix For: 3.1.0, 3.0.2 > > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
[jira] [Updated] (SPARK-33721) Support to use Hive build-in functions by configuration
[ https://issues.apache.org/jira/browse/SPARK-33721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenliang updated SPARK-33721: -- Description: Hive and Spark SQL engines have many differences in built-in functions. The differences between several functions are shown below:
||*built-in functions*||SQL||result of Hive SQL||result of Spark SQL||
|unix_timestamp|{{select unix_timestamp(concat('2020-06-01', ' 24:00:00'));}}|1591027200|NULL|
|to_date|{{select to_date('-00-00');}}|0002-11-30|NULL|
|datediff|{{select datediff(CURRENT_DATE, '-00-00');}}|737986|NULL|
|collect_set|{{select c1, c2, concat_ws('##', collect_set(c3)) c3_set from bigdata_offline.test_collect_set group by c1, c2;}} where bigdata_offline.test_collect_set contains (1, 1, '1'), (1, 1, '2'), (1, 1, '3'), (1, 1, '4'), (1, 1, '5')|{{1 1 2##3##4##5##1}}|{{1 1 3##1##2##5##4}}|
There is no conclusion on which engine is more accurate. Users prefer to be able to make choices according to their real production environment. I think we should do some improvement for this.
Hive version is 1.2.1 was: Hive and Spark SQL engines have many differences in built-in functions.The differences between several functions are shown below: ||*build-in functions*||SQL|| result of Hive SQL ||result of Spark SQL|| |unix_timestamp|{{select}} {{unix_timestamp(concat(}}{{'2020-06-01'}}{{, }}{{' 24:00:00'}}{{));}}|1591027200| NULL| |to_date|{{select}} {{to_date(}}{{'-00-00'}}{{);}}|0002-11-30| NULL| |datediff|{{select }}{{datediff(}}{{CURRENT_DATE}}{{, }}{{'-00-00'}}{{);}}|737986| NULL| |collect_set|{{select}}{{c1}}{{,c2}}{{,concat_ws(}}{{'##'}}{{, collect_set(c3)) c3_set }}{{from}}{{bigdata_offline.test_collect_set }}{{group }}{{by }}{{c1, c2;}} {{bigdata_offline.test_collect_set contains data:}} {{(1, 1, }}{{'1'}}{{),}}{{(1, 1, }}{{'2'}}{{)}}{{,}} {{(1, 1, }}{{'3'}}{{)}}{{,}}{{(1, 1, }}{{'4'}}{{)}}{{,}} {{(1, 1, }}{{'5'}}{{)}}|{{c1 c2 c3_set}} {{1 1 2##3##4##5##1}}|{{c1 c2 c3_set}} {{1 1 3##1##2##5##4}}| There is no conclusion on which engine is more accurate. Users prefer to be able to make choices according to their real production environment. I think we should do some improvement for this. 
Hive version is 1.2.1 > Support to use Hive build-in functions by configuration > --- > > Key: SPARK-33721 > URL: https://issues.apache.org/jira/browse/SPARK-33721 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3, 3.2.0 >Reporter: chenliang >Priority: Major > > Hive and Spark SQL engines have many differences in built-in functions.The > differences between several functions are shown below: > ||*build-in functions*||SQL|| result of Hive SQL ||result of Spark SQL|| > |unix_timestamp|{{select}} {{unix_timestamp(concat(}}{{'2020-06-01'}}{{, > }}{{' 24:00:00'}}{{));}}|1591027200| NULL| > |to_date|{{select}} {{to_date(}}{{'-00-00'}}{{);}}|0002-11-30| NULL| > |datediff|{{select }}{{datediff(}}{{CURRENT_DATE}}{{, > }}{{'-00-00'}}{{);}}|737986| NULL| > |collect_set|{{select}}{{c1}}{{,c2}}{{,concat_ws(}}{{'##'}}{{, > collect_set(c3)) c3_set }}{{from }}{{bigdata_offline.test_collect_set > }}{{group }}{{by }}{{c1, c2;}} > {{bigdata_offline.test_collect_set contains data:}} > {{(1, 1, }}{{'1'}}{{),}}{{(1, 1, }}{{'2'}}{{)}}{{,}} > {{(1, 1, }}{{'3'}}{{)}}{{,}}{{(1, 1, }}{{'4'}}{{)}}{{,}} > {{(1, 1, }}{{'5'}}{{)}}|{{c1 c2 c3_set}} > {{1 1 2##3##4##5##1}}|{{c1 c2 c3_set}} > {{1 1 3##1##2##5##4}}| > There is no conclusion on which engine is more accurate. Users prefer to be > able to make choices according to their real production environment. > I think we should do some improvement for this. > > Hive version is 1.2.1 >
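For context on the {{unix_timestamp}} row: the divergence comes down to lenient versus strict time parsing. Hive rolls {{' 24:00:00'}} over into midnight of the next day, while Spark rejects the out-of-range hour and returns NULL. Plain Python's strict parser behaves like Spark, and the Hive-style rollover can be reproduced with a timedelta:

```python
from datetime import datetime, timedelta

# Strict parsing (Spark-like): hour 24 is out of range, parsing fails -> NULL.
try:
    datetime.strptime("2020-06-01 24:00:00", "%Y-%m-%d %H:%M:%S")
    strict_result = "parsed"
except ValueError:
    strict_result = None

# Lenient rollover (Hive-like): 24:00:00 is treated as +24h from midnight,
# i.e. midnight of the following day.
lenient_result = datetime(2020, 6, 1) + timedelta(hours=24)
```

Hive's 1591027200 corresponds to 2020-06-02 00:00:00 in a UTC+8 session timezone, which is consistent with this rollover (assuming the reporter ran in such a timezone).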
[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33725: -- Fix Version/s: 2.4.8 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 2.4.8, 3.1.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found.
[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33725: -- Fix Version/s: (was: 3.2.0) 3.1.0 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.1.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found.
[jira] [Reopened] (SPARK-22769) When driver stopping, there is errors: "Could not find CoarseGrainedScheduler" and "RpcEnv already stopped"
[ https://issues.apache.org/jira/browse/SPARK-22769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Su Qilong reopened SPARK-22769: --- The original reporter gave up work on this; I have made a new change for it. > When driver stopping, there is errors: "Could not find > CoarseGrainedScheduler" and "RpcEnv already stopped" > --- > > Key: SPARK-22769 > URL: https://issues.apache.org/jira/browse/SPARK-22769 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1 >Reporter: KaiXinXIaoLei >Priority: Major > > I run "spark-sql --master yarn --num-executors 1000 -f createTable.sql". When > task is finished, there is a error: org.apache.spark.SparkException: Could > not find CoarseGrainedScheduler. I think the log level should be warning, not > error. > {noformat} > 17/12/12 18:30:16 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 17/12/12 18:30:16 ERROR TransportRequestHandler: Error while invoking > RpcHandler#receive() for one-way message. > org.apache.spark.SparkException: Could not find CoarseGrainedScheduler. 
> at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154) > at > org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570) > at > org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346) > at > io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346) > at > io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353) > at > io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346) > at > 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85) > {noformat} > and another error is : > {noformat} > 17/12/12 18:20:44 INFO MemoryStore: MemoryStore cleared > 17/12/12 18:20:44 INFO BlockManager: BlockManager stopped > 17/12/12 18:20:44 INFO BlockManagerMaster: BlockManagerMaster stopped > 17/12/12 18:20:44 ERROR TransportRequestHandler: Error while invoking > RpcHandler#receive() for one-way message. > org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152) > at > org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134) > at > org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570) > at > org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180) > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119) > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > at >
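On the reporter's point that these shutdown-time messages should be WARN rather than ERROR: a race like "RpcEnv already stopped" is expected during normal teardown, so a common pattern is to catch the expected exception and demote it. A small Python sketch of that pattern (the names are illustrative, not Spark code):

```python
import logging

logger = logging.getLogger("rpc")

class RpcEnvStoppedError(Exception):
    """Stand-in for an 'RpcEnv already stopped' condition (name illustrative)."""

def deliver(message, env_stopped):
    # Models posting a one-way message to a dispatcher that may be shut down.
    if env_stopped:
        raise RpcEnvStoppedError("RpcEnv already stopped.")
    return "delivered"

def deliver_one_way(message, env_stopped):
    try:
        return deliver(message, env_stopped)
    except RpcEnvStoppedError as e:
        # Expected during normal shutdown: log at WARNING, do not propagate.
        logger.warning("Dropping one-way message during shutdown: %s", e)
        return None
```

The same race still gets recorded, but an orderly shutdown no longer produces ERROR-level noise.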
[jira] [Comment Edited] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
[ https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246957#comment-17246957 ] GeoffreyStark edited comment on SPARK-23086 at 12/10/20, 1:59 AM: -- In the case I encountered before, I checked later that SPARK was blocked not in the case of high concurrency, but because the NameNode's FoldedTreeset in Hadoop3.x was defective([HDFS-13671|https://issues.apache.org/jira/browse/HDFS-13671]), resulting in extremely unstable RPC, which was the root cause of SPARK blocking:) was (Author: gaofeng6): In the case I encountered before, I checked later that SPark was blocked not in the case of high concurrency, but because the NameNode's FoldedTreeset in HadoOP3.x was defective, resulting in extremely unstable RPC, which was the root cause of SPark blocking:) > Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog > -- > > Key: SPARK-23086 > URL: https://issues.apache.org/jira/browse/SPARK-23086 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 > Environment: * Spark 2.2.1 >Reporter: pin_zhang >Priority: Major > Labels: bulk-closed > > * Hive metastore is mysql > * Set hive.server2.thrift.max.worker.threads=500 > create table test (id string ) partitioned by (index int) stored as > parquet; > insert into test partition (index=1) values('id1'); > * 100 Clients run SQL“select * from table” on table > * Many clients (97%) blocked at HiveExternalCatalog.withClient > * Is synchronized expected when only run query against tables? 
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 > waiting for monitor entry [0x4e19a000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > - waiting to lock <0xc06a3ba8> (a > org.apache.spark.sql.hive.HiveExternalCatalog) > at > org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667) > - locked <0xc41ab748> (a > org.apache.spark.sql.hive.HiveSessionCatalog) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82) > at >
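The thread dump above shows nearly all clients parked on the single {{HiveExternalCatalog}} monitor, so even read-only lookups serialize behind one lock. The effect is easy to reproduce in miniature (illustrative sketch, not Spark code): with one coarse lock around every catalog call, concurrency never exceeds one.

```python
import threading
import time

class CoarseCatalog:
    """Every operation takes one global lock, like a synchronized withClient."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.max_concurrent = 0

    def get_table(self, name):
        with self._lock:              # all lookups serialize here
            self._active += 1
            self.max_concurrent = max(self.max_concurrent, self._active)
            time.sleep(0.01)          # stand-in for a metastore RPC
            self._active -= 1
            return name

catalog = CoarseCatalog()
threads = [threading.Thread(target=catalog.get_table, args=(f"t{i}",))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight lookups run one at a time ({{max_concurrent}} stays 1), matching the 97% of clients blocked at {{withClient}}; a finer-grained or read/write lock would let read-only lookups overlap.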
[jira] [Comment Edited] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
[ https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246958#comment-17246958 ] GeoffreyStark edited comment on SPARK-23086 at 12/10/20, 1:58 AM: -- Sorry, I forgot to say, I changed my nickname, I am the gaofeng in front of the comment section was (Author: gaofeng6): Sorry, I forgot to say, I changed my nickname, I am the Gaofeng in front of the comment section > Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog > -- > > Key: SPARK-23086 > URL: https://issues.apache.org/jira/browse/SPARK-23086 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 > Environment: * Spark 2.2.1 >Reporter: pin_zhang >Priority: Major > Labels: bulk-closed > > * Hive metastore is mysql > * Set hive.server2.thrift.max.worker.threads=500 > create table test (id string ) partitioned by (index int) stored as > parquet; > insert into test partition (index=1) values('id1'); > * 100 Clients run SQL“select * from table” on table > * Many clients (97%) blocked at HiveExternalCatalog.withClient > * Is synchronized expected when only run query against tables? 
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 > waiting for monitor entry [0x4e19a000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > - waiting to lock <0xc06a3ba8> (a > org.apache.spark.sql.hive.HiveExternalCatalog) > at > org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667) > - locked <0xc41ab748> (a > org.apache.spark.sql.hive.HiveSessionCatalog) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > - locked <0xff491c48> (a > org.apache.spark.sql.execution.QueryExecution) > at >
[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
[ https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246958#comment-17246958 ] GeoffreyStark commented on SPARK-23086: --- Sorry, I forgot to say, I changed my nickname, I am the Gaofeng in front of the comment section > Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog > -- > > Key: SPARK-23086 > URL: https://issues.apache.org/jira/browse/SPARK-23086 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.1 > Environment: * Spark 2.2.1 >Reporter: pin_zhang >Priority: Major > Labels: bulk-closed > > * Hive metastore is mysql > * Set hive.server2.thrift.max.worker.threads=500 > create table test (id string ) partitioned by (index int) stored as > parquet; > insert into test partition (index=1) values('id1'); > * 100 Clients run SQL“select * from table” on table > * Many clients (97%) blocked at HiveExternalCatalog.withClient > * Is synchronized expected when only run query against tables? 
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 > waiting for monitor entry [0x4e19a000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > - waiting to lock <0xc06a3ba8> (a > org.apache.spark.sql.hive.HiveExternalCatalog) > at > org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667) > - locked <0xc41ab748> (a > org.apache.spark.sql.hive.HiveSessionCatalog) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74) > at scala.collection.immutable.List.foreach(List.scala:381) > at > org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > - locked <0xff491c48> (a > org.apache.spark.sql.execution.QueryExecution) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) >
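The stack trace above shows 97% of clients blocked waiting for the monitor on the single HiveExternalCatalog object. As a minimal model of that pattern (this is illustrative JVM code, not Spark's actual implementation), a coarse object-level lock serializes even read-only callers, so 100 concurrent clients achieve a concurrency of exactly one inside the guarded section:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal model (not Spark's code) of the bottleneck in the stack trace:
// every metadata lookup funnels through one object monitor, so even
// read-only queries from many clients execute one at a time.
public class CoarseLockDemo {
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger maxActive = new AtomicInteger();
    static final Object catalogLock = new Object(); // stand-in for the shared catalog

    static void withClient(Runnable body) {
        synchronized (catalogLock) { // all callers serialize here
            maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
            try { body.run(); } finally { active.decrementAndGet(); }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int clients = 100;
        CountDownLatch done = new CountDownLatch(clients);
        for (int i = 0; i < clients; i++) {
            new Thread(() -> {
                withClient(() -> {
                    try { Thread.sleep(5); } catch (InterruptedException e) { }
                });
                done.countDown();
            }).start();
        }
        done.await();
        // Despite 100 concurrent clients, at most one was ever inside the lock.
        System.out.println("max concurrency inside withClient: " + maxActive.get());
    }
}
```

This is why raising hive.server2.thrift.max.worker.threads alone does not help: the extra worker threads simply queue on the same monitor.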
[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
[ https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246957#comment-17246957 ] GeoffreyStark commented on SPARK-23086: --- In the case I encountered earlier, I later found that Spark was blocked not because of high concurrency, but because the NameNode's FoldedTreeSet in Hadoop 3.x was defective, resulting in extremely unstable RPC; that was the root cause of the Spark blocking :) > Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog > -- -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246946#comment-17246946 ] Hyukjin Kwon commented on SPARK-33713: -- Yeah, I think it's fine. +1 no worries! > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246945#comment-17246945 ] Hyukjin Kwon commented on SPARK-33727: -- I am facing this error in my local too. Thanks [~dongjoon], [~holden] and [~shaneknapp] for addressing this issue. > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data
[ https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246942#comment-17246942 ] Apache Spark commented on SPARK-33729: -- User 'sunchao' has created a pull request for this issue: https://github.com/apache/spark/pull/30699 > When refreshing cache, Spark should not use cached plan when recaching data > --- > > Key: SPARK-33729 > URL: https://issues.apache.org/jira/browse/SPARK-33729 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Chao Sun >Priority: Major > > Currently when cache is refreshed, e.g., via "REFRESH TABLE" command, Spark > will call {{refreshTable}} method within {{CatalogImpl}}. > {code} > override def refreshTable(tableName: String): Unit = { > val tableIdent = > sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName) > val tableMetadata = > sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent) > val table = sparkSession.table(tableIdent) > if (tableMetadata.tableType == CatalogTableType.VIEW) { > // Temp or persistent views: refresh (or invalidate) any metadata/data > cached > // in the plan recursively. > table.queryExecution.analyzed.refresh() > } else { > // Non-temp tables: refresh the metadata cache. > sessionCatalog.refreshTable(tableIdent) > } > // If this table is cached as an InMemoryRelation, drop the original > // cached version and make the new version cached lazily. > val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table) > // uncache the logical plan. > // note this is a no-op for the table itself if it's not cached, but will > invalidate all > // caches referencing this table. 
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true) > if (cache.nonEmpty) { > // save the cache name and cache level for recreation > val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName > val cacheLevel = > cache.get.cachedRepresentation.cacheBuilder.storageLevel > // recache with the same name and cache level. > sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, > cacheLevel) > } > } > {code} > Note that the {{table}} is created before the table relation cache is > cleared, and used later in {{cacheQuery}}. This is incorrect since it still > refers cached table relation which could be stale. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data
[ https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33729: Assignee: Apache Spark > When refreshing cache, Spark should not use cached plan when recaching data > --- -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data
[ https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33729: Assignee: (was: Apache Spark) > When refreshing cache, Spark should not use cached plan when recaching data > --- -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data
Chao Sun created SPARK-33729: Summary: When refreshing cache, Spark should not use cached plan when recaching data Key: SPARK-33729 URL: https://issues.apache.org/jira/browse/SPARK-33729 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Chao Sun Currently when cache is refreshed, e.g., via "REFRESH TABLE" command, Spark will call {{refreshTable}} method within {{CatalogImpl}}. {code} override def refreshTable(tableName: String): Unit = { val tableIdent = sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName) val tableMetadata = sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent) val table = sparkSession.table(tableIdent) if (tableMetadata.tableType == CatalogTableType.VIEW) { // Temp or persistent views: refresh (or invalidate) any metadata/data cached // in the plan recursively. table.queryExecution.analyzed.refresh() } else { // Non-temp tables: refresh the metadata cache. sessionCatalog.refreshTable(tableIdent) } // If this table is cached as an InMemoryRelation, drop the original // cached version and make the new version cached lazily. val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table) // uncache the logical plan. // note this is a no-op for the table itself if it's not cached, but will invalidate all // caches referencing this table. sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true) if (cache.nonEmpty) { // save the cache name and cache level for recreation val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel // recache with the same name and cache level. sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, cacheLevel) } } {code} Note that the {{table}} is created before the table relation cache is cleared, and used later in {{cacheQuery}}. This is incorrect since it still refers cached table relation which could be stale. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
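One way to avoid the stale reference described in SPARK-33729 — an illustrative sketch of the fix direction, not the merged patch — is to re-resolve the table after {{uncacheQuery}}, so that {{cacheQuery}} analyzes a fresh relation instead of the invalidated plan captured earlier. Names mirror the snippet quoted in the issue:

```scala
// Sketch only: re-resolve AFTER the uncache so recaching does not
// reuse the stale cached relation held by the earlier `table` value.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
if (cache.nonEmpty) {
  // save the cache name and cache level for recreation
  val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
  val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel
  // Build a fresh DataFrame so the analyzer resolves the relation again.
  val freshTable = sparkSession.table(tableIdent)
  sparkSession.sharedState.cacheManager.cacheQuery(freshTable, cacheName, cacheLevel)
}
```

The key ordering change is that the DataFrame passed to {{cacheQuery}} is constructed only after the table relation cache has been invalidated.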
[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246934#comment-17246934 ] Apache Spark commented on SPARK-33725: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30698 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246931#comment-17246931 ] Apache Spark commented on SPARK-33725: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30697 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246930#comment-17246930 ] Apache Spark commented on SPARK-33727: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30696 > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33727: Assignee: (was: Apache Spark) > `gpg: keyserver receive failed: No name` during K8s IT > -- -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33727: Assignee: Apache Spark > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246928#comment-17246928 ] Shane Knapp commented on SPARK-33727: - [~dongjoon] the build that i looked at ([https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37120/)] ran on one of the workers that's been set up for over a year, and hasn't had any network changes done to it since. {noformat} ¯\_(ツ)_/¯{noformat} i'd tend to agree w/holden's observation – sometimes keyservers are flaky. fallbacks are always a good thing. > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. 
> {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. > - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33728) Improve error messages during K8s integration test failure
[ https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33728: Assignee: (was: Apache Spark) > Improve error messages during K8s integration test failure > -- > > Key: SPARK-33728 > URL: https://issues.apache.org/jira/browse/SPARK-33728 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Priority: Trivial > > If decommissioning fails it can be hard to debug because we don't have > executor logs. Capture some of the executor logs for debugging.
[jira] [Commented] (SPARK-33728) Improve error messages during K8s integration test failure
[ https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246925#comment-17246925 ] Apache Spark commented on SPARK-33728: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30435 > Improve error messages during K8s integration test failure > -- > > Key: SPARK-33728 > URL: https://issues.apache.org/jira/browse/SPARK-33728 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Priority: Trivial > > If decommissioning fails it can be hard to debug because we don't have > executor logs. Capture some of the executor logs for debugging.
[jira] [Assigned] (SPARK-33728) Improve error messages during K8s integration test failure
[ https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33728: Assignee: Apache Spark > Improve error messages during K8s integration test failure > -- > > Key: SPARK-33728 > URL: https://issues.apache.org/jira/browse/SPARK-33728 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Trivial > > If decommissioning fails it can be hard to debug because we don't have > executor logs. Capture some of the executor logs for debugging.
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246923#comment-17246923 ] Holden Karau commented on SPARK-33727: -- So I've seen the keys.gnupg.net key server be flaky on my machine at home. Maybe we could try having it fall back to another key server if that one fails? > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times.
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
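Holden's fall-back suggestion above can be sketched as a small shell loop. This is only an illustration, not the actual Spark change: the extra keyserver names are assumptions, and `recv_key` is a stub standing in for the real `apt-key adv --keyserver "$server" --recv-key "$KEY"` call so the sketch runs without network access.

```shell
# Hedged sketch of "fall back to another key server if that one fails".
KEY='E19F5F87128899B192B1A2C2AD5F960A256A04AF'

# Stub for `apt-key adv --keyserver "$1" --recv-key "$KEY"`; it fails for
# keys.gnupg.net so the fallback path is exercised offline.
recv_key() {
    [ "$1" != "keys.gnupg.net" ]
}

fetch_key() {
    # keyserver.ubuntu.com and pgp.mit.edu are hypothetical fallbacks;
    # only keys.gnupg.net appears in the failing build logs.
    for server in keys.gnupg.net keyserver.ubuntu.com pgp.mit.edu; do
        if recv_key "$server"; then
            echo "imported $KEY from $server"
            return 0
        fi
        echo "keyserver $server failed, trying next" >&2
    done
    echo "all keyservers failed" >&2
    return 1
}

fetch_key
```

In a Dockerfile the same loop would live inside a single `RUN` step, so one flaky keyserver no longer fails the whole image build.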
[jira] [Commented] (SPARK-33728) Improve error messages during K8s integration test failure
[ https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246924#comment-17246924 ] Apache Spark commented on SPARK-33728: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30435 > Improve error messages during K8s integration test failure > -- > > Key: SPARK-33728 > URL: https://issues.apache.org/jira/browse/SPARK-33728 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Priority: Trivial > > If decommissioning fails it can be hard to debug because we don't have > executor logs. Capture some of the executor logs for debugging.
[jira] [Created] (SPARK-33728) Improve error messages during K8s integration test failure
Holden Karau created SPARK-33728: Summary: Improve error messages during K8s integration test failure Key: SPARK-33728 URL: https://issues.apache.org/jira/browse/SPARK-33728 Project: Spark Issue Type: Improvement Components: Kubernetes, Tests Affects Versions: 3.1.0, 3.2.0 Reporter: Holden Karau If decommissioning fails it can be hard to debug because we don't have executor logs. Capture some of the executor logs for debugging.
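The idea of capturing executor logs on failure can be sketched as a wrapper that dumps pod logs whenever a test step exits non-zero. Everything below is hypothetical: the pod names and the stubbed `kubectl` only illustrate the shape of the fix, they are not Spark's integration-test harness.

```shell
# Stub kubectl so the sketch runs offline; a real harness would invoke the
# actual binary with the test namespace and an executor label selector.
kubectl() {
    case "$1" in
        get)  echo "spark-exec-1 spark-exec-2" ;;       # fake executor pod list
        logs) echo "[$2] example executor log line" ;;  # fake log output
    esac
}

# Run a command; on failure, dump the logs of every executor pod so the
# test output contains something actionable instead of a bare assertion.
run_with_log_capture() {
    if "$@"; then
        return 0
    fi
    echo "step failed; capturing executor logs" >&2
    for pod in $(kubectl get pods); do
        kubectl logs "$pod"
    done
    return 1
}

run_with_log_capture false || true
```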
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246899#comment-17246899 ] Dongjoon Hyun commented on SPARK-33727: --- BTW, K8s IT succeeded 4 hours ago in this run. So, I guess that some workers may have different network settings. - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37110/console > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times.
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246897#comment-17246897 ] Dongjoon Hyun commented on SPARK-33727: --- The patch landed 28 days ago to recover K8S IT failure. - https://github.com/apache/spark/pull/30130 ([SPARK-33408][SPARK-32354][K8S][R] Use R 3.6.3 in K8s R image and re-enable RTestsSuite) K8s IT has been working correctly for one month with that key server. > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. 
> {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. > - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246894#comment-17246894 ] Shane Knapp edited comment on SPARK-33727 at 12/9/20, 11:30 PM: ok, this is not a failure on the jenkins worker – it's happening inside the docker container that the build spins up. just scroll back from the gpg error message and you'll see the docker STDOUT as it's trying to build the spark-r container. in fact, from looking at the logs it appears that the command that's actually failing in the container setup is: {noformat} apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'{noformat} that's causing the following error: {code:java} Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: keyserver receive failed: No name {code} so, i took a peek at ./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile, and git blame for that specific line points to... drumroll please... {code:java} 22baf05a9ec (Dongjoon Hyun 2020-11-12 15:36:31 +0900 32) apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && {code} [~dongjoon] was (Author: shaneknapp): ok, this is not a failure on the jenkins worker – it's happening inside the docker container that the build spins up. just scroll back from the gpg error message and you'll see the docker STDOUT as it's trying to build the spark-r container. 
in fact, from looking at the logs it appears that the command that's actually failing in the container setup is: {noformat} apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'{noformat} that's causing the following error: {code:java} Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: keyserver receive failed: No name {code} so, i took a peek at ./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile, and git blame for that specific line points to... drumroll please... {code:java} 22baf05a9ec (Dongjoon Hyun 2020-11-12 15:36:31 +0900 32) apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && \{code} [~dongjoon] > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. 
> {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. > - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246894#comment-17246894 ] Shane Knapp commented on SPARK-33727: - ok, this is not a failure on the jenkins worker – it's happening inside the docker container that the build spins up. just scroll back from the gpg error message and you'll see the docker STDOUT as it's trying to build the spark-r container. in fact, from looking at the logs it appears that the command that's actually failing in the container setup is: {noformat} apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'{noformat} that's causing the following error: {code:java} Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: keyserver receive failed: No name {code} so, i took a peek at ./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile, and git blame for that specific line points to... drumroll please... {code:java} 22baf05a9ec (Dongjoon Hyun 2020-11-12 15:36:31 +0900 32) apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && \{code} [~dongjoon] > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. 
> {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. > - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
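The `git blame` step Shane describes can be reproduced with a line-range blame. The demo below builds a throwaway repository with a one-line stand-in Dockerfile so it runs anywhere; against a real Spark checkout you would blame `resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile` instead.

```shell
# Demo of blaming just the apt-key line via `git blame -L /regex/,+1`.
tmp=$(mktemp -d) && cd "$tmp"
git init -q
# Stand-in Dockerfile; only the apt-key line matters for the blame.
printf 'FROM debian:buster\nRUN apt-key adv --keyserver keys.gnupg.net --recv-key KEY\n' > Dockerfile
git add Dockerfile
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add keyserver line'
# -L anchors the blame to the first line matching the regex, covering 1 line.
git blame -L '/apt-key adv/,+1' Dockerfile
```

This prints a single blame line attributing the `apt-key adv` invocation to the commit that introduced it, which is exactly how the offending change in SPARK-33408 was tracked down.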
[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246891#comment-17246891 ] Apache Spark commented on SPARK-33725: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30695 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246890#comment-17246890 ] Apache Spark commented on SPARK-33725: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30695 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246883#comment-17246883 ] Dongjoon Hyun commented on SPARK-33727: --- Hi, [~shaneknapp]. Could you try `gpg` command in the Jenkins server? Is it working? I'm wondering if there is some network security change there. > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246884#comment-17246884 ] Dongjoon Hyun commented on SPARK-33727: --- cc [~hyukjin.kwon] > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. > {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. 
> - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694
[jira] [Updated] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
[ https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33727: -- Description: K8s IT fails with gpg: keyserver receive failed: No name. This seems to be consistent in the new Jenkins Server. {code} Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: keyserver receive failed: No name The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' returned a non-zero code: 2 {code} It locally works on Mac. {code} $ gpg1 --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: requesting key 256A04AF from hkp server keys.gnupg.net gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) " imported gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u gpg: Total number processed: 1 gpg: imported: 1 (RSA: 1) {code} It happens multiple times. - https://github.com/apache/spark/pull/30693 - https://github.com/apache/spark/pull/30694 was: K8s IT fails with gpg: keyserver receive failed: No name. This seems to be consistent in the new Jenkins Server. 
{code} Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: keyserver receive failed: No name The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' returned a non-zero code: 2 {code} It locally works on Mac. {code} $ gpg1 --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: requesting key 256A04AF from hkp server keys.gnupg.net gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) " imported gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u gpg: Total number processed: 1 gpg: imported: 1 (RSA: 1) {code} > `gpg: keyserver receive failed: No name` during K8s IT > -- > > Key: SPARK-33727 > URL: https://issues.apache.org/jira/browse/SPARK-33727 > Project: Spark > Issue Type: Task > Components: Kubernetes, Project Infra, Tests >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > K8s IT fails with gpg: keyserver receive failed: No name. This seems to be > consistent in the new Jenkins Server. 
> {code} > Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver > keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: keyserver receive failed: No name > The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian > buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && > apt-key adv --keyserver keys.gnupg.net --recv-key > 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt > install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' > returned a non-zero code: 2 > {code} > It locally works on Mac. > {code} > $ gpg1 --keyserver keys.gnupg.net --recv-key > E19F5F87128899B192B1A2C2AD5F960A256A04AF > gpg: requesting key 256A04AF from hkp server keys.gnupg.net > gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) > " imported > gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model > gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u > gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u > gpg: Total number processed: 1 > gpg: imported: 1 (RSA: 1) > {code} > It happens multiple times. > - https://github.com/apache/spark/pull/30693 > - https://github.com/apache/spark/pull/30694 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT
Dongjoon Hyun created SPARK-33727: - Summary: `gpg: keyserver receive failed: No name` during K8s IT Key: SPARK-33727 URL: https://issues.apache.org/jira/browse/SPARK-33727 Project: Spark Issue Type: Task Components: Kubernetes, Project Infra, Tests Affects Versions: 3.0.2, 3.1.0, 3.2.0 Reporter: Dongjoon Hyun K8s IT fails with gpg: keyserver receive failed: No name. This seems to be consistent in the new Jenkins Server. {code} Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: keyserver receive failed: No name The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran35/" >> /etc/apt/sources.list && apt install -y gnupg && apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && apt-get update && apt install -y -t buster-cran35 r-base r-base-dev && rm -rf /var/cache/apt/*' returned a non-zero code: 2 {code} It locally works on Mac. {code} $ gpg1 --keyserver keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF gpg: requesting key 256A04AF from hkp server keys.gnupg.net gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) " imported gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model gpg: depth: 0 valid: 2 signed: 1 trust: 0-, 0q, 0n, 0m, 0f, 2u gpg: depth: 1 valid: 1 signed: 0 trust: 1-, 0q, 0n, 0m, 0f, 0u gpg: Total number processed: 1 gpg: imported: 1 (RSA: 1) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
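For reference, the failing step is the Dockerfile's `apt-key adv --keyserver keys.gnupg.net` call quoted above. A common workaround (an assumption on my part, not something this ticket prescribes) is to point at a keyserver with a stable DNS name, such as `hkps://keyserver.ubuntu.com`, since the `keys.gnupg.net` pool has historically resolved unreliably:

```dockerfile
# Hypothetical fragment: the keyserver.ubuntu.com substitution is an assumption,
# not taken from this ticket; everything else mirrors the failing RUN step above.
RUN echo "deb http://cloud.r-project.org/bin/linux/debian buster-cran35/" >> /etc/apt/sources.list \
 && apt install -y gnupg \
 # keys.gnupg.net resolves unreliably; keyserver.ubuntu.com is a common substitute
 && apt-key adv --keyserver hkps://keyserver.ubuntu.com \
      --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' \
 && apt-get update \
 && apt install -y -t buster-cran35 r-base r-base-dev \
 && rm -rf /var/cache/apt/*
```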
[jira] [Resolved] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33725. --- Fix Version/s: 3.2.0 Resolution: Fixed This is resolved via https://github.com/apache/spark/pull/30690 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
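For anyone applying the bump locally, snappy-java ships under the usual xerial coordinates. This is an illustrative sketch of the dependency change, not the actual diff from the linked pull request (Spark pins the version in its parent pom):

```xml
<!-- Sketch only: Spark manages this version centrally; shown here just to
     record the coordinates and the 1.1.8.2 target version. -->
<dependency>
  <groupId>org.xerial.snappy</groupId>
  <artifactId>snappy-java</artifactId>
  <version>1.1.8.2</version>
</dependency>
```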
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246858#comment-17246858 ] Sean R. Owen commented on SPARK-33713: -- I don't think we care much about build history, yes. Don't worry about that. > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246850#comment-17246850 ] Shane Knapp commented on SPARK-33713: - got it. gonna rejigger my rejiggering of the JJB configs. > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33724) Allow decommissioning script location to be configured
[ https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246848#comment-17246848 ] Apache Spark commented on SPARK-33724: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30694 > Allow decommissioning script location to be configured > -- > > Key: SPARK-33724 > URL: https://issues.apache.org/jira/browse/SPARK-33724 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Trivial > > Some people don't use the Spark image tool and instead do custom volume > mounts to make Spark available. As such the hard coded path does not work > well for them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33724) Allow decommissioning script location to be configured
[ https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33724: Assignee: Holden Karau (was: Apache Spark) > Allow decommissioning script location to be configured > -- > > Key: SPARK-33724 > URL: https://issues.apache.org/jira/browse/SPARK-33724 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Trivial > > Some people don't use the Spark image tool and instead do custom volume > mounts to make Spark available. As such the hard coded path does not work > well for them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33724) Allow decommissioning script location to be configured
[ https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33724: Assignee: Apache Spark (was: Holden Karau) > Allow decommissioning script location to be configured > -- > > Key: SPARK-33724 > URL: https://issues.apache.org/jira/browse/SPARK-33724 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Trivial > > Some people don't use the Spark image tool and instead do custom volume > mounts to make Spark available. As such the hard coded path does not work > well for them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33724) Allow decommissioning script location to be configured
[ https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246847#comment-17246847 ] Apache Spark commented on SPARK-33724: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30694 > Allow decommissioning script location to be configured > -- > > Key: SPARK-33724 > URL: https://issues.apache.org/jira/browse/SPARK-33724 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Trivial > > Some people don't use the Spark image tool and instead do custom volume > mounts to make Spark available. As such the hard coded path does not work > well for them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot
[ https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246846#comment-17246846 ] Apache Spark commented on SPARK-33716: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30693 > Decommissioning Race Condition during Pod Snapshot > -- > > Key: SPARK-33716 > URL: https://issues.apache.org/jira/browse/SPARK-33716 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Some version of Kubernetes may create a deletion timestamp field before > changing the pod status to terminating, so a decommissioning node may have a > deletion timestamp and a stage of running. Depending on when the K8s snapshot > comes back this can cause a race condition with Spark believing the pod has > been deleted before it has been. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot
[ https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246845#comment-17246845 ] Apache Spark commented on SPARK-33716: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/30693 > Decommissioning Race Condition during Pod Snapshot > -- > > Key: SPARK-33716 > URL: https://issues.apache.org/jira/browse/SPARK-33716 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Some version of Kubernetes may create a deletion timestamp field before > changing the pod status to terminating, so a decommissioning node may have a > deletion timestamp and a stage of running. Depending on when the K8s snapshot > comes back this can cause a race condition with Spark believing the pod has > been deleted before it has been. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot
[ https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33716: Assignee: Apache Spark (was: Holden Karau) > Decommissioning Race Condition during Pod Snapshot > -- > > Key: SPARK-33716 > URL: https://issues.apache.org/jira/browse/SPARK-33716 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Apache Spark >Priority: Major > > Some version of Kubernetes may create a deletion timestamp field before > changing the pod status to terminating, so a decommissioning node may have a > deletion timestamp and a stage of running. Depending on when the K8s snapshot > comes back this can cause a race condition with Spark believing the pod has > been deleted before it has been. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot
[ https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33716: Assignee: Holden Karau (was: Apache Spark) > Decommissioning Race Condition during Pod Snapshot > -- > > Key: SPARK-33716 > URL: https://issues.apache.org/jira/browse/SPARK-33716 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.1.0, 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > Some version of Kubernetes may create a deletion timestamp field before > changing the pod status to terminating, so a decommissioning node may have a > deletion timestamp and a stage of running. Depending on when the K8s snapshot > comes back this can cause a race condition with Spark believing the pod has > been deleted before it has been. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
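The race described above is easy to state concretely. The sketch below is not Spark's executor-pod snapshot code, just an illustration of the rule the fix implies: a pod that still reports phase `Running` but already carries a `deletionTimestamp` should be classified as terminating, not as already deleted.

```python
from typing import Optional

def classify_naive(phase: str, deletion_timestamp: Optional[str]) -> str:
    # Race-prone rule: any deletion timestamp means "already deleted".
    return "deleted" if deletion_timestamp else phase.lower()

def classify_safe(phase: str, deletion_timestamp: Optional[str]) -> str:
    # Some K8s versions set metadata.deletionTimestamp *before* the pod
    # leaves the Running phase, so treat that combination as terminating.
    if deletion_timestamp and phase == "Running":
        return "terminating"
    return "deleted" if deletion_timestamp else phase.lower()

# A snapshot caught mid-decommission: still Running, but marked for deletion.
snapshot = {"phase": "Running", "deletionTimestamp": "2020-12-09T20:42:00Z"}
print(classify_naive(snapshot["phase"], snapshot["deletionTimestamp"]))  # deleted
print(classify_safe(snapshot["phase"], snapshot["deletionTimestamp"]))   # terminating
```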
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246833#comment-17246833 ] Dongjoon Hyun commented on SPARK-33713: --- For master and branch-3.1, we don't need a build history. And, `master` and `branch-3.1` have been already broken for a while. (cc [~hyukjin.kwon] and [~srowen]) > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246830#comment-17246830 ] Dongjoon Hyun commented on SPARK-33713: --- Oh, this was a request for only `master/branch-3.1` ("Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names"). > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33726) Duplicate field names causes wrong answers during aggregation
[ https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated SPARK-33726: Labels: correctness (was: ) > Duplicate field names causes wrong answers during aggregation > - > > Key: SPARK-33726 > URL: https://issues.apache.org/jira/browse/SPARK-33726 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.1 >Reporter: Yian Liou >Priority: Major > Labels: correctness > > We saw this bug at Workday. > Duplicate field names for different fields can cause > org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to > return a fixed batch when it should have returned a variable batch leading to > wrong results. > This example produces wrong results in the spark shell: > scala> sql("with T as (select id as a, -id as x from range(3)), U as (select > id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as > ma, min(b) as mb from T join U on a=b group by U.x, T.x").show > > |*x*|*x*|*ma*|*mb*| > |-2|2|0|null| > |-1|1|null|1| > |0|0|0|0| > instead of correct output : > |*x*|*x*|*ma*|*mb*| > |0|0|0|0| > |-2|2|2|2| > |-1|1|1|1| > The issue can be solved by iterating over the fields themselves instead of > field names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246816#comment-17246816 ] Shane Knapp commented on SPARK-33713: - also, let me think about other sneaky ways of making this happen w/o needing to lose all the build logs... > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33726) Duplicate field names causes wrong answers during aggregation
Yian Liou created SPARK-33726: - Summary: Duplicate field names causes wrong answers during aggregation Key: SPARK-33726 URL: https://issues.apache.org/jira/browse/SPARK-33726 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1, 2.4.4 Reporter: Yian Liou We saw this bug at Workday. Duplicate field names for different fields can cause org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to return a fixed batch when it should have returned a variable batch leading to wrong results. This example produces wrong results in the spark shell: scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb from T join U on a=b group by U.x, T.x").show |*x*|*x*|*ma*|*mb*| |-2|2|0|null| |-1|1|null|1| |0|0|0|0| instead of correct output : |*x*|*x*|*ma*|*mb*| |0|0|0|0| |-2|2|2|2| |-1|1|1|1| The issue can be solved by iterating over the fields themselves instead of field names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
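The failure mode can be illustrated outside of Spark. The sketch below is an analogy, not Spark's actual `RowBasedKeyValueBatch` code: resolving schema fields through a name-keyed map silently drops one of the two `x` fields, so a schema that contains a variable-width string column can be misclassified as entirely fixed-width.

```python
from dataclasses import dataclass

# Analogy only -- not Spark's code. Two join-side columns both named "x":
# T.x is a long (fixed width), U.x is a string (variable width).

@dataclass(frozen=True)
class Field:
    name: str
    data_type: str  # "long" = fixed width, "string" = variable width

schema = [Field("x", "long"), Field("x", "string"),
          Field("ma", "long"), Field("mb", "long")]

def all_fixed_width_by_name(fields):
    # Buggy pattern: resolve each field through a name -> type map.
    # The first "x" shadows the second, so the string field vanishes.
    types_by_name = {}
    for f in fields:
        types_by_name.setdefault(f.name, f.data_type)
    return all(t == "long" for t in types_by_name.values())

def all_fixed_width_by_field(fields):
    # Fix suggested in the ticket: iterate the fields themselves.
    return all(f.data_type == "long" for f in fields)

print(all_fixed_width_by_name(schema))   # True  -> fixed-size batch chosen (wrong)
print(all_fixed_width_by_field(schema))  # False -> variable-size batch required
```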
[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation
[ https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812 ] Yian Liou commented on SPARK-33726: --- Will create a PR for the issue. > Duplicate field names causes wrong answers during aggregation > - > > Key: SPARK-33726 > URL: https://issues.apache.org/jira/browse/SPARK-33726 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4, 3.0.1 >Reporter: Yian Liou >Priority: Major > > We saw this bug at Workday. > Duplicate field names for different fields can cause > org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to > return a fixed batch when it should have returned a variable batch leading to > wrong results. > This example produces wrong results in the spark shell: > scala> sql("with T as (select id as a, -id as x from range(3)), U as (select > id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as > ma, min(b) as mb from T join U on a=b group by U.x, T.x").show > > |*x*|*x*|*ma*|*mb*| > |-2|2|0|null| > |-1|1|null|1| > |0|0|0|0| > instead of correct output : > |*x*|*x*|*ma*|*mb*| > |0|0|0|0| > |-2|2|2|2| > |-1|1|1|1| > The issue can be solved by iterating over the fields themselves instead of > field names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246810#comment-17246810 ] Shane Knapp commented on SPARK-33713: - fwiw, here's what the new and sexy build names will be: {noformat} INFO:jenkins_jobs.builder:Number of jobs generated: 30 DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.6' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.6' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-sbt-hadoop-3.2' {noformat} > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. 
[jira] [Comment Edited] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246810#comment-17246810 ] Shane Knapp edited comment on SPARK-33713 at 12/9/20, 8:42 PM: --- fwiw, here's what the new and sexy build names will be: {code:java} INFO:jenkins_jobs.builder:Number of jobs generated: 30 DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.6' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.6' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-scala-2.13' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-sbt-hadoop-2.7' DEBUG:jenkins_jobs.builder:Writing XML to 'target/jenkins-xml/spark-master-test-sbt-hadoop-3.2' {code}
[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
[ https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246808#comment-17246808 ] Shane Knapp commented on SPARK-33713: - hmm. so while i heartily endorse this, there will be a side-effect of renaming the jobs (particularly as we use Jenkins Job Builder – JJB – to deploy and manage the build configs): once i redeploy the updated JJB configs w/the shortened build names, it will create new builds and not change the names of existing ones. unless i'm unable to read docs anymore (which is entirely possible) there's no 'rename' ability. and since, according to jenkins, these are new builds, all previous build history will be lost... unless i manually copy things over on the jenkins primary filesystem (which i'd like to avoid if possible). given that we're only storing 2 weeks of builds, if we time it properly (aka at 3.1 release) the impact of losing these logs will be pretty insignificant. my suggestion: i want to do this, but i propose moving everything over when 3.1 is officially released. we'll lose the previous 2 weeks of build history across all branches, but since we're "starting fresh"-ish i think the impact will be minimized. thoughts? comments? should we pull anyone else in for their opinions? [~sowen] [~hyukjin.kwon] [~holden] > Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names > - > > Key: SPARK-33713 > URL: https://issues.apache.org/jira/browse/SPARK-33713 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Shane Knapp >Priority: Major > > We removed `hive-1.2` profile since branch-3.1. So, we can simplify the > Jenkins job title. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33725: -- Affects Version/s: 3.1.0 2.4.7 3.0.1 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33725: -- Issue Type: Bug (was: Improvement) > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage
[ https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32920: Assignee: (was: Apache Spark) > Add support in Spark driver to coordinate the finalization of the push/merge > phase in push-based shuffle for a given shuffle and the initiation of the > reduce stage > --- > > Key: SPARK-32920 > URL: https://issues.apache.org/jira/browse/SPARK-32920 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Priority: Major > > With push-based shuffle, we are currently decoupling map task executions from > the shuffle block push process. Thus, when all map tasks finish, we might > want to wait for some small extra time to allow more shuffle blocks to get > pushed and merged. This requires some extra coordination in the Spark driver > when it transitions from a shuffle map stage to the corresponding reduce > stage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
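The coordination the issue describes (when the last map task finishes, wait a short grace period so in-flight block pushes can land, finalize the merge, then launch the reduce stage) can be sketched as below. This is an illustrative sketch only; the class and method names are hypothetical and do not reflect Spark's actual DAGScheduler implementation.

```java
import java.util.concurrent.*;

// Hypothetical driver-side coordinator for push-based shuffle finalization.
class MergeFinalizationCoordinator {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    // Called when all map tasks of a shuffle map stage have finished.
    // Rather than submitting the reduce stage immediately, wait a grace
    // period, finalize the merge (stop accepting further pushes), and
    // only then start the reduce stage.
    CompletableFuture<Void> onMapStageCompleted(long graceMillis,
                                                Runnable finalizeMerge,
                                                Runnable startReduceStage) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        timer.schedule(() -> {
            finalizeMerge.run();    // e.g. tell shuffle services to seal merged files
            startReduceStage.run(); // reduce tasks may now read merged blocks
            done.complete(null);
        }, graceMillis, TimeUnit.MILLISECONDS);
        return done;
    }

    void shutdown() {
        timer.shutdown();
    }
}
```

The key design point is the ordering guarantee: finalization strictly precedes reduce-stage submission, and the grace period bounds how long the extra wait can delay the job.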
[jira] [Commented] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage
[ https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246804#comment-17246804 ] Apache Spark commented on SPARK-32920: -- User 'venkata91' has created a pull request for this issue: https://github.com/apache/spark/pull/30691 > Add support in Spark driver to coordinate the finalization of the push/merge > phase in push-based shuffle for a given shuffle and the initiation of the > reduce stage > --- > > Key: SPARK-32920 > URL: https://issues.apache.org/jira/browse/SPARK-32920 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Assignee: Apache Spark >Priority: Major > > With push-based shuffle, we are currently decoupling map task executions from > the shuffle block push process. Thus, when all map tasks finish, we might > want to wait for some small extra time to allow more shuffle blocks to get > pushed and merged. This requires some extra coordination in the Spark driver > when it transitions from a shuffle map stage to the corresponding reduce > stage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage
[ https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32920: Assignee: Apache Spark > Add support in Spark driver to coordinate the finalization of the push/merge > phase in push-based shuffle for a given shuffle and the initiation of the > reduce stage > --- > > Key: SPARK-32920 > URL: https://issues.apache.org/jira/browse/SPARK-32920 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Assignee: Apache Spark >Priority: Major > > With push-based shuffle, we are currently decoupling map task executions from > the shuffle block push process. Thus, when all map tasks finish, we might > want to wait for some small extra time to allow more shuffle blocks to get > pushed and merged. This requires some extra coordination in the Spark driver > when it transitions from a shuffle map stage to the corresponding reduce > stage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage
[ https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246803#comment-17246803 ] Apache Spark commented on SPARK-32920: -- User 'venkata91' has created a pull request for this issue: https://github.com/apache/spark/pull/30691 > Add support in Spark driver to coordinate the finalization of the push/merge > phase in push-based shuffle for a given shuffle and the initiation of the > reduce stage > --- > > Key: SPARK-32920 > URL: https://issues.apache.org/jira/browse/SPARK-32920 > Project: Spark > Issue Type: Sub-task > Components: Shuffle, Spark Core >Affects Versions: 3.1.0 >Reporter: Min Shen >Priority: Major > > With push-based shuffle, we are currently decoupling map task executions from > the shuffle block push process. Thus, when all map tasks finish, we might > want to wait for some small extra time to allow more shuffle blocks to get > pushed and merged. This requires some extra coordination in the Spark driver > when it transitions from a shuffle map stage to the corresponding reduce > stage. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246799#comment-17246799 ] Apache Spark commented on SPARK-33725: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30690 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33725: Assignee: L. C. Hsieh (was: Apache Spark) > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246798#comment-17246798 ] Apache Spark commented on SPARK-33725: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30690 > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
[ https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33725: Assignee: Apache Spark (was: L. C. Hsieh) > Upgrade snappy-java to 1.1.8.2 > -- > > Key: SPARK-33725 > URL: https://issues.apache.org/jira/browse/SPARK-33725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > Minor version upgrade that includes: > * Fixed an initialization issue when using a recent Mac OS X version #265 > * Support Apple Silicon (M1, Mac-aarch64) > * Fixed the pure-java Snappy fallback logic when no native library for your > platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33261) Allow people to extend the pod feature steps
[ https://issues.apache.org/jira/browse/SPARK-33261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33261: -- Parent: (was: SPARK-33005) Issue Type: Improvement (was: Sub-task) > Allow people to extend the pod feature steps > > > Key: SPARK-33261 > URL: https://issues.apache.org/jira/browse/SPARK-33261 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > While we allow people to specify pod templates, some deployments could > benefit from being able to add a feature step. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
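The extension point requested here can be pictured as a chain of pluggable "feature steps", each transforming the pod spec built so far, with user-supplied steps appended after the built-in ones. The sketch below is illustrative only; the type names are hypothetical and not Spark's actual Kubernetes API.

```java
import java.util.*;
import java.util.function.UnaryOperator;

// Minimal stand-in for a Kubernetes pod specification.
final class PodSpec {
    final Map<String, String> labels = new LinkedHashMap<>();
    final List<String> volumes = new ArrayList<>();
}

// A feature step takes the pod built so far and returns an augmented pod.
interface PodFeatureStep extends UnaryOperator<PodSpec> {}

final class PodBuilder {
    // Apply each step in order; built-in steps would come first,
    // user-registered extension steps last.
    static PodSpec build(List<PodFeatureStep> steps) {
        PodSpec pod = new PodSpec();
        for (PodFeatureStep step : steps) {
            pod = step.apply(pod);
        }
        return pod;
    }
}
```

Because each step only sees the accumulated spec, deployments can add custom behavior (extra volumes, labels, sidecars) without forking the built-in steps or relying solely on pod templates.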
[jira] [Updated] (SPARK-33261) Allow people to extend the pod feature steps
[ https://issues.apache.org/jira/browse/SPARK-33261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-33261: -- Affects Version/s: (was: 3.1.0) 3.2.0 > Allow people to extend the pod feature steps > > > Key: SPARK-33261 > URL: https://issues.apache.org/jira/browse/SPARK-33261 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > > While we allow people to specify pod templates, some deployments could > benefit from being able to add a feature step. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33725) Upgrade snappy-java to 1.1.8.2
L. C. Hsieh created SPARK-33725: --- Summary: Upgrade snappy-java to 1.1.8.2 Key: SPARK-33725 URL: https://issues.apache.org/jira/browse/SPARK-33725 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh Minor version upgrade that includes: * Fixed an initialization issue when using a recent Mac OS X version #265 * Support Apple Silicon (M1, Mac-aarch64) * Fixed the pure-java Snappy fallback logic when no native library for your platform is found. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33724) Allow decommissioning script location to be configured
Holden Karau created SPARK-33724: Summary: Allow decommissioning script location to be configured Key: SPARK-33724 URL: https://issues.apache.org/jira/browse/SPARK-33724 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.1.0, 3.2.0 Reporter: Holden Karau Assignee: Holden Karau Some people don't use the Spark image tool and instead do custom volume mounts to make Spark available. As such the hard coded path does not work well for them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33722) Handle DELETE in ReplaceNullWithFalseInPredicate
[ https://issues.apache.org/jira/browse/SPARK-33722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33722. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30688 [https://github.com/apache/spark/pull/30688] > Handle DELETE in ReplaceNullWithFalseInPredicate > > > Key: SPARK-33722 > URL: https://issues.apache.org/jira/browse/SPARK-33722 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.2.0 > > > We should handle delete statements in {{ReplaceNullWithFalseInPredicate}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33722) Handle DELETE in ReplaceNullWithFalseInPredicate
[ https://issues.apache.org/jira/browse/SPARK-33722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33722: - Assignee: Anton Okolnychyi > Handle DELETE in ReplaceNullWithFalseInPredicate > > > Key: SPARK-33722 > URL: https://issues.apache.org/jira/browse/SPARK-33722 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > We should handle delete statements in {{ReplaceNullWithFalseInPredicate}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent
[ https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246727#comment-17246727 ] Dongjoon Hyun commented on SPARK-32110: --- How do you think about the above [~revans2]'s comment, [~cloud_fan]? > -0.0 vs 0.0 is inconsistent > --- > > Key: SPARK-32110 > URL: https://issues.apache.org/jira/browse/SPARK-32110 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Robert Joseph Evans >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.2, 3.1.0 > > > This is related to SPARK-26021 where some things were fixed but there is > still a lot that is not consistent. > When parsing SQL {{-0.0}} is turned into {{0.0}}. This can produce quick > results that appear to be correct but are totally inconsistent for the same > operators. > {code:java} > scala> import spark.implicits._ > import spark.implicits._ > scala> spark.sql("SELECT 0.0 = -0.0").collect > res0: Array[org.apache.spark.sql.Row] = Array([true]) > scala> Seq((0.0, -0.0)).toDF("a", "b").selectExpr("a = b").collect > res1: Array[org.apache.spark.sql.Row] = Array([false]) > {code} > This also shows up in sorts > {code:java} > scala> Seq((0.0, -100.0), (-0.0, 100.0), (0.0, 100.0), (-0.0, > -100.0)).toDF("a", "b").orderBy("a", "b").collect > res2: Array[org.apache.spark.sql.Row] = Array([-0.0,-100.0], [-0.0,100.0], > [0.0,-100.0], [0.0,100.0]) > {code} > But not for a equi-join or for an aggregate > {code:java} > scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", > "r_b"), $"a" === $"r_a").collect > res3: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0]) > scala> Seq((0.0, 1.0), (-0.0, 1.0)).toDF("a", "b").groupBy("a").count.collect > res6: Array[org.apache.spark.sql.Row] = Array([0.0,2]) > {code} > This can lead to some very odd results. Like an equi-join with a filter that > logically should do nothing, but ends up filtering the result to nothing. 
> {code:java} > scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", > "r_b"), $"a" === $"r_a" && $"a" <= $"r_a").collect > res8: Array[org.apache.spark.sql.Row] = Array() > scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", > "r_b"), $"a" === $"r_a").collect > res9: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0]) > {code} > Hive never normalizes -0.0 to 0.0 so this results in non-ieee complaint > behavior everywhere, but at least it is consistently odd. > MySQL, Oracle, Postgres, and SQLite all appear to normalize the {{-0.0}} to > {{0.0}}. > The root cause of this appears to be that the java implementation of > {{Double.compare}} and {{Float.compare}} for open JDK places {{-0.0}} < > {{0.0}}. > This is not documented in the java docs but it is clearly documented in the > code, so it is not a "bug" that java is going to fix. > [https://github.com/openjdk/jdk/blob/a0a0539b0d3f9b6809c9759e697bfafd7b138ec1/src/java.base/share/classes/java/lang/Double.java#L1022-L1035] > It is also consistent with what is in the java docs for {{Double.equals}} > > [https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#equals-java.lang.Object-] > To be clear I am filing this mostly to document the current state rather than > to think it needs to be fixed ASAP. It is a rare corner case, but ended up > being really frustrating for me to debug what was happening. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
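The root cause named in the report can be reproduced directly in plain Java: the primitive `==` operator treats -0.0 and 0.0 as equal (IEEE 754 behavior), while `Double.compare`, boxed `Double.equals`, and the boxed hash code all distinguish them. That split is exactly why comparator-based paths (sorts) and equality/hash-based paths (equi-joins, aggregates) can disagree:

```java
public class NegativeZeroDemo {
    public static void main(String[] args) {
        // IEEE 754 primitive comparison: -0.0 and 0.0 are equal.
        System.out.println(-0.0 == 0.0);                                   // true

        // Total order used for sorting: -0.0 sorts strictly before 0.0.
        System.out.println(Double.compare(-0.0, 0.0));                     // -1

        // Boxed equality (what hash-based joins/aggregates resemble): unequal.
        System.out.println(Double.valueOf(-0.0).equals(Double.valueOf(0.0))); // false

        // The bit patterns differ in the sign bit, so the hashes differ too.
        System.out.println(Double.hashCode(-0.0) == Double.hashCode(0.0)); // false
    }
}
```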
[jira] [Commented] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data
[ https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246725#comment-17246725 ] Dongjoon Hyun commented on SPARK-18105: --- Apache Spark 3.x is using lz4-java-1.7.1.jar and this seems to be fixed by upgrading dependency, [~cloud_fan]. I'm not aware of any new incident about this. cc [~viirya] since he is working on the codec issue in Hadoop community recently. cc [~sunchao], too > LZ4 failed to decompress a stream of shuffled data > -- > > Key: SPARK-18105 > URL: https://issues.apache.org/jira/browse/SPARK-18105 > Project: Spark > Issue Type: Bug > Components: Spark Core >Reporter: Davies Liu >Priority: Major > > When lz4 is used to compress the shuffle files, it may fail to decompress it > as "stream is corrupt" > {code} > Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: > Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in > stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted > at > org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220) > at > org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109) > at java.io.BufferedInputStream.read(BufferedInputStream.java:353) > at java.io.DataInputStream.read(DataInputStream.java:149) > at com.google.common.io.ByteStreams.read(ByteStreams.java:828) > at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695) > at > org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127) > at > org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110) > at scala.collection.Iterator$$anon$13.next(Iterator.scala:372) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30) > at > 
org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > https://github.com/jpountz/lz4-java/issues/89 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
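When diagnosing a "Stream is corrupted" failure like the one above, one way to check whether the problem is codec-specific is to temporarily switch the compression codec. `spark.io.compression.codec` is a real Spark setting; the application class and jar below are placeholders:

```shell
# Sketch: rule out an LZ4-specific problem by switching the codec used for
# shuffle and spill data. Supported values include lz4 (the default), lzf,
# snappy, and zstd. com.example.MyJob and my-job.jar are placeholders.
spark-submit \
  --conf spark.io.compression.codec=snappy \
  --class com.example.MyJob \
  my-job.jar
```

If the failure disappears under a different codec, that points at the compression library (here, the lz4-java bug referenced in the thread) rather than at the shuffle data itself.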
[jira] [Commented] (SPARK-33722) Handle DELETE in ReplaceNullWithFalseInPredicate
[ https://issues.apache.org/jira/browse/SPARK-33722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246722#comment-17246722 ] Apache Spark commented on SPARK-33722: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/30688 > Handle DELETE in ReplaceNullWithFalseInPredicate > > > Key: SPARK-33722 > URL: https://issues.apache.org/jira/browse/SPARK-33722 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Anton Okolnychyi >Priority: Major > > We should handle delete statements in {{ReplaceNullWithFalseInPredicate}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org