[jira] [Created] (SPARK-33733) PullOutNondeterministic should check and collect deterministic field

2020-12-09 Thread ulysses you (Jira)
ulysses you created SPARK-33733:
---

 Summary: PullOutNondeterministic should check and collect 
deterministic field
 Key: SPARK-33733
 URL: https://issues.apache.org/jira/browse/SPARK-33733
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: ulysses you


The deterministic field is wider than `NonDeterministic`; we should keep the 
same range between the pull-out rule and check analysis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33732.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/30700

> Kubernetes integration tests don't work with Minikube 1.9+
> 
>
> Key: SPARK-33732
> URL: https://issues.apache.org/jira/browse/SPARK-33732
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.1.0
>
>
> Kubernetes integration tests don't work with Minikube 1.9+.
> This is because the location of apiserver.crt and apiserver.key has changed.






[jira] [Resolved] (SPARK-33714) Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework

2020-12-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33714.
-
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30676
[https://github.com/apache/spark/pull/30676]

> Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework
> --
>
> Key: SPARK-33714
> URL: https://issues.apache.org/jira/browse/SPARK-33714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.2.0
>
>
> Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework






[jira] [Assigned] (SPARK-33714) Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework

2020-12-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33714:
---

Assignee: Terry Kim

> Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework
> --
>
> Key: SPARK-33714
> URL: https://issues.apache.org/jira/browse/SPARK-33714
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
>
> Migrate ALTER VIEW ... SET/UNSET TBLPROPERTIES to new resolution framework






[jira] [Resolved] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests

2020-12-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33558.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30685
[https://github.com/apache/spark/pull/30685]

> Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract the ALTER TABLE .. ADD PARTITION tests to a common place so they run 
> against both v1 and v2 datasources. Some tests can be placed in v1- and 
> v2-specific test suites.






[jira] [Assigned] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests

2020-12-09 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33558:
---

Assignee: Maxim Gekk

> Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Extract the ALTER TABLE .. ADD PARTITION tests to a common place so they run 
> against both v1 and v2 datasources. Some tests can be placed in v1- and 
> v2-specific test suites.






[jira] [Resolved] (SPARK-33724) Allow decommissioning script location to be configured

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33724.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30694
[https://github.com/apache/spark/pull/30694]

> Allow decommissioning script location to be configured
> --
>
> Key: SPARK-33724
> URL: https://issues.apache.org/jira/browse/SPARK-33724
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Trivial
> Fix For: 3.2.0
>
>
> Some people don't use the Spark image tool and instead use custom volume 
> mounts to make Spark available. As such, the hard-coded path does not work 
> well for them.






[jira] [Commented] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247009#comment-17247009
 ] 

Apache Spark commented on SPARK-33732:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30700

> Kubernetes integration tests don't work with Minikube 1.9+
> 
>
> Key: SPARK-33732
> URL: https://issues.apache.org/jira/browse/SPARK-33732
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Kubernetes integration tests don't work with Minikube 1.9+.
> This is because the location of apiserver.crt and apiserver.key has changed.






[jira] [Assigned] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33732:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Kubernetes integration tests don't work with Minikube 1.9+
> 
>
> Key: SPARK-33732
> URL: https://issues.apache.org/jira/browse/SPARK-33732
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Kubernetes integration tests don't work with Minikube 1.9+.
> This is because the location of apiserver.crt and apiserver.key has changed.






[jira] [Assigned] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33732:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Kubernetes integration tests don't work with Minikube 1.9+
> 
>
> Key: SPARK-33732
> URL: https://issues.apache.org/jira/browse/SPARK-33732
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> Kubernetes integration tests don't work with Minikube 1.9+.
> This is because the location of apiserver.crt and apiserver.key has changed.






[jira] [Commented] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247008#comment-17247008
 ] 

Apache Spark commented on SPARK-33732:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30700

> Kubernetes integration tests don't work with Minikube 1.9+
> 
>
> Key: SPARK-33732
> URL: https://issues.apache.org/jira/browse/SPARK-33732
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> Kubernetes integration tests don't work with Minikube 1.9+.
> This is because the location of apiserver.crt and apiserver.key has changed.






[jira] [Created] (SPARK-33732) Kubernetes integration tests don't work with Minikube 1.9+

2020-12-09 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-33732:
--

 Summary: Kubernetes integration tests don't work with Minikube 
1.9+
 Key: SPARK-33732
 URL: https://issues.apache.org/jira/browse/SPARK-33732
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes, Tests
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


Kubernetes integration tests don't work with Minikube 1.9+.
This is because the location of apiserver.crt and apiserver.key has changed.






[jira] [Updated] (SPARK-33730) Standardize warning types

2020-12-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33730:
-
Description: 
We should use warnings properly per 
[https://docs.python.org/3/library/warnings.html#warning-categories]

In particular,
 - we should use {{FutureWarning}} instead of {{DeprecationWarning}} in the 
places where warnings should be shown to end users by default.
 - we should __maybe__ think about customizing stacklevel 
([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
does.
 - ...

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:
{code}
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
and value is not None. value will be ignored.")
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
approx_count_distinct instead.", DeprecationWarning)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/functions.py:warnings.warn(
pyspark/sql/pandas/group_ops.py:warnings.warn(
pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
support because failing to access HiveConf, "
{code}
PySpark also prints warnings using {{print}} in some places. We should see 
whether those should be switched to {{warnings.warn}} as well.
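As a minimal sketch of the proposed convention (the function name here is hypothetical, not an actual PySpark API): {{FutureWarning}} is displayed to end users by default, whereas {{DeprecationWarning}} has been hidden outside {{__main__}} since Python 3.2, and {{stacklevel=2}} attributes the warning to the caller's line, as pandas does.

```python
import warnings

def deprecated_api():
    # Hypothetical deprecated function, used only to illustrate the convention.
    # FutureWarning is shown by default; DeprecationWarning is suppressed
    # outside __main__ by Python's default warning filters.
    # stacklevel=2 makes the warning point at the caller's call site.
    warnings.warn(
        "deprecated_api is deprecated; use its replacement instead.",
        FutureWarning,
        stacklevel=2,
    )
    return "ok"

# Capture the warning to demonstrate the category that is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_api()

print(result, caught[0].category.__name__)
```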

  was:
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular,
- we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the 
places we should show the warnings to end-users by default.
- we should __maybe__ think about customizing stacklevel 
(https://docs.python.org/3/library/warnings.html#warnings.warn) like pandas 
does.
- ...

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:

{code}
pyspark/cloudpickle/cloudpickle.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(

[jira] [Updated] (SPARK-33730) Standardize warning types

2020-12-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33730:
-
Description: 
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular,
- we should use {{FutureWarning}} instead of {{DeprecationWarning}} in the 
places where warnings should be shown to end users by default.
- we should __maybe__ think about customizing stacklevel 
(https://docs.python.org/3/library/warnings.html#warnings.warn) like pandas 
does.
- ...

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:

{code}
pyspark/cloudpickle/cloudpickle.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
and value is not None. value will be ignored.")
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
approx_count_distinct instead.", DeprecationWarning)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/functions.py:warnings.warn(
pyspark/sql/pandas/group_ops.py:warnings.warn(
pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
support because failing to access HiveConf, "
{code}

PySpark also prints warnings using {{print}} in some places. We should see 
whether those should be switched to {{warnings.warn}} as well.

  was:
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular, we should use {{FutureWarning}} instead of 
{{DeprecationWarning}} for the places we should show the warnings to end-users 
by default.

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:

{code}
pyspark/cloudpickle/cloudpickle.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(

[jira] [Updated] (SPARK-33730) Standardize warning types

2020-12-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33730:
-
Description: 
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular, we should use {{FutureWarning}} instead of 
{{DeprecationWarning}} in the places where warnings should be shown to end 
users by default.

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:

{code}
pyspark/cloudpickle/cloudpickle.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
and value is not None. value will be ignored.")
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
approx_count_distinct instead.", DeprecationWarning)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/functions.py:warnings.warn(
pyspark/sql/pandas/group_ops.py:warnings.warn(
pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
support because failing to access HiveConf, "
{code}

PySpark also prints warnings using {{print}} in some places. We should see 
whether those should be switched to {{warnings.warn}} as well.

  was:
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular, we should use {{FutureWarning}} instead of 
{{DeprecationWarning}} if we aim to show the warnings by default.

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:

{code}
pyspark/cloudpickle/cloudpickle.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn(

[jira] [Updated] (SPARK-33730) Standardize warning types

2020-12-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33730:
-
Description: 
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular, we should use {{FutureWarning}} instead of 
{{DeprecationWarning}} if we aim to show the warnings by default.

Current warnings are a bit messy and somewhat arbitrary.

To be more explicit, we'll have to fix:

{code}
pyspark/cloudpickle/cloudpickle.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/context.py:warnings.warn(
pyspark/ml/classification.py:warnings.warn("weightCol is 
ignored, "
pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will be 
removed in future versions. Use "
pyspark/mllib/classification.py:warnings.warn(
pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
are false. The model does nothing.")
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/mllib/regression.py:warnings.warn(
pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
pyspark/rdd.py:warnings.warn(
pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
pyspark/shuffle.py:warnings.warn("Please install psutil to have 
better "
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/catalog.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/column.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/context.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn(
pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
and value is not None. value will be ignored.")
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
instead.", DeprecationWarning)
pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
approx_count_distinct instead.", DeprecationWarning)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/conversion.py:warnings.warn(msg)
pyspark/sql/pandas/functions.py:warnings.warn(
pyspark/sql/pandas/group_ops.py:warnings.warn(
pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
support because failing to access HiveConf, "
{code}

PySpark also prints warnings using {{print}} in some places. We should see 
whether those should be switched to {{warnings.warn}} as well.

  was:
We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular, we should use {{FutureWarning}} instead of 
{{DeprecationWarning}} if we aim to show the warnings by default.

Current warnings are a bit messy and somewhat arbitrary.


> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should use warnings properly per 
> https://docs.python.org/3/library/warnings.html#warning-categories
> In particular, we should use {{FutureWarning}} instead of 
> {{DeprecationWarning}} if we aim to show the warnings by default.
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code}
> pyspark/cloudpickle/cloudpickle.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:

[jira] [Commented] (SPARK-33730) Standardize warning types

2020-12-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246997#comment-17246997
 ] 

Hyukjin Kwon commented on SPARK-33730:
--

[~zero323] would you be interested in this?

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> We should use warnings properly per 
> https://docs.python.org/3/library/warnings.html#warning-categories
> In particular, we should use {{FutureWarning}} instead of 
> {{DeprecationWarning}} if we aim to show the warnings by default.
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code}
> pyspark/cloudpickle/cloudpickle.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark also emits some warnings via {{print}} in a few places. We should
> consider switching those to {{warnings.warn}} as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33731) Standardize exception types

2020-12-09 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33731:


 Summary: Standardize exception types
 Key: SPARK-33731
 URL: https://issues.apache.org/jira/browse/SPARK-33731
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


We should:
- have a better hierarchy for exception types
- or at least use the appropriate built-in exception types correctly instead of 
throwing a plain Exception.
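A rough sketch of what such a hierarchy could look like (the class and function names here are hypothetical illustrations, not an actual PySpark API):

```python
class PySparkError(Exception):
    """Hypothetical base class so callers can catch all library errors at once."""

class AnalysisError(PySparkError):
    """Raised when a query fails analysis."""

class IllegalArgumentError(PySparkError, ValueError):
    """Also subclasses ValueError so generic handlers keep working."""

def require_positive(n):
    # Raise a specific subclass instead of a plain Exception so callers
    # can distinguish library errors from everything else.
    if n <= 0:
        raise IllegalArgumentError(f"expected a positive value, got {n}")
    return n

assert require_positive(3) == 3
try:
    require_positive(-1)
except PySparkError as e:
    caught = e
# The same error is catchable both as a library error and as a ValueError.
assert isinstance(caught, IllegalArgumentError)
assert isinstance(caught, ValueError)
```

Multiple inheritance from the built-in types is one common way to keep backward compatibility while tightening the hierarchy.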






[jira] [Created] (SPARK-33730) Standardize warning types

2020-12-09 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-33730:


 Summary: Standardize warning types
 Key: SPARK-33730
 URL: https://issues.apache.org/jira/browse/SPARK-33730
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


We should use warnings properly per 
https://docs.python.org/3/library/warnings.html#warning-categories

In particular, we should use {{FutureWarning}} instead of 
{{DeprecationWarning}} if we aim to show the warnings by default.

Current warnings are a bit messy and somewhat arbitrary.
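The distinction matters because Python's default warning filters hide {{DeprecationWarning}} outside of {{\_\_main\_\_}}, while {{FutureWarning}} is always shown to end users. A minimal sketch of the intended pattern (the function name is a hypothetical illustration, not PySpark code):

```python
import warnings

def old_api():
    # FutureWarning is visible under the default filters, whereas
    # DeprecationWarning would be hidden outside __main__.
    warnings.warn("old_api is deprecated; use new_api instead", FutureWarning)
    return 42

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # capture everything for inspection
    result = old_api()

assert result == 42
assert caught[0].category is FutureWarning
```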






[jira] [Assigned] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33727:


Assignee: Holden Karau

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Holden Karau
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This failure seems 
> consistent on the new Jenkins server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> This has happened multiple times:
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Resolved] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33727.
--
Fix Version/s: 3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 30696
[https://github.com/apache/spark/pull/30696]

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
>
> K8s IT fails with gpg: keyserver receive failed: No name. This failure seems 
> consistent on the new Jenkins server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> This has happened multiple times:
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Updated] (SPARK-33721) Support to use Hive built-in functions by configuration

2020-12-09 Thread chenliang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenliang updated SPARK-33721:
--
Description: 
Hive and Spark SQL engines have many differences in built-in functions. The 
differences between several functions are shown below:
||*built-in functions*||SQL||result of Hive SQL||result of Spark SQL||
|unix_timestamp|{{select unix_timestamp(concat('2020-06-01', ' 24:00:00'));}}|1591027200|NULL|
|to_date|{{select to_date('0000-00-00');}}|0002-11-30|NULL|
|datediff|{{select datediff(CURRENT_DATE, '0000-00-00');}}|737986|NULL|
|collect_set|{{select c1, c2, concat_ws('##', collect_set(c3)) c3_set from bigdata_offline.test_collect_set group by c1, c2;}}
 {{bigdata_offline.test_collect_set contains data:}}
 {{(1, 1, '1'), (1, 1, '2'), (1, 1, '3'), (1, 1, '4'), (1, 1, '5')}}|{{c1  c2  c3_set}}
 {{1   1   2##3##4##5##1}}|{{c1  c2  c3_set}}
 {{1   1   3##1##2##5##4}}|

There is no consensus on which engine is more accurate, and users prefer to 
choose according to their real production environment.

I think we should improve this.

 

Hive version is 1.2.1 

 

  was:
Hive and Spark SQL engines have many differences in built-in functions.The 
differences between several functions are shown below:
||*build-in functions*||SQL|| result of Hive SQL ||result of Spark SQL||
|unix_timestamp|{{select}} {{unix_timestamp(concat(}}{{'2020-06-01'}}{{,  }}{{' 
24:00:00'}}{{));}}|1591027200| NULL|
|to_date|{{select}} {{to_date(}}{{'-00-00'}}{{);}}|0002-11-30| NULL|
|datediff|{{select }}{{datediff(}}{{CURRENT_DATE}}{{, 
}}{{'-00-00'}}{{);}}|737986| NULL|
|collect_set|{{select}}{{c1}}{{,c2}}{{,concat_ws(}}{{'##'}}{{, collect_set(c3)) 
c3_set }}{{from}}{{bigdata_offline.test_collect_set }}{{group }}{{by }}{{c1, 
c2;}}
 {{bigdata_offline.test_collect_set contains data:}}
 {{(1, 1, }}{{'1'}}{{),}}{{(1, 1, }}{{'2'}}{{)}}{{,}}
 {{(1, 1, }}{{'3'}}{{)}}{{,}}{{(1, 1, }}{{'4'}}{{)}}{{,}}
 {{(1, 1, }}{{'5'}}{{)}}|{{c1  c2  c3_set}}
 {{1   1   2##3##4##5##1}}|{{c1  c2  c3_set}}
 {{1   1   3##1##2##5##4}}|

There is no conclusion on which engine is  more accurate. Users prefer to be 
able to make choices according to their real production environment.

I think we should do some improvement for this.

 

Hive version is 1.2.1 

 


> Support to use Hive built-in functions by configuration
> ---
>
> Key: SPARK-33721
> URL: https://issues.apache.org/jira/browse/SPARK-33721
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3, 3.2.0
>Reporter: chenliang
>Priority: Major
>
> Hive and Spark SQL engines have many differences in built-in functions. The 
> differences between several functions are shown below:
> ||*built-in functions*||SQL||result of Hive SQL||result of Spark SQL||
> |unix_timestamp|{{select unix_timestamp(concat('2020-06-01', ' 24:00:00'));}}|1591027200|NULL|
> |to_date|{{select to_date('0000-00-00');}}|0002-11-30|NULL|
> |datediff|{{select datediff(CURRENT_DATE, '0000-00-00');}}|737986|NULL|
> |collect_set|{{select c1, c2, concat_ws('##', collect_set(c3)) c3_set from bigdata_offline.test_collect_set group by c1, c2;}}
>  {{bigdata_offline.test_collect_set contains data:}}
>  {{(1, 1, '1'), (1, 1, '2'), (1, 1, '3'), (1, 1, '4'), (1, 1, '5')}}|{{c1  c2  c3_set}}
>  {{1   1   2##3##4##5##1}}|{{c1  c2  c3_set}}
>  {{1   1   3##1##2##5##4}}|
> There is no consensus on which engine is more accurate, and users prefer to 
> choose according to their real production environment.
> I think we should improve this.
>  
> Hive version is 1.2.1 
>  
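One way to see why Spark returns NULL for inputs like {{'2020-06-01 24:00:00'}} is that its newer parser validates field ranges strictly, whereas Hive's lenient SimpleDateFormat parsing rolls invalid fields over. Python's {{strptime}} behaves like the strict side, so it serves as a rough analogy (illustrative only, not Spark code):

```python
from datetime import datetime

def parse_or_none(text):
    # Strict parsing: hour 24 is out of range, so parsing fails and we
    # return None, analogous to Spark SQL returning NULL.
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

assert parse_or_none("2020-06-01 23:00:00") is not None
# A lenient parser would roll hour 24 over to midnight of the next day;
# a strict one rejects it outright.
assert parse_or_none("2020-06-01 24:00:00") is None
```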






[jira] [Updated] (SPARK-33721) Support to use Hive built-in functions by configuration

2020-12-09 Thread chenliang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenliang updated SPARK-33721:
--
Description: 
Hive and Spark SQL engines have many differences in built-in functions. The 
differences between several functions are shown below:
||*built-in functions*||SQL||result of Hive SQL||result of Spark SQL||
|unix_timestamp|{{select unix_timestamp(concat('2020-06-01', ' 24:00:00'));}}|1591027200|NULL|
|to_date|{{select to_date('0000-00-00');}}|0002-11-30|NULL|
|datediff|{{select datediff(CURRENT_DATE, '0000-00-00');}}|737986|NULL|
|collect_set|{{select c1, c2, concat_ws('##', collect_set(c3)) c3_set from bigdata_offline.test_collect_set group by c1, c2;}}
 {{bigdata_offline.test_collect_set contains data:}}
 {{(1, 1, '1'), (1, 1, '2'), (1, 1, '3'), (1, 1, '4'), (1, 1, '5')}}|{{c1  c2  c3_set}}
 {{1   1   2##3##4##5##1}}|{{c1  c2  c3_set}}
 {{1   1   3##1##2##5##4}}|

There is no consensus on which engine is more accurate, and users prefer to 
choose according to their real production environment.

I think we should improve this.

 

Hive version is 1.2.1 

 

  was:
Hive and Spark SQL engines have many differences in built-in functions.The 
differences between several functions are shown below:
||*build-in functions*||SQL|| result of Hive SQL ||result of Spark SQL||
|unix_timestamp|{{select}} {{unix_timestamp(concat(}}{{'2020-06-01'}}{{,  }}{{' 
24:00:00'}}{{));}}|1591027200| NULL|
|to_date|{{select}} {{to_date(}}{{'-00-00'}}{{);}}|0002-11-30| NULL|
|datediff|{{select }}{{datediff(}}{{CURRENT_DATE}}{{, 
}}{{'-00-00'}}{{);}}|737986| NULL|
|collect_set|{{select}}{{c1}}{{,c2}}{{,concat_ws(}}{{'##'}}{{, collect_set(c3)) 
c3_set }}{{from}}{{bigdata_offline.test_collect_set }}{{group }}{{by }}{{c1, 
c2;}}
 {{bigdata_offline.test_collect_set contains data:}}
 {{(1, 1, }}{{'1'}}{{),}}{{(1, 1, }}{{'2'}}{{)}}{{,}}
 {{(1, 1, }}{{'3'}}{{)}}{{,}}{{(1, 1, }}{{'4'}}{{)}}{{,}}
 {{(1, 1, }}{{'5'}}{{)}}|{{c1  c2  c3_set}}
 {{1   1   2##3##4##5##1}}|{{c1  c2  c3_set}}
 {{1   1   3##1##2##5##4}}|

There is no conclusion on which engine is  more accurate. Users prefer to be 
able to make choices according to their real production environment.

I think we should do some improvement for this.

 

 


> Support to use Hive built-in functions by configuration
> ---
>
> Key: SPARK-33721
> URL: https://issues.apache.org/jira/browse/SPARK-33721
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.3, 3.2.0
>Reporter: chenliang
>Priority: Major
>
> Hive and Spark SQL engines have many differences in built-in functions. The 
> differences between several functions are shown below:
> ||*built-in functions*||SQL||result of Hive SQL||result of Spark SQL||
> |unix_timestamp|{{select unix_timestamp(concat('2020-06-01', ' 24:00:00'));}}|1591027200|NULL|
> |to_date|{{select to_date('0000-00-00');}}|0002-11-30|NULL|
> |datediff|{{select datediff(CURRENT_DATE, '0000-00-00');}}|737986|NULL|
> |collect_set|{{select c1, c2, concat_ws('##', collect_set(c3)) c3_set from bigdata_offline.test_collect_set group by c1, c2;}}
>  {{bigdata_offline.test_collect_set contains data:}}
>  {{(1, 1, '1'), (1, 1, '2'), (1, 1, '3'), (1, 1, '4'), (1, 1, '5')}}|{{c1  c2  c3_set}}
>  {{1   1   2##3##4##5##1}}|{{c1  c2  c3_set}}
>  {{1   1   3##1##2##5##4}}|
> There is no consensus on which engine is more accurate, and users prefer to 
> choose according to their real production environment.
> I think we should improve this.
>  
> Hive version is 1.2.1 
>  






[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33725:
--
Fix Version/s: 2.4.8

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 2.4.8, 3.1.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33725:
--
Fix Version/s: (was: 3.2.0)
   3.1.0

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Reopened] (SPARK-22769) When driver stopping, there is errors: "Could not find CoarseGrainedScheduler" and "RpcEnv already stopped"

2020-12-09 Thread Su Qilong (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Su Qilong reopened SPARK-22769:
---

The original reporter stopped working on this; I have made a new change for it.

> When driver stopping, there is errors: "Could not find 
> CoarseGrainedScheduler" and "RpcEnv already stopped"
> ---
>
> Key: SPARK-22769
> URL: https://issues.apache.org/jira/browse/SPARK-22769
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
>
> I run "spark-sql --master yarn --num-executors 1000 -f createTable.sql". When 
> task is finished, there is a error: org.apache.spark.SparkException: Could 
> not find CoarseGrainedScheduler. I think the log level should be warning, not 
> error.
> {noformat}
> 17/12/12 18:30:16 INFO MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 17/12/12 18:30:16 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() for one-way message.
> org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
> at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:154)
> at 
> org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
> at 
> io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
> at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:367)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:353)
> at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:346)
> at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
> {noformat}
> and another error is :
> {noformat}
> 17/12/12 18:20:44 INFO MemoryStore: MemoryStore cleared
> 17/12/12 18:20:44 INFO BlockManager: BlockManager stopped
> 17/12/12 18:20:44 INFO BlockManagerMaster: BlockManagerMaster stopped
> 17/12/12 18:20:44 ERROR TransportRequestHandler: Error while invoking 
> RpcHandler#receive() for one-way message.
> org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already stopped.
> at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
> at 
> org.apache.spark.rpc.netty.Dispatcher.postOneWayMessage(Dispatcher.scala:134)
> at 
> org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:570)
> at 
> org.apache.spark.network.server.TransportRequestHandler.processOneWayMessage(TransportRequestHandler.java:180)
> at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
> at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at 
> 
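The reporter's point is that races during an orderly shutdown are expected and should be logged at warning level rather than error. An illustrative sketch of that pattern (Python for brevity; Spark's dispatcher is Scala, and these names are hypothetical):

```python
import logging

logger = logging.getLogger("rpc.dispatcher")

def post_message(endpoint, stopped, message):
    # During an orderly shutdown, a stopped RpcEnv or a missing endpoint
    # is expected, so it is logged as a warning rather than an error.
    if stopped:
        logger.warning("RpcEnv already stopped; dropping message %r", message)
        return False
    if endpoint is None:
        logger.warning("Could not find endpoint; dropping message %r", message)
        return False
    logger.info("Delivered %r", message)
    return True

assert post_message(None, True, "shutdown-ack") is False
assert post_message(object(), False, "hello") is True
```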

[jira] [Comment Edited] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2020-12-09 Thread GeoffreyStark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246957#comment-17246957
 ] 

GeoffreyStark edited comment on SPARK-23086 at 12/10/20, 1:59 AM:
--

In the case I encountered before, I later found that Spark was blocked not 
because of high concurrency, but because the NameNode's FoldedTreeSet in 
Hadoop 3.x was 
defective ([HDFS-13671|https://issues.apache.org/jira/browse/HDFS-13671]), 
resulting in extremely unstable RPC, which was the root cause of the Spark 
blocking :)

 


was (Author: gaofeng6):
In the case I encountered before, I checked later that SPark was blocked not in 
the case of high concurrency, but because the NameNode's FoldedTreeset in 
HadoOP3.x was defective, resulting in extremely unstable RPC, which was the 
root cause of SPark blocking:)

> Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
> --
>
> Key: SPARK-23086
> URL: https://issues.apache.org/jira/browse/SPARK-23086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: * Spark 2.2.1
>Reporter: pin_zhang
>Priority: Major
>  Labels: bulk-closed
>
> * Hive metastore is mysql
> * Set hive.server2.thrift.max.worker.threads=500
> create table test (id string ) partitioned by (index int) stored as  
> parquet;
> insert into test  partition (index=1) values('id1');
>  * 100 Clients run SQL“select * from table” on table
>  * Many clients (97%) blocked at HiveExternalCatalog.withClient
>  * Is synchronized expected when only run query against tables?   
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
> waiting for monitor entry [0x4e19a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   - waiting to lock <0xc06a3ba8> (a 
> org.apache.spark.sql.hive.HiveExternalCatalog)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
>   - locked <0xc41ab748> (a 
> org.apache.spark.sql.hive.HiveSessionCatalog)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> 
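The blocking pattern in the stack dump — nearly every client queued on one monitor — can be reproduced with a single coarse lock around read-only lookups (illustrative Python, not Spark's actual Scala code; the names are hypothetical):

```python
import threading
import time

catalog_lock = threading.Lock()  # stands in for the synchronized withClient

def with_client(work):
    # Every metastore call funnels through one lock, so even read-only
    # lookups from concurrent sessions queue behind one another.
    with catalog_lock:
        return work()

def lookup_table():
    time.sleep(0.02)  # simulate one metastore round trip
    return "schema"

start = time.monotonic()
threads = [threading.Thread(target=lambda: with_client(lookup_table))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Five 20 ms lookups serialize to ~100 ms of wall time, mirroring the
# 97% of clients blocked at HiveExternalCatalog.withClient.
assert elapsed >= 0.09
```

A finer-grained design (for example, a read/write lock or per-table locking) would let read-only queries proceed concurrently.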

[jira] [Comment Edited] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2020-12-09 Thread GeoffreyStark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246958#comment-17246958
 ] 

GeoffreyStark edited comment on SPARK-23086 at 12/10/20, 1:58 AM:
--

Sorry, I forgot to mention that I changed my nickname; I am the gaofeng from 
earlier in the comment thread.


was (Author: gaofeng6):
Sorry, I forgot to say, I changed my nickname, I am the Gaofeng in front of the 
comment section

> Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
> --
>
> Key: SPARK-23086
> URL: https://issues.apache.org/jira/browse/SPARK-23086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: * Spark 2.2.1
>Reporter: pin_zhang
>Priority: Major
>  Labels: bulk-closed
>
> * Hive metastore is mysql
> * Set hive.server2.thrift.max.worker.threads=500
> create table test (id string ) partitioned by (index int) stored as  
> parquet;
> insert into test  partition (index=1) values('id1');
>  * 100 Clients run SQL“select * from table” on table
>  * Many clients (97%) blocked at HiveExternalCatalog.withClient
>  * Is synchronized expected when only run query against tables?   
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
> waiting for monitor entry [0x4e19a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   - waiting to lock <0xc06a3ba8> (a 
> org.apache.spark.sql.hive.HiveExternalCatalog)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
>   - locked <0xc41ab748> (a 
> org.apache.spark.sql.hive.HiveSessionCatalog)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   - locked <0xff491c48> (a 
> org.apache.spark.sql.execution.QueryExecution)
>   at 
> 

[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2020-12-09 Thread GeoffreyStark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246958#comment-17246958
 ] 

GeoffreyStark commented on SPARK-23086:
---

Sorry, I forgot to mention: I changed my nickname. I am the Gaofeng from the 
earlier comments.

> Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
> --
>
> Key: SPARK-23086
> URL: https://issues.apache.org/jira/browse/SPARK-23086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: * Spark 2.2.1
>Reporter: pin_zhang
>Priority: Major
>  Labels: bulk-closed
>
> * Hive metastore is mysql
> * Set hive.server2.thrift.max.worker.threads=500
> create table test (id string ) partitioned by (index int) stored as  
> parquet;
> insert into test  partition (index=1) values('id1');
>  * 100 Clients run SQL“select * from table” on table
>  * Many clients (97%) blocked at HiveExternalCatalog.withClient
>  * Is synchronized expected when only run query against tables?   
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
> waiting for monitor entry [0x4e19a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   - waiting to lock <0xc06a3ba8> (a 
> org.apache.spark.sql.hive.HiveExternalCatalog)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
>   - locked <0xc41ab748> (a 
> org.apache.spark.sql.hive.HiveSessionCatalog)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   - locked <0xff491c48> (a 
> org.apache.spark.sql.execution.QueryExecution)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
>   at 
> org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
>   

[jira] [Commented] (SPARK-23086) Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog

2020-12-09 Thread GeoffreyStark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246957#comment-17246957
 ] 

GeoffreyStark commented on SPARK-23086:
---

In the case I encountered earlier, I later found that Spark was blocked not 
because of high concurrency, but because the NameNode's FoldedTreeSet in 
Hadoop 3.x was defective, resulting in extremely unstable RPC. That was the 
root cause of the Spark blocking :)
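The blocking pattern in the quoted thread dump comes down to one coarse-grained monitor serializing every metastore lookup. Below is a minimal, hypothetical sketch (plain Scala, not Spark code; `withClient` here is only a stand-in for `HiveExternalCatalog.withClient`) of why many concurrent readers queue up behind a single lock:

```scala
import java.util.concurrent.atomic.AtomicInteger

object LockContentionSketch {
  private val metastoreLock = new Object
  private val completed = new AtomicInteger(0)

  // Hypothetical stand-in for HiveExternalCatalog.withClient: every call
  // serializes on one shared monitor, so lookups cannot overlap.
  def withClient[A](body: => A): A = metastoreLock.synchronized(body)

  // Launch n concurrent "queries"; each takes the global lock for a
  // simulated metastore round trip. Returns how many completed.
  def run(n: Int): Int = {
    completed.set(0)
    val threads = (1 to n).map { _ =>
      new Thread(() => withClient {
        Thread.sleep(5) // simulated metastore round trip held under the lock
        completed.incrementAndGet()
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    completed.get()
  }

  def main(args: Array[String]): Unit = {
    println(run(100))
  }
}
```

Because only one thread holds the monitor at a time, wall-clock time grows roughly linearly with the number of clients, which matches a dump where almost all clients sit BLOCKED waiting for the catalog monitor.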

> Spark SQL cannot support high concurrency for lock in HiveMetastoreCatalog
> --
>
> Key: SPARK-23086
> URL: https://issues.apache.org/jira/browse/SPARK-23086
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: * Spark 2.2.1
>Reporter: pin_zhang
>Priority: Major
>  Labels: bulk-closed
>
> * Hive metastore is mysql
> * Set hive.server2.thrift.max.worker.threads=500
> create table test (id string ) partitioned by (index int) stored as  
> parquet;
> insert into test  partition (index=1) values('id1');
>  * 100 Clients run SQL“select * from table” on table
>  * Many clients (97%) blocked at HiveExternalCatalog.withClient
>  * Is synchronized expected when only run query against tables?   
> "pool-21-thread-65" #1178 prio=5 os_prio=0 tid=0x2aaac8e06800 nid=0x1e70 
> waiting for monitor entry [0x4e19a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   - waiting to lock <0xc06a3ba8> (a 
> org.apache.spark.sql.hive.HiveExternalCatalog)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:674)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupRelation(SessionCatalog.scala:667)
>   - locked <0xc41ab748> (a 
> org.apache.spark.sql.hive.HiveSessionCatalog)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:646)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:601)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:631)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:61)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:59)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:624)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:570)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>   at scala.collection.immutable.List.foldLeft(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:74)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   - locked <0xff491c48> (a 
> org.apache.spark.sql.execution.QueryExecution)
>   at 
> 

[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246946#comment-17246946
 ] 

Hyukjin Kwon commented on SPARK-33713:
--

Yeah, I think it's fine. +1 no worries!

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed the `hive-1.2` profile in branch-3.1, so we can simplify the 
> Jenkins job titles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246945#comment-17246945
 ] 

Hyukjin Kwon commented on SPARK-33727:
--

I am facing this error locally too. Thanks [~dongjoon], [~holden] and 
[~shaneknapp] for addressing this issue.

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This seems to 
> happen consistently on the new Jenkins server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> It happens multiple times.
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Commented] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246942#comment-17246942
 ] 

Apache Spark commented on SPARK-33729:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30699

> When refreshing cache, Spark should not use cached plan when recaching data
> ---
>
> Key: SPARK-33729
> URL: https://issues.apache.org/jira/browse/SPARK-33729
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently, when the cache is refreshed, e.g. via the "REFRESH TABLE" command, 
> Spark calls the {{refreshTable}} method within {{CatalogImpl}}.
> {code}
>   override def refreshTable(tableName: String): Unit = {
> val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
> val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
> val table = sparkSession.table(tableIdent)
> if (tableMetadata.tableType == CatalogTableType.VIEW) {
>   // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
>   // in the plan recursively.
>   table.queryExecution.analyzed.refresh()
> } else {
>   // Non-temp tables: refresh the metadata cache.
>   sessionCatalog.refreshTable(tableIdent)
> }
> // If this table is cached as an InMemoryRelation, drop the original
> // cached version and make the new version cached lazily.
> val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)
> // uncache the logical plan.
> // note this is a no-op for the table itself if it's not cached, but will 
> invalidate all
> // caches referencing this table.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
> if (cache.nonEmpty) {
>   // save the cache name and cache level for recreation
>   val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
>   val cacheLevel = 
> cache.get.cachedRepresentation.cacheBuilder.storageLevel
>   // recache with the same name and cache level.
>   sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, 
> cacheLevel)
> }
>   }
> {code}
> Note that the {{table}} is created before the table relation cache is 
> cleared, and used later in {{cacheQuery}}. This is incorrect since it still 
> refers to the cached table relation, which could be stale.






[jira] [Commented] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246941#comment-17246941
 ] 

Apache Spark commented on SPARK-33729:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30699

> When refreshing cache, Spark should not use cached plan when recaching data
> ---
>
> Key: SPARK-33729
> URL: https://issues.apache.org/jira/browse/SPARK-33729
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently, when the cache is refreshed, e.g. via the "REFRESH TABLE" command, 
> Spark calls the {{refreshTable}} method within {{CatalogImpl}}.
> {code}
>   override def refreshTable(tableName: String): Unit = {
> val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
> val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
> val table = sparkSession.table(tableIdent)
> if (tableMetadata.tableType == CatalogTableType.VIEW) {
>   // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
>   // in the plan recursively.
>   table.queryExecution.analyzed.refresh()
> } else {
>   // Non-temp tables: refresh the metadata cache.
>   sessionCatalog.refreshTable(tableIdent)
> }
> // If this table is cached as an InMemoryRelation, drop the original
> // cached version and make the new version cached lazily.
> val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)
> // uncache the logical plan.
> // note this is a no-op for the table itself if it's not cached, but will 
> invalidate all
> // caches referencing this table.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
> if (cache.nonEmpty) {
>   // save the cache name and cache level for recreation
>   val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
>   val cacheLevel = 
> cache.get.cachedRepresentation.cacheBuilder.storageLevel
>   // recache with the same name and cache level.
>   sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, 
> cacheLevel)
> }
>   }
> {code}
> Note that the {{table}} is created before the table relation cache is 
> cleared, and used later in {{cacheQuery}}. This is incorrect since it still 
> refers to the cached table relation, which could be stale.






[jira] [Assigned] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33729:


Assignee: Apache Spark

> When refreshing cache, Spark should not use cached plan when recaching data
> ---
>
> Key: SPARK-33729
> URL: https://issues.apache.org/jira/browse/SPARK-33729
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
>
> Currently, when the cache is refreshed, e.g. via the "REFRESH TABLE" command, 
> Spark calls the {{refreshTable}} method within {{CatalogImpl}}.
> {code}
>   override def refreshTable(tableName: String): Unit = {
> val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
> val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
> val table = sparkSession.table(tableIdent)
> if (tableMetadata.tableType == CatalogTableType.VIEW) {
>   // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
>   // in the plan recursively.
>   table.queryExecution.analyzed.refresh()
> } else {
>   // Non-temp tables: refresh the metadata cache.
>   sessionCatalog.refreshTable(tableIdent)
> }
> // If this table is cached as an InMemoryRelation, drop the original
> // cached version and make the new version cached lazily.
> val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)
> // uncache the logical plan.
> // note this is a no-op for the table itself if it's not cached, but will 
> invalidate all
> // caches referencing this table.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
> if (cache.nonEmpty) {
>   // save the cache name and cache level for recreation
>   val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
>   val cacheLevel = 
> cache.get.cachedRepresentation.cacheBuilder.storageLevel
>   // recache with the same name and cache level.
>   sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, 
> cacheLevel)
> }
>   }
> {code}
> Note that the {{table}} is created before the table relation cache is 
> cleared, and used later in {{cacheQuery}}. This is incorrect since it still 
> refers to the cached table relation, which could be stale.






[jira] [Assigned] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33729:


Assignee: (was: Apache Spark)

> When refreshing cache, Spark should not use cached plan when recaching data
> ---
>
> Key: SPARK-33729
> URL: https://issues.apache.org/jira/browse/SPARK-33729
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently, when the cache is refreshed, e.g. via the "REFRESH TABLE" command, 
> Spark calls the {{refreshTable}} method within {{CatalogImpl}}.
> {code}
>   override def refreshTable(tableName: String): Unit = {
> val tableIdent = 
> sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
> val tableMetadata = 
> sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
> val table = sparkSession.table(tableIdent)
> if (tableMetadata.tableType == CatalogTableType.VIEW) {
>   // Temp or persistent views: refresh (or invalidate) any metadata/data 
> cached
>   // in the plan recursively.
>   table.queryExecution.analyzed.refresh()
> } else {
>   // Non-temp tables: refresh the metadata cache.
>   sessionCatalog.refreshTable(tableIdent)
> }
> // If this table is cached as an InMemoryRelation, drop the original
> // cached version and make the new version cached lazily.
> val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)
> // uncache the logical plan.
> // note this is a no-op for the table itself if it's not cached, but will 
> invalidate all
> // caches referencing this table.
> sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)
> if (cache.nonEmpty) {
>   // save the cache name and cache level for recreation
>   val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
>   val cacheLevel = 
> cache.get.cachedRepresentation.cacheBuilder.storageLevel
>   // recache with the same name and cache level.
>   sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, 
> cacheLevel)
> }
>   }
> {code}
> Note that the {{table}} is created before the table relation cache is 
> cleared, and used later in {{cacheQuery}}. This is incorrect since it still 
> refers to the cached table relation, which could be stale.






[jira] [Created] (SPARK-33729) When refreshing cache, Spark should not use cached plan when recaching data

2020-12-09 Thread Chao Sun (Jira)
Chao Sun created SPARK-33729:


 Summary: When refreshing cache, Spark should not use cached plan 
when recaching data
 Key: SPARK-33729
 URL: https://issues.apache.org/jira/browse/SPARK-33729
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Chao Sun


Currently, when the cache is refreshed, e.g. via the "REFRESH TABLE" command, 
Spark calls the {{refreshTable}} method within {{CatalogImpl}}.

{code}
  override def refreshTable(tableName: String): Unit = {
val tableIdent = 
sparkSession.sessionState.sqlParser.parseTableIdentifier(tableName)
val tableMetadata = 
sessionCatalog.getTempViewOrPermanentTableMetadata(tableIdent)
val table = sparkSession.table(tableIdent)

if (tableMetadata.tableType == CatalogTableType.VIEW) {
  // Temp or persistent views: refresh (or invalidate) any metadata/data 
cached
  // in the plan recursively.
  table.queryExecution.analyzed.refresh()
} else {
  // Non-temp tables: refresh the metadata cache.
  sessionCatalog.refreshTable(tableIdent)
}

// If this table is cached as an InMemoryRelation, drop the original
// cached version and make the new version cached lazily.
val cache = sparkSession.sharedState.cacheManager.lookupCachedData(table)

// uncache the logical plan.
// note this is a no-op for the table itself if it's not cached, but will 
invalidate all
// caches referencing this table.
sparkSession.sharedState.cacheManager.uncacheQuery(table, cascade = true)

if (cache.nonEmpty) {
  // save the cache name and cache level for recreation
  val cacheName = cache.get.cachedRepresentation.cacheBuilder.tableName
  val cacheLevel = cache.get.cachedRepresentation.cacheBuilder.storageLevel

  // recache with the same name and cache level.
  sparkSession.sharedState.cacheManager.cacheQuery(table, cacheName, 
cacheLevel)
}
  }
{code}

Note that the {{table}} is created before the table relation cache is cleared, 
and used later in {{cacheQuery}}. This is incorrect since it still refers to 
the cached table relation, which could be stale.
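The ordering problem can be modeled without any Spark dependency. In this toy sketch (hypothetical names, not the actual patch), "resolving" a table snapshots the current cache entry into the plan, so a plan captured before the uncache keeps pointing at the stale entry, while re-resolving after the uncache does not:

```scala
// Toy model of the capture-before-invalidate ordering in refreshTable.
// Hypothetical names; no Spark dependency.
object RefreshOrderSketch {
  final case class Plan(cachedVersion: Option[Int])

  // Pretend version 1 of the table data is currently cached.
  var cacheVersion: Option[Int] = Some(1)

  // "Resolution" bakes the current cache entry into the returned plan,
  // analogous to a resolved plan picking up an InMemoryRelation.
  def resolve(): Plan = Plan(cacheVersion)

  // Analogous to uncacheQuery: drops the cached entry.
  def uncache(): Unit = cacheVersion = None

  def main(args: Array[String]): Unit = {
    // Buggy order: resolve first, uncache second, then reuse the old plan.
    val captured = resolve()
    uncache()
    assert(captured.cachedVersion.contains(1)) // plan still holds stale entry

    // Fixed order: uncache first, then resolve a fresh plan.
    cacheVersion = Some(1)
    uncache()
    val fresh = resolve()
    assert(fresh.cachedVersion.isEmpty) // no stale reference survives
    println("ok")
  }
}
```

A fix along these lines would re-resolve the table after {{uncacheQuery}} and pass the fresh plan to {{cacheQuery}}, assuming re-resolution picks up the invalidated cache state.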






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246934#comment-17246934
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30698

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246932#comment-17246932
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30698

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246931#comment-17246931
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30697

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246930#comment-17246930
 ] 

Apache Spark commented on SPARK-33727:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30696

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This seems to 
> happen consistently on the new Jenkins server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> It happens multiple times.
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246929#comment-17246929
 ] 

Apache Spark commented on SPARK-33727:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30696

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
> consistent in the new Jenkins Server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It locally works on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> It happens multiple times.
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Assigned] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33727:


Assignee: (was: Apache Spark)

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Assigned] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33727:


Assignee: Apache Spark

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246928#comment-17246928
 ] 

Shane Knapp commented on SPARK-33727:
-

[~dongjoon] the build that i looked at 
([https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37120/)]
 ran on one of the workers that's been set up for over a year, and hasn't had 
any network changes done to it since.
{noformat}
¯\_(ツ)_/¯{noformat}
i'd tend to agree w/holden's observation – sometimes keyservers are flaky.  
fallbacks are always a good thing.

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Assigned] (SPARK-33728) Improve error messages during K8s integration test failure

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33728:


Assignee: (was: Apache Spark)

> Improve error messages during K8s integration test failure
> --
>
> Key: SPARK-33728
> URL: https://issues.apache.org/jira/browse/SPARK-33728
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Priority: Trivial
>
> If decommissioning fails it can be hard to debug because we don't have 
> executor logs. Capture some of the executor logs for debugging.






[jira] [Commented] (SPARK-33728) Improve error messages during K8s integration test failure

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246925#comment-17246925
 ] 

Apache Spark commented on SPARK-33728:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30435

> Improve error messages during K8s integration test failure
> --
>
> Key: SPARK-33728
> URL: https://issues.apache.org/jira/browse/SPARK-33728
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Priority: Trivial
>
> If decommissioning fails it can be hard to debug because we don't have 
> executor logs. Capture some of the executor logs for debugging.






[jira] [Assigned] (SPARK-33728) Improve error messages during K8s integration test failure

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33728:


Assignee: Apache Spark

> Improve error messages during K8s integration test failure
> --
>
> Key: SPARK-33728
> URL: https://issues.apache.org/jira/browse/SPARK-33728
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Trivial
>
> If decommissioning fails it can be hard to debug because we don't have 
> executor logs. Capture some of the executor logs for debugging.






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Holden Karau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246923#comment-17246923
 ] 

Holden Karau commented on SPARK-33727:
--

So I've seen the keys.gnupg.net key server be flaky on my machine at home. 
Maybe we could try having it fall back to another key server if that one fails?
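That fallback idea can be sketched as a small loop. Everything below is illustrative, not the actual Spark patch: the `probe` stub stands in for `apt-key adv --keyserver "$server" --recv-key "$key"` so the sketch runs without network access, and the fallback server name is an assumption.

```shell
#!/bin/sh
# Try each keyserver in order; stop at the first one that succeeds.
# `probe` is a stand-in for:
#   apt-key adv --keyserver "$server" --recv-key "$key"
fetch_key_with_fallback() {
    key="$1"; shift
    for server in "$@"; do
        if probe "$server" "$key"; then
            echo "$server"
            return 0
        fi
    done
    echo "all keyservers failed for $key" >&2
    return 1
}

# Stub for demonstration: pretend only the second keyserver is reachable.
probe() { [ "$1" = "keyserver.ubuntu.com" ]; }

picked=$(fetch_key_with_fallback 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' \
    keys.gnupg.net keyserver.ubuntu.com)
echo "picked: $picked"
```

In a Dockerfile this loop would replace the single `apt-key adv` call, so one flaky keyserver no longer fails the whole image build.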

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (SPARK-33728) Improve error messages during K8s integration test failure

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246924#comment-17246924
 ] 

Apache Spark commented on SPARK-33728:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30435

> Improve error messages during K8s integration test failure
> --
>
> Key: SPARK-33728
> URL: https://issues.apache.org/jira/browse/SPARK-33728
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Priority: Trivial
>
> If decommissioning fails it can be hard to debug because we don't have 
> executor logs. Capture some of the executor logs for debugging.






[jira] [Created] (SPARK-33728) Improve error messages during K8s integration test failure

2020-12-09 Thread Holden Karau (Jira)
Holden Karau created SPARK-33728:


 Summary: Improve error messages during K8s integration test failure
 Key: SPARK-33728
 URL: https://issues.apache.org/jira/browse/SPARK-33728
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes, Tests
Affects Versions: 3.1.0, 3.2.0
Reporter: Holden Karau


If decommissioning fails it can be hard to debug because we don't have executor 
logs. Capture some of the executor logs for debugging.
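A minimal sketch of what capturing executor logs on failure could look like. The `kubectl` stub, the `spark-test` namespace, and the `spark-role=executor` label are assumptions made so the sketch runs anywhere without a cluster; the real change lives in the integration-test harness.

```shell
#!/bin/sh
# After a failed K8s integration test, dump the tail of each executor
# pod's log so the failure can be debugged from the CI console output.
collect_executor_logs() {
    ns="$1"
    for pod in $(kubectl get pods -n "$ns" -l spark-role=executor -o name); do
        echo "=== $pod ==="
        kubectl logs -n "$ns" "$pod" --tail=5
    done
}

# Stub kubectl so the sketch is self-contained (no cluster needed).
kubectl() {
    case "$1" in
        get)  printf 'pod/exec-1\npod/exec-2\n' ;;
        logs) echo "sample log line from $4" ;;
    esac
}

out=$(collect_executor_logs spark-test)
echo "$out"
```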






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246899#comment-17246899
 ] 

Dongjoon Hyun commented on SPARK-33727:
---

BTW, K8s IT succeeded 4 hours ago in this run. So, I guess that some workers may 
have different network settings.
- 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37110/console

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246897#comment-17246897
 ] 

Dongjoon Hyun commented on SPARK-33727:
---

The patch landed 28 days ago to recover from a K8s IT failure.
- https://github.com/apache/spark/pull/30130 
([SPARK-33408][SPARK-32354][K8S][R] Use R 3.6.3 in K8s R image and re-enable 
RTestsSuite)

K8s IT has been working correctly for one month with that key server.

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Comment Edited] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246894#comment-17246894
 ] 

Shane Knapp edited comment on SPARK-33727 at 12/9/20, 11:30 PM:


ok, this is not a failure on the jenkins worker – it's happening inside the 
docker container that the build spins up.  just scroll back from the gpg error 
message and you'll see the docker STDOUT as it's trying to build the spark-r 
container.

in fact, from looking at the logs it appears that the command that's actually 
failing in the container setup is:
{noformat}
apt-key adv --keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF'{noformat}
that's causing the following error:
{code:java}
Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net 
--recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: keyserver receive failed: No name
{code}
so, i took a peek at 
./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile,
 and git blame for that specific line points to...

drumroll please...
{code:java}
22baf05a9ec (Dongjoon Hyun 2020-11-12 15:36:31 +0900 32) apt-key adv 
--keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && {code}

 [~dongjoon]

 


was (Author: shaneknapp):
ok, this is not a failure on the jenkins worker – it's happening inside the 
docker container that the build spins up.  just scroll back from the gpg error 
message and you'll see the docker STDOUT as it's trying to build the spark-r 
container.

in fact, from looking at the logs it appears that the command that's actually 
failing in the container setup is:
{noformat}
apt-key adv --keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF'{noformat}
that's causing the following error:
{code:java}
Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net 
--recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: keyserver receive failed: No name
{code}
so, i took a peek at 
./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile,
 and git blame for that specific line points to...

drumroll please...
{code:java}
22baf05a9ec (Dongjoon Hyun  2020-11-12 15:36:31 +0900 32)   apt-key adv 
--keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && \{code}
[~dongjoon]

 

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246894#comment-17246894
 ] 

Shane Knapp commented on SPARK-33727:
-

ok, this is not a failure on the jenkins worker – it's happening inside the 
docker container that the build spins up.  just scroll back from the gpg error 
message and you'll see the docker STDOUT as it's trying to build the spark-r 
container.

in fact, from looking at the logs it appears that the command that's actually 
failing in the container setup is:
{noformat}
apt-key adv --keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF'{noformat}
that's causing the following error:
{code:java}
Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net 
--recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: keyserver receive failed: No name
{code}
so, i took a peek at 
./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile,
 and git blame for that specific line points to...

drumroll please...
{code:java}
22baf05a9ec (Dongjoon Hyun  2020-11-12 15:36:31 +0900 32)   apt-key adv 
--keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF' && \{code}
[~dongjoon]

 

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246891#comment-17246891
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30695

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246890#comment-17246890
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30695

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246883#comment-17246883
 ] 

Dongjoon Hyun commented on SPARK-33727:
---

Hi, [~shaneknapp]. Could you try the `gpg` command on the Jenkins server? Is it 
working? I'm wondering if there was some network security change there.

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
> consistent in the new Jenkins Server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> It happens multiple times.
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Commented] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246884#comment-17246884
 ] 

Dongjoon Hyun commented on SPARK-33727:
---

cc [~hyukjin.kwon]

> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
> consistent in the new Jenkins Server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> It happens multiple times.
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Updated] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33727:
--
Description: 
K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
consistent in the new Jenkins Server.

{code}
Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net 
--recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: keyserver receive failed: No name
The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   apt-key 
adv --keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt install 
-y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' returned a 
non-zero code: 2
{code}

It works locally on Mac.
{code}
$ gpg1 --keyserver keys.gnupg.net --recv-key 
E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: requesting key 256A04AF from hkp server keys.gnupg.net
gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
" imported
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
gpg: Total number processed: 1
gpg:   imported: 1  (RSA: 1)
{code}


It happens multiple times.
- https://github.com/apache/spark/pull/30693
- https://github.com/apache/spark/pull/30694

  was:
K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
consistent in the new Jenkins Server.

{code}
Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net 
--recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: keyserver receive failed: No name
The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   apt-key 
adv --keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt install 
-y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' returned a 
non-zero code: 2
{code}

It works locally on Mac.
{code}
$ gpg1 --keyserver keys.gnupg.net --recv-key 
E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: requesting key 256A04AF from hkp server keys.gnupg.net
gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
" imported
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
gpg: Total number processed: 1
gpg:   imported: 1  (RSA: 1)
{code}


> `gpg: keyserver receive failed: No name` during K8s IT
> --
>
> Key: SPARK-33727
> URL: https://issues.apache.org/jira/browse/SPARK-33727
> Project: Spark
>  Issue Type: Task
>  Components: Kubernetes, Project Infra, Tests
>Affects Versions: 3.0.2, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
> consistent in the new Jenkins Server.
> {code}
> Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver 
> keys.gnupg.net --recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: keyserver receive failed: No name
> The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
> buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   
> apt-key adv --keyserver keys.gnupg.net --recv-key 
> 'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt 
> install -y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' 
> returned a non-zero code: 2
> {code}
> It works locally on Mac.
> {code}
> $ gpg1 --keyserver keys.gnupg.net --recv-key 
> E19F5F87128899B192B1A2C2AD5F960A256A04AF
> gpg: requesting key 256A04AF from hkp server keys.gnupg.net
> gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
> " imported
> gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
> gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
> gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
> gpg: Total number processed: 1
> gpg:   imported: 1  (RSA: 1)
> {code}
> It happens multiple times.
> - https://github.com/apache/spark/pull/30693
> - https://github.com/apache/spark/pull/30694






[jira] [Created] (SPARK-33727) `gpg: keyserver receive failed: No name` during K8s IT

2020-12-09 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33727:
-

 Summary: `gpg: keyserver receive failed: No name` during K8s IT
 Key: SPARK-33727
 URL: https://issues.apache.org/jira/browse/SPARK-33727
 Project: Spark
  Issue Type: Task
  Components: Kubernetes, Project Infra, Tests
Affects Versions: 3.0.2, 3.1.0, 3.2.0
Reporter: Dongjoon Hyun


K8s IT fails with gpg: keyserver receive failed: No name. This seems to be 
consistent in the new Jenkins Server.

{code}
Executing: /tmp/apt-key-gpghome.gGqC9RwptN/gpg.1.sh --keyserver keys.gnupg.net 
--recv-key E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: keyserver receive failed: No name
The command '/bin/sh -c echo "deb http://cloud.r-project.org/bin/linux/debian 
buster-cran35/" >> /etc/apt/sources.list &&   apt install -y gnupg &&   apt-key 
adv --keyserver keys.gnupg.net --recv-key 
'E19F5F87128899B192B1A2C2AD5F960A256A04AF' &&   apt-get update &&   apt install 
-y -t buster-cran35 r-base r-base-dev &&   rm -rf /var/cache/apt/*' returned a 
non-zero code: 2
{code}

It works locally on Mac.
{code}
$ gpg1 --keyserver keys.gnupg.net --recv-key 
E19F5F87128899B192B1A2C2AD5F960A256A04AF
gpg: requesting key 256A04AF from hkp server keys.gnupg.net
gpg: key 256A04AF: public key "Johannes Ranke (Wissenschaftlicher Berater) 
" imported
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   2  signed:   1  trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1  valid:   1  signed:   0  trust: 1-, 0q, 0n, 0m, 0f, 0u
gpg: Total number processed: 1
gpg:   imported: 1  (RSA: 1)
{code}
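As an aside, `gpg: keyserver receive failed: No name` is usually a DNS resolution failure, and the `keys.gnupg.net` pool has been known to be unreachable. One possible workaround — an assumption for illustration, not necessarily the fix this ticket settled on — is pointing the Dockerfile at a different public keyserver:

```shell
# Hypothetical Dockerfile fragment: swap the unreachable keys.gnupg.net for
# another public keyserver. hkps://keyserver.ubuntu.com is an assumed
# alternative here, not a confirmed choice from the ticket.
RUN apt-key adv --keyserver hkps://keyserver.ubuntu.com \
      --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
```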






[jira] [Resolved] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33725.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/30690

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.2.0
>
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246858#comment-17246858
 ] 

Sean R. Owen commented on SPARK-33713:
--

I don't think we care much about build history, yes. Don't worry about that.

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed the `hive-1.2` profile in branch-3.1, so we can simplify the 
> Jenkins job names.






[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246850#comment-17246850
 ] 

Shane Knapp commented on SPARK-33713:
-

got it.  gonna rejigger my rejiggering of the JJB configs.

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed the `hive-1.2` profile in branch-3.1, so we can simplify the 
> Jenkins job names.






[jira] [Commented] (SPARK-33724) Allow decommissioning script location to be configured

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246848#comment-17246848
 ] 

Apache Spark commented on SPARK-33724:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30694

> Allow decommissioning script location to be configured
> --
>
> Key: SPARK-33724
> URL: https://issues.apache.org/jira/browse/SPARK-33724
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Trivial
>
> Some people don't use the Spark image tool and instead do custom volume 
> mounts to make Spark available. As such, the hard-coded path does not work 
> well for them.






[jira] [Assigned] (SPARK-33724) Allow decommissioning script location to be configured

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33724:


Assignee: Holden Karau  (was: Apache Spark)

> Allow decommissioning script location to be configured
> --
>
> Key: SPARK-33724
> URL: https://issues.apache.org/jira/browse/SPARK-33724
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Trivial
>
> Some people don't use the Spark image tool and instead do custom volume 
> mounts to make Spark available. As such, the hard-coded path does not work 
> well for them.






[jira] [Assigned] (SPARK-33724) Allow decommissioning script location to be configured

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33724:


Assignee: Apache Spark  (was: Holden Karau)

> Allow decommissioning script location to be configured
> --
>
> Key: SPARK-33724
> URL: https://issues.apache.org/jira/browse/SPARK-33724
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Trivial
>
> Some people don't use the Spark image tool and instead do custom volume 
> mounts to make Spark available. As such, the hard-coded path does not work 
> well for them.






[jira] [Commented] (SPARK-33724) Allow decommissioning script location to be configured

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246847#comment-17246847
 ] 

Apache Spark commented on SPARK-33724:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30694

> Allow decommissioning script location to be configured
> --
>
> Key: SPARK-33724
> URL: https://issues.apache.org/jira/browse/SPARK-33724
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Trivial
>
> Some people don't use the Spark image tool and instead do custom volume 
> mounts to make Spark available. As such, the hard-coded path does not work 
> well for them.






[jira] [Commented] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246846#comment-17246846
 ] 

Apache Spark commented on SPARK-33716:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30693

> Decommissioning Race Condition during Pod Snapshot
> --
>
> Key: SPARK-33716
> URL: https://issues.apache.org/jira/browse/SPARK-33716
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Some versions of Kubernetes may create a deletion timestamp field before 
> changing the pod status to terminating, so a decommissioning node may have a 
> deletion timestamp and a state of running. Depending on when the K8s snapshot 
> comes back, this can cause a race condition, with Spark believing the pod has 
> been deleted before it actually has been.
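The timing window described above can be sketched with plain data (purely illustrative; the field names below are assumptions for the sketch, not Spark's actual snapshot model):

```python
# A pod snapshot where Kubernetes has already stamped the deletion
# timestamp but has not yet flipped the phase out of "Running".
pod = {"deletionTimestamp": "2020-12-09T00:00:00Z", "phase": "Running"}

# Racy check: a deletion timestamp alone is read as "already deleted",
# so Spark would give up on the pod while it is still decommissioning.
deleted_racy = pod["deletionTimestamp"] is not None

# Safer check: only treat the pod as gone once the phase has also left
# "Running", closing the window the snapshot race opens.
deleted_safe = (pod["deletionTimestamp"] is not None
                and pod["phase"] != "Running")

print(deleted_racy, deleted_safe)  # True False
```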






[jira] [Commented] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246845#comment-17246845
 ] 

Apache Spark commented on SPARK-33716:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30693

> Decommissioning Race Condition during Pod Snapshot
> --
>
> Key: SPARK-33716
> URL: https://issues.apache.org/jira/browse/SPARK-33716
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Some versions of Kubernetes may create a deletion timestamp field before 
> changing the pod status to terminating, so a decommissioning node may have a 
> deletion timestamp and a state of running. Depending on when the K8s snapshot 
> comes back, this can cause a race condition, with Spark believing the pod has 
> been deleted before it actually has been.






[jira] [Assigned] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33716:


Assignee: Apache Spark  (was: Holden Karau)

> Decommissioning Race Condition during Pod Snapshot
> --
>
> Key: SPARK-33716
> URL: https://issues.apache.org/jira/browse/SPARK-33716
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>
> Some versions of Kubernetes may create a deletion timestamp field before 
> changing the pod status to terminating, so a decommissioning node may have a 
> deletion timestamp and a state of running. Depending on when the K8s snapshot 
> comes back, this can cause a race condition, with Spark believing the pod has 
> been deleted before it actually has been.






[jira] [Assigned] (SPARK-33716) Decommissioning Race Condition during Pod Snapshot

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33716:


Assignee: Holden Karau  (was: Apache Spark)

> Decommissioning Race Condition during Pod Snapshot
> --
>
> Key: SPARK-33716
> URL: https://issues.apache.org/jira/browse/SPARK-33716
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Some versions of Kubernetes may create a deletion timestamp field before 
> changing the pod status to terminating, so a decommissioning node may have a 
> deletion timestamp and a state of running. Depending on when the K8s snapshot 
> comes back, this can cause a race condition, with Spark believing the pod has 
> been deleted before it actually has been.






[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246833#comment-17246833
 ] 

Dongjoon Hyun commented on SPARK-33713:
---

For master and branch-3.1, we don't need a build history. Also, `master` and 
`branch-3.1` have already been broken for a while. (cc [~hyukjin.kwon] and 
[~srowen])

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed the `hive-1.2` profile in branch-3.1, so we can simplify the 
> Jenkins job names.






[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246830#comment-17246830
 ] 

Dongjoon Hyun commented on SPARK-33713:
---

Oh, this was a request for only `master/branch-3.1` ("Remove `hive-2.3` post 
fix at master/branch-3.1 Jenkins job names").

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed the `hive-1.2` profile in branch-3.1, so we can simplify the 
> Jenkins job names.






[jira] [Updated] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-09 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated SPARK-33726:

Labels: correctness  (was: )

> Duplicate field names causes wrong answers during aggregation
> -
>
> Key: SPARK-33726
> URL: https://issues.apache.org/jira/browse/SPARK-33726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.1
>Reporter: Yian Liou
>Priority: Major
>  Labels: correctness
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause 
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to 
> return a fixed batch when it should have returned a variable batch, leading 
> to wrong results.
> This example produces wrong results in the spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select 
> id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as 
> ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>  
> |*x*|*x*|*ma*|*mb*|
> |-2|2|0|null|
> |-1|1|null|1|
> |0|0|0|0|
>  instead of correct output : 
> |*x*|*x*|*ma*|*mb*|
> |0|0|0|0|
> |-2|2|2|2|
> |-1|1|1|1|
> The issue can be solved by iterating over the fields themselves instead of 
> field names. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246816#comment-17246816
 ] 

Shane Knapp commented on SPARK-33713:
-

also, let me think about other sneaky ways of making this happen w/o needing to 
lose all the build logs...

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed the `hive-1.2` profile in branch-3.1, so we can simplify the 
> Jenkins job names.






[jira] [Created] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-09 Thread Yian Liou (Jira)
Yian Liou created SPARK-33726:
-

 Summary: Duplicate field names causes wrong answers during 
aggregation
 Key: SPARK-33726
 URL: https://issues.apache.org/jira/browse/SPARK-33726
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 2.4.4
Reporter: Yian Liou


We saw this bug at Workday.

Duplicate field names for different fields can cause 
org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to 
return a fixed batch when it should have returned a variable batch, leading to 
wrong results.

This example produces wrong results in the spark shell:

scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id 
as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, 
min(b) as mb from T join U on a=b group by U.x, T.x").show
 
|*x*|*x*|*ma*|*mb*|
|-2|2|0|null|
|-1|1|null|1|
|0|0|0|0|

 instead of correct output : 
|*x*|*x*|*ma*|*mb*|
|0|0|0|0|
|-2|2|2|2|
|-1|1|1|1|

The issue can be solved by iterating over the fields themselves instead of 
field names. 
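The name-keyed lookup problem can be illustrated outside of Spark. A minimal Python sketch (purely illustrative — this is not Spark's actual allocate logic, and the names below are made up):

```python
# Two distinct fields that happen to share the name "x", as in the
# T.x / U.x query above: one string (variable width), one long (fixed width).
fields = [("x", "string"), ("x", "long")]

# Buggy approach: index the fields by name first. The long field shadows
# the string field, so the schema wrongly looks entirely fixed-width.
by_name = {name: dtype for name, dtype in fields}
fixed_via_names = all(dtype == "long" for dtype in by_name.values())

# Correct approach: iterate over the fields themselves, so both fields are
# seen and the variable-width string field is not lost.
fixed_via_fields = all(dtype == "long" for _, dtype in fields)

print(fixed_via_names)   # True  -- wrongly treats the schema as fixed-width
print(fixed_via_fields)  # False -- correctly detects a variable-width field
```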






[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-09 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812
 ] 

Yian Liou commented on SPARK-33726:
---

Will create a PR for the issue.

> Duplicate field names causes wrong answers during aggregation
> -
>
> Key: SPARK-33726
> URL: https://issues.apache.org/jira/browse/SPARK-33726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.1
>Reporter: Yian Liou
>Priority: Major
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause 
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to 
> return a fixed batch when it should have returned a variable batch, leading 
> to wrong results.
> This example produces wrong results in the spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select 
> id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as 
> ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>  
> |*x*|*x*|*ma*|*mb*|
> |-2|2|0|null|
> |-1|1|null|1|
> |0|0|0|0|
>  instead of correct output : 
> |*x*|*x*|*ma*|*mb*|
> |0|0|0|0|
> |-2|2|2|2|
> |-1|1|1|1|
> The issue can be solved by iterating over the fields themselves instead of 
> field names. 






[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246810#comment-17246810
 ] 

Shane Knapp commented on SPARK-33713:
-

fwiw, here's what the new and sexy build names will be:
{noformat}
INFO:jenkins_jobs.builder:Number of jobs generated: 30 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.6' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.6' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-3.2' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-3.2' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-scala-2.13' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-sbt-hadoop-2.7' 
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-sbt-hadoop-3.2'
{noformat}

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed `hive-1.2` profile since branch-3.1. So, we can simplify the 
> Jenkins job title.






[jira] [Comment Edited] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246810#comment-17246810
 ] 

Shane Knapp edited comment on SPARK-33713 at 12/9/20, 8:42 PM:
---

fwiw, here's what the new and sexy build names will be:
{code:java}
INFO:jenkins_jobs.builder:Number of jobs generated:  30
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.6'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-maven-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.6'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-2.4-test-sbt-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-2.7-jdk-11'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-maven-hadoop-3.2-jdk-11'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.0-test-sbt-hadoop-3.2'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-jdk-11-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-2.7-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-jdk-11-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-branch-3.1-test-sbt-hadoop-3.2'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-jdk-11-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-2.7-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-jdk-11-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-maven-hadoop-3.2-scala-2.13'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-sbt-hadoop-2.7'
DEBUG:jenkins_jobs.builder:Writing XML to 
'target/jenkins-xml/spark-master-test-sbt-hadoop-3.2'
{code}


[jira] [Commented] (SPARK-33713) Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names

2020-12-09 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246808#comment-17246808
 ] 

Shane Knapp commented on SPARK-33713:
-

hmm.  so while i heartily endorse this, there will be a side-effect of renaming 
the jobs (particularly as we use Jenkins Job Builder – JJB – to deploy and 
manage the build configs):

once i redeploy the updated JJB configs w/the shortened build names, it will 
create new builds and not change the names of existing ones.  unless i'm unable 
to read docs anymore (which is entirely possible) there's no 'rename' ability.

and since, according to jenkins, these are new builds, all previous build 
history will be lost...  unless i manually copy things over on the jenkins 
primary filesystem (which i'd like to avoid if possible).  given that we're 
only storing 2 weeks of builds, if we time it properly (aka at 3.1 release) the 
impact of losing these logs will be pretty insignificant.

my suggestion:

i want to do this, but i propose moving everything over when 3.1 is officially 
released.   we'll lose the previous 2 weeks of build history across all 
branches, but since we're "starting fresh"-ish i think the impact will be 
minimized.

 

thoughts?  comments?  should we pull anyone else in for their opinions?  [~sowen] 
[~hyukjin.kwon] [~holden]

> Remove `hive-2.3` post fix at master/branch-3.1 Jenkins job names
> -
>
> Key: SPARK-33713
> URL: https://issues.apache.org/jira/browse/SPARK-33713
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 3.2.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
>
> We removed `hive-1.2` profile since branch-3.1. So, we can simplify the 
> Jenkins job title.






[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33725:
--
Affects Version/s: 3.1.0
   2.4.7
   3.0.1

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.7, 3.0.1, 3.1.0, 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Updated] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33725:
--
Issue Type: Bug  (was: Improvement)

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Assigned] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32920:


Assignee: (was: Apache Spark)

> Add support in Spark driver to coordinate the finalization of the push/merge 
> phase in push-based shuffle for a given shuffle and the initiation of the 
> reduce stage
> ---
>
> Key: SPARK-32920
> URL: https://issues.apache.org/jira/browse/SPARK-32920
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
>
> With push-based shuffle, we are currently decoupling map task executions from 
> the shuffle block push process. Thus, when all map tasks finish, we might 
> want to wait for some small extra time to allow more shuffle blocks to get 
> pushed and merged. This requires some extra coordination in the Spark driver 
> when it transitions from a shuffle map stage to the corresponding reduce 
> stage.
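The grace-period coordination described above can be sketched roughly as follows. This is a hypothetical, simplified illustration, not Spark's actual driver code: the class name, method name, and fixed grace period are invented here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: once the last map task of a shuffle map stage
// completes, delay merge finalization and the reduce stage by a short
// grace period so in-flight shuffle-block pushes can still be merged.
public class MergeFinalizationSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Invoked when all map tasks of the stage have finished.
    public void onAllMapTasksFinished(long graceMillis, Runnable finalizeAndStartReduce) {
        // The reduce stage is scheduled only after the grace period elapses.
        scheduler.schedule(finalizeAndStartReduce, graceMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        MergeFinalizationSketch driver = new MergeFinalizationSketch();
        CountDownLatch reduceStarted = new CountDownLatch(1);
        driver.onAllMapTasksFinished(50, reduceStarted::countDown);
        System.out.println(reduceStarted.await(5, TimeUnit.SECONDS)); // prints: true
        driver.scheduler.shutdown();
    }
}
```

The actual change (see the linked pull request) has to integrate this with the DAG scheduler's stage transitions; the sketch only shows the delayed hand-off itself.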






[jira] [Commented] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246804#comment-17246804
 ] 

Apache Spark commented on SPARK-32920:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/30691

> Add support in Spark driver to coordinate the finalization of the push/merge 
> phase in push-based shuffle for a given shuffle and the initiation of the 
> reduce stage
> ---
>
> Key: SPARK-32920
> URL: https://issues.apache.org/jira/browse/SPARK-32920
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Apache Spark
>Priority: Major
>
> With push-based shuffle, we are currently decoupling map task executions from 
> the shuffle block push process. Thus, when all map tasks finish, we might 
> want to wait for some small extra time to allow more shuffle blocks to get 
> pushed and merged. This requires some extra coordination in the Spark driver 
> when it transitions from a shuffle map stage to the corresponding reduce 
> stage.






[jira] [Assigned] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32920:


Assignee: Apache Spark

> Add support in Spark driver to coordinate the finalization of the push/merge 
> phase in push-based shuffle for a given shuffle and the initiation of the 
> reduce stage
> ---
>
> Key: SPARK-32920
> URL: https://issues.apache.org/jira/browse/SPARK-32920
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Apache Spark
>Priority: Major
>
> With push-based shuffle, we are currently decoupling map task executions from 
> the shuffle block push process. Thus, when all map tasks finish, we might 
> want to wait for some small extra time to allow more shuffle blocks to get 
> pushed and merged. This requires some extra coordination in the Spark driver 
> when it transitions from a shuffle map stage to the corresponding reduce 
> stage.






[jira] [Commented] (SPARK-32920) Add support in Spark driver to coordinate the finalization of the push/merge phase in push-based shuffle for a given shuffle and the initiation of the reduce stage

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246803#comment-17246803
 ] 

Apache Spark commented on SPARK-32920:
--

User 'venkata91' has created a pull request for this issue:
https://github.com/apache/spark/pull/30691

> Add support in Spark driver to coordinate the finalization of the push/merge 
> phase in push-based shuffle for a given shuffle and the initiation of the 
> reduce stage
> ---
>
> Key: SPARK-32920
> URL: https://issues.apache.org/jira/browse/SPARK-32920
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
>
> With push-based shuffle, we are currently decoupling map task executions from 
> the shuffle block push process. Thus, when all map tasks finish, we might 
> want to wait for some small extra time to allow more shuffle blocks to get 
> pushed and merged. This requires some extra coordination in the Spark driver 
> when it transitions from a shuffle map stage to the corresponding reduce 
> stage.






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246799#comment-17246799
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30690

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Assigned] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33725:


Assignee: L. C. Hsieh  (was: Apache Spark)

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Commented] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246798#comment-17246798
 ] 

Apache Spark commented on SPARK-33725:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30690

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Assigned] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33725:


Assignee: Apache Spark  (was: L. C. Hsieh)

> Upgrade snappy-java to 1.1.8.2
> --
>
> Key: SPARK-33725
> URL: https://issues.apache.org/jira/browse/SPARK-33725
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> Minor version upgrade that includes:
>  * Fixed an initialization issue when using a recent Mac OS X version #265
>  * Support Apple Silicon (M1, Mac-aarch64)
>  * Fixed the pure-java Snappy fallback logic when no native library for your 
> platform is found.






[jira] [Updated] (SPARK-33261) Allow people to extend the pod feature steps

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33261:
--
Parent: (was: SPARK-33005)
Issue Type: Improvement  (was: Sub-task)

> Allow people to extend the pod feature steps
> 
>
> Key: SPARK-33261
> URL: https://issues.apache.org/jira/browse/SPARK-33261
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> While we allow people to specify pod templates, some deployments could 
> benefit from being able to add a feature step.






[jira] [Updated] (SPARK-33261) Allow people to extend the pod feature steps

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33261:
--
Affects Version/s: (was: 3.1.0)
   3.2.0

> Allow people to extend the pod feature steps
> 
>
> Key: SPARK-33261
> URL: https://issues.apache.org/jira/browse/SPARK-33261
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.2.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> While we allow people to specify pod templates, some deployments could 
> benefit from being able to add a feature step.






[jira] [Created] (SPARK-33725) Upgrade snappy-java to 1.1.8.2

2020-12-09 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-33725:
---

 Summary: Upgrade snappy-java to 1.1.8.2
 Key: SPARK-33725
 URL: https://issues.apache.org/jira/browse/SPARK-33725
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.2.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


Minor version upgrade that includes:
 * Fixed an initialization issue when using a recent Mac OS X version #265
 * Support Apple Silicon (M1, Mac-aarch64)
 * Fixed the pure-java Snappy fallback logic when no native library for your 
platform is found.






[jira] [Created] (SPARK-33724) Allow decommissioning script location to be configured

2020-12-09 Thread Holden Karau (Jira)
Holden Karau created SPARK-33724:


 Summary: Allow decommissioning script location to be configured
 Key: SPARK-33724
 URL: https://issues.apache.org/jira/browse/SPARK-33724
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.1.0, 3.2.0
Reporter: Holden Karau
Assignee: Holden Karau


Some people don't use the Spark image tool and instead do custom volume mounts 
to make Spark available. As such, the hard-coded path does not work well for 
them.






[jira] [Resolved] (SPARK-33722) Handle DELETE in ReplaceNullWithFalseInPredicate

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33722.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30688
[https://github.com/apache/spark/pull/30688]

> Handle DELETE in ReplaceNullWithFalseInPredicate
> 
>
> Key: SPARK-33722
> URL: https://issues.apache.org/jira/browse/SPARK-33722
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.2.0
>
>
> We should handle delete statements in {{ReplaceNullWithFalseInPredicate}}.
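The rewrite is sound because of SQL's three-valued logic: a WHERE or DELETE condition that evaluates to NULL drops the row exactly as FALSE does, so a literal null in predicate position can be replaced with false. A minimal illustration follows; it is plain Java invented for this note, not Spark's optimizer rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: a nullable Boolean models the three predicate
// outcomes TRUE / FALSE / NULL (unknown). A filter or delete keeps a
// row only when the predicate is definitely TRUE, so NULL behaves
// exactly like FALSE in predicate position.
public class NullToFalsePredicate {
    public static boolean keepsRow(Boolean predicateResult) {
        return Boolean.TRUE.equals(predicateResult); // NULL and FALSE both drop the row
    }

    public static void main(String[] args) {
        List<Boolean> outcomes = Arrays.asList(Boolean.TRUE, Boolean.FALSE, null);
        List<Boolean> kept = outcomes.stream()
                .filter(NullToFalsePredicate::keepsRow)
                .collect(Collectors.toList());
        System.out.println(kept); // prints: [true]
    }
}
```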






[jira] [Assigned] (SPARK-33722) Handle DELETE in ReplaceNullWithFalseInPredicate

2020-12-09 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33722:
-

Assignee: Anton Okolnychyi

> Handle DELETE in ReplaceNullWithFalseInPredicate
> 
>
> Key: SPARK-33722
> URL: https://issues.apache.org/jira/browse/SPARK-33722
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>
> We should handle delete statements in {{ReplaceNullWithFalseInPredicate}}.






[jira] [Commented] (SPARK-32110) -0.0 vs 0.0 is inconsistent

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246727#comment-17246727
 ] 

Dongjoon Hyun commented on SPARK-32110:
---

How do you think about the above [~revans2]'s comment, [~cloud_fan]?

> -0.0 vs 0.0 is inconsistent
> ---
>
> Key: SPARK-32110
> URL: https://issues.apache.org/jira/browse/SPARK-32110
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Robert Joseph Evans
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> This is related to SPARK-26021 where some things were fixed but there is 
> still a lot that is not consistent.
> When parsing SQL {{-0.0}} is turned into {{0.0}}. This can produce 
> results that appear to be correct but are totally inconsistent for the same 
> operators.
> {code:java}
> scala> import spark.implicits._
> import spark.implicits._
> scala> spark.sql("SELECT 0.0 = -0.0").collect
> res0: Array[org.apache.spark.sql.Row] = Array([true])
> scala> Seq((0.0, -0.0)).toDF("a", "b").selectExpr("a = b").collect
> res1: Array[org.apache.spark.sql.Row] = Array([false])
> {code}
> This also shows up in sorts
> {code:java}
> scala> Seq((0.0, -100.0), (-0.0, 100.0), (0.0, 100.0), (-0.0, 
> -100.0)).toDF("a", "b").orderBy("a", "b").collect
> res2: Array[org.apache.spark.sql.Row] = Array([-0.0,-100.0], [-0.0,100.0], 
> [0.0,-100.0], [0.0,100.0])
> {code}
> But not for a equi-join or for an aggregate
> {code:java}
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", 
> "r_b"), $"a" === $"r_a").collect
> res3: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0])
> scala> Seq((0.0, 1.0), (-0.0, 1.0)).toDF("a", "b").groupBy("a").count.collect
> res6: Array[org.apache.spark.sql.Row] = Array([0.0,2])
> {code}
> This can lead to some very odd results. Like an equi-join with a filter that 
> logically should do nothing, but ends up filtering the result to nothing.
> {code:java}
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", 
> "r_b"), $"a" === $"r_a" && $"a" <= $"r_a").collect
> res8: Array[org.apache.spark.sql.Row] = Array()
> scala> Seq((0.0, -0.0)).toDF("a", "b").join(Seq((-0.0, 0.0)).toDF("r_a", 
> "r_b"), $"a" === $"r_a").collect
> res9: Array[org.apache.spark.sql.Row] = Array([0.0,-0.0,-0.0,0.0])
> {code}
> Hive never normalizes -0.0 to 0.0 so this results in non-IEEE-compliant 
> behavior everywhere, but at least it is consistently odd.
> MySQL, Oracle, Postgres, and SQLite all appear to normalize the {{-0.0}} to 
> {{0.0}}.
> The root cause of this appears to be that the java implementation of 
> {{Double.compare}} and {{Float.compare}} for open JDK places {{-0.0}} < 
> {{0.0}}.
> This is not documented in the java docs but it is clearly documented in the 
> code, so it is not a "bug" that java is going to fix.
> [https://github.com/openjdk/jdk/blob/a0a0539b0d3f9b6809c9759e697bfafd7b138ec1/src/java.base/share/classes/java/lang/Double.java#L1022-L1035]
> It is also consistent with what is in the java docs for {{Double.equals}}
>  
> [https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#equals-java.lang.Object-]
> To be clear I am filing this mostly to document the current state rather than 
> to think it needs to be fixed ASAP. It is a rare corner case, but ended up 
> being really frustrating for me to debug what was happening.
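The asymmetry described in the report can be reproduced directly in Java. This small standalone example (class name invented here) shows primitive == following IEEE 754 equality, while Double.compare and Double.equals impose a total order that places -0.0 before 0.0:

```java
// Primitive comparison treats -0.0 and 0.0 as equal (IEEE 754),
// while Double.compare / Double.equals distinguish them. Operators
// built on Double.compare (e.g. sorts) therefore see two values
// where an equi-comparison sees one.
public class NegativeZero {
    public static void main(String[] args) {
        System.out.println(-0.0 == 0.0);               // prints: true
        System.out.println(Double.compare(-0.0, 0.0)); // prints: -1
        System.out.println(Double.valueOf(-0.0).equals(Double.valueOf(0.0))); // prints: false
    }
}
```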






[jira] [Commented] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data

2020-12-09 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246725#comment-17246725
 ] 

Dongjoon Hyun commented on SPARK-18105:
---

Apache Spark 3.x is using lz4-java-1.7.1.jar and this seems to be fixed by 
upgrading the dependency, [~cloud_fan]. I'm not aware of any new incident about 
this.

cc [~viirya] since he is working on the codec issue in Hadoop community 
recently.
cc [~sunchao], too


> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Davies Liu
>Priority: Major
>
> When lz4 is used to compress the shuffle files, it may fail to decompress it 
> as "stream is corrupt"
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89






[jira] [Commented] (SPARK-33722) Handle DELETE in ReplaceNullWithFalseInPredicate

2020-12-09 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246722#comment-17246722
 ] 

Apache Spark commented on SPARK-33722:
--

User 'aokolnychyi' has created a pull request for this issue:
https://github.com/apache/spark/pull/30688

> Handle DELETE in ReplaceNullWithFalseInPredicate
> 
>
> Key: SPARK-33722
> URL: https://issues.apache.org/jira/browse/SPARK-33722
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Anton Okolnychyi
>Priority: Major
>
> We should handle delete statements in {{ReplaceNullWithFalseInPredicate}}.
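Not Spark code, but a hypothetical Java sketch of why this rewrite is safe: in a WHERE or DELETE condition, a row is only selected when the predicate evaluates to TRUE, so a predicate result of NULL (unknown) and a result of FALSE reject exactly the same rows. That equivalence is what lets {{ReplaceNullWithFalseInPredicate}} substitute a literal FALSE for NULL in predicate positions.

```java
import java.util.Arrays;
import java.util.List;

public class NullPredicateDemo {
    // Three-valued predicate result: Boolean.TRUE, Boolean.FALSE, or null (unknown).
    // A filter/DELETE condition acts on a row only when the result is TRUE,
    // so null and FALSE are interchangeable as predicate results.
    static boolean passesFilter(Boolean predicateResult) {
        return Boolean.TRUE.equals(predicateResult);
    }

    public static void main(String[] args) {
        // null stands for e.g. "col > 5" evaluated where col IS NULL
        List<Boolean> results = Arrays.asList(Boolean.TRUE, Boolean.FALSE, null);
        for (Boolean r : results) {
            System.out.println(r + " -> kept=" + passesFilter(r));
        }
        // null behaves exactly like FALSE here, which is why the optimizer
        // may rewrite a literal NULL in a predicate to FALSE.
    }
}
```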





