[jira] [Comment Edited] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461230#comment-17461230 ] jingxiong zhong edited comment on SPARK-36088 at 12/17/21, 6:53 AM: In cluster mode I have another question: when python3.6.6.zip is unpacked in the pod, the extracted Python has no execute permission. My submit command is as follows: {code:sh} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} was (Author: JIRAUSER281124): In cluster mode I have another question: when python3.6.6.zip is unpacked in the pod, the extracted Python has no execute permission. My submit command is as follows: {code:shell} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running Spark in a k8s cluster, there are two deploy modes: cluster and > client. In my tests, in cluster mode *spark.archives* extracts the > archive file to the working directory of both the executors and the driver, but in > client mode *spark.archives* only extracts the archive file to the > working directory of the executors. > > However, I need *spark.archives* to ship the conda-packaged virtual environment tar file > to both the driver and the executors in client mode (so that > the executors and the driver share the same Python environment). > > Why does *spark.archives* not extract the archive file into the working > directory of the driver in client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461230#comment-17461230 ] jingxiong zhong commented on SPARK-36088: - In cluster mode I have another question: when python3.6.6.zip is unpacked in the pod, the extracted Python has no execute permission. My submit command is as follows: {code:shell} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running Spark in a k8s cluster, there are two deploy modes: cluster and > client. In my tests, in cluster mode *spark.archives* extracts the > archive file to the working directory of both the executors and the driver, but in > client mode *spark.archives* only extracts the archive file to the > working directory of the executors. > > However, I need *spark.archives* to ship the conda-packaged virtual environment tar file > to both the driver and the executors in client mode (so that > the executors and the driver share the same Python environment). > > Why does *spark.archives* not extract the archive file into the working > directory of the driver in client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
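If the archive itself drops the execute bit (zip files often do not carry POSIX permissions through extraction), one user-level workaround is to restore the bit on the unpacked interpreter before pointing spark.pyspark.python at it, for example from a step that still runs under the image's system Python. This is a rough sketch only; the path below is a hypothetical guess based on the layout in the command above:
{code:python}
import os
import stat

# Hypothetical unpack location, mirroring python3.6.6.zip#python3.6.6 above;
# adjust to wherever the archive actually lands inside the pod.
env_bin = "python3.6.6/python3.6.6/bin"

for name in os.listdir(env_bin):
    path = os.path.join(env_bin, name)
    if os.path.isfile(path):
        mode = os.stat(path).st_mode
        # Re-add the owner/group/other execute bits that zip extraction may have dropped.
        os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
{code}
Another route described in the Spark documentation for Python dependency management is to package the environment with conda-pack as a .tar.gz and pass that to --archives, since tar archives preserve permission bits.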
[jira] [Commented] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461221#comment-17461221 ] Apache Spark commented on SPARK-37673: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/34932 > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37673: Assignee: (was: Apache Spark) > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461220#comment-17461220 ] Apache Spark commented on SPARK-37673: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/34932 > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37673) Implement `ps.timedelta_range` method
[ https://issues.apache.org/jira/browse/SPARK-37673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37673: Assignee: Apache Spark > Implement `ps.timedelta_range` method > - > > Key: SPARK-37673 > URL: https://issues.apache.org/jira/browse/SPARK-37673 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37673) Implement `ps.timedelta_range` method
Xinrong Meng created SPARK-37673: Summary: Implement `ps.timedelta_range` method Key: SPARK-37673 URL: https://issues.apache.org/jira/browse/SPARK-37673 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng Implement `ps.timedelta_range` method -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
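For context, the proposed ps.timedelta_range mirrors an API that already exists in pandas; below is a quick sketch of the expected call shape using plain pandas (the pandas-on-Spark counterpart is exactly what this ticket adds, so it is not used here):
{code:python}
import pandas as pd

# The pandas function that ps.timedelta_range is intended to mirror:
# a fixed-frequency TimedeltaIndex spanning a range of offsets.
idx = pd.timedelta_range(start="1 day", periods=4, freq="6H")
print(idx)
# TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00',
#                 '1 days 12:00:00', '1 days 18:00:00'],
#                dtype='timedelta64[ns]', freq='6H')
{code}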
[jira] [Assigned] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37657: Assignee: (was: Apache Spark) > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37657: Assignee: Apache Spark > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461205#comment-17461205 ] Apache Spark commented on SPARK-37657: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/34931 > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
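For comparison, the same call in plain pandas does not raise on a string-only frame; it falls back to object-dtype statistics (count/unique/top/freq), which is the behavior the report asks pandas API on Spark to match. A small sketch:
{code:python}
import pandas as pd

# describe() on a frame with no numeric column summarizes the object columns
# instead of raising.
df = pd.DataFrame({"a": ["a", "b", "c"]})
print(df.describe())
#         a
# count   3
# unique  3
# top     a
# freq    1
{code}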
[jira] [Commented] (SPARK-37672) Support ANSI Aggregate Function: regr_sxx
[ https://issues.apache.org/jira/browse/SPARK-37672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461204#comment-17461204 ] jiaan.geng commented on SPARK-37672: I'm working on it. > Support ANSI Aggregate Function: regr_sxx > - > > Key: SPARK-37672 > URL: https://issues.apache.org/jira/browse/SPARK-37672 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_SXX is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37590) Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests
[ https://issues.apache.org/jira/browse/SPARK-37590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461202#comment-17461202 ] Apache Spark commented on SPARK-37590: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34930 > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests > > > Key: SPARK-37590 > URL: https://issues.apache.org/jira/browse/SPARK-37590 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37590) Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests
[ https://issues.apache.org/jira/browse/SPARK-37590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461201#comment-17461201 ] Apache Spark commented on SPARK-37590: -- User 'imback82' has created a pull request for this issue: https://github.com/apache/spark/pull/34930 > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests > > > Key: SPARK-37590 > URL: https://issues.apache.org/jira/browse/SPARK-37590 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > Unify v1 and v2 ALTER NAMESPACE ... SET PROPERTIES tests -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37657) Support str and timestamp for (Series|DataFrame).describe()
[ https://issues.apache.org/jira/browse/SPARK-37657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-37657: Summary: Support str and timestamp for (Series|DataFrame).describe() (was: Fix the bug in ps.(Series|DataFrame).describe()) > Support str and timestamp for (Series|DataFrame).describe() > --- > > Key: SPARK-37657 > URL: https://issues.apache.org/jira/browse/SPARK-37657 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > Initialized in Koalas issue: > [https://github.com/databricks/koalas/issues/1888] > > The `(Series|DataFrame).describe()` in pandas API on Spark doesn't work > properly when DataFrame has no numeric column. > > > {code:java} > >>> df = ps.DataFrame({'a': ["a", "b", "c"]}) > >>> df.describe() > Traceback (most recent call last): > File "", line 1, in > File "/.../python/pyspark/pandas/frame.py", line 7582, in describe > raise ValueError("Cannot describe a DataFrame without columns") > ValueError: Cannot describe a DataFrame without columns > {code} > > As it works fine in pandas, we should fix it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37672) Support ANSI Aggregate Function: regr_sxx
jiaan.geng created SPARK-37672: -- Summary: Support ANSI Aggregate Function: regr_sxx Key: SPARK-37672 URL: https://issues.apache.org/jira/browse/SPARK-37672 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng REGR_SXX is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37641) Support ANSI Aggregate Function: regr_r2
[ https://issues.apache.org/jira/browse/SPARK-37641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37641: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_r2 > > > Key: SPARK-37641 > URL: https://issues.apache.org/jira/browse/SPARK-37641 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_R2 is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37623) Support ANSI Aggregate Function: regr_slope & regr_intercept
[ https://issues.apache.org/jira/browse/SPARK-37623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37623: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_slope & regr_intercept > > > Key: SPARK-37623 > URL: https://issues.apache.org/jira/browse/SPARK-37623 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_SLOPE is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37614) Support ANSI Aggregate Function: regr_avgx & regr_avgy
[ https://issues.apache.org/jira/browse/SPARK-37614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37614: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_avgx & regr_avgy > -- > > Key: SPARK-37614 > URL: https://issues.apache.org/jira/browse/SPARK-37614 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > REGR_AVGX and REGR_AVGY are ANSI aggregate functions. Many databases support > them. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37613) Support ANSI Aggregate Function: regr_count
[ https://issues.apache.org/jira/browse/SPARK-37613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37613: --- Parent: SPARK-37671 Issue Type: Sub-task (was: New Feature) > Support ANSI Aggregate Function: regr_count > --- > > Key: SPARK-37613 > URL: https://issues.apache.org/jira/browse/SPARK-37613 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > REGR_COUNT is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37671) Support ANSI Aggregation Function of regression
jiaan.geng created SPARK-37671: -- Summary: Support ANSI Aggregation Function of regression Key: SPARK-37671 URL: https://issues.apache.org/jira/browse/SPARK-37671 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng Support ANSI Aggregation Function of regression -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
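As background on this family of functions, ANSI defines REGR_SXX(y, x) as the sum of squared deviations of x over the (y, x) pairs where both arguments are non-null. Until the built-ins are available, the same quantity can be sketched with existing Spark SQL aggregates; a rough equivalent for illustration, not the eventual implementation:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame(
    [(1.0, 2.0), (2.0, 3.0), (None, 4.0), (4.0, None)], "y double, x double"
).createOrReplaceTempView("t")

# ANSI REGR_SXX(y, x) = SUM((x - AVG(x))^2) over rows where BOTH y and x are non-null,
# which equals COUNT(*) * VAR_POP(x) over the filtered pairs.
spark.sql("""
    SELECT COUNT(*) * VAR_POP(x) AS regr_sxx
    FROM t
    WHERE y IS NOT NULL AND x IS NOT NULL
""").show()
# Only (1.0, 2.0) and (2.0, 3.0) qualify, so regr_sxx = 0.5.
{code}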
[jira] [Assigned] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37670: Assignee: (was: Apache Spark) > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37670: Assignee: Apache Spark > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
[ https://issues.apache.org/jira/browse/SPARK-37670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461195#comment-17461195 ] Apache Spark commented on SPARK-37670: -- User 'maryannxue' has created a pull request for this issue: https://github.com/apache/spark/pull/34929 > Support predicate pushdown and column pruning for de-duped CTEs > --- > > Key: SPARK-37670 > URL: https://issues.apache.org/jira/browse/SPARK-37670 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wei Xue >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37670) Support predicate pushdown and column pruning for de-duped CTEs
Wei Xue created SPARK-37670: --- Summary: Support predicate pushdown and column pruning for de-duped CTEs Key: SPARK-37670 URL: https://issues.apache.org/jira/browse/SPARK-37670 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wei Xue -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
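To make the title concrete, the targeted shape is a CTE that is referenced more than once (and is therefore de-duplicated into a shared plan) while each reference only needs a filtered subset of rows and columns; a small illustrative query, not taken from the pull request:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(0, 1000).selectExpr("id", "id % 10 AS bucket").createOrReplaceTempView("events")

# The CTE `agg` is referenced twice. With predicate pushdown and column pruning for
# de-duped CTEs, the filters on `bucket` and the narrow column set could be pushed
# toward the shared CTE plan rather than evaluated only above it.
spark.sql("""
    WITH agg AS (
        SELECT bucket, COUNT(*) AS cnt
        FROM events
        GROUP BY bucket
    )
    SELECT a.bucket, a.cnt AS cnt_a, b.cnt AS cnt_b
    FROM agg a JOIN agg b ON a.bucket = b.bucket + 1
    WHERE a.bucket < 5
""").show()
{code}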
[jira] [Resolved] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37654. -- Fix Version/s: 3.3.0 3.2.1 3.1.3 Resolution: Fixed Issue resolved by pull request 34928 [https://github.com/apache/spark/pull/34928] > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Priority: Major > Fix For: 3.3.0, 3.2.1, 3.1.3 > > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. 
Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37654: Assignee: Huaxin Gao > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.1.3, 3.2.1, 3.3.0 > > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. 
Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37613) Support ANSI Aggregate Function: regr_count
[ https://issues.apache.org/jira/browse/SPARK-37613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37613: --- Assignee: jiaan.geng > Support ANSI Aggregate Function: regr_count > --- > > Key: SPARK-37613 > URL: https://issues.apache.org/jira/browse/SPARK-37613 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > REGR_COUNT is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37613) Support ANSI Aggregate Function: regr_count
[ https://issues.apache.org/jira/browse/SPARK-37613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37613. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34880 [https://github.com/apache/spark/pull/34880] > Support ANSI Aggregate Function: regr_count > --- > > Key: SPARK-37613 > URL: https://issues.apache.org/jira/browse/SPARK-37613 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > REGR_COUNT is an ANSI aggregate function. Many databases support it. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37669: Assignee: Takuya Ueshin > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37669. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34926 [https://github.com/apache/spark/pull/34926] > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.3.0 > > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
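The rationale in the description can be checked directly: since Python 3.7 the built-in dict preserves insertion order as a language guarantee, so OrderedDict is only needed where its extra behavior (e.g. move_to_end or order-sensitive equality) matters. A minimal sketch:
{code:python}
from collections import OrderedDict

# Iteration order is identical for both containers on Python 3.7+.
plain = {"first": 1, "second": 2, "third": 3}
ordered = OrderedDict([("first", 1), ("second", 2), ("third", 3)])

assert list(plain) == list(ordered) == ["first", "second", "third"]
{code}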
[jira] [Assigned] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37654: Assignee: (was: Apache Spark) > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Priority: Major > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. 
Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37654: Assignee: Apache Spark > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Assignee: Apache Spark >Priority: Major > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. 
Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37666. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34925 [https://github.com/apache/spark/pull/34925] > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Fix For: 3.3.0 > > > Change the default mode from ECB to GCM in the AES functions aes_encrypt() and > aes_decrypt(). GCM is preferable because it is semantically secure. > It is also the default mode in other systems such as Snowflake; see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
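Callers who do not want to depend on whichever default a given Spark version uses can pass the mode explicitly as the third argument. A hedged sketch, assuming a Spark build that ships aes_encrypt/aes_decrypt and using a throwaway 16-character key:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Passing 'GCM' explicitly makes the round trip independent of the default mode
# (ECB before this change, GCM after). The key is only a placeholder.
spark.sql("""
    SELECT CAST(aes_decrypt(
             aes_encrypt('Spark', '0000111122223333', 'GCM'),
             '0000111122223333', 'GCM') AS STRING) AS roundtrip
""").show()
{code}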
[jira] [Commented] (SPARK-37654) Regression - NullPointerException in Row.getSeq when field null
[ https://issues.apache.org/jira/browse/SPARK-37654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461156#comment-17461156 ] Apache Spark commented on SPARK-37654: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34928 > Regression - NullPointerException in Row.getSeq when field null > --- > > Key: SPARK-37654 > URL: https://issues.apache.org/jira/browse/SPARK-37654 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.1.2, 3.2.0 >Reporter: Brandon Dahler >Priority: Major > > h2. Description > A NullPointerException occurs in _org.apache.spark.sql.Row.getSeq(int)_ if > the row contains a _null_ value at the requested index. > {code:java} > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > at org.apache.spark.sql.Row.getList(Row.scala:327) > at org.apache.spark.sql.Row.getList$(Row.scala:326) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getList(rows.scala:166) > ... > {code} > > Prior to 3.1.1, the code would not throw an exception and instead would > return a null _Seq_ instance. > h2. Reproduction > # Start a new spark-shell instance > # Execute the following script: > {code:scala} > import org.apache.spark.sql.Row > Row(Seq("value")).getSeq(0) > Row(Seq()).getSeq(0) > Row(null).getSeq(0) {code} > h3. Expected Output > res2 outputs a _null_ value. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > res2: Seq[Nothing] = null > {code} > h3. Actual Output > res2 throws a NullPointerException. > {code:java} > scala> import org.apache.spark.sql.Row > import org.apache.spark.sql.Row > scala> > scala> Row(Seq("value")).getSeq(0) > res0: Seq[Nothing] = List(value) > scala> Row(Seq()).getSeq(0) > res1: Seq[Nothing] = List() > scala> Row(null).getSeq(0) > java.lang.NullPointerException > at org.apache.spark.sql.Row.getSeq(Row.scala:319) > at org.apache.spark.sql.Row.getSeq$(Row.scala:319) > at > org.apache.spark.sql.catalyst.expressions.GenericRow.getSeq(rows.scala:166) > ... 47 elided > {code} > h3. Environments Tested > Tested against the following releases using the provided reproduction steps: > # spark-3.0.3-bin-hadoop2.7 - Succeeded > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.0.3 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.1.2-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.1.2 > /_/Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > # spark-3.2.0-bin-hadoop3.2 - Failed > {code:java} > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/___/ .__/\_,_/_/ /_/\_\ version 3.2.0 > /_/Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_312) {code} > h2. 
Regression Source > The regression appears to have been introduced in > [25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb|https://github.com/apache/spark/commit/25c7d0fe6ae20a4c1c42e0cd0b448c08ab03f3fb#diff-722324a11a0e4635a59a9debc962da2c1678d86702a9a106fd0d51188f83853bR317], > which addressed > [SPARK-32526|https://issues.apache.org/jira/browse/SPARK-32526] > h2. Work Around > This regression can be worked around by using _Row.isNullAt(int)_ and > handling the null scenario in user code, prior to calling _Row.getSeq(int)_ > or _Row.getList(int)_. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37668) 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert
[ https://issues.apache.org/jira/browse/SPARK-37668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461143#comment-17461143 ] Haejoon Lee commented on SPARK-37668: - Thanks for the report! Let me take a look > 'Index' object has no attribute 'levels' in > pyspark.pandas.frame.DataFrame.insert > -- > > Key: SPARK-37668 > URL: https://issues.apache.org/jira/browse/SPARK-37668 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > [This piece of > code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] > in {{pyspark.pandas.frame}} is going to fail on runtime, when > {{is_name_like_tuple}} evaluates to {{True}} > {code:python} > if is_name_like_tuple(column): > if len(column) != len(self.columns.levels): > {code} > with > {code} > 'Index' object has no attribute 'levels' > {code} > To be honest, I am not sure what is intended behavior (initially, I suspected > that we should have > {code:python} > if len(column) != self.columns.nlevels > {code} > but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas > at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
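The distinction the report hinges on can be reproduced in plain pandas: `levels` only exists on MultiIndex, while `nlevels` is defined for both index types. A small sketch, unrelated to whatever the eventual fix looks like:
{code:python}
import pandas as pd

flat = pd.Index(["a", "b"])
multi = pd.MultiIndex.from_tuples([("a", 1), ("b", 2)])

# nlevels is defined for both index types...
print(flat.nlevels, multi.nlevels)  # 1 2

# ...but levels only exists on MultiIndex, which is the source of the reported error.
print(multi.levels)  # FrozenList([['a', 'b'], [1, 2]])
try:
    flat.levels
except AttributeError as e:
    print(e)  # 'Index' object has no attribute 'levels'
{code}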
[jira] [Commented] (SPARK-34544) pyspark toPandas() should return pd.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461133#comment-17461133 ] Apache Spark commented on SPARK-34544: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/34927 > pyspark toPandas() should return pd.DataFrame > - > > Key: SPARK-34544 > URL: https://issues.apache.org/jira/browse/SPARK-34544 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Rafal Wojdyla >Assignee: Maciej Szymkiewicz >Priority: Major > > Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete > "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that > certain pandas methods are not present in {{DataFrameLike}}, even though those > methods are valid methods on pandas {{DataFrame}}, which is the actual type > of the object. This requires type-ignore comments or asserts. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34544) pyspark toPandas() should return pd.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34544: Assignee: Maciej Szymkiewicz (was: Apache Spark) > pyspark toPandas() should return pd.DataFrame > - > > Key: SPARK-34544 > URL: https://issues.apache.org/jira/browse/SPARK-34544 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Rafal Wojdyla >Assignee: Maciej Szymkiewicz >Priority: Major > > Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete > "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that > certain pandas methods are not present in {{DataFrameLike}}, even though those > methods are valid methods on pandas {{DataFrame}}, which is the actual type > of the object. This requires type-ignore comments or asserts. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34544) pyspark toPandas() should return pd.DataFrame
[ https://issues.apache.org/jira/browse/SPARK-34544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34544: Assignee: Apache Spark (was: Maciej Szymkiewicz) > pyspark toPandas() should return pd.DataFrame > - > > Key: SPARK-34544 > URL: https://issues.apache.org/jira/browse/SPARK-34544 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.1 >Reporter: Rafal Wojdyla >Assignee: Apache Spark >Priority: Major > > Right now {{toPandas()}} returns {{DataFrameLike}}, which is an incomplete > "view" of pandas {{DataFrame}}. This leads to cases like mypy reporting that > certain pandas methods are not present in {{DataFrameLike}}, even though those > methods are valid methods on pandas {{DataFrame}}, which is the actual type > of the object. This requires type-ignore comments or asserts. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
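Until the annotation changes, one way callers keep mypy happy (the asserts mentioned in the description) is to narrow the result explicitly to pandas.DataFrame; a minimal sketch:
{code:python}
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
pdf = spark.range(3).toPandas()

# Narrow the static type from the stub's DataFrameLike to the real pandas type so
# that pandas-only methods type-check; at runtime the assert is always true.
assert isinstance(pdf, pd.DataFrame)
print(pdf.convert_dtypes().dtypes)
{code}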
[jira] [Assigned] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37669: Assignee: (was: Apache Spark) > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37669: Assignee: Apache Spark > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461070#comment-17461070 ] Apache Spark commented on SPARK-37669: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/34926 > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36019) Cannot run leveldb related UTs on Mac OS of M1 architecture
[ https://issues.apache.org/jira/browse/SPARK-36019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36019: -- Parent: SPARK-35781 Issue Type: Sub-task (was: Bug) > Cannot run leveldb related UTs on Mac OS of M1 architecture > --- > > Key: SPARK-36019 > URL: https://issues.apache.org/jira/browse/SPARK-36019 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > When run leveldb related UTs on Mac OS of M1 architecture, there are some > test failed as follows: > {code:java} > [INFO] Running org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] Tests run: 10, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: > 0.18 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] > org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteReadDelete > Time elapsed: 0.146 s <<< ERROR! > java.lang.UnsatisfiedLinkError: > Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, > no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > > dlopen(/Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8, > 1): no suitable image found. Did find: > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper] > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > [ERROR] org.apache.spark.util.kvstore.LevelDBSuite.testObjectWriteReadDelete > Time elapsed: 0 s <<< ERROR! > java.lang.NoClassDefFoundError: Could not initialize class > org.fusesource.leveldbjni.JniDBFactory > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > > [ERROR] Tests run: 105, Failures: 0, Errors: 48, Skipped: 0{code} > There seems to be a lack of JNI support -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-36019) Cannot run leveldb related UTs on Mac OS of M1 architecture
[ https://issues.apache.org/jira/browse/SPARK-36019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-36019. - > Cannot run leveldb related UTs on Mac OS of M1 architecture > --- > > Key: SPARK-36019 > URL: https://issues.apache.org/jira/browse/SPARK-36019 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Major > > When running LevelDB-related UTs on macOS on the M1 architecture, some tests > fail as follows: > {code:java} > [INFO] Running org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] Tests run: 10, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: > 0.18 s <<< FAILURE! - in org.apache.spark.util.kvstore.LevelDBSuite > [ERROR] > org.apache.spark.util.kvstore.LevelDBSuite.testMultipleTypesWriteReadDelete > Time elapsed: 0.146 s <<< ERROR! > java.lang.UnsatisfiedLinkError: > Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, > no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > > dlopen(/Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8, > 1): no suitable image found. Did find: > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper > > /Users/yangjie01/SourceCode/git/spark-mine-12/common/kvstore/target/tmp/libleveldbjni-64-1-7259526109351494242.8: > no matching architecture in universal wrapper] > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > [ERROR] org.apache.spark.util.kvstore.LevelDBSuite.testObjectWriteReadDelete > Time elapsed: 0 s <<< ERROR! > java.lang.NoClassDefFoundError: Could not initialize class > org.fusesource.leveldbjni.JniDBFactory > at > org.apache.spark.util.kvstore.LevelDBSuite.setup(LevelDBSuite.java:55) > > [ERROR] Tests run: 105, Failures: 0, Errors: 48, Skipped: 0{code} > There seems to be a lack of native JNI support for this architecture. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37317) Reduce weights in GaussianMixtureSuite
[ https://issues.apache.org/jira/browse/SPARK-37317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37317: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Reduce weights in GaussianMixtureSuite > -- > > Key: SPARK-37317 > URL: https://issues.apache.org/jira/browse/SPARK-37317 > Project: Spark > Issue Type: Sub-task > Components: MLlib, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > {code} > $ build/sbt "mllib/test" > ... > [info] *** 1 TEST FAILED *** > [error] Failed: Total 1760, Failed 1, Errors 0, Passed 1759, Ignored 7 > [error] Failed tests: > [error] org.apache.spark.ml.clustering.GaussianMixtureSuite > [error] (mllib / Test / test) sbt.TestsFailedException: Tests unsuccessful > [error] Total time: 625 s (10:25), completed Nov 13, 2021, 6:21:13 PM > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37522) Fix MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction
[ https://issues.apache.org/jira/browse/SPARK-37522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37522: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Fix MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction > -- > > Key: SPARK-37522 > URL: https://issues.apache.org/jira/browse/SPARK-37522 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.1, 3.3.0 > > > The failure happens on Java 17 native Apple Silicon version on Python > 3.9/3.10. > {code} > $ java -version > openjdk version "17.0.1" 2021-10-19 LTS > OpenJDK Runtime Environment Zulu17.30+15-CA (build 17.0.1+12-LTS) > OpenJDK 64-Bit Server VM Zulu17.30+15-CA (build 17.0.1+12-LTS, mixed mode, > sharing) > {code} > {code} > == > FAIL: test_raw_and_probability_prediction > (pyspark.ml.tests.test_algorithms.MultilayerPerceptronClassifierTest) > -- > Traceback (most recent call last): > File > "/Users/dongjoon/APACHE/spark-merge/python/pyspark/ml/tests/test_algorithms.py", > line 104, in test_raw_and_probability_prediction > self.assertTrue(np.allclose(result.rawPrediction, expected_rawPrediction, > rtol=0.102)) > AssertionError: False is not true > -- > Ran 1 test in 7.385s > FAILED (failures=1) > Had test failures in pyspark.ml.tests.test_algorithms > MultilayerPerceptronClassifierTest.test_raw_and_probability_prediction with > python3; see logs. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37272) Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon
[ https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37272: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon > > > Key: SPARK-37272 > URL: https://issues.apache.org/jira/browse/SPARK-37272 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > Java 17 officially support Apple Silicon > - JEP 391: macOS/AArch64 Port > - https://bugs.openjdk.java.net/browse/JDK-8251280 > Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 supports Apple Silicon > natively. > {code} > /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable > arm64 > /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64 > /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable > arm64 > {code} > Since RocksDBJNI still doesn't support Apple Silicon natively, the following > failures occur on M1. > {code} > $ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite" > ... > [info] Run completed in 23 seconds, 281 milliseconds. > [info] Total number of tests run: 32 > [info] Suites: completed 2, aborted 2 > [info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0 > [info] *** 2 SUITES ABORTED *** > [info] *** 10 TESTS FAILED *** > [error] Failed tests: > [error] org.apache.spark.sql.streaming.StreamingSessionWindowSuite > [error] > org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite > [error] Error during tests: > [error] org.apache.spark.sql.execution.streaming.state.RocksDBSuite > [error] > org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite > [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful > [error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM > {code} > This issue aims to add ExtendedRocksDBTest to disable RocksDB selectively on > Apple Silicon. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37282) Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon
[ https://issues.apache.org/jira/browse/SPARK-37282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37282: -- Parent Issue: SPARK-35781 (was: SPARK-33772) > Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon > -- > > Key: SPARK-37282 > URL: https://issues.apache.org/jira/browse/SPARK-37282 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > > Java 17 officially support Apple Silicon. > - JEP 391: macOS/AArch64 Port > - https://bugs.openjdk.java.net/browse/JDK-8251280 > Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 supports Apple Silicon > natively. > {code} > /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable > arm64 > /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64 > /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable > arm64 > {code} > Since LevelDBJNI still doesn't support Apple Silicon natively, the test cases > fail on M1. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37655) Add RocksDB Implementation for KVStore
[ https://issues.apache.org/jira/browse/SPARK-37655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37655: -- Parent: SPARK-35781 Issue Type: Sub-task (was: Improvement) > Add RocksDB Implementation for KVStore > -- > > Key: SPARK-37655 > URL: https://issues.apache.org/jira/browse/SPARK-37655 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37655) Add RocksDB Implementation for KVStore
[ https://issues.apache.org/jira/browse/SPARK-37655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37655: -- Parent: (was: SPARK-33772) Issue Type: Improvement (was: Sub-task) > Add RocksDB Implementation for KVStore > -- > > Key: SPARK-37655 > URL: https://issues.apache.org/jira/browse/SPARK-37655 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35781) Support Spark on Apple Silicon on macOS natively on Java 17
[ https://issues.apache.org/jira/browse/SPARK-35781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35781: -- Summary: Support Spark on Apple Silicon on macOS natively on Java 17 (was: Support Spark on Apple Silicon on macOS natively) > Support Spark on Apple Silicon on macOS natively on Java 17 > --- > > Key: SPARK-35781 > URL: https://issues.apache.org/jira/browse/SPARK-35781 > Project: Spark > Issue Type: New Feature > Components: Build >Affects Versions: 3.3.0 >Reporter: DB Tsai >Priority: Major > > This is an umbrella JIRA tracking the progress of supporting Apple Silicon on > macOS natively. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37669) Remove unnecessary usages of OrderedDict
[ https://issues.apache.org/jira/browse/SPARK-37669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461057#comment-17461057 ] Takuya Ueshin commented on SPARK-37669: --- I'm working on this. > Remove unnecessary usages of OrderedDict > > > Key: SPARK-37669 > URL: https://issues.apache.org/jira/browse/SPARK-37669 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Takuya Ueshin >Priority: Major > > Now that supported Python is 3.7 and above, we can remove unnecessary usages > of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37669) Remove unnecessary usages of OrderedDict
Takuya Ueshin created SPARK-37669: - Summary: Remove unnecessary usages of OrderedDict Key: SPARK-37669 URL: https://issues.apache.org/jira/browse/SPARK-37669 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Takuya Ueshin Now that supported Python is 3.7 and above, we can remove unnecessary usages of {{OrderedDict}} because built-in dict guarantees the insertion order. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37667) Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard column expansion
[ https://issues.apache.org/jira/browse/SPARK-37667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellan B Cummings updated SPARK-37667: -- Description: I'm seeing a TreeNodeException ("Couldn't find {_}gen_alias{_}") when running certain operations in Spark 3.1.2. A few conditions need to be met to trigger the bug: - a DF with a nested struct joins to a second DF - a filter that compares a column in the right DF to a column in the left DF - wildcard column expansion of the nested struct - a group by statement on a struct column *Data* g...@github.com:kellanburket/spark3bug.git {code:java} val rightDf = spark.read.parquet("right.parquet") val leftDf = spark.read.parquet("left.parquet"){code} *Schemas* {code:java} leftDf.printSchema() root |-- row: struct (nullable = true) | |-- mid: string (nullable = true) | |-- start: struct (nullable = true) | | |-- latitude: double (nullable = true) | | |-- longitude: double (nullable = true) |-- s2_cell_id: long (nullable = true){code} {code:java} rightDf.printSchema() root |-- id: string (nullable = true) |-- s2_cell_id: long (nullable = true){code} *Breaking Code* {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Working Examples* The following examples don't seem to be effected by the bug Works without group by: {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).show(){code} Works without filter {code:java} leftDf.join(rightDf, "s2_cell_id").select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works without wildcard expansion {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.start"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works with caching {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).cache().select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Error message* {code:java} org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116 ASC NULLS FIRST], false, 0 +- *(2) Project [_gen_alias_2133#2133 AS start#2116, id#2103] +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false :- BroadcastQueryStage 0 : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, bigint, false]),false), [id=#3768] : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, s2_cell_id#2108L] : +- *(1) Filter isnotnull(s2_cell_id#2108L) : +- FileScan parquet [row#2107,s2_cell_id#2108L] Batched: false, DataFilters: [isnotnull(s2_cell_id#2108L)], Format: Parquet, Location: 
InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: struct>,s2_cell_id:bigint> +- *(2) Filter (isnotnull(id#2103) AND isnotnull(s2_cell_id#2104L)) +- *(2) ColumnarToRow +- FileScan parquet [id#2103,s2_cell_id#2104L] Batched: true, DataFilters: [isnotnull(id#2103), isnotnull(s2_cell_id#2104L)], Format: Parquet, Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/right], PartitionFilters: [], PushedFilters: [IsNotNull(id), IsNotNull(s2_cell_id)], ReadSchema: struct at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$1(ShuffleExchangeExec.scala:101) at org.apache.spark.sql.util.LazyValue.getOrInit(LazyValue.scala:41) at org.apache.spark.sql.execution.exchange.Exchange.getOrInitMaterializeFuture(Exchange.scala:71) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.materializeFuture(ShuffleExchangeExec.scala:97) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.materialize(ShuffleExchangeExec.scala:85)
[jira] [Updated] (SPARK-37667) Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard column expansion
[ https://issues.apache.org/jira/browse/SPARK-37667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellan B Cummings updated SPARK-37667: -- Summary: Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard column expansion (was: Spark throws TreeNodeException during wildcard column expansion) > Spark throws TreeNodeException ("Couldn't find gen_alias") during wildcard > column expansion > --- > > Key: SPARK-37667 > URL: https://issues.apache.org/jira/browse/SPARK-37667 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kellan B Cummings >Priority: Major > > I'm seeing a TreeNodeException ("Couldn't find _gen_alias_") when running > certain operations in Spark 3.1.2. > A few conditions need to be met to trigger the bug: > - a DF with a nested struct joins to a second DF > - a filter that compares a column in the right DF to a column in the left DF > - wildcard column expansion of the nested struct > - a group by statement on a struct column > *Data* > g...@github.com:kellanburket/spark3bug.git > > {code:java} > val rightDf = spark.read.parquet("right.parquet") > val leftDf = spark.read.parquet("left.parquet"){code} > > *Schemas* > {code:java} > leftDf.printSchema() > root > |-- row: struct (nullable = true) > | |-- mid: string (nullable = true) > | |-- start: struct (nullable = true) > | | |-- latitude: double (nullable = true) > | | |-- longitude: double (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > {code:java} > rightDf.printSchema() > root > |-- id: string (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > > *Breaking Code* > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > > *Working Examples* > The following examples don't seem to be effected by the bug > Works without group by: > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).show(){code} > Works without filter > {code:java} > leftDf.join(rightDf, "s2_cell_id").select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works without variable expansion > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.start"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works with caching > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).cache().select( > col("row.*"), > col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > *Error message* > > > {code:java} > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] > +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) > null else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) > +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null > else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116 ASC 
NULLS FIRST], false, 0 > +- *(2) Project [_gen_alias_2133#2133 AS start#2116, id#2103] > +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], > Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false > :- BroadcastQueryStage 0 > : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > bigint, false]),false), [id=#3768] > : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, > s2_cell_id#2108L] > : +- *(1) Filter isnotnull(s2_cell_id#2108L) > : +- FileScan parquet [row#2107,s2_cell_id#2108L] > Batched: false, DataFilters: [isnotnull(s2_cell_id#2108L)], Format: Parquet, > Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], > PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: > struct>,s2_cell_id:bigint> > +- *(2) Filter (isnotnull(id#2103) AND > isnotnull(s2_cell_id#2104L)) > +- *(2) ColumnarToRow > +- FileScan parquet [id#2103,s2_cell_id#210
[jira] [Updated] (SPARK-37668) 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert
[ https://issues.apache.org/jira/browse/SPARK-37668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-37668: --- Summary: 'Index' object has no attribute 'levels' in pyspark.pandas.frame.DataFrame.insert (was: 'Index' object has no attribute 'levels') > 'Index' object has no attribute 'levels' in > pyspark.pandas.frame.DataFrame.insert > -- > > Key: SPARK-37668 > URL: https://issues.apache.org/jira/browse/SPARK-37668 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > [This piece of > code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] > in {{pyspark.pandas.frame}} is going to fail on runtime, when > {{is_name_like_tuple}} evaluates to {{True}} > {code:python} > if is_name_like_tuple(column): > if len(column) != len(self.columns.levels): > {code} > with > {code} > 'Index' object has no attribute 'levels' > {code} > To be honest, I am not sure what is intended behavior (initially, I suspected > that we should have > {code:python} > if len(column) != self.columns.nlevels > {code} > but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas > at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
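For context on the SPARK-37668 description above, a small pandas-only sketch (no Spark involved) of the failure mode: {{.levels}} exists only on a {{MultiIndex}}, while {{.nlevels}} exists on both kinds of index, which is why the guarded branch raises at runtime when the columns form a flat {{Index}}.
{code:python}
import pandas as pd

flat = pd.Index(["x", "y"])                      # plain (flat) Index
multi = pd.MultiIndex.from_tuples([("a", "b")])  # MultiIndex

print(flat.nlevels, multi.nlevels)   # 1 2   -> nlevels is defined for both
print(hasattr(flat, "levels"))       # False -> flat.levels raises AttributeError
print(len(multi.levels))             # 2     -> levels exists only on MultiIndex
{code}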
[jira] [Commented] (SPARK-37668) 'Index' object has no attribute 'levels'
[ https://issues.apache.org/jira/browse/SPARK-37668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461039#comment-17461039 ] Maciej Szymkiewicz commented on SPARK-37668: cc [~hyukjin.kwon], [~itholic], [~ueshin], [~XinrongM]. > 'Index' object has no attribute 'levels' > > > Key: SPARK-37668 > URL: https://issues.apache.org/jira/browse/SPARK-37668 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > [This piece of > code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] > in {{pyspark.pandas.frame}} is going to fail on runtime, when > {{is_name_like_tuple}} evaluates to {{True}} > {code:python} > if is_name_like_tuple(column): > if len(column) != len(self.columns.levels): > {code} > with > {code} > 'Index' object has no attribute 'levels' > {code} > To be honest, I am not sure what is intended behavior (initially, I suspected > that we should have > {code:python} > if len(column) != self.columns.nlevels > {code} > but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas > at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37667) Spark throws TreeNodeException during wildcard column expansion
[ https://issues.apache.org/jira/browse/SPARK-37667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kellan B Cummings updated SPARK-37667: -- Summary: Spark throws TreeNodeException during wildcard column expansion (was: Spark throws TreeNodeException during variable expansion) > Spark throws TreeNodeException during wildcard column expansion > --- > > Key: SPARK-37667 > URL: https://issues.apache.org/jira/browse/SPARK-37667 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kellan B Cummings >Priority: Major > > I'm seeing a TreeNodeException ("Couldn't find _gen_alias_") when running > certain operations in Spark 3.1.2. > A few conditions need to be met to trigger the bug: > - a DF with a nested struct joins to a second DF > - a filter that compares a column in the right DF to a column in the left DF > - wildcard column expansion of the nested struct > - a group by statement on a struct column > *Data* > g...@github.com:kellanburket/spark3bug.git > > {code:java} > val rightDf = spark.read.parquet("right.parquet") > val leftDf = spark.read.parquet("left.parquet"){code} > > *Schemas* > {code:java} > leftDf.printSchema() > root > |-- row: struct (nullable = true) > | |-- mid: string (nullable = true) > | |-- start: struct (nullable = true) > | | |-- latitude: double (nullable = true) > | | |-- longitude: double (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > {code:java} > rightDf.printSchema() > root > |-- id: string (nullable = true) > |-- s2_cell_id: long (nullable = true){code} > > *Breaking Code* > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > > *Working Examples* > The following examples don't seem to be effected by the bug > Works without group by: > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.*"), col("id") > ).show(){code} > Works without filter > {code:java} > leftDf.join(rightDf, "s2_cell_id").select( > col("row.*"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works without variable expansion > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).select( > col("row.start"), col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > Works with caching > {code:java} > leftDf.join(rightDf, "s2_cell_id").filter( > "id != row.start.latitude" > ).cache().select( > col("row.*"), > col("id") > ).groupBy( > "start" > ).agg( > min("id") > ).show(){code} > *Error message* > > > {code:java} > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] > +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) > null else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) > +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null > else named_struct(latitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), > longitude, > knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS > start#2116 ASC NULLS FIRST], false, 0 > +- *(2) Project [_gen_alias_2133#2133 AS 
start#2116, id#2103] > +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], > Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false > :- BroadcastQueryStage 0 > : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, > bigint, false]),false), [id=#3768] > : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, > s2_cell_id#2108L] > : +- *(1) Filter isnotnull(s2_cell_id#2108L) > : +- FileScan parquet [row#2107,s2_cell_id#2108L] > Batched: false, DataFilters: [isnotnull(s2_cell_id#2108L)], Format: Parquet, > Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], > PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: > struct>,s2_cell_id:bigint> > +- *(2) Filter (isnotnull(id#2103) AND > isnotnull(s2_cell_id#2104L)) > +- *(2) ColumnarToRow > +- FileScan parquet [id#2103,s2_cell_id#2104L] Batched: > true, DataFilters: [isnotnull(id#2103), isnotnull(s2_cell_id#2104L)], Format:
[jira] [Created] (SPARK-37668) 'Index' object has no attribute 'levels'
Maciej Szymkiewicz created SPARK-37668: -- Summary: 'Index' object has no attribute 'levels' Key: SPARK-37668 URL: https://issues.apache.org/jira/browse/SPARK-37668 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.3.0 Reporter: Maciej Szymkiewicz [This piece of code|https://github.com/apache/spark/blob/6e45b04db48008fa033b09df983d3bd1c4f790ea/python/pyspark/pandas/frame.py#L3991-L3993] in {{pyspark.pandas.frame}} is going to fail on runtime, when {{is_name_like_tuple}} evaluates to {{True}} {code:python} if is_name_like_tuple(column): if len(column) != len(self.columns.levels): {code} with {code} 'Index' object has no attribute 'levels' {code} To be honest, I am not sure what is intended behavior (initially, I suspected that we should have {code:python} if len(column) != self.columns.nlevels {code} but {{nlevels}} is hard-coded to one, and wouldn't be consistent with Pandas at all. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37667) Spark throws TreeNodeException during variable expansion
Kellan B Cummings created SPARK-37667: - Summary: Spark throws TreeNodeException during variable expansion Key: SPARK-37667 URL: https://issues.apache.org/jira/browse/SPARK-37667 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2 Reporter: Kellan B Cummings I'm seeing a TreeNodeException ("Couldn't find _gen_alias_") when running certain operations in Spark 3.1.2. A few conditions need to be met to trigger the bug: - a DF with a nested struct joins to a second DF - a filter that compares a column in the right DF to a column in the left DF - wildcard column expansion of the nested struct - a group by statement on a struct column *Data* g...@github.com:kellanburket/spark3bug.git {code:java} val rightDf = spark.read.parquet("right.parquet") val leftDf = spark.read.parquet("left.parquet"){code} *Schemas* {code:java} leftDf.printSchema() root |-- row: struct (nullable = true) | |-- mid: string (nullable = true) | |-- start: struct (nullable = true) | | |-- latitude: double (nullable = true) | | |-- longitude: double (nullable = true) |-- s2_cell_id: long (nullable = true){code} {code:java} rightDf.printSchema() root |-- id: string (nullable = true) |-- s2_cell_id: long (nullable = true){code} *Breaking Code* {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Working Examples* The following examples don't seem to be effected by the bug Works without group by: {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.*"), col("id") ).show(){code} Works without filter {code:java} leftDf.join(rightDf, "s2_cell_id").select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works without variable expansion {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).select( col("row.start"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} Works with caching {code:java} leftDf.join(rightDf, "s2_cell_id").filter( "id != row.start.latitude" ).cache().select( col("row.*"), col("id") ).groupBy( "start" ).agg( min("id") ).show(){code} *Error message* {code:java} org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange hashpartitioning(start#2116, 1024), ENSURE_REQUIREMENTS, [id=#3849] +- SortAggregate(key=[knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116], functions=[partial_min(id#2103)], output=[start#2116, min#2138]) +- *(2) Sort [knownfloatingpointnormalized(if (isnull(start#2116)) null else named_struct(latitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.latitude)), longitude, knownfloatingpointnormalized(normalizenanandzero(start#2116.longitude AS start#2116 ASC NULLS FIRST], false, 0 +- *(2) Project [_gen_alias_2133#2133 AS start#2116, id#2103] +- *(2) !BroadcastHashJoin [s2_cell_id#2108L], [s2_cell_id#2104L], Inner, BuildLeft, NOT (cast(id#2103 as double) = _gen_alias_2134#2134), false :- BroadcastQueryStage 0 : +- BroadcastExchange HashedRelationBroadcastMode(List(input[1, bigint, false]),false), [id=#3768] : +- *(1) Project [row#2107.start AS _gen_alias_2133#2133, s2_cell_id#2108L] : +- *(1) Filter isnotnull(s2_cell_id#2108L) : +- FileScan parquet [row#2107,s2_cell_id#2108L] Batched: false, DataFilters: 
[isnotnull(s2_cell_id#2108L)], Format: Parquet, Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/left], PartitionFilters: [], PushedFilters: [IsNotNull(s2_cell_id)], ReadSchema: struct>,s2_cell_id:bigint> +- *(2) Filter (isnotnull(id#2103) AND isnotnull(s2_cell_id#2104L)) +- *(2) ColumnarToRow +- FileScan parquet [id#2103,s2_cell_id#2104L] Batched: true, DataFilters: [isnotnull(id#2103), isnotnull(s2_cell_id#2104L)], Format: Parquet, Location: InMemoryFileIndex[s3://co.mira.public/spark3_bug/right], PartitionFilters: [], PushedFilters: [IsNotNull(id), IsNotNull(s2_cell_id)], ReadSchema: struct at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) at org.apache.spark.sql.execution.exchange.ShuffleExchangeLike.$anonfun$materializeFuture$1(ShuffleExchangeExec.scala:101) at org.apache.spark.sql.util.LazyValue.getOrInit(LazyValue.scala:41) at org.apache.spark.sql.execution.exchange.Exchange.getOrInitMaterializeFuture(Exchange.scala:71) at org.apache.spark.sql.execution.exc
[jira] [Assigned] (SPARK-34521) spark.createDataFrame does not support Pandas StringDtype extension type
[ https://issues.apache.org/jira/browse/SPARK-34521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler reassigned SPARK-34521: Assignee: Nicolas Azrak > spark.createDataFrame does not support Pandas StringDtype extension type > > > Key: SPARK-34521 > URL: https://issues.apache.org/jira/browse/SPARK-34521 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1 >Reporter: Pavel Ganelin >Assignee: Nicolas Azrak >Priority: Major > Fix For: 3.3.0 > > > The following test case demonstrates the problem: > {code:java} > import pandas as pd > from pyspark.sql import SparkSession, types > spark = SparkSession.builder.appName(__file__)\ > .config("spark.sql.execution.arrow.pyspark.enabled","true") \ > .getOrCreate() > good = pd.DataFrame([["abc"]], columns=["col"]) > schema = types.StructType([types.StructField("col", types.StringType(), > True)]) > df = spark.createDataFrame(good, schema=schema) > df.show() > bad = good.copy() > bad["col"]=bad["col"].astype("string") > schema = types.StructType([types.StructField("col", types.StringType(), > True)]) > df = spark.createDataFrame(bad, schema=schema) > df.show(){code} > The error: > {code:java} > C:\Python\3.8.3\lib\site-packages\pyspark\sql\pandas\conversion.py:289: > UserWarning: createDataFrame attempted Arrow optimization because > 'spark.sql.execution.arrow.pyspark.enabled' is set to true; however, failed > by the reason below: > Cannot specify a mask or a size when passing an object that is converted > with the __arrow_array__ protocol. > Attempting non-optimization as > 'spark.sql.execution.arrow.pyspark.fallback.enabled' is set to true. > warnings.warn(msg) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
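A hedged workaround sketch for SPARK-34521 (not taken from the ticket; it reuses the {{spark}}, {{bad}}, and {{schema}} objects from the reproducer above): converting the pandas {{string}} extension dtype back to plain {{object}} should let the Arrow-optimized path succeed.
{code:python}
# Hypothetical workaround, reusing the reproducer's variables: downgrade the
# pandas "string" extension dtype to object before calling createDataFrame.
bad_as_object = bad.copy()
bad_as_object["col"] = bad_as_object["col"].astype(object)

df = spark.createDataFrame(bad_as_object, schema=schema)
df.show()
{code}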
[jira] [Assigned] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37666: Assignee: Max Gekk (was: Apache Spark) > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Change the default mode from ECB to GCM in AES functions: aes_encrypt() and > aes_decrypt(). GCM is much more preferable because it is semantically secure. > Also the mode is used the default one in other systems like Snowflake, see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
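A short PySpark sketch of what the SPARK-37666 change means for users (hedged: it assumes a Spark build in which the SQL functions {{aes_encrypt}}/{{aes_decrypt}} are available, i.e. 3.3.0 or a snapshot of it). Passing the mode explicitly keeps results stable no matter which default is chosen:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("aes-gcm-sketch").getOrCreate()

key = "0000111122223333"  # illustrative 16-byte key (AES-128), not a real secret

# Spell out the mode instead of relying on the default (ECB today, GCM proposed).
roundtrip = spark.sql(f"""
    SELECT cast(
        aes_decrypt(aes_encrypt('Spark', '{key}', 'GCM'), '{key}', 'GCM')
    AS STRING) AS plaintext
""")
roundtrip.show()  # expected to show 'Spark'
{code}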
[jira] [Assigned] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37666: Assignee: Apache Spark (was: Max Gekk) > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Change the default mode from ECB to GCM in AES functions: aes_encrypt() and > aes_decrypt(). GCM is much more preferable because it is semantically secure. > Also the mode is used the default one in other systems like Snowflake, see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
[ https://issues.apache.org/jira/browse/SPARK-37666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460984#comment-17460984 ] Apache Spark commented on SPARK-37666: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/34925 > Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` > > > Key: SPARK-37666 > URL: https://issues.apache.org/jira/browse/SPARK-37666 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > > Change the default mode from ECB to GCM in AES functions: aes_encrypt() and > aes_decrypt(). GCM is much more preferable because it is semantically secure. > Also the mode is used the default one in other systems like Snowflake, see > https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37666) Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()`
Max Gekk created SPARK-37666: Summary: Set `GCM` as the default mode in `aes_encrypt()`/`aes_decrypt()` Key: SPARK-37666 URL: https://issues.apache.org/jira/browse/SPARK-37666 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Max Gekk Assignee: Max Gekk Change the default mode from ECB to GCM in AES functions: aes_encrypt() and aes_decrypt(). GCM is much more preferable because it is semantically secure. Also the mode is used the default one in other systems like Snowflake, see https://docs.snowflake.com/en/sql-reference/functions/encrypt.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37664: -- Parent: SPARK-33772 Issue Type: Sub-task (was: Task) > Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java > 11/17 result > -- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37664: -- Summary: Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result (was: Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17) > Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java > 11/17 result > -- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java 11/17 result
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37664: -- Issue Type: Task (was: Improvement) > Add InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark Java > 11/17 result > -- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37664. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34921 [https://github.com/apache/spark/pull/34921] > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37664: - Assignee: Yang Jie > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37145: Assignee: (was: Apache Spark) > Improvement for extending pod feature steps with KubernetesConf > --- > > Key: SPARK-37145 > URL: https://issues.apache.org/jira/browse/SPARK-37145 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: wangxin201492 >Priority: Major > > SPARK-33261 provides us with great convenience, but it only constructs a > `KubernetesFeatureConfigStep` through an empty (no-argument) constructor. > It would be better to also support constructing it with a `KubernetesConf` (or, > more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37145: Assignee: Apache Spark > Improvement for extending pod feature steps with KubernetesConf > --- > > Key: SPARK-37145 > URL: https://issues.apache.org/jira/browse/SPARK-37145 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: wangxin201492 >Assignee: Apache Spark >Priority: Major > > SPARK-33261 provides us with great convenience, but it only constructs a > `KubernetesFeatureConfigStep` through an empty (no-argument) constructor. > It would be better to also support constructing it with a `KubernetesConf` (or, > more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37145) Improvement for extending pod feature steps with KubernetesConf
[ https://issues.apache.org/jira/browse/SPARK-37145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460889#comment-17460889 ] Apache Spark commented on SPARK-37145: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/34924 > Improvement for extending pod feature steps with KubernetesConf > --- > > Key: SPARK-37145 > URL: https://issues.apache.org/jira/browse/SPARK-37145 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: wangxin201492 >Priority: Major > > SPARK-33261 provides us with great convenience, but it only constructs a > `KubernetesFeatureConfigStep` through an empty (no-argument) constructor. > It would be better to also support constructing it with a `KubernetesConf` (or, > more specifically, a `KubernetesDriverConf` or `KubernetesExecutorConf`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37630) Security issue from Log4j 1.X exploit
[ https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460832#comment-17460832 ] Ismail H edited comment on SPARK-37630 at 12/16/21, 4:06 PM: - to [~divekarsc] , extract from https://access.redhat.com/security/cve/CVE-2021-4104 : bq. Note this flaw ONLY affects applications which are specifically configured to use JMSAppender, which is not the default, or when the attacker has write access to the Log4j configuration for adding JMSAppender to the attacker's JMS Broker. so the question is, is Spark using JMSAppender ? was (Author: JIRAUSER281735): to [~divekarsc] , extract from https://access.redhat.com/security/cve/CVE-2021-4104 : bq. Note this flaw ONLY affects applications which are specifically configured to use JMSAppender, which is not the default, or when the attacker has write access to the Log4j configuration for adding JMSAppender to the attacker's JMS Broker. bq. so the question is, is Spark using JMSAppender ? > Security issue from Log4j 1.X exploit > - > > Key: SPARK-37630 > URL: https://issues.apache.org/jira/browse/SPARK-37630 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.2.0 >Reporter: Ismail H >Priority: Major > Labels: security > > log4j is being used in version [1.2.17|#L122]] > > This version has been deprecated and since [then have a known issue that > hasn't been adressed in 1.X > versions|https://www.cvedetails.com/cve/CVE-2019-17571/]. > > *Solution:* > * Upgrade log4j to version 2.15.0 which correct all known issues. [Last > known issues |https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37630) Security issue from Log4j 1.X exploit
[ https://issues.apache.org/jira/browse/SPARK-37630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460832#comment-17460832 ] Ismail H commented on SPARK-37630: -- to [~divekarsc] , extract from https://access.redhat.com/security/cve/CVE-2021-4104 : bq. Note this flaw ONLY affects applications which are specifically configured to use JMSAppender, which is not the default, or when the attacker has write access to the Log4j configuration for adding JMSAppender to the attacker's JMS Broker. bq. so the question is, is Spark using JMSAppender ? > Security issue from Log4j 1.X exploit > - > > Key: SPARK-37630 > URL: https://issues.apache.org/jira/browse/SPARK-37630 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.8, 3.2.0 >Reporter: Ismail H >Priority: Major > Labels: security > > log4j is being used in version [1.2.17|#L122]] > > This version has been deprecated and since [then have a known issue that > hasn't been adressed in 1.X > versions|https://www.cvedetails.com/cve/CVE-2019-17571/]. > > *Solution:* > * Upgrade log4j to version 2.15.0 which correct all known issues. [Last > known issues |https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228] -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460777#comment-17460777 ] Apache Spark commented on SPARK-35739: -- User 'brandondahler' has created a pull request for this issue: https://github.com/apache/spark/pull/34923 > [Spark Sql] Add Java-comptable Dataset.join overloads > - > > Key: SPARK-35739 > URL: https://issues.apache.org/jira/browse/SPARK-35739 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0, 3.0.0 >Reporter: Brandon Dahler >Priority: Minor > > h2. Problem > When using Spark SQL with Java, the required syntax to utilize the following > two overloads is unnatural and not obvious to developers that haven't had to > interoperate with Scala before: > {code:java} > def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame > def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): > DataFrame > {code} > Examples: > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > // Overload with multiple usingColumns, no join type > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2"))) > .show(); > // Overload with multiple usingColumns and a join type > dataset1 > .join( > dataset2, > JavaConverters.asScalaBuffer(List.of("column", "column2")), > "left") > .show(); > {code} > > Additionally there is no overload that takes a single usingColumn and a > joinType, forcing the developer to use the Seq[String] overload regardless of > language. > Examples: > Scala > {code:java} > val dataset1 :DataFrame = ...; > val dataset2 :DataFrame = ...; > dataset1 > .join(dataset2, Seq("column"), "left") > .show(); > {code} > > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left") > .show(); > {code} > h2. Proposed Improvement > Add 3 additional overloads to Dataset: > > {code:java} > def join(right: Dataset[_], usingColumn: List[String]): DataFrame > def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame > def join(right: Dataset[_], usingColumn: List[String], joinType: String): > DataFrame > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35739) [Spark Sql] Add Java-comptable Dataset.join overloads
[ https://issues.apache.org/jira/browse/SPARK-35739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460775#comment-17460775 ] Apache Spark commented on SPARK-35739: -- User 'brandondahler' has created a pull request for this issue: https://github.com/apache/spark/pull/34923 > [Spark Sql] Add Java-comptable Dataset.join overloads > - > > Key: SPARK-35739 > URL: https://issues.apache.org/jira/browse/SPARK-35739 > Project: Spark > Issue Type: Improvement > Components: Java API, SQL >Affects Versions: 2.0.0, 3.0.0 >Reporter: Brandon Dahler >Priority: Minor > > h2. Problem > When using Spark SQL with Java, the required syntax to utilize the following > two overloads is unnatural and not obvious to developers that haven't had to > interoperate with Scala before: > {code:java} > def join(right: Dataset[_], usingColumns: Seq[String]): DataFrame > def join(right: Dataset[_], usingColumns: Seq[String], joinType: String): > DataFrame > {code} > Examples: > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > // Overload with multiple usingColumns, no join type > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column", "column2"))) > .show(); > // Overload with multiple usingColumns and a join type > dataset1 > .join( > dataset2, > JavaConverters.asScalaBuffer(List.of("column", "column2")), > "left") > .show(); > {code} > > Additionally there is no overload that takes a single usingColumn and a > joinType, forcing the developer to use the Seq[String] overload regardless of > language. > Examples: > Scala > {code:java} > val dataset1 :DataFrame = ...; > val dataset2 :DataFrame = ...; > dataset1 > .join(dataset2, Seq("column"), "left") > .show(); > {code} > > Java 11 > {code:java} > Dataset<Row> dataset1 = ...; > Dataset<Row> dataset2 = ...; > dataset1 > .join(dataset2, JavaConverters.asScalaBuffer(List.of("column")), "left") > .show(); > {code} > h2. Proposed Improvement > Add 3 additional overloads to Dataset: > > {code:java} > def join(right: Dataset[_], usingColumn: List[String]): DataFrame > def join(right: Dataset[_], usingColumn: String, joinType: String): DataFrame > def join(right: Dataset[_], usingColumn: List[String], joinType: String): > DataFrame > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37483. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34918 [https://github.com/apache/spark/pull/34918] > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37483) Support push down top N to JDBC data source V2
[ https://issues.apache.org/jira/browse/SPARK-37483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37483: --- Assignee: jiaan.geng > Support push down top N to JDBC data source V2 > -- > > Key: SPARK-37483 > URL: https://issues.apache.org/jira/browse/SPARK-37483 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37665) JDBC: Provide generic metadata functions
Daniel Haviv created SPARK-37665: Summary: JDBC: Provide generic metadata functions Key: SPARK-37665 URL: https://issues.apache.org/jira/browse/SPARK-37665 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Daniel Haviv JDBC driver vendors expose the metadata (databases/tables) for the underlying engine through the [DatabaseMetaData interface|https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html]. Today when a user wants to fetch the metadata for an engine, they have to execute a SQL statement tailored to a specific engine/syntax instead of using a more generic approach. I suggest we add two new functions to the JDBC reader: {code:java} listDatabases & listTables{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
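For context, this is roughly what engine-agnostic metadata access already looks like through plain JDBC; the proposed listDatabases and listTables helpers would presumably wrap calls like these. The connection URL and credentials below are placeholders.
{code:scala}
import java.sql.DriverManager

object JdbcMetadataSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; any JDBC driver on the classpath works.
    val conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/mydb", "user", "secret")
    try {
      val meta = conn.getMetaData
      // Engine-agnostic equivalent of a listDatabases() helper.
      val catalogs = meta.getCatalogs
      while (catalogs.next()) println(catalogs.getString("TABLE_CAT"))
      // Engine-agnostic equivalent of a listTables() helper.
      val tables = meta.getTables(null, null, "%", Array("TABLE"))
      while (tables.next()) println(tables.getString("TABLE_NAME"))
    } finally {
      conn.close()
    }
  }
}
{code}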
[jira] [Commented] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460612#comment-17460612 ] Apache Spark commented on SPARK-37663: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/34922 > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
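For background, a generic illustration of the failure mode (not the SparkContextSuite code or the SPARK-37315 fix): Scala 2.13 added fail-fast mutation checks to its mutable collections, so iterating a buffer that is modified during the iteration throws ConcurrentModificationException, and iterating over an immutable snapshot is a common mitigation.
{code:scala}
import scala.collection.mutable.ArrayBuffer

object CmeSketch {
  def main(args: Array[String]): Unit = {
    val buf = ArrayBuffer(1, 2, 3)
    // On Scala 2.13 the next line fails fast with ConcurrentModificationException,
    // whereas Scala 2.12 silently tolerated mutation during iteration:
    // buf.foreach(_ => buf += 0)
    // Mitigation: iterate over an immutable snapshot instead of the live buffer.
    buf.toList.foreach(x => buf += x * 10)
    println(buf) // ArrayBuffer(1, 2, 3, 10, 20, 30)
  }
}
{code}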
[jira] [Assigned] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37663: Assignee: Apache Spark (was: Kousuke Saruta) > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37663: Assignee: Kousuke Saruta (was: Apache Spark) > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460539#comment-17460539 ] Apache Spark commented on SPARK-37664: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34921 > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37664: Assignee: (was: Apache Spark) > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37664: Assignee: Apache Spark > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460536#comment-17460536 ] Apache Spark commented on SPARK-37664: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/34921 > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
[ https://issues.apache.org/jira/browse/SPARK-37664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37664: - Component/s: Tests (was: SQL) > Supplement benchmark result of InMemoryColumnarBenchmark and > StateStoreBasicOperationsBenchmark for Java 11 and Java 17 > --- > > Key: SPARK-37664 > URL: https://issues.apache.org/jira/browse/SPARK-37664 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9853) Optimize shuffle fetch of contiguous partition IDs
[ https://issues.apache.org/jira/browse/SPARK-9853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-9853: Priority: Major (was: Minor) > Optimize shuffle fetch of contiguous partition IDs > -- > > Key: SPARK-9853 > URL: https://issues.apache.org/jira/browse/SPARK-9853 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Reporter: Matei Alexandru Zaharia >Assignee: Yuanjian Li >Priority: Major > Fix For: 3.0.0 > > > On the map side, we should be able to serve a block representing multiple > partition IDs in one block manager request -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37664) Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17
Yang Jie created SPARK-37664: Summary: Supplement benchmark result of InMemoryColumnarBenchmark and StateStoreBasicOperationsBenchmark for Java 11 and Java 17 Key: SPARK-37664 URL: https://issues.apache.org/jira/browse/SPARK-37664 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37663) Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37663: --- Summary: Mitigate ConcurrentModificationException thrown from tests in SparkContextSuite (was: Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite) > Mitigate ConcurrentModificationException thrown from tests in > SparkContextSuite > --- > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37663) SPARK-37315][ML][TEST] Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite
Kousuke Saruta created SPARK-37663: -- Summary: SPARK-37315][ML][TEST] Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite Key: SPARK-37663 URL: https://issues.apache.org/jira/browse/SPARK-37663 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta ConcurrentModificationException can be thrown from tests in SparkContextSuite with Scala 2.13. The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37663) Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite
[ https://issues.apache.org/jira/browse/SPARK-37663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37663: --- Summary: Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite (was: SPARK-37315][ML][TEST] Mitigate ConcurrentModificationException thrown from a test in SparkContextSuite) > Mitigate ConcurrentModificationException thrown from a test in > SparkContextSuite > > > Key: SPARK-37663 > URL: https://issues.apache.org/jira/browse/SPARK-37663 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > ConcurrentModificationException can be thrown from tests in SparkContextSuite > with Scala 2.13. > The cause seems to be the same as SPARK-37315. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-37661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37661: Assignee: Apache Spark > SparkSQLCLIDriver will use hive defaults to resolve warehouse dir > - > > Key: SPARK-37661 > URL: https://issues.apache.org/jira/browse/SPARK-37661 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > {code:log} > 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not > set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir > to the value of hive.metastore.warehouse.dir. > 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is > 'file:/user/hive/warehouse'. > ... > ... > 21/12/16 15:27:36.559 main INFO SharedState: Setting > hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. > 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is > 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-37661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460505#comment-17460505 ] Apache Spark commented on SPARK-37661: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/34920 > SparkSQLCLIDriver will use hive defaults to resolve warehouse dir > - > > Key: SPARK-37661 > URL: https://issues.apache.org/jira/browse/SPARK-37661 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:log} > 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not > set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir > to the value of hive.metastore.warehouse.dir. > 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is > 'file:/user/hive/warehouse'. > ... > ... > 21/12/16 15:27:36.559 main INFO SharedState: Setting > hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. > 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is > 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
[ https://issues.apache.org/jira/browse/SPARK-37661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37661: Assignee: (was: Apache Spark) > SparkSQLCLIDriver will use hive defaults to resolve warehouse dir > - > > Key: SPARK-37661 > URL: https://issues.apache.org/jira/browse/SPARK-37661 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kent Yao >Priority: Minor > > {code:log} > 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not > set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir > to the value of hive.metastore.warehouse.dir. > 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is > 'file:/user/hive/warehouse'. > ... > ... > 21/12/16 15:27:36.559 main INFO SharedState: Setting > hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. > 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is > 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37662) exception when handling late data with watermarking and window
luigi created SPARK-37662: - Summary: exception when handling late data with watermarking and window Key: SPARK-37662 URL: https://issues.apache.org/jira/browse/SPARK-37662 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.2.0 Environment: spark v3.2.0 scala v2.12.12 Reporter: luigi When I use a watermark to drop late data together with a window column for stateful de-duplication, the order of the two calls causes unexpected behavior. a) The code below fails with an exception stating that "Couldn't find timestamp#58-T5000ms in [window#550-T5000ms,raid#132L,app#528]" {code:java} withWatermark("timestamp", "5 seconds"). withColumn("window", window($"timestamp", "1 hours")). dropDuplicates("window", "raid", "app"). {code} b) But when I switch the order of the watermark and window calls as below, it works without any exception: {code:java} withColumn("window", window($"timestamp", "1 hours")). withWatermark("timestamp", "5 seconds"). dropDuplicates("window", "raid", "app"). {code} Please note that this issue does not exist on Spark v3.1.2. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
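A self-contained sketch of the two orderings described above. The column names raid and app come from the report; the rate source and the derived columns are placeholders used only to obtain a streaming timestamp column, and the behavior noted in the comments is as reported for Spark 3.2.0, not independently verified here.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

object WatermarkWindowOrderSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("watermark-window-order").getOrCreate()
    import spark.implicits._

    // The rate source provides "timestamp" and "value" columns; rename/derive
    // placeholder columns to mirror the report's schema.
    val events = spark.readStream.format("rate").option("rowsPerSecond", "5").load()
      .withColumnRenamed("value", "raid")
      .withColumn("app", $"raid" % 3)

    // (a) Watermark first, then derive the window column; reportedly fails on 3.2.0.
    val dedupA = events
      .withWatermark("timestamp", "5 seconds")
      .withColumn("window", window($"timestamp", "1 hours"))
      .dropDuplicates("window", "raid", "app")

    // (b) Derive the window column first, then set the watermark; reportedly works.
    val dedupB = events
      .withColumn("window", window($"timestamp", "1 hours"))
      .withWatermark("timestamp", "5 seconds")
      .dropDuplicates("window", "raid", "app")

    dedupB.writeStream.format("console").start().awaitTermination()
  }
}
{code}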
[jira] [Created] (SPARK-37661) SparkSQLCLIDriver will use hive defaults to resolve warehouse dir
Kent Yao created SPARK-37661: Summary: SparkSQLCLIDriver will use hive defaults to resolve warehouse dir Key: SPARK-37661 URL: https://issues.apache.org/jira/browse/SPARK-37661 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Kent Yao {code:log} 21/12/16 15:27:26.713 main INFO SharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir. 21/12/16 15:27:26.761 main INFO SharedState: Warehouse path is 'file:/user/hive/warehouse'. ... ... 21/12/16 15:27:36.559 main INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir. 21/12/16 15:27:36.561 main INFO SharedState: Warehouse path is 'file:/Users/kentyao/Downloads/spark/spark-3.2.0-bin-hadoop3.2/spark-warehouse'. {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
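As a hedged illustration related to the log above, not a fix for SparkSQLCLIDriver itself: when spark.sql.warehouse.dir is set explicitly, SharedState should not fall back to hive.metastore.warehouse.dir. The path below is a placeholder.
{code:scala}
import org.apache.spark.sql.SparkSession

object WarehouseDirSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("warehouse-dir-sketch")
      // Placeholder path; with this set, SharedState should keep it as the
      // warehouse location instead of adopting hive.metastore.warehouse.dir.
      .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")
      .getOrCreate()

    println(spark.conf.get("spark.sql.warehouse.dir"))
    spark.stop()
  }
}
{code}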