[jira] [Assigned] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37142:


Assignee: (was: Apache Spark)

> Add __all__ to pyspark/pandas/*/__init__.py
> ---
>
> Key: SPARK-37142
> URL: https://issues.apache.org/jira/browse/SPARK-37142
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37142:


Assignee: Apache Spark

> Add __all__ to pyspark/pandas/*/__init__.py
> ---
>
> Key: SPARK-37142
> URL: https://issues.apache.org/jira/browse/SPARK-37142
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py

2021-10-27 Thread dch nguyen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dch nguyen updated SPARK-37142:
---
Issue Type: Improvement  (was: Bug)

> Add __all__ to pyspark/pandas/*/__init__.py
> ---
>
> Key: SPARK-37142
> URL: https://issues.apache.org/jira/browse/SPARK-37142
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37142) Add __all__ to pyspark/pandas/*/__init__.py

2021-10-27 Thread dch nguyen (Jira)
dch nguyen created SPARK-37142:
--

 Summary: Add __all__ to pyspark/pandas/*/__init__.py
 Key: SPARK-37142
 URL: https://issues.apache.org/jira/browse/SPARK-37142
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen
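
For illustration, a minimal sketch of what adding `__all__` to one of these `__init__.py` files looks like (the submodule and class name below are placeholders, not taken from the actual change):
{code:python}
# Hypothetical layout for one of the pyspark/pandas/*/__init__.py files;
# the imported module and name here are placeholders, not the real ones.
from pyspark.pandas.plot.core import PandasOnSparkPlotAccessor  # placeholder

# An explicit __all__ makes the re-export surface deliberate: "import *"
# and static checkers (e.g. mypy's --no-implicit-reexport) see only these.
__all__ = ["PandasOnSparkPlotAccessor"]
{code}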






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37128) Application has been removed by master but driver still running

2021-10-27 Thread JacobZheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435189#comment-17435189
 ] 

JacobZheng commented on SPARK-37128:


My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how to 
reproduce it; it doesn't always come up. Sometimes it appears when an OOM 
exception occurs in the executor. [~hyukjin.kwon]

> Application has been removed by master but driver still running
> ---
>
> Key: SPARK-37128
> URL: https://issues.apache.org/jira/browse/SPARK-37128
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: JacobZheng
>Priority: Major
>
> {code:java}
> 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
> because it is EXITED
> 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 
> on worker worker-20210826183405-10.39.0.69-37147
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/2
> 21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/0
> 21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/3{code}
> As the logs show, Spark master removed my application. But my driver process 
> is still running. I would like to know what could be the cause of this and 
> how I can avoid it.
>  
> My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how 
> to reproduce it; it doesn't always come up. Sometimes it appears when an 
> OOM exception occurs in the executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37128) Application has been removed by master but driver still running

2021-10-27 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-37128:
---
Description: 
{code:java}
21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
because it is EXITED
21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on 
worker worker-20210826183405-10.39.0.69-37147
21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/2
21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/0
21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/3{code}
As the logs show, Spark master removed my application. But my driver process is 
still running. I would like to know what could be the cause of this and how I 
can avoid it.

 

My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how to 
reproduce it; it doesn't always come up. Sometimes it appears when an OOM 
exception occurs in the executor.

  was:
{code:java}
21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
because it is EXITED
21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on 
worker worker-20210826183405-10.39.0.69-37147
21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/2
21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/0
21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/3{code}
As the logs show, Spark master removed my application. But my driver process is 
still running. I would like to know what could be the cause of this and how I 
can avoid it.

 

My Spark version is 3.0.1 and I run Spark standalone on k8s.


> Application has been removed by master but driver still running
> ---
>
> Key: SPARK-37128
> URL: https://issues.apache.org/jira/browse/SPARK-37128
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: JacobZheng
>Priority: Major
>
> {code:java}
> 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
> because it is EXITED
> 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 
> on worker worker-20210826183405-10.39.0.69-37147
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/2
> 21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/0
> 21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/3{code}
> As the logs show, Spark master removed my application. But my driver process 
> is still running. I would like to know what could be the cause of this and 
> how I can avoid it.
>  
> My Spark version is 3.0.1 and I run Spark standalone on k8s. I don't know how 
> to reproduce it; it doesn't always come up. Sometimes it appears when an 
> OOM exception occurs in the executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37128) Application has been removed by master but driver still running

2021-10-27 Thread JacobZheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JacobZheng updated SPARK-37128:
---
Description: 
{code:java}
21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
because it is EXITED
21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on 
worker worker-20210826183405-10.39.0.69-37147
21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/2
21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/0
21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/3{code}
As the logs show, Spark master removed my application. But my driver process is 
still running. I would like to know what could be the cause of this and how I 
can avoid it.

 

My Spark version is 3.0.1 and I run Spark standalone on k8s.

  was:
{code:java}
21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
because it is EXITED
21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 on 
worker worker-20210826183405-10.39.0.69-37147
21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing it.
21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/4
21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/2
21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/0
21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
app-20210827190502-0030/3{code}
As the logs show, Spark master removed my application. But my driver process is 
still running. I would like to know what could be the cause of this and how I 
can avoid it.


> Application has been removed by master but driver still running
> ---
>
> Key: SPARK-37128
> URL: https://issues.apache.org/jira/browse/SPARK-37128
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: JacobZheng
>Priority: Major
>
> {code:java}
> 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
> because it is EXITED
> 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 
> on worker worker-20210826183405-10.39.0.69-37147
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/2
> 21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/0
> 21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/3{code}
> As the logs show, Spark master removed my application. But my driver process 
> is still running. I would like to know what could be the cause of this and 
> how I can avoid it.
>  
> My Spark version is 3.0.1 and I run Spark standalone on k8s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435187#comment-17435187
 ] 

Apache Spark commented on SPARK-37117:
--

User 'ggershinsky' has created a pull request for this issue:
https://github.com/apache/spark/pull/34415

> Can't read files in one of Parquet encryption modes (external keymaterial) 
> ---
>
> Key: SPARK-37117
> URL: https://issues.apache.org/jira/browse/SPARK-37117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gidon Gershinsky
>Priority: Major
>
> Parquet encryption has a number of modes. One of them is "external 
> keymaterial", which keeps encrypted data keys in a separate file (as opposed 
> to inside the Parquet file). Upon reading, the Spark Parquet connector does not 
> pass the file path, which causes an NPE. 
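
For illustration, a minimal PySpark sketch of the external-keymaterial mode (property names follow the parquet-mr/Spark columnar-encryption docs; the KMS client class is a placeholder, and the NPE location is as reported above, not re-verified here):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()

hconf.set("parquet.crypto.factory.class",
          "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
hconf.set("parquet.encryption.kms.client.class",
          "com.example.InMemoryKMS")  # placeholder KMS client
# External keymaterial: wrapped keys go into a sidecar JSON file next to
# each Parquet file instead of into the Parquet footer.
hconf.set("parquet.encryption.key.material.store.internally", "false")

spark.range(10).write \
    .option("parquet.encryption.footer.key", "k1") \
    .option("parquet.encryption.column.keys", "k2:id") \
    .parquet("/tmp/table.parquet.encrypted")

spark.read.parquet("/tmp/table.parquet.encrypted").show()  # NPE reported here
{code}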



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37117:


Assignee: Apache Spark

> Can't read files in one of Parquet encryption modes (external keymaterial) 
> ---
>
> Key: SPARK-37117
> URL: https://issues.apache.org/jira/browse/SPARK-37117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gidon Gershinsky
>Assignee: Apache Spark
>Priority: Major
>
> Parquet encryption has a number of modes. One of them is "external 
> keymaterial", which keeps encrypted data keys in a separate file (as opposed 
> to inside the Parquet file). Upon reading, the Spark Parquet connector does not 
> pass the file path, which causes an NPE. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37117:


Assignee: (was: Apache Spark)

> Can't read files in one of Parquet encryption modes (external keymaterial) 
> ---
>
> Key: SPARK-37117
> URL: https://issues.apache.org/jira/browse/SPARK-37117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gidon Gershinsky
>Priority: Major
>
> Parquet encryption has a number of modes. One of them is "external 
> keymaterial", which keeps encrypted data keys in a separate file (as opposed 
> to inside the Parquet file). Upon reading, the Spark Parquet connector does not 
> pass the file path, which causes an NPE. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435186#comment-17435186
 ] 

Apache Spark commented on SPARK-37117:
--

User 'ggershinsky' has created a pull request for this issue:
https://github.com/apache/spark/pull/34415

> Can't read files in one of Parquet encryption modes (external keymaterial) 
> ---
>
> Key: SPARK-37117
> URL: https://issues.apache.org/jira/browse/SPARK-37117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gidon Gershinsky
>Priority: Major
>
> Parquet encryption has a number of modes. One of them is "external 
> keymaterial", which keeps encrypted data keys in a separate file (as opposed 
> to inside the Parquet file). Upon reading, the Spark Parquet connector does not 
> pass the file path, which causes an NPE. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37139:


Assignee: Apache Spark

> Inline type hints for python/pyspark/taskcontext.py and 
> python/pyspark/version.py
> -
>
> Key: SPARK-37139
> URL: https://issues.apache.org/jira/browse/SPARK-37139
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37139:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/taskcontext.py and 
> python/pyspark/version.py
> -
>
> Key: SPARK-37139
> URL: https://issues.apache.org/jira/browse/SPARK-37139
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435172#comment-17435172
 ] 

Apache Spark commented on SPARK-37139:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34414

> Inline type hints for python/pyspark/taskcontext.py and 
> python/pyspark/version.py
> -
>
> Key: SPARK-37139
> URL: https://issues.apache.org/jira/browse/SPARK-37139
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-27 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-37141:
-
Description: 
After SPARK-35907, running `org.apache.spark.deploy.worker.WorkerSuite` on macOS 
(both M1 and Intel) fails:
{code:java}
mvn clean install -DskipTests -pl core -am
mvn test -pl core -Dtest=none 
-DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite
{code}
{code:java}
WorkerSuite:
- test isUseLocalNodeSSLConfig
- test maybeUpdateSSLSettings
- test clearing of finishedExecutors (small number of executors)
- test clearing of finishedExecutors (more executors)
- test clearing of finishedDrivers (small number of drivers)
- test clearing of finishedDrivers (more drivers)
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  47.973 s
[INFO] Finished at: 2021-10-28T13:46:56+08:00
[INFO] 
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test 
(test) on project spark-core_2.12: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
{code}
{code:java}
21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create 
directory /tmp
java.nio.file.FileAlreadyExistsException: /tmp
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:727)
        at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
        at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
        at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at 
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
 

  was:
Running `org.apache.spark.deploy.worker.WorkerSuite` on macOS (both M1 and Intel) 
fails:
{code:java}
mvn clean install -DskipTests -pl core -am
mvn test -pl core -Dtest=none 
-DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite
{code}
{code:java}
WorkerSuite:
- test isUseLocalNodeSSLConfig
- test maybeUpdateSSLSettings
- test clearing of finishedExecutors (small number of executors)
- test clearing of finishedExecutors (more executors)
- test clearing of finishedDrivers (small number of drivers)
- test clearing of finishedDrivers (more drivers)
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  47.973 s
[INFO] Finished at: 2021-10-28T13:46:56+08:00
[INFO] 
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test 
(test) on project spark-core_2.12: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
{code}
{code:java}
21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create 
directory /tmp
java.nio.file.FileAlreadyExistsException: /tmp
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:727)
        at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
        at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
        at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at 
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}

[jira] [Created] (SPARK-37141) WorkerSuite cannot run on Mac OS

2021-10-27 Thread Yang Jie (Jira)
Yang Jie created SPARK-37141:


 Summary: WorkerSuite cannot run on Mac OS
 Key: SPARK-37141
 URL: https://issues.apache.org/jira/browse/SPARK-37141
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.3.0
Reporter: Yang Jie


Running `org.apache.spark.deploy.worker.WorkerSuite` on macOS (both M1 and Intel) 
fails:
{code:java}
mvn clean install -DskipTests -pl core -am
mvn test -pl core -Dtest=none 
-DwildcardSuites=org.apache.spark.deploy.worker.WorkerSuite
{code}
{code:java}
WorkerSuite:
- test isUseLocalNodeSSLConfig
- test maybeUpdateSSLSettings
- test clearing of finishedExecutors (small number of executors)
- test clearing of finishedExecutors (more executors)
- test clearing of finishedDrivers (small number of drivers)
- test clearing of finishedDrivers (more drivers)
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time:  47.973 s
[INFO] Finished at: 2021-10-28T13:46:56+08:00
[INFO] 
[ERROR] Failed to execute goal org.scalatest:scalatest-maven-plugin:2.0.2:test 
(test) on project spark-core_2.12: There are test failures -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
{code}
{code:java}
21/10/28 13:46:56.133 dispatcher-event-loop-1 ERROR Utils: Failed to create 
directory /tmp
java.nio.file.FileAlreadyExistsException: /tmp
        at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:88)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
        at java.nio.file.Files.createDirectory(Files.java:674)
        at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
        at java.nio.file.Files.createDirectories(Files.java:727)
        at org.apache.spark.util.Utils$.createDirectory(Utils.scala:292)
        at org.apache.spark.deploy.worker.Worker.createWorkDir(Worker.scala:221)
        at org.apache.spark.deploy.worker.Worker.onStart(Worker.scala:232)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at 
org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at 
org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
{code}
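
The failure is consistent with /tmp being a symlink to /private/tmp on macOS: `Files.createDirectories` validates an existing entry with NOFOLLOW_LINKS, so a symlink fails the check even though it resolves to a directory. A small Python analogue of that behavior (illustrative only; the suite itself is JVM code):
{code:python}
import pathlib
import tempfile

base = pathlib.Path(tempfile.mkdtemp())
real = base / "private_tmp"
real.mkdir()
link = base / "tmp"
link.symlink_to(real)  # mimics /tmp -> /private/tmp on macOS

try:
    # Like Java's Files.createDirectory, a plain mkdir refuses the existing
    # symlink even though it points at a directory.
    link.mkdir()
except FileExistsError as err:
    print(f"FileExistsError: {err}")
{code}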
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords

2021-10-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37133.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34403
[https://github.com/apache/spark/pull/34403]
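
A quick illustration of the intended behavior, assuming the config name `spark.sql.ansi.enforceReservedKeywords` from the linked PR (a sketch, not re-verified against the merged code):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enforcement only applies when ANSI mode itself is on.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.ansi.enforceReservedKeywords", "true")

# With enforcement on, an ANSI reserved word such as ORDER can no longer
# double as an unquoted identifier; backquoting keeps the query valid.
spark.sql("SELECT 1 AS `order`").show()
{code}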

> Add a config to optionally enforce ANSI reserved keywords
> -
>
> Key: SPARK-37133
> URL: https://issues.apache.org/jira/browse/SPARK-37133
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435146#comment-17435146
 ] 

Apache Spark commented on SPARK-37140:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34413

> Inline type hints for python/pyspark/resultiterable.py
> --
>
> Key: SPARK-37140
> URL: https://issues.apache.org/jira/browse/SPARK-37140
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37140:


Assignee: Apache Spark

> Inline type hints for python/pyspark/resultiterable.py
> --
>
> Key: SPARK-37140
> URL: https://issues.apache.org/jira/browse/SPARK-37140
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435145#comment-17435145
 ] 

Apache Spark commented on SPARK-37140:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34413

> Inline type hints for python/pyspark/resultiterable.py
> --
>
> Key: SPARK-37140
> URL: https://issues.apache.org/jira/browse/SPARK-37140
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37140:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/resultiterable.py
> --
>
> Key: SPARK-37140
> URL: https://issues.apache.org/jira/browse/SPARK-37140
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37138) Support ANSI Interval in functions that support numeric type

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435144#comment-17435144
 ] 

Apache Spark commented on SPARK-37138:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34412

> Support ANSI Interval in functions that support numeric type
> 
>
> Key: SPARK-37138
> URL: https://issues.apache.org/jira/browse/SPARK-37138
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Support ANSI Interval in functions that support numeric type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37138) Support ANSI Interval in functions that support numeric type

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435143#comment-17435143
 ] 

Apache Spark commented on SPARK-37138:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34412

> Support ANSI Interval in functions that support numeric type
> 
>
> Key: SPARK-37138
> URL: https://issues.apache.org/jira/browse/SPARK-37138
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Support ANSI Interval in functions that support numeric type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37138) Support ANSI Interval in functions that support numeric type

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37138:


Assignee: (was: Apache Spark)

> Support ANSI Interval in functions that support numeric type
> 
>
> Key: SPARK-37138
> URL: https://issues.apache.org/jira/browse/SPARK-37138
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Support ANSI Interval in functions that support numeric type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37138) Support ANSI Interval in functions that support numeric type

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37138:


Assignee: Apache Spark

> Support ANSI Interval in functions that support numeric type
> 
>
> Key: SPARK-37138
> URL: https://issues.apache.org/jira/browse/SPARK-37138
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Support ANSI Interval in functions that support numeric type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37140) Inline type hints for python/pyspark/resultiterable.py

2021-10-27 Thread dch nguyen (Jira)
dch nguyen created SPARK-37140:
--

 Summary: Inline type hints for python/pyspark/resultiterable.py
 Key: SPARK-37140
 URL: https://issues.apache.org/jira/browse/SPARK-37140
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37139) Inline type hints for python/pyspark/taskcontext.py and python/pyspark/version.py

2021-10-27 Thread dch nguyen (Jira)
dch nguyen created SPARK-37139:
--

 Summary: Inline type hints for python/pyspark/taskcontext.py and 
python/pyspark/version.py
 Key: SPARK-37139
 URL: https://issues.apache.org/jira/browse/SPARK-37139
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.3.0
Reporter: dch nguyen
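
These inline-type-hint sub-tasks move annotations from the separate `.pyi` stubs into the `.py` modules themselves; a generic before/after sketch follows (the class body is a placeholder, not the real `taskcontext.py` code):
{code:python}
# Before: taskcontext.py carried no annotations; the types lived in a
# parallel stub file, taskcontext.pyi.
# After: hints are inlined, so they cannot drift from the implementation.
from typing import Optional


class TaskContext:
    """Placeholder stand-in, not the real pyspark class body."""

    _taskContext: Optional["TaskContext"] = None

    @classmethod
    def get(cls) -> Optional["TaskContext"]:
        return cls._taskContext

    def stageId(self) -> int:
        return 0  # placeholder body
{code}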






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37137) Inline type hints for python/pyspark/conf.py

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435142#comment-17435142
 ] 

Apache Spark commented on SPARK-37137:
--

User 'ByronHsu' has created a pull request for this issue:
https://github.com/apache/spark/pull/34411

> Inline type hints for python/pyspark/conf.py
> 
>
> Key: SPARK-37137
> URL: https://issues.apache.org/jira/browse/SPARK-37137
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37137) Inline type hints for python/pyspark/conf.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37137:


Assignee: Apache Spark

> Inline type hints for python/pyspark/conf.py
> 
>
> Key: SPARK-37137
> URL: https://issues.apache.org/jira/browse/SPARK-37137
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37137) Inline type hints for python/pyspark/conf.py

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37137:


Assignee: (was: Apache Spark)

> Inline type hints for python/pyspark/conf.py
> 
>
> Key: SPARK-37137
> URL: https://issues.apache.org/jira/browse/SPARK-37137
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37138) Support ANSI Interval in functions that support numeric type

2021-10-27 Thread angerszhu (Jira)
angerszhu created SPARK-37138:
-

 Summary: Support ANSI Interval in functions that support numeric 
type
 Key: SPARK-37138
 URL: https://issues.apache.org/jira/browse/SPARK-37138
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


Support ANSI Interval in functions that support numeric type
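
For illustration, the kind of call this would allow once ANSI intervals are accepted where numeric types are (the function choice below is an assumption about the scope, not taken from the PR):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# abs() stands in for whichever numeric functions the change actually
# extends; treat the choice as illustrative.
spark.sql("SELECT abs(INTERVAL '-10' MONTH)").show()
{code}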



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37137) Inline type hints for python/pyspark/conf.py

2021-10-27 Thread Byron Hsu (Jira)
Byron Hsu created SPARK-37137:
-

 Summary: Inline type hints for python/pyspark/conf.py
 Key: SPARK-37137
 URL: https://issues.apache.org/jira/browse/SPARK-37137
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Byron Hsu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37136) Remove code about Hive built-in functions

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37136:


Assignee: (was: Apache Spark)

> Remove code about Hive built-in functions
> -
>
> Key: SPARK-37136
> URL: https://issues.apache.org/jira/browse/SPARK-37136
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Since we have implemented `histogram_numeric`, we can now remove the code for 
> the Hive built-in functions.
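
For reference, `histogram_numeric` is now callable as a native Spark SQL function; a quick sketch (the column and bucket count are arbitrary):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# histogram_numeric(expr, nb) returns nb approximate histogram buckets
# as an array of (x, y) structs; 5 buckets over a numeric column here.
spark.sql("SELECT histogram_numeric(id, 5) AS hist FROM range(100)") \
    .show(truncate=False)
{code}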



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37136) Remove code about Hive built-in functions

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37136:


Assignee: Apache Spark

> Remove code about Hive built-in functions
> -
>
> Key: SPARK-37136
> URL: https://issues.apache.org/jira/browse/SPARK-37136
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> Since we have implemented `histogram_numeric`, we can now remove the code for 
> the Hive built-in functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37136) Remove code about Hive built-in functions

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435134#comment-17435134
 ] 

Apache Spark commented on SPARK-37136:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34410

> Remove code about Hive built-in functions
> -
>
> Key: SPARK-37136
> URL: https://issues.apache.org/jira/browse/SPARK-37136
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Since we have implemented `histogram_numeric`, we can now remove the code for 
> the Hive built-in functions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37135) Fix some micro-benchmarks that fail to run

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37135:


Assignee: Apache Spark

> Fix some micro-benchmarks that fail to run
> -
>
> Key: SPARK-37135
> URL: https://issues.apache.org/jira/browse/SPARK-37135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Two micro-benchmarks fail to run:
>  
> org.apache.spark.serializer.KryoSerializerBenchmark
> {code:java}
> Running org.apache.spark.serializer.KryoSerializerBenchmark:
> Running benchmark: Benchmark KryoPool vs old"pool of 1" implementation
>   Running case: KryoPool:true
> 21/10/27 16:09:26 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: spark.test.home is not set!
>  at scala.Predef$.assert(Predef.scala:223)
>  at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
>  at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
>  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71)
>  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
>  at scala.collection.immutable.Range.foreach(Range.scala:158)
>  at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
>  at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:562)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:138)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>  at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>  at scala.util.Success.$anonfun$map$1(Try.scala:255)
>  at scala.util.Success.map(Try.scala:213)
>  at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>  at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>  at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>  at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>  at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
>  at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
>  at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
> {code}
> org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
> {code:java}
> Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
> Cannot mix year-month and day-time fields: interval 1 month 2 day(line 1, pos 38)
>
> == SQL ==
> cast(timestamp_seconds(id) as date) + interval 1 month 2 day
> --------------------------------------^^^
>
>  at org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479)
>  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
>  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681)
> {code}
>  
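
The DateTimeBenchmark failure reduces to the mixed-unit interval literal and is reproducible outside the harness; splitting the literal into two single-category intervals is one plausible rewrite (a sketch under that assumption, not necessarily the fix the PR takes):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

try:
    # Mixing year-month and day-time units in one literal is rejected:
    spark.sql("SELECT current_date() + INTERVAL 1 MONTH 2 DAY")
except Exception as err:
    print(err)  # Cannot mix year-month and day-time fields

# Two single-category interval literals parse fine:
spark.sql("SELECT current_date() + INTERVAL '1' MONTH + INTERVAL '2' DAY").show()
{code}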



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37135) Fix some micro-benchmarks that fail to run

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435133#comment-17435133
 ] 

Apache Spark commented on SPARK-37135:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/34409

> Fix some micro-benchmarks that fail to run
> -
>
> Key: SPARK-37135
> URL: https://issues.apache.org/jira/browse/SPARK-37135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Two micro-benchmarks fail to run:
>  
> org.apache.spark.serializer.KryoSerializerBenchmark
> {code:java}
> Running org.apache.spark.serializer.KryoSerializerBenchmark:
> Running benchmark: Benchmark KryoPool vs old"pool of 1" implementation
>   Running case: KryoPool:true
> 21/10/27 16:09:26 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: spark.test.home is not set!
>  at scala.Predef$.assert(Predef.scala:223)
>  at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
>  at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
>  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71)
>  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
>  at scala.collection.immutable.Range.foreach(Range.scala:158)
>  at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
>  at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:562)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:138)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>  at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>  at scala.util.Success.$anonfun$map$1(Try.scala:255)
>  at scala.util.Success.map(Try.scala:213)
>  at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>  at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>  at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>  at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>  at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
>  at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
>  at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
> {code}
> org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
> {code:java}
> Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
> Cannot mix year-month and day-time fields: interval 1 month 2 day(line 1, pos 38)
>
> == SQL ==
> cast(timestamp_seconds(id) as date) + interval 1 month 2 day
> --------------------------------------^^^
>
>  at org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479)
>  at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454)
>  at org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
>  at org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37135) Fix some micro-benchmarks that fail to run

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37135:


Assignee: Apache Spark

> Fix some micro-benchmarks that fail to run
> -
>
> Key: SPARK-37135
> URL: https://issues.apache.org/jira/browse/SPARK-37135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Minor
>
> Two micro-benchmarks fail to run:
>  
> org.apache.spark.serializer.KryoSerializerBenchmark
> {code:java}
> Running org.apache.spark.serializer.KryoSerializerBenchmark:
> Running benchmark: Benchmark KryoPool vs old"pool of 1" implementation
>   Running case: KryoPool:true
> 21/10/27 16:09:26 ERROR SparkContext: Error initializing SparkContext.
> java.lang.AssertionError: assertion failed: spark.test.home is not set!
>  at scala.Predef$.assert(Predef.scala:223)
>  at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:148)
>  at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
>  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71)
>  at org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
>  at scala.collection.immutable.Range.foreach(Range.scala:158)
>  at org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65)
>  at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:562)
>  at org.apache.spark.SparkContext.<init>(SparkContext.scala:138)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
>  at org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
>  at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>  at scala.util.Success.$anonfun$map$1(Try.scala:255)
>  at scala.util.Success.map(Try.scala:213)
>  at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>  at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>  at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>  at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
>  at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
>  at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
>  at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
> {code}
> org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
> {code:java}
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.parser.ParseException: Exception in thread 
> "main" org.apache.spark.sql.catalyst.parser.ParseException: Cannot mix 
> year-month and day-time fields: interval 1 month 2 day(line 1, pos 38)
> == SQL ==cast(timestamp_seconds(id) as date) + interval 1 month 2 
> day--^^^
>  at 
> org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681)
> {code}
>  






[jira] [Assigned] (SPARK-37135) Fix some micro-benchmarks that fail to run

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37135:


Assignee: (was: Apache Spark)

> Fix some micro-benchmarks that fail to run
> -
>
> Key: SPARK-37135
> URL: https://issues.apache.org/jira/browse/SPARK-37135
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> Two micro-benchmarks fail to run:
>  
> org.apache.spark.serializer.KryoSerializerBenchmark
> {code:java}
> Running org.apache.spark.serializer.KryoSerializerBenchmark:Running 
> org.apache.spark.serializer.KryoSerializerBenchmark:Running benchmark: 
> Benchmark KryoPool vs old"pool of 1" implementation  Running case: 
> KryoPool:true21/10/27 16:09:26 ERROR SparkContext: Error initializing 
> SparkContext.java.lang.AssertionError: assertion failed: spark.test.home is 
> not set! at scala.Predef$.assert(Predef.scala:223) at 
> org.apache.spark.deploy.worker.Worker.(Worker.scala:148) at 
> org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954)
>  at 
> org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71)
>  at 
> org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
>  at scala.collection.immutable.Range.foreach(Range.scala:158) at 
> org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65) 
> at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971)
>  at org.apache.spark.SparkContext.(SparkContext.scala:562) at 
> org.apache.spark.SparkContext.(SparkContext.scala:138) at 
> org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
>  at 
> org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
>  at 
> org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
>  at 
> org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
>  at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) at 
> scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at 
> scala.util.Success.$anonfun$map$1(Try.scala:255) at 
> scala.util.Success.map(Try.scala:213) at 
> scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at 
> scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at 
> scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at 
> scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at 
> java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
>  at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) 
> at 
> java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
>  at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) 
> at 
> java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
> at 
> java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183){code}
> org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
> {code:java}
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.parser.ParseException: Exception in thread 
> "main" org.apache.spark.sql.catalyst.parser.ParseException: Cannot mix 
> year-month and day-time fields: interval 1 month 2 day(line 1, pos 38)
> == SQL ==cast(timestamp_seconds(id) as date) + interval 1 month 2 
> day--^^^
>  at 
> org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479)
>  at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454)
>  at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
>  at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681)
> {code}
>  






[jira] [Created] (SPARK-37136) Remove code for Hive built-in functions

2021-10-27 Thread angerszhu (Jira)
angerszhu created SPARK-37136:
-

 Summary: Remove code for Hive built-in functions
 Key: SPARK-37136
 URL: https://issues.apache.org/jira/browse/SPARK-37136
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.2.0
Reporter: angerszhu


Since we have implemented `histogram_numeric`, we can now remove the code for 
Hive built-in functions.






[jira] [Created] (SPARK-37135) Fix some micro-benchmarks that fail to run

2021-10-27 Thread Yang Jie (Jira)
Yang Jie created SPARK-37135:


 Summary: Fix some micro-benchmarks that fail to run
 Key: SPARK-37135
 URL: https://issues.apache.org/jira/browse/SPARK-37135
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 3.3.0
Reporter: Yang Jie


Two micro-benchmarks fail to run:

 

org.apache.spark.serializer.KryoSerializerBenchmark
{code:java}
Running org.apache.spark.serializer.KryoSerializerBenchmark:Running 
org.apache.spark.serializer.KryoSerializerBenchmark:Running benchmark: 
Benchmark KryoPool vs old"pool of 1" implementation  Running case: 
KryoPool:true21/10/27 16:09:26 ERROR SparkContext: Error initializing 
SparkContext.java.lang.AssertionError: assertion failed: spark.test.home is not 
set! at scala.Predef$.assert(Predef.scala:223) at 
org.apache.spark.deploy.worker.Worker.(Worker.scala:148) at 
org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:954) 
at 
org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2(LocalSparkCluster.scala:71)
 at 
org.apache.spark.deploy.LocalSparkCluster.$anonfun$start$2$adapted(LocalSparkCluster.scala:65)
 at scala.collection.immutable.Range.foreach(Range.scala:158) at 
org.apache.spark.deploy.LocalSparkCluster.start(LocalSparkCluster.scala:65) at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2971)
 at org.apache.spark.SparkContext.(SparkContext.scala:562) at 
org.apache.spark.SparkContext.(SparkContext.scala:138) at 
org.apache.spark.serializer.KryoSerializerBenchmark$.createSparkContext(KryoSerializerBenchmark.scala:86)
 at 
org.apache.spark.serializer.KryoSerializerBenchmark$.sc$lzycompute$1(KryoSerializerBenchmark.scala:58)
 at 
org.apache.spark.serializer.KryoSerializerBenchmark$.sc$1(KryoSerializerBenchmark.scala:58)
 at 
org.apache.spark.serializer.KryoSerializerBenchmark$.$anonfun$run$3(KryoSerializerBenchmark.scala:63)
 at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23) at 
scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at 
scala.util.Success.$anonfun$map$1(Try.scala:255) at 
scala.util.Success.map(Try.scala:213) at 
scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at 
scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at 
scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at 
scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at 
java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
 at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) 
at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
 at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) 
at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183){code}
org.apache.spark.sql.execution.benchmark.DateTimeBenchmark
{code:java}
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
Cannot mix year-month and day-time fields: interval 1 month 2 day(line 1, pos 
38)
== SQL ==cast(timestamp_seconds(id) as date) + interval 1 month 2 
day--^^^
 at 
org.apache.spark.sql.errors.QueryParsingErrors$.mixedIntervalUnitsError(QueryParsingErrors.scala:214)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.constructMultiUnitsIntervalLiteral(AstBuilder.scala:2435)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.$anonfun$visitInterval$1(AstBuilder.scala:2479)
 at 
org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:133)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:2454)
 at 
org.apache.spark.sql.catalyst.parser.AstBuilder.visitInterval(AstBuilder.scala:57)
 at 
org.apache.spark.sql.catalyst.parser.SqlBaseParser$IntervalContext.accept(SqlBaseParser.java:17681)
{code}
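
A possible workaround for the DateTimeBenchmark failure, sketched under the assumption that the benchmark only needs the date arithmetic and not a single mixed literal (this is not the committed fix): the ANSI interval work split intervals into distinct year-month and day-time types, so a literal mixing {{month}} and {{day}} fields is rejected, but adding the two parts separately keeps the same arithmetic.
{code:java}
// Minimal sketch, assuming a SparkSession `spark` and any small range table.
spark.range(5).createOrReplaceTempView("t")
spark.sql(
  """SELECT CAST(timestamp_seconds(id) AS date)
    |       + INTERVAL '1' MONTH + INTERVAL '2' DAY AS d
    |FROM t""".stripMargin).show()
{code}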
 






[jira] [Resolved] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37036.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34389
[https://github.com/apache/spark/pull/34389]

> Add util function to raise advice warning for pandas API on Spark.
> --
>
> Key: SPARK-37036
> URL: https://issues.apache.org/jira/browse/SPARK-37036
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.3.0
>
>
> Pandas API on Spark has some features that can potentially cause performance 
> degradation or unexpected behavior, e.g. `sort_index`, `index_col`, 
> `to_pandas`, etc.
>  
> We should raise a proper advice warning for those functions so that users 
> can make their pandas-on-Spark code base more robust.






[jira] [Assigned] (SPARK-37036) Add util function to raise advice warning for pandas API on Spark.

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37036:


Assignee: Haejoon Lee

> Add util function to raise advice warning for pandas API on Spark.
> --
>
> Key: SPARK-37036
> URL: https://issues.apache.org/jira/browse/SPARK-37036
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Pandas API on Spark has some features that can potentially cause performance 
> degradation or unexpected behavior, e.g. `sort_index`, `index_col`, 
> `to_pandas`, etc.
>  
> We should raise a proper advice warning for those functions so that users 
> can make their pandas-on-Spark code base more robust.






[jira] [Updated] (SPARK-37119) parse_url cannot handle `{` and `}` correctly

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37119:
-
Priority: Major  (was: Critical)

> parse_url cannot handle `{` and `}` correctly
> --
>
> Key: SPARK-37119
> URL: https://issues.apache.org/jira/browse/SPARK-37119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.2.0, 3.3.0
>Reporter: Liu Shuo
>Priority: Major
>
> When we execute the following SQL command:
> {code:java}
> select parse_url('http://facebook.com/path/p1.php?query={aa}', 'QUERY')
> {code}
> the expected result:
>     query=\{aa}
> the actual result:
>     null
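> One possible workaround, sketched under the assumption that the caller can pre-encode the URL (this is also why the issue reproduces: {{parse_url}} delegates to {{java.net.URI}}, which rejects raw braces and yields null for any URL it cannot parse):
> {code:java}
> // Hedged sketch, assuming a SparkSession `spark`: percent-encode the braces first.
> spark.sql(
>   "SELECT parse_url('http://facebook.com/path/p1.php?query=%7Baa%7D', 'QUERY')"
> ).show(truncate = false)
> // returns query=%7Baa%7D instead of null
> {code}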






[jira] [Resolved] (SPARK-37119) parse_url cannot handle `{` and `}` correctly

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37119.
--
Resolution: Invalid

> parse_url cannot handle `{` and `}` correctly
> --
>
> Key: SPARK-37119
> URL: https://issues.apache.org/jira/browse/SPARK-37119
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.2.0, 3.3.0
>Reporter: Liu Shuo
>Priority: Critical
>
> When we execute the following SQL command:
> {code:java}
> select parse_url('http://facebook.com/path/p1.php?query={aa}', 'QUERY')
> {code}
> the expected result:
>     query=\{aa}
> the actual result:
>     null






[jira] [Updated] (SPARK-37117) Can't read files in one of Parquet encryption modes (external keymaterial)

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37117:
-
Target Version/s:   (was: 3.2.1)

> Can't read files in one of Parquet encryption modes (external keymaterial) 
> ---
>
> Key: SPARK-37117
> URL: https://issues.apache.org/jira/browse/SPARK-37117
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Gidon Gershinsky
>Priority: Major
>
> Parquet encryption has a number of modes. One of them is "external 
> keymaterial", which keeps encrypted data keys in a separate file (as opposed 
> to inside the Parquet file). Upon reading, the Spark Parquet connector does not 
> pass the file path, which causes an NPE. 
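> A reproduction sketch; the property names below follow parquet-mr's PropertiesDrivenCryptoFactory and the KMS client class is a hypothetical stub, so treat all of them as assumptions. External key material writes a sidecar key-material JSON file next to each Parquet file, and the read path is where the NPE surfaces.
> {code:java}
> // Hedged sketch, assuming a SparkSession `spark` and a KMS stub on the classpath.
> val hc = spark.sparkContext.hadoopConfiguration
> hc.set("parquet.crypto.factory.class",
>   "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
> hc.set("parquet.encryption.kms.client.class", "com.example.InMemoryKMS") // hypothetical stub
> hc.set("parquet.encryption.key.material.store.internally", "false")      // external key material
> hc.set("parquet.encryption.footer.key", "k1")                            // hypothetical key id
> spark.range(10).write.parquet("/tmp/enc")  // write succeeds, sidecar files appear
> spark.read.parquet("/tmp/enc").show()      // reading is where the NPE is reported
> {code}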






[jira] [Updated] (SPARK-37122) java.lang.IllegalArgumentException Related to Prometheus

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37122:
-
Priority: Major  (was: Critical)

> java.lang.IllegalArgumentException Related to Prometheus
> 
>
> Key: SPARK-37122
> URL: https://issues.apache.org/jira/browse/SPARK-37122
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Biswa Singh
>Priority: Major
>
> This issue is similar to 
> https://issues.apache.org/jira/browse/SPARK-35237?focusedCommentId=17340723&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17340723.
>  We receive the following warning continuously:
>  
> 21:00:26.277 [rpc-server-4-2] WARN  o.a.s.n.s.TransportChannelHandler - 
> Exception in connection from 
> /10.198.3.179:51184java.lang.IllegalArgumentException: Too large frame: 
> 5135603447297303916 at 
> org.sparkproject.guava.base.Preconditions.checkArgument(Preconditions.java:119)
>  at 
> org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:148)
>  at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>  at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>  at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>  at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719) 
> at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
>  at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581) 
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) at 
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
>  at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) 
> at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Unknown Source)
>  
> Below are other details related to Prometheus and my findings. Please SCROLL 
> DOWN to see the details:
>  
> {noformat}
> Prometheus Scrape Configuration
> ===
> - job_name: 'kubernetes-pods'
>   kubernetes_sd_configs:
> - role: pod
>   relabel_configs:
> - action: labelmap
>   regex: __meta_kubernetes_pod_label_(.+)
> - source_labels: [__meta_kubernetes_namespace]
>   action: replace
>   target_label: kubernetes_namespace
> - source_labels: [__meta_kubernetes_pod_name]
>   action: replace
>   target_label: kubernetes_pod_name
> - source_labels: 
> [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
>   action: keep
>   regex: true
> - source_labels: 
> [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
>   action: replace
>   target_label: __scheme__
>   regex: (https?)
> - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
>   action: replace
>   target_label: __metrics_path__
>   regex: (.+)
> - source_labels: [__address__, 
> __meta_kubernetes_pod_prometheus_io_port]
>   action: replace
>   target_label: __address__
>   regex: ([^:]+)(?::\d+)?;(\d+)
>   replacement: $1:$2
> tcptrack command output in spark3 pod
> ==
> 10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
> 10.198.22.240:51258  10.198.40.143:7079  CLOSED 10s 0 B/s
> 10.198.22.240:50354  10.198.40.143:7079  CLOSED 40s 0 B/s
> 10.198.22.240:33152  10.198.40.143:4040  ESTABLISHED 2s 0 B/s
> 10.198.22.240:47726  10.198.40.143:8090  ESTABLISHED 9s 0 B/s
> 10.198.22.240 = prometheus pod ip
> 10.198.40.143 = testpod ip 
> Issue
> ==
> Though the scrape config is expected to scrape only on port 8090, I see Prometheus 
> try to initiate scrapes on ports like 7079, 7078, 4040, etc. on
> the spark3 pod, hence the exception in the spark3 pod. But is this really a 
> p

[jira] [Issue Comment Deleted] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py

2021-10-27 Thread Byron Hsu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Byron Hsu updated SPARK-37095:
--
Comment: was deleted

(was: test)

> Inline type hints for files in python/pyspark/broadcast.py
> --
>
> Key: SPARK-37095
> URL: https://issues.apache.org/jira/browse/SPARK-37095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Commented] (SPARK-37128) Application has been removed by master but driver still running

2021-10-27 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435121#comment-17435121
 ] 

Hyukjin Kwon commented on SPARK-37128:
--

Can you share the steps to reproduce the issue? Which environment did you use?

> Application has been removed by master but driver still running
> ---
>
> Key: SPARK-37128
> URL: https://issues.apache.org/jira/browse/SPARK-37128
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: JacobZheng
>Priority: Major
>
> {code:java}
> 21/08/30 10:27:31 INFO Master: Removing executor app-20210827190502-0030/1 
> because it is EXITED
> 21/08/30 10:27:31 INFO Master: Launching executor app-20210827190502-0030/4 
> on worker worker-20210826183405-10.39.0.69-37147
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:47160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: 10.39.0.68:35160 got disassociated, removing 
> it.
> 21/08/30 10:27:31 INFO Master: Removing app app-20210827190502-0030
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:31 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/4
> 21/08/30 10:27:46 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/2
> 21/08/30 10:27:48 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/0
> 21/08/30 10:27:50 WARN Master: Got status update for unknown executor 
> app-20210827190502-0030/3{code}
> As the logs show, the Spark master removed my application, but my driver process 
> is still running. I would like to know what could cause this and 
> how I can avoid it.






[jira] [Commented] (SPARK-37131) Support using IN/EXISTS with subqueries in Project/Aggregate

2021-10-27 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435120#comment-17435120
 ] 

Hyukjin Kwon commented on SPARK-37131:
--

cc [~allisonwang-db] and [~cloud_fan] FYI

> Support using IN/EXISTS with subqueries in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
> {code:java}
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
> INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
> CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
> INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
> select c1 in (select col1 from tbl1) from tbl2 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Project []
> case 2:
> select count(1), case when c1 in (select col1 from tbl1) then "A" else 
> "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) 
> then "A" else "B" end 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Aggregate []
> {code}
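> One possible interim rewrite, sketched under the assumption that NULL semantics can be ignored (a LEFT JOIN yields false where IN would yield NULL): materialize the predicate with a join so it can appear in the SELECT list today.
> {code:java}
> // Hedged sketch for case 1, assuming a SparkSession `spark` and the tables above.
> spark.sql("""
>   SELECT t2.c1, (t1.col1 IS NOT NULL) AS c1_in_tbl1
>   FROM tbl2 t2
>   LEFT JOIN (SELECT DISTINCT col1 FROM tbl1) t1 ON t2.c1 = t1.col1
> """).show()
> {code}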






[jira] [Commented] (SPARK-37095) Inline type hints for files in python/pyspark/broadcast.py

2021-10-27 Thread Byron Hsu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435119#comment-17435119
 ] 

Byron Hsu commented on SPARK-37095:
---

test

> Inline type hints for files in python/pyspark/broadcast.py
> --
>
> Key: SPARK-37095
> URL: https://issues.apache.org/jira/browse/SPARK-37095
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Resolved] (SPARK-37132) Incorrect Spark 3.2.0 package names with included Hadoop binaries

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37132.
--
Resolution: Duplicate

> Incorrect Spark 3.2.0 package names with included Hadoop binaries
> -
>
> Key: SPARK-37132
> URL: https://issues.apache.org/jira/browse/SPARK-37132
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Documentation
>Affects Versions: 3.2.0
>Reporter: Denis Krivenko
>Priority: Trivial
>
> *Spark 3.2.0+Hadoop* packages contain Hadoop 3.3 binaries; however, the file 
> names still refer to Hadoop 3.2, i.e. _spark-3.2.0-bin-*hadoop3.2*.tgz_
> [https://dlcdn.apache.org/spark/spark-3.2.0/]
> [https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz]
> [https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz]
>  






[jira] [Commented] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435117#comment-17435117
 ] 

Hyukjin Kwon commented on SPARK-37134:
--

They are individual items so OR is correct. Feel free to create a PR to clarify 
them.

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Major
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37134:
-
Target Version/s:   (was: 1.6.2)

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Critical
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37134:
-
Shepherd:   (was: Tenovip33)

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Major
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37134:
-
Priority: Major  (was: Critical)

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Major
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37134:
-
Environment: (was: ?)

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Critical
> Fix For: 1.6.2
>
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37134:
-
Fix Version/s: (was: 1.6.2)

> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
>Reporter: carl rees
>Priority: Critical
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Resolved] (SPARK-37121) TestUtils.isPythonVersionAtLeast38 returns incorrect results

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37121.
--
Fix Version/s: 3.2.1
   3.3.0
   Resolution: Fixed

Issue resolved by pull request 34395
[https://github.com/apache/spark/pull/34395]

> TestUtils.isPythonVersionAtLeast38 returns incorrect results
> 
>
> Key: SPARK-37121
> URL: https://issues.apache.org/jira/browse/SPARK-37121
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.3.0, 3.2.1
>
>
> I was working on {{HiveExternalCatalogVersionsSuite}} recently and noticed 
> that it was never running against the Spark 2.x release lines, only the 3.x 
> ones. The problem was coming from here, specifically the Python 3.8+ version 
> check:
> {code}
> versions
>   .filter(v => v.startsWith("3") || !TestUtils.isPythonVersionAtLeast38())
>   .filter(v => v.startsWith("3") || 
> !SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
> {code}
> I found that {{TestUtils.isPythonVersionAtLeast38()}} was always returning 
> true, even when my system installation of Python3 was 3.7. Thinking it was an 
> environment issue, I pulled up a debugger to check which version of Python 
> the test JVM was seeing, and it was in fact Python 3.7.
> Turns out the issue is with the {{isPythonVersionAtLeast38}} method:
> {code}
>   def isPythonVersionAtLeast38(): Boolean = {
> val attempt = if (Utils.isWindows) {
>   Try(Process(Seq("cmd.exe", "/C", "python3 --version"))
> .run(ProcessLogger(s => s.startsWith("Python 3.8") || 
> s.startsWith("Python 3.9")))
> .exitValue())
> } else {
>   Try(Process(Seq("sh", "-c", "python3 --version"))
> .run(ProcessLogger(s => s.startsWith("Python 3.8") || 
> s.startsWith("Python 3.9")))
> .exitValue())
> }
> attempt.isSuccess && attempt.get == 0
>   }
> {code}
> It's trying to evaluate the version of Python using a {{ProcessLogger}}, but 
> the logger accepts a {{String => Unit}} function, i.e., it does not make use 
> of the return value in any way (since it's meant for logging). So the results 
> of the {{startsWith}} checks are thrown away, and {{attempt.isSuccess && 
> attempt.get == 0}} will always be true as long as your system has a 
> {{python3}} binary of any version.
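> A minimal sketch of one possible fix, assuming the same imports and {{Utils}} helper as the original (the committed patch may differ): record the match in a local flag instead of relying on the callback's discarded return value.
> {code}
> def isPythonVersionAtLeast38(): Boolean = {
>   var atLeast38 = false // set by the logger callback as version lines arrive
>   val cmd = if (Utils.isWindows) {
>     Seq("cmd.exe", "/C", "python3 --version")
>   } else {
>     Seq("sh", "-c", "python3 --version")
>   }
>   val attempt = Try(Process(cmd)
>     .run(ProcessLogger { s =>
>       if (s.startsWith("Python 3.8") || s.startsWith("Python 3.9")) atLeast38 = true
>     })
>     .exitValue())
>   attempt.isSuccess && attempt.get == 0 && atLeast38
> }
> {code}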






[jira] [Assigned] (SPARK-37121) TestUtils.isPythonVersionAtLeast38 returns incorrect results

2021-10-27 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37121:


Assignee: Erik Krogen

> TestUtils.isPythonVersionAtLeast38 returns incorrect results
> 
>
> Key: SPARK-37121
> URL: https://issues.apache.org/jira/browse/SPARK-37121
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.2.0
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
>
> I was working on {{HiveExternalCatalogVersionsSuite}} recently and noticed 
> that it was never running against the Spark 2.x release lines, only the 3.x 
> ones. The problem was coming from here, specifically the Python 3.8+ version 
> check:
> {code}
> versions
>   .filter(v => v.startsWith("3") || !TestUtils.isPythonVersionAtLeast38())
>   .filter(v => v.startsWith("3") || 
> !SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9))
> {code}
> I found that {{TestUtils.isPythonVersionAtLeast38()}} was always returning 
> true, even when my system installation of Python3 was 3.7. Thinking it was an 
> environment issue, I pulled up a debugger to check which version of Python 
> the test JVM was seeing, and it was in fact Python 3.7.
> Turns out the issue is with the {{isPythonVersionAtLeast38}} method:
> {code}
>   def isPythonVersionAtLeast38(): Boolean = {
> val attempt = if (Utils.isWindows) {
>   Try(Process(Seq("cmd.exe", "/C", "python3 --version"))
> .run(ProcessLogger(s => s.startsWith("Python 3.8") || 
> s.startsWith("Python 3.9")))
> .exitValue())
> } else {
>   Try(Process(Seq("sh", "-c", "python3 --version"))
> .run(ProcessLogger(s => s.startsWith("Python 3.8") || 
> s.startsWith("Python 3.9")))
> .exitValue())
> }
> attempt.isSuccess && attempt.get == 0
>   }
> {code}
> It's trying to evaluate the version of Python using a {{ProcessLogger}}, but 
> the logger accepts a {{String => Unit}} function, i.e., it does not make use 
> of the return value in any way (since it's meant for logging). So the results 
> of the {{startsWith}} checks are thrown away, and {{attempt.isSuccess && 
> attempt.get == 0}} will always be true as long as your system has a 
> {{python3}} binary of any version.






[jira] [Commented] (SPARK-37047) Add overloads for lpad and rpad for BINARY strings

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17435105#comment-17435105
 ] 

Apache Spark commented on SPARK-37047:
--

User 'mkaravel' has created a pull request for this issue:
https://github.com/apache/spark/pull/34407

> Add overloads for lpad and rpad for BINARY strings
> --
>
> Key: SPARK-37047
> URL: https://issues.apache.org/jira/browse/SPARK-37047
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Menelaos Karavelas
>Assignee: Menelaos Karavelas
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, `lpad` and `rpad` accept BINARY strings as input (both the input 
> string to be padded and the padding pattern), and these strings get cast to 
> UTF8 strings. The result of the operation is a UTF8 string, which may be 
> invalid as it can contain non-UTF8 byte sequences.
> What we would like to do is overload `lpad` and `rpad` to accept BINARY 
> strings as inputs (both the string to be padded and the padding pattern) 
> and produce a left- or right-padded BINARY string as output.
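> A short illustration of the intended overload; the exact result is an assumption until the change lands ({{x'...'}} is Spark SQL's hexadecimal binary literal):
> {code:java}
> // Hedged sketch of the expected behavior, assuming a SparkSession `spark`.
> spark.sql("SELECT lpad(x'1020', 5, x'05')").show()
> // expected: the 5-byte BINARY value 0x0505051020, with no UTF8 round-trip
> {code}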






[jira] [Updated] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread carl rees (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

carl rees updated SPARK-37134:
--
Description: 
Sorry, no idea on the version it affects or what Shepherd is; there is no 
explanation on this form, so I guessed!

 

This page of your documentation is UNCLEAR

paragraph "Using PySpark Native Features" QUOTE

"PySpark allows to upload Python files ({{.py}}), zipped Python packages 
({{.zip}}), and Egg files ({{.egg}}) to the executors by:
 * Setting the configuration setting {{spark.submit.pyFiles}}

 * Setting {{--py-files}} option in Spark scripts

 * Directly calling 
[{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
 in applications

 

QUESTION: is this all of the above or each of the above steps?

suggest adding "OR" between each bullet point?

 

[https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]

 

  was:
Sorry, no idea on the version it affects or what Shepherd is; there is no 
explanation on this form, so I guessed!

 

This page of your documentation is UNCLEAR

paragraph "Using PySpark Native Features" QUOTE

"PySpark allows to upload Python files ({{.py}}), zipped Python packages 
({{.zip}}), and Egg files ({{.egg}}) to the executors by:
 * Setting the configuration setting {{spark.submit.pyFiles}}

 * Setting {{--py-files}} option in Spark scripts

 * Directly calling 
[{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
 in applications

 

QUESTION: is this all of the above or each of the above steps?

 

[https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]

 


> documentation - unclear "Using PySpark Native Features"
> ---
>
> Key: SPARK-37134
> URL: https://issues.apache.org/jira/browse/SPARK-37134
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 1.6.2
> Environment: ?
>Reporter: carl rees
>Priority: Critical
> Fix For: 1.6.2
>
>
> Sorry, no idea on the version it affects or what Shepherd is; there is no 
> explanation on this form, so I guessed!
>  
> This page of your documentation is UNCLEAR
> paragraph "Using PySpark Native Features" QUOTE
> "PySpark allows to upload Python files ({{.py}}), zipped Python packages 
> ({{.zip}}), and Egg files ({{.egg}}) to the executors by:
>  * Setting the configuration setting {{spark.submit.pyFiles}}
>  * Setting {{--py-files}} option in Spark scripts
>  * Directly calling 
> [{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
>  in applications
>  
> QUESTION: is this all of the above or each of the above steps?
> suggest adding "OR" between each bullet point?
>  
> [https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]
>  






[jira] [Created] (SPARK-37134) documentation - unclear "Using PySpark Native Features"

2021-10-27 Thread carl rees (Jira)
carl rees created SPARK-37134:
-

 Summary: documentation - unclear "Using PySpark Native Features"
 Key: SPARK-37134
 URL: https://issues.apache.org/jira/browse/SPARK-37134
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.6.2
 Environment: ?
Reporter: carl rees
 Fix For: 1.6.2


Sorry, no idea on the version it affects or what Shepherd is; there is no 
explanation on this form, so I guessed!

 

This page of your documentation is UNCLEAR

paragraph "Using PySpark Native Features" QUOTE

"PySpark allows to upload Python files ({{.py}}), zipped Python packages 
({{.zip}}), and Egg files ({{.egg}}) to the executors by:
 * Setting the configuration setting {{spark.submit.pyFiles}}

 * Setting {{--py-files}} option in Spark scripts

 * Directly calling 
[{{pyspark.SparkContext.addPyFile()}}|https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.SparkContext.addPyFile.html#pyspark.SparkContext.addPyFile]
 in applications

 

QUESTION: is this all of the above or each of the above steps?

 

[https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html]

 






[jira] [Commented] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434995#comment-17434995
 ] 

Apache Spark commented on SPARK-36646:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34405

> Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
> -
>
> Key: SPARK-36646
> URL: https://issues.apache.org/jira/browse/SPARK-36646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> If an Aggregate (Min/Max/Count) over Parquet is grouped by a partition column, 
> push down the group by as well.
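> An example of the query shape this targets; the table layout is an assumption, and recent builds gate Parquet aggregate pushdown behind {{spark.sql.parquet.aggregatePushdown}}. Grouping only by the partition column means the aggregate can be answered from per-partition file metadata.
> {code:java}
> // Hedged sketch, assuming a SparkSession `spark`.
> spark.sql("SET spark.sql.parquet.aggregatePushdown=true")
> spark.sql("CREATE TABLE sales (price INT, region STRING) USING parquet PARTITIONED BY (region)")
> spark.sql("SELECT region, MIN(price), MAX(price), COUNT(*) FROM sales GROUP BY region").show()
> {code}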






[jira] [Assigned] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36646:


Assignee: (was: Apache Spark)

> Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
> -
>
> Key: SPARK-36646
> URL: https://issues.apache.org/jira/browse/SPARK-36646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> If an Aggregate (Min/Max/Count) over Parquet is grouped by a partition column, 
> push down the group by as well.






[jira] [Assigned] (SPARK-36646) Push down group by partition column for Aggregate (Min/Max/Count) for Parquet

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-36646:


Assignee: Apache Spark

> Push down group by partition column for Aggregate (Min/Max/Count) for Parquet
> -
>
> Key: SPARK-36646
> URL: https://issues.apache.org/jira/browse/SPARK-36646
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Major
>
> If an Aggregate (Min/Max/Count) over Parquet is grouped by a partition column, 
> push down the group by as well.






[jira] [Assigned] (SPARK-30220) Support Filter expression uses IN/EXISTS predicate sub-queries

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30220:


Assignee: (was: Apache Spark)

> Support Filter expression uses IN/EXISTS predicate sub-queries
> --
>
> Key: SPARK-30220
> URL: https://issues.apache.org/jira/browse/SPARK-30220
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL cannot support a SQL statement with a nested aggregate like the one below:
>  
> {code:java}
> select sum(unique1) FILTER (WHERE
>  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code}
>  
> And Spark will throw an exception as follows:
>  
> {code:java}
> org.apache.spark.sql.AnalysisException
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL]
> : +- Project [unique1#x]
> : +- Filter (unique1#x < 100)
> : +- SubqueryAlias `onek`
> : +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, 
> hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, 
> stringu1#x, stringu2#x, string4#x] csv 
> file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data
> +- SubqueryAlias `tenk1`
>  +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, 
> hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, 
> stringu1#x, stringu2#x, string4#x] csv 
> file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code}
>  
> But PostgreSQL supports this syntax.
> {code:java}
> select sum(unique1) FILTER (WHERE
>  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
>  sum 
> --
>  4950
> (1 row){code}






[jira] [Assigned] (SPARK-30220) Support Filter expression uses IN/EXISTS predicate sub-queries

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30220:


Assignee: Apache Spark

> Support Filter expression uses IN/EXISTS predicate sub-queries
> --
>
> Key: SPARK-30220
> URL: https://issues.apache.org/jira/browse/SPARK-30220
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> Spark SQL cannot support a SQL statement with a nested aggregate like the one below:
>  
> {code:java}
> select sum(unique1) FILTER (WHERE
>  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code}
>  
> And Spark will throw an exception as follows:
>  
> {code:java}
> org.apache.spark.sql.AnalysisException
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL]
> : +- Project [unique1#x]
> : +- Filter (unique1#x < 100)
> : +- SubqueryAlias `onek`
> : +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, 
> hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, 
> stringu1#x, stringu2#x, string4#x] csv 
> file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data
> +- SubqueryAlias `tenk1`
>  +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, 
> hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, 
> stringu1#x, stringu2#x, string4#x] csv 
> file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code}
>  
> But PostgreSQL supports this syntax.
> {code:java}
> select sum(unique1) FILTER (WHERE
>  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
>  sum 
> --
>  4950
> (1 row){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30220) Support Filter expressions that use IN/EXISTS predicate sub-queries

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434897#comment-17434897
 ] 

Apache Spark commented on SPARK-30220:
--

User 'tanelk' has created a pull request for this issue:
https://github.com/apache/spark/pull/34402

> Support Filter expressions that use IN/EXISTS predicate sub-queries
> --
>
> Key: SPARK-30220
> URL: https://issues.apache.org/jira/browse/SPARK-30220
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Priority: Major
>
> Spark SQL does not support a SQL statement with a nested aggregate like the one below:
>  
> {code:java}
> select sum(unique1) FILTER (WHERE
>  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;{code}
>  
> Spark throws the following exception:
>  
> {code:java}
> org.apache.spark.sql.AnalysisException
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Aggregate [sum(cast(unique1#x as bigint)) AS sum(unique1)#xL]
> : +- Project [unique1#x]
> : +- Filter (unique1#x < 100)
> : +- SubqueryAlias `onek`
> : +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, 
> hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, 
> stringu1#x, stringu2#x, string4#x] csv 
> file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/onek.data
> +- SubqueryAlias `tenk1`
>  +- RelationV2[unique1#x, unique2#x, two#x, four#x, ten#x, twenty#x, 
> hundred#x, thousand#x, twothousand#x, fivethous#x, tenthous#x, odd#x, even#x, 
> stringu1#x, stringu2#x, string4#x] csv 
> file:/home/xitong/code/gengjiaan/spark/sql/core/target/scala-2.12/test-classes/test-data/postgresql/tenk.data{code}
>  
> But PostgreSQL supports this syntax.
> {code:java}
> select sum(unique1) FILTER (WHERE
>  unique1 IN (SELECT unique1 FROM onek where unique1 < 100)) FROM tenk1;
>  sum 
> --
>  4950
> (1 row){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37133:


Assignee: Apache Spark  (was: Wenchen Fan)

> Add a config to optionally enforce ANSI reserved keywords
> -
>
> Key: SPARK-37133
> URL: https://issues.apache.org/jira/browse/SPARK-37133
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434896#comment-17434896
 ] 

Apache Spark commented on SPARK-37133:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/34403

> Add a config to optionally enforce ANSI reserved keywords
> -
>
> Key: SPARK-37133
> URL: https://issues.apache.org/jira/browse/SPARK-37133
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37133:


Assignee: Wenchen Fan  (was: Apache Spark)

> Add a config to optionally enforce ANSI reserved keywords
> -
>
> Key: SPARK-37133
> URL: https://issues.apache.org/jira/browse/SPARK-37133
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37132) Incorrect Spark 3.2.0 package names with included Hadoop binaries

2021-10-27 Thread Denis Krivenko (Jira)
Denis Krivenko created SPARK-37132:
--

 Summary: Incorrect Spark 3.2.0 package names with included Hadoop 
binaries
 Key: SPARK-37132
 URL: https://issues.apache.org/jira/browse/SPARK-37132
 Project: Spark
  Issue Type: Bug
  Components: Build, Documentation
Affects Versions: 3.2.0
Reporter: Denis Krivenko


*Spark 3.2.0+Hadoop* packages contain Hadoop 3.3 binaries; however, the file names 
still refer to Hadoop 3.2, i.e. _spark-3.2.0-bin-*hadoop3.2*.tgz_

[https://dlcdn.apache.org/spark/spark-3.2.0/]

[https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2.tgz]

[https://dlcdn.apache.org/spark/spark-3.2.0/spark-3.2.0-bin-hadoop3.2-scala2.13.tgz]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37133) Add a config to optionally enforce ANSI reserved keywords

2021-10-27 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-37133:
---

 Summary: Add a config to optionally enforce ANSI reserved keywords
 Key: SPARK-37133
 URL: https://issues.apache.org/jira/browse/SPARK-37133
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37071) OpenHashMap should be serializable without reference tracking

2021-10-27 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-37071.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34351
[https://github.com/apache/spark/pull/34351]

> OpenHashMap should be serializable without reference tracking
> -
>
> Key: SPARK-37071
> URL: https://issues.apache.org/jira/browse/SPARK-37071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Minor
> Fix For: 3.3.0
>
>
> The current implementation of OpenHashMap does not serialize unless Kryo 
> reference tracking is turned on. This is unexpected for a simple type like 
> OpenHashMap, and it forces users to leave reference tracking on in any code 
> where OpenHashMap is used.
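
For context, here is a minimal sketch of the configuration involved. The setting spark.kryo.referenceTracking and the KryoSerializer API are real; the payload is a hypothetical stand-in, since OpenHashMap itself is private[spark] and in practice surfaces through Spark-internal types (e.g. ML aggregators) that embed it:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val conf = new SparkConf()
  .setAppName("kryo-without-reference-tracking")
  .set("spark.serializer", classOf[KryoSerializer].getName)
  // The setting at issue: values backed by OpenHashMap could only be
  // round-tripped with reference tracking left at its default (true).
  .set("spark.kryo.referenceTracking", "false")

val ser = new KryoSerializer(conf).newInstance()

// Stand-in payload; before the fix, an OpenHashMap-backed value here
// would fail to round-trip under the configuration above.
val bytes    = ser.serialize(Map("a" -> 1, "b" -> 2))
val restored = ser.deserialize[Map[String, Int]](bytes)
assert(restored("b") == 2)
{code}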



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37071) OpenHashMap should be serializable without reference tracking

2021-10-27 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-37071:


Assignee: Emil Ejbyfeldt

> OpenHashMap should be serializable without reference tracking
> -
>
> Key: SPARK-37071
> URL: https://issues.apache.org/jira/browse/SPARK-37071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Minor
>
> The current implementation of OpenHashMap does not serialize unless Kryo 
> reference tracking is turned on. This is unexpected for a simple type like 
> OpenHashMap, and it forces users to leave reference tracking on in any code 
> where OpenHashMap is used.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37131) Support using IN/EXISTS with subqueries in Project/Aggregate

2021-10-27 Thread Tongwei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tongwei updated SPARK-37131:

Description: 
{code:java}
CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
select c1 in (select col1 from tbl1) from tbl2 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Project []
case 2:
select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" 
end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" 
else "B" end 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Aggregate []
{code}

  was:
 

 
{code:java}
CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
select c1 in (select col1 from tbl1) from tbl2 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Project []
case 2:
select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" 
end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" 
else "B" end 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Aggregate []
{code}


> Support using IN/EXISTS with subqueries in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
> {code:java}
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
> INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
> CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
> INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
> select c1 in (select col1 from tbl1) from tbl2 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Project []
> case 2:
> select count(1), case when c1 in (select col1 from tbl1) then "A" else 
> "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) 
> then "A" else "B" end 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Aggregate []
> {code}
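
Until the restriction is lifted, the Project-level IN can be re-expressed as a left outer join plus a null check. Below is a minimal sketch against the tables created above, assuming an active SparkSession named spark; it collapses the three-valued IN semantics to plain true/false, which is safe here only because col1 contains no NULLs:

{code:scala}
// Workaround sketch for case 1: compute `c1 IN (SELECT col1 FROM tbl1)` in the
// SELECT list via a join. DISTINCT keeps the left-side row count stable.
spark.sql(
  """SELECT t2.c1,
    |       (t1.col1 IS NOT NULL) AS c1_in_tbl1
    |FROM tbl2 t2
    |LEFT OUTER JOIN (SELECT DISTINCT col1 FROM tbl1) t1
    |  ON t2.c1 = t1.col1""".stripMargin).show()
// -> c1 = 0, c1_in_tbl1 = true for the sample rows inserted above
{code}

The same join output can feed the CASE WHEN in case 2's GROUP BY clause.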



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37131) Support using IN/EXISTS with subqueries in Project/Aggregate

2021-10-27 Thread Tongwei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tongwei updated SPARK-37131:

Description: 
 

 
{code:java}
CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
select c1 in (select col1 from tbl1) from tbl2 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Project []
case 2:
select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" 
end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" 
else "B" end 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Aggregate []
{code}

  was:
```

CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
 INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
 CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
 INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
 select c1 in (select col1 from tbl1) from tbl2 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Project []


 case 2:
 select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end 
as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else 
"B" end 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Aggregate []

```


> Support using IN/EXISTS with subqueries in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
>  
>  
> {code:java}
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
> INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
> CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
> INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
> select c1 in (select col1 from tbl1) from tbl2 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Project []
> case 2:
> select count(1), case when c1 in (select col1 from tbl1) then "A" else 
> "B" end as tag from tbl2 group by case when c1 in (select col1 from tbl1) 
> then "A" else "B" end 
> Error msg:
> IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
> few commands: Aggregate []
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37131) Support using IN/EXISTS with subqueries in Project/Aggregate

2021-10-27 Thread Tongwei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tongwei updated SPARK-37131:

Description: 
CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
 INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
 CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
 INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
 select c1 in (select col1 from tbl1) from tbl2 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Project []
 case 2:
 select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end 
as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else 
"B" end 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Aggregate []

  was:
CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
select c1 in (select col1 from tbl1) from tbl2 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Project []
case 2:
select count(*), case when c1 in (select col1 from tbl1) then "A" else "B" 
end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" 
else "B" end 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Aggregate []


> Support using IN/EXISTS with subqueries in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
>  INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
>  CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
>  INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
>  select c1 in (select col1 from tbl1) from tbl2 
>  Error msg:
>  IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Project []
>  case 2:
>  select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" 
> end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then 
> "A" else "B" end 
>  Error msg:
>  IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Aggregate []



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37131) Support using IN/EXISTS with subqueries in Project/Aggregate

2021-10-27 Thread Tongwei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tongwei updated SPARK-37131:

Description: 
```

CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
 INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
 CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
 INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
 select c1 in (select col1 from tbl1) from tbl2 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Project []


 case 2:
 select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end 
as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else 
"B" end 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Aggregate []

```

  was:
CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
 INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
 CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
 INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
 select c1 in (select col1 from tbl1) from tbl2 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Project []
 case 2:
 select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" end 
as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" else 
"B" end 
 Error msg:
 IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
commands: Aggregate []


> Support using IN/EXISTS with subqueries in Project/Aggregate
> 
>
> Key: SPARK-37131
> URL: https://issues.apache.org/jira/browse/SPARK-37131
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tongwei
>Priority: Major
>
> ```
> CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
>  INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
>  CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
>  INSERT OVERWRITE TABLE tbl2 SELECT 0,2;
> case 1:
>  select c1 in (select col1 from tbl1) from tbl2 
>  Error msg:
>  IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Project []
>  case 2:
>  select count(1), case when c1 in (select col1 from tbl1) then "A" else "B" 
> end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then 
> "A" else "B" end 
>  Error msg:
>  IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few 
> commands: Aggregate []
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37131) Support using IN/EXISTS with subqueries in Project/Aggregate

2021-10-27 Thread Tongwei (Jira)
Tongwei created SPARK-37131:
---

 Summary: Support using IN/EXISTS with subqueries in Project/Aggregate
 Key: SPARK-37131
 URL: https://issues.apache.org/jira/browse/SPARK-37131
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Tongwei


CREATE TABLE tbl1 (col1 INT, col2 STRING) USING PARQUET;
INSERT OVERWRITE TABLE tbl1 SELECT 0,1;
CREATE TABLE tbl2 (c1 INT, c2 STRING) USING PARQUET; 
INSERT OVERWRITE TABLE tbl2 SELECT 0,2;

case 1:
select c1 in (select col1 from tbl1) from tbl2 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Project []
case 2:
select count(*), case when c1 in (select col1 from tbl1) then "A" else "B" 
end as tag from tbl2 group by case when c1 in (select col1 from tbl1) then "A" 
else "B" end 
Error msg:
IN/EXISTS predicate sub-queries can only be used in Filter/Join and a 
few commands: Aggregate []



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37130) why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar)

2021-10-27 Thread Patrice DUROUX (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434848#comment-17434848
 ] 

Patrice DUROUX commented on SPARK-37130:


PS: a diff of the two package listings:

{{$ diff spark-without.lst spark-with.lst }}
{{667a668}}
{{> /jars/activation-1.1.1.jar}}
{{671a673}}
{{> /jars/antlr-runtime-3.5.2.jar}}
{{684a687}}
{{> /jars/bonecp-0.8.0.RELEASE.jar}}
{{689a693}}
{{> /jars/commons-cli-1.2.jar}}
{{694a699}}
{{> /jars/commons-dbcp-1.4.jar}}
{{695a701}}
{{> /jars/commons-lang-2.6.jar}}
{{696a703}}
{{> /jars/commons-logging-1.1.3.jar}}
{{698a706}}
{{> /jars/commons-pool-1.5.4.jar}}
{{701a710,717}}
{{> /jars/curator-client-2.13.0.jar}}
{{> /jars/curator-framework-2.13.0.jar}}
{{> /jars/curator-recipes-2.13.0.jar}}
{{> /jars/datanucleus-api-jdo-4.2.4.jar}}
{{> /jars/datanucleus-core-4.1.17.jar}}
{{> /jars/datanucleus-rdbms-4.1.19.jar}}
{{> /jars/derby-10.14.2.0.jar}}
{{> /jars/dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar}}
{{704c720,739}}
{{< /jars/gson-2.8.6.jar}}
{{---}}
{{> /jars/gson-2.2.4.jar}}
{{> /jars/guava-14.0.1.jar}}
{{> /jars/hadoop-client-api-3.3.1.jar}}
{{> /jars/hadoop-client-runtime-3.3.1.jar}}
{{> /jars/hadoop-shaded-guava-1.1.1.jar}}
{{> /jars/hadoop-yarn-server-web-proxy-3.3.1.jar}}
{{> /jars/HikariCP-2.5.1.jar}}
{{> /jars/hive-beeline-2.3.9.jar}}
{{> /jars/hive-cli-2.3.9.jar}}
{{> /jars/hive-common-2.3.9.jar}}
{{> /jars/hive-exec-2.3.9-core.jar}}
{{> /jars/hive-jdbc-2.3.9.jar}}
{{> /jars/hive-llap-common-2.3.9.jar}}
{{> /jars/hive-metastore-2.3.9.jar}}
{{> /jars/hive-serde-2.3.9.jar}}
{{> /jars/hive-service-rpc-3.1.2.jar}}
{{> /jars/hive-shims-0.23-2.3.9.jar}}
{{> /jars/hive-shims-2.3.9.jar}}
{{> /jars/hive-shims-common-2.3.9.jar}}
{{> /jars/hive-shims-scheduler-2.3.9.jar}}
{{705a741}}
{{> /jars/hive-vector-code-gen-2.3.9.jar}}
{{708a745,747}}
{{> /jars/htrace-core4-4.1.0-incubating.jar}}
{{> /jars/httpclient-4.5.13.jar}}
{{> /jars/httpcore-4.4.14.jar}}
{{712a752}}
{{> /jars/jackson-core-asl-1.9.13.jar}}
{{715a756}}
{{> /jars/jackson-mapper-asl-1.9.13.jar}}
{{724a766,767}}
{{> /jars/javax.jdo-3.2.0-m3.jar}}
{{> /jars/javolution-5.5.1.jar}}
{{727a771}}
{{> /jars/jdo-api-3.0.1.jar}}
{{734a779,783}}
{{> /jars/jline-2.14.6.jar}}
{{> /jars/joda-time-2.10.10.jar}}
{{> /jars/jodd-core-3.5.2.jar}}
{{> /jars/jpam-1.1.jar}}
{{> /jars/json-1.8.jar}}
{{739a789}}
{{> /jars/jta-1.1.jar}}
{{765a816,818}}
{{> /jars/libfb303-0.9.3.jar}}
{{> /jars/libthrift-0.12.0.jar}}
{{> /jars/log4j-1.2.17.jar}}
{{792a846}}
{{> /jars/protobuf-java-2.5.0.jar}}
{{804a859,860}}
{{> /jars/slf4j-api-1.7.30.jar}}
{{> /jars/slf4j-log4j12-1.7.30.jar}}
{{809a866,867}}
{{> /jars/spark-hive_2.12-3.2.0.jar}}
{{> /jars/spark-hive-thriftserver_2.12-3.2.0.jar}}
{{829a888,889}}
{{> /jars/ST4-4.0.4.jar}}
{{> /jars/stax-api-1.0.1.jar}}
{{830a891}}
{{> /jars/super-csv-2.2.0.jar}}
{{832a894}}
{{> /jars/transaction-api-1.1.jar}}
{{833a896}}
{{> /jars/velocity-1.5.jar}}
{{836a900,901}}
{{> /jars/zookeeper-3.6.2.jar}}
{{> /jars/zookeeper-jute-3.6.2.jar}}
{{919a985}}
{{> /python/dist/}}
{{1015a1082,1087}}
{{> /python/pyspark.egg-info/}}
{{> /python/pyspark.egg-info/dependency_links.txt}}
{{> /python/pyspark.egg-info/PKG-INFO}}
{{> /python/pyspark.egg-info/requires.txt}}
{{> /python/pyspark.egg-info/SOURCES.txt}}
{{> /python/pyspark.egg-info/top_level.txt}}
{{1269a1342,1346}}
{{> /python/pyspark/__pycache__/}}
{{> /python/pyspark/__pycache__/install.cpython-38.pyc}}
{{> /python/pyspark/python/}}
{{> /python/pyspark/python/pyspark/}}
{{> /python/pyspark/python/pyspark/shell.py}}
{{1497a1575,1579}}
{{> /R/lib/SparkR/doc/}}
{{> /R/lib/SparkR/doc/index.html}}
{{> /R/lib/SparkR/doc/sparkr-vignettes.html}}
{{> /R/lib/SparkR/doc/sparkr-vignettes.R}}
{{> /R/lib/SparkR/doc/sparkr-vignettes.Rmd}}
{{1514a1597}}
{{> /R/lib/SparkR/Meta/vignette.rds}}

> why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and 
> spark-hive-thriftserver_X.jar)
> 
>
> Key: SPARK-37130
> URL: https://issues.apache.org/jira/browse/SPARK-37130
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Patrice DUROUX
>Priority: Minor
>
> Hi,
> Since my deployment has its own Hadoop (+Hive) installed, I tried to install 
> Spark using its bundle without Hadoop. I suspect that some jars present in 
> the corresponding spark-X.X.X-bin-hadoop3.2.tgz are missing from it.
> After comparing their contents, both spark-hive_2.12-X.X.X.jar and 
> spark-hive-thriftserver_2.12-X.X.X.jar are absent from 
> spark-X.X.X-bin-without-hadoop.tgz, and I don't know whether some others 
> should also be there.
> Thanks,
> Patrice
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SPARK-37130) why spark-X.X.X-bin-without-hadoop.tgz does not provide spark-hive_X.jar (and spark-hive-thriftserver_X.jar)

2021-10-27 Thread Patrice DUROUX (Jira)
Patrice DUROUX created SPARK-37130:
--

 Summary: why spark-X.X.X-bin-without-hadoop.tgz does not provide 
spark-hive_X.jar (and spark-hive-thriftserver_X.jar)
 Key: SPARK-37130
 URL: https://issues.apache.org/jira/browse/SPARK-37130
 Project: Spark
  Issue Type: Improvement
  Components: Deploy
Affects Versions: 3.2.0, 3.1.2
Reporter: Patrice DUROUX


Hi,

Since my deployment has its own Hadoop (+Hive) installed, I tried to install 
Spark using its bundle without Hadoop. I suspect that some jars present in the 
corresponding spark-X.X.X-bin-hadoop3.2.tgz are missing from it.

After comparing their contents, both spark-hive_2.12-X.X.X.jar and 
spark-hive-thriftserver_2.12-X.X.X.jar are absent from 
spark-X.X.X-bin-without-hadoop.tgz, and I don't know whether some others 
should also be there.

Thanks,

Patrice

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434840#comment-17434840
 ] 

Apache Spark commented on SPARK-30537:
--

User 'pralabhkumar' has created a pull request for this issue:
https://github.com/apache/spark/pull/34401

> toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
> --
>
> Key: SPARK-30537
> URL: https://issues.apache.org/jira/browse/SPARK-30537
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> The same issue as SPARK-29188 persists when Arrow optimization is enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30537:


Assignee: Apache Spark

> toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
> --
>
> Key: SPARK-30537
> URL: https://issues.apache.org/jira/browse/SPARK-30537
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> The same issue as SPARK-29188 persists when Arrow optimization is enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled

2021-10-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30537:


Assignee: (was: Apache Spark)

> toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
> --
>
> Key: SPARK-30537
> URL: https://issues.apache.org/jira/browse/SPARK-30537
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> The same issue as SPARK-29188 persists when Arrow optimization is enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30537) toPandas gets wrong dtypes when applied on empty DF when Arrow enabled

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434839#comment-17434839
 ] 

Apache Spark commented on SPARK-30537:
--

User 'pralabhkumar' has created a pull request for this issue:
https://github.com/apache/spark/pull/34401

> toPandas gets wrong dtypes when applied on empty DF when Arrow enabled
> --
>
> Key: SPARK-30537
> URL: https://issues.apache.org/jira/browse/SPARK-30537
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Same issue with SPARK-29188 persists when Arrow optimization is enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16280) Implement histogram_numeric SQL function

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434821#comment-17434821
 ] 

Apache Spark commented on SPARK-16280:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34380

> Implement histogram_numeric SQL function
> 
>
> Key: SPARK-16280
> URL: https://issues.apache.org/jira/browse/SPARK-16280
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: angerszhu
>Priority: Major
>  Labels: bulk-closed
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16280) Implement histogram_numeric SQL function

2021-10-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-16280:
---

Assignee: angerszhu

> Implement histogram_numeric SQL function
> 
>
> Key: SPARK-16280
> URL: https://issues.apache.org/jira/browse/SPARK-16280
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: angerszhu
>Priority: Major
>  Labels: bulk-closed
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16280) Implement histogram_numeric SQL function

2021-10-27 Thread Wenchen Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-16280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434812#comment-17434812
 ] 

Wenchen Fan commented on SPARK-16280:
-

resolved by https://github.com/apache/spark/pull/34380

> Implement histogram_numeric SQL function
> 
>
> Key: SPARK-16280
> URL: https://issues.apache.org/jira/browse/SPARK-16280
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: angerszhu
>Priority: Major
>  Labels: bulk-closed
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37082) Implement histogram_numeric aggregate function in spark

2021-10-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37082.
-
Resolution: Duplicate

> Implement histogram_numeric aggregate function in spark
> ---
>
> Key: SPARK-37082
> URL: https://issues.apache.org/jira/browse/SPARK-37082
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Implement the histogram_numeric function in Spark



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36975) Refactor how HiveClientImpl collects Hive client calls

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434790#comment-17434790
 ] 

Apache Spark commented on SPARK-36975:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34400

> Refactor how HiveClientImpl collects Hive client calls
> --
>
> Key: SPARK-36975
> URL: https://issues.apache.org/jira/browse/SPARK-36975
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Currently, we treat one call to withHiveState as one Hive client call, which 
> is confusing. This needs to be refactored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36975) Refactor how HiveClientImpl collects Hive client calls

2021-10-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434789#comment-17434789
 ] 

Apache Spark commented on SPARK-36975:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/34400

> Refactor how HiveClientImpl collects Hive client calls
> --
>
> Key: SPARK-36975
> URL: https://issues.apache.org/jira/browse/SPARK-36975
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0
>Reporter: angerszhu
>Priority: Major
>
> Currently, we treat one call to withHiveState as one Hive client call, which 
> is confusing. This needs to be refactored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-27 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434762#comment-17434762
 ] 

Yang Jie commented on SPARK-37129:
--

I am trying to run all benchmarks on GA now.

> Supplement all micro benchmark results using Java 17
> -
>
> Key: SPARK-37129
> URL: https://issues.apache.org/jira/browse/SPARK-37129
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35260) DataSourceV2 Function Catalog implementation

2021-10-27 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434754#comment-17434754
 ] 

Dongjoon Hyun commented on SPARK-35260:
---

I assigned this umbrella issue to [~csun].

> DataSourceV2 Function Catalog implementation
> 
>
> Key: SPARK-35260
> URL: https://issues.apache.org/jira/browse/SPARK-35260
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> This tracks the implementation and follow-up work for the V2 Function Catalog 
> introduced in SPARK-27658.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-35260) DataSourceV2 Function Catalog implementation

2021-10-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-35260:
-

Assignee: Chao Sun

> DataSourceV2 Function Catalog implementation
> 
>
> Key: SPARK-35260
> URL: https://issues.apache.org/jira/browse/SPARK-35260
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> This tracks the implementation and follow-up work for the V2 Function Catalog 
> introduced in SPARK-27658.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-27 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-37129:
-
Comment: was deleted

(was: working on this)

> Supplement all micro benchmark results using Java 17
> -
>
> Key: SPARK-37129
> URL: https://issues.apache.org/jira/browse/SPARK-37129
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-27 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434711#comment-17434711
 ] 

Yang Jie commented on SPARK-37129:
--

working on this

> Supplement all micro benchmark results using Java 17
> -
>
> Key: SPARK-37129
> URL: https://issues.apache.org/jira/browse/SPARK-37129
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37129) Supplement all micro benchmark results using Java 17

2021-10-27 Thread Yang Jie (Jira)
Yang Jie created SPARK-37129:


 Summary: Supplement all micro benchmark results using Java 17
 Key: SPARK-37129
 URL: https://issues.apache.org/jira/browse/SPARK-37129
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 3.3.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37115) Replace HiveClient calls with the Hive shim

2021-10-27 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37115:
---

Assignee: angerszhu

> Replace HiveClient calls with the Hive shim
> --
>
> Key: SPARK-37115
> URL: https://issues.apache.org/jira/browse/SPARK-37115
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> Replace HiveClient calls with the Hive shim.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


