[jira] [Assigned] (SPARK-40606) Eliminate `to_pandas` warnings in test

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40606:


Assignee: (was: Apache Spark)

> Eliminate `to_pandas` warnings in test
> --
>
> Key: SPARK-40606
> URL: https://issues.apache.org/jira/browse/SPARK-40606
> Project: Spark
>  Issue Type: Improvement
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Commented] (SPARK-40606) Eliminate `to_pandas` warnings in test

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610809#comment-17610809
 ] 

Apache Spark commented on SPARK-40606:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38042

> Eliminate `to_pandas` warnings in test
> --
>
> Key: SPARK-40606
> URL: https://issues.apache.org/jira/browse/SPARK-40606
> Project: Spark
>  Issue Type: Improvement
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Assigned] (SPARK-40606) Eliminate `to_pandas` warnings in test

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40606:


Assignee: Apache Spark

> Eliminate `to_pandas` warnings in test
> --
>
> Key: SPARK-40606
> URL: https://issues.apache.org/jira/browse/SPARK-40606
> Project: Spark
>  Issue Type: Improvement
>  Components: ps, Tests
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Created] (SPARK-40606) Eliminate `to_pandas` warnings in test

2022-09-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40606:
-

 Summary: Eliminate `to_pandas` warnings in test
 Key: SPARK-40606
 URL: https://issues.apache.org/jira/browse/SPARK-40606
 Project: Spark
  Issue Type: Improvement
  Components: ps, Tests
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng









[jira] [Resolved] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40604.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38040
[https://github.com/apache/spark/pull/38040]

> Verify the temporary column names in PS
> ---
>
> Key: SPARK-40604
> URL: https://issues.apache.org/jira/browse/SPARK-40604
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40604:
-

Assignee: Ruifeng Zheng

> Verify the temporary column names in PS
> ---
>
> Key: SPARK-40604
> URL: https://issues.apache.org/jira/browse/SPARK-40604
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>







[jira] [Commented] (SPARK-40605) Connect module should use log4j2.properties to configure test log output as other modules

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610775#comment-17610775
 ] 

Apache Spark commented on SPARK-40605:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38041

> Connect module should use log4j2.properties to configure test log output as 
> other modules
> -
>
> Key: SPARK-40605
> URL: https://issues.apache.org/jira/browse/SPARK-40605
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-40605) Connect module should use log4j2.properties to configure test log output as other modules

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40605:


Assignee: (was: Apache Spark)

> Connect module should use log4j2.properties to configure test log output as 
> other modules
> -
>
> Key: SPARK-40605
> URL: https://issues.apache.org/jira/browse/SPARK-40605
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-40605) Connect module should use log4j2.properties to configure test log output as other modules

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40605:


Assignee: Apache Spark

> Connect module should use log4j2.properties to configure test log output as 
> other modules
> -
>
> Key: SPARK-40605
> URL: https://issues.apache.org/jira/browse/SPARK-40605
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-40605) Connect module should use log4j2.properties to configure test log output as other modules

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610776#comment-17610776
 ] 

Apache Spark commented on SPARK-40605:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38041

> Connect module should use log4j2.properties to configure test log output as 
> other modules
> -
>
> Key: SPARK-40605
> URL: https://issues.apache.org/jira/browse/SPARK-40605
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Created] (SPARK-40605) Connect module should use log4j2.properties to configure test log output as other modules

2022-09-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-40605:


 Summary: Connect module should use log4j2.properties to configure 
test log output as other modules
 Key: SPARK-40605
 URL: https://issues.apache.org/jira/browse/SPARK-40605
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 3.4.0
Reporter: Yang Jie









[jira] [Commented] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-09-28 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610774#comment-17610774
 ] 

Yang Jie commented on SPARK-40593:
--

[~hyukjin.kwon] [~grundprinzip-db] Besides upgrading the default GLIBC version 
of the OS, are there any other recommended ways to make it compile?

 

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  
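> A quick way to check which symbol versions a build host provides (a 
> diagnostic sketch, assuming the standard CentOS library paths shown above):
> {code}
> # GLIBC versions exported by the system C library
> strings /lib64/libc.so.6 | grep '^GLIBC_'
> # GLIBCXX/CXXABI versions exported by the system C++ runtime
> strings /usr/lib64/libstdc++.so.6 | grep -E '^(GLIBCXX|CXXABI)_'
> {code}
> If GLIBC_2.14 is missing from the first listing, the prebuilt protoc binary 
> cannot run on that host.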






[jira] [Resolved] (SPARK-40580) Update the document for DataFrame.to_orc

2022-09-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-40580.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38018
[https://github.com/apache/spark/pull/38018]

> Update the document for DataFrame.to_orc
> 
>
> Key: SPARK-40580
> URL: https://issues.apache.org/jira/browse/SPARK-40580
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 3.4.0
>
>
> As of pandas 1.5.0, `pandas.DataFrame.to_orc` is supported.
> The pandas API on Spark already supports this feature, but its behavior is a 
> bit different from pandas, so we should update the documentation.
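> A minimal sketch of the difference (paths are illustrative; pandas needs 
> pyarrow for ORC output):
> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
> 
> pd.DataFrame({"id": [1, 2, 3]}).to_orc("/tmp/pandas.orc")  # pandas >= 1.5.0: a single file
> ps.DataFrame({"id": [1, 2, 3]}).to_orc("/tmp/ps_orc")      # pandas API on Spark: a directory of part files
> {code}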






[jira] [Commented] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610764#comment-17610764
 ] 

Apache Spark commented on SPARK-40604:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38040

> Verify the temporary column names in PS
> ---
>
> Key: SPARK-40604
> URL: https://issues.apache.org/jira/browse/SPARK-40604
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Assigned] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40604:


Assignee: Apache Spark

> Verify the temporary column names in PS
> ---
>
> Key: SPARK-40604
> URL: https://issues.apache.org/jira/browse/SPARK-40604
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610761#comment-17610761
 ] 

Apache Spark commented on SPARK-40604:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/38040

> Verify the temporary column names in PS
> ---
>
> Key: SPARK-40604
> URL: https://issues.apache.org/jira/browse/SPARK-40604
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Assigned] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40604:


Assignee: (was: Apache Spark)

> Verify the temporary column names in PS
> ---
>
> Key: SPARK-40604
> URL: https://issues.apache.org/jira/browse/SPARK-40604
> Project: Spark
>  Issue Type: Improvement
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Created] (SPARK-40604) Verify the temporary column names in PS

2022-09-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-40604:
-

 Summary: Verify the temporary column names in PS
 Key: SPARK-40604
 URL: https://issues.apache.org/jira/browse/SPARK-40604
 Project: Spark
  Issue Type: Improvement
  Components: ps
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng









[jira] [Assigned] (SPARK-40580) Update the document for DataFrame.to_orc

2022-09-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-40580:


Assignee: Haejoon Lee

> Update the document for DataFrame.to_orc
> 
>
> Key: SPARK-40580
> URL: https://issues.apache.org/jira/browse/SPARK-40580
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> As of pandas 1.5.0, `pandas.DataFrame.to_orc` is supported.
> The pandas API on Spark already supports this feature, but its behavior is a 
> bit different from pandas, so we should update the documentation.






[jira] (SPARK-38723) Test the error class: CONCURRENT_QUERY

2022-09-28 Thread Haejoon Lee (Jira)


[ https://issues.apache.org/jira/browse/SPARK-38723 ]


Haejoon Lee deleted comment on SPARK-38723:
-

was (Author: itholic):
I'm working on it :)

> Test the error class: CONCURRENT_QUERY
> --
>
> Key: SPARK-38723
> URL: https://issues.apache.org/jira/browse/SPARK-38723
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Add at least one test for the error class *CONCURRENT_QUERY* to 
> QueryExecutionErrorsSuite. The test should cover the exception thrown in 
> QueryExecutionErrors:
> {code:scala}
>   def concurrentQueryInstanceError(): Throwable = {
> new SparkConcurrentModificationException("CONCURRENT_QUERY", Array.empty)
>   }
> {code}
> For example, here is a test for the error class *UNSUPPORTED_FEATURE*: 
> https://github.com/apache/spark/blob/34e3029a43d2a8241f70f2343be8285cb7f231b9/sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala#L151-L170
> +The test must have a check of:+
> # the entire error message
> # sqlState if it is defined in the error-classes.json file
> # the error class
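> A rough sketch of such a test (the trigger helper is hypothetical; checkError 
> is the helper from SparkFunSuite):
> {code:scala}
> test("CONCURRENT_QUERY: error class, message and state") {
>   val e = intercept[SparkConcurrentModificationException] {
>     // hypothetical helper that reaches concurrentQueryInstanceError()
>     startSameStreamingQueryTwiceConcurrently()
>   }
>   checkError(
>     exception = e,
>     errorClass = "CONCURRENT_QUERY",
>     parameters = Map.empty)
> }
> {code}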






[jira] [Comment Edited] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622

2022-09-28 Thread phoebe chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610741#comment-17610741
 ] 

phoebe chen edited comment on SPARK-39725 at 9/28/22 10:13 PM:
---

[~bjornjorgensen]
[~hyukjin.kwon]
Thanks for the quick fix.
In the PR, the jetty.version is changed to 9.4.48.v20220622, just want to 
double confirm that all the jetty dependencies in Spark will be upgraded to 
this version, including jetty-io, right? 


was (Author: JIRAUSER283955):
[~bjornjorgensen][~hyukjin.kwon]
Thanks for the quick fix.
In the PR, the jetty.version is changed to 9.4.48.v20220622, just want to 
double confirm that all the jetty dependencies in Spark will be upgraded to 
this version, including jetty-io, right? 

> Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
> 
>
> Key: SPARK-39725
> URL: https://issues.apache.org/jira/browse/SPARK-39725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
> Fix For: 3.4.0
>
>
> [Release note |https://github.com/eclipse/jetty.project/releases] 
> [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047]






[jira] [Commented] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622

2022-09-28 Thread phoebe chen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610741#comment-17610741
 ] 

phoebe chen commented on SPARK-39725:
-

[~bjornjorgensen][~hyukjin.kwon]
Thanks for the quick fix.
In the PR, the jetty.version is changed to 9.4.48.v20220622, just want to 
double confirm that all the jetty dependencies in Spark will be upgraded to 
this version, including jetty-io, right? 

> Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
> 
>
> Key: SPARK-39725
> URL: https://issues.apache.org/jira/browse/SPARK-39725
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
> Fix For: 3.4.0
>
>
> [Release note |https://github.com/eclipse/jetty.project/releases] 
> [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047]






[jira] [Commented] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610717#comment-17610717
 ] 

Dongjoon Hyun commented on SPARK-40574:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> Add PURGE to DROP TABLE doc
> ---
>
> Key: SPARK-40574
> URL: https://issues.apache.org/jira/browse/SPARK-40574
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
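> For context, the syntax being documented (table name is illustrative):
> {code:sql}
> DROP TABLE IF EXISTS t PURGE;
> {code}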







[jira] [Commented] (SPARK-40583) Documentation error in "Integration with Cloud Infrastructures"

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610716#comment-17610716
 ] 

Dongjoon Hyun commented on SPARK-40583:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> Documentation error in "Integration with Cloud Infrastructures"
> ---
>
> Key: SPARK-40583
> URL: https://issues.apache.org/jira/browse/SPARK-40583
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Daniel Ranchal
>Assignee: Daniel Ranchal
>Priority: Minor
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
> The artifactId that implements the integration with several cloud 
> infrastructures is wrong. Instead of "hadoop-cloud-\{SCALA_VERSION}", it 
> should say "spark-hadoop-cloud-\{SCALA_VERSION}".
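> For example, with Scala 2.12 the corrected Maven coordinate would look like 
> this (the version shown is illustrative):
> {code:xml}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-hadoop-cloud_2.12</artifactId>
>   <version>3.3.0</version>
> </dependency>
> {code}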






[jira] [Commented] (SPARK-40562) Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610715#comment-17610715
 ] 

Dongjoon Hyun commented on SPARK-40562:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
> --
>
> Key: SPARK-40562
> URL: https://issues.apache.org/jira/browse/SPARK-40562
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
> {code:java}
> scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS 
> t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2) ").show()
> +--------+------------+
> |count(1)|grouping__id|
> +--------+------------+
> |       1|           2|
> |       1|           2|
> +--------+------------+
> 
> scala> sql("set spark.sql.legacy.groupingIdWithAppendedUserGroupBy=true")
> res1: org.apache.spark.sql.DataFrame = [key: string, value: string]
> 
> scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS 
> t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2) ").show()
> +--------+------------+
> |count(1)|grouping__id|
> +--------+------------+
> |       1|           1|
> |       1|           1|
> +--------+------------+
> {code}






[jira] [Commented] (SPARK-38717) Handle Hive's bucket spec case preserving behaviour

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610714#comment-17610714
 ] 

Dongjoon Hyun commented on SPARK-38717:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> Handle Hive's bucket spec case preserving behaviour
> ---
>
> Key: SPARK-38717
> URL: https://issues.apache.org/jira/browse/SPARK-38717
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> {code}
> CREATE TABLE t(
>  c STRING,
>  B_C STRING
> )
> PARTITIONED BY (p_c STRING)
> CLUSTERED BY (B_C) INTO 4 BUCKETS
> STORED AS PARQUET
> {code}
> then
> {code}
> SELECT * FROM t
> {code}
> fails with:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns 
> B_C is not part of the table columns ([FieldSchema(name:c, type:string, 
> comment:null), FieldSchema(name:b_c, type:string, comment:null)]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1098)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:764)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:763)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1287)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
>   ... 110 more
> {code}
>  






[jira] [Commented] (SPARK-39200) Stream is corrupted Exception while fetching the blocks from fallback storage system

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610711#comment-17610711
 ] 

Dongjoon Hyun commented on SPARK-39200:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> Stream is corrupted Exception while fetching the blocks from fallback storage 
> system
> 
>
> Key: SPARK-39200
> URL: https://issues.apache.org/jira/browse/SPARK-39200
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Rajendra Gujja
>Assignee: Frank Yin
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
> When executor decommissioning and fallback storage are enabled, shuffle 
> reads fail with `FetchFailedException: Stream is corrupted`.
> ref: https://issues.apache.org/jira/browse/SPARK-18105 (search for 
> decommission)
>  
> This happens when the shuffle block is bigger than what `inputstream.read` 
> can return in one attempt: the code path does not read the block fully 
> (`readFully`), and the partial read causes the exception.
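> The general pattern, as an illustrative sketch (not the actual Spark code 
> path):
> {code:scala}
> import java.io.{DataInputStream, InputStream}
> 
> def readBlock(in: InputStream, size: Int): Array[Byte] = {
>   val buf = new Array[Byte](size)
>   // in.read(buf) may fill only part of the buffer in one call;
>   // readFully loops internally until all `size` bytes have been read.
>   new DataInputStream(in).readFully(buf)
>   buf
> }
> {code}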






[jira] [Commented] (SPARK-40547) Fix dead links in sparkr-vignettes.Rmd

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610712#comment-17610712
 ] 

Dongjoon Hyun commented on SPARK-40547:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> Fix dead links in sparkr-vignettes.Rmd
> --
>
> Key: SPARK-40547
> URL: https://issues.apache.org/jira/browse/SPARK-40547
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>







[jira] [Commented] (SPARK-40535) NPE from observe of collect_list

2022-09-28 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610710#comment-17610710
 ] 

Dongjoon Hyun commented on SPARK-40535:
---

I updated the Fix Version from 3.3.1 to 3.3.2 because the 3.3.1 RC2 vote 
started without this.

> NPE from observe of collect_list
> 
>
> Key: SPARK-40535
> URL: https://issues.apache.org/jira/browse/SPARK-40535
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> The code below reproduces the issue:
> {code:scala}
> import org.apache.spark.sql.functions._
> val df = spark.range(1,10,1,11)
> df.observe("collectedList", collect_list("id")).collect()
> {code}
> instead of
> {code}
> Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
> {code}
> it fails with the NPE:
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:641)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:602)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:624)
>   at 
> org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205)
>   at 
> org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33)
> {code}






[jira] [Updated] (SPARK-40547) Fix dead links in sparkr-vignettes.Rmd

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40547:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Fix dead links in sparkr-vignettes.Rmd
> --
>
> Key: SPARK-40547
> URL: https://issues.apache.org/jira/browse/SPARK-40547
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>







[jira] [Updated] (SPARK-40535) NPE from observe of collect_list

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40535:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> NPE from observe of collect_list
> 
>
> Key: SPARK-40535
> URL: https://issues.apache.org/jira/browse/SPARK-40535
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> The code below reproduces the issue:
> {code:scala}
> import org.apache.spark.sql.functions._
> val df = spark.range(1,10,1,11)
> df.observe("collectedList", collect_list("id")).collect()
> {code}
> instead of
> {code}
> Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
> {code}
> it fails with the NPE:
> {code:java}
> java.lang.NullPointerException
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:641)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:602)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:624)
>   at 
> org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205)
>   at 
> org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33)
> {code}






[jira] [Updated] (SPARK-40322) Fix all dead links

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40322:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Fix all dead links
> --
>
> Key: SPARK-40322
> URL: https://issues.apache.org/jira/browse/SPARK-40322
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.3.2
>
>
>  
> [https://www.deadlinkchecker.com/website-dead-link-checker.asp]
>  
>  
> ||Status||URL||Source link text||
> |-1 Not found: The server name or address could not be resolved|[http://engineering.ooyala.com/blog/using-parquet-and-scrooge-spark]|[Using Parquet and Scrooge with Spark|https://spark.apache.org/documentation.html]|
> |-1 Not found: The server name or address could not be resolved|[http://blinkdb.org/]|[BlinkDB|https://spark.apache.org/third-party-projects.html]|
> |404 Not Found|[https://github.com/AyasdiOpenSource/df]|[DF|https://spark.apache.org/third-party-projects.html]|
> |-1 Timeout|[https://atp.io/]|[atp|https://spark.apache.org/powered-by.html]|
> |-1 Not found: The server name or address could not be resolved|[http://www.sehir.edu.tr/en/]|[Istanbul Sehir University|https://spark.apache.org/powered-by.html]|
> |404 Not Found|[http://nsn.com/]|[Nokia Solutions and Networks|https://spark.apache.org/powered-by.html]|
> |-1 Not found: The server name or address could not be resolved|[http://www.nubetech.co/]|[Nube Technologies|https://spark.apache.org/powered-by.html]|
> |-1 Timeout|[http://ooyala.com/]|[Ooyala, Inc.|https://spark.apache.org/powered-by.html]|
> |-1 Not found: The server name or address could not be resolved|[http://engineering.ooyala.com/blog/fast-spark-queries-memory-datasets]|[Spark for Fast Queries|https://spark.apache.org/powered-by.html]|
> |-1 Not found: The server name or address could not be resolved|[http://www.sisa.samsung.com/]|[Samsung Research America|https://spark.apache.org/powered-by.html]|
> |-1 Timeout|[https://checker.apache.org/projs/spark.html]|[https://checker.apache.org/projs/spark.html|https://spark.apache.org/release-process.html]|
> |404 Not Found|[https://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|[AMP Camp 2 [302 from http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|https://spark.apache.org/documentation.html]|
> |404 Not Found|[https://ampcamp.berkeley.edu/agenda-2012/]|[AMP Camp 1 [302 from http://ampcamp.berkeley.edu/agenda-2012/]|https://spark.apache.org/documentation.html]|
> |404 Not Found|[https://ampcamp.berkeley.edu/4/]|[AMP Camp 4 [302 from http://ampcamp.berkeley.edu/4/]|https://spark.apache.org/documentation.html]|
> |404 Not Found|[https://ampcamp.berkeley.edu/3/]|[AMP Camp 3 [302 from http://ampcamp.berkeley.edu/3/]|https://spark.apache.org/documentation.html]|
> |-500 Internal Server Error-|-[https://www.packtpub.com/product/spark-cookbook/9781783987061]-|-[Spark Cookbook [301 from https://www.packtpub.com/big-data-and-business-intelligence/spark-cookbook]|https://spark.apache.org/documentation.html]-|
> |-500 Internal Server Error-|-[https://www.packtpub.com/product/apache-spark-graph-processing/9781784391805]-|-[Apache Spark Graph Processing [301 from https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing]|https://spark.apache.org/documentation.html]-|
> |500 Internal Server Error|[https://prevalentdesignevents.com/sparksummit/eu17/]|[register|https://spark.apache.org/news/]|
> |500 Internal Server Error|[https://prevalentdesignevents.com/sparksummit/ss17/?_ga=1.211902866.780052874.1433437196]|[register|https://spark.apache.org/news/]|
> |500 Internal Server Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/registration.aspx?source=header]|[register|https://spark.apache.org/news/]|
> |500 Internal Server Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/speaker/]|[Spark Summit Europe|https://spark.apache.org/news/]|
> |-1 Timeout|[http://strataconf.com/strata2013]|[Strata|https://spark.apache.org/news/]|
> |-1 Not found: The server name or address could not be resolved|[http://blog.quantifind.com/posts/spark-unit-test/]|[Unit testing with Spark|https://spark.apache.org/news/]|
> |-1 Not found: The server name or address could not be resolved|[http://blog.quantifind.com/posts/logging-post/]|[Configuring Spark's logs|https://spark.apache.org/news/]|
> |-1 Timeout|[http://strata.oreilly.com/2012/08/seven-reasons-why-i-like-spark.html]|[Spark|https://spark.apache.org/news/]|
> |-1 
> 

[jira] [Updated] (SPARK-39200) Stream is corrupted Exception while fetching the blocks from fallback storage system

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-39200:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Stream is corrupted Exception while fetching the blocks from fallback storage 
> system
> 
>
> Key: SPARK-39200
> URL: https://issues.apache.org/jira/browse/SPARK-39200
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Rajendra Gujja
>Assignee: Frank Yin
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
> When executor decommissioning and fallback storage are enabled, shuffle 
> reads fail with `FetchFailedException: Stream is corrupted`.
> ref: https://issues.apache.org/jira/browse/SPARK-18105 (search for 
> decommission)
>  
> This happens when the shuffle block is bigger than what `inputstream.read` 
> can return in one attempt: the code path does not read the block fully 
> (`readFully`), and the partial read causes the exception.






[jira] [Updated] (SPARK-38717) Handle Hive's bucket spec case preserving behaviour

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-38717:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Handle Hive's bucket spec case preserving behaviour
> ---
>
> Key: SPARK-38717
> URL: https://issues.apache.org/jira/browse/SPARK-38717
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.4.0, 3.3.2
>
>
> {code}
> CREATE TABLE t(
>  c STRING,
>  B_C STRING
> )
> PARTITIONED BY (p_c STRING)
> CLUSTERED BY (B_C) INTO 4 BUCKETS
> STORED AS PARQUET
> {code}
> then
> {code}
> SELECT * FROM t
> {code}
> fails with:
> {code}
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns 
> B_C is not part of the table columns ([FieldSchema(name:c, type:string, 
> comment:null), FieldSchema(name:b_c, type:string, comment:null)]
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1098)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:764)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:763)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1287)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
>   ... 110 more
> {code}
>  






[jira] [Updated] (SPARK-40562) Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40562:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
> --
>
> Key: SPARK-40562
> URL: https://issues.apache.org/jira/browse/SPARK-40562
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.3.1, 3.2.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
> {code:java}
> scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS 
> t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2) ").show()
> +--------+------------+
> |count(1)|grouping__id|
> +--------+------------+
> |       1|           2|
> |       1|           2|
> +--------+------------+
> 
> scala> sql("set spark.sql.legacy.groupingIdWithAppendedUserGroupBy=true")
> res1: org.apache.spark.sql.DataFrame = [key: string, value: string]
> 
> scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS 
> t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2) ").show()
> +--------+------------+
> |count(1)|grouping__id|
> +--------+------------+
> |       1|           1|
> |       1|           1|
> +--------+------------+
> {code}






[jira] [Updated] (SPARK-40574) Add PURGE to DROP TABLE doc

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40574:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Add PURGE to DROP TABLE doc
> ---
>
> Key: SPARK-40574
> URL: https://issues.apache.org/jira/browse/SPARK-40574
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>







[jira] [Updated] (SPARK-40583) Documentation error in "Integration with Cloud Infrastructures"

2022-09-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40583:
--
Fix Version/s: 3.3.2
   (was: 3.3.1)

> Documentation error in "Integration with Cloud Infrastructures"
> ---
>
> Key: SPARK-40583
> URL: https://issues.apache.org/jira/browse/SPARK-40583
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: Daniel Ranchal
>Assignee: Daniel Ranchal
>Priority: Minor
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
>
> The artifactId that implements the integration with several cloud 
> infrastructures is wrong. Instead of "hadoop-cloud-\{SCALA_VERSION}", it 
> should say "spark-hadoop-cloud-\{SCALA_VERSION}".






[jira] [Commented] (SPARK-39804) Override Spark Core_2.12 (v3.3.0) logging configuration

2022-09-28 Thread cornel creanga (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610696#comment-17610696
 ] 

cornel creanga commented on SPARK-39804:


Spark will only load the default file when no custom configuration is declared 
(it checks whether log4j2 is using 
org.apache.logging.log4j.core.config.DefaultConfiguration).

If you need an example of how to declare a configuration programmatically, you 
can take a look at my git repo 
[here|https://github.com/cornelcreanga/spark-playground/blob/master/examples/src/main/scala/com/creanga/playground/spark/example/logging/CustomConfigurationFactory.java].
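
Alternatively, a minimal log4j2.properties on the application classpath also 
counts as a custom configuration (a sketch; levels and the pattern are 
illustrative):

{code}
rootLogger.level = warn
rootLogger.appenderRef.stdout.ref = console

appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_OUT
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
{code}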
 

> Override Spark Core_2.12 (v3.3.0) logging configuration
> ---
>
> Key: SPARK-39804
> URL: https://issues.apache.org/jira/browse/SPARK-39804
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Jitin Dominic
>Priority: Major
>
> I'm using Grails 2.5.4 and trying to use a _SparkSession_ instance for 
> generating Parquet output. Recently, I upgraded spark-core and its related 
> dependencies to their latest version (v3.3.0).
>  
> During SparkSession builder() initialization, I notice that some extra 
> logs are displayed:
>  
> {noformat}
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 22/07/13 11:58:54 WARN Utils: Your hostname, XY resolves to a loopback 
> address: 127.0.1.1; using 1XX.1XX.0.1XX instead (on interface wlo1)
> 22/07/13 11:58:54 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to 
> another address
> 22/07/13 11:58:54 INFO SparkContext: Running Spark version 3.3.0
> 22/07/13 11:58:54 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 22/07/13 11:58:54 INFO ResourceUtils: 
> ==
> 22/07/13 11:58:54 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
> 22/07/13 11:58:54 INFO ResourceUtils: 
> ==
> 22/07/13 11:58:54 INFO SparkContext: Submitted application: ABCDE
> 22/07/13 11:58:54 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: 
> offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
> cpus, amount: 1.0)
> 22/07/13 11:58:54 INFO ResourceProfile: Limiting resource is cpu
> 22/07/13 11:58:54 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 22/07/13 11:58:54 INFO SecurityManager: Changing view acls to: xy
> 22/07/13 11:58:54 INFO SecurityManager: Changing modify acls to: xy
> 22/07/13 11:58:54 INFO SecurityManager: Changing view acls groups to: 
> 22/07/13 11:58:54 INFO SecurityManager: Changing modify acls groups to: 
> 22/07/13 11:58:54 INFO SecurityManager: SecurityManager: authentication 
> disabled; ui acls disabled; users  with view permissions: Set(xy); groups 
> with view permissions: Set(); users  with modify permissions: Set(xy); groups 
> with modify permissions: Set()
> 22/07/13 11:58:54 INFO Utils: Successfully started service 'sparkDriver' on 
> port 39483.
> 22/07/13 11:58:54 INFO SparkEnv: Registering MapOutputTracker
> 22/07/13 11:58:54 INFO SparkEnv: Registering BlockManagerMaster
> 22/07/13 11:58:54 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 22/07/13 11:58:54 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
> up
> 22/07/13 11:58:54 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 22/07/13 11:58:55 INFO DiskBlockManager: Created local directory at 
> /tmp/blockmgr-cf39a58e-e5bc-4a26-b92a-d945a0deb8e7
> 22/07/13 11:58:55 INFO MemoryStore: MemoryStore started with capacity 2004.6 
> MiB
> 22/07/13 11:58:55 INFO SparkEnv: Registering OutputCommitCoordinator
> 22/07/13 11:58:55 INFO Utils: Successfully started service 'SparkUI' on port 
> 4040.
> 22/07/13 11:58:55 INFO Executor: Starting executor ID driver on host 
> 1XX.1XX.0.1XX
> 22/07/13 11:58:55 INFO Executor: Starting executor with user classpath 
> (userClassPathFirst = false): ''
> 22/07/13 11:58:55 INFO Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 33993.
> 22/07/13 11:58:55 INFO NettyBlockTransferService: Server created on 
> 192.168.0.135:33993
> 22/07/13 11:58:55 INFO BlockManager: Using 
> org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
> policy
> 22/07/13 11:58:55 INFO BlockManagerMaster: Registering BlockManager 
> BlockManagerId(driver, 192.168.0.135, 33993, 

[jira] [Resolved] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-40595.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38029
[https://github.com/apache/spark/pull/38029]

> Improve error message for unused CTE relations
> --
>
> Key: SPARK-40595
> URL: https://issues.apache.org/jira/browse/SPARK-40595
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Assigned] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-40595:


Assignee: Wenchen Fan

> Improve error message for unused CTE relations
> --
>
> Key: SPARK-40595
> URL: https://issues.apache.org/jira/browse/SPARK-40595
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-40603) throw the original error from catalog implementations

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40603:


Assignee: (was: Apache Spark)

> throw the original error from catalog implementations
> -
>
> Key: SPARK-40603
> URL: https://issues.apache.org/jira/browse/SPARK-40603
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Assigned] (SPARK-40603) throw the original error from catalog implementations

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40603:


Assignee: Apache Spark

> throw the original error from catalog implementations
> -
>
> Key: SPARK-40603
> URL: https://issues.apache.org/jira/browse/SPARK-40603
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-40603) throw the original error from catalog implementations

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610643#comment-17610643
 ] 

Apache Spark commented on SPARK-40603:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/38039

> throw the original error from catalog implementations
> -
>
> Key: SPARK-40603
> URL: https://issues.apache.org/jira/browse/SPARK-40603
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-40603) throw the original error from catalog implementations

2022-09-28 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-40603:
---

 Summary: throw the original error from catalog implementations
 Key: SPARK-40603
 URL: https://issues.apache.org/jira/browse/SPARK-40603
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan









[jira] [Assigned] (SPARK-40537) Re-enable mypy support

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40537:


Assignee: Apache Spark

> Re-enable mypy support
> --
>
> Key: SPARK-40537
> URL: https://issues.apache.org/jira/browse/SPARK-40537
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Assignee: Apache Spark
>Priority: Major
>
> Re-enable mypy checks for Spark Connect.






[jira] [Commented] (SPARK-40537) Re-enable mypy support

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610594#comment-17610594
 ] 

Apache Spark commented on SPARK-40537:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/38037

> Re-enable mypy support
> --
>
> Key: SPARK-40537
> URL: https://issues.apache.org/jira/browse/SPARK-40537
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Re-enable mypy checks for Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40537) Re-enable mypy support

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40537:


Assignee: (was: Apache Spark)

> Re-enable mypy support
> --
>
> Key: SPARK-40537
> URL: https://issues.apache.org/jira/browse/SPARK-40537
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Martin Grund
>Priority: Major
>
> Re-enable mypy checks for Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40601:


Assignee: Apache Spark

> Improve error when cogrouping groups with mismatching key sizes
> ---
>
> Key: SPARK-40601
> URL: https://issues.apache.org/jira/browse/SPARK-40601
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Enrico Minack
>Assignee: Apache Spark
>Priority: Minor
>
> Cogrouping two grouped DataFrames in PySpark that have different group key 
> cardinalities raises an error that is not very descriptive:
> {code:python}
> left.groupby("id", "k")
> .cogroup(right.groupby("id"))
> {code}
> {code:java}
> Traceback (most recent call last):
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o726.collectToPython.
> : java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.mutable.ResizableArray.apply(ResizableArray.scala:46)
>   at 
> scala.collection.mutable.ResizableArray.apply$(ResizableArray.scala:45)
>   at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.physical.HashShuffleSpec.$anonfun$createPartitioning$5(partitioning.scala:650)
> {code}
> *Note:* This is Python-specific as cogrouping with differing group key sizes 
> is not possible in Scala. The respective Scala API is fully typed on the key.
> The problem is that {{EnsureRequirements.ensureDistributionAndOrdering}} 
> calls into {{HashShuffleSpec.createPartitioning(clustering)}}, where the 
> length of {{clustering}} is smaller than the largest index ({{v.head}}) in 
> {{hashKeyPositions}} (EnsureRequirements.scala:159):
> {code:java}
> hashKeyPositions.map(v => clustering(v.head))
> {code}
> Possible fixes:
>  # Assert identical sizes for the group keys, and provide a meaningful 
> cogroup-specific error message.
>  # {{EnsureRequirements}} identifies this situation and provides a meaningful 
> distribution-requirements-specific error message.
>  # Ideally both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610588#comment-17610588
 ] 

Apache Spark commented on SPARK-40601:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/38036

> Improve error when cogrouping groups with mismatching key sizes
> ---
>
> Key: SPARK-40601
> URL: https://issues.apache.org/jira/browse/SPARK-40601
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Enrico Minack
>Priority: Minor
>
> Cogrouping two grouped DataFrames in PySpark that have different group key 
> cardinalities raises an error that is not very descriptive:
> {code:python}
> left.groupby("id", "k")
> .cogroup(right.groupby("id"))
> {code}
> {code:java}
> Traceback (most recent call last):
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o726.collectToPython.
> : java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.mutable.ResizableArray.apply(ResizableArray.scala:46)
>   at 
> scala.collection.mutable.ResizableArray.apply$(ResizableArray.scala:45)
>   at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.physical.HashShuffleSpec.$anonfun$createPartitioning$5(partitioning.scala:650)
> {code}
> *Note:* This is Python-specific as cogrouping with differing group key sizes 
> is not possible in Scala. The respective Scala API is fully typed on the key.
> The problem is that {{EnsureRequirements.ensureDistributionAndOrdering}} 
> calls into {{HashShuffleSpec.createPartitioning(clustering)}}, where the 
> length of {{clustering}} is smaller than the largest index ({{v.head}}) in 
> {{hashKeyPositions}} (EnsureRequirements.scala:159):
> {code:java}
> hashKeyPositions.map(v => clustering(v.head))
> {code}
> Possible fixes:
>  # Assert identical sizes for the group keys, and provide a meaningful 
> cogroup-specific error message.
>  # {{EnsureRequirements}} identifies this situation and provides a meaningful 
> distribution-requirements-specific error message.
>  # Ideally both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610589#comment-17610589
 ] 

Apache Spark commented on SPARK-40601:
--

User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/38036

> Improve error when cogrouping groups with mismatching key sizes
> ---
>
> Key: SPARK-40601
> URL: https://issues.apache.org/jira/browse/SPARK-40601
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Enrico Minack
>Priority: Minor
>
> Cogrouping two grouped DataFrames in PySpark that have different group key 
> cardinalities raises an error that is not very descriptive:
> {code:python}
> left.groupby("id", "k")
> .cogroup(right.groupby("id"))
> {code}
> {code:java}
> Traceback (most recent call last):
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o726.collectToPython.
> : java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.mutable.ResizableArray.apply(ResizableArray.scala:46)
>   at 
> scala.collection.mutable.ResizableArray.apply$(ResizableArray.scala:45)
>   at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.physical.HashShuffleSpec.$anonfun$createPartitioning$5(partitioning.scala:650)
> {code}
> *Note:* This is Python-specific as cogrouping with differing group key sizes 
> is not possible in Scala. The respective Scala API is fully typed on the key.
> The problem is that {{EnsureRequirements.ensureDistributionAndOrdering}} 
> calls into {{HashShuffleSpec.createPartitioning(clustering)}}, where the 
> length of {{clustering}} is smaller than the largest index ({{v.head}}) in 
> {{hashKeyPositions}} (EnsureRequirements.scala:159):
> {code:java}
> hashKeyPositions.map(v => clustering(v.head))
> {code}
> Possible fixes:
>  # Assert identical sizes for the group keys, and provide a meaningful 
> cogroup-specific error message.
>  # {{EnsureRequirements}} identifies this situation and provides a meaningful 
> distribution-requirements-specific error message.
>  # Ideally both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40601:


Assignee: (was: Apache Spark)

> Improve error when cogrouping groups with mismatching key sizes
> ---
>
> Key: SPARK-40601
> URL: https://issues.apache.org/jira/browse/SPARK-40601
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Enrico Minack
>Priority: Minor
>
> Cogrouping two grouped DataFrames in PySpark that have different group key 
> cardinalities raises an error that is not very descriptive:
> {code:python}
> left.groupby("id", "k")
> .cogroup(right.groupby("id"))
> {code}
> {code:java}
> Traceback (most recent call last):
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> o726.collectToPython.
> : java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.mutable.ResizableArray.apply(ResizableArray.scala:46)
>   at 
> scala.collection.mutable.ResizableArray.apply$(ResizableArray.scala:45)
>   at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:49)
>   at 
> org.apache.spark.sql.catalyst.plans.physical.HashShuffleSpec.$anonfun$createPartitioning$5(partitioning.scala:650)
> {code}
> *Note:* This is Python-specific as cogrouping with differing group key sizes 
> is not possible in Scala. The respective Scala API is fully typed on the key.
> The problem is that {{EnsureRequirements.ensureDistributionAndOrdering}} 
> calls into {{HashShuffleSpec.createPartitioning(clustering)}}, where the 
> length of {{clustering}} is smaller than the largest index ({{v.head}}) in 
> {{hashKeyPositions}} (EnsureRequirements.scala:159):
> {code:java}
> hashKeyPositions.map(v => clustering(v.head))
> {code}
> Possible fixes:
>  # Assert identical sizes for the group keys, and provide a meaningful 
> cogroup-specific error message.
>  # {{EnsureRequirements}} identifies this situation and provides a meaningful 
> distribution-requirements-specific error message.
>  # Ideally both.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40602) RunTimeException for ETL job

2022-09-28 Thread Zhixian Hu (Jira)
Zhixian Hu created SPARK-40602:
--

 Summary: RunTimeException for ETL job
 Key: SPARK-40602
 URL: https://issues.apache.org/jira/browse/SPARK-40602
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.5
Reporter: Zhixian Hu


Pulling data from a data lake on S3 shows the error below:

Py4JJavaError: An error occurred while calling o666.save. : 
java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive. You can set the Spark configuration 
setting spark.sql.hive.manageFilesourcePartitions to false to work around this 
problem, however this will result in degraded performance. Please report a bug: 
https://issues.apache.org/jira/browse/SPARK at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:785)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:791)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:789)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:331)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:239)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:231)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:275)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:231)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:314)
 at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:789)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1299)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1292)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:144)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:105)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:142)
 at 
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:372)
 at 
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:358)
 at 
com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:140)
 at 
org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1292)
 at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:265)
 at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:1045)
 at 
org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
 at 
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:62)
 at 
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:279)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:279)
 at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:76)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:278) 
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:153)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:284)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:284)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:353)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:207)
 at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:351) at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:284) 
at 
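
A hedged sketch of the workaround suggested by the error message itself — disabling Hive filesource partition management, at the cost of degraded partition-pruning performance:
{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.hive.manageFilesourcePartitions", "false")
         .enableHiveSupport()
         .getOrCreate())
{code}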

[jira] [Created] (SPARK-40601) Improve error when cogrouping groups with mismatching key sizes

2022-09-28 Thread Enrico Minack (Jira)
Enrico Minack created SPARK-40601:
-

 Summary: Improve error when cogrouping groups with mismatching key 
sizes
 Key: SPARK-40601
 URL: https://issues.apache.org/jira/browse/SPARK-40601
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 3.4.0
Reporter: Enrico Minack


Cogrouping two grouped DataFrames in PySpark that have different group key 
cardinalities raises an error that is not very descriptive:
{code:python}
left.groupby("id", "k")
.cogroup(right.groupby("id"))
{code}
{code:java}
Traceback (most recent call last):
py4j.protocol.Py4JJavaError: An error occurred while calling 
o726.collectToPython.
: java.lang.IndexOutOfBoundsException: 1
at 
scala.collection.mutable.ResizableArray.apply(ResizableArray.scala:46)
at 
scala.collection.mutable.ResizableArray.apply$(ResizableArray.scala:45)
at scala.collection.mutable.ArrayBuffer.apply(ArrayBuffer.scala:49)
at 
org.apache.spark.sql.catalyst.plans.physical.HashShuffleSpec.$anonfun$createPartitioning$5(partitioning.scala:650)
{code}
*Note:* This is Python-specific as cogrouping with differing group key sizes is 
not possible in Scala. The respective Scala API is fully typed on the key.

The problem is that {{EnsureRequirements.ensureDistributionAndOrdering}} calls 
into {{HashShuffleSpec.createPartitioning(clustering)}}, where the length of 
{{clustering}} is smaller than the largest index ({{v.head}}) in 
{{hashKeyPositions}} (EnsureRequirements.scala:159):
{code:java}
hashKeyPositions.map(v => clustering(v.head))
{code}

Possible fixes:
 # Assert identical sizes for the group keys, and provide a meaningful 
cogroup-specific error message.
 # {{EnsureRequirements}} identifies this situation and provides a meaningful 
distribution-requirements-specific error message.
 # Ideally both.
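
To make the failure concrete, here is a hedged, self-contained reproduction; the column names, data, and schema are illustrative only, and the error surfaces only once the cogrouped result is evaluated:
{code:python}
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
left = spark.createDataFrame([(1, "a", 10)], ["id", "k", "v"])
right = spark.createDataFrame([(1, 100)], ["id", "w"])

def merge(l: pd.DataFrame, r: pd.DataFrame) -> pd.DataFrame:
    return pd.merge(l, r, on="id")

result = (left.groupby("id", "k")           # two grouping keys
          .cogroup(right.groupby("id"))     # one grouping key
          .applyInPandas(merge, schema="id long, k string, v long, w long"))
result.collect()  # raises java.lang.IndexOutOfBoundsException as described above
{code}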



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40599) Add multiTransform methods to TreeNode to generate alternatives

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610576#comment-17610576
 ] 

Apache Spark commented on SPARK-40599:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/38034

> Add multiTransform methods to TreeNode to generate alternatives
> ---
>
> Key: SPARK-40599
> URL: https://issues.apache.org/jira/browse/SPARK-40599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40599) Add multiTransform methods to TreeNode to generate alternatives

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40599:


Assignee: (was: Apache Spark)

> Add multiTransform methods to TreeNode to generate alternatives
> ---
>
> Key: SPARK-40599
> URL: https://issues.apache.org/jira/browse/SPARK-40599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40599) Add multiTransform methods to TreeNode to generate alternatives

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40599:


Assignee: Apache Spark

> Add multiTransform methods to TreeNode to generate alternatives
> ---
>
> Key: SPARK-40599
> URL: https://issues.apache.org/jira/browse/SPARK-40599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40600) Support recursiveFileLookup for partitioned datasource

2022-09-28 Thread Zhen Wang (Jira)
Zhen Wang created SPARK-40600:
-

 Summary: Support recursiveFileLookup for partitioned datasource
 Key: SPARK-40600
 URL: https://issues.apache.org/jira/browse/SPARK-40600
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.1
 Environment: Spark: 3.1.1

Hive: 3.1.2
Reporter: Zhen Wang


I use the Hive Tez engine to execute union statements, and inserting into a 
partitioned table may generate a HIVE_UNION_SUBDIR subdirectory; when I then use 
Spark SQL to read this partitioned table, the data under HIVE_UNION_SUBDIR is 
not read.

For a non-partitioned table, I can read the subdirectories of the table by 
setting recursiveFileLookup to true, but for a partitioned table it seems 
impossible to set recursiveFileLookup to true.

So I want to support recursiveFileLookup for partitioned tables.
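
A minimal sketch of the current behaviour, with illustrative paths (the warehouse location is an assumption, not taken from the report):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Non-partitioned path: subdirectories such as HIVE_UNION_SUBDIR_* are read.
df = (spark.read.format("parquet")
      .option("recursiveFileLookup", "true")
      .load("/warehouse/some_table"))

# Partitioned path: recursiveFileLookup disables partition inference, so the
# same option cannot be combined with partition discovery today.
{code}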



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40599) Add multiTransform methods to TreeNode to generate alternatives

2022-09-28 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-40599:
---
Summary: Add multiTransform methods to TreeNode to generate alternatives  
(was: Add multiTransform methods to TreeNode to generate alternative 
transformations)

> Add multiTransform methods to TreeNode to generate alternatives
> ---
>
> Key: SPARK-40599
> URL: https://issues.apache.org/jira/browse/SPARK-40599
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40599) Add multiTransform methods to TreeNode to generate alternative transformations

2022-09-28 Thread Peter Toth (Jira)
Peter Toth created SPARK-40599:
--

 Summary: Add multiTransform methods to TreeNode to generate 
alternative transformations
 Key: SPARK-40599
 URL: https://issues.apache.org/jira/browse/SPARK-40599
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Peter Toth






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40598) Fix plotting features work properly with pandas 1.5.0.

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610552#comment-17610552
 ] 

Apache Spark commented on SPARK-40598:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38033

> Fix plotting features work properly with pandas 1.5.0.
> --
>
> Key: SPARK-40598
> URL: https://issues.apache.org/jira/browse/SPARK-40598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The plotting methods from the pandas API on Spark are not working properly 
> with pandas 1.5.0.
> We should support plotting with pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40598) Fix plotting features work properly with pandas 1.5.0.

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40598:


Assignee: (was: Apache Spark)

> Fix plotting features work properly with pandas 1.5.0.
> --
>
> Key: SPARK-40598
> URL: https://issues.apache.org/jira/browse/SPARK-40598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The plotting methods from the pandas API on Spark are not working properly 
> with pandas 1.5.0.
> We should support plotting with pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40598) Fix plotting features work properly with pandas 1.5.0.

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40598:


Assignee: Apache Spark

> Fix plotting features work properly with pandas 1.5.0.
> --
>
> Key: SPARK-40598
> URL: https://issues.apache.org/jira/browse/SPARK-40598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> The plotting methods from the pandas API on Spark are not working properly 
> with pandas 1.5.0.
> We should support plotting with pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40598) Fix plotting features work properly with pandas 1.5.0.

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610550#comment-17610550
 ] 

Apache Spark commented on SPARK-40598:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38033

> Fix plotting features work properly with pandas 1.5.0.
> --
>
> Key: SPARK-40598
> URL: https://issues.apache.org/jira/browse/SPARK-40598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The plotting methods from the pandas API on Spark are not working properly 
> with pandas 1.5.0.
> We should support plotting with pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40598) Fix plotting features work properly with pandas 1.5.0.

2022-09-28 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-40598:

Summary: Fix plotting features work properly with pandas 1.5.0.  (was: Fix 
PandasOnSpark*Plot work properly with pandas 1.5.0.)

> Fix plotting features work properly with pandas 1.5.0.
> --
>
> Key: SPARK-40598
> URL: https://issues.apache.org/jira/browse/SPARK-40598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The plotting methods from the pandas API on Spark are not working properly 
> with pandas 1.5.0.
> We should support plotting with pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40598) Fix PandasOnSpark*Plot work properly with pandas 1.5.0.

2022-09-28 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-40598:
---

 Summary: Fix PandasOnSpark*Plot work properly with pandas 1.5.0.
 Key: SPARK-40598
 URL: https://issues.apache.org/jira/browse/SPARK-40598
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 3.4.0
Reporter: Haejoon Lee


The plotting methods from the pandas API on Spark are not working properly with 
pandas 1.5.0.

We should support plotting with pandas 1.5.x.
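
A minimal sketch of the affected surface, with illustrative data; the exact failing call paths under pandas 1.5.0 are not enumerated here:
{code:python}
import pyspark.pandas as ps

psdf = ps.DataFrame({"x": [1, 2, 3, 4], "y": [4, 3, 2, 1]})
fig = psdf.plot.line(x="x", y="y")  # exercises the pandas-on-Spark plotting path
{code}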



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40597) local mode should respect TASK_MAX_FAILURES like all other cluster managers

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610526#comment-17610526
 ] 

Apache Spark commented on SPARK-40597:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/38032

> local mode should respect TASK_MAX_FAILURES like all other cluster managers
> ---
>
> Key: SPARK-40597
> URL: https://issues.apache.org/jira/browse/SPARK-40597
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Minor
>
> spark.task.maxFailures shall be respected in local mode too, so that a failed 
> task can be re-executed when an error occurs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40597) local mode should respect TASK_MAX_FAILURES like all other cluster managers

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40597:


Assignee: (was: Apache Spark)

> local mode should respect TASK_MAX_FAILURES like all other cluster managers
> ---
>
> Key: SPARK-40597
> URL: https://issues.apache.org/jira/browse/SPARK-40597
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Minor
>
> spark.task.maxFailures shall be respected in local mode too, so that a failed 
> task can be re-executed when an error occurs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40597) local mode should respect TASK_MAX_FAILURES like all other cluster managers

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40597:


Assignee: Apache Spark

> local mode should respect TASK_MAX_FAILURES like all other cluster managers
> ---
>
> Key: SPARK-40597
> URL: https://issues.apache.org/jira/browse/SPARK-40597
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Assignee: Apache Spark
>Priority: Minor
>
> spark.task.maxFailures shall be respected in local mode too, so that a failed 
> task can be re-executed when an error occurs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40597) local mode should respect TASK_MAX_FAILURES like all other cluster managers

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610525#comment-17610525
 ] 

Apache Spark commented on SPARK-40597:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/38032

> local mode should respect TASK_MAX_FAILURES like all other cluster managers
> ---
>
> Key: SPARK-40597
> URL: https://issues.apache.org/jira/browse/SPARK-40597
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Kent Yao
>Priority: Minor
>
> spark.task.maxFailures shall be respected in local mode too, so that a failed 
> task can be re-executed when an error occurs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40597) local mode should respect TASK_MAX_FAILURES like all other cluster managers

2022-09-28 Thread Kent Yao (Jira)
Kent Yao created SPARK-40597:


 Summary: local mode should respect TASK_MAX_FAILURES like all 
other cluster managers
 Key: SPARK-40597
 URL: https://issues.apache.org/jira/browse/SPARK-40597
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Kent Yao


spark.task.maxFailures shall be respected in local mode too, so that a failed 
task can be re-executed when an error occurs.
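
A minimal sketch with illustrative values; today the setting below is ignored in plain local mode, and retries are only available through the local[threads, maxFailures] master syntax:
{code:python}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[4]")
         .config("spark.task.maxFailures", "3")  # not honoured in local mode yet
         .getOrCreate())

# Existing workaround: encode the retry count in the master URL instead,
# e.g. SparkSession.builder.master("local[4,3]").
{code}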



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40589) Fix test for `DataFrame.corr_with` to skip the pandas regression

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610497#comment-17610497
 ] 

Apache Spark commented on SPARK-40589:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38031

> Fix test for `DataFrame.corr_with` to skip the pandas regression
> 
>
> Key: SPARK-40589
> URL: https://issues.apache.org/jira/browse/SPARK-40589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There is a regression in pandas 1.5.0 for DataFrame.corrwith 
> ([https://github.com/pandas-dev/pandas/issues/48826])
> We should make the test pass to support pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40589) Fix test for `DataFrame.corr_with` to skip the pandas regression

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610498#comment-17610498
 ] 

Apache Spark commented on SPARK-40589:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/38031

> Fix test for `DataFrame.corr_with` to skip the pandas regression
> 
>
> Key: SPARK-40589
> URL: https://issues.apache.org/jira/browse/SPARK-40589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There is a regression in pandas 1.5.0 for DataFrame.corrwith 
> ([https://github.com/pandas-dev/pandas/issues/48826])
> We should make the test pass to support pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40589) Fix test for `DataFrame.corr_with` to skip the pandas regression

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40589:


Assignee: (was: Apache Spark)

> Fix test for `DataFrame.corr_with` to skip the pandas regression
> 
>
> Key: SPARK-40589
> URL: https://issues.apache.org/jira/browse/SPARK-40589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There is a regression in pandas 1.5.0 for DataFrame.corrwith 
> ([https://github.com/pandas-dev/pandas/issues/48826])
> We should make the test pass to support pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40589) Fix test for `DataFrame.corr_with` to skip the pandas regression

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40589:


Assignee: Apache Spark

> Fix test for `DataFrame.corr_with` to skip the pandas regression
> 
>
> Key: SPARK-40589
> URL: https://issues.apache.org/jira/browse/SPARK-40589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> There is a regression in pandas 1.5.0 for DataFrame.corrwith 
> ([https://github.com/pandas-dev/pandas/issues/48826])
> We should make the test pass to support pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40589) Fix test for `DataFrame.corr_with` to skip the pandas regression

2022-09-28 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-40589:

Description: 
There is a regression in pandas 1.5.0 for DataFrame.corrwith 
([https://github.com/pandas-dev/pandas/issues/48826])

We should make the test pass to support pandas 1.5.x.

  was:
The behavior of `DataFrame.corr_with` does not match the latest pandas.

We should make the test pass to support pandas 1.5.x.


> Fix test for `DataFrame.corr_with` to skip the pandas regression
> 
>
> Key: SPARK-40589
> URL: https://issues.apache.org/jira/browse/SPARK-40589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> There is a regression in pandas 1.5.0 for DataFrame.corrwith 
> ([https://github.com/pandas-dev/pandas/issues/48826])
> We should make the test pass to support pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40589) Fix test for `DataFrame.corr_with` to skip the pandas regression

2022-09-28 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-40589:

Summary: Fix test for `DataFrame.corr_with` to skip the pandas regression  
(was: Fix `DataFrame.corr_with`)

> Fix test for `DataFrame.corr_with` to skip the pandas regression
> 
>
> Key: SPARK-40589
> URL: https://issues.apache.org/jira/browse/SPARK-40589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The behavior of `DataFrame.corr_with` does not match the latest pandas.
> We should make the test pass to support pandas 1.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40596) Populate ExecutorDecommission with more informative messages

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610491#comment-17610491
 ] 

Apache Spark commented on SPARK-40596:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/38030

> Populate ExecutorDecommission with more informative messages
> 
>
> Key: SPARK-40596
> URL: https://issues.apache.org/jira/browse/SPARK-40596
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Priority: Major
>
> Currently the message in {{ExecutorDecommission}} is a fixed value 
> {{{}"Executor decommission."{}}}, and it is the same for all cases, including 
> spot instance interruptions and auto-scaling down. We should put a detailed 
> message in {{ExecutorDecommission}} to better differentiate those cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40596) Populate ExecutorDecommission with more informative messages

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610490#comment-17610490
 ] 

Apache Spark commented on SPARK-40596:
--

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/38030

> Populate ExecutorDecommission with more informative messages
> 
>
> Key: SPARK-40596
> URL: https://issues.apache.org/jira/browse/SPARK-40596
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Priority: Major
>
> Currently the message in {{ExecutorDecommission}} is a fixed value 
> {{{}"Executor decommission."{}}}, and it is the same for all cases, including 
> spot instance interruptions and auto-scaling down. We should put a detailed 
> message in {{ExecutorDecommission}} to better differentiate those cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40596) Populate ExecutorDecommission with more informative messages

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40596:


Assignee: (was: Apache Spark)

> Populate ExecutorDecommission with more informative messages
> 
>
> Key: SPARK-40596
> URL: https://issues.apache.org/jira/browse/SPARK-40596
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Bo Zhang
>Priority: Major
>
> Currently the message in {{ExecutorDecommission}} is a fixed value 
> {{{}"Executor decommission."{}}}, and it is the same for all cases, including 
> spot instance interruptions and auto-scaling down. We should put a detailed 
> message in {{ExecutorDecommission}} to better differentiate those cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40596) Populate ExecutorDecommission with more informative messages

2022-09-28 Thread Bo Zhang (Jira)
Bo Zhang created SPARK-40596:


 Summary: Populate ExecutorDecommission with more informative 
messages
 Key: SPARK-40596
 URL: https://issues.apache.org/jira/browse/SPARK-40596
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.0
Reporter: Bo Zhang


Currently the message in {{ExecutorDecommission}} is a fixed value 
{{{}"Executor decommission."{}}}, and it is the same for all cases, including 
spot instance interruptions and auto-scaling down. We should put a detailed 
message in {{ExecutorDecommission}} to better differentiate those cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40594) Eagerly release hashed relation in ShuffledHashJoin

2022-09-28 Thread XiDuo You (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiDuo You updated SPARK-40594:
--
Description: 
ShuffledHashJoin releases the built hashed relation at the end of the task using 
a taskCompletionListener. That is not always good enough for complex SQL queries.

If an SMJ or a window sits on top of the SHJ, the hashed relation in the SHJ 
would leak: all rows have already been consumed by the sort before the SMJ or 
window, yet the buffer cannot allocate the memory that is still held by the 
hashed relation. This causes unnecessary spill.

It is a common case in multi-joins, since AQE supports converting an SMJ to an 
SHJ at runtime.

  was:
ShuffledHashJoin releases the built hashed relation at the end of the task using 
a taskCompletionListener. That is not always good enough for complex SQL queries.

If an SMJ sits on top of the SHJ, the hashed relation in the SHJ would leak: all 
rows have already been consumed by the sort before the SMJ, and then in the SMJ 
the buffered rows cannot allocate the memory that is still held by the hashed 
relation. This causes unnecessary spill.

It is a common case in multi-joins, since AQE supports converting an SMJ to an 
SHJ at runtime.


> Eagerly release hashed relation in ShuffledHashJoin
> ---
>
> Key: SPARK-40594
> URL: https://issues.apache.org/jira/browse/SPARK-40594
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: XiDuo You
>Priority: Major
>
> ShuffledHashJoin releases the built hashed relation at the end of the task 
> using a taskCompletionListener. That is not always good enough for complex SQL 
> queries.
> If an SMJ or a window sits on top of the SHJ, the hashed relation in the SHJ 
> would leak: all rows have already been consumed by the sort before the SMJ or 
> window, yet the buffer cannot allocate the memory that is still held by the 
> hashed relation. This causes unnecessary spill.
> It is a common case in multi-joins, since AQE supports converting an SMJ to an 
> SHJ at runtime.
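
A hedged sketch of a query shape that can hit this pattern; table names and sizes are illustrative, and whether the join actually becomes an SHJ is up to AQE at runtime:
{code:python}
from pyspark.sql import SparkSession, Window, functions as F

spark = (SparkSession.builder
         .config("spark.sql.adaptive.enabled", "true")
         .getOrCreate())
left = spark.range(10_000_000).withColumn("k", F.col("id") % 1000)
right = spark.range(1_000).withColumnRenamed("id", "k")

joined = left.join(right, "k")  # AQE may convert this SMJ to an SHJ
w = Window.partitionBy("k").orderBy("id")
result = joined.withColumn("rn", F.row_number().over(w))  # sort on top of the join
result.write.format("noop").mode("overwrite").save()
{code}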



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-09-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-40593:
-
Priority: Minor  (was: Major)

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Minor
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-09-28 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610448#comment-17610448
 ] 

Yang Jie commented on SPARK-40593:
--

This requirement should be declared in the documentation.

 

> protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
> ---
>
> Key: SPARK-40593
> URL: https://issues.apache.org/jira/browse/SPARK-40593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> Compiling the Connect module on CentOS release 6.3, where the default glibc 
> version is 2.12, causes compilation to fail as follows:
> {code:java}
> [ERROR] PROTOC FAILED: 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
>  /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
> /home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
>  {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610432#comment-17610432
 ] 

Apache Spark commented on SPARK-40595:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/38029

> Improve error message for unused CTE relations
> --
>
> Key: SPARK-40595
> URL: https://issues.apache.org/jira/browse/SPARK-40595
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40595:


Assignee: (was: Apache Spark)

> Improve error message for unused CTE relations
> --
>
> Key: SPARK-40595
> URL: https://issues.apache.org/jira/browse/SPARK-40595
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40595:


Assignee: Apache Spark

> Improve error message for unused CTE relations
> --
>
> Key: SPARK-40595
> URL: https://issues.apache.org/jira/browse/SPARK-40595
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610429#comment-17610429
 ] 

Apache Spark commented on SPARK-40595:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/38029

> Improve error message for unused CTE relations
> --
>
> Key: SPARK-40595
> URL: https://issues.apache.org/jira/browse/SPARK-40595
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-40442) Unstable Spark history server: DB is closed

2022-09-28 Thread Santosh Pingale (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610427#comment-17610427
 ] 

Santosh Pingale edited comment on SPARK-40442 at 9/28/22 8:30 AM:
--

{code:java}
HTTP ERROR 500 java.lang.IllegalStateException: DB is closed.
URI:https://xxx/sparkhistory/history/application_1664214774022_4650/1/jobs
STATUS:500
MESSAGE:java.lang.IllegalStateException: DB is closed.
SERVLET:org.apache.spark.ui.JettyUtils$$anon$1-cd3a472
CAUSED BY:java.lang.IllegalStateException: DB is closed.Caused 
by:java.lang.IllegalStateException: DB is closed.
at org.apache.spark.util.kvstore.LevelDB.db(LevelDB.java:364)
at 
org.apache.spark.util.kvstore.LevelDBIterator.<init>(LevelDBIterator.java:51)
at org.apache.spark.util.kvstore.LevelDB$1.iterator(LevelDB.java:253)
at 
org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
at 
org.apache.spark.status.AppStatusStore.$anonfun$applicationInfo$1(AppStatusStore.scala:44)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2741)
at 
org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:46)
at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:276)
at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:90)
at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at 
org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at 
org.apache.spark.deploy.history.ApplicationCacheCheckFilter.doFilter(ApplicationCache.scala:405)
at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400)
at 
org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:645)
at 
org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:392)
at 
org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at 
org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
at 
org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at 
org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
at 
org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
at 
org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at 
org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.lang.Thread.run(Thread.java:748) {code}
The error seems to be present for some applications that have just finished. The UI also reports this error. Upon restart of the SHS, however, the error goes away.

[jira] [Resolved] (SPARK-40592) Implement `min_count` in `GroupBy.max`

2022-09-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40592.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38026
[https://github.com/apache/spark/pull/38026]

> Implement `min_count` in `GroupBy.max`
> --
>
> Key: SPARK-40592
> URL: https://issues.apache.org/jira/browse/SPARK-40592
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40442) Unstable Spark history server: DB is closed

2022-09-28 Thread Santosh Pingale (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610427#comment-17610427
 ] 

Santosh Pingale commented on SPARK-40442:
-

{code:java}
2022-09-28 10:20:57,000 WARN /history/application_1664214774022_4650/1/jobs/
java.lang.IllegalStateException: DB is closed.
    at org.apache.spark.util.kvstore.LevelDB.db(LevelDB.java:364)
    at org.apache.spark.util.kvstore.LevelDBIterator.<init>(LevelDBIterator.java:51)
    at org.apache.spark.util.kvstore.LevelDB$1.iterator(LevelDB.java:253)
    at org.apache.spark.util.kvstore.KVStoreView.closeableIterator(KVStoreView.java:117)
    at org.apache.spark.status.AppStatusStore.$anonfun$applicationInfo$1(AppStatusStore.scala:44)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2741)
    at org.apache.spark.status.AppStatusStore.applicationInfo(AppStatusStore.scala:46)
    at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:276)
    at org.apache.spark.ui.WebUI.$anonfun$attachPage$1(WebUI.scala:90)
    at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:81)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)
    at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
    at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.apache.spark.deploy.history.ApplicationCacheCheckFilter.doFilter(ApplicationCache.scala:405)
    at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
    at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
    at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.sparkproject.jetty.server.Server.handle(Server.java:516)
    at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400)
    at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:645)
    at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:392)
    at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Thread.java:748)
{code}
The error seems to be present for some applications that have just finished. The UI also reports this error. Upon restart of the SHS, however, the error goes away.
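
To make the failure mode concrete, here is a simplified illustration in Python (an assumption for illustration only, not Spark's actual Java kvstore code) of the guard that raises "DB is closed.": once the store is closed, any handler that still reaches it fails, and only reopening the store (here, restarting the SHS) clears the error.

{code:python}
# Hypothetical stand-in for the LevelDB wrapper behind AppStatusStore.
class ClosableStore:
    def __init__(self):
        self._db = object()  # stands in for the open LevelDB handle

    def db(self):
        # Mirrors the IllegalStateException thrown at LevelDB.db(LevelDB.java:364).
        if self._db is None:
            raise RuntimeError("DB is closed.")
        return self._db

    def close(self):
        self._db = None


store = ClosableStore()
store.close()      # e.g. the application's store is closed after the app finishes
try:
    store.db()     # a late UI request for /jobs still reaches the same store
except RuntimeError as e:
    print(e)       # -> DB is closed.
{code}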

> Unstable Spark history server: DB is closed
> ---
>
> Key: SPARK-40442
> URL: https://issues.apache.org/jira/browse/SPARK-40442
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.2
>Reporter: Santosh Pingale
>Priority: Minor
>
> Since we upgraded our Spark history server to 3.2.2, it has been unstable. We 
> get 

[jira] [Created] (SPARK-40595) Improve error message for unused CTE relations

2022-09-28 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-40595:
---

 Summary: Improve error message for unused CTE relations
 Key: SPARK-40595
 URL: https://issues.apache.org/jira/browse/SPARK-40595
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40592) Implement `min_count` in `GroupBy.max`

2022-09-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40592:
-

Assignee: Ruifeng Zheng

> Implement `min_count` in `GroupBy.max`
> --
>
> Key: SPARK-40592
> URL: https://issues.apache.org/jira/browse/SPARK-40592
> Project: Spark
>  Issue Type: Sub-task
>  Components: ps
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40435) Add test suites for applyInPandasWithState in PySpark

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610407#comment-17610407
 ] 

Apache Spark commented on SPARK-40435:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/38028

> Add test suites for applyInPandasWithState in PySpark
> -
>
> Key: SPARK-40435
> URL: https://issues.apache.org/jira/browse/SPARK-40435
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.4.0
>
>
> Basically, port the test suite from the Scala/Java version of the API to the 
> Python API, and have an e2e test suite implemented purely in Python.
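
As a rough sketch only (not the ported suite; the test and helper names are illustrative), an e2e Python test for applyInPandasWithState might look like this, assuming Spark 3.4+:

{code:python}
# Sketch of an end-to-end unittest for applyInPandasWithState.
import unittest

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout


class ApplyInPandasWithStateTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.spark = SparkSession.builder.master("local[2]").getOrCreate()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_running_count_schema(self):
        def count_fn(key, pdf_iter, state: GroupState):
            # Accumulate a per-key row count across micro-batches.
            count = state.get[0] if state.exists else 0
            for pdf in pdf_iter:
                count += len(pdf)
            state.update((count,))
            yield pd.DataFrame({"id": [key[0]], "count": [count]})

        stream = (self.spark.readStream.format("rate").load()
                  .selectExpr("CAST(value % 2 AS STRING) AS id"))
        result = stream.groupBy("id").applyInPandasWithState(
            count_fn, "id STRING, count LONG", "count LONG",
            "Update", GroupStateTimeout.NoTimeout)
        query = (result.writeStream.format("memory").queryName("counts")
                 .outputMode("update").start())
        try:
            query.awaitTermination(10)  # let a few micro-batches run
            self.assertEqual(self.spark.table("counts").columns, ["id", "count"])
        finally:
            query.stop()


if __name__ == "__main__":
    unittest.main()
{code}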



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-40509) Construct an example of applyInPandasWithState in examples directory

2022-09-28 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-40509:


Assignee: Chaoqin Li

> Construct an example of applyInPandasWithState in examples directory
> 
>
> Key: SPARK-40509
> URL: https://issues.apache.org/jira/browse/SPARK-40509
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Chaoqin Li
>Priority: Major
>
> Since we introduced a new API (applyInPandasWithState) in PySpark, it is 
> worth having a separate, full example of the API.
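
For context, a minimal sketch of what such an example can look like (an illustration under assumptions, not the example actually contributed under this ticket): a streaming per-key running maximum kept in state, assuming Spark 3.4+.

{code:python}
# Minimal applyInPandasWithState sketch: track the running max value per key.
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

spark = SparkSession.builder.appName("state-example-sketch").getOrCreate()

# Illustrative unbounded input: the rate source, bucketed into three keys.
events = (spark.readStream.format("rate").load()
          .selectExpr("CAST(value % 3 AS STRING) AS key", "value"))

def running_max(key, pdf_iter, state: GroupState):
    best = state.get[0] if state.exists else 0
    for pdf in pdf_iter:
        if len(pdf) > 0:
            best = max(best, int(pdf["value"].max()))
    state.update((best,))          # persist the new maximum for this key
    yield pd.DataFrame({"key": [key[0]], "max_value": [best]})

maxima = events.groupBy("key").applyInPandasWithState(
    running_max,
    outputStructType="key STRING, max_value LONG",
    stateStructType="max_value LONG",
    outputMode="Update",
    timeoutConf=GroupStateTimeout.NoTimeout,
)

query = maxima.writeStream.outputMode("update").format("console").start()
query.awaitTermination(30)
query.stop()
{code}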



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40509) Construct an example of applyInPandasWithState in examples directory

2022-09-28 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-40509.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38013
[https://github.com/apache/spark/pull/38013]

> Construct an example of applyInPandasWithState in examples directory
> 
>
> Key: SPARK-40509
> URL: https://issues.apache.org/jira/browse/SPARK-40509
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Jungtaek Lim
>Assignee: Chaoqin Li
>Priority: Major
> Fix For: 3.4.0
>
>
> Since we introduced a new API (applyInPandasWithState) in PySpark, it is 
> worth having a separate, full example of the API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40540) Migrate compilation errors onto error classes

2022-09-28 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610395#comment-17610395
 ] 

Apache Spark commented on SPARK-40540:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/38027

> Migrate compilation errors onto error classes
> -
>
> Key: SPARK-40540
> URL: https://issues.apache.org/jira/browse/SPARK-40540
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.4.0
>
>
> Use temporary error classes in the compilation exceptions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40594) Eagerly release hashed relation in ShuffledHashJoin

2022-09-28 Thread XiDuo You (Jira)
XiDuo You created SPARK-40594:
-

 Summary: Eagerly release hashed relation in ShuffledHashJoin
 Key: SPARK-40594
 URL: https://issues.apache.org/jira/browse/SPARK-40594
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: XiDuo You


ShuffledHashJoin releases the built hashed relation at the end of the task via a 
taskCompletionListener. That is not always good enough for complex SQL queries.

If an SMJ sits on top of the SHJ, the hashed relation in the SHJ is effectively 
leaked: all rows have already been consumed by the sort below the SMJ, yet the 
buffered rows in the SMJ cannot allocate the memory still held by the hashed 
relation. This causes unnecessary spills.

This is a common case in multi-join queries, since AQE supports converting an 
SMJ to an SHJ at runtime.
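
The intent can be illustrated with a small language-agnostic sketch (plain Python here; hedged, since the real change lives in Spark's Scala operators): wrap the join's output iterator so the build-side resource is released as soon as the output is exhausted, rather than in the task-completion callback.

{code:python}
# Illustration of eager release: free the build-side resource the moment the
# join output is fully consumed, so a downstream sort/SMJ can use that memory.
class EagerReleaseIterator:
    def __init__(self, inner, release):
        self._inner = inner        # the join's output iterator
        self._release = release    # e.g. hashed_relation.close (hypothetical)
        self._released = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._inner)
        except StopIteration:
            if not self._released:
                self._released = True
                self._release()    # eager: do not wait for task completion
            raise

# Usage sketch: rows flow through unchanged; release() fires exactly once,
# right after the last row is handed to the consumer.
rows = EagerReleaseIterator(iter([1, 2, 3]), lambda: print("released"))
print(list(rows))  # prints "released", then [1, 2, 3]
{code}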



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-40593) protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14

2022-09-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-40593:


 Summary: protoc-3.21.1-linux-x86_64.exe requires GLIBC_2.14 
 Key: SPARK-40593
 URL: https://issues.apache.org/jira/browse/SPARK-40593
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Yang Jie


Compiling the Connect module on CentOS release 6.3, where the default glibc 
version is 2.12, causes the build to fail as follows:
{code:java}
[ERROR] PROTOC FAILED: 
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
 /lib64/libc.so.6: version `GLIBC_2.14' not found (required by 
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
 /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.18' not found (required by 
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
 /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.14' not found (required by 
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe:
 /usr/lib64/libstdc++.so.6: version `CXXABI_1.3.5' not found (required by 
/home/disk0/spark-source/connect/target/protoc-plugins/protoc-3.21.1-linux-x86_64.exe)
 {code}
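
A small diagnostic helper (an assumption for illustration, not part of the Spark build) can confirm whether the host glibc is new enough for the bundled protoc binary:

{code:python}
# Check the host glibc version via ctypes; the prebuilt
# protoc-3.21.1-linux-x86_64.exe needs GLIBC_2.14 or newer.
import ctypes

libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
version = libc.gnu_get_libc_version().decode()
print("glibc version:", version)

major, minor = (int(x) for x in version.split(".")[:2])
if (major, minor) < (2, 14):
    print("The bundled protoc will not run on this host; "
          "a protoc built against this glibc is needed instead.")
{code}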
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40575) Add badges for PySpark downloads

2022-09-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40575.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 38014
[https://github.com/apache/spark/pull/38014]

> Add badges for PySpark downloads
> 
>
> Key: SPARK-40575
> URL: https://issues.apache.org/jira/browse/SPARK-40575
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


