[jira] [Resolved] (SPARK-48593) Fix the string representation of lambda function
[ https://issues.apache.org/jira/browse/SPARK-48593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48593.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46948
[https://github.com/apache/spark/pull/46948]

> Fix the string representation of lambda function
>
> Key: SPARK-48593
> URL: https://issues.apache.org/jira/browse/SPARK-48593
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48593) Fix the string representation of lambda function
[ https://issues.apache.org/jira/browse/SPARK-48593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48593:
Assignee: Ruifeng Zheng

> Fix the string representation of lambda function
>
> Key: SPARK-48593
> URL: https://issues.apache.org/jira/browse/SPARK-48593
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48421:
Assignee: Szehon Ho

> SPJ: Add documentation
>
> Key: SPARK-48421
> URL: https://issues.apache.org/jira/browse/SPARK-48421
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
>
> As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed there is no documentation describing the same.
[jira] [Resolved] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48421.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46745
[https://github.com/apache/spark/pull/46745]

> SPJ: Add documentation
>
> Key: SPARK-48421
> URL: https://issues.apache.org/jira/browse/SPARK-48421
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed there is no documentation describing the same.
[jira] [Resolved] (SPARK-48591) Simplify the if-else branches with `F.lit`
[ https://issues.apache.org/jira/browse/SPARK-48591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48591.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46946
[https://github.com/apache/spark/pull/46946]

> Simplify the if-else branches with `F.lit`
>
> Key: SPARK-48591
> URL: https://issues.apache.org/jira/browse/SPARK-48591
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48591) Simplify the if-else branches with `F.lit`
[ https://issues.apache.org/jira/browse/SPARK-48591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48591:
Assignee: Ruifeng Zheng

> Simplify the if-else branches with `F.lit`
>
> Key: SPARK-48591
> URL: https://issues.apache.org/jira/browse/SPARK-48591
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48598) Propagate cached schema in dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-48598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48598:
Assignee: Ruifeng Zheng

> Propagate cached schema in dataframe operations
>
> Key: SPARK-48598
> URL: https://issues.apache.org/jira/browse/SPARK-48598
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48598) Propagate cached schema in dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-48598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48598.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46954
[https://github.com/apache/spark/pull/46954]

> Propagate cached schema in dataframe operations
>
> Key: SPARK-48598
> URL: https://issues.apache.org/jira/browse/SPARK-48598
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
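The idea behind the cached-schema tickets in this digest (this one and SPARK-48564 below) can be illustrated with a toy sketch. This is not Spark's actual implementation, and all names here are hypothetical; the point is that in Spark Connect fetching a schema costs a round trip to the server, so operations that cannot change the schema can hand the parent's cached copy to the result instead:

```python
class ToyDataFrame:
    """Toy model (hypothetical, not Spark's classes): resolving a schema
    is expensive, so schema-preserving operations propagate the cache."""

    rpc_calls = 0  # counts simulated round trips to the server

    def __init__(self, schema=None):
        self._cached_schema = schema

    @property
    def schema(self):
        if self._cached_schema is None:
            ToyDataFrame.rpc_calls += 1  # stand-in for an AnalyzePlan RPC
            self._cached_schema = ("id", "name")
        return self._cached_schema

    def filter(self):
        # A filter never adds or removes columns, so the cached
        # schema carries over to the child dataframe unchanged.
        return ToyDataFrame(schema=self._cached_schema)

df = ToyDataFrame()
_ = df.schema             # first access: one simulated RPC, then cached
df2 = df.filter()
print(df2.schema)         # no extra RPC: the schema was propagated
print(ToyDataFrame.rpc_calls)  # 1
```

The same reasoning applies to limits, sorts, and set operations such as `union`, whose output schema is determined by the inputs.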
[jira] [Resolved] (SPARK-48569) Connect - StreamingQuery.name should return null when not specified
[ https://issues.apache.org/jira/browse/SPARK-48569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48569.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46920
[https://github.com/apache/spark/pull/46920]

> Connect - StreamingQuery.name should return null when not specified
>
> Key: SPARK-48569
> URL: https://issues.apache.org/jira/browse/SPARK-48569
> Project: Spark
> Issue Type: New Feature
> Components: Connect, SS
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-48564) Propagate cached schema in set operations
[ https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48564.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46915
[https://github.com/apache/spark/pull/46915]

> Propagate cached schema in set operations
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48564) Propagate cached schema in set operations
[ https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48564:
Assignee: Ruifeng Zheng

> Propagate cached schema in set operations
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48560) Make StreamingQueryListener.spark settable
[ https://issues.apache.org/jira/browse/SPARK-48560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48560.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46909
[https://github.com/apache/spark/pull/46909]

> Make StreamingQueryListener.spark settable
>
> Key: SPARK-48560
> URL: https://issues.apache.org/jira/browse/SPARK-48560
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Downstream users might already implement StreamingQueryListener.spark.
[jira] [Resolved] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE
[ https://issues.apache.org/jira/browse/SPARK-48552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48552.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46890
[https://github.com/apache/spark/pull/46890]

> multi-line CSV schema inference should also throw FAILED_READ_FILE
>
> Key: SPARK-48552
> URL: https://issues.apache.org/jira/browse/SPARK-48552
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE
[ https://issues.apache.org/jira/browse/SPARK-48552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48552:
Assignee: Wenchen Fan

> multi-line CSV schema inference should also throw FAILED_READ_FILE
>
> Key: SPARK-48552
> URL: https://issues.apache.org/jira/browse/SPARK-48552
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-48560) Make StreamingQueryListener.spark settable
Hyukjin Kwon created SPARK-48560:

Summary: Make StreamingQueryListener.spark settable
Key: SPARK-48560
URL: https://issues.apache.org/jira/browse/SPARK-48560
Project: Spark
Issue Type: Improvement
Components: PySpark, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

Downstream users might already implement StreamingQueryListener.spark.
[jira] [Assigned] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
[ https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47952:
Assignee: TakawaAkirayo (was: Adam Binford)

> Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
> Issue Type: Story
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: TakawaAkirayo
> Assignee: TakawaAkirayo
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> 1. User story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run spark-shell/pyspark in the terminal via Yarn client mode. However, Yarn client mode consumes significant local memory if the job is heavy, and the total k8s resource pool for notebooks is limited. To leverage the abundant resources of our Hadoop cluster for scalability, we aim to use SparkConnect: the driver runs on Yarn with SparkConnectService started, and a SparkConnect client connects to that remote driver.
> To provide a seamless one-command startup for both server and client, we've wrapped the following steps in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) on a specified port.
> 2) Start SparkConnectServer via spark-submit in Yarn cluster mode with the user-supplied Spark configurations and the local coordinator server's address and port, appending an additional listener class that calls the coordinator server back with the actual address and port on Yarn.
> 3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to it only after manually looking up its address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might already be occupied on the Hadoop cluster node. To increase the startup success rate, it should retry on conflicts rather than fail outright.
> 2) Because the final bound port can differ due to #1 and the remote address is unpredictable on Yarn, we need to retrieve the address and port programmatically and inject them automatically when starting `pyspark --remote`. The SparkConnectService needs to communicate its location back to the launcher side.
[jira] [Resolved] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
[ https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47952.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46182
[https://github.com/apache/spark/pull/46182]

> Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
> Issue Type: Story
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: TakawaAkirayo
> Assignee: Adam Binford
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> 1. User story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run spark-shell/pyspark in the terminal via Yarn client mode. However, Yarn client mode consumes significant local memory if the job is heavy, and the total k8s resource pool for notebooks is limited. To leverage the abundant resources of our Hadoop cluster for scalability, we aim to use SparkConnect: the driver runs on Yarn with SparkConnectService started, and a SparkConnect client connects to that remote driver.
> To provide a seamless one-command startup for both server and client, we've wrapped the following steps in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) on a specified port.
> 2) Start SparkConnectServer via spark-submit in Yarn cluster mode with the user-supplied Spark configurations and the local coordinator server's address and port, appending an additional listener class that calls the coordinator server back with the actual address and port on Yarn.
> 3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to it only after manually looking up its address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might already be occupied on the Hadoop cluster node. To increase the startup success rate, it should retry on conflicts rather than fail outright.
> 2) Because the final bound port can differ due to #1 and the remote address is unpredictable on Yarn, we need to retrieve the address and port programmatically and inject them automatically when starting `pyspark --remote`. The SparkConnectService needs to communicate its location back to the launcher side.
[jira] [Assigned] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
[ https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47952:
Assignee: Adam Binford

> Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
> Issue Type: Story
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: TakawaAkirayo
> Assignee: Adam Binford
> Priority: Minor
> Labels: pull-request-available
>
> 1. User story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run spark-shell/pyspark in the terminal via Yarn client mode. However, Yarn client mode consumes significant local memory if the job is heavy, and the total k8s resource pool for notebooks is limited. To leverage the abundant resources of our Hadoop cluster for scalability, we aim to use SparkConnect: the driver runs on Yarn with SparkConnectService started, and a SparkConnect client connects to that remote driver.
> To provide a seamless one-command startup for both server and client, we've wrapped the following steps in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) on a specified port.
> 2) Start SparkConnectServer via spark-submit in Yarn cluster mode with the user-supplied Spark configurations and the local coordinator server's address and port, appending an additional listener class that calls the coordinator server back with the actual address and port on Yarn.
> 3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to it only after manually looking up its address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might already be occupied on the Hadoop cluster node. To increase the startup success rate, it should retry on conflicts rather than fail outright.
> 2) Because the final bound port can differ due to #1 and the remote address is unpredictable on Yarn, we need to retrieve the address and port programmatically and inject them automatically when starting `pyspark --remote`. The SparkConnectService needs to communicate its location back to the launcher side.
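The retry-on-conflict behavior described in point 1 of the problem statement can be sketched in plain Python. This is an illustration only, not the implementation merged for SPARK-47952: `bind_with_retry` is a hypothetical helper that tries the preferred port once and then falls back to an OS-assigned ephemeral port, returning the real bound port for the launcher to report back:

```python
import socket

def bind_with_retry(host, preferred_port, max_attempts=10):
    """Bind a server socket, falling back to an OS-chosen port on conflict.

    Returns (socket, real_port); real_port is what the launcher side would
    need to discover, since it can differ from preferred_port.
    """
    port = preferred_port
    for _ in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            # getsockname() reveals the actual port, which matters when
            # the OS picked an ephemeral one (port 0).
            return sock, sock.getsockname()[1]
        except OSError:  # e.g. EADDRINUSE: the port is already occupied
            sock.close()
            port = 0  # ask the OS for any free port on the next attempt
    raise OSError(f"could not bind after {max_attempts} attempts")

# Occupy a port, then show the fallback kicking in on the conflict.
blocker, busy_port = bind_with_retry("127.0.0.1", 0)
server, real_port = bind_with_retry("127.0.0.1", busy_port)
print(real_port != busy_port)  # True: the conflict forced a different port
blocker.close()
server.close()
```

Point 2 then follows naturally: because `real_port` is only known after binding, something like the coordinator-server callback described above is needed to get it back to the client.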
[jira] [Resolved] (SPARK-48550) Directly use the parent Window class
[ https://issues.apache.org/jira/browse/SPARK-48550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48550.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46892
[https://github.com/apache/spark/pull/46892]

> Directly use the parent Window class
>
> Key: SPARK-48550
> URL: https://issues.apache.org/jira/browse/SPARK-48550
> Project: Spark
> Issue Type: Improvement
> Components: PS
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48550) Directly use the parent Window class
[ https://issues.apache.org/jira/browse/SPARK-48550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48550:
Assignee: Ruifeng Zheng

> Directly use the parent Window class
>
> Key: SPARK-48550
> URL: https://issues.apache.org/jira/browse/SPARK-48550
> Project: Spark
> Issue Type: Improvement
> Components: PS
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48533:
Assignee: Ruifeng Zheng

> Add test for cached schema
>
> Key: SPARK-48533
> URL: https://issues.apache.org/jira/browse/SPARK-48533
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48533.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46871
[https://github.com/apache/spark/pull/46871]

> Add test for cached schema
>
> Key: SPARK-48533
> URL: https://issues.apache.org/jira/browse/SPARK-48533
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48534) Support interruptOperation in streaming queries
Hyukjin Kwon created SPARK-48534:

Summary: Support interruptOperation in streaming queries
Key: SPARK-48534
URL: https://issues.apache.org/jira/browse/SPARK-48534
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

Similar to https://issues.apache.org/jira/browse/SPARK-48485, but we should also add interruptOperation.
[jira] [Resolved] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48523.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46862
[https://github.com/apache/spark/pull/46862]

> Add `grpc_max_message_size` description to `client-connection-string.md`
>
> Key: SPARK-48523
> URL: https://issues.apache.org/jira/browse/SPARK-48523
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48523:
Assignee: BingKun Pan

> Add `grpc_max_message_size` description to `client-connection-string.md`
>
> Key: SPARK-48523
> URL: https://issues.apache.org/jira/browse/SPARK-48523
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
[ https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48485.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46819
[https://github.com/apache/spark/pull/46819]

> Support interruptTag and interruptAll in streaming queries
>
> Key: SPARK-48485
> URL: https://issues.apache.org/jira/browse/SPARK-48485
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48482:
Assignee: Wei Liu

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48482.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46817
[https://github.com/apache/spark/pull/46817]

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
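The argument-handling pattern this ticket asks for can be sketched without Spark at all: accept column names either as a single list (the historical PySpark signature) or as varargs, and normalize to one form. This toy `drop_duplicates` is hypothetical and only models the signature change, not the deduplication itself:

```python
def drop_duplicates(*subset):
    """Toy model of a varargs-friendly signature: callers may pass either
    drop_duplicates(["a", "b"]) or drop_duplicates("a", "b")."""
    if len(subset) == 1 and isinstance(subset[0], (list, tuple)):
        # Single list/tuple argument: unpack it to the varargs form.
        subset = tuple(subset[0])
    return subset

print(drop_duplicates("a", "b"))    # ('a', 'b')
print(drop_duplicates(["a", "b"]))  # ('a', 'b')
print(drop_duplicates())            # (): no subset means all columns
```

Normalizing at the boundary keeps the rest of the code path working with one canonical representation, which is the usual reason APIs accept both spellings.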
[jira] [Assigned] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48508:
Assignee: Ruifeng Zheng

> Client Side RPC optimization for Spark Connect
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48508.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46848
[https://github.com/apache/spark/pull/46848]

> Client Side RPC optimization for Spark Connect
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48507. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46846 [https://github.com/apache/spark/pull/46846] > Use Hadoop 3.3.6 winutils in `build_sparkr_window` > -- > > Key: SPARK-48507 > URL: https://issues.apache.org/jira/browse/SPARK-48507 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48507: Assignee: BingKun Pan > Use Hadoop 3.3.6 winutils in `build_sparkr_window` > -- > > Key: SPARK-48507 > URL: https://issues.apache.org/jira/browse/SPARK-48507 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48504: Assignee: Ruifeng Zheng > Parent Window class for Spark Connect and Spark Classic > --- > > Key: SPARK-48504 > URL: https://issues.apache.org/jira/browse/SPARK-48504 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48504. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46841 [https://github.com/apache/spark/pull/46841] > Parent Window class for Spark Connect and Spark Classic > --- > > Key: SPARK-48504 > URL: https://issues.apache.org/jira/browse/SPARK-48504 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48496) Use static regex Pattern instances in common/utils JavaUtils
[ https://issues.apache.org/jira/browse/SPARK-48496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48496. -- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46829 > Use static regex Pattern instances in common/utils JavaUtils > > > Key: SPARK-48496 > URL: https://issues.apache.org/jira/browse/SPARK-48496 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Some methods in JavaUtils.java are recompiling regexes on every invocation; > we should instead store a single cached Pattern. > This is a minor perf. issue that I spotted in the context of other profiling. > Not a huge bottleneck in the grand scheme of things, but simple and > straightforward to fix.
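The fix pattern described in SPARK-48496 is language-agnostic: hoist the compiled regex to a static/module-level constant so it is compiled once at load time instead of on every call. A small Python sketch of the same idea (the pattern and function name below are illustrative, not necessarily the exact ones in JavaUtils):

```python
import re

# Compiled once at module load, analogous to a static final Pattern in Java.
# Recompiling this inside the function body on every call is the anti-pattern
# the Jira issue fixes.
_TIME_STRING = re.compile(r"(-?[0-9]+)([a-z]+)?")

def parse_time_string(s):
    """Split a string like '10s' into its numeric value and unit suffix."""
    m = _TIME_STRING.match(s.strip().lower())
    if not m:
        raise ValueError(f"invalid time string: {s!r}")
    return int(m.group(1)), m.group(2)

assert parse_time_string("10s") == (10, "s")
assert parse_time_string("-5") == (-5, None)
```

Python's re module internally caches recent patterns, so the win is smaller there than in Java, but the module-level constant still skips the cache lookup on every call.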
[jira] [Resolved] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48489. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46823 [https://github.com/apache/spark/pull/46823] > Throw a user-facing error when reading invalid schema from text DataSource > --- > > Key: SPARK-48489 > URL: https://issues.apache.org/jira/browse/SPARK-48489 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Stefan Bukorovic >Assignee: Stefan Bukorovic >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Text DataSource produces table schema with only 1 column, but it is possible > to try and create a table with schema having multiple columns. > Currently, when a user tries this, we have an assert in the code, which fails > and throws an internal Spark error. We should throw a better user-facing error. >
[jira] [Assigned] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48489: Assignee: Stefan Bukorovic > Throw a user-facing error when reading invalid schema from text DataSource > --- > > Key: SPARK-48489 > URL: https://issues.apache.org/jira/browse/SPARK-48489 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Stefan Bukorovic >Assignee: Stefan Bukorovic >Priority: Minor > Labels: pull-request-available > > Text DataSource produces table schema with only 1 column, but it is possible > to try and create a table with schema having multiple columns. > Currently, when a user tries this, we have an assert in the code, which fails > and throws an internal Spark error. We should throw a better user-facing error. >
[jira] [Assigned] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48374: Assignee: Ian Cook > Support additional PyArrow Table column types > - > > Key: SPARK-48374 > URL: https://issues.apache.org/jira/browse/SPARK-48374 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > > SPARK-48220 adds support for passing a PyArrow Table to > {{{}createDataFrame(){}}}, but there are a few PyArrow column types that are > not yet supported: > * fixed-size binary > * fixed-size list > * large list >
[jira] [Resolved] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48374. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46688 [https://github.com/apache/spark/pull/46688] > Support additional PyArrow Table column types > - > > Key: SPARK-48374 > URL: https://issues.apache.org/jira/browse/SPARK-48374 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-48220 adds support for passing a PyArrow Table to > {{{}createDataFrame(){}}}, but there are a few PyArrow column types that are > not yet supported: > * fixed-size binary > * fixed-size list > * large list >
[jira] [Assigned] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48220: Assignee: Ian Cook > Allow passing PyArrow Table to createDataFrame() > > > Key: SPARK-48220 > URL: https://issues.apache.org/jira/browse/SPARK-48220 > Project: Spark > Issue Type: Sub-task > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > > SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. > It would be nice if we could also go in the opposite direction, enabling > users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow > Table to {{spark.createDataFrame()}}.
[jira] [Resolved] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48220. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46529 [https://github.com/apache/spark/pull/46529] > Allow passing PyArrow Table to createDataFrame() > > > Key: SPARK-48220 > URL: https://issues.apache.org/jira/browse/SPARK-48220 > Project: Spark > Issue Type: Sub-task > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. > It would be nice if we could also go in the opposite direction, enabling > users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow > Table to {{spark.createDataFrame()}}.
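Conceptually, an entry point like createDataFrame() dispatches on the type of its input, and supporting PyArrow Tables means accepting one more input type. The toy sketch below shows that dispatch shape only; the class and function names are stand-ins (a real build would branch on isinstance(data, pyarrow.Table)), and the stand-in class exists solely so the sketch runs without pyarrow installed:

```python
class StandInArrowTable:
    """Minimal stand-in for pyarrow.Table so this sketch has no dependencies."""
    def __init__(self, rows):
        self._rows = rows

    def to_pylist(self):
        # pyarrow.Table also exposes to_pylist(), returning one dict per row.
        return list(self._rows)

def create_dataframe(data):
    """Toy dispatcher: accept a list of rows or an Arrow-Table-like object."""
    if isinstance(data, list):
        return data
    if hasattr(data, "to_pylist"):  # duck-typed Arrow Table support
        return data.to_pylist()
    raise TypeError(f"unsupported input type: {type(data).__name__}")

table = StandInArrowTable([{"id": 1}, {"id": 2}])
assert create_dataframe(table) == [{"id": 1}, {"id": 2}]
```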
[jira] [Created] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
Hyukjin Kwon created SPARK-48485: Summary: Support interruptTag and interruptAll in streaming queries Key: SPARK-48485 URL: https://issues.apache.org/jira/browse/SPARK-48485 Project: Spark Issue Type: Improvement Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48474: Assignee: BingKun Pan > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48474. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46808 [https://github.com/apache/spark/pull/46808] > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48467: Assignee: BingKun Pan > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48467. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46798 [https://github.com/apache/spark/pull/46798] > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47716: Assignee: Jack Chen > SQLQueryTestSuite flaky case due to view name conflict > -- > > Key: SPARK-47716 > URL: https://issues.apache.org/jira/browse/SPARK-47716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > > In SQLQueryTestSuite, the test case "Test logic for determining whether a > query is semantically sorted" can sometimes fail with an error > {{Cannot create table or view `main`.`default`.`t1` because it already > exists.}} > if run concurrently with other sql test cases that also create tables with > the same name.
[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47716. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45855 [https://github.com/apache/spark/pull/45855] > SQLQueryTestSuite flaky case due to view name conflict > -- > > Key: SPARK-47716 > URL: https://issues.apache.org/jira/browse/SPARK-47716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In SQLQueryTestSuite, the test case "Test logic for determining whether a > query is semantically sorted" can sometimes fail with an error > {{Cannot create table or view `main`.`default`.`t1` because it already > exists.}} > if run concurrently with other sql test cases that also create tables with > the same name.
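The usual remedy for this class of flakiness is to derive a unique name per test run instead of sharing a fixed one like `t1`. A minimal sketch of that idea (the helper name is invented; the actual SQLQueryTestSuite fix may differ):

```python
import uuid

def unique_view_name(prefix="t"):
    """Return a view/table name that will not collide with concurrent tests."""
    return f"{prefix}_{uuid.uuid4().hex}"

# Many concurrently running "tests" each get a distinct name, so no test can
# fail with "already exists" because of a sibling test's table.
names = {unique_view_name("t1") for _ in range(10_000)}
assert len(names) == 10_000
```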
[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48461. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46793 [https://github.com/apache/spark/pull/46793] > Replace NullPointerExceptions with proper error classes in AssertNotNull > expression > --- > > Key: SPARK-48461 > URL: https://issues.apache.org/jira/browse/SPARK-48461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [Code location > here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]
[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48446: Assignee: Yuchen Liu > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Assignee: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. > The argument should be a list. > The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48446. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46797 [https://github.com/apache/spark/pull/46797] > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Assignee: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Fix For: 4.0.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. > The argument should be a list. > The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.
[ https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48475. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46809 [https://github.com/apache/spark/pull/46809] > Optimize _get_jvm_function in PySpark. > -- > > Key: SPARK-48475 > URL: https://issues.apache.org/jira/browse/SPARK-48475 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48464) Refactor SQLConfSuite and StatisticsSuite
[ https://issues.apache.org/jira/browse/SPARK-48464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48464. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46796 [https://github.com/apache/spark/pull/46796] > Refactor SQLConfSuite and StatisticsSuite > - > > Key: SPARK-48464 > URL: https://issues.apache.org/jira/browse/SPARK-48464 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48454. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46785 [https://github.com/apache/spark/pull/46785] > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48454: Assignee: Ruifeng Zheng > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48442) Add parenthesis to awaitTermination call
[ https://issues.apache.org/jira/browse/SPARK-48442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48442. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46779 [https://github.com/apache/spark/pull/46779] > Add parenthesis to awaitTermination call > > > Key: SPARK-48442 > URL: https://issues.apache.org/jira/browse/SPARK-48442 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.3 >Reporter: Riya Verma >Assignee: Riya Verma >Priority: Trivial > Labels: correctness, pull-request-available, starter > Fix For: 4.0.0 > > > In {{test_stream_reader}} and {{test_stream_writer}} of > {*}test_python_streaming_datasource.py{*}, the call {{q.awaitTermination}} > does not invoke a function call as intended, but instead returns a python > function object. The fix is to change this to {{{}q.awaitTermination(){}}}.
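The bug class is easy to reproduce in plain Python: referencing a method without parentheses evaluates to a bound-method object (which is always truthy and raises nothing) instead of invoking it, so a test written that way silently never waits. A minimal illustration (the class below is a stand-in, not PySpark's StreamingQuery):

```python
class FakeQuery:
    """Stand-in for a streaming query handle."""
    def __init__(self):
        self.waited = False

    def awaitTermination(self):
        self.waited = True
        return "done"

q = FakeQuery()
ref = q.awaitTermination      # the bug: a bound method object; nothing runs
assert callable(ref) and not q.waited

result = q.awaitTermination() # the fix: the call actually executes
assert result == "done" and q.waited
```

Linters such as flake8 or pylint can flag this pattern (a statement consisting of an unused attribute access) before it reaches a test suite.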
[jira] [Created] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
Hyukjin Kwon created SPARK-48459: Summary: Implement DataFrameQueryContext in Spark Connect Key: SPARK-48459 URL: https://issues.apache.org/jira/browse/SPARK-48459 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Implements the same https://github.com/apache/spark/pull/45377 in Spark Connect
[jira] [Resolved] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48445. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46780 [https://github.com/apache/spark/pull/46780] > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions.
[jira] [Assigned] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48445: Assignee: Kelvin Jiang > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions.
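Why inlining can duplicate work is clearest with a toy model of project collapsing. If Project(b = a + a) sits on top of Project(a = udf(x)), naive inlining rewrites b into udf(x) + udf(x), evaluating the UDF twice. A pure-Python sketch with an invocation counter (illustrative only, not Catalyst code):

```python
calls = {"udf": 0}

def expensive_udf(x):
    calls["udf"] += 1      # count evaluations to expose the duplication
    return x * 10

def two_projects(x):
    a = expensive_udf(x)   # lower Project: a = udf(x), evaluated once
    return a + a           # upper Project: b = a + a

def collapsed_inlined(x):
    # After collapsing, every reference to `a` is replaced by its definition.
    return expensive_udf(x) + expensive_udf(x)

assert two_projects(3) == collapsed_inlined(3)   # same result...
assert calls["udf"] == 3                         # ...but 1 vs 2 evaluations
```

The Jira fix refines this trade-off: duplicating a plain UDF reference is fine because later rules de-duplicate it, but duplicating a UDF whose children are themselves expensive is not.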
[jira] [Commented] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850230#comment-17850230 ] Hyukjin Kwon commented on SPARK-23015: -- Fixed in https://github.com/apache/spark/pull/43706 > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. > Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved.
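The collision risk from a 15-bit %RANDOM% name space is a birthday problem: with only 32768 possible filenames, even modest parallelism makes a clash likely. A quick sketch computing the exact probability (the job counts in the assertions are illustrative):

```python
def collision_probability(jobs, name_space=32768):
    """Probability that at least two of `jobs` independent uniform draws from
    `name_space` equally likely names coincide (the birthday problem)."""
    p_distinct = 1.0
    for i in range(jobs):
        p_distinct *= (name_space - i) / name_space
    return 1.0 - p_distinct

# Even 100 simultaneous spark-submit jobs collide about 14% of the time.
assert 0.13 < collision_probability(100) < 0.15
# A few hundred jobs make a collision more likely than not.
assert collision_probability(250) > 0.5
```

This is why the resolution moved away from %RANDOM%-derived names: a name drawn from a much larger space (e.g. a UUID) drives this probability to effectively zero.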
[jira] [Reopened] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-23015: -- > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
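The hypothesis above is the birthday problem over %RANDOM%'s 32768 possible values (0-32767). As a rough illustration (not Spark code; the helper name is ours), the collision odds for N simultaneous spark-submit jobs can be computed exactly, and they grow quickly:

```python
# Back-of-the-envelope check: probability that at least two of n_jobs
# concurrent spark-submit invocations draw the same %RANDOM% value.
# %RANDOM% has 32768 possible values, so this is the birthday problem.

def collision_probability(n_jobs: int, space: int = 32768) -> float:
    """Exact probability that at least two of n_jobs draws collide."""
    p_unique = 1.0
    for i in range(n_jobs):
        p_unique *= (space - i) / space  # i-th draw avoids the first i values
    return 1.0 - p_unique

if __name__ == "__main__":
    for n in (2, 10, 50, 100):
        print(f"{n:3d} concurrent jobs -> P(collision) = {collision_probability(n):.4f}")
```

Even a modest number of parallel jobs makes a collision plausible, which is consistent with the reported intermittent failures; any scheme that mixes in a per-process unique value (timestamp, PID) would shrink this dramatically.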
[jira] [Resolved] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23015. -- Fix Version/s: 4.0.0 Resolution: Fixed > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42965: Assignee: Ruifeng Zheng > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because the metadata > differs (e.g. `\{'__autoGeneratedAlias': 'true'}`), but the fields have the > same name, type and nullability, so the function still works correctly. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
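The workaround described in the ticket amounts to comparing schema fields while ignoring per-field metadata. The sketch below is illustrative only, not the actual pyspark-pandas code: `Field` is a hypothetical stand-in for `pyspark.sql.types.StructField` (which exposes the same attributes), and `fields_equal_ignoring_metadata` is our own name.

```python
# Illustrative sketch: compare fields by name/dataType/nullable only,
# ignoring metadata such as {'__autoGeneratedAlias': 'true'} that
# Spark Connect may attach to a StructField.
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.types.StructField.
Field = namedtuple("Field", ["name", "dataType", "nullable", "metadata"])

def fields_equal_ignoring_metadata(a, b) -> bool:
    return (a.name, a.dataType, a.nullable) == (b.name, b.dataType, b.nullable)

f1 = Field("bool", "long", False, {})
f2 = Field("bool", "long", False, {"__autoGeneratedAlias": "true"})
assert f1 != f2                                # plain equality fails on metadata
assert fields_equal_ignoring_metadata(f1, f2)  # but the fields agree otherwise
```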
[jira] [Resolved] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48322. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because the metadata > differs (e.g. `\{'__autoGeneratedAlias': 'true'}`), but the fields have the > same name, type and nullability, so the function still works correctly. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48322: Assignee: Ruifeng Zheng > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48438) Directly use the parent column class
[ https://issues.apache.org/jira/browse/SPARK-48438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48438. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46775 [https://github.com/apache/spark/pull/46775] > Directly use the parent column class > > > Key: SPARK-48438 > URL: https://issues.apache.org/jira/browse/SPARK-48438 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48434. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46764 [https://github.com/apache/spark/pull/46764] > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48434: Assignee: Ruifeng Zheng > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48432: Assignee: Vladimir Golubev > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48432. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46759 [https://github.com/apache/spark/pull/46759] > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48425. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46751 [https://github.com/apache/spark/pull/46751] > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The issue is in setuptools starting from 69.X.X. > It replaces the dash in the package name with an underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48425: Assignee: Hyukjin Kwon > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The issue is in setuptools starting from 69.X.X. > It replaces the dash in the package name with an underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48425: - Description: The issue is at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 was: The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
Hyukjin Kwon created SPARK-48425: Summary: Replaces pyspark-connect to pyspark_connect for its output name Key: SPARK-48425 URL: https://issues.apache.org/jira/browse/SPARK-48425 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
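For context on the setuptools behavior referenced above: newer setuptools builds sdist filenames per the PEP 625 rules, which canonicalize the project name and replace runs of dashes, dots and underscores with a single underscore, turning `pyspark-connect` into `pyspark_connect`. A minimal sketch of that normalization (our own helper, not setuptools source):

```python
# Illustration of the PEP 625 sdist filename normalization that maps
# 'pyspark-connect' to 'pyspark_connect' in newer setuptools releases.
import re

def sdist_name(project_name: str) -> str:
    # PEP 503 canonicalization, with '-' replaced by '_' for filenames (PEP 625)
    return re.sub(r"[-_.]+", "_", project_name).lower()

assert sdist_name("pyspark-connect") == "pyspark_connect"
assert sdist_name("pyspark") == "pyspark"
```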
[jira] [Resolved] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48424. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46749 [https://github.com/apache/spark/pull/46749] > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48424: Assignee: Hyukjin Kwon > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48424) Make dev/is-changed.py return true if it fails
Hyukjin Kwon created SPARK-48424: Summary: Make dev/is-changed.py return true if it fails Key: SPARK-48424 URL: https://issues.apache.org/jira/browse/SPARK-48424 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0, 3.5.2 Reporter: Hyukjin Kwon e.g., https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48370. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46683 [https://github.com/apache/spark/pull/46683] > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48370: Assignee: Hyukjin Kwon > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48393: Assignee: Ruifeng Zheng > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48393. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46710 [https://github.com/apache/spark/pull/46710] > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48379: -- Assignee: (was: Stefan Kandic) Reverted in https://github.com/apache/spark/commit/9fd85d9acc5acf455d0ad910ef2848695576242b > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48379: - Fix Version/s: (was: 4.0.0) > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48389. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46703 [https://github.com/apache/spark/pull/46703] > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > After https://github.com/apache/spark/pull/46689, we don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48389: Assignee: Hyukjin Kwon > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > After https://github.com/apache/spark/pull/46689, we don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
Hyukjin Kwon created SPARK-48389: Summary: Remove obsolete workflow cancel_duplicate_workflow_runs Key: SPARK-48389 URL: https://issues.apache.org/jira/browse/SPARK-48389 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon After https://github.com/apache/spark/pull/46689, we don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48379: Assignee: Stefan Kandic > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48379. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46689 [https://github.com/apache/spark/pull/46689] > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48341) Allow Spark Connect plugins to use QueryTest in their tests
[ https://issues.apache.org/jira/browse/SPARK-48341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48341. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46667 [https://github.com/apache/spark/pull/46667] > Allow Spark Connect plugins to use QueryTest in their tests > --- > > Key: SPARK-48341 > URL: https://issues.apache.org/jira/browse/SPARK-48341 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
Hyukjin Kwon created SPARK-48370: Summary: Checkpoint and localCheckpoint in Scala Spark Connect client Key: SPARK-48370 URL: https://issues.apache.org/jira/browse/SPARK-48370 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48370: - Issue Type: Improvement (was: Bug) > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48367.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46679
[https://github.com/apache/spark/pull/46679]

> Fix lint-scala for scalafmt to detect properly
> ----------------------------------------------
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code}
> ./build/mvn \
>   -Pscala-2.13 \
>   scalafmt:format \
>   -Dscalafmt.skip=false \
>   -Dscalafmt.validateOnly=true \
>   -Dscalafmt.changedOnly=false \
>   -pl connector/connect/common \
>   -pl connector/connect/server \
>   -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO]  - Requires formatting: ConnectProtoUtils.scala
> [INFO]  - Formatted: UdfUtils.scala
> [INFO]  - Formatted: DataTypeProtoConverter.scala
> [INFO]  - Formatted: ConnectCommon.scala
> [INFO]  - Formatted: ProtoUtils.scala
> [INFO]  - Formatted: Abbreviator.scala
> [INFO]  - Formatted: ProtoDataTypes.scala
> [INFO]  - Formatted: LiteralValueProtoConverter.scala
> [INFO]  - Formatted: InvalidPlanInput.scala
> [INFO]  - Formatted: ForeachWriterPacket.scala
> [INFO]  - Formatted: StreamingListenerPacket.scala
> [INFO]  - Formatted: StorageLevelProtoConverter.scala
> [INFO]  - Formatted: UdfPacket.scala
> [INFO]  - Formatted: ClassFinder.scala
> [INFO]  - Formatted: SparkConnectClient.scala
> [INFO]  - Formatted: GrpcRetryHandler.scala
> [INFO]  - Formatted: GrpcExceptionConverter.scala
> [INFO]  - Formatted: ArrowEncoderUtils.scala
> [INFO]  - Formatted: ScalaCollectionUtils.scala
> [INFO]  - Formatted: ArrowDeserializer.scala
> [INFO]  - Formatted: ArrowVectorReader.scala
> [INFO]  - Formatted: ArrowSerializer.scala
> [INFO]  - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO]  - Formatted: RetryPolicy.scala
> [INFO]  - Formatted: SparkConnectStubState.scala
> [INFO]  - Formatted: ArtifactManager.scala
> [INFO]  - Formatted: SparkResult.scala
> [INFO]  - Formatted: RetriesExceeded.scala
> [INFO]  - Formatted: CloseableIterator.scala
> [INFO]  - Formatted: package.scala
> [INFO]  - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO]  - Formatted: ResponseValidator.scala
> [INFO]  - Formatted: SparkConnectClientParser.scala
> [INFO]  - Formatted: CustomSparkConnectStub.scala
> [INFO]  - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO]  - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed after a scalafmt version upgrade.
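The fix requires the lint check to recognize the new output format, where offending files are reported as `Requires formatting: <file>` rather than only via the summary line. A minimal sketch of such a detection step (hypothetical helper; not the actual `dev/lint-scala` implementation):

```python
import re

def unformatted_files(mvn_output: str) -> list[str]:
    """Collect files that scalafmt reports as needing formatting.

    Matches the newer scalafmt-maven-plugin output shown above, where
    offending files appear as '[INFO]  - Requires formatting: <file>'.
    """
    pattern = re.compile(r"Requires formatting:\s+(\S+)")
    return [m.group(1) for m in pattern.finditer(mvn_output)]

log = """\
[INFO] Scalafmt results: 1 of 36 were unformatted
[INFO] Details:
[INFO]  - Requires formatting: ConnectProtoUtils.scala
[INFO]  - Formatted: UdfUtils.scala
"""
print(unformatted_files(log))  # ['ConnectProtoUtils.scala']
```

A lint script would fail the build whenever this list is non-empty, which is robust to the summary line's wording changing between plugin versions.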
[jira] [Assigned] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48367:
------------------------------------
Assignee: Hyukjin Kwon

> Fix lint-scala for scalafmt to detect properly
> ----------------------------------------------
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> ./build/mvn \
>   -Pscala-2.13 \
>   scalafmt:format \
>   -Dscalafmt.skip=false \
>   -Dscalafmt.validateOnly=true \
>   -Dscalafmt.changedOnly=false \
>   -pl connector/connect/common \
>   -pl connector/connect/server \
>   -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO]  - Requires formatting: ConnectProtoUtils.scala
> [INFO]  - Formatted: UdfUtils.scala
> [INFO]  - Formatted: DataTypeProtoConverter.scala
> [INFO]  - Formatted: ConnectCommon.scala
> [INFO]  - Formatted: ProtoUtils.scala
> [INFO]  - Formatted: Abbreviator.scala
> [INFO]  - Formatted: ProtoDataTypes.scala
> [INFO]  - Formatted: LiteralValueProtoConverter.scala
> [INFO]  - Formatted: InvalidPlanInput.scala
> [INFO]  - Formatted: ForeachWriterPacket.scala
> [INFO]  - Formatted: StreamingListenerPacket.scala
> [INFO]  - Formatted: StorageLevelProtoConverter.scala
> [INFO]  - Formatted: UdfPacket.scala
> [INFO]  - Formatted: ClassFinder.scala
> [INFO]  - Formatted: SparkConnectClient.scala
> [INFO]  - Formatted: GrpcRetryHandler.scala
> [INFO]  - Formatted: GrpcExceptionConverter.scala
> [INFO]  - Formatted: ArrowEncoderUtils.scala
> [INFO]  - Formatted: ScalaCollectionUtils.scala
> [INFO]  - Formatted: ArrowDeserializer.scala
> [INFO]  - Formatted: ArrowVectorReader.scala
> [INFO]  - Formatted: ArrowSerializer.scala
> [INFO]  - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO]  - Formatted: RetryPolicy.scala
> [INFO]  - Formatted: SparkConnectStubState.scala
> [INFO]  - Formatted: ArtifactManager.scala
> [INFO]  - Formatted: SparkResult.scala
> [INFO]  - Formatted: RetriesExceeded.scala
> [INFO]  - Formatted: CloseableIterator.scala
> [INFO]  - Formatted: package.scala
> [INFO]  - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO]  - Formatted: ResponseValidator.scala
> [INFO]  - Formatted: SparkConnectClientParser.scala
> [INFO]  - Formatted: CustomSparkConnectStub.scala
> [INFO]  - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO]  - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed after a scalafmt version upgrade.
[jira] [Created] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
Hyukjin Kwon created SPARK-48367:
---------------------------------
Summary: Fix lint-scala for scalafmt to detect properly
Key: SPARK-48367
URL: https://issues.apache.org/jira/browse/SPARK-48367
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

{code}
./build/mvn \
  -Pscala-2.13 \
  scalafmt:format \
  -Dscalafmt.skip=false \
  -Dscalafmt.validateOnly=true \
  -Dscalafmt.changedOnly=false \
  -pl connector/connect/common \
  -pl connector/connect/server \
  -pl connector/connect/client/jvm
{code}

fails as below:

{code}
[INFO] Scalafmt results: 1 of 36 were unformatted
[INFO] Details:
[INFO]  - Requires formatting: ConnectProtoUtils.scala
[INFO]  - Formatted: UdfUtils.scala
[INFO]  - Formatted: DataTypeProtoConverter.scala
[INFO]  - Formatted: ConnectCommon.scala
[INFO]  - Formatted: ProtoUtils.scala
[INFO]  - Formatted: Abbreviator.scala
[INFO]  - Formatted: ProtoDataTypes.scala
[INFO]  - Formatted: LiteralValueProtoConverter.scala
[INFO]  - Formatted: InvalidPlanInput.scala
[INFO]  - Formatted: ForeachWriterPacket.scala
[INFO]  - Formatted: StreamingListenerPacket.scala
[INFO]  - Formatted: StorageLevelProtoConverter.scala
[INFO]  - Formatted: UdfPacket.scala
[INFO]  - Formatted: ClassFinder.scala
[INFO]  - Formatted: SparkConnectClient.scala
[INFO]  - Formatted: GrpcRetryHandler.scala
[INFO]  - Formatted: GrpcExceptionConverter.scala
[INFO]  - Formatted: ArrowEncoderUtils.scala
[INFO]  - Formatted: ScalaCollectionUtils.scala
[INFO]  - Formatted: ArrowDeserializer.scala
[INFO]  - Formatted: ArrowVectorReader.scala
[INFO]  - Formatted: ArrowSerializer.scala
[INFO]  - Formatted: ConcatenatingArrowStreamReader.scala
[INFO]  - Formatted: RetryPolicy.scala
[INFO]  - Formatted: SparkConnectStubState.scala
[INFO]  - Formatted: ArtifactManager.scala
[INFO]  - Formatted: SparkResult.scala
[INFO]  - Formatted: RetriesExceeded.scala
[INFO]  - Formatted: CloseableIterator.scala
[INFO]  - Formatted: package.scala
[INFO]  - Formatted: ExecutePlanResponseReattachableIterator.scala
[INFO]  - Formatted: ResponseValidator.scala
[INFO]  - Formatted: SparkConnectClientParser.scala
[INFO]  - Formatted: CustomSparkConnectStub.scala
[INFO]  - Formatted: CustomSparkConnectBlockingStub.scala
[INFO]  - Formatted: TestUDFs.scala
{code}

This is because the output format changed after a scalafmt version upgrade.
[jira] [Resolved] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48363.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46674
[https://github.com/apache/spark/pull/46674]

> Cleanup some redundant codes in `from_xml`
> ------------------------------------------
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48363:
------------------------------------
Assignee: BingKun Pan

> Cleanup some redundant codes in `from_xml`
> ------------------------------------------
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48340.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 4
[https://github.com/apache/spark/pull/4]

> Support TimestampNTZ infer schema miss prefer_timestamp_ntz
> ------------------------------------------------------------
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0, 3.5.1
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2024-05-20-18-38-39-769.png
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
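The reported bug is that PySpark's schema inference ignored the prefer-TimestampNTZ setting when mapping Python datetimes to Spark types. A toy, PySpark-free sketch of the intended rule (the function name and string results are illustrative, not Spark's API):

```python
from datetime import datetime, timezone

def infer_timestamp_type(value: datetime, prefer_ntz: bool) -> str:
    """Sketch of the intended inference rule: a tz-naive datetime maps to
    TimestampNTZType only when the NTZ preference is enabled, while a
    tz-aware datetime always maps to TimestampType."""
    if value.tzinfo is None and prefer_ntz:
        return "TimestampNTZType"
    return "TimestampType"

naive = datetime(2024, 5, 20, 18, 38)
aware = datetime(2024, 5, 20, 18, 38, tzinfo=timezone.utc)
print(infer_timestamp_type(naive, prefer_ntz=True))   # TimestampNTZType
print(infer_timestamp_type(naive, prefer_ntz=False))  # TimestampType
print(infer_timestamp_type(aware, prefer_ntz=True))   # TimestampType
```

The bug amounted to the first branch never firing during inference, so tz-naive values were typed as TimestampType even with the preference enabled.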
[jira] [Assigned] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48340:
------------------------------------
Assignee: angerszhu

> Support TimestampNTZ infer schema miss prefer_timestamp_ntz
> ------------------------------------------------------------
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0, 3.5.1
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Labels: pull-request-available
>
> Attachments: image-2024-05-20-18-38-39-769.png
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Resolved] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
[ https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48258.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46570
[https://github.com/apache/spark/pull/46570]

> Implement DataFrame.checkpoint and DataFrame.localCheckpoint
> ------------------------------------------------------------
>
> Key: SPARK-48258
> URL: https://issues.apache.org/jira/browse/SPARK-48258
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature parity.
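For context on what these APIs do: checkpointing materializes a DataFrame's current rows and discards its logical plan, so downstream operations no longer depend on the full upstream lineage. The toy class below illustrates that idea only; it is not Spark's implementation, and all names are invented for the sketch:

```python
class ToyFrame:
    """Minimal stand-in for a lazily derived frame that tracks lineage."""

    def __init__(self, rows, parent=None):
        self.rows = rows
        self.parent = parent  # upstream frame this one was derived from

    def map_rows(self, fn):
        # Each transformation produces a child frame linked to its parent.
        return ToyFrame([fn(r) for r in self.rows], parent=self)

    def lineage_depth(self):
        return 0 if self.parent is None else 1 + self.parent.lineage_depth()

    def local_checkpoint(self):
        # Materialize the current rows and sever the link to the parent:
        # the essence of what (local) checkpointing does to lineage.
        return ToyFrame(list(self.rows))

base = ToyFrame([1, 2, 3])
derived = base.map_rows(lambda x: x * 2).map_rows(lambda x: x + 1)
print(derived.lineage_depth())                     # 2
print(derived.local_checkpoint().lineage_depth())  # 0
```

In real Spark, `checkpoint()` writes to reliable storage while `localCheckpoint()` uses executor-local storage; this ticket adds both to the Spark Connect Python client for parity with classic PySpark.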