[jira] [Resolved] (SPARK-48593) Fix the string representation of lambda function
[ https://issues.apache.org/jira/browse/SPARK-48593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48593.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46948
[https://github.com/apache/spark/pull/46948]

> Fix the string representation of lambda function
>
> Key: SPARK-48593
> URL: https://issues.apache.org/jira/browse/SPARK-48593
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48593) Fix the string representation of lambda function
[ https://issues.apache.org/jira/browse/SPARK-48593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48593:
Assignee: Ruifeng Zheng

> Fix the string representation of lambda function
>
> Key: SPARK-48593
> URL: https://issues.apache.org/jira/browse/SPARK-48593
> Project: Spark
> Issue Type: Bug
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48421:
Assignee: Szehon Ho

> SPJ: Add documentation
>
> Key: SPARK-48421
> URL: https://issues.apache.org/jira/browse/SPARK-48421
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
>
> As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed there is no documentation describing the same.
[jira] [Resolved] (SPARK-48421) SPJ: Add documentation
[ https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48421.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46745
[https://github.com/apache/spark/pull/46745]

> SPJ: Add documentation
>
> Key: SPARK-48421
> URL: https://issues.apache.org/jira/browse/SPARK-48421
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Szehon Ho
> Assignee: Szehon Ho
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed there is no documentation describing the same.
[jira] [Resolved] (SPARK-48591) Simplify the if-else branches with `F.lit`
[ https://issues.apache.org/jira/browse/SPARK-48591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48591.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46946
[https://github.com/apache/spark/pull/46946]

> Simplify the if-else branches with `F.lit`
>
> Key: SPARK-48591
> URL: https://issues.apache.org/jira/browse/SPARK-48591
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48591) Simplify the if-else branches with `F.lit`
[ https://issues.apache.org/jira/browse/SPARK-48591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48591:
Assignee: Ruifeng Zheng

> Simplify the if-else branches with `F.lit`
>
> Key: SPARK-48591
> URL: https://issues.apache.org/jira/browse/SPARK-48591
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Minor
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48598) Propagate cached schema in dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-48598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48598:
Assignee: Ruifeng Zheng

> Propagate cached schema in dataframe operations
>
> Key: SPARK-48598
> URL: https://issues.apache.org/jira/browse/SPARK-48598
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48598) Propagate cached schema in dataframe operations
[ https://issues.apache.org/jira/browse/SPARK-48598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48598.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46954
[https://github.com/apache/spark/pull/46954]

> Propagate cached schema in dataframe operations
>
> Key: SPARK-48598
> URL: https://issues.apache.org/jira/browse/SPARK-48598
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
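The idea behind the cached-schema tickets in this digest (this one and SPARK-48564 below) can be illustrated with a toy sketch. This is not Spark's actual implementation, and all names here are hypothetical; the point is that in Spark Connect fetching a schema costs a round trip to the server, so operations that cannot change the schema can hand the parent's cached copy to the result instead:

```python
class ToyDataFrame:
    """Toy model (hypothetical, not Spark's classes): resolving a schema
    is expensive, so schema-preserving operations propagate the cache."""

    rpc_calls = 0  # counts simulated round trips to the server

    def __init__(self, schema=None):
        self._cached_schema = schema

    @property
    def schema(self):
        if self._cached_schema is None:
            ToyDataFrame.rpc_calls += 1  # stand-in for an AnalyzePlan RPC
            self._cached_schema = ("id", "name")
        return self._cached_schema

    def filter(self):
        # A filter never adds or removes columns, so the cached
        # schema carries over to the child dataframe unchanged.
        return ToyDataFrame(schema=self._cached_schema)

df = ToyDataFrame()
_ = df.schema             # first access: one simulated RPC, then cached
df2 = df.filter()
print(df2.schema)         # no extra RPC: the schema was propagated
print(ToyDataFrame.rpc_calls)  # 1
```

The same reasoning applies to limits, sorts, and set operations such as `union`, whose output schema is determined by the inputs.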
[jira] [Resolved] (SPARK-48569) Connect - StreamingQuery.name should return null when not specified
[ https://issues.apache.org/jira/browse/SPARK-48569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48569.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46920
[https://github.com/apache/spark/pull/46920]

> Connect - StreamingQuery.name should return null when not specified
>
> Key: SPARK-48569
> URL: https://issues.apache.org/jira/browse/SPARK-48569
> Project: Spark
> Issue Type: New Feature
> Components: Connect, SS
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-48564) Propagate cached schema in set operations
[ https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48564.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46915
[https://github.com/apache/spark/pull/46915]

> Propagate cached schema in set operations
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48564) Propagate cached schema in set operations
[ https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48564:
Assignee: Ruifeng Zheng

> Propagate cached schema in set operations
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48560) Make StreamingQueryListener.spark settable
[ https://issues.apache.org/jira/browse/SPARK-48560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48560.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46909
[https://github.com/apache/spark/pull/46909]

> Make StreamingQueryListener.spark settable
>
> Key: SPARK-48560
> URL: https://issues.apache.org/jira/browse/SPARK-48560
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Downstream users might already implement StreamingQueryListener.spark.
[jira] [Resolved] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE
[ https://issues.apache.org/jira/browse/SPARK-48552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48552.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46890
[https://github.com/apache/spark/pull/46890]

> multi-line CSV schema inference should also throw FAILED_READ_FILE
>
> Key: SPARK-48552
> URL: https://issues.apache.org/jira/browse/SPARK-48552
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE
[ https://issues.apache.org/jira/browse/SPARK-48552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48552:
Assignee: Wenchen Fan

> multi-line CSV schema inference should also throw FAILED_READ_FILE
>
> Key: SPARK-48552
> URL: https://issues.apache.org/jira/browse/SPARK-48552
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-48560) Make StreamingQueryListener.spark settable
Hyukjin Kwon created SPARK-48560:

Summary: Make StreamingQueryListener.spark settable
Key: SPARK-48560
URL: https://issues.apache.org/jira/browse/SPARK-48560
Project: Spark
Issue Type: Improvement
Components: PySpark, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

Downstream users might already implement StreamingQueryListener.spark.
[jira] [Assigned] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
[ https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47952:
Assignee: TakawaAkirayo (was: Adam Binford)

> Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
> Issue Type: Story
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: TakawaAkirayo
> Assignee: TakawaAkirayo
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> 1. User story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run spark-shell/pyspark in the terminal via Yarn client mode. However, Yarn client mode consumes significant local memory if the job is heavy, and the total k8s resource pool for notebooks is limited. To leverage the abundant resources of our Hadoop cluster for scalability, we aim to use SparkConnect: the driver runs on Yarn with SparkConnectService started, and a SparkConnect client connects to that remote driver.
> To provide a seamless one-command startup for both server and client, we've wrapped the following steps in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) on a specified port.
> 2) Start SparkConnectServer via spark-submit in Yarn cluster mode with the user-supplied Spark configurations and the local coordinator server's address and port, appending an additional listener class that calls the coordinator server back with the actual address and port on Yarn.
> 3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to it only after manually looking up its address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might already be occupied on the Hadoop cluster node. To increase the startup success rate, it should retry on conflicts rather than fail outright.
> 2) Because the final bound port can differ due to #1 and the remote address is unpredictable on Yarn, we need to retrieve the address and port programmatically and inject them automatically when starting `pyspark --remote`. The SparkConnectService needs to communicate its location back to the launcher side.
[jira] [Resolved] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
[ https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47952.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46182
[https://github.com/apache/spark/pull/46182]

> Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
> Issue Type: Story
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: TakawaAkirayo
> Assignee: Adam Binford
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
> 1. User story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run spark-shell/pyspark in the terminal via Yarn client mode. However, Yarn client mode consumes significant local memory if the job is heavy, and the total k8s resource pool for notebooks is limited. To leverage the abundant resources of our Hadoop cluster for scalability, we aim to use SparkConnect: the driver runs on Yarn with SparkConnectService started, and a SparkConnect client connects to that remote driver.
> To provide a seamless one-command startup for both server and client, we've wrapped the following steps in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) on a specified port.
> 2) Start SparkConnectServer via spark-submit in Yarn cluster mode with the user-supplied Spark configurations and the local coordinator server's address and port, appending an additional listener class that calls the coordinator server back with the actual address and port on Yarn.
> 3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to it only after manually looking up its address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might already be occupied on the Hadoop cluster node. To increase the startup success rate, it should retry on conflicts rather than fail outright.
> 2) Because the final bound port can differ due to #1 and the remote address is unpredictable on Yarn, we need to retrieve the address and port programmatically and inject them automatically when starting `pyspark --remote`. The SparkConnectService needs to communicate its location back to the launcher side.
[jira] [Assigned] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
[ https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47952:
Assignee: Adam Binford

> Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
> Issue Type: Story
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: TakawaAkirayo
> Assignee: Adam Binford
> Priority: Minor
> Labels: pull-request-available
>
> 1. User story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on Kubernetes (k8s) with limited CPU/memory resources to run spark-shell/pyspark in the terminal via Yarn client mode. However, Yarn client mode consumes significant local memory if the job is heavy, and the total k8s resource pool for notebooks is limited. To leverage the abundant resources of our Hadoop cluster for scalability, we aim to use SparkConnect: the driver runs on Yarn with SparkConnectService started, and a SparkConnect client connects to that remote driver.
> To provide a seamless one-command startup for both server and client, we've wrapped the following steps in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) on a specified port.
> 2) Start SparkConnectServer via spark-submit in Yarn cluster mode with the user-supplied Spark configurations and the local coordinator server's address and port, appending an additional listener class that calls the coordinator server back with the actual address and port on Yarn.
> 3) Wait for the coordinator server to receive the address callback from the SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect server is started on Yarn with a local SparkConnect client connected. Users no longer need to start the server beforehand and connect to it only after manually looking up its address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might already be occupied on the Hadoop cluster node. To increase the startup success rate, it should retry on conflicts rather than fail outright.
> 2) Because the final bound port can differ due to #1 and the remote address is unpredictable on Yarn, we need to retrieve the address and port programmatically and inject them automatically when starting `pyspark --remote`. The SparkConnectService needs to communicate its location back to the launcher side.
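The retry-on-conflict behavior described in point 1 of the problem statement can be sketched in plain Python. This is an illustration only, not the implementation merged for SPARK-47952: `bind_with_retry` is a hypothetical helper that tries the preferred port once and then falls back to an OS-assigned ephemeral port, returning the real bound port for the launcher to report back:

```python
import socket

def bind_with_retry(host, preferred_port, max_attempts=10):
    """Bind a server socket, falling back to an OS-chosen port on conflict.

    Returns (socket, real_port); real_port is what the launcher side would
    need to discover, since it can differ from preferred_port.
    """
    port = preferred_port
    for _ in range(max_attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            # getsockname() reveals the actual port, which matters when
            # the OS picked an ephemeral one (port 0).
            return sock, sock.getsockname()[1]
        except OSError:  # e.g. EADDRINUSE: the port is already occupied
            sock.close()
            port = 0  # ask the OS for any free port on the next attempt
    raise OSError(f"could not bind after {max_attempts} attempts")

# Occupy a port, then show the fallback kicking in on the conflict.
blocker, busy_port = bind_with_retry("127.0.0.1", 0)
server, real_port = bind_with_retry("127.0.0.1", busy_port)
print(real_port != busy_port)  # True: the conflict forced a different port
blocker.close()
server.close()
```

Point 2 then follows naturally: because `real_port` is only known after binding, something like the coordinator-server callback described above is needed to get it back to the client.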
[jira] [Resolved] (SPARK-48550) Directly use the parent Window class
[ https://issues.apache.org/jira/browse/SPARK-48550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48550.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46892
[https://github.com/apache/spark/pull/46892]

> Directly use the parent Window class
>
> Key: SPARK-48550
> URL: https://issues.apache.org/jira/browse/SPARK-48550
> Project: Spark
> Issue Type: Improvement
> Components: PS
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48550) Directly use the parent Window class
[ https://issues.apache.org/jira/browse/SPARK-48550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48550:
Assignee: Ruifeng Zheng

> Directly use the parent Window class
>
> Key: SPARK-48550
> URL: https://issues.apache.org/jira/browse/SPARK-48550
> Project: Spark
> Issue Type: Improvement
> Components: PS
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48533:
Assignee: Ruifeng Zheng

> Add test for cached schema
>
> Key: SPARK-48533
> URL: https://issues.apache.org/jira/browse/SPARK-48533
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48533) Add test for cached schema
[ https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48533.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46871
[https://github.com/apache/spark/pull/46871]

> Add test for cached schema
>
> Key: SPARK-48533
> URL: https://issues.apache.org/jira/browse/SPARK-48533
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48534) Support interruptOperation in streaming queries
Hyukjin Kwon created SPARK-48534:

Summary: Support interruptOperation in streaming queries
Key: SPARK-48534
URL: https://issues.apache.org/jira/browse/SPARK-48534
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

Similar to https://issues.apache.org/jira/browse/SPARK-48485, but we should also add interruptOperation.
[jira] [Resolved] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48523.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46862
[https://github.com/apache/spark/pull/46862]

> Add `grpc_max_message_size` description to `client-connection-string.md`
>
> Key: SPARK-48523
> URL: https://issues.apache.org/jira/browse/SPARK-48523
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48523) Add `grpc_max_message_size` description to `client-connection-string.md`
[ https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48523:
Assignee: BingKun Pan

> Add `grpc_max_message_size` description to `client-connection-string.md`
>
> Key: SPARK-48523
> URL: https://issues.apache.org/jira/browse/SPARK-48523
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
[ https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48485.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46819
[https://github.com/apache/spark/pull/46819]

> Support interruptTag and interruptAll in streaming queries
>
> Key: SPARK-48485
> URL: https://issues.apache.org/jira/browse/SPARK-48485
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48482:
Assignee: Wei Liu

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
[ https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48482.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46817
[https://github.com/apache/spark/pull/46817]

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Wei Liu
> Assignee: Wei Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
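The argument-handling pattern this ticket asks for can be sketched without Spark at all: accept column names either as a single list (the historical PySpark signature) or as varargs, and normalize to one form. This toy `drop_duplicates` is hypothetical and only models the signature change, not the deduplication itself:

```python
def drop_duplicates(*subset):
    """Toy model of a varargs-friendly signature: callers may pass either
    drop_duplicates(["a", "b"]) or drop_duplicates("a", "b")."""
    if len(subset) == 1 and isinstance(subset[0], (list, tuple)):
        # Single list/tuple argument: unpack it to the varargs form.
        subset = tuple(subset[0])
    return subset

print(drop_duplicates("a", "b"))    # ('a', 'b')
print(drop_duplicates(["a", "b"]))  # ('a', 'b')
print(drop_duplicates())            # (): no subset means all columns
```

Normalizing at the boundary keeps the rest of the code path working with one canonical representation, which is the usual reason APIs accept both spellings.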
[jira] [Assigned] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48508:
Assignee: Ruifeng Zheng

> Client Side RPC optimization for Spark Connect
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48508) Client Side RPC optimization for Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48508.
--
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46848
[https://github.com/apache/spark/pull/46848]

> Client Side RPC optimization for Spark Connect
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
> Issue Type: Umbrella
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48507. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46846 [https://github.com/apache/spark/pull/46846] > Use Hadoop 3.3.6 winutils in `build_sparkr_window` > -- > > Key: SPARK-48507 > URL: https://issues.apache.org/jira/browse/SPARK-48507 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`
[ https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48507: Assignee: BingKun Pan > Use Hadoop 3.3.6 winutils in `build_sparkr_window` > -- > > Key: SPARK-48507 > URL: https://issues.apache.org/jira/browse/SPARK-48507 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48504: Assignee: Ruifeng Zheng > Parent Window class for Spark Connect and Spark Classic > --- > > Key: SPARK-48504 > URL: https://issues.apache.org/jira/browse/SPARK-48504 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48504. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46841 [https://github.com/apache/spark/pull/46841] > Parent Window class for Spark Connect and Spark Classic > --- > > Key: SPARK-48504 > URL: https://issues.apache.org/jira/browse/SPARK-48504 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48496) Use static regex Pattern instances in common/utils JavaUtils
[ https://issues.apache.org/jira/browse/SPARK-48496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48496. -- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/46829 > Use static regex Pattern instances in common/utils JavaUtils > > > Key: SPARK-48496 > URL: https://issues.apache.org/jira/browse/SPARK-48496 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Some methods in JavaUtils.java are recompiling regexes on every invocation; > we should instead store a single cached Pattern. > This is a minor perf. issue that I spotted in the context of other profiling. > Not a huge bottleneck in the grand scheme of things, but simple and > straightforward to fix.
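The fix pattern described in SPARK-48496 is language-agnostic: hoist the compiled regex to a static/module-level constant so it is compiled once at load time instead of on every call. A small Python sketch of the same idea (the pattern and function name below are illustrative, not necessarily the exact ones in JavaUtils):

```python
import re

# Compiled once at module load, analogous to a static final Pattern in Java.
# Recompiling this inside the function body on every call is the anti-pattern
# the Jira issue fixes.
_TIME_STRING = re.compile(r"(-?[0-9]+)([a-z]+)?")

def parse_time_string(s):
    """Split a string like '10s' into its numeric value and unit suffix."""
    m = _TIME_STRING.match(s.strip().lower())
    if not m:
        raise ValueError(f"invalid time string: {s!r}")
    return int(m.group(1)), m.group(2)

assert parse_time_string("10s") == (10, "s")
assert parse_time_string("-5") == (-5, None)
```

Python's re module internally caches recent patterns, so the win is smaller there than in Java, but the module-level constant still skips the cache lookup on every call.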
[jira] [Resolved] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48489. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46823 [https://github.com/apache/spark/pull/46823] > Throw a user-facing error when reading invalid schema from text DataSource > --- > > Key: SPARK-48489 > URL: https://issues.apache.org/jira/browse/SPARK-48489 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Stefan Bukorovic >Assignee: Stefan Bukorovic >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Text DataSource produces table schema with only 1 column, but it is possible > to try and create a table with schema having multiple columns. > Currently, when a user tries this, we have an assert in the code, which fails > and throws an internal Spark error. We should throw a better user-facing error. >
[jira] [Assigned] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource
[ https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48489: Assignee: Stefan Bukorovic > Throw a user-facing error when reading invalid schema from text DataSource > --- > > Key: SPARK-48489 > URL: https://issues.apache.org/jira/browse/SPARK-48489 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.3 >Reporter: Stefan Bukorovic >Assignee: Stefan Bukorovic >Priority: Minor > Labels: pull-request-available > > Text DataSource produces table schema with only 1 column, but it is possible > to try and create a table with schema having multiple columns. > Currently, when a user tries this, we have an assert in the code, which fails > and throws an internal Spark error. We should throw a better user-facing error. >
[jira] [Assigned] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48374: Assignee: Ian Cook > Support additional PyArrow Table column types > - > > Key: SPARK-48374 > URL: https://issues.apache.org/jira/browse/SPARK-48374 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > > SPARK-48220 adds support for passing a PyArrow Table to > {{{}createDataFrame(){}}}, but there are a few PyArrow column types that are > not yet supported: > * fixed-size binary > * fixed-size list > * large list >
[jira] [Resolved] (SPARK-48374) Support additional PyArrow Table column types
[ https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48374. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46688 [https://github.com/apache/spark/pull/46688] > Support additional PyArrow Table column types > - > > Key: SPARK-48374 > URL: https://issues.apache.org/jira/browse/SPARK-48374 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-48220 adds support for passing a PyArrow Table to > {{{}createDataFrame(){}}}, but there are a few PyArrow column types that are > not yet supported: > * fixed-size binary > * fixed-size list > * large list >
[jira] [Assigned] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48220: Assignee: Ian Cook > Allow passing PyArrow Table to createDataFrame() > > > Key: SPARK-48220 > URL: https://issues.apache.org/jira/browse/SPARK-48220 > Project: Spark > Issue Type: Sub-task > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > > SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. > It would be nice if we could also go in the opposite direction, enabling > users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow > Table to {{spark.createDataFrame()}}.
[jira] [Resolved] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()
[ https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48220. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46529 [https://github.com/apache/spark/pull/46529] > Allow passing PyArrow Table to createDataFrame() > > > Key: SPARK-48220 > URL: https://issues.apache.org/jira/browse/SPARK-48220 > Project: Spark > Issue Type: Sub-task > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table. > It would be nice if we could also go in the opposite direction, enabling > users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow > Table to {{spark.createDataFrame()}}.
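Conceptually, an entry point like createDataFrame() dispatches on the type of its input, and supporting PyArrow Tables means accepting one more input type. The toy sketch below shows that dispatch shape only; the class and function names are stand-ins (a real build would branch on isinstance(data, pyarrow.Table)), and the stand-in class exists solely so the sketch runs without pyarrow installed:

```python
class StandInArrowTable:
    """Minimal stand-in for pyarrow.Table so this sketch has no dependencies."""
    def __init__(self, rows):
        self._rows = rows

    def to_pylist(self):
        # pyarrow.Table also exposes to_pylist(), returning one dict per row.
        return list(self._rows)

def create_dataframe(data):
    """Toy dispatcher: accept a list of rows or an Arrow-Table-like object."""
    if isinstance(data, list):
        return data
    if hasattr(data, "to_pylist"):  # duck-typed Arrow Table support
        return data.to_pylist()
    raise TypeError(f"unsupported input type: {type(data).__name__}")

table = StandInArrowTable([{"id": 1}, {"id": 2}])
assert create_dataframe(table) == [{"id": 1}, {"id": 2}]
```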
[jira] [Created] (SPARK-48485) Support interruptTag and interruptAll in streaming queries
Hyukjin Kwon created SPARK-48485: Summary: Support interruptTag and interruptAll in streaming queries Key: SPARK-48485 URL: https://issues.apache.org/jira/browse/SPARK-48485 Project: Spark Issue Type: Improvement Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Spark Connect's interrupt API does not interrupt streaming queries. We should support them.
[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48474: Assignee: BingKun Pan > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
[ https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48474. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46808 [https://github.com/apache/spark/pull/46808] > Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit` > --- > > Key: SPARK-48474 > URL: https://issues.apache.org/jira/browse/SPARK-48474 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48467: Assignee: BingKun Pan > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7
[ https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48467. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46798 [https://github.com/apache/spark/pull/46798] > Upgrade Maven to 3.9.7 > -- > > Key: SPARK-48467 > URL: https://issues.apache.org/jira/browse/SPARK-48467 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47716: Assignee: Jack Chen > SQLQueryTestSuite flaky case due to view name conflict > -- > > Key: SPARK-47716 > URL: https://issues.apache.org/jira/browse/SPARK-47716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > > In SQLQueryTestSuite, the test case "Test logic for determining whether a > query is semantically sorted" can sometimes fail with an error > {{Cannot create table or view `main`.`default`.`t1` because it already > exists.}} > if run concurrently with other sql test cases that also create tables with > the same name.
[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict
[ https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47716. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45855 [https://github.com/apache/spark/pull/45855] > SQLQueryTestSuite flaky case due to view name conflict > -- > > Key: SPARK-47716 > URL: https://issues.apache.org/jira/browse/SPARK-47716 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In SQLQueryTestSuite, the test case "Test logic for determining whether a > query is semantically sorted" can sometimes fail with an error > {{Cannot create table or view `main`.`default`.`t1` because it already > exists.}} > if run concurrently with other sql test cases that also create tables with > the same name.
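The usual remedy for this class of flakiness is to derive a unique name per test run instead of sharing a fixed one like `t1`. A minimal sketch of that idea (the helper name is invented; the actual SQLQueryTestSuite fix may differ):

```python
import uuid

def unique_view_name(prefix="t"):
    """Return a view/table name that will not collide with concurrent tests."""
    return f"{prefix}_{uuid.uuid4().hex}"

# Many concurrently running "tests" each get a distinct name, so no test can
# fail with "already exists" because of a sibling test's table.
names = {unique_view_name("t1") for _ in range(10_000)}
assert len(names) == 10_000
```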
[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression
[ https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48461. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46793 [https://github.com/apache/spark/pull/46793] > Replace NullPointerExceptions with proper error classes in AssertNotNull > expression > --- > > Key: SPARK-48461 > URL: https://issues.apache.org/jira/browse/SPARK-48461 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [Code location > here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]
[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48446: Assignee: Yuchen Liu > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Assignee: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. > The argument should be a list. > The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
[ https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48446. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46797 [https://github.com/apache/spark/pull/46797] > Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax > -- > > Key: SPARK-48446 > URL: https://issues.apache.org/jira/browse/SPARK-48446 > Project: Spark > Issue Type: Documentation > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Yuchen Liu >Assignee: Yuchen Liu >Priority: Minor > Labels: easyfix, pull-request-available > Fix For: 4.0.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > For dropDuplicates, the example on > [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22] > is out of date compared with > [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html]. > The argument should be a list. > The discrepancy is also true for dropDuplicatesWithinWatermark.
[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.
[ https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48475. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46809 [https://github.com/apache/spark/pull/46809] > Optimize _get_jvm_function in PySpark. > -- > > Key: SPARK-48475 > URL: https://issues.apache.org/jira/browse/SPARK-48475 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Chenhao Li >Assignee: Chenhao Li >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48464) Refactor SQLConfSuite and StatisticsSuite
[ https://issues.apache.org/jira/browse/SPARK-48464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48464. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46796 [https://github.com/apache/spark/pull/46796] > Refactor SQLConfSuite and StatisticsSuite > - > > Key: SPARK-48464 > URL: https://issues.apache.org/jira/browse/SPARK-48464 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 4.0.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Resolved] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48454. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46785 [https://github.com/apache/spark/pull/46785] > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class
[ https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48454: Assignee: Ruifeng Zheng > Directly use the parent dataframe class > --- > > Key: SPARK-48454 > URL: https://issues.apache.org/jira/browse/SPARK-48454 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available >
[jira] [Resolved] (SPARK-48442) Add parenthesis to awaitTermination call
[ https://issues.apache.org/jira/browse/SPARK-48442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48442. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46779 [https://github.com/apache/spark/pull/46779] > Add parenthesis to awaitTermination call > > > Key: SPARK-48442 > URL: https://issues.apache.org/jira/browse/SPARK-48442 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.3 >Reporter: Riya Verma >Assignee: Riya Verma >Priority: Trivial > Labels: correctness, pull-request-available, starter > Fix For: 4.0.0 > > > In {{test_stream_reader}} and {{test_stream_writer}} of > {*}test_python_streaming_datasource.py{*}, the call {{q.awaitTermination}} > does not invoke a function call as intended, but instead returns a python > function object. The fix is to change this to {{{}q.awaitTermination(){}}}.
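The bug class is easy to reproduce in plain Python: referencing a method without parentheses evaluates to a bound-method object (which is always truthy and raises nothing) instead of invoking it, so a test written that way silently never waits. A minimal illustration (the class below is a stand-in, not PySpark's StreamingQuery):

```python
class FakeQuery:
    """Stand-in for a streaming query handle."""
    def __init__(self):
        self.waited = False

    def awaitTermination(self):
        self.waited = True
        return "done"

q = FakeQuery()
ref = q.awaitTermination      # the bug: a bound method object; nothing runs
assert callable(ref) and not q.waited

result = q.awaitTermination() # the fix: the call actually executes
assert result == "done" and q.waited
```

Linters such as flake8 or pylint can flag this pattern (a statement consisting of an unused attribute access) before it reaches a test suite.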
[jira] [Created] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect
Hyukjin Kwon created SPARK-48459: Summary: Implement DataFrameQueryContext in Spark Connect Key: SPARK-48459 URL: https://issues.apache.org/jira/browse/SPARK-48459 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Implements the same https://github.com/apache/spark/pull/45377 in Spark Connect
[jira] [Resolved] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48445. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46780 [https://github.com/apache/spark/pull/46780] > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions.
[jira] [Assigned] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject
[ https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48445: Assignee: Kelvin Jiang > Don't inline UDFs with non-cheap children in CollapseProject > > > Key: SPARK-48445 > URL: https://issues.apache.org/jira/browse/SPARK-48445 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Kelvin Jiang >Assignee: Kelvin Jiang >Priority: Major > Labels: pull-request-available > > Because UDFs (and certain other expressions) are considered cheap by > CollapseProject.isCheap, they are inlined and potentially duplicated (which > is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, > if the UDFs contain other non-cheap expressions, those will also be > duplicated and can potentially cause performance regressions.
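Why inlining can duplicate work is clearest with a toy model of project collapsing. If Project(b = a + a) sits on top of Project(a = udf(x)), naive inlining rewrites b into udf(x) + udf(x), evaluating the UDF twice. A pure-Python sketch with an invocation counter (illustrative only, not Catalyst code):

```python
calls = {"udf": 0}

def expensive_udf(x):
    calls["udf"] += 1      # count evaluations to expose the duplication
    return x * 10

def two_projects(x):
    a = expensive_udf(x)   # lower Project: a = udf(x), evaluated once
    return a + a           # upper Project: b = a + a

def collapsed_inlined(x):
    # After collapsing, every reference to `a` is replaced by its definition.
    return expensive_udf(x) + expensive_udf(x)

assert two_projects(3) == collapsed_inlined(3)   # same result...
assert calls["udf"] == 3                         # ...but 1 vs 2 evaluations
```

The Jira fix refines this trade-off: duplicating a plain UDF reference is fine because later rules de-duplicate it, but duplicating a UDF whose children are themselves expensive is not.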
[jira] [Commented] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850230#comment-17850230 ] Hyukjin Kwon commented on SPARK-23015: -- Fixed in https://github.com/apache/spark/pull/43706 > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. > Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved.
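The collision risk from a 15-bit %RANDOM% name space is a birthday problem: with only 32768 possible filenames, even modest parallelism makes a clash likely. A quick sketch computing the exact probability (the job counts in the assertions are illustrative):

```python
def collision_probability(jobs, name_space=32768):
    """Probability that at least two of `jobs` independent uniform draws from
    `name_space` equally likely names coincide (the birthday problem)."""
    p_distinct = 1.0
    for i in range(jobs):
        p_distinct *= (name_space - i) / name_space
    return 1.0 - p_distinct

# Even 100 simultaneous spark-submit jobs collide about 14% of the time.
assert 0.13 < collision_probability(100) < 0.15
# A few hundred jobs make a collision more likely than not.
assert collision_probability(250) > 0.5
```

This is why the resolution moved away from %RANDOM%-derived names: a name drawn from a much larger space (e.g. a UUID) drives this probability to effectively zero.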
[jira] [Reopened] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-23015: -- > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
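The hypothesis above is the birthday problem over %RANDOM%'s 32768 possible values (0-32767). As a rough illustration (not Spark code; the helper name is ours), the collision odds for N simultaneous spark-submit jobs can be computed exactly, and they grow quickly:

```python
# Back-of-the-envelope check: probability that at least two of n_jobs
# concurrent spark-submit invocations draw the same %RANDOM% value.
# %RANDOM% has 32768 possible values, so this is the birthday problem.

def collision_probability(n_jobs: int, space: int = 32768) -> float:
    """Exact probability that at least two of n_jobs draws collide."""
    p_unique = 1.0
    for i in range(n_jobs):
        p_unique *= (space - i) / space  # i-th draw avoids the first i values
    return 1.0 - p_unique

if __name__ == "__main__":
    for n in (2, 10, 50, 100):
        print(f"{n:3d} concurrent jobs -> P(collision) = {collision_probability(n):.4f}")
```

Even a modest number of parallel jobs makes a collision plausible, which is consistent with the reported intermittent failures; any scheme that mixes in a per-process unique value (timestamp, PID) would shrink this dramatically.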
[jira] [Resolved] (SPARK-23015) spark-submit fails when submitting several jobs in parallel
[ https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23015. -- Fix Version/s: 4.0.0 Resolution: Fixed > spark-submit fails when submitting several jobs in parallel > --- > > Key: SPARK-23015 > URL: https://issues.apache.org/jira/browse/SPARK-23015 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1 > Environment: Windows 10 (1709/16299.125) > Spark 2.3.0 > Java 8, Update 151 >Reporter: Hugh Zabriskie >Priority: Major > Labels: bulk-closed, pull-request-available > Fix For: 4.0.0 > > > Spark Submit's launching library prints the command to execute the launcher > (org.apache.spark.launcher.main) to a temporary text file, reads the result > back into a variable, and then executes that command. > {code} > set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt > "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main > %* > %LAUNCHER_OUTPUT% > {code} > [bin/spark-class2.cmd, > L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66] > That temporary text file is given a pseudo-random name by the %RANDOM% env > variable generator, which generates a number between 0 and 32767. > This appears to be the cause of an error occurring when several spark-submit > jobs are launched simultaneously. The following error is returned from stderr: > {quote}The process cannot access the file because it is being used by another > process. The system cannot find the file > USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt. > The process cannot access the file because it is being used by another > process.{quote} > My hypothesis is that %RANDOM% is returning the same value for multiple jobs, > causing the launcher library to attempt to write to the same file from > multiple processes. 
Another mechanism is needed for reliably generating the > names of the temporary files so that the concurrency issue is resolved. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-42965: Assignee: Ruifeng Zheng > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because the metadata > differs (e.g. `\{'__autoGeneratedAlias': 'true'}`), but the fields have the > same name, type and nullability, so the function still works correctly. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
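The workaround described in the ticket amounts to comparing schema fields while ignoring per-field metadata. The sketch below is illustrative only, not the actual pyspark-pandas code: `Field` is a hypothetical stand-in for `pyspark.sql.types.StructField` (which exposes the same attributes), and `fields_equal_ignoring_metadata` is our own name.

```python
# Illustrative sketch: compare fields by name/dataType/nullable only,
# ignoring metadata such as {'__autoGeneratedAlias': 'true'} that
# Spark Connect may attach to a StructField.
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.types.StructField.
Field = namedtuple("Field", ["name", "dataType", "nullable", "metadata"])

def fields_equal_ignoring_metadata(a, b) -> bool:
    return (a.name, a.dataType, a.nullable) == (b.name, b.dataType, b.nullable)

f1 = Field("bool", "long", False, {})
f2 = Field("bool", "long", False, {"__autoGeneratedAlias": "true"})
assert f1 != f2                                # plain equality fails on metadata
assert fields_equal_ignoring_metadata(f1, f2)  # but the fields agree otherwise
```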
[jira] [Resolved] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48322. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42965) metadata mismatch for StructField when running some tests.
[ https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-42965. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46636 [https://github.com/apache/spark/pull/46636] > metadata mismatch for StructField when running some tests. > -- > > Key: SPARK-42965 > URL: https://issues.apache.org/jira/browse/SPARK-42965 > Project: Spark > Issue Type: Improvement > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > For some reason, the metadata of `StructField` is different in a few tests > when using Spark Connect. However, the function works properly. > For example, when running `python/run-tests --testnames > 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops > BinaryOpsParityTests.test_add'` it complains `AssertionError: > ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), > False))], [StructField('bool', LongType(), False)])` because the metadata > differs (e.g. `\{'__autoGeneratedAlias': 'true'}`), but the fields have the > same name, type and nullability, so the function still works correctly. > Therefore, we have temporarily added a branch for Spark Connect in the code > so that we can create InternalFrame properly to provide more pandas APIs in > Spark Connect. If a clear cause is found, we may need to revert it back to > its original state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48322) Drop internal metadata in `DataFrame.schema`
[ https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48322: Assignee: Ruifeng Zheng > Drop internal metadata in `DataFrame.schema` > > > Key: SPARK-48322 > URL: https://issues.apache.org/jira/browse/SPARK-48322 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48438) Directly use the parent column class
[ https://issues.apache.org/jira/browse/SPARK-48438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48438. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46775 [https://github.com/apache/spark/pull/46775] > Directly use the parent column class > > > Key: SPARK-48438 > URL: https://issues.apache.org/jira/browse/SPARK-48438 > Project: Spark > Issue Type: Improvement > Components: Connect, PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48434. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46764 [https://github.com/apache/spark/pull/46764] > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48434) Make printSchema use the cached schema
[ https://issues.apache.org/jira/browse/SPARK-48434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48434: Assignee: Ruifeng Zheng > Make printSchema use the cached schema > -- > > Key: SPARK-48434 > URL: https://issues.apache.org/jira/browse/SPARK-48434 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48432: Assignee: Vladimir Golubev > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48432) Unnecessary Integer unboxing in UnivocityParser
[ https://issues.apache.org/jira/browse/SPARK-48432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48432. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46759 [https://github.com/apache/spark/pull/46759] > Unnecessary Integer unboxing in UnivocityParser > --- > > Key: SPARK-48432 > URL: https://issues.apache.org/jira/browse/SPARK-48432 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > `tokenIndexArr` is created as an array of `java.lang.Integers`. However, it > is used not only for the wrapped java parser, but also during parsing to > identify the correct token index. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48425. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46751 [https://github.com/apache/spark/pull/46751] > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > The issue is in setuptools starting from 69.X.X. > It replaces the dash in the package name with an underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48425: Assignee: Hyukjin Kwon > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > The issue is in setuptools starting from 69.X.X. > It replaces the dash in the package name with an underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
[ https://issues.apache.org/jira/browse/SPARK-48425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48425: - Description: The issue is at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 was: The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 > Replaces pyspark-connect to pyspark_connect for its output name > --- > > Key: SPARK-48425 > URL: https://issues.apache.org/jira/browse/SPARK-48425 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > The issue is at setuptools from 69.X.X. > It replaces dash in package name to underscore > (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) > https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48425) Replaces pyspark-connect to pyspark_connect for its output name
Hyukjin Kwon created SPARK-48425: Summary: Replaces pyspark-connect to pyspark_connect for its output name Key: SPARK-48425 URL: https://issues.apache.org/jira/browse/SPARK-48425 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon The issue is in the regression at setuptools from 69.X.X. It replaces dash in package name to underscore (`pyspark_connect-4.0.0.dev1.tar.gz` vs `pyspark-connect-4.0.0.dev1.tar.gz`) https://github.com/pypa/setuptools/issues/4214 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
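For context on the setuptools behavior referenced above: newer setuptools builds sdist filenames per the PEP 625 rules, which canonicalize the project name and replace runs of dashes, dots and underscores with a single underscore, turning `pyspark-connect` into `pyspark_connect`. A minimal sketch of that normalization (our own helper, not setuptools source):

```python
# Illustration of the PEP 625 sdist filename normalization that maps
# 'pyspark-connect' to 'pyspark_connect' in newer setuptools releases.
import re

def sdist_name(project_name: str) -> str:
    # PEP 503 canonicalization, with '-' replaced by '_' for filenames (PEP 625)
    return re.sub(r"[-_.]+", "_", project_name).lower()

assert sdist_name("pyspark-connect") == "pyspark_connect"
assert sdist_name("pyspark") == "pyspark"
```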
[jira] [Resolved] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48424. -- Fix Version/s: 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46749 [https://github.com/apache/spark/pull/46749] > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2, 4.0.0 > > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48424) Make dev/is-changed.py return true if it fails
[ https://issues.apache.org/jira/browse/SPARK-48424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48424: Assignee: Hyukjin Kwon > Make dev/is-changed.py return true if it fails > - > > Key: SPARK-48424 > URL: https://issues.apache.org/jira/browse/SPARK-48424 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0, 3.5.2 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > e.g., > https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48424) Make dev/is-changed.py return true if it fails
Hyukjin Kwon created SPARK-48424: Summary: Make dev/is-changed.py return true if it fails Key: SPARK-48424 URL: https://issues.apache.org/jira/browse/SPARK-48424 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0, 3.5.2 Reporter: Hyukjin Kwon e.g., https://github.com/apache/spark/actions/runs/9244026522/job/25435224163?pr=46747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48370. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46683 [https://github.com/apache/spark/pull/46683] > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48370: Assignee: Hyukjin Kwon > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48393: Assignee: Ruifeng Zheng > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48393. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46710 [https://github.com/apache/spark/pull/46710] > Move a group of constants to `pyspark.util` > --- > > Key: SPARK-48393 > URL: https://issues.apache.org/jira/browse/SPARK-48393 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48379: -- Assignee: (was: Stefan Kandic) Reverted in https://github.com/apache/spark/commit/9fd85d9acc5acf455d0ad910ef2848695576242b > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48379: - Fix Version/s: (was: 4.0.0) > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48389. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46703 [https://github.com/apache/spark/pull/46703] > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > After https://github.com/apache/spark/pull/46689, we don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48389: Assignee: Hyukjin Kwon > Remove obsolete workflow cancel_duplicate_workflow_runs > --- > > Key: SPARK-48389 > URL: https://issues.apache.org/jira/browse/SPARK-48389 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > After https://github.com/apache/spark/pull/46689, we don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
Hyukjin Kwon created SPARK-48389: Summary: Remove obsolete workflow cancel_duplicate_workflow_runs Key: SPARK-48389 URL: https://issues.apache.org/jira/browse/SPARK-48389 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon After https://github.com/apache/spark/pull/46689, we don't need this anymore -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48379: Assignee: Stefan Kandic > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Assignee: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48379. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46689 [https://github.com/apache/spark/pull/46689] > Cancel build during a PR when a new commit is pushed > > > Key: SPARK-48379 > URL: https://issues.apache.org/jira/browse/SPARK-48379 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Creating a new commit on a branch should cancel the build of previous commits > for the same branch. > Exceptions are master and branch-* branches where we still want to have > concurrent builds. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48341) Allow Spark Connect plugins to use QueryTest in their tests
[ https://issues.apache.org/jira/browse/SPARK-48341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48341. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46667 [https://github.com/apache/spark/pull/46667] > Allow Spark Connect plugins to use QueryTest in their tests > --- > > Key: SPARK-48341 > URL: https://issues.apache.org/jira/browse/SPARK-48341 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Tom van Bussel >Assignee: Tom van Bussel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
Hyukjin Kwon created SPARK-48370: Summary: Checkpoint and localCheckpoint in Scala Spark Connect client Key: SPARK-48370 URL: https://issues.apache.org/jira/browse/SPARK-48370 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.0.0 Reporter: Hyukjin Kwon SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48370: - Issue Type: Improvement (was: Bug) > Checkpoint and localCheckpoint in Scala Spark Connect client > > > Key: SPARK-48370 > URL: https://issues.apache.org/jira/browse/SPARK-48370 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > SPARK-48258 implemented checkpoint and localcheckpoint in Python Spark > Connect client. We should do it in Scala too. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48367.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46679
[https://github.com/apache/spark/pull/46679]

> Fix lint-scala for scalafmt to detect properly
> ----------------------------------------------
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code}
> ./build/mvn \
>   -Pscala-2.13 \
>   scalafmt:format \
>   -Dscalafmt.skip=false \
>   -Dscalafmt.validateOnly=true \
>   -Dscalafmt.changedOnly=false \
>   -pl connector/connect/common \
>   -pl connector/connect/server \
>   -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO]  - Requires formatting: ConnectProtoUtils.scala
> [INFO]  - Formatted: UdfUtils.scala
> [INFO]  - Formatted: DataTypeProtoConverter.scala
> [INFO]  - Formatted: ConnectCommon.scala
> [INFO]  - Formatted: ProtoUtils.scala
> [INFO]  - Formatted: Abbreviator.scala
> [INFO]  - Formatted: ProtoDataTypes.scala
> [INFO]  - Formatted: LiteralValueProtoConverter.scala
> [INFO]  - Formatted: InvalidPlanInput.scala
> [INFO]  - Formatted: ForeachWriterPacket.scala
> [INFO]  - Formatted: StreamingListenerPacket.scala
> [INFO]  - Formatted: StorageLevelProtoConverter.scala
> [INFO]  - Formatted: UdfPacket.scala
> [INFO]  - Formatted: ClassFinder.scala
> [INFO]  - Formatted: SparkConnectClient.scala
> [INFO]  - Formatted: GrpcRetryHandler.scala
> [INFO]  - Formatted: GrpcExceptionConverter.scala
> [INFO]  - Formatted: ArrowEncoderUtils.scala
> [INFO]  - Formatted: ScalaCollectionUtils.scala
> [INFO]  - Formatted: ArrowDeserializer.scala
> [INFO]  - Formatted: ArrowVectorReader.scala
> [INFO]  - Formatted: ArrowSerializer.scala
> [INFO]  - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO]  - Formatted: RetryPolicy.scala
> [INFO]  - Formatted: SparkConnectStubState.scala
> [INFO]  - Formatted: ArtifactManager.scala
> [INFO]  - Formatted: SparkResult.scala
> [INFO]  - Formatted: RetriesExceeded.scala
> [INFO]  - Formatted: CloseableIterator.scala
> [INFO]  - Formatted: package.scala
> [INFO]  - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO]  - Formatted: ResponseValidator.scala
> [INFO]  - Formatted: SparkConnectClientParser.scala
> [INFO]  - Formatted: CustomSparkConnectStub.scala
> [INFO]  - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO]  - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed after a scalafmt version upgrade.
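The fix requires the lint check to recognize the new output format, where offending files are reported as `Requires formatting: <file>` rather than only via the summary line. A minimal sketch of such a detection step (hypothetical helper; not the actual `dev/lint-scala` implementation):

```python
import re

def unformatted_files(mvn_output: str) -> list[str]:
    """Collect files that scalafmt reports as needing formatting.

    Matches the newer scalafmt-maven-plugin output shown above, where
    offending files appear as '[INFO]  - Requires formatting: <file>'.
    """
    pattern = re.compile(r"Requires formatting:\s+(\S+)")
    return [m.group(1) for m in pattern.finditer(mvn_output)]

log = """\
[INFO] Scalafmt results: 1 of 36 were unformatted
[INFO] Details:
[INFO]  - Requires formatting: ConnectProtoUtils.scala
[INFO]  - Formatted: UdfUtils.scala
"""
print(unformatted_files(log))  # ['ConnectProtoUtils.scala']
```

A lint script would fail the build whenever this list is non-empty, which is robust to the summary line's wording changing between plugin versions.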
[jira] [Assigned] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48367:
------------------------------------
Assignee: Hyukjin Kwon

> Fix lint-scala for scalafmt to detect properly
> ----------------------------------------------
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> ./build/mvn \
>   -Pscala-2.13 \
>   scalafmt:format \
>   -Dscalafmt.skip=false \
>   -Dscalafmt.validateOnly=true \
>   -Dscalafmt.changedOnly=false \
>   -pl connector/connect/common \
>   -pl connector/connect/server \
>   -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO]  - Requires formatting: ConnectProtoUtils.scala
> [INFO]  - Formatted: UdfUtils.scala
> [INFO]  - Formatted: DataTypeProtoConverter.scala
> [INFO]  - Formatted: ConnectCommon.scala
> [INFO]  - Formatted: ProtoUtils.scala
> [INFO]  - Formatted: Abbreviator.scala
> [INFO]  - Formatted: ProtoDataTypes.scala
> [INFO]  - Formatted: LiteralValueProtoConverter.scala
> [INFO]  - Formatted: InvalidPlanInput.scala
> [INFO]  - Formatted: ForeachWriterPacket.scala
> [INFO]  - Formatted: StreamingListenerPacket.scala
> [INFO]  - Formatted: StorageLevelProtoConverter.scala
> [INFO]  - Formatted: UdfPacket.scala
> [INFO]  - Formatted: ClassFinder.scala
> [INFO]  - Formatted: SparkConnectClient.scala
> [INFO]  - Formatted: GrpcRetryHandler.scala
> [INFO]  - Formatted: GrpcExceptionConverter.scala
> [INFO]  - Formatted: ArrowEncoderUtils.scala
> [INFO]  - Formatted: ScalaCollectionUtils.scala
> [INFO]  - Formatted: ArrowDeserializer.scala
> [INFO]  - Formatted: ArrowVectorReader.scala
> [INFO]  - Formatted: ArrowSerializer.scala
> [INFO]  - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO]  - Formatted: RetryPolicy.scala
> [INFO]  - Formatted: SparkConnectStubState.scala
> [INFO]  - Formatted: ArtifactManager.scala
> [INFO]  - Formatted: SparkResult.scala
> [INFO]  - Formatted: RetriesExceeded.scala
> [INFO]  - Formatted: CloseableIterator.scala
> [INFO]  - Formatted: package.scala
> [INFO]  - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO]  - Formatted: ResponseValidator.scala
> [INFO]  - Formatted: SparkConnectClientParser.scala
> [INFO]  - Formatted: CustomSparkConnectStub.scala
> [INFO]  - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO]  - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed after a scalafmt version upgrade.
[jira] [Created] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
Hyukjin Kwon created SPARK-48367:
---------------------------------
Summary: Fix lint-scala for scalafmt to detect properly
Key: SPARK-48367
URL: https://issues.apache.org/jira/browse/SPARK-48367
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

{code}
./build/mvn \
  -Pscala-2.13 \
  scalafmt:format \
  -Dscalafmt.skip=false \
  -Dscalafmt.validateOnly=true \
  -Dscalafmt.changedOnly=false \
  -pl connector/connect/common \
  -pl connector/connect/server \
  -pl connector/connect/client/jvm
{code}

fails as below:

{code}
[INFO] Scalafmt results: 1 of 36 were unformatted
[INFO] Details:
[INFO]  - Requires formatting: ConnectProtoUtils.scala
[INFO]  - Formatted: UdfUtils.scala
[INFO]  - Formatted: DataTypeProtoConverter.scala
[INFO]  - Formatted: ConnectCommon.scala
[INFO]  - Formatted: ProtoUtils.scala
[INFO]  - Formatted: Abbreviator.scala
[INFO]  - Formatted: ProtoDataTypes.scala
[INFO]  - Formatted: LiteralValueProtoConverter.scala
[INFO]  - Formatted: InvalidPlanInput.scala
[INFO]  - Formatted: ForeachWriterPacket.scala
[INFO]  - Formatted: StreamingListenerPacket.scala
[INFO]  - Formatted: StorageLevelProtoConverter.scala
[INFO]  - Formatted: UdfPacket.scala
[INFO]  - Formatted: ClassFinder.scala
[INFO]  - Formatted: SparkConnectClient.scala
[INFO]  - Formatted: GrpcRetryHandler.scala
[INFO]  - Formatted: GrpcExceptionConverter.scala
[INFO]  - Formatted: ArrowEncoderUtils.scala
[INFO]  - Formatted: ScalaCollectionUtils.scala
[INFO]  - Formatted: ArrowDeserializer.scala
[INFO]  - Formatted: ArrowVectorReader.scala
[INFO]  - Formatted: ArrowSerializer.scala
[INFO]  - Formatted: ConcatenatingArrowStreamReader.scala
[INFO]  - Formatted: RetryPolicy.scala
[INFO]  - Formatted: SparkConnectStubState.scala
[INFO]  - Formatted: ArtifactManager.scala
[INFO]  - Formatted: SparkResult.scala
[INFO]  - Formatted: RetriesExceeded.scala
[INFO]  - Formatted: CloseableIterator.scala
[INFO]  - Formatted: package.scala
[INFO]  - Formatted: ExecutePlanResponseReattachableIterator.scala
[INFO]  - Formatted: ResponseValidator.scala
[INFO]  - Formatted: SparkConnectClientParser.scala
[INFO]  - Formatted: CustomSparkConnectStub.scala
[INFO]  - Formatted: CustomSparkConnectBlockingStub.scala
[INFO]  - Formatted: TestUDFs.scala
{code}

This is because the output format changed after a scalafmt version upgrade.
[jira] [Resolved] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48363.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46674
[https://github.com/apache/spark/pull/46674]

> Cleanup some redundant codes in `from_xml`
> ------------------------------------------
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48363:
------------------------------------
Assignee: BingKun Pan

> Cleanup some redundant codes in `from_xml`
> ------------------------------------------
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48340.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 4
[https://github.com/apache/spark/pull/4]

> Support TimestampNTZ infer schema miss prefer_timestamp_ntz
> ------------------------------------------------------------
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0, 3.5.1
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2024-05-20-18-38-39-769.png
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
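The reported bug is that PySpark's schema inference ignored the prefer-TimestampNTZ setting when mapping Python datetimes to Spark types. A toy, PySpark-free sketch of the intended rule (the function name and string results are illustrative, not Spark's API):

```python
from datetime import datetime, timezone

def infer_timestamp_type(value: datetime, prefer_ntz: bool) -> str:
    """Sketch of the intended inference rule: a tz-naive datetime maps to
    TimestampNTZType only when the NTZ preference is enabled, while a
    tz-aware datetime always maps to TimestampType."""
    if value.tzinfo is None and prefer_ntz:
        return "TimestampNTZType"
    return "TimestampType"

naive = datetime(2024, 5, 20, 18, 38)
aware = datetime(2024, 5, 20, 18, 38, tzinfo=timezone.utc)
print(infer_timestamp_type(naive, prefer_ntz=True))   # TimestampNTZType
print(infer_timestamp_type(naive, prefer_ntz=False))  # TimestampType
print(infer_timestamp_type(aware, prefer_ntz=True))   # TimestampType
```

The bug amounted to the first branch never firing during inference, so tz-naive values were typed as TimestampType even with the preference enabled.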
[jira] [Assigned] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48340:
------------------------------------
Assignee: angerszhu

> Support TimestampNTZ infer schema miss prefer_timestamp_ntz
> ------------------------------------------------------------
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0, 3.5.1
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Labels: pull-request-available
>
> Attachments: image-2024-05-20-18-38-39-769.png
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Resolved] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
[ https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48258.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46570
[https://github.com/apache/spark/pull/46570]

> Implement DataFrame.checkpoint and DataFrame.localCheckpoint
> ------------------------------------------------------------
>
> Key: SPARK-48258
> URL: https://issues.apache.org/jira/browse/SPARK-48258
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature parity.
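For context on what these APIs do: checkpointing materializes a DataFrame's current rows and discards its logical plan, so downstream operations no longer depend on the full upstream lineage. The toy class below illustrates that idea only; it is not Spark's implementation, and all names are invented for the sketch:

```python
class ToyFrame:
    """Minimal stand-in for a lazily derived frame that tracks lineage."""

    def __init__(self, rows, parent=None):
        self.rows = rows
        self.parent = parent  # upstream frame this one was derived from

    def map_rows(self, fn):
        # Each transformation produces a child frame linked to its parent.
        return ToyFrame([fn(r) for r in self.rows], parent=self)

    def lineage_depth(self):
        return 0 if self.parent is None else 1 + self.parent.lineage_depth()

    def local_checkpoint(self):
        # Materialize the current rows and sever the link to the parent:
        # the essence of what (local) checkpointing does to lineage.
        return ToyFrame(list(self.rows))

base = ToyFrame([1, 2, 3])
derived = base.map_rows(lambda x: x * 2).map_rows(lambda x: x + 1)
print(derived.lineage_depth())                     # 2
print(derived.local_checkpoint().lineage_depth())  # 0
```

In real Spark, `checkpoint()` writes to reliable storage while `localCheckpoint()` uses executor-local storage; this ticket adds both to the Spark Connect Python client for parity with classic PySpark.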