[jira] [Resolved] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48370.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46683
[https://github.com/apache/spark/pull/46683]

> Checkpoint and localCheckpoint in Scala Spark Connect client
> 
>
> Key: SPARK-48370
> URL: https://issues.apache.org/jira/browse/SPARK-48370
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark 
> Connect client. We should do it in Scala too.
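
For reference, a minimal sketch of the Python API added by SPARK-48258, which the
Scala Spark Connect client is expected to mirror (the eager flag and variable
names here are illustrative):

{code:python}
df = spark.range(10)

# localCheckpoint() truncates the lineage and keeps the checkpointed data
# on the executors (faster, but not fault-tolerant across executor loss).
local = df.localCheckpoint(eager=True)

# checkpoint() persists to the reliable checkpoint directory instead.
reliable = df.checkpoint(eager=True)
{code}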



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48370:


Assignee: Hyukjin Kwon

> Checkpoint and localCheckpoint in Scala Spark Connect client
> 
>
> Key: SPARK-48370
> URL: https://issues.apache.org/jira/browse/SPARK-48370
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark 
> Connect client. We should do it in Scala too.






[jira] [Assigned] (SPARK-48393) Move a group of constants to `pyspark.util`

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48393:


Assignee: Ruifeng Zheng

> Move a group of constants to `pyspark.util`
> ---
>
> Key: SPARK-48393
> URL: https://issues.apache.org/jira/browse/SPARK-48393
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48393) Move a group of constants to `pyspark.util`

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48393.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46710
[https://github.com/apache/spark/pull/46710]

> Move a group of constants to `pyspark.util`
> ---
>
> Key: SPARK-48393
> URL: https://issues.apache.org/jira/browse/SPARK-48393
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Reopened] (SPARK-48379) Cancel build during a PR when a new commit is pushed

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-48379:
--
  Assignee: (was: Stefan Kandic)

Reverted in 
https://github.com/apache/spark/commit/9fd85d9acc5acf455d0ad910ef2848695576242b

> Cancel build during a PR when a new commit is pushed
> 
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Creating a new commit on a branch should cancel the build of previous commits 
> for the same branch.
> Exceptions are master and branch-* branches where we still want to have 
> concurrent builds.






[jira] [Updated] (SPARK-48379) Cancel build during a PR when a new commit is pushed

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48379:
-
Fix Version/s: (was: 4.0.0)

> Cancel build during a PR when a new commit is pushed
> 
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
>
> Creating a new commit on a branch should cancel the build of previous commits 
> for the same branch.
> Exceptions are master and branch-* branches where we still want to have 
> concurrent builds.






[jira] [Resolved] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48389.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46703
[https://github.com/apache/spark/pull/46703]

> Remove obsolete workflow cancel_duplicate_workflow_runs
> ---
>
> Key: SPARK-48389
> URL: https://issues.apache.org/jira/browse/SPARK-48389
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> After https://github.com/apache/spark/pull/46689, we don't need this workflow anymore.






[jira] [Assigned] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48389:


Assignee: Hyukjin Kwon

> Remove obsolete workflow cancel_duplicate_workflow_runs
> ---
>
> Key: SPARK-48389
> URL: https://issues.apache.org/jira/browse/SPARK-48389
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> After https://github.com/apache/spark/pull/46689, we don't need this workflow anymore.






[jira] [Created] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs

2024-05-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48389:


 Summary: Remove obsolete workflow cancel_duplicate_workflow_runs
 Key: SPARK-48389
 URL: https://issues.apache.org/jira/browse/SPARK-48389
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


After https://github.com/apache/spark/pull/46689, we don't need this workflow anymore.






[jira] [Assigned] (SPARK-48379) Cancel build during a PR when a new commit is pushed

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48379:


Assignee: Stefan Kandic

> Cancel build during a PR when a new commit is pushed
> 
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Assignee: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Creating a new commit on a branch should cancel the build of previous commits 
> for the same branch.
> Exceptions are master and branch-* branches where we still want to have 
> concurrent builds.






[jira] [Resolved] (SPARK-48379) Cancel build during a PR when a new commit is pushed

2024-05-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48379.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46689
[https://github.com/apache/spark/pull/46689]

> Cancel build during a PR when a new commit is pushed
> 
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Stefan Kandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Creating a new commit on a branch should cancel the build of previous commits 
> for the same branch.
> Exceptions are master and branch-* branches where we still want to have 
> concurrent builds.






[jira] [Resolved] (SPARK-48341) Allow Spark Connect plugins to use QueryTest in their tests

2024-05-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48341.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46667
[https://github.com/apache/spark/pull/46667]

> Allow Spark Connect plugins to use QueryTest in their tests
> ---
>
> Key: SPARK-48341
> URL: https://issues.apache.org/jira/browse/SPARK-48341
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client

2024-05-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48370:


 Summary: Checkpoint and localCheckpoint in Scala Spark Connect 
client
 Key: SPARK-48370
 URL: https://issues.apache.org/jira/browse/SPARK-48370
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark Connect 
client. We should do it in Scala too.






[jira] [Updated] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client

2024-05-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48370:
-
Issue Type: Improvement  (was: Bug)

> Checkpoint and localCheckpoint in Scala Spark Connect client
> 
>
> Key: SPARK-48370
> URL: https://issues.apache.org/jira/browse/SPARK-48370
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark 
> Connect client. We should do it in Scala too.






[jira] [Resolved] (SPARK-48367) Fix lint-scala for scalafmt to detect properly

2024-05-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48367.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46679
[https://github.com/apache/spark/pull/46679]

> Fix lint-scala for scalafmt to detect properly
> --
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> ./build/mvn \
> -Pscala-2.13 \
> scalafmt:format \
> -Dscalafmt.skip=false \
> -Dscalafmt.validateOnly=true \
> -Dscalafmt.changedOnly=false \
> -pl connector/connect/common \
> -pl connector/connect/server \
> -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO] - Requires formatting: ConnectProtoUtils.scala
> [INFO] - Formatted: UdfUtils.scala
> [INFO] - Formatted: DataTypeProtoConverter.scala
> [INFO] - Formatted: ConnectCommon.scala
> [INFO] - Formatted: ProtoUtils.scala
> [INFO] - Formatted: Abbreviator.scala
> [INFO] - Formatted: ProtoDataTypes.scala
> [INFO] - Formatted: LiteralValueProtoConverter.scala
> [INFO] - Formatted: InvalidPlanInput.scala
> [INFO] - Formatted: ForeachWriterPacket.scala
> [INFO] - Formatted: StreamingListenerPacket.scala
> [INFO] - Formatted: StorageLevelProtoConverter.scala
> [INFO] - Formatted: UdfPacket.scala
> [INFO] - Formatted: ClassFinder.scala
> [INFO] - Formatted: SparkConnectClient.scala
> [INFO] - Formatted: GrpcRetryHandler.scala
> [INFO] - Formatted: GrpcExceptionConverter.scala
> [INFO] - Formatted: ArrowEncoderUtils.scala
> [INFO] - Formatted: ScalaCollectionUtils.scala
> [INFO] - Formatted: ArrowDeserializer.scala
> [INFO] - Formatted: ArrowVectorReader.scala
> [INFO] - Formatted: ArrowSerializer.scala
> [INFO] - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO] - Formatted: RetryPolicy.scala
> [INFO] - Formatted: SparkConnectStubState.scala
> [INFO] - Formatted: ArtifactManager.scala
> [INFO] - Formatted: SparkResult.scala
> [INFO] - Formatted: RetriesExceeded.scala
> [INFO] - Formatted: CloseableIterator.scala
> [INFO] - Formatted: package.scala
> [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO] - Formatted: ResponseValidator.scala
> [INFO] - Formatted: SparkConnectClientParser.scala
> [INFO] - Formatted: CustomSparkConnectStub.scala
> [INFO] - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO] - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed due to a scalafmt version upgrade.






[jira] [Assigned] (SPARK-48367) Fix lint-scala for scalafmt to detect properly

2024-05-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48367:


Assignee: Hyukjin Kwon

> Fix lint-scala for scalafmt to detect properly
> --
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> ./build/mvn \
> -Pscala-2.13 \
> scalafmt:format \
> -Dscalafmt.skip=false \
> -Dscalafmt.validateOnly=true \
> -Dscalafmt.changedOnly=false \
> -pl connector/connect/common \
> -pl connector/connect/server \
> -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO] - Requires formatting: ConnectProtoUtils.scala
> [INFO] - Formatted: UdfUtils.scala
> [INFO] - Formatted: DataTypeProtoConverter.scala
> [INFO] - Formatted: ConnectCommon.scala
> [INFO] - Formatted: ProtoUtils.scala
> [INFO] - Formatted: Abbreviator.scala
> [INFO] - Formatted: ProtoDataTypes.scala
> [INFO] - Formatted: LiteralValueProtoConverter.scala
> [INFO] - Formatted: InvalidPlanInput.scala
> [INFO] - Formatted: ForeachWriterPacket.scala
> [INFO] - Formatted: StreamingListenerPacket.scala
> [INFO] - Formatted: StorageLevelProtoConverter.scala
> [INFO] - Formatted: UdfPacket.scala
> [INFO] - Formatted: ClassFinder.scala
> [INFO] - Formatted: SparkConnectClient.scala
> [INFO] - Formatted: GrpcRetryHandler.scala
> [INFO] - Formatted: GrpcExceptionConverter.scala
> [INFO] - Formatted: ArrowEncoderUtils.scala
> [INFO] - Formatted: ScalaCollectionUtils.scala
> [INFO] - Formatted: ArrowDeserializer.scala
> [INFO] - Formatted: ArrowVectorReader.scala
> [INFO] - Formatted: ArrowSerializer.scala
> [INFO] - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO] - Formatted: RetryPolicy.scala
> [INFO] - Formatted: SparkConnectStubState.scala
> [INFO] - Formatted: ArtifactManager.scala
> [INFO] - Formatted: SparkResult.scala
> [INFO] - Formatted: RetriesExceeded.scala
> [INFO] - Formatted: CloseableIterator.scala
> [INFO] - Formatted: package.scala
> [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO] - Formatted: ResponseValidator.scala
> [INFO] - Formatted: SparkConnectClientParser.scala
> [INFO] - Formatted: CustomSparkConnectStub.scala
> [INFO] - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO] - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed due to a scalafmt version upgrade.






[jira] [Created] (SPARK-48367) Fix lint-scala for scalafmt to detect properly

2024-05-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48367:


 Summary: Fix lint-scala for scalafmt to detect properly
 Key: SPARK-48367
 URL: https://issues.apache.org/jira/browse/SPARK-48367
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
./build/mvn \
-Pscala-2.13 \
scalafmt:format \
-Dscalafmt.skip=false \
-Dscalafmt.validateOnly=true \
-Dscalafmt.changedOnly=false \
-pl connector/connect/common \
-pl connector/connect/server \
-pl connector/connect/client/jvm
{code}

fails as below:

{code}
[INFO] Scalafmt results: 1 of 36 were unformatted
[INFO] Details:
[INFO] - Requires formatting: ConnectProtoUtils.scala
[INFO] - Formatted: UdfUtils.scala
[INFO] - Formatted: DataTypeProtoConverter.scala
[INFO] - Formatted: ConnectCommon.scala
[INFO] - Formatted: ProtoUtils.scala
[INFO] - Formatted: Abbreviator.scala
[INFO] - Formatted: ProtoDataTypes.scala
[INFO] - Formatted: LiteralValueProtoConverter.scala
[INFO] - Formatted: InvalidPlanInput.scala
[INFO] - Formatted: ForeachWriterPacket.scala
[INFO] - Formatted: StreamingListenerPacket.scala
[INFO] - Formatted: StorageLevelProtoConverter.scala
[INFO] - Formatted: UdfPacket.scala
[INFO] - Formatted: ClassFinder.scala
[INFO] - Formatted: SparkConnectClient.scala
[INFO] - Formatted: GrpcRetryHandler.scala
[INFO] - Formatted: GrpcExceptionConverter.scala
[INFO] - Formatted: ArrowEncoderUtils.scala
[INFO] - Formatted: ScalaCollectionUtils.scala
[INFO] - Formatted: ArrowDeserializer.scala
[INFO] - Formatted: ArrowVectorReader.scala
[INFO] - Formatted: ArrowSerializer.scala
[INFO] - Formatted: ConcatenatingArrowStreamReader.scala
[INFO] - Formatted: RetryPolicy.scala
[INFO] - Formatted: SparkConnectStubState.scala
[INFO] - Formatted: ArtifactManager.scala
[INFO] - Formatted: SparkResult.scala
[INFO] - Formatted: RetriesExceeded.scala
[INFO] - Formatted: CloseableIterator.scala
[INFO] - Formatted: package.scala
[INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala
[INFO] - Formatted: ResponseValidator.scala
[INFO] - Formatted: SparkConnectClientParser.scala
[INFO] - Formatted: CustomSparkConnectStub.scala
[INFO] - Formatted: CustomSparkConnectBlockingStub.scala
[INFO] - Formatted: TestUDFs.scala
{code}

This is because the output format changed due to a scalafmt version upgrade.






[jira] [Resolved] (SPARK-48363) Cleanup some redundant codes in `from_xml`

2024-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48363.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46674
[https://github.com/apache/spark/pull/46674]

> Cleanup some redundant codes in `from_xml`
> --
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48363) Cleanup some redundant codes in `from_xml`

2024-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48363:


Assignee: BingKun Pan

> Cleanup some redundant codes in `from_xml`
> --
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz

2024-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48340.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 4
[https://github.com/apache/spark/pull/4]

> Support TimestampNTZ  infer schema miss prefer_timestamp_ntz
> 
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2024-05-20-18-38-39-769.png
>
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
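
The description above is only a screenshot. A plausible minimal repro, assuming
the bug is that createDataFrame schema inference does not propagate the
session's TIMESTAMP_NTZ preference (the repro itself is an assumption; only the
config name comes from the Spark docs):

{code:python}
from datetime import datetime

# Prefer TIMESTAMP_NTZ when timestamp columns are inferred.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")

df = spark.createDataFrame([(datetime(2024, 5, 20, 18, 38),)], ["ts"])
df.printSchema()
# Expected: ts: timestamp_ntz
# Buggy:    ts: timestamp (prefer_timestamp_ntz not passed through inference)
{code}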






[jira] [Assigned] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz

2024-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48340:


Assignee: angerszhu

> Support TimestampNTZ  infer schema miss prefer_timestamp_ntz
> 
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-05-20-18-38-39-769.png
>
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!






[jira] [Resolved] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint

2024-05-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48258.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46570
[https://github.com/apache/spark/pull/46570]

> Implement DataFrame.checkpoint and DataFrame.localCheckpoint
> 
>
> Key: SPARK-48258
> URL: https://issues.apache.org/jira/browse/SPARK-48258
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature 
> parity.






[jira] [Resolved] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`

2024-05-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48333.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46654
[https://github.com/apache/spark/pull/46654]

> Test `test_sorting_functions_with_column` with same `Column`
> 
>
> Key: SPARK-48333
> URL: https://issues.apache.org/jira/browse/SPARK-48333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`

2024-05-19 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48333:


Assignee: Ruifeng Zheng

> Test `test_sorting_functions_with_column` with same `Column`
> 
>
> Key: SPARK-48333
> URL: https://issues.apache.org/jira/browse/SPARK-48333
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic

2024-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48319:


Assignee: Ruifeng Zheng

> Test `assert_true` and `raise_error` with the same error class as Spark 
> Classic
> ---
>
> Key: SPARK-48319
> URL: https://issues.apache.org/jira/browse/SPARK-48319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic

2024-05-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48319.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46633
[https://github.com/apache/spark/pull/46633]

> Test `assert_true` and `raise_error` with the same error class as Spark 
> Classic
> ---
>
> Key: SPARK-48319
> URL: https://issues.apache.org/jira/browse/SPARK-48319
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48317:


Assignee: Hyukjin Kwon

> Enable test_udtf_with_analyze_using_archive and 
> test_udtf_with_analyze_using_file
> -
>
> Key: SPARK-48317
> URL: https://issues.apache.org/jira/browse/SPARK-48317
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48317.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46632
[https://github.com/apache/spark/pull/46632]

> Enable test_udtf_with_analyze_using_archive and 
> test_udtf_with_analyze_using_file
> -
>
> Key: SPARK-48317
> URL: https://issues.apache.org/jira/browse/SPARK-48317
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48316:


Assignee: Hyukjin Kwon

> Fix comments for SparkFrameMethodsParityTests.test_coalesce and 
> test_repartition
> 
>
> Key: SPARK-48316
> URL: https://issues.apache.org/jira/browse/SPARK-48316
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48316.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46629
[https://github.com/apache/spark/pull/46629]

> Fix comments for SparkFrameMethodsParityTests.test_coalesce and 
> test_repartition
> 
>
> Key: SPARK-48316
> URL: https://issues.apache.org/jira/browse/SPARK-48316
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48316:
-
Summary: Fix comments for SparkFrameMethodsParityTests.test_coalesce and 
test_repartition  (was: Enable SparkFrameMethodsParityTests.test_coalesce and 
test_repartition)

> Fix comments for SparkFrameMethodsParityTests.test_coalesce and 
> test_repartition
> 
>
> Key: SPARK-48316
> URL: https://issues.apache.org/jira/browse/SPARK-48316
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file

2024-05-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48317:


 Summary: Enable test_udtf_with_analyze_using_archive and 
test_udtf_with_analyze_using_file
 Key: SPARK-48317
 URL: https://issues.apache.org/jira/browse/SPARK-48317
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48238:
-
Parent: (was: SPARK-47970)
Issue Type: Bug  (was: Sub-task)

> Spark fail to start due to class 
> o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
> ---
>
> Key: SPARK-48238
> URL: https://issues.apache.org/jira/browse/SPARK-48238
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Blocker
>  Labels: pull-request-available
>
> I tested the latest master branch; it failed to start in YARN mode
> {code:java}
> dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code}
>  
> {code:java}
> $ bin/spark-sql --master yarn
> WARNING: Using incubator modules: jdk.incubator.vector
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext.
> org.sparkproject.jetty.util.MultiException: Multiple exceptions
>     at 
> org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:935) 
> ~[scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.SparkContext.(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118)
>  ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112)
>  [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64)
>  

[jira] [Created] (SPARK-48316) Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition

2024-05-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48316:


 Summary: Enable SparkFrameMethodsParityTests.test_coalesce and 
test_repartition
 Key: SPARK-48316
 URL: https://issues.apache.org/jira/browse/SPARK-48316
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-48310) Cached Properties Should return copies instead of values

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48310.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46621
[https://github.com/apache/spark/pull/46621]

> Cached Properties Should return copies instead of values
> 
>
> Key: SPARK-48310
> URL: https://issues.apache.org/jira/browse/SPARK-48310
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When returning cached properties for the schema and columns, a user might 
> inadvertently modify the cached values.
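
A minimal sketch of the hazard (illustrative; not the exact repro from the
report):

{code:python}
from pyspark.sql.types import StringType, StructField

df = spark.range(1)
s1 = df.schema                              # Spark Connect caches this client-side
s1.add(StructField("bogus", StringType()))  # mutates the returned object in place
s2 = df.schema
# Before the fix, s2 could include "bogus" because the cached object was
# shared with the caller; returning a copy keeps the cache consistent.
{code}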






[jira] [Resolved] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48268.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46571
[https://github.com/apache/spark/pull/46571]

> Add a configuration for SparkContext.setCheckpointDir
> -
>
> Key: SPARK-48268
> URL: https://issues.apache.org/jira/browse/SPARK-48268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> It would be great to have one.
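
A sketch of the resulting usage, assuming the configuration introduced by the
linked PR is named spark.checkpoint.dir (the path is illustrative):

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Declarative alternative to calling SparkContext.setCheckpointDir().
    .config("spark.checkpoint.dir", "/tmp/spark-checkpoints")
    .getOrCreate()
)

spark.range(10).checkpoint()  # writes to the configured directory
{code}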






[jira] [Assigned] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir

2024-05-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48268:


Assignee: Hyukjin Kwon

> Add a configuration for SparkContext.setCheckpointDir
> -
>
> Key: SPARK-48268
> URL: https://issues.apache.org/jira/browse/SPARK-48268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> It would be great to have one.






[jira] [Resolved] (SPARK-48295) Turn on compute.ops_on_diff_frames by default

2024-05-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48295.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46602
[https://github.com/apache/spark/pull/46602]

> Turn on compute.ops_on_diff_frames by default
> -
>
> Key: SPARK-48295
> URL: https://issues.apache.org/jira/browse/SPARK-48295
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
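
For context, a minimal sketch of what this option gates (values are
illustrative):

{code:python}
import pyspark.pandas as ps

psdf1 = ps.DataFrame({"a": [1, 2, 3]})
psdf2 = ps.DataFrame({"a": [10, 20, 30]})

# Previously this raised an error unless users opted in with
# ps.set_option("compute.ops_on_diff_frames", True); with the new default
# the cross-frame operation works out of the box.
combined = psdf1["a"] + psdf2["a"]
{code}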







[jira] [Assigned] (SPARK-48295) Turn on compute.ops_on_diff_frames by default

2024-05-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48295:


Assignee: Ruifeng Zheng

> Turn on compute.ops_on_diff_frames by default
> -
>
> Key: SPARK-48295
> URL: https://issues.apache.org/jira/browse/SPARK-48295
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema

2024-05-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48100:


Assignee: Shujing Yang

> [SQL][XML] Fix issues in skipping nested structure fields not selected in 
> schema
> 
>
> Key: SPARK-48100
> URL: https://issues.apache.org/jira/browse/SPARK-48100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
>
> Previously, the XML parser couldn't effectively skip nested structure fields 
> when they were not selected in the schema. For instance, in the example 
> below, `df.select("struct2").collect()` returned `Seq(null)` because 
> `struct1` wasn't effectively skipped. This PR fixes the issue.
> {code:java}
> <ROW>
>   <struct1>
>     <field>1</field>
>   </struct1>
>   <struct2>
>     <field>2</field>
>   </struct2>
> </ROW>
> {code}
>  
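
A sketch of the scenario end-to-end with the built-in XML source (the file path
is illustrative, and the reconstructed tag names above are assumptions):

{code:python}
# Assumes the XML above is stored at /tmp/rows.xml with row tag "ROW".
df = (
    spark.read.format("xml")
    .option("rowTag", "ROW")
    .load("/tmp/rows.xml")
)

# Only struct2 is selected, so the parser must skip struct1 entirely;
# before the fix this could return [Row(struct2=None)].
df.select("struct2").collect()
{code}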






[jira] [Resolved] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema

2024-05-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48100.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46348
[https://github.com/apache/spark/pull/46348]

> [SQL][XML] Fix issues in skipping nested structure fields not selected in 
> schema
> 
>
> Key: SPARK-48100
> URL: https://issues.apache.org/jira/browse/SPARK-48100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Shujing Yang
>Assignee: Shujing Yang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Previously, the XML parser couldn't effectively skip nested structure fields 
> when they were not selected in the schema. For instance, in the example 
> below, `df.select("struct2").collect()` returned `Seq(null)` because 
> `struct1` wasn't effectively skipped. This PR fixes the issue.
> {code:java}
> <ROW>
>   <struct1>
>     <field>1</field>
>   </struct1>
>   <struct2>
>     <field>2</field>
>   </struct2>
> </ROW>
> {code}
>  






[jira] [Resolved] (SPARK-48247) Use all values in a python dict when inferring MapType schema

2024-05-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48247.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46547
[https://github.com/apache/spark/pull/46547]

> Use all values in a python dict when inferring MapType schema
> -
>
> Key: SPARK-48247
> URL: https://issues.apache.org/jira/browse/SPARK-48247
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Similar to SPARK-39168
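
A minimal sketch of the behavior change (values are illustrative, and the exact
before/after types are assumptions):

{code:python}
# The first value is None; inferring the MapType from the first entry
# alone would lose the real value type.
df = spark.createDataFrame([({"a": None, "b": 1},)], ["m"])
df.printSchema()
# Before: m: map<string,void>   (first entry only)
# After:  m: map<string,bigint> (merged across all values)
{code}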






[jira] [Assigned] (SPARK-48247) Use all values in a python dict when inferring MapType schema

2024-05-14 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48247:


Assignee: Hyukjin Kwon

> Use all values in a python dict when inferring MapType schema
> -
>
> Key: SPARK-48247
> URL: https://issues.apache.org/jira/browse/SPARK-48247
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Similar to SPARK-39168






[jira] [Resolved] (SPARK-48266) Move o.a.spark.sql.connect.dsl to test dir

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48266.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46567
[https://github.com/apache/spark/pull/46567]

> Move o.a.spark.sql.connect.dsl to test dir
> --
>
> Key: SPARK-48266
> URL: https://issues.apache.org/jira/browse/SPARK-48266
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir

2024-05-13 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48268:


 Summary: Add a configuration for SparkContext.setCheckpointDir
 Key: SPARK-48268
 URL: https://issues.apache.org/jira/browse/SPARK-48268
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


It would be great to have one.






[jira] [Created] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint

2024-05-13 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48258:


 Summary: Implement DataFrame.checkpoint and 
DataFrame.localCheckpoint
 Key: SPARK-48258
 URL: https://issues.apache.org/jira/browse/SPARK-48258
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature 
parity.






[jira] [Resolved] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48254.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46555
[https://github.com/apache/spark/pull/46555]

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48254:


Assignee: Cheng Pan

> Enhance Guava version extraction rule in dev/test-dependencies.sh
> -
>
> Key: SPARK-48254
> URL: https://issues.apache.org/jira/browse/SPARK-48254
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48248:


Assignee: Hyukjin Kwon

> Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
> -
>
> Key: SPARK-48248
> URL: https://issues.apache.org/jira/browse/SPARK-48248
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
> >>>  True)
> >>> spark.createDataFrame([[[[1, "a"]]]])
> DataFrame[_1: array<array<string>>]
> {code}
> It should instead infer the inner element type from the first element, i.e. 
> array<array<bigint>>.






[jira] [Resolved] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48248.
--
Fix Version/s: 3.4.4
   3.5.2
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 46548
[https://github.com/apache/spark/pull/46548]

> Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
> -
>
> Key: SPARK-48248
> URL: https://issues.apache.org/jira/browse/SPARK-48248
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.4, 3.5.2, 4.0.0
>
>
> {code}
> >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
> >>>  True)
> >>> spark.createDataFrame([[[[1, "a"]]]])
> DataFrame[_1: array<array<string>>]
> {code}
> It should infer the inner array from its first element, i.e. as an array of integers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48250) Enable array inference tests at test_parity_types.py

2024-05-13 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48250.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46550
[https://github.com/apache/spark/pull/46550]

> Enable array inference tests at test_parity_types.py
> 
>
> Key: SPARK-48250
> URL: https://issues.apache.org/jira/browse/SPARK-48250
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Some tests in test_types.py use RDDs unnecessarily. We can remove that usage 
> to enable those tests with Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48238) Spark fails to start because class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48238:
-
Parent: SPARK-47970
Issue Type: Sub-task  (was: Bug)

> Spark fails to start because class 
> o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
> ---
>
> Key: SPARK-48238
> URL: https://issues.apache.org/jira/browse/SPARK-48238
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Blocker
>
> I tested the latest master branch; it fails to start in YARN mode.
> {code:java}
> dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn
> {code}
>  
> {code:java}
> $ bin/spark-sql --master yarn
> WARNING: Using incubator modules: jdk.incubator.vector
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor 
> spark.yarn.archive is set, falling back to uploading libraries under 
> SPARK_HOME.
> 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext.
> org.sparkproject.jetty.util.MultiException: Multiple exceptions
>     at 
> org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) 
> ~[scala-library-2.13.13.jar:?]
>     at scala.collection.AbstractIterable.foreach(Iterable.scala:935) 
> ~[scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79)
>  ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
>     at org.apache.spark.SparkContext.(SparkContext.scala:690) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) 
> ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118)
>  ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
>     at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112)
>  [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64)
>  [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>     at 
> 

[jira] [Updated] (SPARK-48250) Enable array inference tests at test_parity_types.py

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48250:
-
Issue Type: Test  (was: Bug)

> Enable array inference tests at test_parity_types.py
> 
>
> Key: SPARK-48250
> URL: https://issues.apache.org/jira/browse/SPARK-48250
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48250) Enable array inference tests at test_parity_types.py

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48250:
-
Priority: Minor  (was: Major)

> Enable array inference tests at test_parity_types.py
> 
>
> Key: SPARK-48250
> URL: https://issues.apache.org/jira/browse/SPARK-48250
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48250) Enable array inference tests at test_parity_types.py

2024-05-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48250:


 Summary: Enable array inference tests at test_parity_types.py
 Key: SPARK-48250
 URL: https://issues.apache.org/jira/browse/SPARK-48250
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-48249) Use non-null value for legacy conf of inferArrayTypeFromFirstElement

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-48249:
-


> Use non-null value for legacy conf of inferArrayTypeFromFirstElement
> 
>
> Key: SPARK-48249
> URL: https://issues.apache.org/jira/browse/SPARK-48249
> Project: Spark
>  Issue Type: Bug
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
> >>>  True)
> >>> spark.createDataFrame([[[None, 1]]])
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/session.py", line 1538, in 
> createDataFrame
> return self._create_dataframe(
>^^^
>   File "/.../spark/python/pyspark/sql/session.py", line 1582, in 
> _create_dataframe
> rdd, struct = self._createFromLocal(
>   ^^
>   File "/.../spark/python/pyspark/sql/session.py", line 1184, in 
> _createFromLocal
> struct = self._inferSchemaFromList(data, names=schema)
>  ^
>   File "/.../spark/python/pyspark/sql/session.py", line 1060, in 
> _inferSchemaFromList
> raise PySparkValueError(
> pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] 
> Some of types cannot be determined after inferring.
> {code}
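For contrast, a minimal sketch with a non-null first element (example values
assumed), which first-element inference can type without error:

{code}
>>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", True)
>>> spark.createDataFrame([[[1, None]]])
DataFrame[_1: array<bigint>]
{code}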



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48249) Use non-null value for legacy conf of inferArrayTypeFromFirstElement

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48249:
-
Description: 
{code}
>>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
>>>  True)
>>> spark.createDataFrame([[[None, 1]]])
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1538, in createDataFrame
return self._create_dataframe(
   ^^^
  File "/.../spark/python/pyspark/sql/session.py", line 1582, in 
_create_dataframe
rdd, struct = self._createFromLocal(
  ^^
  File "/.../spark/python/pyspark/sql/session.py", line 1184, in 
_createFromLocal
struct = self._inferSchemaFromList(data, names=schema)
 ^
  File "/.../spark/python/pyspark/sql/session.py", line 1060, in 
_inferSchemaFromList
raise PySparkValueError(
pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] Some 
of types cannot be determined after inferring.
{code}

  was:
{code}
>>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
>>>  True)
>>> spark.createDataFrame([None, 1])
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1538, in createDataFrame
return self._create_dataframe(
   ^^^
  File "/.../spark/python/pyspark/sql/session.py", line 1582, in 
_create_dataframe
rdd, struct = self._createFromLocal(
  ^^
  File "/.../spark/python/pyspark/sql/session.py", line 1184, in 
_createFromLocal
struct = self._inferSchemaFromList(data, names=schema)
 ^
  File "/.../spark/python/pyspark/sql/session.py", line 1046, in 
_inferSchemaFromList
schema = reduce(
 ^^^
  File "/.../spark/python/pyspark/sql/session.py", line 1049, in 
_infer_schema(
  File "/.../spark/python/pyspark/sql/types.py", line 2015, in _infer_schema
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [CANNOT_INFER_SCHEMA_FOR_TYPE] 
Can not infer schema for type: `NoneType`.
{code}


> Use non-null value for legacy conf of inferArrayTypeFromFirstElement
> 
>
> Key: SPARK-48249
> URL: https://issues.apache.org/jira/browse/SPARK-48249
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
> >>>  True)
> >>> spark.createDataFrame([[[None, 1]]])
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/session.py", line 1538, in 
> createDataFrame
> return self._create_dataframe(
>^^^
>   File "/.../spark/python/pyspark/sql/session.py", line 1582, in 
> _create_dataframe
> rdd, struct = self._createFromLocal(
>   ^^
>   File "/.../spark/python/pyspark/sql/session.py", line 1184, in 
> _createFromLocal
> struct = self._inferSchemaFromList(data, names=schema)
>  ^
>   File "/.../spark/python/pyspark/sql/session.py", line 1060, in 
> _inferSchemaFromList
> raise PySparkValueError(
> pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] 
> Some of types cannot be determined after inferring.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48249) Use non-null value for legacy conf of inferArrayTypeFromFirstElement

2024-05-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48249:


 Summary: Use non-null value for legacy conf of 
inferArrayTypeFromFirstElement
 Key: SPARK-48249
 URL: https://issues.apache.org/jira/browse/SPARK-48249
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
>>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
>>>  True)
>>> spark.createDataFrame([None, 1])
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/session.py", line 1538, in createDataFrame
return self._create_dataframe(
   ^^^
  File "/.../spark/python/pyspark/sql/session.py", line 1582, in 
_create_dataframe
rdd, struct = self._createFromLocal(
  ^^
  File "/.../spark/python/pyspark/sql/session.py", line 1184, in 
_createFromLocal
struct = self._inferSchemaFromList(data, names=schema)
 ^
  File "/.../spark/python/pyspark/sql/session.py", line 1046, in 
_inferSchemaFromList
schema = reduce(
 ^^^
  File "/.../spark/python/pyspark/sql/session.py", line 1049, in 
_infer_schema(
  File "/.../spark/python/pyspark/sql/types.py", line 2015, in _infer_schema
raise PySparkTypeError(
pyspark.errors.exceptions.base.PySparkTypeError: [CANNOT_INFER_SCHEMA_FOR_TYPE] 
Can not infer schema for type: `NoneType`.
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement

2024-05-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48248:


 Summary: Fix nested array to respect legacy conf of 
inferArrayTypeFromFirstElement
 Key: SPARK-48248
 URL: https://issues.apache.org/jira/browse/SPARK-48248
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


{code}
>>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled",
>>>  True)
>>> spark.createDataFrame([[[[1, "a"]]]])
DataFrame[_1: array<array<string>>]
{code}

It should infer the inner array from its first element, i.e. as an array of integers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48247) Use all values in a python dict when inferring MapType schema

2024-05-12 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48247:


 Summary: Use all values in a python dict when inferring MapType 
schema
 Key: SPARK-48247
 URL: https://issues.apache.org/jira/browse/SPARK-48247
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Similar to SPARK-39168, but for the values of a Python dict; see the sketch below.
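A minimal sketch of the intended behavior (example values assumed, by analogy
with the list case): inferring only from the first dict value fails when that
value is None, while merging all values yields a usable MapType.

{code}
>>> spark.createDataFrame([{"m": {"a": None, "b": 1}}])
DataFrame[m: map<string,bigint>]
{code}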



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48245) Typo in `BadRecordException` class doc

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48245.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46542
[https://github.com/apache/spark/pull/46542]

> Typo in `BadRecordException` class doc
> --
>
> Key: SPARK-48245
> URL: https://issues.apache.org/jira/browse/SPARK-48245
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48245) Typo in `BadRecordException` class doc

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48245:


Assignee: Vladimir Golubev

> Typo in `BadRecordException` class doc
> --
>
> Key: SPARK-48245
> URL: https://issues.apache.org/jira/browse/SPARK-48245
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Trivial
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48239) Update the release docker image to follow what we use in GitHub Actions jobs

2024-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48239.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46534
[https://github.com/apache/spark/pull/46534]

> Update the release docker image to follow what we use in GitHub Actions jobs
> ---
>
> Key: SPARK-48239
> URL: https://issues.apache.org/jira/browse/SPARK-48239
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44265) Built-in XML data source support

2024-05-12 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845752#comment-17845752
 ] 

Hyukjin Kwon commented on SPARK-44265:
--

https://issues.apache.org/jira/browse/SPARK-45190 is not done yet

> Built-in XML data source support
> 
>
> Key: SPARK-44265
> URL: https://issues.apache.org/jira/browse/SPARK-44265
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Priority: Critical
>  Labels: pull-request-available
>
> XML is a widely used data format. An external spark-xml package 
> ([https://github.com/databricks/spark-xml]) is available to read and write 
> XML data in Spark. Making spark-xml built-in will provide a better user 
> experience for Spark SQL and Structured Streaming. The proposal is to inline 
> the code from the spark-xml package.
>  
> Here is the link to 
> [SPIP|https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit]
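A usage sketch of what the built-in source enables (option names follow the
external spark-xml package; the file paths are placeholders):

{code}
df = spark.read.format("xml").option("rowTag", "book").load("books.xml")
df.write.format("xml").option("rootTag", "books").option("rowTag", "book").save("books-out")
{code}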



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs

2024-05-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48240:


Assignee: BingKun Pan

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---
>
> Key: SPARK-48240
> URL: https://issues.apache.org/jira/browse/SPARK-48240
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs

2024-05-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48240.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46535
[https://github.com/apache/spark/pull/46535]

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---
>
> Key: SPARK-48240
> URL: https://issues.apache.org/jira/browse/SPARK-48240
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48232) Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build

2024-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48232:


Assignee: Hyukjin Kwon

> Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build
> -
>
> Key: SPARK-48232
> URL: https://issues.apache.org/jira/browse/SPARK-48232
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/9022174253/job/24804919747



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48232) Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build

2024-05-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48232.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46522
[https://github.com/apache/spark/pull/46522]

> Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build
> -
>
> Key: SPARK-48232
> URL: https://issues.apache.org/jira/browse/SPARK-48232
> Project: Spark
>  Issue Type: Test
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://github.com/apache/spark/actions/runs/9022174253/job/24804919747



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48232) Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build

2024-05-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48232:


 Summary: Fix 'pyspark.sql.tests.connect.test_connect_session' in 
Python 3.12 build
 Key: SPARK-48232
 URL: https://issues.apache.org/jira/browse/SPARK-48232
 Project: Spark
  Issue Type: Test
  Components: Connect, PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://github.com/apache/spark/actions/runs/9022174253/job/24804919747



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48176) Fix name of FIELD_ALREADY_EXISTS error condition

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48176:


Assignee: Nicholas Chammas

> Fix name of FIELD_ALREADY_EXISTS error condition
> 
>
> Key: SPARK-48176
> URL: https://issues.apache.org/jira/browse/SPARK-48176
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48176) Fix name of FIELD_ALREADY_EXISTS error condition

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48176.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46510
[https://github.com/apache/spark/pull/46510]

> Fix name of FIELD_ALREADY_EXISTS error condition
> 
>
> Key: SPARK-48176
> URL: https://issues.apache.org/jira/browse/SPARK-48176
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48180:


Assignee: Daniel

> Analyzer bug with multiple ORDER BY items for input table argument
> --
>
> Key: SPARK-48180
> URL: https://issues.apache.org/jira/browse/SPARK-48180
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
>
> Steps to reproduce:
>  
> {code}
> from pyspark.sql.functions import udtf
> 
> @udtf(returnType="a: int, b: int")
> class tvf:
>   def eval(self, *args):
>     yield 1, 2
> {code}
> {code}
> SELECT * FROM tvf(
>   TABLE(
>     SELECT 1 AS device_id, 2 AS data_ds
>     )
>     WITH SINGLE PARTITION
>     ORDER BY device_id, data_ds
>  )
> {code}
> {code}
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] Unsupported subquery expression: Table arguments are used in a function where they are not supported:
> 'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], false
>    +- Project [1 AS device_id#336, 2 AS data_ds#337]
>       +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48180.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46451
[https://github.com/apache/spark/pull/46451]

> Analyzer bug with multiple ORDER BY items for input table argument
> --
>
> Key: SPARK-48180
> URL: https://issues.apache.org/jira/browse/SPARK-48180
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Steps to reproduce:
>  
> {code}
> from pyspark.sql.functions import udtf
> 
> @udtf(returnType="a: int, b: int")
> class tvf:
>   def eval(self, *args):
>     yield 1, 2
> {code}
> {code}
> SELECT * FROM tvf(
>   TABLE(
>     SELECT 1 AS device_id, 2 AS data_ds
>     )
>     WITH SINGLE PARTITION
>     ORDER BY device_id, data_ds
>  )
> {code}
> {code}
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] Unsupported subquery expression: Table arguments are used in a function where they are not supported:
> 'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], false
>    +- Project [1 AS device_id#336, 2 AS data_ds#337]
>       +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48148) JSON objects should not be modified when read as STRING

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48148:


Assignee: Eric Maynard

> JSON objects should not be modified when read as STRING
> ---
>
> Key: SPARK-48148
> URL: https://issues.apache.org/jira/browse/SPARK-48148
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Maynard
>Assignee: Eric Maynard
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when reading a JSON like this:
> bq. {"a": {"b": -999.995}}
> With the schema:
> bq. a STRING
> Spark will yield a result like this:
> bq. {"b": -1000.0}
> This is due to how we convert a non-string value to a string in JacksonParser
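A minimal repro sketch (the input is fed inline; the "before" value is the
rounded result quoted above):

{code}
>>> rdd = spark.sparkContext.parallelize(['{"a": {"b": -999.995}}'])
>>> spark.read.schema("a STRING").json(rdd).show(truncate=False)
# before the fix, column a comes back re-serialized as {"b":-1000.0};
# with the fix, the original text {"b": -999.995} is preserved
{code}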



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48148) JSON objects should not be modified when read as STRING

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48148.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46408
[https://github.com/apache/spark/pull/46408]

> JSON objects should not be modified when read as STRING
> ---
>
> Key: SPARK-48148
> URL: https://issues.apache.org/jira/browse/SPARK-48148
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Eric Maynard
>Assignee: Eric Maynard
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, when reading a JSON like this:
> bq. {"a": {"b": -999.995}}
> With the schema:
> bq. a STRING
> Spark will yield a result like this:
> bq. {"b": -1000.0}
> This is due to how we convert a non-string value to a string in JacksonParser



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48089) Streaming query listener not working in 3.5 client <> 4.0 server

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48089.
--
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46513
[https://github.com/apache/spark/pull/46513]

> Streaming query listener not working in 3.5 client <> 4.0 server
> 
>
> Key: SPARK-48089
> URL: https://issues.apache.org/jira/browse/SPARK-48089
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> {code}
> ==
> ERROR [1.488s]: test_listener_events 
> (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events)
> --
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py",
>  line 53, in test_listener_events
> self.spark.streams.addListener(test_listener)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py",
>  line 244, in addListener
> self._execute_streaming_query_manager_cmd(cmd)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py",
>  line 260, in _execute_streaming_query_manager_cmd
> (_, properties) = self._session.client.execute_command(exec_cmd)
>   ^^
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 982, in execute_command
> data, _, _, _, properties = self._execute_and_fetch(req)
> 
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1283, in _execute_and_fetch
> for response in self._execute_and_fetch_as_iterator(req):
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1264, in _execute_and_fetch_as_iterator
> self._handle_error(error)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1503, in _handle_error
> self._handle_rpc_error(error)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1539, in _handle_rpc_error
> raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.io.EOFException) 
> --
> {code}
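For reference, a minimal sketch of the client call that fails in this version
mix (the listener body is reduced to no-ops; the class name is made up):

{code}
from pyspark.sql.streaming import StreamingQueryListener

class MyListener(StreamingQueryListener):
    def onQueryStarted(self, event): pass
    def onQueryProgress(self, event): pass
    def onQueryIdle(self, event): pass
    def onQueryTerminated(self, event): pass

spark.streams.addListener(MyListener())  # the call that hits the EOFException above
{code}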



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48089) Streaming query listener not working in 3.5 client <> 4.0 server

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48089:


Assignee: Wei Liu

> Streaming query listener not working in 3.5 client <> 4.0 server
> 
>
> Key: SPARK-48089
> URL: https://issues.apache.org/jira/browse/SPARK-48089
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> ==
> ERROR [1.488s]: test_listener_events 
> (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events)
> --
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py",
>  line 53, in test_listener_events
> self.spark.streams.addListener(test_listener)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py",
>  line 244, in addListener
> self._execute_streaming_query_manager_cmd(cmd)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py",
>  line 260, in _execute_streaming_query_manager_cmd
> (_, properties) = self._session.client.execute_command(exec_cmd)
>   ^^
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 982, in execute_command
> data, _, _, _, properties = self._execute_and_fetch(req)
> 
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1283, in _execute_and_fetch
> for response in self._execute_and_fetch_as_iterator(req):
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1264, in _execute_and_fetch_as_iterator
> self._handle_error(error)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1503, in _handle_error
> self._handle_rpc_error(error)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py",
>  line 1539, in _handle_rpc_error
> raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.io.EOFException) 
> --
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48082) Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 4.0

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48082.
--
  Assignee: Hyukjin Kwon
Resolution: Done

>  Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 
> 4.0
> 
>
> Key: SPARK-48082
> URL: https://issues.apache.org/jira/browse/SPARK-48082
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> https://github.com/apache/spark/pull/46298#issuecomment-2087905857
> There are test failures identified when running the Spark 3.5 Spark Connect 
> client against a 4.0 Spark Connect server.
> They should ideally be compatible.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48212) Fully enable PandasUDFParityTests.test_udf_wrong_arg

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48212.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46498
[https://github.com/apache/spark/pull/46498]

> Fully enable PandasUDFParityTests.test_udf_wrong_arg
> -
>
> Key: SPARK-48212
> URL: https://issues.apache.org/jira/browse/SPARK-48212
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47365) Add toArrow() DataFrame method to PySpark

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47365.
--
Resolution: Fixed

Issue resolved by pull request 45481
[https://github.com/apache/spark/pull/45481]

> Add toArrow() DataFrame method to PySpark
> -
>
> Key: SPARK-47365
> URL: https://issues.apache.org/jira/browse/SPARK-47365
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Input/Output, PySpark, SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Over in the Apache Arrow community, we hear from a lot of users who want to 
> return the contents of a PySpark DataFrame as a [PyArrow 
> Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. 
> Currently the only documented way to do this is:
> *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table*
> This adds significant overhead compared to going direct from PySpark 
> DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to 
> convert to 
> pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html],
>  would it be possible to publicly expose a *toArrow()* method of the Spark 
> DataFrame class?
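A minimal sketch of the resulting API (the method name comes from the issue
title; the session setup is assumed to exist):

{code}
>>> import pyarrow as pa
>>> table = spark.range(3).toArrow()  # direct DataFrame -> PyArrow Table, no pandas hop
>>> isinstance(table, pa.Table)
True
{code}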



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47365) Add toArrow() DataFrame method to PySpark

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47365:


Assignee: Ian Cook

> Add toArrow() DataFrame method to PySpark
> -
>
> Key: SPARK-47365
> URL: https://issues.apache.org/jira/browse/SPARK-47365
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Input/Output, PySpark, SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Over in the Apache Arrow community, we hear from a lot of users who want to 
> return the contents of a PySpark DataFrame as a [PyArrow 
> Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. 
> Currently the only documented way to do this is:
> *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table*
> This adds significant overhead compared to going direct from PySpark 
> DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to 
> convert to 
> pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html],
>  would it be possible to publicly expose a *toArrow()* method of the Spark 
> DataFrame class?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server

2024-05-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47986.
--
Resolution: Fixed

Issue resolved by pull request 46435
[https://github.com/apache/spark/pull/46435]

> [CONNECT][PYTHON] Unable to create a new session when the default session is 
> closed by the server
> -
>
> Key: SPARK-47986
> URL: https://issues.apache.org/jira/browse/SPARK-47986
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Niranjan Jayakar
>Assignee: Niranjan Jayakar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When the server closes a session, usually after a cluster restart, the client 
> is unaware of this until it receives an error.
> Once it does so, there is no way for the client to create a new session since 
> the stale sessions are still recorded as default and active sessions.
> The only solution currently is to restart the Python interpreter on the 
> client, or to reach into the session builder and change the active or default 
> session.
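A sketch of the recovery flow this change is meant to enable (the connection
string is a placeholder and the exact mechanics are in the linked PR):

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://host").getOrCreate()
# ... the server restarts and closes the session; subsequent RPCs fail ...
spark.stop()  # should clear the stale default/active session
spark = SparkSession.builder.remote("sc://host").getOrCreate()  # a fresh, working session
{code}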



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance

2024-05-08 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844808#comment-17844808
 ] 

Hyukjin Kwon commented on SPARK-48094:
--

Woohoo!

> Reduce GitHub Action usage according to ASF project allowance
> -
>
> Key: SPARK-48094
> URL: https://issues.apache.org/jira/browse/SPARK-48094
> Project: Spark
>  Issue Type: Umbrella
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: Screenshot 2024-05-02 at 23.56.05.png
>
>
> h2. ASF INFRA POLICY
> - https://infra.apache.org/github-actions-policy.html
> h2. MONITORING
> - https://infra-reports.apache.org/#ghactions&project=spark&hours=168
>  !Screenshot 2024-05-02 at 23.56.05.png|width=100%! 
> h2. TARGET
> * All workflows MUST have a job concurrency level less than or equal to 20. 
> This means a workflow cannot have more than 20 jobs running at the same time 
> across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT 
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 
> hours).
> * The average number of minutes a project uses in any consecutive five-day 
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000 
> minutes, or 3,600 hours).
> h2. DEADLINE
> bq. 17th of May, 2024
> Since the deadline is 17th of May, 2024, I set this as the highest priority, 
> `Blocker`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48205) Remove the private[sql] modifier for Python data sources

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48205.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46487
[https://github.com/apache/spark/pull/46487]

> Remove the private[sql] modifier for Python data sources
> 
>
> Key: SPARK-48205
> URL: https://issues.apache.org/jira/browse/SPARK-48205
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> To make it consistent with UDFs and UDTFs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48205) Remove the private[sql] modifier for Python data sources

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48205:


Assignee: Allison Wang

> Remove the private[sql] modifier for Python data sources
> 
>
> Key: SPARK-48205
> URL: https://issues.apache.org/jira/browse/SPARK-48205
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> To make it consistent with UDFs and UDTFs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48087:


Assignee: Hyukjin Kwon

> Python UDTF incompatibility in 3.5 client <> 4.0 server
> ---
>
> Key: SPARK-48087
> URL: https://issues.apache.org/jira/browse/SPARK-48087
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> ==
> FAIL [0.103s]: test_udtf_init_with_additional_args 
> (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args)
> --
> pyspark.errors.exceptions.connect.PythonException: 
>   An exception was thrown from the Python worker. Please see the stack trace 
> below.
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, 
> eval_type)
> self._check_result_or_exception(TestUDTF, ret_type, expected)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", 
> line 598, in _check_result_or_exception
> with self.assertRaisesRegex(err_type, expected):
> AssertionError: "AttributeError" does not match "
>   An exception was thrown from the Python worker. Please see the stack trace 
> below.
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1834, in main
> process()
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1826, in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>  line 224, in dump_stream
> self.serializer.dump_stream(self._batched(iterator), stream)
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>  line 145, in dump_stream
> for obj in iterator:
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>  line 213, in _batched
> for item in iterator:
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1391, in mapper
> yield eval(*[a[o] for o in args_kwargs_offsets])
>   ^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1371, in evaluate
> return tuple(map(verify_and_convert_result, res))
>^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1340, in verify_and_convert_result
> return toInternal(result)
>^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 1291, in toInternal
> return tuple(
>^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 1292, in <genexpr>
> f.toInternal(v) if c else v
> ^^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 907, in toInternal
> return self.dataType.toInternal(obj)
>^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 372, in toInternal
> calendar.timegm(dt.utctimetuple()) if dt.tzinfo else 
> time.mktime(dt.timetuple())
> ..."
> {code}
> {code}
> ==
> FAIL [0.096s]: test_udtf_init_with_additional_args 
> (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args)
> --
> pyspark.errors.exceptions.connect.PythonException: 
>   An exception was thrown from the Python worker. Please see the stack trace 
> below.
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, 
> eval_type)
>
> ^^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 946, in read_udtf
> raise PySparkRuntimeError(
> pyspark.errors.exceptions.base.PySparkRuntimeError: 
> 

[jira] [Resolved] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48087.
--
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 46473
[https://github.com/apache/spark/pull/46473]

> Python UDTF incompatibility in 3.5 client <> 4.0 server
> ---
>
> Key: SPARK-48087
> URL: https://issues.apache.org/jira/browse/SPARK-48087
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>
> {code}
> ==
> FAIL [0.103s]: test_udtf_init_with_additional_args 
> (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args)
> --
> pyspark.errors.exceptions.connect.PythonException: 
>   An exception was thrown from the Python worker. Please see the stack trace 
> below.
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, 
> eval_type)
> self._check_result_or_exception(TestUDTF, ret_type, expected)
>   File 
> "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", 
> line 598, in _check_result_or_exception
> with self.assertRaisesRegex(err_type, expected):
> AssertionError: "AttributeError" does not match "
>   An exception was thrown from the Python worker. Please see the stack trace 
> below.
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1834, in main
> process()
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1826, in process
> serializer.dump_stream(out_iter, outfile)
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>  line 224, in dump_stream
> self.serializer.dump_stream(self._batched(iterator), stream)
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>  line 145, in dump_stream
> for obj in iterator:
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py",
>  line 213, in _batched
> for item in iterator:
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1391, in mapper
> yield eval(*[a[o] for o in args_kwargs_offsets])
>   ^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1371, in evaluate
> return tuple(map(verify_and_convert_result, res))
>^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1340, in verify_and_convert_result
> return toInternal(result)
>^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 1291, in toInternal
> return tuple(
>^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 1292, in <genexpr>
> f.toInternal(v) if c else v
> ^^^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 907, in toInternal
> return self.dataType.toInternal(obj)
>^
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", 
> line 372, in toInternal
> calendar.timegm(dt.utctimetuple()) if dt.tzinfo else 
> time.mktime(dt.timetuple())
> ..."
> {code}
> {code}
> ==
> FAIL [0.096s]: test_udtf_init_with_additional_args 
> (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args)
> --
> pyspark.errors.exceptions.connect.PythonException: 
>   An exception was thrown from the Python worker. Please see the stack trace 
> below.
> Traceback (most recent call last):
>   File 
> "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", 
> line 1816, in main
> func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, 
> eval_type)
>
> ^^^
>   File 
> 
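For context, a minimal sketch of the kind of Python UDTF these parity tests 
exercise against a Spark Connect server; the class name, schema, and body 
below are illustrative assumptions, not the actual test code:

{code}
# Illustrative only: a tiny Python UDTF whose timestamp result goes through
# DataType.toInternal on the worker, the conversion visible in the stack
# traces above. All names here are assumptions.
import datetime

from pyspark.sql.functions import udtf


@udtf(returnType="x: int, ts: timestamp")
class EchoWithTimestamp:
    def eval(self, x: int):
        # One output row per input row; the timestamp column is what
        # exercises the 3.5 client <> 4.0 server type conversion.
        yield x, datetime.datetime.now()
{code}

Invoked as, e.g., {{EchoWithTimestamp(lit(1))}}, the decorated class returns a 
DataFrame through the active session.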

[jira] [Resolved] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48094.
--
  Assignee: Dongjoon Hyun
Resolution: Done

Seems like we're done :-)? I will resolve this one for now, but feel free to 
reopen if there is more work to be done!

> Reduce GitHub Action usage according to ASF project allowance
> -
>
> Key: SPARK-48094
> URL: https://issues.apache.org/jira/browse/SPARK-48094
> Project: Spark
>  Issue Type: Umbrella
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Attachments: Screenshot 2024-05-02 at 23.56.05.png
>
>
> h2. ASF INFRA POLICY
> - https://infra.apache.org/github-actions-policy.html
> h2. MONITORING
> - https://infra-reports.apache.org/#ghactions&project=spark&hours=168
>  !Screenshot 2024-05-02 at 23.56.05.png|width=100%! 
> h2. TARGET
> * All workflows MUST have a job concurrency level less than or equal to 20. 
> This means a workflow cannot have more than 20 jobs running at the same time 
> across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT 
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 
> hours).
> * The average number of minutes a project uses in any consecutive five-day 
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000 
> minutes, or 3,600 hours).
> h2. DEADLINE
> bq. 17th of May, 2024
> Since the deadline is 17th of May, 2024, I set this as the highest priority, 
> `Blocker`.
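
For context, the runner-equivalence figures quoted above check out as follows 
(the 250,000-minute weekly number is a rounding of 25 full-time runners for 7 
days, i.e. 252,000 minutes, which is exactly the 4,200 hours also quoted):

{code}
# Arithmetic behind the ASF allowance figures quoted above.
MINUTES_PER_DAY = 24 * 60

weekly_cap = 25 * 7 * MINUTES_PER_DAY    # 25 full-time runners for a week
print(weekly_cap, weekly_cap / 60)       # 252000 minutes, 4200.0 hours

five_day_cap = 30 * 5 * MINUTES_PER_DAY  # 30 full-time runners for 5 days
print(five_day_cap, five_day_cap / 60)   # 216000 minutes, 3600.0 hours
{code}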



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48094:
-
Fix Version/s: 4.0.0

> Reduce GitHub Action usage according to ASF project allowance
> -
>
> Key: SPARK-48094
> URL: https://issues.apache.org/jira/browse/SPARK-48094
> Project: Spark
>  Issue Type: Umbrella
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 4.0.0
>
> Attachments: Screenshot 2024-05-02 at 23.56.05.png
>
>
> h2. ASF INFRA POLICY
> - https://infra.apache.org/github-actions-policy.html
> h2. MONITORING
> - https://infra-reports.apache.org/#ghactions&project=spark&hours=168
>  !Screenshot 2024-05-02 at 23.56.05.png|width=100%! 
> h2. TARGET
> * All workflows MUST have a job concurrency level less than or equal to 20. 
> This means a workflow cannot have more than 20 jobs running at the same time 
> across all matrices.
> * All workflows SHOULD have a job concurrency level less than or equal to 15. 
> Just because 20 is the max, doesn't mean you should strive for 20.
> * The average number of minutes a project uses per calendar week MUST NOT 
> exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 
> hours).
> * The average number of minutes a project uses in any consecutive five-day 
> period MUST NOT exceed the equivalent of 30 full-time runners (216,000 
> minutes, or 3,600 hours).
> h2. DEADLINE
> bq. 17th of May, 2024
> Since the deadline is 17th of May, 2024, I set this as the highest priority, 
> `Blocker`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48163:
-
Fix Version/s: (was: 4.0.0)

> Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - 
> get_resources_command`
> 
>
> Key: SPARK-48163
> URL: https://issues.apache.org/jira/browse/SPARK-48163
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> - SPARK-43923: commands send events ((get_resources_command {
> [info] }
> [info] ,None)) *** FAILED *** (35 milliseconds)
> [info]   VerifyEvents.this.listener.executeHolder.isDefined was false 
> (SparkConnectServiceSuite.scala:873)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-48163:
--
  Assignee: (was: Dongjoon Hyun)

> Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - 
> get_resources_command`
> 
>
> Key: SPARK-48163
> URL: https://issues.apache.org/jira/browse/SPARK-48163
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> - SPARK-43923: commands send events ((get_resources_command {
> [info] }
> [info] ,None)) *** FAILED *** (35 milliseconds)
> [info]   VerifyEvents.this.listener.executeHolder.isDefined was false 
> (SparkConnectServiceSuite.scala:873)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`

2024-05-08 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844618#comment-17844618
 ] 

Hyukjin Kwon commented on SPARK-48163:
--

reverted in 
https://github.com/apache/spark/commit/bd896cac168aa5793413058ca706c73705edbf96

> Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - 
> get_resources_command`
> 
>
> Key: SPARK-48163
> URL: https://issues.apache.org/jira/browse/SPARK-48163
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> - SPARK-43923: commands send events ((get_resources_command {
> [info] }
> [info] ,None)) *** FAILED *** (35 milliseconds)
> [info]   VerifyEvents.this.listener.executeHolder.isDefined was false 
> (SparkConnectServiceSuite.scala:873)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48164.
--
Resolution: Invalid

> Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - 
> get_resources_command`
> --
>
> Key: SPARK-48164
> URL: https://issues.apache.org/jira/browse/SPARK-48164
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48164:
-
Target Version/s:   (was: 4.0.0)

> Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - 
> get_resources_command`
> --
>
> Key: SPARK-48164
> URL: https://issues.apache.org/jira/browse/SPARK-48164
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48193.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46471
[https://github.com/apache/spark/pull/46471]

> Make `maven-deploy-plugin` retry 3 times
> 
>
> Key: SPARK-48193
> URL: https://issues.apache.org/jira/browse/SPARK-48193
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
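
For reference, a hedged sketch of what such a pom.xml configuration can look 
like; whether the PR uses exactly this form is an assumption, though 
{{retryFailedDeploymentCount}} is the plugin's standard retry parameter:

{code:xml}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-deploy-plugin</artifactId>
  <configuration>
    <!-- Retry a failed deployment up to 3 times instead of failing fast;
         assumption: this standard parameter is the knob being tuned. -->
    <retryFailedDeploymentCount>3</retryFailedDeploymentCount>
  </configuration>
</plugin>
{code}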




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48193:


Assignee: BingKun Pan

> Make `maven-deploy-plugin` retry 3 times
> 
>
> Key: SPARK-48193
> URL: https://issues.apache.org/jira/browse/SPARK-48193
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48192) Enable TPC-DS tests in forked repository

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48192.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46470
[https://github.com/apache/spark/pull/46470]

> Enable TPC-DS tests in forked repository
> 
>
> Key: SPARK-48192
> URL: https://issues.apache.org/jira/browse/SPARK-48192
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> TPC-DS is pretty important in SQL. We should at least enable it in forked 
> repositories (PR builders), which do not consume ASF resources.
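
A hedged sketch of the usual GitHub Actions pattern for this; the job name and 
test command below are illustrative assumptions, not Spark's actual workflow:

{code:yaml}
# Illustrative only: run a heavy TPC-DS job on forked repositories
# (PR builders) while skipping it on the ASF-hosted apache/spark repo.
jobs:
  tpcds:
    if: github.repository != 'apache/spark'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The command below is illustrative; any TPC-DS suite invocation works.
      - run: build/sbt "sql/testOnly org.apache.spark.sql.TPCDSQuerySuite"
{code}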



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48192) Enable TPC-DS tests in forked repository

2024-05-08 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48192:


Assignee: Hyukjin Kwon

> Enable TPC-DS tests in forked repository
> 
>
> Key: SPARK-48192
> URL: https://issues.apache.org/jira/browse/SPARK-48192
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> TPC-DS is pretty important in SQL. We should at least enable it in forked 
> repositories (PR builders), which do not consume ASF resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48192) Enable TPC-DS tests in forked repository

2024-05-08 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48192:


 Summary: Enable TPC-DS tests in forked repository
 Key: SPARK-48192
 URL: https://issues.apache.org/jira/browse/SPARK-48192
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


TPC-DS is pretty important in SQL. We should at least enable it in forked 
repositories (PR builders), which do not consume ASF resources.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False

2024-05-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48045:


Assignee: Saidatt Sinai Amonkar

> Pandas API groupby with multi-agg-relabel ignores as_index=False
> 
>
> Key: SPARK-48045
> URL: https://issues.apache.org/jira/browse/SPARK-48045
> Project: Spark
>  Issue Type: Bug
>  Components: Pandas API on Spark
>Affects Versions: 3.5.1
> Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2
>Reporter: Paul George
>Assignee: Saidatt Sinai Amonkar
>Priority: Minor
>  Labels: pull-request-available
>
> A Pandas API DataFrame groupby with as_index=False and a multilevel 
> relabeling, such as
> {code:java}
> from pyspark import pandas as ps
> ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", 
> as_index=False).agg(b_max=("b", "max")){code}
> fails to include group keys in the resulting DataFrame. This diverges from 
> expected behavior as well as from the behavior of native Pandas, e.g.
> *actual*
> {code:java}
>    b_max
> 0      1 {code}
> *expected*
> {code:java}
>    a  b_max
> 0  0      1 {code}
>  
> A possible fix is to prepend groupby key columns to {{*order*}} and 
> {{*columns*}} before filtering here:  
> [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328]
>  
>  
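
For reference, native pandas (which introduced this named-aggregation form in 
0.25) keeps the group key here, matching the expected output quoted above:

{code}
# Native pandas reference for the expected behavior described above.
import pandas as pd

pdf = pd.DataFrame({"a": [0, 0], "b": [0, 1]})
print(pdf.groupby("a", as_index=False).agg(b_max=("b", "max")))
#    a  b_max
# 0  0      1
{code}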



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


