[jira] [Resolved] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48370.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46683
[https://github.com/apache/spark/pull/46683]

> Checkpoint and localCheckpoint in Scala Spark Connect client
> ------------------------------------------------------------
>
> Key: SPARK-48370
> URL: https://issues.apache.org/jira/browse/SPARK-48370
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark
> Connect client. We should do the same in Scala.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48370:
------------------------------------
Assignee: Hyukjin Kwon

> Checkpoint and localCheckpoint in Scala Spark Connect client
> ------------------------------------------------------------
>
> Key: SPARK-48370
> URL: https://issues.apache.org/jira/browse/SPARK-48370
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark
> Connect client. We should do the same in Scala.
[jira] [Assigned] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48393:
------------------------------------
Assignee: Ruifeng Zheng

> Move a group of constants to `pyspark.util`
> -------------------------------------------
>
> Key: SPARK-48393
> URL: https://issues.apache.org/jira/browse/SPARK-48393
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48393) Move a group of constants to `pyspark.util`
[ https://issues.apache.org/jira/browse/SPARK-48393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48393.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46710
[https://github.com/apache/spark/pull/46710]

> Move a group of constants to `pyspark.util`
> -------------------------------------------
>
> Key: SPARK-48393
> URL: https://issues.apache.org/jira/browse/SPARK-48393
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
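[Editor's note] The change above moves shared constants into one utility module so individual modules stop redefining the same literals. A generic sketch of that pattern with hypothetical names — these are illustrative, not the actual constants moved by PR 46710:

```python
# A util.py-style module: one authoritative home for shared constants.
# MAX_LONG / MIN_LONG here are hypothetical examples, not pyspark.util's.

# Bounds of a signed 64-bit integer.
MAX_LONG = (1 << 63) - 1
MIN_LONG = -(1 << 63)


def fits_in_long(value: int) -> bool:
    """Return True if `value` is representable as a signed 64-bit long."""
    # Consumers import the constants from the shared module instead of
    # duplicating the literals locally.
    return MIN_LONG <= value <= MAX_LONG


print(fits_in_long(MAX_LONG), fits_in_long(MAX_LONG + 1))  # True False
```

Centralizing the values means a future change (or bug fix) happens in exactly one place.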
[jira] [Reopened] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-48379:
----------------------------------
Assignee: (was: Stefan Kandic)

Reverted in https://github.com/apache/spark/commit/9fd85d9acc5acf455d0ad910ef2848695576242b

> Cancel build during a PR when a new commit is pushed
> ----------------------------------------------------
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Stefan Kandic
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Creating a new commit on a branch should cancel the build of previous commits
> for the same branch.
> Exceptions are master and branch-* branches, where we still want to have
> concurrent builds.
[jira] [Updated] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-48379:
---------------------------------
Fix Version/s: (was: 4.0.0)

> Cancel build during a PR when a new commit is pushed
> ----------------------------------------------------
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Stefan Kandic
> Priority: Major
> Labels: pull-request-available
>
> Creating a new commit on a branch should cancel the build of previous commits
> for the same branch.
> Exceptions are master and branch-* branches, where we still want to have
> concurrent builds.
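[Editor's note] The policy described in SPARK-48379 (cancel in-progress builds when a PR branch gets a new commit, but keep concurrent builds on master and branch-*) is the kind of behavior a GitHub Actions `concurrency` block expresses. A minimal, hypothetical sketch of such a configuration — not the actual workflow change from the Spark repository:

```yaml
# Hypothetical workflow fragment illustrating the cancellation policy.
concurrency:
  # One group per workflow + ref: a new push to the same branch joins
  # this group and may cancel the run already in progress.
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel only on PR/feature branches; master and branch-* keep their
  # in-flight runs so every commit on release lines gets a full build.
  cancel-in-progress: ${{ github.ref != 'refs/heads/master' && !startsWith(github.ref, 'refs/heads/branch-') }}
```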
[jira] [Resolved] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48389.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46703
[https://github.com/apache/spark/pull/46703]

> Remove obsolete workflow cancel_duplicate_workflow_runs
> -------------------------------------------------------
>
> Key: SPARK-48389
> URL: https://issues.apache.org/jira/browse/SPARK-48389
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> After https://github.com/apache/spark/pull/46689, we don't need this anymore.
[jira] [Assigned] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
[ https://issues.apache.org/jira/browse/SPARK-48389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48389:
------------------------------------
Assignee: Hyukjin Kwon

> Remove obsolete workflow cancel_duplicate_workflow_runs
> -------------------------------------------------------
>
> Key: SPARK-48389
> URL: https://issues.apache.org/jira/browse/SPARK-48389
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> After https://github.com/apache/spark/pull/46689, we don't need this anymore.
[jira] [Created] (SPARK-48389) Remove obsolete workflow cancel_duplicate_workflow_runs
Hyukjin Kwon created SPARK-48389:
------------------------------------
Summary: Remove obsolete workflow cancel_duplicate_workflow_runs
Key: SPARK-48389
URL: https://issues.apache.org/jira/browse/SPARK-48389
Project: Spark
Issue Type: Improvement
Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

After https://github.com/apache/spark/pull/46689, we don't need this anymore.
[jira] [Assigned] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48379:
------------------------------------
Assignee: Stefan Kandic

> Cancel build during a PR when a new commit is pushed
> ----------------------------------------------------
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Stefan Kandic
> Assignee: Stefan Kandic
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Creating a new commit on a branch should cancel the build of previous commits
> for the same branch.
> Exceptions are master and branch-* branches, where we still want to have
> concurrent builds.
[jira] [Resolved] (SPARK-48379) Cancel build during a PR when a new commit is pushed
[ https://issues.apache.org/jira/browse/SPARK-48379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48379.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46689
[https://github.com/apache/spark/pull/46689]

> Cancel build during a PR when a new commit is pushed
> ----------------------------------------------------
>
> Key: SPARK-48379
> URL: https://issues.apache.org/jira/browse/SPARK-48379
> Project: Spark
> Issue Type: Improvement
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Stefan Kandic
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Creating a new commit on a branch should cancel the build of previous commits
> for the same branch.
> Exceptions are master and branch-* branches, where we still want to have
> concurrent builds.
[jira] [Resolved] (SPARK-48341) Allow Spark Connect plugins to use QueryTest in their tests
[ https://issues.apache.org/jira/browse/SPARK-48341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48341.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46667
[https://github.com/apache/spark/pull/46667]

> Allow Spark Connect plugins to use QueryTest in their tests
> -----------------------------------------------------------
>
> Key: SPARK-48341
> URL: https://issues.apache.org/jira/browse/SPARK-48341
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Tom van Bussel
> Assignee: Tom van Bussel
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Created] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
Hyukjin Kwon created SPARK-48370:
------------------------------------
Summary: Checkpoint and localCheckpoint in Scala Spark Connect client
Key: SPARK-48370
URL: https://issues.apache.org/jira/browse/SPARK-48370
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark
Connect client. We should do the same in Scala.
[jira] [Updated] (SPARK-48370) Checkpoint and localCheckpoint in Scala Spark Connect client
[ https://issues.apache.org/jira/browse/SPARK-48370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-48370:
---------------------------------
Issue Type: Improvement (was: Bug)

> Checkpoint and localCheckpoint in Scala Spark Connect client
> ------------------------------------------------------------
>
> Key: SPARK-48370
> URL: https://issues.apache.org/jira/browse/SPARK-48370
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> SPARK-48258 implemented checkpoint and localCheckpoint in the Python Spark
> Connect client. We should do the same in Scala.
[jira] [Resolved] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48367.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46679
[https://github.com/apache/spark/pull/46679]

> Fix lint-scala for scalafmt to detect properly
> ----------------------------------------------
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code}
> ./build/mvn \
>   -Pscala-2.13 \
>   scalafmt:format \
>   -Dscalafmt.skip=false \
>   -Dscalafmt.validateOnly=true \
>   -Dscalafmt.changedOnly=false \
>   -pl connector/connect/common \
>   -pl connector/connect/server \
>   -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO] - Requires formatting: ConnectProtoUtils.scala
> [INFO] - Formatted: UdfUtils.scala
> [INFO] - Formatted: DataTypeProtoConverter.scala
> [INFO] - Formatted: ConnectCommon.scala
> [INFO] - Formatted: ProtoUtils.scala
> [INFO] - Formatted: Abbreviator.scala
> [INFO] - Formatted: ProtoDataTypes.scala
> [INFO] - Formatted: LiteralValueProtoConverter.scala
> [INFO] - Formatted: InvalidPlanInput.scala
> [INFO] - Formatted: ForeachWriterPacket.scala
> [INFO] - Formatted: StreamingListenerPacket.scala
> [INFO] - Formatted: StorageLevelProtoConverter.scala
> [INFO] - Formatted: UdfPacket.scala
> [INFO] - Formatted: ClassFinder.scala
> [INFO] - Formatted: SparkConnectClient.scala
> [INFO] - Formatted: GrpcRetryHandler.scala
> [INFO] - Formatted: GrpcExceptionConverter.scala
> [INFO] - Formatted: ArrowEncoderUtils.scala
> [INFO] - Formatted: ScalaCollectionUtils.scala
> [INFO] - Formatted: ArrowDeserializer.scala
> [INFO] - Formatted: ArrowVectorReader.scala
> [INFO] - Formatted: ArrowSerializer.scala
> [INFO] - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO] - Formatted: RetryPolicy.scala
> [INFO] - Formatted: SparkConnectStubState.scala
> [INFO] - Formatted: ArtifactManager.scala
> [INFO] - Formatted: SparkResult.scala
> [INFO] - Formatted: RetriesExceeded.scala
> [INFO] - Formatted: CloseableIterator.scala
> [INFO] - Formatted: package.scala
> [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO] - Formatted: ResponseValidator.scala
> [INFO] - Formatted: SparkConnectClientParser.scala
> [INFO] - Formatted: CustomSparkConnectStub.scala
> [INFO] - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO] - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed after the scalafmt version upgrade.
[jira] [Assigned] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
[ https://issues.apache.org/jira/browse/SPARK-48367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48367:
------------------------------------
Assignee: Hyukjin Kwon

> Fix lint-scala for scalafmt to detect properly
> ----------------------------------------------
>
> Key: SPARK-48367
> URL: https://issues.apache.org/jira/browse/SPARK-48367
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> ./build/mvn \
>   -Pscala-2.13 \
>   scalafmt:format \
>   -Dscalafmt.skip=false \
>   -Dscalafmt.validateOnly=true \
>   -Dscalafmt.changedOnly=false \
>   -pl connector/connect/common \
>   -pl connector/connect/server \
>   -pl connector/connect/client/jvm
> {code}
> fails as below:
> {code}
> [INFO] Scalafmt results: 1 of 36 were unformatted
> [INFO] Details:
> [INFO] - Requires formatting: ConnectProtoUtils.scala
> [INFO] - Formatted: UdfUtils.scala
> [INFO] - Formatted: DataTypeProtoConverter.scala
> [INFO] - Formatted: ConnectCommon.scala
> [INFO] - Formatted: ProtoUtils.scala
> [INFO] - Formatted: Abbreviator.scala
> [INFO] - Formatted: ProtoDataTypes.scala
> [INFO] - Formatted: LiteralValueProtoConverter.scala
> [INFO] - Formatted: InvalidPlanInput.scala
> [INFO] - Formatted: ForeachWriterPacket.scala
> [INFO] - Formatted: StreamingListenerPacket.scala
> [INFO] - Formatted: StorageLevelProtoConverter.scala
> [INFO] - Formatted: UdfPacket.scala
> [INFO] - Formatted: ClassFinder.scala
> [INFO] - Formatted: SparkConnectClient.scala
> [INFO] - Formatted: GrpcRetryHandler.scala
> [INFO] - Formatted: GrpcExceptionConverter.scala
> [INFO] - Formatted: ArrowEncoderUtils.scala
> [INFO] - Formatted: ScalaCollectionUtils.scala
> [INFO] - Formatted: ArrowDeserializer.scala
> [INFO] - Formatted: ArrowVectorReader.scala
> [INFO] - Formatted: ArrowSerializer.scala
> [INFO] - Formatted: ConcatenatingArrowStreamReader.scala
> [INFO] - Formatted: RetryPolicy.scala
> [INFO] - Formatted: SparkConnectStubState.scala
> [INFO] - Formatted: ArtifactManager.scala
> [INFO] - Formatted: SparkResult.scala
> [INFO] - Formatted: RetriesExceeded.scala
> [INFO] - Formatted: CloseableIterator.scala
> [INFO] - Formatted: package.scala
> [INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala
> [INFO] - Formatted: ResponseValidator.scala
> [INFO] - Formatted: SparkConnectClientParser.scala
> [INFO] - Formatted: CustomSparkConnectStub.scala
> [INFO] - Formatted: CustomSparkConnectBlockingStub.scala
> [INFO] - Formatted: TestUDFs.scala
> {code}
> This is because the output format changed after the scalafmt version upgrade.
[jira] [Created] (SPARK-48367) Fix lint-scala for scalafmt to detect properly
Hyukjin Kwon created SPARK-48367:
------------------------------------
Summary: Fix lint-scala for scalafmt to detect properly
Key: SPARK-48367
URL: https://issues.apache.org/jira/browse/SPARK-48367
Project: Spark
Issue Type: Bug
Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon

{code}
./build/mvn \
  -Pscala-2.13 \
  scalafmt:format \
  -Dscalafmt.skip=false \
  -Dscalafmt.validateOnly=true \
  -Dscalafmt.changedOnly=false \
  -pl connector/connect/common \
  -pl connector/connect/server \
  -pl connector/connect/client/jvm
{code}

fails as below:

{code}
[INFO] Scalafmt results: 1 of 36 were unformatted
[INFO] Details:
[INFO] - Requires formatting: ConnectProtoUtils.scala
[INFO] - Formatted: UdfUtils.scala
[INFO] - Formatted: DataTypeProtoConverter.scala
[INFO] - Formatted: ConnectCommon.scala
[INFO] - Formatted: ProtoUtils.scala
[INFO] - Formatted: Abbreviator.scala
[INFO] - Formatted: ProtoDataTypes.scala
[INFO] - Formatted: LiteralValueProtoConverter.scala
[INFO] - Formatted: InvalidPlanInput.scala
[INFO] - Formatted: ForeachWriterPacket.scala
[INFO] - Formatted: StreamingListenerPacket.scala
[INFO] - Formatted: StorageLevelProtoConverter.scala
[INFO] - Formatted: UdfPacket.scala
[INFO] - Formatted: ClassFinder.scala
[INFO] - Formatted: SparkConnectClient.scala
[INFO] - Formatted: GrpcRetryHandler.scala
[INFO] - Formatted: GrpcExceptionConverter.scala
[INFO] - Formatted: ArrowEncoderUtils.scala
[INFO] - Formatted: ScalaCollectionUtils.scala
[INFO] - Formatted: ArrowDeserializer.scala
[INFO] - Formatted: ArrowVectorReader.scala
[INFO] - Formatted: ArrowSerializer.scala
[INFO] - Formatted: ConcatenatingArrowStreamReader.scala
[INFO] - Formatted: RetryPolicy.scala
[INFO] - Formatted: SparkConnectStubState.scala
[INFO] - Formatted: ArtifactManager.scala
[INFO] - Formatted: SparkResult.scala
[INFO] - Formatted: RetriesExceeded.scala
[INFO] - Formatted: CloseableIterator.scala
[INFO] - Formatted: package.scala
[INFO] - Formatted: ExecutePlanResponseReattachableIterator.scala
[INFO] - Formatted: ResponseValidator.scala
[INFO] - Formatted: SparkConnectClientParser.scala
[INFO] - Formatted: CustomSparkConnectStub.scala
[INFO] - Formatted: CustomSparkConnectBlockingStub.scala
[INFO] - Formatted: TestUDFs.scala
{code}

This is because the output format changed after the scalafmt version upgrade.
[jira] [Resolved] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48363.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46674
[https://github.com/apache/spark/pull/46674]

> Cleanup some redundant codes in `from_xml`
> ------------------------------------------
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48363) Cleanup some redundant codes in `from_xml`
[ https://issues.apache.org/jira/browse/SPARK-48363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48363:
------------------------------------
Assignee: BingKun Pan

> Cleanup some redundant codes in `from_xml`
> ------------------------------------------
>
> Key: SPARK-48363
> URL: https://issues.apache.org/jira/browse/SPARK-48363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48340.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 4
[https://github.com/apache/spark/pull/4]

> Support TimestampNTZ infer schema miss prefer_timestamp_ntz
> -----------------------------------------------------------
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0, 3.5.1
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Attachments: image-2024-05-20-18-38-39-769.png
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
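[Editor's note] For context on what the `prefer_timestamp_ntz` flag controls: a TimestampNTZ ("no time zone") value is a naive datetime, while the default timestamp type is interpreted relative to a time zone. A stdlib sketch of the distinction that schema inference has to respect — illustrative only, not PySpark's inference code:

```python
from datetime import datetime, timezone

# A naive datetime corresponds to Spark's TimestampNTZType:
# no UTC offset is attached, so tzinfo is None.
ntz = datetime(2024, 5, 20, 18, 38, 39)

# An aware datetime corresponds to the default TimestampType,
# which is interpreted relative to the session time zone.
aware = datetime(2024, 5, 20, 18, 38, 39, tzinfo=timezone.utc)


def is_ntz_like(dt: datetime) -> bool:
    """True if `dt` carries no usable time-zone information."""
    return dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None


print(is_ntz_like(ntz), is_ntz_like(aware))  # True False
```

Inference code that ignores the preference flag would map both shapes of input to the same Spark type, which is the class of bug this issue addresses.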
[jira] [Assigned] (SPARK-48340) Support TimestampNTZ infer schema miss prefer_timestamp_ntz
[ https://issues.apache.org/jira/browse/SPARK-48340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48340:
------------------------------------
Assignee: angerszhu

> Support TimestampNTZ infer schema miss prefer_timestamp_ntz
> -----------------------------------------------------------
>
> Key: SPARK-48340
> URL: https://issues.apache.org/jira/browse/SPARK-48340
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 4.0.0, 3.5.1
> Reporter: angerszhu
> Assignee: angerszhu
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2024-05-20-18-38-39-769.png
>
> !image-2024-05-20-18-38-39-769.png|width=746,height=450!
[jira] [Resolved] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
[ https://issues.apache.org/jira/browse/SPARK-48258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48258.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46570
[https://github.com/apache/spark/pull/46570]

> Implement DataFrame.checkpoint and DataFrame.localCheckpoint
> ------------------------------------------------------------
>
> Key: SPARK-48258
> URL: https://issues.apache.org/jira/browse/SPARK-48258
> Project: Spark
> Issue Type: Improvement
> Components: Connect, PySpark
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature
> parity.
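[Editor's note] `DataFrame.checkpoint` and `DataFrame.localCheckpoint` truncate a DataFrame's lineage by materializing its current result, so downstream operations no longer recompute the whole transformation chain. A toy, Spark-free sketch of that idea — a hypothetical `Plan` class, not the Connect protocol implementation:

```python
# Toy illustration of lineage truncation: a "plan" chains lazy
# transformations, and checkpoint() materializes the current result
# so the new plan's lineage restarts from the data itself.

class Plan:
    def __init__(self, compute, depth=0):
        self.compute = compute   # zero-arg function producing the data
        self.depth = depth       # length of the transformation chain

    def map(self, f):
        # Each transformation extends the lineage by one step.
        return Plan(lambda: [f(x) for x in self.compute()], self.depth + 1)

    def checkpoint(self):
        # Materialize now; the returned plan has an empty lineage.
        data = self.compute()
        return Plan(lambda: data, 0)


base = Plan(lambda: list(range(5)))
long_chain = base.map(lambda x: x + 1).map(lambda x: x * 2)
cp = long_chain.checkpoint()

print(long_chain.depth, cp.depth)  # 2 0
print(cp.compute())                # [2, 4, 6, 8, 10]
```

In Spark the materialized data goes to reliable storage (`checkpoint`) or to executor memory/disk (`localCheckpoint`) rather than a Python closure, but the lineage-cutting effect is the same.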
[jira] [Resolved] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`
[ https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48333.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46654
[https://github.com/apache/spark/pull/46654]

> Test `test_sorting_functions_with_column` with same `Column`
> ------------------------------------------------------------
>
> Key: SPARK-48333
> URL: https://issues.apache.org/jira/browse/SPARK-48333
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48333) Test `test_sorting_functions_with_column` with same `Column`
[ https://issues.apache.org/jira/browse/SPARK-48333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48333:
------------------------------------
Assignee: Ruifeng Zheng

> Test `test_sorting_functions_with_column` with same `Column`
> ------------------------------------------------------------
>
> Key: SPARK-48333
> URL: https://issues.apache.org/jira/browse/SPARK-48333
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Assigned] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48319:
------------------------------------
Assignee: Ruifeng Zheng

> Test `assert_true` and `raise_error` with the same error class as Spark
> Classic
> -----------------------------------------------------------------------
>
> Key: SPARK-48319
> URL: https://issues.apache.org/jira/browse/SPARK-48319
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48319) Test `assert_true` and `raise_error` with the same error class as Spark Classic
[ https://issues.apache.org/jira/browse/SPARK-48319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48319.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46633
[https://github.com/apache/spark/pull/46633]

> Test `assert_true` and `raise_error` with the same error class as Spark
> Classic
> -----------------------------------------------------------------------
>
> Key: SPARK-48319
> URL: https://issues.apache.org/jira/browse/SPARK-48319
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Ruifeng Zheng
> Assignee: Ruifeng Zheng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
[ https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48317:
------------------------------------
Assignee: Hyukjin Kwon

> Enable test_udtf_with_analyze_using_archive and
> test_udtf_with_analyze_using_file
> ---------------------------------------------------------------------------------
>
> Key: SPARK-48317
> URL: https://issues.apache.org/jira/browse/SPARK-48317
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
[ https://issues.apache.org/jira/browse/SPARK-48317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48317.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46632
[https://github.com/apache/spark/pull/46632]

> Enable test_udtf_with_analyze_using_archive and
> test_udtf_with_analyze_using_file
> ---------------------------------------------------------------------------------
>
> Key: SPARK-48317
> URL: https://issues.apache.org/jira/browse/SPARK-48317
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48316:
------------------------------------
Assignee: Hyukjin Kwon

> Fix comments for SparkFrameMethodsParityTests.test_coalesce and
> test_repartition
> --------------------------------------------------------------------------------
>
> Key: SPARK-48316
> URL: https://issues.apache.org/jira/browse/SPARK-48316
> Project: Spark
> Issue Type: Sub-task
> Components: Pandas API on Spark, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
[jira] [Resolved] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48316.
----------------------------------
Fix Version/s: 4.0.0
Resolution: Fixed

Issue resolved by pull request 46629
[https://github.com/apache/spark/pull/46629]

> Fix comments for SparkFrameMethodsParityTests.test_coalesce and
> test_repartition
> --------------------------------------------------------------------------------
>
> Key: SPARK-48316
> URL: https://issues.apache.org/jira/browse/SPARK-48316
> Project: Spark
> Issue Type: Sub-task
> Components: Pandas API on Spark, PySpark, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-48316) Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition
[ https://issues.apache.org/jira/browse/SPARK-48316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48316: - Summary: Fix comments for SparkFrameMethodsParityTests.test_coalesce and test_repartition (was: Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition) > Fix comments for SparkFrameMethodsParityTests.test_coalesce and > test_repartition > > > Key: SPARK-48316 > URL: https://issues.apache.org/jira/browse/SPARK-48316 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-48317) Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file
Hyukjin Kwon created SPARK-48317: Summary: Enable test_udtf_with_analyze_using_archive and test_udtf_with_analyze_using_file Key: SPARK-48317 URL: https://issues.apache.org/jira/browse/SPARK-48317 Project: Spark Issue Type: Sub-task Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48238: - Parent: (was: SPARK-47970) Issue Type: Bug (was: Sub-task) > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > Labels: pull-request-available > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
> org.sparkproject.jetty.util.MultiException: Multiple exceptions > at > org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) > ~[scala-library-2.13.13.jar:?] > at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) > ~[scala-library-2.13.13.jar:?] > at scala.collection.AbstractIterable.foreach(Iterable.scala:935) > ~[scala-library-2.13.13.jar:?] 
> at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?] > at org.apache.spark.SparkContext.(SparkContext.scala:690) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) > ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) > ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?] > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) > [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) >
[jira] [Created] (SPARK-48316) Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition
Hyukjin Kwon created SPARK-48316: Summary: Enable SparkFrameMethodsParityTests.test_coalesce and test_repartition Key: SPARK-48316 URL: https://issues.apache.org/jira/browse/SPARK-48316 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-48310) Cached Properties Should return copies instead of values
[ https://issues.apache.org/jira/browse/SPARK-48310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48310. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46621 [https://github.com/apache/spark/pull/46621] > Cached Properties Should return copies instead of values > > > Key: SPARK-48310 > URL: https://issues.apache.org/jira/browse/SPARK-48310 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When returning cached properties for schema and columns a user might > incidentally modify the cached values.
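The hazard described in SPARK-48310 can be sketched in plain Python. This is an illustrative toy, not the Spark Connect client's actual code; `ToyDataset` and its members are invented names:

```python
import copy
import functools

class ToyDataset:
    """Stand-in for a client object that caches schema/column metadata."""

    @functools.cached_property
    def _columns(self):
        # Pretend this list was fetched once from the server.
        return ["id", "value"]

    @property
    def columns(self):
        # Hand out a copy so callers cannot mutate the cached value.
        return copy.copy(self._columns)

ds = ToyDataset()
cols = ds.columns
cols.append("oops")              # mutates only the caller's copy
assert ds.columns == ["id", "value"]
```

A shallow copy suffices for a flat list of strings; nested structures would need `copy.deepcopy`.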
[jira] [Resolved] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48268. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46571 [https://github.com/apache/spark/pull/46571] > Add a configuration for SparkContext.setCheckpointDir > - > > Key: SPARK-48268 > URL: https://issues.apache.org/jira/browse/SPARK-48268 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Would be great to have it
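A minimal pure-Python sketch of the fallback SPARK-48268 asks for: a configuration entry stands in for an explicit `SparkContext.setCheckpointDir` call. The key name `spark.checkpoint.dir`, the class, and the method names below are assumptions for illustration, not Spark's implementation:

```python
class ToyContext:
    """Resolves the checkpoint directory from an explicit call,
    falling back to a configuration entry when none was made."""

    def __init__(self, conf=None):
        self._conf = dict(conf or {})
        self._checkpoint_dir = None

    def set_checkpoint_dir(self, path):
        self._checkpoint_dir = path

    def checkpoint_dir(self):
        if self._checkpoint_dir is not None:
            return self._checkpoint_dir               # explicit call wins
        return self._conf.get("spark.checkpoint.dir")  # config fallback

ctx = ToyContext(conf={"spark.checkpoint.dir": "/tmp/ckpt"})
assert ctx.checkpoint_dir() == "/tmp/ckpt"    # taken from configuration
ctx.set_checkpoint_dir("/data/ckpt")
assert ctx.checkpoint_dir() == "/data/ckpt"   # explicit call overrides
```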
[jira] [Assigned] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
[ https://issues.apache.org/jira/browse/SPARK-48268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48268: Assignee: Hyukjin Kwon > Add a configuration for SparkContext.setCheckpointDir > - > > Key: SPARK-48268 > URL: https://issues.apache.org/jira/browse/SPARK-48268 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Would be great to have it
[jira] [Resolved] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48295. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46602 [https://github.com/apache/spark/pull/46602] > Turn on compute.ops_on_diff_frames by default > - > > Key: SPARK-48295 > URL: https://issues.apache.org/jira/browse/SPARK-48295 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48295) Turn on compute.ops_on_diff_frames by default
[ https://issues.apache.org/jira/browse/SPARK-48295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48295: Assignee: Ruifeng Zheng > Turn on compute.ops_on_diff_frames by default > - > > Key: SPARK-48295 > URL: https://issues.apache.org/jira/browse/SPARK-48295 > Project: Spark > Issue Type: Improvement > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema
[ https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48100: Assignee: Shujing Yang > [SQL][XML] Fix issues in skipping nested structure fields not selected in > schema > > > Key: SPARK-48100 > URL: https://issues.apache.org/jira/browse/SPARK-48100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > > Previously, the XML parser can't skip nested structure data fields > effectively when they were not selected in the schema. For instance, in the > below example, `df.select("struct2").collect()` returns `Seq(null)` as > `struct1` wasn't effectively skipped. This PR fixes this issue. > {code:java} > > > 1 > > > 2 > > {code} >
[jira] [Resolved] (SPARK-48100) [SQL][XML] Fix issues in skipping nested structure fields not selected in schema
[ https://issues.apache.org/jira/browse/SPARK-48100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48100. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46348 [https://github.com/apache/spark/pull/46348] > [SQL][XML] Fix issues in skipping nested structure fields not selected in > schema > > > Key: SPARK-48100 > URL: https://issues.apache.org/jira/browse/SPARK-48100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Shujing Yang >Assignee: Shujing Yang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Previously, the XML parser can't skip nested structure data fields > effectively when they were not selected in the schema. For instance, in the > below example, `df.select("struct2").collect()` returns `Seq(null)` as > `struct1` wasn't effectively skipped. This PR fixes this issue. > {code:java} > > > 1 > > > 2 > > {code} >
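The projection behavior SPARK-48100 fixes can be mimicked outside Spark with a toy helper over `xml.etree.ElementTree`. This is illustrative only: Spark's XML reader is a streaming parser, and the element names below are assumptions, since the example XML in the quoted description lost its tags in transit:

```python
import xml.etree.ElementTree as ET

def select_fields(xml_text, wanted):
    """Project only the requested child fields of a record, skipping
    unselected (possibly nested) siblings without disturbing the rest."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.findtext(".")
            for child in root if child.tag in wanted}

doc = "<ROW><struct1><inner>1</inner></struct1><struct2>2</struct2></ROW>"
# struct1 is skipped cleanly; struct2 still parses to its value.
assert select_fields(doc, {"struct2"}) == {"struct2": "2"}
```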
[jira] [Resolved] (SPARK-48247) Use all values in a python dict when inferring MapType schema
[ https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48247. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46547 [https://github.com/apache/spark/pull/46547] > Use all values in a python dict when inferring MapType schema > - > > Key: SPARK-48247 > URL: https://issues.apache.org/jira/browse/SPARK-48247 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Similar to SPARK-39168
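The change in SPARK-48247 can be illustrated with a simplified, self-contained inference helper. This is not PySpark's `_infer_type`; the type names and promotion rules below are deliberately reduced:

```python
def infer_value_type(d):
    """Infer a common value type from *all* values of a dict, rather
    than from only the first one (simplified promotion rules)."""
    types = {type(v).__name__ for v in d.values() if v is not None}
    if not types:
        return "void"
    if types == {"int"}:
        return "bigint"
    if types <= {"int", "float"}:
        return "double"
    if len(types) == 1:
        return types.pop()
    return "string"  # fall back to a printable common type

# Looking only at the first value would have yielded "bigint" here and
# failed on the float at read time; merging all values gives "double".
assert infer_value_type({"a": 1, "b": 2.5}) == "double"
```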
[jira] [Assigned] (SPARK-48247) Use all values in a python dict when inferring MapType schema
[ https://issues.apache.org/jira/browse/SPARK-48247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48247: Assignee: Hyukjin Kwon > Use all values in a python dict when inferring MapType schema > - > > Key: SPARK-48247 > URL: https://issues.apache.org/jira/browse/SPARK-48247 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Similar to SPARK-39168
[jira] [Resolved] (SPARK-48266) Move o.a.spark.sql.connect.dsl to test dir
[ https://issues.apache.org/jira/browse/SPARK-48266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48266. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46567 [https://github.com/apache/spark/pull/46567] > Move o.a.spark.sql.connect.dsl to test dir > -- > > Key: SPARK-48266 > URL: https://issues.apache.org/jira/browse/SPARK-48266 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Created] (SPARK-48268) Add a configuration for SparkContext.setCheckpointDir
Hyukjin Kwon created SPARK-48268: Summary: Add a configuration for SparkContext.setCheckpointDir Key: SPARK-48268 URL: https://issues.apache.org/jira/browse/SPARK-48268 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Would be great to have it
[jira] [Created] (SPARK-48258) Implement DataFrame.checkpoint and DataFrame.localCheckpoint
Hyukjin Kwon created SPARK-48258: Summary: Implement DataFrame.checkpoint and DataFrame.localCheckpoint Key: SPARK-48258 URL: https://issues.apache.org/jira/browse/SPARK-48258 Project: Spark Issue Type: Improvement Components: Connect, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We should add DataFrame.checkpoint and DataFrame.localCheckpoint for feature parity.
[jira] [Resolved] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48254. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46555 [https://github.com/apache/spark/pull/46555] > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Assigned] (SPARK-48254) Enhance Guava version extraction rule in dev/test-dependencies.sh
[ https://issues.apache.org/jira/browse/SPARK-48254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48254: Assignee: Cheng Pan > Enhance Guava version extraction rule in dev/test-dependencies.sh > - > > Key: SPARK-48254 > URL: https://issues.apache.org/jira/browse/SPARK-48254 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
[ https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48248: Assignee: Hyukjin Kwon > Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement > - > > Key: SPARK-48248 > URL: https://issues.apache.org/jira/browse/SPARK-48248 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", > >>> True) > >>> spark.createDataFrame(1, "a") > DataFrame[_1: array>] > {code} > should infer it as an array of integers
[jira] [Resolved] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
[ https://issues.apache.org/jira/browse/SPARK-48248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48248. -- Fix Version/s: 3.4.4 3.5.2 4.0.0 Resolution: Fixed Issue resolved by pull request 46548 [https://github.com/apache/spark/pull/46548] > Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement > - > > Key: SPARK-48248 > URL: https://issues.apache.org/jira/browse/SPARK-48248 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.4.4, 3.5.2, 4.0.0 > > > {code} > >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", > >>> True) > >>> spark.createDataFrame(1, "a") > DataFrame[_1: array>] > {code} > should infer it as an array of integers
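The bug class in SPARK-48248 can be mimicked with a tiny recursive inferrer (a toy model, not PySpark's code): the legacy "first element only" rule must be applied at every nesting level, not just the outermost array:

```python
def infer(obj, first_element_only):
    """Return a nested type tag for obj, optionally sampling only the
    first element of each list, recursively."""
    if isinstance(obj, list):
        elems = obj[:1] if first_element_only else obj
        inner = {infer(e, first_element_only) for e in elems}
        # Mixed element types collapse to "string" in this toy model.
        return ("array", inner.pop() if len(inner) == 1 else "string")
    return type(obj).__name__

# With the legacy flag on, the nested array's element type comes from
# its first element ("int") at every level, as the ticket expects.
assert infer([[1, "a"]], True) == ("array", ("array", "int"))
```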
[jira] [Resolved] (SPARK-48250) Enable array inference tests at test_parity_types.py
[ https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48250. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46550 [https://github.com/apache/spark/pull/46550] > Enable array inference tests at test_parity_types.py > > > Key: SPARK-48250 > URL: https://issues.apache.org/jira/browse/SPARK-48250 > Project: Spark > Issue Type: Test > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Some tests in test_types.py are using RDD unnecessarily. We can remove that > to enable some tests with Spark Connect.
[jira] [Updated] (SPARK-48238) Spark fail to start due to class o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
[ https://issues.apache.org/jira/browse/SPARK-48238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48238: - Parent: SPARK-47970 Issue Type: Sub-task (was: Bug) > Spark fail to start due to class > o.a.h.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter > --- > > Key: SPARK-48238 > URL: https://issues.apache.org/jira/browse/SPARK-48238 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Blocker > > I tested the latest master branch, it failed to start on YARN mode > {code:java} > dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn{code} > > {code:java} > $ bin/spark-sql --master yarn > WARNING: Using incubator modules: jdk.incubator.vector > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 2024-05-10 17:58:17 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2024-05-10 17:58:18 WARN Client: Neither spark.yarn.jars nor > spark.yarn.archive} is set, falling back to uploading libraries under > SPARK_HOME. > 2024-05-10 17:58:25 ERROR SparkContext: Error initializing SparkContext. 
[jira] [Updated] (SPARK-48250) Enable array inference tests at test_parity_types.py
[ https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48250: - Issue Type: Test (was: Bug) > Enable array inference tests at test_parity_types.py > > > Key: SPARK-48250 > URL: https://issues.apache.org/jira/browse/SPARK-48250 > Project: Spark > Issue Type: Test > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor >
[jira] [Updated] (SPARK-48250) Enable array inference tests at test_parity_types.py
[ https://issues.apache.org/jira/browse/SPARK-48250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48250: - Priority: Minor (was: Major) > Enable array inference tests at test_parity_types.py > > > Key: SPARK-48250 > URL: https://issues.apache.org/jira/browse/SPARK-48250 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Minor >
[jira] [Created] (SPARK-48250) Enable array inference tests at test_parity_types.py
Hyukjin Kwon created SPARK-48250: Summary: Enable array inference tests at test_parity_types.py Key: SPARK-48250 URL: https://issues.apache.org/jira/browse/SPARK-48250 Project: Spark Issue Type: Bug Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Deleted] (SPARK-48249) Use non-null value for legacy conf of inferArrayTypeFromFirstElement
[ https://issues.apache.org/jira/browse/SPARK-48249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon deleted SPARK-48249: - > Use non-null value for legacy conf of inferArrayTypeFromFirstElement > > > Key: SPARK-48249 > URL: https://issues.apache.org/jira/browse/SPARK-48249 > Project: Spark > Issue Type: Bug >Reporter: Hyukjin Kwon >Priority: Major > > {code} > >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", > >>> True) > >>> spark.createDataFrame([[[None, 1]]]) > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 1538, in > createDataFrame > return self._create_dataframe( >^^^ > File "/.../spark/python/pyspark/sql/session.py", line 1582, in > _create_dataframe > rdd, struct = self._createFromLocal( > ^^ > File "/.../spark/python/pyspark/sql/session.py", line 1184, in > _createFromLocal > struct = self._inferSchemaFromList(data, names=schema) > ^ > File "/.../spark/python/pyspark/sql/session.py", line 1060, in > _inferSchemaFromList > raise PySparkValueError( > pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] > Some of types cannot be determined after inferring. > {code}
[jira] [Updated] (SPARK-48249) Use non-null value for legacy conf of inferArrayTypeFromFirstElement
[ https://issues.apache.org/jira/browse/SPARK-48249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48249: - Description: {code} >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", >>> True) >>> spark.createDataFrame([[[None, 1]]]) Traceback (most recent call last): File "", line 1, in File "/.../spark/python/pyspark/sql/session.py", line 1538, in createDataFrame return self._create_dataframe( ^^^ File "/.../spark/python/pyspark/sql/session.py", line 1582, in _create_dataframe rdd, struct = self._createFromLocal( ^^ File "/.../spark/python/pyspark/sql/session.py", line 1184, in _createFromLocal struct = self._inferSchemaFromList(data, names=schema) ^ File "/.../spark/python/pyspark/sql/session.py", line 1060, in _inferSchemaFromList raise PySparkValueError( pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] Some of types cannot be determined after inferring. {code} was: {code} >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", >>> True) >>> spark.createDataFrame([None, 1]) Traceback (most recent call last): File "", line 1, in File "/.../spark/python/pyspark/sql/session.py", line 1538, in createDataFrame return self._create_dataframe( ^^^ File "/.../spark/python/pyspark/sql/session.py", line 1582, in _create_dataframe rdd, struct = self._createFromLocal( ^^ File "/.../spark/python/pyspark/sql/session.py", line 1184, in _createFromLocal struct = self._inferSchemaFromList(data, names=schema) ^ File "/.../spark/python/pyspark/sql/session.py", line 1046, in _inferSchemaFromList schema = reduce( ^^^ File "/.../spark/python/pyspark/sql/session.py", line 1049, in _infer_schema( File "/.../spark/python/pyspark/sql/types.py", line 2015, in _infer_schema raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [CANNOT_INFER_SCHEMA_FOR_TYPE] Can not infer schema for type: `NoneType`. 
{code} > Use non-null value for legacy conf of inferArrayTypeFromFirstElement > > > Key: SPARK-48249 > URL: https://issues.apache.org/jira/browse/SPARK-48249 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > {code} > >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", > >>> True) > >>> spark.createDataFrame([[[None, 1]]]) > Traceback (most recent call last): > File "", line 1, in > File "/.../spark/python/pyspark/sql/session.py", line 1538, in > createDataFrame > return self._create_dataframe( >^^^ > File "/.../spark/python/pyspark/sql/session.py", line 1582, in > _create_dataframe > rdd, struct = self._createFromLocal( > ^^ > File "/.../spark/python/pyspark/sql/session.py", line 1184, in > _createFromLocal > struct = self._inferSchemaFromList(data, names=schema) > ^ > File "/.../spark/python/pyspark/sql/session.py", line 1060, in > _inferSchemaFromList > raise PySparkValueError( > pyspark.errors.exceptions.base.PySparkValueError: [CANNOT_DETERMINE_TYPE] > Some of types cannot be determined after inferring. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48249) Use non-null value for legacy conf of inferArrayTypeFromFirstElement
Hyukjin Kwon created SPARK-48249: Summary: Use non-null value for legacy conf of inferArrayTypeFromFirstElement Key: SPARK-48249 URL: https://issues.apache.org/jira/browse/SPARK-48249 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", >>> True) >>> spark.createDataFrame([None, 1]) Traceback (most recent call last): File "", line 1, in File "/.../spark/python/pyspark/sql/session.py", line 1538, in createDataFrame return self._create_dataframe( ^^^ File "/.../spark/python/pyspark/sql/session.py", line 1582, in _create_dataframe rdd, struct = self._createFromLocal( ^^ File "/.../spark/python/pyspark/sql/session.py", line 1184, in _createFromLocal struct = self._inferSchemaFromList(data, names=schema) ^ File "/.../spark/python/pyspark/sql/session.py", line 1046, in _inferSchemaFromList schema = reduce( ^^^ File "/.../spark/python/pyspark/sql/session.py", line 1049, in _infer_schema( File "/.../spark/python/pyspark/sql/types.py", line 2015, in _infer_schema raise PySparkTypeError( pyspark.errors.exceptions.base.PySparkTypeError: [CANNOT_INFER_SCHEMA_FOR_TYPE] Can not infer schema for type: `NoneType`. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
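The failure above can be modeled without Spark: with the legacy conf enabled, array element-type inference samples only the first element, so a leading `None` makes the type undeterminable even when later elements are typed. A minimal pure-Python sketch of the legacy rule and the fix direction (toy function names, not Spark's actual `_infer_schema` code):

```python
def infer_type(value):
    # Toy scalar inference: None carries no type information.
    return None if value is None else type(value).__name__

def infer_array_type_legacy(arr):
    # Legacy conf behavior: sample only arr[0]; a null first element
    # leaves the element type undeterminable (CANNOT_DETERMINE_TYPE).
    elem = infer_type(arr[0])
    if elem is None:
        raise ValueError("CANNOT_DETERMINE_TYPE")
    return f"array<{elem}>"

def infer_array_type_first_non_null(arr):
    # Fix direction in SPARK-48249: sample the first *non-null* element.
    for v in arr:
        t = infer_type(v)
        if t is not None:
            return f"array<{t}>"
    raise ValueError("CANNOT_DETERMINE_TYPE")
```

Under this sketch, `[None, 1]` fails with the legacy rule but infers `array<int>` once nulls are skipped when picking the sample element.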
[jira] [Created] (SPARK-48248) Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement
Hyukjin Kwon created SPARK-48248: Summary: Fix nested array to respect legacy conf of inferArrayTypeFromFirstElement Key: SPARK-48248 URL: https://issues.apache.org/jira/browse/SPARK-48248 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon {code} >>> spark.conf.set("spark.sql.pyspark.legacy.inferArrayTypeFromFirstElement.enabled", >>> True) >>> spark.createDataFrame(1, "a") DataFrame[_1: array>] {code} should infer it as an array of integers
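The repro in SPARK-48248 is partly garbled in this archive, but the point is that the legacy first-element rule must apply recursively: before the fix, nested array levels fell back to the default merge-all behavior. A toy recursive sketch (not Spark's implementation) showing the two modes:

```python
def infer(value, first_element_only=True):
    """Toy recursive type inference; not Spark's implementation."""
    if isinstance(value, list):
        if first_element_only:
            # Legacy conf: sample only the first element, at *every* nesting
            # level -- the bug was that nested levels ignored this flag.
            return f"array<{infer(value[0], first_element_only)}>"
        # Default: merge all element types, widening to 'string' on conflict.
        merged = {infer(v, first_element_only) for v in value}
        return f"array<{merged.pop() if len(merged) == 1 else 'string'}>"
    return type(value).__name__
```

For `[[1, "a"]]`, the legacy mode infers `array<array<int>>` (first element of the inner list is an int), while the default mode widens the conflicting inner types to `array<array<string>>`.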
[jira] [Created] (SPARK-48247) Use all values in a python dict when inferring MapType schema
Hyukjin Kwon created SPARK-48247: Summary: Use all values in a python dict when inferring MapType schema Key: SPARK-48247 URL: https://issues.apache.org/jira/browse/SPARK-48247 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Similar to SPARK-39168
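The referenced SPARK-39168 made struct inference consider all rows rather than the first; SPARK-48247 applies the same idea to dict values for `MapType`. A toy sketch (illustrative names, homogeneous keys assumed, not Spark's code) contrasting first-value sampling with merging across all values:

```python
def infer_map_type(d, use_all_values=True):
    # Toy sketch; assumes homogeneous key types. Not Spark's implementation.
    key_t = type(next(iter(d))).__name__
    if use_all_values:
        # Proposed: merge value types across *all* entries, ignoring nulls.
        value_types = {type(v).__name__ for v in d.values() if v is not None}
        if not value_types:
            val_t = "void"
        elif len(value_types) == 1:
            val_t = value_types.pop()
        else:
            val_t = "string"  # toy widening on conflicting value types
    else:
        # Old behavior: only the first value is sampled.
        first = next(iter(d.values()))
        val_t = "void" if first is None else type(first).__name__
    return f"map<{key_t},{val_t}>"
```

With `{"a": None, "b": 1}`, sampling only the first value yields an unusable `map<str,void>`, while merging all values recovers `map<str,int>`.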
[jira] [Resolved] (SPARK-48245) Typo in `BadRecordException` class doc
[ https://issues.apache.org/jira/browse/SPARK-48245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48245. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46542 [https://github.com/apache/spark/pull/46542] > Typo in `BadRecordException` class doc > -- > > Key: SPARK-48245 > URL: https://issues.apache.org/jira/browse/SPARK-48245 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Trivial > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48245) Typo in `BadRecordException` class doc
[ https://issues.apache.org/jira/browse/SPARK-48245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48245: Assignee: Vladimir Golubev > Typo in `BadRecordException` class doc > -- > > Key: SPARK-48245 > URL: https://issues.apache.org/jira/browse/SPARK-48245 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Trivial > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48239) Update the release docker image to follow what we use in Github Action jobs
[ https://issues.apache.org/jira/browse/SPARK-48239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48239. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46534 [https://github.com/apache/spark/pull/46534] > Update the release docker image to follow what we use in Github Action jobs > --- > > Key: SPARK-48239 > URL: https://issues.apache.org/jira/browse/SPARK-48239 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44265) Built-in XML data source support
[ https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845752#comment-17845752 ] Hyukjin Kwon commented on SPARK-44265: -- https://issues.apache.org/jira/browse/SPARK-45190 is not done yet > Built-in XML data source support > > > Key: SPARK-44265 > URL: https://issues.apache.org/jira/browse/SPARK-44265 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Priority: Critical > Labels: pull-request-available > > XML is a widely used data format. An external spark-xml package > ([https://github.com/databricks/spark-xml)] is available to read and write > XML data in spark. Making spark-xml built-in will provide a better user > experience for Spark SQL and structured streaming. The proposal is to inline > code from spark-xml package. > > Here is the link to > [SPIP|https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs
[ https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48240: Assignee: BingKun Pan > Replace `Local[..]` with `"Local[...]"` in the docs > --- > > Key: SPARK-48240 > URL: https://issues.apache.org/jira/browse/SPARK-48240 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs
[ https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48240. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46535 [https://github.com/apache/spark/pull/46535] > Replace `Local[..]` with `"Local[...]"` in the docs > --- > > Key: SPARK-48240 > URL: https://issues.apache.org/jira/browse/SPARK-48240 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48232) Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build
[ https://issues.apache.org/jira/browse/SPARK-48232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48232: Assignee: Hyukjin Kwon > Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build > - > > Key: SPARK-48232 > URL: https://issues.apache.org/jira/browse/SPARK-48232 > Project: Spark > Issue Type: Test > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/9022174253/job/24804919747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48232) Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build
[ https://issues.apache.org/jira/browse/SPARK-48232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48232. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46522 [https://github.com/apache/spark/pull/46522] > Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build > - > > Key: SPARK-48232 > URL: https://issues.apache.org/jira/browse/SPARK-48232 > Project: Spark > Issue Type: Test > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://github.com/apache/spark/actions/runs/9022174253/job/24804919747 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48232) Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build
Hyukjin Kwon created SPARK-48232: Summary: Fix 'pyspark.sql.tests.connect.test_connect_session' in Python 3.12 build Key: SPARK-48232 URL: https://issues.apache.org/jira/browse/SPARK-48232 Project: Spark Issue Type: Test Components: Connect, PySpark, Tests Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/actions/runs/9022174253/job/24804919747
[jira] [Assigned] (SPARK-48176) Fix name of FIELD_ALREADY_EXISTS error condition
[ https://issues.apache.org/jira/browse/SPARK-48176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48176: Assignee: Nicholas Chammas > Fix name of FIELD_ALREADY_EXISTS error condition > > > Key: SPARK-48176 > URL: https://issues.apache.org/jira/browse/SPARK-48176 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48176) Fix name of FIELD_ALREADY_EXISTS error condition
[ https://issues.apache.org/jira/browse/SPARK-48176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48176. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46510 [https://github.com/apache/spark/pull/46510] > Fix name of FIELD_ALREADY_EXISTS error condition > > > Key: SPARK-48176 > URL: https://issues.apache.org/jira/browse/SPARK-48176 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument
[ https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48180: Assignee: Daniel > Analyzer bug with multiple ORDER BY items for input table argument > -- > > Key: SPARK-48180 > URL: https://issues.apache.org/jira/browse/SPARK-48180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > > Steps to reproduce: > > {{from pyspark.sql.functions import udtf}} > {{@udtf(returnType="a: int, b: int")}} > {{class tvf:}} > {{ def eval(self, *args):}} > {{ yield 1, 2}} > > {{SELECT * FROM tvf(}} > {{ TABLE(}} > {{ SELECT 1 AS device_id, 2 AS data_ds}} > {{ )}} > {{ WITH SINGLE PARTITION}} > {{ ORDER BY device_id, data_ds}} > {{ )}} > {{[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] > Unsupported subquery expression: Table arguments are used in a function where > they are not supported:}} > {{'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], > false}} > {{ +- Project [1 AS device_id#336, 2 AS data_ds#337]}} > {{ +- OneRowRelation}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48180) Analyzer bug with multiple ORDER BY items for input table argument
[ https://issues.apache.org/jira/browse/SPARK-48180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48180. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46451 [https://github.com/apache/spark/pull/46451] > Analyzer bug with multiple ORDER BY items for input table argument > -- > > Key: SPARK-48180 > URL: https://issues.apache.org/jira/browse/SPARK-48180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0, 3.5.1 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Steps to reproduce: > > {{from pyspark.sql.functions import udtf}} > {{@udtf(returnType="a: int, b: int")}} > {{class tvf:}} > {{ def eval(self, *args):}} > {{ yield 1, 2}} > > {{SELECT * FROM tvf(}} > {{ TABLE(}} > {{ SELECT 1 AS device_id, 2 AS data_ds}} > {{ )}} > {{ WITH SINGLE PARTITION}} > {{ ORDER BY device_id, data_ds}} > {{ )}} > {{[UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_TABLE_ARGUMENT] > Unsupported subquery expression: Table arguments are used in a function where > they are not supported:}} > {{'UnresolvedTableValuedFunction [tvf], [table-argument#338 [], 'data_ds], > false}} > {{ +- Project [1 AS device_id#336, 2 AS data_ds#337]}} > {{ +- OneRowRelation}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48148) JSON objects should not be modified when read as STRING
[ https://issues.apache.org/jira/browse/SPARK-48148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48148: Assignee: Eric Maynard > JSON objects should not be modified when read as STRING > --- > > Key: SPARK-48148 > URL: https://issues.apache.org/jira/browse/SPARK-48148 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Maynard >Assignee: Eric Maynard >Priority: Major > Labels: pull-request-available > > Currently, when reading a JSON like this: > bq. {"a": {"b": -999.995}} > With the schema: > bq. a STRING > Spark will yield a result like this: > bq. {"b": -1000.0} > This is due to how we convert a non-string value to a string in JacksonParser -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48148) JSON objects should not be modified when read as STRING
[ https://issues.apache.org/jira/browse/SPARK-48148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48148. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46408 [https://github.com/apache/spark/pull/46408] > JSON objects should not be modified when read as STRING > --- > > Key: SPARK-48148 > URL: https://issues.apache.org/jira/browse/SPARK-48148 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Eric Maynard >Assignee: Eric Maynard >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, when reading a JSON like this: > bq. {"a": {"b": -999.995}} > With the schema: > bq. a STRING > Spark will yield a result like this: > bq. {"b": -1000.0} > This is due to how we convert a non-string value to a string in JacksonParser -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
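The root cause described above is a parse-then-reserialize round trip: the parser turns the JSON number into a double and then prints the double back, losing the original text. The same class of problem can be shown with Python's stdlib `json` module (an analogue of the Jackson behavior, not Spark's actual code path):

```python
import json

# Parse-then-reserialize changes the textual form of a JSON number, which is
# the same class of problem JacksonParser had when an object is read as STRING.
raw = '{"b": 1E2}'                      # exponent notation in the source text
roundtripped = json.dumps(json.loads(raw))
print(roundtripped)                     # the original 1E2 spelling is lost
```

The fix keeps the original token text when the requested schema is `STRING`, instead of materializing the value and re-serializing it.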
[jira] [Resolved] (SPARK-48089) Streaming query listener not working in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48089. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46513 [https://github.com/apache/spark/pull/46513] > Streaming query listener not working in 3.5 client <> 4.0 server > > > Key: SPARK-48089 > URL: https://issues.apache.org/jira/browse/SPARK-48089 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > ERROR [1.488s]: test_listener_events > (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py", > line 53, in test_listener_events > self.spark.streams.addListener(test_listener) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 244, in addListener > self._execute_streaming_query_manager_cmd(cmd) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 260, in _execute_streaming_query_manager_cmd > (_, properties) = self._session.client.execute_command(exec_cmd) > ^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 982, in execute_command > data, _, _, _, properties = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (java.io.EOFException) > -- > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48089) Streaming query listener not working in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48089: Assignee: Wei Liu > Streaming query listener not working in 3.5 client <> 4.0 server > > > Key: SPARK-48089 > URL: https://issues.apache.org/jira/browse/SPARK-48089 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Wei Liu >Priority: Major > Labels: pull-request-available > > {code} > == > ERROR [1.488s]: test_listener_events > (pyspark.sql.tests.connect.streaming.test_parity_listener.StreamingListenerParityTests.test_listener_events) > -- > Traceback (most recent call last): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/connect/streaming/test_parity_listener.py", > line 53, in test_listener_events > self.spark.streams.addListener(test_listener) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 244, in addListener > self._execute_streaming_query_manager_cmd(cmd) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/streaming/query.py", > line 260, in _execute_streaming_query_manager_cmd > (_, properties) = self._session.client.execute_command(exec_cmd) > ^^ > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 982, in execute_command > data, _, _, _, properties = self._execute_and_fetch(req) > > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1283, in _execute_and_fetch > for response in self._execute_and_fetch_as_iterator(req): > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1264, in _execute_and_fetch_as_iterator > self._handle_error(error) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1503, in _handle_error > self._handle_rpc_error(error) > File > 
"/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", > line 1539, in _handle_rpc_error > raise convert_exception(info, status.message) from None > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (java.io.EOFException) > -- > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48082) Recover compatibility with Spark Connect client 3.5 <> Spark Connect server 4.0
[ https://issues.apache.org/jira/browse/SPARK-48082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48082. -- Assignee: Hyukjin Kwon Resolution: Done > Recover compatibility with Spark Connect client 3.5 <> Spark Connect server > 4.0 > > > Key: SPARK-48082 > URL: https://issues.apache.org/jira/browse/SPARK-48082 > Project: Spark > Issue Type: Umbrella > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > https://github.com/apache/spark/pull/46298#issuecomment-2087905857 > There are test failures identified when you run Spark 3.5 Spark Connect > client <> Spark Connect server 4.0. > They should ideally be compatible. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48212) Fully enable PandasUDFParityTests.test_udf_wrong_arg
[ https://issues.apache.org/jira/browse/SPARK-48212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48212. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46498 [https://github.com/apache/spark/pull/46498] > Fully enable PandasUDFParityTests. test_udf_wrong_arg > - > > Key: SPARK-48212 > URL: https://issues.apache.org/jira/browse/SPARK-48212 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47365) Add toArrow() DataFrame method to PySpark
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47365. -- Resolution: Fixed Issue resolved by pull request 45481 [https://github.com/apache/spark/pull/45481] > Add toArrow() DataFrame method to PySpark > - > > Key: SPARK-47365 > URL: https://issues.apache.org/jira/browse/SPARK-47365 > Project: Spark > Issue Type: Sub-task > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Over in the Apache Arrow community, we hear from a lot of users who want to > return the contents of a PySpark DataFrame as a [PyArrow > Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. > Currently the only documented way to do this is: > *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* > This adds significant overhead compared to going direct from PySpark > DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to > convert to > pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], > would it be possible to publicly expose a *toArrow()* method of the Spark > DataFrame class? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47365) Add toArrow() DataFrame method to PySpark
[ https://issues.apache.org/jira/browse/SPARK-47365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47365: Assignee: Ian Cook > Add toArrow() DataFrame method to PySpark > - > > Key: SPARK-47365 > URL: https://issues.apache.org/jira/browse/SPARK-47365 > Project: Spark > Issue Type: Sub-task > Components: Connect, Input/Output, PySpark, SQL >Affects Versions: 4.0.0, 3.5.1 >Reporter: Ian Cook >Assignee: Ian Cook >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Over in the Apache Arrow community, we hear from a lot of users who want to > return the contents of a PySpark DataFrame as a [PyArrow > Table|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html]. > Currently the only documented way to do this is: > *PySpark DataFrame* --> *pandas DataFrame* --> *PyArrow Table* > This adds significant overhead compared to going direct from PySpark > DataFrame to PyArrow Table. Since [PySpark already goes through PyArrow to > convert to > pandas|https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html], > would it be possible to publicly expose a *toArrow()* method of the Spark > DataFrame class? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47986) [CONNECT][PYTHON] Unable to create a new session when the default session is closed by the server
[ https://issues.apache.org/jira/browse/SPARK-47986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47986. -- Resolution: Fixed Issue resolved by pull request 46435 [https://github.com/apache/spark/pull/46435] > [CONNECT][PYTHON] Unable to create a new session when the default session is > closed by the server > - > > Key: SPARK-47986 > URL: https://issues.apache.org/jira/browse/SPARK-47986 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.5.0, 3.5.1 >Reporter: Niranjan Jayakar >Assignee: Niranjan Jayakar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When the server closes a session, usually after a cluster restart, the client > is unaware of this until it receives an error. > Once it does so, there is no way for the client to create a new session since > the stale sessions are still recorded as default and active sessions. > The only solution currently is to restart the Python interpreter on the > client, or to reach into the session builder and change the active or default > session. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
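The described failure mode is a caching problem: the builder keeps handing out the recorded default/active session even after the server has closed it. A toy sketch of the pattern (illustrative class names, not the actual Spark Connect client code): treat a closed cached session as absent so a fresh one can be created without restarting the interpreter.

```python
class Session:
    def __init__(self):
        self.closed = False   # flipped when the server drops the session

class Builder:
    _default = None          # the cached default session

    @classmethod
    def get_or_create(cls):
        # Fix direction: a cached session that the server has closed should
        # not be returned; replace it with a new session instead.
        if cls._default is None or cls._default.closed:
            cls._default = Session()
        return cls._default
```

Once the cached session is marked closed, the next `get_or_create()` call returns a brand-new session rather than the stale one.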
[jira] [Commented] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844808#comment-17844808 ] Hyukjin Kwon commented on SPARK-48094: -- Woohoo! > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 4.0.0 > > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
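The concurrency targets listed in the issue are enforceable at the workflow level. A hypothetical GitHub Actions fragment (module names illustrative, not Spark's actual workflow) showing the two standard knobs: `strategy.max-parallel` caps matrix fan-out, and a `concurrency` group cancels superseded runs on the same branch.

```yaml
# Hypothetical fragment: keep parallel jobs within the ASF target of <= 15
# (hard limit 20) and avoid wasting minutes on superseded pushes.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    strategy:
      max-parallel: 15
      matrix:
        module: [core, sql, python]   # illustrative module list
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
```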
[jira] [Resolved] (SPARK-48205) Remove the private[sql] modifier for Python data sources
[ https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48205. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46487 [https://github.com/apache/spark/pull/46487] > Remove the private[sql] modifier for Python data sources > > > Key: SPARK-48205 > URL: https://issues.apache.org/jira/browse/SPARK-48205 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To make it consistent with UDFs and UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48205) Remove the private[sql] modifier for Python data sources
[ https://issues.apache.org/jira/browse/SPARK-48205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48205: Assignee: Allison Wang > Remove the private[sql] modifier for Python data sources > > > Key: SPARK-48205 > URL: https://issues.apache.org/jira/browse/SPARK-48205 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Labels: pull-request-available > > To make it consistent with UDFs and UDTFs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48087: Assignee: Hyukjin Kwon > Python UDTF incompatibility in 3.5 client <> 4.0 server > --- > > Key: SPARK-48087 > URL: https://issues.apache.org/jira/browse/SPARK-48087 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > == > FAIL [0.103s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > self._check_result_or_exception(TestUDTF, ret_type, expected) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", > line 598, in _check_result_or_exception > with self.assertRaisesRegex(err_type, expected): > AssertionError: "AttributeError" does not match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1391, in mapper > yield eval(*[a[o] for o in args_kwargs_offsets]) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1371, in evaluate > return tuple(map(verify_and_convert_result, res)) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1340, in verify_and_convert_result > return toInternal(result) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1291, in toInternal > return tuple( >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1292, in > f.toInternal(v) if c else v > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 907, in toInternal > return self.dataType.toInternal(obj) >^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 372, in toInternal > calendar.timegm(dt.utctimetuple()) if dt.tzinfo else > time.mktime(dt.timetuple()) > ..." 
> {code} > {code} > == > FAIL [0.096s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 946, in read_udtf > raise PySparkRuntimeError( > pyspark.errors.exceptions.base.PySparkRuntimeError: >
[jira] [Resolved] (SPARK-48087) Python UDTF incompatibility in 3.5 client <> 4.0 server
[ https://issues.apache.org/jira/browse/SPARK-48087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48087. -- Fix Version/s: 3.5.2 Resolution: Fixed Issue resolved by pull request 46473 [https://github.com/apache/spark/pull/46473] > Python UDTF incompatibility in 3.5 client <> 4.0 server > --- > > Key: SPARK-48087 > URL: https://issues.apache.org/jira/browse/SPARK-48087 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 3.5.2 > > > {code} > == > FAIL [0.103s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.ArrowUDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > self._check_result_or_exception(TestUDTF, ret_type, expected) > File > "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_udtf.py", > line 598, in _check_result_or_exception > with self.assertRaisesRegex(err_type, expected): > AssertionError: "AttributeError" does not match " > An exception was thrown from the Python worker. Please see the stack trace > below. 
> Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1834, in main > process() > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1826, in process > serializer.dump_stream(out_iter, outfile) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 224, in dump_stream > self.serializer.dump_stream(self._batched(iterator), stream) > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 145, in dump_stream > for obj in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/serializers.py", > line 213, in _batched > for item in iterator: > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1391, in mapper > yield eval(*[a[o] for o in args_kwargs_offsets]) > ^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1371, in evaluate > return tuple(map(verify_and_convert_result, res)) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1340, in verify_and_convert_result > return toInternal(result) >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1291, in toInternal > return tuple( >^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 1292, in > f.toInternal(v) if c else v > ^^^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 907, in toInternal > return self.dataType.toInternal(obj) >^ > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/sql/types.py", > line 372, in toInternal > calendar.timegm(dt.utctimetuple()) if dt.tzinfo else > time.mktime(dt.timetuple()) > ..." 
> {code} > {code} > == > FAIL [0.096s]: test_udtf_init_with_additional_args > (pyspark.sql.tests.connect.test_parity_udtf.UDTFParityTests.test_udtf_init_with_additional_args) > -- > pyspark.errors.exceptions.connect.PythonException: > An exception was thrown from the Python worker. Please see the stack trace > below. > Traceback (most recent call last): > File > "/home/runner/work/spark/spark/python/lib/pyspark.zip/pyspark/worker.py", > line 1816, in main > func, profiler, deserializer, serializer = read_udtf(pickleSer, infile, > eval_type) > > ^^^ > File >
[jira] [Resolved] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48094. -- Assignee: Dongjoon Hyun Resolution: Done Seems like we're done :-)? I will resolve this one for now but feel free to reopen if there is more work to be done! > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48094) Reduce GitHub Action usage according to ASF project allowance
[ https://issues.apache.org/jira/browse/SPARK-48094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48094: - Fix Version/s: 4.0.0 > Reduce GitHub Action usage according to ASF project allowance > - > > Key: SPARK-48094 > URL: https://issues.apache.org/jira/browse/SPARK-48094 > Project: Spark > Issue Type: Umbrella > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Blocker > Fix For: 4.0.0 > > Attachments: Screenshot 2024-05-02 at 23.56.05.png > > > h2. ASF INFRA POLICY > - https://infra.apache.org/github-actions-policy.html > h2. MONITORING > - https://infra-reports.apache.org/#ghactions=spark=168 > !Screenshot 2024-05-02 at 23.56.05.png|width=100%! > h2. TARGET > * All workflows MUST have a job concurrency level less than or equal to 20. > This means a workflow cannot have more than 20 jobs running at the same time > across all matrices. > * All workflows SHOULD have a job concurrency level less than or equal to 15. > Just because 20 is the max, doesn't mean you should strive for 20. > * The average number of minutes a project uses per calendar week MUST NOT > exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 > hours). > * The average number of minutes a project uses in any consecutive five-day > period MUST NOT exceed the equivalent of 30 full-time runners (216,000 > minutes, or 3,600 hours). > h2. DEADLINE > bq. 17th of May, 2024 > Since the deadline is 17th of May, 2024, I set this as the highest priority, > `Blocker`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48163: - Fix Version/s: (was: 4.0.0) > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-48163: -- Assignee: (was: Dongjoon Hyun) > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-48163) Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844618#comment-17844618 ] Hyukjin Kwon commented on SPARK-48163: -- reverted in https://github.com/apache/spark/commit/bd896cac168aa5793413058ca706c73705edbf96 > Disable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48164. -- Resolution: Invalid > Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > -- > > Key: SPARK-48164 > URL: https://issues.apache.org/jira/browse/SPARK-48164 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48164) Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-48164: - Target Version/s: (was: 4.0.0) > Re-enable `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > -- > > Key: SPARK-48164 > URL: https://issues.apache.org/jira/browse/SPARK-48164 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times
[ https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48193. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46471 [https://github.com/apache/spark/pull/46471] > Make `maven-deploy-plugin` retry 3 times > > > Key: SPARK-48193 > URL: https://issues.apache.org/jira/browse/SPARK-48193 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48193) Make `maven-deploy-plugin` retry 3 times
[ https://issues.apache.org/jira/browse/SPARK-48193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48193: Assignee: BingKun Pan > Make `maven-deploy-plugin` retry 3 times > > > Key: SPARK-48193 > URL: https://issues.apache.org/jira/browse/SPARK-48193 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-48192) Enable TPC-DS tests in forked repository
[ https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-48192. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 46470 [https://github.com/apache/spark/pull/46470] > Enable TPC-DS tests in forked repository > > > Key: SPARK-48192 > URL: https://issues.apache.org/jira/browse/SPARK-48192 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > TPC-DS is pretty important in SQL. We should at least enable it in forked > repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48192) Enable TPC-DS tests in forked repository
[ https://issues.apache.org/jira/browse/SPARK-48192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48192: Assignee: Hyukjin Kwon > Enable TPC-DS tests in forked repository > > > Key: SPARK-48192 > URL: https://issues.apache.org/jira/browse/SPARK-48192 > Project: Spark > Issue Type: Sub-task > Components: Project Infra, SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > TPC-DS is pretty important in SQL. We should at least enable it in forked > repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-48192) Enable TPC-DS tests in forked repository
Hyukjin Kwon created SPARK-48192: Summary: Enable TPC-DS tests in forked repository Key: SPARK-48192 URL: https://issues.apache.org/jira/browse/SPARK-48192 Project: Spark Issue Type: Sub-task Components: Project Infra, SQL Affects Versions: 4.0.0 Reporter: Hyukjin Kwon TPC-DS is pretty important in SQL. We should at least enable it in forked repositories (PR builders), which do not consume ASF resources. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-48045) Pandas API groupby with multi-agg-relabel ignores as_index=False
[ https://issues.apache.org/jira/browse/SPARK-48045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-48045: Assignee: Saidatt Sinai Amonkar > Pandas API groupby with multi-agg-relabel ignores as_index=False > > > Key: SPARK-48045 > URL: https://issues.apache.org/jira/browse/SPARK-48045 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark >Affects Versions: 3.5.1 > Environment: Python 3.11, PySpark 3.5.1, Pandas=2.2.2 >Reporter: Paul George >Assignee: Saidatt Sinai Amonkar >Priority: Minor > Labels: pull-request-available > > A Pandas API DataFrame groupby with as_index=False and a multilevel > relabeling, such as > {code:java} > from pyspark import pandas as ps > ps.DataFrame({"a": [0, 0], "b": [0, 1]}).groupby("a", > as_index=False).agg(b_max=("b", "max")){code} > fails to include group keys in the resulting DataFrame. This diverges from > expected behavior as well as from the behavior of native Pandas, e.g. > *actual* > {code:java} > b_max > 0 1 {code} > *expected* > {code:java} > a b_max > 0 0 1 {code} > > A possible fix is to prepend groupby key columns to {{*order*}} and > {{*columns*}} before filtering here: > [https://github.com/apache/spark/blob/master/python/pyspark/pandas/groupby.py#L327-L328] > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
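Editor's note: the divergence reported in SPARK-48045 can be checked against native pandas directly. A minimal sketch of the expected behavior (requires pandas installed; this mirrors the reproduction in the issue description, it is not the pandas-on-Spark code path):

```python
import pandas as pd

# Native pandas: with as_index=False, the group key column "a" is kept
# in the result alongside the relabeled aggregate column "b_max".
# pandas-on-Spark (per SPARK-48045) was dropping "a" in this case.
df = pd.DataFrame({"a": [0, 0], "b": [0, 1]})
result = df.groupby("a", as_index=False).agg(b_max=("b", "max"))

print(result.columns.tolist())  # ['a', 'b_max']
print(result.to_dict("list"))   # {'a': [0], 'b_max': [1]}
```

Running the same snippet with `from pyspark import pandas as ps` in an affected version yields only the `b_max` column, which is the bug the issue tracks.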