Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43736: URL: https://github.com/apache/spark/pull/43736#discussion_r1395285496 ## .github/workflows/build_and_test.yml: ## @@ -555,6 +555,81 @@ jobs: with: name: test-results-sparkr--${{ inputs.java }}-${{ inputs.hadoop

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43736: URL: https://github.com/apache/spark/pull/43736#discussion_r1395282977 ## pom.xml: ## @@ -202,6 +202,7 @@ 4.1.17 14.0.1 3.1.9 +2.2.11 Review Comment: You can spin off this together with

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43736: URL: https://github.com/apache/spark/pull/43736#discussion_r1395282370 ## connector/kinesis-asl/pom.xml: ## @@ -76,6 +76,12 @@ jackson-dataformat-cbor ${fasterxml.jackson.version} + + javax.xml.bind +

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43736: URL: https://github.com/apache/spark/pull/43736#discussion_r1395281539 ## .github/workflows/build_and_test.yml: ## @@ -1049,7 +1124,7 @@ jobs: sudo install minikube-linux-amd64 /usr/local/bin/minikube rm

Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core/unsafe/network_common` module in `module.py` [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on PR #43818: URL: https://github.com/apache/spark/pull/43818#issuecomment-1813946114 Thanks @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core/unsafe/network_common` module in `module.py` [spark]

2023-11-15 Thread via GitHub
LuciferYang closed pull request #43818: [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core/unsafe/network_common` module in `module.py` URL: https://github.com/apache/spark/pull/43818

Re: [PR] [SPARK-32246][BUILD][INFRA] Add new Github Action to run Kinesis tests [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43736: URL: https://github.com/apache/spark/pull/43736#discussion_r1395280300 ## .github/workflows/build_and_test.yml: ## @@ -555,6 +555,81 @@ jobs: with: name: test-results-sparkr--${{ inputs.java }}-${{ inputs.hadoop

Re: [PR] [SPARK-45948][K8S] Make single-pod spark jobs respect `spark.app.id` [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43833: URL: https://github.com/apache/spark/pull/43833#issuecomment-1813937357 Could you review this when you have some time, please, @LuciferYang ?

Re: [PR] [WIP][INFRA] Test PyArrow 14 [spark]

2023-11-15 Thread via GitHub
zhengruifeng commented on PR #43829: URL: https://github.com/apache/spark/pull/43829#issuecomment-1813930288 ``` pyarrow 14.0.1 pydantic 2.5.1 pydantic_core 2.14.3 PyGObject 3.36.0 ```

[PR] [SPARK-45948][K8S] Make single-pod spark jobs respect `spark.app.id` [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun opened a new pull request, #43833: URL: https://github.com/apache/spark/pull/43833 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-45946][SS] Fix use of deprecated FileUtils write to pass default charset in RocksDBSuite [spark]

2023-11-15 Thread via GitHub
anishshri-db commented on PR #43832: URL: https://github.com/apache/spark/pull/43832#issuecomment-1813879801 cc - @HeartSaVioR - PTAL, thx

[PR] [SPARK-45946] Fix use of deprecated FileUtils write to pass default charset in RocksDBSuite [spark]

2023-11-15 Thread via GitHub
anishshri-db opened a new pull request, #43832: URL: https://github.com/apache/spark/pull/43832 ### What changes were proposed in this pull request? Fix use of deprecated FileUtils write to pass default charset in RocksDBSuite ### Why are the changes needed? Without the

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1395211436 ## core/src/main/scala/org/apache/spark/scheduler/ExecutorResourcesAmounts.scala: ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1395204414 ## core/src/main/scala/org/apache/spark/scheduler/ExecutorResourcesAmounts.scala: ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1395200464 ## core/src/main/scala/org/apache/spark/scheduler/ExecutorResourcesAmounts.scala: ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1395199672 ## core/src/main/scala/org/apache/spark/scheduler/ExecutorResourcesAmounts.scala: ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1395199364 ## core/src/main/scala/org/apache/spark/resource/ResourceAllocator.scala: ## @@ -20,6 +20,42 @@ package org.apache.spark.resource import scala.collection.mutable

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
wbo4958 commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1395199162 ## core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala: ## @@ -191,7 +191,10 @@ private[spark] class CoarseGrainedExecutorBackend(

Re: [PR] [SPARK-45927][PYTHON] Update path handling for Python data source [spark]

2023-11-15 Thread via GitHub
allisonwang-db commented on code in PR #43809: URL: https://github.com/apache/spark/pull/43809#discussion_r1395186286 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -246,7 +246,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession)

Re: [PR] [SPARK-45511] Fix state reader suite flakiness by clean up resources after each test run [spark]

2023-11-15 Thread via GitHub
chaoqin-li1123 commented on PR #43831: URL: https://github.com/apache/spark/pull/43831#issuecomment-1813830603 @HeartSaVioR

[PR] [SPARK-45511] fix state reader suite flakiness by clean up resources after each test run [spark]

2023-11-15 Thread via GitHub
chaoqin-li1123 opened a new pull request, #43831: URL: https://github.com/apache/spark/pull/43831 ### What changes were proposed in this pull request? Fix state reader suite flakiness by clean up resources after each test. Because all state store instance share the same

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-11-15 Thread via GitHub
panbingkun commented on PR #37588: URL: https://github.com/apache/spark/pull/37588#issuecomment-1813824544 > thanks, merging to master! Thank you again for your great help! ❤️❤️❤️

Re: [PR] [SPARK-45927][PYTHON] Update path handling for Python data source [spark]

2023-11-15 Thread via GitHub
cloud-fan commented on code in PR #43809: URL: https://github.com/apache/spark/pull/43809#discussion_r1395160860 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -246,7 +246,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession)

Re: [PR] [SPARK-45927][PYTHON] Update path handling for Python data source [spark]

2023-11-15 Thread via GitHub
cloud-fan commented on code in PR #43809: URL: https://github.com/apache/spark/pull/43809#discussion_r1395160402 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -246,7 +246,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession)

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-11-15 Thread via GitHub
cloud-fan closed pull request #37588: [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 URL: https://github.com/apache/spark/pull/37588

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-11-15 Thread via GitHub
cloud-fan commented on PR #37588: URL: https://github.com/apache/spark/pull/37588#issuecomment-1813802035 thanks, merging to master!

[PR] [SPARK-45764][PYTHON][DOCS][3.3] Make code block copyable [spark]

2023-11-15 Thread via GitHub
panbingkun opened a new pull request, #43830: URL: https://github.com/apache/spark/pull/43830 ### What changes were proposed in this pull request? The pr aims to make code block `copyable `in pyspark docs. Backport above to `branch 3.3`. Master branch pr:

[PR] [WIP][INFRA] Test PyArrow 14 [spark]

2023-11-15 Thread via GitHub
zhengruifeng opened a new pull request, #43829: URL: https://github.com/apache/spark/pull/43829 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [SPARK-45764][PYTHON][DOCS][3.4] Make code block copyable [spark]

2023-11-15 Thread via GitHub
panbingkun opened a new pull request, #43828: URL: https://github.com/apache/spark/pull/43828 ### What changes were proposed in this pull request? The pr aims to make code block `copyable `in pyspark docs. Backport above to `branch 3.4`. Master branch pr:

Re: [PR] [SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation [spark]

2023-11-15 Thread via GitHub
HeartSaVioR closed pull request #43788: [SPARK-45747][SS] Use prefix key information in state metadata to handle reading state for session window aggregation URL: https://github.com/apache/spark/pull/43788

Re: [PR] [SPARK-45764][PYTHON][DOCS][3.5] Make code block copyable [spark]

2023-11-15 Thread via GitHub
panbingkun commented on PR #43827: URL: https://github.com/apache/spark/pull/43827#issuecomment-1813732360 I am making backports for other branches: branch-3.3, branch-3.4.

Re: [PR] [SPARK-45827][SQL] Fix variant parquet reader. [spark]

2023-11-15 Thread via GitHub
cloud-fan closed pull request #43825: [SPARK-45827][SQL] Fix variant parquet reader. URL: https://github.com/apache/spark/pull/43825

Re: [PR] [SPARK-45827][SQL] Fix variant parquet reader. [spark]

2023-11-15 Thread via GitHub
cloud-fan commented on PR #43825: URL: https://github.com/apache/spark/pull/43825#issuecomment-1813730404 thanks, merging to master!

[PR] [SPARK-45764][PYTHON][DOCS][3.5] Make code block copyable [spark]

2023-11-15 Thread via GitHub
panbingkun opened a new pull request, #43827: URL: https://github.com/apache/spark/pull/43827 ### What changes were proposed in this pull request? The pr aims to make code block `copyable `in pyspark docs. The pr is backporting to `branch 3.5`. ### Why are the changes

[PR] [SPARK-45945][CONNECT] Add a helper function for `parser` [spark]

2023-11-15 Thread via GitHub
zhengruifeng opened a new pull request, #43826: URL: https://github.com/apache/spark/pull/43826 ### What changes were proposed in this pull request? Add a helper function for `parser` ### Why are the changes needed? we don't use other parser in planner, add this helper just

Re: [PR] [SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on PR #43354: URL: https://github.com/apache/spark/pull/43354#issuecomment-1813722799 @vsevolodstep-db I found that after moving MavenUtilsSuite.scala to the common-utils module, it cannot pass the test. Do you know why? The current GA does not test this case

Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core/unsafe/network_common` module in `module.py` [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on code in PR #43818: URL: https://github.com/apache/spark/pull/43818#discussion_r1395096661 ## dev/sparktestsupport/modules.py: ## @@ -178,7 +178,7 @@ def __hash__(self): core = Module( name="core", -dependencies=[kvstore, network_common,

Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core/unsafe/network_common` module in `module.py` [spark]

2023-11-15 Thread via GitHub
zhengruifeng commented on code in PR #43818: URL: https://github.com/apache/spark/pull/43818#discussion_r1395096255 ## dev/sparktestsupport/modules.py: ## @@ -113,6 +113,14 @@ def __hash__(self): ], ) +utils = Module( Review Comment: yeah, this is python :)

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-15 Thread via GitHub
panbingkun commented on PR #43799: URL: https://github.com/apache/spark/pull/43799#issuecomment-1813716352 > @panbingkun would you mind creating a backporting PR? Actually yeah I think it's an important improvement in docs. Okay, Let me do it.

Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core` module in `module.py` [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on code in PR #43818: URL: https://github.com/apache/spark/pull/43818#discussion_r1395095578 ## dev/sparktestsupport/modules.py: ## @@ -113,6 +113,14 @@ def __hash__(self): ], ) +utils = Module( Review Comment: Moving it is because of

Re: [PR] [SPARK-45938][INFRA] Add `utils` to the dependencies of the `core` module in `module.py` [spark]

2023-11-15 Thread via GitHub
zhengruifeng commented on code in PR #43818: URL: https://github.com/apache/spark/pull/43818#discussion_r1395092781 ## dev/sparktestsupport/modules.py: ## @@ -178,7 +178,7 @@ def __hash__(self): core = Module( name="core", -dependencies=[kvstore, network_common,

Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on PR #43796: URL: https://github.com/apache/spark/pull/43796#issuecomment-1813710853 rebased

Re: [PR] [MINOR] Fix some typo [spark]

2023-11-15 Thread via GitHub
HyukjinKwon closed pull request #43724: [MINOR] Fix some typo URL: https://github.com/apache/spark/pull/43724

Re: [PR] [SPARK-45922][CONNECT][CLIENT] Minor retries refactoring (follow-up to multiple policies) [spark]

2023-11-15 Thread via GitHub
HyukjinKwon commented on PR #43800: URL: https://github.com/apache/spark/pull/43800#issuecomment-1813710341 Mind retriggering https://github.com/cdkrot/apache_spark/actions/runs/6877183050/job/18704368968? I think it might be related.

Re: [PR] [MINOR] Fix some typo [spark]

2023-11-15 Thread via GitHub
HyukjinKwon commented on PR #43724: URL: https://github.com/apache/spark/pull/43724#issuecomment-1813710430 Merged to master.

Re: [PR] [SPARK-45562][DOCS] Regenerate `docs/sql-error-conditions.md` and add `42KDF` to `SQLSTATE table` in `error/README.md` [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on PR #43817: URL: https://github.com/apache/spark/pull/43817#issuecomment-1813709899 Thanks @dongjoon-hyun @HyukjinKwon @beliefer @sandip-db

Re: [PR] [SPARK-45930][SQL] Support non-deterministic UDFs in MapInPandas/MapInArrow [spark]

2023-11-15 Thread via GitHub
HyukjinKwon closed pull request #43810: [SPARK-45930][SQL] Support non-deterministic UDFs in MapInPandas/MapInArrow URL: https://github.com/apache/spark/pull/43810

Re: [PR] [SPARK-45930][SQL] Support non-deterministic UDFs in MapInPandas/MapInArrow [spark]

2023-11-15 Thread via GitHub
HyukjinKwon commented on PR #43810: URL: https://github.com/apache/spark/pull/43810#issuecomment-1813707722 Merged to master.

Re: [PR] [SPARK-44488][SQL] Support deserializing long types when creating `Metadata` object from JObject [spark]

2023-11-15 Thread via GitHub
HyukjinKwon commented on PR #42083: URL: https://github.com/apache/spark/pull/42083#issuecomment-1813704504 It will be available from 4.0.0 most likely.

Re: [PR] [SPARK-45533][CORE] Use j.l.r.Cleaner instead of finalize for RocksDBIterator/LevelDBIterator [spark]

2023-11-15 Thread via GitHub
LuciferYang commented on code in PR #43502: URL: https://github.com/apache/spark/pull/43502#discussion_r1395081107 ## common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java: ## @@ -182,23 +193,34 @@ public boolean skip(long n) { @Override public

Re: [PR] [SPARK-45873][CORE][YARN][K8S] Make ExecutorFailureTracker more tolerant when app remains sufficient resources [spark]

2023-11-15 Thread via GitHub
yaooqinn commented on PR #43746: URL: https://github.com/apache/spark/pull/43746#issuecomment-1813660864 > What do you mean by this, are you saying the Spark on YARN handling of preempted containers is not working properly? Meaning if the container is preempted it should not show up as an

Re: [PR] [SPARK-45931][PYTHON][DOCS] Refine docstring of mapInPandas [spark]

2023-11-15 Thread via GitHub
HyukjinKwon closed pull request #43811: [SPARK-45931][PYTHON][DOCS] Refine docstring of mapInPandas URL: https://github.com/apache/spark/pull/43811

Re: [PR] [SPARK-45931][PYTHON][DOCS] Refine docstring of mapInPandas [spark]

2023-11-15 Thread via GitHub
HyukjinKwon commented on PR #43811: URL: https://github.com/apache/spark/pull/43811#issuecomment-1813617078 Merged to master.

Re: [PR] [SPARK-45936][PS] Optimize `Index.symmetric_difference` [spark]

2023-11-15 Thread via GitHub
HyukjinKwon closed pull request #43816: [SPARK-45936][PS] Optimize `Index.symmetric_difference` URL: https://github.com/apache/spark/pull/43816

Re: [PR] [SPARK-45936][PS] Optimize `Index.symmetric_difference` [spark]

2023-11-15 Thread via GitHub
HyukjinKwon commented on PR #43816: URL: https://github.com/apache/spark/pull/43816#issuecomment-1813613763 Merged to master.

Re: [PR] [SPARK-45935][PYTHON][DOCS] Fix RST files link substitutions error [spark]

2023-11-15 Thread via GitHub
panbingkun commented on code in PR #43815: URL: https://github.com/apache/spark/pull/43815#discussion_r1395047884 ## python/docs/source/conf.py: ## @@ -102,9 +102,9 @@ .. |examples| replace:: Examples .. _examples:

Re: [PR] [SPARK-45827] Fix variant parquet reader. [spark]

2023-11-15 Thread via GitHub
chenhao-db commented on code in PR #43825: URL: https://github.com/apache/spark/pull/43825#discussion_r1395045528 ## sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala: ## @@ -73,5 +73,12 @@ class VariantSuite extends QueryTest with SharedSparkSession {

Re: [PR] [SPARK-45827] Fix variant parquet reader. [spark]

2023-11-15 Thread via GitHub
cloud-fan commented on code in PR #43825: URL: https://github.com/apache/spark/pull/43825#discussion_r1395044163 ## sql/core/src/test/scala/org/apache/spark/sql/VariantSuite.scala: ## @@ -73,5 +73,12 @@ class VariantSuite extends QueryTest with SharedSparkSession {

Re: [PR] [SPARK-33393][SQL] Support SHOW TABLE EXTENDED in v2 [spark]

2023-11-15 Thread via GitHub
panbingkun commented on PR #37588: URL: https://github.com/apache/spark/pull/37588#issuecomment-1813555489 @cloud-fan If you have time, could you please take a look at this PR?

Re: [PR] [SPARK-45827] Fix variant parquet reader. [spark]

2023-11-15 Thread via GitHub
chenhao-db commented on PR #43825: URL: https://github.com/apache/spark/pull/43825#issuecomment-1813534407 @cloud-fan @HyukjinKwon could you help take a look? Thanks!

Re: [PR] [SPARK-44699][CORE] Add log when finished write events to file in EventLogFileWriter.closeWriter [spark]

2023-11-15 Thread via GitHub
github-actions[bot] commented on PR #42372: URL: https://github.com/apache/spark/pull/42372#issuecomment-1813504639 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [Spark Ticket][WIP]Added a warning to pop up in the case the user doesn't use gpus [spark]

2023-11-15 Thread via GitHub
github-actions[bot] commented on PR #42308: URL: https://github.com/apache/spark/pull/42308#issuecomment-1813504691 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-44685][SQL] Remove deprecated Catalog#createExternalTable [spark]

2023-11-15 Thread via GitHub
github-actions[bot] commented on PR #42356: URL: https://github.com/apache/spark/pull/42356#issuecomment-1813504669 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-45525][SQL][PYTHON] Initial support for Python data source write [spark]

2023-11-15 Thread via GitHub
allisonwang-db commented on PR #43791: URL: https://github.com/apache/spark/pull/43791#issuecomment-1813486370 @cloud-fan @HyukjinKwon @ueshin This PR is ready for review. It focuses on the optimizer/execution part of data source write and is independent of the DataFrameWriter.

Re: [PR] [SPARK-45592][SPARK-45282][SQL] Correctness issue in AQE with InMemoryTableScanExec [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43760: URL: https://github.com/apache/spark/pull/43760#issuecomment-1813474266 For the record, I landed at branch-3.4 after resolving conflicts.

Re: [PR] [SPARK-45935][PYTHON][DOCS] Fix RST files link substitutions error [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43815: URL: https://github.com/apache/spark/pull/43815#discussion_r1394963625 ## python/docs/source/conf.py: ## @@ -102,9 +102,9 @@ .. |examples| replace:: Examples .. _examples:

Re: [PR] [SPARK-45930][SQL] Support non-deterministic UDFs in MapInPandas/MapInArrow [spark]

2023-11-15 Thread via GitHub
allisonwang-db commented on PR #43810: URL: https://github.com/apache/spark/pull/43810#issuecomment-1813393808 cc @cloud-fan

[PR] [SPARK-45827] Fix variant parquet reader. [spark]

2023-11-15 Thread via GitHub
chenhao-db opened a new pull request, #43825: URL: https://github.com/apache/spark/pull/43825 ## What changes were proposed in this pull request? This is a follow-up of https://github.com/apache/spark/pull/43707. The previous PR missed a piece in the variant parquet reader: we are

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813372467 I also cherry-picked this to branch-3.5.

Re: [PR] [SPARK-44488][SQL] Support deserializing long types when creating `Metadata` object from JObject [spark]

2023-11-15 Thread via GitHub
scottsand-db commented on PR #42083: URL: https://github.com/apache/spark/pull/42083#issuecomment-1813363910 Will this make Apache Spark 3.6 release? Or 4.0?

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813346533 Also, thank you, @yaooqinn and @bjornjorgensen , too.

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun closed pull request #43814: [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout URL: https://github.com/apache/spark/pull/43814

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813345234 Thank you so much, @huaxingao . Merged to master.

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
huaxingao commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813344307 LGTM Thanks @dongjoon-hyun

Re: [PR] [SPARK-45719][K8S][TESTS] Upgrade AWS SDK to v2 for Kubernetes IT [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43510: URL: https://github.com/apache/spark/pull/43510#issuecomment-1813343142 Welcome to the Apache Spark community, @junyuc25 ! I added you to the Apache Spark contributor group and assigned SPARK-45719 to you.

Re: [PR] [SPARK-45719][K8S][TESTS] Upgrade AWS SDK to v2 for Kubernetes IT [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun closed pull request #43510: [SPARK-45719][K8S][TESTS] Upgrade AWS SDK to v2 for Kubernetes IT URL: https://github.com/apache/spark/pull/43510

Re: [PR] [SARK-45866][SQL] Fix for Reuse of Exchange in AQE not happening when DPP filters are pushed down to the underlying Scan (like iceberg) [spark]

2023-11-15 Thread via GitHub
ahshahid commented on PR #43824: URL: https://github.com/apache/spark/pull/43824#issuecomment-1813334777 I will add the documentation to the new methods in next commit

[PR] [SARK-45866][SQL] Fix for Reuse of Exchange in AQE not happening when DPP filters are pushed down to the underlying Scan (like iceberg) [spark]

2023-11-15 Thread via GitHub
ahshahid opened a new pull request, #43824: URL: https://github.com/apache/spark/pull/43824 ### What changes were proposed in this pull request? The main change in this PR is to augment the trait of SupportsRuntimeV2Filtering by adding two new methods `default boolean

Re: [PR] [SPARK-45762][CORE] Support shuffle managers defined in user jars by changing startup order [spark]

2023-11-15 Thread via GitHub
abellina commented on code in PR #43627: URL: https://github.com/apache/spark/pull/43627#discussion_r1394870890 ## core/src/main/scala/org/apache/spark/SparkEnv.scala: ## @@ -415,6 +418,11 @@ object SparkEnv extends Logging { advertiseAddress, blockManagerPort,

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813319975 Could you review this `Spark Standalone` documentation PR when you have some time, @huaxingao ?

Re: [PR] [SPARK-45856] Move ArtifactManager from Spark Connect into SparkSession (sql/core) [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43735: URL: https://github.com/apache/spark/pull/43735#discussion_r1394847206 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -243,6 +244,16 @@ class SparkSession private( @Unstable def streams:

Re: [PR] [SPARK-45762][CORE] Support shuffle managers defined in user jars by changing startup order [spark]

2023-11-15 Thread via GitHub
tgravescs commented on code in PR #43627: URL: https://github.com/apache/spark/pull/43627#discussion_r1394837773 ## core/src/main/scala/org/apache/spark/SparkEnv.scala: ## @@ -415,6 +418,11 @@ object SparkEnv extends Logging { advertiseAddress, blockManagerPort,

Re: [PR] [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table [spark]

2023-11-15 Thread via GitHub
ueshin commented on PR #43682: URL: https://github.com/apache/spark/pull/43682#issuecomment-1813313871 Thanks! merging to master.

Re: [PR] [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table [spark]

2023-11-15 Thread via GitHub
ueshin closed pull request #43682: [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table URL: https://github.com/apache/spark/pull/43682

Re: [PR] [SPARK-45868][CONNECT] Make sure `spark.table` use the same parser with vanilla spark [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43741: URL: https://github.com/apache/spark/pull/43741#issuecomment-1813308124 Merged to master. Thank you, @zhengruifeng and all!

Re: [PR] [SPARK-45868][CONNECT] Make sure `spark.table` use the same parser with vanilla spark [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun closed pull request #43741: [SPARK-45868][CONNECT] Make sure `spark.table` use the same parser with vanilla spark URL: https://github.com/apache/spark/pull/43741

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813302230 > Thank you for fixing the documentation for K8S and Standalone :) Thanks, but I'm going to proceed with the K8s part in a new JIRA because of the previous comment.

Re: [PR] [SPARK-45941][PS] Upgrade `pandas` to version 2.1.3 [spark]

2023-11-15 Thread via GitHub
bjornjorgensen commented on PR #43822: URL: https://github.com/apache/spark/pull/43822#issuecomment-1813301006 Thank you @dongjoon-hyun

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
bjornjorgensen commented on PR #43814: URL: https://github.com/apache/spark/pull/43814#issuecomment-1813297066 Thank you for fixing the documentation for K8S and Standalone :)

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
bjornjorgensen commented on code in PR #43814: URL: https://github.com/apache/spark/pull/43814#discussion_r1394833447 ## docs/running-on-kubernetes.md: ## @@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for information on Spark config 3.0.0 -

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43814: URL: https://github.com/apache/spark/pull/43814#discussion_r1394832622 ## docs/running-on-kubernetes.md: ## @@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for information on Spark config 3.0.0 -

Re: [PR] [SPARK-45925][SQL] Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec [spark]

2023-11-15 Thread via GitHub
ahshahid commented on PR #43807: URL: https://github.com/apache/spark/pull/43807#issuecomment-1813289805 @beliefer I think you may be right. In another PR of mine, for broadcast-var-pushdown, I am seeing unmodified SubqueryAdaptiveBroadcastExec in the stage cache's keys. Maybe it is an issue

Re: [PR] [SPARK-45925][SQL] Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec [spark]

2023-11-15 Thread via GitHub
ahshahid closed pull request #43807: [SPARK-45925][SQL] Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec URL: https://github.com/apache/spark/pull/43807

Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub
ahshahid closed pull request #43806: [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec URL: https://github.com/apache/spark/pull/43806

[PR] [SPARK-45942][Core] Only do the thread interruption check for putIterator on executors [spark]

2023-11-15 Thread via GitHub
huanliwang-db opened a new pull request, #43823: URL: https://github.com/apache/spark/pull/43823 ### What changes were proposed in this pull request? Only do the thread interruption check for putIterator on executors ### Why are the changes needed?

Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-15 Thread via GitHub
ahshahid commented on PR #43806: URL: https://github.com/apache/spark/pull/43806#issuecomment-1813288994 @beliefer I think you may be right. In another PR of mine, for broadcast-var-pushdown, I am seeing unmodified SubqueryAdaptiveBroadcastExec in the stage cache's keys. Maybe it is an issue

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2023-11-15 Thread via GitHub
tgravescs commented on code in PR #43494: URL: https://github.com/apache/spark/pull/43494#discussion_r1384051957 ## core/src/main/scala/org/apache/spark/resource/ResourceUtils.scala: ## @@ -170,16 +170,16 @@ private[spark] object ResourceUtils extends Logging { // integer

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43814: URL: https://github.com/apache/spark/pull/43814#discussion_r1394826363 ## docs/running-on-kubernetes.md: ## @@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for information on Spark config 3.0.0 -

Re: [PR] [SPARK-45934][DOCS] Fix `Spark Standalone` documentation table layout [spark]

2023-11-15 Thread via GitHub
dongjoon-hyun commented on code in PR #43814: URL: https://github.com/apache/spark/pull/43814#discussion_r1394824967 ## docs/running-on-kubernetes.md: ## @@ -1203,17 +1203,17 @@ See the [configuration page](configuration.html) for information on Spark config 3.0.0 -
