[GitHub] [spark] zhengruifeng commented on a diff in pull request #43010: [SPARK-41086][SQL] Use DataFrame ID to semantically validate CollectMetrics

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43010: URL: https://github.com/apache/spark/pull/43010#discussion_r1332491666 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -990,6 +990,9 @@ message CollectMetrics { // (Required) The metric

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43010: [SPARK-41086][SQL] Use DataFrame ID to semantically validate CollectMetrics

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43010: URL: https://github.com/apache/spark/pull/43010#discussion_r1332489254 ## python/pyspark/sql/connect/plan.py: ## @@ -1197,6 +1197,7 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation:

[GitHub] [spark] cloud-fan commented on a diff in pull request #42612: [SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42612: URL: https://github.com/apache/spark/pull/42612#discussion_r1332486360 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -270,6 +271,8 @@ object SerializerSupport { *

[GitHub] [spark] cloud-fan commented on a diff in pull request #43010: [SPARK-41086][SQL] Use DataFrame ID to semantically validate CollectMetrics

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #43010: URL: https://github.com/apache/spark/pull/43010#discussion_r1332484719 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala: ## @@ -779,34 +779,35 @@ class AnalysisSuite extends AnalysisTest with

[GitHub] [spark] cloud-fan commented on a diff in pull request #43010: [SPARK-41086][SQL] Use DataFrame ID to semantically validate CollectMetrics

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #43010: URL: https://github.com/apache/spark/pull/43010#discussion_r1332484136 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1097,17 +1097,15 @@ trait CheckAnalysis extends PredicateHelper

[GitHub] [spark] cloud-fan commented on a diff in pull request #43010: [SPARK-41086][SQL] Use DataFrame ID to semantically validate CollectMetrics

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #43010: URL: https://github.com/apache/spark/pull/43010#discussion_r1332483781 ## python/pyspark/sql/connect/plan.py: ## @@ -1197,6 +1197,7 @@ def plan(self, session: "SparkConnectClient") -> proto.Relation:

[GitHub] [spark] itholic opened a new pull request, #43024: [SPARK-45246][BUILD][PS] Encourage using latest `jinja2` other than documentation build

2023-09-20 Thread via GitHub
itholic opened a new pull request, #43024: URL: https://github.com/apache/spark/pull/43024 ### What changes were proposed in this pull request? This PR proposes to update the `jinja2` version from `dev/requirements.txt` to the latest version for PySpark. ### Why are the

[GitHub] [spark] rangadi commented on a diff in pull request #43023: [SPARK-45245] PythonWorkerFactory: Timeout if worker does not connect back.

2023-09-20 Thread via GitHub
rangadi commented on code in PR #43023: URL: https://github.com/apache/spark/pull/43023#discussion_r1332415585 ## core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala: ## @@ -184,10 +185,20 @@ private[spark] class PythonWorkerFactory(

[GitHub] [spark] rangadi commented on pull request #43023: [SPARK-45245] PythonWorkerFactory: Timeout if worker does not connect back.

2023-09-20 Thread via GitHub
rangadi commented on PR #43023: URL: https://github.com/apache/spark/pull/43023#issuecomment-1728734215 cc: @HyukjinKwon, @WweiL -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] rangadi opened a new pull request, #43023: [SPARK-45245] PythonWorkerFactory: Timeout if worker does not connect back.

2023-09-20 Thread via GitHub
rangadi opened a new pull request, #43023: URL: https://github.com/apache/spark/pull/43023 ### What changes were proposed in this pull request? `createSimpleWorker()` method in `PythonWorkerFactory` waits forever if the worker fails to connect back to the server. This is
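The failure mode described in this entry (a server waiting forever for a worker that never connects back) can be illustrated with a bounded `accept()`. The following is only a minimal sketch in Python, not the code from the PR: the helper name `accept_with_timeout` and the error message are hypothetical, and the actual change lives in the Scala `PythonWorkerFactory`.

```python
import socket


def accept_with_timeout(server: socket.socket, timeout_s: float) -> socket.socket:
    """Wait for one inbound connection, giving up after timeout_s seconds.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    # Bound the blocking accept() call instead of waiting indefinitely.
    server.settimeout(timeout_s)
    try:
        conn, _addr = server.accept()
    except socket.timeout as exc:
        # Surface a clear error instead of hanging when no worker connects.
        raise RuntimeError(
            f"worker did not connect back within {timeout_s} seconds"
        ) from exc
    finally:
        # Restore blocking mode on the listening socket for later callers.
        server.settimeout(None)
    # Make sure the accepted connection itself uses blocking I/O.
    conn.settimeout(None)
    return conn
```

A caller would bind and listen on a local port, launch the worker process, and then call `accept_with_timeout(server, timeout_s)`; if the worker never connects, the caller gets an exception it can report rather than a thread stuck in `accept()`.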

[GitHub] [spark] zwangsheng commented on pull request #43022: [SPARK-45244][IT] Correct spelling in VolcanoTestsSuite

2023-09-20 Thread via GitHub
zwangsheng commented on PR #43022: URL: https://github.com/apache/spark/pull/43022#issuecomment-1728726734 friendly ping @Yikun, PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zwangsheng opened a new pull request, #43022: [SPARK-45244][IT] Correct spelling in VolcanoTestsSuite

2023-09-20 Thread via GitHub
zwangsheng opened a new pull request, #43022: URL: https://github.com/apache/spark/pull/43022 ### What changes were proposed in this pull request? ### Why are the changes needed? Correct a typo in VolcanoTestsSuite, which names methods with `checkAnnotaion`.

[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1728709440 UPDATE: we sorted the issue with the JIRA account out now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1728702545 Please send the mail to priv...@spark.apache.org and look for action from PMC members. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
itholic commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1332390021 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] neilramaswamy commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
neilramaswamy commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1728699388 My account hasn't been approved yet (submitted request 8 days ago). I vaguely remember there being an email/channel to ping about this, but I can't find the name. Do you recall?

[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1728695227 @neilramaswamy Could you please give your JIRA account so that I can add you as contributor and assign the ticket to you? Thanks! -- This is an automated message from the Apache

[GitHub] [spark] HeartSaVioR closed pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR closed pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails URL: https://github.com/apache/spark/pull/42895 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1728693932 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1728693569 Looks like these test failures are flaky. As long as I see the previous test failure is gone, I could say this PR passes all tests. -- This is an automated message from the Apache

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
itholic commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1332376664 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] itholic commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
itholic commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1332376264 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] HeartSaVioR commented on pull request #42940: [SPARK-45178][SS] Fallback to execute a single batch for Trigger.AvailableNow with unsupported sources rather than using wrapper

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42940: URL: https://github.com/apache/spark/pull/42940#issuecomment-1728653787 https://lists.apache.org/thread/ljronxf6bymvqjmlwpzy84gzgvnqrmoh ^^^ DISCUSSION thread in dev@. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #43019: [SPARK-45219][PYTHON][DOCS] Refine docstring of withColumn(s)Renamed

2023-09-20 Thread via GitHub
HyukjinKwon commented on code in PR #43019: URL: https://github.com/apache/spark/pull/43019#discussion_r1332312402 ## python/pyspark/sql/dataframe.py: ## @@ -5797,25 +5798,53 @@ def withColumnRenamed(self, existing: str, new: str) -> "DataFrame": Parameters

[GitHub] [spark] zhengruifeng commented on pull request #43011: [WIP][SPARK-45232][DOCS] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43011: URL: https://github.com/apache/spark/pull/43011#issuecomment-1728625579 we can check the documents built in the GA of this PR, https://github.com/zhengruifeng/spark/actions/runs/6249096629

[GitHub] [spark] zhengruifeng commented on pull request #43011: [WIP][SPARK-45232][DOCS] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43011: URL: https://github.com/apache/spark/pull/43011#issuecomment-1728624460 ![image](https://github.com/apache/spark/assets/7322292/09ae6ec9-a2a6-4f00-b260-6c680ff3)

[GitHub] [spark] zhengruifeng commented on pull request #42382: [ML] Remove usage of RDD APIs for load/save in spark-ml

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #42382: URL: https://github.com/apache/spark/pull/42382#issuecomment-1728616265 what about adding an implicit conversion `sc -> spark` in ml: ml only uses `spark`; 3rd-party libs will need to import this implicit conversion; or just keep two

[GitHub] [spark] xiongbo-sjtu opened a new pull request, #43021: [SPARK-45227][CORE] Fix an issue with CoarseGrainedExecutorBackend wh…

2023-09-20 Thread via GitHub
xiongbo-sjtu opened a new pull request, #43021: URL: https://github.com/apache/spark/pull/43021 ### What changes were proposed in this pull request? Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck ### Why are the

[GitHub] [spark] HyukjinKwon closed pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
HyukjinKwon closed pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference URL: https://github.com/apache/spark/pull/43013 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] HyukjinKwon commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1728598145 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1728597889 Yeah. We should probably document `pyspark.ml.connect` separately but there haven't been examples out yet. Let's document that separately. -- This is an automated message from the

[GitHub] [spark] HyukjinKwon closed pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
HyukjinKwon closed pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs URL: https://github.com/apache/spark/pull/42997 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon commented on pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #42997: URL: https://github.com/apache/spark/pull/42997#issuecomment-1728596271 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] gatorsmile commented on pull request #43011: [WIP][SPARK-45232][DOCS] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
gatorsmile commented on PR #43011: URL: https://github.com/apache/spark/pull/43011#issuecomment-1728585646 cc @srielau -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] github-actions[bot] commented on pull request #41384: [SPARK-43297][ML]Use scala parallel collection ParVector to accelarate LocalKMeans.

2023-09-20 Thread via GitHub
github-actions[bot] commented on PR #41384: URL: https://github.com/apache/spark/pull/41384#issuecomment-1728583712 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] zhengruifeng commented on pull request #43011: [WIP][SPARK-45232][DOCS] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43011: URL: https://github.com/apache/spark/pull/43011#issuecomment-1728581850 @allisonwang-db I am not sure, I don't see document for FROM clause, you may check 3 places: - https://spark.apache.org/docs/latest/api/sql/index.html#explode -

[GitHub] [spark] itholic commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
itholic commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1728575805 Yeah, the mypy check is always tricky. Let me take a look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] heyihong commented on pull request #43017: [SPARK-45239] Reduce default spark.connect.jvmStacktrace.maxSize

2023-09-20 Thread via GitHub
heyihong commented on PR #43017: URL: https://github.com/apache/spark/pull/43017#issuecomment-1728573383 > Hi @heyihong , since you already have this error enrichment PR: #42987 Does it make sense to still support this `spark.connect.jvmStacktrace.maxSize` in spark connect? Also, just

[GitHub] [spark] zhengruifeng commented on pull request #42860: [SPARK-45107][PYTHON][DOCS] Refine docstring of explode

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #42860: URL: https://github.com/apache/spark/pull/42860#issuecomment-1728571438 thanks, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #42860: [SPARK-45107][PYTHON][DOCS] Refine docstring of explode

2023-09-20 Thread via GitHub
zhengruifeng closed pull request #42860: [SPARK-45107][PYTHON][DOCS] Refine docstring of explode URL: https://github.com/apache/spark/pull/42860 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] zhengruifeng commented on pull request #43001: [SPARK-45218][PYTHON][DOCS] Refine docstring of Column.isin

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43001: URL: https://github.com/apache/spark/pull/43001#issuecomment-1728570548 thanks, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #43001: [SPARK-45218][PYTHON][DOCS] Refine docstring of Column.isin

2023-09-20 Thread via GitHub
zhengruifeng closed pull request #43001: [SPARK-45218][PYTHON][DOCS] Refine docstring of Column.isin URL: https://github.com/apache/spark/pull/43001 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] zhengruifeng commented on pull request #43012: [SPARK-45234][PYTHON][DOCS] Refine DocString of `regr_*` functions

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43012: URL: https://github.com/apache/spark/pull/43012#issuecomment-1728568181 thanks, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng closed pull request #43012: [SPARK-45234][PYTHON][DOCS] Refine DocString of `regr_*` functions

2023-09-20 Thread via GitHub
zhengruifeng closed pull request #43012: [SPARK-45234][PYTHON][DOCS] Refine DocString of `regr_*` functions URL: https://github.com/apache/spark/pull/43012 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43014: URL: https://github.com/apache/spark/pull/43014#discussion_r1332291118 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -1237,13 +1237,23 @@ def test_sql(self): self.assertEqual(1, len(pdf.index)) def

[GitHub] [spark] HyukjinKwon commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #42949: URL: https://github.com/apache/spark/pull/42949#issuecomment-1728565241 Mind running `dev/reformat-python` one more last time? Seems like there's a diff between old and new black versions. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] xiongbo-sjtu closed pull request #43020: [SPARK-45227][CORE] Fix an issue with CoarseGrainedExecutorBackend

2023-09-20 Thread via GitHub
xiongbo-sjtu closed pull request #43020: [SPARK-45227][CORE] Fix an issue with CoarseGrainedExecutorBackend URL: https://github.com/apache/spark/pull/43020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] xiongbo-sjtu opened a new pull request, #43020: [SPARK-45227][CORE] Fix an issue with CoarseGrainedExecutorBackend

2023-09-20 Thread via GitHub
xiongbo-sjtu opened a new pull request, #43020: URL: https://github.com/apache/spark/pull/43020 ### What changes were proposed in this pull request? Fix a subtle thread-safety issue with CoarseGrainedExecutorBackend where an executor process randomly gets stuck ### Why are the

[GitHub] [spark] dongjoon-hyun commented on pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0.

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43002: URL: https://github.com/apache/spark/pull/43002#issuecomment-1728516043 +1, LGTM, too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] amaliujia commented on pull request #43010: [SPARK-41086][SQL] Use DataFrame ID to semantically validate CollectMetrics

2023-09-20 Thread via GitHub
amaliujia commented on PR #43010: URL: https://github.com/apache/spark/pull/43010#issuecomment-1728458545 @cloud-fan trying this idea -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun commented on pull request #42993: [SPARK-45231][INFRA] Remove unrecognized and meaningless command about `Ammonite` from the GA testing workflow.

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #42993: URL: https://github.com/apache/spark/pull/42993#issuecomment-1728412993 cc @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] dongjoon-hyun commented on pull request #43008: [SPARK-44113][BUILD][INFRA][DOCS] Drop support for Scala 2.12

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43008: URL: https://github.com/apache/spark/pull/43008#issuecomment-1728334196 I'm good with this single PR. Could you resolve the conflict? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun closed pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun closed pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21 URL: https://github.com/apache/spark/pull/43018 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43018: URL: https://github.com/apache/spark/pull/43018#issuecomment-1728265731 The one Parquet test failure. I verified manually. It seems to be a flaky test. ``` $ java -version openjdk version "21" 2023-09-19 LTS ... $ build/sbt

[GitHub] [spark] dongjoon-hyun commented on pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43018: URL: https://github.com/apache/spark/pull/43018#issuecomment-1728258545 Thank you, @viirya . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] viirya commented on pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21

2023-09-20 Thread via GitHub
viirya commented on PR #43018: URL: https://github.com/apache/spark/pull/43018#issuecomment-1728214949 Looks good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] robreeves closed pull request #43006: [SPARK-TODO][CORE][SQL] Eliminate intermediate staging directory for dynamic partition overwrite commits

2023-09-20 Thread via GitHub
robreeves closed pull request #43006: [SPARK-TODO][CORE][SQL] Eliminate intermediate staging directory for dynamic partition overwrite commits URL: https://github.com/apache/spark/pull/43006 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43018: URL: https://github.com/apache/spark/pull/43018#issuecomment-1728175236 Could you review this PR, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #43018: [SPARK-45241][INFRA] Use Zulu JDK in `build_and_test` and `build_java21` GitHub Action and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43018: URL: https://github.com/apache/spark/pull/43018#issuecomment-1728174982 Java 21 passed. ![Screenshot 2023-09-20 at 10 39 23  AM](https://github.com/apache/spark/assets/9700541/ee99d4e7-cad4-4ad3-9324-631a00b7730a) -- This is an automated

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Error Enrichment for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331977811 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] sunchao commented on a diff in pull request #42612: [SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method

2023-09-20 Thread via GitHub
sunchao commented on code in PR #42612: URL: https://github.com/apache/spark/pull/42612#discussion_r1331964553 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -279,7 +283,9 @@ case class StaticInvoke( inputTypes:

[GitHub] [spark] peter-toth commented on a diff in pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42997: URL: https://github.com/apache/spark/pull/42997#discussion_r1331955820 ## python/pyspark/sql/connect/functions.py: ## @@ -388,7 +390,7 @@ def rand(seed: Optional[int] = None) -> Column: if seed is not None: return

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42997: URL: https://github.com/apache/spark/pull/42997#discussion_r1331943027 ## python/pyspark/sql/connect/functions.py: ## @@ -388,7 +390,7 @@ def rand(seed: Optional[int] = None) -> Column: if seed is not None: return

[GitHub] [spark] allisonwang-db opened a new pull request, #43019: [SPARK-45219][PYTHON][DOCS] Refine docstring of withColumn(s)Renamed

2023-09-20 Thread via GitHub
allisonwang-db opened a new pull request, #43019: URL: https://github.com/apache/spark/pull/43019 ### What changes were proposed in this pull request? This PR refines the docstring of `DataFrame.withColumnRenamed` and `DataFrame.withColumnsRenamed`. ### Why are the

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331929198 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331925178 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #43014: URL: https://github.com/apache/spark/pull/43014#discussion_r1331920397 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -1237,13 +1237,23 @@ def test_sql(self): self.assertEqual(1, len(pdf.index))

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331900167 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331897585 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331899412 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331901403 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331900167 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331901152 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] peter-toth commented on pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1728048096 Thanks for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] dongjoon-hyun closed pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub
dongjoon-hyun closed pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB` URL: https://github.com/apache/spark/pull/43015 -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] dongjoon-hyun opened a new pull request, #43018: [SPARK-45241][INFRA] Use Zulu JDK in java-other-versions pipeline and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun opened a new pull request, #43018: URL: https://github.com/apache/spark/pull/43018 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun commented on pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43015: URL: https://github.com/apache/spark/pull/43015#issuecomment-1727988603 Merged to master/3.5/3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] yaooqinn commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
yaooqinn commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331832544 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331894962 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] cloud-fan closed pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan closed pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions URL: https://github.com/apache/spark/pull/42864 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331785918 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on PR #42949: URL: https://github.com/apache/spark/pull/42949#issuecomment-1727917945 Updated fork's master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] hdaikoku commented on a diff in pull request #42426: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-20 Thread via GitHub
hdaikoku commented on code in PR #42426: URL: https://github.com/apache/spark/pull/42426#discussion_r1331742974 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ## @@ -274,7 +287,13 @@ private void

[GitHub] [spark] cloud-fan commented on pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1727813318 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331247912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] heyihong opened a new pull request, #43017: [SPARK-45239] Reduce default spark.connect.jvmStacktrace.maxSize

2023-09-20 Thread via GitHub
heyihong opened a new pull request, #43017: URL: https://github.com/apache/spark/pull/43017 ### What changes were proposed in this pull request? - Reduce default spark.connect.jvmStacktrace.maxSize ### Why are the changes needed? -

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331693297 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] cloud-fan commented on pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
cloud-fan commented on PR #42997: URL: https://github.com/apache/spark/pull/42997#issuecomment-1727800532 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] LuciferYang commented on a diff in pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub
LuciferYang commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1331245458 ## .github/workflows/build_coverage.yml: ## @@ -17,7 +17,7 @@ # under the License. # -name: "Build / Coverage (master, Scala 2.12, Hadoop 3, JDK 8)" +name:

[GitHub] [spark] zhengruifeng commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727396313 `/mllib` in scala, `pyspark.ml` and `pyspark.mllib` in python, don't work on connect. only new module `pyspark.ml.connect` works on connect. `pyspark.ml` contains many

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331247912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] cloud-fan commented on pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-20 Thread via GitHub
cloud-fan commented on PR #42971: URL: https://github.com/apache/spark/pull/42971#issuecomment-1727363616 The failed streaming test is unrelated, I'm merging it to master/3.5, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on pull request #43008: [SPARK-44113][BUILD][INFRA][DOCS] Drop support for Scala 2.12

2023-09-20 Thread via GitHub
LuciferYang commented on PR #43008: URL: https://github.com/apache/spark/pull/43008#issuecomment-1727210378 cc @dongjoon-hyun FYI Do we need to further split this PR ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] dongjoon-hyun closed pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage

2023-09-20 Thread via GitHub
dongjoon-hyun closed pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage URL: https://github.com/apache/spark/pull/43007 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] LuciferYang commented on pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub
LuciferYang commented on PR #43005: URL: https://github.com/apache/spark/pull/43005#issuecomment-1727201913 wait https://github.com/apache/spark/pull/43008 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331684745 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =
