[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub
HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1727033619 https://github.com/neilramaswamy/nr-spark/actions/runs/6242426233/job/16951813562 This failure seems to be a real one. SparkThrowableSuite provides a guide to regenerate the

[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331581159 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331147608 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -414,12 +407,13 @@ object functions { * @group agg_funcs * @since 1.3.0 */ -

[GitHub] [spark] dzhigimont commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
dzhigimont commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1727427143 > @dzhigimont Can we just make the CI pass for now? I can help in the follow-ups after merging this one. > > Seems like the mypy checks are failing for now: > > ``` >

[GitHub] [spark] WeichenXu123 commented on pull request #42382: [ML] Remove usage of RDD APIs for load/save in spark-ml

2023-09-20 Thread via GitHub
WeichenXu123 commented on PR #42382: URL: https://github.com/apache/spark/pull/42382#issuecomment-1727681685 @zhengruifeng Can we make the interface `saveMetadata` support both `sparkContext` and `sparkSession` arguments? And in the Spark repo, we always pass sparkSession as the

[GitHub] [spark] HyukjinKwon commented on pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0.

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #43002: URL: https://github.com/apache/spark/pull/43002#issuecomment-1727167048 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng opened a new pull request, #43012: [SPARK-45234][PYTHON][DOCS] Refine DocString of `regr_*` functions

2023-09-20 Thread via GitHub
zhengruifeng opened a new pull request, #43012: URL: https://github.com/apache/spark/pull/43012 ### What changes were proposed in this pull request? Refine DocString of `regr_*` functions ### Why are the changes needed? fix the wildcard import ### Does this PR

[GitHub] [spark] HyukjinKwon commented on pull request #41016: [SPARK-43341][SQL] Patch StructType.toDDL not picking up on non-nullability of nested column

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #41016: URL: https://github.com/apache/spark/pull/41016#issuecomment-1727125449 @BramBoog it has conflicts against the latest master branch. You would need to resolve the conflicts by git fetch upstream & git rebase upstream/master

[GitHub] [spark] zhengruifeng commented on pull request #43003: [SPARK-45226][PYTHON][DOCS] Refine docstring of `rand/randn`

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43003: URL: https://github.com/apache/spark/pull/43003#issuecomment-1727125799 thanks, merged to master

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331134534 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -442,6 +442,10 @@ case class InSubquery(values: Seq[Expression],

[GitHub] [spark] LuciferYang opened a new pull request, #43008: [SPARK-44113][BUILD] Drop Scala 2.12 Support

2023-09-20 Thread via GitHub
LuciferYang opened a new pull request, #43008: URL: https://github.com/apache/spark/pull/43008 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon opened a new pull request, #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
HyukjinKwon opened a new pull request, #43013: URL: https://github.com/apache/spark/pull/43013 ### What changes were proposed in this pull request? This PR proposes to add a couple of notes about which modules are supported by Spark Connect. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727296619 Build: https://github.com/HyukjinKwon/spark/actions/runs/6245820107/job/16955248336

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331079422 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -442,6 +442,10 @@ case class InSubquery(values: Seq[Expression],

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub
bjornjorgensen commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1331229836 ## .github/workflows/build_coverage.yml: ## @@ -17,7 +17,7 @@ # under the License. # -name: "Build / Coverage (master, Scala 2.12, Hadoop 3, JDK 8)" +name:

[GitHub] [spark] hdaikoku commented on pull request #42426: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-20 Thread via GitHub
hdaikoku commented on PR #42426: URL: https://github.com/apache/spark/pull/42426#issuecomment-1727835491 > I think `SparkUncaughtExceptionHandler` should catch this OOM exception and abort the executor. I'm not sure if I'm following this. For this particular case, OOM was actually

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331113924 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df =

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331693297 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42979: [SPARK-45035][SQL] Fix ignoreCorruptFiles with multiline CSV/JSON will report error

2023-09-20 Thread via GitHub
Hisoka-X commented on code in PR #42979: URL: https://github.com/apache/spark/pull/42979#discussion_r1331553477 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala: ## @@ -190,12 +191,19 @@ object MultiLineCSVDataSource extends

[GitHub] [spark] zhengruifeng closed pull request #43003: [SPARK-45226][PYTHON][DOCS] Refine docstring of `rand/randn`

2023-09-20 Thread via GitHub
zhengruifeng closed pull request #43003: [SPARK-45226][PYTHON][DOCS] Refine docstring of `rand/randn` URL: https://github.com/apache/spark/pull/43003

[GitHub] [spark] dzhigimont commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
dzhigimont commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1331412813 ## python/pyspark/pandas/indexes/datetimes.py: ## @@ -214,28 +215,8 @@ def microsecond(self) -> Index: ) return

[GitHub] [spark] panbingkun commented on pull request #42993: [SPARK-45231][INFRA] Remove unrecognized and meaningless command about amm from the GA testing workflow.

2023-09-20 Thread via GitHub
panbingkun commented on PR #42993: URL: https://github.com/apache/spark/pull/42993#issuecomment-1727147543 cc @vicennial @dongjoon-hyun

[GitHub] [spark] amaliujia opened a new pull request, #43010: [WIP]

2023-09-20 Thread via GitHub
amaliujia opened a new pull request, #43010: URL: https://github.com/apache/spark/pull/43010 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331311772 ## sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala: ## @@ -723,11 +728,14 @@ object IntegratedUDFTestUtils extends SQLHelper {

[GitHub] [spark] hvanhovell commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
hvanhovell commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331644650 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] zhengruifeng opened a new pull request, #43011: [SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng opened a new pull request, #43011: URL: https://github.com/apache/spark/pull/43011 ### What changes were proposed in this pull request? Add missing function groups to SQL references: - xml_funcs - lambda_funcs - collection_funcs - url_funcs - hash_funcsx

[GitHub] [spark] dzhigimont commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
dzhigimont commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1331413958 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331247912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] MaxGekk commented on pull request #42996: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()`

2023-09-20 Thread via GitHub
MaxGekk commented on PR #42996: URL: https://github.com/apache/spark/pull/42996#issuecomment-1727192557 Merging to master. Thank you, @HyukjinKwon and @cloud-fan for review.

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement FetchErrorDetails RPC

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1323060159 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/ClientStreamingQuerySuite.scala: ## @@ -175,6 +175,36 @@ class ClientStreamingQuerySuite

[GitHub] [spark] beliefer commented on a diff in pull request #42612: [SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method

2023-09-20 Thread via GitHub
beliefer commented on code in PR #42612: URL: https://github.com/apache/spark/pull/42612#discussion_r1331177757 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -279,7 +283,9 @@ case class StaticInvoke( inputTypes:

[GitHub] [spark] HyukjinKwon closed pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0.

2023-09-20 Thread via GitHub
HyukjinKwon closed pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0. URL: https://github.com/apache/spark/pull/43002

[GitHub] [spark] dongjoon-hyun commented on pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43007: URL: https://github.com/apache/spark/pull/43007#issuecomment-1727060848 Thank you for revising the PR title. Since `core` module test passed and I verified manually, let me merge this.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331200011 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43014: URL: https://github.com/apache/spark/pull/43014#discussion_r1331636427 ## python/pyspark/sql/connect/plan.py: ## @@ -1049,21 +1049,23 @@ def __init__(self, query: str, args: Optional[Union[Dict[str, Any], List]] = Non

[GitHub] [spark] cloud-fan closed pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-20 Thread via GitHub
cloud-fan closed pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case URL: https://github.com/apache/spark/pull/42971

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331113924 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df =

[GitHub] [spark] cxzl25 commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
cxzl25 commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331640480 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] dzhigimont commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub
dzhigimont commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1727680820 > @dzhigimont Can we just make the CI pass for now? I can help in the follow-ups after merging this one. > > Seems like the mypy checks are failing for now: > > ``` >

[GitHub] [spark] LuciferYang opened a new pull request, #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub
LuciferYang opened a new pull request, #43015: URL: https://github.com/apache/spark/pull/43015 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331301998 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala: ## @@ -49,7 +44,12 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331145071 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala: ## @@ -49,7 +44,12 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331143291 ## sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala: ## @@ -723,11 +728,14 @@ object IntegratedUDFTestUtils extends SQLHelper {

[GitHub] [spark] LuciferYang commented on pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub
LuciferYang commented on PR #43015: URL: https://github.com/apache/spark/pull/43015#issuecomment-1727645034 cc @dongjoon-hyun FYI I think this one needs to be backported to branch-3.4 and branch-3.5

[GitHub] [spark] beliefer commented on a diff in pull request #42612: [SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method

2023-09-20 Thread via GitHub
beliefer commented on code in PR #42612: URL: https://github.com/apache/spark/pull/42612#discussion_r1331177757 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -279,7 +283,9 @@ case class StaticInvoke( inputTypes:

[GitHub] [spark] LuciferYang commented on a diff in pull request #43008: [WIP][SPARK-44113][BUILD][INFRA][DOCS] Drop support for Scala 2.12

2023-09-20 Thread via GitHub
LuciferYang commented on code in PR #43008: URL: https://github.com/apache/spark/pull/43008#discussion_r1331207849 ## dev/change-scala-version.sh: ## @@ -19,7 +19,7 @@ set -e -VALID_VERSIONS=( 2.12 2.13 ) +VALID_VERSIONS=( 2.13 ) Review Comment: No further

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-20 Thread via GitHub
LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1727552095 Thanks @hvanhovell

[GitHub] [spark] zhengruifeng commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727400614 `pyspark.ml.connect` only supports a small subset of `pyspark.ml`

[GitHub] [spark] HyukjinKwon commented on pull request #42916: [MiNOR][DOCS] Fix a typo in HashAggregateExec.scala

2023-09-20 Thread via GitHub
HyukjinKwon commented on PR #42916: URL: https://github.com/apache/spark/pull/42916#issuecomment-1727123248 We have our own logic to detect forked repositories' GitHub Actions runs. You would need to go to settings in your forked repo, and enable it. For now, seems I can't find the GitHub

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331297747 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/ClientStreamingQuerySuite.scala: ## @@ -175,6 +175,37 @@ class ClientStreamingQuerySuite

[GitHub] [spark] yaooqinn commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
yaooqinn commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331623060 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331149367 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -414,12 +407,13 @@ object functions { * @group agg_funcs * @since 1.3.0 */ -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331213799 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331110271 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -708,7 +708,7 @@ private[sql] object RelationalGroupedDataset { case

[GitHub] [spark] itholic commented on a diff in pull request #42994: [SPARK-43433][PS] Match `GroupBy.nth` behavior to the latest Pandas

2023-09-20 Thread via GitHub
itholic commented on code in PR #42994: URL: https://github.com/apache/spark/pull/42994#discussion_r1331072003 ## python/pyspark/pandas/groupby.py: ## @@ -1155,14 +1152,32 @@ def nth(self, n: int) -> FrameLike: else: sdf =

[GitHub] [spark] LuciferYang commented on pull request #43008: [SPARK-44113][BUILD][INFRA][DOCS] Drop support for Scala 2.12

2023-09-20 Thread via GitHub
LuciferYang commented on PR #43008: URL: https://github.com/apache/spark/pull/43008#issuecomment-1727210378 cc @dongjoon-hyun FYI Do we need to further split this PR?

[GitHub] [spark] dongjoon-hyun closed pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage

2023-09-20 Thread via GitHub
dongjoon-hyun closed pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage URL: https://github.com/apache/spark/pull/43007

[GitHub] [spark] zhengruifeng commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub
zhengruifeng commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727396313 `/mllib` in scala, `pyspark.ml` and `pyspark.mllib` in python, don't work on connect. only new module `pyspark.ml.connect` works on connect. `pyspark.ml` contains many

[GitHub] [spark] heyihong opened a new pull request, #43017: [SPARK-45239] Reduce default spark.connect.jvmStacktrace.maxSize

2023-09-20 Thread via GitHub
heyihong opened a new pull request, #43017: URL: https://github.com/apache/spark/pull/43017 ### What changes were proposed in this pull request? - Reduce default spark.connect.jvmStacktrace.maxSize ### Why are the changes needed? -

[GitHub] [spark] cloud-fan commented on pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1727813318 thanks, merging to master!

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331785918 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331894962 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] dongjoon-hyun opened a new pull request, #43018: [SPARK-45241][INFRA] Use Zulu JDK in java-other-versions pipeline and Java 21

2023-09-20 Thread via GitHub
dongjoon-hyun opened a new pull request, #43018: URL: https://github.com/apache/spark/pull/43018 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] yaooqinn commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
yaooqinn commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331832544 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] dongjoon-hyun commented on pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub
dongjoon-hyun commented on PR #43015: URL: https://github.com/apache/spark/pull/43015#issuecomment-1727988603 Merged to master/3.5/3.4.

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331901152 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db opened a new pull request, #43019: [SPARK-45219][PYTHON][DOCS] Refine docstring of withColumn(s)Renamed

2023-09-20 Thread via GitHub
allisonwang-db opened a new pull request, #43019: URL: https://github.com/apache/spark/pull/43019 ### What changes were proposed in this pull request? This PR refines the docstring of `DataFrame.withColumnRenamed` and `DataFrame.withColumnsRenamed`. ### Why are the

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42997: URL: https://github.com/apache/spark/pull/42997#discussion_r1331943027 ## python/pyspark/sql/connect/functions.py: ## @@ -388,7 +390,7 @@ def rand(seed: Optional[int] = None) -> Column: if seed is not None: return

[GitHub] [spark] yaooqinn opened a new pull request, #43016: [SPARK-45077][UI][FOLLOWUP] Update comment to link the forked repo yaooqinn/dagre-d3

2023-09-20 Thread via GitHub
yaooqinn opened a new pull request, #43016: URL: https://github.com/apache/spark/pull/43016 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] MaxGekk opened a new pull request, #43014: [WIP][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub
MaxGekk opened a new pull request, #43014: URL: https://github.com/apache/spark/pull/43014 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] MaxGekk closed pull request #42996: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()`

2023-09-20 Thread via GitHub
MaxGekk closed pull request #42996: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()` URL: https://github.com/apache/spark/pull/42996

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331213799 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] LuciferYang commented on pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub
LuciferYang commented on PR #43005: URL: https://github.com/apache/spark/pull/43005#issuecomment-1727201913 wait https://github.com/apache/spark/pull/43008

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331247912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331929198 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331077927 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df =

[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub
cloud-fan commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331684745 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] cloud-fan commented on pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-20 Thread via GitHub
cloud-fan commented on PR #42971: URL: https://github.com/apache/spark/pull/42971#issuecomment-1727363616 The failed streaming test is unrelated, I'm merging it to master/3.5, thanks!

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331900167 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331899412 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331897585 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331925178 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331251817 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] wankunde opened a new pull request, #43009: [SPARK-45230][SQL] Plan sorter for Aggregate after SMJ

2023-09-20 Thread via GitHub
wankunde opened a new pull request, #43009: URL: https://github.com/apache/spark/pull/43009 ### What changes were proposed in this pull request? This PR could be a followup of https://github.com/apache/spark/pull/42488 and https://github.com/apache/spark/pull/42557. If

[GitHub] [spark] cloud-fan closed pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
cloud-fan closed pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions URL: https://github.com/apache/spark/pull/42864

[GitHub] [spark] dongjoon-hyun closed pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub
dongjoon-hyun closed pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB` URL: https://github.com/apache/spark/pull/43015

[GitHub] [spark] peter-toth commented on pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub
peter-toth commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1728048096 Thanks for the review!

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub
allisonwang-db commented on code in PR #43014: URL: https://github.com/apache/spark/pull/43014#discussion_r1331920397 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -1237,13 +1237,23 @@ def test_sql(self): self.assertEqual(1, len(pdf.index))

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub
zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331200011 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub
heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331693297 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] LuciferYang commented on a diff in pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub
LuciferYang commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1331245458 ## .github/workflows/build_coverage.yml: ## @@ -17,7 +17,7 @@ # under the License. # -name: "Build / Coverage (master, Scala 2.12, Hadoop 3, JDK 8)" +name:

[GitHub] [spark] cloud-fan commented on pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub
cloud-fan commented on PR #42997: URL: https://github.com/apache/spark/pull/42997#issuecomment-1727800532 cc @HyukjinKwon

[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub
cdkrot commented on PR #42949: URL: https://github.com/apache/spark/pull/42949#issuecomment-1727917945 Updated fork's master

[GitHub] [spark] hdaikoku commented on a diff in pull request #42426: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-20 Thread via GitHub
hdaikoku commented on code in PR #42426: URL: https://github.com/apache/spark/pull/42426#discussion_r1331742974 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ## @@ -274,7 +287,13 @@ private void
