date:20230920

[GitHub] [spark] HeartSaVioR commented on pull request #42895: [SPARK-45138][SS] Define a new error class and apply it when checkpointing state to DFS fails

2023-09-20 Thread via GitHub

HeartSaVioR commented on PR #42895: URL: https://github.com/apache/spark/pull/42895#issuecomment-1727033619 https://github.com/neilramaswamy/nr-spark/actions/runs/6242426233/job/16951813562 This failure seems to be a real one. SparkThrowableSuite provides a guide to regenerate the

[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub

cloud-fan commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331581159 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331147608 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -414,12 +407,13 @@ object functions { * @group agg_funcs * @since 1.3.0 */ -

[GitHub] [spark] dzhigimont commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub

dzhigimont commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1727427143 > @dzhigimont Can we just make the CI pass for now? I can help in the follow-ups after merging this one. > > Seems like the mypy checks are failing for now: > > ``` >

[GitHub] [spark] WeichenXu123 commented on pull request #42382: [ML] Remove usage of RDD APIs for load/save in spark-ml

2023-09-20 Thread via GitHub

WeichenXu123 commented on PR #42382: URL: https://github.com/apache/spark/pull/42382#issuecomment-1727681685 @zhengruifeng Can we make the interface `saveMetadata` support both `sparkContext` and `sparkSession` arguments? And in the Spark repo, we always pass sparkSession as the

[GitHub] [spark] HyukjinKwon commented on pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0.

2023-09-20 Thread via GitHub

HyukjinKwon commented on PR #43002: URL: https://github.com/apache/spark/pull/43002#issuecomment-1727167048 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng opened a new pull request, #43012: [SPARK-45234][PYTHON][DOCS] Refine DocString of `regr_*` functions

2023-09-20 Thread via GitHub

zhengruifeng opened a new pull request, #43012: URL: https://github.com/apache/spark/pull/43012 ### What changes were proposed in this pull request? Refine DocString of `regr_*` functions ### Why are the changes needed? fix the wildcard import ### Does this PR

[GitHub] [spark] HyukjinKwon commented on pull request #41016: [SPARK-43341][SQL] Patch StructType.toDDL not picking up on non-nullability of nested column

2023-09-20 Thread via GitHub

HyukjinKwon commented on PR #41016: URL: https://github.com/apache/spark/pull/41016#issuecomment-1727125449 @BramBoog it has conflicts against the latest master branch. You would need to resolve the conflicts by git fetch upstream & git rebase upstream/master

[GitHub] [spark] zhengruifeng commented on pull request #43003: [SPARK-45226][PYTHON][DOCS] Refine docstring of `rand/randn`

2023-09-20 Thread via GitHub

zhengruifeng commented on PR #43003: URL: https://github.com/apache/spark/pull/43003#issuecomment-1727125799 thanks, merged to master

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331134534 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -442,6 +442,10 @@ case class InSubquery(values: Seq[Expression],

[GitHub] [spark] LuciferYang opened a new pull request, #43008: [SPARK-44113][BUILD] Drop Scala 2.12 Support

2023-09-20 Thread via GitHub

LuciferYang opened a new pull request, #43008: URL: https://github.com/apache/spark/pull/43008 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon opened a new pull request, #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub

HyukjinKwon opened a new pull request, #43013: URL: https://github.com/apache/spark/pull/43013 ### What changes were proposed in this pull request? This PR proposes to add a couple of notes about which modules are supported by Spark Connect. ### Why are the changes needed?

[GitHub] [spark] HyukjinKwon commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub

HyukjinKwon commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727296619 Build: https://github.com/HyukjinKwon/spark/actions/runs/6245820107/job/16955248336

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331079422 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -442,6 +442,10 @@ case class InSubquery(values: Seq[Expression],

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub

bjornjorgensen commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1331229836 ## .github/workflows/build_coverage.yml: ## @@ -17,7 +17,7 @@ # under the License. # -name: "Build / Coverage (master, Scala 2.12, Hadoop 3, JDK 8)" +name:

[GitHub] [spark] hdaikoku commented on pull request #42426: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-20 Thread via GitHub

hdaikoku commented on PR #42426: URL: https://github.com/apache/spark/pull/42426#issuecomment-1727835491 > I think `SparkUncaughtExceptionHandler` should catch this OOM exception and abort the executor. I'm not sure if I'm following this. For this particular case, OOM was actually

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331113924 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df =

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331693297 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] Hisoka-X commented on a diff in pull request #42979: [SPARK-45035][SQL] Fix ignoreCorruptFiles with multiline CSV/JSON will report error

2023-09-20 Thread via GitHub

Hisoka-X commented on code in PR #42979: URL: https://github.com/apache/spark/pull/42979#discussion_r1331553477 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala: ## @@ -190,12 +191,19 @@ object MultiLineCSVDataSource extends

[GitHub] [spark] zhengruifeng closed pull request #43003: [SPARK-45226][PYTHON][DOCS] Refine docstring of `rand/randn`

2023-09-20 Thread via GitHub

zhengruifeng closed pull request #43003: [SPARK-45226][PYTHON][DOCS] Refine docstring of `rand/randn` URL: https://github.com/apache/spark/pull/43003

[GitHub] [spark] dzhigimont commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub

dzhigimont commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1331412813 ## python/pyspark/pandas/indexes/datetimes.py: ## @@ -214,28 +215,8 @@ def microsecond(self) -> Index: ) return

[GitHub] [spark] panbingkun commented on pull request #42993: [SPARK-45231][INFRA] Remove unrecognized and meaningless command about amm from the GA testing workflow.

2023-09-20 Thread via GitHub

panbingkun commented on PR #42993: URL: https://github.com/apache/spark/pull/42993#issuecomment-1727147543 cc @vicennial @dongjoon-hyun

[GitHub] [spark] amaliujia opened a new pull request, #43010: [WIP]

2023-09-20 Thread via GitHub

amaliujia opened a new pull request, #43010: URL: https://github.com/apache/spark/pull/43010 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331311772 ## sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala: ## @@ -723,11 +728,14 @@ object IntegratedUDFTestUtils extends SQLHelper {

[GitHub] [spark] hvanhovell commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

hvanhovell commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331644650 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] zhengruifeng opened a new pull request, #43011: [SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub

zhengruifeng opened a new pull request, #43011: URL: https://github.com/apache/spark/pull/43011 ### What changes were proposed in this pull request? Add missing function groups to SQL references: - xml_funcs - lambda_funcs - collection_funcs - url_funcs - hash_funcsx

[GitHub] [spark] dzhigimont commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub

dzhigimont commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1331413958 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +117,59 @@ def pandas_microsecond(s) -> ps.Series[np.int32]: # type: ignore[no-untyped-def def

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331247912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] MaxGekk commented on pull request #42996: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()`

2023-09-20 Thread via GitHub

MaxGekk commented on PR #42996: URL: https://github.com/apache/spark/pull/42996#issuecomment-1727192557 Merging to master. Thank you, @HyukjinKwon and @cloud-fan for review.

[GitHub] [spark] heyihong commented on a diff in pull request #42377: [SPARK-44622][SQL][CONNECT] Implement FetchErrorDetails RPC

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42377: URL: https://github.com/apache/spark/pull/42377#discussion_r1323060159 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/ClientStreamingQuerySuite.scala: ## @@ -175,6 +175,36 @@ class ClientStreamingQuerySuite

[GitHub] [spark] beliefer commented on a diff in pull request #42612: [SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method

2023-09-20 Thread via GitHub

beliefer commented on code in PR #42612: URL: https://github.com/apache/spark/pull/42612#discussion_r1331177757 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala: ## @@ -279,7 +283,9 @@ case class StaticInvoke( inputTypes:

[GitHub] [spark] HyukjinKwon closed pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0.

2023-09-20 Thread via GitHub

HyukjinKwon closed pull request #43002: [SPARK-43498][PS][TESTS] Enable `StatsTests.test_axis_on_dataframe` for pandas 2.0.0. URL: https://github.com/apache/spark/pull/43002

[GitHub] [spark] dongjoon-hyun commented on pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage

2023-09-20 Thread via GitHub

dongjoon-hyun commented on PR #43007: URL: https://github.com/apache/spark/pull/43007#issuecomment-1727060848 Thank you for revising the PR title. Since `core` module test passed and I verified manually, let me merge this.

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331200011 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #43014: URL: https://github.com/apache/spark/pull/43014#discussion_r1331636427 ## python/pyspark/sql/connect/plan.py: ## @@ -1049,21 +1049,23 @@ def __init__(self, query: str, args: Optional[Union[Dict[str, Any], List]] = Non

[GitHub] [spark] cloud-fan closed pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-20 Thread via GitHub

cloud-fan closed pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case URL: https://github.com/apache/spark/pull/42971

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] cxzl25 commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub

cxzl25 commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331640480 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] dzhigimont commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-09-20 Thread via GitHub

dzhigimont commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1727680820 > @dzhigimont Can we just make the CI pass for now? I can help in the follow-ups after merging this one. > > Seems like the mypy checks are failing for now: > > ``` >

[GitHub] [spark] LuciferYang opened a new pull request, #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub

LuciferYang opened a new pull request, #43015: URL: https://github.com/apache/spark/pull/43015 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331301998 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala: ## @@ -49,7 +44,12 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331145071 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingSymmetricHashJoinHelperSuite.scala: ## @@ -49,7 +44,12 @@ class

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331143291 ## sql/core/src/test/scala/org/apache/spark/sql/IntegratedUDFTestUtils.scala: ## @@ -723,11 +728,14 @@ object IntegratedUDFTestUtils extends SQLHelper {

[GitHub] [spark] LuciferYang commented on pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub

LuciferYang commented on PR #43015: URL: https://github.com/apache/spark/pull/43015#issuecomment-1727645034 cc @dongjoon-hyun FYI I think this one needs to be backported to branch-3.4 and branch-3.5

[GitHub] [spark] LuciferYang commented on a diff in pull request #43008: [WIP][SPARK-44113][BUILD][INFRA][DOCS] Drop support for Scala 2.12

2023-09-20 Thread via GitHub

LuciferYang commented on code in PR #43008: URL: https://github.com/apache/spark/pull/43008#discussion_r1331207849 ## dev/change-scala-version.sh: ## @@ -19,7 +19,7 @@ set -e -VALID_VERSIONS=( 2.12 2.13 ) +VALID_VERSIONS=( 2.13 ) Review Comment: No further

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-20 Thread via GitHub

LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1727552095 Thanks @hvanhovell

[GitHub] [spark] zhengruifeng commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub

zhengruifeng commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727400614 `pyspark.ml.connect` only supports a small subset of `pyspark.ml`

[GitHub] [spark] HyukjinKwon commented on pull request #42916: [MiNOR][DOCS] Fix a typo in HashAggregateExec.scala

2023-09-20 Thread via GitHub

HyukjinKwon commented on PR #42916: URL: https://github.com/apache/spark/pull/42916#issuecomment-1727123248 We have our own logic to detect forked repositories' GitHub Actions runs. You would need to go to settings in your forked repo, and enable it. For now, seems I can't find the GitHub

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331297747 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/ClientStreamingQuerySuite.scala: ## @@ -175,6 +175,37 @@ class ClientStreamingQuerySuite

[GitHub] [spark] yaooqinn commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub

yaooqinn commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331623060 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331149367 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -414,12 +407,13 @@ object functions { * @group agg_funcs * @since 1.3.0 */ -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331213799 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] peter-toth commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

peter-toth commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331110271 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -708,7 +708,7 @@ private[sql] object RelationalGroupedDataset { case

[GitHub] [spark] itholic commented on a diff in pull request #42994: [SPARK-43433][PS] Match `GroupBy.nth` behavior to the latest Pandas

2023-09-20 Thread via GitHub

itholic commented on code in PR #42994: URL: https://github.com/apache/spark/pull/42994#discussion_r1331072003 ## python/pyspark/pandas/groupby.py: ## @@ -1155,14 +1152,32 @@ def nth(self, n: int) -> FrameLike: else: sdf =

[GitHub] [spark] LuciferYang commented on pull request #43008: [SPARK-44113][BUILD][INFRA][DOCS] Drop support for Scala 2.12

2023-09-20 Thread via GitHub

LuciferYang commented on PR #43008: URL: https://github.com/apache/spark/pull/43008#issuecomment-1727210378 cc @dongjoon-hyun FYI Do we need to further split this PR?

[GitHub] [spark] dongjoon-hyun closed pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage

2023-09-20 Thread via GitHub

dongjoon-hyun closed pull request #43007: [SPARK-45229][CORE][UI] Show the number of drivers waiting in SUBMITTED status in MasterPage URL: https://github.com/apache/spark/pull/43007

[GitHub] [spark] zhengruifeng commented on pull request #43013: [MINOR][DOCS][CONNECT] Update notes about supported modules in PySpark API reference

2023-09-20 Thread via GitHub

zhengruifeng commented on PR #43013: URL: https://github.com/apache/spark/pull/43013#issuecomment-1727396313 `/mllib` in Scala, and `pyspark.ml` and `pyspark.mllib` in Python, don't work on Connect. Only the new module `pyspark.ml.connect` works on Connect. `pyspark.ml` contains many

[GitHub] [spark] heyihong opened a new pull request, #43017: [SPARK-45239] Reduce default spark.connect.jvmStacktrace.maxSize

2023-09-20 Thread via GitHub

heyihong opened a new pull request, #43017: URL: https://github.com/apache/spark/pull/43017 ### What changes were proposed in this pull request? - Reduce default spark.connect.jvmStacktrace.maxSize ### Why are the changes needed? -

[GitHub] [spark] cloud-fan commented on pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

cloud-fan commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1727813318 thanks, merging to master!

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331785918 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331894962 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] dongjoon-hyun opened a new pull request, #43018: [SPARK-45241][INFRA] Use Zulu JDK in java-other-versions pipeline and Java 21

2023-09-20 Thread via GitHub

dongjoon-hyun opened a new pull request, #43018: URL: https://github.com/apache/spark/pull/43018 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] yaooqinn commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub

yaooqinn commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331832544 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] dongjoon-hyun commented on pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub

dongjoon-hyun commented on PR #43015: URL: https://github.com/apache/spark/pull/43015#issuecomment-1727988603 Merged to master/3.5/3.4.

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331901152 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db opened a new pull request, #43019: [SPARK-45219][PYTHON][DOCS] Refine docstring of withColumn(s)Renamed

2023-09-20 Thread via GitHub

allisonwang-db opened a new pull request, #43019: URL: https://github.com/apache/spark/pull/43019 ### What changes were proposed in this pull request? This PR refines the docstring of `DataFrame.withColumnRenamed` and `DataFrame.withColumnsRenamed`. ### Why are the

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub

allisonwang-db commented on code in PR #42997: URL: https://github.com/apache/spark/pull/42997#discussion_r1331943027 ## python/pyspark/sql/connect/functions.py: ## @@ -388,7 +390,7 @@ def rand(seed: Optional[int] = None) -> Column: if seed is not None: return

[GitHub] [spark] yaooqinn opened a new pull request, #43016: [SPARK-45077][UI][FOLLOWUP] Update comment to link the forked repo yaooqinn/dagre-d3

2023-09-20 Thread via GitHub

yaooqinn opened a new pull request, #43016: URL: https://github.com/apache/spark/pull/43016 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] MaxGekk opened a new pull request, #43014: [WIP][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub

MaxGekk opened a new pull request, #43014: URL: https://github.com/apache/spark/pull/43014 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] MaxGekk closed pull request #42996: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()`

2023-09-20 Thread via GitHub

MaxGekk closed pull request #42996: [SPARK-45224][PYTHON] Add examples w/ map and array as parameters of `sql()` URL: https://github.com/apache/spark/pull/42996

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331213799 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] LuciferYang commented on pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub

LuciferYang commented on PR #43005: URL: https://github.com/apache/spark/pull/43005#issuecomment-1727201913 wait https://github.com/apache/spark/pull/43008 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331243346 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala: ## @@ -93,33 +179,65 @@ private[client] object

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331247912 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331929198 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1331077927 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df =

[GitHub] [spark] cloud-fan commented on a diff in pull request #42199: [SPARK-44579][SQL] Support Interrupt On Cancel in SQLExecution

2023-09-20 Thread via GitHub

cloud-fan commented on code in PR #42199: URL: https://github.com/apache/spark/pull/42199#discussion_r1331684745 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala: ## @@ -77,6 +79,11 @@ object SQLExecution { } val rootExecutionId =

[GitHub] [spark] cloud-fan commented on pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-20 Thread via GitHub

cloud-fan commented on PR #42971: URL: https://github.com/apache/spark/pull/42971#issuecomment-1727363616 The failed streaming test is unrelated, I'm merging it to master/3.5, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cdkrot commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

cdkrot commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331900167 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file: bool

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331899412 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] allisonwang-db commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

allisonwang-db commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1331897585 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -243,11 +244,15 @@ def _create_requests( self, *path: str, pyfile: bool, archive: bool, file:

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub

allisonwang-db commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331925178 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331251817 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] wankunde opened a new pull request, #43009: [SPARK-45230][SQL] Plan sorter for Aggregate after SMJ

2023-09-20 Thread via GitHub

wankunde opened a new pull request, #43009: URL: https://github.com/apache/spark/pull/43009 ### What changes were proposed in this pull request? This PR could be a followup of https://github.com/apache/spark/pull/42488 and https://github.com/apache/spark/pull/42557. If

[GitHub] [spark] cloud-fan closed pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

cloud-fan closed pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions URL: https://github.com/apache/spark/pull/42864 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] dongjoon-hyun closed pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB`

2023-09-20 Thread via GitHub

dongjoon-hyun closed pull request #43015: [SPARK-45237][DOCS] Change the default value of `spark.history.store.hybridStore.diskBackend` in `monitoring.md` to `ROCKSDB` URL: https://github.com/apache/spark/pull/43015 -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] peter-toth commented on pull request #42864: [SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-20 Thread via GitHub

peter-toth commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1728048096 Thanks for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] allisonwang-db commented on a diff in pull request #43014: [SPARK-45235][CONNECT][PYTHON] Support map and array parameters by `sql()`

2023-09-20 Thread via GitHub

allisonwang-db commented on code in PR #43014: URL: https://github.com/apache/spark/pull/43014#discussion_r1331920397 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -1237,13 +1237,23 @@ def test_sql(self): self.assertEqual(1, len(pdf.index))

[GitHub] [spark] zhengruifeng commented on a diff in pull request #43011: [WIP][SPARK-45232][DOC] Add missing function groups to SQL references

2023-09-20 Thread via GitHub

zhengruifeng commented on code in PR #43011: URL: https://github.com/apache/spark/pull/43011#discussion_r1331200011 ## sql/gen-sql-functions-docs.py: ## @@ -34,6 +34,8 @@ "math_funcs", "conditional_funcs", "generator_funcs", "predicate_funcs", "string_funcs",

[GitHub] [spark] heyihong commented on a diff in pull request #42987: [SPARK-45207][SQL][CONNECT] Implement Complete Error Reconstruction for Scala Client

2023-09-20 Thread via GitHub

heyihong commented on code in PR #42987: URL: https://github.com/apache/spark/pull/42987#discussion_r1331693297 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2882,8 +2882,7 @@ object SQLConf { "level settings.")

[GitHub] [spark] LuciferYang commented on a diff in pull request #43005: [WIP][SPARK-44112][BUILD][INFRA] Drop Java 8 and 11 support

2023-09-20 Thread via GitHub

LuciferYang commented on code in PR #43005: URL: https://github.com/apache/spark/pull/43005#discussion_r1331245458 ## .github/workflows/build_coverage.yml: ## @@ -17,7 +17,7 @@ # under the License. # -name: "Build / Coverage (master, Scala 2.12, Hadoop 3, JDK 8)" +name:

[GitHub] [spark] cloud-fan commented on pull request #42997: [SPARK-45216][SQL] Fix non-deterministic seeded Dataset APIs

2023-09-20 Thread via GitHub

cloud-fan commented on PR #42997: URL: https://github.com/apache/spark/pull/42997#issuecomment-1727800532 cc @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] cdkrot commented on pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-20 Thread via GitHub

cdkrot commented on PR #42949: URL: https://github.com/apache/spark/pull/42949#issuecomment-1727917945 Updated fork's master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] hdaikoku commented on a diff in pull request #42426: [SPARK-44756][CORE] Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-20 Thread via GitHub

hdaikoku commented on code in PR #42426: URL: https://github.com/apache/spark/pull/42426#discussion_r1331742974 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ## @@ -274,7 +287,13 @@ private void
