date:20240311

Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-03-11 Thread via GitHub

panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1990682229 @dongjoon-hyun To be more `reliable`, I will hold off converting this PR from `draft` to `review` until GA runs successfully. I have tried the test `ClientStreamingQuerySuite` several

[PR] [SPARK-47347][PYTHON][CONNECT][TESTS] Factor session-related tests out of `test_connect_basic` [spark]

2024-03-11 Thread via GitHub

zhengruifeng opened a new pull request, #45472: URL: https://github.com/apache/spark/pull/45472 ### What changes were proposed in this pull request? Factor session-related tests out of `test_connect_basic` ### Why are the changes needed? for testing parallelism ###

Re: [PR] [SPARK-47279][CORE]When the messageLoop encounter a fatal exception, such as oom, exit the JVM to avoid the driver hanging forever [spark]

2024-03-11 Thread via GitHub

yaooqinn commented on PR #45385: URL: https://github.com/apache/spark/pull/45385#issuecomment-1990595991 Instead of handling such a special case here, JVM has provided helpful arguments to deal with OutOfMemoryError. -- This is an automated message from the Apache Git Service. To

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub

HeartSaVioR closed pull request #45360: [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families URL: https://github.com/apache/spark/pull/45360

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub

HeartSaVioR commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1990520156 Thanks! Merging to master.

Re: [PR] [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner [spark]

2024-03-11 Thread via GitHub

WweiL commented on PR #45448: URL: https://github.com/apache/spark/pull/45448#issuecomment-1990155844 Closing this since https://github.com/apache/spark/pull/45468 has the same change

Re: [PR] [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner [spark]

2024-03-11 Thread via GitHub

WweiL closed pull request #45448: [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner URL: https://github.com/apache/spark/pull/45448

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub

WweiL commented on code in PR #45468: URL: https://github.com/apache/spark/pull/45468#discussion_r1520814035 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -68,17 +68,11 @@ private[spark] class StreamingPythonRunner(

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

HeartSaVioR closed pull request #45023: [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source URL: https://github.com/apache/spark/pull/45023

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

HeartSaVioR commented on PR #45023: URL: https://github.com/apache/spark/pull/45023#issuecomment-1990058421 Thanks! Merging to master.

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520802905 ## python/pyspark/sql/datasource.py: ## @@ -426,6 +426,10 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: in the final

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub

panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1520791983 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |

[PR] [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE [spark]

2024-03-11 Thread via GitHub

yaooqinn opened a new pull request, #45471: URL: https://github.com/apache/spark/pull/45471 ### What changes were proposed in this pull request? This PR supports TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE when the `preferTimestampNTZ` option is set to true by users

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub

panbingkun commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989938050 The pr related to spark-website is here: https://github.com/apache/spark-website/pull/508

Re: [PR] [SPARK-46043][SQL][FOLLOWUP] do not resolve v2 table provider with custom session catalog [spark]

2024-03-11 Thread via GitHub

yaooqinn commented on PR #45440: URL: https://github.com/apache/spark/pull/45440#issuecomment-1989935613 Merged to master Thank you @cloud-fan @allisonwang-db

Re: [PR] [SPARK-46043][SQL][FOLLOWUP] do not resolve v2 table provider with custom session catalog [spark]

2024-03-11 Thread via GitHub

yaooqinn closed pull request #45440: [SPARK-46043][SQL][FOLLOWUP] do not resolve v2 table provider with custom session catalog URL: https://github.com/apache/spark/pull/45440

[PR] [SPARK-47344] Extend INVALID_IDENTIFIER [spark]

2024-03-11 Thread via GitHub

srielau opened a new pull request, #45470: URL: https://github.com/apache/spark/pull/45470 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub

yaooqinn commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989889832 Merged to master. Thank you all.

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub

yaooqinn closed pull request #45469: [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates URL: https://github.com/apache/spark/pull/45469

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub

panbingkun commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989795305 cc @itholic @zhengruifeng @HyukjinKwon

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub

panbingkun commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989791290 This issue has existed since version `3.4.0`. After this PR, I will submit `a patch` to fix the doc in `spark-website`.

[PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub

panbingkun opened a new pull request, #45469: URL: https://github.com/apache/spark/pull/45469 ### What changes were proposed in this pull request? The pr aims to remove the extra text `.. include:: /shared/replacements.md` on page `sql-error-conditions-sqlstates.md`. ### Why are

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

HeartSaVioR commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520711461 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -73,24 +74,31 @@ class MapStateImpl[K, V]( } /** Get the map

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

HeartSaVioR commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520713399 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -73,24 +74,31 @@ class MapStateImpl[K, V]( } /** Get the map

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520699470 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/MapStateSuite.scala: ## @@ -67,6 +67,7 @@ class MapStateSuite extends StateVariableSuiteBase

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub

MaxGekk commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1520694548 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

HeartSaVioR commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520688468 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/MapStateSuite.scala: ## @@ -67,6 +67,7 @@ class MapStateSuite extends

Re: [PR] [MINOR] Minor English fixes [spark]

2024-03-11 Thread via GitHub

nchammas commented on PR #45461: URL: https://github.com/apache/spark/pull/45461#issuecomment-1989744426 Ah, the test failure is due to the generated error documentation that is checked in to git. #44971 will eliminate this kind of maintenance headache. (Also, look at that diff

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520644679 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -163,6 +249,16 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520644342 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,46 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-03-11 Thread via GitHub

panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1989711132 > It seems that one of `ClientStreamingQuerySuite` test hangs due to the independent flakiness. Could you re-trigger it when it fails? Sure, let me continue to observe and

Re: [PR] [SPARK-47343][SQL] Fix NPE when `sqlString` variable value is null string in execute immediate [spark]

2024-03-11 Thread via GitHub

MaxGekk commented on code in PR #45462: URL: https://github.com/apache/spark/pull/45462#discussion_r1520631768 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3004,6 +3004,12 @@ ], "sqlState" : "2200E" }, + "NULL_QUERY_STRING_EXECUTE_IMMEDIATE"

Re: [PR] [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & `scalafmt` to `3.8.0` [spark]

2024-03-11 Thread via GitHub

panbingkun commented on PR #45452: URL: https://github.com/apache/spark/pull/45452#issuecomment-1989710105 > Please let me know if this is ready, @panbingkun . Yeah, it's ready

Re: [PR] [MINOR] Minor English fixes [spark]

2024-03-11 Thread via GitHub

xinrong-meng commented on PR #45461: URL: https://github.com/apache/spark/pull/45461#issuecomment-1989702036 LGTM after fixing the test, thanks!

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520618069 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520613982 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub

HyukjinKwon closed pull request #45411: [SPARK-47309][SQL][XML] Add schema inference unit tests URL: https://github.com/apache/spark/pull/45411

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub

HyukjinKwon commented on PR #45411: URL: https://github.com/apache/spark/pull/45411#issuecomment-1989688316 Merged to master.

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub

allisonwang-db commented on code in PR #45468: URL: https://github.com/apache/spark/pull/45468#discussion_r1520588014 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -68,17 +68,11 @@ private[spark] class StreamingPythonRunner(

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub

HyukjinKwon commented on PR #45468: URL: https://github.com/apache/spark/pull/45468#issuecomment-1989660024 im fine w/ this change but would defer to @ueshin

Re: [PR] [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner [spark]

2024-03-11 Thread via GitHub

HyukjinKwon commented on PR #45448: URL: https://github.com/apache/spark/pull/45448#issuecomment-1989657160 seems the test failure is related

Re: [PR] [SPARK-47341][Connect] Replace commands with relations in a few tests in SparkConnectClientSuite [spark]

2024-03-11 Thread via GitHub

HyukjinKwon closed pull request #45460: [SPARK-47341][Connect] Replace commands with relations in a few tests in SparkConnectClientSuite URL: https://github.com/apache/spark/pull/45460

Re: [PR] [SPARK-47341][Connect] Replace commands with relations in a few tests in SparkConnectClientSuite [spark]

2024-03-11 Thread via GitHub

HyukjinKwon commented on PR #45460: URL: https://github.com/apache/spark/pull/45460#issuecomment-1989654626 Merged to master.

Re: [PR] [MINOR] Minor English fixes [spark]

2024-03-11 Thread via GitHub

HyukjinKwon commented on PR #45461: URL: https://github.com/apache/spark/pull/45461#issuecomment-1989652642 I think the test failure is related: ``` [info] - Error classes match with document *** FAILED *** (145 milliseconds) [info] "...one of the DataFrame[s] but Spark is

Re: [PR] [SPARK-47327][SQL] Fix thread safety issue in ICU Collator [spark]

2024-03-11 Thread via GitHub

HyukjinKwon closed pull request #45436: [SPARK-47327][SQL] Fix thread safety issue in ICU Collator URL: https://github.com/apache/spark/pull/45436

Re: [PR] [SPARK-47327][SQL] Fix thread safety issue in ICU Collator [spark]

2024-03-11 Thread via GitHub

HyukjinKwon commented on PR #45436: URL: https://github.com/apache/spark/pull/45436#issuecomment-1989650161 Merged to master.

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-11 Thread via GitHub

wbo4958 commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1989639850 Hi @grundprinzip, Could you help review it again?

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub

ueshin commented on code in PR #45468: URL: https://github.com/apache/spark/pull/45468#discussion_r1520516470 ## core/src/main/scala/org/apache/spark/SparkEnv.scala: ## @@ -141,20 +141,22 @@ class SparkEnv ( pythonExec: String, workerModule: String,

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520520070 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -103,8 +116,12 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520519676 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,46 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub

allisonwang-db commented on PR #45468: URL: https://github.com/apache/spark/pull/45468#issuecomment-1989589046 cc @ueshin @HyukjinKwon

[PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub

allisonwang-db opened a new pull request, #45468: URL: https://github.com/apache/spark/pull/45468 ### What changes were proposed in this pull request? This PR adds an extra config to env.createPythonWorker to make daemon mode configurable to give more flexibility when

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-11 Thread via GitHub

erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1520503365 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -897,6 +897,138 @@ class AdaptiveQueryExecSuite }

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520472228 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,46 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520455477 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520450074 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

[PR] WIP [spark]

2024-03-11 Thread via GitHub

jingz-db opened a new pull request, #45467: URL: https://github.com/apache/spark/pull/45467 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub

ahshahid commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520395436 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub

shujingyang-db commented on code in PR #45411: URL: https://github.com/apache/spark/pull/45411#discussion_r1520393576 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/XmlInferSchemaSuite.scala: ## @@ -0,0 +1,296 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub

shujingyang-db commented on code in PR #45411: URL: https://github.com/apache/spark/pull/45411#discussion_r1520394446 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/TestXmlData.scala: ## @@ -68,4 +68,444 @@ private[xml] trait TestXmlData { f(dir)

[PR] [SPARK-47345] [SQL]: Xml functions suite [spark]

2024-03-11 Thread via GitHub

yhosny opened a new pull request, #45466: URL: https://github.com/apache/spark/pull/45466 ### What changes were proposed in this pull request? Convert JsonFunctionsSuite.scala to its XML equivalent. Note that XML doesn’t implement all JSON functions like json_tuple,

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520368942 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonStreamingDataSourceSuite.scala: ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub

ahshahid commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520323348 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub

ahshahid commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520314613 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45360: URL: https://github.com/apache/spark/pull/45360#discussion_r1520296947 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -582,7 +636,7 @@ class RocksDBSuite extends

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub

anishshri-db commented on code in PR #45360: URL: https://github.com/apache/spark/pull/45360#discussion_r1520296736 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -536,6 +536,67 @@ class RocksDBSuite extends

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-11 Thread via GitHub

sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1520267117 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [WIP][SPARK-46840] Add sql.execution.benchmark.CollationBenchmark.scala Scaffolding [spark]

2024-03-11 Thread via GitHub

GideonPotok commented on PR #45453: URL: https://github.com/apache/spark/pull/45453#issuecomment-1989191094 > @GideonPotok - I think that better approach for benchmarking collation track is to start with the basics. e.g. unit benchmarks against `CollationFactory` +`UTF8String`. E.g. what

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-11 Thread via GitHub

sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1520187304 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub

jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520256936 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub

chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520229021 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-47307][SQL] Add a config to optionally chunk base64 strings [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1989116494 Could you do the following to re-generate the golden files, @ted-jenks ? ``` SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"

Re: [PR] [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun commented on PR #45451: URL: https://github.com/apache/spark/pull/45451#issuecomment-1989111676 Thank you, @panbingkun .

Re: [PR] [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun closed pull request #45451: [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` URL: https://github.com/apache/spark/pull/45451

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1989108153 It seems that one of `ClientStreamingQuerySuite` test hangs due to the independent flakiness. Could you re-trigger it when it fails? ``` [info] *** Test still running after 3

Re: [PR] [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun commented on PR #45459: URL: https://github.com/apache/spark/pull/45459#issuecomment-1989103412 Merged to master. Thank you all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun closed pull request #45459: [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test URL: https://github.com/apache/spark/pull/45459 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-45827][SQL][FOLLOWUP] Fix for collation [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun commented on PR #45463: URL: https://github.com/apache/spark/pull/45463#issuecomment-1989098700 Thank you, @cashmand and all. Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-45827][SQL][FOLLOWUP] Fix for collation [spark]

2024-03-11 Thread via GitHub

dongjoon-hyun closed pull request #45463: [SPARK-45827][SQL][FOLLOWUP] Fix for collation URL: https://github.com/apache/spark/pull/45463 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] Raykim/iceberg 150 [spark]

2024-03-11 Thread via GitHub

rayhondo closed pull request #45465: Raykim/iceberg 150 URL: https://github.com/apache/spark/pull/45465 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[PR] Raykim/iceberg 150 [spark]

2024-03-11 Thread via GitHub

rayhondo opened a new pull request, #45465: URL: https://github.com/apache/spark/pull/45465 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SQL] Bind JDBC dialect to JDBCRDD at construction [spark]

2024-03-11 Thread via GitHub

johnnywalker commented on code in PR #45410: URL: https://github.com/apache/spark/pull/45410#discussion_r1520132359 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -153,12 +153,12 @@ object JDBCRDD extends Logging { */ class

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub

peter-toth commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520115101 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub

peter-toth commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520078411 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub

EnricoMi commented on PR #45464: URL: https://github.com/apache/spark/pull/45464#issuecomment-1988911223 > @EnricoMi this looks much simpler than my previous attempt #38357 Thanks for the pointer! I have a PR for driver log support in the pipeline. -- This is an automated message

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub

EnricoMi commented on code in PR #45464: URL: https://github.com/apache/spark/pull/45464#discussion_r1520050341 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala: ## @@ -28,6 +28,46 @@ import

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub

peter-toth commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520049076 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub

EnricoMi commented on code in PR #45464: URL: https://github.com/apache/spark/pull/45464#discussion_r1519987571 ## docs/configuration.md: ## @@ -1627,15 +1627,13 @@ Apart from these, the following properties are also available, and may be useful

Re: [PR] [WIP][SPARK 46840] Add sql.execution.benchmark.CollationBenchmark.scala Scaffolding [spark]

2024-03-11 Thread via GitHub

dbatomic commented on PR #45453: URL: https://github.com/apache/spark/pull/45453#issuecomment-1988798774 @GideonPotok - I think a better approach for benchmarking the collation track is to start with the basics, e.g. unit benchmarks against `CollationFactory` + `UTF8String`. E.g. what is the

Re: [PR] [SPARK-47295][SQL][COLLATION] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-11 Thread via GitHub

stevomitric commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1519971782 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -378,13 +378,6 @@ public boolean matchAt(final UTF8String s, int pos) {

Re: [PR] [SPARK-47255][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_323[6-7] and _LEGACY_ERROR_TEMP_324[7-9] [spark]

2024-03-11 Thread via GitHub

miland-db commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1988766106 Thank you! @MaxGekk and thank you @HyukjinKwon for the comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [WIP][SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9] [spark]

2024-03-11 Thread via GitHub

MaxGekk commented on code in PR #45407: URL: https://github.com/apache/spark/pull/45407#discussion_r1519942137 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkIntervalUtils.scala: ## @@ -131,24 +131,21 @@ trait SparkIntervalUtils { */ def

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub

sahnib commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1519943584 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -103,8 +116,12 @@ case class TransformWithStateExec( val

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub

pan3793 commented on code in PR #45464: URL: https://github.com/apache/spark/pull/45464#discussion_r1519873684 ## docs/configuration.md: ## @@ -1627,15 +1627,13 @@ Apart from these, the following properties are also available, and may be useful

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub

pan3793 commented on PR #45464: URL: https://github.com/apache/spark/pull/45464#issuecomment-1988639547 @EnricoMi this looks much simpler than my previous attempt https://github.com/apache/spark/pull/38357 -- This is an automated message from the Apache Git Service. To respond to the
