Re: [PR] [WIP][SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-03-11 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1990682229 @dongjoon-hyun To be more `reliable`, I will only convert this PR from `draft` to `review` once GA runs successfully. I have tried the test `ClientStreamingQuerySuite` several

[PR] [SPARK-47347][PYTHON][CONNECT][TESTS] Factor session-related tests out of `test_connect_basic` [spark]

2024-03-11 Thread via GitHub
zhengruifeng opened a new pull request, #45472: URL: https://github.com/apache/spark/pull/45472 ### What changes were proposed in this pull request? Factor session-related tests out of `test_connect_basic` ### Why are the changes needed? for testing parallelism ###

Re: [PR] [SPARK-47279][CORE]When the messageLoop encounter a fatal exception, such as oom, exit the JVM to avoid the driver hanging forever [spark]

2024-03-11 Thread via GitHub
yaooqinn commented on PR #45385: URL: https://github.com/apache/spark/pull/45385#issuecomment-1990595991 Instead of handling such a special case here, JVM has provided helpful arguments to deal with OutOfMemoryError. -- This is an automated message from the Apache Git Service. To
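For reference, here is a minimal sketch of the JVM-level approach mentioned above. It assumes a HotSpot JVM (JDK 8u92 or later) that supports `-XX:+ExitOnOutOfMemoryError`, which terminates the JVM on the first OutOfMemoryError. The class and jar names are hypothetical placeholders. In client mode, driver JVM options must be passed on the command line (here via `--driver-java-options`) instead of through `SparkConf`, because the driver JVM is already running by the time application code executes.

```shell
# Assumes a HotSpot JVM that supports -XX:+ExitOnOutOfMemoryError (JDK 8u92+).
# com.example.MyApp and my-app.jar are hypothetical placeholders.
spark-submit \
  --class com.example.MyApp \
  --driver-java-options "-XX:+ExitOnOutOfMemoryError" \
  my-app.jar
```

A related HotSpot flag, `-XX:+CrashOnOutOfMemoryError`, also stops the JVM but additionally writes crash files, which can help with post-mortem analysis.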

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub
HeartSaVioR closed pull request #45360: [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families URL: https://github.com/apache/spark/pull/45360

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub
HeartSaVioR commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1990520156 Thanks! Merging to master.

Re: [PR] [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner [spark]

2024-03-11 Thread via GitHub
WweiL commented on PR #45448: URL: https://github.com/apache/spark/pull/45448#issuecomment-1990155844 Closing this since https://github.com/apache/spark/pull/45468 has the same change

Re: [PR] [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner [spark]

2024-03-11 Thread via GitHub
WweiL closed pull request #45448: [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner URL: https://github.com/apache/spark/pull/45448

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub
WweiL commented on code in PR #45468: URL: https://github.com/apache/spark/pull/45468#discussion_r1520814035 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -68,17 +68,11 @@ private[spark] class StreamingPythonRunner(

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
HeartSaVioR closed pull request #45023: [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source URL: https://github.com/apache/spark/pull/45023

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
HeartSaVioR commented on PR #45023: URL: https://github.com/apache/spark/pull/45023#issuecomment-1990058421 Thanks! Merging to master.

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520802905 ## python/pyspark/sql/datasource.py: ## @@ -426,6 +426,10 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: in the final

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
panbingkun commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1520791983 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |

[PR] [SPARK-47342][SQL] Support TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE [spark]

2024-03-11 Thread via GitHub
yaooqinn opened a new pull request, #45471: URL: https://github.com/apache/spark/pull/45471 ### What changes were proposed in this pull request? This PR supports TimestampNTZ for DB2 TIMESTAMP WITH TIME ZONE when the `preferTimestampNTZ` option is set to true by users

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub
panbingkun commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989938050 The pr related to spark-website is here: https://github.com/apache/spark-website/pull/508

Re: [PR] [SPARK-46043][SQL][FOLLOWUP] do not resolve v2 table provider with custom session catalog [spark]

2024-03-11 Thread via GitHub
yaooqinn commented on PR #45440: URL: https://github.com/apache/spark/pull/45440#issuecomment-1989935613 Merged to master. Thank you @cloud-fan @allisonwang-db

Re: [PR] [SPARK-46043][SQL][FOLLOWUP] do not resolve v2 table provider with custom session catalog [spark]

2024-03-11 Thread via GitHub
yaooqinn closed pull request #45440: [SPARK-46043][SQL][FOLLOWUP] do not resolve v2 table provider with custom session catalog URL: https://github.com/apache/spark/pull/45440

[PR] [SPARK=47344] Extend INVALID_IDENTIFIER [spark]

2024-03-11 Thread via GitHub
srielau opened a new pull request, #45470: URL: https://github.com/apache/spark/pull/45470 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub
yaooqinn commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989889832 Merged to master. Thank you all.

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub
yaooqinn closed pull request #45469: [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates URL: https://github.com/apache/spark/pull/45469

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub
panbingkun commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989795305 cc @itholic @zhengruifeng @HyukjinKwon

Re: [PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub
panbingkun commented on PR #45469: URL: https://github.com/apache/spark/pull/45469#issuecomment-1989791290 This issue has existed since version `3.4.0`. After this PR, I will submit `a patch` to fix the doc in `spark-website`.

[PR] [MINOR][DOCS] Remove the extra text on page sql-error-conditions-sqlstates [spark]

2024-03-11 Thread via GitHub
panbingkun opened a new pull request, #45469: URL: https://github.com/apache/spark/pull/45469 ### What changes were proposed in this pull request? The PR aims to remove the extra text `.. include:: /shared/replacements.md` on page `sql-error-conditions-sqlstates.md`. ### Why are

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
HeartSaVioR commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520711461 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -73,24 +74,31 @@ class MapStateImpl[K, V]( } /** Get the map

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
HeartSaVioR commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520713399 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -73,24 +74,31 @@ class MapStateImpl[K, V]( } /** Get the map

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520699470 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/MapStateSuite.scala: ## @@ -67,6 +67,7 @@ class MapStateSuite extends StateVariableSuiteBase

Re: [PR] [SPARK-46654][SQL][PYTHON] Make `to_csv` explicitly indicate that it does not support some types of data [spark]

2024-03-11 Thread via GitHub
MaxGekk commented on code in PR #44665: URL: https://github.com/apache/spark/pull/44665#discussion_r1520694548 ## python/pyspark/sql/functions/builtin.py: ## @@ -15534,19 +15532,7 @@ def to_csv(col: "ColumnOrName", options: Optional[Dict[str, str]] = None) -> Col |

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
HeartSaVioR commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520688468 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/MapStateSuite.scala: ## @@ -67,6 +67,7 @@ class MapStateSuite extends

Re: [PR] [MINOR] Minor English fixes [spark]

2024-03-11 Thread via GitHub
nchammas commented on PR #45461: URL: https://github.com/apache/spark/pull/45461#issuecomment-1989744426 Ah, the test failure is due to the generated error documentation that is checked in to git. #44971 will eliminate this kind of maintenance headache. (Also, look at that diff

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520644679 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -163,6 +249,16 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520644342 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,46 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-03-11 Thread via GitHub
panbingkun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1989711132 > It seems that one of `ClientStreamingQuerySuite` test hangs due to the independent flakiness. Could you re-trigger it when it fails? Sure, let me continue to observe and

Re: [PR] [SPARK-47343][SQL] Fix NPE when `sqlString` variable value is null string in execute immediate [spark]

2024-03-11 Thread via GitHub
MaxGekk commented on code in PR #45462: URL: https://github.com/apache/spark/pull/45462#discussion_r1520631768 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3004,6 +3004,12 @@ ], "sqlState" : "2200E" }, + "NULL_QUERY_STRING_EXECUTE_IMMEDIATE"

Re: [PR] [SPARK-47335][BUILD] Upgrade `mvn-scalafmt` to `1.1.1684076452.9f83818` & `scalafmt` to `3.8.0` [spark]

2024-03-11 Thread via GitHub
panbingkun commented on PR #45452: URL: https://github.com/apache/spark/pull/45452#issuecomment-1989710105 > Please let me know if this is ready, @panbingkun . Yeah, it's ready

Re: [PR] [MINOR] Minor English fixes [spark]

2024-03-11 Thread via GitHub
xinrong-meng commented on PR #45461: URL: https://github.com/apache/spark/pull/45461#issuecomment-1989702036 LGTM after fixing the test, thanks!

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520618069 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520613982 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub
HyukjinKwon closed pull request #45411: [SPARK-47309][SQL][XML] Add schema inference unit tests URL: https://github.com/apache/spark/pull/45411

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub
HyukjinKwon commented on PR #45411: URL: https://github.com/apache/spark/pull/45411#issuecomment-1989688316 Merged to master.

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub
allisonwang-db commented on code in PR #45468: URL: https://github.com/apache/spark/pull/45468#discussion_r1520588014 ## core/src/main/scala/org/apache/spark/api/python/StreamingPythonRunner.scala: ## @@ -68,17 +68,11 @@ private[spark] class StreamingPythonRunner(

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub
HyukjinKwon commented on PR #45468: URL: https://github.com/apache/spark/pull/45468#issuecomment-1989660024 im fine w/ this change but would defer to @ueshin

Re: [PR] [SPARK-47332][SS][Connect] Remove not needed logic in PythonStreamingRunner [spark]

2024-03-11 Thread via GitHub
HyukjinKwon commented on PR #45448: URL: https://github.com/apache/spark/pull/45448#issuecomment-1989657160 seems the test failure is related

Re: [PR] [SPARK-47341][Connect] Replace commands with relations in a few tests in SparkConnectClientSuite [spark]

2024-03-11 Thread via GitHub
HyukjinKwon closed pull request #45460: [SPARK-47341][Connect] Replace commands with relations in a few tests in SparkConnectClientSuite URL: https://github.com/apache/spark/pull/45460

Re: [PR] [SPARK-47341][Connect] Replace commands with relations in a few tests in SparkConnectClientSuite [spark]

2024-03-11 Thread via GitHub
HyukjinKwon commented on PR #45460: URL: https://github.com/apache/spark/pull/45460#issuecomment-1989654626 Merged to master.

Re: [PR] [MINOR] Minor English fixes [spark]

2024-03-11 Thread via GitHub
HyukjinKwon commented on PR #45461: URL: https://github.com/apache/spark/pull/45461#issuecomment-1989652642 I think the test failure is related: ``` [info] - Error classes match with document *** FAILED *** (145 milliseconds) [info] "...one of the DataFrame[s] but Spark is

Re: [PR] [SPARK-47327][SQL] Fix thread safety issue in ICU Collator [spark]

2024-03-11 Thread via GitHub
HyukjinKwon closed pull request #45436: [SPARK-47327][SQL] Fix thread safety issue in ICU Collator URL: https://github.com/apache/spark/pull/45436

Re: [PR] [SPARK-47327][SQL] Fix thread safety issue in ICU Collator [spark]

2024-03-11 Thread via GitHub
HyukjinKwon commented on PR #45436: URL: https://github.com/apache/spark/pull/45436#issuecomment-1989650161 Merged to master.

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-11 Thread via GitHub
wbo4958 commented on PR #45232: URL: https://github.com/apache/spark/pull/45232#issuecomment-1989639850 Hi @grundprinzip, Could you help review it again?

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub
ueshin commented on code in PR #45468: URL: https://github.com/apache/spark/pull/45468#discussion_r1520516470 ## core/src/main/scala/org/apache/spark/SparkEnv.scala: ## @@ -141,20 +141,22 @@ class SparkEnv ( pythonExec: String, workerModule: String,

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520520070 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -103,8 +116,12 @@ case class TransformWithStateExec(

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520519676 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,46 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub
allisonwang-db commented on PR #45468: URL: https://github.com/apache/spark/pull/45468#issuecomment-1989589046 cc @ueshin @HyukjinKwon

[PR] [SPARK-47346][PYTHON] Make daemon mode configurable when creating Python planner workers [spark]

2024-03-11 Thread via GitHub
allisonwang-db opened a new pull request, #45468: URL: https://github.com/apache/spark/pull/45468 ### What changes were proposed in this pull request? This PR adds an extra config to env.createPythonWorker to make daemon mode configurable to give more flexibility when

Re: [PR] [SPARK-47148][SQL] Avoid to materialize AQE ExchangeQueryStageExec on the cancellation [spark]

2024-03-11 Thread via GitHub
erenavsarogullari commented on code in PR #45234: URL: https://github.com/apache/spark/pull/45234#discussion_r1520503365 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -897,6 +897,138 @@ class AdaptiveQueryExecSuite }

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1520472228 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -121,6 +123,46 @@ class StatefulProcessorHandleImpl(

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520455477 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520450074 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

[PR] WIP [spark]

2024-03-11 Thread via GitHub
jingz-db opened a new pull request, #45467: URL: https://github.com/apache/spark/pull/45467 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub
ahshahid commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520395436 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub
shujingyang-db commented on code in PR #45411: URL: https://github.com/apache/spark/pull/45411#discussion_r1520393576 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/XmlInferSchemaSuite.scala: ## @@ -0,0 +1,296 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47309][SQL][XML] Add schema inference unit tests [spark]

2024-03-11 Thread via GitHub
shujingyang-db commented on code in PR #45411: URL: https://github.com/apache/spark/pull/45411#discussion_r1520394446 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/xml/TestXmlData.scala: ## @@ -68,4 +68,444 @@ private[xml] trait TestXmlData { f(dir)

[PR] [SPARK-47345] [SQL]: Xml functions suite [spark]

2024-03-11 Thread via GitHub
yhosny opened a new pull request, #45466: URL: https://github.com/apache/spark/pull/45466 ### What changes were proposed in this pull request? Convert JsonFunctionsSuite.scala to its XML equivalent. Note that XML doesn’t implement all JSON functions, like json_tuple,

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520368942 ## sql/core/src/test/scala/org/apache/spark/sql/execution/python/PythonStreamingDataSourceSuite.scala: ## @@ -0,0 +1,233 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub
ahshahid commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520323348 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub
ahshahid commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520314613 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45360: URL: https://github.com/apache/spark/pull/45360#discussion_r1520296947 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -582,7 +636,7 @@ class RocksDBSuite extends

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-11 Thread via GitHub
anishshri-db commented on code in PR #45360: URL: https://github.com/apache/spark/pull/45360#discussion_r1520296736 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala: ## @@ -536,6 +536,67 @@ class RocksDBSuite extends

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-11 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1520267117 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala: ## @@ -635,6 +636,22 @@ trait ShuffleSpec { */ def

Re: [PR] [WIP][SPARK 46840] Add sql.execution.benchmark.CollationBenchmark.scala Scaffolding [spark]

2024-03-11 Thread via GitHub
GideonPotok commented on PR #45453: URL: https://github.com/apache/spark/pull/45453#issuecomment-1989191094 > @GideonPotok - I think that better approach for benchmarking collation track is to start with the basics. e.g. unit benchmarks against `CollationFactory` +`UTF8String`. E.g. what

Re: [PR] [SPARK-47094][SQL] SPJ : Dynamically rebalance number of buckets when they are not equal [spark]

2024-03-11 Thread via GitHub
sunchao commented on code in PR #45267: URL: https://github.com/apache/spark/pull/45267#discussion_r1520187304 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/ReducibleFunction.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47272][SS] Add MapState implementation for State API v2. [spark]

2024-03-11 Thread via GitHub
jingz-db commented on code in PR #45341: URL: https://github.com/apache/spark/pull/45341#discussion_r1520256936 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MapStateImpl.scala: ## @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-11 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1520229021 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-47307][SQL] Add a config to optionally chunk base64 strings [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1989116494 Could you do the following to re-generate the golden files, @ted-jenks ? ``` SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite"

Re: [PR] [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun commented on PR #45451: URL: https://github.com/apache/spark/pull/45451#issuecomment-1989111676 Thank you, @panbingkun .

Re: [PR] [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun closed pull request #45451: [SPARK-47339][BUILD] Upgrade checkStyle to `10.14.0` URL: https://github.com/apache/spark/pull/45451

Re: [PR] [SPARK-47194][BUILD] Upgrade log4j to 2.23.1 [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun commented on PR #45326: URL: https://github.com/apache/spark/pull/45326#issuecomment-1989108153 It seems that one of `ClientStreamingQuerySuite` test hangs due to the independent flakiness. Could you re-trigger it when it fails? ``` [info] *** Test still running after 3

Re: [PR] [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun commented on PR #45459: URL: https://github.com/apache/spark/pull/45459#issuecomment-1989103412 Merged to master. Thank you all.

Re: [PR] [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun closed pull request #45459: [SPARK-45245][CONNECT][TESTS][FOLLOW-UP] Remove unneeded Matchers trait in the test URL: https://github.com/apache/spark/pull/45459

Re: [PR] [SPARK-45827][SQL][FOLLOWUP] Fix for collation [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun commented on PR #45463: URL: https://github.com/apache/spark/pull/45463#issuecomment-1989098700 Thank you, @cashmand and all. Merged to master.

Re: [PR] [SPARK-45827][SQL][FOLLOWUP] Fix for collation [spark]

2024-03-11 Thread via GitHub
dongjoon-hyun closed pull request #45463: [SPARK-45827][SQL][FOLLOWUP] Fix for collation URL: https://github.com/apache/spark/pull/45463

Re: [PR] Raykim/iceberg 150 [spark]

2024-03-11 Thread via GitHub
rayhondo closed pull request #45465: Raykim/iceberg 150 URL: https://github.com/apache/spark/pull/45465

[PR] Raykim/iceberg 150 [spark]

2024-03-11 Thread via GitHub
rayhondo opened a new pull request, #45465: URL: https://github.com/apache/spark/pull/45465 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SQL] Bind JDBC dialect to JDBCRDD at construction [spark]

2024-03-11 Thread via GitHub
johnnywalker commented on code in PR #45410: URL: https://github.com/apache/spark/pull/45410#discussion_r1520132359 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala: ## @@ -153,12 +153,12 @@ object JDBCRDD extends Logging { */ class

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub
peter-toth commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520115101 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub
peter-toth commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520078411 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub
EnricoMi commented on PR #45464: URL: https://github.com/apache/spark/pull/45464#issuecomment-1988911223 > @EnricoMi this looks much simpler than my previous attempt #38357 Thanks for the pointer! I have a PR for driver log support in the pipeline.

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub
EnricoMi commented on code in PR #45464: URL: https://github.com/apache/spark/pull/45464#discussion_r1520050341 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala: ## @@ -28,6 +28,46 @@ import

Re: [PR] [SPARK-47217][SQL] Fix ambiguity check in self joins [spark]

2024-03-11 Thread via GitHub
peter-toth commented on code in PR #45343: URL: https://github.com/apache/spark/pull/45343#discussion_r1520049076 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSelfJoinSuite.scala: ## @@ -498,4 +559,70 @@ class DataFrameSelfJoinSuite extends QueryTest with

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub
EnricoMi commented on code in PR #45464: URL: https://github.com/apache/spark/pull/45464#discussion_r1519987571 ## docs/configuration.md: ## @@ -1627,15 +1627,13 @@ Apart from these, the following properties are also available, and may be useful

Re: [PR] [WIP][SPARK 46840] Add sql.execution.benchmark.CollationBenchmark.scala Scaffolding [spark]

2024-03-11 Thread via GitHub
dbatomic commented on PR #45453: URL: https://github.com/apache/spark/pull/45453#issuecomment-1988798774 @GideonPotok - I think that better approach for benchmarking collation track is to start with the basics. e.g. unit benchmarks against `CollationFactory` +`UTF8String`. E.g. what is the

Re: [PR] [SPARK-47295][SQL][COLLATION] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-11 Thread via GitHub
stevomitric commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1519971782 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -378,13 +378,6 @@ public boolean matchAt(final UTF8String s, int pos) {

Re: [PR] [SPARK-47255][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_323[6-7] and _LEGACY_ERROR_TEMP_324[7-9] [spark]

2024-03-11 Thread via GitHub
miland-db commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1988766106 Thank you! @MaxGekk and thank you @HyukjinKwon for the comments

Re: [PR] [WIP][SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9] [spark]

2024-03-11 Thread via GitHub
MaxGekk commented on code in PR #45407: URL: https://github.com/apache/spark/pull/45407#discussion_r1519942137 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkIntervalUtils.scala: ## @@ -131,24 +131,21 @@ trait SparkIntervalUtils { */ def

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-11 Thread via GitHub
sahnib commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1519943584 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/TransformWithStateExec.scala: ## @@ -103,8 +116,12 @@ case class TransformWithStateExec( val

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub
pan3793 commented on code in PR #45464: URL: https://github.com/apache/spark/pull/45464#discussion_r1519873684 ## docs/configuration.md: ## @@ -1627,15 +1627,13 @@ Apart from these, the following properties are also available, and may be useful

Re: [PR] [SPARK-47323][K8S] Support custom executor log urls [spark]

2024-03-11 Thread via GitHub
pan3793 commented on PR #45464: URL: https://github.com/apache/spark/pull/45464#issuecomment-1988639547 @EnricoMi this looks much simpler than my previous attempt https://github.com/apache/spark/pull/38357
