Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984929745 +1 for the direction if we need to support both. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984930529 As the Spark Community didn't get any issue report during v3.3.0 - v3.5.1 releases, I think this is a corner case. Maybe we can make the config internal.

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517084721 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,23 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile` """

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517084819 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,23 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile` """

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517085041 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517085186 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1517119767 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val baseSet:

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517121075 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1517121687 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +85,28 @@ class BasicInMemoryTableCatalog extends Table

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1517123366 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +85,28 @@ class BasicInMemoryTableCatalog extends Tab

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1984994353 Is this good to go? @HeartSaVioR @rangadi

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517155629 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517168161 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45424: [SPARK-47319][SQL] Improve missingInput calculation URL: https://github.com/apache/spark/pull/45424

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1985053630 Thanks, merged to master

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1517119556 ## sql/api/src/main/scala/org/apache/spark/sql/streaming/ExpiredTimerInfo.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[PR] [SPARK-47305][SQL][TESTS] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang opened a new pull request, #45428: URL: https://github.com/apache/spark/pull/45428 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985094500 This is my first time handling such a situation, is it better to create a new Jira or is it better as a FOLLOWUP of SPARK-47305? cc @HyukjinKwon @HeartSaVioR @zhengruifeng

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985099843 also cc @dongjoon-hyun

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985100729 Thanks @HyukjinKwon

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517209325 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class Dat

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2024-03-07 Thread via GitHub
AngersZh closed pull request #44496: [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender URL: https://github.com/apache/spark/pull/44496

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support complex types of data [spark]

2024-03-07 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1985106638 friendly ping @HyukjinKwon, When you are not busy, can you please continue to help review this PR?

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1517226969 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -396,7 +396,9 @@ public boolean startsWith(final UTF8String prefix, int collationId

[PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #45429: URL: https://github.com/apache/spark/pull/45429 ### What changes were proposed in this pull request? Add `VariantType` to API references ### Why are the changes needed? `VariantType` has been added in `__all__` in `types`

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1517229022 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -410,7 +412,9 @@ public boolean endsWith(final UTF8String suffix, int collationId)

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517236308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1517243024 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +185,57 @@ class CollationSuite extends DatasourceV2SQLBase { } } + te

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1517217945 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StatefulProcessorHandleSuite.scala: ## @@ -0,0 +1,299 @@ +/* + * Licensed to the Apache So

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1985157244 Will review sooner than later. Maybe by today.

Re: [PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45429: URL: https://github.com/apache/spark/pull/45429#issuecomment-1985159258 Merged to master.

Re: [PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45429: [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references URL: https://github.com/apache/spark/pull/45429

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985161811 FOLLOWUP tag should be OK. Thanks for handling this.

[PR] [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` [spark]

2024-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #45431: URL: https://github.com/apache/spark/pull/45431 ### What changes were proposed in this pull request? Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` ### Why are the changes needed?

Re: [PR] [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45431: URL: https://github.com/apache/spark/pull/45431#issuecomment-1985167614 cc @cloud-fan

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1517309692 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +185,57 @@ class CollationSuite extends DatasourceV2SQLBase { } } + te

[PR] [WIP] Issue to fix foreachbatch persist issue for stateful queries [spark]

2024-03-07 Thread via GitHub
anishshri-db opened a new pull request, #45432: URL: https://github.com/apache/spark/pull/45432 ### What changes were proposed in this pull request? Issue to fix foreachbatch persist issue for stateful queries ### Why are the changes needed? This allows us to prevent stateful

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517348253 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class Dat

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517351498 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517356072 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517356903 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software
