Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517356903 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517356072 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517351498 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517348253 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class
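For readers following the thread, here is a minimal sketch of what a user-defined Python streaming data source could look like under the interface this PR adds. The offset layout, the `counter` source name, and the assumption that the reader class is `DataSourceStreamReader` with `initialOffset`/`latestOffset`/`partitions`/`read` hooks are illustrative, not the exact merged API.

```python
from pyspark.sql.datasource import DataSource, DataSourceStreamReader, InputPartition


class RangePartition(InputPartition):
    def __init__(self, start, end):
        self.start, self.end = start, end


class CounterStreamReader(DataSourceStreamReader):
    """Toy reader emitting consecutive integers (a single fixed range, for brevity)."""

    def initialOffset(self):
        return {"offset": 0}

    def latestOffset(self):
        # A real source would ask the external system what data is available.
        return {"offset": 10}

    def partitions(self, start, end):
        return [RangePartition(start["offset"], end["offset"])]

    def read(self, partition):
        for i in range(partition.start, partition.end):
            yield (i,)


class CounterDataSource(DataSource):
    @classmethod
    def name(cls):
        return "counter"

    def schema(self):
        return "value int"

    def streamReader(self, schema):
        return CounterStreamReader()
```

Assuming registration mirrors the batch Python data source API, this would be wired up with `spark.dataSource.register(CounterDataSource)` and consumed via `spark.readStream.format("counter")`.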

[PR] [WIP] Issue to fix foreachbatch persist issue for stateful queries [spark]

2024-03-07 Thread via GitHub
anishshri-db opened a new pull request, #45432: URL: https://github.com/apache/spark/pull/45432 ### What changes were proposed in this pull request? Issue to fix foreachbatch persist issue for stateful queries ### Why are the changes needed? This allows us to prevent

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1517309692 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +185,57 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45431: URL: https://github.com/apache/spark/pull/45431#issuecomment-1985167614 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` [spark]

2024-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #45431: URL: https://github.com/apache/spark/pull/45431 ### What changes were proposed in this pull request? Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` ### Why are the changes needed?
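For context, these are the two code paths whose behavior the PR aligns, as a hedged PySpark sketch. Which shared behavior the PR settles on when the rename collides with an existing column (an error or silently keeping the duplicate `id`) is not visible from this preview.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 2)], ["id", "id2"])

# Single rename that collides with an existing column name.
df.withColumnRenamed("id2", "id")

# Bulk rename hitting the same collision; after this change it should follow
# withColumnRenamed instead of diverging in how the duplicate name is handled.
df.withColumnsRenamed({"id2": "id"})
```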

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985161811 FOLLOWUP tag should be OK. Thanks for handling this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45429: [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references URL: https://github.com/apache/spark/pull/45429 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45429: URL: https://github.com/apache/spark/pull/45429#issuecomment-1985159258 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1985157244 Will review sooner than later. Maybe by today. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1517217945 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StatefulProcessorHandleSuite.scala: ## @@ -0,0 +1,299 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1517243024 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +185,57 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517236308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1517229022 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -410,7 +412,9 @@ public boolean endsWith(final UTF8String suffix, int collationId)

[PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #45429: URL: https://github.com/apache/spark/pull/45429 ### What changes were proposed in this pull request? Add `VariantType` to API references ### Why are the changes needed? `VariantType` has been added in `__all__` in `types`

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1517226969 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -396,7 +396,9 @@ public boolean startsWith(final UTF8String prefix, int

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support complex types of data [spark]

2024-03-07 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1985106638 friendly ping @HyukjinKwon, when you are not busy, could you please continue to help review this PR? -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2024-03-07 Thread via GitHub
AngersZh closed pull request #44496: [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender URL: https://github.com/apache/spark/pull/44496 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517209325 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985100729 Thanks @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985099843 also cc @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985094500 This is my first time handling such a situation: is it better to create a new JIRA, or to keep it as a FOLLOWUP of SPARK-47305? cc @HyukjinKwon @HeartSaVioR @zhengruifeng

[PR] [SPARK-47305][SQL][TESTS] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang opened a new pull request, #45428: URL: https://github.com/apache/spark/pull/45428 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1517119556 ## sql/api/src/main/scala/org/apache/spark/sql/streaming/ExpiredTimerInfo.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1985053630 Thanks, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45424: [SPARK-47319][SQL] Improve missingInput calculation URL: https://github.com/apache/spark/pull/45424 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517168161 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517155629 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1984994353 Is this good to go? @HeartSaVioR @rangadi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1517123366 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +85,28 @@ class BasicInMemoryTableCatalog extends

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1517121687 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +85,28 @@ class BasicInMemoryTableCatalog extends

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517121075 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1517119767 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517085186 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517085041 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517084819 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,23 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile`

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517084721 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,23 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile`
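A hedged usage sketch of the feature under review, assuming the resource profile is passed through a `profile` argument on `mapInPandas` and that the stage-level scheduling builder API is used unchanged on Spark Connect:

```python
from pyspark.resource import ResourceProfileBuilder, TaskResourceRequests
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# Request 2 CPUs per task for the Python workers running this function.
reqs = TaskResourceRequests().cpus(2)
rp = ResourceProfileBuilder().require(reqs).build  # `build` is a property

def double(batches):
    for pdf in batches:  # pandas DataFrames
        pdf["id"] = pdf["id"] * 2
        yield pdf

spark.range(10).mapInPandas(double, schema="id long", profile=rp).show()
```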

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984930529 As the Spark community didn't receive any issue reports during the v3.3.0 - v3.5.1 releases, I think this is a corner case. Maybe we can make the config internal. -- This is an automated

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984929745 +1 for the direction if we need to support both. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984926315 Thank you @dongjoon-hyun. In such circumstances, I guess we can add a configuration for base64 classes to avoid breaking things again. AFAIK, Apache Hive also uses the JDK
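To make the RFC difference in this thread concrete, here is a stand-in illustration with Python's standard `base64` module (Spark's actual encoder lives on the JVM side):

```python
import base64

data = b"x" * 60  # long enough that the MIME encoding needs a line break

# RFC 4648: a single continuous string, no line breaks.
print(base64.b64encode(data))

# RFC 2045 (MIME): output wrapped with a newline every 76 characters, which is
# the incompatibility being discussed.
print(base64.encodebytes(data))
```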

Re: [PR] [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45415: URL: https://github.com/apache/spark/pull/45415#issuecomment-1984919818 Thanks @zwangsheng, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45415: [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method URL: https://github.com/apache/spark/pull/45415 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45427: URL: https://github.com/apache/spark/pull/45427#issuecomment-1984918192 Late +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45427: [MINOR][INFRA] Make "y/n" consistent within merge script URL: https://github.com/apache/spark/pull/45427 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45427: URL: https://github.com/apache/spark/pull/45427#issuecomment-1984911418 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
zwangsheng commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1517066704 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all

[PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45427: URL: https://github.com/apache/spark/pull/45427 ### What changes were proposed in this pull request? This PR changes the y/n message and condition consistent within merging script. ### Why are the changes needed? For

Re: [PR] [SPARK-46992] Fix cache consistency [spark]

2024-03-07 Thread via GitHub
doki23 commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1984850287 > All children have to be considered for changes of their persistence state. Currently it only checks the first found child. For clarity there is a test which fails:
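The quoted review point, as a small self-contained sketch (the names here are made up for illustration): the persistence state has to be re-derived from every child, not only the first one found.

```python
class Node:
    def __init__(self, children=(), persistence_changed=False):
        self.children = list(children)
        self.persistence_changed = persistence_changed

def any_child_persistence_changed(plan):
    # Consider every child; checking only plan.children[0] would miss this case.
    return any(c.persistence_changed for c in plan.children)

plan = Node(children=[Node(), Node(persistence_changed=True)])
assert any_child_persistence_changed(plan)
```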

Re: [PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45426: [SPARK-47309][SQL][XML] Fix schema inference issues in XML URL: https://github.com/apache/spark/pull/45426 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45426: URL: https://github.com/apache/spark/pull/45426#issuecomment-1984850009 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45269: [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers URL: https://github.com/apache/spark/pull/45269 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45269: URL: https://github.com/apache/spark/pull/45269#issuecomment-1984848824 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517022572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1984835207 See also https://spark.apache.org/contributing.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1984834978 Mind filing a JIRA and linking it to the PR title please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2024-03-07 Thread via GitHub
github-actions[bot] closed pull request #42398: [SPARK-42746][SQL] Add the LISTAGG() aggregate function URL: https://github.com/apache/spark/pull/42398 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [SPARK-46034][CORE] SparkContext add file should also copy file to local root path [spark]

2024-03-07 Thread via GitHub
github-actions[bot] commented on PR #43936: URL: https://github.com/apache/spark/pull/43936#issuecomment-1984826472 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46071][SQL] Optimize CaseWhen toJSON content [spark]

2024-03-07 Thread via GitHub
github-actions[bot] commented on PR #43979: URL: https://github.com/apache/spark/pull/43979#issuecomment-1984826451 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
shujingyang-db opened a new pull request, #45426: URL: https://github.com/apache/spark/pull/45426 ### What changes were proposed in this pull request? This PR fixes XML schema inference issues: 1. when there's an empty tag 2. when merging schema for NullType
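A small sketch of the first case (schema inference with an empty tag) using the built-in `xml` data source; the sample file and row tag are made up for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

sample = """<books>
  <book><title>Spark</title><price>10</price></book>
  <book><title>Flink</title><price/></book>
</books>"""
with open("/tmp/books.xml", "w") as f:  # made-up path
    f.write(sample)

df = spark.read.format("xml").option("rowTag", "book").load("/tmp/books.xml")
df.printSchema()  # the inferred type of `price` is what the empty-tag fix affects
```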

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on PR #45378: URL: https://github.com/apache/spark/pull/45378#issuecomment-1984523232 Merged to master, thank you all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng closed pull request #45378: [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling URL: https://github.com/apache/spark/pull/45378 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984433848 Thank you for the confirmation, @ted-jenks. Well, in this case, it's too late to change the behavior again. Apache Spark 3.3 has been EOL since last year, and I don't

Re: [PR] [SPARK-47311][SQL][PYTHON] Suppress Python exceptions where PySpark is not in the Python path [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on PR #45414: URL: https://github.com/apache/spark/pull/45414#issuecomment-1984375769 Looks nice, thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-07 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1516770647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1516752307 ## python/pyspark/sql/tests/test_session.py: ## @@ -531,6 +531,33 @@ def test_dump_invalid_type(self): }, ) +def

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
ueshin commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1516750441 ## python/pyspark/sql/profiler.py: ## @@ -224,6 +224,54 @@ def dump(id: int) -> None: for id in sorted(code_map.keys()): dump(id) +
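A hedged usage sketch of the new method, assuming it sits alongside the existing `spark.profile.show()`/`spark.profile.dump()` entry points and that the profiler is enabled via the `spark.sql.pyspark.udf.profiler` conf:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.pyspark.udf.profiler", "perf")

@udf("long")
def plus_one(v):
    return v + 1

spark.range(10).select(plus_one("id")).collect()  # accumulates profile results
spark.profile.show(type="perf")                   # inspect them
spark.profile.clear()                             # introduced by this PR: drop stored results
```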

[PR] [SPARK-47318][Security] Adds HKDF round to AuthEngine key derivation [spark]

2024-03-07 Thread via GitHub
sweisdb opened a new pull request, #45425: URL: https://github.com/apache/spark/pull/45425 ### What changes were proposed in this pull request? This change adds an additional pass through a key derivation function (KDF) to the key exchange protocol in `AuthEngine`. Currently, it uses
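To illustrate what an extra KDF pass means here, a Python stand-in using the `cryptography` package (this is not the JVM-side `AuthEngine` code, and the label is made up):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

shared_secret = os.urandom(32)  # stand-in for the key-exchange output

# Rather than using the raw shared secret as key material directly, run it
# through an HKDF round so the session key is bound to a protocol-specific label.
session_key = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=None,
    info=b"illustrative spark auth label",
).derive(shared_secret)
```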

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1516669562 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1516651884 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
peter-toth commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1984153122 @cloud-fan can you please take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1984150861 LGTM. I talked to @peter-toth offline, and the improvement comes from not calculating the `inputSet` at all when `references` is empty. -- This is an automated message from the
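A tiny Python sketch (not the actual Scala in `AttributeSet.scala`) of the short-circuit described above: the expensive part is building the full input set, so it is skipped entirely when `references` is empty.

```python
def missing_input(references, children_outputs, produced):
    if not references:  # fast path: nothing to resolve, never build the input set
        return set()
    input_set = set().union(*children_outputs) if children_outputs else set()
    return references - input_set - produced

print(missing_input(set(), [{"a", "b"}], set()))       # set(), input set skipped
print(missing_input({"a", "c"}, [{"a", "b"}], {"b"}))  # {'c'}
```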

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1516609093 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-37932][SQL]Wait to resolve missing attributes before applying DeduplicateRelations [spark]

2024-03-07 Thread via GitHub
peter-toth commented on PR #35684: URL: https://github.com/apache/spark/pull/35684#issuecomment-1984107426 @martinf-moodys, [SPARK-47319](https://issues.apache.org/jira/browse/SPARK-47319) / https://github.com/apache/spark/pull/45424 might help, especially if you have many `Union` nodes

[PR] [SPARK-47319][SQL] Fix missingInput calculation [spark]

2024-03-07 Thread via GitHub
peter-toth opened a new pull request, #45424: URL: https://github.com/apache/spark/pull/45424 ### What changes were proposed in this pull request? This PR speeds up `QueryPlan.missingInput()` calculation. ### Why are the changes needed? This seems to be the root cause of

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516535896 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -117,7 +117,7 @@ object DataType { private val FIXED_DECIMAL =

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
sahnib commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1516471532 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
miland-db closed pull request #45423: Miland db/miland legacy error class URL: https://github.com/apache/spark/pull/45423 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
miland-db opened a new pull request, #45423: URL: https://github.com/apache/spark/pull/45423 ### What changes were proposed in this pull request? In the PR, I propose to assign the proper names to the legacy error classes _LEGACY_ERROR_TEMP_324[7-9], and modify tests in testing suites to

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
dbatomic commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1516510742 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-07 Thread via GitHub
jchen5 commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1516503722 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cloud-fan closed pull request #45409: [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider URL: https://github.com/apache/spark/pull/45409 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on PR #45409: URL: https://github.com/apache/spark/pull/45409#issuecomment-1983973892 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db opened a new pull request, #45422: URL: https://github.com/apache/spark/pull/45422 ### What changes were proposed in this pull request? ### Why are the changes needed? Currently, all `StringType` arguments passed to built-in string functions in Spark SQL get

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516378011 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516415830 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
stefankandic commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516396458 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1516398411 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516389365 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { }

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516391257 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -31,6 +32,8 @@ import com.esotericsoftware.kryo.io.Input; import

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516381847 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { }

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516380909 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -384,27 +387,47 @@ public boolean startsWith(final UTF8String prefix) { }

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1516379232 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -31,6 +32,8 @@ import com.esotericsoftware.kryo.io.Input; import
