Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
peter-toth commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1984153122 @cloud-fan can you please take a look? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
ueshin commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1516750441 ## python/pyspark/sql/profiler.py: ## @@ -224,6 +224,54 @@ def dump(id: int) -> None: for id in sorted(code_map.keys()): dump(id) +

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on code in PR #45378: URL: https://github.com/apache/spark/pull/45378#discussion_r1516752307 ## python/pyspark/sql/tests/test_session.py: ## @@ -531,6 +531,33 @@ def test_dump_invalid_type(self): }, ) +def

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984433848 Thank you for the confirmation, @ted-jenks. Well, in this case, it's too late to change the behavior again. Apache Spark 3.3 has already been EOL since last year and I don't
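For readers unfamiliar with the two encoders named in the PR title, the sketch below shows the general difference between RFC 2045 (MIME) and RFC 4648 base64 output. It uses Python's standard `base64` module purely for illustration and is not Spark's code path; note that Python's MIME-style helper separates lines with `\n`, whereas RFC 2045 itself specifies CRLF.

```python
import base64

# 60 input bytes encode to 80 base64 characters, enough to exceed one 76-char line.
payload = bytes(range(60))

# RFC 2045 (MIME) style: output is broken into lines of at most 76 characters.
mime_encoded = base64.encodebytes(payload)

# RFC 4648 style: one continuous string with no line separators.
plain_encoded = base64.b64encode(payload)

# Both forms decode to the same bytes; only the line breaks differ.
same_payload = base64.b64decode(mime_encoded) == base64.b64decode(plain_encoded)
```

A consumer that compares encoded strings byte for byte sees the two outputs as different even though they decode identically, which is the kind of behavior change the thread is weighing.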

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1984150861 LGTM I talked to @peter-toth offline and the improvement comes from not calculating the `inputSet` at all when references is empty
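The short-circuit described in this comment can be sketched as follows. This is a minimal Python illustration, not the Scala change in the PR: `missing_input` and `compute_input_set` are hypothetical names, and the only assumption carried over from the thread is that the input set is not computed when there are no references.

```python
def missing_input(references, compute_input_set):
    """Return the references that the input does not provide.

    `compute_input_set` is a zero-argument callable so that the
    potentially expensive input set is built only when it is needed.
    """
    if not references:
        # Nothing is referenced, so nothing can be missing; skip the computation.
        return frozenset()
    return frozenset(references) - frozenset(compute_input_set())


calls = []

def expensive_input_set():
    # Hypothetical stand-in for an expensive computation; records each call.
    calls.append(1)
    return {"a", "b"}

empty_case = missing_input(set(), expensive_input_set)
calls_after_empty = len(calls)
nonempty_case = missing_input({"a", "c"}, expensive_input_set)
```

Passing a callable instead of a precomputed set is what makes the saving possible in this sketch: the empty case returns before the callable is ever invoked.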

[PR] [SPARK-47318][Security] Adds HKDF round to AuthEngine key derivation [spark]

2024-03-07 Thread via GitHub
sweisdb opened a new pull request, #45425: URL: https://github.com/apache/spark/pull/45425 ### What changes were proposed in this pull request? This change adds an additional pass through a key derivation function (KDF) to the key exchange protocol in `AuthEngine`. Currently, it uses
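As background on the HKDF named in the PR title, here is a minimal sketch of the extract-and-expand construction from RFC 5869, written with Python's standard `hmac` and `hashlib` modules. It is not the `AuthEngine` code, and the inputs below (`shared_secret`, the salt, the `info` label, the 32-byte output length) are made-up values for illustration; the truncated description does not say how the PR wires the KDF into the protocol.

```python
import hashlib
import hmac


def hkdf_extract(salt, ikm, hash_name="sha256"):
    # RFC 5869 step 1: condense the input keying material into a pseudorandom key.
    if not salt:
        salt = bytes(hashlib.new(hash_name).digest_size)
    return hmac.new(salt, ikm, hash_name).digest()


def hkdf_expand(prk, info, length, hash_name="sha256"):
    # RFC 5869 step 2: stretch the pseudorandom key into `length` output bytes.
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]


# Hypothetical inputs, for illustration only.
shared_secret = b"\x01" * 32
prk = hkdf_extract(b"example-salt", shared_secret)
key_a = hkdf_expand(prk, b"example context A", 32)
key_b = hkdf_expand(prk, b"example context B", 32)
```

Binding the `info` label into the expansion step means keys derived for different purposes from the same secret do not collide.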

[PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
shujingyang-db opened a new pull request, #45426: URL: https://github.com/apache/spark/pull/45426 ### What changes were proposed in this pull request? This PR fixes XML schema inference issues: 1. when there's an empty tag 2. when merging schema for NullType

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
allisonwang-db commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1516609093 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1516651884 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val

Re: [PR] [SPARK-47311][SQL][PYTHON] Suppress Python exceptions where PySpark is not in the Python path [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on PR #45414: URL: https://github.com/apache/spark/pull/45414#issuecomment-1984375769 Looks nice, thank you!

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng commented on PR #45378: URL: https://github.com/apache/spark/pull/45378#issuecomment-1984523232 Merged to master, thank you all!

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
attilapiros commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1516669562 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val

Re: [PR] [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling [spark]

2024-03-07 Thread via GitHub
xinrong-meng closed pull request #45378: [SPARK-47276][PYTHON][CONNECT] Introduce `spark.profile.clear` for SparkSession-based profiling URL: https://github.com/apache/spark/pull/45378

Re: [PR] [SPARK-46743][SQL] Count bug after constant folding [spark]

2024-03-07 Thread via GitHub
agubichev commented on code in PR #45125: URL: https://github.com/apache/spark/pull/45125#discussion_r1516770647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteWithExpression.scala: ## @@ -34,7 +34,7 @@ import

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45427: [MINOR][INFRA] Make "y/n" consistent within merge script URL: https://github.com/apache/spark/pull/45427

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1984994353 Is this good to go? @HeartSaVioR @rangadi

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1517119556 ## sql/api/src/main/scala/org/apache/spark/sql/streaming/ExpiredTimerInfo.scala: ## @@ -0,0 +1,40 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [SPARK-46654][SQL][Python] Make `to_csv` explicitly indicate that it does not support complex types of data [spark]

2024-03-07 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1985106638 friendly ping @HyukjinKwon, When you are not busy, can you please continue to help review this PR?

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517356903 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
chaoqin-li1123 commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517356072 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45415: URL: https://github.com/apache/spark/pull/45415#issuecomment-1984919818 Thanks @zwangsheng, merged to master

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984926315 Thank you @dongjoon-hyun. In such circumstances, I guess we can add a configuration for base64 classes to avoid breaking things again. AFAIK, Apache Hive also uses the JDK

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517121075 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging {

[PR] [SPARK-47305][SQL][TESTS] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang opened a new pull request, #45428: URL: https://github.com/apache/spark/pull/45428 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1517243024 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +185,57 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-46913][SS] Add support for processing/event time based timers with transformWithState operator [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45051: URL: https://github.com/apache/spark/pull/45051#discussion_r1517217945 ## sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StatefulProcessorHandleSuite.scala: ## @@ -0,0 +1,299 @@ +/* + * Licensed to the Apache

Re: [PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45426: [SPARK-47309][SQL][XML] Fix schema inference issues in XML URL: https://github.com/apache/spark/pull/45426

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45269: URL: https://github.com/apache/spark/pull/45269#issuecomment-1984848824 Merged to master.

Re: [PR] [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45269: [SPARK-47078][DOCS][PYTHON] Documentation for SparkSession-based Profilers URL: https://github.com/apache/spark/pull/45269

Re: [PR] [SPARK-46992]Fix cache consistence [spark]

2024-03-07 Thread via GitHub
doki23 commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1984850287 > All children have to be considered for changes of their persistence state. Currently it only checks the first found child. For clarity there is a test which fails:

Re: [PR] [SPARK-47309][SQL][XML] Fix schema inference issues in XML [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45426: URL: https://github.com/apache/spark/pull/45426#issuecomment-1984850009 Merged to master.

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
zwangsheng commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1517066704 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45427: URL: https://github.com/apache/spark/pull/45427#issuecomment-1984918192 Late +1

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517085041 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517085186 ## python/pyspark/resource/tests/test_connect_resources.py: ## @@ -0,0 +1,46 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45424: URL: https://github.com/apache/spark/pull/45424#issuecomment-1985053630 Thanks, merged to master

[PR] [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` duplicated column name handling consisten with `withColumnRenamed` [spark]

2024-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #45431: URL: https://github.com/apache/spark/pull/45431 ### What changes were proposed in this pull request? Make `withColumnsRenamed` duplicated column name handling consistent with `withColumnRenamed` ### Why are the changes needed?

Re: [PR] [SPARK-47322][PYTHON][CONNECT] Make `withColumnsRenamed` duplicated column name handling consisten with `withColumnRenamed` [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45431: URL: https://github.com/apache/spark/pull/45431#issuecomment-1985167614 cc @cloud-fan

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984930529 As the Spark Community didn't get any issue report during v3.3.0 - v3.5.1 releases, I think this is a corner case. Maybe we can make the config internal.

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985100729 Thanks @HyukjinKwon

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517236308 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [SPARK-47250][SS] Add additional validations and NERF changes for RocksDB state provider and use of column families [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on PR #45360: URL: https://github.com/apache/spark/pull/45360#issuecomment-1985157244 Will review sooner than later. Maybe by today.

Re: [PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
HyukjinKwon closed pull request #45429: [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references URL: https://github.com/apache/spark/pull/45429

Re: [PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45429: URL: https://github.com/apache/spark/pull/45429#issuecomment-1985159258 Merged to master.

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517348253 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46071][SQL] Optimize CaseWhen toJSON content [spark]

2024-03-07 Thread via GitHub
github-actions[bot] commented on PR #43979: URL: https://github.com/apache/spark/pull/43979#issuecomment-1984826451 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-46034][CORE] SparkContext add file should also copy file to local root path [spark]

2024-03-07 Thread via GitHub
github-actions[bot] commented on PR #43936: URL: https://github.com/apache/spark/pull/43936#issuecomment-1984826472 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-42746][SQL] Add the LISTAGG() aggregate function [spark]

2024-03-07 Thread via GitHub
github-actions[bot] closed pull request #42398: [SPARK-42746][SQL] Add the LISTAGG() aggregate function URL: https://github.com/apache/spark/pull/42398

Re: [PR] [SPARK-47307] Replace RFC 2045 base64 encoder with RFC 4648 encoder [spark]

2024-03-07 Thread via GitHub
dongjoon-hyun commented on PR #45408: URL: https://github.com/apache/spark/pull/45408#issuecomment-1984929745 +1 for the direction if we need to support both.

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45424: URL: https://github.com/apache/spark/pull/45424#discussion_r1517119767 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala: ## @@ -104,13 +104,19 @@ class AttributeSet private (private val

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517155629 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985094500 This is my first time handling such a situation, is it better to create a new Jira or is it better as a FOLLOWUP of SPARK-47305? cc @HyukjinKwon @HeartSaVioR @zhengruifeng

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985099843 also cc @dongjoon-hyun

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1517226969 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -396,7 +396,9 @@ public boolean startsWith(final UTF8String prefix, int

[PR] [SPARK-47079][PYTHON][DOCS][FOLLOWUP] Add `VariantType` to API references [spark]

2024-03-07 Thread via GitHub
zhengruifeng opened a new pull request, #45429: URL: https://github.com/apache/spark/pull/45429 ### What changes were proposed in this pull request? Add `VariantType` to API references ### Why are the changes needed? `VariantType` has been added in `__all__` in `types`

Re: [PR] [SPARK-47305][SQL][TESTS][FOLLOWUP][3.4] Fix the compilation error related to `PropagateEmptyRelationSuite` [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on PR #45428: URL: https://github.com/apache/spark/pull/45428#issuecomment-1985161811 FOLLOWUP tag should be OK. Thanks for handling this.

Re: [PR] [SPARK-47319][SQL] Improve missingInput calculation [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45424: [SPARK-47319][SQL] Improve missingInput calculation URL: https://github.com/apache/spark/pull/45424

Re: [PR] [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender [spark]

2024-03-07 Thread via GitHub
AngersZh closed pull request #44496: [SPARK-46510][CORE] Spark shell log filter should be applied to all AbstractAppender URL: https://github.com/apache/spark/pull/44496

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517351498 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonStreamingSourceRunner.scala: ## @@ -0,0 +1,208 @@ +/* + * Licensed to the Apache Software

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1984834978 Mind filing a JIRA and linking it to the PR title please?

Re: [PR] Miland db/miland legacy error class [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45423: URL: https://github.com/apache/spark/pull/45423#issuecomment-1984835207 See also https://spark.apache.org/contributing.html

[PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon opened a new pull request, #45427: URL: https://github.com/apache/spark/pull/45427 ### What changes were proposed in this pull request? This PR makes the y/n message and condition consistent within the merge script. ### Why are the changes needed? For

Re: [PR] [MINOR][INFRA] Make "y/n" consistent within merge script [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on PR #45427: URL: https://github.com/apache/spark/pull/45427#issuecomment-1984911418 Merged to master.

Re: [PR] [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45415: [SPARK-47314][DOC] Remove the wrong comment line of `ExternalSorter#writePartitionedMapOutput` method URL: https://github.com/apache/spark/pull/45415

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517084819 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,23 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile`

Re: [PR] [SPARK-46812][CONNECT][PYTHON] Make mapInPandas / mapInArrow support ResourceProfile [spark]

2024-03-07 Thread via GitHub
wbo4958 commented on code in PR #45232: URL: https://github.com/apache/spark/pull/45232#discussion_r1517084721 ## python/pyspark/resource/profile.py: ## @@ -114,14 +122,23 @@ def id(self) -> int: int A unique id of this :class:`ResourceProfile`

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1517123366 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +85,28 @@ class BasicInMemoryTableCatalog extends

Re: [PR] [SPARK-46962][SS][PYTHON] Add interface for python streaming data source API and implement python worker to run python streaming data source [spark]

2024-03-07 Thread via GitHub
HeartSaVioR commented on code in PR #45023: URL: https://github.com/apache/spark/pull/45023#discussion_r1517209325 ## python/pyspark/sql/datasource.py: ## @@ -298,6 +320,133 @@ def read(self, partition: InputPartition) -> Iterator[Union[Tuple, Row]]: ... +class

Re: [PR] [SPARK-46834][SQL][Collations] Support for aggregates [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45290: URL: https://github.com/apache/spark/pull/45290#discussion_r1517309692 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -183,6 +185,57 @@ class CollationSuite extends DatasourceV2SQLBase { } } +

Re: [PR] [SPARK-47296][SQL][COLLATION] Fail unsupported functions for non-binary collations [spark]

2024-03-07 Thread via GitHub
HyukjinKwon commented on code in PR #45422: URL: https://github.com/apache/spark/pull/45422#discussion_r1517022572 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationUtils.scala: ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-47265][SQL][TESTS] Replace `createTable(..., schema: StructType, ...)` with `createTable(..., columns: Array[Column], ...)` in UT [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45368: URL: https://github.com/apache/spark/pull/45368#discussion_r1517121687 ## sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTableCatalog.scala: ## @@ -84,28 +85,28 @@ class BasicInMemoryTableCatalog extends

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1517168161 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging { */

Re: [PR] [SPARK-47295] Added ICU StringSearch for 'startsWith' and 'endsWith' functions [spark]

2024-03-07 Thread via GitHub
uros-db commented on code in PR #45421: URL: https://github.com/apache/spark/pull/45421#discussion_r1517229022 ## common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: ## @@ -410,7 +412,9 @@ public boolean endsWith(final UTF8String suffix, int collationId)

[PR] [WIP] Issue to fix foreachbatch persist issue for stateful queries [spark]

2024-03-07 Thread via GitHub
anishshri-db opened a new pull request, #45432: URL: https://github.com/apache/spark/pull/45432 ### What changes were proposed in this pull request? Issue to fix foreachbatch persist issue for stateful queries ### Why are the changes needed? This allows us to prevent

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516070311 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -117,7 +117,7 @@ object DataType { private val FIXED_DECIMAL =

Re: [PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
cloud-fan commented on code in PR #45418: URL: https://github.com/apache/spark/pull/45418#discussion_r1516087243 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -87,17 +87,26 @@ abstract class JdbcDialect extends Serializable with Logging {

Re: [PR] [SPARK-47302][SQL][Collation] Collate keyword as identifier [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45405: URL: https://github.com/apache/spark/pull/45405#discussion_r1516062859 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -1096,7 +1096,7 @@ colPosition ; collateClause -: COLLATE

[PR] [SPARK-46761][SQL] Quoted strings in a JSON path should support ? characters [spark]

2024-03-07 Thread via GitHub
planga82 opened a new pull request, #45420: URL: https://github.com/apache/spark/pull/45420 ### What changes were proposed in this pull request? If there is a JSON with a ? character in the key like ``` {"?":"QUESTION"} ``` This PR allows adding this character

Re: [PR] [SPARK-47314][DOC] Correct the `ExternalSorter#writePartitionedMapOutput` method comment [spark]

2024-03-07 Thread via GitHub
LuciferYang commented on code in PR #45415: URL: https://github.com/apache/spark/pull/45415#discussion_r1516106051 ## core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: ## @@ -690,7 +690,7 @@ private[spark] class ExternalSorter[K, V, C]( * Write all

Re: [PR] [SPARK-45827][SQL] Move data type checks to CreatableRelationProvider [spark]

2024-03-07 Thread via GitHub
cashmand commented on code in PR #45409: URL: https://github.com/apache/spark/pull/45409#discussion_r1516216789 ## sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala: ## @@ -175,6 +175,25 @@ trait CreatableRelationProvider { mode: SaveMode,

[PR] [SPARK-47316][SQL] Fix TimestampNTZ in Postgres Array [spark]

2024-03-07 Thread via GitHub
yaooqinn opened a new pull request, #45418: URL: https://github.com/apache/spark/pull/45418 ### What changes were proposed in this pull request? For Postgres, TimestampNTZ works well for plain TimestampNTZ types but not for nested ones, typically for now: array. This

Re: [PR] [SPARK-47315][SQL][TEST] Clean up tempView for `createTempView` UT [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45417: URL: https://github.com/apache/spark/pull/45417#issuecomment-1983000945 Merged to master. Thank you @wForget and @HyukjinKwon.

Re: [PR] [SPARK-47315][SQL][TEST] Clean up tempView for `createTempView` UT [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45417: [SPARK-47315][SQL][TEST] Clean up tempView for `createTempView` UT URL: https://github.com/apache/spark/pull/45417

[PR] [SPARK-36691][PYTHON] PythonRunner failed should pass error message to ApplicationMaster too [spark]

2024-03-07 Thread via GitHub
AngersZh opened a new pull request, #33934: URL: https://github.com/apache/spark/pull/33934 ### What changes were proposed in this pull request? In current PySpark, stderr and stdout are printed together; if the Python script exits, PythonRunner will only throw a `SparkUserAppsException`

[PR] [DOCS][PYTHON] Fix documentation typo in takeSample method [spark]

2024-03-07 Thread via GitHub
kimborowicz opened a new pull request, #45419: URL: https://github.com/apache/spark/pull/45419 ### What changes were proposed in this pull request? Fixed an error in the docstring documentation for the parameter `withReplacement` of `takeSample` method in `pyspark.RDD`, should be

Re: [PR] [SPARK-47238][SQL] Reduce executor memory usage by making generated code in WSCG a broadcast variable [spark]

2024-03-07 Thread via GitHub
jwang0306 commented on PR #45348: URL: https://github.com/apache/spark/pull/45348#issuecomment-1983243153 Thank you!

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45401: URL: https://github.com/apache/spark/pull/45401#issuecomment-1983276998 Merged to master. Thank you @cloud-fan @dongjoon-hyun @HyukjinKwon @MaxGekk

Re: [PR] [SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9][WIP] [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on code in PR #45407: URL: https://github.com/apache/spark/pull/45407#discussion_r1515965942 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala: ## @@ -455,19 +455,6 @@ class DDLParserSuite extends AnalysisTest with

Re: [PR] [SPARK-47254][SQL] Assign names to the error classes _LEGACY_ERROR_TEMP_325[1-9][WIP] [spark]

2024-03-07 Thread via GitHub
MaxGekk commented on PR #45407: URL: https://github.com/apache/spark/pull/45407#issuecomment-1983367613 @stefanbuk-db If you are still working on the PR, please move the tag `[WIP]` to the beginning of the PR's title (this is a convention)

Re: [PR] [SPARK-46992]Fix cache consistence [spark]

2024-03-07 Thread via GitHub
dtarima commented on PR #45181: URL: https://github.com/apache/spark/pull/45181#issuecomment-1983260338 All children have to be considered for changes of their persistence state. Currently it only checks the first found child. For clarity there is a test which fails:

Re: [PR] [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45401: [SPARK-47300][SQL] `quoteIfNeeded` should quote identifier starts with digits URL: https://github.com/apache/spark/pull/45401

Re: [PR] [SPARK-47298][BUILD] Upgrade `mysql-connector-j` to `8.3.0` and `mariadb-java-client` to `2.7.12` [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45399: [SPARK-47298][BUILD] Upgrade `mysql-connector-j` to `8.3.0` and `mariadb-java-client` to `2.7.12` URL: https://github.com/apache/spark/pull/45399

Re: [PR] [SPARK-47298][BUILD] Upgrade `mysql-connector-j` to `8.3.0` and `mariadb-java-client` to `2.7.12` [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45399: URL: https://github.com/apache/spark/pull/45399#issuecomment-1983290627 Merged to master. Thank you @panbingkun @dongjoon-hyun

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-07 Thread via GitHub
yaooqinn commented on PR #45365: URL: https://github.com/apache/spark/pull/45365#issuecomment-1983294203 Merged to master. Thank you @LuciferYang @dongjoon-hyun

Re: [PR] [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45365: [SPARK-47278][BUILD] Upgrade rocksdbjni to 8.11.3 URL: https://github.com/apache/spark/pull/45365

Re: [PR] [MINOR][DOCS][PYTHON] Fix documentation typo in takeSample method [spark]

2024-03-07 Thread via GitHub
yaooqinn closed pull request #45419: [MINOR][DOCS][PYTHON] Fix documentation typo in takeSample method URL: https://github.com/apache/spark/pull/45419
