Re: [PR] Collations proof of concept [spark]

2024-05-24 Thread via GitHub
github-actions[bot] commented on PR #44537: URL: https://github.com/apache/spark/pull/44537#issuecomment-2130548277 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47041] PushDownUtils should support not only FileScanBuilder but any SupportsPushDownCatalystFilters [spark]

2024-05-24 Thread via GitHub
github-actions[bot] commented on PR #45099: URL: https://github.com/apache/spark/pull/45099#issuecomment-2130548267 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47579][CORE][PART3] Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
zeotuan commented on PR #46724: URL: https://github.com/apache/spark/pull/46724#issuecomment-2130535309 @gengliangwang Please help review this. I will merge this after #46739 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-48416][SQL] Support related nested WITH expression [spark]

2024-05-24 Thread via GitHub
zml1206 opened a new pull request, #46741: URL: https://github.com/apache/spark/pull/46741 ### What changes were proposed in this pull request? Refactor `RewriteWithExpression` logic to support related nested `WITH` expression. Generate `Project` order: 1. internally nested
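As a rough illustration of the idea in this PR (plain Python, not Spark's actual `RewriteWithExpression` rule; all names below are hypothetical): rewriting nested `WITH` (common-expression) definitions means materializing the inner definitions first, so that outer definitions and the final expression can refer to already-computed values.

```python
# Hypothetical sketch of a nested WITH rewrite (illustrative only).
# Inner common-expression definitions are evaluated first, in order,
# and each later definition may reference the earlier ones.

def rewrite_with(common_defs, expr):
    """common_defs: ordered {name: thunk(env)}; expr: callable over the env."""
    env = {}
    for name, thunk in common_defs.items():  # innermost/nested defs come first
        env[name] = thunk(env)               # computed once, then shared
    return expr(env)

result = rewrite_with(
    {"x": lambda env: 2 + 3,                  # inner definition
     "y": lambda env: env["x"] * env["x"]},   # nested: refers to x
    lambda env: env["y"] + 1,
)
assert result == 26
```

The point of the rewrite is that `x` is computed once and reused, rather than being duplicated inside `y` and the outer expression.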

Re: [PR] [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2024-05-24 Thread via GitHub
viirya commented on PR #46696: URL: https://github.com/apache/spark/pull/46696#issuecomment-2130496440 Looks good to me.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) [spark]

2024-05-24 Thread via GitHub
GideonPotok closed pull request #46404: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) URL: https://github.com/apache/spark/pull/46404

Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun commented on PR #46706: URL: https://github.com/apache/spark/pull/46706#issuecomment-2130495121 Merged to master only because this was defined as `Improvement`. https://github.com/apache/spark/assets/9700541/e6901f63-cfb8-491b-9368-a840493befdc

Re: [PR] [WIP] Don't review: E2e [spark]

2024-05-24 Thread via GitHub
GideonPotok closed pull request #46670: [WIP] Don't review: E2e URL: https://github.com/apache/spark/pull/46670

Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun closed pull request #46706: [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister URL: https://github.com/apache/spark/pull/46706

Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130487265 Pending CIs.

Re: [PR] [SPARK-48407][SQL][DOCS] Teradata: Document Type Conversion rules between Spark SQL and Teradata [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun closed pull request #46728: [SPARK-48407][SQL][DOCS] Teradata: Document Type Conversion rules between Spark SQL and Teradata URL: https://github.com/apache/spark/pull/46728

Re: [PR] [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun commented on PR #46641: URL: https://github.com/apache/spark/pull/46641#issuecomment-2130485187 Merged to master for Apache Spark 4.0.0. Thank you, @bozhang2820 and all.

Re: [PR] [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun closed pull request #46641: [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess URL: https://github.com/apache/spark/pull/46641

Re: [PR] [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on PR #46696: URL: https://github.com/apache/spark/pull/46696#issuecomment-2130363916 can we also revert https://github.com/apache/spark/pull/46562 in this PR?

Re: [PR] [SPARK-47008][CORE] Added Hadoop's FileSystem hasPathCapability check to avoid FileNotFoundException(s) when using S3 Express One Zone Storage. [spark]

2024-05-24 Thread via GitHub
leovegas commented on code in PR #46678: URL: https://github.com/apache/spark/pull/46678#discussion_r1613994258 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala: ## @@ -62,10 +62,10 @@ object OrcUtils extends Logging { val

Re: [PR] [SPARK-47008][CORE] Added Hadoop's FileSystem hasPathCapability check to avoid FileNotFoundException(s) when using S3 Express One Zone Storage. [spark]

2024-05-24 Thread via GitHub
leovegas commented on code in PR #46678: URL: https://github.com/apache/spark/pull/46678#discussion_r1613992518 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala: ## @@ -125,16 +125,4 @@ private[hive] object OrcFileOperator extends Logging {

Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1613945984 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala: ## @@ -20,19 +20,23 @@ package

Re: [PR] [SPARK-48380][CORE][WIP]: SerDeUtil.javaToPython to support batchSize parameter [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46697: URL: https://github.com/apache/spark/pull/46697#discussion_r1613915291 ## core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala: ## @@ -104,12 +104,40 @@ private[spark] object SerDeUtil extends Logging { } } + /** +

Re: [PR] [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613904865 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def

Re: [PR] [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613896716 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def

Re: [PR] [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613891588 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def
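The change this thread reviews, replacing a per-element `add` loop with a single bulk `addAll`-style call, can be sketched in plain Python (illustrative only; this is not Spark's actual `TaskMetrics` or accumulator code):

```python
# Hypothetical sketch: an accumulator that supports both per-element add()
# and a bulk add_all(), where the bulk path does one insertion call instead
# of N separate ones.

class CollectionAccumulator:
    def __init__(self):
        self._values = []

    def add(self, v):
        self._values.append(v)

    def add_all(self, vs):
        # single bulk insertion rather than calling add() in a loop
        self._values.extend(vs)

acc = CollectionAccumulator()
acc.add_all([1, 2, 3])
assert acc._values == [1, 2, 3]
```

The behavior is identical to looping over `add`; the bulk call just avoids repeated per-element method dispatch.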

Re: [PR] [MINOR][SQL] Remove outdated `TODO`s from `UnsafeHashedRelation` [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613884389 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -409,9 +407,6 @@ private[joins] class UnsafeHashedRelation( val

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613880314 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager:

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2130135726 can we add a test? Basically, any default column value that is not foldable, like `current_date()`, can trigger this bug
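For context, the foldable/non-foldable distinction behind that comment can be sketched in plain Python (a hypothetical illustration, not Spark's analyzer): a constant default can be safely evaluated once at DDL time, but a non-foldable default such as `current_date()` must remain an expression that is re-evaluated on each insert.

```python
from datetime import date

# Hypothetical sketch: "folding" a default means evaluating it once up front.
# That is only correct for constants; folding current_date() would freeze
# the date the table was created instead of the date of each insert.

def make_default(expr, foldable):
    if foldable:
        value = expr()           # constant-fold once, at definition time
        return lambda: value
    return expr                  # keep as an expression, evaluated per insert

constant_default = make_default(lambda: 42, foldable=True)
date_default = make_default(date.today, foldable=False)

assert constant_default() == 42
assert date_default() == date.today()
```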

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2130132833 Can you retry the GitHub Actions job? It seems flaky.

Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the `scala/java` doc [spark]

2024-05-24 Thread via GitHub
gengliangwang commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1613860819 ## common/utils/src/main/scala/org/apache/spark/internal/README.md: ## Review Comment: I will wait for @mridulm's response until this weekend.

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613124606 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager:

[PR] [SPARK-48411][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark [spark]

2024-05-24 Thread via GitHub
eason-yuchen-liu opened a new pull request, #46740: URL: https://github.com/apache/spark/pull/46740 ### What changes were proposed in this pull request? This PR adds a test for API DropDuplicateWithinWatermark, which was previously missing. ### Why are the changes needed?
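The semantics that the new E2E test exercises can be sketched in plain Python (a simplification, with illustrative names; real Spark tracks a global watermark across the stream rather than deriving it per event): a duplicate key is dropped only while its first occurrence is still within the watermark delay, after which its state is evicted.

```python
# Hypothetical sketch of dropDuplicatesWithinWatermark-style behavior.
# events: list of (key, event_time) in arrival order.

def dedup_within_watermark(events, delay):
    out, state = [], {}   # state: key -> event_time of first occurrence
    for key, t in events:
        watermark = t - delay
        # evict keys whose first occurrence fell behind the watermark
        state = {k: ts for k, ts in state.items() if ts >= watermark}
        if key not in state:
            state[key] = t
            out.append((key, t))
    return out

events = [("a", 1), ("a", 2), ("a", 10)]
# ("a", 2) is a duplicate within the delay; ("a", 10) arrives after eviction
assert dedup_within_watermark(events, delay=5) == [("a", 1), ("a", 10)]
```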

Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
gengliangwang commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130009429 cc @panbingkun @zeotuan

[PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
gengliangwang opened a new pull request, #46739: URL: https://github.com/apache/spark/pull/46739 ### What changes were proposed in this pull request? The PR aims to migrate logInfo in Core module with variables to structured logging framework. ### Why are the

Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
gengliangwang closed pull request #46735: [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell URL: https://github.com/apache/spark/pull/46735

Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
gengliangwang commented on PR #46735: URL: https://github.com/apache/spark/pull/46735#issuecomment-2129973290 Merging to master.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419 @uros-db I forgot but should I add collation support to `org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`? The only difference will be 1. Support for null
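The group-map-reduce approach named in this PR's title can be sketched in plain Python (illustrative only; simple lowercasing stands in here for a real UTF8_BINARY_LCASE collation key): group values by their collation key, count per group, and return a representative of the largest group.

```python
from collections import Counter

# Hypothetical sketch of a collation-aware Mode via group-map-reduce.
# key() maps each value to its collation key; equal keys collate equal.

def collation_mode(values, key=str.lower):
    counts = Counter()
    representative = {}
    for v in values:
        k = key(v)
        counts[k] += 1
        representative.setdefault(k, v)  # first-seen value represents the group
    best = max(counts, key=counts.get)
    return representative[best]

assert collation_mode(["a", "A", "a", "b"]) == "a"
assert collation_mode(["AA", "aa", "bb"]) == "AA"
```

Which member of the winning group is returned (first seen, here) is a tie-breaking choice; the thread's question about `PandasMode` mostly concerns null handling, which this sketch omits.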

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on code in PR #46440: URL: https://github.com/apache/spark/pull/46440#discussion_r1613651414 ## connector/connect/common/src/test/resources/query-tests/explain-results/function_shiftleft.explain: ## @@ -1,2 +1,2 @@ -Project [shiftleft(cast(b#0 as int), 2) AS

Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-24 Thread via GitHub
nikolamand-db commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2129615940 > I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify? Checked locally, seems like these changes don't resolve the

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613161561 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613448325 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613447350 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48413][SQL] ALTER COLUMN with collation [spark]

2024-05-24 Thread via GitHub
johanl-db commented on code in PR #46734: URL: https://github.com/apache/spark/pull/46734#discussion_r1613401422 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( val newDataSchema

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613408156 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613407352 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613404598 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613401443 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613389337 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613391451 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -17,15 +17,136 @@ package org.apache.spark.unsafe.types; import

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613386426 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613384389 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on code in PR #46738: URL: https://github.com/apache/spark/pull/46738#discussion_r1613379691 ## python/pyspark/sql/types.py: ## @@ -120,9 +120,8 @@ def __eq__(self, other: Any) -> bool: def __ne__(self, other: Any) -> bool: return not

[PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]

2024-05-24 Thread via GitHub
zhengruifeng opened a new pull request, #46738: URL: https://github.com/apache/spark/pull/46738 ### What changes were proposed in this pull request? 1, refactor `TypeName` to support parameterized datatypes 2, remove redundant simpleString/jsonValue methods, since they are type name
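The refactor described above can be sketched roughly as follows (plain Python with illustrative names, not PySpark's actual `types.py`): deriving the type name from the class name works for simple types, but parameterized types like decimal need to override it, after which `simpleString` can simply delegate to `typeName`.

```python
# Hypothetical sketch of a typeName that supports parameterized datatypes.

class DataType:
    def typeName(self):
        # "IntegerType" -> "integer"
        return type(self).__name__.removesuffix("Type").lower()

    def simpleString(self):
        # no longer redundant: just delegate to typeName
        return self.typeName()

class IntegerType(DataType):
    pass

class DecimalType(DataType):
    def __init__(self, precision=10, scale=0):
        self.precision, self.scale = precision, scale

    def typeName(self):
        # parameterized: carry precision and scale in the name
        return f"decimal({self.precision},{self.scale})"

assert IntegerType().typeName() == "integer"
assert DecimalType(10, 2).simpleString() == "decimal(10,2)"
```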

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613207707 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -147,6 +162,45 @@ public static String toLowerCase(final String

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613363048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613361676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613358004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613351378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes:

Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]

2024-05-24 Thread via GitHub
imtzer commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2129285181 > > same problem when using spark operator, it's weird why the code does not throw anything when configmap is not created > > When using spark-submit, there is no error output in

Re: [PR] [SPARK-48309][YARN] Stop AM retry in situations where, for some errors, retries may not be successful [spark]

2024-05-24 Thread via GitHub
guixiaowen commented on PR #46620: URL: https://github.com/apache/spark/pull/46620#issuecomment-2129261156 > Hi, @guixiaowen I need some time to think about it as it might break some existing workloads. > > Meantime, you can > > * Update the PR desc for better readability >

[PR] [SPARK-48414][PYTHON] Fix breaking change in python's `fromJson` [spark]

2024-05-24 Thread via GitHub
stefankandic opened a new pull request, #46737: URL: https://github.com/apache/spark/pull/46737 ### What changes were proposed in this pull request? Fix breaking change in `fromJson` method by having default param values. ### Why are the changes needed? In

Re: [PR] [SPARK-48309][YARN] Stop AM retry in situations where, for some errors, retries may not be successful [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46620: URL: https://github.com/apache/spark/pull/46620#issuecomment-2129183148 Hi, @guixiaowen I need some time to think about it as it might break some existing workloads. Meantime, you can - Update the PR desc for better readability - Update the PR

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-24 Thread via GitHub
ulysses-you closed pull request #46440: [SPARK-48168][SQL] Add bitwise shifting operators support URL: https://github.com/apache/spark/pull/46440

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-24 Thread via GitHub
ulysses-you commented on PR #46440: URL: https://github.com/apache/spark/pull/46440#issuecomment-2129180382 thanks, merged to master (4.0.0)
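For reference, the semantics of the shift functions behind the merged operators can be sketched in plain Python, assuming 32-bit signed int inputs (an illustration of `shiftleft` and `shiftrightunsigned` behavior, not Spark's native implementation):

```python
# Plain-Python sketch of 32-bit shift semantics.

def shiftleft(x, n, bits=32):
    mask = (1 << bits) - 1
    r = (x << n) & mask
    # reinterpret the masked bit pattern as a signed integer
    return r - (1 << bits) if r >= (1 << (bits - 1)) else r

def shiftrightunsigned(x, n, bits=32):
    # logical shift: treat the bit pattern as unsigned before shifting
    return (x & ((1 << bits) - 1)) >> n

assert shiftleft(5, 2) == 20
assert shiftleft(2**30, 2) == 0          # high bits shifted out
assert shiftrightunsigned(-8, 28) == 15  # logical, not arithmetic, shift
```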

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613221638 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -610,8 +610,42 @@ public void testFindInSet() throws SparkException {

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
dbatomic commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613213985 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1612779931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -279,6 +280,7 @@ abstract class Optimizer(catalogManager: CatalogManager)

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2129082503 merged to master

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng closed pull request #46733: [SPARK-48412][PYTHON] Refactor data type json parse URL: https://github.com/apache/spark/pull/46733

Re: [PR] [MINOR][SQL] Remove outdated `TODO`s from `UnsafeHashedRelation` [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613184450 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -409,9 +407,6 @@ private[joins] class UnsafeHashedRelation( val

Re: [PR] [SPARK-48413][SQL] ALTER COLUMN with collation [spark]

2024-05-24 Thread via GitHub
olaky commented on code in PR #46734: URL: https://github.com/apache/spark/pull/46734#discussion_r1613167082 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to

Re: [PR] [MINOR][SQL] Remove an outdated `TODO` from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on PR #46736: URL: https://github.com/apache/spark/pull/46736#issuecomment-2129074503 > How about removing the `TODO(josh)` at L412 [done](https://github.com/apache/spark/pull/9127/files#diff-127291a0287f790755be5473765ea03eb65f8b58b9ec0760955f124e21e3452f)

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-24 Thread via GitHub
zml1206 commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2129064493 > Can we make `With` nested as well? I have thought about it for a long time, and I can change the logic of withRewrite. The `alias` generated by the lowest layer with is in

Re: [PR] [MINOR][SQL] Remove an outdated `TODO` from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46736: URL: https://github.com/apache/spark/pull/46736#issuecomment-2129046316 How about removing the `TODO(josh)` at L412

Re: [PR] [SPARK-48396] Support configuring max cores can be used for SQL [spark]

2024-05-24 Thread via GitHub
yabola commented on PR #46713: URL: https://github.com/apache/spark/pull/46713#issuecomment-2129043469 @cloud-fan could you take a look, thank you~ This is useful in a shared SQL cluster and makes it easier to control SQL workloads. The picture below shows that the cores used are consistent.

Re: [PR] [SPARK-48352][SQL] Set max file counter through Spark conf [spark]

2024-05-24 Thread via GitHub
guixiaowen commented on PR #46668: URL: https://github.com/apache/spark/pull/46668#issuecomment-2129040807 @LuciferYang Can you review this?

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613136311 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46704: URL: https://github.com/apache/spark/pull/46704#issuecomment-2129013813 Merged to master. Thank you @panbingkun

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
yaooqinn closed pull request #46704: [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version URL: https://github.com/apache/spark/pull/46704

Re: [PR] [MINOR][CORE] Remove an outdated TODO from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613125740 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -396,8 +396,6 @@ private[joins] class UnsafeHashedRelation( val

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613124606 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager:

[PR] [MINOR][CORE] Remove an outdated TODO from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
LuciferYang opened a new pull request, #46736: URL: https://github.com/apache/spark/pull/46736 ### What changes were proposed in this pull request? This pr remove an outdated TODO from `UnsafeHashedRelation`: ``` // TODO(josh): This needs to be revisited before we merge this

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
panbingkun commented on code in PR #46704: URL: https://github.com/apache/spark/pull/46704#discussion_r1613116527 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala: ## @@ -38,7 +38,7 @@ class

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
panbingkun commented on PR #46704: URL: https://github.com/apache/spark/pull/46704#issuecomment-2128980038 cc @dongjoon-hyun @yaooqinn

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
panbingkun commented on code in PR #46704: URL: https://github.com/apache/spark/pull/46704#discussion_r1613092135 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala: ## @@ -38,7 +38,7 @@ class

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613088183 ## common/utils/src/main/resources/error/error-conditions.json: ## Review Comment: > We seem to lack a UT case related to `_LEGACY_ERROR_TEMP_1054`

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on code in PR #46733: URL: https://github.com/apache/spark/pull/46733#discussion_r1613083917 ## python/pyspark/sql/types.py: ## @@ -1756,13 +1756,45 @@ def toJson(self, zone_id: str = "UTC") -> str: TimestampNTZType, NullType, VariantType,
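As background for the data-type JSON parse refactor above, a common way to restructure such parsing is to replace an if/elif chain with a dispatch table keyed by type name. Below is a minimal, self-contained sketch of that pattern — plain Python with simplified stand-in classes, not the actual `pyspark.sql.types` code; the class and field names are illustrative only.

```python
# Dispatch-table parsing of a Spark-like data-type JSON tree.
# Simplified stand-ins for pyspark.sql.types classes; illustrative only.

class DataType:
    def __eq__(self, other): return type(self) is type(other)
    def __repr__(self): return type(self).__name__

class StringType(DataType): pass
class LongType(DataType): pass

class ArrayType(DataType):
    def __init__(self, element): self.element = element
    def __eq__(self, other):
        return isinstance(other, ArrayType) and self.element == other.element

# Atomic types resolve through one lookup table instead of an if/elif chain.
_ATOMIC = {"string": StringType, "long": LongType}

def parse_type(json_value):
    if isinstance(json_value, str):           # e.g. "string"
        return _ATOMIC[json_value]()
    if json_value.get("type") == "array":     # e.g. {"type": "array", ...}
        return ArrayType(parse_type(json_value["elementType"]))
    raise ValueError(f"unsupported type: {json_value!r}")

t = parse_type({"type": "array", "elementType": "long"})
```

A table-driven parser like this is a pure refactoring in the sense discussed in the review: adding a new atomic type means adding one table entry, with no change to the control flow or to the parsed results.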

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2128905041 > Is this just refactoring or causing any behaviour change? this is just refactoring

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
panbingkun commented on PR #46731: URL: https://github.com/apache/spark/pull/46731#issuecomment-2128896002 > cc @MaxGekk @panbingkun FYI also cc @cloud-fan

Re: [PR] [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on PR #46695: URL: https://github.com/apache/spark/pull/46695#issuecomment-2128895356 Merged into master for Spark 4.0. Thanks @panbingkun @dongjoon-hyun @hasnain-db and @pan3793 ~

Re: [PR] [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` [spark]

2024-05-24 Thread via GitHub
LuciferYang closed pull request #46695: [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` URL: https://github.com/apache/spark/pull/46695

Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46735: URL: https://github.com/apache/spark/pull/46735#issuecomment-2128887663 cc @gengliangwang thanks

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
HyukjinKwon commented on code in PR #46733: URL: https://github.com/apache/spark/pull/46733#discussion_r1613069878 ## python/pyspark/sql/types.py: ## @@ -1756,13 +1756,45 @@ def toJson(self, zone_id: str = "UTC") -> str: TimestampNTZType, NullType, VariantType, +

[PR] [SPARK-47579][SQL][FOLLOWUP] Restore the print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
yaooqinn opened a new pull request, #46735: URL: https://github.com/apache/spark/pull/46735 ### What changes were proposed in this pull request? Restore the print format of spark sql shell ### Why are the changes needed? bugfix ### Does this PR introduce

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
HyukjinKwon commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2128885962 Is this just refactoring or causing any behaviour change?

Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]

2024-05-24 Thread via GitHub
liangyouze commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2128884766 > same problem when using spark operator, it's weird why the code does not throw anything when configmap is not created When using spark-submit, there is no output in the

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
panbingkun commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613064843 ## common/utils/src/main/resources/error/error-conditions.json: ## Review Comment: We seem to lack a UT case related to `_LEGACY_ERROR_TEMP_1054`
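For context on the error-class migration discussed above: entries in `common/utils/src/main/resources/error/error-conditions.json` pair a named error condition with a parameterized message and a SQLSTATE. A purely illustrative example of the shape such an entry takes — the condition name, message text, and SQLSTATE below are made up for illustration and are not the actual replacement chosen for `_LEGACY_ERROR_TEMP_1054`:

```json
"UNSUPPORTED_EXAMPLE_OPERATION": {
  "message": [
    "The operation <operation> is not supported for <relationName>."
  ],
  "sqlState": "0A000"
}
```

Assigning a name like this (instead of a `_LEGACY_ERROR_TEMP_*` placeholder) is what makes the error testable: a unit test can then assert on the condition name and its message parameters rather than on raw message text.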
