Re: [PR] Collations proof of concept [spark]

2024-05-24 Thread via GitHub
github-actions[bot] commented on PR #44537: URL: https://github.com/apache/spark/pull/44537#issuecomment-2130548277 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47041] PushDownUtils should support not only FileScanBuilder but any SupportsPushDownCatalystFilters [spark]

2024-05-24 Thread via GitHub
github-actions[bot] commented on PR #45099: URL: https://github.com/apache/spark/pull/45099#issuecomment-2130548267 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-47579][CORE][PART3] Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
zeotuan commented on PR #46724: URL: https://github.com/apache/spark/pull/46724#issuecomment-2130535309 @gengliangwang Please help review this. I will merge this after #46739 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] [SPARK-48416][SQL] Support related nested WITH expression [spark]

2024-05-24 Thread via GitHub
zml1206 opened a new pull request, #46741: URL: https://github.com/apache/spark/pull/46741 ### What changes were proposed in this pull request? Refactor `RewriteWithExpression` logic to support related nested `WITH` expression. Generate `Project` order: 1. internally nested
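As a rough illustration of the idea in this PR (plain Python, not Spark's actual `RewriteWithExpression` rule; all names below are hypothetical): rewriting nested `WITH` (common-expression) definitions means materializing the inner definitions first, so that outer definitions and the final expression can refer to already-computed values.

```python
# Hypothetical sketch of a nested WITH rewrite (illustrative only).
# Inner common-expression definitions are evaluated first, in order,
# and each later definition may reference the earlier ones.

def rewrite_with(common_defs, expr):
    """common_defs: ordered {name: thunk(env)}; expr: callable over the env."""
    env = {}
    for name, thunk in common_defs.items():  # innermost/nested defs come first
        env[name] = thunk(env)               # computed once, then shared
    return expr(env)

result = rewrite_with(
    {"x": lambda env: 2 + 3,                  # inner definition
     "y": lambda env: env["x"] * env["x"]},   # nested: refers to x
    lambda env: env["y"] + 1,
)
assert result == 26
```

The point of the rewrite is that `x` is computed once and reused, rather than being duplicated inside `y` and the outer expression.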

Re: [PR] [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2024-05-24 Thread via GitHub
viirya commented on PR #46696: URL: https://github.com/apache/spark/pull/46696#issuecomment-2130496440 Looks good to me.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) [spark]

2024-05-24 Thread via GitHub
GideonPotok closed pull request #46404: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) URL: https://github.com/apache/spark/pull/46404

Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun commented on PR #46706: URL: https://github.com/apache/spark/pull/46706#issuecomment-2130495121 Merged to master only because this was defined as `Improvement`. https://github.com/apache/spark/assets/9700541/e6901f63-cfb8-491b-9368-a840493befdc

Re: [PR] [WIP] Don't review: E2e [spark]

2024-05-24 Thread via GitHub
GideonPotok closed pull request #46670: [WIP] Don't review: E2e URL: https://github.com/apache/spark/pull/46670

Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun closed pull request #46706: [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister URL: https://github.com/apache/spark/pull/46706

Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130487265 Pending CIs.

Re: [PR] [SPARK-48407][SQL][DOCS] Teradata: Document Type Conversion rules between Spark SQL and Teradata [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun closed pull request #46728: [SPARK-48407][SQL][DOCS] Teradata: Document Type Conversion rules between Spark SQL and Teradata URL: https://github.com/apache/spark/pull/46728

Re: [PR] [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun commented on PR #46641: URL: https://github.com/apache/spark/pull/46641#issuecomment-2130485187 Merged to master for Apache Spark 4.0.0. Thank you, @bozhang2820 and all.

Re: [PR] [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess [spark]

2024-05-24 Thread via GitHub
dongjoon-hyun closed pull request #46641: [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess URL: https://github.com/apache/spark/pull/46641

Re: [PR] [SPARK-48292][CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on PR #46696: URL: https://github.com/apache/spark/pull/46696#issuecomment-2130363916 can we also revert https://github.com/apache/spark/pull/46562 in this PR?

Re: [PR] [SPARK-47008][CORE] Added Hadoop's FileSystem hasPathCapability check to avoid FileNotFoundException(s) when using S3 Express One Zone Storage. [spark]

2024-05-24 Thread via GitHub
leovegas commented on code in PR #46678: URL: https://github.com/apache/spark/pull/46678#discussion_r1613994258 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala: ## @@ -62,10 +62,10 @@ object OrcUtils extends Logging { val

Re: [PR] [SPARK-47008][CORE] Added Hadoop's FileSystem hasPathCapability check to avoid FileNotFoundException(s) when using S3 Express One Zone Storage. [spark]

2024-05-24 Thread via GitHub
leovegas commented on code in PR #46678: URL: https://github.com/apache/spark/pull/46678#discussion_r1613992518 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala: ## @@ -125,16 +125,4 @@ private[hive] object OrcFileOperator extends Logging {

Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1613945984 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala: ## @@ -20,19 +20,23 @@ package

Re: [PR] [SPARK-48380][CORE][WIP]: SerDeUtil.javaToPython to support batchSize parameter [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46697: URL: https://github.com/apache/spark/pull/46697#discussion_r1613915291 ## core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala: ## @@ -104,12 +104,40 @@ private[spark] object SerDeUtil extends Logging { } } + /** +

Re: [PR] [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613904865 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def

Re: [PR] [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613896716 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def

Re: [PR] [SPARK-48391][CORE] Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613891588 ## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala: ## @@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def
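The change this thread reviews, replacing a per-element `add` loop with a single bulk `addAll`-style call, can be sketched in plain Python (illustrative only; this is not Spark's actual `TaskMetrics` or accumulator code):

```python
# Hypothetical sketch: an accumulator that supports both per-element add()
# and a bulk add_all(), where the bulk path does one insertion call instead
# of N separate ones.

class CollectionAccumulator:
    def __init__(self):
        self._values = []

    def add(self, v):
        self._values.append(v)

    def add_all(self, vs):
        # single bulk insertion rather than calling add() in a loop
        self._values.extend(vs)

acc = CollectionAccumulator()
acc.add_all([1, 2, 3])
assert acc._values == [1, 2, 3]
```

The behavior is identical to looping over `add`; the bulk call just avoids repeated per-element method dispatch.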

Re: [PR] [MINOR][SQL] Remove outdated `TODO`s from `UnsafeHashedRelation` [spark]

2024-05-24 Thread via GitHub
JoshRosen commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613884389 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -409,9 +407,6 @@ private[joins] class UnsafeHashedRelation( val

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613880314 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager:

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2130135726 can we add a test? Basically, any default column value that is not foldable, like `current_date()`, can trigger this bug
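For context, the foldable/non-foldable distinction behind that comment can be sketched in plain Python (a hypothetical illustration, not Spark's analyzer): a constant default can be safely evaluated once at DDL time, but a non-foldable default such as `current_date()` must remain an expression that is re-evaluated on each insert.

```python
from datetime import date

# Hypothetical sketch: "folding" a default means evaluating it once up front.
# That is only correct for constants; folding current_date() would freeze
# the date the table was created instead of the date of each insert.

def make_default(expr, foldable):
    if foldable:
        value = expr()           # constant-fold once, at definition time
        return lambda: value
    return expr                  # keep as an expression, evaluated per insert

constant_default = make_default(lambda: 42, foldable=True)
date_default = make_default(date.today, foldable=False)

assert constant_default() == 42
assert date_default() == date.today()
```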

Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2130132833 Can you retry the GitHub Actions job? It seems flaky.

Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the `scala/java` doc [spark]

2024-05-24 Thread via GitHub
gengliangwang commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1613860819 ## common/utils/src/main/scala/org/apache/spark/internal/README.md: ## Review Comment: I will wait for @mridulm's response until this weekend.

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613124606 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager:

[PR] [SPARK-48411][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark [spark]

2024-05-24 Thread via GitHub
eason-yuchen-liu opened a new pull request, #46740: URL: https://github.com/apache/spark/pull/46740 ### What changes were proposed in this pull request? This PR adds a test for API DropDuplicateWithinWatermark, which was previously missing. ### Why are the changes needed?
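The semantics that the new E2E test exercises can be sketched in plain Python (a simplification, with illustrative names; real Spark tracks a global watermark across the stream rather than deriving it per event): a duplicate key is dropped only while its first occurrence is still within the watermark delay, after which its state is evicted.

```python
# Hypothetical sketch of dropDuplicatesWithinWatermark-style behavior.
# events: list of (key, event_time) in arrival order.

def dedup_within_watermark(events, delay):
    out, state = [], {}   # state: key -> event_time of first occurrence
    for key, t in events:
        watermark = t - delay
        # evict keys whose first occurrence fell behind the watermark
        state = {k: ts for k, ts in state.items() if ts >= watermark}
        if key not in state:
            state[key] = t
            out.append((key, t))
    return out

events = [("a", 1), ("a", 2), ("a", 10)]
# ("a", 2) is a duplicate within the delay; ("a", 10) arrives after eviction
assert dedup_within_watermark(events, delay=5) == [("a", 1), ("a", 10)]
```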

Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
gengliangwang commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130009429 cc @panbingkun @zeotuan

[PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]

2024-05-24 Thread via GitHub
gengliangwang opened a new pull request, #46739: URL: https://github.com/apache/spark/pull/46739 ### What changes were proposed in this pull request? The PR aims to migrate logInfo in Core module with variables to structured logging framework. ### Why are the

Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
gengliangwang closed pull request #46735: [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell URL: https://github.com/apache/spark/pull/46735

Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
gengliangwang commented on PR #46735: URL: https://github.com/apache/spark/pull/46735#issuecomment-2129973290 Merging to master.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419 @uros-db I forgot but should I add collation support to `org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`? The only difference will be 1. Support for null
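The group-map-reduce approach named in this PR's title can be sketched in plain Python (illustrative only; simple lowercasing stands in here for a real UTF8_BINARY_LCASE collation key): group values by their collation key, count per group, and return a representative of the largest group.

```python
from collections import Counter

# Hypothetical sketch of a collation-aware Mode via group-map-reduce.
# key() maps each value to its collation key; equal keys collate equal.

def collation_mode(values, key=str.lower):
    counts = Counter()
    representative = {}
    for v in values:
        k = key(v)
        counts[k] += 1
        representative.setdefault(k, v)  # first-seen value represents the group
    best = max(counts, key=counts.get)
    return representative[best]

assert collation_mode(["a", "A", "a", "b"]) == "a"
assert collation_mode(["AA", "aa", "bb"]) == "AA"
```

Which member of the winning group is returned (first seen, here) is a tie-breaking choice; the thread's question about `PandasMode` mostly concerns null handling, which this sketch omits.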

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-24 Thread via GitHub
cloud-fan commented on code in PR #46440: URL: https://github.com/apache/spark/pull/46440#discussion_r1613651414 ## connector/connect/common/src/test/resources/query-tests/explain-results/function_shiftleft.explain: ## @@ -1,2 +1,2 @@ -Project [shiftleft(cast(b#0 as int), 2) AS

Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]

2024-05-24 Thread via GitHub
nikolamand-db commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2129615940 > I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify? Checked locally, seems like these changes don't resolve the

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613161561 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613448325 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613447350 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48413][SQL] ALTER COLUMN with collation [spark]

2024-05-24 Thread via GitHub
johanl-db commented on code in PR #46734: URL: https://github.com/apache/spark/pull/46734#discussion_r1613401422 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( val newDataSchema

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613408156 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613407352 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613404598 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613401443 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613389337 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613391451 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -17,15 +17,136 @@ package org.apache.spark.unsafe.types; import

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613386426 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613384389 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software

Re: [PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on code in PR #46738: URL: https://github.com/apache/spark/pull/46738#discussion_r1613379691 ## python/pyspark/sql/types.py: ## @@ -120,9 +120,8 @@ def __eq__(self, other: Any) -> bool: def __ne__(self, other: Any) -> bool: return not

[PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]

2024-05-24 Thread via GitHub
zhengruifeng opened a new pull request, #46738: URL: https://github.com/apache/spark/pull/46738 ### What changes were proposed in this pull request? 1, refactor `TypeName` to support parameterized datatypes 2, remove redundant simpleString/jsonValue methods, since they are type name
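The refactor described above can be sketched roughly as follows (plain Python with illustrative names, not PySpark's actual `types.py`): deriving the type name from the class name works for simple types, but parameterized types like decimal need to override it, after which `simpleString` can simply delegate to `typeName`.

```python
# Hypothetical sketch of a typeName that supports parameterized datatypes.

class DataType:
    def typeName(self):
        # "IntegerType" -> "integer"
        return type(self).__name__.removesuffix("Type").lower()

    def simpleString(self):
        # no longer redundant: just delegate to typeName
        return self.typeName()

class IntegerType(DataType):
    pass

class DecimalType(DataType):
    def __init__(self, precision=10, scale=0):
        self.precision, self.scale = precision, scale

    def typeName(self):
        # parameterized: carry precision and scale in the name
        return f"decimal({self.precision},{self.scale})"

assert IntegerType().typeName() == "integer"
assert DecimalType(10, 2).simpleString() == "decimal(10,2)"
```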

Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613207707 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -147,6 +162,45 @@ public static String toLowerCase(final String

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613363048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613361676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613358004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613351378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes:

Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]

2024-05-24 Thread via GitHub
imtzer commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2129285181 > > same problem when using spark operator, it's weird why the code does not throw anything when configmap is not created > > When using spark-submit, there is no error output in

Re: [PR] [SPARK-48309][YARN] Stop AM retry in situations where, for some errors, retries may not be successful [spark]

2024-05-24 Thread via GitHub
guixiaowen commented on PR #46620: URL: https://github.com/apache/spark/pull/46620#issuecomment-2129261156 > Hi, @guixiaowen I need some time to think about it as it might break some existing workloads. > > Meantime, you can > > * Update the PR desc for better readability >

[PR] [SPARK-48414][PYTHON] Fix breaking change in python's `fromJson` [spark]

2024-05-24 Thread via GitHub
stefankandic opened a new pull request, #46737: URL: https://github.com/apache/spark/pull/46737 ### What changes were proposed in this pull request? Fix breaking change in `fromJson` method by having default param values. ### Why are the changes needed? In

Re: [PR] [SPARK-48309][YARN] Stop AM retry in situations where, for some errors, retries may not be successful [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46620: URL: https://github.com/apache/spark/pull/46620#issuecomment-2129183148 Hi, @guixiaowen I need some time to think about it as it might break some existing workloads. Meantime, you can - Update the PR desc for better readability - Update the PR

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-24 Thread via GitHub
ulysses-you closed pull request #46440: [SPARK-48168][SQL] Add bitwise shifting operators support URL: https://github.com/apache/spark/pull/46440

Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]

2024-05-24 Thread via GitHub
ulysses-you commented on PR #46440: URL: https://github.com/apache/spark/pull/46440#issuecomment-2129180382 thanks, merged to master (4.0.0)
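For reference, the semantics of the shift functions behind the merged operators can be sketched in plain Python, assuming 32-bit signed int inputs (an illustration of `shiftleft` and `shiftrightunsigned` behavior, not Spark's native implementation):

```python
# Plain-Python sketch of 32-bit shift semantics.

def shiftleft(x, n, bits=32):
    mask = (1 << bits) - 1
    r = (x << n) & mask
    # reinterpret the masked bit pattern as a signed integer
    return r - (1 << bits) if r >= (1 << (bits - 1)) else r

def shiftrightunsigned(x, n, bits=32):
    # logical shift: treat the bit pattern as unsigned before shifting
    return (x & ((1 << bits) - 1)) >> n

assert shiftleft(5, 2) == 20
assert shiftleft(2**30, 2) == 0          # high bits shifted out
assert shiftrightunsigned(-8, 28) == 15  # logical, not arithmetic, shift
```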

Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613221638 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -610,8 +610,42 @@ public void testFindInSet() throws SparkException {

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
dbatomic commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613213985 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1612779931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -279,6 +280,7 @@ abstract class Optimizer(catalogManager: CatalogManager)

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2129082503 merged to master

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng closed pull request #46733: [SPARK-48412][PYTHON] Refactor data type json parse URL: https://github.com/apache/spark/pull/46733

Re: [PR] [MINOR][SQL] Remove outdated `TODO`s from `UnsafeHashedRelation` [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613184450 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -409,9 +407,6 @@ private[joins] class UnsafeHashedRelation( val

Re: [PR] [SPARK-48413][SQL] ALTER COLUMN with collation [spark]

2024-05-24 Thread via GitHub
olaky commented on code in PR #46734: URL: https://github.com/apache/spark/pull/46734#discussion_r1613167082 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to

Re: [PR] [MINOR][SQL] Remove an outdated `TODO` from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on PR #46736: URL: https://github.com/apache/spark/pull/46736#issuecomment-2129074503 > How about removing the `TODO(josh)` at L412 [done](https://github.com/apache/spark/pull/9127/files#diff-127291a0287f790755be5473765ea03eb65f8b58b9ec0760955f124e21e3452f)

Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]

2024-05-24 Thread via GitHub
zml1206 commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2129064493 > Can we make `With` nested as well? I have thought about it for a long time, and I can change the logic of withRewrite. The `alias` generated by the lowest layer with is in

Re: [PR] [MINOR][SQL] Remove an outdated `TODO` from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46736: URL: https://github.com/apache/spark/pull/46736#issuecomment-2129046316 How about removing the `TODO(josh)` at L412

Re: [PR] [SPARK-48396] Support configuring max cores can be used for SQL [spark]

2024-05-24 Thread via GitHub
yabola commented on PR #46713: URL: https://github.com/apache/spark/pull/46713#issuecomment-2129043469 @cloud-fan could you take a look, thank you~ This is useful in a shared SQL cluster and makes it easier to control SQL workloads. The picture below shows that the cores used are consistent.

Re: [PR] [SPARK-48352][SQL] Set max file counter through Spark conf [spark]

2024-05-24 Thread via GitHub
guixiaowen commented on PR #46668: URL: https://github.com/apache/spark/pull/46668#issuecomment-2129040807 @LuciferYang Can you review this?

Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]

2024-05-24 Thread via GitHub
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613136311 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46704: URL: https://github.com/apache/spark/pull/46704#issuecomment-2129013813 Merged to master. Thank you @panbingkun

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
yaooqinn closed pull request #46704: [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version URL: https://github.com/apache/spark/pull/46704

Re: [PR] [MINOR][CORE] Remove an outdated TODO from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613125740 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -396,8 +396,6 @@ private[joins] class UnsafeHashedRelation( val

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613124606 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager:

[PR] [MINOR][CORE] Remove an outdated TODO from UnsafeHashedRelation [spark]

2024-05-24 Thread via GitHub
LuciferYang opened a new pull request, #46736: URL: https://github.com/apache/spark/pull/46736 ### What changes were proposed in this pull request? This pr remove an outdated TODO from `UnsafeHashedRelation`: ``` // TODO(josh): This needs to be revisited before we merge this

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
panbingkun commented on code in PR #46704: URL: https://github.com/apache/spark/pull/46704#discussion_r1613116527 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MariaDBKrbIntegrationSuite.scala: ## @@ -38,7 +38,7 @@ class

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
panbingkun commented on PR #46704: URL: https://github.com/apache/spark/pull/46704#issuecomment-2128980038 cc @dongjoon-hyun @yaooqinn

Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]

2024-05-24 Thread via GitHub
panbingkun commented on code in PR #46704: URL: https://github.com/apache/spark/pull/46704#discussion_r1613092135 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresKrbIntegrationSuite.scala: ## @@ -38,7 +38,7 @@ class

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613088183 ## common/utils/src/main/resources/error/error-conditions.json: ## Review Comment: > We seem to lack a UT case related to `_LEGACY_ERROR_TEMP_1054`

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on code in PR #46733: URL: https://github.com/apache/spark/pull/46733#discussion_r1613083917 ## python/pyspark/sql/types.py: ## @@ -1756,13 +1756,45 @@ def toJson(self, zone_id: str = "UTC") -> str: TimestampNTZType, NullType, VariantType,
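As background for the data-type JSON parse refactor above, a common way to restructure such parsing is to replace an if/elif chain with a dispatch table keyed by type name. Below is a minimal, self-contained sketch of that pattern — plain Python with simplified stand-in classes, not the actual `pyspark.sql.types` code; the class and field names are illustrative only.

```python
# Dispatch-table parsing of a Spark-like data-type JSON tree.
# Simplified stand-ins for pyspark.sql.types classes; illustrative only.

class DataType:
    def __eq__(self, other): return type(self) is type(other)
    def __repr__(self): return type(self).__name__

class StringType(DataType): pass
class LongType(DataType): pass

class ArrayType(DataType):
    def __init__(self, element): self.element = element
    def __eq__(self, other):
        return isinstance(other, ArrayType) and self.element == other.element

# Atomic types resolve through one lookup table instead of an if/elif chain.
_ATOMIC = {"string": StringType, "long": LongType}

def parse_type(json_value):
    if isinstance(json_value, str):           # e.g. "string"
        return _ATOMIC[json_value]()
    if json_value.get("type") == "array":     # e.g. {"type": "array", ...}
        return ArrayType(parse_type(json_value["elementType"]))
    raise ValueError(f"unsupported type: {json_value!r}")

t = parse_type({"type": "array", "elementType": "long"})
```

A table-driven parser like this is a pure refactoring in the sense discussed in the review: adding a new atomic type means adding one table entry, with no change to the control flow or to the parsed results.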

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
zhengruifeng commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2128905041 > Is this just refactoring or causing any behaviour change? this is just refactoring

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
panbingkun commented on PR #46731: URL: https://github.com/apache/spark/pull/46731#issuecomment-2128896002 > cc @MaxGekk @panbingkun FYI also cc @cloud-fan

Re: [PR] [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` [spark]

2024-05-24 Thread via GitHub
LuciferYang commented on PR #46695: URL: https://github.com/apache/spark/pull/46695#issuecomment-2128895356 Merged into master for Spark 4.0. Thanks @panbingkun @dongjoon-hyun @hasnain-db and @pan3793 ~

Re: [PR] [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` [spark]

2024-05-24 Thread via GitHub
LuciferYang closed pull request #46695: [SPARK-48384][BUILD] Exclude `io.netty:netty-tcnative-boringssl-static` from `zookeeper` URL: https://github.com/apache/spark/pull/46695

Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
yaooqinn commented on PR #46735: URL: https://github.com/apache/spark/pull/46735#issuecomment-2128887663 cc @gengliangwang thanks

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
HyukjinKwon commented on code in PR #46733: URL: https://github.com/apache/spark/pull/46733#discussion_r1613069878 ## python/pyspark/sql/types.py: ## @@ -1756,13 +1756,45 @@ def toJson(self, zone_id: str = "UTC") -> str: TimestampNTZType, NullType, VariantType, +

[PR] [SPARK-47579][SQL][FOLLOWUP] Restore the print format of spark sql shell [spark]

2024-05-24 Thread via GitHub
yaooqinn opened a new pull request, #46735: URL: https://github.com/apache/spark/pull/46735 ### What changes were proposed in this pull request? Restore the print format of spark sql shell ### Why are the changes needed? bugfix ### Does this PR introduce

Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]

2024-05-24 Thread via GitHub
HyukjinKwon commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2128885962 Is this just refactoring or causing any behaviour change?

Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]

2024-05-24 Thread via GitHub
liangyouze commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2128884766 > same problem when using spark operator, it's weird why the code does not throw anything when configmap is not created When using spark-submit, there is no output in the

Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]

2024-05-24 Thread via GitHub
panbingkun commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613064843 ## common/utils/src/main/resources/error/error-conditions.json: ## Review Comment: We seem to lack a UT case related to `_LEGACY_ERROR_TEMP_1054`
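For context on the error-class migration discussed above: entries in `common/utils/src/main/resources/error/error-conditions.json` pair a named error condition with a parameterized message and a SQLSTATE. A purely illustrative example of the shape such an entry takes — the condition name, message text, and SQLSTATE below are made up for illustration and are not the actual replacement chosen for `_LEGACY_ERROR_TEMP_1054`:

```json
"UNSUPPORTED_EXAMPLE_OPERATION": {
  "message": [
    "The operation <operation> is not supported for <relationName>."
  ],
  "sqlState": "0A000"
}
```

Assigning a name like this (instead of a `_LEGACY_ERROR_TEMP_*` placeholder) is what makes the error testable: a unit test can then assert on the condition name and its message parameters rather than on raw message text.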
