Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db commented on PR #46806: URL: https://github.com/apache/spark/pull/46806#issuecomment-2139508692 @yaooqinn is this considered green? All CIs that were run passed, but most of them were skipped. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-30 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1620668890 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -1,54 +1,79 @@ -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit Server VM

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-30 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1620662883 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -1,54 +1,79 @@ -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-30 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1620663435 ## sql/core/benchmarks/CollationBenchmark-results.txt: ## @@ -1,54 +1,79 @@ -OpenJDK 64-Bit Server VM 17.0.11+9-LTS on Linux 6.5.0-1018-azure +OpenJDK 64-Bit

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
yaooqinn commented on PR #46806: URL: https://github.com/apache/spark/pull/46806#issuecomment-2139427708 Thank you for the update. Would you mind raising another PR that targets the master branch? Then, if CI is green here, we can merge that one together with the backports.

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db commented on PR #46806: URL: https://github.com/apache/spark/pull/46806#issuecomment-2139410598 @yaooqinn Switched base branch in new PR. Does this PR description seem plausible?

[PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db opened a new pull request, #46806: URL: https://github.com/apache/spark/pull/46806 ### What changes were proposed in this pull request? Removal of stripMargin from the code in `DockerJDBCIntegrationV2Suite`. ### Why are the changes needed?

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db closed pull request #46803: [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects URL: https://github.com/apache/spark/pull/46803

Re: [PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
yaooqinn commented on PR #46803: URL: https://github.com/apache/spark/pull/46803#issuecomment-2139375639 Can you change the target branch to 3.5 and elaborate a bit more about the issue and solution?

[PR] [WIP][SPARK-48172][SQL] Fix escaping issues in JDBCDialects [spark]

2024-05-30 Thread via GitHub
mihailom-db opened a new pull request, #46803: URL: https://github.com/apache/spark/pull/46803 ### What changes were proposed in this pull request? Fix for daily job failure on 3.4 branch. ### Why are the changes needed? Tests failing. ### Does this PR introduce

[PR] [WIP][SPARK-48280][SQL] Add Expression Walker for Testing [spark]

2024-05-30 Thread via GitHub
mihailom-db opened a new pull request, #46801: URL: https://github.com/apache/spark/pull/46801 ### What changes were proposed in this pull request? Addition of a test. ### Why are the changes needed? Collations introduced a lot of changes to many functions and this test aims

[PR] [WIP][SPARK-47415][SQL] Add collation support for Levenshtein expression [spark]

2024-05-29 Thread via GitHub
uros-db opened a new pull request, #46788: URL: https://github.com/apache/spark/pull/46788 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on PR #46772: URL: https://github.com/apache/spark/pull/46772#issuecomment-2136550324 As noted in one of the resolved comments - there's no loss of coverage. However, some tests have been (temporarily) removed because `StringTrim` no longer supports UNICODE collation

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1618200968 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -994,20 +996,6 @@ public void testStringTrim() throws SparkException {

Re: [PR] [WIP] [SPARK-48442] [PYSPARK] Add parenthesis to awaitTermination [spark]

2024-05-28 Thread via GitHub
riyaverm-db commented on PR #46779: URL: https://github.com/apache/spark/pull/46779#issuecomment-2136209502 @chaoqin-li1123

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
mkaravel commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1617945708 ## common/unsafe/src/test/scala/org/apache/spark/unsafe/types/CollationFactorySuite.scala: ## @@ -48,7 +48,7 @@ class CollationFactorySuite extends AnyFunSuite with

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-05-28 Thread via GitHub
pkotikalapudi commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2135858753 > hey guys, which version will this land in? We have to get reviews and approvals from PMC members and our Shepherd (@HeartSaVioR) before setting a timeline on when it can be

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-28 Thread via GitHub
uros-db commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2134820952 We can leave `PandasMode` for a separate PR, but we'll definitely need to take care of it at one point now that you've explored various options and finished the `groupMapReduce`

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on PR #46772: URL: https://github.com/apache/spark/pull/46772#issuecomment-2134806618 another note for future reference: as per [ICU docs](https://unicode-org.github.io/icu/userguide/collation/concepts.html), we've decided to stick with **TERTIARY** collation strength for

Re: [PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db commented on code in PR #46772: URL: https://github.com/apache/spark/pull/46772#discussion_r1616816500 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationFactory.java: ## @@ -245,12 +245,12 @@ public CollationIdentifier identifier() {

[PR] [WIP][SPARK-48435][SQL] UNICODE collation should not support binary equality [spark]

2024-05-28 Thread via GitHub
uros-db opened a new pull request, #46772: URL: https://github.com/apache/spark/pull/46772 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-27 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1615951326 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-26 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2132241154 @uros-db ?

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-05-26 Thread via GitHub
stym06 commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-2132173624 hey guys, which version will this land in?

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) [spark]

2024-05-24 Thread via GitHub
GideonPotok closed pull request #46404: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) URL: https://github.com/apache/spark/pull/46404

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419 @uros-db I forgot but should I add collation support to `org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`? The only difference will be 1. Support for null

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613363048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613361676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613358004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613351378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes:

[PR] [WIP][SPARK-48410][SQL] Fix InitCap expression for UTF8_BINARY_LCASE & ICU collations [spark]

2024-05-24 Thread via GitHub
uros-db opened a new pull request, #46732: URL: https://github.com/apache/spark/pull/46732 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612994451 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -208,87 +208,99 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower & Upper expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612993851 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -208,87 +208,99 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612949455 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612937162 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612936409 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationSupport.java: ## @@ -208,87 +208,99 @@ public static boolean execICU(final UTF8String l,

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-24 Thread via GitHub
uros-db commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612933590 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48403][SQL] Fix Lower, Upper, InitCap expressions for UTF8_BINARY_LCASE collation [spark]

2024-05-23 Thread via GitHub
mkaravel commented on code in PR #46720: URL: https://github.com/apache/spark/pull/46720#discussion_r1612654646 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -135,22 +135,90 @@ public static UTF8String

[PR] [WIP][SPARK-48318][SQL] Enable hash join support for all collations (complex types) [spark]

2024-05-23 Thread via GitHub
uros-db opened a new pull request, #46722: URL: https://github.com/apache/spark/pull/46722 ### What changes were proposed in this pull request? Enable collation support for hash join on complex types. - Logical plan is rewritten in analysis to (recursively) replace all

Re: [PR] [WIP][SPARK-47353][SQL][Prototype of alternative algorithm] Enable collation support for the Mode expression using multiple experimental approaches [spark]

2024-05-23 Thread via GitHub
GideonPotok closed pull request #46488: [WIP][SPARK-47353][SQL][Prototype of alternative algorithm] Enable collation support for the Mode expression using multiple experimental approaches URL: https://github.com/apache/spark/pull/46488

Re: [PR] [WIP][SPARK-48397][SQL] Add data write time metric to FileFormatDataWriter [spark]

2024-05-23 Thread via GitHub
jiwen624 commented on code in PR #46714: URL: https://github.com/apache/spark/pull/46714#discussion_r1611925610 ## sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala: ## @@ -837,7 +837,8 @@ class SQLMetricsSuite extends SharedSparkSession with

[PR] [WIP][SPARK-48397][SQL] Add data write time metric to FileFormatDataWriter [spark]

2024-05-23 Thread via GitHub
jiwen624 opened a new pull request, #46714: URL: https://github.com/apache/spark/pull/46714 ### What changes were proposed in this pull request? For FileFormatDataWriter we currently record metrics of "task commit time" and "job commit time" in

[PR] [WIP][SPARK-48392][CORE] Also load `spark-defaults.conf` when provided `--properties-file` [spark]

2024-05-22 Thread via GitHub
sunchao opened a new pull request, #46709: URL: https://github.com/apache/spark/pull/46709 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-22 Thread via GitHub
GideonPotok commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1610561627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,22 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2123419758 @uros-db I have made changes for all but your latest suggestion (re whitelists -- will add that soon)

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608866089 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,22 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608865653 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,22 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608247812 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,18 +102,56 @@ case class Mode( if (buffer.isEmpty) {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608240127 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608242763 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-21 Thread via GitHub
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1608236375 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,33 @@ case class Mode( override def inputTypes:

[PR] [WIP][SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]

2024-05-21 Thread via GitHub
uros-db opened a new pull request, #46682: URL: https://github.com/apache/spark/pull/46682 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-20 Thread via GitHub
mkaravel commented on PR #46589: URL: https://github.com/apache/spark/pull/46589#issuecomment-2121389075 Please update the PR description.

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-20 Thread via GitHub
mkaravel commented on PR #46511: URL: https://github.com/apache/spark/pull/46511#issuecomment-2121384473 Please fill in the PR description.

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-20 Thread via GitHub
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2121332325 @uros-db I agree that we should avoid auxiliary structures. And I don't see a good way to move the changes to implementation of `merge` and `update` without keeping an auxiliary

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-19 Thread via GitHub
GideonPotok closed pull request #46526: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce URL: https://github.com/apache/spark/pull/46526

[PR] [WIP] [spark]

2024-05-17 Thread via GitHub
ericm-db opened a new pull request, #46644: URL: https://github.com/apache/spark/pull/46644 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47227][FOLLOW][DOCS] Building Extensions [spark]

2024-05-17 Thread via GitHub
nchammas commented on code in PR #45340: URL: https://github.com/apache/spark/pull/45340#discussion_r1605122743 ## docs/spark-connect-extending.md: ## @@ -0,0 +1,248 @@ +--- +layout: global +title: Extending Spark Connect with Custom Functionality +license: | + Licensed to the

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604485097 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -709,12 +774,24 @@ public void testLocate() throws SparkException {

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604482878 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -709,12 +774,24 @@ public void testLocate() throws SparkException {

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1603739103 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604439268 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-17 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1604437739 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604352238 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1604346112 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

Re: [PR] [WIP][Spark 44646] Reduce usage of log4j core [spark]

2024-05-16 Thread via GitHub
github-actions[bot] closed pull request #45001: [WIP][Spark 44646] Reduce usage of log4j core URL: https://github.com/apache/spark/pull/45001

Re: [PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubstringIndex) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46589: URL: https://github.com/apache/spark/pull/46589#discussion_r1603752983 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -278,47 +431,29 @@ public static UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (Contains, StartsWith, EndsWith, StringLocate) [spark]

2024-05-16 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1603716669 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,143 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
cloud-fan commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603420522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteCollationJoin.scala: ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603381135 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -1051,6 +1052,153 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603377917 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603376505 ## sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala: ## @@ -784,19 +785,19 @@ class CollationSuite extends DatasourceV2SQLBase with

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603374449 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -397,7 +398,11 @@ trait JoinSelectionHelper extends Logging { protected

Re: [PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-16 Thread via GitHub
dbatomic commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1603367540 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/CollationKey.scala: ## @@ -0,0 +1,59 @@ +/* + * Licensed to the Apache Software Foundation

[PR] [WIP][SPARK-48298][Core] Add TCP mode to StatsD sink [spark]

2024-05-15 Thread via GitHub
jiwen624 opened a new pull request, #46604: URL: https://github.com/apache/spark/pull/46604 ### What changes were proposed in this pull request? Working on it... ### Why are the changes needed? As mentioned in the Jira ticket: https://issues.apache.org/jira/browse/SPARK-48298

Re: [PR] [WIP][Spark 44646] Reduce usage of log4j core [spark]

2024-05-15 Thread via GitHub
github-actions[bot] commented on PR #45001: URL: https://github.com/apache/spark/pull/45001#issuecomment-2113687236 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[PR] [WIP][SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]

2024-05-15 Thread via GitHub
uros-db opened a new pull request, #46599: URL: https://github.com/apache/spark/pull/46599 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [V2] [spark]

2024-05-15 Thread via GitHub
GideonPotok opened a new pull request, #46597: URL: https://github.com/apache/spark/pull/46597 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[PR] [WIP][SPARK-48281][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringInStr, SubStringIndex) [spark]

2024-05-15 Thread via GitHub
uros-db opened a new pull request, #46589: URL: https://github.com/apache/spark/pull/46589 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
GideonPotok commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600668920 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## @@ -22,7 +22,7 @@ import

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1600495650 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600471419 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## Review Comment: all things considered, I would say proceed with this approach - clean everything

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600468023 ## sql/core/src/test/scala/org/apache/spark/sql/CollationStringExpressionsSuite.scala: ## Review Comment: "mode" is not a StringExpression let's move the

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600463245 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## Review Comment: apropos altering the benchmark to yield better results for this particular

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600457296 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## Review Comment: apropos going through lower first, we need to be careful so as not to destroy the

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub
GideonPotok commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1600041876 ## sql/core/benchmarks/CollationBenchmark-jdk21-results.txt: ## Review Comment: 0. Note, by the way that because we are relying on supportsBinaryEquality,

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1600292083 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-14 Thread via GitHub

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599873181 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -102,20 +102,30 @@ public void testContains() throws SparkException {

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599772157 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599736471 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -183,6 +204,19 @@ public static int findInSet(final UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599689228 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,27 @@ * Utility class for collation-aware UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
uros-db commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599693830 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,27 @@ * Utility class for collation-aware UTF8String

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599455730 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,27 @@ * Utility class for collation-aware

Re: [PR] [WIP][SPARK-48221][SQL] Alter string search logic for UTF8_BINARY_LCASE collation [spark]

2024-05-14 Thread via GitHub
mkaravel commented on code in PR #46511: URL: https://github.com/apache/spark/pull/46511#discussion_r1599447728 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -102,20 +102,30 @@ public void testContains() throws SparkException {

Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]

2024-05-13 Thread via GitHub
uros-db commented on code in PR #46526: URL: https://github.com/apache/spark/pull/46526#discussion_r1598850667 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,6 +78,18 @@ case class Mode( if (buffer.isEmpty) {
