[PR] [SPARK-48421][SQL] SPJ: Add documentation [spark]
szehon-ho opened a new pull request, #46745: URL: https://github.com/apache/spark/pull/46745

### What changes were proposed in this pull request?
Add docs for SPJ.

### Why are the changes needed?
There are no docs describing SPJ, even though it is mentioned in the migration notes: https://github.com/apache/spark/pull/46673

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Checked the new text.

### Was this patch authored or co-authored using generative AI tooling?
No
Re: [PR] [SPARK-48329][SQL] Enable `spark.sql.sources.v2.bucketing.pushPartValues.enabled` by default [spark]
szehon-ho commented on PR #46673: URL: https://github.com/apache/spark/pull/46673#issuecomment-2130960125 @dongjoon-hyun do you have any guidance on how we could create a new Spark 4.0+ tracking JIRA to link all the new SPJ items? I think it would be nice to have all of them in one list.
Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the `scala/java` doc [spark]
panbingkun commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1614383361

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: Okay, I have adopted the plan of deleting `README.md`.
Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]
Ngone51 commented on PR #46706: URL: https://github.com/apache/spark/pull/46706#issuecomment-2130842872

@dongjoon-hyun @mridulm Sorry, can we make it a bug and backport it to the maintenance release branches? This actually caused an issue for us internally. I was pushing a quick fix before realizing it is the root cause. The issue leads to shuffle fetch failures and, ultimately, job failure. It happens this way:

1) Stage A computes partition P0 by task t1 (TID, a.k.a. mapId) on executor e1
2) Executor Y starts decommission
3) Executor Y reports a false-positive loss to the driver during its decommission
4) Stage B reuses the shuffle dependency of Stage A and computes partition P0 again by task t2 on executor e2
5) When task t2 finishes, we see two items ((t1 -> P0), (t2 -> P0)) for the same partition in `mapIdToMapIndex` but only one item (mapStatuses(P0)=MapStatus(t2, e2)) in `mapStatuses`.
6) Executor Y starts to migrate task t1's map status (to executor e3, for example) and calls `updateMapOutput` on the driver. Because of 5), we'd use the mapId (i.e., t1) to get the mapIndex (i.e., P0) and use P0 to get task t2's map status.
```scala
// updateMapOutput
val mapIndex = mapIdToMapIndex.get(mapId)
val mapStatusOpt = mapIndex.map(mapStatuses(_)).flatMap(Option(_))
```
7) Task t2's map status location would then be updated to executor e3, but it is in fact still located on executor e2. This finally leads to the fetch failure in the end.
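The sketch below (added for illustration) shows the kind of cleanup the PR title describes, using simplified stand-in structures rather than Spark's actual `MapOutputTracker`/`ShuffleStatus` internals; the class and method names are hypothetical. The idea is that unregistering a map output must also drop its stale `mapId` entry, so a later `updateMapOutput(mapId, ...)` cannot resolve an old `mapId` to a partition that has since been recomputed by a different task.

```scala
import scala.collection.concurrent.TrieMap

// Simplified stand-ins: mapStatuses is indexed by partition (map index),
// while mapIdToMapIndex maps a task's mapId (its TID) to that index.
case class MapStatusSketch(mapId: Long, execId: String)

class ShuffleStatusSketch(numPartitions: Int) {
  val mapStatuses = new Array[MapStatusSketch](numPartitions)
  val mapIdToMapIndex = TrieMap.empty[Long, Int]

  def addMapOutput(mapIndex: Int, status: MapStatusSketch): Unit = {
    mapStatuses(mapIndex) = status
    mapIdToMapIndex(status.mapId) = mapIndex
  }

  // Cleanup on unregister: drop the stale mapId entry together with the map
  // status, so (t1 -> P0) and (t2 -> P0) can never coexist after P0 has been
  // recomputed by a different task.
  def removeMapOutput(mapIndex: Int): Unit = {
    val old = mapStatuses(mapIndex)
    if (old != null) {
      mapIdToMapIndex.remove(old.mapId)
      mapStatuses(mapIndex) = null
    }
  }

  // Migration path from the scenario above: with the stale entry removed,
  // an update keyed by a dead task's mapId simply becomes a no-op.
  def updateMapOutput(mapId: Long, newExecId: String): Unit = {
    mapIdToMapIndex.get(mapId).foreach { mapIndex =>
      val status = mapStatuses(mapIndex)
      if (status != null && status.mapId == mapId) {
        mapStatuses(mapIndex) = status.copy(execId = newExecId)
      }
    }
  }
}
```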
Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the `scala/java` doc [spark]
mridulm commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1614353283

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: IIRC @panbingkun included the contents in the existing code, which is also a nice addition to the code documentation. IMO that is a better overall approach given this is internal to Spark.
[PR] [SPARK-48420][BUILD] Upgrade netty to `4.1.110.Final` [spark]
panbingkun opened a new pull request, #46744: URL: https://github.com/apache/spark/pull/46744

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
Re: [PR] [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library [spark]
cloud-fan closed pull request #46743: [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library URL: https://github.com/apache/spark/pull/46743
Re: [PR] [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library [spark]
cloud-fan commented on PR #46743: URL: https://github.com/apache/spark/pull/46743#issuecomment-2130735901 thanks for review, merging to master!
Re: [PR] [SPARK-47579][CORE][PART3] Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on PR #46724: URL: https://github.com/apache/spark/pull/46724#issuecomment-2130729394 @zeotuan I was aware of this PR when I worked on https://github.com/apache/spark/pull/46739, since you were not responding to the JIRA comment. Please resolve the merge conflicts and I will review and merge this one.
Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang closed pull request #46739: [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework URL: https://github.com/apache/spark/pull/46739
Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130728636 @cloud-fan @dongjoon-hyun thanks for the review. Merging to master.
Re: [PR] [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library [spark]
cloud-fan commented on PR #46743: URL: https://github.com/apache/spark/pull/46743#issuecomment-2130727208 cc @HyukjinKwon @dongjoon-hyun
Re: [PR] [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library [spark]
cloud-fan commented on code in PR #46743: URL: https://github.com/apache/spark/pull/46743#discussion_r1614307758

## dev/create-release/release-util.sh:
@@ -128,6 +128,9 @@ function get_release_info { RC_COUNT=1 fi + if [ "$GIT_BRANCH" = "master" ]; then +RELEASE_VERSION="$RELEASE_VERSION-preview1"

Review Comment: This is an unrelated change, but it gives a better prompt for preview releases.
[PR] [SPARK-48239][INFRA][FOLLOWUP] install the missing `jq` library [spark]
cloud-fan opened a new pull request, #46743: URL: https://github.com/apache/spark/pull/46743

### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/46534 . We missed the `jq` library, which is needed to create git tags.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Manual.

### Was this patch authored or co-authored using generative AI tooling?
No
Re: [PR] [SPARK-48419][SQL] Foldable propagation replace foldable column shoul… [spark]
KnightChess closed pull request #46742: [SPARK-48419][SQL] Foldable propagation replace foldable column shoul… URL: https://github.com/apache/spark/pull/46742
Re: [PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]
HyukjinKwon commented on PR #46738: URL: https://github.com/apache/spark/pull/46738#issuecomment-2130648241 I think we should list all behaviour changes. `typeName` is sort of an API too.
[PR] [SPARK-48419][SQL] Foldable propagation replace foldable column shoul… [spark]
KnightChess opened a new pull request, #46742: URL: https://github.com/apache/spark/pull/46742 …d use origin column name

### What changes were proposed in this pull request?
Fix the optimizer rule `FoldablePropagation` so that it keeps the original column name instead of changing it.

### Why are the changes needed?
Fix a bug.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added UT.

### Was this patch authored or co-authored using generative AI tooling?
No
Re: [PR] Collations proof of concept [spark]
github-actions[bot] commented on PR #44537: URL: https://github.com/apache/spark/pull/44537#issuecomment-2130548277 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-47041] PushDownUtils should support not only FileScanBuilder but any SupportsPushDownCatalystFilters [spark]
github-actions[bot] commented on PR #45099: URL: https://github.com/apache/spark/pull/45099#issuecomment-2130548267 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
Re: [PR] [SPARK-47579][CORE][PART3] Migrate logInfo with variables to structured logging framework [spark]
zeotuan commented on PR #46724: URL: https://github.com/apache/spark/pull/46724#issuecomment-2130535309 @gengliangwang Please help review this. I will merge this after #46739
[PR] [SPARK-48416][SQL] Support related nested WITH expression [spark]
zml1206 opened a new pull request, #46741: URL: https://github.com/apache/spark/pull/46741

### What changes were proposed in this pull request?
Refactor the `RewriteWithExpression` logic to support related nested `WITH` expressions. Generated `Project` order:
1. internally nested `WITH` of the main expression's definitions
2. main expression
3. internally nested `WITH` of the main expression

### Why are the changes needed?
Optimize nested `WITH` to apply to more scenarios, such as predicate pushdown.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit test.

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-48292[CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]
viirya commented on PR #46696: URL: https://github.com/apache/spark/pull/46696#issuecomment-2130496440 Looks good to me.
Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) [spark]
GideonPotok closed pull request #46404: [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using a scala TreeMap (RB Tree) URL: https://github.com/apache/spark/pull/46404
Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]
dongjoon-hyun commented on PR #46706: URL: https://github.com/apache/spark/pull/46706#issuecomment-2130495121 Merged to master only because this was defined as `Improvement`. (Screenshot: https://github.com/apache/spark/assets/9700541/e6901f63-cfb8-491b-9368-a840493befdc)
Re: [PR] [WIP] Don't review: E2e [spark]
GideonPotok closed pull request #46670: [WIP] Don't review: E2e URL: https://github.com/apache/spark/pull/46670
Re: [PR] [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister [spark]
dongjoon-hyun closed pull request #46706: [SPARK-48394][CORE] Cleanup mapIdToMapIndex on mapoutput unregister URL: https://github.com/apache/spark/pull/46706
Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]
dongjoon-hyun commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130487265 Pending CIs.
Re: [PR] [SPARK-48407][SQL][DOCS] Teradata: Document Type Conversion rules between Spark SQL and teradata [spark]
dongjoon-hyun closed pull request #46728: [SPARK-48407][SQL][DOCS] Teradata: Document Type Conversion rules between Spark SQL and teradata URL: https://github.com/apache/spark/pull/46728
Re: [PR] [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess [spark]
dongjoon-hyun commented on PR #46641: URL: https://github.com/apache/spark/pull/46641#issuecomment-2130485187 Merged to master for Apache Spark 4.0.0. Thank you, @bozhang2820 and all.
Re: [PR] [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess [spark]
dongjoon-hyun closed pull request #46641: [SPARK-48325][CORE] Always specify messages in ExecutorRunner.killProcess URL: https://github.com/apache/spark/pull/46641
Re: [PR] [SPARK-48292[CORE] Revert [SPARK-39195][SQL] Spark OutputCommitCoordinator should abort stage when committed file not consistent with task status [spark]
cloud-fan commented on PR #46696: URL: https://github.com/apache/spark/pull/46696#issuecomment-2130363916 can we also revert https://github.com/apache/spark/pull/46562 in this PR?
Re: [PR] [SPARK-47008][CORE] Added Hadoops fileSystems hasPathCapability check to avoid FileNotFoundException(s) when using S3 Express One Zone Storage. [spark]
leovegas commented on code in PR #46678: URL: https://github.com/apache/spark/pull/46678#discussion_r1613994258

## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala:
@@ -62,10 +62,10 @@ object OrcUtils extends Logging { val CATALYST_TYPE_ATTRIBUTE_NAME = "spark.sql.catalyst.type" def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = { +// TODO: Check if the paths coming in are already qualified and simplify. val origPath = new Path(pathStr) val fs = origPath.getFileSystem(conf) val paths = SparkHadoopUtil.get.listLeafStatuses(fs, origPath) - .filterNot(_.isDirectory)

Review Comment: `listFiles` inside `listLeafStatuses` returns `FileStatus` entries for files only, so the removed `.filterNot(_.isDirectory)` was redundant.
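For context on the removed filter: a recursive leaf listing only ever returns file statuses, never directories, so filtering out directories afterwards is a no-op. The sketch below illustrates that idea only; it is not the actual `SparkHadoopUtil.listLeafStatuses` implementation, and the path in the usage comment is made up.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Recursively collect leaf FileStatus entries: directories are only descended
// into, never returned, so a trailing .filterNot(_.isDirectory) removes nothing.
def listLeafStatusesSketch(fs: FileSystem, base: Path): Seq[FileStatus] = {
  val (dirs, files) = fs.listStatus(base).toSeq.partition(_.isDirectory)
  files ++ dirs.flatMap(d => listLeafStatusesSketch(fs, d.getPath))
}

// Hypothetical usage:
// val path = new Path("/warehouse/orc_table")
// val fs = path.getFileSystem(new Configuration())
// val leaves = listLeafStatusesSketch(fs, path)  // files only, no directories
```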
Re: [PR] [SPARK-47008][CORE] Added Hadoops fileSystems hasPathCapability check to avoid FileNotFoundException(s) when using S3 Express One Zone Storage. [spark]
leovegas commented on code in PR #46678: URL: https://github.com/apache/spark/pull/46678#discussion_r1613992518

## sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala:
@@ -125,16 +125,4 @@ private[hive] object OrcFileOperator extends Logging { path: String, conf: Option[Configuration]): Option[StructObjectInspector] = { getFileReader(path, conf).map(_.getObjectInspector.asInstanceOf[StructObjectInspector]) } - - def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {

Review Comment: Removed this method because it is a duplicate of https://github.com/apache/spark/blob/99213d7123a88e1fc79bc73b898c4971c78f37ff/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L64
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
cloud-fan commented on code in PR #46580: URL: https://github.com/apache/spark/pull/46580#discussion_r1613945984

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveIdentifierClause.scala:
@@ -20,19 +20,23 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.sql.catalyst.expressions.{AliasHelper, EvalHelper, Expression} import org.apache.spark.sql.catalyst.parser.CatalystSqlParser import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan -import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor} import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER import org.apache.spark.sql.types.StringType /** * Resolves the identifier expressions and builds the original plans/expressions. */ -object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper { +class ResolveIdentifierClause(earlyBatches: Seq[RuleExecutor[LogicalPlan]#Batch]) + extends Rule[LogicalPlan] with AliasHelper with EvalHelper { override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning( _.containsAnyPattern(UNRESOLVED_IDENTIFIER)) { case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved => - p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr)) + val executor = new RuleExecutor[LogicalPlan] {

Review Comment: We only need to create the `RuleExecutor` instance once in this rule.
Re: [PR] [SPARK-48380][CORE][WIP]: SerDeUtil.javaToPython to support batchSize parameter [spark]
JoshRosen commented on code in PR #46697: URL: https://github.com/apache/spark/pull/46697#discussion_r1613915291

## core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala:
@@ -104,12 +104,40 @@ private[spark] object SerDeUtil extends Logging { } } + /** + * Use a fixed batch size + */ + private[spark] class FixedBatchedPickler(iter: Iterator[Any], batch: Int) extends Iterator[Array[Byte]] {

Review Comment: Looks like Scalastyle is complaining about line length, so you might need to wrap this like
```suggestion
  private[spark] class FixedBatchedPickler(
      iter: Iterator[Any], batch: Int) extends Iterator[Array[Byte]] {
```
Re: [PR] [SPARK-48391][CORE]Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613904865

## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala:
@@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = { val tm = new TaskMetrics -for (acc <- accums) { +val (innerAccums, externalAccums) = accums. + partition(t => t.name.isDefined && tm.nameToAccums.contains(t.name.get))

Review Comment: Just thinking aloud here: Do we do a lot of intermediate collection allocations in this `.partition` operation itself? E.g. under the hood, would we be resizing an array builder or doing a bunch of linked-list construction operations?

I am wondering whether instead it might be faster (or at least more predictable / easier to reason about) to construct a Java ArrayList / ArrayBuffer, imperatively append the external accumulators to it, then do a single addAll call at the end, e.g. something like

```scala
def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = {
  val tm = new TaskMetrics
  val externalAccums = new java.util.ArrayList[AccumulatorV2[Any, Any]]()
  for (acc <- accums) {
    val name = acc.name
    if (name.isDefined && tm.nameToAccums.contains(name.get)) {
      val tmAcc = tm.nameToAccums(name.get).asInstanceOf[AccumulatorV2[Any, Any]]
      tmAcc.metadata = acc.metadata
      tmAcc.merge(acc.asInstanceOf[AccumulatorV2[Any, Any]])
    } else {
      externalAccums.add(acc)
    }
  }
  tm._externalAccums.addAll(externalAccums)
  tm
}
```

This is less net code churn and is probably either comparable in performance or faster, and avoids any performance unpredictability as a function of the Seq type. Given that this code runs in a hot path, I think the use of imperative mutation vs. a more pure functional approach is fine.
Re: [PR] [SPARK-48391][CORE]Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613896716

## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala:
@@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = { val tm = new TaskMetrics -for (acc <- accums) { +val (innerAccums, externalAccums) = accums.

Review Comment:
> tm.nameToAccums is a fixed size LinkedHashMap, why do we need create another local variable?

I think the concern was that each call to `.nameToAccums` under the hood goes through a `lazy val`, which is slightly more expensive to access than an ordinary field. On the other hand, I think the old and new code are performing the same total number of `nameToAccums` accesses:

- Before, we'd call `tm.nameToAccums` once per named accumulator to check whether it was a task metric, then a second time for each task metric to retrieve the actual value.
- In the new code we perform the same calls but in a different sequence: we now do all of the initial calls in a batch, then do the "retrieve the actual accum" calls in a second pass.

Given that we're performing the same total number of calls, I don't think that the new code represents a perf. regression w.r.t. those accesses, so the suggestion of storing `val nameToAccums = tm.nameToAccums` would just be a small performance optimization rather than a perf. regression fix.
Re: [PR] [SPARK-48391][CORE]Using addAll instead of add function in fromAccumulatorInfos method of TaskMetrics Class [spark]
JoshRosen commented on code in PR #46705: URL: https://github.com/apache/spark/pull/46705#discussion_r1613891588

## core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala:
@@ -328,16 +328,15 @@ private[spark] object TaskMetrics extends Logging { */ def fromAccumulators(accums: Seq[AccumulatorV2[_, _]]): TaskMetrics = { val tm = new TaskMetrics -for (acc <- accums) { +val (innerAccums, externalAccums) = accums. + partition(t => t.name.isDefined && tm.nameToAccums.contains(t.name.get)) +for (acc <- innerAccums) { val name = acc.name - if (name.isDefined && tm.nameToAccums.contains(name.get)) { -val tmAcc = tm.nameToAccums(name.get).asInstanceOf[AccumulatorV2[Any, Any]] -tmAcc.metadata = acc.metadata -tmAcc.merge(acc.asInstanceOf[AccumulatorV2[Any, Any]]) - } else { -tm._externalAccums.add(acc) - } + val tmAcc = tm.nameToAccums(name.get).asInstanceOf[AccumulatorV2[Any, Any]] + tmAcc.metadata = acc.metadata + tmAcc.merge(acc.asInstanceOf[AccumulatorV2[Any, Any]]) } +tm._externalAccums.addAll(externalAccums.asJava)

Review Comment: Yeah, it looks like `CopyOnWriteArrayList.addAll` performs at most one array copy: https://github.com/AdoptOpenJDK/openjdk-jdk12u/blame/master/src/java.base/share/classes/java/util/concurrent/CopyOnWriteArrayList.java#L724-L751
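A small sketch of the point being made, relying only on the documented behavior of `java.util.concurrent.CopyOnWriteArrayList` (the element values are arbitrary): every `add` copies the entire backing array, so appending n elements one at a time does O(n^2) element copies in total, while a single `addAll` copies the array at most once.

```scala
import java.util.concurrent.CopyOnWriteArrayList
import scala.jdk.CollectionConverters._

val external = Seq("acc1", "acc2", "acc3")

// One full array copy per element: n copies for n appends.
val perElement = new CopyOnWriteArrayList[String]()
external.foreach(perElement.add)

// At most one array copy for the whole batch.
val batched = new CopyOnWriteArrayList[String]()
batched.addAll(external.asJava)
```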
Re: [PR] [MINOR][SQL] Remove outdated `TODO`s from `UnsafeHashedRelation` [spark]
JoshRosen commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613884389

## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala:
@@ -409,9 +407,6 @@ private[joins] class UnsafeHashedRelation( val pageSizeBytes = Option(SparkEnv.get).map(_.memoryManager.pageSizeBytes) .getOrElse(new SparkConf().get(BUFFER_PAGESIZE).getOrElse(16L * 1024 * 1024)) -// TODO(josh): We won't need this dummy memory manager after future refactorings; revisit

Review Comment: These are ancient TODOs, but my recollection is that they were related to some limitations in storage memory management related to transferring memory between execution and storage: at the time (and even now) I don't think we had a reliable way to transfer task-allocated memory pages into a shared context where they can be used by multiple tasks and counted towards storage memory, hence some of the hacks here.

I'm not firmly opposed to removing TODOs like this, but I also don't particularly understand the motivation for doing so: sure, it's a bit of clutter in the code, but if we're not replacing them with a newer comment or making a code change then it just seems like cleanup for cleanup's sake, which I am generally opposed to for code churn / review bandwidth reasons.
Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613880314

## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala:
@@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager) table.schema.findNestedField(Seq(colName), resolver = conf.resolver) .map(_._2.dataType) .getOrElse { -throw QueryCompilationErrors.alterColumnCannotFindColumnInV1TableError( - quoteIfNeeded(colName), table) +throw QueryCompilationErrors.unresolvedColumnError(

Review Comment: Would it be better if we could provide users with table and context information?
Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2130135726 Can we add a test? Basically any default column value that is not foldable, like `current_date()`, can trigger this bug.
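One possible shape for such a test, sketched under assumptions: the test name, table definition, and choice of `current_date()` as the non-foldable default are illustrative rather than taken from the PR, and the usual `QueryTest`/`SQLTestUtils` helpers (`test`, `withTable`, `sql`, `checkAnswer`, `Row`) are assumed to be in scope.

```scala
// Hypothetical regression test for a column with a non-foldable default value.
test("SPARK-48286: create and insert into a table with a non-foldable default") {
  withTable("t") {
    sql("CREATE TABLE t (id INT, dt DATE DEFAULT current_date()) USING parquet")
    // Omitting dt should fill in the default without an analysis error.
    sql("INSERT INTO t (id) VALUES (1)")
    checkAnswer(sql("SELECT id FROM t WHERE dt IS NOT NULL"), Row(1))
  }
}
```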
Re: [PR] [SPARK-48286] Fix analysis and creation of column with exists default expression [spark]
cloud-fan commented on PR #46594: URL: https://github.com/apache/spark/pull/46594#issuecomment-2130132833 Can you retry the GitHub Actions job? It seems flaky.
Re: [PR] [SPARK-48320][CORE][DOCS] Add external third-party ecosystem access guide to the `scala/java` doc [spark]
gengliangwang commented on code in PR #46634: URL: https://github.com/apache/spark/pull/46634#discussion_r1613860819

## common/utils/src/main/scala/org/apache/spark/internal/README.md:

Review Comment: I will wait for @mridulm's response until this weekend.
Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613124606

## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala:
@@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager) table.schema.findNestedField(Seq(colName), resolver = conf.resolver) .map(_._2.dataType) .getOrElse { -throw QueryCompilationErrors.alterColumnCannotFindColumnInV1TableError( - quoteIfNeeded(colName), table) +throw QueryCompilationErrors.unresolvedColumnError(

Review Comment:
> Can we reuse `UNRESOLVED_COLUMN.WITH_SUGGESTION`?

For the scenario of `ALTER COLUMN`, since there is one table, I think it is OK; for `resolveFieldNames` in `Analyzer`, I think it would be better if the table can be explicitly specified.
[PR] [SPARK-48411][SS][PYTHON] Add E2E test for DropDuplicateWithinWatermark [spark]
eason-yuchen-liu opened a new pull request, #46740: URL: https://github.com/apache/spark/pull/46740

### What changes were proposed in this pull request?
This PR adds a test for API DropDuplicateWithinWatermark, which was previously missing.

### Why are the changes needed?
Check the correctness of API DropDuplicateWithinWatermark.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Passed:
```
python/run-tests --testnames pyspark.sql.tests.streaming.test_streaming
python/run-tests --testnames pyspark.sql.tests.connect.streaming.test_parity_streaming
```

### Was this patch authored or co-authored using generative AI tooling?
No.
Re: [PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang commented on PR #46739: URL: https://github.com/apache/spark/pull/46739#issuecomment-2130009429 cc @panbingkun @zeotuan
[PR] [SPARK-47579][CORE][PART3] Spark core: Migrate logInfo with variables to structured logging framework [spark]
gengliangwang opened a new pull request, #46739: URL: https://github.com/apache/spark/pull/46739

### What changes were proposed in this pull request?
The PR aims to migrate logInfo calls with variables in the Core module to the structured logging framework.

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
GA tests

### Was this patch authored or co-authored using generative AI tooling?
Yes, Generated-by: GPT-4
Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell [spark]
gengliangwang closed pull request #46735: [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell URL: https://github.com/apache/spark/pull/46735
Re: [PR] [SPARK-47579][SQL][FOLLOWUP] Restore the `--help` print format of spark sql shell [spark]
gengliangwang commented on PR #46735: URL: https://github.com/apache/spark/pull/46735#issuecomment-2129973290 Merging to master.
Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]
GideonPotok commented on PR #46597: URL: https://github.com/apache/spark/pull/46597#issuecomment-2129923419

@uros-db I forgot to ask: should I add collation support to `org.apache.spark.sql.catalyst.expressions.aggregate.PandasMode`? The only differences will be:
1. Support for null keys (thus StringType won't necessarily mean all values in the buffer are UTF8String; some might just be null, right?)
2. PandasMode returns a list of all values that are tied for the mode. In that case, should all the values be present? E.g. if you have the pandas_mode of ['a', 'a', 'a', 'b', 'b', 'B'] with utf_binary_lcase collation, what do you think pandas_mode should return?

If we want to support PandasMode, I can do a little research on what other databases seem to favor for this type of question.
Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]
cloud-fan commented on code in PR #46440: URL: https://github.com/apache/spark/pull/46440#discussion_r1613651414

## connector/connect/common/src/test/resources/query-tests/explain-results/function_shiftleft.explain:
@@ -1,2 +1,2 @@
-Project [shiftleft(cast(b#0 as int), 2) AS shiftleft(b, 2)#0]
+Project [(cast(b#0 as int) << 2) AS (b << 2)#0]

Review Comment: Can we keep the pretty string unchanged? If people explicitly use the `shiftleft` function, then the generated alias name should still be `shiftleft`.
Re: [PR] [SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier [spark]
nikolamand-db commented on PR #46580: URL: https://github.com/apache/spark/pull/46580#issuecomment-2129615940

> I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify?

Checked locally, seems like these changes don't resolve the issue:
```
spark-sql (default)> explain extended WITH S(c1, c2) AS (VALUES(1, 2), (2, 3)), T(c1, c2) AS (VALUES ('a', 'b'), ('c', 'd')) SELECT IDENTIFIER('max')(IDENTIFIER('c1')) FROM IDENTIFIER('T');
== Parsed Logical Plan ==
CTE [S, T]
:  :- 'SubqueryAlias S
:  :  +- 'UnresolvedSubqueryColumnAliases [c1, c2]
:  :     +- 'UnresolvedInlineTable [col1, col2], [[1, 2], [2, 3]]
:  +- 'SubqueryAlias T
:     +- 'UnresolvedSubqueryColumnAliases [c1, c2]
:        +- 'UnresolvedInlineTable [col1, col2], [[a, b], [c, d]]
+- 'Project [unresolvedalias(expressionwithunresolvedidentifier(max, expressionwithunresolvedidentifier(c1, ), org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$1453/0x000401c95138@312ddc51))]
   +- 'PlanWithUnresolvedIdentifier T, org.apache.spark.sql.catalyst.parser.AstBuilder$$Lambda$1420/0x000401c5e688@330f3046

== Analyzed Logical Plan ==
org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column, variable, or function parameter with name `c1` cannot be resolved. Did you mean one of the following? [`Z`, `s`]. SQLSTATE: 42703; line 1 pos 129;
'WithCTE
:- CTERelationDef 0, false
:  +- SubqueryAlias S
:     +- Project [col1#5 AS c1#9, col2#6 AS c2#10]
:        +- LocalRelation [col1#5, col2#6]
:- CTERelationDef 1, false
:  +- SubqueryAlias T
:     +- Project [col1#7 AS c1#11, col2#8 AS c2#12]
:        +- LocalRelation [col1#7, col2#8]
+- 'Project [unresolvedalias('max('c1))]
   +- SubqueryAlias spark_catalog.default.t
      +- Relation spark_catalog.default.t[Z#13,s#14] parquet
```
`PlanWithUnresolvedIdentifier` gets resolved, but it tries to use table `t` from the catalog instead of the CTE definition.
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613161561

## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4:
@@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean double_quoted_identifiers = false; } +compoundOrSingleStatement +: singleStatement +| singleCompound +; + +singleCompound +: compound SEMICOLON* EOF +; + +compound +: BEGIN compoundBody END +; + +// Last semicolon in body is optional. +compoundBody

Review Comment: As Milan said, we will reuse `compoundBody` in the future in the blocks that don't require `BEGIN` and `END` (if/else, while, etc.). One more reason is that in such blocks (that don't begin with `BEGIN`) variable declarations are not allowed, so it is easier to have these cases separated for the sake of additional checks in visitor functions.
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613448325 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +return lowercaseMatchLengthFrom(target, lowercasePattern, startPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that starts with the specified + * prefix, starting from the specified position (0-based index referring to character position + * in UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * prefix is already lowercased. The method only considers the part of target string that + * starts from the specified (inclusive) position (that is, the method does not look at UTF8 + * characters of the target string at or after position `endPos`). If the prefix is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return length of the target substring that ends with the specified prefix in lowercase + */ + public static int lowercaseMatchLengthFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int len = 0; len <= target.numChars() - startPos; ++len) { + if (target.substring(startPos, startPos + len).toLowerCase().equals(lowercasePattern)) { +return len; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns the position of the first occurrence of the pattern string in the target string, + * starting from the specified position (0-based index referring to character position in + * UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * pattern string is already lowercased prior to call. If the pattern is not found, + * MATCH_NOT_FOUND is returned. 
+ * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return the position of the first occurrence of pattern in target + */ + public static int lowercaseFind( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int i = startPos; i <= target.numChars(); ++i) { + if (lowercaseMatchFrom(target, lowercasePattern, i)) { +return i; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns whether the target string ends with the specified suffix, ending at the specified + * position (0-based index referring to character position in UTF8String), with respect to the + * UTF8_BINARY_LCASE collation. The method assumes that the suffix is already lowercased prior + * to method call to avoid the overhead of calling .toLowerCase() multiple times on the same + * suffix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param endPos the end position for searching (in the target string) + * @return whether the target string ends with the specified suffix in lowercase + */ + public static boolean lowercaseMatchUntil( + final UTF8String target, + final UTF8String lowercasePattern, + int endPos) { +return lowercaseMatchLengthUntil(target, lowercasePattern, endPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that ends with the specified + * suffix, ending at the specified position (0-based index refer
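A self-contained `java.lang.String` sketch (not the Spark code above, which operates on `UTF8String` character positions) of why the matched length in the target can differ from the pattern's length:

```java
import java.util.Locale;

public class LowercaseMatchLengthSketch {
  public static void main(String[] args) {
    String target = "\u0130abc";          // "İabc" -- 'İ' is a single code point
    String lowercasePattern = "i\u0307";  // "i̇" -- two code points, already lowercased
    for (int len = 0; len <= target.length(); ++len) {
      if (target.substring(0, len).toLowerCase(Locale.ROOT).equals(lowercasePattern)) {
        // Prints 1: a one-character prefix of the target matches a two-character pattern.
        System.out.println("match length in target: " + len);
      }
    }
  }
}
```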
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613447350 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +return lowercaseMatchLengthFrom(target, lowercasePattern, startPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that starts with the specified + * prefix, starting from the specified position (0-based index referring to character position + * in UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * prefix is already lowercased. The method only considers the part of target string that + * starts from the specified (inclusive) position (that is, the method does not look at UTF8 + * characters of the target string at or after position `endPos`). If the prefix is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return length of the target substring that ends with the specified prefix in lowercase + */ + public static int lowercaseMatchLengthFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int len = 0; len <= target.numChars() - startPos; ++len) { + if (target.substring(startPos, startPos + len).toLowerCase().equals(lowercasePattern)) { +return len; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns the position of the first occurrence of the pattern string in the target string, + * starting from the specified position (0-based index referring to character position in + * UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * pattern string is already lowercased prior to call. If the pattern is not found, + * MATCH_NOT_FOUND is returned. 
+ * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return the position of the first occurrence of pattern in target + */ + public static int lowercaseFind( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int i = startPos; i <= target.numChars(); ++i) { + if (lowercaseMatchFrom(target, lowercasePattern, i)) { +return i; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns whether the target string ends with the specified suffix, ending at the specified + * position (0-based index referring to character position in UTF8String), with respect to the + * UTF8_BINARY_LCASE collation. The method assumes that the suffix is already lowercased prior + * to method call to avoid the overhead of calling .toLowerCase() multiple times on the same + * suffix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param endPos the end position for searching (in the target string) + * @return whether the target string ends with the specified suffix in lowercase + */ + public static boolean lowercaseMatchUntil( + final UTF8String target, + final UTF8String lowercasePattern, + int endPos) { +return lowercaseMatchLengthUntil(target, lowercasePattern, endPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that ends with the specified + * suffix, ending at the specified position (0-based index refer
Re: [PR] [SPARK-48413][SQL] ALTER COLUMN with collation [spark]
johanl-db commented on code in PR #46734: URL: https://github.com/apache/spark/pull/46734#discussion_r1613401422 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( val newDataSchema = table.dataSchema.fields.map { field => if (field.name == originColumn.name) { // Create a new column from the origin column with the new comment. +val newField = field.copy(dataType = newColumn.dataType) Review Comment: Can you make it more explicit that we effectively allow the data type to change now but we only allow type changes that differ by collation? E.g. by splitting `columnEqual` into checking that names match on one side and that the type change is supported on the other + add a ` withNewType` method. ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( val newDataSchema = table.dataSchema.fields.map { field => if (field.name == originColumn.name) { // Create a new column from the origin column with the new comment. +val newField = field.copy(dataType = newColumn.dataType) Review Comment: Lines 359-367: the comment needs to be updated to be clear we now allow changing collation. Side note: it gives the hive-style syntax for ALTER COLUMN but that's only one of the two syntaxes, see: https://github.com/apache/spark/blob/master/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L130 The other one is the one (partially) documented and the more capable/preferred one: https://spark.apache.org/docs/3.5.1/sql-ref-syntax-ddl-alter-table.html#parameters-4 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
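A rough sketch of the suggested split (the object and method names here are hypothetical, not the actual Spark code):

```scala
import org.apache.spark.sql.types.{DataType, StringType, StructField}

object AlterColumnChecks {
  // Name matching stays as before, delegated to the session's resolver.
  def sameName(field: StructField, other: StructField, resolver: (String, String) => Boolean): Boolean =
    resolver(field.name, other.name)

  // Only a collation-only change between two string types is accepted;
  // any other data type change is still rejected.
  def canChangeType(from: DataType, to: DataType): Boolean = (from, to) match {
    case (_: StringType, _: StringType) => true
    case _ => from == to
  }
}
```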
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613408156 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +return lowercaseMatchLengthFrom(target, lowercasePattern, startPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that starts with the specified + * prefix, starting from the specified position (0-based index referring to character position + * in UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * prefix is already lowercased. The method only considers the part of target string that + * starts from the specified (inclusive) position (that is, the method does not look at UTF8 + * characters of the target string at or after position `endPos`). If the prefix is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return length of the target substring that ends with the specified prefix in lowercase + */ + public static int lowercaseMatchLengthFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int len = 0; len <= target.numChars() - startPos; ++len) { + if (target.substring(startPos, startPos + len).toLowerCase().equals(lowercasePattern)) { +return len; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns the position of the first occurrence of the pattern string in the target string, + * starting from the specified position (0-based index referring to character position in + * UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * pattern string is already lowercased prior to call. If the pattern is not found, + * MATCH_NOT_FOUND is returned. 
+ * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return the position of the first occurrence of pattern in target + */ + public static int lowercaseFind( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int i = startPos; i <= target.numChars(); ++i) { + if (lowercaseMatchFrom(target, lowercasePattern, i)) { +return i; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns whether the target string ends with the specified suffix, ending at the specified + * position (0-based index referring to character position in UTF8String), with respect to the + * UTF8_BINARY_LCASE collation. The method assumes that the suffix is already lowercased prior + * to method call to avoid the overhead of calling .toLowerCase() multiple times on the same + * suffix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param endPos the end position for searching (in the target string) + * @return whether the target string ends with the specified suffix in lowercase + */ + public static boolean lowercaseMatchUntil( Review Comment: yes, this will need to be a public method (changes in https://github.com/apache/spark/pull/46511) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613407352 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +return lowercaseMatchLengthFrom(target, lowercasePattern, startPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that starts with the specified + * prefix, starting from the specified position (0-based index referring to character position + * in UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * prefix is already lowercased. The method only considers the part of target string that + * starts from the specified (inclusive) position (that is, the method does not look at UTF8 + * characters of the target string at or after position `endPos`). If the prefix is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return length of the target substring that ends with the specified prefix in lowercase + */ + public static int lowercaseMatchLengthFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +assert startPos >= 0; +for (int len = 0; len <= target.numChars() - startPos; ++len) { + if (target.substring(startPos, startPos + len).toLowerCase().equals(lowercasePattern)) { +return len; + } +} +return MATCH_NOT_FOUND; + } + + /** + * Returns the position of the first occurrence of the pattern string in the target string, + * starting from the specified position (0-based index referring to character position in + * UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * pattern string is already lowercased prior to call. If the pattern is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return the position of the first occurrence of pattern in target + */ + public static int lowercaseFind( Review Comment: this one can go private -- This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613404598 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( + final UTF8String target, + final UTF8String lowercasePattern, + int startPos) { +return lowercaseMatchLengthFrom(target, lowercasePattern, startPos) != MATCH_NOT_FOUND; + } + + /** + * Returns the length of the substring of the target string that starts with the specified + * prefix, starting from the specified position (0-based index referring to character position + * in UTF8String), with respect to the UTF8_BINARY_LCASE collation. The method assumes that the + * prefix is already lowercased. The method only considers the part of target string that + * starts from the specified (inclusive) position (that is, the method does not look at UTF8 + * characters of the target string at or after position `endPos`). If the prefix is not found, + * MATCH_NOT_FOUND is returned. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return length of the target substring that ends with the specified prefix in lowercase + */ + public static int lowercaseMatchLengthFrom( Review Comment: this one can go private -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613401443 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -34,6 +34,155 @@ * Utility class for collation-aware UTF8String operations. */ public class CollationAwareUTF8String { + + /** + * The constant value to indicate that the match is not found when searching for a pattern + * string in a target string. + */ + private static final int MATCH_NOT_FOUND = -1; + + /** + * Returns whether the target string starts with the specified prefix, starting from the + * specified position (0-based index referring to character position in UTF8String), with respect + * to the UTF8_BINARY_LCASE collation. The method assumes that the prefix is already lowercased + * prior to method call to avoid the overhead of calling .toLowerCase() multiple times on the + * same prefix string. + * + * @param target the string to be searched in + * @param lowercasePattern the string to be searched for + * @param startPos the start position for searching (in the target string) + * @return whether the target string starts with the specified prefix in UTF8_BINARY_LCASE + */ + public static boolean lowercaseMatchFrom( Review Comment: yes, this will need to be a public method (changes in https://github.com/apache/spark/pull/46511) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613389337 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean double_quoted_identifiers = false; } +compoundOrSingleStatement +: singleStatement +| singleCompound +; + +singleCompound +: compound SEMICOLON* EOF +; + +compound +: BEGIN compoundBody END +; + +// Last semicolon in body is optional. +compoundBody +: compoundStatement (SEMICOLON+ compoundStatement)* SEMICOLON* Review Comment: Talked with Serge offline. His ref spec says that `;` is single and mandatory and needs to be present even in the latest statement in the block. This is more restrictive, but the idea is that we can easily make it less restrictive if we need it in the future. So now, we have only `(compoundStatement SEMICOLON)*` which allows empty `BEGIN END` blocks, but that's also per ref spec. We can discuss further if we want this changed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
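For reference, the stricter rule described above would look roughly like this (illustrative, not necessarily the final grammar):

```antlr
compoundBody
    : (compoundStatement SEMICOLON)*   // every statement, including the last, ends with a single ';'
    ;                                  // zero repetitions => an empty BEGIN END block is allowed
```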
Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613391451 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -17,15 +17,136 @@ package org.apache.spark.unsafe.types; import org.apache.spark.SparkException; +import org.apache.spark.sql.catalyst.util.CollationAwareUTF8String; import org.apache.spark.sql.catalyst.util.CollationFactory; import org.apache.spark.sql.catalyst.util.CollationSupport; import org.junit.jupiter.api.Test; import static org.junit.jupiter.api.Assertions.*; - +// checkstyle.off: AvoidEscapedUnicodeCharacters public class CollationSupportSuite { + /** + * Collation-aware UTF8String comparison. + */ + + private void assertStringCompare(String s1, String s2, String collationName, int expected) + throws SparkException { +UTF8String l = UTF8String.fromString(s1); +UTF8String r = UTF8String.fromString(s2); +int compare = CollationFactory.fetchCollation(collationName).comparator.compare(l, r); +assertEquals(Integer.signum(expected), Integer.signum(compare)); + } + + @Test + public void testCompare() throws SparkException { +// Edge cases +assertStringCompare("", "", "UTF8_BINARY", 0); +assertStringCompare("a", "", "UTF8_BINARY", 1); +assertStringCompare("", "a", "UTF8_BINARY", -1); +assertStringCompare("", "", "UTF8_BINARY_LCASE", 0); +assertStringCompare("a", "", "UTF8_BINARY_LCASE", 1); +assertStringCompare("", "a", "UTF8_BINARY_LCASE", -1); +assertStringCompare("", "", "UNICODE", 0); +assertStringCompare("a", "", "UNICODE", 1); +assertStringCompare("", "a", "UNICODE", -1); +assertStringCompare("", "", "UNICODE_CI", 0); +assertStringCompare("a", "", "UNICODE_CI", 1); +assertStringCompare("", "a", "UNICODE_CI", -1); +// Basic tests +assertStringCompare("AbCd", "aBcD", "UTF8_BINARY", -1); +assertStringCompare("ABCD", "abcd", "UTF8_BINARY_LCASE", 0); +assertStringCompare("AbcD", "aBCd", "UNICODE", 1); +assertStringCompare("abcd", "ABCD", "UNICODE_CI", 0); +// Accent variation +assertStringCompare("aBćD", "ABĆD", "UTF8_BINARY", 1); +assertStringCompare("AbCδ", "ABCΔ", "UTF8_BINARY_LCASE", 0); +assertStringCompare("äBCd", "ÄBCD", "UNICODE", -1); +assertStringCompare("Ab́cD", "AB́CD", "UNICODE_CI", 0); +// Case-variable character length +assertStringCompare("i\u0307", "İ", "UTF8_BINARY", -1); +assertStringCompare("İ", "i\u0307", "UTF8_BINARY", 1); +assertStringCompare("i\u0307", "İ", "UTF8_BINARY_LCASE", 0); +assertStringCompare("İ", "i\u0307", "UTF8_BINARY_LCASE", 0); +assertStringCompare("i\u0307", "İ", "UNICODE", -1); +assertStringCompare("İ", "i\u0307", "UNICODE", 1); +assertStringCompare("i\u0307", "İ", "UNICODE_CI", 0); +assertStringCompare("İ", "i\u0307", "UNICODE_CI", 0); +assertStringCompare("i\u0307İ", "i\u0307İ", "UTF8_BINARY_LCASE", 0); +assertStringCompare("i\u0307İ", "İi\u0307", "UTF8_BINARY_LCASE", 0); +assertStringCompare("İi\u0307", "i\u0307İ", "UTF8_BINARY_LCASE", 0); +assertStringCompare("İi\u0307", "İi\u0307", "UTF8_BINARY_LCASE", 0); +assertStringCompare("i\u0307İ", "i\u0307İ", "UNICODE_CI", 0); +assertStringCompare("i\u0307İ", "İi\u0307", "UNICODE_CI", 0); +assertStringCompare("İi\u0307", "i\u0307İ", "UNICODE_CI", 0); +assertStringCompare("İi\u0307", "İi\u0307", "UNICODE_CI", 0); +// Conditional case mapping +assertStringCompare("ς", "σ", "UTF8_BINARY", -1); +assertStringCompare("ς", "Σ", "UTF8_BINARY", 1); +assertStringCompare("σ", "Σ", "UTF8_BINARY", 1); +assertStringCompare("ς", "σ", "UTF8_BINARY_LCASE", 0); +assertStringCompare("ς", "Σ", "UTF8_BINARY_LCASE", 0); 
+assertStringCompare("σ", "Σ", "UTF8_BINARY_LCASE", 0); +assertStringCompare("ς", "σ", "UNICODE", 1); +assertStringCompare("ς", "Σ", "UNICODE", 1); +assertStringCompare("σ", "Σ", "UNICODE", -1); +assertStringCompare("ς", "σ", "UNICODE_CI", 0); +assertStringCompare("ς", "Σ", "UNICODE_CI", 0); +assertStringCompare("σ", "Σ", "UNICODE_CI", 0); + } + + private void assertLcaseCompare(String target, String expected, String collationName) + throws SparkException { +if (collationName.equals("UTF8_BINARY")) { + UTF8String targetUTF8 = UTF8String.fromString(target); + UTF8String expectedUTF8 = UTF8String.fromString(expected); + assertEquals(expectedUTF8, targetUTF8.toLowerCase()); +} else if (collationName.equals("UTF8_BINARY_LCASE")) { + assertEquals(expected, CollationAwareUTF8String.lowerCaseCodePoints(target)); +} else { + int collationId = CollationFactory.collationNameToId(collationName); + assertEquals(expected, CollationAwareUTF8String.toLowerCase(target, collationId)); +} + } + + @Test + public void testLcaseCompare() throws SparkException { +// Edge cases +assertLcaseCompare(""
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613386426 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean double_quoted_identifiers = false; } +compoundOrSingleStatement +: singleStatement +| singleCompound +; + +singleCompound +: compound SEMICOLON* EOF +; + +compound Review Comment: Renamed to `beginEndCompoundBlock`. Added the `Compound` part so it's clear that it's related to compound statements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613384389 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.parser + +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan + +sealed trait CompoundPlanStatement Review Comment: Done. ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/SqlScriptingLogicalOperators.scala: ## @@ -0,0 +1,35 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.parser + +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan + +sealed trait CompoundPlanStatement + +// Statement that is supposed to be executed against Spark. +// This can also be a Spark expression that is wrapped in a statement. +case class SparkStatementWithPlan( +parsedPlan: LogicalPlan, +sourceStart: Int, Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]
zhengruifeng commented on code in PR #46738: URL: https://github.com/apache/spark/pull/46738#discussion_r1613379691 ## python/pyspark/sql/types.py: ## @@ -120,9 +120,8 @@ def __eq__(self, other: Any) -> bool: def __ne__(self, other: Any) -> bool: return not self.__eq__(other) -@classmethod -def typeName(cls) -> str: -return cls.__name__[:-4].lower() +def typeName(self) -> str: +return self.__class__.__name__.removesuffix("Type").removesuffix("UDT").lower() Review Comment: follow Scala side https://github.com/apache/spark/blob/6f6b4860268dc250d8e31a251d740733798aa512/sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala#L58-L62 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
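A rough illustration of what the instance-level `typeName` gives (outputs inferred from the implementation shown above, not verified against a Spark build):

```python
from pyspark.sql.types import DecimalType, IntegerType

IntegerType().typeName()       # 'integer'  ("IntegerType" -> strip "Type" -> lower())
DecimalType(10, 2).typeName()  # 'decimal' from the base implementation; a parameterized type
                               # can now override typeName() per instance, e.g. 'decimal(10,2)'
```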
[PR] [SPARK-48415][PYTHON] Refactor `TypeName` to support parameterized datatypes [spark]
zhengruifeng opened a new pull request, #46738: URL: https://github.com/apache/spark/pull/46738 ### What changes were proposed in this pull request? 1, refactor `TypeName` to support parameterized datatypes 2, remove redundant simpleString/jsonValue methods, since they are type name by default. ### Why are the changes needed? to be consistent with scala side ### Does this PR introduce _any_ user-facing change? no, the `typeName` was only internally used in tree string now ### How was this patch tested? ci ### Was this patch authored or co-authored using generative AI tooling? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613207707 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -147,6 +162,45 @@ public static String toLowerCase(final String target, final int collationId) { return UCharacter.toLowerCase(locale, target); } + /** + * Converts a single code point to lowercase using ICU rules, with special handling for + * conditional case mappings (i.e. characters that map to multiple characters in lowercase). Review Comment: ah, so "conditional case mapping" == "context-sensitivity" instead, what I wanted to say is just "one-to-many case mapping" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
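A quick `java.lang.String` illustration of the terminology being sorted out here (plain JDK behavior, not Spark code): conditional case mapping is context-sensitive, while the case handled in this method is the one-to-many kind:

```java
import java.util.Locale;

public class ConditionalCaseMapping {
  public static void main(String[] args) {
    // Context-sensitive (conditional) mapping: capital sigma lowercases differently at word end.
    System.out.println("\u039F\u03A3".toLowerCase(Locale.ROOT)); // "ος" -- final sigma
    System.out.println("\u03A3\u039F".toLowerCase(Locale.ROOT)); // "σο" -- medial sigma
    // One-to-many mapping (the case discussed above) is different: a single code point such as
    // 'İ' (U+0130) lowercases to two code points, 'i' + U+0307.
  }
}
```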
Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613363048 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) { return null } - +val collationAwareBuffer = child.dataType match { + case c: StringType if +!CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality => +val collationId = c.collationId +val modeMap = buffer.toSeq.groupMapReduce { + case (key: String, _) => +CollationFactory.getCollationKey(UTF8String.fromString(key), collationId) + case (key: UTF8String, _) => +CollationFactory.getCollationKey(key, collationId) + case (key, _) => key Review Comment: also, what case exactly does this catch? if `case c: StringType` is already matched, then we should be able to do `CollationFactory.getCollationKey(key.asInstanceOf[UTF8String], collationId)` right away? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613361676 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) { return null } - +val collationAwareBuffer = child.dataType match { + case c: StringType if +!CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality => +val collationId = c.collationId +val modeMap = buffer.toSeq.groupMapReduce { + case (key: String, _) => +CollationFactory.getCollationKey(UTF8String.fromString(key), collationId) + case (key: UTF8String, _) => +CollationFactory.getCollationKey(key, collationId) Review Comment: could we combine these with something like: `CollationFactory.getCollationKey(key.asInstanceOf[UTF8String], collationId)` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
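A toy `groupMapReduce` illustration (not the actual `Mode` code) of how entries that share a grouping key get collapsed; `toLowerCase` stands in for `CollationFactory.getCollationKey`:

```scala
val counts = Seq(("a", 2L), ("A", 3L), ("b", 1L))
val merged = counts.groupMapReduce { case (k, _) => k.toLowerCase } { case (_, c) => c } (_ + _)
// merged == Map("a" -> 5, "b" -> 1)
```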
Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613358004 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -74,16 +90,29 @@ case class Mode( if (buffer.isEmpty) { return null } - +val collationAwareBuffer = child.dataType match { + case c: StringType if +!CollationFactory.fetchCollation(c.collationId).supportsBinaryEquality => +val collationId = c.collationId +val modeMap = buffer.toSeq.groupMapReduce { + case (key: String, _) => +CollationFactory.getCollationKey(UTF8String.fromString(key), collationId) + case (key: UTF8String, _) => +CollationFactory.getCollationKey(key, collationId) Review Comment: ```suggestion CollationFactory.getCollationKey(key.asInstanceOf[UTF8String], collationId) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [WIP][SPARK-47353][SQL] Enable collation support for the Mode expression using GroupMapReduce [spark]
uros-db commented on code in PR #46597: URL: https://github.com/apache/spark/pull/46597#discussion_r1613351378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Mode.scala: ## @@ -48,6 +49,21 @@ case class Mode( override def inputTypes: Seq[AbstractDataType] = Seq(AnyDataType) + override def checkInputDataTypes(): TypeCheckResult = { +val defaultCheck = super.checkInputDataTypes() +if (defaultCheck.isFailure) { + defaultCheck +} else if (UnsafeRowUtils.isBinaryStable(child.dataType) || + child.dataType.isInstanceOf[StringType]) { + defaultCheck Review Comment: ```suggestion TypeCheckResult.TypeCheckSuccess ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-44050][K8S]add retry config when creating Kubernetes resources. [spark]
imtzer commented on PR #45911: URL: https://github.com/apache/spark/pull/45911#issuecomment-2129285181

> > same problem when using spark operator, it's weird why the code does not throw anything when configmap is not created
> > When using spark-submit, there is no error output in the console, and the client will show that the driver pod is always in the ContainerCreating state and will never end.

Maybe I met another issue: I use spark-submit in the spark operator pod with k8s mode, but sometimes the driver pod keeps getting stuck in the ContainerCreating state due to a missing ConfigMap, and the console output shows 'Killed'. I added some logging in KubernetesClientApplication.scala like this:

```
logInfo("before pod create, " + driverPodName)
var watch: Watch = null
var createdDriverPod: Pod = null
try {
  createdDriverPod = kubernetesClient.pods().inNamespace(conf.namespace).resource(resolvedDriverPod).create()
} catch {...}
logInfo("before pre resource refresh, " + driverPodName)
// Refresh all pre-resources' owner references
try {
  addOwnerReference(createdDriverPod, preKubernetesResources)
  kubernetesClient.resourceList(preKubernetesResources: _*).forceConflicts().serverSideApply()
} catch {...}
logInfo("before other resource, " + driverPodName)
// setup resources after pod creation, and refresh all resources' owner references
try {
  val otherKubernetesResources = resolvedDriverSpec.driverKubernetesResources ++ Seq(configMap)
  addOwnerReference(createdDriverPod, otherKubernetesResources)
  kubernetesClient.resourceList(otherKubernetesResources: _*).createOrReplace()
  logInfo("after other resource, " + driverPodName)
} catch {...}
```

The log did not show `after other resource` when spark-submit was 'Killed'; the ConfigMap that the driver pod needs had not been successfully created! Also, I checked for OOM using `dmesg`, and some process in the spark operator pod was killed because of OOM. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
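For reference, a typical way to spot such OOM kills on the node, as mentioned above (illustrative; the exact kernel log wording varies by kernel version):

```
dmesg | grep -iE 'killed process|out of memory'
```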
Re: [PR] [SPARK-48309][YARN]Stop am retry, in situations where some errors and retries may not be successful [spark]
guixiaowen commented on PR #46620: URL: https://github.com/apache/spark/pull/46620#issuecomment-2129261156 > Hi, @guixiaowen I need some time to think about it as it might break some existing workloads. > > Meantime, you can > > * Update the PR desc for better readability > * Update the PR desc according to the latest change > * Revise the doc of `spark.yarn.maxAppAttempts` @yaooqinn Ok, Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-48414][PYTHON] Fix breaking change in python's `fromJson` [spark]
stefankandic opened a new pull request, #46737: URL: https://github.com/apache/spark/pull/46737 ### What changes were proposed in this pull request? Fix breaking change in `fromJson` method by having default param values. ### Why are the changes needed? In order to not break clients that don't care about collations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48309][YARN]Stop am retry, in situations where some errors and retries may not be successful [spark]
yaooqinn commented on PR #46620: URL: https://github.com/apache/spark/pull/46620#issuecomment-2129183148 Hi, @guixiaowen I need some time to think about it as it might break some existing workloads. Meantime, you can - Update the PR desc for better readability - Update the PR desc according to the latest change - Revise the doc of `spark.yarn.maxAppAttempts` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]
ulysses-you closed pull request #46440: [SPARK-48168][SQL] Add bitwise shifting operators support URL: https://github.com/apache/spark/pull/46440 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48168][SQL] Add bitwise shifting operators support [spark]
ulysses-you commented on PR #46440: URL: https://github.com/apache/spark/pull/46440#issuecomment-2129180382 thanks, merged to master(4.0.0) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48282][SQL] Alter string search logic for UTF8_BINARY_LCASE collation (StringReplace, FindInSet) [spark]
uros-db commented on code in PR #46682: URL: https://github.com/apache/spark/pull/46682#discussion_r1613221638 ## common/unsafe/src/test/java/org/apache/spark/unsafe/types/CollationSupportSuite.java: ## @@ -610,8 +610,42 @@ public void testFindInSet() throws SparkException { assertFindInSet("界x", "test,大千,世,界X,大,千,世界", "UNICODE_CI", 4); assertFindInSet("界x", "test,大千,界Xx,世,界X,大,千,世界", "UNICODE_CI", 5); assertFindInSet("大", "test,大千,世,界X,大,千,世界", "UNICODE_CI", 5); +assertFindInSet("i̇", "İ", "UNICODE_CI", 1); +assertFindInSet("i", "İ", "UNICODE_CI", 0); +assertFindInSet("i̇", "i̇", "UNICODE_CI", 1); +assertFindInSet("i", "i̇", "UNICODE_CI", 0); +assertFindInSet("i̇", "İ,", "UNICODE_CI", 1); +assertFindInSet("i", "İ,", "UNICODE_CI", 0); +assertFindInSet("i̇", "i̇,", "UNICODE_CI", 1); +assertFindInSet("i", "i̇,", "UNICODE_CI", 0); +assertFindInSet("i̇", "ab,İ", "UNICODE_CI", 2); +assertFindInSet("i", "ab,İ", "UNICODE_CI", 0); +assertFindInSet("i̇", "ab,i̇", "UNICODE_CI", 2); +assertFindInSet("i", "ab,i̇", "UNICODE_CI", 0); +assertFindInSet("i̇", "ab,İ,12", "UNICODE_CI", 2); +assertFindInSet("i", "ab,İ,12", "UNICODE_CI", 0); +assertFindInSet("i̇", "ab,i̇,12", "UNICODE_CI", 2); +assertFindInSet("i", "ab,i̇,12", "UNICODE_CI", 0); assertFindInSet("i̇o", "ab,İo,12", "UNICODE_CI", 2); assertFindInSet("İo", "ab,i̇o,12", "UNICODE_CI", 2); +assertFindInSet("i̇", "İ", "UTF8_BINARY_LCASE", 1); +assertFindInSet("i", "İ", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "i̇", "UTF8_BINARY_LCASE", 1); +assertFindInSet("i", "i̇", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "İ,", "UTF8_BINARY_LCASE", 1); +assertFindInSet("i", "İ,", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "i̇,", "UTF8_BINARY_LCASE", 1); +assertFindInSet("i", "i̇,", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "ab,İ", "UTF8_BINARY_LCASE", 2); +assertFindInSet("i", "ab,İ", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "ab,i̇", "UTF8_BINARY_LCASE", 2); +assertFindInSet("i", "ab,i̇", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "ab,İ,12", "UTF8_BINARY_LCASE", 2); +assertFindInSet("i", "ab,İ,12", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇", "ab,i̇,12", "UTF8_BINARY_LCASE", 2); +assertFindInSet("i", "ab,i̇,12", "UTF8_BINARY_LCASE", 0); +assertFindInSet("i̇o", "ab,İo,12", "UTF8_BINARY_LCASE", 2); +assertFindInSet("İo", "ab,i̇o,12", "UTF8_BINARY_LCASE", 2); Review Comment: returns 3, as expected -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
dbatomic commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613213985 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean double_quoted_identifiers = false; } +compoundOrSingleStatement +: singleStatement +| singleCompound +; + +singleCompound +: compound SEMICOLON* EOF +; + +compound +: BEGIN compoundBody END +; + +// Last semicolon in body is optional. +compoundBody +: compoundStatement (SEMICOLON+ compoundStatement)* SEMICOLON* Review Comment: Yeah, in the POC we did what other languages do (e.g. in C++ you are allowed to say `int x = 12`). Also, for single statements we do allow trailing semicolons - `SELECT 1;` is valid. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48283][SQL] Modify string comparison for UTF8_BINARY_LCASE [spark]
uros-db commented on code in PR #46700: URL: https://github.com/apache/spark/pull/46700#discussion_r1613207707 ## common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/CollationAwareUTF8String.java: ## @@ -147,6 +162,45 @@ public static String toLowerCase(final String target, final int collationId) { return UCharacter.toLowerCase(locale, target); } + /** + * Converts a single code point to lowercase using ICU rules, with special handling for + * conditional case mappings (i.e. characters that map to multiple characters in lowercase). Review Comment: ah, so "conditional case mapping" == "context-sensitivity"; instead, what I wanted to say is just "case mapping" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
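For context on the one-to-many case mappings mentioned above, here is a plain JVM illustration (not the ICU/Spark code path under review): the capital dotted "İ" (U+0130) lowercases to two code points, "i" plus a combining dot above (U+0307).

```scala
object OneToManyCaseMapping {
  def main(args: Array[String]): Unit = {
    val capitalDottedI = "\u0130" // "İ"
    val lower = capitalDottedI.toLowerCase(java.util.Locale.ROOT)
    // One code point expands into two on lowercasing: 'i' (U+0069) + combining dot above (U+0307).
    println(lower.length)                                                 // 2
    println(lower.codePoints().toArray.map(_.toHexString).mkString(" "))  // 69 307
  }
}
```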
Re: [PR] [SPARK-48000][SQL] Enable hash join support for all collations (StringType) [spark]
uros-db commented on code in PR #46599: URL: https://github.com/apache/spark/pull/46599#discussion_r1612779931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -279,6 +280,7 @@ abstract class Optimizer(catalogManager: CatalogManager) RewritePredicateSubquery.ruleName :: NormalizeFloatingNumbers.ruleName :: ReplaceUpdateFieldsExpression.ruleName :: + RewriteCollationJoin.ruleName :: Review Comment: update: I think there is no impact on correctness, so I went ahead and removed `RewriteCollationJoin` from `nonExcludableRules` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
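Since `RewriteCollationJoin` is no longer in `nonExcludableRules`, it can be switched off like any other optimizer rule through the standard exclusion config if that ever proves necessary. A minimal sketch; the fully qualified rule name below is an assumption based on the optimizer package, and `spark` is assumed to be an existing `SparkSession`:

```scala
// Sketch: exclude the collation join rewrite via the generic optimizer exclusion config.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.RewriteCollationJoin")
```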
Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]
zhengruifeng commented on PR #46733: URL: https://github.com/apache/spark/pull/46733#issuecomment-2129082503 merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48412][PYTHON] Refactor data type json parse [spark]
zhengruifeng closed pull request #46733: [SPARK-48412][PYTHON] Refactor data type json parse URL: https://github.com/apache/spark/pull/46733 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [MINOR][SQL] Remove outdated `TODO`s from `UnsafeHashedRelation` [spark]
LuciferYang commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613184450 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -409,9 +407,6 @@ private[joins] class UnsafeHashedRelation( val pageSizeBytes = Option(SparkEnv.get).map(_.memoryManager.pageSizeBytes) .getOrElse(new SparkConf().get(BUFFER_PAGESIZE).getOrElse(16L * 1024 * 1024)) -// TODO(josh): We won't need this dummy memory manager after future refactorings; revisit Review Comment: friendly ping @JoshRosen , can we remove these 2 TODO? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48413][SQL] ALTER COLUMN with collation [spark]
olaky commented on code in PR #46734: URL: https://github.com/apache/spark/pull/46734#discussion_r1613167082 ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to `to` type except for collations and nullability, which are + * both checked to be compatible so that data of type `from` can be interpreted as of type `to`. + */ + private[sql] def equalsIgnoreCompatibleCollationAndNullability( Review Comment: let's already give this a name that reflects that is returns true of the it is allowed to evolve the type. This will be true for more cases than collations in the future ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala: ## @@ -445,7 +446,8 @@ case class AlterTableChangeColumnCommand( // name(by resolver) and dataType. private def columnEqual( field: StructField, other: StructField, resolver: Resolver): Boolean = { -resolver(field.name, other.name) && field.dataType == other.dataType +resolver(field.name, other.name) && + DataType.equalsIgnoreCompatibleCollationAndNullability(field.dataType, other.dataType) Review Comment: The comment of the command in line 359 is not outdated ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to `to` type except for collations and nullability, which are + * both checked to be compatible so that data of type `from` can be interpreted as of type `to`. + */ + private[sql] def equalsIgnoreCompatibleCollationAndNullability( + from: DataType, + to: DataType): Boolean = { +(from, to) match { + case (_: StringType, _: StringType) => true + + case (ArrayType(fromElement, fn), ArrayType(toElement, tn)) => +(tn || !fn) && equalsIgnoreCompatibleCollationAndNullability(fromElement, toElement) + + case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) => +(tn || !fn) && + // Map keys cannot change collation. + equalsIgnoreCompatibleNullability(fromKey, toKey) && + equalsIgnoreCompatibleCollationAndNullability(fromValue, toValue) + + case (StructType(fromFields), StructType(toFields)) => +fromFields.length == toFields.length && + fromFields.zip(toFields).forall { case (fromField, toField) => Review Comment: I would rather do a forAll on fromFields and look up each field by name in toFields. That way we do not depend on the order ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to `to` type except for collations and nullability, which are + * both checked to be compatible so that data of type `from` can be interpreted as of type `to`. + */ + private[sql] def equalsIgnoreCompatibleCollationAndNullability( + from: DataType, + to: DataType): Boolean = { +(from, to) match { + case (_: StringType, _: StringType) => true + + case (ArrayType(fromElement, fn), ArrayType(toElement, tn)) => +(tn || !fn) && equalsIgnoreCompatibleCollationAndNullability(fromElement, toElement) Review Comment: Why do we want to allow setting the type nullable? ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to `to` type except for collations and nullability, which are + * both checked to be compatible so that data of type `from` can be interpreted as of type `to`. 
+ */ + private[sql] def equalsIgnoreCompatibleCollationAndNullability( + from: DataType, + to: DataType): Boolean = { +(from, to) match { + case (_: StringType, _: StringType) => true + + case (ArrayType(fromElement, fn), ArrayType(toElement, tn)) => +(tn || !fn) && equalsIgnoreCompatibleCollationAndNullability(fromElement, toElement) + + case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) => Review Comment: let's also not abbreviate here ## sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala: ## @@ -408,6 +408,37 @@ object DataType { } } + /** + * Check if `from` is equal to `to` type except for collations and nullability, which are + * both checked to be compatible so that data of type `from` can be interpreted as of type `to`. + */ + private[sql] def equalsIgnoreCompatibleCollationAndNullability( + from: DataType, + to: DataType): Boolean = { +(from, to) match { + case (_: StringType, _: StringType) => true + + case (ArrayType(fromElement, fn), ArrayType(toElement, tn)) => +(tn || !fn) && equalsIgnoreCompatibleCollationAndNullability(fromEl
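The `(tn || !fn)` checks in the snippets above encode the usual nullability-widening rule: data of a non-nullable `from` type may be read as nullable, but a nullable `from` type may not be read as non-nullable. A standalone sketch of just that check, independent of the Spark type classes:

```scala
// Nullability compatibility used by the recursive type checks above:
// reading `from` as `to` is fine if `to` is nullable, or if `from` is already non-nullable.
def nullabilityCompatible(fromNullable: Boolean, toNullable: Boolean): Boolean =
  toNullable || !fromNullable

assert(nullabilityCompatible(fromNullable = false, toNullable = true))   // widen: ok
assert(nullabilityCompatible(fromNullable = true, toNullable = true))    // unchanged: ok
assert(!nullabilityCompatible(fromNullable = true, toNullable = false))  // narrow: not ok
```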
Re: [PR] [MINOR][SQL] Remove an outdated `TODO` from UnsafeHashedRelation [spark]
LuciferYang commented on PR #46736: URL: https://github.com/apache/spark/pull/46736#issuecomment-2129074503 > How about removing the `TODO(josh)` at L412 [done](https://github.com/apache/spark/pull/9127/files#diff-127291a0287f790755be5473765ea03eb65f8b58b9ec0760955f124e21e3452f) ![image](https://github.com/apache/spark/assets/1475305/51904dc0-e8d9-470f-b6ed-9904246d9e47) seems ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48213][SQL] Do not push down predicate if non-cheap expression exceed reused limit [spark]
zml1206 commented on PR #46499: URL: https://github.com/apache/spark/pull/46499#issuecomment-2129064493 > Can we make `With` nested as well? I have thought about it for a long time, and I can change the logic of withRewrite so that the `alias` generated by the lowest-layer `With` ends up in the uppermost project, which is the reverse of the current order. This would handle the complex nesting above. `with(with(((commonexpressionref(CommonExpressionId(1,false), IntegerType, true) + commonexpressionref(CommonExpressionId(1,false), IntegerType, true)) > commonexpressionref(CommonExpressionId(0,false), IntegerType, true)), commonexpressiondef((commonexpressionref(CommonExpressionId(0,false), IntegerType, true) * commonexpressionref(CommonExpressionId(0,false), IntegerType, true)), CommonExpressionId(1,false))), commonexpressiondef((a#0 + a#0), CommonExpressionId(0,false)))` Do you think it’s feasible? If so, I’ll do it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
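For readers unfamiliar with the `With`/common-expression notation, the nested expression above can be read as ordinary let-bindings over the input column `a`. A small self-contained Scala rendering (illustration only, not the Catalyst API):

```scala
// The nested With above, read as let-bindings:
//   c0 = a + a       (outer CommonExpressionDef, id 0)
//   c1 = c0 * c0     (inner CommonExpressionDef, id 1)
//   predicate: (c1 + c1) > c0
def predicate(a: Int): Boolean = {
  val c0 = a + a
  val c1 = c0 * c0
  c1 + c1 > c0
}
println(predicate(3)) // (36 + 36) > 6 => true
```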
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613161561 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean double_quoted_identifiers = false; } +compoundOrSingleStatement +: singleStatement +| singleCompound +; + +singleCompound +: compound SEMICOLON* EOF +; + +compound +: BEGIN compoundBody END +; + +// Last semicolon in body is optional. +compoundBody Review Comment: As Milan said, we will reuse `compoundBody` in future in the blocks that don't require `BEGIN` and `END` (if else, while, etc.). One more reason is that in such blocks (that don't begin with `BEGIN`) variable declaration is not allowed, so we need to have these cases separated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [MINOR][SQL] Remove an outdated `TODO` from UnsafeHashedRelation [spark]
yaooqinn commented on PR #46736: URL: https://github.com/apache/spark/pull/46736#issuecomment-2129046316 How about removing the `TODO(josh)` at L412 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48396] Support configuring max cores can be used for SQL [spark]
yabola commented on PR #46713: URL: https://github.com/apache/spark/pull/46713#issuecomment-2129043469 @cloud-fan could you take a look, thank you~ This is useful in a shared SQL cluster and will make it easier to control SQL resource usage. The picture below shows that the number of cores used stays consistent. https://github.com/apache/spark/assets/31469905/f6812b84-d5fb-4436-b1a2-3171547c94e0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48352][SQL]set max file counter through spark conf [spark]
guixiaowen commented on PR #46668: URL: https://github.com/apache/spark/pull/46668#issuecomment-2129040807 @LuciferYang Can you review this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48342][SQL] Introduction of SQL Scripting Parser [spark]
davidm-db commented on code in PR #46665: URL: https://github.com/apache/spark/pull/46665#discussion_r1613136311 ## sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -42,6 +42,29 @@ options { tokenVocab = SqlBaseLexer; } public boolean double_quoted_identifiers = false; } +compoundOrSingleStatement +: singleStatement +| singleCompound +; + +singleCompound +: compound SEMICOLON* EOF +; + +compound +: BEGIN compoundBody END +; + +// Last semicolon in body is optional. +compoundBody +: compoundStatement (SEMICOLON+ compoundStatement)* SEMICOLON* Review Comment: Not sure why we did it like this in the POC. I've tried in MySQL Workbench now, and it seems that MySQL doesn't allow multiple "empty" semicolons... I'll need to check what the desired behavior is and will update accordingly. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]
yaooqinn commented on PR #46704: URL: https://github.com/apache/spark/pull/46704#issuecomment-2129013813 Merged to master. Thank you @panbingkun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version [spark]
yaooqinn closed pull request #46704: [SPARK-48409][BUILD][TESTS] Upgrade MySQL & Postgres & Mariadb docker image version URL: https://github.com/apache/spark/pull/46704 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [MINOR][CORE] Remove an outdated TODO from UnsafeHashedRelation [spark]
LuciferYang commented on code in PR #46736: URL: https://github.com/apache/spark/pull/46736#discussion_r1613125740 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala: ## @@ -396,8 +396,6 @@ private[joins] class UnsafeHashedRelation( val nKeys = readLong() val nValues = readLong() // This is used in Broadcast, shared by multiple tasks, so we use on-heap memory -// TODO(josh): This needs to be revisited before we merge this patch; making this change now -// so that tests compile: Review Comment: It seems that this was a reminder for the author to revisit this part of the code before merging, but deleting it was forgotten when the code was merged? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47257][SQL] Assign names to error classes _LEGACY_ERROR_TEMP_105[3-4] and _LEGACY_ERROR_TEMP_1113 [spark]
wayneguow commented on code in PR #46731: URL: https://github.com/apache/spark/pull/46731#discussion_r1613124606 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -90,8 +91,8 @@ class ResolveSessionCatalog(val catalogManager: CatalogManager) table.schema.findNestedField(Seq(colName), resolver = conf.resolver) .map(_._2.dataType) .getOrElse { -throw QueryCompilationErrors.alterColumnCannotFindColumnInV1TableError( - quoteIfNeeded(colName), table) +throw QueryCompilationErrors.unresolvedColumnError( Review Comment: > Can we reuse `UNRESOLVED_COLUMN.WITH_SUGGESTION`? For the `ALTER COLUMN` scenario, since there is only one table, I think it is OK; for `resolveFieldNames` in `Analyzer`, I think it would be better if the table could be explicitly specified. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org