Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46396: URL: https://github.com/apache/spark/pull/46396#issuecomment-2095144999 Please note that this is also related to the following INFRA blocker issue. - SPARK-48094 Reduce GitHub Action usage according to ASF project allowance -- This is an automated

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1590520510 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -84,6 +93,131 @@ Define the reader logic to generate synthetic data. Use the `faker` library

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1590520296 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -137,3 +271,21 @@ Use the fake datasource with a different number of rows: # | Caitlin

Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46396: URL: https://github.com/apache/spark/pull/46396#issuecomment-2095200623 Could you review this PR, @viirya ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[PR] [SPARK-48137][INFRA] Run `yarn` test only in PR builders and Maven Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #46395: URL: https://github.com/apache/spark/pull/46395 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1590521175 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -59,8 +59,17 @@ Start by creating a new subclass of :class:`DataSource`. Define the source

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1590520752 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -59,8 +59,17 @@ Start by creating a new subclass of :class:`DataSource`. Define the source

Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-05 Thread via GitHub
panbingkun commented on PR #46288: URL: https://github.com/apache/spark/pull/46288#issuecomment-2095042166 > Do we have a corresponding issue in the Scala community or JLine community (or GitHub issue)? > > IMO, a. We can ignore Scala 2.13.14 completely due to this issue. b. We can

Re: [PR] [SPARK-47920][DOCS][SS][PYTHON] Add doc for python streaming data source API [spark]

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on code in PR #46139: URL: https://github.com/apache/spark/pull/46139#discussion_r1590525766 ## python/docs/source/user_guide/sql/python_data_source.rst: ## @@ -84,6 +93,131 @@ Define the reader logic to generate synthetic data. Use the `faker` library

Re: [PR] [SPARK-48137][INFRA] Run `yarn` test only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46395: URL: https://github.com/apache/spark/pull/46395#issuecomment-2095237705 Could you review this PR too, @LuciferYang ?

Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46396: URL: https://github.com/apache/spark/pull/46396#issuecomment-2095237347 Thank you so much, @LuciferYang !

Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-05 Thread via GitHub
panbingkun commented on PR #46376: URL: https://github.com/apache/spark/pull/46376#issuecomment-2095048623 late LGTM.

Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-05-05 Thread via GitHub
panbingkun commented on code in PR #46022: URL: https://github.com/apache/spark/pull/46022#discussion_r1590471939 ## connector/profiler/src/main/scala/org/apache/spark/executor/profiler/ExecutorProfilerPlugin.scala: ## @@ -23,7 +23,8 @@ import scala.util.Random import

Re: [PR] [SPARK-48137][INFRA] Run `yarn` test only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46395: URL: https://github.com/apache/spark/pull/46395#issuecomment-2095112747 WDYT, @HyukjinKwon ?

Re: [PR] [SPARK-48137][INFRA] Run `yarn` test only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46395: URL: https://github.com/apache/spark/pull/46395#issuecomment-2095122806 Alternatively, we can ignore `YarnClusterSuite` completely. However, I'm not sure who is going to take on that JIRA issue in the future. So, I choose this path.

Re: [PR] [SPARK-47240][CORE][PART1] Migrate logInfo with variables to structured logging framework [spark]

2024-05-05 Thread via GitHub
zeotuan commented on code in PR #46362: URL: https://github.com/apache/spark/pull/46362#discussion_r1590481881 ## common/utils/src/main/scala/org/apache/spark/internal/LogKey.scala: ## @@ -26,13 +26,16 @@ trait LogKey { } /** - * Various keys used for mapped diagnostic

[PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #46396: URL: https://github.com/apache/spark/pull/46396 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46396: URL: https://github.com/apache/spark/pull/46396#issuecomment-2095142285 cc @juliuszsompolski , @grundprinzip , @HyukjinKwon , @LuciferYang from SPARK-44422 - #42009

Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46396: URL: https://github.com/apache/spark/pull/46396#issuecomment-2095142646 I hope we can fix the root cause in SPARK-48139 .

Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-05 Thread via GitHub
chaoqin-li1123 commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095178573 @HyukjinKwon both test_python_datasource, test_python_streaming_datasource will fail with the same error if py4j*.zip is removed. > Traceback (most recent call last): >

Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-05 Thread via GitHub
huanliwang-db commented on PR #46351: URL: https://github.com/apache/spark/pull/46351#issuecomment-2095178435 > Ideally all non-EOL version lines, 3.5/3.4. @HeartSaVioR how can i do the backport?

Re: [PR] [SPARK-47240][CORE][PART1] Migrate logInfo with variables to structured logging framework [spark]

2024-05-05 Thread via GitHub
zeotuan commented on PR #46362: URL: https://github.com/apache/spark/pull/46362#issuecomment-2095094081 @gengliangwang Hi please help review

Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
LuciferYang commented on PR #46396: URL: https://github.com/apache/spark/pull/46396#issuecomment-2095236411 Merged into master for Spark 4.0. Thanks @dongjoon-hyun ~

Re: [PR] [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test [spark]

2024-05-05 Thread via GitHub
LuciferYang closed pull request #46396: [SPARK-48138][CONNECT][TESTS] Disable a flaky `SparkSessionE2ESuite.interrupt tag` test URL: https://github.com/apache/spark/pull/46396

Re: [PR] [SPARK-47681][SQL][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-05 Thread via GitHub
chenhao-db commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1590327198 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -392,21 +392,32 @@ public static double getDouble(byte[] value, int pos) {

Re: [PR] [SPARK-48131][Core] Unify MDC key `mdc.taskName` and `task_name` [spark]

2024-05-05 Thread via GitHub
mridulm commented on PR #46386: URL: https://github.com/apache/spark/pull/46386#issuecomment-2094831806 Wouldn't this impact/break existing log config files where users are customizing the template?

[PR] [SPARK-48134][CORE] Spark core (java side): Migrate `error/warn/info` with variables to structured logging framework [spark]

2024-05-05 Thread via GitHub
panbingkun opened a new pull request, #46390: URL: https://github.com/apache/spark/pull/46390 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-48006][SQL]add SortOrder for window function which has no orde… [spark]

2024-05-05 Thread via GitHub
guixiaowen commented on PR #46243: URL: https://github.com/apache/spark/pull/46243#issuecomment-2094808977 @dongjoon-hyun hi, Can you help me review this PR?

Re: [PR] [SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist [spark]

2024-05-05 Thread via GitHub
cloud-fan closed pull request #46372: [SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist URL: https://github.com/apache/spark/pull/46372

Re: [PR] [SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist [spark]

2024-05-05 Thread via GitHub
cloud-fan commented on PR #46372: URL: https://github.com/apache/spark/pull/46372#issuecomment-2094816913 thanks, merging to master/3.5!

Re: [PR] [SPARK-47681][SQL][FOLLOWUP] Fix variant decimal handling. [spark]

2024-05-05 Thread via GitHub
cloud-fan commented on code in PR #46338: URL: https://github.com/apache/spark/pull/46338#discussion_r1590326222 ## common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java: ## @@ -392,21 +392,32 @@ public static double getDouble(byte[] value, int pos) {

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46389: URL: https://github.com/apache/spark/pull/46389#issuecomment-2094837554 Could you review this PR, @cloud-fan ?

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46389: URL: https://github.com/apache/spark/pull/46389#issuecomment-2094655922 Could you review this PR, @beliefer ?

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46389: URL: https://github.com/apache/spark/pull/46389#issuecomment-2094646444 Could you review this `SparkR` PR too? This is the same kind of activity, @gengliangwang .

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on code in PR #46389: URL: https://github.com/apache/spark/pull/46389#discussion_r1590216707 ## .github/workflows/build_and_test.yml: ## @@ -76,17 +76,17 @@ jobs: id: set-outputs run: | if [ -z "${{ inputs.jobs }}" ]; then -

Re: [PR] [SPARK-47484][SQL] Allow trailing comma in column definition list [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #45593: URL: https://github.com/apache/spark/pull/45593#issuecomment-2094674035 Ya, I also believe this is not good for Apache Spark community in a long term perspective.

[PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #46389: URL: https://github.com/apache/spark/pull/46389 ### What changes were proposed in this pull request? This PR aims to run `sparkr` only in PR builder and Daily Python CIs. In other words, only the commit builder will skip it by default.

Re: [PR] [SPARK-48135][INFRA] Run `buf` and `ui` only in PR builders and Java 21 Daily CI [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46392: URL: https://github.com/apache/spark/pull/46392#issuecomment-2094975220 Could you review this PR when you have some time, @viirya ?

Re: [PR] [SPARK-48135][INFRA] Run `buf` and `ui` only in PR builders and Java 21 Daily CI [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun closed pull request #46392: [SPARK-48135][INFRA] Run `buf` and `ui` only in PR builders and Java 21 Daily CI URL: https://github.com/apache/spark/pull/46392

Re: [PR] [SPARK-48135][INFRA] Run `buf` and `ui` only in PR builders and Java 21 Daily CI [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46392: URL: https://github.com/apache/spark/pull/46392#issuecomment-2094999355 Thank you, @viirya ! Merged to master.

Re: [PR] [SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to fall behind [spark]

2024-05-05 Thread via GitHub
github-actions[bot] closed pull request #44712: [SPARK-46705][SS] Make RocksDB State Store Compaction Less Likely to fall behind URL: https://github.com/apache/spark/pull/44712

Re: [PR] [SPARK-42307][SQL][BUILD][DOCS] Adding in a better name for `_LEGACY_ERROR_TEMP_2232` [spark]

2024-05-05 Thread via GitHub
github-actions[bot] closed pull request #44337: [SPARK-42307][SQL][BUILD][DOCS] Adding in a better name for `_LEGACY_ERROR_TEMP_2232` URL: https://github.com/apache/spark/pull/44337

Re: [PR] [SPARK-46848] XML: Enhance XML bad record handling with partial results support [spark]

2024-05-05 Thread via GitHub
github-actions[bot] commented on PR #44875: URL: https://github.com/apache/spark/pull/44875#issuecomment-2095014463 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub
HyukjinKwon commented on PR #46391: URL: https://github.com/apache/spark/pull/46391#issuecomment-2095016743 cc @itholic

Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub
sinaiamonkar-sai commented on PR #46391: URL: https://github.com/apache/spark/pull/46391#issuecomment-2095018053 Thank you @dongjoon-hyun! Sure, let me add that.

[PR] [SPARK-48136][INFRA][CONNECT] Always upload Spark Connect log files in scheduled build for Spark Connect [spark]

2024-05-05 Thread via GitHub
HyukjinKwon opened a new pull request, #46393: URL: https://github.com/apache/spark/pull/46393 ### What changes were proposed in this pull request? This PR proposes to upload Spark Connect log files in scheduled build for Spark Connect ### Why are the changes needed?

Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-05 Thread via GitHub
HyukjinKwon commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095022472 @chaoqin-li1123 Seems like this test does not work with pure Python library. Can you see if the tests pass after removing `python/lib/py4j*.zip`? Let me revert this for now

[PR] [SPARK-48054][INFRA][FOLLOW-UP] Rename SPARK_SKIP_JVM_REQUIRED_TESTS to SPARK_SKIP_CONNECT_COMPAT_TESTS [spark]

2024-05-05 Thread via GitHub
HyukjinKwon opened a new pull request, #46394: URL: https://github.com/apache/spark/pull/46394 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/46298 that properly uses the environment variable

Re: [PR] [SPARK-47777][PYTHON][SS][TESTS] Add spark connect test for python streaming data source [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #45950: URL: https://github.com/apache/spark/pull/45950#issuecomment-2095027198 Thank you for checking and mitigating this by reverting, @HyukjinKwon .

Re: [PR] [SPARK-48136][INFRA][CONNECT] Always upload Spark Connect log files in scheduled build for Spark Connect [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun closed pull request #46393: [SPARK-48136][INFRA][CONNECT] Always upload Spark Connect log files in scheduled build for Spark Connect URL: https://github.com/apache/spark/pull/46393

Re: [PR] [SPARK-48136][INFRA][CONNECT] Always upload Spark Connect log files in scheduled build for Spark Connect [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46393: URL: https://github.com/apache/spark/pull/46393#issuecomment-2095028010 Merged to master for Apache Spark 4.0.0-preview preparation.

Re: [PR] [SPARK-48054][INFRA][FOLLOW-UP] Rename SPARK_SKIP_JVM_REQUIRED_TESTS to SPARK_SKIP_CONNECT_COMPAT_TESTS [spark]

2024-05-05 Thread via GitHub
HyukjinKwon closed pull request #46394: [SPARK-48054][INFRA][FOLLOW-UP] Rename SPARK_SKIP_JVM_REQUIRED_TESTS to SPARK_SKIP_CONNECT_COMPAT_TESTS URL: https://github.com/apache/spark/pull/46394

Re: [PR] [SPARK-48054][INFRA][FOLLOW-UP] Rename SPARK_SKIP_JVM_REQUIRED_TESTS to SPARK_SKIP_CONNECT_COMPAT_TESTS [spark]

2024-05-05 Thread via GitHub
HyukjinKwon commented on PR #46394: URL: https://github.com/apache/spark/pull/46394#issuecomment-2095028367 Merged to master.

Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-05 Thread via GitHub
zhengruifeng commented on PR #46367: URL: https://github.com/apache/spark/pull/46367#issuecomment-2095031733 late LGTM, thanks @dongjoon-hyun

Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46367: URL: https://github.com/apache/spark/pull/46367#issuecomment-2095034994 Thanks, @zhengruifeng

Re: [PR] [SPARK-42093][SQL] Move JavaTypeInference to AgnosticEncoders [spark]

2024-05-05 Thread via GitHub
viirya commented on code in PR #39615: URL: https://github.com/apache/spark/pull/39615#discussion_r1590399660 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -166,317 +148,58 @@ object JavaTypeInference {

Re: [PR] [SPARK-42093][SQL] Move JavaTypeInference to AgnosticEncoders [spark]

2024-05-05 Thread via GitHub
viirya commented on code in PR #39615: URL: https://github.com/apache/spark/pull/39615#discussion_r1590399793 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -166,317 +148,58 @@ object JavaTypeInference {

Re: [PR] [SPARK-42093][SQL] Move JavaTypeInference to AgnosticEncoders [spark]

2024-05-05 Thread via GitHub
viirya commented on code in PR #39615: URL: https://github.com/apache/spark/pull/39615#discussion_r1590400148 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala: ## @@ -166,317 +148,58 @@ object JavaTypeInference {

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46389: URL: https://github.com/apache/spark/pull/46389#issuecomment-2094933246 Could you review this when you have some time, @viirya ?

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun commented on PR #46389: URL: https://github.com/apache/spark/pull/46389#issuecomment-2094935863 Thank you so much, @viirya ! Merged to master

Re: [PR] [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun closed pull request #46389: [SPARK-48133][INFRA] Run `sparkr` only in PR builders and Daily CIs URL: https://github.com/apache/spark/pull/46389

[PR] [SPARK-48135][INFRA] Run `buf` and `ui` only in PR builders and Java 21 Daily CI [spark]

2024-05-05 Thread via GitHub
dongjoon-hyun opened a new pull request, #46392: URL: https://github.com/apache/spark/pull/46392 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

Re: [PR] [SPARK-48031] view evolution [spark]

2024-05-05 Thread via GitHub
srielau commented on PR #46267: URL: https://github.com/apache/spark/pull/46267#issuecomment-2094939625 @cloud-fan @gengliangwang This is ready now. Please review. Docs are included in this PR.

Re: [PR] [SPARK-47547][SQL] Fix inaccurate false positive rates when N is large for Bloom Filter [spark]

2024-05-05 Thread via GitHub
batubond007 commented on PR #46370: URL: https://github.com/apache/spark/pull/46370#issuecomment-2094943570 Hi @dongjoon-hyun , can you please review this pr which is also my first one?

[PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub
sinaiamonkar-sai opened a new pull request, #46391: URL: https://github.com/apache/spark/pull/46391 ### What changes were proposed in this pull request? In a Scenario where we use GroupBy in PySpark API with relabeling of aggregate columns and using as_index = False, the columns with

Re: [PR] [SPARK-48045][PYTHON] Pandas API groupby with multi-agg-relabel ignores as_index=False [spark]

2024-05-05 Thread via GitHub
sinaiamonkar-sai commented on PR #46391: URL: https://github.com/apache/spark/pull/46391#issuecomment-2094885603 Hello, @holdenk ! This is my first Spark PR. Can you please review it?