[GitHub] [spark] SelfImpr001 commented on a diff in pull request #37732: [SPARK-40253] [SQL] Fixed loss of precision for writing 0.00 specific…

2022-09-03 Thread GitBox
SelfImpr001 commented on code in PR #37732: URL: https://github.com/apache/spark/pull/37732#discussion_r962252317 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -76,8 +76,19 @@ object Literal { val decimal = Decimal(d)

[GitHub] [spark] AmplabJenkins commented on pull request #37770: [SPARK-40314][SQL] Add scala and python bindings for inline and inline_outer

2022-09-03 Thread GitBox
AmplabJenkins commented on PR #37770: URL: https://github.com/apache/spark/pull/37770#issuecomment-1236246586 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] AmplabJenkins commented on pull request #37771: [SPARK-40315][SQL] Add equals() and hashCode() to ArrayBasedMapData

2022-09-03 Thread GitBox
AmplabJenkins commented on PR #37771: URL: https://github.com/apache/spark/pull/37771#issuecomment-1236246575 Can one of the admins verify this patch?

[GitHub] [spark] LuciferYang commented on pull request #37624: [SPARK-40186][CORE][YARN] Ensure `mergedShuffleCleaner` have been shutdown before `db` close

2022-09-03 Thread GitBox
LuciferYang commented on PR #37624: URL: https://github.com/apache/spark/pull/37624#issuecomment-1236241317 friendly ping @tgravescs , @Ngone51

[GitHub] [spark] LuciferYang commented on pull request #37648: [SPARK-38909][BUILD][CORE][YARN][FOLLOWUP] Make some code cleanup related to shuffle state db

2022-09-03 Thread GitBox
LuciferYang commented on PR #37648: URL: https://github.com/apache/spark/pull/37648#issuecomment-1236241254 should we merge this one ? @Ngone51 @mridulm

[GitHub] [spark] dongjoon-hyun commented on pull request #37729: Revert "[SPARK-33861][SQL] Simplify conditional in predicate"

2022-09-03 Thread GitBox
dongjoon-hyun commented on PR #37729: URL: https://github.com/apache/spark/pull/37729#issuecomment-1236236113 Thank you for reverting, @wangyum .

[GitHub] [spark] dongjoon-hyun closed pull request #37787: [SPARK-40323][BUILD] Update ORC to 1.8.0

2022-09-03 Thread GitBox
dongjoon-hyun closed pull request #37787: [SPARK-40323][BUILD] Update ORC to 1.8.0 URL: https://github.com/apache/spark/pull/37787

[GitHub] [spark] AmplabJenkins commented on pull request #37777: [WIP][SPARK-40309][PYTHON][PS] Introduce `sql_conf` context manager for `pyspark.sql`

2022-09-03 Thread GitBox
AmplabJenkins commented on PR #37777: URL: https://github.com/apache/spark/pull/37777#issuecomment-1236230648 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37779: [SPARK-40320][Core] Executor should exit when it failed to initialize for fatal error

2022-09-03 Thread GitBox
AmplabJenkins commented on PR #37779: URL: https://github.com/apache/spark/pull/37779#issuecomment-1236230647 Can one of the admins verify this patch?

[GitHub] [spark] IrishBird commented on pull request #34684: [SPARK-37442][SQL] InMemoryRelation statistics bug causing broadcast join failures with AQE enabled

2022-09-03 Thread GitBox
IrishBird commented on PR #34684: URL: https://github.com/apache/spark/pull/34684#issuecomment-1236227362 I have a question about this. a. I have one text file that is about 12G, and this file has 5 columns. For my broadcast table I only need to read 1 column, and I called the API :so

[GitHub] [spark] williamhyun opened a new pull request, #37787: [SPARK-40323][BUILD] Update ORC to 1.8.0

2022-09-03 Thread GitBox
williamhyun opened a new pull request, #37787: URL: https://github.com/apache/spark/pull/37787 ### What changes were proposed in this pull request? This PR aims to update ORC to 1.8.0. ### Why are the changes needed? This will bring the latest changes and bug fixes.

[GitHub] [spark] AmplabJenkins commented on pull request #37785: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use co

2022-09-03 Thread GitBox
AmplabJenkins commented on PR #37785: URL: https://github.com/apache/spark/pull/37785#issuecomment-1236215730 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #37786: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 5, ~28 functions)

2022-09-03 Thread GitBox
AmplabJenkins commented on PR #37786: URL: https://github.com/apache/spark/pull/37786#issuecomment-1236215711 Can one of the admins verify this patch?

[GitHub] [spark] santosh-d3vpl3x commented on pull request #37761: [SPARK-40311][SQL][CORE][Python] Add withColumnsRenamed to scala and pyspark API

2022-09-03 Thread GitBox
santosh-d3vpl3x commented on PR #37761: URL: https://github.com/apache/spark/pull/37761#issuecomment-1236191695 @zhengruifeng would you like to have a look at this PR?

[GitHub] [spark] MaxGekk closed pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

2022-09-03 Thread GitBox
MaxGekk closed pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function URL: https://github.com/apache/spark/pull/37763

[GitHub] [spark] MaxGekk commented on pull request #37763: [SPARK-40308][SQL] Allow non-foldable delimiter arguments to `str_to_map` function

2022-09-03 Thread GitBox
MaxGekk commented on PR #37763: URL: https://github.com/apache/spark/pull/37763#issuecomment-1236191274 +1, LGTM. Merging to master. Thank you, @bersprockets and @HyukjinKwon for review.

[GitHub] [spark] khalidmammadov opened a new pull request, #37786: [SPARK-40142][PYTHON][SQL][FOLLOW-UP] Make pyspark.sql.functions examples self-contained (part 5, ~28 functions)

2022-09-03 Thread GitBox
khalidmammadov opened a new pull request, #37786: URL: https://github.com/apache/spark/pull/37786 ### What changes were proposed in this pull request? It's part of the Pyspark docstrings improvement series (https://github.com/apache/spark/pull/37592,

[GitHub] [spark] JoshRosen commented on pull request #37413: [SPARK-39983][CORE][SQL] Do not cache unserialized broadcast relations on the driver

2022-09-03 Thread GitBox
JoshRosen commented on PR #37413: URL: https://github.com/apache/spark/pull/37413#issuecomment-1236173944 Hi @sos3k, We investigated your bug report and determined that the root-cause was a latent bug in `UnsafeHashedRelation` that was triggered more frequently following the

[GitHub] [spark] hgs19921112 commented on pull request #37785: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use comp

2022-09-03 Thread GitBox
hgs19921112 commented on PR #37785: URL: https://github.com/apache/spark/pull/37785#issuecomment-1236172407 Can you have a look at this? @cloud-fan

[GitHub] [spark] hgs19921112 opened a new pull request, #37785: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use com

2022-09-03 Thread GitBox
hgs19921112 opened a new pull request, #37785: URL: https://github.com/apache/spark/pull/37785 ### What changes were proposed in this pull request? After the RemoveRedundantAggregates rule, we should pull the complex group-by expression out. ### Why are the changes

[GitHub] [spark] hgs19921112 closed pull request #37784: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex ex

2022-09-03 Thread GitBox
hgs19921112 closed pull request #37784: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex expression URL: https://github.com/apache/spark/pull/37784

[GitHub] [spark] hgs19921112 opened a new pull request, #37784: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use com

2022-09-03 Thread GitBox
hgs19921112 opened a new pull request, #37784: URL: https://github.com/apache/spark/pull/37784 ### What changes were proposed in this pull request? After the RemoveRedundantAggregates rule, we should pull the complex group-by expression out. ### Why are the changes

[GitHub] [spark] srowen commented on a diff in pull request #37771: [SPARK-40315][SQL] Add equals() and hashCode() to ArrayBasedMapData

2022-09-03 Thread GitBox
srowen commented on code in PR #37771: URL: https://github.com/apache/spark/pull/37771#discussion_r962166466 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapData.scala: ## @@ -35,6 +37,29 @@ class ArrayBasedMapData(val keyArray: ArrayData, val

[GitHub] [spark] srowen commented on a diff in pull request #37771: [SPARK-40315][SQL] Add equals() and hashCode() to ArrayBasedMapData

2022-09-03 Thread GitBox
srowen commented on code in PR #37771: URL: https://github.com/apache/spark/pull/37771#discussion_r962166373 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/ArrayBasedMapBuilderSuite.scala: ## @@ -142,4 +142,36 @@ class ArrayBasedMapBuilderSuite extends

[GitHub] [spark] srowen commented on pull request #37762: [SPARK-39996][BUILD] Upgrade `postgresql` to 42.5.0

2022-09-03 Thread GitBox
srowen commented on PR #37762: URL: https://github.com/apache/spark/pull/37762#issuecomment-1236138369 It seems fine; the CVE almost surely doesn't affect Spark as it's a test-only dependency but hey.

[GitHub] [spark] srowen closed pull request #37757: Branch 3.3 sam

2022-09-03 Thread GitBox
srowen closed pull request #37757: Branch 3.3 sam URL: https://github.com/apache/spark/pull/37757

[GitHub] [spark] srowen commented on pull request #37778: Unsafe loop

2022-09-03 Thread GitBox
srowen commented on PR #37778: URL: https://github.com/apache/spark/pull/37778#issuecomment-1236138062 https://spark.apache.org/contributing.html

[GitHub] [spark] srowen closed pull request #37778: Unsafe loop

2022-09-03 Thread GitBox
srowen closed pull request #37778: Unsafe loop URL: https://github.com/apache/spark/pull/37778

[GitHub] [spark] srowen commented on a diff in pull request #37724: [SPARK-40273][PYTHON][DOCS] Fix the documents "Contributing and Maintaining Type Hints".

2022-09-03 Thread GitBox
srowen commented on code in PR #37724: URL: https://github.com/apache/spark/pull/37724#discussion_r962165446 ## python/docs/source/development/contributing.rst: ## @@ -189,9 +186,6 @@ Annotations should, when possible: * Be compatible with the current stable MyPy release.

[GitHub] [spark] hgs19921112 closed pull request #37782: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex ex

2022-09-03 Thread GitBox
hgs19921112 closed pull request #37782: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions` should applied to avoid attribute missing when use complex expression URL: https://github.com/apache/spark/pull/37782

[GitHub] [spark] LuciferYang opened a new pull request, #37783: [SPARK-40321][BUILD] Upgrade rocksdbjni to 7.5.3

2022-09-03 Thread GitBox
LuciferYang opened a new pull request, #37783: URL: https://github.com/apache/spark/pull/37783 ### What changes were proposed in this pull request? This PR aims to upgrade RocksDB JNI library from 7.4.5 to 7.5.3. ### Why are the changes needed? This version bring

[GitHub] [spark] zero323 commented on pull request #37774: [SPARK-40210][PYTHON][FOLLOW-UP][TEST] Speed up new tests using one action instead of many

2022-09-03 Thread GitBox
zero323 commented on PR #37774: URL: https://github.com/apache/spark/pull/37774#issuecomment-1236108467 Merged to master.

[GitHub] [spark] zero323 closed pull request #37774: [SPARK-40210][PYTHON][FOLLOW-UP][TEST] Speed up new tests using one action instead of many

2022-09-03 Thread GitBox
zero323 closed pull request #37774: [SPARK-40210][PYTHON][FOLLOW-UP][TEST] Speed up new tests using one action instead of many URL: https://github.com/apache/spark/pull/37774

[GitHub] [spark] hgs19921112 opened a new pull request, #37782: [SPARK-40288][SQL]After `RemoveRedundantAggregates`, `PullOutGroupingExpressions`" + "should applied to avoid attribute missing when

2022-09-03 Thread GitBox
hgs19921112 opened a new pull request, #37782: URL: https://github.com/apache/spark/pull/37782 ### What changes were proposed in this pull request? After the RemoveRedundantAggregates rule, we should pull the complex group-by expression out. ### Why are the changes

[GitHub] [spark] khalidmammadov commented on pull request #37774: [SPARK-40210][PYTHON][FOLLOW-UP][TEST] Speed up new tests using one action instead of many

2022-09-03 Thread GitBox
khalidmammadov commented on PR #37774: URL: https://github.com/apache/spark/pull/37774#issuecomment-1236091611 > @khalidmammadov Could you adjust the title to `[SPARK-40210][PYTHON][FOLLOW-UP][TEST] Speed up new tests using one action instead of many`? Thanks. Done

[GitHub] [spark] zero323 commented on pull request #37774: [SPARK-40210][PYTHON][TEST]Follow-up to speed up new tests using one action instead of many

2022-09-03 Thread GitBox
zero323 commented on PR #37774: URL: https://github.com/apache/spark/pull/37774#issuecomment-1236086072 @khalidmammadov Could you adjust the title to `[SPARK-40210][PYTHON][FOLLOW-UP][TEST] Speed up new tests using one action instead of many`? Thanks.

[GitHub] [spark] wangyum commented on pull request #37729: Revert "[SPARK-33861][SQL] Simplify conditional in predicate"

2022-09-03 Thread GitBox
wangyum commented on PR #37729: URL: https://github.com/apache/spark/pull/37729#issuecomment-1236077496 Merged to master, branch-3.3 and branch-3.2

[GitHub] [spark] wangyum closed pull request #37729: Revert "[SPARK-33861][SQL] Simplify conditional in predicate"

2022-09-03 Thread GitBox
wangyum closed pull request #37729: Revert "[SPARK-33861][SQL] Simplify conditional in predicate" URL: https://github.com/apache/spark/pull/37729

[GitHub] [spark] Yikun closed pull request #37781: static image

2022-09-03 Thread GitBox
Yikun closed pull request #37781: static image URL: https://github.com/apache/spark/pull/37781

[GitHub] [spark] wangyum commented on pull request #37780: [SPARK-39414][BUILD][FOLLOWUP] Update Scala to 2.12.16 in doc

2022-09-03 Thread GitBox
wangyum commented on PR #37780: URL: https://github.com/apache/spark/pull/37780#issuecomment-1236061305 Merged to master.

[GitHub] [spark] wangyum closed pull request #37780: [SPARK-39414][BUILD][FOLLOWUP] Update Scala to 2.12.16 in doc

2022-09-03 Thread GitBox
wangyum closed pull request #37780: [SPARK-39414][BUILD][FOLLOWUP] Update Scala to 2.12.16 in doc URL: https://github.com/apache/spark/pull/37780

[GitHub] [spark] viirya commented on pull request #37463: [SPARK-40033][SQL] Nested schema pruning support through element_at

2022-09-03 Thread GitBox
viirya commented on PR #37463: URL: https://github.com/apache/spark/pull/37463#issuecomment-1236060312 Thanks. Merged to master.

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-03 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r962112740 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2242,70 +2252,110 @@ private[spark] class DAGScheduler( val numMergers =

[GitHub] [spark] viirya closed pull request #37463: [SPARK-40033][SQL] Nested schema pruning support through element_at

2022-09-03 Thread GitBox
viirya closed pull request #37463: [SPARK-40033][SQL] Nested schema pruning support through element_at URL: https://github.com/apache/spark/pull/37463

[GitHub] [spark] zhengruifeng commented on pull request #37752: [SPARK-40301][PYTHON] Add parameter validations in pyspark.rdd

2022-09-03 Thread GitBox
zhengruifeng commented on PR #37752: URL: https://github.com/apache/spark/pull/37752#issuecomment-1236059439 > Thanks! > > Shall we mention that ValueError will be raised for invalid inputs in the `Does this PR introduce any user-facing change` of PR description? sure, thanks

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-03 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r962111737 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2242,70 +2252,110 @@ private[spark] class DAGScheduler( val numMergers =

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-03 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r962110338 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -282,13 +286,19 @@ private[spark] class DAGScheduler( None } - // Use

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-03 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r962110217 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2309,7 +2309,16 @@ package object config { " shuffle is enabled.")

[GitHub] [spark] MaxGekk commented on pull request #37746: [SPARK-40293][SQL] Make the V2 table error message more meaningful

2022-09-03 Thread GitBox
MaxGekk commented on PR #37746: URL: https://github.com/apache/spark/pull/37746#issuecomment-1236057380 @huaxingao Could you fix PlanResolutionSuite and SparkThrowableSuite, please.

[GitHub] [spark] wankunde commented on a diff in pull request #37533: [SPARK-40096]Fix finalize shuffle stage slow due to connection creation slow

2022-09-03 Thread GitBox
wankunde commented on code in PR #37533: URL: https://github.com/apache/spark/pull/37533#discussion_r962110234 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -282,13 +286,19 @@ private[spark] class DAGScheduler( None } - // Use