[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client

2022-10-27 Thread GitBox
HyukjinKwon commented on code in PR #38409: URL: https://github.com/apache/spark/pull/38409#discussion_r1006433421 ## python/pyspark/sql/connect/dataframe.py: ## @@ -305,8 +308,12 @@ def _print_plan(self) -> str: return self._plan.print() return "" -

[GitHub] [spark] HyukjinKwon commented on pull request #38408: [SPARK-40858][INFRA] Upgrade setup-python to v4

2022-10-27 Thread GitBox
HyukjinKwon commented on PR #38408: URL: https://github.com/apache/spark/pull/38408#issuecomment-1293036353 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38393: [SPARK-40915][CONNECT] Improve `on` in Join in Python client

2022-10-27 Thread GitBox
zhengruifeng commented on code in PR #38393: URL: https://github.com/apache/spark/pull/38393#discussion_r1006443809 ## python/pyspark/sql/connect/plan.py: ## @@ -537,7 +537,7 @@ def __init__( self, left: Optional["LogicalPlan"], right: "LogicalPlan",

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38409: [SPARK-40930][CONNECT] Support Collect() in Python client

2022-10-27 Thread GitBox
HyukjinKwon commented on code in PR #38409: URL: https://github.com/apache/spark/pull/38409#discussion_r1006432921 ## python/pyspark/sql/connect/dataframe.py: ## @@ -305,8 +308,12 @@ def _print_plan(self) -> str: return self._plan.print() return "" -

[GitHub] [spark] MaxGekk commented on a diff in pull request #38394: [SPARK-40759][SQL] Migrate type check failures of time window onto error classes

2022-10-27 Thread GitBox
MaxGekk commented on code in PR #38394: URL: https://github.com/apache/spark/pull/38394#discussion_r1006432848 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala: ## @@ -114,18 +115,48 @@ case class TimeWindow( val dataTypeCheck =

[GitHub] [spark] HyukjinKwon closed pull request #38408: [SPARK-40858][INFRA] Upgrade setup-python to v4

2022-10-27 Thread GitBox
HyukjinKwon closed pull request #38408: [SPARK-40858][INFRA] Upgrade setup-python to v4 URL: https://github.com/apache/spark/pull/38408

[GitHub] [spark] amaliujia commented on a diff in pull request #38393: [SPARK-40915][CONNECT] Improve `on` in Join in Python client

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38393: URL: https://github.com/apache/spark/pull/38393#discussion_r1006436736 ## python/pyspark/sql/connect/dataframe.py: ## @@ -218,8 +218,13 @@ def head(self, n: int) -> Optional["pandas.DataFrame"]: self.limit(n) return

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38318: [SPARK-40852][CONNECT][PYTHON] Introduce `DataFrameFunction` in proto and implement `DataFrame.summary`

2022-10-27 Thread GitBox
zhengruifeng commented on code in PR #38318: URL: https://github.com/apache/spark/pull/38318#discussion_r1006439147 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -93,6 +96,19 @@ class SparkConnectPlanner(plan:

[GitHub] [spark-docker] Yikun commented on pull request #20: [SPARK-40929] Add Apache Spark 3.3.1 Dockerfiles

2022-10-27 Thread GitBox
Yikun commented on PR #20: URL: https://github.com/apache/spark-docker/pull/20#issuecomment-1293047988 @HyukjinKwon @wangyum Thanks!

[GitHub] [spark] amaliujia commented on a diff in pull request #38393: [SPARK-40915][CONNECT] Improve `on` in Join in Python client

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38393: URL: https://github.com/apache/spark/pull/38393#discussion_r1006437955 ## python/pyspark/sql/connect/plan.py: ## @@ -537,7 +537,7 @@ def __init__( self, left: Optional["LogicalPlan"], right: "LogicalPlan", -

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox
HyukjinKwon commented on code in PR #38395: URL: https://github.com/apache/spark/pull/38395#discussion_r1006477093 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -2100,3 +2100,53 @@ object AsOfJoin { } } }

[GitHub] [spark] Yikun commented on pull request #38408: [SPARK-40928][INFRA] Upgrade setup-python to v4

2022-10-27 Thread GitBox
Yikun commented on PR #38408: URL: https://github.com/apache/spark/pull/38408#issuecomment-1293208375 @HyukjinKwon Thanks! My bad for the wrong JIRA number; I just re-edited the PR title to carry the JIRA number for future searches.

[GitHub] [spark] MaxGekk commented on pull request #38394: [SPARK-40759][SQL] Migrate type check failures of time window onto error classes

2022-10-27 Thread GitBox
MaxGekk commented on PR #38394: URL: https://github.com/apache/spark/pull/38394#issuecomment-1293242573 +1, LGTM. Merging to master. Thank you, @LuciferYang.

[GitHub] [spark] MaxGekk closed pull request #38394: [SPARK-40759][SQL] Migrate type check failures of time window onto error classes

2022-10-27 Thread GitBox
MaxGekk closed pull request #38394: [SPARK-40759][SQL] Migrate type check failures of time window onto error classes URL: https://github.com/apache/spark/pull/38394

[GitHub] [spark] LuciferYang commented on pull request #38394: [SPARK-40759][SQL] Migrate type check failures of time window onto error classes

2022-10-27 Thread GitBox
LuciferYang commented on PR #38394: URL: https://github.com/apache/spark/pull/38394#issuecomment-1293243967 Thanks @MaxGekk

[GitHub] [spark] HyukjinKwon commented on pull request #38399: [SPARK-40922][PYTHON] Document multiple path support in `pyspark.pandas.read_csv`

2022-10-27 Thread GitBox
HyukjinKwon commented on PR #38399: URL: https://github.com/apache/spark/pull/38399#issuecomment-1293346850 Merged to master.

[GitHub] [spark] MaxGekk closed pull request #38387: [SPARK-40910][SQL] Replace UnsupportedOperationException with SparkUnsupportedOperationException

2022-10-27 Thread GitBox
MaxGekk closed pull request #38387: [SPARK-40910][SQL] Replace UnsupportedOperationException with SparkUnsupportedOperationException URL: https://github.com/apache/spark/pull/38387

[GitHub] [spark] MaxGekk commented on pull request #38387: [SPARK-40910][SQL] Replace UnsupportedOperationException with SparkUnsupportedOperationException

2022-10-27 Thread GitBox
MaxGekk commented on PR #38387: URL: https://github.com/apache/spark/pull/38387#issuecomment-1293373569 +1, LGTM. Merging to master. Thank you, @panbingkun.

[GitHub] [spark] LuciferYang commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox
LuciferYang commented on PR #38413: URL: https://github.com/apache/spark/pull/38413#issuecomment-1293477815 cc @MaxGekk @HyukjinKwon @dongjoon-hyun Thinking again, does this refactor look simpler?

[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
bjornjorgensen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293732028 @xinrong-meng There are two tests that don't work for branch 3.2. Both are Python tests; can you have a look at them?

[GitHub] [spark] zhengruifeng commented on pull request #38411: [SPARK-40933][SQL] Make df.stat.{cov, corr} consistent with sql functions

2022-10-27 Thread GitBox
zhengruifeng commented on PR #38411: URL: https://github.com/apache/spark/pull/38411#issuecomment-1293442088 The null handling seems to differ; let me update this PR.

[GitHub] [spark] wangyeweikuer closed pull request #38414: Review from master

2022-10-27 Thread GitBox
wangyeweikuer closed pull request #38414: Review from master URL: https://github.com/apache/spark/pull/38414

[GitHub] [spark] LuciferYang opened a new pull request, #38412: [SPARK-40935][BUILD] Upgrade zstd-jni to 1.5.2-5

2022-10-27 Thread GitBox
LuciferYang opened a new pull request, #38412: URL: https://github.com/apache/spark/pull/38412 ### What changes were proposed in this pull request? This PR aims to upgrade zstd-jni to 1.5.2-5. ### Why are the changes needed? This version starts to support magicless frames:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox
zhengruifeng commented on code in PR #38395: URL: https://github.com/apache/spark/pull/38395#discussion_r1006804374 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -2100,3 +2100,53 @@ object AsOfJoin { } } }

[GitHub] [spark] srowen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
srowen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293460759 The test still shows 'pending', hm. It sounds like you saw some possibly-unrelated tests failing. Ideally we'd see the tests pass first. Can you re-set or re-run the tests?

[GitHub] [spark] awdavidson commented on pull request #38312: [SPARK-40819][SQL] Timestamp nanos behaviour regression

2022-10-27 Thread GitBox
awdavidson commented on PR #38312: URL: https://github.com/apache/spark/pull/38312#issuecomment-1293507848 @cloud-fan @LuciferYang any update/response regarding this?

[GitHub] [spark] MaxGekk closed pull request #38402: [SPARK-40924][SQL] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox
MaxGekk closed pull request #38402: [SPARK-40924][SQL] Fix for Unhex when input has odd number of symbols URL: https://github.com/apache/spark/pull/38402

[GitHub] [spark] MaxGekk commented on pull request #38402: [SPARK-40924][SQL] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox
MaxGekk commented on PR #38402: URL: https://github.com/apache/spark/pull/38402#issuecomment-1293368185 +1, LGTM. Merging to master/3.3. Thank you, @vitaliili-db.

[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
bjornjorgensen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293450929 @srowen two days have passed.

[GitHub] [spark] LuciferYang commented on a diff in pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox
LuciferYang commented on code in PR #38413: URL: https://github.com/apache/spark/pull/38413#discussion_r1006825185 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisTest.scala: ## @@ -183,30 +185,28 @@ trait AnalysisTest extends PlanTest {

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-10-27 Thread GitBox
SandishKumarHN commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1007058916 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala: ## @@ -98,13 +92,9 @@ private[protobuf] case class

[GitHub] [spark] MaxGekk commented on pull request #38402: [SPARK-40924][SQL] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox
MaxGekk commented on PR #38402: URL: https://github.com/apache/spark/pull/38402#issuecomment-1293370876 @vitaliili-db The changes cause conflicts in `branch-3.3`. Could you open a separate PR and backport this to 3.3, please.

[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
bjornjorgensen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293716740 @srowen tests have been re-run but it's the same result. But it's the same with everyone else's PR, for a long time. Like this one

[GitHub] [spark] MaxGekk commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-10-27 Thread GitBox
MaxGekk commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1006747706 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala: ## @@ -71,16 +70,11 @@ private[protobuf] case class

[GitHub] [spark] ljfgem commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-10-27 Thread GitBox
ljfgem commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r1007065247 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/V2ViewDescription.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] MaxGekk commented on a diff in pull request #38344: [SPARK-40777][SQL][PROTOBUF] Protobuf import support and move error-classes.

2022-10-27 Thread GitBox
MaxGekk commented on code in PR #38344: URL: https://github.com/apache/spark/pull/38344#discussion_r1006748193 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala: ## @@ -98,13 +92,9 @@ private[protobuf] case class

[GitHub] [spark] LuciferYang opened a new pull request, #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox
LuciferYang opened a new pull request, #38413: URL: https://github.com/apache/spark/pull/38413 ### What changes were proposed in this pull request? This PR tries to simplify the `AnalysisTest#assertAnalysisErrorClass` method in the following way: - Remove the outer conditions:

[GitHub] [spark] LuciferYang commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox
LuciferYang commented on PR #38413: URL: https://github.com/apache/spark/pull/38413#issuecomment-1293745767 > assertAnalysisErrorClass Sounds good, let me try. I'll set this to draft first and will ping you when it can be reviewed @MaxGekk

[GitHub] [spark] wangyeweikuer opened a new pull request, #38414: Review from master

2022-10-27 Thread GitBox
wangyeweikuer opened a new pull request, #38414: URL: https://github.com/apache/spark/pull/38414 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] HyukjinKwon closed pull request #38399: [SPARK-40922][PYTHON] Document multiple path support in `pyspark.pandas.read_csv`

2022-10-27 Thread GitBox
HyukjinKwon closed pull request #38399: [SPARK-40922][PYTHON] Document multiple path support in `pyspark.pandas.read_csv` URL: https://github.com/apache/spark/pull/38399

[GitHub] [spark] bjornjorgensen commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
bjornjorgensen commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293504608 OK, I have re-run the tests now.

[GitHub] [spark] MaxGekk commented on pull request #38413: [SPARK-40936][SQL][TESTS] Remove outer conditions to simplify `AnalysisTest#assertAnalysisErrorClass` method

2022-10-27 Thread GitBox
MaxGekk commented on PR #38413: URL: https://github.com/apache/spark/pull/38413#issuecomment-129372 I wonder why we need `assertAnalysisErrorClass()` at all. `checkError` does the same job. It seems that `assertAnalysisErrorClass()` additionally checks case sensitivity (can be done in

[GitHub] [spark] amaliujia commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38395: URL: https://github.com/apache/spark/pull/38395#discussion_r1007192022 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -2100,3 +2100,53 @@ object AsOfJoin { } } } +

[GitHub] [spark] dongjoon-hyun closed pull request #38369: [SPARK-40895][BUILD] Upgrade arrow to 10.0.0

2022-10-27 Thread GitBox
dongjoon-hyun closed pull request #38369: [SPARK-40895][BUILD] Upgrade arrow to 10.0.0 URL: https://github.com/apache/spark/pull/38369

[GitHub] [spark] LuciferYang commented on pull request #38369: [SPARK-40895][BUILD] Upgrade arrow to 10.0.0

2022-10-27 Thread GitBox
LuciferYang commented on PR #38369: URL: https://github.com/apache/spark/pull/38369#issuecomment-1294357109 Thanks @dongjoon-hyun @itholic @bjornjorgensen

[GitHub] [spark] shardulm94 commented on pull request #37479: [SPARK-40045][SQL]Optimize the order of filtering predicates

2022-10-27 Thread GitBox
shardulm94 commented on PR #37479: URL: https://github.com/apache/spark/pull/37479#issuecomment-1294365611 @caican00 Do you think this PR is ready for another round of review? In our organization, we have seen a number of users impacted by this after migration to DSv2, so it would be nice

[GitHub] [spark] zhengruifeng commented on pull request #38411: [SPARK-40933][SQL] Make df.stat.{cov, corr} consistent with sql functions

2022-10-27 Thread GitBox
zhengruifeng commented on PR #38411: URL: https://github.com/apache/spark/pull/38411#issuecomment-1294384281 @HyukjinKwon As far as I know, they are the same now. The original tests didn't cover null handling and empty datasets, so I added a new UT to make sure there is no behavior change.

[GitHub] [spark] dtenedor opened a new pull request, #38418: [SPARK-40944][SQL] Relax ordering constraint for CREATE TABLE column options

2022-10-27 Thread GitBox
dtenedor opened a new pull request, #38418: URL: https://github.com/apache/spark/pull/38418 ### What changes were proposed in this pull request? Relax ordering constraint for CREATE TABLE column options. Before this PR, the grammar for each CREATE TABLE column was: ```

[GitHub] [spark] vinodkc commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-10-27 Thread GitBox
vinodkc commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1294221419 @HyukjinKwon, @dongjoon-hyun, can you please review this PR?

[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-10-27 Thread GitBox
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1294245157 @dtenedor , yes, please close yours as a dup. I appreciate your help in reviewing this PR and on top of this change, I'm planning to add additional built-in mask functions supported

[GitHub] [spark] dongjoon-hyun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox
dongjoon-hyun commented on PR #38417: URL: https://github.com/apache/spark/pull/38417#issuecomment-1294267607 Could you review this, @viirya ?

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox
zhengruifeng commented on code in PR #38395: URL: https://github.com/apache/spark/pull/38395#discussion_r1007535733 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -2100,3 +2100,53 @@ object AsOfJoin { } } }

[GitHub] [spark] HeartSaVioR commented on pull request #38361: [SPARK-40892][SQL][SS] Loosen the requirement of window_time rule - allow multiple window_time calls

2022-10-27 Thread GitBox
HeartSaVioR commented on PR #38361: URL: https://github.com/apache/spark/pull/38361#issuecomment-1294361918 Thanks @cloud-fan ! Given that this PR has been open for 4 days with no feedback so far, I'm merging this.

[GitHub] [spark] amaliujia commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38415: URL: https://github.com/apache/spark/pull/38415#discussion_r1007553459 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -47,6 +47,13 @@ message Relation { Unknown unknown = 999; } + // Optional.

[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox
alex-balikov commented on code in PR #38405: URL: https://github.com/apache/spark/pull/38405#discussion_r1007479005 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SessionWindow.scala: ## @@ -68,11 +68,29 @@ case class SessionWindow(timeColumn:

[GitHub] [spark] amaliujia commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38415: URL: https://github.com/apache/spark/pull/38415#discussion_r1007556667 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -47,6 +47,13 @@ message Relation { Unknown unknown = 999; } + // Optional.

[GitHub] [spark] zhengruifeng commented on pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox
zhengruifeng commented on PR #37955: URL: https://github.com/apache/spark/pull/37955#issuecomment-1294400941 Merged into master, thank you @itholic for doing this!

[GitHub] [spark] sunchao commented on pull request #38277: [SPARK-40815][SQL] Introduce DelegateSymlinkTextInputFormat to handle empty splits when "spark.hadoopRDD.ignoreEmptySplits" is enabled in ord

2022-10-27 Thread GitBox
sunchao commented on PR #38277: URL: https://github.com/apache/spark/pull/38277#issuecomment-129460 @sadikovi sorry for the delay, will take a look.

[GitHub] [spark] vinodkc opened a new pull request, #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-10-27 Thread GitBox
vinodkc opened a new pull request, #38419: URL: https://github.com/apache/spark/pull/38419 ### What changes were proposed in this pull request? This PR implements the built-in function `TRUNC` to truncate numbers to the previous integer or decimal. It optionally accepts a second

[GitHub] [spark] rahulsmahadev commented on a diff in pull request #38404: [WIP] Replace Where

2022-10-27 Thread GitBox
rahulsmahadev commented on code in PR #38404: URL: https://github.com/apache/spark/pull/38404#discussion_r1007450737 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1276,6 +1276,24 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] vinodkc commented on a diff in pull request #38263: [SPARK-40692][SQL] Support data masking built-in function 'mask_hash'

2022-10-27 Thread GitBox
vinodkc commented on code in PR #38263: URL: https://github.com/apache/spark/pull/38263#discussion_r1007474647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3950,6 +3950,14 @@ object SQLConf {

[GitHub] [spark] github-actions[bot] commented on pull request #37219: [WIP][SPARK-39794][PYTHON] Introduce parametric singleton for DataType

2022-10-27 Thread GitBox
github-actions[bot] commented on PR #37219: URL: https://github.com/apache/spark/pull/37219#issuecomment-1294261496 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37048: [SPARK-39655][CORE] Add a config to limit the number of RDD partitions

2022-10-27 Thread GitBox
github-actions[bot] closed pull request #37048: [SPARK-39655][CORE] Add a config to limit the number of RDD partitions URL: https://github.com/apache/spark/pull/37048

[GitHub] [spark] dongjoon-hyun closed pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox
dongjoon-hyun closed pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17` URL: https://github.com/apache/spark/pull/38417

[GitHub] [spark] HyukjinKwon commented on pull request #38419: [SPARK-40945][SQL] Support built-in function to truncate numbers

2022-10-27 Thread GitBox
HyukjinKwon commented on PR #38419: URL: https://github.com/apache/spark/pull/38419#issuecomment-1294277396 cc @wangyum FYI

[GitHub] [spark] Yikun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox
Yikun commented on PR #38417: URL: https://github.com/apache/spark/pull/38417#issuecomment-1294284723 Thanks, late LGTM.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38415: URL: https://github.com/apache/spark/pull/38415#discussion_r1007551962 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -47,6 +47,13 @@ message Relation { Unknown unknown = 999; } + // Optional.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38415: URL: https://github.com/apache/spark/pull/38415#discussion_r1007552036 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -47,6 +47,13 @@ message Relation { Unknown unknown = 999; } + // Optional.

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox
HeartSaVioR commented on code in PR #38405: URL: https://github.com/apache/spark/pull/38405#discussion_r1007564839 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SessionWindow.scala: ## @@ -68,11 +68,29 @@ case class SessionWindow(timeColumn:

[GitHub] [spark] AngersZhuuuu commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-10-27 Thread GitBox
AngersZhuuuu commented on PR #35594: URL: https://github.com/apache/spark/pull/35594#issuecomment-1294389484 ping @cloud-fan @yaooqinn @LuciferYang

[GitHub] [spark-docker] Yikun commented on pull request #15: [SPARK-40569] Expose SPARK_MASTER_PORT 7077 for spark standalone cluster

2022-10-27 Thread GitBox
Yikun commented on PR #15: URL: https://github.com/apache/spark-docker/pull/15#issuecomment-1294391023 also cc @holdenk @dongjoon-hyun

[GitHub] [spark] cloud-fan commented on a diff in pull request #38395: [SPARK-40917][SQL] Add a dedicated logical plan for `Summary`

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38395: URL: https://github.com/apache/spark/pull/38395#discussion_r1007548886 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -2100,3 +2100,53 @@ object AsOfJoin { } } } +

[GitHub] [spark] cloud-fan commented on a diff in pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38415: URL: https://github.com/apache/spark/pull/38415#discussion_r1007555285 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -47,6 +47,13 @@ message Relation { Unknown unknown = 999; } + // Optional.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38406: URL: https://github.com/apache/spark/pull/38406#discussion_r100731 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] zhengruifeng closed pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox
zhengruifeng closed pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0 URL: https://github.com/apache/spark/pull/37955 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox
HeartSaVioR commented on code in PR #38405: URL: https://github.com/apache/spark/pull/38405#discussion_r1007411825 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala: ## @@ -0,0 +1,400 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] carlfu-db commented on pull request #38404: [WIP] Replace Where

2022-10-27 Thread GitBox
carlfu-db commented on PR #38404: URL: https://github.com/apache/spark/pull/38404#issuecomment-1294192968 > Mind adding a test, filing a JIRA, etc? See also https://spark.apache.org/contributing.html Will do. Still in progress :) -- This is an automated message from the Apache

[GitHub] [spark] dongjoon-hyun commented on pull request #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox
dongjoon-hyun commented on PR #38417: URL: https://github.com/apache/spark/pull/38417#issuecomment-1294270084 Thank you so much, @viirya . Merged to master for Apache Spark 3.4. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] itholic commented on pull request #37955: [SPARK-40512][SPARK-40896][PS][INFRA] Upgrade pandas to 1.5.0

2022-10-27 Thread GitBox
itholic commented on PR #37955: URL: https://github.com/apache/spark/pull/37955#issuecomment-1294313246 CI passed! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] cloud-fan commented on a diff in pull request #38418: [SPARK-40944][SQL] Relax ordering constraint for CREATE TABLE column options

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38418: URL: https://github.com/apache/spark/pull/38418#discussion_r1007527891 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -2813,13 +2813,41 @@ class AstBuilder extends

[GitHub] [spark] HeartSaVioR closed pull request #38361: [SPARK-40892][SQL][SS] Loosen the requirement of window_time rule - allow multiple window_time calls

2022-10-27 Thread GitBox
HeartSaVioR closed pull request #38361: [SPARK-40892][SQL][SS] Loosen the requirement of window_time rule - allow multiple window_time calls URL: https://github.com/apache/spark/pull/38361 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] cloud-fan commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox
cloud-fan commented on code in PR #38406: URL: https://github.com/apache/spark/pull/38406#discussion_r100731 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] amaliujia commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38406: URL: https://github.com/apache/spark/pull/38406#discussion_r1007558507 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] dongjoon-hyun commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
dongjoon-hyun commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293774559 > Could you make the same patch for 3.1 branch? No, Apache Spark 3.1 reached EOL last month because the first release was March 2, 2021. -- This is an automated message

[GitHub] [spark] dongjoon-hyun closed pull request #38412: [SPARK-40935][BUILD] Upgrade zstd-jni to 1.5.2-5

2022-10-27 Thread GitBox
dongjoon-hyun closed pull request #38412: [SPARK-40935][BUILD] Upgrade zstd-jni to 1.5.2-5 URL: https://github.com/apache/spark/pull/38412 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] amaliujia commented on pull request #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
amaliujia commented on PR #38415: URL: https://github.com/apache/spark/pull/38415#issuecomment-1293887655 R: @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] amaliujia opened a new pull request, #38415: [SPARK-40938][CONNECT] Support Alias for every type of Relation

2022-10-27 Thread GitBox
amaliujia opened a new pull request, #38415: URL: https://github.com/apache/spark/pull/38415 ### What changes were proposed in this pull request? In the past, Connect server can check `alias` for `Read` and `Project`. However for Spark DataFrame, every DataFrame can be

[GitHub] [spark] amaliujia commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38406: URL: https://github.com/apache/spark/pull/38406#discussion_r1007207351 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -67,7 +67,7 @@ class SparkConnectPlanner(plan:

[GitHub] [spark] vitaliili-db commented on pull request #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox
vitaliili-db commented on PR #38416: URL: https://github.com/apache/spark/pull/38416#issuecomment-1293889541 @MaxGekk backported, please take a look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] vitaliili-db opened a new pull request, #38416: [SPARK-40924][SQL][3.3] Fix for Unhex when input has odd number of symbols

2022-10-27 Thread GitBox
vitaliili-db opened a new pull request, #38416: URL: https://github.com/apache/spark/pull/38416 ### What changes were proposed in this pull request? Fix for a bug in Unhex function when there is an odd number of symbols in the input string. This is backport of #38402

[GitHub] [spark] dongjoon-hyun commented on pull request #38412: [SPARK-40935][BUILD] Upgrade zstd-jni to 1.5.2-5

2022-10-27 Thread GitBox
dongjoon-hyun commented on PR #38412: URL: https://github.com/apache/spark/pull/38412#issuecomment-1293828092 Thank you, @LuciferYang , @HyukjinKwon , @singhpk234 . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] AmplabJenkins commented on pull request #38410: [SPARK-40932][CORE] Fix issue messages for allGather are overridden

2022-10-27 Thread GitBox
AmplabJenkins commented on PR #38410: URL: https://github.com/apache/spark/pull/38410#issuecomment-1293998577 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox
alex-balikov commented on code in PR #38405: URL: https://github.com/apache/spark/pull/38405#discussion_r1007369668 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala: ## @@ -0,0 +1,400 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] vitas commented on pull request #38352: [SPARK-40801][BUILD][3.2] Upgrade `Apache commons-text` to 1.10

2022-10-27 Thread GitBox
vitas commented on PR #38352: URL: https://github.com/apache/spark/pull/38352#issuecomment-1293772433 Could you make the same patch for 3.1 branch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox
alex-balikov commented on code in PR #38405: URL: https://github.com/apache/spark/pull/38405#discussion_r1007348475 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala: ## @@ -0,0 +1,400 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia commented on a diff in pull request #38406: [SPARK-40926][CONNECT] Refactor server side tests to only use DataFrame API

2022-10-27 Thread GitBox
amaliujia commented on code in PR #38406: URL: https://github.com/apache/spark/pull/38406#discussion_r1007318294 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -215,4 +180,16 @@ class SparkConnectProtoSuite extends

[GitHub] [spark] dongjoon-hyun opened a new pull request, #38417: [SPARK-40941][K8S] Use Java 17 in K8s Dockerfile by default and remove `Dockerfile.java17`

2022-10-27 Thread GitBox
dongjoon-hyun opened a new pull request, #38417: URL: https://github.com/apache/spark/pull/38417 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] alex-balikov commented on a diff in pull request #38405: [SPARK-40925][SQL][SS] Fix stateful operator late record filtering

2022-10-27 Thread GitBox
alex-balikov commented on code in PR #38405: URL: https://github.com/apache/spark/pull/38405#discussion_r1007394123 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala: ## @@ -0,0 +1,400 @@ +/* + * Licensed to the Apache Software
