[GitHub] [spark] itholic commented on pull request #36058: [SPARK-38780][PYTHON][DOCS] PySpark docs build should fail when there is warning.

2022-04-03 Thread GitBox
itholic commented on PR #36058: URL: https://github.com/apache/spark/pull/36058#issuecomment-1087114454 Let me re-trigger the build by rebasing onto master after https://github.com/apache/spark/pull/36057 is merged. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] itholic commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox
itholic commented on PR #36057: URL: https://github.com/apache/spark/pull/36057#issuecomment-1087113553 Just opened a PR at https://github.com/apache/spark/pull/36058 to make the docs build fail on warnings. So, let me cherry-pick this fix to the opened PR after merging.

[GitHub] [spark] itholic opened a new pull request, #36058: [SPARK-38780][PYTHON][DOCS] PySpark docs build should fail when there is warning.

2022-04-03 Thread GitBox
itholic opened a new pull request, #36058: URL: https://github.com/apache/spark/pull/36058 ### What changes were proposed in this pull request? This PR proposes to add option "-W" when running PySpark documentation build via Sphinx. ### Why are the changes needed?

[GitHub] [spark] itholic commented on pull request #34324: [SPARK-37015][PYTHON] Inline type hints for python/pyspark/streaming/dstream.py

2022-04-03 Thread GitBox
itholic commented on PR #34324: URL: https://github.com/apache/spark/pull/34324#issuecomment-1087105177 Also mind taking a last look for this, @zero323 ??

[GitHub] [spark] itholic commented on pull request #34293: [SPARK-37014][PYTHON] Inline type hints for python/pyspark/streaming/context.py

2022-04-03 Thread GitBox
itholic commented on PR #34293: URL: https://github.com/apache/spark/pull/34293#issuecomment-1087104988 Seems fine to me. Would you mind taking a last look for this, @zero323 ??

[GitHub] [spark] itholic commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox
itholic commented on PR #36057: URL: https://github.com/apache/spark/pull/36057#issuecomment-1087089407 @HyukjinKwon sure, let me take a look

[GitHub] [spark] HyukjinKwon commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox
HyukjinKwon commented on PR #36057: URL: https://github.com/apache/spark/pull/36057#issuecomment-1087087712 cc @xinrong-databricks and @zero323 FYI. @itholic BTW, I remember we talked about warnings in Sphinx build before. I think it should fail for these warnings but not sure why it

[GitHub] [spark] HyukjinKwon opened a new pull request, #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox
HyukjinKwon opened a new pull request, #36057: URL: https://github.com/apache/spark/pull/36057 ### What changes were proposed in this pull request? This PR fixes various documentation build warnings in the PySpark documentation ### Why are the changes needed? To render the

[GitHub] [spark] AngersZhuuuu opened a new pull request, #36056: [WIP][SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir

2022-04-03 Thread GitBox
AngersZhuuuu opened a new pull request, #36056: URL: https://github.com/apache/spark/pull/36056 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] huaxingao commented on pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite

2022-04-03 Thread GitBox
huaxingao commented on PR #36050: URL: https://github.com/apache/spark/pull/36050#issuecomment-1087061975 Thanks all!

[GitHub] [spark] wangyum commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox
wangyum commented on PR #36047: URL: https://github.com/apache/spark/pull/36047#issuecomment-1087059583 Merged to master and branch-3.3.

[GitHub] [spark] wangyum closed pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox
wangyum closed pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter URL: https://github.com/apache/spark/pull/36047

[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox
lw33 commented on PR #35979: URL: https://github.com/apache/spark/pull/35979#issuecomment-1087033057 Yes, maybe we don't need to make this change. I just found this problem when compacting the event log: the event log could be written to the path, but compaction failed, so I thought this might be a bug.

[GitHub] [spark] dongjoon-hyun closed pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite

2022-04-03 Thread GitBox
dongjoon-hyun closed pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite URL: https://github.com/apache/spark/pull/36050

[GitHub] [spark] viirya closed pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default

2022-04-03 Thread GitBox
viirya closed pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default URL: https://github.com/apache/spark/pull/36055

[GitHub] [spark] viirya commented on pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default

2022-04-03 Thread GitBox
viirya commented on PR #36055: URL: https://github.com/apache/spark/pull/36055#issuecomment-1087021571 Thanks. Merging to master/3.3.

[GitHub] [spark] zhengruifeng commented on pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox
zhengruifeng commented on PR #36048: URL: https://github.com/apache/spark/pull/36048#issuecomment-1086995190 @xinrong-databricks Will add the tests and update the PR description, thanks!

[GitHub] [spark] dongjoon-hyun commented on pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox
dongjoon-hyun commented on PR #36049: URL: https://github.com/apache/spark/pull/36049#issuecomment-1086992390 Thank you so much!

[GitHub] [spark] zhengruifeng commented on pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox
zhengruifeng commented on PR #36049: URL: https://github.com/apache/spark/pull/36049#issuecomment-1086990897 @dongjoon-hyun Ok, I will hold on this PR since its target version is 3.4

[GitHub] [spark] github-actions[bot] commented on pull request #33257: [SPARK-36039][K8S] Fix executor pod hadoop conf mount

2022-04-03 Thread GitBox
github-actions[bot] commented on PR #33257: URL: https://github.com/apache/spark/pull/33257#issuecomment-1086983267 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2022-04-03 Thread GitBox
github-actions[bot] closed pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down URL: https://github.com/apache/spark/pull/34629

[GitHub] [spark] github-actions[bot] closed pull request #34953: [SPARK-37682][SQL]Apply 'merged column' and 'bit vector' in RewriteDistinctAggregates

2022-04-03 Thread GitBox
github-actions[bot] closed pull request #34953: [SPARK-37682][SQL]Apply 'merged column' and 'bit vector' in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/34953

[GitHub] [spark] github-actions[bot] commented on pull request #34990: [SPARK-37717][SQL] Improve logging in BroadcastExchangeExec

2022-04-03 Thread GitBox
github-actions[bot] commented on PR #34990: URL: https://github.com/apache/spark/pull/34990#issuecomment-1086983247 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on pull request #36038: [WIP][SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-03 Thread GitBox
HyukjinKwon commented on PR #36038: URL: https://github.com/apache/spark/pull/36038#issuecomment-1086981229 Yeah, actually that's what I was going to point out. Should be better to create a separate PR to improve the documentation for both sides :-).

[GitHub] [spark] sunchao commented on pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2022-04-03 Thread GitBox
sunchao commented on PR #34659: URL: https://github.com/apache/spark/pull/34659#issuecomment-1086971364 Thanks all for the review!!! @viirya I just opened #36055 for the follow-up.

[GitHub] [spark] sunchao opened a new pull request, #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default

2022-04-03 Thread GitBox
sunchao opened a new pull request, #36055: URL: https://github.com/apache/spark/pull/36055 ### What changes were proposed in this pull request? This PR disables `spark.sql.parquet.enableNestedColumnVectorizedReader` by default. ### Why are the changes needed?

[GitHub] [spark] HeartSaVioR commented on pull request #36038: [WIP][SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-03 Thread GitBox
HeartSaVioR commented on PR #36038: URL: https://github.com/apache/spark/pull/36038#issuecomment-1086968356 I see review comments about the doc which seem to be just copied from the Scala/Java API doc. Since this PR focuses mainly on feature parity, how about simply allowing

[GitHub] [spark] sunchao commented on pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

2022-04-03 Thread GitBox
sunchao commented on PR #35657: URL: https://github.com/apache/spark/pull/35657#issuecomment-1086963569 Thanks @dongjoon-hyun , updated.

[GitHub] [spark] huaxingao commented on pull request #36039: [SPARK-38761][SQL] DS V2 supports push down misc non-aggregate functions

2022-04-03 Thread GitBox
huaxingao commented on PR #36039: URL: https://github.com/apache/spark/pull/36039#issuecomment-1086961617 I have a general question: what are the criteria of the functions that can be pushed down to data source?

[GitHub] [spark] huaxingao commented on pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite

2022-04-03 Thread GitBox
huaxingao commented on PR #36050: URL: https://github.com/apache/spark/pull/36050#issuecomment-1086961002 @dongjoon-hyun I created SPARK-38779 for this. Thanks!

[GitHub] [spark] sigmod commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox
sigmod commented on PR #36047: URL: https://github.com/apache/spark/pull/36047#issuecomment-1086954891 LGTM. Can we merge it to branch-3.3 as well?

[GitHub] [spark] dongjoon-hyun closed pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox
dongjoon-hyun closed pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures URL: https://github.com/apache/spark/pull/36054

[GitHub] [spark] dongjoon-hyun commented on pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox
dongjoon-hyun commented on PR #36054: URL: https://github.com/apache/spark/pull/36054#issuecomment-1086943631 Thank you, @srowen . This is a single test suite only change, and I verified in two ways. Merged to master/3.3. ``` SPARK_ANSI_SQL_MODE=true build/sbt "mllib/testOnly

[GitHub] [spark] dongjoon-hyun commented on pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox
dongjoon-hyun commented on PR #36054: URL: https://github.com/apache/spark/pull/36054#issuecomment-1086940137 cc @gengliangwang , @srowen , @yaooqinn

[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox
dongjoon-hyun commented on PR #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086940092 Here is the follow-up. - https://github.com/apache/spark/pull/36054

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox
dongjoon-hyun opened a new pull request, #36054: URL: https://github.com/apache/spark/pull/36054 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox
dongjoon-hyun commented on PR #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086939224 Oops. I realized that more `OutOfRange` failures were hidden in the same test case behind the previous `Overflow` failure. I'll make a follow-up soon.

[GitHub] [spark] xinrong-databricks commented on pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox
xinrong-databricks commented on pull request #36048: URL: https://github.com/apache/spark/pull/36048#issuecomment-1086927554 Thanks @zhengruifeng! https://github.com/apache/spark/blob/master/python/pyspark/pandas/tests/test_series.py is a good place to add tests. It would be

[GitHub] [spark] xinrong-databricks commented on a change in pull request #36006: [SPARK-38686][PYTHON] Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`

2022-04-03 Thread GitBox
xinrong-databricks commented on a change in pull request #36006: URL: https://github.com/apache/spark/pull/36006#discussion_r841262157 ## File path: python/pyspark/pandas/indexes/multi.py ## @@ -893,6 +893,70 @@ def drop(self, codes: List[Any], level: Optional[Union[int,

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35991: [SPARK-38675][CORE] Fix race during unlock in BlockInfoManager

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35991: URL: https://github.com/apache/spark/pull/35991#discussion_r841258213 ## File path: core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala ## @@ -360,12 +360,17 @@ private[storage] class BlockInfoManager

[GitHub] [spark] dongjoon-hyun commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #35979: URL: https://github.com/apache/spark/pull/35979#issuecomment-1086919965 Back to the original proposal, why do we need to support `illegal char`, @lw33 ? It's illegal, isn't it?

[GitHub] [spark] dongjoon-hyun commented on pull request #36033: [SPARK-38754][SQL][TEST][3.1] Using EquivalentExpressions getEquivalentExprs function instead of getExprState at SubexpressionEliminati

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #36033: URL: https://github.com/apache/spark/pull/36033#issuecomment-1086918872 Originally, `branch-3.1` was broken, but `branch-3.2` wasn't. Given that, the forward-port from 3.1 to 3.2 looks wrong to me. I'm going to revert this from

[GitHub] [spark] dongjoon-hyun commented on pull request #36033: [SPARK-38754][SQL][TEST][3.1] Using EquivalentExpressions getEquivalentExprs function instead of getExprState at SubexpressionEliminati

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #36033: URL: https://github.com/apache/spark/pull/36033#issuecomment-1086917353 Hi, @cloud-fan . This seems to break branch-3.2 compilation. ``` [error]

[GitHub] [spark] dongjoon-hyun commented on pull request #35886: [SPARK-38582][K8S] Introduce buildEnvVars and buildEnvVarsWithFieldRef for KubernetesUtils to eliminate duplicate code pattern

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #35886: URL: https://github.com/apache/spark/pull/35886#issuecomment-1086916633 FYI, we had better hold off on these kinds of PRs during the planned release process. It's the same for the other refactoring PRs.

[GitHub] [spark] yaooqinn commented on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox
yaooqinn commented on pull request #35765: URL: https://github.com/apache/spark/pull/35765#issuecomment-1086914484 thanks @dongjoon-hyun and all

[GitHub] [spark] awdavidson commented on a change in pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox
awdavidson commented on a change in pull request #36048: URL: https://github.com/apache/spark/pull/36048#discussion_r841254014 ## File path: python/pyspark/pandas/series.py ## @@ -2937,6 +2937,73 @@ def add_suffix(self, suffix: str) -> "Series":

[GitHub] [spark] dongjoon-hyun closed pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox
dongjoon-hyun closed pull request #35765: URL: https://github.com/apache/spark/pull/35765

[GitHub] [spark] dongjoon-hyun closed pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox
dongjoon-hyun closed pull request #36051: URL: https://github.com/apache/spark/pull/36051

[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086909995 Thank you, @gengliangwang , @srowen , @yaooqinn . Merged to master/3.3.

[GitHub] [spark] yaooqinn opened a new pull request #36053: [SPARK-38778][INFRA][BUILD] Replace http with https for project url in pom

2022-04-03 Thread GitBox
yaooqinn opened a new pull request #36053: URL: https://github.com/apache/spark/pull/36053 ### What changes were proposed in this pull request? Change http://spark.apache.org/ to https://spark.apache.org/ in the project URL of all pom files ### Why are the

[GitHub] [spark] AmplabJenkins commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox
AmplabJenkins commented on pull request #36047: URL: https://github.com/apache/spark/pull/36047#issuecomment-1086901189 Can one of the admins verify this patch?

[GitHub] [spark] yaooqinn opened a new pull request #36052: [SPARK-38777][YARN] Add `bin/spark-submit --kill / --status` support for yarn

2022-04-03 Thread GitBox
yaooqinn opened a new pull request #36052: URL: https://github.com/apache/spark/pull/36052 ### What changes were proposed in this pull request? In this PR, we extend `bin/spark-submit` to support the `--kill / --status` CLI options for the YARN cluster manager,

[GitHub] [spark] srowen commented on a change in pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox
srowen commented on a change in pull request #36049: URL: https://github.com/apache/spark/pull/36049#discussion_r841241230 ## File path: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala ## @@ -138,4 +140,61 @@ private[spark] object DatasetUtils { case

[GitHub] [spark] gengliangwang commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox
gengliangwang commented on pull request #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086883386 @dongjoon-hyun thanks for fixing it!

[GitHub] [spark] dcoliversun commented on pull request #36044: [SPARK-38770][K8S] Remove `renameMainAppResource` from `baseDriverContainer`

2022-04-03 Thread GitBox
dcoliversun commented on pull request #36044: URL: https://github.com/apache/spark/pull/36044#issuecomment-1086847636 Thanks for your help @dongjoon-hyun @martin-g

[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox
lw33 commented on pull request #35979: URL: https://github.com/apache/spark/pull/35979#issuecomment-1086831394 > Sorry guys. Supporting `illegal char` by removing `toURI` doesn't look like a safe improvement to me. > > Given the trade-off between benefit and risk, we had better

[GitHub] [spark] sarutak commented on pull request #35443: [MINOR][CORE] Change the log level to WARN for the message which is shown in case users attemp to add a JAR twice

2022-04-03 Thread GitBox
sarutak commented on pull request #35443: URL: https://github.com/apache/spark/pull/35443#issuecomment-1086831154 @dongjoon-hyun Thank you!

[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
AngersZhuuuu commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086820418 @dongjoon-hyun Build failed but seems not related to this pr

[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
AngersZhuuuu commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086800839 > I meant this, [#35799 (comment)](https://github.com/apache/spark/pull/35799#discussion_r841166511) . :) > > > Didn't got your point about revise the code for

[GitHub] [spark] peter-toth commented on pull request #35382: [SPARK-28090][SQL] Improve `replaceAliasButKeepName` performance

2022-04-03 Thread GitBox
peter-toth commented on pull request #35382: URL: https://github.com/apache/spark/pull/35382#issuecomment-1086800126 Thanks @cloud-fan, @dongjoon-hyun for the review!

[GitHub] [spark] kianelbo closed pull request #35977: [SPARK-38660][PYTHON] PySpark DeprecationWarning: distutils Version classes are deprecated

2022-04-03 Thread GitBox
kianelbo closed pull request #35977: URL: https://github.com/apache/spark/pull/35977

[GitHub] [spark] dongjoon-hyun commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086799205 I meant this, https://github.com/apache/spark/pull/35799#discussion_r841166511 . :) > Didn't got your point about revise the code for Apache Spark 3.4.

[GitHub] [spark] dongjoon-hyun commented on pull request #34970: [DO NOT MERGE] investigate test failures if we test ANSI mode in github actions

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #34970: URL: https://github.com/apache/spark/pull/34970#issuecomment-1086798884 Shall we close this if all tests are completed, @gengliangwang ?

[GitHub] [spark] dongjoon-hyun commented on pull request #35290: [SPARK-37865][SQL][3.0]Fix union bug when the first child of union has duplicate columns

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #35290: URL: https://github.com/apache/spark/pull/35290#issuecomment-1086798303 Hi, @chasingegg . Thank you for making a PR. However, Apache Spark 3.0.0 was released on June 18, 2020. According to [Apache Spark Versioning

[GitHub] [spark] dongjoon-hyun closed pull request #35443: [MINOR][CORE] Change the log level to WARN for the message which is shown in case users attemp to add a JAR twice

2022-04-03 Thread GitBox
dongjoon-hyun closed pull request #35443: URL: https://github.com/apache/spark/pull/35443

[GitHub] [spark] dongjoon-hyun closed pull request #35382: [SPARK-28090][SQL] Improve `replaceAliasButKeepName` performance

2022-04-03 Thread GitBox
dongjoon-hyun closed pull request #35382: URL: https://github.com/apache/spark/pull/35382

[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
AngersZhuuuu commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086796545 > In general, +1 for the requirement and idea, @AngersZhuuuu . Shall we revise the code for Apache Spark 3.4? Didn't got your point about `revise the code for

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
AngersZhuuuu commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841178986 ## File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala ## @@ -386,3 +400,14 @@ class

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841176375 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ## @@ -294,4 +295,23 @@ class SparkSessionExtensions {

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841174227 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala ## @@ -0,0 +1,30 @@ +/* + * Licensed to

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841174107 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala ## @@ -83,7 +83,7 @@ case class

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841173889 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala ## @@ -28,7 +29,9 @@ import

[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in 'ALS validate input dataset' test case

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086791099 cc @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun opened a new pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in 'ALS validate input dataset' test case

2022-04-03 Thread GitBox
dongjoon-hyun opened a new pull request #36051: URL: https://github.com/apache/spark/pull/36051 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox
dongjoon-hyun edited a comment on pull request #35765: URL: https://github.com/apache/spark/pull/35765#issuecomment-1086786301 cc @MaxGekk since he is a release manager. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun commented on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #35765: URL: https://github.com/apache/spark/pull/35765#issuecomment-1086786301 cc @MaxGekk since he is a release manager. Although this is still unclear to me, this could be a regression at Apache Spark 3.3.0 release. -- This is an

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35984: [MINOR] Log AnalysisException output for debug and tracing

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35984: URL: https://github.com/apache/spark/pull/35984#discussion_r841167501 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ## @@ -1761,7 +1761,9 @@ class Analyzer(override

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841167066 ## File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala ## @@ -386,3 +400,14 @@ class

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166959 ## File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala ## @@ -235,6 +236,19 @@ class

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166793 ## File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ## @@ -616,6 +616,27 @@ class StreamingContext

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166511 ## File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala ## @@ -185,4 +185,10 @@ object StreamingConf {

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166490 ## File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala ## @@ -185,4 +185,10 @@ object StreamingConf {

[GitHub] [spark] dongjoon-hyun commented on pull request #35881: [SPARK-36664][CORE] Log time waiting for cluster resources

2022-04-03 Thread GitBox
dongjoon-hyun commented on pull request #35881: URL: https://github.com/apache/spark/pull/35881#issuecomment-1086784648 Thank you, @holdenk . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] JacobZheng0927 removed a comment on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2022-04-03 Thread GitBox
JacobZheng0927 removed a comment on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-1085680797 I'm wondering if this change will cause a driver memory overflow, as duplicate SQLMetric may take up a lot of memory. @leanken-zz -- This is an automated