date:20220403

[GitHub] [spark] itholic commented on pull request #36058: [SPARK-38780][PYTHON][DOCS] PySpark docs build should fail when there is warning.

2022-04-03 Thread GitBox



itholic commented on PR #36058:
URL: https://github.com/apache/spark/pull/36058#issuecomment-1087114454

   Let me rebase onto master and re-trigger the build after 
https://github.com/apache/spark/pull/36057 is merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] itholic commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox



itholic commented on PR #36057:
URL: https://github.com/apache/spark/pull/36057#issuecomment-1087113553

   Just opened a PR at https://github.com/apache/spark/pull/36058 to make 
the build fail on warnings.
   
   So, let me cherry-pick this fix into that PR after merging.



[GitHub] [spark] itholic opened a new pull request, #36058: [SPARK-38780][PYTHON][DOCS] PySpark docs build should fail when there is warning.

2022-04-03 Thread GitBox



itholic opened a new pull request, #36058:
URL: https://github.com/apache/spark/pull/36058

   ### What changes were proposed in this pull request?
   
   This PR proposes to add the "-W" option when running the PySpark 
documentation build via Sphinx.
   
   
   ### Why are the changes needed?
   
   To make the documentation build fail when the documentation triggers 
Sphinx warnings.
   
   ### Does this PR introduce _any_ user-facing change?
   
   This would make the docs a bit prettier.
   
   
   ### How was this patch tested?
   
   The existing build & tests should pass.
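
A minimal sketch of what the "-W" option does, not the actual change in this PR: `-W` is Sphinx's switch for turning warnings into errors, so the build exits with a non-zero status when a warning is emitted. The helper names and the `source`/`build` directories below are hypothetical, and the real Spark docs build may pass the flag differently (for example through its Makefile).

```python
import subprocess
import sys


def sphinx_command(source_dir, build_dir, fail_on_warning=True):
    """Assemble an HTML docs build invocation via `python -m sphinx`."""
    cmd = [sys.executable, "-m", "sphinx", "-b", "html"]
    if fail_on_warning:
        # -W asks Sphinx to treat warnings as errors.
        cmd.append("-W")
    cmd += [source_dir, build_dir]
    return cmd


def build_docs(source_dir, build_dir):
    """Run the build and return its exit code (non-zero on failure)."""
    return subprocess.run(sphinx_command(source_dir, build_dir)).returncode


if __name__ == "__main__":
    # Hypothetical directories; adjust to the local docs layout.
    sys.exit(build_docs("source", "build"))
```

With this in place a CI job only needs to check the exit code, rather than scanning the build log for warning lines.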



[GitHub] [spark] itholic commented on pull request #34324: [SPARK-37015][PYTHON] Inline type hints for python/pyspark/streaming/dstream.py

2022-04-03 Thread GitBox



itholic commented on PR #34324:
URL: https://github.com/apache/spark/pull/34324#issuecomment-1087105177

   Would you also mind taking a last look at this, @zero323?



[GitHub] [spark] itholic commented on pull request #34293: [SPARK-37014][PYTHON] Inline type hints for python/pyspark/streaming/context.py

2022-04-03 Thread GitBox



itholic commented on PR #34293:
URL: https://github.com/apache/spark/pull/34293#issuecomment-1087104988

   Seems fine to me.
   
   Would you mind taking a last look at this, @zero323?



[GitHub] [spark] itholic commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox



itholic commented on PR #36057:
URL: https://github.com/apache/spark/pull/36057#issuecomment-1087089407

   @HyukjinKwon sure, let me take a look



[GitHub] [spark] HyukjinKwon commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox



HyukjinKwon commented on PR #36057:
URL: https://github.com/apache/spark/pull/36057#issuecomment-1087087712

   cc @xinrong-databricks and @zero323 FYI.
   
   @itholic BTW, I remember we talked about warnings in the Sphinx build before. 
I think it should fail on these warnings, but I'm not sure why it doesn't. Would 
you mind taking a look at this one, and making the build fail if any warnings 
are detected?



[GitHub] [spark] HyukjinKwon opened a new pull request, #36057: [MINOR][DOCS] Remove PySpark doc build warnings

2022-04-03 Thread GitBox



HyukjinKwon opened a new pull request, #36057:
URL: https://github.com/apache/spark/pull/36057

   ### What changes were proposed in this pull request?
   
   This PR fixes various documentation build warnings in the PySpark documentation.
   
   ### Why are the changes needed?
   
   To render the docs better.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it changes the documentation to be prettier. Pretty minor though.
   
   ### How was this patch tested?
   
   I manually tested it by building the PySpark documentation



[GitHub] [spark] AngersZhuuuu opened a new pull request, #36056: [WIP][SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir

2022-04-03 Thread GitBox



AngersZhuuuu opened a new pull request, #36056:
URL: https://github.com/apache/spark/pull/36056

   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] huaxingao commented on pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite

2022-04-03 Thread GitBox



huaxingao commented on PR #36050:
URL: https://github.com/apache/spark/pull/36050#issuecomment-1087061975

   Thanks all!



[GitHub] [spark] wangyum commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox



wangyum commented on PR #36047:
URL: https://github.com/apache/spark/pull/36047#issuecomment-1087059583

   Merged to master and branch-3.3.



[GitHub] [spark] wangyum closed pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox



wangyum closed pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add 
ColumnPruning in injectBloomFilter
URL: https://github.com/apache/spark/pull/36047



[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox



lw33 commented on PR #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1087033057

   Yes, maybe we don't need to make this change. I found this problem when 
compacting the event log: the event log could be written to the path, but 
compaction failed, so I thought this might be a bug. Of course, I could change 
the path to avoid this problem. @dongjoon-hyun 



[GitHub] [spark] dongjoon-hyun closed pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite

2022-04-03 Thread GitBox



dongjoon-hyun closed pull request #36050: [SPARK-38779] [SQL][Tests] Unify the 
pushed operator checking between FileSource test suite and JDBCV2Suite
URL: https://github.com/apache/spark/pull/36050



[GitHub] [spark] viirya closed pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default

2022-04-03 Thread GitBox



viirya closed pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable 
`spark.sql.parquet.enableNestedColumnVectorizedReader` by default
URL: https://github.com/apache/spark/pull/36055



[GitHub] [spark] viirya commented on pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default

2022-04-03 Thread GitBox



viirya commented on PR #36055:
URL: https://github.com/apache/spark/pull/36055#issuecomment-1087021571

   Thanks. Merging to master/3.3.



[GitHub] [spark] zhengruifeng commented on pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox



zhengruifeng commented on PR #36048:
URL: https://github.com/apache/spark/pull/36048#issuecomment-1086995190

   @xinrong-databricks  Will add the tests and update the PR description, 
thanks!



[GitHub] [spark] dongjoon-hyun commented on pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox



dongjoon-hyun commented on PR #36049:
URL: https://github.com/apache/spark/pull/36049#issuecomment-1086992390

   Thank you so much!



[GitHub] [spark] zhengruifeng commented on pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox



zhengruifeng commented on PR #36049:
URL: https://github.com/apache/spark/pull/36049#issuecomment-1086990897

   @dongjoon-hyun OK, I will put this PR on hold since its target version is 3.4.



[GitHub] [spark] github-actions[bot] commented on pull request #33257: [SPARK-36039][K8S] Fix executor pod hadoop conf mount

2022-04-03 Thread GitBox



github-actions[bot] commented on PR #33257:
URL: https://github.com/apache/spark/pull/33257#issuecomment-1086983267

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!



[GitHub] [spark] github-actions[bot] closed pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down

2022-04-03 Thread GitBox



github-actions[bot] closed pull request #34629: [SPARK-37355][CORE]Avoid Block 
Manager registrations when Executor is shutting down
URL: https://github.com/apache/spark/pull/34629



[GitHub] [spark] github-actions[bot] closed pull request #34953: [SPARK-37682][SQL]Apply 'merged column' and 'bit vector' in RewriteDistinctAggregates

2022-04-03 Thread GitBox



github-actions[bot] closed pull request #34953: [SPARK-37682][SQL]Apply 'merged 
column' and 'bit vector' in RewriteDistinctAggregates
URL: https://github.com/apache/spark/pull/34953



[GitHub] [spark] github-actions[bot] commented on pull request #34990: [SPARK-37717][SQL] Improve logging in BroadcastExchangeExec

2022-04-03 Thread GitBox



github-actions[bot] commented on PR #34990:
URL: https://github.com/apache/spark/pull/34990#issuecomment-1086983247

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!



[GitHub] [spark] HyukjinKwon commented on pull request #36038: [WIP][SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-03 Thread GitBox



HyukjinKwon commented on PR #36038:
URL: https://github.com/apache/spark/pull/36038#issuecomment-1086981229

   Yeah, actually that's what I was going to point out. It would be better to 
create a separate PR to improve the documentation for both sides :-).



[GitHub] [spark] sunchao commented on pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2022-04-03 Thread GitBox



sunchao commented on PR #34659:
URL: https://github.com/apache/spark/pull/34659#issuecomment-1086971364

   Thanks all for the review!!! @viirya I just opened #36055 for the follow-up.



[GitHub] [spark] sunchao opened a new pull request, #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default

2022-04-03 Thread GitBox



sunchao opened a new pull request, #36055:
URL: https://github.com/apache/spark/pull/36055

   
   
   ### What changes were proposed in this pull request?
   
   
   This PR disables `spark.sql.parquet.enableNestedColumnVectorizedReader` by 
default.
   
   ### Why are the changes needed?
   
   
   In #34659 the config was turned on mainly for testing purposes. As the 
feature is new, we should turn it off by default.
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   The config `spark.sql.parquet.enableNestedColumnVectorizedReader` is turned 
off by default now.
   
   ### How was this patch tested?
   
   
   Existing tests.
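
With the default now off, anyone who wants the nested-column vectorized reader has to opt in explicitly. A minimal sketch of one way to do that from PySpark, assuming the standard `SparkSession.builder.config(key, value)` call; the helper names `nested_reader_conf` and `apply_conf` are hypothetical and only for illustration.

```python
NESTED_READER_KEY = "spark.sql.parquet.enableNestedColumnVectorizedReader"


def nested_reader_conf(enabled):
    """Return the config entry as the string value Spark expects."""
    return {NESTED_READER_KEY: "true" if enabled else "false"}


def apply_conf(builder, conf):
    """Apply each key/value pair to a builder exposing .config(key, value)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


if __name__ == "__main__":
    # Requires a local PySpark installation.
    from pyspark.sql import SparkSession

    spark = apply_conf(
        SparkSession.builder.master("local[1]"), nested_reader_conf(True)
    ).getOrCreate()
    print(spark.conf.get(NESTED_READER_KEY))
    spark.stop()
```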
   



[GitHub] [spark] HeartSaVioR commented on pull request #36038: [WIP][SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-03 Thread GitBox



HeartSaVioR commented on PR #36038:
URL: https://github.com/apache/spark/pull/36038#issuecomment-1086968356

   I see review comments about the doc which seem to be just copied from 
Scala/Java API doc.
   
   Since this PR focuses mainly on feature parity, how about simply allowing 
the doc to be copied from the Scala/Java API doc (with additional content if 
something only applies to PySpark), and having another PR fix both the 
Scala/Java and PySpark docs together after this has been merged?
   
   The doc for the Scala/Java API has been in place for years, and personally 
I'd consider comments on the doc as proposing changes to the "existing doc" 
rather than to a new doc, since a doc change here should trigger the same change 
in the Scala/Java doc to keep them in sync, effectively changing the existing doc.
   
   Does that make sense to everyone?



[GitHub] [spark] sunchao commented on pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

2022-04-03 Thread GitBox



sunchao commented on PR #35657:
URL: https://github.com/apache/spark/pull/35657#issuecomment-1086963569

   Thanks @dongjoon-hyun , updated.



[GitHub] [spark] huaxingao commented on pull request #36039: [SPARK-38761][SQL] DS V2 supports push down misc non-aggregate functions

2022-04-03 Thread GitBox



huaxingao commented on PR #36039:
URL: https://github.com/apache/spark/pull/36039#issuecomment-1086961617

   I have a general question: what are the criteria for functions that can 
be pushed down to the data source?



[GitHub] [spark] huaxingao commented on pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite

2022-04-03 Thread GitBox



huaxingao commented on PR #36050:
URL: https://github.com/apache/spark/pull/36050#issuecomment-1086961002

   @dongjoon-hyun I created SPARK-38779 for this. Thanks!



[GitHub] [spark] sigmod commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox



sigmod commented on PR #36047:
URL: https://github.com/apache/spark/pull/36047#issuecomment-1086954891

   LGTM. Can we merge it to branch-3.3 as well?



[GitHub] [spark] dongjoon-hyun closed pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox



dongjoon-hyun closed pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] 
Disable ANSI_ENABLED more for `Out of Range` failures
URL: https://github.com/apache/spark/pull/36054



[GitHub] [spark] dongjoon-hyun commented on pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox



dongjoon-hyun commented on PR #36054:
URL: https://github.com/apache/spark/pull/36054#issuecomment-1086943631

   Thank you, @srowen . This change touches only a single test suite, and I 
verified it in two ways. Merged to master/3.3.
   
   ```
   SPARK_ANSI_SQL_MODE=true build/sbt "mllib/testOnly *.ALSSuite"
   SPARK_ANSI_SQL_MODE=false build/sbt "mllib/testOnly *.ALSSuite"
   ```



[GitHub] [spark] dongjoon-hyun commented on pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox



dongjoon-hyun commented on PR #36054:
URL: https://github.com/apache/spark/pull/36054#issuecomment-1086940137

   cc @gengliangwang , @srowen , @yaooqinn



[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox



dongjoon-hyun commented on PR #36051:
URL: https://github.com/apache/spark/pull/36051#issuecomment-1086940092

   Here is the follow-up.
   - https://github.com/apache/spark/pull/36054



[GitHub] [spark] dongjoon-hyun opened a new pull request, #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures

2022-04-03 Thread GitBox



dongjoon-hyun opened a new pull request, #36054:
URL: https://github.com/apache/spark/pull/36054

   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox



dongjoon-hyun commented on PR #36051:
URL: https://github.com/apache/spark/pull/36051#issuecomment-1086939224

   Oops. I realized that more `OutOfRange` failures were hidden in the same test case behind the previous `Overflow` failure. I'll make a follow-up soon.



[GitHub] [spark] xinrong-databricks commented on pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox



xinrong-databricks commented on pull request #36048:
URL: https://github.com/apache/spark/pull/36048#issuecomment-1086927554


   Thanks @zhengruifeng!
   
   https://github.com/apache/spark/blob/master/python/pyspark/pandas/tests/test_series.py is a good place to add tests.
   
   It would be great to specify what changes in the **Does this PR introduce any user-facing change?** section of the PR description. An example is good enough.



[GitHub] [spark] xinrong-databricks commented on a change in pull request #36006: [SPARK-38686][PYTHON] Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`

2022-04-03 Thread GitBox



xinrong-databricks commented on a change in pull request #36006:
URL: https://github.com/apache/spark/pull/36006#discussion_r841262157



##
File path: python/pyspark/pandas/indexes/multi.py
##
@@ -893,6 +893,70 @@ def drop(self, codes: List[Any], level: Optional[Union[int, Name]] = None) -> "M
         )
         return cast(MultiIndex, DataFrame(internal).index)
 
+    def drop_duplicates(self, keep: Union[bool, str] = "first") -> "MultiIndex":
+        """

Review comment:
   Thanks!
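
For readers following the `keep` parameter in the signature quoted above, here is a minimal pure-Python sketch of the pandas-style semantics that `keep` usually carries (`"first"`, `"last"`, or `False`). The helper name `drop_duplicates_keep` is hypothetical and works on a plain sequence. It is an illustration only, not the PySpark implementation from this PR.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Union


def drop_duplicates_keep(
    values: Sequence[Hashable], keep: Union[bool, str] = "first"
) -> List[Hashable]:
    """Hypothetical helper: pandas-style duplicate removal on a plain sequence."""
    if keep == "first":
        # Keep the first occurrence of every value, preserving order.
        seen = set()
        out = []
        for v in values:
            if v not in seen:
                seen.add(v)
                out.append(v)
        return out
    if keep == "last":
        # Keep the last occurrence: apply "first" to the reversed sequence.
        return drop_duplicates_keep(list(reversed(values)), "first")[::-1]
    if keep is False:
        # Drop every value that occurs more than once.
        counts = Counter(values)
        return [v for v in values if counts[v] == 1]
    raise ValueError("keep must be 'first', 'last' or False")


codes = [("a", 1), ("b", 2), ("b", 2), ("c", 3), ("a", 1)]
print(drop_duplicates_keep(codes, "first"))
print(drop_duplicates_keep(codes, "last"))
print(drop_duplicates_keep(codes, False))
```

Tuples stand in for MultiIndex entries here, since each entry only needs to be hashable.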





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35991: [SPARK-38675][CORE] Fix race during unlock in BlockInfoManager

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35991:
URL: https://github.com/apache/spark/pull/35991#discussion_r841258213



##
File path: core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala
##
@@ -360,12 +360,17 @@ private[storage] class BlockInfoManager extends Logging {
 info.writerTask = BlockInfo.NO_WRITER
 writeLocksByTask.get(taskAttemptId).remove(blockId)

Review comment:
   For this, gentle ping @hvanhovell for his confirmation.





[GitHub] [spark] dongjoon-hyun commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1086919965


   Back to the original proposal, why do we need to support `illegal char`, @lw33 ? It's illegal, isn't it?



[GitHub] [spark] dongjoon-hyun commented on pull request #36033: [SPARK-38754][SQL][TEST][3.1] Using EquivalentExpressions getEquivalentExprs function instead of getExprState at SubexpressionEliminati

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #36033:
URL: https://github.com/apache/spark/pull/36033#issuecomment-1086918872


   Originally, `branch-3.1` was broken, but `branch-3.2` wasn't.
   Given that, the forward-port from 3.1 to 3.2 looks wrong to me.
   I'm going to revert this from `branch-3.2` and recover branch-3.2 compilation.
   If we need this in branch-3.2 for some other reasons, please make a follow-up PR to get a build result.
   Thanks!



[GitHub] [spark] dongjoon-hyun commented on pull request #36033: [SPARK-38754][SQL][TEST][3.1] Using EquivalentExpressions getEquivalentExprs function instead of getExprState at SubexpressionEliminati

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #36033:
URL: https://github.com/apache/spark/pull/36033#issuecomment-1086917353


   Hi, @cloud-fan .
   
   This seems to break branch-3.2 compilation.
   ```
   [error] /home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala:404:26: value getEquivalentExprs is not a member of org.apache.spark.sql.catalyst.expressions.EquivalentExpressions
   [error]   assert(equivalence.getEquivalentExprs(expression).size == 0)
   [error]  ^
   [error] one error found
   ```



[GitHub] [spark] dongjoon-hyun commented on pull request #35886: [SPARK-38582][K8S] Introduce buildEnvVars and buildEnvVarsWithFieldRef for KubernetesUtils to eliminate duplicate code pattern

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #35886:
URL: https://github.com/apache/spark/pull/35886#issuecomment-1086916633


   FYI, we had better hold off on this kind of PR during the planned release process. The same applies to the other refactoring PRs.
   - https://github.com/apache/spark/pull/36049#pullrequestreview-929704033



[GitHub] [spark] yaooqinn commented on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox



yaooqinn commented on pull request #35765:
URL: https://github.com/apache/spark/pull/35765#issuecomment-1086914484


   thanks @dongjoon-hyun and all



[GitHub] [spark] awdavidson commented on a change in pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox



awdavidson commented on a change in pull request #36048:
URL: https://github.com/apache/spark/pull/36048#discussion_r841254014



##
File path: python/pyspark/pandas/series.py
##
@@ -2937,6 +2937,73 @@ def add_suffix(self, suffix: str) -> "Series":
             DataFrame(internal.with_new_sdf(sdf, index_fields=([None] * internal.index_level)))
         )
 
+    def autocorr(self, periods: int = 1) -> float:
+        """
+        Compute the lag-N autocorrelation.
+
+        This method computes the Pearson correlation between
+        the Series and its shifted self.
+
+        Parameters
+        ----------
+        periods : int, default 1
+            Number of lags to apply before performing autocorrelation.
+
+        Returns
+        -------
+        float
+            The Pearson correlation between self and self.shift(lag).
+
+        See Also
+        --------
+        Series.corr : Compute the correlation between two Series.
+        Series.shift : Shift index by desired number of periods.
+        DataFrame.corr : Compute pairwise correlation of columns.
+
+        Notes
+        -----
+        If the Pearson correlation is not well defined return 'NaN'.
+
+        Examples
+        --------
+        >>> s = ps.Series([.2, .0, .6, .2, np.nan, .5, .6])
+        >>> s.autocorr()  # doctest: +ELLIPSIS
+        -0.141219...
+        >>> s.autocorr(0)  # doctest: +ELLIPSIS
+        1.0...
+        >>> s.autocorr(2)  # doctest: +ELLIPSIS
+        0.970725...
+        >>> s.autocorr(-3)  # doctest: +ELLIPSIS
+        0.277350...
+        >>> s.autocorr(5)  # doctest: +ELLIPSIS
+        -1.00...
+        >>> s.autocorr(6)  # doctest: +ELLIPSIS
+        nan
+
+        If the Pearson correlation is not well defined, then 'NaN' is returned.
+
+        >>> s = ps.Series([1, 0, 0, 0])
+        >>> s.autocorr()
+        nan
+        """
+        # This implementation is suboptimal because it moves all data to a single partition,
+        # global sort should be used instead of window, but it should be a start
+        if not isinstance(periods, int):
+            raise TypeError("periods should be an int; however, got [%s]" % type(periods).__name__)
+
+        scol = self.spark.column.alias("__tmp_col__")
+        if periods == 0:
+            lag_col = scol.alias("__tmp_lag_col__")
+        else:
+            window = Window.orderBy(NATURAL_ORDER_COLUMN_NAME)
+            lag_col = F.lag(scol, periods).over(window).alias("__tmp_lag_col__")
+
+        return (
+            self._internal.spark_frame.select([scol, lag_col])
+            .dropna("any")
+            .corr("__tmp_col__", "__tmp_lag_col__")

Review comment:
   Nit: should we define the column names in variables which are reused 
throughout the method?
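
To make the computation in the quoted diff easier to follow, here is a minimal pure-Python sketch of lag-N autocorrelation as the docstring describes it: shift the series, drop pairs where either side is missing, then take the Pearson correlation. The function names `shift` and `autocorr` are local to this sketch. It assumes a small in-memory list and is not the Spark window-based implementation under review.

```python
import math
from typing import List, Sequence


def shift(values: Sequence[float], periods: int) -> List[float]:
    """Shift values by `periods` positions, padding the gap with NaN."""
    n = len(values)
    nan = float("nan")
    if periods >= 0:
        return [nan] * min(periods, n) + list(values[: max(n - periods, 0)])
    return list(values[min(-periods, n):]) + [nan] * min(-periods, n)


def autocorr(values: Sequence[float], periods: int = 1) -> float:
    """Pearson correlation between `values` and its shifted self."""
    if not isinstance(periods, int):
        raise TypeError(
            "periods should be an int; however, got [%s]" % type(periods).__name__
        )
    lagged = shift(values, periods)
    # Equivalent of dropna("any"): keep only pairs with both sides present.
    pairs = [
        (x, y)
        for x, y in zip(values, lagged)
        if not (math.isnan(x) or math.isnan(y))
    ]
    n = len(pairs)
    if n < 2:
        return float("nan")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        # Correlation is not well defined when one side is constant.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


s = [.2, .0, .6, .2, float("nan"), .5, .6]
for lag in (1, 0, 2, -3, 5, 6):
    print(lag, autocorr(s, lag))
```

On the example series this should reproduce the doctest values quoted in the diff to the precision shown there.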





[GitHub] [spark] dongjoon-hyun closed pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox



dongjoon-hyun closed pull request #35765:
URL: https://github.com/apache/spark/pull/35765


   



[GitHub] [spark] dongjoon-hyun closed pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox



dongjoon-hyun closed pull request #36051:
URL: https://github.com/apache/spark/pull/36051


   



[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #36051:
URL: https://github.com/apache/spark/pull/36051#issuecomment-1086909995


   Thank you, @gengliangwang , @srowen , @yaooqinn . Merged to master/3.3.



[GitHub] [spark] yaooqinn opened a new pull request #36053: [SPARK-38778][INFRA][BUILD] Replace http with https for project url in pom

2022-04-03 Thread GitBox



yaooqinn opened a new pull request #36053:
URL: https://github.com/apache/spark/pull/36053


   
   
   
   
   ### What changes were proposed in this pull request?
   
   
   Change http://spark.apache.org/ to https://spark.apache.org/ in the project URL of all pom files.
   
   ### Why are the changes needed?
   
   
   Fix the home page shown in Maven Central: https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.2.1
   
   
   From
   
   License | Apache 2.0
   -- | --
   Categories | Hadoop Query Engines
   HomePage | http://spark.apache.org/
   Date | (Jan 26, 2022)
   
   to
   
   License | Apache 2.0
   -- | --
   Categories | Hadoop Query Engines
   HomePage | https://spark.apache.org/
   Date | (Jan 26, 2022)
   
   ### Does this PR introduce _any_ user-facing change?
   
   no
   
   
   ### How was this patch tested?
   
   
   pass GHA
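
As a rough illustration of the change described above, the sketch below rewrites the project URL inside a POM's `<url>` element from http to https. The helper name `upgrade_project_url` and the sample POM fragment are hypothetical. The actual PR edits the pom files directly, so treat this only as a sketch of the substitution.

```python
OLD_URL = "http://spark.apache.org/"
NEW_URL = "https://spark.apache.org/"


def upgrade_project_url(pom_text: str) -> str:
    """Hypothetical helper: switch the project <url> from http to https."""
    # Match the whole element so other http:// links in the file stay untouched.
    return pom_text.replace(
        "<url>" + OLD_URL + "</url>", "<url>" + NEW_URL + "</url>"
    )


# Made-up POM fragment, used only to exercise the helper.
sample = (
    "<project>\n"
    "  <name>Spark Project Parent POM</name>\n"
    "  <url>http://spark.apache.org/</url>\n"
    "</project>\n"
)
print(upgrade_project_url(sample))
```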



[GitHub] [spark] AmplabJenkins commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter

2022-04-03 Thread GitBox



AmplabJenkins commented on pull request #36047:
URL: https://github.com/apache/spark/pull/36047#issuecomment-1086901189


   Can one of the admins verify this patch?



[GitHub] [spark] yaooqinn opened a new pull request #36052: [SPARK-38777][YARN] Add `bin/spark-submit --kill / --status` support for yarn

2022-04-03 Thread GitBox



yaooqinn opened a new pull request #36052:
URL: https://github.com/apache/spark/pull/36052


   
   
   ### What changes were proposed in this pull request?
   
   
   In this PR, we extend `bin/spark-submit` to support the `--kill` / `--status` CLI options for the YARN cluster manager. These options are already supported by standalone, Kubernetes, and Mesos.
   
   
   ### Why are the changes needed?
   
   
   improve `bin/spark-submit` to make it consistent across cluster managers
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   Yes, as described above.
   
   ### How was this patch tested?
   
   
   new unit test added



[GitHub] [spark] srowen commented on a change in pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox



srowen commented on a change in pull request #36049:
URL: https://github.com/apache/spark/pull/36049#discussion_r841241230



##
File path: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala
##
@@ -138,4 +140,61 @@ private[spark] object DatasetUtils {
       case Row(point: Vector) => OldVectors.fromML(point)
     }
   }
+
+  /**
+   * Get the number of classes.  This looks in column metadata first, and if that is missing,
+   * then this assumes classes are indexed 0,1,...,numClasses-1 and computes numClasses
+   * by finding the maximum label value.
+   *
+   * Label validation (ensuring all labels are integers >= 0) needs to be handled elsewhere,
+   * such as in `extractLabeledPoints()`.
+   *
+   * @param dataset  Dataset which contains a column [[labelCol]]
+   * @param maxNumClasses  Maximum number of classes allowed when inferred from data.  If numClasses
+   *                       is specified in the metadata, then maxNumClasses is ignored.
+   * @return  number of classes
+   * @throws IllegalArgumentException  if metadata does not specify numClasses, and the
+   *                                   actual numClasses exceeds maxNumClasses
+   */
+  private[ml] def getNumClasses(
+      dataset: Dataset[_],
+      labelCol: String,
+      maxNumClasses: Int = 100): Int = {
+    MetadataUtils.getNumClasses(dataset.schema(labelCol)) match {
+      case Some(n: Int) => n
+      case None =>
+        // Get number of classes from dataset itself.
+        val maxLabelRow: Array[Row] = dataset
+          .select(max(checkClassificationLabels(labelCol, Some(maxNumClasses))))
+          .take(1)
+        if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
+          throw new SparkException("ML algorithm was given empty dataset.")
+        }
+        val maxDoubleLabel: Double = maxLabelRow.head.getDouble(0)
+        require((maxDoubleLabel + 1).isValidInt, s"Classifier found max label value =" +
+          s" $maxDoubleLabel but requires integers in range [0, ... ${Int.MaxValue})")
+        val numClasses = maxDoubleLabel.toInt + 1
+        require(numClasses <= maxNumClasses, s"Classifier inferred $numClasses from label values" +
+          s" in column $labelCol, but this exceeded the max numClasses ($maxNumClasses) allowed" +
+          s" to be inferred from values.  To avoid this error for labels with > $maxNumClasses" +
+          s" classes, specify numClasses explicitly in the metadata; this can be done by applying" +
+          s" StringIndexer to the label column.")
+        logInfo(this.getClass.getCanonicalName + s" inferred $numClasses classes for" +
+          s" labelCol=$labelCol since numClasses was not specified in the column metadata.")
+        numClasses
+    }
+  }
+
+  /**
+   * Obtain the number of features in a vector column.
+   * If no metadata is available, extract it from the dataset.
+   */
+  private[ml] def getNumFeatures(dataset: Dataset[_], vectorCol: String): Int = {
+    MetadataUtils.getNumFeatures(dataset.schema(vectorCol)) match {

Review comment:
   getOrElse works here too but doesn't matter
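
The "metadata first, otherwise infer from the data" shape of `getNumClasses` in the quoted diff can be sketched in plain Python as below. This assumes an in-memory list of labels instead of a Dataset. The function name `get_num_classes` and its parameters are hypothetical, and `ValueError` stands in for the Scala exceptions, so this is an illustration of the logic and not the MLlib code.

```python
from typing import Iterable, Optional

INT_MAX = 2**31 - 1  # counterpart of Scala's Int.MaxValue


def get_num_classes(
    labels: Iterable[float],
    metadata_num_classes: Optional[int] = None,
    max_num_classes: int = 100,
) -> int:
    """Hypothetical sketch: use metadata if present, else infer from max label."""
    # Metadata wins, and max_num_classes is ignored in that case.
    if metadata_num_classes is not None:
        return metadata_num_classes

    values = list(labels)
    if not values:
        raise ValueError("ML algorithm was given empty dataset.")

    max_label = float(max(values))
    # Mirrors the isValidInt check: max_label + 1 must be a whole number in Int range.
    if not (max_label.is_integer() and 0 <= max_label + 1 <= INT_MAX):
        raise ValueError(
            "Classifier found max label value = %s but requires integers" % max_label
        )

    num_classes = int(max_label) + 1
    if num_classes > max_num_classes:
        raise ValueError(
            "Classifier inferred %d classes, which exceeds the max numClasses (%d)"
            % (num_classes, max_num_classes)
        )
    return num_classes


print(get_num_classes([0.0, 1.0, 2.0]))
print(get_num_classes([0.0, 1.0], metadata_num_classes=5))
```

The early return on `metadata_num_classes` is the Python counterpart of the `match` on the `Option` in the diff, and it is also the place where a `getOrElse`-style fallback would apply.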





[GitHub] [spark] gengliangwang commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`

2022-04-03 Thread GitBox



gengliangwang commented on pull request #36051:
URL: https://github.com/apache/spark/pull/36051#issuecomment-1086883386


   @dongjoon-hyun thanks for fixing it!



[GitHub] [spark] dcoliversun commented on pull request #36044: [SPARK-38770][K8S] Remove `renameMainAppResource` from `baseDriverContainer`

2022-04-03 Thread GitBox



dcoliversun commented on pull request #36044:
URL: https://github.com/apache/spark/pull/36044#issuecomment-1086847636


    Thanks for your help @dongjoon-hyun @martin-g 



[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox



lw33 commented on pull request #35979:
URL: https://github.com/apache/spark/pull/35979#issuecomment-1086831394


   > Sorry guys. Supporting `illegal char` by removing `toURI` doesn't look like a safe improvement to me.
   > 
   > Given the trade-off between benefit and risk, we had better recommend to avoid this kind of **illegal char** usage instead, @lw33 .
   
   Why do we need `toURI` here? I see that `logpath` in the parent class `SingleEventLogFileWriter` is `Path.toString`.
   



[GitHub] [spark] sarutak commented on pull request #35443: [MINOR][CORE] Change the log level to WARN for the message which is shown in case users attemp to add a JAR twice

2022-04-03 Thread GitBox



sarutak commented on pull request #35443:
URL: https://github.com/apache/spark/pull/35443#issuecomment-1086831154


   @dongjoon-hyun Thank you!



[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



AngersZhuuuu commented on pull request #35799:
URL: https://github.com/apache/spark/pull/35799#issuecomment-1086820418


   @dongjoon-hyun The build failed, but the failure seems unrelated to this PR.



[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



AngersZhuuuu commented on pull request #35799:
URL: https://github.com/apache/spark/pull/35799#issuecomment-1086800839


   > I meant this, [#35799 (comment)](https://github.com/apache/spark/pull/35799#discussion_r841166511) . :)
   > 
   > > Didn't got your point about revise the code for Apache Spark 3.4.
   
   OK, I have changed it.



[GitHub] [spark] peter-toth commented on pull request #35382: [SPARK-28090][SQL] Improve `replaceAliasButKeepName` performance

2022-04-03 Thread GitBox



peter-toth commented on pull request #35382:
URL: https://github.com/apache/spark/pull/35382#issuecomment-1086800126


   Thanks @cloud-fan, @dongjoon-hyun for the review!



[GitHub] [spark] kianelbo closed pull request #35977: [SPARK-38660][PYTHON] PySpark DeprecationWarning: distutils Version classes are deprecated

2022-04-03 Thread GitBox



kianelbo closed pull request #35977:
URL: https://github.com/apache/spark/pull/35977


   



[GitHub] [spark] dongjoon-hyun commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #35799:
URL: https://github.com/apache/spark/pull/35799#issuecomment-1086799205


   I meant this, https://github.com/apache/spark/pull/35799#discussion_r841166511 . :)
   > Didn't got your point about revise the code for Apache Spark 3.4.



[GitHub] [spark] dongjoon-hyun commented on pull request #34970: [DO NOT MERGE] investigate test failures if we test ANSI mode in github actions

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #34970:
URL: https://github.com/apache/spark/pull/34970#issuecomment-1086798884


   Shall we close this if all tests are completed, @gengliangwang ?



[GitHub] [spark] dongjoon-hyun commented on pull request #35290: [SPARK-37865][SQL][3.0]Fix union bug when the first child of union has duplicate columns

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #35290:
URL: https://github.com/apache/spark/pull/35290#issuecomment-1086798303


   Hi, @chasingegg . Thank you for making a PR. However, Apache Spark 3.0.0 was released on June 18, 2020. According to the [Apache Spark Versioning Policy](https://spark.apache.org/versioning-policy.html), `branch-3.0` is no longer considered maintained because more than 18 months have already passed. No more 3.0.x releases should be expected, even for bug fixes.
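
The date arithmetic behind this comment can be checked with a short sketch. The release date and the 18-month window come from the comment itself, and the comparison date is the date of this thread. The helper `add_months` is a hypothetical convenience that assumes the day of month exists in the target month.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Hypothetical helper: add calendar months, keeping the day of month."""
    total = d.year * 12 + (d.month - 1) + months
    return d.replace(year=total // 12, month=total % 12 + 1)


release_3_0_0 = date(2020, 6, 18)       # release date stated in the comment
maintenance_end = add_months(release_3_0_0, 18)  # 18-month window from the comment
thread_date = date(2022, 4, 3)          # date of this mailing-list thread

print(maintenance_end)
print(thread_date > maintenance_end)
```

Under these assumptions the window ends in December 2021, a few months before the date of the comment.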



[GitHub] [spark] dongjoon-hyun closed pull request #35443: [MINOR][CORE] Change the log level to WARN for the message which is shown in case users attemp to add a JAR twice

2022-04-03 Thread GitBox



dongjoon-hyun closed pull request #35443:
URL: https://github.com/apache/spark/pull/35443


   



[GitHub] [spark] dongjoon-hyun closed pull request #35382: [SPARK-28090][SQL] Improve `replaceAliasButKeepName` performance

2022-04-03 Thread GitBox



dongjoon-hyun closed pull request #35382:
URL: https://github.com/apache/spark/pull/35382


   



[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



AngersZhuuuu commented on pull request #35799:
URL: https://github.com/apache/spark/pull/35799#issuecomment-1086796545


   > In general, +1 for the requirement and idea, @AngersZhuuuu . Shall we 
revise the code for Apache Spark 3.4?
   
   I didn't get your point about `revise the code for Apache Spark 3.4`.
   



[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



AngersZhuuuu commented on a change in pull request #35799:
URL: https://github.com/apache/spark/pull/35799#discussion_r841178986



##
File path: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala
##
@@ -386,3 +400,14 @@ class StreamingContextStoppingCollector(val ssc: 
StreamingContext) extends Strea
 }
   }
 }
+
+class CustomizedStreamingListener extends StreamingListener {

Review comment:
   Done

##
File path: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala
##
@@ -235,6 +236,19 @@ class StreamingListenerSuite extends TestSuiteBase with 
LocalStreamingContext wi
 verifyNoMoreInteractions(streamingListener)
   }
 
+  test("SPARK-38498: Support customized streaming listener") {

Review comment:
   Done

##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
##
@@ -616,6 +616,27 @@ class StreamingContext private[streaming] (
 }
   }
 
+  /**
+   * Registers streaming listeners specified in spark.streaming.extraListeners.
+   */
+  private def setupExtraStreamingListener(): Unit = {

Review comment:
   Done

##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala
##
@@ -185,4 +185,10 @@ object StreamingConf {
   .longConf
   .createWithDefault(0)
 
+  private[streaming] val STREAMING_EXTRA_LISTENERS = 
ConfigBuilder("spark.streaming.extraListeners")
+.doc("Class names of streaming listeners to add to StreamingContext during 
initialization.")
+.version("3.3.0")

Review comment:
   Done

##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala
##
@@ -185,4 +185,10 @@ object StreamingConf {
   .longConf
   .createWithDefault(0)
 
+  private[streaming] val STREAMING_EXTRA_LISTENERS = 
ConfigBuilder("spark.streaming.extraListeners")

Review comment:
   Done





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #36011:
URL: https://github.com/apache/spark/pull/36011#discussion_r841176375



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
##
@@ -294,4 +295,23 @@ class SparkSessionExtensions {
   def injectTableFunction(functionDescription: TableFunctionDescription): Unit 
= {
 injectedTableFunctions += functionDescription
   }
+
+  private[this] val runtimeOptimizerRules = mutable.Buffer.empty[RuleBuilder]
+
+  private[sql] def buildRuntimeOptimizerRules(session: SparkSession): 
Seq[Rule[LogicalPlan]] = {
+runtimeOptimizerRules.map(_.apply(session)).toSeq
+  }
+
+  /**
+   * Inject a runtime `Rule` builder into the [[SparkSession]].
+   * The injected rules will be executed after built-in
+   * [[org.apache.spark.sql.execution.adaptive.AQEOptimizer]] rules are 
applied.
+   * A runtime optimizer rule is used to improve the quality of a logical plan 
during execution
+   * which can leverage accurate statistics from shuffle.
+   *
+   * Note that, it does not work if adaptive query execution is disabled.
+   */
+  def injectRuntimeOptimizerRule(builder: RuleBuilder): Unit = {
+runtimeOptimizerRules += builder
+  }

Review comment:
   Shall we move these new additions somewhere around the existing 
`queryStagePrepRuleBuilders` and `buildQueryStagePrepRules`?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #36011:
URL: https://github.com/apache/spark/pull/36011#discussion_r841174227



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala
##
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * A holder to warp the SQL extension rules of adaptive query execution
+ */
+class AdaptiveRulesHolder(

Review comment:
   +1 for this addition.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #36011:
URL: https://github.com/apache/spark/pull/36011#discussion_r841174107



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala
##
@@ -83,7 +83,7 @@ case class AdaptiveSparkPlanExec(
   @transient private val planChangeLogger = new PlanChangeLogger[SparkPlan]()
 
   // The logical plan optimizer for re-optimizing the current logical plan.
-  @transient private val optimizer = new AQEOptimizer(conf)
+  @transient private val optimizer = new AQEOptimizer(context.session)

Review comment:
   ```scala
   - @transient private val optimizer = new AQEOptimizer(context.session)
   + @transient private val optimizer = new AQEOptimizer(conf,
 session.sessionState.adaptiveRulesHolder.runtimeOptimizerRules)
   ```






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #36011:
URL: https://github.com/apache/spark/pull/36011#discussion_r841173889



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -28,7 +29,9 @@ import org.apache.spark.util.Utils
 /**
  * The optimizer for re-optimizing the logical plan used by 
AdaptiveSparkPlanExec.
  */
-class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
+class AQEOptimizer(session: SparkSession) extends RuleExecutor[LogicalPlan] {

Review comment:
   `SparkSession` seems like overkill. Shall we narrow it down and pass only 
`session.sessionState.adaptiveRulesHolder.runtimeOptimizerRules` (in 
addition to the existing `conf`)?
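
To illustrate the narrowing suggested above, here is a minimal, self-contained sketch. `Conf`, `PlanRule`, and `NarrowOptimizer` are hypothetical stand-ins, not Spark's `SQLConf`, `Rule[LogicalPlan]`, or `AQEOptimizer`; the only point is that the class receives the configuration and the rule list instead of a whole session object.

```scala
object NarrowDependencySketch {
  // Hypothetical stand-ins for illustration only; these are not Spark classes.
  final case class Conf(extraRulesEnabled: Boolean)
  trait PlanRule { def apply(plan: String): String }

  // The optimizer asks only for what it uses: a conf and the injected rules.
  class NarrowOptimizer(conf: Conf, extraRules: Seq[PlanRule]) {
    // Applies the injected rules in order when enabled; otherwise returns the plan unchanged.
    def execute(plan: String): String =
      if (conf.extraRulesEnabled) extraRules.foldLeft(plan)((p, rule) => rule(p))
      else plan
  }

  // Builds an optimizer with two toy rules and runs it on a toy "plan".
  def demo(enabled: Boolean): String = {
    val upper = new PlanRule { def apply(plan: String): String = plan.toUpperCase }
    val wrap = new PlanRule { def apply(plan: String): String = s"optimized($plan)" }
    new NarrowOptimizer(Conf(extraRulesEnabled = enabled), Seq(upper, wrap)).execute("scan")
  }
}
```

With a constructor like this, a test can build the optimizer from a plain rule list without creating a session, which is the practical benefit of passing the narrower dependency.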





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #36011:
URL: https://github.com/apache/spark/pull/36011#discussion_r841173889



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala
##
@@ -28,7 +29,9 @@ import org.apache.spark.util.Utils
 /**
  * The optimizer for re-optimizing the logical plan used by 
AdaptiveSparkPlanExec.
  */
-class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
+class AQEOptimizer(session: SparkSession) extends RuleExecutor[LogicalPlan] {

Review comment:
   `SparkSession` seems like overkill. Shall we narrow it down and pass only 
`session.sessionState.adaptiveRulesHolder` (in addition to the existing 
`conf`)?





[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in 'ALS validate input dataset' test case

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #36051:
URL: https://github.com/apache/spark/pull/36051#issuecomment-1086791099


   cc @gengliangwang 



[GitHub] [spark] dongjoon-hyun opened a new pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in 'ALS validate input dataset' test case

2022-04-03 Thread GitBox



dongjoon-hyun opened a new pull request #36051:
URL: https://github.com/apache/spark/pull/36051


   …
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox



dongjoon-hyun edited a comment on pull request #35765:
URL: https://github.com/apache/spark/pull/35765#issuecomment-1086786301


   cc @MaxGekk since he is a release manager.



[GitHub] [spark] dongjoon-hyun commented on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #35765:
URL: https://github.com/apache/spark/pull/35765#issuecomment-1086786301


   cc @MaxGekk since he is a release manager.
   
   Although this is still unclear to me, this could be a regression in the 
Apache Spark 3.3.0 release.



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35984: [MINOR] Log AnalysisException output for debug and tracing

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35984:
URL: https://github.com/apache/spark/pull/35984#discussion_r841167501



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 try {
   innerResolve(expr, isTopLevel = true)
 } catch {
-  case _: AnalysisException if !throws => expr
+  case ae: AnalysisException if !throws =>
+logWarning(ae.message)

Review comment:
   If you want to suggest this feature for `debug and tracing` as you 
mentioned in the PR title, this should be `logDebug` or `logTrace`, 
@monkeyboy123 .
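
As a rough illustration of why the level matters, here is a minimal sketch. `RecordingLogger`, `Level`, and `resolveOrKeep` are hypothetical names invented for this example; it does not use Spark's `Logging` trait or `AnalysisException`, and `IllegalArgumentException` stands in for the analysis failure. A message logged at debug level is dropped under a typical `Info` threshold, whereas a warning would reach every user's logs.

```scala
import scala.collection.mutable.ArrayBuffer

object LogLevelSketch {
  // Hypothetical log levels, ordered from most to least verbose.
  object Level extends Enumeration { val Trace, Debug, Info, Warn = Value }

  // Hypothetical logger that records only messages at or above its threshold.
  class RecordingLogger(threshold: Level.Value) {
    val lines: ArrayBuffer[String] = ArrayBuffer.empty[String]
    private def log(level: Level.Value, msg: => String): Unit =
      if (level >= threshold) lines += msg
    def logDebug(msg: => String): Unit = log(Level.Debug, msg)
    def logWarning(msg: => String): Unit = log(Level.Warn, msg)
  }

  // Mirrors the shape of the catch block in the diff: when resolution fails and
  // `throws` is false, keep the original expression and log at debug level.
  def resolveOrKeep(expr: String, throws: Boolean, logger: RecordingLogger)(
      resolve: String => String): String =
    try resolve(expr)
    catch {
      case e: IllegalArgumentException if !throws =>
        logger.logDebug(e.getMessage)
        expr
    }
}
```

Under this sketch, a logger created with `Level.Info` stays silent for the swallowed exception, while one created with `Level.Debug` records it, which matches the "debug and tracing" intent of the PR title.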





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35984: [MINOR] Log AnalysisException output for debug and tracing

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35984:
URL: https://github.com/apache/spark/pull/35984#discussion_r841167501



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: 
CatalogManager)
 try {
   innerResolve(expr, isTopLevel = true)
 } catch {
-  case _: AnalysisException if !throws => expr
+  case ae: AnalysisException if !throws =>
+logWarning(ae.message)

Review comment:
   If you want to suggest this feature for `debug and tracing`, this should 
be `logDebug` or `logTrace`, @monkeyboy123 .





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35799:
URL: https://github.com/apache/spark/pull/35799#discussion_r841167066



##
File path: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala
##
@@ -386,3 +400,14 @@ class StreamingContextStoppingCollector(val ssc: 
StreamingContext) extends Strea
 }
   }
 }
+
+class CustomizedStreamingListener extends StreamingListener {

Review comment:
   Shall we use a different word instead of `Customized`?





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35799:
URL: https://github.com/apache/spark/pull/35799#discussion_r841166959



##
File path: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala
##
@@ -235,6 +236,19 @@ class StreamingListenerSuite extends TestSuiteBase with 
LocalStreamingContext wi
 verifyNoMoreInteractions(streamingListener)
   }
 
+  test("SPARK-38498: Support customized streaming listener") {

Review comment:
   `customized` -> `extra` because the conf name is 
`STREAMING_EXTRA_LISTENERS`.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35799:
URL: https://github.com/apache/spark/pull/35799#discussion_r841166793



##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
##
@@ -616,6 +616,27 @@ class StreamingContext private[streaming] (
 }
   }
 
+  /**
+   * Registers streaming listeners specified in spark.streaming.extraListeners.
+   */
+  private def setupExtraStreamingListener(): Unit = {

Review comment:
   I guess `registerXXX` is more seamless than `setupXXX` because 
   - we have `registerProgressListener` already
   - function description also says `Registers`.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35799:
URL: https://github.com/apache/spark/pull/35799#discussion_r841166511



##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala
##
@@ -185,4 +185,10 @@ object StreamingConf {
   .longConf
   .createWithDefault(0)
 
+  private[streaming] val STREAMING_EXTRA_LISTENERS = 
ConfigBuilder("spark.streaming.extraListeners")
+.doc("Class names of streaming listeners to add to StreamingContext during 
initialization.")
+.version("3.3.0")

Review comment:
   This should be the version of `master` branch which is currently `3.4.0`.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-04-03 Thread GitBox



dongjoon-hyun commented on a change in pull request #35799:
URL: https://github.com/apache/spark/pull/35799#discussion_r841166490



##
File path: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala
##
@@ -185,4 +185,10 @@ object StreamingConf {
   .longConf
   .createWithDefault(0)
 
+  private[streaming] val STREAMING_EXTRA_LISTENERS = 
ConfigBuilder("spark.streaming.extraListeners")

Review comment:
   If you don't mind, shall we match the code pattern of the other entries, 
like line 183?





[GitHub] [spark] dongjoon-hyun commented on pull request #35881: [SPARK-36664][CORE] Log time waiting for cluster resources

2022-04-03 Thread GitBox



dongjoon-hyun commented on pull request #35881:
URL: https://github.com/apache/spark/pull/35881#issuecomment-1086784648


   Thank you, @holdenk .



[GitHub] [spark] JacobZheng0927 removed a comment on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.

2022-04-03 Thread GitBox



JacobZheng0927 removed a comment on pull request #29965:
URL: https://github.com/apache/spark/pull/29965#issuecomment-1085680797


   I'm wondering if this change will cause a driver memory overflow, as 
duplicate SQLMetric may take up a lot of memory. @leanken-zz 


