[GitHub] [spark] itholic commented on pull request #36058: [SPARK-38780][PYTHON][DOCS] PySpark docs build should fail when there is warning.
itholic commented on PR #36058: URL: https://github.com/apache/spark/pull/36058#issuecomment-1087114454 Let me re-trigger the build after rebasing on master once https://github.com/apache/spark/pull/36057 is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark] itholic commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings
itholic commented on PR #36057: URL: https://github.com/apache/spark/pull/36057#issuecomment-1087113553 Just opened a PR at https://github.com/apache/spark/pull/36058 to make warnings fail the build. Let me cherry-pick this fix into that PR after this one is merged.
[GitHub] [spark] itholic opened a new pull request, #36058: [SPARK-38780][PYTHON][DOCS] PySpark docs build should fail when there is warning.
itholic opened a new pull request, #36058: URL: https://github.com/apache/spark/pull/36058

### What changes were proposed in this pull request?
This PR proposes to add the "-W" option when building the PySpark documentation via Sphinx.

### Why are the changes needed?
To make the documentation build fail when the documentation violates Sphinx warning rules.

### Does this PR introduce _any_ user-facing change?
This would make the docs a bit prettier.

### How was this patch tested?
The existing build & tests should pass.
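For reference, Sphinx's `-W` flag turns warnings into build errors. The invocation below is a hypothetical sketch of the behavior the PR enables (the actual PySpark build wires the flag through its own Makefile, which is not shown here):

```shell
# -W: treat warnings as errors, so the build exits non-zero on the first warning.
# --keep-going: with -W, still report all remaining warnings before failing.
sphinx-build -W --keep-going -b html source/ build/html
```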
[GitHub] [spark] itholic commented on pull request #34324: [SPARK-37015][PYTHON] Inline type hints for python/pyspark/streaming/dstream.py
itholic commented on PR #34324: URL: https://github.com/apache/spark/pull/34324#issuecomment-1087105177 Would you also mind taking a last look at this, @zero323?
[GitHub] [spark] itholic commented on pull request #34293: [SPARK-37014][PYTHON] Inline type hints for python/pyspark/streaming/context.py
itholic commented on PR #34293: URL: https://github.com/apache/spark/pull/34293#issuecomment-1087104988 Seems fine to me. Would you mind taking a last look at this, @zero323?
[GitHub] [spark] itholic commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings
itholic commented on PR #36057: URL: https://github.com/apache/spark/pull/36057#issuecomment-1087089407 @HyukjinKwon sure, let me take a look
[GitHub] [spark] HyukjinKwon commented on pull request #36057: [MINOR][DOCS] Remove PySpark doc build warnings
HyukjinKwon commented on PR #36057: URL: https://github.com/apache/spark/pull/36057#issuecomment-1087087712 cc @xinrong-databricks and @zero323 FYI. @itholic BTW, I remember we talked about warnings in the Sphinx build before. I think the build should fail on these warnings, but I'm not sure why it doesn't. Would you mind taking a look at this one and making the build fail when warnings are detected?
[GitHub] [spark] HyukjinKwon opened a new pull request, #36057: [MINOR][DOCS] Remove PySpark doc build warnings
HyukjinKwon opened a new pull request, #36057: URL: https://github.com/apache/spark/pull/36057

### What changes were proposed in this pull request?
This PR fixes various documentation build warnings in the PySpark documentation.

### Why are the changes needed?
To render the docs better.

### Does this PR introduce _any_ user-facing change?
Yes, it changes the documentation to be prettier. Pretty minor though.

### How was this patch tested?
I manually tested it by building the PySpark documentation.
[GitHub] [spark] AngersZhuuuu opened a new pull request, #36056: [WIP][SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir
AngersZh opened a new pull request, #36056: URL: https://github.com/apache/spark/pull/36056 …

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] huaxingao commented on pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite
huaxingao commented on PR #36050: URL: https://github.com/apache/spark/pull/36050#issuecomment-1087061975 Thanks all!
[GitHub] [spark] wangyum commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter
wangyum commented on PR #36047: URL: https://github.com/apache/spark/pull/36047#issuecomment-1087059583 Merged to master and branch-3.3.
[GitHub] [spark] wangyum closed pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter
wangyum closed pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter URL: https://github.com/apache/spark/pull/36047
[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path
lw33 commented on PR #35979: URL: https://github.com/apache/spark/pull/35979#issuecomment-1087033057 Yes, maybe we don't need this change. I found this problem when compacting the event log: the event log could be written to the path, but compaction failed, so I thought this might be a bug. Of course, I could change the path to avoid the problem. @dongjoon-hyun
[GitHub] [spark] dongjoon-hyun closed pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite
dongjoon-hyun closed pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite URL: https://github.com/apache/spark/pull/36050
[GitHub] [spark] viirya closed pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default
viirya closed pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default URL: https://github.com/apache/spark/pull/36055
[GitHub] [spark] viirya commented on pull request #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default
viirya commented on PR #36055: URL: https://github.com/apache/spark/pull/36055#issuecomment-1087021571 Thanks. Merging to master/3.3.
[GitHub] [spark] zhengruifeng commented on pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr
zhengruifeng commented on PR #36048: URL: https://github.com/apache/spark/pull/36048#issuecomment-1086995190 @xinrong-databricks Will add the tests and update the PR description, thanks!
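For background, pandas' `Series.autocorr` is the Pearson correlation of a series with a lagged copy of itself (lag 1 by default). A minimal pure-Python sketch of that semantics follows; the `autocorr` helper here is illustrative only, not the PR's implementation:

```python
from math import sqrt

def autocorr(values, lag=1):
    """Pearson correlation between values[:-lag] and values[lag:]."""
    x, y = values[:-lag], values[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear series is fully lag-1 autocorrelated:
print(round(autocorr([1.0, 2.0, 3.0, 4.0, 5.0]), 6))  # → 1.0
```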
[GitHub] [spark] dongjoon-hyun commented on pull request #36049: [SPARK-38775][ML] cleanup validation functions
dongjoon-hyun commented on PR #36049: URL: https://github.com/apache/spark/pull/36049#issuecomment-1086992390 Thank you so much!
[GitHub] [spark] zhengruifeng commented on pull request #36049: [SPARK-38775][ML] cleanup validation functions
zhengruifeng commented on PR #36049: URL: https://github.com/apache/spark/pull/36049#issuecomment-1086990897 @dongjoon-hyun OK, I will put this PR on hold since its target version is 3.4.
[GitHub] [spark] github-actions[bot] commented on pull request #33257: [SPARK-36039][K8S] Fix executor pod hadoop conf mount
github-actions[bot] commented on PR #33257: URL: https://github.com/apache/spark/pull/33257#issuecomment-1086983267 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] github-actions[bot] closed pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down
github-actions[bot] closed pull request #34629: [SPARK-37355][CORE]Avoid Block Manager registrations when Executor is shutting down URL: https://github.com/apache/spark/pull/34629
[GitHub] [spark] github-actions[bot] closed pull request #34953: [SPARK-37682][SQL]Apply 'merged column' and 'bit vector' in RewriteDistinctAggregates
github-actions[bot] closed pull request #34953: [SPARK-37682][SQL]Apply 'merged column' and 'bit vector' in RewriteDistinctAggregates URL: https://github.com/apache/spark/pull/34953
[GitHub] [spark] github-actions[bot] commented on pull request #34990: [SPARK-37717][SQL] Improve logging in BroadcastExchangeExec
github-actions[bot] commented on PR #34990: URL: https://github.com/apache/spark/pull/34990#issuecomment-1086983247 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] HyukjinKwon commented on pull request #36038: [WIP][SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark
HyukjinKwon commented on PR #36038: URL: https://github.com/apache/spark/pull/36038#issuecomment-1086981229 Yeah, actually that's what I was going to point out. It would be better to create a separate PR to improve the documentation on both sides :-).
[GitHub] [spark] sunchao commented on pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
sunchao commented on PR #34659: URL: https://github.com/apache/spark/pull/34659#issuecomment-1086971364 Thanks all for the review!!! @viirya I just opened #36055 for the follow-up.
[GitHub] [spark] sunchao opened a new pull request, #36055: [SPARK-34863][SQL][FOLLOWUP] Disable `spark.sql.parquet.enableNestedColumnVectorizedReader` by default
sunchao opened a new pull request, #36055: URL: https://github.com/apache/spark/pull/36055

### What changes were proposed in this pull request?
This PR disables `spark.sql.parquet.enableNestedColumnVectorizedReader` by default.

### Why are the changes needed?
In #34659 the config was turned on mainly for testing reasons. As the feature is new, we should turn it off by default.

### Does this PR introduce _any_ user-facing change?
The config `spark.sql.parquet.enableNestedColumnVectorizedReader` is turned off by default now.

### How was this patch tested?
Existing tests.
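For anyone who still wants the nested-column vectorized reader after this change, the config can be re-enabled per session. A sketch, assuming a local Spark install with this feature available:

```shell
# Re-enable the reader when launching a spark-sql session:
spark-sql --conf spark.sql.parquet.enableNestedColumnVectorizedReader=true

# ...or toggle it at runtime from any SQL session:
#   SET spark.sql.parquet.enableNestedColumnVectorizedReader=true;
```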
[GitHub] [spark] HeartSaVioR commented on pull request #36038: [WIP][SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark
HeartSaVioR commented on PR #36038: URL: https://github.com/apache/spark/pull/36038#issuecomment-1086968356 I see review comments about the doc, which seems to be just copied from the Scala/Java API doc. Since this PR focuses mainly on feature parity, how about simply allowing the doc to be copy-pasted from the Scala/Java API doc (with additional content where something only applies to PySpark), and having another PR fix both the Scala/Java and PySpark docs together after this is merged? The Scala/Java API doc has been served for years, and personally I'd consider comments on the doc here as proposing changes to the "existing doc" rather than to a new doc, since a doc change here should trigger a matching change to the Scala/Java doc to keep them in sync, effectively changing the existing doc. Does that make sense to everyone?
[GitHub] [spark] sunchao commented on pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join
sunchao commented on PR #35657: URL: https://github.com/apache/spark/pull/35657#issuecomment-1086963569 Thanks @dongjoon-hyun , updated.
[GitHub] [spark] huaxingao commented on pull request #36039: [SPARK-38761][SQL] DS V2 supports push down misc non-aggregate functions
huaxingao commented on PR #36039: URL: https://github.com/apache/spark/pull/36039#issuecomment-1086961617 I have a general question: what are the criteria for functions that can be pushed down to the data source?
[GitHub] [spark] huaxingao commented on pull request #36050: [SPARK-38779] [SQL][Tests] Unify the pushed operator checking between FileSource test suite and JDBCV2Suite
huaxingao commented on PR #36050: URL: https://github.com/apache/spark/pull/36050#issuecomment-1086961002 @dongjoon-hyun I created SPARK-38779 for this. Thanks!
[GitHub] [spark] sigmod commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter
sigmod commented on PR #36047: URL: https://github.com/apache/spark/pull/36047#issuecomment-1086954891 LGTM. Can we merge it to branch-3.3 as well?
[GitHub] [spark] dongjoon-hyun closed pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures
dongjoon-hyun closed pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures URL: https://github.com/apache/spark/pull/36054
[GitHub] [spark] dongjoon-hyun commented on pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures
dongjoon-hyun commented on PR #36054: URL: https://github.com/apache/spark/pull/36054#issuecomment-1086943631 Thank you, @srowen . This change touches a single test suite only, and I verified it in two ways. Merged to master/3.3.

```
SPARK_ANSI_SQL_MODE=true build/sbt "mllib/testOnly *.ALSSuite"
SPARK_ANSI_SQL_MODE=false build/sbt "mllib/testOnly *.ALSSuite"
```
[GitHub] [spark] dongjoon-hyun commented on pull request #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures
dongjoon-hyun commented on PR #36054: URL: https://github.com/apache/spark/pull/36054#issuecomment-1086940137 cc @gengliangwang , @srowen , @yaooqinn
[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`
dongjoon-hyun commented on PR #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086940092 Here is the follow-up:
- https://github.com/apache/spark/pull/36054
[GitHub] [spark] dongjoon-hyun opened a new pull request, #36054: [SPARK-38776][MLLIB][TESTS][FOLLOWUP] Disable ANSI_ENABLED more for `Out of Range` failures
dongjoon-hyun opened a new pull request, #36054: URL: https://github.com/apache/spark/pull/36054 …

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`
dongjoon-hyun commented on PR #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086939224 Oops. I realized that more `OutOfRange` failures were hidden in the same test case behind the previous `Overflow` failure. I'll make a follow-up soon.
[GitHub] [spark] xinrong-databricks commented on pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr
xinrong-databricks commented on PR #36048: URL: https://github.com/apache/spark/pull/36048#issuecomment-1086927554 Thanks @zhengruifeng! https://github.com/apache/spark/blob/master/python/pyspark/pandas/tests/test_series.py is a good place to add tests. It would be great to specify what changes in the **Does this PR introduce _any_ user-facing change?** section of the PR description. An example is good enough.
[GitHub] [spark] xinrong-databricks commented on a change in pull request #36006: [SPARK-38686][PYTHON] Implement `keep` parameter of `(Index/MultiIndex).drop_duplicates`
xinrong-databricks commented on a change in pull request #36006: URL: https://github.com/apache/spark/pull/36006#discussion_r841262157 ## File path: python/pyspark/pandas/indexes/multi.py ## @@ -893,6 +893,70 @@ def drop(self, codes: List[Any], level: Optional[Union[int, Name]] = None) -> "M ) return cast(MultiIndex, DataFrame(internal).index) +def drop_duplicates(self, keep: Union[bool, str] = "first") -> "MultiIndex": +""" Review comment: Thanks!
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35991: [SPARK-38675][CORE] Fix race during unlock in BlockInfoManager
dongjoon-hyun commented on a change in pull request #35991: URL: https://github.com/apache/spark/pull/35991#discussion_r841258213 ## File path: core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala ## @@ -360,12 +360,17 @@ private[storage] class BlockInfoManager extends Logging { info.writerTask = BlockInfo.NO_WRITER writeLocksByTask.get(taskAttemptId).remove(blockId) Review comment: For this, gentle ping @hvanhovell for his confirmation.
[GitHub] [spark] dongjoon-hyun commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path
dongjoon-hyun commented on pull request #35979: URL: https://github.com/apache/spark/pull/35979#issuecomment-1086919965 Back to the original proposal, why do we need to support `illegal char`, @lw33? It's illegal, isn't it?
[GitHub] [spark] dongjoon-hyun commented on pull request #36033: [SPARK-38754][SQL][TEST][3.1] Using EquivalentExpressions getEquivalentExprs function instead of getExprState at SubexpressionEliminati
dongjoon-hyun commented on pull request #36033: URL: https://github.com/apache/spark/pull/36033#issuecomment-1086918872 Originally, `branch-3.1` was broken, but `branch-3.2` wasn't. Given that, the forward-port from 3.1 to 3.2 looks wrong to me. I'm going to revert this from `branch-3.2` and recover branch-3.2 compilation. If we need this in branch-3.2 for some other reasons, please make a follow-up PR to get a build result. Thanks!
[GitHub] [spark] dongjoon-hyun commented on pull request #36033: [SPARK-38754][SQL][TEST][3.1] Using EquivalentExpressions getEquivalentExprs function instead of getExprState at SubexpressionEliminati
dongjoon-hyun commented on pull request #36033: URL: https://github.com/apache/spark/pull/36033#issuecomment-1086917353 Hi, @cloud-fan . This seems to break branch-3.2 compilation. ``` [error] /home/runner/work/spark/spark/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/SubexpressionEliminationSuite.scala:404:26: value getEquivalentExprs is not a member of org.apache.spark.sql.catalyst.expressions.EquivalentExpressions [error] assert(equivalence.getEquivalentExprs(expression).size == 0) [error] ^ [error] one error found ```
[GitHub] [spark] dongjoon-hyun commented on pull request #35886: [SPARK-38582][K8S] Introduce buildEnvVars and buildEnvVarsWithFieldRef for KubernetesUtils to eliminate duplicate code pattern
dongjoon-hyun commented on pull request #35886: URL: https://github.com/apache/spark/pull/35886#issuecomment-1086916633 FYI, we had better hold on these kinds of PRs during the planned release process. It's the same for the other refactoring PRs. - https://github.com/apache/spark/pull/36049#pullrequestreview-929704033
[GitHub] [spark] yaooqinn commented on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
yaooqinn commented on pull request #35765: URL: https://github.com/apache/spark/pull/35765#issuecomment-1086914484 thanks @dongjoon-hyun and all
[GitHub] [spark] awdavidson commented on a change in pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr
awdavidson commented on a change in pull request #36048: URL: https://github.com/apache/spark/pull/36048#discussion_r841254014

## File path: python/pyspark/pandas/series.py

## @@ -2937,6 +2937,73 @@ def add_suffix(self, suffix: str) -> "Series":
         DataFrame(internal.with_new_sdf(sdf, index_fields=([None] * internal.index_level)))
     )
+
+    def autocorr(self, periods: int = 1) -> float:
+        """
+        Compute the lag-N autocorrelation.
+
+        This method computes the Pearson correlation between
+        the Series and its shifted self.
+
+        Parameters
+        ----------
+        periods : int, default 1
+            Number of lags to apply before performing autocorrelation.
+
+        Returns
+        -------
+        float
+            The Pearson correlation between self and self.shift(lag).
+
+        See Also
+        --------
+        Series.corr : Compute the correlation between two Series.
+        Series.shift : Shift index by desired number of periods.
+        DataFrame.corr : Compute pairwise correlation of columns.
+
+        Notes
+        -----
+        If the Pearson correlation is not well defined return 'NaN'.
+
+        Examples
+        --------
+        >>> s = ps.Series([.2, .0, .6, .2, np.nan, .5, .6])
+        >>> s.autocorr() # doctest: +ELLIPSIS
+        -0.141219...
+        >>> s.autocorr(0) # doctest: +ELLIPSIS
+        1.0...
+        >>> s.autocorr(2) # doctest: +ELLIPSIS
+        0.970725...
+        >>> s.autocorr(-3) # doctest: +ELLIPSIS
+        0.277350...
+        >>> s.autocorr(5) # doctest: +ELLIPSIS
+        -1.00...
+        >>> s.autocorr(6) # doctest: +ELLIPSIS
+        nan
+
+        If the Pearson correlation is not well defined, then 'NaN' is returned.
+
+        >>> s = ps.Series([1, 0, 0, 0])
+        >>> s.autocorr()
+        nan
+        """
+        # This implementation is suboptimal because it moves all data to a single partition,
+        # global sort should be used instead of window, but it should be a start
+        if not isinstance(periods, int):
+            raise TypeError("periods should be an int; however, got [%s]" % type(periods).__name__)
+
+        scol = self.spark.column.alias("__tmp_col__")
+        if periods == 0:
+            lag_col = scol.alias("__tmp_lag_col__")
+        else:
+            window = Window.orderBy(NATURAL_ORDER_COLUMN_NAME)
+            lag_col = F.lag(scol, periods).over(window).alias("__tmp_lag_col__")
+
+        return (
+            self._internal.spark_frame.select([scol, lag_col])
+            .dropna("any")
+            .corr("__tmp_col__", "__tmp_lag_col__")

Review comment: Nit: should we define the column names in variables which are reused throughout the method?
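The method under review computes lag-N autocorrelation as the Pearson correlation between a column and its lagged copy, dropping any row where either side is NaN. As a minimal plain-Python sketch of the same semantics (a hypothetical helper of my own, not the pyspark implementation — no Spark window functions here):

```python
import math

def lag_autocorr(values, periods=1):
    # Pair each element with the one `periods` positions earlier
    # (mirroring F.lag over the natural order), drop pairs containing
    # NaN (the .dropna("any") step), then take the Pearson correlation.
    if not isinstance(periods, int):
        raise TypeError(
            "periods should be an int; however, got [%s]" % type(periods).__name__)
    n = len(values)
    pairs = [(values[i], values[i - periods])
             for i in range(n) if 0 <= i - periods < n]
    pairs = [(a, b) for a, b in pairs
             if not (math.isnan(a) or math.isnan(b))]
    if len(pairs) < 2:
        return float("nan")
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # Pearson correlation not well defined
    return cov / (sx * sy)
```

A strictly increasing sequence gives a lag-1 autocorrelation of 1.0, and a constant shifted column (as in the `ps.Series([1, 0, 0, 0])` doctest above) yields NaN because the correlation is not well defined.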
[GitHub] [spark] dongjoon-hyun closed pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
dongjoon-hyun closed pull request #35765: URL: https://github.com/apache/spark/pull/35765
[GitHub] [spark] dongjoon-hyun closed pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`
dongjoon-hyun closed pull request #36051: URL: https://github.com/apache/spark/pull/36051
[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`
dongjoon-hyun commented on pull request #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086909995 Thank you, @gengliangwang , @srowen , @yaooqinn . Merged to master/3.3.
[GitHub] [spark] yaooqinn opened a new pull request #36053: [SPARK-38778][INFRA][BUILD] Replace http with https for project url in pom
yaooqinn opened a new pull request #36053: URL: https://github.com/apache/spark/pull/36053

### What changes were proposed in this pull request?
Change http://spark.apache.org/ to https://spark.apache.org/ in the project URL of all pom files.

### Why are the changes needed?
Fix the home page shown on Maven Central (https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.13/3.2.1).

From:
License: Apache 2.0
Categories: Hadoop Query Engines
HomePage: http://spark.apache.org/
Date: (Jan 26, 2022)

to:
License: Apache 2.0
Categories: Hadoop Query Engines
HomePage: https://spark.apache.org/
Date: (Jan 26, 2022)

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
pass GHA
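Concretely, the per-pom edit is a single scheme change in the project `<url>` element. An illustrative fragment (not the exact Spark pom, which carries many more elements):

```xml
<!-- Before: Maven Central renders this element as the artifact's HomePage -->
<url>http://spark.apache.org/</url>

<!-- After: serve the project URL over TLS -->
<url>https://spark.apache.org/</url>
```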
[GitHub] [spark] AmplabJenkins commented on pull request #36047: [SPARK-32268][SQL][FOLLOWUP] Add ColumnPruning in injectBloomFilter
AmplabJenkins commented on pull request #36047: URL: https://github.com/apache/spark/pull/36047#issuecomment-1086901189 Can one of the admins verify this patch?
[GitHub] [spark] yaooqinn opened a new pull request #36052: [SPARK-38777][YARN] Add `bin/spark-submit --kill / --status` support for yarn
yaooqinn opened a new pull request #36052: URL: https://github.com/apache/spark/pull/36052

### What changes were proposed in this pull request?
In this PR, we extend `bin/spark-submit` to support the `--kill` / `--status` CLI options for the YARN cluster manager, which are already supported by the standalone, Kubernetes, and Mesos cluster managers.

### Why are the changes needed?
Improve `bin/spark-submit` to make it consistent across cluster managers.

### Does this PR introduce _any_ user-facing change?
Yes, as described above.

### How was this patch tested?
New unit test added.
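For context, the existing standalone/Mesos form of these flags looks like the sketch below; the PR would extend the same interface to `--master yarn`. The application ID is a placeholder, and the exact YARN-side argument shape is an assumption on my part, not taken from the PR:

```shell
# Query the status of a submitted application (ID is a placeholder)
./bin/spark-submit --master yarn --status application_1648000000000_0001

# Ask the cluster manager to kill it
./bin/spark-submit --master yarn --kill application_1648000000000_0001
```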
[GitHub] [spark] srowen commented on a change in pull request #36049: [SPARK-38775][ML] cleanup validation functions
srowen commented on a change in pull request #36049: URL: https://github.com/apache/spark/pull/36049#discussion_r841241230

## File path: mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala

## @@ -138,4 +140,61 @@ private[spark] object DatasetUtils {
     case Row(point: Vector) => OldVectors.fromML(point)
   }
 }
+
+  /**
+   * Get the number of classes. This looks in column metadata first, and if that is missing,
+   * then this assumes classes are indexed 0,1,...,numClasses-1 and computes numClasses
+   * by finding the maximum label value.
+   *
+   * Label validation (ensuring all labels are integers >= 0) needs to be handled elsewhere,
+   * such as in `extractLabeledPoints()`.
+   *
+   * @param dataset Dataset which contains a column [[labelCol]]
+   * @param maxNumClasses Maximum number of classes allowed when inferred from data. If numClasses
+   *                      is specified in the metadata, then maxNumClasses is ignored.
+   * @return number of classes
+   * @throws IllegalArgumentException if metadata does not specify numClasses, and the
+   *                                  actual numClasses exceeds maxNumClasses
+   */
+  private[ml] def getNumClasses(
+      dataset: Dataset[_],
+      labelCol: String,
+      maxNumClasses: Int = 100): Int = {
+    MetadataUtils.getNumClasses(dataset.schema(labelCol)) match {
+      case Some(n: Int) => n
+      case None =>
+        // Get number of classes from dataset itself.
+        val maxLabelRow: Array[Row] = dataset
+          .select(max(checkClassificationLabels(labelCol, Some(maxNumClasses))))
+          .take(1)
+        if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
+          throw new SparkException("ML algorithm was given empty dataset.")
+        }
+        val maxDoubleLabel: Double = maxLabelRow.head.getDouble(0)
+        require((maxDoubleLabel + 1).isValidInt, s"Classifier found max label value =" +
+          s" $maxDoubleLabel but requires integers in range [0, ... ${Int.MaxValue})")
+        val numClasses = maxDoubleLabel.toInt + 1
+        require(numClasses <= maxNumClasses, s"Classifier inferred $numClasses from label values" +
+          s" in column $labelCol, but this exceeded the max numClasses ($maxNumClasses) allowed" +
+          s" to be inferred from values. To avoid this error for labels with > $maxNumClasses" +
+          s" classes, specify numClasses explicitly in the metadata; this can be done by applying" +
+          s" StringIndexer to the label column.")
+        logInfo(this.getClass.getCanonicalName + s" inferred $numClasses classes for" +
+          s" labelCol=$labelCol since numClasses was not specified in the column metadata.")
+        numClasses
+    }
+  }
+
+  /**
+   * Obtain the number of features in a vector column.
+   * If no metadata is available, extract it from the dataset.
+   */
+  private[ml] def getNumFeatures(dataset: Dataset[_], vectorCol: String): Int = {
+    MetadataUtils.getNumFeatures(dataset.schema(vectorCol)) match {

Review comment: getOrElse works here too but doesn't matter
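The metadata-free branch of the diff above boils down to "numClasses = max label + 1, with validation". A hypothetical plain-Python rendering of that logic (names and error messages are mine, not Spark's):

```python
def infer_num_classes(labels, max_num_classes=100):
    # Analogue of DatasetUtils.getNumClasses when column metadata is
    # missing: assume labels are indexed 0..k-1 and infer k from the
    # maximum observed label value.
    if not labels:
        raise ValueError("ML algorithm was given empty dataset.")
    max_label = max(labels)
    if max_label != int(max_label) or max_label < 0:
        raise ValueError(
            "Classifier requires integer labels in [0, ...), got %r" % max_label)
    num_classes = int(max_label) + 1
    if num_classes > max_num_classes:
        # Guard against silently inferring a huge class count from bad data;
        # real callers should put numClasses in the column metadata instead.
        raise ValueError(
            "Classifier inferred %d classes, exceeding the allowed max (%d)."
            % (num_classes, max_num_classes))
    return num_classes
```

Note the off-by-one convention: labels `{0.0, 1.0, 2.0}` yield three classes, because classes are assumed to be indexed from zero.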
[GitHub] [spark] gengliangwang commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in `ALSSuite`
gengliangwang commented on pull request #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086883386 @dongjoon-hyun thanks for fixing it!
[GitHub] [spark] dcoliversun commented on pull request #36044: [SPARK-38770][K8S] Remove `renameMainAppResource` from `baseDriverContainer`
dcoliversun commented on pull request #36044: URL: https://github.com/apache/spark/pull/36044#issuecomment-1086847636 Thanks for your help @dongjoon-hyun @martin-g
[GitHub] [spark] lw33 commented on pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path
lw33 commented on pull request #35979: URL: https://github.com/apache/spark/pull/35979#issuecomment-1086831394

> Sorry guys. Supporting `illegal char` by removing `toURI` doesn't look like a safe improvement to me.
>
> Given the trade-off between benefit and risk, we had better recommend to avoid this kind of **illegal char** usage instead, @lw33 .

Why do we need `toURI` here? I see that in the parent class `SingleEventLogFileWriter`, `logpath` is `Path.toString`.
[GitHub] [spark] sarutak commented on pull request #35443: [MINOR][CORE] Change the log level to WARN for the message which is shown in case users attemp to add a JAR twice
sarutak commented on pull request #35443: URL: https://github.com/apache/spark/pull/35443#issuecomment-1086831154 @dongjoon-hyun Thank you!
[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
AngersZhuuuu commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086820418 @dongjoon-hyun Build failed, but it seems unrelated to this PR.
[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
AngersZhuuuu commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086800839

> I meant this, [#35799 (comment)](https://github.com/apache/spark/pull/35799#discussion_r841166511) . :)
>
> > Didn't get your point about revise the code for Apache Spark 3.4.

Ok, have changed.
[GitHub] [spark] peter-toth commented on pull request #35382: [SPARK-28090][SQL] Improve `replaceAliasButKeepName` performance
peter-toth commented on pull request #35382: URL: https://github.com/apache/spark/pull/35382#issuecomment-1086800126 Thanks @cloud-fan, @dongjoon-hyun for the review!
[GitHub] [spark] kianelbo closed pull request #35977: [SPARK-38660][PYTHON] PySpark DeprecationWarning: distutils Version classes are deprecated
kianelbo closed pull request #35977: URL: https://github.com/apache/spark/pull/35977
[GitHub] [spark] dongjoon-hyun commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
dongjoon-hyun commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086799205 I meant this, https://github.com/apache/spark/pull/35799#discussion_r841166511 . :) > Didn't got your point about revise the code for Apache Spark 3.4.
[GitHub] [spark] dongjoon-hyun commented on pull request #34970: [DO NOT MERGE] investigate test failures if we test ANSI mode in github actions
dongjoon-hyun commented on pull request #34970: URL: https://github.com/apache/spark/pull/34970#issuecomment-1086798884 Shall we close this if all tests are completed, @gengliangwang ?
[GitHub] [spark] dongjoon-hyun commented on pull request #35290: [SPARK-37865][SQL][3.0]Fix union bug when the first child of union has duplicate columns
dongjoon-hyun commented on pull request #35290: URL: https://github.com/apache/spark/pull/35290#issuecomment-1086798303 Hi, @chasingegg . Thank you for making a PR. However, Apache Spark 3.0.0 was released on June 18, 2020. According to [Apache Spark Versioning Policy](https://spark.apache.org/versioning-policy.html), `branch-3.0` is no longer considered maintained because it passed 18 months already. No more 3.0.x releases should be expected, even for bug fixes.
[GitHub] [spark] dongjoon-hyun closed pull request #35443: [MINOR][CORE] Change the log level to WARN for the message which is shown in case users attemp to add a JAR twice
dongjoon-hyun closed pull request #35443: URL: https://github.com/apache/spark/pull/35443
[GitHub] [spark] dongjoon-hyun closed pull request #35382: [SPARK-28090][SQL] Improve `replaceAliasButKeepName` performance
dongjoon-hyun closed pull request #35382: URL: https://github.com/apache/spark/pull/35382
[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
AngersZhuuuu commented on pull request #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1086796545

> In general, +1 for the requirement and idea, @AngersZhuuuu . Shall we revise the code for Apache Spark 3.4?

Didn't get your point about `revise the code for Apache Spark 3.4`.
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
AngersZhuuuu commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841178986

## File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala
## @@ -386,3 +400,14 @@ class StreamingContextStoppingCollector(val ssc: StreamingContext) extends Strea } } }
+class CustomizedStreamingListener extends StreamingListener {
Review comment: Done

## File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala
## @@ -235,6 +236,19 @@ class StreamingListenerSuite extends TestSuiteBase with LocalStreamingContext wi verifyNoMoreInteractions(streamingListener) }
+ test("SPARK-38498: Support customized streaming listener") {
Review comment: Done

## File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
## @@ -616,6 +616,27 @@ class StreamingContext private[streaming] ( } }
+ /**
+ * Registers streaming listeners specified in spark.streaming.extraListeners.
+ */
+ private def setupExtraStreamingListener(): Unit = {
Review comment: Done

## File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala
## @@ -185,4 +185,10 @@ object StreamingConf { .longConf .createWithDefault(0)
+ private[streaming] val STREAMING_EXTRA_LISTENERS = ConfigBuilder("spark.streaming.extraListeners")
+ .doc("Class names of streaming listeners to add to StreamingContext during initialization.")
+ .version("3.3.0")
Review comment: Done

## File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala
## @@ -185,4 +185,10 @@ object StreamingConf { .longConf .createWithDefault(0)
+ private[streaming] val STREAMING_EXTRA_LISTENERS = ConfigBuilder("spark.streaming.extraListeners")
Review comment: Done
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841176375 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala ## @@ -294,4 +295,23 @@ class SparkSessionExtensions { def injectTableFunction(functionDescription: TableFunctionDescription): Unit = { injectedTableFunctions += functionDescription } + + private[this] val runtimeOptimizerRules = mutable.Buffer.empty[RuleBuilder] + + private[sql] def buildRuntimeOptimizerRules(session: SparkSession): Seq[Rule[LogicalPlan]] = { +runtimeOptimizerRules.map(_.apply(session)).toSeq + } + + /** + * Inject a runtime `Rule` builder into the [[SparkSession]]. + * The injected rules will be executed after built-in + * [[org.apache.spark.sql.execution.adaptive.AQEOptimizer]] rules are applied. + * A runtime optimizer rule is used to improve the quality of a logical plan during execution + * which can leverage accurate statistics from shuffle. + * + * Note that, it does not work if adaptive query execution is disabled. + */ + def injectRuntimeOptimizerRule(builder: RuleBuilder): Unit = { +runtimeOptimizerRules += builder + } Review comment: Shall we move these new additions somewhere around the existing `queryStagePrepRuleBuilders` and `buildQueryStagePrepRules`?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841174227

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala

```scala
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.SparkPlan
+
+/**
+ * A holder to warp the SQL extension rules of adaptive query execution
+ */
+class AdaptiveRulesHolder(
```

Review comment: +1 for this addition.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841174107

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala

```scala
@@ -83,7 +83,7 @@ case class AdaptiveSparkPlanExec(
   @transient private val planChangeLogger = new PlanChangeLogger[SparkPlan]()

   // The logical plan optimizer for re-optimizing the current logical plan.
-  @transient private val optimizer = new AQEOptimizer(conf)
+  @transient private val optimizer = new AQEOptimizer(context.session)
```

Review comment:

```scala
-  @transient private val optimizer = new AQEOptimizer(context.session)
+  @transient private val optimizer = new AQEOptimizer(conf, session.sessionState.adaptiveRulesHolder.runtimeOptimizerRules)
```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841173889

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala

```scala
@@ -28,7 +29,9 @@ import org.apache.spark.util.Utils

 /**
  * The optimizer for re-optimizing the logical plan used by AdaptiveSparkPlanExec.
  */
-class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
+class AQEOptimizer(session: SparkSession) extends RuleExecutor[LogicalPlan] {
```

Review comment: `SparkSession` seems like overkill. Shall we narrow down and pass over `session.sessionState.adaptiveRulesHolder.runtimeOptimizerRules` only (in addition to the existing `conf`)?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer
dongjoon-hyun commented on a change in pull request #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r841173889

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AQEOptimizer.scala

```scala
@@ -28,7 +29,9 @@ import org.apache.spark.util.Utils

 /**
  * The optimizer for re-optimizing the logical plan used by AdaptiveSparkPlanExec.
  */
-class AQEOptimizer(conf: SQLConf) extends RuleExecutor[LogicalPlan] {
+class AQEOptimizer(session: SparkSession) extends RuleExecutor[LogicalPlan] {
```

Review comment: `SparkSession` seems like overkill. Shall we narrow down and pass over `session.sessionState.adaptiveRulesHolder` only (in addition to the existing `conf`)?
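The narrowing suggested above would keep `AQEOptimizer` decoupled from the full `SparkSession`. A sketch of what the reviewer's suggestion implies, not the merged code:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor}
import org.apache.spark.sql.internal.SQLConf

// Constructor takes only what the optimizer actually needs: the conf plus the
// injected runtime rules, instead of a whole SparkSession.
class AQEOptimizer(
    conf: SQLConf,
    extendedRuntimeOptimizerRules: Seq[Rule[LogicalPlan]])
  extends RuleExecutor[LogicalPlan] {

  // The injected rules are appended after the built-in batches, so extension
  // rules see the plan only after Spark's own AQE re-optimization.
  override protected def batches: Seq[Batch] = Seq(
    Batch("User Provided Runtime Optimizers", FixedPoint(conf.optimizerMaxIterations),
      extendedRuntimeOptimizerRules: _*))
}
```

Passing a narrow dependency also makes the class easier to unit-test, since a `SQLConf` and a `Seq` of rules can be constructed without a running session.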
[GitHub] [spark] dongjoon-hyun commented on pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in 'ALS validate input dataset' test case
dongjoon-hyun commented on pull request #36051: URL: https://github.com/apache/spark/pull/36051#issuecomment-1086791099 cc @gengliangwang
[GitHub] [spark] dongjoon-hyun opened a new pull request #36051: [SPARK-38776][MLLIB][TESTS] Disable ANSI_ENABLED explicitly in 'ALS validate input dataset' test case
dongjoon-hyun opened a new pull request #36051: URL: https://github.com/apache/spark/pull/36051 …

### What changes were proposed in this pull request?

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?
[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
dongjoon-hyun edited a comment on pull request #35765: URL: https://github.com/apache/spark/pull/35765#issuecomment-1086786301 cc @MaxGekk since he is a release manager.
[GitHub] [spark] dongjoon-hyun commented on pull request #35765: [SPARK-38446][Core] Fix deadlock between ExecutorClassLoader and FileDownloadCallback caused by Log4j
dongjoon-hyun commented on pull request #35765: URL: https://github.com/apache/spark/pull/35765#issuecomment-1086786301 cc @MaxGekk since he is a release manager. Although this is still unclear to me, this could be a regression in the Apache Spark 3.3.0 release.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35984: [MINOR] Log AnalysisException output for debug and tracing
dongjoon-hyun commented on a change in pull request #35984: URL: https://github.com/apache/spark/pull/35984#discussion_r841167501

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```scala
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: CatalogManager)
     try {
       innerResolve(expr, isTopLevel = true)
     } catch {
-      case _: AnalysisException if !throws => expr
+      case ae: AnalysisException if !throws =>
+        logWarning(ae.message)
```

Review comment: If you want to suggest this feature for `debug and tracing` as you mentioned in the PR title, this should be `logDebug` or `logTrace`, @monkeyboy123.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35984: [MINOR] Log AnalysisException output for debug and tracing
dongjoon-hyun commented on a change in pull request #35984: URL: https://github.com/apache/spark/pull/35984#discussion_r841167501

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala

```scala
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: CatalogManager)
     try {
       innerResolve(expr, isTopLevel = true)
     } catch {
-      case _: AnalysisException if !throws => expr
+      case ae: AnalysisException if !throws =>
+        logWarning(ae.message)
```

Review comment: If you want to suggest this feature for `debug and tracing`, this should be `logDebug` or `logTrace`, @monkeyboy123.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841167066

File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala

```scala
@@ -386,3 +400,14 @@ class StreamingContextStoppingCollector(val ssc: StreamingContext) extends Strea
     }
   }
 }
+
+class CustomizedStreamingListener extends StreamingListener {
```

Review comment: Shall we use a different word instead of `Customized`?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166959

File path: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala

```scala
@@ -235,6 +236,19 @@ class StreamingListenerSuite extends TestSuiteBase with LocalStreamingContext wi
     verifyNoMoreInteractions(streamingListener)
   }

+  test("SPARK-38498: Support customized streaming listener") {
```

Review comment: `customized` -> `extra` because the conf name is `STREAMING_EXTRA_LISTENERS`.
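If SPARK-38498 lands, the conf would be used much like core's existing `spark.extraListeners`: name a listener class and have `StreamingContext` instantiate it reflectively at startup. A sketch under that assumption; `MetricsListener` is a made-up class, and the conf key is only a proposal in this PR:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical listener that the proposed conf would load by class name.
class MetricsListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit =
    println(s"batch delay: ${batch.batchInfo.totalDelay.getOrElse(-1L)} ms")
}

val conf = new SparkConf()
  .setAppName("extra-listener-demo")
  // Proposed conf from SPARK-38498; not available in released Spark.
  .set("spark.streaming.extraListeners", classOf[MetricsListener].getName)
val ssc = new StreamingContext(conf, Seconds(1))
```

The reflective-loading approach means listeners can be added from `spark-submit --conf` without any code change, which is the main motivation for the conf over the existing programmatic `ssc.addStreamingListener(...)`.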
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166793

File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala

```scala
@@ -616,6 +616,27 @@ class StreamingContext private[streaming] (
     }
   }

+  /**
+   * Registers streaming listeners specified in spark.streaming.extraListeners.
+   */
+  private def setupExtraStreamingListener(): Unit = {
```

Review comment: I guess `registerXXX` is more seamless than `setupXXX` because:
- we have `registerProgressListener` already
- the function description also says `Registers`.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166511

File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala

```scala
@@ -185,4 +185,10 @@ object StreamingConf {
     .longConf
     .createWithDefault(0)

+  private[streaming] val STREAMING_EXTRA_LISTENERS = ConfigBuilder("spark.streaming.extraListeners")
+    .doc("Class names of streaming listeners to add to StreamingContext during initialization.")
+    .version("3.3.0")
```

Review comment: This should be the version of the `master` branch, which is currently `3.4.0`.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration
dongjoon-hyun commented on a change in pull request #35799: URL: https://github.com/apache/spark/pull/35799#discussion_r841166490

File path: streaming/src/main/scala/org/apache/spark/streaming/StreamingConf.scala

```scala
@@ -185,4 +185,10 @@ object StreamingConf {
     .longConf
     .createWithDefault(0)

+  private[streaming] val STREAMING_EXTRA_LISTENERS = ConfigBuilder("spark.streaming.extraListeners")
```

Review comment: If you don't mind, shall we match the code pattern of the other configs, like line 183?
[GitHub] [spark] dongjoon-hyun commented on pull request #35881: [SPARK-36664][CORE] Log time waiting for cluster resources
dongjoon-hyun commented on pull request #35881: URL: https://github.com/apache/spark/pull/35881#issuecomment-1086784648 Thank you, @holdenk.
[GitHub] [spark] JacobZheng0927 removed a comment on pull request #29965: [SPARK-33016][SQL] Potential SQLMetrics missed which might cause WEB UI display issue while AQE is on.
JacobZheng0927 removed a comment on pull request #29965: URL: https://github.com/apache/spark/pull/29965#issuecomment-1085680797 I'm wondering if this change will cause a driver memory overflow, as duplicate SQLMetric may take up a lot of memory. @leanken-zz