[spark] branch master updated (51ebcd9 -> a4788ee)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 51ebcd9 [SPARK-32863][SS] Full outer stream-stream join
     add a4788ee [MINOR][SS] Rename auxiliary protected methods in StreamingJoinSuite

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/streaming/StreamingJoinSuite.scala | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (f71f345 -> 51ebcd9)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f71f345 [SPARK-33544][SQL] Optimize size of CreateArray/CreateMap to be the size of its children
     add 51ebcd9 [SPARK-32863][SS] Full outer stream-stream join

No new revisions were added by this update.

Summary of changes:
 .../analysis/UnsupportedOperationChecker.scala     |  71 ---
 .../analysis/UnsupportedOperationsSuite.scala      |  16 +-
 .../streaming/StreamingSymmetricHashJoinExec.scala |  57 --
 .../spark/sql/streaming/StreamingJoinSuite.scala   | 209 -
 4 files changed, 297 insertions(+), 56 deletions(-)
[spark] branch master updated (5a1c5ac -> f71f345)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5a1c5ac [SPARK-33622][R][ML] Add array_to_vector to SparkR
     add f71f345 [SPARK-33544][SQL] Optimize size of CreateArray/CreateMap to be the size of its children

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/complexTypeCreator.scala  | 12 +--
 .../spark/sql/catalyst/optimizer/expressions.scala | 13 +++
 .../catalyst/optimizer/ConstantFoldingSuite.scala  | 36 +++
 .../optimizer/InferFiltersFromGenerateSuite.scala  | 41 +-
 4 files changed, 98 insertions(+), 4 deletions(-)
[spark] branch master updated (5d0045e -> 5a1c5ac)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 5d0045e [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
     add 5a1c5ac [SPARK-33622][R][ML] Add array_to_vector to SparkR

No new revisions were added by this update.

Summary of changes:
 R/pkg/NAMESPACE                       |  1 +
 R/pkg/R/functions.R                   | 26 +-
 R/pkg/R/generics.R                    |  4
 R/pkg/tests/fulltests/test_sparkSQL.R |  3 ++-
 4 files changed, 32 insertions(+), 2 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6abfeb6 [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

6abfeb6 is described below

commit 6abfeb6884a3cdfe4c6e621219e6cf5a35d6467e
Author: Gengliang Wang
AuthorDate: Wed Dec 2 01:36:41 2020 +0800

[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

### What changes were proposed in this pull request?

When running Spark behind a reverse proxy (e.g. Nginx, Apache HTTP Server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:

```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy
> val rewrittenURI = URI.create(uri.toString())
> new URI(rewrittenURI.getScheme(), rewrittenURI.getAuthority(), rewrittenURI.getPath(), query, rewrittenURI.getFragment()).toString

result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```

In Spark's stage page, the URL of "/taskTable" contains the query parameter `order[0][dir]`. After being encoded twice, the parameter becomes `order%255B0%255D%255Bdir%255D` and is decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, a NullPointerException is thrown from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176. Other parameters may likewise not work as expected after being encoded twice.

This PR fixes the bug by calling the method `URI.create(String URL)` directly. This convenience method avoids encoding the query parameter twice.
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString

result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0

> URI.create(s"$uri?$query").getQuery

result: order[0][column]=0
```

### Why are the changes needed?

Fix a potential bug when Spark's reverse proxy is enabled. The bug itself is similar to https://github.com/apache/spark/pull/29271.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add a new unit test. Also, manual UI testing for the master, worker and app UI with an nginx proxy.

Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```

nginx config:
```
server {
  listen 9000;
  set $SPARK_MASTER http://127.0.0.1:8080;
  # split spark UI path into prefix and local path within master UI
  location ~ ^(/path/to/spark/) {
    # strip prefix when forwarding request
    rewrite /path/to/spark(/.*) $1 break;
    #rewrite /path/to/spark/ "/" ;
    # forward to spark master UI
    proxy_pass $SPARK_MASTER;
    proxy_intercept_errors on;
    error_page 301 302 307 = handle_redirects;
  }
  location handle_redirects {
    set $saved_redirect_location '$upstream_http_location';
    proxy_pass $saved_redirect_location;
  }
}
```

Closes #30552 from gengliangwang/decodeProxyRedirect.
Authored-by: Gengliang Wang
Signed-off-by: Gengliang Wang
(cherry picked from commit 5d0045eedf4b138c031accac2b1fa1e8d6f3f7c6)
Signed-off-by: Gengliang Wang
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++--
 core/src/test/scala/org/apache/spark/ui/UISuite.scala    |  9 +
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index a4ba565..3820a88 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -400,17 +400,13 @@ private[spark] object JettyUtils extends Logging {
       uri.append(rest)
     }
-    val rewrittenURI = URI.create(uri.toString())
-    if (query != null) {
-      return new URI(
-        rewrittenURI.getScheme(),
-        rewrittenURI.getAuthority(),
-        rewrittenURI.getPath(),
-        query,
-        rewrittenURI.getFragment()
-      ).normalize()
+    val queryString = if (query == null) {
+      ""
+    } else {
+      s"?$query"
     }
-    rewrittenURI.normalize()
+    // SPARK-33611: use method `URI.create` to avoid percent-encoding twice on the query string.
+    URI.create(uri.toString() + queryString).normalize()
   }
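The double encoding described in the commit message above is easy to reproduce outside Spark. The sketch below mirrors the before/after snippets from the PR description in plain Java; the class and method names are illustrative, only the `java.net.URI` calls come from the source:

```java
// Minimal reproduction of SPARK-33611: the multi-argument java.net.URI
// constructor percent-encodes the query string a second time, while
// URI.create() parses the already-encoded string as-is.
import java.net.URI;
import java.net.URISyntaxException;

public class DoubleEncodingDemo {

    // Buggy rewrite path: rebuilds the URI from components, so '%' in the
    // proxy-supplied query is quoted again (%5B becomes %255B).
    static String rewriteWithConstructor(String uri, String query) {
        URI rewritten = URI.create(uri);
        try {
            return new URI(rewritten.getScheme(), rewritten.getAuthority(),
                    rewritten.getPath(), query, rewritten.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    // Fixed rewrite path: concatenates and parses, leaving the encoding intact.
    static String rewriteWithCreate(String uri, String query) {
        return URI.create(uri + "?" + query).toString();
    }

    public static void main(String[] args) {
        String uri = "http://localhost:8081/test";
        String query = "order%5B0%5D%5Bcolumn%5D=0"; // already percent-encoded by the proxy

        System.out.println(rewriteWithConstructor(uri, query));
        // http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0  (double-encoded)

        System.out.println(rewriteWithCreate(uri, query));
        // http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0

        System.out.println(URI.create(uri + "?" + query).getQuery());
        // order[0][column]=0  (getQuery() returns the decoded parameter)
    }
}
```

The second path is exactly why the fix switches `JettyUtils` to a single `URI.create` call: the one-argument factory assumes its input is already quoted, so the proxy's `%5B`/`%5D` escapes survive one round trip instead of two.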
[spark] branch master updated (c24f2b2 -> 5d0045e)
This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c24f2b2 [SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer
     add 5d0045e [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++--
 core/src/test/scala/org/apache/spark/ui/UISuite.scala    |  9 +
 2 files changed, 15 insertions(+), 10 deletions(-)
[spark] branch master updated (478fb7f5 -> c24f2b2)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 478fb7f5 [SPARK-33608][SQL] Handle DELETE/UPDATE/MERGE in PullupCorrelatedPredicates
     add c24f2b2 [SPARK-33612][SQL] Add dataSourceRewriteRules batch to Optimizer

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/optimizer/Optimizer.scala |  9 +
 .../apache/spark/sql/internal/BaseSessionStateBuilder.scala | 11 +++
 2 files changed, 20 insertions(+)
[spark] branch master updated (cf4ad21 -> 478fb7f5)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from cf4ad21 [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens
     add 478fb7f5 [SPARK-33608][SQL] Handle DELETE/UPDATE/MERGE in PullupCorrelatedPredicates

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/subquery.scala |  2 +
 .../PullupCorrelatedPredicatesSuite.scala       | 64 +-
 2 files changed, 65 insertions(+), 1 deletion(-)
[spark] branch master updated (9273d42 -> cf4ad21)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 9273d42 [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue
     add cf4ad21 [SPARK-33503][SQL] Refactor SortOrder class to allow multiple childrens

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  2 +-
 .../apache/spark/sql/catalyst/dsl/package.scala    |  4 ++--
 .../spark/sql/catalyst/expressions/SortOrder.scala | 10 +
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  2 +-
 .../main/scala/org/apache/spark/sql/Column.scala   |  8 +++
 .../sql/execution/AliasAwareOutputExpression.scala |  6 +
 .../sql/execution/joins/SortMergeJoinExec.scala    |  9
 .../apache/spark/sql/execution/PlannerSuite.scala  | 26 ++
 8 files changed, 46 insertions(+), 21 deletions(-)
[spark] branch master updated (d38883c -> 9273d42)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from d38883c [SPARK-32405][SQL][FOLLOWUP] Throw Exception if provider is specified in JDBCTableCatalog create table
     add 9273d42 [SPARK-33045][SQL][FOLLOWUP] Support built-in function like_any and fix StackOverflowError issue

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/dsl/package.scala    |  4 +
 .../catalyst/expressions/regexpExpressions.scala   | 98 ++
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 31 ---
 .../org/apache/spark/sql/internal/SQLConf.scala    | 14
 .../expressions/RegexpExpressionsSuite.scala       | 26 ++
 .../catalyst/parser/ExpressionParserSuite.scala    | 12 +--
 .../test/resources/sql-tests/inputs/like-all.sql   |  2 -
 .../test/resources/sql-tests/inputs/like-any.sql   |  2 +
 8 files changed, 138 insertions(+), 51 deletions(-)
[spark] branch master updated (e5bb293 -> d38883c)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from e5bb293 [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API
     add d38883c [SPARK-32405][SQL][FOLLOWUP] Throw Exception if provider is specified in JDBCTableCatalog create table

No new revisions were added by this update.

Summary of changes:
 .../datasources/v2/jdbc/JDBCTableCatalog.scala  |  3 ++-
 .../v2/jdbc/JDBCTableCatalogSuite.scala         | 27 +++---
 .../org/apache/spark/sql/jdbc/JDBCV2Suite.scala | 21 ++---
 3 files changed, 22 insertions(+), 29 deletions(-)
[spark] branch master updated (1034815 -> e5bb293)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 1034815 [SPARK-33572][SQL] Datetime building should fail if the year, month, ..., second combination is invalid
     add e5bb293 [SPARK-32032][SS] Avoid infinite wait in driver because of KafkaConsumer.poll(long) API

No new revisions were added by this update.

Summary of changes:
 docs/ss-migration-guide.md                         |   5 +
 docs/structured-streaming-kafka-integration.md     |  20 +
 .../spark/sql/kafka010/ConsumerStrategy.scala      |  65 ++-
 .../org/apache/spark/sql/kafka010/KafkaBatch.scala |   2 +-
 .../spark/sql/kafka010/KafkaOffsetReader.scala     | 601 ++---
 ...etReader.scala => KafkaOffsetReaderAdmin.scala} | 284 +-
 ...eader.scala => KafkaOffsetReaderConsumer.scala} |  39 +-
 .../apache/spark/sql/kafka010/KafkaRelation.scala  |   2 +-
 .../spark/sql/kafka010/KafkaSourceProvider.scala   |   6 +-
 .../spark/sql/kafka010/ConsumerStrategySuite.scala | 147 +
 .../sql/kafka010/KafkaMicroBatchSourceSuite.scala  |  42 +-
 .../sql/kafka010/KafkaOffsetReaderSuite.scala      |  95 +++-
 .../spark/sql/kafka010/KafkaRelationSuite.scala    |  47 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    |  13 +
 14 files changed, 542 insertions(+), 826 deletions(-)
 copy external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/{KafkaOffsetReader.scala => KafkaOffsetReaderAdmin.scala} (73%)
 copy external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/{KafkaOffsetReader.scala => KafkaOffsetReaderConsumer.scala} (96%)
 create mode 100644 external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/ConsumerStrategySuite.scala
[spark] branch master updated (52e5cc4 -> 1034815)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 52e5cc4 [SPARK-27188][SS] FileStreamSink: provide a new option to have retention on output files
     add 1034815 [SPARK-33572][SQL] Datetime building should fail if the year, month, ..., second combination is invalid

No new revisions were added by this update.

Summary of changes:
 .../catalyst/expressions/datetimeExpressions.scala |  27 +++--
 .../catalyst/expressions/intervalExpressions.scala |  23 +++-
 .../expressions/DateExpressionsSuite.scala         | 118 +++--
 .../expressions/IntervalExpressionsSuite.scala     |  60 +++
 .../sql-tests/results/postgreSQL/date.sql.out      |  15 +--
 5 files changed, 187 insertions(+), 56 deletions(-)