This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6abfeb6  [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
6abfeb6 is described below

commit 6abfeb6884a3cdfe4c6e621219e6cf5a35d6467e
Author: Gengliang Wang <gengliang.w...@databricks.com>
AuthorDate: Wed Dec 2 01:36:41 2020 +0800

[SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL

### What changes were proposed in this pull request?

When running Spark behind a reverse proxy (e.g. Nginx or the Apache HTTP Server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0" // query string of URL from the reverse proxy
> val rewrittenURI = URI.create(uri.toString())
> new URI(rewrittenURI.getScheme(), rewrittenURI.getAuthority(), rewrittenURI.getPath(), query, rewrittenURI.getFragment()).toString

result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
```

In Spark's stage page, the URL of "/taskTable" contains the query parameter `order[0][dir]`. After being encoded twice, the query parameter becomes `order%255B0%255D%255Bdir%255D` and it is decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there will be a NullPointerException from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
Beyond that, other query parameters may not work as expected after being encoded twice.

This PR fixes the bug by calling the method `URI.create(String URL)` directly. This convenience method avoids encoding the query parameter twice.
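For reference, the REPL session above can be condensed into a self-contained program (the object and method names here are made up for illustration; the behavior matches the output quoted in this commit message):

```scala
import java.net.URI

object DoubleEncodingDemo {
  // The multi-argument URI constructor treats `query` as raw, unencoded text,
  // so the '%' characters of an already-encoded query are encoded again ("%25...").
  def viaConstructor(uri: String, query: String): String = {
    val base = URI.create(uri)
    new URI(base.getScheme, base.getAuthority, base.getPath, query, base.getFragment).toString
  }

  def main(args: Array[String]): Unit = {
    val uri = "http://localhost:8081/test"
    val query = "order%5B0%5D%5Bcolumn%5D=0" // already percent-encoded by the reverse proxy
    // Prints http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
    println(viaConstructor(uri, query))
  }
}
```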
```
> val uri = "http://localhost:8081/test"
> val query = "order%5B0%5D%5Bcolumn%5D=0"
> URI.create(s"$uri?$query").toString

result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0

> URI.create(s"$uri?$query").getQuery

result: order[0][column]=0
```

### Why are the changes needed?

Fix a potential bug when Spark's reverse proxy is enabled. The bug itself is similar to https://github.com/apache/spark/pull/29271.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Add a new unit test.
Also, manual UI testing for master, worker and app UI with an nginx proxy.

Spark config:
```
spark.ui.port 8080
spark.ui.reverseProxy=true
spark.ui.reverseProxyUrl=/path/to/spark/
```
nginx config (note that nginx named locations require the `@` prefix):
```
server {
    listen 9000;
    set $SPARK_MASTER http://127.0.0.1:8080;
    # split spark UI path into prefix and local path within master UI
    location ~ ^(/path/to/spark/) {
        # strip prefix when forwarding request
        rewrite /path/to/spark(/.*) $1 break;
        #rewrite /path/to/spark/ "/" ;
        # forward to spark master UI
        proxy_pass $SPARK_MASTER;
        proxy_intercept_errors on;
        error_page 301 302 307 = @handle_redirects;
    }
    location @handle_redirects {
        set $saved_redirect_location '$upstream_http_location';
        proxy_pass $saved_redirect_location;
    }
}
```

Closes #30552 from gengliangwang/decodeProxyRedirect.
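The null-safe query handling that the patch adopts can be sketched as a standalone helper (a simplified sketch for illustration, not Spark's actual `createProxyURI`; the object name is made up):

```scala
import java.net.URI

object ProxyRewriteSketch {
  // Append the raw (already percent-encoded) query string before parsing, so
  // URI.create sees one complete URL and never re-encodes the query part.
  def rewrite(uri: String, query: String): URI = {
    val queryString = if (query == null) "" else s"?$query"
    URI.create(uri + queryString).normalize()
  }

  def main(args: Array[String]): Unit = {
    // Encoded query survives unchanged; a null query yields the bare URL.
    println(rewrite("http://localhost:8081/test", "order%5B0%5D%5Bcolumn%5D=0"))
    println(rewrite("http://localhost:8081/test", null))
  }
}
```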
Authored-by: Gengliang Wang <gengliang.w...@databricks.com>
Signed-off-by: Gengliang Wang <gengliang.w...@databricks.com>
(cherry picked from commit 5d0045eedf4b138c031accac2b1fa1e8d6f3f7c6)
Signed-off-by: Gengliang Wang <gengliang.w...@databricks.com>
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++++++----------
 core/src/test/scala/org/apache/spark/ui/UISuite.scala    |  9 +++++++++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index a4ba565..3820a88 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -400,17 +400,13 @@ private[spark] object JettyUtils extends Logging {
       uri.append(rest)
     }
 
-    val rewrittenURI = URI.create(uri.toString())
-    if (query != null) {
-      return new URI(
-        rewrittenURI.getScheme(),
-        rewrittenURI.getAuthority(),
-        rewrittenURI.getPath(),
-        query,
-        rewrittenURI.getFragment()
-      ).normalize()
+    val queryString = if (query == null) {
+      ""
+    } else {
+      s"?$query"
     }
-    rewrittenURI.normalize()
+    // SPARK-33611: use method `URI.create` to avoid percent-encoding twice on the query string.
+    URI.create(uri.toString() + queryString).normalize()
   }
 
   def createProxyLocationHeader(
diff --git a/core/src/test/scala/org/apache/spark/ui/UISuite.scala b/core/src/test/scala/org/apache/spark/ui/UISuite.scala
index 56026ea..c7e1dfe 100644
--- a/core/src/test/scala/org/apache/spark/ui/UISuite.scala
+++ b/core/src/test/scala/org/apache/spark/ui/UISuite.scala
@@ -216,6 +216,15 @@ class UISuite extends SparkFunSuite {
     assert(rewrittenURI === null)
   }
 
+  test("SPARK-33611: Avoid encoding twice on the query parameter of proxy rewrittenURI") {
+    val prefix = "/worker-id"
+    val target = "http://localhost:8081"
+    val path = "/worker-id/json"
+    val rewrittenURI =
+      JettyUtils.createProxyURI(prefix, target, path, "order%5B0%5D%5Bcolumn%5D=0")
+    assert(rewrittenURI.toString === "http://localhost:8081/json?order%5B0%5D%5Bcolumn%5D=0")
+  }
+
   test("verify rewriting location header for reverse proxy") {
     val clientRequest = mock(classOf[HttpServletRequest])
     var headerValue = "http://localhost:4040/jobs"

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org