This is an automated email from the ASF dual-hosted git repository.

gengliang pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 6abfeb6  [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
6abfeb6 is described below

commit 6abfeb6884a3cdfe4c6e621219e6cf5a35d6467e
Author: Gengliang Wang <gengliang.w...@databricks.com>
AuthorDate: Wed Dec 2 01:36:41 2020 +0800

    [SPARK-33611][UI] Avoid encoding twice on the query parameter of rewritten proxy URL
    
    ### What changes were proposed in this pull request?
    
    When running Spark behind a reverse proxy (e.g. Nginx, Apache HTTP Server), the request URL can be encoded twice if we pass the query string directly to the constructor of `java.net.URI`:
    ```
    > val uri = "http://localhost:8081/test"
    > val query = "order%5B0%5D%5Bcolumn%5D=0"  // query string of URL from the reverse proxy
    > val rewrittenURI = URI.create(uri.toString())

    > new URI(rewrittenURI.getScheme(),
          rewrittenURI.getAuthority(),
          rewrittenURI.getPath(),
          query,
          rewrittenURI.getFragment()).toString
    result: http://localhost:8081/test?order%255B0%255D%255Bcolumn%255D=0
    ```
    
    In Spark's stage page, the URL of "/taskTable" contains the query parameter `order[0][dir]`. After being encoded twice, the query parameter becomes `order%255B0%255D%255Bdir%255D`, and it is decoded as `order%5B0%5D%5Bdir%5D` instead of `order[0][dir]`. As a result, there is a NullPointerException from https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/api/v1/StagesResource.scala#L176
    Apart from that, other query parameters may not work as expected after being encoded twice.
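
    Since the behavior comes entirely from the JDK, the double-encoding effect described above can be reproduced in plain Java with `java.net.URLDecoder`; a minimal sketch (the class and method names here are illustrative, not part of Spark):

    ```java
    import java.net.URLDecoder;
    import java.nio.charset.StandardCharsets;

    // Class and method names are illustrative only, not part of Spark.
    public class DoubleEncodeDemo {
        // One round of percent-decoding, as the receiving server applies.
        static String decodeOnce(String s) {
            return URLDecoder.decode(s, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // The parameter name after being percent-encoded twice on the proxy path.
            String doubleEncoded = "order%255B0%255D%255Bdir%255D";
            // A single decode only peels off one layer of encoding...
            System.out.println(decodeOnce(doubleEncoded));             // order%5B0%5D%5Bdir%5D
            // ...so a second decode would be needed to recover the raw name.
            System.out.println(decodeOnce(decodeOnce(doubleEncoded))); // order[0][dir]
        }
    }
    ```

    The server decodes only once, so it never sees the expected raw parameter name `order[0][dir]`.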
    
    This PR fixes the bug by calling the method `URI.create(String)` directly. This convenience method avoids encoding the query parameter twice.
    ```
    > val uri = "http://localhost:8081/test"
    > val query = "order%5B0%5D%5Bcolumn%5D=0"
    > URI.create(s"$uri?$query").toString
    result: http://localhost:8081/test?order%5B0%5D%5Bcolumn%5D=0
    
    > URI.create(s"$uri?$query").getQuery
    result: order[0][column]=0
    ```
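
    The difference between the two code paths lives in `java.net.URI` itself, so it can also be demonstrated in plain Java; a minimal sketch (the class name is hypothetical):

    ```java
    import java.net.URI;
    import java.net.URISyntaxException;

    // Class name is illustrative only, not part of Spark.
    public class ProxyUriDemo {
        // Old code path: the multi-argument constructor treats the query as raw
        // text and percent-encodes it again, turning each '%' into "%25".
        static String viaConstructor(String base, String query) {
            URI u = URI.create(base);
            try {
                return new URI(u.getScheme(), u.getAuthority(), u.getPath(), query,
                    u.getFragment()).toString();
            } catch (URISyntaxException e) {
                throw new IllegalStateException(e);
            }
        }

        // New code path: URI.create parses the already-encoded string as-is.
        static String viaCreate(String base, String query) {
            return URI.create(base + "?" + query).toString();
        }

        public static void main(String[] args) {
            String base = "http://localhost:8081/test";
            String query = "order%5B0%5D%5Bcolumn%5D=0";
            System.out.println(viaConstructor(base, query)); // query encoded a second time
            System.out.println(viaCreate(base, query));      // query preserved as-is
        }
    }
    ```

    The multi-argument constructors are specified to quote illegal characters in each component they are given, which is why an already-encoded query is encoded a second time, while the single-string `URI.create` path only parses.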
    
    ### Why are the changes needed?
    
    Fix a potential bug when Spark's reverse proxy is enabled.
    The bug itself is similar to https://github.com/apache/spark/pull/29271.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Add a new unit test.
    Also, manual UI testing for the master, worker, and app UI with an nginx proxy.
    
    Spark config:
    ```
    spark.ui.port 8080
    spark.ui.reverseProxy=true
    spark.ui.reverseProxyUrl=/path/to/spark/
    ```
    nginx config:
    ```
    server {
        listen 9000;
        set $SPARK_MASTER http://127.0.0.1:8080;
        # split spark UI path into prefix and local path within master UI
        location ~ ^(/path/to/spark/) {
            # strip prefix when forwarding request
            rewrite /path/to/spark(/.*) $1  break;
            #rewrite /path/to/spark/ "/" ;
            # forward to spark master UI
            proxy_pass $SPARK_MASTER;
            proxy_intercept_errors on;
            error_page 301 302 307 = handle_redirects;
        }
        location handle_redirects {
            set $saved_redirect_location '$upstream_http_location';
            proxy_pass $saved_redirect_location;
        }
    }
    ```
    
    Closes #30552 from gengliangwang/decodeProxyRedirect.
    
    Authored-by: Gengliang Wang <gengliang.w...@databricks.com>
    Signed-off-by: Gengliang Wang <gengliang.w...@databricks.com>
    (cherry picked from commit 5d0045eedf4b138c031accac2b1fa1e8d6f3f7c6)
    Signed-off-by: Gengliang Wang <gengliang.w...@databricks.com>
---
 core/src/main/scala/org/apache/spark/ui/JettyUtils.scala | 16 ++++++----------
 core/src/test/scala/org/apache/spark/ui/UISuite.scala    |  9 +++++++++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index a4ba565..3820a88 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -400,17 +400,13 @@ private[spark] object JettyUtils extends Logging {
       uri.append(rest)
     }
 
-    val rewrittenURI = URI.create(uri.toString())
-    if (query != null) {
-      return new URI(
-          rewrittenURI.getScheme(),
-          rewrittenURI.getAuthority(),
-          rewrittenURI.getPath(),
-          query,
-          rewrittenURI.getFragment()
-        ).normalize()
+    val queryString = if (query == null) {
+      ""
+    } else {
+      s"?$query"
     }
-    rewrittenURI.normalize()
+    // SPARK-33611: use method `URI.create` to avoid percent-encoding twice on the query string.
+    URI.create(uri.toString() + queryString).normalize()
   }
 
   def createProxyLocationHeader(
diff --git a/core/src/test/scala/org/apache/spark/ui/UISuite.scala b/core/src/test/scala/org/apache/spark/ui/UISuite.scala
index 56026ea..c7e1dfe 100644
--- a/core/src/test/scala/org/apache/spark/ui/UISuite.scala
+++ b/core/src/test/scala/org/apache/spark/ui/UISuite.scala
@@ -216,6 +216,15 @@ class UISuite extends SparkFunSuite {
     assert(rewrittenURI === null)
   }
 
+  test("SPARK-33611: Avoid encoding twice on the query parameter of proxy rewrittenURI") {
+    val prefix = "/worker-id"
+    val target = "http://localhost:8081"
+    val path = "/worker-id/json"
+    val rewrittenURI =
+      JettyUtils.createProxyURI(prefix, target, path, "order%5B0%5D%5Bcolumn%5D=0")
+    assert(rewrittenURI.toString === "http://localhost:8081/json?order%5B0%5D%5Bcolumn%5D=0")
+  }
+
   test("verify rewriting location header for reverse proxy") {
     val clientRequest = mock(classOf[HttpServletRequest])
    var headerValue = "http://localhost:4040/jobs"

