Cheng Lian created SPARK-7847:
---------------------------------

             Summary: Fix dynamic partition path escaping
                 Key: SPARK-7847
                 URL: https://issues.apache.org/jira/browse/SPARK-7847
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.3.1, 1.3.0, 1.4.0
            Reporter: Cheng Lian
            Assignee: Cheng Lian
            Priority: Critical


Background: when writing dynamic partitions, partition values are converted to 
strings and escaped if necessary. For example, if a partition column {{p}} of type 
{{String}} has the value {{A/B}}, the corresponding partition directory 
name is escaped to {{p=A%2fB}}.
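
To make the background concrete, here is a minimal sketch of this kind of escaping and its inverse. The class name, the set of escaped characters, and the lowercase hex digits are assumptions chosen to match the example above; this is not the code Spark or Hive actually uses.

```java
// Hypothetical helper, for illustration only.
final class PartitionPathEscaping {
    // Assumed set of characters that are unsafe inside a directory name.
    private static final String UNSAFE = "/%=:\\\"#*?";

    // Percent-encode unsafe characters, e.g. "A/B" becomes "A%2fB".
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (UNSAFE.indexOf(c) >= 0 || c < 0x20) {
                sb.append(String.format("%%%02x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Inverse of escape: decode every well-formed %XX sequence,
    // and copy anything else through unchanged.
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '%' && i + 2 < escaped.length()) {
                int hi = Character.digit(escaped.charAt(i + 1), 16);
                int lo = Character.digit(escaped.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 3;
                    continue;
                }
            }
            sb.append(c);
            i++;
        }
        return sb.toString();
    }

    // Build the directory name for one partition column, e.g. "p=A%2fB".
    static String partitionDir(String column, String value) {
        return column + "=" + escape(value);
    }
}
```

With this sketch, {{PartitionPathEscaping.partitionDir("p", "A/B")}} yields {{p=A%2fB}}, and applying {{unescape}} to the value part recovers {{A/B}}. Escaping {{%}} itself (to {{%25}}) is what keeps the round trip unambiguous.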

Currently, there are two issues with dynamic partition path escaping. 
The first is that escaped strings are not unescaped when partition values 
are read back. This one is easy to fix.

The second issue is more subtle. In 
[PR #5381|https://github.com/apache/spark/pull/5381/files#diff-c69b9e667e93b7e4693812cc72abb65fR492], 
we tried to use {{Path.toUri.toString}} to fix an escaping issue related to S3 
credentials containing the {{/}} character. Unfortunately, {{Path.toUri.toString}} 
also escapes {{%}} characters in the path. Thus, in the dynamic partitioning 
example above, {{p=A%2fB}} is double-escaped into {{p=A%252fB}} ({{%}} is 
escaped into {{%25}}).
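
The double escaping can be reproduced with the JDK alone. The sketch below assumes that the Hadoop {{Path}} is backed by a {{java.net.URI}} built through one of the multi-argument constructors, which the JDK documents as always quoting the {{%}} character. The class name, bucket, and table path are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; only java.net.URI is real API here.
final class DoubleEscapeDemo {
    // Build a URI from already-escaped path text using the five-argument
    // constructor, which treats the path as unescaped input and therefore
    // quotes '%' as "%25".
    static URI build(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

For example, {{DoubleEscapeDemo.build("s3n", "bucket", "/table/p=A%2fB").toString()}} gives {{s3n://bucket/table/p=A%252fB}}, while {{getPath()}} on the same URI still returns the singly escaped {{/table/p=A%2fB}}. The extra layer only appears in the string form.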

The expected behavior is to escape only the URI user info part (the S3 
key and secret) and leave all other components untouched.
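
One way to picture the expected behavior is the sketch below, which percent-encodes the key and secret and concatenates the remaining components verbatim. It is an illustration under stated assumptions, not the fix that was applied: the class and method names and the credentials are hypothetical, and {{URLEncoder}} is used only as a convenient encoder for the user info.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, for illustration only.
final class UserInfoOnlyEscaping {
    // Percent-encode a single credential component (e.g. '/' becomes "%2F").
    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    // Render scheme://key:secret@host/path, escaping only the user info.
    // The path is assumed to be escaped already and is copied as-is.
    static String render(String scheme, String key, String secret,
                         String host, String escapedPath) {
        String userInfo = encode(key) + ":" + encode(secret);
        return scheme + "://" + userInfo + "@" + host + escapedPath;
    }
}
```

With made-up credentials, {{UserInfoOnlyEscaping.render("s3n", "KEY", "se/cret", "bucket", "/table/p=A%2fB")}} yields {{s3n://KEY:se%2Fcret@bucket/table/p=A%2fB}}: the {{/}} in the secret is escaped, and the partition directory keeps its single layer of escaping.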


