[
https://issues.apache.org/jira/browse/HIVE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568491#comment-13568491
]
Arun A K commented on HIVE-3906:
--------------------------------
[~liuzongquan] Please find the patch attached. I have also submitted it for
review:
https://reviews.apache.org/r/9221/
> URI_Escape and URI_UnEscape UDF
> -------------------------------
>
> Key: HIVE-3906
> URL: https://issues.apache.org/jira/browse/HIVE-3906
> Project: Hive
> Issue Type: New Feature
> Components: UDF
> Affects Versions: 0.8.1
> Environment: Hadoop 0.20.1
> Java 1.6.0
> Reporter: Liu Zongquan
> Labels: patch
> Fix For: 0.8.1
>
> Attachments: HIVE-3906.1.patch.txt, udf_uri_escape.q,
> udf_uri_escape.q.out, udf_uri_unescape.q, udf_uri_unescape.q.out
>
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> Current releases of Hive lack a function that encodes (escapes) a URI, such
> as a URL or its form parameters.
> The function URI_ESCAPE (uri) would return the encoded form of the URI, which
> would be useful in HiveQL. It is always advisable to encode URLs and form
> parameters; a plain form parameter is vulnerable to cross-site attacks and SQL
> injection, and may drive a web application into unpredictable output.
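
To illustrate the point about unencoded parameters, here is a minimal Python sketch. It is not part of the patch and uses only the standard `urllib.parse` module; the query strings are taken from the examples in this issue. It shows how a raw `&` inside a value changes the way a query string is parsed, while the percent-encoded form keeps the value intact.

```python
from urllib.parse import parse_qs

# Raw value containing '&': the parser treats '&' as a parameter separator,
# so the text after it is split off from the value of 'key'.
raw = parse_qs("key=value1 & value2")

# Percent-encoded value: '%20' and '%26' are decoded only after the string
# has been split into parameters, so the whole value stays attached to 'key'.
encoded = parse_qs("key=value1%20%26%20value2")

print(raw)
print(encoded)
```

In the raw case the part after `&` no longer belongs to `key`, which is the kind of unpredicted behaviour the description warns about.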
> Functionality :-
> Function Name: URI_ESCAPE (uri)
> Returns the encoded form of the uri.
> Example: hive> SELECT URI_ESCAPE('http://www.example.com?a=l&t');
> -> 'http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t'
> Usage :-
> Case 1 : To get encoded uri corresponding to a particular uri
> hive> SELECT URI_ESCAPE('http://google.com/resource?key=value1 & value2');
> -> 'http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2'
> Case 2 : To query a table to get encoded form of the urls corresponding to
> users
> Table :- USER_URLS
> userid |url
> USR00001|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
> USR01000|http://google.com/resource?key=value
> USR10000|http://google.com/resource?key=value1 & value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher://gopher.voa.gov
> USR10100|http://www.apple.com/index.html
> USR11000|file:/data/letters/to_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html
> Query : select userid,url,uri_escape(url) from USER_URLS;
> Result :-
> USR00001|http://www.example.com?a=l&t|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4
> USR01000|http://google.com/resource?key=value|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http://google.com/resource?key=value1 & value2|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher://gopher.voa.gov|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http://www.apple.com/index.html|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file:/data/letters/to_mom.txt|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
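
The escaping shown in the results above can be sketched in Python as follows. This is only an illustration of the expected behaviour, not the code in the attached patch. The set of characters left unescaped (letters, digits, `-`, `_`, `.`) is an assumption inferred from the examples in this issue, and the Python function name `uri_escape` and its NULL handling are hypothetical.

```python
# Characters left unescaped in this sketch: ASCII letters, digits, '-', '_', '.'.
# This set is an assumption inferred from the examples in the issue
# (for instance, '~' is shown encoded as %7E and ' ' as %20).
_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_."
)


def uri_escape(uri):
    """Percent-encode every UTF-8 byte of `uri` that is not in _UNRESERVED."""
    if uri is None:  # assumed SQL-style behaviour: NULL in, NULL out
        return None
    out = []
    for byte in uri.encode("utf-8"):
        ch = chr(byte)
        # Keep unreserved characters; write everything else as %XX (uppercase hex).
        out.append(ch if ch in _UNRESERVED else "%{:02X}".format(byte))
    return "".join(out)


print(uri_escape("http://www.example.com?a=l&t"))
```

Encoding byte by byte over the UTF-8 form means non-ASCII characters become a sequence of `%XX` escapes, which is one common convention; the patch may choose differently.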
> Current releases of Hive also lack a function that decodes an encoded URI.
> The function URI_UNESCAPE (uri) would return the decoded form of the encoded
> URI, which would be useful in HiveQL. It converts the specified string by
> replacing each escape sequence with its unescaped representation.
> Functionality :-
> Function Name: URI_UNESCAPE (uri)
> Returns the decoded form of the encoded uri.
> Example: hive> SELECT URI_UNESCAPE('http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t');
> -> 'http://www.example.com?a=l&t'
> Usage :-
> Case 1 : To get decoded uri corresponding to a particular encoded uri
> hive> SELECT URI_UNESCAPE('http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2');
> -> 'http://google.com/resource?key=value1 & value2'
> Case 2 : To query a table to get decoded form of the encoded urls
> corresponding to users
> Table :- USER_URLS
> userid |encodedurl
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Query : select userid,encodedurl,uri_unescape(encodedurl) from USER_URLS;
> Result :-
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue|http://google.com/resource?key=value
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2|http://google.com/resource?key=value1 & value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov|gopher://gopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html|http://www.apple.com/index.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt|file:/data/letters/to_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html|http://www.cuug.ab.ca:8001/~branderr/csce.html
>
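
The decoding described above can be sketched in Python in the same spirit. Again, this is an illustration under stated assumptions rather than the patch itself: the function name `uri_unescape` is hypothetical, a malformed escape such as a trailing `%` is assumed to pass through unchanged, and the decoded bytes are assumed to be UTF-8.

```python
_HEX_DIGITS = b"0123456789ABCDEFabcdef"


def uri_unescape(encoded):
    """Replace each %XX escape with the byte it denotes, then decode as UTF-8."""
    if encoded is None:  # assumed SQL-style behaviour: NULL in, NULL out
        return None
    data = encoded.encode("utf-8")
    raw = bytearray()
    i = 0
    while i < len(data):
        # A valid escape is '%' (0x25) followed by exactly two hex digits.
        if (
            data[i] == 0x25
            and i + 2 < len(data)
            and data[i + 1] in _HEX_DIGITS
            and data[i + 2] in _HEX_DIGITS
        ):
            raw.append(int(data[i + 1:i + 3], 16))
            i += 3
        else:
            # Anything else, including a malformed escape, is copied as-is.
            raw.append(data[i])
            i += 1
    return raw.decode("utf-8", errors="replace")


print(uri_unescape("http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t"))
```

A partially encoded input, such as the USR00010 row, decodes the same way: only the `%XX` sequences change and every other character is kept.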