[ 
https://issues.apache.org/jira/browse/HIVE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-3906:
--------------------------------------
    Fix Version/s:     (was: 0.8.1)

I cleared the fixVersion field since this ticket is still open. Please review 
this ticket and if the fix is already committed to a specific version please 
set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA 
guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] 
the fixVersion should be set only when the issue is resolved/closed.

> URI_Escape and URI_UnEscape UDF
> -------------------------------
>
>                 Key: HIVE-3906
>                 URL: https://issues.apache.org/jira/browse/HIVE-3906
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>    Affects Versions: 0.8.1
>         Environment: Hadoop 0.20.1
> Java 1.6.0
>            Reporter: Liu Zongquan
>            Priority: Major
>              Labels: patch
>         Attachments: HIVE-3906.1.patch.txt, udf_uri_escape.q, 
> udf_uri_escape.q.out, udf_uri_unescape.q, udf_uri_unescape.q.out
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Current releases of Hive lack a function that encodes URLs or form 
> parameters, that is, one that escapes a URI.
> The function URI_ESCAPE (uri) would return the encoded form of the URI, which 
> would be useful in HiveQL. It is always advisable to encode URLs and form 
> parameters; a plain form parameter is vulnerable to cross-site attacks and SQL 
> injection, and may drive a web application into unpredictable output.
> Functionality :-
> Function Name: URI_ESCAPE (uri)
> Returns the encoded form of the uri.
> Example: hive> SELECT URI_ESCAPE('http://www.example.com?a=l&t');
> -> 'http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t'
> Usage :-
> Case 1 : To get encoded uri corresponding to a particular uri
> hive> SELECT URI_ESCAPE('http://google.com/resource?key=value1 & value2');
> -> 'http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2'
> Case 2 : To query a table to get encoded form of the urls corresponding to 
> users
> Table :- USER_URLS
> userid |url
> USR00001|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
> USR01000|http://google.com/resource?key=value
> USR10000|http://google.com/resource?key=value1 & value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher://gopher.voa.gov
> USR10100|http://www.apple.com/index.html
> USR11000|file:/data/letters/to_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html 
> Query : select userid,url,uri_escape(url) from USER_URLS;
> Result :-
> USR00001|http://www.example.com?a=l&t|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4
> USR01000|http://google.com/resource?key=value|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http://google.com/resource?key=value1 & value2|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher://gopher.voa.gov|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http://www.apple.com/index.html|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file:/data/letters/to_mom.txt|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Current releases of Hive also lack a function that decodes an encoded URI.
> The function URI_UNESCAPE (uri) would return the decoded form of the encoded 
> URI, which would be useful in HiveQL. It converts the specified string by 
> replacing every escape sequence with its unescaped representation.
> Functionality :-
> Function Name: URI_UNESCAPE (uri)
> Returns the decoded form of the encoded uri.
> Example: hive> SELECT URI_UNESCAPE('http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t');
> -> 'http://www.example.com?a=l&t'
> Usage :-
> Case 1 : To get decoded uri corresponding to a particular encoded uri
> hive> SELECT URI_UNESCAPE('http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2');
> -> 'http://google.com/resource?key=value1 & value2'
> Case 2 : To query a table to get decoded form of the encoded urls 
> corresponding to users
> Table :- USER_URLS
> userid |encodedurl
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Query : select userid,encodedurl,uri_unescape(encodedurl) from USER_URLS;
> Result :-
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue|http://google.com/resource?key=value
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2|http://google.com/resource?key=value1 & value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov|gopher://gopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html|http://www.apple.com/index.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt|file:/data/letters/to_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html|http://www.cuug.ab.ca:8001/~branderr/csce.html
>  
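
For reviewers, the URI_ESCAPE behaviour quoted above can be approximated in plain Java. This is only a sketch and not the code in HIVE-3906.1.patch.txt: the class and method names are hypothetical, UTF-8 is assumed, and returning null for a null input is an assumption borrowed from the usual SQL NULL convention. It relies on java.net.URLEncoder, which writes a space as '+', so the sketch rewrites '+' to '%20' to match the examples in the description.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper illustrating the escaping shown in the issue description.
final class UriEscapeSketch {

    // Percent-encodes everything except letters, digits, '.', '-', '*' and '_'.
    static String uriEscape(String uri) {
        if (uri == null) {
            return null; // assumption: NULL in, NULL out
        }
        try {
            // URLEncoder emits '+' for a space; a literal '+' in the input is
            // already encoded as %2B, so this replace only touches spaces.
            return URLEncoder.encode(uri, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(uriEscape("http://www.example.com?a=l&t"));
        System.out.println(uriEscape("http://google.com/resource?key=value1 & value2"));
    }
}
```

Under these assumptions the two calls in main print the same strings as the Example and Case 1 entries in the description. Whether the attached patch treats '*' or '~' the same way is not stated in the ticket.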

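The URI_UNESCAPE behaviour in the description can likewise be sketched in plain Java. Again this is an illustration under assumptions, not the attached patch: the class and method names are hypothetical, UTF-8 is assumed, and null is passed through. java.net.URLDecoder treats '+' as an encoded space, so the sketch protects literal '+' characters before decoding; whether the patch does the same is not stated in the ticket.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper illustrating the unescaping shown in the issue description.
final class UriUnescapeSketch {

    // Replaces each %XX escape sequence with the character it represents.
    static String uriUnescape(String encoded) {
        if (encoded == null) {
            return null; // assumption: NULL in, NULL out
        }
        try {
            // Keep a literal '+' as '+' instead of letting URLDecoder turn it
            // into a space.
            return URLDecoder.decode(encoded.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(uriUnescape("http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t"));
    }
}
```

Note that URLDecoder.decode throws IllegalArgumentException for a malformed escape such as a trailing '%', so a real UDF would need to decide how to report such rows.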


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
