[ https://issues.apache.org/jira/browse/HIVE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-3906:
--------------------------------------
    Fix Version/s:     (was: 0.8.1)

I cleared the fixVersion field since this ticket is still open. Please review this ticket and, if the fix is already committed to a specific version, set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute], the fixVersion should be set only when the issue is resolved/closed.

> URI_Escape and URI_UnEscape UDF
> -------------------------------
>
>                 Key: HIVE-3906
>                 URL: https://issues.apache.org/jira/browse/HIVE-3906
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>    Affects Versions: 0.8.1
>         Environment: Hadoop 0.20.1
> Java 1.6.0
>            Reporter: Liu Zongquan
>            Priority: Major
>              Labels: patch
>         Attachments: HIVE-3906.1.patch.txt, udf_uri_escape.q, udf_uri_escape.q.out, udf_uri_unescape.q, udf_uri_unescape.q.out
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Current releases of Hive lack a function that encodes URL or form
> parameters, that is, one that escapes a URI.
> The function URI_ESCAPE (uri) would return the encoded form of the URI, which
> would be useful in HiveQL. It is always advisable to encode URL or form
> parameters; a plain form parameter is vulnerable to cross-site attacks and SQL
> injection, and may drive a web application into unpredictable output.
> Functionality :-
> Function Name: URI_ESCAPE (uri)
> Returns the encoded form of the uri.
> Example: hive> SELECT URI_ESCAPE('http://www.example.com?a=l&t');
> -> 'http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t'
> Usage :-
> Case 1 : To get the encoded uri corresponding to a particular uri
> hive> SELECT URI_ESCAPE('http://google.com/resource?key=value1 & value2');
> -> 'http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2'
> Case 2 : To query a table to get the encoded form of the urls corresponding to users
> Table :- USER_URLS
> userid |url
> USR00001|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4
> USR01000|http://google.com/resource?key=value
> USR10000|http://google.com/resource?key=value1 & value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher://gopher.voa.gov
> USR10100|http://www.apple.com/index.html
> USR11000|file:/data/letters/to_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html
> Query : select userid,url,uri_escape(url) from USER_URLS;
> Result :-
> USR00001|http://www.example.com?a=l&t|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first book.pdf|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR00100|http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4|http%3A%2F%2Fabc.dev.domain.com%2F0007AC%2Fads%2F800x480%2015sec%20h.264.mp4
> USR01000|http://google.com/resource?key=value|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http://google.com/resource?key=value1 & value2|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher://gopher.voa.gov|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http://www.apple.com/index.html|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file:/data/letters/to_mom.txt|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http://www.cuug.ab.ca:8001/~branderr/csce.html|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Current releases of Hive lack a function that decodes an encoded uri.
> The function URI_UNESCAPE (uri) would return the decoded form of the encoded
> URI, which would be useful in HiveQL. This function converts the
> specified string by replacing any escape sequences with their unescaped
> representation.
> Functionality :-
> Function Name: URI_UNESCAPE (uri)
> Returns the decoded form of the encoded uri.
> Example: hive> SELECT URI_UNESCAPE('http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t');
> -> 'http://www.example.com?a=l&t'
> Usage :-
> Case 1 : To get the decoded uri corresponding to a particular encoded uri
> hive> SELECT URI_UNESCAPE('http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2');
> -> 'http://google.com/resource?key=value1 & value2'
> Case 2 : To query a table to get the decoded form of the encoded urls corresponding to users
> Table :- USER_URLS
> userid |encodedurl
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html
> Query : select userid,encodedurl,uri_unescape(encodedurl) from USER_URLS;
> Result :-
> USR00001|http%3A%2F%2Fwww.example.com%3Fa%3Dl%26t|http://www.example.com?a=l&t
> USR00010|http://search.barnesandnoble.com/booksearch/first%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR00100|http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst%20book.pdf|http://search.barnesandnoble.com/booksearch/first book.pdf
> USR01000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue|http://google.com/resource?key=value
> USR10000|http%3A%2F%2Fgoogle.com%2Fresource%3Fkey%3Dvalue1%20%26%20value2|http://google.com/resource?key=value1 & value2
> USR10001|ftp%3A%2F%2Feau.ww.eesd.gov.calgary%2Fhome%2Fsmith%2Fbudget.wk1|ftp://eau.ww.eesd.gov.calgary/home/smith/budget.wk1
> USR10010|gopher%3A%2F%2Fgopher.voa.gov|gopher://gopher.voa.gov
> USR10100|http%3A%2F%2Fwww.apple.com%2Findex.html|http://www.apple.com/index.html
> USR11000|file%3A%2Fdata%2Fletters%2Fto_mom.txt|file:/data/letters/to_mom.txt
> USR11001|http%3A%2F%2Fwww.cuug.ab.ca%3A8001%2F%7Ebranderr%2Fcsce.html|http://www.cuug.ab.ca:8001/~branderr/csce.html

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
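
To check the expected values quoted in the issue description without a Hive build, the escaping it describes can be approximated with Python's standard library. This is only a sketch under stated assumptions: `uri_escape` and `uri_unescape` are hypothetical stand-ins named after the proposed UDFs, not the code in HIVE-3906.1.patch.txt, and the extra handling of `~` is an assumption made so the output matches the `%7E` shown in the ticket's examples.

```python
from urllib.parse import quote, unquote


def uri_escape(uri: str) -> str:
    """Percent-encode every character except letters, digits, '-', '_' and '.'."""
    # safe="" makes quote() also escape '/', so ':', '/', '?', '=' and '&' are
    # all encoded; a space becomes %20 rather than '+'.
    # On Python 3.7+ quote() leaves '~' unescaped, so it is replaced by hand
    # (assumption: the ticket's examples show '~' encoded as %7E).
    return quote(uri, safe="").replace("~", "%7E")


def uri_unescape(encoded: str) -> str:
    """Replace each %XX escape sequence with the character it represents."""
    return unquote(encoded)


if __name__ == "__main__":
    # Sample URLs taken from the USER_URLS table in the issue description.
    samples = [
        "http://www.example.com?a=l&t",
        "http://google.com/resource?key=value1 & value2",
        "http://www.cuug.ab.ca:8001/~branderr/csce.html",
    ]
    for url in samples:
        escaped = uri_escape(url)
        print(url, "|", escaped, "|", uri_unescape(escaped))
```

For these samples, unescaping the escaped value gives back the original URL. An input that is only partly escaped, such as the USR00010 row, still decodes to the plain URL.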