[ 
https://issues.apache.org/jira/browse/NUTCH-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel closed NUTCH-1685.
----------------------------------

    Resolution: Duplicate

You are right, [~markus17]. 

> URLUtil.toUNICODE fails on IDNs
> -------------------------------
>
>                 Key: NUTCH-1685
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1685
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.7, 2.2.1
>         Environment: Java 7, OpenJDK 64-Bit, 1.7.0_25
>            Reporter: Sebastian Nagel
>             Fix For: 2.3, 1.8
>
>         Attachments: NUTCH-1685-2x-test.patch
>
>
> URLUtil.toUNICODE() fails on IDNs and returns null instead of the Unicode 
> URL. The constructor of URI apparently does not accept host names in 
> Unicode form. For {{http://www.xn--evir-zoa.com/}} the URI constructor 
> throws the exception:
> {code}
> java.net.URISyntaxException: Illegal character in hostname at index 11: 
> http://www.çevir.com/
> {code}
> In principle, IDN.toUnicode() can convert whole URLs (not only domain or 
> host names). However, it does not convert URLs whose host name consists of 
> only two labels, e.g. {{http://xn--uni-tbingen-xhb.de/}}. Is that the 
> reason why we need URLUtil.toUNICODE()?
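
For illustration, a minimal sketch of the behaviour described above. It is
not the Nutch code or the attached patch; the class IdnSketch and the helper
hostToUnicode() are hypothetical names. It assumes the cause is that
IDN.toUnicode() decodes each dot-separated label on its own, so in a
two-label host the leading "http://" ends up in the same label as the
"xn--" prefix and that label is left untouched. Passing only the host part
to IDN.toUnicode() avoids this:

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URL;

class IdnSketch {

    // Hypothetical helper (not Nutch's URLUtil): parse the ASCII URL with
    // java.net.URL, convert only the host to Unicode, and reassemble the
    // URL as a plain string so that no URI constructor sees the Unicode host.
    static String hostToUnicode(String url) {
        URL u;
        try {
            u = new URL(url);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
        StringBuilder sb = new StringBuilder();
        sb.append(u.getProtocol()).append("://");
        if (u.getUserInfo() != null) {
            sb.append(u.getUserInfo()).append('@');
        }
        sb.append(IDN.toUnicode(u.getHost()));
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        sb.append(u.getFile()); // path plus query, if any
        if (u.getRef() != null) {
            sb.append('#').append(u.getRef());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String threeLabels = "http://www.xn--evir-zoa.com/";
        String twoLabels = "http://xn--uni-tbingen-xhb.de/";

        // Whole URL passed to IDN.toUnicode(): only labels that start
        // with "xn--" are decoded.
        System.out.println(IDN.toUnicode(threeLabels));
        System.out.println(IDN.toUnicode(twoLabels));

        // Host-only conversion handles both cases.
        System.out.println(hostToUnicode(threeLabels));
        System.out.println(hostToUnicode(twoLabels));
    }
}
```

The sketch rebuilds the URL by string concatenation on purpose, because
constructing a java.net.URI from the Unicode form is what raises the
URISyntaxException quoted above.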



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
