Hello,

I work in security at Google, and I'm working on a differential fuzzer that
compares libidn2 with the Python idna package. (Essentially, it is a program
that rapidly feeds the same inputs to libidn2 and to Python idna and checks
that both produce the same result.) I wrote it to find bugs in the Python
idna package, but I believe I've found three bugs in libidn2 instead, which
I'd like to report here.
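
A minimal sketch of the differential idea (an illustration only, not the
fuzzer described above): two encoders receive identical random inputs and
every disagreement is recorded. The names `Encoder`, `random_domain` and
`differential_fuzz` are hypothetical, and each encoder is assumed to return
the ASCII form of the name, or `None` when it rejects the input.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical adapter type: returns the ASCII (punycode) form of a name,
# or None when the wrapped library rejects the input.
Encoder = Callable[[str], Optional[str]]


def random_domain(rng: random.Random, max_labels: int = 3, max_len: int = 10) -> str:
    """Build a random dot-separated name from non-ASCII BMP code points."""
    labels = []
    for _ in range(rng.randint(1, max_labels)):
        length = rng.randint(1, max_len)
        # U+00A0..U+D7FF skips ASCII (so no stray '.') and the surrogate range.
        labels.append("".join(chr(rng.randint(0x00A0, 0xD7FF)) for _ in range(length)))
    return ".".join(labels)


def differential_fuzz(
    enc_a: Encoder, enc_b: Encoder, iterations: int, seed: int = 0
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Feed identical inputs to both encoders and collect every disagreement."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(iterations):
        name = random_domain(rng)
        a, b = enc_a(name), enc_b(name)
        if a != b:
            mismatches.append((name, a, b))
    return mismatches
```

In a real harness, one adapter might wrap `idna.encode` (returning `None`
when it raises `idna.IDNAError`) and the other a binding to libidn2's
`idn2_to_ascii_8z`; both adapters are assumptions and are not shown.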

In all three cases, libidn2 refuses to encode the given domain name and
returns an error, while Python idna encodes it without complaint. Moreover,
in all three cases libidn2 happily *decodes* the punycode generated by
Python idna back into the very input that it refuses to encode.
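
The round-trip property described above can be illustrated with Python's
built-in `punycode` codec. This is only a sketch: that codec implements raw
RFC 3492 Punycode and none of the IDNA2008 checks (mapping, contextual
rules, Bidi Rule) where the two libraries disagree, and the helper names
`to_alabel`, `to_ulabel` and `round_trips` are hypothetical.

```python
def to_alabel(ulabel: str) -> str:
    """Punycode-encode one label (raw RFC 3492, no IDNA validity checks)."""
    if ulabel.isascii():
        return ulabel
    return "xn--" + ulabel.encode("punycode").decode("ascii")


def to_ulabel(alabel: str) -> str:
    """Strip the ACE prefix and Punycode-decode; other labels pass through."""
    if alabel.lower().startswith("xn--"):
        return alabel[4:].encode("ascii").decode("punycode")
    return alabel


def round_trips(name: str) -> bool:
    """True if encoding and then decoding every label reproduces the input."""
    encoded = ".".join(to_alabel(label) for label in name.split("."))
    decoded = ".".join(to_ulabel(label) for label in encoded.split("."))
    return decoded == name
```

For example, `to_alabel("\u0e90")` gives `xn--46c`, matching the third
input below.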

The first input causes libidn2 to report the error "domain name longer than
255 characters". However, the punycode form of this domain name is only 146
characters long.

   - Domain name:

髦暩晦晦晦獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳筳獳싂.퐀쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓼쓄쓄쓄쓄쓄쓄쓄쓄쓄㻄쓄쓄럄䄀싂.뼀猀獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳ⱁ㩁
   - Domain name hex codepoints:

   ['9ae6', '66a9', '6666', '6666', '6666', '7373', '7373', '7373', '7373',
   '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
   '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
   '7373', '7373', '7373', '7373', '7373', '7b73', '7373', 'c2c2', '2e',
   'd400', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
   'c4c4', 'c4c4', 'c4c4', 'c4fc', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
   'c4c4', 'c4c4', 'c4c4', 'c4c4', '3ec4', 'c4c4', 'c4c4', 'b7c4', '4100',
   'c2c2', '2e', 'bf00', '7300', '7373', '7373', '7373', '7373', '7373',
   '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
   '7373', '7373', '7373', '7373', '2c41', '3a41']
   - Punycode:

xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
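
For comparison, a sketch of the RFC 1035 length limits, assuming that is
the limit libidn2's message refers to: labels are at most 63 octets and the
whole name at most 255 octets in wire format (one length octet per label,
the label bytes, and a final root octet). By that measure a 146-character
ASCII name with three labels is 148 octets on the wire. The helper names
are hypothetical.

```python
from typing import List

MAX_LABEL_OCTETS = 63   # RFC 1035, section 2.3.4
MAX_NAME_OCTETS = 255   # RFC 1035, section 2.3.4 (wire format)


def wire_length(ascii_name: str) -> int:
    """Octets on the wire: a length octet plus the bytes of each label,
    plus one zero octet for the root label."""
    labels = ascii_name.rstrip(".").split(".")
    return sum(1 + len(label) for label in labels) + 1


def length_problems(ascii_name: str) -> List[str]:
    """Return a description of every RFC 1035 length limit the name exceeds."""
    problems = []
    for label in ascii_name.rstrip(".").split("."):
        if len(label) > MAX_LABEL_OCTETS:
            problems.append(f"label {label!r} is {len(label)} octets")
    total = wire_length(ascii_name)
    if total > MAX_NAME_OCTETS:
        problems.append(f"name is {total} octets in wire format")
    return problems
```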


The second input causes libidn2's encoder to report the error "string has
forbidden bi-directional properties". To determine which library was wrong,
I implemented the Bidi Rule (RFC 5893) myself, and I believe this name
should be valid.

   - Domain name:

    ਗ਼.ÿ߽̃̃̃
   - Domain name hex codepoints:

   ['a17', 'a3c', '2e', 'ff', '7fd', '303', '303', '303']
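
A minimal sketch of the RFC 5893 Bidi Rule (not the implementation
mentioned above), using Python's `unicodedata.bidirectional`. Results
depend on the Unicode version of the runtime's database
(`unicodedata.unidata_version`); code points it does not know get an empty
class, which this sketch treats as disallowed. The function names are
hypothetical. With Unicode 11.0 or later data, U+07FD NKO DANTAYALAN is a
nonspacing mark (NSM), so both labels of the second input are LTR labels:
the name is not a Bidi domain name, and each label would satisfy rules 5
and 6 even if checked.

```python
import unicodedata
from typing import List

# Bidi classes permitted by RFC 5893, section 2.
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> List[str]:
    # Unassigned code points (in this database) come back as "".
    return [unicodedata.bidirectional(ch) for ch in label]


def is_rtl_label(label: str) -> bool:
    """RFC 5893: an RTL label contains at least one R, AL or AN character."""
    return any(c in ("R", "AL", "AN") for c in bidi_classes(label))


def satisfies_bidi_rule(label: str) -> bool:
    """Check one U-label against rules 1-6 of the RFC 5893 Bidi Rule."""
    classes = bidi_classes(label)
    # Rule 1: the first character must be L, R or AL.
    if not classes or classes[0] not in ("L", "R", "AL"):
        return False
    # Rules 3 and 6 look at the last character that is not an NSM.
    last = next(c for c in reversed(classes) if c != "NSM")
    if is_rtl_label(label):
        # Rule 2: only RTL-compatible classes are allowed.
        if any(c not in RTL_ALLOWED for c in classes):
            return False
        # Rule 3: end with R, AL, EN or AN, followed by zero or more NSMs.
        if last not in ("R", "AL", "EN", "AN"):
            return False
        # Rule 4: EN and AN must not both appear.
        return not ("EN" in classes and "AN" in classes)
    # Rules 5 and 6: LTR label, ending with L or EN plus optional NSMs.
    return all(c in LTR_ALLOWED for c in classes) and last in ("L", "EN")


def domain_passes_bidi(labels: List[str]) -> bool:
    """The rule only applies to a Bidi domain name (one with an RTL label)."""
    if not any(is_rtl_label(label) for label in labels):
        return True
    return all(satisfies_bidi_rule(label) for label in labels)
```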


The third input causes libidn2 to report a disallowed character. This
appears not to be a bug in the code as such, but rather a symptom of
out-of-date Unicode tables in libidn2: the offending character
<https://www.fileformat.info/info/unicode/char/0e90/index.htm> was only
added to Unicode in 2019.

   - Domain name:

   ຐ.xyz
   - Domain name hex codepoints:

   ['e90', '2e', '78', '79', '7a']
   - Punycode:

   xn--46c.xyz
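
One way to check whether a given Unicode database knows a code point is to
look at its general category, since unassigned code points are reported as
`Cn`. The sketch below uses Python's `unicodedata` purely as an example
database; `is_assigned` and `describe` are hypothetical helpers, and
libidn2's own tables are not queried here.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """True if this runtime's Unicode database assigns the code point;
    unassigned code points have general category 'Cn'."""
    return unicodedata.category(ch) != "Cn"


def describe(ch: str) -> str:
    """Code point, character name (if known) and database version."""
    name = unicodedata.name(ch, "<unassigned in this database>")
    return f"U+{ord(ch):04X} {name} (Unicode data {unicodedata.unidata_version})"
```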
