Edit report at https://bugs.php.net/bug.php?id=48147&edit=1
 ID:                 48147
 Updated by:         ezy...@php.net
 Reported by:        kulakov74 at yandex dot ru
 Summary:            iconv with //IGNORE cuts the string
-Status:             Bogus
+Status:             Re-Opened
 Type:               Bug
 Package:            ICONV related
 Operating System:   Linux
 PHP Version:        5.*, 6CVS (2009-05-05)
 Block user comment: N
 Private report:     N

 New Comment:

I think I understand how to fix this bug, without modifying glibc. We
need to modify our invocation of iconv in order to mirror the behavior
of iconv_prog.c:process_block() when the '-c' flag is set (if we mimic
the code closely enough, we also get an extra bonus of sensible block
processing behavior, which is better than the horrible over-allocation
iconv does right now). In particular, we need to handle the EILSEQ
error code correctly.


Previous Comments:
------------------------------------------------------------------------
[2011-12-18 22:34:38] ezy...@php.net

Upstream bugs:

http://sources.redhat.com/bugzilla/show_bug.cgi?id=13517
http://sources.redhat.com/bugzilla/show_bug.cgi?id=13518

------------------------------------------------------------------------
[2011-12-18 19:37:53] ezy...@php.net

Not broken in latest version of libiconv

ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n --version
iconv (GNU libiconv 1.14)
Copyright (C) 2000-2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.
ezyang@javelin:~/Desktop/libiconv-1.14/src$ ./iconv_no_i18n -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
15312
ezyang@javelin:~/Desktop/libiconv-1.14/src$ iconv -f utf-8 -t iso-8859-1//IGNORE ~/iconv.html | wc -c
iconv: illegal input sequence at position 8168
8157

------------------------------------------------------------------------
[2009-05-07 13:58:21] j...@php.net

We still can't fix bugs in glibc iconv implementation.
Try this on the command line and you get the same results:

# iconv -f utf-8 -t iso-8859-1 iconv.html > /dev/null
iconv: illegal input sequence at position 3589
# iconv -f utf-8 -t iso-8859-1//IGNORE iconv.html > /dev/null
iconv: illegal input sequence at position 8168

------------------------------------------------------------------------
[2009-05-07 07:50:52] lbarn...@php.net

Marked it as verified as I got exactly the same results:

The first iconv() call (the one without //IGNORE) fails on the ellipsis
character "…" (value="Search…"), which can't be represented in
ISO-8859-1.

The second iconv() call (the one with //IGNORE) fails later (so the
ellipsis is ignored, which may mean that the //IGNORE flag is
supported), and there is no apparent reason for failing at offset 8157
(only regular ASCII chars around).

------------------------------------------------------------------------
[2009-05-06 18:36:10] j...@php.net

Arnaud: Please don't reopen bogus bugs without explanation.

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=48147
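
For illustration, the idea in the new comment (convert block by block and treat EILSEQ as recoverable instead of aborting) could be sketched roughly as below. This is only a sketch under assumptions: it is not the code in iconv_prog.c and not the eventual PHP patch, and the helper name convert_skipping_invalid, its signature, and the 64-byte block size are all hypothetical. It also drops a rejected input byte by hand when iconv() makes no progress, so it does not depend on //IGNORE being supported.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption, not PHP's or glibc's code): converts
 * `in` into `out` one small block at a time, dropping input bytes that
 * iconv() rejects with EILSEQ. Returns the number of bytes written to `out`,
 * or (size_t)-1 on a hard failure (bad charset name, `out` too small, ...). */
static size_t convert_skipping_invalid(const char *tocode, const char *fromcode,
                                       const char *in, size_t in_len,
                                       char *out, size_t out_cap)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char block[64];             /* fixed-size output block, no over-allocation */
    char *src = (char *)in;     /* iconv() takes char **, but only reads input */
    size_t src_left = in_len;
    size_t written = 0;
    int failed = 0;

    for (;;) {
        char *dst = block;
        size_t dst_left = sizeof block;
        char *before = src;
        /* Once the input is exhausted, a NULL input flushes any shift state. */
        int flushing = (src_left == 0);
        size_t n = flushing ? iconv(cd, NULL, NULL, &dst, &dst_left)
                            : iconv(cd, &src, &src_left, &dst, &dst_left);
        int err = (n == (size_t)-1) ? errno : 0;

        /* Keep whatever was converted before the call stopped. */
        size_t produced = (size_t)(dst - block);
        if (produced > out_cap - written) {
            failed = 1;
            break;
        }
        memcpy(out + written, block, produced);
        written += produced;

        if (flushing) {
            failed = (err != 0);
            break;
        }
        if (err == 0 || err == E2BIG)
            continue;           /* all input consumed, or block full: go again */
        if (err == EILSEQ) {
            /* Not fatal: if the call made no progress at all, drop one input
             * byte by hand; otherwise just call iconv() again. */
            if (src == before && produced == 0) {
                src++;
                src_left--;
            }
            continue;
        }
        if (err == EINVAL) {    /* truncated sequence at end of input */
            src_left = 0;       /* drop the tail, then flush */
            continue;
        }
        failed = 1;             /* any other errno is treated as fatal */
        break;
    }

    iconv_close(cd);
    return failed ? (size_t)-1 : written;
}
```

A caller would pass something like ("ISO-8859-1", "UTF-8") together with an output buffer at least as large as the input. Because the loop only skips a byte when a call makes no progress, it should also tolerate a converter that reports EILSEQ after having already skipped the offending character itself, which is how the //IGNORE case is described in this report.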