On 07/20/2016 07:04 AM, Eric Blake wrote:
> On 07/20/2016 06:21 AM, Pádraig Brady wrote:
> 
>> It's worth considering having a separate (already existing?) util
>> to fix data before processing. That could have options to:
>>   drop invalid chars, replace with replacement char,
>>   apply various http://unicode.org/reports/tr15/#Norm_Forms,
>>   convert enclosed forms like ㊷ to 42 etc.
>> I.e., we should avoid complicating each util where possible,
>> and at least avoid having options on each util that could be
>> hoisted to a more general util like the one above.
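A minimal Python sketch of such a pre-processing filter, assuming UTF-8 input. The function name `sanitize` is hypothetical, not an existing utility. It relies only on the standard library: the `replace` codec error handler for invalid bytes, and `unicodedata.normalize`, which implements the normalization forms (NFC, NFD, NFKC, NFKD) described in the UAX #15 report linked above. NFKC is the form that folds enclosed forms such as ㊷ into plain digits.

```python
import unicodedata


def sanitize(data: bytes, form: str = "NFKC") -> str:
    """Hypothetical filter: decode UTF-8, replacing each invalid sequence
    with U+FFFD, then apply the requested Unicode normalization form."""
    text = data.decode("utf-8", errors="replace")
    return unicodedata.normalize(form, text)


if __name__ == "__main__":
    # U+32B7 CIRCLED NUMBER FORTY TWO has a compatibility decomposition
    # to the digits "4" "2", so NFKC folds it to "42".
    print(sanitize("\u32b7".encode("utf-8")))
    # A stray 0xFF byte is never valid UTF-8; it becomes U+FFFD.
    print(repr(sanitize(b"ok\xff")))
```

Running the other utilities on the output of a filter like this keeps normalization and repair in one place instead of adding options to each util.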
>>
>> Silently dropping invalid characters probably isn't a great idea,
>> and warnings to stderr are a bit messy and could be seen to contradict
>> POSIX, which suggests exiting with failure if anything is output to stderr.
>> A compromise might be to just replace invalid chars with
>> the replacement character � and then include that in
>> normal character processing, to make issues in input apparent.
> 
> Since there are several plausible error-handling methods (silently
> discard invalid input, flag input as invalid with an error and no
> further output, convert invalid input into replacement character and
> proceed with output), all of which can be considered desirable in some
> circumstances, I wonder if we should give ALL utilities a common
> --encoding-error=POLICY option that allows runtime selection between the
> three policies, and/or an environment variable that selects the default
> policy in absence of a command line choice.
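The three policies correspond closely to error handlers that Python's codecs already provide: `strict` (fail), `ignore` (discard) and `replace` (substitute U+FFFD). The sketch below assumes UTF-8 input; the policy names, the `ENCODING_ERROR` environment variable and both helper functions are hypothetical illustrations of the proposed `--encoding-error=POLICY` option, not an existing interface.

```python
import sys
from typing import Mapping, Optional

# Hypothetical policy names for the proposed --encoding-error=POLICY,
# mapped onto Python's built-in codec error handlers.
POLICIES = {
    "discard": "ignore",   # silently drop invalid bytes
    "fail": "strict",      # raise UnicodeDecodeError; caller stops output
    "replace": "replace",  # substitute U+FFFD and keep processing
}

# Hypothetical environment variable; the thread does not name one.
ENV_VAR = "ENCODING_ERROR"


def resolve_policy(cli_value: Optional[str], env: Mapping[str, str]) -> str:
    """Command-line choice wins, then the environment, then 'replace'
    (an assumed default following the compromise suggested above)."""
    policy = cli_value or env.get(ENV_VAR) or "replace"
    if policy not in POLICIES:
        raise ValueError(f"unknown encoding-error policy: {policy!r}")
    return policy


def decode_with_policy(data: bytes, policy: str) -> str:
    """Decode UTF-8 input according to the selected policy."""
    return data.decode("utf-8", errors=POLICIES[policy])


if __name__ == "__main__":
    sample = b"caf\xe9 42"  # a lone 0xE9 byte is not valid UTF-8
    for name in POLICIES:
        try:
            print(name, repr(decode_with_policy(sample, name)))
        except UnicodeDecodeError as err:
            print(name, "error:", err, file=sys.stderr)
```

A real program would pass `os.environ` as `env`; keeping it a parameter makes the precedence rule easy to test in isolation.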

Interestingly enough, today's POSIX phone call started discussions on
how iconv() needs to be enhanced to support multiple error handling modes:

http://austingroupbugs.net/bug_view_page.php?bug_id=1007


-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
