On 07/20/2016 07:04 AM, Eric Blake wrote:
> On 07/20/2016 06:21 AM, Pádraig Brady wrote:
>
>> It's worth considering having a separate (already existing?) util
>> to fix data before processing. That could have options to:
>> drop invalid chars, replace with replacement char,
>> apply various http://unicode.org/reports/tr15/#Norm_Forms,
>> convert enclosed forms like ㊷ to 42 etc.
>> I.E. we should avoid complicating each util where possible,
>> and at least avoid having options on each util that could be
>> hoisted to a more general util like above.
>>
>> Silently dropping invalid characters probably isn't a great idea,
>> and warnings to stderr is a bit messy and could be seen to contradict
>> POSIX which suggests exiting with failure if anything output to stderr.
>> A compromise might be to just replace invalid chars with
>> the replacement character � and then include that in
>> normal character processing, to make issues in input apparent.
>
> Since there are several plausible error-handling methods (silently
> discard invalid input, flag input as invalid with an error and no
> further output, convert invalid input into replacement character and
> proceed with output), all of which can be considered desirable in some
> circumstances, I wonder if we should give ALL utilities a common
> --encoding-error=POLICY option that allows runtime selection between the
> three policies, and/or an environment variable that selects the default
> policy in absence of a command line choice.
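The three policies in the quoted message (discard, fail, replace) correspond closely to the error handlers Python's `bytes.decode()` already offers (`'ignore'`, `'strict'`, `'replace'`), so they make a convenient way to illustrate the proposal. This is only a sketch: the policy names `discard`/`fail`/`replace`, the `ENCODING_ERROR_POLICY` environment variable, and the choice of `replace` as the fallback default are hypothetical and are not an existing coreutils interface.

```python
import os

# Hypothetical policy names mapped onto Python's built-in codec error handlers.
POLICY_TO_HANDLER = {
    "discard": "ignore",   # silently drop invalid bytes
    "fail": "strict",      # raise an error; no decoded output is produced
    "replace": "replace",  # substitute U+FFFD and keep processing
}


def effective_policy(cli_policy=None, env=None):
    """Pick a policy: the command-line choice first, then a hypothetical
    environment variable, then 'replace' as an assumed default."""
    if env is None:
        env = os.environ
    if cli_policy is not None:
        return cli_policy
    return env.get("ENCODING_ERROR_POLICY", "replace")


def decode_with_policy(data, policy):
    """Decode UTF-8 bytes according to the chosen error policy."""
    try:
        handler = POLICY_TO_HANDLER[policy]
    except KeyError:
        raise ValueError(f"unknown encoding-error policy: {policy!r}") from None
    # With the 'strict' handler, invalid input raises UnicodeDecodeError.
    return data.decode("utf-8", errors=handler)


if __name__ == "__main__":
    sample = b"abc\xffdef"  # 0xFF never appears in valid UTF-8
    print(decode_with_policy(sample, "discard"))  # abcdef
    print(decode_with_policy(sample, "replace"))  # abc�def (U+FFFD in place of 0xFF)
    try:
        decode_with_policy(sample, "fail")
    except UnicodeDecodeError as err:
        print(f"invalid input at byte offset {err.start}")  # offset 3
```

Resolving the policy in one place, with the command line overriding the environment, is what would let every utility honor the same setting without each one growing its own ad hoc option.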

Interestingly enough, today's POSIX phone call started discussions on
how iconv() needs to be enhanced to support multiple error-handling
modes:
http://austingroupbugs.net/bug_view_page.php?bug_id=1007

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
