Bug#1032173: identity recoding is too identical

Zefram Tue, 28 Feb 2023 17:33:16 -0800

Package: recode
Version: 3.6-24
Severity: normal

recode(1) usually checks that its input conforms to the specified input
encoding, and signals an error if it doesn't:


$ echo $'L\xe9on' | recode utf8..utf7
Lrecode: Invalid input in step `UTF-8..UNICODE-1-1-UTF-7'

But if the output encoding happens to be the same as the input encoding
then this checking doesn't happen, and invalid output can be produced:

$ echo $'L\xe9on' | recode utf8..utf8 | od -tc
0000000   L 351   o   n  \n
0000005

An invocation with the same encoding on both sides superficially looks
like a request for an identity transformation, and on correctly encoded
input it would indeed behave as one.  But because recode(1) usually
checks its input, this kind of invocation would be useful as something
that copies its input while checking the encoding.  Apparently it is
being optimised incorrectly, to a pure identity transformation without
the checking.
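
For illustration, the "copy while checking" behaviour described above
can be sketched without recode(1).  This is only a rough sketch, not
anything recode does internally: it assumes an iconv(1) (such as the
one shipped with glibc) that rejects invalid byte sequences even when
the source and target encodings are the same, and the function name
checked_utf8_copy is made up for this example.

```shell
# Hypothetical helper: copy stdin to stdout only if it is valid UTF-8.
# Assumption: iconv(1) exits non-zero on invalid UTF-8 input even when
# converting UTF-8 to UTF-8 (true of glibc iconv; others may differ).
checked_utf8_copy() {
    cuc_tmp=$(mktemp) || return 1
    # Buffer the input so that nothing is written if validation fails.
    cat > "$cuc_tmp"
    if iconv -f UTF-8 -t UTF-8 < "$cuc_tmp" > /dev/null 2>&1; then
        cat "$cuc_tmp"
        cuc_status=0
    else
        echo "checked_utf8_copy: invalid UTF-8 input" >&2
        cuc_status=1
    fi
    rm -f "$cuc_tmp"
    return "$cuc_status"
}

# Usage: valid input is copied through; invalid input is refused.
printf 'L\303\251on\n' | checked_utf8_copy | od -tc
printf 'L\351on\n' | checked_utf8_copy || echo "rejected" >&2
```

Unlike the utf8..utf7 invocation above, which emits the leading "L"
before failing, this sketch buffers its input in a temporary file so
that no partial output is produced for invalid input.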

-zefram
