Bug in documentation for Encode::decode_utf8 ?

2013-09-05 Thread William Blunn

The documentation for Encode::decode_utf8 begins:

$string = decode_utf8($octets [, CHECK]);

Equivalent to $string = decode("utf8", $octets [, CHECK]).

So what should the following one-liner emit?

perl -E 'use charnames ":full"; use Encode; my $x = "\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}\N{POUND SIGN}"; say decode_utf8($x) eq decode("utf8", $x) ? "Fine" : "WTF?"'

"Fine\n"?

On my system it emits "WTF?\n".

It seems that decode_utf8(...) is a no-op if the input string has the
UTF8 flag on, but decode("utf8", ...) will always try to decode
regardless of the state of the UTF8 flag.


But the documentation says that they are equivalent.

So the documentation would appear to be at odds with the behaviour.

Regards,

Bill
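A minimal Perl sketch of what Bill observed, assuming only core Perl and the Encode module. The string in the one-liner holds the characters U+00C2 U+00A3, which have the same values as the two UTF-8 octets of the pound sign. decode("utf8", ...) decodes those values as octets whatever the flag says, whereas a shortcut that returns flagged input untouched leaves both characters in place. The variable names are illustrative only, and utf8::upgrade is used here to make the flag state explicit rather than relying on \N{...} to set it.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Same two characters as "\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}\N{POUND SIGN}".
my $x = "\x{C2}\x{A3}";
utf8::upgrade($x);    # force the internal UTF8 flag on, matching the one-liner's string

# decode() ignores the flag and treats the code points C2 A3 as UTF-8 octets.
my $decoded = decode('utf8', $x);

# What a "return early if already flagged" shortcut would hand back.
my $skipped = is_utf8($x) ? $x : decode('utf8', $x);

printf "flag on: %s\n", is_utf8($x) ? 'yes' : 'no';
printf "decode:  %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $decoded;
printf "skipped: %s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $skipped;
```

This should print "flag on: yes", then U+00A3 for decode and U+00C2 U+00A3 for the shortcut, which is exactly the mismatch that makes the one-liner say "WTF?".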


Re: Bug in documentation for Encode::decode_utf8 ?

2013-09-05 Thread Jason Clifford

On 2013-09-05 09:31, William Blunn wrote:

> It seems that decode_utf8(...) is a no-op if the input string has the
> UTF8 flag on, but decode("utf8", ...) will always try to decode
> regardless of the state of the UTF8 flag.
>
> But the documentation says that they are equivalent.
>
> So the documentation would appear to be at odds with the behaviour.


Yes. Looking at the code, decode_utf8 has return $_ if is_utf8($_) as
its first line, which decode does not.
decode_utf8 also lacks a check that find_encoding('utf8') succeeded
before using it; however, if that ever causes a problem, there are far
bigger problems. Another difference is that decode_utf8 appends '' only
if $_ is a reference, while decode() appends '' unconditionally.




Re: Bug in documentation for Encode::decode_utf8 ?

2013-09-05 Thread Mark Fowler
On Thursday, 5 September 2013 at 04:31, William Blunn wrote:
> But the documentation says that they are equivalent.
>
> So the documentation would appear to be at odds with the behaviour.

Yeah, this is a bug. This one:

https://rt.cpan.org/Public/Bug/Display.html?id=87267

This was actually fixed in Encode 2.53, which was released a week ago today:

https://metacpan.org/source/DANKOGAI/Encode-2.54/Changes

Now decode_utf8 does exactly what the documentation says it does; the two
are now equivalent.

Mark.
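To check Mark's point on a particular installation, a sketch like the following can be used. It assumes, per this thread, that 2.53 is the first Encode release containing the fix; the version number on the use line makes Perl refuse to compile the script against anything older. The input is the same flagged string as in Bill's one-liner.

```perl
use strict;
use warnings;
# Fails at compile time if the installed Encode is older than 2.53.
use Encode 2.53 qw(decode decode_utf8);

my $x = "\x{C2}\x{A3}";
utf8::upgrade($x);    # UTF8 flag on, as in the original one-liner

my $via_utf8   = decode_utf8($x);
my $via_decode = decode('utf8', $x);
print $via_utf8 eq $via_decode ? "Fine\n" : "WTF?\n";
```

On a fixed Encode both calls decode the two characters as UTF-8 octets, so the script should print "Fine".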