Thu Aug 16 05:11:13 2018: Request 126280 was acted upon.
Transaction: Correspondence added by RSCHUPP
Queue: PAR-Packer
Subject: 90-rt122949.t fails when "Use Unicode UTF-8 for worldwide
language support" is enabled
Broken in: (no value)
Severity: (no value)
Owner: Nobody
Requestors: [email protected]
Status: new
Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=126280 >
On 2018-08-15 19:56:57, XENU wrote:
> "\357\277\275" is a REPLACEMENT CHARACTER. It seems that when the UTF-
> 8 checkbox is enabled, bytes that aren't valid UTF-8 are being
> replaced with that character. "\x{85}" obviously isn't a valid UTF-8
> character.
Nope, "\x{85}" is a valid Unicode code point (there's no such thing as a
"UTF-8 character"), cf. http://www.unicode.org/charts/PDF/U0080.pdf
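To make the byte-level picture concrete, here is a small C sketch. It is not taken from PAR::Packer, and `utf8_encode` and `utf8_is_lead_byte` are hypothetical helper names. It illustrates three things: code point U+0085 encodes in UTF-8 as the two bytes C2 85, a lone byte 0x85 is a continuation byte and cannot stand alone in UTF-8, and U+FFFD (REPLACEMENT CHARACTER) encodes as EF BF BD, which is the "\357\277\275" quoted above. Whether the failing test really hands a lone 0x85 byte to Windows is an assumption here, not something this sketch proves.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes required).
 * Returns the number of bytes written, or 0 if cp cannot be encoded
 * (surrogate, or beyond U+10FFFF). Hypothetical helper for illustration. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Returns 1 if byte b may start a UTF-8 sequence, 0 otherwise.
 * Bytes 0x80..0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5..0xFF
 * never appear in well-formed UTF-8. */
int utf8_is_lead_byte(unsigned char b)
{
    return b < 0x80 || (b >= 0xC2 && b <= 0xF4);
}
```

With these helpers, `utf8_encode(0x85, buf)` writes two bytes, while `utf8_is_lead_byte(0x85)` is false, so both statements in the exchange above can hold at once: the code point is legal, but the single byte is not well-formed UTF-8.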
For background information, we're in a murky Windows area here:
when you call the C-level function (somewhere in the guts of PAR::Packer)
spawnvp(P_WAIT, "some.exe", argv)
you have to actually manipulate the strings in argv[] so that some.exe
actually sees the original argv in its
main(argc, argv)
The most obvious gotcha is when some argv[i] contains blanks, e.g.
"foo bar quux", which, if passed unmodified, will arrive at some.exe as
*three* separate elements of argv[]: "foo", "bar", "quux".
See Win32::ShellQuote for details; that's where I stole most of the test
cases from.
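As a rough illustration of the kind of string manipulation meant here, the following C sketch quotes a single argument according to the commonly documented parsing rules of the Microsoft C runtime: wrap the argument in double quotes, escape embedded double quotes, and double any backslashes that directly precede a double quote, including the closing one. `quote_arg` is a hypothetical name; this is not the code used in PAR::Packer, and it does not attempt to handle cmd.exe metacharacters or the encoding issue discussed in this ticket.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly malloc'ed copy of arg, quoted so that a program using the
 * Microsoft C runtime's argv parsing should see it as one argv[] element.
 * Returns NULL if allocation fails. Hypothetical helper for illustration. */
char *quote_arg(const char *arg)
{
    size_t len = strlen(arg);
    /* Worst case every input char becomes two, plus 2 quotes and a NUL. */
    char *out = malloc(2 * len + 3);
    char *p = out;
    size_t backslashes = 0;   /* length of the pending run of backslashes */
    size_t i;

    if (out == NULL)
        return NULL;

    *p++ = '"';
    for (; *arg != '\0'; arg++) {
        if (*arg == '\\') {
            backslashes++;    /* defer: meaning depends on what follows */
            continue;
        }
        if (*arg == '"') {
            /* n backslashes before a quote: emit 2n + 1, then the quote */
            for (i = 0; i < 2 * backslashes + 1; i++)
                *p++ = '\\';
        } else {
            /* backslashes not followed by a quote are literal */
            for (i = 0; i < backslashes; i++)
                *p++ = '\\';
        }
        *p++ = *arg;
        backslashes = 0;
    }
    /* Trailing backslashes would escape the closing quote, so double them. */
    for (i = 0; i < 2 * backslashes; i++)
        *p++ = '\\';
    *p++ = '"';
    *p = '\0';
    return out;
}
```

Under these rules, `quote_arg("foo bar quux")` yields `"foo bar quux"` including the surrounding double quotes, so the blanks no longer split the argument. The caller is responsible for calling `free()` on the result.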
Anyway, a 100% solution is probably not possible and "\x{85}", while legal
Unicode, isn't a very relevant test case - it's a control char ("NEXT LINE").
So there may be a reason why Microsoft treats it differently under
"Use Unicode UTF-8 for worldwide language support".
Let's replace this test case with some more relevant cases that use strings
with non-ASCII chars:
[ qq[german umlaute \x{E4}\x{F6}\x{FC}] ],
[ qq[chinese zhongwen \x{4E2D}\x{6587}] ],
Can you rerun the failing test with these modifications under "Use Unicode..."?
Cheers, Roderich