In perl.git, the branch smoke-me/khw-encode has been created
<http://perl5.git.perl.org/perl.git/commitdiff/ee7e2e356d9c0badaa71d9a7c296110fdc3523dc?hp=0000000000000000000000000000000000000000>
at ee7e2e356d9c0badaa71d9a7c296110fdc3523dc (commit)
- Log -----------------------------------------------------------------
commit ee7e2e356d9c0badaa71d9a7c296110fdc3523dc
Author: Karl Williamson <[email protected]>
Date: Thu Sep 29 11:51:41 2016 -0600
APItest/t/utf8.t: Skip some tests if major one fails
If the patched test fails, the subsequent ones in the loop are
meaningless, so don't execute them.
M ext/XS-APItest/t/utf8.t
commit 450388809c39d7712cf5a7d9474afde480bdab94
Author: Karl Williamson <[email protected]>
Date: Thu Sep 29 11:50:51 2016 -0600
APItest/t/utf8.t: Fix typo
This was a typo in the UTF-EBCDIC representation of a code point, so it
affected only tests on that platform.
M ext/XS-APItest/t/utf8.t
commit 6fdfff5cf77e9d828ab2c98c7e472f6e16904a51
Author: Karl Williamson <[email protected]>
Date: Wed Sep 28 15:05:17 2016 -0600
Add details to UTF-8 malformation error messages
I've long been unsatisfied with the information contained in the
error/warning messages raised when some input is malformed UTF-8, but
have been reluctant to change the text in case someone is relying on
it. One reason that someone might be parsing the messages is that there
has been no convenient way to otherwise pin down what the exact
malformation might be. A couple of commits from now will add a facility
to get the type of malformation unambiguously. This will be a better
mechanism to use for those rare modules that need to know what the
exact malformation is.
So, I will fix and issue pull requests for any module broken by this
commit.
The messages now dump (in \xXY format) the bytes that make up the
malformed character, and extra details are added in most cases.
Messages about overlongs now display the code point they evaluate to and
what the shortest UTF-8 sequence for generating that code point is.
Messages about overflowing now just display that it overflows, since the
entire byte sequence is now dumped. The previous message displayed just
the byte which was being processed where overflow was detected, but that
is not helpful at all.
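To illustrate the kind of information the new messages carry, here is a
minimal C sketch. It is not the code in utf8.c. The helper names
(dump_bytes, shortest_utf8_len, decode_2byte) are hypothetical, the
lowercase hex digits are an assumption, and only standard UTF-8 up to
U+10FFFF is handled (neither Perl's extended UTF-8 nor UTF-EBCDIC).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: render s[0..len) as "\xXY" per byte into out.
 * Returns the number of characters written (excluding the NUL), or -1
 * if out is too small.  Lowercase hex digits are an assumption. */
static int dump_bytes(const unsigned char *s, size_t len,
                      char *out, size_t outsize)
{
    size_t i;

    if (outsize < 4 * len + 1)
        return -1;
    for (i = 0; i < len; i++)
        snprintf(out + 4 * i, 5, "\\x%02x", (unsigned)s[i]);
    out[4 * len] = '\0';
    return (int)(4 * len);
}

/* Number of bytes in the shortest standard UTF-8 encoding of cp,
 * or 0 if cp is above U+10FFFF (outside this sketch's scope). */
static size_t shortest_utf8_len(unsigned long cp)
{
    if (cp < 0x80)      return 1;
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

/* Decode a 2-byte sequence WITHOUT rejecting overlongs, so that the
 * code point an overlong evaluates to can be reported. */
static unsigned long decode_2byte(const unsigned char *s)
{
    return ((unsigned long)(s[0] & 0x1F) << 6) | (unsigned long)(s[1] & 0x3F);
}
```

For the overlong sequence C0 80, dump_bytes yields the text \xc0\x80,
decode_2byte yields code point 0, and shortest_utf8_len(0) is 1, which
is shorter than the 2 bytes actually used. That difference is what
makes the sequence overlong.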
M embed.fnc
M embed.h
M ext/XS-APItest/t/utf8.t
M lib/utf8.t
M proto.h
M t/io/utf8.t
M t/lib/warnings/utf8
M t/op/pack.t
M t/op/utf8decode.t
M utf8.c
commit 1bb4232936a28ed37b439abbf7d36c4cc7daa7b2
Author: Karl Williamson <[email protected]>
Date: Wed Sep 28 10:19:03 2016 -0600
utf8.c: Consolidate duplicate error msg text
This text is generated in 2 places; consolidate into one place.
M embed.fnc
M embed.h
M proto.h
M utf8.c
commit b35a72a4ab9b5088fd9055d79ea0452f824c34db
Author: Karl Williamson <[email protected]>
Date: Wed Sep 28 20:42:30 2016 -0600
utf8n_to_uvchr() Fix EBCDIC bug with overlongs
The comment removed in this commit was wrong, and so was the code it
described. On EBCDIC platforms, there are malformations that need to be
converted from Unicode to native. When I wrote that I wasn't thinking
about overlongs, which can evaluate to any code point. The new tests in
d566bd20c27a46aecd668d2f739b9515f46ac74f caught this.
M utf8.c
commit fe09146d26b011c331eeba272c11f54219f1e6ac
Author: Karl Williamson <[email protected]>
Date: Thu Sep 15 09:09:07 2016 -0600
XXX incomplete: Add sv_utf8_decode_flags
M embed.fnc
M embed.h
M proto.h
M sv.c
M sv.h
commit 616a4419a60f23161f23044fccf9ae118936331a
Author: Karl Williamson <[email protected]>
Date: Wed Sep 14 22:40:23 2016 -0600
customized
M t/porting/customized.dat
commit 0d4e517b886e70caa8183dcf36ce4d0b98cda484
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:20:52 2016 -0600
Use core REPLACEMENT CHARACTER definition
This allows the code to now work on EBCDIC as well.
M cpan/Encode/Encode/encode.h
commit 197908187d872546d516828f0aab55d6f23122d2
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:16:00 2016 -0600
XXX commit msg: Encode.xs: Rmv unused function
M cpan/Encode/Encode.xs
commit aeed81164509bc5ae40047b38a1827cb9d85d3c6
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:39 2016 -0600
Encode.xs: white-space only
M cpan/Encode/Encode.xs
commit 21d562970bc2e3a62a63e2a6fb0a53ed72d2a853
Author: Karl Williamson <[email protected]>
Date: Thu Sep 1 12:12:06 2016 -0600
XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
This replaces the current scheme for checking UTF-8 validity by one
in which normal processing doesn't require having to decode the UTF-8
into code points. The copying of characters individually from the input
to the output is changed to be a single operation for each entire span
of valid input at once.
Thus in the normal case, what ends up happening is a tight loop to
check the validity, and then a memmove of the entire input to the
output, then return.
If an error is found, it copies all the valid input before the error,
then handles the character in error, then positions to the next input
position, and repeats the whole process starting from there.
It uses the functionality available from the Perl 5 core to look at
just the bytes that comprise the UTF-8 to make the determination,
converting to code points only those that are defective somehow in
order to display them in warnings and error messages.
Thus, this does not need to know about the intricacies of UTF-8
malformations, relying on the core to handle this.
This cannot be pushed to CPAN until Devel::PPPort has been updated to
implement all the functions now needed.
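The span-at-a-time scheme described in this commit message can be
sketched in plain C as follows. This is an illustration under stated
assumptions, not the code in Encode.xs: valid_char_len is a hypothetical
stand-in for the core's byte-level validity check, it applies strict
Unicode rules (no overlongs, surrogates, or code points above U+10FFFF),
and errors are always handled by substituting U+FFFD and skipping one
byte, whereas the real module's behavior depends on how it is called.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a core validity check: returns the length
 * of the well-formed UTF-8 character starting at s, or 0 if the bytes
 * there are malformed or truncated.  No code point is computed. */
static size_t valid_char_len(const unsigned char *s, const unsigned char *end)
{
    size_t avail = (size_t)(end - s);
    unsigned char b0;

    if (avail == 0) return 0;
    b0 = s[0];
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF)
        return (avail >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (avail < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (b0 == 0xE0 && s[1] < 0xA0) return 0;   /* overlong */
        if (b0 == 0xED && s[1] > 0x9F) return 0;   /* surrogate */
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (avail < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80
                      || (s[3] & 0xC0) != 0x80)
            return 0;
        if (b0 == 0xF0 && s[1] < 0x90) return 0;   /* overlong */
        if (b0 == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;   /* C0, C1, F5..FF, or a stray continuation byte */
}

/* Copy src[0..len) to dst, replacing each offending byte with U+FFFD.
 * dst must have room for 3 * len bytes.  Returns the bytes written. */
static size_t copy_valid_spans(const unsigned char *src, size_t len,
                               unsigned char *dst)
{
    static const unsigned char repl[3] = {0xEF, 0xBF, 0xBD};
    const unsigned char *s = src;
    const unsigned char *end = src + len;
    unsigned char *d = dst;

    while (s < end) {
        const unsigned char *span = s;
        size_t n;

        /* Tight loop: step over valid characters without decoding. */
        while (s < end && (n = valid_char_len(s, end)) != 0)
            s += n;

        /* One bulk copy for the whole valid span. */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        if (s < end) {          /* stopped on a malformation */
            memcpy(d, repl, 3); /* handle the character in error */
            d += 3;
            s++;                /* simplification: skip a single byte */
        }
    }
    return (size_t)(d - dst);
}
```

When the whole input is valid, the outer loop runs once: one validity
scan, one memmove, then return. Each malformation costs one extra pass
through the outer loop, starting from the position after the error.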
M cpan/Encode/Encode.pm
M cpan/Encode/Encode.xs
-----------------------------------------------------------------------
--
Perl5 Master Repository