[perl.git] branch smoke-me/khw-encode, created. v5.25.5-59-gee7e2e3

Karl Williamson Thu, 29 Sep 2016 12:00:25 -0700

In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/ee7e2e356d9c0badaa71d9a7c296110fdc3523dc?hp=0000000000000000000000000000000000000000>


        at  ee7e2e356d9c0badaa71d9a7c296110fdc3523dc (commit)

- Log -----------------------------------------------------------------
commit ee7e2e356d9c0badaa71d9a7c296110fdc3523dc
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 29 11:51:41 2016 -0600

    APItest/t/utf8.t: Skip some tests if major one fails
    
    If the patched test fails, the subsequent ones in the loop are
    meaningless, so don't execute them.

M       ext/XS-APItest/t/utf8.t

commit 450388809c39d7712cf5a7d9474afde480bdab94
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 29 11:50:51 2016 -0600

    APItest/t/utf8.t: Fix typo
    
    This was a typo in the UTF-EBCDIC representation of a code point, so
    it affected only tests on that platform.

M       ext/XS-APItest/t/utf8.t

commit 6fdfff5cf77e9d828ab2c98c7e472f6e16904a51
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 15:05:17 2016 -0600

    Add details to UTF-8 malformation error messages
    
    I've long been unsatisfied with the information contained in the
    error/warning messages raised when some input is malformed UTF-8, but
    have been reluctant to change the text in case someone is relying on
    it.  One reason that someone might be parsing the messages is that there
    has been no convenient way to otherwise pin down what the exact
    malformation might be.  A couple of commits from now will add a facility
    to get the type of malformation unambiguously.  This will be a better
    mechanism to use for those rare modules that need to know what the
    exact malformation is.
    
    So, I will fix and issue pull requests for any module broken by this
    commit.
    
    The messages are changed by now dumping (in \xXY format) the bytes that
    make up the malformed character, and extra details are added in most
    cases.
    
    Messages about overlongs now display the code point they evaluate to and
    what the shortest UTF-8 sequence for generating that code point is.
    
    Messages about overflowing now just display that it overflows, since the
    entire byte sequence is now dumped.  The previous message displayed just
    the byte which was being processed where overflow was detected, but that
    is not helpful at all.
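
To illustrate the kind of detail described above, here is a minimal sketch in C, not the code from utf8.c. The helper names (dump_bytes, decode_lax, shortest_utf8_len) are hypothetical, the lowercase hex digits are an assumption about the \xXY format, and only standard UTF-8 up to U+10FFFF on an ASCII platform is modelled (no Perl extended UTF-8, no UTF-EBCDIC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render s[0..len-1] into out as a run of "\xXY"
 * escapes.  Returns the number of characters written (excluding the
 * terminating NUL), or -1 if out is too small. */
static int dump_bytes(const unsigned char *s, size_t len,
                      char *out, size_t out_size)
{
    size_t i, used = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (i = 0; i < len; i++) {
        if (out_size - used < 5)        /* 4 characters plus the NUL */
            return -1;
        used += (size_t) snprintf(out + used, out_size - used,
                                  "\\x%02x", (unsigned) s[i]);
    }
    return (int) used;
}

/* Hypothetical helper: evaluate a 1..4 byte sequence to a code point
 * WITHOUT rejecting overlongs, so the value an overlong stands for can
 * be reported. */
static unsigned long decode_lax(const unsigned char *s, size_t len)
{
    static const unsigned char lead_mask[5] = { 0x00, 0x7F, 0x1F, 0x0F, 0x07 };
    unsigned long cp;
    size_t i;

    assert(len >= 1 && len <= 4);
    cp = s[0] & lead_mask[len];
    for (i = 1; i < len; i++)
        cp = (cp << 6) | (s[i] & 0x3F);
    return cp;
}

/* Length of the shortest standard UTF-8 encoding of cp, or 0 if cp is
 * above U+10FFFF (Perl's extended UTF-8 is not modelled here). */
static int shortest_utf8_len(unsigned long cp)
{
    if (cp < 0x80)      return 1;
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}
```

For the two-byte sequence C0 80, dump_bytes produces the text \xc0\x80, decode_lax yields code point 0, and shortest_utf8_len reports that one byte would have sufficed, which is the information an overlong message needs.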

M       embed.fnc
M       embed.h
M       ext/XS-APItest/t/utf8.t
M       lib/utf8.t
M       proto.h
M       t/io/utf8.t
M       t/lib/warnings/utf8
M       t/op/pack.t
M       t/op/utf8decode.t
M       utf8.c

commit 1bb4232936a28ed37b439abbf7d36c4cc7daa7b2
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 10:19:03 2016 -0600

    utf8.c: Consolidate duplicate error msg text
    
    This text is generated in 2 places; consolidate into one place.

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c

commit b35a72a4ab9b5088fd9055d79ea0452f824c34db
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 28 20:42:30 2016 -0600

    utf8n_to_uvchr() Fix EBCDIC bug with overlongs
    
    The comment removed in this commit was wrong, and so was the code it
    described.  On EBCDIC platforms, there are malformations that need to be
    converted from Unicode to native.  When I wrote that, I wasn't thinking
    about overlongs, which can evaluate to any code point.  The new tests in
    d566bd20c27a46aecd668d2f739b9515f46ac74f caught this.

M       utf8.c

commit fe09146d26b011c331eeba272c11f54219f1e6ac
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 15 09:09:07 2016 -0600

    XXX incomplete: Add sv_utf8_decode_flags

M       embed.fnc
M       embed.h
M       proto.h
M       sv.c
M       sv.h

commit 616a4419a60f23161f23044fccf9ae118936331a
Author: Karl Williamson <[email protected]>
Date:   Wed Sep 14 22:40:23 2016 -0600

    customized

M       t/porting/customized.dat

commit 0d4e517b886e70caa8183dcf36ce4d0b98cda484
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:20:52 2016 -0600

    Use core REPLACEMENT CHARACTER definition
    
    This allows the code to now work on EBCDIC as well.

M       cpan/Encode/Encode/encode.h

commit 197908187d872546d516828f0aab55d6f23122d2
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:16:00 2016 -0600

    XXX commit msg: Encode.xs: Rmv unused function

M       cpan/Encode/Encode.xs

commit aeed81164509bc5ae40047b38a1827cb9d85d3c6
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:12:39 2016 -0600

    Encode.xs: white-space only

M       cpan/Encode/Encode.xs

commit 21d562970bc2e3a62a63e2a6fb0a53ed72d2a853
Author: Karl Williamson <[email protected]>
Date:   Thu Sep 1 12:12:06 2016 -0600

    XXX maybe more in commit msg: Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity by one
    in which normal processing doesn't require having to decode the UTF-8
    into code points.  The copying of characters individually from the input
    to the output is changed to be a single operation for each entire span
    of valid input at once.
    
    Thus in the normal case, what ends up happening is a tight loop to
    check the validity, and then a memmove of the entire input to the
    output, then return.
    
    If an error is found, it copies all the valid input before the error,
    then handles the character in error, then positions to the next input
    position, and repeats the whole process starting from there.
    
    It uses the functionality available from the Perl 5 core to look at
    just the bytes that comprise the UTF-8 to make the determination,
    converting to code points only those that are defective somehow in
    order to display them in warnings and error messages.
    
    Thus, this does not need to know about the intricacies of UTF-8
    malformations, relying on the core to handle this.
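
The scan-then-copy scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Encode.xs code: valid_char_len and copy_checked are hypothetical names, the validator is a plain standard UTF-8 check written out by hand rather than a call into the Perl core, a malformed byte is replaced by U+FFFD, and the input is resumed one byte later. What Encode actually does with a bad character depends on its check flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: byte length (1..4) of a well-formed standard
 * UTF-8 character starting at s, or 0 if it is malformed or truncated.
 * Only byte ranges are inspected; no code point is computed.
 * Requires s < end. */
static size_t valid_char_len(const unsigned char *s, const unsigned char *end)
{
    unsigned char b0 = s[0];
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */
    size_t need, i;

    if (b0 < 0x80)
        return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        need = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        need = 3;
        if (b0 == 0xE0) lo = 0xA0;        /* excludes overlongs */
        else if (b0 == 0xED) hi = 0x9F;   /* excludes surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        need = 4;
        if (b0 == 0xF0) lo = 0x90;        /* excludes overlongs */
        else if (b0 == 0xF4) hi = 0x8F;   /* excludes > U+10FFFF */
    } else {
        return 0;          /* stray continuation, C0, C1, or F5..FF */
    }

    if ((size_t) (end - s) < need)
        return 0;                         /* truncated at end of input */
    if (s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < need; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return need;
}

/* Copy src to dst, moving each maximal span of valid input in a single
 * memmove.  Returns the number of bytes written.  dst must hold at
 * least 3 * len bytes (worst case: every input byte is replaced). */
static size_t copy_checked(const unsigned char *src, size_t len,
                           unsigned char *dst, size_t dst_size)
{
    static const unsigned char repl[3] = { 0xEF, 0xBF, 0xBD }; /* U+FFFD */
    const unsigned char *s = src, *end = src + len;
    size_t out = 0;

    assert(dst_size >= 3 * len);

    while (s < end) {
        const unsigned char *span = s;
        size_t n;

        /* Tight loop: step over valid characters without decoding. */
        while (s < end && (n = valid_char_len(s, end)) != 0)
            s += n;

        /* One bulk copy for the whole valid span. */
        memmove(dst + out, span, (size_t) (s - span));
        out += (size_t) (s - span);

        if (s == end)
            break;

        /* Assumption: substitute U+FFFD and resume at the next byte. */
        memcpy(dst + out, repl, sizeof repl);
        out += sizeof repl;
        s++;
    }
    return out;
}
```

For fully valid input the outer loop runs once: one validation pass, one memmove, then return. Each malformed byte costs one extra trip around the outer loop.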
    
    This cannot be pushed to CPAN until Devel::PPPort has been updated to
    implement all the functions now needed.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
-----------------------------------------------------------------------

--
Perl5 Master Repository
