[perl.git] branch smoke-me/khw-encode, created. v5.25.4-11-ga8eb2e0

Karl Williamson Mon, 22 Aug 2016 11:36:18 -0700

In perl.git, the branch smoke-me/khw-encode has been created

<http://perl5.git.perl.org/perl.git/commitdiff/a8eb2e025035173ac08bf1371188a189466d64d2?hp=0000000000000000000000000000000000000000>


        at  a8eb2e025035173ac08bf1371188a189466d64d2 (commit)

- Log -----------------------------------------------------------------
commit a8eb2e025035173ac08bf1371188a189466d64d2
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Aug 20 15:16:06 2016 -0600

    Speed up Encode UTF-8 validation checking
    
    This replaces the current scheme for checking UTF-8 validity with one
    in which normal processing doesn't require decoding the UTF-8 into
    code points.  Instead of copying characters individually from the
    input to the output, it now copies each entire span of valid input in
    a single operation.
    
    Thus, in the normal case, what happens is a tight loop to check
    validity, then a single memmove of the entire input to the output,
    then a return.
    
    If an error is found, it copies all the valid input before the error,
    handles the character in error, advances to the next input position,
    and repeats.
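A minimal standalone C sketch of the span-at-a-time scheme described above: a tight loop finds the end of each run of valid input, one memmove copies the run, and the byte in error is handled before scanning resumes. This is not the Encode.xs code. utf8_char_len() and copy_utf8_spans() are hypothetical names; utf8_char_len() merely stands in for the core's isUTF8_CHAR() and is deliberately loose (it does not reject overlongs, surrogates, or values above U+10FFFF). Replacing each bad byte with U+FFFD is an assumed error policy; Encode's real behavior depends on its CHECK argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the core's isUTF8_CHAR(): returns the length
 * in bytes of the character starting at s (looking no further than e - 1),
 * or 0 if the bytes there do not form one.  Deliberately loose: it checks
 * only the lead/continuation bit patterns and does NOT reject overlongs,
 * surrogates, or values above U+10FFFF. */
static size_t utf8_char_len(const unsigned char *s, const unsigned char *e)
{
    size_t len, i;

    if (s[0] < 0x80)                len = 1;
    else if ((s[0] & 0xE0) == 0xC0) len = 2;
    else if ((s[0] & 0xF0) == 0xE0) len = 3;
    else if ((s[0] & 0xF8) == 0xF0) len = 4;
    else                            return 0;

    if ((size_t)(e - s) < len)
        return 0;                        /* truncated sequence */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;                    /* not a continuation byte */
    return len;
}

/* Copies src[0..n) to dst, one bulk copy per run of valid input.  Each
 * byte that does not start a valid character is replaced by U+FFFD (an
 * assumed policy).  dst needs room for 3 * n bytes in the worst case.
 * Returns the number of bytes written. */
static size_t copy_utf8_spans(unsigned char *dst,
                              const unsigned char *src, size_t n)
{
    static const unsigned char fffd[3] = { 0xEF, 0xBF, 0xBD };
    const unsigned char *s = src;
    const unsigned char *e = src + n;
    unsigned char *d = dst;

    assert(dst != NULL && src != NULL);

    while (s < e) {
        /* Tight loop: find the end of the current run of valid input. */
        const unsigned char *span = s;
        size_t len;
        while (s < e && (len = utf8_char_len(s, e)) > 0)
            s += len;

        /* One memmove for the whole valid run (possibly empty). */
        memmove(d, span, (size_t)(s - span));
        d += s - span;

        /* Handle the byte in error, then resume scanning after it. */
        if (s < e) {
            memcpy(d, fffd, sizeof fffd);
            d += sizeof fffd;
            s++;
        }
    }
    return (size_t)(d - dst);
}
```

Because each valid run is copied in one call, error-free input costs one validation pass plus a single memmove, which is the fast path the commit message describes.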
    
    It uses functionality available from the Perl 5 core to look at just
    the bytes that comprise the UTF-8 when making the determination,
    converting to code points only those sequences that are defective
    somehow, in order to display them in warnings and error messages.
    (The core macro it calls, isUTF8_CHAR(), currently does convert some
    code points as well, but only extremely large ones, well above any
    legal Unicode code point, and hence extremely unlikely to be
    encountered in practice.)
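A sketch of deciding validity by comparing byte values alone, without ever assembling a code point. It applies the well-formed byte-sequence ranges from the Unicode Standard (Table 3-7), so it is stricter than isUTF8_CHAR(), which accepts Perl's extended UTF-8 (for example, code points above 0x10FFFF). The name strict_utf8_char_len() is hypothetical; like the core macro, it returns the length in bytes of the character at s, or 0 if there is none.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of bytes (1 to 4) in the well-formed UTF-8 character
 * starting at s, looking no further than e - 1, or 0 if there is none.
 * Only byte values are compared; no code point is ever computed. */
static size_t strict_utf8_char_len(const unsigned char *s,
                                   const unsigned char *e)
{
    size_t avail, len, i;
    unsigned char lo = 0x80, hi = 0xBF;   /* allowed range of 2nd byte */

    assert(s < e);
    avail = (size_t)(e - s);

    if (s[0] <= 0x7F)
        return 1;                         /* ASCII */
    if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3;
        if (s[0] == 0xE0) lo = 0xA0;      /* would be overlong */
        if (s[0] == 0xED) hi = 0x9F;      /* would be a surrogate */
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4;
        if (s[0] == 0xF0) lo = 0x90;      /* would be overlong */
        if (s[0] == 0xF4) hi = 0x8F;      /* would exceed U+10FFFF */
    } else {
        return 0;          /* 0x80..0xC1 and 0xF5..0xFF never lead */
    }

    if (avail < len)
        return 0;                         /* truncated */
    if (s[1] < lo || s[1] > hi)
        return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF)
            return 0;
    return len;
}
```

Only a caller that needs to report a malformation would go on to decode anything, which mirrors the division of labor the commit message describes.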
    
    Thus, this code does not need to know the intricacies of UTF-8
    malformations; it relies on the core to handle them.
    
    Not all the core facilities it uses are in the public API, which was
    also true of the implementation this replaces.  I'm confident enough
    in all the ones it does use to put them in the API.
    
    I have not looked at how this would work on previous Perl versions.
    That will have to be tested, and ppport used to fill in whatever is
    missing.  That should be done anyway, to make less buggy
    Unicode-handling code available to older modules.

M       cpan/Encode/Encode.pm
M       cpan/Encode/Encode.xs
M       t/porting/customized.dat

commit 40646e1822ebcb15f1f70d9153bd3714b2013372
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 22 12:28:21 2016 -0600

    utf8.c: Use 'break' instead of 'goto'
    
    The goto is a relic of a previous implementation; 'break' is preferred
    if there isn't a reason to use goto.
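An illustration, not the actual utf8.c change, of how a goto becomes a relic: when its label sits immediately after the loop, the goto does exactly what break does. Both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Old style: the goto target is placed right after the loop, so jumping
 * to it is exactly what 'break' would do. */
static size_t first_non_ascii_goto(const unsigned char *s, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (s[i] > 0x7F)
            goto found;
    }
  found:               /* reached by the goto or by the loop ending */
    return i;
}

/* Preferred: break leaves the loop with no label to maintain. */
static size_t first_non_ascii_break(const unsigned char *s, size_t n)
{
    size_t i;
    assert(n == 0 || s != NULL);
    for (i = 0; i < n; i++) {
        if (s[i] > 0x7F)
            break;
    }
    return i;   /* index of the first non-ASCII byte, or n if none */
}
```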

M       utf8.c

commit b3d5da70d866b2de261e195f4d0b68fb34991e39
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 22 12:25:00 2016 -0600

    is_utf8_string_loc() param should not be NULL
    
    It makes no sense to call this function with a NULL parameter, as the
    whole point of using this function is to set what that param points to.
    If you don't want this, you should be using the similar function that
    doesn't have this parameter.
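A sketch of the contract described above, using hypothetical names and ASCII in place of UTF-8 to keep it short: my_is_ascii_loc() exists to report where scanning stopped, so a NULL ep is a caller bug and is asserted against, while my_is_ascii() serves callers that don't need the stopping point. In perl the non-NULL requirement is declared in embed.fnc, from which proto.h is generated; the assert here only approximates that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue of is_utf8_string_loc(): returns true if s[0..len)
 * is entirely ASCII, and always stores in *ep a pointer to the first
 * non-ASCII byte (or to s + len).  Setting *ep is the reason to call this
 * variant, so ep must not be NULL. */
static bool my_is_ascii_loc(const unsigned char *s, size_t len,
                            const unsigned char **ep)
{
    const unsigned char *p = s;
    const unsigned char *e = s + len;

    assert(s != NULL);
    assert(ep != NULL);   /* passing NULL here is a caller bug */

    while (p < e && *p < 0x80)
        p++;
    *ep = p;
    return p == e;
}

/* Callers that don't need the stopping point use this variant instead of
 * passing NULL for ep. */
static bool my_is_ascii(const unsigned char *s, size_t len)
{
    const unsigned char *ignored;
    return my_is_ascii_loc(s, len, &ignored);
}
```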

M       embed.fnc
M       proto.h

commit 8ecdd3b937d2529c7df2eb884d6617ba7b62152f
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 22 12:21:06 2016 -0600

    Document valid_utf8_to_uvchr() and inline it
    
    This function has been in several releases without problems, and is
    short enough that some compilers can inline it.  This commit also
    tells the compiler that it is a pure function and that its result
    should not be ignored.
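A sketch of an inline decoder for input already known to be valid, declared with the GCC/Clang pure and warn_unused_result attributes (guarded so other compilers ignore them). decode_valid_utf8() is hypothetical, not valid_utf8_to_uvchr(); to meet GCC's definition of pure (no side effects, so no writes through pointer arguments) it returns only the code point and has no length out-parameter.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#  define PURE_NODISCARD __attribute__((pure, warn_unused_result))
#else
#  define PURE_NODISCARD
#endif

/* Hypothetical decoder for one character that the caller has ALREADY
 * verified is well-formed UTF-8; it does no validation of its own.
 * Declared pure (result depends only on the bytes s points to, with no
 * side effects) and warn_unused_result (discarding the result of a pure
 * call is always a mistake). */
static inline uint32_t decode_valid_utf8(const unsigned char *s)
    PURE_NODISCARD;

static inline uint32_t decode_valid_utf8(const unsigned char *s)
{
    /* Debug-only sanity check of the lead byte; full validation is the
     * caller's job. */
    assert(s[0] < 0x80 || (s[0] >= 0xC2 && s[0] <= 0xF4));

    if (s[0] < 0x80)                       /* 1 byte:  0xxxxxxx */
        return s[0];
    if (s[0] < 0xE0)                       /* 2 bytes: 110xxxxx */
        return ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    if (s[0] < 0xF0)                       /* 3 bytes: 1110xxxx */
        return ((uint32_t)(s[0] & 0x0F) << 12)
             | ((uint32_t)(s[1] & 0x3F) << 6)
             |  (uint32_t)(s[2] & 0x3F);
    return ((uint32_t)(s[0] & 0x07) << 18)  /* 4 bytes: 11110xxx */
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         |  (uint32_t)(s[3] & 0x3F);
}
```

Because a pure function's only effect is its result, the compiler may merge repeated calls on the same unchanged bytes, and warn_unused_result makes it warn when a caller discards that result.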

M       embed.fnc
M       embed.h
M       inline.h
M       proto.h
M       utf8.c

commit e157288025538029bb90e34f66c737d7f1b03007
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 22 10:48:55 2016 -0600

    utf8.c: Clarify comments for valid_utf8_to_uvchr()

M       utf8.c

commit 78e0d689d5a60cb47b08cd58d00975b3162c9059
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 22 10:59:48 2016 -0600

    utf8.c: Join EBCDIC/non-EBCDIC code
    
    This was missed in commit 534752c1d25d7c52c702337927c37e40c4df103d.

M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository