[perl.git] branch khw/ebcdic, created. v5.17.9-211-g5eab0cd

Karl Williamson Sun, 10 Mar 2013 21:14:38 -0700

In perl.git, the branch khw/ebcdic has been created

<http://perl5.git.perl.org/perl.git/commitdiff/5eab0cdb5fb4522ee49456f11931f3132e298f8f?hp=0000000000000000000000000000000000000000>


        at  5eab0cdb5fb4522ee49456f11931f3132e298f8f (commit)

- Log -----------------------------------------------------------------
commit 5eab0cdb5fb4522ee49456f11931f3132e298f8f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Mar 8 11:01:32 2013 -0700

    XXX EBCDIC header files

M       charclass_invlists.h
M       l1_char_class_tab.h
M       unicode_constants.h

commit b04d445fefbaa43db80553dc17262a6981ee49b1
Author: John Goodyear <johng...@us.ibm.com>
Date:   Sat Mar 2 12:31:25 2013 -0700

    XXX Temporary for z/OS long long support

M       Configure
M       hints/os390.sh

commit 67df5c08cd1fd2eb2becec160727f02c1c212e04
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Mar 10 13:11:07 2013 -0600

    XXX Temporary comment out ParseXS check
    
    this is to get things to compile for now

M       dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

commit 0df90f27f317e20ae53c03f4860e9dd36fe000e6
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Mar 10 11:34:10 2013 -0600

    XXX Collate, Normalize: Allow to compile under EBCDIC

M       cpan/Unicode-Collate/Collate.pm
M       cpan/Unicode-Collate/mkheader
M       cpan/Unicode-Normalize/Normalize.pm
M       cpan/Unicode-Normalize/mkheader

commit 09150d7e6dcdadeae95d30562e5841d4c277d34d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 9 21:57:38 2013 -0700

    XXX dquote_static.c: Silence wrong warning on EBCDIC
    
    Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
    which should be a separate commit anyway.
    
    This silences an inappropriate warning that doesn't happen on ASCII
    platforms.  CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms.  But
    0x14 is a C1 control on EBCDIC, a C0 on ASCII.  Therefore the test that
    it's a control should include both C0 and C1, which isCNTRL_L1() does.
    
    Also has a white-space change, outdenting a line so it doesn't wrap in
    an 80 column window.

M       dquote_static.c

commit dbe9e4667a31f99490f0e23e30223b29a2421282
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Mar 7 12:08:41 2013 -0700

    utfebcdic.h: Change 'unsigned char' to U8
    
    This is for consistency with the rest of Perl

M       utfebcdic.h

commit 3d03e09e5d6eb1a7b047fe4f8eec23adcc731092
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Mar 8 08:11:38 2013 -0700

    regen/regcharclass.pl: Make more EBCDIC-friendly
    
    This commit changes the code generated by the macros so that they work
    right out-of-the-box on non-ASCII platforms for non-UTF-8 inputs.  THEY
    ARE WRONG for UTF-8, but this is good enough to get perl bootstrapped
    onto the target platform, and regcharclass.pl can be run there,
    generating macros correct UTF-8.

M       regcharclass.h
M       regen/regcharclass.pl

commit da5610eafb6066b458d7ac4fe57519bb38779740
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Mar 6 21:30:01 2013 -0700

    utfebcdic.h: Add (UV) cast
    
    The operand of this macro is implicitly a UV.  Make sure that it is.

M       utfebcdic.h

commit e9e33410f3ad60a1f8b9168b8e901c9e89bed13a
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Mar 6 17:04:58 2013 -0700

    handy.h: Allow bootstrapping to non-ASCII platform
    
    This adds a bunch of macros and moves things around to support
    conditional compilation when Configure is called with
    -DBOOTSTRAP_CHARSET.  Doing so causes the usual macros that are
    table-driven to not be used, since the table may not be valid when
    bringing Perl up for the first time on a non-ASCII platform.
    
    This allows it to compile using the platform's native C library ctype
    functions, which should work enough to compile miniperl, and allow the
    table to be changed to be valid.  Then Configure can be re-run to not
    bootstrap, and normal compilation can proceed

M       handy.h
M       inline.h

commit 598ffe073b09f8e8753f1ec12887d9b5899b9ad8
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 13:43:26 2013 -0700

    gv.c: Remove EBCDIC dependency

M       gv.c

commit a03816435f8cfa1da0d6c677a4543c8cb4119f7e
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 13:00:47 2013 -0700

    toke.c: Remove EBCDIC dependency

M       toke.c

commit 222c72555e6776c48a2d835cee183d644fc459b0
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 09:14:25 2013 -0700

    toke.c: Remove character set dependency
    
    Instead of hard-coding the bit patterns that comprise the Byte Order
    Mark in the UTF-8 or UTF-EBCDIC encodings, use the generated ones for
    the current platform.
    
    This removes some EBCDIC-only code.

M       toke.c

commit 577c3241c380127cf76ec6cddf625f0ca8921b78
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 09:10:27 2013 -0700

    unicode_constants.h: Add #defines for Byte Order Mark
    
    These will be used in future commits

M       regen/unicode_constants.pl
M       unicode_constants.h

commit c44bc3320120012b60c708b0fe44130496fdb70b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 15:04:18 2013 -0700

    XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe
    
    This macro may not be present, and is currently used exclusively in
    IS_UTF8_CHAR, which itself may be undefined, and code should cope with
    that.  This is a work-around until a better solution is found.

M       utf8.c
M       utf8.h

commit 165725ff92b973a7cc44159b44251a713e23b5cc
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 14:09:04 2013 -0700

    Add Porting tool for help with non-ASCII platforms
    
    Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
    non-ASCII platform with no working Perl.

M       MANIFEST
A       Porting/reorder_l1_char_class_tab.pl
M       regen/mk_PL_charclass.pl

commit 74526830206f06e68509865ac993b5c16e13a10b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 13:06:58 2013 -0700

    inline.h: Reorder functions
    
    The comment implied that the functions below it in the file were
    deprecated, but in fact only the next two functions were.  This
    clarifies that and moves them so they are the final ones in the file

M       inline.h

commit 96e0aafd0c7de8865eb6d50bc74e9454bf1d3755
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 12:33:42 2013 -0700

    utfebcdic.h: Add comment

M       utfebcdic.h

commit 028969003dabd9e627336fc76645d84d94acc020
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 12:12:11 2013 -0700

    utf8.h: Clean up START_MARK definition and use
    
    The previous definition broke good encapsulation rules.  UTF_START_MARK
    should return something that fits in a byte; it shouldn't be the caller
    that does this.  So the mask is moved into the definition.  This means
    it can apply only to the portion that creates something larger than a
    byte.  Further, the EBCDIC version can be simplified, since 7 is the
    largest possible number of bytes in an EBCDIC UTF8 character.

M       utf8.h
M       utfebcdic.h

commit 8f322df02c9649471364c80ce88daa7a8905f344
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 12:05:26 2013 -0700

    utf8.h: Move #includes
    
    These two files were only being #included for non-ebcdic compiles; they
    should be included always.

M       utf8.h

commit bfb4d32192882d7707dc31a3aa7832db3df4ee86
Author: John Goodyear <johng...@us.ibm.com>
Date:   Sat Mar 2 11:49:14 2013 -0700

    utfebcdic.h: Remove extra parameter expansions
    
    These two macros were improperly expanding the parameters as well as
    defining the operation, leading to compile errors.

M       utfebcdic.h

commit 2bfad75e80a07a20b554945110f2735da6ad0b77
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Mar 1 08:28:52 2013 -0700

    utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC
    
    These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
    UTF8_TWO_BYTE_LO.  But the EIGHT_BIT versions can use the less general
    and simpler NATIVE_TO_LATN1 instead of NATIVE_TO_UNI because the input
    domain is restricted in the EIGHT_BIT.  Note that on ASCII platforms,
    these both expand to the same thing, so the difference matters only on
    EBCDIC.

M       utf8.h

commit 8a148438132be9ebd4be875108f43cee9c589abf
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 28 09:25:27 2013 -0700

    XXX temp:  show makedepend cerr

M       makedepend.SH

commit 965635fe4487a0a73a018517d4e4d86ba90c2382
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too long lines; properly join
    
    I had thought that a continuation introduced a space.  But no,
    a continuation can happen in the middle of a token.
    
    And this splits lines that are getting very long to avoid preprocessor
    limitations.

M       makedepend.SH

commit de4ff993328cf7a0f33d1be629e290185247a643
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only
    
    Align continuation backslashes

M       makedepend.SH

commit b6af465ca19adbf783cf49e53b2f9b17bfb834f4
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space
    
    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle.  This
    commit removes blanks adjoining comments that get deleted.  This makes
    things somewhat less likely to exceed the limit.
    
    This commit also fixes several [] which were meant to each match a tab
    or a blank, but editors converted the tabs to blanks

M       makedepend.SH

commit 5733537fab2693c8a14297cc83b258c57df28fe9
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments
    
    These comments may actually be necessary.

M       makedepend.SH

commit 1e51f6590fb0e09e86a608baa32b67a188b054bd
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 08:38:19 2013 -0700

    handy.h: Remove extraneous parens

M       handy.h

commit f623ed1c0fb0b81aa58f279aab7a27632f452065
Author: Andy Dougherty <dough...@lafayette.edu>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.
    
    John Goodyear <johng...@us.ibm.com> reports that the z/OS C compiler
    supports the attribute keyword, but not exactly the same as gcc.
    Instead of a "warning", the compiler emits an "INFORMATIONAL" message
    that Configure fails to detect.  Until Configure is fixed, just disable
    the attributes altogether.
    
    John Goodyear

M       hints/os390.sh

commit aafded08822c4a6c38ff88fa6b530dad7189603a
Author: Andy Dougherty <dough...@lafayette.edu>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.
    
    Grep appears to be limited to 2048 characters, and truncates
    the output for cppstin.  Fgrep apparently doesn't have that limit.
    Thanks to John Goodyear <johng...@us.ibm.com> for reporting this.

M       hints/os390.sh

commit da91ff86dba83f553b2669527703d88ec8d59a84
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:45:19 2013 -0700

    utf8.c: Use more clearly named macro
    
    In the case of invariants these two macros should do the same thing,
    but it seems to me that the latter name more clearly indicates what is
    going on.

M       utf8.c

commit cad01f65a246f8f19000aa9fb2a8d5a002b75ac0
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:35:12 2013 -0700

    Add macro OFFUNISKIP
    
    This means use official Unicode code point numbering, not native.  Doing
    this converts the existing UNISKIP calls in the code to refer to native
    code points, which is what they meant anyway.  The terminology is
    somewhat ambiguous, but I don't think will cause real confusion.
    NATIVESKIP is also introduced for situations where it is important to be
    precise.

M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit 7d7723b58b2a79d867a02d09966bcdfa857770ca
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:22:19 2013 -0700

    toke.c: white space only

M       toke.c

commit 4134bb29f9ef6e79d14e79b46ed87be00a94ce89
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 12:08:50 2013 -0700

    utf8.c: Deprecate two functions
    
    This is to force any code that has been using these functions to change.
    Since the Unicode tables are now stored in native order, these functions
    should only rarely be needed.
    
    However, the functionality of these is needed, and in actuality, on
    ASCII platforms, the native functions are #defined to these.  So what
    this commit does is rename the functions to something else, and create
    wrappers with the old names, so that anyone using them will get the
    deprecation.

M       embed.fnc
M       embed.h
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit c47e28459cbb4a233ac2d010f42bdecbd3981e74
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 11:26:09 2013 -0700

    Deprecate uvuni_to_utf8()
    
    Code should almost never be dealing with non-native code points

M       embed.fnc
M       embed.h
M       proto.h
M       toke.c
M       utf8.c
M       utf8.h

commit daa32328fcb3591424180e61e8819d57aea18cf6
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 11:02:33 2013 -0700

    Deprecate utf8_to_uni_buf()
    
    Now that the tables are stored in native order, there is almost no need
    for code to be dealing in Unicode order.

M       embed.fnc
M       proto.h
M       utf8.c

commit e8528bd66195b2b6e48ee6ea52eecf2a37f836bb
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code
    
    This causes problems currently for z/OS.  But, since we don't know why
    it was there, I'm leaving it in as a placeholder.

M       makedepend.SH

commit bae2d98ec82302ea7a1335e9530892f7e8bef174
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 20:26:44 2013 -0700

    Deprecate valid_utf8_to_uvuni()
    
    Now that all the tables are stored in native format, there is very
    little reason to use this function; and those who do need this kind of
    functionality should be using the bottom level routine, so as to make it
    clear they are doing nonstandard stuff.

M       embed.fnc
M       proto.h
M       utf8.c

commit 5e30371f74299226efb9510563e3cacef0961d71
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 20:14:26 2013 -0700

    utf8.c: Swap which fcn wraps the other
    
    This is in preparation for the current wrapee becoming deprecated

M       embed.fnc
M       embed.h
M       proto.h
M       utf8.c
M       utf8.h

commit 295c1f68bd1967a5d9ba2ef66f59c402e6bba028
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 19:29:34 2013 -0700

    utf8.c: Skip a no-op
    
    Since the value is invariant under both UTF-8 and not, we already have
    it in 'uv'; no need to do anything else to get it

M       utf8.c

commit c4387f401bd9262b30cbf6489a970031153dcc5a
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 19:26:50 2013 -0700

    utf8.c: Move comment to where makes more sense

M       utf8.c

commit 2d45ce7f60d0245ba8aa7ae79b05507b17197f2d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:30:10 2013 -0700

    APItest: Test native code points, instead of Unicode

M       ext/XS-APItest/APItest.pm
M       ext/XS-APItest/APItest.xs
M       ext/XS-APItest/t/utf8.t

commit 09d942dd2acf9978d5ce81d9f2f700b70164c68b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:25:08 2013 -0700

    XXX CPAN Normalize
    
    This converts Unicode::Normalize to use the native tables that are used
    by Perl starting in XXX, while using the Unicode-ordered ones that were
    used before then.
    
    Another alternative would be to have mktables generate just these tables
    in Unicode ordering.

M       cpan/Unicode-Normalize/Normalize.xs

commit dde7f379af5167b0840c4083a488f569c5925e17
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:22:55 2013 -0700

    XXX CPAN prob wrong Collate
    
    This changes to implicity usenative code points.  This is likely wrong,
    as the module comes with its own data, that are probably in terms of
    Unicode

M       cpan/Unicode-Collate/Collate.xs

commit 3aec9efd95cf5b479b945b421fcc950b05730134
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:12:53 2013 -0700

    XXX CPAN Encode.xs
    
    Use core function if available.  This will insulate this code from any
    future changes.

M       cpan/Encode/Encode.xs

commit 4626ceae19bcf6e5c28fb2f7396e3dc4af87f0dc
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:04:24 2013 -0700

    XXX CPAN and unsure Encode

M       cpan/Encode/Encode.xs
M       cpan/Encode/Unicode/Unicode.xs

commit 7d2cf6d2347488d6b6d3bc9b0d354bd78313942d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:00:47 2013 -0700

    XXX CPAN Encode.xs: fix indent

M       cpan/Encode/Encode.xs

commit 74e377cb5abf439c0c302a94454026d3eacdc220
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 17:23:15 2013 -0700

    Don't refer to U+XXXX when mean native
    
    These messages say the output number is Unicode, but it is really
    native, so change to saying is 0xXXXX.

M       regen/regcharclass_multi_char_folds.pl
M       regexec.c

commit 5261c0f7026866a2823c2c62aed2ffae02a8912e
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:43:59 2013 -0700

    Convert some uvuni() to uvchr()
    
    All the tables are now based on the native character set, so using
    uvuni() in almost all cases is wrong.

M       cygwin/cygwin.c
M       doop.c
M       op.c
M       pp_pack.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c

commit c0529a44aa63a461b9f80c62253850dec31e141c
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:25:47 2013 -0700

    handy.h: White space only

M       handy.h

commit e93c1ab9028372d0af805d597878b58677150b17
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:19:49 2013 -0700

    t/test.pl: Allow native/latin1 string conversions to work on utf8.
    
    These functions no longer have the hard-coded definitions in them,
    but now end up resolving to internal functions, so that new encodings
    could be added and these would automatically understand them.
    
    Instead of using tr///, these now go character by character and
    converting to/from ord, which is slower, but allows them to operate on
    utf8 strings.
    
    Peephole optimization should make these essentially no-ops on ascii
    platforms.

M       t/test.pl

commit 174d794b1f61300d938f9588b53786aa72554876
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:05:55 2013 -0700

    t/test.pl: Simplify ord to/from native fcns
    
    This commit changes these functions from converting to/from a string to
    calling utf8:: functions which operate on ordinals instead.

M       t/test.pl

commit 6f6c57eca515b946f8c4d12420bca953cdf5b8ef
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 15:35:38 2013 -0700

    Make casing tables native
    
    These are final tables that haven't been converted to native character
    set casing.

M       perl.h
M       utfebcdic.h

commit 85122aa001b39c0bd21301544c830dbc6d9f8767
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 15:32:30 2013 -0700

    utfebcdic.h: Remove trailing spaces

M       utfebcdic.h

commit bcc198da013b12734c8d46ab987a5edeae4ee63d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Feb 22 18:55:26 2013 -0700

    EBCDIC has the unicode bug too
    
    We have not had a working modern Perl on EBCDIC for some years.  When I
    started out, comments and code led me to conclude erroneously that
    natively it supported semantics for all 256 characters 0-255.  It turns
    out that I was wrong; it natively (at least on some platforms) has the
    same rules (essentially none) for the characters which don't correspond
    to ASCII onees, as the rules for these on ASCII platforms.
    
    This commit forces those rules on EBCDIC platforms (even should there be
    one that natively uses all 256).  To get all 256, the same things like
    'use feature "unicode_strings"' must now be done.

M       autodoc.pl
M       handy.h
M       pod/perlfunc.pod
M       pod/perlre.pod
M       pod/perlrecharclass.pod
M       pod/perlunicode.pod
M       pod/perlunifaq.pod

commit 702eccb77c9a1f094d3dff96b3c281bfd3f42c7b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 21 13:47:52 2013 -0700

    handy.h: Solve a failure to compile problem under EBCDIC
    
    handy.h is included in files that don't include perl.h, and hence not
    utf8.h.  We can't rely therefore on the ASCII/EBCDIC conversion
    macros being available to us.  The best way to cope is to use the native
    ctype functions.  Most, but not all, of the macros in this commit
    currently resolve to use those native ones, but a future commit will
    change that.

M       handy.h

commit 920ff90096cf7ec8ac38022b306c05bb1763325d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 21 13:35:12 2013 -0700

    handy.h: Simplify some macro definitions
    
    Now, only one of the macros relies on magic numbers (isPRINT), leading
    to clearer definitions.

M       handy.h

commit b04620d051a74b451d278d8eb020660b2f7bb014
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 21 13:26:49 2013 -0700

    handy.h: Combine macros that are same in ASCII, EBCDIC
    
    These 4 macros can have the same RHS for their ASCII and EBCDIC
    versions, so no need to duplicate their definitions
    
    This also enables the EBCDIC versions to not have undefined expansions
    when compiling without perl.h

M       handy.h

commit c3756a316ce993ac803dc374c016b5646cc84243
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 10:39:48 2013 -0700

    Deprecate NATIVE_TO_NEED and ASCII_TO_NEED
    
    These macros are no longer called in the Perl core.  This commit turns
    them into functions so that they can use gcc's deprecation facility.
    
    I believe these were defective right from the beginning, and I have
    struggled to understand what's going on.  From the name, it appears
    NATIVE_TO_NEED taks a native byte and turns it into UTF-8 if the
    appropriate parameter indicates that.  But that is impossible to do
    correctly from that API, as for variant characters, it needs to return
    two bytes.  It could only work correctly if ch is an I8 byte, which
    isn't native, and hence the name would be wrong.
    
    Similar arguments for ASCII_TO_NEED.
    
    The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
    does what I think NATIVE_TO_NEED intended.

M       embed.fnc
M       mathoms.c
M       proto.h
M       toke.c
M       utf8.h
M       utfebcdic.h

commit 680bad8bb16cb1e9c0f8404535c60e0c2cd5712f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 10:26:43 2013 -0700

    Remove remaining calls of NATIVE_TO_NEED
    
    These calls are just copying the input to the output byte by byte.
    There is no need to worry about UTF-8 or not, as the output is just an
    exact copy of the input

M       toke.c

commit 505b73ba3218d2bcf28ebccc1fed137dbc988d14
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 08:12:15 2013 -0700

    toke.c: Remove some NATIVE_TO_NEED calls
    
    I believe NATIVE_TO_NEED is defective, and will remove it in a future
    commit.  But, just in case I'm wrong, I'm doing it in small steps so
    bisects will show the culprit.  This removes the calls to it where the
    parameter is clearly invariant under UTF-8 and UTF-EBCDIC, and so the
    result can't be other than just the parameter.

M       toke.c

commit cced5e9c4f0c09eef4254a9a5f650743bf0077e6
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 08:22:07 2013 -0700

    toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas
    
    This code is attempting to deal with the problem of holes in the ranges
    a-z and A-Z in EBCDIC.  Prior to this patch, it accepeted things like A
    WITH GRAVE, etc, which shouldn't have the special processing to deal
    with the holes

M       toke.c

commit d3257ce5da7cd199be29eda4818ef67415226b10
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 19 15:13:19 2013 -0700

    Use real illegal UTF-8 byte
    
    The code here was wrong in assuming that \xFF is not legal in UTF-8
    encoded strings.  It currently doesn't work due to a bug, but that may
    eventually be fixed: [perl #116867].  The comments are also wrong that
    all bytes are legal in UTF-EBCDIC.
    
    It turns out that in well-formed UTF-8, the bytes C0 and C1 never appear
    (C2, C3, and C4 as well in UTF-EBCDIC), as they would be the start byte
    of an illegal overlong sequence.
    
    This creates a #define for an illegal byte using one of the real illegal
    ones, and changes the code to use that.
    
    No test is included due to #116867.

M       op.c
M       toke.c
M       utf8.h

commit b6ac44edadca5cee52c5aa33a682063d5075749c
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 14:00:13 2013 -0700

    toke.c: Don't remap \N{} for EBCDIC
    
    Everything is now in native,

M       toke.c

commit ddbc29df5b811a5c63be1ff1f2e747889416f8d7
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 13:50:45 2013 -0700

    toke.c: Remove remapping for EBCDIC for octal
    
    The code prior to this commit converted something like \04 into its
    EBCDIC equivalent only in double-quoted strings.  This was not done in
    patterns, and so gave inconsistent results.  The correct thing to do
    should be to do the native thing, what someone who works on a platform
    would think \04 do.  Platform independent characters are available
    through \N{}, either by name or by U+.
    
    The comment changed by this was wrong, as in some cases it was native,
    and in some cases Unicode.

M       toke.c

commit a39d1217aa38f5ebfdd378afc8a37f3f531ff03f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 13:47:13 2013 -0700

    Remove EBCDIC remappings
    
    Now that the tables are stored in native format, we shouldn't be doing
    remapping.
    
    Note that this assumes that the Latin1 casing tables are stored in
    native order; not all of this has been done yet.

M       handy.h
M       perly.c
M       pp.c
M       regcomp.c
M       regexec.c
M       utf8.c

commit 33bc93e764a69e9e2240bad3f652c1bfbb58b452
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 12:46:05 2013 -0700

    Add and use macro to return EBCDIC
    
    The conversion from UTF-8 to code point should generally be to the
    native code point.  This adds a macro to do that, and converts the
    core calls to the existing macro to use the new one instead.  The old
    macro is retained for possible backwards compatibility, though it
    probably should be deprecated.

M       handy.h
M       pp.c
M       regcomp.c
M       regexec.c
M       toke.c
M       utf8.c
M       utf8.h

commit 27613b8810e40a24e3fc1f8cecdcc1a44de40576
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 09:18:06 2013 -0700

    charnames: fix nit in comment

M       lib/_charnames.pm

commit 791cb1ae20e836f892beb3e1e1ff26810149a55f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Feb 16 11:05:44 2013 -0700

    charnames: Make work in EBCDIC
    
    Now that mktables generates native tables, the only thing that was
    needed was to make U+ mean Unicode instead of native.

M       lib/_charnames.pm
M       lib/charnames.pm

commit 33e490b5f39871c63fd5ef79f89499b5449dcec4
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Feb 16 09:35:56 2013 -0700

    Unicode::UCD: Work on non-ASCII platforms
    
    Now that mktables generates native tables, it is a fairly simple matter
    to get Unicode::UCD to work on those platforms.

M       lib/Unicode/UCD.pm

commit 5b9573a95cfe5e2fa460a17ce1cd9104e63f1bec
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 14 22:16:38 2013 -0700

    mktables: Generate native code-point tables
    
    The output tables for mktables are now in the platform's native
    character set.  This means there is no change for ASCII platforms, but
    is a change for EBCDIC ones.
    
    Since we currently don't have any EBCDIC test platforms, I tested this
    by faking it out to generate EBCDIC data, and then eye-balled the
    results.
    
    Code that didn't realize there was a potential difference between EBCDIC
    and non-EBCDIC platforms will now start to work; code that tried to do
    the right thing under these circumstances will no longer work.  Fixing
    that comes in later commits.

M       lib/unicore/mktables

commit 8d3e03f491e3ce0d1a14502446a47ef1e3e476ec
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 14 10:50:00 2013 -0700

    Fix some EBCDIC problems
    
    These spots have native code points, so should be using the macros for
    native code points, instead of Unicode ones.

M       regcomp.c
M       sv.c
M       toke.c

commit d773b3cd2c52e20a73b2a4aebbbc3c042af822b2
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 13 22:10:19 2013 -0700

    Remove unnecessary temp variable in converting to UTF-8
    
    These areas of code included a temporary that is unnecessary.

M       inline.h
M       regcomp.c
M       sv.c

commit e458e4c1ddf6881b1f340e38e66402acf38e04ae
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 13 22:00:55 2013 -0700

    utf8.h: Correct macros for EBCDIC
    
    These macros were incorrect for EBCDIC.  The 3 step process given in
    utfebcdic.h wasn't being followed.

M       utf8.h

commit 580e8f18554fe0f71bc13d088a353bc017e6d8dd
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Feb 9 21:23:30 2013 -0700

    Extract common code to an inline function
    
    This fairly short paradigm is repeated in several places; a later commit
    will improve it.

M       embed.fnc
M       embed.h
M       inline.h
M       pp_pack.c
M       proto.h
M       sv.c
M       toke.c
M       utf8.c

commit 3ca1bfa95ed7c596d668ae6958d477340605c1f9
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 21:35:57 2013 -0700

    Don't use EBCDIC macro for a C language escape
    
    C recognizes '\a' (for BEL); just use that instead of a look-up.
    
    regen/unicode_constants.pl could be used to generate the character for
    the ESC (set in surrounding code), but I didn't do that because of
    potential bootstrapping problems when porting to an EBCDIC platform
    without a working perl.  (The other characters generated in that .pl are
    less likely to cause problems when compiling perl.)

M       regcomp.c
M       toke.c

commit 9d6ae553522706e2ba4974914dc28f373263ab7a
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 19:53:38 2013 -0700

    Use byte domain EBCDIC/LATIN1 macro where appropriate
    
    The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
    whole Unicode range.  In the locations affected by this commit, it is
    known that the domain is limited to a single byte, so the simpler ones
    whose names contain LATIN1 may be used.
    
    On ASCII platforms, all the macros are null, so there is no effective
    change.

M       handy.h
M       regcomp.c
M       utf8.c

commit a2726ddb7b198b088b6961f265ae400888164001
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 14:31:09 2013 -0700

    Use new clearer named #defines
    
    This converts several areas of code to use the more clearly named macros
    introduced in a recent commit

M       op.c
M       toke.c
M       utf8.c
M       utf8.h
M       utfebcdic.h

commit b1027becb814700572ab1acad1663dca3f71890b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 13:52:31 2013 -0700

    utf8.h, utfebcdic.h: Create less confusing #defines
    
    This commit creates macros whose names mean something to me, and I don't
    find confusing.  The older names are retained for backwards
    compatibility.  Future commits will fix bugs I introduced from
    misunderstanding the meaning of the older names.
    
    The older names are now #defined in terms of the newer ones, and moved
    so that they are only defined once, valid for both ASCII and EBCDIC
    platforms.

M       utf8.h
M       utfebcdic.h

commit fa1399fa38641409b3bfe6d1c47885024facf470
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 4 14:22:02 2013 -0700

    pp_ctl.c: Use isCNTRL instead of hard-coded mask
    
    This is clearer and portable to EBCDIC.

M       pp_ctl.c

commit 7f03d5cda1730f9f9c3dec1b6c41b214067f72df
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:51:05 2013 -0700

    utf8.c: is_utf8_char_slow() should use native length
    
    What is passed is the actual length of the native utf8 character.  What
    this was calculating was the length it would be if it were a Unicode
    character, and then compares, apples to oranges.

M       utf8.c
-----------------------------------------------------------------------

--
Perl5 Master Repository

[perl.git] branch khw/ebcdic, created. v5.17.9-211-g5eab0cd

Reply via email to