In perl.git, the branch khw/ebcdic has been created

<http://perl5.git.perl.org/perl.git/commitdiff/5eab0cdb5fb4522ee49456f11931f3132e298f8f?hp=0000000000000000000000000000000000000000>
        at  5eab0cdb5fb4522ee49456f11931f3132e298f8f (commit)

- Log -----------------------------------------------------------------
commit 5eab0cdb5fb4522ee49456f11931f3132e298f8f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Mar 8 11:01:32 2013 -0700

    XXX EBCDIC header files

M charclass_invlists.h
M l1_char_class_tab.h
M unicode_constants.h

commit b04d445fefbaa43db80553dc17262a6981ee49b1
Author: John Goodyear <johng...@us.ibm.com>
Date:   Sat Mar 2 12:31:25 2013 -0700

    XXX Temporary for z/OS long long support

M Configure
M hints/os390.sh

commit 67df5c08cd1fd2eb2becec160727f02c1c212e04
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Mar 10 13:11:07 2013 -0600

    XXX Temporary comment out ParseXS check

    This is to get things to compile for now.

M dist/ExtUtils-ParseXS/lib/ExtUtils/ParseXS.pm

commit 0df90f27f317e20ae53c03f4860e9dd36fe000e6
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Mar 10 11:34:10 2013 -0600

    XXX Collate, Normalize: Allow to compile under EBCDIC

M cpan/Unicode-Collate/Collate.pm
M cpan/Unicode-Collate/mkheader
M cpan/Unicode-Normalize/Normalize.pm
M cpan/Unicode-Normalize/mkheader

commit 09150d7e6dcdadeae95d30562e5841d4c277d34d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 9 21:57:38 2013 -0700

    XXX dquote_static.c: Silence wrong warning on EBCDIC

    Unsure of whether to add the 2nd !isCNTRL_L1 to silence return trip,
    which should be a separate commit anyway.

    This silences an inappropriate warning that doesn't happen on ASCII
    platforms. CTRL-T maps to 0x14 on both ASCII and EBCDIC platforms.
    But 0x14 is a C1 control on EBCDIC, a C0 on ASCII. Therefore the test
    that it's a control should include both C0 and C1, which isCNTRL_L1()
    does.

    Also has a white-space change, outdenting a line so it doesn't wrap in
    an 80 column window.
M dquote_static.c

commit dbe9e4667a31f99490f0e23e30223b29a2421282
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Mar 7 12:08:41 2013 -0700

    utfebcdic.h: Change 'unsigned char' to U8

    This is for consistency with the rest of Perl.

M utfebcdic.h

commit 3d03e09e5d6eb1a7b047fe4f8eec23adcc731092
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Mar 8 08:11:38 2013 -0700

    regen/regcharclass.pl: Make more EBCDIC-friendly

    This commit changes the generated macros so that they work right
    out-of-the-box on non-ASCII platforms for non-UTF-8 inputs. THEY ARE
    WRONG for UTF-8, but this is good enough to get perl bootstrapped onto
    the target platform, and regcharclass.pl can then be run there,
    generating macros that are correct for UTF-8.

M regcharclass.h
M regen/regcharclass.pl

commit da5610eafb6066b458d7ac4fe57519bb38779740
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Mar 6 21:30:01 2013 -0700

    utfebcdic.h: Add (UV) cast

    The operand of this macro is implicitly a UV. Make sure that it is.

M utfebcdic.h

commit e9e33410f3ad60a1f8b9168b8e901c9e89bed13a
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Mar 6 17:04:58 2013 -0700

    handy.h: Allow bootstrapping to non-ASCII platform

    This adds a bunch of macros and moves things around to support
    conditional compilation when Configure is called with
    -DBOOTSTRAP_CHARSET. Doing so causes the usual table-driven macros not
    to be used, since the table may not be valid when bringing Perl up for
    the first time on a non-ASCII platform. This allows it to compile
    using the platform's native C library ctype functions, which should
    work well enough to compile miniperl and allow the table to be made
    valid.
    Then Configure can be re-run without bootstrapping, and normal
    compilation can proceed.

M handy.h
M inline.h

commit 598ffe073b09f8e8753f1ec12887d9b5899b9ad8
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 13:43:26 2013 -0700

    gv.c: Remove EBCDIC dependency

M gv.c

commit a03816435f8cfa1da0d6c677a4543c8cb4119f7e
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 13:00:47 2013 -0700

    toke.c: Remove EBCDIC dependency

M toke.c

commit 222c72555e6776c48a2d835cee183d644fc459b0
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 09:14:25 2013 -0700

    toke.c: Remove character set dependency

    Instead of hard-coding the bit patterns that comprise the Byte Order
    Mark in the UTF-8 or UTF-EBCDIC encodings, use the ones generated for
    the current platform. This removes some EBCDIC-only code.

M toke.c

commit 577c3241c380127cf76ec6cddf625f0ca8921b78
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Mar 4 09:10:27 2013 -0700

    unicode_constants.h: Add #defines for Byte Order Mark

    These will be used in future commits.

M regen/unicode_constants.pl
M unicode_constants.h

commit c44bc3320120012b60c708b0fe44130496fdb70b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 15:04:18 2013 -0700

    XXX: Find a cleaner way. Handle missing is_UTF8_CHAR_utf8_safe

    This macro may not be present. It is currently used only in
    IS_UTF8_CHAR, which itself may be undefined, and code should cope with
    that. This is a work-around until a better solution is found.

M utf8.c
M utf8.h

commit 165725ff92b973a7cc44159b44251a713e23b5cc
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 14:09:04 2013 -0700

    Add Porting tool for help with non-ASCII platforms

    Porting/reorder_l1_char_class_tab.pl is used to bootstrap Perl onto a
    non-ASCII platform with no working Perl.
M MANIFEST
A Porting/reorder_l1_char_class_tab.pl
M regen/mk_PL_charclass.pl

commit 74526830206f06e68509865ac993b5c16e13a10b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 13:06:58 2013 -0700

    inline.h: Reorder functions

    The comment implied that the functions below it in the file were
    deprecated, but in fact only the next two functions were. This
    clarifies that and moves them so they are the final ones in the file.

M inline.h

commit 96e0aafd0c7de8865eb6d50bc74e9454bf1d3755
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 12:33:42 2013 -0700

    utfebcdic.h: Add comment

M utfebcdic.h

commit 028969003dabd9e627336fc76645d84d94acc020
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 12:12:11 2013 -0700

    utf8.h: Clean up START_MARK definition and use

    The previous definition broke good encapsulation rules.
    UTF_START_MARK should return something that fits in a byte; it
    shouldn't be the caller that does this. So the mask is moved into the
    definition. This means it can apply only to the portion that creates
    something larger than a byte.

    Further, the EBCDIC version can be simplified, since 7 is the largest
    possible number of bytes in a UTF-EBCDIC character.

M utf8.h
M utfebcdic.h

commit 8f322df02c9649471364c80ce88daa7a8905f344
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Mar 2 12:05:26 2013 -0700

    utf8.h: Move #includes

    These two files were only being #included for non-EBCDIC compiles;
    they should always be included.

M utf8.h

commit bfb4d32192882d7707dc31a3aa7832db3df4ee86
Author: John Goodyear <johng...@us.ibm.com>
Date:   Sat Mar 2 11:49:14 2013 -0700

    utfebcdic.h: Remove extra parameter expansions

    These two macros were improperly expanding the parameters as well as
    defining the operation, leading to compile errors.
M utfebcdic.h

commit 2bfad75e80a07a20b554945110f2735da6ad0b77
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Mar 1 08:28:52 2013 -0700

    utf8.h: Simplify UTF8_EIGHT_BIT_foo on EBCDIC

    These macros were previously defined in terms of UTF8_TWO_BYTE_HI and
    UTF8_TWO_BYTE_LO. But the EIGHT_BIT versions can use the less general
    and simpler NATIVE_TO_LATIN1 instead of NATIVE_TO_UNI, because the
    input domain is restricted in the EIGHT_BIT versions. Note that on
    ASCII platforms these both expand to the same thing, so the difference
    matters only on EBCDIC.

M utf8.h

commit 8a148438132be9ebd4be875108f43cee9c589abf
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 28 09:25:27 2013 -0700

    XXX temp: show makedepend cerr

M makedepend.SH

commit 965635fe4487a0a73a018517d4e4d86ba90c2382
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 21:59:11 2013 -0700

    makedepend.SH: Split too long lines; properly join

    I had thought that a continuation introduced a space. But no, a
    continuation can happen in the middle of a token. This commit also
    splits lines that are getting very long, to avoid preprocessor
    limitations.

M makedepend.SH

commit de4ff993328cf7a0f33d1be629e290185247a643
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 15:51:28 2013 -0700

    makedepend.SH: White-space only

    Align continuation backslashes

M makedepend.SH

commit b6af465ca19adbf783cf49e53b2f9b17bfb834f4
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 14:39:28 2013 -0700

    makedepend.SH: Remove some unnecessary white space

    Multi-line preprocessor directives are now joined into single lines.
    This can create lines too long for the preprocessor to handle. This
    commit removes blanks adjoining comments that get deleted, which makes
    the lines somewhat less likely to exceed the limit.
    This commit also fixes several [] which were meant to each match a tab
    or a blank, but editors converted the tabs to blanks.

M makedepend.SH

commit 5733537fab2693c8a14297cc83b258c57df28fe9
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 14:30:51 2013 -0700

    makedepend.SH: Retain '/**/' comments

    These comments may actually be necessary.

M makedepend.SH

commit 1e51f6590fb0e09e86a608baa32b67a188b054bd
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 27 08:38:19 2013 -0700

    handy.h: Remove extraneous parens

M handy.h

commit f623ed1c0fb0b81aa58f279aab7a27632f452065
Author: Andy Dougherty <dough...@lafayette.edu>
Date:   Wed Feb 27 13:06:07 2013 -0500

    Disable gcc-style function attributes on z/OS.

    John Goodyear <johng...@us.ibm.com> reports that the z/OS C compiler
    supports the attribute keyword, but not in exactly the same way as
    gcc. Instead of a "warning", the compiler emits an "INFORMATIONAL"
    message that Configure fails to detect. Until Configure is fixed, just
    disable the attributes altogether.

    John Goodyear

M hints/os390.sh

commit aafded08822c4a6c38ff88fa6b530dad7189603a
Author: Andy Dougherty <dough...@lafayette.edu>
Date:   Wed Feb 27 09:12:13 2013 -0500

    Change os390 custom cppstdin script to use fgrep.

    Grep appears to be limited to 2048 characters, and truncates the
    output for cppstdin. Fgrep apparently doesn't have that limit.

    Thanks to John Goodyear <johng...@us.ibm.com> for reporting this.

M hints/os390.sh

commit da91ff86dba83f553b2669527703d88ec8d59a84
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:45:19 2013 -0700

    utf8.c: Use more clearly named macro

    In the case of invariants these two macros should do the same thing,
    but it seems to me that the latter name more clearly indicates what is
    going on.
M utf8.c

commit cad01f65a246f8f19000aa9fb2a8d5a002b75ac0
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:35:12 2013 -0700

    Add macro OFFUNISKIP

    This means use official Unicode code point numbering, not native.
    Doing this converts the existing UNISKIP calls in the code to refer to
    native code points, which is what they meant anyway. The terminology
    is somewhat ambiguous, but I don't think it will cause real confusion.
    NATIVESKIP is also introduced for situations where it is important to
    be precise.

M toke.c
M utf8.c
M utf8.h
M utfebcdic.h

commit 7d7723b58b2a79d867a02d09966bcdfa857770ca
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:22:19 2013 -0700

    toke.c: white space only

M toke.c

commit 4134bb29f9ef6e79d14e79b46ed87be00a94ce89
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 12:08:50 2013 -0700

    utf8.c: Deprecate two functions

    This is to force any code that has been using these functions to
    change. Since the Unicode tables are now stored in native order, these
    functions should only rarely be needed.

    However, their functionality is still needed, and in fact, on ASCII
    platforms, the native functions are #defined to these. So what this
    commit does is rename the functions to something else, and create
    wrappers with the old names, so that anyone using them will get the
    deprecation warning.

M embed.fnc
M embed.h
M mathoms.c
M proto.h
M toke.c
M utf8.c
M utf8.h

commit c47e28459cbb4a233ac2d010f42bdecbd3981e74
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 11:26:09 2013 -0700

    Deprecate uvuni_to_utf8()

    Code should almost never be dealing with non-native code points.

M embed.fnc
M embed.h
M proto.h
M toke.c
M utf8.c
M utf8.h

commit daa32328fcb3591424180e61e8819d57aea18cf6
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 11:02:33 2013 -0700

    Deprecate utf8_to_uni_buf()

    Now that the tables are stored in native order, there is almost no need
    for code to be dealing in Unicode order.
M embed.fnc
M proto.h
M utf8.c

commit e8528bd66195b2b6e48ee6ea52eecf2a37f836bb
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 09:00:18 2013 -0700

    makedepend.SH: Comment out unnecessary code

    This currently causes problems for z/OS. But, since we don't know why
    it was there, I'm leaving it in as a placeholder.

M makedepend.SH

commit bae2d98ec82302ea7a1335e9530892f7e8bef174
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 20:26:44 2013 -0700

    Deprecate valid_utf8_to_uvuni()

    Now that all the tables are stored in native format, there is very
    little reason to use this function; and those who do need this kind of
    functionality should be using the bottom level routine, so as to make
    it clear they are doing nonstandard stuff.

M embed.fnc
M proto.h
M utf8.c

commit 5e30371f74299226efb9510563e3cacef0961d71
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 20:14:26 2013 -0700

    utf8.c: Swap which fcn wraps the other

    This is in preparation for the current wrapee becoming deprecated.

M embed.fnc
M embed.h
M proto.h
M utf8.c
M utf8.h

commit 295c1f68bd1967a5d9ba2ef66f59c402e6bba028
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 19:29:34 2013 -0700

    utf8.c: Skip a no-op

    Since the value is invariant under both UTF-8 and not, we already have
    it in 'uv'; no need to do anything else to get it.

M utf8.c

commit c4387f401bd9262b30cbf6489a970031153dcc5a
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 19:26:50 2013 -0700

    utf8.c: Move comment to where it makes more sense

M utf8.c

commit 2d45ce7f60d0245ba8aa7ae79b05507b17197f2d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:30:10 2013 -0700

    APItest: Test native code points, instead of Unicode

M ext/XS-APItest/APItest.pm
M ext/XS-APItest/APItest.xs
M ext/XS-APItest/t/utf8.t

commit 09d942dd2acf9978d5ce81d9f2f700b70164c68b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:25:08 2013 -0700

    XXX CPAN Normalize

    This converts
    Unicode::Normalize to use the native tables that are used by Perl
    starting in XXX, while using the Unicode-ordered ones that were used
    before then. Another alternative would be to have mktables generate
    just these tables in Unicode ordering.

M cpan/Unicode-Normalize/Normalize.xs

commit dde7f379af5167b0840c4083a488f569c5925e17
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:22:55 2013 -0700

    XXX CPAN prob wrong Collate

    This changes to implicitly use native code points. This is likely
    wrong, as the module comes with its own data, which is probably in
    terms of Unicode.

M cpan/Unicode-Collate/Collate.xs

commit 3aec9efd95cf5b479b945b421fcc950b05730134
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:12:53 2013 -0700

    XXX CPAN Encode.xs

    Use core function if available. This will insulate this code from any
    future changes.

M cpan/Encode/Encode.xs

commit 4626ceae19bcf6e5c28fb2f7396e3dc4af87f0dc
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:04:24 2013 -0700

    XXX CPAN and unsure Encode

M cpan/Encode/Encode.xs
M cpan/Encode/Unicode/Unicode.xs

commit 7d2cf6d2347488d6b6d3bc9b0d354bd78313942d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 25 17:00:47 2013 -0700

    XXX CPAN Encode.xs: fix indent

M cpan/Encode/Encode.xs

commit 74e377cb5abf439c0c302a94454026d3eacdc220
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 17:23:15 2013 -0700

    Don't refer to U+XXXX when we mean native

    These messages say the output number is Unicode, but it is really
    native, so change them to say 0xXXXX instead.

M regen/regcharclass_multi_char_folds.pl
M regexec.c

commit 5261c0f7026866a2823c2c62aed2ffae02a8912e
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:43:59 2013 -0700

    Convert some uvuni() to uvchr()

    All the tables are now based on the native character set, so using
    uvuni() in almost all cases is wrong.
M cygwin/cygwin.c
M doop.c
M op.c
M pp_pack.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c

commit c0529a44aa63a461b9f80c62253850dec31e141c
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:25:47 2013 -0700

    handy.h: White space only

M handy.h

commit e93c1ab9028372d0af805d597878b58677150b17
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:19:49 2013 -0700

    t/test.pl: Allow native/latin1 string conversions to work on utf8.

    These functions no longer have the hard-coded definitions in them, but
    now resolve to internal functions, so that if new encodings are added,
    these will automatically understand them.

    Instead of using tr///, these now go character by character,
    converting to/from ord, which is slower, but allows them to operate on
    utf8 strings.

    Peephole optimization should make these essentially no-ops on ASCII
    platforms.

M t/test.pl

commit 174d794b1f61300d938f9588b53786aa72554876
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 16:05:55 2013 -0700

    t/test.pl: Simplify ord to/from native fcns

    This commit changes these functions from converting to/from a string
    to calling utf8:: functions which operate on ordinals instead.

M t/test.pl

commit 6f6c57eca515b946f8c4d12420bca953cdf5b8ef
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 15:35:38 2013 -0700

    Make casing tables native

    These are the last tables that had not yet been converted to native
    character set casing.

M perl.h
M utfebcdic.h

commit 85122aa001b39c0bd21301544c830dbc6d9f8767
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 24 15:32:30 2013 -0700

    utfebcdic.h: Remove trailing spaces

M utfebcdic.h

commit bcc198da013b12734c8d46ab987a5edeae4ee63d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Fri Feb 22 18:55:26 2013 -0700

    EBCDIC has the unicode bug too

    We have not had a working modern Perl on EBCDIC for some years.
    When I started out, comments and code led me to conclude erroneously
    that natively it supported semantics for all 256 characters 0-255. It
    turns out that I was wrong; natively (at least on some platforms) it
    has the same rules (essentially none) for the characters which don't
    correspond to ASCII ones as ASCII platforms have for those characters.

    This commit forces those rules on EBCDIC platforms (even should there
    be one that natively uses all 256). To get all 256, the same steps as
    on ASCII platforms, such as 'use feature "unicode_strings"', must now
    be taken.

M autodoc.pl
M handy.h
M pod/perlfunc.pod
M pod/perlre.pod
M pod/perlrecharclass.pod
M pod/perlunicode.pod
M pod/perlunifaq.pod

commit 702eccb77c9a1f094d3dff96b3c281bfd3f42c7b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 21 13:47:52 2013 -0700

    handy.h: Solve a failure to compile problem under EBCDIC

    handy.h is included in files that don't include perl.h, and hence not
    utf8.h. We therefore can't rely on the ASCII/EBCDIC conversion macros
    being available to us. The best way to cope is to use the native ctype
    functions. Most, but not all, of the macros in this commit currently
    resolve to use those native ones, but a future commit will change
    that.

M handy.h

commit 920ff90096cf7ec8ac38022b306c05bb1763325d
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 21 13:35:12 2013 -0700

    handy.h: Simplify some macro definitions

    Now, only one of the macros relies on magic numbers (isPRINT), leading
    to clearer definitions.
M handy.h

commit b04620d051a74b451d278d8eb020660b2f7bb014
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 21 13:26:49 2013 -0700

    handy.h: Combine macros that are same in ASCII, EBCDIC

    These 4 macros can have the same RHS for their ASCII and EBCDIC
    versions, so there is no need to duplicate their definitions.

    This also enables the EBCDIC versions to not have undefined expansions
    when compiling without perl.h.

M handy.h

commit c3756a316ce993ac803dc374c016b5646cc84243
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 10:39:48 2013 -0700

    Deprecate NATIVE_TO_NEED and ASCII_TO_NEED

    These macros are no longer called in the Perl core. This commit turns
    them into functions so that they can use gcc's deprecation facility.

    I believe these were defective right from the beginning, and I have
    struggled to understand what's going on. From the name, it appears
    NATIVE_TO_NEED takes a native byte and turns it into UTF-8 if the
    appropriate parameter indicates that. But that is impossible to do
    correctly from that API, as for variant characters it needs to return
    two bytes. It could only work correctly if ch is an I8 byte, which
    isn't native, and hence the name would be wrong. Similar arguments
    apply to ASCII_TO_NEED.

    The function S_append_utf8_from_native_byte(const U8 byte, U8** dest)
    does what I think NATIVE_TO_NEED intended.

M embed.fnc
M mathoms.c
M proto.h
M toke.c
M utf8.h
M utfebcdic.h

commit 680bad8bb16cb1e9c0f8404535c60e0c2cd5712f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 10:26:43 2013 -0700

    Remove remaining calls of NATIVE_TO_NEED

    These calls are just copying the input to the output byte by byte.
    There is no need to worry about UTF-8 or not, as the output is just an
    exact copy of the input.

M toke.c

commit 505b73ba3218d2bcf28ebccc1fed137dbc988d14
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 08:12:15 2013 -0700

    toke.c: Remove some NATIVE_TO_NEED calls

    I believe NATIVE_TO_NEED is defective, and will remove it in a future
    commit. But, just in case I'm wrong, I'm doing it in small steps so
    bisects will show the culprit.

    This removes the calls to it where the parameter is clearly invariant
    under UTF-8 and UTF-EBCDIC, and so the result can't be other than just
    the parameter.

M toke.c

commit cced5e9c4f0c09eef4254a9a5f650743bf0077e6
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 20 08:22:07 2013 -0700

    toke.c: in [A-Za-z] use macros that exclude non-ASCII alphas

    This code is attempting to deal with the problem of holes in the
    ranges a-z and A-Z in EBCDIC. Prior to this patch, it accepted things
    like A WITH GRAVE, etc., which shouldn't get the special processing
    that deals with the holes.

M toke.c

commit d3257ce5da7cd199be29eda4818ef67415226b10
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 19 15:13:19 2013 -0700

    Use real illegal UTF-8 byte

    The code here was wrong in assuming that \xFF is not legal in UTF-8
    encoded strings. It currently doesn't work due to a bug, but that may
    eventually be fixed: [perl #116867]. The comments are also wrong in
    saying that all bytes are legal in UTF-EBCDIC. It turns out that in
    well-formed UTF-8, the bytes C0 and C1 never appear (C2, C3, and C4 as
    well in UTF-EBCDIC), as they would be the start byte of an illegal
    overlong sequence. This creates a #define for an illegal byte using
    one of the real illegal ones, and changes the code to use that.

    No test is included due to #116867.
M op.c
M toke.c
M utf8.h

commit b6ac44edadca5cee52c5aa33a682063d5075749c
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 14:00:13 2013 -0700

    toke.c: Don't remap \N{} for EBCDIC

    Everything is now native.

M toke.c

commit ddbc29df5b811a5c63be1ff1f2e747889416f8d7
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 13:50:45 2013 -0700

    toke.c: Remove remapping for EBCDIC for octal

    The code prior to this commit converted something like \04 into its
    EBCDIC equivalent only in double-quoted strings. This was not done in
    patterns, and so gave inconsistent results. The correct thing is to do
    the native thing: what someone who works on that platform would expect
    \04 to mean. Platform-independent characters are available through
    \N{}, either by name or by U+.

    The comment changed by this was wrong, as in some cases it was native,
    and in some cases Unicode.

M toke.c

commit a39d1217aa38f5ebfdd378afc8a37f3f531ff03f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 13:47:13 2013 -0700

    Remove EBCDIC remappings

    Now that the tables are stored in native format, we shouldn't be doing
    remapping.

    Note that this assumes that the Latin1 casing tables are stored in
    native order; not all of this has been done yet.

M handy.h
M perly.c
M pp.c
M regcomp.c
M regexec.c
M utf8.c

commit 33bc93e764a69e9e2240bad3f652c1bfbb58b452
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 12:46:05 2013 -0700

    Add and use macro to return EBCDIC

    The conversion from UTF-8 to code point should generally be to the
    native code point. This adds a macro to do that, and converts the core
    calls to the existing macro to use the new one instead. The old macro
    is retained for possible backwards compatibility, though it probably
    should be deprecated.
M handy.h
M pp.c
M regcomp.c
M regexec.c
M toke.c
M utf8.c
M utf8.h

commit 27613b8810e40a24e3fc1f8cecdcc1a44de40576
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sun Feb 17 09:18:06 2013 -0700

    charnames: fix nit in comment

M lib/_charnames.pm

commit 791cb1ae20e836f892beb3e1e1ff26810149a55f
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Feb 16 11:05:44 2013 -0700

    charnames: Make work in EBCDIC

    Now that mktables generates native tables, the only thing that was
    needed was to make U+ mean Unicode instead of native.

M lib/_charnames.pm
M lib/charnames.pm

commit 33e490b5f39871c63fd5ef79f89499b5449dcec4
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Feb 16 09:35:56 2013 -0700

    Unicode::UCD: Work on non-ASCII platforms

    Now that mktables generates native tables, it is a fairly simple matter
    to get Unicode::UCD to work on those platforms.

M lib/Unicode/UCD.pm

commit 5b9573a95cfe5e2fa460a17ce1cd9104e63f1bec
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 14 22:16:38 2013 -0700

    mktables: Generate native code-point tables

    The output tables for mktables are now in the platform's native
    character set. This means no change for ASCII platforms, but a change
    for EBCDIC ones.

    Since we currently don't have any EBCDIC test platforms, I tested this
    by faking it out to generate EBCDIC data, and then eye-balled the
    results.

    Code that didn't realize there was a potential difference between
    EBCDIC and non-EBCDIC platforms will now start to work; code that tried
    to do the right thing under these circumstances will no longer work.
    Fixing that comes in later commits.

M lib/unicore/mktables

commit 8d3e03f491e3ce0d1a14502446a47ef1e3e476ec
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 14 10:50:00 2013 -0700

    Fix some EBCDIC problems

    These spots have native code points, so they should be using the
    macros for native code points instead of the Unicode ones.
M regcomp.c
M sv.c
M toke.c

commit d773b3cd2c52e20a73b2a4aebbbc3c042af822b2
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 13 22:10:19 2013 -0700

    Remove unnecessary temp variable in converting to UTF-8

    These areas of code included a temporary that is unnecessary.

M inline.h
M regcomp.c
M sv.c

commit e458e4c1ddf6881b1f340e38e66402acf38e04ae
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Wed Feb 13 22:00:55 2013 -0700

    utf8.h: Correct macros for EBCDIC

    These macros were incorrect for EBCDIC. The 3 step process given in
    utfebcdic.h wasn't being followed.

M utf8.h

commit 580e8f18554fe0f71bc13d088a353bc017e6d8dd
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Sat Feb 9 21:23:30 2013 -0700

    Extract common code to an inline function

    This fairly short paradigm is repeated in several places; a later
    commit will improve it.

M embed.fnc
M embed.h
M inline.h
M pp_pack.c
M proto.h
M sv.c
M toke.c
M utf8.c

commit 3ca1bfa95ed7c596d668ae6958d477340605c1f9
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 21:35:57 2013 -0700

    Don't use EBCDIC macro for a C language escape

    C recognizes '\a' (for BEL); just use that instead of a look-up.

    regen/unicode_constants.pl could be used to generate the character for
    the ESC (set in surrounding code), but I didn't do that because of
    potential bootstrapping problems when porting to an EBCDIC platform
    without a working perl. (The other characters generated in that .pl
    are less likely to cause problems when compiling perl.)

M regcomp.c
M toke.c

commit 9d6ae553522706e2ba4974914dc28f373263ab7a
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 19:53:38 2013 -0700

    Use byte domain EBCDIC/LATIN1 macro where appropriate

    The macros like NATIVE_TO_UNI will work on EBCDIC, but operate on the
    whole Unicode range. In the locations affected by this commit, it is
    known that the domain is limited to a single byte, so the simpler ones
    whose names contain LATIN1 may be used.
    On ASCII platforms, all the macros are null, so there is no effective
    change.

M handy.h
M regcomp.c
M utf8.c

commit a2726ddb7b198b088b6961f265ae400888164001
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 14:31:09 2013 -0700

    Use new more clearly named #defines

    This converts several areas of code to use the more clearly named
    macros introduced in a recent commit.

M op.c
M toke.c
M utf8.c
M utf8.h
M utfebcdic.h

commit b1027becb814700572ab1acad1663dca3f71890b
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Thu Feb 7 13:52:31 2013 -0700

    utf8.h, utfebcdic.h: Create less confusing #defines

    This commit creates macros whose names mean something to me, and I
    don't find confusing. The older names are retained for backwards
    compatibility.

    Future commits will fix bugs I introduced from misunderstanding the
    meaning of the older names.

    The older names are now #defined in terms of the newer ones, and moved
    so that they are only defined once, valid for both ASCII and EBCDIC
    platforms.

M utf8.h
M utfebcdic.h

commit fa1399fa38641409b3bfe6d1c47885024facf470
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Mon Feb 4 14:22:02 2013 -0700

    pp_ctl.c: Use isCNTRL instead of hard-coded mask

    This is clearer and portable to EBCDIC.

M pp_ctl.c

commit 7f03d5cda1730f9f9c3dec1b6c41b214067f72df
Author: Karl Williamson <pub...@khwilliamson.com>
Date:   Tue Feb 26 13:51:05 2013 -0700

    utf8.c: is_utf8_char_slow() should use native length

    What is passed is the actual length of the native utf8 character. What
    this was calculating was the length it would be if it were a Unicode
    character, and then comparing the two: apples to oranges.

M utf8.c

-----------------------------------------------------------------------

--
Perl5 Master Repository