Branch: refs/heads/blead
Home:   https://github.com/Perl/perl5
Commit: 1ae36e9bbaa155adcc9230c5e4232b276a2a33fb
    https://github.com/Perl/perl5/commit/1ae36e9bbaa155adcc9230c5e4232b276a2a33fb
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)
Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Convert to conditional ? : operator

I think this makes it less clumsy.

Commit: 523eea4765a728d47445079c80c774a5b3902576
    https://github.com/Perl/perl5/commit/523eea4765a728d47445079c80c774a5b3902576
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Rename variable

This was shadowing an outer variable, and conflating two things. We are looking at the UTF8ness of some strings in a locale to try to divine whether the locale itself is a UTF-8 one. But we're doing this in the context of trying to find the CODESET of the locale, like 8859-1 or UTF-8. And the CODESET name itself is always going to be an ASCII string, so its utf8ness is a separate question. Thus there are two types of utf8ness being looked at here, and the variable names for each should be distinct.

Commit: d8fc44bc3477dad1dbd623be1e193e8a2a70401f
    https://github.com/Perl/perl5/commit/d8fc44bc3477dad1dbd623be1e193e8a2a70401f
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Avoid some mallocs

By reusing this buffer, we don't have to realloc unless the next thing to store in it is bigger than the first. The calls are already ordered so that abbreviations (which are shorter) come after their full names.

Commit: 868f26346d64105799b6828a6bd136ba01464cdc
    https://github.com/Perl/perl5/commit/868f26346d64105799b6828a6bd136ba01464cdc
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Remove potential infinite recursive call

In Configurations where this #ifdef'd code is compiled, we recursively call my_langinfo(). Prior to this commit, the call asked for the UTF8ness of the returned string. Depending on the particular values involved, that could lead to this same code being executed to determine the UTF8ness of the locale.
This would have proceeded ad infinitum, except that a previous commit had created flags to skip any call that would recurse infinitely. But that can lead to erroneous results, because when a call is skipped, we may not know what the answer is.

This commit avoids all that by not asking the recursed call to return the UTF8ness of the string, and instead using a heuristic to get its value here. This avoids needing to know the locale's UTF8ness (which is where the infinite recursion would come from). The heuristic is that if the string is illegal UTF-8, it isn't UTF-8; if it is plain ASCII, we can't tell; and if it is legal UTF-8, it is tentatively considered UTF-8. This is just one iteration of a loop through a bunch of strings, so after accumulating the evidence from all iterations, we have confidence that the total result is correct.

There are other code sections that also have the potential for infinite recursion. The next commits handle these.

Commit: 2a8fb4df77474a4ee6fba9462c25cdb81d0977c9
    https://github.com/Perl/perl5/commit/2a8fb4df77474a4ee6fba9462c25cdb81d0977c9
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Differently avoid infinite recursion

This commit changes the mechanism for avoiding potential infinite recursion in my_localeconv(). Normally the UTF8ness of the locale is determined first. Then all strings returned by localeconv() are examined to see whether their SVs need to be marked as UTF-8. Knowing whether the locale is UTF-8 helps in that determination.

But in figuring that out, some Configurations call this function asking for just a single item. That would lead to infinite recursion. To avoid that, on such Configurations prior to this commit, the UTF8ness of the overall locale wasn't calculated; instead, each item's UTF8ness was calculated individually. It's complicated, but it turns out that doing this finesses the issue. See below for a fuller explanation.
This commit changes things so that for single-item calls, the UTF8ness isn't determined here; instead the caller does it itself, and it generally doesn't need the locale's UTF8ness to make that determination.

To expand on why it's complicated: this situation arises only on Configurations where calculating the UTF8ness of the locale may not be reliable. But it very likely is reliable except for English locales whose currency symbol is plain ASCII, such as those of the USA, Canada, and other former members of the British Empire that use the dollar sign as their currency symbol. (I told you it was complicated.) But for such locales, the strings are all going to be ASCII, so their SVs won't need to be marked as UTF-8, and we don't need to know the locale's UTF8ness.

What the previous mechanism and this new one share is that both use the function get_locale_string_utf8ness_i(), which has the intelligence not to need the locale's UTF8ness for an ASCII string.

Commit: 95a47f01ee33fb949895f73a78f8144fe4c97cef
    https://github.com/Perl/perl5/commit/95a47f01ee33fb949895f73a78f8144fe4c97cef
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M embedvar.h
  M intrpvar.h
  M locale.c
  M sv.c

Log Message:
-----------
locale.c: Collapse near duplicate code; removes recursion

We do the same operation on both an LC_MONETARY string and various LC_TIME strings. Previously the latter was changed to not have the potential for infinite recursion. The goal is to make both instances the same here. This commit does that by having both instances share the same code path. Previously the operation in each was simple, but now it is more complicated, with further revisions to come. This, however, entails setting up some data structures to cope with the difference in locale categories, which this commit does.
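To make the heuristic from commit 868f263 concrete, here is a minimal C sketch of a category-independent routine of the kind this commit describes: it classifies each string as plain ASCII, legal UTF-8, or illegal UTF-8, and accumulates the evidence over a list of strings that could come from LC_MONETARY or LC_TIME alike. It is an illustration under assumptions, not the locale.c code; classify_string(), accumulate(), and both enums are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdict types; these names are not from locale.c. */
typedef enum { STR_ASCII, STR_LEGAL_UTF8, STR_ILLEGAL_UTF8 } str_verdict;
typedef enum { LOCALE_NOT_UTF8, LOCALE_UTF8, LOCALE_UNKNOWN } locale_verdict;

/* Classify one string: all ASCII, well-formed UTF-8 containing non-ASCII,
 * or not well-formed UTF-8 (stray or missing continuation bytes, overlong
 * forms, surrogates, or code points above U+10FFFF). */
static str_verdict classify_string(const unsigned char *s, size_t len)
{
    bool saw_non_ascii = false;
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;             /* number of continuation bytes expected */
        unsigned long cp;     /* code point being decoded */

        if (c < 0x80) { i++; continue; }
        saw_non_ascii = true;

        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return STR_ILLEGAL_UTF8;    /* continuation byte, C0, C1, F5..FF */

        if (len - i - 1 < n) return STR_ILLEGAL_UTF8;   /* truncated */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return STR_ILLEGAL_UTF8;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return STR_ILLEGAL_UTF8;     /* overlong, surrogate, too large */
        i += n + 1;
    }
    return saw_non_ascii ? STR_LEGAL_UTF8 : STR_ASCII;
}

/* Combine evidence from several strings of one locale, whatever category
 * they came from (e.g. a currency symbol and some month names). */
static locale_verdict accumulate(const char *const *strings, size_t count)
{
    bool tentative_utf8 = false;

    for (size_t i = 0; i < count; i++) {
        assert(strings[i] != NULL);
        switch (classify_string((const unsigned char *) strings[i],
                                strlen(strings[i]))) {
          case STR_ILLEGAL_UTF8: return LOCALE_NOT_UTF8; /* conclusive */
          case STR_LEGAL_UTF8:   tentative_utf8 = true;  break;
          case STR_ASCII:        break;                  /* no information */
        }
    }
    return tentative_utf8 ? LOCALE_UTF8 : LOCALE_UNKNOWN;
}
```

With all-ASCII input such as { "$", "January" } the sketch returns LOCALE_UNKNOWN, which is the English-locale ambiguity that the following commits address.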
Commit: 30d9e85a161d1fa966be9e598fa5d8d12430c005
    https://github.com/Perl/perl5/commit/30d9e85a161d1fa966be9e598fa5d8d12430c005
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Move code to earlier and use its result

Under most Configurations, we can determine a locale's code set (like "ISO 8859-1") directly. In a few, we have to use more complicated means, one of which is to see whether the name of the locale conforms to the XPG standard, which includes the code set as part of the name.

It turns out that in most instances, all we care about is whether the code set is UTF-8, and there are ways to determine that even for locale names that don't meet the XPG standard. However, those ways aren't foolproof. Especially with English locales whose currency symbol is ASCII (like a dollar sign or a string of ASCII characters), the code can't make a definitive determination, and chooses the incorrect answer. In those cases, if the locale name does meet the XPG criteria, we can use it as a tie breaker to get the correct answer.

That's what this commit does. It moves the extraction of the code set from the locale name to before the UTF8ness determination, and when there is ambiguity (as in the English locales), it uses the found code set name to resolve it.
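A minimal C sketch of the tie breaker, assuming the usual XPG layout language[_territory][.codeset][@modifier]. The helpers xpg_codeset(), codeset_is_utf8(), and break_tie() are hypothetical, and treating "UTF-8", "utf8", and "UTF_8" as the same code set name is an assumption of the sketch, not something this commit states.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdict type; not from locale.c. */
typedef enum { LOCALE_NOT_UTF8, LOCALE_UTF8, LOCALE_UNKNOWN } locale_verdict;

/* Copy the codeset field of an XPG-style name,
 *   language[_territory][.codeset][@modifier]
 * into buf.  Returns false if there is no codeset field or it won't fit. */
static bool xpg_codeset(const char *name, char *buf, size_t bufsize)
{
    const char *dot, *at;
    size_t len;

    assert(name != NULL && buf != NULL && bufsize > 0);
    dot = strchr(name, '.');
    at = strchr(name, '@');
    if (dot == NULL || (at != NULL && at < dot))
        return false;                 /* no ".codeset" before any "@modifier" */
    len = strcspn(dot + 1, "@");
    if (len == 0 || len >= bufsize)
        return false;
    memcpy(buf, dot + 1, len);
    buf[len] = '\0';
    return true;
}

/* Assumption: case and '-'/'_' separators don't matter in the name. */
static bool codeset_is_utf8(const char *codeset)
{
    char norm[8];
    size_t n = 0;

    for (const char *p = codeset; *p != '\0'; p++) {
        if (*p == '-' || *p == '_')
            continue;
        if (n + 1 >= sizeof norm)
            return false;             /* too long to be a UTF-8 spelling */
        norm[n++] = (char) tolower((unsigned char) *p);
    }
    norm[n] = '\0';
    return strcmp(norm, "utf8") == 0;
}

/* Only an ambiguous verdict from the strings is overridden by the name. */
static locale_verdict break_tie(locale_verdict from_strings,
                                const char *locale_name)
{
    char codeset[64];

    if (from_strings != LOCALE_UNKNOWN)
        return from_strings;
    if (!xpg_codeset(locale_name, codeset, sizeof codeset))
        return LOCALE_UNKNOWN;        /* name isn't XPG-style: still unknown */
    return codeset_is_utf8(codeset) ? LOCALE_UTF8 : LOCALE_NOT_UTF8;
}
```

For example, break_tie(LOCALE_UNKNOWN, "en_US.UTF-8") yields LOCALE_UTF8, while a definite verdict from the strings themselves is never overridden by the name.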
Commit: 976959c8df1a685a1a37bbfa818e585f7859f698
    https://github.com/Perl/perl5/commit/976959c8df1a685a1a37bbfa818e585f7859f698
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: White space, comments only

Recent commits have removed the containing blocks for these two sections of code, so outdent them.

Commit: f85b57b3e4f82dfebc574b19e1d743d5847ec3f7
    https://github.com/Perl/perl5/commit/f85b57b3e4f82dfebc574b19e1d743d5847ec3f7
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Move a couple lines to earlier in the file

This attaches the comment more closely to what it is commenting on.

Commit: ba2aeae3cad4bf8abb07eba1321662da38c683ed
    https://github.com/Perl/perl5/commit/ba2aeae3cad4bf8abb07eba1321662da38c683ed
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Remove some unnecessary code

This code was a relic that had lately been necessary only to prevent potential infinite recursion. The recursion has been removed, so this code can go too.

Commit: 7a10096df44ed38f4ce1b16b41069115ad5b84bf
    https://github.com/Perl/perl5/commit/7a10096df44ed38f4ce1b16b41069115ad5b84bf
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Use MB_CUR_MAX to resolve ambiguities

This code is only compiled for rare Configurations where two C99-required functions aren't available. This likely means that someone ran Configure with arguments to deny their use, which in turn means they have been found to be buggy on this platform.

The code was originally developed before we required C99, but it now functions as a workaround for their absence, and it doesn't give perfect results.
In particular, it fails for English UTF-8 locales whose names don't meet the XPG standard and which use the dollar sign (or another string of ASCII characters) as the currency symbol. But using another C99 feature, MB_CUR_MAX, fixes that. Our experience in the field is that it works well.

Commit: a15d3a085745f320ab9b75bdf56b647a9f9914f9
    https://github.com/Perl/perl5/commit/a15d3a085745f320ab9b75bdf56b647a9f9914f9
Author: Karl Williamson <k...@cpan.org>
Date:   2023-11-12 (Sun, 12 Nov 2023)

Changed paths:
  M locale.c

Log Message:
-----------
locale.c: Use fewer CPU cycles

Now that (from the past few commits) we have the code set name, if any, before calculating the UTF-8ness of the locale, we can skip some calculations. It is very unlikely to be a coincidence for the name to be "UTF-8" and for a string from that locale to be syntactically legal UTF-8 (unless the string is all ASCII). This is because of the highly restrictive syntax of UTF-8. Thus, if we find this situation, we can presume that the name is telling the truth, and we don't have to keep checking all the possible strings as we previously did.

Compare: https://github.com/Perl/perl5/compare/1d74e8214dd5...a15d3a085745
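The fallbacks from commits 7a10096 and a15d3a0 can be sketched together in C, assuming the per-string UTF-8 verdicts have already been computed. MB_CUR_MAX is the standard <stdlib.h> macro giving the maximum number of bytes in a multibyte character under the current LC_CTYPE; treating a value above 1 as UTF-8 is a simplification of this sketch, and decide() and decide_current_locale() are hypothetical functions, not locale.c code. The early return models the "name says UTF-8 and a string is legal non-ASCII UTF-8" shortcut.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>   /* MB_CUR_MAX */

/* Hypothetical names; none of these come from locale.c. */
typedef enum { STR_ASCII, STR_LEGAL_UTF8, STR_ILLEGAL_UTF8 } str_verdict;
typedef enum { LOCALE_NOT_UTF8, LOCALE_UTF8 } locale_verdict;

/* verdicts[]: per-string results from a UTF-8 validity check.
 * name_says_utf8: the locale name's code set spelled UTF-8.
 * mb_cur_max: MB_CUR_MAX while the locale is in effect.
 * *examined receives how many strings were looked at. */
static locale_verdict decide(const str_verdict *verdicts, size_t count,
                             bool name_says_utf8, size_t mb_cur_max,
                             size_t *examined)
{
    bool saw_legal_non_ascii = false;

    assert(examined != NULL);
    for (size_t i = 0; i < count; i++) {
        if (verdicts[i] == STR_ILLEGAL_UTF8) {
            *examined = i + 1;
            return LOCALE_NOT_UTF8;   /* illegal UTF-8 settles it */
        }
        if (verdicts[i] == STR_LEGAL_UTF8) {
            saw_legal_non_ascii = true;
            if (name_says_utf8) {     /* name and content agree: stop early */
                *examined = i + 1;
                return LOCALE_UTF8;
            }
        }
    }
    *examined = count;
    /* Tentative UTF-8 evidence, or all ASCII with a name saying UTF-8. */
    if (saw_legal_non_ascii || name_says_utf8)
        return LOCALE_UTF8;
    /* All ASCII and no usable name: a multibyte LC_CTYPE is taken to mean
     * UTF-8 (a simplification; other multibyte encodings exist). */
    return mb_cur_max > 1 ? LOCALE_UTF8 : LOCALE_NOT_UTF8;
}

/* Same decision using the real MB_CUR_MAX of the current locale. */
static locale_verdict decide_current_locale(const str_verdict *verdicts,
                                            size_t count, bool name_says_utf8,
                                            size_t *examined)
{
    return decide(verdicts, count, name_says_utf8, (size_t) MB_CUR_MAX,
                  examined);
}
```

With verdicts { ASCII, legal, legal, ASCII } and a name that says UTF-8, decide() stops after the second string; without the name, it examines all four before answering.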