Branch: refs/heads/yves/refactored_and_improved_mph_code
  Home:   https://github.com/Perl/perl5
  Commit: 34ae082d4674816a90dc8d8566dbfb66642d3a4b
      
https://github.com/Perl/perl5/commit/34ae082d4674816a90dc8d8566dbfb66642d3a4b
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M t/op/magic.t

  Log Message:
  -----------
  op/magic.t - make $SIG{ALRM} local test deal with inherited IGNORE'd signal

Signal ignores can be set up by a parent process and the child process
will inherit them. So in some situations, when we run the test,
$SIG{ALRM} might be "IGNORE" and not undef. So instead of just expecting
undef, check what it was before the localization, and then check it is
still that value afterwards.

I found this while running make test via rebase --exec, something like
this reduction (from Matthew Horsfall):

    $ git rebase --exec='perl -le"print \$SIG{ALRM}"' HEAD~1
    IGNORE

A similar case for a different signal would be this example (from Leon
Timmermans):

    $ nohup perl -E 'say $SIG{HUP}' 2>/dev/null | cat
    IGNORE
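
The save-then-compare pattern described above can be sketched as
follows. This is a minimal illustration in Python, not the Perl code
from t/op/magic.t. The helper name with_temporary_handler is
hypothetical, and the sketch assumes a POSIX system where SIGALRM
exists and that it runs in the main thread.

```python
import signal


def with_temporary_handler(signum, handler, body):
    """Install handler while body() runs, then restore the previous one.

    Returns the disposition seen before the change, so a caller can
    compare against it instead of assuming the default.
    """
    before = signal.getsignal(signum)  # SIG_IGN if a parent ignored it
    signal.signal(signum, handler)
    try:
        body()
    finally:
        signal.signal(signum, before)
    return before


# Simulate a parent process that left SIGALRM ignored.
signal.signal(signal.SIGALRM, signal.SIG_IGN)

before = with_temporary_handler(
    signal.SIGALRM, lambda signum, frame: None, lambda: None
)
after = signal.getsignal(signal.SIGALRM)
```

The point is that the check compares "after" with "before" rather
than with a hard coded default.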


  Commit: b645c3905b03d9b23f1505d8d9e28507abbab521
      
https://github.com/Perl/perl5/commit/b645c3905b03d9b23f1505d8d9e28507abbab521
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove unnecessary use of bignum

We can detect we are on a 32 bit perl and then do the right thing. This
*massively* speeds up the process. Hashing character by character with
bignum enabled is painfully slow.


  Commit: 326fc1fe71030f5c78393bdb3b28d91767229155
      
https://github.com/Perl/perl5/commit/326fc1fe71030f5c78393bdb3b28d91767229155
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - document build_split_words()

Explain the behavior and inputs and outputs of build_split_words()


  Commit: fe602516dcadefe6e00768a2e34fd24aecb6d644
      
https://github.com/Perl/perl5/commit/fe602516dcadefe6e00768a2e34fd24aecb6d644
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - refactor common sort routine out

This sort expression gets repeated a lot and it is quite long, replace
with a sub.


  Commit: 13cf268b64b9a2eb783e7a3415f661efb3987260
      
https://github.com/Perl/perl5/commit/13cf268b64b9a2eb783e7a3415f661efb3987260
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove unused arguments from build_split_words()

These were vestigial from a previous implementation, no point in
leaving them in, and they impact debugging.

As part of this it makes sense to rename %res to $res in
build_scalar_words() as it makes it easier to make $old_res= $res
when we retry.


  Commit: ded75ec1da80b1ce0281f04e4367fd5744624142
      
https://github.com/Perl/perl5/commit/ded75ec1da80b1ce0281f04e4367fd5744624142
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - don't use $key when dealing with $part of a key

In build_split_words() $key is used for the main hash, use
$part instead when we are dealing with part of a key (even
if the $part might equal the $key).


  Commit: e4a66cbd997691729a9fdd0f79d9d9a31449c505
      
https://github.com/Perl/perl5/commit/e4a66cbd997691729a9fdd0f79d9d9a31449c505
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - don't null terminate in preprocess stage

Removing the null termination allows us to save about 100 bytes
from the final result.


  Commit: b82389bc5ed10a64ace577eacd610326e9dffbee
      
https://github.com/Perl/perl5/commit/b82389bc5ed10a64ace577eacd610326e9dffbee
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - track added bytes while we process/check the blob

It is possible that the $blob grows while we validate it, if so
show how much it grew by and produce some diagnostics about it.
We also track how many passes we have done. With this commit it
is only used to make the new diagnostics a bit cleaner, but we will
use it in more diagnostics later.


  Commit: 0f249cbd839eee5ff287b976b6efc243b3468927
      
https://github.com/Perl/perl5/commit/0f249cbd839eee5ff287b976b6efc243b3468927
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - do a bit less work in build_split_words()

We do not need to check the split point of 0 or of the length of the
string as we do not want empty prefixes (the 0 case) and we have already
checked if the entire string is in the blob in an earlier check
(the length case). This also initializes the $best_prefix to be the full
$key, and the $best_suffix to be the empty string, just in case we cannot
find any split point which would result in the variables being
initialized. This prevents uninitialized warnings when we track these
variables in the %appended hash.
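
As a rough illustration of the loop bounds and the defaults described
above, here is a small Python sketch. It is not the Perl code from
regen/mph.pl. The name find_split is hypothetical, and taking the
first split point whose two halves are both already in the blob is an
assumption, since the message does not say how the "best" split is
scored.

```python
def find_split(key, blob):
    """Look for a split of key whose prefix and suffix both occur in blob.

    Split points 0 and len(key) are skipped: 0 would give an empty
    prefix, and len(key) is the whole-key case that the caller is
    assumed to have checked already.
    """
    # Defaults used when no usable split point exists.
    best_prefix, best_suffix = key, ""
    for i in range(1, len(key)):
        prefix, suffix = key[:i], key[i:]
        if prefix in blob and suffix in blob:
            best_prefix, best_suffix = prefix, suffix
            break
    return best_prefix, best_suffix
```

With these defaults a caller can always record both values, even for
a one character key where the loop body never runs.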


  Commit: fc226c94944ee1e13f77d662ab630f1ca59c5d33
      
https://github.com/Perl/perl5/commit/fc226c94944ee1e13f77d662ab630f1ca59c5d33
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - fix assignment to be consistent with other assignments.

This file uses left hugging assignment operators; this was an exception.


  Commit: a4a147272f2eebb2a5ddc0c0f2978536e0fdf5bc
      
https://github.com/Perl/perl5/commit/a4a147272f2eebb2a5ddc0c0f2978536e0fdf5bc
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - better diags in build_split_words()

This also shows how effective the split process compression has been
by computing what percentage the final blob is compared to the worst
case of naively concatenating all the keys together.
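
The percentage described above amounts to the following arithmetic,
shown here as a Python sketch. The function name blob_percentage is
hypothetical and the real diagnostics may format or round the value
differently.

```python
def blob_percentage(blob, keys):
    """Size of the final blob as a percentage of the worst case,
    where the worst case is all keys concatenated with no sharing."""
    worst_case = sum(len(key) for key in keys)
    return 100.0 * len(blob) / worst_case


# "foobar" covers all four keys, whose lengths sum to 15.
example = blob_percentage("foobar", ["foo", "bar", "foobar", "oba"])
```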


  Commit: ddcbb84d060e952525bba7d5dd23dca83e3f33e1
      
https://github.com/Perl/perl5/commit/ddcbb84d060e952525bba7d5dd23dca83e3f33e1
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - rename var $b2 to $new_blob in build_split_words()

$new_blob is more descriptive and less confusing.


  Commit: 4b216ad2c6a43c55bbfac0ed37269af26d3895e8
      
https://github.com/Perl/perl5/commit/4b216ad2c6a43c55bbfac0ed37269af26d3895e8
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - calculate $length_all_keys in make_mph_from_hash()

Doing it in build_perfect_hash() just results in duplicated logic
between build_split_words() and build_perfect_hash()


  Commit: 584e6129c31235d65e83a4221ed0fb456632782c
      
https://github.com/Perl/perl5/commit/584e6129c31235d65e83a4221ed0fb456632782c
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - handle split data in make_mph_from_hash()

Doing it in build_perfect_hash() does not make sense, we might want to
use that function to build a hash that does not need to be split.


  Commit: 4397d13a25e3a25ecc163e7a8ab8052054ab62a5
      
https://github.com/Perl/perl5/commit/4397d13a25e3a25ecc163e7a8ab8052054ab62a5
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - move require to top of file

mk_invlists.pl does a lot and takes a while before it gets to the part
where it requires regen/mph.pl, which means that if there are issues in
it they aren't discovered until a fair amount of time has elapsed, which
is frustrating when debugging. Moving the require to the top means the
script dies early and can be fixed.

Includes a regen of uni_keywords.h and friends as this changes a regen
script which causes regen.t to fail if its output is not up to date.


  Commit: afa6102029038b2a972a9ed35251c04d78a49ea8
      
https://github.com/Perl/perl5/commit/afa6102029038b2a972a9ed35251c04d78a49ea8
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - move token_name() sub closer to where it is used

sub token_name() was injected into the middle of totally unrelated logic
that does not use it. token_name() is a wrapper around sanitize_name()
so move it next to that sub.

Also includes the output from running regen/mk_invlists.pl to keep
porting/regen.t happy.


  Commit: 2c7654a7a3c2e57de403fbd3e7aa4045e29dbef6
      
https://github.com/Perl/perl5/commit/2c7654a7a3c2e57de403fbd3e7aa4045e29dbef6
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - add a way to dump the keywords hash for review

This adds a way to tell mk_invlists.pl to dump the keywords hash so it
can be reviewed, or used for testing or whatnot. A user can define the
env var DUMP_KEYWORDS_FILE to be a file name which will be used to save
the keywords hash to. If the env var is not set the file won't get
written to disk.
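
The opt-in behaviour can be pictured with this Python sketch. It is
not the code in regen/mk_invlists.pl: the function name
maybe_dump_keywords and the JSON output format are assumptions made
for the example, and only the DUMP_KEYWORDS_FILE variable name comes
from the message.

```python
import json
import os
import tempfile


def maybe_dump_keywords(keywords, env=None):
    """Write keywords to the file named by DUMP_KEYWORDS_FILE, if set.

    Returns True when a file was written and False when the variable
    is unset or empty, in which case nothing touches the disk.
    """
    env = os.environ if env is None else env
    path = env.get("DUMP_KEYWORDS_FILE")
    if not path:
        return False
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(keywords, fh, indent=1, sort_keys=True)
    return True


keywords = {"alpha": 1, "digit": 2}
skipped = maybe_dump_keywords(keywords, env={})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keywords.json")
    written = maybe_dump_keywords(keywords, env={"DUMP_KEYWORDS_FILE": path})
    with open(path, encoding="utf-8") as fh:
        reloaded = json.load(fh)
```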

Includes regenerated output from running regen/mk_invlists.pl to keep
porting/regen.t happy.


  Commit: 59eaa7f0d2ab45bacf6ca574657c566888add11a
      
https://github.com/Perl/perl5/commit/59eaa7f0d2ab45bacf6ca574657c566888add11a
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - replace hard coded literal constants

This replaces the various hard coded constants with either named
constants, or vars. It also renames FNV_CONST to FNV32_PRIME in the
generated code as that is its proper name.


  Commit: 47ea2eab61b932a203d99f2ade4f22962c30b330
      
https://github.com/Perl/perl5/commit/47ea2eab61b932a203d99f2ade4f22962c30b330
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - specify which fnv we are using

This documents and specifies which FNV we are using. Since we are using
fnv1a_32() it also replaces the use of $MASK with U32_MAX. fnv1a_32()
would never use anything other than U32_MAX, even if $MASK changed, as
obviously it is a 32 bit hash function.
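
For reference, the standard 32 bit FNV-1a function looks like this
in Python. This is the published algorithm, not a copy of the Perl
sub in regen/mph.pl. Letting the seed default to the standard offset
basis is an assumption made for the example.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard 32 bit offset basis
FNV32_PRIME = 0x01000193         # standard 32 bit FNV prime
U32_MAX = 0xFFFFFFFF


def fnv1a_32(data: bytes, seed: int = FNV32_OFFSET_BASIS) -> int:
    """FNV-1a: xor each byte in, then multiply by the prime, mod 2**32."""
    h = seed & U32_MAX
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & U32_MAX  # always masked to 32 bits
    return h
```

The mask is fixed at U32_MAX because the function is 32 bit by
definition, which is the reasoning given above.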


  Commit: be3928ec7c14f800d9e1d61010403cfecb49d01d
      
https://github.com/Perl/perl5/commit/be3928ec7c14f800d9e1d61010403cfecb49d01d
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove $max_h check in build_perfect_hash()

Not sure why I put this into the original code; it is a check that
would be relevant to a PRNG but is not relevant to this use case. So
this patch removes it.


  Commit: 782ef7ac52367f8d3b474a960eaec960b5aa1bf0
      
https://github.com/Perl/perl5/commit/782ef7ac52367f8d3b474a960eaec960b5aa1bf0
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - perltidy file for style consistency

and document the perltidy options used so any future maintainers
can follow the style of the file more easily.


  Commit: 88aaccdbb5a7e9ee31c33042fa54d47913c0239a
      
https://github.com/Perl/perl5/commit/88aaccdbb5a7e9ee31c33042fa54d47913c0239a
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove unused var


  Commit: 671414647169ec5282063783313cadf905597b97
      
https://github.com/Perl/perl5/commit/671414647169ec5282063783313cadf905597b97
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - enable fatal warnings

If this script warns then there is something very wrong and it should be
fixed, so just die immediately. Especially as if it is broken it might
just *spew* warnings, which is annoying.


  Commit: 9d258a7f618eab38e1756a7b2137d8fe3bef0898
      
https://github.com/Perl/perl5/commit/9d258a7f618eab38e1756a7b2137d8fe3bef0898
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - convert loop to use block form and add comment

Using the comma operator to separate statements is fine in a one liner,
but not so much in a script that is part of perl regen processes. IMO.


  Commit: 85e69ad5747130fb935b756fb4a8afeeae9a00fe
      
https://github.com/Perl/perl5/commit/85e69ad5747130fb935b756fb4a8afeeae9a00fe
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - add sanity check for idx and value parameters

If either are missing then something has gone badly wrong and we should
stop processing immediately.


  Commit: 5731ac49e533b9c3cedb4d8d320011f18b6fd053
      
https://github.com/Perl/perl5/commit/5731ac49e533b9c3cedb4d8d320011f18b6fd053
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - bucket info storage: remove 'hash', move 'value' logic

The 'hash' key is totally unused and unneeded so drop it entirely.

The 'value' key can be stored into the bucket info data elsewhere,
strictly speaking it is not needed for the minimal perfect hash
computations, so move it out so that logic can be changed and
simplified in a future patch.


  Commit: 428f849d5eda3298bb8b982571ae67eba82b51ff
      
https://github.com/Perl/perl5/commit/428f849d5eda3298bb8b982571ae67eba82b51ff
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - move bucket info construction logic into a sub

In a follow up patch this logic will get called from more than one place
so move it to a sub so it can be used easily.


  Commit: 82a8cb7ac45feccd914c1e3ae6ca7abbe872c7f4
      
https://github.com/Perl/perl5/commit/82a8cb7ac45feccd914c1e3ae6ca7abbe872c7f4
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - change $hash argument to $source_hash

The term 'hash' is overused in this code, rename the $hash argument
to build_perfect_hash() to $source_hash to disambiguate.


  Commit: d646a7f47d9e053658b64b9341e15d0b540e6c00
      
https://github.com/Perl/perl5/commit/d646a7f47d9e053658b64b9341e15d0b540e6c00
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - eliminate the need to use goto

The goto is confusing, and has the potential to introduce its own bugs
if future changes are not careful, so get rid of it completely and
break build_perfect_hash() into two subs.


  Commit: 2038878ece684aa1e68a2456fe0a8610d2ba4414
      
https://github.com/Perl/perl5/commit/2038878ece684aa1e68a2456fe0a8610d2ba4414
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - use more efficient logic in build_perfect_hash()

This patch greatly reduces the amount of work we have to do to find a
bucket for keys whose first level hash does not produce any collisions.

We have two cases, those items whose $hash2 is higher than $MAX_SEED2
and those items whose $hash2 is smaller than or equal to $MAX_SEED2.

For the former case we use a similar but streamlined process as we do
for keys whose first level hash produces collisions. For the latter case
we can trivially map the items into any bucket we choose.


  Commit: 69376b62b982de22c2fdc7cfc91c0e58e450c2a1
      
https://github.com/Perl/perl5/commit/69376b62b982de22c2fdc7cfc91c0e58e450c2a1
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mph.pl - Clean up diagnostics logic, allow DEBUG from env.

Be silent unless requested to. If DEBUG>1 produce lots of output,
if DEBUG==1 produce some basic information about what is going on.
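
The level scheme can be pictured with this small Python sketch. It
is not the Perl code from regen/mph.pl. The names
debug_level_from_env and make_logger are hypothetical, and the
message texts are invented for the example.

```python
import io
import os


def debug_level_from_env(env=None):
    """Read DEBUG from the environment; unset or invalid means silent."""
    env = os.environ if env is None else env
    try:
        return int(env.get("DEBUG", "0") or 0)
    except ValueError:
        return 0


def make_logger(debug_level, out):
    """Return log(level, message) that prints only if level is enabled."""
    def log(level, message):
        if debug_level >= level:
            out.write(message + "\n")
    return log


quiet, basic, verbose = io.StringIO(), io.StringIO(), io.StringIO()
for level, out in ((0, quiet), (1, basic), (2, verbose)):
    log = make_logger(level, out)
    log(1, "building blob")          # basic progress information
    log(2, "trying split point 3")   # detailed output
```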


  Commit: fd5fd3f854cb4c849bb6b8788d49bf4813a2daf9
      
https://github.com/Perl/perl5/commit/fd5fd3f854cb4c849bb6b8788d49bf4813a2daf9
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - fixup die that is issued if we can't solve this hash

The old code was a bit messed up, but I didn't notice because it doesn't
happen with our current data. But in theory it could. It is possible
that fnv1a has a multicollision vulnerability as it is not a secure
hash, so changing the seed wouldn't help. For now we can assume it does
not.


  Commit: 24da4c1b278e731d104f81901875f05c639928d3
      
https://github.com/Perl/perl5/commit/24da4c1b278e731d104f81901875f05c639928d3
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - change fnv1a_32() to _fnv1a_32() as it is not public

This function is not part of the "public" API for this script/package
and might be removed in the future if we needed to, so mark it as
private with a leading underscore.


  Commit: f9c0d679da0e161557f6bacdc10debbc05c69a96
      
https://github.com/Perl/perl5/commit/f9c0d679da0e161557f6bacdc10debbc05c69a96
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - fixup wording in comment to be grammatical

The original wording of this paragraph was a bit clumsy and not
grammatically correct. This fixes it.


  Commit: dc8b69df5372cabbe882cb234e20f1b34c1773c8
      
https://github.com/Perl/perl5/commit/dc8b69df5372cabbe882cb234e20f1b34c1773c8
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - move split key logic out of make_mph_from_hash

The logic of calling build_split_words() twice, once in "normal" mode
and once in "preprocess" mode does not belong in regen/mph.pl. So this
patch renames build_split_words() to _build_split_words() and creates
a new sub called build_split_words() that implements the "call it
twice" logic.

For consistency the arguments are rearranged as well so the $preprocess
argument is last, as build_split_words() now does not have a $preprocess
argument, as only _build_split_words() needs it.


  Commit: fc3ce90815623b76fd3f177e1c6dadbfe80889b5
      
https://github.com/Perl/perl5/commit/fc3ce90815623b76fd3f177e1c6dadbfe80889b5
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - rename $smart_blob to $blob across file

$smart_blob is unhelpfully specific, rename to more generic $blob


  Commit: 2906a1d56e77bab3976dce1126e27d9bb6cb1778
      
https://github.com/Perl/perl5/commit/2906a1d56e77bab3976dce1126e27d9bb6cb1778
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - whitespace fixups

This removes trailing whitespace only. No logic changes here at all.


  Commit: 3b1713f842b811f7d2cfde269d6f751b4019d1d6
      
https://github.com/Perl/perl5/commit/3b1713f842b811f7d2cfde269d6f751b4019d1d6
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - move file open logic to a sub

This makes all the subs affected support a filehandle or filename.
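
The "filehandle or filename" pattern can be sketched in Python as
below. It is not the Perl sub from regen/mph.pl. The names
open_for_write and emit are hypothetical, and the rule that only
handles opened by the helper get closed is an assumption about
sensible behaviour, not something the message states.

```python
import io
import os
import tempfile


def open_for_write(target):
    """Accept an open handle or a file name.

    Returns (handle, we_opened) so the caller knows whether it is
    responsible for closing the handle.
    """
    if hasattr(target, "write"):
        return target, False
    return open(target, "w", encoding="utf-8"), True


def emit(target, text):
    handle, we_opened = open_for_write(target)
    try:
        handle.write(text)
    finally:
        if we_opened:
            handle.close()


# Same call works with an in-memory handle and with a path.
buffer = io.StringIO()
emit(buffer, "/* generated */\n")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.h")
    emit(path, "/* generated */\n")
    with open(path, encoding="utf-8") as fh:
        on_disk = fh.read()
```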


  Commit: bbd333e6eee3a8de0d88c08b7e2773d2fd1ef7a7
      
https://github.com/Perl/perl5/commit/bbd333e6eee3a8de0d88c08b7e2773d2fd1ef7a7
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove bogus defaulting for undef vars

This was some kind of thinko and never made sense, prefix
and suffix should never be undefined.


  Commit: fd87a368f60d2610f8be2f9d819fbffe183949fc
      
https://github.com/Perl/perl5/commit/fd87a368f60d2610f8be2f9d819fbffe183949fc
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - use $n instead of @$second_level

Same thing but $n is initialized one line above, so use it.


  Commit: 144ab7b4d8693e20877f9e4bbde4e7a372f72ab7
      
https://github.com/Perl/perl5/commit/144ab7b4d8693e20877f9e4bbde4e7a372f72ab7
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - tweaks to generated code, put type on its own line

All the C code we have puts the type on its own line separate from
the function parameter declaration, so follow that style in our
generated file too.

Also show the generator script in the comment that contains metadata
about the file.


  Commit: 2d9dce49fe5b85a3b4d3be2538a016406be333c7
      
https://github.com/Perl/perl5/commit/2d9dce49fe5b85a3b4d3be2538a016406be333c7
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regen/mph.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mph.pl & mk_invlists.pl - convert from sub interfaces to OO interfaces

The old sub based API was passing around an awkward number of arguments
and it was becoming difficult to enhance in certain ways. This patch
changes all the "user serviceable" functions into methods, and moves the
configuration defaults into the constructor.

Note, not all the functions have been converted, the core routines with
simple interfaces have not been changed. This is OO for the purpose of
encapsulation not inheritance or overloading.


  Commit: 00729846bf0e20a2777db3aaffb715364efe5ece
      
https://github.com/Perl/perl5/commit/00729846bf0e20a2777db3aaffb715364efe5ece
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-17 (Sun, 17 Apr 2022)

  Changed paths:
    M AUTHORS
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regen/mph.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mph.pl & mk_invlists.pl - add the "_squeeze" algorithm to produce smaller blobs

The squeeze algorithm produces smaller blobs, 10-20% depending on how it
is used. With the "randomize_squeeze" option enabled it is slower but
produces 20% smaller blobs than the "_simple" strategy we used to use.
With the "randomize_squeeze" option disabled it is about as fast as
"_simple" but produces about 10% smaller blobs. Regardless "_squeeze"
uses more memory than _simple; quite a bit more currently, although that
is unforced and could be changed if required.

    -blob length: 10548
    +blob length: 8635
    ...
    -data size: 69908 (%67.07)
    +data size: 67995 (%65.23)

So it saves 1913 bytes running with this seed. I happened to get lucky
with the seed; depending on the seed used, the blob ended up at about
8650 bytes.

This algorithm is originally by Ilya Sashcheka, so I have added him to
the AUTHORS file, but unfortunately I no longer have his email address
as we lost touch. It contains many modifications by me.


Compare: https://github.com/Perl/perl5/compare/34ae082d4674%5E...00729846bf0e
