Branch: refs/heads/yves/refactored_and_improved_mph_code
  Home:   https://github.com/Perl/perl5
  Commit: ea9ae499478ffd3969df9e4274cfa3926a839789
      
https://github.com/Perl/perl5/commit/ea9ae499478ffd3969df9e4274cfa3926a839789
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M t/op/magic.t

  Log Message:
  -----------
  op/magic.t - make $SIG{ALRM} local test deal with inherited IGNORE'd signal

Signal ignores can be set up by a parent process and the child process
will inherit them. So in some situations $SIG{ALRM} might be "IGNORE"
rather than undef when we test it. Instead of just expecting undef,
check what it was before the localization, and then check it is still
that value afterwards.

I found this while running make test via rebase --exec, something like
this reduction (from Matthew Horsfall):

    $ git rebase --exec='perl -le"print \$SIG{ALRM}"' HEAD~1
    IGNORE

A similar case for a different signal would be this example (from Leon
Timmermans):

    $ nohup perl -E 'say $SIG{HUP}' 2>/dev/null | cat
    IGNORE
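
The check described above can be sketched roughly as follows. This is a
minimal illustration and not the actual code in t/op/magic.t; the helper
same_value() is a hypothetical name introduced here.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two values, treating undef as equal to undef.
sub same_value {
    my ($x, $y) = @_;
    return 1 if !defined $x && !defined $y;
    return 0 if !defined $x || !defined $y;
    return $x eq $y ? 1 : 0;
}

# Remember whatever we inherited: undef, "IGNORE", "DEFAULT", ...
my $before = $SIG{ALRM};
my $inside;
{
    local $SIG{ALRM} = sub { };    # temporarily install a handler
    $inside = $SIG{ALRM};
}
my $after = $SIG{ALRM};

print same_value($before, $after)
    ? "ok - \$SIG{ALRM} restored after local\n"
    : "not ok - \$SIG{ALRM} changed after local\n";
```

Comparing against the saved value rather than against undef makes the
check independent of what the parent process did with the signal.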


  Commit: 023ab9123ad48049a285a5bcf506dbb5515b48dc
      
https://github.com/Perl/perl5/commit/023ab9123ad48049a285a5bcf506dbb5515b48dc
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove unnecessary use of bignum

We can detect we are on a 32 bit perl and then do the right thing. This
*massively* speeds up the process. Hashing character by character with
bignum enabled is painfully slow.
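
As an illustration of the idea, here is one possible way to multiply by
the 32 bit FNV prime modulo 2**32 without bignum. It is a sketch under
assumptions, not necessarily what regen/mph.pl does: the names
HAS_64BIT, mul_fnv32() and mul_fnv32_portable() are hypothetical, and
the integer size is detected by checking whether ~0 exceeds 0xFFFFFFFF.

```perl
use strict;
use warnings;

use constant FNV32_PRIME => 16777619;                   # 2**24 + 403
use constant HAS_64BIT   => (~0 > 0xFFFFFFFF) ? 1 : 0;  # 64 bit integers?

# Fallback for 32 bit perls: every intermediate value stays far below
# 2**53, so it remains exact even when perl promotes it to a double.
sub mul_fnv32_portable {
    my ($h) = @_;
    my $two32 = 2**32;
    my $hi    = $h >> 16;
    my $lo    = $h & 0xFFFF;

    # h * 2**24 mod 2**32 keeps only the low 8 bits of h, shifted up.
    my $sum = ($h & 0xFF) * 2**24;

    # h * 403 mod 2**32, split into two small partial products.
    $sum += (($hi * 403) & 0xFFFF) * 65536;
    $sum -= $two32 if $sum >= $two32;
    $sum += $lo * 403;
    $sum -= $two32 if $sum >= $two32;
    return $sum;
}

# Multiply a 32 bit value by the FNV prime, modulo 2**32.
sub mul_fnv32 {
    my ($h) = @_;
    return mul_fnv32_portable($h) unless HAS_64BIT;

    # With 64 bit integers the product is below 2**57, so it is exact.
    my $product = ($h * FNV32_PRIME) & 0xFFFFFFFF;
    return $product;
}

printf "%08x\n", mul_fnv32(0x811C9DC5 ^ ord("a"));    # prints e40c292c
```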


  Commit: 4bbd3e10bc13051d24475b5607f0e6baeeb38026
      
https://github.com/Perl/perl5/commit/4bbd3e10bc13051d24475b5607f0e6baeeb38026
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - document build_split_words()

Explain the behavior and inputs and outputs of build_split_words()


  Commit: b679abb0c1fbc78e6b400f898005b36755993b09
      
https://github.com/Perl/perl5/commit/b679abb0c1fbc78e6b400f898005b36755993b09
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - refactor common sort routine out

This sort expression gets repeated a lot and it is quite long, so
replace it with a sub.
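
The message does not say what the sort expression is, so the following
only illustrates the refactoring pattern. The comparison (longest
first, ties broken alphabetically) and the name by_length_then_name()
are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical comparison: longest first, ties broken alphabetically.
# A named sort sub reads the package variables $a and $b, just like a
# sort block would.
sub by_length_then_name {
    return length($b) <=> length($a) || $a cmp $b;
}

my @words  = qw(script greek scx sc latin);
my @sorted = sort by_length_then_name @words;
print "@sorted\n";    # prints: script greek latin scx sc
```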


  Commit: 34932729dcd19e1feddb33e368c337e2101169f8
      
https://github.com/Perl/perl5/commit/34932729dcd19e1feddb33e368c337e2101169f8
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove unused arguments from build_split_words()

These were vestigial from a previous implementation; there is no point
in leaving them in, and they impact debugging.

As part of this it makes sense to rename %res to $res in
build_scalar_words() as it makes it easier to make $old_res= $res
when we retry.


  Commit: 0e6367b22002e43cc558610467d80592aeae3033
      
https://github.com/Perl/perl5/commit/0e6367b22002e43cc558610467d80592aeae3033
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - don't use $key when dealing with $part of a key

In build_split_words() $key is used for the main hash, use
$part instead when we are dealing with part of a key (even
if the $part might equal the $key).


  Commit: 29f6eef5440df06a758906132362507c1fb09417
      
https://github.com/Perl/perl5/commit/29f6eef5440df06a758906132362507c1fb09417
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - don't null terminate in preprocess stage

Removing the null termination allows us to save about 100 bytes
from the final result.


  Commit: 08b351e06643df157a68b2c01bd0a0ac172c5224
      
https://github.com/Perl/perl5/commit/08b351e06643df157a68b2c01bd0a0ac172c5224
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - track added bytes while we process/check the blob

It is possible that the $blob grows while we validate it, if so
show how much it grew by and produce some diagnostics about it.
We also track how many passes we have done. With this commit it
is only used to make the new diagnostics a bit cleaner, but we will
use it in more diagnostics later.


  Commit: a1f2f14fc52f2faa542148d6b6b36c56faaaa29b
      
https://github.com/Perl/perl5/commit/a1f2f14fc52f2faa542148d6b6b36c56faaaa29b
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - do a bit less work in build_split_words()

We do not need to check the split point of 0 or of the length of the
string, as we do not want empty prefixes (the 0 case) and we have already
checked whether the entire string is in the blob in an earlier check
(the length case). This also initializes $best_prefix to the full
$key and $best_suffix to the empty string, in case we cannot find any
split point that would set them. This prevents uninitialized warnings
when we track these variables in the %appended hash.
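
A rough sketch of the loop bounds and defaults described above. The
sub name find_split() and its selection rule (take the first split
whose prefix and suffix both already occur in the blob) are assumptions
for illustration; the real build_split_words() chooses its best split
by criteria not described in this message.

```perl
use strict;
use warnings;

# Hypothetical sketch: find a split of $key whose two parts both
# already occur somewhere in $blob.
sub find_split {
    my ($blob, $key) = @_;

    # Defaults for the case where no usable split point exists.
    my ($best_prefix, $best_suffix) = ($key, "");

    # Skip 0 (empty prefix) and length($key) (the whole key, which the
    # caller is assumed to have checked already).
    for my $pos (1 .. length($key) - 1) {
        my $prefix = substr($key, 0, $pos);
        my $suffix = substr($key, $pos);
        if (index($blob, $prefix) >= 0 && index($blob, $suffix) >= 0) {
            ($best_prefix, $best_suffix) = ($prefix, $suffix);
            last;
        }
    }
    return ($best_prefix, $best_suffix);
}

my ($prefix, $suffix) = find_split("foobarbaz", "barfoo");
print "prefix=$prefix suffix=$suffix\n";    # prints: prefix=bar suffix=foo
```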


  Commit: af46264be24a4000113c57fd53ee314b0a110927
      
https://github.com/Perl/perl5/commit/af46264be24a4000113c57fd53ee314b0a110927
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - fix assignment to be consistent with other assignments.

This file uses left hugging assignment operators, this was an exception.


  Commit: 6a620f195ae703bba8eee39a2c08061a3b4cce6c
      
https://github.com/Perl/perl5/commit/6a620f195ae703bba8eee39a2c08061a3b4cce6c
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - better diags in build_split_words()

This also shows how effective the split process compression has been
by computing what percentage the final blob is compared to the worst
case of naively concatenating all the keys together.
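
The percentage itself is simple arithmetic. A minimal sketch, assuming a
non-empty key list (blob_ratio() is a hypothetical name):

```perl
use strict;
use warnings;

# Size of the blob as a percentage of all keys concatenated naively.
sub blob_ratio {
    my ($blob, @keys) = @_;
    my $length_all_keys = 0;
    $length_all_keys += length($_) for @keys;
    return 100 * length($blob) / $length_all_keys;
}

# "foo" and "bar" are both substrings of "foobar", so the blob only
# needs 6 of the 12 bytes that naive concatenation would take.
printf "blob is %.2f%% of the naive size\n",
    blob_ratio("foobar", qw(foo bar foobar));
```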


  Commit: 6b084f2241d54e3bbaad6989dc0e64e5d04bca40
      
https://github.com/Perl/perl5/commit/6b084f2241d54e3bbaad6989dc0e64e5d04bca40
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - rename var $b2 to $new_blob in build_split_words()

$new_blob is more descriptive and less confusing.


  Commit: b5aeb161ad1d06045658f04a6bba9cafb9909741
      
https://github.com/Perl/perl5/commit/b5aeb161ad1d06045658f04a6bba9cafb9909741
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - calculate $length_all_keys in make_mph_from_hash()

Doing it in build_perfect_hash() just results in duplicated logic
between build_split_words() and build_perfect_hash()


  Commit: 7404d3b1f654a54955988406bf2619bdb43b88d5
      
https://github.com/Perl/perl5/commit/7404d3b1f654a54955988406bf2619bdb43b88d5
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - handle split data in make_mph_from_hash()

Doing it in build_perfect_hash() does not make sense, we might want to
use that function to build a hash that does not need to be split.


  Commit: 2201182740b25a5a3bd7bd6203307a83ce5497fb
      
https://github.com/Perl/perl5/commit/2201182740b25a5a3bd7bd6203307a83ce5497fb
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - move require to top of file

mk_invlists.pl does a lot and takes a while before it gets to the part
where it requires regen/mph.pl, which means that if there are issues in
mph.pl they aren't discovered until a fair amount of time elapses, which
is frustrating when debugging. Moving the require to the top means the
script dies early and can be fixed.

Includes a regen of uni_keywords.h and friends as this changes a regen
script which causes regen.t to fail if its output is not up to date.


  Commit: 9e01f0808113eddaff5ac3dbdfd4125352ab5692
      
https://github.com/Perl/perl5/commit/9e01f0808113eddaff5ac3dbdfd4125352ab5692
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - move token_name() sub closer to where it is used

sub token_name() was injected into the middle of totally unrelated logic
that does not use it. token_name() is a wrapper around sanitize_name()
so move it next to that sub.

Also includes the output from running regen/mk_invlists.pl to keep
porting/regen.t happy.


  Commit: 0f3c93cf9059fec55292a93db4778a5ff142d208
      
https://github.com/Perl/perl5/commit/0f3c93cf9059fec55292a93db4778a5ff142d208
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mk_invlists.pl - add a way to dump the keywords hash for review

This adds a way to tell mk_invlists.pl to dump the keywords hash so it
can be reviewed, or used for testing or whatnot. A user can define the
env var DUMP_KEYWORDS_FILE to be a file name which will be used to save
the keywords hash to. If the env var is not set the file won't get
written to disk.
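
One way such an opt-in dump could look. This is a sketch, not the code
in mk_invlists.pl: only the DUMP_KEYWORDS_FILE variable name comes from
this message, while the helper name maybe_dump_keywords() and the use of
Data::Dumper are assumptions.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: write the hash to the file named by the
# DUMP_KEYWORDS_FILE env var. Returns 1 if a dump was written, else 0.
sub maybe_dump_keywords {
    my ($keywords) = @_;
    my $file = $ENV{DUMP_KEYWORDS_FILE};
    return 0 unless defined $file && length $file;

    local $Data::Dumper::Sortkeys = 1;    # stable output, easy to diff
    local $Data::Dumper::Indent   = 1;
    open my $fh, '>', $file
        or die "Cannot open '$file' for writing: $!";
    print {$fh} Data::Dumper->Dump([$keywords], ['*keywords']);
    close $fh or die "Cannot close '$file': $!";
    return 1;
}
```

With something like this in place, a run such as
"DUMP_KEYWORDS_FILE=keywords.txt perl regen/mk_invlists.pl" would leave
the dump in keywords.txt, and a run without the variable writes nothing.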

Includes regenerated output from running regen/mk_invlists.pl to keep
porting/regen.t happy.


  Commit: 7d754d01f57866ad8caae2d6acf18669bbce44c7
      
https://github.com/Perl/perl5/commit/7d754d01f57866ad8caae2d6acf18669bbce44c7
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - replace hard coded literal constants

This replaces the various hard coded constants with either named
constants, or vars. It also renames FNV_CONST to FNV32_PRIME in the
generated code as that is its proper name.


  Commit: 40f03e80207469a5c980b22c23a5a373a91234c6
      
https://github.com/Perl/perl5/commit/40f03e80207469a5c980b22c23a5a373a91234c6
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - specify which fnv we are using

This documents and specifies which FNV we are using. Since we are using
fnv1a_32() it also replaces the use of $MASK with U32_MAX. fnv1a_32()
would never use anything other than U32_MAX, even if $MASK changed, as
it is a 32 bit hash function.
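
For reference, textbook FNV-1a at 32 bits looks roughly like this. It
is a sketch that assumes a perl with 64 bit integers and a byte string
as input; the script's own function may differ in details such as
seeding, and the name fnv1a_32_sketch() is hypothetical.

```perl
use strict;
use warnings;

use constant {
    FNV32_OFFSET_BASIS => 0x811C9DC5,
    FNV32_PRIME        => 16777619,
    U32_MAX            => 0xFFFFFFFF,
};

# Textbook FNV-1a, 32 bit: xor in each byte, then multiply by the prime.
# Assumes 64 bit integers so the product cannot overflow before masking.
sub fnv1a_32_sketch {
    my ($key, $seed) = @_;
    my $hash = defined $seed ? $seed : FNV32_OFFSET_BASIS;
    for my $byte (unpack 'C*', $key) {
        $hash ^= $byte;
        $hash = ($hash * FNV32_PRIME) & U32_MAX;
    }
    return $hash;
}

printf "%08x\n", fnv1a_32_sketch("a");    # prints e40c292c
```

The mask is always U32_MAX here because the result width is part of the
definition of the function, not a tunable parameter.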


  Commit: 86fa8ad69482a3f3293fd4173cd454df8a3e01ac
      
https://github.com/Perl/perl5/commit/86fa8ad69482a3f3293fd4173cd454df8a3e01ac
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove $max_h check in build_perfect_hash()

Not sure why I put this into the original code; it is a check that
would be relevant to a PRNG but is not relevant to this use case. So
this patch removes it.


  Commit: 263c3b75ebc84d5e7f0757fe14981b009a04ad21
      
https://github.com/Perl/perl5/commit/263c3b75ebc84d5e7f0757fe14981b009a04ad21
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - perltidy file for style consistency

and document the perltidy options used so any future maintainers
can follow the style of the file more easily.


  Commit: 60c0fa5bed0922f00204b5f3838cdaf55a8f21b4
      
https://github.com/Perl/perl5/commit/60c0fa5bed0922f00204b5f3838cdaf55a8f21b4
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove unused var


  Commit: 1772e9a089779b3f63f0b80706d832bf3c15f9a8
      
https://github.com/Perl/perl5/commit/1772e9a089779b3f63f0b80706d832bf3c15f9a8
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - enable fatal warnings

If this script warns then there is something very wrong and it should be
fixed, so just die immediately. Especially as if it is broken it might
just *spew* warnings, which is annoying.
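
A small demonstration of the effect, independent of mph.pl itself: with
fatal warnings enabled, a warning inside the lexical scope becomes an
exception.

```perl
use strict;
use warnings FATAL => 'all';

my $ok = eval {
    my $undefined;
    my $sum = $undefined + 1;    # "uninitialized" warning, now fatal
    1;
};
my $error = $@;
print "died: $error" unless $ok;
```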


  Commit: 12d8e421bd053d9114baeb86a0cddb55fcd1ebed
      
https://github.com/Perl/perl5/commit/12d8e421bd053d9114baeb86a0cddb55fcd1ebed
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - convert loop to use block form and add comment

Using the comma operator to separate statements is fine on a one liner,
but not so much in a script that is part of perl regen processes. IMO.
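
To illustrate the two styles (the data and variable names here are made
up for the example and are not taken from mph.pl):

```perl
use strict;
use warnings;

my @lengths = (3, 5, 3, 8);

# One-liner style: two expressions joined by the comma operator.
my (%seen, $total);
$total = 0;
$seen{$_}++, $total += $_ for @lengths;

# Block form: same behavior, with room for a comment per statement.
my (%seen_block, $total_block);
$total_block = 0;
for my $len (@lengths) {
    $seen_block{$len}++;     # how often each length occurs
    $total_block += $len;    # running total of all lengths
}

print "total=$total total_block=$total_block\n";    # both are 19
```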


  Commit: 4e97ebcd0bfbb4c6e12effb28120e68bc907c027
      
https://github.com/Perl/perl5/commit/4e97ebcd0bfbb4c6e12effb28120e68bc907c027
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - add sanity check for idx and value parameters

If either is missing then something has gone badly wrong and we should
stop processing immediately.


  Commit: 47c342cb77fb2d142bd130991ae39f68bd59663b
      
https://github.com/Perl/perl5/commit/47c342cb77fb2d142bd130991ae39f68bd59663b
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - bucket info storage: remove 'hash', move 'value' logic

The 'hash' key is totally unused and unneeded so drop it entirely.

The 'value' key can be stored into the bucket info data elsewhere,
strictly speaking it is not needed for the minimal perfect hash
computations, so move it out so that logic can be changed and
simplified in a future patch.


  Commit: b22625cc977366cb4e24537314da9f87912aac8a
      
https://github.com/Perl/perl5/commit/b22625cc977366cb4e24537314da9f87912aac8a
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - move bucket info construction logic into a sub

In a follow up patch this logic will get called from more than one place
so move it to a sub so it can be used easily.


  Commit: 4f100a10c2da6a1e142011e082bdce349e85673c
      
https://github.com/Perl/perl5/commit/4f100a10c2da6a1e142011e082bdce349e85673c
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - change $hash argument to $source_hash

The term 'hash' is overused in this code, rename the $hash argument
to build_perfect_hash() to $source_hash to disambiguate.


  Commit: e6d5132202b641c1ab02857315eebac2f3af4e2b
      
https://github.com/Perl/perl5/commit/e6d5132202b641c1ab02857315eebac2f3af4e2b
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - eliminate the need to use goto

The goto is confusing, and has the potential to introduce its own bugs
if future changes are not careful, so get rid of it completely and
break build_perfect_hash() into two subs.


  Commit: b6a9348fdf6450fcf17e11328f87fe07b23420e2
      
https://github.com/Perl/perl5/commit/b6a9348fdf6450fcf17e11328f87fe07b23420e2
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - use more efficient logic in build_perfect_hash()

This patch greatly reduces the amount of work we have to do to find a
bucket for keys whose first level hash does not produce any collisions.

We have two cases: items whose $hash2 is higher than $MAX_SEED2, and
items whose $hash2 is smaller than or equal to $MAX_SEED2.

For the former case we use a process similar to, but more streamlined
than, the one we use for keys whose first level hash produces
collisions. For the latter case we can trivially map the items into any
bucket we choose.
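
For readers unfamiliar with the general technique, here is a generic
two level "hash and displace" sketch. It is not the scheme used by
regen/mph.pl: the $hash2 and $MAX_SEED2 cases above are not reproduced,
all names are hypothetical, keys are assumed to be unique, and a perl
with 64 bit integers is assumed.

```perl
use strict;
use warnings;

# Seeded FNV-1a, 32 bit. Assumes a perl with 64 bit integers.
sub fnv1a_32 {
    my ($key, $seed) = @_;
    my $hash = $seed;
    for my $byte (unpack 'C*', $key) {
        $hash = (($hash ^ $byte) * 16777619) & 0xFFFFFFFF;
    }
    return $hash;
}

# Map a key to a slot in 0 .. $n-1, using the better mixed high bits.
sub slot_for {
    my ($key, $seed, $n) = @_;
    my $slot = (fnv1a_32($key, $seed) >> 8) % $n;
    return $slot;
}

# One attempt with a given first level seed; returns undef on failure.
sub try_build {
    my ($keys, $seed1, $max_seed2) = @_;
    my $n       = @$keys;
    my @buckets = map { [] } 1 .. $n;
    push @{ $buckets[ slot_for($_, $seed1, $n) ] }, $_ for @$keys;

    my (@slots, @seed2, @direct);
    # Buckets with collisions first, largest to smallest.
    my @multi = sort { @{ $buckets[$b] } <=> @{ $buckets[$a] } || $a <=> $b }
        grep { @{ $buckets[$_] } > 1 } 0 .. $n - 1;
  BUCKET:
    for my $idx (@multi) {
      SEED:
        for my $seed2 (1 .. $max_seed2) {
            my %placed;    # slot => key for this candidate seed
            for my $key (@{ $buckets[$idx] }) {
                my $slot = slot_for($key, $seed2, $n);
                next SEED if defined($slots[$slot]) || exists $placed{$slot};
                $placed{$slot} = $key;
            }
            $slots[$_] = $placed{$_} for keys %placed;
            $seed2[$idx] = $seed2;
            next BUCKET;
        }
        return;    # no seed2 worked, caller should try another seed1
    }
    # Buckets with a single key can go straight into any free slot.
    my @free = grep { !defined $slots[$_] } 0 .. $n - 1;
    for my $idx (grep { @{ $buckets[$_] } == 1 } 0 .. $n - 1) {
        my $slot = shift @free;
        $slots[$slot] = $buckets[$idx][0];
        $direct[$idx] = $slot;
    }
    return { seed1 => $seed1, seed2 => \@seed2, direct => \@direct,
             slots => \@slots };
}

sub build_mph {
    my ($keys, $max_seed2) = @_;
    for my $seed1 (1 .. 100) {
        my $mph = try_build($keys, $seed1, $max_seed2);
        return $mph if $mph;
    }
    die "Could not find a perfect hash for these keys\n";
}

# Returns the slot for $key, or undef if $key is not in the table.
sub lookup {
    my ($mph, $key) = @_;
    my $n   = @{ $mph->{slots} };
    my $idx = slot_for($key, $mph->{seed1}, $n);
    my $slot;
    if (defined $mph->{direct}[$idx]) {
        $slot = $mph->{direct}[$idx];
    }
    elsif (defined $mph->{seed2}[$idx]) {
        $slot = slot_for($key, $mph->{seed2}[$idx], $n);
    }
    return undef unless defined $slot && $mph->{slots}[$slot] eq $key;
    return $slot;
}

my @keys = qw(alpha beta gamma delta epsilon zeta eta);
my $mph  = build_mph(\@keys, 1000);
printf "%-8s => slot %d\n", $_, lookup($mph, $_) for @keys;
```

The point that carries over is the asymmetry: colliding buckets need a
search for a seed, while a bucket holding a single key needs no search
at all and can simply be assigned a free slot.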


  Commit: ea5757390dc2e7ec2202095576e8a48687fa1ae0
      
https://github.com/Perl/perl5/commit/ea5757390dc2e7ec2202095576e8a48687fa1ae0
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mph.pl - Clean up diagnostics logic, allow DEBUG from env.

Be silent unless requested to. If DEBUG>1 produce lots of output,
if DEBUG==1 produce some basic information about what is going on.
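
The levels could be handled along these lines. This is a sketch; the
helper debug_lines() and its "basic"/"verbose" arguments are
hypothetical, and only the DEBUG env var comes from this message.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the messages to show for a given level.
# Level 0 is silent, 1 shows basic info, anything above 1 shows it all.
sub debug_lines {
    my ($level, %msg) = @_;
    my @out;
    push @out, $msg{basic}   if $level >= 1 && defined $msg{basic};
    push @out, $msg{verbose} if $level > 1  && defined $msg{verbose};
    return @out;
}

my $DEBUG = $ENV{DEBUG} || 0;
warn "$_\n"
    for debug_lines($DEBUG,
        basic   => "building perfect hash",
        verbose => "trying next seed");
```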


  Commit: c4a6cd77820e525373065480de0b49f0578f5139
      
https://github.com/Perl/perl5/commit/c4a6cd77820e525373065480de0b49f0578f5139
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - fixup die that is issued if we can't solve this hash

The old code was a bit messed up, but I didn't notice because it doesn't
happen with our current data. But in theory it could. It is possible
that fnv1a has a multicollision vulnerability as it is not a secure
hash, in which case changing the seed wouldn't help. For now we can
assume it does not.


  Commit: 3baf02ed43a04db1f2704044e621c6ee36382b94
      
https://github.com/Perl/perl5/commit/3baf02ed43a04db1f2704044e621c6ee36382b94
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - change fnv1a_32() to _fnv1a_32() as it is not public

This function is not part of the "public" API for this script/package
and might be removed in the future if we need to, so mark it as
private with a leading underscore.


  Commit: 0f653a4ca6609fd8077db5a5fb08025d381c7971
      
https://github.com/Perl/perl5/commit/0f653a4ca6609fd8077db5a5fb08025d381c7971
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - fixup wording in comment to be grammatical

The original wording of this paragraph was a bit clumsy and not
grammatically correct. This fixes it.


  Commit: 3490f956f063b11b030ba02dc9841e339c4ed7b4
      
https://github.com/Perl/perl5/commit/3490f956f063b11b030ba02dc9841e339c4ed7b4
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - move split key logic out of make_mph_from_hash

The logic of calling build_split_words() twice, once in "normal" mode
and once in "preprocess" mode, does not belong in make_mph_from_hash().
So this patch renames build_split_words() to _build_split_words() and
creates a new sub called build_split_words() that implements the "call
it twice" logic.

For consistency the arguments are rearranged as well so the $preprocess
argument is last, as build_split_words() now does not have a $preprocess
argument; only _build_split_words() needs it.


  Commit: 60a124b808161d3b5625d8971107bcf394c4f9f8
      
https://github.com/Perl/perl5/commit/60a124b808161d3b5625d8971107bcf394c4f9f8
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - rename $smart_blob to $blob across file

$smart_blob is unhelpfully specific, rename to more generic $blob


  Commit: ea2a8040cbd09bfc45d18663df801e91c6662c1e
      
https://github.com/Perl/perl5/commit/ea2a8040cbd09bfc45d18663df801e91c6662c1e
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - whitespace fixups

This removes trailing whitespace only. No logic changes here at all.


  Commit: f129a932d198e47073ef3961928bfca5ce033856
      
https://github.com/Perl/perl5/commit/f129a932d198e47073ef3961928bfca5ce033856
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - move file open logic to a sub

This makes all the affected subs support either a filehandle or a
filename.
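
A common shape for this kind of helper is shown below. It is a sketch,
not the sub added by this commit; open_for_write() and write_lines() are
hypothetical names, and any reference passed in is simply assumed to be
a usable filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either an open filehandle or a file name.
sub open_for_write {
    my ($to) = @_;
    return $to if ref $to;    # assume a reference is already a filehandle
    open my $fh, '>', $to
        or die "Cannot open '$to' for writing: $!";
    return $fh;
}

# Callers no longer need to care which kind of argument they were given.
sub write_lines {
    my ($to, @lines) = @_;
    my $fh = open_for_write($to);
    print {$fh} "$_\n" for @lines;
    return $fh;
}
```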


  Commit: 9b42a69a958b97f3cce2b0954313f8aec9ae7ea0
      
https://github.com/Perl/perl5/commit/9b42a69a958b97f3cce2b0954313f8aec9ae7ea0
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - remove bogus defaulting for undef vars

This was some kind of thinko and never made sense, prefix
and suffix should never be undefined.


  Commit: 3af6b7f0a2d860fe07348c0699bb48b60e8ed94a
      
https://github.com/Perl/perl5/commit/3af6b7f0a2d860fe07348c0699bb48b60e8ed94a
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - use $n instead of @$second_level

Same thing but $n is initialized one line above, so use it.


  Commit: 255f6677563a2ae21d2ff78df6a5e8053e6d7a8f
      
https://github.com/Perl/perl5/commit/255f6677563a2ae21d2ff78df6a5e8053e6d7a8f
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M regen/mph.pl

  Log Message:
  -----------
  regen/mph.pl - tweaks to generated code, put type on its own line

All the C code we have puts the type on its own line separate from
the function parameter declaration, so follow that style in our
generated file too.

Also show the generator script in the comment that contains metadata
about the file.


  Commit: caa1e018565a79a9f55a5ce79a7a11ead8931d20
      
https://github.com/Perl/perl5/commit/caa1e018565a79a9f55a5ce79a7a11ead8931d20
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regen/mph.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mph.pl & mk_invlists.pl - convert from sub interfaces to OO interfaces

The old sub based API was passing around an awkward number of arguments
and it was becoming difficult to enhance in certain ways. This patch
changes all the "user serviceable" functions into methods, and moves the
configuration defaults into the constructor.

Note that not all the functions have been converted; the core routines
with simple interfaces have not been changed. This is OO for the purpose
of encapsulation, not inheritance or overloading.
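
The pattern of keeping configuration defaults in the constructor can be
sketched like this. The package name MPH::Sketch, the option names and
their default values are all hypothetical and are not the real
interface of regen/mph.pl.

```perl
package MPH::Sketch;
use strict;
use warnings;

# Defaults live in one place; callers override only what they need.
sub new {
    my ($class, %args) = @_;
    my %defaults = (
        seed1     => 1,       # hypothetical option names and values
        max_seed2 => 1000,
        debug     => 0,
    );
    return bless { %defaults, %args }, $class;
}

# Methods read their settings from the object instead of a long
# argument list.
sub describe {
    my ($self) = @_;
    return join ", ", map { "$_=$self->{$_}" } sort keys %$self;
}

package main;

my $obj = MPH::Sketch->new(debug => 1);
print $obj->describe, "\n";    # prints: debug=1, max_seed2=1000, seed1=1
```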


  Commit: 5e6ff99312662c4ef150bcd620e7e895f3268d64
      
https://github.com/Perl/perl5/commit/5e6ff99312662c4ef150bcd620e7e895f3268d64
  Author: Yves Orton <demer...@gmail.com>
  Date:   2022-04-18 (Mon, 18 Apr 2022)

  Changed paths:
    M AUTHORS
    M charclass_invlists.h
    M lib/unicore/uni_keywords.pl
    M regen/mk_invlists.pl
    M regen/mph.pl
    M uni_keywords.h

  Log Message:
  -----------
  regen/mph.pl & mk_invlists.pl - add the "_squeeze" algorithm to produce 
smaller blobs

The squeeze algorithm produces smaller blobs, 10-20% depending on how it
is used. With the "randomize_squeeze" option enabled it is slower but
produces 20% smaller blobs than the "_simple" strategy we used to use.
With the "randomize_squeeze" option disabled it is about as fast as
"_simple" but produces about 10% smaller blobs. Regardless "_squeeze"
uses more memory than _simple; quite a bit more currently, although that
is unforced and could be changed if required.

    -blob length: 10548
    +blob length: 8635
    ...
    -data size: 69908 (%67.07)
    +data size: 67995 (%65.23)

So it saves 1913 bytes running with this seed. I happened to get lucky
with the seed; depending on the seed used, the blob ended up at about
8650 bytes.

This algorithm is originally by Ilya Sashcheka, so I have added him to
the AUTHORS file, but unfortunately I no longer have his email address
as we lost touch. It contains many modifications by me.


Compare: https://github.com/Perl/perl5/compare/00729846bf0e...5e6ff9931266
