In perl.git, the branch smoke-me/khw-invlist has been created
<http://perl5.git.perl.org/perl.git/commitdiff/a7b347c4677caed6957bf9624cd44a08224af31f?hp=0000000000000000000000000000000000000000>
at a7b347c4677caed6957bf9624cd44a08224af31f (commit)
- Log -----------------------------------------------------------------
commit a7b347c4677caed6957bf9624cd44a08224af31f
Author: Karl Williamson <[email protected]>
Date: Wed Jun 27 16:24:43 2012 -0600
regcomp.c: Optimize /[0-9]/ into /\d/a
The commonly used [0-9] can be optimized into a smaller, faster node
that means the same thing.
M regcomp.c
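
As a rough illustration of why the two forms mean the same thing, the sketch below uses Python rather than Perl, with Python's re.ASCII flag assumed as a stand-in for Perl's /a modifier. The helper name same_matches and the scan limit are hypothetical, not anything from regcomp.c.

```python
import re

# Python's re.ASCII flag is used here as an analogue of Perl's /a modifier;
# that correspondence is an assumption made for illustration only.
DIGIT_CLASS = re.compile(r"[0-9]")
DIGIT_ASCII = re.compile(r"\d", re.ASCII)


def same_matches(limit=0x3000):
    """Return True if both patterns accept exactly the same single characters
    among the code points below `limit` (hypothetical helper)."""
    for cp in range(limit):
        ch = chr(cp)
        if bool(DIGIT_CLASS.fullmatch(ch)) != bool(DIGIT_ASCII.fullmatch(ch)):
            return False
    return True


# Without ASCII restriction, \d also accepts non-ASCII digits such as
# ARABIC-INDIC DIGIT ONE, which is why the restriction matters.
ARABIC_INDIC_ONE = "\u0661"
```

Calling same_matches() compares the two patterns character by character; ARABIC_INDIC_ONE is a handy input for checking that an unrestricted \d is not equivalent to [0-9].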
commit 56b939a1bb6d60adefa9c27fcc41ce357c3bc54f
Author: Karl Williamson <[email protected]>
Date: Wed Jun 27 14:43:41 2012 -0600
regcomp.c: Optimize /[^\w]/ into /\W/, etc.
This optimizes character classes that have a single element that is one
of the ops that have the same meaning outside (namely \d, \h, \s, \w, \v
and their complements) to that op. Those ops take less space than a
character class and run faster. An initial '^' for complementing the
class is also handled.
M regcomp.c
M regcomp.sym
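
The rewrite described above can be sketched at the pattern-string level. This is only a toy model in Python: the real optimization works on compiled regnodes, and the names COMPLEMENT and simplify_class are hypothetical.

```python
# Each escape that means the same thing inside and outside a bracketed class,
# paired with its complement (table assembled for this sketch).
COMPLEMENT = {
    r"\d": r"\D", r"\D": r"\d",
    r"\h": r"\H", r"\H": r"\h",
    r"\s": r"\S", r"\S": r"\s",
    r"\w": r"\W", r"\W": r"\w",
    r"\v": r"\V", r"\V": r"\v",
}


def simplify_class(pattern):
    """Return the bare escape equivalent to a one-element bracketed class,
    or the pattern unchanged if the rewrite does not apply."""
    if not (pattern.startswith("[") and pattern.endswith("]")):
        return pattern
    body = pattern[1:-1]
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    if body not in COMPLEMENT:
        return pattern  # more than one element, or not one of the ops
    # A leading '^' selects the complementary op.
    return COMPLEMENT[body] if negated else body
```

For example, simplify_class(r"[^\w]") yields the string for \W, while a class with two elements such as [\w\d] is returned untouched.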
commit a21862584e08efb838752dc82cca9b4c19020dd9
Author: Karl Williamson <[email protected]>
Date: Wed Jun 27 13:48:16 2012 -0600
regcomp.c: Simplify some node calculations
For the node types that have differing versions depending on the
character set regex modifiers, /d, /l, /u, /a, and /aa, we can use the
enum values as offsets from the base node number to derive the correct
one. This eliminates a number of tests.
Because there is no DIGITU node type, I added placeholders for it (and
NDIGITU) to avoid some special casing of it (more important in future
commits). We currently have many available node types, so can afford to
waste these two.
M op_reg_common.h
M regcomp.c
M regcomp.sym
M regnodes.h
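
A minimal sketch of the offset idea, in Python rather than C. The enum values, the node numbering, and the name DIGITU_PLACEHOLDER are all invented for illustration (and /aa is left out for brevity); only the technique of adding the modifier's enum value to a base node number comes from the commit message.

```python
from enum import IntEnum


class Charset(IntEnum):
    """Hypothetical enum of character set modifiers; values are assumptions."""
    DEPENDS = 0   # /d
    LOCALE = 1    # /l
    UNICODE = 2   # /u
    ASCII = 3     # /a


# Hypothetical node table: each family lists its variants in the same
# relative order as the enum, so base + modifier lands on the right node.
# The placeholder keeps the DIGIT family the same shape as the others.
NODE_NAMES = [
    "BOUND", "BOUNDL", "BOUNDU", "BOUNDA",
    "DIGIT", "DIGITL", "DIGITU_PLACEHOLDER", "DIGITA",
]
NODE_NUMBER = {name: number for number, name in enumerate(NODE_NAMES)}


def node_for(base, charset):
    """Derive the node number by offset instead of a chain of tests."""
    return NODE_NUMBER[base] + charset
```

With this layout, NODE_NAMES[node_for("BOUND", Charset.UNICODE)] names the /u variant without any per-modifier branching, and the placeholder means the DIGIT family needs no special casing.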
commit 2a440f2e2a3f2381acf398f6acc3cc519b72c3a7
Author: Karl Williamson <[email protected]>
Date: Wed Jun 27 13:28:13 2012 -0600
regcomp.sym: Reorder a couple of nodes
This causes all the nodes that depend on the regex modifier, BOUND,
BOUNDL, etc. to have the same relative ordering. This will enable a
future commit to simplify generation of the correct node.
M regcomp.sym
M regnodes.h
commit a6a0a280fe1318c4834df682e32c5645343b60cd
Author: Karl Williamson <[email protected]>
Date: Tue Jun 26 18:14:23 2012 -0600
reg_fold.t: Make test cases non-optimizable away
This commit changes the bracketed character classes to include a
non-related character. This is in preparation for a future commit which
would cause the current character classes to be optimized into EXACTish
nodes. The TODO tests would then start passing, even though the underlying
problem with character classes is not fixed. That bug is that you can't split a
multi-char fold across nodes. It probably is not fixable in Perl without
a total restructuring of the regular expression mechanism. For example,
"\N{LATIN SMALL LIGATURE FFI}" doesn't match /[f][f][i]/i. But it would
if those got optimized into a single EXACTF node. (The problem is not
limited to character classes, /(f)(f)(i)/i also doesn't match, and
can't, as $1, $2, and $3 are not well-defined.)
M t/re/reg_fold.t
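
The multi-char fold problem can be modelled roughly as follows. This is a toy sketch in Python, not how the Perl regex engine is implemented: matches_per_node and matches_whole_string are hypothetical helpers, and Python's str.casefold() stands in for the fold operation.

```python
LIGATURE_FFI = "\N{LATIN SMALL LIGATURE FFI}"


def matches_per_node(nodes, text):
    """Toy model of separate single-character nodes: each node must match
    exactly one input character, compared case-insensitively."""
    if len(nodes) != len(text):
        return False
    return all(n.casefold() == c.casefold() for n, c in zip(nodes, text))


def matches_whole_string(literal, text):
    """Toy model of a single folded-literal node: the fold is applied to the
    whole input, so one character may expand to several."""
    return literal.casefold() == text.casefold()
```

The ligature is one character whose fold is three characters long, so the per-node model has nothing to hand to the second and third nodes, whereas the whole-string model compares "ffi" against the folded input and succeeds.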
commit 25e1d472cf3c5b91fd6c57da4479c17bd1ba5d46
Author: Karl Williamson <[email protected]>
Date: Sun Jun 24 14:16:44 2012 -0600
regcomp.c: Simplify compile time [^..] complement
This simply moves the code that populates the bitmap and combines the
two inversion lists to after the inversion (the diff looks much larger
than the change really is, since code is moved). This greatly
simplifies complementing the character class.
M regcomp.c
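
To show why complementing is cheap once everything is in one inversion list, here is a small Python sketch. It assumes the usual representation (a sorted list of code points at which membership toggles, starting from "not in the set"); the function names are hypothetical and this is not the regcomp.c code.

```python
from bisect import bisect_right


def contains(invlist, cp):
    """A code point is in the set if an odd number of toggles precede it."""
    return bisect_right(invlist, cp) % 2 == 1


def invert(invlist):
    """Complement the set by toggling the presence of a leading 0."""
    if invlist and invlist[0] == 0:
        return invlist[1:]
    return [0] + invlist


# [a-z] as an inversion list: membership starts at 'a' and stops after 'z'.
LOWER = [ord("a"), ord("z") + 1]
NOT_LOWER = invert(LOWER)
```

NOT_LOWER is [0, 97, 123]: everything from code point 0 up to but not including 'a' is in the set, then everything from '{' onward. Inverting twice gives back the original list.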
commit 31bd5c832b74a44971b1a2a2cfa7a78402695afc
Author: Karl Williamson <[email protected]>
Date: Sun Jun 24 14:02:48 2012 -0600
regcomp.c: Rename variable to reflect new purpose
This variable really holds the list of all code points the bracketed
character class matches; it's not just the ones not in the bitmap.
M regcomp.c
commit 92625279726ba4b1aa4da2972adeefd3cc16a41b
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 21:25:36 2012 -0600
regcomp.c: Have a subroutine do the work
Since this code was originally written, the fold function has added
input flags that cause it to do the same thing this code does. So do it
in the subroutine.
M regcomp.c
commit 758864f4b6915c83cddde4ada7a256523e06bb32
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 15:48:42 2012 -0600
regcomp.c: Remove obsolete code
A previous commit removed all calls to these two functions (moving a
large portion of the bit_fold() one to another place) and no longer sets
the variable.
M embed.fnc
M embed.h
M proto.h
M regcomp.c
commit 885f2c0f1807cd1dc4399e3fb926ebe69e31c7af
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 14:19:02 2012 -0600
regcomp.c: White-space, comments only
This indents and outdents existing code to match new and removed outer
blocks. It also reflows comments and code to fit into 80 columns, adds and
removes blank lines, and makes minor comment rewording.
M regcomp.c
commit ecbb58f3dac4e57235a924966e0cfe6fab8be40f
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 15:24:38 2012 -0600
regcomp.c: Remove unnecessary 'if' test
A previous commit has refactored things, so this test is always true.
M regcomp.c
commit 6d3e02d39dc974eefdbd5164de6618a69decf59f
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 15:00:26 2012 -0600
regcomp.c: Use more inversion lists in [] char classes
This changes the building of bracketed character classes to use
inversion lists instead of a bitmap/inversion list combination.
This will lead in later commits to simplification and to extending
optimizations beyond the Latin1 range.
M regcomp.c
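
For readers unfamiliar with inversion lists, the following Python sketch shows one way to build and query them. It assumes the usual representation (a sorted list of code points at which membership toggles); from_ranges and contains are hypothetical names, not functions from regcomp.c.

```python
from bisect import bisect_right


def from_ranges(ranges):
    """Build an inversion list from inclusive (lo, hi) code point ranges,
    merging ranges that overlap or touch."""
    invlist = []
    for lo, hi in sorted(ranges):
        if invlist and lo <= invlist[-1]:
            # Extends the previous range: move its end point if needed.
            invlist[-1] = max(invlist[-1], hi + 1)
        else:
            invlist += [lo, hi + 1]
    return invlist


def contains(invlist, cp):
    """A code point is in the set if an odd number of toggles precede it."""
    return bisect_right(invlist, cp) % 2 == 1


# A class like [0-9a-z] plus a Greek range: nothing here depends on whether
# a code point is inside or outside the Latin1 range.
CLASS = from_ranges([(ord("a"), ord("z")), (ord("0"), ord("9")), (0x3B1, 0x3C9)])
```

Unlike a fixed-size bitmap, the list grows only with the number of ranges, so the same structure covers code points of any size.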
commit dc81fbad48c2c524378b837af70317951cf3ae7a
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 13:30:36 2012 -0600
perlguts: Document that PV can point to non-string
M pod/perlguts.pod
commit c5167f8c7cabf51095086405287466b99d14466b
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 12:57:54 2012 -0600
handy.h: Fix isBLANK_uni and isBLANK_utf8
These macros have never worked outside the Latin1 range, so this extends
them to work.
There are no tests I could find for things in handy.h, except that many
of them are called all over the place during the normal course of
events. This commit adds a new file for such testing, containing for
now only a few tests for the isBLANK macros.
M MANIFEST
M embed.fnc
M embed.h
M embedvar.h
M ext/XS-APItest/APItest.pm
M ext/XS-APItest/APItest.xs
A ext/XS-APItest/t/handy.t
M handy.h
M intrpvar.h
M perl.c
M proto.h
M sv.c
M utf8.c
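
As a point of comparison, the Unicode-aware notion of "blank" (horizontal tab plus the Space_Separator general category) can be sketched in Python. Whether the fixed macros agree with this definition for every code point is an assumption, and is_blank is a hypothetical helper, not the handy.h implementation.

```python
import unicodedata


def is_blank(ch):
    """Return True for a horizontal tab or any character whose general
    category is Zs (Space_Separator)."""
    return ch == "\t" or unicodedata.category(ch) == "Zs"


# A few sample inputs on either side of the Latin1 boundary.
SAMPLES = {
    " ": True,        # SPACE
    "\t": True,       # CHARACTER TABULATION
    "\u00a0": True,   # NO-BREAK SPACE (Latin1)
    "\u2003": True,   # EM SPACE (above Latin1)
    "\u3000": True,   # IDEOGRAPHIC SPACE (above Latin1)
    "\n": False,      # LINE FEED is vertical, not blank
    "a": False,
}
```

Checks like the EM SPACE and IDEOGRAPHIC SPACE entries are the kind of above-Latin1 cases that a test file for these macros would need to cover.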
commit 683de0149cdd5325c8ea6ca3f059673e677f59d3
Author: Karl Williamson <[email protected]>
Date: Sat Jun 23 12:03:42 2012 -0600
no_utf8_pm.t: Add blank between 'not' and 'ok' in .t
M t/re/no_utf8_pm.t
-----------------------------------------------------------------------
--
Perl5 Master Repository