Executive summary:

- comparing raku 2021.10 with raku 2021.9
- comparing 3 ways of parsing (although the 2 string-function ways are similar)
- raku 2021.10 is more than twice as fast as 2021.9 using the string functions
- raku 2021.10 is about the same as 2021.9 using a more general regular expression
- regular expressions are still slow in 2021.10

Side note: not shown here is parsing with Text::LDIF. In 2021.9 it was comparable to the regex method. I have not tried it with 2021.10.

I need to parse a 40K entry LDIF file. Below is some code that uses 3 ways to parse it. There are 3 MAIN subs that differ only in the last few lines of the for loop.

The loop reads the LDIF entries and populates %ldap, keyed on the "uid" of the LDIF entry. The values of %ldap are User objects. A %f hash is used to build the values of User for each LDIF entry.

The aim is to show the difference in timings between 3 ways of parsing the LDIF.

The 1st MAIN (regex) uses this general regular expression to build %f:

    next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
    %f{$0} = "$1";

The "starts" MAIN uses starts-with() to build %f:

    for @attributes -> $a {
        if $line.starts-with( $a ~ ": " ) {
            %f{$a} = (split( ": ", $line, 2))[1];
            last;
        }
    }

And finally the "split" MAIN uses split(), and also relies on the fact that User.new() will ignore attributes that are not used:

    ($k, $v) = split( ": ", $line, 2);
    %f{$k} = $v;

That's the difference between the MAIN()s below. Sorry I couldn't golf it down more.

Running the benchmarks multiple times does vary the times slightly but not significantly.
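
For illustration only, here is a minimal sketch of the three approaches applied to a single line, outside the benchmark loop. The sample line and the %by-* hash names are made up for this example, and the attribute list is written out literally rather than derived from the User class.

```raku
# Hypothetical sample data; the names mirror the attributes of the User class.
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;
my $line = 'homeDirectory: /home/jdoe';

# 1. regex: the array is interpolated as an alternation of literal strings
my %by-regex;
if $line ~~ m/ ^ (@attributes) ':' \s (.+) $ / {
    %by-regex{$0} = "$1";
}

# 2. starts-with: test each known attribute name as a literal prefix
my %by-starts;
for @attributes -> $a {
    if $line.starts-with( $a ~ ': ' ) {
        %by-starts{$a} = (split( ': ', $line, 2 ))[1];
        last;
    }
}

# 3. split: whatever precedes the first ": " becomes the key,
#    whether or not it is a known attribute
my %by-split;
my ($k, $v) = split( ': ', $line, 2 );
%by-split{$k} = $v;

say %by-regex;
say %by-starts;
say %by-split;
```

For a known attribute all three should build the same key/value pair. Only the split version would also store a key such as "cn", which is why it leans on User.new() ignoring named arguments it has no attribute for.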

Results for rakudo-pkg-2021.9.0-01:

$ ./icheck.raku regex
41391 entries by regex in 27.859560887 seconds
$ ./icheck.raku starts
41391 entries by starts-with in 5.970667533 seconds
$ ./icheck.raku split
41391 entries by split in 5.12252741 seconds

Results for rakudo-pkg-2021.10.0-01:

$ ./icheck.raku regex
41391 entries by regex in 27.833870158 seconds
$ ./icheck.raku starts
41391 entries by starts-with in 2.560101599 seconds
$ ./icheck.raku split
41391 entries by split in 2.307679407 seconds

-------------------------------------
#!/usr/bin/env raku

class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;

    method attributes {
        # return <uid uidNumber gidNumber homeDirectory mode>;
        User.^attributes(:local)>>.name>>.substr(2); # Is the order guaranteed?
    }
}

# Read user info from LDIF file
my %ldap;
my @attributes = User.attributes;

multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );

    for $ldif-fn.IO.lines -> $line {
        when not $line { # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () } # dn: starts a new entry

        next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
        %f{$0} = "$1";
    }
    say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
}

multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );

    for $ldif-fn.IO.lines -> $line {
        when not $line { # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () } # dn: starts a new entry

        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }
    }
    say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
}

multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f, $k, $v );

    for $ldif-fn.IO.lines -> $line {
        when not $line { # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f ); # attributes not used are ignored
        }
        when $line.starts-with( 'dn: ' ) { %f = () } # dn: starts a new entry

        ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;
    }
    say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
}

-- 
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au
http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412
Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html