A small thing to begin with, in the regex:

    m/ ^ (@attributes) ':' \s (.+) $ /;

All the string examples use the literal ': ' (colon + space), so how about
making the regex consistent with them? And also allowing the empty string
as a value, which the string examples allow:

    m/ ^ (@attributes) ': ' (.*) $ /;

Next, how about adding a 2nd regex test, similar to the "split" one, that
also relies on User ignoring unknown fields? This accepts an empty-string
key, which the "split" string handler does too:

    m/ ^ (<-[:]>*) ': ' (.*) /;

-y

On Thu, Oct 28, 2021 at 2:14 AM Norman Gaywood <ngayw...@une.edu.au> wrote:

> Oh, and I welcome suggestions on how I might do the task more quickly,
> elegantly, differently, etc :-)
> And critiques of the code also welcome. I still have a strong perl5 accent
> I suspect.
>
> On Thu, 28 Oct 2021 at 13:15, Norman Gaywood <ngayw...@une.edu.au> wrote:
>
>> Executive summary:
>> - comparing raku 2021.10 with raku 2021.9
>> - comparing 3 ways of parsing (although the 2 string function ways
>>   are similar)
>> - raku 2021.10 is better than 2 times as fast as 2021.9 using the
>>   string functions
>> - raku 2021.10 is about the same as 2021.9 using a more general
>>   regular expression
>> - regular expressions are still slow in 2021.10
>>
>> Side note: not shown here is also parsing with Text::LDIF. In 2021.9 it
>> was comparable to the regex method. Not tried with 2021.10.
>>
>> I need to parse a 40K entry LDIF file.
>>
>> Below is some code that uses 3 ways to parse.
>> There are 3 MAIN subs that differ in a few last lines of the for loop.
>> The loop reads the LDIF entries and populates %ldap keyed on the "uid" of
>> the LDIF entry.
>> The values of %ldap are User objects.
>> A %f hash is used to build the values of User on each LDIF entry
>>
>> The aim is to show the difference in timings between 3 ways of parsing
>> the LDIF
>>
>> The 1st MAIN (regex) uses this general regular expression to build %f
>>     next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
>>     %f{$0} = "$1";
>>
>> The "starts" MAIN uses starts-with() to build %f
>>     for @attributes -> $a {
>>         if $line.starts-with( $a ~ ": " ) {
>>             %f{$a} = (split( ": ", $line, 2))[1];
>>             last;
>>         }
>>     }
>>
>> And finally the "split" MAIN uses split() but also uses the feature that
>> User.new() will ignore attributes that are not used.
>>     ($k, $v) = split( ": ", $line, 2);
>>     %f{$k} = $v;
>>
>> That's the difference between the MAIN()'s below. Sorry I couldn't golf
>> it down more.
>> Running the benchmarks multiple times does vary the times slightly but
>> not significantly.
>>
>> Results for rakudo-pkg-2021.9.0-01:
>> $ ./icheck.raku regex
>> 41391 entries by regex in 27.859560887 seconds
>> $ ./icheck.raku starts
>> 41391 entries by starts-with in 5.970667533 seconds
>> $ ./icheck.raku split
>> 41391 entries by split in 5.12252741 seconds
>>
>> Results for rakudo-pkg-2021.10.0-01:
>> $ ./icheck.raku regex
>> 41391 entries by regex in 27.833870158 seconds
>> $ ./icheck.raku starts
>> 41391 entries by starts-with in 2.560101599 seconds
>> $ ./icheck.raku split
>> 41391 entries by split in 2.307679407 seconds
>>
>> -------------------------------------
>> #!/usr/bin/env raku
>>
>> class User {
>>     has $.uid;
>>     has $.uidNumber;
>>     has $.gidNumber;
>>     has $.homeDirectory;
>>     has $.mode = 0;
>>
>>     method attributes {
>>         # return <uid uidNumber gidNumber homeDirectory mode>;
>>         User.^attributes(:local)>>.name>>.substr(2); # Is the order guaranteed?
>>     }
>> }
>>
>> # Read user info from LDIF file
>> my %ldap;
>> my @attributes = User.attributes;
>>
>> multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
>>     my ( %f );
>>     for $ldif-fn.IO.lines -> $line {
>>         when not $line {    # blank line is LDIF entry terminator
>>             %ldap{%f<uid>} = User.new( |%f );
>>         }
>>         when $line.starts-with( 'dn: ' ) { %f = () }    # dn: starts a new entry
>>
>>         next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
>>         %f{$0} = "$1";
>>     }
>>     say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
>> }
>>
>> multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
>>     my ( %f );
>>     for $ldif-fn.IO.lines -> $line {
>>         when not $line {    # blank line is LDIF entry terminator
>>             %ldap{%f<uid>} = User.new( |%f );
>>         }
>>         when $line.starts-with( 'dn: ' ) { %f = () }    # dn: starts a new entry
>>
>>         for @attributes -> $a {
>>             if $line.starts-with( $a ~ ": " ) {
>>                 %f{$a} = (split( ": ", $line, 2))[1];
>>                 last;
>>             }
>>         }
>>
>>     }
>>     say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
>> }
>>
>> multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
>>     my ( %f, $k, $v );
>>     for $ldif-fn.IO.lines -> $line {
>>         when not $line {    # blank line is LDIF entry terminator
>>             %ldap{%f<uid>} = User.new( |%f );    # attributes not used are ignored
>>         }
>>         when $line.starts-with( 'dn: ' ) { %f = () }    # dn: starts a new entry
>>
>>         ($k, $v) = split( ": ", $line, 2);
>>         %f{$k} = $v;
>>     }
>>     say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
>> }
>>
>> --
>> Norman Gaywood, Computer Systems Officer
>> School of Science and Technology
>> University of New England
>> Armidale NSW 2351, Australia
>>
>> ngayw...@une.edu.au http://turing.une.edu.au/~ngaywood
>> Phone: +61 (0)2 6773 2412 Mobile: +61 (0)4 7862 0062
>>
>> Please avoid sending me Word or Power Point attachments.
>> See http://www.gnu.org/philosophy/no-word-attachments.html
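
To make the difference between the three patterns discussed in this thread concrete, here is a minimal sketch that runs each of them over a few sample lines. Only the three regexes come from the thread. The sample lines, the hard-coded attribute list, and the sub names (parse-original, parse-consistent, parse-open) are made up for illustration.

```raku
# Assumption: a hard-coded attribute list standing in for User.attributes.
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;

# Original pattern: ':' then any whitespace character, value must be non-empty.
sub parse-original(Str $line) {
    $line ~~ m/ ^ (@attributes) ':' \s (.+) $ / ?? (~$0 => ~$1) !! Nil
}

# Suggested pattern: literal ': ' separator, empty value allowed.
sub parse-consistent(Str $line) {
    $line ~~ m/ ^ (@attributes) ': ' (.*) $ / ?? (~$0 => ~$1) !! Nil
}

# "split"-like pattern: any key without a colon (even empty), then ': '.
sub parse-open(Str $line) {
    $line ~~ m/ ^ (<-[:]>*) ': ' (.*) / ?? (~$0 => ~$1) !! Nil
}

# Hypothetical sample lines, not taken from the real LDIF file.
my @samples =
    'uid: alice',            # known attribute with a value
    'homeDirectory: ',       # known attribute with an empty value
    'cn: Alice Example',     # attribute that User does not declare
    ': orphan';              # empty key

for @samples -> $line {
    say "line {$line.raku}";
    say "  original:   {parse-original($line).raku}";
    say "  consistent: {parse-consistent($line).raku}";
    say "  open:       {parse-open($line).raku}";
}
```

With these samples, the empty-value line is rejected by the original pattern and accepted by the other two, and only the open pattern accepts the undeclared attribute and the empty key. One caveat: the open pattern is close to, but not the same as, split( ": ", $line, 2 ). A line such as 'a:b: c' splits into the key 'a:b' and the value 'c', while the regex does not match it at all, because the key part cannot contain a colon.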