Oh, and I welcome suggestions on how I might do the task more quickly, elegantly, differently, etc. :-) Critiques of the code are also welcome. I suspect I still have a strong Perl 5 accent.
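To make the kind of variation I have in mind concrete, here is a small, untimed sketch of the "split" approach from the quoted message. It is only an illustration under assumptions: the inline sample data, the $wanted set and the flush-entry helper are hypothetical names that do not appear in the original script, and I make no claim that it is faster. It differs in two ways: wanted attributes are filtered with a Set lookup, and the last entry is stored even when the input does not end with a blank line. It does not handle other LDIF features such as base64 values ("::") or folded continuation lines.

```raku
class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;
}

# Hypothetical sample data standing in for db/icheck.ldif
my $ldif = q:to/END/;
dn: uid=alice,ou=people,dc=example,dc=org
uid: alice
uidNumber: 1001
gidNumber: 100
homeDirectory: /home/alice
cn: Alice Example

dn: uid=bob,ou=people,dc=example,dc=org
uid: bob
uidNumber: 1002
gidNumber: 100
homeDirectory: /home/bob
END

# Attribute names we care about (assumed; mirrors the User class)
my $wanted = set <uid uidNumber gidNumber homeDirectory mode>;

my %ldap;   # uid => User
my %f;      # fields of the entry currently being read

# Store the current entry (if it has a uid) and reset the field hash
sub flush-entry() {
    %ldap{%f<uid>} = User.new( |%f ) if %f<uid>:exists;
    %f = ();
}

for $ldif.lines -> $line {
    if $line eq '' {            # blank line is LDIF entry terminator
        flush-entry();
        next;
    }
    my ($k, $v) = $line.split( ': ', 2 );
    # $v is undefined when the line contains no ": " separator
    %f{$k} = $v if $v.defined && $wanted{$k};
}
flush-entry();                  # input may not end with a blank line

say "{%ldap.elems} entries";
say %ldap<bob>.homeDirectory;
```

Whether the Set lookup costs more or less than letting User.new() ignore the unused attributes would need to be measured on the real file.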
On Thu, 28 Oct 2021 at 13:15, Norman Gaywood <ngayw...@une.edu.au> wrote:

> Executive summary:
>  - comparing raku 2021.10 with raku 2021.9
>  - comparing 3 ways of parsing (although the 2 string function ways are
>    similar)
>  - raku 2021.10 is better than 2 times as fast as 2021.9 using the
>    string functions
>  - raku 2021.10 is about the same as 2021.9 using a more general
>    regular expression
>  - regular expressions are still slow in 2021.10
>
> Side note: not shown here is also parsing with Text::LDIF. In 2021.9 it
> was comparable to the regex method. Not tried with 2021.10.
>
> I need to parse a 40K entry LDIF file.
>
> Below is some code that uses 3 ways to parse.
> There are 3 MAIN subs that differ in a few last lines of the for loop.
> The loop reads the LDIF entries and populates %ldap keyed on the "uid" of
> the LDIF entry.
> The values of %ldap are User objects.
> A %f hash is used to build the values of User on each LDIF entry
>
> The aim is to show the difference in timings between 3 ways of parsing the
> LDIF
>
> The 1st MAIN (regex) uses this general regular expression to build %f
>     next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
>     %f{$0} = "$1";
>
> The "starts" MAIN uses starts-with() to build %f
>     for @attributes -> $a {
>         if $line.starts-with( $a ~ ": " ) {
>             %f{$a} = (split( ": ", $line, 2))[1];
>             last;
>         }
>
> And finally the "split" MAIN uses split() but also uses the feature that
> User.new() will ignore attributes that are not used.
>     ($k, $v) = split( ": ", $line, 2);
>     %f{$k} = $v;
>
> That's the difference between the MAIN()'s below. Sorry I couldn't golf it
> down more.
> Running the benchmarks multiple times does vary the times slightly but not
> significantly.
>
> Results for rakudo-pkg-2021.9.0-01:
> $ ./icheck.raku regex
> 41391 entries by regex in 27.859560887 seconds
> $ ./icheck.raku starts
> 41391 entries by starts-with in 5.970667533 seconds
> $ ./icheck.raku split
> 41391 entries by split in 5.12252741 seconds
>
> Results for rakudo-pkg-2021.10.0-01
> $ ./icheck.raku regex
> 41391 entries by regex in 27.833870158 seconds
> $ ./icheck.raku starts
> 41391 entries by starts-with in 2.560101599 seconds
> $ ./icheck.raku split
> 41391 entries by split in 2.307679407 seconds
>
> -------------------------------------
> #!/usr/bin/env raku
>
> class User {
>     has $.uid;
>     has $.uidNumber;
>     has $.gidNumber;
>     has $.homeDirectory;
>     has $.mode = 0;
>
>     method attributes {
>         # return <uid uidNumber gidNumber homeDirectory mode>;
>         User.^attributes(:local)>>.name>>.substr(2); # Is the order guaranteed?
>     }
> }
>
> # Read user info from LDIF file
> my %ldap;
> my @attributes = User.attributes;
>
> multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
>     my ( %f );
>     for $ldif-fn.IO.lines -> $line {
>         when not $line {    # blank line is LDIF entry terminator
>             %ldap{%f<uid>} = User.new( |%f );
>         }
>         when $line.starts-with( 'dn: ' ) { %f = () }  # dn: starts a new entry
>
>         next unless $line ~~ m/ ^ (@attributes) ':' \s (.+) $ /;
>         %f{$0} = "$1";
>     }
>     say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
> }
>
> multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
>     my ( %f );
>     for $ldif-fn.IO.lines -> $line {
>         when not $line {    # blank line is LDIF entry terminator
>             %ldap{%f<uid>} = User.new( |%f );
>         }
>         when $line.starts-with( 'dn: ' ) { %f = () }  # dn: starts a new entry
>
>         for @attributes -> $a {
>             if $line.starts-with( $a ~ ": " ) {
>                 %f{$a} = (split( ": ", $line, 2))[1];
>                 last;
>             }
>         }
>
>     }
>     say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
> }
>
> multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
>     my ( %f, $k, $v );
>     for $ldif-fn.IO.lines -> $line {
>         when not $line {    # blank line is LDIF entry terminator
>             %ldap{%f<uid>} = User.new( |%f );  # attributes not used are ignored
>         }
>         when $line.starts-with( 'dn: ' ) { %f = () }  # dn: starts a new entry
>
>         ($k, $v) = split( ": ", $line, 2);
>         %f{$k} = $v;
>     }
>     say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
> }
>
> --
> Norman Gaywood, Computer Systems Officer
> School of Science and Technology
> University of New England
> Armidale NSW 2351, Australia
>
> ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
> Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062
>
> Please avoid sending me Word or Power Point attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
>

--
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html