Re: Primitive benchmark comparison (parsing LDIF)

Norman Gaywood Thu, 28 Oct 2021 15:15:33 -0700

On Fri, 29 Oct 2021 at 00:46, yary <not....@gmail.com> wrote:

> A small thing to begin with in the regex
> m/ ^ (@attributes) ':' \s (.+) $ /;
> m/ ^ (@attributes) ': ' (.*) $ /;
>


Yes, nice cleanup. Thanks.


> Next, how about adding a 2nd regex test similar to the "split" that also
> relies on User ignoring unknown fields? This accepts an empty-string key,
> which the "split" string handler does too.
>
> m/ ^ (<-[:]>*) ': ' (.*) /;
>

$ ./icheck.raku regex2
41391 entries by regex2 in 4.615332639 seconds

Whoa! That was surprising. The new regex is only about 2x slower than the
"split" method.

I did read on Stack Overflow that someone claimed the "longest-match
alternation of the list's elements" is slow.
But I thought the conclusion in the answers was that regexes are slow in
general.

Might have to test this example again on 2021.10 (not easy for me).
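
If the interpolated @attributes alternation really is the slow part, one
option is to keep the cheap regex2 pattern and filter on the key name after
the match. A minimal sketch, not benchmarked: the hard-coded @attributes
list stands in for User.attributes, and $wanted and parse-line are
hypothetical names used only for illustration.

```raku
# Sketch only: a stand-in for User.attributes from the script below.
my @attributes = <uid uidNumber gidNumber homeDirectory mode>;
my $wanted = @attributes.Set;   # hypothetical lookup set of known keys

sub parse-line( Str $line ) {
    # Same pattern as regex2: everything before the first ': ' is the key.
    return Nil unless $line ~~ m/ ^ (<-[:]>*) ': ' (.*) $ /;
    my ( $k, $v ) = ~$0, ~$1;
    # Filter by name after the match instead of inside the regex.
    return $wanted{$k} ?? ( $k => $v ) !! Nil;
}

say parse-line( 'uid: ngaywood' );        # a key listed in @attributes
say parse-line( 'cn: Norman Gaywood' );   # a key that is not listed
```

This keeps the "only known attributes" behaviour of the first regex, so it
does not rely on User.new ignoring unknown fields.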


>>> Results for rakudo-pkg-2021.9.0-01:
>>> $ ./icheck.raku regex
>>> 41391 entries by regex in 27.859560887 seconds
>>> $ ./icheck.raku starts
>>> 41391 entries by starts-with in 5.970667533 seconds
>>> $ ./icheck.raku split
>>> 41391 entries by split in 5.12252741 seconds
>>>
>>> Results for rakudo-pkg-2021.10.0-01
>>> $ ./icheck.raku regex
>>> 41391 entries by regex in 27.833870158 seconds
>>> $ ./icheck.raku starts
>>> 41391 entries by starts-with in 2.560101599 seconds
>>> $ ./icheck.raku split
>>> 41391 entries by split in 2.307679407 seconds
>>>
>>>
--------------------------------------------------
#!/usr/bin/env raku

class User {
    has $.uid;
    has $.uidNumber;
    has $.gidNumber;
    has $.homeDirectory;
    has $.mode = 0;

    method attributes {
       # return <uid uidNumber gidNumber homeDirectory mode>;
       # Is the order guaranteed?
       User.^attributes(:local)>>.name>>.substr(2);
    }
}

# Read user info from LDIF file
my %ldap;
my @attributes = User.attributes;

multi MAIN ( "regex", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        next unless $line ~~ m/ ^ (@attributes) ': ' (.*) $ /;
        %f{$0} = "$1";
    }
    say "{%ldap.elems} entries by regex in {now - BEGIN now} seconds";
}

multi MAIN ( "regex2", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        next unless $line ~~ m/ ^ (<-[:]>*) ': ' (.*) $ /;
        %f{$0} = "$1";
    }
    say "{%ldap.elems} entries by regex2 in {now - BEGIN now} seconds";
}

multi MAIN ( "starts", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        for @attributes -> $a {
            if $line.starts-with( $a ~ ": " ) {
                %f{$a} = (split( ": ", $line, 2))[1];
                last;
            }
        }
    }
    say "{%ldap.elems} entries by starts-with in {now - BEGIN now} seconds";
}

multi MAIN ( "split", $ldif-fn = "db/icheck.ldif" ) {
    my ( %f, $k, $v );
    for $ldif-fn.IO.lines -> $line {
        when not $line {  # blank line is LDIF entry terminator
            # attributes not used are ignored
            %ldap{%f<uid>} = User.new( |%f );
        }
        when $line.starts-with( 'dn: ' ) { %f = () }   # dn: starts a new entry

        ($k, $v) = split( ": ", $line, 2);
        %f{$k} = $v;
    }
    say "{%ldap.elems} entries by split in {now - BEGIN now} seconds";
}



-- 
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia

ngayw...@une.edu.au  http://turing.une.edu.au/~ngaywood
Phone: +61 (0)2 6773 2412  Mobile: +61 (0)4 7862 0062

Please avoid sending me Word or Power Point attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
