Re: [ragel-users] Hints about ragel for performance

Michael Lachmann Sat, 03 Nov 2012 16:52:33 -0700

Just continuing the performance question.
I tried to write a simple wc in ragel.


The main part looks like this:
--
%%{
  machine wc;
  word = (^space)+ > { nwords++; };
  line = (space* . (word . space+)* word?) &
     (^'\n')*  > { nlines++; };
    main := (line . '\n')** (line)?  ;
}%%
--
(NOTE: I think this is still not totally correct. How do I handle a
line without '\n' at the end of the file correctly?)

Then I coded it in C:
--
     switch( *B.cur )
       {
        case '\n':
          nlines++;
          /* fall through: a newline also ends the current word */
        case ' ':
        case '\t':
          in_word = false ;
          break ;
        default:
          if( !in_word )
            {
               in_word = true ;
               nwords++ ;
            }
       }
--
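
For anyone who wants to reproduce the comparison, here is a minimal, self-contained sketch of the switch above wrapped in a loop over a buffer. The names wc_counts and count_buffer are made up for illustration and are not from my actual program; like the switch, it treats only ' ', '\t' and '\n' as separators and counts newline characters, so a final line without '\n' is not counted as a line.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two counters. */
struct wc_counts {
    size_t nlines;
    size_t nwords;
};

/* Count newlines and words in buf[0..len).
 * Only ' ', '\t' and '\n' separate words, as in the switch above. */
struct wc_counts count_buffer(const char *buf, size_t len)
{
    struct wc_counts c = { 0, 0 };
    bool in_word = false;

    for (size_t i = 0; i < len; i++) {
        switch (buf[i]) {
        case '\n':
            c.nlines++;
            /* fall through: a newline also ends the current word */
        case ' ':
        case '\t':
            in_word = false;
            break;
        default:
            if (!in_word) {
                in_word = true;   /* first byte of a new word */
                c.nwords++;
            }
        }
    }
    return c;
}
```

To process a file in chunks, in_word would have to be kept between calls instead of being reset to false at the start of each buffer.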
The ragel parser, compiled with g++ -O3, runs at ~35MB/s on a big
file, and the hand-crafted parser at 130MB/s or more.
Did I make any mistakes in the ragel code? Could I have written it more efficiently?
Or will hand coding always be a fair bit faster than ragel-generated code?

Thanks!
Michael


On 3 November 2012 23:34, Michael Lachmann <lachm...@eva.mpg.de> wrote:
> Hi,
>
> I'm starting to learn ragel because I'd like to write a very fast
> parser for a fairly simple file structure.
> I'd like to learn some of the tricks of increasing the performance of
> the resulting program. So,
> here are a few questions:
>
> 1. Is there a good sample program in terms of performance? I
> downloaded awkemu - is that a good example?
> 2. Often, one can use **, or one can find a terminating character.
>     For example, awkemu has:
> line = ( blineElements** '\n' )
>    I think here just * would have been enough, because there is the
> terminating \n - is that right? Does it matter?
> Should ** be avoided if possible?
>
> 3. Is there a disadvantage of using the lex-like scanner with
> |*
> pat =>
> pat =>
> etc., vs just specifying the full machine?
> 4. Is there a disadvantage of using intersection? For example, I think
> the above line handling can be written as:
> line = something & [^\n]* '\n'
> where something doesn't care about handling end-of-line. Is it just as
> fast as writing expressions that also handle end-of-line?
>
> 5. awkemu uses the following:
> --
>                 /* Find the last newline by searching backwards. This is where
>                  * we will stop processing on this iteration. */
>                 p = buf;
>                 pe = buf + have + len - 1;
>                 while ( *pe != '\n' && pe >= buf )
>                         pe--;
>                 pe += 1;
>
>                 /* fprintf( stderr, "running on: %i\n", pe - p ); */
>
>                 %% write exec;
>
>                 /* How much is still in the buffer. */
>                 have = data + len - pe;
>                 if ( have > 0 )
>                         memmove( buf, pe, have );
> --
> Is the first running backward to find the last eol necessary? It seems
> to run part of the file through two parsers.
>
> Thanks!
> Michael
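
To make question 5 concrete, here is a small sketch of the backward scan as a standalone function. The name complete_prefix_len is hypothetical and this is not awkemu's actual code; it only illustrates the idea of stopping the parser at the last complete line and carrying the remainder over to the next read. Unlike the quoted loop, it checks the bound before looking at the byte.

```c
#include <stddef.h>

/* Return the length of the longest prefix of buf[0..len) that ends
 * in '\n', i.e. the offset one past the last newline.
 * Returns 0 if the buffer contains no newline at all. */
size_t complete_prefix_len(const char *buf, size_t len)
{
    size_t n = len;

    /* Walk backwards from the end; the bound is tested first so
     * that no byte before buf[0] is ever read. */
    while (n > 0 && buf[n - 1] != '\n')
        n--;
    return n;
}
```

The parser would then run on buf[0..n), and the len - n leftover bytes would be moved to the front of the buffer before the next read. The scan only touches the unterminated tail of the buffer, which is typically at most one line, so as far as I can tell it is not a second pass over the whole input.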

_______________________________________________
ragel-users mailing list
ragel-users@complang.org
http://www.complang.org/mailman/listinfo/ragel-users
