Just continuing the performance question. I tried to write a simple wc in ragel.
The main part looks like this:
--
%%{
    machine wc;

    word = (^space)+ > { nwords++; };
    line = (space* . (word . space+)* word?) & (^'\n')* > { nlines++; };
    main := (line . '\n')** (line)? ;
}%%
--
(NOTE: I think this is still not totally correct. How do I handle a line
without '\n' at the end of the file correctly?)

Then I coded it in C:
--
switch( *B.cur ) {
    case '\n':
        n_lines++;
        /* fall through: a newline also ends the current word */
    case ' ':
    case '\t':
        in_word = false ;
        break ;
    default:
        if( !in_word ) {
            in_word = true ;
            nwords++ ;
        }
}
--
The ragel parser, compiled with g++ -O3, runs at ~35MB/s on a big file,
and the hand-crafted parser at 130MB/s or more.

Did I make some errors in the ragel coding? Could I have done it more
efficiently? Or will hand coding always be a fair bit faster than
ragel-generated code?

Thanks!

Michael

On 3 November 2012 23:34, Michael Lachmann <lachm...@eva.mpg.de> wrote:
> Hi,
>
> I'm starting to learn ragel because I'd like to write a very fast
> parser for a fairly simple file structure.
> I'd like to learn some of the tricks of increasing the performance of
> the resulting program. So,
> here are a few questions:
>
> 1. Is there a good sample program in terms of performance? I
> downloaded awkemu - is that a good example?
> 2. Often, one can use **, or one can find a terminating character.
> For example, awkemu has:
>     line = ( blineElements** '\n' )
> I think here just * would have been enough, because there is the
> terminating \n - is that right? Does it matter?
> Should ** be avoided if possible?
>
> 3. Is there a disadvantage of using the lex-like scanner with
>     |*
>         pat =>
>         pat =>
> etc., vs just specifying the full machine?
> 4. Is there a disadvantage of using intersection? For example, I think
> the above line handling can be written as:
>     line = something & [^\n]* '\n'
> where something doesn't care about handling end-of-line. Is it just as
> fast as writing expressions that also handle end-of-line?
>
> 5. awkemu uses the following:
> --
>     /* Find the last newline by searching backwards. This is where
>      * we will stop processing on this iteration. */
>     p = buf;
>     pe = buf + have + len - 1;
>     while ( *pe != '\n' && pe >= buf )
>         pe--;
>     pe += 1;
>
>     /* fprintf( stderr, "running on: %i\n", pe - p ); */
>
>     %% write exec;
>
>     /* How much is still in the buffer. */
>     have = data + len - pe;
>     if ( have > 0 )
>         memmove( buf, pe, have );
> --
> Is the first running backward to find the last eol necessary? It seems
> to run part of the file through two parsers.
>
> Thanks!
> Michael
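
P.S. To make the hand-coded comparison reproducible, the C switch fragment
from the top of this message can be wrapped into a complete function roughly
as below. This is only a sketch under assumptions: the names count_wc and
struct wc_counts and the (buf, len) interface are hypothetical (the original
fragment reads from B.cur), and only ' ', '\t' and '\n' count as whitespace,
exactly as in the fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type, introduced only for this sketch. */
struct wc_counts {
    size_t nlines;   /* number of '\n' bytes seen */
    size_t nwords;   /* number of whitespace-to-word transitions */
};

/* Hypothetical wrapper around the switch from the message: one pass over
 * the buffer, one switch per byte, no other state than in_word. */
static struct wc_counts count_wc(const char *buf, size_t len)
{
    struct wc_counts c = { 0, 0 };
    bool in_word = false;

    assert(buf != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        switch (buf[i]) {
        case '\n':
            c.nlines++;
            /* fall through: a newline also ends the current word */
        case ' ':
        case '\t':
            in_word = false;
            break;
        default:
            if (!in_word) {
                /* first non-space byte after whitespace starts a word */
                in_word = true;
                c.nwords++;
            }
        }
    }
    return c;
}
```

Called on the four bytes "a b\nc" with a little extra, i.e. count_wc("a b\nc", 5),
this should report 1 line and 3 words: as with wc, a final line without a
trailing '\n' contributes its words but is not counted as a line.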