Great! Thanks a lot.

2014-06-30 4:49 GMT+02:00 William Ahern <will...@25thandclement.com>:
> 2014-04-10 22:08 GMT+02:00 Iñaki Baz Castillo <i...@aliax.net>:
>> Hi,
>>
>> I'm building a parser for a protocol message similar to HTTP (say, a
>> main header followed by N "key: value" lines separated by CRLF and
>> terminated by a double CRLF). My concern is:
>>
>> - I parse the messages in a "Dispatcher" module that just needs to
>>   parse a few fields in each message.
>> - Then the Dispatcher passes the message to a Worker thread via a UNIX
>>   socket.
>> - And the Worker must parse it again, but in this case I need all
>>   the fields parsed.
>>
>> Note that during the Worker's parsing, a complex C++ object is built
>> with all the parsed fields mapped into member variables, so I don't
>> want to deal with those complex objects in the Dispatcher module.
>>
>> How could I reuse the same Ragel machine for both cases?
> <snip>
>
> Here's an example from my own code. For various reasons (expediency,
> simplicity) I used different machines to parse individual headers. But they
> all use the same library of tokenization sub-machines.
>
> The first machine is the basic library. You could put this in a separate
> file, but mine is in the same file as everything else HTTP/RTSP-related. The
> second and third machines are parser examples. Note that most of the context
> is missing, so you won't be able to copy+paste this. For example, I have a
> basic tokenizer written in pure C (which follows DJB's algorithm for
> structured MIME header parsing) which emits tagged characters as short
> integers (e.g. an escaped or quoted character will have a high bit set).
> This made it easier for me to handle things like quoted strings and
> parenthetical comments. That said, I wrote this years ago, and today I might
> find it easier to handle those problems with Ragel's fcall and fgoto
> statements. But the truly beautiful thing about Ragel is how it allows you to
> mix and match approaches.
> So there's really no wrong way. And I would counsel a novice to avoid
> attempts at Ragel purity, i.e. trying to do everything in Ragel, such as
> handling recursive structures directly in Ragel. You can do it (and I do it
> in some other stuff, like my Flash FLV, Microsoft ASF, and SMTP parsers),
> but it's not something worth struggling over.
>
> %%{
>     machine tokenizer;
>
>     crlf = [\r\n];
>     lwsp = [ \t];
>
>     # Quoted or escaped characters carry bit 0x0100, so a quoted
>     # '0'..'9' arrives as 0x0130..0x0139.
>     qdigit = (0x0130 .. 0x0139);
>     qxdigit = (0x0141 .. 0x0146) | (0x0161 .. 0x0166) | qdigit;
>
>     digits = digit | qdigit;
>     xdigits = xdigit | qxdigit;
>
>     qalpha = (0x0141 .. 0x015a) | (0x0161 .. 0x017a);
>
>     # Decimal and hexadecimal accumulators; (0xff & fc) strips the tag bit.
>     action num_begin { num = 0; }
>     action num_write { num *= 10; num += (0xff & fc) - '0'; }
>
>     action hex_begin { num = 0; }
>     action hex_write {
>         num <<= 4;
>         num += ((0xff & fc) > '9')
>             ? (10 + (tolower(0xff & fc) - 'a'))
>             : (0xff & fc) - '0';
>     }
>
>     # Accumulate a string via the obs_* buffer helpers (not shown).
>     action str_begin {
>         str = 0;
>         if ((error = obs_new(obs, 0)))
>             goto error;
>     }
>
>     action str_write {
>         if ((error = obs_putc(obs, 0xff & fc)))
>             goto error;
>     }
>
>     action str_end { str = obs_top(obs); }
> }%%
>
>
> %%{
>     machine x_sessioncookie_parser;
>     alphtype short;
>
>     include tokenizer;
>
>     action oops {
>         rtsp_badparse("x-sessioncookie", src, len, p);
>         error = EINVAL;
>         goto error;
>     }
>
>     token = (alnum | "+" | "/")+ >str_begin $str_write %str_end
>             %{ hdr->token = str; };
>
>     main := (token lwsp*) $!oops;
>
>     write data;
> }%%
>
>
> %%{
>     machine content_type_parser;
>     alphtype short;
>
>     getkey (0xff & (*fpc)); # Mask high-order bits.
>
>     include tokenizer;
>
>     action oops {
>         rtsp_badparse("Content-Type", src, len, p);
>         error = EINVAL;
>         goto error;
>     }
>
>     equal = lwsp** "=" lwsp**;
>
>     reg_name = (alnum | [!#$&.+\-\^_]){1,127}; # RFC 4288 4.2
>
>     charset = "charset" equal reg_name >str_begin $str_write %str_end
>               %{ hdr->charset = str; };
>     boundary = "boundary" equal reg_name >str_begin $str_write %str_end
>                %{ hdr->boundary = str; };
>
>     attrib = (charset | boundary)? <: ^";"**;
>
>     type = reg_name >str_begin $str_write %str_end %{ hdr->type = str; };
>     subtype = reg_name >str_begin $str_write %str_end
>               %{ hdr->subtype = str; };
>
>     main := (type "/" subtype lwsp** (";" lwsp** attrib)*) $!oops;
>
>     write data;
> }%%
>
>
> _______________________________________________
> ragel-users mailing list
> ragel-users@complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
-- 
Iñaki Baz Castillo
<i...@aliax.net>