On 12/11/10 22:41, Matthias Walter wrote:
> Hi all,
>
> I wrote a ByToken tokenizer that models Range, i.e. it can be used in a
> foreach loop to read from a std.stdio.File. For it to work, one has to
> supply it with a delegate taking the current buffer and a controller
> class instance. It is called to extract a token from the unprocessed
> part of the buffer, and can act as follows (by calling methods on the
> controller class):
>
> - It can skip some bytes.
> - It can succeed, by eating some bytes and setting the token to be read
>   by the front() property.
> - It can request more data.
> - It can indicate that the data is invalid, in which case further
>   processing is stopped and a user-supplied delegate is invoked that
>   may or may not handle this failure.
>
> It is efficient because it reuses the same buffer every time and just
> supplies the user with a slice of unprocessed data. If more data is
> requested, the remaining unprocessed part is copied to the beginning
> and more data is read. If there is no such unprocessed part to
> reclaim, the buffer is enlarged, i.e. its length is doubled.
>
> The ByToken class has the type of a token as a template parameter.
>
> Does this behavior make sense? Any further suggestions?
> Is there any interest in having this functionality, i.e. should I
> create a dsource project, or does everybody use parser-generators for
> everything?
>
> Matthias
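[Editor's sketch, for reference: the controller protocol and the buffer-refill step described in the quoted message might look roughly like this in D. All names here -- Controller, Result, makeRoom, and the method names -- are illustrative guesses at such an interface, not the actual ByToken API.]

```d
enum Result { needMore, token, invalid }

/// Per-call scratchpad handed to the user delegate. The delegate reports
/// what it did by calling exactly one of the outcome methods.
final class Controller(Token)
{
    Result result;     // defaults to needMore before each delegate call
    size_t consumed;   // bytes to drop from the front of the unprocessed slice
    Token  token;

    void skip(size_t n)             { consumed += n; }   // discard n bytes
    void success(size_t n, Token t) { consumed += n; token = t; result = Result.token; }
    void needMoreData()             { result = Result.needMore; }
    void invalidData()              { result = Result.invalid; }
}

/// Buffer-refill step, as described above: slide the unprocessed tail
/// buf[start .. end] to the front of the buffer; if nothing can be
/// reclaimed (start is already 0) and the buffer is full, double it.
void makeRoom(ref ubyte[] buf, ref size_t start, ref size_t end)
{
    if (start > 0)
    {
        immutable len = end - start;
        foreach (i; 0 .. len)       // overlap-safe forward copy
            buf[i] = buf[start + i];
        start = 0;
        end = len;
    }
    else if (end == buf.length)
    {
        buf.length *= 2;            // full and nothing reclaimable: grow
    }
}
```

A ByToken range's popFront would then presumably loop: call the delegate on buf[start .. end]; on token, advance start by consumed and return; on needMore, call makeRoom and read more bytes from the File into buf[end .. $]; on invalid, hand off to the failure delegate.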
I write lexers/parsers relatively often -- and I don't use generators... because I'm masochistic like that! And because there aren't many options for D. There was Enki for D1 a while back, which might still work pretty well, and there's GOLD, although I'm not sure what its D support is like right now. I might be forgetting another.

So I, for one, like the idea of it, at the very least. I'd have to see it in action, though, to say much beyond that.

-- 
Chris N-S