On 12/12/2010 02:04 AM, Christopher Nicholson-Sauls wrote:
> On 12/11/10 22:41, Matthias Walter wrote:
>> Hi all,
>>
>> I wrote a ByToken tokenizer that models Range, i.e. it can be used in a
>> foreach loop to read from a std.stdio.File. For it to work one has to
>> supply it with a delegate, taking a current buffer and a controller
>> class instance. It is called to extract a token from the unprocessed
>> part of the buffer, but can act as follows (by calling methods from the
>> controller class):
>>
>> - It can skip some bytes.
>> - It can succeed, by eating some bytes and setting the token to be read
>> by the front() property.
>> - It can request more data.
>> - It can indicate that the data is invalid, in which case further
>> processing is stopped and a user-supplied delegate is invoked that may
>> or may not handle this failure.
>>
>>
>> It is efficient, because it reuses the same buffer every time and just
>> supplies the user with a slice of unprocessed data. If more data is
>> requested, the remaining unprocessed part is copied to the beginning and
>> more data is read. If there is no such unprocessed data, the buffer is
>> enlarged, i.e. length doubled.
>>
>> The ByToken class has the type of a token as a template parameter.
>>
>> Does this behavior make sense? Any further suggestions?
>> Is there any interest in having this functionality, i.e. should I create
>> a dsource project,
>> or does everybody use parser-generators for everything?
>>
>> Matthias
>
> I write lexers/parsers relatively often -- and I don't use generators...
> because I'm masochistic like that! And because there aren't many
> options for D. There was Enki for D1 a while back, which might still
> work pretty well, and there's GOLD although I'm not aware of how their D
> support is right now. I might be forgetting another.
>
> So I, for one, like the idea of it at the very least. I'd have to see
> it in action, though, to say much beyond that.
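
To make the quoted description concrete, here is a minimal sketch of the buffer handling in D. It is an illustration under assumptions, not the actual ByToken code: the names Action, Step, TokenRange and collectWords are hypothetical, the controller class is replaced by a returned Step value, the token type is fixed to string rather than being a template parameter, a Read delegate stands in for the std.stdio.File, and invalid data simply throws instead of invoking a user-supplied handler. The buffer is doubled when it is full and nothing can be discarded, which is one reading of the description.

```d
import std.algorithm : min;
import std.ascii : isWhite;
import std.stdio : writeln;

// Hypothetical names throughout; this is not the ByToken API.
enum Action { skip, token, needMore, invalid }

// consumed: bytes eaten from the front of the slice; token: set for Action.token only.
struct Step { Action action; size_t consumed; string token; }

alias Extract = Step delegate(const(char)[] unprocessed, bool eof);
alias Read = size_t delegate(char[] dest); // returns bytes written, 0 at end of input

struct TokenRange
{
    Read read;
    Extract extract;
    char[] buffer;
    size_t start, end; // buffer[start .. end] is still unprocessed
    bool eof, done;
    string current;

    this(Read read, Extract extract, size_t bufferSize)
    {
        this.read = read;
        this.extract = extract;
        buffer = new char[bufferSize]; // bufferSize is assumed to be > 0
        popFront();                    // load the first token
    }

    @property bool empty() const { return done; }
    @property string front() const { return current; }

    void popFront()
    {
        while (true)
        {
            auto step = extract(buffer[start .. end], eof);
            final switch (step.action)
            {
            case Action.skip: start += step.consumed; break;
            case Action.token: start += step.consumed; current = step.token; return;
            case Action.needMore:
                if (eof) { done = true; return; }
                refill();
                break;
            case Action.invalid: throw new Exception("invalid input");
            }
        }
    }

    void refill()
    {
        if (start > 0)
        {
            // Copy the unprocessed tail to the beginning of the buffer.
            foreach (i; 0 .. end - start)
                buffer[i] = buffer[start + i];
            end -= start;
            start = 0;
        }
        else if (end == buffer.length)
            buffer.length = buffer.length * 2; // full and nothing to discard: grow
        immutable got = read(buffer[end .. $]);
        eof = (got == 0);
        end += got;
    }
}

// Splits text into whitespace-separated words through a small buffer.
string[] collectWords(string text, size_t bufferSize)
{
    size_t pos = 0;
    // In-memory stand-in for reading from a file.
    size_t read(char[] dest)
    {
        immutable n = min(dest.length, text.length - pos);
        dest[0 .. n] = text[pos .. pos + n];
        pos += n;
        return n;
    }
    Step words(const(char)[] data, bool eof)
    {
        size_t i = 0;
        while (i < data.length && isWhite(data[i])) ++i;
        if (i > 0) return Step(Action.skip, i);              // drop leading whitespace
        if (data.length == 0) return Step(Action.needMore);
        while (i < data.length && !isWhite(data[i])) ++i;
        if (i == data.length && !eof) return Step(Action.needMore); // word may continue
        return Step(Action.token, i, data[0 .. i].idup);
    }
    string[] result;
    foreach (word; TokenRange(&read, &words, bufferSize))
        result ~= word;
    return result;
}

void main()
{
    writeln(collectWords("  hello  world ", 4));
}
```

With a real file, the Read delegate could presumably be something like `(char[] dest) => file.rawRead(dest).length`; refill never passes an empty slice, which rawRead would reject.
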
My current version can be used as follows to yield a simple word-tokenizer:
http://pastebin.com/qjH6y0Mf

As I'm going to use it for one or two real-world file formats I might change some things, but for now I like it. If you have any suggestions for improvements, please let me know.

Matthias