deadalnix, in message (digitalmars.D:174155), wrote:
>> The tokens are not kept, correct. But the identifier strings, and the
>> string literals, are kept, and if they are slices into the input buffer,
>> then everything I said applies.
>
> OK, what do you think of this:
>
> The lexer can have a parameter that tells it whether to build a table of
> tokens or to slice the input. The second is important, for instance for
> an IDE: lexing will occur often, and you prefer slicing here because you
> already have the source file in memory anyway.
>
> The token always contains a slice as a member. The slice comes either
> from the source or from a memory chunk allocated by the lexer.
If I may add, there are several possibilities here:

1- a real slice of the input range;
2- a slice of the input range created with .save and takeExactly;
3- a slice allocated in GC memory by the lexer;
4- a slice of memory owned by the lexer, which is reused for the next
   token (thus, the next call to popFront invalidates the token);
5- a slice of memory from a lookup table.

All are useful in certain situations. #1 is usable for sliceable ranges,
and is definitely efficient when you don't have a huge amount of code to
parse. #2 is usable for forward ranges. #3 is usable for any range, but I
would not recommend it. #4 is usable for any range, and #5 is best if you
perform complicated operations with the tokens.

#1/#2 should not be very hard to code: when you start to lex a new token,
you save the range, and when you find the end of the token, you just use
takeExactly on the saved range.

#4 requires an internal buffer. That's more code, but you have to do it in
a second step anyway if you want to be able to use input ranges (which you
have to). Actually, the buffer may be external, if you use a
buffered-range adapter to make a forward range out of an input range.
Having an internal buffer may be more efficient; that's something that has
to be profiled.

#3 can be obtained from #4 by map!(x => x.dup).

#5 requires one of the previous options to be implemented: you need to
have a slice saved somewhere before you can look it up in the lookup
table. Therefore, I think #5 could be obtained without a significant loss
of efficiency by an algorithm external to the lexer. This would bring many
ways to use the lexer. For example, you can filter out many tokens that
you don't want before building the table, which avoids an overfull lookup
table if you are only interested in a subset of tokens.
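To make #2 concrete, here is a minimal sketch (a made-up toy word lexer, not the proposed std.d.lexer API): tokens are lazy slices obtained with .save and takeExactly, so the lexer works on any forward range and never copies the input.

```d
import std.ascii : isWhite;
import std.range;   // range primitives (front, popFront, save) and takeExactly

struct WordLexer(R) if (isForwardRange!R)
{
    private R input;
    private R tokenStart;
    private size_t tokenLen;
    private bool done;

    this(R input)
    {
        this.input = input;
        popFront();                    // prime the first token
    }

    @property bool empty() const { return done; }

    // front is a lazy slice of the saved range: no allocation (#2).
    // For a random-access sliceable input, takeExactly degenerates to a
    // real slice of the input, which is exactly #1.
    @property auto front() { return tokenStart.save.takeExactly(tokenLen); }

    void popFront()
    {
        while (!input.empty && isWhite(input.front))
            input.popFront();
        if (input.empty) { done = true; return; }
        tokenStart = input.save;       // remember where the token begins
        tokenLen = 0;
        while (!input.empty && !isWhite(input.front))
        {
            input.popFront();
            ++tokenLen;
        }
    }
}

void main()
{
    import std.stdio : writeln;
    // Each token aliases the original input; nothing is copied.
    foreach (w; WordLexer!dstring("lexer slices input"d))
        writeln(w);   // lexer / slices / input
}
```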
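And a sketch of getting #5 externally, as suggested above (the names intern and internTable are invented for the example): the lexer keeps handing out slices, and an ordinary range pipeline filters and interns them afterwards, so the lookup table only ever sees the tokens you care about. The same pipeline gives you #3 by substituting map!(t => t.dup).

```d
import std.algorithm : filter, map;
import std.stdio : writeln;

dstring[dstring] internTable;          // the external lookup table

// Return the canonical copy of s, duplicating it only on first sight.
dstring intern(dstring s)
{
    if (auto p = s in internTable)
        return *p;
    auto copy = s.idup;
    internTable[copy] = copy;
    return copy;
}

void main()
{
    // Stand-in for the lexer's output: any range of token slices works.
    dstring[] tokens = ["int"d, "x"d, "int"d, ";"d];

    // Filter out unwanted tokens *before* interning, so the table never
    // fills up with tokens you are not interested in.
    auto interned = tokens
        .filter!(t => t != ";"d)
        .map!(t => intern(t));

    foreach (t; interned)
        writeln(t);   // int / x / int -- both "int"s share one table entry
}
```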
#1/#2 plus adapter ranges might be the only thing that actually needs to be coded, although the API should allow #4 and #5 to be defined, so that the user can use the adapters blindly, or in case an internal implementation proves to be significantly more efficient. -- Christophe