On Wednesday, August 01, 2012 20:29:45 Jacob Carlborg wrote:
> But if you read a source file which is encoded using UTF-16 you would
> need to re-encode that to store it in the "str" field in your Token struct?

Currently, yes.

> If that's the case, wouldn't it be better to make Token a template to be
> able to store all Unicode encodings without re-encoding? Although I
> don't know if that will complicate the rest of the lexer.

It may very well be a good idea to templatize Token on range type. It would be
nice not to have to templatize it, but that may be the best route to go. The
main question is whether str is _always_ a slice (or the result of
takeExactly) of the original range. I _think_ that it is, but I'd have to make
sure of that. If it's not and can't be for whatever reason, then that poses a
problem.

If Token _does_ get templatized, then I believe that R will end up being the
original type in the case of the various string types or a range which has
slicing, but it'll be the result of takeExactly(range, len) for everything
else.

I just made str a string to begin with, since it was simple, and I was still
working on a lot of the initial design and how I was going to go about things.
If it makes more sense for it to be templated, then it'll be changed so that
it's templated.

- Jonathan M Davis
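
For illustration only, here is a rough sketch of what templatizing Token on the range type might look like. It is not the lexer's actual code: TokenType, StrType, and the field layout are hypothetical names made up for this example, and it assumes that str really is always a slice or a takeExactly result of the source range.

```d
import std.algorithm : filter;
import std.range : ElementType, hasSlicing, isForwardRange, takeExactly;
import std.traits : isSomeChar, isSomeString;

// Hypothetical token kinds, for illustration only.
enum TokenType { identifier, number, other }

// Hypothetical helper: picks the type of Token.str. String types and ranges
// with slicing keep their own type; anything else gets whatever type
// takeExactly returns for that range.
template StrType(R)
{
    static if (isSomeString!R || hasSlicing!R)
        alias StrType = R;
    else
        alias StrType = typeof(takeExactly(R.init, 0));
}

// Token templated on the source range type, so UTF-8, UTF-16, and UTF-32
// input can all be stored without re-encoding.
struct Token(R)
    if (isForwardRange!R && isSomeChar!(ElementType!R))
{
    TokenType type;
    StrType!R str;
}

void main()
{
    // String types keep their original type, so str is a plain slice.
    static assert(is(StrType!string == string));
    static assert(is(StrType!wstring == wstring));

    auto src = "int x"w;
    auto tok = Token!wstring(TokenType.identifier, src[0 .. 3]);
    assert(tok.str == "int"w);

    // A forward range without slicing ends up with takeExactly's type.
    auto f = "abc"d.filter!(c => c != 'b');
    alias F = typeof(f);
    static assert(!hasSlicing!F);
    static assert(!is(StrType!F == F));
}
```

With something along these lines, a UTF-16 source would yield Token!wstring values whose str fields are slices of the original buffer, which is the case the question above was asking about.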