Andrei Alexandrescu wrote:
During compilation, such non-tokens are recognized as code by the lexer generator and called appropriately. A comprehensive library of such routines completes a useful library.
I agree, a set of "canned" and heavily optimized lexing functions for common things like identifiers, numbers, comments, etc., would make a lexing library much more practical.
Those will work great for inventing DSLs, but for existing languages the trouble is that each language has subtle variations in how it handles them. For example, D's numeric literals allow embedded underscores. Go doesn't overflow on numeric literals. JavaScript has some wacky rules to distinguish a comment from a regex. Some languages allow \uNNNN escapes as letters in identifiers.
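To make the underscore case concrete, here is a rough sketch in Python of what a "canned" decimal integer routine might look like once it has to serve more than one language. The function name `lex_decimal_integer` and the `allow_underscores` flag are made up for illustration and are not from any existing library. It deliberately ignores hex and binary prefixes, suffixes, and overflow checks, each of which would need its own per-language switch.

```python
DIGITS = "0123456789"

def lex_decimal_integer(text, pos=0, allow_underscores=False):
    """Scan a decimal integer literal starting at text[pos].

    Returns (value, end_pos), or None if text[pos] is not an ASCII digit.
    allow_underscores is a hypothetical per-language knob: set it for a
    D-style lexer, leave it off for a language with no digit separators.
    """
    if pos >= len(text) or text[pos] not in DIGITS:
        return None
    end = pos
    digits = []
    while end < len(text):
        ch = text[end]
        if ch in DIGITS:
            digits.append(ch)
        elif ch == "_" and allow_underscores:
            pass  # separator: consumed but contributes nothing to the value
        else:
            break
        end += 1
    # Python ints are arbitrary precision, so no overflow check is done here.
    return int("".join(digits)), end
```

With the flag on, `1_000_000` is one token. With it off, the same input lexes as `1` followed by whatever the language makes of `_000_000`. Every such difference adds another flag or another special case, which is where the canned routine starts losing to a hand-written one.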
So while a general-purpose lexing library will be very useful, for lexing D code (and Java, JavaScript, etc.) a custom lexer will probably be much more practical.