On 2011-02-03 22:27:08 -0500, Andrei Alexandrescu <seewebsiteforem...@erdani.org> said:

> On 2/3/11 9:11 PM, Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> Nobody that I know of. If you want to discuss design here while
>>> working on it, that would be great. I could think of a few high-level
>>> requirements:
>>>
>>> * works with input ranges so we can plug it in with any source
>>
>> The difficulty with that is if it's a pure input range, then the output
>> cannot be slices of the input.
>
> In that case it's fair to require sliceable ranges of characters then, or strings outright. It all boils down to stating one's assumptions and choices. Probably parameterizing on character width would be recommendable anyway.
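Walter's point about pure input ranges can be illustrated with a small sketch (Python, purely illustrative; both function names are hypothetical, and `str.isalnum` is a crude stand-in for XML's real name-character rules). A sliceable source lets the tokenizer report offsets into the original text, whereas a one-pass iterator forces it to copy every token into a fresh buffer:

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical helpers; isalnum() is a simplification of XML name characters.

def name_spans(text: str) -> List[Tuple[int, int]]:
    """Sliceable input: report each run of name characters as (start, end)
    offsets into the original text, so callers can slice it later."""
    spans = []
    start = None
    for i, ch in enumerate(text):
        if ch.isalnum():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(text)))
    return spans


def name_tokens(chars: Iterable[str]) -> Iterator[str]:
    """Pure input range: each character is seen once and cannot be revisited,
    so every token has to be accumulated into a new buffer (a copy)."""
    buf = []
    for ch in chars:
        if ch.isalnum():
            buf.append(ch)
        elif buf:
            yield "".join(buf)
            buf = []
    if buf:
        yield "".join(buf)
```

With `doc = "<root><item/></root>"`, `name_spans(doc)` yields offsets that can be turned into slices of `doc`, while `name_tokens(iter(doc))` can only hand back newly built strings.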

The problem with parameterizing on character width is that whether a document is UTF-8 or UTF-16 can only be determined at runtime, by inspecting the document itself. How is the user of the parser supposed to decide in advance which one to instantiate? And how is the application supposed to handle slices of different string types coming from those different parser instances?
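To make the runtime-detection problem concrete, here is a minimal sketch of encoding sniffing that loosely follows the autodetection table in Appendix F of the XML 1.0 specification. It is deliberately simplified (UTF-32, EBCDIC and the `encoding="..."` declaration are ignored), and `sniff_xml_encoding` is a hypothetical name. The point is that the result, and therefore which parser instantiation to pick, is only known after the first bytes have been read:

```python
def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding family of an XML document from its first bytes.
    Hypothetical, simplified sketch: UTF-32, EBCDIC and the encoding
    declaration are not handled."""
    # Byte order marks.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    # No BOM: look for "<?" encoded in UTF-16.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # Anything else: assume UTF-8 (or an ASCII-compatible encoding).
    return "utf-8"
```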

The actual low-level parser could indeed use a different instance depending on the text encoding as an optimization, but the end-user API should standardize on one string type. Unfortunately, if the XML file does not use the same text encoding as your standard string type, you can't use slicing and have to create a transcoded copy of each and every string...
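The slice-versus-copy trade-off can be sketched as follows, assuming (purely for illustration) that the API's standard string type is UTF-8 bytes and that the parser has already located a token at byte offsets `start`..`end`. `token_as_utf8` is a hypothetical helper, not part of any existing parser:

```python
from typing import Union


def token_as_utf8(doc: bytes, start: int, end: int,
                  encoding: str) -> Union[memoryview, bytes]:
    """Return the token doc[start:end] in the API's single standard string
    type, assumed here to be UTF-8 (hypothetical helper)."""
    if encoding == "utf-8":
        # Document already uses the standard encoding: hand out a
        # zero-copy slice that shares the document's memory.
        return memoryview(doc)[start:end]
    # Any other encoding: each token must be transcoded into a new buffer,
    # i.e. the per-string copy that slicing was meant to avoid.
    return doc[start:end].decode(encoding).encode("utf-8")
```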

Another option is to use a "smart" string type that can accept string slices of any encoding.
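One possible shape for such a "smart" string type, sketched in Python under the assumption that a slice only needs to remember its buffer, its offsets and its encoding. `AnySlice` is a hypothetical name, not an existing library type. Decoding is deferred until the text is actually needed, and equality works across encodings:

```python
class AnySlice:
    """Hypothetical "smart" string: a view of doc[start:end] that remembers
    the encoding of the buffer it points into."""

    __slots__ = ("doc", "start", "end", "encoding")

    def __init__(self, doc: bytes, start: int, end: int, encoding: str) -> None:
        self.doc = doc
        self.start = start
        self.end = end
        self.encoding = encoding

    def raw(self) -> memoryview:
        # Zero-copy view of the encoded bytes; nothing is transcoded here.
        return memoryview(self.doc)[self.start:self.end]

    def text(self) -> str:
        # Decode on demand; this is where a copy is finally made.
        return self.raw().tobytes().decode(self.encoding)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, AnySlice):
            if self.encoding == other.encoding:
                # Same encoding: compare the encoded bytes without decoding.
                return self.raw() == other.raw()
            # Different encodings: fall back to comparing decoded text.
            return self.text() == other.text()
        if isinstance(other, str):
            return self.text() == other
        return NotImplemented

    def __hash__(self) -> int:
        # Hash the decoded text so equal slices hash alike across encodings.
        return hash(self.text())

    def __repr__(self) -> str:
        return f"AnySlice({self.text()!r}, {self.encoding})"
```

In this sketch the cost moves rather than disappears: comparing slices from documents in different encodings still transcodes, but only at the moment the comparison is made.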

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
