Ali Çehreli:
Agreed.
But I am not that sure about this particular function anymore
because for the function to be not 'strongly exception safe',
the input string must be invalid UTF-8 to begin with.
I am not sure how bad it is to not preserve the actual
invalidness of the string in that case. :)
I see. This is a matter of design. I see some possible solutions:
1) Do nothing, assume input is well-formed UTF-8, otherwise
output will be wrong (or it will throw an exception unsafely).
This is what Phobos may be doing in this case.
2) Put a call to a UTF 'validate' inside the function's
pre-condition if the input is a narrow string. This will slow
down code in non-release mode, maybe too much.
3) Use a stronger type system, that enforces pre-conditions and
post-conditions in a smarter way. This means if the return value
of a function that has 'validate' inside its post-condition is
given as input to a function that has 'validate' inside its
pre-condition, the validate is run only once even in non-release
mode. Generally, if you use many string functions, this saves a
lot of 'validate' calls. This solution is appreciated in
Eiffel-like languages.
4) Use two different types, one for validated UTF-8 and one for
unvalidated UTF-8. Unless you have bad bugs in your code this
will avoid most calls to 'validate'. This solution is very simple
because it doesn't require a smart compiler, and it's appreciated
in languages like Haskell (for an example, see:
http://www.yesodweb.com/ ).
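To make solutions 2 and 4 a bit more concrete, here is a rough D
sketch. The names countChecked, countValid and ValidUtf8 are
invented for illustration and are not part of Phobos; only
std.utf.validate, std.range.walkLength and
std.exception.assertThrown are real library calls. It is a sketch
of the idea, not a proposal for the actual function under
discussion.

```d
import std.exception : assertThrown;
import std.range : walkLength;
import std.utf : UTFException, validate;

// Solution 2: validate in the pre-condition. Contracts are compiled
// out with -release, so the check only costs time in other builds.
size_t countChecked(string s)
in
{
    validate(s); // throws UTFException on malformed UTF-8
}
do
{
    return s.walkLength; // number of code points
}

// Solution 4: a distinct (hypothetical) type for validated text.
struct ValidUtf8
{
    private string data;

    // Validate once, at the boundary. Note: 'private' is
    // module-level in D, so this only protects other modules.
    static ValidUtf8 check(string raw)
    {
        validate(raw);
        return ValidUtf8(raw);
    }

    string value() const { return data; }
}

// Needs no check of its own: the type carries the guarantee.
size_t countValid(ValidUtf8 s)
{
    return s.value().walkLength;
}

void main()
{
    auto ok = ValidUtf8.check("héllo");
    assert(countValid(ok) == 5);
    assert(countChecked("héllo") == 5);

    // 0xC3 must be followed by a continuation byte; 0x28 is not one.
    immutable ubyte[] bytes = [0xC3, 0x28];
    auto bad = cast(string) bytes;
    assertThrown!UTFException(ValidUtf8.check(bad));
}
```

With the wrapper type, any chain of functions that take and return
ValidUtf8 pays for 'validate' only once, where the raw string
enters the program.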
Bye,
bearophile