On 10/16/2014 12:43 PM, spir via Digitalmars-d-learn wrote:
denis
spir is back! :)
On 10/16/2014 11:46 AM, Uranuz wrote:
> I have some string *str* of unicode characters. The question is how to
> check if I have valid unicode code point starting at code unit *index*?
It is easy if I understand the question as skipping over code units that
cannot start a UTF-8 sequence (i.e. continuation bytes):
import std.stdio;

ubyte upperTwoBits(ubyte b)
{
    return b & 0b1100_0000;
}

bool isUtf8ContinuationByte(char c)
{
    enum utf8ContinuationPrefix = 0b1000_0000;
    return upperTwoBits(c) == utf8ContinuationPrefix;
}

void moveToValid(ref inout(char)[] s)
{
    /* Skip over UTF-8 continuation bytes. */
    while (s.length && isUtf8ContinuationByte(s[0])) {
        s = s[1..$];
    }

    /*
     * The wchar[] overload is too complicated for Ali at this time. :)
     *
     * Please see the following function template in phobos/std/utf.d:
     *
     *     private dchar decodeImpl(bool canIndex, S)(...)
     *         if (is(S : const wchar[]) ...
     */
}

unittest
{
    auto s = "çde";
    moveToValid(s);
    assert(s == "çde");

    s = s[1 .. $];
    moveToValid(s);
    assert(s == "de", s);
}

void moveToValid(ref const(dchar)[] s)
{
    /* Every code unit is valid; nothing to do. */
}

void main()
{}
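
If the question is instead a yes/no check at a given *index*, something
like the following might do. This is only a sketch: startsValidCodePoint
is a made-up name (it is not in Phobos), and I am assuming that a "valid
code point" means whatever std.utf.decode accepts without throwing
UTFException.

```d
import std.utf : decode, UTFException;

/* Hypothetical helper, not part of Phobos: returns true if a complete,
 * well-formed UTF-8 sequence starts at code unit 'index' of 's'. */
bool startsValidCodePoint(const(char)[] s, size_t index)
{
    if (index >= s.length) {
        return false;
    }

    try {
        /* decode() advances the index that it is given; use a copy. */
        size_t i = index;
        decode(s, i);
        return true;

    } catch (UTFException) {
        /* Continuation byte, truncated sequence, etc. */
        return false;
    }
}

unittest
{
    auto s = "çde";
    assert( startsValidCodePoint(s, 0));           // first byte of ç
    assert(!startsValidCodePoint(s, 1));           // continuation byte of ç
    assert( startsValidCodePoint(s, 2));           // 'd'
    assert(!startsValidCodePoint(s, s.length));    // past the end
    assert(!startsValidCodePoint(s[0 .. 1], 0));   // ç cut in half
}

void main()
{}
```

Unlike moveToValid(), this also rejects a sequence that is cut short at
the end of the string, at the cost of a try/catch for every check.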
Ali