On 04/10/2018 08:04 PM, Ralph Amissah wrote:
The exact location of problem may be provided in the error statement
"core.exception.UnicodeException@src/rt/util/utf.d(292): invalid
UTF-8 sequence".

[...]
Mock problem string with test code follows (d2sqlite3 required):

[... code ...]

A more minimal test case, reduced from your code:

----
module d2sqlite3_utf8.issue;
import d2sqlite3;
void main() {
    string[] info_tag = ["pass", "fault"];
    auto db = Database(":memory:");
    string _sql_statement = `SELECT '’’';`;
    db.run(_sql_statement);
    db.close;
}
----

From the exception's stack trace we see that `d2sqlite3.internal.util.byStatement(immutable(char)[]).ByStatement.findEnd` is the deepest non-Phobos function involved. So that's a good first spot to look for a bug. Let's check it out.

https://github.com/biozic/d2sqlite3/blob/2e8211946ae0e09646d561aeae1361a695adcc17/source/d2sqlite3/internal/util.d#L64-L83

And indeed, there's a bug in these lines:

----
auto tail = sql[pos .. $];
immutable offset = tail.countUntil(';') + 1;
pos += offset;
----

`pos` is used to slice the string `sql`, so it is interpreted as a number of UTF-8 code *units*. But then the result of `countUntil` is added to it, and on a `string` `countUntil` counts code *points*. So a number of code points is mistaken for a number of code units. As soon as the text before the `;` contains a multibyte character, the next slice can start in the middle of a multibyte sequence, and the following `countUntil` call then complains about broken UTF-8.
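To see the mismatch in isolation, here's a small sketch of my own (not code from d2sqlite3) that uses the same string as the test case. It only assumes Phobos' `countUntil` and `std.utf.byCodeUnit`; the `\u2019` escapes stand for the two `’` characters.

```d
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

void main()
{
    // Same text as the test case: two U+2019 characters inside quotes.
    // Each ’ is one code point but three UTF-8 code units.
    string sql = "SELECT '\u2019\u2019';";

    // .length and slicing work in code units.
    assert(sql.length == 16);

    // On a string, countUntil decodes and counts code points.
    assert(sql.countUntil(';') == 11);

    // On the byCodeUnit view it counts code units, which is what slicing needs.
    assert(sql.byCodeUnit.countUntil(';') == 15);
    assert(sql[15] == ';');

    // 11 + 1 = 12 lands inside the second ’ (code units 11 .. 14),
    // so sql[12 .. $] starts with a UTF-8 continuation byte.
    assert((sql[12] & 0xC0) == 0x80);
}
```

So with this input, `pos` ends up at 12 instead of 16, and the next slice begins in the middle of a character.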

This can be fixed by letting `countUntil` operate on code units instead:

----
import std.utf: byCodeUnit;
immutable offset = tail.byCodeUnit.countUntil(';') + 1;
----
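For illustration, here's how that kind of loop behaves once the search runs on code units. This is a standalone sketch under my own assumptions: `splitStatements` is a hypothetical helper, not d2sqlite3's actual `findEnd`, and it naively splits on every `;` (including ones inside quoted strings).

```d
import std.algorithm.searching : countUntil;
import std.utf : byCodeUnit;

// Hypothetical helper for illustration only; not part of d2sqlite3.
string[] splitStatements(string sql)
{
    string[] result;
    size_t pos = 0; // always a code unit index, safe for slicing
    while (pos < sql.length)
    {
        auto tail = sql[pos .. $];
        // Searching the code unit view yields an offset in code units.
        immutable found = tail.byCodeUnit.countUntil(';');
        // No ';' left: treat the remainder as the last statement.
        immutable size_t offset = found < 0 ? tail.length : cast(size_t) found + 1;
        result ~= tail[0 .. offset];
        pos += offset;
    }
    return result;
}

void main()
{
    auto parts = splitStatements("SELECT '\u2019\u2019'; SELECT 1;");
    assert(parts.length == 2);
    assert(parts[0] == "SELECT '\u2019\u2019';");
    assert(parts[1] == " SELECT 1;");
}
```

Since `;` is a single ASCII code unit, and ASCII bytes never occur inside a multibyte UTF-8 sequence, searching by code unit can't produce a false match here.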

If you want, you can make a bug report or a pull request with the fix. Otherwise, if you're not up to that, I can make one.

[...]
   - DMD64 D Compiler v2.074.1

That's rather old. I'd recommend updating if possible.
