Simon Slavin wrote:
> On 9 Nov 2011, at 8:03pm, Yuriy Kaminskiy wrote:
> 
>> Look at: SELECT hex(X'1245005679'),hex(X'1245001234');
>> 
>> And compare: SELECT X'1245005679' LIKE X'1245001234'; 1 -- incorrect SELECT
>> X'1245005679' = X'1245001234'; 0 -- correct SELECT X'1245005679' >
>> X'1245001234'; 1 -- correct
>> 
>> "LIKE" (in both "native" and "icu" implementations) ignores value length 
>> (sqlite_value_bytes) and stops at NUL character. Compare that with "=" and
>> ">" that compares full value.
>> 
>> Arguable SQL_BLOB should be illegal operand for LIKE/GLOB, but then this
>> should be documented and they should return error when called with BLOB
>> argument instead of returning nonsense.
> 
> The definition of LIKE on the expression page states (not with complete
> clarity) that LIKE operates on strings.  So we're back to an old favourite: a
> decision as to what constitutes a string inside SQLite.  One definition would
> say that a string would terminate with the first 0x00 no matter how many
> bytes are stored (C style strings).  Another would say that the string
> terminates with the last byte stored (Pascal style strings).

One way or other, "=", "LIKE" and "GLOB" results should be consistent.
If string is NUL-terminated, "=" should ignore everything after NUL.
If string is length-terminated, "LIKE" should not ignore bytes after NUL.

If devs have no time to change code and/or make decisions now, those cases
should be at least documented as "function/operation A,B,C (currently) have
undefined behavior on BLOB". Not very nice, but randomly stumbling over such
surprises is certainly worse.

... and few related strangeness: UPPER/LOWER/*TRIM/REPLACE also accept BLOB
argument, but return *TEXT* instead. With somewhat unexpected result:

sqlite> SELECT length(           X'41424300414243313233');
10
sqlite> SELECT length(     lower(X'41424300414243313233'));
3
sqlite> SELECT length(CAST(lower(X'41424300414243313233') AS BLOB));
10
sqlite> .mode insert
sqlite> select      rtrim(X'4142430061626333',X'334363');
INSERT INTO table VALUES('ABC');
sqlite> select CAST(rtrim(X'4142430061626333',X'334363') AS BLOB)
INSERT INTO table VALUES(X'414243006162');

(BTW, one can consider that *TRIM, LIKE and GLOB semantic on BLOB should be
different - with BLOB they should operate on *bytes*, not *utf-8 chars*; one
more reason to state "undefined behavior" for now).

Same with || operator (OP_Concat): it takes BLOB, but make result TEXT (leading
to [arguable illegal] TEXT with embedded NUL {or with broken UTF-X encoding}:
X'1234' || X'004567' [also look at
<http://permalink.gmane.org/gmane.comp.db.sqlite.general/68473>, where this
problem triggered invalid .dump])

SUBSTR also documented to operate *on string*, but, unlike above functions, when
supplied with BLOB argument it correctly returns BLOB value.

Not sure if it worth changing code, but certainly should be somehow mentioned in
documentation (including clearly stated "undocumented behavior on BLOB" as 
option).

> I don't think any official definition is stated anywhere in the documentation
> intended for users.  (I haven't looked at comments in the source code.)  And
> I suspect that if DRH wants to decide one way or another, this may point up
> some inconsistencies in the codebase which should be quickly fixed.

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to