Re: [sqlite] Possible bug in FTS3 or Parametrization?

2011-08-05 Thread Martin Gill
On 5 August 2011 14:55, Dan Kennedy  wrote:
> On 08/05/2011 05:44 PM, Martin Gill wrote:
>> On 5 August 2011 10:55, Dan Kennedy  wrote:
>>>
>>> On 08/05/2011 02:58 PM, Martin Gill wrote:

>>              SQLiteParameter param1 = command.CreateParameter();
>>              param1.Value = "FORÉIGN";
>>              param1.ParameterName = "p1";
>>              param1.DbType = System.Data.DbType.String;
>>              command.Parameters.Add(param1);
>>
>>              SQLiteParameter param2 = command.CreateParameter();
>>              param2.Value = "foréign";
>>              param2.ParameterName = "p2";
>>              param2.DbType = System.Data.DbType.String;
>>              command.Parameters.Add(param2);
>
> p2 is lower-case here, so it matches. If the upper-case value were
> bound to p2, it would not.
>

Nice catch. Oops.

Cheers,

Martin
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Possible bug in FTS3 or Parametrization?

2011-08-05 Thread Dan Kennedy
On 08/05/2011 05:44 PM, Martin Gill wrote:
> On 5 August 2011 10:55, Dan Kennedy  wrote:
>>
>> On 08/05/2011 02:58 PM, Martin Gill wrote:
>>>
>>> CREATE VIRTUAL TABLE full_text USING FTS3 (id,text)
>>> INSERT INTO full_text VALUES ('1', 'I have a foréign character.')
>>
>> SQLite does not recognize upper/lower-case mappings for non-ASCII
>> characters. That is why querying for "HAVE" works but "FORÉIGN"
>> does not. If you need upper/lower case mapping for non-ASCII characters,
>> you will need to use the ICU extension.
>>
>
> That may be part of the problem. System.Data.SQlite isn't compiled with ICU, 
> so
> I cannot check that quickly. I do though get exactly the same odd
> behaviour using
> the porter tokenizer.
>
>>> SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2
>>>
>>> p1 = "foréign"
>>> p2 = "FORÉIGN"
>>>
>>> Not using parametrized SQL also returns a correct result.
>>
>> I don't see how either the query above or a version that uses SQL
>> literals could be returning any rows.. Not with the "CREATE VIRTUAL
>> TABLE" as it stands above. Unless you specify the ICU tokenizer (or
>> some other tokenizer that understands non-ASCII upper/lower case),
>> "FORÉIGN" should not match the row in table "full_text".
>>

>  SQLiteParameter param1 = command.CreateParameter();
>  param1.Value = "FORÉIGN";
>  param1.ParameterName = "p1";
>  param1.DbType = System.Data.DbType.String;
>  command.Parameters.Add(param1);
>
>  SQLiteParameter param2 = command.CreateParameter();
>  param2.Value = "foréign";
>  param2.ParameterName = "p2";
>  param2.DbType = System.Data.DbType.String;
>  command.Parameters.Add(param2);

p2 is lower-case here, so it matches. If the upper-case value were
bound to p2, it would not.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Possible bug in FTS3 or Parametrization?

2011-08-05 Thread Martin Gill
On 5 August 2011 10:55, Dan Kennedy  wrote:
>
> On 08/05/2011 02:58 PM, Martin Gill wrote:
> >
> > CREATE VIRTUAL TABLE full_text USING FTS3 (id,text)
> > INSERT INTO full_text VALUES ('1', 'I have a foréign character.')
>
> SQLite does not recognize upper/lower-case mappings for non-ASCII
> characters. That is why querying for "HAVE" works but "FORÉIGN"
> does not. If you need upper/lower case mapping for non-ASCII characters,
> you will need to use the ICU extension.
>

That may be part of the problem. System.Data.SQlite isn't compiled with ICU, so
I cannot check that quickly. I do though get exactly the same odd
behaviour using
the porter tokenizer.

> > SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2
> >
> > p1 = "foréign"
> > p2 = "FORÉIGN"
> >
> > Not using parametrized SQL also returns a correct result.
>
> I don't see how either the query above or a version that uses SQL
> literals could be returning any rows.. Not with the "CREATE VIRTUAL
> TABLE" as it stands above. Unless you specify the ICU tokenizer (or
> some other tokenizer that understands non-ASCII upper/lower case),
> "FORÉIGN" should not match the row in table "full_text".
>

Except that it does.

While I gave the unicode value, e-acute is ASCII 130, and capital
e-acute is ASCII 144.
That puts them on the extended ascii chart ( 128 < c <= 255 ) and I
believe all western
European codepages will have the characters.

Create the database as indicated and run

SELECT id, text FROM full_text WHERE NOT(text = 'foréign') AND text
MATCH 'FORÉIGN'

and you'll get a result. At least on my British installation of Windows.

Here's the C# code that demonstrates the issue.
My initial attachment seems to have been cut off:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SQLite;
using System.Reflection;
using System.IO;

namespace sqlitetests
{
class Program
{
static void Main(string[] args)
{
SQLiteConnection connection = CreateOpenDB();

Test1(connection);
Test2(connection);
Test3(connection);
Test4(connection);
Console.ReadKey();
}

static SQLiteConnection CreateOpenDB()
{
if (File.Exists("test.sq3"))
{
File.Delete("test.sq3");
}

SQLiteConnectionStringBuilder csb = new
SQLiteConnectionStringBuilder();
csb.DataSource = @"test.sq3";
csb.Version = 3;
SQLiteConnection connection = new SQLiteConnection(csb.ToString());
connection.Open();

SQLiteCommand command = connection.CreateCommand();
command.CommandText = "CREATE VIRTUAL TABLE full_text
USING FTS3 (id,text, tokenize=porter)";
command.ExecuteNonQuery();

command = connection.CreateCommand();
command.CommandText = "INSERT INTO full_text VALUES ('1',
'I have a foréign character.')";
command.ExecuteNonQuery();

return connection;

}

static void Test1(SQLiteConnection connection)
{
SQLiteCommand theCommand = new SQLiteCommand(connection);
string tableName = "full_text";

theCommand.CommandText = string.Format("SELECT id, text
FROM {0} WHERE NOT(text = @source) AND text MATCH @query", tableName);


StringBuilder sb = new StringBuilder();

string query = "FORÉIGN";
string sourceText = "foréign";

SQLiteParameter param = theCommand.CreateParameter();
param.Value = query;
param.ParameterName = "query";
param.DbType = System.Data.DbType.String;
theCommand.Parameters.Add(param);

param = theCommand.CreateParameter();
param.Value = sourceText.TrimEnd();
param.ParameterName = "source";
param.DbType = System.Data.DbType.String;
theCommand.Parameters.Add(param);

SQLiteDataReader reader = theCommand.ExecuteReader();
Console.WriteLine("{0} : {1}",
MethodInfo.GetCurrentMethod().Name, reader.HasRows);

}


static void Test2(SQLiteConnection connection)
{
SQLiteCommand theCommand = new SQLiteCommand(connection);
string tableName = "full_text";

theCommand.CommandText = string.Format("SELECT id, text
FROM {0} WHERE NOT(text = @s) AND text MATCH @q", tableName);


StringBuilder sb = new StringBuilder();

string query = "FORÉIGN";
string sourceText = "foréign";

SQLiteParameter param = theCommand.CreateParameter();
param.Value = query;
param.ParameterName = "q";
param.DbType = System.Data.DbType.String;
theCommand.Parameters.Add(param);

param = 

Re: [sqlite] Possible bug in FTS3 or Parametrization?

2011-08-05 Thread Dan Kennedy
On 08/05/2011 02:58 PM, Martin Gill wrote:
> Hi all
>
> I have discovered what may be a bug in either SQLLite or System.Data.SQLite
> (v 1.0.66.0 using SQLite 3.6.23.1). I certainly don't understand what's
> happening and I'd apprecite it if someone could help me out; perhaps I'm
> missing something.
>
> To recreate the issue create a database with the following commands:
>
> CREATE VIRTUAL TABLE full_text USING FTS3 (id,text)
> INSERT INTO full_text VALUES ('1', 'I have a foréign character.')

SQLite does not recognize upper/lower-case mappings for non-ASCII
characters. That is why querying for "HAVE" works but "FORÉIGN"
does not. If you need upper/lower case mapping for non-ASCII characters,
you will need to use the ICU extension.

> SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2
>
> p1 = "foréign"
> p2 = "FORÉIGN"
>
> Not using parametrized SQL also returns a correct result.

I don't see how either the query above or a version that uses SQL
literals could be returning any rows.. Not with the "CREATE VIRTUAL
TABLE" as it stands above. Unless you specify the ICU tokenizer (or
some other tokenizer that understands non-ASCII upper/lower case),
"FORÉIGN" should not match the row in table "full_text".



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users