Re: [sqlite] Possible bug in FTS3 or Parametrization?
On 5 August 2011 14:55, Dan Kennedy wrote: > On 08/05/2011 05:44 PM, Martin Gill wrote: >> On 5 August 2011 10:55, Dan Kennedy wrote: >>> >>> On 08/05/2011 02:58 PM, Martin Gill wrote: >> SQLiteParameter param1 = command.CreateParameter(); >> param1.Value = "FORÉIGN"; >> param1.ParameterName = "p1"; >> param1.DbType = System.Data.DbType.String; >> command.Parameters.Add(param1); >> >> SQLiteParameter param2 = command.CreateParameter(); >> param2.Value = "foréign"; >> param2.ParameterName = "p2"; >> param2.DbType = System.Data.DbType.String; >> command.Parameters.Add(param2); > > p2 is lower-case here, so it matches. If the upper-case value were > bound to p2, it would not. > Nice catch. Oops. Cheers, Martin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Possible bug in FTS3 or Parametrization?
On 08/05/2011 05:44 PM, Martin Gill wrote: > On 5 August 2011 10:55, Dan Kennedy wrote: >> >> On 08/05/2011 02:58 PM, Martin Gill wrote: >>> >>> CREATE VIRTUAL TABLE full_text USING FTS3 (id,text) >>> INSERT INTO full_text VALUES ('1', 'I have a foréign character.') >> >> SQLite does not recognize upper/lower-case mappings for non-ASCII >> characters. That is why querying for "HAVE" works but "FORÉIGN" >> does not. If you need upper/lower case mapping for non-ASCII characters, >> you will need to use the ICU extension. >> > > That may be part of the problem. System.Data.SQlite isn't compiled with ICU, > so > I cannot check that quickly. I do though get exactly the same odd > behaviour using > the porter tokenizer. > >>> SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2 >>> >>> p1 = "foréign" >>> p2 = "FORÉIGN" >>> >>> Not using parametrized SQL also returns a correct result. >> >> I don't see how either the query above or a version that uses SQL >> literals could be returning any rows.. Not with the "CREATE VIRTUAL >> TABLE" as it stands above. Unless you specify the ICU tokenizer (or >> some other tokenizer that understands non-ASCII upper/lower case), >> "FORÉIGN" should not match the row in table "full_text". >> > SQLiteParameter param1 = command.CreateParameter(); > param1.Value = "FORÉIGN"; > param1.ParameterName = "p1"; > param1.DbType = System.Data.DbType.String; > command.Parameters.Add(param1); > > SQLiteParameter param2 = command.CreateParameter(); > param2.Value = "foréign"; > param2.ParameterName = "p2"; > param2.DbType = System.Data.DbType.String; > command.Parameters.Add(param2); p2 is lower-case here, so it matches. If the upper-case value were bound to p2, it would not. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Possible bug in FTS3 or Parametrization?
On 5 August 2011 10:55, Dan Kennedy wrote: > > On 08/05/2011 02:58 PM, Martin Gill wrote: > > > > CREATE VIRTUAL TABLE full_text USING FTS3 (id,text) > > INSERT INTO full_text VALUES ('1', 'I have a foréign character.') > > SQLite does not recognize upper/lower-case mappings for non-ASCII > characters. That is why querying for "HAVE" works but "FORÉIGN" > does not. If you need upper/lower case mapping for non-ASCII characters, > you will need to use the ICU extension. > That may be part of the problem. System.Data.SQlite isn't compiled with ICU, so I cannot check that quickly. I do though get exactly the same odd behaviour using the porter tokenizer. > > SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2 > > > > p1 = "foréign" > > p2 = "FORÉIGN" > > > > Not using parametrized SQL also returns a correct result. > > I don't see how either the query above or a version that uses SQL > literals could be returning any rows.. Not with the "CREATE VIRTUAL > TABLE" as it stands above. Unless you specify the ICU tokenizer (or > some other tokenizer that understands non-ASCII upper/lower case), > "FORÉIGN" should not match the row in table "full_text". > Except that it does. While I gave the unicode value, e-acute is ASCII 130, and capital e-acute is ASCII 144. That puts them on the extended ascii chart ( 128 < c <= 255 ) and I believe all western European codepages will have the characters. Create the database as indicated and run SELECT id, text FROM full_text WHERE NOT(text = 'foréign') AND text MATCH 'FORÉIGN' and you'll get a result. At least on my British installation of Windows. Here's the C# code that demonstrates the issue. My initial attachment seems to have been cut off: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Data.SQLite; using System.Reflection; using System.IO; namespace sqlitetests { class Program { static void Main(string[] args) { SQLiteConnection connection = CreateOpenDB(); Test1(connection); Test2(connection); Test3(connection); Test4(connection); Console.ReadKey(); } static SQLiteConnection CreateOpenDB() { if (File.Exists("test.sq3")) { File.Delete("test.sq3"); } SQLiteConnectionStringBuilder csb = new SQLiteConnectionStringBuilder(); csb.DataSource = @"test.sq3"; csb.Version = 3; SQLiteConnection connection = new SQLiteConnection(csb.ToString()); connection.Open(); SQLiteCommand command = connection.CreateCommand(); command.CommandText = "CREATE VIRTUAL TABLE full_text USING FTS3 (id,text, tokenize=porter)"; command.ExecuteNonQuery(); command = connection.CreateCommand(); command.CommandText = "INSERT INTO full_text VALUES ('1', 'I have a foréign character.')"; command.ExecuteNonQuery(); return connection; } static void Test1(SQLiteConnection connection) { SQLiteCommand theCommand = new SQLiteCommand(connection); string tableName = "full_text"; theCommand.CommandText = string.Format("SELECT id, text FROM {0} WHERE NOT(text = @source) AND text MATCH @query", tableName); StringBuilder sb = new StringBuilder(); string query = "FORÉIGN"; string sourceText = "foréign"; SQLiteParameter param = theCommand.CreateParameter(); param.Value = query; param.ParameterName = "query"; param.DbType = System.Data.DbType.String; theCommand.Parameters.Add(param); param = theCommand.CreateParameter(); param.Value = sourceText.TrimEnd(); param.ParameterName = "source"; param.DbType = System.Data.DbType.String; theCommand.Parameters.Add(param); SQLiteDataReader reader = theCommand.ExecuteReader(); Console.WriteLine("{0} : {1}", MethodInfo.GetCurrentMethod().Name, reader.HasRows); } static void Test2(SQLiteConnection connection) { SQLiteCommand theCommand = new SQLiteCommand(connection); string tableName = "full_text"; theCommand.CommandText = string.Format("SELECT id, text FROM {0} WHERE NOT(text = @s) AND text MATCH @q", tableName); StringBuilder sb = new StringBuilder(); string query = "FORÉIGN"; string sourceText = "foréign"; SQLiteParameter param = theCommand.CreateParameter(); param.Value = query; param.ParameterName = "q"; param.DbType = System.Data.DbType.String; theCommand.Parameters.Add(param); param = theCommand.CreateParameter();
Re: [sqlite] Possible bug in FTS3 or Parametrization?
On 08/05/2011 02:58 PM, Martin Gill wrote: > Hi all > > I have discovered what may be a bug in either SQLLite or System.Data.SQLite > (v 1.0.66.0 using SQLite 3.6.23.1). I certainly don't understand what's > happening and I'd apprecite it if someone could help me out; perhaps I'm > missing something. > > To recreate the issue create a database with the following commands: > > CREATE VIRTUAL TABLE full_text USING FTS3 (id,text) > INSERT INTO full_text VALUES ('1', 'I have a foréign character.') SQLite does not recognize upper/lower-case mappings for non-ASCII characters. That is why querying for "HAVE" works but "FORÉIGN" does not. If you need upper/lower case mapping for non-ASCII characters, you will need to use the ICU extension. > SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2 > > p1 = "foréign" > p2 = "FORÉIGN" > > Not using parametrized SQL also returns a correct result. I don't see how either the query above or a version that uses SQL literals could be returning any rows.. Not with the "CREATE VIRTUAL TABLE" as it stands above. Unless you specify the ICU tokenizer (or some other tokenizer that understands non-ASCII upper/lower case), "FORÉIGN" should not match the row in table "full_text". ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Possible bug in FTS3 or Parametrization?
Hi all I have discovered what may be a bug in either SQLLite or System.Data.SQLite (v 1.0.66.0 using SQLite 3.6.23.1). I certainly don't understand what's happening and I'd apprecite it if someone could help me out; perhaps I'm missing something. To recreate the issue create a database with the following commands: CREATE VIRTUAL TABLE full_text USING FTS3 (id,text) INSERT INTO full_text VALUES ('1', 'I have a foréign character.') In case it gets lost, the e in foreign is actually an e-acute. unicode:00E9, unicode:00C9 for the capital version. This is significant. The test scenarios: The following query incorrectly returns zero rows: SELECT id, text FROM full_text WHERE NOT(text = @source) AND text MATCH @query source = "foréign" query = "FORÉIGN" As does this one: SELECT id, text FROM full_text WHERE NOT(text = @s) AND text MATCH @q s = "foréign" q = "FORÉIGN" The following queries correctly return 1 row: SELECT id, text FROM full_text WHERE NOT(text = @source) AND text MATCH @query source = "have" query = "HAVE" SELECT id, text FROM full_text WHERE NOT(text = @p1) AND text MATCH @p2 p1 = "foréign" p2 = "FORÉIGN" Not using parametrized SQL also returns a correct result. It seems that when doing this query for a word with an e-acute and having parameter names beginning with s and/or q gives the wrong result. Using different parameter names, not using parameters or searching for a word without an e-acute gives the correct result. The attached C# class demonstrates the problem. Any help is appreciated. Regards, Martin ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users