Re: [sqlite] FTS3 finds too much: Slash special meaning? Something else?

2010-09-04 Thread Lukas Haase
Am 03.09.2010 13:27, schrieb Dan Kennedy:

 On Sep 2, 2010, at 6:37 PM, Lukas Haase wrote:

 Hi,

 I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax
 (SQLITE_ENABLE_FTS3_PARENTHESIS).

 Now if I search for a string like '2002/91/AH' there are lots of items
 which do NOT contain this string. This is a query:

 SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH';

 In my case, there are only 10 items which actually contain the string
 '2002/91/AH' but the query above gives me 162 (!) matches!

 I can not find any reason for this. Some of the topics contain
 similar
 strings like 2002/96/AH or even 94/31/EG. But in fact, these strings
 must not be matched :-(

 Does the slash have a special meaning in the query syntax? Does a
 query
 like 2002/91/AH have a special meaning?

 The '/' characters are serving as token separators. So
 you are searching for (2002 OR 91 OR ah). If you enclose
 the date in double quotes:

 ... MATCH '2002/91/AH'

 you will be searching for the phrase 2002 91 ah, which
 is as close as you can get to what you want without writing
 a custom tokenizer:

 http://www.sqlite.org/fts3.html#section_5_1

Oh great! Thank you for your hint! With double quotes it works as I 
would expect it.

In general writing a custom tokenizer would not be a problem BUT in my 
case it is complicated because the database (including the fulltext 
table) is created on a different machine with SQLites packages from a 
distributor (Debian stable).

The data is queried using a custom C++ application, so on client side I 
may alter the code.

Until now I use the simple tokenizer. At first I thought I can make it 
work if I just remove the slash '/' from the separator list. Therefore I 
added the following line to simpleCreate:

t-delim['/'] = 0;

Now, nothing was found with content MATCH '2002/91/AH'; and first I did 
not understand why.

After your explanation it is clear why: 2002, 91 and AH are separate 
tokens in my database and MATCH '2002/91/AH' would normally search for 
the tokens 2002, 91, AH next to each other. But after the patch above, 
2002/91/AH is a single token which is not found.

So my questions is: Is there any other way to achieve my goal?

Regards,
Luke


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] FTS3 finds too much: Slash special meaning? Something else?

2010-09-03 Thread Lukas Haase
Hi,

I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax 
(SQLITE_ENABLE_FTS3_PARENTHESIS).

Now if I search for a string like '2002/91/AH' there are lots of items 
which do NOT contain this string. This is a query:

SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH';

In my case, there are only 10 items which actually contain the string 
'2002/91/AH' but the query above gives me 162 (!) matches!

I can not find any reason for this. Some of the topics contain similar 
strings like 2002/96/AH or even 94/31/EG. But in fact, these strings 
must not be matched :-(

Does the slash have a special meaning in the query syntax? Does a query 
like 2002/91/AH have a special meaning?

What else could be the reason and is there a way to prevent FTS to find 
this wrong entries?

Thank you very much in advance!

Regards,
Luke

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS3 finds too much: Slash special meaning? Something else?

2010-09-03 Thread Dan Kennedy

On Sep 2, 2010, at 6:37 PM, Lukas Haase wrote:

 Hi,

 I use FTS3 (SQLITE_ENABLE_FTS3) with enhanced query syntax
 (SQLITE_ENABLE_FTS3_PARENTHESIS).

 Now if I search for a string like '2002/91/AH' there are lots of items
 which do NOT contain this string. This is a query:

 SELECT rowid, content FROM fulltext WHERE content MATCH '2002/91/AH';

 In my case, there are only 10 items which actually contain the string
 '2002/91/AH' but the query above gives me 162 (!) matches!

 I can not find any reason for this. Some of the topics contain  
 similar
 strings like 2002/96/AH or even 94/31/EG. But in fact, these strings
 must not be matched :-(

 Does the slash have a special meaning in the query syntax? Does a  
 query
 like 2002/91/AH have a special meaning?

The '/' characters are serving as token separators. So
you are searching for (2002 OR 91 OR ah). If you enclose
the date in double quotes:

   ... MATCH '2002/91/AH'

you will be searching for the phrase 2002 91 ah, which
is as close as you can get to what you want without writing
a custom tokenizer:

   http://www.sqlite.org/fts3.html#section_5_1

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users