Re: [sqlite] FTS slowdown with matchinfo

2011-02-17 Thread Iker Arizmendi
Dan Kennedy wrote:
> On 02/17/2011 05:41 AM, Iker Arizmendi wrote:
>> Dan Kennedy wrote:
>>> Can you make the database available for download? And
>>> supply the exact query you are using too? I'd like to
>>> know why this is. Thanks.
>>>
>>> Dan.
>>>
>> You can find a tarball of the DB file here:
>>
>>  http://www.research.att.com/people/Arizmendi_Iker/geo.db.tgz
>>
>> This query runs in around 1.2 seconds:
>>
>>  SELECT length(content)
>>  FROM locateme
>>  WHERE locateme MATCH 'newark OR new OR brunswick';
>>
>> And this one in around 8.5 minutes:
>>
>>  SELECT length(matchinfo(locateme, 'x'))
>>  FROM locateme
>>  WHERE locateme MATCH 'newark OR new OR brunswick';
>>
> 
> The database uses a custom tokenizer - "stopwords" - so I can't
> run the queries directly. If I dump the data into a regular fts3
> table using the default tokenizer and then run your queries with
> 3.7.5 they both run in pretty much the same amount of time. Both
> much quicker than 1 second on a Linux PC.
> 
> There was a bug causing excessive calls to realloc() fixed a
> little while ago, although from memory I don't think it would
> have hit this case. The symptoms are similar though, so I could
> easily be wrong on that.
> 
> Suggest upgrading to 3.7.5 to see if that clears the problem.
> 
> If you can get this slowdown with 3.7.5 and one of the built in
> tokenizers, please post so I can look again.
> 
> Thanks,
> Dan.

Upgrading to 3.7.5 (from 3.7.4) did the trick.

Thanks again!

Iker





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS slowdown with matchinfo

2011-02-17 Thread Dan Kennedy
On 02/17/2011 05:41 AM, Iker Arizmendi wrote:
> Dan Kennedy wrote:
>>
>> Can you make the database available for download? And
>> supply the exact query you are using too? I'd like to
>> know why this is. Thanks.
>>
>> Dan.
>>
>
> You can find a tarball of the DB file here:
>
>  http://www.research.att.com/people/Arizmendi_Iker/geo.db.tgz
>
> This query runs in around 1.2 seconds:
>
>  SELECT length(content)
>  FROM locateme
>  WHERE locateme MATCH 'newark OR new OR brunswick';
>
> And this one in around 8.5 minutes:
>
>  SELECT length(matchinfo(locateme, 'x'))
>  FROM locateme
>  WHERE locateme MATCH 'newark OR new OR brunswick';
>

The database uses a custom tokenizer - "stopwords" - so I can't
run the queries directly. If I dump the data into a regular fts3
table using the default tokenizer and then run your queries with
3.7.5 they both run in pretty much the same amount of time. Both
much quicker than 1 second on a Linux PC.

There was a bug causing excessive calls to realloc() fixed a
little while ago, although from memory I don't think it would
have hit this case. The symptoms are similar though, so I could
easily be wrong on that.

Suggest upgrading to 3.7.5 to see if that clears the problem.

If you can get this slowdown with 3.7.5 and one of the built in
tokenizers, please post so I can look again.

Thanks,
Dan.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS slowdown with matchinfo

2011-02-16 Thread Iker Arizmendi
Dan Kennedy wrote:
> 
> Can you make the database available for download? And
> supply the exact query you are using too? I'd like to
> know why this is. Thanks.
> 
> Dan.
> 

You can find a tarball of the DB file here:

http://www.research.att.com/people/Arizmendi_Iker/geo.db.tgz

This query runs in around 1.2 seconds:

SELECT length(content)
FROM locateme
WHERE locateme MATCH 'newark OR new OR brunswick';

And this one in around 8.5 minutes:

SELECT length(matchinfo(locateme, 'x'))
FROM locateme
WHERE locateme MATCH 'newark OR new OR brunswick';

Thanks!

Iker

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] FTS slowdown with matchinfo

2011-02-15 Thread Dan Kennedy
On 02/16/2011 12:50 AM, Iker Arizmendi wrote:
> Hello all,
>
> I'm running into what seems like an abnormally large
> performance drop on on some FTS queries that use
> matchinfo when compared to those that don't.
>
> I created an FTS table using the following:
>
>  CREATE VIRTUAL TABLE test
>  USING FTS4(rowid, content);
>
> and then filled it with ~2 million docs of ~10 tokens
> each which resulted in a file of around 275 MB. After
> running "optimize" on the table I issued a query with
> 3 terms like so:
>
>  SELECT length(content) FROM test
>  WHERE MATCH "w1 OR w2 OR w3"
>
> which returned ~164,000 rows in 1.1 seconds. However,
> if I throw in a call to matchinfo:
>
>  SELECT length(matchinfo(test, 'x')) FROM test
>  WHERE MATCH "w1 OR w2 OR w3"
>
> the query takes 7.5 minutes. It seems FTS is getting
> stuck calculating the 2nd and 3rd part of the "x"
> matchinfo data ("hits all rows" and "docs with hits")
> but it's not clear why this should take so long.
> Any ideas on what might be causing the slowdown?

Can you make the database available for download? And
supply the exact query you are using too? I'd like to
know why this is. Thanks.

Dan.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users