Re: [sqlite] speeding up an FTS5 search with a JOIN

Dan Kennedy Mon, 09 Mar 2020 05:04:37 -0700


On 9/3/63 01:44, Puneet Kishor wrote:

Update: so, after much hitting of my head against the wall of sql, I came up 
with the following – as noted above, I really have two distinct set of queries 
I can do separately like so


Q1: (SELECT t1Id FROM t1 WHERE …) AS a

Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b

Then, I can do the following -

SELECT Count(*) FROM a WHERE a.t1Id IN b

Of course, in reality, I don’t do this separately but all in one go to make a 
really messy SQL but a really fast query, a couple of hundred ms as opposed to 
> 25s

I don't see why that would be any different from your join query. Whatdoes EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?


Dan.


You might notice that in my Q2 above I MATCHed for ‘bar’ instead of ‘foo’. That is 
because ‘bar’ returns fewer rows than ‘foo’ does. The problem remains when there 
are too many matches in the FTS query in which case the FTS query itself is slow, 
for example, with ‘foo’ which matches > 80K rows.

Now, one interesting point of comparison – the same kind of query (from the 
user point-of-view) against an ElasticSearch instance (that is, all the rows 
with ‘foo’ anywhere in the text) is very fast, in the order of sub-hundred ms. 
I realize it might be unfair comparing SQLite with ElasticSearch, but still. 
(Or, is it a fair comparison?)

On Mar 7, 2020, at 8:59 AM, P Kishor <punk.k...@gmail.com> wrote:

I asked this question on Stackoverflow with not much success, and a suggestion 
to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other 
cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;

QUERY
PLAN
--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)

sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
308498

Run Time
: real 0.043 user 0.023668 sys 0.009005
```


As can be see above, the actual query takes ~43ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';

QUERY
PLAN
--SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:

sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
80789

Run Time
: real 0.047 user 0.008021 sys 0.009640
```

The actual query, in this case, takes ~47ms. So far so good. But the problem 
occurs when I join the two tables

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';

QUERY
PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
--SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)

sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
80789

Run Time
: real 26.218 user 1.396376 sys 5.413630
```

The answer is correct but the query takes more than 26 seconds! Of course, I 
would like to speed up this query by several orders of magnitude, but I would 
also like to understand why this join is causing the slowdown.

Now, the reason I have constructed a query like this is because users can add 
further constraints for the table t1. For example,

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH 'foo’;
```

Also, in every operation, for every given constraint, two queries are 
performed, one that returns the count and the other that returns the actual 
columns. And, finally, only a subset of the results are returned using LIMIT 
and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the 
last constraint above would result in the following

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY <some t1 col>
   ...> LIMIT 30 OFFSET <some offset calculated by pagination>;
```

When no t1 columns are prescribed in the constraint, the default count (shown 
above) and default cols are returned with the FTS search

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
   ...> snippet(vt1, 1, "<b>", "</b>", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY <some t1 col>
   ...> LIMIT 30 OFFSET 0;
```

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] speeding up an FTS5 search with a JOIN

Reply via email to