Re: [sqlite] speeding up an FTS5 search with a JOIN

2020-03-09 Thread P Kishor
following up on my own response to Dan…

I see what you mean Dan. I am not an expert at reading the query plans, but I 
do see that effectively my new query has the same query plan as the last two 
queries combined as well as the original query. The only difference is the 
order in which it proceeds. So, the original (slow query)

|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)

The two sub-queries separately that run very fast

`--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m

and 
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)

and the new query that also runs very fast using the two sub-queries

QUERY PLAN
|--CO-ROUTINE 1
|  |--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
|  `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
|--SCAN SUBQUERY 1 AS a
`--LIST SUBQUERY 2
  `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m


No idea what is going on.


> On Mar 9, 2020, at 2:08 PM, P Kishor  wrote:
> 
> 
> 
>> On Mar 9, 2020, at 1:04 PM, Dan Kennedy  wrote:
>> 
>> 
>> On 9/3/63 01:44, Puneet Kishor wrote:
>>> Update: so, after much hitting of my head against the wall of sql, I came 
>>> up with the following – as noted above, I really have two distinct set of 
>>> queries I can do separately like so
>>> 
>>> Q1: (SELECT t1Id FROM t1 WHERE …) AS a
>>> 
>>> Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b
>>> 
>>> Then, I can do the following -
>>> 
>>> SELECT Count(*) FROM a WHERE a.t1Id IN b
>>> 
>>> Of course, in reality, I don’t do this separately but all in one go to make 
>>> a really messy SQL but a really fast query, a couple of hundred ms as 
>>> opposed to > 25s
>> 
>> 
>> I don't see why that would be any different from your join query. What does 
>> EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?
> 
> 
> Here it is, the new query made up of two queries derived from the original 
> query (summarize again below). The query itself takes 301ms. 
> 
> ```
> SQLite version 3.30.0 2019-10-04 15:03:17
> Enter ".help" for usage hints.
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) FROM
>   ...> 
>   ...> -- this one is 'QUERY a'
>   ...> (SELECT Count(DISTINCT t1.t1Id) t1Id 
>   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id 
>   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '') a
>   ...>
>   ...> WHERE a.t1Id IN 
>   ...> 
>   ...> -- and this one is 'QUERY b'
>   ...> (SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH "foo");
> QUERY PLAN
> |--CO-ROUTINE 1
> |  |--SEARCH TABLE t2 USING INDEX ix_t2 (deleted=? AND deleted=?)
> |  `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
> |--SCAN SUBQUERY 1 AS a
> `--LIST SUBQUERY 2
>   `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> sqlite>
> ```
> 
> Here is the original query (QUERY zero). This is the one that takes a very 
> long time, 10.611s. Note, in this one I’ve added an additional table ’t2’ but 
> the concept is the same. Normal tables JOINed to each other, and then JOINed 
> to a virtual table.
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(DISTINCT t1.t1Id)
>   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '' AND vt1 MATCH 
> "foo";
> QUERY PLAN
> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> |--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
> `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
> sqlite>
> ```
> 
> Here are the two queries derived from QUERY zero. The first one (QUERY a) 
> takes 324ms
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(DISTINCT t1.t1Id)
>   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id
>   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '';
> QUERY PLAN
> |--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
> `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
> sqlite>
> ```
> 
> The second query (QUERY b) takes: 27ms
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH “foo";
> QUERY PLAN
> `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> sqlite>
> ```
> 
> One more thing: All of this goes to hell if the virtual table returns way too 
> many matches. For example, I have a term that returns 80K rows from the FTS 
> MATCH. In that case, even my new query very slow because, well, because the 
> QUERY b above is slow.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up an FTS5 search with a JOIN

2020-03-09 Thread P Kishor


> On Mar 9, 2020, at 1:04 PM, Dan Kennedy  wrote:
> 
> 
> On 9/3/63 01:44, Puneet Kishor wrote:
>> Update: so, after much hitting of my head against the wall of sql, I came up 
>> with the following – as noted above, I really have two distinct set of 
>> queries I can do separately like so
>> 
>> Q1: (SELECT t1Id FROM t1 WHERE …) AS a
>> 
>> Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b
>> 
>> Then, I can do the following -
>> 
>> SELECT Count(*) FROM a WHERE a.t1Id IN b
>> 
>> Of course, in reality, I don’t do this separately but all in one go to make 
>> a really messy SQL but a really fast query, a couple of hundred ms as 
>> opposed to > 25s
> 
> 
> I don't see why that would be any different from your join query. What does 
> EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?


Here it is, the new query made up of two queries derived from the original 
query (summarize again below). The query itself takes 301ms. 

```
SQLite version 3.30.0 2019-10-04 15:03:17
Enter ".help" for usage hints.
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) FROM
   ...> 
   ...> -- this one is 'QUERY a'
   ...> (SELECT Count(DISTINCT t1.t1Id) t1Id 
   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id 
   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '') a
   ...>
   ...> WHERE a.t1Id IN 
   ...> 
   ...> -- and this one is 'QUERY b'
   ...> (SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH "foo");
QUERY PLAN
|--CO-ROUTINE 1
|  |--SEARCH TABLE t2 USING INDEX ix_t2 (deleted=? AND deleted=?)
|  `--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
|--SCAN SUBQUERY 1 AS a
`--LIST SUBQUERY 2
   `--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
sqlite>
```

Here is the original query (QUERY zero). This is the one that takes a very long 
time, 10.611s. Note, in this one I’ve added an additional table ’t2’ but the 
concept is the same. Normal tables JOINed to each other, and then JOINed to a 
virtual table.

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(DISTINCT t1.t1Id)
   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '' AND vt1 MATCH 
"foo";
QUERY PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
sqlite>
```

Here are the two queries derived from QUERY zero. The first one (QUERY a) takes 
324ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(DISTINCT t1.t1Id)
   ...> FROM t2 JOIN t1 ON t2.t1Id = t1.t1Id
   ...> WHERE t1.deleted = 0 AND t2.deleted = 0 AND scm != '';
QUERY PLAN
|--SEARCH TABLE t2 USING INDEX ix_t2_scm (deleted=? AND deleted=?)
`--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=? AND t1Id=?)
sqlite>
```

The second query (QUERY b) takes: 27ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(DISTINCT t1Id) FROM vt1 WHERE vt1 MATCH “foo";
QUERY PLAN
`--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
sqlite>
```

One more thing: All of this goes to hell if the virtual table returns way too 
many matches. For example, I have a term that returns 80K rows from the FTS 
MATCH. In that case, even my new query very slow because, well, because the 
QUERY b above is slow.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up an FTS5 search with a JOIN

2020-03-09 Thread Dan Kennedy


On 9/3/63 01:44, Puneet Kishor wrote:

Update: so, after much hitting of my head against the wall of sql, I came up 
with the following – as noted above, I really have two distinct set of queries 
I can do separately like so

Q1: (SELECT t1Id FROM t1 WHERE …) AS a

Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b

Then, I can do the following -

SELECT Count(*) FROM a WHERE a.t1Id IN b

Of course, in reality, I don’t do this separately but all in one go to make a 
really messy SQL but a really fast query, a couple of hundred ms as opposed to 
> 25s



I don't see why that would be any different from your join query. What 
does EXPLAIN QUERY PLAN say for the version that takes a few hundred ms?


Dan.





You might notice that in my Q2 above I MATCHed for ‘bar’ instead of ‘foo’. That is 
because ‘bar’ returns fewer rows than ‘foo’ does. The problem remains when there 
are too many matches in the FTS query in which case the FTS query itself is slow, 
for example, with ‘foo’ which matches > 80K rows.

Now, one interesting point of comparison – the same kind of query (from the 
user point-of-view) against an ElasticSearch instance (that is, all the rows 
with ‘foo’ anywhere in the text) is very fast, in the order of sub-hundred ms. 
I realize it might be unfair comparing SQLite with ElasticSearch, but still. 
(Or, is it a fair comparison?)



On Mar 7, 2020, at 8:59 AM, P Kishor  wrote:

I asked this question on Stackoverflow with not much success, and a suggestion 
to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other 
cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;

QUERY
PLAN
--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)

sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
308498

Run Time
: real 0.043 user 0.023668 sys 0.009005
```


As can be see above, the actual query takes ~43ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';

QUERY
PLAN
--SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:

sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
80789

Run Time
: real 0.047 user 0.008021 sys 0.009640
```

The actual query, in this case, takes ~47ms. So far so good. But the problem 
occurs when I join the two tables

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';

QUERY
PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
--SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)

sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
80789

Run Time
: real 26.218 user 1.396376 sys 5.413630
```

The answer is correct but the query takes more than 26 seconds! Of course, I 
would like to speed up this query by several orders of magnitude, but I would 
also like to understand why this join is causing the slowdown.

Now, the reason I have constructed a query like this is because users can add 
further constraints for the table t1. For example,

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH 'foo’;
```

Also, in every operation, for every given constraint, two queries are 
performed, one that returns the count and the other that returns the actual 
columns. And, finally, only a subset of the results are returned using LIMIT 
and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the 
last constraint above would result in the following

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
   ...> snippet(vt1, 1, "", "", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND
   ...> WHERE t1.frob = ‘bar' AND
   ...> WHERE t1.nob = ‘baz' AND
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY 
   ...> LIMIT 30 OFFSET ;
```

When no t1 columns are prescribed in the constraint, the default count (shown 
above) and default cols are returned with the FTS search

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
   ...> snippet(vt1, 1, "", "", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY 
   ...> LIMIT 30 OFFSET 0;
```

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] speeding up an FTS5 search with a JOIN

2020-03-09 Thread P Kishor

Hi Dan,


> On Sat Mar 7 13:32:54 UTC 2020,Dan Kennedy danielk1977 at gmail.com wrote:

>> On 7/3/63 14:58, P Kishor wrote: 

[snipped]

>> The actual query, in this case, takes ~47ms. So far so good. But the problem 
>> occurs when I join the two tables 
>>
>> ``` 
>> sqlite> EXPLAIN QUERY PLAN 
>> ...> SELECT Count(*) as num 
>> ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id 
>> ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo'; 
>>
>> QUERY 
>> PLAN 
>> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m 
>> --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?) 
>>
>> sqlite> SELECT Count(*) as num 
>> ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id 
>> ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo'; 
>> 80789 
>>
>> Run Time 
>> : real 26.218 user 1.396376 sys 5.413630 
>> ``` 

>That does seem slow. Are there many rows in table "t1" with t1.deleted set to 
>something other than 0? 

No, none of them have `deleted != 0`. Explanation: `deleted` is a flag field 
that will track (in the future) when any of the rows are tagged as deleted, no 
longer in use. The idea is to exclude “deleted” rows from searches, but not 
really delete them. As of now, all the rows are active, so the column is set to 
0 (false) for all the rows. In other words,

```
sqlite> SELECT count(*) FROM t1 WHERE deleted = 0;
308498

sqlite> SELECT count(*) FROM t1 WHERE deleted != 0;
0
```

> What does:   SELECT count(*) AS num FROM t1 JOIN vt1 ON t1.t1Id=vt1.t1Id 
> WHERE vt1 MATCH 'foo'; return? Dan. 

sqlite> SELECT count(*) AS num FROM t1 JOIN vt1 ON t1.t1Id=vt1.t1Id WHERE vt1 
MATCH 'foo';
80789


[snipped]

Please also see my follow-up email with an update sent yesterday, Mar 8 at 
18:44:14 UTC 2020

Many thanks.

Note: I had accidentally set the wrong mail-delivery flag on my subscription so 
I didn’t get earlier emails. It has been set right now. There is a possibility 
this email may not be a part of the original thread, in which case, my 
apologies in advance.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up an FTS5 search with a JOIN

2020-03-08 Thread Puneet Kishor
Update: so, after much hitting of my head against the wall of sql, I came up 
with the following – as noted above, I really have two distinct set of queries 
I can do separately like so

Q1: (SELECT t1Id FROM t1 WHERE …) AS a

Q2: (SELECT t1Id FROM vt1 WHERE vt1 MATCH ‘bar’) AS b

Then, I can do the following -

SELECT Count(*) FROM a WHERE a.t1Id IN b

Of course, in reality, I don’t do this separately but all in one go to make a 
really messy SQL but a really fast query, a couple of hundred ms as opposed to 
> 25s

You might notice that in my Q2 above I MATCHed for ‘bar’ instead of ‘foo’. That 
is because ‘bar’ returns fewer rows than ‘foo’ does. The problem remains when 
there are too many matches in the FTS query in which case the FTS query itself 
is slow, for example, with ‘foo’ which matches > 80K rows.

Now, one interesting point of comparison – the same kind of query (from the 
user point-of-view) against an ElasticSearch instance (that is, all the rows 
with ‘foo’ anywhere in the text) is very fast, in the order of sub-hundred ms. 
I realize it might be unfair comparing SQLite with ElasticSearch, but still. 
(Or, is it a fair comparison?)


> On Mar 7, 2020, at 8:59 AM, P Kishor  wrote:
> 
> I asked this question on Stackoverflow with not much success, and a 
> suggestion to ask it on the list. So here I am. I have two tables, t1(id, 
> t1Id, … other cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
> 
> QUERY 
> PLAN
> --SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)
> 
> sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
> 308498
> 
> Run Time
> : real 0.043 user 0.023668 sys 0.009005
> ```
> 
> 
> As can be see above, the actual query takes ~43ms
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
> 
> QUERY 
> PLAN
> --SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:
> 
> sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
> 80789
> 
> Run Time
> : real 0.047 user 0.008021 sys 0.009640
> ```
> 
> The actual query, in this case, takes ~47ms. So far so good. But the problem 
> occurs when I join the two tables
> 
> ```
> sqlite> EXPLAIN QUERY PLAN
>   ...> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
> 
> QUERY 
> PLAN
> |--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
> --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)
> 
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
> 80789
> 
> Run Time
> : real 26.218 user 1.396376 sys 5.413630
> ```
> 
> The answer is correct but the query takes more than 26 seconds! Of course, I 
> would like to speed up this query by several orders of magnitude, but I would 
> also like to understand why this join is causing the slowdown.
> 
> Now, the reason I have constructed a query like this is because users can add 
> further constraints for the table t1. For example, 
> 
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND 
>   ...> WHERE t1.frob = ‘bar' AND 
>   ...> WHERE t1.nob = ‘baz' AND 
>   ...> vt1 MATCH 'foo’;
> ```
> 
> Also, in every operation, for every given constraint, two queries are 
> performed, one that returns the count and the other that returns the actual 
> columns. And, finally, only a subset of the results are returned using LIMIT 
> and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the 
> last constraint above would result in the following
> 
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND 
>   ...> WHERE t1.frob = ‘bar' AND 
>   ...> WHERE t1.nob = ‘baz' AND 
>   ...> vt1 MATCH 'foo’;
> 20367
> 
> sqlite> SELECT t1.id, t1.t1Id, … other cols …, 
>   ...> snippet(vt1, 1, "", "", "", 50) AS context,
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 AND 
>   ...> WHERE t1.frob = ‘bar' AND 
>   ...> WHERE t1.nob = ‘baz' AND 
>   ...> vt1 MATCH ‘foo’
>   ...> ORDER BY 
>   ...> LIMIT 30 OFFSET ;
> ```
> 
> When no t1 columns are prescribed in the constraint, the default count (shown 
> above) and default cols are returned with the FTS search
> 
> ```
> sqlite> SELECT Count(*) as num
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 
>   ...> vt1 MATCH 'foo’;
> 20367
> 
> sqlite> SELECT t1.id, t1.t1Id, … other cols …, 
>   ...> snippet(vt1, 1, "", "", "", 50) AS context,
>   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
>   ...> WHERE t1.deleted = 0 
>   ...> vt1 MATCH ‘foo’
>   ...> ORDER BY 
>   ...> LIMIT 30 OFFSET 0;
> ```
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] speeding up an FTS5 search with a JOIN

2020-03-07 Thread Dan Kennedy


On 7/3/63 14:58, P Kishor wrote:

I asked this question on Stackoverflow with not much success, and a suggestion 
to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other 
cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)

```
sqlite> EXPLAIN QUERY PLAN
...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;

QUERY
PLAN
--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)

sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
308498

Run Time
: real 0.043 user 0.023668 sys 0.009005
```


As can be see above, the actual query takes ~43ms

```
sqlite> EXPLAIN QUERY PLAN
...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';

QUERY
PLAN
--SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:

sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
80789

Run Time
: real 0.047 user 0.008021 sys 0.009640
```

The actual query, in this case, takes ~47ms. So far so good. But the problem 
occurs when I join the two tables

```
sqlite> EXPLAIN QUERY PLAN
...> SELECT Count(*) as num
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';

QUERY
PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
  --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)

sqlite> SELECT Count(*) as num
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
80789



That does seem slow. Are there many rows in table "t1" with t1.deleted 
set to something other than 0? What does:


  SELECT count(*) AS num FROM t1 JOIN vt1 ON t1.t1Id=vt1.t1Id WHERE vt1 
MATCH 'foo';


return?

Dan.






Run Time
: real 26.218 user 1.396376 sys 5.413630
```

The answer is correct but the query takes more than 26 seconds! Of course, I 
would like to speed up this query by several orders of magnitude, but I would 
also like to understand why this join is causing the slowdown.

Now, the reason I have constructed a query like this is because users can add 
further constraints for the table t1. For example,

```
sqlite> SELECT Count(*) as num
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0 AND
...> WHERE t1.frob = ‘bar' AND
...> WHERE t1.nob = ‘baz' AND
...> vt1 MATCH 'foo’;
```

Also, in every operation, for every given constraint, two queries are 
performed, one that returns the count and the other that returns the actual 
columns. And, finally, only a subset of the results are returned using LIMIT 
and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the 
last constraint above would result in the following

```
sqlite> SELECT Count(*) as num
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0 AND
...> WHERE t1.frob = ‘bar' AND
...> WHERE t1.nob = ‘baz' AND
...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
...> snippet(vt1, 1, "", "", "", 50) AS context,
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0 AND
...> WHERE t1.frob = ‘bar' AND
...> WHERE t1.nob = ‘baz' AND
...> vt1 MATCH ‘foo’
...> ORDER BY 
...> LIMIT 30 OFFSET ;
```

When no t1 columns are prescribed in the constraint, the default count (shown 
above) and default cols are returned with the FTS search

```
sqlite> SELECT Count(*) as num
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0
...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …,
...> snippet(vt1, 1, "", "", "", 50) AS context,
...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
...> WHERE t1.deleted = 0
...> vt1 MATCH ‘foo’
...> ORDER BY 
...> LIMIT 30 OFFSET 0;
```
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] speeding up an FTS5 search with a JOIN

2020-03-07 Thread P Kishor
I asked this question on Stackoverflow with not much success, and a suggestion 
to ask it on the list. So here I am. I have two tables, t1(id, t1Id, … other 
cols …, fullText) and a FTS5 virtual table vt1(t1Id, fullText)

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM t1 WHERE deleted = 0;

QUERY 
PLAN
--SEARCH TABLE t1 USING COVERING INDEX ix_t1_t1Id (deleted=?)

sqlite> SELECT Count(*) as num FROM t1 WHERE deleted = 0;
308498

Run Time
: real 0.043 user 0.023668 sys 0.009005
```


As can be see above, the actual query takes ~43ms

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';

QUERY 
PLAN
--SCAN TABLE vt1 VIRTUAL TABLE INDEX 131073:

sqlite> SELECT Count(*) as num FROM vt1 WHERE vt1 MATCH 'foo';
80789

Run Time
: real 0.047 user 0.008021 sys 0.009640
```

The actual query, in this case, takes ~47ms. So far so good. But the problem 
occurs when I join the two tables

```
sqlite> EXPLAIN QUERY PLAN
   ...> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';

QUERY 
PLAN
|--SCAN TABLE vt1 VIRTUAL TABLE INDEX 0:m
 --SEARCH TABLE t1 USING INDEX sqlite_autoindex_t1_1 (t1Id=?)

sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND vt1 MATCH 'foo';
80789

Run Time
: real 26.218 user 1.396376 sys 5.413630
```

The answer is correct but the query takes more than 26 seconds! Of course, I 
would like to speed up this query by several orders of magnitude, but I would 
also like to understand why this join is causing the slowdown.

Now, the reason I have constructed a query like this is because users can add 
further constraints for the table t1. For example, 

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND 
   ...> WHERE t1.frob = ‘bar' AND 
   ...> WHERE t1.nob = ‘baz' AND 
   ...> vt1 MATCH 'foo’;
```

Also, in every operation, for every given constraint, two queries are 
performed, one that returns the count and the other that returns the actual 
columns. And, finally, only a subset of the results are returned using LIMIT 
and OFFSET but *after* a sort ORDER has been prescribed. So, in reality, the 
last constraint above would result in the following

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND 
   ...> WHERE t1.frob = ‘bar' AND 
   ...> WHERE t1.nob = ‘baz' AND 
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …, 
   ...> snippet(vt1, 1, "", "", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 AND 
   ...> WHERE t1.frob = ‘bar' AND 
   ...> WHERE t1.nob = ‘baz' AND 
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY 
   ...> LIMIT 30 OFFSET ;
```

When no t1 columns are prescribed in the constraint, the default count (shown 
above) and default cols are returned with the FTS search

```
sqlite> SELECT Count(*) as num
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 
   ...> vt1 MATCH 'foo’;
20367

sqlite> SELECT t1.id, t1.t1Id, … other cols …, 
   ...> snippet(vt1, 1, "", "", "", 50) AS context,
   ...> FROM t1 JOIN vt1 ON t1.t1Id = vt1.t1Id
   ...> WHERE t1.deleted = 0 
   ...> vt1 MATCH ‘foo’
   ...> ORDER BY 
   ...> LIMIT 30 OFFSET 0;
```
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users