Re: [sqlite] speeding up FTS4

2011-09-29 Thread Mr. Puneet Kishor
Well, defeated by FTS4 for now, I will try the following approach --

1. drop the fts tables and rebuild them and test.

2. if the above doesn't work, then either migrate the data to Postgres and use 
its fts, or implement e-Swish or httpdig for full text search.

On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote:

 
 On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote:
 
 I have no idea if this would work...but...here's some more thoughts...
 
 
 
 #1 How long does this take:
 
select count(*) from fts_uri match 'education school';
 
 
 
 #2 Create a view on uris with just what you need and use that in your join 
 (I'm guessing that uri_content takes up most of your database space).
 
 
 
 create view v_uris as select uri_id,feed_history_id from uri;
 
 
 ..
 
 
 snipped a bunch of stuff
 
 I did a query on just the fts table and got the answers relatively quickly. 
 Not instantly, but very fast compared to all the attempts so far. So, 
 assuming that the bottleneck is the multiple JOINs to get the data for the 
 correct project_id, I created a temp table with all that JOIN nonsense
 
   sqlite CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, 
 u.u_downloaded_on  
  ... FROM projects p 
  ...   JOIN feeds f ON f.project_id = p.project_id 
  ...   JOIN feed_history fh ON f.feed_id = fh.feed_id 
  ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id 
  ... WHERE p.project_id = 3 AND u.u_downloaded_on = 
 p.u_project_start;
   CPU Time: user 16.369556 sys 81.393235
   
 
   sqlite EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, 
 u.u_downloaded_on, 
  ...   Snippet(fts_uri, 'span class=hilite', '/span', 
 'hellip;', -1, 64) snippet
  ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id 
  ... WHERE fts_uri MATCH 'education school'
  ... ORDER BY u.uri_id, u_downloaded_on DESC;
   0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
   0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id (uri_id=?) 
 (~10 rows)
   0|0|0|USE TEMP B-TREE FOR ORDER BY
   CPU Time: user 0.86 sys 0.06
 
 and yet
 
   sqlite SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, 
  ...   Snippet(fts_uri, 'span class=hilite', '/span', 
 'hellip;', -1, 64) snippet
  ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id 
  ... WHERE fts_uri MATCH 'education school'
  ... ORDER BY u.uri_id, u_downloaded_on DESC;
 
   CPU Time: user 21.871541 sys 26.414337
 
 
 A lot better, but simply not usable for a web application.
 
 --
 Puneet Kishor

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-29 Thread Black, Michael (IS)
Why do you think postgres would be any better?

Have you thought about writing your own code to process the tables rather than 
letting the database do all the work?

Load your data into memory and then slog through the uris to winnow out the 
matches?
Probably a LOT faster than letting the db do it.



Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Mr. Puneet Kishor [punk.k...@gmail.com]
Sent: Thursday, September 29, 2011 8:30 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] speeding up FTS4


Well, defeated by FTS4 for now, I will try the following approach --

1. drop the fts tables and rebuild them and test.

2. if the above doesn't work, then either migrate the data to Postgres and use 
its fts, or implement e-Swish or httpdig for full text search.

On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote:


 On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote:

 I have no idea if this would work...but...here's some more thoughts...



 #1 How long does this take:

select count(*) from fts_uri match 'education school';



 #2 Create a view on uris with just what you need and use that in your join 
 (I'm guessing that uri_content takes up most of your database space).



 create view v_uris as select uri_id,feed_history_id from uri;


 ..


 snipped a bunch of stuff

 I did a query on just the fts table and got the answers relatively quickly. 
 Not instantly, but very fast compared to all the attempts so far. So, 
 assuming that the bottleneck is the multiple JOINs to get the data for the 
 correct project_id, I created a temp table with all that JOIN nonsense

sqlite CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, 
 u.u_downloaded_on
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ... WHERE p.project_id = 3 AND u.u_downloaded_on = 
 p.u_project_start;
CPU Time: user 16.369556 sys 81.393235


sqlite EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, 
 u.u_downloaded_on,
   ...   Snippet(fts_uri, 'span class=hilite', '/span', 
 'hellip;', -1, 64) snippet
   ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id
   ... WHERE fts_uri MATCH 'education school'
   ... ORDER BY u.uri_id, u_downloaded_on DESC;
0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id 
 (uri_id=?) (~10 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
CPU Time: user 0.86 sys 0.06

 and yet

sqlite SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on,
   ...   Snippet(fts_uri, 'span class=hilite', '/span', 
 'hellip;', -1, 64) snippet
   ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id
   ... WHERE fts_uri MATCH 'education school'
   ... ORDER BY u.uri_id, u_downloaded_on DESC;

CPU Time: user 21.871541 sys 26.414337


 A lot better, but simply not usable for a web application.

 --
 Puneet Kishor

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-29 Thread Simon Slavin

On 29 Sep 2011, at 2:30pm, Mr. Puneet Kishor wrote:

 Well, defeated by FTS4 for now, I will try the following approach --
 
 1. drop the fts tables and rebuild them and test.
 
 2. if the above doesn't work, then either migrate the data to Postgres and 
 use its fts, or implement e-Swish or httpdig for full text search.

You know, given that SELECT, you may just not need FTS4.  If you have decent 
indexes it should only need to search a few thousand records for the search 
string and you can do that with LIKE.  The entire search should take less than 
five seconds, probably less than one second.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-29 Thread Puneet Kishor

On Sep 29, 2011, at 8:52 AM, Black, Michael (IS) wrote:

 Why do you think postgres would be any better?
 

I don't. That is why I have two options.


 Have you thought about writing your own code to process the tables rather 
 than letting the database do all the work?

Well, that kinda defeats the purpose of having a database... I want the 
database to do all the work (where all has a sliding scale definition).

In any case, I think (again, think... no scientific tests here) that FTS4 
itself may not be the problem. I can see fts searches are slow, but not that 
slow. The problem is throttling the fts searches based on criteria that are 
coming from joined tables.

So, I do have my work cut out for me. I have to go through my SQL code, really 
narrow down the query that is gumming up the works, then try to eliminate that 
or code a work around.

We'll see. The database is not that large when it comes to rows (less than 45K 
rows), but it is definitely gargantuan when it comes to size (because of all 
the nonsense I have stuffed in it). I have to figure out a workaround for that 
as well.


 
 Load your data into memory and then slog through the uris to winnow out the 
 matches?
 Probably a LOT faster than letting the db do it.
 
 
 
 Michael D. Black
 Senior Scientist
 NG Information Systems
 Advanced Analytics Directorate
 
 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Mr. Puneet Kishor [punk.k...@gmail.com]
 Sent: Thursday, September 29, 2011 8:30 AM
 To: General Discussion of SQLite Database
 Subject: EXT :Re: [sqlite] speeding up FTS4
 
 
 Well, defeated by FTS4 for now, I will try the following approach --
 
 1. drop the fts tables and rebuild them and test.
 
 2. if the above doesn't work, then either migrate the data to Postgres and 
 use its fts, or implement e-Swish or httpdig for full text search.
 
 On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote:
 
 
 On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote:
 
 I have no idea if this would work...but...here's some more thoughts...
 
 
 
 #1 How long does this take:
 
   select count(*) from fts_uri match 'education school';
 
 
 
 #2 Create a view on uris with just what you need and use that in your join 
 (I'm guessing that uri_content takes up most of your database space).
 
 
 
 create view v_uris as select uri_id,feed_history_id from uri;
 
 
 ..
 
 
 snipped a bunch of stuff
 
 I did a query on just the fts table and got the answers relatively quickly. 
 Not instantly, but very fast compared to all the attempts so far. So, 
 assuming that the bottleneck is the multiple JOINs to get the data for the 
 correct project_id, I created a temp table with all that JOIN nonsense
 
   sqlite CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, 
 u.u_downloaded_on
  ... FROM projects p
  ...   JOIN feeds f ON f.project_id = p.project_id
  ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
  ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id
  ... WHERE p.project_id = 3 AND u.u_downloaded_on = 
 p.u_project_start;
   CPU Time: user 16.369556 sys 81.393235
 
 
   sqlite EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, 
 u.u_downloaded_on,
  ...   Snippet(fts_uri, 'span class=hilite', '/span', 
 'hellip;', -1, 64) snippet
  ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id
  ... WHERE fts_uri MATCH 'education school'
  ... ORDER BY u.uri_id, u_downloaded_on DESC;
   0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
   0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id 
 (uri_id=?) (~10 rows)
   0|0|0|USE TEMP B-TREE FOR ORDER BY
   CPU Time: user 0.86 sys 0.06
 
 and yet
 
   sqlite SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on,
  ...   Snippet(fts_uri, 'span class=hilite', '/span', 
 'hellip;', -1, 64) snippet
  ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id
  ... WHERE fts_uri MATCH 'education school'
  ... ORDER BY u.uri_id, u_downloaded_on DESC;
 
   CPU Time: user 21.871541 sys 26.414337
 
 
 A lot better, but simply not usable for a web application.
 
 --
 Puneet Kishor
 
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-29 Thread Petite Abeille

On Sep 29, 2011, at 3:30 PM, Mr. Puneet Kishor wrote:

 Well, defeated by FTS4 for now, I will try the following approach --

[didn't follow the thread blow by blow, so apologies if this was already 
covered and dismissed :)]

Before you jump to the deep end...

FTS tables are meant to be accessed by either their rowid or queried with a 
match qualifier. Anything else will be rather slow, as it will result in a full 
table scan. See section 1.4. Simple FTS Queries of the fine manual [1].

Looking at the DDL you posted original, you have:

   CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); 

That uri_id is not helping anything, as you cannot really use it to lookup the 
table (see section 1.3. Populating FTS Tables  [2]) . You should instead drop 
it, and simply set the rowid of your FTS table to the uris.uri_id.

So:

   CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_content);  -- don't bother with 
uri_id here, as it cannot be used for lookups

Now populate fts_uri with the content of  uris, using the same rowid for both:

insert
intofts_uri
(
rowid,
uri_content
)
select  uris.uri_id,
uris. uris
fromuris

Also, it would appear that you are storing your content twice: once in uris. 
uri_content and once again in fts_uri. uri_content. That's once too many.

From uris, you can retrieve your content directly from fts_uri:

select  *
fromuris
join fts_uri
on   fts_uri.rowid = uris.uri_id

No point in storing the data twice as you can retrieve the text verbatim from 
fts_uri without much ado.

And now you can access uris directly from fts_uri as well as they share the 
same rowid.

Now, for your search, decompose the problem:

(1) do the FTS first, assuming it's the most selective part of your query

select  fts_uri.rowid as uri_id,
snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
64) snippet 
fromfts_uri

where   fts_uri.uri_content match 'education,school' 

Is that slow? 

(2) Add joins one by one

select  fts_uri.rowid as uri_id,
snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
64) snippet, 
uris.uri as uri, 
uris.downloaded_on as downloaded_on
fromfts_uri

joinuris
on  uris.uri_id = fts_uri.rowid

where   fts_uri.uri_content match 'education,school' 

Is that slow?

Repeat and rinse :)




[1] http://www.sqlite.org/fts3.html#section_1_4
[2] http://www.sqlite.org/fts3.html#section_1_3
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-29 Thread Puneet Kishor
all of the below is really good advice that I shall follow over this weekend.

Many thanks.


On Sep 29, 2011, at 10:05 AM, Petite Abeille wrote:

 
 On Sep 29, 2011, at 3:30 PM, Mr. Puneet Kishor wrote:
 
 Well, defeated by FTS4 for now, I will try the following approach --
 
 [didn't follow the thread blow by blow, so apologies if this was already 
 covered and dismissed :)]
 
 Before you jump to the deep end...
 
 FTS tables are meant to be accessed by either their rowid or queried with a 
 match qualifier. Anything else will be rather slow, as it will result in a 
 full table scan. See section 1.4. Simple FTS Queries of the fine manual [1].
 
 Looking at the DDL you posted original, you have:
 
   CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); 
 
 That uri_id is not helping anything, as you cannot really use it to lookup 
 the table (see section 1.3. Populating FTS Tables  [2]) . You should 
 instead drop it, and simply set the rowid of your FTS table to the 
 uris.uri_id.
 
 So:
 
   CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_content);  -- don't bother 
 with uri_id here, as it cannot be used for lookups
 
 Now populate fts_uri with the content of  uris, using the same rowid for both:
 
 insert
 into  fts_uri
   (
   rowid,
   uri_content
   )
 selecturis.uri_id,
   uris. uris
 from  uris
 
 Also, it would appear that you are storing your content twice: once in uris. 
 uri_content and once again in fts_uri. uri_content. That's once too many.
 
 From uris, you can retrieve your content directly from fts_uri:
 
 select  *
 fromuris
 join fts_uri
 on   fts_uri.rowid = uris.uri_id
 
 No point in storing the data twice as you can retrieve the text verbatim from 
 fts_uri without much ado.
 
 And now you can access uris directly from fts_uri as well as they share the 
 same rowid.
 
 Now, for your search, decompose the problem:
 
 (1) do the FTS first, assuming it's the most selective part of your query
 
 select  fts_uri.rowid as uri_id,
snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
 64) snippet 
 fromfts_uri
 
 where   fts_uri.uri_content match 'education,school' 
 
 Is that slow? 
 
 (2) Add joins one by one
 
 select  fts_uri.rowid as uri_id,
snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
 64) snippet, 
uris.uri as uri, 
uris.downloaded_on as downloaded_on
 fromfts_uri
 
 joinuris
 on  uris.uri_id = fts_uri.rowid
 
 where   fts_uri.uri_content match 'education,school' 
 
 Is that slow?
 
 Repeat and rinse :)
 
 
 
 
 [1] http://www.sqlite.org/fts3.html#section_1_4
 [2] http://www.sqlite.org/fts3.html#section_1_3
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
P.S.  Your projects table is missing project_start.  So apparently these aren't 
the real create statements you are using.





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Tuesday, September 27, 2011 5:46 PM
To: General Discussion of SQLite Database
Subject: EXT :[sqlite] speeding up FTS4

I have

CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER
);

with 46608 rows

CREATE TABLE feed_history (
feed_history_id INTEGER PRIMARY KEY,
feed_id INTEGER,
scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
);

with 3276 rows

CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id 
INTEGER);

with 79 rows

CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

with 3 rows

CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

The database file is about 27 GB.


The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 
46608 rows

SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
Snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
64) snippet
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND Datetime(u.downloaded_on) = Datetime(p.project_start)
AND fts_uri MATCH 'education,school'
ORDER BY u.uri_id, downloaded_on DESC;


The EXPLAIN QUERY PLAN for the above query tells me

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) 
(~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY


Is there anything I can do to speed this up?

--
Puneet Kishor
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 8:19 AM, Black, Michael (IS) wrote:

 P.S.  Your projects table is missing project_start.  So apparently these 
 aren't the real create statements you are using.
 
 
 
 


Sorry, I think that is the only table from which I snipped off information to 
make the post brief(er). These are the actual statements. The projects table is 

CREATE TABLE projects (
project_id INTEGER PRIMARY KEY, 
project_name TEXT,
website_tags TEXT,
twitter_tags TEXT,
flickr_tags TEXT, 
project_start DATETIME, 
project_en DATETIME, 
project_end DATETIME
);

 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Tuesday, September 27, 2011 5:46 PM
 To: General Discussion of SQLite Database
 Subject: EXT :[sqlite] speeding up FTS4
 
 I have
 
CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER
);
 
 with 46608 rows
 
CREATE TABLE feed_history (
feed_history_id INTEGER PRIMARY KEY,
feed_id INTEGER,
scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
);
 
 with 3276 rows
 
CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id 
 INTEGER);
 
 with 79 rows
 
CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);
 
 with 3 rows
 
CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);
 
 The database file is about 27 GB.
 
 
 The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 
 46608 rows
 
SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
Snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
 64) snippet
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND Datetime(u.downloaded_on) = Datetime(p.project_start)
AND fts_uri MATCH 'education,school'
ORDER BY u.uri_id, downloaded_on DESC;
 
 
 The EXPLAIN QUERY PLAN for the above query tells me
 
0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) 
 (~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
 
 
 Is there anything I can do to speed this up?
 
 --
 Puneet Kishor
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
Sounds like you may just be hitting disk i/o.  Your sys numbers seem to 
indicate that.



How much memory does your machine have?



How much time does each WHERE clause take?

select count(*) from project where project_id = 3;

select count(*) from fts_uri MATCH 'education,school';

select count(*) from project as p,fts_uri as u where DateTime(u.downloaded_on) 
= DateTime(p.project_start);

(I don't think you need Datetime at all...but I doubt it makes a lot of 
differencetest it)...should get the same count.

select count(*) from project as p,uris as u where u.downloaded_on = 
p.project_start;



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Wednesday, September 28, 2011 9:41 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT : speeding up FTS4


On Sep 28, 2011, at 8:02 AM, Black, Michael (IS) wrote:

 Have you done ANALYZE?  That might help.

 Also...try to arrange your joins based on record count (both high-to-low and 
 low-to-high) and see what difference it makes.

 Since you have only one WHERE clause I'm guessing having project_ids as the 
 first join makes sense.




So, I ran ANALYZE. Then, with the query as is, I got

CPU Time: user 24.742481 sys 79.120486

If I flip the WHERE clause to

WHERE Datetime(u.downloaded_on) = Datetime(p.project_start)
AND p.project_id = 3
AND fts_uri MATCH 'education school'

I still get

CPU Time: user 24.726792 sys 79.240780



Yesterday, in desperation, I tried to pare down my fts_uri table which brought 
the file size to 17 GB, but ended up creating a malformed disk image whatever 
the heck that means. Thankfully I had a backup (clever boy).

So, I am now back with a 27 GB file, and a query that takes forever thereby 
locking up my web app.





 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Tuesday, September 27, 2011 5:46 PM
 To: General Discussion of SQLite Database
 Subject: EXT :[sqlite] speeding up FTS4


 I have

CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER
);

 with 46608 rows

CREATE TABLE feed_history (
feed_history_id INTEGER PRIMARY KEY,
feed_id INTEGER,
scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
);

 with 3276 rows

CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id 
 INTEGER);

 with 79 rows

CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

 with 3 rows

CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

 The database file is about 27 GB.


 The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 
 46608 rows

SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on,
Snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
 64) snippet
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND Datetime(u.downloaded_on) = Datetime(p.project_start)
AND fts_uri MATCH 'education,school'
ORDER BY u.uri_id, downloaded_on DESC;


 The EXPLAIN QUERY PLAN for the above query tells me

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) 
 (~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY


 Is there anything I can do to speed this up?

 --
 Puneet Kishor

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
Your change to numeric date/time may not take a long as you think.



drop any indexes on project_start and downloaded_on;



update projects set project_start=julianday(project_start);

update uris set downloaded_on=julianday(downloaded_on);



Recreate indexes.



Modify your code to insert julianday('now','localtime') instead of taking the 
default current_timestamp.



I wasn't really aware before the the datetime functions stored as text all the 
time...that's bad for searches...





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Wednesday, September 28, 2011 9:48 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXT : speeding up FTS4


On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:


 On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:

   WHERE Datetime(u.downloaded_on) = Datetime(p.project_start)

 Why are you doing 'Datetime' here ?  Not only does the conversion take time, 
 but it means you can't usefully index either of those two columns.

 Can you instead store your stamps in a format which is readily sortable ?  
 Either in text form or as julian days.


Could I? Sure, if I had known better. Should I? I would be happy to create a 
new column, convert the values to julian days, and try that, but on a 27 GB db, 
that would take a bit of a while.

But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
really is no such thing as DATETIME value. Internally, it is stored as TEXT 
anyway.



 Simon.
 ___
 sqlite-users mailing list
 sqlite-users@sqlite.org
 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:

 Your change to numeric date/time may not take a long as you think.
 
 
 


Took an hour and a half.

Step 1: Alter all tables with datetime columns, converting those columns to 
integer;

Step 2: Update all tables setting new datetime columns to unixtime

UPDATE table SET new_column = strftime('%s', old_column);

Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);

Step 4: Run the following query

SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on 
FROM fts_uri f 
JOIN uris u ON f.uri_id = u.uri_id 
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id 
JOIN feeds f ON fh.feed_id = f.feed_id 
JOIN projects p ON f.project_id = p.project_id 
WHERE p.project_id = 3
AND u.u_downloaded_on = p.u_project_start 
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;

Terrible time.
 CPU Time: user 27.584849 sys 115.219293

Step 5: EXPLAIN QUERY PLAN (above SELECT query)

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY 
(rowid=?) (~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) 
(~1 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
CPU Time: user 0.99 sys 0.08

Step 6: ANALYZE;

Step 7: Run the above SELECT query again

Terrible time.
 CPU Time: user 27.703538 sys 116.684390



This train is going nowhere. The times are actually worse than they were when I 
was using a non-text column for date time.

By the way, have tried this on two machines -- the previous one was a dual-Xeon 
Xserve with 12 GB RAM. The current machine of choice is the top of the line 
iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive.


 drop any indexes on project_start and downloaded_on;
 
 
 
 update projects set project_start=julianday(project_start);
 
 update uris set downloaded_on=julianday(downloaded_on);
 
 
 
 Recreate indexes.
 
 
 
 Modify your code to insert julianday('now','localtime') instead of taking the 
 default current_timestamp.
 
 
 
 I wasn't really aware before the the datetime functions stored as text all 
 the time...that's bad for searches...
 
 
 
 
 
 Michael D. Black
 
 Senior Scientist
 
 NG Information Systems
 
 Advanced Analytics Directorate
 
 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Wednesday, September 28, 2011 9:48 AM
 To: General Discussion of SQLite Database
 Subject: Re: [sqlite] EXT : speeding up FTS4
 
 
 On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:
 
 
 On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:
 
  WHERE Datetime(u.downloaded_on) = Datetime(p.project_start)
 
 Why are you doing 'Datetime' here ?  Not only does the conversion take time, 
 but it means you can't usefully index either of those two columns.
 
 Can you instead store your stamps in a format which is readily sortable ?  
 Either in text form or as julian days.
 
 
 Could I? Sure, if I had known better. Should I? I would be happy to create a 
 new column, convert the values to julian days, and try that, but on a 27 GB 
 db, that would take a bit of a while.
 
 But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
 really is no such thing as DATETIME value. Internally, it is stored as TEXT 
 anyway.
 
 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Simon Slavin

On 28 Sep 2011, at 6:44pm, Puneet Kishor wrote:

 Step 4: Run the following query
 
   SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on 
   FROM fts_uri f 
   JOIN uris u ON f.uri_id = u.uri_id 
   JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id 
   JOIN feeds f ON fh.feed_id = f.feed_id 
   JOIN projects p ON f.project_id = p.project_id 
   WHERE p.project_id = 3
   AND u.u_downloaded_on = p.u_project_start 
   AND fts_uri MATCH 'education school'
   ORDER BY u.uri_id, u_downloaded_on DESC;
 
 Terrible time.
 CPU Time: user 27.584849 sys 115.219293
 
 Step 5: EXPLAIN QUERY PLAN (above SELECT query)
 
   0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
   0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
   0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY 
 (rowid=?) (~1 rows)
   0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
   0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) 
 (~1 rows)
   0|0|0|USE TEMP B-TREE FOR ORDER BY
   CPU Time: user 0.99 sys 0.08

I'm not familiar with the output of EXPLAIN QUERY PLAN.  'SEARCH' means it's 
using an index ?  And SCAN means it isn't ?  Can anyone suggest a good INDEX or 
two ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
strftime returns a text representation.  So you didn't really change anything.

You need to use juliandays() as I said.



And you want a REAL number...not integer...though SQLite doesn't really care 
what you call it.  It's more for your own reference.



You just added a bunch more strings increasing the size of your database...ergo 
it ran slower.



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Wednesday, September 28, 2011 12:44 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] speeding up FTS4


On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:

 Your change to numeric date/time may not take a long as you think.





Took an hour and a half.

Step 1: Alter all tables with datetime columns, converting those columns to 
integer;

Step 2: Update all tables setting new datetime columns to unixtime

UPDATE table SET new_column = strftime('%s', old_column);

Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);

Step 4: Run the following query

SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND u.u_downloaded_on = p.u_project_start
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;

Terrible time.
 CPU Time: user 27.584849 sys 115.219293

Step 5: EXPLAIN QUERY PLAN (above SELECT query)

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY 
(rowid=?) (~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) 
(~1 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
CPU Time: user 0.99 sys 0.08

Step 6: ANALYZE;

Step 7: Run the above SELECT query again

Terrible time.
 CPU Time: user 27.703538 sys 116.684390



This train is going nowhere. The times are actually worse than they were when I 
was using a non-text column for date time.

By the way, have tried this on two machines -- the previous one was a dual-Xeon 
Xserve with 12 GB RAM. The current machine of choice is the top of the line 
iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive.


 drop any indexes on project_start and downloaded_on;



 update projects set project_start=julianday(project_start);

 update uris set downloaded_on=julianday(downloaded_on);



 Recreate indexes.



 Modify your code to insert julianday('now','localtime') instead of taking the 
 default current_timestamp.



 I wasn't really aware before the the datetime functions stored as text all 
 the time...that's bad for searches...





 Michael D. Black

 Senior Scientist

 NG Information Systems

 Advanced Analytics Directorate



 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Wednesday, September 28, 2011 9:48 AM
 To: General Discussion of SQLite Database
 Subject: Re: [sqlite] EXT : speeding up FTS4


 On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote:


 On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote:

  WHERE Datetime(u.downloaded_on) = Datetime(p.project_start)

 Why are you doing 'Datetime' here ?  Not only does the conversion take time, 
 but it means you can't usefully index either of those two columns.

 Can you instead store your stamps in a format which is readily sortable ?  
 Either in text form or as julian days.


 Could I? Sure, if I had known better. Should I? I would be happy to create a 
 new column, convert the values to julian days, and try that, but on a 27 GB 
 db, that would take a bit of a while.

 But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there 
 really is no such thing as DATETIME value. Internally, it is stored as TEXT 
 anyway.



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Roger Andersson

 On 09/28/11 20:14, Black, Michael (IS) wrote:

strftime returns a text representation.  So you didn't really change anything.

You need to use juliandays() as I said.


And you want a REAL number...not integer...though SQLite doesn't really care 
what you call it.  It's more for your own reference.


Assuming that second resolution is sufficient.

Would
 UPDATE table SET new_column = cast(strftime('%s', old_column) as integer);
make any difference?

/Roger


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:

 strftime returns a text representation.  So you didn't really change anything.
 


That's not true at all. I added u_downloaded_on (u_ for unixtime)

CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY, 
uri TEXT,
uri_html TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER, 
u_downloaded_on INTEGER
);

sqlite SELECT u_downloaded_on FROM uris LIMIT 5;
1306450769
1306450769
1306450770
1306450774
1306450776


 You need to use juliandays() as I said.
 
 
 
 And you want a REAL number...not integer...though SQLite doesn't really care 
 what you call it.  It's more for your own reference.
 
 
 
 You just added a bunch more strings increasing the size of your 
 database...ergo it ran slower.
 
 


If I understand correctly, the *size* of the database should not matter. Or, at 
least not matter as much. Imagine a database with one table with only one row 
but with so much content in that row that it is 27 GB vs. a database with a few 
million rows, each with small amount of content but together totaling 27 GB.

What should matter is using the indexes correctly. In this case, my query plan 
shows that I am hitting all the indexes. But, I think I am getting closer to 
understanding this.

I started backward with

sqlite SELECT p.project_id 
   ... FROM projects p
   ... WHERE p.project_id = 3;
3
CPU Time: user 0.80 sys 0.89

sqlite SELECT f.feed_id, p.project_id 
   ... FROM projects p JOIN feeds f ON f.project_id = p.project_id 
   ... WHERE p.project_id = 3; 
..
CPU Time: user 0.000239 sys 0.000170

SELECT fh.feed_history_id, f.feed_id, p.project_id 
   ... FROM projects p 
   ...   JOIN feeds f ON f.project_id = p.project_id 
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ... WHERE p.project_id = 3;
..
CPU Time: user 0.008491 sys 0.008054

SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id 
   ... FROM projects p 
   ...   JOIN feeds f ON f.project_id = p.project_id 
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id 
   ... WHERE p.project_id = 3;


BOOM! Adding that last table makes my query way too slow.

Taking out the uris table and querying only the fts table

sqlite SELECT uri_id 
   ... FROM fts_uri 
   ... WHERE fts_uri MATCH 'education school';


starts producing the results immediately, but gives me back way too many 
matches. I am now getting all the matches, but I want only the matches for 
project id = 3 or matches throttled by u_downloaded_on (both of which are 
achievable only via a JOIN with uris table).


 
 Michael D. Black
 
 Senior Scientist
 
 NG Information Systems
 
 Advanced Analytics Directorate
 
 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Wednesday, September 28, 2011 12:44 PM
 To: General Discussion of SQLite Database
 Subject: EXT :Re: [sqlite] speeding up FTS4
 
 
 On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:
 
 Your change to numeric date/time may not take a long as you think.
 
 
 
 
 
 Took an hour and a half.
 
 Step 1: Alter all tables with datetime columns, converting those columns to 
 integer;
 
 Step 2: Update all tables setting new datetime columns to unixtime
 
 UPDATE table SET new_column = strftime('%s', old_column);
 
 Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);
 
 Step 4: Run the following query
 
SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND u.u_downloaded_on = p.u_project_start
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;
 
 Terrible time.
 CPU Time: user 27.584849 sys 115.219293
 
 Step 5: EXPLAIN QUERY PLAN (above SELECT query)
 
0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY 
 (rowid=?) (~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) 
 (~1 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
CPU Time: user 0.99 sys 0.08
 
 Step 6: ANALYZE;
 
 Step 7: Run the above SELECT query again
 
 Terrible time.
 CPU Time: user 27.703538 sys 116.684390
 
 
 
 This train is going nowhere. The times are actually worse than they were when 
 I was using a non-text column for date time

Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
'scuse meI was wrong (again)...I guess strftime does return an 
integerseems to me that belies the name as it's a mismatch to the unix 
function.



select strftime('%s','now');

1317236583



But I think you may want:

strftime('%s','now','unixepoch','localtime');



That works with the default datetime() then when you select it.

sqlite select datetime(strftime('%s','now'),'unixepoch','localtime');
2011-09-28 14:05:09



So...given that you actually stuck integers into your table I'm a bit surprised 
that it took longer.





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Roger Andersson [r...@telia.com]
Sent: Wednesday, September 28, 2011 1:52 PM
To: sqlite-users@sqlite.org
Subject: EXT :Re: [sqlite] speeding up FTS4

  On 09/28/11 20:14, Black, Michael (IS) wrote:
 strftime returns a text representation.  So you didn't really change anything.

 You need to use juliandays() as I said.


 And you want a REAL number...not integer...though SQLite doesn't really care 
 what you call it.  It's more for your own reference.

Assuming that second resolution is sufficient.

Would
  UPDATE table SET new_column = cast(strftime('%s', old_column) as integer);
make any difference?

/Roger


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Roger Andersson

 On 09/28/11 21:10, Black, Michael (IS) wrote:


'scuse meI was wrong (again)...I guess strftime does return an 
integerseems to me that belies the name as it's a mismatch to the 
unix function.




?
SQLite version 3.7.8 2011-09-19 14:49:19
Enter .help for instructions
Enter SQL statements terminated with a ;
sqlite select typeof(strftime('%s','now','localtime'));
text

/Roger
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Petite Abeille

On Sep 28, 2011, at 9:00 PM, Puneet Kishor wrote:

 If I understand correctly, the *size* of the database should not matter. Or, 
 at least not matter as much. 

So she said. But contrary to popular believe, size does matter.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
What happens if you create an index on uris(feed_history_id)





Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Wednesday, September 28, 2011 2:00 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] speeding up FTS4


On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:

 strftime returns a text representation.  So you didn't really change anything.



That's not true at all. I added u_downloaded_on (u_ for unixtime)

CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_html TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER,
u_downloaded_on INTEGER
);

sqlite SELECT u_downloaded_on FROM uris LIMIT 5;
1306450769
1306450769
1306450770
1306450774
1306450776


 You need to use juliandays() as I said.



 And you want a REAL number...not integer...though SQLite doesn't really care 
 what you call it.  It's more for your own reference.



 You just added a bunch more strings increasing the size of your 
 database...ergo it ran slower.




If I understand correctly, the *size* of the database should not matter. Or, at 
least not matter as much. Imagine a database with one table with only one row 
but with so much content in that row that it is 27 GB vs. a database with a few 
million rows, each with small amount of content but together totaling 27 GB.

What should matter is using the indexes correctly. In this case, my query plan 
shows that I am hitting all the indexes. But, I think I am getting closer to 
understanding this.

I started backward with

sqlite SELECT p.project_id
   ... FROM projects p
   ... WHERE p.project_id = 3;
3
CPU Time: user 0.80 sys 0.89

sqlite SELECT f.feed_id, p.project_id
   ... FROM projects p JOIN feeds f ON f.project_id = p.project_id
   ... WHERE p.project_id = 3;
..
CPU Time: user 0.000239 sys 0.000170

SELECT fh.feed_history_id, f.feed_id, p.project_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ... WHERE p.project_id = 3;
..
CPU Time: user 0.008491 sys 0.008054

SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ... WHERE p.project_id = 3;


BOOM! Adding that last table makes my query way too slow.

Taking out the uris table and querying only the fts table

sqlite SELECT uri_id
   ... FROM fts_uri
   ... WHERE fts_uri MATCH 'education school';


starts producing the results immediately, but gives me back way too many 
matches. I am now getting all the matches, but I want only the matches for 
project id = 3 or matches throttled by u_downloaded_on (both of which are 
achievable only via a JOIN with uris table).



 Michael D. Black

 Senior Scientist

 NG Information Systems

 Advanced Analytics Directorate



 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Wednesday, September 28, 2011 12:44 PM
 To: General Discussion of SQLite Database
 Subject: EXT :Re: [sqlite] speeding up FTS4


 On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote:

 Your change to numeric date/time may not take a long as you think.





 Took an hour and a half.

 Step 1: Alter all tables with datetime columns, converting those columns to 
 integer;

 Step 2: Update all tables setting new datetime columns to unixtime

 UPDATE table SET new_column = strftime('%s', old_column);

 Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on);

 Step 4: Run the following query

SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND u.u_downloaded_on = p.u_project_start
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;

 Terrible time.
 CPU Time: user 27.584849 sys 115.219293

 Step 5: EXPLAIN QUERY PLAN (above SELECT query)

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY 
 (rowid=?) (~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 
 rows)
0|4|4|SEARCH TABLE projects AS p USING

Re: [sqlite] speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote:

 What happens if you create an index on uris(feed_history_id)
 
 
 

Yeah, I noticed that lacking as well.


sqlite EXPLAIN QUERY PLAN SELECT u.uri_id  
   ... FROM projects p 
   ...   JOIN feeds f ON f.project_id = p.project_id 
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id 
   ... WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|3|SCAN TABLE uris AS u (~46608 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
CPU Time: user 0.79 sys 0.14



sqlite CREATE INDEX uris_feed_history_id ON uris (feed_history_id);
CPU Time: user 12.766977 sys 82.766372


sqlite EXPLAIN QUERY PLAN SELECT u.uri_id  
   ... FROM projects p 
   ...   JOIN feeds f ON f.project_id = p.project_id 
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id 
   ... WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|1|SCAN TABLE feeds AS f (~7 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX 
(feed_id=?) (~5 rows)
0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id 
(feed_history_id=?) (~10 rows)
CPU Time: user 0.000123 sys 0.15


But, no joy.


SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on 
FROM fts_uri f 
JOIN uris u ON f.uri_id = u.uri_id 
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id 
JOIN feeds f ON fh.feed_id = f.feed_id 
JOIN projects p ON f.project_id = p.project_id 
WHERE p.project_id = 3
AND u.u_downloaded_on = p.u_project_start 
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;
..
CPU Time: user 28.599581 sys 108.518205




 
 
 
 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Wednesday, September 28, 2011 2:00 PM
 To: General Discussion of SQLite Database
 Subject: EXT :Re: [sqlite] speeding up FTS4
 
 
 On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:
 
 strftime returns a text representation.  So you didn't really change 
 anything.
 
 
 
 That's not true at all. I added u_downloaded_on (u_ for unixtime)
 
 CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_html TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER,
u_downloaded_on INTEGER
 );
 
 sqlite SELECT u_downloaded_on FROM uris LIMIT 5;
 1306450769
 1306450769
 1306450770
 1306450774
 1306450776
 
 
 You need to use juliandays() as I said.
 
 
 
 And you want a REAL number...not integer...though SQLite doesn't really care 
 what you call it.  It's more for your own reference.
 
 
 
 You just added a bunch more strings increasing the size of your 
 database...ergo it ran slower.
 
 
 
 
 If I understand correctly, the *size* of the database should not matter. Or, 
 at least not matter as much. Imagine a database with one table with only one 
 row but with so much content in that row that it is 27 GB vs. a database with 
 a few million rows, each with small amount of content but together totaling 
 27 GB.
 
 What should matter is using the indexes correctly. In this case, my query 
 plan shows that I am hitting all the indexes. But, I think I am getting 
 closer to understanding this.
 
 I started backward with
 
 sqlite SELECT p.project_id
   ... FROM projects p
   ... WHERE p.project_id = 3;
 3
 CPU Time: user 0.80 sys 0.89
 
 sqlite SELECT f.feed_id, p.project_id
   ... FROM projects p JOIN feeds f ON f.project_id = p.project_id
   ... WHERE p.project_id = 3;
 ..
 CPU Time: user 0.000239 sys 0.000170
 
 SELECT fh.feed_history_id, f.feed_id, p.project_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ... WHERE p.project_id = 3;
 ..
 CPU Time: user 0.008491 sys 0.008054
 
 SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ... WHERE p.project_id = 3;
 
 
 BOOM! Adding that last table makes my query way too slow.
 
 Taking out the uris table and querying only the fts table
 
 sqlite SELECT uri_id
   ... FROM fts_uri
   ... WHERE fts_uri MATCH 'education school';
 
 
 starts producing the results immediately, but gives me back way too many 
 matches. I am now getting all the matches, but I want only the matches for 
 project id = 3 or matches throttled by u_downloaded_on (both of which are 
 achievable only via a JOIN

Re: [sqlite] speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 2:26 PM, Roger Andersson wrote:

 On 09/28/11 21:10, Black, Michael (IS) wrote:
 
 'scuse meI was wrong (again)...I guess strftime does return an 
 integerseems to me that belies the name as it's a mismatch to the unix 
 function.
 
 
 ?
 SQLite version 3.7.8 2011-09-19 14:49:19
 Enter .help for instructions
 Enter SQL statements terminated with a ;
 sqlite select typeof(strftime('%s','now','localtime'));
 text
 

Perhaps, but I have inserted that in my table where the column is INTEGER.

sqlite SELECT typeof(u_downloaded_on) FROM uris LIMIT 1;
integer

--
Puneet Kishor

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Roger Andersson

 On 09/28/11 21:55, Puneet Kishor wrote:

Perhaps, but I have inserted that in my table where the column is INTEGER.
sqlite  SELECT typeof(u_downloaded_on) FROM uris LIMIT 1;
integer

--

OK!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up FTS4

2011-09-28 Thread Black, Michael (IS)
I have no idea if this would work...but...here's some more thoughts...



#1 How long does this take:

 select count(*) from fts_uri match 'education school';



#2 Create a view on uris with just what you need and use that in your join (I'm 
guessing that uri_content takes up most of your database space).



create view v_uris as select uri_id,feed_history_id from uri;



#3 Move your uri_content to another database and attach it.  Again I'm assuming 
that's your big data field.  That will seperate the paging needed to walk 
through uris.







Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Puneet Kishor [punk.k...@gmail.com]
Sent: Wednesday, September 28, 2011 2:54 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] speeding up FTS4


On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote:

 What happens if you create an index on uris(feed_history_id)




Yeah, I noticed that lacking as well.


sqlite EXPLAIN QUERY PLAN SELECT u.uri_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ... WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|3|SCAN TABLE uris AS u (~46608 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
CPU Time: user 0.79 sys 0.14



sqlite CREATE INDEX uris_feed_history_id ON uris (feed_history_id);
CPU Time: user 12.766977 sys 82.766372


sqlite EXPLAIN QUERY PLAN SELECT u.uri_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id
   ... WHERE p.project_id = 3;
0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|1|1|SCAN TABLE feeds AS f (~7 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX 
(feed_id=?) (~5 rows)
0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id 
(feed_history_id=?) (~10 rows)
CPU Time: user 0.000123 sys 0.15


But, no joy.


SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on
FROM fts_uri f
JOIN uris u ON f.uri_id = u.uri_id
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id
JOIN feeds f ON fh.feed_id = f.feed_id
JOIN projects p ON f.project_id = p.project_id
WHERE p.project_id = 3
AND u.u_downloaded_on = p.u_project_start
AND fts_uri MATCH 'education school'
ORDER BY u.uri_id, u_downloaded_on DESC;
..
CPU Time: user 28.599581 sys 108.518205







 
 From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
 behalf of Puneet Kishor [punk.k...@gmail.com]
 Sent: Wednesday, September 28, 2011 2:00 PM
 To: General Discussion of SQLite Database
 Subject: EXT :Re: [sqlite] speeding up FTS4


 On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote:

 strftime returns a text representation.  So you didn't really change 
 anything.



 That's not true at all. I added u_downloaded_on (u_ for unixtime)

 CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY,
uri TEXT,
uri_html TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER,
u_downloaded_on INTEGER
 );

 sqlite SELECT u_downloaded_on FROM uris LIMIT 5;
 1306450769
 1306450769
 1306450770
 1306450774
 1306450776


 You need to use juliandays() as I said.



 And you want a REAL number...not integer...though SQLite doesn't really care 
 what you call it.  It's more for your own reference.



 You just added a bunch more strings increasing the size of your 
 database...ergo it ran slower.




 If I understand correctly, the *size* of the database should not matter. Or, 
 at least not matter as much. Imagine a database with one table with only one 
 row but with so much content in that row that it is 27 GB vs. a database with 
 a few million rows, each with small amount of content but together totaling 
 27 GB.

 What should matter is using the indexes correctly. In this case, my query 
 plan shows that I am hitting all the indexes. But, I think I am getting 
 closer to understanding this.

 I started backward with

 sqlite SELECT p.project_id
   ... FROM projects p
   ... WHERE p.project_id = 3;
 3
 CPU Time: user 0.80 sys 0.89

 sqlite SELECT f.feed_id, p.project_id
   ... FROM projects p JOIN feeds f ON f.project_id = p.project_id
   ... WHERE p.project_id = 3;
 ..
 CPU Time: user 0.000239 sys 0.000170

 SELECT fh.feed_history_id, f.feed_id, p.project_id
   ... FROM projects p
   ...   JOIN feeds f ON f.project_id = p.project_id

Re: [sqlite] speeding up FTS4

2011-09-28 Thread Puneet Kishor

On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote:

 I have no idea if this would work...but...here's some more thoughts...
 
 
 
 #1 How long does this take:
 
 select count(*) from fts_uri match 'education school';
 
 
 
 #2 Create a view on uris with just what you need and use that in your join 
 (I'm guessing that uri_content takes up most of your database space).
 
 
 
 create view v_uris as select uri_id,feed_history_id from uri;
 
 
 ..


snipped a bunch of stuff

I did a query on just the fts table and got the answers relatively quickly. Not 
instantly, but very fast compared to all the attempts so far. So, assuming that 
the bottleneck is the multiple JOINs to get the data for the correct 
project_id, I created a temp table with all that JOIN nonsense

sqlite CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, 
u.u_downloaded_on  
   ... FROM projects p 
   ...   JOIN feeds f ON f.project_id = p.project_id 
   ...   JOIN feed_history fh ON f.feed_id = fh.feed_id 
   ...   JOIN uris u ON fh.feed_history_id = u.feed_history_id 
   ... WHERE p.project_id = 3 AND u.u_downloaded_on = 
p.u_project_start;
CPU Time: user 16.369556 sys 81.393235


sqlite EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, 
u.u_downloaded_on, 
   ...   Snippet(fts_uri, 'span class=hilite', '/span', 
'hellip;', -1, 64) snippet
   ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id 
   ... WHERE fts_uri MATCH 'education school'
   ... ORDER BY u.uri_id, u_downloaded_on DESC;
0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id (uri_id=?) 
(~10 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY
CPU Time: user 0.86 sys 0.06

and yet

sqlite SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, 
   ...   Snippet(fts_uri, 'span class=hilite', '/span', 
'hellip;', -1, 64) snippet
   ... FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id 
   ... WHERE fts_uri MATCH 'education school'
   ... ORDER BY u.uri_id, u_downloaded_on DESC;

CPU Time: user 21.871541 sys 26.414337


A lot better, but simply not usable for a web application.

--
Puneet Kishor
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] speeding up FTS4

2011-09-27 Thread Puneet Kishor
I have 

CREATE TABLE uris (
uri_id INTEGER PRIMARY KEY, 
uri TEXT,
uri_content TEXT,
downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP,
feed_history_id INTEGER
);

with 46608 rows

CREATE TABLE feed_history (
feed_history_id INTEGER PRIMARY KEY,
feed_id INTEGER,
scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP
);

with 3276 rows

CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id 
INTEGER);

with 79 rows

CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT);

with 3 rows

CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on);
CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content);

The database file is about 27 GB.


The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 
46608 rows

SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, 
Snippet(fts_uri, 'span class=hilite', '/span', 'hellip;', -1, 
64) snippet 
FROM fts_uri f 
JOIN uris u ON f.uri_id = u.uri_id 
JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id 
JOIN feeds f ON fh.feed_id = f.feed_id 
JOIN projects p ON f.project_id = p.project_id 
WHERE p.project_id = 3
AND Datetime(u.downloaded_on) = Datetime(p.project_start) 
AND fts_uri MATCH 'education,school' 
ORDER BY u.uri_id, downloaded_on DESC;


The EXPLAIN QUERY PLAN for the above query tells me

0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows)
0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) 
(~1 rows)
0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 
rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY


Is there anything I can do to speed this up?

--
Puneet Kishor
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users