Re: [sqlite] speeding up FTS4
all of the below is really good advice that I shall follow over this weekend. Many thanks. On Sep 29, 2011, at 10:05 AM, Petite Abeille wrote: > > On Sep 29, 2011, at 3:30 PM, Mr. Puneet Kishor wrote: > >> Well, defeated by FTS4 for now, I will try the following approach -- > > [didn't follow the thread blow by blow, so apologies if this was already > covered and dismissed :)] > > Before you jump to the deep end... > > FTS tables are meant to be accessed by either their rowid or queried with a > match qualifier. Anything else will be rather slow, as it will result in a > full table scan. See section 1.4. "Simple FTS Queries" of the fine manual [1]. > > Looking at the DDL you posted original, you have: > > CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); > > That uri_id is not helping anything, as you cannot really use it to lookup > the table (see section "1.3. Populating FTS Tables" [2]) . You should > instead drop it, and simply set the rowid of your FTS table to the > uris.uri_id. > > So: > > CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_content); -- don't bother > with uri_id here, as it cannot be used for lookups > > Now populate fts_uri with the content of uris, using the same rowid for both: > > insert > into fts_uri > ( > rowid, > uri_content > ) > selecturis.uri_id, > uris. uris > from uris > > Also, it would appear that you are storing your content twice: once in uris. > uri_content and once again in fts_uri. uri_content. That's once too many. > > From uris, you can retrieve your content directly from fts_uri: > > select * > fromuris > join fts_uri > on fts_uri.rowid = uris.uri_id > > No point in storing the data twice as you can retrieve the text verbatim from > fts_uri without much ado. > > And now you can access uris directly from fts_uri as well as they share the > same rowid. > > Now, for your search, decompose the problem: > > (1) do the FTS first, assuming it's the most selective part of your query > > select fts_uri.rowid as uri_id, >snippet(fts_uri, '', '', '…', -1, > 64) snippet > fromfts_uri > > where fts_uri.uri_content match 'education,school' > > Is that slow? > > (2) Add joins one by one > > select fts_uri.rowid as uri_id, >snippet(fts_uri, '', '', '…', -1, > 64) snippet, >uris.uri as uri, >uris.downloaded_on as downloaded_on > fromfts_uri > > joinuris > on uris.uri_id = fts_uri.rowid > > where fts_uri.uri_content match 'education,school' > > Is that slow? > > Repeat and rinse :) > > > > > [1] http://www.sqlite.org/fts3.html#section_1_4 > [2] http://www.sqlite.org/fts3.html#section_1_3 > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 29, 2011, at 3:30 PM, Mr. Puneet Kishor wrote: > Well, defeated by FTS4 for now, I will try the following approach -- [didn't follow the thread blow by blow, so apologies if this was already covered and dismissed :)] Before you jump to the deep end... FTS tables are meant to be accessed by either their rowid or queried with a match qualifier. Anything else will be rather slow, as it will result in a full table scan. See section 1.4. "Simple FTS Queries" of the fine manual [1]. Looking at the DDL you posted original, you have: CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); That uri_id is not helping anything, as you cannot really use it to lookup the table (see section "1.3. Populating FTS Tables" [2]) . You should instead drop it, and simply set the rowid of your FTS table to the uris.uri_id. So: CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_content); -- don't bother with uri_id here, as it cannot be used for lookups Now populate fts_uri with the content of uris, using the same rowid for both: insert intofts_uri ( rowid, uri_content ) select uris.uri_id, uris. uris fromuris Also, it would appear that you are storing your content twice: once in uris. uri_content and once again in fts_uri. uri_content. That's once too many. >From uris, you can retrieve your content directly from fts_uri: select * fromuris join fts_uri on fts_uri.rowid = uris.uri_id No point in storing the data twice as you can retrieve the text verbatim from fts_uri without much ado. And now you can access uris directly from fts_uri as well as they share the same rowid. Now, for your search, decompose the problem: (1) do the FTS first, assuming it's the most selective part of your query select fts_uri.rowid as uri_id, snippet(fts_uri, '', '', '…', -1, 64) snippet fromfts_uri where fts_uri.uri_content match 'education,school' Is that slow? (2) Add joins one by one select fts_uri.rowid as uri_id, snippet(fts_uri, '', '', '…', -1, 64) snippet, uris.uri as uri, uris.downloaded_on as downloaded_on fromfts_uri joinuris on uris.uri_id = fts_uri.rowid where fts_uri.uri_content match 'education,school' Is that slow? Repeat and rinse :) [1] http://www.sqlite.org/fts3.html#section_1_4 [2] http://www.sqlite.org/fts3.html#section_1_3 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 29, 2011, at 8:52 AM, Black, Michael (IS) wrote: > Why do you think postgres would be any better? > I don't. That is why I have two options. > Have you thought about writing your own code to process the tables rather > than letting the database do all the work? Well, that kinda defeats the purpose of having a database... I want the database to do all the work (where "all" has a sliding scale definition). In any case, I think (again, "think"... no scientific tests here) that FTS4 itself may not be the problem. I can see fts searches are slow, but not that slow. The problem is throttling the fts searches based on criteria that are coming from joined tables. So, I do have my work cut out for me. I have to go through my SQL code, really narrow down the query that is gumming up the works, then try to eliminate that or code a work around. We'll see. The database is not that large when it comes to rows (less than 45K rows), but it is definitely gargantuan when it comes to size (because of all the nonsense I have stuffed in it). I have to figure out a workaround for that as well. > > Load your data into memory and then slog through the uris to winnow out the > matches? > Probably a LOT faster than letting the db do it. > > > > Michael D. Black > Senior Scientist > NG Information Systems > Advanced Analytics Directorate > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Mr. Puneet Kishor [punk.k...@gmail.com] > Sent: Thursday, September 29, 2011 8:30 AM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > Well, defeated by FTS4 for now, I will try the following approach -- > > 1. drop the fts tables and rebuild them and test. > > 2. if the above doesn't work, then either migrate the data to Postgres and > use its fts, or implement e-Swish or httpdig for full text search. > > On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote: > >> >> On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote: >> >>> I have no idea if this would work...but...here's some more thoughts... >>> >>> >>> >>> #1 How long does this take: >>> >>> select count(*) from fts_uri match 'education school'; >>> >>> >>> >>> #2 Create a view on uris with just what you need and use that in your join >>> (I'm guessing that uri_content takes up most of your database space). >>> >>> >>> >>> create view v_uris as select uri_id,feed_history_id from uri; >>> >>> >>> .. >> >> >> >> >> I did a query on just the fts table and got the answers relatively quickly. >> Not instantly, but very fast compared to all the attempts so far. So, >> assuming that the bottleneck is the multiple JOINs to get the data for the >> correct project_id, I created a temp table with all that JOIN nonsense >> >> sqlite> CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, >> u.u_downloaded_on >> ...> FROM projects p >> ...> JOIN feeds f ON f.project_id = p.project_id >> ...> JOIN feed_history fh ON f.feed_id = fh.feed_id >> ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id >> ...> WHERE p.project_id = 3 AND u.u_downloaded_on >= >> p.u_project_start; >> CPU Time: user 16.369556 sys 81.393235 >> >> >> sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, >> u.u_downloaded_on, >> ...> Snippet(fts_uri, '', '', >> '…', -1, 64) snippet >> ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id >> ...> WHERE fts_uri MATCH 'education school' >> ...> ORDER BY u.uri_id, u_downloaded_on DESC; >> 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >> 0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id >> (uri_id=?) (~10 rows) >> 0|0|0|USE TEMP B-TREE FOR ORDER BY >> CPU Time: user 0.86 sys 0.06 >> >> and yet >> >> sqlite> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, >> ...> Snippet(fts_uri, '', '', >> '…', -1, 64) snippet >> ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id >> ...> WHERE fts_uri MATCH 'education school' >> ...> ORDER BY u.uri_id, u_downloaded_on DESC; >> >> CPU Time: user 21.871541 sys 26.414337 >> >> >> A lot better, but simply not usable for a web application. >> >> -- >> Puneet Kishor > > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On 29 Sep 2011, at 2:30pm, Mr. Puneet Kishor wrote: > Well, defeated by FTS4 for now, I will try the following approach -- > > 1. drop the fts tables and rebuild them and test. > > 2. if the above doesn't work, then either migrate the data to Postgres and > use its fts, or implement e-Swish or httpdig for full text search. You know, given that SELECT, you may just not need FTS4. If you have decent indexes it should only need to search a few thousand records for the search string and you can do that with LIKE. The entire search should take less than five seconds, probably less than one second. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
Why do you think postgres would be any better? Have you thought about writing your own code to process the tables rather than letting the database do all the work? Load your data into memory and then slog through the uris to winnow out the matches? Probably a LOT faster than letting the db do it. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Mr. Puneet Kishor [punk.k...@gmail.com] Sent: Thursday, September 29, 2011 8:30 AM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] speeding up FTS4 Well, defeated by FTS4 for now, I will try the following approach -- 1. drop the fts tables and rebuild them and test. 2. if the above doesn't work, then either migrate the data to Postgres and use its fts, or implement e-Swish or httpdig for full text search. On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote: > > On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote: > >> I have no idea if this would work...but...here's some more thoughts... >> >> >> >> #1 How long does this take: >> >>select count(*) from fts_uri match 'education school'; >> >> >> >> #2 Create a view on uris with just what you need and use that in your join >> (I'm guessing that uri_content takes up most of your database space). >> >> >> >> create view v_uris as select uri_id,feed_history_id from uri; >> >> >> .. > > > > > I did a query on just the fts table and got the answers relatively quickly. > Not instantly, but very fast compared to all the attempts so far. So, > assuming that the bottleneck is the multiple JOINs to get the data for the > correct project_id, I created a temp table with all that JOIN nonsense > >sqlite> CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, > u.u_downloaded_on > ...> FROM projects p > ...> JOIN feeds f ON f.project_id = p.project_id > ...> JOIN feed_history fh ON f.feed_id = fh.feed_id > ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id > ...> WHERE p.project_id = 3 AND u.u_downloaded_on >= > p.u_project_start; >CPU Time: user 16.369556 sys 81.393235 > > >sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, > u.u_downloaded_on, > ...> Snippet(fts_uri, '', '', > '…', -1, 64) snippet > ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id > ...> WHERE fts_uri MATCH 'education school' > ...> ORDER BY u.uri_id, u_downloaded_on DESC; >0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id > (uri_id=?) (~10 rows) >0|0|0|USE TEMP B-TREE FOR ORDER BY >CPU Time: user 0.86 sys 0.06 > > and yet > >sqlite> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, > ...> Snippet(fts_uri, '', '', > '…', -1, 64) snippet > ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id > ...> WHERE fts_uri MATCH 'education school' > ...> ORDER BY u.uri_id, u_downloaded_on DESC; > >CPU Time: user 21.871541 sys 26.414337 > > > A lot better, but simply not usable for a web application. > > -- > Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
Well, defeated by FTS4 for now, I will try the following approach -- 1. drop the fts tables and rebuild them and test. 2. if the above doesn't work, then either migrate the data to Postgres and use its fts, or implement e-Swish or httpdig for full text search. On Sep 28, 2011, at 4:35 PM, Puneet Kishor wrote: > > On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote: > >> I have no idea if this would work...but...here's some more thoughts... >> >> >> >> #1 How long does this take: >> >>select count(*) from fts_uri match 'education school'; >> >> >> >> #2 Create a view on uris with just what you need and use that in your join >> (I'm guessing that uri_content takes up most of your database space). >> >> >> >> create view v_uris as select uri_id,feed_history_id from uri; >> >> >> .. > > > > > I did a query on just the fts table and got the answers relatively quickly. > Not instantly, but very fast compared to all the attempts so far. So, > assuming that the bottleneck is the multiple JOINs to get the data for the > correct project_id, I created a temp table with all that JOIN nonsense > > sqlite> CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, > u.u_downloaded_on > ...> FROM projects p > ...> JOIN feeds f ON f.project_id = p.project_id > ...> JOIN feed_history fh ON f.feed_id = fh.feed_id > ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id > ...> WHERE p.project_id = 3 AND u.u_downloaded_on >= > p.u_project_start; > CPU Time: user 16.369556 sys 81.393235 > > > sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, > u.u_downloaded_on, > ...> Snippet(fts_uri, '', '', > '…', -1, 64) snippet > ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id > ...> WHERE fts_uri MATCH 'education school' > ...> ORDER BY u.uri_id, u_downloaded_on DESC; > 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) > 0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id (uri_id=?) > (~10 rows) > 0|0|0|USE TEMP B-TREE FOR ORDER BY > CPU Time: user 0.86 sys 0.06 > > and yet > > sqlite> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, > ...> Snippet(fts_uri, '', '', > '…', -1, 64) snippet > ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id > ...> WHERE fts_uri MATCH 'education school' > ...> ORDER BY u.uri_id, u_downloaded_on DESC; > > CPU Time: user 21.871541 sys 26.414337 > > > A lot better, but simply not usable for a web application. > > -- > Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 4:18 PM, Black, Michael (IS) wrote: > I have no idea if this would work...but...here's some more thoughts... > > > > #1 How long does this take: > > select count(*) from fts_uri match 'education school'; > > > > #2 Create a view on uris with just what you need and use that in your join > (I'm guessing that uri_content takes up most of your database space). > > > > create view v_uris as select uri_id,feed_history_id from uri; > > > .. I did a query on just the fts table and got the answers relatively quickly. Not instantly, but very fast compared to all the attempts so far. So, assuming that the bottleneck is the multiple JOINs to get the data for the correct project_id, I created a temp table with all that JOIN nonsense sqlite> CREATE TEMP TABLE tmp_uris AS SELECT u.uri_id, u.uri uri, u.u_downloaded_on ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3 AND u.u_downloaded_on >= p.u_project_start; CPU Time: user 16.369556 sys 81.393235 sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, ...> Snippet(fts_uri, '', '', '…', -1, 64) snippet ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id ...> WHERE fts_uri MATCH 'education school' ...> ORDER BY u.uri_id, u_downloaded_on DESC; 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) 0|1|1|SEARCH TABLE tmp_uris AS u USING INDEX tmp_uris_uri_id (uri_id=?) (~10 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY CPU Time: user 0.86 sys 0.06 and yet sqlite> SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on, ...> Snippet(fts_uri, '', '', '…', -1, 64) snippet ...> FROM fts_uri f JOIN tmp_uris u ON f.uri_id = u.uri_id ...> WHERE fts_uri MATCH 'education school' ...> ORDER BY u.uri_id, u_downloaded_on DESC; CPU Time: user 21.871541 sys 26.414337 A lot better, but simply not usable for a web application. -- Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
I have no idea if this would work...but...here's some more thoughts... #1 How long does this take: select count(*) from fts_uri match 'education school'; #2 Create a view on uris with just what you need and use that in your join (I'm guessing that uri_content takes up most of your database space). create view v_uris as select uri_id,feed_history_id from uri; #3 Move your uri_content to another database and attach it. Again I'm assuming that's your big data field. That will seperate the paging needed to walk through uris. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Wednesday, September 28, 2011 2:54 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] speeding up FTS4 On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote: > What happens if you create an index on uris(feed_history_id) > > > Yeah, I noticed that lacking as well. sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; 0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|1|3|SCAN TABLE uris AS u (~46608 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) CPU Time: user 0.79 sys 0.14 sqlite> CREATE INDEX uris_feed_history_id ON uris (feed_history_id); CPU Time: user 12.766977 sys 82.766372 sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; 0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|1|1|SCAN TABLE feeds AS f (~7 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX (feed_id=?) (~5 rows) 0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id (feed_history_id=?) (~10 rows) CPU Time: user 0.000123 sys 0.15 But, no joy. SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND u.u_downloaded_on >= p.u_project_start AND fts_uri MATCH 'education school' ORDER BY u.uri_id, u_downloaded_on DESC; .. CPU Time: user 28.599581 sys 108.518205 > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 2:00 PM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote: > >> strftime returns a text representation. So you didn't really change >> anything. >> > > > That's not true at all. I added u_downloaded_on (u_ for unixtime) > > CREATE TABLE uris ( >uri_id INTEGER PRIMARY KEY, >uri TEXT, >uri_html TEXT, >uri_content TEXT, >downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, >feed_history_id INTEGER, >u_downloaded_on INTEGER > ); > > sqlite> SELECT u_downloaded_on FROM uris LIMIT 5; > 1306450769 > 1306450769 > 1306450770 > 1306450774 > 1306450776 > > >> You need to use juliandays() as I said. >> >> >> >> And you want a REAL number...not integer...though SQLite doesn't really care >> what you call it. It's more for your own reference. >> >> >> >> You just added a bunch more strings increasing the size of your >> database...ergo it ran slower. >> >> > > > If I understand correctly, the *size* of the database should not matter. Or, > at least not matter as much. Imagine a database with one table with only one > row but with so much content in that row that it is 27 GB vs. a database with > a few million rows, each with small amount of content but together totaling > 27 GB. > > What should matter is using the indexes correctly. In this case, my query > plan shows that I am hitting all the indexes. But, I think I am getting > closer to understanding this. > >
Re: [sqlite] speeding up FTS4
On 09/28/11 21:55, Puneet Kishor wrote: Perhaps, but I have inserted that in my table where the column is INTEGER. sqlite> SELECT typeof(u_downloaded_on) FROM uris LIMIT 1; integer -- OK! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 2:26 PM, Roger Andersson wrote: > On 09/28/11 21:10, Black, Michael (IS) wrote: >> >> 'scuse meI was wrong (again)...I guess strftime does return an >> integerseems to me that belies the name as it's a mismatch to the unix >> function. >> >> > ? > SQLite version 3.7.8 2011-09-19 14:49:19 > Enter ".help" for instructions > Enter SQL statements terminated with a ";" > sqlite> select typeof(strftime('%s','now','localtime')); > text > Perhaps, but I have inserted that in my table where the column is INTEGER. sqlite> SELECT typeof(u_downloaded_on) FROM uris LIMIT 1; integer -- Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 2:41 PM, Black, Michael (IS) wrote: > What happens if you create an index on uris(feed_history_id) > > > Yeah, I noticed that lacking as well. sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; 0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|1|3|SCAN TABLE uris AS u (~46608 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|1|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) CPU Time: user 0.79 sys 0.14 sqlite> CREATE INDEX uris_feed_history_id ON uris (feed_history_id); CPU Time: user 12.766977 sys 82.766372 sqlite> EXPLAIN QUERY PLAN SELECT u.uri_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; 0|0|0|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|1|1|SCAN TABLE feeds AS f (~7 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING AUTOMATIC COVERING INDEX (feed_id=?) (~5 rows) 0|3|3|SEARCH TABLE uris AS u USING COVERING INDEX uris_feed_history_id (feed_history_id=?) (~10 rows) CPU Time: user 0.000123 sys 0.15 But, no joy. SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND u.u_downloaded_on >= p.u_project_start AND fts_uri MATCH 'education school' ORDER BY u.uri_id, u_downloaded_on DESC; .. CPU Time: user 28.599581 sys 108.518205 > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 2:00 PM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote: > >> strftime returns a text representation. So you didn't really change >> anything. >> > > > That's not true at all. I added u_downloaded_on (u_ for unixtime) > > CREATE TABLE uris ( >uri_id INTEGER PRIMARY KEY, >uri TEXT, >uri_html TEXT, >uri_content TEXT, >downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, >feed_history_id INTEGER, >u_downloaded_on INTEGER > ); > > sqlite> SELECT u_downloaded_on FROM uris LIMIT 5; > 1306450769 > 1306450769 > 1306450770 > 1306450774 > 1306450776 > > >> You need to use juliandays() as I said. >> >> >> >> And you want a REAL number...not integer...though SQLite doesn't really care >> what you call it. It's more for your own reference. >> >> >> >> You just added a bunch more strings increasing the size of your >> database...ergo it ran slower. >> >> > > > If I understand correctly, the *size* of the database should not matter. Or, > at least not matter as much. Imagine a database with one table with only one > row but with so much content in that row that it is 27 GB vs. a database with > a few million rows, each with small amount of content but together totaling > 27 GB. > > What should matter is using the indexes correctly. In this case, my query > plan shows that I am hitting all the indexes. But, I think I am getting > closer to understanding this. > > I started backward with > > sqlite> SELECT p.project_id > ...> FROM projects p > ...> WHERE p.project_id = 3; > 3 > CPU Time: user 0.80 sys 0.89 > > sqlite> SELECT f.feed_id, p.project_id > ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id > ...> WHERE p.project_id = 3; > .. > CPU Time: user 0.000239 sys 0.000170 > > SELECT fh.feed_history_id, f.feed_id, p.project_id > ...> FROM projects p > ...> JOIN feeds f ON f.project_id = p.project_id > ...> JOIN feed_history fh ON f.feed_id = fh.feed_id > ...> WHERE p.project_id = 3; > .. > CPU Time: user 0.008491 sys 0.008054 > > SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id > ...> FROM projects p > ...> JOIN feeds f ON f.project_id = p.project_id > ...> JOIN feed_history fh ON f.feed_id = fh.f
Re: [sqlite] speeding up FTS4
What happens if you create an index on uris(feed_history_id) Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Wednesday, September 28, 2011 2:00 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] speeding up FTS4 On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote: > strftime returns a text representation. So you didn't really change anything. > That's not true at all. I added u_downloaded_on (u_ for unixtime) CREATE TABLE uris ( uri_id INTEGER PRIMARY KEY, uri TEXT, uri_html TEXT, uri_content TEXT, downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, feed_history_id INTEGER, u_downloaded_on INTEGER ); sqlite> SELECT u_downloaded_on FROM uris LIMIT 5; 1306450769 1306450769 1306450770 1306450774 1306450776 > You need to use juliandays() as I said. > > > > And you want a REAL number...not integer...though SQLite doesn't really care > what you call it. It's more for your own reference. > > > > You just added a bunch more strings increasing the size of your > database...ergo it ran slower. > > If I understand correctly, the *size* of the database should not matter. Or, at least not matter as much. Imagine a database with one table with only one row but with so much content in that row that it is 27 GB vs. a database with a few million rows, each with small amount of content but together totaling 27 GB. What should matter is using the indexes correctly. In this case, my query plan shows that I am hitting all the indexes. But, I think I am getting closer to understanding this. I started backward with sqlite> SELECT p.project_id ...> FROM projects p ...> WHERE p.project_id = 3; 3 CPU Time: user 0.80 sys 0.89 sqlite> SELECT f.feed_id, p.project_id ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id ...> WHERE p.project_id = 3; .. CPU Time: user 0.000239 sys 0.000170 SELECT fh.feed_history_id, f.feed_id, p.project_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> WHERE p.project_id = 3; .. CPU Time: user 0.008491 sys 0.008054 SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; BOOM! Adding that last table makes my query way too slow. Taking out the uris table and querying only the fts table sqlite> SELECT uri_id ...> FROM fts_uri ...> WHERE fts_uri MATCH 'education school'; starts producing the results immediately, but gives me back way too many matches. I am now getting all the matches, but I want only the matches for project id = 3 or matches throttled by u_downloaded_on (both of which are achievable only via a JOIN with uris table). > > Michael D. Black > > Senior Scientist > > NG Information Systems > > Advanced Analytics Directorate > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 12:44 PM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote: > >> Your change to numeric date/time may not take a long as you think. >> >> >> > > > Took an hour and a half. > > Step 1: Alter all tables with datetime columns, converting those columns to > integer; > > Step 2: Update all tables setting new datetime columns to unixtime > > UPDATE table SET new_column = strftime('%s', old_column); > > Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on); > > Step 4: Run the following query > >SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on >FROM fts_uri f >JOIN uris u ON f.uri_id = u.uri_id >JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id >JOIN feeds f ON fh.feed_id = f.feed_id >JOIN projects p ON f.project_id = p.project_id >WHERE p.project_id = 3 >AND u.u_downloaded_on >= p.u_project_start >AND fts_uri MATCH 'education school' >ORDER BY u.uri_id, u_downloaded_on DESC; > > Terrible time. >>> CPU Time: user 27.584849 sys 115.219293 >
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 9:00 PM, Puneet Kishor wrote: > If I understand correctly, the *size* of the database should not matter. Or, > at least not matter as much. So she said. But contrary to popular believe, size does matter. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On 09/28/11 21:10, Black, Michael (IS) wrote: 'scuse meI was wrong (again)...I guess strftime does return an integerseems to me that belies the name as it's a mismatch to the unix function. ? SQLite version 3.7.8 2011-09-19 14:49:19 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> select typeof(strftime('%s','now','localtime')); text /Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
'scuse meI was wrong (again)...I guess strftime does return an integerseems to me that belies the name as it's a mismatch to the unix function. select strftime('%s','now'); 1317236583 But I think you may want: strftime('%s','now','unixepoch','localtime'); That works with the default datetime() then when you select it. sqlite> select datetime(strftime('%s','now'),'unixepoch','localtime'); 2011-09-28 14:05:09 So...given that you actually stuck integers into your table I'm a bit surprised that it took longer. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Roger Andersson [r...@telia.com] Sent: Wednesday, September 28, 2011 1:52 PM To: sqlite-users@sqlite.org Subject: EXT :Re: [sqlite] speeding up FTS4 On 09/28/11 20:14, Black, Michael (IS) wrote: > strftime returns a text representation. So you didn't really change anything. > > You need to use juliandays() as I said. > > > And you want a REAL number...not integer...though SQLite doesn't really care > what you call it. It's more for your own reference. > Assuming that second resolution is sufficient. Would UPDATE table SET new_column = cast(strftime('%s', old_column) as integer); make any difference? /Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 1:14 PM, Black, Michael (IS) wrote: > strftime returns a text representation. So you didn't really change anything. > That's not true at all. I added u_downloaded_on (u_ for unixtime) CREATE TABLE uris ( uri_id INTEGER PRIMARY KEY, uri TEXT, uri_html TEXT, uri_content TEXT, downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, feed_history_id INTEGER, u_downloaded_on INTEGER ); sqlite> SELECT u_downloaded_on FROM uris LIMIT 5; 1306450769 1306450769 1306450770 1306450774 1306450776 > You need to use juliandays() as I said. > > > > And you want a REAL number...not integer...though SQLite doesn't really care > what you call it. It's more for your own reference. > > > > You just added a bunch more strings increasing the size of your > database...ergo it ran slower. > > If I understand correctly, the *size* of the database should not matter. Or, at least not matter as much. Imagine a database with one table with only one row but with so much content in that row that it is 27 GB vs. a database with a few million rows, each with small amount of content but together totaling 27 GB. What should matter is using the indexes correctly. In this case, my query plan shows that I am hitting all the indexes. But, I think I am getting closer to understanding this. I started backward with sqlite> SELECT p.project_id ...> FROM projects p ...> WHERE p.project_id = 3; 3 CPU Time: user 0.80 sys 0.89 sqlite> SELECT f.feed_id, p.project_id ...> FROM projects p JOIN feeds f ON f.project_id = p.project_id ...> WHERE p.project_id = 3; .. CPU Time: user 0.000239 sys 0.000170 SELECT fh.feed_history_id, f.feed_id, p.project_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> WHERE p.project_id = 3; .. CPU Time: user 0.008491 sys 0.008054 SELECT u.uri_id, fh.feed_history_id, f.feed_id, p.project_id ...> FROM projects p ...> JOIN feeds f ON f.project_id = p.project_id ...> JOIN feed_history fh ON f.feed_id = fh.feed_id ...> JOIN uris u ON fh.feed_history_id = u.feed_history_id ...> WHERE p.project_id = 3; BOOM! Adding that last table makes my query way too slow. Taking out the uris table and querying only the fts table sqlite> SELECT uri_id ...> FROM fts_uri ...> WHERE fts_uri MATCH 'education school'; starts producing the results immediately, but gives me back way too many matches. I am now getting all the matches, but I want only the matches for project id = 3 or matches throttled by u_downloaded_on (both of which are achievable only via a JOIN with uris table). > > Michael D. Black > > Senior Scientist > > NG Information Systems > > Advanced Analytics Directorate > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 12:44 PM > To: General Discussion of SQLite Database > Subject: EXT :Re: [sqlite] speeding up FTS4 > > > On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote: > >> Your change to numeric date/time may not take a long as you think. >> >> >> > > > Took an hour and a half. > > Step 1: Alter all tables with datetime columns, converting those columns to > integer; > > Step 2: Update all tables setting new datetime columns to unixtime > > UPDATE table SET new_column = strftime('%s', old_column); > > Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on); > > Step 4: Run the following query > >SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on >FROM fts_uri f >JOIN uris u ON f.uri_id = u.uri_id >JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id >JOIN feeds f ON fh.feed_id = f.feed_id >JOIN projects p ON f.project_id = p.project_id >WHERE p.project_id = 3 >AND u.u_downloaded_on >= p.u_project_start >AND fts_uri MATCH 'education school' >ORDER BY u.uri_id, u_downloaded_on DESC; > > Terrible time. >>> CPU Time: user 27.584849 sys 115.219293 > > Step 5: EXPLAIN QUERY PLAN (above SELECT query) > >0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) >0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY > (rowid=?) (~1 rows) >0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (
Re: [sqlite] speeding up FTS4
On 09/28/11 20:14, Black, Michael (IS) wrote: strftime returns a text representation. So you didn't really change anything. You need to use juliandays() as I said. And you want a REAL number...not integer...though SQLite doesn't really care what you call it. It's more for your own reference. Assuming that second resolution is sufficient. Would UPDATE table SET new_column = cast(strftime('%s', old_column) as integer); make any difference? /Roger ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
strftime returns a text representation. So you didn't really change anything. You need to use juliandays() as I said. And you want a REAL number...not integer...though SQLite doesn't really care what you call it. It's more for your own reference. You just added a bunch more strings increasing the size of your database...ergo it ran slower. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Wednesday, September 28, 2011 12:44 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] speeding up FTS4 On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote: > Your change to numeric date/time may not take a long as you think. > > > Took an hour and a half. Step 1: Alter all tables with datetime columns, converting those columns to integer; Step 2: Update all tables setting new datetime columns to unixtime UPDATE table SET new_column = strftime('%s', old_column); Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on); Step 4: Run the following query SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND u.u_downloaded_on >= p.u_project_start AND fts_uri MATCH 'education school' ORDER BY u.uri_id, u_downloaded_on DESC; Terrible time. >> CPU Time: user 27.584849 sys 115.219293 Step 5: EXPLAIN QUERY PLAN (above SELECT query) 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY CPU Time: user 0.99 sys 0.08 Step 6: ANALYZE; Step 7: Run the above SELECT query again Terrible time. >> CPU Time: user 27.703538 sys 116.684390 This train is going nowhere. The times are actually worse than they were when I was using a non-text column for date time. By the way, have tried this on two machines -- the previous one was a dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive. > drop any indexes on project_start and downloaded_on; > > > > update projects set project_start=julianday(project_start); > > update uris set downloaded_on=julianday(downloaded_on); > > > > Recreate indexes. > > > > Modify your code to insert julianday('now','localtime') instead of taking the > default current_timestamp. > > > > I wasn't really aware before the the datetime functions stored as text all > the time...that's bad for searches... > > > > > > Michael D. Black > > Senior Scientist > > NG Information Systems > > Advanced Analytics Directorate > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 9:48 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] EXT : speeding up FTS4 > > > On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote: > >> >> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote: >> >>> WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) >> >> Why are you doing 'Datetime' here ? Not only does the conversion take time, >> but it means you can't usefully index either of those two columns. >> >> Can you instead store your stamps in a format which is readily sortable ? >> Either in text form or as julian days. > > > Could I? Sure, if I had known better. Should I? I would be happy to create a > new column, convert the values to julian days, and try that, but on a 27 GB > db, that would take a bit of a while. > > But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there > really is no such thing as DATETIME value. Internally, it is stored as TEXT > anyway. > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On 28 Sep 2011, at 6:44pm, Puneet Kishor wrote: > Step 4: Run the following query > > SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on > FROM fts_uri f > JOIN uris u ON f.uri_id = u.uri_id > JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id > JOIN feeds f ON fh.feed_id = f.feed_id > JOIN projects p ON f.project_id = p.project_id > WHERE p.project_id = 3 > AND u.u_downloaded_on >= p.u_project_start > AND fts_uri MATCH 'education school' > ORDER BY u.uri_id, u_downloaded_on DESC; > > Terrible time. >>> CPU Time: user 27.584849 sys 115.219293 > > Step 5: EXPLAIN QUERY PLAN (above SELECT query) > > 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) > 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) > 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY > (rowid=?) (~1 rows) > 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) > 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) > (~1 rows) > 0|0|0|USE TEMP B-TREE FOR ORDER BY > CPU Time: user 0.99 sys 0.08 I'm not familiar with the output of EXPLAIN QUERY PLAN. 'SEARCH' means it's using an index ? And SCAN means it isn't ? Can anyone suggest a good INDEX or two ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 11:00 AM, Black, Michael (IS) wrote: > Your change to numeric date/time may not take a long as you think. > > > Took an hour and a half. Step 1: Alter all tables with datetime columns, converting those columns to integer; Step 2: Update all tables setting new datetime columns to unixtime UPDATE table SET new_column = strftime('%s', old_column); Step 3: CREATE INDEX uris_downloaded_on ON uris (u_downloaded_on); Step 4: Run the following query SELECT u.uri_id uri_id, u.uri uri, u.u_downloaded_on FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND u.u_downloaded_on >= p.u_project_start AND fts_uri MATCH 'education school' ORDER BY u.uri_id, u_downloaded_on DESC; Terrible time. >> CPU Time: user 27.584849 sys 115.219293 Step 5: EXPLAIN QUERY PLAN (above SELECT query) 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY CPU Time: user 0.99 sys 0.08 Step 6: ANALYZE; Step 7: Run the above SELECT query again Terrible time. >> CPU Time: user 27.703538 sys 116.684390 This train is going nowhere. The times are actually worse than they were when I was using a non-text column for date time. By the way, have tried this on two machines -- the previous one was a dual-Xeon Xserve with 12 GB RAM. The current machine of choice is the top of the line iMac (quad core 3.4 GHz Intel Core i7) with 12 GB RAM and a 7200 RPM SATA drive. > drop any indexes on project_start and downloaded_on; > > > > update projects set project_start=julianday(project_start); > > update uris set downloaded_on=julianday(downloaded_on); > > > > Recreate indexes. > > > > Modify your code to insert julianday('now','localtime') instead of taking the > default current_timestamp. > > > > I wasn't really aware before the the datetime functions stored as text all > the time...that's bad for searches... > > > > > > Michael D. Black > > Senior Scientist > > NG Information Systems > > Advanced Analytics Directorate > > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Wednesday, September 28, 2011 9:48 AM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] EXT : speeding up FTS4 > > > On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote: > >> >> On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote: >> >>> WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) >> >> Why are you doing 'Datetime' here ? Not only does the conversion take time, >> but it means you can't usefully index either of those two columns. >> >> Can you instead store your stamps in a format which is readily sortable ? >> Either in text form or as julian days. > > > Could I? Sure, if I had known better. Should I? I would be happy to create a > new column, convert the values to julian days, and try that, but on a 27 GB > db, that would take a bit of a while. > > But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there > really is no such thing as DATETIME value. Internally, it is stored as TEXT > anyway. > > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
Your change to numeric date/time may not take a long as you think. drop any indexes on project_start and downloaded_on; update projects set project_start=julianday(project_start); update uris set downloaded_on=julianday(downloaded_on); Recreate indexes. Modify your code to insert julianday('now','localtime') instead of taking the default current_timestamp. I wasn't really aware before the the datetime functions stored as text all the time...that's bad for searches... Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Wednesday, September 28, 2011 9:48 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] EXT : speeding up FTS4 On Sep 28, 2011, at 9:44 AM, Simon Slavin wrote: > > On 28 Sep 2011, at 3:41pm, Puneet Kishor wrote: > >> WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) > > Why are you doing 'Datetime' here ? Not only does the conversion take time, > but it means you can't usefully index either of those two columns. > > Can you instead store your stamps in a format which is readily sortable ? > Either in text form or as julian days. Could I? Sure, if I had known better. Should I? I would be happy to create a new column, convert the values to julian days, and try that, but on a 27 GB db, that would take a bit of a while. But, if I understood [http://www.sqlite.org/datatype3.html] correctly, there really is no such thing as DATETIME value. Internally, it is stored as TEXT anyway. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
Sounds like you may just be hitting disk i/o. Your "sys" numbers seem to indicate that. How much memory does your machine have? How much time does each WHERE clause take? select count(*) from project where project_id = 3; select count(*) from fts_uri MATCH 'education,school'; select count(*) from project as p,fts_uri as u where DateTime(u.downloaded_on) >= DateTime(p.project_start); (I don't think you need Datetime at all...but I doubt it makes a lot of differencetest it)...should get the same count. select count(*) from project as p,uris as u where u.downloaded_on >= p.project_start; Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Wednesday, September 28, 2011 9:41 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] EXT : speeding up FTS4 On Sep 28, 2011, at 8:02 AM, Black, Michael (IS) wrote: > Have you done "ANALYZE"? That might help. > > Also...try to arrange your joins based on record count (both high-to-low and > low-to-high) and see what difference it makes. > > Since you have only one WHERE clause I'm guessing having project_ids as the > first join makes sense. > > So, I ran ANALYZE. Then, with the query as is, I got CPU Time: user 24.742481 sys 79.120486 If I flip the WHERE clause to WHERE Datetime(u.downloaded_on) >= Datetime(p.project_start) AND p.project_id = 3 AND fts_uri MATCH 'education school' I still get CPU Time: user 24.726792 sys 79.240780 Yesterday, in desperation, I tried to pare down my fts_uri table which brought the file size to 17 GB, but ended up creating a "malformed disk image" whatever the heck that means. Thankfully I had a backup (clever boy). So, I am now back with a 27 GB file, and a query that takes forever thereby locking up my web app. > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Tuesday, September 27, 2011 5:46 PM > To: General Discussion of SQLite Database > Subject: EXT :[sqlite] speeding up FTS4 > > > I have > >CREATE TABLE uris ( >uri_id INTEGER PRIMARY KEY, >uri TEXT, >uri_content TEXT, >downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, >feed_history_id INTEGER >); > > with 46608 rows > >CREATE TABLE feed_history ( >feed_history_id INTEGER PRIMARY KEY, >feed_id INTEGER, >scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP >); > > with 3276 rows > >CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id > INTEGER); > > with 79 rows > >CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT); > > with 3 rows > >CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on); >CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); > > The database file is about 27 GB. > > > The following query takes (CPU Time: user 23.952698 sys 73.188765) returning > 46608 rows > >SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, >Snippet(fts_uri, '', '', '…', -1, > 64) snippet >FROM fts_uri f >JOIN uris u ON f.uri_id = u.uri_id >JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id >JOIN feeds f ON fh.feed_id = f.feed_id >JOIN projects p ON f.project_id = p.project_id >WHERE p.project_id = 3 >AND Datetime(u.downloaded_on) >= Datetime(p.project_start) >AND fts_uri MATCH 'education,school' >ORDER BY u.uri_id, downloaded_on DESC; > > > The EXPLAIN QUERY PLAN for the above query tells me > >0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) >0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) > (~1 rows) >0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) >0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) >0|0|0|USE TEMP B-TREE FOR ORDER BY > > > Is there anything I can do to speed this up? > > -- > Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
On Sep 28, 2011, at 8:19 AM, Black, Michael (IS) wrote: > P.S. Your projects table is missing project_start. So apparently these > aren't the real create statements you are using. > > > > Sorry, I think that is the only table from which I snipped off information to make the post brief(er). These are the actual statements. The projects table is CREATE TABLE projects ( project_id INTEGER PRIMARY KEY, project_name TEXT, website_tags TEXT, twitter_tags TEXT, flickr_tags TEXT, project_start DATETIME, project_en DATETIME, project_end DATETIME ); > > > > From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on > behalf of Puneet Kishor [punk.k...@gmail.com] > Sent: Tuesday, September 27, 2011 5:46 PM > To: General Discussion of SQLite Database > Subject: EXT :[sqlite] speeding up FTS4 > > I have > >CREATE TABLE uris ( >uri_id INTEGER PRIMARY KEY, >uri TEXT, >uri_content TEXT, >downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, >feed_history_id INTEGER >); > > with 46608 rows > >CREATE TABLE feed_history ( >feed_history_id INTEGER PRIMARY KEY, >feed_id INTEGER, >scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP >); > > with 3276 rows > >CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id > INTEGER); > > with 79 rows > >CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT); > > with 3 rows > >CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on); >CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); > > The database file is about 27 GB. > > > The following query takes (CPU Time: user 23.952698 sys 73.188765) returning > 46608 rows > >SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, >Snippet(fts_uri, '', '', '…', -1, > 64) snippet >FROM fts_uri f >JOIN uris u ON f.uri_id = u.uri_id >JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id >JOIN feeds f ON fh.feed_id = f.feed_id >JOIN projects p ON f.project_id = p.project_id >WHERE p.project_id = 3 >AND Datetime(u.downloaded_on) >= Datetime(p.project_start) >AND fts_uri MATCH 'education,school' >ORDER BY u.uri_id, downloaded_on DESC; > > > The EXPLAIN QUERY PLAN for the above query tells me > >0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) >0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) >0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) > (~1 rows) >0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) >0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 > rows) >0|0|0|USE TEMP B-TREE FOR ORDER BY > > > Is there anything I can do to speed this up? > > -- > Puneet Kishor > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] speeding up FTS4
P.S. Your projects table is missing project_start. So apparently these aren't the real create statements you are using. Michael D. Black Senior Scientist NG Information Systems Advanced Analytics Directorate From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Puneet Kishor [punk.k...@gmail.com] Sent: Tuesday, September 27, 2011 5:46 PM To: General Discussion of SQLite Database Subject: EXT :[sqlite] speeding up FTS4 I have CREATE TABLE uris ( uri_id INTEGER PRIMARY KEY, uri TEXT, uri_content TEXT, downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, feed_history_id INTEGER ); with 46608 rows CREATE TABLE feed_history ( feed_history_id INTEGER PRIMARY KEY, feed_id INTEGER, scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP ); with 3276 rows CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER); with 79 rows CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT); with 3 rows CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on); CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); The database file is about 27 GB. The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, Snippet(fts_uri, '', '', '…', -1, 64) snippet FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND Datetime(u.downloaded_on) >= Datetime(p.project_start) AND fts_uri MATCH 'education,school' ORDER BY u.uri_id, downloaded_on DESC; The EXPLAIN QUERY PLAN for the above query tells me 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY Is there anything I can do to speed this up? -- Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] speeding up FTS4
I have CREATE TABLE uris ( uri_id INTEGER PRIMARY KEY, uri TEXT, uri_content TEXT, downloaded_on DATETIME DEFAULT CURRENT_TIMESTAMP, feed_history_id INTEGER ); with 46608 rows CREATE TABLE feed_history ( feed_history_id INTEGER PRIMARY KEY, feed_id INTEGER, scanned_on DATETIME DEFAULT CURRENT_TIMESTAMP ); with 3276 rows CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, feed_uri TEXT, project_id INTEGER); with 79 rows CREATE TABLE projects (project_id INTEGER PRIMARY KEY, project_name TEXT); with 3 rows CREATE INDEX idx_uris_downloaded_on ON uris (downloaded_on); CREATE VIRTUAL TABLE fts_uri USING fts4 (uri_id, uri_content); The database file is about 27 GB. The following query takes (CPU Time: user 23.952698 sys 73.188765) returning 46608 rows SELECT u.uri_id uri_id, u.uri uri, u.downloaded_on, Snippet(fts_uri, '', '', '…', -1, 64) snippet FROM fts_uri f JOIN uris u ON f.uri_id = u.uri_id JOIN feed_history fh ON u.feed_history_id = fh.feed_history_id JOIN feeds f ON fh.feed_id = f.feed_id JOIN projects p ON f.project_id = p.project_id WHERE p.project_id = 3 AND Datetime(u.downloaded_on) >= Datetime(p.project_start) AND fts_uri MATCH 'education,school' ORDER BY u.uri_id, downloaded_on DESC; The EXPLAIN QUERY PLAN for the above query tells me 0|0|0|SCAN TABLE fts_uri AS f VIRTUAL TABLE INDEX 4: (~0 rows) 0|1|1|SEARCH TABLE uris AS u USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|2|2|SEARCH TABLE feed_history AS fh USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|3|3|SEARCH TABLE feeds AS f USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|4|4|SEARCH TABLE projects AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY Is there anything I can do to speed this up? -- Puneet Kishor ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users