Re: [sqlite] Optimizing list retrieval with a FTS3 table
Scott, Thank you for clarifying the inefficiency of FTS3 when not using a MATCH criteria. Unfortunately there are other use cases that do require the MATCH criteria so the FTS3 is required. I believe reading everything into a temp table would consume too much memory as in an ideal situation we would not place an upper bound on the number of songs that can be indexed. My next thought is to go ahead and keep two separate tables with the same information. One a FTS3 table for when searching by a word is necessary and another that does not use FTS3 for all other cases. Using TRIGGERS and VIEWS it should be relatively simple to keep both tables up to date. Also the FTS3 version of the table will not need all of the information that is in the main table since not all columns need to be searchable. In case you still cared, here is the current schema. query = "CREATE VIRTUAL TABLE " LDB_ST " USING fts3 ( tokenize='unaccent', " LDB_ST_ID " INTEGER PRIMARY KEY, " LDB_ST_FPATH " TEXT NOT NULL, " LDB_ST_TITLE " TEXT NOT NULL, " LDB_ST_ARTIST " TEXT NOT NULL, " LDB_ST_ALBUM " TEXT NOT NULL, " LDB_ST_TRACKNUM " INTEGER, " LDB_ST_GENRE " TEXT NOT NULL, " LDB_ST_BPM " REAL NOT NULL, " LDB_ST_TAPPED_BPM " REAL NOT NULL, " LDB_ST_PLAYTIME " INTEGER NOT NULL, " LDB_ST_COMMENTS " TEXT NOT NULL, " LDB_ST_ISANALYSED " INTEGER NOT NULL, " LDB_ST_SESSION " INTEGER, " LDB_ST_BPM_TYPE " INTEGER); "; Thanks again, Travis -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Scott Hess Sent: March-16-11 4:20 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table On Wed, Mar 16, 2011 at 12:15 PM, Travis Orr wrote: > Can someone enlighten me about this. To me a lot of the details appear > to be hidden since my main SongTable is a FTS3 virtual table. You don't provide your schema, but based on your queries, I'll make unwarranted assumptions :-). In fts3, there is a rowid column (standard SQLite meaning), a docid column which is an alias of rowid, and all the columns you define are TEXT columns. If you say: CREATE VIRTUAL TABLE MyTable USING FTS3( songid INTEGER PRIMARY KEY AUTO_INCREMENT MAGIC KEYWORDS, title VARCHAR(23), recorded DATETIME ); All three of those columns are TEXT.. Based on your queries, I'm betting that you're assuming that the various typing keywords for a CREATE TABLE statement apply, but they don't. If you want to know why, you can scan the archives or read the source code, but suffice to say that this is the truth at this time. Anyhow, the gist of it is that the FTS3 table has a full-text index on the TEXT of the columns, and that any other queries will be full table scans, as if there were no optimizations at all. So complicated queries with ORDER BY, LIMIT, and OFFSET can absolutely destroy performance if your result sets are all all big (or can be big, watch for the query of death!). If you will not be using MATCH, then there is no gain at all from FTS3, and you should consider just using a regular table. As I understand your problem, the solution I'd probably use would be to create a new temporary table to hold the data while scanning it. So something like: CREATE TEMPORARY TABLE MyResults AS SELECT docid, title, artist FROM songtable WHERE ... ORDER BY ...; I _think_ the resulting table will effectively capture the ORDER BY results, so you can then scan it using OFFSET and LIMIT (or rowid) efficiently. If this is too big, you could experiment with capturing only the docid values in order, and then joining MyResults back against songtable to get the original values. That won't be particularly efficient with OFFSET and LIMIT, but it should be able to join directly with songtable.docid, so it shouldn't be particularly inefficient, either. Of course, you could also just read the entire docid set into memory and manage it that way. It's a little cumbersome because then you have to keep re-binding the query to walk through things, but it probably won't perform any worse. -scott ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing list retrieval with a FTS3 table
On Wed, Mar 16, 2011 at 12:15 PM, Travis Orr wrote: > Can someone enlighten me about this. To me a lot of the details appear > to be hidden since my main SongTable is a FTS3 virtual table. You don't provide your schema, but based on your queries, I'll make unwarranted assumptions :-). In fts3, there is a rowid column (standard SQLite meaning), a docid column which is an alias of rowid, and all the columns you define are TEXT columns. If you say: CREATE VIRTUAL TABLE MyTable USING FTS3( songid INTEGER PRIMARY KEY AUTO_INCREMENT MAGIC KEYWORDS, title VARCHAR(23), recorded DATETIME ); All three of those columns are TEXT.. Based on your queries, I'm betting that you're assuming that the various typing keywords for a CREATE TABLE statement apply, but they don't. If you want to know why, you can scan the archives or read the source code, but suffice to say that this is the truth at this time. Anyhow, the gist of it is that the FTS3 table has a full-text index on the TEXT of the columns, and that any other queries will be full table scans, as if there were no optimizations at all. So complicated queries with ORDER BY, LIMIT, and OFFSET can absolutely destroy performance if your result sets are all all big (or can be big, watch for the query of death!). If you will not be using MATCH, then there is no gain at all from FTS3, and you should consider just using a regular table. As I understand your problem, the solution I'd probably use would be to create a new temporary table to hold the data while scanning it. So something like: CREATE TEMPORARY TABLE MyResults AS SELECT docid, title, artist FROM songtable WHERE ... ORDER BY ...; I _think_ the resulting table will effectively capture the ORDER BY results, so you can then scan it using OFFSET and LIMIT (or rowid) efficiently. If this is too big, you could experiment with capturing only the docid values in order, and then joining MyResults back against songtable to get the original values. That won't be particularly efficient with OFFSET and LIMIT, but it should be able to join directly with songtable.docid, so it shouldn't be particularly inefficient, either. Of course, you could also just read the entire docid set into memory and manage it that way. It's a little cumbersome because then you have to keep re-binding the query to walk through things, but it probably won't perform any worse. -scott ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing list retrieval with a FTS3 table
On Wed, Mar 16, 2011 at 4:12 PM, Travis Orr wrote: > Jim, just did a test with your recommendation, however that ended up > using too much memory, since it is selecting the entire list in on go > and keeping the memory for a significant amount of time, for the > embedded system it is running on. Other tasks started having problems. > > Travis I think it should only use cachesize*pagesize memory, so you should be able to control memory usage with pragma cache_size. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com > > -Original Message- > From: sqlite-users-boun...@sqlite.org > [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson > Sent: March-16-11 12:51 PM > To: General Discussion of SQLite Database > Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table > > On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr wrote: >> I am currently working on a project that requires retrieving a list of >> all the rows from a FTS3 table. The ordering of the results varies by >> search criteria. Since this is for an embedded project the list > results >> are passed in chunks to another module to give the appearance of > faster >> operations. > > Somewhere you have some state information so that you know what offset > to use. Store the SQLite cursor with that state information, and use > it to fetch the next 2000 rows on each call, ie, only do the query > once. > > Jim > -- > HashBackup: easy onsite and offsite Unix backup > http://www.hashbackup.com > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing list retrieval with a FTS3 table
Jim, just did a test with your recommendation, however that ended up using too much memory, since it is selecting the entire list in on go and keeping the memory for a significant amount of time, for the embedded system it is running on. Other tasks started having problems. Travis -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson Sent: March-16-11 12:51 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr wrote: > I am currently working on a project that requires retrieving a list of > all the rows from a FTS3 table. The ordering of the results varies by > search criteria. Since this is for an embedded project the list results > are passed in chunks to another module to give the appearance of faster > operations. Somewhere you have some state information so that you know what offset to use. Store the SQLite cursor with that state information, and use it to fetch the next 2000 rows on each call, ie, only do the query once. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing list retrieval with a FTS3 table
Jim, That would require maintaining the current connection and using a prepared statement, correct? Thanks for recommending that I had considered that earlier but couldn't afford to have the db locked from writes for the length of time a query could take. At that point in time I was still on Sqlite 3.6.xx without shared-cache enabled. Since then I have pulled Sqlite 3.7.5 and am using the WAL making this feasible. Thanks, Travis -Original Message- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson Sent: March-16-11 12:51 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr wrote: > I am currently working on a project that requires retrieving a list of > all the rows from a FTS3 table. The ordering of the results varies by > search criteria. Since this is for an embedded project the list > results are passed in chunks to another module to give the appearance > of faster operations. Somewhere you have some state information so that you know what offset to use. Store the SQLite cursor with that state information, and use it to fetch the next 2000 rows on each call, ie, only do the query once. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Optimizing list retrieval with a FTS3 table
On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr wrote: > I am currently working on a project that requires retrieving a list of > all the rows from a FTS3 table. The ordering of the results varies by > search criteria. Since this is for an embedded project the list results > are passed in chunks to another module to give the appearance of faster > operations. Somewhere you have some state information so that you know what offset to use. Store the SQLite cursor with that state information, and use it to fetch the next 2000 rows on each call, ie, only do the query once. Jim -- HashBackup: easy onsite and offsite Unix backup http://www.hashbackup.com ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Optimizing list retrieval with a FTS3 table
I am currently working on a project that requires retrieving a list of all the rows from a FTS3 table. The ordering of the results varies by search criteria. Since this is for an embedded project the list results are passed in chunks to another module to give the appearance of faster operations. Currently the selection of rows is done as follows: SELECT songid, title, artist FROM SongTable ORDER BY title COLLATE NOCASE DESC, artist COLLATE NOCASE DESC, songid DESC LIMIT 2000 OFFSET 4000; With the offset increasing for each subsequent page of data. The big problem I have with this is that when the offset gets large there is a huge inefficiency because all rows are selected and then stepped through until the offset is reached. My current test database has 34,000 rows. I am considering changing to queries of the following style but am unsure of the performance gains since I will still not be using the MATCH operator on the data. SELECT songid, title, artist FROM songtable WHERE (title = "some title AND artist = "some artist" AND songid > 4194419) OR (title = "some title" AND artist > "some artist") OR title > "some title" ORDER BY title, artist, songid LIMIT 2000; When using the "explain query plan" operator I see that both of the above cases only use "VIRTUAL TABLE INDEX 0" The advantage of the second case is that is only selects the necessary rows, however it does use need to perform comparisons and I am unsure what the performance hit of the comparison operators on a FTS3 table is. sqlite> explain query plan SELECT songid, title, artist FROM songtable WHERE (title = "some title AND artist = "some artist" AND songid > 4194419) OR (title = "some title" AND artist > "some artist") OR title > "some title" ORDER BY title, artist, songid LIMIT 2000; 0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows) 0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows) 0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY sqlite> explain query plan SELECT songid, title, artist FROM SongTable ORDER BY title COLLATE NOCASE DESC, artist COLLATE NOCASE DESC, songid DESC LIMIT 2000 OFFSET 4000; 0|0|0|SCAN TABLE songtable VIRTUAL TABLE INDEX 0: (~0 rows) 0|0|0|USE TEMP B-TREE FOR ORDER BY Can someone enlighten me about this. To me a lot of the details appear to be hidden since my main SongTable is a FTS3 virtual table. I hope this makes sense. Thanks, Travis ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users