Re: [sqlite] faster query - help
On 1/25/2013 9:45 AM, moti lahiani wrote:
> Hi
> I have a database with the following tables:
> 1) files - each file in the system has its information in this table.
> That table includes 12 different columns such as file_id as integer
> primary key autoincrement, date_uploaded as integer, file_type as
> integer, and so on.
> 2) music_files - holds more information on a file in case the file is a
> music file: columns like file_id as integer, artist, genre, album_name,
> and more.
> I need to query those 2 tables (with all information from both)
> according to some filters/rules that I have, order the results by
> date_uploaded, and return the top(10) as the final result.
> My question is: which is faster and has the best performance? (the
> table files can have more than 150 rows)
> option 1: select the file_id and date_uploaded, order by date_uploaded,
> take the top(10) file_ids, and then select all information from both
> tables (files and music_files) for those top(10) results
> option 2: select all information from both tables (files + music_files),
> order by date_uploaded, and return the top(10) as the result

At which point do you plan to apply your filters? In option 1, if you first
select the top ten file IDs and *then* apply filters to them, you would
likely end up with fewer than ten files. But if you apply the filters at
the same time you choose your ten IDs, then you need to read the associated
data anyway, and your option 1 becomes equivalent to option 2.

I'd say, just do

  select * from files left join music_files using (file_id)
  where YourFilterGoesHere
  order by date_uploaded limit 10;

An index on files(date_uploaded) would likely prove beneficial.
--
Igor Tandetnik
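[A runnable sketch of Igor's suggestion (my own, not from the post; the
schema is abbreviated and the file_type filter stands in for the poster's
real rules):]

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE files (
            file_id       INTEGER PRIMARY KEY AUTOINCREMENT,
            date_uploaded INTEGER,
            file_type     INTEGER
        );
        CREATE TABLE music_files (
            file_id    INTEGER,
            artist     TEXT,
            genre      TEXT,
            album_name TEXT
        );
        -- lets ORDER BY date_uploaded ... LIMIT 10 stop after ten rows
        CREATE INDEX files_date_uploaded ON files(date_uploaded);
    """)

    top_ten = con.execute("""
        SELECT *
          FROM files
          LEFT JOIN music_files USING (file_id)
         WHERE file_type = ?          -- stand-in for YourFilterGoesHere
         ORDER BY date_uploaded DESC
         LIMIT 10
    """, (1,)).fetchall()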
Re: [sqlite] Substring Search Across Many Columns
On 25/01/13 12:59, Paul Vercellotti wrote:
> As I understand, it's tricky to get FTS to do substring matching, no?
> What's the best way to do that?

In what way is it tricky? There are several examples of doing it in the doc
I pointed to. Even when FTS does a full scan, the list of all distinct
words should be shorter than visiting each source row.

I recommend you actually go ahead and use FTS before deciding it doesn't
work. You'll be able to get accurate performance information for your data
set.

http://c2.com/cgi/wiki?PrematureOptimization

If you want to do substring matching using an index then you need to use
n-grams. This involves taking fragments from the text. For example, if your
source text is "hi there" and you are doing n-grams of between 2 and 4
letters, then you would index these:

  'hi' 'hi ' 'hi t'
  'i ' 'i t' 'i th'
  ' t' ' th' ' the'
  'th' 'the' 'ther'
  'he' 'her' 'here'
  'er' 'ere'
  're'

You can possibly also use an FTS tokenizer that produces n-grams.

Roger
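[A small sketch (mine, not Roger's) of the fragmenting step he describes;
it reproduces the token list above for "hi there" with n between 2 and 4:]

    def ngrams(text, lo=2, hi=4):
        # Emit every substring of text whose length is between lo and hi;
        # indexing these lets an FTS-style index answer substring queries.
        for start in range(len(text)):
            for length in range(lo, hi + 1):
                gram = text[start:start + length]
                if len(gram) == length:
                    yield gram

    print(list(ngrams("hi there")))
    # ['hi', 'hi ', 'hi t', 'i ', 'i t', 'i th', ' t', ' th', ' the', ...]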
[sqlite] sqlite-analyzer & largish db?
Hello,

[3.7.15.2 2013-01-09 11:53:05 c0e09560d26f0a6456be9dd3447f5311eb4f238f]
[sqlite-analyzer-osx-x86-3071502]

sqlite3_analyzer seems to, hmmm, overflow or something given a largish db.
For example, for a db of 8,434,233,344 bytes (8.43 GB on disk):

--8<--
*** All tables and indices ********************************************

Percentage of total database..........  100.000%
Number of entries..................... 143344555
Bytes of storage consumed............. -155705344
Bytes used after compression.......... 8434229248   -5416.8%
Bytes of payload...................... 6972547011   -4478.0%
-->8--

Note the negative 'Bytes of storage consumed' and the associated negative
percentages.

Bug or feature?

Thanks in advance.

Cheers,

PA.
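[For what it's worth, the negative figure looks like a 64-bit byte count
squeezed through a signed 32-bit integer; a quick check of the arithmetic
(my own inference, not from the report):]

    # "Bytes used after compression" from the report, which for an
    # uncompressed db should roughly equal "Bytes of storage consumed":
    true_value = 8434229248

    wrapped = true_value & 0xFFFFFFFF      # keep only the low 32 bits
    if wrapped >= 2**31:                   # reinterpret as signed 32-bit
        wrapped -= 2**32

    print(wrapped)                         # -155705344, exactly as reported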
Re: [sqlite] Substring Search Across Many Columns
As I understand, it's tricky to get FTS to do substring matching, no?
What's the best way to do that?

Thanks!
Paul

From: Roger Binns
To: General Discussion of SQLite Database
Sent: Friday, January 25, 2013 12:01 PM
Subject: Re: [sqlite] Substring Search Across Many Columns

On 25/01/13 11:30, Paul Vercellotti wrote:
> I'm trying to match a substring (case-insensitive) across multiple
> columns and want it to go fast; it's very slow to do a query like '...
> WHERE name LIKE "%fish%" OR desc LIKE "%red%" OR title LIKE
> "%soup%"...'
>
> Will creating a composite index ...

If you are concerned about performance then SQLite already has a solution
for you:

http://www.sqlite.org/fts3.html

Roger
[sqlite] identify virtual tables?
Hello,

What would be a reasonable way to programmatically identify all the virtual
tables in a database? Is there a structured way to do so? Short of scanning
the DDLs, that is?

  select name
  from sqlite_master
  where type = 'table' and lower( sql ) like '% virtual %'
  order by name;

Thanks in advance.

Cheers,

PA.
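[Short of a structured API, wrapping that DDL scan is about all there is.
A sketch of mine that keys on the statement prefix instead of an embedded
' virtual '; it assumes the stored DDL begins with the usual CREATE VIRTUAL
TABLE text (LIKE is case-insensitive for ASCII by default), and the
filename is hypothetical:]

    import sqlite3

    con = sqlite3.connect("some.db")   # hypothetical database file

    # sqlite_master keeps the original DDL text, so a virtual table is
    # one whose statement starts with CREATE VIRTUAL TABLE.
    names = [name for (name,) in con.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND sql LIKE 'create virtual table%' "
        "ORDER BY name")]
    print(names)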
Re: [sqlite] Substring Search Across Many Columns
On 25/01/13 11:30, Paul Vercellotti wrote:
> I'm trying to match a substring (case-insensitive) across multiple
> columns and want it to go fast; it's very slow to do a query like '...
> WHERE name LIKE "%fish%" OR desc LIKE "%red%" OR title LIKE
> "%soup%"...'
>
> Will creating a composite index ...

If you are concerned about performance then SQLite already has a solution
for you:

http://www.sqlite.org/fts3.html

Roger
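[To make the pointer concrete, a minimal FTS sketch (mine; column names are
illustrative, and it assumes FTS4 is compiled into the library, as it
usually is). Note that plain MATCH finds whole tokens or 'tok*' prefixes;
true mid-word substrings need the n-gram approach discussed later in the
thread:]

    import sqlite3

    con = sqlite3.connect(":memory:")

    # One full-text index spanning all three searchable columns.
    con.execute(
        "CREATE VIRTUAL TABLE docs USING fts4(name, description, title)")
    con.execute(
        "INSERT INTO docs VALUES ('fish stew', 'red broth', 'soup of the day')")

    # MATCH consults the full-text index rather than scanning every row.
    for row in con.execute(
            "SELECT * FROM docs WHERE docs MATCH 'fish OR red OR soup*'"):
        print(row)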
[sqlite] Substring Search Across Many Columns
Hi there,

I'm trying to match a substring (case-insensitive) across multiple columns
and want it to go fast; it's very slow to do a query like '... WHERE name
LIKE "%fish%" OR desc LIKE "%red%" OR title LIKE "%soup%" ...'

Will creating a composite index that has all those columns speed up a LIKE
query across all the columns in that index? That is, we'd like it to only
have to do one index scan, rather than multiple table scans - will a
composite key help with that?

Any other suggestions for doing this are appreciated too.

Thanks!
Paul
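[For the composite-index part of the question: SQLite's LIKE optimization
only applies when the pattern has a literal prefix, so a leading '%' forces
a scan no matter what indexes exist. You can confirm this yourself; a
sketch of mine with throwaway names:]

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name TEXT, description TEXT, title TEXT)")
    con.execute("CREATE INDEX t_all ON t(name, description, title)")

    # A leading-wildcard LIKE cannot seek in a B-tree index, composite
    # or not; the reported plan is a scan (possibly of the covering
    # index), never an index search.
    for row in con.execute(
            "EXPLAIN QUERY PLAN SELECT * FROM t WHERE name LIKE '%fish%'"):
        print(row)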
Re: [sqlite] using the same sqlite parameter more than once causes premature memory deallocation
On 24/01/13 17:37, abbood wrote:
> Btw I'm curious how did you find out about this autorelease thing? Can
> you add more detail about that?

I used a debugger to set a breakpoint in malloc_error_break, as the message
says. That gave a stack trace of an autorelease pool being drained. Since
most frees happen in the autorelease pool drain, it doesn't particularly
help with telling you which item is the problem. I asked on the valgrind
list and got a way of keeping the allocation stack trace instead of the
free one. (The next valgrind release will allow you to print both.)

http://article.gmane.org/gmane.comp.debugging.valgrind/12755

The cause of that original issue in my code was NSInvocation and dealing
with returned objects. The cause of the most recent issue was that NSData
was taking ownership of a buffer passed to it that I didn't want it to own.

> Ie I put a break point in malloc_error_break.. But then I just jump to
> the internals of sqlite.. And basically within the sqlite internals
> it's freeing an allocation that doesn't exist..

Using valgrind will narrow down the problem. What you are seeing is the
consequence of earlier memory errors.

> But im not sure if it's a good idea to modify the guts of sqlite.. Is
> it?

SQLite is *extremely* unlikely to have the bug. Some other piece of code
has the bug, and SQLite is the victim. Remember that virtually every web
browser on virtually every platform is using SQLite - an error would show
up for someone.

http://www.sqlite.org/testing.html

> And then I made the second part use inline variables as opposed to
> bindings.. Ie NSString stringWithFormat..

Do remember sqlite3_mprintf for that sort of thing, especially if strings
are involved, to avoid SQL injection attacks/bugs:

http://www.sqlite.org/c3ref/mprintf.html

> So I deleted that row from the dbase and it works fine.. I couldn't
> find anything different about that row..

Memory allocators typically have buckets for different allocation sizes,
often in powers of two. It could be that this row had a string needing 33
bytes while the others needed less, which caused memory to come from a
different bucket, which then changes where the victim of the actual bug
shows up.

Roger
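[Roger's sqlite3_mprintf pointer is about the C API's %q escaping; the same
advice in Python terms (a sketch of mine) is to let the wrapper bind values
rather than splicing them into the SQL text, which is what
stringWithFormat-style code does:]

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE songs (title TEXT)")

    title = "Don't Stop"   # embedded quote breaks naive string formatting

    # Fragile, and open to SQL injection - the hazard being flagged:
    #   con.execute("INSERT INTO songs VALUES ('%s')" % title)

    # Safe: the value travels as a bound parameter, never as SQL text.
    con.execute("INSERT INTO songs VALUES (?)", (title,))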
Re: [sqlite] faster query - help
The real answer is: try both and see which is faster.

My guess is #1 is probably faster, since I don't think there's an easy way
to limit the left-hand side of a left-join operation to do #2 without
touching all the music_file records, is there? The join would have to match
on file_id, so it would hit the entire music_files table. I'm sure somebody
will correct me if I'm wrong here... I'm not confident that the join isn't
smarter than that.

Is the join done BEFORE the order by and limit? Or after? I would think
it's done before. Or will the order by and limit apply to the left-hand
side first, before the join, when it can?

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of moti lahiani
Sent: Friday, January 25, 2013 8:45 AM
To: sqlite-users@sqlite.org
Subject: [sqlite] faster query - help

Hi

I have a database with the following tables:

1) files - each file in the system has its information in this table. That
table includes 12 different columns such as file_id as integer primary key
autoincrement, date_uploaded as integer, file_type as integer, and so on.

2) music_files - holds more information on a file in case the file is a
music file: columns like file_id as integer, artist, genre, album_name, and
more.

I need to query those 2 tables (with all information from both) according
to some filters/rules that I have, order the results by date_uploaded, and
return the top(10) as the final result.

My question is: which is faster and has the best performance? (the table
files can have more than 150 rows)

option 1: select the file_id and date_uploaded, order by date_uploaded,
take the top(10) file_ids, and then select all information from both tables
(files and music_files) for those top(10) results

option 2: select all information from both tables (files + music_files),
order by date_uploaded, and return the top(10) as the result

Thanks
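[Rather than guessing, EXPLAIN QUERY PLAN answers this directly. In a
sketch of mine with a pared-down schema, SQLite walks files in
date_uploaded order via the index and probes music_files once per emitted
row, so LIMIT 10 does stop the join after ten files (exact plan wording
varies by SQLite version):]

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE files (file_id INTEGER PRIMARY KEY,
                            date_uploaded INTEGER);
        CREATE TABLE music_files (file_id INTEGER PRIMARY KEY,
                                  artist TEXT);
        CREATE INDEX files_date ON files(date_uploaded);
    """)

    for row in con.execute("""
        EXPLAIN QUERY PLAN
        SELECT * FROM files LEFT JOIN music_files USING (file_id)
         ORDER BY date_uploaded DESC LIMIT 10
    """):
        print(row)   # files scanned via files_date; music_files probed
                     # by rowid, one lookup per emitted row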
[sqlite] faster query - help
Hi

I have a database with the following tables:

1) files - each file in the system has its information in this table. That
table includes 12 different columns such as file_id as integer primary key
autoincrement, date_uploaded as integer, file_type as integer, and so on.

2) music_files - holds more information on a file in case the file is a
music file: columns like file_id as integer, artist, genre, album_name, and
more.

I need to query those 2 tables (with all information from both) according
to some filters/rules that I have, order the results by date_uploaded, and
return the top(10) as the final result.

My question is: which is faster and has the best performance? (the table
files can have more than 150 rows)

option 1: select the file_id and date_uploaded, order by date_uploaded,
take the top(10) file_ids, and then select all information from both tables
(files and music_files) for those top(10) results

option 2: select all information from both tables (files + music_files),
order by date_uploaded, and return the top(10) as the result

Thanks
Re: [sqlite] bug report: out of memory error for most operations on a large database
How much free disk space do you have? Your temp tables might be exceeding
capacity.

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of ammon_lymphater
Sent: Thursday, January 24, 2013 1:41 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] bug report: out of memory error for most operations on a
large database

Summary: except for select count(*), all operations on a large table (500m
rows/80gb) result in an out-of-memory error.

Details:
a. Using SQLite3 ver 3.7.15.2, Windows 7 64 bit AMD (but the error also in
   3.6.xx and on other Windows platforms)
b. Created a table (schema attached), imported data from a csv file using
   sqlite3 (no problems)
c. "select Name5, count(*) from LargeData group by name5 order by name5"
   results in Error: out of memory (the cardinality of Name5 is 12)
d. "sqlite> select count(*) from StorageHourlyBySIDCL2;" gives 587608348,
   as it should
e. The above is independent of cache_size (from 0.3 to 1.5gb); happens for
   page_size 32kb and 64kb (others not tested)

(personally not urgent for me - just testing the limits - but may be useful
for others)

[...]
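[If temp space is indeed the issue, the relevant knobs look roughly like
this (a sketch of mine, untested against a table this size; the filename,
table, and column names come from the original report):]

    import sqlite3

    con = sqlite3.connect("import2.db")   # the 80 GB db from the report

    # 1 = FILE: push temporary structures (such as the GROUP BY sorter's
    # spill space) to disk instead of RAM; 2 = MEMORY, 0 = compile default.
    con.execute("PRAGMA temp_store = 1")

    # Keep the page cache modest so the sort gets the RAM instead.
    con.execute("PRAGMA cache_size = 10000")   # pages; negative means KiB

    for name5, count in con.execute(
            "SELECT Name5, COUNT(*) FROM LargeData "
            "GROUP BY Name5 ORDER BY Name5"):
        print(name5, count)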
[sqlite] bug report: out of memory error for most operations on a large database
Summary: except for select count(*), all operations on a large table (500m
rows/80gb) result in an out-of-memory error.

Details:
a. Using SQLite3 ver 3.7.15.2, Windows 7 64 bit AMD (but the error also in
   3.6.xx and on other Windows platforms)
b. Created a table (schema attached), imported data from a csv file using
   sqlite3 (no problems)
c. "select Name5, count(*) from LargeData group by name5 order by name5"
   results in Error: out of memory (the cardinality of Name5 is 12)
d. "sqlite> select count(*) from StorageHourlyBySIDCL2;" gives 587608348,
   as it should
e. The above is independent of cache_size (from 0.3 to 1.5gb); happens for
   page_size 32kb and 64kb (others not tested)

(personally not urgent for me - just testing the limits - but may be useful
for others)

table schema & sqlite3_analyzer output

  CREATE TABLE largedata (
    name1  smalldatetime,
    Name2  uniqueidentifier,
    Name3  varchar (16),
    Name4  varchar (8),
    Name5  varchar (80),
    Name6  real,
    Name7  real,
    Name8  real,
    Name9  real,
    Name10 real,
    Name11 real,
    Name12 real,
    Name13 real,
    Name14 smallint,
    Name15 tinyint,
    Name16 tinyint,
    Name17 smalldatetime,
    Name18 real,
    Name19 tinyint
  );

-- SQLITE3_ANALYZER output

/** Disk-Space Utilization Report For h:\temp\convert\import2.db

Page size in bytes.................... 32768
Pages in the whole file (measured).... 2578119
Pages in the whole file (calculated).. 2578118
Pages that store data................. 2578118      100.000%
Pages on the freelist (per header).... 0            0.0%
Pages on the freelist (calculated).... 1            0.0%
Pages of auto-vacuum overhead......... 0            0.0%
Number of tables in the database...... 11
Number of indices..................... 0
Number of named indices............... 0
Automatically generated indices....... 0
Size of the file in bytes............. 84479803392
Bytes of user payload stored.......... 79293861071  93.9%

*** Page counts for all tables with their indices *********************

LargeData............................. 2578108      100.000%
smalldata............................. 1            0.0%
(zero-length tables omitted)

*** All tables ********************************************************

Percentage of total database..........  100.000%
Number of entries..................... 587608358
Bytes of storage consumed............. 84479770624
Bytes of payload...................... 79293871126  93.9%
Average payload per entry............. 134.94
Average unused bytes per entry........ 0.32
Average fanout........................ 2716.00
Fragmentation......................... 0.11%
Maximum payload per entry............. 1933
Entries that use overflow............. 0            0.0%
Index pages used...................... 949
Primary pages used.................... 2577169
Overflow pages used................... 0
Total pages used...................... 2578118
Unused bytes on index pages........... 3904523      12.6%
Unused bytes on primary pages......... 184122910    0.22%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 188027433    0.22%

-- removing unused tables

*** Table STORAGEHOURLYBYSIDCL2 ***************************************

Percentage of total database..........  100.000%
Number of entries..................... 587608348
Bytes of storage consumed............. 84479442944
Bytes of payload...................... 79293861071  93.9%
Average payload per entry............. 134.94
Average unused bytes per entry........ 0.32
Average fanout........................ 2716.00
Fragmentation......................... 0.11%
Maximum payload per entry............. 183
Entries that use overflow............. 0            0.0%
Index pages used...................... 949
Primary pages used.................... 2577159
Overflow pages used................... 0
Total pages used...................... 2578108
Unused bytes on index pages........... 3904523      12.6%
Unused bytes on primary pages......... 183805515    0.22%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 187710038    0.22%

*** Table SQLITE_MASTER ***********************************************

Percentage of total database..........  0.0%
Number of entries..................... 10
Bytes of storage consumed............. 32768
Bytes of payload...................... 10055        30.7%
Average payload per entry............. 1005.50
Average unused bytes per entry........ 2255.50
Maximum payload per entry............. 1933
Entries that use overflow............. 0            0.0%
Primary pages used.................... 1
Overflow pages used................... 0
Total pages used...................... 1
Unused bytes on primary pages......... 22555        68.8%
Unused bytes on overflow pages........ 0
Unused bytes on all pages............. 22555        68.8%