Re: [sqlite] In-memory DB slower than disk-based?
On 8/6/2014 11:51 AM, Jensen, Vern wrote:
> *bump* Anyone?

Not sure if this is your problem, but this kind of load puts a lot of stress on SQLite's page cache, which may well perform worse than your OS's page cache (used for real disk I/O) does, especially under high contention. Also, SQLite's defaults with respect to threading are set up to be safe, which means that when in doubt about what your code might be doing, SQLite will serialize.

I did a bit of SQLite perf tuning recently, and here are two things you might want to try:

1. What's your value for flags? SQLite by default (pessimistically) assumes that you might be sharing a DB connection between multiple threads, and thus wraps essentially all API functions in locks to make sure this is safe. In a heavily multi-threaded scenario, I would recommend that you manually ensure that each connection is only used by one thread at a time, and then add SQLITE_OPEN_NOMUTEX to your flags. This reduces lock traffic substantially. (If your connection was per-thread already, doing so will not reduce contention or have any impact on your asymptotic perf or scalability, but locks aren't free.)

2. Before sqlite3_initialize, try

   sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);

This disables some API functions that provide access to memory stats, but avoids a global lock on every memory allocation/free, and SQLite does a lot of them. If you don't need these memory statistics, turning this off can make a noticeable difference in scalability.

I'm not sure if this will affect page cache throughput specifically, but these two things are both fairly easy to try.

-Fabian
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
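[Editor's note: the advice above targets the C API (SQLITE_OPEN_NOMUTEX plus one connection per thread). As a rough, hypothetical illustration of the same pattern, here is a Python sketch where every thread opens its own connection instead of sharing one; the file name and row counts are invented.]

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(path) as db:
    db.execute("CREATE TABLE samples(v INTEGER)")
    db.executemany("INSERT INTO samples VALUES (?)",
                   [(i,) for i in range(1000)])

totals = []

def worker():
    # Each thread opens its own connection: no cross-thread sharing, so
    # SQLite never has to serialize our API calls against each other.
    con = sqlite3.connect(path)
    totals.append(con.execute("SELECT count(*) FROM samples").fetchone()[0])
    con.close()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each of the four per-thread connections sees all 1000 rows; the point of the pattern is that no connection object ever crosses a thread boundary.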
[sqlite] Building multiple indexes in one scan over a table
Hi,

I'm working on a profiler that uses SQLite to store and index the results. Due to the nature of the task, the workload ends up splitting into essentially three phases:

1. Bulk insertion of relatively large amounts of data (low hundreds of megabytes to tens of gigabytes); a typical profile generates on the order of 10MB of sampling data per second.

2. Indexing. The first phase normally runs without any indexes on the sample tables, to maximize insertion throughput. In the second phase we create the indexes we need to retrieve the samples efficiently. (We found batch-indexing after the fact to generally lead to shorter overall processing times.)

3. Browse the results. Once the samples are indexed, the user can look around, analyze the results using queries, and so forth. This is essentially read-only.

Phase 2 is crucial for quick retrieval, but at the same time it essentially amounts to wait time for the user, so we'd like to get it over with as quickly as possible. Now, looking at what happens in there, I noticed that we often end up creating several indexes on the same table back-to-back:

  CREATE INDEX samples_by_time ON samples(time);
  CREATE INDEX samples_by_name ON samples(name_id);
  CREATE INDEX samples_by_type ON samples(type_id);

Each of these statements ends up scanning over the entire table once. Since the tables in question are quite large, I would love to have some way to create multiple indexes in one pass. A cursory examination of the VDBE code produced for the CREATE INDEX statements indicates that SQLite's backend should be capable of doing so (though I'm no expert), but I don't see a way to express what I want in standard SQL.

So, is there any way to create multiple indexes for the same table in one operation in SQLite? If not, how hard would it be to add?

Thanks,
-Fabian
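[Editor's note: a minimal, hypothetical sketch of the insert-then-index workflow described above, using Python's sqlite3 module; the table schema and row counts are invented for illustration.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE samples(time INTEGER, name_id INTEGER, type_id INTEGER)")

# Phase 1: bulk insertion with no indexes present, in one big transaction.
rows = [(i, i % 100, i % 7) for i in range(100_000)]
with db:
    db.executemany("INSERT INTO samples VALUES (?,?,?)", rows)

# Phase 2: batch-create the indexes after the fact. Each CREATE INDEX
# statement scans the samples table once -- the cost the post is about.
with db:
    db.execute("CREATE INDEX samples_by_time ON samples(time)")
    db.execute("CREATE INDEX samples_by_name ON samples(name_id)")
    db.execute("CREATE INDEX samples_by_type ON samples(type_id)")

n = db.execute("SELECT count(*) FROM samples").fetchone()[0]
```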
Re: [sqlite] Building multiple indexes in one scan over a table
On 6/10/2014 2:25 PM, Simon Slavin wrote:
> On 10 Jun 2014, at 7:22pm, Fabian Giesen fabi...@radgametools.com wrote:
>> 1. Bulk insertion of relatively large amounts of data (low hundreds of megabytes to tens of gigabytes); a typical profile generates on the order of 10MB of sampling data per second.
>
> I presume you do your INSERTs inside a transaction. If not, you should.

Yes, it's all wrapped in transactions; typically on the order of 10-100 a second. We're talking (order of magnitude) about 100k rows inserted per second of capture here; it would be nowhere near fast enough if every INSERT was its own transaction.

>> Each of these statements ends up scanning over the entire table once. Since the tables in question are quite large, I would love to have some way to create multiple indexes in one pass.
>
> Not possible in current SQLite. No standard syntax for doing it in SQL. A quick google suggests that no other popular DBMS implements it.

I beg to differ. MySQL's InnoDB has it, for example:

  http://dev.mysql.com/doc/refman/5.5/en/innodb-create-index-examples.html

> And ... I'm not sure it would be that much faster. The majority of the work it's doing is writing. Reading the existing table data is fast and easy, but writing new data and figuring out the pointers takes the majority of the time. Because of this I don't think it likely that this feature will be added to standard SQLite. Of course, if you really care about this you could add it yourself.

The majority of the work is sorting, as Richard correctly points out, not writing. The indexes in question are all on one or two integer keys, so they don't end up particularly big. The average sample record comes out at ~100 bytes. All the records are integer values; anything more complicated associated with a sample is stored as unique records in separate tables and referenced via foreign keys.

> To optimize this, when you create the table put the columns you're going to index near the beginning of the rows. SQLite stops reading the data of a table row once it has all the columns it needs. I don't know whether this will make a lot of difference but it won't do any harm.

When the dataset is small enough to fit in memory (this is on workstation PCs, so this means less than a few gigabytes) we're CPU bound (doing sorting, mostly, as Richard correctly suggested). However, looking at the sorting time vs. number of records, sorting (in that scenario, for our indices) typically proceeds at well over 1M records/s (depends on the machine, of course; on my work PC it's closer to 1.4M recs/s). At that rate, in an out-of-core situation, we would need to be reading ~140MB/s of records, sustained, during index build time, plus the write bandwidth for the resulting index (a small fraction; the indexes on the large tables are all on one or two integer keys), to not become IO bound. That's possible with a good hard drive (and most SSDs), but it's closer than I'm fully comfortable with.

The reason I'm asking is because we're seeing customers running into disproportionately slow indexing times at the end of long (~1h!) capture runs; since the dataset size is the only thing that's fundamentally different for longer runs, I started looking into it. I'm well aware this is an extreme case in various ways though. I just wanted to ask to make sure there wasn't a way I was missing!

> Also, store your time as a numeric value (Unix epoch or Julian day) rather than a string. Numbers are faster to handle than strings. INTEGERs are faster to handle than REALs. :-)

The time fields are 64-bit ints. The exact meaning depends on the capture target platform (we support several), but usually it's either a system-synchronized CPU clock cycle counter (when available) or ticks for a high-frequency bus/event timer.

-Fabian
[sqlite] ORDER BY DESC after GROUP BY not using INDEX - intentionally?
Hi,

I have been thinking about a question on stackoverflow (http://stackoverflow.com/questions/19236363/select-distinct-faster-than-group-by), where some SQL framework removes duplicates from results using GROUP BY instead of DISTINCT. I don't want to discuss that this might not be a good idea. However, the core of that problem is the creation of temp b-trees when using ORDER BY ... DESC after GROUP BY. I wondered if the construction of a temp b-tree in the third query is intentional / by design? I am using sqlite 3.8.1.

sqlite> PRAGMA legacy_file_format=OFF;
sqlite> create table test1 (x INTEGER);
sqlite> create index test1_idx on test1(x);
sqlite> explain query plan select x from test1 group by x order by x;
selectid|order|from|detail
0|0|0|SCAN TABLE test1 USING COVERING INDEX test1_idx

sqlite> create table test2 (x INTEGER);
sqlite> create index test2_idx on test2(x);
sqlite> explain query plan select x from test2 group by x order by x desc;
selectid|order|from|detail
0|0|0|SCAN TABLE test2 USING COVERING INDEX test2_idx
0|0|0|USE TEMP B-TREE FOR ORDER BY

sqlite> create table test3 (x INTEGER);
sqlite> create index test3_idx on test3(x desc);
sqlite> explain query plan select x from test3 group by x order by x desc;
selectid|order|from|detail
0|0|0|SCAN TABLE test3 USING COVERING INDEX test3_idx
0|0|0|USE TEMP B-TREE FOR ORDER BY

To double check:

sqlite> explain query plan select x from test3 order by x desc;
selectid|order|from|detail
0|0|0|SCAN TABLE test3 USING COVERING INDEX test3_idx

Regards
Fabian
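[Editor's note: a hypothetical reproduction sketch in Python. Whether the plan shows "USE TEMP B-TREE FOR ORDER BY" depends on the SQLite version linked into your interpreter (the optimization was not present in 3.8.1), so the sketch only prints the plan and checks the query result, which is the same either way.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table test2 (x INTEGER)")
db.execute("create index test2_idx on test2(x)")
db.executemany("insert into test2 values (?)",
               [(v,) for v in (3, 1, 2, 2, 1)])

# Inspect the plan; depending on SQLite version this may or may not
# contain a "USE TEMP B-TREE FOR ORDER BY" step for the DESC case.
plan = db.execute(
    "explain query plan select x from test2 group by x order by x desc"
).fetchall()
print(plan)

result = [r[0] for r in db.execute(
    "select x from test2 group by x order by x desc")]
```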
Re: [sqlite] ORDER BY DESC after GROUP BY not using INDEX - intentionally?
Thanks. I think using GROUP BY without aggregates is a strange way to remove duplicates, anyway.

> Not intentional. SQLite simply fails to recognize that by using the GROUP BY in descending order it could avoid the ORDER BY clause. This is an optimization that we have never considered because it has never come up before.
Re: [sqlite] SQLite keeps on searching endlessly
You may want to put the columns with the highest selectivity first in your index. Device 15 has nearly 10 entries in the table, while the rest of the 600 million records belong to another device. E.g.,

  CREATE INDEX map_index ON map (d, ...);

Also, you should run ANALYZE map so that the query planner has accurate information about your index.

Fabian
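[Editor's note: a hypothetical sketch of the advice above, with an invented, much smaller data distribution standing in for the 600-million-row table: the selective column `d` goes first in the composite index, and ANALYZE is run so the planner knows the distribution.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE map (d INTEGER, s INTEGER, n INTEGER)")
# Almost all rows belong to device 1; device 15 has only a couple.
rows = [(1, i % 50, i) for i in range(10_000)] + [(15, 1, 1), (15, 2, 2)]
with db:
    db.executemany("INSERT INTO map VALUES (?,?,?)", rows)

# Selective column (d) first, then the remaining lookup columns.
db.execute("CREATE INDEX map_index ON map (d, s, n)")
db.execute("ANALYZE")  # gather row-distribution statistics for the planner

plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM map WHERE d = 15 AND s = 1"
).fetchall()
print(plan)  # should show a SEARCH using map_index rather than a full scan

hits = db.execute("SELECT count(*) FROM map WHERE d = 15").fetchone()[0]
```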
Re: [sqlite] SQLite keeps on searching endlessly
> I cannot definitely solve your problem but I can think of some things to try. First, do these:
>
>   ANALYZE;
>   CREATE INDEX map_dsn ON map (d, s, n);
>   CREATE INDEX map_dns ON map (d, n, s);
>
> then execute the same SELECT. Does it have the same problem ? Does the EXPLAIN QUERY PLAN tell you which of the (now) three indexes SQLite has chosen.

Shouldn't ANALYZE be run _after_ creating the indexes?

sqlite> create table t(x int, y int);
sqlite> insert into t values (1,1), (2,1), (3,3);
sqlite> analyze t;
sqlite> select * from sqlite_stat1;
t||3
sqlite> create index i on t(x,y);
sqlite> select * from sqlite_stat1;
t||3
sqlite> analyze t;
sqlite> select * from sqlite_stat1;
t|i|3 1 1

Fabian
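[Editor's note: the shell transcript above can be reproduced programmatically. This sketch checks the point being made: statistics gathered by ANALYZE before an index exists say nothing about that index, so ANALYZE must be re-run after CREATE INDEX.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table t(x int, y int)")
db.executemany("insert into t values (?,?)", [(1, 1), (2, 1), (3, 3)])

db.execute("analyze t")  # creates sqlite_stat1; only a table-level row so far
before = db.execute(
    "select idx from sqlite_stat1 where idx is not null").fetchall()

db.execute("create index i on t(x,y)")
# sqlite_stat1 still has no entry for index i until we ANALYZE again.
db.execute("analyze t")
after = db.execute(
    "select idx from sqlite_stat1 where idx is not null").fetchall()
```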
Re: [sqlite] Comparing two tables column by column
Wouldn't

  SELECT * FROM table1 EXCEPT SELECT * FROM table2

solve this problem? I think it does for the example provided. Not sure if it would work in a real-world environment.

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Stephen Chrzanowski
Sent: Monday, 29 July 2013 13:01
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Comparing two tables column by column

To be fair to me, the example had the same column names. If the two tables have the same column names, then having a bit of extra code to tag on the column name + _1 might have worked. As my first reply answered, untested. ;)

On Mon, Jul 29, 2013 at 6:46 AM, Clemens Ladisch clem...@ladisch.de wrote:
> Simon Slavin wrote:
>> On 29 Jul 2013, at 4:03am, Fehmi Noyan ISI fnoyan...@yahoo.com wrote:
>>> One point I forgot to mention; the number of columns is unknown.
>>
>> There is no way in SQL to say "Give me the contents of all the columns of a row of table" in an unambiguous format.
>
> Well, just "give me" could be done with SELECT *, but it is almost impossible to compare such columns in SQL. If the tables have the same number _and_ names of columns, it would be possible to use a NATURAL JOIN, and combine this with an outer join to get non-matching records:
>
>   SELECT table1.* FROM table1 NATURAL LEFT JOIN table2 WHERE table2.rowid IS NULL
>
> This will return those records from table1 that do not have a matching record in table2. For the other direction, run the same query with table1 and table2 exchanged.
>
> Regards,
> Clemens
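[Editor's note: a small sketch of the EXCEPT approach suggested above, run in both directions to find rows unique to each table; the sample tables and rows are invented.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE table1(a, b);
CREATE TABLE table2(a, b);
INSERT INTO table1 VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO table2 VALUES (1, 'x'), (2, 'changed');
""")

# Rows in table1 that have no exact counterpart in table2 ...
only_in_1 = db.execute(
    "SELECT * FROM table1 EXCEPT SELECT * FROM table2").fetchall()
# ... and vice versa; both directions are needed for a full comparison.
only_in_2 = db.execute(
    "SELECT * FROM table2 EXCEPT SELECT * FROM table1").fetchall()
```

Note that EXCEPT compares whole rows, so it works regardless of how many columns the tables have, as long as both tables have the same column count.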
Re: [sqlite] FTS Find Tokens at Record Start
You might want to check the following:

  SELECT word FROM fts WHERE fts MATCH '^token'

Beginning with 3.7.9 this should only return records that have 'token' at the beginning of the record. See the changelog of 3.7.9:

> If a search token (on the right-hand side of the MATCH operator) in FTS4 begins with ^ then that token must be the first in its field of the document. ** Potentially Incompatible Change **

-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Clemens Ladisch
Sent: Friday, 5 April 2013 15:41
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] FTS Find Tokens at Record Start

Paul Vercellotti wrote:
> using FTS, how do you match records that contain certain tokens beginning at the start of the record

Apparently, this information is not stored in the FTS index. Search for the tokens, then manually check with LIKE or something like that.

Regards,
Clemens
[sqlite] FTS: Phrase queries
Ever since I started using FTS extensively, I frequently ran into this limitation:

  ** TODO: Strangely, it is not possible to associate a column specifier
  ** with a quoted phrase, only with a single token. Not sure if this was
  ** an implementation artifact or an intentional decision when fts3 was
  ** first implemented. Whichever it was, this module duplicates the
  ** limitation.

Is it ever planned to be fixed? It doesn't seem that fixing it would break any backwards compatibility.
Re: [sqlite] FTS: Phrase queries
2011/11/14 nobre rafael.ro...@novaprolink.com.br
> Comment from the source:
>
>   ** TODO: Strangely, it is not possible to associate a column specifier
>   ** with a quoted phrase, only with a single token. Not sure if this was
>   ** an implementation artifact or an intentional decision when fts3 was
>   ** first implemented. Whichever it was, this module duplicates the
>   ** limitation.
>
> So, it seems it's really not possible
>
> Rafael

Thanks, I hope this limitation will be lifted someday. Fixing it will not break any existing queries and is fully backwards compatible, so I don't understand why FTS4 duplicates that (faulty) behaviour.
[sqlite] FTS: Phrase queries
When I have a basic FTS query that needs to be restricted to a column, I can write it in two ways:

1.) WHERE column MATCH 'apple'
2.) WHERE table MATCH 'column:apple'

But when I have a phrase query, I can only write it in one way:

1.) WHERE column MATCH '"apple juice"'

The problem is that when I want to combine the queries (search for 'apple' in column1 and for "apple juice" in column2), I cannot write the query like:

  WHERE column1 MATCH 'apple' AND column2 MATCH '"apple juice"'

Nor can I write it like:

  WHERE table MATCH 'column1:apple column2:"apple juice"'

So this fairly simple query seems impossible in FTS? Or does anyone know how to work around this (without doing two separate queries)?
[sqlite] JOIN vs IN
Suppose I have two tables, and I want to look for a value in the first table, but display the columns from the second table. The most obvious way would be joining them on rowid. But I don't need to SELECT any columns from the first table, and it's a FTS4 table (which always joins a bit slower than real tables), so I currently do:

  SELECT * FROM table2 WHERE rowid IN (SELECT rowid FROM table1 WHERE amount > 500)

It returns the same results, but it doesn't seem much faster. Is there any performance difference to be expected from using IN instead of JOIN, or does SQLite internally rewrite JOIN queries to something similar to IN, which would explain why they perform nearly the same?
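[Editor's note: a hypothetical sketch of the two equivalent formulations, using plain tables (not FTS4) and invented schemas, just to show that the IN and JOIN forms return the same rows.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE table1(amount INTEGER);
CREATE TABLE table2(descr TEXT);
INSERT INTO table1(rowid, amount) VALUES (1, 100), (2, 600), (3, 900);
INSERT INTO table2(rowid, descr) VALUES (1, 'small'), (2, 'big'), (3, 'huge');
""")

# IN form: filter on table1, then fetch matching table2 rows by rowid.
via_in = db.execute("""
    SELECT * FROM table2
    WHERE rowid IN (SELECT rowid FROM table1 WHERE amount > 500)
""").fetchall()

# JOIN form: same result set expressed as a join on rowid.
via_join = db.execute("""
    SELECT table2.* FROM table2
    JOIN table1 ON table1.rowid = table2.rowid
    WHERE table1.amount > 500
""").fetchall()
```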
[sqlite] VACUUM
I'm trying to understand the following: when I create a new database and fill it with rows, why can it not be committed to disk the same way as VACUUM would do it? Currently I'm trying to manually vacuum the database by inserting rows sorted by table and by rowid, but as soon as the table contains INDEXes I cannot reproduce anywhere near the same output as a real VACUUM, because it writes the indexes first instead of last, for example. Is there some trick to force VACUUM-like output when building the database, and avoid building it twice?
Re: [sqlite] VACUUM
2011/11/10 Richard Hipp d...@sqlite.org
> Because when you are inserting the Nth row, SQLite has no idea of how many more rows will follow or how big the subsequent rows will be, so it has no way to reserve contiguous space sufficient to hold them all. The result is that parts of the table and parts of the indices become interleaved in the file.

But nothing is written to disk until I call commit (I'm using journal_mode=memory), so when SQLite has to start writing the data, it knows the exact total number of rows, and also that no other rows will follow. But then again, maybe the format of the journal in memory is an exact copy of the bytes it will write to disk, and in that case I understand that it would be very inefficient to start shuffling things, instead of just dumping it. I pictured it like a temporary table, in which case it would be fairly easy to restructure things before writing. Thanks for your explanation!
[sqlite] INDEX Types
I'm having an issue where inserts on an un-cached database are very slow. The reason probably is that a large part of the existing index needs to be read from disk, to be able to insert new rows into the index. The lengths of the values in the indexed column are around 60 bytes, so I'm thinking about adding an extra column containing a shorter hash (64 bits) of the actual value, and moving the index to that column instead. This way the total size of the index (and the database) will be much smaller, hopefully resulting in faster inserts.

But I'm wondering if SQLite can deal more efficiently with an INTEGER index (64 bits) vs. an 8-byte TEXT column (also 64 bits). I know the INTEGERs require less disk space because SQLite can store smaller values in fewer bytes, but are there any other differences that make them more preferable as an INDEX? If there is no difference in performance, I could just take the first 8 characters of the TEXT column as a hash value, instead of calculating a CRC64 each time.
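[Editor's note: a hypothetical sketch of the hash-column idea. It uses a truncated blake2b digest as a stand-in for the CRC64 mentioned above, and invented table/column names; the essential points are indexing the small integer hash instead of the long text, and re-checking the original value on lookup to handle hash collisions.]

```python
import hashlib
import sqlite3

def hash64(text: str) -> int:
    # Any decent 64-bit hash works; blake2b stands in for CRC64 here.
    # signed=True keeps the value within SQLite's signed 64-bit INTEGER range.
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vals(v TEXT, vhash INTEGER)")
db.execute("CREATE INDEX vals_by_hash ON vals(vhash)")  # index the small hash, not v

words = ["alpha" * 12, "beta" * 15, "gamma" * 12]  # ~60-byte values
db.executemany("INSERT INTO vals VALUES (?,?)",
               [(w, hash64(w)) for w in words])

# Lookup: probe the index by hash, then compare v to weed out collisions.
target = "beta" * 15
row = db.execute("SELECT v FROM vals WHERE vhash = ? AND v = ?",
                 (hash64(target), target)).fetchone()
```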
Re: [sqlite] INDEX Types
2011/11/9 Black, Michael (IS) michael.bla...@ngc.com
> Hmmm...appears to be the same for this case which, I must say, I find surprising.

Thanks for actually benchmarking it. I'm also a bit surprised, because I always thought SQLite handled INTEGER more efficiently than TEXT.

I also did some new benchmarks on the inserts-slowdown I experience, and to rule out VirtualBox, I performed them on real hardware, but the results are the same: 10 secs to create a database with 1 million rows, and 30 secs to add an additional 10.000 rows, if the database file is un-cached. Maybe this is normal behaviour, but how can it possibly be that it's faster to delete the whole database and re-create it than just adding a (relatively) small amount of additional rows?
Re: [sqlite] INDEX Types
2011/11/9 Simon Slavin slav...@bigfraud.org
> Didn't someone recently note that entering the first million records was fast, but if he then closed and reopened the database, entering the next 100,000 records was slow ?

Yes, and there is still no real explanation for it, other than slow disk reads. But even with very slow random disk I/O, 30 seconds seems still way too slow for a 100MB file.

But today I made a new observation: if I create the same table as a virtual FTS4 table, I can add the additional rows within 1 second (even on an un-cached database file). So if the reason for the slowness is disk reads, the FTS4 way of creating/updating the index requires much fewer reads? Maybe because it allows for separate tree branches? FTS is overkill for my simple requirements, but if it's so much faster than a regular index, why not? The only things holding me back from switching to FTS for this table are:

- I cannot use the UNIQUE constraint, to disallow duplicate values.
- I cannot search efficiently for rows that DON'T match a certain value, because FTS doesn't allow a single NOT operator.

So can someone explain what FTS is doing behind the scenes that makes these additional inserts so much faster?
Re: [sqlite] INDEX Types
2011/11/9 Black, Michael (IS) michael.bla...@ngc.com
> Are you sure you're using BEGIN/COMMIT on your transactions?

Yes.

> I just used my benchmark data and inserted another 100,000 rows into the database in 2.3 seconds.

That is because you immediately insert those additional rows after creating the database. I get the same excellent performance as you when I do that. But if you make sure the database is not in cache (for example, reboot the machine), you will see that the additional inserts will take MUCH longer than to be reasonably expected. Could you repeat your benchmark, with a reboot between the two transactions?
Re: [sqlite] INDEX Types
2011/11/9 Luuk luu...@gmail.com
> On 09-11-2011 17:23, Black, Michael (IS) wrote:
>> time sqlite3 < index1.sql
>> 19.307u 0.082s 0:19.40 99.8% 0+0k 0+0io 0pf+0w
>> time sqlite3 < index2.sql
>> 19.266u 0.092s 0:19.37 99.8% 0+0k 0+0io 0pf+0w
>
> $ time sqlite3 < gendat1.sql
> real  0m21.094s
> user  0m20.989s
> sys   0m0.104s
> $ time sqlite3 < gendat2.sql
> real  0m20.898s
> user  0m20.813s
> sys   0m0.084s
> $ time sqlite3 gendat1.sqlite < gendat1.sql
> real  2m32.701s
> user  0m26.038s
> sys   0m5.256s
> $ time sqlite3 gendat2.sqlite < gendat2.sql
> real  1m50.452s
> user  0m25.534s
> sys   0m5.360s
>
> The difference when creating a db on disk (size: 35Mb) is (152-110)/110 = 38%

Did you do a reboot before the second insert? Because the difference I'm seeing is much larger than 38%. Did you test it on Linux or Windows?
[sqlite] SQLite Error (19)
I'm in the process of converting a normal table to a FTS virtual table, and I'm noticing different behaviour when I try to insert duplicate rowids. I'm using the 'INSERT OR IGNORE' statement, and on the normal table I can try to insert millions of duplicates very quickly, but on the FTS table it's very slow, and for every failed insert I get:

  SQLite error (19): abort at 10 in [INSERT INTO 'main'.'table_content' VALUES(?,(?))]: PRIMARY KEY must be unique

So a possible explanation of the performance difference would be that the virtual table is sending a TRACE message for each row, but how can I disable/prevent that? Or is there another reason the virtual table is slow with duplicates?
Re: [sqlite] INDEX Types
2011/11/9 Nico Williams n...@cryptonector.com
> What's wrong with reading the whole file into memory at boot time as a way to prime the cache? Rebooting always takes some time, mostly the time to read all sorts of files.

It's a desktop application; I cannot pre-cache anything before the user launches the app. After the launch a couple of inserts need to be done, but having to read the whole database file into memory, just to be able to perform those inserts, doesn't seem a scalable solution.
Re: [sqlite] INDEX Types
2011/11/9 Nico Williams n...@cryptonector.com
> I don't get it. You're reading practically the whole file in a random manner, which is painfully slow, so why can't you read the file in one fell swoop (i.e., sequential reads)??

I'm only reading the whole file when the number of additional inserts is high enough to cause the whole index to be read from disk. But if I always pre-cache the database, it will degrade performance for cases when only 10 inserts need to be done. And I'd like to avoid having some fuzzy logic that tries to predict which of the two methods is going to be faster.

Besides, pre-caching the file sounds easier than it is to accomplish, because all methods suggested on this list did not work on Windows (for example, copying the file to null). Windows and the hard drive have their own logic to decide which data to cache, and I haven't found a simple way to force a certain file into the cache.

> Or, if FTS really works better, then use that.

I will, but I'm trying to understand the issue that I'm facing, not just work around it. It seems that FTS doesn't need to read the whole index from disk, so I'm trying to pinpoint the difference. My best guess is that it creates a fresh b-tree for the additional inserts, causing the boost in performance.
Re: [sqlite] INDEX Types
2011/11/9 Black, Michael (IS) michael.bla...@ngc.com
> OK...you're right...a reboot kills it.

I'm glad someone was able to reproduce this on Linux, ruling out the possibility it's a Windows issue.

> However, reboot again and add "select count(*) from a;" as the first line of gendat2.sql

So if a simple SELECT COUNT(*) can speed up the same insert from 90 seconds to 7 seconds (including the count), does this confirm it's a sequential vs random reads problem?
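[Editor's note: a hypothetical sketch of the warm-up trick discussed here: run a full-scan query before the batch of inserts so the relevant pages are read sequentially into the OS cache first. The timing benefit only shows up on a genuinely cold cache after a reboot, which a portable script cannot demonstrate, so this only illustrates the pattern; names and row counts are invented.]

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "warm.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE a(v TEXT)")
with db:
    db.executemany("INSERT INTO a VALUES (?)",
                   [(str(i),) for i in range(10_000)])
db.close()

# Later session (after a reboot the OS cache would be cold):
db = sqlite3.connect(path)
# Warm-up: a full scan reads the table's pages in one sequential sweep,
# so the subsequent inserts hit a warm cache instead of doing random
# cold reads against the index/table b-trees.
warm = db.execute("SELECT count(*) FROM a").fetchone()[0]
with db:
    db.executemany("INSERT INTO a VALUES (?)",
                   [(str(i),) for i in range(100)])
total = db.execute("SELECT count(*) FROM a").fetchone()[0]
```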
Re: [sqlite] INDEX Types
2011/11/9 GB gbi...@web.de
> Maybe you could try to use a pagesize that matches the size of a disk allocation unit or memory page. For Windows, since a typical NTFS partition has a cluster size of 4KiB - which happens to also be the size of a memory page - a pagesize of 4096 bytes seems to be a good compromise between pleasing the cache manager and waste of space.

I already used a page_size of 4096 for all the benchmarks, together with a large (300MB) cache_size, but thanks for the suggestion anyway!

> You could also investigate the effect of VACUUMing and ANALYZEing the Database. If you have multiple indexes on tables, try compiling sqlite defining the SQLITE_ENABLE_STAT3 switch to get even better results from ANALYZE.

I'm running these tests on a very simple database: 1 table and 1 column, so ANALYZE shouldn't have any effect. And I already tested running VACUUM on the database, but it didn't help (which seems logical, because I start with a fresh db each time).
[sqlite] FTS4: Datatypes
Ever since I started using FTS, I'm always confronted with the problem that I need two tables: one FTS table with the TEXT columns, and one normal table with the INTEGER columns for numerical values. This causes all kinds of problems (keeping the rowids in sync, complex queries, etc.). From a previous post from the author of FTS, I understand that it will be hard to implement datatypes for the virtual tables that FTS uses, so I already ruled that out as a solution.

I could just store the numerical values as TEXT in the FTS table, but even the most simple operation (sorting, for example) will be a challenge, since I have to format the numbers in such a way that they sort right (prepending zeroes to make them all the same length). But suppose I take all that into account, there still is another problem: FTS will unnecessarily index all those extra columns.

So would it be an idea to have a simple flag (NOINDEX, for example) which dictates that a certain column shouldn't be indexed by FTS, just stored? It may be a lot simpler to implement than actual datatypes, and I could work around the other limitations myself.
Re: [sqlite] FTS4: Datatypes
It seems I finally have some sort of a solution. From the documentation drafts:

> As of SQLite version 3.7.9, FTS4 supports a new option - "content" - designed to extend FTS4 to support the creation of full-text indexes where:
>
> * The indexed documents are not stored within the SQLite database at all (a "contentless" FTS4 table), or
>
> * The indexed documents are stored in a database table created and managed by the user (an "external content" FTS4 table).

Using this new option I can keep all data together in a regular table, and use FTS purely as an index. I wonder why this new option wasn't mentioned in the changelog of 3.7.9, but maybe because it's still experimental?
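[Editor's note: a hypothetical sketch of the "external content" approach described above, with an invented schema. It assumes the linked SQLite was built with FTS4 enabled (true for most builds). All data, including the INTEGER column, lives in the regular table; the FTS4 table indexes only the text column and must be kept in sync by the application.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Regular table holds everything, including the numeric column.
CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT, amount INTEGER);
-- External-content FTS4 table indexes only the text column of docs.
CREATE VIRTUAL TABLE docs_fts USING fts4(content="docs", body);

INSERT INTO docs VALUES (1, 'apple juice', 500), (2, 'pear cider', 700);
-- With external content, the full-text index is NOT updated automatically:
-- the application must mirror inserts/updates/deletes into the FTS table.
INSERT INTO docs_fts(docid, body) SELECT id, body FROM docs;
""")

rows = db.execute("""
    SELECT docs.id, docs.amount FROM docs
    JOIN docs_fts ON docs_fts.docid = docs.id
    WHERE docs_fts MATCH 'apple'
""").fetchall()
```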
Re: [sqlite] Slow INDEX
I just tested it, and it made no difference. The root cause of the problem is most likely not slow writes, because inserting duplicate values (which are ignored instead of written to disk) is just as slow.

But your suggestion may help me with another problem: when I fill an empty database with millions of rows, they are written to disk in the order I inserted them, not in the order a VACUUM would put them (sequential per table). If I can force an atomic commit by enabling a journal_mode (MEMORY for example), I would possibly end up with a much better structured database file. So your idea is appreciated, but for the problem reported in this thread, it had no effect.

2011/11/3 nobre rafael.ro...@novaprolink.com.br
> Have you tried setting journal_mode to the default DELETE option ? Without atomic commits, maybe your inserts are going to disk one by one instead of in a single step, when committing your transactions, thus slowing down disk writes.
>
> Fabian-40 wrote:
>> 2011/11/2 Black, Michael (IS)
>>
>> I do not use WAL, since I have turned 'journal_mode' off (to improve insert performance), and as far as I know WAL is only useful when you need to keep a journal? I also have 'synchronous' off, so SQLite shouldn't be waiting for any filesystem flushes.
>>
>> I hoped it was writing all the newly inserted rows to disk using a single operation, as soon as I call 'commit'. But I observed it, and found out it's already writing the rows as soon as they are inserted, not batching them for when I call commit. So that could be part of the problem.
>>
>> I don't have a real Windows machine at hand, so I will build one tomorrow, but if your expectations are correct, then it will be even slower than inside a virtual machine, because it will do individual writes for the 1 million rows too, making performance even worse than it is now. Another thing is that I don't expect the slow performance to have anything to do with slow disk writes, only with disk reads. I know this because when I make the index UNIQUE, and try to insert 10.000 duplicate rows (which are all ignored), it has the same bad performance, even though there are zero bytes written to disk. So it points in the direction of the reads making it slow, not the writes.

--
View this message in context: http://old.nabble.com/Slow-INDEX-tp32766886p32772266.html
Sent from the SQLite mailing list archive at Nabble.com.
[sqlite] Slow INDEX
I asked a similar question before, but received no response, so maybe it's a very stupid question - if so, feel free to say so. I create a database, create one table with a TEXT column, insert 1 million rows in 10 secs, create an index in 10 secs, VACUUM the database, and close the database. Now if I re-open the database, I can add an additional 10.000 rows very fast (1 sec). But if I reboot the (Windows) PC and insert an additional 10.000 rows, it takes at least 30 secs, which seems very slow given that I can add the first 1 million in under 10 seconds. It probably has to do with inefficient disk seeks, but even if SQLite had to read the whole database into memory to do the inserts, 30 secs is still slow for a 150 MB file (the database). So is there anyone who can explain these slow inserts, or has ideas to optimize for a scenario like the above? Currently this renders my application almost unusable. The only optimization I can think of would be dropping the index, inserting the rows, and re-creating the index. But it would require some smart logic about when to do it (for just 3 inserts it would make things slower instead of faster), and it feels more like a work-around than a solution. Please help me out, thanks in advance!
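The drop-and-recreate work-around mentioned above can be sketched like this, via Python's sqlite3 module. The table and index names and the batch-size cutoff are invented; whether the cutoff actually pays off has to be measured per workload.

```python
import sqlite3

def bulk_insert(con, rows, threshold=1000):
    """Insert a batch; for large batches, drop the index first and rebuild it after."""
    big = len(rows) >= threshold  # hypothetical cutoff - measure for real workloads
    if big:
        con.execute("DROP INDEX IF EXISTS samples_by_value")
    with con:  # one transaction for the whole batch
        con.executemany("INSERT INTO samples(value) VALUES (?)", rows)
    if big:
        # Rebuilding scans the table once; cheaper than N random b-tree inserts.
        con.execute("CREATE INDEX samples_by_value ON samples(value)")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE samples(value TEXT)")
con.execute("CREATE INDEX samples_by_value ON samples(value)")
bulk_insert(con, [("row%06d" % i,) for i in range(2000)])
print(con.execute("SELECT COUNT(*) FROM samples").fetchone()[0])
```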
Re: [sqlite] Slow INDEX
2011/11/2 Mr. Puneet Kishor punk.k...@gmail.com

Others will have better answers, but methinks that when you reboot the computer, the operating system's caches are flushed out, which slows the operation. Try working with the db for a bit (SELECT, repeat INSERTs, etc.) and notice if the speed increases again to what you expect.

The reason I reboot the PC for the test is that I want the caches flushed out, and I fully expect that to make things slower - but not by the degree (a factor of 300) I'm experiencing.
Re: [sqlite] Slow INDEX
2011/11/2 Mr. Puneet Kishor punk.k...@gmail.com

ahh, so you *are* getting expected behavior, just not what *you* expected. Did you have a different number in mind instead of a factor of 300? And, if so, why?

To read an (un-cached) 150 MB file from disk doesn't take 30 seconds. And SQLite shouldn't have to read the whole file (because of the INDEX), only the relevant rows in the INDEX, so theoretically it should even be faster than reading the whole file. Because Windows has an aggressive caching technique (it reads much larger blocks from disk, even if you request only 10 bytes from the file), I'm not expecting SQLite to be faster than reading the whole file, but that it's so much slower just surprises me. Maybe there is a very simple explanation, I just can't think of any.
Re: [sqlite] Slow INDEX
2011/11/2 Paul Corke paul.co...@datatote.co.uk

A stateful antivirus that does lots of heavy processing when you first open the file?

I'm running these tests on a completely clean Win7 install, without any anti-virus (or other software) installed. I should note that it runs virtualized (in VirtualBox) so I can quickly switch back to the clean state every time, but if VirtualBox performance were the root cause of these performance issues, I would expect it to slow down the initial insert (1M rows) too.
Re: [sqlite] Slow INDEX
2011/11/2 Simon Slavin slav...@bigfraud.org So compare your 'restart-and-INSERT' test with one where you restart, log in, then do a few random operations for a minute: start your web browser and load a page. Open and close a few documents or control panels. Sit and do nothing for a minute. /Then/ carry on with the test. To rule out your suggestion of background processes slowing down the operation, or any VirtualBox performance issues, I rebooted, and immediately copied the DB file (125 MB) to another location, and it completed within 5 seconds. So it doesn't seem VirtualBox or background processes are to blame. In the past I already did some tests with waiting for 15 minutes, and it had zero effect. Thanks for the suggestion anyway! ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Slow INDEX
2011/11/2 Black, Michael (IS)

Then, when you restart, the file is on system disk and it is flushing each insert to system disk on the WAL file, slowing things down dramatically.

I do not use WAL, since I have turned 'journal_mode' off (to improve insert performance), and as far as I know WAL is only useful when you need to keep a journal? I also have 'synchronous' off, so SQLite shouldn't be waiting for any filesystem flushes. I hoped it was writing all the newly inserted rows to disk in a single operation as soon as I call 'commit'. But I observed it, and found out it's already writing the rows as soon as they are inserted, not batching them until I call commit. So that could be part of the problem. I don't have a real Windows machine at hand, so I will build one tomorrow, but if your expectations are correct, it will be even slower than inside a virtual machine, because it will do individual writes for the 1 million rows too, making performance even worse than it is now. Another thing is that I don't expect the slow performance to have anything to do with slow disk writes, only with disk reads. I know this because when I make the index UNIQUE and try to insert 10.000 duplicate rows (which are all ignored), it has the same bad performance, even though zero bytes are written to disk. So it points in the direction of the reads making it slow, not the writes.
Re: [sqlite] Slow INDEX
2011/11/2 Nico Williams n...@cryptonector.com

But that's NOT what SQLite3 is doing. SQLite3 is doing random I/O. And the OS sees the random I/O pattern and concludes it's better to not read the whole file in. So for those 10K inserts you pay - worst case - 10K I/Os. At ~12ms per random I/O (likely the seek time for your disks) you're talking 120s, so you're actually far from the worst case - even at 7ms seek time you're talking about twice the time you've seen in the worst case.

Linux will not read the whole file in, but Windows eventually does. The inserts go progressively faster once they reach the halfway point, and Windows reads very large pages from disk, even if you request only 10 bytes. So in reality a very large percentage of these 10K I/Os will come from a buffer (either Windows' or your hard drive's), and will not result in any physical reads from disk. Of course you're right that these random reads will be slower than a sequential file copy, because they are random, not large continuous blocks.

Actually, it might be nice if SQLite3 had a function or open flag by which to request that the whole thing be read into memory, because the OS certainly won't know to do it.

I completely agree, because all the current methods (copy the file to 'null', etc.) didn't work well.
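Short of such a flag, one user-space approximation is to read the database file sequentially before opening it, so the OS page cache is warm when SQLite starts issuing random reads. A sketch (only helps while the file fits in free RAM, and it populates the OS cache, not SQLite's own page cache):

```python
import os
import sqlite3
import tempfile

def warm_cache(path, chunk=1 << 20):
    # Sequential read of the whole file; the OS caches the pages as a side effect.
    with open(path, "rb") as f:
        while f.read(chunk):
            pass

# Tiny demonstration database.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])
con.commit()
con.close()

warm_cache(path)  # subsequent random reads should mostly hit the OS cache
con = sqlite3.connect(path)
count = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)
```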
Re: [sqlite] Slow INDEX
2011/11/2 Nico Williams n...@cryptonector.com

Incidentally, it pays to make the SQLite3 page size match the filesystem's preferred block size.

I already have the page_size set to 4096, which should match the default NTFS cluster size.

But note that this can still fail you when the file is larger than available RAM. In that case such a flag would be very bad. And SQLite3 can't know how much RAM is available. The OS can know (sort of) and the user can know, but SQLite3 can't. So I take the above back - such a flag would probably result in posts about how SQLite3 startup causes thrashing...

If the flag respected the cache_size pragma, it could work very well. I currently set the cache_size to 300 MB (72k x 4096-byte pages), and the database size on disk is only 125 MB, so it would fit in perfectly. People who never changed the default cache_size would never experience any thrashing, because their database would not be pre-cached, since it doesn't fit into the cache.
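The two pragmas discussed above fit together like this (a sketch: cache_size counts pages, not bytes, so 72500 pages of 4096 bytes is roughly 283 MiB, and page_size has no effect on an already-populated database until a VACUUM):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# page_size must be set before the database is populated (or be followed by VACUUM).
con.execute("PRAGMA page_size = 4096")
# cache_size is measured in pages: 72500 * 4096 bytes is roughly 283 MiB.
con.execute("PRAGMA cache_size = 72500")
con.execute("CREATE TABLE t(x)")

page = con.execute("PRAGMA page_size").fetchone()[0]
cache = con.execute("PRAGMA cache_size").fetchone()[0]
print(page, cache)
```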
Re: [sqlite] Slow inserts with UNIQUE
2011/10/30 Black, Michael (IS) michael.bla...@ngc.com

#1 What version? 3.7.8, using the System.Data.SQLite wrapper
#2 How long to insert the 1M? 10 seconds
#3 What's the average string size? 55 characters
#5 How long to create the index? 10 seconds
#6 How long to insert the next 10,000? 34 seconds. But... only 1 second if I do it immediately after filling the database, so it seems to be related to whether the file is in the filesystem cache or not. Server apps will always have the file cached, but this is a desktop app, where the db will not be in the filesystem cache most of the time.

One more thing... show us the EXPLAIN of your insert. It sounds like your insert is not using the index for some reason (bug in sqlite?).

addr  opcode        p1  p2     p3  p4                 p5
0     Trace         0   0      0                      00
1     Goto          0   18     0                      00
2     OpenWrite     0   2      0   1                  00
3     OpenWrite     1   14345  0   keyinfo(1,BINARY)  00
4     NewRowid      0   2      0                      00
5     String8       0   3      0   test               00
6     SCopy         3   4      0                      00
7     SCopy         2   5      0                      00
8     MakeRecord    4   2      1   ab                 00
9     SCopy         2   6      0                      00
10    IsUnique      1   12     6   4                  00
11    Goto          0   15     0                      00
12    IdxInsert     1   1      0                      10
13    MakeRecord    3   1      6   a                  00
14    Insert        0   6      2   table              1b
15    Close         0   0      0                      00
16    Close         1   0      0                      00
17    Halt          0   0      0                      00
18    Transaction   0   1      0                      00
19    VerifyCookie  0   2      0                      00
20    TableLock     0   2      1   table              00
21    Goto          0   2      0                      00

The resulting database is about 125 MB. So 34 seconds seems way too long; even if the whole db file had to be read from disk and stored into memory, it shouldn't take that long.

Pragmas used:
PRAGMA page_size = 4096;
PRAGMA synchronous = OFF;
PRAGMA journal_mode = OFF;
PRAGMA temp_store = MEMORY;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA cache_size = 72500;

If you need any more info, let me know!
[sqlite] Slow inserts with UNIQUE
I have a table with one TEXT column. I insert 1 million rows of short strings, and then create a UNIQUE INDEX. The speed is very acceptable. Then I insert another 10.000 short strings, and the performance is very bad: it almost takes longer than inserting the initial million(!) rows that fill the table. Is this by design, or what could be causing this? I also observed that inserting the additional 10.000 rows goes progressively faster when nearing the end - I suppose because at that point the index is completely in cache. But what is the reason for the bad performance of these inserts? I read things about b-tree page splitting; could it be related?
Re: [sqlite] Slow inserts with UNIQUE
2011/10/29 Simon Slavin slav...@bigfraud.org When you insert the 10,000 strings are you doing it inside a transaction ? BEGIN TRANSACTION; INSERT ... INSERT ... INSERT ... COMMIT; Yes, I use transactions, prepared statements, cache_size, journal_mode and synchronous PRAGMA's, almost everything you can think of to make it faster. But most important: I use the exact same code to fill the initial million rows, so if there was anything wrong, the initial filling would be slow too I suppose? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] UNIQUE constraint
I have a column with a normal INDEX, and I would like to turn it into a UNIQUE index, but I'm a bit worried about the performance implications for inserts. Can someone give some insight into how UNIQUE is implemented in SQLite? Does it create extra tables compared to a normal index; are there many extra checks? And a related question: I need to insert 1 million rows, and currently I create the INDEX afterwards, because that should be faster. I cannot create the UNIQUE INDEX afterwards, because there's a slight possibility there will be 1 or 2 duplicates, which would make the creation fail. I now have the choice of specifying UNIQUE upfront (in the TABLE definition) or manually filtering out any doubles before the insert (which will also take CPU time). Would there be any advantage to doing this manually, or will SQLite do it just as efficiently?
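One alternative to filtering duplicates by hand (a sketch, not necessarily the fastest option for this workload): declare UNIQUE up front and let INSERT OR IGNORE skip the handful of duplicates, so there is no index creation that can fail afterwards. Table and column names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(s TEXT UNIQUE)")  # UNIQUE declared upfront

rows = [("a",), ("b",), ("a",), ("c",)]  # contains one duplicate
with con:  # single transaction for the batch
    # OR IGNORE silently drops rows that would violate the UNIQUE constraint.
    con.executemany("INSERT OR IGNORE INTO t(s) VALUES (?)", rows)

n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(n)  # prints 3
```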
[sqlite] Database file structure
I have two tables, both containing 1 million rows, which frequently need to be joined by rowid. Right now, the insert loop is like this:

For I = 1 To 1000
  INSERT INTO TABLE1 ...
  INSERT INTO TABLE2 ...
Next

When I look at the structure of the created database file, the rows of the two tables are in an alternating pattern. At first I thought this was a good sign, because when two rows need to be joined, they are very close to each other on disk. Another way to do the inserts would be:

For I = 1 To 1000
  INSERT INTO TABLE1 ...
Next
For I = 1 To 1000
  INSERT INTO TABLE2 ...
Next

Now the actual data of the two rows are not close to each other on disk, but the structure looks very clean and not fragmented. Are there any significant performance differences to be expected when choosing the first method vs the second? I guess that with the first method JOINs will be faster (because the data is close), but SELECTs on a single table will be slower (because the rows are scattered around the file) - but I don't know enough about the internals of SQLite to know if that's true. So what is generally preferable, or doesn't it make any difference?
Re: [sqlite] UNIQUE constraint
No, a UNIQUE index and a regular index are implemented the exact same way. It's just that, at INSERT and UPDATE time, after finding the proper place to insert the new value, an additional check is made that the place isn't already occupied.

So the only overhead for UNIQUE is that extra check?

I suspect the way you are going to manually filter duplicates will involve inserting them into a moral equivalent of a UNIQUE index. The performance is probably going to be similar to that of just creating a UNIQUE index up front. But only measurements with your particular data can tell for sure.

I was planning on using the HashList from the .Net framework, which should be the fastest way to do it. If SQLite only has to check whether a certain position is occupied, it should outperform creating a hash for each value. What got me worried was the post "Slow insertion for Unique Text Column" on this mailing list; I quote:

I was creating a new table and populating it with 100,000 rows of data (as a test case; I really wanted to populate it with over a million rows).
[Insertion A] When a Text Column was NOT Unique it would take: 8875 ms = ~9 seconds
[Insertion B] When a Text Column was Unique it would take: 155781 ms = ~156 seconds

The difference between 9 and 156 seconds is too large for me. The original author solved this by pre-sorting the data for the indexed column, which brought the 156 seconds down to 12 (including the pre-sort), which is very acceptable. So why does SQLite not pre-sort the data itself when doing such large batch inserts inside a transaction?
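The pre-sorting trick from the quoted post can be sketched as follows (toy sizes, invented names; the point is that sorted keys land at the right-hand edge of the index b-tree, turning random page writes into near-sequential appends):

```python
import sqlite3

def insert_sorted(con, values):
    # Sorting costs O(n log n) in application memory, but the UNIQUE index
    # then grows by appends instead of inserts scattered across the b-tree.
    con.executemany("INSERT INTO t(s) VALUES (?)", ((v,) for v in sorted(values)))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(s TEXT)")
con.execute("CREATE UNIQUE INDEX t_s ON t(s)")

with con:  # one transaction for the whole batch
    insert_sorted(con, ["x%05d" % i for i in range(10000, 0, -1)])

n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(n)  # prints 10000
```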
Re: [sqlite] Database file structure
2011/10/23 Simon Slavin slav...@bigfraud.org

My immediate question is why this is two rows in two separate tables rather than one row in one table. After all, if tables always have the same rows in, they might as well be the same row in one table.

I would love to have those rows in a single table, because those joins slow down the queries, but I have a mix of TEXT and INTEGER columns, and I had to move the TEXT columns to a different table (an FTS virtual table), and I could not move the INTEGERs too, because FTS doesn't support them.

So we could ask you for your OS and disk format. But even then the answer will be useful only for your exact current setup. The next time you get an OS update things might change. Manufacturers tweak this stuff all the time.

It's Windows/NTFS, but I get the point.
Re: [sqlite] UNIQUE constraint
2011/10/23 Simon Slavin slav...@bigfraud.org In this example, the indexed column is a text column. The text fields could have been very long, and checking long text fields for uniqueness can involve comparing every byte. Nevertheless, I do not understand the results you quoted. I wonder whether some use of transactions would have vastly reduced the problem. In my case, the indexed column is a text column too, and the original author already stated he batched the inserts into transactions, and made use of PRAGMA statements. Only after pre-sorting the data, the problem disappeared, but I'd rather avoid having to pre-sort everything (because SQLite's C code should be able to do it faster, than my C# code). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Database file structure
2011/10/23 Simon Slavin slav...@bigfraud.org In that case, try defragging your file sometime. May make a big difference. If you mean Windows defrag, it would be pointless, since it doesn't change the database structure? If you mean VACUUM, it will generate the exact same structure as 'method 2', so I could better use that method in the first place? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Stand-Alone INDEX
I have a very simple table: it contains one single (text) column with an index. This table contains millions of hash values, and because of the index, SQLite stores all the data twice: behind the scenes it creates a second table containing all the same values in sorted order, causing the database size to double. Because all the data I need is also in this second index table, is there some way to get rid of my original table and still be able to insert new items? My initial thought was to change the schema of my table so that it only has a TEXT PRIMARY KEY and no other columns, but SQLite internally still creates an INTEGER rowid, so the end result made no difference. So is there some way to have a 'stand-alone index', which doesn't store everything twice?
[sqlite] FTS vs INDEX
Did anyone do some benchmarks how the insert-speed of FTS compares to a TEXT INDEX column? I don't need many of the extra features of FTS, because I always need to look up rows by prefix or exact match, and both can be implemented efficiently via TEXT INDEX too. But if the overhead is comparable, I'd rather use FTS. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
Very interesting benchmarks! However, they seem to focus mainly on the speed of SELECT queries, and the total size of the resulting database on disk. But my main concern is the speed of INSERT queries vs normal tables. Any chance you compared that too?

2011/10/19 Black, Michael (IS) michael.bla...@ngc.com

I recently benchmarked this... FTS4 has a prefix option that can make it slightly faster than TEXT. Other than that it's about the same speed. http://www.mail-archive.com/sqlite-users@sqlite.org/msg64591.html The older part of the thread has the benchmark data.

Michael D. Black, Senior Scientist, Advanced Analytics Directorate, Advanced GEOINT Solutions Operating Unit, Northrop Grumman Information Systems
Re: [sqlite] FTS vs INDEX
2011/10/19 Alexey Pechnikov pechni...@mobigroup.ru FTS use index multi-tree and de-facto has _no_ insert speed degradation. Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that same multi-tree mechanism for regular indexes, but that's a whole different question. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] FTS vs INDEX
2011/10/19 Scott Hess sh...@google.com

To be clear, how it works is that new insertions are batched into a new index tree, with index trees periodically aggregated to keep selection efficient and to keep the size contained. So while the speed per insert should remain pretty constant, periodically an insert will require index maintenance, so that insert will be slower. If you have a lot of documents (or a small page cache) these maintenance events can get pretty expensive relative to the cost of a non-maintenance insert. So it's not a clear-cut win, but it probably would be interesting as an alternative sort of index for some tables.

I always do inserts in batches of 100.000 rows, and after each batch I manually merge the b-trees using:

INSERT INTO table(table) VALUES('optimize');

Is there a possibility that it will do automatic maintenance halfway through a batch? Or will it always wait until the transaction is finished?
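The batch-then-merge pattern described above looks roughly like this (a sketch; the table name is invented, and it assumes FTS4 support in the SQLite build):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE pages USING fts4(body)")

batch = [("word%d here" % i,) for i in range(1000)]
with con:  # one transaction per batch
    con.executemany("INSERT INTO pages(body) VALUES (?)", batch)

# After the batch commits, merge the small index b-trees into one.
con.execute("INSERT INTO pages(pages) VALUES('optimize')")

hits = con.execute("SELECT COUNT(*) FROM pages WHERE pages MATCH 'here'").fetchone()[0]
print(hits)
```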
[sqlite] FTS: Reduce tokens
Using the default tokenizer, everything that is not an alphanumeric character or an underscore, will generate a new token. I have a lot of columns that contains e-mail addresses or URL's, and most of them have characters like '.', '@' and '/'. Is there a simple way to make FTS see them as one single token, instead of splitting those strings into many small ones? I know it's possible to develop a custom tokenizer, but that's way over my head I'm afraid :) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Fast JOIN
I'm working on a pagination system where two tables need to be joined. At first my query looked like this:

SELECT table1.data1, table2.data2
FROM table1
JOIN table2 ON table1.rowid = table2.rowid
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250 OFFSET 0

I got very good advice on this mailing list to change the query into this:

SELECT table1.data1, table2.data2
FROM (
  SELECT table1.data1 FROM table1
  WHERE table1.data1 = 10
  ORDER BY table1.rowid DESC
  LIMIT 250 OFFSET 0
) AS table1
JOIN table2 ON table1.rowid = table2.rowid

This returns exactly the same results, but the performance is much better, because it only has to JOIN the rows that match the WHERE clause. Soon the first problem appeared: in some cases the WHERE clause was on a column from table2 (for example WHERE table2.data2 = 10), for which the above query wouldn't work. I changed my code so that when it detects the WHERE is on table2, it uses this query:

SELECT table1.data1, table2.data2
FROM (
  SELECT table2.data2 FROM table2
  WHERE table2.data2 = 10
  ORDER BY table2.rowid DESC
  LIMIT 250 OFFSET 0
) AS table2
JOIN table1 ON table2.rowid = table1.rowid

which is nothing more than the exact reverse of the previous query. When my code detects that the WHERE needs BOTH table1 and table2, it falls back to the old (slow) JOIN. So far, so good. Then the second problem appeared: the user is free to sort on any column of the output, so you can have the situation where the WHERE is on table1 but the ORDER BY is on table2. At first I wanted to solve this by using the old (slow) JOIN for such cases, but that turned out to be unworkable: even if the query only returns 3 rows, it still spends precious seconds sorting all the other rows in the table. Is there any solution for this? Everything I tried completely breaks the pagination system, because it fully depends on having ORDER and OFFSET in the same sub-query.
Re: [sqlite] Fast JOIN
Assuming that is a typo, does the following not do what you want?

SELECT table1.data1, table2.data2
FROM (
  SELECT table1.rowid, table1.data1 FROM table1
  WHERE table1.data1 = 10
  ORDER BY table1.rowid DESC
  LIMIT 250 OFFSET 0
) AS table1
JOIN table2 ON table1.rowid = table2.rowid
ORDER BY table2.rowid;

No, because it only sorts the current page. When the user scrolls down using the scrollbar and the second page is fetched (LIMIT 250 OFFSET 250), it does not match up with the sorting of the previous page. Suppose it was sorted by a TEXT column of table2; it would look like this:

Page 1: aaa abb ddd
Page 2: bba cca dda

Maybe there is just no easy way to do what I want, I will take a good night's sleep about it ;)
Re: [sqlite] Fast JOIN
2011/10/19 Fabian fabianpi...@gmail.com

Maybe there is just no easy way to do what I want, I will take a good night's sleep about it ;)

I think the correct query would be:

SELECT table1.data1, table2.data2
FROM (
  SELECT table1.rowid, table1.data1 FROM table1
  WHERE table1.data1 = 10
) AS table1
JOIN table2 ON table1.rowid = table2.rowid
ORDER BY table2.data2 DESC
LIMIT 250 OFFSET 0

which should make sure that only the rows matching the WHERE are sorted (instead of all rows). Since I moved the TEXT columns out of my main table to an FTS table, I have all these complex and inefficient joins, because all data is divided between two tables with identical rowids. I hope some day FTS supports INTEGER columns, so I can merge the two back together and get rid of the joins :)
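That last query can be checked against toy tables (a sketch; table and column names follow the thread, the data is made up: odd rowids carry data1 = 10):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1(data1 INT)")
con.execute("CREATE TABLE table2(data2 TEXT)")
for i in range(1, 501):  # matching rowids in both tables
    con.execute("INSERT INTO table1(rowid, data1) VALUES (?, ?)",
                (i, 10 if i % 2 else 7))
    con.execute("INSERT INTO table2(rowid, data2) VALUES (?, ?)",
                (i, "name%03d" % (500 - i)))

# Only rows matching the WHERE enter the join; the sort runs on table2's column.
rows = con.execute("""
    SELECT t1.data1, table2.data2
    FROM (SELECT rowid, data1 FROM table1 WHERE data1 = 10) AS t1
    JOIN table2 ON t1.rowid = table2.rowid
    ORDER BY table2.data2 DESC
    LIMIT 250 OFFSET 0
""").fetchall()
print(len(rows), rows[0])
```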
Re: [sqlite] Limit COUNT
2011/10/16 Frank Missel i...@missel.sg

But it sounds a bit like Fabian both wants to have the total number of records available and at the same time limit the count.

No, I only want to have a capped total available. If I went with Simon's solution, I would have to read the rows for the first 100 pages (or whatever the cap is) into a temporary table just to show the first page. I don't need a cache for all those other pages, so that seems like a lot of overhead. I only want to know if there are 100 or fewer pages (without copying data around). Maybe COUNT() is also creating a temporary table behind the scenes; in that case the performance of Simon's solution would be comparable to what I have now, and I would have the advantage that I could re-use that table to show subsequent pages without reading from disk. But I always assumed COUNT() was faster than copying between tables - maybe I should just benchmark it.
[sqlite] Limit COUNT
How can you limit a count query? I tried:

SELECT COUNT(*) FROM table LIMIT 5000

but it ignores the LIMIT clause. I think the workaround would be counting the results of a sub-query, but I'm trying to understand what's wrong with the syntax above. The goal is to make the engine stop iterating after it reaches the upper limit, in order to save performance.
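The sub-query workaround mentioned above, shown next to the ignored-LIMIT version (a sketch with a toy table; LIMIT on an aggregate applies to its single output row, which is why it seems ignored):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(8000)])

# LIMIT applies to the aggregate's single output row, so the count is unaffected:
total = con.execute("SELECT COUNT(*) FROM t LIMIT 5000").fetchone()[0]
# Counting a limited sub-query caps the scan at 5000 rows:
capped = con.execute("SELECT COUNT(*) FROM (SELECT 1 FROM t LIMIT 5000)").fetchone()[0]
print(total, capped)  # prints 8000 5000
```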
Re: [sqlite] Limit COUNT
2011/10/16 Frank Missel i...@missel.sg

What do you want to attain with the count?

I want to allow users to paginate through a result set. The pages are retrieved through LIMIT/OFFSET, but to calculate the total number of pages, I have to execute a separate COUNT() query (without LIMIT) once. Because I'm basically executing the same query twice just to get a total count, I'm trying to optimize this. Restricting the maximum number of pages to 10 should improve performance, if there were some way to make COUNT() respect the LIMIT specified.
Re: [sqlite] Slow JOIN on ROWID
Exactly. I still don't have optimal performance in the query (although it's much better now), and it seems to be related to ORDER BY. When I execute:

SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50

it's very fast, but it gets much slower (10 times) when I add an ORDER BY clause, like rowid ASC or rowid DESC. I'm trying to understand why. It seems like SQLite is performing an actual sort behind the scenes, while I expected it to just iterate in reverse order (because rowid is always incremental), which should give performance comparable to the first query.
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Petite Abeille petite.abei...@gmail.com

Much? Really? I get broadly the same execution time for either variant:
0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows)
0|0|0|SCAN TABLE mail_header (~2192503 rows)

I get:

SELECT mail_header.rowid FROM mail_header WHERE data = 1 ORDER BY mail_header.rowid DESC LIMIT 250 OFFSET 142750
0 0 0 SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~46392 rows)

vs.

SELECT mail_header.rowid FROM mail_header WHERE data = 1 LIMIT 250 OFFSET 142750
0 0 0 SEARCH TABLE mail_header USING INDEX ididx (id=?) (~66275 rows)

So you seem to get the same speed in both queries, presumably because you have no index on the column in the WHERE clause, which makes my EXPLAIN differ from yours. But when the column is indexed, the ORDER BY makes the query much slower.
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Simon Slavin slav...@bigfraud.org

If you have an index on the data column then that's the index SQLite would use for that query. Once it has picked that index it no longer has access to the rowid index.

CREATE INDEX tdr ON table (data, rowid)

Thanks! That explains everything! Can I remove the index on 'data' if I create an index like that, or do I need to keep them both?
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Petite Abeille petite.abei...@gmail.com

Hurray! Now you must have the finest query ever to grace the intraweb! A true work of beauty :))

Here in Belgium we have a saying: Who doesn't honor the petite, is not worth the big ;)
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Dan Kennedy danielk1...@gmail.com

If SQLite cannot use an index for an ORDER BY in a SELECT query and has to do an external sort, the EXPLAIN QUERY PLAN output will have something like this in it:

0|0|0|USE TEMP B-TREE FOR ORDER BY

Since my EXPLAIN does not show this, what else could cause the large drop in performance when ordering by rowid?
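Dan's marker is easy to demonstrate: when the ORDER BY column isn't available in index order from the chosen index, the plan gains a USE TEMP B-TREE FOR ORDER BY line; ordering by rowid within an equality search does not, because index entries for a fixed key are already in rowid order. Sketch with an invented schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t(a INT, b INT);
    CREATE INDEX aidx ON t(a);
""")

def detail(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

sorted_plan = detail("SELECT * FROM t WHERE a = 1 ORDER BY b")
rowid_plan  = detail("SELECT * FROM t WHERE a = 1 ORDER BY rowid")

assert "TEMP B-TREE" in sorted_plan      # b is not in index order -> external sort
assert "TEMP B-TREE" not in rowid_plan   # entries for a=1 are already in rowid order
```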
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Dan Kennedy danielk1...@gmail.com

Good question. Can you enter the following commands into the shell tool and post the complete output (no QUERY PLAN this time):

.version
.schema
.explain
EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET 50;
EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;

sqlite> .version
SQLite 3.7.8 2011-09-19 14:49:19 3e0da808d2f5b4d12046e05980ca04578f581177
sqlite> .schema
CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT);
CREATE INDEX ididx ON table1(data);
sqlite> .explain
sqlite> EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET 50;
0|Trace|0|0|0||00
1|Noop|0|0|0||00
2|Integer|250|1|0||00
3|Integer|5|2|0||00
4|MustBeInt|2|0|0||00
5|IfPos|2|7|0||00
6|Integer|0|2|0||00
7|Add|1|2|3||00
8|IfPos|1|10|0||00
9|Integer|-1|3|0||00
10|Integer|1|4|0||00
11|Goto|0|36|0||00
12|OpenRead|0|2|0|12|00
13|Rewind|0|34|0||00
14|Column|0|1|5||00
15|Ne|4|33|5|collseq(BINARY)|6c
16|AddImm|2|-1|0||00
17|IfNeg|2|19|0||00
18|Goto|0|33|0||00
19|Rowid|0|7|0||00
20|Column|0|1|8||00
21|Column|0|2|9||00
22|Column|0|3|10||00
23|Column|0|4|11||00
24|Column|0|5|12||00
25|Column|0|6|13||00
26|Column|0|7|14||00
27|Column|0|8|15||00
28|Column|0|9|16||00
29|Column|0|10|17||00
30|Column|0|11|18||00
31|ResultRow|7|12|0||00
32|IfZero|1|34|-1||00
33|Next|0|14|0||01
34|Close|0|0|0||00
35|Halt|0|0|0||00
36|Transaction|0|0|0||00
37|VerifyCookie|0|10|0||00
38|TableLock|0|2|0|table1|00
39|Goto|0|12|0||00

EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;
0|Trace|0|0|0||00
1|Integer|250|1|0||00
2|Integer|5|2|0||00
3|MustBeInt|2|0|0||00
4|IfPos|2|6|0||00
5|Integer|0|2|0||00
6|Add|1|2|3||00
7|IfPos|1|9|0||00
8|Integer|-1|3|0||00
9|Integer|1|4|0||00
10|Goto|0|38|0||00
11|OpenRead|0|2|0|12|00
12|OpenRead|1|123405|0|keyinfo(1,BINARY)|00
13|SeekGe|1|35|4|1|00
14|IdxGE|1|35|4|1|01
15|IdxRowid|1|5|0||00
16|Seek|0|5|0||00
17|AddImm|2|-1|0||00
18|IfNeg|2|20|0||00
19|Goto|0|34|0||00
20|IdxRowid|1|6|0||00
21|Column|1|0|7||00
22|Column|0|2|8||00
23|Column|0|3|9||00
24|Column|0|4|10||00
25|Column|0|5|11||00
26|Column|0|6|12||00
27|Column|0|7|13||00
28|Column|0|8|14||00
29|Column|0|9|15
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Dan Kennedy danielk1...@gmail.com

Your EXPLAIN output shows that it is doing a linear scan of table1, which is different from what I get here with the same schema and query. When I run them here, both queries (with and without the ORDER BY rowid) use the same query plan. Do you have an sqlite_stat1 table in the database (created by running ANALYZE)? How about the contents of the sqlite_stat1 table? What does the shell command .dump sqlite_stat1 show?

You are right, I issued an ANALYZE a couple of days ago; I had completely forgotten about that. Maybe that's what is hurting performance.

sqlite> .dump sqlite_stat1
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
ANALYZE sqlite_master;
INSERT INTO sqlite_stat1 VALUES('table1','ididx','463923 66275');
COMMIT;
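For reference, the statistics live in the sqlite_stat1 table that ANALYZE populates: each row holds a table name, an index name, and a stat string (total rows, then average rows per distinct index value) that steers the query planner. A sketch of inspecting it (schema invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT)")
con.execute("CREATE INDEX ididx ON table1(data)")
con.executemany("INSERT INTO table1(data) VALUES (?)",
                [(i % 3,) for i in range(300)])   # 300 rows, 3 distinct values
con.execute("ANALYZE")

stats = list(con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"))
print(stats)   # e.g. [('table1', 'ididx', '300 100')]
assert stats[0][0] == "table1" and stats[0][1] == "ididx"
```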
Re: [sqlite] Slow JOIN on ROWID
Do you have an sqlite_stat1 table in the database (created by running ANALYZE)? How about the contents of the sqlite_stat1 table? What does the shell command .dump sqlite_stat1 show?

This is the output with a fresh database, where ANALYZE hasn't been run:

EXPLAIN SELECT * FROM table1 WHERE data=1 LIMIT 250 OFFSET 5;
0|Trace|0|0|0||00|
1|Integer|250|1|0||00|
2|Integer|5|2|0||00|
3|MustBeInt|2|0|0||00|
4|IfPos|2|6|0||00|
5|Integer|0|2|0||00|
6|Add|1|2|3||00|
7|IfPos|1|9|0||00|
8|Integer|-1|3|0||00|
9|Integer|1|4|0||00|
10|Goto|0|39|0||00|
11|OpenRead|0|2|0|13|00|
12|OpenRead|1|31079|0|keyinfo(1,BINARY)|00|
13|SeekGe|1|36|4|1|00|
14|IdxGE|1|36|4|1|01|
15|IdxRowid|1|5|0||00|
16|Seek|0|5|0||00|
17|AddImm|2|-1|0||00|
18|IfNeg|2|20|0||00|
19|Goto|0|35|0||00|
20|IdxRowid|1|6|0||00|
21|Column|1|0|7||00|
22|Column|0|2|8||00|
23|Column|0|3|9||00|
24|Column|0|4|10||00|
25|Column|0|5|11||00|
26|Column|0|6|12||00|
27|Column|0|7|13||00|
28|Column|0|8|14||00|
29|Column|0|9|15||00|
30|Column|0|10|16||00|
31|Column|0|11|17||00|
32|Column|0|12|18||00|
33|ResultRow|6|13|0||00|
34|IfZero|1|36|-1||00|
35|Next|1|14|0||00|
36|Close|0|0|0||00|
37|Close|1|0|0||00|
38|Halt|0|0|0||00|
39|Transaction|0|0|0||00|
40|VerifyCookie|0|10|0||00|
41|TableLock|0|2|0|table1|00|
42|Goto|0|11|0||00|

EXPLAIN SELECT * FROM table1 WHERE data=1 ORDER BY rowid DESC LIMIT 250 OFFSET 5;
0|Trace|0|0|0||00|
1|Noop|0|0|0||00|
2|Integer|250|1|0||00|
3|Integer|5|2|0||00|
4|MustBeInt|2|0|0||00|
5|IfPos|2|7|0||00|
6|Integer|0|2|0||00|
7|Add|1|2|3||00|
8|IfPos|1|10|0||00|
9|Integer|-1|3|0||00|
10|Integer|1|4|0||00|
11|Goto|0|40|0||00|
12|OpenRead|0|2|0|13|00|
13|OpenRead|2|31079|0|keyinfo(1,BINARY)|00|
14|SeekLe|2|37|4|1|00|
15|IdxLT|2|37|4|1|00|
16|IdxRowid|2|5|0||00|
17|Seek|0|5|0||00|
18|AddImm|2|-1|0||00|
19|IfNeg|2|21|0||00|
20|Goto|0|36|0||00|
21|IdxRowid|2|6|0||00|
22|Column|2|0|7||00|
23|Column|0|2|8||00|
24|Column|0|3|9||00|
25|Column|0|4|10||00|
26|Column|0|5|11||00|
27|Column|0|6|12||00|
28|Column|0|7|13||00|
29|Column|0|8|14||00|
30|Column|0|9|15||00|
31|Column|0|10|16||00|
32|Column|0|11|17||00|
33|Column|0|12|18||00|
34|ResultRow|6|13|0||00|
35|IfZero|1|37|-1||00|
36|Prev|2|15|0||00|
37|Close|0|0|0||00|
38|Close|2|0|0||00|
39|Halt|0|0|0||00|
40|Transaction|0|0|0||00|
41|VerifyCookie|0|10|0||00|
42|TableLock|0|2|0|table1|00|
43|Goto|0|12|0||00|

Is this the expected output? If so, ANALYZE was to blame. The query containing ORDER BY rowid DESC is still slower than the one without any ordering, but the results are closer to each other now.
Re: [sqlite] Slow JOIN on ROWID
2011/10/14 Simon Slavin slav...@bigfraud.org

So that should never happen, right? ANALYZE is meant to make things faster, not slower. So is that an actual fixable bug, or is it one of those extremely unlikely situations that is hard to fix? So, OP, try issuing these commands:

DROP TABLE sqlite_stat1;
DROP TABLE sqlite_stat2;

then see whether anything improves.

I already tried that, but it returned: Error: table sqlite_stat1 may not be dropped
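The stat tables can't be dropped on that version, but their rows can be deleted, which discards the statistics; note that already-open connections may keep the old statistics cached until the schema is reloaded or the database is reopened. A sketch (schema invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, data INT)")
con.execute("CREATE INDEX didx ON t(data)")
con.executemany("INSERT INTO t(data) VALUES (?)", [(i,) for i in range(100)])
con.execute("ANALYZE")

before = con.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]
con.execute("DELETE FROM sqlite_stat1")   # allowed even where DROP TABLE is refused
after = con.execute("SELECT count(*) FROM sqlite_stat1").fetchone()[0]

assert before >= 1 and after == 0
```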
Re: [sqlite] Slow JOIN on ROWID
2011/10/12 Petite Abeille petite.abei...@gmail.com

Now the join is performed only 250 times, adding just a small overhead compared to the bare-bones query without the join. The short of it: minimize the amount of work upfront :)

Thank you very much! This approach solved the problem. However, in my situation I need to select many more columns than just 'id' from 'mail_header', and when I look at the resulting query it appears to select all those columns twice. Would it be smart to change the query so that the inner loop selects only mail_header.id, and to add a third join that fetches the extra columns from 'mail_header'? Or would the performance penalty of the extra join outweigh the advantage of selecting fewer columns in the inner loop?
[sqlite] FTS4: OFFSET Performance
I have two tables: one normal table (table1) and one FTS4 table (table2). The ROWIDs between the tables are in sync. I often run queries on table1, but every row returned must also include a value from table2 (data2), so I create a query like this:

SELECT table1.data1, table2.data2 FROM table1 JOIN table2 ON table1.rowid=table2.rowid WHERE table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 0

This query returns the results as expected, and performs well. But as soon as I raise the OFFSET to a large value (for pagination), performance drops drastically. After a long search through the documentation, I found the reason: selecting table2.data2 causes FTS4 to load the entire column from disk, without taking the LIMIT/OFFSET clauses into account. I have a related query, where I do a MATCH on table2.data2:

SELECT table1.data1, table2.data2 FROM table2 JOIN table1 ON table2.rowid=table1.rowid WHERE table2.data2 MATCH 'value' AND table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 0

Here the problem is even worse. When MATCH 'value' returns a lot of results, the query is very slow (even with OFFSET 0) because it ignores the fact that it only has to return the first 250 results. How would I optimize the above queries to take advantage of the LIMIT/OFFSET values, making them suitable for fast pagination?
[sqlite] FTS4: Use Cases
I have three types of columns in my table which I want to move to an FTS4 table, but I'm not sure whether that would make sense.

1.) This column contains hash values. I always need to look up 100% exact matches. Normally you would store them with 'TEXT INDEX' in a standard table. Would there be any advantage to moving this column to FTS? Or would it perform worse than 'TEXT INDEX'?

2.) This column also contains hash values, but I always need to look them up by prefix (LIKE 'start%'), never by exact match. The size of the prefix is unknown in advance, so I can't use the FTS4 optimization for fixed-size prefix searches. Is there any advantage over a traditional LIKE query? I suspect FTS only indexes full words, which would mean no advantage.

3.) This column contains a list of short tags separated by spaces, for example 'v45 s12 h65', which I currently look up using LIKE '%tag%' to search for a certain combination of tags. In a good database design these tags would be stored in a second table, referencing the 'rowid' of the first table. But every row can have many tags, and storing them in normalized form would make the INSERTs very slow, because I'd have to insert an extra row for each tag instead of just one row. Since my app does many more inserts than selects, I chose to store them in a single column. Would there be any advantage to storing these tags in an FTS column, so I can do quick lookups? One concern is that they're not unique enough (much like stopwords), eventually bloating the index. Or is the only right option to create a second table with a TEXT INDEX column on these tags?
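On point 2, a prefix lookup of unknown length doesn't need FTS: a half-open range scan on an ordinary index does the same job as LIKE 'start%' for plain ASCII keys such as hex hashes, and both bounds can use the index. Sketch (table and data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hashes(h TEXT)")
con.execute("CREATE INDEX hidx ON hashes(h)")
con.executemany("INSERT INTO hashes VALUES (?)",
                [("abc123",), ("abd999",), ("abe000",), ("xyz777",)])

prefix = "ab"
# [prefix, prefix + U+FFFF) covers every string starting with the prefix;
# the comparison operators are indexable, so no full scan, any prefix length.
rows = [r[0] for r in con.execute(
    "SELECT h FROM hashes WHERE h >= ? AND h < ? ORDER BY h",
    (prefix, prefix + "\uffff"))]

assert rows == ["abc123", "abd999", "abe000"]
```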
Re: [sqlite] FTS4: OFFSET Performance
How would I optimize the above queries to take advantage of the LIMIT/OFFSET values, making them suitable for fast pagination? Are you sure the issue is with the fts table? Or is it the ORDER BY ... DESC?

If I omit the FTS table in the query, like so:

SELECT table1.data1 FROM table1 WHERE table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 0

the query always executes fast, independent of how large OFFSET is. The only difference is that I don't include 'table2.data2' in the output, which leads me to think the issue is related to FTS, or is there another possibility? And in the FTS docs I read: Because of the way SQLite's virtual table interface works, retrieving the value of the title column requires loading the entire row from disk (including the content field, which may be quite large). This means that if the user's query matches several thousand documents, many megabytes of title and content data may be loaded from disk into memory even though they will never be used for any purpose. Which seems to confirm that the query is so slow because I include the value of 'table2.data2' in the output?
Re: [sqlite] FTS4: OFFSET Performance
2011/10/12 Petite Abeille petite.abei...@gmail.com

In any case, you can easily refactor the query into two steps: (1) first get your pagination, (2) then join to your fts.

I tried implementing your refactoring, but immediately ran into a problem. The user must be able to sort the data on any of the columns returned. So 'ORDER BY table1.rowid DESC' is really a variable, which could just as easily be 'ORDER BY table2.data2 DESC', in which case the query fails, because data2 is not selected in the inner query. And I cannot move the ORDER BY into the outer query, because then it would only sort the 250 results returned, not the underlying data. So the question remains: why does increasing OFFSET kill performance when selecting table2.data2 in addition to the columns of table1?
Re: [sqlite] FTS4: OFFSET Performance
2011/10/12 Dan Kennedy danielk1...@gmail.com

Are you using 3.7.7 or newer? And specifying the order=DESC option when creating the FTS4 table? If not, it might be a big help in this case. http://www.sqlite.org/fts3.html#fts4order

Yes, I'm using 3.7.7, and I already considered specifying order=DESC upon creation, but I suspect it would just reverse the problem: slow queries with OFFSET=0 and fast queries with a large OFFSET, which would be no solution.
Re: [sqlite] OFFSET Performance
2011/10/12 Igor Tandetnik itandet...@mvps.org

See if this helps: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

Thanks! Very interesting! I was already aware that using a large OFFSET can be slow, because SQLite internally reads all preceding rows and just discards them. But I do my offsets exclusively on rowid, and there appears to be some optimization taking place, since I can specify very large offsets without a change in performance. So I assumed SQLite takes some shortcut when using rowid for OFFSET, and I was only wondering why this shortcut is disabled when joining an FTS table. I'm going to implement the method described in the above article and see if it makes any difference.
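The scrolling-cursor pattern from the article, in sketch form (schema invented): remember the last rowid of the page just shown and seek past it, instead of letting OFFSET read and discard all the preceding rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, data INT)")
con.executemany("INSERT INTO t(id, data) VALUES (?, 10)",
                [(i,) for i in range(1, 1001)])

def page_after(last_rowid, size=250):
    # Seek directly below the previous page's last rowid; no OFFSET needed.
    return [r[0] for r in con.execute(
        "SELECT id FROM t WHERE data = 10 AND id < ? ORDER BY id DESC LIMIT ?",
        (last_rowid, size))]

page1 = page_after(2**62)        # sentinel above any real rowid, for the first page
page2 = page_after(page1[-1])

assert page1[0] == 1000 and page1[-1] == 751
assert page2[0] == 750 and page2[-1] == 501
```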
Re: [sqlite] OFFSET Performance
2011/10/12 Igor Tandetnik itandet...@mvps.org

See if this helps: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor

I tried to implement the method suggested in the article, but it only works for pagination where the user may go one page back or one page forward (since you have to remember the last rowid). In my case, the user is allowed to jump to the last page without visiting any of the previous pages, making the suggested method impossible to implement.
Re: [sqlite] FTS4: OFFSET Performance
2011/10/12 Dan Kennedy danielk1...@gmail.com

Can we see the output of EXPLAIN for this query?

Without selecting table2.data2:
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)

Including table2.data2:
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)

So it looks like the whole problem is caused by the 'SCAN' on table2. Is there any reason why SQLite does this SCAN instead of a SEARCH? Since I JOIN the two tables on ROWID, why does it have to do a full table scan?
Re: [sqlite] FTS4: OFFSET Performance
2011/10/12 Dan Kennedy danielk1...@gmail.com

(INDEX 1) == lookup by rowid.

Okay! Then maybe my problem is unrelated to FTS. I will create a new question.
[sqlite] Slow JOIN on ROWID
Why is this very fast (20 ms):

SELECT table1.data1, table1.data2 FROM table1 WHERE table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 50
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)

and this very slow (3500 ms):

SELECT table1.data1, table2.data2 FROM table1 JOIN table2 ON table1.rowid=table2.rowid WHERE table1.data1 = 10 ORDER BY table1.rowid DESC LIMIT 250 OFFSET 50
0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)

The values in table1.data2 and table2.data2 are 100% identical, because my app has always duplicated all columns of table2 (the fts4 table) into table1 to work around this issue. But I would really like to solve it some day, because it doubles my database size.
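One way out of the duplication is the deferred join mentioned earlier in the thread: paginate on table1 alone, then pull the FTS column by rowid for only the 250 surviving rows, so FTS never loads the discarded rows. A sketch, assuming an SQLite build with FTS4 enabled (data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1(id INTEGER PRIMARY KEY, data1 INT)")
con.execute("CREATE INDEX data1idx ON table1(data1)")
con.execute("CREATE VIRTUAL TABLE table2 USING fts4(data2)")
with con:
    for i in range(1, 1001):
        con.execute("INSERT INTO table1(id, data1) VALUES (?, 10)", (i,))
        con.execute("INSERT INTO table2(rowid, data2) VALUES (?, ?)",
                    (i, "body %d" % i))

# The inner query does filtering, ordering, and paging on table1 only;
# the FTS table is then joined by rowid for just the surviving rows.
rows = list(con.execute("""
    SELECT p.id, table2.data2
    FROM (SELECT id FROM table1
          WHERE data1 = 10
          ORDER BY rowid DESC LIMIT 250 OFFSET 500) AS p
    JOIN table2 ON table2.rowid = p.id
"""))

assert len(rows) == 250
assert sorted(r[0] for r in rows) == list(range(251, 501))
```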
[sqlite] Exclusive transaction severely slows down the queries increasingly in time
Hi, I use SQLite 3.0.8 with Quicklite 1.5.0 as a wrapper for it on Mac OS X 10.3. I made a simple test which creates a table with 1 rows. The table contains an indexed column named 'Path'. This is the query I ran:

for i=0 to 1000 do
begin
  begin exclusive/immediate transaction
  select where Path=some path
  end transaction
end

If I use a simple beginTransaction, the for cycle ends in 8 seconds, which is a good time. The problem: if I use beginImmediateTransaction or beginExclusiveTransaction, the for cycle ends in 9 minutes. The first 10-20 queries run ok, but after that, the beginExclusive/ImmediateTransaction instructions eat more and more time. At the end, one single select lasts between 1/2 and 1 second, which IS VERY MUCH. If you'll argue that I should include the whole cycle in one transaction, the reason I don't do that is that my application executes some queries on the table from time to time. Please help, Fabian
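For what it's worth, plain SELECTs only need a read (shared) lock, so a single deferred transaction around the whole loop avoids taking and dropping a write lock on every iteration. A sketch of the batched approach in Python (the original used QuickLite/Pascal-style pseudocode; table and path names invented):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
con.execute("CREATE TABLE files(path TEXT)")
con.execute("CREATE INDEX pathidx ON files(path)")
con.executemany("INSERT INTO files VALUES (?)",
                [("/some/path/%d" % i,) for i in range(1000)])

# A deferred BEGIN is enough for reads: the shared lock is acquired once
# for all 1000 lookups instead of an exclusive lock per iteration.
con.execute("BEGIN")
hits = 0
for i in range(1000):
    if con.execute("SELECT 1 FROM files WHERE path = ?",
                   ("/some/path/%d" % i,)).fetchone():
        hits += 1
con.execute("COMMIT")

assert hits == 1000
```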