Re: [sqlite] Building multiple indexes in one scan over a table
> > > CREATE INDEX samples_by_time ON samples(time); > CREATE INDEX samples_by_name ON samples(name_id); > CREATE INDEX samples_by_type ON samples(type_id); > > Maybe this is dumb idea (sorry for that), but I'll try to write it down: What if to use an index for an index? You create N additional talbes as: CREATE TABLE index_field_N_table (index_field INT PRIMARY KEY, main_table_id INT); ... Now you can use those aux tables as indices, ie SELECT * FROM index_field_N_table LEFT JOIN main_table ON main_table.id = index_field_N_table.main_table_id WHERE index_field_N_table.index_field = ; OR if you need it just for sorting: SELECT * FROM index_field_N_table LEFT JOIN main_table ON main_table.id = index_field_N_table.main_table_id WHERE ORDER BY index_field_N_table.index_field; SqLite is very smart, so selects will work properly: sqlite> create temp table xxx (id int PRIMARY key, some_val int); sqlite> create temp table yyy (id int PRIMARY key, some_val int); sqlite> explain query plan SELECT * FROM yyy LEFT JOIN xxx ON xxx.id = yyy.some_val WHERE yyy.id = 1; selectidorder fromdetail -- -- -- 0 0 0 SEARCH TABLE yyy USING INDEX sqlite_autoindex_yyy_1 (id=?) (~1 rows) 0 1 1 SEARCH TABLE xxx USING INDEX sqlite_autoindex_xxx_1 (id=?) (~1 rows) sqlite> explain query plan SELECT * FROM yyy LEFT JOIN xxx ON xxx.id = yyy.some_val ORDER BY yyy.id ; selectidorder fromdetail -- -- -- - 0 0 0 SCAN TABLE yyy USING INDEX sqlite_autoindex_yyy_1 (~100 rows) 0 1 1 SEARCH TABLE xxx USING INDEX sqlite_autoindex_xxx_1 (id=?) (~1 ro AFAIK you can insert data into several tables simultaneously (we can call this an 'index creation'). If I'm wrong and SQLite does not support simultaneous inserts into separate tables, you can use separate database for each table. To make a creation of 'indices' faster, you can aggregate their data and make transaction batches much bigger. Or better yet, if possible memory-wise, dump the data into aux tables at the end Sorry, I did zero research and I'm not an expert in this field. I have no clue, how much bigger will your database become and how will it affect the performance. I'm not even sure if 'index creation' be any faster... Make your own research if you like. Just wanted to share what came to my crazy mind :) Good luck! Paul ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
Hi, > Von: Fabian Giesen > I'm working on a profiler that uses SQLite to store and index the > results. > > Due to the nature of the task, the workload ends up splitting into > essentially three phases: > > 1. Bulk insertion of relatively large amounts of data (low hundreds > of megabytes to tens of gigabytes); a typical profile generates on > the order of 10MB of sampling data per second. > > 2. Indexing. The first phase normally runs without any indexes on the > sample tables, to maximize insertion throughput. In the second phase > we create the indexes we need to retrieve the samples efficiently. > (We found "batch-indexing" after the fact to generally lead to > shorter overall processing times.) > > 3. Browse the results. Once the samples are indexed, the user can > look around, analyze the results using queries, and so forth. This is > essentially read-only. > > Phase 2 is crucial for quick retrieval, but at the same time it > essentially amounts to wait time for the user, so we'd like to get it > over with as quickly as possible. Now, looking at what happens in > there, I noticed that we often end up creating several indexes on the > same table back-to-back: > >CREATE INDEX samples_by_time ON samples(time); >CREATE INDEX samples_by_name ON samples(name_id); >CREATE INDEX samples_by_type ON samples(type_id); > > Each of these statements ends up scanning over the entire table once. > Since the tables in question are quite large, I would love to have > some way to create multiple indexes in one pass. A cursory > examination of the VDBE code produced for the CREATE INDEX statements > indicates that SQLite's backend should be capable of doing so (though > I'm no expert), but I don't see a way to express what I want in > standard SQL. > > So, is there any way to create multiple indexes for the same table in > one operation in SQLite? If not, how hard would it be to add? If one of your columns is unique, you could declare this column as primary key and use "WITHOUT ROWID". If the column is also of type int, you could use the "INTEGER PRIMARY KEY" optimization which has a similar effect. This saves the time of creating the index altogether, saves a few bytes for the row id column, and the table will be pre-ordered by said value. But be sure you read the caveats in the documentation before proceeding. https://www.sqlite.org/withoutrowid.html https://www.sqlite.org/lang_createtable.html#rowid I'm also not sure how this affects insertion time, but if the value is growing monotoneously, it should not differ too much from the standard auto-generated ROWID behavior which has the same properties. Best regards Markus Schaber CODESYS(r) a trademark of 3S-Smart Software Solutions GmbH Inspiring Automation Solutions 3S-Smart Software Solutions GmbH Dipl.-Inf. Markus Schaber | Product Development Core Technology Memminger Str. 151 | 87439 Kempten | Germany Tel. +49-831-54031-979 | Fax +49-831-54031-50 E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com CODESYS forum: http://forum.codesys.com Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
Fabian Giesen wrote: > Simon Slavin wrote: ... >> Also, store your time as a numeric value (Unix epoch or Julian day) rather >> than a string. Numbers are faster to handle than strings. INTEGERs are >> faster to handle than REALs. > > > :-) > > The time fields are 64-bit ints. The exact meaning depends on the > capture target platform (we support several) but usually it's either > a system-synchronized CPU clock cycle counter (when available) or > ticks for a high-frequency bus/event timer. It's worth adding that SQLite stores integers in variable length (from 1 to 9 bytes). Small integers are more compact to store and are thus more efficient (both in the table and in its indexes). So if you can store numbers relatively to a known minimum values, you can make records smaller. See: http://sqlite.org/src4/doc/trunk/www/varint.wiki Multiple indexes are independent from each other. So it would be nice if creation of multiple indexes could be done in parallel (one thread per index). That's assuming that creation of indexes is CPU bound and not I/O bound. Regards Dominique ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
On 11 Jun 2014, at 12:21am, Fabian Giesenwrote: > I just wanted > to ask to make sure there wasn't a way I was missing! Nothing obvious from your clear description. Good luck with it. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
On 6/10/2014 2:25 PM, Simon Slavin wrote: On 10 Jun 2014, at 7:22pm, Fabian Giesenwrote: 1. Bulk insertion of relatively large amounts of data (low hundreds of megabytes to tens of gigabytes); a typical profile generates on the order of 10MB of sampling data per second. I presume you do your INSERTs inside a transaction. If not, you should. Yes, it's all wrapped in transactions; typically on the order of 10-100 a second. We're talking (order of magnitude) about 100k rows inserted per second of capture here; it would be nowhere near fast enough if every INSERT was its own transaction. Each of these statements ends up scanning over the entire table once. Since the tables in question are quite large, I would love to have some way to create multiple indexes in one pass. Not possible in current SQLite. No standard syntax for doing it in SQL. A quick google suggests that no other popular DBMS implements it. I beg to differ. MySQL's InnoDB has it, for example: http://dev.mysql.com/doc/refman/5.5/en/innodb-create-index-examples.html And ... I'm not sure it would be that much faster. The majority of the work it's doing is writing. Reading the existing table data is fast and easy, but writing new data and figuring out the pointers takes the majority of the time. Because of this I don't think it likely that this feature will be added to standard SQLite. Of course, if you really care about this you could add it yourself. The majority of the work is sorting, as Richard correctly points out, not writing. The indexes in question are all on one or two integer keys so they don't end up particularly big. The average sample record comes out at ~100 bytes. All the records are integer values; anything more complicated associated with a sample is stored as unique records in separate tables and referenced via foreign keys. To optimize this, when you create the table put the columns you're going to index near the beginning of the rows. SQLite stops reading the data of a table row once it has all the columns it needs. I don't know whether this will make a lot of difference but it won't do any harm. When the dataset is small enough to fit in memory (this is on workstation PCs, so this means "less than a few gigabytes") we're CPU bound (doing sorting, mostly, as Richard correctly suggested). However, looking at the sorting time vs. number of records, sorting (in that scenario, for our indices) typically proceeds at well over 1M records/s (depends on the machine, of course; on my work PC it's closer to 1.4Mrecs/s). At that rate, in an out-of-core situation, we would need to be reading ~140MB/s of records, sustained, during index build time, plus the write bandwidth for the resulting index (a small fraction; the indexes on the large tables are all on one or two integer keys), to not become IO bound. That's possible with a good hard drive (and most SSDs), but it's closer than I'm fully comfortable with. The reason I'm asking is because we're seeing customers running into disproportionately slow indexing times at the end of long (~1h!) capture runs; since the dataset size is the only thing that's fundamentally different for longer runs, I started looking into it. I'm well aware this is an extreme case in various ways though. I just wanted to ask to make sure there wasn't a way I was missing! Also, store your time as a numeric value (Unix epoch or Julian day) rather than a string. Numbers are faster to handle than strings. INTEGERs are faster to handle than REALs. :-) The time fields are 64-bit ints. The exact meaning depends on the capture target platform (we support several) but usually it's either a system-synchronized CPU clock cycle counter (when available) or ticks for a high-frequency bus/event timer. -Fabian ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
On Tue, Jun 10, 2014 at 2:22 PM, Fabian Giesenwrote: > > So, is there any way to create multiple indexes for the same table in one > operation in SQLite? If not, how hard would it be to add? > Creating an index is essentially the same thing as sorting the whole table - sorting in index order. So you are going to have to sort the whole table three times, once for each index. The time to do the sort dominates. The time needed to scan the original table in order to pull out the elements is usually negligible compared to the sorting time. -- D. Richard Hipp d...@sqlite.org ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
On Tue, Jun 10, 2014 at 4:25 PM, Simon Slavinwrote: > On 10 Jun 2014, at 7:22pm, Fabian Giesen wrote: > Not possible in current SQLite. No standard syntax for doing it in SQL. A > quick google suggests that no other popular DBMS implements it. If the CREATE INDEX statements are in a transaction then the RDBMS could defer index creation until they are needed or a COMMIT is done, then it could coalesce their execution plans to do a single scan. Otherwise it can't be done at all indeed (since if not in a transaction it's as if each CREATE INDEX were in its own transaction, and that has to be durable (synchronous) when it completes. > And ... I'm not sure it would be that much faster. The majority of the work > it's doing is writing. Reading the existing table data is fast and easy, but > writing new data and figuring out the pointers takes the majority of the > time. Because of this I don't think it likely that this feature will be > added to standard SQLite. Of course, if you really care about this you could > add it yourself. How much faster it is depends on whether the table to be scanned fits in memory, how much memory pressure there is, and so on. So, yes, it might improve performance, but probably not that much because yeah, the limiting factor will be write bandwidth, and that will surely be less than read bandwidth, so at most we're talking about an improvement of 33%. If you were creating 10 indexes then it'd be at most 9/20s improvement. Anything that approaches a 50% improvement is not to be sneezed at, but given what a special case this is, I'd not blame anyone for focusing on something better. > To optimize this, when you create the table put the columns you're going to > index near the beginning of the rows. SQLite stops reading the data of a > table row once it has all the columns it needs. I don't know whether this > will make a lot of difference but it won't do any harm. Well, no, SQLite reads page-by-page. It may stop decoding data for columns past the last one it needs, but that's different. Also, let's look at the OP's indexes: CREATE INDEX samples_by_time ON samples(time); CREATE INDEX samples_by_name ON samples(name_id); CREATE INDEX samples_by_type ON samples(type_id); I'm betting that type_id is a low-cardinality column. It might be best to try the skip-scan index optimization in recent releases and go for just one or two indexes. Assuming these are something like a time series, and assuming skip-scan indexing, this might work: CREATE INDEX samples_idx ON samples (time, name_id, type_id); or maybe two: CREATE INDEX samples_idx ON samples (time); CREATE INDEX samples_idx ON samples (name_id, type_id); > Also, store your time as a numeric value (Unix epoch or Julian day) rather > than a string. Numbers are faster to handle than strings. INTEGERs are > faster to handle than REALs. I'm not so sure. Number parsing involves multiplications. This sort of thing needs to be benchmarked in each use case. Storing dates in ISO 8601 format has its benefits anyways. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Building multiple indexes in one scan over a table
On 10 Jun 2014, at 7:22pm, Fabian Giesenwrote: > 1. Bulk insertion of relatively large amounts of data (low hundreds of > megabytes to tens of gigabytes); a typical profile generates on the order of > 10MB of sampling data per second.CREATE INDEX samples_by_time ON > samples(time); I presume you do your INSERTs inside a transaction. If not, you should. > CREATE INDEX samples_by_name ON samples(time); > CREATE INDEX samples_by_name ON samples(name_id); > CREATE INDEX samples_by_type ON samples(type_id); > > Each of these statements ends up scanning over the entire table once. Since > the tables in question are quite large, I would love to have some way to > create multiple indexes in one pass. Not possible in current SQLite. No standard syntax for doing it in SQL. A quick google suggests that no other popular DBMS implements it. And ... I'm not sure it would be that much faster. The majority of the work it's doing is writing. Reading the existing table data is fast and easy, but writing new data and figuring out the pointers takes the majority of the time. Because of this I don't think it likely that this feature will be added to standard SQLite. Of course, if you really care about this you could add it yourself. To optimize this, when you create the table put the columns you're going to index near the beginning of the rows. SQLite stops reading the data of a table row once it has all the columns it needs. I don't know whether this will make a lot of difference but it won't do any harm. Also, store your time as a numeric value (Unix epoch or Julian day) rather than a string. Numbers are faster to handle than strings. INTEGERs are faster to handle than REALs. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users