Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
> On Sep 12, 2019, at 12:33 PM, mailing lists wrote:
>
> the application also runs on mobile devices and the insertion (excluding
> indexing) also takes a couple of minutes. This means that the user might
> switch to another app during insertion and - in the worst case - the system
> just terminates my background app. So, all the insertion is lost in the
> case of a single transaction.

[Off-topic] Your app should not be getting terminated without notice! Use the platform's APIs for requesting background time. I'm not an Android programmer, but here's what to do on iOS:

When notified that the app is being backgrounded, call UIApplication.beginBackgroundTask. The OS will give you more time to run. More importantly, you'll be notified when the OS wants you to quit, and you can clean up (i.e. commit the transaction).

To guard against conditions where the OS needs to terminate the app while it's in the foreground (low memory or low battery), implement the UIApplicationDelegate method applicationWillTerminate and perform the same cleanup there.

Note that you won't get notifications like these if your DB processing is blocking the main thread's runloop. To prevent that, do the work on a background thread, or at least put the runloop in a modal state and run it for an iteration periodically.

—Jens
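[Editor's note: a minimal sketch of that pattern in Swift. The class ImportController and the method commitPendingTransaction() are hypothetical names for whatever wraps your import work; the UIKit calls themselves are real.]

```swift
import UIKit

// Keep the OS from killing the app mid-import, and commit the open
// transaction when background time is about to expire.
final class ImportController {
    private var task: UIBackgroundTaskIdentifier = .invalid

    // Call this before starting the long-running insert work.
    func beginProtectedWork() {
        task = UIApplication.shared.beginBackgroundTask(withName: "db-import") { [weak self] in
            // Expiration handler: the OS wants us to stop, so clean up.
            self?.commitPendingTransaction()
            self?.endProtectedWork()
        }
    }

    // Call this when the import finishes normally.
    func endProtectedWork() {
        guard task != .invalid else { return }
        UIApplication.shared.endBackgroundTask(task)
        task = .invalid
    }

    private func commitPendingTransaction() {
        // e.g. sqlite3_exec(db, "COMMIT", nil, nil, nil) on the worker thread
    }
}
```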
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
Hi,

> On 2019-09-12 at 10:55, Keith Medcalf wrote:
>
> On Tuesday, 10 September, 2019 09:26, mailing lists wrote:
>
>> I cannot really put all the inserts into one transaction because in case of
>> a failure I lose all the already inserted data.
>
> Why is that important? Can you not just load it again from whence it came in
> the first place on failure?

the application also runs on mobile devices, and the insertion (excluding indexing) also takes a couple of minutes. This means that the user might switch to another app during insertion and - in the worst case - the system just terminates my background app. So, all the insertion is lost in the case of a single transaction. This scenario should be prevented.

>> There is hardly any performance gain anymore when doing 1000 or 10 000
>> insertions in one transaction including immediate insertion into indices
>> (in my case the difference is in the per cent range).
>
> I find that the difference between (a) "inserting all the data into a table
> with indexes in a single transaction"; (b) "inserting all the data into a
> table without indexes in a single transaction and then creating the indexes";
> and (c) "within a single transaction, dropping the indexes, inserting all the
> data, then creating the indexes" is:
>
> (b) is about 10% faster than (a)
> (c) is about 40% faster than (a)
>
> Smaller batch sizes result in more random I/O, and performance decreases as
> the batch size decreases.
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
On Tuesday, 10 September, 2019 09:26, mailing lists wrote:

> I cannot really put all the inserts into one transaction because in case of
> a failure I lose all the already inserted data.

Why is that important? Can you not just load it again from whence it came in the first place on failure?

> There is hardly any performance gain anymore when doing 1000 or 10 000
> insertions in one transaction including immediate insertion into indices
> (in my case the difference is in the per cent range).

I find that the difference between (a) "inserting all the data into a table with indexes in a single transaction"; (b) "inserting all the data into a table without indexes in a single transaction and then creating the indexes"; and (c) "within a single transaction, dropping the indexes, inserting all the data, then creating the indexes" is:

(b) is about 10% faster than (a)
(c) is about 40% faster than (a)

Smaller batch sizes result in more random I/O, and performance decreases as the batch size decreases.

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
On Tue, Sep 10, 2019 at 05:25:38PM +0200, mailing lists wrote:

> Hi,
>
> I cannot really put all the inserts into one transaction because in case of
> a failure I lose all the already inserted data. Though I made some tests.
> There is hardly any performance gain anymore when doing 1000 or 10 000
> insertions in one transaction including immediate insertion into indices (in
> my case the difference is in the per cent range).

What do you mean by "losing data"? Do you need the records to be immediately available via SQL, or just written to persistent storage? In the latter case you can implement your own data cache, such as sequential log files, which are periodically (and/or on demand) rotated and afterwards asynchronously parsed, inserted into the SQLite database with optimized CACHE_SIZE, transaction size, journal mode etc., and deleted only after a successful commit. Thus you shift the burden from SQL to the filesystem, which is less constrained by the natural structure of the data and might perform better.

Valentin Davydov.
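[Editor's note: a sketch of Valentin's staging idea in Swift against the SQLite C API. The file name "pending.log", the table "items", and the one-record-per-line format are all invented for illustration.]

```swift
import Foundation
import SQLite3

let logURL = URL(fileURLWithPath: "pending.log")

// Fast path: append one record per line and sync before acknowledging it,
// so the data is on persistent storage even if the app dies.
func appendToLog(_ record: String) throws {
    let data = Data((record + "\n").utf8)
    if let handle = try? FileHandle(forWritingTo: logURL) {
        handle.seekToEndOfFile()
        handle.write(data)
        handle.synchronizeFile()
        handle.closeFile()
    } else {
        try data.write(to: logURL)   // first record creates the file
    }
}

// Slow path: replay the whole log into SQLite in one big transaction and
// delete the log only after a successful COMMIT.
func replayLog(into db: OpaquePointer) throws {
    let lines = try String(contentsOf: logURL, encoding: .utf8)
        .split(separator: "\n")
    let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)
    sqlite3_exec(db, "BEGIN", nil, nil, nil)
    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO items(payload) VALUES (?)", -1, &stmt, nil)
    for line in lines {
        sqlite3_bind_text(stmt, 1, String(line), -1, SQLITE_TRANSIENT)
        sqlite3_step(stmt)
        sqlite3_reset(stmt)
    }
    sqlite3_finalize(stmt)
    if sqlite3_exec(db, "COMMIT", nil, nil, nil) == SQLITE_OK {
        try FileManager.default.removeItem(at: logURL)
    }
}
```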
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
On 10 Sep 2019, at 4:02pm, mailing lists wrote:

> Insertion really slows down after about 100 000 items have been inserted. I
> suppose that the slowdown is related to indexing because:
>
> [...]
> c) changing the cache size has only a minor impact

SQLite speed does degrade with table size, but it degrades gracefully. Depending on which OS you're using, you have at least three caches in play. A sudden change in speed like the one you describe indicates that you have just blown through one of those caches; in this case it is probably the OS's memory cache assigned to your program.
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
On Tuesday, 10 September, 2019 09:26, mailing lists wrote:

> I cannot really put all the inserts into one transaction because in case of
> a failure I lose all the already inserted data. Though I made some tests.
> There is hardly any performance gain anymore when doing 1000 or 10 000
> insertions in one transaction including immediate insertion into indices
> (in my case the difference is in the per cent range).

> What is the reason that index creation is so much faster than insertion
> using indices? Once I heard something about fragmentation, but on solid
> state disks fragmentation should not play a role as long as indices and
> data are separated, are they?

When you "create" an index as a single operation, you scan the table to collect the key data, do a sort, then do an in-order insertion into the B-Tree and write out the entire tree all at once (it is a single transaction). When the index already exists, you have to "fiddle" with the B-Tree for each record because it is no longer being built in order: you have to split and combine pages and shuffle the data about as each record is inserted into the index. The more records that can be inserted per transaction, the less I/O will be required (and you have to have a big enough cache).

If you can pre-sort the records so that they are inserted in order, and use a relatively large number of records per transaction, then there will be very little difference between the two. Of course, if you have multiple indexes, then pre-sorting into the most efficient insertion order is a high art (and often not entirely possible anyway).

--
The fact that there's a Highway to Hell but only a Stairway to Heaven says a lot about anticipated traffic volume.
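[Editor's note: a sketch of the pre-sorting trick in Swift. The table t(key, payload), its index on `key`, and the Record type are invented; the point is only that sorting on the indexed column makes the index B-Tree grow mostly in order instead of being updated at random spots.]

```swift
import SQLite3

struct Record {
    let key: Int64
    let payload: String
}

func insertPreSorted(_ records: [Record], into db: OpaquePointer) {
    // Sort the batch on the indexed column before inserting.
    let sorted = records.sorted { $0.key < $1.key }
    let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)
    sqlite3_exec(db, "BEGIN", nil, nil, nil)
    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO t(key, payload) VALUES (?, ?)", -1, &stmt, nil)
    for r in sorted {
        sqlite3_bind_int64(stmt, 1, r.key)
        sqlite3_bind_text(stmt, 2, r.payload, -1, SQLITE_TRANSIENT)
        sqlite3_step(stmt)
        sqlite3_reset(stmt)
    }
    sqlite3_finalize(stmt)
    sqlite3_exec(db, "COMMIT", nil, nil, nil)
}
```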
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
On 9/10/19, mailing lists wrote:

> What is the reason that index creation is so much faster than insertion
> using indices?

Indexes are maintained in key order, so inserts happen at random spots all across the index. For each insert, the system has to (1) read a 4096-byte page, (2) update the 10 or 20 bytes corresponding to the index entry, and finally (3) write the 4096-byte page. If you keep pages in cache, you might get lucky and update two or three entries on each page before you have to write it back. But even then, you are doing a lot of I/O relative to the amount of content you are changing: your I/O is hundreds of times larger than the amount of content being updated.

CREATE INDEX does an external merge sort on the keys, so that keys are always written in ascending order. Hence, the pattern is (1) fill up a 4096-byte page with new entries, then (2) write the page. The ratio of I/O to content changed is now close to 1.0, which is what you want: about a hundred times less I/O than random inserts. The fact that all pages are written in order also helps, as filesystems tend to be optimized for that case.

--
D. Richard Hipp
d...@sqlite.org
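[Editor's note: a rough way to observe this effect. The sketch below (Swift, invented schema, file names, and row count) times the same indexed inserts in ascending versus random key order; per the explanation above, the random ordering should come out measurably slower.]

```swift
import Foundation
import SQLite3

// Time N indexed inserts into a fresh database at `path`.
func timeInserts(keys: [Int], path: String) -> TimeInterval {
    var db: OpaquePointer?
    sqlite3_open(path, &db)
    sqlite3_exec(db, "CREATE TABLE t(k INTEGER); CREATE INDEX i ON t(k)",
                 nil, nil, nil)
    let start = Date()
    sqlite3_exec(db, "BEGIN", nil, nil, nil)
    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO t(k) VALUES (?)", -1, &stmt, nil)
    for k in keys {
        sqlite3_bind_int64(stmt, 1, Int64(k))
        sqlite3_step(stmt)
        sqlite3_reset(stmt)
    }
    sqlite3_finalize(stmt)
    sqlite3_exec(db, "COMMIT", nil, nil, nil)
    sqlite3_close(db)
    return Date().timeIntervalSince(start)
}

let ascending = Array(0..<1_000_000)
print("ascending:", timeInserts(keys: ascending, path: "asc.db"))
print("random:   ", timeInserts(keys: ascending.shuffled(), path: "rnd.db"))
```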
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
Hi,

I cannot really put all the inserts into one transaction because in case of a failure I lose all the already inserted data. Though I made some tests: there is hardly any performance gain anymore when doing 1000 or 10 000 insertions in one transaction including immediate insertion into indices (in my case the difference is in the per cent range).

What is the reason that index creation is so much faster than insertion using indices? Once I heard something about fragmentation, but on solid state disks fragmentation should not play a role as long as indices and data are separated, are they?

Regards,
Hartwig

> On 2019-09-10 at 17:16, Richard Hipp wrote:
>
> On 9/10/19, mailing lists wrote:
>
>> So, the best solution I found so far is to disable indexing during
>> insertion and to index the table afterwards
>
> I think that is the best solution. Be sure to also do all of your
> inserts (and the CREATE INDEX statements) inside of a transaction.
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
On 9/10/19, mailing lists wrote:

> So, the best solution I found so far is to disable indexing during insertion
> and to index the table afterwards

I think that is the best solution. Be sure to also do all of your inserts (and the CREATE INDEX statements) inside of a transaction.

--
D. Richard Hipp
d...@sqlite.org
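[Editor's note: a minimal sketch of that advice in Swift. The table t(k, v) and the index name are invented; the shape is what matters, namely the bulk insert and the CREATE INDEX both happening inside one transaction.]

```swift
import SQLite3

func bulkLoad(_ rows: [(Int64, String)], into db: OpaquePointer) {
    let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)
    sqlite3_exec(db, "BEGIN", nil, nil, nil)
    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO t(k, v) VALUES (?, ?)", -1, &stmt, nil)
    for (k, v) in rows {
        sqlite3_bind_int64(stmt, 1, k)
        sqlite3_bind_text(stmt, 2, v, -1, SQLITE_TRANSIENT)
        sqlite3_step(stmt)
        sqlite3_reset(stmt)
    }
    sqlite3_finalize(stmt)
    // The index is created after the data is loaded, in the same transaction.
    sqlite3_exec(db, "CREATE INDEX idx_t_k ON t(k)", nil, nil, nil)
    sqlite3_exec(db, "COMMIT", nil, nil, nil)
}
```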
Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices
Hartwig,

You have got most of the tricks we know about. Other, more experienced developers may provide better insight.

We had to move about 60 GB of table data about, and we ended up doing what you have done, with one extra bit: we batched the jobs up in multiples of 10,000 between BEGIN and END to make transactions out of them. It's not clear whether you are doing that.

Rob

On 10 Sep 2019, at 16:02, mailing lists wrote:

I have the following situation:

- in one table, relatively small data is inserted (100 bytes per record)
- this table contains three indices
- about 100 million or more records have to be inserted

Insertion really slows down after about 100 000 items have been inserted. I suppose that the slowdown is related to indexing because:

a) removing the indices brings the speed up
b) it does not matter whether I use a solid state drive or a conventional one (the overall speed differs, but not the phenomenon itself)
c) changing the cache size has only a minor impact

So, the best solution I found so far is to disable indexing during insertion and to index the table afterwards (this is orders of magnitude faster than insertion with indexes).

Are there any better solutions or other tricks I might try (splitting the table into a data part and an index part)?

BTW: I am using journal_mode DELETE. WAL mode only delays the problem and increases the speed a bit, but not significantly.

Regards,
Hartwig
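[Editor's note: a sketch of the batching Rob describes, in Swift against the SQLite C API. The table "items" and its column are invented; the idea is to commit every batchSize rows so an interrupted run loses at most the current batch.]

```swift
import SQLite3

func insertBatched(_ rows: [String], into db: OpaquePointer, batchSize: Int = 10_000) {
    let SQLITE_TRANSIENT = unsafeBitCast(-1, to: sqlite3_destructor_type.self)
    var stmt: OpaquePointer?
    sqlite3_prepare_v2(db, "INSERT INTO items(payload) VALUES (?)", -1, &stmt, nil)
    for (i, row) in rows.enumerated() {
        if i % batchSize == 0 { sqlite3_exec(db, "BEGIN", nil, nil, nil) }
        sqlite3_bind_text(stmt, 1, row, -1, SQLITE_TRANSIENT)
        sqlite3_step(stmt)
        sqlite3_reset(stmt)
        // Close the batch once it is full.
        if i % batchSize == batchSize - 1 { sqlite3_exec(db, "COMMIT", nil, nil, nil) }
    }
    // Commit any partial final batch.
    if rows.count % batchSize != 0 { sqlite3_exec(db, "COMMIT", nil, nil, nil) }
    sqlite3_finalize(stmt)
}
```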