Re: [sqlite] Speeding up index creation
On Mon, Sep 5, 2011 at 1:18 PM, Jaco Breitenbachwrote: > Dear all, > > Any speculation on if and how the new code would affect inserts into an > indexed table? > > Jaco It doesn't affect them at all. The optimization specifically targets sorting large amount of data. Best regards, Filip Navara ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
Dear all, Any speculation on if and how the new code would affect inserts into an indexed table? Jaco On 5 September 2011 09:15, Filip Navarawrote: > On Mon, Sep 5, 2011 at 9:39 AM, Baruch Burstein > wrote: > > This is unrelated, but I have never done any serious testing, and am > > wondering what the user/sys results are in all these performance tests. > Is > > user I/O time and sys CPU time? > > User is the time spent in user-mode code, ie. mostly SQLite code and > operating system libraries. > Sys is the time spent in kernel-mode code, ie. kernel, drivers, I/O > operatons. > > In both cases the values are per-process and thus they should be more > accurate than wall clock time. > > Sum of both values should give the total time spent executing a given > command. > > Best regards, > Filip Navara > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On Mon, Sep 5, 2011 at 9:39 AM, Baruch Bursteinwrote: > This is unrelated, but I have never done any serious testing, and am > wondering what the user/sys results are in all these performance tests. Is > user I/O time and sys CPU time? User is the time spent in user-mode code, ie. mostly SQLite code and operating system libraries. Sys is the time spent in kernel-mode code, ie. kernel, drivers, I/O operatons. In both cases the values are per-process and thus they should be more accurate than wall clock time. Sum of both values should give the total time spent executing a given command. Best regards, Filip Navara ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
This is unrelated, but I have never done any serious testing, and am wondering what the user/sys results are in all these performance tests. Is user I/O time and sys CPU time? On Sun, Sep 4, 2011 at 11:43 AM, Filip Navarawrote: > Updated results against current trunk: > > CPU Time: user 82.009726 sys 129.636831 > > I rerun the profiler and looked at the results. Most of them looked > sensible, but one thing still struck me. Once the new index is created > in the VDBE Sorter it gets copied back into the new Btree using a loop > with the following OPCodes: > > sqlite3VdbeAddOp2(v, OP_SorterData, iSorter, regRecord); > sqlite3VdbeAddOp3(v, OP_IdxInsert, iIdx, regRecord, 1); > sqlite3VdbeChangeP5(v, OPFLAG_USESEEKRESULT); > > The OPFLAG_USESEEKRESULT is no-op in this case since OP_IsUnique is > never used in the merge-sort branch of code. OP_IdxInsert always seeks > for the new Btree row in the resulting tree. Couldn't we exploit the > fact that the rows are already sorted to construct the Btree faster? > > Best regards, > Filip Navara > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
Updated results against current trunk: CPU Time: user 82.009726 sys 129.636831 I rerun the profiler and looked at the results. Most of them looked sensible, but one thing still struck me. Once the new index is created in the VDBE Sorter it gets copied back into the new Btree using a loop with the following OPCodes: sqlite3VdbeAddOp2(v, OP_SorterData, iSorter, regRecord); sqlite3VdbeAddOp3(v, OP_IdxInsert, iIdx, regRecord, 1); sqlite3VdbeChangeP5(v, OPFLAG_USESEEKRESULT); The OPFLAG_USESEEKRESULT is no-op in this case since OP_IsUnique is never used in the merge-sort branch of code. OP_IdxInsert always seeks for the new Btree row in the resulting tree. Couldn't we exploit the fact that the rows are already sorted to construct the Btree faster? Best regards, Filip Navara ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On Fri, Sep 2, 2011 at 6:04 PM, Dan Kennedywrote: > On 09/02/2011 07:32 PM, Filip Navara wrote: >> >> On Fri, Sep 2, 2011 at 11:04 AM, Filip Navara >> wrote: >> *snip* >>> >>> The time to create an index on my 266 Mb experimental database is more >>> than 9 minutes. >> >> *snip* >> >> I erroneously measured the time with DEBUG build, so I've rerun the >> experiment with several SQLite versions: >> >> [2869ed2829] Leaf: Avoid using uninitialized variables after failures >> in the merge sort code. (user: drh, tags: trunk) >> CPU Time: user 107.359888 sys 135.050066 >> >> [7769fb988d] Instead of a temporary b-tree, use a linked-list and >> merge-sort to sort records in main memory in vdbesort.c. (user: dan, >> tags: merge-sort) >> CPU Time: user 118.451559 sys 132.117247 >> >> [71075673c6] Leaf: If all data being sorted fits in memory, avoid >> writing any data out to temporary files in vdbesort.c. (user: dan, >> tags: merge-sort) >> CPU Time: user 116.813549 sys 132.710051 > > Thanks for doing this. There is (hopefully) a fix for the performance > regression up now. Thanks, it looks much better now: [68e26c4487] Leaf: The build works again with -DSQLITE_OMIT_MERGE_SORT. The merge-sorter now avoids spilling to disk (letting the in-memory linked list grow without bound) if PRAGMA temp_store=3. (user: drh, tags: merge-sort) CPU Time: user 86.486954 sys 132.273248 Best regards, Filip Navara ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On 09/02/2011 07:32 PM, Filip Navara wrote: On Fri, Sep 2, 2011 at 11:04 AM, Filip Navarawrote: *snip* The time to create an index on my 266 Mb experimental database is more than 9 minutes. *snip* I erroneously measured the time with DEBUG build, so I've rerun the experiment with several SQLite versions: [2869ed2829] Leaf: Avoid using uninitialized variables after failures in the merge sort code. (user: drh, tags: trunk) CPU Time: user 107.359888 sys 135.050066 [7769fb988d] Instead of a temporary b-tree, use a linked-list and merge-sort to sort records in main memory in vdbesort.c. (user: dan, tags: merge-sort) CPU Time: user 118.451559 sys 132.117247 [71075673c6] Leaf: If all data being sorted fits in memory, avoid writing any data out to temporary files in vdbesort.c. (user: dan, tags: merge-sort) CPU Time: user 116.813549 sys 132.710051 Thanks for doing this. There is (hopefully) a fix for the performance regression up now. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On 09/02/2011 09:24 PM, Simon Slavin wrote: On 2 Sep 2011, at 10:04am, Filip Navara wrote: The time to create an index on my 266 Mb experimental database is more than 9 minutes. The database is available at http://www.emclient.com/temp/mail_index.zip and the command I use to create the index is CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type", "address", "parentId"); I had run the shell under profiler Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite 3.7.5, there seems to be a problem. It's still going after more than 2 hours. Loading your database and running your CREATE INDEX command, the application only seems to be using about 1% of one of my CPUs. I looked to see if it was i/o bound instead of CPU bound but it seems only to be reading 320KB/s and my computer can handle a lot more than that. (All above figures from Activity Monitor.) We were just wondering a half hour ago how long this would take with 3.7.7. Thanks! Released versions of SQLite build an index by inserting all values from the indexed column(s) in whatever order they appear in the table (i.e. unsorted order) into a new b-tree. This is fine if the index b-tree you are constructing fits in the cache. If it doesn't fit in the cache you have a problem. Each time you go to insert a new entry into the b-tree you have to find the leaf page that the new entry will be added to. Since your b-tree doesn't fit in the cache, odds are that this means reading the page from the file-system. And since you are inserting in arbitrary order, the page could be anywhere in the database (or WAL) file. In the worst case, if your page is not cached in OS memory, you may even have to shift the disk arm to get at it. Way slow. The result is that reading data from disk becomes the bottleneck when writing unsorted values to a b-tree. Hence your 1% CPU measurement. The new version uses a merge-sort to sort all the index entries before it inserts them into the b-tree. This way it doesn't matter if your b-tree is larger than the cache, as you are always inserting into the right-most leaf node. No need to go searching through the file-system/disk for pages while building the b-tree. Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On 2 Sep 2011, at 4:38pm, Filip Navara wrote: > With SQLite 3.7.7 and older the index creation takes eons since the > file is in WAL mode and the journal file grows uncontrollably. Yeah. It was 5.1Gig, and the underlying database was only 266Meg. Heh. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On Fri, Sep 2, 2011 at 5:23 PM, Simon Slavinwrote: > > On 2 Sep 2011, at 3:24pm, Simon Slavin wrote: > >> On 2 Sep 2011, at 10:04am, Filip Navara wrote: >> >>> The time to create an index on my 266 Mb experimental database is more >>> than 9 minutes. The database is available at >>> http://www.emclient.com/temp/mail_index.zip and the command I use to >>> create the index is >>> >>> CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type", >>> "address", "parentId"); >>> >>> I had run the shell under profiler >> >> Strangely, on my Mac running the shell tool provided with OS X 10.7.1, >> SQLite 3.7.5, there seems to be a problem. It's still going after more than >> 2 hours. > > I correct myself: it had finished. It just wasn't showing the next prompt, > for some reason. After force-quitting and restarting the shell it showed the > index as existing and the index worked. > > Simon. With SQLite 3.7.7 and older the index creation takes eons since the file is in WAL mode and the journal file grows uncontrollably. Since I run it on small SSD I was never able to let it finish, it always filled up the disk first. Best regards, Filip Navara ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On 2 Sep 2011, at 3:24pm, Simon Slavin wrote: > On 2 Sep 2011, at 10:04am, Filip Navara wrote: > >> The time to create an index on my 266 Mb experimental database is more >> than 9 minutes. The database is available at >> http://www.emclient.com/temp/mail_index.zip and the command I use to >> create the index is >> >> CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type", >> "address", "parentId"); >> >> I had run the shell under profiler > > Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite > 3.7.5, there seems to be a problem. It's still going after more than 2 hours. I correct myself: it had finished. It just wasn't showing the next prompt, for some reason. After force-quitting and restarting the shell it showed the index as existing and the index worked. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On 2 Sep 2011, at 10:04am, Filip Navara wrote: > The time to create an index on my 266 Mb experimental database is more > than 9 minutes. The database is available at > http://www.emclient.com/temp/mail_index.zip and the command I use to > create the index is > > CREATE INDEX "idx_MailAddresses_address" ON "MailAddresses" ("type", > "address", "parentId"); > > I had run the shell under profiler Strangely, on my Mac running the shell tool provided with OS X 10.7.1, SQLite 3.7.5, there seems to be a problem. It's still going after more than 2 hours. Loading your database and running your CREATE INDEX command, the application only seems to be using about 1% of one of my CPUs. I looked to see if it was i/o bound instead of CPU bound but it seems only to be reading 320KB/s and my computer can handle a lot more than that. (All above figures from Activity Monitor.) Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation
On Fri, Sep 2, 2011 at 11:04 AM, Filip Navarawrote: *snip* > The time to create an index on my 266 Mb experimental database is more > than 9 minutes. *snip* I erroneously measured the time with DEBUG build, so I've rerun the experiment with several SQLite versions: [2869ed2829] Leaf: Avoid using uninitialized variables after failures in the merge sort code. (user: drh, tags: trunk) CPU Time: user 107.359888 sys 135.050066 [7769fb988d] Instead of a temporary b-tree, use a linked-list and merge-sort to sort records in main memory in vdbesort.c. (user: dan, tags: merge-sort) CPU Time: user 118.451559 sys 132.117247 [71075673c6] Leaf: If all data being sorted fits in memory, avoid writing any data out to temporary files in vdbesort.c. (user: dan, tags: merge-sort) CPU Time: user 116.813549 sys 132.710051 Best regards, Filip Navara ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation?
On Fri, May 23, 2008 at 6:37 PM, Nemanja Čorlija <[EMAIL PROTECTED]> wrote: > On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote: >> I have an interesting problem. I need to generate a large table >> periodically. The table contains a unique SHA1 hash code and 6 integer >> values and has about 6 million rows. Generating this table is fast. I >> can set it up in less than 90 seconds on a slow iMac (2.16 Ghz and >> slow disk). The thing that takes a (relatively) long time is the index >> creation on the unique hash code .. 720 seconds. >> >> (I'm importing within transactions with chunks of 25.000 records) >> >> The question is, is there any way to speed up the indexing proces? >> >> Couple of things on my mind: >> >> Doing this all on an in-memory database takes about 150 seconds in >> total. Is it possible to build a database in memory and then dump it >> to disk? >> >> Would it make sense to sort the records that i import? Could that >> result in a quicker index operation? This certainly helps for Berkeley >> DB. But it is lower level and you can use it's internal hashing >> functions on your data to pre-sort it. Maybe SQLite has something >> similar? >> > > It does make sense to presort records before inserting into on-disk > db. You should insert into memory db first and then insert sorted > records into disk db from there. You can batch this into chunks of 25K > rows as you're doing now. But if you have enough memory, it will > probably be better to increase number of rows. > You should also play with increasing cache_size and page_size values. > Pseudo code: > > OPEN :MEMORY: > ATTACH DISK_DB; > > DO > BEGIN; > INSERT INTO MEMORY_TABLE VALUES(...); x 25.000 > INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash; > COMMIT; > WHILE has_more_rows This is a great tip. I implemented the above and it works very well. The load speed on Ubuntu Hardy went down from an hour+ to 6 minutes. I'm now experimenting with the pragmas to see if this can go even more faster. S. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation?
On Fri, May 23, 2008 at 4:20 PM, Stefan Arentz <[EMAIL PROTECTED]> wrote: > I have an interesting problem. I need to generate a large table > periodically. The table contains a unique SHA1 hash code and 6 integer > values and has about 6 million rows. Generating this table is fast. I > can set it up in less than 90 seconds on a slow iMac (2.16 Ghz and > slow disk). The thing that takes a (relatively) long time is the index > creation on the unique hash code .. 720 seconds. > > (I'm importing within transactions with chunks of 25.000 records) > > The question is, is there any way to speed up the indexing proces? > > Couple of things on my mind: > > Doing this all on an in-memory database takes about 150 seconds in > total. Is it possible to build a database in memory and then dump it > to disk? > > Would it make sense to sort the records that i import? Could that > result in a quicker index operation? This certainly helps for Berkeley > DB. But it is lower level and you can use it's internal hashing > functions on your data to pre-sort it. Maybe SQLite has something > similar? > It does make sense to presort records before inserting into on-disk db. You should insert into memory db first and then insert sorted records into disk db from there. You can batch this into chunks of 25K rows as you're doing now. But if you have enough memory, it will probably be better to increase number of rows. You should also play with increasing cache_size and page_size values. Pseudo code: OPEN :MEMORY: ATTACH DISK_DB; DO BEGIN; INSERT INTO MEMORY_TABLE VALUES(...); x 25.000 INSERT INTO DISK_TABLE SELECT * FROM MEMORY_TABLE ORDER BY sha1_hash; COMMIT; WHILE has_more_rows Search this list's archives for "locality of reference" for more helpful tips on the subject of building large indexes in SQLite. > Are there any other tunable options that can improve indexing speed? > > S. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- Nemanja Čorlija <[EMAIL PROTECTED]> ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Speeding up index creation?
On Fri, May 23, 2008 at 10:20:45AM -0400, Stefan Arentz scratched on the wall: > I have an interesting problem. I need to generate a large table > periodically. The table contains a unique SHA1 hash code and 6 integer > values and has about 6 million rows. Generating this table is fast. I > can set it up in less than 90 seconds on a slow iMac (2.16 Ghz and > slow disk). The thing that takes a (relatively) long time is the index > creation on the unique hash code .. 720 seconds. > Doing this all on an in-memory database takes about 150 seconds in > total. Is it possible to build a database in memory and then dump it > to disk? Not at this time. > Are there any other tunable options that can improve indexing speed? The biggest thing you can do to improve indexing performance is to increase the size of the page cache. Assuming you're using the default 1K page size, each page takes up about 1.5K of RAM in the cache. The default cache size is 2000 pages (3MB), but you should crank this up as high as you can while having a reasonable chance of still keeping the whole thing in physical RAM... say 75% of your machine's total RAM, if you've got a gig or two. Just be aware that the value is the number of pages, not the number of KB. The other thing that helps a little is to turn synchronous to off while you're creating the index. This is normally a dangerous thing to do, but if you're just dumping data into a database chances are you can re-start from scratch if things go wrong. You may also see some performance from setting the temp_store to memory. The biggest single thing is the page cache, however. http://www.sqlite.org/pragma.html PRAGMA page_cache PRAGMA synchronous PRAGMA temp_store -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "'People who live in bamboo houses should not throw pandas.' Jesus said that." - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006" ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users