Re: [sqlite] INDEX Types
On Thu, Nov 10, 2011 at 12:28:24AM +0100, GB scratched on the wall: > Ok, just thought it worth mentioning. But a VACUUMed database may be > more efficient if you have multiple columns with multiple indexes > because you get a mixed sequence of data and index pages while > inserting data. VACUUM rearranges them so that pages belonging to an > object are grouped together which in turn helps drawing benefit from > prefetching and caching. While that's true, the difference is more limited for indexes. VACUUM rebuilds tables in ROWID order, hence re-packing both records into a page, and pages into the database file. Indexes, on the other hand, are rebuilt with a table scan, essentially the same as CREATE INDEX on an existing table. This means that if the table rows (in ROWID order) are not already more or less in-order (according to the index) the rebuilt index will suffer from inserted values and re-balanced nodes. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
Doing a file copy has similar behavior. So as long as the file is cached everything is copacetic. That's what leads me to believe it's head thrashing causing this behavior. ./sqlite3 index2.db However, reboot again and add "select count(*) from a;" as the first line of > gendat2.sql > > time sqlite3 index.db 110 > 2.389u 1.123s 0:07.39 47.3% 0+0k 0+0io 0pf+0w > That's faster than the 2nd run before rebooting. Out of interest, if you have the time, instead of doing "select count(*) from a;" can you just copy the file to another file ? Possibly read-a-block-write-a-block ? I'm curious to know whether this makes a difference. I have tried some of these things on a Mac, but rebooting doesn't make anything like so much difference on a Mac. It's slower without the reboot but faster with the reboot. However, I don't have a Mac to play with which isn't also acting as a server. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
Fabian schrieb am 09.11.2011 23:10: I'm running these tests on a very simple database: 1 table and 1 column, so ANALYZE shouldn't have any effect. And I already tested running VACUUM on the database, but it didn't help (which seems logical, because I start with a fresh db each time). Ok, just thought it worth mentioning. But a VACUUMed database may be more efficient if you have multiple columns with multiple indexes because you get a mixed sequence of data and index pages while inserting data. VACUUM rearranges them so that pages belonging to an object are grouped together which in turn helps drawing benefit from prefetching and caching. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 GB> Maybe you could try to use a pagesize that matches the size of a disk > allocation unit or memory page. For Windows since a typical NTFS partition > has a clustersize of 4KiB - which happens to also be the size of a memory > page - a pagesize of 4096 Byte seems to be a good compromise between > pleasing the cache manager and waste of space. I already used a page_size of 4096 for all the benchmarks, together with a large (300MB) cache_size, but thanks for the suggestion anyway! > You could also investigate the effect of VACUUMing and ANALYZEing the > Database. If you have multiple indexes on tables try compiling sqlite > defining the SQLITE_ENABLE_STAT3 switch to get even better results from > ANALYZE. > I'm running these tests on a very simple database: 1 table and 1 column, so ANALYZE shouldn't have any effect. And I already tested running VACUUM on the database, but it didn't help (which seems logical, because I start with a fresh db each time). ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On Nov 9, 2011, at 10:24 PM, Fabian wrote: > And I'd like to avoid to have some fuzzy logic > that tries to predicts which of the two methods is going to be faster. Perhaps an alternative to your conundrum is the good, old "divide and conquer" approach. In other words, you could partition your data among multiple database files, using a simple hash to write into one partition or another. That, in combination with some astute use of 'attach', is as good an approximation as any of table partitioning in SQLite. http://en.wikipedia.org/wiki/Partition_(database) ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
Maybe you could try to use a pagesize that matches the size of a disk allocation unit or memory page. For Windows since a typical NTFS partition has a clustersize of 4KiB - which happens to also be the size of a memory page - a pagesize of 4096 Byte seems to be a good compromise between pleasing the cache manager and waste of space. You could also investigate the effect of VACUUMing and ANALYZEing the Database. If you have multiple indexes on tables try compiling sqlite defining the SQLITE_ENABLE_STAT3 switch to get even better results from ANALYZE. hth gerd ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On Wed, Nov 9, 2011 at 3:24 PM, Fabianwrote: > 2011/11/9 Nico Williams >> I don't get it. You're reading practically the whole file in a random >> manner, which is painfully slow, so why can't you read the file in one >> fell swoop (i.e., sequential reads)?? > > I'm only reading the whole file when the number of additional inserts is > high enough to cause the whole index to be read from disk. But if I always > pre-cache the database, it will downgrade performance for cases when only > 10 inserts need to be done. And I'd like to avoid to have some fuzzy logic > that tries to predicts which of the two methods is going to be faster. I don't see how to avoid that. Set N=100 inserts before you read the whole thing into memory. You'll need to be able to cache, somewhere, whether the DB has been read since last reboot (you could use a table in the same DB for this). > Besides, pre-caching the file sounds easier than it is to accomplish, > because all methods suggested on this list did not work on Windows (for > example copying the file to null). Windows and the harddrive have their own > logic to decide which data to cache, and I haven't found a simple way to > force a certain file into cache. On many operating systems copying a file to /dev/null or equivalent can fail to read the file into cache. On a Unix the cp(1) utility might mmap() in the file then write(2) the file a page at a time to /dev/null, with the page fault deferred until the last minute, but since /dev/null doesn't use the data, the page fault never comes, thus the file is never read into memory. If you want to read the file reliably you may need to use a SELECT, or actually *read* the file into memory, not just mmap() it. >> Or, if FTS really works better, then use that. > > I will, but I'm trying to understand the issue that i'm facing, not just > workaround it. It seems that FTS doesn't need to read the whole index from > disk, so I'm trying to pinpoint the difference. My best guess is that it > creates a fresh b-tree for the additional inserts, causing the boost in > performance. Yes, it'd be nice to understand what FTS is doing. I can imagine many ways to implement an index that has the performance characteristic you've observed, but with various trade-offs. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
I'm imagining that the index is only loading what it has to for each insert. That results in head thrashing the disk when the file isn't cached. I'm going to profile this and see what pops out. I'm also going to use 3.7.9. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Fabian [fabianpi...@gmail.com] Sent: Wednesday, November 09, 2011 3:30 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] INDEX Types 2011/11/9 Black, Michael (IS) <michael.bla...@ngc.com> OK...you're right...a reboot kills it. > I'm glad someone was able to reproduce this on Linux, ruling out the possibility it's a Windows-issue. > However, reboot again and add "select count(*) from a;" as the first line > of gendat2.sql > So if a simple SELECT COUNT(*) can speed up the same insert from 90 seconds to 7 seconds (including the count), does this confirm it's a sequential vs random reads problem? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On Nov 9, 2011, at 10:24 PM, Fabian wrote: > It seems that FTS doesn't need to read the whole index from > disk, so I'm trying to pinpoint the difference. My best guess is that it > creates a fresh b-tree for the additional inserts, causing the boost in > performance. Indeed. Quoting the fine manual: "Multiple b-tree structures are used instead of a single b-tree to reduce the cost of inserting records into FTS tables. When a new record is inserted into an FTS table that already contains a lot of data, it is likely that many of the terms in the new record are already present in a large number of existing records. If a single b-tree were used, then large doclist structures would have to be loaded from the database, amended to include the new docid and term-offset list, then written back to the database. Using multiple b-tree tables allows this to be avoided by creating a new b-tree which can be merged with the existing b-tree (or b-trees) later on. Merging of b-tree structures can be performed as a background task, or once a certain number of separate b-tree structures have been accumulated. Of course, this scheme makes queries more expensive (as the FTS code may have to look up individual terms in more than one b-tree and merge the results), but it has been found that in practice this overhead is often negligible." http://www.sqlite.org/fts3.html#section_8 ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Black, Michael (IS)OK...you're right...a reboot kills it. > I'm glad someone was able to reproduce this on Linux, ruling out the possibility it's a Windows-issue. > However, reboot again and add "select count(*) from a;" as the first line > of gendat2.sql > So if a simple SELECT COUNT(*) can speed up the same insert from 90 seconds to 7 seconds (including the count), does this confirm it's a sequential vs random reads problem? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Nico Williams> > I don't get it. You're reading practically the whole file in a random > manner, which is painfully slow, so why can't you read the file in one > fell swoop (i.e., sequential reads)?? > I'm only reading the whole file when the number of additional inserts is high enough to cause the whole index to be read from disk. But if I always pre-cache the database, it will downgrade performance for cases when only 10 inserts need to be done. And I'd like to avoid to have some fuzzy logic that tries to predicts which of the two methods is going to be faster. Besides, pre-caching the file sounds easier than it is to accomplish, because all methods suggested on this list did not work on Windows (for example copying the file to null). Windows and the harddrive have their own logic to decide which data to cache, and I haven't found a simple way to force a certain file into cache. Or, if FTS really works better, then use that. I will, but I'm trying to understand the issue that i'm facing, not just workaround it. It seems that FTS doesn't need to read the whole index from disk, so I'm trying to pinpoint the difference. My best guess is that it creates a fresh b-tree for the additional inserts, causing the boost in performance. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
OK...you're right...a reboot kills it. Here's the program I generated the data with: #include main() { char sql[4096]; char key[9]; long *lkey=key; int i; FILE *fp1,*fp2; fp1=fopen("gendat1.sql","w"); fp2=fopen("gendat2.sql","w"); fputs("CREATE TABLE t (a text);",fp1); fputs("CREATE INDEX tx on t(a);",fp1); fputs("BEGIN;",fp1); fputs("BEGIN;",fp2); for(i=0;i<110;++i) { int j; for(j=0;j<8;++j) { key[j]=(random()%26)+'a'; } if (i < 100) { fprintf(fp1,"insert into t values('%s');\n",key); } else { fprintf(fp2,"insert into t values(%ld);\n",*lkey); } } fputs("COMMIT;",fp1); fputs("COMMIT;",fp2); fclose(fp1); fclose(fp2); } time sqlite3 index.db < gendat1.sql 21.594u 5.497s 0:27.87 97.1%0+0k 0+0io 0pf+0w time sqlite3 index.db < gendat2.sql 12.302u 1.035s 0:04.46 74.6% 0+0k 0+0io 0pf+0w rm index.db time sqlite3 index.db < gendat1.sql 20.641u 6.477s 0:27.91 97.1%0+0k 0+0io 0pf+0w reboot time sqlite3 index.db < gendat2.sql 2.474u 1.219s 1:28.56 4.1% 0+0k 0+0io 0pf+0w However, reboot again and add "select count(*) from a;" as the first line of gendat2.sql time sqlite3 index.db /proc/sys/vm/drop_caches You don't see the same performance drop. So I'm wondering is this is disk cache instead of Linux cache. Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Fabian [fabianpi...@gmail.com] Sent: Wednesday, November 09, 2011 1:36 PM To: General Discussion of SQLite Database Subject: EXT :Re: [sqlite] INDEX Types 2011/11/9 Luuk <luu...@gmail.com> > > no, i did not reboot, I created two NEW databases, so rebooting should not > make a big difference because its not cached in both cases. If you just created it, why wouldn't it be in cache? You just written all the data, so it's highly likely to be in some kind of cache (OS,HD,etc.). As already explained to Michael, I experience no problems when I create the database, close it, re-open it, and add some more rows. The problem only appears if I reboot in between those steps. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On Wed, Nov 9, 2011 at 1:53 PM, Fabianwrote: > 2011/11/9 Nico Williams >> What's wrong with reading the whole file into memory at boot time as a >> way to prime the cache? Rebooting always takes some time, mostly the >> time to read all sorts of files. >> > It's a desktop application, I cannot pre-cache anything before the user > launches the app. After the launch a couple of inserts need to be done, but > having to read the whole database file into memory, just to be able to > perform those inserts, doesn't seem a scalable solution. I don't get it. You're reading practically the whole file in a random manner, which is painfully slow, so why can't you read the file in one fell swoop (i.e., sequential reads)?? You've picked a technology that scales only vertically with the hardware. If you've reached the limit of that hardware (no, you've not) then it's time to upgrade the hardware or switch to a technology that scales horizontally. But you're not there yet. And this is for a desktop app, so horizontal scaling would seem out of the question (or else you'd have built a client/server app in the first place). This leaves you with pre-fetching the file. You could defer the pre-fetching as much as possible (e.g., wait until the Nth operation, for N<<10,000), but that's about as much as you can do. Or, if FTS really works better, then use that. nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Nico Williams> > What's wrong with reading the whole file into memory at boot time as a > way to prime the cache? Rebooting always takes some time, mostly the > time to read all sorts of files. > > It's a desktop application, I cannot pre-cache anything before the user launches the app. After the launch a couple of inserts need to be done, but having to read the whole database file into memory, just to be able to perform those inserts, doesn't seem a scalable solution. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
Fabian, What's wrong with reading the whole file into memory at boot time as a way to prime the cache? Rebooting always takes some time, mostly the time to read all sorts of files. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On Wed, Nov 9, 2011 at 12:04 PM, Fabianwrote: > 2011/11/9 Simon Slavin >> Didn't someone recently note that entering the first million records was >> fast, but if he then closed and reopened the database, entering the next >> 100,000 records was slow ? >> > Yes, and there is still no real explanation for it, other than slow disk > reads. But even with very slow random disk I/O, 30 seconds seems still way > too slow for a 100MB file. I posted some math on this assuming typical disk seek times. 30 seconds is very much less than the worst case scenario. I don't have an answer regarding FTS. Nico -- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On 09-11-2011 20:14, Fabian wrote: 2011/11/9 LuukOn 09-11-2011 17:23, Black, Michael (IS) wrote: time sqlite3 $ time sqlite3 Did you do a reboot between the second insert? Because the difference I'm seeing is much larger than 38%? Did you test it on Linux or Windows? Linux (Linux opensuse 2.6.27.56-0.1-default #1 SMP 2010-12-01 16:57:58 +0100 x86_64 x86_64 x86_64 GNU/Linux) no, i did not reboot, I created two NEW databases, so rebooting should not make a big difference because its not cached in both cases. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Luuk> On 09-11-2011 17:23, Black, Michael (IS) wrote: > >> time sqlite3> 19.307u 0.082s 0:19.40 99.8%0+0k 0+0io 0pf+0w >> >> time sqlite3< index2.sql >> 19.266u 0.092s 0:19.37 99.8%0+0k 0+0io 0pf+0w >> >> > > $ time sqlite3 > real0m21.094s > user0m20.989s > sys 0m0.104s > $ time sqlite3 > real0m20.898s > user0m20.813s > sys 0m0.084s > $ time sqlite3 gendat1.sqlite > real2m32.701s > user0m26.038s > sys 0m5.256s > $ time sqlite3 gendat2.sqlite > real1m50.452s > user0m25.534s > sys 0m5.360s > > > The difference when creating a db on disk (size: 35Mb) is (152-110)/110= > 38% > > Did you do a reboot between the second insert? Because the difference I'm seeing is much larger than 38%? Did you test it on Linux or Windows? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On 09-11-2011 17:23, Black, Michael (IS) wrote: time sqlite3 $ time sqlite3 http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Black, Michael (IS)> Are you sure you're using BEGIN/COMMIT on your transactions? > Yes > I just used my benchmark data and inserted another 100,000 rows into the > database in 2.3 seconds. That is because you immediately insert those additional rows, after creating the database. I get the same excellent performance as you when I do that. But if you make sure the database is not in cache (for example, reboot the machine), you will see that the additional inserts will take MUCH longer than to be reasonally expected. Could you repeat your benchmark, with a reboot between the two transactions? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
Are you sure you're using BEGIN/COMMIT on your transactions? I just used my benchmark data and inserted another 100,000 rows into the database in 2.3 seconds. I made 1,100,000 records and cut the last 100,000 into a seperate file with BEGIN/COMMIT on both. time sqlite3 index.db > > Didn't someone recently note that entering the first million records was > fast, but if he then closed and reopened the database, entering the next > 100,000 records was slow ? > > Yes, and there is still no real explanation for it, other than slow disk reads. But even with very slow random disk I/O, 30 seconds seems still way too slow for a 100MB file. But today I made a new observation: if I create the same table as a virtual FTS4 table, I can add the additional rows within 1 second (even on an un-cached database file). So if the reason for the slowness is disk-reads, the FTS4 way of creating/updating the index requires much less reads? Maybe because it allows for seperate tree-branches? FTS is overkill for my simple requirements, but if it's so much faster than a regular index, why not? The only things that's holding me back from switching to FTS for this table is: - I cannot use the UNIQUE constraint, to disallow duplicate values. - I cannot search efficiently for rows that DON'T match a certain value, because FTS doesn't allow a single NOT operator. So can someone explain what FTS is doing behind the scenes that makes these additional inserts so much faster? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Simon Slavin> > Didn't someone recently note that entering the first million records was > fast, but if he then closed and reopened the database, entering the next > 100,000 records was slow ? > > Yes, and there is still no real explanation for it, other than slow disk reads. But even with very slow random disk I/O, 30 seconds seems still way too slow for a 100MB file. But today I made a new observation: if I create the same table as a virtual FTS4 table, I can add the additional rows within 1 second (even on an un-cached database file). So if the reason for the slowness is disk-reads, the FTS4 way of creating/updating the index requires much less reads? Maybe because it allows for seperate tree-branches? FTS is overkill for my simple requirements, but if it's so much faster than a regular index, why not? The only things that's holding me back from switching to FTS for this table is: - I cannot use the UNIQUE constraint, to disallow duplicate values. - I cannot search efficiently for rows that DON'T match a certain value, because FTS doesn't allow a single NOT operator. So can someone explain what FTS is doing behind the scenes that makes these additional inserts so much faster? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On 9 Nov 2011, at 4:42pm, Fabian wrote: > 2011/11/9 Black, Michael (IS)> >> Hmmm...appears to be the same for this case which, I must say, I find >> surprising. > > Thanks for actually benchmarking it. I'm also a bit surprised, because I > always thought SQLite handled INTEGER more efficiently than TEXT. In order to save filespace, SQLite stores integers not as a fixed number of bytes per value, but as a different number of bytes depending on the value. This would make things slower than being able to read a fixed number of bytes from disk and using double-word maths for the values you've read. > I also did some new benchmarks on the inserts-slowdown I experience, and to > rule out VirtualBox, I performed them on real hardware, but the results are > the same: > > 10 secs to create a database with 1 million rows, and 30 secs to add an > additional 10.000 rows, if the database file is un-cached. > > Maybe this is normal behaviour, but how can it possibly be that it's faster > to delete the whole database and re-create it, than just adding a > (relatively) small amount of additional rows? Didn't someone recently note that entering the first million records was fast, but if he then closed and reopened the database, entering the next 100,000 records was slow ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
2011/11/9 Black, Michael (IS)> Hmmm...appears to be the same for this case which, I must say, I find > surprising. > > Thanks for actually benchmarking it. I'm also a bit surprised, because I always thought SQLite handled INTEGER more efficiently than TEXT. I also did some new benchmarks on the inserts-slowdown I experience, and to rule out VirtualBox, I performed them on real hardware, but the results are the same: 10 secs to create a database with 1 million rows, and 30 secs to add an additional 10.000 rows, if the database file is un-cached. Maybe this is normal behaviour, but how can it possibly be that it's faster to delete the whole database and re-create it, than just adding a (relatively) small amount of additional rows? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
> But I'm wondering if SQLite can deal more efficiently with a INTEGER index > (64bits) VS an 8-byte TEXT column (also 64bits). I know the INTEGERs > require less disk-space because SQLite can store smaller values in fewer > bytes, but are there any other differences that make them more preferable > as an INDEX? If there is no difference in performance, I could just take > the first 8 characters of the TEXT column as a hash-value, instead of > calculating a CRC64 each time. My first thought was "sure, text will be compared byte-by-byte, 64-bit integers on a 64-bit CPU will be compared all in one operation". But then SQLite should read integer from database first and I'm not sure if it does some optimization to read the whole integer in one CPU operation, maybe it does that also byte-by-byte. So probably you won't notice much difference in performance of either way of indexing. Pavel On Wed, Nov 9, 2011 at 10:23 AM, Fabianwrote: > I'm having an issue where inserts on an un-cached database are very slow. > The reason probably is that a large part of the existing index needs to be > read from disk, to be able to insert new rows to the index. The length of > the values in the indexed column are around 60 bytes, so I'm thinking about > adding an extra column, containing a shorter hash (64bits) of the actual > value, and move the index to that column instead. This way the total size > of the index (and the database) will be much smaller, hopefully resulting > in faster inserts. > > But I'm wondering if SQLite can deal more efficiently with a INTEGER index > (64bits) VS an 8-byte TEXT column (also 64bits). I know the INTEGERs > require less disk-space because SQLite can store smaller values in fewer > bytes, but are there any other differences that make them more preferable > as an INDEX? If there is no difference in performance, I could just take > the first 8 characters of the TEXT column as a hash-value, instead of > calculating a CRC64 each time. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] INDEX Types
On 9 Nov 2011, at 3:23pm, Fabian wrote: > I'm having an issue where inserts on an un-cached database are very slow. > The reason probably is that a large part of the existing index needs to be > read from disk, to be able to insert new rows to the index. Are you doing a lot of INSERT commands one after another ? Things will go far faster if you wrap them in a transaction: BEGIN TRANSACTION; INSERT ...; INSERT ...; INSERT ...; END TRANSACTION; > The length of > the values in the indexed column are around 60 bytes, so I'm thinking about > adding an extra column, containing a shorter hash (64bits) of the actual > value, and move the index to that column instead. This way the total size > of the index (and the database) will be much smaller, hopefully resulting > in faster inserts. > > But I'm wondering if SQLite can deal more efficiently with a INTEGER index > (64bits) VS an 8-byte TEXT column (also 64bits). I know the INTEGERs > require less disk-space because SQLite can store smaller values in fewer > bytes, but are there any other differences that make them more preferable > as an INDEX? Yes. It is far faster to compare two integers than it is to compare two strings. This will speed up everything to do with the index, and make the index take up less space on disk. Of course, you will have to take the time to calculate a hash value for each string. > If there is no difference in performance, I could just take > the first 8 characters of the TEXT column as a hash-value, instead of > calculating a CRC64 each time. This would be a very fast HASHing calculation. If it's easy to program, then it might be worth a try. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users