Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
I'll explain our needs a little more, and then perhaps you can tell me whether the current approach is the right one.

In our application, if we were to use a single file, the billions of records we need to store would easily exceed the 14 TB maximum size of a database file. The nature of the data is such that we really need many millions of tables, but the contents of each table are completely independent of the others; we never need to run a join or query across multiple tables. We also know the exact database and table name to use for each query up front, so we never have to search for the right table. That is why we wanted to fit as many tables into each database file as practical. No table is expected to have more than 5000 rows.

I saw some posts in the past where Dr. Richard Hipp said it's fine to have hundreds or even thousands of tables per database file, as long as the application that uses the database is long-running and doesn't constantly open and close it. We fit that model. Performance is important in this application, which is why we started using WAL mode to improve write speeds; 70% of our operations are inserts and updates.

Any thoughts on this approach? Thanks.

> From: Simon Slavin
> To: Terry Cumaranatunge; General Discussion of SQLite Database
> Sent: Friday, September 2, 2011 12:41 PM
> Subject: Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
>
> On 2 Sep 2011, at 6:34pm, Terry Cumaranatunge wrote:
>
>> This does suggest that we should re-evaluate our design and not open as
>> many databases from a single process. The original motivation was to limit
>> the size of each database for performance reasons, which resulted in a
>> large number of database files. I think we are going to try creating
>> multiple tables per database and evaluate its performance.
>>
>> Are there any general guidelines on a tested limit on the number of tables
>> in a database, or a general rule of thumb on the maximum recommended size
>> for a database file?
>
> I'm concerned that you might be involved in premature optimization. The
> performance of SQLite does not greatly degrade with larger tables or larger
> databases. In fact, you might be slowing your system down more by trying to
> arrange multiple databases. Have you tried just using SQLite in the naive
> way -- with one big table in one database -- and seeing whether that
> provides a solution that is too slow for you?
>
> Simon.
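[Editor's note: the design described above relies on deriving the target database file and table name from the record itself. The sketch below shows roughly how such a mapping could work; the bucket counts, file-name pattern, and hash function are illustrative assumptions, not anything from Terry's application.]

    /* Hypothetical sketch: map a record key to one of N database files and a
    ** table inside that file, so each query touches exactly one known table.
    ** The bucket counts, naming scheme, and hash are illustrative assumptions. */
    #include <stdio.h>
    #include <stdint.h>

    #define N_DB_FILES    20000   /* assumed number of database files  */
    #define TABLES_PER_DB 1000    /* assumed number of tables per file */

    static uint64_t hash_key(const char *key){
      uint64_t h = 1469598103934665603ULL;                 /* FNV-1a */
      while( *key ) h = (h ^ (unsigned char)*key++) * 1099511628211ULL;
      return h;
    }

    /* Fill in "db-%05u.sqlite" and "t%04u" for the given record key. */
    static void locate(const char *key, char *dbName, size_t dbSz,
                       char *tblName, size_t tblSz){
      uint64_t h = hash_key(key);
      snprintf(dbName, dbSz, "db-%05u.sqlite", (unsigned)(h % N_DB_FILES));
      snprintf(tblName, tblSz, "t%04u",
               (unsigned)((h / N_DB_FILES) % TABLES_PER_DB));
    }

    int main(void){
      char db[64], tbl[32];
      locate("subscriber:12345", db, sizeof db, tbl, sizeof tbl);
      printf("key -> %s / %s\n", db, tbl);
      return 0;
    }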
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
On 2 Sep 2011, at 6:34pm, Terry Cumaranatunge wrote:

> This does suggest that we should re-evaluate our design and not open as many
> databases from a single process. The original motivation was to limit the
> size of each database for performance reasons, which resulted in a large
> number of database files. I think we are going to try creating multiple
> tables per database and evaluate its performance.
>
> Are there any general guidelines on a tested limit on the number of tables
> in a database, or a general rule of thumb on the maximum recommended size
> for a database file?

I'm concerned that you might be involved in premature optimization. The
performance of SQLite does not greatly degrade with larger tables or larger
databases. In fact, you might be slowing your system down more by trying to
arrange multiple databases. Have you tried just using SQLite in the naive way
-- with one big table in one database -- and seeing whether that provides a
solution that is too slow for you?

Simon.
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
Thanks, everyone, for your responses.

We modified SQLite's findReusableFd function to return 0 immediately (i.e. no
file descriptor to reuse), and it significantly improved performance: the time
to open 20K DBs went down from 75 secs to 24 secs. The modification to
findReusableFd works for us because we don't plan on opening multiple
connections to the same database from one process.

This does suggest that we should re-evaluate our design and not open as many
databases from a single process. The original motivation was to limit the size
of each database for performance reasons, which resulted in a large number of
database files. I think we are going to try creating multiple tables per
database and evaluate its performance.

Are there any general guidelines on a tested limit on the number of tables in
a database, or a general rule of thumb on the maximum recommended size for a
database file?

> From: Pavel Ivanov
> To: General Discussion of SQLite Database
> Sent: Friday, September 2, 2011 9:28 AM
> Subject: Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
>
>> Does anyone know why the performance degrades this way and what can be
>> done to resolve the problem?
>
> Do you have by any chance shared cache turned on? I believe that when
> shared cache is used, SQLite searches through all open databases to check
> whether the one you want to open is already opened and available for cache
> reuse.
>
> I also believe a similar search happens through all open file descriptors
> in the Unix VFS, so that if several file descriptors to the same file are
> open, there are no issues with dropping advisory locks on one descriptor
> while another is closing. The data below seems to show that this is the
> problem for you.
>
>> I don't see the times you're seeing using this program...
>> time ./db 5000
>> 1.602s
>>
>> time ./db 10000
>> 5.357s
>>
>> time ./db 15000
>> 11.280s
>>
>> time ./db 20000
>> 19.570s
>>
>> time ./db 25000
>> 28.166s
>
> Michael,
> So in your test the first 5000 are opened in 1.6 seconds and the last 5000
> are opened in 8.6 seconds. I'd say that's a noticeable increase that can't
> be attributed to the OS.
>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>
> That's exactly what I talked about above. Roughly speaking, findReusableFd
> searches through all open file descriptors to see whether the process
> already has one for the file you want to open. More precisely, it looks for
> an fd for the same file that is still open even though the corresponding
> sqlite3* connection has already been closed (and yes, if you open 5
> connections to the same file and then close 4 of them, SQLite will still
> keep 5 file descriptors open to avoid problems with POSIX advisory locks).
>
> Pavel
>
> On Fri, Sep 2, 2011 at 10:08 AM, Black, Michael (IS) wrote:
>> I profiled the sqlite3 test program below and I get this for 25,000 files
>>
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>  61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
>>  36.67     28.17    10.56    25000     0.42     0.42  findReusableFd
>>
>> For 5,000 files it looks like this:
>>
>> Each sample counts as 0.01 seconds.
>>   %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>>
>> Michael D. Black
>> Senior Scientist
>> NG Information Systems
>> Advanced Analytics Directorate
>>
>> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
>> behalf of Black, Michael (IS) [michael.bla...@ngc.com]
>> Sent: Friday, September 02, 2011 8:00 AM
>> To: General Discussion of SQLite Database
>> Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number
>> of opens increase
>>
>> I assume you've overridden the system default of 1024 open files in ulimit?
>>
>> I don't see the times you're seeing using this program...
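[Editor's note: the one-line change Terry describes at the top of this message would look roughly like the sketch below. This is not the verbatim patch; the function lives in SQLite's os_unix.c and the signature is approximated from the 3.7.x source, so treat the details as assumptions.]

    /* Sketch of the modification described above (approximated from SQLite
    ** 3.7.x os_unix.c, not the actual patch).  The stock function scans the
    ** list of open descriptors looking for one it can reuse; the change
    ** short-circuits that scan.  Safe only if the process never opens more
    ** than one connection to the same database file. */
    static UnixUnusedFd *findReusableFd(const char *zPath, int flags){
      return 0;   /* never reuse a descriptor; skip the O(n) search */
    }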
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
On Fri, Sep 02, 2011 at 06:30:57AM -0500, Terry Cumaranatunge scratched on the wall:

> Hello,
>
> We have an application that creates many small databases (over 100K) to be
> able to control the size of each database and provide more deterministic
> performance. At process startup, it opens many of these databases in a loop
> to keep them open for faster transaction response times. The behavior we
> are noticing is that it takes progressively longer for each sqlite3_open_v2
> to complete as the number of databases kept open increases. These are some
> of the measurements:
>
> 5000 DBs = 3 secs
> 10000 DBs = 11 secs
> 15000 DBs = 35 secs
> 20000 DBs = 75 secs
>
> Many processes can concurrently open 5000 DBs at the same time, and it
> takes about the same time as a single process doing the work. So it doesn't
> appear to be related to an OS limit on the number of opens.

I'd guess the issue is OS related, but at the process level, not the system
level. For example, if a process holds open file descriptors in an array,
many operations (such as scanning for the lowest unused descriptor number)
are going to be O(n).

I would run a quick test that just calls the system-level open(2) call, and
see whether you observe the same type of slow-down.

  -j

--
Jay A. Kreibich < J A Y @ K R E I B I . C H >

"Intelligence is like underwear: it is important that you have it, but
showing it to the wrong people has the tendency to make them feel
uncomfortable." -- Angela Johnson
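[Editor's note: the experiment Jay suggests, timing nothing but raw open(2) calls with no SQLite involved, could look something like the sketch below. The batch size, file-name pattern, and flags are illustrative choices, not part of anything posted in this thread.]

    /* Sketch: time raw open(2) calls and report cumulative elapsed time every
    ** 5000 opens, to see whether per-open cost grows as the number of
    ** descriptors held open increases.  Names and flags are arbitrary. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <time.h>

    int main(int argc, char *argv[]){
      int n = (argc > 1) ? atoi(argv[1]) : 25000;
      struct timespec t0, t1;
      int i;
      clock_gettime(CLOCK_MONOTONIC, &t0);
      for(i = 0; i < n; i++){
        char name[64];
        snprintf(name, sizeof name, "rawfile%d", i);
        if( open(name, O_RDWR | O_CREAT, 0644) < 0 ){ perror(name); exit(1); }
        if( (i + 1) % 5000 == 0 ){      /* cumulative elapsed time so far */
          clock_gettime(CLOCK_MONOTONIC, &t1);
          printf("%5d opens: %.3f s\n", i + 1,
                 (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        }
        /* descriptors are deliberately left open, as in the original test */
      }
      return 0;
    }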
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
> Does anyone know why the performance degrades this way and what can be done
> to resolve the problem?

Do you have by any chance shared cache turned on? I believe that when shared
cache is used, SQLite searches through all open databases to check whether
the one you want to open is already opened and available for cache reuse.

I also believe a similar search happens through all open file descriptors in
the Unix VFS, so that if several file descriptors to the same file are open,
there are no issues with dropping advisory locks on one descriptor while
another is closing. The data below seems to show that this is the problem
for you.

> I don't see the times you're seeing using this program...
> time ./db 5000
> 1.602s
>
> time ./db 10000
> 5.357s
>
> time ./db 15000
> 11.280s
>
> time ./db 20000
> 19.570s
>
> time ./db 25000
> 28.166s

Michael,
So in your test the first 5000 are opened in 1.6 seconds and the last 5000
are opened in 8.6 seconds. I'd say that's a noticeable increase that can't be
attributed to the OS.

> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd

That's exactly what I talked about above. Roughly speaking, findReusableFd
searches through all open file descriptors to see whether the process already
has one for the file you want to open. More precisely, it looks for an fd for
the same file that is still open even though the corresponding sqlite3*
connection has already been closed (and yes, if you open 5 connections to the
same file and then close 4 of them, SQLite will still keep 5 file descriptors
open to avoid problems with POSIX advisory locks).

Pavel

On Fri, Sep 2, 2011 at 10:08 AM, Black, Michael (IS) wrote:
> I profiled the sqlite3 test program below and I get this for 25,000 files
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
>  36.67     28.17    10.56    25000     0.42     0.42  findReusableFd
>
> For 5,000 files it looks like this:
>
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>
> Michael D. Black
> Senior Scientist
> NG Information Systems
> Advanced Analytics Directorate
>
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
> behalf of Black, Michael (IS) [michael.bla...@ngc.com]
> Sent: Friday, September 02, 2011 8:00 AM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number
> of opens increase
>
> I assume you've overridden the system default of 1024 open files in ulimit?
>
> I don't see the times you're seeing using this program...though my databases
> are empty, which probably makes a difference.
> I do see the gradual increase in time... I think this is almost all due to
> the OS when you try to open thousands of files in one process.
> I ran this once to create 25,000 databases:
> ./db 25000
> ls -l file* | wc -l
> 25000
>
> Then tested opening them:
> time ./db 5000
> 1.602s
>
> time ./db 10000
> 5.357s
>
> time ./db 15000
> 11.280s
>
> time ./db 20000
> 19.570s
>
> time ./db 25000
> 28.166s
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include "sqlite3.h"
>
> int main(int argc,char *argv[])
> {
>   int i,rc;
>   sqlite3 *db;
>   for(i=0;i<atoi(argv[1]);++i) {
>     char name[4096];
>     sprintf(name,"file%d",i);
>     rc=sqlite3_open_v2(name,&db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX,NULL);
>     if (rc != SQLITE_OK) {
>       printf("%s",sqlite3_errmsg(db));
>       exit(1);
>     }
>     //sqlite3_close(db); // if you close the database it runs a LOT faster.
>     // So I don't think it's sqlite3 causing it.
>   }
>   return 0;
> }
>
> If you run this you'll see similar behavior:
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
>
> int main(int argc,char *argv[])
> {
>   int i;
>   int fd;
>   for(i=0;i<atoi(argv[1]);++i) {
>     char name[4096];
>     sprintf(name,"%dfile",i);
>     fd=open(name,O_RDWR|O_CREAT,0644);  /* descriptor intentionally not closed */
>   }
>   return 0;
> }
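[Editor's note: Pavel's parenthetical point, that SQLite must keep old descriptors open because closing any one of them would drop the process's POSIX advisory locks on that file, is standard fcntl() record-locking behavior. The standalone sketch below (file name and lock details are arbitrary, not from SQLite) demonstrates it by having child processes probe the lock before and after an unrelated descriptor is closed.]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* Probe the lock from a child process: F_GETLK only reports locks held
    ** by *other* processes, so the check has to run outside the parent. */
    static void probe(const char *when){
      if( fork() == 0 ){
        struct flock fl;
        int fd = open("lockdemo.tmp", O_RDWR);
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;            /* ask: could I take a write lock? */
        fl.l_whence = SEEK_SET;
        fcntl(fd, F_GETLK, &fl);
        printf("%s: parent's lock is %s\n", when,
               fl.l_type == F_UNLCK ? "GONE" : "still held");
        _exit(0);
      }
      wait(NULL);
    }

    int main(void){
      struct flock fl;
      int fd1, fd2;
      fd1 = open("lockdemo.tmp", O_RDWR | O_CREAT, 0644);
      fd2 = open("lockdemo.tmp", O_RDWR);       /* second fd, same file */

      memset(&fl, 0, sizeof fl);
      fl.l_type = F_WRLCK;                      /* lock whole file via fd1 */
      fl.l_whence = SEEK_SET;
      fcntl(fd1, F_SETLK, &fl);

      probe("before close(fd2)");               /* expect: still held */
      close(fd2);   /* closing ANY fd to the file drops all of this
                    ** process's locks on it, even those taken via fd1 */
      probe("after  close(fd2)");               /* expect: GONE */
      return 0;
    }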
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
I profiled the sqlite3 test program below and I get this for 25,000 files

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
 36.67     28.17    10.56    25000     0.42     0.42  findReusableFd

For 5,000 files it looks like this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
 43.84      1.40     0.64     5000   128.00   128.00  findReusableFd

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate

From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
behalf of Black, Michael (IS) [michael.bla...@ngc.com]
Sent: Friday, September 02, 2011 8:00 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number of
opens increase

I assume you've overridden the system default of 1024 open files in ulimit?

I don't see the times you're seeing using this program...though my databases
are empty, which probably makes a difference.
I do see the gradual increase in time... I think this is almost all due to
the OS when you try to open thousands of files in one process.

I ran this once to create 25,000 databases:
./db 25000
ls -l file* | wc -l
25000

Then tested opening them:
time ./db 5000
1.602s

time ./db 10000
5.357s

time ./db 15000
11.280s

time ./db 20000
19.570s

time ./db 25000
28.166s

[test programs snipped; they appear in full in the original message below]
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
I assume you've overridden the system default of 1024 open files in ulimit?

I don't see the times you're seeing using this program...though my databases
are empty, which probably makes a difference.
I do see the gradual increase in time... I think this is almost all due to
the OS when you try to open thousands of files in one process.

I ran this once to create 25,000 databases:
./db 25000
ls -l file* | wc -l
25000

Then tested opening them:
time ./db 5000
1.602s

time ./db 10000
5.357s

time ./db 15000
11.280s

time ./db 20000
19.570s

time ./db 25000
28.166s

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

int main(int argc,char *argv[])
{
  int i,rc;
  sqlite3 *db;
  for(i=0;i<atoi(argv[1]);++i) {
    char name[4096];
    sprintf(name,"file%d",i);
    rc=sqlite3_open_v2(name,&db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX,NULL);
    if (rc != SQLITE_OK) {
      printf("%s",sqlite3_errmsg(db));
      exit(1);
    }
    //sqlite3_close(db); // if you close the database it runs a LOT faster.
    // So I don't think it's sqlite3 causing it.
  }
  return 0;
}

If you run this you'll see similar behavior:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc,char *argv[])
{
  int i;
  int fd;
  for(i=0;i<atoi(argv[1]);++i) {
    char name[4096];
    sprintf(name,"%dfile",i);
    fd=open(name,O_RDWR|O_CREAT,0644);  /* descriptor intentionally not closed */
  }
  return 0;
}
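[Editor's note: Michael's ulimit question matters because a default RLIMIT_NOFILE of 1024 would make it impossible to hold 25,000 descriptors open at all. Besides raising the shell ulimit, the limit can be raised from inside the test program itself; a minimal sketch follows, where the target of 30000 is an arbitrary choice and the hard limit may still cap it.]

    #include <stdio.h>
    #include <sys/resource.h>

    /* Raise this process's open-file limit before opening thousands of
    ** databases.  30000 is an arbitrary target for the 25,000-file test. */
    static int raise_fd_limit(void){
      struct rlimit rl;
      if( getrlimit(RLIMIT_NOFILE, &rl) != 0 ) return -1;
      if( rl.rlim_cur < 30000 ){
        rl.rlim_cur = 30000;
        if( rl.rlim_cur > rl.rlim_max ) rl.rlim_cur = rl.rlim_max; /* clamp */
        if( setrlimit(RLIMIT_NOFILE, &rl) != 0 ){
          perror("setrlimit");
          return -1;
        }
      }
      printf("open-file limit now %lu (hard limit %lu)\n",
             (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
      return 0;
    }

    int main(void){
      return raise_fd_limit() == 0 ? 0 : 1;
    }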
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
My first thought would be to check the amount of memory being used by your
many connections. Each connection consumes a non-trivial amount of resources
(page cache, file handles, OS file cache, etc.). It's certainly plausible
that your overall system performance drops as you run out of physical memory
(or other system resources). As such, I'd double-check your free / available
memory as you open more and more connections and see whether there is a
significant impact.

On a side note, trying to manage 100K or more separate databases sounds
excessive. Doing so somewhat defeats the purpose of a nice relational
database. If you properly index your tables, I would think you could still
achieve similar, reasonable performance even after combining the many smaller
databases into fewer larger ones. [Just my 2 cents.]

On 9/2/11, Terry Cumaranatunge wrote:
> Hello,
>
> We have an application that creates many small databases (over 100K) to be
> able to control the size of each database and provide more deterministic
> performance. At process startup, it opens many of these databases in a loop
> to keep them open for faster transaction response times. The behavior we
> are noticing is that it takes progressively longer for each sqlite3_open_v2
> to complete as the number of databases kept open increases. These are some
> of the measurements:
>
> 5000 DBs = 3 secs
> 10000 DBs = 11 secs
> 15000 DBs = 35 secs
> 20000 DBs = 75 secs
>
> Many processes can concurrently open 5000 DBs at the same time, and it
> takes about the same time as a single process doing the work. So it doesn't
> appear to be related to an OS limit on the number of opens.
>
> The open is done as:
> sqlite3_open_v2(dbName, db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_NOMUTEX,
> NULL)
>
> We are using RHEL 6 with SQLite 3.7.7.1. Each database was created with
> journaling mode set to WAL.
>
> Does anyone know why the performance degrades this way and what can be done
> to resolve the problem?
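[Editor's note: one way to put a number on the per-connection memory cost this reply warns about is SQLite's own accounting: sqlite3_memory_used() reports the library's current heap usage, so sampling it while opening connections shows how the footprint grows. A rough sketch; the file-name pattern and connection count are arbitrary.]

    #include <stdio.h>
    #include <stdlib.h>
    #include "sqlite3.h"

    /* Open N databases and report SQLite's heap usage every 1000 connections.
    ** This only counts SQLite's own allocations, not OS-level costs such as
    ** descriptor tables or kernel page cache. */
    int main(int argc, char *argv[]){
      int n = (argc > 1) ? atoi(argv[1]) : 5000;
      int i;
      for(i = 0; i < n; i++){
        sqlite3 *db;
        char name[64];
        snprintf(name, sizeof name, "file%d", i);
        if( sqlite3_open_v2(name, &db,
              SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_NOMUTEX,
              NULL) != SQLITE_OK ){
          fprintf(stderr, "%s: %s\n", name, sqlite3_errmsg(db));
          return 1;
        }
        if( (i + 1) % 1000 == 0 ){
          printf("%5d connections open, sqlite heap = %lld bytes\n",
                 i + 1, (long long)sqlite3_memory_used());
        }
        /* connections intentionally left open, matching the original scenario */
      }
      return 0;
    }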
Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase
On 2 Sep 2011, at 12:30pm, Terry Cumaranatunge wrote:

> We have an application that creates many small databases (over 100K) to be
> able to control the size of each database and provide more deterministic
> performance. At process startup, it opens many of these databases in a loop
> to keep them open for faster transaction response times. The behavior we
> are noticing is that it takes progressively longer for each sqlite3_open_v2
> to complete as the number of databases kept open increases. These are some
> of the measurements:
>
> 5000 DBs = 3 secs
> 10000 DBs = 11 secs
> 15000 DBs = 35 secs
> 20000 DBs = 75 secs
>
> Many processes can concurrently open 5000 DBs at the same time, and it
> takes about the same time as a single process doing the work. So it doesn't
> appear to be related to an OS limit on the number of opens.

I suspect that your application is having to work through a big list each
time you specify a database to work on. This might be caused by your own
code, or there might be something inside one of the SQLite routines that
needs to keep a list of open connections. How are you storing your connection
handles? In other words, what are you handing sqlite3_open_v2() for its
sqlite3** value? Are you perhaps using a big array, or a linked list?

As you can probably guess, creating 20,000 databases with the same structure
is something one doesn't often do, since this is what table columns are for.
Although you have your reasons for doing so, you might find that merging them
all into one database actually saves you time and memory. Of course, I don't
know how hard it would be to change your program to run a test with this way
of doing things.

Simon.
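[Editor's note: Simon's merge-into-one-database suggestion amounts to replacing the per-dataset databases and tables with a single table keyed by a dataset column. The schema and column names below are purely illustrative, not something proposed elsewhere in the thread; they sketch what such a consolidated layout could look like.]

    #include <stdio.h>
    #include "sqlite3.h"

    /* Sketch: one database, one table, with an indexed dataset_id column in
    ** place of many separate tables/files.  Names are illustrative only. */
    int main(void){
      sqlite3 *db;
      char *err = 0;
      const char *ddl =
        "PRAGMA journal_mode=WAL;"
        "CREATE TABLE IF NOT EXISTS records("
        "  dataset_id INTEGER NOT NULL,"      /* which former database/table */
        "  k          TEXT    NOT NULL,"      /* key within that dataset     */
        "  v          BLOB,"
        "  PRIMARY KEY(dataset_id, k)"
        ");";
      if( sqlite3_open("combined.sqlite", &db) != SQLITE_OK ) return 1;
      if( sqlite3_exec(db, ddl, 0, 0, &err) != SQLITE_OK ){
        fprintf(stderr, "%s\n", err);
        sqlite3_free(err);
        return 1;
      }
      /* Every query then names the dataset explicitly, e.g.:
      **   SELECT v FROM records WHERE dataset_id=?1 AND k=?2;
      ** which the (dataset_id, k) primary key satisfies with one index probe. */
      sqlite3_close(db);
      return 0;
    }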
[sqlite] sqlite3_open_v2 performance degrades as number of opens increase
Hello,

We have an application that creates many small databases (over 100K) to be
able to control the size of each database and provide more deterministic
performance. At process startup, it opens many of these databases in a loop
to keep them open for faster transaction response times. The behavior we are
noticing is that it takes progressively longer for each sqlite3_open_v2 to
complete as the number of databases kept open increases. These are some of
the measurements:

5000 DBs = 3 secs
10000 DBs = 11 secs
15000 DBs = 35 secs
20000 DBs = 75 secs

Many processes can concurrently open 5000 DBs at the same time, and it takes
about the same time as a single process doing the work. So it doesn't appear
to be related to an OS limit on the number of opens.

The open is done as:
sqlite3_open_v2(dbName, db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_NOMUTEX, NULL)

We are using RHEL 6 with SQLite 3.7.7.1. Each database was created with
journaling mode set to WAL.

Does anyone know why the performance degrades this way and what can be done
to resolve the problem?
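[Editor's note: as a point of reference for the replies above, a startup loop of the kind described here might look roughly like the sketch below. The array size, file-name pattern, and the PRAGMA call are assumptions about Terry's code, not something he posted; journal_mode=WAL is a persistent property of the database file, so it only needs to be set when each database is created.]

    #include <stdio.h>
    #include <stdlib.h>
    #include "sqlite3.h"

    #define N_DBS 20000                   /* assumed number of databases */

    static sqlite3 *handles[N_DBS];       /* assumed: handles kept in an array */

    int main(void){
      int i;
      for(i = 0; i < N_DBS; i++){
        char name[64];
        snprintf(name, sizeof name, "db-%05d.sqlite", i);  /* hypothetical naming */
        if( sqlite3_open_v2(name, &handles[i],
              SQLITE_OPEN_READWRITE | SQLITE_OPEN_NOMUTEX, NULL) != SQLITE_OK ){
          fprintf(stderr, "%s: %s\n", name, sqlite3_errmsg(handles[i]));
          exit(1);
        }
        /* WAL is set once per database at creation time; shown here only
        ** for completeness, since it persists in the file. */
        sqlite3_exec(handles[i], "PRAGMA journal_mode=WAL;", 0, 0, 0);
      }
      /* ... handles stay open for the life of the process ... */
      return 0;
    }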