Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-04 Thread Terry Cumaranatunge
I'll explain our needs a little more, and then perhaps you can tell me whether 
the current approach is the optimal one. In our application, if we were to use 
a single file, then with the billions of records we need to store, the file 
would easily exceed the 14TB maximum limit of a database file. The nature of 
the data is such that we really need many millions of tables, but the contents 
of each table are completely independent of the others; we never need to run a 
join or query multiple tables in a single query, and we always know exactly 
which database and table to use for a given query. This is why we wanted to 
fit as many tables into a database file as practical. No table is expected to 
have more than 5000 rows.
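
A minimal sketch of the kind of key-to-location routing described above; the 
hashing scheme, shard counts, and names are invented for illustration and are 
not from our actual code:

#include <stdio.h>

#define DBS_TOTAL     100000   /* number of database files */
#define TABLES_PER_DB 1000     /* tables within each file  */

/* Map a record key to the one database file and table that can hold it. */
static void route(unsigned long key, char *dbName, char *tblName, int n)
{
    unsigned long bucket = key % ((unsigned long)DBS_TOTAL * TABLES_PER_DB);
    snprintf(dbName,  n, "shard%05lu.db", bucket / TABLES_PER_DB);
    snprintf(tblName, n, "t%03lu",        bucket % TABLES_PER_DB);
}

int main(void)
{
    char db[64], tbl[64];
    route(1234567890UL, db, tbl, sizeof db);
    printf("key 1234567890 -> %s, table %s\n", db, tbl);
    return 0;
}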
 
I saw some posts in the past where Dr. Richard Hipp said it's fine to have 
hundreds or even thousands of tables per database file, as long as the 
application that uses the database is long-running and doesn't constantly open 
and close it. We would fit into this model. Performance is important in this 
app, which is why we started using WAL mode to improve write speeds; 70% of 
our operations are inserts and updates.
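
For reference, a minimal sketch of opening one of the files and enabling WAL 
(file name invented; error handling abbreviated):

#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open_v2("shard00001.db", &db,
                        SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE,
                        NULL) != SQLITE_OK) {
        fprintf(stderr, "%s\n", sqlite3_errmsg(db));
        return 1;
    }
    /* WAL is a property of the database file, so this persists across opens. */
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, NULL);
    sqlite3_close(db);
    return 0;
}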
 
Any thoughts on this approach?
 
Thanks

From: Simon Slavin 
>To: Terry Cumaranatunge ; General Discussion of SQLite 
>Database 
>Sent: Friday, September 2, 2011 12:41 PM
>Subject: Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens 
>increase
>
>
>On 2 Sep 2011, at 6:34pm, Terry Cumaranatunge wrote:
>
>> This does suggest that we should re-evaluate our design and not open as many 
>> databases from a single process. The original motivation was to limit the 
>> size of each database for performance reasons, which resulted in a large number 
>> of database files. I think we are going to try creating multiple tables per 
>> database and evaluate its performance.
>>  
>> Are there any general guidelines on a tested limit on the number of tables 
>> in a database or a general rule of thumb on the maximum recommended size for 
>> a database file?
>
>I'm concerned that you might be involved in premature optimization.  The 
>performance of SQLite does not greatly degrade with larger tables or larger 
>databases.  In fact you might be slowing your system down more by trying to 
>arrange multiple databases.  Have you tried just using SQLite in the naive way 
>-- with one big table in one database -- and seeing whether that provides a 
>solution too slow for you?
>
>Simon.
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Simon Slavin

On 2 Sep 2011, at 6:34pm, Terry Cumaranatunge wrote:

> This does suggest that we should re-evaluate our design and not open as many 
> databases from a single process. The original motivation was to limit the 
> size of each database for performance reasons, which resulted in a large number 
> of database files. I think we are going to try creating multiple tables per 
> database and evaluate its performance.
>  
> Are there any general guidelines on a tested limit on the number of tables in 
> a database or a general rule of thumb on the maximum recommended size for a 
> database file?

I'm concerned that you might be involved in premature optimization.  The 
performance of SQLite does not greatly degrade with larger tables or larger 
databases.  In fact you might be slowing your system down more by trying to 
arrange multiple databases.  Have you tried just using SQLite in the naive way 
-- with one big table in one database -- and seeing whether that provides a 
solution too slow for you?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Terry Cumaranatunge
Thanks everyone for your responses.
 
We modified SQLite's findReusableFd function to return 0 immediately 
(i.e., no file descriptor to reuse), and it significantly improved 
performance: the time to open 20,000 DBs went down from 75 secs to 24 secs. 
The modification to findReusableFd works for us because we don't plan on 
opening multiple connections to the same database from one process.
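
For reference, the change described amounts to a stub like this in SQLite's 
os_unix.c (signature as in the 3.7.x source; verify against your tree). It 
trades away the POSIX advisory-lock safety net that the scan exists to 
provide, so it is only safe when a process never opens two connections to 
the same database file:

static UnixUnusedFd *findReusableFd(const char *zPath, int flags){
  /* Report "no reusable descriptor" unconditionally, skipping the O(n)
  ** scan over already-open files. */
  return 0;
}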
 
This does suggest that we should re-evaluate our design and not open as many 
databases from a single process. The original motivation was to limit the size 
of each database for performance reasons, which resulted in a large number of 
database files. I think we are going to try creating multiple tables per 
database and evaluate its performance.
 
Are there any general guidelines on a tested limit on the number of tables in a 
database or a general rule of thumb on the maximum recommended size for a 
database file?

From: Pavel Ivanov 
>To: General Discussion of SQLite Database 
>Sent: Friday, September 2, 2011 9:28 AM
>Subject: Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens 
>increase
>
>> Does anyone know why the performance degrades this way and what can be done
>> to resolve the problem?
>
>Do you by any chance have shared cache turned on? I believe that when
>shared cache is used, SQLite searches through all open databases to
>check whether the one you want to open is already open and available
>for cache reuse.
>
>I also believe a similar search happens through all open file
>descriptors in the unix VFS, so that if several file descriptors to the
>same file are open, there are no issues with advisory locks being dropped
>on one descriptor while another is closing. And the data below seems to
>confirm that this is the problem for you.
>
>> I don't see the times you're seeing using this program...
>> time ./db 5000
>> 1.602s
>>
>> time ./db 10000
>> 5.357s
>>
>> time ./db 15000
>> 11.280s
>>
>> time ./db 20000
>> 19.570s
>>
>> time ./db 25000
>> 28.166s
>
>Michael,
>So in your test the first 5000 are opened in 1.6 seconds and the last
>5000 are opened in 8.6 seconds. I'd say that's a noticeable increase that
>can't be attributed to the OS.
>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>
>That's exactly what I talked about above. Roughly speaking,
>findReusableFd searches through all open file descriptors to see
>whether the process already has one for the file you want to open.
>More precisely, it looks for a descriptor for the same file that is
>still open even though the corresponding sqlite3* connection has
>already been closed (and yes, if you open 5 connections to the same
>file and then close 4 of them, SQLite will still keep 5 file
>descriptors open to avoid problems with POSIX advisory locks).
>
>
>Pavel
>
>
>On Fri, Sep 2, 2011 at 10:08 AM, Black, Michael (IS)
> wrote:
>> I profiled the sqlite3 test program below and I get this for 25,000 files
>>
>>
>>
>> Flat profile:
>>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>>  time   seconds   seconds    calls  ms/call  ms/call  name
>>  61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
>>  36.67     28.17    10.56    25000     0.42     0.42  findReusableFd
>>
>> For 5,000 files looks like this:
>>
>> Each sample counts as 0.01 seconds.
>>  %   cumulative   self              self     total
>>  time   seconds   seconds    calls  us/call  us/call  name
>>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>>
>>
>>
>> Michael D. Black
>>
>> Senior Scientist
>>
>> NG Information Systems
>>
>> Advanced Analytics Directorate
>>
>>
>>
>> 
>> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
>> behalf of Black, Michael (IS) [michael.bla...@ngc.com]
>> Sent: Friday, September 02, 2011 8:00 AM
>> To: General Discussion of SQLite Database
>> Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number of 
>> opens increase
>>
>> I assume you've overridden the system's default ulimit of 1024 open files?
>>
>> I don't see the times you're seeing using this program...

Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Jay A. Kreibich
On Fri, Sep 02, 2011 at 06:30:57AM -0500, Terry Cumaranatunge scratched on the 
wall:
> Hello,
> 
> We have an application that creates many small databases (over 100K) to be
> able to control the size of the database and provide more deterministic
> performance. At process startup, it opens many of these databases in a loop
> to keep them open for faster transaction response times. The behavior we are
> noticing is that it takes progressively longer for each
> sqlite3_open_v2 call to complete as the number of databases kept
> open increases. These are some of the measurements:
> 
> 5000 DBs = 3 secs
> 10000 DBs = 11 secs
> 15000 DBs = 35 secs
> 20000 DBs = 75 secs
> 
> Many processes can concurrently open 5000 DBs at the same time, and it takes
> about the same time as a single process doing the work. So it doesn't
> appear to be an OS-level issue with the number of opens.

  I'd guess the issue is OS related, but is at the process level, not
  the system level.  For example, if a process holds open file
  descriptors in an array, many operations (such as scanning for the
  lowest unused descriptor number) are going to be O(n).

  I would run a quick test that just calls the system-level open(2)
  call and see whether you observe the same type of slow-down.
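
  A minimal sketch of such a test, with every descriptor deliberately
  left open (file names invented):

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
    int i, fd, n = (argc > 1) ? atoi(argv[1]) : 5000;
    char name[64];
    for (i = 0; i < n; i++) {
        snprintf(name, sizeof name, "file%d", i);
        fd = open(name, O_RDWR | O_CREAT, 0644);  /* lowest unused fd: O(n) to find */
        if (fd < 0) { perror(name); exit(1); }
        /* deliberately never closed */
    }
    return 0;
}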

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Pavel Ivanov
> Does anyone know why the performance degrades this way and what can be done
> to resolve the problem?

Do you by any chance have shared cache turned on? I believe that when
shared cache is used, SQLite searches through all open databases to
check whether the one you want to open is already open and available
for cache reuse.
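
If shared cache might be on somewhere, it can be disabled process-wide
before any connections are opened; a quick sanity check could look like:

#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    /* Must run before any database connection is opened; afterwards
       sqlite3_open_v2 has no existing shared caches to scan through. */
    if (sqlite3_enable_shared_cache(0) != SQLITE_OK)
        fprintf(stderr, "could not disable shared cache\n");
    /* ... open databases as usual ... */
    return 0;
}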

I also believe a similar search happens through all open file
descriptors in the unix VFS, so that if several file descriptors to the
same file are open, there are no issues with advisory locks being dropped
on one descriptor while another is closing. And the data below seems to
confirm that this is the problem for you.

> I don't see the times you're seeing using this program...
> time ./db 5000
> 1.602s
>
> time ./db 10000
> 5.357s
>
> time ./db 15000
> 11.280s
>
> time ./db 20000
> 19.570s
>
> time ./db 25000
> 28.166s

Michael,
So in your test the first 5000 are opened in 1.6 seconds and the last
5000 are opened in 8.6 seconds. I'd say that's a noticeable increase that
can't be attributed to the OS.

> Each sample counts as 0.01 seconds.
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd

That's exactly what I talked about above. Roughly speaking,
findReusableFd searches through all open file descriptors to see
whether the process already has one for the file you want to open.
More precisely, it looks for a descriptor for the same file that is
still open even though the corresponding sqlite3* connection has
already been closed (and yes, if you open 5 connections to the same
file and then close 4 of them, SQLite will still keep 5 file
descriptors open to avoid problems with POSIX advisory locks).
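
On Linux this is easy to observe by counting entries in /proc/self/fd;
a rough sketch (exact counts vary with journal files and library internals):

#include <stdio.h>
#include <dirent.h>
#include "sqlite3.h"

/* Count this process's open descriptors (Linux-specific). */
static int count_fds(void)
{
    int n = 0;
    struct dirent *e;
    DIR *d = opendir("/proc/self/fd");
    if (!d) return -1;
    while ((e = readdir(d)) != NULL) n++;
    closedir(d);
    return n - 3;   /* roughly: ".", "..", and the DIR's own descriptor */
}

int main(void)
{
    sqlite3 *c[5];
    int i;
    for (i = 0; i < 5; i++)
        if (sqlite3_open("test.db", &c[i]) != SQLITE_OK) return 1;
    printf("after 5 opens:  %d fds\n", count_fds());
    for (i = 0; i < 4; i++)
        sqlite3_close(c[i]);
    /* The closed connections' descriptors may linger, held for reuse
       so that POSIX advisory locks aren't dropped prematurely. */
    printf("after 4 closes: %d fds\n", count_fds());
    sqlite3_close(c[4]);
    return 0;
}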


Pavel


On Fri, Sep 2, 2011 at 10:08 AM, Black, Michael (IS)
 wrote:
> I profiled the sqlite3 test program below and I get this for 25,000 files
>
>
>
> Flat profile:
>
> Each sample counts as 0.01 seconds.
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
>  36.67     28.17    10.56    25000     0.42     0.42  findReusableFd
>
> For 5,000 files looks like this:
>
> Each sample counts as 0.01 seconds.
>  %   cumulative   self              self     total
>  time   seconds   seconds    calls  us/call  us/call  name
>  52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
>  43.84      1.40     0.64     5000   128.00   128.00  findReusableFd
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
>
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
> behalf of Black, Michael (IS) [michael.bla...@ngc.com]
> Sent: Friday, September 02, 2011 8:00 AM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number of 
> opens increase
>
> I assume you've overridden the system's default ulimit of 1024 open files?
>
> I don't see the times you're seeing using this program...though my databases 
> are empty which probably makes a difference.
> I do see the gradual increase in time...I think this is almost all due to the 
> OS when you try and open thousands of files in one process.
>
> I ran this once to create 25,000 databases
> ./db 25000
> ls -l file* | wc -l
> 25000
> Then tested opening them
> time ./db 5000
> 1.602s
>
> time ./db 10000
> 5.357s
>
> time ./db 15000
> 11.280s
>
> time ./db 20000
> 19.570s
>
> time ./db 25000
> 28.166s
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include "sqlite3.h"
>
> int main(int argc,char *argv[])
> {
>  int i,rc;
>  sqlite3 *db;
>  for(i=0;i<atoi(argv[1]);i++) {
>    char name[4096];
>    sprintf(name,"file%d",i);
>    rc=sqlite3_open_v2(name,&db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX,NULL);
>    if (rc != SQLITE_OK) {
>        printf("%s",sqlite3_errmsg(db));
>        exit(1);
>    }
>    //sqlite3_close(db); // if you close the database it runs a LOT faster.
>    // So I don't think it's sqlite3 causing it.
>  }
>  return 0;
> }
>
> If you run this you'll see similar behavior
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
>
> int main(int argc,char *argv[])
> {
>  int i;
>  int fd;
>  for(i=0;i<atoi(argv[1]);i++) {
>    char name[4096];
>    sprintf(name,"%dfile",i);
>    /* the rest of the listing was lost in the archive; presumably: */
>    fd=open(name,O_RDWR|O_CREAT,0644);
>    if (fd < 0) { perror(name); exit(1); }
>  }
>  return 0;
> }

Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Black, Michael (IS)
I profiled the sqlite3 test program below and I get this for 25,000 files



Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 61.15     17.61    17.61    25000     0.70     0.70  findInodeInfo
 36.67     28.17    10.56    25000     0.42     0.42  findReusableFd

For 5,000 files looks like this:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 52.06      0.76     0.76     5000   152.00   152.00  findInodeInfo
 43.84      1.40     0.64     5000   128.00   128.00  findReusableFd



Michael D. Black

Senior Scientist

NG Information Systems

Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Black, Michael (IS) [michael.bla...@ngc.com]
Sent: Friday, September 02, 2011 8:00 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] sqlite3_open_v2 performance degrades as number of 
opens increase

I assume you've overridden the system's default ulimit of 1024 open files?

I don't see the times you're seeing using this program...though my databases 
are empty which probably makes a difference.
I do see the gradual increase in time...I think this is almost all due to the 
OS when you try and open thousands of files in one process.

I ran this once to create 25,000 databases
./db 25000
ls -l file* | wc -l
25000
Then tested opening them
time ./db 5000
1.602s

time ./db 10000
5.357s

time ./db 15000
11.280s

time ./db 20000
19.570s

time ./db 25000
28.166s

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

int main(int argc,char *argv[])
{
  int i,rc;
  sqlite3 *db;
  for(i=0;i<atoi(argv[1]);i++) {
    char name[4096];
    sprintf(name,"file%d",i);
    rc=sqlite3_open_v2(name,&db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX,NULL);
    if (rc != SQLITE_OK) {
        printf("%s",sqlite3_errmsg(db));
        exit(1);
    }
    //sqlite3_close(db); // if you close the database it runs a LOT faster.
    // So I don't think it's sqlite3 causing it.
  }
  return 0;
}

If you run this you'll see similar behavior
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc,char *argv[])
{
  int i;
  int fd;
  for(i=0;i<atoi(argv[1]);i++) {
    char name[4096];
    sprintf(name,"%dfile",i);
    /* the rest of the listing was lost in the archive; presumably: */
    fd=open(name,O_RDWR|O_CREAT,0644);
    if (fd < 0) { perror(name); exit(1); }
  }
  return 0;
}
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Black, Michael (IS)
I assume you've overridden the system's default ulimit of 1024 open files?
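
For a quick experiment, the soft limit can also be raised from inside the
process, up to the hard limit; a sketch:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        printf("soft=%llu hard=%llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);
        rl.rlim_cur = rl.rlim_max;   /* raise soft limit to the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");
    }
    return 0;
}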

I don't see the times you're seeing using this program...though my databases 
are empty which probably makes a difference.
I do see the gradual increase in time...I think this is almost all due to the 
OS when you try and open thousands of files in one process.

I ran this once to create 25,000 databases
./db 25000
ls -l file* | wc -l
25000
Then tested opening them
time ./db 5000
1.602s

time ./db 10000
5.357s

time ./db 15000
11.280s

time ./db 20000
19.570s

time ./db 25000
28.166s

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "sqlite3.h"

int main(int argc,char *argv[])
{
  int i,rc;
  sqlite3 *db;
  for(i=0;i<atoi(argv[1]);i++) {
    char name[4096];
    sprintf(name,"file%d",i);
    rc=sqlite3_open_v2(name,&db,SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE|SQLITE_OPEN_NOMUTEX,NULL);
    if (rc != SQLITE_OK) {
        printf("%s",sqlite3_errmsg(db));
        exit(1);
    }
    //sqlite3_close(db); // if you close the database it runs a LOT faster.
    // So I don't think it's sqlite3 causing it.
  }
  return 0;
}

If you run this you'll see similar behavior
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc,char *argv[])
{
  int i;
  int fd;
  for(i=0;i<atoi(argv[1]);i++) {
    char name[4096];
    sprintf(name,"%dfile",i);
    /* the rest of the listing was lost in the archive; presumably: */
    fd=open(name,O_RDWR|O_CREAT,0644);
    if (fd < 0) { perror(name); exit(1); }
  }
  return 0;
}
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Eric Minbiole
My first thought would be to check the amount of memory being used by
your many connections.  Each connection will consume a non-trivial
amount of resources (page cache, file handles, OS file cache, etc.).
It's certainly plausible that your overall system performance is
reduced as you run out of physical memory (or other system resources).
As such, I'd double-check your free/available memory as you open
more and more connections and see if there is a significant impact.

On a side note, trying to manage 100K or more separate databases
sounds excessive.  Doing so somewhat defeats the purpose of a nice
relational database.  If you properly index your tables, I would think
you could still achieve similar / reasonable performance, even after
combining the many smaller databases into fewer larger ones.  [Just my
2 cents.]
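
To make that concrete, a minimal sketch of one consolidated schema, where
the identity of each former small database/table becomes an indexed column
(all names invented):

#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    sqlite3 *db;
    /* One big table instead of many small ones; the composite primary
       key keeps per-bucket lookups cheap. */
    const char *ddl =
        "CREATE TABLE IF NOT EXISTS records("
        "  bucket_id INTEGER NOT NULL,"   /* which small table a row came from */
        "  rec_key   INTEGER NOT NULL,"
        "  payload   BLOB,"
        "  PRIMARY KEY(bucket_id, rec_key));";
    if (sqlite3_open("combined.db", &db) != SQLITE_OK) return 1;
    if (sqlite3_exec(db, ddl, NULL, NULL, NULL) != SQLITE_OK)
        fprintf(stderr, "%s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 0;
}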

On 9/2/11, Terry Cumaranatunge  wrote:
> Hello,
>
> We have an application that creates many small databases (over 100K) to be
> able to control the size of the database and provide more deterministic
> performance. At process startup, it opens many of these databases in a loop
> to keep them open for faster transaction response times. The behavior we are
> noticing is that it takes progressively longer for each
> sqlite3_open_v2 call to complete as the number of databases kept
> open increases. These are some of the measurements:
>
> 5000 DBs = 3 secs
> 10000 DBs = 11 secs
> 15000 DBs = 35 secs
> 20000 DBs = 75 secs
>
> Many processes can concurrently open 5000 DBs at the same time, and it takes
> about the same time as a single process doing the work. So it doesn't
> appear to be an OS-level issue with the number of opens.
>
> The open is done as:
> sqlite3_open_v2(dbName, &db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_NOMUTEX,
> NULL)
>
> We are using RHEL 6 with SQLite 3.7.7.1. Each database was created with
> journaling mode set to WAL.
> Does anyone know why the performance degrades this way and what can be done
> to resolve the problem?
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Simon Slavin

On 2 Sep 2011, at 12:30pm, Terry Cumaranatunge wrote:

> We have an application that creates many small databases (over 100K) to be
> able to control the size of the database and provide more deterministic
> performance. At process startup, it opens many of these databases in a loop
> to keep them open for faster transaction response times. The behavior we are
> noticing is that it takes progressively longer for each
> sqlite3_open_v2 call to complete as the number of databases kept
> open increases. These are some of the measurements:
> 
> 5000 DBs = 3 secs
> 10000 DBs = 11 secs
> 15000 DBs = 35 secs
> 20000 DBs = 75 secs
> 
> Many processes can concurrently open 5000 DBs at the same time, and it takes
> about the same time as a single process doing the work. So it doesn't
> appear to be an OS-level issue with the number of opens.

I suspect that your application is having to work through a big list each time 
you specify a database to work on.  This might be caused by your own code, or 
there might be something inside one of the SQLite routines that needs to keep a 
list of open connections.  How are you storing your connection handles?  In 
other words, what are you handing sqlite3_open_v2() for its sqlite3** value?  
Are you perhaps using a big array, or a linked list?

As you can probably guess, creating 20,000 databases with the same structure is 
something one doesn't often do, since this is what table columns are for.  
Although you have your reasons for doing so, you might find that merging them 
all into one database actually saves you time and memory.  Of course, I don't 
know how hard it will be to change your program to run a test with this way 
of doing things.
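
A rough sketch of how such a merge could be scripted with ATTACH, assuming
each small database holds one table t(rec_key, payload); all names here are
invented:

#include <stdio.h>
#include "sqlite3.h"

int main(void)
{
    sqlite3 *db;
    char *err = NULL;
    /* Pull one small database's rows into the combined database. */
    const char *sql =
        "CREATE TABLE IF NOT EXISTS t(rec_key INTEGER PRIMARY KEY, payload BLOB);"
        "ATTACH DATABASE 'shard00001.db' AS src;"
        "INSERT INTO t SELECT rec_key, payload FROM src.t;"
        "DETACH DATABASE src;";
    if (sqlite3_open("combined.db", &db) != SQLITE_OK) return 1;
    if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "%s\n", err);
        sqlite3_free(err);
    }
    sqlite3_close(db);
    return 0;
}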

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] sqlite3_open_v2 performance degrades as number of opens increase

2011-09-02 Thread Terry Cumaranatunge
Hello,

We have an application that creates many small databases (over 100K) to be
able to control the size of the database and provide more deterministic
performance. At process startup, it opens many of these databases in a loop
to keep them open for faster transaction response times. The behavior we are
noticing is that it takes progressively longer for each
sqlite3_open_v2 call to complete as the number of databases kept
open increases. These are some of the measurements:

5000 DBs = 3 secs
10000 DBs = 11 secs
15000 DBs = 35 secs
20000 DBs = 75 secs

Many processes can concurrently open 5000 DBs at the same time, and it takes
about the same time as a single process doing the work. So it doesn't
appear to be an OS-level issue with the number of opens.

The open is done as:
sqlite3_open_v2(dbName, &db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_NOMUTEX,
NULL)

We are using RHEL 6 with SQLite 3.7.7.1. Each database was created with
journaling mode set to WAL.
Does anyone know why the performance degrades this way and what can be done
to resolve the problem?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users