Re: [sqlite] Concurrent readonly access to a large database.
> Anyway, the iostat output of my system is
>
> 2011 Oct 22 21:16:36, load: 0.03, disk_r: 2676 KB, disk_w: 0 KB
>
> UID  PID  PPID  CMD      DEVICE  MAJ  MIN  D  BYTES
> 0    0    0     ??               14   8       65536
> 503  732  730   sqlite3  ??      14   14   R  1323008
> 503  731  730   sqlite3  ??      14   14   R  1355776
>
> If I understand correctly, the IO load is only 3% when two sqlite3
> processes are running

No, 0.03 is your system's CPU load. I'm sorry, I had Linux's iostat output format in mind. You apparently have something else (Mac OS X?), and its iostat has a completely different output format.

Pavel

On Sat, Oct 22, 2011 at 11:05 PM, Bo Peng wrote:
>> It's not only speed in KB/sec that matters. It's also disk system
>> usage as reported by iostat. If it's close to 100% then SQLite can't
>> do any better.
>
> A sad day.
>
> I copied the database to a faster drive with RAID 0, made another
> copy of the database (referred to as DB1), and ran another set of
> tests:
>
> test1: two sequential processes of sqlite count(*) on table1 and table2
> in DB1 --> 7m15s
>
> test2: two concurrent processes of sqlite count(*) on table1 and table2
> in DB1 --> 5m22s
>
> test3: four concurrent processes of sqlite count(*) on tables 1, 2, 3,
> and 4 in DB1 --> 12m58s
>
> test4: two concurrent processes of sqlite count(*) on table1 in DB1 and
> table1 in DB2 --> 9m51s
>
> Although running two or more processes can save some time, the
> performance gain is not that big (tests 2 and 3), and splitting the
> database into several smaller ones would not help either (test 4).
>
> Anyway, the iostat output of my system is
>
> 2011 Oct 22 21:16:36, load: 0.03, disk_r: 2676 KB, disk_w: 0 KB
>
> UID  PID  PPID  CMD      DEVICE  MAJ  MIN  D  BYTES
> 0    0    0     ??               14   8       65536
> 503  732  730   sqlite3  ??      14   14   R  1323008
> 503  731  730   sqlite3  ??      14   14   R  1355776
>
> If I understand correctly, the IO load is only 3% when two sqlite3
> processes are running, so perhaps I can still tweak sqlite3 to run
> faster. I will also copy the database around and see if other disks
> (SSD?), operating systems (Linux?), and file systems can provide better
> performance.
>
> Thanks again for all the help,
> Bo

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Concurrent readonly access to a large database.
On 23 Oct 2011, at 4:05am, Bo Peng wrote:

> If I understand correctly, the IO load is only 3% when two sqlite3
> processes are running, so perhaps I can still tweak sqlite3 to run
> faster. I will also copy the database around and see if other disks
> (SSD?), operating system (linux?), and file systems can provide better
> performance.

Tweaking SQLite probably won't help. The speed of your task is limited by the speed of your hard disk. You could have 1 process or 10, but they all have to get at the database through the hard disk, as Igor wrote. Adding more processes just means more processes queued up, waiting until the hard disk decides to fetch the bit of the database they want next.

You could speed this task up by splitting your database into two and putting each part on a different hard disk, then using two processes to fetch the information. Or you could use triggers to keep the figure you're trying to find constantly updated in another table. Or you could use a faster caching database management system. Or you could just live with it as it is now.

Simon.
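Simon's trigger suggestion can be sketched in Python with the standard `sqlite3` module. This is a minimal illustration, not code from the thread: the table, trigger, and column names are made up, and a real setup would need one trigger pair per source table.

```python
import sqlite3

# Maintain a running row count with triggers so that reading the count
# never requires a full table scan. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE row_counts (tbl TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO row_counts VALUES ('table_1', 0);

    CREATE TRIGGER table_1_ins AFTER INSERT ON table_1
    BEGIN
        UPDATE row_counts SET n = n + 1 WHERE tbl = 'table_1';
    END;

    CREATE TRIGGER table_1_del AFTER DELETE ON table_1
    BEGIN
        UPDATE row_counts SET n = n - 1 WHERE tbl = 'table_1';
    END;
""")
conn.executemany("INSERT INTO table_1 (val) VALUES (?)", [("x",)] * 5)
conn.commit()
# Reading the count is now a single-row lookup instead of a scan.
count = conn.execute(
    "SELECT n FROM row_counts WHERE tbl = 'table_1'").fetchone()[0]
print(count)  # 5
```

The trade-off is a small write penalty: every INSERT and DELETE now also updates the counts table.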
Re: [sqlite] Concurrent readonly access to a large database.
> It's not only speed in KB/sec that matters. It's also disk system
> usage as reported by iostat. If it's close to 100% then SQLite can't
> do any better.

A sad day.

I copied the database to a faster drive with RAID 0, made another copy of the database (referred to as DB1), and ran another set of tests:

test1: two sequential processes of sqlite count(*) on table1 and table2 in DB1 --> 7m15s

test2: two concurrent processes of sqlite count(*) on table1 and table2 in DB1 --> 5m22s

test3: four concurrent processes of sqlite count(*) on tables 1, 2, 3, and 4 in DB1 --> 12m58s

test4: two concurrent processes of sqlite count(*) on table1 in DB1 and table1 in DB2 --> 9m51s

Although running two or more processes can save some time, the performance gain is not that big (tests 2 and 3), and splitting the database into several smaller ones would not help either (test 4).

Anyway, the iostat output of my system is

2011 Oct 22 21:16:36, load: 0.03, disk_r: 2676 KB, disk_w: 0 KB

UID  PID  PPID  CMD      DEVICE  MAJ  MIN  D  BYTES
0    0    0     ??               14   8       65536
503  732  730   sqlite3  ??      14   14   R  1323008
503  731  730   sqlite3  ??      14   14   R  1355776

If I understand correctly, the IO load is only 3% when two sqlite3 processes are running, so perhaps I can still tweak sqlite3 to run faster. I will also copy the database around and see if other disks (SSD?), operating systems (Linux?), and file systems can provide better performance.

Thanks again for all the help,
Bo
Re: [sqlite] Concurrent readonly access to a large database.
On Sat, Oct 22, 2011 at 6:02 PM, Bo Peng wrote:
>> You may create multiple threads, but your hard drive only has one set of
>> heads.
>
> I now realize this problem and am moving the data to a faster drive.
> However, when copying the data out, the activity monitor reports a
> 40MB/sec read speed. I admit this is not a fast drive, but compared
> to the peak 700KB/sec read speed when three instances of sqlite3 were
> running, there might still be room for improvement on the sqlite side.

It's not only speed in KB/sec that matters. It's also disk system usage as reported by iostat. If it's close to 100% then SQLite can't do any better.

Pavel
Re: [sqlite] Concurrent readonly access to a large database.
On 22 Oct 2011, at 10:57pm, Igor Tandetnik wrote:

> You may create multiple threads, but your hard drive only has one set of
> heads.

Right. I use lots of Macs. I also think you're I/O bound. I think you'd better just resign yourself to an overnight run. If you're going to do this a lot, it would be good to merge all your tables.

Simon.
Re: [sqlite] Concurrent readonly access to a large database.
> You may create multiple threads, but your hard drive only has one set of
> heads.

I now realize this problem and am moving the data to a faster drive. However, when copying the data out, the activity monitor reports a 40MB/sec read speed. I admit this is not a fast drive, but compared to the peak 700KB/sec read speed when three instances of sqlite3 were running, there might still be room for improvement on the sqlite side.

Bo
Re: [sqlite] Concurrent readonly access to a large database.
Bo Peng wrote:
> I have a database with about 5000 tables, each with more than 1 million
> records. I needed to get some summary statistics of each table but
> found that it would take days to run 'SELECT count(*) FROM table_XX'
> (XX=1,...,5000) sequentially. I therefore created 10 threads, each
> having its own database connection (sqlite3.connect() from python). To
> my dismay, it appears that only one thread was allowed to access the
> database at any time, so the overall speed was not improved.

You may create multiple threads, but your hard drive only has one set of heads.
--
Igor Tandetnik
Re: [sqlite] Concurrent readonly access to a large database.
On Sat, Oct 22, 2011 at 4:18 PM, Pavel Ivanov wrote:
>> Using three tables (e.g. test.sh 111 112 113), the first command takes
>> 13m3s, the second command takes 12m45s. I am wondering if there is any
>> magic to make the second script finish in 5 minutes by executing the
>> queries in parallel ...
>
> Try to execute "pragma cache_size = 100" before executing the select.
> It has to be executed in the same sqlite3 run to have any effect, so
> your argument to sqlite3 should look like "pragma cache_size =
> 100; SELECT COUNT(*) FROM table_$arg;". I think disk I/O kills
> your performance. iostat could prove that to you.

The performance is even worse with this option (13m55s). However, I notice that the disk activity monitor (I am running a Mac) reports 700 KB/sec for all the test scripts I ran, so it is very likely that your suspicion (I/O kills performance) is right. I am moving the data to a faster drive and will report back later.

Thanks,
Bo
Re: [sqlite] Concurrent readonly access to a large database.
> Using three tables (e.g. test.sh 111 112 113), the first command takes
> 13m3s, the second command takes 12m45s. I am wondering if there is any
> magic to make the second script finish in 5 minutes by executing the
> queries in parallel ...

Try to execute "pragma cache_size = 100" before executing the select. It has to be executed in the same sqlite3 run to have any effect, so your argument to sqlite3 should look like "pragma cache_size = 100; SELECT COUNT(*) FROM table_$arg;". I think disk I/O kills your performance. iostat could prove that to you.

Pavel

On Sat, Oct 22, 2011 at 5:11 PM, Bo Peng wrote:
>> Multithreaded mode allows SQLite to be accessed via multiple threads as long
>> as threads don't share connection handles. This is what's sometimes
>> called the apartment model of multithreading.
>
> Thank you very much for your quick reply.
>
> Is there a way to enable multi-thread mode from the command line? Before
> changing the source code of my program, I would like to see the effect
> of multi-threading from the command line. More specifically, I have two
> scripts:
>
> #!/bin/bash
> for arg in $*
> do
>     echo "Handling " $arg
>     sqlite3 esp.DB "SELECT COUNT(*) FROM table_$arg;"
> done
>
> and
>
> #!/bin/bash
> for arg in $*
> do
>     echo "Handling " $arg
>     sqlite3 esp.DB "SELECT COUNT(*) FROM table_$arg;" &
> done
> wait
>
> Using three tables (e.g. test.sh 111 112 113), the first command takes
> 13m3s, the second command takes 12m45s. I am wondering if there is any
> magic to make the second script finish in 5 minutes by executing the
> queries in parallel ...
>
> Bo
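Pavel's point about the PRAGMA can be sketched in Python as well as the shell: a PRAGMA only affects the connection it is issued on, so it must run on the same handle (the "same sqlite3 run") as the query it should influence. This is an illustration only; the in-memory database and table name stand in for the real esp.DB, and the cache value is an arbitrary choice, not a recommendation from the thread.

```python
import sqlite3

# Build a small stand-in table so the query has something to count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_111 (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table_111 VALUES (?)",
                 [(i,) for i in range(1000)])

# cache_size is measured in pages, not bytes. Issuing this PRAGMA on a
# different connection would have no effect on the query below.
conn.execute("PRAGMA cache_size = 100000")

n = conn.execute("SELECT COUNT(*) FROM table_111").fetchone()[0]
print(n)  # 1000
```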
Re: [sqlite] Concurrent readonly access to a large database.
> Multithreaded mode allows SQLite to be accessed via multiple threads as long
> as threads don't share connection handles. This is what's sometimes
> called the apartment model of multithreading.

Thank you very much for your quick reply.

Is there a way to enable multi-thread mode from the command line? Before changing the source code of my program, I would like to see the effect of multi-threading from the command line. More specifically, I have two scripts:

#!/bin/bash
for arg in $*
do
    echo "Handling " $arg
    sqlite3 esp.DB "SELECT COUNT(*) FROM table_$arg;"
done

and

#!/bin/bash
for arg in $*
do
    echo "Handling " $arg
    sqlite3 esp.DB "SELECT COUNT(*) FROM table_$arg;" &
done
wait

Using three tables (e.g. test.sh 111 112 113), the first command takes 13m3s, the second command takes 12m45s. I am wondering if there is any magic to make the second script finish in 5 minutes by executing the queries in parallel ...

Bo
Re: [sqlite] Concurrent readonly access to a large database.
On 22/10/11 12:52, Bo Peng wrote:
> I mean, is it possible, in theory, to read a sqlite database from
> multiple threads/processes, each with performance comparable to a
> single thread/process?

Yes. The details are explained here:

http://www.sqlite.org/lockingv3.html

You'd also be far better off having fewer tables, with a column corresponding to the table name in a master table. Also, if you are doing counts like that, then you could maintain a table of the counts using triggers to update it.

Roger
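Roger's schema suggestion can be sketched with the standard `sqlite3` module: instead of 5000 separate tables, one table with a column naming the source lets a single GROUP BY produce every per-table count in one sequential scan. The table and column names here are illustrative, not from the thread.

```python
import sqlite3

# One consolidated table with a 'src' column replacing many per-source
# tables; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (src TEXT NOT NULL, val INTEGER)")
rows = ([("table_1", i) for i in range(3)] +
        [("table_2", i) for i in range(2)])
conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

# One scan yields every count; with 5000 separate tables this would
# have been 5000 separate COUNT(*) scans.
counts = dict(conn.execute(
    "SELECT src, COUNT(*) FROM records GROUP BY src"))
print(counts)  # {'table_1': 3, 'table_2': 2}
```

An index on `src` would let SQLite answer the GROUP BY from the index rather than the full table.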
Re: [sqlite] Concurrent readonly access to a large database.
The default threading mode for SQLite is Serialized, which means only one thread at a time. See http://www.sqlite.org/threadsafe.html

However, you can change it to Multithreaded either at compile time or via a call to sqlite3_config() (that's in the C API -- I don't know about Python, but I presume it has a version of the call), which allows you to select a different threading mode. You can also use flags on the open call.

Multithreaded mode allows SQLite to be accessed via multiple threads as long as threads don't share connection handles. This is what's sometimes called the apartment model of multithreading.

Best,
Peter

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Bo Peng
> Sent: Saturday, October 22, 2011 12:53 PM
> To: sqlite-users@sqlite.org
> Subject: [sqlite] Concurrent readonly access to a large database.
>
> Dear all,
>
> I have a database with about 5000 tables, each with more than 1 million
> records. I needed to get some summary statistics of each table but
> found that it would take days to run 'SELECT count(*) FROM table_XX'
> (XX=1,...,5000) sequentially. I therefore created 10 threads, each
> having its own database connection (sqlite3.connect() from python). To
> my dismay, it appears that only one thread was allowed to access the
> database at any time, so the overall speed was not improved. I further
> created 10 processes, each running one query (e.g. 'sqlite3
> file:mydatabase?mode=ro "SELECT count(*) FROM table_XX"'), but I still
> couldn't see any improvement in performance...
>
> Because each thread/process took only 10M RAM and about 1% CPU,
> starting 10 threads/processes should not be a problem at all. Before I
> investigate further (e.g. programming error, bottleneck of disk I/O),
> can anyone tell me if I am going in the right direction? I mean, is it
> possible, in theory, to read a sqlite database from multiple
> threads/processes, each with performance comparable to a single
> thread/process? I understand writing to a database from multiple
> threads can be troublesome, but I only need to read from it.
>
> Many thanks in advance,
> Bo