Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Pavel Ivanov
> Anyway, the iostat output of my system is
>
> 2011 Oct 22 21:16:36,  load: 0.03,  disk_r:   2676 KB,  disk_w:      0 KB
>
>  UID    PID   PPID CMD              DEVICE  MAJ MIN D            BYTES
>    0      0      0                  ??       14   8              65536
>  503    732    730 sqlite3          ??       14  14 R          1323008
>  503    731    730 sqlite3          ??       14  14 R          1355776
>
> If I understand correctly, the IO load is only 3% when two sqlite3
> processes are running

No, 0.03 is load of your system CPU-wise. I'm sorry, I had a Linux's
iostat output format in mind. You have apparently something else (Mac
OS X?) and its iostat has completely different output.


Pavel


On Sat, Oct 22, 2011 at 11:05 PM, Bo Peng  wrote:
>> It's not only speed in KB/sec that matters. It's also disk system
>> usage as reported by iostat. If it's close to 100% then SQLite can't
>> do any better.
>
> A sad day.
>
> I copied the database to a faster driver with RAID 0, made another
> copy of the database (referred to as DB1), and ran another set of
> tests:
>
> test1: two sequential processes of sqlite count(*) table1 and table 2
> in DB1 --> 7m15s
>
> test2: two concurrent processes of sqlite count(*) table1 and table2
> in DB1 --> 5m22s
>
> test3: four concurrent processes of sqlite count(*) table 1, 2, 3, and
> 4 in DB1 --> 12m58s
>
> test4: two concurrent processes of sqlite count(*) table1 in DB1, and
> table1 in DB2 --> 9m51s.
>
> Although running two or more processes can save some time, the
> performance gain is not that big (tests 2 and 3), splitting the
> database into several smaller ones would not help either (test 4).
>
> Anyway, the iostat output of my system is
>
> 2011 Oct 22 21:16:36,  load: 0.03,  disk_r:   2676 KB,  disk_w:      0 KB
>
>  UID    PID   PPID CMD              DEVICE  MAJ MIN D            BYTES
>    0      0      0                  ??       14   8              65536
>  503    732    730 sqlite3          ??       14  14 R          1323008
>  503    731    730 sqlite3          ??       14  14 R          1355776
>
> If I understand correctly, the IO load is only 3% when two sqlite3
> processes are running, so perhaps I can still tweak sqlite3 to run
> faster. I will also copy the database around and see if other disks
> (SSD?), operating system (linux?), and file systems can provide better
> performance.
>
> Thanks again for all the help,
> Bo
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Simon Slavin

On 23 Oct 2011, at 4:05am, Bo Peng wrote:

> If I understand correctly, the IO load is only 3% when two sqlite3
> processes are running, so perhaps I can still tweak sqlite3 to run
> faster. I will also copy the database around and see if other disks
> (SSD?), operating system (linux?), and file systems can provide better
> performance.

Tweaking SQLite probably won't help.  The speed of your task is limited by the 
speed of your hard disk.  You could have 1 process or 10, but they all have to 
get at the database through the hard disk, as Igor wrote.  Adding more 
processes just means more processes queued up to wait until the hard disk 
decides to fetch the bit of the database they want next.

You could speed this task up by splitting your database into two and putting 
each part on a different hard disk, then using two processes to fetch the 
information.  Or you could use triggers to keep the figure you're trying to 
find constantly updated in another table.  Or you could use a faster caching 
database management system.  Or you could just live with it as it is now.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
> It's not only speed in KB/sec that matters. It's also disk system
> usage as reported by iostat. If it's close to 100% then SQLite can't
> do any better.

A sad day.

I copied the database to a faster driver with RAID 0, made another
copy of the database (referred to as DB1), and ran another set of
tests:

test1: two sequential processes of sqlite count(*) table1 and table 2
in DB1 --> 7m15s

test2: two concurrent processes of sqlite count(*) table1 and table2
in DB1 --> 5m22s

test3: four concurrent processes of sqlite count(*) table 1, 2, 3, and
4 in DB1 --> 12m58s

test4: two concurrent processes of sqlite count(*) table1 in DB1, and
table1 in DB2 --> 9m51s.

Although running two or more processes can save some time, the
performance gain is not that big (tests 2 and 3), splitting the
database into several smaller ones would not help either (test 4).

Anyway, the iostat output of my system is

2011 Oct 22 21:16:36,  load: 0.03,  disk_r:   2676 KB,  disk_w:  0 KB

  UIDPID   PPID CMD  DEVICE  MAJ MIN DBYTES
0  0  0  ??   14   8  65536
  503732730 sqlite3  ??   14  14 R  1323008
  503731730 sqlite3  ??   14  14 R  1355776

If I understand correctly, the IO load is only 3% when two sqlite3
processes are running, so perhaps I can still tweak sqlite3 to run
faster. I will also copy the database around and see if other disks
(SSD?), operating system (linux?), and file systems can provide better
performance.

Thanks again for all the help,
Bo
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Pavel Ivanov
On Sat, Oct 22, 2011 at 6:02 PM, Bo Peng  wrote:
>> You may create multiple threads, but your hard drive only has one set of 
>> heads.
>
> I now realize this problem and is moving the data to a faster drive.
> However, when copying the data out, the activity monitor reports
> 40MB/sec read speed. I admit this is not a fast drive, but comparing
> to the peak 700KB/sec read speed when three instances of sqlite3 were
> running, there might still be room for improvement at the sqlite side.

It's not only speed in KB/sec that matters. It's also disk system
usage as reported by iostat. If it's close to 100% then SQLite can't
do any better.


Pavel
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Simon Slavin

On 22 Oct 2011, at 10:57pm, Igor Tandetnik wrote:

> You may create multiple threads, but your hard drive only has one set of 
> heads.

Right.  I use lots of Macs.  I also think you're I/O bound.  I think you're 
just better resign yourself to an overnight run.

If you're going to do this a lot, it would be good to merge all your tables.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
> You may create multiple threads, but your hard drive only has one set of 
> heads.

I now realize this problem and is moving the data to a faster drive.
However, when copying the data out, the activity monitor reports
40MB/sec read speed. I admit this is not a fast drive, but comparing
to the peak 700KB/sec read speed when three instances of sqlite3 were
running, there might still be room for improvement at the sqlite side.

Bo
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Igor Tandetnik
Bo Peng  wrote:
> I have a database with about 5000 tables each with more than 1 million
> records. I needed to get some summary statistics of each table but
> find that it will take days to run 'SELECT count(*) FROM table_XX'
> (XX=1,...,5000) sequentially. I therefore created 10 threads, each
> having its own database connection (sqlite3.connect() from python). To
> my dismay, it appears that only one thread was allowed to access the
> database at any time so the overall speed was not improved.

You may create multiple threads, but your hard drive only has one set of heads.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
On Sat, Oct 22, 2011 at 4:18 PM, Pavel Ivanov  wrote:
>> Using three tables (e.g. test.sh 111 112 113), the first command takes
>> 13m3s, the second command takes 12m45s. I am wondering if there is any
>> magic to make the second script finish in 5 minutes by executing the
>> query in parallel ...
>
> Try to execute "pragma cache_size = 100" before executing select.
> It should be executed in the same sqlite3 run to have any effect, so
> your argument to sqlite3 should look like "pragma cache_size =
> 100; SELECT COUNT(*) FROM table_$arg;". I think disk I/O kills
> your performance. iostat could prove you that.

The performance is even worse with this option (13m55s). However, I
notice that the disk activity monitor (I am running a mac) reports 700
KB/sec for all test scripts I ran, so it is very likely that your
suspicion (i/o kills performance) is right. I am moving the data to a
faster drive and will report back later.

Thanks,
Bo
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Pavel Ivanov
> Using three tables (e.g. test.sh 111 112 113), the first command takes
> 13m3s, the second command takes 12m45s. I am wondering if there is any
> magic to make the second script finish in 5 minutes by executing the
> query in parallel ...

Try to execute "pragma cache_size = 100" before executing select.
It should be executed in the same sqlite3 run to have any effect, so
your argument to sqlite3 should look like "pragma cache_size =
100; SELECT COUNT(*) FROM table_$arg;". I think disk I/O kills
your performance. iostat could prove you that.


Pavel


On Sat, Oct 22, 2011 at 5:11 PM, Bo Peng  wrote:
>>
>> Multithreaded mode allows SQLite to be accessed via multiple threads as long
>> as threads don't shared connection handles.  This is the what's sometimes
>> called the apartment model of multithreading.
>
> Thank you very much for your quick reply.
>
> Is there a way to enable multi-thread mode from command line? Before
> changing the source code of my program, I would like to see the effect
> of multi-threading from command line. More specifically, I have two
> scripts:
>
> #!/bin/bash
> for arg in $*
> do
>    echo "Handling " $arg
>    sqlite3  esp.DB "SELECT COUNT(*) FROM table_$arg;"
> done
>
> and
>
> #!/bin/bash
> for arg in $*
> do
>    echo "Handling " $arg
>    sqlite3  esp.DB "SELECT COUNT(*) FROM table_$arg;" &
> done
> wait
>
> Using three tables (e.g. test.sh 111 112 113), the first command takes
> 13m3s, the second command takes 12m45s. I am wondering if there is any
> magic to make the second script finish in 5 minutes by executing the
> query in parallel ...
>
> Bo
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Bo Peng
>
> Multithreaded mode allows SQLite to be accessed via multiple threads as long
> as threads don't shared connection handles.  This is the what's sometimes
> called the apartment model of multithreading.

Thank you very much for your quick reply.

Is there a way to enable multi-thread mode from command line? Before
changing the source code of my program, I would like to see the effect
of multi-threading from command line. More specifically, I have two
scripts:

#!/bin/bash
for arg in $*
do
echo "Handling " $arg
sqlite3  esp.DB "SELECT COUNT(*) FROM table_$arg;"
done

and

#!/bin/bash
for arg in $*
do
echo "Handling " $arg
sqlite3  esp.DB "SELECT COUNT(*) FROM table_$arg;" &
done
wait

Using three tables (e.g. test.sh 111 112 113), the first command takes
13m3s, the second command takes 12m45s. I am wondering if there is any
magic to make the second script finish in 5 minutes by executing the
query in parallel ...

Bo
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 22/10/11 12:52, Bo Peng wrote:
> I mean, is it possible, in theory, to read a sqlite database from
> multiple threads/processes each with performance comparable to a
> single thread/process?

Yes.  The details are explained here:

  http://www.sqlite.org/lockingv3.html

You'd also be far better off having fewer tables with a column
corresponding to the table name in a master table.  Also if you are doing
counts like that then you could maintain a table of the counts using
triggers to update it.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk6jJPMACgkQmOOfHg372QRFdwCgki4sfnummmja1WlB3aNiXPYI
DNkAniiOkT2XUTNF62G/RXg1DD/aLSle
=BmWY
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Concurrent readonly access to a large database.

2011-10-22 Thread Peter Aronson
The default threading mode for SQLite can be Serialized, which means only
one thread at a time.  

See http://www.sqlite.org/threadsafe.html

However, you can change it to Multithreaded either at compile time or via a
call to sqlite3_config() (that's in the C API -- I don't know about Python,
but I presume it has a version of the call) which allows you to use to
select a different threading mode.  You can also use flags on the open call
as well.

Multithreaded mode allows SQLite to be accessed via multiple threads as long
as threads don't shared connection handles.  This is the what's sometimes
called the apartment model of multithreading.

Best,

Peter

> -Original Message-
> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-
> boun...@sqlite.org] On Behalf Of Bo Peng
> Sent: Saturday, October 22, 2011 12:53 PM
> To: sqlite-users@sqlite.org
> Subject: [sqlite] Concurrent readonly access to a large database.
> 
> Dear all,
> 
> I have a database with about 5000 tables each with more than 1 million
> records. I needed to get some summary statistics of each table but
> find that it will take days to run 'SELECT count(*) FROM table_XX'
> (XX=1,...,5000) sequentially. I therefore created 10 threads, each
> having its own database connection (sqlite3.connect() from python). To
> my dismay, it appears that only one thread was allowed to access the
> database at any time so the overall speed was not improved. I further
> created 10 processes, each running one query (e.g. 'sqlite3
> file:mydatabase?mode=ro "SELECT count(*) FROM table_XX"'), but I still
> could see any improvement in performance...
> 
> Because each thread/process took only 10M RAM and about 1% CPU,
> staring 10 threads/processes should not be a problem at all. Before I
> investigate further (e.g. programming error, bottleneck of disk I/O),
> can anyone tell me if I am going the right direction? I mean, is it
> possible, in theory, to read a sqlite database from multiple
> threads/processes each with performance comparable to a single
> thread/process? I understand writing to a database from multiple
> threads can be troublesome but I only need to read from it.
> 
> Many thanks in advance,
> Bo
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users