subject:"Re\: \[sqlite\] SQLite Performance"

DRH,
 Thanks for your valuable insite. 

 When the DB is closed when in synchrounous mode, is it then persistent at the 
OS level even from power failures etc? 

[EMAIL PROTECTED] wrote: Ken  wrote:
>  
>  I should be able to run with synchronous=off. Since 
> the application maintains state in a seperate DB elsewhere.
>  

Just to clarify the implications where, if you run with
synchronous=off and you take a power failure or an OS
crash in the middle of a transaction, then you run a
serious risk of ending up with a corruption database
file. 

However, (key point->) an application crash will not hurt 
the database.  

Since you have an alternate source of authoritative data
and since power failures and OS crashes are reasonably
infrequent, your decision to run with synchronous=off
may well be the right choice.

--
D. Richard Hipp  

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

Tito,
 Its even better now!
 
 Synchronous=normal and No primary keys (except 1 table) for auto increment. 
 
 real0m1.975s
 user0m1.436s
 sys 0m0.140s
 
 Vs  flat file test case:
 
 real0m0.862s
 user0m0.228s
 sys 0m0.188s
 
 This is now very respectable. 
 
 Thanks,
 Ken
 

Ken <[EMAIL PROTECTED]> wrote: Tito,
 
 There are no indices built besides the default ones. Hmm maybe I should try 
this by dropping the primary Keys.. I'll give that a try as well, GOOD idea!
 
 The entire batch of inserts (about 8 tables) is done in a single transaction.
 
 As an Oracle DBA, I'm pretty familar with tuning. This was definately an I/O 
issue and related to the code path vs say a select where the optimizer picked 
an incorrect plan. 
 
 Regards,
 Ken
 

Tito Ciuro  wrote: Hello,

IIRC (it was a while ago), one way to speed up insertion for large  
data sets is to drop the indexes, do the inserts (wrapped around a  
transaction) and then rebuild the indexes. For smaller data sets, the  
drop/rebuild indexes solution doesn't make sense because the time it  
takes to do that invalidates the performance gain. However, larger  
data sets seemed to benefit greatly. Again... that was a while  
ago... :-)

I should dust-off my test app and see what the results are with the  
latest sources. I'll let you know what I find out.

-- Tito

On Mar 15, 2007, at 11:42 AM, John Stanton wrote:

> There are no free lunches.  When Sqlite stores your data item it  
> not only writes it into a linked list of pages in a file but also  
> inserts at least on key into a B-Tree index.  It does it quite  
> efficiently so what you are seeing is the inevitable overhead of  
> storing the data in a structured form.  The value of the structure  
> becomes obvious when you are retrieving a single item from a set of  
> millions and the index allows you to access it in a tiny fraction  
> of the time it would take to search an unstructured list like a  
> flat file.
>
> The ACID implementation in Sqlite provides data security but is  
> does involve a significant overhead.  You pay a price for not  
> losing data in a system crash.
>
> Like all things in life "you pays your money and you takes your  
> choice".  It is somewhat simpler with Sqlite in that you don't pay  
> your money, you just take your choice.
>
> If you want faster Sqlite performance use faster disks.  The  
> latency is important so 15,000 rpm disks will be better than 5,400  
> rpm ones.
>
> Ken wrote:
>> To answer your question:  Yes I can use a flat file at this stage,  
>> but eventually it needs to be imported into some type of  
>> structure. So to that end I decided early on to use sqlite to  
>> write the data out.   I was hoping for better performance. The raw  
>> I/O to read the data and process is around .75 seconds (no write i/ 
>> o).. So using a flat file output costs about .7 seconds.
>>   Using sqlite to do the output costs about 2.25 seconds. My  
>> question is why? And what can be done to improve this  
>> performance?   John Stanton  wrote: Ken wrote:
>>> I'm looking for suggestions on improving performance of my sqlite  
>>> application.
>>> Here are system timings for a run where the sqlite db has been  
>>> replaced with a flat file output.
>>> real 0m1.459s
>>> user0m0.276s
>>> sys  0m0.252s
>>> This is a run when using sqlite as the output format.
>>> real 0m3.095s
>>> user0m1.956s
>>> sys  0m0.160s
>>> As you can see sqlite takes twice as long and almost 8 times the  
>>> user time.
>>> Output size for flat file:   13, 360, 504flatfile.dat
>>> Output size fo sqlit file:   11,042,816   sqlt.db f
>>> Slite db has the following pragmas set.
>>>  PRAGMA default_synchronous=FULL
>>>  PRAGMA temp_store=memory
>>>  PRAGMA page_size=4096
>>>  PRAGMA cache_size=2000
>>> Any ideas how to get the sqlite output timings to a more  
>>> respectable level would be appreciated.
>>> Thanks
>>> Ken
>> If you want flat file performance, use a flat file.  Sqlite is  
>> built on top of a flat file and cannot be faster or even as fast.   
>> If your application can use a flat file, why use anything more  
>> complex?
>> - 
>> 
>> To unsubscribe, send email to [EMAIL PROTECTED]
>> - 
>> 
>
>
> -- 
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> -- 
> ---
>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] sqlite Performance

2007-03-15 Thread GBanschbach



"Griggs, Donald" wrote on 03/15/2007 01:49:30 PM:

> Regarding:
>  Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
>  Flat file is 13 MB, sqlite db is 11 MB.
>
>  "Any ideas how to get the sqlite output timings to a more respectable
> level would be appreciated. "
>
I think you may be looking at this as a one dimensional problem, or looking
at it from the wrong angle. [I am using upper case here for mild emphasis -
not yelling].  Really, there is a greater difference in gains at the other
end.  In other words, there IS a certain amount of overhead, BUT, what you
get back from sqlite over a flat file is: The ability to use SQL, where a
flat file is flat and brain dead. The ability to create indexes to help
find specific data, where your other choices may require you to write code
(which equals time which equals labor and money).  Finally, businesses
often throw hardware at performance, when tuning has already been tried.
Good Luck.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

2007-03-15 Thread drh

Ken <[EMAIL PROTECTED]> wrote:
>  
>  I should be able to run with synchronous=off. Since 
> the application maintains state in a seperate DB elsewhere.
>  

Just to clarify the implications where, if you run with
synchronous=off and you take a power failure or an OS
crash in the middle of a transaction, then you run a
serious risk of ending up with a corruption database
file. 

However, (key point->) an application crash will not hurt 
the database.  

Since you have an alternate source of authoritative data
and since power failures and OS crashes are reasonably
infrequent, your decision to run with synchronous=off
may well be the right choice.

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

Tito,
 
 There are no indices built besides the default ones. Hmm maybe I should try 
this by dropping the primary Keys.. I'll give that a try as well, GOOD idea!
 
 The entire batch of inserts (about 8 tables) is done in a single transaction.
 
 As an Oracle DBA, I'm pretty familar with tuning. This was definately an I/O 
issue and related to the code path vs say a select where the optimizer picked 
an incorrect plan. 
 
 Regards,
 Ken
 

Tito Ciuro <[EMAIL PROTECTED]> wrote: Hello,

IIRC (it was a while ago), one way to speed up insertion for large  
data sets is to drop the indexes, do the inserts (wrapped around a  
transaction) and then rebuild the indexes. For smaller data sets, the  
drop/rebuild indexes solution doesn't make sense because the time it  
takes to do that invalidates the performance gain. However, larger  
data sets seemed to benefit greatly. Again... that was a while  
ago... :-)

I should dust-off my test app and see what the results are with the  
latest sources. I'll let you know what I find out.

-- Tito

On Mar 15, 2007, at 11:42 AM, John Stanton wrote:

> There are no free lunches.  When Sqlite stores your data item it  
> not only writes it into a linked list of pages in a file but also  
> inserts at least on key into a B-Tree index.  It does it quite  
> efficiently so what you are seeing is the inevitable overhead of  
> storing the data in a structured form.  The value of the structure  
> becomes obvious when you are retrieving a single item from a set of  
> millions and the index allows you to access it in a tiny fraction  
> of the time it would take to search an unstructured list like a  
> flat file.
>
> The ACID implementation in Sqlite provides data security but is  
> does involve a significant overhead.  You pay a price for not  
> losing data in a system crash.
>
> Like all things in life "you pays your money and you takes your  
> choice".  It is somewhat simpler with Sqlite in that you don't pay  
> your money, you just take your choice.
>
> If you want faster Sqlite performance use faster disks.  The  
> latency is important so 15,000 rpm disks will be better than 5,400  
> rpm ones.
>
> Ken wrote:
>> To answer your question:  Yes I can use a flat file at this stage,  
>> but eventually it needs to be imported into some type of  
>> structure. So to that end I decided early on to use sqlite to  
>> write the data out.   I was hoping for better performance. The raw  
>> I/O to read the data and process is around .75 seconds (no write i/ 
>> o).. So using a flat file output costs about .7 seconds.
>>   Using sqlite to do the output costs about 2.25 seconds. My  
>> question is why? And what can be done to improve this  
>> performance?   John Stanton  wrote: Ken wrote:
>>> I'm looking for suggestions on improving performance of my sqlite  
>>> application.
>>> Here are system timings for a run where the sqlite db has been  
>>> replaced with a flat file output.
>>> real 0m1.459s
>>> user0m0.276s
>>> sys  0m0.252s
>>> This is a run when using sqlite as the output format.
>>> real 0m3.095s
>>> user0m1.956s
>>> sys  0m0.160s
>>> As you can see sqlite takes twice as long and almost 8 times the  
>>> user time.
>>> Output size for flat file:   13, 360, 504flatfile.dat
>>> Output size fo sqlit file:   11,042,816   sqlt.db f
>>> Slite db has the following pragmas set.
>>>  PRAGMA default_synchronous=FULL
>>>  PRAGMA temp_store=memory
>>>  PRAGMA page_size=4096
>>>  PRAGMA cache_size=2000
>>> Any ideas how to get the sqlite output timings to a more  
>>> respectable level would be appreciated.
>>> Thanks
>>> Ken
>> If you want flat file performance, use a flat file.  Sqlite is  
>> built on top of a flat file and cannot be faster or even as fast.   
>> If your application can use a flat file, why use anything more  
>> complex?
>> - 
>> 
>> To unsubscribe, send email to [EMAIL PROTECTED]
>> - 
>> 
>
>
> -- 
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> -- 
> ---
>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] sqlite Performance

Donald,
 
 I set the PRAGMA synchronous= OFF and here are the results:
 
 real0m2.258s
 user0m1.736s
 sys 0m0.168s
 
 --
 Pragma synchronous= NORMAL
 real0m2.395s
 user0m1.520s
 sys 0m0.128s
 
 Pragma synchronous= FULL
 real0m3.228s
 user0m2.276s
 sys 0m0.136s
 
 
 Running with synchronous=off is 43% faster !!!
 Running with Synchrounous=normal is 33 % faster.
 
 I should be able to run with synchronous=off. Since the application maintains 
state in a seperate DB elsewhere.
 
 
 Thanks for you valuable Input.
 Ken
 
 

"Griggs, Donald" <[EMAIL PROTECTED]> wrote: Regarding: 
 Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
 Flat file is 13 MB, sqlite db is 11 MB.

 "Any ideas how to get the sqlite output timings to a more respectable
level would be appreciated. "

I may be way off base if I'm not understanding correctly, but how can
one call these values less than respectable?

To create an sqlite database (or any other) the system must do the same
things it does for the flat file, plus maintain a paging structure,
create indices (presumably), rollback journals, etc., etc.

To take only twice as long seems great (but I'm no expert here).  I'm
guessing it might have taken sqlite even longer except that maybe
compression of numeric values allowed it to actually need fewer disk
writes for the final file (not counting journalling, though).

That being said, if the data you're writing out is saved elsewhere (i.e.
you can repeat the whole process if it should fail) then you can try
turning synchronous OFF, or, if you have to be more conservative, to
NORMAL.
 
 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

2007-03-15 Thread Tito Ciuro


Hello,

IIRC (it was a while ago), one way to speed up insertion for large  
data sets is to drop the indexes, do the inserts (wrapped around a  
transaction) and then rebuild the indexes. For smaller data sets, the  
drop/rebuild indexes solution doesn't make sense because the time it  
takes to do that invalidates the performance gain. However, larger  
data sets seemed to benefit greatly. Again... that was a while  
ago... :-)


I should dust-off my test app and see what the results are with the  
latest sources. I'll let you know what I find out.


-- Tito

On Mar 15, 2007, at 11:42 AM, John Stanton wrote:

There are no free lunches.  When Sqlite stores your data item it  
not only writes it into a linked list of pages in a file but also  
inserts at least on key into a B-Tree index.  It does it quite  
efficiently so what you are seeing is the inevitable overhead of  
storing the data in a structured form.  The value of the structure  
becomes obvious when you are retrieving a single item from a set of  
millions and the index allows you to access it in a tiny fraction  
of the time it would take to search an unstructured list like a  
flat file.


The ACID implementation in Sqlite provides data security but is  
does involve a significant overhead.  You pay a price for not  
losing data in a system crash.


Like all things in life "you pays your money and you takes your  
choice".  It is somewhat simpler with Sqlite in that you don't pay  
your money, you just take your choice.


If you want faster Sqlite performance use faster disks.  The  
latency is important so 15,000 rpm disks will be better than 5,400  
rpm ones.


Ken wrote:
To answer your question:  Yes I can use a flat file at this stage,  
but eventually it needs to be imported into some type of  
structure. So to that end I decided early on to use sqlite to  
write the data out.   I was hoping for better performance. The raw  
I/O to read the data and process is around .75 seconds (no write i/ 
o).. So using a flat file output costs about .7 seconds.
  Using sqlite to do the output costs about 2.25 seconds. My  
question is why? And what can be done to improve this  
performance?   John Stanton <[EMAIL PROTECTED]> wrote: Ken wrote:
I'm looking for suggestions on improving performance of my sqlite  
application.
Here are system timings for a run where the sqlite db has been  
replaced with a flat file output.

real 0m1.459s
user0m0.276s
sys  0m0.252s
This is a run when using sqlite as the output format.
real 0m3.095s
user0m1.956s
sys  0m0.160s
As you can see sqlite takes twice as long and almost 8 times the  
user time.

Output size for flat file:   13, 360, 504flatfile.dat
Output size fo sqlit file:   11,042,816   sqlt.db f
Slite db has the following pragmas set.
 PRAGMA default_synchronous=FULL
 PRAGMA temp_store=memory
 PRAGMA page_size=4096
 PRAGMA cache_size=2000
Any ideas how to get the sqlite output timings to a more  
respectable level would be appreciated.

Thanks
Ken
If you want flat file performance, use a flat file.  Sqlite is  
built on top of a flat file and cannot be faster or even as fast.   
If your application can use a flat file, why use anything more  
complex?
- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 




-- 
---

To unsubscribe, send email to [EMAIL PROTECTED]
-- 
---





-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

2007-03-15 Thread John Stanton

There are no free lunches. When Sqlite stores your data item it not
only writes it into a linked list of pages in a file but also inserts at
least on key into a B-Tree index. It does it quite efficiently so what
you are seeing is the inevitable overhead of storing the data in a
structured form. The value of the structure becomes obvious when you
are retrieving a single item from a set of millions and the index allows
you to access it in a tiny fraction of the time it would take to search
an unstructured list like a flat file.

The ACID implementation in Sqlite provides data security but is does
involve a significant overhead. You pay a price for not losing data in
a system crash.

Like all things in life "you pays your money and you takes your choice".
It is somewhat simpler with Sqlite in that you don't pay your money,
you just take your choice.

If you want faster Sqlite performance use faster disks. The latency is
important so 15,000 rpm disks will be better than 5,400 rpm ones.

Ken wrote:
To answer your question:
Yes I can use a flat file at this stage, but eventually it needs to be imported into some type of structure. So to that end I decided early on to use sqlite to write the data out.

I was hoping for better performance. The raw I/O to read the data and process is around .75 seconds (no write i/o).. So using a flat file output costs about .7 seconds.

Using sqlite to do the output costs about 2.25 seconds. My question is why? And what can be done to improve this performance?

John Stanton <[EMAIL PROTECTED]> wrote: Ken wrote:

I'm looking for suggestions on improving performance of my sqlite application.

Here are system timings for a run where the sqlite db has been replaced with a
flat file output.
real 0m1.459s
user0m0.276s
sys 0m0.252s

This is a run when using sqlite as the output format.
real 0m3.095s
user0m1.956s
sys 0m0.160s

As you can see sqlite takes twice as long and almost 8 times the user time.

Output size for flat file: 13, 360, 504flatfile.dat
Output size fo sqlit file: 11,042,816 sqlt.db f

Slite db has the following pragmas set.

PRAGMA default_synchronous=FULL
PRAGMA temp_store=memory
PRAGMA page_size=4096
PRAGMA cache_size=2000

Any ideas how to get the sqlite output timings to a more respectable level
would be appreciated.

Thanks
Ken

If you want flat file performance, use a flat file. Sqlite is built on
top of a flat file and cannot be faster or even as fast. If your
application can use a flat file, why use anything more complex?

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

Scott, 
 
 The whole job is wrapped in an explicit transaction.
 Variables are bound and statements prepared only once, using reset. 
 
 This is  a write only app. 100% insert. 
 
 Ken
 

Scott Hess <[EMAIL PROTECTED]> wrote: Are you using explicit transactions at 
all?  If not, as a quick test,
put the _entire_ job in a transaction and see what happens.

-scott


On 3/15/07, Ken  wrote:
> To answer your question:
> Yes I can use a flat file at this stage, but eventually it needs to be 
> imported into some type of structure. So to that end I decided early on to 
> use sqlite to write the data out.
>
> I was hoping for better performance. The raw I/O to read the data and process 
> is around .75 seconds (no write i/o).. So using a flat file output costs 
> about .7 seconds.
>
> Using sqlite to do the output costs about 2.25 seconds. My question is why? 
> And what can be done to improve this performance?
>
>
>
> John Stanton  wrote: Ken wrote:
> > I'm looking for suggestions on improving performance of my sqlite 
> > application.
> >
> >  Here are system timings for a run where the sqlite db has been replaced 
> > with a flat file output.
> >  real 0m1.459s
> >  user0m0.276s
> >  sys  0m0.252s
> >
> >  This is a run when using sqlite as the output format.
> >  real 0m3.095s
> >  user0m1.956s
> >  sys  0m0.160s
> >
> >  As you can see sqlite takes twice as long and almost 8 times the user time.
> >
> >  Output size for flat file:   13, 360, 504flatfile.dat
> >  Output size fo sqlit file:   11,042,816   sqlt.db f
> >
> >  Slite db has the following pragmas set.
> >
> >   PRAGMA default_synchronous=FULL
> >   PRAGMA temp_store=memory
> >   PRAGMA page_size=4096
> >   PRAGMA cache_size=2000
> >
> >  Any ideas how to get the sqlite output timings to a more respectable level 
> > would be appreciated.
> >
> >  Thanks
> >  Ken
> >
> If you want flat file performance, use a flat file.  Sqlite is built on
> top of a flat file and cannot be faster or even as fast.  If your
> application can use a flat file, why use anything more complex?
>
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
>
>
>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] sqlite Performance

ok my bad for poor wording... 
 
 I'll try with Synchronous off. I may also try disabling the journal file since 
I can easily recreate the data if it is not successful.
 
 Thanks,
 Ken
 

"Griggs, Donald" <[EMAIL PROTECTED]> wrote: Regarding: 
 Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
 Flat file is 13 MB, sqlite db is 11 MB.

 "Any ideas how to get the sqlite output timings to a more respectable
level would be appreciated. "

I may be way off base if I'm not understanding correctly, but how can
one call these values less than respectable?

To create an sqlite database (or any other) the system must do the same
things it does for the flat file, plus maintain a paging structure,
create indices (presumably), rollback journals, etc., etc.

To take only twice as long seems great (but I'm no expert here).  I'm
guessing it might have taken sqlite even longer except that maybe
compression of numeric values allowed it to actually need fewer disk
writes for the final file (not counting journalling, though).

That being said, if the data you're writing out is saved elsewhere (i.e.
you can repeat the whole process if it should fail) then you can try
turning synchronous OFF, or, if you have to be more conservative, to
NORMAL.
 
 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

To answer your question: 
 Yes I can use a flat file at this stage, but eventually it needs to be 
imported into some type of structure. So to that end I decided early on to use 
sqlite to write the data out. 

 I was hoping for better performance. The raw I/O to read the data and process 
is around .75 seconds (no write i/o).. So using a flat file output costs about 
.7 seconds.

 Using sqlite to do the output costs about 2.25 seconds. My question is why? 
And what can be done to improve this performance? 

John Stanton <[EMAIL PROTECTED]> wrote: Ken wrote:
> I'm looking for suggestions on improving performance of my sqlite application.
>  
>  Here are system timings for a run where the sqlite db has been replaced with 
> a flat file output.
>  real 0m1.459s
>  user0m0.276s
>  sys  0m0.252s
>  
>  This is a run when using sqlite as the output format.
>  real 0m3.095s
>  user0m1.956s
>  sys  0m0.160s
>  
>  As you can see sqlite takes twice as long and almost 8 times the user time.
>  
>  Output size for flat file:   13, 360, 504flatfile.dat
>  Output size fo sqlit file:   11,042,816   sqlt.db f
>  
>  Slite db has the following pragmas set.
>  
>   PRAGMA default_synchronous=FULL
>   PRAGMA temp_store=memory
>   PRAGMA page_size=4096
>   PRAGMA cache_size=2000
>  
>  Any ideas how to get the sqlite output timings to a more respectable level 
> would be appreciated.
>  
>  Thanks
>  Ken
>  
If you want flat file performance, use a flat file.  Sqlite is built on 
top of a flat file and cannot be faster or even as fast.  If your 
application can use a flat file, why use anything more complex?

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] sqlite Performance

2007-03-15 Thread Griggs, Donald

Regarding: 
 Creation of flat file takes 1.5 secs vs 3 seconds to create sqlite db.
 Flat file is 13 MB, sqlite db is 11 MB.

 "Any ideas how to get the sqlite output timings to a more respectable
level would be appreciated. "

I may be way off base if I'm not understanding correctly, but how can
one call these values less than respectable?

To create an sqlite database (or any other) the system must do the same
things it does for the flat file, plus maintain a paging structure,
create indices (presumably), rollback journals, etc., etc.

To take only twice as long seems great (but I'm no expert here).  I'm
guessing it might have taken sqlite even longer except that maybe
compression of numeric values allowed it to actually need fewer disk
writes for the final file (not counting journalling, though).

That being said, if the data you're writing out is saved elsewhere (i.e.
you can repeat the whole process if it should fail) then you can try
turning synchronous OFF, or, if you have to be more conservative, to
NORMAL.
 
 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite Performance

2007-03-15 Thread John Stanton


Ken wrote:

I'm looking for suggestions on improving performance of my sqlite application.
 
 Here are system timings for a run where the sqlite db has been replaced with a flat file output.

 real 0m1.459s
 user0m0.276s
 sys  0m0.252s
 
 This is a run when using sqlite as the output format.

 real 0m3.095s
 user0m1.956s
 sys  0m0.160s
 
 As you can see sqlite takes twice as long and almost 8 times the user time.
 
 Output size for flat file:   13, 360, 504flatfile.dat

 Output size fo sqlit file:   11,042,816   sqlt.db f
 
 Slite db has the following pragmas set.
 
  PRAGMA default_synchronous=FULL

  PRAGMA temp_store=memory
  PRAGMA page_size=4096
  PRAGMA cache_size=2000
 
 Any ideas how to get the sqlite output timings to a more respectable level would be appreciated.
 
 Thanks

 Ken
 
If you want flat file performance, use a flat file.  Sqlite is built on 
top of a flat file and cannot be faster or even as fast.  If your 
application can use a flat file, why use anything more complex?


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Roger Binns

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Gerry Snyder wrote:
> The wiki is there and open to all.
> 
> I look forward to reading your additions to it.

To be fair, only some of the documentation is in the wiki.  The
remainder is generated.  For example you can't edit any of the pages
listed under:

  http://sqlite.org/docs.html

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFoA0YmOOfHg372QQRAlycAJ9p3hQkWfc8yAUTqE7DgjGsefdiTwCfWIqD
xxpWSAFOZOB6yrbYrNO/Cwc=
=jxv/
-END PGP SIGNATURE-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Gerry Snyder


Emerson Clarke wrote:



Fix the out of date documentation


The wiki is there and open to all.

I look forward to reading your additions to it.


Gerry

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Emerson Clarke


For anyone who is interested i have created a standalone test case
which demonstrates the threading behaviour that i had, or as close as
i can get it.  Feel free to use the code for whatever purposes you see
fit.

It will compile on linux and windows, and comes with 4 versions of the
sqlite libraries which can be found under "test/lib/sqlite/lib".  Each
platform has two versions, one built with THREADSAFE=0, and the other
with THREADSAFE=1.

http://ul1.rapidshare.com/files/10511098/test.zip

It seems that i was incorrect in thinking the cause was
sqlite3_step(). In replicating the problem from scratch i found that
it was actually caused by sqlite3_finalize() being called without
synchronisation.

What does this mean, well firstly it means that the entire problem was
avoidable had i simply synchronised correctly.  But i think more
worryingly, it highlights the fact that there is a great deal of
confusion as to what it means for something to be thread safe on this
mailing list.

This confusion is also not aided in any way by the obtuse and stubborn
attitudes that some people have demonstrated.  If a lot of people are
having regular problems with sqlite and threading as has been
indicated then perhaps its time to review the situation.

It is incredibly frustrating that the documentation and the various
answers that i have recieved on this topic have been contradictory.
At no time has anyone actually stood up and said that the software can
safely be used in a multithreaded environment as long as access to the
api's is synchronised.

Instead you have a situation where the software has inbuilt misuse
detection which may or may not alert the programmer to a potential
error and a THREADSAFE macro which makes 5% of the it threadsafe but
leaves the other 95% exposed.  (By way of example using the code i
supplied and the safe versions of the libraries it is possible to get
away with calling sqlite3_finalize() outside the mutex, but if you
take it one step further and use sqlite3_step() outside the mutex it
will fail)

So an inconsistent and unpredictable internal approach to thread
safety combined with a policy of righteousness on this mailing list
means that people like me have little chance of ever getting it right.

Why is it so hard to simply nail down the characteristics of the
software and update the documentation so that everyone is on the same
page ?

If sqlite is thread safe, and by that i mean it can be used safely
with synchronisation, then why not say so.  And be consistent in
saying so.  That way when someone comes here with a problem you can
confidently describe how the api can be used and the user can go away
and attempt to resolve their problem with confidence.

I guess in part i owe people an appology, but then, had i been given a
rational explanation or had access to clear documentation i guess this
would not have gone on so long.  And to be quite honest, my confidence
in the software has now been eroded to such an extent that i dont know
what is up or down.

Please considder what im saying rationally and dont take offence.  I
know a lot of you are frustrated at this debate, and so you should be.
I am too. So why not take steps to ensure that it doesnt happen
again.  Be clear about the threading behaviour and remove the
inconsistencies.

Fix the out of date documentation, and give a consistent answer,
sqlite can be used with multiple threads provided that access to the
api is synchronised.

The saftey checks and macro options do more harm than good becuase
they create confusion.  Yes if you do X,Y, and Z you might get lucky,
but thats probably why so many people have trouble.  Its not
consistent and its complicated.  Unless your going to document the
exact sequence of calls which is safe to make without synchronisation,
you shouldnt be encouraging it.

Advising people to create complicated designs where each thread has
its own connection doesnt help either, because it will not work with
transactions.  It should be either 100% safe, or not at all.

Perhaps it would also be useful to document the various strategies and
ways in which sqlite can be used with safety.  Stop telling people
that mutlithreading results in random unpredictable behavior. Be open
minded and don't hide behind whatever doctrine has been followed up
till now.

Emerson

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread Fred Williams

Sounds like a H-- of a plan to me!  

I'll call Larry Ellison and warn him to put the 12 meter yacht up for sale, 
he's about to get steam rolled! :-)

Fred

> -Original Message-
> From: Bill King [mailto:[EMAIL PROTECTED]
> Sent: Saturday, January 06, 2007 1:15 AM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] sqlite performance, locking & threading
> 
> 
> Roger Binns wrote:
> 
> >>Im sorry for being so harsh, and i know im not winning any friends
> >>here,
> >>
> >>
> >
> >So far noone has agreed with you :-)
> >
> >  
> >
> This would be incorrect. The correct statement is "so far no one has 
> vocally agreed with you".
> 
> If people didn't agree, this whole once a month people having 
> multi-threading issues would not be the case. Surely the number of 
> people with exactly the same issues, month after month, 
> should point to 
> something? Perchance illogic?
> 
> Time I think to go off and write my own simple sql engine. 
> People here 
> are far too closed minded, and far too quick to attack others because 
> they don't follow DRH's line of thought, and decisions.
> 
> --
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> --
> ---
> 


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-06 Thread John Stanton


Bill King wrote:

Roger Binns wrote:


Im sorry for being so harsh, and i know im not winning any friends
here,
  



So far noone has agreed with you :-)

 

This would be incorrect. The correct statement is "so far no one has 
vocally agreed with you".


If people didn't agree, this whole once a month people having 
multi-threading issues would not be the case. Surely the number of 
people with exactly the same issues, month after month, should point to 
something? Perchance illogic?


Time I think to go off and write my own simple sql engine. People here 
are far too closed minded, and far too quick to attack others because 
they don't follow DRH's line of thought, and decisions.


Good idea.  Would you share your proposed architecture with us.



- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Bill King


Roger Binns wrote:


Im sorry for being so harsh, and i know im not winning any friends
here,
   



So far noone has agreed with you :-)

 

This would be incorrect. The correct statement is "so far no one has 
vocally agreed with you".


If people didn't agree, this whole once a month people having 
multi-threading issues would not be the case. Surely the number of 
people with exactly the same issues, month after month, should point to 
something? Perchance illogic?


Time I think to go off and write my own simple sql engine. People here 
are far too closed minded, and far too quick to attack others because 
they don't follow DRH's line of thought, and decisions.


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threadin

2007-01-05 Thread Emerson Clarke

Ken,

Thanks for you comments.  I have coded and tested a module just like
test_server.c and by disabling the safety checks i have also been able
to code and test an example which uses a single connection, single
transaction, single table with up to 50 threads doing
insert/update/delete with no errors.

Granted it was synchronised, so all execution of sqlite3_ api routines
was done serially, but in my experience thats quite a normal
requirement for multithreading.

Its hard to make something which doesnt require synchronisation,
especially when internal state and files are concerned.  But thats not
a problem, because if the synchronisation is granular enough it will
only ever block as frequently as it would normallly crash.  So in
other words, blocking becomes a race condition, and performance is not
hindered very much at all.

If you would like i can code up a test case which demonstrates this
and send it too you, but i think i should stop posting to this topic.
Some people are getting a bit too defensive and annoyed...

I am grateful for all the time people have given me, and for the
software itself.  At the end of the day i only brought this up because
i wanted to help improve it.

Emerson

On 1/5/07, Ken <[EMAIL PROTECTED]> wrote:

Emerson,

 I agree with you somewhat. Not 100% convinced but, I like you am a little dissapointed how sqlite handles  
"threadsafe" and multiple connections. Even in the "test_server.c" module is not 
"concurrent" As it serializes all processing to a single thread, this is not concurrent processing.

 Now if i could take that server thread and create a pool of workers and hand 
off the working set request to one of those, then this would be concurrent 
since each thread would have the capability to run independently of the other 
and at the same time on an SMP hardware system such as Solaris,Hpux etc... But 
this is specifically disallowed since the connection when using the shared 
cache must be used by the creating thread.

 So far I've not been able to successfully find any threadsafe examples of 
sqlite that actually work in a concurrent manner.  Even the threadtest2.c 
crased (bug fix coming in 3.3.10)  but even it does not start up a transaction 
and run multiple inserts/step commands.

 I would like to see a working example of how sqlite can be concurrently 
accessed via multiple threads performingWrite (insert/update/delete) and by 
other threads performing selects against a single table.

 IMHO, the sqlite3_step function if it can only be executed serially (ie must be wrapped 
in a mutex) should do this mutexing internally and not be exposed and left in the hands 
of the user. If one compiles the code with --enable-threadsafe, shouldn't the API be able 
to handle areas such as these internally and not generate "misuse" errors.

Emerson Clarke <[EMAIL PROTECTED]> wrote: Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i havnt heard anything
other than the usual "thats the way its always been" responses, or the
"its designed that way beacuse its easier" argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

Theres not much point forking any maintaining the code if on one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Emerson Clarke wrote:
> > I have to admit i am a little dissapointed.  As the primary author of
> > the software i would have thought that you would have a good
> > understanding of what the thread safety characteristics of your own
> > api were.
>
> He does!  It is well documented and tested.  *You* are the one who wants
> to do various things differently.
>
> > Suggesting that suppressing the safety checks will result in random
> > and non reproducable failures seems a little extreme, if not
> > superstitious.
>
> Err, you are using threading.  Explain exactly how if you got a failure
> due to threading you could reproduce it exactly.  You do know that you
> can't run x86 processors in lockstep because at the cycle level they
> have effectively random behaviour due to cache eviction and replacement
> policies.  That one reason for the Itanium if you need two processors to
> have exactly the same behaviour on the same code at the cycle level.
>
> > As i was discussing with Nicholas, api's are not by default thread
> > unsafe.  In

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Ken

Emerson,

 I agree with you somewhat. Not 100% convinced but, I like you am a little 
dissapointed how sqlite handles  "threadsafe" and multiple connections. Even in 
the "test_server.c" module is not "concurrent" As it serializes all processing 
to a single thread, this is not concurrent processing.

 Now if i could take that server thread and create a pool of workers and hand 
off the working set request to one of those, then this would be concurrent 
since each thread would have the capability to run independently of the other 
and at the same time on an SMP hardware system such as Solaris,Hpux etc... But 
this is specifically disallowed since the connection when using the shared 
cache must be used by the creating thread.

 So far I've not been able to successfully find any threadsafe examples of 
sqlite that actually work in a concurrent manner.  Even the threadtest2.c 
crased (bug fix coming in 3.3.10)  but even it does not start up a transaction 
and run multiple inserts/step commands.

 I would like to see a working example of how sqlite can be concurrently 
accessed via multiple threads performingWrite (insert/update/delete) and by 
other threads performing selects against a single table. 

 IMHO, the sqlite3_step function if it can only be executed serially (ie must 
be wrapped in a mutex) should do this mutexing internally and not be exposed 
and left in the hands of the user. If one compiles the code with 
--enable-threadsafe, shouldn't the API be able to handle areas such as these 
internally and not generate "misuse" errors. 

Emerson Clarke <[EMAIL PROTECTED]> wrote: Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i havnt heard anything
other than the usual "thats the way its always been" responses, or the
"its designed that way beacuse its easier" argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

Theres not much point forking any maintaining the code if on one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Emerson Clarke wrote:
> > I have to admit i am a little dissapointed.  As the primary author of
> > the software i would have thought that you would have a good
> > understanding of what the thread safety characteristics of your own
> > api were.
>
> He does!  It is well documented and tested.  *You* are the one who wants
> to do various things differently.
>
> > Suggesting that suppressing the safety checks will result in random
> > and non reproducable failures seems a little extreme, if not
> > superstitious.
>
> Err, you are using threading.  Explain exactly how if you got a failure
> due to threading you could reproduce it exactly.  You do know that you
> can't run x86 processors in lockstep because at the cycle level they
> have effectively random behaviour due to cache eviction and replacement
> policies.  That one reason for the Itanium if you need two processors to
> have exactly the same behaviour on the same code at the cycle level.
>
> > As i was discussing with Nicholas, api's are not by default thread
> > unsafe.  In fact it is the other way around, most api's should be
> > perfectly thread safe if access to them is synchronised.
>
> This all comes down to semantics.  For trivial libraries, sure you can
> just wrap a mutex around it all.  For non-trivial libraries, best
> practise is for the author to design and implement for whatever thread
> usage they believe is best suited for the library and the platforms it
> operates one.
>
> > Im sorry for being so harsh, and i know im not winning any friends
> > here,
>
> So far noone has agreed with you :-)
>
> > I have tested the changes i made with some 50 threads performing
> > probably thousands of statements a second, and had no errors.
>
> What does that prove?  You cannot prove threading by running stuff and
> saying "see it didn't crash".  The only way to prove threading is by
> reasoning about the design and verifying the implementation matches the
> design. drh already did that years ago.
>
> > If you are willing to work through the problem, i am willing to assist
> > you in any way that i can.  Wether it be writing test cases or making
> > modifications to the source.
>
> You can always maintain your own forked version of SQLite and link to it
> from the wiki.
>
> Roger
> -BEGIN PGP

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i havnt heard anything
other than the usual "thats the way its always been" responses, or the
"its designed that way beacuse its easier" argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

Theres not much point forking any maintaining the code if on one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns <[EMAIL PROTECTED]> wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Emerson Clarke wrote:
> I have to admit i am a little dissapointed.  As the primary author of
> the software i would have thought that you would have a good
> understanding of what the thread safety characteristics of your own
> api were.

He does!  It is well documented and tested.  *You* are the one who wants
to do various things differently.

> Suggesting that suppressing the safety checks will result in random
> and non reproducable failures seems a little extreme, if not
> superstitious.

Err, you are using threading.  Explain exactly how if you got a failure
due to threading you could reproduce it exactly.  You do know that you
can't run x86 processors in lockstep because at the cycle level they
have effectively random behaviour due to cache eviction and replacement
policies.  That one reason for the Itanium if you need two processors to
have exactly the same behaviour on the same code at the cycle level.

> As i was discussing with Nicholas, api's are not by default thread
> unsafe.  In fact it is the other way around, most api's should be
> perfectly thread safe if access to them is synchronised.

This all comes down to semantics.  For trivial libraries, sure you can
just wrap a mutex around it all.  For non-trivial libraries, best
practise is for the author to design and implement for whatever thread
usage they believe is best suited for the library and the platforms it
operates one.

> Im sorry for being so harsh, and i know im not winning any friends
> here,

So far noone has agreed with you :-)

> I have tested the changes i made with some 50 threads performing
> probably thousands of statements a second, and had no errors.

What does that prove?  You cannot prove threading by running stuff and
saying "see it didn't crash".  The only way to prove threading is by
reasoning about the design and verifying the implementation matches the
design. drh already did that years ago.

> If you are willing to work through the problem, i am willing to assist
> you in any way that i can.  Wether it be writing test cases or making
> modifications to the source.

You can always maintain your own forked version of SQLite and link to it
from the wiki.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFnnDQmOOfHg372QQRAtz1AJwIzJbrlnH0xImPSJYzvutAtMomVACfSltc
KaSDdgBuwW4ITN1UWms5DMI=
=EqF9
-END PGP SIGNATURE-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Roger Binns

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Emerson Clarke wrote:
> I have to admit i am a little dissapointed.  As the primary author of
> the software i would have thought that you would have a good
> understanding of what the thread safety characteristics of your own
> api were.

He does!  It is well documented and tested.  *You* are the one who wants
to do various things differently.

> Suggesting that suppressing the safety checks will result in random
> and non reproducable failures seems a little extreme, if not
> superstitious.

Err, you are using threading.  Explain exactly how if you got a failure
due to threading you could reproduce it exactly.  You do know that you
can't run x86 processors in lockstep because at the cycle level they
have effectively random behaviour due to cache eviction and replacement
policies.  That one reason for the Itanium if you need two processors to
have exactly the same behaviour on the same code at the cycle level.

> As i was discussing with Nicholas, api's are not by default thread
> unsafe.  In fact it is the other way around, most api's should be
> perfectly thread safe if access to them is synchronised.

This all comes down to semantics.  For trivial libraries, sure you can
just wrap a mutex around it all.  For non-trivial libraries, best
practise is for the author to design and implement for whatever thread
usage they believe is best suited for the library and the platforms it
operates one.

> Im sorry for being so harsh, and i know im not winning any friends
> here,

So far noone has agreed with you :-)

> I have tested the changes i made with some 50 threads performing
> probably thousands of statements a second, and had no errors.

What does that prove?  You cannot prove threading by running stuff and
saying "see it didn't crash".  The only way to prove threading is by
reasoning about the design and verifying the implementation matches the
design. drh already did that years ago.

> If you are willing to work through the problem, i am willing to assist
> you in any way that i can.  Wether it be writing test cases or making
> modifications to the source.

You can always maintain your own forked version of SQLite and link to it
from the wiki.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFnnDQmOOfHg372QQRAtz1AJwIzJbrlnH0xImPSJYzvutAtMomVACfSltc
KaSDdgBuwW4ITN1UWms5DMI=
=EqF9
-END PGP SIGNATURE-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

Richard,

I have to admit i am a little dissapointed.  As the primary author of
the software i would have thought that you would have a good
understanding of what the thread safety characteristics of your own
api were.

Suggesting that suppressing the safety checks will result in random
and non reproducable failures seems a little extreme, if not
superstitious.

As i was discussing with Nicholas, api's are not by default thread
unsafe.  In fact it is the other way around, most api's should be
perfectly thread safe if access to them is synchronised.

It is my undestanding that sqlite3_stmt * represents an instance of
the VM, and that most of the internal virtual machine operations are
reentrant, simply operating on the supplied VM and modifying its
internal state.

If this is the case, i cant see how you would ever have thread safety
problems so long as no two threads are modifying the same sqlite3_stmt
* at the same time.

Surely this is something that can be reasoned through ?

Adding deliberate safety check imbalances to sqlite3_step becuase it
is the only routine that can be called mutliple times in sequence is a
bit extreme.  Why not allow users who don't believe in withces and
goblins go ahead and use the api in a synchronised way.  The safety
checks will still be there, just not as agressively.

You should also considder that the safety checks only provide a
warning for the user.  Given that they are subject to race conditions,
you can never guarantee that the safety checks will trap all errors.
In fact i would argue that you are better off throwing an int 3
instruction than returning a misuse error.  Giving the programmer the
false idea that they can continue to run their program without stack
corruption is not a good idea, after all its just encouraging them to
try again.  Rather, according to what you are saying they should be
re-writing their code to access the api with a single thread.

So on the one hand you are actively disallowing users like myself from
synchronising thread access to the api, but on the other hand
encouraging other users to not fix the actual problems.  It doesnt
make sense...

Im sorry for being so harsh, and i know im not winning any friends
here, but i dont give up easily.  I think sqlite is an excellent piece
of software and i believe there is a way which it can be made to
accomodate these concerns.

I have tested the changes i made with some 50 threads performing
probably thousands of statements a second, and had no errors.

If you are willing to work through the problem, i am willing to assist
you in any way that i can.  Wether it be writing test cases or making
modifications to the source.

If there are problems as you say, then there is no reason why they
cant alteast be documented and verified.

Emerson

On 1/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
>
> The problem i had was with sqlite not being compatible with the simple
> design that i wanted.  I did try several alternate designs, but only
> as a way of working around the problem i had with sqlite.  It took a
> long time but eventually i managed to get someone to explain why
> sqlite had that particular problem, and i was able to modify the
> sqlite source to resolve the issue.
>

I remain unconvinced that there is any problem with SQLite in
regard to thread safety.  It is my belief that by disabling the
safety check routines, you are opening your self up to lots of
problems.  Those routines are there to protect you, the programmer,
and to make your programming errors obvious.  By disabling those
checks, you have not fixed the problem.  You have merely
suppressed the symptoms so that you will get rare, random
failures that cannot be easily reproduced.  On your own head
be it.

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread drh

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> 
> The problem i had was with sqlite not being compatible with the simple
> design that i wanted.  I did try several alternate designs, but only
> as a way of working around the problem i had with sqlite.  It took a
> long time but eventually i managed to get someone to explain why
> sqlite had that particular problem, and i was able to modify the
> sqlite source to resolve the issue.
> 

I remain unconvinced that there is any problem with SQLite in
regard to thread safety.  It is my belief that by disabling the
safety check routines, you are opening your self up to lots of
problems.  Those routines are there to protect you, the programmer,
and to make your programming errors obvious.  By disabling those
checks, you have not fixed the problem.  You have merely 
suppressed the symptoms so that you will get rare, random
failures that cannot be easily reproduced.  On your own head
be it.

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread Emerson Clarke


Bill,

Thanks for the description, thats pretty much how i designed the
index, but with a few modifications.  The filesystem becomes the tree
structure which is indexed by a hash of the original document url. It
works like a big hashtable so its quite scalable.




>
Sorry if this has been posited before, but our solution to this was data
normalisation. IE, we store only the filename, and an integer ID which
is the directory of the file, and of course a lookup table of id's to
directory names. This helped us greatly, as well as reducing overhead of
amount of data stored.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread Bill King

Emerson Clarke wrote:

The indexing process works like this.

1.) Open a document and parse its contents.
2.) Look up records in the first database based on the contents of the
document, updating records where appropriate and inserting new ones.
3.) Transforming the document based on what was obtained from the
first database.
4.) Create a filesystem structure in the form of a folder and two or 
more files.

5.) Look up some more records in a second database, updating and
inserting as necessary.

For architectural reasons the above steps must be performed in that
order. This means that operations cant be separated or queued up in
the way that you suggested.  Each step is dependent on the previous
step.

But by having multiple threads and using synchronisation around the
database operations you can ensure that multiple database operations
are always pending.

Each thread will be at a different stage in the pipeline, but a few
will always be ready to perform a database operation so the idea is to
maximise the throughput.

As you said, the more rows per transaction, the more rows per second.
Which brings us back to the original issue.  Why cant i have multiple
threads all using the same connection within a single transaction ?

Of course i know the simple answer, which is that the current api does
not support this.  But im wondering why, and if there are any other
ways to achieve the desired performance.

Emerson

On 12/28/06, Roger Binns <[EMAIL PROTECTED]> wrote:

Emerson Clarke wrote:
> The idea is that because i am accessing two databases, and doing
> several file system operations per document, there should be a large
> gain by using many threads.  There is no actual indexing process, the
> whole structure is the index, but if anything the database operations
> take the most time.  The filesystem operations have a very small
> amount of overhead.

That is all unclear from your original description.  Aren't you trying
to "index" several million documents and doesn't the process of indexing
consist of two parts?

1: Open the document, parse it in various ways, build index data, 
close it

2: Add a row to a SQLite database

My point was that #1 is way more work than #2, so you can run #1's in
multiple threads/processes and do #2 in a single thread using a
queue/pipe object for communication.

On the other hand, if #1 is way less work than #2 then you will be bound
by the speed at which you decide to make transactions in SQLite.  A 7200
rpm disk limits you to 60 transactions a second.  The more rows per
transaction, the more rows per second.

Roger

- 

To unsubscribe, send email to [EMAIL PROTECTED]
- 

- 

To unsubscribe, send email to [EMAIL PROTECTED]
- 

Sorry if this has been posited before, but our solution to this was data 
normalisation. IE, we store only the filename, and an integer ID which 
is the directory of the file, and of course a lookup table of id's to 
directory names. This helped us greatly, as well as reducing overhead of 
amount of data stored.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Nicolas Williams

On Thu, Jan 04, 2007 at 12:50:01AM +, Emerson Clarke wrote:
> My oppologies, your right that explanation had been given.

OK.

> But i didnt actually take it seriously, i guess i found it hard to
> believe that it being the easier option was the only reason why this
> limitation was in place.

SQLite is a large pile of code.  Other libraries that I'm familiar with
that have taken this approach are larger still.  Retrofitting MT-safety
into these is hard, so the easiest path is often taken.  (It may be that
SQLite was always intended to be MT-safe, but I don't know that for a
fact.)

> If this is the case, then surely the fix is simple.  Given that i
> assume it is safe to have multiple sqlite3_step() calls active on a
> single connection on a single thread.  And given what you have said
> about sqlite not already checking data structures that would be shared
> by multiple threads, then surely all that needs to happen is for the
> misuse detection to be removed.

Your first assumption, as has been explained repeatedly, is incorrect.

Oh, wait.  I think I understand what's happening.  You've missunderstood
what you've been told (your next paragraph makes me think so).

You *can* use sqlite3_step() with the same db context in multiple
threads, you just have to synchronize so this doesn't happen
*concurrently*.

If you remove the misuse detection but don't synchronize I believe
you'll find that your application will crash or worse.

> Since there is usually nothing which needs to be done to specifically
> make any api thread safe other than synchronising access too it.  If
> by synchronising access to the api calls i can ensure that no two
> threads use any data structure at the same time, everything should
> work fine right ?

Yes.

Nico
-- 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Emerson Clarke

Nicholas,

My oppologies, your right that explanation had been given.

But i didnt actually take it seriously, i guess i found it hard to
believe that it being the easier option was the only reason why this
limitation was in place.

If this is the case, then surely the fix is simple.  Given that i
assume it is safe to have multiple sqlite3_step() calls active on a
single connection on a single thread.  And given what you have said
about sqlite not already checking data structures that would be shared
by multiple threads, then surely all that needs to happen is for the
misuse detection to be removed.

Since there is usually nothing which needs to be done to specifically
make any api thread safe other than synchronising access too it.  If
by synchronising access to the api calls i can ensure that no two
threads use any data structure at the same time, everything should
work fine right ?

This gets to the very core of the original issue i had, and why i said
sqlite was "actively" thread unsafe.  Because it terminates itself
with misuse errors even when in theory it is perfectly safe to be used
with multiple threads provided the user synchronises on every api
call, which is easily achieved.

Am i making any sense ?

Emerson

On 1/3/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:

On Wed, Jan 03, 2007 at 11:22:36PM +, Emerson Clarke wrote:
> Ok,
>
> Well can you do me a favour and forward me the email where this was
> supposedly explained in all its technical glory.

Technical glory?  No.  At a high level it's this: that it is easier to
make an API like SQLite's thread-safe with exceptions like "only one
thread active in any given context object at any time" than it is to
put mutexes and what not around data structures that would be shared by
multiple threads if this exception were not stated.  Unless and until
you try to do it the other way you'll find this rationale to be
subjective.

Like I said, good luck.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Emerson Clarke


Michael,

Thanks for the advice.  During the indexing process i need to select
and optionally insert records into a table so i cant ignore the
outcomes.

Basically the indexing process does compression, so for each document
it inserts words into a table and looks up keys.  Every word in the
document gets swapped with a key, and new keys are inserted as needed.

There are some problems with splitting the work up in a different way
as you suggested. I would either end up with a lot of queues or i
would have to stagger the work so that the entire data set gets
processed in stages which doesnt scale very well and isnt particularly
fault tollerant.  When building an index, you want the structure to be
built up progressively, so that you can pause the process and resume
it later on whilst still having useful results.

I would be worried that in a queued design, the overhead and
bottlenecks caused by the buffering, message passing, and context
switching would reduce the performance to that of a single thread.
Especially since the database operations represent 90% of the work,
all you would really be doing is attempting to serialise things in a
multithreaded way.

Im sure having worked on multithreaded systems you appreciate that
sometimes simple designs are better, and i think i have a pretty good
handle on what it is that im trying to do.

You never have control over static initialisation, it happens before
main().  If i was writing very specific code to suit just this
situation then maybe as you say i wouldnt need to worry about it.  But
im also writing a database api, and that api is used for many
different things.  My considderations are not just for this one
problem, but also for the best general way to code the api so that it
is safe and efficient in all circumstances.  So far the client/server
design is the only way i can achieve true thread safety.

If i could work out why sqlite3_step() causes problems across multiple
threads i could probably make things a little faster and i could do
away with the need for a client/server design.

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:

Emerson,

Now I understand your current implementation.  You seemingly only partially
split up the work in your code. I'd schedule the database operation and not
wait on the outcome, but start on the next task. When the database finishes
and has retrieved its result, schedule some work package on a third thread,
which only processes the results etc. Split up the work in to repetitive,
non blocking tasks. Use multiple queues and dedicated threads for parts of
the operation or thread pools, which process queues in parallel if possible.
From what I can tell you're already half way there.

I still don't see your static initialization problem, but that's another
story. Actually I'd avoid using static initialization or static (singleton)
instances, unless the design really requires it. Someone must control
startup of the entire process, have that one (probably main/WinMain) take
care that the work queues are available. Afterwards the order of thread
starts doesn't matter... Actually it is non-deterministic anyway (unless you
serialize this yourself.)

Michael

-Ursprüngliche Nachricht-
Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
Gesendet: Mittwoch, 3. Januar 2007 15:14
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] sqlite performance, locking & threading

Michael,

Im not sure that atomic operations would be a suitable alternative.
The reason why im using events/conditions is so that the client thread
blocks until the server thread has processed the query and returned the
result.  If i did not need the result then a simple queueing system with
atomic operations or critical sections would be fine i guess.

The client thread must always block or spin until the server thread has
completed the query.  Critical sections cant be efficiently used to notify
other threads of status change.  I did try using critical sections in this
way, by spinning until the server thread takes a lock, then blocking and
eventually waiting for the server thread to finish.  But since there is no
way to block the server thread when there is no work to do both the client
and server thread must sleep which induces context switching anyway.

If you used atomic operations, how would you get the client thread to block
and the server thread to block when it is not processing ?

Events/conditions seemed to be the best solution, the server thread never
runs when it doesnt need to and always wakes up when there is processing to
be done.

The static initialisation problem occurs becuase the server thread must be
running before anything which needs to use it.  If you have a static
instance of a class which accesses a database and it is initalised before
the static instance which controls the server thread, you have a problem.
It can be overcome using the initialise on first use idiom, as long as your
careful to protect the ini

Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Nicolas Williams

On Tue, Jan 02, 2007 at 11:56:42PM +, Emerson Clarke wrote:
> The single connection multiple thread alternative apparently has
> problems with sqlite3_step being active on more than one thread at the
> same moment, so cannot easily be used in a safe way.  But it is by far
> the fastest and simplest alternative.

No, not "apparently" -- it _does_.  What you should do is keep a set of
db contexts and assign them to clients/connections and make sure that
each clients/connections is only every serviced by one thread at a time.

One way to do this is to classify incoming messages, select an existing
object representing that client/connection/whatever or create a new one,
then queue the new message in that object and queue this object up for
dispatch to a worker thread.

Cheers,

Nico
-- 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-03 Thread Emerson Clarke

Michael,

Im not sure that atomic operations would be a suitable alternative.
The reason why im using events/conditions is so that the client thread
blocks until the server thread has processed the query and returned
the result.  If i did not need the result then a simple queueing
system with atomic operations or critical sections would be fine i
guess.

The client thread must always block or spin until the server thread
has completed the query.  Critical sections cant be efficiently used
to notify other threads of status change.  I did try using critical
sections in this way, by spinning until the server thread takes a
lock, then blocking and eventually waiting for the server thread to
finish.  But since there is no way to block the server thread when
there is no work to do both the client and server thread must sleep
which induces context switching anyway.

If you used atomic operations, how would you get the client thread to
block and the server thread to block when it is not processing ?

Events/conditions seemed to be the best solution, the server thread
never runs when it doesnt need to and always wakes up when there is
processing to be done.

The static initialisation problem occurs becuase the server thread
must be running before anything which needs to use it.  If you have a
static instance of a class which accesses a database and it is
initalised before the static instance which controls the server
thread, you have a problem.  It can be overcome using the initialise
on first use idiom, as long as your careful to protect the
initalisation with atomic operations, but its still a bit complicated.

Emerson

On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:

Hi Emerson,

Another remark: On Windows using Events synchronization objects involves
additional kernel context switches and thus slows you down more than
necessary. I'd suggest using a queue, which makes use of the InterlockedXXX
operations (I've implemented a number of those, including priority based
ones - so this is possible without taking a single lock.) or to use critical
sections - those only take the kernel context switch if there really is lock
contention. If you can reduce the kernel context switches, you're
performance will likely increase drastically.

I also don't see the static initialization problem: The queue has to be
available before any thread is started. No thread has ownership of the
queue, except may be the main thread.

Michael

-Ursprüngliche Nachricht-
Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
Gesendet: Mittwoch, 3. Januar 2007 00:57
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] sqlite performance, locking & threading

Nico,

I have implemented all three strategies (thead specific connections, single
connection multiple threads, and single thread server with multiple client
threads).

The problem with using thread specific contexts is that you cant have a
single global transaction which wraps all of those contexts.  So you end up
having to use fine grained transactions, which decreases performance.

The single connection multiple thread alternative apparently has problems
with sqlite3_step being active on more than one thread at the same moment,
so cannot easily be used in a safe way.  But it is by far the fastest and
simplest alternative.

The single thread server solution involves message passing between threads,
and even when this is done optimally with condition variables (or events on
windows) and blocking ive found that it results in a high number of context
switches and decreased performance.  It does however make a robust basis for
a wrapper api, since it guarantees that things will always be synchronised.
But using this arrangement can also result in various static initialisation
problems, since the single thread server must always be up and running
before anything which needs to use it.

Emerson

On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:
> On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote:
> > Technically sqlite is not thread safe.  [...]
>
> Solaris man pages describe APIs with requirements like SQLite's as
> "MT-Safe with exceptions" and the exceptions are listed in the man page.
>
> That's still MT-Safe, but the caller has to play by certain rules.
>
> Anyways, this is silly.  SQLite API is MT-Safe with one exception and
> that exception is rather ordinary, common to other APIs like it that
> have a context object of some sort (e.g., the MIT krb5 API), and not
> really a burden to the caller.  In exchange for this exception you get
> an implementation of the API that is lighter weight and easier to
> maintain than it would have been without that exception; a good
> trade-off IMO.
>
> Coping with this exception is easy.  For example, if you have a server
> app with multiple worker threads each of which needs a db context then
> you could use a thread-specific key to track a per-

Re: [sqlite] sqlite performance, locking & threading

2007-01-02 Thread drh

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> 
> Firstly can i clarify what you mean regarding the same moment.  Do you
> mean that no two threads can be executing the call, or that no two
> threads can be in the middle of stepping through a series of results
> using the step function (assuming there is some context behind the
> scenes).  

This first.
--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-02 Thread Emerson Clarke


Nico,

I have implemented all three strategies (thead specific connections,
single connection multiple threads, and single thread server with
multiple client threads).

The problem with using thread specific contexts is that you cant have
a single global transaction which wraps all of those contexts.  So you
end up having to use fine grained transactions, which decreases
performance.

The single connection multiple thread alternative apparently has
problems with sqlite3_step being active on more than one thread at the
same moment, so cannot easily be used in a safe way.  But it is by far
the fastest and simplest alternative.

The single thread server solution involves message passing between
threads, and even when this is done optimally with condition variables
(or events on windows) and blocking ive found that it results in a
high number of context switches and decreased performance.  It does
however make a robust basis for a wrapper api, since it guarantees
that things will always be synchronised.  But using this arrangement
can also result in various static initialisation problems, since the
single thread server must always be up and running before anything
which needs to use it.

Emerson

On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote:

On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote:
> Technically sqlite is not thread safe.  [...]

Solaris man pages describe APIs with requirements like SQLite's as
"MT-Safe with exceptions" and the exceptions are listed in the man page.

That's still MT-Safe, but the caller has to play by certain rules.

Anyways, this is silly.  SQLite API is MT-Safe with one exception and
that exception is rather ordinary, common to other APIs like it that
have a context object of some sort (e.g., the MIT krb5 API), and not
really a burden to the caller.  In exchange for this exception you get
an implementation of the API that is lighter weight and easier to
maintain than it would have been without that exception; a good
trade-off IMO.

Coping with this exception is easy.  For example, if you have a server
app with multiple worker threads each of which needs a db context then
you could use a thread-specific key to track a per-thread db context;
use pthread_key_create(3C) to create the key, pthread_setspecific(3C)
once per-thread to associate a new db context with the calling thread,
and pthread_getspecific(3C) to get the calling thread's db context when
you need it.  If you have a protocol where you have to step a statement
over multiple message exchanges with a client, and you don't want to
have per-client threads then get a db context per-client/exchange and
store that and a mutext in an object that represents that
client/exchange.  And so on.

Nico
--

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2007-01-02 Thread Nicolas Williams

On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote:
> Technically sqlite is not thread safe.  [...]

Solaris man pages describe APIs with requirements like SQLite's as
"MT-Safe with exceptions" and the exceptions are listed in the man page.

That's still MT-Safe, but the caller has to play by certain rules.

Anyways, this is silly.  SQLite API is MT-Safe with one exception and
that exception is rather ordinary, common to other APIs like it that
have a context object of some sort (e.g., the MIT krb5 API), and not
really a burden to the caller.  In exchange for this exception you get
an implementation of the API that is lighter weight and easier to
maintain than it would have been without that exception; a good
trade-off IMO.

Coping with this exception is easy.  For example, if you have a server
app with multiple worker threads each of which needs a db context then
you could use a thread-specific key to track a per-thread db context;
use pthread_key_create(3C) to create the key, pthread_setspecific(3C)
once per-thread to associate a new db context with the calling thread,
and pthread_getspecific(3C) to get the calling thread's db context when
you need it.  If you have a protocol where you have to step a statement
over multiple message exchanges with a client, and you don't want to
have per-client threads then get a db context per-client/exchange and
store that and a mutext in an object that represents that
client/exchange.  And so on.

Nico
-- 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> 
> "With that and subsequent versions, it is safe to move a connection
> handle across threads as long as the connection is not holding any
> fcntl() locks. You can safely assume that no locks are being held if
> no transaction is pending and all statements have been finalized."
> 
> The above statement regarding transactions was one of the things which
> led me to believe that it was not possible to have the single
> transaction, single connection and multiple thread arrangement.
> 

The statement is true in general.  It works even on the older,
broken versions of Linux.  If you are running on a newer version
of Linux or on OS-X or windows, then you can move a database
connection from one thread to another anytime you want.  But
if you write code that does that, it will not be portable to
older versions of linux with the fcntl() bug.

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

Richard,

Im getting some mixed signals here, on the one hand i am being told
that it should be ok to use a single transaction and connection across
multiple threads (assuming that it is correctly synchronised). But on
the other hand i am reading statements like that as part of a list of
things which may cause a sqlite misuse error.

"Trying to use the same SQLite database connection from two or more
threads at the same time."

Anyway, the sqlite_step() problem certainly explains one of the
crashes that i saw so that will save me some debugging.

Thanks to those who have suggested alternative designs and strategies.
I am currently adapting the single thread does all the database work
solution.

The documentation sources i have been looking at regarding threading
are (in addition to various discussions in this mailing list) here;

http://www.sqlite.org/cvstrac/wiki?p=MultiThreading
http://www.sqlite.org/lang_transaction.html
http://www.sqlite.org/lockingv3.html
http://www.sqlite.org/faq.html#q8

"With that and subsequent versions, it is safe to move a connection
handle across threads as long as the connection is not holding any
fcntl() locks. You can safely assume that no locks are being held if
no transaction is pending and all statements have been finalized."

The above statement regarding transactions was one of the things which
led me to believe that it was not possible to have the single
transaction, single connection and multiple thread arrangement.

Emerson

On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Joe Wilson <[EMAIL PROTECTED]> wrote:
> --- [EMAIL PROTECTED] wrote:
> > SQLite allows you to have multiple queries running at the
> > same time in multiple threads, as long as no two threads are
> > running sqlite3_step() at the same moment.
>
> Do you mean "as long as no two threads are running sqlite3_step()
> _for the same sqlite3* connection_ at the same moment"?
>

Yes.
--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

Joe Wilson <[EMAIL PROTECTED]> wrote:
> --- [EMAIL PROTECTED] wrote:
> > SQLite allows you to have multiple queries running at the
> > same time in multiple threads, as long as no two threads are
> > running sqlite3_step() at the same moment.
> 
> Do you mean "as long as no two threads are running sqlite3_step() 
> _for the same sqlite3* connection_ at the same moment"?
> 

Yes.
--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2006-12-30 Thread Joe Wilson

--- [EMAIL PROTECTED] wrote:
> SQLite allows you to have multiple queries running at the
> same time in multiple threads, as long as no two threads are
> running sqlite3_step() at the same moment.

Do you mean "as long as no two threads are running sqlite3_step() 
_for the same sqlite3* connection_ at the same moment"?

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

2006-12-30 Thread John Stanton

Good advice.  If you want to experiment with a certain architecture, 
write the programs yourself rather than trying to get others to do it 
for you.  Sqlite is a well thought through solution for an embedded 
database with a wide range of applications, but if you want something 
more specialised, write it yourself.


Indexing and data storage methods are well documented so it is not a 
daunting task to roll your own and get all the performance you want.


As Roger points out, Sqlite is open source and is available to you as a 
model.  You will find that it is clearly and simply written and the code 
is very easy to follow.


Roger Binns wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Emerson Clarke wrote:
| I have deliberately tried to avoid giving too much detail on the
| architecture of the index since that was not the point and i didnt
| want to end up debating it.

I don't want to debate your index architecture either :-).  Quite simply
several times a month people post to this ideally wanting SQLite changed
to match how they want to structure things.  People on the list explore
with the poster how the items of data are related, and can suggest an
alternative way of doing things.  Usually the poster finds that simpler
than what they had first thought of and goes away happy.

Is this your question: I want SQLite to work differently than it
currently does so that it matches how I want to do things?

If that is the case, then the answer is you can go ahead and rewrite as
much of SQLite as you want to to do that.  The code is public domain so
there are no legal or technical hindrances standing in your way.  This
thread may as well end at that.

On the other hand, if you do want to work within the constraints of
SQLite then there are quite a few things that can be suggested.  But
that is only possible if more is known about the relationships of the data.

| I did make an attempt to explain that A and B could not be done at the
| same time in a previous message, but perhaps its been lost in the
| conversation.  The process involves several stages some of which are
| database operations and some of which are file operations and that the
| operations are not separable.  They must be done in sequential order.

I was trying to establish what has to be serialized.  In particular the
question was about if A and B had any relationships to each other.  If
they do, then that means they would have to be processed serially and I
don't see the relevance of threading etc.

If they can be processed at the same time, then that means some sort of
partitioning can happen.  In theory breaking the datasets into 10
partitions can give 10 times the performance, but in practise there will
need to be some coordination in order to make it look like there is one
database not multiple pieces.

| The database operations, though very small still consume the most time
| and are the most sensetive to how the synchronisation takes place and
| where the transactions are placed.

Have you considered just using plain DB/dbm/gdbm and then importing the
data on demand into SQLite?

Also a lot of the synchronisation is because SQLite makes damn sure it
doesn't lose your data.  If your documents are permanent (ie you can
access them later if need be), then you can loosen the constraints on
SQLite.  For example you could run with pragma synchronous=off and then
do a checkpoint every 100,000 documents where you close the database,
copy it to a permanent file, sync, and start again.  You could also use
a ram disk and copy to permanent storage as your checkpoint.

| I dont think custom functions are
| appropriate for what im doing and im not sure how virtual tables would
| be either, i rather suspect that would be a very complicated approach.

You can (ab)use custom functions and virtual tables to help behind the
scenes.  For example they can be used to make data sets that are
partitioned appear to be a single whole.  Another example is if you have
your database in two pieces - one that is read only with "old" data and
new one with updates.  That can again appear to the rest of the code as
one database.  Finally you can also make the functions and virtual
tables have side effects even on what appear to be read only queries.

| The schema is extemely simple, and there is barely any logic too the
| indexing process at all.

Maybe not even indexing the documents at all would work?  If you used a
virtual table, you can make it grovel through the documents on demand.
You can even build indices (in the SQL sense) which are in your own
format and performance characteristics and use those for the virtual table.

| Unfortunately i cannot do this with sqlite at the moment...

Correct.  SQLite errs on the side of being a library with no controller,
working with multiple processes and only having the lowest common
denominator operating system locking functionality available.  There are
techniques that can be used to improve concurrency.  DRH has a

Re: [sqlite] sqlite performance, locking & threading

"Michael Ruck" <[EMAIL PROTECTED]> wrote:
> Richard,
> 
> I believe his problem is this:
> 
> "Each query is allowed to complete before the other one starts, but each
> thread may have multiple statements or result sets open."
> 
> The open resultsets/multiple started statements are causing him =
> headaches.
> 

SQLite allows you to have multiple queries running at the
same time in multiple threads, as long as no two threads are
running sqlite3_step() at the same moment.

Isn't there a list of possible causes for SQLITE_MISUSE somewhere.
I seem to remember writing such a list one.  Does anybody know where 
I put it?

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> Richard,
> 
> Are you sure we are not just getting into semantic knots here ?
> 
> Do we have the same definition of "at the same time".  I mean
> concurrently, so that both threads use the same sqlite3 * structure,
> within mutexes. Each query is allowed to complete before the other one
> starts, but each thread may have multiple statements or result sets
> open.
> 
> When i try to do this, i get api called out of sequence errors...
> 

There are around 50 test cases for this kind of behavior in the
regression test files thread1.test and thread2.test.  They all
seem to work for me.

Perhaps your mutexes are not working as you expect and you are
in fact trying to use the same database connection simultaneously
in two or more threads.  SQLite attempts to detect this situation
and when it sees it it return SQLITE_MISUSE which generates the
"API called out of sequence" error. 

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> 
> I have code which creates a transaction on a connection in the parent
> thread, then creates several child threads which attempt to use the
> same connection and transaction in a synchronised mannor.  It does not
> work, and by all the documentation that i have read on sqlite, it
> should not work.
> 

This should work fine on SQLite version 3.3.1 and later on
newer Linux machines or on any windows or OS-X machine.  And I
think the documentation is clear on that point.

Where in the documentation does it suggest otherwise?

--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

RE: [sqlite] sqlite performance, locking & threading

2006-12-30 Thread Fred Williams

Been following this a while...  You have access to the source, and
apparently are a "threading genius."  Please make the required minor
changes and post a link here so we can all benefit.

Fred

> -Original Message-
> From: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Sent: Saturday, December 30, 2006 9:34 AM
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] sqlite performance, locking & threading
>
>
> Roger,
>
> I think sqlite suffers somewhat from a bit of an identity crisis.
> Whilst it is both a library and a piece of code which you embed in a
> project it is often talked about as though it is some external
> component.
>
> Technically sqlite is not thread safe.  Just because the library has
> explicitly disallowed using the same sqlite3 * structure in multiple
> threads on some platforms (i understand this requirement has been
> relaxed on others) does not make it thread safe.  Even on the
> platforms where a single sqlite3 * structure can be used on multiple
> threads (provided it is not at the same time), it is not possible to
> have a transaction which works across these threads.  So even if the
> connection is thread safe, the transactions are not.
>
> By the usual definition, something which is thread safe can be safely
> used across multiple threads, usually with the aid of synchronisation
> but sometimes not.  For instance collections are often considdered
> thread safe only when they manage their own mutexes internally so that
> the user doesnt have to.  But either way, you can use them accross
> multiple threads.  You cannot do this with sqlite, so it is quite
> confusing to say that sqlite is thread safe...
>
> I think a better definition would be that sqlite can be safely used in
> a multithreaded program, but is not thread safe.
>
> I agree that multithreaded programming can be difficult, but its not
> magic and i think that a few simple rules can overcome most of the
> problems.  It certainly is not luck that multithreaded systems work,
> usually its the result of careful design and hard work.
>
> Emerson
>
> On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote:
> > Emerson Clarke wrote:
> > > If i have a linked list, i can use it across threads if i want to,
> > > provided that i synchronise operations in such a way that the list
> > > does not get corrupted.
> >
> > And of course you also have to know about memory barriers
> and compiler
> > re-ordering.  That is highly dependent on the libraries
> and/or compiler
> > you are using, as well as underlying hardware
> implementation.  Most of
> > the time, developers just get lucky.
> >
> >   http://en.wikipedia.org/wiki/Memory_barrier
> >
> > > Likewise for most other data structures and libraries.
> >
> > Arguably that is by luck and not design!  Look at the
> effort that to go
> > in an add _r suffixed versions of several functions in the standard
> > libraries.  And many GUI libraries have long had a
> restriction that you
> > can only use them in one thread.
> >
> > > Sqlite does not follow these rules, as something created
> in one thread
> > > does not work in another thread regardless of
> synchronisation and it
> > > is out of my control.
> >
> > SQLite's design was not "luck".  The design expects you to
> create unique
> > sqlite3 objects in each thread.  Effort and thought was put
> into that!
> >
> > http://www.sqlite.org/cvstrac/wiki?p=MultiThreading
> >
> > It was loosened a bit in 3.3.x:
> >
> >   http://www.sqlite.org/faq.html#q8
> >
> > What isn't allowed is multiple statements executing at the
> same time in
> > multiple threads against the same sqlite3* db object.  In order to
> > support that, SQLite would have to have extensive code
> protecting the
> > various internal data structures as well as ensuring concurrency.
> >
> > > This is not a situation that i would expect anyone to purposefully
> > > design becuase it makes multithreaded programming difficult,
> >
> > The purposeful design is that you make sqlite3 objects per
> thread.  That
> > way there is absolutely no danger of corruption or other bad issues.
> >
> > Roger
> >
> >
> >
> --
> ---
> > To unsubscribe, send email to [EMAIL PROTECTED]
> >
> --
> ---
> >
> >
>
> --
> ---
> To unsubscribe, send email to [EMAIL PROTECTED]
> --
> ---
>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

Richard,

Are you sure we are not just getting into semantic knots here ?

Do we have the same definition of "at the same time".  I mean
concurrently, so that both threads use the same sqlite3 * structure,
within mutexes. Each query is allowed to complete before the other one
starts, but each thread may have multiple statements or result sets
open.

When i try to do this, i get api called out of sequence errors...

On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> But why then can i not have a single transaction wrapping a single
> connection which is used within multiple threads, obvioulsy not at the
> same time.

You can.  What makes you think you can't?
--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> But why then can i not have a single transaction wrapping a single
> connection which is used within multiple threads, obvioulsy not at the
> same time. 

You can.  What makes you think you can't?
--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

Richard,

Ok, im pretty clear on the file locking being the cause of the
problems with the sqlite3 * structures, but thanks for confirming it.
I understand that on platforms that dont have this issue its not a
problem.

But why then can i not have a single transaction wrapping a single
connection which is used within multiple threads, obvioulsy not at the
same time.  Its rare that anything can be used at the same time by
multiple threads, hence the need for atomic operations (which *nix
doesnt have great standard support for).

To clarify, i have never been talking about using something
simultaneously, only within the confines of synchronisation.  So when
i talk about using something in multiple threads, i mean in a
syhcnronised way, but still it would be the same exact piece of
memory.

On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> Richard,
>
> My complaint, if you want to call it that.  Was simply that there are
> seemingly artificial constraints on what you can and cant do accross
> threads.
>
> If i have a linked list, i can use it across threads if i want to,
> provided that i synchronise operations in such a way that the list
> does not get corrupted.
>
> Likewise for most other data structures and libraries.  The default
> behaviour is that if i create an object on the stack or the heap, i
> can use it between multiple threads provided that i synchronise access
> to that object.
>
> Sqlite does not follow these rules, as something created in one thread
> does not work in another thread regardless of synchronisation and it
> is out of my control.
>

I think it was already explained to you that the reason for this
behavior is to work around bugs in file locking in older versions
of Linux.  If you are using a newer version of Linux or OS-X or
windows, those constraints do not apply to you. You can move
database connections across threads freely.

What you cannot do is to use the same database connection in two
or more threads at the same time.  This is the usual case with most
libraries, threadsafe or not - you can use separate instances of
an object in different threads at the same time, but you cannot use
the same object simultaneously in multiple threads.
--
D. Richard Hipp  <[EMAIL PROTECTED]>

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

"Emerson Clarke" <[EMAIL PROTECTED]> wrote:
> Even on the
> platforms where a single sqlite3 * structure can be used on multiple
> threads (provided it is not at the same time), it is not possible to
> have a transaction which works across these threads.  

I beg to differ.  What makes you think this does not work?
--
D. Richard Hipp  <[EMAIL PROTECTED]>


-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading

Roger,

I think sqlite suffers somewhat from a bit of an identity crisis.
Whilst it is both a library and a piece of code which you embed in a
project it is often talked about as though it is some external
component.

Technically sqlite is not thread safe.  Just because the library has
explicitly disallowed using the same sqlite3 * structure in multiple
threads on some platforms (i understand this requirement has been
relaxed on others) does not make it thread safe.  Even on the
platforms where a single sqlite3 * structure can be used on multiple
threads (provided it is not at the same time), it is not possible to
have a transaction which works across these threads.  So even if the
connection is thread safe, the transactions are not.

By the usual definition, something which is thread safe can be safely
used across multiple threads, usually with the aid of synchronisation
but sometimes not.  For instance collections are often considdered
thread safe only when they manage their own mutexes internally so that
the user doesnt have to.  But either way, you can use them accross
multiple threads.  You cannot do this with sqlite, so it is quite
confusing to say that sqlite is thread safe...

I think a better definition would be that sqlite can be safely used in
a multithreaded program, but is not thread safe.

I agree that multithreaded programming can be difficult, but its not
magic and i think that a few simple rules can overcome most of the
problems.  It certainly is not luck that multithreaded systems work,
usually its the result of careful design and hard work.

Emerson

On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote:

Emerson Clarke wrote:
> If i have a linked list, i can use it across threads if i want to,
> provided that i synchronise operations in such a way that the list
> does not get corrupted.

And of course you also have to know about memory barriers and compiler
re-ordering.  That is highly dependent on the libraries and/or compiler
you are using, as well as underlying hardware implementation.  Most of
the time, developers just get lucky.

  http://en.wikipedia.org/wiki/Memory_barrier

> Likewise for most other data structures and libraries.

Arguably that is by luck and not design!  Look at the effort that to go
in an add _r suffixed versions of several functions in the standard
libraries.  And many GUI libraries have long had a restriction that you
can only use them in one thread.

> Sqlite does not follow these rules, as something created in one thread
> does not work in another thread regardless of synchronisation and it
> is out of my control.

SQLite's design was not "luck".  The design expects you to create unique
sqlite3 objects in each thread.  Effort and thought was put into that!

http://www.sqlite.org/cvstrac/wiki?p=MultiThreading

It was loosened a bit in 3.3.x:

  http://www.sqlite.org/faq.html#q8

What isn't allowed is multiple statements executing at the same time in
multiple threads against the same sqlite3* db object.  In order to
support that, SQLite would have to have extensive code protecting the
various internal data structures as well as ensuring concurrency.

> This is not a situation that i would expect anyone to purposefully
> design becuase it makes multithreaded programming difficult,

The purposeful design is that you make sqlite3 objects per thread.  That
way there is absolutely no danger of corruption or other bad issues.

Roger

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-

Re: [sqlite] sqlite performance, locking & threading