Re: [sqlite] VC7 VC8 Linker Errors

2007-05-21 Thread Emerson Clarke

Im building as a static library with /MT (multithreaded)

I just thought it was strange that none of the other libraries i build
have similar problems and was wondering if anyone knew the specific
cause ?

On 5/21/07, Teg [EMAIL PROTECTED] wrote:

Hello Emerson,

How are you compiling it? /MT /MTd or /MD (static multi,
static-multi-debug,DLL). What MFC lib are you telling it to use if
any? It looks like standard runtime library difference between the
libs. I ran into similar problems with all my libs between VC6 and
VS2005. I just have SQlite defined in my project and it gets rebuilt
every time I build my project.  I'm switching between 32 and 64 but
build so I do many Build-All's

If you use the DLL version, this probably goes away.

C

Sunday, May 20, 2007, 6:33:48 PM, you wrote:

EC Ive been building some code on VC7 and VC8 under Windows Vista and
EC noticed that SQLite is not compatible with switching between the two
EC SDK's.

EC I use many libraries (mysql, openssl, zlib) and all of the others
EC which are built with VC7 work fine with VC8 and the Vista SDK, but
EC SQLite gets the following error.

EC sqlite.lib(sqlite3.obj) : error LNK2001: unresolved external symbol __iob

EC It can be resolved by recompiling SQLite with VC8 and the Vista SDK,
EC but then when i switch back to VC7 i get another error.

EC sqlite.lib(sqlite3.obj) : error LNK2019: unresolved external symbol
EC __ftol2_sse referenced in function _computeJD

EC Does anyone know whats causing this and if there is an easy work
EC around ?  Its tedious having to switch .lib files (SQLite is the only
EC library ive had to do this with).

EC Emerson

EC 
-
EC To unsubscribe, send email to [EMAIL PROTECTED]
EC 
-




--
Best regards,
 Tegmailto:[EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] VC7 VC8 Linker Errors

2007-05-20 Thread Emerson Clarke

Ive been building some code on VC7 and VC8 under Windows Vista and
noticed that SQLite is not compatible with switching between the two
SDK's.

I use many libraries (mysql, openssl, zlib) and all of the others
which are built with VC7 work fine with VC8 and the Vista SDK, but
SQLite gets the following error.

sqlite.lib(sqlite3.obj) : error LNK2001: unresolved external symbol __iob

It can be resolved by recompiling SQLite with VC8 and the Vista SDK,
but then when i switch back to VC7 i get another error.

sqlite.lib(sqlite3.obj) : error LNK2019: unresolved external symbol
__ftol2_sse referenced in function _computeJD

Does anyone know whats causing this and if there is an easy work
around ?  Its tedious having to switch .lib files (SQLite is the only
library ive had to do this with).

Emerson

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] sqlite3_step error code confusion

2007-05-09 Thread Emerson Clarke

Ive been getting SQLITE_MISUSE errors after calling sqlite3_step one
more time than required.  Im using sqlite3_prepare_v2 so im expecting
detailed error messages, but when i call sqlite3_errmsg all i seem to
get is not an error.

So my question is this.  What is an error ?  It seems a little
confusing becuase there is a SQLITE_ERROR code, and also a number of
other detailed codes which i assume represent forms of errors.

The old version of sqlite3_step used to return SQLITE_ERROR, and then
you would call sqlite3_reset to get the detailed error code.  This
implies that the other codes are in fact errors.  Am i correct in
assuming that SQLITE_MISUSE is a form of error ?  Is there some subset
of the error codes which work with sqlite3_errmsg ?

I would have expected the error message that i got for SQLITE_MISUSE
to be library used incorectly or some string other than not an
error.

Emerson

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] 回复: [sqlite] C++ Framework

2007-03-09 Thread Emerson Clarke

No, nothing like ACE.

More like .Net or Java.

ACE is a little heavier and more narrowly focused on middleware and
messaging.  It has none of the breadth, consistency, or generic
usefulness that Reason does.

Reason is all about letting you write platform agnostic code in C++
without ever having to worry about how to download something from a
web server, write to a file, compress a log file, tell the time,
process xml, create threads, access a database, sort something in a
collection etc...

Reason is a one stop C++ software development library. I like to think
of it as what the C++ language has been missing all these years.

Emerson

On 3/9/07, Linker M Lin [EMAIL PROTECTED] wrote:

It's something like ACE Framework?

--
Linker M Lin
[EMAIL PROTECTED]



-
To unsubscribe, send email to [EMAIL PROTECTED]
-












___
Mp3疯狂搜-新歌热歌高速下
http://music.yahoo.com.cn/?source=mail_mailbox_footer


Re: [sqlite] sqlite performance, locking threading

2007-01-06 Thread Emerson Clarke

For anyone who is interested i have created a standalone test case
which demonstrates the threading behaviour that i had, or as close as
i can get it.  Feel free to use the code for whatever purposes you see
fit.

It will compile on linux and windows, and comes with 4 versions of the
sqlite libraries which can be found under test/lib/sqlite/lib.  Each
platform has two versions, one built with THREADSAFE=0, and the other
with THREADSAFE=1.

http://ul1.rapidshare.com/files/10511098/test.zip

It seems that i was incorrect in thinking the cause was
sqlite3_step(). In replicating the problem from scratch i found that
it was actually caused by sqlite3_finalize() being called without
synchronisation.

What does this mean, well firstly it means that the entire problem was
avoidable had i simply synchronised correctly.  But i think more
worryingly, it highlights the fact that there is a great deal of
confusion as to what it means for something to be thread safe on this
mailing list.

This confusion is also not aided in any way by the obtuse and stubborn
attitudes that some people have demonstrated.  If a lot of people are
having regular problems with sqlite and threading as has been
indicated then perhaps its time to review the situation.

It is incredibly frustrating that the documentation and the various
answers that i have recieved on this topic have been contradictory.
At no time has anyone actually stood up and said that the software can
safely be used in a multithreaded environment as long as access to the
api's is synchronised.

Instead you have a situation where the software has inbuilt misuse
detection which may or may not alert the programmer to a potential
error and a THREADSAFE macro which makes 5% of the it threadsafe but
leaves the other 95% exposed.  (By way of example using the code i
supplied and the safe versions of the libraries it is possible to get
away with calling sqlite3_finalize() outside the mutex, but if you
take it one step further and use sqlite3_step() outside the mutex it
will fail)

So an inconsistent and unpredictable internal approach to thread
safety combined with a policy of righteousness on this mailing list
means that people like me have little chance of ever getting it right.

Why is it so hard to simply nail down the characteristics of the
software and update the documentation so that everyone is on the same
page ?

If sqlite is thread safe, and by that i mean it can be used safely
with synchronisation, then why not say so.  And be consistent in
saying so.  That way when someone comes here with a problem you can
confidently describe how the api can be used and the user can go away
and attempt to resolve their problem with confidence.

I guess in part i owe people an appology, but then, had i been given a
rational explanation or had access to clear documentation i guess this
would not have gone on so long.  And to be quite honest, my confidence
in the software has now been eroded to such an extent that i dont know
what is up or down.

Please considder what im saying rationally and dont take offence.  I
know a lot of you are frustrated at this debate, and so you should be.
I am too. So why not take steps to ensure that it doesnt happen
again.  Be clear about the threading behaviour and remove the
inconsistencies.

Fix the out of date documentation, and give a consistent answer,
sqlite can be used with multiple threads provided that access to the
api is synchronised.

The saftey checks and macro options do more harm than good becuase
they create confusion.  Yes if you do X,Y, and Z you might get lucky,
but thats probably why so many people have trouble.  Its not
consistent and its complicated.  Unless your going to document the
exact sequence of calls which is safe to make without synchronisation,
you shouldnt be encouraging it.

Advising people to create complicated designs where each thread has
its own connection doesnt help either, because it will not work with
transactions.  It should be either 100% safe, or not at all.

Perhaps it would also be useful to document the various strategies and
ways in which sqlite can be used with safety.  Stop telling people
that mutlithreading results in random unpredictable behavior. Be open
minded and don't hide behind whatever doctrine has been followed up
till now.

Emerson

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking threading

2007-01-05 Thread Emerson Clarke

John,

Um, alright then...

But i think your preaching to the converted, simplifying things is
what i always try to do.  And not just when theres a problem

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexe's.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the propper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton [EMAIL PROTECTED] wrote:

Work on turning reasonable into adequate or good and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - Make it as simple a possible, but no simpler.

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2007-01-05 Thread Emerson Clarke

Richard,

I have to admit i am a little dissapointed.  As the primary author of
the software i would have thought that you would have a good
understanding of what the thread safety characteristics of your own
api were.

Suggesting that suppressing the safety checks will result in random
and non reproducable failures seems a little extreme, if not
superstitious.

As i was discussing with Nicholas, api's are not by default thread
unsafe.  In fact it is the other way around, most api's should be
perfectly thread safe if access to them is synchronised.

It is my undestanding that sqlite3_stmt * represents an instance of
the VM, and that most of the internal virtual machine operations are
reentrant, simply operating on the supplied VM and modifying its
internal state.

If this is the case, i cant see how you would ever have thread safety
problems so long as no two threads are modifying the same sqlite3_stmt
* at the same time.

Surely this is something that can be reasoned through ?

Adding deliberate safety check imbalances to sqlite3_step becuase it
is the only routine that can be called mutliple times in sequence is a
bit extreme.  Why not allow users who don't believe in withces and
goblins go ahead and use the api in a synchronised way.  The safety
checks will still be there, just not as agressively.

You should also considder that the safety checks only provide a
warning for the user.  Given that they are subject to race conditions,
you can never guarantee that the safety checks will trap all errors.
In fact i would argue that you are better off throwing an int 3
instruction than returning a misuse error.  Giving the programmer the
false idea that they can continue to run their program without stack
corruption is not a good idea, after all its just encouraging them to
try again.  Rather, according to what you are saying they should be
re-writing their code to access the api with a single thread.

So on the one hand you are actively disallowing users like myself from
synchronising thread access to the api, but on the other hand
encouraging other users to not fix the actual problems.  It doesnt
make sense...

Im sorry for being so harsh, and i know im not winning any friends
here, but i dont give up easily.  I think sqlite is an excellent piece
of software and i believe there is a way which it can be made to
accomodate these concerns.

I have tested the changes i made with some 50 threads performing
probably thousands of statements a second, and had no errors.

If you are willing to work through the problem, i am willing to assist
you in any way that i can.  Wether it be writing test cases or making
modifications to the source.

If there are problems as you say, then there is no reason why they
cant alteast be documented and verified.

Emerson



On 1/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Emerson Clarke [EMAIL PROTECTED] wrote:

 The problem i had was with sqlite not being compatible with the simple
 design that i wanted.  I did try several alternate designs, but only
 as a way of working around the problem i had with sqlite.  It took a
 long time but eventually i managed to get someone to explain why
 sqlite had that particular problem, and i was able to modify the
 sqlite source to resolve the issue.


I remain unconvinced that there is any problem with SQLite in
regard to thread safety.  It is my belief that by disabling the
safety check routines, you are opening your self up to lots of
problems.  Those routines are there to protect you, the programmer,
and to make your programming errors obvious.  By disabling those
checks, you have not fixed the problem.  You have merely
suppressed the symptoms so that you will get rare, random
failures that cannot be easily reproduced.  On your own head
be it.

--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2007-01-05 Thread Emerson Clarke

Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i havnt heard anything
other than the usual thats the way its always been responses, or the
its designed that way beacuse its easier argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

Theres not much point forking any maintaining the code if on one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Emerson Clarke wrote:
 I have to admit i am a little dissapointed.  As the primary author of
 the software i would have thought that you would have a good
 understanding of what the thread safety characteristics of your own
 api were.

He does!  It is well documented and tested.  *You* are the one who wants
to do various things differently.

 Suggesting that suppressing the safety checks will result in random
 and non reproducable failures seems a little extreme, if not
 superstitious.

Err, you are using threading.  Explain exactly how if you got a failure
due to threading you could reproduce it exactly.  You do know that you
can't run x86 processors in lockstep because at the cycle level they
have effectively random behaviour due to cache eviction and replacement
policies.  That one reason for the Itanium if you need two processors to
have exactly the same behaviour on the same code at the cycle level.

 As i was discussing with Nicholas, api's are not by default thread
 unsafe.  In fact it is the other way around, most api's should be
 perfectly thread safe if access to them is synchronised.

This all comes down to semantics.  For trivial libraries, sure you can
just wrap a mutex around it all.  For non-trivial libraries, best
practise is for the author to design and implement for whatever thread
usage they believe is best suited for the library and the platforms it
operates one.

 Im sorry for being so harsh, and i know im not winning any friends
 here,

So far noone has agreed with you :-)

 I have tested the changes i made with some 50 threads performing
 probably thousands of statements a second, and had no errors.

What does that prove?  You cannot prove threading by running stuff and
saying see it didn't crash.  The only way to prove threading is by
reasoning about the design and verifying the implementation matches the
design. drh already did that years ago.

 If you are willing to work through the problem, i am willing to assist
 you in any way that i can.  Wether it be writing test cases or making
 modifications to the source.

You can always maintain your own forked version of SQLite and link to it
from the wiki.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFnnDQmOOfHg372QQRAtz1AJwIzJbrlnH0xImPSJYzvutAtMomVACfSltc
KaSDdgBuwW4ITN1UWms5DMI=
=EqF9
-END PGP SIGNATURE-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threadin

2007-01-05 Thread Emerson Clarke

Ken,

Thanks for you comments.  I have coded and tested a module just like
test_server.c and by disabling the safety checks i have also been able
to code and test an example which uses a single connection, single
transaction, single table with up to 50 threads doing
insert/update/delete with no errors.

Granted it was synchronised, so all execution of sqlite3_ api routines
was done serially, but in my experience thats quite a normal
requirement for multithreading.

Its hard to make something which doesnt require synchronisation,
especially when internal state and files are concerned.  But thats not
a problem, because if the synchronisation is granular enough it will
only ever block as frequently as it would normallly crash.  So in
other words, blocking becomes a race condition, and performance is not
hindered very much at all.

If you would like i can code up a test case which demonstrates this
and send it too you, but i think i should stop posting to this topic.
Some people are getting a bit too defensive and annoyed...

I am grateful for all the time people have given me, and for the
software itself.  At the end of the day i only brought this up because
i wanted to help improve it.

Emerson

On 1/5/07, Ken [EMAIL PROTECTED] wrote:

Emerson,

 I agree with you somewhat. Not 100% convinced but, I like you am a little dissapointed how sqlite handles  
threadsafe and multiple connections. Even in the test_server.c module is not 
concurrent As it serializes all processing to a single thread, this is not concurrent processing.

 Now if i could take that server thread and create a pool of workers and hand 
off the working set request to one of those, then this would be concurrent 
since each thread would have the capability to run independently of the other 
and at the same time on an SMP hardware system such as Solaris,Hpux etc... But 
this is specifically disallowed since the connection when using the shared 
cache must be used by the creating thread.

 So far I've not been able to successfully find any threadsafe examples of 
sqlite that actually work in a concurrent manner.  Even the threadtest2.c 
crased (bug fix coming in 3.3.10)  but even it does not start up a transaction 
and run multiple inserts/step commands.

 I would like to see a working example of how sqlite can be concurrently 
accessed via multiple threads performingWrite (insert/update/delete) and by 
other threads performing selects against a single table.

 IMHO, the sqlite3_step function if it can only be executed serially (ie must be wrapped 
in a mutex) should do this mutexing internally and not be exposed and left in the hands 
of the user. If one compiles the code with --enable-threadsafe, shouldn't the API be able 
to handle areas such as these internally and not generate misuse errors.



Emerson Clarke [EMAIL PROTECTED] wrote: Roger,

Of course you can test threading behaviour, yes its not exactly
repeatable but under most circumstances and with enough test cases you
can catch the problems.

I don't think sqlite is such a large and complicated piece of software
that it would be impossible to reproduce such errors.

Everyone keeps saying its well documented and referring to the fact
that the reasoning has already been done, but i havnt heard anything
other than the usual thats the way its always been responses, or the
its designed that way beacuse its easier argument.

That does not count as documentation or reasoning.  If anything the
source code is the documentation, but thats not quite the same as an
essay on the thread safety of the sqlite vm is it ?

Anyway, i can see im flogging a dead horse here. This is as futile as
arguing religion with a priest. :P

Theres not much point forking any maintaining the code if on one else
sees any validity in my arguments.

Thanks to everyone anyway,

Emerson

On 1/5/07, Roger Binns  wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Emerson Clarke wrote:
  I have to admit i am a little dissapointed.  As the primary author of
  the software i would have thought that you would have a good
  understanding of what the thread safety characteristics of your own
  api were.

 He does!  It is well documented and tested.  *You* are the one who wants
 to do various things differently.

  Suggesting that suppressing the safety checks will result in random
  and non reproducable failures seems a little extreme, if not
  superstitious.

 Err, you are using threading.  Explain exactly how if you got a failure
 due to threading you could reproduce it exactly.  You do know that you
 can't run x86 processors in lockstep because at the cycle level they
 have effectively random behaviour due to cache eviction and replacement
 policies.  That one reason for the Itanium if you need two processors to
 have exactly the same behaviour on the same code at the cycle level.

  As i was discussing with Nicholas, api's are not by default thread
  unsafe.  In fact it is the other way around, most api's should

Re: [sqlite] sqlite performance, locking threading

2007-01-04 Thread Emerson Clarke

Bill,

Thanks for the description, thats pretty much how i designed the
index, but with a few modifications.  The filesystem becomes the tree
structure which is indexed by a hash of the original document url. It
works like a big hashtable so its quite scalable.





Sorry if this has been posited before, but our solution to this was data
normalisation. IE, we store only the filename, and an integer ID which
is the directory of the file, and of course a lookup table of id's to
directory names. This helped us greatly, as well as reducing overhead of
amount of data stored.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking threading

2007-01-04 Thread Emerson Clarke

John,

I have a reasonable understanding of the PC architecture, and more
appropriately the architecture which the operating system presents to
software.  The PC may be a serial device, but a modern operating
system with its multitasking shcheduler attempts to emulate a non
serial environment.  It devotes a certain amount of time to each
thread and then moves on.

Wether or not you are working on a highly pyshically parallel
architecture or not makes no difference, the design principles are the
same.  You should still build your software so that it is capable of
taking advantage of the environment that the operating system
presents.  It is the operating system you should be designing for, not
the hardware...

As it happens, the newest generation of PC's are all multi-core, and i
have been working on multi processor environments for many years.

Emerson

On 1/4/07, John Stanton [EMAIL PROTECTED] wrote:

If Emeroson intuitively understood the essential architecture of the PC
he is using he would not be having difficulty with his concept of how to
use it.  It is essentially a serial device, multi-tasking device and
parallelism in the forms of threading and multi processing is a
sophistication added with a high overhead.

I recollect an insightful CS professor impressing on his his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parrallel processing
would be very different.

Michael Ruck wrote:
 Hi Emerson,

 I just hope you don't reinvent the wheel ;) I haven't yet had the need to
 index things the way you describe it. May be I should take that as one of my
 next pet projects to get a handle on this type of task.

 The problem as I see it is basically, that any way you design this: If the
 storage tasks take 90% of your indexing time, then any parallelization may
 be a waste of effort. Even if you use a synchronization object you're
 essentially serializing things in a (complicated) multithreaded way...

 As far as static initialization: That it occurs before main() and is out of
 your control was the point I was getting across. That's why I wrote that
 this type of initialization should be avoided, unless there's no better
 design for it.

 Michael

 -Ursprüngliche Nachricht-
 Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
 Gesendet: Mittwoch, 3. Januar 2007 20:31
 An: sqlite-users@sqlite.org
 Betreff: Re: [sqlite] sqlite performance, locking  threading

 Michael,

 Thanks for the advice.  During the indexing process i need to select and
 optionally insert records into a table so i cant ignore the outcomes.

 Basically the indexing process does compression, so for each document it
 inserts words into a table and looks up keys.  Every word in the document
 gets swapped with a key, and new keys are inserted as needed.

 There are some problems with splitting the work up in a different way as you
 suggested. I would either end up with a lot of queues or i would have to
 stagger the work so that the entire data set gets processed in stages which
 doesnt scale very well and isnt particularly fault tollerant.  When building
 an index, you want the structure to be built up progressively, so that you
 can pause the process and resume it later on whilst still having useful
 results.

 I would be worried that in a queued design, the overhead and bottlenecks
 caused by the buffering, message passing, and context switching would reduce
 the performance to that of a single thread.
 Especially since the database operations represent 90% of the work, all you
 would really be doing is attempting to serialise things in a multithreaded
 way.

 Im sure having worked on multithreaded systems you appreciate that sometimes
 simple designs are better, and i think i have a pretty good handle on what
 it is that im trying to do.

 You never have control over static initialisation, it happens before main().
 If i was writing very specific code to suit just this situation then maybe
 as you say i wouldnt need to worry about it.  But im also writing a database
 api, and that api is used for many different things.  My considderations are
 not just for this one problem, but also for the best general way to code the
 api so that it is safe and efficient in all circumstances.  So far the
 client/server design is the only way i can achieve true thread safety.

 If i could work out why sqlite3_step() causes problems across multiple
 threads i could probably make things a little faster and i could do away
 with the need for a client/server design.

 Emerson


 On 1/3/07, Michael Ruck [EMAIL PROTECTED] wrote:

Emerson,

Now I understand your current implementation.  You seemingly only
partially split up the work in your code. I'd schedule the database
operation and not wait on the outcome, but start on the next task.
When the database finishes and has retrieved its result, schedule some

Re: [sqlite] sqlite performance, locking threading

2007-01-03 Thread Emerson Clarke

Michael,

Im not sure that atomic operations would be a suitable alternative.
The reason why im using events/conditions is so that the client thread
blocks until the server thread has processed the query and returned
the result.  If i did not need the result then a simple queueing
system with atomic operations or critical sections would be fine i
guess.

The client thread must always block or spin until the server thread
has completed the query.  Critical sections cant be efficiently used
to notify other threads of status change.  I did try using critical
sections in this way, by spinning until the server thread takes a
lock, then blocking and eventually waiting for the server thread to
finish.  But since there is no way to block the server thread when
there is no work to do both the client and server thread must sleep
which induces context switching anyway.

If you used atomic operations, how would you get the client thread to
block and the server thread to block when it is not processing ?

Events/conditions seemed to be the best solution, the server thread
never runs when it doesnt need to and always wakes up when there is
processing to be done.

The static initialisation problem occurs becuase the server thread
must be running before anything which needs to use it.  If you have a
static instance of a class which accesses a database and it is
initalised before the static instance which controls the server
thread, you have a problem.  It can be overcome using the initialise
on first use idiom, as long as your careful to protect the
initalisation with atomic operations, but its still a bit complicated.

Emerson


On 1/3/07, Michael Ruck [EMAIL PROTECTED] wrote:

Hi Emerson,

Another remark: On Windows using Events synchronization objects involves
additional kernel context switches and thus slows you down more than
necessary. I'd suggest using a queue, which makes use of the InterlockedXXX
operations (I've implemented a number of those, including priority based
ones - so this is possible without taking a single lock.) or to use critical
sections - those only take the kernel context switch if there really is lock
contention. If you can reduce the kernel context switches, you're
performance will likely increase drastically.

I also don't see the static initialization problem: The queue has to be
available before any thread is started. No thread has ownership of the
queue, except may be the main thread.

Michael


-Ursprüngliche Nachricht-
Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
Gesendet: Mittwoch, 3. Januar 2007 00:57
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] sqlite performance, locking  threading

Nico,

I have implemented all three strategies (thead specific connections, single
connection multiple threads, and single thread server with multiple client
threads).

The problem with using thread specific contexts is that you cant have a
single global transaction which wraps all of those contexts.  So you end up
having to use fine grained transactions, which decreases performance.

The single connection multiple thread alternative apparently has problems
with sqlite3_step being active on more than one thread at the same moment,
so cannot easily be used in a safe way.  But it is by far the fastest and
simplest alternative.

The single thread server solution involves message passing between threads,
and even when this is done optimally with condition variables (or events on
windows) and blocking ive found that it results in a high number of context
switches and decreased performance.  It does however make a robust basis for
a wrapper api, since it guarantees that things will always be synchronised.
But using this arrangement can also result in various static initialisation
problems, since the single thread server must always be up and running
before anything which needs to use it.

Emerson

On 1/2/07, Nicolas Williams [EMAIL PROTECTED] wrote:
 On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote:
  Technically sqlite is not thread safe.  [...]

 Solaris man pages describe APIs with requirements like SQLite's as
 MT-Safe with exceptions and the exceptions are listed in the man page.

 That's still MT-Safe, but the caller has to play by certain rules.

 Anyways, this is silly.  SQLite API is MT-Safe with one exception and
 that exception is rather ordinary, common to other APIs like it that
 have a context object of some sort (e.g., the MIT krb5 API), and not
 really a burden to the caller.  In exchange for this exception you get
 an implementation of the API that is lighter weight and easier to
 maintain than it would have been without that exception; a good
 trade-off IMO.

 Coping with this exception is easy.  For example, if you have a server
 app with multiple worker threads each of which needs a db context then
 you could use a thread-specific key to track a per-thread db context;
 use pthread_key_create(3C) to create the key, pthread_setspecific(3C)
 once per-thread

Re: [sqlite] sqlite performance, locking threading

2007-01-03 Thread Emerson Clarke

Michael,

Thanks for the advice.  During the indexing process i need to select
and optionally insert records into a table so i cant ignore the
outcomes.

Basically the indexing process does compression, so for each document
it inserts words into a table and looks up keys.  Every word in the
document gets swapped with a key, and new keys are inserted as needed.

There are some problems with splitting the work up in a different way
as you suggested. I would either end up with a lot of queues or i
would have to stagger the work so that the entire data set gets
processed in stages which doesnt scale very well and isnt particularly
fault tollerant.  When building an index, you want the structure to be
built up progressively, so that you can pause the process and resume
it later on whilst still having useful results.

I would be worried that in a queued design, the overhead and
bottlenecks caused by the buffering, message passing, and context
switching would reduce the performance to that of a single thread.
Especially since the database operations represent 90% of the work,
all you would really be doing is attempting to serialise things in a
multithreaded way.

Im sure having worked on multithreaded systems you appreciate that
sometimes simple designs are better, and i think i have a pretty good
handle on what it is that im trying to do.

You never have control over static initialisation, it happens before
main().  If i was writing very specific code to suit just this
situation then maybe as you say i wouldnt need to worry about it.  But
im also writing a database api, and that api is used for many
different things.  My considderations are not just for this one
problem, but also for the best general way to code the api so that it
is safe and efficient in all circumstances.  So far the client/server
design is the only way i can achieve true thread safety.

If i could work out why sqlite3_step() causes problems across multiple
threads i could probably make things a little faster and i could do
away with the need for a client/server design.

Emerson


On 1/3/07, Michael Ruck [EMAIL PROTECTED] wrote:

Emerson,

Now I understand your current implementation.  You seemingly only partially
split up the work in your code. I'd schedule the database operation and not
wait on the outcome, but start on the next task. When the database finishes
and has retrieved its result, schedule some work package on a third thread,
which only processes the results etc. Split up the work in to repetitive,
non blocking tasks. Use multiple queues and dedicated threads for parts of
the operation or thread pools, which process queues in parallel if possible.
From what I can tell you're already half way there.

I still don't see your static initialization problem, but that's another
story. Actually I'd avoid using static initialization or static (singleton)
instances, unless the design really requires it. Someone must control
startup of the entire process, have that one (probably main/WinMain) take
care that the work queues are available. Afterwards the order of thread
starts doesn't matter... Actually it is non-deterministic anyway (unless you
serialize this yourself.)

Michael

-Ursprüngliche Nachricht-
Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
Gesendet: Mittwoch, 3. Januar 2007 15:14
An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] sqlite performance, locking  threading

Michael,

Im not sure that atomic operations would be a suitable alternative.
The reason why im using events/conditions is so that the client thread
blocks until the server thread has processed the query and returned the
result.  If i did not need the result then a simple queueing system with
atomic operations or critical sections would be fine i guess.

The client thread must always block or spin until the server thread has
completed the query.  Critical sections cant be efficiently used to notify
other threads of status change.  I did try using critical sections in this
way, by spinning until the server thread takes a lock, then blocking and
eventually waiting for the server thread to finish.  But since there is no
way to block the server thread when there is no work to do both the client
and server thread must sleep which induces context switching anyway.

If you used atomic operations, how would you get the client thread to block
and the server thread to block when it is not processing ?

Events/conditions seemed to be the best solution, the server thread never
runs when it doesnt need to and always wakes up when there is processing to
be done.

The static initialisation problem occurs becuase the server thread must be
running before anything which needs to use it.  If you have a static
instance of a class which accesses a database and it is initalised before
the static instance which controls the server thread, you have a problem.
It can be overcome using the initialise on first use idiom, as long as your
careful to protect the initalisation with atomic

Re: [sqlite] sqlite performance, locking threading

2007-01-03 Thread Emerson Clarke

Nicholas,

My oppologies, your right that explanation had been given.

But i didnt actually take it seriously, i guess i found it hard to
believe that it being the easier option was the only reason why this
limitation was in place.

If this is the case, then surely the fix is simple.  Given that i
assume it is safe to have multiple sqlite3_step() calls active on a
single connection on a single thread.  And given what you have said
about sqlite not already checking data structures that would be shared
by multiple threads, then surely all that needs to happen is for the
misuse detection to be removed.

Since there is usually nothing which needs to be done to specifically
make any api thread safe other than synchronising access too it.  If
by synchronising access to the api calls i can ensure that no two
threads use any data structure at the same time, everything should
work fine right ?

This gets to the very core of the original issue i had, and why i said
sqlite was actively thread unsafe.  Because it terminates itself
with misuse errors even when in theory it is perfectly safe to be used
with multiple threads provided the user synchronises on every api
call, which is easily achieved.

Am i making any sense ?

Emerson


On 1/3/07, Nicolas Williams [EMAIL PROTECTED] wrote:

On Wed, Jan 03, 2007 at 11:22:36PM +, Emerson Clarke wrote:
 Ok,

 Well can you do me a favour and forward me the email where this was
 supposedly explained in all its technical glory.

Technical glory?  No.  At a high level it's this: that it is easier to
make an API like SQLite's thread-safe with exceptions like only one
thread active in any given context object at any time than it is to
put mutexes and what not around data structures that would be shared by
multiple threads if this exception were not stated.  Unless and until
you try to do it the other way you'll find this rationale to be
subjective.

Like I said, good luck.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2007-01-02 Thread Emerson Clarke

Nico,

I have implemented all three strategies (thead specific connections,
single connection multiple threads, and single thread server with
multiple client threads).

The problem with using thread specific contexts is that you cant have
a single global transaction which wraps all of those contexts.  So you
end up having to use fine grained transactions, which decreases
performance.

The single connection multiple thread alternative apparently has
problems with sqlite3_step being active on more than one thread at the
same moment, so cannot easily be used in a safe way.  But it is by far
the fastest and simplest alternative.

The single thread server solution involves message passing between
threads, and even when this is done optimally with condition variables
(or events on windows) and blocking ive found that it results in a
high number of context switches and decreased performance.  It does
however make a robust basis for a wrapper api, since it guarantees
that things will always be synchronised.  But using this arrangement
can also result in various static initialisation problems, since the
single thread server must always be up and running before anything
which needs to use it.

Emerson

On 1/2/07, Nicolas Williams [EMAIL PROTECTED] wrote:

On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote:
 Technically sqlite is not thread safe.  [...]

Solaris man pages describe APIs with requirements like SQLite's as
MT-Safe with exceptions and the exceptions are listed in the man page.

That's still MT-Safe, but the caller has to play by certain rules.

Anyways, this is silly.  SQLite API is MT-Safe with one exception and
that exception is rather ordinary, common to other APIs like it that
have a context object of some sort (e.g., the MIT krb5 API), and not
really a burden to the caller.  In exchange for this exception you get
an implementation of the API that is lighter weight and easier to
maintain than it would have been without that exception; a good
trade-off IMO.

Coping with this exception is easy.  For example, if you have a server
app with multiple worker threads each of which needs a db context then
you could use a thread-specific key to track a per-thread db context;
use pthread_key_create(3C) to create the key, pthread_setspecific(3C)
once per-thread to associate a new db context with the calling thread,
and pthread_getspecific(3C) to get the calling thread's db context when
you need it.  If you have a protocol where you have to step a statement
over multiple message exchanges with a client, and you don't want to
have per-client threads then get a db context per-client/exchange and
store that and a mutext in an object that represents that
client/exchange.  And so on.

Nico
--

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-30 Thread Emerson Clarke

Roger,

My original question was in fact not a statement.  I did not want
sqlite to work differently.  Rather the opposite, sqlite already works
differently to the way i, and probably a lot of users assume that it
would.  So all i wanted to know was why that is the case.

It seemed to me that making a library which only functioned on a per
thread basis was something that you would have to do deliberately and
by design.  But i am still vague as to what the reasons were behind
this design, aside from some mention of file locks being based on
threads not processes.

I am left to assume that all other locking mechanisms like ipc and
files have already been tried and been found wanting.  I also assume
that priority has been given to making sqlite operate across network
boundaries rather than process to process (where the locking
requirements would be quite different)

But thank you for your many suggestions and explanations of the
virtual tables functionality.  It seems quite powerful, and if i didnt
want to try to keep this as database agnostic as possible i would
probably look into that.  I especcially liked the notion that you
could keep the index as files but have it appear as a table since that
is in part what i am doing anyway in a slightly more abstract way.  I
am already using options like synchronous=off.

I think as other users have suggested the two best options for me are
to write custom locking handlers for os.c (although i am still unsure
as to wether this will fix the problem, essentially i want to turn all
file locking off completely since the database will never be accessed
by more than one process) or to simply implement an in process server
which handles all of the sqlite functionality using message passing to
a single thread so that there is only ever one lock and one cache.

The second option is probably the one i will try first, since its more
transparent and for the cost of a single thread lets me emulate the
exact same behaviour that other databases have with regards to
transactions and threads.

Im not familiar with DB/dbm/gdbm, are any of those under a similar
license to sqlite and or as easy to use and simple ? :)

Emerson






On 12/30/06, Roger Binns [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Emerson Clarke wrote:
| I have deliberately tried to avoid giving too much detail on the
| architecture of the index since that was not the point and i didnt
| want to end up debating it.

I don't want to debate your index architecture either :-).  Quite simply
several times a month people post to this ideally wanting SQLite changed
to match how they want to structure things.  People on the list explore
with the poster how the items of data are related, and can suggest an
alternative way of doing things.  Usually the poster finds that simpler
than what they had first thought of and goes away happy.

Is this your question: I want SQLite to work differently than it
currently does so that it matches how I want to do things?

If that is the case, then the answer is you can go ahead and rewrite as
much of SQLite as you want to to do that.  The code is public domain so
there are no legal or technical hindrances standing in your way.  This
thread may as well end at that.

On the other hand, if you do want to work within the constraints of
SQLite then there are quite a few things that can be suggested.  But
that is only possible if more is known about the relationships of the data.

| I did make an attempt to explain that A and B could not be done at the
| same time in a previous message, but perhaps its been lost in the
| conversation.  The process involves several stages some of which are
| database operations and some of which are file operations and that the
| operations are not separable.  They must be done in sequential order.

I was trying to establish what has to be serialized.  In particular the
question was about if A and B had any relationships to each other.  If
they do, then that means they would have to be processed serially and I
don't see the relevance of threading etc.

If they can be processed at the same time, then that means some sort of
partitioning can happen.  In theory breaking the datasets into 10
partitions can give 10 times the performance, but in practise there will
need to be some coordination in order to make it look like there is one
database not multiple pieces.

| The database operations, though very small still consume the most time
| and are the most sensetive to how the synchronisation takes place and
| where the transactions are placed.

Have you considered just using plain DB/dbm/gdbm and then importing the
data on demand into SQLite?

Also a lot of the synchronisation is because SQLite makes damn sure it
doesn't lose your data.  If your documents are permanent (ie you can
access them later if need be), then you can loosen the constraints on
SQLite.  For example you could run with pragma synchronous=off and then
do a checkpoint

Re: [sqlite] sqlite performance, locking threading

2006-12-30 Thread Emerson Clarke

Richard,

My complaint, if you want to call it that.  Was simply that there are
seemingly artificial constraints on what you can and cant do accross
threads.

If i have a linked list, i can use it across threads if i want to,
provided that i synchronise operations in such a way that the list
does not get corrupted.

Likewise for most other data structures and libraries.  The default
behaviour is that if i create an object on the stack or the heap, i
can use it between multiple threads provided that i synchronise access
to that object.

Sqlite does not follow these rules, as something created in one thread
does not work in another thread regardless of synchronisation and it
is out of my control.

This is not a situation that i would expect anyone to purposefully
design becuase it makes multithreaded programming difficult,
particularly when most things obey the first paradigm (allowing things
to be used across threads).

So i really dont understand why sqlite has this thread behaviour, and
i assume that there is a technical limitation rather than design
decision behind it, though no one has as yet confirmed this...

Emerson

On 12/30/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Emerson Clarke [EMAIL PROTECTED] wrote:

 It seemed to me that making a library which only functioned on a per
 thread basis was something that you would have to do deliberately and
 by design.

I'm still trying to understand what your complaint is.
--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-30 Thread Emerson Clarke

Roger,

I think sqlite suffers somewhat from a bit of an identity crisis.
Whilst it is both a library and a piece of code which you embed in a
project it is often talked about as though it is some external
component.

Technically sqlite is not thread safe.  Just because the library has
explicitly disallowed using the same sqlite3 * structure in multiple
threads on some platforms (i understand this requirement has been
relaxed on others) does not make it thread safe.  Even on the
platforms where a single sqlite3 * structure can be used on multiple
threads (provided it is not at the same time), it is not possible to
have a transaction which works across these threads.  So even if the
connection is thread safe, the transactions are not.

By the usual definition, something which is thread safe can be safely
used across multiple threads, usually with the aid of synchronisation
but sometimes not.  For instance collections are often considdered
thread safe only when they manage their own mutexes internally so that
the user doesnt have to.  But either way, you can use them accross
multiple threads.  You cannot do this with sqlite, so it is quite
confusing to say that sqlite is thread safe...

I think a better definition would be that sqlite can be safely used in
a multithreaded program, but is not thread safe.

I agree that multithreaded programming can be difficult, but its not
magic and i think that a few simple rules can overcome most of the
problems.  It certainly is not luck that multithreaded systems work,
usually its the result of careful design and hard work.

Emerson

On 12/30/06, Roger Binns [EMAIL PROTECTED] wrote:

Emerson Clarke wrote:
 If i have a linked list, i can use it across threads if i want to,
 provided that i synchronise operations in such a way that the list
 does not get corrupted.

And of course you also have to know about memory barriers and compiler
re-ordering.  That is highly dependent on the libraries and/or compiler
you are using, as well as underlying hardware implementation.  Most of
the time, developers just get lucky.

  http://en.wikipedia.org/wiki/Memory_barrier

 Likewise for most other data structures and libraries.

Arguably that is by luck and not design!  Look at the effort that to go
in an add _r suffixed versions of several functions in the standard
libraries.  And many GUI libraries have long had a restriction that you
can only use them in one thread.

 Sqlite does not follow these rules, as something created in one thread
 does not work in another thread regardless of synchronisation and it
 is out of my control.

SQLite's design was not luck.  The design expects you to create unique
sqlite3 objects in each thread.  Effort and thought was put into that!

http://www.sqlite.org/cvstrac/wiki?p=MultiThreading

It was loosened a bit in 3.3.x:

  http://www.sqlite.org/faq.html#q8

What isn't allowed is multiple statements executing at the same time in
multiple threads against the same sqlite3* db object.  In order to
support that, SQLite would have to have extensive code protecting the
various internal data structures as well as ensuring concurrency.

 This is not a situation that i would expect anyone to purposefully
 design becuase it makes multithreaded programming difficult,

The purposeful design is that you make sqlite3 objects per thread.  That
way there is absolutely no danger of corruption or other bad issues.

Roger


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-30 Thread Emerson Clarke

Richard,

Ok, im pretty clear on the file locking being the cause of the
problems with the sqlite3 * structures, but thanks for confirming it.
I understand that on platforms that dont have this issue its not a
problem.

But why then can i not have a single transaction wrapping a single
connection which is used within multiple threads, obvioulsy not at the
same time.  Its rare that anything can be used at the same time by
multiple threads, hence the need for atomic operations (which *nix
doesnt have great standard support for).

To clarify, i have never been talking about using something
simultaneously, only within the confines of synchronisation.  So when
i talk about using something in multiple threads, i mean in a
syhcnronised way, but still it would be the same exact piece of
memory.


On 12/30/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Emerson Clarke [EMAIL PROTECTED] wrote:
 Richard,

 My complaint, if you want to call it that.  Was simply that there are
 seemingly artificial constraints on what you can and cant do accross
 threads.

 If i have a linked list, i can use it across threads if i want to,
 provided that i synchronise operations in such a way that the list
 does not get corrupted.

 Likewise for most other data structures and libraries.  The default
 behaviour is that if i create an object on the stack or the heap, i
 can use it between multiple threads provided that i synchronise access
 to that object.

 Sqlite does not follow these rules, as something created in one thread
 does not work in another thread regardless of synchronisation and it
 is out of my control.


I think it was already explained to you that the reason for this
behavior is to work around bugs in file locking in older versions
of Linux.  If you are using a newer version of Linux or OS-X or
windows, those constraints do not apply to you. You can move
database connections across threads freely.

What you cannot do is to use the same database connection in two
or more threads at the same time.  This is the usual case with most
libraries, threadsafe or not - you can use separate instances of
an object in different threads at the same time, but you cannot use
the same object simultaneously in multiple threads.
--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-30 Thread Emerson Clarke

Richard,

Are you sure we are not just getting into semantic knots here ?

Do we have the same definition of at the same time.  I mean
concurrently, so that both threads use the same sqlite3 * structure,
within mutexes. Each query is allowed to complete before the other one
starts, but each thread may have multiple statements or result sets
open.

When i try to do this, i get api called out of sequence errors...

On 12/30/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Emerson Clarke [EMAIL PROTECTED] wrote:
 But why then can i not have a single transaction wrapping a single
 connection which is used within multiple threads, obvioulsy not at the
 same time.

You can.  What makes you think you can't?
--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-30 Thread Emerson Clarke

Richard,

Im getting some mixed signals here, on the one hand i am being told
that it should be ok to use a single transaction and connection across
multiple threads (assuming that it is correctly synchronised). But on
the other hand i am reading statements like that as part of a list of
things which may cause a sqlite misuse error.

Trying to use the same SQLite database connection from two or more
threads at the same time.

Anyway, the sqlite_step() problem certainly explains one of the
crashes that i saw so that will save me some debugging.

Thanks to those who have suggested alternative designs and strategies.
I am currently adapting the single thread does all the database work
solution.

The documentation sources i have been looking at regarding threading
are (in addition to various discussions in this mailing list) here;

http://www.sqlite.org/cvstrac/wiki?p=MultiThreading
http://www.sqlite.org/lang_transaction.html
http://www.sqlite.org/lockingv3.html
http://www.sqlite.org/faq.html#q8

With that and subsequent versions, it is safe to move a connection
handle across threads as long as the connection is not holding any
fcntl() locks. You can safely assume that no locks are being held if
no transaction is pending and all statements have been finalized.

The above statement regarding transactions was one of the things which
led me to believe that it was not possible to have the single
transaction, single connection and multiple thread arrangement.


Emerson

On 12/30/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Joe Wilson [EMAIL PROTECTED] wrote:
 --- [EMAIL PROTECTED] wrote:
  SQLite allows you to have multiple queries running at the
  same time in multiple threads, as long as no two threads are
  running sqlite3_step() at the same moment.

 Do you mean as long as no two threads are running sqlite3_step()
 _for the same sqlite3* connection_ at the same moment?


Yes.
--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-29 Thread Emerson Clarke

Im not sure that the current situation of forced thread
incompatibility is better than leaving it up to users to manage the
threading.  Usually it is assumed that a library is thread unsafe
unless otherwise specified.

Developing multithreaded applications is difficult, i wouldnt dispute
that.  But i do dispute the wisdom of actively making a library
incompatible with threads.  Sqlite is the first example of such
behaviour i have seen, it deliberately fails even if the user attempts
to synchronise concurrent access.

Surely it would be better to design the api so that it was either
fully thread safe and blocked on every call to an sqlite_ function
using its own mutexes, or to simply ensure that it was process safe
using shared memory.  As i mentioned in a previous reply,
synchronising sqlite would be no more or less difficult or error prone
than synchonising a linked list.

I dont see what is gained by making it not work with threads, so i
assume that it is becuase of some technical reason rather than a
design decision.

Anyway, as another user has pointed out.  You can compile the library
in such a way as to allow the thread synchronisation routines to be
overridden so i think i have a solution to the problem, but perhaps
not an answer as to why things are the way they are.


On 12/29/06, Jay Sprenkle [EMAIL PROTECTED] wrote:

On 12/27/06, Emerson Clarke [EMAIL PROTECTED] wrote:
 The first question is why database locking has been enabled on a per thread
 basis instead of per process so that the normal methods of thread
 synchronisation (mutexes, ciritcal sections) could not be used for
 maintaining consistency within the application.  The programmer can easily
 ensure that no two statements are executed at the same time just as they
 would ensure that no more than one operation on a linked list was performed
 at the same time.

Having watched a lot of people attempt to write thread safe code I would
say this is an incorrect assumption. I think it only appears simple at a casual
glance.

--
The PixAddixImage Collector suite:
http://groups-beta.google.com/group/pixaddix

SqliteImporter and SqliteReplicator: Command line utilities for Sqlite
http://www.reddawn.net/~jsprenkl/Sqlite

Cthulhu Bucks!
http://www.cthulhubucks.com

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-29 Thread Emerson Clarke

In general worker threads is not an efficient solution to the problem
even if parallelism is available.  There is nothing to be gained by
having one thread handing off A to another set of worker threads
because you have to synchronise on whatever queue/list/pipe you use to
transfer the information between threads anyway.  So you may as well
just have multiple threads all performing their A,B, and C operations
in parallel and they will naturally interleave their operations based
on the synchronisation that you use.

Typically worker threads are useful when you want something to be done
asynchronously in the background.   I dont think its a suitable design
for this task though because it does not do anything that multiple
threads with mutexes cant do in the foreground and only introduces
overhead and delay in the queueing process and context switching.

On the second point, i cant see any reason why the VDBE design would
be a problem for what i want to do.  Taking out a table wide lock is
fine, so long as i can make changes to that table from multiple
threads using the same connection and lock.

The only danger occurs if those threads are not synchronised and two
or more attempt to modify the table at once, but of course, thats what
mutexes are for.

Emerson

On 12/29/06, Roger Binns [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Emerson Clarke wrote:
| For architectural reasons the above steps must be performed in that
| order. This means that operations cant be separated or queued up in
| the way that you suggested.  Each step is dependent on the previous
| step.

I was looking for parallelism expecting that indexing one document has
no dependencies on other documents.  Lets say that you have 3 documents
A, B and C.  Is it impossible to index B until after A's indexing is
done?  If that is the case then your indexing process is imposing
serialization and there is nothing SQLite can do about that.  If the
indexing of A, B  C can be done in parallel, then you can have the
indexing workers send updates to a single thread/process.  That can
aggregate information and then dump it all in a single transaction quickly.

| Of course i know the simple answer, which is that the current api does
| not support this.  But im wondering why, and if there are any other
| ways to achieve the desired performance.

It is actually the internal design of SQLite that doesn't support this.
~  (Look at VDBE instructions for queries sometime). SQLite locks the
whole database, rather than tables or rows.  But the locks are held for
the shortest time period possible.  There are other database systems out
there that have finer granularity locking, but that is more complicated.
~ There are also potential patent issues and SQLite deliberately uses old
techniques to ensure no patent violation.  SQLite's techniques have to
work without centrally coordinating mechanism and within the confines of
lowest common denominator functionality available on a variety of
operating systems.

Quite frankly if you cannot change the way you work with your data set
then you may want to try a different database system such as Postgres.
Of course that internally solves the problems by having a single process
and doesn't have to worry about any other process/thread stepping on its
toes.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFlLG+mOOfHg372QQRAn+sAKCrImPgr0Ex1xo3XOXgWg0E7t3wWQCgtAAQ
ZB5bs3fFntT4u3rABIGcOik=
=c2pB
-END PGP SIGNATURE-

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threadin

2006-12-29 Thread Emerson Clarke

Joe,

Im interested to know what those constraints are and why ?

The only reason i mentioned shared memory is because it provides a
platform and filesystem agnostic way of handling ipc.  Obvioulsy i
dont know the ins and outs of the locking process, but i just thought
it would make sense to have a single unified model rather than
different strategies on many platforms.

On 12/29/06, Joe Wilson [EMAIL PROTECTED] wrote:

--- Emerson Clarke [EMAIL PROTECTED] wrote:
 Developing multithreaded applications is difficult, i wouldnt dispute
 that.  But i do dispute the wisdom of actively making a library
 incompatible with threads.

Actively?  That's a bit much.

There are constraints on the ability to pass SQLite connections
between threads. To be safe, just use the connection on the
same thread it was created and you'll be fine.

 Sqlite is the first example of such
 behaviour i have seen, it deliberately fails even if the user attempts
 to synchronise concurrent access.

 Surely it would be better to design the api so that it was either
 fully thread safe and blocked on every call to an sqlite_ function
 using its own mutexes, or to simply ensure that it was process safe
 using shared memory.  As i mentioned in a previous reply,
 synchronising sqlite would be no more or less difficult or error prone
 than synchonising a linked list.

You're free to serialize your calls to SQLite in your own code.
Take a look at the many wrappers for SQLite for ideas.

Shared memory is not the answer. It's a nightmare to manage and creates
more problems than it solves.


__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-29 Thread Emerson Clarke

Richard,

First let me say thank you for your earilier suggestion regarding the
compile time options and overlaoding of the os routines.  It was
exactly the kind of information i was searching to for and i am
looking into using that strategy at the moment.

Sorry if ive managed to offend you, please excuse my enthusiasm. Its
just the way i write and it doesnt necessarily reflect my intentions.

I only meant to point out that it is very rare to encounter a library
which behaves in the way that sqlite does.  That is that the actual
pod types cannot be used across multiple threads.  I understand that
sqlite can be used with multiple threads, but only in what i percieve
to be a limited way.

I cannot use one connection with multiple threads.  So i cant take
advantage of the caching unless i use other options and pragma
directives.

I also cannot perform multiple operations within a single transaction
across multiple threads. This design is restrictive in that it forces
very granular transactions which then limit the overall performance.

Perhaps you could assist me in understanding why this design is better
than say, just allowing one connection to be shared amongst threads
and letting the user handle the synchronisation ?

At the moment there is no way which i can wrap the sqlite libraries in
an api which is shared with another database like mysql and have
consistent behaviour.  This is purely because of the restrictions on
the way sqlite handles threads and the fact that a transaction cannot
be used across multiple threads.  If sqlite behaved differently then
from the wrapper code there would be no difference between using
sqlite and mysql except for the sql syntax.  The fact that it is an
embedded file based database would be completely transparent.  At the
moment it is not.

Functions like qsort() are not only thread safe, but probably also
reentrant.  Im aware of the differences and i understand why things
are and arent thread safe. Of course there are several algorithms
other than quick sort which can be used to sort the same array in
multiple threads.

I dont wish to start a debate, merely to engange in a technical
discussion of the design which sqlite uses since i dont understand the
reasons behind it and i would very much like too.

Emerson

On 12/29/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Emerson Clarke [EMAIL PROTECTED] wrote:
 Im not sure that the current situation of forced thread
 incompatibility is better than leaving it up to users to manage the
 threading.  Usually it is assumed that a library is thread unsafe
 unless otherwise specified.

 Developing multithreaded applications is difficult, i wouldnt dispute
 that.  But i do dispute the wisdom of actively making a library
 incompatible with threads.  Sqlite is the first example of such
 behaviour i have seen, it deliberately fails even if the user attempts
 to synchronise concurrent access.

You'll probably get more help in solving your problem if you
can refrain from sophomoric remarks such as the above.


 Surely it would be better to design the api so that it was either
 fully thread safe and blocked on every call to an sqlite_ function
 using its own mutexes, or to simply ensure that it was process safe
 using shared memory.  As i mentioned in a previous reply,
 synchronising sqlite would be no more or less difficult or error prone
 than synchonising a linked list.

The threading design of SQLite is very simple and intuitive.
Multiple threads can be using SQLite at the same time as long
as they are not using the same database connection at the same
time.  This is consistent with most other utility functions.
The qsort() function is threadsafe - you can have two different
threads in qsort() sorting different arrays at the same time.
But it does not work to sort the same array in two threads at
the same time.  In the same way, SQLite allows two different
threads to be working with different database connections at
the same time, but not the same database connection.

What about that is so difficult to understand?

On some older versions of Linux, SQLite is unable to pass
database connections from one thread to another.  But this
is a problem with the threading libraries used in those older
linux versions and is outside the control of SQLite.  I do not
think this issue comes into play here.


 I dont see what is gained by making it not work with threads, so i
 assume that it is becuase of some technical reason rather than a
 design decision.

SQLite does work with threads.  Your comment is based on a false
premise.

--
D. Richard Hipp  [EMAIL PROTECTED]


-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threadi

2006-12-29 Thread Emerson Clarke

Ken,

Thanks.

Ok, i think i did miss the point with your test case.  I assumed that
it was always possible to perform multiple select, insert, delete, and
update statements within a single transaction anyway.  Which i guess
relates to you last point.  I dont see why if i have a single global
transaction i cant perform all the operations i want within that
transaction to maximise performance.  In fact i know i can. because
thats what i do with a single thread.

I have always assumed that the simplest and highest performance
solution would be to have multiple threads with one transaction and
one cache.  I will take a look at the test_server.c code and see if i
can understand what you are suggesting.

I dont see why writer starvation would be a problem, but again i
probably just dont understand what your suggesting there.  Anyway,
thanks for your help.  I suspect that you are right and the ultimate
solution will be to write my own synchronisation code, since that will
allow me to have a consistent api across multiple databases like
sqlite and mysql.

I could achieve this if i could have multiple threads executing
statements within the same transaction since that is the effective
behaviour i get from most other databases which are not in process.
In those cases it is the process boundary and the tcp/ip connections
which are effectivley serialising all interactions which is equivalent
of having only one thread when using sqlite.

Emerson

On 12/29/06, Ken [EMAIL PROTECTED] wrote:


 Emerson,

   You just need to lock the entire transaction using a mutex before hand. That 
means each thread will have its own sqlite cache.  Reread the sqlite locking 
and concurrency guide, you'll see that SQLITE is NOT a transaction based system 
such as Postgress/mysql/ oracle. It locks the entire database file at the point 
of writing, not at the first insert/update/delete. Its better to think of 
sqlite as an easy way to perform data access against a FLAT file. Rather than a 
transaction oriented system.

 Regarding my test case. You missed the point, there is a select statement in 
the middle that each thread executes. The nice thing is that each and every 
thread uses a shared cache and can execute the select statement concurrently 
without a mutex. The locking when using a shared cache works at the table level 
instead of file!

 Another thing you should look at: Create a master DB, then attach the 
individual databases to the master. That way you only need one Cache. How do 
you know you wouldn't benefit from merging your caches? Have you tried? Do you 
have test cases to prove that it doesn't help?

 You stated that you want to put your owner locking mutex wrapper around the 
database interactions so that there would be only 1 thread access the DB.. That 
means that you need to release the lock after each 
statement(insert/update/delete) and not perform a BEGIN transaction/commit... 
Which means very slow performance. And besides only one thread regardless could 
be accessing the DB. Whereas the test_server.c  architecture (you need to 
modify this for your own needs) will let you create multiple shared connections 
to the database. And each thread can perform selects concurrently. But all 
threads are serialized into the single thread for writing. This takes care of 
all locking issues and is technically the same as mutexing the writes across 
threads. And it addresses writer starvation, which you have not addressed with 
your simple mutex locking.

 It seems to me you might want write your own locking implementation or even 
abandoning sqlite if it doesn't fit your needs.


Emerson Clarke [EMAIL PROTECTED] wrote: Ken,

Thanks i understand your example well.

What im looking for is the ability to have multiple threads perform a
write operation based on my mutex, not some internal sqlite write
mutex.  If i am managing the concurrency and performing correct
synchronisation, why can i not have multiple threads writing to the
database at once and achive a better overall performance.

Given that the indexing process i am referring too has several steps
which it must perform in successive order, and not all of them are
database writes, i am simply trying to use threads to emulate a
situation where there is only one thread doing all the database
writes, if that makes sense.

So in this case, what ever synchronisation sqlite is doing internally
is actually getting in the way of what i am trying to do.

There were no gains in performance in your test case becuase there was
no reason to have multiple threads.  The only thing each thread was
doing was writing to the database, so of course the only thing you
will introduce by using multiple threads is overhead.

Emerson



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threadi

2006-12-29 Thread Emerson Clarke

OK, thanks Joe.

I will look up those peculiarities as you suggested, im interested in
understanding what they are.

I agree with you, DRH has done a great job.  Sqlite is a fantastic
piece of software and and outstanding example of the way open source
should be.  It is small, efficient and incredibly easy to use.

It is remarkable that it has so many capabilities and as such it has
rapidly become the critical component in many of the things i am
working on.

I was just a bit confused about the way the threading works, hence
this discussion.

Emerson

On 12/29/06, Emerson Clarke [EMAIL PROTECTED] wrote:

Ken,

Thanks.

Ok, i think i did miss the point with your test case.  I assumed that
it was always possible to perform multiple select, insert, delete, and
update statements within a single transaction anyway.  Which i guess
relates to you last point.  I dont see why if i have a single global
transaction i cant perform all the operations i want within that
transaction to maximise performance.  In fact i know i can. because
thats what i do with a single thread.

I have always assumed that the simplest and highest performance
solution would be to have multiple threads with one transaction and
one cache.  I will take a look at the test_server.c code and see if i
can understand what you are suggesting.

I dont see why writer starvation would be a problem, but again i
probably just dont understand what your suggesting there.  Anyway,
thanks for your help.  I suspect that you are right and the ultimate
solution will be to write my own synchronisation code, since that will
allow me to have a consistent api across multiple databases like
sqlite and mysql.

I could achieve this if i could have multiple threads executing
statements within the same transaction since that is the effective
behaviour i get from most other databases which are not in process.
In those cases it is the process boundary and the tcp/ip connections
which are effectivley serialising all interactions which is equivalent
of having only one thread when using sqlite.

Emerson

On 12/29/06, Ken [EMAIL PROTECTED] wrote:

  Emerson,

You just need to lock the entire transaction using a mutex before hand. 
That means each thread will have its own sqlite cache.  Reread the sqlite locking 
and concurrency guide, you'll see that SQLITE is NOT a transaction based system 
such as Postgress/mysql/ oracle. It locks the entire database file at the point of 
writing, not at the first insert/update/delete. Its better to think of sqlite as 
an easy way to perform data access against a FLAT file. Rather than a transaction 
oriented system.

  Regarding my test case. You missed the point, there is a select statement in 
the middle that each thread executes. The nice thing is that each and every thread 
uses a shared cache and can execute the select statement concurrently without a 
mutex. The locking when using a shared cache works at the table level instead of 
file!

  Another thing you should look at: Create a master DB, then attach the 
individual databases to the master. That way you only need one Cache. How do you 
know you wouldn't benefit from merging your caches? Have you tried? Do you have 
test cases to prove that it doesn't help?

  You stated that you want to put your owner locking mutex wrapper around the 
database interactions so that there would be only 1 thread access the DB.. That 
means that you need to release the lock after each statement(insert/update/delete) 
and not perform a BEGIN transaction/commit... Which means very slow performance. 
And besides only one thread regardless could be accessing the DB. Whereas the 
test_server.c  architecture (you need to modify this for your own needs) will let 
you create multiple shared connections to the database. And each thread can 
perform selects concurrently. But all threads are serialized into the single 
thread for writing. This takes care of all locking issues and is technically the 
same as mutexing the writes across threads. And it addresses writer starvation, 
which you have not addressed with your simple mutex locking.

  It seems to me you might want write your own locking implementation or even 
abandoning sqlite if it doesn't fit your needs.


 Emerson Clarke [EMAIL PROTECTED] wrote: Ken,

 Thanks i understand your example well.

 What im looking for is the ability to have multiple threads perform a
 write operation based on my mutex, not some internal sqlite write
 mutex.  If i am managing the concurrency and performing correct
 synchronisation, why can i not have multiple threads writing to the
 database at once and achive a better overall performance.

 Given that the indexing process i am referring too has several steps
 which it must perform in successive order, and not all of them are
 database writes, i am simply trying to use threads to emulate a
 situation where there is only one thread doing all the database
 writes, if that makes sense.

 So in this case, what ever

Re: [sqlite] sqlite performance, locking thread

2006-12-29 Thread Emerson Clarke

Ken,

Yes you cannot have multiple threads within the same transaction, and
you cannot pass a connection between threads.

I think we have an undestanding about the performance situation, and
we are getting to the real heart of the issue, which why it is not
possible to have a single transaction, single connection and multiple
threads.

Provided the user synchronises the access to the connection, this
should offer the highest performance.  But there is some technical
reason why sqlite has this (in my opinion unusual behaviour).  Perhaps
DRH can explain this...

Emerson


On 12/29/06, Ken [EMAIL PROTECTED] wrote:

The test server.c code is an example that has a single thread that performs the 
DB interactions each client thread, communicates via a queue interface. Each 
client will get serialized into the  DBserver thread and get its work done. 
Thus eliminating any lower level locking and mutexing inside the application 
code.

 Your assumption regarding 1 thread/1 cache is pretty accurate. This is what 
the test_server.c code does, however each client thread does however get a 
database connection handle.

 If you have a single global transaction you can do insr/selec/upd/delete.. The 
thing is that the connection may not be passed around amongst threads.

 You cannot have multiple threads executing within the same transaction!!! To 
my understanding, that is not allowed. Please someone else correct me if I'm 
wrong...

 The difference is with other DB's they utilize a finer grain of locking 
internally, either page locking, row locking etc.. Sqlite uses Database 
LOCKING, which is full file. So its really only designed to be used by a single 
thread of execution. (again DRH please correct me here if I'm wrong).


Emerson Clarke [EMAIL PROTECTED] wrote: Ken,

Thanks.

Ok, i think i did miss the point with your test case.  I assumed that
it was always possible to perform multiple select, insert, delete, and
update statements within a single transaction anyway.  Which i guess
relates to you last point.  I dont see why if i have a single global
transaction i cant perform all the operations i want within that
transaction to maximise performance.  In fact i know i can. because
thats what i do with a single thread.

I have always assumed that the simplest and highest performance
solution would be to have multiple threads with one transaction and
one cache.  I will take a look at the test_server.c code and see if i
can understand what you are suggesting.

I dont see why writer starvation would be a problem, but again i
probably just dont understand what your suggesting there.  Anyway,
thanks for your help.  I suspect that you are right and the ultimate
solution will be to write my own synchronisation code, since that will
allow me to have a consistent api across multiple databases like
sqlite and mysql.

I could achieve this if i could have multiple threads executing
statements within the same transaction since that is the effective
behaviour i get from most other databases which are not in process.
In those cases it is the process boundary and the tcp/ip connections
which are effectivley serialising all interactions which is equivalent
of having only one thread when using sqlite.

Emerson


-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-29 Thread Emerson Clarke

Roger,

I have deliberately tried to avoid giving too much detail on the
architecture of the index since that was not the point and i didnt
want to end up debating it.

The design of the index is not the issue, suffice to say that i think
you are over complicating things. It is a desceptively simple problem,
and i really am not insisting on anything.  At the end of the day i
can just use another database, but the similicity and low maintenance
of sqlite appeals to me.

I did make an attempt to explain that A and B could not be done at the
same time in a previous message, but perhaps its been lost in the
conversation.  The process involves several stages some of which are
database operations and some of which are file operations and that the
operations are not separable.  They must be done in sequential order.

At this stage nothing is premature optimisation since i am merely
running performance tests and modelling the architecture. I can see
where the time is being spent using high resolution timers so that is
my empirical evidence.

The database operations, though very small still consume the most time
and are the most sensetive to how the synchronisation takes place and
where the transactions are placed.  I dont think custom functions are
appropriate for what im doing and im not sure how virtual tables would
be either, i rather suspect that would be a very complicated approach.
The schema is extemely simple, and there is barely any logic too the
indexing process at all.

The biggest impact on the performance is the transactions, and since
its hard to dispute that a single global transaction is the fastest
way to operate when you are using a single thread it is easy to see
that being able to use one global transaction and multiple threads
within that transaction is the simplest and easiest way to gain
perofrmance.  Unfortunately i cannot do this with sqlite at the
moment...

A few threads and two or three mutex lock/unlock statements is all it
takes with the design that i have now. The fact that each thread must
use its own connection is a hastle and does impact on the performance,
but not nearly as much as the inability to wrap the whole thing in a
single transaction.


On 12/30/06, Roger Binns [EMAIL PROTECTED] wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

You never answered the bit about whether indexing A can be done at the
same time as B.  (I get the feeling you have a certain design in mind
and insist that SQLite changes to meet that design, rather than change
your own design around the constraints of SQLite).

Emerson Clarke wrote:
| In general worker threads is not an efficient solution to the problem
| even if parallelism is available.  There is nothing to be gained by
| having one thread handing off A to another set of worker threads
| because you have to synchronise on whatever queue/list/pipe you use to
| transfer the information between threads anyway.

The synchronisation is only at the begining and end of a job.  The big
efficiency gain is that you can write synchronous code in the
threads/processes which is way easier to write and less likely to have
bugs.  It is correct that you can write the same thing using a single
thread and using asynchronous I/O but that is way harder.

| So you may as well
| just have multiple threads all performing their A,B, and C operations
| in parallel and they will naturally interleave their operations based
| on the synchronisation that you use.

You haven't said why that won't work in practise.

| threads with mutexes cant do in the foreground and only introduces
| overhead and delay in the queueing process and context switching.

Sounds like premature optimization to me.  Compared to any I/O you are
going to have to do (all of which involves interrupts, DMA transfers and
context switching), other context switching is going to be negligible.

| On the second point, i cant see any reason why the VDBE design would
| be a problem for what i want to do.

Did you run explain?  Quite simply SQLite is designed to keep the
database locked for the minimum amount of time, rather than having
multiple VDBE's having shared locks and doing lots of concurrent locking
for longer periods of time.  Fundamentally changing the design of SQLite
is not a quick thing.

| Taking out a table wide lock is
| fine, so long as i can make changes to that table from multiple
| threads using the same connection and lock.

Taking out a table lock is fine, as long as the table is not locked :-)

If you have actual empirical evidence that SQLite is a bottleneck, then
there are two other approaches that may help:

- - Use multiple databases (eg split on first letter of document name) and
use ATTACH to have them all available at the same time

- - Use the new virtual tables feature and user defined functions.  You
can have the code consult other tables, store up information and do
almost any other workflow and locking scheme you want.  You can copy
batches of data from your virtual

Re: [sqlite] sqlite performance, locking threading

2006-12-28 Thread Emerson Clarke

Roger,

Thanks for the suggestions.  I think using a worker thread and a queue
would be equivalent to just running a single thread since it
effectively makes the database operations synchronous.  Although i can
see what your driving at regarding the transactions every n records.

The idea is that because i am accessing two databases, and doing
several file system operations per document, there should be a large
gain by using many threads.  There is no actual indexing process, the
whole structure is the index, but if anything the database operations
take the most time.  The filesystem operations have a very small
amount of overhead.

I have tried the page size pragma setting already, though i read that
it is dependent on the cluster size of the particular filesystem that
you are running on.

Since i only have one connection to each database from each thread i
dont think i would benefit from the caching.  Im not quite sure why
you would ever have more than one connection to the database from a
single thread ?  The api that i use more or less ensures that under
most circumstances there is only one connection.

Emerson

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threading

2006-12-28 Thread Emerson Clarke

The indexing process works like this.

1.) Open a document and parse its contents.
2.) Look up records in the first database based on the contents of the
document, updating records where appropriate and inserting new ones.
3.) Transforming the document based on what was obtained from the
first database.
4.) Create a filesystem structure in the form of a folder and two or more files.
5.) Look up some more records in a second database, updating and
inserting as necessary.

For architectural reasons the above steps must be performed in that
order. This means that operations cant be separated or queued up in
the way that you suggested.  Each step is dependent on the previous
step.

But by having multiple threads and using synchronisation around the
database operations you can ensure that multiple database operations
are always pending.

Each thread will be at a different stage in the pipeline, but a few
will always be ready to perform a database operation so the idea is to
maximise the throughput.

As you said, the more rows per transaction, the more rows per second.
Which brings us back to the original issue.  Why cant i have multiple
threads all using the same connection within a single transaction ?

Of course i know the simple answer, which is that the current api does
not support this.  But im wondering why, and if there are any other
ways to achieve the desired performance.

Emerson


On 12/28/06, Roger Binns [EMAIL PROTECTED] wrote:

Emerson Clarke wrote:
 The idea is that because i am accessing two databases, and doing
 several file system operations per document, there should be a large
 gain by using many threads.  There is no actual indexing process, the
 whole structure is the index, but if anything the database operations
 take the most time.  The filesystem operations have a very small
 amount of overhead.

That is all unclear from your original description.  Aren't you trying
to index several million documents and doesn't the process of indexing
consist of two parts?

1: Open the document, parse it in various ways, build index data, close it
2: Add a row to a SQLite database

My point was that #1 is way more work than #2, so you can run #1's in
multiple threads/processes and do #2 in a single thread using a
queue/pipe object for communication.

On the other hand, if #1 is way less work than #2 then you will be bound
by the speed at which you decide to make transactions in SQLite.  A 7200
rpm disk limits you to 60 transactions a second.  The more rows per
transaction, the more rows per second.

Roger

-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] sqlite performance, locking threadin

2006-12-28 Thread Emerson Clarke

Ken,

Thanks i understand your example well.

What im looking for is the ability to have multiple threads perform a
write operation based on my mutex, not some internal sqlite write
mutex.  If i am managing the concurrency and performing correct
synchronisation, why can i not have multiple threads writing to the
database at once and achive a better overall performance.

Given that the indexing process i am referring too has several steps
which it must perform in successive order, and not all of them are
database writes, i am simply trying to use threads to emulate a
situation where there is only one thread doing all the database
writes, if that makes sense.

So in this case, what ever synchronisation sqlite is doing internally
is actually getting in the way of what i am trying to do.

There were no gains in performance in your test case becuase there was
no reason to have multiple threads.  The only thing each thread was
doing was writing to the database, so of course the only thing you
will introduce by using multiple threads is overhead.

Emerson

On 12/28/06, Ken [EMAIL PROTECTED] wrote:

Emerson,

 Is the Database on the same disk as the rest of the file operations? If so is 
it possible that you are I/O bound and causing seek issues due to i/o access 
patterns?

 Take a look at the test_server.c code in the sqlite/src directory. I used that 
as a basis to build a custom library that opens a single DB and then allows 
multiple threads to access. The nice thing about this architecture is that the 
threads will all get to write and no writer starvation. But all write 
operations an single threaded.

 The test code I ran creates any number of threads and performs the following 
in each thread:

 outer loop 1- 10
 begin txn
 loop  1 -1000
  insert record (using modulo for data so data is unique amongst 
threads)
 end loop
 commit

 prepare statement
 loop 1 - 1000
 Select data (using modulo)
 end loop
  close statement

  begin transaction
  loop 1 - 1000
  delete data, using same modulo
  end loop
 end main loop

 timinng (seconds)  Thread count
 1.6651   (transaction size 
is 1000)
 1.6352   (transcaction 
size is 500)
 3.094   4( txn size is 250 
)
 5.571   8(txn size is 125 )
 7.82216(txn size is 62.5)

 so as the number of threads increase the overall time it takes to 
insert/select/delete a fixed set of data increases using this architecture. 
This is because all threads are serialized upon inserts/deletes and are 
contending on a single writer mutex. So in this particular case fewer threads 
actually improves performance.

 Hope this helps,
 Ken




Emerson Clarke [EMAIL PROTECTED] wrote: Roger,

Thanks for the suggestions.  I think using a worker thread and a queue
would be equivalent to just running a single thread since it
effectively makes the database operations synchronous.  Although i can
see what your driving at regarding the transactions every n records.

The idea is that because i am accessing two databases, and doing
several file system operations per document, there should be a large
gain by using many threads.  There is no actual indexing process, the
whole structure is the index, but if anything the database operations
take the most time.  The filesystem operations have a very small
amount of overhead.

I have tried the page size pragma setting already, though i read that
it is dependent on the cluster size of the particular filesystem that
you are running on.

Since i only have one connection to each database from each thread i
dont think i would benefit from the caching.  Im not quite sure why
you would ever have more than one connection to the database from a
single thread ?  The api that i use more or less ensures that under
most circumstances there is only one connection.

Emerson

-
To unsubscribe, send email to [EMAIL PROTECTED]
-






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] sqlite performance, locking threading

2006-12-27 Thread Emerson Clarke

I am developing a multithreaded C++ library which uses sqlite as an embedded
database alongside the mysql client as a simple sql api.  Both databases
share a common interface which supports statements, prepared statements,
recordsets, records, and transactions.

As part of some research and testing i have been playing around with
creating a small filesystem based search engine index.  Each document within
the index is stored as a file in a folder tree structure, but the contents
of the documents are catalogued in two sqlite databases at the root of the
folder tree.  As each document gets added to the index, either or both of
the sqlite databases are updated.

The goal is to be able to index something in the order of 10 to 100
documents a second.  This will allow indexing of around 8 million documents
per day on a single machine.  As such i have been investigating the ways in
which i can get the most performance out of sqlite.

I began by wrapping the entire indexing process in transactions, this
immediately resulted in a huge performance gain but the entire process was
still very synchronous, only indexing one document at a time.  The next
logical step was to add multiple threads to make better use of the operating
system and interleave the various indexing tasks.  At this point i ran into
the limitations which sqlite places on threading, and after several crashes
and error messages concerning locks and routines called out of sequence i
modified the api to ensure that each thread was given its own sqlite3 *
structure.  However in doing this i was penalised by no longer being able to
take advantage of the application wide transaction (or at least that was my
understanding), so i then had to increase the granularity of the
transactions to be within the normal thread synchronisation blocks.  This
decreased the performance to such a degree that it counteracted any gains
which could be made by using multiple threads.

I also attempted to improve performance by using multiple inserts but
discovered that sqlite does not support the same multiple sets of values
that mysql does, insert into table(a,b) values (1,2),(3,4),(5,6).  I then
tried to use prepared statements but suffered several scary looking crashes
deep within the sqlite code responsible for closing file handles which i
have yet to debug properly.

I then discovered the pragma directives such as pragma synchronise=off and
tried those, but did not get the performance gain that i was looking for
either.

My feelings are that the major gain that i am missing is being able to have
an application wide transaction and multiple threads and also being able to
execute multiple inserts using the values syntax or something similar.
Having read the information on file locking and concurrency (
http://www.sqlite.org/lockingv3.html) and browsed the wiki's and various
other online forums i have some questions regarding the above situation.

The first question is why database locking has been enabled on a per thread
basis instead of per process so that the normal methods of thread
synchronisation (mutexes, ciritcal sections) could not be used for
maintaining consistency within the application.  The programmer can easily
ensure that no two statements are executed at the same time just as they
would ensure that no more than one operation on a linked list was performed
at the same time.  I read somewhere that there is a technical reason for
this behaviour in sqlite, such as the fcntl() function taking out per thread
locks.  But i dont understand why file locking is used at all.  Given that
all platforms, and indeed file systems, have different locking semantics,
would it not be easier to either use a .lock file (in a similar way to the
,journal file) if you really needed to support concurrent access from other
machines. Or just use a much faster IPC mechanism like shared memory and
assume that all access will be from multiple processes on the same machine
(a reasonable assumption for an embedded database).

And the second question is simply how hard is it to support the multiple
insert syntax discussed above, or is it simply a case of there being nothing
to be gained here because the normal insert statement is already fast enough
?  Surely not...

Also are any other optimisation techniques which you can suggest which i
might be missing ?

Apologies for the length, hopefully im not covering well trodden ground here
although i suspect i am.

Emerson