Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Gerry Snyder wrote: > The wiki is there and open to all. > > I look forward to reading your additions to it. To be fair, only some of the documentation is in the wiki. The remainder is generated. For example you can't edit any of the pages listed under: http://sqlite.org/docs.html Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFoA0YmOOfHg372QQRAlycAJ9p3hQkWfc8yAUTqE7DgjGsefdiTwCfWIqD xxpWSAFOZOB6yrbYrNO/Cwc= =jxv/ -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Emerson Clarke wrote: Fix the out of date documentation The wiki is there and open to all. I look forward to reading your additions to it. Gerry - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
For anyone who is interested i have created a standalone test case which demonstrates the threading behaviour that i had, or as close as i can get it. Feel free to use the code for whatever purposes you see fit. It will compile on linux and windows, and comes with 4 versions of the sqlite libraries which can be found under "test/lib/sqlite/lib". Each platform has two versions, one built with THREADSAFE=0, and the other with THREADSAFE=1. http://ul1.rapidshare.com/files/10511098/test.zip It seems that i was incorrect in thinking the cause was sqlite3_step(). In replicating the problem from scratch i found that it was actually caused by sqlite3_finalize() being called without synchronisation. What does this mean, well firstly it means that the entire problem was avoidable had i simply synchronised correctly. But i think more worryingly, it highlights the fact that there is a great deal of confusion as to what it means for something to be thread safe on this mailing list. This confusion is also not aided in any way by the obtuse and stubborn attitudes that some people have demonstrated. If a lot of people are having regular problems with sqlite and threading as has been indicated then perhaps its time to review the situation. It is incredibly frustrating that the documentation and the various answers that i have recieved on this topic have been contradictory. At no time has anyone actually stood up and said that the software can safely be used in a multithreaded environment as long as access to the api's is synchronised. Instead you have a situation where the software has inbuilt misuse detection which may or may not alert the programmer to a potential error and a THREADSAFE macro which makes 5% of the it threadsafe but leaves the other 95% exposed. (By way of example using the code i supplied and the safe versions of the libraries it is possible to get away with calling sqlite3_finalize() outside the mutex, but if you take it one step further and use sqlite3_step() outside the mutex it will fail) So an inconsistent and unpredictable internal approach to thread safety combined with a policy of righteousness on this mailing list means that people like me have little chance of ever getting it right. Why is it so hard to simply nail down the characteristics of the software and update the documentation so that everyone is on the same page ? If sqlite is thread safe, and by that i mean it can be used safely with synchronisation, then why not say so. And be consistent in saying so. That way when someone comes here with a problem you can confidently describe how the api can be used and the user can go away and attempt to resolve their problem with confidence. I guess in part i owe people an appology, but then, had i been given a rational explanation or had access to clear documentation i guess this would not have gone on so long. And to be quite honest, my confidence in the software has now been eroded to such an extent that i dont know what is up or down. Please considder what im saying rationally and dont take offence. I know a lot of you are frustrated at this debate, and so you should be. I am too. So why not take steps to ensure that it doesnt happen again. Be clear about the threading behaviour and remove the inconsistencies. Fix the out of date documentation, and give a consistent answer, sqlite can be used with multiple threads provided that access to the api is synchronised. The saftey checks and macro options do more harm than good becuase they create confusion. Yes if you do X,Y, and Z you might get lucky, but thats probably why so many people have trouble. Its not consistent and its complicated. Unless your going to document the exact sequence of calls which is safe to make without synchronisation, you shouldnt be encouraging it. Advising people to create complicated designs where each thread has its own connection doesnt help either, because it will not work with transactions. It should be either 100% safe, or not at all. Perhaps it would also be useful to document the various strategies and ways in which sqlite can be used with safety. Stop telling people that mutlithreading results in random unpredictable behavior. Be open minded and don't hide behind whatever doctrine has been followed up till now. Emerson - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] sqlite performance, locking & threading
Sounds like a H-- of a plan to me! I'll call Larry Ellison and warn him to put the 12 meter yacht up for sale, he's about to get steam rolled! :-) Fred > -Original Message- > From: Bill King [mailto:[EMAIL PROTECTED] > Sent: Saturday, January 06, 2007 1:15 AM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] sqlite performance, locking & threading > > > Roger Binns wrote: > > >>Im sorry for being so harsh, and i know im not winning any friends > >>here, > >> > >> > > > >So far noone has agreed with you :-) > > > > > > > This would be incorrect. The correct statement is "so far no one has > vocally agreed with you". > > If people didn't agree, this whole once a month people having > multi-threading issues would not be the case. Surely the number of > people with exactly the same issues, month after month, > should point to > something? Perchance illogic? > > Time I think to go off and write my own simple sql engine. > People here > are far too closed minded, and far too quick to attack others because > they don't follow DRH's line of thought, and decisions. > > -- > --- > To unsubscribe, send email to [EMAIL PROTECTED] > -- > --- > - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Bill King wrote: Roger Binns wrote: Im sorry for being so harsh, and i know im not winning any friends here, So far noone has agreed with you :-) This would be incorrect. The correct statement is "so far no one has vocally agreed with you". If people didn't agree, this whole once a month people having multi-threading issues would not be the case. Surely the number of people with exactly the same issues, month after month, should point to something? Perchance illogic? Time I think to go off and write my own simple sql engine. People here are far too closed minded, and far too quick to attack others because they don't follow DRH's line of thought, and decisions. Good idea. Would you share your proposed architecture with us. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Roger Binns wrote: Im sorry for being so harsh, and i know im not winning any friends here, So far noone has agreed with you :-) This would be incorrect. The correct statement is "so far no one has vocally agreed with you". If people didn't agree, this whole once a month people having multi-threading issues would not be the case. Surely the number of people with exactly the same issues, month after month, should point to something? Perchance illogic? Time I think to go off and write my own simple sql engine. People here are far too closed minded, and far too quick to attack others because they don't follow DRH's line of thought, and decisions. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
> not to spark a debate Although the majority of this thread is as clear as mud, it is still interesting, even for simple VBA programmers like me that have no chance (maybe via a VB6 ActiveX exe) to use multi-threading. RBS > Emerson, one posts to a forum like this to get help and other ideas, not > to spark a debate. Many talented people gave you some of their time to > help you solve your problem and one in particular gave you a well > conceived and executed piece of software free of charge. Appreciate > their charity. > > If you have some insights which have escaped the rest of us, implement > them and present the results to the world and dazzle us. > > BTW, a major advantage of Open Source software is that you do not need > to have people explain it to you. You have the source and that explains > everything. That is particularly so with Sqlite, which is clearly > written and consequently the source reads like a book. A few minutes > with the source and grep and you have your answers. > > Emerson Clarke wrote: >> John, >> >> Um, alright then... >> >> But i think your preaching to the converted, simplifying things is >> what i always try to do. And not just when theres a problem >> >> If you followed the thread fully you would realise that there was >> never a problem with my design, though that didnt stop many people >> from chirping in and offering suggestions. >> >> The problem i had was with sqlite not being compatible with the simple >> design that i wanted. I did try several alternate designs, but only >> as a way of working around the problem i had with sqlite. It took a >> long time but eventually i managed to get someone to explain why >> sqlite had that particular problem, and i was able to modify the >> sqlite source to resolve the issue. >> >> Unfortunately no one has yet commented on my solution, or the problem >> which it addresses. Basically sqlite has thread safety checking >> routines which work a little like mutexe's. Every time you enter a >> bit of code which is potentially thread unsafe it sets a magic number, >> then resets it when it comes out. This is an attempt to detect when >> two threads are accessing the same bit of code at the same time. >> >> Clearly its not 100% reliable, and is subject to all kinds of thread >> races, but it does provide some measure of protection. Unfortunately >> though, the way it has been coded, an unbalanced safety check is >> performed in the sqlite3_step() function. >> >> This is equivalent to entering a mutex but never leaving, which causes >> deadlock in a multithreaded program. Only in this situation sqlite >> throws a misuse error any time two or more threads use sqlite3_step() >> at the same time, even if those threads are synchronised and perfectly >> safe. >> >> The easy solution is to disable the safety checks, the propper >> solution is to balance out the checks in sqlite3_step() so that users >> who actually wish to use sqlite in a multithreaded program are free to >> synchronise access to the api without error and there is still a >> reasonable level of safety checking for users who do not synchronise >> properly. >> >> >> Emerson >> >> On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: >> >>> Work on turning "reasonable" into "adequate" or "good" and it will help >>> you get an intuitive feel for the design of programs such as yours. >>> Then your programs will be simple, fast and robust, as Einstein >>> counselled - "Make it as simple a possible, but no simpler". >>> >>> I also suggest that you take Niklaus Wirth's advice and when you run >>> into a difficulty backtrack your work and scrap everything until you >>> reach a point where there are no problems and start again from that >>> point taking a different approach having learned a lesson from your >>> previous attempt. >>> >>> By the way, I doubt whether you are using a modern operating system, it >>> is most likely to be old technology like Windows or Linux. Neither >>> supports much in the way of parallelism. >>> >> >> - >> >> To unsubscribe, send email to [EMAIL PROTECTED] >> - >> >> > > > - > To unsubscribe, send email to [EMAIL PROTECTED] > - > > > - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
Can we please stop this thread? John Stanton wrote: Emerson, one posts to a forum like this to get help and other ideas, not to spark a debate. Many talented people gave you some of their time to help you solve your problem and one in particular gave you a well conceived and executed piece of software free of charge. Appreciate their charity. If you have some insights which have escaped the rest of us, implement them and present the results to the world and dazzle us. BTW, a major advantage of Open Source software is that you do not need to have people explain it to you. You have the source and that explains everything. That is particularly so with Sqlite, which is clearly written and consequently the source reads like a book. A few minutes with the source and grep and you have your answers. Emerson Clarke wrote: John, Um, alright then... But i think your preaching to the converted, simplifying things is what i always try to do. And not just when theres a problem If you followed the thread fully you would realise that there was never a problem with my design, though that didnt stop many people from chirping in and offering suggestions. The problem i had was with sqlite not being compatible with the simple design that i wanted. I did try several alternate designs, but only as a way of working around the problem i had with sqlite. It took a long time but eventually i managed to get someone to explain why sqlite had that particular problem, and i was able to modify the sqlite source to resolve the issue. Unfortunately no one has yet commented on my solution, or the problem which it addresses. Basically sqlite has thread safety checking routines which work a little like mutexe's. Every time you enter a bit of code which is potentially thread unsafe it sets a magic number, then resets it when it comes out. This is an attempt to detect when two threads are accessing the same bit of code at the same time. Clearly its not 100% reliable, and is subject to all kinds of thread races, but it does provide some measure of protection. Unfortunately though, the way it has been coded, an unbalanced safety check is performed in the sqlite3_step() function. This is equivalent to entering a mutex but never leaving, which causes deadlock in a multithreaded program. Only in this situation sqlite throws a misuse error any time two or more threads use sqlite3_step() at the same time, even if those threads are synchronised and perfectly safe. The easy solution is to disable the safety checks, the propper solution is to balance out the checks in sqlite3_step() so that users who actually wish to use sqlite in a multithreaded program are free to synchronise access to the api without error and there is still a reasonable level of safety checking for users who do not synchronise properly. Emerson On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
Emerson, one posts to a forum like this to get help and other ideas, not to spark a debate. Many talented people gave you some of their time to help you solve your problem and one in particular gave you a well conceived and executed piece of software free of charge. Appreciate their charity. If you have some insights which have escaped the rest of us, implement them and present the results to the world and dazzle us. BTW, a major advantage of Open Source software is that you do not need to have people explain it to you. You have the source and that explains everything. That is particularly so with Sqlite, which is clearly written and consequently the source reads like a book. A few minutes with the source and grep and you have your answers. Emerson Clarke wrote: John, Um, alright then... But i think your preaching to the converted, simplifying things is what i always try to do. And not just when theres a problem If you followed the thread fully you would realise that there was never a problem with my design, though that didnt stop many people from chirping in and offering suggestions. The problem i had was with sqlite not being compatible with the simple design that i wanted. I did try several alternate designs, but only as a way of working around the problem i had with sqlite. It took a long time but eventually i managed to get someone to explain why sqlite had that particular problem, and i was able to modify the sqlite source to resolve the issue. Unfortunately no one has yet commented on my solution, or the problem which it addresses. Basically sqlite has thread safety checking routines which work a little like mutexe's. Every time you enter a bit of code which is potentially thread unsafe it sets a magic number, then resets it when it comes out. This is an attempt to detect when two threads are accessing the same bit of code at the same time. Clearly its not 100% reliable, and is subject to all kinds of thread races, but it does provide some measure of protection. Unfortunately though, the way it has been coded, an unbalanced safety check is performed in the sqlite3_step() function. This is equivalent to entering a mutex but never leaving, which causes deadlock in a multithreaded program. Only in this situation sqlite throws a misuse error any time two or more threads use sqlite3_step() at the same time, even if those threads are synchronised and perfectly safe. The easy solution is to disable the safety checks, the propper solution is to balance out the checks in sqlite3_step() so that users who actually wish to use sqlite in a multithreaded program are free to synchronise access to the api without error and there is still a reasonable level of safety checking for users who do not synchronise properly. Emerson On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Emerson, I agree with you somewhat. Not 100% convinced but, I like you am a little dissapointed how sqlite handles "threadsafe" and multiple connections. Even in the "test_server.c" module is not "concurrent" As it serializes all processing to a single thread, this is not concurrent processing. Now if i could take that server thread and create a pool of workers and hand off the working set request to one of those, then this would be concurrent since each thread would have the capability to run independently of the other and at the same time on an SMP hardware system such as Solaris,Hpux etc... But this is specifically disallowed since the connection when using the shared cache must be used by the creating thread. So far I've not been able to successfully find any threadsafe examples of sqlite that actually work in a concurrent manner. Even the threadtest2.c crased (bug fix coming in 3.3.10) but even it does not start up a transaction and run multiple inserts/step commands. I would like to see a working example of how sqlite can be concurrently accessed via multiple threads performingWrite (insert/update/delete) and by other threads performing selects against a single table. IMHO, the sqlite3_step function if it can only be executed serially (ie must be wrapped in a mutex) should do this mutexing internally and not be exposed and left in the hands of the user. If one compiles the code with --enable-threadsafe, shouldn't the API be able to handle areas such as these internally and not generate "misuse" errors. Emerson Clarke <[EMAIL PROTECTED]> wrote: Roger, Of course you can test threading behaviour, yes its not exactly repeatable but under most circumstances and with enough test cases you can catch the problems. I don't think sqlite is such a large and complicated piece of software that it would be impossible to reproduce such errors. Everyone keeps saying its well documented and referring to the fact that the reasoning has already been done, but i havnt heard anything other than the usual "thats the way its always been" responses, or the "its designed that way beacuse its easier" argument. That does not count as documentation or reasoning. If anything the source code is the documentation, but thats not quite the same as an essay on the thread safety of the sqlite vm is it ? Anyway, i can see im flogging a dead horse here. This is as futile as arguing religion with a priest. :P Theres not much point forking any maintaining the code if on one else sees any validity in my arguments. Thanks to everyone anyway, Emerson On 1/5/07, Roger Binns wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Emerson Clarke wrote: > > I have to admit i am a little dissapointed. As the primary author of > > the software i would have thought that you would have a good > > understanding of what the thread safety characteristics of your own > > api were. > > He does! It is well documented and tested. *You* are the one who wants > to do various things differently. > > > Suggesting that suppressing the safety checks will result in random > > and non reproducable failures seems a little extreme, if not > > superstitious. > > Err, you are using threading. Explain exactly how if you got a failure > due to threading you could reproduce it exactly. You do know that you > can't run x86 processors in lockstep because at the cycle level they > have effectively random behaviour due to cache eviction and replacement > policies. That one reason for the Itanium if you need two processors to > have exactly the same behaviour on the same code at the cycle level. > > > As i was discussing with Nicholas, api's are not by default thread > > unsafe. In fact it is the other way around, most api's should be > > perfectly thread safe if access to them is synchronised. > > This all comes down to semantics. For trivial libraries, sure you can > just wrap a mutex around it all. For non-trivial libraries, best > practise is for the author to design and implement for whatever thread > usage they believe is best suited for the library and the platforms it > operates one. > > > Im sorry for being so harsh, and i know im not winning any friends > > here, > > So far noone has agreed with you :-) > > > I have tested the changes i made with some 50 threads performing > > probably thousands of statements a second, and had no errors. > > What does that prove? You cannot prove threading by running stuff and > saying "see it didn't crash". The only way to prove threading is by > reasoning about the design and verifying the implementation matches the > design. drh already did that years ago. > > > If you are willing to work through the problem, i am willing to assist > > you in any way that i can. Wether it be writing test cases or making > > modifications to the source. > > You can always maintain your own forked version of SQLite and link to it > from the wiki. > > Roger > -BEGIN PGP S
Re: [sqlite] sqlite performance, locking & threading
Roger, Of course you can test threading behaviour, yes its not exactly repeatable but under most circumstances and with enough test cases you can catch the problems. I don't think sqlite is such a large and complicated piece of software that it would be impossible to reproduce such errors. Everyone keeps saying its well documented and referring to the fact that the reasoning has already been done, but i havnt heard anything other than the usual "thats the way its always been" responses, or the "its designed that way beacuse its easier" argument. That does not count as documentation or reasoning. If anything the source code is the documentation, but thats not quite the same as an essay on the thread safety of the sqlite vm is it ? Anyway, i can see im flogging a dead horse here. This is as futile as arguing religion with a priest. :P Theres not much point forking any maintaining the code if on one else sees any validity in my arguments. Thanks to everyone anyway, Emerson On 1/5/07, Roger Binns <[EMAIL PROTECTED]> wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: > I have to admit i am a little dissapointed. As the primary author of > the software i would have thought that you would have a good > understanding of what the thread safety characteristics of your own > api were. He does! It is well documented and tested. *You* are the one who wants to do various things differently. > Suggesting that suppressing the safety checks will result in random > and non reproducable failures seems a little extreme, if not > superstitious. Err, you are using threading. Explain exactly how if you got a failure due to threading you could reproduce it exactly. You do know that you can't run x86 processors in lockstep because at the cycle level they have effectively random behaviour due to cache eviction and replacement policies. That one reason for the Itanium if you need two processors to have exactly the same behaviour on the same code at the cycle level. > As i was discussing with Nicholas, api's are not by default thread > unsafe. In fact it is the other way around, most api's should be > perfectly thread safe if access to them is synchronised. This all comes down to semantics. For trivial libraries, sure you can just wrap a mutex around it all. For non-trivial libraries, best practise is for the author to design and implement for whatever thread usage they believe is best suited for the library and the platforms it operates one. > Im sorry for being so harsh, and i know im not winning any friends > here, So far noone has agreed with you :-) > I have tested the changes i made with some 50 threads performing > probably thousands of statements a second, and had no errors. What does that prove? You cannot prove threading by running stuff and saying "see it didn't crash". The only way to prove threading is by reasoning about the design and verifying the implementation matches the design. drh already did that years ago. > If you are willing to work through the problem, i am willing to assist > you in any way that i can. Wether it be writing test cases or making > modifications to the source. You can always maintain your own forked version of SQLite and link to it from the wiki. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFnnDQmOOfHg372QQRAtz1AJwIzJbrlnH0xImPSJYzvutAtMomVACfSltc KaSDdgBuwW4ITN1UWms5DMI= =EqF9 -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: > I have to admit i am a little dissapointed. As the primary author of > the software i would have thought that you would have a good > understanding of what the thread safety characteristics of your own > api were. He does! It is well documented and tested. *You* are the one who wants to do various things differently. > Suggesting that suppressing the safety checks will result in random > and non reproducable failures seems a little extreme, if not > superstitious. Err, you are using threading. Explain exactly how if you got a failure due to threading you could reproduce it exactly. You do know that you can't run x86 processors in lockstep because at the cycle level they have effectively random behaviour due to cache eviction and replacement policies. That one reason for the Itanium if you need two processors to have exactly the same behaviour on the same code at the cycle level. > As i was discussing with Nicholas, api's are not by default thread > unsafe. In fact it is the other way around, most api's should be > perfectly thread safe if access to them is synchronised. This all comes down to semantics. For trivial libraries, sure you can just wrap a mutex around it all. For non-trivial libraries, best practise is for the author to design and implement for whatever thread usage they believe is best suited for the library and the platforms it operates one. > Im sorry for being so harsh, and i know im not winning any friends > here, So far noone has agreed with you :-) > I have tested the changes i made with some 50 threads performing > probably thousands of statements a second, and had no errors. What does that prove? You cannot prove threading by running stuff and saying "see it didn't crash". The only way to prove threading is by reasoning about the design and verifying the implementation matches the design. drh already did that years ago. > If you are willing to work through the problem, i am willing to assist > you in any way that i can. Wether it be writing test cases or making > modifications to the source. You can always maintain your own forked version of SQLite and link to it from the wiki. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFnnDQmOOfHg372QQRAtz1AJwIzJbrlnH0xImPSJYzvutAtMomVACfSltc KaSDdgBuwW4ITN1UWms5DMI= =EqF9 -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, I have to admit i am a little dissapointed. As the primary author of the software i would have thought that you would have a good understanding of what the thread safety characteristics of your own api were. Suggesting that suppressing the safety checks will result in random and non reproducable failures seems a little extreme, if not superstitious. As i was discussing with Nicholas, api's are not by default thread unsafe. In fact it is the other way around, most api's should be perfectly thread safe if access to them is synchronised. It is my undestanding that sqlite3_stmt * represents an instance of the VM, and that most of the internal virtual machine operations are reentrant, simply operating on the supplied VM and modifying its internal state. If this is the case, i cant see how you would ever have thread safety problems so long as no two threads are modifying the same sqlite3_stmt * at the same time. Surely this is something that can be reasoned through ? Adding deliberate safety check imbalances to sqlite3_step becuase it is the only routine that can be called mutliple times in sequence is a bit extreme. Why not allow users who don't believe in withces and goblins go ahead and use the api in a synchronised way. The safety checks will still be there, just not as agressively. You should also considder that the safety checks only provide a warning for the user. Given that they are subject to race conditions, you can never guarantee that the safety checks will trap all errors. In fact i would argue that you are better off throwing an int 3 instruction than returning a misuse error. Giving the programmer the false idea that they can continue to run their program without stack corruption is not a good idea, after all its just encouraging them to try again. Rather, according to what you are saying they should be re-writing their code to access the api with a single thread. So on the one hand you are actively disallowing users like myself from synchronising thread access to the api, but on the other hand encouraging other users to not fix the actual problems. It doesnt make sense... Im sorry for being so harsh, and i know im not winning any friends here, but i dont give up easily. I think sqlite is an excellent piece of software and i believe there is a way which it can be made to accomodate these concerns. I have tested the changes i made with some 50 threads performing probably thousands of statements a second, and had no errors. If you are willing to work through the problem, i am willing to assist you in any way that i can. Wether it be writing test cases or making modifications to the source. If there are problems as you say, then there is no reason why they cant alteast be documented and verified. Emerson On 1/5/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > The problem i had was with sqlite not being compatible with the simple > design that i wanted. I did try several alternate designs, but only > as a way of working around the problem i had with sqlite. It took a > long time but eventually i managed to get someone to explain why > sqlite had that particular problem, and i was able to modify the > sqlite source to resolve the issue. > I remain unconvinced that there is any problem with SQLite in regard to thread safety. It is my belief that by disabling the safety check routines, you are opening your self up to lots of problems. Those routines are there to protect you, the programmer, and to make your programming errors obvious. By disabling those checks, you have not fixed the problem. You have merely suppressed the symptoms so that you will get rare, random failures that cannot be easily reproduced. On your own head be it. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > The problem i had was with sqlite not being compatible with the simple > design that i wanted. I did try several alternate designs, but only > as a way of working around the problem i had with sqlite. It took a > long time but eventually i managed to get someone to explain why > sqlite had that particular problem, and i was able to modify the > sqlite source to resolve the issue. > I remain unconvinced that there is any problem with SQLite in regard to thread safety. It is my belief that by disabling the safety check routines, you are opening your self up to lots of problems. Those routines are there to protect you, the programmer, and to make your programming errors obvious. By disabling those checks, you have not fixed the problem. You have merely suppressed the symptoms so that you will get rare, random failures that cannot be easily reproduced. On your own head be it. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
John, Um, alright then... But i think your preaching to the converted, simplifying things is what i always try to do. And not just when theres a problem If you followed the thread fully you would realise that there was never a problem with my design, though that didnt stop many people from chirping in and offering suggestions. The problem i had was with sqlite not being compatible with the simple design that i wanted. I did try several alternate designs, but only as a way of working around the problem i had with sqlite. It took a long time but eventually i managed to get someone to explain why sqlite had that particular problem, and i was able to modify the sqlite source to resolve the issue. Unfortunately no one has yet commented on my solution, or the problem which it addresses. Basically sqlite has thread safety checking routines which work a little like mutexe's. Every time you enter a bit of code which is potentially thread unsafe it sets a magic number, then resets it when it comes out. This is an attempt to detect when two threads are accessing the same bit of code at the same time. Clearly its not 100% reliable, and is subject to all kinds of thread races, but it does provide some measure of protection. Unfortunately though, the way it has been coded, an unbalanced safety check is performed in the sqlite3_step() function. This is equivalent to entering a mutex but never leaving, which causes deadlock in a multithreaded program. Only in this situation sqlite throws a misuse error any time two or more threads use sqlite3_step() at the same time, even if those threads are synchronised and perfectly safe. The easy solution is to disable the safety checks, the propper solution is to balance out the checks in sqlite3_step() so that users who actually wish to use sqlite in a multithreaded program are free to synchronise access to the api without error and there is still a reasonable level of safety checking for users who do not synchronise properly. Emerson On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. Emerson Clarke wrote: John, I have a reasonable understanding of the PC architecture, and more appropriately the architecture which the operating system presents to software. The PC may be a serial device, but a modern operating system with its multitasking shcheduler attempts to emulate a non serial environment. It devotes a certain amount of time to each thread and then moves on. Wether or not you are working on a highly pyshically parallel architecture or not makes no difference, the design principles are the same. You should still build your software so that it is capable of taking advantage of the environment that the operating system presents. It is the operating system you should be designing for, not the hardware... As it happens, the newest generation of PC's are all multi-core, and i have been working on multi processor environments for many years. Emerson On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote: If Emeroson intuitively understood the essential architecture of the PC he is using he would not be having difficulty with his concept of how to use it. It is essentially a serial device, multi-tasking device and parallelism in the forms of threading and multi processing is a sophistication added with a high overhead. I recollect an insightful CS professor impressing on his his students the concept by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller. A machine designed from first principles to manage parrallel processing would be very different. Michael Ruck wrote: > Hi Emerson, > > I just hope you don't reinvent the wheel ;) I haven't yet had the need to > index things the way you describe it. May be I should take that as one of my > next pet projects to get a handle on this type of task. > > The problem as I see it is basically, that any way you design this: If the > storage tasks take 90% of your indexing time, then any parallelization may > be a waste of effort. Even if you use a synchronization object you're > essentially serializing things in a (complicated) multithreaded way... > > As far as static initialization: That it occurs before main() and is out of > your control was the point I was getting across. That's why I wrote that > this type of initialization should be avoided, unless there's no better > design for it. > > Michael > > -Ursprüngliche Nachricht- > Von: Emerson Clarke [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 3. Januar 2007 20:31 > An: sqlite-users@sqlite.org > Betreff: Re: [sqlite] sqlite performance, locking & threading > > Michael, > > Thanks for the advice. During the indexing process i need to select and > optionally insert records into a table so i cant ignore the outcomes. > > Basically the indexing process does compression, so for each document it > inserts words into a table and looks up keys. Every word in the document > gets swapped with a key, and new keys are inserted as needed. > > There are some problems with splitting the work up in a different way as you > suggested. I would either end up with a lot of queues or i would have to > stagger the work so that the entire data set gets processed in stages which > doesnt scale very well and isnt particularly fault tollerant. When building > an index, you want the structure to be built up progressively, so that you > can pause the process and resume it later on whilst still having useful > results. > > I would be worried that in a queued design, the overhead and bottlenecks > caused by the buffering, message passing, and context switching would reduce > the performance to that of a single thread. > Especially since the database operations represent 90% of the work, all you > would really be doing is attempting to serialise things in a multithreaded > way. > > Im sure having worked on multithreaded systems you appreciate that sometimes > simple designs are better, and i think i have a pretty good handle on what > i
Re: AW: [sqlite] sqlite performance, locking & threading
John, I have a reasonable understanding of the PC architecture, and more appropriately the architecture which the operating system presents to software. The PC may be a serial device, but a modern operating system with its multitasking shcheduler attempts to emulate a non serial environment. It devotes a certain amount of time to each thread and then moves on. Wether or not you are working on a highly pyshically parallel architecture or not makes no difference, the design principles are the same. You should still build your software so that it is capable of taking advantage of the environment that the operating system presents. It is the operating system you should be designing for, not the hardware... As it happens, the newest generation of PC's are all multi-core, and i have been working on multi processor environments for many years. Emerson On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote: If Emeroson intuitively understood the essential architecture of the PC he is using he would not be having difficulty with his concept of how to use it. It is essentially a serial device, multi-tasking device and parallelism in the forms of threading and multi processing is a sophistication added with a high overhead. I recollect an insightful CS professor impressing on his his students the concept by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller. A machine designed from first principles to manage parrallel processing would be very different. Michael Ruck wrote: > Hi Emerson, > > I just hope you don't reinvent the wheel ;) I haven't yet had the need to > index things the way you describe it. May be I should take that as one of my > next pet projects to get a handle on this type of task. > > The problem as I see it is basically, that any way you design this: If the > storage tasks take 90% of your indexing time, then any parallelization may > be a waste of effort. Even if you use a synchronization object you're > essentially serializing things in a (complicated) multithreaded way... > > As far as static initialization: That it occurs before main() and is out of > your control was the point I was getting across. That's why I wrote that > this type of initialization should be avoided, unless there's no better > design for it. > > Michael > > -Ursprüngliche Nachricht- > Von: Emerson Clarke [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 3. Januar 2007 20:31 > An: sqlite-users@sqlite.org > Betreff: Re: [sqlite] sqlite performance, locking & threading > > Michael, > > Thanks for the advice. During the indexing process i need to select and > optionally insert records into a table so i cant ignore the outcomes. > > Basically the indexing process does compression, so for each document it > inserts words into a table and looks up keys. Every word in the document > gets swapped with a key, and new keys are inserted as needed. > > There are some problems with splitting the work up in a different way as you > suggested. I would either end up with a lot of queues or i would have to > stagger the work so that the entire data set gets processed in stages which > doesnt scale very well and isnt particularly fault tollerant. When building > an index, you want the structure to be built up progressively, so that you > can pause the process and resume it later on whilst still having useful > results. > > I would be worried that in a queued design, the overhead and bottlenecks > caused by the buffering, message passing, and context switching would reduce > the performance to that of a single thread. > Especially since the database operations represent 90% of the work, all you > would really be doing is attempting to serialise things in a multithreaded > way. > > Im sure having worked on multithreaded systems you appreciate that sometimes > simple designs are better, and i think i have a pretty good handle on what > it is that im trying to do. > > You never have control over static initialisation, it happens before main(). > If i was writing very specific code to suit just this situation then maybe > as you say i wouldnt need to worry about it. But im also writing a database > api, and that api is used for many different things. My considderations are > not just for this one problem, but also for the best general way to code the > api so that it is safe and efficient in all circumstances. So far the > client/server design is the only way i can achieve true thread safety. > > If i could work out why sqlite3_step() causes problems across multiple > threads i could probably make things a little faster and i could do away > with the need for a client/server design. > > Emerson > > > On 1/3/07, Michael
Re: [sqlite] sqlite performance, locking & threading
Bill, Thanks for the description, thats pretty much how i designed the index, but with a few modifications. The filesystem becomes the tree structure which is indexed by a hash of the original document url. It works like a big hashtable so its quite scalable. > Sorry if this has been posited before, but our solution to this was data normalisation. IE, we store only the filename, and an integer ID which is the directory of the file, and of course a lookup table of id's to directory names. This helped us greatly, as well as reducing overhead of amount of data stored. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
If Emeroson intuitively understood the essential architecture of the PC he is using he would not be having difficulty with his concept of how to use it. It is essentially a serial device, multi-tasking device and parallelism in the forms of threading and multi processing is a sophistication added with a high overhead. I recollect an insightful CS professor impressing on his his students the concept by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller. A machine designed from first principles to manage parrallel processing would be very different. Michael Ruck wrote: Hi Emerson, I just hope you don't reinvent the wheel ;) I haven't yet had the need to index things the way you describe it. May be I should take that as one of my next pet projects to get a handle on this type of task. The problem as I see it is basically, that any way you design this: If the storage tasks take 90% of your indexing time, then any parallelization may be a waste of effort. Even if you use a synchronization object you're essentially serializing things in a (complicated) multithreaded way... As far as static initialization: That it occurs before main() and is out of your control was the point I was getting across. That's why I wrote that this type of initialization should be avoided, unless there's no better design for it. Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 20:31 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Michael, Thanks for the advice. During the indexing process i need to select and optionally insert records into a table so i cant ignore the outcomes. Basically the indexing process does compression, so for each document it inserts words into a table and looks up keys. Every word in the document gets swapped with a key, and new keys are inserted as needed. There are some problems with splitting the work up in a different way as you suggested. I would either end up with a lot of queues or i would have to stagger the work so that the entire data set gets processed in stages which doesnt scale very well and isnt particularly fault tollerant. When building an index, you want the structure to be built up progressively, so that you can pause the process and resume it later on whilst still having useful results. I would be worried that in a queued design, the overhead and bottlenecks caused by the buffering, message passing, and context switching would reduce the performance to that of a single thread. Especially since the database operations represent 90% of the work, all you would really be doing is attempting to serialise things in a multithreaded way. Im sure having worked on multithreaded systems you appreciate that sometimes simple designs are better, and i think i have a pretty good handle on what it is that im trying to do. You never have control over static initialisation, it happens before main(). If i was writing very specific code to suit just this situation then maybe as you say i wouldnt need to worry about it. But im also writing a database api, and that api is used for many different things. My considderations are not just for this one problem, but also for the best general way to code the api so that it is safe and efficient in all circumstances. So far the client/server design is the only way i can achieve true thread safety. If i could work out why sqlite3_step() causes problems across multiple threads i could probably make things a little faster and i could do away with the need for a client/server design. Emerson On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: Emerson, Now I understand your current implementation. You seemingly only partially split up the work in your code. I'd schedule the database operation and not wait on the outcome, but start on the next task. When the database finishes and has retrieved its result, schedule some work package on a third thread, which only processes the results etc. Split up the work in to repetitive, non blocking tasks. Use multiple queues and dedicated threads for parts of the operation or thread pools, which process queues in parallel if possible. From what I can tell you're already half way there. I still don't see your static initialization problem, but that's another story. Actually I'd avoid using static initialization or static (singleton) instances, unless the design really requires it. Someone must control startup of the entire process, have that one (probably main/WinMain) take care that the work queues are available. Afterwards the order of thread starts doesn't matter... Actually it is non-deterministic anyway (unless you serialize this yourself.) Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Ges
Re: [sqlite] sqlite performance, locking & threading
Emerson Clarke wrote: The indexing process works like this. 1.) Open a document and parse its contents. 2.) Look up records in the first database based on the contents of the document, updating records where appropriate and inserting new ones. 3.) Transforming the document based on what was obtained from the first database. 4.) Create a filesystem structure in the form of a folder and two or more files. 5.) Look up some more records in a second database, updating and inserting as necessary. For architectural reasons the above steps must be performed in that order. This means that operations cant be separated or queued up in the way that you suggested. Each step is dependent on the previous step. But by having multiple threads and using synchronisation around the database operations you can ensure that multiple database operations are always pending. Each thread will be at a different stage in the pipeline, but a few will always be ready to perform a database operation so the idea is to maximise the throughput. As you said, the more rows per transaction, the more rows per second. Which brings us back to the original issue. Why cant i have multiple threads all using the same connection within a single transaction ? Of course i know the simple answer, which is that the current api does not support this. But im wondering why, and if there are any other ways to achieve the desired performance. Emerson On 12/28/06, Roger Binns <[EMAIL PROTECTED]> wrote: Emerson Clarke wrote: > The idea is that because i am accessing two databases, and doing > several file system operations per document, there should be a large > gain by using many threads. There is no actual indexing process, the > whole structure is the index, but if anything the database operations > take the most time. The filesystem operations have a very small > amount of overhead. That is all unclear from your original description. Aren't you trying to "index" several million documents and doesn't the process of indexing consist of two parts? 1: Open the document, parse it in various ways, build index data, close it 2: Add a row to a SQLite database My point was that #1 is way more work than #2, so you can run #1's in multiple threads/processes and do #2 in a single thread using a queue/pipe object for communication. On the other hand, if #1 is way less work than #2 then you will be bound by the speed at which you decide to make transactions in SQLite. A 7200 rpm disk limits you to 60 transactions a second. The more rows per transaction, the more rows per second. Roger - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - Sorry if this has been posited before, but our solution to this was data normalisation. IE, we store only the filename, and an integer ID which is the directory of the file, and of course a lookup table of id's to directory names. This helped us greatly, as well as reducing overhead of amount of data stored. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Nicholas, Thanks again for your patience, i think were getting somewhere. As an aside, can i just say that i dont understand why concurrency is always such a point of confusion. I think it should be a basic assumption that when people talk about accessing api's and data structures in a multithreaded environment they talk about doing it in a synchronised way. Unless otherwise specified, concurrent modification of any data structure which involves state is always unsafe. Anyway, i have removed the misuse detection. My application now runs fine without crashing, and i can use the multiple thread, single connection, single transaction design so it is about 25% faster than the other alternatives i have tried. I simply commented out the code in the sqlite3SafetyOn() and sqlite3SafetyOff() methods and had them both return 0. Assuming i didnt just get lucky, this means that the safety functions were responsible for triggering the misuse i was seeing and that this is probably due to the safety functions themselves not being used in all the necessary places to correctly track the state even under synchronised concurrency. If it really is the case that there is no reason why sqlite should be thread unsafe when synchronised correctly, then why are those safety functions there ? If they are there to detect circumstances where the api is used and not synchronised then i would suggest that they are doing more harm than good. As i dont know the code well i cant be certain that what i have done is enough to make it completely error free. Perhaps someone who is familiar with the safety functions can comment more on this ? Emerson On 1/4/07, Nicolas Williams <[EMAIL PROTECTED]> wrote: On Thu, Jan 04, 2007 at 12:50:01AM +, Emerson Clarke wrote: > My oppologies, your right that explanation had been given. OK. > But i didnt actually take it seriously, i guess i found it hard to > believe that it being the easier option was the only reason why this > limitation was in place. SQLite is a large pile of code. Other libraries that I'm familiar with that have taken this approach are larger still. Retrofitting MT-safety into these is hard, so the easiest path is often taken. (It may be that SQLite was always intended to be MT-safe, but I don't know that for a fact.) > If this is the case, then surely the fix is simple. Given that i > assume it is safe to have multiple sqlite3_step() calls active on a > single connection on a single thread. And given what you have said > about sqlite not already checking data structures that would be shared > by multiple threads, then surely all that needs to happen is for the > misuse detection to be removed. Your first assumption, as has been explained repeatedly, is incorrect. Oh, wait. I think I understand what's happening. You've missunderstood what you've been told (your next paragraph makes me think so). You *can* use sqlite3_step() with the same db context in multiple threads, you just have to synchronize so this doesn't happen *concurrently*. If you remove the misuse detection but don't synchronize I believe you'll find that your application will crash or worse. > Since there is usually nothing which needs to be done to specifically > make any api thread safe other than synchronising access too it. If > by synchronising access to the api calls i can ensure that no two > threads use any data structure at the same time, everything should > work fine right ? Yes. Nico -- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
On Thu, Jan 04, 2007 at 12:50:01AM +, Emerson Clarke wrote: > My oppologies, your right that explanation had been given. OK. > But i didnt actually take it seriously, i guess i found it hard to > believe that it being the easier option was the only reason why this > limitation was in place. SQLite is a large pile of code. Other libraries that I'm familiar with that have taken this approach are larger still. Retrofitting MT-safety into these is hard, so the easiest path is often taken. (It may be that SQLite was always intended to be MT-safe, but I don't know that for a fact.) > If this is the case, then surely the fix is simple. Given that i > assume it is safe to have multiple sqlite3_step() calls active on a > single connection on a single thread. And given what you have said > about sqlite not already checking data structures that would be shared > by multiple threads, then surely all that needs to happen is for the > misuse detection to be removed. Your first assumption, as has been explained repeatedly, is incorrect. Oh, wait. I think I understand what's happening. You've missunderstood what you've been told (your next paragraph makes me think so). You *can* use sqlite3_step() with the same db context in multiple threads, you just have to synchronize so this doesn't happen *concurrently*. If you remove the misuse detection but don't synchronize I believe you'll find that your application will crash or worse. > Since there is usually nothing which needs to be done to specifically > make any api thread safe other than synchronising access too it. If > by synchronising access to the api calls i can ensure that no two > threads use any data structure at the same time, everything should > work fine right ? Yes. Nico -- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Nicholas, My oppologies, your right that explanation had been given. But i didnt actually take it seriously, i guess i found it hard to believe that it being the easier option was the only reason why this limitation was in place. If this is the case, then surely the fix is simple. Given that i assume it is safe to have multiple sqlite3_step() calls active on a single connection on a single thread. And given what you have said about sqlite not already checking data structures that would be shared by multiple threads, then surely all that needs to happen is for the misuse detection to be removed. Since there is usually nothing which needs to be done to specifically make any api thread safe other than synchronising access too it. If by synchronising access to the api calls i can ensure that no two threads use any data structure at the same time, everything should work fine right ? This gets to the very core of the original issue i had, and why i said sqlite was "actively" thread unsafe. Because it terminates itself with misuse errors even when in theory it is perfectly safe to be used with multiple threads provided the user synchronises on every api call, which is easily achieved. Am i making any sense ? Emerson On 1/3/07, Nicolas Williams <[EMAIL PROTECTED]> wrote: On Wed, Jan 03, 2007 at 11:22:36PM +, Emerson Clarke wrote: > Ok, > > Well can you do me a favour and forward me the email where this was > supposedly explained in all its technical glory. Technical glory? No. At a high level it's this: that it is easier to make an API like SQLite's thread-safe with exceptions like "only one thread active in any given context object at any time" than it is to put mutexes and what not around data structures that would be shared by multiple threads if this exception were not stated. Unless and until you try to do it the other way you'll find this rationale to be subjective. Like I said, good luck. - To unsubscribe, send email to [EMAIL PROTECTED] -
AW: [sqlite] sqlite performance, locking & threading
Hi Emerson, I just hope you don't reinvent the wheel ;) I haven't yet had the need to index things the way you describe it. May be I should take that as one of my next pet projects to get a handle on this type of task. The problem as I see it is basically, that any way you design this: If the storage tasks take 90% of your indexing time, then any parallelization may be a waste of effort. Even if you use a synchronization object you're essentially serializing things in a (complicated) multithreaded way... As far as static initialization: That it occurs before main() and is out of your control was the point I was getting across. That's why I wrote that this type of initialization should be avoided, unless there's no better design for it. Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 20:31 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Michael, Thanks for the advice. During the indexing process i need to select and optionally insert records into a table so i cant ignore the outcomes. Basically the indexing process does compression, so for each document it inserts words into a table and looks up keys. Every word in the document gets swapped with a key, and new keys are inserted as needed. There are some problems with splitting the work up in a different way as you suggested. I would either end up with a lot of queues or i would have to stagger the work so that the entire data set gets processed in stages which doesnt scale very well and isnt particularly fault tollerant. When building an index, you want the structure to be built up progressively, so that you can pause the process and resume it later on whilst still having useful results. I would be worried that in a queued design, the overhead and bottlenecks caused by the buffering, message passing, and context switching would reduce the performance to that of a single thread. Especially since the database operations represent 90% of the work, all you would really be doing is attempting to serialise things in a multithreaded way. Im sure having worked on multithreaded systems you appreciate that sometimes simple designs are better, and i think i have a pretty good handle on what it is that im trying to do. You never have control over static initialisation, it happens before main(). If i was writing very specific code to suit just this situation then maybe as you say i wouldnt need to worry about it. But im also writing a database api, and that api is used for many different things. My considderations are not just for this one problem, but also for the best general way to code the api so that it is safe and efficient in all circumstances. So far the client/server design is the only way i can achieve true thread safety. If i could work out why sqlite3_step() causes problems across multiple threads i could probably make things a little faster and i could do away with the need for a client/server design. Emerson On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: > Emerson, > > Now I understand your current implementation. You seemingly only > partially split up the work in your code. I'd schedule the database > operation and not wait on the outcome, but start on the next task. > When the database finishes and has retrieved its result, schedule some > work package on a third thread, which only processes the results etc. > Split up the work in to repetitive, non blocking tasks. Use multiple > queues and dedicated threads for parts of the operation or thread pools, which process queues in parallel if possible. > From what I can tell you're already half way there. > > I still don't see your static initialization problem, but that's > another story. Actually I'd avoid using static initialization or > static (singleton) instances, unless the design really requires it. > Someone must control startup of the entire process, have that one > (probably main/WinMain) take care that the work queues are available. > Afterwards the order of thread starts doesn't matter... Actually it is > non-deterministic anyway (unless you serialize this yourself.) > > Michael > > -Ursprüngliche Nachricht- > Von: Emerson Clarke [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 3. Januar 2007 15:14 > An: sqlite-users@sqlite.org > Betreff: Re: [sqlite] sqlite performance, locking & threading > > Michael, > > Im not sure that atomic operations would be a suitable alternative. > The reason why im using events/conditions is so that the client thread > blocks until the server thread has processed the query and returned > the result. If i did not need the result then a simple queueing > system with atomic operations or critical sections would be fine i guess. > >
Re: [sqlite] sqlite performance, locking & threading
Michael, Thanks for the advice. During the indexing process i need to select and optionally insert records into a table so i cant ignore the outcomes. Basically the indexing process does compression, so for each document it inserts words into a table and looks up keys. Every word in the document gets swapped with a key, and new keys are inserted as needed. There are some problems with splitting the work up in a different way as you suggested. I would either end up with a lot of queues or i would have to stagger the work so that the entire data set gets processed in stages which doesnt scale very well and isnt particularly fault tollerant. When building an index, you want the structure to be built up progressively, so that you can pause the process and resume it later on whilst still having useful results. I would be worried that in a queued design, the overhead and bottlenecks caused by the buffering, message passing, and context switching would reduce the performance to that of a single thread. Especially since the database operations represent 90% of the work, all you would really be doing is attempting to serialise things in a multithreaded way. Im sure having worked on multithreaded systems you appreciate that sometimes simple designs are better, and i think i have a pretty good handle on what it is that im trying to do. You never have control over static initialisation, it happens before main(). If i was writing very specific code to suit just this situation then maybe as you say i wouldnt need to worry about it. But im also writing a database api, and that api is used for many different things. My considderations are not just for this one problem, but also for the best general way to code the api so that it is safe and efficient in all circumstances. So far the client/server design is the only way i can achieve true thread safety. If i could work out why sqlite3_step() causes problems across multiple threads i could probably make things a little faster and i could do away with the need for a client/server design. Emerson On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: Emerson, Now I understand your current implementation. You seemingly only partially split up the work in your code. I'd schedule the database operation and not wait on the outcome, but start on the next task. When the database finishes and has retrieved its result, schedule some work package on a third thread, which only processes the results etc. Split up the work in to repetitive, non blocking tasks. Use multiple queues and dedicated threads for parts of the operation or thread pools, which process queues in parallel if possible. From what I can tell you're already half way there. I still don't see your static initialization problem, but that's another story. Actually I'd avoid using static initialization or static (singleton) instances, unless the design really requires it. Someone must control startup of the entire process, have that one (probably main/WinMain) take care that the work queues are available. Afterwards the order of thread starts doesn't matter... Actually it is non-deterministic anyway (unless you serialize this yourself.) Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 15:14 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Michael, Im not sure that atomic operations would be a suitable alternative. The reason why im using events/conditions is so that the client thread blocks until the server thread has processed the query and returned the result. If i did not need the result then a simple queueing system with atomic operations or critical sections would be fine i guess. The client thread must always block or spin until the server thread has completed the query. Critical sections cant be efficiently used to notify other threads of status change. I did try using critical sections in this way, by spinning until the server thread takes a lock, then blocking and eventually waiting for the server thread to finish. But since there is no way to block the server thread when there is no work to do both the client and server thread must sleep which induces context switching anyway. If you used atomic operations, how would you get the client thread to block and the server thread to block when it is not processing ? Events/conditions seemed to be the best solution, the server thread never runs when it doesnt need to and always wakes up when there is processing to be done. The static initialisation problem occurs becuase the server thread must be running before anything which needs to use it. If you have a static instance of a class which accesses a database and it is initalised before the static instance which controls the server thread, you have a problem. It can be overcome using the initialise on first use idiom, as long as your c
Re: [sqlite] sqlite performance, locking & threading
On Tue, Jan 02, 2007 at 11:56:42PM +, Emerson Clarke wrote: > The single connection multiple thread alternative apparently has > problems with sqlite3_step being active on more than one thread at the > same moment, so cannot easily be used in a safe way. But it is by far > the fastest and simplest alternative. No, not "apparently" -- it _does_. What you should do is keep a set of db contexts and assign them to clients/connections and make sure that each clients/connections is only every serviced by one thread at a time. One way to do this is to classify incoming messages, select an existing object representing that client/connection/whatever or create a new one, then queue the new message in that object and queue this object up for dispatch to a worker thread. Cheers, Nico -- - To unsubscribe, send email to [EMAIL PROTECTED] -
AW: [sqlite] sqlite performance, locking & threading
Emerson, Now I understand your current implementation. You seemingly only partially split up the work in your code. I'd schedule the database operation and not wait on the outcome, but start on the next task. When the database finishes and has retrieved its result, schedule some work package on a third thread, which only processes the results etc. Split up the work in to repetitive, non blocking tasks. Use multiple queues and dedicated threads for parts of the operation or thread pools, which process queues in parallel if possible. >From what I can tell you're already half way there. I still don't see your static initialization problem, but that's another story. Actually I'd avoid using static initialization or static (singleton) instances, unless the design really requires it. Someone must control startup of the entire process, have that one (probably main/WinMain) take care that the work queues are available. Afterwards the order of thread starts doesn't matter... Actually it is non-deterministic anyway (unless you serialize this yourself.) Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 15:14 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Michael, Im not sure that atomic operations would be a suitable alternative. The reason why im using events/conditions is so that the client thread blocks until the server thread has processed the query and returned the result. If i did not need the result then a simple queueing system with atomic operations or critical sections would be fine i guess. The client thread must always block or spin until the server thread has completed the query. Critical sections cant be efficiently used to notify other threads of status change. I did try using critical sections in this way, by spinning until the server thread takes a lock, then blocking and eventually waiting for the server thread to finish. But since there is no way to block the server thread when there is no work to do both the client and server thread must sleep which induces context switching anyway. If you used atomic operations, how would you get the client thread to block and the server thread to block when it is not processing ? Events/conditions seemed to be the best solution, the server thread never runs when it doesnt need to and always wakes up when there is processing to be done. The static initialisation problem occurs becuase the server thread must be running before anything which needs to use it. If you have a static instance of a class which accesses a database and it is initalised before the static instance which controls the server thread, you have a problem. It can be overcome using the initialise on first use idiom, as long as your careful to protect the initalisation with atomic operations, but its still a bit complicated. Emerson On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: > Hi Emerson, > > Another remark: On Windows using Events synchronization objects > involves additional kernel context switches and thus slows you down > more than necessary. I'd suggest using a queue, which makes use of the > InterlockedXXX operations (I've implemented a number of those, > including priority based ones - so this is possible without taking a > single lock.) or to use critical sections - those only take the kernel > context switch if there really is lock contention. If you can reduce > the kernel context switches, you're performance will likely increase drastically. > > I also don't see the static initialization problem: The queue has to > be available before any thread is started. No thread has ownership of > the queue, except may be the main thread. > > Michael > > > -Ursprüngliche Nachricht- > Von: Emerson Clarke [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 3. Januar 2007 00:57 > An: sqlite-users@sqlite.org > Betreff: Re: [sqlite] sqlite performance, locking & threading > > Nico, > > I have implemented all three strategies (thead specific connections, > single connection multiple threads, and single thread server with > multiple client threads). > > The problem with using thread specific contexts is that you cant have > a single global transaction which wraps all of those contexts. So you > end up having to use fine grained transactions, which decreases performance. > > The single connection multiple thread alternative apparently has > problems with sqlite3_step being active on more than one thread at the > same moment, so cannot easily be used in a safe way. But it is by far > the fastest and simplest alternative. > > The single thread server solution involves message passing between > threads, and even when this is done optimally with condition
Re: [sqlite] sqlite performance, locking & threading
Michael, Im not sure that atomic operations would be a suitable alternative. The reason why im using events/conditions is so that the client thread blocks until the server thread has processed the query and returned the result. If i did not need the result then a simple queueing system with atomic operations or critical sections would be fine i guess. The client thread must always block or spin until the server thread has completed the query. Critical sections cant be efficiently used to notify other threads of status change. I did try using critical sections in this way, by spinning until the server thread takes a lock, then blocking and eventually waiting for the server thread to finish. But since there is no way to block the server thread when there is no work to do both the client and server thread must sleep which induces context switching anyway. If you used atomic operations, how would you get the client thread to block and the server thread to block when it is not processing ? Events/conditions seemed to be the best solution, the server thread never runs when it doesnt need to and always wakes up when there is processing to be done. The static initialisation problem occurs becuase the server thread must be running before anything which needs to use it. If you have a static instance of a class which accesses a database and it is initalised before the static instance which controls the server thread, you have a problem. It can be overcome using the initialise on first use idiom, as long as your careful to protect the initalisation with atomic operations, but its still a bit complicated. Emerson On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: Hi Emerson, Another remark: On Windows using Events synchronization objects involves additional kernel context switches and thus slows you down more than necessary. I'd suggest using a queue, which makes use of the InterlockedXXX operations (I've implemented a number of those, including priority based ones - so this is possible without taking a single lock.) or to use critical sections - those only take the kernel context switch if there really is lock contention. If you can reduce the kernel context switches, you're performance will likely increase drastically. I also don't see the static initialization problem: The queue has to be available before any thread is started. No thread has ownership of the queue, except may be the main thread. Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 00:57 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Nico, I have implemented all three strategies (thead specific connections, single connection multiple threads, and single thread server with multiple client threads). The problem with using thread specific contexts is that you cant have a single global transaction which wraps all of those contexts. So you end up having to use fine grained transactions, which decreases performance. The single connection multiple thread alternative apparently has problems with sqlite3_step being active on more than one thread at the same moment, so cannot easily be used in a safe way. But it is by far the fastest and simplest alternative. The single thread server solution involves message passing between threads, and even when this is done optimally with condition variables (or events on windows) and blocking ive found that it results in a high number of context switches and decreased performance. It does however make a robust basis for a wrapper api, since it guarantees that things will always be synchronised. But using this arrangement can also result in various static initialisation problems, since the single thread server must always be up and running before anything which needs to use it. Emerson On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote: > On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote: > > Technically sqlite is not thread safe. [...] > > Solaris man pages describe APIs with requirements like SQLite's as > "MT-Safe with exceptions" and the exceptions are listed in the man page. > > That's still MT-Safe, but the caller has to play by certain rules. > > Anyways, this is silly. SQLite API is MT-Safe with one exception and > that exception is rather ordinary, common to other APIs like it that > have a context object of some sort (e.g., the MIT krb5 API), and not > really a burden to the caller. In exchange for this exception you get > an implementation of the API that is lighter weight and easier to > maintain than it would have been without that exception; a good > trade-off IMO. > > Coping with this exception is easy. For example, if you have a server > app with multiple worker threads each of which needs a db context then > you could use a threa
AW: [sqlite] sqlite performance, locking & threading
Hi Emerson, Another remark: On Windows using Events synchronization objects involves additional kernel context switches and thus slows you down more than necessary. I'd suggest using a queue, which makes use of the InterlockedXXX operations (I've implemented a number of those, including priority based ones - so this is possible without taking a single lock.) or to use critical sections - those only take the kernel context switch if there really is lock contention. If you can reduce the kernel context switches, you're performance will likely increase drastically. I also don't see the static initialization problem: The queue has to be available before any thread is started. No thread has ownership of the queue, except may be the main thread. Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 00:57 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Nico, I have implemented all three strategies (thead specific connections, single connection multiple threads, and single thread server with multiple client threads). The problem with using thread specific contexts is that you cant have a single global transaction which wraps all of those contexts. So you end up having to use fine grained transactions, which decreases performance. The single connection multiple thread alternative apparently has problems with sqlite3_step being active on more than one thread at the same moment, so cannot easily be used in a safe way. But it is by far the fastest and simplest alternative. The single thread server solution involves message passing between threads, and even when this is done optimally with condition variables (or events on windows) and blocking ive found that it results in a high number of context switches and decreased performance. It does however make a robust basis for a wrapper api, since it guarantees that things will always be synchronised. But using this arrangement can also result in various static initialisation problems, since the single thread server must always be up and running before anything which needs to use it. Emerson On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote: > On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote: > > Technically sqlite is not thread safe. [...] > > Solaris man pages describe APIs with requirements like SQLite's as > "MT-Safe with exceptions" and the exceptions are listed in the man page. > > That's still MT-Safe, but the caller has to play by certain rules. > > Anyways, this is silly. SQLite API is MT-Safe with one exception and > that exception is rather ordinary, common to other APIs like it that > have a context object of some sort (e.g., the MIT krb5 API), and not > really a burden to the caller. In exchange for this exception you get > an implementation of the API that is lighter weight and easier to > maintain than it would have been without that exception; a good > trade-off IMO. > > Coping with this exception is easy. For example, if you have a server > app with multiple worker threads each of which needs a db context then > you could use a thread-specific key to track a per-thread db context; > use pthread_key_create(3C) to create the key, pthread_setspecific(3C) > once per-thread to associate a new db context with the calling thread, > and pthread_getspecific(3C) to get the calling thread's db context > when you need it. If you have a protocol where you have to step a > statement over multiple message exchanges with a client, and you don't > want to have per-client threads then get a db context > per-client/exchange and store that and a mutext in an object that > represents that client/exchange. And so on. > > Nico > -- > > -- > --- To unsubscribe, send email to > [EMAIL PROTECTED] > -- > --- > > - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > Firstly can i clarify what you mean regarding the same moment. Do you > mean that no two threads can be executing the call, or that no two > threads can be in the middle of stepping through a series of results > using the step function (assuming there is some context behind the > scenes). This first. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, Sorry to bring this up again, but ive just finished testing the alternative strategies. Having coded all 3 versions i still have a question about the sqlite3_step() problem which i think was isolated as the cause of my trouble a few days ago. Firstly can i clarify what you mean regarding the same moment. Do you mean that no two threads can be executing the call, or that no two threads can be in the middle of stepping through a series of results using the step function (assuming there is some context behind the scenes). Even if the threads involved are synchronised. I treat all of the api functions as synchronisation boundaries, so i would never allow two threads to call an api function at the same time. The reason i ask is because i assume it is ok to have multiple statements performing a sqlite3_step() at the same time in a single threaded environment. I know that internally sqlite turns everything into prepared statements and executes them using sqlite3_step(). And if this is the case, then in a multithreaded environment it should still be the same so long as things were synchronised ? Im confused because ive been told that it doesnt work, and i guess i just dont understand what is internally preventing it from doing so... Emerson On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Joe Wilson <[EMAIL PROTECTED]> wrote: > --- [EMAIL PROTECTED] wrote: > > SQLite allows you to have multiple queries running at the > > same time in multiple threads, as long as no two threads are > > running sqlite3_step() at the same moment. > > Do you mean "as long as no two threads are running sqlite3_step() > _for the same sqlite3* connection_ at the same moment"? > Yes. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Nico, I have implemented all three strategies (thead specific connections, single connection multiple threads, and single thread server with multiple client threads). The problem with using thread specific contexts is that you cant have a single global transaction which wraps all of those contexts. So you end up having to use fine grained transactions, which decreases performance. The single connection multiple thread alternative apparently has problems with sqlite3_step being active on more than one thread at the same moment, so cannot easily be used in a safe way. But it is by far the fastest and simplest alternative. The single thread server solution involves message passing between threads, and even when this is done optimally with condition variables (or events on windows) and blocking ive found that it results in a high number of context switches and decreased performance. It does however make a robust basis for a wrapper api, since it guarantees that things will always be synchronised. But using this arrangement can also result in various static initialisation problems, since the single thread server must always be up and running before anything which needs to use it. Emerson On 1/2/07, Nicolas Williams <[EMAIL PROTECTED]> wrote: On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote: > Technically sqlite is not thread safe. [...] Solaris man pages describe APIs with requirements like SQLite's as "MT-Safe with exceptions" and the exceptions are listed in the man page. That's still MT-Safe, but the caller has to play by certain rules. Anyways, this is silly. SQLite API is MT-Safe with one exception and that exception is rather ordinary, common to other APIs like it that have a context object of some sort (e.g., the MIT krb5 API), and not really a burden to the caller. In exchange for this exception you get an implementation of the API that is lighter weight and easier to maintain than it would have been without that exception; a good trade-off IMO. Coping with this exception is easy. For example, if you have a server app with multiple worker threads each of which needs a db context then you could use a thread-specific key to track a per-thread db context; use pthread_key_create(3C) to create the key, pthread_setspecific(3C) once per-thread to associate a new db context with the calling thread, and pthread_getspecific(3C) to get the calling thread's db context when you need it. If you have a protocol where you have to step a statement over multiple message exchanges with a client, and you don't want to have per-client threads then get a db context per-client/exchange and store that and a mutext in an object that represents that client/exchange. And so on. Nico -- - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
On Sat, Dec 30, 2006 at 03:34:01PM +, Emerson Clarke wrote: > Technically sqlite is not thread safe. [...] Solaris man pages describe APIs with requirements like SQLite's as "MT-Safe with exceptions" and the exceptions are listed in the man page. That's still MT-Safe, but the caller has to play by certain rules. Anyways, this is silly. SQLite API is MT-Safe with one exception and that exception is rather ordinary, common to other APIs like it that have a context object of some sort (e.g., the MIT krb5 API), and not really a burden to the caller. In exchange for this exception you get an implementation of the API that is lighter weight and easier to maintain than it would have been without that exception; a good trade-off IMO. Coping with this exception is easy. For example, if you have a server app with multiple worker threads each of which needs a db context then you could use a thread-specific key to track a per-thread db context; use pthread_key_create(3C) to create the key, pthread_setspecific(3C) once per-thread to associate a new db context with the calling thread, and pthread_getspecific(3C) to get the calling thread's db context when you need it. If you have a protocol where you have to step a statement over multiple message exchanges with a client, and you don't want to have per-client threads then get a db context per-client/exchange and store that and a mutext in an object that represents that client/exchange. And so on. Nico -- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > "With that and subsequent versions, it is safe to move a connection > handle across threads as long as the connection is not holding any > fcntl() locks. You can safely assume that no locks are being held if > no transaction is pending and all statements have been finalized." > > The above statement regarding transactions was one of the things which > led me to believe that it was not possible to have the single > transaction, single connection and multiple thread arrangement. > The statement is true in general. It works even on the older, broken versions of Linux. If you are running on a newer version of Linux or on OS-X or windows, then you can move a database connection from one thread to another anytime you want. But if you write code that does that, it will not be portable to older versions of linux with the fcntl() bug. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, Im getting some mixed signals here, on the one hand i am being told that it should be ok to use a single transaction and connection across multiple threads (assuming that it is correctly synchronised). But on the other hand i am reading statements like that as part of a list of things which may cause a sqlite misuse error. "Trying to use the same SQLite database connection from two or more threads at the same time." Anyway, the sqlite_step() problem certainly explains one of the crashes that i saw so that will save me some debugging. Thanks to those who have suggested alternative designs and strategies. I am currently adapting the single thread does all the database work solution. The documentation sources i have been looking at regarding threading are (in addition to various discussions in this mailing list) here; http://www.sqlite.org/cvstrac/wiki?p=MultiThreading http://www.sqlite.org/lang_transaction.html http://www.sqlite.org/lockingv3.html http://www.sqlite.org/faq.html#q8 "With that and subsequent versions, it is safe to move a connection handle across threads as long as the connection is not holding any fcntl() locks. You can safely assume that no locks are being held if no transaction is pending and all statements have been finalized." The above statement regarding transactions was one of the things which led me to believe that it was not possible to have the single transaction, single connection and multiple thread arrangement. Emerson On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: Joe Wilson <[EMAIL PROTECTED]> wrote: > --- [EMAIL PROTECTED] wrote: > > SQLite allows you to have multiple queries running at the > > same time in multiple threads, as long as no two threads are > > running sqlite3_step() at the same moment. > > Do you mean "as long as no two threads are running sqlite3_step() > _for the same sqlite3* connection_ at the same moment"? > Yes. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Joe Wilson <[EMAIL PROTECTED]> wrote: > --- [EMAIL PROTECTED] wrote: > > SQLite allows you to have multiple queries running at the > > same time in multiple threads, as long as no two threads are > > running sqlite3_step() at the same moment. > > Do you mean "as long as no two threads are running sqlite3_step() > _for the same sqlite3* connection_ at the same moment"? > Yes. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
On Saturday, December 30, 2006 [EMAIL PROTECTED] wrote: > Isn't there a list of possible causes for SQLITE_MISUSE somewhere. > I seem to remember writing such a list one. Does anybody know where > I put it? This is a forwarded message From:[EMAIL PROTECTED] <[EMAIL PROTECTED]> To: sqlite-users@sqlite.org Cc: Date:Tuesday, October 31, 2006 Subject: [sqlite] What is the proper use of threads? -8<- Original message text - =?ISO-8859-2?Q?Zimmer_Andr=E1s?= <[EMAIL PROTECTED]> wrote: > > However, at random times I keep getting "library routine called out of > sequence" errors (in the feeder). This is the SQLITE_MISUSE error. You can grep for SQLITE_MISUSE in the source code to find all the cases where this error can be generated. SQLITE_MISUSE gets generated for things like this: * Trying to use the same SQLite database connection from two or more threads at the same time. * Trying to use an SQLite database connection from within a Linux signal handler. * Calling sqlite3_close() on a database connection that has already been closed. * Calling sqlite3_step() on a prepared statement that has already been finalized. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - -8<- End of Original message text -- -- Doug Currie Londonderry, NH, USA - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
--- [EMAIL PROTECTED] wrote: > SQLite allows you to have multiple queries running at the > same time in multiple threads, as long as no two threads are > running sqlite3_step() at the same moment. Do you mean "as long as no two threads are running sqlite3_step() _for the same sqlite3* connection_ at the same moment"? __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Good advice. If you want to experiment with a certain architecture, write the programs yourself rather than trying to get others to do it for you. Sqlite is a well thought through solution for an embedded database with a wide range of applications, but if you want something more specialised, write it yourself. Indexing and data storage methods are well documented so it is not a daunting task to roll your own and get all the performance you want. As Roger points out, Sqlite is open source and is available to you as a model. You will find that it is clearly and simply written and the code is very easy to follow. Roger Binns wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | I have deliberately tried to avoid giving too much detail on the | architecture of the index since that was not the point and i didnt | want to end up debating it. I don't want to debate your index architecture either :-). Quite simply several times a month people post to this ideally wanting SQLite changed to match how they want to structure things. People on the list explore with the poster how the items of data are related, and can suggest an alternative way of doing things. Usually the poster finds that simpler than what they had first thought of and goes away happy. Is this your question: I want SQLite to work differently than it currently does so that it matches how I want to do things? If that is the case, then the answer is you can go ahead and rewrite as much of SQLite as you want to to do that. The code is public domain so there are no legal or technical hindrances standing in your way. This thread may as well end at that. On the other hand, if you do want to work within the constraints of SQLite then there are quite a few things that can be suggested. But that is only possible if more is known about the relationships of the data. | I did make an attempt to explain that A and B could not be done at the | same time in a previous message, but perhaps its been lost in the | conversation. The process involves several stages some of which are | database operations and some of which are file operations and that the | operations are not separable. They must be done in sequential order. I was trying to establish what has to be serialized. In particular the question was about if A and B had any relationships to each other. If they do, then that means they would have to be processed serially and I don't see the relevance of threading etc. If they can be processed at the same time, then that means some sort of partitioning can happen. In theory breaking the datasets into 10 partitions can give 10 times the performance, but in practise there will need to be some coordination in order to make it look like there is one database not multiple pieces. | The database operations, though very small still consume the most time | and are the most sensetive to how the synchronisation takes place and | where the transactions are placed. Have you considered just using plain DB/dbm/gdbm and then importing the data on demand into SQLite? Also a lot of the synchronisation is because SQLite makes damn sure it doesn't lose your data. If your documents are permanent (ie you can access them later if need be), then you can loosen the constraints on SQLite. For example you could run with pragma synchronous=off and then do a checkpoint every 100,000 documents where you close the database, copy it to a permanent file, sync, and start again. You could also use a ram disk and copy to permanent storage as your checkpoint. | I dont think custom functions are | appropriate for what im doing and im not sure how virtual tables would | be either, i rather suspect that would be a very complicated approach. You can (ab)use custom functions and virtual tables to help behind the scenes. For example they can be used to make data sets that are partitioned appear to be a single whole. Another example is if you have your database in two pieces - one that is read only with "old" data and new one with updates. That can again appear to the rest of the code as one database. Finally you can also make the functions and virtual tables have side effects even on what appear to be read only queries. | The schema is extemely simple, and there is barely any logic too the | indexing process at all. Maybe not even indexing the documents at all would work? If you used a virtual table, you can make it grovel through the documents on demand. You can even build indices (in the SQL sense) which are in your own format and performance characteristics and use those for the virtual table. | Unfortunately i cannot do this with sqlite at the moment... Correct. SQLite errs on the side of being a library with no controller, working with multiple processes and only having the lowest common denominator operating system locking functionality available. There are techniques that can be used to improve concurrency. DRH has a polic
Re: [sqlite] sqlite performance, locking & threading
"Michael Ruck" <[EMAIL PROTECTED]> wrote: > Richard, > > I believe his problem is this: > > "Each query is allowed to complete before the other one starts, but each > thread may have multiple statements or result sets open." > > The open resultsets/multiple started statements are causing him = > headaches. > SQLite allows you to have multiple queries running at the same time in multiple threads, as long as no two threads are running sqlite3_step() at the same moment. Isn't there a list of possible causes for SQLITE_MISUSE somewhere. I seem to remember writing such a list one. Does anybody know where I put it? -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
AW: [sqlite] sqlite performance, locking & threading
Richard, I believe his problem is this: "Each query is allowed to complete before the other one starts, but each thread may have multiple statements or result sets open." The open resultsets/multiple started statements are causing him headaches. Mike -Ursprüngliche Nachricht- Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Gesendet: Samstag, 30. Dezember 2006 17:32 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Richard, > > Are you sure we are not just getting into semantic knots here ? > > Do we have the same definition of "at the same time". I mean > concurrently, so that both threads use the same sqlite3 * structure, > within mutexes. Each query is allowed to complete before the other one > starts, but each thread may have multiple statements or result sets > open. > > When i try to do this, i get api called out of sequence errors... > There are around 50 test cases for this kind of behavior in the regression test files thread1.test and thread2.test. They all seem to work for me. Perhaps your mutexes are not working as you expect and you are in fact trying to use the same database connection simultaneously in two or more threads. SQLite attempts to detect this situation and when it sees it it return SQLITE_MISUSE which generates the "API called out of sequence" error. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
AW: [sqlite] sqlite performance, locking & threading
I want to contribute my 0.02€ to this discussion. Basically I believe your (Emerson) design is flawed. I've been working for years with multithreaded and even multi-core systems. From my experience a design using threads for specific tasks is *always* performing better, than having multiple threads execute the same things in parallel. This experience is based on the facts about lock contention, thread starvation, caching and more. These always apply unless you have a realtime operating system, which guarantees you that every thread is scheduled and the scheduling order. On most systems this ain't the case AFAIK. At least not on Windows/Linux, if there's heavy threading and lock contention going on. I really would suggest you to test out a design, where threads are assigned single tasks, such as retrieving data from db, writing data to db and queues, which provide these threads with work items. If you're really smart, you'll create queues using atomic operations so that no thread will take locks for these operations. To summarize my points: - Create one Sqlite writer thread, one sqlite reader thread. - Create queues for all worker threads, which provide them with work items. - Do *not* take kernel locks on data structures or libraries, this *will* really hurt your performance. - Use transactions coarse grained on the Sqlite writer thread. Either decide inside the thread, when to commit and start a new one or design a specific workitem to trigger this from the outside. This depends on having a consistent state in your data structures/the database. But: Your limiting factor will always be the hard disk. Analyze your tasks to determine, what the bounding factor is: Is it the CPU? Is it the disk? Is it the network? Only then start changing something. Multithreading only makes sense if you can parallelize heavily and are not bound by disk/network, but by CPU and have multiple (unused) cores available. And: Having multiple statements executing concurrently is only possible with multiple Sqlite connections. A connection can only keep one resultset open or a statement executing (unless that has changed recently.) Mike -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Samstag, 30. Dezember 2006 17:08 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Richard, Are you sure we are not just getting into semantic knots here ? Do we have the same definition of "at the same time". I mean concurrently, so that both threads use the same sqlite3 * structure, within mutexes. Each query is allowed to complete before the other one starts, but each thread may have multiple statements or result sets open. When i try to do this, i get api called out of sequence errors... On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > But why then can i not have a single transaction wrapping a single > > connection which is used within multiple threads, obvioulsy not at > > the same time. > > You can. What makes you think you can't? > -- > D. Richard Hipp <[EMAIL PROTECTED]> > > > -- > --- To unsubscribe, send email to > [EMAIL PROTECTED] > -- > --- > > - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Richard, > > Are you sure we are not just getting into semantic knots here ? > > Do we have the same definition of "at the same time". I mean > concurrently, so that both threads use the same sqlite3 * structure, > within mutexes. Each query is allowed to complete before the other one > starts, but each thread may have multiple statements or result sets > open. > > When i try to do this, i get api called out of sequence errors... > There are around 50 test cases for this kind of behavior in the regression test files thread1.test and thread2.test. They all seem to work for me. Perhaps your mutexes are not working as you expect and you are in fact trying to use the same database connection simultaneously in two or more threads. SQLite attempts to detect this situation and when it sees it it return SQLITE_MISUSE which generates the "API called out of sequence" error. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > I have code which creates a transaction on a connection in the parent > thread, then creates several child threads which attempt to use the > same connection and transaction in a synchronised mannor. It does not > work, and by all the documentation that i have read on sqlite, it > should not work. > This should work fine on SQLite version 3.3.1 and later on newer Linux machines or on any windows or OS-X machine. And I think the documentation is clear on that point. Where in the documentation does it suggest otherwise? -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
RE: [sqlite] sqlite performance, locking & threading
Been following this a while... You have access to the source, and apparently are a "threading genius." Please make the required minor changes and post a link here so we can all benefit. Fred > -Original Message- > From: Emerson Clarke [mailto:[EMAIL PROTECTED] > Sent: Saturday, December 30, 2006 9:34 AM > To: sqlite-users@sqlite.org > Subject: Re: [sqlite] sqlite performance, locking & threading > > > Roger, > > I think sqlite suffers somewhat from a bit of an identity crisis. > Whilst it is both a library and a piece of code which you embed in a > project it is often talked about as though it is some external > component. > > Technically sqlite is not thread safe. Just because the library has > explicitly disallowed using the same sqlite3 * structure in multiple > threads on some platforms (i understand this requirement has been > relaxed on others) does not make it thread safe. Even on the > platforms where a single sqlite3 * structure can be used on multiple > threads (provided it is not at the same time), it is not possible to > have a transaction which works across these threads. So even if the > connection is thread safe, the transactions are not. > > By the usual definition, something which is thread safe can be safely > used across multiple threads, usually with the aid of synchronisation > but sometimes not. For instance collections are often considdered > thread safe only when they manage their own mutexes internally so that > the user doesnt have to. But either way, you can use them accross > multiple threads. You cannot do this with sqlite, so it is quite > confusing to say that sqlite is thread safe... > > I think a better definition would be that sqlite can be safely used in > a multithreaded program, but is not thread safe. > > I agree that multithreaded programming can be difficult, but its not > magic and i think that a few simple rules can overcome most of the > problems. It certainly is not luck that multithreaded systems work, > usually its the result of careful design and hard work. > > Emerson > > On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote: > > Emerson Clarke wrote: > > > If i have a linked list, i can use it across threads if i want to, > > > provided that i synchronise operations in such a way that the list > > > does not get corrupted. > > > > And of course you also have to know about memory barriers > and compiler > > re-ordering. That is highly dependent on the libraries > and/or compiler > > you are using, as well as underlying hardware > implementation. Most of > > the time, developers just get lucky. > > > > http://en.wikipedia.org/wiki/Memory_barrier > > > > > Likewise for most other data structures and libraries. > > > > Arguably that is by luck and not design! Look at the > effort that to go > > in an add _r suffixed versions of several functions in the standard > > libraries. And many GUI libraries have long had a > restriction that you > > can only use them in one thread. > > > > > Sqlite does not follow these rules, as something created > in one thread > > > does not work in another thread regardless of > synchronisation and it > > > is out of my control. > > > > SQLite's design was not "luck". The design expects you to > create unique > > sqlite3 objects in each thread. Effort and thought was put > into that! > > > > http://www.sqlite.org/cvstrac/wiki?p=MultiThreading > > > > It was loosened a bit in 3.3.x: > > > > http://www.sqlite.org/faq.html#q8 > > > > What isn't allowed is multiple statements executing at the > same time in > > multiple threads against the same sqlite3* db object. In order to > > support that, SQLite would have to have extensive code > protecting the > > various internal data structures as well as ensuring concurrency. > > > > > This is not a situation that i would expect anyone to purposefully > > > design becuase it makes multithreaded programming difficult, > > > > The purposeful design is that you make sqlite3 objects per > thread. That > > way there is absolutely no danger of corruption or other bad issues. > > > > Roger > > > > > > > -- > --- > > To unsubscribe, send email to [EMAIL PROTECTED] > > > -- > --- > > > > > > -- > --- > To unsubscribe, send email to [EMAIL PROTECTED] > -- > --- > - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, Are you sure we are not just getting into semantic knots here ? Do we have the same definition of "at the same time". I mean concurrently, so that both threads use the same sqlite3 * structure, within mutexes. Each query is allowed to complete before the other one starts, but each thread may have multiple statements or result sets open. When i try to do this, i get api called out of sequence errors... On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > But why then can i not have a single transaction wrapping a single > connection which is used within multiple threads, obvioulsy not at the > same time. You can. What makes you think you can't? -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, Well, that has been the whole point of this discussion. I have code which creates a transaction on a connection in the parent thread, then creates several child threads which attempt to use the same connection and transaction in a synchronised mannor. It does not work, and by all the documentation that i have read on sqlite, it should not work. Ive also tried allocating a separate transaction to each child thread, but of course then they all lock each other out. So the only workable alternative has been to either use much slower fine grained transactions within each child thread, or to use an in process server which performs all operations on one thread (at what i presume to be a considerable hit to performance). On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Even on the > platforms where a single sqlite3 * structure can be used on multiple > threads (provided it is not at the same time), it is not possible to > have a transaction which works across these threads. I beg to differ. What makes you think this does not work? -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > But why then can i not have a single transaction wrapping a single > connection which is used within multiple threads, obvioulsy not at the > same time. You can. What makes you think you can't? -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, Ok, im pretty clear on the file locking being the cause of the problems with the sqlite3 * structures, but thanks for confirming it. I understand that on platforms that dont have this issue its not a problem. But why then can i not have a single transaction wrapping a single connection which is used within multiple threads, obvioulsy not at the same time. Its rare that anything can be used at the same time by multiple threads, hence the need for atomic operations (which *nix doesnt have great standard support for). To clarify, i have never been talking about using something simultaneously, only within the confines of synchronisation. So when i talk about using something in multiple threads, i mean in a syhcnronised way, but still it would be the same exact piece of memory. On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Richard, > > My complaint, if you want to call it that. Was simply that there are > seemingly artificial constraints on what you can and cant do accross > threads. > > If i have a linked list, i can use it across threads if i want to, > provided that i synchronise operations in such a way that the list > does not get corrupted. > > Likewise for most other data structures and libraries. The default > behaviour is that if i create an object on the stack or the heap, i > can use it between multiple threads provided that i synchronise access > to that object. > > Sqlite does not follow these rules, as something created in one thread > does not work in another thread regardless of synchronisation and it > is out of my control. > I think it was already explained to you that the reason for this behavior is to work around bugs in file locking in older versions of Linux. If you are using a newer version of Linux or OS-X or windows, those constraints do not apply to you. You can move database connections across threads freely. What you cannot do is to use the same database connection in two or more threads at the same time. This is the usual case with most libraries, threadsafe or not - you can use separate instances of an object in different threads at the same time, but you cannot use the same object simultaneously in multiple threads. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Even on the > platforms where a single sqlite3 * structure can be used on multiple > threads (provided it is not at the same time), it is not possible to > have a transaction which works across these threads. I beg to differ. What makes you think this does not work? -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Roger, I think sqlite suffers somewhat from a bit of an identity crisis. Whilst it is both a library and a piece of code which you embed in a project it is often talked about as though it is some external component. Technically sqlite is not thread safe. Just because the library has explicitly disallowed using the same sqlite3 * structure in multiple threads on some platforms (i understand this requirement has been relaxed on others) does not make it thread safe. Even on the platforms where a single sqlite3 * structure can be used on multiple threads (provided it is not at the same time), it is not possible to have a transaction which works across these threads. So even if the connection is thread safe, the transactions are not. By the usual definition, something which is thread safe can be safely used across multiple threads, usually with the aid of synchronisation but sometimes not. For instance collections are often considdered thread safe only when they manage their own mutexes internally so that the user doesnt have to. But either way, you can use them accross multiple threads. You cannot do this with sqlite, so it is quite confusing to say that sqlite is thread safe... I think a better definition would be that sqlite can be safely used in a multithreaded program, but is not thread safe. I agree that multithreaded programming can be difficult, but its not magic and i think that a few simple rules can overcome most of the problems. It certainly is not luck that multithreaded systems work, usually its the result of careful design and hard work. Emerson On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote: Emerson Clarke wrote: > If i have a linked list, i can use it across threads if i want to, > provided that i synchronise operations in such a way that the list > does not get corrupted. And of course you also have to know about memory barriers and compiler re-ordering. That is highly dependent on the libraries and/or compiler you are using, as well as underlying hardware implementation. Most of the time, developers just get lucky. http://en.wikipedia.org/wiki/Memory_barrier > Likewise for most other data structures and libraries. Arguably that is by luck and not design! Look at the effort that to go in an add _r suffixed versions of several functions in the standard libraries. And many GUI libraries have long had a restriction that you can only use them in one thread. > Sqlite does not follow these rules, as something created in one thread > does not work in another thread regardless of synchronisation and it > is out of my control. SQLite's design was not "luck". The design expects you to create unique sqlite3 objects in each thread. Effort and thought was put into that! http://www.sqlite.org/cvstrac/wiki?p=MultiThreading It was loosened a bit in 3.3.x: http://www.sqlite.org/faq.html#q8 What isn't allowed is multiple statements executing at the same time in multiple threads against the same sqlite3* db object. In order to support that, SQLite would have to have extensive code protecting the various internal data structures as well as ensuring concurrency. > This is not a situation that i would expect anyone to purposefully > design becuase it makes multithreaded programming difficult, The purposeful design is that you make sqlite3 objects per thread. That way there is absolutely no danger of corruption or other bad issues. Roger - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Richard, > > My complaint, if you want to call it that. Was simply that there are > seemingly artificial constraints on what you can and cant do accross > threads. > > If i have a linked list, i can use it across threads if i want to, > provided that i synchronise operations in such a way that the list > does not get corrupted. > > Likewise for most other data structures and libraries. The default > behaviour is that if i create an object on the stack or the heap, i > can use it between multiple threads provided that i synchronise access > to that object. > > Sqlite does not follow these rules, as something created in one thread > does not work in another thread regardless of synchronisation and it > is out of my control. > I think it was already explained to you that the reason for this behavior is to work around bugs in file locking in older versions of Linux. If you are using a newer version of Linux or OS-X or windows, those constraints do not apply to you. You can move database connections across threads freely. What you cannot do is to use the same database connection in two or more threads at the same time. This is the usual case with most libraries, threadsafe or not - you can use separate instances of an object in different threads at the same time, but you cannot use the same object simultaneously in multiple threads. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Emerson Clarke wrote: If i have a linked list, i can use it across threads if i want to, provided that i synchronise operations in such a way that the list does not get corrupted. And of course you also have to know about memory barriers and compiler re-ordering. That is highly dependent on the libraries and/or compiler you are using, as well as underlying hardware implementation. Most of the time, developers just get lucky. http://en.wikipedia.org/wiki/Memory_barrier Likewise for most other data structures and libraries. Arguably that is by luck and not design! Look at the effort that to go in an add _r suffixed versions of several functions in the standard libraries. And many GUI libraries have long had a restriction that you can only use them in one thread. Sqlite does not follow these rules, as something created in one thread does not work in another thread regardless of synchronisation and it is out of my control. SQLite's design was not "luck". The design expects you to create unique sqlite3 objects in each thread. Effort and thought was put into that! http://www.sqlite.org/cvstrac/wiki?p=MultiThreading It was loosened a bit in 3.3.x: http://www.sqlite.org/faq.html#q8 What isn't allowed is multiple statements executing at the same time in multiple threads against the same sqlite3* db object. In order to support that, SQLite would have to have extensive code protecting the various internal data structures as well as ensuring concurrency. This is not a situation that i would expect anyone to purposefully design becuase it makes multithreaded programming difficult, The purposeful design is that you make sqlite3 objects per thread. That way there is absolutely no danger of corruption or other bad issues. Roger - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, My complaint, if you want to call it that. Was simply that there are seemingly artificial constraints on what you can and cant do accross threads. If i have a linked list, i can use it across threads if i want to, provided that i synchronise operations in such a way that the list does not get corrupted. Likewise for most other data structures and libraries. The default behaviour is that if i create an object on the stack or the heap, i can use it between multiple threads provided that i synchronise access to that object. Sqlite does not follow these rules, as something created in one thread does not work in another thread regardless of synchronisation and it is out of my control. This is not a situation that i would expect anyone to purposefully design becuase it makes multithreaded programming difficult, particularly when most things obey the first paradigm (allowing things to be used across threads). So i really dont understand why sqlite has this thread behaviour, and i assume that there is a technical limitation rather than design decision behind it, though no one has as yet confirmed this... Emerson On 12/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "Emerson Clarke" <[EMAIL PROTECTED]> wrote: >> > It seemed to me that making a library which only functioned on a per > thread basis was something that you would have to do deliberately and > by design. I'm still trying to understand what your complaint is. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: >> > It seemed to me that making a library which only functioned on a per > thread basis was something that you would have to do deliberately and > by design. I'm still trying to understand what your complaint is. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | I am left to assume that all other locking mechanisms like ipc and | files have already been tried and been found wanting. I also assume | that priority has been given to making sqlite operate across network | boundaries rather than process to process (where the locking | requirements would be quite different) Your assumptions are not quite correct. SQLite was always designed to be a library used within an optionally threaded process. Its sweet spot is low end to mid range. You wouldn't run the New York Stock Exchange on it, but it will work well on an MP3 player with memory measured in kilobytes. It has no external requirements like a daemon for coordination and control. IPC is difficult to work with when processes are unrelated. It does use file locking - see the various os_*.c files in the source. SQLite knows absolutely nothing about networks. In theory the file locking will work on files on a networked file server, but in practise the various network file protocols and their implementations often have problems getting their native locking functionality correct. See http://www.sqlite.org/faq.html#q7 Some people have proposed a networked version of SQLite and some even sell products. None of these are a standard part of SQLite. http://www.sqlite.org/cvstrac/wiki/wiki?p=ClientServer http://www.sqlite.org/cvstrac/wiki/wiki?p=SqliteNetwork | Im not familiar with DB/dbm/gdbm, are any of those under a similar | license to sqlite and or as easy to use and simple ? :) http://en.wikipedia.org/wiki/Gdbm http://en.wikipedia.org/wiki/Berkeley_DB They effectively use fixed size records and have no SQL layer. The Berkeley DB variant should allow the concurrency you want. Version 1 of SQLite was actually just a SQL layer over the GNU variant, gdbm. http://www.sqlite.org/cvstrac/wiki?p=SqliteHistory Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFlmGomOOfHg372QQRAkKTAKDTOAt+jz/bEG0BmwJ9ZrOajOB3ZQCgjKUi qghWqZISU+h6i+S3nAr2R6E= =8BoZ -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Roger, My original question was in fact not a statement. I did not want sqlite to work differently. Rather the opposite, sqlite already works differently to the way i, and probably a lot of users assume that it would. So all i wanted to know was why that is the case. It seemed to me that making a library which only functioned on a per thread basis was something that you would have to do deliberately and by design. But i am still vague as to what the reasons were behind this design, aside from some mention of file locks being based on threads not processes. I am left to assume that all other locking mechanisms like ipc and files have already been tried and been found wanting. I also assume that priority has been given to making sqlite operate across network boundaries rather than process to process (where the locking requirements would be quite different) But thank you for your many suggestions and explanations of the virtual tables functionality. It seems quite powerful, and if i didnt want to try to keep this as database agnostic as possible i would probably look into that. I especcially liked the notion that you could keep the index as files but have it appear as a table since that is in part what i am doing anyway in a slightly more abstract way. I am already using options like "synchronous=off". I think as other users have suggested the two best options for me are to write custom locking handlers for os.c (although i am still unsure as to wether this will fix the problem, essentially i want to turn all file locking off completely since the database will never be accessed by more than one process) or to simply implement an in process server which handles all of the sqlite functionality using message passing to a single thread so that there is only ever one lock and one cache. The second option is probably the one i will try first, since its more transparent and for the cost of a single thread lets me emulate the exact same behaviour that other databases have with regards to transactions and threads. Im not familiar with DB/dbm/gdbm, are any of those under a similar license to sqlite and or as easy to use and simple ? :) Emerson On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | I have deliberately tried to avoid giving too much detail on the | architecture of the index since that was not the point and i didnt | want to end up debating it. I don't want to debate your index architecture either :-). Quite simply several times a month people post to this ideally wanting SQLite changed to match how they want to structure things. People on the list explore with the poster how the items of data are related, and can suggest an alternative way of doing things. Usually the poster finds that simpler than what they had first thought of and goes away happy. Is this your question: I want SQLite to work differently than it currently does so that it matches how I want to do things? If that is the case, then the answer is you can go ahead and rewrite as much of SQLite as you want to to do that. The code is public domain so there are no legal or technical hindrances standing in your way. This thread may as well end at that. On the other hand, if you do want to work within the constraints of SQLite then there are quite a few things that can be suggested. But that is only possible if more is known about the relationships of the data. | I did make an attempt to explain that A and B could not be done at the | same time in a previous message, but perhaps its been lost in the | conversation. The process involves several stages some of which are | database operations and some of which are file operations and that the | operations are not separable. They must be done in sequential order. I was trying to establish what has to be serialized. In particular the question was about if A and B had any relationships to each other. If they do, then that means they would have to be processed serially and I don't see the relevance of threading etc. If they can be processed at the same time, then that means some sort of partitioning can happen. In theory breaking the datasets into 10 partitions can give 10 times the performance, but in practise there will need to be some coordination in order to make it look like there is one database not multiple pieces. | The database operations, though very small still consume the most time | and are the most sensetive to how the synchronisation takes place and | where the transactions are placed. Have you considered just using plain DB/dbm/gdbm and then importing the data on demand into SQLite? Also a lot of the synchronisation is because SQLite makes damn sure it doesn't lose your data. If your documents are permanent (ie you can access them later if need be), then you can loosen the constraints on SQLite. For example you could run with pragma synchronous=off and then do a checkpoint e
Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | I have deliberately tried to avoid giving too much detail on the | architecture of the index since that was not the point and i didnt | want to end up debating it. I don't want to debate your index architecture either :-). Quite simply several times a month people post to this ideally wanting SQLite changed to match how they want to structure things. People on the list explore with the poster how the items of data are related, and can suggest an alternative way of doing things. Usually the poster finds that simpler than what they had first thought of and goes away happy. Is this your question: I want SQLite to work differently than it currently does so that it matches how I want to do things? If that is the case, then the answer is you can go ahead and rewrite as much of SQLite as you want to to do that. The code is public domain so there are no legal or technical hindrances standing in your way. This thread may as well end at that. On the other hand, if you do want to work within the constraints of SQLite then there are quite a few things that can be suggested. But that is only possible if more is known about the relationships of the data. | I did make an attempt to explain that A and B could not be done at the | same time in a previous message, but perhaps its been lost in the | conversation. The process involves several stages some of which are | database operations and some of which are file operations and that the | operations are not separable. They must be done in sequential order. I was trying to establish what has to be serialized. In particular the question was about if A and B had any relationships to each other. If they do, then that means they would have to be processed serially and I don't see the relevance of threading etc. If they can be processed at the same time, then that means some sort of partitioning can happen. In theory breaking the datasets into 10 partitions can give 10 times the performance, but in practise there will need to be some coordination in order to make it look like there is one database not multiple pieces. | The database operations, though very small still consume the most time | and are the most sensetive to how the synchronisation takes place and | where the transactions are placed. Have you considered just using plain DB/dbm/gdbm and then importing the data on demand into SQLite? Also a lot of the synchronisation is because SQLite makes damn sure it doesn't lose your data. If your documents are permanent (ie you can access them later if need be), then you can loosen the constraints on SQLite. For example you could run with pragma synchronous=off and then do a checkpoint every 100,000 documents where you close the database, copy it to a permanent file, sync, and start again. You could also use a ram disk and copy to permanent storage as your checkpoint. | I dont think custom functions are | appropriate for what im doing and im not sure how virtual tables would | be either, i rather suspect that would be a very complicated approach. You can (ab)use custom functions and virtual tables to help behind the scenes. For example they can be used to make data sets that are partitioned appear to be a single whole. Another example is if you have your database in two pieces - one that is read only with "old" data and new one with updates. That can again appear to the rest of the code as one database. Finally you can also make the functions and virtual tables have side effects even on what appear to be read only queries. | The schema is extemely simple, and there is barely any logic too the | indexing process at all. Maybe not even indexing the documents at all would work? If you used a virtual table, you can make it grovel through the documents on demand. You can even build indices (in the SQL sense) which are in your own format and performance characteristics and use those for the virtual table. | Unfortunately i cannot do this with sqlite at the moment... Correct. SQLite errs on the side of being a library with no controller, working with multiple processes and only having the lowest common denominator operating system locking functionality available. There are techniques that can be used to improve concurrency. DRH has a policy of only using those that are at least 17 years old, otherwise there are likely to be patent implications. See this page for example: ~ http://www.sqlite.org/cvstrac/wiki?p=BlueSky In summary, you can do one or more of the following: - - Use some other database - - Rewrite SQLite bits yourself - - Use some sort of partitioning mechanism - - ... which can be hidden using custom functions and virtual tables - - Use a different storage mechanism (eg db/gdbm) with SQLite giving you a front end (virtual tables) - - Relax synchronisation and use a checkpointing mechanism Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8
Re: [sqlite] sqlite performance, locking & threading
Roger, I have deliberately tried to avoid giving too much detail on the architecture of the index since that was not the point and i didnt want to end up debating it. The design of the index is not the issue, suffice to say that i think you are over complicating things. It is a desceptively simple problem, and i really am not insisting on anything. At the end of the day i can just use another database, but the similicity and low maintenance of sqlite appeals to me. I did make an attempt to explain that A and B could not be done at the same time in a previous message, but perhaps its been lost in the conversation. The process involves several stages some of which are database operations and some of which are file operations and that the operations are not separable. They must be done in sequential order. At this stage nothing is premature optimisation since i am merely running performance tests and modelling the architecture. I can see where the time is being spent using high resolution timers so that is my empirical evidence. The database operations, though very small still consume the most time and are the most sensetive to how the synchronisation takes place and where the transactions are placed. I dont think custom functions are appropriate for what im doing and im not sure how virtual tables would be either, i rather suspect that would be a very complicated approach. The schema is extemely simple, and there is barely any logic too the indexing process at all. The biggest impact on the performance is the transactions, and since its hard to dispute that a single global transaction is the fastest way to operate when you are using a single thread it is easy to see that being able to use one global transaction and multiple threads within that transaction is the simplest and easiest way to gain perofrmance. Unfortunately i cannot do this with sqlite at the moment... A few threads and two or three mutex lock/unlock statements is all it takes with the design that i have now. The fact that each thread must use its own connection is a hastle and does impact on the performance, but not nearly as much as the inability to wrap the whole thing in a single transaction. On 12/30/06, Roger Binns <[EMAIL PROTECTED]> wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 You never answered the bit about whether indexing A can be done at the same time as B. (I get the feeling you have a certain design in mind and insist that SQLite changes to meet that design, rather than change your own design around the constraints of SQLite). Emerson Clarke wrote: | In general worker threads is not an efficient solution to the problem | even if parallelism is available. There is nothing to be gained by | having one thread handing off A to another set of worker threads | because you have to synchronise on whatever queue/list/pipe you use to | transfer the information between threads anyway. The synchronisation is only at the begining and end of a "job". The big efficiency gain is that you can write synchronous code in the threads/processes which is way easier to write and less likely to have bugs. It is correct that you can write the same thing using a single thread and using asynchronous I/O but that is way harder. | So you may as well | just have multiple threads all performing their A,B, and C operations | in parallel and they will naturally interleave their operations based | on the synchronisation that you use. You haven't said why that won't work in practise. | threads with mutexes cant do in the foreground and only introduces | overhead and delay in the queueing process and context switching. Sounds like premature optimization to me. Compared to any I/O you are going to have to do (all of which involves interrupts, DMA transfers and context switching), other context switching is going to be negligible. | On the second point, i cant see any reason why the VDBE design would | be a problem for what i want to do. Did you run explain? Quite simply SQLite is designed to keep the database locked for the minimum amount of time, rather than having multiple VDBE's having shared locks and doing lots of concurrent locking for longer periods of time. Fundamentally changing the design of SQLite is not a quick thing. | Taking out a table wide lock is | fine, so long as i can make changes to that table from multiple | threads using the same connection and lock. "Taking out a table lock is fine, as long as the table is not locked" :-) If you have actual empirical evidence that SQLite is a bottleneck, then there are two other approaches that may help: - - Use multiple databases (eg split on first letter of document name) and use ATTACH to have them all available at the same time - - Use the new virtual tables feature and user defined functions. You can have the code consult other tables, store up information and do almost any other workflow and locking scheme you want. You can copy batches of data from your vir
Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 You never answered the bit about whether indexing A can be done at the same time as B. (I get the feeling you have a certain design in mind and insist that SQLite changes to meet that design, rather than change your own design around the constraints of SQLite). Emerson Clarke wrote: | In general worker threads is not an efficient solution to the problem | even if parallelism is available. There is nothing to be gained by | having one thread handing off A to another set of worker threads | because you have to synchronise on whatever queue/list/pipe you use to | transfer the information between threads anyway. The synchronisation is only at the begining and end of a "job". The big efficiency gain is that you can write synchronous code in the threads/processes which is way easier to write and less likely to have bugs. It is correct that you can write the same thing using a single thread and using asynchronous I/O but that is way harder. | So you may as well | just have multiple threads all performing their A,B, and C operations | in parallel and they will naturally interleave their operations based | on the synchronisation that you use. You haven't said why that won't work in practise. | threads with mutexes cant do in the foreground and only introduces | overhead and delay in the queueing process and context switching. Sounds like premature optimization to me. Compared to any I/O you are going to have to do (all of which involves interrupts, DMA transfers and context switching), other context switching is going to be negligible. | On the second point, i cant see any reason why the VDBE design would | be a problem for what i want to do. Did you run explain? Quite simply SQLite is designed to keep the database locked for the minimum amount of time, rather than having multiple VDBE's having shared locks and doing lots of concurrent locking for longer periods of time. Fundamentally changing the design of SQLite is not a quick thing. | Taking out a table wide lock is | fine, so long as i can make changes to that table from multiple | threads using the same connection and lock. "Taking out a table lock is fine, as long as the table is not locked" :-) If you have actual empirical evidence that SQLite is a bottleneck, then there are two other approaches that may help: - - Use multiple databases (eg split on first letter of document name) and use ATTACH to have them all available at the same time - - Use the new virtual tables feature and user defined functions. You can have the code consult other tables, store up information and do almost any other workflow and locking scheme you want. You can copy batches of data from your virtual tables into the real ones, have virtual functions that look in virtual tables for updates, then fallback on original data or any other design that suits you. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFlbQUmOOfHg372QQRAn/OAJwKH6O3nWmHDRdn4ZsF+wcTaV71VQCg1zjs AmBJS5ujkKDou83gWc+Inj8= =GQLS -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Unless you have a multi processor machine or some form of parallel processing I cannot see how you can do anything other than slow your procesing by adding threading overhead. You can simplify the conception of the whole process by understanding that Sqlite writes to a single file on a single file system on a single disk drive. If you had a database striped across multiple spindles and with multiple processors you would gain by parallel processing, but I very much doubt whether you have such an environment. My suggestion is that you queue your activity to maximize throughput and use transactions or a more radical speedup. Your application looks as if you are creating some form of index which means that you can run Sqlite in its fastest and least safe mode because a crash can be repaired simply by restarting your job. Emerson Clarke wrote: Im not sure that the current situation of forced thread incompatibility is better than leaving it up to users to manage the threading. Usually it is assumed that a library is thread unsafe unless otherwise specified. Developing multithreaded applications is difficult, i wouldnt dispute that. But i do dispute the wisdom of actively making a library incompatible with threads. Sqlite is the first example of such behaviour i have seen, it deliberately fails even if the user attempts to synchronise concurrent access. Surely it would be better to design the api so that it was either fully thread safe and blocked on every call to an sqlite_ function using its own mutexes, or to simply ensure that it was process safe using shared memory. As i mentioned in a previous reply, synchronising sqlite would be no more or less difficult or error prone than synchonising a linked list. I dont see what is gained by making it not work with threads, so i assume that it is becuase of some technical reason rather than a design decision. Anyway, as another user has pointed out. You can compile the library in such a way as to allow the thread synchronisation routines to be overridden so i think i have a solution to the problem, but perhaps not an answer as to why things are the way they are. On 12/29/06, Jay Sprenkle <[EMAIL PROTECTED]> wrote: On 12/27/06, Emerson Clarke <[EMAIL PROTECTED]> wrote: > The first question is why database locking has been enabled on a per thread > basis instead of per process so that the normal methods of thread > synchronisation (mutexes, ciritcal sections) could not be used for > maintaining consistency within the application. The programmer can easily > ensure that no two statements are executed at the same time just as they > would ensure that no more than one operation on a linked list was performed > at the same time. Having watched a lot of people attempt to write thread safe code I would say this is an incorrect assumption. I think it only appears simple at a casual glance. -- The PixAddixImage Collector suite: http://groups-beta.google.com/group/pixaddix SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
This old mailing list thread better describes this proposed algorithm to contain all open/close/lock/unlock activity in a single work thread: http://www.mail-archive.com/sqlite-users@sqlite.org/msg15852.html --- Joe Wilson <[EMAIL PROTECTED]> wrote: > As you already know, it's not just Linux - it's a POSIX thing. > It's also true with FreeBSD and OpenBSD. (BSD fcntl man page below). > > It would be great if SQLite could remove this last vestage of not > being able to reliably pass connections between threads on UNIX. > One way to accomplish that is to have all low level UNIX open() and close() > calls be performed from a single thread. Regardless of whatever thread > initiates > the sqlite3_open or sqlite3_close, SQLite could populate a threadsafe > work queue with the open/close information and wait on a condition variable > for its successful completion. The same "don't close() the file until the > file ref-count is zero" trick would still have to be employed behind > the scenes. > > This interface follows the completely stupid semantics of System V and > IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated > with a file for a given process are removed when any file descriptor for > that file is closed by that process. This semantic means that applica- > tions must be aware of any files that a subroutine library may access. > For example if an application for updating the password file locks the > password file database while making the update, and then calls > getpwnam(3) to retrieve a record, the lock will be lost because > getpwnam(3) opens, reads, and closes the password database. The database > close will release all locks that the process has associated with the > database, even if the library routine never requested a lock on the data- > base. Another minor semantic problem with this interface is that locks > are not inherited by a child process created using the fork(2) system > call. The flock(2) interface has much more rational last close semantics > and allows locks to be inherited by child processes. The flock(2) system > call is recommended for applications that want to ensure the integrity of > their locks when using library routines or wish to pass locks to their > children. __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Richard, First let me say thank you for your earilier suggestion regarding the compile time options and overlaoding of the os routines. It was exactly the kind of information i was searching to for and i am looking into using that strategy at the moment. Sorry if ive managed to offend you, please excuse my enthusiasm. Its just the way i write and it doesnt necessarily reflect my intentions. I only meant to point out that it is very rare to encounter a library which behaves in the way that sqlite does. That is that the actual pod types cannot be used across multiple threads. I understand that sqlite can be used with multiple threads, but only in what i percieve to be a limited way. I cannot use one connection with multiple threads. So i cant take advantage of the caching unless i use other options and pragma directives. I also cannot perform multiple operations within a single transaction across multiple threads. This design is restrictive in that it forces very granular transactions which then limit the overall performance. Perhaps you could assist me in understanding why this design is better than say, just allowing one connection to be shared amongst threads and letting the user handle the synchronisation ? At the moment there is no way which i can wrap the sqlite libraries in an api which is shared with another database like mysql and have consistent behaviour. This is purely because of the restrictions on the way sqlite handles threads and the fact that a transaction cannot be used across multiple threads. If sqlite behaved differently then from the wrapper code there would be no difference between using sqlite and mysql except for the sql syntax. The fact that it is an embedded file based database would be completely transparent. At the moment it is not. Functions like qsort() are not only thread safe, but probably also reentrant. Im aware of the differences and i understand why things are and arent thread safe. Of course there are several algorithms other than quick sort which can be used to sort the same array in multiple threads. I dont wish to start a debate, merely to engange in a technical discussion of the design which sqlite uses since i dont understand the reasons behind it and i would very much like too. Emerson On 12/29/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: "Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Im not sure that the current situation of forced thread > incompatibility is better than leaving it up to users to manage the > threading. Usually it is assumed that a library is thread unsafe > unless otherwise specified. > > Developing multithreaded applications is difficult, i wouldnt dispute > that. But i do dispute the wisdom of actively making a library > incompatible with threads. Sqlite is the first example of such > behaviour i have seen, it deliberately fails even if the user attempts > to synchronise concurrent access. You'll probably get more help in solving your problem if you can refrain from sophomoric remarks such as the above. > > Surely it would be better to design the api so that it was either > fully thread safe and blocked on every call to an sqlite_ function > using its own mutexes, or to simply ensure that it was process safe > using shared memory. As i mentioned in a previous reply, > synchronising sqlite would be no more or less difficult or error prone > than synchonising a linked list. The threading design of SQLite is very simple and intuitive. Multiple threads can be using SQLite at the same time as long as they are not using the same database connection at the same time. This is consistent with most other utility functions. The qsort() function is threadsafe - you can have two different threads in qsort() sorting different arrays at the same time. But it does not work to sort the same array in two threads at the same time. In the same way, SQLite allows two different threads to be working with different database connections at the same time, but not the same database connection. What about that is so difficult to understand? On some older versions of Linux, SQLite is unable to pass database connections from one thread to another. But this is a problem with the threading libraries used in those older linux versions and is outside the control of SQLite. I do not think this issue comes into play here. > > I dont see what is gained by making it not work with threads, so i > assume that it is becuase of some technical reason rather than a > design decision. SQLite does work with threads. Your comment is based on a false premise. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
--- [EMAIL PROTECTED] wrote: > On some older versions of Linux, SQLite is unable to pass > database connections from one thread to another. But this > is a problem with the threading libraries used in those older > linux versions and is outside the control of SQLite. I do not > think this issue comes into play here. As you already know, it's not just Linux - it's a POSIX thing. It's also true with FreeBSD and OpenBSD. (BSD fcntl man page below). It would be great if SQLite could remove this last vestage of not being able to reliably pass connections between threads on UNIX. One way to accomplish that is to have all low level UNIX open() and close() calls be performed from a single thread. Regardless of whatever thread initiates the sqlite3_open or sqlite3_close, SQLite could populate a threadsafe work queue with the open/close information and wait on a condition variable for its successful completion. The same "don't close() the file until the file ref-count is zero" trick would still have to be employed behind the scenes. This interface follows the completely stupid semantics of System V and IEEE Std 1003.1-1988 (``POSIX.1'') that require that all locks associated with a file for a given process are removed when any file descriptor for that file is closed by that process. This semantic means that applica- tions must be aware of any files that a subroutine library may access. For example if an application for updating the password file locks the password file database while making the update, and then calls getpwnam(3) to retrieve a record, the lock will be lost because getpwnam(3) opens, reads, and closes the password database. The database close will release all locks that the process has associated with the database, even if the library routine never requested a lock on the data- base. Another minor semantic problem with this interface is that locks are not inherited by a child process created using the fork(2) system call. The flock(2) interface has much more rational last close semantics and allows locks to be inherited by child processes. The flock(2) system call is recommended for applications that want to ensure the integrity of their locks when using library routines or wish to pass locks to their children. __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > Im not sure that the current situation of forced thread > incompatibility is better than leaving it up to users to manage the > threading. Usually it is assumed that a library is thread unsafe > unless otherwise specified. > > Developing multithreaded applications is difficult, i wouldnt dispute > that. But i do dispute the wisdom of actively making a library > incompatible with threads. Sqlite is the first example of such > behaviour i have seen, it deliberately fails even if the user attempts > to synchronise concurrent access. You'll probably get more help in solving your problem if you can refrain from sophomoric remarks such as the above. > > Surely it would be better to design the api so that it was either > fully thread safe and blocked on every call to an sqlite_ function > using its own mutexes, or to simply ensure that it was process safe > using shared memory. As i mentioned in a previous reply, > synchronising sqlite would be no more or less difficult or error prone > than synchonising a linked list. The threading design of SQLite is very simple and intuitive. Multiple threads can be using SQLite at the same time as long as they are not using the same database connection at the same time. This is consistent with most other utility functions. The qsort() function is threadsafe - you can have two different threads in qsort() sorting different arrays at the same time. But it does not work to sort the same array in two threads at the same time. In the same way, SQLite allows two different threads to be working with different database connections at the same time, but not the same database connection. What about that is so difficult to understand? On some older versions of Linux, SQLite is unable to pass database connections from one thread to another. But this is a problem with the threading libraries used in those older linux versions and is outside the control of SQLite. I do not think this issue comes into play here. > > I dont see what is gained by making it not work with threads, so i > assume that it is becuase of some technical reason rather than a > design decision. SQLite does work with threads. Your comment is based on a false premise. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
--- Emerson Clarke <[EMAIL PROTECTED]> wrote: > Developing multithreaded applications is difficult, i wouldnt dispute > that. But i do dispute the wisdom of actively making a library > incompatible with threads. "Actively"? That's a bit much. There are constraints on the ability to pass SQLite connections between threads. To be safe, just use the connection on the same thread it was created and you'll be fine. > Sqlite is the first example of such > behaviour i have seen, it deliberately fails even if the user attempts > to synchronise concurrent access. > > Surely it would be better to design the api so that it was either > fully thread safe and blocked on every call to an sqlite_ function > using its own mutexes, or to simply ensure that it was process safe > using shared memory. As i mentioned in a previous reply, > synchronising sqlite would be no more or less difficult or error prone > than synchonising a linked list. You're free to serialize your calls to SQLite in your own code. Take a look at the many wrappers for SQLite for ideas. Shared memory is not the answer. It's a nightmare to manage and creates more problems than it solves. __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
In general worker threads is not an efficient solution to the problem even if parallelism is available. There is nothing to be gained by having one thread handing off A to another set of worker threads because you have to synchronise on whatever queue/list/pipe you use to transfer the information between threads anyway. So you may as well just have multiple threads all performing their A,B, and C operations in parallel and they will naturally interleave their operations based on the synchronisation that you use. Typically worker threads are useful when you want something to be done asynchronously in the background. I dont think its a suitable design for this task though because it does not do anything that multiple threads with mutexes cant do in the foreground and only introduces overhead and delay in the queueing process and context switching. On the second point, i cant see any reason why the VDBE design would be a problem for what i want to do. Taking out a table wide lock is fine, so long as i can make changes to that table from multiple threads using the same connection and lock. The only danger occurs if those threads are not synchronised and two or more attempt to modify the table at once, but of course, thats what mutexes are for. Emerson On 12/29/06, Roger Binns <[EMAIL PROTECTED]> wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | For architectural reasons the above steps must be performed in that | order. This means that operations cant be separated or queued up in | the way that you suggested. Each step is dependent on the previous | step. I was looking for parallelism expecting that indexing one document has no dependencies on other documents. Lets say that you have 3 documents A, B and C. Is it impossible to index B until after A's indexing is done? If that is the case then your indexing process is imposing serialization and there is nothing SQLite can do about that. If the indexing of A, B & C can be done in parallel, then you can have the indexing workers send updates to a single thread/process. That can aggregate information and then dump it all in a single transaction quickly. | Of course i know the simple answer, which is that the current api does | not support this. But im wondering why, and if there are any other | ways to achieve the desired performance. It is actually the internal design of SQLite that doesn't support this. ~ (Look at VDBE instructions for queries sometime). SQLite locks the whole database, rather than tables or rows. But the locks are held for the shortest time period possible. There are other database systems out there that have finer granularity locking, but that is more complicated. ~ There are also potential patent issues and SQLite deliberately uses old techniques to ensure no patent violation. SQLite's techniques have to work without centrally coordinating mechanism and within the confines of lowest common denominator functionality available on a variety of operating systems. Quite frankly if you cannot change the way you work with your data set then you may want to try a different database system such as Postgres. Of course that internally solves the problems by having a single process and doesn't have to worry about any other process/thread stepping on its toes. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFlLG+mOOfHg372QQRAn+sAKCrImPgr0Ex1xo3XOXgWg0E7t3wWQCgtAAQ ZB5bs3fFntT4u3rABIGcOik= =c2pB -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Im not sure that the current situation of forced thread incompatibility is better than leaving it up to users to manage the threading. Usually it is assumed that a library is thread unsafe unless otherwise specified. Developing multithreaded applications is difficult, i wouldnt dispute that. But i do dispute the wisdom of actively making a library incompatible with threads. Sqlite is the first example of such behaviour i have seen, it deliberately fails even if the user attempts to synchronise concurrent access. Surely it would be better to design the api so that it was either fully thread safe and blocked on every call to an sqlite_ function using its own mutexes, or to simply ensure that it was process safe using shared memory. As i mentioned in a previous reply, synchronising sqlite would be no more or less difficult or error prone than synchonising a linked list. I dont see what is gained by making it not work with threads, so i assume that it is becuase of some technical reason rather than a design decision. Anyway, as another user has pointed out. You can compile the library in such a way as to allow the thread synchronisation routines to be overridden so i think i have a solution to the problem, but perhaps not an answer as to why things are the way they are. On 12/29/06, Jay Sprenkle <[EMAIL PROTECTED]> wrote: On 12/27/06, Emerson Clarke <[EMAIL PROTECTED]> wrote: > The first question is why database locking has been enabled on a per thread > basis instead of per process so that the normal methods of thread > synchronisation (mutexes, ciritcal sections) could not be used for > maintaining consistency within the application. The programmer can easily > ensure that no two statements are executed at the same time just as they > would ensure that no more than one operation on a linked list was performed > at the same time. Having watched a lot of people attempt to write thread safe code I would say this is an incorrect assumption. I think it only appears simple at a casual glance. -- The PixAddixImage Collector suite: http://groups-beta.google.com/group/pixaddix SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | For architectural reasons the above steps must be performed in that | order. This means that operations cant be separated or queued up in | the way that you suggested. Each step is dependent on the previous | step. I was looking for parallelism expecting that indexing one document has no dependencies on other documents. Lets say that you have 3 documents A, B and C. Is it impossible to index B until after A's indexing is done? If that is the case then your indexing process is imposing serialization and there is nothing SQLite can do about that. If the indexing of A, B & C can be done in parallel, then you can have the indexing workers send updates to a single thread/process. That can aggregate information and then dump it all in a single transaction quickly. | Of course i know the simple answer, which is that the current api does | not support this. But im wondering why, and if there are any other | ways to achieve the desired performance. It is actually the internal design of SQLite that doesn't support this. ~ (Look at VDBE instructions for queries sometime). SQLite locks the whole database, rather than tables or rows. But the locks are held for the shortest time period possible. There are other database systems out there that have finer granularity locking, but that is more complicated. ~ There are also potential patent issues and SQLite deliberately uses old techniques to ensure no patent violation. SQLite's techniques have to work without centrally coordinating mechanism and within the confines of lowest common denominator functionality available on a variety of operating systems. Quite frankly if you cannot change the way you work with your data set then you may want to try a different database system such as Postgres. Of course that internally solves the problems by having a single process and doesn't have to worry about any other process/thread stepping on its toes. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFlLG+mOOfHg372QQRAn+sAKCrImPgr0Ex1xo3XOXgWg0E7t3wWQCgtAAQ ZB5bs3fFntT4u3rABIGcOik= =c2pB -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
On 12/27/06, Emerson Clarke <[EMAIL PROTECTED]> wrote: The first question is why database locking has been enabled on a per thread basis instead of per process so that the normal methods of thread synchronisation (mutexes, ciritcal sections) could not be used for maintaining consistency within the application. The programmer can easily ensure that no two statements are executed at the same time just as they would ensure that no more than one operation on a linked list was performed at the same time. Having watched a lot of people attempt to write thread safe code I would say this is an incorrect assumption. I think it only appears simple at a casual glance. -- The PixAddixImage Collector suite: http://groups-beta.google.com/group/pixaddix SqliteImporter and SqliteReplicator: Command line utilities for Sqlite http://www.reddawn.net/~jsprenkl/Sqlite Cthulhu Bucks! http://www.cthulhubucks.com - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
The indexing process works like this. 1.) Open a document and parse its contents. 2.) Look up records in the first database based on the contents of the document, updating records where appropriate and inserting new ones. 3.) Transforming the document based on what was obtained from the first database. 4.) Create a filesystem structure in the form of a folder and two or more files. 5.) Look up some more records in a second database, updating and inserting as necessary. For architectural reasons the above steps must be performed in that order. This means that operations cant be separated or queued up in the way that you suggested. Each step is dependent on the previous step. But by having multiple threads and using synchronisation around the database operations you can ensure that multiple database operations are always pending. Each thread will be at a different stage in the pipeline, but a few will always be ready to perform a database operation so the idea is to maximise the throughput. As you said, the more rows per transaction, the more rows per second. Which brings us back to the original issue. Why cant i have multiple threads all using the same connection within a single transaction ? Of course i know the simple answer, which is that the current api does not support this. But im wondering why, and if there are any other ways to achieve the desired performance. Emerson On 12/28/06, Roger Binns <[EMAIL PROTECTED]> wrote: Emerson Clarke wrote: > The idea is that because i am accessing two databases, and doing > several file system operations per document, there should be a large > gain by using many threads. There is no actual indexing process, the > whole structure is the index, but if anything the database operations > take the most time. The filesystem operations have a very small > amount of overhead. That is all unclear from your original description. Aren't you trying to "index" several million documents and doesn't the process of indexing consist of two parts? 1: Open the document, parse it in various ways, build index data, close it 2: Add a row to a SQLite database My point was that #1 is way more work than #2, so you can run #1's in multiple threads/processes and do #2 in a single thread using a queue/pipe object for communication. On the other hand, if #1 is way less work than #2 then you will be bound by the speed at which you decide to make transactions in SQLite. A 7200 rpm disk limits you to 60 transactions a second. The more rows per transaction, the more rows per second. Roger - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Emerson, Is the Database on the same disk as the rest of the file operations? If so is it possible that you are I/O bound and causing seek issues due to i/o access patterns? Take a look at the test_server.c code in the sqlite/src directory. I used that as a basis to build a custom library that opens a single DB and then allows multiple threads to access. The nice thing about this architecture is that the threads will all get to write and no writer starvation. But all write operations an single threaded. The test code I ran creates any number of threads and performs the following in each thread: outer loop 1- 10 begin txn loop 1 -1000 insert record (using modulo for data so data is unique amongst threads) end loop commit prepare statement loop 1 - 1000 Select data (using modulo) end loop close statement begin transaction loop 1 - 1000 delete data, using same modulo end loop end main loop timinng (seconds) Thread count 1.6651 (transaction size is 1000) 1.6352 (transcaction size is 500) 3.094 4( txn size is 250 ) 5.571 8(txn size is 125 ) 7.82216(txn size is 62.5) so as the number of threads increase the overall time it takes to insert/select/delete a fixed set of data increases using this architecture. This is because all threads are serialized upon inserts/deletes and are contending on a single writer mutex. So in this particular case fewer threads actually improves performance. Hope this helps, Ken Emerson Clarke <[EMAIL PROTECTED]> wrote: Roger, Thanks for the suggestions. I think using a worker thread and a queue would be equivalent to just running a single thread since it effectively makes the database operations synchronous. Although i can see what your driving at regarding the transactions every n records. The idea is that because i am accessing two databases, and doing several file system operations per document, there should be a large gain by using many threads. There is no actual indexing process, the whole structure is the index, but if anything the database operations take the most time. The filesystem operations have a very small amount of overhead. I have tried the page size pragma setting already, though i read that it is dependent on the cluster size of the particular filesystem that you are running on. Since i only have one connection to each database from each thread i dont think i would benefit from the caching. Im not quite sure why you would ever have more than one connection to the database from a single thread ? The api that i use more or less ensures that under most circumstances there is only one connection. Emerson - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Emerson Clarke wrote: The idea is that because i am accessing two databases, and doing several file system operations per document, there should be a large gain by using many threads. There is no actual indexing process, the whole structure is the index, but if anything the database operations take the most time. The filesystem operations have a very small amount of overhead. That is all unclear from your original description. Aren't you trying to "index" several million documents and doesn't the process of indexing consist of two parts? 1: Open the document, parse it in various ways, build index data, close it 2: Add a row to a SQLite database My point was that #1 is way more work than #2, so you can run #1's in multiple threads/processes and do #2 in a single thread using a queue/pipe object for communication. On the other hand, if #1 is way less work than #2 then you will be bound by the speed at which you decide to make transactions in SQLite. A 7200 rpm disk limits you to 60 transactions a second. The more rows per transaction, the more rows per second. Roger - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
I am curious as to how multiple threads would perform faster inserts into an Sqlite database, which is a single file plus the journal. Are you using a multiple processor machine? Emerson Clarke wrote: Roger, Thanks for the suggestions. I think using a worker thread and a queue would be equivalent to just running a single thread since it effectively makes the database operations synchronous. Although i can see what your driving at regarding the transactions every n records. The idea is that because i am accessing two databases, and doing several file system operations per document, there should be a large gain by using many threads. There is no actual indexing process, the whole structure is the index, but if anything the database operations take the most time. The filesystem operations have a very small amount of overhead. I have tried the page size pragma setting already, though i read that it is dependent on the cluster size of the particular filesystem that you are running on. Since i only have one connection to each database from each thread i dont think i would benefit from the caching. Im not quite sure why you would ever have more than one connection to the database from a single thread ? The api that i use more or less ensures that under most circumstances there is only one connection. Emerson - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
Roger, Thanks for the suggestions. I think using a worker thread and a queue would be equivalent to just running a single thread since it effectively makes the database operations synchronous. Although i can see what your driving at regarding the transactions every n records. The idea is that because i am accessing two databases, and doing several file system operations per document, there should be a large gain by using many threads. There is no actual indexing process, the whole structure is the index, but if anything the database operations take the most time. The filesystem operations have a very small amount of overhead. I have tried the page size pragma setting already, though i read that it is dependent on the cluster size of the particular filesystem that you are running on. Since i only have one connection to each database from each thread i dont think i would benefit from the caching. Im not quite sure why you would ever have more than one connection to the database from a single thread ? The api that i use more or less ensures that under most circumstances there is only one connection. Emerson - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
"Emerson Clarke" <[EMAIL PROTECTED]> wrote: > > The first question is why database locking has been enabled on a per thread > basis instead of per process so that the normal methods of thread > synchronisation (mutexes, ciritcal sections) could not be used for > maintaining consistency within the application. Most SQLite users have multiple processes accessing the database at the same time. Hence, thread synchronization will not work in the common case. But you can redefine the locking logic of SQLite at run-time to do whatever you want. You have to compile with -DSQLITE_ENABLE_REDEF_IO=1. Once you have done that, there is a global variable named "sqlite3Os" of type "struct sqlite3OsVtbl" that contains a bunch of pointers to routines that handle the operating-specific I/O for SQLite. You can substitute alternative routines that do thread synchronization. > > And the second question is simply how hard is it to support the multiple > insert syntax discussed above, or is it simply a case of there being nothing > to be gained The multiple insert syntax provides no performance gain over the use of prepared statements. -- D. Richard Hipp <[EMAIL PROTECTED]> - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] sqlite performance, locking & threading
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Emerson Clarke wrote: | modified the api to ensure that each thread was given its own sqlite3 * | structure. I would assume that the actual indexing is the expensive part since it involves a lot of I/O (SQLite page size is 1KB). Why don't you do this with worker threads (better on Windows) or worker processes (better on Unix)? Have a queue object that filenames to index are put into. Then have each worker thread or worker process take a filename from the queue, do whatever work it is they need to do and put their results into a result queue. You can have one thread then taking those results from the result queue and doing a commit every n results or n seconds. It is possible for multiple connections in the same thread to share a cache as well as having multiple queries in progress. See http://sqlite.org/capi3ref.html#sqlite3_enable_shared_cache and http://www.sqlite.org/sharedcache.html Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFFk2o0mOOfHg372QQRAid0AJ0Xq6pMsbpCMZVr7dBUEPrElhwtAQCeI3aR AcoNPTAv0C9kdEXuNJio28I= =gQr8 -END PGP SIGNATURE- - To unsubscribe, send email to [EMAIL PROTECTED] -
[sqlite] sqlite performance, locking & threading
I am developing a multithreaded C++ library which uses sqlite as an embedded database alongside the mysql client as a simple sql api. Both databases share a common interface which supports statements, prepared statements, recordsets, records, and transactions. As part of some research and testing i have been playing around with creating a small filesystem based search engine index. Each document within the index is stored as a file in a folder tree structure, but the contents of the documents are catalogued in two sqlite databases at the root of the folder tree. As each document gets added to the index, either or both of the sqlite databases are updated. The goal is to be able to index something in the order of 10 to 100 documents a second. This will allow indexing of around 8 million documents per day on a single machine. As such i have been investigating the ways in which i can get the most performance out of sqlite. I began by wrapping the entire indexing process in transactions, this immediately resulted in a huge performance gain but the entire process was still very synchronous, only indexing one document at a time. The next logical step was to add multiple threads to make better use of the operating system and interleave the various indexing tasks. At this point i ran into the limitations which sqlite places on threading, and after several crashes and error messages concerning locks and routines called out of sequence i modified the api to ensure that each thread was given its own sqlite3 * structure. However in doing this i was penalised by no longer being able to take advantage of the application wide transaction (or at least that was my understanding), so i then had to increase the granularity of the transactions to be within the normal thread synchronisation blocks. This decreased the performance to such a degree that it counteracted any gains which could be made by using multiple threads. I also attempted to improve performance by using multiple inserts but discovered that sqlite does not support the same multiple sets of values that mysql does, "insert into table(a,b) values (1,2),(3,4),(5,6)". I then tried to use prepared statements but suffered several scary looking crashes deep within the sqlite code responsible for closing file handles which i have yet to debug properly. I then discovered the pragma directives such as "pragma synchronise=off" and tried those, but did not get the performance gain that i was looking for either. My feelings are that the major gain that i am missing is being able to have an application wide transaction and multiple threads and also being able to execute multiple inserts using the values syntax or something similar. Having read the information on file locking and concurrency ( http://www.sqlite.org/lockingv3.html) and browsed the wiki's and various other online forums i have some questions regarding the above situation. The first question is why database locking has been enabled on a per thread basis instead of per process so that the normal methods of thread synchronisation (mutexes, ciritcal sections) could not be used for maintaining consistency within the application. The programmer can easily ensure that no two statements are executed at the same time just as they would ensure that no more than one operation on a linked list was performed at the same time. I read somewhere that there is a technical reason for this behaviour in sqlite, such as the fcntl() function taking out per thread locks. But i dont understand why file locking is used at all. Given that all platforms, and indeed file systems, have different locking semantics, would it not be easier to either use a .lock file (in a similar way to the ,journal file) if you really needed to support concurrent access from other machines. Or just use a much faster IPC mechanism like shared memory and assume that all access will be from multiple processes on the same machine (a reasonable assumption for an embedded database). And the second question is simply how hard is it to support the multiple insert syntax discussed above, or is it simply a case of there being nothing to be gained here because the normal insert statement is already fast enough ? Surely not... Also are any other optimisation techniques which you can suggest which i might be missing ? Apologies for the length, hopefully im not covering well trodden ground here although i suspect i am. Emerson