Re: AW: [sqlite] sqlite performance, locking & threading
> not to spark a debate Although the majority of this thread is as clear as mud, it is still interesting, even for simple VBA programmers like me that have no chance (maybe via a VB6 ActiveX exe) to use multi-threading. RBS > Emerson, one posts to a forum like this to get help and other ideas, not > to spark a debate. Many talented people gave you some of their time to > help you solve your problem and one in particular gave you a well > conceived and executed piece of software free of charge. Appreciate > their charity. > > If you have some insights which have escaped the rest of us, implement > them and present the results to the world and dazzle us. > > BTW, a major advantage of Open Source software is that you do not need > to have people explain it to you. You have the source and that explains > everything. That is particularly so with Sqlite, which is clearly > written and consequently the source reads like a book. A few minutes > with the source and grep and you have your answers. > > Emerson Clarke wrote: >> John, >> >> Um, alright then... >> >> But i think your preaching to the converted, simplifying things is >> what i always try to do. And not just when theres a problem >> >> If you followed the thread fully you would realise that there was >> never a problem with my design, though that didnt stop many people >> from chirping in and offering suggestions. >> >> The problem i had was with sqlite not being compatible with the simple >> design that i wanted. I did try several alternate designs, but only >> as a way of working around the problem i had with sqlite. It took a >> long time but eventually i managed to get someone to explain why >> sqlite had that particular problem, and i was able to modify the >> sqlite source to resolve the issue. >> >> Unfortunately no one has yet commented on my solution, or the problem >> which it addresses. Basically sqlite has thread safety checking >> routines which work a little like mutexe's. Every time you enter a >> bit of code which is potentially thread unsafe it sets a magic number, >> then resets it when it comes out. This is an attempt to detect when >> two threads are accessing the same bit of code at the same time. >> >> Clearly its not 100% reliable, and is subject to all kinds of thread >> races, but it does provide some measure of protection. Unfortunately >> though, the way it has been coded, an unbalanced safety check is >> performed in the sqlite3_step() function. >> >> This is equivalent to entering a mutex but never leaving, which causes >> deadlock in a multithreaded program. Only in this situation sqlite >> throws a misuse error any time two or more threads use sqlite3_step() >> at the same time, even if those threads are synchronised and perfectly >> safe. >> >> The easy solution is to disable the safety checks, the propper >> solution is to balance out the checks in sqlite3_step() so that users >> who actually wish to use sqlite in a multithreaded program are free to >> synchronise access to the api without error and there is still a >> reasonable level of safety checking for users who do not synchronise >> properly. >> >> >> Emerson >> >> On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: >> >>> Work on turning "reasonable" into "adequate" or "good" and it will help >>> you get an intuitive feel for the design of programs such as yours. >>> Then your programs will be simple, fast and robust, as Einstein >>> counselled - "Make it as simple a possible, but no simpler". >>> >>> I also suggest that you take Niklaus Wirth's advice and when you run >>> into a difficulty backtrack your work and scrap everything until you >>> reach a point where there are no problems and start again from that >>> point taking a different approach having learned a lesson from your >>> previous attempt. >>> >>> By the way, I doubt whether you are using a modern operating system, it >>> is most likely to be old technology like Windows or Linux. Neither >>> supports much in the way of parallelism. >>> >> >> - >> >> To unsubscribe, send email to [EMAIL PROTECTED] >> - >> >> > > > - > To unsubscribe, send email to [EMAIL PROTECTED] > - > > > - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
Can we please stop this thread? John Stanton wrote: Emerson, one posts to a forum like this to get help and other ideas, not to spark a debate. Many talented people gave you some of their time to help you solve your problem and one in particular gave you a well conceived and executed piece of software free of charge. Appreciate their charity. If you have some insights which have escaped the rest of us, implement them and present the results to the world and dazzle us. BTW, a major advantage of Open Source software is that you do not need to have people explain it to you. You have the source and that explains everything. That is particularly so with Sqlite, which is clearly written and consequently the source reads like a book. A few minutes with the source and grep and you have your answers. Emerson Clarke wrote: John, Um, alright then... But i think your preaching to the converted, simplifying things is what i always try to do. And not just when theres a problem If you followed the thread fully you would realise that there was never a problem with my design, though that didnt stop many people from chirping in and offering suggestions. The problem i had was with sqlite not being compatible with the simple design that i wanted. I did try several alternate designs, but only as a way of working around the problem i had with sqlite. It took a long time but eventually i managed to get someone to explain why sqlite had that particular problem, and i was able to modify the sqlite source to resolve the issue. Unfortunately no one has yet commented on my solution, or the problem which it addresses. Basically sqlite has thread safety checking routines which work a little like mutexe's. Every time you enter a bit of code which is potentially thread unsafe it sets a magic number, then resets it when it comes out. This is an attempt to detect when two threads are accessing the same bit of code at the same time. Clearly its not 100% reliable, and is subject to all kinds of thread races, but it does provide some measure of protection. Unfortunately though, the way it has been coded, an unbalanced safety check is performed in the sqlite3_step() function. This is equivalent to entering a mutex but never leaving, which causes deadlock in a multithreaded program. Only in this situation sqlite throws a misuse error any time two or more threads use sqlite3_step() at the same time, even if those threads are synchronised and perfectly safe. The easy solution is to disable the safety checks, the propper solution is to balance out the checks in sqlite3_step() so that users who actually wish to use sqlite in a multithreaded program are free to synchronise access to the api without error and there is still a reasonable level of safety checking for users who do not synchronise properly. Emerson On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
Emerson, one posts to a forum like this to get help and other ideas, not to spark a debate. Many talented people gave you some of their time to help you solve your problem and one in particular gave you a well conceived and executed piece of software free of charge. Appreciate their charity. If you have some insights which have escaped the rest of us, implement them and present the results to the world and dazzle us. BTW, a major advantage of Open Source software is that you do not need to have people explain it to you. You have the source and that explains everything. That is particularly so with Sqlite, which is clearly written and consequently the source reads like a book. A few minutes with the source and grep and you have your answers. Emerson Clarke wrote: John, Um, alright then... But i think your preaching to the converted, simplifying things is what i always try to do. And not just when theres a problem If you followed the thread fully you would realise that there was never a problem with my design, though that didnt stop many people from chirping in and offering suggestions. The problem i had was with sqlite not being compatible with the simple design that i wanted. I did try several alternate designs, but only as a way of working around the problem i had with sqlite. It took a long time but eventually i managed to get someone to explain why sqlite had that particular problem, and i was able to modify the sqlite source to resolve the issue. Unfortunately no one has yet commented on my solution, or the problem which it addresses. Basically sqlite has thread safety checking routines which work a little like mutexe's. Every time you enter a bit of code which is potentially thread unsafe it sets a magic number, then resets it when it comes out. This is an attempt to detect when two threads are accessing the same bit of code at the same time. Clearly its not 100% reliable, and is subject to all kinds of thread races, but it does provide some measure of protection. Unfortunately though, the way it has been coded, an unbalanced safety check is performed in the sqlite3_step() function. This is equivalent to entering a mutex but never leaving, which causes deadlock in a multithreaded program. Only in this situation sqlite throws a misuse error any time two or more threads use sqlite3_step() at the same time, even if those threads are synchronised and perfectly safe. The easy solution is to disable the safety checks, the propper solution is to balance out the checks in sqlite3_step() so that users who actually wish to use sqlite in a multithreaded program are free to synchronise access to the api without error and there is still a reasonable level of safety checking for users who do not synchronise properly. Emerson On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. - To unsubscribe, send email to [EMAIL PROTECTED] - - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
John, Um, alright then... But i think your preaching to the converted, simplifying things is what i always try to do. And not just when theres a problem If you followed the thread fully you would realise that there was never a problem with my design, though that didnt stop many people from chirping in and offering suggestions. The problem i had was with sqlite not being compatible with the simple design that i wanted. I did try several alternate designs, but only as a way of working around the problem i had with sqlite. It took a long time but eventually i managed to get someone to explain why sqlite had that particular problem, and i was able to modify the sqlite source to resolve the issue. Unfortunately no one has yet commented on my solution, or the problem which it addresses. Basically sqlite has thread safety checking routines which work a little like mutexe's. Every time you enter a bit of code which is potentially thread unsafe it sets a magic number, then resets it when it comes out. This is an attempt to detect when two threads are accessing the same bit of code at the same time. Clearly its not 100% reliable, and is subject to all kinds of thread races, but it does provide some measure of protection. Unfortunately though, the way it has been coded, an unbalanced safety check is performed in the sqlite3_step() function. This is equivalent to entering a mutex but never leaving, which causes deadlock in a multithreaded program. Only in this situation sqlite throws a misuse error any time two or more threads use sqlite3_step() at the same time, even if those threads are synchronised and perfectly safe. The easy solution is to disable the safety checks, the propper solution is to balance out the checks in sqlite3_step() so that users who actually wish to use sqlite in a multithreaded program are free to synchronise access to the api without error and there is still a reasonable level of safety checking for users who do not synchronise properly. Emerson On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote: Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. - To unsubscribe, send email to [EMAIL PROTECTED] -
Re: AW: [sqlite] sqlite performance, locking & threading
Work on turning "reasonable" into "adequate" or "good" and it will help you get an intuitive feel for the design of programs such as yours. Then your programs will be simple, fast and robust, as Einstein counselled - "Make it as simple a possible, but no simpler". I also suggest that you take Niklaus Wirth's advice and when you run into a difficulty backtrack your work and scrap everything until you reach a point where there are no problems and start again from that point taking a different approach having learned a lesson from your previous attempt. By the way, I doubt whether you are using a modern operating system, it is most likely to be old technology like Windows or Linux. Neither supports much in the way of parallelism. Emerson Clarke wrote: John, I have a reasonable understanding of the PC architecture, and more appropriately the architecture which the operating system presents to software. The PC may be a serial device, but a modern operating system with its multitasking shcheduler attempts to emulate a non serial environment. It devotes a certain amount of time to each thread and then moves on. Wether or not you are working on a highly pyshically parallel architecture or not makes no difference, the design principles are the same. You should still build your software so that it is capable of taking advantage of the environment that the operating system presents. It is the operating system you should be designing for, not the hardware... As it happens, the newest generation of PC's are all multi-core, and i have been working on multi processor environments for many years. Emerson On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote: If Emeroson intuitively understood the essential architecture of the PC he is using he would not be having difficulty with his concept of how to use it. It is essentially a serial device, multi-tasking device and parallelism in the forms of threading and multi processing is a sophistication added with a high overhead. I recollect an insightful CS professor impressing on his his students the concept by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller. A machine designed from first principles to manage parrallel processing would be very different. Michael Ruck wrote: > Hi Emerson, > > I just hope you don't reinvent the wheel ;) I haven't yet had the need to > index things the way you describe it. May be I should take that as one of my > next pet projects to get a handle on this type of task. > > The problem as I see it is basically, that any way you design this: If the > storage tasks take 90% of your indexing time, then any parallelization may > be a waste of effort. Even if you use a synchronization object you're > essentially serializing things in a (complicated) multithreaded way... > > As far as static initialization: That it occurs before main() and is out of > your control was the point I was getting across. That's why I wrote that > this type of initialization should be avoided, unless there's no better > design for it. > > Michael > > -Ursprüngliche Nachricht- > Von: Emerson Clarke [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 3. Januar 2007 20:31 > An: sqlite-users@sqlite.org > Betreff: Re: [sqlite] sqlite performance, locking & threading > > Michael, > > Thanks for the advice. During the indexing process i need to select and > optionally insert records into a table so i cant ignore the outcomes. > > Basically the indexing process does compression, so for each document it > inserts words into a table and looks up keys. Every word in the document > gets swapped with a key, and new keys are inserted as needed. > > There are some problems with splitting the work up in a different way as you > suggested. I would either end up with a lot of queues or i would have to > stagger the work so that the entire data set gets processed in stages which > doesnt scale very well and isnt particularly fault tollerant. When building > an index, you want the structure to be built up progressively, so that you > can pause the process and resume it later on whilst still having useful > results. > > I would be worried that in a queued design, the overhead and bottlenecks > caused by the buffering, message passing, and context switching would reduce > the performance to that of a single thread. > Especially since the database operations represent 90% of the work, all you > would really be doing is attempting to serialise things in a multithreaded > way. > > Im sure having worked on multithreaded systems you appreciate that sometimes > simple designs are better, and i think i have a pretty good handle on what > it is that im trying to do. > > You never have control over static initialisation, it happens before main(). > If i was writing very specific code to suit just this situation then maybe > as you say i wouldnt need to worry about it. But i
Re: AW: [sqlite] sqlite performance, locking & threading
John, I have a reasonable understanding of the PC architecture, and more appropriately the architecture which the operating system presents to software. The PC may be a serial device, but a modern operating system with its multitasking shcheduler attempts to emulate a non serial environment. It devotes a certain amount of time to each thread and then moves on. Wether or not you are working on a highly pyshically parallel architecture or not makes no difference, the design principles are the same. You should still build your software so that it is capable of taking advantage of the environment that the operating system presents. It is the operating system you should be designing for, not the hardware... As it happens, the newest generation of PC's are all multi-core, and i have been working on multi processor environments for many years. Emerson On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote: If Emeroson intuitively understood the essential architecture of the PC he is using he would not be having difficulty with his concept of how to use it. It is essentially a serial device, multi-tasking device and parallelism in the forms of threading and multi processing is a sophistication added with a high overhead. I recollect an insightful CS professor impressing on his his students the concept by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller. A machine designed from first principles to manage parrallel processing would be very different. Michael Ruck wrote: > Hi Emerson, > > I just hope you don't reinvent the wheel ;) I haven't yet had the need to > index things the way you describe it. May be I should take that as one of my > next pet projects to get a handle on this type of task. > > The problem as I see it is basically, that any way you design this: If the > storage tasks take 90% of your indexing time, then any parallelization may > be a waste of effort. Even if you use a synchronization object you're > essentially serializing things in a (complicated) multithreaded way... > > As far as static initialization: That it occurs before main() and is out of > your control was the point I was getting across. That's why I wrote that > this type of initialization should be avoided, unless there's no better > design for it. > > Michael > > -Ursprüngliche Nachricht- > Von: Emerson Clarke [mailto:[EMAIL PROTECTED] > Gesendet: Mittwoch, 3. Januar 2007 20:31 > An: sqlite-users@sqlite.org > Betreff: Re: [sqlite] sqlite performance, locking & threading > > Michael, > > Thanks for the advice. During the indexing process i need to select and > optionally insert records into a table so i cant ignore the outcomes. > > Basically the indexing process does compression, so for each document it > inserts words into a table and looks up keys. Every word in the document > gets swapped with a key, and new keys are inserted as needed. > > There are some problems with splitting the work up in a different way as you > suggested. I would either end up with a lot of queues or i would have to > stagger the work so that the entire data set gets processed in stages which > doesnt scale very well and isnt particularly fault tollerant. When building > an index, you want the structure to be built up progressively, so that you > can pause the process and resume it later on whilst still having useful > results. > > I would be worried that in a queued design, the overhead and bottlenecks > caused by the buffering, message passing, and context switching would reduce > the performance to that of a single thread. > Especially since the database operations represent 90% of the work, all you > would really be doing is attempting to serialise things in a multithreaded > way. > > Im sure having worked on multithreaded systems you appreciate that sometimes > simple designs are better, and i think i have a pretty good handle on what > it is that im trying to do. > > You never have control over static initialisation, it happens before main(). > If i was writing very specific code to suit just this situation then maybe > as you say i wouldnt need to worry about it. But im also writing a database > api, and that api is used for many different things. My considderations are > not just for this one problem, but also for the best general way to code the > api so that it is safe and efficient in all circumstances. So far the > client/server design is the only way i can achieve true thread safety. > > If i could work out why sqlite3_step() causes problems across multiple > threads i could probably make things a little faster and i could do away > with the need for a client/server design. > > Emerson > > > On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: > >>Emerson, >> >>Now I understand your current implementation. You seemingly only >>partially split up the work in your code. I'd schedule the database >>operation and not wait on the outcome, but start on the ne
Re: AW: [sqlite] sqlite performance, locking & threading
If Emeroson intuitively understood the essential architecture of the PC he is using he would not be having difficulty with his concept of how to use it. It is essentially a serial device, multi-tasking device and parallelism in the forms of threading and multi processing is a sophistication added with a high overhead. I recollect an insightful CS professor impressing on his his students the concept by explaining to them that the machines on their desks were descended from a device invented to be a gas pump controller. A machine designed from first principles to manage parrallel processing would be very different. Michael Ruck wrote: Hi Emerson, I just hope you don't reinvent the wheel ;) I haven't yet had the need to index things the way you describe it. May be I should take that as one of my next pet projects to get a handle on this type of task. The problem as I see it is basically, that any way you design this: If the storage tasks take 90% of your indexing time, then any parallelization may be a waste of effort. Even if you use a synchronization object you're essentially serializing things in a (complicated) multithreaded way... As far as static initialization: That it occurs before main() and is out of your control was the point I was getting across. That's why I wrote that this type of initialization should be avoided, unless there's no better design for it. Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 20:31 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] sqlite performance, locking & threading Michael, Thanks for the advice. During the indexing process i need to select and optionally insert records into a table so i cant ignore the outcomes. Basically the indexing process does compression, so for each document it inserts words into a table and looks up keys. Every word in the document gets swapped with a key, and new keys are inserted as needed. There are some problems with splitting the work up in a different way as you suggested. I would either end up with a lot of queues or i would have to stagger the work so that the entire data set gets processed in stages which doesnt scale very well and isnt particularly fault tollerant. When building an index, you want the structure to be built up progressively, so that you can pause the process and resume it later on whilst still having useful results. I would be worried that in a queued design, the overhead and bottlenecks caused by the buffering, message passing, and context switching would reduce the performance to that of a single thread. Especially since the database operations represent 90% of the work, all you would really be doing is attempting to serialise things in a multithreaded way. Im sure having worked on multithreaded systems you appreciate that sometimes simple designs are better, and i think i have a pretty good handle on what it is that im trying to do. You never have control over static initialisation, it happens before main(). If i was writing very specific code to suit just this situation then maybe as you say i wouldnt need to worry about it. But im also writing a database api, and that api is used for many different things. My considderations are not just for this one problem, but also for the best general way to code the api so that it is safe and efficient in all circumstances. So far the client/server design is the only way i can achieve true thread safety. If i could work out why sqlite3_step() causes problems across multiple threads i could probably make things a little faster and i could do away with the need for a client/server design. Emerson On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote: Emerson, Now I understand your current implementation. You seemingly only partially split up the work in your code. I'd schedule the database operation and not wait on the outcome, but start on the next task. When the database finishes and has retrieved its result, schedule some work package on a third thread, which only processes the results etc. Split up the work in to repetitive, non blocking tasks. Use multiple queues and dedicated threads for parts of the operation or thread pools, which process queues in parallel if possible. From what I can tell you're already half way there. I still don't see your static initialization problem, but that's another story. Actually I'd avoid using static initialization or static (singleton) instances, unless the design really requires it. Someone must control startup of the entire process, have that one (probably main/WinMain) take care that the work queues are available. Afterwards the order of thread starts doesn't matter... Actually it is non-deterministic anyway (unless you serialize this yourself.) Michael -Ursprüngliche Nachricht- Von: Emerson Clarke [mailto:[EMAIL PROTECTED] Gesendet: Mittwoch, 3. Januar 2007 15:14 An: sqlite-users@sqlite.org B