Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread bartsmissaert
> not to spark a debate

Although the majority of this thread is as clear as mud, it is still
interesting, even for simple VBA programmers like me that have no chance
(maybe via a VB6 ActiveX exe) to use multi-threading.

RBS


> Emerson, one posts to a forum like this to get help and other ideas, not
> to spark a debate.  Many talented people gave you some of their time to
> help you solve your problem and one in particular gave you a well
> conceived and executed piece of software free of charge.  Appreciate
> their charity.
>
> If you have some insights which have escaped the rest of us, implement
> them and present the results to the world and dazzle us.
>
> BTW, a major advantage of Open Source software is that you do not need
> to have people explain it to you.  You have the source and that explains
> everything.  That is particularly so with Sqlite, which is clearly
> written and consequently the source reads like a book.  A few minutes
> with the source and grep and you have your answers.
>
> Emerson Clarke wrote:
>> John,
>>
>> Um, alright then...
>>
>> But i think your preaching to the converted, simplifying things is
>> what i always try to do.  And not just when theres a problem
>>
>> If you followed the thread fully you would realise that there was
>> never a problem with my design, though that didnt stop many people
>> from chirping in and offering suggestions.
>>
>> The problem i had was with sqlite not being compatible with the simple
>> design that i wanted.  I did try several alternate designs, but only
>> as a way of working around the problem i had with sqlite.  It took a
>> long time but eventually i managed to get someone to explain why
>> sqlite had that particular problem, and i was able to modify the
>> sqlite source to resolve the issue.
>>
>> Unfortunately no one has yet commented on my solution, or the problem
>> which it addresses.   Basically sqlite has thread safety checking
>> routines which work a little like mutexe's.  Every time you enter a
>> bit of code which is potentially thread unsafe it sets a magic number,
>> then resets it when it comes out.  This is an attempt to detect when
>> two threads are accessing the same bit of code at the same time.
>>
>> Clearly its not 100% reliable, and is subject to all kinds of thread
>> races, but it does provide some measure of protection.  Unfortunately
>> though, the way it has been coded, an unbalanced safety check is
>> performed in the sqlite3_step() function.
>>
>> This is equivalent to entering a mutex but never leaving, which causes
>> deadlock in a multithreaded program.  Only in this situation sqlite
>> throws a misuse error any time two or more threads use sqlite3_step()
>> at the same time, even if those threads are synchronised and perfectly
>> safe.
>>
>> The easy solution is to disable the safety checks, the propper
>> solution is to balance out the checks in sqlite3_step() so that users
>> who actually wish to use sqlite in a multithreaded program are free to
>> synchronise access to the api without error and there is still a
>> reasonable level of safety checking for users who do not synchronise
>> properly.
>>
>>
>> Emerson
>>
>> On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:
>>
>>> Work on turning "reasonable" into "adequate" or "good" and it will help
>>> you get an intuitive feel for the design of programs such as yours.
>>> Then your programs will be simple, fast and robust, as Einstein
>>> counselled - "Make it as simple a possible, but no simpler".
>>>
>>> I also suggest that you take Niklaus Wirth's advice and when you run
>>> into a difficulty backtrack your work and scrap everything until you
>>> reach a point where there are no problems and start again from that
>>> point taking a different approach having learned a lesson from your
>>> previous attempt.
>>>
>>> By the way, I doubt whether you are using a modern operating system, it
>>> is most likely to be old technology like Windows or Linux.  Neither
>>> supports much in the way of parallelism.
>>>
>>
>> -
>>
>> To unsubscribe, send email to [EMAIL PROTECTED]
>> -
>>
>>
>
>
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
>
>
>




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread dcharno

Can we please stop this thread?

John Stanton wrote:
Emerson, one posts to a forum like this to get help and other ideas, not 
to spark a debate.  Many talented people gave you some of their time to 
help you solve your problem and one in particular gave you a well 
conceived and executed piece of software free of charge.  Appreciate 
their charity.


If you have some insights which have escaped the rest of us, implement 
them and present the results to the world and dazzle us.


BTW, a major advantage of Open Source software is that you do not need 
to have people explain it to you.  You have the source and that explains 
everything.  That is particularly so with Sqlite, which is clearly 
written and consequently the source reads like a book.  A few minutes 
with the source and grep and you have your answers.


Emerson Clarke wrote:

John,

Um, alright then...

But i think your preaching to the converted, simplifying things is
what i always try to do.  And not just when theres a problem

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexe's.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the propper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:


Work on turning "reasonable" into "adequate" or "good" and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - "Make it as simple a possible, but no simpler".

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.



- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 









-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread John Stanton
Emerson, one posts to a forum like this to get help and other ideas, not 
to spark a debate.  Many talented people gave you some of their time to 
help you solve your problem and one in particular gave you a well 
conceived and executed piece of software free of charge.  Appreciate 
their charity.


If you have some insights which have escaped the rest of us, implement 
them and present the results to the world and dazzle us.


BTW, a major advantage of Open Source software is that you do not need 
to have people explain it to you.  You have the source and that explains 
everything.  That is particularly so with Sqlite, which is clearly 
written and consequently the source reads like a book.  A few minutes 
with the source and grep and you have your answers.


Emerson Clarke wrote:

John,

Um, alright then...

But i think your preaching to the converted, simplifying things is
what i always try to do.  And not just when theres a problem

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexe's.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the propper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:


Work on turning "reasonable" into "adequate" or "good" and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - "Make it as simple a possible, but no simpler".

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.



- 


To unsubscribe, send email to [EMAIL PROTECTED]
- 






-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-05 Thread Emerson Clarke

John,

Um, alright then...

But i think your preaching to the converted, simplifying things is
what i always try to do.  And not just when theres a problem

If you followed the thread fully you would realise that there was
never a problem with my design, though that didnt stop many people
from chirping in and offering suggestions.

The problem i had was with sqlite not being compatible with the simple
design that i wanted.  I did try several alternate designs, but only
as a way of working around the problem i had with sqlite.  It took a
long time but eventually i managed to get someone to explain why
sqlite had that particular problem, and i was able to modify the
sqlite source to resolve the issue.

Unfortunately no one has yet commented on my solution, or the problem
which it addresses.   Basically sqlite has thread safety checking
routines which work a little like mutexe's.  Every time you enter a
bit of code which is potentially thread unsafe it sets a magic number,
then resets it when it comes out.  This is an attempt to detect when
two threads are accessing the same bit of code at the same time.

Clearly its not 100% reliable, and is subject to all kinds of thread
races, but it does provide some measure of protection.  Unfortunately
though, the way it has been coded, an unbalanced safety check is
performed in the sqlite3_step() function.

This is equivalent to entering a mutex but never leaving, which causes
deadlock in a multithreaded program.  Only in this situation sqlite
throws a misuse error any time two or more threads use sqlite3_step()
at the same time, even if those threads are synchronised and perfectly
safe.

The easy solution is to disable the safety checks, the propper
solution is to balance out the checks in sqlite3_step() so that users
who actually wish to use sqlite in a multithreaded program are free to
synchronise access to the api without error and there is still a
reasonable level of safety checking for users who do not synchronise
properly.


Emerson

On 1/5/07, John Stanton <[EMAIL PROTECTED]> wrote:

Work on turning "reasonable" into "adequate" or "good" and it will help
you get an intuitive feel for the design of programs such as yours.
Then your programs will be simple, fast and robust, as Einstein
counselled - "Make it as simple a possible, but no simpler".

I also suggest that you take Niklaus Wirth's advice and when you run
into a difficulty backtrack your work and scrap everything until you
reach a point where there are no problems and start again from that
point taking a different approach having learned a lesson from your
previous attempt.

By the way, I doubt whether you are using a modern operating system, it
is most likely to be old technology like Windows or Linux.  Neither
supports much in the way of parallelism.



-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread John Stanton
Work on turning "reasonable" into "adequate" or "good" and it will help 
you get an intuitive feel for the design of programs such as yours. 
Then your programs will be simple, fast and robust, as Einstein 
counselled - "Make it as simple a possible, but no simpler".


I also suggest that you take Niklaus Wirth's advice and when you run 
into a difficulty backtrack your work and scrap everything until you 
reach a point where there are no problems and start again from that 
point taking a different approach having learned a lesson from your 
previous attempt.


By the way, I doubt whether you are using a modern operating system, it 
is most likely to be old technology like Windows or Linux.  Neither 
supports much in the way of parallelism.


Emerson Clarke wrote:

John,

I have a reasonable understanding of the PC architecture, and more
appropriately the architecture which the operating system presents to
software.  The PC may be a serial device, but a modern operating
system with its multitasking shcheduler attempts to emulate a non
serial environment.  It devotes a certain amount of time to each
thread and then moves on.

Wether or not you are working on a highly pyshically parallel
architecture or not makes no difference, the design principles are the
same.  You should still build your software so that it is capable of
taking advantage of the environment that the operating system
presents.  It is the operating system you should be designing for, not
the hardware...

As it happens, the newest generation of PC's are all multi-core, and i
have been working on multi processor environments for many years.

Emerson

On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote:


If Emeroson intuitively understood the essential architecture of the PC
he is using he would not be having difficulty with his concept of how to
use it.  It is essentially a serial device, multi-tasking device and
parallelism in the forms of threading and multi processing is a
sophistication added with a high overhead.

I recollect an insightful CS professor impressing on his his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parrallel processing
would be very different.

Michael Ruck wrote:
> Hi Emerson,
>
> I just hope you don't reinvent the wheel ;) I haven't yet had the 
need to
> index things the way you describe it. May be I should take that as 
one of my

> next pet projects to get a handle on this type of task.
>
> The problem as I see it is basically, that any way you design this: 
If the
> storage tasks take 90% of your indexing time, then any 
parallelization may

> be a waste of effort. Even if you use a synchronization object you're
> essentially serializing things in a (complicated) multithreaded way...
>
> As far as static initialization: That it occurs before main() and is 
out of
> your control was the point I was getting across. That's why I wrote 
that

> this type of initialization should be avoided, unless there's no better
> design for it.
>
> Michael
>
> -Ursprüngliche Nachricht-
> Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Gesendet: Mittwoch, 3. Januar 2007 20:31
> An: sqlite-users@sqlite.org
> Betreff: Re: [sqlite] sqlite performance, locking & threading
>
> Michael,
>
> Thanks for the advice.  During the indexing process i need to select 
and

> optionally insert records into a table so i cant ignore the outcomes.
>
> Basically the indexing process does compression, so for each 
document it
> inserts words into a table and looks up keys.  Every word in the 
document

> gets swapped with a key, and new keys are inserted as needed.
>
> There are some problems with splitting the work up in a different 
way as you
> suggested. I would either end up with a lot of queues or i would 
have to
> stagger the work so that the entire data set gets processed in 
stages which
> doesnt scale very well and isnt particularly fault tollerant.  When 
building
> an index, you want the structure to be built up progressively, so 
that you

> can pause the process and resume it later on whilst still having useful
> results.
>
> I would be worried that in a queued design, the overhead and 
bottlenecks
> caused by the buffering, message passing, and context switching 
would reduce

> the performance to that of a single thread.
> Especially since the database operations represent 90% of the work, 
all you
> would really be doing is attempting to serialise things in a 
multithreaded

> way.
>
> Im sure having worked on multithreaded systems you appreciate that 
sometimes
> simple designs are better, and i think i have a pretty good handle 
on what

> it is that im trying to do.
>
> You never have control over static initialisation, it happens before 
main().
> If i was writing very specific code to suit just this situation then 
maybe
> as you say i wouldnt need to worry about it.  But i

Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread Emerson Clarke

John,

I have a reasonable understanding of the PC architecture, and more
appropriately the architecture which the operating system presents to
software.  The PC may be a serial device, but a modern operating
system with its multitasking shcheduler attempts to emulate a non
serial environment.  It devotes a certain amount of time to each
thread and then moves on.

Wether or not you are working on a highly pyshically parallel
architecture or not makes no difference, the design principles are the
same.  You should still build your software so that it is capable of
taking advantage of the environment that the operating system
presents.  It is the operating system you should be designing for, not
the hardware...

As it happens, the newest generation of PC's are all multi-core, and i
have been working on multi processor environments for many years.

Emerson

On 1/4/07, John Stanton <[EMAIL PROTECTED]> wrote:

If Emeroson intuitively understood the essential architecture of the PC
he is using he would not be having difficulty with his concept of how to
use it.  It is essentially a serial device, multi-tasking device and
parallelism in the forms of threading and multi processing is a
sophistication added with a high overhead.

I recollect an insightful CS professor impressing on his his students
the concept by explaining to them that the machines on their desks were
descended from a device invented to be a gas pump controller.

A machine designed from first principles to manage parrallel processing
would be very different.

Michael Ruck wrote:
> Hi Emerson,
>
> I just hope you don't reinvent the wheel ;) I haven't yet had the need to
> index things the way you describe it. May be I should take that as one of my
> next pet projects to get a handle on this type of task.
>
> The problem as I see it is basically, that any way you design this: If the
> storage tasks take 90% of your indexing time, then any parallelization may
> be a waste of effort. Even if you use a synchronization object you're
> essentially serializing things in a (complicated) multithreaded way...
>
> As far as static initialization: That it occurs before main() and is out of
> your control was the point I was getting across. That's why I wrote that
> this type of initialization should be avoided, unless there's no better
> design for it.
>
> Michael
>
> -Ursprüngliche Nachricht-
> Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
> Gesendet: Mittwoch, 3. Januar 2007 20:31
> An: sqlite-users@sqlite.org
> Betreff: Re: [sqlite] sqlite performance, locking & threading
>
> Michael,
>
> Thanks for the advice.  During the indexing process i need to select and
> optionally insert records into a table so i cant ignore the outcomes.
>
> Basically the indexing process does compression, so for each document it
> inserts words into a table and looks up keys.  Every word in the document
> gets swapped with a key, and new keys are inserted as needed.
>
> There are some problems with splitting the work up in a different way as you
> suggested. I would either end up with a lot of queues or i would have to
> stagger the work so that the entire data set gets processed in stages which
> doesnt scale very well and isnt particularly fault tollerant.  When building
> an index, you want the structure to be built up progressively, so that you
> can pause the process and resume it later on whilst still having useful
> results.
>
> I would be worried that in a queued design, the overhead and bottlenecks
> caused by the buffering, message passing, and context switching would reduce
> the performance to that of a single thread.
> Especially since the database operations represent 90% of the work, all you
> would really be doing is attempting to serialise things in a multithreaded
> way.
>
> Im sure having worked on multithreaded systems you appreciate that sometimes
> simple designs are better, and i think i have a pretty good handle on what
> it is that im trying to do.
>
> You never have control over static initialisation, it happens before main().
> If i was writing very specific code to suit just this situation then maybe
> as you say i wouldnt need to worry about it.  But im also writing a database
> api, and that api is used for many different things.  My considderations are
> not just for this one problem, but also for the best general way to code the
> api so that it is safe and efficient in all circumstances.  So far the
> client/server design is the only way i can achieve true thread safety.
>
> If i could work out why sqlite3_step() causes problems across multiple
> threads i could probably make things a little faster and i could do away
> with the need for a client/server design.
>
> Emerson
>
>
> On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:
>
>>Emerson,
>>
>>Now I understand your current implementation.  You seemingly only
>>partially split up the work in your code. I'd schedule the database
>>operation and not wait on the outcome, but start on the ne

Re: AW: [sqlite] sqlite performance, locking & threading

2007-01-04 Thread John Stanton
If Emeroson intuitively understood the essential architecture of the PC 
he is using he would not be having difficulty with his concept of how to 
use it.  It is essentially a serial device, multi-tasking device and 
parallelism in the forms of threading and multi processing is a 
sophistication added with a high overhead.


I recollect an insightful CS professor impressing on his his students 
the concept by explaining to them that the machines on their desks were 
descended from a device invented to be a gas pump controller.


A machine designed from first principles to manage parrallel processing 
would be very different.


Michael Ruck wrote:

Hi Emerson,

I just hope you don't reinvent the wheel ;) I haven't yet had the need to
index things the way you describe it. May be I should take that as one of my
next pet projects to get a handle on this type of task.

The problem as I see it is basically, that any way you design this: If the
storage tasks take 90% of your indexing time, then any parallelization may
be a waste of effort. Even if you use a synchronization object you're
essentially serializing things in a (complicated) multithreaded way...

As far as static initialization: That it occurs before main() and is out of
your control was the point I was getting across. That's why I wrote that
this type of initialization should be avoided, unless there's no better
design for it.

Michael

-Ursprüngliche Nachricht-
Von: Emerson Clarke [mailto:[EMAIL PROTECTED] 
Gesendet: Mittwoch, 3. Januar 2007 20:31

An: sqlite-users@sqlite.org
Betreff: Re: [sqlite] sqlite performance, locking & threading

Michael,

Thanks for the advice.  During the indexing process i need to select and
optionally insert records into a table so i cant ignore the outcomes.

Basically the indexing process does compression, so for each document it
inserts words into a table and looks up keys.  Every word in the document
gets swapped with a key, and new keys are inserted as needed.

There are some problems with splitting the work up in a different way as you
suggested. I would either end up with a lot of queues or i would have to
stagger the work so that the entire data set gets processed in stages which
doesnt scale very well and isnt particularly fault tollerant.  When building
an index, you want the structure to be built up progressively, so that you
can pause the process and resume it later on whilst still having useful
results.

I would be worried that in a queued design, the overhead and bottlenecks
caused by the buffering, message passing, and context switching would reduce
the performance to that of a single thread.
Especially since the database operations represent 90% of the work, all you
would really be doing is attempting to serialise things in a multithreaded
way.

Im sure having worked on multithreaded systems you appreciate that sometimes
simple designs are better, and i think i have a pretty good handle on what
it is that im trying to do.

You never have control over static initialisation, it happens before main().
If i was writing very specific code to suit just this situation then maybe
as you say i wouldnt need to worry about it.  But im also writing a database
api, and that api is used for many different things.  My considderations are
not just for this one problem, but also for the best general way to code the
api so that it is safe and efficient in all circumstances.  So far the
client/server design is the only way i can achieve true thread safety.

If i could work out why sqlite3_step() causes problems across multiple
threads i could probably make things a little faster and i could do away
with the need for a client/server design.

Emerson


On 1/3/07, Michael Ruck <[EMAIL PROTECTED]> wrote:


Emerson,

Now I understand your current implementation.  You seemingly only 
partially split up the work in your code. I'd schedule the database 
operation and not wait on the outcome, but start on the next task. 
When the database finishes and has retrieved its result, schedule some 
work package on a third thread, which only processes the results etc. 
Split up the work in to repetitive, non blocking tasks. Use multiple 
queues and dedicated threads for parts of the operation or thread pools,


which process queues in parallel if possible.


From what I can tell you're already half way there.

I still don't see your static initialization problem, but that's 
another story. Actually I'd avoid using static initialization or 
static (singleton) instances, unless the design really requires it. 
Someone must control startup of the entire process, have that one 
(probably main/WinMain) take care that the work queues are available. 
Afterwards the order of thread starts doesn't matter... Actually it is 
non-deterministic anyway (unless you serialize this yourself.)


Michael

-Ursprüngliche Nachricht-
Von: Emerson Clarke [mailto:[EMAIL PROTECTED]
Gesendet: Mittwoch, 3. Januar 2007 15:14
An: sqlite-users@sqlite.org
B