Re: [sqlite] How to use sqlite and pthread together?

2011-02-23 Thread Nico Williams
On Sun, Feb 20, 2011 at 6:36 PM, Samuel Adam wrote:
> On Sun, 20 Feb 2011 14:46:06 -0500, Nico Williams wrote:

> I appreciate your extensive (if wildly offtopic) analysis as quoted
> below.  You thoroughly misunderstood what I said, though.  Again, my
> fork()/exec() comment was directed to the same “cultural thing” as you
> spoke about in a different context; and my object thereby was to posit
> __why__ *nix kernel developers have more incentive to make sure processes
> run light.  Winapi doesn’t offer a really equivalent pair of syscalls, nor
> an extensive existing fork-exec practice, so NT kernel developers needn’t
> optimize that use case; whereas *nix kernel folks must of practical
> necessity design their process models to support a typical *nix code
> pattern.  If they do not so do, their users will complain bitterly about
> the overhead of all their daemons’ zillion workers *after* those workers
> are started with the classic fork()/exec().

Unix _application_ developers have an incentive to keep their
processes light-weight, but _kernel_ developers can't do very much
to make fork() faster other than encourage _application_ developers
to use posix_spawn().  The semantics of fork() + threads are such
that COW is really expensive for processes with large writable
resident set sizes -- it is what it is.

> This being off-topic as it is, I must decline to continue discussing OS
> process practice in front of 10,000 or so people (or so I heard) who tuned
> in for discussion about SQLite.  You said some very interesting stuff,
> though, particularly as to the TLB.  I’d like to leave the door open to
> engaging such discussions in an appropriate venue sometime (ENOTIME for
> the foreseeable future).

I thought it was on topic: I'm giving advice to SQLite3 application
developers: a) fork-safety is _really_ difficult for complex libraries
to implement, so assume fork-unsafe libraries unless the documentation
tells you otherwise, b) fork() is not cheap, so use vfork() or better,
posix_spawn() if at all possible.  You're free to disregard such
advice, of course.
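
To make point (b) concrete, here is a minimal sketch of starting a child
program with posix_spawnp() instead of fork() followed by exec().  The
helper name run_and_wait is hypothetical, and the sketch assumes a POSIX
system that provides <spawn.h>; it is an illustration, not code from any
particular application.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] (looked up on PATH) with the given arguments and wait for it.
 * Returns the child's exit status (0-255), or -1 if the child could not be
 * started or did not exit normally.  "run_and_wait" is a hypothetical name. */
int run_and_wait(char *const argv[])
{
    pid_t pid;
    int status;

    /* posix_spawnp() reports failure through its return value, not errno. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    /* Retry the wait if a signal interrupts it. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with an argv of {"sh", "-c", "exit 3", NULL}, this should return 3.
The caller never runs its own code in a forked copy of itself, which is
the property that makes fork-safety of its libraries a non-issue here.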

Cheers,

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to use sqlite and pthread together?

2011-02-20 Thread Samuel Adam
On Sun, 20 Feb 2011 14:46:06 -0500, Nico Williams wrote:

> On Sun, Feb 20, 2011 at 6:28 AM, Samuel Adam wrote:
>> On Sat, 19 Feb 2011 17:12:31 -0500, Pavel Ivanov wrote:
>>
>>> [snip] On
>>> Windows it’s different - process is much more heavy-weight object than
>>> thread and involves much bigger system load to support it. There’s an
>>> official general advice for Windows: better create a new thread in the
>>> same process than a new process.
>>
>> Mr. Ivanov explained what I was saying better than I did.  My unclear
>> offhand comment about fork()/exec() was an allusion to why *nix developed
>> much lighter-weight processes than Windows, viz., decades of a
>> fork()/exec() custom and practice.  (Indeed, I believe that’s precisely
>> why Linux went to the trouble of re-engineering fork() with COW.)  I
>> intended to address the overhead of running, and inadvertently introduced
>> a red herring about overhead of starting.
>
> You seem to be conflating the weightiness of a notion of process with
> the weightiness of interfaces for creating processes.

I appreciate your extensive (if wildly offtopic) analysis as quoted  
below.  You thoroughly misunderstood what I said, though.  Again, my  
fork()/exec() comment was directed to the same “cultural thing” as you  
spoke about in a different context; and my object thereby was to posit  
__why__ *nix kernel developers have more incentive to make sure processes  
run light.  Winapi doesn’t offer a really equivalent pair of syscalls, nor  
an extensive existing fork-exec practice, so NT kernel developers needn’t  
optimize that use case; whereas *nix kernel folks must of practical  
necessity design their process models to support a typical *nix code  
pattern.  If they do not so do, their users will complain bitterly about  
the overhead of all their daemons’ zillion workers *after* those workers  
are started with the classic fork()/exec().

This being off-topic as it is, I must decline to continue discussing OS  
process practice in front of 10,000 or so people (or so I heard) who tuned  
in for discussion about SQLite.  You said some very interesting stuff,  
though, particularly as to the TLB.  I’d like to leave the door open to  
engaging such discussions in an appropriate venue sometime (ENOTIME for  
the foreseeable future).

Very truly,

Samuel Adam ◊ 
763 Montgomery Road ◊ Hillsborough, NJ  08844-1304 ◊ United States
Legal advice from a non-lawyer: “If you are sued, don’t do what the
Supreme Court of New Jersey, its agents, and its officers did.”
http://www.youtube.com/watch?v=iT2hEwBfU1g


>
> fork() has nothing to do with whether a notion of process is
> light-weight or not.  And quite aside from that, fork() is only as
> light-weight as the writable resident set size of the parent process.
> Long, long ago fork() would copy the parent's address space.  Later on
> fork() implementations started marking what should be writable pages
> as read-only in the MMU page table entries for the process in order to
> catch writes and then copy-on-write.  COW works fine for
> single-threaded processes when the child of fork() intends to exec()
> or exit() immediately and the parent is willing to wait for the child
> to do so.  But for a heavily multi-threaded process with a huge RSS,
> such as a web browser, COW is a performance disaster as it means
> cross-calls to do MMU TLB shoot down, and then incurring a potentially
> large number of page faults in the parent as those threads continue
> executing.  Nowadays it's often simpler and faster to just copy the
> writable portion of the parent's RSS...  vfork(), OTOH, need only
> result in cross-calls to stop the parent's threads, but no page table
> manipulations, TLB shootdowns, data copies, nor page faults need be
> incurred.  And a true posix_spawn() wouldn't even have to stop the
> parent's threads (but using vfork() makes posix_spawn perform so well
> compared to fork() that, for example, Solaris' posix_spawn() just uses
> vfork()).  In Solaris, for example, we've obtained major performance
> improvements by having applications such as web browsers use
> posix_spawn() or vfork() in preference to fork().
>
> In any case, fork() is not an essential attribute of an operating
> system's notion of "process", but an incidental one (related to how
> one creates processes).  In terms of essential attributes, Unix and
> Windows processes compare to each other, and Windows and Unix threads
> (POSIX threads) also compare to each other (roughly anyways, as some
> pthreads implementations have M:N mappings to kernel constructs while
> others have 1:1, and so on).  Yes, Linux has clone(2), which allows
> one to decide just what parts of the parent's various attributes the
> child will share with the parent or get a copy of from the parent, but
> because the standard is pthreads, in practice most developers on Linux
> 

Re: [sqlite] How to use sqlite and pthread together?

2011-02-20 Thread Nico Williams
On Sun, Feb 20, 2011 at 6:28 AM, Samuel Adam wrote:
> On Sat, 19 Feb 2011 17:12:31 -0500, Pavel Ivanov wrote:
>
>> [snip] On
>> Windows it’s different - process is much more heavy-weight object than
>> thread and involves much bigger system load to support it. There’s an
>> official general advice for Windows: better create a new thread in the
>> same process than a new process.
>
> Mr. Ivanov explained what I was saying better than I did.  My unclear
> offhand comment about fork()/exec() was an allusion to why *nix developed
> much lighter-weight processes than Windows, viz., decades of a
> fork()/exec() custom and practice.  (Indeed, I believe that’s precisely
> why Linux went to the trouble of re-engineering fork() with COW.)  I
> intended to address the overhead of running, and inadvertently introduced
> a red herring about overhead of starting.

You seem to be conflating the weightiness of a notion of process with
the weightiness of interfaces for creating processes.

fork() has nothing to do with whether a notion of process is
light-weight or not.  And quite aside from that, fork() is only as
light-weight as the writable resident set size of the parent process.
Long, long ago fork() would copy the parent's address space.  Later on
fork() implementations started marking what should be writable pages
as read-only in the MMU page table entries for the process in order to
catch writes and then copy-on-write.  COW works fine for
single-threaded processes when the child of fork() intends to exec()
or exit() immediately and the parent is willing to wait for the child
to do so.  But for a heavily multi-threaded process with a huge RSS,
such as a web browser, COW is a performance disaster as it means
cross-calls to do MMU TLB shoot down, and then incurring a potentially
large number of page faults in the parent as those threads continue
executing.  Nowadays it's often simpler and faster to just copy the
writable portion of the parent's RSS...  vfork(), OTOH, need only
result in cross-calls to stop the parent's threads, but no page table
manipulations, TLB shootdowns, data copies, nor page faults need be
incurred.  And a true posix_spawn() wouldn't even have to stop the
parent's threads (but using vfork() makes posix_spawn perform so well
compared to fork() that, for example, Solaris' posix_spawn() just uses
vfork()).  In Solaris, for example, we've obtained major performance
improvements by having applications such as web browsers use
posix_spawn() or vfork() in preference to fork().
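
The copy-on-write semantics described above can be observed from user
space with a small sketch like the following.  The function and variable
names are made up for illustration.  It shows only the semantics (a write
in the child lands in a private copy of the page, so the parent never
sees it); it does not measure the page faults or TLB shootdowns that make
this expensive for large multi-threaded parents.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lives in a writable page that parent and child share until one writes. */
static volatile int cow_demo_value = 1;

/* Fork, let the child overwrite cow_demo_value, and return the value the
 * parent still sees afterwards.  Returns -1 on any error. */
int value_seen_by_parent_after_fork(void)
{
    int status;
    pid_t pid;

    cow_demo_value = 1;
    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: this write goes to the child's own copy of the page. */
        cow_demo_value = 2;
        _exit(cow_demo_value == 2 ? 0 : 1);
    }

    /* Parent: wait for the child, then read our own copy. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return cow_demo_value;
}
```

The parent gets 1 back even though the child stored 2.  Every page the
parent or child writes to after the fork() has to be duplicated in this
way, which is why the cost grows with the writable resident set size.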

In any case, fork() is not an essential attribute of an operating
system's notion of "process", but an incidental one (related to how
one creates processes).  In terms of essential attributes, Unix and
Windows processes compare to each other, and Windows and Unix threads
(POSIX threads) also compare to each other (roughly anyways, as some
pthreads implementations have M:N mappings to kernel constructs while
others have 1:1, and so on).  Yes, Linux has clone(2), which allows
one to decide just what parts of the parent's various attributes the
child will share with the parent or get a copy of from the parent, but
because the standard is pthreads, in practice most developers on Linux
constrain themselves to using pthreads, thus the concept of clone(2)
is not that relevant here.

> Speaking as a user, by the way, I don’t think I actually have *any*
> Windows applications which use worker processes for concurrency the same
> way my *nix server daemons do.  There’s a reason for that.

It's largely a cultural thing.  Windows NT and up had and promoted
threading from the get-go, while Unix had a very long tradition of
single-threaded processes, and some Unix systems had to catch up to
Windows regarding multi-threading.  There are many other factors
leading to this dichotomy, such as the fact that Unix developers tend
to appreciate isolation, the fact that Windows' process spawn API is
so complex and difficult to use, the fact that Windows allows
individual threads of a process to execute with different access
tokens in effect (thus reducing the need to start additional
processes, even if this means losing a lot on the isolation front),
etcetera.  OTOH I've no reason to believe that this split has anything
to do with the weightiness of Windows processes vs. Unix ones (though
the complexity of creating new processes certainly is involved).  But
we do get many heavily multi-threaded applications on Unix nowadays.
Perhaps the Windows model won out... threads _are_ easier to get
started with than processes.

Obtopic: I've successfully used SQLite_2_ with pthreads, and have
every reason to believe that it is possible to safely and productively
use SQLite3 with pthreads.  The key for using SQLite3 in a
multi-threaded way is to adhere to good threaded programming
guidelines while thoroughly understanding the APIs you choose to use.
In particular you should do your utmost to minimize the use of any

Re: [sqlite] How to use sqlite and pthread together?

2011-02-20 Thread Samuel Adam
On Sat, 19 Feb 2011 17:12:31 -0500, Pavel Ivanov wrote:

> [snip] On
> Windows it’s different - process is much more heavy-weight object than
> thread and involves much bigger system load to support it. There’s an
> official general advice for Windows: better create a new thread in the
> same process than a new process.

Mr. Ivanov explained what I was saying better than I did.  My unclear
offhand comment about fork()/exec() was an allusion to why *nix developed
much lighter-weight processes than Windows, viz., decades of a
fork()/exec() custom and practice.  (Indeed, I believe that’s precisely
why Linux went to the trouble of re-engineering fork() with COW.)  I
intended to address the overhead of running, and inadvertently introduced
a red herring about overhead of starting.

Speaking as a user, by the way, I don’t think I actually have *any*
Windows applications which use worker processes for concurrency the same
way my *nix server daemons do.  There’s a reason for that.

Lots to say about threads, but, well, that will need to await another thread.

Very truly,

Samuel Adam


Re: [sqlite] How to use sqlite and pthread together?

2011-02-19 Thread Nico Williams
Pavel, I am fully aware of clone(2).  But clone() is not standard, and the
modern Linux pthreads implementation is faithful to the pthreads
specification.  POSIX threads are schedulable threads of execution that
share a process' address space.  I will grant that my advice regarding
vfork() is based on my Solaris experience, but it's also based on
fundamental properties of those system calls, and, moreover, my advice is to
use posix_spawn(), not vfork().

Nico
--


Re: [sqlite] How to use sqlite and pthread together?

2011-02-19 Thread Pavel Ivanov
> Rather, I meant that Windows processes are comparable to
> Unix processes (concurrent threads of execution in isolated address
> spaces), and that Windows threads are comparable to Unix threads
> (concurrent threads of execution in common address spaces).  I'm not
> an expert on Windows, but I do believe that those are valid comparisons.

That's an incorrect comparison, Nico. I won't talk about all flavors
of Unix, but on Linux threads and processes are absolutely equal.
That's why talk about which is better, threads or processes, makes
sense. On Linux, creating a new process involves just a little bit
more overhead than creating a new thread, and supporting a running
thread is absolutely equal to supporting a running process. On
Windows it's different - a process is a much more heavy-weight object
than a thread and involves a much bigger system load to support it.
There's official general advice for Windows: better to create a new
thread in the same process than a new process.

Regarding fork vs vfork (and thus posix_spawn): besides the fact that
it applies only to the case of running a new application (not to cloning
the current process, which is what fork was initially designed for), I
don't understand your position. In the Linux kernel these two are the
same syscall, and the implementation of fork is quite straightforward
whereas the vfork part requires jumping through additional hoops. So I
don't understand why one is better than the other. BTW, vfork is strongly
discouraged on Linux:
http://tldp.org/HOWTO/Secure-Programs-HOWTO/avoid-vfork.html.


Pavel

On Sat, Feb 19, 2011 at 4:42 PM, Nico Williams wrote:
> On Sat, Feb 19, 2011 at 1:45 PM, Pavel Ivanov wrote:
>> Nico, it looks like you don't understand what you are saying.
>
> I think you misunderstood what I was saying.  I think my wording could
> easily have caused that:
>
>>> Windows and Unix processes and threads have similar semantics, and thus
>>> roughly comparable performance envelopes.
>>
>> Windows processes and threads don't have similar semantics, unix
>> processes (and threads) are not comparable to Windows processes at
>> all. Just search the internet on this topic and look at some
>> benchmarks.
>
> I didn't mean that Windows processes are comparable to Windows
> threads.  Rather, I meant that Windows processes are comparable to
> Unix processes (concurrent threads of execution in isolated address
> spaces), and that Windows threads are comparable to Unix threads
> (concurrent threads of execution in common address spaces).  I'm not
> an expert on Windows, but I do believe that those are valid comparisons.
>
>>> Threading is difficult, though certainly not impossible, to get right.  Take
>>> Richard's advice, avoid threading.
>>
>> All your email is about evilness of fork(). But fork() is
>> process-related, not thread-related call. So you are suggesting that
>> processes are evil too? Or are you suggesting that starting new
>> process from your application by using fork() is evil? But exactly
>> this call is used by any shell or other application starting new
>> processes (or even a system() function).
>
> Wow, well, you did misunderstand me.  I called fork(2) evil, but I
> praised posix_spawn(3C), thus you should not have concluded that I
> think processes are evil -- I did not pronounce an opinion on
> processes earlier, but for the record: I don't think processes are
> evil.
>
> I stand by my characterization of fork(2) and friends.
>
> Also, I do not think threads are evil.  I do believe that threading
> (that is, multiple concurrent threads of execution in a single address
> space) is difficult to get right in any programming language (except,
> perhaps, ones like Clojure).
>
> Nico
> --
>


Re: [sqlite] How to use sqlite and pthread together?

2011-02-19 Thread Pavel Ivanov
Nico, it looks like you don't understand what you are saying.

> Windows and Unix processes and threads have similar semantics, and thus
> roughly comparable performance envelopes.

Windows processes and threads don't have similar semantics, unix
processes (and threads) are not comparable to Windows processes at
all. Just search the internet on this topic and look at some
benchmarks.

> Threading is difficult, though certainly not impossible, to get right.  Take
> Richard's advice, avoid threading.

Your whole email is about the evilness of fork(). But fork() is a
process-related call, not a thread-related one. So are you suggesting
that processes are evil too? Or are you suggesting that starting a new
process from your application by using fork() is evil? But exactly
this call is used by any shell or other application that starts new
processes (or even by the system() function).


Can I suggest that everybody stop this holy war about threads vs
processes on this list? I see too many statements of "don't use threads,
use processes" that come before one even tries to listen to the reasons
why threads are actually used. That's not good advice in such a
situation. I understand that developing multi-process applications
can be much easier than developing multi-threaded ones (because you
leave worrying about all the multi-threading stuff to other developers,
the ones who develop OS kernels), and I understand that the general
advice to somebody who knows nothing about threads is to not use them.
But that advice does not apply to absolutely every application out
there. In many applications you can't reach the same level of
performance with a multi-process structure as you can with a
multi-threaded one. And using IPC and forks can make a developer's life
much harder than using simple mutexes and reading/writing the same
memory.
So, as this list is not for analyzing the pros and cons of threads vs
processes for each application, let's not bring this holy war here and
leave it for some other list.
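
For readers who have not seen the "simple mutexes and shared memory"
style mentioned above, here is a minimal pthreads sketch.  The names
(struct counter, count_with_threads) are hypothetical, and it is only an
illustration of the pattern: several threads update one value in the
same address space, and a mutex serializes the updates.

```c
#include <pthread.h>
#include <stdlib.h>

/* State shared by all worker threads. */
struct counter {
    pthread_mutex_t lock;
    long value;        /* protected by lock */
    long iterations;   /* increments each thread performs (read-only) */
};

static void *counter_worker(void *arg)
{
    struct counter *c = arg;

    for (long i = 0; i < c->iterations; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;                      /* critical section */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Start nthreads workers, wait for them all, and return the final count.
 * Returns -1 if memory, the mutex, or any thread could not be set up. */
long count_with_threads(int nthreads, long iterations)
{
    struct counter c = { .value = 0, .iterations = iterations };
    pthread_t *tids;
    int started = 0;

    if (nthreads <= 0)
        return -1;
    tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL)
        return -1;
    if (pthread_mutex_init(&c.lock, NULL) != 0) {
        free(tids);
        return -1;
    }

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, counter_worker, &c) != 0)
            break;
        started++;
    }
    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&c.lock);
    free(tids);
    return started == nthreads ? c.value : -1;
}
```

With 4 threads and 10000 iterations each, the result is 40000.  Without
the lock/unlock pair the increments would race and the total could come
out lower, which is the kind of mistake the "avoid threads" advice is
warning about.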



Pavel

On Fri, Feb 18, 2011 at 8:34 PM, Nico Williams wrote:
> On Feb 18, 2011 6:16 PM, "Samuel Adam" wrote:
>> FYI, Windows NT is documented to have light threads and heavy processes.
>
> Windows and Unix processes and threads have similar semantics, and thus
> roughly comparable performance envelopes.
>
>> To my knowledge, it just was not designed with the goal of *nix/Plan 9/et
>> al.’s more-or-less cheap and easy fork()/exec().
>
> fork(2) is most certainly NOT light-weight, kernels having to jump through
> hoops to make it perform well sometimes.  Indeed, fork(2) is evil because it
> is very, very difficult (though usually not impossible) to make layered
> software fork-safe.
>
> If at all possible use posix_spawn(3C) instead of fork(2)/exec(2)!  That
> interface typically uses vfork(2) under the covers, which, though evil
> itself, is much less evil than fork(2) when used properly (and
> posix_spawn(3C) does use vfork() correctly).  vfork() has the advantage of
> having much safer semantics than fork() and also being much, much
> lighter-weight.  There's no need to worry about making code safe with
> respect to posix_spawn().
>
> [forkall(2) is the most evil of that family of Unix system calls, as it is
> impossible to make code forkall()-safe, and when it works it is purely
> because the parent exit()s quickly enough that nothing bad happens, but even
> then forkall() is an accident waiting to happen.]
>
> Threading is difficult, though certainly not impossible, to get right.  Take
> Richard's advice, avoid threading.
>
> Nico
> --
>


Re: [sqlite] How to use sqlite and pthread together?

2011-02-18 Thread Nico Williams
On Feb 18, 2011 6:16 PM, "Samuel Adam" wrote:
> FYI, Windows NT is documented to have light threads and heavy processes.

Windows and Unix processes and threads have similar semantics, and thus
roughly comparable performance envelopes.

> To my knowledge, it just was not designed with the goal of *nix/Plan 9/et
> al.’s more-or-less cheap and easy fork()/exec().

fork(2) is most certainly NOT light-weight, kernels having to jump through
hoops to make it perform well sometimes.  Indeed, fork(2) is evil because it
is very, very difficult (though usually not impossible) to make layered
software fork-safe.
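
One defensive pattern that libraries sometimes use when they cannot be
made fork-safe is to remember which process created a handle and refuse
to use it anywhere else.  The sketch below is an assumption-laden
illustration of that idea, not something any library named in this
thread is known to do; struct handle and the three functions are
hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical library handle that records its owning process. */
struct handle {
    pid_t owner;   /* pid of the process that opened the handle */
};

void handle_open(struct handle *h)
{
    h->owner = getpid();
}

/* Returns 1 if the caller is the process that opened the handle, else 0. */
int handle_usable(const struct handle *h)
{
    return h->owner == getpid();
}

/* Fork and report what handle_usable() returns inside the child.
 * Returns 0 or 1 from the child's check, or -1 on error. */
int handle_usable_in_forked_child(const struct handle *h)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(handle_usable(h) ? 1 : 0);   /* child reports via exit status */

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the parent handle_usable() returns 1; in a forked child it returns 0,
so the library can fail cleanly instead of touching state (locks, file
descriptors, caches) that was only consistent in the parent.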

If at all possible use posix_spawn(3C) instead of fork(2)/exec(2)!  That
interface typically uses vfork(2) under the covers, which, though evil
itself, is much less evil than fork(2) when used properly (and
posix_spawn(3C) does use vfork() correctly).  vfork() has the advantage of
having much safer semantics than fork() and also being much, much
lighter-weight.  There's no need to worry about making code safe with
respect to posix_spawn().

[forkall(2) is the most evil of that family of Unix system calls, as it is
impossible to make code forkall()-safe, and when it works it is purely
because the parent exit()s quickly enough that nothing bad happens, but even
then forkall() is an accident waiting to happen.]

Threading is difficult, though certainly not impossible, to get right.  Take
Richard's advice, avoid threading.

Nico
--


Re: [sqlite] How to use sqlite and pthread together?

2011-02-18 Thread Samuel Adam
On Thu, 17 Feb 2011 07:30:47 -0500, Richard Hipp wrote:

> Using threads is like running with scissors - You are likely to get hurt  
> and so the best approach is to not do it.
>
> If you want to run queries in parallel, I suggest putting each query in a
> separate process.
>
> If your knowledge of threads is so limited that you don't know how to  
> enable them and you are trying to use pthreads on windows, then your 
> chances of getting hurt are compounded.  This is all the more reason to 
> use separate processes, not threads, for parallelism.

FYI, Windows NT is documented to have light threads and heavy processes.   
To my knowledge, it just was not designed with the goal of *nix/Plan 9/et  
al.’s more-or-less cheap and easy fork()/exec().

Of course, since the original poster was using pthreads, he probably
doesn’t care.  (There exists a popular pthreads/win32 package; it is
reputed to be slow, though I cannot attest to that either way, and it
may or may not be what the original poster was using.)

Very truly,

Samuel Adam


Re: [sqlite] How to use sqlite and pthread together?

2011-02-17 Thread Richard Hipp
On Wed, Feb 16, 2011 at 4:56 PM, Hailiang Shen wrote:

> Dear All,
>
> I am trying to apply multiple threads with sqlite to just query (no insert,
> update, delete operation) to compute objective values in optimization. But
> I cannot get it correct. I compiled pthread to a dll for use with sqlite in
> VC++. I am using separate database connection for each thread.
>

Using threads is like running with scissors - You are likely to get hurt and
so the best approach is to not do it.

If you want to run queries in parallel, I suggest putting each query in a
separate process.

If your knowledge of threads is so limited that you don't know how to enable
them and you are trying to use pthreads on windows, then your chances of
getting hurt are compounded.  This is all the more reason to use separate
processes, not threads, for parallelism.
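
As a rough sketch of the "one process per query" approach on a POSIX
system, the function below starts several commands at once and then
collects their exit statuses.  Everything here is an assumption made for
illustration: run_in_parallel and MAX_WORKERS are hypothetical names,
and each command string stands in for whatever worker program would open
its own database connection and run one query.  It does not apply as
written to the Windows/VC++ setup in the original question.

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

#define MAX_WORKERS 64   /* arbitrary cap for this sketch */

/* Start each cmds[i] as its own "/bin/sh -c" process, let them all run
 * concurrently, then store each exit status in statuses[i] (-1 if the
 * command could not be started or did not exit normally).
 * Returns how many commands exited with status 0, or -1 if n is invalid. */
int run_in_parallel(const char *const cmds[], int n, int statuses[])
{
    pid_t pids[MAX_WORKERS];
    int succeeded = 0;

    if (n < 0 || n > MAX_WORKERS)
        return -1;

    /* Launch phase: start every child before waiting on any of them. */
    for (int i = 0; i < n; i++) {
        char *const argv[] = { "sh", "-c", (char *)cmds[i], NULL };

        if (posix_spawn(&pids[i], "/bin/sh", NULL, NULL, argv, environ) != 0)
            pids[i] = -1;
    }

    /* Collect phase: reap each child and record its exit status. */
    for (int i = 0; i < n; i++) {
        int status;
        pid_t rc;

        statuses[i] = -1;
        if (pids[i] == -1)
            continue;
        do {
            rc = waitpid(pids[i], &status, 0);
        } while (rc == -1 && errno == EINTR);

        if (rc != -1 && WIFEXITED(status))
            statuses[i] = WEXITSTATUS(status);
        if (statuses[i] == 0)
            succeeded++;
    }
    return succeeded;
}
```

Each child is an independent process, so a crash or a locking mistake in
one query cannot corrupt the memory of another; results would have to
come back through exit statuses, pipes, or files rather than shared
variables.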


> Any sample codes would be best.
>
> Thanks,
>
> Hailiang
>
> Hailiang Shen
> Ph.D. Candidate
> Water Resources Engineering
> University of Guelph
> 315,  Engineering bldg.
>
>



-- 
D. Richard Hipp
d...@sqlite.org