Re: Anyone working on multi-threaded core files for 2.4 ?

2000-10-02 Thread Andi Kleen

On Mon, Oct 02, 2000 at 03:10:10PM +0100, James Cownie wrote:
> 
> > Queueing the tcores in the mm_struct could work though. Add a prctl [1]
> > that enables tcore core dumping. When tcore core dumping is enabled every
> > core dump that would dump a mm_struct with reference count > 1 does not
> > actually dump it, but just queues a structure (tqueue) with its registers/
> > signal info/etc.  into a list in the mm_struct. When a thread dumps and 
> > the mm_struct count is 1 then dump a normal core file with all the tcores 
> > as thread notes. Also clean up all the cores then when freeing the mm_struct.
> 
> What is your model for the scope of this prctl ?
> 
> Should it be on a per-thread (OK, Alexander, "per process sharing the
> same MM") basis, or does it apply to all members of the thread_group ?

It should be per thread and inherited by children, but cleared on exec
(this way the original thread or the thread manager could set it in
LinuxThreads).

> 
> It seems to me that it makes no sense unless
> 1) it applies to all members of the thread_group
>    (because without this the ref count on the mm will never get to
>    zero).

In this case the tcore information would be lost. If you don't
want that don't enable the prctl.

> 2) if it applies we also ensure that core dump signals get sent to all
>    members of the thread_group.
>    (because without this the other threads won't exit).

With the prctl you have told the kernel that you are committed to that.


> 
> At the moment (test9-pre7) there seems to be no code in the kernel to
> cause the core dumping signals to be fanned out. But then there seems
> to be no code to cause any signals (even the negative "deliver to
> thread group" ones) to be fanned out. I assume that's because it's
> still work in progress.

I see no problem in doing it in user space. It works fine there.
There are potential applications of the Linux clone threads model where 
you don't want to kill the other threads (e.g. when you have an
object database that does its own object paging using SIGSEGV).



-Andi
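
The scope described above (set per thread, inherited by children, cleared
on exec) can be modelled in a few lines of user-space C. This is only a
sketch of the semantics, not kernel code: the proposed prctl was never
named in the thread, so `struct task`, `tcore_enabled` and the three
helper functions are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-task state; "tcore_enabled" stands in for whatever
 * flag the proposed prctl would set (the thread never names it). */
struct task {
    bool tcore_enabled;
};

/* Models the prctl call: it affects only the calling task. */
static void task_prctl_set_tcore(struct task *t, bool on)
{
    t->tcore_enabled = on;
}

/* Models fork()/clone(): the child starts with a copy of the parent's flag. */
static struct task task_clone(const struct task *parent)
{
    struct task child = *parent;
    return child;
}

/* Models exec(): the flag is cleared, so an unrelated program does not
 * inherit the commitment made by the program that set it. */
static void task_exec(struct task *t)
{
    t->tcore_enabled = false;
}
```

With this model, a thread manager that sets the flag once before creating
its threads gets it propagated to every thread it clones, while a child
that goes on to exec a different binary falls back to ordinary core dumps.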
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Anyone working on multi-threaded core files for 2.4 ?

2000-10-02 Thread James Cownie


> Queueing the tcores in the mm_struct could work though. Add a prctl [1]
> that enables tcore core dumping. When tcore core dumping is enabled every
> core dump that would dump a mm_struct with reference count > 1 does not
> actually dump it, but just queues a structure (tqueue) with its registers/
> signal info/etc.  into a list in the mm_struct. When a thread dumps and 
> the mm_struct count is 1 then dump a normal core file with all the tcores 
> as thread notes. Also clean up all the cores then when freeing the mm_struct.

What is your model for the scope of this prctl ?

Should it be on a per-thread (OK, Alexander, "per process sharing the
same MM") basis, or does it apply to all members of the thread_group ?

It seems to me that it makes no sense unless
1) it applies to all members of the thread_group
   (because without this the ref count on the mm will never get to
   zero).
2) if it applies we also ensure that core dump signals get sent to all
   members of the thread_group.
   (because without this the other threads won't exit).

At the moment (test9-pre7) there seems to be no code in the kernel to
cause the core dumping signals to be fanned out. But then there seems
to be no code to cause any signals (even the negative "deliver to
thread group" ones) to be fanned out. I assume that's because it's
still work in progress.

(Yes, I know that linuxthreads does it from outside the kernel, but I
was under the impression that the aim of the thread_group was to make
it work better, and if we're going to do multi-threaded core files it
seems that distributing the core dumping signals inside the kernel
is essential).

Just to reiterate what I and, I think, some of the other folks who've
written the existing patches are trying to achieve :-

1) We want to provide multi-threaded core files for pthread programs,
   where the user _expects_ a core dump signal to terminate all of the
   threads in the process and the process itself "as soon as
   possible".

2) We don't want to affect programs which are using the full linux
   threads model. If it makes no sense to attempt to write a
   multi-threaded core dump from such a process that's fine. People who
   are using this model probably don't want such a thing anyway.

The "itch" I'm trying to scratch is the complaints we get from our
customers saying that our debugger doesn't work because it won't
provide them with any useful information from the core files they get
from their pthreaded codes when running on Linux. I don't like having
to explain that it's a kernel "feature" :-( so I'm trying to fix it
(or, at least, help other people to do so).

-- Jim 

James Cownie<[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com



Re: Anyone working on multi-threaded core files for 2.4 ?

2000-10-02 Thread James Cownie


> Can someone explain why core dumping can't be done in userspace?
...
> There must be a good reason Unix and Linux don't do this ... but I
> haven't thought of it yet.  Anyone care to enlighten me?

The problem, I believe, is that once a process has reached the point
where it has been delivered a core dumping signal (of which there are
more than just SIGSEGV, of course), you can't rely on anything about
its internal state.

So, in particular, it could have unmapped all its writeable memory, or
have overwritten it with zeroes, or ... 

Therefore it's not at all clear that there's enough of the process
left to be able to guarantee that code inside it will be able to write
a core file.

If you really wanted to write core files from user space, the way to
do it would be to have a separate, well known to the kernel, daemon
process whose sole job was to dump the core of failing processes when
requested to do so by the kernel. FWIW I believe that this is what
HURD does.

This would be rather a micro-kernelish approach, but given that core
dumping is both rare and expensive anyway any performance hit from
doing it like this would be irrelevant.

Such an approach might be nice (the kernel would get smaller, and
embedded folks could just leave out the core dumping daemon), but it's
a much more major change than anyone would want to make now for 2.4.

It's also hard to summon much enthusiasm to do it, since it's deep
into the "If it's not broken don't fix it" area.

-- Jim 

James Cownie<[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com




Re: Anyone working on multi-threaded core files for 2.4 ?

2000-10-01 Thread Andi Kleen

On Sat, Sep 30, 2000 at 03:07:57PM -0400, Alexander Viro wrote:
> 
> 
> On Sat, 30 Sep 2000, James Cownie wrote:
> 
> > I was expecting to take the Posix thread style viewpoint in which any
> > of the core dumping signals kill the _process_, so all of the threads
> > are necessarily dead thereafter since they have nowhere to live any
> > longer.
> 
> Different model. Threads are _not_ parts of process, they are processes
> that happen to share a component (VM).

The tcore concept in one of the multithreaded coredump patches looks
quite useful actually [each thread queues its registers, and a single
thread then dumps them]. They just got it wrong by using a global list,
using pgroup, and not having a good mechanism to garbage collect tcores.

Queueing the tcores in the mm_struct could work though. Add a prctl [1]
that enables tcore core dumping. When tcore core dumping is enabled every
core dump that would dump a mm_struct with reference count > 1 does not
actually dump it, but just queues a structure (tqueue) with its registers/
signal info/etc.  into a list in the mm_struct. When a thread dumps and 
the mm_struct count is 1 then dump a normal core file with all the tcores 
as thread notes. Also clean up all the cores then when freeing the mm_struct.

This queueing has to be limited of course, probably by a ulimit. The prctl
could also set a limit.

The question is just what information to put into the tqueue structure.
Everything that can be expressed by an elf_phdr is probably a good start.

There are no more races than in normal core dumping.

This model is general enough to fit into the Linux clone model.

-Andi

[1] I think a prctl is better than a clone flag here because it keeps the 
business of crashing out of the fast paths.
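
The queueing scheme proposed above can be sketched as a small user-space
model. It is not kernel code and makes several assumptions: `struct mm`
stands in for mm_struct, `users` for its reference count, and the record
fields, the `tcore_limit` cap and all function names are hypothetical.
Writing the core file is reduced to counting how many thread notes it
would contain.

```c
#include <stdlib.h>

/* Per-thread record queued instead of a full dump (fields are assumptions). */
struct tcore {
    int tid;
    int signo;
    struct tcore *next;
};

/* Stand-in for mm_struct: only the parts the scheme needs. */
struct mm {
    int users;              /* tasks still sharing this address space */
    size_t tcore_limit;     /* cap on queued records (the "ulimit") */
    size_t tcore_count;
    struct tcore *tcores;   /* list of queued records */
};

enum dump_action { DUMP_QUEUED, DUMP_DROPPED, DUMP_WRITTEN };

/* Garbage collection: also what freeing the mm would have to call. */
static void mm_free_tcores(struct mm *mm)
{
    while (mm->tcores) {
        struct tcore *next = mm->tcores->next;
        free(mm->tcores);
        mm->tcores = next;
    }
    mm->tcore_count = 0;
}

/* A task sharing `mm` takes a core-dumping signal and exits.
 * If other users remain, only a record is queued (or dropped at the cap).
 * The last user writes the dump: *notes gets the number of thread notes,
 * i.e. the queued records plus the dumping thread itself. */
static enum dump_action mm_core_dump(struct mm *mm, int tid, int signo,
                                     size_t *notes)
{
    if (mm->users > 1) {
        enum dump_action act = DUMP_DROPPED;
        if (mm->tcore_count < mm->tcore_limit) {
            struct tcore *tc = malloc(sizeof(*tc));
            if (tc) {
                tc->tid = tid;
                tc->signo = signo;
                tc->next = mm->tcores;
                mm->tcores = tc;
                mm->tcore_count++;
                act = DUMP_QUEUED;
            }
        }
        mm->users--;        /* the exiting task drops its reference */
        return act;
    }
    *notes = mm->tcore_count + 1;
    mm_free_tcores(mm);
    mm->users = 0;
    return DUMP_WRITTEN;
}
```

Keeping the list inside the mm, rather than in a global list keyed by
pgrp, is what lets the records be freed together with the address space
even if the remaining threads never dump.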



Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-30 Thread Alexander Viro



On Sat, 30 Sep 2000, James Cownie wrote:

> I was expecting to take the Posix thread style viewpoint in which any
> of the core dumping signals kill the _process_, so all of the threads
> are necessarily dead thereafter since they have nowhere to live any
> longer.

Different model. Threads are _not_ parts of process, they are processes
that happen to share a component (VM).




Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-30 Thread Andi Kleen

On Sat, Sep 30, 2000 at 03:45:54PM +0100, James Cownie wrote:
> Since the Villarreal patch exists and seems to do all that I wanted, I
> don't propose to create a competing patch.
> 
> Maybe you kernel gurus could point out any problems with the Villarreal
> approach ? 

The patch assumes that all threads have the same pgrp (which may not be true).

When other threads do not actually coredump, or do not have the same pgrp,
the tcore structure will never be cleaned up as far as I can see, allowing
a nice DoS attack that fills your memory completely.

There was also another patch from Philip Gladstone for 2.2 BTW which did the
same thing, but also had various problems. It worked around that particular
trap by implementing the killing of other threads in kernel space (which is
fine for LinuxThreads, but limits other otherwise useful applications of
clone threads). I think it had some other problems too.

-Andi





Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-30 Thread James Cownie


> Open question: whether or not to allow the remaining threads to
> continue once the dump is completed, to abort them, or to signal
> them.  Probably should be run time configurable.

I was expecting to take the Posix thread style viewpoint in which any
of the core dumping signals kill the _process_, so all of the threads
are necessarily dead thereafter since they have nowhere to live any
longer.

This approach is certainly what people migrating from other Unixen
expect. (And, killing everyone is also what Linux threads
implements). 

Andreas Dilger <[EMAIL PROTECTED]> was kind enough to point out
that there have been a couple of recent postings (which I failed to
find in my original search :-() which claim already to have
implemented this, and provided patches :-

Terje Malmedal <[EMAIL PROTECTED]>:
http://marc.theaimsgroup.com/?l=linux-kernel&m=96355845607151&w=4

  The Malmedal patch is not actually a patch to generate a
  multi-threaded core file, rather it generates a separate complete core
  file for each thread. I will not consider this further.

Jason Villarreal <[EMAIL PROTECTED]>:
http://marc.theaimsgroup.com/?l=linux-kernel&m=96931745912910&w=4

  The Villarreal patch is exactly the kind of thing I was thinking
  of. It generates a standard multi-threaded ELF core file.

  I _think_, but would need to read it in more detail to be sure, that
  it assumes that all of the threads in the process are sent the
  signal which forced the dump. It then lets the last one out actually
  write the file including the thread specific data recorded by all of
  the previous threads to exit.

Since the Villarreal patch exists and seems to do all that I wanted, I
don't propose to create a competing patch.

Maybe you kernel gurus could point out any problems with the Villarreal
approach ? 

-- Jim 

James Cownie<[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com
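
A multi-threaded ELF core file of the kind discussed above carries one
register note per thread in its notes segment. The sketch below estimates
how much space those notes take. It assumes the usual ELF note layout (a
header of three 32-bit words, then the name and the descriptor, each
padded to 4 bytes) and the conventional owner name "CORE"; the struct and
function names are illustrative, and the register payload size is left as
a parameter because it depends on the architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the ELF note header: three 32-bit words. */
struct note_hdr {
    uint32_t n_namesz;   /* length of the name, including the NUL */
    uint32_t n_descsz;   /* length of the descriptor (payload) */
    uint32_t n_type;     /* note type, e.g. the per-thread status note */
};

#define NOTE_NAME "CORE" /* owner name conventionally used in core files */

/* Round up to the 4-byte alignment assumed for note fields. */
static size_t align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Bytes occupied by one note carrying `desc_size` bytes of payload. */
static size_t note_size(size_t desc_size)
{
    return sizeof(struct note_hdr)
         + align4(sizeof(NOTE_NAME))   /* "CORE" plus NUL, padded */
         + align4(desc_size);
}

/* Bytes needed for the register notes of `nthreads` threads, each
 * recording `regs_size` bytes of thread-specific state. */
static size_t thread_notes_size(size_t nthreads, size_t regs_size)
{
    return nthreads * note_size(regs_size);
}
```

The point of the calculation is that the per-thread cost is small and
fixed: the memory image is written once, and each extra thread adds only
one more note.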




RE: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Marty Fouts



> -Original Message-
> From: Alan Cox [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 29, 2000 2:08 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: Anyone working on multi-threaded core files for 2.4 ?
> 
> 
> > > while the dump is taken? How about thread A coredumping, half of the image
> > > being already written and thread B (nowhere near the kernel mode, mind
> > > you) changing the data both in the area that is already dumped and area
> > > the still isn't? After that you can look at the dump and notice absolutely
> > > corrupted data structures - very effective in misdirecting your attempts
> > > to figure out what went wrong.
> > 
> > Couldn't all threads be stopped before coredumping begins?
> 
> Unless I am missing something doesn't a truncate of a file in parallel also
> yank the pages from under the dump too
> 

A "good enough" bit of coherence for dumping, it seems to me, can be had by
ensuring that none of the threads in the "process" are scheduled on a
CPU during the dump.  On a UP this can be relatively simple to do by making
sure that each related thread is kept off the run queue while the dump
occurs, since it is known that none of the non-failing threads were running
when the dump started.  On an MP you must also make sure that none of the
threads are running on any of the other processors when the dump starts.
Once all of the threads are stopped, a "normal" dump is enough,
augmented by (optionally) dumping the thread-specific state (i.e. PCB and
stack) of all of the threads in the "process."

Open question: whether or not to allow the remaining threads to continue
once the dump is completed, to abort them, or to signal them.  Probably
should be run time configurable.

Marty
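
The condition described above can be written down as a simple check. This
is a user-space sketch under stated assumptions, not scheduler code:
`struct thread_state` and `safe_to_dump` are hypothetical names, and a
real kernel would also have to hold this state stable while the dump is
in progress.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal view of a thread's scheduling state (illustrative only). */
struct thread_state {
    bool on_run_queue;   /* could be picked by the scheduler */
    int  running_cpu;    /* CPU currently executing it, or -1 if none */
};

/* Returns true when the dump may start: every thread other than the
 * dumping one is off the run queue (enough on UP) and is not executing
 * on any processor (the extra condition on MP). */
static bool safe_to_dump(const struct thread_state *threads, size_t n,
                         size_t dumper)
{
    for (size_t i = 0; i < n; i++) {
        if (i == dumper)
            continue;            /* the dumping thread is in the kernel */
        if (threads[i].on_run_queue)
            return false;
        if (threads[i].running_cpu >= 0)
            return false;
    }
    return true;
}
```

On a uniprocessor the second test can never fail for another thread, which
is why keeping the siblings off the run queue is sufficient there.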
  



RE: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Marty Fouts



> -Original Message-
> From: Igmar Palsenberg [mailto:[EMAIL PROTECTED]]

[snip]

> 
> Maybe I'm totally stupid, but I think you need to sync the threads so that
> they're in the same state. And I don't think it's that simple.
> 
> Or I'm talking totally nonsense here :)
> 

I think one needs to be careful not to let a desire for the perfect
solution prevent deploying a useful solution while working out the best one,
in this case.

When a multithreaded application "dies" due to one of the threads failing in
an unexpected and unrecoverable way, there probably isn't, at that point, a
"same state" for the threads to be in. A non-coherent dump, while still
difficult to use, is more useful than not dumping any state at all, and
often is coherent enough to use anyway.

Marty



Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Alexander Viro



On Fri, 29 Sep 2000, Alan Cox wrote:

> > > while the dump is taken? How about thread A coredumping, half of the image
> > > being already written and thread B (nowhere near the kernel mode, mind
> > > you) changing the data both in the area that is already dumped and area
> > > the still isn't? After that you can look at the dump and notice absolutely
> > > corrupted data structures - very effective in misdirecting your attempts
> > > to figure out what went wrong.
> > 
> > Couldn't all threads be stopped before coredumping begins?
> 
> Unless I am missing something doesn't a truncate of a file in parallel also
> yank the pages from under the dump too

Not exactly the same. vmtruncate() doesn't alter the VMA list|AVL-tree, it
just eats the pages, so you get zeroes. mmap() from another thread,
though... We _could_ protect ourselves from that, all right
(down(&mm->mmap_sem) and we are OK).

The real problem is different - sure thing, you can get garbled dump if
you truncate one of the mmaped files, but you are virtually guaranteed to
get memory writes from another thread. IOW, it's a difference between "you
can shoot your foot if you want it" and "if you are really lucky you will
not suffer too long".

The question being: is it really worth the trouble? We could try to stop
all threads and get relatively safe dumps, but that's way trickier than
"remove this silly check and grab ->mmap_sem to avoid oopsen". If somebody
is willing to do that right - sure, why not?

Stopping them all _will_ be tricky - e.g. file creation had better
happen before such an attempt, etc. There is a nice deadlock potential, so
we'd better be very careful. We should be reasonably safe for a dump on
a local fs if we grab the ->mmap_sem and allocate all blocks for the dump
before stopping the rest of the threads, but frankly, I've no idea what's
involved in the NFS and CODA cases.




Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Alan Cox

> > while the dump is taken? How about thread A coredumping, half of the image
> > being already written and thread B (nowhere near the kernel mode, mind
> > you) changing the data both in the area that is already dumped and area
> > the still isn't? After that you can look at the dump and notice absolutely
> > corrupted data structures - very effective in misdirecting your attempts
> > to figure out what went wrong.
> 
> Couldn't all threads be stopped before coredumping begins?

Unless I am missing something, doesn't a truncate of a file in parallel
yank the pages from under the dump too?





Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread I Lee Hetherington

Alexander Viro wrote:

> How about preventing the rest of threads from doing mmap()/munmap()/etc.
> while the dump is taken? How about thread A coredumping, half of the image
> being already written and thread B (nowhere near the kernel mode, mind
> you) changing the data both in the area that is already dumped and the area
> that still isn't? After that you can look at the dump and notice absolutely
> corrupted data structures - very effective in misdirecting your attempts
> to figure out what went wrong.

Couldn't all threads be stopped before coredumping begins?

--Lee Hetherington





Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Alexander Viro



On Fri, 29 Sep 2000, Igmar Palsenberg wrote:

> > I was aiming at the simplest and in my mind most obvious thing, which
> > is to have the standard ELF core dump handle multiple threads in the
> > same way as it does on many other systems. The lack of these causes
> > shrieks of amazement from many of our customers :-(
> > 
> > This is not rocket science, and there are already debuggers (gdb, our
> > product TotalView, ...) which know how to understand such core dumps
> > if only the kernel produced them.
> > 
> > Now that the kernel has mechanisms for finding all the threads in a
> > process, the actual dump writing should be relatively simple. (You
> > need to write the appropriate register notes for every thread, rather
> > than just one). 

How about preventing the rest of threads from doing mmap()/munmap()/etc.
while the dump is taken? How about thread A coredumping, half of the image
being already written and thread B (nowhere near the kernel mode, mind
you) changing the data both in the area that is already dumped and the area
that still isn't? After that you can look at the dump and notice absolutely
corrupted data structures - very effective in misdirecting your attempts
to figure out what went wrong.
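
The torn-image scenario can be reproduced deterministically. The sketch below is a single-threaded simulation rather than kernel code: the names consistent, writer_update and dump are hypothetical, the "memory" is a small int array whose invariant is that all elements are equal, and the writer is invoked by hand between the two halves of the copy to stand in for thread B running on another CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8

/* Application invariant: every element holds the same value. */
static int consistent(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i] != a[0])
            return 0;
    return 1;
}

/* "Thread B": move the whole array to a new generation. */
static void writer_update(int *mem, size_t n, int gen)
{
    for (size_t i = 0; i < n; i++)
        mem[i] = gen;
}

/*
 * "Thread A": copy mem into image in two halves. If interleave is
 * nonzero, the writer runs after the first half has been copied.
 */
static void dump(int *mem, int *image, size_t n, int interleave)
{
    size_t half = n / 2;

    assert(n >= 2);
    memcpy(image, mem, half * sizeof *mem);
    if (interleave)
        writer_update(mem, n, mem[0] + 1);
    memcpy(image + half, mem + half, (n - half) * sizeof *mem);
}
```

With interleave set, the live array satisfies the invariant both before and after the update, yet the image holds the old generation in its first half and the new one in its second, which is exactly the kind of "corrupted" structure that never existed in the running program.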




Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Igmar Palsenberg



> I was aiming at the simplest and in my mind most obvious thing, which
> is to have the standard ELF core dump handle multiple threads in the
> same way as it does on many other systems. The lack of these causes
> shrieks of amazement from many of our customers :-(
> 
> This is not rocket science, and there are already debuggers (gdb, our
> product TotalView, ...) which know how to understand such core dumps
> if only the kernel produced them.
> 
> Now that the kernel has mechanisms for finding all the threads in a
> process, the actual dump writing should be relatively simple. (You
> need to write the appropriate register notes for every thread, rather
> than just one). 

Maybe I'm totally stupid, but I think you need to sync the threads so that
they're in the same state. And I don't think it's that simple.

Or I'm talking total nonsense here :)




Igmar




Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Brian Pomerantz

On Fri, Sep 29, 2000 at 01:34:47PM +0100, James Cownie wrote:
> 
> I was aiming at the simplest and in my mind most obvious thing, which
> is to have the standard ELF core dump handle multiple threads in the
> same way as it does on many other systems. The lack of these causes
> shrieks of amazement from many of our customers :-(
> 
> This is not rocket science, and there are already debuggers (gdb, our
> product TotalView, ...) which know how to understand such core dumps
> if only the kernel produced them.
> 
> Now that the kernel has mechanisms for finding all the threads in a
> process, the actual dump writing should be relatively simple. (You
> need to write the appropriate register notes for every thread, rather
> than just one). 
> 
> You seem to be aiming for a much more featureful solution (also
> applicable to checkpointing ?), I'm simply aiming to catch up with
> existing practice on many other operating systems.
>

I have been thinking about adding this for some time.  Post mortem
analysis of multi-threaded programs is a Good Thing(tm).  I haven't
heard any news on whether anyone has started implementation of the
ideas kicked around on this list with regards to the thread group ID.
It seemed like a good kernel compromise to bring Linux up to speed
with other OSs.  I should probably get on the glibc mailing list as
well since there would have to be some changes made there to make
things work properly.


BAPper



Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread James Cownie

Richard Moore - RAS Project Lead - Linux Technology Centre. 
wrote :-
> If you have ideas/concerns/requirements please make them known.
...
> There are many things we'd like to see 
> incorporated, the question is how not to boil the ocean. Here are some of 
> the ideas we are thinking about: 
> 
> Multi-process/multi-thread
> Customisable memory ranges/object types
>    Code/Stack/Dynamic allocations
>    System Objects:
>       File-system
>       Memory Management
>       Device Management
>       Process/Task Management
>       (Physical memory ranges)
>
> Multiple (non-fatal) Triggers:
>    Trap
>    Command
>    API
>    Automated (via DProbes)

I was aiming at the simplest and in my mind most obvious thing, which
is to have the standard ELF core dump handle multiple threads in the
same way as it does on many other systems. The lack of these causes
shrieks of amazement from many of our customers :-(

This is not rocket science, and there are already debuggers (gdb, our
product TotalView, ...) which know how to understand such core dumps
if only the kernel produced them.

Now that the kernel has mechanisms for finding all the threads in a
process, the actual dump writing should be relatively simple. (You
need to write the appropriate register notes for every thread, rather
than just one). 
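
A minimal sketch of what "a register note for every thread" looks like at the byte level, assuming a Linux/glibc <elf.h>. The note layout (Elf64_Nhdr, then the name and descriptor each padded to 4 bytes) and the "CORE"/NT_PRSTATUS pairing follow the ELF core file convention; struct fake_regs and both function names are hypothetical stand-ins, and a real dumper would store a full struct elf_prstatus per thread rather than two words.

```c
#include <assert.h>
#include <elf.h>      /* Elf64_Nhdr, Elf64_Word, NT_PRSTATUS */
#include <stddef.h>
#include <string.h>

/* Round n up to the 4-byte alignment used inside an ELF note. */
static size_t note_align(size_t n) { return (n + 3u) & ~(size_t)3u; }

/*
 * Append one note (header, name, descriptor) to buf at offset off.
 * Returns the new offset, or 0 if the note does not fit in cap bytes.
 */
static size_t append_note(unsigned char *buf, size_t cap, size_t off,
                          const char *name, Elf64_Word type,
                          const void *desc, size_t descsz)
{
    Elf64_Nhdr hdr;
    size_t namesz, need;

    assert(name != NULL);
    namesz = strlen(name) + 1;                 /* includes the NUL */
    need = sizeof hdr + note_align(namesz) + note_align(descsz);
    if (off > cap || need > cap - off)
        return 0;

    hdr.n_namesz = (Elf64_Word)namesz;
    hdr.n_descsz = (Elf64_Word)descsz;
    hdr.n_type = type;

    memset(buf + off, 0, need);                /* zero the padding */
    memcpy(buf + off, &hdr, sizeof hdr);
    memcpy(buf + off + sizeof hdr, name, namesz);
    memcpy(buf + off + sizeof hdr + note_align(namesz), desc, descsz);
    return off + need;
}

/* Hypothetical stand-in for one thread's saved registers. */
struct fake_regs { unsigned long pc, sp; };

/* Emit one NT_PRSTATUS-typed note per thread; returns bytes written or 0. */
static size_t write_thread_notes(unsigned char *buf, size_t cap,
                                 const struct fake_regs *regs, size_t nthreads)
{
    size_t off = 0;

    for (size_t i = 0; i < nthreads; i++) {
        off = append_note(buf, cap, off, "CORE", NT_PRSTATUS,
                          &regs[i], sizeof regs[i]);
        if (off == 0)
            return 0;
    }
    return off;
}
```

The only change from the single-threaded case is the loop: the note segment grows by one entry per thread, and a debugger walks the notes to recover each thread's registers.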

You seem to be aiming for a much more featureful solution (also
applicable to checkpointing ?), I'm simply aiming to catch up with
existing practice on many other operating systems.

-- Jim 

James Cownie <[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com





Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread James Cownie


> The synchronization on dump between the processes sharing a VM is quite nasty 
> actually. There were patches for it in the past, but they usually got that
> wrong. Linux has no way currently to stop them atomically.

How atomic does it need to be, though ?

On a uniprocessor there shouldn't be a problem, since if this thread
is running none of the others can be.

On an SMP system the only thing that can be expected is that the
registers for threads which were executing on a different processor at
the time that a thread dumped show those threads as being in
a place which they can reach given the interactions between the
threads. (I.e. that causality is maintained.)

It's like relativity, you can't ask about simultaneity, only
causality.

I was thinking that a scheme in which the core-dumping thread hit all
the others with a SIGSTOP and then (somehow...) waited for them all to
stop before writing the core file would suffice. (Of course, I may be
wrong !)
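
A minimal user-space sketch of the "SIGSTOP everyone, then wait until they have all stopped" scheme. It assumes the other "threads" are child processes of the caller, which is what makes waitpid() with WUNTRACED usable as the wait; spawn_workers, stop_all and kill_all are hypothetical names. A thread dumping core inside the kernel is not the parent of its siblings, so this does not answer the "(somehow...)" part for the in-kernel case.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork up to n idle children as stand-ins for sibling threads.
 * Returns the number actually started. */
static int spawn_workers(pid_t *pids, int n)
{
    int i;

    assert(pids != NULL);
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            for (;;)
                pause();           /* child: idle until signalled */
        }
        pids[i] = pid;
    }
    return i;
}

/* Send SIGSTOP to every worker, then block until each one has
 * actually reported the stop. Returns how many are stopped. */
static int stop_all(const pid_t *pids, int n)
{
    int stopped = 0;

    for (int i = 0; i < n; i++)
        kill(pids[i], SIGSTOP);

    for (int i = 0; i < n; i++) {
        int status;
        if (waitpid(pids[i], &status, WUNTRACED) == pids[i] &&
            WIFSTOPPED(status))
            stopped++;
    }
    return stopped;
}

/* Clean up: SIGKILL works on stopped processes too. */
static void kill_all(const pid_t *pids, int n)
{
    for (int i = 0; i < n; i++) {
        kill(pids[i], SIGKILL);
        waitpid(pids[i], NULL, 0);
    }
}
```

Sending the signal is asynchronous; only once stop_all() has seen every worker report WIFSTOPPED is it safe to assume none of them is still running, which is the point at which the dump could start.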

-- Jim 

James Cownie <[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com






Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread richardj_moore



Yes we (IBM Linux Technology Center RAS Team) are.  If you have
ideas/concerns/requirements please make them known. We are at the point of
deciding what to attack. We have other dumping technologies on other OSs we
could model a Linux enhancement on. There are many things we'd like to see
incorporated, the question is how not to boil the ocean. Here are some of
the ideas we are thinking about:

Multi-process/multi-thread
Customisable memory ranges/object types
   Code/Stack/Dynamic allocations
   System Objects:
      File-system
      Memory Management
      Device Management
      Process/Task Management
      (Physical memory ranges)

Multiple (non-fatal) Triggers:
   Trap
   Command
   API
   Automated (via DProbes)

Richard


Richard Moore -  RAS Project Lead - Linux Technology Centre.

http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
PISC, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK


James Cownie <[EMAIL PROTECTED]> on 29/09/2000 12:22:28

Please respond to James Cownie <[EMAIL PROTECTED]>

To:   [EMAIL PROTECTED]
cc:(bcc: Richard J Moore/UK/IBM)
Subject:  Anyone working on multi-threaded core files for 2.4 ?





Please let me know (by mail) otherwise I may take a look, since it
doesn't appear to be a _huge_ problem any longer, and it's one of the
things users keep bitching at us about when using our debugger :-(

Thanks

-- Jim

James Cownie   <[EMAIL PROTECTED]>
Etnus, LLC. +44 117 9071438
http://www.etnus.com






Re: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Andi Kleen

On Fri, Sep 29, 2000 at 12:22:28PM +0100, James Cownie wrote:
> 
> Please let me know (by mail) otherwise I may take a look, since it
> doesn't appear to be a _huge_ problem any longer, and it's one of the
> things users keep bitching at us about when using our debugger :-(

The synchronization on dump between the processes sharing a VM is quite nasty 
actually. There were patches for it in the past, but they usually got that
wrong. Linux has no way currently to stop them atomically.


-Andi



RE: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Marty Fouts



> -----Original Message-----
> From: Igmar Palsenberg [mailto:[EMAIL PROTECTED]]

[snip]

>
> Maybe I'm totally stupid, but I think you need to sync the threads so that
> they're in the same state. And I don't think it's that simple.
>
> Or I'm talking total nonsense here :)
>

I think one needs to be careful not to let a desire for the perfect
solution prevent deploying a useful solution while working out the best one,
in this case.

When a multithreaded application "dies" due to one of the threads failing in
an unexpected and unrecoverable way, there probably isn't, at that point, a
"same state" for the threads to be in, and a non-coherent dump, while still
difficult to use, is more useful than not dumping any state at all, and
often is coherent enough to use anyway.

Marty



RE: Anyone working on multi-threaded core files for 2.4 ?

2000-09-29 Thread Marty Fouts



> -----Original Message-----
> From: Alan Cox [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 29, 2000 2:08 PM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
> [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: Anyone working on multi-threaded core files for 2.4 ?
>
>
> > > while the dump is taken? How about thread A coredumping, half of the image
> > > being already written and thread B (nowhere near the kernel mode, mind
> > > you) changing the data both in the area that is already dumped and the area
> > > that still isn't? After that you can look at the dump and notice absolutely
> > > corrupted data structures - very effective in misdirecting your attempts
> > > to figure out what went wrong.
> >
> > Couldn't all threads be stopped before coredumping begins?
>
> Unless I am missing something doesn't a truncate of a file in parallel also
> yank the pages from under the dump too
>

A "good enough" bit of coherence for dumping, it seems to me, can be met by
ensuring that none of the threads in the "process" are scheduled on a
CPU during the dump.  On a UP this can be relatively simple to do by making
sure that each related thread is kept off the run queue while the dump
occurs, since it is known that none of the non-failing threads were running
when the dump started.  On an MP you must also make sure that none of the
threads are running on any of the other processors when the dump starts.
Once all of the threads are stopped, then a "normal" dump is enough,
augmented by (optionally) dumping the thread-specific state (i.e. PCB and
stack) of all of the threads in the "process."

Open question: whether to let the remaining threads continue once the dump
is completed, to abort them, or to signal them.  This should probably be
run-time configurable.

Marty
  