from:"Victor Zandy"

[PATCH] Backport of 2.4 ptrace flag to 2.2

2001-05-11 Thread Victor Zandy

> Alan Cox <[EMAIL PROTECTED]> writes: 
> > The preferable one for performance is certainly to backport the
> > 2.4 changes 

This patch against stock 2.2.19 is a backport of the task structure
ptrace flag of Linux 2.4.

It is available at
http://www.cs.wisc.edu/~zandy/ptrace

As we reported a couple weeks ago, under Linux 2.2 ptrace can globally
corrupt the FPU on SMPs.  Linus identified the problem as a race
between ptrace and the FPU trap handler over the process flags.  The
ptrace flag introduced in 2.4 eliminates the race.

This port is faithful to the 2.4 design.  Essentially it:

 - Adds a new variable `ptrace' to the task structure;
 - Adds new constants for this variable (PT_PTRACED etc.) and removes
   the corresponding old ones (PF_PTRACED etc.);
 - Replaces every ptrace-context reference to `flags' with a reference
   to `ptrace', and updates the constants used accordingly;
 - Updates ptrace offset constants, loads, and comparisons in assembly
   files.

The patch is complete for all platforms except ARM.  On ARM, I didn't
understand the meaning of the offset constants used in the assembly,
so I didn't try to fix them.  The patch does include the necessary
changes to C files on ARM.

We have applied (cleanly), compiled (cleanly) and tested the patch on
an x86 SMP, one of the same ones on which we saw FPU corruption.  We
have verified that FPU corruption cannot be produced, and that gdb and
strace still function.  We have not tested any other platform.

Please direct any questions or problems with the patch to
Victor Zandy <[EMAIL PROTECTED]>.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy


Alan Cox <[EMAIL PROTECTED]> writes:

> The preferable one for performance is certainly to backport the 2.4 changes

Is it any more substantial than changing all uses of the ptrace flags
to the new variable?

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy

Linus Torvalds writes:
> Ahh.. This actually _does_ look like a race on "current->flags": 
> PTRACE_ATTACH will do a 
> 
> child->flags |= PF_PTRACED; 
> 
> without waiting for the child to have stopped. 

I can see how this could cause PF_USEDFPU to be cleared inadvertently,
but I do not have any ideas for testing this.  Is it clear that this
is the source of the problem?

What would be involved in backporting the split ptrace flags to 2.2?
Are there other solutions?

Vic

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy


"Christian Ehrhardt" <[EMAIL PROTECTED]> writes:
> Victor: Could you try to reproduce the system wide corruption if you
> add an explicit call to stts(); at the very end of __switch_to?
> This should prevent the FPU corruption from spreading.

After adding this call, I cannot reproduce the global corruption.
There is still occasional local corruption of individual pi processes
while pt is running.

Vic





Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy



Someone else here traced the process flags of an FP-intensive program
on a machine before and after it was put in the faulty FPU state.  He
periodically sampled /proc/pid/stat while the program was running.

He found that PF_USEDFPU was always set before the machine was broken.
Afterward, he found that it was set only about 70% of the time.

Vic




Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy



It looks to me like the kernel sets a trap for FP operations when a
process is switched in.  Then when the process executes an FP op, the
kernel clears the trap and either loads the FP context or initializes
it, depending on whether it is the process' first FP operation.  So no
help is needed from anything in user space.

Vic

"Richard B. Johnson" <[EMAIL PROTECTED]> writes:
> On 20 Apr 2001, Ulrich Drepper wrote:
> 
> > "Richard B. Johnson" <[EMAIL PROTECTED]> writes:
> > 
> > > If it "fixes" it, there is no problem with the FPU, but with the
> > > 'C' runtime library which doesn't initialize the FPU to a known
> > > state before it uses it.
> > 
> > It's the kernel which initializes the FPU.  This was always the case
> > and necessary to implement the fast lazy FPU saving/restoring.
> > Processes which never use the FPU never initialize it.
> 
> The kernel doesn't know if a process is going to use the FPU when
> a new process is created. Only the user's code, i.e., the 'C' runtime
> library knows. If the user is using 'asm' or whatever, the user must


Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy



No dice.  Your program does not fix the problem.

If it were a hardware problem, I would expect the problem to occur
under 2.4.2 as well as 2.2.*, and I would be surprised that we can
consistently produce the behavior across our 64 node cluster.  But we
are keeping the possibility in mind.

Thanks for your suggestions.

Vic

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy

Victor Zandy <[EMAIL PROTECTED]> writes:
> We have found that one of our programs can cause system-wide
> corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
> run this program, the FPU gives bad results to all subsequent
> processes.

We have now tested 2.4.2 and 2.2.19.

2.2.19 has the same problem.

2.4.2 does not seem to be affected.  Unfortunately, we really need a
working 2.2 kernel at this time.

We also patched the 2.2.19 kernel with the PIII patch found in
/pub/linux/kernel/people/andrea/patches/v2.2/2.2.19pre13/PIII-10.bz2
on ftp.kernel.org.  Same problem.

Does anyone have any ideas for us?

Thanks.

Vic


BUG: Global FPU corruption in 2.2

2001-04-19 Thread Victor Zandy



We have found that one of our programs can cause system-wide
corruption of the x86 FPU under 2.2.16 and 2.2.17.  That is, after we
run this program, the FPU gives bad results to all subsequent
processes.

We see this problem on dual 550MHz Xeons with 1GB RAM.  We have 64 of
these things, and we see the problem on every node we try (dozens).
We don't have other SMPs handy.  Uniprocessors, including other PIIIs,
don't seem to be affected.

While we prepare to test for the problem on more recent 2.2 and 2.4
kernels, we would appreciate hearing from anyone who may have insight
into it.

Below are two programs we use to produce the behavior.  The first
program, pi, repeatedly spawns 10 parallel computations of pi.  When
all is well, each process prints pi as it completes.

The second program, pt, repeatedly attaches to and detaches from
another process.  Run pt against the root pi process until the output
of pi begins to look wrong.  Then kill everything and run pi by itself
again.  It will no longer produce good results.  We find that the FPU
persistently gives bad results until we reboot.

Here is the sort of thing we see:

BEFORE          AFTER
--------------------------------
c36% ./pi       c36% ./pi
[3883]          [4069]
3.141593        6865157.146714
3.141593        inf
3.141593        81705.277947
3.141593        4.742524
3.141593        nan
3.141593        585.810296
3.141593        inf
3.141593        4.578857
3.141593        nan
3.141593        4.578857

I am not currently subscribed to linux-kernel.  I'll be checking the
web archives, but please CC replies to me.

Thanks!

Vic Zandy

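/* pi.c: gcc -g -o pi pi.c -lm */
#include <math.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Approximate pi from the series 1 - 1/3^3 + 1/5^3 - ... = pi^3/32. */
static double
do_pi(void)
{
    double sum = 0.0;
    double x = 1.0;
    double s = 1.0;   /* sign of the current term */
    double pi;

    while (x <= 1000.0) {
        sum += (1.0 / pow(x, 3.0)) * s;
        s = -s;
        x += 2.0;
    }
    pi = pow(sum * 32.0, 1.0 / 3.0);
    return pi;
}

int
main(int argc, char *argv[])
{
    int i;
    int pid;
    int m = 1000;   /* runs */
    int n = 10;     /* procs per run */

    pid = getpid();
    fprintf(stderr, "[%d]\n", pid);
    while (m-- > 0) {
        /* The root forks n-1 children; each child leaves the loop at once. */
        for (i = 1; i < n; i++)
            if (!fork())
                break;
        /* Root and children each compute and print pi. */
        fprintf(stderr, "%f\n", do_pi());
        if (getpid() != pid)
            return 0;
        /* Root only: reap whichever children have already exited. */
        while (waitpid(0, 0, WNOHANG) > 0)
            ;
    }
    return 0;
}
/* end of pi.c */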

/* pt.c: gcc -g -o pt pt.c */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* ptrace wrapper that exits on failure.  Peek requests are exempt
   because a negative return value can be legitimate data for them. */
long
dptrace(int req, pid_t pid, void *addr, void *data)
{
    char buf[64];
    long rv;

    rv = ptrace(req, pid, addr, data);
    /* glibc's <sys/ptrace.h> spells the peek-user request PTRACE_PEEKUSER. */
    if ((req != PTRACE_PEEKUSER && req != PTRACE_PEEKTEXT) && 0 > rv) {
        snprintf(buf, sizeof buf, "ptrace (req=%d)", req);
        perror(buf);
        exit(1);
    }
    return rv;
}

int
main(int argc, char *argv[])
{
    int pid;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s PID\n", argv[0]);
        exit(1);
    }
    pid = atoi(argv[1]);
    /* Attach, wait for the target to stop, detach; repeat forever. */
    while (1) {
        dptrace(PTRACE_ATTACH, pid, 0, 0);
        waitpid(pid, 0, 0);
        dptrace(PTRACE_DETACH, pid, 0, 0);
        fprintf(stderr, ".");
    }
    return 0;
}
/* end of pt.c */



Re: 2.0/2.2 Bug: SIGTRAP lost

2000-10-05 Thread Victor Zandy

Victor Zandy <[EMAIL PROTECTED]> writes:

> If a process executes an int3 (breakpoint) instruction while
> another process is attaching to it, the SIGTRAP can be lost.  This bug
> is present in 2.4.0-test8 and 2.2.14.

Uh, this turns out to be my stupid programming error, not a bug in
any of the fine versions of the Linux kernel.

My apologies to anyone who invested time looking at this.

Vic Zandy

2.0/2.2 Bug: SIGTRAP lost

2000-09-27 Thread Victor Zandy



If a process executes an int3 (breakpoint) instruction while
another process is attaching to it, the SIGTRAP can be lost.  This bug
is present in 2.4.0-test8 and 2.2.14.

Below is a program that demonstrates this behavior.  It forks a
child that repeatedly executes an int3 and handles the SIGTRAP.  The
parent repeatedly attaches and detaches to the child.  Eventually the
SIGTRAP generated by the int3 is lost, and the child falls through (to
the fprintf).

Vic Zandy

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* ptrace wrapper that exits on failure. */
long int dptrace(enum __ptrace_request req, pid_t pid,
                 void *addr, void *data)
{
    long rv;

    rv = ptrace(req, pid, addr, data);
    if (0 > rv) {
        perror("ptrace");
        exit(1);
    }
    return rv;
}

/* Parent: attach to and detach from the child forever. */
void do_trace(int pid)
{
    while (1) {
        dptrace(PTRACE_ATTACH, pid, 0, 0);
        waitpid(pid, 0, 0);
        dptrace(PTRACE_DETACH, pid, 0, 0);
    }
}

/* i386-specific: back eip up by one byte so the int3 runs again. */
void handler(int sig, struct sigcontext uap)
{
    uap.eip--;
}

/* Child: install the SIGTRAP handler, then execute int3. */
void do_int3()
{
    struct sigaction sa;

    sa.sa_handler = (void (*)(int)) handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGTRAP, &sa, NULL);

    asm("int3");  /* Should loop here */
    fprintf(stderr, "Bug triggered\n");
}

int main(int argc, char *argv[])
{
    int pid;

    pid = fork();
    if (pid)
        do_trace(pid);
    else
        do_int3();
    return 0;
}

2.0/2.2 Bug: SIGTRAP lost

2000-09-18 Thread Victor Zandy



