Re: Re: Unable to kill runaway app. -

2009-08-21 Thread Todd Denniston

Rick Stevens wrote:

Todd Denniston wrote:

Bob Goodwin wrote:
I've added the option soft to the client /etc/fstab which may make it 
possible to interrupt things?


That is, if I have done the right thing in the right place.

Bob




Assuming that after you reboot[1], the situation is better with soft, 
I would suggest going back to hard but use the intr[2] option.

i.e.
server:/usr/local/pub   /pub   nfs   hard,intr

I have seen soft loose data on networks that are somewhat loaded, 
without even giving you any error notifications.  The probability 
seemed somewhat proportional to how many times larger the file you 
are writing is than the wsize parameter.


It's "lose" (as in "lost") not "loose" (as in "running wild").  
English lessons aside, 


I sometimes dislike my 'mother' tongue.


did you use TCP instead of the default UDP on that
heavily loaded network?



TCP was not available on the server at that time (Solaris 2.6, or was it 2.5).

[1] so that the process that is currently stuck and CAN NOT be killed 
is finally terminated. :)


[2] man nfs|grep -3  EINTR
or read the man and search for intr



--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines


Re: Unable to kill runaway app. -

2009-08-21 Thread Rick Stevens

Todd Denniston wrote:

Bob Goodwin wrote:
I've added the option soft to the client /etc/fstab which may make it 
possible to interrupt things?


That is, if I have done the right thing in the right place.

Bob




Assuming that after you reboot[1], the situation is better with soft, I 
would suggest going back to hard but use the intr[2] option.

i.e.
server:/usr/local/pub   /pub   nfs   hard,intr

I have seen soft loose data on networks that are somewhat loaded, 
without even giving you any error notifications.  The probability seemed 
somewhat proportional to how many times larger the file you are 
writing is than the wsize parameter.


It's "lose" (as in "lost") not "loose" (as in "running wild").  English 
lessons aside, did you use TCP instead of the default UDP on that
heavily loaded network?

[1] so that the process that is currently stuck and CAN NOT be killed is 
finally terminated. :)


[2] man nfs|grep -3  EINTR
or read the man and search for intr

--
- Rick Stevens, Systems Engineer  ri...@nerd.com -
- AIM/Skype: therps2    ICQ: 22643734    Yahoo: origrps2 -
--
-   Vegetarian:  Old Indian word for "lousy hunter"  -
--



Re: Unable to kill runaway app. -

2009-08-20 Thread Bob Goodwin

Todd Denniston wrote:

Bob Goodwin wrote:
I've added the option soft to the client /etc/fstab which may make it 
possible to interrupt things?


That is, if I have done the right thing in the right place.

Bob




Assuming that after you reboot[1], the situation is better with soft, 
I would suggest going back to hard but use the intr[2] option.

i.e.
server:/usr/local/pub   /pub   nfs   hard,intr

I have seen soft loose data on networks that are somewhat loaded, 
without even giving you any error notifications.  The probability 
seemed somewhat proportional to how many times larger the file you 
are writing is than the wsize parameter.



[1] so that the process that is currently stuck and CAN NOT be killed 
is finally terminated. :)


[2] man nfs|grep -3  EINTR
or read the man and search for intr


   Yes, I saw "intr" in a page I found via google while investigating
   "soft" and how to apply it. Wondered about it ...

   I will try it too.

   Thank you.

   Bob





Re: Re: Unable to kill runaway app. -

2009-08-20 Thread Todd Denniston

Bob Goodwin wrote:
I've added the option soft to the client /etc/fstab which may make it 
possible to interrupt things?


That is, if I have done the right thing in the right place.

Bob




Assuming that after you reboot[1], the situation is better with soft, I would suggest going back to 
hard but use the intr[2] option.

i.e.
server:/usr/local/pub   /pub   nfs   hard,intr

I have seen soft loose data on networks that are somewhat loaded, without even giving you any 
error notifications.  The probability seemed somewhat proportional to how many times larger the 
file you are writing is than the wsize parameter.
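
To confirm which options actually took effect after editing /etc/fstab and remounting, the kernel's own view can be read back from /proc/mounts. A small sketch follows; the function name nfs_mount_options is made up for illustration, and the optional file argument exists only so it can be tried on sample data.

```shell
#!/bin/sh
# List each NFS mount point together with the options the kernel is using.
# Columns in /proc/mounts: device, mount point, fs type, options, dump, pass.
nfs_mount_options() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 ": " $4 }' "${1:-/proc/mounts}"
}

# Show the live mounts when /proc/mounts is available.
if [ -r /proc/mounts ]; then
    nfs_mount_options
fi
```

The options column should include either hard or soft for each NFS mount, which is a quick check that the fstab change landed where intended.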



[1] so that the process that is currently stuck and CAN NOT be killed is 
finally terminated. :)

[2] man nfs|grep -3  EINTR
or read the man and search for intr
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter



Re: Unable to kill runaway app. -

2009-08-20 Thread Bob Goodwin

Patrick O'Callaghan wrote:

On Thu, 2009-08-20 at 12:31 -0700, Peter Langfelder wrote:
  


As previously stated, use kill -9 <pid>. The kill command without the
-9 only works if the process actually listens to signals, which is not
likely if it's stuck in some (semi-)infinite loop.



To be pedantic, even -9 will only work if the process is "listening".
That's because signal-handling is done by the kernel side of the process
itself. The point about -9 (SIGKILL) is that the process can't trap or
mask it, but if it's stuck waiting on an uninterruptible kernel event
('D' state) there is nothing that will kill it short of rebooting.

poc

Yes, I guess I've had that demonstrated to me.

I've added the option soft to the client /etc/fstab which may make it 
possible to interrupt things?


That is, if I have done the right thing in the right place.

Bob



Re: Unable to kill runaway app. -

2009-08-20 Thread Patrick O'Callaghan
On Thu, 2009-08-20 at 12:31 -0700, Peter Langfelder wrote:
> On Thu, Aug 20, 2009 at 12:11 PM, Bob Goodwin wrote:
> > I just had perhaps the third occurrence of this problem.
> >
> > I tried to shut down gthumb which was displaying a photo from the nfs
> > server. It would not shut down, at least not in a reasonable amount of time.
> > Gkrellm showed cup1 running at max. and top indicated the cup at 99.5%.
> > Something did eventually time out but that did not calm the cup activity.
> >
> >   3487 bobg  20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server
> >
> > Kill 3487 does not stop it. In fact nothing seems to. I told it to poweroff
> > and it got as far as "halting system" and stayed there until I pressed the
> > power button for five seconds or so.
> >
> > This happened once last night and it sat there saying it was busy, the power
> > button was required to kill it then too.
> >
> > I don't expect anyone to troubleshoot the problem but would like to know
> > what other commands I might try to restore things without shutting down and
> > rebooting.
> >
> > This is an F-10 system pretty much up to date, certainly all security
> > updates and perhaps all the rest, I've lost track at the moment. I suspect
> > the problem is related to some horse photo files from my daughter's Mac. But
> > I need a way to stop things when this happens ...
> >
> > Any help appreciated.
> >
> > Bob
> 
> As previously stated, use kill -9 <pid>. The kill command without the
> -9 only works if the process actually listens to signals, which is not
> likely if it's stuck in some (semi-)infinite loop.

To be pedantic, even -9 will only work if the process is "listening".
That's because signal-handling is done by the kernel side of the process
itself. The point about -9 (SIGKILL) is that the process can't trap or
mask it, but if it's stuck waiting on an uninterruptible kernel event
('D' state) there is nothing that will kill it short of rebooting.
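
A rough way to check for this state before reaching for kill -9 is sketched below. It assumes a Linux /proc filesystem; the function names proc_state and describe_state and the PROC_ROOT override are invented for illustration.

```shell
#!/bin/sh
# Print the one-letter state the kernel reports for a process,
# e.g. R (running), S (sleeping), D (uninterruptible), Z (zombie).
# PROC_ROOT can point at a different directory for experiments.
proc_state() {
    sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' \
        "${PROC_ROOT:-/proc}/$1/status" 2>/dev/null
}

# Say whether a signal has a chance of being acted on right now.
describe_state() {
    case "$(proc_state "$1")" in
        D)  echo "uninterruptible sleep: signals stay pending until the kernel call returns" ;;
        Z)  echo "zombie: already dead, waiting to be reaped by its parent" ;;
        "") echo "no such process" ;;
        *)  echo "signal should be delivered" ;;
    esac
}

# Example: inspect the current shell.
describe_state "$$"
```

Note that the top line quoted in this thread shows gam_server in state R rather than D, so a check like this would have been worth running on PID 3487.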

poc



Re: Unable to kill runaway app. -

2009-08-20 Thread Bob Goodwin

Howard Wilkinson wrote:

Bob Goodwin wrote:

I just had perhaps the third occurrence of this problem.

I tried to shut down gthumb which was displaying a photo from the 
nfs server. It would not shut down, at least not in a reasonable 
amount of time. Gkrellm showed cup1 running at max. and top indicated 
the cup at 99.5%. Something did eventually time out but that did not 
calm the cup activity.


   3487 bobg  20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server


Kill 3487 does not stop it. In fact nothing seems to. I told it to 
poweroff and it got as far as "halting system" and stayed there until 
I pressed the power button for five seconds or so.


This happened once last night and it sat there saying it was busy, 
the power button was required to kill it then too.


I don't expect anyone to troubleshoot the problem but would like to 
know what other commands I might try to restore things without 
shutting down and rebooting.


This is an F-10 system pretty much up to date, certainly all security 
updates and perhaps all the rest, I've lost track at the moment. I 
suspect the problem is related to some horse photo files from my 
daughter's Mac. But I need a way to stop things when this happens ...


Any help appreciated.

Bob


Bob,

What kernel version do you have loaded, and is the processor a multicore 
or multiprocessor unit? If the kernel version is a recent FC10 update 
and you are on an SMP motherboard then I have seen the same thing 
happen with other processes. The problem seems to be in the area where 
it interacts with the NFS code, BUT it looks like a kernel problem with 
the SMP system. I have not been able to get a dump to prove this but 
try downgrading to an older kernel and see if it goes away - I used 
the last FC9 kernel and it did.


I have since upgraded to FC11 and this also does not exhibit the 
problem so it may just have been with one or two of the latest FC10 
builds!


Howard.

This is an older computer, certainly not ancient, a Dell gx280 I bought 
used a few months ago.


[b...@box9 ~]$ uname -a
Linux box9 2.6.27.29-170.2.79.fc10.i686 #1 SMP Fri Aug 14 21:11:41 EDT 2009 i686 i686 i386 GNU/Linux


I believe that's the most recent Kernel from a few days ago, again I 
don't recall exactly when but I could try an earlier one, I usually save 
two older ones but never seem to need them.


dmidecode shows:

   Handle 0x0400, DMI type 4, 32 bytes
   Processor Information
   Socket Designation: Microprocessor
   Type: Central Processor
   Family: Pentium 4
   Manufacturer: Intel
   ID: 41 0F 00 00 FF FB EB BF
   Signature: Type 0, Family 15, Model 4, Stepping 1

   and also:

   Handle 0x0100, DMI type 1, 25 bytes
   System Information
   Manufacturer: Dell Inc.   
   Product Name: OptiPlex GX280  
   Version: Not Specified

   Serial Number: 9HY0281
   UUID: 44454C4C-4800-1059-8030-B9C04F323831
   Wake-up Type: APM Timer

   Handle 0x0200, DMI type 2, 8 bytes
   Base Board Information
   Manufacturer: Dell Inc. 
   Product Name: 0H7276
   Version:   
   Serial Number: ..CN1374056S00IZ.


   I guess that makes it a multicore processor.
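
The dmidecode output above names the CPU model but does not say how many processors the kernel actually sees, and the "SMP" in the uname line only says the kernel was built with multiprocessor support. A more direct check is sketched here, assuming Linux /proc/cpuinfo; the function name count_cpus is hypothetical, and the file argument is there only for trying it on sample data.

```shell
#!/bin/sh
# Count the logical processors the running kernel has brought up.
# Each one appears as a "processor : N" line in /proc/cpuinfo.
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Report the count for this machine when /proc/cpuinfo is available.
if [ -r /proc/cpuinfo ]; then
    count_cpus
fi
```

Pentium 4 parts are single-core, though some support Hyper-Threading and then show up as two logical processors, so a result of 2 here would not by itself mean two cores.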




Re: Unable to kill runaway app. -

2009-08-20 Thread Howard Wilkinson

Bob Goodwin wrote:

I just had perhaps the third occurrence of this problem.

I tried to shut down gthumb which was displaying a photo from the 
nfs server. It would not shut down, at least not in a reasonable 
amount of time. Gkrellm showed cup1 running at max. and top indicated 
the cup at 99.5%. Something did eventually time out but that did not 
calm the cup activity.


   3487 bobg  20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server

Kill 3487 does not stop it. In fact nothing seems to. I told it to 
poweroff and it got as far as "halting system" and stayed there until 
I pressed the power button for five seconds or so.


This happened once last night and it sat there saying it was busy, the 
power button was required to kill it then too.


I don't expect anyone to troubleshoot the problem but would like to 
know what other commands I might try to restore things without 
shutting down and rebooting.


This is an F-10 system pretty much up to date, certainly all security 
updates and perhaps all the rest, I've lost track at the moment. I 
suspect the problem is related to some horse photo files from my 
daughter's Mac. But I need a way to stop things when this happens ...


Any help appreciated.

Bob


Bob,

What kernel version do you have loaded, and is the processor a multicore or 
multiprocessor unit? If the kernel version is a recent FC10 update and 
you are on an SMP motherboard then I have seen the same thing happen 
with other processes. The problem seems to be in the area where it 
interacts with the NFS code, BUT it looks like a kernel problem with the 
SMP system. I have not been able to get a dump to prove this but try 
downgrading to an older kernel and see if it goes away - I used the last 
FC9 kernel and it did.


I have since upgraded to FC11 and this also does not exhibit the problem 
so it may just have been with one or two of the latest FC10 builds!


Howard.



Re: Unable to kill runaway app. -

2009-08-20 Thread Bob Goodwin

Christopher K. Johnson wrote:

Bob Goodwin wrote:

I just had perhaps the third occurrence of this problem.

I tried to shut down gthumb which was displaying a photo from the 
nfs server. It would not shut down, at least not in a reasonable 
amount of time. Gkrellm showed cup1 running at max. and top indicated 
the cup at 99.5%. Something did eventually time out but that did not 
calm the cup activity.


   3487 bobg  20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server


Kill 3487 does not stop it. In fact nothing seems to. I told it to 
poweroff and it got as far as "halting system" and stayed there until 
I pressed the power button for five seconds or so.


This happened once last night and it sat there saying it was busy, 
the power button was required to kill it then too.


I don't expect anyone to troubleshoot the problem but would like to 
know what other commands I might try to restore things without 
shutting down and rebooting.


This is an F-10 system pretty much up to date, certainly all security 
updates and perhaps all the rest, I've lost track at the moment. I 
suspect the problem is related to some horse photo files from my 
daughter's Mac. But I need a way to stop things when this happens ...


Any help appreciated.

Bob

Try "soft" option on the nfs mount in case the root cause is a problem 
with the nfs access to the image file.



   Ok, I will try that. If I understand the soft option goes in the
   client /etc/fstab? It can also be assigned a time value?
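
On the time value: per the nfs man page, the relevant mount options are timeo (how long the client waits before retrying a request, in tenths of a second) and retrans (how many retries before further recovery action); both go in the options column of the client's /etc/fstab alongside soft. A sketch of reading them out of an option string follows. The helper names mount_opt and timeo_seconds are made up, and the values shown are purely illustrative, not a recommendation.

```shell
#!/bin/sh
# Extract one key=value setting from a comma-separated mount option string.
mount_opt() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# timeo is given in tenths of a second; convert it to seconds.
timeo_seconds() {
    t=$(mount_opt "$1" timeo)
    [ -n "$t" ] || return 1
    echo "$((t / 10)).$((t % 10))"
}

# Illustrative option string only.
timeo_seconds "soft,timeo=30,retrans=3"
```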

   Thanks.

   Bob




Re: Unable to kill runaway app. -

2009-08-20 Thread Bob Goodwin

Andras Simon wrote:

On 8/20/09, Bob Goodwin  wrote:

  

I don't expect anyone to troubleshoot the problem but would like to know
what other commands I might try to restore things without shutting down
and rebooting.



kill -9 <pid> can be pretty effective.

Andras

  
Yes, I tried kill -9 3487 and even -0. I have some trouble understanding 
the kill man page, but those seemed like something to try.
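
For what it is worth, the signals mentioned do quite different things: plain kill sends SIGTERM, which a process may catch or ignore; -9 sends SIGKILL, which it cannot; and -0 sends nothing at all, it only tests whether the PID exists and may be signalled. A common escalation pattern is sketched below. The function name stop_process and the five-second grace period are arbitrary choices for illustration.

```shell
#!/bin/sh
# Ask a process to exit with SIGTERM, wait a little, then fall back to SIGKILL.
stop_process() {
    pid=$1

    # kill -0 delivers no signal; it only checks that we could send one.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission)"
        return 1
    fi

    kill -TERM "$pid" 2>/dev/null

    # Give the process up to five seconds to shut down on its own.
    i=0
    while [ "$i" -lt 5 ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        i=$((i + 1))
    done

    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
        echo "sent SIGKILL"
    else
        echo "exited after SIGTERM"
    fi
}

# Usage (not run here): stop_process 3487
```

One caveat: kill -0 also succeeds for a zombie that has not yet been reaped, so the SIGKILL branch can trigger for a process that is already dead; that is harmless.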


And I must apologize for "cup" instead of cpu; my spell checker did 
that for me. I thought I told it to remember the word but must have 
clicked the wrong spot.


Bob



Re: Unable to kill runaway app. -

2009-08-20 Thread Peter Langfelder
On Thu, Aug 20, 2009 at 12:11 PM, Bob Goodwin wrote:
> I just had perhaps the third occurrence of this problem.
>
> I tried to shut down gthumb which was displaying a photo from the nfs
> server. It would not shut down, at least not in a reasonable amount of time.
> Gkrellm showed cup1 running at max. and top indicated the cup at 99.5%.
> Something did eventually time out but that did not calm the cup activity.
>
>   3487 bobg      20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server
>
> Kill 3487 does not stop it. In fact nothing seems to. I told it to poweroff
> and it got as far as "halting system" and stayed there until I pressed the
> power button for five seconds or so.
>
> This happened once last night and it sat there saying it was busy, the power
> button was required to kill it then too.
>
> I don't expect anyone to troubleshoot the problem but would like to know
> what other commands I might try to restore things without shutting down and
> rebooting.
>
> This is an F-10 system pretty much up to date, certainly all security
> updates and perhaps all the rest, I've lost track at the moment. I suspect
> the problem is related to some horse photo files from my daughter's Mac. But
> I need a way to stop things when this happens ...
>
> Any help appreciated.
>
> Bob

As previously stated, use kill -9 <pid>. The kill command without the
-9 only works if the process actually listens to signals, which is not
likely if it's stuck in some (semi-)infinite loop.

HTH,

Peter



Re: Unable to kill runaway app. -

2009-08-20 Thread Christopher K. Johnson

Bob Goodwin wrote:

I just had perhaps the third occurrence of this problem.

I tried to shut down gthumb which was displaying a photo from the 
nfs server. It would not shut down, at least not in a reasonable 
amount of time. Gkrellm showed cup1 running at max. and top indicated 
the cup at 99.5%. Something did eventually time out but that did not 
calm the cup activity.


   3487 bobg  20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server

Kill 3487 does not stop it. In fact nothing seems to. I told it to 
poweroff and it got as far as "halting system" and stayed there until 
I pressed the power button for five seconds or so.


This happened once last night and it sat there saying it was busy, the 
power button was required to kill it then too.


I don't expect anyone to troubleshoot the problem but would like to 
know what other commands I might try to restore things without 
shutting down and rebooting.


This is an F-10 system pretty much up to date, certainly all security 
updates and perhaps all the rest, I've lost track at the moment. I 
suspect the problem is related to some horse photo files from my 
daughter's Mac. But I need a way to stop things when this happens ...


Any help appreciated.

Bob

Try "soft" option on the nfs mount in case the root cause is a problem 
with the nfs access to the image file.




Re: Unable to kill runaway app. -

2009-08-20 Thread Andras Simon
On 8/20/09, Bob Goodwin  wrote:

> I don't expect anyone to troubleshoot the problem but would like to know
> what other commands I might try to restore things without shutting down
> and rebooting.

kill -9 <pid> can be pretty effective.

Andras



Unable to kill runaway app. -

2009-08-20 Thread Bob Goodwin

I just had perhaps the third occurrence of this problem.

I tried to shut down gthumb which was displaying a photo from the nfs 
server. It would not shut down, at least not in a reasonable amount of 
time. Gkrellm showed cup1 running at max. and top indicated the cup at 
99.5%. Something did eventually time out but that did not calm the cup 
activity.


   3487 bobg  20   0  2928 1068  932 R 99.5  0.0 445:55.55 gam_server

Kill 3487 does not stop it. In fact nothing seems to. I told it to 
poweroff and it got as far as "halting system" and stayed there until I 
pressed the power button for five seconds or so.


This happened once last night and it sat there saying it was busy, the 
power button was required to kill it then too.


I don't expect anyone to troubleshoot the problem but would like to know 
what other commands I might try to restore things without shutting down 
and rebooting.


This is an F-10 system pretty much up to date, certainly all security 
updates and perhaps all the rest, I've lost track at the moment. I 
suspect the problem is related to some horse photo files from my 
daughter's Mac. But I need a way to stop things when this happens ...


Any help appreciated.

Bob
