Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread Bluejay Adametz
> I am looking up the firmware versions for this box now. I am not hopeful
> that I will find a newer firmware for this old of a system though.
> Still, totally worth the try! :-)

I maintain racks of DL380 G4s, and have found recent firmware at
http://h17007.www1.hp.com/us/en/enterprise/servers/products/service_pack/spp/index.aspx

Download the DVD, boot it, and see what you got.

 - Bluejay Adametz

Happiness comes when your work and words are of benefit to yourself and
others. - Siddhartha Gautama



Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread Paul Robert Marino
If not, down-rev it to the same version as the one that works.
It isn't hard to do with their utilities, because those of us who work in
mission-critical environments have hammered it into their heads that it's
an absolute requirement.

-- Sent from my HP Pre3


Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread ~Stack~
On 12/04/2013 05:51 PM, Paul Robert Marino wrote:
> Well, I tend to discount the driver idea because of another problem he
> has involving multiple machines that I think are identical. Also, any
> problems I've ever had with the cciss driver were usually firmware
> related, and an update or rollback usually corrects them.
> Besides, based on what I've heard this is low-budget equipment and
> ProLiants aren't cheap. If I had to guess, we are talking about Dells.

You are right, in that I am experiencing two different issues and the
vast majority of my test lab is older cast-away parts. The difference is
that both issues are on very different systems.

The DHCP problem is on a bunch of similar generic Dells. This particular
problem is on an HP ProLiant DL360 G4 whose twin (same hardware specs
and, thanks to Puppet, dang-near identical in terms of software) so far
has not displayed this problem.

Because the twin isn't having this problem and the problem only started
~3 weeks ago, I have thought for the last few weeks that it was a disk
drive problem.

I am looking up the firmware versions for this box now. I am not hopeful
that I will find a newer firmware for this old of a system though.
Still, totally worth the try! :-)
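
For comparing this box against its twin, something like the following could
pull the controller firmware version on each so the two can be checked side
by side. This is only a sketch: it assumes the cciss driver exposes a
"Firmware Version:" line in /proc/driver/cciss/cciss0 (verify that on your
kernel), and both function names are made up for illustration.

```shell
#!/bin/sh
# Print the firmware version reported in a cciss proc file.
# Assumption: the file contains a line of the form "Firmware Version: X.YY".
cciss_firmware_version() {
    proc_file="${1:-/proc/driver/cciss/cciss0}"
    sed -n 's/^Firmware Version:[[:space:]]*//p' "$proc_file"
}

# Succeed only when both version strings are non-empty and identical.
same_firmware() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example use (hostnames are placeholders):
#   here=$(cciss_firmware_version)
#   twin=$(ssh twinbox "sed -n 's/^Firmware Version:[[:space:]]*//p' /proc/driver/cciss/cciss0")
#   same_firmware "$here" "$twin" || echo "firmware differs: $here vs $twin"
```

If the versions differ, that at least narrows down what is not identical
between the two machines.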

Thanks!
~Stack~





Re: No DHCP on boot with a fresh install

2013-12-04 Thread ~Stack~
On 12/04/2013 05:39 PM, ~Stack~ wrote:
> On 12/04/2013 05:13 PM, Alan Bartlett wrote:
>> On 4 December 2013 23:07, ~Stack~  wrote:
>>> On 12/04/2013 08:19 AM, Mark Stodola wrote:
>>>> I would suggest trying a NIC that uses a different driver or getting a
>>>> newer driver from ELrepo (kmod-tg3).  Broadcom has been known to have
>>>> issues in my experience.
>>> Hrm. I don't seem to find any package with tg3 in it at all. Even
>>> looking on the EPEL website[1] I don't see kmod-tg3. Is it under a
>>> different name perhaps?
>>> [1]
>>> http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/letter_k.group.html
>>>
>>>> Personally, I try to stick with Intel.
>>> Me too. But these are old cast aways that the hardware is still good,
>>> hence why they are test boxes. :-)
>>>
>>> Thanks for the input!
>>>
>>> ~Stack~
>>
>> The ELRepo Project [1] is not Fedora's Extra Packages for Enterprise Linux
>> [2].
>>
>> Alan.
>>
>> [1] http://elrepo.org
>> [2] https://fedoraproject.org/wiki/EPEL
>>
> 
> Haha! Right on. It might help if I read things correctly. :-D
> 
> Thanks for pointing that out. I will go give that a try now.

Sadly, the updated driver didn't work for me.

Thanks anyway!

~Stack~






Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread Paul Robert Marino
Well, I tend to discount the driver idea because of another problem he
has involving multiple machines that I think are identical. Also, any
problems I've ever had with the cciss driver were usually firmware
related, and an update or rollback usually corrects them.
Besides, based on what I've heard this is low-budget equipment and
ProLiants aren't cheap. If I had to guess, we are talking about Dells.

-- Sent from my HP Pre3


Re: No DHCP on boot with a fresh install

2013-12-04 Thread ~Stack~
On 12/04/2013 05:13 PM, Alan Bartlett wrote:
> On 4 December 2013 23:07, ~Stack~  wrote:
>> On 12/04/2013 08:19 AM, Mark Stodola wrote:
>>> I would suggest trying a NIC that uses a different driver or getting a
>>> newer driver from ELrepo (kmod-tg3).  Broadcom has been known to have
>>> issues in my experience.
>> Hrm. I don't seem to find any package with tg3 in it at all. Even
>> looking on the EPEL website[1] I don't see kmod-tg3. Is it under a
>> different name perhaps?
>> [1]
>> http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/letter_k.group.html
>>
>>> Personally, I try to stick with Intel.
>> Me too. But these are old cast aways that the hardware is still good,
>> hence why they are test boxes. :-)
>>
>> Thanks for the input!
>>
>> ~Stack~
> 
> The ELRepo Project [1] is not Fedora's Extra Packages for Enterprise Linux
> [2].
> 
> Alan.
> 
> [1] http://elrepo.org
> [2] https://fedoraproject.org/wiki/EPEL
> 

Haha! Right on. It might help if I read things correctly. :-D

Thanks for pointing that out. I will go give that a try now.

~Stack~






Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread David Sommerseth

On 04/12/13 14:21, ~Stack~ wrote:
> Greetings,
>
> I have a test system I use for testing deployments and when I am not
> using it, it runs Boinc. It is a Scientific Linux 6.4 fully updated box.
> Recently (last ~3 weeks) I have started getting the same kernel panic.
> Sometimes it will be multiple times in a single day and other times it
> will be days before the next one (it just had a 5 day uptime). But the
> kernel panic looks pretty much the same. It is a complaint about a hung
> task plus information about the ext4 file system. I have run the
> smartmon tool against both drives (2 drives setup in a hardware RAID
> mirror) and both drives checkout fine. I ran a fsck against the /
> partition and everything looked fine (on this text box there is only /
> and swap partitions). I even took out a drive at a time and had the same
> crashes (though this could be an indicator that both drives are bad). I
> am wondering if my RAID card is going bad.
>
> When the crash happens I still have the SSH prompt, however, I can only
> do basic things like navigating directories and sometimes reading files.
> Writing to a file seems to hang, using tab-autocomplete will frequently
> hang, running most programs (even `init 6` or `top`) will hang.
>
> It crashed again last night, and I am kind of stumped. I would greatly
> appreciate others thoughts and input on what the problem might be.
>
> Thanks!
> ~Stack~
>
> Dec  4 02:25:09 testbox kernel: INFO: task jbd2/cciss!c0d0:273 blocked
> for more than 120 seconds.
> Dec  4 02:25:09 testbox kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec  4 02:25:09 testbox kernel: jbd2/cciss!c0 D  0
>   273  2 0x
> Dec  4 02:25:09 testbox kernel: 8802142cfb30 0046
> 8802138b5800 1000
> Dec  4 02:25:09 testbox kernel: 8802142cfaa0 81012c59
> 8802142cfae0 810a2431
> Dec  4 02:25:09 testbox kernel: 880214157058 8802142cffd8
> fb88 880214157058
> Dec  4 02:25:09 testbox kernel: Call Trace:
> Dec  4 02:25:09 testbox kernel: [] ? read_tsc+0x9/0x20

This looks like some locking issue to me, triggered by something around the 
TSC timer.


This is either a buggy driver (most likely the cciss driver) or related
firmware (read the complete boot log carefully and look for firmware warnings).
Or it's a really unstable TSC clock source.  Try switching from TSC to HPET
(or, in the very worst case, acpi_pm).  See this KB for some related info:



But my hunch tells me it's a driver-related issue, with some bad locking.
There seem to be several filesystem operations happening on two or more CPU
cores in a certain order, which seems to trigger a deadlock.
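
For the clocksource test mentioned above, a rough sketch of switching at
runtime through sysfs follows. The sysfs files are the standard clocksource
nodes; the CS_DIR override and the function names are my own assumptions,
added so the logic can be tried against a scratch directory first.

```shell
#!/bin/sh
# Directory holding the kernel's clocksource controls.
# CS_DIR can be overridden (assumption, for dry runs against a temp dir).
CS_DIR="${CS_DIR:-/sys/devices/system/clocksource/clocksource0}"

current_clocksource() {
    cat "$CS_DIR/current_clocksource"
}

available_clocksources() {
    cat "$CS_DIR/available_clocksource"
}

# Switch to the requested clocksource, but only if the kernel lists it
# as available. Needs root on a real system.
switch_clocksource() {
    want="$1"
    for cs in $(available_clocksources); do
        if [ "$cs" = "$want" ]; then
            echo "$want" > "$CS_DIR/current_clocksource"
            return 0
        fi
    done
    echo "clocksource '$want' not available" >&2
    return 1
}

# Example use:
#   current_clocksource
#   switch_clocksource hpet || switch_clocksource acpi_pm
```

A runtime switch like this does not survive a reboot. To make it stick,
add clocksource=hpet to the kernel command line.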



--
kind regards,

David Sommerseth


Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread ~Stack~
On 12/04/2013 07:56 AM, Paul Robert Marino wrote:
> Yup that's a hardware problem.

Drats. I was afraid of that.

> It may be a bad firmware on the controller I would check the firmware
> version first and see if there is a patch. I've seen this kind of thing
> with Dell OEMed RAID controllers enough over the years that that's
> almost always the first thing I try.

Will do. I will report back what I find.

Thanks!
~Stack~






Re: No DHCP on boot with a fresh install

2013-12-04 Thread Alan Bartlett
On 4 December 2013 23:07, ~Stack~  wrote:
> On 12/04/2013 08:19 AM, Mark Stodola wrote:
>> I would suggest trying a NIC that uses a different driver or getting a
>> newer driver from ELrepo (kmod-tg3).  Broadcom has been known to have
>> issues in my experience.
> Hrm. I don't seem to find any package with tg3 in it at all. Even
> looking on the EPEL website[1] I don't see kmod-tg3. Is it under a
> different name perhaps?
> [1]
> http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/letter_k.group.html
>
>> Personally, I try to stick with Intel.
> Me too. But these are old cast aways that the hardware is still good,
> hence why they are test boxes. :-)
>
> Thanks for the input!
>
> ~Stack~

The ELRepo Project [1] is not Fedora's Extra Packages for Enterprise Linux [2].

Alan.

[1] http://elrepo.org
[2] https://fedoraproject.org/wiki/EPEL


Re: No DHCP on boot with a fresh install

2013-12-04 Thread ~Stack~
On 12/04/2013 08:19 AM, Mark Stodola wrote:
> I would suggest trying a NIC that uses a different driver or getting a
> newer driver from ELrepo (kmod-tg3).  Broadcom has been known to have
> issues in my experience.
Hrm. I don't seem to find any package with tg3 in it at all. Even
looking on the EPEL website[1] I don't see kmod-tg3. Is it under a
different name perhaps?
[1]
http://dl.fedoraproject.org/pub/epel/6/x86_64/repoview/letter_k.group.html

> Personally, I try to stick with Intel.
Me too. But these are old cast aways that the hardware is still good,
hence why they are test boxes. :-)

Thanks for the input!

~Stack~






Re: No DHCP on boot with a fresh install

2013-12-04 Thread Mark Stodola

On 12/3/2013 10:04 PM, ~Stack~ wrote:
> I think we are on to something!
>
> On 12/03/2013 09:41 PM, ~Stack~ wrote:
>> On 12/03/2013 09:16 PM, ~Stack~ wrote:
>>> On 12/03/2013 08:37 PM, Nico Kadel-Garcia wrote:
>>>> On Tue, Dec 3, 2013 at 6:36 PM, ~Stack~ wrote:
>>>>> On 12/01/2013 10:36 AM, olli hauer wrote:
>>>>>> Have you tried 'service network restart'? Does that bring up your nic?
>>>>>
>>>>> Well now. That is interesting. This is consistent even with a fresh
>>>>> kickstart install.
>>>>> $ service network restart
>>>>> Shutting down interface eth0:  [  OK  ]
>>>>> Shutting down loopback interface:  [  OK  ]
>>>>> Bringing up loopback interface:[  OK  ]
>>>>> Bringing up interface eth0:
>>>>> Determining IP information for eth0... failed; no link present.  Check
>>>>> cable?  [FAILED]
>>>>> $ ifup eth0
>>>>> Determining IP information for eth0... done.
>>>>>
>>>>> Errr...what? *scratches head* What exactly is 'ifup eth0' doing that
>>>>> 'service network restart' isn't?
>>>>
>>>> It's running significantly later. Even dumb switches, and supported
>>>> network drivers, can take time to recognize the available MAC
>>>> address. This is especially the case with DHCP, which requires
>>>> communications all the way upstream to whatever DHCP server is in
>>>> place.
>>>
>>> The weird part for me is that this is after the box is booted and I have
>>> logged in. When I manually run 'service network restart' it fails in the
>>> same way _every_ time. Then as soon as I run 'ifup eth0' it works! I
>>> think I am going to experiment with this a bit.
>>
>> Also, I have been tinkering with this a bit. In /etc/init.d/functions on
>> line ~536 (I have been editing a bit but I think that is right) there is
>> a line like this in the action function:
>> "$@" && success $"$STRING" || failure $"$STRING"
>>
>> When I dumped out the variables it is just running './ifup eth0' but it
>> is on this line that everything seems to choke. What I find odd though is:
>> * If I run it on the command line it works. Running it as a service, it
>> fails. Thus I am wondering if it is an environmental variable setting?
>> That is my next investigation.
>>
>> * If I run 'ifup eth0', get an IP, I can run 'service network restart'
>> and get an IP! If I run 'ifdown eth0' or reboot then the service kicks
>> back the error about a missing cable (which is obviously wrong).
>>
>> Very very odd.
>
> I checked out the environment variables, that is not it. I tried a few
> other things and nothing. I don't understand why running '/sbin/ifup
> eth0' by hand works but the same thing in the service command doesn't.
>
> So I just started adding '/sbin/ifup eth0' statements into the start
> command till it worked. I tweaked it and to reliably get a DHCP IP (even
> on reboot!) just add *two* copies of the ifup command in the start
> section. I put mine at the end just before the ";;" of the "start)" case
> section. One copy alone will not do it. Thus the command is essentially
> called three times in a row.
>
> So there *is* a timing issue going on and just hammering it will
> eventually get it to work. Now to find the best place to put the timing
> delay...
>
> Thanks!

I would suggest trying a NIC that uses a different driver or getting a
newer driver from ELrepo (kmod-tg3).  Broadcom has been known to have
issues in my experience.  Personally, I try to stick with Intel.
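
If it does turn out to be purely a link-negotiation delay, then instead of
calling ifup several times in a row, one option is to wait for the NIC to
report carrier before asking for a lease. A minimal sketch, assuming the
interface is eth0 and that 30 seconds is a sensible ceiling (both are
guesses, and the function name is made up):

```shell
#!/bin/sh
# Hypothetical helper: poll a carrier file until it reads "1" or we time out.
wait_for_carrier() {
    carrier_file="$1"   # e.g. /sys/class/net/eth0/carrier
    timeout="${2:-30}"  # seconds to wait before giving up (assumed default)
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # The file reads "1" once the link is up. Reading it can fail while
        # the interface is administratively down, so errors are ignored.
        if [ "$(cat "$carrier_file" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example use on a real system (needs root):
#   ip link set eth0 up
#   wait_for_carrier /sys/class/net/eth0/carrier 30 && /sbin/ifup eth0
```

It may also be worth checking whether a LINKDELAY= setting in
/etc/sysconfig/network-scripts/ifcfg-eth0 gets the same result without
touching the init script.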


-Mark


Re: Unexplained Kernel Panic / Hung Task

2013-12-04 Thread Paul Robert Marino
Yup, that's a hardware problem.
It may be bad firmware on the controller; I would check the firmware
version first and see if there is a patch. I've seen this kind of thing
with Dell OEMed RAID controllers enough over the years that that's almost
always the first thing I try.

-- Sent from my HP Pre3


Unexplained Kernel Panic / Hung Task

2013-12-04 Thread ~Stack~
Greetings,

I have a test system I use for testing deployments and when I am not
using it, it runs Boinc. It is a Scientific Linux 6.4 fully updated box.
Recently (last ~3 weeks) I have started getting the same kernel panic.
Sometimes it will be multiple times in a single day and other times it
will be days before the next one (it just had a 5 day uptime). But the
kernel panic looks pretty much the same. It is a complaint about a hung
task plus information about the ext4 file system. I have run the
smartmon tool against both drives (2 drives set up in a hardware RAID
mirror) and both drives check out fine. I ran a fsck against the /
partition and everything looked fine (on this test box there is only /
and swap partitions). I even took out a drive at a time and had the same
crashes (though this could be an indicator that both drives are bad). I
am wondering if my RAID card is going bad.

When the crash happens I still have the SSH prompt, however, I can only
do basic things like navigating directories and sometimes reading files.
Writing to a file seems to hang, using tab-autocomplete will frequently
hang, running most programs (even `init 6` or `top`) will hang.

It crashed again last night, and I am kind of stumped. I would greatly
appreciate others' thoughts and input on what the problem might be.

Thanks!
~Stack~
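
To see at a glance which tasks the kernel flagged as hung in a log like
the one that follows, something along these lines could be used. It is a
sketch only: the function name is made up, and /var/log/messages as the
default log location is an assumption.

```shell
#!/bin/sh
# Print "task-name pid" for every hung-task report found in a syslog file.
blocked_tasks() {
    log_file="${1:-/var/log/messages}"
    # Match lines such as "... INFO: task master:1058 blocked for more ..."
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked.*/\1 \2/p' "$log_file"
}

# Example use: count how often each task was reported.
#   blocked_tasks /var/log/messages | sort | uniq -c
```

The current reporting threshold can be read with
cat /proc/sys/kernel/hung_task_timeout_secs.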

Dec  4 02:25:09 testbox kernel: INFO: task jbd2/cciss!c0d0:273 blocked
for more than 120 seconds.
Dec  4 02:25:09 testbox kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  4 02:25:09 testbox kernel: jbd2/cciss!c0 D  0
 273  2 0x
Dec  4 02:25:09 testbox kernel: 8802142cfb30 0046
8802138b5800 1000
Dec  4 02:25:09 testbox kernel: 8802142cfaa0 81012c59
8802142cfae0 810a2431
Dec  4 02:25:09 testbox kernel: 880214157058 8802142cffd8
fb88 880214157058
Dec  4 02:25:09 testbox kernel: Call Trace:
Dec  4 02:25:09 testbox kernel: [] ? read_tsc+0x9/0x20
Dec  4 02:25:09 testbox kernel: [] ?
ktime_get_ts+0xb1/0xf0
Dec  4 02:25:09 testbox kernel: [] ?
ktime_get_ts+0xb1/0xf0
Dec  4 02:25:09 testbox kernel: [] ? sync_page+0x0/0x50
Dec  4 02:25:09 testbox kernel: [] io_schedule+0x73/0xc0
Dec  4 02:25:09 testbox kernel: [] sync_page+0x3d/0x50
Dec  4 02:25:09 testbox kernel: [] __wait_on_bit+0x5f/0x90
Dec  4 02:25:09 testbox kernel: []
wait_on_page_bit+0x73/0x80
Dec  4 02:25:09 testbox kernel: [] ?
wake_bit_function+0x0/0x50
Dec  4 02:25:09 testbox kernel: [] ?
pagevec_lookup_tag+0x25/0x40
Dec  4 02:25:09 testbox kernel: []
wait_on_page_writeback_range+0xfb/0x190
Dec  4 02:25:09 testbox kernel: [] ? submit_bio+0x8d/0x120
Dec  4 02:25:09 testbox kernel: []
filemap_fdatawait+0x2f/0x40
Dec  4 02:25:09 testbox kernel: []
jbd2_journal_commit_transaction+0x7e9/0x1500 [jbd2]
Dec  4 02:25:09 testbox kernel: [] ?
__switch_to+0x13d/0x320
Dec  4 02:25:09 testbox kernel: [] ?
try_to_del_timer_sync+0x7b/0xe0
Dec  4 02:25:09 testbox kernel: []
kjournald2+0xb8/0x220 [jbd2]
Dec  4 02:25:09 testbox kernel: [] ?
autoremove_wake_function+0x0/0x40
Dec  4 02:25:09 testbox kernel: [] ?
kjournald2+0x0/0x220 [jbd2]
Dec  4 02:25:09 testbox kernel: [] kthread+0x96/0xa0
Dec  4 02:25:09 testbox kernel: [] child_rip+0xa/0x20
Dec  4 02:25:09 testbox kernel: [] ? kthread+0x0/0xa0
Dec  4 02:25:09 testbox kernel: [] ? child_rip+0x0/0x20
Dec  4 02:25:09 testbox kernel: INFO: task master:1058 blocked for more
than 120 seconds.
Dec  4 02:25:09 testbox kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  4 02:25:09 testbox kernel: masterD  0
1058  1 0x0080
Dec  4 02:25:09 testbox kernel: 88021535d948 0082
88021535d8d8 81065c75
Dec  4 02:25:09 testbox kernel: 880028216700 88021396b578
880214336ad8 880028216700
Dec  4 02:25:09 testbox kernel: 88021396baf8 88021535dfd8
fb88 88021396baf8
Dec  4 02:25:09 testbox kernel: Call Trace:
Dec  4 02:25:09 testbox kernel: [] ?
enqueue_entity+0x125/0x410
Dec  4 02:25:09 testbox kernel: [] ?
ktime_get_ts+0xb1/0xf0
Dec  4 02:25:09 testbox kernel: [] ? sync_buffer+0x0/0x50
Dec  4 02:25:09 testbox kernel: [] io_schedule+0x73/0xc0
Dec  4 02:25:09 testbox kernel: [] sync_buffer+0x40/0x50
Dec  4 02:25:09 testbox kernel: []
__wait_on_bit_lock+0x5a/0xc0
Dec  4 02:25:09 testbox kernel: [] ? sync_buffer+0x0/0x50
Dec  4 02:25:09 testbox kernel: []
out_of_line_wait_on_bit_lock+0x78/0x90
Dec  4 02:25:09 testbox kernel: [] ?
wake_bit_function+0x0/0x50
Dec  4 02:25:09 testbox kernel: [] ?
__find_get_block+0xa9/0x200
Dec  4 02:25:09 testbox kernel: [] __lock_buffer+0x36/0x40
Dec  4 02:25:09 testbox kernel: []
do_get_write_access+0x493/0x520 [jbd2]
Dec  4 02:25:09 testbox kernel: []
jbd2_journal_get_write_access+0x31/0x50 [jbd2]
Dec  4 02:25:09 testbox kernel: []
__ext4_journal_get_write_access+0x38/0x80 [ext4]
Dec  4 02:25:09 testbox kernel: []
ext4_r