Re: [gentoo-user] System freezes during compiles [SOLVED]

2013-03-22 Thread Carlos Hendson
On Wed, 2013-03-20 at 20:57 +0100, Volker Armin Hemmann wrote:
> you might just hit a thrashing situation. Linux is very bad when it
> comes to abusing swap in case of an emergency.
> 
> But it also sounds like overheating or a power problem. Power problems
> might be caused by the PSU - but it could also be the power circuitry of
> your mobo.

First of all, thank you to everyone for the superb help and suggestions
regarding this problem.

Yesterday, I enabled some swap space, but the system froze on the first
attempt at compiling glibc.

The next cheapest option was to clean the case of dust.  The CPU heat
sink was clogged with a thick layer of dust.  After thoroughly cleaning
the case, the system compiled glibc, the kernel, qtcore and other
packages without freezing.

The only downside is that, since the fins on the heat sink are exposed
directly to the fan again, the noise level has gone up.

When I checked the fan RPMs in the BIOS, I noticed a setting that
decreases the CPU voltage and frequency when a temperature threshold is
exceeded.  This would explain the kernel watchdog messages reporting
that stalls were detected.


For anyone who's curious, here's the output of sensors and free during
the compile of glibc.  Swap wasn't being touched at all, and there was
still more than 4GB of memory free.  The CPU was getting close to the
threshold limit even after the heat sink was cleaned of dust.

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +58.9°C  (high = +70.0°C)
                       (crit = +71.0°C, hyst = +66.0°C)

it8720-isa-0228
Adapter: ISA adapter
in0:          +1.49 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +1.47 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.38 V  (min =  +0.00 V, max =  +4.08 V)
+5V:          +2.96 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +3.07 V  (min =  +0.00 V, max =  +4.08 V)
in5:          +3.25 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +4.08 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
5VSB:         +2.98 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +3.28 V
fan1:        6750 RPM  (min =    0 RPM)
fan2:           0 RPM  (min =    0 RPM)
fan3:           0 RPM  (min =    0 RPM)
fan4:           0 RPM  (min =    0 RPM)
temp1:        +31.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +67.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
temp3:        +70.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
cpu0_vid:    +0.525 V
intrusion0:  ALARM


             total       used       free     shared    buffers     cached
Mem:       8167400    3195300    4972100          0      62024    1379256
-/+ buffers/cache:    1754020    6413380
Swap:       511996          0     511996
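
As a rough illustration of how the "-/+ buffers/cache" row follows from
the "Mem:" row, here is a minimal Python sketch. It assumes the older
free column layout shown above (total, used, free, shared, buffers,
cached, all in KiB); newer versions of free print different columns.
The function names parse_free and real_usage are made up for this
example.

```python
def parse_free(text):
    """Parse the 'Mem:' and 'Swap:' rows of old-style `free` output (KiB)."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] in ("Mem:", "Swap:"):
            rows[parts[0].rstrip(":")] = [int(value) for value in parts[1:]]
    return rows


def real_usage(mem_row):
    """Return (used, free) in KiB, counting buffers and page cache as free."""
    total, used, free, shared, buffers, cached = mem_row
    return used - buffers - cached, free + buffers + cached


# Figures taken from the output posted in this message.
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:       8167400    3195300    4972100          0      62024    1379256
-/+ buffers/cache:    1754020    6413380
Swap:       511996          0     511996
"""

rows = parse_free(SAMPLE)
used, free = real_usage(rows["Mem"])
print(used, free)              # 1754020 6413380, matching the -/+ row
print(rows["Swap"][1] == 0)    # True: no swap in use
```

Roughly 1.7GB was in real use by applications during the glibc build,
which fits the observation that swap was never touched.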



Once again, a big thanks for everyone's help.

Regards,
Carlos




Re: [gentoo-user] System freezes during compiles

2013-03-21 Thread Volker Armin Hemmann
The OOM killer is not instant; it can take a long time, get stuck, or
kill something vital.
...
On 21.03.2013 07:52, "Carlos Hendson" wrote:

> On Thu, 2013-03-21 at 06:45 +0100, Volker Armin Hemmann wrote:
> > You got your answer. 8gig and no swap is NOT ENOUGH.
>
> It's a strong indicator, which is going to be corrected.
>
> I am slightly confused by the resulting behaviour however.  I was of the
> impression oomkiller would start to kill processes when unallocated
> memory is getting scarce?
>
> How would no free memory cause CPU stalls?
>
> Regards,
> Carlos
>
>
>


Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Carlos Hendson
On Thu, 2013-03-21 at 06:45 +0100, Volker Armin Hemmann wrote:
> You got your answer. 8gig and no swap is NOT ENOUGH.

It's a strong indicator, which is going to be corrected.

I am slightly confused by the resulting behaviour however.  I was of the
impression oomkiller would start to kill processes when unallocated
memory is getting scarce?

How would no free memory cause CPU stalls?

Regards,
Carlos




Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Carlos Hendson
On Wed, 2013-03-20 at 16:27 -0500, Paul Hartman wrote:
> 
> I had a virtual server that kept crashing/rebooting during compiles of
> large packages such as php. It ended up being because it was running
> out of memory. Added another 1GB of swap space and it has been happy
> ever since. 

Thanks Paul. Volker suggested a possible cause was swap.  I'll allocate
some swap space after the smartctl self-test finishes and try to
recompile gcc a few times.

Regards,
Carlos




Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Volker Armin Hemmann
You got your answer. 8gig and no swap is NOT ENOUGH.
On 20.03.2013 22:51, "Carlos Hendson" wrote:

> On Wed, 2013-03-20 at 20:57 +0100, Volker Armin Hemmann wrote:
> > you might just hit a thrashing situation. Linux is very bad when it
> > comes to abusing swap in case of an emergency.
> >
> > But it also sounds like overheating or a power problem. Power problems
> > might be caused by the PSU - but it could also be the power circuitry of
> > your mobo.
>
> It's not a thrashing issue as I don't have any swap.  The 8GB of ram has
> been sufficient memory for all tasks thus far.  I have no objection to
> allocating some swap space if it could resolve the issue.
>
> Actually, Grant and you both suggested possible heat issues which has
> just made me think that I should check for dust build up in the CPU heat
> sink.  There's so much dust where I live that I have to vacuum dust build
> up from the case.
>
> The sensors tool reports 51C; it doesn't appear to be running too hot,
> but I don't have a baseline to compare it to.  I see I need to implement
> monitoring for this machine once it's stable again.
>
> k10temp-pci-00c3
> Adapter: PCI adapter
> temp1:        +51.0°C  (high = +70.0°C)
>                        (crit = +71.0°C, hyst = +66.0°C)
>
>
> I'll give the inside a clean this weekend and see if there's any
> improvement.
>
> Thanks for the suggestions.
>
> Regards,
> Carlos
>
>
>


Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Carlos Hendson
On Wed, 2013-03-20 at 20:57 +0100, Volker Armin Hemmann wrote:
> you might just hit a thrashing situation. Linux is very bad when it
> comes to abusing swap in case of an emergency.
> 
> But it also sounds like overheating or a power problem. Power problems
> might be caused by the PSU - but it could also be the power circuitry of
> your mobo.

It's not a thrashing issue as I don't have any swap.  The 8GB of ram has
been sufficient memory for all tasks thus far.  I have no objection to
allocating some swap space if it could resolve the issue.

Actually, Grant and you both suggested possible heat issues which has
just made me think that I should check for dust build up in the CPU heat
sink.  There's so much dust where I live that I have to vacuum dust build
up from the case.

The sensors tool reports 51C; it doesn't appear to be running too hot,
but I don't have a baseline to compare it to.  I see I need to implement
monitoring for this machine once it's stable again.

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +51.0°C  (high = +70.0°C)
                       (crit = +71.0°C, hyst = +66.0°C)
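
On the monitoring idea, a baseline could start with something as small
as the Python sketch below. It is only an illustration: it parses
sensors text that has already been captured, assumes lines of the form
"label: temp (high = limit" as in the k10temp output above, and the
function name temperature_headroom is made up. The 58.9°C sample is the
reading reported later in this thread during a glibc compile.

```python
import re

# Matches lines such as: "temp1:  +51.0°C  (high = +70.0°C)"
TEMP_LINE = re.compile(
    r"^(?P<label>\w+):\s*(?P<temp>[+-]?\d+(?:\.\d+)?)°C\s*"
    r"\(high = (?P<high>[+-]?\d+(?:\.\d+)?)°C"
)


def temperature_headroom(sensors_text):
    """Return {label: degrees C remaining below the 'high' limit}."""
    headroom = {}
    for line in sensors_text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            high = float(match.group("high"))
            temp = float(match.group("temp"))
            headroom[match.group("label")] = round(high - temp, 1)
    return headroom


# Reading from this message (before cleaning, not under compile load).
IDLE = """\
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +51.0°C  (high = +70.0°C)
                       (crit = +71.0°C, hyst = +66.0°C)
"""

# Reading reported later in the thread while compiling glibc.
LOAD = IDLE.replace("+51.0", "+58.9")

print(temperature_headroom(IDLE))   # {'temp1': 19.0}
print(temperature_headroom(LOAD))   # {'temp1': 11.1}
```

Logging the headroom periodically, rather than the raw temperature,
makes it easier to see how close a compile pushes the CPU to its limit.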


I'll give the inside a clean this weekend and see if there's any
improvement.

Thanks for the suggestions.

Regards,
Carlos




Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Carlos Hendson
On Wed, 2013-03-20 at 08:17 +, Mick wrote:
> Stating the obvious, it seems that the kernel is struggling and indeed
> you may have come across some nasty kernel bug.  However, it could well
> be that it is not related to the kernel you're running, or your kernel
> config.  It could be a problem with the power supply being faulty and
> causing these lock ups.
> 
> Unless someone else comes up with a better idea to troubleshoot it
> further, I would consider replacing the power supply with another of a
> known good condition.

Thanks for the good advice Mick.  I don't have spare hardware on-tap so
switching psu, memory or processor may prove to be tricky.  It's one of
those catch 22's where I don't want to spend on components that aren't
faulty, however I need to spend on components to test if they're faulty.

I've been given a few other tests to perform before I start moving to
hardware replacement.

Regards,
Carlos




Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Carlos Hendson
On Wed, 2013-03-20 at 18:43 +0100, Daniel Wagener wrote:
> "Frozen" means there is no Hard Drive Activity going on right?
> And there is no other indication, that you are just running out of
> memory? 

I can't categorically state if there was drive activity.  I was so
fixated on regaining control of the machine that I failed to pay
attention to the state of the HDD LED.  I'll make a point of checking it
the next time the machine appears to freeze.

I saw no other indications of memory exhaustion after the system came
back from the "soft-power reset" button being pressed.

Regards,
Carlos




Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Paul Hartman
On Tue, Mar 19, 2013 at 11:42 PM, Carlos Hendson  wrote:
> For the last few weeks or so, I've been getting intermittent hard lock-ups
> during the emerge of various packages.  It appears the more compile
> intensive the package, the more likely the lock-up.  These lock-ups have
> occurred under kernels 3.4.9 and 3.7.10 with gcc 4.5.4 and 4.6.3.


I had a virtual server that kept crashing/rebooting during compiles of
large packages such as php. It ended up being because it was running
out of memory. Added another 1GB of swap space and it has been happy
ever since.



Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Volker Armin Hemmann
On 20.03.2013 05:42, Carlos Hendson wrote:
> Hello,
>
> For the last few weeks or so, I've been getting intermittent hard lock-ups
> during the emerge of various packages.  It appears the more compile
> intensive the package, the more likely the lock-up.  These lock-ups have
> occurred under kernels 3.4.9 and 3.7.10 with gcc 4.5.4 and 4.6.3.
>
> Once the machine is in a frozen state, the only thing that responds is
> the soft power reset button.  Sometimes the machine locks up again
> after the button is pressed (this is because the compile resumes once
> the system comes out of its frozen state).
>
> If the system subsequently locks up because I wasn't able to cancel the
> compile fast enough, the only option left is a hard power reset (10sec
> + hold power button).  If I cancel the compile, the system is perfectly
> responsive and functions normally.
>
> There are kernel stack traces in /var/log/messages which I'm unable to
> decipher and diagnose as to what caused the lock-up.
>
> If I had to guess, I'd blame an incorrect setting in the .config, but
> since I'm stuck in the diagnostic of what part of the kernel might be
> experiencing the problem, I need a bit of help to pinpoint the issue.
>
> I believe it to be a kernel configuration issue because when I booted
> the machine using a system rescue Live CD, I was able to chroot into the
> system and emerge packages like gcc without the lock-up problem
> occurring.  
>
> That's by no means conclusive, however, I've also run a complete pass of
> memcheck for over an hour without any issues reported.
>
> I'd like to completely rule out hardware failure.  What diagnostic tools
> are recommended to try to identify potential hardware issues of this
> type?
>
> The various kernel stack traces are attached in case someone wants to
> take a look.  I can provide more information should it be needed.
>
> Any help or advice would be appreciated.
>
> Regards,
> Carlos 
you might just hit a thrashing situation. Linux is very bad when it
comes to abusing swap in case of an emergency.

But it also sounds like overheating or a power problem. Power problems
might be caused by the PSU - but it could also be the power circuitry of
your mobo.



Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Daniel Wagener
On Wed, 20 Mar 2013 05:42:28 +0100
Carlos Hendson  wrote:

> Hello,
> 
> For the last few weeks or so, I've been getting intermittent hard lock-ups
> during the emerge of various packages.  It appears the more compile
> intensive the package, the more likely the lock-up.  These lock-ups have
> occurred under kernels 3.4.9 and 3.7.10 with gcc 4.5.4 and 4.6.3.
> 
> Once the machine is in a frozen state, the only thing that responds is
> the soft power reset button.  Sometimes the machine locks up again
> after the button is pressed (this is because the compile resumes once
> the system comes out of its frozen state).
> 
> If the system subsequently locks up because I wasn't able to cancel the
> compile fast enough, the only option left is a hard power reset (10sec
> + hold power button).  If I cancel the compile, the system is perfectly
> responsive and functions normally.
> 
> There are kernel stack traces in /var/log/messages which I'm unable to
> decipher and diagnose as to what caused the lock-up.
> 
> If I had to guess, I'd blame an incorrect setting in the .config, but
> since I'm stuck in the diagnostic of what part of the kernel might be
> experiencing the problem, I need a bit of help to pinpoint the issue.
> 
> I believe it to be a kernel configuration issue because when I booted
> the machine using a system rescue Live CD, I was able to chroot into the
> system and emerge packages like gcc without the lock-up problem
> occurring.  
> 
> That's by no means conclusive, however, I've also run a complete pass of
> memcheck for over an hour without any issues reported.
> 
> I'd like to completely rule out hardware failure.  What diagnostic tools
> are recommended to try to identify potential hardware issues of this
> type?
> 
> The various kernel stack traces are attached in case someone wants to
> take a look.  I can provide more information should it be needed.
> 
> Any help or advice would be appreciated.
> 
> Regards,
> Carlos 

"Frozen" means there is no Hard Drive Activity going on right?
And there is no other indication, that you are just running out of memory?





Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Neil Bothwick
On Wed, 20 Mar 2013 08:17:11 +, Mick wrote:

> Stating the obvious, it seems that the kernel is struggling and indeed
> you may have come across some nasty kernel bug.  However, it could well
> be that it is not related to the kernel you're running, or your kernel
> config.  It could be a problem with the power supply being faulty and
> causing these lock ups.

That's certainly possible, it could also be failing memory, and it's
cheaper to run memtest86+ before buying a new power supply ;-)


-- 
Neil Bothwick

Only an idiot actually READS taglines.



Re: [gentoo-user] System freezes during compiles

2013-03-20 Thread Mick
On Wednesday 20 Mar 2013 04:42:28 Carlos Hendson wrote:
> Hello,
> 
> For the last few weeks or so, I've been getting intermittent hard lock-ups
> during the emerge of various packages.  It appears the more compile
> intensive the package, the more likely the lock-up.  These lock-ups have
> occurred under kernels 3.4.9 and 3.7.10 with gcc 4.5.4 and 4.6.3.
> 
> Once the machine is in a frozen state, the only thing that responds is
> the soft power reset button.  Sometimes the machine locks up again
> after the button is pressed (this is because the compile resumes once
> the system comes out of its frozen state).
> 
> If the system subsequently locks up because I wasn't able to cancel the
> compile fast enough, the only option left is a hard power reset (10sec
> + hold power button).  If I cancel the compile, the system is perfectly
> responsive and functions normally.
> 
> There are kernel stack traces in /var/log/messages which I'm unable to
> decipher and diagnose as to what caused the lock-up.
> 
> If I had to guess, I'd blame an incorrect setting in the .config, but
> since I'm stuck in the diagnostic of what part of the kernel might be
> experiencing the problem, I need a bit of help to pinpoint the issue.
> 
> I believe it to be a kernel configuration issue because when I booted
> the machine using a system rescue Live CD, I was able to chroot into the
> system and emerge packages like gcc without the lock-up problem
> occurring.
> 
> That's by no means conclusive, however, I've also run a complete pass of
> memcheck for over an hour without any issues reported.
> 
> I'd like to completely rule out hardware failure.  What diagnostic tools
> are recommended to try to identify potential hardware issues of this
> type?
> 
> The various kernel stack traces are attached in case someone wants to
> take a look.  I can provide more information should it be needed.
> 
> Any help or advice would be appreciated.
> 
> Regards,
> Carlos

Stating the obvious, it seems that the kernel is struggling and indeed you may 
have come across some nasty kernel bug.  However, it could well be that it is 
not related to the kernel you're running, or your kernel config.  It could be 
a problem with the power supply being faulty and causing these lock ups.

Unless someone else comes up with a better idea to troubleshoot it further, I 
would consider replacing the power supply with another of a known good 
condition.
-- 
Regards,
Mick



[gentoo-user] System freezes during compiles

2013-03-19 Thread Carlos Hendson
Hello,

For the last few weeks or so, I've been getting intermittent hard lock-ups
during the emerge of various packages.  It appears the more compile
intensive the package, the more likely the lock-up.  These lock-ups have
occurred under kernels 3.4.9 and 3.7.10 with gcc 4.5.4 and 4.6.3.

Once the machine is in a frozen state, the only thing that responds is
the soft power reset button.  Sometimes the machine locks up again
after the button is pressed (this is because the compile resumes once
the system comes out of its frozen state).

If the system subsequently locks up because I wasn't able to cancel the
compile fast enough, the only option left is a hard power reset (10sec
+ hold power button).  If I cancel the compile, the system is perfectly
responsive and functions normally.

There are kernel stack traces in /var/log/messages which I'm unable to
decipher and diagnose as to what caused the lock-up.
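
As a first step in reading such traces, the sketch below (Python,
illustrative only) counts the "Watchdog detected hard LOCKUP" lines per
CPU. The two sample lines are copied from the attached log; the
function name count_lockups is made up, and on the real machine the
text would come from /var/log/messages rather than an embedded string.

```python
import re
from collections import Counter

# \s+ also tolerates a line break between "on" and "cpu", which is how
# the message appears once the log has been wrapped by a mail client.
LOCKUP = re.compile(r"Watchdog detected hard LOCKUP on\s+cpu (\d+)")


def count_lockups(log_text):
    """Return a Counter mapping CPU number to hard-lockup reports."""
    return Counter(int(cpu) for cpu in LOCKUP.findall(log_text))


# Lines taken from the attached kernel log.
SAMPLE_LOG = """\
Mar 12 23:42:03 hydra kernel: [58068.673303] Watchdog detected hard LOCKUP on cpu 2
Mar 12 23:58:02 hydra kernel: [59025.152939] Watchdog detected hard LOCKUP on cpu 4
"""

for cpu, hits in sorted(count_lockups(SAMPLE_LOG).items()):
    print(f"cpu {cpu}: {hits} hard lockup(s)")
```

If the reports are spread across different CPUs, as in these two
samples, that hints at a system-wide cause rather than one faulty core,
although two data points are far too few to conclude anything.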

If I had to guess, I'd blame an incorrect setting in the .config, but
since I'm stuck in the diagnostic of what part of the kernel might be
experiencing the problem, I need a bit of help to pinpoint the issue.

I believe it to be a kernel configuration issue because when I booted
the machine using a system rescue Live CD, I was able to chroot into the
system and emerge packages like gcc without the lock-up problem
occurring.  

That's by no means conclusive, however, I've also run a complete pass of
memcheck for over an hour without any issues reported.

I'd like to completely rule out hardware failure.  What diagnostic tools
are recommended to try to identify potential hardware issues of this
type?

The various kernel stack traces are attached in case someone wants to
take a look.  I can provide more information should it be needed.

Any help or advice would be appreciated.

Regards,
Carlos 
Mar 12 23:42:03 hydra kernel: [58066.564110] ------------[ cut here ]------------
Mar 12 23:42:03 hydra kernel: [58068.663176] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x93/0x9e()
Mar 12 23:42:03 hydra kernel: [58068.673235] Hardware name: GA-990FXA-D3
Mar 12 23:42:03 hydra kernel: [58068.673303] Watchdog detected hard LOCKUP on cpu 2
Mar 12 23:42:03 hydra kernel: [58068.751056] Modules linked in: usb_storage uas ipv6 it87 hwmon_vid fglrx(PO) uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core joydev radeon i2c_algo_bit ttm drm_kms_helper drm r8169 xhci_hcd ata_generic pata_acpi i2c_piix4 mii i2c_core pata_atiixp wmi serio_raw k10temp powernow_k8 pcspkr mperf freq_table
Mar 12 23:42:03 hydra kernel: [58068.945979] Pid: 720, comm: cc1 Tainted: P   O 3.4.9-gentoo #2
Mar 12 23:42:03 hydra kernel: [58068.946053] Call Trace:
Mar 12 23:42:03 hydra kernel: [58069.054704]  [] ? warn_slowpath_common+0x78/0x8c
Mar 12 23:42:03 hydra kernel: [58069.231277]  [] ? warn_slowpath_fmt+0x45/0x4a
Mar 12 23:42:03 hydra kernel: [58069.271020]  [] ? watchdog_overflow_callback+0x93/0x9e
Mar 12 23:42:03 hydra kernel: [58069.271135]  [] ? touch_nmi_watchdog+0x62/0x62
Mar 12 23:42:03 hydra kernel: [58069.293566]  [] ? __perf_event_overflow+0x12c/0x1ae
Mar 12 23:42:03 hydra kernel: [58069.293689]  [] ? perf_event_update_userpage+0x13/0xbf
Mar 12 23:42:03 hydra kernel: [58069.293811]  [] ? x86_pmu_handle_irq+0xbe/0xf3
Mar 12 23:42:03 hydra kernel: [58069.293939]  [] ? nmi_handle.isra.4+0x3e/0x61
Mar 12 23:42:03 hydra kernel: [58069.294038]  [] ? do_nmi+0x9f/0x287
Mar 12 23:42:03 hydra kernel: [58069.294139]  [] ? end_repeat_nmi+0x1a/0x1e
Mar 12 23:42:03 hydra kernel: [58069.294253]  [] ? _raw_spin_lock_irq+0x6/0x6
Mar 12 23:42:03 hydra kernel: [58069.294357]  [] ? _raw_spin_lock_irq+0x6/0x6
Mar 12 23:42:03 hydra kernel: [58069.314699]  [] ? _raw_spin_lock_irq+0x6/0x6
Mar 12 23:42:03 hydra kernel: [58069.318869]  <> [] ? ntp_tick_length+0x23/0x28
Mar 12 23:42:03 hydra kernel: [58069.319051]  [] ? do_timer+0x89/0x465
Mar 12 23:42:03 hydra kernel: [58069.319185]  [] ? tick_do_update_jiffies64+0x74/0x98
Mar 12 23:42:03 hydra kernel: [58069.319300]  [] ? tick_sched_timer+0x3f/0x8d
Mar 12 23:42:03 hydra kernel: [58069.319424]  [] ? __run_hrtimer.isra.27+0x4b/0xa3
Mar 12 23:42:03 hydra kernel: [58069.319547]  [] ? hrtimer_interrupt+0xd9/0x1c9
Mar 12 23:42:03 hydra kernel: [58069.319655]  [] ? smp_apic_timer_interrupt+0x6e/0x80
Mar 12 23:42:03 hydra kernel: [58069.319750]  [] ? apic_timer_interrupt+0x67/0x70
Mar 12 23:42:03 hydra kernel: [58069.319810]
Mar 12 23:42:03 hydra kernel: [58069.324331] ---[ end trace b1a58589d91a0dec ]---


Mar 12 23:58:02 hydra kernel: [59023.803433] ------------[ cut here ]------------
Mar 12 23:58:02 hydra kernel: [59024.963950] ------------[ cut here ]------------
Mar 12 23:58:02 hydra kernel: [59025.152834] WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x93/0x9e()
Mar 12 23:58:02 hydra kernel: [59025.152895] Hardware name: GA-990FXA-D3
Mar 12 23:58:02 hydra kernel: [59025.152939] Watchdog detected hard LOCKUP on cpu 4
Mar 12 2