Re: BUG: rcu_sched detected stalls on CPUs

2017-05-22 Thread Paul E. McKenney
On Mon, May 22, 2017 at 02:56:50PM +0300, mi...@stz-bg.com wrote:
> On 2017-05-10 17:32, Paul E. McKenney wrote:
> >On Wed, May 10, 2017 at 02:00:32PM +0300, mi...@stz-bg.com wrote:
> >>On 2017-05-09 17:49, Paul E. McKenney wrote:
> >>>On Tue, May 09, 2017 at 10:53:25AM +0300, mi...@stz-bg.com wrote:
> On 2017-05-08 16:24, Paul E. McKenney wrote:
> >On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
> >>On 2017-01-20 17:19, Steven Rostedt wrote:
> >>>On Fri, 20 Jan 2017 10:43:50 +0200
> >>>mi...@stz-bg.com wrote:
> >>>
> [1.] One line summary of the problem:
> 
> rcu_sched detected stalls on CPUs, and the server does not respond for a few minutes.
> >>>
> >>>Is this reproducible? Or was this a one time ordeal?
> >>>
> 
> [2.] Full description of the problem/report:
> 
> The load on my server (postgres database) isn't big, less than 0.50,
> and when the error "rcu_sched detected stalls on CPUs" occurs the
> server freezes and nothing works for 3-5 minutes.
> No network, no video signal, no keyboard, no mouse. Nothing works.
> After these few minutes everything continues normally.
> This usually happens once per day. When I searched Google I found a
> lot of people complaining about this error, but no solution.
> Does anyone know how to help me resolve it? I spoke with a few
> friends and they are trying to convince me the problem is in the CPU.
> I do not believe that a CPU suddenly stops working correctly after
> 3 years of working, but I might be wrong.
> 
> [3.] Keywords (i.e., modules, networking, kernel):
> 
> kernel
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> 
> Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
> SMP Sun
> Dec 11 16:11:02 CST 2016
> 
> >>>
> >>>Have you tried a newer version of the kernel?
> >>>
> >>>-- Steve
> >>
> >>
> >>Hello,
> >>yesterday I change to new kernel: 4.9.26 and still no effect.
> >>
> >>I trying to figure out what I need to buy because I read on google a
> >>lots of posts about that problem,
> >>some ppl suggest is BIOS firmware bug, some ppl tell that when they
> >>swap CPU problem is resolved.
> >>May be problem was started when I first time boot 4.x kernel and
> >>there have cpu microcode updates.
> >>Im 3.x kernels this feature was not in kernel, but I don't know,
> >>only guess.
> >>
> >>Can some one point me clearly: You need to change this one and
> >>problem will be solved ?
> >
> >Looking back at your earlier email, I see the following:
> >
> >[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004
> >jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1
> >
> >That indicates that the task "rcu_sched" isn't being allowed to run.
> >The "->state=0x1" indicates that this task is currently asleep and the
> >"s3" indicates that it is looking to start a grace period, but has not
> >progressed through this task.  My guess is that your system gets very
> >busy, and that rcu_sched's normal scheduling priority is not sufficient
> >for it to be allowed to run.
> >
> >I therefore suggest that you try raising the rcu_sched task's priority.
> >
> >You can find this task like this:
> >
> > ps -ef | grep rcu_sched
> >
> >On my laptop, I get this:
> >
> > $ ps -ef | grep rcu_sched
> > root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
> > paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche
> >
> >You can use the chrt command to set the priority.  For example,
> >given that
> >on my laptop, rcu_sched is PID 7:
> >
> > sudo chrt -f -p 1 7
> >
> >You can double-check as follows:
> >
> > $ chrt -p 7
> > pid 7's current scheduling policy: SCHED_FIFO
> > pid 7's current scheduling priority: 1
> >
> >Does that help?
> >
> > Thanx, Paul
> >
> 
> 
> Hi,
> interesting ... I change priority as you suggest and will keep you
> in touch
> if the problem happens again because I cant reproduce it. Replay on
> your next email, I don't have any
> virtual servers / guest os on the server. It's server with only one
> process - redis server
> and in this time when error it's happened actually one of other
> servers start loading data
> into redis sever. Data is small 3-4 mil. keys every key contain
> around 150 symbols.
> I have that problem on two servers, next server is database server
> with postgresql
> and that problem happened more often, usual once per day or for few
> days.
> I never play before with changing prior

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-22 Thread mitko

On 2017-05-10 17:32, Paul E. McKenney wrote:

On Wed, May 10, 2017 at 02:00:32PM +0300, mi...@stz-bg.com wrote:

On 2017-05-09 17:49, Paul E. McKenney wrote:
>On Tue, May 09, 2017 at 10:53:25AM +0300, mi...@stz-bg.com wrote:
>>On 2017-05-08 16:24, Paul E. McKenney wrote:
>>>On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
On 2017-01-20 17:19, Steven Rostedt wrote:
>On Fri, 20 Jan 2017 10:43:50 +0200
>mi...@stz-bg.com wrote:
>
>>[1.] One line summary of the problem:
>>
>>rcu_sched detected stalls on CPUs and few minutes server not respond.
>
>Is this reproducible? Or was this a one time ordeal?
>
>>
>>[2.] Full description of the problem/report:
>>
>>Load of my server (postgres database) isnt big less then 0.50 and when
>>error occured rcu_sched detected stalls on CPUs
>>server freeze and nothing is work for 3-5 minute.
>>No network, no video signal, no keyboard, no mouse. Nothing is worked.
>>After these few minutes everything continue normal.
>>This usual is happend once per day. When I check in google find a lots
>>of ppl complain of this error, but no solution.
>>Do any one know can help me to resolve it ? I spoke with few
>>friends and
>>they trying to convince me the problem is in CPU.
>>I did not believe after a 3 years working CPU suddenly stop working
>>correctly, but I might be wrong.
>>
>>[3.] Keywords (i.e., modules, networking, kernel):
>>
>>kernel
>>
>>[4.] Kernel information
>>[4.1.] Kernel version (from /proc/version):
>>
>>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
>>SMP Sun
>>Dec 11 16:11:02 CST 2016
>>
>
>Have you tried a newer version of the kernel?
>
>-- Steve


Hello,
yesterday I change to new kernel: 4.9.26 and still no effect.

I trying to figure out what I need to buy because I read on google a
lots of posts about that problem,
some ppl suggest is BIOS firmware bug, some ppl tell that when they
swap CPU problem is resolved.
May be problem was started when I first time boot 4.x kernel and
there have cpu microcode updates.
Im 3.x kernels this feature was not in kernel, but I don't know,
only guess.

Can some one point me clearly: You need to change this one and
problem will be solved ?
>>>
>>>Looking back at your earlier email, I see the following:
>>>
>>>[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004
>>>jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1
>>>
>>>That indicates that the task "rcu_sched" isn't being allowed to run.
>>>The "->state=0x1" indicates that this task is currently asleep and the
>>>"s3" indicates that it is looking to start a grace period, but has not
>>>progressed through this task.  My guess is that your system gets very
>>>busy, and that rcu_sched's normal scheduling priority is not sufficient
>>>for it to be allowed to run.
>>>
>>>I therefore suggest that you try raising the rcu_sched task's priority.
>>>
>>>You can find this task like this:
>>>
>>>ps -ef | grep rcu_sched
>>>
>>>On my laptop, I get this:
>>>
>>>$ ps -ef | grep rcu_sched
>>>root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
>>>paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche
>>>
>>>You can use the chrt command to set the priority.  For example,
>>>given that
>>>on my laptop, rcu_sched is PID 7:
>>>
>>>sudo chrt -f -p 1 7
>>>
>>>You can double-check as follows:
>>>
>>>$ chrt -p 7
>>>pid 7's current scheduling policy: SCHED_FIFO
>>>pid 7's current scheduling priority: 1
>>>
>>>Does that help?
>>>
>>>Thanx, Paul
>>>
>>
>>
>>Hi,
>>interesting ... I change priority as you suggest and will keep you
>>in touch
>>if the problem happens again because I cant reproduce it. Replay on
>>your next email, I don't have any
>>virtual servers / guest os on the server. It's server with only one
>>process - redis server
>>and in this time when error it's happened actually one of other
>>servers start loading data
>>into redis sever. Data is small 3-4 mil. keys every key contain
>>around 150 symbols.
>>I have that problem on two servers, next server is database server
>>with postgresql
>>and that problem happened more often, usual once per day or for few
>>days.
>>I never play before with changing priority of processes. When I check
>>iLO log I saw logged error: power lost but server does not lost the
>>power actually,
>>it's just stop respond for 2-3 minutes and then continue normal
>>operation with
>>uptime more then 100 days.
>
>Does redis keep any useful log?  If so, it would be worth looking into
>what it thought that it was doing.


No,

only standard logs:

1333:M 06 May 04:36:43.019 * 1 changes in 60 seconds. Saving...
1333:M 06 May 04:36:43.025 * Background saving started by pid 4325
4325:C 06 May 04:36:4

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-10 Thread Paul E. McKenney
On Wed, May 10, 2017 at 02:00:32PM +0300, mi...@stz-bg.com wrote:
> On 2017-05-09 17:49, Paul E. McKenney wrote:
> >On Tue, May 09, 2017 at 10:53:25AM +0300, mi...@stz-bg.com wrote:
> >>On 2017-05-08 16:24, Paul E. McKenney wrote:
> >>>On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
> On 2017-01-20 17:19, Steven Rostedt wrote:
> >On Fri, 20 Jan 2017 10:43:50 +0200
> >mi...@stz-bg.com wrote:
> >
> >>[1.] One line summary of the problem:
> >>
> >>rcu_sched detected stalls on CPUs and few minutes server not respond.
> >
> >Is this reproducible? Or was this a one time ordeal?
> >
> >>
> >>[2.] Full description of the problem/report:
> >>
> >>Load of my server (postgres database) isnt big less then 0.50 and when
> >>error occured rcu_sched detected stalls on CPUs
> >>server freeze and nothing is work for 3-5 minute.
> >>No network, no video signal, no keyboard, no mouse. Nothing is worked.
> >>After these few minutes everything continue normal.
> >>This usual is happend once per day. When I check in google find a lots
> >>of ppl complain of this error, but no solution.
> >>Do any one know can help me to resolve it ? I spoke with few
> >>friends and
> >>they trying to convince me the problem is in CPU.
> >>I did not believe after a 3 years working CPU suddenly stop working
> >>correctly, but I might be wrong.
> >>
> >>[3.] Keywords (i.e., modules, networking, kernel):
> >>
> >>kernel
> >>
> >>[4.] Kernel information
> >>[4.1.] Kernel version (from /proc/version):
> >>
> >>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
> >>SMP Sun
> >>Dec 11 16:11:02 CST 2016
> >>
> >
> >Have you tried a newer version of the kernel?
> >
> >-- Steve
> 
> 
> Hello,
> yesterday I change to new kernel: 4.9.26 and still no effect.
> 
> I trying to figure out what I need to buy because I read on google a
> lots of posts about that problem,
> some ppl suggest is BIOS firmware bug, some ppl tell that when they
> swap CPU problem is resolved.
> May be problem was started when I first time boot 4.x kernel and
> there have cpu microcode updates.
> Im 3.x kernels this feature was not in kernel, but I don't know,
> only guess.
> 
> Can some one point me clearly: You need to change this one and
> problem will be solved ?
> >>>
> >>>Looking back at your earlier email, I see the following:
> >>>
> >>>[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004
> >>>jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1
> >>>
> >>>That indicates that the task "rcu_sched" isn't being allowed to run.
> >>>The "->state=0x1" indicates that this task is currently asleep and the
> >>>"s3" indicates that it is looking to start a grace period, but has not
> >>>progressed through this task.  My guess is that your system gets very
> >>>busy, and that rcu_sched's normal scheduling priority is not sufficient
> >>>for it to be allowed to run.
> >>>
> >>>I therefore suggest that you try raising the rcu_sched task's priority.
> >>>
> >>>You can find this task like this:
> >>>
> >>>   ps -ef | grep rcu_sched
> >>>
> >>>On my laptop, I get this:
> >>>
> >>>   $ ps -ef | grep rcu_sched
> >>>   root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
> >>>   paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche
> >>>
> >>>You can use the chrt command to set the priority.  For example,
> >>>given that
> >>>on my laptop, rcu_sched is PID 7:
> >>>
> >>>   sudo chrt -f -p 1 7
> >>>
> >>>You can double-check as follows:
> >>>
> >>>   $ chrt -p 7
> >>>   pid 7's current scheduling policy: SCHED_FIFO
> >>>   pid 7's current scheduling priority: 1
> >>>
> >>>Does that help?
> >>>
> >>>   Thanx, Paul
> >>>
> >>
> >>
> >>Hi,
> >>interesting ... I change priority as you suggest and will keep you
> >>in touch
> >>if the problem happens again because I cant reproduce it. Replay on
> >>your next email, I don't have any
> >>virtual servers / guest os on the server. It's server with only one
> >>process - redis server
> >>and in this time when error it's happened actually one of other
> >>servers start loading data
> >>into redis sever. Data is small 3-4 mil. keys every key contain
> >>around 150 symbols.
> >>I have that problem on two servers, next server is database server
> >>with postgresql
> >>and that problem happened more often, usual once per day or for few
> >>days.
> >>I never play before with changing priority of processes. When I check
> >>iLO log I saw logged error: power lost but server does not lost the
> >>power actually,
> >>it's just stop respond for 2-3 minutes and then continue normal
> >>operation with
> >>uptime more then 100 days.
> >
> >Does redis keep any useful log?  If so, it would be worth looking into
> >what it thought th

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-10 Thread mitko

On 2017-05-09 17:49, Paul E. McKenney wrote:

On Tue, May 09, 2017 at 10:53:25AM +0300, mi...@stz-bg.com wrote:

On 2017-05-08 16:24, Paul E. McKenney wrote:
>On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
>>On 2017-01-20 17:19, Steven Rostedt wrote:
>>>On Fri, 20 Jan 2017 10:43:50 +0200
>>>mi...@stz-bg.com wrote:
>>>
[1.] One line summary of the problem:

rcu_sched detected stalls on CPUs and few minutes server not respond.
>>>
>>>Is this reproducible? Or was this a one time ordeal?
>>>

[2.] Full description of the problem/report:

Load of my server (postgres database) isnt big less then 0.50 and when
error occured rcu_sched detected stalls on CPUs
server freeze and nothing is work for 3-5 minute.
No network, no video signal, no keyboard, no mouse. Nothing is worked.
After these few minutes everything continue normal.
This usual is happend once per day. When I check in google find a lots
of ppl complain of this error, but no solution.
Do any one know can help me to resolve it ? I spoke with few
friends and
they trying to convince me the problem is in CPU.
I did not believe after a 3 years working CPU suddenly stop working
correctly, but I might be wrong.

[3.] Keywords (i.e., modules, networking, kernel):

kernel

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):

Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
SMP Sun
Dec 11 16:11:02 CST 2016

>>>
>>>Have you tried a newer version of the kernel?
>>>
>>>-- Steve
>>
>>
>>Hello,
>>yesterday I change to new kernel: 4.9.26 and still no effect.
>>
>>I trying to figure out what I need to buy because I read on google a
>>lots of posts about that problem,
>>some ppl suggest is BIOS firmware bug, some ppl tell that when they
>>swap CPU problem is resolved.
>>May be problem was started when I first time boot 4.x kernel and
>>there have cpu microcode updates.
>>Im 3.x kernels this feature was not in kernel, but I don't know,
>>only guess.
>>
>>Can some one point me clearly: You need to change this one and
>>problem will be solved ?
>
>Looking back at your earlier email, I see the following:
>
>[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004
>jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1
>
>That indicates that the task "rcu_sched" isn't being allowed to run.
>The "->state=0x1" indicates that this task is currently asleep and the
>"s3" indicates that it is looking to start a grace period, but has not
>progressed through this task.  My guess is that your system gets very
>busy, and that rcu_sched's normal scheduling priority is not sufficient
>for it to be allowed to run.
>
>I therefore suggest that you try raising the rcu_sched task's priority.
>
>You can find this task like this:
>
>ps -ef | grep rcu_sched
>
>On my laptop, I get this:
>
>$ ps -ef | grep rcu_sched
>root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
>paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche
>
>You can use the chrt command to set the priority.  For example,
>given that
>on my laptop, rcu_sched is PID 7:
>
>sudo chrt -f -p 1 7
>
>You can double-check as follows:
>
>$ chrt -p 7
>pid 7's current scheduling policy: SCHED_FIFO
>pid 7's current scheduling priority: 1
>
>Does that help?
>
>Thanx, Paul
>


Hi,
interesting ... I change priority as you suggest and will keep you
in touch
if the problem happens again because I cant reproduce it. Replay on
your next email, I don't have any
virtual servers / guest os on the server. It's server with only one
process - redis server
and in this time when error it's happened actually one of other
servers start loading data
into redis sever. Data is small 3-4 mil. keys every key contain
around 150 symbols.
I have that problem on two servers, next server is database server
with postgresql
and that problem happened more often, usual once per day or for few
days.
I never play before with changing priority of processes. When I check
iLO log I saw logged error: power lost but server does not lost the
power actually,
it's just stop respond for 2-3 minutes and then continue normal
operation with
uptime more then 100 days.


Does redis keep any useful log?  If so, it would be worth looking into
what it thought that it was doing.



No,

only standard logs:

1333:M 06 May 04:36:43.019 * 1 changes in 60 seconds. Saving...
1333:M 06 May 04:36:43.025 * Background saving started by pid 4325
4325:C 06 May 04:36:46.745 * DB saved on disk
4325:C 06 May 04:36:46.749 * RDB: 40 MB of memory used by copy-on-write
1333:M 06 May 04:36:46.825 * Background saving terminated with success
1333:M 06 May 04:37:47.018 * 1 changes in 60 seconds. Saving...
1333:M 06 May 04:37:47.026 * Background saving started by pid 4326
4326:C 06 May 04:37:50.902 * DB saved on disk
4326:C 06 Ma

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-09 Thread Paul E. McKenney
On Tue, May 09, 2017 at 10:53:25AM +0300, mi...@stz-bg.com wrote:
> On 2017-05-08 16:24, Paul E. McKenney wrote:
> >On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
> >>On 2017-01-20 17:19, Steven Rostedt wrote:
> >>>On Fri, 20 Jan 2017 10:43:50 +0200
> >>>mi...@stz-bg.com wrote:
> >>>
> [1.] One line summary of the problem:
> 
> rcu_sched detected stalls on CPUs and few minutes server not respond.
> >>>
> >>>Is this reproducible? Or was this a one time ordeal?
> >>>
> 
> [2.] Full description of the problem/report:
> 
> Load of my server (postgres database) isnt big less then 0.50 and when
> error occured rcu_sched detected stalls on CPUs
> server freeze and nothing is work for 3-5 minute.
> No network, no video signal, no keyboard, no mouse. Nothing is worked.
> After these few minutes everything continue normal.
> This usual is happend once per day. When I check in google find a lots
> of ppl complain of this error, but no solution.
> Do any one know can help me to resolve it ? I spoke with few
> friends and
> they trying to convince me the problem is in CPU.
> I did not believe after a 3 years working CPU suddenly stop working
> correctly, but I might be wrong.
> 
> [3.] Keywords (i.e., modules, networking, kernel):
> 
> kernel
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> 
> Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
> SMP Sun
> Dec 11 16:11:02 CST 2016
> 
> >>>
> >>>Have you tried a newer version of the kernel?
> >>>
> >>>-- Steve
> >>
> >>
> >>Hello,
> >>yesterday I change to new kernel: 4.9.26 and still no effect.
> >>
> >>I trying to figure out what I need to buy because I read on google a
> >>lots of posts about that problem,
> >>some ppl suggest is BIOS firmware bug, some ppl tell that when they
> >>swap CPU problem is resolved.
> >>May be problem was started when I first time boot 4.x kernel and
> >>there have cpu microcode updates.
> >>Im 3.x kernels this feature was not in kernel, but I don't know,
> >>only guess.
> >>
> >>Can some one point me clearly: You need to change this one and
> >>problem will be solved ?
> >
> >Looking back at your earlier email, I see the following:
> >
> >[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004
> >jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1
> >
> >That indicates that the task "rcu_sched" isn't being allowed to run.
> >The "->state=0x1" indicates that this task is currently asleep and the
> >"s3" indicates that it is looking to start a grace period, but has not
> >progressed through this task.  My guess is that your system gets very
> >busy, and that rcu_sched's normal scheduling priority is not sufficient
> >for it to be allowed to run.
> >
> >I therefore suggest that you try raising the rcu_sched task's priority.
> >
> >You can find this task like this:
> >
> > ps -ef | grep rcu_sched
> >
> >On my laptop, I get this:
> >
> > $ ps -ef | grep rcu_sched
> > root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
> > paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche
> >
> >You can use the chrt command to set the priority.  For example,
> >given that
> >on my laptop, rcu_sched is PID 7:
> >
> > sudo chrt -f -p 1 7
> >
> >You can double-check as follows:
> >
> > $ chrt -p 7
> > pid 7's current scheduling policy: SCHED_FIFO
> > pid 7's current scheduling priority: 1
> >
> >Does that help?
> >
> > Thanx, Paul
> >
> 
> 
> Hi,
> interesting ... I change priority as you suggest and will keep you
> in touch
> if the problem happens again because I cant reproduce it. Replay on
> your next email, I don't have any
> virtual servers / guest os on the server. It's server with only one
> process - redis server
> and in this time when error it's happened actually one of other
> servers start loading data
> into redis sever. Data is small 3-4 mil. keys every key contain
> around 150 symbols.
> I have that problem on two servers, next server is database server
> with postgresql
> and that problem happened more often, usual once per day or for few
> days.
> I never play before with changing priority of processes. When I check
> iLO log I saw logged error: power lost but server does not lost the
> power actually,
> it's just stop respond for 2-3 minutes and then continue normal
> operation with
> uptime more then 100 days.

Does redis keep any useful log?  If so, it would be worth looking into
what it thought that it was doing.

Thanx, Paul
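
One way to check what redis "thought it was doing" around a freeze is to look for unusually long gaps between consecutive log lines. The following is a minimal sketch, not a tested diagnostic: the function name redis_log_gaps is hypothetical, and it assumes the log format seen in the redis excerpt posted in this thread (PID:role, day of month, month, HH:MM:SS.mmm). Only the day of month and the time are used, so a gap spanning a month boundary is not handled.

```shell
#!/bin/sh
# Hypothetical helper: read redis log lines on stdin and report every gap
# between consecutive timestamped lines that exceeds the limit in seconds
# (first argument, default 120).
# Assumed line format: "1333:M 06 May 04:36:43.019 * message".
redis_log_gaps() {
    awk -v limit="${1:-120}" '
        $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
            split($4, t, ":")
            # Seconds since the start of the month (day-of-month based).
            now = $2 * 86400 + t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > limit)
                printf "gap of %d s before: %s\n", now - prev, $0
            prev = now
            seen = 1
        }'
}
```

It could be run as `redis_log_gaps 120 < /path/to/redis.log`, where the path is a placeholder for wherever this installation writes its log. With saves logged roughly once a minute, a gap of several minutes would line up with a stall.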

> Thanks for your suggestion.
> Regards,
> Mitko
> 
> 
> 
> 
> >>[239940.067938] clocksource: timekeeping watchdog on CPU26: Marking
> >>clocksource 'tsc' as unstable because the skew is too large:
> >>[239940.067943] clocksource:   'hpet' wd_now:

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-09 Thread mitko

On 2017-05-08 16:24, Paul E. McKenney wrote:

On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:

On 2017-01-20 17:19, Steven Rostedt wrote:
>On Fri, 20 Jan 2017 10:43:50 +0200
>mi...@stz-bg.com wrote:
>
>>[1.] One line summary of the problem:
>>
>>rcu_sched detected stalls on CPUs and few minutes server not respond.
>
>Is this reproducible? Or was this a one time ordeal?
>
>>
>>[2.] Full description of the problem/report:
>>
>>Load of my server (postgres database) isnt big less then 0.50 and when
>>error occured rcu_sched detected stalls on CPUs
>>server freeze and nothing is work for 3-5 minute.
>>No network, no video signal, no keyboard, no mouse. Nothing is worked.
>>After these few minutes everything continue normal.
>>This usual is happend once per day. When I check in google find a lots
>>of ppl complain of this error, but no solution.
>>Do any one know can help me to resolve it ? I spoke with few
>>friends and
>>they trying to convince me the problem is in CPU.
>>I did not believe after a 3 years working CPU suddenly stop working
>>correctly, but I might be wrong.
>>
>>[3.] Keywords (i.e., modules, networking, kernel):
>>
>>kernel
>>
>>[4.] Kernel information
>>[4.1.] Kernel version (from /proc/version):
>>
>>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
>>SMP Sun
>>Dec 11 16:11:02 CST 2016
>>
>
>Have you tried a newer version of the kernel?
>
>-- Steve


Hello,
yesterday I changed to a new kernel, 4.9.26, and it had no effect.

I am trying to figure out what I need to buy, because I have read a lot
of posts on Google about this problem. Some people suggest it is a BIOS
firmware bug, and some say the problem was resolved when they swapped
the CPU.
Maybe the problem started when I first booted a 4.x kernel, which has
CPU microcode updates. In 3.x kernels this feature was not in the
kernel, but I don't know, it is only a guess.

Can someone tell me clearly: you need to change this one thing and the
problem will be solved?


Looking back at your earlier email, I see the following:

[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004
jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1

That indicates that the task "rcu_sched" isn't being allowed to run.
The "->state=0x1" indicates that this task is currently asleep and the
"s3" indicates that it is looking to start a grace period, but has not
progressed through this task.  My guess is that your system gets very
busy, and that rcu_sched's normal scheduling priority is not sufficient
for it to be allowed to run.

I therefore suggest that you try raising the rcu_sched task's priority.

You can find this task like this:

ps -ef | grep rcu_sched

On my laptop, I get this:

$ ps -ef | grep rcu_sched
root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche


You can use the chrt command to set the priority.  For example, given that
on my laptop, rcu_sched is PID 7:

sudo chrt -f -p 1 7

You can double-check as follows:

$ chrt -p 7
pid 7's current scheduling policy: SCHED_FIFO
pid 7's current scheduling priority: 1

Does that help?

Thanx, Paul
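
The "kthread starved" line quoted above can be picked apart mechanically. The sketch below is only an illustration: the function name decode_starved is hypothetical, and the jiffies-to-seconds conversion depends on the kernel's HZ setting, which is not stated anywhere in this thread and is passed in as an assumption (default 1000). The g and c values are printed as-is; as far as I understand the 4.x source they are the most recently started and most recently completed grace-period numbers, which fits the explanation above that a grace period is waiting to start.

```shell
#!/bin/sh
# Hypothetical helper: extract the fields of an
# "rcu_sched kthread starved for N jiffies!" line.
#   $1 = the log line
#   $2 = HZ (timer ticks per second); an assumption, check CONFIG_HZ
decode_starved() {
    line=$1
    hz=${2:-1000}
    jiffies=$(printf '%s\n' "$line" |
        sed -n 's/.*starved for \([0-9][0-9]*\) jiffies.*/\1/p')
    g=$(printf '%s\n' "$line" | sed -n 's/.* g\([0-9][0-9]*\) .*/\1/p')
    c=$(printf '%s\n' "$line" | sed -n 's/.* c\([0-9][0-9]*\) .*/\1/p')
    state=$(printf '%s\n' "$line" |
        sed -n 's/.*->state=\(0x[0-9a-fA-F]*\).*/\1/p')
    # Not a starvation line: report failure and print nothing.
    [ -n "$jiffies" ] || return 1
    echo "starved_seconds=$((jiffies / hz)) g=$g c=$c state=$state"
}
```

With HZ=1000 the 60004 jiffies in the quoted line come to about 60 seconds; with HZ=250 the same count would be about 240 seconds, so the real figure depends on how the kernel was configured.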




Hi,
interesting ... I changed the priority as you suggested and will keep
you posted if the problem happens again, because I can't reproduce it.
In reply to your next email: I don't have any virtual servers / guest
OSes on the server. It is a server with only one process, a redis
server, and at the time the error happened one of the other servers had
started loading data into the redis server. The data is small: 3-4
million keys, each key containing around 150 symbols.
I have this problem on two servers. The other one is a database server
with postgresql, and there the problem happens more often, usually once
per day or every few days.

I have never played with changing process priorities before. When I
checked the iLO log I saw a logged error: power lost, but the server did
not actually lose power. It just stopped responding for 2-3 minutes and
then continued normal operation, with an uptime of more than 100 days.

Thanks for your suggestion.
Regards,
Mitko





[239940.067938] clocksource: timekeeping watchdog on CPU26: Marking
clocksource 'tsc' as unstable because the skew is too large:
[239940.067943] clocksource:   'hpet' wd_now:
ecb521ce wd_last: ca45912d mask: 
[239940.067946] clocksource:   'tsc' cs_now:
2d12df5f88d08 cs_last: 29fffbe2d36d6 mask: 
[239940.068357] clocksource: Switched to clocksource hpet
[24.066457] INFO: rcu_sched detected stalls on CPUs/tasks:
[24.066488] 2-...: (38 GPs behind) idle=f1a/0/0
softirq=2417028/2417028 fqs=0
[24.066491] 3-...: (27 GPs behind) idle=0bc/0/0
softirq=3076046/3076047 fqs=0
[24.066494] 4-...: (1006 GPs behind) idle=308/0/0
softirq=1474922/1474922 fqs=0
[24.066497] 5-...: (8034 GPs behind) idle=7b4/0/0
softirq=69

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-08 Thread Paul E. McKenney
On Mon, May 08, 2017 at 06:24:23AM -0700, Paul E. McKenney wrote:
> On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
> > On 2017-01-20 17:19, Steven Rostedt wrote:

[ . . . ]

> > Hello,
> > yesterday I change to new kernel: 4.9.26 and still no effect.
> > 
> > I trying to figure out what I need to buy because I read on google a
> > lots of posts about that problem,
> > some ppl suggest is BIOS firmware bug, some ppl tell that when they
> > swap CPU problem is resolved.
> > May be problem was started when I first time boot 4.x kernel and
> > there have cpu microcode updates.
> > Im 3.x kernels this feature was not in kernel, but I don't know,
> > only guess.
> > 
> > Can some one point me clearly: You need to change this one and
> > problem will be solved ?
> 
> Looking back at your earlier email, I see the following:
> 
> [Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004 jiffies! 
> g61967937 c61967936 f0x0 s3 ->state=0x1
> 
> That indicates that the task "rcu_sched" isn't being allowed to run.
> The "->state=0x1" indicates that this task is currently asleep and the
> "s3" indicates that it is looking to start a grace period, but has not
> progressed through this task.  My guess is that your system gets very
> busy, and that rcu_sched's normal scheduling priority is not sufficient
> for it to be allowed to run.
> 
> I therefore suggest that you try raising the rcu_sched task's priority.
> 
> You can find this task like this:
> 
>   ps -ef | grep rcu_sched
> 
> On my laptop, I get this:
> 
>   $ ps -ef | grep rcu_sched
>   root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
>   paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche
> 
> You can use the chrt command to set the priority.  For example, given that
> on my laptop, rcu_sched is PID 7:
> 
>   sudo chrt -f -p 1 7
> 
> You can double-check as follows:
> 
>   $ chrt -p 7
>   pid 7's current scheduling policy: SCHED_FIFO
>   pid 7's current scheduling priority: 1
> 
> Does that help?

And I should add that this assumes that the interference is from
within this same OS instance.  If you are instead experiencing massive
interference due to other guest OSes on this same system, well, we are
working on it.

Thanx, Paul
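
The priority change suggested in the quoted message can be wrapped in a small script so that the PID does not have to be read off by eye. This is a sketch under stated assumptions, not a recommended tool: the function names rcu_sched_pid and raise_rcu_sched and the --apply switch are hypothetical, and it assumes ps (procps) and chrt (util-linux) are installed. The chrt invocations are the same two commands shown in the quoted message.

```shell
#!/bin/sh
# Sketch only. Assumes ps (procps) and chrt (util-linux) are available.

# Hypothetical helper: read "PID COMMAND" pairs on stdin and print the PID
# of the first task whose command name is exactly rcu_sched.
rcu_sched_pid() {
    awk '$2 == "rcu_sched" { print $1; exit }'
}

# Hypothetical helper: set SCHED_FIFO priority 1 on the given PID, then
# print the resulting policy and priority.
raise_rcu_sched() {
    chrt -f -p 1 "$1" && chrt -p "$1"
}

# Nothing is changed unless the script is invoked with --apply (as root).
if [ "${1:-}" = "--apply" ]; then
    pid=$(ps -e -o pid=,comm= | rcu_sched_pid)
    if [ -n "$pid" ]; then
        raise_rcu_sched "$pid"
    else
        echo "no task named rcu_sched found" >&2
        exit 1
    fi
fi
```

Matching on the command name avoids the problem visible in the ps output above, where the grep process itself also shows up in the results. A priority set with chrt lasts only until the next reboot, so it would have to be reapplied at boot.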

> > [239940.067938] clocksource: timekeeping watchdog on CPU26: Marking
> > clocksource 'tsc' as unstable because the skew is too large:
> > [239940.067943] clocksource:   'hpet' wd_now:
> > ecb521ce wd_last: ca45912d mask: 
> > [239940.067946] clocksource:   'tsc' cs_now:
> > 2d12df5f88d08 cs_last: 29fffbe2d36d6 mask: 
> > [239940.068357] clocksource: Switched to clocksource hpet
> > [24.066457] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [24.066488] 2-...: (38 GPs behind) idle=f1a/0/0
> > softirq=2417028/2417028 fqs=0
> > [24.066491] 3-...: (27 GPs behind) idle=0bc/0/0
> > softirq=3076046/3076047 fqs=0
> > [24.066494] 4-...: (1006 GPs behind) idle=308/0/0
> > softirq=1474922/1474922 fqs=0
> > [24.066497] 5-...: (8034 GPs behind) idle=7b4/0/0
> > softirq=69165/69166 fqs=0
> > [24.066499] 6-...: (1665 GPs behind) idle=716/0/0
> > softirq=16465/16465 fqs=0
> > [24.066502] 7-...: (8034 GPs behind) idle=bd0/0/0
> > softirq=15923/15924 fqs=0
> > [24.066505] 8-...: (1665 GPs behind) idle=a62/0/0
> > softirq=93861/93861 fqs=0
> > [24.066507] 9-...: (1258 GPs behind) idle=a8c/0/0
> > softirq=21608/21608 fqs=0
> > [24.066510] 11-...: (1665 GPs behind) idle=8e6/0/0
> > softirq=18844/18844 fqs=0
> > [24.066512] 13-...: (1665 GPs behind) idle=ab8/0/0
> > softirq=21278/21278 fqs=0
> > [24.066515] 14-...: (4854 GPs behind) idle=578/0/0
> > softirq=25624/25626 fqs=0
> > [24.066517] 15-...: (8034 GPs behind) idle=f5e/0/0
> > softirq=18114/18115 fqs=0
> > [24.066520] 16-...: (1665 GPs behind) idle=b66/0/0
> > softirq=19511/19511 fqs=0
> > [24.066528] 17-...: (0 ticks this GP) idle=e18/0/0
> > softirq=16163/16163 fqs=0
> > [24.066531] 18-...: (1665 GPs behind) idle=a18/0/0
> > softirq=16424/16424 fqs=0
> > [24.066533] 19-...: (1665 GPs behind) idle=944/0/0
> > softirq=17920/17920 fqs=0
> > [24.066549] 20-...: (20991 GPs behind) idle=750/0/0
> > softirq=17549/17552 fqs=0
> > [24.066552] 21-...: (0 ticks this GP) idle=9e0/0/0
> > softirq=18937/18937 fqs=0
> > [24.066555] 22-...: (14519 GPs behind) idle=95e/0/0
> > softirq=16122/16124 fqs=0
> > [24.066557] 23-...: (1665 GPs behind) idle=bbc/0/0
> > softirq=16316/16316 fqs=0
> > [24.066560] 24-...: (0 ticks this GP) idle=c56/0/0
> > softirq=20070/20070 fqs=0
> > [24.066563] 27-...: (8034 GPs behind) idle=bdc/0/0
> > softirq=18291/1

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-08 Thread Paul E. McKenney
On Mon, May 08, 2017 at 09:43:15AM +0300, mi...@stz-bg.com wrote:
> On 2017-01-20 17:19, Steven Rostedt wrote:
> >On Fri, 20 Jan 2017 10:43:50 +0200
> >mi...@stz-bg.com wrote:
> >
> >>[1.] One line summary of the problem:
> >>
> >>rcu_sched detected stalls on CPUs and few minutes server not respond.
> >
> >Is this reproducible? Or was this a one time ordeal?
> >
> >>
> >>[2.] Full description of the problem/report:
> >>
> >>Load of my server (postgres database) isnt big less then 0.50 and when
> >>error occured rcu_sched detected stalls on CPUs
> >>server freeze and nothing is work for 3-5 minute.
> >>No network, no video signal, no keyboard, no mouse. Nothing is worked.
> >>After these few minutes everything continue normal.
> >>This usual is happend once per day. When I check in google find a lots
> >>of ppl complain of this error, but no solution.
> >>Do any one know can help me to resolve it ? I spoke with few
> >>friends and
> >>they trying to convince me the problem is in CPU.
> >>I did not believe after a 3 years working CPU suddenly stop working
> >>correctly, but I might be wrong.
> >>
> >>[3.] Keywords (i.e., modules, networking, kernel):
> >>
> >>kernel
> >>
> >>[4.] Kernel information
> >>[4.1.] Kernel version (from /proc/version):
> >>
> >>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2
> >>SMP Sun
> >>Dec 11 16:11:02 CST 2016
> >>
> >
> >Have you tried a newer version of the kernel?
> >
> >-- Steve
> 
> 
> Hello,
> yesterday I change to new kernel: 4.9.26 and still no effect.
> 
> I trying to figure out what I need to buy because I read on google a
> lots of posts about that problem,
> some ppl suggest is BIOS firmware bug, some ppl tell that when they
> swap CPU problem is resolved.
> May be problem was started when I first time boot 4.x kernel and
> there have cpu microcode updates.
> Im 3.x kernels this feature was not in kernel, but I don't know,
> only guess.
> 
> Can some one point me clearly: You need to change this one and
> problem will be solved ?

Looking back at your earlier email, I see the following:

[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004 jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1

That indicates that the task "rcu_sched" isn't being allowed to run.
The "->state=0x1" indicates that this task is currently asleep and the
"s3" indicates that it is looking to start a grace period, but has not
progressed through this task.  My guess is that your system gets very
busy, and that rcu_sched's normal scheduling priority is not sufficient
for it to be allowed to run.
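
The fields discussed above can be pulled out of such a line mechanically.
The following is only a sketch: the function name parse_starved is made up
for illustration, and the sample line is the one quoted above.  It prints
the starvation time in jiffies, the number after "s", and the ->state value.

```shell
#!/bin/sh
# Sample line copied from the report quoted above.
line='[Wed Jan  4 10:19:12 2017] rcu_sched kthread starved for 60004 jiffies! g61967937 c61967936 f0x0 s3 ->state=0x1'

# parse_starved (hypothetical helper): print "jiffies gp_state task_state"
# for an "rcu_sched kthread starved" line, or nothing if the line does not match.
parse_starved() {
    printf '%s\n' "$1" | sed -n \
        's/.*kthread starved for \([0-9][0-9]*\) jiffies!.* s\([0-9][0-9]*\) ->state=\(0x[0-9a-fA-F]*\).*/\1 \2 \3/p'
}

parse_starved "$line"
```

For the sample line this prints "60004 3 0x1".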

I therefore suggest that you try raising the rcu_sched task's priority.

You can find this task like this:

ps -ef | grep rcu_sched

On my laptop, I get this:

$ ps -ef | grep rcu_sched
root         7     2  0 Apr11 ?        00:19:19 [rcu_sched]
paulmck  18307 22926  0 06:11 pts/35   00:00:00 grep --color=auto rcu_sche

You can use the chrt command to set the priority.  For example, given that
on my laptop, rcu_sched is PID 7:

sudo chrt -f -p 1 7

You can double-check as follows:

$ chrt -p 7
pid 7's current scheduling policy: SCHED_FIFO
pid 7's current scheduling priority: 1
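
Since the PID will differ from system to system, here is a minimal sketch
of looking it up rather than hard-coding 7.  The helper name pid_of_comm
and the sample rows fed to it are made up for illustration.  The
commented-out lines at the end show how it might be combined with the chrt
commands above, and they have not been run here.

```shell
#!/bin/sh
# pid_of_comm (hypothetical helper): read "PID COMM" pairs on stdin, in the
# form printed by `ps -e -o pid= -o comm=`, and print the PID of the first
# process whose command name is exactly "$1".
pid_of_comm() {
    awk -v name="$1" '$2 == name { print $1; exit }'
}

# Made-up sample rows; PID 7 mirrors the laptop example above.
printf '%s\n' '    2 kthreadd' '    7 rcu_sched' | pid_of_comm rcu_sched

# On a live system (assumption: the kthread is named rcu_sched there):
#   pid=$(ps -e -o pid= -o comm= | pid_of_comm rcu_sched)
#   [ -n "$pid" ] && sudo chrt -f -p 1 "$pid" && chrt -p "$pid"
```

Matching on the exact command name also avoids picking up the grep process
itself, which shows up in the ps -ef | grep output above.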

Does that help?

Thanx, Paul

> [239940.067938] clocksource: timekeeping watchdog on CPU26: Marking
> clocksource 'tsc' as unstable because the skew is too large:
> [239940.067943] clocksource:   'hpet' wd_now:
> ecb521ce wd_last: ca45912d mask: 
> [239940.067946] clocksource:   'tsc' cs_now:
> 2d12df5f88d08 cs_last: 29fffbe2d36d6 mask: 
> [239940.068357] clocksource: Switched to clocksource hpet
> [24.066457] INFO: rcu_sched detected stalls on CPUs/tasks:
> [24.066488] 2-...: (38 GPs behind) idle=f1a/0/0
> softirq=2417028/2417028 fqs=0
> [24.066491] 3-...: (27 GPs behind) idle=0bc/0/0
> softirq=3076046/3076047 fqs=0
> [24.066494] 4-...: (1006 GPs behind) idle=308/0/0
> softirq=1474922/1474922 fqs=0
> [24.066497] 5-...: (8034 GPs behind) idle=7b4/0/0
> softirq=69165/69166 fqs=0
> [24.066499] 6-...: (1665 GPs behind) idle=716/0/0
> softirq=16465/16465 fqs=0
> [24.066502] 7-...: (8034 GPs behind) idle=bd0/0/0
> softirq=15923/15924 fqs=0
> [24.066505] 8-...: (1665 GPs behind) idle=a62/0/0
> softirq=93861/93861 fqs=0
> [24.066507] 9-...: (1258 GPs behind) idle=a8c/0/0
> softirq=21608/21608 fqs=0
> [24.066510] 11-...: (1665 GPs behind) idle=8e6/0/0
> softirq=18844/18844 fqs=0
> [24.066512] 13-...: (1665 GPs behind) idle=ab8/0/0
> softirq=21278/21278 fqs=0
> [24.066515] 14-...: (4854 GPs behind) idle=578/0/0
> softirq=25624/25626 fqs=0
> [24.066517] 15-...: (8034 GPs behind) idle=f5e/0/0
> softirq=18114/18115 fqs=0
> [24.066520] 16-...: (1665 GPs behind) idle=b66/0/0
> sof

Re: BUG: rcu_sched detected stalls on CPUs

2017-05-07 Thread mitko

On 2017-01-20 17:19, Steven Rostedt wrote:
>On Fri, 20 Jan 2017 10:43:50 +0200
>mi...@stz-bg.com wrote:
>
>>[1.] One line summary of the problem:
>>
>>rcu_sched detected stalls on CPUs and few minutes server not respond.
>
>Is this reproducible? Or was this a one time ordeal?
>
>>
>>[2.] Full description of the problem/report:
>>
>>Load of my server (postgres database) isnt big less then 0.50 and when
>>error occured rcu_sched detected stalls on CPUs
>>server freeze and nothing is work for 3-5 minute.
>>No network, no video signal, no keyboard, no mouse. Nothing is worked.
>>After these few minutes everything continue normal.
>>This usual is happend once per day. When I check in google find a lots
>>of ppl complain of this error, but no solution.
>>Do any one know can help me to resolve it ? I spoke with few friends and
>>they trying to convince me the problem is in CPU.
>>I did not believe after a 3 years working CPU suddenly stop working
>>correctly, but I might be wrong.
>>
>>[3.] Keywords (i.e., modules, networking, kernel):
>>
>>kernel
>>
>>[4.] Kernel information
>>[4.1.] Kernel version (from /proc/version):
>>
>>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2 SMP Sun
>>Dec 11 16:11:02 CST 2016
>>
>
>Have you tried a newer version of the kernel?
>
>-- Steve



Hello,
yesterday I changed to a new kernel, 4.9.26, and there is still no effect.

I am trying to figure out what I need to buy, because I have read a lot of
posts on Google about this problem. Some people suggest it is a BIOS
firmware bug, and some say the problem was resolved when they swapped the
CPU.
Maybe the problem started when I first booted a 4.x kernel, which has CPU
microcode updates.
In 3.x kernels this feature was not in the kernel, but I don't know, this
is only a guess.

Can someone tell me clearly: you need to change this one thing and the
problem will be solved?



[239940.067938] clocksource: timekeeping watchdog on CPU26: Marking clocksource 'tsc' as unstable because the skew is too large:
[239940.067943] clocksource:   'hpet' wd_now: ecb521ce wd_last: ca45912d mask:
[239940.067946] clocksource:   'tsc' cs_now: 2d12df5f88d08 cs_last: 29fffbe2d36d6 mask:
[239940.068357] clocksource: Switched to clocksource hpet
[24.066457] INFO: rcu_sched detected stalls on CPUs/tasks:
[24.066488] 2-...: (38 GPs behind) idle=f1a/0/0 softirq=2417028/2417028 fqs=0
[24.066491] 3-...: (27 GPs behind) idle=0bc/0/0 softirq=3076046/3076047 fqs=0
[24.066494] 4-...: (1006 GPs behind) idle=308/0/0 softirq=1474922/1474922 fqs=0
[24.066497] 5-...: (8034 GPs behind) idle=7b4/0/0 softirq=69165/69166 fqs=0
[24.066499] 6-...: (1665 GPs behind) idle=716/0/0 softirq=16465/16465 fqs=0
[24.066502] 7-...: (8034 GPs behind) idle=bd0/0/0 softirq=15923/15924 fqs=0
[24.066505] 8-...: (1665 GPs behind) idle=a62/0/0 softirq=93861/93861 fqs=0
[24.066507] 9-...: (1258 GPs behind) idle=a8c/0/0 softirq=21608/21608 fqs=0
[24.066510] 11-...: (1665 GPs behind) idle=8e6/0/0 softirq=18844/18844 fqs=0
[24.066512] 13-...: (1665 GPs behind) idle=ab8/0/0 softirq=21278/21278 fqs=0
[24.066515] 14-...: (4854 GPs behind) idle=578/0/0 softirq=25624/25626 fqs=0
[24.066517] 15-...: (8034 GPs behind) idle=f5e/0/0 softirq=18114/18115 fqs=0
[24.066520] 16-...: (1665 GPs behind) idle=b66/0/0 softirq=19511/19511 fqs=0
[24.066528] 17-...: (0 ticks this GP) idle=e18/0/0 softirq=16163/16163 fqs=0
[24.066531] 18-...: (1665 GPs behind) idle=a18/0/0 softirq=16424/16424 fqs=0
[24.066533] 19-...: (1665 GPs behind) idle=944/0/0 softirq=17920/17920 fqs=0
[24.066549] 20-...: (20991 GPs behind) idle=750/0/0 softirq=17549/17552 fqs=0
[24.066552] 21-...: (0 ticks this GP) idle=9e0/0/0 softirq=18937/18937 fqs=0
[24.066555] 22-...: (14519 GPs behind) idle=95e/0/0 softirq=16122/16124 fqs=0
[24.066557] 23-...: (1665 GPs behind) idle=bbc/0/0 softirq=16316/16316 fqs=0
[24.066560] 24-...: (0 ticks this GP) idle=c56/0/0 softirq=20070/20070 fqs=0
[24.066563] 27-...: (8034 GPs behind) idle=bdc/0/0 softirq=18291/18292 fqs=0
[24.066566] 28-...: (8034 GPs behind) idle=4f8/0/0 softirq=22313/22314 fqs=0
[24.066568] 29-...: (3443 GPs behind) idle=22a/0/0 softirq=24975/24976 fqs=0
[24.066571] 30-...: (1665 GPs behind) idle=940/0/0 softirq=89382/89382 fqs=0
[24.066572] (detected by 10, t=60002 jiffies, g=4843721, c=4843720, q=659)
[24.066578] Task dump for CPU 2:
[24.066581] swapper/2   R  running task0 0  1 0x0020
[24.066588]  0083 a5cecd79 8aca3eea1e00 a64cab40
[24.066593]  0005 a64cad38 a96c80073eb0 a5ba9d93
[24.066597]  8aca3eea1e00 0e94fec6 8aca38535940 0002


Re: BUG: rcu_sched detected stalls on CPUs

2017-01-20 Thread mitko

On 2017-01-20 17:19, Steven Rostedt wrote:
>On Fri, 20 Jan 2017 10:43:50 +0200
>mi...@stz-bg.com wrote:
>
>>[1.] One line summary of the problem:
>>
>>rcu_sched detected stalls on CPUs and few minutes server not respond.
>
>Is this reproducible? Or was this a one time ordeal?

It usually happens once per day, and I can't reproduce it on demand. If the
server load average is more than 0.70, it happens twice per day.

>>
>>[2.] Full description of the problem/report:
>>
>>Load of my server (postgres database) isnt big less then 0.50 and when
>>error occured rcu_sched detected stalls on CPUs
>>server freeze and nothing is work for 3-5 minute.
>>No network, no video signal, no keyboard, no mouse. Nothing is worked.
>>After these few minutes everything continue normal.
>>This usual is happend once per day. When I check in google find a lots
>>of ppl complain of this error, but no solution.
>>Do any one know can help me to resolve it ? I spoke with few friends and
>>they trying to convince me the problem is in CPU.
>>I did not believe after a 3 years working CPU suddenly stop working
>>correctly, but I might be wrong.
>>
>>[3.] Keywords (i.e., modules, networking, kernel):
>>
>>kernel
>>
>>[4.] Kernel information
>>[4.1.] Kernel version (from /proc/version):
>>
>>Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2 SMP Sun
>>Dec 11 16:11:02 CST 2016
>>
>
>Have you tried a newer version of the kernel?

No, not for the moment, because I can't find a newer one in the Slackware
repos.
My last kernel compile was almost 7 years ago and I do not feel safe doing
it now; I guess a lot of things have changed.

>-- Steve

Regards,
Mitko


Re: BUG: rcu_sched detected stalls on CPUs

2017-01-20 Thread Steven Rostedt
On Fri, 20 Jan 2017 10:43:50 +0200
mi...@stz-bg.com wrote:

> [1.] One line summary of the problem:
> 
> rcu_sched detected stalls on CPUs and few minutes server not respond.

Is this reproducible? Or was this a one time ordeal?

> 
> [2.] Full description of the problem/report:
> 
> Load of my server (postgres database) isnt big less then 0.50 and when 
> error occured rcu_sched detected stalls on CPUs
> server freeze and nothing is work for 3-5 minute.
> No network, no video signal, no keyboard, no mouse. Nothing is worked. 
> After these few minutes everything continue normal.
> This usual is happend once per day. When I check in google find a lots 
> of ppl complain of this error, but no solution.
> Do any one know can help me to resolve it ? I spoke with few friends and 
> they trying to convince me the problem is in CPU.
> I did not believe after a 3 years working CPU suddenly stop working 
> correctly, but I might be wrong.
> 
> [3.] Keywords (i.e., modules, networking, kernel):
> 
> kernel
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> 
> Linux version 4.4.38 (root@hive64) (gcc version 5.4.0 (GCC) ) #2 SMP Sun 
> Dec 11 16:11:02 CST 2016
> 

Have you tried a newer version of the kernel?

-- Steve