Re: [CentOS-virt] Major stability problems with xen 4.6.6

2017-08-29 Thread Nathan March
> It seems the patch you mentioned was merged to upstream Linux here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71472fa9c52b1da27663c275d416d8654b905f05
> 
> and then reverted/removed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=896d81fefe5d1919537db2c2150ab6384e4a6610
> 
> Do you know if there has been proper/fixed patch after that? has it been
> merged to upstream Linux kernel already?

Interesting! I didn't come across that when digging into this.

It looks like this hasn't been followed up on at all since April:
https://lists.gt.net/engine?list=linux;do=search_results;search_type=AND;search_forum=forum_1;search_string=ldisc%20reopened&sb=post_time

Currently I've got ~40 dom0s running with the patch on 4.9.44-39 and it's
resolved all stability issues; previously I was seeing multiple crashes a
week.

Cheers,
Nathan

___
CentOS-virt mailing list
CentOS-virt@centos.org
https://lists.centos.org/mailman/listinfo/centos-virt


Re: [CentOS-virt] Major stability problems with xen 4.6.6

2017-08-29 Thread Pasi Kärkkäinen
Hi,

On Thu, Aug 24, 2017 at 03:45:46PM -0700, Nathan March wrote:
> Just in case anyone else on this list is running into similar issues, I
> can confirm that the patch appears to have resolved this.
>
> I've opened https://bugs.centos.org/view.php?id=13713
>
> It was so bad that having the system under load (with rpmbuild) and
> opening another ssh window or two would almost always cause the oops.

It seems the patch you mentioned was merged to upstream Linux here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71472fa9c52b1da27663c275d416d8654b905f05

and then reverted/removed here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=896d81fefe5d1919537db2c2150ab6384e4a6610

Do you know if there has been a proper/fixed patch since then? Has it been
merged into the upstream Linux kernel already?


Thanks,

-- Pasi




Re: [CentOS-virt] Major stability problems with xen 4.6.6

2017-08-24 Thread Nathan March
Just in case anyone else on this list is running into similar issues, I can
confirm that the patch appears to have resolved this.

I've opened https://bugs.centos.org/view.php?id=13713

It was so bad that having the system under load (with rpmbuild) and opening
another ssh window or two would almost always cause the oops.

Cheers,
Nathan




Re: [CentOS-virt] Major stability problems with xen 4.6.6

2017-08-23 Thread Nathan March
This appears to be a CentOS kernel issue rather than a Xen one.

https://lkml.org/lkml/2016/5/17/440

I've dug through the posts and it's not clear why this never made it
upstream.

I'm going to apply that patch to my systems and see if it resolves the
crashes, but I won't know for certain until a week or two of stability goes
by.

- Nathan




[CentOS-virt] Major stability problems with xen 4.6.6

2017-08-23 Thread Nathan March
Hi,

I'm seeing numerous crashes on the xen 4.6.6-1 / 4.6.6-2 releases, on both
the 4.9.34-29 and 4.9.39-29 kernels.

I've attached a txt file with output from two different servers.

Xen-028: This crashed this morning while running 4.6.6-1 and 4.9.39-29

Xen-001: This crashed shortly after being upgraded to 4.6.6-2 and 4.9.34-29

Both are on different hardware platforms, and have had a long history of
being stable until these upgrades.

It sounds potentially related to
https://kernel.googlesource.com/pub/scm/linux/kernel/git/tiwai/sound-unstable/+/9ce119f318ba1a07c29149301f1544b6c4bea52a%5E%21/
but I've confirmed this patch is in the above kernels.

Any suggestions / thoughts?

Cheers,

Nathan
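
For anyone comparing their own crash against the attached log, here is a
minimal Python sketch that pulls the function names out of an oops "Call
Trace" and checks for the tty line-discipline path seen in this thread
(flush_to_ldisc leading into n_tty_receive_buf*). The function names
reliable_frames and looks_like_ldisc_oops and the regular expression are
assumptions made for illustration, not part of any kernel tooling, and a
match is only a hint that the crash may be the same one, not a diagnosis.

```python
import re

# A call-trace entry looks like "[<addr>] name+0xOFF/0xLEN", optionally with
# a "?" before the name (the kernel's marker for an unreliable frame). The
# address may be missing ("[]") in archived logs, and mail clients may wrap
# the line between the bracket and the name, so whitespace here is loose.
FRAME_RE = re.compile(
    r"\[(?:<[0-9a-f]+>)?\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log_text):
    """Return function names after 'Call Trace:', skipping '?' frames."""
    _, _, after = log_text.partition("Call Trace:")
    return [name for unsure, name in FRAME_RE.findall(after) if not unsure]


def looks_like_ldisc_oops(frames):
    """Heuristic: the trace goes through flush_to_ldisc into n_tty_receive_buf*."""
    return "flush_to_ldisc" in frames and any(
        name.startswith("n_tty_receive_buf") for name in frames
    )
```

To use it, read the saved log into a string and pass it to
reliable_frames(); feeding the resulting list to looks_like_ldisc_oops()
returns True when both functions appear as reliable frames.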

 

Aug 23 10:19:31 xen-028 kernel: [590071.735515] BUG: unable to handle kernel paging request at 2260
Aug 23 10:19:31 xen-028 kernel: [590071.735795] IP: [] n_tty_receive_buf_common+0xa4/0x1f0
Aug 23 10:19:31 xen-028 kernel: [590071.736031] PGD 0
Aug 23 10:19:31 xen-028 kernel: [590071.736083]
Aug 23 10:19:31 xen-028 kernel: [590071.736300] Oops:  [#1] SMP
Aug 23 10:19:31 xen-028 kernel: [590071.736470] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_devintf ipmi_si ipmi_msghandler gpio_ich iTCO_wdt iTCO_vendor_support fjes acpi_power_meter dcdbas pcspkr serio_raw joydev lpc_ich igb ixgbe dca ptp pps_core mdio i7core_edac edac_core bnx2 raid1 megaraid_sas ttm
Aug 23 10:19:31 xen-028 kernel: [590071.740051] CPU: 14 PID: 21615 Comm: kworker/u48:1 Not tainted 4.9.39-29.el6.x86_64 #1
Aug 23 10:19:31 xen-028 kernel: [590071.740330] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.0.7 08/18/2011
Aug 23 10:19:31 xen-028 kernel: [590071.740607] Workqueue: events_unbound flush_to_ldisc
Aug 23 10:19:31 xen-028 kernel: [590071.740806] task: 88008a6011c0 task.stack: c9004cfec000
Aug 23 10:19:31 xen-028 kernel: [590071.740966] RIP: e030:[]  [] n_tty_receive_buf_common+0xa4/0x1f0
Aug 23 10:19:31 xen-028 kernel: [590071.741282] RSP: e02b:c9004cfefb08  EFLAGS: 00010296
Aug 23 10:19:31 xen-028 kernel: [590071.741442] RAX: 2260 RBX:  RCX: 000a
Aug 23 10:19:31 xen-028 kernel: [590071.741714] RDX:  RSI: 88015ecd6420 RDI: 8800afd654d8
Aug 23 10:19:31 xen-028 kernel: [590071.741994] RBP: c9004cfefb78 R08: 0001 R09: 81f0af00
Aug 23 10:19:31 xen-028 kernel: [590071.742274] R10: 7ff0 R11: 0078 R12: 000a
Aug 23 10:19:31 xen-028 kernel: [590071.742549] R13: 8800afd65400 R14:  R15: 88015ecd6420
Aug 23 10:19:31 xen-028 kernel: [590071.742830] FS:  7f81da7317c0() GS:8801c098() knlGS:
Aug 23 10:19:31 xen-028 kernel: [590071.743112] CS:  e033 DS:  ES:  CR0: 80050033
Aug 23 10:19:31 xen-028 kernel: [590071.743283] CR2: 2260 CR3: 8f61f000 CR4: 2660
Aug 23 10:19:31 xen-028 kernel: [590071.743564] Stack:
Aug 23 10:19:31 xen-028 kernel: [590071.743719]  c900116c  8800afd654d8 0001c070
Aug 23 10:19:31 xen-028 kernel: [590071.744149]  2260 8a603340 8801c0997000
Aug 23 10:19:31 xen-028 kernel: [590071.744577]  8801c098b890 88015ecd6400 8800b19e9c00 c9004cfefbf8
Aug 23 10:19:31 xen-028 kernel: [590071.745008] Call Trace:
Aug 23 10:19:31 xen-028 kernel: [590071.745169]  [] n_tty_receive_buf2+0x14/0x20
Aug 23 10:19:31 xen-028 kernel: [590071.745335]  [] tty_ldisc_receive_buf+0x23/0x50
Aug 23 10:19:31 xen-028 kernel: [590071.745501]  [] flush_to_ldisc+0xc8/0x100
Aug 23 10:19:31 xen-028 kernel: [590071.745669]  [] ? __switch_to+0x1dc/0x680
Aug 23 10:19:31 xen-028 kernel: [590071.745836]  [] process_one_work+0x170/0x500
Aug 23 10:19:31 xen-028 kernel: [590071.746005]  [] ? __schedule+0x238/0x530
Aug 23 10:19:31 xen-028 kernel: [590071.746169]  [] ? maybe_create_worker+0x94/0x120
Aug 23 10:19:31 xen-028 kernel: [590071.746342]  [] ? schedule+0x3a/0xa0
Aug 23 10:19:31 xen-028 kernel: [590071.746506]  [] worker_thread+0x166/0x580
Aug 23 10:19:31 xen-028 kernel: [590071.746671]  [] ? __schedule+0x238/0x530
Aug 23 10:19:31 xen-028 kernel: [590071.749537]  [] ? default_wake_function+0x12/0x20
Aug 23 10:19:31 xen-028 kernel: [590071.749706]  [] ? maybe_create_worker+0x120/0x120
Aug 23 10:19:31 xen-028 kernel: [590071.749872]  [] ? schedule+0x3a/0xa0
Aug 23 10:19:31 xen-028 kernel: [590071.750040]  [] ? _raw_spin_unlock_irqrestore+0x16/0x20
Aug 23 10:19:31 xen-028 kernel: [590071.750204]  [] ? maybe_create_worker+0x120/0x120
Aug 23 10:19:31 xen-028 kernel: [590071.750369]  [] kthread+0xe5/0x100
Aug 23 10:19:31 xen-028 kernel: