Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
> -----Original Message-----
> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Johnny Hughes
> Sent: Wednesday, January 24, 2018 6:39 AM
> To: centos-virt@centos.org
> Subject: Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
>
> On 01/24/2018 01:01 AM, Pasi Kärkkäinen wrote:
> > On Tue, Jan 23, 2018 at 06:20:39PM -0600, Kevin Stange wrote:
> >> On 01/23/2018 05:57 PM, Karl Johnson wrote:
> >>> On Tue, Jan 23, 2018 at 4:50 PM, Nathan March <mailto:nat...@gt.net> wrote:
> >>>
> >>>     Hi,
> >>>
> >>>     > Hmm.. isn't this the ldisc bug that was discussed a few months ago on this list,
> >>>     > and a patch was applied to the virt-sig kernel as well?
> >>>     >
> >>>     > Call trace looks similar..
> >>>
> >>>     Good memory! I'd forgotten about that despite being the one who ran into it.
> >>>
> >>>     Looks like that patch was just removed in 4.9.75-30, which I just upgraded this system to:
> >>>     http://cbs.centos.org/koji/buildinfo?buildID=21122
> >>>     Previously I was on 4.9.63-29, which does not have this problem and does have the ldisc patch.
> >>>     So I guess the question is for Johnny: why was it removed?
> >>>
> >>>     In the meantime, I'll revert the kernel and follow up if I see any further problems.
> >>>
> >>> IIRC the patch has been removed from the spec file because it has been
> >>> merged upstream in 4.9.71.
> >>
> >> The IRC discussion I found in my log indicates that it was removed
> >> because it didn't apply cleanly due to changes when updating to 4.9.75,
> >> yet I don't think anyone independently validated that the changes made
> >> are equivalent to the patch that was removed. I was never able to
> >> reproduce this issue, so I didn't investigate it myself.
> >
> > Sounds like the patch is still needed :)
> >
> > Anyone up to re-porting it to 4.9.75+ ?
>
> It looked, at first glance, like 4.9.71 fixed it .. I guess not in all cases.

I'm happy to do testing here if anyone's able to help with a patch. It does look like reverting to 4.9.63-29 solved it for me in the interim.
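For anyone else wanting to roll back while this gets sorted out, something along these lines should work (untested sketch; use whatever the 4.9.63-29 kernel package is actually called in the virt SIG repo you track):

yum install kernel-4.9.63-29.el6.x86_64   # kernel packages are install-only, so the old build sits alongside the new one
# then point the default boot entry at the older kernel/xen pair in your bootloader config and reboot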
Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
Hi,

> Hmm.. isn't this the ldisc bug that was discussed a few months ago on this list,
> and a patch was applied to the virt-sig kernel as well?
>
> Call trace looks similar..

Good memory! I'd forgotten about that despite being the one who ran into it.

Looks like that patch was just removed in 4.9.75-30, which I just upgraded this system to: http://cbs.centos.org/koji/buildinfo?buildID=21122
Previously I was on 4.9.63-29, which does not have this problem and does have the ldisc patch. So I guess the question is for Johnny: why was it removed?

In the meantime, I'll revert the kernel and follow up if I see any further problems.

Cheers,

Nathan
Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
> Thanks for the heads-up. It's been running through XenServer's tests > as well as the XenProject's "osstest" -- I haven't heard of any > additional issues, but I'll ask. Looks like I can reproduce this pretty easily, this happened upon ssh'ing into the server while I had a VM migrating into it. The system goes completely unresponsive (can't even enter a keystroke via console): [64722.291300] vlan208: port 4(vif5.0) entered forwarding state [64722.291695] NOHZ: local_softirq_pending 08 [64929.006981] BUG: unable to handle kernel paging request at 2260 [64929.007020] IP: [] n_tty_receive_buf_common+0xa4/0x1f0 [64929.007049] PGD 1f7a53067 [64929.007057] PUD 1ee0d4067 PMD 0 [64929.007069] [64929.007077] Oops: [#1] SMP [64929.007088] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb dca ptp pps_core uas usb_storage wmi ttm [64929.007327] CPU: 15 PID: 17696 Comm: kworker/u48:0 Not tainted 4.9.75-30.el6.x86_64 #1 [64929.007343] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1 03/04/2015 [64929.007362] Workqueue: events_unbound flush_to_ldisc [64929.007376] task: 8801fbc70580 task.stack: c90048af8000 [64929.007415] RIP: e030:[] [] n_tty_receive_buf_common+0xa4/0x1f0 [64929.007465] RSP: e02b:c90048afbb08 EFLAGS: 00010296 [64929.007476] RAX: 2260 RBX: RCX: 0002 [64929.007519] RDX: RSI: 8801dc0f3c20 RDI: 8801f9b8acd8 [64929.007563] RBP: c90048afbb78 R08: 0001 R09: 8210f1c0 [64929.007577] R10: 7ff0 R11: R12: 0002 [64929.007620] R13: 8801f9b8ac00 R14: R15: 8801dc0f3c20 [64929.007675] FS: 7fcfc0af8700() GS:880204dc() knlGS: [64929.007718] CS: e033 DS: ES: CR0: 80050033 [64929.007759] CR2: 2260 CR3: 0001f067b000 CR4: 00042660 [64929.007782] Stack: [64929.007806] c90048afbb38 8801f9b8acd8 000104dda030 [64929.007858] 2260 fbc72700 880204dc48c0 [64929.007941] 880204dce890 8801dc0f3c00 8801f7f25c00 c90048afbbf8 [64929.007994] Call Trace: [64929.008008] [] n_tty_receive_buf2+0x14/0x20 [64929.008048] [] tty_ldisc_receive_buf+0x23/0x50 [64929.008088] [] flush_to_ldisc+0xc8/0x100 [64929.008133] [] ? __switch_to+0x20b/0x690 [64929.008176] [] ? xen_clocksource_read+0x15/0x20 [64929.008222] [] process_one_work+0x170/0x500 [64929.008268] [] ? __schedule+0x238/0x530 [64929.008310] [] ? schedule+0x3a/0xa0 [64929.008324] [] worker_thread+0x166/0x530 [64929.008368] [] ? put_prev_entity+0x29/0x140 [64929.008412] [] ? __schedule+0x238/0x530 [64929.008458] [] ? default_wake_function+0x12/0x20 [64929.008502] [] ? maybe_create_worker+0x120/0x120 [64929.008518] [] ? schedule+0x3a/0xa0 [64929.008555] [] ? _raw_spin_unlock_irqrestore+0x16/0x20 [64929.008599] [] ? maybe_create_worker+0x120/0x120 [64929.008616] [] kthread+0xe5/0x100 [64929.008630] [] ? schedule_tail+0x56/0xc0 [64929.008643] [] ? __kthread_init_worker+0x40/0x40 [64929.008659] [] ? 
schedule_tail+0x56/0xc0 [64929.008673] [] ret_from_fork+0x41/0x50 [64929.008685] Code: 89 fe 4c 89 ef 89 45 98 e8 aa fb ff ff 8b 45 98 48 63 d0 48 85 db 48 8d 0c 13 48 0f 45 d9 01 45 bc 49 01 d7 41 29 c4 48 8b 45 b0 <48> 8b 30 48 89 75 c0 49 8b 0e 8d 96 00 10 00 00 29 ca 41 f6 85 [64929.008894] RIP [] n_tty_receive_buf_common+0xa4/0x1f0 [64929.008914] RSP [64929.008923] CR2: 2260 [64929.009641] ---[ end trace e1da1cdf77fed144 ]--- [64929.009785] BUG: unable to handle kernel paging request at ffd8 [64929.009804] IP: [] kthread_data+0x10/0x20 [64929.009823] PGD 200d067 [64929.009831] PUD 200f067 PMD 0 [64929.009842] [64929.009850] Oops: [#2] SMP [64929.009864] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb dca ptp pps_core uas usb_storage wmi ttm [64929.010054] CPU: 15 PID: 17696 Comm: kworker/u48:0 Tainted: G D 4.9.75-30.el6.x86_64 #1 [64929.010068] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1 03/04/2015 [64929.010127] task: 8801fbc70580 task.stack: c90048af8000 [64929.010138] RIP: e030:[] [] kthread_data
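If it helps anyone chasing this, the crashing offset can usually be turned into a source line without reproducing the hang, assuming you have a vmlinux with debuginfo that matches the running 4.9.75-30 build (rough sketch; the debuginfo path is illustrative):

# from a kernel source tree of the same version:
./scripts/faddr2line vmlinux n_tty_receive_buf_common+0xa4/0x1f0
# or interactively with gdb against the debuginfo vmlinux:
gdb /usr/lib/debug/lib/modules/4.9.75-30.el6.x86_64/vmlinux -ex 'list *(n_tty_receive_buf_common+0xa4)' -ex quit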
Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
Just a heads up that I'm seeing major stability problems on these builds. I didn't have console capture set up unfortunately, but I have seen my test hypervisor hard lock twice over the weekend. This is with xpti being used, rather than the shim.

Cheers,

Nathan

> -----Original Message-----
> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of George Dunlap
> Sent: Wednesday, January 17, 2018 9:14 AM
> To: Discussion about the virtualization on CentOS
> Subject: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
>
> I've built & tagged packages for CentOS 6 and 7 4.6.6-9, with XPTI
> "stage 1" Meltdown mitigation.
>
> This will allow 64-bit PV guests to run safely (with a few caveats),
> but incurs a fairly significant slowdown for 64-bit PV guests on Intel
> boxes (including domain 0).
>
> If you prefer using Vixen / Comet, you can turn it off by adding
> 'xpti=0' to your Xen command-line.
>
> Detailed information can be found in the XSA-254 advisory:
>
> https://xenbits.xen.org/xsa/advisory-254.html
>
> Please test and report any issues you have. I'll probably tag them
> with -release tomorrow.
>
> 4.8 packages should be coming to buildlogs soon.
>
> -George
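For reference, if you'd rather rely on Vixen/Comet (the shim) and disable XPTI, it's just a hypervisor command-line change; a rough sketch for a CentOS 7 dom0 booting via grub2 (the other options shown are placeholders for whatever you already pass):

# /etc/default/grub
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=... loglvl=all xpti=0"
grub2-mkconfig -o /boot/grub2/grub.cfg
# on a CentOS 6 dom0, append xpti=0 to the xen.gz line in /boot/grub/grub.conf instead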
Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
> -----Original Message-----
> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Peter Peltonen
> Sent: Thursday, January 18, 2018 11:19 AM
> To: Discussion about the virtualization on CentOS
> Subject: Re: [CentOS-virt] Xen 4.6.6-9 (with XPTI meltdown mitigation) packages making their way to centos-virt-xen-testing
>
> Thanks George.
>
> As there are now quite many options to choose from, what would be the
> best option performance-wise for running 32-bit domUs under xen-4.6?
>
> Best,
> Peter

It's worth taking a look at the table in the latest XSA; it helps clarify a fair bit: https://xenbits.xen.org/xsa/advisory-254.html

Cheers,

Nathan
[CentOS-virt] Stability issues since moving to 4.6 - Kernel paging request bug + VM left in null state
Since moving from 4.4 to 4.6, I've been seeing an increasing number of stability issues on our hypervisors. I'm not clear if there's a singular root cause here, or if I'm dealing with multiple bugs. One of the more common ones I've seen, is a VM on shutdown will remain in the null state and a kernel bug is thrown: xen001 log # xl list NameID Mem VCPUs State Time(s) Domain-0 0 614424 r- 6639.7 (null) 3 0 1 --pscd 36.3 [89920.839074] BUG: unable to handle kernel paging request at 88020ee9a000 [89920.839546] IP: [] __memcpy+0x12/0x20 [89920.839933] PGD 2008067 [89920.840022] PUD 17f43f067 [89920.840390] PMD 1e0976067 [89920.840469] PTE 0 [89920.840833] [89920.841123] Oops: [#1] SMP [89920.841417] Modules linked in: ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb dca ptp pps_core uas usb_storage wmi ttm [89920.847080] CPU: 4 PID: 1471 Comm: loop6 Not tainted 4.9.58-29.el6.x86_64 #1 [89920.847381] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1 03/04/2015 [89920.847893] task: 8801b75e0700 task.stack: c900460e [89920.848192] RIP: e030:[] [] __memcpy+0x12/0x20 [89920.848783] RSP: e02b:c900460e3b20 EFLAGS: 00010246 [89920.849081] RAX: 88018916d000 RBX: 8801b75e0700 RCX: 0200 [89920.849384] RDX: RSI: 88020ee9a000 RDI: 88018916d000 [89920.849686] RBP: c900460e3b38 R08: 88011da9fcf8 R09: 0002 [89920.849989] R10: 88019535bddc R11: ea0006245b5c R12: 1000 [89920.850294] R13: 88018916e000 R14: 1000 R15: c900460e3b68 [89920.850605] FS: 7fb865c30700() GS:880204b0() knlGS: [89920.851118] CS: e033 DS: ES: CR0: 80050033 [89920.851418] CR2: 88020ee9a000 CR3: 0001ef03b000 CR4: 00042660 [89920.851720] Stack: [89920.852009] 814375ca c900460e3b38 c900460e3d08 c900460e3bb8 [89920.852821] 814381c5 c900460e3b68 c900460e3d08 1000 [89920.853633] c900460e3d88 1000 ea00 [89920.854445] Call Trace: [89920.854741] [] ? memcpy_from_page+0x3a/0x70 [89920.855043] [] iov_iter_copy_from_user_atomic+0x265/0x290 [89920.855354] [] generic_perform_write+0xf3/0x1d0 [89920.855673] [] ? xen_load_tls+0xaa/0x160 [89920.855992] [] nfs_file_write+0xdb/0x200 [nfs] [89920.856297] [] vfs_iter_write+0xa2/0xf0 [89920.856599] [] lo_write_bvec+0x65/0x100 [89920.856899] [] do_req_filebacked+0x195/0x300 [89920.857202] [] loop_queue_work+0x5b/0x80 [89920.857505] [] kthread_worker_fn+0x98/0x1b0 [89920.857808] [] ? schedule+0x3a/0xa0 [89920.858108] [] ? _raw_spin_unlock_irqrestore+0x16/0x20 [89920.858411] [] ? kthread_probe_data+0x40/0x40 [89920.858713] [] kthread+0xe5/0x100 [89920.859014] [] ? __kthread_init_worker+0x40/0x40 [89920.859317] [] ret_from_fork+0x25/0x30 [89920.859615] Code: 81 f3 00 00 00 00 e9 1e ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 66 90 66 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 f3 [89920.864410] RIP [] __memcpy+0x12/0x20 [89920.864749] RSP [89920.865021] CR2: 88020ee9a000 [89920.865294] ---[ end trace b77d2ce5646284d1 ]--- Wondering if anyone has advice on how to troubleshoot the above, or might have some insight into that the issue could be? This hypervisor was only up for a day, had almost no VMs running on it since boot, I booted a single windows test VM which BSOD'ed and then this happened. 
This is on xen 4.6.6-4.el6 with 4.9.58-29.el6.x86_64. I see these issues across a wide number of systems from both Dell and Supermicro, although we run the same Intel X540 10GbE NICs in each system with the same NetApp NFS backend storage.

Cheers,

Nathan
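Since these crashes take the whole box down, capturing the console is the main way to get the full trace. A rough example of serial console setup for a Xen dom0 (sketch; assumes the first serial port, adjust to your hardware or use IPMI serial-over-LAN):

# Xen (hypervisor) command line:
com1=115200,8n1 console=com1,vga loglvl=all guest_loglvl=all
# dom0 kernel command line:
console=hvc0
# then watch from another machine, e.g. via IPMI SOL (placeholders for BMC address/credentials):
ipmitool -I lanplus -H <bmc-ip> -U <user> -P <pass> sol activate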
Re: [CentOS-virt] Status of reverted Linux patch "tty: Fix ldisc crash on reopened tty", Linux 4.9 kernel frequent crashes
> > I have no issues rolling this patch in, while we wait on upstream, if
> > it makes our tree more stable.
> >
> I think we should do that.. What do others think?

I've had the patch deployed to a group of 32 hosts (with hundreds of VMs) for about 10 days now and no sign of any issues. So I support it =)

Cheers,

Nathan
Re: [CentOS-virt] Major stability problems with xen 4.6.6
> It seems the patch you mentioned was merged to upstream Linux here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=71472fa9c52b1da27663c275d416d8654b905f05
>
> and then reverted/removed here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=896d81fefe5d1919537db2c2150ab6384e4a6610
>
> Do you know if there has been a proper/fixed patch after that? Has it been
> merged to the upstream Linux kernel already?

Interesting! I didn't come across that when digging into this. It looks like this hasn't been followed up on at all since April:
https://lists.gt.net/engine?list=linux;do=search_results;search_type=AND;search_forum=forum_1;search_string=ldisc%20reopened&sb=post_time

Currently I've got ~40 dom0s running with the patch on 4.9.44-39 and it's resolved all stability issues; previously I was seeing multiple crashes a week.

Cheers,

Nathan
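If someone wants to pick up the re-port, the upstream commit can be pulled straight from kernel.org and checked against the newer tree before it goes back into the spec file; a rough sketch (assumes an unpacked 4.9.75 source tree):

curl -o tty-ldisc-reopen.patch 'https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=71472fa9c52b1da27663c275d416d8654b905f05'
cd linux-4.9.75
patch -p1 --dry-run < ../tty-ldisc-reopen.patch   # see whether it still applies cleanly before rebuilding the RPM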
Re: [CentOS-virt] Major stability problems with xen 4.6.6
Just in case anyone else on this list is running into similar issues, I can confirm that the patch appears to have resolved this. I've opened https://bugs.centos.org/view.php?id=13713

It was so bad that having the system under load (with rpmbuild) and opening another ssh window or two would almost always cause the oops.

Cheers,

Nathan

From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Nathan March
Sent: Wednesday, August 23, 2017 3:32 PM
To: 'Discussion about the virtualization on CentOS'
Subject: Re: [CentOS-virt] Major stability problems with xen 4.6.6

This appears to be a CentOS kernel issue rather than a Xen one: https://lkml.org/lkml/2016/5/17/440

Digging through the posts, it's not clear why this never made it upstream. I'm going to apply that patch to my systems and see if it resolves things, but won't know for certain until a week or two of stability goes by.

- Nathan

From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Nathan March
Sent: Wednesday, August 23, 2017 2:48 PM
To: centos-virt@centos.org
Subject: [CentOS-virt] Major stability problems with xen 4.6.6

Hi,

I'm seeing numerous crashes on the xen 4.6.6-1 / 4.6.6-2 releases, on both the 4.9.34-29 and 4.9.39-29 kernels. I've attached a txt with the output from two different servers.

Xen-028: This crashed this morning while running 4.6.6-1 and 4.9.39-29
Xen-001: This crashed shortly after being upgraded to 4.6.6-2 and 4.9.34-29

Both are on different hardware platforms, and have had a long history of being stable until these upgrades. It sounds potentially related to https://kernel.googlesource.com/pub/scm/linux/kernel/git/tiwai/sound-unstable/+/9ce119f318ba1a07c29149301f1544b6c4bea52a%5E%21/ but I've confirmed this patch is in the above kernels.

Any suggestions / thoughts?

Cheers,

Nathan
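For anyone trying to compare before/after the patch, the trigger described above amounts to roughly this (hypothetical helper loop; any sustained dom0 load plus a few interactive ssh sessions seems to do it):

rpmbuild --rebuild some-package.src.rpm &   # keep the dom0 busy
for i in 1 2 3 4 5; do
    ssh dom0-host 'ps auxf; sleep 1' &      # open several short-lived tty sessions in parallel
done
wait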
Re: [CentOS-virt] Major stability problems with xen 4.6.6
This appears to be a CentOS kernel issue rather than a Xen one: https://lkml.org/lkml/2016/5/17/440

Digging through the posts, it's not clear why this never made it upstream. I'm going to apply that patch to my systems and see if it resolves things, but won't know for certain until a week or two of stability goes by.

- Nathan

From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Nathan March
Sent: Wednesday, August 23, 2017 2:48 PM
To: centos-virt@centos.org
Subject: [CentOS-virt] Major stability problems with xen 4.6.6

Hi,

I'm seeing numerous crashes on the xen 4.6.6-1 / 4.6.6-2 releases, on both the 4.9.34-29 and 4.9.39-29 kernels. I've attached a txt with the output from two different servers.

Xen-028: This crashed this morning while running 4.6.6-1 and 4.9.39-29
Xen-001: This crashed shortly after being upgraded to 4.6.6-2 and 4.9.34-29

Both are on different hardware platforms, and have had a long history of being stable until these upgrades. It sounds potentially related to https://kernel.googlesource.com/pub/scm/linux/kernel/git/tiwai/sound-unstable/+/9ce119f318ba1a07c29149301f1544b6c4bea52a%5E%21/ but I've confirmed this patch is in the above kernels.

Any suggestions / thoughts?

Cheers,

Nathan
[CentOS-virt] Major stability problems with xen 4.6.6
Hi, I'm seeing numerous crashes on the xen 4.6.6-1 / 4.6.6-2 releases, on both the 4.9.34-29 and 4.9.39-29 kernels. I've attached a txt with two different servers outputs. Xen-028: This crashed this morning while running 4.6.6-1 and 4.9.39-29 Xen-001: This crashed shortly after being upgraded to 4.6.6-2 and 4.9.34-29 Both are on different hardware platforms, and have had a long history of being stable until these upgrades. It sounds potentially related to https://kernel.googlesource.com/pub/scm/linux/kernel/git/tiwai/sound-unstabl e/+/9ce119f318ba1a07c29149301f1544b6c4bea52a%5E%21/ but I've confirmed this patch is in the above kernels. Any suggestions / thoughts? Cheers, Nathan Aug 23 10:19:31 xen-028 kernel: [590071.735515] BUG: unable to handle kernel paging request at 2260 Aug 23 10:19:31 xen-028 kernel: [590071.735795] IP: [] n_tty_receive_buf_common+0xa4/0x1f0 Aug 23 10:19:31 xen-028 kernel: [590071.736031] PGD 0 Aug 23 10:19:31 xen-028 kernel: [590071.736083] Aug 23 10:19:31 xen-028 kernel: [590071.736300] Oops: [#1] SMP Aug 23 10:19:31 xen-028 kernel: [590071.736470] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd ipmi_devintf ipmi_si ipmi_msghandler gpio_ich iTCO_wdt iTCO_vendor_support fjes acpi_power_meter dcdbas pcspkr serio_raw joydev lpc_ich igb ixgbe dca ptp pps_core mdio i7core_edac edac_core bnx2 raid1 megaraid_sas ttm Aug 23 10:19:31 xen-028 kernel: [590071.740051] CPU: 14 PID: 21615 Comm: kworker/u48:1 Not tainted 4.9.39-29.el6.x86_64 #1 Aug 23 10:19:31 xen-028 kernel: [590071.740330] Hardware name: Dell Inc. PowerEdge R610/0F0XJ6, BIOS 6.0.7 08/18/2011 Aug 23 10:19:31 xen-028 kernel: [590071.740607] Workqueue: events_unbound flush_to_ldisc Aug 23 10:19:31 xen-028 kernel: [590071.740806] task: 88008a6011c0 task.stack: c9004cfec000 Aug 23 10:19:31 xen-028 kernel: [590071.740966] RIP: e030:[] [] n_tty_receive_buf_common+0xa4/0x1f0 Aug 23 10:19:31 xen-028 kernel: [590071.741282] RSP: e02b:c9004cfefb08 EFLAGS: 00010296 Aug 23 10:19:31 xen-028 kernel: [590071.741442] RAX: 2260 RBX: RCX: 000a Aug 23 10:19:31 xen-028 kernel: [590071.741714] RDX: RSI: 88015ecd6420 RDI: 8800afd654d8 Aug 23 10:19:31 xen-028 kernel: [590071.741994] RBP: c9004cfefb78 R08: 0001 R09: 81f0af00 Aug 23 10:19:31 xen-028 kernel: [590071.742274] R10: 7ff0 R11: 0078 R12: 000a Aug 23 10:19:31 xen-028 kernel: [590071.742549] R13: 8800afd65400 R14: R15: 88015ecd6420 Aug 23 10:19:31 xen-028 kernel: [590071.742830] FS: 7f81da7317c0() GS:8801c098() knlGS: Aug 23 10:19:31 xen-028 kernel: [590071.743112] CS: e033 DS: ES: CR0: 80050033 Aug 23 10:19:31 xen-028 kernel: [590071.743283] CR2: 2260 CR3: 8f61f000 CR4: 2660 Aug 23 10:19:31 xen-028 kernel: [590071.743564] Stack: Aug 23 10:19:31 xen-028 kernel: [590071.743719] c900116c 8800afd654d8 0001c070 Aug 23 10:19:31 xen-028 kernel: [590071.744149] 2260 8a603340 8801c0997000 Aug 23 10:19:31 xen-028 kernel: [590071.744577] 8801c098b890 88015ecd6400 8800b19e9c00 c9004cfefbf8 Aug 23 10:19:31 xen-028 kernel: [590071.745008] Call Trace: Aug 23 10:19:31 xen-028 kernel: [590071.745169] [] n_tty_receive_buf2+0x14/0x20 Aug 23 10:19:31 xen-028 kernel: [590071.745335] [] tty_ldisc_receive_buf+0x23/0x50 Aug 23 10:19:31 xen-028 kernel: [590071.745501] [] flush_to_ldisc+0xc8/0x100 Aug 23 10:19:31 xen-028 kernel: [590071.745669] [] ? 
__switch_to+0x1dc/0x680 Aug 23 10:19:31 xen-028 kernel: [590071.745836] [] process_one_work+0x170/0x500 Aug 23 10:19:31 xen-028 kernel: [590071.746005] [] ? __schedule+0x238/0x530 Aug 23 10:19:31 xen-028 kernel: [590071.746169] [] ? maybe_create_worker+0x94/0x120 Aug 23 10:19:31 xen-028 kernel: [590071.746342] [] ? schedule+0x3a/0xa0 Aug 23 10:19:31 xen-028 kernel: [590071.746506] [] worker_thread+0x166/0x580 Aug 23 10:19:31 xen-028 kernel: [590071.746671] [] ? __schedule+0x238/0x530 Aug 23 10:19:31 xen-028 kernel: [590071.749537] [] ? default_wake_function+0x12/0x20 Aug 23 10:19:31 xen-028 kernel: [590071.749706] [] ? maybe_create_worker+0x120/0x120 Aug 23 10:19:31 xen-028 kernel: [590071.749872] [] ? schedule+0x3a/0xa0 Aug 23 10:19:31 xen-028 kernel: [590071.750040] [] ? _raw_spin_unlock_irqrestore+0x16/0x20 Aug 23 10:19:31 xen-028 kernel: [590071.750204] [] ? maybe_create_worker+0x120/0x120 Aug 23 10:19:31 xen-028 kernel: [590071.750369] [] kthread+0xe5/0x100 Aug 23 10:19:31 xen-028 kernel:
[CentOS-virt] Xen packages with XSA-226+?
Hi,

It's been almost a week now since XSA-226 through XSA-230 were released, and I'm just wondering when updated packages are expected to be posted? https://cbs.centos.org/koji/packageinfo?packageID=88 has nothing for the past month.

Thanks!

- Nathan
Re: [CentOS-virt] Xen Doc Day: Guide to setting up bridging on CentOS 6 / 7
If you'd like to extend that a little bit, here are example configs on how to do LACP and vlan tagging on C6:

host network-scripts # cat ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
IPV6INIT=no
MTU=1500
MASTER=bond0
SLAVE=yes

host network-scripts # cat ifcfg-eth1
DEVICE=eth1
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
IPV6INIT=no
MTU=1500
MASTER=bond0
SLAVE=yes

host network-scripts # cat ifcfg-bond0
DEVICE=bond0
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
IPV6INIT=no
BONDING_OPTS="miimon=100 mode=802.3ad"

host network-scripts # cat ifcfg-vlan###
DEVICE=vlan###
ONBOOT=yes
USERCTL=no
BOOTPROTO=none
IPV6INIT=no
PHYSDEV=bond0
VLAN=yes
VLAN_NAME_TYPE=VLAN_PLUS_VID_NO_PAD
IPADDR=10.x.x.x
NETMASK=255.255.255.0
GATEWAY=10.x.x.x
DOMAIN="example.com"
DNS1=x.x.x.x
DNS2=x.x.x.x
DNS3=x.x.x.x

I use my own bridging control scripts, but this should extend your existing doc nicely just by using BRIDGE=xenbr0 in the ifcfg-vlan### file (a sketch of that is below). Also, there's a small error in your C6 doc: you specify ifcfg-$dev but $dev never gets set anywhere.

- Nathan

> -----Original Message-----
> From: centos-virt-boun...@centos.org [mailto:centos-virt-boun...@centos.org] On Behalf Of George Dunlap
> Sent: Wednesday, October 28, 2015 10:02 AM
> To: Discussion about the virtualization on CentOS
> Subject: [CentOS-virt] Xen Doc Day: Guide to setting up bridging on CentOS 6 / 7
>
> In honor of Xen Doc Day, I've put up some basic HOWTOs for setting up
> bridging on CentOS 6 and 7. I'm far from an expert, so I'd appreciate any
> feedback.
>
> The howtos can be found here:
>
> https://wiki.centos.org/HowTos/Xen/Xen4QuickStart/Xen4Networking6
>
> https://wiki.centos.org/HowTos/Xen/Xen4QuickStart/Xen4Networking7
>
> -George
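To tie that into the HOWTO, the only extra piece would be pointing the VLAN interface at a bridge and moving the IP onto the bridge; an untested sketch (xenbr0 and the addresses are placeholders):

host network-scripts # cat ifcfg-vlan###
DEVICE=vlan###
ONBOOT=yes
BOOTPROTO=none
PHYSDEV=bond0
VLAN=yes
BRIDGE=xenbr0

host network-scripts # cat ifcfg-xenbr0
DEVICE=xenbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
DELAY=0
IPADDR=10.x.x.x
NETMASK=255.255.255.0
GATEWAY=10.x.x.x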
[CentOS-virt] Kernel oops on the dom0
Hi All, One of our developers managed to trigger a kernel oops on a 4.4.2 dom0.. Oops text is attached. He was working on setting up network namespaces / bridging inside a centos domU, had activated the bridge and lost networking (probably config error) so he rebooted the VM. On reboot is when we saw the oops, along with various xen procs hanging: root 23388 0.0 1.6 132796 32264 ?SLsl Oct02 0:04 /usr/sbin/xl create /mnt/xen/gx/xen/metrixc7 root 26119 0.0 0.0 0 0 ?ZOct02 0:00 \_ [block] root 26127 0.0 0.0 0 0 ?ZOct02 0:00 \_ [block] root 26137 0.0 0.0 0 0 ?ZOct02 0:00 \_ [block] root 26157 0.0 0.0 0 0 ?ZOct02 0:00 \_ [block] root 26169 0.0 0.0 0 0 ?ZOct02 0:00 \_ [block] root 26195 0.0 0.0 0 0 ?ZOct02 0:00 \_ [vif-bridge] root 24625 0.0 0.0 0 0 ?Ds Oct02 0:06 [tapdisk] At this point the dom0 is still up and running existing VMs fine, I can also migrate live VMs off of it successfully although the post-migration clean up fails and hangs: libxl: error: libxl_device.c:935:device_backend_callback: unable to remove device with path /local/domain/0/backend/vbd/17/51712 Host is running centos 7 with the 4.4.2-7 package and kernel 3.10.68-11.el6.centos.alt.x86_64. I've also attached xl dmesg. First time I've seen anything like this and not sure if his networking/bridging in the domU is related or just coincidental. Any thoughts / ideas? Going to try to reproduce on a test dom0 later this week, so happy to grab any additional debugging if required. - Nathan Oct 2 18:27:02 vana-031 kernel: BUG: unable to handle kernel paging request at 88006564b000 Oct 2 18:27:02 vana-031 kernel: IP: [] memcpy+0x6/0x110 Oct 2 18:27:02 vana-031 kernel: PGD 1c0d067 PUD 104e0c067 PMD 104ce0067 PTE 0 Oct 2 18:27:02 vana-031 kernel: Oops: 0002 [#1] SMP Oct 2 18:27:02 vana-031 kernel: Modules linked in: ebt_ip6 tun ebt_ip ebtable_filter ebtables arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc 8021q garp stp llc bond ing ipv6 xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd iTCO_wdt iTCO_vendor_support dcdbas coretemp freq_table mperf crc32_pclmul crc32c_intel ghash_clmulni_intel microcode pcspkr ses enclosure sg ipmi_devintf ipmi_si ipmi_msghandler joydev i2c_i801 lpc_ich shpchp ixgbe mdio igb hwmon ptp pps_core ioatdma dca ext4 jbd2 mbcache raid1 sd_mod crc_t10dif aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 mpt2sas sc si_transport_sas raid_class ahci libahci wmi ttm drm_kms_helper dm_mirror dm_region_hash dm_log dm_mod Oct 2 18:27:02 vana-031 kernel: CPU: 18 PID: 24625 Comm: tapdisk Not tainted 3.10.68-11.el6.centos.alt.x86_64 #1 Oct 2 18:27:02 vana-031 kernel: Hardware name: Dell Inc. 
PowerEdge C6220 II/09N44V, BIOS 2.6.0 10/09/2014 Oct 2 18:27:02 vana-031 kernel: task: 880004ed2780 ti: 880024266000 task.ti: 880024266000 Oct 2 18:27:02 vana-031 kernel: RIP: e030:[] [] memcpy+0x6/0x110 Oct 2 18:27:02 vana-031 kernel: RSP: e02b:880024267c90 EFLAGS: 00010202 Oct 2 18:27:02 vana-031 kernel: RAX: 88006564b000 RBX: 88004104f830 RCX: 1000 Oct 2 18:27:02 vana-031 kernel: RDX: 1000 RSI: 88007c0ca000 RDI: 88006564b000 Oct 2 18:27:02 vana-031 kernel: RBP: 880024267cd8 R08: R09: Oct 2 18:27:02 vana-031 kernel: R10: R11: 880024267eb8 R12: 0001 Oct 2 18:27:02 vana-031 kernel: R13: 8800 R14: 6db6db6db6db6db7 R15: 1600 Oct 2 18:27:02 vana-031 kernel: FS: 7f62c755e740() GS:88010144() knlGS:8801014e Oct 2 18:27:02 vana-031 kernel: CS: e033 DS: ES: CR0: 80050033 Oct 2 18:27:02 vana-031 kernel: CR2: 88006564b000 CR3: 3533b000 CR4: 00042660 Oct 2 18:27:02 vana-031 kernel: DR0: DR1: DR2: Oct 2 18:27:02 vana-031 kernel: DR3: DR6: 0ff0 DR7: 0400 Oct 2 18:27:02 vana-031 kernel: Stack: Oct 2 18:27:02 vana-031 kernel: a030f42a 664b4800 880024267cc8 Oct 2 18:27:02 vana-031 kernel: 88004104f830 8800664b4800 8800664b4800 Oct 2 18:27:02 vana-031 kernel: 8800664b4820 880024267cf8 a030e1f7 88004104f830 Oct 2 18:27:02 vana-031 kernel: Call Trace: Oct 2 18:27:02 vana-031 kernel: [] ? blktap_request_bounce+0xda/0x100 [blktap] Oct 2 18:27:02 vana-031 kernel: [] blktap_ring_unmap_request+0x67/0x90 [blktap] Oct 2 18:27:02 vana-031 kernel: [] blktap_device_end_request+0x32/0x90 [blktap] Oct 2 18:27:02 vana-031 k
Re: [CentOS-virt] Timezone issues with migrations between host kernel 3.10 and 3.18
> -----Original Message-----
> From: centos-virt-boun...@centos.org [mailto:centos-virt-boun...@centos.org] On Behalf Of Johnny Hughes
> Sent: Thursday, July 30, 2015 4:41 AM
> To: centos-virt@centos.org
> Subject: Re: [CentOS-virt] Timezone issues with migrations between host kernel 3.10 and 3.18
>
> On 07/30/2015 06:38 AM, Johnny Hughes wrote:
> > On 07/29/2015 11:38 AM, Nathan March wrote:
> >> Hi All,
> >>
> >> I'm seeing clock issues with live migrations on the latest kernel
> >> packages, migrating a VM from 3.10.68-11 to 3.18.17-13 results in the
> >> VM clock being off by 7 hours (I'm PST, so appears to be a timezone issue).
> >> This is also between xen versions, but rolling the target back to
> >> 3.10 resolved so don't believe the recent XSA's are related.
> >>
> >> Anyone else seen behavior like this or have any ideas on how to resolve?
> >
> > Some versions of CentOS have a "hardware clock uses UTC" check box. If
> > that is on, and if your hardware clock is instead set to local time,
> > that can cause issues.
> >
> > Can you check that there is no UTC=True in /etc/sysconfig/clock
> > also, you can use tzselect to make sure the correct timezone is used.

I considered that, so I did use hwclock to confirm that the hardware clock / system clock were both the same between the two servers, and comparing /etc/adjtime between the two indicates the hwclock should have been local time on both sides (unless this got changed when I downgraded the kernel to resolve the issue). Unfortunately I don't have a test machine for this at the moment, but I can follow up in a couple weeks.

- Nathan
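For completeness, the checks being described boil down to something like this on both the source and destination dom0 (sketch):

hwclock --show                      # hardware clock
date                                # system clock
cat /etc/adjtime                    # third line should say LOCAL or UTC
grep -i utc /etc/sysconfig/clock    # CentOS 6-style setting, if the file is present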
[CentOS-virt] Timezone issues with migrations between host kernel 3.10 and 3.18
Hi All,

I'm seeing clock issues with live migrations on the latest kernel packages: migrating a VM from 3.10.68-11 to 3.18.17-13 results in the VM clock being off by 7 hours (I'm PST, so it appears to be a timezone issue). This is also between xen versions, but rolling the target back to 3.10 resolved it, so I don't believe the recent XSAs are related.

Anyone else seen behavior like this or have any ideas on how to resolve it?

- Nathan
Re: [CentOS-virt] CentOS Images on AWS with partitions on /dev/xvda1 are awkwared to resize
> > So you're working from the command line tools in the EPEL 'cloud-init'
> > package, not the AWS GUI? Because when I tried expanding the size of
> > the base disk image in the GUI, I wound up with an 8 Gig default
> > /dev/xvda1 on a 20 Gig /dev/xvda. That's why I was looking at "how do
> > I resize this thing safely?"

No experience with Amazon here, but I routinely resize filesystems online without issues. Repartition xvda so that xvda1 is the size you want (make sure you use the same start sector, just change the end of the partition). Run partprobe and confirm that fdisk -l /dev/xvda1 shows the new size. (You may need to reboot.) After that, just run resize2fs /dev/xvda1 (works online).

- Nathan
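Spelled out, that procedure looks roughly like this (sketch; assumes xvda1 is the last partition and you keep the same start sector):

fdisk /dev/xvda        # d, then n with the same start sector and a larger end, then w
partprobe /dev/xvda    # ask the kernel to re-read the partition table (may require a reboot if busy)
resize2fs /dev/xvda1   # grow the ext3/ext4 filesystem online to fill the partition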
Re: [CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
Hi All,

I've tracked this down... We do rate limiting of our VMs with a mix of ebtables/tc. Running these commands (replace vif1.0 with the correct vif for your VM) will reproduce this:

ebtables -A FORWARD -i vif1.0 -j mark --set-mark 990 --mark-target CONTINUE
tc qdisc add dev bond0 root handle 1: htb default 2
tc class add dev bond0 parent 1: classid 1:0 htb rate 1mbit
tc class add dev bond0 parent 1: classid 1:990 htb rate 1mbit
tc filter add dev bond0 protocol ip parent 1:0 prio 990 handle 990 fw flowid 1:990

Note that the speed limits being applied here are 10gb and I'm testing this on a 1gb network, so TC shouldn't really be doing anything here except letting the packets through. These same commands worked fine on Gentoo xen 4.1 / kernel 3.2.57, compared to this now not working on CentOS xen 4.4.1 / kernel 3.10.68.

The easiest way to reproduce is to simply generate a large file, scp it to a remote host, and on the remote host run:

tshark -Y "tcp.analysis.duplicate_ack_num"

If you run the ssh in a loop + tshark in another window, you can see the Dup ACKs begin immediately after adding the last filter rule:

25790294 1752.756733 xxx.xxx.xxx.13 -> xxx.xxx.xxx.205 TCP 78 [TCP Dup ACK 25790286#4] ssh > 51515 [ACK] Seq=15994 Ack=50769840 Win=1544704 Len=0 TSval=738150929 TSecr=4294944346 SLE=50785768 SRE=50790596
25790296 1752.756742 xxx.xxx.xxx.13 -> xxx.xxx.xxx.205 TCP 78 [TCP Dup ACK 25790286#5] ssh > 51515 [ACK] Seq=15994 Ack=50769840 Win=1544704 Len=0 TSval=738150929 TSecr=4294944346 SLE=50785768 SRE=50792044

- Nathan
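Spelled out, the reproduction described above is roughly the following (sketch; remote-host is a placeholder):

# on the domU: generate a test file and push it out repeatedly
dd if=/dev/urandom of=/tmp/bigfile bs=1M count=250
while true; do scp /tmp/bigfile remote-host:/tmp/; done

# on the remote host: watch for duplicate ACKs
tshark -Y "tcp.analysis.duplicate_ack_num"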
Re: [CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
So I might have been misinterpreting things here and might be way off base. I think you can ignore this thread, and I'll follow up if I get anything concrete down the road =)

The retransmissions I'm seeing and reproducing are probably within normal allowances, and I can't reproduce the issue that originally led me down this path.

- Nathan

> -----Original Message-----
> From: centos-virt-boun...@centos.org [mailto:centos-virt-boun...@centos.org] On Behalf Of Nathan March
> Sent: Wednesday, April 15, 2015 1:13 PM
> To: 'Discussion about the virtualization on CentOS'
> Subject: Re: [CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
>
> Hi All,
>
> Some more data on this, I've reproduced this on another host that's a
> completely stock CentOS/Xen deployment with a CentOS 6.6 domU.
>
> Since I'm seeing the retransmissions on the VIF, I don't think it's related to
> the network stack, but just in case.. Each host is connected via LACP with vlan
> tagging to a pair of stacked Cisco 3750's. Host networking config is here:
>
> http://dpaste.com/1Q6NY3Y
>
> The VM is on br99 here.
>
> This is easily reproducible by just generating a 250mb random file and doing
> an scp, while watching with tshark:
>
> tshark -R "tcp.analysis.retransmission"
>
> There's no visible impact to the connection the vast majority of the time,
> which is why I think this has gone unnoticed.
>
> Just to confirm this wasn't related to hardware / NICs, I've reproduced this on:
>
> - Dell PowerEdge M620 with Broadcom NICs
> - Dell C6220 with Intel NICs
> - Supermicro X8DTT with Intel NICs
>
> Any ideas? =)
>
> - Nathan
Re: [CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
Hi All,

Some more data on this: I've reproduced this on another host that's a completely stock CentOS/Xen deployment with a CentOS 6.6 domU.

Since I'm seeing the retransmissions on the VIF, I don't think it's related to the network stack, but just in case.. Each host is connected via LACP with vlan tagging to a pair of stacked Cisco 3750's. Host networking config is here:

http://dpaste.com/1Q6NY3Y

The VM is on br99 here.

This is easily reproducible by just generating a 250mb random file and doing an scp, while watching with tshark:

tshark -R "tcp.analysis.retransmission"

There's no visible impact to the connection the vast majority of the time, which is why I think this has gone unnoticed.

Just to confirm this wasn't related to hardware / NICs, I've reproduced this on:

- Dell PowerEdge M620 with Broadcom NICs
- Dell C6220 with Intel NICs
- Supermicro X8DTT with Intel NICs

Any ideas? =)

- Nathan
[CentOS-virt] Seeing dropped packets / tcp retrans on latest 4.4.1-10el6
Hi All,

I was troubleshooting some odd VM network issues and discovered that we're seeing dropped packets + retransmissions across multiple domU OSes and dom0 hardware platforms.

xendev01 ~ # tshark -R "tcp.analysis.retransmission " -i vif7.0
Running as user "root" and group "root". This could be dangerous.
Capturing on vif7.0
3.054257 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 110 [TCP Fast Retransmission] Encrypted response packet len=44
3.061949 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368
3.383880 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368
3.630911 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368
3.635964 xxx.xxx.xxx.196 -> xxx.xxx.xxx.145 SSH 1434 [TCP Fast Retransmission] Encrypted response packet len=1368

I've confirmed this is happening with Linux, Windows and pfSense (BSD) domUs. I've turned off every feature I can with ethtool on the underlying bridge on the host, the vifs, and the eths inside the domUs. I also see it on traffic between VMs on the same host.

The domU sees packet errors on incoming traffic while outgoing looks fine; dumping on the dom0 indicates incoming packets are fine, but the reply from the domU is broken. This does not happen running the exact same VMs on some older xen 4.1.3 hosts.

Reproduction is easy (for me at least): any burst of traffic will do it. I've just been running "ps auxf" over ssh to a VM to trigger it. Since I'm seeing it on the host when I sniff the vif, this feels like a bug?

- Nathan
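For anyone wanting to rule out offloads the same way, the ethtool toggling mentioned above looks roughly like this (sketch; vif7.0 and br99 are this host's device names, and not every feature exists on every device):

ethtool -k vif7.0                                               # show current offload settings
ethtool -K vif7.0 tx off rx off sg off tso off gso off gro off  # disable checksum/segmentation offloads on the vif
ethtool -K br99 sg off tso off gso off gro off                  # and on the bridge, where supported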
Re: [CentOS-virt] Can't block-attach a file on a read only volume?
> http://cbs.centos.org/kojifiles/work/tasks/8801/8801/
>
> If you could test those and let me know if it fixes your problem, I'd
> appreciate it. :-)

Confirmed, both issues are fixed. Thanks! Any plans to push those packages to the main mirrors?

- Nathan
[CentOS-virt] Can't block-attach a file on a read only volume?
Hi All,

One more weird issue, this works on old xen but fails on 4.4:

xendev01 ~ # mkdir /mnt/test
xendev01 ~ # mount -t tmpfs - /mnt/test
xendev01 ~ # dd if=/dev/null of=/mnt/test/disk seek=100M bs=1
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000201809 s, 0.0 kB/s
xendev01 ~ # /usr/sbin/xl block-attach nathannx "file:/mnt/test/disk" "xvdd4"
DEBUG libxl__blktap_devpath 37 aio:/mnt/test/disk
DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev20
xendev01 ~ # xl block-detach nathannx 51764
DEBUG libxl__device_destroy_tapdisk 66 type=aio:/mnt/test/disk disk=:/mnt/test/disk
xendev01 ~ # mount -o remount,ro /mnt/test
xendev01 ~ # /usr/sbin/xl block-attach nathannx "file:/mnt/test/disk" "xvdd4"
DEBUG libxl__blktap_devpath 37 aio:/mnt/test/disk
libxl: error: libxl.c:2149:device_disk_add: failed to get blktap devpath for 0xd3abd0
libxl: error: libxl.c:1727:device_addrm_aocomplete: unable to (null) device
libxl_device_disk_add failed.

I'm not sure why xen would care if the disk is writable? Would be nice to be able to mount these since many NFS storage arrays provide read-only access to snapshots.

- Nathan
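One possible workaround until blktap copes with read-only files, sketched but untested: hand libxl a read-only loop device instead of the file, so it uses the phy backend rather than tapdisk:

losetup -r /dev/loop0 /mnt/test/disk            # -r sets up the loop device read-only
xl block-attach nathannx 'phy:/dev/loop0,xvdd4,r'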
[CentOS-virt] Tapdisk processes being left behind when hvm domu's migrate/shutdown
Hi All, I'm seeing tapdisk processes not being terminated after a HVM vm is shutdown or migrated away. I don't see this problem with linux paravirt domu's, just windows hvm ones. xl.cfg: name = 'nathanwin' memory = 4096 vcpus = 2 disk = [ 'file:/mnt/gtc_disk_p1/nathanwin/drive_c,hda,w' ] vif = [ 'mac=00:16:3D:01:03:E0,bridge=vlan208' ] builder = "hvm" kernel = "/usr/lib/xen/boot/hvmloader" localtime = 0 on_poweroff = "destroy" on_reboot = "restart" on_crash = "destroy" vnc = 1 vncunused = 1 cpuid = [ '0:eax=1011', '1:eax=001001101110,ecx=101110111010001000100011,edx=0001000010111011', '2:eax=01010101001101011011', '7,0:eax=,ebx=,ecx=,edx=', '13,1:eax=xxx0', '10:ebx=', '11:edx=', '2147483650:eax=01100101011101000110111001001001,ebx=0010100101010010001011101100,ecx=01100110010101011010,edx=0010100101010010001011101110', '2147483651:eax=01010101010101110010,ebx=0010001000100010,ecx=0010001000100010,edx=0100111000100010', '2147483652:eax=001100110111011000110101,ebx=001001100010,ecx=00110111001100100010111000110010,edx=0010010011000111', '2147483656:eax=001100101000', ] Starting with the VM running initially on another host, I migrate it in: migration target: Ready to receive domain. Saving to migration stream new xl format (info 0x0/0x0/1450) Loading new save file (new xl fmt info 0x0/0x0/1450) Savefile contains xl domain config WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware xc: progress: Reloading memory pages: 56320/11141935% xc: progress: Reloading memory pages: 1003520/1114193 90% DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev0 DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/nathanwin/drive_c DEBUG libxl__blktap_devpath 40 /dev/xen/blktap-2/tapdev2 migration target: Transfer complete, requesting permission to start domain. migration sender: Target has acknowledged transfer. migration sender: Giving target permission to start. migration target: Got permission, starting domain. migration target: Domain started successsfully. migration sender: Target reports successful startup. DEBUG libxl__device_destroy_tapdisk 66 type=aio:/mnt/gtc_disk_p1/nathanwin/drive_c disk=:/mnt/gtc_disk_p1/nathanwin/drive_c Migration successful. 
and now I have 2 tapdisk procs: gtc-vana-005 ~ # ps auxf | grep tapdisk root 32491 0.1 0.2 20364 4636 ?SLs 11:06 0:00 tapdisk root 32520 0.0 0.2 20364 4636 ?SLs 11:06 0:00 tapdisk Which seems odd given that the VM in question only has a single disk attached to it and the qemu proc indicates it's using tapdev2: root 32524 0.4 0.7 323208 15040 ?SLsl 11:06 0:00 /usr/lib/xen/bin/qemu-system-i386 -xen-domid 3 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-3,server,nowait -mon chardev=libxl-cmd,mode=control -nodefaults -name nathanwin--incoming -vnc 127.0.0.1:0,to=99 -device cirrus-vga -global vga.vram_size_mb=8 -boot order=cda -smp 2,maxcpus=2 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3d:01:03:e0 -netdev type=tap,id=net0,ifname=vif3.0-emu,script=no,downscript=no -incoming fd:13 -machine xenfv -m 4088 -drive file=/dev/xen/blktap-2/tapdev2,if=ide,index=0,media=disk,format=raw,cache=writeback gtc-vana-005 ~ # lsof -p 32520 | grep blktap-2 tapdisk 32520 root memCHR 246,2 886671 /dev/xen/blktap-2/blktap2 tapdisk 32520 root 19u CHR 246,2 0t0 886671 /dev/xen/blktap-2/blktap2 gtc-vana-005 ~ # lsof -p 32491 | grep blktap-2 tapdisk 32491 root memCHR 246,0 903999 /dev/xen/blktap-2/blktap0 tapdisk 32491 root 14u CHR 246,0 0t0 903999 /dev/xen/blktap-2/blktap0 I then migrate this VM off to another host: migration target: Ready to receive domain. Saving to migration stream new xl format (info 0x0/0x0/1450) Loading new save file (new xl fmt info 0x0/0x0/1450) Savefile contains xl domain config WARNING: ignoring "kernel" directive for HVM guest. Use "firmware_override" instead if you really want a non-default firmware xc: progress: Reloading memory pages: 56320/11141935% xc: progress: Reloading memory pages: 1003520/1114193 90% DEBUG libxl__blktap_devpath 37 aio:/mnt/gtc_disk_p1/na
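Until the root cause is found, leftover tapdisk instances can usually be inspected and torn down by hand with tap-ctl; a rough sketch (match the pid/minor against what tap-ctl actually reports on your host, not the values shown here):

tap-ctl list                    # shows pid, minor, state and backing file of each running tapdisk
tap-ctl destroy -p 32491 -m 0   # tear down the orphaned instance, using the pid/minor from the list output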
Re: [CentOS-virt] Masking CPU flags via libvirt xml not working?
On 8/26/2014 4:52 PM, Nathan March wrote:
> Has anyone here managed to get cpu masking working via libvirt? Intention
> to enable VM migrations between hosts of a different CPU generation.

To add to this, I've tried using the boot options to set the cpu mask instead:

xen_commandline: dom0_mem=2048M,max:2048M loglvl=all guest_loglvl=all cpuid_mask_ecx=0x009ee3fd cpuid_mask_edx=0xbfebfbff

Unfortunately, still no luck. There are no errors in xm dmesg to indicate the settings were / weren't applied; it simply doesn't seem to do anything.

- Nathan
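A quick way to see whether the masks are doing anything at all is to diff the visible feature flags between the two generations of host (or between dom0 and a guest); sketch, hostA/hostB are placeholders:

diff <(ssh hostA "grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort") \
     <(ssh hostB "grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort")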
[CentOS-virt] Masking CPU flags via libvirt xml not working?
Hi,

Has anyone here managed to get cpu masking working via libvirt? The intention is to enable VM migrations between hosts of a different CPU generation.

Inside my XML I'm providing the model (arch x86_64, model Westmere) as well as a list of features to specifically disable, but none of it seems to take any effect. On booting the VM I still see the disabled flags in /proc/cpuinfo.

Doing a dumpxml against the domU once it's booted leaves out the entire cpu section, leading me to think maybe libvirt is dropping it for some reason. I've got the above just in the main section.

Anyone have this working, or able to offer some suggestions?

Thanks!

- Nathan
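For comparison, the kind of cpu element being described is roughly the following (illustrative only; the feature names are examples, and whether the libvirt libxl driver honors this at all is exactly the open question here):

<cpu match='exact'>
  <model>Westmere</model>
  <feature policy='disable' name='aes'/>
  <feature policy='disable' name='pclmuldq'/>
</cpu>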