Re: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

2017-01-26 Thread Kevin Stange
On 01/26/2017 02:08 PM, Kevin Stange wrote:
> On 01/26/2017 09:35 AM, Johnny Hughes wrote:
>> On 01/26/2017 09:32 AM, Johnny Hughes wrote:
>>> On 01/25/2017 11:49 AM, Kevin Stange wrote:
>>>> On 01/24/2017 11:16 AM, Kevin Stange wrote:
>>>>> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>>>>>> Kevin Stange,
>>>>>>> It could be either the kernel, or you may need to update the NIC
>>>>>>> driver or the firmware of the NIC. Hope that helps!
>>>>>>>
>>>>>>> Xlord
>>>>>>> -----Original Message-----
>>>>>>> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Kevin Stange
>>>>>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>>>>>> To: centos-virt@centos.org
>>>>>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>>>>>> Linux 3.18
>>>>>>>
>>>>>
>>>>>>>
>>>>>>> Has anyone experienced similar issues with this configuration, and if so,
>>>>>>> does anyone have tips on how to resolve the issues?
>>>>>>
>>>>>> Honestly I would email Intel and see if they can help. This looks like
>>>>>> the NIC decides something is wrong, throws off a PCIe error and
>>>>>> then resets itself.
>>>>>
>>>>> This happens for several different NICs.  Is there a good contact at
>>>>> Intel for this kind of thing, or should I just try to reach them through
>>>>> their web site?
>>>>>
>>>>>> It could also be an error in the Linux stack which would "eat" an
>>>>>> interrupt when migrating interrupts (which was fixed
>>>>>> upstream, see below). Are you running irqbalance? Could you try
>>>>>> turning it off?
>>>>>
>>>>> irqbalance is enabled on these servers.  I'll try disabling it.
>>>>
>>>> I had stopped irqbalance yesterday afternoon, but had a hypervisor's
>>>> NICs fail anyway early this morning, so I'm pretty sure this
>>>> is not the right tree to bark up.
>>>>
>>>
>>> Here is a set of drivers/firmware from Intel for those NICs:
>>>
>>> https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-
>>>
>>> I will see if I can get a CentOS-6 build of the latest version of that
>>> from our older SRPM:
>>>
>>> http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm
>>>
>>> I am currently very busy with several c5, c6, c7 updates and the i686
>>> altarch c7 tree .. but I have this on my list.  In the meantime, maybe
>>> someone else could also see if those drivers help you (or you could try
>>> to compile / install it).
>>>
>>> Do you have another machine that you can use to see if you can duplicate
>>> the issue NOT running the xen.gz hypervisor boot, but just the straight
>>> kernel?
> 
> I can't actually reproduce this problem reliably.  It happens randomly
> when the servers are up and running anywhere between a few hours and a
> month or more, and I haven't been able to isolate any specific way to
> cause it to happen.  As a result I can't really test different solutions
> on different servers to see what helps.  I was hoping other people were
> seeing it so that I could get some direction.  If I can reproduce it, it
> won't take me very long to identify what the cause is.  Right now if I
> do upgrade the drivers on the systems I won't really know if it's fixed
> until I don't see another issue for several months.
> 
>> Actually .. I think this is the driver for you:
>>
>> https://downloadcenter.intel.com/download/13663
>>
>> And this explains how to make it work:
>>
>> http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/05767.html
> 
> The different combinations of NICs overlap both the e1000e and igb
> drivers, but the most egregious issues have been with the igb ones.
> I'll try to give this a shot and report back if I still see issues with
> a server after doing so, but it might be a week or two before I find out.

So the NICs giving issues were, in most cases, ones using the igb driver.  I've tried
replacing the drivers on some HVs with the version you suggested, but it
doesn't seem to have helped with stability.  Any other ideas?
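
On the irqbalance angle from earlier in the thread: a minimal sketch, in Python, of how one might check which CPUs a NIC's interrupts are landing on by parsing /proc/interrupts. This is not something anyone in the thread ran. The function name irq_counts, the interface name eth0 and the sample text are all made up for illustration, and the parser assumes the usual layout (a header row of CPU names, then one row per IRQ with per-CPU counts followed by a description).

```python
def irq_counts(interrupts_text, name_fragment):
    """Return {irq_label: [count_per_cpu, ...]} for rows whose description
    contains name_fragment. Assumes the usual /proc/interrupts layout."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an IRQ row
        fields = rest.split()
        counts = fields[:ncpus]
        description = " ".join(fields[ncpus:])
        if name_fragment not in description:
            continue
        # Skip rows that do not carry one numeric count per CPU.
        if len(counts) == ncpus and all(c.isdigit() for c in counts):
            result[label.strip()] = [int(c) for c in counts]
    return result


# Made-up sample text, only to show the expected shape of the input.
SAMPLE = """\
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
 72:       1500         25   PCI-MSI-edge      eth0-TxRx-0
 73:          0       3100   PCI-MSI-edge      eth0-TxRx-1
NMI:          0          0   Non-maskable interrupts
"""

if __name__ == "__main__":
    for irq, counts in irq_counts(SAMPLE, "eth0").items():
        print(irq, counts)
```

On a live system the text would come from reading /proc/interrupts instead of SAMPLE. Comparing two snapshots taken a few minutes apart shows whether the counts for a queue are still moving between CPUs after irqbalance has been stopped.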

-- 
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
ke...@steadfast.net | www.steadfast.net
___
CentOS-virt mailing list
CentOS-virt@centos.org
https://lists.centos.org/mailman/listinfo/centos-virt


Re: [CentOS-virt] qemu-kvm-ev-2.6.0-28.el7_3.3.1 tagged for testing

2017-01-26 Thread Lamar Owen

On 01/26/2017 04:30 PM, Lamar Owen wrote:
> On 01/26/2017 12:14 PM, Johnny Hughes wrote:
>> The testing RPMs are not signed .. they are straight from CBS.  Does the
>> testing repo not have 'gpgcheck=0'?
>
> Ok, thanks.  Given the level of system interaction that qemu/kvm has,
> it would be an ideal vector for malware, and package signing prevents
> this.
>
> My copy of the repo file has the following:
>
> +++
> [centos-qemu-ev-test]
> name=CentOS-$releasever - QEMU EV Testing
> baseurl=http://buildlogs.centos.org/centos/$releasever/virt/$basearch/kvm-common/
> gpgcheck=1
> enabled=0
> gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization

The update pulled in a new .repo file as part of the release package,
and this stanza now shows gpgcheck=0.
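
For anyone who wants to check what their own copy of the stanza says, here is a minimal sketch in Python using the standard-library configparser. The function name gpgcheck_settings is made up for illustration, and the embedded text is just the stanza quoted above rather than a complete .repo file. Interpolation is switched off so that yum variables such as $releasever are left alone.

```python
import configparser

# The stanza as quoted earlier in this message (old copy, gpgcheck=1).
REPO_TEXT = """\
[centos-qemu-ev-test]
name=CentOS-$releasever - QEMU EV Testing
baseurl=http://buildlogs.centos.org/centos/$releasever/virt/$basearch/kvm-common/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization
"""


def gpgcheck_settings(repo_text):
    """Map each repo section name to its raw gpgcheck value.

    Sections without a gpgcheck line map to None; no default is assumed here.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        section: parser.get(section, "gpgcheck", fallback=None)
        for section in parser.sections()
    }


if __name__ == "__main__":
    for section, value in gpgcheck_settings(REPO_TEXT).items():
        print(section, "gpgcheck =", value)
```

On a real system you would pass in the contents of whichever file under /etc/yum.repos.d/ carries this stanza; the filename is not given in this thread, so none is assumed here.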




Re: [CentOS-virt] qemu-kvm-ev-2.6.0-28.el7_3.3.1 tagged for testing

2017-01-26 Thread Lamar Owen

On 01/26/2017 12:14 PM, Johnny Hughes wrote:
> The testing RPMs are not signed .. they are straight from CBS.  Does the
> testing repo not have 'gpgcheck=0'?

Ok, thanks.  Given the level of system interaction that qemu/kvm has, it
would be an ideal vector for malware, and package signing prevents
this.

My copy of the repo file has the following:

+++
[centos-qemu-ev-test]
name=CentOS-$releasever - QEMU EV Testing
baseurl=http://buildlogs.centos.org/centos/$releasever/virt/$basearch/kvm-common/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Virtualization




Re: [CentOS-virt] Selinux Problem

2017-01-26 Thread Günther J . Niederwimmer
Hello,

On Thursday, 26 January 2017, 10:54:20 CET, Johnny Hughes wrote:
> On 01/26/2017 10:06 AM, Günther J. Niederwimmer wrote:
> > Hello,
> > 
> > CentOS 7.(3), Xen 4.4.
> > 
> > Is there any documentation for SELinux with Xen? I have found many
> > problems with SELinux on Dom0.
> > 
> > Or do I have to disable SELinux when I install Xen?
> > 
> > Thanks for an answer.
> 
> We have not tried to make xen work with selinux on Dom0 .. in fact our
> documentation:
> 
> https://wiki.centos.org/Manuals/ReleaseNotes/Xen4-01
> 
>  says:
> 
> SELinux support is disabled, and you might need to disable SELinux on
> the dom0 for some operations; primarily when using qemu-xen and blktap
> backed storage.

This is not the best situation, but if there is no other way I will have to
disable SELinux :-(.
 
> 
> 
> I would go as far as to say turn it off for all operations currently on
> Dom0.


-- 
Best regards

  Günther J. Niederwimmer


Re: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

2017-01-26 Thread Kevin Stange
On 01/26/2017 09:35 AM, Johnny Hughes wrote:
> On 01/26/2017 09:32 AM, Johnny Hughes wrote:
>> On 01/25/2017 11:49 AM, Kevin Stange wrote:
>>> On 01/24/2017 11:16 AM, Kevin Stange wrote:
>>>> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>>>>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>>>>> Kevin Stange,
>>>>>> It could be either the kernel, or you may need to update the NIC
>>>>>> driver or the firmware of the NIC. Hope that helps!
>>>>>>
>>>>>> Xlord
>>>>>> -----Original Message-----
>>>>>> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Kevin Stange
>>>>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>>>>> To: centos-virt@centos.org
>>>>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>>>>> Linux 3.18
>>>>>>
>>>>
>>>>>>
>>>>>> Has anyone experienced similar issues with this configuration, and if so,
>>>>>> does anyone have tips on how to resolve the issues?
>>>>>
>>>>> Honestly I would email Intel and see if they can help. This looks like
>>>>> the NIC decides something is wrong, throws off a PCIe error and
>>>>> then resets itself.
>>>>
>>>> This happens for several different NICs.  Is there a good contact at
>>>> Intel for this kind of thing, or should I just try to reach them through
>>>> their web site?
>>>>
>>>>> It could also be an error in the Linux stack which would "eat" an
>>>>> interrupt when migrating interrupts (which was fixed
>>>>> upstream, see below). Are you running irqbalance? Could you try
>>>>> turning it off?
>>>>
>>>> irqbalance is enabled on these servers.  I'll try disabling it.
>>>
>>> I had stopped irqbalance yesterday afternoon, but had a hypervisor's
>>> NICs fail anyway early this morning, so I'm pretty sure this
>>> is not the right tree to bark up.
>>>
>>
>> Here is a set of drivers/firmware from Intel for those NICs:
>>
>> https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-
>>
>> I will see if I can get a CentOS-6 build of the latest version of that
>> from our older SRPM:
>>
>> http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm
>>
>> I am currently very busy with several c5, c6, c7 updates and the i686
>> altarch c7 tree .. but I have this on my list.  In the meantime, maybe
>> someone else could also see if those drivers help you (or you could try
>> to compile / install it).
>>
>> Do you have another machine that you can use to see if you can duplicate
>> the issue NOT running the xen.gz hypervisor boot, but just the straight
>> kernel?

I can't actually reproduce this problem reliably.  It happens randomly
when the servers are up and running anywhere between a few hours and a
month or more, and I haven't been able to isolate any specific way to
cause it to happen.  As a result I can't really test different solutions
on different servers to see what helps.  I was hoping other people were
seeing it so that I could get some direction.  If I can reproduce it, it
won't take me very long to identify what the cause is.  Right now if I
do upgrade the drivers on the systems I won't really know if it's fixed
until I don't see another issue for several months.

> Actually .. I think this is the driver for you:
> 
> https://downloadcenter.intel.com/download/13663
> 
> And this explains how to make it work:
> 
> http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/05767.html

The different combinations of NICs overlap both the e1000e and igb
drivers, but the most egregious issues have been with the igb ones.
I'll try to give this a shot and report back if I still see issues with
a server after doing so, but it might be a week or two before I find out.

-- 
Kevin Stange
Chief Technology Officer
Steadfast | Managed Infrastructure, Datacenter and Cloud Services
800 S Wells, Suite 190 | Chicago, IL 60607
312.602.2689 X203 | Fax: 312.602.2688
ke...@steadfast.net | www.steadfast.net


Re: [CentOS-virt] qemu-kvm-ev-2.6.0-28.el7_3.3.1 tagged for testing

2017-01-26 Thread Johnny Hughes
On 01/26/2017 01:12 AM, Sandro Bonazzola wrote:
> 
> 
> On Wed, Jan 25, 2017 at 8:20 PM, Lamar Owen wrote:
> 
> On 01/24/2017 11:29 PM, Sandro Bonazzola wrote:
> 
> Hi,
> the latest qemu-kvm-ev has been tagged for testing.
> Please give it a run and provide feedback.
> If nothing against it shows up, we'll tag it for release on Friday.
> 
> Is it considered normal for the test RPMs to not be signed?
> 
> 
> I've no control over signing, Karanbir?
> 
>  

The testing RPMs are not signed .. they are straight from CBS.  Does the
testing repo not have 'gpgcheck=0'?






Re: [CentOS-virt] Selinux Problem

2017-01-26 Thread Johnny Hughes
On 01/26/2017 10:06 AM, Günther J. Niederwimmer wrote:
> Hello,
> 
> CentOS 7.(3), Xen 4.4.
> 
> Is there any documentation for SELinux with Xen? I have found many
> problems with SELinux on Dom0.
> 
> Or do I have to disable SELinux when I install Xen?
> 
> Thanks for an answer.
> 

We have not tried to make xen work with selinux on Dom0 .. in fact our
documentation:

https://wiki.centos.org/Manuals/ReleaseNotes/Xen4-01

 says:

SELinux support is disabled, and you might need to disable SELinux on
the dom0 for some operations; primarily when using qemu-xen and blktap
backed storage.



I would go as far as to say turn it off for all operations currently on
Dom0.
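
To see which mode a Dom0 is configured to boot in before changing anything, here is a rough Python sketch. It assumes the usual /etc/selinux/config layout with a SELINUX= line. The function name selinux_mode is made up for illustration, and the sketch only reads the setting, it does not change it.

```python
def selinux_mode(config_text):
    """Return the value of the SELINUX= line, or None if there is none.

    Comment lines and other keys (e.g. SELINUXTYPE=) are ignored.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "SELINUX":
            return value.strip()
    return None


# Made-up example content, only to show the expected input shape.
EXAMPLE = """\
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
"""

if __name__ == "__main__":
    print(selinux_mode(EXAMPLE))
```

On a real host you would pass in the contents of /etc/selinux/config rather than EXAMPLE.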






Re: [CentOS-virt] Selinux Problem

2017-01-26 Thread Sarah Newman
On 01/26/2017 08:45 AM, Sarah Newman wrote:
> On 01/26/2017 08:06 AM, Günther J. Niederwimmer wrote:
>> Hello,
>>
>> CentOS 7.(3), Xen 4.4.
>>
>> Is there any documentation for SELinux with Xen? I have found many
>> problems with SELinux on Dom0.
>>
>> Or do I have to disable SELinux when I install Xen?
>>
>> Thanks for an answer.
>>
> 
> What problems and what version of CentOS?
> 
> We leave selinux enabled.

Sorry I'm blind, should have had more coffee.

I would like to know what problems you're having specifically. We aren't on 
CentOS 7 yet unfortunately.



Re: [CentOS-virt] Selinux Problem

2017-01-26 Thread Sarah Newman
On 01/26/2017 08:06 AM, Günther J. Niederwimmer wrote:
> Hello,
> 
> CentOS 7.(3), Xen 4.4.
> 
> Is there any documentation for SELinux with Xen? I have found many
> problems with SELinux on Dom0.
> 
> Or do I have to disable SELinux when I install Xen?
> 
> Thanks for an answer.
> 

What problems and what version of CentOS?

We leave selinux enabled.



[CentOS-virt] Selinux Problem

2017-01-26 Thread Günther J . Niederwimmer
Hello,

CentOS 7.(3), Xen 4.4.

Is there any documentation for SELinux with Xen? I have found many
problems with SELinux on Dom0.

Or do I have to disable SELinux when I install Xen?

Thanks for an answer.
-- 
Best regards

  Günther J. Niederwimmer


Re: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

2017-01-26 Thread Johnny Hughes
On 01/26/2017 09:32 AM, Johnny Hughes wrote:
> On 01/25/2017 11:49 AM, Kevin Stange wrote:
>> On 01/24/2017 11:16 AM, Kevin Stange wrote:
>>> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>>>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>>>> Kevin Stange,
>>>>> It could be either the kernel, or you may need to update the NIC
>>>>> driver or the firmware of the NIC. Hope that helps!
>>>>>
>>>>> Xlord
>>>>> -----Original Message-----
>>>>> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Kevin Stange
>>>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>>>> To: centos-virt@centos.org
>>>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>>>> Linux 3.18
>>>>>
>>>
>>>>>
>>>>> Has anyone experienced similar issues with this configuration, and if so,
>>>>> does anyone have tips on how to resolve the issues?
>>>>
>>>> Honestly I would email Intel and see if they can help. This looks like
>>>> the NIC decides something is wrong, throws off a PCIe error and
>>>> then resets itself.
>>>
>>> This happens for several different NICs.  Is there a good contact at
>>> Intel for this kind of thing, or should I just try to reach them through
>>> their web site?
>>>
>>>> It could also be an error in the Linux stack which would "eat" an
>>>> interrupt when migrating interrupts (which was fixed
>>>> upstream, see below). Are you running irqbalance? Could you try
>>>> turning it off?
>>>
>>> irqbalance is enabled on these servers.  I'll try disabling it.
>>
>> I had stopped irqbalance yesterday afternoon, but had a hypervisor's
>> NICs fail anyway early this morning, so I'm pretty sure this
>> is not the right tree to bark up.
>>
> 
> Here is a set of drivers/firmware from Intel for those NICs:
> 
> https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-
> 
> I will see if I can get a CentOS-6 build of the latest version of that
> from our older SRPM:
> 
> http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm
> 
> I am currently very busy with several c5, c6, c7 updates and the i686
> altarch c7 tree .. but I have this on my list.  In the meantime, maybe
> someone else could also see if those drivers help you (or you could try
> to compile / install it).
> 
> Do you have another machine that you can use to see if you can duplicate
> the issue NOT running the xen.gz hypervisor boot, but just the straight
> kernel?

Actually .. I think this is the driver for you:

https://downloadcenter.intel.com/download/13663

And this explains how to make it work:

http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/05767.html






Re: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 / Linux 3.18

2017-01-26 Thread Johnny Hughes
On 01/25/2017 11:49 AM, Kevin Stange wrote:
> On 01/24/2017 11:16 AM, Kevin Stange wrote:
>> On 01/24/2017 09:10 AM, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Jan 24, 2017 at 09:29:39PM +0800, -=X.L.O.R.D=- wrote:
>>>> Kevin Stange,
>>>> It could be either the kernel, or you may need to update the NIC
>>>> driver or the firmware of the NIC. Hope that helps!
>>>>
>>>> Xlord
>>>> -----Original Message-----
>>>> From: CentOS-virt [mailto:centos-virt-boun...@centos.org] On Behalf Of Kevin Stange
>>>> Sent: Tuesday, January 24, 2017 1:04 AM
>>>> To: centos-virt@centos.org
>>>> Subject: [CentOS-virt] NIC Stability Problems Under Xen 4.4 / CentOS 6 /
>>>> Linux 3.18
>>>>
>>
>>>>
>>>> Has anyone experienced similar issues with this configuration, and if so,
>>>> does anyone have tips on how to resolve the issues?
>>>
>>> Honestly I would email Intel and see if they can help. This looks like
>>> the NIC decides something is wrong, throws off a PCIe error and
>>> then resets itself.
>>
>> This happens for several different NICs.  Is there a good contact at
>> Intel for this kind of thing, or should I just try to reach them through
>> their web site?
>>
>>> It could also be an error in the Linux stack which would "eat" an
>>> interrupt when migrating interrupts (which was fixed
>>> upstream, see below). Are you running irqbalance? Could you try
>>> turning it off?
>>
>> irqbalance is enabled on these servers.  I'll try disabling it.
> 
> I had stopped irqbalance yesterday afternoon, but had a hypervisor's
> NICs fail anyway early this morning, so I'm pretty sure this
> is not the right tree to bark up.
> 

Here is a set of drivers/firmware from Intel for those NICs:

https://downloadcenter.intel.com/download/15817/Intel-Network-Adapter-Driver-for-PCI-E-Gigabit-Network-Connections-under-Linux-

I will see if I can get a CentOS-6 build of the latest version of that
from our older SRPM:

http://vault.centos.org/6.7/xen4/Source/SPackages/e1000e-2.5.4-3.10.68.2.el6.centos.alt.src.rpm

I am currently very busy with several c5, c6, c7 updates and the i686
altarch c7 tree .. but I have this on my list.  In the meantime, maybe
someone else could also see if those drivers help you (or you could try
to compile / install it).

Do you have another machine that you can use to see if you can duplicate
the issue NOT running the xen.gz hypervisor boot, but just the straight
kernel?

Thanks,
Johnny Hughes


