Re: [CentOS-virt] Xen C6 kernel 4.9.13 and testing 4.9.15 only reboots.

2017-04-20 Thread Anderson, Dave
Good news/bad news testing the new kernel on CentOS7 with my now notoriously 
finicky machines:

Good news: 4.9.23-26.el7 (grabbed today via yum update) isn't any worse than 
4.9.13-22 was on my xen hosts (as far as I can tell so far at least)

Bad news: It isn't any better than 4.9.13 was for me either: if I don't set a
vCPU limit in the grub/Xen config, it still panics like so:

[6.716016] CPU: Physical Processor ID: 0
[6.720199] CPU: Processor Core ID: 0
[6.724046] mce: CPU supports 2 MCE banks
[6.728239] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[6.733884] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[6.740770] Freeing SMP alternatives memory: 32K (821a8000 - 
821b)
[6.750638] ftrace: allocating 34344 entries in 135 pages
[6.771888] smpboot: Max logical packages: 1
[6.776363] VPMU disabled by hypervisor.
[6.780479] Performance Events: SandyBridge events, PMU not available due to 
virtualization, using software events only.
[6.792237] NMI watchdog: disabled (cpu0): hardware events not enabled
[6.798943] NMI watchdog: Shutting down hard lockup detector on all cpus
[6.805949] installing Xen timer for CPU 1
[6.810659] installing Xen timer for CPU 2
[6.815317] installing Xen timer for CPU 3
[6.819947] installing Xen timer for CPU 4
[6.824618] installing Xen timer for CPU 5
[6.829282] installing Xen timer for CPU 6
[6.833935] installing Xen timer for CPU 7
[6.838565] installing Xen timer for CPU 8
[6.843110] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[6.849475] [ cut here ]
[6.854091] kernel BUG at arch/x86/kernel/cpu/common.c:997!
[6.855864] random: fast init done
[6.863070] invalid opcode:  [#1] SMP
[6.867088] Modules linked in:
[6.870168] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 4.9.23-26.el7.x86_64 #1
[6.877298] Hardware name: Supermicro X9DRT/X9DRT, BIOS 3.2a 08/04/2015
[6.883920] task: 880058a6a5c0 task.stack: c900400c
[6.889840] RIP: e030:[]  [] 
identify_secondary_cpu+0x57/0x80
[6.898756] RSP: e02b:c900400c3f08  EFLAGS: 00010086
[6.904069] RAX: ffe4 RBX: 88005d80a020 RCX: 81e5ffc8
[6.911201] RDX: 0001 RSI: 0005 RDI: 0005
[6.918335] RBP: c900400c3f18 R08: 00ce R09: 
[6.925466] R10: 0005 R11: 0006 R12: 0008
[6.932599] R13:  R14:  R15: 
[6.939735] FS:  () GS:88005d80() 
knlGS:
[6.947819] CS:  e033 DS: 002b ES: 002b CR0: 80050033
[6.953565] CR2:  CR3: 01e07000 CR4: 00042660
[6.960696] Stack:
[6.962731]  0008  c900400c3f28 
8104ebce
[6.970205]  c900400c3f40 81029855  
c900400c3f50
[6.977691]  810298d0   

[6.985164] Call Trace:
[6.987626]  [] smp_store_cpu_info+0x3e/0x40
[6.993480]  [] cpu_bringup+0x35/0x90
[6.998700]  [] cpu_bringup_and_idle+0x20/0x40
[7.004706] Code: 44 89 e7 ff 50 68 0f b7 93 d2 00 00 00 39 d0 75 1c 0f b7 
bb da 00 00 00 44 89 e6 e8 e4 02 01 00 85 c0 75 07 5b 41 5c 5d c3 0f 0b <0f> 0b 
0f b7 8b d4 00 00 00 89 c2 44 89 e6 48 c7 c7 90 d3 ca 81 
[7.024976] RIP  [] identify_secondary_cpu+0x57/0x80
[7.031528]  RSP 
[7.035032] ---[ end trace f2a8d75941398d9f ]---
[7.039658] Kernel panic - not syncing: Attempted to kill the idle task!

So, other than my workaround (which still works), I'm not sure what else I can
provide in the way of feedback/testing. But if you want anything else gathered,
let me know.
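
(For anyone who wants to reproduce the workaround: roughly the sort of vCPU
limit I mean, sketched for the standard CentOS 7 grub2 + Xen layout; the count
of 8 is just an example and should match your hardware:

  # /etc/default/grub -- add the limit to the Xen hypervisor command line
  GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=8 dom0_vcpus_pin"

  # then regenerate grub.cfg and reboot
  grub2-mkconfig -o /boot/grub2/grub.cfg
)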

Thanks,
-Dave

--
Dave Anderson


> On Apr 19, 2017, at 10:33 AM, Johnny Hughes  wrote:
> 
> On 04/19/2017 12:18 PM, PJ Welsh wrote:
>> 
>> On Wed, Apr 19, 2017 at 5:40 AM, Johnny Hughes wrote:
>> 
>>On 04/18/2017 12:39 PM, PJ Welsh wrote:
>>> Here is something interesting... I went through the BIOS options and
>>> found that one R710 that *is* functioning only differed in that "Logical
>>> Processor"/Hyperthreading was *enabled* while the one that is *not*
>>> functioning had HT *disabled*. Enabled Logical Processor and the system
>>> starts without issue! I've rebooted 3 times now without issue.
>>> Dell R710 BIOS version 6.4.0
>>> 2x Intel(R) Xeon(R) CPU L5639  @ 2.13GHz
>>> 4.9.20-26.el7.x86_64 #1 SMP Tue Apr 4 11:19:26 CDT 2017 x86_64 x86_64
>>> x86_64 GNU/Linux
>>> 
>> 
>>Outstanding .. I have now released a 4.9.23-26.el6 and .el7 to the
>>system as normal updates.  It should be available later today.
>> 
>>
>> 
>> 
>> I've verified with a second Dell R710 that disabling
>> Hyperthreading/Logical Processor causes the primary xen booting kernel
>> to fail and r

Re: [CentOS-virt] lvm cache + qemu-kvm stops working after about 20GB of writes

2017-04-20 Thread Sandro Bonazzola
On Thu, Apr 20, 2017 at 12:32 PM, Richard Landsman - Rimote
<rich...@rimote.nl> wrote:

> Hello everyone,
>
> Has anybody had the chance to test out this setup and reproduce the problem?
> I assumed it would be something that's used often these days and a solution
> would benefit a lot of users. If I can be of any assistance, please contact
> me.
>
I haven't seen any additional reports of this happening. Can you please try
to reproduce it with the new qemu-kvm-ev-2.6.0-28.el7_3.9.1 currently in
testing?

> --
> Met vriendelijke groet,
>
> Richard Landsman
> http://rimote.nl
>
> T: +31 (0)50 - 763 04 07
> (ma-vr 9:00 tot 18:00)
>
> 24/7 bij storingen:
> +31 (0)6 - 4388 7949
> @RimoteSaS (Twitter Serviceberichten/security updates)
>
>
>

-- 

SANDRO BONAZZOLA

ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D

Red Hat EMEA 

TRIED. TESTED. TRUSTED. 
___
CentOS-virt mailing list
CentOS-virt@centos.org
https://lists.centos.org/mailman/listinfo/centos-virt


[CentOS-virt] qemu-kvm-ev-2.6.0-28.el7_3.9.1 now available for testing

2017-04-20 Thread Sandro Bonazzola
Hi,
just pushed a new build of qemu-kvm-ev to testing; here's the ChangeLog:

* Thu Apr 20 2017 Sandro Bonazzola  -
ev-2.6.0-28.el7_3.9.1
- Removing RH branding from package name

* Fri Mar 24 2017 Miroslav Rezanina  -
rhev-2.6.0-28.el7_3.9
- kvm-block-gluster-memory-usage-use-one-glfs-instance-per.patch
[bz#1413044]
- kvm-gluster-Fix-use-after-free-in-glfs_clear_preopened.patch [bz#1413044]
- kvm-fix-cirrus_vga-fix-OOB-read-case-qemu-Segmentation-f.patch
[bz#1430061]
- kvm-cirrus-vnc-zap-bitblit-support-from-console-code.patch [bz#1430061]
- kvm-cirrus-add-option-to-disable-blitter.patch [bz#1430061]
- kvm-cirrus-fix-cirrus_invalidate_region.patch [bz#1430061]
- kvm-cirrus-stop-passing-around-dst-pointers-in-the-blitt.patch
[bz#1430061]
- kvm-cirrus-stop-passing-around-src-pointers-in-the-blitt.patch
[bz#1430061]
- kvm-cirrus-fix-off-by-one-in-cirrus_bitblt_rop_bkwd_tran.patch
[bz#1430061]
- kvm-file-posix-Consider-max_segments-for-BlockLimits.max.patch
[bz#1431149]
- kvm-file-posix-clean-up-max_segments-buffer-termination.patch [bz#1431149]
- kvm-file-posix-Don-t-leak-fd-in-hdev_get_max_segments.patch [bz#1431149]
- Resolves: bz#1413044
  (block-gluster: use one glfs instance per volume)
- Resolves: bz#1430061
  (CVE-2016-9603 qemu-kvm-rhev: Qemu: cirrus: heap buffer overflow via vnc
connection [rhel-7.3.z])
- Resolves: bz#1431149
  (VMs pause when writing to Virtio-SCSI direct lun with scsi passthrough
enabled via an Emulex HBA)

* Tue Mar 21 2017 Miroslav Rezanina  -
rhev-2.6.0-28.el7_3.8
- kvm-target-i386-present-virtual-L3-cache-info-for-vcpus.patch [bz#1430802]
- Resolves: bz#1430802
  (Enhance qemu to present virtual L3 cache info for vcpus)

* Wed Mar 15 2017 Miroslav Rezanina  -
rhev-2.6.0-28.el7_3.7
- kvm-block-check-full-backing-filename-when-searching-pro.patch
[bz#1425125]
- kvm-qemu-iotests-Don-t-create-fifos-pidfiles-with-protoc.patch
[bz#1425125]
- kvm-qemu-iotest-test-to-lookup-protocol-based-image-with.patch
[bz#1425125]
- kvm-target-i386-Don-t-use-cpu-migratable-when-filtering-.patch
[bz#1413897]
- Resolves: bz#1413897
  (cpu flag nonstop_tsc is not present in guest with host-passthrough and
feature policy require invtsc)
- Resolves: bz#1425125
  (qemu fails to recognize gluster URIs in backing chain for block-commit
operation)

In order to test it:

# yum install centos-release-qemu-ev
# yum-config-manager --enable centos-qemu-ev-test
# yum install qemu-kvm-ev

Or just yum update if you already have qemu-kvm-ev installed and test
repository enabled.
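
If you prefer not to leave the test repository permanently enabled, something
like this one-shot form should also work (assuming centos-release-qemu-ev is
already installed, since it provides the repo definition):

# yum --enablerepo=centos-qemu-ev-test update qemu-kvm-ev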

If you test it, please provide feedback.
I'm going to release it on April 26th if no negative feedback is provided.

-- 

SANDRO BONAZZOLA

ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D

Red Hat EMEA 

TRIED. TESTED. TRUSTED. 
___
CentOS-virt mailing list
CentOS-virt@centos.org
https://lists.centos.org/mailman/listinfo/centos-virt


Re: [CentOS-virt] qemu-kvm-ev ppc64le release builds

2017-04-20 Thread Sandro Bonazzola
On Thu, Apr 20, 2017 at 1:03 AM, Lance Albertson  wrote:

> Hi,
>
> We're using qemu-kvm-ev on ppc64le and I've noticed that it's included in
> the extras repo for ppc64le but in the qemu-kvm-ev repo for x86_64. I also
> noticed the version in ppc64le is lagging behind x86. I see that ppc64le is
> being built for this [1], however it isn't tagged for virt7-kvm-common-release
> and thus isn't showing up under /virt/kvm-common on the mirrors.
>
> Is there any particular reason why this is happening? If possible, I can
> certainly provide some testing to get this moving along.
>

There are still issues in the publishing chain for alternative arches from
SIGs.
https://bugs.centos.org/view.php?id=11457 is tracking it.
Karanbir, any chance we can move forward on this topic?



>
> Thank you!
>
> [1] http://cbs.centos.org/koji/packageinfo?packageID=539
>
> --
> Lance Albertson
> Director
> Oregon State University | Open Source Lab
>
> ___
> CentOS-virt mailing list
> CentOS-virt@centos.org
> https://lists.centos.org/mailman/listinfo/centos-virt
>
>


-- 

SANDRO BONAZZOLA

ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D

Red Hat EMEA 

TRIED. TESTED. TRUSTED. 
___
CentOS-virt mailing list
CentOS-virt@centos.org
https://lists.centos.org/mailman/listinfo/centos-virt


Re: [CentOS-virt] lvm cache + qemu-kvm stops working after about 20GB of writes

2017-04-20 Thread Richard Landsman - Rimote

Hello everyone,

Has anybody had the chance to test out this setup and reproduce the problem?
I assumed it would be something that's used often these days and a
solution would benefit a lot of users. If I can be of any assistance,
please contact me.


--
Met vriendelijke groet,

Richard Landsman
http://rimote.nl

T: +31 (0)50 - 763 04 07
(ma-vr 9:00 tot 18:00)

24/7 bij storingen:
+31 (0)6 - 4388 7949
@RimoteSaS (Twitter Serviceberichten/security updates)

On 04/10/2017 10:08 AM, Sandro Bonazzola wrote:

Adding Paolo and Miroslav.

On Sat, Apr 8, 2017 at 4:49 PM, Richard Landsman - Rimote
<rich...@rimote.nl> wrote:


Hello,

I would really appreciate some help/guidance with this problem.
First of all, sorry for the long message. I would file a bug, but I
do not know whether the fault is mine, dm-cache's, qemu's or (probably)
a combination of them. And I can imagine some of you have this setup
up and running without problems (or maybe you think it works, just
like I did, but it does not):

PROBLEM
LVM cache writeback stops working as expected after a while with a
qemu-kvm VM. A 100% working setup would be the holy grail, in my
opinion... and the performance of KVM/qemu is great in the beginning,
I must say.

DESCRIPTION

When using software RAID 1 (2x HDD) + software RAID 1 (2x SSD) and
creating a cached LV out of them, the VM initially performs great
(at least 40,000 IOPS on 4k random read/write)! But then after a
while (and a lot of random IO, ca. 10 - 20 GB) it effectively turns
into a writethrough cache, although there's much space left on the
cachedlv.
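
(For context, roughly the kind of commands behind such a setup; the device
names, VG name and sizes below are placeholders, not my exact ones:

  # /dev/md0 = the 2x HDD RAID 1, /dev/md1 = the 2x SSD RAID 1
  pvcreate /dev/md0 /dev/md1
  vgcreate vg_data /dev/md0 /dev/md1

  # origin LV on the slow mirror, cache pool on the fast mirror
  lvcreate -n cachedlv -L 500G vg_data /dev/md0
  lvcreate --type cache-pool -n cachepool -L 100G vg_data /dev/md1

  # attach the pool to the origin LV in writeback mode
  lvconvert --type cache --cachepool vg_data/cachepool --cachemode writeback vg_data/cachedlv
)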


When working as expected, all writes on the KVM host go to the SSDs:

iostat -x -m 2

Device:         rrqm/s   wrqm/s     r/s        w/s    rMB/s     wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm    %util
sda               0.00   324.50    0.00      22.00     0.00     14.94  1390.57     1.90   86.39    0.00   86.39   5.32    11.70
sdb               0.00   324.50    0.00      22.00     0.00     14.94  1390.57     2.03   92.45    0.00   92.45   5.48    12.05
sdc               0.00  3932.00    0.00  *2191.50*     0.00  *270.07*   252.39    37.83   17.55    0.00   17.55   0.36  *78.05*
sdd               0.00  3932.00    0.00  *2197.50*     0.00  *271.01*   252.57    38.96   18.14    0.00   18.14   0.36  *78.95*



When not working as expected, all writes on the KVM host go through the
SSDs on to the HDDs (effectively disabling writeback, so it becomes
a writethrough cache):

Device:         rrqm/s   wrqm/s     r/s        w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm     %util
sda               0.00     7.00  234.50   *173.50*     0.92   *1.95*    14.38    29.27   71.27  111.89   16.37   2.45  *100.00*
sdb               0.00     3.50  212.00   *177.50*     0.83   *1.95*    14.60    35.58   91.24  143.00   29.42   2.57  *100.10*
sdc               2.50     0.00  566.00   *199.00*     2.69     0.78     9.28     0.08    0.11    0.13    0.04   0.10    *7.70*
sdd               1.50     0.00   76.00   *199.00*     0.65     0.78    10.66     0.02    0.07    0.16    0.04   0.07    *1.85*



Stuff I've checked/tried:

- The data in the cached LV has by then not exceeded even half of the
available space, so this should not happen. It even happens when only
20% of the cache data is used.
- It seems to be triggered most of the time when the Cpy%Sync column
of `lvs -a` is about 30%. But this is not always the case!
- Changing the cache policy from smq to cleaner, waiting (checking with
lvs -a that the flush is ready) and then changing back to smq seems to
help /sometimes/! But not always...

lvchange --cachepolicy cleaner /dev/mapper/XXX-cachedlv

lvs -a

lvchange --cachepolicy smq /dev/mapper/XXX-cachedlv

- *When mounting the LV inside the host this does not seem to
happen!* So it looks like a qemu-kvm / dm-cache combination
issue. The only difference is that inside the host I do a mkfs instead
of LVM inside the VM (so it could be an LVM-inside-VM on top of
LVM-on-KVM-host problem too? Probably a small chance, because for the
first 10 - 20 GB it works great!)

- Tried disabling SELinux, upgrading to the newest kernels (elrepo ml
and lt), and playing around with the dirty-writeback knobs such as
/proc/sys/vm/dirty_writeback_centisecs,
/proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_ratio,
the migration threshold of dmsetup, and other probably non-important
stuff like vm.dirty_bytes.

- When in the "slow state", the system's kworkers are excessively using
IO (10 - 20 MB per kworker process). This seems to be the
writeback process (Cpy%Sync) because the cache wants to flush to the
HDDs. But the strange thing is that after a good sync (0% left),
the disk may become slow again after a few MB of data. A reboot
sometimes helps.

- Have tried iothreads, virtio-scsi, the vcpu driver setting on the
virtio-scsi controller, cache settings, disk schedulers etc. Nothing
helped.

- the new