Re: [pve-devel] transparent huge pages support / disk passthrough corruption

2017-01-19 Thread Andreas Steinel
It seems that the current implementation is much better than it was in the
RHEL-based kernel.

On Thu, Jan 19, 2017 at 9:43 AM, Alexandre DERUMIER wrote:

> Hi,
>
> I have re-enabled THP (transparent_hugepage=madvise) for around a year now
> (with pve-kernel 4.2-4.4), and I no longer have the problems I had in the
> past.
>
> I'm hosting a lot of databases (MySQL, SQL Server, Redis, MongoDB, ...) and
> I haven't seen any performance impact since re-enabling THP.
>
> So I think it's pretty safe to set it by default.
>
>
>
>
> - Original Message -
> From: "Fabian Grünbichler"
> To: "pve-devel"
> Cc: "aderumier" , "Andreas Steinel" <a.stei...@gmail.com>
> Sent: Thursday, 19 January 2017 09:35:43
> Subject: transparent huge pages support / disk passthrough corruption
>
> So it seems like the recently reported problems[1] with disk passthrough
> using virtio-scsi(-single) are caused by a combination of Qemu
> since 2.7 not handling memory fragmentation (well) and our compiled-in
> default of disabling transparent huge pages on the kernel side.
>
> While I will investigate further and see whether this is not fixable on
> the Qemu side as well, I think it would be a good idea to revisit the
> decision to patch this default in[2].
>
> @Andreas, Alexandre: you both were proponents of disabling THP support
> back then, but the current kernel docs[3] say (emphasis mine):
>
> -%<-
> Transparent Hugepage Support can be entirely disabled (*mostly for
> debugging purposes*) or only enabled inside MADV_HUGEPAGE regions (to
> avoid the risk of consuming more memory resources) or enabled system
> wide. This can be achieved with one of:
>
> echo always >/sys/kernel/mm/transparent_hugepage/enabled
> echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
> echo never >/sys/kernel/mm/transparent_hugepage/enabled
>
> It's also possible to limit defrag efforts in the VM to generate
> hugepages in case they're not immediately free to madvise regions or
> to never try to defrag memory and simply fallback to regular pages
> unless hugepages are immediately available. Clearly if we spend CPU
> time to defrag memory, we would expect to gain even more by the fact
> we use hugepages later instead of regular pages. This isn't always
> guaranteed, but it may be more likely in case the allocation is for a
> MADV_HUGEPAGE region.
>
> echo always >/sys/kernel/mm/transparent_hugepage/defrag
> echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
> echo never >/sys/kernel/mm/transparent_hugepage/defrag
> ->%-
>
> So I think setting both enabled and defrag to "madvise" by default would
> be advisable - the admin can override it (permanently with a kernel boot
> parameter, or at run time with the sysfs interface) anyway if they
> really know it causes performance issues.
>
> If you have any hard benchmark data to back up staying at "never",
> please send it soon ;) preferably both with and without a (non-transparent)
> hugepages setup, and with both "always" and "madvise" for enabled and
> defrag.
>
> Running a setup that is intended for debugging purposes (see above) as the
> default seems strange to me (and this was probably the reason why we
> needed to patch "never" in as the default in the first place). While I am
> not yet convinced that this solves the passthrough data corruption issue
> entirely, it is very reliably reproducible with THP disabled, and not at
> all so far on my test setup with THP enabled - so I propose switching
> with the next kernel update, unless there are (serious) objections.
>
> 1: https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/
> 2: http://pve.proxmox.com/pipermail/pve-devel/2015-September/017079.html
> 3: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/vm/transhuge.txt?h=linux-4.4.y#n95
>
>


Re: [pve-devel] transparent huge pages support / disk passthrough corruption

2017-01-19 Thread Alexandre DERUMIER
Hi,

I have re-enabled THP (transparent_hugepage=madvise) for around a year now
(with pve-kernel 4.2-4.4), and I no longer have the problems I had in the past.

I'm hosting a lot of databases (MySQL, SQL Server, Redis, MongoDB, ...) and I
haven't seen any performance impact since re-enabling THP.

So I think it's pretty safe to set it by default.
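
For reference, a minimal sketch of how that re-enable looks on a Debian/PVE
host booted via GRUB (paths are the Debian defaults; keep whatever options
are already on your kernel command line):

# in /etc/default/grub, append the THP mode to the existing options:
GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=madvise"

# then regenerate the GRUB config and reboot:
update-grub

# after the reboot the active value is shown in brackets:
cat /sys/kernel/mm/transparent_hugepage/enabled
# always [madvise] never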




- Original Message -
From: "Fabian Grünbichler"
To: "pve-devel"
Cc: "aderumier" , "Andreas Steinel"
Sent: Thursday, 19 January 2017 09:35:43
Subject: transparent huge pages support / disk passthrough corruption

So it seems like the recently reported problems[1] with disk passthrough
using virtio-scsi(-single) are caused by a combination of Qemu
since 2.7 not handling memory fragmentation (well) and our compiled-in 
default of disabling transparent huge pages on the kernel side. 

While I will investigate further and see whether this is not fixable on 
the Qemu side as well, I think it would be a good idea to revisit the 
decision to patch this default in[2]. 

@Andreas, Alexandre: you both were proponents of disabling THP support
back then, but the current kernel docs[3] say (emphasis mine): 

-%<- 
Transparent Hugepage Support can be entirely disabled (*mostly for 
debugging purposes*) or only enabled inside MADV_HUGEPAGE regions (to 
avoid the risk of consuming more memory resources) or enabled system 
wide. This can be achieved with one of: 

echo always >/sys/kernel/mm/transparent_hugepage/enabled 
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled 
echo never >/sys/kernel/mm/transparent_hugepage/enabled 

It's also possible to limit defrag efforts in the VM to generate 
hugepages in case they're not immediately free to madvise regions or 
to never try to defrag memory and simply fallback to regular pages 
unless hugepages are immediately available. Clearly if we spend CPU 
time to defrag memory, we would expect to gain even more by the fact 
we use hugepages later instead of regular pages. This isn't always 
guaranteed, but it may be more likely in case the allocation is for a 
MADV_HUGEPAGE region. 

echo always >/sys/kernel/mm/transparent_hugepage/defrag 
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag 
echo never >/sys/kernel/mm/transparent_hugepage/defrag 
->%- 

So I think setting both enabled and defrag to "madvise" by default would
be advisable - the admin can override it (permanently with a kernel boot 
parameter, or at run time with the sysfs interface) anyway if they 
really know it causes performance issues. 

If you have any hard benchmark data to back up staying at "never",
please send it soon ;) preferably both with and without a (non-transparent)
hugepages setup, and with both "always" and "madvise" for enabled and
defrag.

Running a setup that is intended for debugging purposes (see above) as the
default seems strange to me (and this was probably the reason why we
needed to patch "never" in as the default in the first place). While I am
not yet convinced that this solves the passthrough data corruption issue
entirely, it is very reliably reproducible with THP disabled, and not at
all so far on my test setup with THP enabled - so I propose switching
with the next kernel update, unless there are (serious) objections.

1: https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/ 
2: http://pve.proxmox.com/pipermail/pve-devel/2015-September/017079.html 
3: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/vm/transhuge.txt?h=linux-4.4.y#n95
 



[pve-devel] transparent huge pages support / disk passthrough corruption

2017-01-19 Thread Fabian Grünbichler
So it seems like the recently reported problems[1] with disk passthrough
using virtio-scsi(-single) are caused by a combination of Qemu
since 2.7 not handling memory fragmentation (well) and our compiled-in
default of disabling transparent huge pages on the kernel side.

While I will investigate further and see whether this is not fixable on
the Qemu side as well, I think it would be a good idea to revisit the
decision to patch this default in[2].

@Andreas, Alexandre: you both were proponents of disabling THP support
back then, but the current kernel docs[3] say (emphasis mine):

-%<-
Transparent Hugepage Support can be entirely disabled (*mostly for
debugging purposes*) or only enabled inside MADV_HUGEPAGE regions (to
avoid the risk of consuming more memory resources) or enabled system
wide. This can be achieved with one of:

echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled

It's also possible to limit defrag efforts in the VM to generate
hugepages in case they're not immediately free to madvise regions or
to never try to defrag memory and simply fallback to regular pages
unless hugepages are immediately available. Clearly if we spend CPU
time to defrag memory, we would expect to gain even more by the fact
we use hugepages later instead of regular pages. This isn't always
guaranteed, but it may be more likely in case the allocation is for a
MADV_HUGEPAGE region.

echo always >/sys/kernel/mm/transparent_hugepage/defrag
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo never >/sys/kernel/mm/transparent_hugepage/defrag
->%-

So I think setting both enabled and defrag to "madvise" by default would
be advisable - the admin can override it (permanently with a kernel boot
parameter, or at run time with the sysfs interface) anyway if they
really know it causes performance issues.
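
For completeness, this is roughly what such an override looks like; the
value shown in brackets is the currently active one (the output lines below
are just an example, not a claim about your hosts):

# check the current settings
cat /sys/kernel/mm/transparent_hugepage/enabled
# always [madvise] never
cat /sys/kernel/mm/transparent_hugepage/defrag
# always [madvise] never

# revert at run time, e.g. if a specific workload regresses:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# the enabled side can also be pinned on the kernel command line with
# transparent_hugepage=never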

If you have any hard benchmark data to back up staying at "never",
please send it soon ;) preferably both with and without a (non-transparent)
hugepages setup, and with both "always" and "madvise" for enabled and
defrag.
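
A minimal sketch of such a sweep (the fio line is only a placeholder - any
workload resembling your real load is better):

for mode in always madvise never; do
    echo "$mode" > /sys/kernel/mm/transparent_hugepage/enabled
    echo "$mode" > /sys/kernel/mm/transparent_hugepage/defrag
    echo "== THP $mode =="
    # replace with your benchmark of choice
    fio --name=thp-test --ioengine=libaio --direct=1 --rw=randrw \
        --bs=4k --size=1G --runtime=60 --time_based
done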

Running a setup that is intended for debugging purposes (see above) as the
default seems strange to me (and this was probably the reason why we
needed to patch "never" in as the default in the first place). While I am
not yet convinced that this solves the passthrough data corruption issue
entirely, it is very reliably reproducible with THP disabled, and not at
all so far on my test setup with THP enabled - so I propose switching
with the next kernel update, unless there are (serious) objections.
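
As a sanity check on such a test setup, one can verify that a guest actually
gets THP-backed memory (the VMID and process matching below are just
examples; PVE starts guests as /usr/bin/kvm with an "-id <vmid>" argument):

# system-wide THP usage
grep AnonHugePages /proc/meminfo

# THP-backed anonymous memory of a single guest, e.g. VMID 100
pid=$(pgrep -f 'kvm .*-id 100' | head -n1)
grep AnonHugePages /proc/$pid/smaps | awk '{sum += $2} END {print sum, "kB"}'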

1: https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/
2: http://pve.proxmox.com/pipermail/pve-devel/2015-September/017079.html
3: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/vm/transhuge.txt?h=linux-4.4.y#n95

___
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel