Re: [pve-devel] transparent huge pages support / disk passthrough corruption
It seems that the current implementation is much better than it was in the RHEL-based kernel.

On Thu, Jan 19, 2017 at 9:43 AM, Alexandre DERUMIER wrote:
> Hi,
>
> I have re-enabled THP (transparent_hugepage=madvise) for around a year
> now (with pve-kernel 4.2-4.4), and I don't have the problems anymore
> that I had in the past.
>
> I'm hosting a lot of databases (MySQL, SQL Server, Redis, Mongo, ...)
> and I haven't seen a performance impact since I re-enabled THP.
>
> So I think it's pretty safe to set it by default.
>
> - Original Message -
> From: "Fabian Grünbichler"
> To: "pve-devel"
> Cc: "aderumier", "Andreas Steinel" <a.stei...@gmail.com>
> Sent: Thursday, 19 January 2017 09:35:43
> Subject: transparent huge pages support / disk passthrough corruption
>
> [...]
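For context, the THP mode a node is actually running can be read back from the same sysfs files discussed in this thread; a minimal check (the active value is printed in brackets):

cat /sys/kernel/mm/transparent_hugepage/enabled   # e.g. "always [madvise] never"
cat /sys/kernel/mm/transparent_hugepage/defrag
grep AnonHugePages /proc/meminfo                  # anonymous memory currently backed by huge pages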
Re: [pve-devel] transparent huge pages support / disk passthrough corruption
Hi,

I have re-enabled THP (transparent_hugepage=madvise) for around a year now (with pve-kernel 4.2-4.4), and I don't have the problems anymore that I had in the past.

I'm hosting a lot of databases (MySQL, SQL Server, Redis, Mongo, ...) and I haven't seen a performance impact since I re-enabled THP.

So I think it's pretty safe to set it by default.

- Original Message -
From: "Fabian Grünbichler"
To: "pve-devel"
Cc: "aderumier", "Andreas Steinel"
Sent: Thursday, 19 January 2017 09:35:43
Subject: transparent huge pages support / disk passthrough corruption

[...]
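The transparent_hugepage=madvise boot parameter mentioned above can be made persistent on a stock GRUB-based PVE install roughly like this - a sketch, assuming the usual Debian GRUB configuration:

# /etc/default/grub - append the THP mode to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=madvise"

# then regenerate the bootloader config and reboot
update-grub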
[pve-devel] transparent huge pages support / disk passthrough corruption
So it seems like the recently reported problems[1] with disk passthrough using virtio-scsi(-single) are caused by a combination of Qemu since 2.7 not handling memory fragmentation (well) and our compiled-in default of disabling transparent huge pages on the kernel side.

While I will investigate further and see whether this is not fixable on the Qemu side as well, I think it would be a good idea to revisit the decision to patch this default in[2].

@Andreas, Alexandre: you both were proponents of disabling THP support back then, but the current kernel docs[3] say (emphasis mine):

-%<-
Transparent Hugepage Support can be entirely disabled (*mostly for debugging purposes*) or only enabled inside MADV_HUGEPAGE regions (to avoid the risk of consuming more memory resources) or enabled system wide. This can be achieved with one of:

echo always >/sys/kernel/mm/transparent_hugepage/enabled
echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
echo never >/sys/kernel/mm/transparent_hugepage/enabled

It's also possible to limit defrag efforts in the VM to generate hugepages in case they're not immediately free to madvise regions or to never try to defrag memory and simply fallback to regular pages unless hugepages are immediately available. Clearly if we spend CPU time to defrag memory, we would expect to gain even more by the fact we use hugepages later instead of regular pages. This isn't always guaranteed, but it may be more likely in case the allocation is for a MADV_HUGEPAGE region.

echo always >/sys/kernel/mm/transparent_hugepage/defrag
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo never >/sys/kernel/mm/transparent_hugepage/defrag
->%-

so I think setting both "enabled" and "defrag" to "madvise" by default would be advisable - the admin can override it (permanently with a kernel boot parameter, or at run time with the sysfs interface) anyway if they really know it causes performance issues.

if you have any hard benchmark data to back up staying at "never", please send it soon ;) preferably both with a non-transparent hugepages setup and without, and with both "always" and "madvise" for "enabled" and "defrag".

running a setup that is intended for debugging purposes (see above) as the default seems strange to me (and this was probably the reason why we needed to patch in "never" as the default in the first place). while I am not yet convinced that this solves the passthrough data corruption issue entirely, it is very reliably reproducible with THP disabled, and so far not at all on my test setup with THP enabled - so I propose switching with the next kernel update, unless there are (serious) objections.

1: https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/
2: http://pve.proxmox.com/pipermail/pve-devel/2015-September/017079.html
3: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/Documentation/vm/transhuge.txt?h=linux-4.4.y#n95
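To see what the proposed "madvise" default actually does for a running guest, the mode can be switched at run time and the effect inspected per QEMU process - a sketch only; <pid> is a placeholder for the VM's kvm process ID, and in "madvise" mode only regions the process marks with MADV_HUGEPAGE (which, going by the behaviour described above, Qemu appears to do for guest RAM) are eligible:

echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

# sum the AnonHugePages entries in the process's smaps to see how much
# of the VM's anonymous memory is currently backed by huge pages
grep AnonHugePages /proc/<pid>/smaps | awk '{sum += $2} END {print sum " kB"}'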