Re: [pve-devel] [PATCH kernel] By default disable the new dynamic halt polling behavior
Does upstream know about this?

Stefan

Excuse my typo, sent from my mobile phone.

> On 12.05.2016 at 12:51, Wolfgang Bumiller wrote:
>
> The default behavior introduced by kernel commit aca6ff29c
> (KVM: dynamic halt-polling) causes a spike in cpu usage and
> massive performance degradation with virtio network under
> network load. This patch changes the newly introduced kvm
> module parameters to reflect the old behavior.
> [full patch quoted; see the original posting below]

_______________________________________________
pve-devel mailing list
pve-devel@pve.proxmox.com
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH 1/2] add hugepages option
> 2M : /run/hugepages/kvm
> 1G : /run/hugepages/kvm_1GB
>
> for example
>
> (It's possible to have /dev/hugepages and /run/hugepages/kvm at the same
> time, as the files are allocated by the processes using the pages, so there
> is no conflict)

Would be OK for me.
Re: [pve-devel] [PATCH 1/2] add hugepages option
> a simple
>
> echo x > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages
> echo x > /sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/nr_hugepages
>
> is enough (needs to be done for each NUMA node)
>
> free hugepages can be checked with
>
> cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages
> cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages
>
> This way, the user can, if he wants, reserve hugepages at boot (sysctl or
> kernel options in grub). And if he doesn't define hugepages, or doesn't
> define enough (free_hugepages counter), we can try to allocate them for him.

Yes, that sounds reasonable to me. It would be great if we can get that
working without the need to configure something (grub.conf, fstab, ...)
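The arithmetic behind those `echo x > .../nr_hugepages` writes is simple enough to sketch. The helper below is hypothetical (the function name and its use are illustrative, not part of the patch):

```shell
# Hypothetical helper: how many hugepages of a given size are needed to back
# a VM of vm_mem_mb megabytes (rounding up so the VM always fits).
hugepages_needed() {
    vm_mem_mb=$1
    page_mb=$2
    echo $(( (vm_mem_mb + page_mb - 1) / page_mb ))
}

hugepages_needed 4096 2      # 4GB VM on 2MB pages -> 2048
hugepages_needed 4096 1024   # 4GB VM on 1GB pages -> 4
```

The resulting count is what would then be echoed into hugepages-2048kB/nr_hugepages (or the 1048576kB variant) on the target NUMA node.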
Re: [pve-devel] [PATCH 1/2] add hugepages option
> Technically, it's possible to allocate 1 page of 1GB and 256 pages of 2MB,
> with 2 mount points.
>
> But it's a little bit more complex
>
> Opinions ?

Why not. Also, allocation of 1GB pages may fail due to fragmentation, while
allocation of 2MB pages still works?
Re: [pve-devel] [PATCH 1/2] add hugepages option
>> who mounts that?

>> # mount | grep huge
>> cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb,nsroot=/)
>> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)

cgmanager.service

>> hugetlb on /run/lxcfs/controllers/hugetlb type cgroup
>> (rw,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb,nsroot=/)

lxcfs.service

So we already have a default hugetlbfs of 2MB in /dev/hugepages. For 1GB, we
can define another mount point. OpenStack uses

2M : /run/hugepages/kvm
1G : /run/hugepages/kvm_1GB

for example.

(It's possible to have /dev/hugepages and /run/hugepages/kvm at the same time,
as the files are allocated by the processes using the pages, so there is no
conflict.)

----- Original Message -----
From: "dietmar"
To: "aderumier", "pve-devel"
Sent: Thursday, May 12, 2016 12:57:42
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
Re: [pve-devel] [PATCH 1/2] add hugepages option
I have tested dynamic allocation/deallocation of 2M and 1G hugepages; it's
working for me (at least if memory is not too fragmented).

A simple

echo x > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages
echo x > /sys/devices/system/node/nodeX/hugepages/hugepages-2048kB/nr_hugepages

is enough (needs to be done for each NUMA node).

Free hugepages can be checked with

cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/free_hugepages
cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages

This way, the user can, if he wants, reserve hugepages at boot (sysctl or
kernel options in grub). And if he doesn't define hugepages, or doesn't define
enough (free_hugepages counter), we can try to allocate them for him.

----- Original Message -----
From: "aderumier"
To: "dietmar"
Cc: "pve-devel"
Sent: Friday, May 13, 2016 05:21:20
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
Re: [pve-devel] [PATCH 1/2] add hugepages option
Another question: currently my patch manages only one size of hugepages --
one /dev/hugepages mount, with 2MB or 1GB hugepages.

That means that if the user wants to use 1GB hugepages, he can only define VM
memory in multiples of 1GB (e.g. 1.5G will not work).

Technically, it's possible to allocate 1 page of 1GB and 256 pages of 2MB,
with 2 mount points.

But it's a little bit more complex.

Opinions ?

----- Original Message -----
From: "dietmar"
To: "aderumier"
Cc: "pve-devel"
Sent: Thursday, May 12, 2016 17:32:41
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
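For the mixed case, the split itself is straightforward to compute. This is only an illustrative sketch (the function name is made up, not code from the patch):

```shell
# Illustrative sketch: split a VM memory size (in MB) into 1GB pages plus
# 2MB pages for the remainder, as in the 1.5G example above.
split_hugepages() {
    vm_mem_mb=$1
    gb_pages=$(( vm_mem_mb / 1024 ))
    mb2_pages=$(( (vm_mem_mb % 1024) / 2 ))
    echo "$gb_pages $mb2_pages"
}

split_hugepages 1536   # 1.5G -> "1 256" (1 page of 1GB + 256 pages of 2MB)
split_hugepages 4096   # 4G   -> "4 0"   (multiples of 1GB need no 2MB pages)
```

The real complexity is not the arithmetic but needing two hugetlbfs mount points and two per-node pools, as noted above.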
Re: [pve-devel] [PATCH 1/2] add hugepages option
> but I have a default
>
> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)

I wonder what program creates that? Will try to find out ...
Re: [pve-devel] [PATCH 1/2] add hugepages option
>> # mount | grep huge
>> cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb,nsroot=/)
>> hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
>> hugetlb on /run/lxcfs/controllers/hugetlb type cgroup
>> (rw,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb,nsroot=/)
>>
>> who mount that?

I don't have this on my proxmox 4 nodes (cgmanager is running, but I don't
have containers running), but I have a default

hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)

----- Original Message -----
From: "dietmar"
To: "aderumier", "pve-devel"
Sent: Thursday, May 12, 2016 12:57:42
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
Re: [pve-devel] [PATCH 1/2] add hugepages option
Looking at the old article https://lwn.net/Articles/376606/ about
vm.nr_overcommit_hugepages:

"Knowing the exact huge page requirements in advance may not be possible. For
example, the huge page requirements may be expected to vary throughout the
lifetime of the system. In this case, the maximum number of additional huge
pages that should be allocated is specified with vm.nr_overcommit_hugepages.
When a huge page pool does not have sufficient pages to satisfy a request for
huge pages, an attempt to allocate up to nr_overcommit_hugepages is made. If
an allocation fails, the result will be that mmap() will fail to avoid page
fault failures as described in Huge Page Fault Behaviour in part 1."

So, maybe if we set vm.nr_overcommit_hugepages, applications can reserve
hugepages without the need to increase them manually.

I need to test that.

----- Original Message -----
From: "aderumier"
To: "dietmar"
Cc: "pve-devel"
Sent: Thursday, May 12, 2016 14:05:56
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
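If that test works out, the host setup could be as small as a sysctl fragment. The values below are illustrative assumptions, not something tested here:

```
# /etc/sysctl.d/hugepages.conf (illustrative sketch, untested)
vm.nr_hugepages = 0                # reserve nothing at boot
vm.nr_overcommit_hugepages = 2048  # allow up to 2048 extra 2MB pages (4GB) on demand
```

Per the LWN quote above, a failed on-demand allocation would then surface as an mmap() failure at VM start rather than a page fault later.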
Re: [pve-devel] [PATCH 1/2] add hugepages option
>> Normally hugepages are setup via sysctl.conf. Default size is always 2 MB.
>> No need for kernel commandline editing.

For 2MB yes; for 1GB hugepages I'm not sure you can enable them through sysctl.

>> Why exactly are you using hugepages?

Needed for ovs + vhost-user + dpdk.

>> Can KVM handle hugepages? Normally hugepages implies no KSM, isn't that
>> right?

Yes sure, it can handle it. Memory is preallocated at VM start, so KSM can't
work here.

----- Original Message -----
From: "Andreas Steinel"
To: "dietmar", "pve-devel"
Cc: "aderumier"
Sent: Thursday, May 12, 2016 13:30:08
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
Re: [pve-devel] [PATCH common] support for predictable network interface device names
On Thu, May 12, 2016 at 3:05 PM, Wolfgang Bumiller wrote:
> On Thu, May 12, 2016 at 02:30:11PM +0300, Igor Vlasenko wrote:
>> On Thu, May 12, 2016 at 2:08 PM, Wolfgang Bumiller wrote:
>> > Could you review the following modified version of your old patch?
>>
>> seems good.
>> The only question I have does not concern the patch, but the current code
>> that is being patched. [...]
>
> This portion reads the interfaces from /proc/net/dev where the names are
> terminated with a colon.

Then everything is ok, looking forward to see it applied.
Re: [pve-devel] [PATCH 1/2] add hugepages option
>> who mount that? Do I still need above setup? How do I know the number
>> of required hugepages in advance? How can we make that more convenient
>> for the user?

It's quite possible to increase/decrease hugepages online, at least for 2MB
hugepages, through sysfs.

echo X > /proc/sys/vm/nr_hugepages

can be done at VM start/stop, for example.

For 1GB hugepages, it's more difficult because of memory fragmentation.

But basically, THP can already do the job for 2MB hugepages. For 1GB, THP
doesn't work, and the recommendation is to define them at boot, because they
need contiguous memory.

----- Original Message -----
From: "dietmar"
To: "aderumier", "pve-devel"
Sent: Thursday, May 12, 2016 12:57:42
Subject: Re: [pve-devel] [PATCH 1/2] add hugepages option
[...]
Re: [pve-devel] [PATCH common] support for predictable network interface device names
On Thu, May 12, 2016 at 02:30:11PM +0300, Igor Vlasenko wrote:
> On Thu, May 12, 2016 at 2:08 PM, Wolfgang Bumiller wrote:
> > On Thu, May 12, 2016 at 11:42:29AM +0300, Igor Vlasenko wrote:
> >> On Wed, May 11, 2016 at 10:56 PM, Igor Vlasenko wrote:
> >> > This is an improved version of my previous patch
> >> > [ support for udev-style physical interface names (like enp3s0),
> >> > http://pve.proxmox.com/pipermail/pve-devel/2016-May/020958.html ]
> >> > thanks to Wolfgang.
> >>
> >> Yesterday I finished coding just before going to sleep, so I was a bit
> >> muddleheaded and left a few mistakes :( Here I fixed them and added a
> >> test case to verify.
> >
> > Ah now you added all the ones described in those links. But AFAIK wlan
> > interfaces cannot be added to bridges.
> > This seems a little overkill (and there are a few style concerns in the
> > code).
>
> I was thinking that repeating the same (possibly incomplete) pattern
> again and again is the code clone bug pattern.
> To have one regex in one place, though it looks a bit overkill, will help
> to maintain the code in the future.
>
> > Maybe it's best to stick to the devices we know users are currently
> > dealing with and just stick to including 'en'-prefixed devices.
> > Iow. a variant of your old patch with just 'en' instead of 'enp' and the
> > veth/tap hunks removed.
>
> I have no objections.
>
> > Could you review the following modified version of your old patch?
>
> seems good.
> The only question I have does not concern the patch, but the current code
> that is being patched.
>
> --- a/src/PVE/INotify.pm
> +++ b/src/PVE/INotify.pm
> @@ -800,7 +800,7 @@ sub __read_etc_network_interfaces {
>
>      if ($proc_net_dev) {
>          while (defined ($line = <$proc_net_dev>)) {
> -            if ($line =~ m/^\s*(eth\d+):.*/) {
> +            if ($line =~ m/^\s*(eth\d+|en[^:.]+):.*/) {
>                  $ifaces->{$1}->{exists} = 1;
>              }
>          }
>
> in the line 803:
> -            if ($line =~ m/^\s*(eth\d+):.*/) {
> Should it be a colon (`:') there? as in eth0:something?
> Should not it be a point (`.')? as in eth0.1?
> If it is ok, then the patch is ok, else the patch also should use a point
> instead of the colon.

This portion reads the interfaces from /proc/net/dev where the names are
terminated with a colon.
Re: [pve-devel] [PATCH 1/2] add hugepages option
Normally hugepages are setup via sysctl.conf. Default size is always 2 MB. No
need for kernel commandline editing.

Why exactly are you using hugepages?

Can KVM handle hugepages? Normally hugepages implies no KSM, isn't that right?

On Thu, May 12, 2016 at 12:57 PM, Dietmar Maurer wrote:
> [...]
Re: [pve-devel] [PATCH common] support for predictable network interface device names
On Thu, May 12, 2016 at 2:08 PM, Wolfgang Bumiller wrote:
> On Thu, May 12, 2016 at 11:42:29AM +0300, Igor Vlasenko wrote:
>> On Wed, May 11, 2016 at 10:56 PM, Igor Vlasenko wrote:
>> > This is an improved version of my previous patch
>> > [ support for udev-style physical interface names (like enp3s0),
>> > http://pve.proxmox.com/pipermail/pve-devel/2016-May/020958.html ]
>> > thanks to Wolfgang.
>>
>> Yesterday I finished coding just before going to sleep, so I was a bit
>> muddleheaded and left a few mistakes :( Here I fixed them and added a test
>> case to verify.
>
> Ah now you added all the ones described in those links. But AFAIK wlan
> interfaces cannot be added to bridges.
> This seems a little overkill (and there are a few style concerns in the
> code).

I was thinking that repeating the same (possibly incomplete) pattern again
and again is the code clone bug pattern.
To have one regex in one place, though it looks a bit overkill, will help to
maintain the code in the future.

> Maybe it's best to stick to the devices we know users are currently
> dealing with and just stick to including 'en'-prefixed devices.
> Iow. a variant of your old patch with just 'en' instead of 'enp' and the
> veth/tap hunks removed.

I have no objections.

> Could you review the following modified version of your old patch?

seems good.
The only question I have does not concern the patch, but the current code
that is being patched.

--- a/src/PVE/INotify.pm
+++ b/src/PVE/INotify.pm
@@ -800,7 +800,7 @@ sub __read_etc_network_interfaces {

     if ($proc_net_dev) {
         while (defined ($line = <$proc_net_dev>)) {
-            if ($line =~ m/^\s*(eth\d+):.*/) {
+            if ($line =~ m/^\s*(eth\d+|en[^:.]+):.*/) {
                 $ifaces->{$1}->{exists} = 1;
             }
         }

in the line 803:
-            if ($line =~ m/^\s*(eth\d+):.*/) {
Should it be a colon (`:') there? as in eth0:something?
Should not it be a point (`.')? as in eth0.1?
If it is ok, then the patch is ok, else the patch also should use a point
instead of the colon.
> [modified patch quoted in full -- see the original posting below]

--
Best regards,
Igor Vlasenko.
Re: [pve-devel] [PATCH kernel] By default disable the new dynamic halt polling behavior
applied
Re: [pve-devel] [PATCH common] support for predictable network interface device names
On Thu, May 12, 2016 at 11:42:29AM +0300, Igor Vlasenko wrote:
> On Wed, May 11, 2016 at 10:56 PM, Igor Vlasenko wrote:
> > This is an improved version of my previous patch
> > [ support for udev-style physical interface names (like enp3s0),
> > http://pve.proxmox.com/pipermail/pve-devel/2016-May/020958.html ]
> > thanks to Wolfgang.
>
> Yesterday I finished coding just before going to sleep, so I was a bit
> muddleheaded and left a few mistakes :( Here I fixed them and added a test
> case to verify.

Ah now you added all the ones described in those links. But AFAIK wlan
interfaces cannot be added to bridges.
This seems a little overkill (and there are a few style concerns in the code).

Maybe it's best to stick to the devices we know users are currently dealing
with and just stick to including 'en'-prefixed devices.
Iow. a variant of your old patch with just 'en' instead of 'enp' and the
veth/tap hunks removed.

Could you review the following modified version of your old patch?

---
 src/PVE/INotify.pm | 8 ++++----
 src/PVE/Network.pm | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/PVE/INotify.pm b/src/PVE/INotify.pm
index 74a0fe1..c34659f 100644
--- a/src/PVE/INotify.pm
+++ b/src/PVE/INotify.pm
@@ -800,7 +800,7 @@ sub __read_etc_network_interfaces {

     if ($proc_net_dev) {
         while (defined ($line = <$proc_net_dev>)) {
-            if ($line =~ m/^\s*(eth\d+):.*/) {
+            if ($line =~ m/^\s*(eth\d+|en[^:.]+):.*/) {
                 $ifaces->{$1}->{exists} = 1;
             }
         }
@@ -973,7 +973,7 @@ sub __read_etc_network_interfaces {
             $ifaces->{$1}->{exists} = 0;
             $d->{exists} = 0;
         }
-    } elsif ($iface =~ m/^eth\d+$/) {
+    } elsif ($iface =~ m/^(?:eth\d+|en[^:.]+)$/) {
         if (!$d->{ovs_type}) {
             $d->{type} = 'eth';
         } elsif ($d->{ovs_type} eq 'OVSPort') {
@@ -1200,7 +1200,7 @@ sub __write_etc_network_interfaces {
             $d->{type} eq 'OVSBond') {
             my $brname = $used_ports->{$iface};
             if (!$brname || !$ifaces->{$brname}) {
-                if ($iface =~ /^eth/) {
+                if ($iface =~ /^(?:eth|en)/) {
                     $ifaces->{$iface} = { type => 'eth',
                                           exists => 1,
                                           method => 'manual',
@@ -1289,7 +1289,7 @@ NETWORKDOC
     my $pri;
     if ($iface eq 'lo') {
         $pri = $if_type_hash->{loopback};
-    } elsif ($iface =~ m/^eth\d+$/) {
+    } elsif ($iface =~ m/^(?:eth\d+|en[^:.]+)$/) {
         $pri = $if_type_hash->{eth} + $child;
     } elsif ($iface =~ m/^bond\d+$/) {
         $pri = $if_type_hash->{bond} + $child;
diff --git a/src/PVE/Network.pm b/src/PVE/Network.pm
index be26132..0a984ad 100644
--- a/src/PVE/Network.pm
+++ b/src/PVE/Network.pm
@@ -440,7 +440,7 @@ sub activate_bridge_vlan {

     my @ifaces = ();
     my $dir = "/sys/class/net/$bridge/brif";
-    PVE::Tools::dir_glob_foreach($dir, '((eth|bond)\d+(\.\d+)?)', sub {
+    PVE::Tools::dir_glob_foreach($dir, '(((eth|bond)\d+|en[^.]+)(\.\d+)?)', sub {
         push @ifaces, $_[0];
     });
--
2.1.4
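The new physical-interface pattern can be sanity-checked from the shell with an ERE equivalent of the Perl regex (this is only an illustrative check, not the patch's own test case; the sample names are examples):

```shell
# Illustrative check of the pattern from the patch, translated from Perl
# regex syntax (\d -> [0-9]) to grep's extended regular expressions.
matches_physical() {
    printf '%s\n' "$1" | grep -qE '^(eth[0-9]+|en[^:.]+)$'
}

for name in eth0 enp3s0 ens18; do
    matches_physical "$name" && echo "$name: physical interface"
done
matches_physical eth0.100 || echo "eth0.100: not matched (VLAN subinterface)"
matches_physical wlan0    || echo "wlan0: not matched"
```

The `[^:.]` class is what keeps VLAN subinterfaces (`.`) and, in the /proc/net/dev case, the trailing colon out of the captured name.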
Re: [pve-devel] [PATCH 1/2] add hugepages option
> host configuration
> ------------------
> hugepages need to be allocated at boot
>
> for 4GB of 2M hugepages
>
> /etc/default/grub
> -----------------
> GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=2M hugepages=2048"
>
> /etc/fstab
> ----------
> hugetlbfs /dev/hugepages hugetlbfs pagesize=2048k 0 0
>
> for 4GB of 1GB hugepages
>
> /etc/default/grub
> -----------------
> GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G
> hugepages=4"

It is still unclear to me how to setup hugepages. On my host /dev/hugepages
is already mounted:

# mount | grep huge
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb,nsroot=/)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
hugetlb on /run/lxcfs/controllers/hugetlb type cgroup (rw,relatime,hugetlb,release_agent=/run/cgmanager/agents/cgm-release-agent.hugetlb,nsroot=/)

Who mounts that? Do I still need the above setup? How do I know the number of
required hugepages in advance? How can we make that more convenient for the
user?
[pve-devel] [PATCH kernel] By default disable the new dynamic halt polling behavior
The default behavior introduced by kernel commit aca6ff29c
(KVM: dynamic halt-polling) causes a spike in cpu usage and
massive performance degradation with virtio network under
network load. This patch changes the newly introduced kvm
module parameters to reflect the old behavior.
---
 Makefile                                       |  1 +
 kvm-dynamic-halt-polling-disable-default.patch | 12 ++++++++++++
 2 files changed, 13 insertions(+)
 create mode 100644 kvm-dynamic-halt-polling-disable-default.patch

diff --git a/Makefile b/Makefile
index 5851d9d..7fb83d5 100644
--- a/Makefile
+++ b/Makefile
@@ -242,6 +242,7 @@ ${KERNEL_SRC}/README ${KERNEL_CFG_ORG}: ${KERNELSRCTAR}
 	cd ${KERNEL_SRC}; patch -p1 < ../CVE-2016-4485-net-fix-infoleak-in-llc.patch
 	cd ${KERNEL_SRC}; patch -p1 < ../CVE-2016-4486-net-fix-infoleak-in-rtnetlink.patch
 	cd ${KERNEL_SRC}; patch -p1 < ../CVE-2016-4558-bpf-fix-refcnt-overflow.patch
+	cd ${KERNEL_SRC}; patch -p1 < ../kvm-dynamic-halt-polling-disable-default.patch
 	sed -i ${KERNEL_SRC}/Makefile -e 's/^EXTRAVERSION.*$$/EXTRAVERSION=${EXTRAVERSION}/'
 	touch $@

diff --git a/kvm-dynamic-halt-polling-disable-default.patch b/kvm-dynamic-halt-polling-disable-default.patch
new file mode 100644
index 000..dcf1dee
--- /dev/null
+++ b/kvm-dynamic-halt-polling-disable-default.patch
@@ -0,0 +1,12 @@
+diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
+--- a/virt/kvm/kvm_main.c	2016-05-12 10:39:37.540387127 +0200
++++ b/virt/kvm/kvm_main.c	2016-05-04 10:43:38.063996221 +0200
+@@ -71,7 +71,7 @@ static unsigned int halt_poll_ns = KVM_H
+ module_param(halt_poll_ns, uint, S_IRUGO | S_IWUSR);
+ 
+ /* Default doubles per-vcpu halt_poll_ns. */
+-static unsigned int halt_poll_ns_grow = 2;
++static unsigned int halt_poll_ns_grow = 0;
+ module_param(halt_poll_ns_grow, int, S_IRUGO);
+ 
+ /* Default resets per-vcpu halt_poll_ns . */
--
2.1.4
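For testing the same defaults on an unpatched kernel, a module option should suffice; note that, per the hunk above, `halt_poll_ns` is runtime-writable (S_IRUGO | S_IWUSR) while `halt_poll_ns_grow` is read-only (S_IRUGO), so the latter can only be set when the kvm module is loaded. A sketch (file name chosen arbitrarily):

```
# /etc/modprobe.d/kvm-halt-poll.conf -- approximates the patched default
# without rebuilding the kernel; takes effect on the next kvm module load.
options kvm halt_poll_ns_grow=0
```

The patch hard-codes the new default instead, so the changed behavior also applies to hosts that never touch modprobe configuration.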
Re: [pve-devel] [PATCH common] replace the smartmatch operator
I prefer the following code:

    my @f100 = sort @{$ifaces->{vmbr0}->{families}};
    die "invalid families defined for vmbr0"
	if (scalar(@f100) != 2) || ($f100[0] ne 'inet') || ($f100[1] ne 'inet6');

> +# Compare two arrays of strings
> +sub strarray_equals($$) {
> +    my ($left, $right) = @_;
> +    return ref($left) && ref($right) &&
> +	ref($left) eq 'ARRAY' &&
> +	ref($right) eq 'ARRAY' &&
> +	scalar(@$left) == scalar(@$right) &&
> +	!grep { $left->[$_] ne $right->[$_] } (0..(@$left-1));
> +}
> +
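For readers less fluent in Perl, the semantics of the quoted helper can be illustrated with a rough Python equivalent (my sketch, not part of the patch): both arguments must be arrays, of equal length, with pairwise-equal elements, and order matters.

```python
def strarray_equals(left, right):
    # Mirrors the Perl helper quoted above: both operands must be
    # array(-like) values of equal length whose elements compare equal
    # position by position. Order-sensitive, unlike a set comparison.
    return (isinstance(left, list) and isinstance(right, list)
            and len(left) == len(right)
            and all(l == r for l, r in zip(left, right)))
```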
Re: [pve-devel] [PATCH rebased container] setup: add ct_is_executable wrapper
applied
[pve-devel] [PATCH rebased container] setup: add ct_is_executable wrapper
(This is only a minor fix since these functions are run while chrooted
into the container directory anyway.)
---
 src/PVE/LXC/Setup/Base.pm | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/PVE/LXC/Setup/Base.pm b/src/PVE/LXC/Setup/Base.pm
index d54c0cd..927f779 100644
--- a/src/PVE/LXC/Setup/Base.pm
+++ b/src/PVE/LXC/Setup/Base.pm
@@ -159,7 +159,7 @@ sub setup_init {
 sub setup_systemd_console {
     my ($self, $conf) = @_;

-    my $systemd_dir_rel = -x "/lib/systemd/systemd" ?
+    my $systemd_dir_rel = $self->ct_is_executable("/lib/systemd/systemd") ?
 	"/lib/systemd/system" : "/usr/lib/systemd/system";

     my $systemd_getty_service_rel = "$systemd_dir_rel/getty\@.service";
@@ -200,7 +200,7 @@ sub setup_systemd_console {
 sub setup_container_getty_service {
     my ($self, $nosubdir) = @_;

-    my $systemd_dir_rel = -x "/lib/systemd/systemd" ?
+    my $systemd_dir_rel = $self->ct_is_executable("/lib/systemd/systemd") ?
 	"/lib/systemd/system" : "/usr/lib/systemd/system";

     my $servicefile = "$systemd_dir_rel/container-getty\@.service";
     my $raw = $self->ct_file_get_contents($servicefile);
@@ -549,6 +549,11 @@ sub ct_is_symlink {
     return -l $file;
 }

+sub ct_is_executable {
+    my ($self, $file) = @_;
+    return -x $file
+}
+
 sub ct_stat {
     my ($self, $file) = @_;
     return File::stat::stat($file);
--
2.1.4
Re: [pve-devel] [PATCH common] support for predictable network interface device names
On Wed, May 11, 2016 at 10:56 PM, Igor Vlasenko wrote:
> This is an improved version of my previous patch
> [ support for udev-style physical interface names (like enp3s0),
> http://pve.proxmox.com/pipermail/pve-devel/2016-May/020958.html ]
> thanks to Wolfgang.

Yesterday I finished coding just before going to sleep, so I was a bit
muddleheaded and left a few mistakes :( Here I fixed them and added a
test case to verify.

Signed-off-by: Igor Vlasenko
---
 src/Makefile                                  |  1 +
 src/PVE/INotify.pm                            | 10 +++--
 src/PVE/Network.pm                            |  7 ++-
 src/PVE/NetworkInterfaces.pm                  | 50 ++
 src/PVE/Tools.pm                              | 11 +
 .../t.is_physical_interface.pl                | 50 ++
 6 files changed, 123 insertions(+), 6 deletions(-)
 create mode 100644 src/PVE/NetworkInterfaces.pm
 create mode 100644 test/etc_network_interfaces/t.is_physical_interface.pl

diff --git a/src/Makefile b/src/Makefile
index a07e2e4..02265e0 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -10,6 +10,7 @@ LIB_SOURCES=\
 	Daemon.pm\
 	SectionConfig.pm\
 	Network.pm\
+	NetworkInterfaces.pm\
 	ProcFSTools.pm\
 	CLIHandler.pm\
 	RESTHandler.pm\
diff --git a/src/PVE/INotify.pm b/src/PVE/INotify.pm
index 74a0fe1..ce83c16 100644
--- a/src/PVE/INotify.pm
+++ b/src/PVE/INotify.pm
@@ -14,6 +14,7 @@ use Fcntl qw(:DEFAULT :flock);
 use PVE::SafeSyslog;
 use PVE::Exception qw(raise_param_exc);
 use PVE::Tools;
+use PVE::NetworkInterfaces;
 use Storable qw(dclone);
 use Linux::Inotify2;
 use base 'Exporter';
@@ -800,7 +801,8 @@ sub __read_etc_network_interfaces {
     if ($proc_net_dev) {
 	while (defined ($line = <$proc_net_dev>)) {
-	    if ($line =~ m/^\s*(eth\d+):.*/) {
+	    if ($line =~ m/^\s*([^:]+):.*/
+		and PVE::NetworkInterfaces::is_physical_interface($1)) {
 		$ifaces->{$1}->{exists} = 1;
 	    }
 	}
@@ -973,7 +975,7 @@ sub __read_etc_network_interfaces {
 		$ifaces->{$1}->{exists} = 0;
 		$d->{exists} = 0;
 	    }
-	} elsif ($iface =~ m/^eth\d+$/) {
+	} elsif (PVE::NetworkInterfaces::is_physical_interface($iface)) {
 	    if (!$d->{ovs_type}) {
 		$d->{type} = 'eth';
 	    } elsif ($d->{ovs_type} eq 'OVSPort') {
@@ -1200,7 +1202,7 @@ sub __write_etc_network_interfaces {
 		 $d->{type} eq 'OVSBond') {
 	    my $brname = $used_ports->{$iface};
 	    if (!$brname || !$ifaces->{$brname}) {
-		if ($iface =~ /^eth/) {
+		if (PVE::NetworkInterfaces::is_physical_interface($iface)) {
 		    $ifaces->{$iface} = { type => 'eth',
 					  exists => 1,
 					  method => 'manual',
@@ -1289,7 +1291,7 @@ NETWORKDOC
 	my $pri;
 	if ($iface eq 'lo') {
 	    $pri = $if_type_hash->{loopback};
-	} elsif ($iface =~ m/^eth\d+$/) {
+	} elsif (PVE::NetworkInterfaces::is_physical_interface($iface)) {
 	    $pri = $if_type_hash->{eth} + $child;
 	} elsif ($iface =~ m/^bond\d+$/) {
 	    $pri = $if_type_hash->{bond} + $child;
diff --git a/src/PVE/Network.pm b/src/PVE/Network.pm
index be26132..1d15990 100644
--- a/src/PVE/Network.pm
+++ b/src/PVE/Network.pm
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use PVE::Tools qw(run_command);
 use PVE::ProcFSTools;
+use PVE::NetworkInterfaces;
 use PVE::INotify;
 use File::Basename;
 use IO::Socket::IP;
@@ -440,8 +441,10 @@ sub activate_bridge_vlan {
     my @ifaces = ();
     my $dir = "/sys/class/net/$bridge/brif";
-    PVE::Tools::dir_glob_foreach($dir, '((eth|bond)\d+(\.\d+)?)', sub {
-	push @ifaces, $_[0];
+    PVE::Tools::dir_lambda_foreach($dir, sub {
+	push @ifaces, $_[0] if $_[0] =~ /^(?:bond|eth)\d+(\.\d+)?/
+	    or PVE::NetworkInterfaces::is_physical_interface($_[0]);
+	}
     });

     die "no physical interface on bridge '$bridge'\n"
 	if scalar(@ifaces) == 0;
diff --git a/src/PVE/NetworkInterfaces.pm b/src/PVE/NetworkInterfaces.pm
new file mode 100644
index 000..80daf31
--- /dev/null
+++ b/src/PVE/NetworkInterfaces.pm
@@ -0,0 +1,50 @@
+package PVE::NetworkInterfaces;
+
+use strict;
+use warnings;
+
+# alternatively, on the linux kernel we can readlink /sys/class/net/$iface
+# and check whether it points to ../../devices/virtual/...
+
+# physical network interface pattern matching.
+# matching for predictable network interface device names is based on
+# https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c
+# http://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames
+sub is_physical_interface {
+    my ($iface) = @_;
+    return $iface =~
+	/^(?:
+	    # legacy interface names
+	    eth\d+
+
+	| # predictable network interface device names
+
+	    # two character prefixes:
+	    # en — Ethernet
+	    #
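The quoted regex is cut off by the archive at this point. As a loose illustration of what such matching looks like, here is a small Python sketch; the prefix list and suffix forms are my assumptions based on the systemd naming scheme the patch cites, not the patch's actual (truncated) Perl regex.

```python
import re

# Assumed systemd-style prefixes: en = Ethernet, wl = WLAN, ww = WWAN.
# Suffix forms (PCI path, onboard, hotplug slot, MAC) are also assumptions.
PHYSICAL_IFACE_RE = re.compile(r"""
    ^(?:
        eth\d+                      # legacy kernel names, e.g. eth0
      | (?:en|wl|ww)                # predictable-name prefixes
        (?: p\d+s\d+(?:f\d+)?       #   PCI slot/function, e.g. enp3s0
          | o\d+                    #   onboard index, e.g. eno1
          | s\d+                    #   hotplug slot, e.g. ens1
          | x[0-9a-f]{12}           #   MAC-based, e.g. enx78e7d1ea46da
        )
    )$
""", re.VERBOSE)

def is_physical_interface(iface):
    return bool(PHYSICAL_IFACE_RE.match(iface))
```

Note that bridges (vmbr0), bonds, and tap devices deliberately fall through, just as in the patch: callers that also want bonds or VLAN subinterfaces match those separately.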
Re: [pve-devel] [PATCH container v2] setup: check if securetty exists
applied
Re: [pve-devel] [PATCH kernel] Fix CVE-2016-4485, CVE-2016-4486, CVE-2016-4558
applied