Hello,

I've got a Proxmox VE 4.3 cluster of 12 nodes. All of them are Dell C6220 sleds, each with two Intel Xeon E5-2670 CPUs and 64 GB of RAM. I have two separate networks: a 1 Gbps LAN (Cisco 4948 switch) and a 10 Gbps storage network (Cisco N3K-3064PQ fiber switch). The Dell nodes use their integrated Intel gigabit adapters for the LAN and Intel PCI-E 10 Gbps cards (ixgbe driver) for the fiber network. The storage servers are separate machines running FreeNAS, which export their shares over NFS. My virtual machines (about 40 so far) are KVM guests with QCOW2 disks stored on the FreeNAS storage. So far, so good. I've been using this environment for testing and was almost ready to move it into production.
But I have a problem with the cluster. From time to time the pveproxy service dies on the nodes, or the web UI shows every node except the one I'm logged into as unreachable (red cross). Other times all nodes are shown as working (green status), but when I try to connect to a virtual machine I get a 'connection refused' error. While the cluster is in this state I can't migrate VMs or do any other VM management (e.g. console, start/stop/reset, creating a new VM).

When this happens, the only way to recover is to power down all 12 nodes and start them one after another. After that everything works properly for a random amount of time: sometimes weeks, sometimes only a few days.

I followed the network troubleshooting guide (omping, multicast, etc.) and confirmed that multicast is enabled; the tests didn't report any errors. The /etc/hosts file on every node contains the correct hostname/IP list for all nodes.

When I run 'service pve-cluster restart' I get these errors: http://pastebin.com/NXnEf4rd (running pmxcfs manually mounts /etc/pve properly, but doesn't fix the cluster/proxy issue).

pvecm status: http://pastebin.com/jsDFkqu3 (I powered down one node, which is why it's missing)
pvecm nodes: http://pastebin.com/1WR8Yij8
Corosync logs a lot of these in /var/log/daemon.log: http://pastebin.com/ajhE8Rb9

Can someone please help?

Thanks,
Szabolcs

_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
