Hi Alexandre,

please find my logs here, taken from three different nodes just to see if there's any difference.
pve01 node: http://pastebin.com/M14R0WBc
pve02 node: http://pastebin.com/q1kW07xs
pve09 node (totem): http://pastebin.com/CpZd6dmn

omping gives me similar results on all nodes: http://pastebin.com/s4H92Scg

Thanks!

On Fri, Oct 28, 2016 at 3:55 PM, Alexandre DERUMIER <[email protected]> wrote:

> can you send your corosync log in /var/log/daemon.log ?
>
> ----- Original message -----
> From: "Szabolcs F." <[email protected]>
> To: "Michael Rasmussen" <[email protected]>
> Cc: "proxmoxve" <[email protected]>
> Sent: Friday, 28 October 2016 15:40:06
> Subject: Re: [PVE-User] Promox 4.3 cluster issue
>
> Hi All,
>
> my issue came back. So it wasn't related to having Proxmox 4.2 on 4 nodes
> and Proxmox 4.3 on the other 8 nodes.
>
> Now for example if I log into the web UI of my first node all the 11 other
> nodes are marked with the red cross. But if I click on a node I can still
> see the summary (uptime, load, etc), still can get a shell on other nodes.
> But I can't see the name/status of virtual machines running on the red
> crossed nodes (I can only see the VM ID/number). And of course I can't
> migrate any VM from one host to another.
>
> Any ideas?
>
> Thanks!
>
> On Wed, Oct 26, 2016 at 12:57 PM, Szabolcs F. <[email protected]> wrote:
>
> > Hello again,
> >
> > sorry for another followup. I just realised that 4 of the 12 cluster nodes
> > still have PVE Manager version 4.2 and the other 8 nodes have version 4.3.
> > Can this be the reason for all my troubles?
> >
> > I'm in the process of updating these 4 nodes. These 4 nodes were installed
> > with the Proxmox install media, but the other 8 nodes were installed with
> > Debian 8 first. So the 4 outdated nodes didn't have the 'deb
> > http://download.proxmox.com/debian jessie pve-no-subscription' repo file.
> > Adding this repo made the 4.3 updates available.
> >
> > On Wed, Oct 26, 2016 at 12:20 PM, Szabolcs F. <[email protected]> wrote:
> >
> >> Hi Michael,
> >>
> >> I can change to LACP, sure. Would it be better than simple active-backup?
> >> I haven't got too much experience with LACP though.
> >>
> >> On Wed, Oct 26, 2016 at 11:55 AM, Michael Rasmussen <[email protected]> wrote:
> >>
> >>> Is it possible to switch to 802.3ad bond mode?
> >>>
> >>> On October 26, 2016 11:12:06 AM GMT+02:00, "Szabolcs F." <[email protected]> wrote:
> >>>
> >>>> Hi Lutz,
> >>>>
> >>>> my bondXX files look like this: http://pastebin.com/GX8x3ZaN
> >>>> and my corosync.conf : http://pastebin.com/2ss0AAEr
> >>>>
> >>>> Multicast is enabled on my switches.
> >>>>
> >>>> The problem is I don't have a way to replicate the problem, it seems to
> >>>> happen randomly, so I'm unsure how to do more tests. At the moment my
> >>>> cluster has been working fine for about 16 hours. Any ideas for forcing
> >>>> the issue?
> >>>>
> >>>> Thanks,
> >>>> Szabolcs
> >>>>
> >>>> On Wed, Oct 26, 2016 at 9:17 AM, Lutz Willek <[email protected]> wrote:
> >>>>
> >>>>> On 24.10.2016 at 15:16, Szabolcs F. wrote:
> >>>>>
> >>>>>> Corosync has a lot of
> >>>>>> these in the /var/log/daemon.log :
> >>>>>> http://pastebin.com/ajhE8Rb9
> >>>>>
> >>>>> please carefully check your (node/switch/multicast) network configuration,
> >>>>> and please paste your corosync configuration file and output of
> >>>>> /proc/net/bonding/bondXX
> >>>>>
> >>>>> just a guess:
> >>>>>
> >>>>> * power down 1/3 - 1/2 of your nodes, adjust quorum (pvecm expected)
> >>>>>   --> Do the problems still occur?
> >>>>>
> >>>>> * during "problem time"
> >>>>>   --> omping is still ok?
> >>>>>
> >>>>> https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues
> >>>>>
> >>>>> Best Regards
> >>>>>
> >>>>> Lutz Willek
> >>>>>
> >>>>> --
> >>>>> ------------------------------
> >>>>> creating IT solutions
> >>>>> Lutz Willek                  science + computing ag
> >>>>> Senior Systems Engineer      Geschäftsstelle Berlin
> >>>>> IT Services Berlin           Friedrichstraße 187
> >>>>> phone +49(0)30 2007697-21    10117 Berlin, Germany
> >>>>> fax   +49(0)30 2007697-11    http://de.atos.net/sc
> >>>>>
> >>>>> S/MIME security:
> >>>>> http://www.science-computing.de/cacert.crt
> >>>>> http://www.science-computing.de/cacert-sha512.crt
> >>>>>
> >>>>> ------------------------------
> >>>>>
> >>>>> pve-user mailing list
> >>>>> [email protected]
> >>>>> http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
> >>>
> >>> --
> >>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
