Hi all,

here is a list of issues found while testing a setup with 2 cluster nodes, 8 remote nodes, and around 450 resources. I hope it is useful for some polishing before the 1.1.15 release. The Pacemaker version tested is quite close to 1.1.15-rc1.

* templates are not supported for ocf:pacemaker:remote
* fencing events may be lost due to long transition run time (already discussed)
* the CIB becomes unresponsive when uploading many changes, which leads to sbd fencing (if sbd is enabled)
* node-action-limit seems to work on a per-cluster-node basis, so it limits the number of operations run on all remote nodes connected through a given cluster node
* changing many node attributes during a transition run may lead to a transition-recalculation storm (found with a resource agent that changes dozens of attributes)
* "notice: Relying on watchdog integration for fencing" - this probably needs to be reworded/downgraded
* application of a big enough CIB diff results in monitor failures - CPU hog? CIB hang?
* crmd[9834]: crit: GLib: g_hash_table_lookup: assertion 'hash_table != NULL' failed - hope to catch this again next week, as the coredump is lost
* pacemaker loses a resource's exit from a pending state (Starting/Stopping/Migrating): the change is visible in the logs of the local node (or of the crmd managing a given remote node) but is not propagated to the CIB
* crmd crash discovered after moving the DC node to standby: segfault in crmd's remote-related code (lrmd client) - hope to catch this again next week
* failcounts for resources on remote nodes are not properly cleaned up (related to pending states being enabled???)
* many "warning: No reason to expect node XXX to be down" messages when deleting attributes on remote nodes
* "error: Query resulted in an error: Timer expired" when adding attributes on remote nodes
* the same when uploading a CIB patch
* attrd[23798]: notice: Update error (unknown peer uuid, retry will be attempted once uuid is discovered): <node>[<attribute>]=(null) failed (host=0x2921ae0) - needs to be reinvestigated
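For context on the node-action-limit item above: it is a cluster option stored in the crm_config section of the CIB, intended to cap the number of actions run concurrently on a node. A minimal CIB fragment setting it might look like the sketch below (the value 4 and the nvpair id are illustrative, not from the setup described here):

```xml
<cluster_property_set id="cib-bootstrap-options">
  <!-- Cap concurrent actions per node; value 4 is an illustrative choice -->
  <nvpair id="opt-node-action-limit" name="node-action-limit" value="4"/>
</cluster_property_set>
```

The issue reported above is that this limit appears to be applied per cluster node rather than per remote node, so all remote nodes whose connections are hosted by one cluster node share a single budget.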

If there is any interest in additional information, I can gather it next week when I have access to the hardware again.

Hope this could be useful,

Vladislav

_______________________________________________
Developers mailing list
[email protected]
http://clusterlabs.org/mailman/listinfo/developers