Hi! Probably I sent my patch proposals to the wrong list, so I'll resent them (see attachments). I also added corresponding new patches (0002-0003) for resetting the ringid file. I think the program should not mess with umask, specifically not inside an "if" in a subroutine. I failed to get the sense of it, so I added a comment saying so.
Regards, Ulrich
--- Begin Message ---Answering my own questions: Those ringid files are created by corosync_ring_id_store() from main.c, and the contents is the ring sequence number (binary). The routine is called (indrirectly) from memb_state_commit_enter() (totemsrp.c). That routine is called by memb_join_process() and message_handler_memb_commit_token(). An obvious question arises: As the sequence number is only stored for the active ring, what happens if rings are switched (rrp_mode == passive)? Likewise it seems corosync is using the IP of the first available ring as representative on startup, it seems. Shouldn't the representative (and identifier) be stable (as per slide 4 of [AMMA00] Amir, Y.; Moser, L. E.; Melliar-Smith, P. M.; Agarwal, D. A.; Ciarfella, P.: The Totem Single-Ring Ordering and Membership Protocol)? I doubt that representative and ringid are persistent just for each ring individually... The "rxw"-Permissions are the natural consequence of this code (corosync_ring_id_store()): fd = open (filename, O_WRONLY, 0777); if (fd == -1) { fd = open (filename, O_CREAT|O_RDWR, 0777); } So here's my first patch proposal also. Regards, Ulrich >>> "Ulrich Windl" <[email protected]> schrieb am 02.06.2015 um 16:07 in Nachricht <[email protected]>: >>>> Jan Friesse <[email protected]> schrieb am 02.06.2015 um 12:57 in >>>> Nachricht > <[email protected]>: > [...] >> Last but not least is RRP. RRP itself works very well sadly it works in >> totally different way then most of people expects. > [...] > > Probably because the documentation leaves to many questions unanswered... > You can't tell what's wrong as long as you don't know how it should work. > > Example: What's the purpose of /var/lib/corosync/ringid_* files, and why do > some hosts have /var/lib/corosync/ringid_127.0.0.1? The file content seems to > be some binary number, but the file is created with rwx permissions... > > On systems with two rings (not on 127.*) I see all combinations from one to > three ringid files. > > Regards, > Ulrich > > > > _______________________________________________ > Users mailing list: [email protected] > http://clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org>From 70fba1e1a547d94e0480aa24b92810a410e36a08 Mon Sep 17 00:00:00 2001 From: Ulrich Windl <[email protected]> Date: Mon, 8 Jun 2015 12:52:04 +0200 Subject: [PATCH] exec/main.c: Improve corosync_ring_id_store() exec/main.c: Change file creation mode from 0777 to S_IRUSR|S_IWUSR improving error messages in corosync_ring_id_store(). --- exec/main.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/exec/main.c b/exec/main.c index 2e4fd02..6c9e1b6 100644 --- a/exec/main.c +++ b/exec/main.c @@ -1043,15 +1043,14 @@ static void corosync_ring_id_store ( snprintf (filename, sizeof(filename), "%s/ringid_%s", get_run_dir(), totemip_print (addr)); - fd = open (filename, O_WRONLY, 0777); + fd = open (filename, O_WRONLY, 0); if (fd == -1) { - fd = open (filename, O_CREAT|O_RDWR, 0777); + fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR); } if (fd == -1) { LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR, - "Couldn't store new ring id %llx to stable storage", - memb_ring_id->seq); - + "Couldn't open %s: %s", + filename, strerror(errno)); corosync_exit_error (AIS_DONE_STORE_RINGID); } log_printf (LOGSYS_LEVEL_DEBUG, @@ -1060,8 +1059,8 @@ static void corosync_ring_id_store ( close (fd); if (res != sizeof(memb_ring_id->seq)) { LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR, - "Couldn't store new ring id %llx to stable storage", - memb_ring_id->seq); + "Couldn't store new ring id %llx in %s: %s", + memb_ring_id->seq, filename, strerror(errno)); corosync_exit_error (AIS_DONE_STORE_RINGID); } -- 1.7.12.4
--- End Message ---
--- Begin Message ---Hi! Maybe we want "O_DSYNC" also: -- diff --git a/exec/main.c b/exec/main.c index 6c9e1b6..9be33e4 100644 --- a/exec/main.c +++ b/exec/main.c @@ -1043,9 +1043,9 @@ static void corosync_ring_id_store ( snprintf (filename, sizeof(filename), "%s/ringid_%s", get_run_dir(), totemip_print (addr)); - fd = open (filename, O_WRONLY, 0); + fd = open (filename, O_WRONLY|O_DSYNC, 0); if (fd == -1) { - fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR); + fd = open (filename, O_CREAT|O_RDWR|O_DSYNC, S_IRUSR|S_IWUSR); } if (fd == -1) { LOGSYS_PERROR(errno, LOGSYS_LEVEL_ERROR, -- Ulrich
--- End Message ---
>From 0fcf0cfacf5a60f3d84d2407fa917bd72b3b24ca Mon Sep 17 00:00:00 2001 From: Ulrich Windl <[email protected]> Date: Tue, 9 Jun 2015 08:39:02 +0200 Subject: [PATCH 2/3] Fix open flags when creating ringid file exec/main.c (corosync_ring_id_create_or_load): Use zero flags when opening file O_RDONLY. Use symbolic permissions when creating ring id file, dropping useless execute bit. Questin change of umask when creating file. --- exec/main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/exec/main.c b/exec/main.c index 6c9e1b6..9cc0351 100644 --- a/exec/main.c +++ b/exec/main.c @@ -996,7 +996,7 @@ static void corosync_ring_id_create_or_load ( snprintf (filename, sizeof(filename), "%s/ringid_%s", get_run_dir(), totemip_print (addr)); - fd = open (filename, O_RDONLY, 0700); + fd = open (filename, O_RDONLY, 0); /* * If file can be opened and read, read the ring id */ @@ -1009,8 +1009,8 @@ static void corosync_ring_id_create_or_load ( */ if ((fd == -1) || (res != sizeof (uint64_t))) { memb_ring_id->seq = 0; - umask(0); - fd = open (filename, O_CREAT|O_RDWR, 0700); + umask(0); /* why that? */ + fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR); if (fd != -1) { res = write (fd, &memb_ring_id->seq, sizeof (uint64_t)); close (fd); -- 1.7.12.4
>From 5a29ecee27d7fa71d7871c3c8d907dad61572fca Mon Sep 17 00:00:00 2001 From: Ulrich Windl <[email protected]> Date: Tue, 9 Jun 2015 08:44:57 +0200 Subject: [PATCH 3/3] exec/main.c: Improve comments and log messages exec/main.c (corosync_ring_id_create_or_load): Improve comments. Add log message when ring id is reset to zero. Change wording of error messages when new ringid file could not be created. --- exec/main.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/exec/main.c b/exec/main.c index 9cc0351..8734507 100644 --- a/exec/main.c +++ b/exec/main.c @@ -1005,9 +1005,11 @@ static void corosync_ring_id_create_or_load ( close (fd); } /* - * If file could not be opened or read, create a new ring id + * If file could not be opened or read, reset ring id */ if ((fd == -1) || (res != sizeof (uint64_t))) { + log_printf (LOGSYS_LEVEL_NOTICE, "Resetting ring id in %s\n", + filename); memb_ring_id->seq = 0; umask(0); /* why that? */ fd = open (filename, O_CREAT|O_RDWR, S_IRUSR|S_IWUSR); @@ -1016,13 +1018,15 @@ static void corosync_ring_id_create_or_load ( close (fd); if (res == -1) { LOGSYS_PERROR (errno, LOGSYS_LEVEL_ERROR, - "Couldn't write ringid file '%s'", filename); + "Couldn't write initial ringid file '%s'", + filename); corosync_exit_error (AIS_DONE_STORE_RINGID); } } else { LOGSYS_PERROR (errno, LOGSYS_LEVEL_ERROR, - "Couldn't create ringid file '%s'", filename); + "Couldn't create ringid file '%s': %s", + filename, strerror(errno)); corosync_exit_error (AIS_DONE_STORE_RINGID); } -- 1.7.12.4
_______________________________________________ discuss mailing list [email protected] http://lists.corosync.org/mailman/listinfo/discuss
