Hi,

I've been working on this again this afternoon and have made some progress.


+ What's the difference between FLAG_SEEN and what appears in a user's
seen database?

+ Where is the data behind FLAG_SEEN stored?

The seen state that I'm interested in is that of the owner of an inbox. That state is stored in the cyrus.index file rather than any of the auxiliary databases.

mbexamine (as explained in my previous message below) is the tool that can see that state:

$ sudo /usr/lib/cyrus/bin/mbexamine [email protected] |less

-----
000001> UID:00000615   INT_DATE:1228505606 SENTDATE:1228478400 SIZE:6745
      > HDRSIZE:2832   LASTUPD :1718370948 SYSFLAGS:20000010   LINES:53
      > CACHEVER:6  GUID:36635abb976f24447b8a5529274198b6a3d32301 MODSEQ:455369
      > SYSTEMFLAGS: FLAG_ARCHIVED FLAG_SEEN
      > USERFLAGS: 00000000 00000000 00000000 00000081
-----

Unlike the other database tools, mbexamine works on the Cyrus mailboxes themselves. This means that it takes a mailbox name rather than a filename on the command line.

My first job was to write a quick filter, attached as mbexamine-seenstate, that can extract the seen state for each message:

-----
$ /usr/lib/cyrus/bin/mbexamine [email protected] | ./mbexamine-seenstate > 2025-08-10.SEEN
$ more 2025-08-10.SEEN
00000005: FLAG_SEEN
00000006: FLAG_SEEN
00000066: FLAG_SEEN
00000130: FLAG_SEEN
00000159: FLAG_SEEN
...
-----

To extract the same seen state from my backups I needed to get mbexamine to read the backed-up cyrus.index files instead of the live ones.

The easiest way for me to do this on my running server was to write an LD_PRELOAD shim that intercepts open(2) calls for the live cyrus.index and redirects them to the copy from the backup. I also redirected open(2) calls for cyrus.header, cyrus.cache and cyrus.annotations. As I'm running an archive partition I made sure the redirect was in place for the cyrus.* files present in the archive partition as well.

My LD_PRELOAD shim is attached as intercept.c. You will need to customise redirects[] in the source to suit the locations of your Cyrus mail spools and backup files; then you can build it:

-----
$ cc -o intercept.so -shared intercept.c -ldl
-----

Unpack your backups and run it:

-----
$ export LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH
$ LD_PRELOAD=$PWD/intercept.so /usr/lib/cyrus/bin/mbexamine [email protected] | ./mbexamine-seenstate > 2022-08-28.preload.SEEN
-----

This works for me but I do get a lot of errors in the logs whilst it runs. Even tho' I restored the cache files from the backups they don't seem to be consistent with the index file. It's worked well enough for me to get the seen state I wanted but caveat emptor!

-----
$ diff 2022-08-28.preload.SEEN 2025-08-10.SEEN |less
-----
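
The raw diff is noisy, so it may help to narrow it to the messages that were seen in the backup but are unseen now. The following is a minimal sketch, assuming both files use the "UID: STATE" format that mbexamine-seenstate prints; the function name lost_seen is made up for illustration and UIDs present in only one file are ignored.

```shell
# lost_seen OLD NEW   (hypothetical helper, not part of Cyrus)
# Prints each UID that is FLAG_SEEN in OLD but UNSEEN in NEW.
# Both files hold lines of the form "00000005: FLAG_SEEN" or "00000005: UNSEEN".
lost_seen() {
	awk -F': ' '
		# First file: remember every UID that was seen.
		NR == FNR { if ($2 == "FLAG_SEEN") seen[$1] = 1; next }
		# Second file: report UIDs that were seen before and are unseen now.
		$2 == "UNSEEN" && ($1 in seen) { print $1 }
	' "$1" "$2"
}

# Usage:
#   lost_seen 2022-08-28.preload.SEEN 2025-08-10.SEEN
```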


Next I'll work on merging the .SEEN files produced by my mbexamine-seenstate script. I'm currently thinking that I'll then write a script that logs in via IMAP and tweaks the seen state on the individual messages. I'd welcome thoughts from people with more experience at this kind of thing tho'.
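
As a starting point for the IMAP side, here is a rough sketch that turns a list of UIDs (one per line) into IMAP `UID STORE` commands that set the \Seen flag. It is only an illustration: the function name uid_store_commands and the "aN" tags are made up, it prints plain newlines where the protocol itself uses CRLF, and the commands would still need to be sent by a real client after logging in and SELECTing the mailbox.

```shell
# uid_store_commands   (hypothetical helper)
# Reads UIDs on stdin, one per line, and prints one tagged IMAP command
# per UID. Leading zeros are stripped because IMAP numbers may not have them.
uid_store_commands() {
	awk '/^[0-9]+$/ {
		printf "a%d UID STORE %d +FLAGS.SILENT (\\Seen)\n", NR, $1 + 0
	}'
}

# Usage:
#   printf '00000006\n00000130\n' | uid_store_commands
```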



I then used `cyr_dbtool` to extract the seen data from the running
server for that mailbox thus:

$ /usr/lib/cyrus/bin/cyr_dbtool /tmp/andyjpb.seen twoskip show | grep -e '^533fe437493168f5' | cut -f 5 -d ' ' | tr ',' '\n' | less


Based on the documentation at
https://www.cyrusimap.org/imap/concepts/deployment/databases.html#seen-state-userid-seen
I think this should format the seen data as one seen message UID per
line.

However, I get entries like this:

-----
1:614
616:623
632:665
708:785
801:919
927:957
960:978
981:992
...
1406:1412
1451
1458:1467
...
-----

These can be processed into a complete list of message UIDs with `cyr_sequence`.

For example:
-----
$ /usr/lib/cyrus/bin/cyr_sequence members 1:4
-----

...and to process the list above:
-----
$ /usr/lib/cyrus/bin/cyr_dbtool /tmp/andyjpb.seen twoskip show | grep -e '^533fe437493168f5' | cut -f 5 -d ' ' | tr ',' '\n' | xargs -n1 /usr/lib/cyrus/bin/cyr_sequence members
-----
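
For checking the result on a machine without the Cyrus tools, the same range expansion can be approximated in plain shell. This is a minimal sketch, assuming the input is a comma-separated list of single UIDs and "low:high" ranges like the entries above; expand_ranges is a made-up name and is not part of Cyrus.

```shell
# expand_ranges   (hypothetical helper, not part of Cyrus)
# Reads comma-separated UIDs and "low:high" ranges on stdin
# and prints one UID per line.
expand_ranges() {
	tr ',' '\n' | awk -F: '
		# A single UID such as "1451".
		NF == 1 && $1 != "" { print $1 }
		# A range such as "1458:1467", inclusive at both ends.
		NF == 2 { for (i = $1 + 0; i <= $2 + 0; i++) print i }
	'
}

# Usage:
#   printf '1:3,7,9:10\n' | expand_ranges
```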



--
[email protected]
http://www.ashurst.eu.org/
http://www.gonumber.com/andyjpb
0x7EBA75FF
#!/usr/bin/perl

# Filter the output of cyrus's `mbexamine` command
# ...to show the UID and SEEN state for each message.
#
# $ sudo /usr/lib/cyrus/bin/mbexamine [email protected] | mbexamine-seenstate
#
#
# Andy Bennett <[email protected]>, 2025/08/10 17:13

$flag = 0;
$uid  = 0;
$seen = "UNSEEN";

while ($line = <STDIN>) {
	if ($flag == 1) {
		if ($line =~ /^      >.*/) {
			#print $line;
			if ($line =~ /[^\w](FLAG_SEEN)[^\w]/) {
				$seen = $1;
			}
		}
		else {
			print "$uid: $seen\n";
			$flag = 0;
			$uid  = 0;
			$seen = "UNSEEN";
		}
	}
	# Match the first line of a record, whatever its record number.
	if ($line =~ /^[0-9]+> UID:([0-9]+) .*/) {
		$flag = 1;
		$uid  = $1;
		#print $line;
	}
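}

# Emit the last record if the input ended while we were still inside it.
if ($flag == 1) {
	print "$uid: $seen\n";
}

exit 0;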

/*
 * Intercept open(2) calls for use with Cyrus's `mbexamine` to extract SEEN state flags from backup cyrus.index files.
 *
 * Customise redirects[] to suit the locations of your Cyrus mail spools and backup files.
 *
 * Build thus:
 *   $ cc -o intercept.so -shared intercept.c -ldl
 *
 * Invoke thus:
 *   $ export LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH
 *   $ LD_PRELOAD=$PWD/intercept.so /usr/lib/cyrus/bin/mbexamine [email protected]
 *
 *   Ensure the backup files are readable by the Cyrus user.
 *
 *
 * Andy Bennett <[email protected]>, 2025/08/10.
 */

#define _GNU_SOURCE

#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <dlfcn.h>
#include <fcntl.h>

extern int open (const char *__file, int __oflag, ...) __nonnull ((1));


struct redirect {
	const char *src;
	const char *dest;
};

const struct redirect redirects[] = {
	{
		"/var/vhost/ashurst.eu.org/mail-archivable/domain/a/ashurst.eu.org/a/user/andyjpb/cyrus.index",
		"./20220828/var-cyrus.index"
	},
	{
		"/var/vhost/ashurst.eu.org/mail-archivable/domain/a/ashurst.eu.org/a/user/andyjpb/cyrus.header",
		"./20220828/var-cyrus.header"
	},
	{
		"/var/vhost/ashurst.eu.org/mail-archivable/domain/a/ashurst.eu.org/a/user/andyjpb/cyrus.cache",
		"./20220828/var-cyrus.cache"
	},
	{
		"/var/vhost/ashurst.eu.org/mail-archivable/domain/a/ashurst.eu.org/a/user/andyjpb/cyrus.annotations",
		"./20220828/var-cyrus.annotations"
	},
	{
		"/srv/vhost/ashurst.eu.org/mail-archivable/domain/a/ashurst.eu.org/a/user/andyjpb/cyrus.cache",
		"./20220828/srv-cyrus.cache"
	},
};

#define N_REDIRECTS (sizeof(redirects) / sizeof(struct redirect))



int open(const char *pathname, int flags, ...) {

	const char *path  = NULL;
	mode_t      mode  = 0;
	int         i     = 0;
	int         r     = 0;

	int (*original_open) (const char *__file, int __oflag, ...);
	original_open = dlsym(RTLD_NEXT, "open");

	path = pathname;
	for (i = 0; i < N_REDIRECTS; i++) {
		if (!strcmp(pathname, redirects[i].src)) {
			path = redirects[i].dest;
			fprintf(stderr, "open(%s)\n", path);
		}
	}

	if (flags & O_CREAT) {

		va_list args;

		va_start(args, flags);
		mode = va_arg(args, mode_t);
		va_end(args);

		r = (*original_open)(path, flags, mode);

	} else {
		r = (*original_open)(path, flags);
	}

	return r;
}
