Don't worry, I'm not trying to extract any meaning from that ID. I just want to narrow down the recognised pattern to avoid misinterpreting log entries.

Well, if that log was intended for humans, would it be an interesting idea to write a machine-readable log as well? I'm specifically interested in metrics such as these:

* How many messages are submitted by a specific user/address or for a specific recipient/domain?
* From how many different hosts/IP addresses are messages submitted for a specific user/sender?
* How many remote SMTP errors indicating server reputation issues do we see, and from which remote services?
* How many messages from a specific user/address could not be delivered?
* Does a user have a high sender spam score?

This is all to monitor the quality of the local service and detect hacked accounts or other kinds of misuse of the service.
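
As a rough illustration of the first few metrics, here is a minimal Python sketch. It assumes the default mainlog layout ("date time ID flag address ..."), where `<=` marks an arriving message and `**` a failed delivery, and that the submitting host's IP appears in square brackets after `H=`. The names `LINE_RE`, `HOST_IP_RE` and `collect_metrics` are made up for this example, and log_selector options that add fields such as a timezone or a process ID would need a different pattern.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: "YYYY-MM-DD HH:MM:SS <id> <flag> <address> <rest>".
# The ID is matched as an opaque run of non-space characters.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<id>[^ ]+) (?P<flag><=|=>|->|\*\*|==) (?P<addr>[^ ]+)(?P<rest>.*)$"
)
# Assumed host field: " H=name (helo) [ip]"; only the bracketed IP is kept.
HOST_IP_RE = re.compile(r" H=[^\[]*\[(?P<ip>[^\]]+)\]")


def collect_metrics(lines):
    """Count arrivals and failures per sender, and distinct submitting IPs."""
    received = Counter()             # sender -> number of submitted messages
    failed = Counter()               # sender -> number of failed deliveries
    sender_ips = defaultdict(set)    # sender -> set of submitting IPs
    sender_of = {}                   # message ID -> sender, for later "**" lines
    unparsed = 0                     # lines that match none of the known flags
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            unparsed += 1
            continue
        msg_id, flag, addr = m["id"], m["flag"], m["addr"]
        if flag == "<=":
            received[addr] += 1
            sender_of[msg_id] = addr
            ip = HOST_IP_RE.search(m["rest"])
            if ip:
                sender_ips[addr].add(ip["ip"])
        elif flag == "**":
            # Attribute the failure to the sender seen on the "<=" line.
            failed[sender_of.get(msg_id, "(unknown sender)")] += 1
    return received, failed, sender_ips, unparsed
```

Calling `collect_metrics(open("mainlog"))` (file name assumed) returns the three tables plus the number of lines the pattern did not recognise, which is a useful figure to watch in itself.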

But my last list item, which relies on a log message from my custom Exim config, already shows that it's probably hard to generate a more parsing-friendly format (e.g. JSON): every custom log message would need to be annotated for that.
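
Lacking such a format from Exim itself, a post-processing step can at least produce one JSON object per line. The Python sketch below is only an assumption about how that might look: it splits off the timestamp, recognises the common "ID flag address" lines, and passes everything else (including custom messages) through as free text. `to_json_line` and both patterns are hypothetical names for this example, not anything Exim provides.

```python
import json
import re

# Assumed default timestamp prefix "YYYY-MM-DD HH:MM:SS ".
LINE_RE = re.compile(r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<rest>.*)$")
# Lines of the form "<id> <flag> <address> <detail>"; the ID is kept opaque.
EVENT_RE = re.compile(
    r"^(?P<id>[^ ]+) (?P<flag><=|=>|->|\*\*|==) (?P<addr>[^ ]+) ?(?P<detail>.*)$"
)


def to_json_line(line):
    """Convert one mainlog line into a JSON string, keeping unknown text as-is."""
    line = line.rstrip("\n")
    m = LINE_RE.match(line)
    if m is None:
        # No recognisable timestamp: keep the whole line untouched.
        return json.dumps({"raw": line})
    record = {"ts": m["ts"]}
    ev = EVENT_RE.match(m["rest"])
    if ev:
        record.update(ev.groupdict())
    else:
        # Custom or unrecognised messages end up here as free text.
        record["text"] = m["rest"]
    return json.dumps(record)
```

The `detail` and `text` fields still hold unparsed text, so this only moves the problem: each custom message needs its own rule before it yields structured fields.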

By now, out of 20000 log lines, there are only 30 that I can't recognise. From all the others, it seems I can extract sufficient meaning and data for the necessary metrics. I can live with that; it just takes a lot of code to get there.

-Yves


-------- Original Message --------
From: Jeremy Harris via Exim-users <[email protected]>
Sent: Thursday, 24 December 2020, 23:35 CET
Subject: [exim] Queue ID format

On 24/12/2020 22:17, Yves Goergen via Exim-users wrote:
I'm parsing Exim log files, specifically the mainlog. Man, that's a complex structure, and it's hard to find all the necessary details from the documentation and by reading my actual log files. I'm using several regular expressions for different kinds of lines, but a stateful parser (the kind used to understand programming languages) would probably have been the better choice here. Apache access logs just require a single regex; for Exim I already have 8, one of which just covers most of the meaningless messages I don't care about, plus lots of detailed post-processing.

The logs are really designed for human use, not for machine consumption.

What assumptions can I make about the format of a queue message ID? For now, I use this regex:

    [^ ]+

Though it seems they always match this regex:

    [0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}

It may change at any time as a result of future development.
There's a relevant comment in the source:

/* Now build the unique message id. This has changed several times over the
lifetime of Exim. This description was rewritten for Exim 4.14 (February 2003).
...

I *think* that some high-volume sites are at or close to performance limits [1]
that the current format imposes, hence I must reiterate: this (the message_id
format) is not supposed to be an exported interface.  Its only documented
behaviour is that it is unique.

It's fairly reasonable to assume it'll never have an embedded space. I would not
recommend trying to extract meaning from it.
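
One way to follow that advice in a parser, sketched in Python: accept any run of non-space characters as an ID, and use the stricter pattern quoted above only to flag IDs that look unusual, never to reject them. `classify_id` and the two pattern names are hypothetical helpers for this example.

```python
import re

# Loose pattern: the only assumption is that the ID contains no space.
LOOSE_ID_RE = re.compile(r"[^ ]+")
# Pattern observed in the logs discussed here; not a guaranteed format.
OBSERVED_ID_RE = re.compile(r"[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}")


def classify_id(token):
    """Return 'observed', 'other' or 'invalid' for a candidate message ID."""
    if OBSERVED_ID_RE.fullmatch(token):
        return "observed"   # matches the currently seen 6-6-2 layout
    if LOOSE_ID_RE.fullmatch(token):
        return "other"      # still usable as an opaque, unique key
    return "invalid"        # empty or contains a space
```

Counting how many IDs fall into "other" gives an early warning if the format changes, while the rest of the parser keeps treating the ID purely as an opaque key.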



--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
