Re: Transaction ID suggestions
On Tue, 4 Sep 2007, Peter Eisch wrote:
> Would it be possible to implement -id as a hook? The actual key could then be left to the creativity of the user. The plugin could then implement the other hooks and tune the id as necessary (connect, mail, queue, etc.).

Yes, it's possible to do it that way. See the current hook for received headers for example code of how it has to work.

Matt.
Re: Transaction ID suggestions
> $instance_id                                 # could be opaque or structured to include
>                                              # server name or IP, PID, etc.
> $instance_id.$connection_id                  # identifies a connection handled by this instance
> $instance_id.$connection_id.$transaction_id  # identifies a transaction within this connection

I notice that svn code has moved to this model but it still has these lines in it:

    my $SALT_HOST = crypt(hostname, chr(65+rand(57)).chr(65+rand(57)));
    $SALT_HOST =~ tr/A-Za-z0-9//cd;

Is this being used anymore? I don't find a reference to $SALT_HOST in the same file.

--
JT Moree
Re: Transaction ID suggestions
On 2007-09-04 07:59:15 -0700, JT Moree wrote:
> $instance_id                                 # could be opaque or structured to include
>                                              # server name or IP, PID, etc.
> $instance_id.$connection_id                  # identifies a connection handled by this instance
> $instance_id.$connection_id.$transaction_id  # identifies a transaction within this connection
>
> I notice that svn code has moved to this model but it still has these lines in it:
>
>     my $SALT_HOST = crypt(hostname, chr(65+rand(57)).chr(65+rand(57)));
>     $SALT_HOST =~ tr/A-Za-z0-9//cd;
>
> Is this being used anymore? I don't find a reference to $SALT_HOST in the same file.

I was playing around a bit on the weekend, yes. Since neither Matt nor Ask have cried out in horror on what I did, I guess it's time to present that to a wider audience:

The instance id basically identifies a Qpsmtpd::SMTP object. Looking through the sources of the various servers I found that there is always exactly one per process (although with forkserver it is inherited by the child processes), so I thought that the combination of

* the time when the object was created (seconds.microseconds since the epoch)
* the host_id
* the process id

should always be unique.

I replaced $SALT_HOST as the host_id with the primary IP address (in hex), because I think a predictable host id is useful (so that you can find the relevant host from the log entry - otherwise the host id could be removed). It may be useful to replace the IP address with something else, most likely the (abbreviated) hostname. That could be a configuration option. (So this answers your question: $SALT_HOST is obsolete and I just forgot to delete it.)

The connection id and transaction id are simple counters.

So a complete log entry (without timestamp or whatever else the logging mechanism may add) looks like this:

    1188729346.156197.7f000101.3165.2.1 Accepted connection 0/15 from 127.0.1.1 / Unknown

So this is instance 1188729346.156197.7f000101.3165 (started at 1188729346.156197 on host 127.0.1.1 (oops - the joys of dhcp and strange /etc/hosts files) in process 3165). This is connection number 2 (i.e. the first real connection; connection number 1 is used up during startup) on this instance, and the first transaction within this connection (since a MAIL FROM command always starts a new transaction, you can think of transaction 2 as the first real transaction).

Apart from the fact that the host id thingy should probably be configurable, there are some other things I'm not completely happy with:

* The id is rather long. It is written into every log line and the first 33 characters are always the same until you restart the instance. If you have only a handful of instances (which is quite likely) that's 33 characters for a few bits of information (at least it will compress well with gzip). We could use base64 instead of base 10/16. Then the timestamp reduces to 6+4 (or 6+3 if we are content with 4 µs resolution) characters, the IP address to 6 characters and the PID to 3 characters. Now that's 20 characters including the dot, but it's quite opaque.

* The same delimiter is used within the instance id and between the instance id and the connection and transaction ids. This may make life unnecessarily hard for log analysis tools.

hp

--
Peter J. Holzer | Sysadmin WSR | [EMAIL PROTECTED] | http://www.hjp.at/
"I know I'd be respectful of a pirate with an emu on his shoulder." -- Sam in Freefall
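A minimal sketch of how an id with the layout described above could be assembled. The helper names `make_instance_id` and `make_log_id` are made up for illustration and are not the qpsmtpd code in svn; the only assumptions taken from the message are the field order (seconds.microseconds, host IP in hex, PID) and the two dot-separated counters.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);
use Time::HiRes qw(gettimeofday);

# Hypothetical helper: instance id = seconds.microseconds.hex-ip.pid
# $sec/$usec are optional; by default the current time is used.
sub make_instance_id {
    my ($ip, $pid, $sec, $usec) = @_;
    ($sec, $usec) = gettimeofday() unless defined $sec;
    my $packed = inet_aton($ip);
    die "cannot parse IPv4 address '$ip'" unless defined $packed;
    my $host_id = unpack('H8', $packed);    # 4 bytes -> 8 hex digits
    return sprintf('%d.%06d.%s.%d', $sec, $usec, $host_id, $pid);
}

# Hypothetical helper: append the connection and transaction counters.
sub make_log_id {
    my ($instance_id, $connection, $transaction) = @_;
    return join('.', $instance_id, $connection, $transaction);
}

# Reproduce the example entry from the message above.
my $instance = make_instance_id('127.0.1.1', 3165, 1188729346, 156197);
print make_log_id($instance, 2, 1), "\n";
# prints 1188729346.156197.7f000101.3165.2.1
```

Using a different delimiter between the instance id and the counters, as the second bullet suggests, would only require changing the `join` separator.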
Re: Transaction ID suggestions
On 4-Sep-07, at 12:43 PM, Peter J. Holzer wrote:
> I was playing around a bit on the weekend, yes. Since neither Matt nor Ask have cried out in horror on what I did,

FWIW I didn't object simply because it seems so pointless with everyone having such conflicting ideas about what this should all be about.

Honestly I'd be much happier with the timestamp being the time of the connection. I have no idea why we want an id for the times we're outside of a connection/transaction. The idea being that if you're writing the file to disk you can use the transaction id as the filename and it will be guaranteed unique, but also contain a timestamp-like component.

But frankly if we're going to keep going around in circles on the implementation I'd rather just concede.

Matt.
Re: Transaction ID suggestions
On Sep 4, 2007, at 9:43, Peter J. Holzer wrote:
> I was playing around a bit on the weekend, yes. Since neither Matt nor Ask have cried out in horror on what I did, I guess it's time to present that to a wider audience:

I just got back from vacation and am hopelessly behind on reading up on this thread. I'm planning to catch up over the next few weeks.

 - ask

--
http://develooper.com/ - http://askask.com/
Re: Transaction ID suggestions
On 9/4/07 1:14 PM, Matt Sergeant [EMAIL PROTECTED] wrote:
> On 4-Sep-07, at 12:43 PM, Peter J. Holzer wrote:
>> I was playing around a bit on the weekend, yes. Since neither Matt nor Ask have cried out in horror on what I did,
>
> FWIW I didn't object simply because it seems so pointless with everyone having such conflicting ideas about what this should all be about.

There seems to be consensus that a/n {connection|session|transaction} id would be useful.

Would it be possible to implement -id as a hook? The actual key could then be left to the creativity of the user. The plugin could then implement the other hooks and tune the id as necessary (connect, mail, queue, etc.).

peter
Re: Transaction ID suggestions
How does qmail do it?
Re: Transaction ID suggestions
On Sun, 2007-09-02 at 17:52 -0500, David Nicol wrote:
> How does qmail do it?

Uses the inode number ... doesn't work for qpsmtpd ... and it's crap for logging (see my comment earlier in the thread) since the inodes get recycled.

--
--gh
Re: Transaction ID suggestions
On 2007-08-31 13:44:44 -0400, Charlie Brady wrote:
> On Fri, 31 Aug 2007, Peter J. Holzer wrote:
>> 127.0.0.1 is a problem even after establishing the connection: With normal routing arrangements the remote IP address will be 127.0.0.1, too, so the only variable is the remote port.
>
> Just to clarify, you are referring to SMTP connections from other processes local to the qpsmtpd server, i.e. connecting over loopback. Correct?

Yes.

> Yes, I can see that in that case 127.0.0.1:nnn:127.0.0.1:25 would not identify the host and would not be unique across multiple servers.

hp
Re: Transaction ID suggestions
On 2007-08-31 11:28:55 -0400, m. allan noah wrote:
> On 8/31/07, Peter J. Holzer [EMAIL PROTECTED] wrote:
>> On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:
>>> However, there is still an issue with Peter's proposed "zero out remote address components" proposal - prior to accept(), qpsmtpd-forkserver may have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25) won't be unique across multiple hosts.
>>
>> 127.0.0.1 is a problem even after establishing the connection: With normal routing arrangements the remote IP address will be 127.0.0.1, too, so the only variable is the remote port. If you aggregate log messages from several hosts which receive locally generated messages, that can be a problem.
>
> questions:
>
> 1. why would the remote ip be localhost once a tcp connection is established?

When a client doesn't explicitly bind() its socket before calling connect(), the OS will choose a port number and IP address. The IP address will generally be that of the interface that the connection goes out of. If the server IP address is local, then the same IP address will be chosen for the client. So, as a special case, if the server listens on 127.0.0.1:25, any connection coming in on that port will be from 127.0.0.1:n.

> 2. why do we need a 'transaction ID' prior to a connection?

Don't think of it as a 'transaction ID'. Think of it as a 'logging ID', which identifies the entity to which the log message belongs. There are things which have to be logged before the first connection (e.g., problems with loading a plugin) and you want to identify where they come from.

> 3. can we separate 'startup' type messages from transaction-based ones?

Probably.

In logging/file_connection I used a server instance id (startup timestamp + pid of the forkserver parent process) plus a simple counter for the connections. Due to a quirk which I never investigated, all the startup messages have a connection count of 2, the connections start at 3.

In an earlier message I suggested extra counters for the transactions and possibly commands, so the full scheme could be something like:

    $instance_id                                 # could be opaque or structured to include
                                                 # server name or IP, PID, etc.
    $instance_id.$connection_id                  # identifies a connection handled by this instance
    $instance_id.$connection_id.$transaction_id  # identifies a transaction within this connection
    ...

hp
Re: Transaction ID suggestions
On 2007-08-29 19:15:37 -0400, Guy Hulbert wrote:
> On Thu, 2007-08-30 at 00:49 +0200, Michael Holzt wrote:
>>> or even 10 + 1 + 16 + 1 + 39 + 1 + 5 + 1 + 39 + 1 + 5 = 119 characters
>>
>> Better encode it binary. E.g. for IPv4:
>
> And better get the number of bits correct. An IP address is a 32 bit integer, not 15 characters.

You've snipped the context. JT was calling the Qpsmtpd::Connection::local_ip method which does indeed return a string of up to 15 characters, not an integer of 32 bits.

> Although perl converts scalars on-demand, it correctly preserves integer values.

JT was using string concatenation, so that doesn't help.

Yes, it would be possible to call inet_aton on the return value of local_ip, do the equivalent on local_port, then concatenate them, and send them through base64, thus encoding 48 bits of information in 8 characters. But JT didn't do this, so his scheme needs 21 characters to encode the same information.

hp
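The packing described in the last paragraph can be sketched as follows. This is only an illustration: `pack_endpoint` and `unpack_endpoint` are hypothetical names, and plain strings stand in for the values that local_ip and local_port would return.

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: 4 address bytes + 2 port bytes = 48 bits,
# which base64 encodes as exactly 8 characters (no padding needed).
sub pack_endpoint {
    my ($ip, $port) = @_;
    my $addr = inet_aton($ip);
    die "not an IPv4 address: $ip" unless defined $addr;
    return encode_base64($addr . pack('n', $port), '');
}

# Hypothetical helper: reverse the packing for log analysis.
sub unpack_endpoint {
    my ($token) = @_;
    my ($addr, $port) = unpack('a4 n', decode_base64($token));
    return (inet_ntoa($addr), $port);
}

my $token = pack_endpoint('192.168.100.200', 25);
printf "%s is %d characters long\n", $token, length $token;
printf "decodes back to %s:%d\n", unpack_endpoint($token);
```

Note that standard base64 output may contain `+` and `/`, so a token like this would need a different alphabet before it could double as a filename.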
Re: Transaction ID suggestions
On Sat, 2007-09-01 at 10:08 +0200, Peter J. Holzer wrote:
>>> Better encode it binary. E.g. for IPv4:
>>
>> And better get the number of bits correct. An IP address is a 32 bit integer, not 15 characters.
>
> You've snipped the context. JT was calling the Qpsmtpd::Connection::local_ip method which does indeed return a string of up to 15 characters, not an integer of 32 bits.

An IPv4 address is a 32 bit unsigned integer. The string of 15 characters is a human-readable representation of it.

AFAICT the context was obtaining an efficient packing of the data in question (see the post on binary logging to a database). I chose the IP address as an example -- the ID being created was not even close to what we had been discussing and what Matt implemented, and I did not want to go on at length about an example which appeared to be only marginally on-topic.

--
--gh
Re: Transaction ID suggestions
On 2007-08-30 21:12:15 -0400, Charlie Brady wrote:
> On Thu, 30 Aug 2007, Peter J. Holzer wrote:
>> On 2007-08-29 17:50:28 -0400, Charlie Brady wrote:
>>> A four-tuple identifying the TCP connection also identifies the server.
>>
>> Right. And the tuple must not be reused for some time (2*MSL or 4 minutes according to RFC 793), so you don't even need a high resolution timer.
>
> Indeed.
>
>> However, what if there is no TCP connection yet? For example, in forkserver, the plugins are loaded before the first connection is accepted and you want to log a failure to load one of them (or the plugins may want to log something in their register method).
>
> I consider that to be a different issue. Log messages at that stage aren't related to and don't need to be correlated with an email message.

Right, but we still want to log them and find out what logged them.

>> You could just fill the remote part with zeros, but you can have multiple processes listening on the same port and you can't distinguish them in this case.
>
> You can't have multiple processes bound to the same local_IP/local_port,

Sure I can:

    habanero:~ 9:50 101# lsof -i :80 | grep LISTEN
    httpd    9875    root  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9946 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9967    root  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9970 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9974 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9977 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9980 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9981 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd    9991 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   10397 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   10400 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   10403 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   10790 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   11176 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   11728 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14183 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14186 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14187 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14194 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14195 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14198 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14201 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)
    httpd   14207 oraport  27u  IPv4 81804443  TCP *:http (LISTEN)

The httpd in question is an Apache btw, so I'd expect an Apache::Qpsmtpd installation to look similar.

> so you could distinguish hosts and processes by filling in the local part of the four-tuple.

That's what I meant with "fill the remote part with zeros".

hp
Re: Transaction ID suggestions
On Fri, 31 Aug 2007, Michael Holzt wrote:
>> You can't have multiple processes bound to the same local_IP/local_port,
>
> Of course you can. bind - listen - fork

Yes, brain fart at my end. s/$/ except by inheritance post-fork/.

If we stop listening post-fork (as qpsmtpd-forkserver does) then this state only occurs briefly. And since the fork occurs after accept(), we already have a TCP four-tuple during that time interval.

However, there is still an issue with Peter's proposed "zero out remote address components" proposal - prior to accept(), qpsmtpd-forkserver may have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25) won't be unique across multiple hosts.
Re: Transaction ID suggestions
On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:
> On Fri, 31 Aug 2007, Michael Holzt wrote:
>>> You can't have multiple processes bound to the same local_IP/local_port,
>>
>> Of course you can. bind - listen - fork
>
> Yes, brain fart at my end. s/$/ except by inheritance post-fork/.
>
> If we stop listening post-fork (as qpsmtpd-forkserver does) then this state only occurs briefly. And since the fork occurs after accept(), we already have a TCP four-tuple during that time interval.
>
> However, there is still an issue with Peter's proposed "zero out remote address components" proposal - prior to accept(), qpsmtpd-forkserver may have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25) won't be unique across multiple hosts.

127.0.0.1 is a problem even after establishing the connection: With normal routing arrangements the remote IP address will be 127.0.0.1, too, so the only variable is the remote port. If you aggregate log messages from several hosts which receive locally generated messages, that can be a problem.
Re: Transaction ID suggestions
On Fri, 31 Aug 2007, Peter J. Holzer wrote:
> On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:
>> However, there is still an issue with Peter's proposed "zero out remote address components" proposal - prior to accept(), qpsmtpd-forkserver may have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25) won't be unique across multiple hosts.
>
> 127.0.0.1 is a problem even after establishing the connection: With normal routing arrangements the remote IP address will be 127.0.0.1, too, so the only variable is the remote port.

Just to clarify, you are referring to SMTP connections from other processes local to the qpsmtpd server, i.e. connecting over loopback. Correct?

Yes, I can see that in that case 127.0.0.1:nnn:127.0.0.1:25 would not identify the host and would not be unique across multiple servers.

---
Charlie
Re: Transaction ID suggestions
On 2007-08-30 10:08:36 +0200, Peter J. Holzer wrote:
> Here are some (measured) resolutions of gettimeofday on various systems:
>
> Linux/i386:    1 ms
> Linux/SPARC:   2 ms
> HP-UX/PA-RISC: 2 ms
> Linux/Alpha:   976 ms (1024 Hz)
>
> Ok, so the Alpha is obsolete, and Sun and HP hardware seems to include a timer with reasonably high resolution (both systems are a bit old I'd expect newer gear get .

The sentence in the parentheses was supposed to read: "both systems are a bit old - I'd expect newer gear to get full microsecond resolution". Don't know how I managed to garble it that badly.

hp
Re: Transaction ID suggestions
On Thu, 2007-08-30 at 10:08 +0200, Peter J. Holzer wrote:
> On 2007-08-29 18:36:12 -0400, Guy Hulbert wrote:
[snip]
>> Just assume that time() can have the granularity of the CPU instruction counter[1].
>
> It could (if your perl implementation uses 128 bit long doubles), but it

Or you could have gettimeofday return 3 ints (sec, nano-sec, atto-sec) instead of 2.

> isn't guaranteed to have that. You have to plan for the worst case and that's probably a 60 Hz counter.

Or you can just warn that the transaction ID may be broken on some systems. Does it provide some critical internal function or is it just for logging? Or we can provide some alternate hack.

> Here are some (measured) resolutions of gettimeofday on various systems:
>
> Linux/i386:    1 ms
> Linux/SPARC:   2 ms
> HP-UX/PA-RISC: 2 ms
> Linux/Alpha:   976 ms (1024 Hz)

'ms' is usually milli-seconds but it appears you mean micro-seconds (I pretend that u=mu and write it 'us'). The alpha is a problem then. However, Time::HiRes seems to be over 10 years old ... are the alpha boxes still being sold?

[snip]
>> However, with a 16 bit PID and 65K processors you might run into collisions with the PID ...
>
> I don't see how that follows. The PID still has to be unique at any particular time. If a system can run more than 32k processes in parallel it must use a 32 bit PID.

Doh. Yeah, iirc it's been 32 bits on AIX since 1992.

[snip]
>> but I doubt anyone has a connection machine to run qpsmtpd on. I think time() + PID is sufficient for now ... unless threads share the PID ...
>
> They do on most systems - but you could use the TID instead of the PID.

Yup.

>> ( otoh, qpsmtpd is not even threaded is it ? ).
>
> It might be possible to run Apache::Qpsmtpd on a multithreaded Apache.

Unlikely. PHP people still won't bless mt apache. Postgres people discovered a problem with crypt() - from libc - you must not use crypt() passwords with Pg on mt apache (the problem is only seen with very high loads though).

> hp

--
--gh
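For anyone who wants to repeat the measurement on their own hardware, here is a rough sketch. It is not the program used for the figures quoted above (that program is not shown in the thread); `measure_resolution_us` is a hypothetical name, and the result is only an upper bound on the clock step because it includes the overhead of the call itself.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Hypothetical helper: return the smallest non-zero step (in microseconds)
# seen between successive gettimeofday() readings over $samples steps.
sub measure_resolution_us {
    my ($samples) = @_;
    my $min;
    my ($s0, $u0) = gettimeofday();
    my $found = 0;
    while ($found < $samples) {
        my ($s1, $u1) = gettimeofday();
        my $delta = ($s1 - $s0) * 1_000_000 + ($u1 - $u0);
        if ($delta <= 0) {
            # Clock did not advance (or stepped back): re-baseline on a
            # backwards step and keep polling.
            ($s0, $u0) = ($s1, $u1) if $delta < 0;
            next;
        }
        $min = $delta if !defined $min || $delta < $min;
        ($s0, $u0) = ($s1, $u1);
        $found++;
    }
    return $min;
}

printf "smallest observed gettimeofday step: %d us\n", measure_resolution_us(1000);
```

On a system with a 1024 Hz tick, like the Alpha mentioned above, a loop of this kind would be expected to report a step close to 976 us.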
Re: Transaction ID suggestions
On Thu, 2007-08-30 at 10:45 +0200, Tony L. Svanstrom wrote:
>>> Would this be a bad time to mention that people might get the idea that they want to run two different setups of qpsmtpd on the same server?
>>
>> No that's fine. PID is still in there taking care of that.
>
> True, but the code makes both the security guy and the programmer in me twitch...

rotfl

> The part of the unique ID meant to identify the server is now focusing on the OS/computer instead of the instance of qpsmtpd; which one can only get away with as the PID is in the connection ID-part, and thus we shouldn't get more collisions just because we run more than one instance on the same server. However, this is not only (currently) an undocumented and somewhat unobvious feature of the ID-generation, but it's also an unnecessary limitation. If people ever were to remove the PID, maybe as soon as at the end of this discussion, they might not think about fixing the $SALT_HOST.

wtf does this mean - the *purpose* of the discussion is to *fix* a *unique* transaction ID; when the discussion is over it is *fixed* and the discussion *documents* the implementation.

What do you mean "people ever were to remove the PID"? If you make random changes to any piece of code it's going to break.

> Using the IPs + port ought to be the way to go.

Please clarify.

Given sufficient resolution in the time(), you cannot have two processes on the CPU for the same transaction ID. I thought there might be a problem if you have multiple CPUs but Peter H. has pointed out that the PIDs must be different in that case.

You need to have context-switching faster than the clock resolution for collisions in time() -- Peter has shown that the clock resolution is (close to) 1 us for all likely systems other than the alpha.

--
--gh
Re: Transaction ID suggestions
On 30-Aug-07, at 4:45 AM, Tony L. Svanstrom wrote:
> True, but the code makes both the security guy and the programmer in me twitch...

Well, don't think of it for security then :-)

> The part of the unique ID meant to identify the server is now focusing on the OS/computer instead of the instance of qpsmtpd;

Not really. It uses a random salt. So every instance will be different.

Matt.
Re: Transaction ID suggestions
On Thu, 2007-08-30 at 09:14 -0400, Matt Sergeant wrote:
>> The part of the unique ID meant to identify the server is now

Is this unique ID the transaction ID we've been discussing? Has someone already implemented it in svn? I thought it was a new proposal (I'm just a bit confused here).

>> focusing on the OS/computer instead of the instance of qpsmtpd;
>
> Not really. It uses a random salt. So every instance will be different.

That is not true. Random numbers do not give unique results. Also, hash functions have collisions. This is not a problem when using a hash in perl because there is a collision-resolution mechanism.

For the requirement of logging multiple independent qpsmtpd servers to a central point there is no trivial mechanism to compare the results of the hash function, so you must use a predictable function on something unique to the server. The IP address (for IPv6 the 32 most-significant bits would probably work) is one choice. However, I think it might be better to use a value derived from config('me'), but it cannot be a hash. A suitable non-random choice might be a fixed-length prefix of config('me') (taken with substr), padded with '_' if the name is shorter. Since the sysadmin has to configure qpsmtpd to use it, he can make sure that his configurations will work together (if he cares).

--
--gh
Re: Transaction ID suggestions
On 30-Aug-07, at 9:34 AM, Guy Hulbert wrote:
> On Thu, 2007-08-30 at 09:14 -0400, Matt Sergeant wrote:
>>> The part of the unique ID meant to identify the server is now
>
> Is this unique ID the transaction ID we've been discussing.

Yes.

> Has someone already implemented it in svn - I thought it was a new proposal (I'm just a bit confused here) ?

Yes, it's in svn.

>>> focusing on the OS/computer instead of the instance of qpsmtpd;
>>
>> Not really. It uses a random salt. So every instance will be different.
>
> That is not true. Random numbers do not give unique results.

True enough. But I'm going out on a limb to assume that it's good enough for logging. It's not a security feature.

Matt.
Re: Transaction ID suggestions
On Thu, 30 Aug 2007 the voices made Guy Hulbert write:

GH> wtf does this mean - the *purpose* of the discussion is to *fix* a
GH> *unique* transaction ID when the discussion is over it is *fixed* and
GH> the discussion *documents* the implementation.

I meant undocumented as in: Transaction.pm currently says "Generate unique id" without mentioning that the earlier defined $SALT_HOST relies on certain aspects of the ID-generation, without which the $id might not be unique in cases where there's more than one instance of qpsmtpd running on a single server. (Or on two different servers with the same hostname, which isn't exactly unheard of; it happens both by mistake and by design, for instance if setting up a testserver... which you still might want to use with whatever centralized logging you've got.)

GH> What do you mean people ever were to remove the PID ? If you make
GH> random changes to any piece of code it's going to break.

Random changes yes, but as this discussion has clearly shown it isn't unreasonable to consider creating unique IDs without using the PID (incrementing counter etc); and it isn't unreasonable to view the transaction ID and the server ID as two separate things, which combined create a (hopefully) universally unique ID. Even the (current) code structure reflects such thinking.

To then use a server ID that I think everyone on this list can agree has a lesser chance of being unique, esp. if minor changes are made to it, isn't as future/idiot-proof as it easily could be; and if it's easily done, at least I prefer to write code that minimizes the chances that people will mess up when working with it.

It's enough that someone removes the crypt+rand, to make searching the logs easier, for this solution (hostname-based) to theoretically start creating trouble/break (well, at least crack slightly in a corner or two).

GH> > Using the IPs + port ought to be the way to go.
GH>
GH> Please clarify.

To qpsmtpd the hostname isn't as unique as the IPs + port used by it is.

Actually, although IPs+port IMHO is better than hostname, it was silly of me to say that it ought to be the way to go, as it doesn't deal with special-use addresses well enough... but it'd be easy to catch those and do something like create/output a warning.

I think I'll exit the discussion here; you can battle it out among yourselves, and if I'm unhappy with the results I'll just show up with some code and restart the fire... ;-)

GH> Given sufficient resolution in the time(), you cannot have two processes
GH> on the CPU for the same transaction ID. I thought there might be a
GH> problem if you have multiple CPUs but Peter H. has pointed out that the
GH> PIDs must be different in that case.
GH>
GH> You need to have context-switching faster than the clock resolution for
GH> collisions in time() -- Peter has shown that the clock resolution is
GH> (close to) 1 us for all likely systems other than the alpha.

Or the same hostname on a second server, which is something we shouldn't rule out...

/Tony

--
Generally speaking, taunting mentally unstable people is a bad idea.
Re: Transaction ID suggestions
On Thu, 2007-08-30 at 10:01 -0400, Matt Sergeant wrote:
>> That is not true. Random numbers do not give unique results.
>
> True enough. But I'm going out on a limb to assume that it's good enough for logging. It's not a security feature.

But this (by design[*]) doesn't meet the requirement. The (ok, one) purpose of logging is to be able to trace the results of running the service, and if your hash collides ALL the messages from the two servers where it collides will be ambiguous (by source). Using a non-random and predictable function on config('me') allows the user to avoid this problem (without modifying the core code).

[*] As opposed to the implementation - where Peter has pointed out some limitations of Time::HiRes on one old platform.

Thanks for the clarification on svn ... I'll have to check it out (but not today) to see it.

> Matt.

--
--gh
Re: Transaction ID suggestions
On Thu, 2007-08-30 at 16:07 +0200, Tony L. Svanstrom wrote:
> To qpsmtpd the hostname isn't as unique as the IPs + port used by it is.

But for qpsmtpd the hostname is configurable ( config('me') ). That works as long as a hash is not used (see my follow-up to Matt) and the function used is documented, e.g. the first 8 characters of config('me'), padded with '_' to a width of 8:

    my $id = substr(config('me'), 0, 8);
    $id .= '_' x (8 - length $id);

so:

    me = linux1             -> linux1__
    me = linux2.example.com -> linux2.e

If you run two instances you can call them 'thing1' and 'thing2'.

--
--gh
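A self-contained sketch of the padding function suggested above, assuming the value of config('me') is passed in as a plain string. The name `host_id` and the optional width argument are illustrative only and not part of qpsmtpd.

```perl
use strict;
use warnings;

# Hypothetical helper: truncate $me (standing in for config('me')) to
# $width characters and pad shorter names with '_' so every id has the
# same length. No hashing is involved, so the result stays predictable.
sub host_id {
    my ($me, $width) = @_;
    $width = 8 unless defined $width;
    my $id = substr($me, 0, $width);
    return $id . ('_' x ($width - length $id));
}

print host_id('linux1'), "\n";              # prints linux1__
print host_id('linux2.example.com'), "\n";  # prints linux2.e
```

Two names that share their first eight characters map to the same id, which is why the message leaves it to the sysadmin to pick distinct values such as 'thing1' and 'thing2'.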
Re: Transaction ID suggestions
On 30-Aug-07, at 10:07 AM, Tony L. Svanstrom wrote:
> On Thu, 30 Aug 2007 the voices made Guy Hulbert write:
>
> GH> wtf does this mean - the *purpose* of the discussion is to *fix* a
> GH> *unique* transaction ID when the discussion is over it is *fixed* and
> GH> the discussion *documents* the implementation.
>
> I meant undocumented as in: Transaction.pm currently says "Generate unique id" without mentioning that the earlier defined $SALT_HOST relies on certain aspects of the ID-generation, without which the $id might not be unique in cases where there's more than one instance of qpsmtpd running on a single server.

Including PID takes care of that. And you're assuming a broken srand() too.

Admittedly, there's a very very remote freak possibility that given two identical hostnames, a rand() with a broken srand(), and those servers starting at the exact same microsecond time with the exact same PID, you MIGHT, just MAYBE, get a duplicate transaction id.

It seems to me the only alternative that would satisfy your security-paranoid mind is to use Data::UUID, which is an extra dependency I don't want to add in.

Matt.
Re: Transaction ID suggestions
On Thu, 2007-08-30 at 10:30 -0400, Matt Sergeant wrote:
> On 30-Aug-07, at 10:07 AM, Tony L. Svanstrom wrote:
>> On Thu, 30 Aug 2007 the voices made Guy Hulbert write:
[snip]
>> GH> the discussion *documents* the implementation.
>>
>> I meant undocumented as in: Transaction.pm currently says

In principle, the documentation will be updated when the discussion is complete.

>> "Generate unique id" without mentioning that the earlier defined $SALT_HOST relies on certain aspects of the ID-generation, without which the $id might not be unique in cases where there's more than one instance of qpsmtpd running on a single server.
>
> Including PID takes care of that. And you're assuming a broken srand() too.
>
> Admittedly, there's a very very remote freak possibility that given two identical hostnames, a rand() with a broken srand(), and those servers starting at the exact same microsecond time with the exact same PID, that you MIGHT, just MAYBE, get a duplicate transaction id.

Nope. I reject this. The design ASSUMES that the clock has sufficient resolution. It is the implementation which chooses Time::HiRes.

There are two perfect solutions (bikesheds ;-):

1. Use a timer based directly on the values in the instruction count register. IIRC, the linux kernel clock (at least on intel) just quantizes this in either micro- or nano-seconds. [bikeshed = kernel patch]

2. Implement our own clock using a sequence generator, which reads the last value out of the tail of the log on startup (and is thread/async-safe).

I think that using PID is a bit of a hack but it seems to work in every case that anyone has come up with. It should be changed to TID, should qpsmtpd ever be blessed as thread-safe, but I'm not holding my breath for that to happen ;-) ... besides, async is a much better choice (compare lighttpd with apache).

> The alternative seems to me the only way to satisfy your security paranoid mind is to use Data::UUID, which is an extra dependency I

I think the use of the adjective "security" in this context is rather generous.

> don't want to add in.
>
> Matt.

--
--gh
Re: [Fwd: Re: Transaction ID suggestions]
On Aug 26, 2007, at 10:02, Matt Sergeant wrote: On 25-Aug-07, at 8:37 PM, Guy Hulbert wrote: The mod_uniqueid module in apache has quite a reasonable implementation. There is a perl implementation on CPAN (in my directory). I'm assuming Ask is referring to Apache::Usertrack, which does this: Hmn, I did - but that's not what I had in mind. I mixed up mod_usertrack and mod_unique_id in my head. From mod_unique_id.c (in Apache): /* Comments: * * We want an identifier which is unique across all hits, everywhere. * everywhere includes multiple httpd instances on the same machine, or on * multiple machines. Essentially everywhere should include all possible * httpds across all servers at a particular site. We make some assumptions * that if the site has a cluster of machines then their time is relatively * synchronized. We also assume that the first address returned by a * gethostbyname (gethostname()) is unique across all the machines at the * site. * * We also further assume that pids fit in 32-bits. If something uses more * than 32-bits, the fix is trivial, but it requires the unrolled uuencoding * loop to be extended. * A similar fix is needed to support multithreaded * servers, using a pid/tid combo. * * Together, the in_addr and pid are assumed to absolutely uniquely identify * this one child from all other currently running children on all servers * (including this physical server if it is running multiple httpds) from each * other. * * The stamp and counter are used to distinguish all hits for a particular * (in_addr,pid) pair. The stamp is updated using r->request_time, * saving cpu cycles. The counter is never reset, and is used to permit up to * 64k requests in a single second by a single child. * * The 112-bits of unique_id_rec are encoded using the alphabet * [A-Za-z0-9@-], resulting in 19 bytes of printable characters. That is then * stuffed into the environment variable UNIQUE_ID so that it is available to * other modules. 
The alphabet choice differs from normal base64 encoding * [A-Za-z0-9+/] because + and / are special characters in URLs and we want to * make it easy to use UNIQUE_ID in URLs. * * Note that UNIQUE_ID should be considered an opaque token by other * applications. No attempt should be made to dissect its internal components. * It is an abstraction that may change in the future as the needs of this * module change. * * It is highly desirable that identifiers exist for eternity. But future * needs (such as much faster webservers, moving to 64-bit pids, or moving to a * multithreaded server) may dictate a need to change the contents of * unique_id_rec. Such a future implementation should ensure that the first * field is still a time_t stamp. By doing that, it is possible for a site to * have a flag second in which they stop all of their old-format servers, * wait one entire second, and then start all of their new-servers. This * procedure will ensure that the new space of identifiers is completely unique * from the old space. (Since the first four unencoded bytes always differ.) */ /* - ask -- http://develooper.com/ - http://askask.com/
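Since Ask asks below for a Perl implementation of mod_unique_id, here is a rough sketch of the scheme the comment describes: a 112-bit record (stamp, in_addr, pid, counter) encoded into 19 printable characters. It is only an approximation under stated assumptions, not Apache's code: the function name unique_id is made up, the alphabet is assumed to be [A-Za-z0-9@-] (the characters the archive's address masking appears to have swallowed), and the output is not claimed to be byte-identical to what Apache produces.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use Socket qw(inet_aton);

my $counter = 0;    # per-process counter, never reset, wraps at 16 bits

# unique_id() is a hypothetical name. It packs stamp (32 bits), IPv4 address
# (32 bits), pid (32 bits) and counter (16 bits) into 14 bytes = 112 bits.
sub unique_id {
    my ($ip, $pid, $stamp) = @_;
    my $addr = inet_aton($ip);
    die "not a usable IPv4 address: $ip"
        unless defined $addr && length($addr) == 4;

    my $rec = pack('N a4 N n', $stamp, $addr, $pid & 0xFFFFFFFF, $counter);
    $counter = ($counter + 1) & 0xFFFF;

    my $id = encode_base64($rec, '');   # 14 bytes -> 19 characters plus one '='
    $id =~ s/=+\z//;                    # drop the padding
    $id =~ tr{+/}{@-};                  # swap in the URL-friendly characters
    return $id;
}

print unique_id('127.0.0.1', $$, time), "\n";
```

As in the original, the result should be treated as an opaque token; the 16-bit counter is what limits this scheme to 64k ids per second per process.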
Re: Transaction ID suggestions
Woah - bikeshedding galore! I just got my email downloaded to my mac (I'm traveling) and Mail.app says there are 61 mails in this thread (plus those I deleted earlier!?!). Enough already. If anyone has a serious realistic concern with what Matt did, please provide a perl implementation of mod_unique_id from Apache - otherwise then let's leave this alone for now. - ask
Re: Transaction ID suggestions
Guy Hulbert wrote: me = linux1 - linux1__ me = linux2.example.com - linux2.e If you run two instances you can call them 'thing1' and 'thing2'. I'd rather not. -- JT Moree
Re: Transaction ID suggestions
On Fri, 2007-08-31 at 00:59 +0800, Ask Bjørn Hansen wrote: Woah - bikeshedding galore! I just got my email downloaded to my mac (I'm traveling) and Mail.app says there are 61 mails in this thread (plus those I deleted earlier!?!). Enough already. There might have been a little less chat if he'd posted the code to the list ... fwiw, here it is. If anyone has a serious realistic concern with what Matt did, please http://svn.perl.org/qpsmtpd/trunk/lib/Qpsmtpd/Transaction.pm # Generate unique id # use gettimeofday for microsec precision # add in rand() in case gettimeofday clock is slow (e.g. bsd?) # add in $$ in case srand is set per process my ($start, $mstart) = gettimeofday(); my $id = sprintf("%d.%06d.%s.%d.%d", $start, $mstart, $SALT_HOST, rand(1), $$, ); provide a perl implementation of mod_unique_id from Apache - otherwise then let's leave this alone for now. - ask -- --gh
Re: Transaction ID suggestions
Ask asked us to stop ... but what the heck ;-). Perhaps we should drop the list after this one though. On Thu, 2007-08-30 at 14:19 -0400, Matt Sergeant wrote: On 30-Aug-07, at 10:57 AM, Guy Hulbert wrote: Nope. I reject this. The design ASSUMES that the clock has sufficient resolution. It is the implementation which chooses Time::HiRes. Fine, so on Alpha, you have a qpsmtpd installation that is using First, what I'm saying, is that I don't think we should be particularly worried about an almost obsolete platform. Also, I am quite happy with whatever you decide as long as it reflects the requirements that everyone has requested (which it seems to do). async and doing more than 1000 mails/second? And given that it has rand(1) in there, you also need a rand() collision in that However. Nope. The problem with random number generators is that their output is *random*. That means that you will occasionally get results very close together and when you quantize it (e.g. rand(10)) it means you will get the same number consecutively. This is exactly what you do not want when your problem is insufficiently resolved times. You'd be better off using a block-cipher (e.g. DES) which scatters results *uniformly*. But either case is a hack so rand() will do since it's available. Actually, I think the right answer is just a sequence generator (mod 1). That guarantees different consecutive results. In python you could just use an iterator ... I'm not sure about perl. Have you read Knuth on random number generators ? It's quite amusing. millisecond. You're reaching for a problem. On normal platforms the minimum granularity is on the order of 1 billion mails/sec. Let me know when you're building the single CPU system that can do that, I'd like to buy one. Note that mod_unique_id is only designed for 64k hits/sec. -- --gh
Re: Transaction ID suggestions
On 30-Aug-07, at 2:52 PM, Guy Hulbert wrote: Actually, I think the right answer is just a sequence generator (mod 1). That guarantees different consecutive results. I think so too. In my testing perl only switches to floating point at or around 2**50 on 32 bit platforms, which should allow enough email between restarts for even the fastest mail systems on the planet. Consider rand() gone and a sequence used instead. Matt.
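A minimal sketch of what "a sequence instead of rand()" could look like, assuming the same seconds.microseconds.host.counter.pid layout as the snippet Guy posted. The names make_id_generator and the 'host1' tag are made up for illustration; this is not the code that went into Transaction.pm.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# make_id_generator() is a hypothetical helper. It returns a closure that
# owns a per-process counter, so two ids from the same process can never be
# equal even if the clock does not advance between calls.
sub make_id_generator {
    my ($host_tag) = @_;
    my $seq = 0;
    return sub {
        my ($sec, $usec) = gettimeofday();
        return sprintf('%d.%06d.%s.%d.%d', $sec, $usec, $host_tag, $seq++, $$);
    };
}

my $next_id = make_id_generator('host1');
print $next_id->(), "\n" for 1 .. 3;
```

Unlike rand(), consecutive values of the counter are guaranteed to differ, which is the property Guy was after; uniqueness across processes still rests on the pid and host tag.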
Re: Transaction ID suggestions
On 2007-08-29 17:50:28 -0400, Charlie Brady wrote: A four-tuple identifying the TCP connection also identifies the server. Right. And the tuple must not be reused for some time (2*MSL or 4 minutes according to RFC 793), so you don't even need a high resolution timer. However, what if there is no TCP connection yet? For example, in forkserver, the plugins are loaded before the first connection is accepted and you want to log a failure to load one of them (or the plugins may want to log something in their register method). You could just fill the remote part with zeros, but you can have multiple processes listening on the same port and you can't distinguish them in this case. hp -- _ | Peter J. Holzer| I know I'd be respectful of a pirate |_|_) | Sysadmin WSR | with an emu on his shoulder. | | | [EMAIL PROTECTED] | __/ | http://www.hjp.at/ |-- Sam in Freefall
Re: Transaction ID suggestions
On 2007-08-30 07:07:51 -0400, Guy Hulbert wrote: On Thu, 2007-08-30 at 10:08 +0200, Peter J. Holzer wrote: On 2007-08-29 18:36:12 -0400, Guy Hulbert wrote: Here are some (measured) resolutions of gettimeofday on various systems: Linux/i386: 1 ms Linux/SPARC: 2 ms HP-UX/PA-RISC: 2 ms Linux/Alpha: 976 ms (1024 Hz) 'ms' is usually milli-seconds but it appears you mean micro-seconds ( I pretend that u=mu and write it 'us' ). Fortunately I am using a German keyboard so I can claim an AltGr key malfunction ;-) (AltGr+m = µ) ( otoh, qpsmtpd is not even threaded is it ? ). It might be possible to run Apache::Qpsmtpd on a multithreaded Apache. Unlikely. PHP people still won't bless mt apache. But mod_perl people do, AFAIK. Postgres people discovered a problem with crypt() - from libc Interesting. This bug has been known for a long time (Rasmus Lerdorf wrote 2004 that he tracked it down a couple of years ago), yet crypt in the glibc still isn't threadsafe even though that should be very easy to fix. Obviously few people invoke crypt in multithreaded programs. hp -- _ | Peter J. Holzer| I know I'd be respectful of a pirate |_|_) | Sysadmin WSR | with an emu on his shoulder. | | | [EMAIL PROTECTED] | __/ | http://www.hjp.at/ |-- Sam in Freefall
Re: Transaction ID suggestions
On Thu, 30 Aug 2007, Peter J. Holzer wrote: On 2007-08-29 17:50:28 -0400, Charlie Brady wrote: A four-tuple identifying the TCP connection also identifies the server. Right. And the tuple must not be reused for some time (2*MSL or 4 minutes according to RFC 793), so you don't even need a high resolution timer. Indeed. However, what if there is no TCP connection yet? For example, in forkserver, the plugins are loaded before the first connection is accepted and you want to log a failure to load one of them (or the plugins may want to log something in their register method). I consider that to be a different issue. Log messages at that stage aren't related to and don't need to be correlated with an email message. You could just fill the remote part with zeros, but you can have multiple processes listening on the same port and you can't distinguish them in this case. You can't have multiple processes bound to the same local_IP/local_port, so you could distinguish hosts and processes by filling in the local part of the four-tuple. There's still an edge case where multiple processes are started with the same local port configuration, all but one of which will fail. Do we really ever expect to be merging logs from such errant processes?
Re: Transaction ID suggestions
On Tue, 2007-08-28 at 23:04 -0400, Charlie Brady wrote: On 28-Aug-07, at 3:51 PM, JT Moree wrote: hires_time.pid.local_port ... my $conn = $args{connection}; my $ip = $conn->local_port || 0; my $start = time; my $id = $start.$$.$ip; Some people have suggested adding the remote IP address. I'm curious why use local port instead of remote port? would both be better? Err, actually I had a brain fart. It should be remote_port. No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything. This does not guarantee a unique message ID. That's why we are using hi_res time. -- --gh
Re: Transaction ID suggestions
On 28-Aug-07, at 11:04 PM, Charlie Brady wrote: Err, actually I had a brain fart. It should be remote_port. No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything. Please try any way you can to get the algorithm I've used to generate a duplicate transaction id. Feel free to use your fastest hardware. I've tried, and cannot conceive of any way to get a repeat with this algorithm. Perhaps in 30 years maybe (when computers are that fast), but for now it works well. Matt.
Re: Transaction ID suggestions
From: Charlie Brady [EMAIL PROTECTED] Date: Tue, 28 Aug 2007 23:04:56 -0400 (EDT) No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything. Isn't localport always 25? Chris -- Chris Garrigues Trinsic Solutions President 710-B West 14th Street Austin, TX 78701-1798 http://www.trinsics.com/blog http://www.trinsics.com 512-322-0180 Would you rather proactively pay for uptime or reactively pay for downtime? Trinsic Solutions Your Trusted Friends in Proactive IT.
Re: Transaction ID suggestions
Chris Garrigues wrote: From: Charlie Brady [EMAIL PROTECTED] Date: Tue, 28 Aug 2007 23:04:56 -0400 (EDT) No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything. Isn't localport always 25? Most of the time, yes. But it can also be 465. -- Jens
Re: Transaction ID suggestions
Charlie Brady wrote: On Tue, 28 Aug 2007, Matt Sergeant wrote: On 28-Aug-07, at 3:51 PM, JT Moree wrote: hires_time.pid.local_port ... my $conn = $args{connection}; my $ip = $conn->local_port || 0; my $start = time; my $id = $start.$$.$ip; Some people have suggested adding the remote IP address. I'm curious why use local port instead of remote port? would both be better? Err, actually I had a brain fart. It should be remote_port. No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything. You could still have a machine with several IPs / interfaces, so remote_IP.remote_port.local_port.transaction_within_connection is not enough either. -Johan
Re: Transaction ID suggestions
On 8/29/07, JT Moree [EMAIL PROTECTED] wrote: Given that we are still disagreeing on what is the best way to do it; Can we use all information used so far to get the most unique possible for now? Even if it's not perfect, it's a start. Even if some of the information seems extraneous to some people (and may be) it's still better than nothing. Short of using UUID i'd say doing something like this. I've tried to put the order of information from most static to most dynamic. Using Time::HiRes: my $ip = $conn->remote_ip; my $rport = $conn->remote_port || 0; my $lport = $conn->local_port || 0; my $start = time; my $id = $$ . "_${start}.${lport}_${ip}:${rport}"; -- JT Moree if you want to be paranoid, you have to have all 4 data points from the connection- local port/ip, and remote port/ip, plus local boxes' time with high granularity. if you re-gen '$start' with each transaction within the connection, you dont need a per-connection counter, provided that your time is fine enough to prevent collisions. If you leave out any of the local info, an installation with two servers with un-synced times could still gen the same id. if you add it, then the only way you could have a collision is if your time is not granular enough or gets set back. tcp sequence numbers can also be useful here as a replacement for time, but might be hard to get within perl? allan -- The truth is an offense, but not a sin
Re: Transaction ID suggestions
On Wed, 2007-08-29 at 11:53 -0400, m. allan noah wrote: On 8/29/07, JT Moree [EMAIL PROTECTED] wrote: Given that we are still disagreeing on what is the best way to do it; Can we use all information used so far to get the most unique possible for now? Even if it's not perfect, it's a start. Even if some of the information seems extraneous to some people (and may be) it's still better than nothing. Short of using UUID i'd say doing something like this. I've tried to put the order of information from most static to most dynamic. Using Time::HiRes i.e. use Time::HiRes qw(time); my $ip = $conn->remote_ip; my $rport = $conn->remote_port || 0; my $lport = $conn->local_port || 0; my $start = time; my $id = $$ . "_${start}.${lport}_${ip}:${rport}"; -- JT Moree if you want to be paranoid, you have to have all 4 data points from Why is there all this confusion about security ? The goal is to have a unique MessageID for logs ... [snip] tcp sequence numbers can also be useful here as a replacement for I doubt it very much. TCP sequence numbers have a history of poor implementation. time, but might be hard to get within perl? allan -- --gh
Re: Transaction ID suggestions
Isn't localport always 25? the most time: yes. But it can also be 465 Also port 587 (message submission as per RFC2476). Regards Michael -- It's an insane world, but i'm proud to be a part of it. -- Bill Hicks
Re: Transaction ID suggestions
If you leave out any of the local info, an installation with two servers with un-synced times could still gen the same id. if you add it, then the only way you could have a collision is if your time is not granular enough or gets set back. I'm ok with that. Using Time::HiRes: my $lip = $conn->local_ip(); my $rip = $conn->remote_ip(); my $rport = $conn->remote_port || 0; my $lport = $conn->local_port || 0; my $start = time; my $id = $$ . "_${start}_${lip}:${lport}_${rip}:${rport}"; -- JT Moree
Re: Transaction ID suggestions
On Wed, 29 Aug 2007 the voices made Guy Hulbert write: GH Why is there all this confusion about security ? The goal is to have GH a unique MessageID for logs ... Then forget about the word security, and let's just say that people might want to have unique IDs that'll be unique even when they've got more than one server and centralized/aggregated logging... But we're not even there right now, we are still stuck on how to make the IDs 100% unique within a single server as it might be setup by any qpsmtpd-user. /Tony -- Generally speaking, taunting mentally unstable people is a bad idea.
Re: Transaction ID suggestions
On 8/29/07, Guy Hulbert [EMAIL PROTECTED] wrote: if you want to be paranoid, you have to have all 4 data points from Why is there all this confusion about security ? The goal is to have a unique MessageID for logs ... i never said security. i said paranoid, specifically about collisions. allan -- The truth is an offense, but not a sin
Re: Transaction ID suggestions
On Wed, 2007-08-29 at 12:23 -0400, m. allan noah wrote: On 8/29/07, Guy Hulbert [EMAIL PROTECTED] wrote: if you want to be paranoid, you have to have all 4 data points from Why is there all this confusion about security ? The goal is to have a unique MessageID for logs ... i never said security. i said paranoid, specifically about collisions. If the message ID is unique there will be no collisions. So I interpreted your paranoia ... my bad. allan -- --gh
Re: Transaction ID suggestions
On Wed, 2007-08-29 at 18:15 +0200, Tony L. Svanstrom wrote: On Wed, 29 Aug 2007 the voices made Guy Hulbert write: GH Why is there all this confusion about security ? The goal is to have GH a unique MessageID for logs ... Then forget about the word security, and let's just say that people might want to have unique IDs that'll be unique even when they've got more than one server and centralized/aggregated logging... But we're not even there right now, we are still stuck on how to make the IDs 100% unique within a single server as it might be setup by any qpsmtpd-user. There have been several adequate suggestions. This is only a problem if it goes into the qpsmtpd core since some of the suggestions are reported to be in use already. Perhaps it would help to agree on a list of requirements. From what I can remember these are: 1. A unique ID per message (on one server). 2. Ability to distinguish per recipient. 3. Ability to identify the server. A sequence solves (1) except for simultaneous processing of incoming messages via: a) async b) threads/multiple cpus c) local ports (possibly on multiple addresses) Except with multiple CPUs, time with sufficient resolution is a satisfactory replacement for a sequence. It may be useful to log things like remote_port but it doesn't seem to help directly to solve problem 1. A counter solves 2. Any tag which is unique per server solves 3. It is probably simpler to make this configurable by the end-user. /Tony -- --gh
Re: Transaction ID suggestions
A UUID is preferable to the other solutions because you can condense it down to 128 bits of binary data ... and put it in a database. :) The other solutions are not as database friendly. It seems to me if we're trying to solve the problem of guaranteeing unique transaction ids for extremely high volume sites, then we should make sure that the transaction id itself is high volume friendly. Cheers, ds
Re: Transaction ID suggestions
On Wed, 2007-08-29 at 10:14 -0700, David Sparks wrote: A UUID is preferable to the other solutions because you can condense it down to 128 bits of binary data ... and put it in a database. :) HiRes::Timer is 64 bits ... leaving 64 bits for the server tag. The other solutions are not as database friendly. It seems to me if we're trying to solve the problem of guaranteeing unique transaction ids for extremely high volume sites, then we should make sure that the transaction id itself is high volume friendly. Cheers -- --gh
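To make the 64 + 64 bit split concrete, here is a small sketch that packs a high-resolution timestamp and a server tag into 16 bytes, the same size as a UUID. The function name packed_id and the tag 'mx1' are assumptions for illustration, and the result is not a standards-conforming UUID. On its own it is only as unique as the clock; a pid or counter would have to be carved out of the tag half.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# packed_id() is a hypothetical helper: 32 bits of seconds, 32 bits of
# microseconds, then 64 bits of server tag.
sub packed_id {
    my ($server_tag) = @_;
    my ($sec, $usec) = gettimeofday();
    # 'N' is a 32-bit big-endian integer; 'a8' null-pads or truncates the
    # tag to exactly 8 bytes.
    my $raw = pack('N N a8', $sec, $usec, $server_tag);
    return unpack('H*', $raw);    # 16 bytes shown as 32 hex digits
}

print packed_id('mx1'), "\n";
```

Because the seconds come first and are big-endian, the hex strings sort by time, which also fits Matt's wish to use the id as a filename.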
Re: Transaction ID suggestions
Guy Hulbert wrote: There have been several adequate suggestions. This is only a problem if it goes into the qpsmtpd core since some of the suggestions are reported to be in use already. how is this a problem. those uses should still work even if we start with the same variable because they would overwrite what is in core. The plugin maintainers can update as they have time. good idea about the requirements. 1. A unique ID per message (on one server). 2. Ability to distinguish per recipient. 3. Ability to identify the server. 2) per recipient or per message? I don't see a way to make an id per recipient since any message can have multiple recipients. 3) which server are we talking about? A sequence solves (1) except for simultaneous processing of snip A counter solves 2. Any tag which is unique per server solves 3. It is probably simpler to make this configurable by the end-user. if A solves 1, B solves 2, and C solves 3 then A+B+C should solve all three and it's pretty simple to do so let's just do it. While letting the end user make changes is nice it defeats the purpose of putting a transaction ID into core where everyone can know and rely on it working the same way. -- JT Moree
Re: Transaction ID suggestions
Tony L. Svanstrom wrote: Then forget about the word security, and let's just say that people might want to have unique IDs that'll be unique even when they've got more than one server and centralized/aggregated logging... But we're not even there right now, we are still stuck on how to make the IDs 100% unique within a single server as it might be setup by any qpsmtpd-user. No, that much works, as far as I've been able to prove. It's just a bunch of bikeshed painting going on now :-) I'd be happy to add a quick hash of the server in. Matt.
Re: Transaction ID suggestions
On Wed, 2007-08-29 at 11:16 -0700, JT Moree wrote: Guy Hulbert wrote: There have been several adequate suggestions. This is only a problem if it goes into the qpsmtpd core since some of the suggestions are reported to be in use already. how is this a problem. those uses should still work even if we start I think you answered this at the end. with the same variable because they would overwrite what is in core. The plugin maintainers can update as they have time. good idea about the requirements. Well if people restrict their input to the requirements it simplifies things. 1. A unique ID per message (on one server). 2. Ability to distinguish per recipient. 3. Ability to identify the server. 4. Well-defined format (e.g. UUID). 2) per recipient or per message? I don't see a way to make an id per recipient since any message can have multiple recipients. There was a suggestion way back in the thread that this was required. I don't really know if it is required but it has been mentioned more than once (by people besides me). 3) which server are we talking about? If you use syslog you can have all your logs in one place but if you are running multiple mail servers then you might want to know which server is responsible for a particular message ID. A sequence solves (1) except for simultaneous processing of [snip] A counter solves 2. Any tag which is unique per server solves 3. It is probably simpler to make this configurable by the end-user. if A solves 1, B solves 2, and C solves 3 then A+B+C should solve all three and it's pretty simple to do so let's just do it. I would just use either what Matt Sergeant is using http://www.nntp.perl.org/group/perl.qpsmtpd/2007/08/msg7116.html Yeah, we use HiRes::time() . "." . $$ and we don't get any file stomping (and we're doing millions of emails/day). or something like a UUID ... Here is an old UUID I have lying around: f9c31c2d-b3fb-0310-82b0-c4cdd2013627 so we can make it look something like that. 
use Time::HiRes qw( gettimeofday ); print sprintf("%08x-%08x-%04x\n", gettimeofday, $$); 46d5cf96-45e3-3348 This, at least looks a bit like a UUID and can be extended with -%04x formatted pieces. As long as (2) and (3) are not needed, we are done. I have to run now ... While letting the end user make changes is nice it defeats the purpose of putting a transaction ID into core where everyone can know and rely on it working the same way. -- --gh
Re: Transaction ID suggestions
On 2007-08-29 13:07:06 -0400, Guy Hulbert wrote: On Wed, 2007-08-29 at 18:15 +0200, Tony L. Svanstrom wrote: On Wed, 29 Aug 2007 the voices made Guy Hulbert write: GH Why is there all this confusion about security ? The goal is to have GH a unique MessageID for logs ... Then forget about the word security, and let's just say that people might want to have unique IDs that'll be unique even when they've got more than one server and centralized/aggregated logging... But we're not even there right now, we are still stuck on how to make the IDs 100% unique within a single server as it might be setup by any qpsmtpd-user. There have been several adequate suggestions. This is only a problem if it goes into the qpsmtpd core since some of the suggestions are reported to be in use already. Perhaps it would help to agree on a list of requirements. From what I can remember these are: 1. A unique ID per message (on one server). I'd rephrase that as unique ID per transaction. Not every transaction results in a message (indeed, on my systems 90+% of transactions don't result in a message). 2. Ability to distinguish per recipient. I'm not even sure what per recipient should mean here. Does it mean per RCPT command, so that a log file looks something like this: abcdef.0 Accepted connection 1/15 from 192.0.2.1 /foo.example.com abcdef.0 check_earlytalker plugin: remote host said nothing spontaneous, proceeding abcdef.0 220 ns1.hjp.at ESMTP qpsmtpd 0.40 ready; send us your mail, but not your spam. abcdef.0 dispatching EHLO foo.example.com abcdef.0 250-ns1.hjp.at Hi foo.example.com [192.0.2.1] abcdef.0 250-PIPELINING abcdef.0 250-8BITMIME abcdef.0 250 STARTTLS abcdef.0 dispatching MAIL FROM:[EMAIL PROTECTED] abcdef.0 from email address : [[EMAIL PROTECTED]] abcdef.0 Plugin check_badmailfrom, hook mail returned DECLINED abcdef.0 250 [EMAIL PROTECTED], sender OK - how exciting to get mail from you! 
abcdef.1 dispatching RCPT TO:[EMAIL PROTECTED] abcdef.1 to email address : [[EMAIL PROTECTED]] abcdef.1 Plugin aliases_check, hook rcpt returned DECLINED, abcdef.1 Plugin spamhaus, hook rcpt returned DECLINED, abcdef.1 250 [EMAIL PROTECTED], recipient ok abcdef.2 dispatching RCPT TO:[EMAIL PROTECTED] abcdef.2 to email address : [[EMAIL PROTECTED]] abcdef.2 Plugin aliases_check, hook rcpt returned DECLINED, abcdef.2 Plugin spamhaus, hook rcpt returned DECLINED, abcdef.2 250 [EMAIL PROTECTED], recipient ok abcdef.0 dispatching DATA ... or really distinguish recipients? The latter doesn't make much sense to me (before the first RCPT there are 0 recpients, and after the second (successful) RCPT there is more than one, so there are a lot of cases where this is ambiguous. As for the former, I don't see that much use in it, either. Grouping lines from dispatching ... to the response together seems easy enough, and if you find that hard for some reason, it doesn't apply only to recipients - you might want a command counter. 3. Ability to identify the server. 4. Ability to identify the connection. A connection can contain several transactions, and would not like to lose the information that two log entries are from the same connection. If we want transaction (and possibly command) ids, I would derive them from the connection id via simple counters: $transaction_id = $connection_id.$transaction_counter $command_id = $transaction_id.$command_counter where the counters are local to their parent and start at 0. A sequence solves (1) except for simultaneous processing of incoming messages via: a) async b) threads/multiple cpus c) local ports (possibly on multiple addresses) I think you'll have to define sequence. If you have one global sequence, that will work in all of these cases. Or you can have multiple sequences, but then you need a prefix to distinguish them. hp -- _ | Peter J. Holzer| I know I'd be respectful of a pirate |_|_) | Sysadmin WSR | with an emu on his shoulder. 
| | | [EMAIL PROTECTED] | __/ | http://www.hjp.at/ |-- Sam in Freefall
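The derived ids Peter describes (connection, then transaction, then command, each with a counter local to its parent and starting at 0) can be sketched as follows. The IdTree package is hypothetical and exists only for this illustration; qpsmtpd has no such class.

```perl
use strict;
use warnings;

package IdTree;    # hypothetical helper, not part of qpsmtpd

sub new {
    my ($class, $id) = @_;
    return bless { id => $id, next => 0 }, $class;
}

sub id { $_[0]{id} }

# Each child id is the parent's id plus the parent's own counter.
sub child {
    my ($self) = @_;
    return IdTree->new($self->{id} . '.' . $self->{next}++);
}

package main;

my $conn = IdTree->new('abcdef');    # connection id
my $t0   = $conn->child;             # first transaction:  abcdef.0
my $t1   = $conn->child;             # second transaction: abcdef.1
my $c0   = $t1->child;               # first command in it: abcdef.1.0
print join("\n", $t0->id, $t1->id, $c0->id), "\n";
```

Every log line carrying a child id still begins with the connection id, so grepping for the prefix keeps all entries from one connection together.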
Re: Transaction ID suggestions
On 2007-08-29 09:08:56 -0700, JT Moree wrote: If you leave out any of the local info, an installation with two servers with un-synced times could still gen the same id. if you add it, then the only way you could have a collision is if your time is not granular enough or gets set back. I'm ok with that. Using Time::HiRes:
my $lip = $conn->local_ip();          # up to 15 characters (39 with IPv6)
my $rip = $conn->remote_ip();         # up to 15 characters (39 with IPv6)
my $rport = $conn->remote_port || 0;  # up to 5 characters
my $lport = $conn->local_port || 0;   # up to 5 characters
my $start = time;                     # up to 16 characters
$$                                    # up to 5 characters (10 for 32-bit PIDs)
my $id = $$ . "_${start}_${lip}:${lport}_${rip}:${rport}";
5 + 1 + 16 + 1 + 15 + 1 + 5 + 1 + 15 + 1 + 5 = 66 characters
or even
10 + 1 + 16 + 1 + 39 + 1 + 5 + 1 + 39 + 1 + 5 = 119 characters
on some systems. Much too long for an ID which is included in each log line. You could condense it by using base 36 instead of base 10, but it's still quite bulky. hp -- _ | Peter J. Holzer| I know I'd be respectful of a pirate |_|_) | Sysadmin WSR | with an emu on his shoulder. | | | [EMAIL PROTECTED] | __/ | http://www.hjp.at/ |-- Sam in Freefall
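For a feel of how much base 36 would save, here is a small sketch of the conversion. The helper name to_base36 is made up, and it only handles non-negative integers. A 5-digit port shrinks to at most 4 characters and a 32-bit value (such as an IPv4 address taken as an integer) to at most 7, so the saving is real but, as Peter says, the id stays bulky.

```perl
use strict;
use warnings;

# to_base36() is a hypothetical helper: convert a non-negative integer to
# a base-36 string using the digits 0-9 and a-z.
sub to_base36 {
    my ($n) = @_;
    my @digits = (0 .. 9, 'a' .. 'z');
    return '0' if $n == 0;
    my $out = '';
    while ($n > 0) {
        $out = $digits[$n % 36] . $out;
        $n   = int($n / 36);
    }
    return $out;
}

print to_base36(65535), "\n";    # largest port number: prints 1ekf
```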
Re: Transaction ID suggestions
On Wed, 29 Aug 2007, Guy Hulbert wrote: 1. A unique ID per message (on one server). 2. Ability to distinguish per recipient. 3. Ability to identify the server. A sequence solves (1) except for simultaneous processing of incoming messages via: a) async b) threads/multiple cpus c) local ports (possibly on multiple addresses) Except with multiple CPUs, time with sufficient resolution is a satisfactory replacement for a sequence. Except with multiple CPUs is a big problem. OTOH, as has been mentioned multiple times, a four-tuple identifying the TCP connection plus a timestamp will be satisfactory with any number of CPUs, and with very fast networks. It may be useful to log things like remote_port but it doesn't seem to help directly to solve problem 1. A counter solves 2. Any tag which is unique per server solves 3. It is probably simpler to make this configurable by the end-user. A four-tuple identifying the TCP connection also identifies the server. --- Charlie
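A sketch of the four-tuple-plus-timestamp id Charlie argues for, with the transaction-within-connection count he asked for earlier appended. The hash reference stands in for the real connection object, and tuple_id and the sample addresses are assumptions made for this example only.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# tuple_id() is a hypothetical helper; $conn is a plain hash reference
# standing in for the connection object.
sub tuple_id {
    my ($conn, $txn_count) = @_;
    return sprintf('%.6f_%s:%d_%s:%d.%d',
        time,
        $conn->{local_ip},  $conn->{local_port},
        $conn->{remote_ip}, $conn->{remote_port},
        $txn_count);
}

my $conn = {
    local_ip  => '192.0.2.10', local_port  => 25,
    remote_ip => '192.0.2.1',  remote_port => 40123,
};
print tuple_id($conn, $_), "\n" for 0 .. 1;
```

The cost is the length Peter objected to; the benefit is that the local address and port identify the server without any separate host tag.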
Re: Transaction ID suggestions
On 29-Aug-07, at 1:07 PM, Guy Hulbert wrote: On Wed, 2007-08-29 at 18:15 +0200, Tony L. Svanstrom wrote: Then forget about the word security, and let's just say that people might want to have unique IDs that'll be unique even when they've got more than one server and centralized/aggregated logging... But we're not even there right now, we are still stuck on how to make the IDs 100% unique within a single server as it might be setup by any qpsmtpd-user. There have been several adequate suggestions. This is only a problem if it goes into the qpsmtpd core since some of the suggestions are reported to be in use already. That doesn't matter as they haven't created $tran->id - they've just put something in ->notes() which will continue to work. Perhaps it would help to agree on a list of requirements. From what I can remember these are: 1. A unique ID per message (on one server). 2. Ability to distinguish per recipient. 3. Ability to identify the server. I think you've made #2 confusing... I think what you mean is we want a new id when the transaction is reset (i.e. same connection, new email). That's fine. A sequence solves (1) except for simultaneous processing of incoming messages via: a) async b) threads/multiple cpus c) local ports (possibly on multiple addresses) I don't think any of these break when using the timer. But to settle that concern I've updated the implementation again to use even finer grained time (microseconds) and add in rand() in case the gettimeofday timer is on a slow clock. So now it's: secs.microsecs.rand.pid There's a requirement that I'd like to add in: the ability to use the id as a filename for storage, and have it sort by time. Except with multiple CPUs, time with sufficient resolution is a satisfactory replacement for a sequence. I don't see what difference multiple CPUs makes. Adding in pid takes care of that. It may be useful to log things like remote_port but it doesn't seem to help directly to solve problem 1. Yup. 
I removed it now - it was stupid to add it in - I just wasn't thinking. A counter solves 2. Consider counter = rand(). Any tag which is unique per server solves 3. It is probably simpler to make this configurable by the end-user. I've added in a basic hashed version of hostname now. Matt.
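The secs.microsecs.rand.pid layout, together with the wish that the id be usable as a filename that sorts by time, can be sketched as follows. This is only an illustration and not the code that was checked in: make_transaction_id is a hypothetical name, and the field widths and the range of the random part are assumptions. Zero-padding the time fields is what makes a plain string sort follow time order.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Hypothetical helper, not the committed qpsmtpd code.
sub make_transaction_id {
    # In list context gettimeofday returns (seconds, microseconds).
    my ($secs, $usecs) = gettimeofday();
    # Assumed: a small random part in case the clock ticks slowly.
    my $rand = int(rand(10_000));
    # Zero-pad the fixed-width fields so lexical order follows time order.
    return sprintf('%010d.%06d.%04d.%d', $secs, $usecs, $rand, $$);
}

my @ids = map { make_transaction_id() } 1 .. 3;
print "$_\n" for sort @ids;
```

Two ids generated in the same microsecond by the same process are ordered by the random part, so the sort reflects time only down to the clock resolution.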
Re: Transaction ID suggestions
On 29-Aug-07, at 5:50 PM, Charlie Brady wrote: Except with multiple CPUs is a big problem. OTOH, as has been mentioned multiple times, a four-tuple identifying the TCP connection plus a timestamp will be satisfactory with any number of CPUs, and with very fast networks. pid entirely satisfies this problem. Matt.
Re: Transaction ID suggestions
On Wed, 29 Aug 2007, Matt Sergeant wrote:

> On 28-Aug-07, at 11:04 PM, Charlie Brady wrote:
> > > Err, actually I had a brain fart. It should be remote_port.
> >
> > No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything.
>
> Please try any way you can to get the algorithm I've used to generate a duplicate transaction id. Feel free to use your fastest hardware.

My fastest hardware isn't relevant. And I don't have any fast hardware :-)

> I've tried, and cannot conceive of any way to get a repeat with this algorithm.

"This algorithm" I take to mean your proposal of time() . $$ . $remote_port. That is just asserting that no single process could receive two connections in the same tick of time() (because if it could, it's trivial to arrange for them to have the same remote port). I can conceive of that happening, so we should do better. Use the four-tuple.

> Perhaps in 30 years maybe (when computers are that fast), but for now it works well.

But not perfectly :-) Nor as well as it could with a tiny bit more effort.

--- Charlie
Re: Transaction ID suggestions
On Wed, 29 Aug 2007 the voices made Matt Sergeant write: MS I've added in a basic hashed version of hostname now. Would this be a bad time to mention that people might get the idea that they want to run two different setups of qpsmtpd on the same server? Like one for incoming e-mails and one for outgoing (logging, whitelisting, preventing spam/viruses from exiting). Yeah, I saw the crypt+rand, but if something is worth doing... =) /Tony -- Generally speaking, taunting mentally unstable people is a bad idea.
Re: Transaction ID suggestions
Peter. I think it might help if you were to just rewrite the requirements properly. I don't have strong opinions on what the solution should be nor what the requirements should be. As long as the total number is small and they are written concisely they will either converge or, if necessary, we can vote. On Wed, 2007-08-29 at 23:13 +0200, Peter J. Holzer wrote: 1. A unique ID per message (on one server). I'd rephrase that as unique ID per transaction. Not every transaction results in a message (indeed, on my systems 90+% of transactions don't result in a message). fine ... I was not clear on the distinction and I think the person who started the thread has already started using transaction ID 2. Ability to distinguish per recipient. I'm not even sure what per recipient should mean here. Does it mean per RCPT command, so that a log file looks something like this: Yes. [ Again I'm not clear but per RCPT command was the previous context I was referring to. ] -- --gh
Re: Transaction ID suggestions
my $lip   = $conn->local_ip();        # up to 15 characters (39 with IPv6)
my $rip   = $conn->remote_ip();       # up to 15 characters (39 with IPv6)
my $rport = $conn->remote_port || 0;  # up to 5 characters
my $lport = $conn->local_port || 0;   # up to 5 characters
my $start = time;                     # up to 16 characters
$$                                    # up to 5 characters (10 for 32bit PIDs)

my $id = join('_', $$, $start, "$lip:$lport", "$rip:$rport");

5 + 1 + 16 + 1 + 15 + 1 + 5 + 1 + 15 + 1 + 5 = 66 characters
or even
10 + 1 + 16 + 1 + 39 + 1 + 5 + 1 + 39 + 1 + 5 = 119 characters

Better encode it binary. E.g. for IPv4:

my $id = pack(NC,$$,$start,$lip,$lport,$rip,$rport)

Sum: 21 Bytes. Encoded in Base64: 28 Bytes.

Regards Michael

-- It's an insane world, but i'm proud to be a part of it. -- Bill Hicks
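The binary encoding suggested above can be sketched like this for IPv4. The pack template in the message did not survive, so the layout here is an assumption: a 32-bit PID, 32-bit epoch seconds, and each address/port pair as 4 + 2 bytes. pack_id is a hypothetical helper name and the sample values are made up. This particular layout comes to 20 bytes, which Base64 encodes to 28 characters.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);
use MIME::Base64 qw(encode_base64);

# Assumed layout: N = 32-bit PID, N = 32-bit epoch seconds,
# a4 = packed IPv4 address, n = 16-bit port (network byte order).
sub pack_id {
    my ($pid, $start, $lip, $lport, $rip, $rport) = @_;
    return pack('N N a4 n a4 n',
                $pid, $start,
                inet_aton($lip), $lport,
                inet_aton($rip), $rport);
}

# Sample values only; in qpsmtpd they would come from the connection object.
my $bin = pack_id(3165, 1188729346, '127.0.0.1', 25, '10.1.253.1', 40911);
my $b64 = encode_base64($bin, '');    # '' suppresses the trailing newline

printf "%d bytes binary, %d characters Base64\n", length($bin), length($b64);
```

Standard Base64 contains "/" and "+", so an id encoded this way would need a filename-safe alphabet before being used as a file name.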
Re: Transaction ID suggestions
On 8/29/07, Matt Sergeant [EMAIL PROTECTED] wrote: On 29-Aug-07, at 5:50 PM, Charlie Brady wrote: Except with multiple CPUs is a big problem. OTOH, as has been mentioned multiple times, a four-tuple identifying the TCP connection plus a timestamp will be satisfactory with any number of CPUs, and with very fast networks. pid entirely satisfies this problem. not on multiple machines with centralized logging, which is a fairly common design. allan -- The truth is an offense, but not a sin
Re: Transaction ID suggestions
On 29-Aug-07, at 6:03 PM, Charlie Brady wrote: That is just asserting that no single process could receive two connections in the same tick of time() (because if it could, it's trivial to arrange for them to have the same remote port). I can conceive of that happening, so we should do better. Use the four- tuple. Just because you can conceive of it doesn't make it so. I can conceive of flying monkeys too. And yes, remote_port was dumb. It's gone now. Matt.
Re: Transaction ID suggestions
On 29-Aug-07, at 6:38 PM, Tony L. Svanstrom wrote: On Wed, 29 Aug 2007 the voices made Matt Sergeant write: MS I've added in a basic hashed version of hostname now. Would this be a bad time to mention that people might get the idea that they want to run two different setups of qpsmtpd on the same server? No that's fine. PID is still in there taking care of that. Matt.
Re: Transaction ID suggestions
On 29-Aug-07, at 7:02 PM, m. allan noah wrote: On 8/29/07, Matt Sergeant [EMAIL PROTECTED] wrote: On 29-Aug-07, at 5:50 PM, Charlie Brady wrote: Except with multiple CPUs is a big problem. OTOH, as has been mentioned multiple times, a four-tuple identifying the TCP connection plus a timestamp will be satisfactory with any number of CPUs, and with very fast networks. pid entirely satisfies this problem. not on multiple machines with centralized logging, which is a fairly common design. Hostname is also part of the id (hashed down to a few chars). Matt.
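A hostname "hashed down to a few chars" could look roughly like the sketch below. It is not the code in svn: host_tag is a hypothetical helper, and MD5 with a four-character prefix is an assumption chosen so that the tag stays the same across restarts.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Sys::Hostname qw(hostname);

# Hypothetical helper: a short, stable tag derived from a host name.
sub host_tag {
    my ($name, $len) = @_;
    $len ||= 4;    # assumed default width
    return substr(md5_hex($name), 0, $len);
}

my $tag = host_tag(hostname());
print "host tag: $tag\n";
```

A tag this short only distinguishes a handful of hosts reliably, and two qpsmtpd instances on the same machine share it, which is why the PID stays in the id.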
Re: Transaction ID suggestions
James W. Abendschan wrote: The check_earlytalker plugin ensures at least a one second pause in every SMTP session, so time() + peer IP + peer port will be far more unique than a random number :-) This has been suggested a few times but I'd rather not have to have ids for the system depend on using a plugin. I'm pushing for adding this id to core qpsmtpd. This combo would be unique among all hosts attached to the same routable networks -- two hosts on two different, unconnected networks could possibly get a connection from the same private IP + local port at the same time, but this should be impossible if the networks are connected. As in two clients behind a NAT sending to our server at the exact same time? Might be possible from server farms or distributed mailing list systems? What do you guys think? -- JT Moree
Re: Transaction ID suggestions
On 8/28/07, JT Moree [EMAIL PROTECTED] wrote:

> James W. Abendschan wrote:
> > The check_earlytalker plugin ensures at least a one second pause in every SMTP session, so time() + peer IP + peer port will be far more unique than a random number :-)
>
> This has been suggested a few times but I'd rather not have to have ids for the system depend on using a plugin. I'm pushing for adding this id to core qpsmtpd.
>
> > This combo would be unique among all hosts attached to the same routable networks -- two hosts on two different, unconnected networks could possibly get a connection from the same private IP + local port at the same time, but this should be impossible if the networks are connected.
>
> As in two clients behind a NAT sending to our server at the exact same time? Might be possible from server farms or distributed mailing list systems? What do you guys think?

that won't be an issue. the nat box will rewrite the outgoing packets to say they are coming from a unique port on its external interface, and that is all you can see on your end.

remoteIP + remotePort + fineGrainedTime is what we use in-house for some high-speed http logging that needs a unique handle. it works just fine with a fair number of concurrent clients behind a nat or proxy. but, my installation is not massive :)

allan

-- The truth is an offense, but not a sin
Re: Transaction ID suggestions
Why not use something like Data::UUID? http://search.cpan.org/~rjbs/Data-UUID-1.148/UUID.pm

There it reads: "It provides reasonably efficient and reliable framework for generating UUIDs and supports fairly high allocation rates -- 10 million per second per machine -- and therefore is suitable for identifying both extremely short-lived and very persistent objects on a given system as well as across the network."

I used this in a former project for unique persistent object ids.

-- Ernesto
Re: Transaction ID suggestions
remoteIP + remotePort + fineGrainedTime is what we use in-house for some high-speed http logging that needs a unique handle. it works just fine with a fair number of concurrent clients behind a nat or proxy. but, my installation is not massive :) Add PID and a per-process message-counter and you should always be unique. Regards Michael -- It's an insane world, but i'm proud to be a part of it. -- Bill Hicks
Re: Transaction ID suggestions
I've checked in $transaction-id support now. Please let me know if you think it's OK. Matt.
Re: Transaction ID suggestions
Matt Sergeant wrote: I've checked in $transaction-id support now. Please let me know if you think it's OK. which method did you use? -- JT Moree
Re: Transaction ID suggestions
On 28-Aug-07, at 3:12 PM, JT Moree wrote: Matt Sergeant wrote: I've checked in $transaction-id support now. Please let me know if you think it's OK. which method did you use? hires_time.pid.local_port Matt.
Re: Transaction ID suggestions
Matt Sergeant wrote:

> On 28-Aug-07, at 3:12 PM, JT Moree wrote:
> > Matt Sergeant wrote:
> > > I've checked in $transaction->id support now. Please let me know if you think it's OK.
> >
> > which method did you use?
>
> hires_time.pid.local_port

I found the svn web interface:

# generate id
my $conn = $args{connection};
my $ip = $conn->local_port || 0;
my $start = time;
my $id = "$start.$$.$ip";

Some people have suggested adding the remote IP address. I'm curious why use local port instead of remote port? Would both be better?

my $ip = $conn->remote_ip;
my $rport = $conn->remote_port || 0;
my $lport = $conn->local_port || 0;
my $start = time;
my $id = "${start}_$$.${lport}_$ip:$rport";

Thanks for checking something in. Progress is being made. ;)

-- JT Moree
Re: Transaction ID suggestions
On 28-Aug-07, at 3:51 PM, JT Moree wrote:

> I found the svn web interface:
>
> # generate id
> my $conn = $args{connection};
> my $ip = $conn->local_port || 0;
> my $start = time;
> my $id = "$start.$$.$ip";
>
> Some people have suggested adding the remote IP address. I'm curious why use local port instead of remote port? would both be better?

Err, actually I had a brain fart. It should be remote_port.

Matt.
Re: Transaction ID suggestions
On Tue, 28 Aug 2007, Matt Sergeant wrote:

> On 28-Aug-07, at 3:51 PM, JT Moree wrote:
> > > hires_time.pid.local_port
> >
> > ...
> > my $conn = $args{connection};
> > my $ip = $conn->local_port || 0;
> > my $start = time;
> > my $id = "$start.$$.$ip";
> >
> > Some people have suggested adding the remote IP address. I'm curious why use local port instead of remote port? would both be better?
>
> Err, actually I had a brain fart. It should be remote_port.

No, it should be remote_IP.remote_port.local_port and should include a transaction_within_connection count. I don't think that pid adds anything.
Re: Transaction ID suggestions
On 24-Aug-07, at 6:40 PM, David Sparks wrote:

> I'm using the poll server which means that there aren't threads to worry about. However the future probably means running multiple daemons to take advantage of multi-core systems so there would need to be a daemon id encoded in there.

Yeah, we use HiRes::time() . "." . $$ and we don't get any file stomping (and we're doing millions of emails/day).
Re: Transaction ID suggestions
On Fri, 24 Aug 2007, Guy Hulbert wrote:

> > fqdn + time + peer TCP port will be pretty unique, regardless of
>
> fqdn is the trivial part
> rand will be pretty unique ...

Initial connection time, peer IP, and peer port will only repeat if the connection is torn down and reestablished with the same peer reusing the same local port within the resolution of the timer. The check_earlytalker plugin ensures at least a one second pause in every SMTP session, so time() + peer IP + peer port will be far more unique than a random number :-)

This combo would be unique among all hosts attached to the same routable networks -- two hosts on two different, unconnected networks could possibly get a connection from the same private IP + local port at the same time, but this should be impossible if the networks are connected.

Adding this to plugins/logging/syslog works pretty well for forkserver:

use Time::HiRes;
...
if (!$self->{_logid}) {
    if ($self->connection->remote_ip) {
        $self->{_timestamp} = Time::HiRes::time();
        $self->{_logid} = "t=" . $self->{_timestamp} . "/peer=" . $self->connection->remote_ip . ":" . $self->connection->remote_port;
    }
}
if ($self->connection->remote_ip) {
    $header = $self->{_logid} . " ";
}
syslog $priority, '%s%s', $header, join(' ', @log);

syslog messages look like this:

Aug 25 14:31:27 mailfoo qpsmtpd[4892]: t=1188077487.69488/peer=10.1.253.1:40911 check_earlytalker

If there's an existing way to count the number of messages sent during the connection, then append the count to _logid and it becomes a message ID generator. If this isn't already somewhere in SMTP.pm, the queueing plugin could increment a counter.. or the logging plugin could watch for the string 'to email address :' and increment a (thread-safe) counter. That's a smidge brittle, tho.. a proper message counter would be less hacky.

James
Re: Transaction ID suggestions
JT Moree wrote:

> Is this unique enough? what is the chance of getting the same random number again? should it be a combination of the PID + time + rand?

1. my @sname = split(/\./, $self->qp->config("me"));
   = $sname[0].$$.'r'.int( (( time ^ $$ ) * rand($$)) / rand(time/$$));

2. = sprintf("%08X", rand(2**32 - 1));

3. $self->qp->config("me") =~ m/\.(\d{1,3})$/; #not tested
   $self->{_id} = $1;
   = sprintf("%.4f%d", time(), $self->{_id});

4. = sprintf("%.4f", time()) ... $self->qp->config("me") . \
   sprintf("%08X", rand(2**32 - 1)); #how expensive is this?

These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one people like the best? Can we get Hanno to modify his patch if people like one of these approaches? Can we get it tested by some people? Can we get it checked into svn?

-- JT Moree
Re: Transaction ID suggestions
On Fri, 2007-08-24 at 11:52 -0700, JT Moree wrote: JT Moree wrote: Is this uique enough? what is the chance of getting the same random number again? should it be a combination of the PID + time + rand? my @sname = split(/\./, $self-qp-config(me)); = $sname[0].$$.'r'.int( (( time ^ $$ ) * rand($$)) / rand(time/$$)); = sprintf(%08X, rand(2**32 - 1)); $self-qp-config(me) =~ m/\.(\d{1,3}$/; #not tested $self-{_id} = $1; = sprintf(%.4f%d, time(), $self-{_id}); = sprintf(%.4f, time()) ... $self-qp-config(me) . \ sprintf(%08X, rand(2**32 - 1)); #how expensive is this? These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one Using rand is bogus. A random number generator will repeat values. Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. people like the best? Can we get Hanno to modify his patch if people like one of these approaches? Can we get it tested by some people? Can we get it checked into svn? -- --gh
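How often a purely random id repeats can be estimated with the birthday approximation, p ~ 1 - exp(-n(n-1)/(2N)) for n ids drawn from N possible values. The sketch below applies it to the 32-bit sprintf("%08X", rand(...)) variant. It assumes a uniform generator, which Perl's rand() only approximates, and collision_probability is a hypothetical helper name.

```perl
use strict;
use warnings;

# Birthday approximation: chance that at least two of $n ids drawn
# uniformly from $space possible values are equal.
sub collision_probability {
    my ($n, $space) = @_;
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

for my $n (1_000, 100_000, 1_000_000) {
    printf "%9d ids: p(collision) ~ %.4f\n",
           $n, collision_probability($n, 2**32);
}
```

With a 32-bit space the estimate is already well above one half at 100,000 ids, which is why the later proposals combine time and PID with any random part instead of relying on rand alone.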
Re: Transaction ID suggestions
On Fri, 24 Aug 2007, Guy Hulbert wrote: These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one Using rand is bogus. A random number generator will repeat values. Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. fqdn + time + peer TCP port will be pretty unique, regardless of whether you're forking, selecting, or threading. (fortunately, multiplexed SMTP does not yet exist.) Looks like remote_port is set in qpsmtpd-forkserver, at least.. James
Re: Transaction ID suggestions
On Fri, 24 Aug 2007, James W. Abendschan wrote: On Fri, 24 Aug 2007, Guy Hulbert wrote: These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one Using rand is bogus. A random number generator will repeat values. Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. fqdn + time + peer TCP port will be pretty unique, regardless of whether you're forking, selecting, or threading. (fortunately, multiplexed SMTP does not yet exist.) whoops; s/fqdn/peer IP/ James
Re: Transaction ID suggestions
James W. Abendschan wrote: On Fri, 24 Aug 2007, Guy Hulbert wrote: These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one Using rand is bogus. A random number generator will repeat values. Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. fqdn + time + peer TCP port will be pretty unique, regardless of whether you're forking, selecting, or threading. (fortunately, multiplexed SMTP does not yet exist.) mmh, multiplexed? A mailserver can send multiple mails within one tcp-connection: There may be zero or more, transactions in a session. - RFC2821 -- Jens
Re: Transaction ID suggestions
On Fri, 24 Aug 2007, Jens Weibler wrote: mmh, multiplexed? A mailserver can send multiple mails within one tcp-connection: There may be zero or more, transactions in a session. - RFC2821 Ah, good point. Okay then, obviously qpsmtpd now needs to be rewritten to make me right -- after leaving the DATA state, reject anything other than QUIT :-) I suppose a counter could be tacked on to the ID and incremented every time a message is queued.. James
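The counter idea can be sketched as a per-connection object that appends a running transaction number to the connection id. ConnectionIds and next_transaction_id are hypothetical names and not part of qpsmtpd; the connection id string reuses the syslog example from earlier in the thread, and the "/n=" suffix is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for per-connection state; not qpsmtpd's own class.
package ConnectionIds;

sub new {
    my ($class, $connection_id) = @_;
    return bless { base => $connection_id, count => 0 }, $class;
}

# Call once for every new transaction (message) on the connection.
sub next_transaction_id {
    my ($self) = @_;
    $self->{count}++;
    return "$self->{base}/n=$self->{count}";
}

package main;

my $ids = ConnectionIds->new('t=1188077487.69488/peer=10.1.253.1:40911');
print $ids->next_transaction_id, "\n" for 1 .. 3;
```

Because each connection is handled by one process, the counter needs no locking in the forking and polling servers; a threaded server sharing one counter between threads would need one.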
Re: Transaction ID suggestions
Guy Hulbert wrote:

> Using rand is bogus. A random number generator will repeat values.

So you would definitely not like #2 and probably not #1. How about #3 and #4?

> Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator.

In our case a repetition is not a highly critical problem. (Not enough to justify using a centralized sequence generator.) Repetition just reduces the readability of the logs. Given that the logs are even less readable without these id's I'd say we are in a better position to implement something rather than nothing.

-- JT Moree
Re: Transaction ID suggestions
= sprintf(%.4f, time()) ... $self-qp-config(me) . \ sprintf(%08X, rand(2**32 - 1)); #how expensive is this? These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one Using rand is bogus. A random number generator will repeat values. Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. I'm using the poll server which means that there aren't threads to worry about. However the future probably means running multiple daemons to take advantage of multi-core systems so there would need to be a daemon id encoded in there. The big advantage to using time() + id as the least significant digit is that you can put the id in a db server as a double or unixtime which comes in quite handy when you've got a lot of volume. Cheers, ds
Re: Transaction ID suggestions
On Fri, 2007-08-24 at 13:18 -0700, James W. Abendschan wrote: On Fri, 24 Aug 2007, Guy Hulbert wrote: These are the approaches suggested so far. I added the last one as a combination of the others. Can we see a show of hands for the one Using rand is bogus. A random number generator will repeat values. Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. fqdn + time + peer TCP port will be pretty unique, regardless of fqdn is the trivial part rand will be pretty unique ... time by itself is sufficient if the resolution is fine enough ... the problem is that when systems are fast enough, whatever fixed resolution you picked will not be enough. However, at present the linux kernel gives microseconds (%.6f rather than %.4f) and it seems to take about .0001 seconds to fork a process so if forking, microseconds seem to be sufficient for a few years. ... but threads may be able to get the same value ... The problem with a sequence is to continue through a crash without repeating values. whether you're forking, selecting, or threading. (fortunately, multiplexed SMTP does not yet exist.) Looks like remote_port is set in qpsmtpd-forkserver, at least.. James -- --gh
Re: Transaction ID suggestions
On Fri, 2007-08-24 at 13:22 -0700, JT Moree wrote: Guy Hulbert wrote: Using rand is bogus. A random number generator will repeat values. So you would definitely not like #2 and probably not #1. How about #3 and $4? I can't think of anything that guarantees a unique number ... except pulling a sequence from an ACID database (where the problem of system crashes is already solved). Time (with sufficient resolution) is equivalent to a sequence ... but with threads, you would need a lock on the sequence generator. In our case a repetition is not a highly critical problem. (Not enough Repetition will break anything using a hash to sort messages by ID. to justify using a centralized sequence generator.) Repetition just reduces the readability of the logs. Given that the logs are even less Ah. You never know. DJB had a clever method to pick message IDs for the queue by using the inode ... but it is useless for log analysis where the mail queue has a dedicated reiser partition and the load is very LOW. I found that every message used the same inode when there was only one message at a time on the system ... :-( It solves his problem of picking a message ID which will not conflict with any other in the queue _at the same time_. readable without these id's I'd say we are in a better position to implement something rather than nothing. -- --gh