Re: Transaction ID suggestions

2007-09-05 Thread Matt Sergeant

On Tue, 4 Sep 2007, Peter Eisch wrote:


Would it be possible to implement -id as a hook?  The actual key could then
be left to the creativity of the user.  The plugin could then implement the
other hooks and tune the id as necessary (connect, mail, queue, etc.).


Yes, it's possible to do it that way. See the current hook for received 
headers for example code of how it has to work.


Matt.


Re: Transaction ID suggestions

2007-09-04 Thread JT Moree
 $instance_id  # could be opaque or structured to include server name
               # or IP, PID, etc.
 $instance_id.$connection_id   # identifies a connection handled
                               # by this instance
 $instance_id.$connection_id.$transaction_id   # identifies a
                                               # transaction within
                                               # this connection.

I notice that svn code has moved to this model but it still has these
lines in it

  my $SALT_HOST = crypt(hostname, chr(65+rand(57)).chr(65+rand(57)));
  $SALT_HOST =~ tr/A-Za-z0-9//cd;

Is this being used anymore?  I don't find a reference to $SALT_HOST in
the same file.

-- 
JT Moree


Re: Transaction ID suggestions

2007-09-04 Thread Peter J. Holzer
On 2007-09-04 07:59:15 -0700, JT Moree wrote:
  $instance_id  # could be opaque or structured to include server name
                # or IP, PID, etc.
  $instance_id.$connection_id   # identifies a connection handled
                                # by this instance
  $instance_id.$connection_id.$transaction_id   # identifies a
                                                # transaction within
                                                # this connection.
 
 I notice that svn code has moved to this model but it still has these
 lines in it
 
   my $SALT_HOST = crypt(hostname, chr(65+rand(57)).chr(65+rand(57)));
   $SALT_HOST =~ tr/A-Za-z0-9//cd;
 
 Is this being used anymore?  I don't find a reference to $SALT_HOST in
 the same file.

I was playing around a bit on the weekend, yes. Since neither Matt nor
Ask have cried out in horror on what I did, I guess it's time to present
that to a wider audience:

The instance id basically identifies a Qpsmtpd::SMTP object. Looking
through the sources of various servers I found that there is always
exactly one per process (although with forkserver it is inherited by the
child processes), so I thought that 

time when object was created (seconds.microseconds since the epoch)
host_id 
process id 

should always be unique. I replaced $SALT_HOST as the host_id with the
primary IP address (in hex), because I think a predictable host id is
useful (so that you can find the relevant host from the log entry -
otherwise the host id could be removed). It may be useful to replace the
IP address with something else, most likely the (abbreviated) hostname.
That could be a configuration option. (So this answers your question:
$SALT_HOST is obsolete and I just forgot to delete it)

The connection id and transaction id are simple counters. 
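
A minimal Perl sketch of the scheme described above, for illustration only. The helper names (new_instance_id, next_connection, next_transaction) are hypothetical and not taken from the qpsmtpd source, and resetting the transaction counter for each new connection is an assumption:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Hypothetical helper: instance id = creation time (seconds.microseconds),
# host id (here the IPv4 address in hex) and process id.
sub new_instance_id {
    my ($ip) = @_;
    my ($sec, $usec) = gettimeofday();
    my $host_id = sprintf '%02x%02x%02x%02x', split /\./, $ip;
    return sprintf '%d.%06d.%s.%d', $sec, $usec, $host_id, $$;
}

my $instance_id    = new_instance_id('127.0.1.1');
my $connection_id  = 0;
my $transaction_id = 0;

# Assumption: a new connection restarts the transaction counter.
sub next_connection {
    $transaction_id = 0;
    return join '.', $instance_id, ++$connection_id;
}

sub next_transaction {
    return join '.', $instance_id, $connection_id, ++$transaction_id;
}

print next_connection(),  "\n";
print next_transaction(), "\n";
```

Because the instance id is computed once per process, only the two counters change between log lines.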

So a complete log entry (without timestamp or whatever else the logging
mechanism may add) looks like this:

1188729346.156197.7f000101.3165.2.1 Accepted connection 0/15 from 127.0.1.1 / Unknown

So this is instance 1188729346.156197.7f000101.3165 (started at
1188729346.156197 on host 127.0.1.1 (oops - the joys of dhcp and strange
/etc/hosts files) in process 3165). This is connection number 2 (i.e.
the first real connection on this instance, since connection number 1
is used up during startup), and the first transaction within this
connection (since a MAIL FROM command always starts a new transaction,
you can think of transaction 2 as the first real transaction).
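
For log analysis, an id of this shape can be taken apart again. The following is a sketch that assumes the six dot-separated fields shown in the example line; parse_log_id is a hypothetical name, not an existing qpsmtpd function:

```perl
use strict;
use warnings;

# Split a full id into its components and turn the hex host id
# back into a dotted-quad IPv4 address.
sub parse_log_id {
    my ($id) = @_;
    my ($sec, $usec, $host_hex, $pid, $conn, $txn) = split /\./, $id;
    my $ip = join '.', unpack 'C4', pack 'H8', $host_hex;
    return {
        started     => "$sec.$usec",
        host        => $ip,
        pid         => $pid,
        connection  => $conn,
        transaction => $txn,
    };
}

my $f = parse_log_id('1188729346.156197.7f000101.3165.2.1');
printf "instance started %s on %s, pid %d, connection %d, transaction %d\n",
    @{$f}{qw(started host pid connection transaction)};
```

Note that the parser has to know how many fields belong to the instance id, since the same delimiter is used throughout.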

Apart from the fact that the host id thingy should probably be
configurable, there are some other things I'm not completely happy with:

* The id is rather long. That is written into every log line and the
  first 33 characters are always the same until you restart the
  instance. If you have only a handful of instances (which is quite
  likely) that's 33 characters for a few bits of information (at least
  it will compress well with gzip). We could use base64 instead of base
  10/16. Then the timestamp reduces to 6+4 (or 6+3 if we are content
  with 4 µs resolution) characters, the IP address to 6 characters and
  the PID to 3 characters. Now that's 20 characters including the
  dot, but it's quite opaque.

* The same delimiter is used within the instance id and between the
  instance id and the connection and transaction ids. This may make life
  unnecessarily hard for log analysis tools.
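
To make the base64 arithmetic in the first point concrete, here is a rough sketch that encodes each field as a fixed number of base-64 digits (6 for the seconds, 4 for the microseconds, 6 for the IPv4 address, 3 for the PID). The alphabet and the names b64_int and compact_instance_id are assumptions chosen for illustration; nothing like this exists in the code under discussion:

```perl
use strict;
use warnings;

# Assumed 64-character alphabet (URL-safe variant).
my @ALPHABET = ('A' .. 'Z', 'a' .. 'z', '0' .. '9', '-', '_');

# Encode a non-negative integer as exactly $width base-64 digits.
# Values that do not fit into 6 * $width bits are silently truncated.
sub b64_int {
    my ($n, $width) = @_;
    my $s = '';
    for (1 .. $width) {
        $s = $ALPHABET[$n % 64] . $s;
        $n = int($n / 64);
    }
    return $s;
}

sub compact_instance_id {
    my ($sec, $usec, $ip, $pid) = @_;
    my $ip_num = unpack 'N', pack 'C4', split /\./, $ip;
    return join '', b64_int($sec, 6), b64_int($usec, 4),
                    b64_int($ip_num, 6), b64_int($pid, 3);
}

print compact_instance_id(1188729346, 156197, '127.0.1.1', 3165), "\n";
```

The result is 19 characters long, or 20 with a trailing dot, but as noted it can no longer be read by eye. Three digits for the PID only cover 18 bits, so this sketch assumes a PID below 262144.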

hp


-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-09-04 Thread Matt Sergeant

On 4-Sep-07, at 12:43 PM, Peter J. Holzer wrote:


I was playing around a bit on the weekend, yes. Since neither Matt nor
Ask have cried out in horror on what I did,


FWIW I didn't object simply because it seems so pointless with  
everyone having such conflicting ideas about what this should all be  
about.


Honestly I'd be much happier with the timestamp being the time of the  
connection. I have no idea why we want an id for the times we're  
outside of a connection/transaction. The idea being that if you're  
writing the file to disk you can use the transaction id as the  
filename and it will be guaranteed unique, but also contain a  
timestamp-like component.


But frankly if we're going to keep going around in circles on the  
implementation I'd rather just concede.


Matt.



Re: Transaction ID suggestions

2007-09-04 Thread Ask Bjørn Hansen


On Sep 4, 2007, at 9:43, Peter J. Holzer wrote:


I was playing around a bit on the weekend, yes. Since neither Matt nor
Ask have cried out in horror on what I did, I guess it's time to present
that to a wider audience:


I just got back from vacation and am hopelessly behind on reading up
on this thread.  I'm planning to catch up over the next few weeks.



 - ask

--
http://develooper.com/ - http://askask.com/




Re: Transaction ID suggestions

2007-09-04 Thread Peter Eisch
On 9/4/07 1:14 PM, Matt Sergeant [EMAIL PROTECTED] wrote:

 On 4-Sep-07, at 12:43 PM, Peter J. Holzer wrote:
 
 I was playing around a bit on the weekend, yes. Since neither Matt nor
 Ask have cried out in horror on what I did,
 
 FWIW I didn't object simply because it seems so pointless with
 everyone having such conflicting ideas about what this should all be
 about.
 

There seems to be consensus that a/n {connection|session|transaction} id
would be useful.  

Would it be possible to implement -id as a hook?  The actual key could then
be left to the creativity of the user.  The plugin could then implement the
other hooks and tune the id as necessary (connect, mail, queue, etc.).

peter



Re: Transaction ID suggestions

2007-09-02 Thread David Nicol
How does qmail do it?


Re: Transaction ID suggestions

2007-09-02 Thread Guy Hulbert
On Sun, 2007-09-02 at 17:52 -0500, David Nicol wrote:
 How does qmail do it?

Uses the inode number ... doesn't work for qpsmtpd ... and it's crap for
logging (see my comment earlier in the thread) since the inodes get
recycled.

-- 
--gh




Re: Transaction ID suggestions

2007-09-01 Thread Peter J. Holzer
On 2007-08-31 13:44:44 -0400, Charlie Brady wrote:
 
 On Fri, 31 Aug 2007, Peter J. Holzer wrote:
 On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:
 127.0.0.1 is a problem even after establishing the connection: With
 normal routing arrangements the remote IP address will be 127.0.0.1,
 too, so the only variable is the remote port.
 
 Just to clarify, you are referring to SMTP connections from other 
 processes local to the qpsmtpd server, i.e. connecting over loopback. 
 Correct?

Yes.

 Yes, I can see that in that case 127.0.0.1:nnn:127.0.0.1:25 would not 
 identify the host and would not be unique across multiple servers.

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-09-01 Thread Peter J. Holzer
On 2007-08-31 11:28:55 -0400, m. allan noah wrote:
 On 8/31/07, Peter J. Holzer [EMAIL PROTECTED] wrote:
  On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:
   However, there is still an issue with Peter's proposal to zero out the
   remote address components - prior to accept(), qpsmtpd-forkserver may
   have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25)
   won't be unique across multiple hosts.
 
  127.0.0.1 is a problem even after establishing the connection: With
  normal routing arrangements the remote IP address will be 127.0.0.1,
  too, so the only variable is the remote port. If you aggregate log
  messages from several hosts which receive locally generated messages,
  that can be a problem.
 
 
 questions:
 
 1. why would the remote ip be localhost once a tcp connection is established?

When a client doesn't explicitly bind() to a socket before calling
connect(), the OS will choose a port number and IP address. The IP
address will generally be that of the interface that the connection goes
out of. If the server IP address is local, then the same IP address will
be chosen for the client. So, as a special case, if the server listens
on 127.0.0.1:25, any connection coming in on that port will be from
127.0.0.1:n.
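
The loopback case can be observed with a few lines of Perl. This is only a sketch using the core IO::Socket::INET module; the listener picks an arbitrary free port instead of port 25, and the client does not bind explicitly:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listener on the loopback address; port 0 lets the OS pick a free port.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
    Proto     => 'tcp',
) or die "listen failed: $!";

# The client does not bind(), so the OS chooses its address and port.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $server->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

my $conn = $server->accept or die "accept failed: $!";
printf "remote %s:%d -> local %s:%d\n",
    $conn->peerhost, $conn->peerport, $conn->sockhost, $conn->sockport;
```

On a typical system both ends report 127.0.0.1, so only the client port distinguishes one such connection from another.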


 2. why do we need a 'transaction ID' prior to a connection?

Don't think of it as a 'transaction ID'. Think of it as a 'logging ID',
which identifies the entity to which the log message belongs. There are
things which have to be logged before the first connection (e.g.,
problems with loading a plugin) and you want to identify where they come
from.


 3. can we separate 'startup' type messages from transaction-based ones?

Probably. In logging/file_connection I used a server instance id
(startup timestamp + pid of the forkserver parent process) plus a simple
counter for the connections. Due to a quirk which I never investigated,
all the startup messages have a connection count of 2, the connections
start at 3. In an earlier message I suggested extra counters for the
transactions and possibly commands, so the full scheme could be
something like:

$instance_id  # could be opaque or structured to include server name
              # or IP, PID, etc.
$instance_id.$connection_id   # identifies a connection handled
                              # by this instance
$instance_id.$connection_id.$transaction_id   # identifies a
                                              # transaction within
                                              # this connection.
...

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-09-01 Thread Peter J. Holzer
On 2007-08-29 19:15:37 -0400, Guy Hulbert wrote:
 On Thu, 2007-08-30 at 00:49 +0200, Michael Holzt wrote:
   or even
   10 + 1 + 16 + 1 + 39 + 1 + 5 + 1 + 39 + 1 + 5 = 119 characters
  
  Better encode it binary. E.g. for IPv4:
 
 And better get the number of bits correct.  An IP address is a 32 bit
 integer, not 15 characters.

You've snipped the context. JT was calling the
Qpsmtpd::Connection::local_ip method which does indeed return a string
of up to 15 characters, not an integer of 32 bits.

 Although perl converts scalars on-demand, it correctly preserves
 integer values.

JT was using string concatenation, so that doesn't help. 

Yes, it would be possible to call inet_aton on the return value of
local_ip, do the equivalent on local_port, then concatenate them, and 
send them through base64, thus encoding 48 bits of information in 8
characters. But JT didn't do this, so his scheme needs 21 characters to
encode the same information.
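
As a sketch of the packing just described (not code from JT's patch or from qpsmtpd), using the core Socket and MIME::Base64 modules; pack_endpoint is a hypothetical name:

```perl
use strict;
use warnings;
use Socket qw(inet_aton);
use MIME::Base64 qw(encode_base64);

# Pack an IPv4 address (4 bytes) and a port (2 bytes, network order)
# into 6 bytes, then base64-encode them without a trailing newline.
sub pack_endpoint {
    my ($ip, $port) = @_;
    my $addr = inet_aton($ip);
    die "cannot parse address '$ip'" unless defined $addr;
    return encode_base64($addr . pack('n', $port), '');
}

print pack_endpoint('192.168.100.200', 25), "\n";
```

Six bytes are a multiple of three, so the base64 output is exactly 8 characters with no padding.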

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-09-01 Thread Guy Hulbert
On Sat, 2007-09-01 at 10:08 +0200, Peter J. Holzer wrote:
   Better encode it binary. E.g. for IPv4:
  
  And better get the number of bits correct.  An IP address is a 32 bit
  integer, not 15 characters.
 
 You've snipped the context. JT was calling the
 Qpsmtpd::Connection::local_ip method which does indeed return a string
 of up to 15 characters, not an integer of 32 bits.

An IPv4 address is a 32 bit unsigned integer.  The string of 15
characters is a human-readable representation of it.  AFAICT the
context was obtaining an efficient packing of the data in question (see
the post on binary logging to a database). I chose the IP address as an
example -- the ID being created was not even close to what we had been
discussing and what Matt implemented and I did not want to go in to
length on an example which appeared to be marginally on-topic.

-- 
--gh




Re: Transaction ID suggestions

2007-08-31 Thread Peter J. Holzer
On 2007-08-30 21:12:15 -0400, Charlie Brady wrote:
 On Thu, 30 Aug 2007, Peter J. Holzer wrote:
 On 2007-08-29 17:50:28 -0400, Charlie Brady wrote:
 A four-tuple identifying the TCP connection also identifies the server.
 
 Right. And the tuple must not be reused for some time (2*MSL or 4 minutes
 according to RFC 793), so you don't even need a high resolution timer.
 
 Indeed.
 
 However, what if there is no TCP connection yet? For example, in
 forkserver, the plugins are loaded before the first connection is
 accepted and you want to log a failure to load one of them (or the
 plugins may want to log something in their register method).
 
 I consider that to be a different issue. Log messages at that stage aren't 
 related to and don't need to be correlated with an email message.

Right, but we still want to log them and find out what logged them.

 
 You could just fill the remote part with zeros, but you can have 
 multiple processes listening on the same port and you can't distinguish 
 them in this case.
 
 You can't have multiple processes bound to the same 
 local_IP/local_port,

Sure I can:

habanero:~ 9:50 101# lsof -i :80 | grep LISTEN
httpd    9875    root   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9946 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9967    root   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9970 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9974 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9977 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9980 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9981 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd    9991 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   10397 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   10400 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   10403 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   10790 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   11176 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   11728 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14183 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14186 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14187 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14194 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14195 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14198 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14201 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)
httpd   14207 oraport   27u  IPv4  81804443   TCP *:http (LISTEN)

The httpd in question is an Apache btw, so I'd expect an Apache::Qpsmtpd
installation to look similar. 

 so you could distinguish hosts and processes by filling in the local
 part of the four-tuple.

That's what I meant by "fill the remote part with zeros".

hp


-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-31 Thread Charlie Brady


On Fri, 31 Aug 2007, Michael Holzt wrote:


You can't have multiple processes bound to the same
local_IP/local_port,


Of course you can.

bind - listen - fork


Yes, brain fart at my end. s/$/ except by inheritance post-fork/.

If we stop listening post-fork (as qpsmtpd-forkserver does) then this 
state only occurs briefly. And since the fork occurs after accept(), then 
we already have a TCP four-tuple during that time interval.
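
The bind - listen - fork inheritance can be illustrated with a small Perl sketch. It is not how qpsmtpd-forkserver is written (that forks after accept() and stops listening in the child); it only shows that a forked child holds the same listening socket as its parent:

```perl
use strict;
use warnings;
use IO::Socket::INET;

# bind + listen in the parent; port 0 lets the OS pick a free port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    Proto     => 'tcp',
) or die "listen failed: $!";
my $port = $listener->sockport;

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: the inherited socket is still bound to the same local port.
    close $reader;
    print {$writer} $listener->sockport, "\n";
    close $writer;
    exit 0;
}

close $writer;
chomp(my $child_port = <$reader>);
waitpid $pid, 0;
print "parent $$ and child $pid both held a listener on port $port\n"
    if $child_port == $port;
```

Until the child closes its copy, two processes share the same local IP and port, which is the state shown by lsof in the Apache example.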


However, there is still an issue with Peter's proposal to zero out the
remote address components - prior to accept(), qpsmtpd-forkserver may
have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25)
won't be unique across multiple hosts.


Re: Transaction ID suggestions

2007-08-31 Thread Peter J. Holzer
On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:
 
 On Fri, 31 Aug 2007, Michael Holzt wrote:
 
 You can't have multiple processes bound to the same
 local_IP/local_port,
 
 Of course you can.
 
 bind - listen - fork
 
 Yes, brain fart at my end. s/$/ except by inheritance post-fork/.
 
 If we stop listening post-fork (as qpsmtpd-forkserver does) then this 
 state only occurs briefly. And since the fork occurs after accept(), then 
 we already have a TCP four-tuple during that time interval.
 
 However, there is still an issue with Peter's proposal to zero out the
 remote address components - prior to accept(), qpsmtpd-forkserver may
 have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25)
 won't be unique across multiple hosts.

127.0.0.1 is a problem even after establishing the connection: With
normal routing arrangements the remote IP address will be 127.0.0.1,
too, so the only variable is the remote port. If you aggregate log
messages from several hosts which receive locally generated messages,
that can be a problem.

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-31 Thread Charlie Brady


On Fri, 31 Aug 2007, Peter J. Holzer wrote:


On 2007-08-31 10:42:37 -0400, Charlie Brady wrote:


However, there is still an issue with Peter's proposal to zero out the
remote address components - prior to accept(), qpsmtpd-forkserver may
have multiple listening sockets. Some of those sockets (e.g. 127.0.0.1:25)
won't be unique across multiple hosts.


127.0.0.1 is a problem even after establishing the connection: With
normal routing arrangements the remote IP address will be 127.0.0.1,
too, so the only variable is the remote port.


Just to clarify, you are referring to SMTP connections from other 
processes local to the qpsmtpd server, i.e. connecting over loopback. 
Correct?


Yes, I can see that in that case 127.0.0.1:nnn:127.0.0.1:25 would not 
identify the host and would not be unique across multiple servers.


---
Charlie


Re: Transaction ID suggestions

2007-08-30 Thread Peter J. Holzer
On 2007-08-30 10:08:36 +0200, Peter J. Holzer wrote:
 Here are some (measured) resolutions of gettimeofday on various systems:
 
 Linux/i386:  1 ms
 Linux/SPARC: 2 ms
 HP-UX/PA-RISC:   2 ms
 Linux/Alpha:   976 ms (1024 Hz)
 
 Ok, so the Alpha is obsolete, and Sun and HP hardware seems to include a
 timer with reasonably high resolution (both systems are a bit old I'd
 expect newer gear get .

The sentence in the parentheses was supposed to read: both systems are
a bit old - I'd expect newer gear to get full microsecond resolution.
Don't know how I managed to garble it that badly.
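
For anyone who wants to repeat such a measurement, the following is one possible approach, sketched with Time::HiRes; it is an assumption about method, not the script used for the figures quoted above. It reports the smallest step observed between successive gettimeofday calls, which includes call overhead and is therefore only an upper bound on the timer resolution:

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Return the smallest positive difference (in seconds) seen between
# consecutive clock readings over $samples clock advances.
sub smallest_tick {
    my ($samples) = @_;
    my $min;
    my $prev = [gettimeofday];
    while ($samples > 0) {
        my $now   = [gettimeofday];
        my $delta = tv_interval($prev, $now);
        next if $delta <= 0;    # clock has not advanced yet
        $min = $delta if !defined $min || $delta < $min;
        $prev = $now;
        $samples--;
    }
    return $min;
}

printf "smallest observed step: %.6f s\n", smallest_tick(1000);
```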

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Thu, 2007-08-30 at 10:08 +0200, Peter J. Holzer wrote:
 On 2007-08-29 18:36:12 -0400, Guy Hulbert wrote:
[snip]
  Just assume that time() can have the granularity of the CPU instruction
  counter[1].
 
 It could (if your perl implementation uses 128 bit long doubles), but it

Or you could have gettimeofday return 3 ints (sec, nano-sec, atto-sec)
instead of 2.

 isn't guaranteed to have that. You have to plan for the worst case and
 that's probably a 60 Hz counter.

Or you can just warn that the transaction ID may be broken on some
systems.  Does it provide some critical internal function or is it just
for logging ?

Or we can provide some alternate hack.

 
 Here are some (measured) resolutions of gettimeofday on various systems:
 
 Linux/i386:  1 ms
 Linux/SPARC: 2 ms
 HP-UX/PA-RISC:   2 ms
 Linux/Alpha:   976 ms (1024 Hz)

'ms' is usually milli-seconds but it appears you mean micro-seconds ( I
pretend that u=mu and write it 'us' ).

The alpha is a problem then.  However, Time::HiRes seems to be over 10
years old ... are the alpha boxes still being sold ?



[snip]
  However, with a 16 bit PID and 65K processors you might run
  into collisions with the PID ...
 
 I don't see how that follows. The PID still has to be unique at any
 particular time. If a system can run more than 32k processes in parallel
 it must use a 32 bit PID. 

Doh.  Yeah, iirc it's been 32 bits on AIX since 1992.

[snip]
  but I doubt anyone has a connection machine to run qpsmtpd on.
  
  I think time() + PID is sufficient for now ... unless threads share
  the PID ...
 
 They do on most systems - but you could use the TID instead of the PID.

Yup.

 
  ( otoh, qpsmtpd is not even threaded is it ? ).
 
 It might be possible to run Apache::Qpsmtpd on a multithreaded Apache.

Unlikely.  PHP people still won't bless mt apache.  Postgres people
discovered a problem with crypt() - from libc - you must not use crypt()
passwords with Pg on mt apache (the problem is only seen with very high
loads though).

 
   hp

-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Thu, 2007-08-30 at 10:45 +0200, Tony L. Svanstrom wrote:
   Would this be a bad time to mention that people might get the idea that 
   they want to run two different setups of qpsmtpd on the same server?
  
  No that's fine. PID is still in there taking care of that.
 
  True, but the code makes both the security guy and the programmer in me
  twitch...

rotfl

 
  The part of the unique ID meant to identify the server is now focusing on
  the OS/computer instead of the instance of qpsmtpd; which one can only get
  away with as the PID is in the connection ID-part, and thus we shouldn't get
  more collisions just because we run more than one instance on the same server.
 
  However, this is not only (currently) an undocumented and somewhat unobvious 
 feature of the ID-generation, but it's also an unnecessary limitation.
  If people ever were to remove the PID, maybe as soon as at the end of this 
 discussion, they might not think about fixing the $SALT_HOST.

wtf does this mean - the *purpose* of the discussion is to *fix* a
*unique* transaction ID; when the discussion is over it is *fixed* and
the discussion *documents* the implementation.

What do you mean by "if people ever were to remove the PID"?  If you make
random changes to any piece of code it's going to break.

 
  Using the IPs + port ought to be the way to go.

Please clarify.

Given sufficient resolution in the time(), you cannot have two processes
on the CPU for the same transaction ID.  I thought there might be a
problem if you have multiple CPUs but Peter H. has pointed out that the
PIDs must be different in that case.

You need to have context-switching faster than the clock resolution for
collisions in time() -- Peter has shown that the clock resolution is
(close to) 1 us for all likely systems other than the alpha.


 

-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Matt Sergeant

On 30-Aug-07, at 4:45 AM, Tony L. Svanstrom wrote:

 True, but the code makes both the security guy and the programmer in me
twitch...


Well, don't think of it for security then :-)

 The part of the unique ID meant to identify the server is now focusing on the
OS/computer instead of the instance of qpsmtpd;


Not really. It uses a random salt. So every instance will be different.

Matt.


Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Thu, 2007-08-30 at 09:14 -0400, Matt Sergeant wrote:
   The part of the unique ID meant to identify the server is now  

Is this unique ID the transaction ID we've been discussing?

Has someone already implemented it in svn - I thought it was a new
proposal (I'm just a bit confused here) ?

  focusing on the
  OS/computer instead of the instance of qpsmtpd;
 
 Not really. It uses a random salt. So every instance will be
 different.

That is not true.  Random numbers do not give unique results.  Also,
hash functions have collisions.  This is not a problem when using a hash
in perl because there is a collision-resolution mechanism.  For the
requirement of logging multiple independent qpsmtpd servers to a central
point there is no trivial mechanism to compare the results of the hash
function so you must use a predictable function on something unique to
the server.

The IP address (for IPv6 the 32 most-significant bits would probably
work) is one choice.  However, I think it might be better to use a value
derived from config('me') but it cannot be a hash.  A suitable
non-random choice might be substr(config('me')) padded with '_' to a
fixed length.  Since the sysadmin has to configure qpsmtpd to use it,
he can make sure that his configurations will work together (if he
cares).

-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Matt Sergeant

On 30-Aug-07, at 9:34 AM, Guy Hulbert wrote:


On Thu, 2007-08-30 at 09:14 -0400, Matt Sergeant wrote:

 The part of the unique ID meant to identify the server is now


Is this unique ID the transaction ID we've been discussing.


Yes.


Has someone already implemented it in svn - I thought it was a new
proposal (I'm just a bit confused here) ?


Yes, it's in svn.


focusing on the
OS/computer instead of the instance of qpsmtpd;


Not really. It uses a random salt. So every instance will be
different.


That is not true.  Random numbers do not give unique results.


True enough. But I'm going out on a limb to assume that it's good  
enough for logging. It's not a security feature.


Matt.


Re: Transaction ID suggestions

2007-08-30 Thread Tony L. Svanstrom
On Thu, 30 Aug 2007 the voices made Guy Hulbert write:

GH wtf does this mean - the *purpose* of the discussion is to *fix* a
GH *unique* transaction ID when the discussion is over it is *fixed* and
GH the discussion *documents* the implementation.

 I meant "undocumented" as in: Transaction.pm currently says "Generate
unique id" without mentioning that the earlier defined $SALT_HOST relies on
certain aspects of the ID-generation, without which the $id might not be unique 
in cases where there's more than one instance of qpsmtpd running on a single 
server. (Or on two different servers with the same hostname, which isn't 
exactly unheard of; it happens both by mistake and by design, for instance if 
setting up a testserver... which you still might want to use with whatever 
centralized logging you've got.)

GH What do you mean people ever were to remove the PID ?  If you make
GH random changes to any piece of code it's going to break.

 Random changes yes, but as this discussion has clearly shown it isn't 
unreasonable to consider creating unique IDs without using the PID 
(incrementing counter etc); and it isn't unreasonable to view the transaction 
ID and the server ID as two separate things, which combined create a
(hopefully) universally unique ID. Even the (current) code structure reflects 
such thinking.

 To then use a server ID that I think everyone on this list can agree on has a 
lesser chance of being unique, esp. if minor changes are made to it, isn't as 
future/idiot-proof as it easily could be; and if it's easily done at least I 
prefer to write code that minimizes the chances that people will mess up when 
working with it.
 It's enough that someone removes the crypt+rand to easier search the logs for 
this solution (hostname-based) to theoretically start creating trouble/break 
(well, at least crack slightly in a corner or two).

GH   Using the IPs + port ought to be the way to go.
GH 
GH Please clarify.

 To qpsmtpd the hostname isn't as unique as the IPs + port used by it is.

 Actually, although IPs+port IMHO is better than hostname it was silly of me to
say that it "ought to be the way to go", as it doesn't deal with special-use
addresses well enough... but it'd be easy to catch those and do something like
create/output a warning.

 I think I'll exit the discussion here; you can battle it out among yourselves, 
and if I'm unhappy with the results I'll just show up with some code and 
restart the fire... ;-)

GH Given sufficient resolution in the time(), you cannot have two processes
GH on the CPU for the same transaction ID.  I thought there might be a
GH problem if you have multiple CPUs but Peter H. has pointed out that the
GH PIDs must be different in that case.
GH 
GH You need to have context-switching faster than the clock resolution for
GH collisions in time() -- Peter has shown that the clock resolution is
GH (close to) 1 us for all likely systems other than the alpha.

 Or the same hostname on a second server, which is something we shouldn't rule 
out...



/Tony
-- 
Generally speaking, taunting mentally unstable people is a bad idea.


Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Thu, 2007-08-30 at 10:01 -0400, Matt Sergeant wrote:
  That is not true.  Random numbers do not give unique results.
 
 True enough. But I'm going out on a limb to assume that it's good  
 enough for logging. It's not a security feature.

But this (by design[*]) doesn't meet the requirement.

The (ok, one) purpose of logging is to be able to trace the results of
running the service and if your hash collides ALL the messages from the
two servers where it collides will be ambiguous (by source).  Using a
non-random and predictable function on config('me') allows the user to
avoid this problem (without modifying the core code).

[*] As opposed to the implementation - where Peter has pointed out some
limitations of Time::HiRes on one old platform.

Thanks for the clarification on svn ... I'll have to check it out (but
not today) to see it.

 
 Matt.
-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Thu, 2007-08-30 at 16:07 +0200, Tony L. Svanstrom wrote:
  To qpsmtpd the hostname isn't as unique as the IPs + port used by it is.

But for qpsmtpd the hostname is configurable ( config('me') ).  As
long as a hash is not used (see my follow-up to Matt) and the function
used is documented, e.g. something like
sprintf("%_8s", substr(config('me'), 0, 8)), i.e. truncate to 8
characters and pad with '_', so:

me = linux1
-> linux1__

me = linux2.example.com
-> linux2.e

If you run two instances you can call them 'thing1' and 'thing2'.
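
A runnable version of that idea might look like the sketch below. Since sprintf has no '_' padding flag, it pads by string repetition instead; host_tag is a hypothetical helper name, and the value would come from config('me') in qpsmtpd rather than being passed in directly:

```perl
use strict;
use warnings;

# Truncate the configured name to $width characters, or pad it
# with '_' up to that width. Deterministic, so no collisions unless
# two servers share the same prefix.
sub host_tag {
    my ($me, $width) = @_;
    $width = 8 unless defined $width;
    return substr($me . '_' x $width, 0, $width);
}

print host_tag($_), "\n" for qw(linux1 linux2.example.com);
```

This prints linux1__ and linux2.e, matching the two examples above.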

-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Matt Sergeant

On 30-Aug-07, at 10:07 AM, Tony L. Svanstrom wrote:


On Thu, 30 Aug 2007 the voices made Guy Hulbert write:

GH wtf does this mean - the *purpose* of the discussion is to *fix* a
GH *unique* transaction ID when the discussion is over it is *fixed* and
GH the discussion *documents* the implementation.

 I meant undocumented as in it in Transaction.pm currently says Generate
unique id without mentioning that the earlier defined $SALT_HOST relies on
certain aspects of the ID-generation, without which the $id might not be unique
in cases where there's more than one instance of qpsmtpd running on a single
server.


Including PID takes care of that. And you're assuming a broken srand()
too.


Admittedly, there's a very very remote freak possibility that given  
two identical hostnames, a rand() with a broken srand(), and those  
servers starting at the exact same microsecond time with the exact  
same PID, that you MIGHT, just MAYBE, get a duplicate transaction id.


The alternative, which seems to me the only way to satisfy your
security-paranoid mind, is to use Data::UUID, which is an extra
dependency I don't want to add in.


Matt.


Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Thu, 2007-08-30 at 10:30 -0400, Matt Sergeant wrote:
 On 30-Aug-07, at 10:07 AM, Tony L. Svanstrom wrote:
 
  On Thu, 30 Aug 2007 the voices made Guy Hulbert write:
[snip]
  GH the discussion *documents* the implementation.
 
   I meant undocumented as in it in Transaction.pm currently says 

In principle, the documentation will be updated when the discussion is
complete.

  
  Generate
  unique id without mentioning that the earlier defined $SALT_HOST  
  relies on
  certain aspects of the ID-generation, without which the $id might  
  not be unique
  in cases where there's more than one instance of qpsmtpd running on  
  a single
  server.
 
 Including PID takes care of that. And you're assuming a broken srand()
 too.
 
 Admittedly, there's a very very remote freak possibility that given  
 two identical hostnames, a rand() with a broken srand(), and those  
 servers starting at the exact same microsecond time with the exact  
 same PID, that you MIGHT, just MAYBE, get a duplicate transaction id.

Nope.  I reject this.  The design ASSUMES that the clock has sufficient
resolution.  It is the implementation which chooses Time::HiRes.  There
are two perfect solutions (bikesheds ;-):

1. Use a timer based directly on the values in the instruction count
register.  IIRC, the linux kernel clock (at least on intel) just
quantizes this in either micro- or nano- seconds. [bikeshed = kernel
patch]

2. Implement our own clock using a sequence generator, which reads the
last value out of the tail of the log on startup (and is
thread/async-safe).

I think that using PID is a bit of a hack but it seems to work in every
case that anyone has come up with.  It should be changed to TID, should
qpsmtpd ever be blessed as thread-safe but I'm not holding my breath
for that to happen ;-) ... besides, async is a much better choice
(compare lighttpd with apache).

 
 The alternative seems to me the only way to satisfy your security  
 paranoid mind is to use Data::UUID, which is an extra dependency I 

I think the use of the adjective security in this context is rather
generous.

  
 don't want to add in.
 
 Matt.

-- 
--gh




Re: [Fwd: Re: Transaction ID suggestions]

2007-08-30 Thread Ask Bjørn Hansen


On Aug 26, 2007, at 10:02, Matt Sergeant wrote:


On 25-Aug-07, at 8:37 PM, Guy Hulbert wrote:

The mod_uniqueid module in apache has quite a reasonable  
implementation.


There is a perl implementation on CPAN (in my directory).


I'm assuming Ask is referring to Apache::Usertrack, which does this:


Hmn, I did - but that's not what I had in mind.  I mixed up  
mod_usertrack and mod_unique_id in my head.


From mod_unique_id.c (in Apache):

/* Comments:
 *
 * We want an identifier which is unique across all hits, everywhere.
 * everywhere includes multiple httpd instances on the same machine, or on
 * multiple machines.  Essentially everywhere should include all possible
 * httpds across all servers at a particular site.  We make some assumptions
 * that if the site has a cluster of machines then their time is relatively
 * synchronized.  We also assume that the first address returned by a
 * gethostbyname (gethostname()) is unique across all the machines at the
 * site.
 *
 * We also further assume that pids fit in 32-bits.  If something uses more
 * than 32-bits, the fix is trivial, but it requires the unrolled uuencoding
 * loop to be extended.  A similar fix is needed to support multithreaded
 * servers, using a pid/tid combo.
 *
 * Together, the in_addr and pid are assumed to absolutely uniquely identify
 * this one child from all other currently running children on all servers
 * (including this physical server if it is running multiple httpds) from each
 * other.
 *
 * The stamp and counter are used to distinguish all hits for a particular
 * (in_addr,pid) pair.  The stamp is updated using r->request_time,
 * saving cpu cycles.  The counter is never reset, and is used to permit up to
 * 64k requests in a single second by a single child.
 *
 * The 112-bits of unique_id_rec are encoded using the alphabet
 * [A-Za-z0-9@-], resulting in 19 bytes of printable characters.  That is then
 * stuffed into the environment variable UNIQUE_ID so that it is available to
 * other modules.  The alphabet choice differs from normal base64 encoding
 * [A-Za-z0-9+/] because + and / are special characters in URLs and we want to
 * make it easy to use UNIQUE_ID in URLs.
 *
 * Note that UNIQUE_ID should be considered an opaque token by other
 * applications.  No attempt should be made to dissect its internal components.
 * It is an abstraction that may change in the future as the needs of this
 * module change.
 *
 * It is highly desirable that identifiers exist for eternity.  But future
 * needs (such as much faster webservers, moving to 64-bit pids, or moving to a
 * multithreaded server) may dictate a need to change the contents of
 * unique_id_rec.  Such a future implementation should ensure that the first
 * field is still a time_t stamp.  By doing that, it is possible for a site to
 * have a flag second in which they stop all of their old-format servers,
 * wait one entire second, and then start all of their new-servers.  This
 * procedure will ensure that the new space of identifiers is completely unique
 * from the old space.  (Since the first four unencoded bytes always differ.)
 */
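
A rough Perl sketch of the scheme this comment describes, not a port of the Apache code. It packs a 32-bit stamp, a 32-bit IPv4 address, a 32-bit pid and a 16-bit counter into 112 bits and base64-encodes them. The function name unique_id is hypothetical, and the swap of + and / for @ and - is an assumption about the URL-safe alphabet.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);
use Socket qw(inet_aton);

my $counter = 0;    # per-process, never reset; only the low 16 bits are used

# Hypothetical helper: $ip is this host's address as a dotted quad,
# $now is an optional epoch timestamp (defaults to the current time).
sub unique_id {
    my ($ip, $now) = @_;
    $now = time unless defined $now;
    my $rec = pack('N a4 N n',
        $now,                   # 32-bit stamp, first field as in the comment
        inet_aton($ip),         # 32-bit IPv4 address
        $$ & 0xFFFFFFFF,        # pid, assumed to fit in 32 bits
        $counter++ & 0xFFFF);   # 16-bit counter
    my $id = encode_base64($rec, '');   # 14 bytes become 20 chars with one '='
    $id =~ s/=+\z//;                    # drop the padding, leaving 19 chars
    $id =~ tr{+/}{@-};                  # assumed URL-friendly alphabet
    return $id;
}

print unique_id('192.0.2.10'), "\n";
```

As in the original, two IDs from the same process can only collide if the counter wraps within a single second, i.e. after more than 64k IDs per second per child.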



  - ask

--
http://develooper.com/ - http://askask.com/




Re: Transaction ID suggestions

2007-08-30 Thread Ask Bjørn Hansen

Woah - bikeshedding galore!

I just got my email downloaded to my mac (I'm traveling) and Mail.app  
says there are 61 mails in this thread (plus those I deleted  
earlier!?!).


Enough already.

If anyone has a serious realistic concern with what Matt did, please  
provide a perl implementation of mod_unique_id from Apache -  
otherwise then let's leave this alone for now.



 - ask


Re: Transaction ID suggestions

2007-08-30 Thread JT Moree
Guy Hulbert wrote:

 me = linux1
   -> linux1__
 
 me = linux2.example.com
   -> linux2.e
 
 If you run two instances you can call them 'thing1' and 'thing2'.
 
I'd rather not.

-- 
JT Moree


Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
On Fri, 2007-08-31 at 00:59 +0800, Ask Bjørn Hansen wrote:
 Woah - bikeshedding galore!
 
 I just got my email downloaded to my mac (I'm traveling) and Mail.app  
 says there are 61 mails in this thread (plus those I deleted  
 earlier!?!).
 
 Enough already.

There might have been a little less chat if he'd posted the code to the
list ... fwiw, here it is.

 
 If anyone has a serious realistic concern with what Matt did, please  

http://svn.perl.org/qpsmtpd/trunk/lib/Qpsmtpd/Transaction.pm

  # Generate unique id
  # use gettimeofday for microsec precision
  # add in rand() in case gettimeofday clock is slow (e.g. bsd?)
  # add in $$ in case srand is set per process
  my ($start, $mstart) = gettimeofday();
  my $id = sprintf("%d.%06d.%s.%d.%d",
  $start,
  $mstart,
  $SALT_HOST, 
  rand(1),
  $$,
  );



 provide a perl implementation of mod_unique_id from Apache -  
 otherwise then let's leave this alone for now.
 
 
   - ask

-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Guy Hulbert
Ask asked us to stop ... but what the heck ;-).

Perhaps we should drop the list after this one though.

On Thu, 2007-08-30 at 14:19 -0400, Matt Sergeant wrote:
 On 30-Aug-07, at 10:57 AM, Guy Hulbert wrote:
 
  Nope.  I reject this.  The design ASSUMES that the clock has  
  sufficient
  resolution.  It is the implementation which chooses Time::HiRes.
 
 Fine, so on Alpha, you have a qpsmtpd installation that is using  

First, what I'm saying, is that I don't think we should be particularly
worried about an almost obsolete platform.  Also, I am quite happy with
whatever you decide as long as it reflects the requirements that
everyone has requested (which it seems to do).

 async and doing more than 1000 mails/second? And given that it has  
 rand(1) in there, you also need a rand() collision in that

However.

Nope.

The problem with random number generators is that their output is
*random*.  That means that you will occasionally get results very close
together and when you quantize it (e.g. rand(10)) it means you will
get the same number consecutively.  This is exactly what you do not want
when your problem is insufficiently resolved times.  You'd be better off
using a block-cipher (e.g. DES) which scatters results *uniformly*.

But either case is a hack so rand() will do since it's available.

Actually, I think the right answer is just a sequence generator (mod
1).  That guarantees different consecutive results.  In python you
could just use an iterator ... I'm not sure about perl.
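
A small sketch of the point about quantized random values versus a sequence. The helper name consecutive_repeats, the modulus of 10 and the sample size are all illustrative assumptions.

```perl
use strict;
use warnings;

# Count how often two neighbouring values in a list are equal.
sub consecutive_repeats {
    my @values = @_;
    my $repeats = 0;
    for my $i (1 .. $#values) {
        $repeats++ if $values[$i] == $values[$i - 1];
    }
    return $repeats;
}

my $n = 10_000;
my @random   = map { int rand 10 } (1 .. $n);   # quantized random values
my @sequence = map { $_ % 10 }     (1 .. $n);   # counter modulo 10

printf "rand:     %d consecutive repeats\n", consecutive_repeats(@random);
printf "sequence: %d consecutive repeats\n", consecutive_repeats(@sequence);
```

The counter never produces the same value twice in a row, whereas the quantized random stream is expected to repeat roughly one time in ten.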

Have you read Knuth on random number generators ?  It's quite amusing.

   
 millisecond. You're reaching for a problem.
 On normal platforms the minimum granularity is on the order of 1  
 billion mails/sec. Let me know when you're building the single CPU  
 system that can do that, I'd like to buy one.
 
 Note that mod_unique_id is only designed for 64k hits/sec.
-- 
--gh




Re: Transaction ID suggestions

2007-08-30 Thread Matt Sergeant

On 30-Aug-07, at 2:52 PM, Guy Hulbert wrote:


Actually, I think the right answer is just a sequence generator (mod
1).  That guarantees different consecutive results.


I think so too. In my testing perl only switches to floating point at  
or around 2**50 on 32 bit platforms, which should allow enough email  
between restarts for even the fastest mail systems on the planet.


Consider rand() gone and a sequence used instead.

Matt.
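
A minimal sketch of what "a sequence instead of rand()" could look like. This is not the code that went into svn; the function name next_transaction_id and the $host_tag argument (standing in for $SALT_HOST) are assumptions.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

my $SEQUENCE = 0;    # per-process counter, restarts at 0 with the process

# Hypothetical replacement for the rand() field: a monotonically
# increasing per-process sequence number.
sub next_transaction_id {
    my ($host_tag) = @_;
    my ($sec, $usec) = gettimeofday();
    return sprintf("%d.%06d.%s.%d.%d",
        $sec, $usec, $host_tag, $$, $SEQUENCE++);
}

print next_transaction_id('mx1'), "\n";
print next_transaction_id('mx1'), "\n";
```

Even if the clock does not advance between two calls, the IDs differ in the last field, and the pid keeps processes on the same host apart.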


Re: Transaction ID suggestions

2007-08-30 Thread Peter J. Holzer
On 2007-08-29 17:50:28 -0400, Charlie Brady wrote:
 A four-tuple identifying the TCP connection also identifies the server.

Right. And the tuple must not be reused for some time (2*MSL or 4 minutes
according to RFC 793), so you don't even need a high resolution timer. 

However, what if there is no TCP connection yet? For example, in
forkserver, the plugins are loaded before the first connection is
accepted and you want to log a failure to load one of them (or the
plugins may want to log something in their register method). You could
just fill the remote part with zeros, but you can have multiple
processes listening on the same port and you can't distinguish them in
this case.

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-30 Thread Peter J. Holzer
On 2007-08-30 07:07:51 -0400, Guy Hulbert wrote:
 On Thu, 2007-08-30 at 10:08 +0200, Peter J. Holzer wrote:
  On 2007-08-29 18:36:12 -0400, Guy Hulbert wrote:
  Here are some (measured) resolutions of gettimeofday on various systems:
  
  Linux/i386:  1 ms
  Linux/SPARC: 2 ms
  HP-UX/PA-RISC:   2 ms
  Linux/Alpha:   976 ms (1024 Hz)
 
 'ms' is usually milli-seconds but it appears you mean micro-seconds ( I
 pretend that u=mu and write it 'us' ).

Fortunately I am using a German keyboard so I can claim an AltGr
key malfunction ;-) (AltGr+m = µ)


   ( otoh, qpsmtpd is not even threaded is it ? ).
  
  It might be possible to run Apache::Qpsmtpd on a multithreaded Apache.
 
 Unlikely.  PHP people still won't bless mt apache.

But mod_perl people do, AFAIK.

 Postgres people discovered a problem with crypt() - from libc

Interesting. This bug has been known for a long time (Rasmus Lerdorf
wrote 2004 that he tracked it down a couple of years ago), yet crypt
in the glibc still isn't threadsafe even though that should be very easy
to fix.  Obviously few people invoke crypt in multithreaded programs.

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-30 Thread Charlie Brady


On Thu, 30 Aug 2007, Peter J. Holzer wrote:


On 2007-08-29 17:50:28 -0400, Charlie Brady wrote:

A four-tuple identifying the TCP connection also identifies the server.


Right. And the tuple must not be reused for some time (2*MSL or 4 minutes
according to RFC 793), so you don't even need a high resolution timer.


Indeed.


However, what if there is no TCP connection yet? For example, in
forkserver, the plugins are loaded before the first connection is
accepted and you want to log a failure to load one of them (or the
plugins may want to log something in their register method).


I consider that to be a different issue. Log messages at that stage aren't 
related to and don't need to be correlated with an email message.


You could just fill the remote part with zeros, but you can have 
multiple processes listening on the same port and you can't distinguish 
them in this case.


You can't have multiple processes bound to the same 
local_IP/local_port, so you could distinguish hosts and processes by 
filling in the local part of the four-tuple. There's still an edge case 
where multiple processes are started with the same local port 
configuration, all but one of which will fail. Do we really ever expect to 
be merging logs from such errant processes?
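
A sketch of the four-tuple idea, with the remote half zero-filled before any connection exists. The helper name tuple_id, its named arguments and the separator characters are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: build an ID from the TCP four-tuple. Before a
# connection is accepted the remote half is unknown, so it is zero-filled
# and only the local half distinguishes listeners.
sub tuple_id {
    my (%c) = @_;
    return sprintf('%s:%d-%s:%d',
        $c{local_ip},
        $c{local_port},
        defined $c{remote_ip}   ? $c{remote_ip}   : '0.0.0.0',
        defined $c{remote_port} ? $c{remote_port} : 0);
}

print tuple_id(local_ip => '192.0.2.10', local_port => 25), "\n";
print tuple_id(local_ip => '192.0.2.10', local_port => 25,
               remote_ip => '192.0.2.1', remote_port => 40000), "\n";
```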


Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
On Tue, 2007-08-28 at 23:04 -0400, Charlie Brady wrote:
  On 28-Aug-07, at 3:51 PM, JT Moree wrote:
  hires_time.pid.local_port
 ...
 my $conn = $args{connection};
 my $ip = $conn->local_port || 0;
 my $start = time;
 my $id = $start.$$.$ip;
 
   Some people have suggested adding the remote IP address.  I'm
 curious
   why use local port instead of remote port?  would both be better?
 
  Err, actually I had a brain fart. It should be remote_port.
 
 No, it should be remote_IP.remote_port.local_port and should include
 a 
 transaction_within_connection count. I don't think that pid adds
 anything.
 
 
 
 

This does not guarantee a unique message ID.  That's why we are using
hi_res time.


-- 
--gh





Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

On 28-Aug-07, at 11:04 PM, Charlie Brady wrote:


Err, actually I had a brain fart. It should be remote_port.


No, it should be remote_IP.remote_port.local_port and should  
include a transaction_within_connection count. I don't think that  
pid adds anything.


Please try any way you can to get the algorithm I've used to generate  
a duplicate transaction id. Feel free to use your fastest hardware.


I've tried, and cannot conceive of any way to get a repeat with this  
algorithm. Perhaps in 30 years maybe (when computers are that fast),  
but for now it works well.


Matt.


Re: Transaction ID suggestions

2007-08-29 Thread Chris Garrigues
 From:  Charlie Brady [EMAIL PROTECTED]
 Date:  Tue, 28 Aug 2007 23:04:56 -0400 (EDT)

 No, it should be remote_IP.remote_port.local_port and should include a 
 transaction_within_connection count. I don't think that pid adds anything.

Isn't localport always 25?

Chris

-- 
Chris Garrigues Trinsic Solutions
President   710-B West 14th Street
Austin, TX  78701-1798
http://www.trinsics.com/blog
http://www.trinsics.com 512-322-0180

 Would you rather proactively pay for
uptime or reactively pay for downtime?

  Trinsic Solutions
Your Trusted Friends in Proactive IT.






Re: Transaction ID suggestions

2007-08-29 Thread Jens Weibler
Chris Garrigues wrote:
 From:  Charlie Brady [EMAIL PROTECTED]
 Date:  Tue, 28 Aug 2007 23:04:56 -0400 (EDT)

 No, it should be remote_IP.remote_port.local_port and should include a 
 transaction_within_connection count. I don't think that pid adds anything.
 

 Isn't localport always 25?
   
Most of the time: yes.
But it can also be 465

-- 
Jens






Re: Transaction ID suggestions

2007-08-29 Thread Johan Almqvist

Charlie Brady wrote:


On Tue, 28 Aug 2007, Matt Sergeant wrote:


On 28-Aug-07, at 3:51 PM, JT Moree wrote:

hires_time.pid.local_port

...

   my $conn = $args{connection};
   my $ip = $conn->local_port || 0;
   my $start = time;
   my $id = $start.$$.$ip;

 Some people have suggested adding the remote IP address.  I'm curious
 why use local port instead of remote port?  would both be better?


Err, actually I had a brain fart. It should be remote_port.


No, it should be remote_IP.remote_port.local_port and should include a 
transaction_within_connection count. I don't think that pid adds anything.


You could still have a machine with several IPs / interfaces, so 
remote_IP.remote_port.local_port.transaction_within_connection is not 
enough either.


-Johan


Re: Transaction ID suggestions

2007-08-29 Thread m. allan noah
On 8/29/07, JT Moree [EMAIL PROTECTED] wrote:
 Given that we are still disagreeing on what is the best way to do it;
 Can we use all information used so far to get the most unique possible
 for now?  Even if it's not perfect, it's a start.  Even if some of the
 information seems extraneous to some people (and may be) it's still
 better than nothing.

 Short of using UUID i'd say doing something like this.  I've tried to
 put the order of information from most static to most dynamic.

 Using Time::HiRes

 my $ip = $conn->remote_ip;
 my $rport = $conn->remote_port || 0;
 my $lport = $conn->local_port || 0;
 my $start = time;
 my $id = $$ . "_${start}.${lport}_${ip}:${rport}";

 --
 JT Moree


if you want to be paranoid, you have to have all 4 data points from
the connection- local port/ip, and remote port/ip, plus local boxes'
time with high granularity. if you re-gen '$start' with each
transaction within the connection, you dont need a per-connection
counter, provided that your time is fine enough to prevent collisions.

If you leave out any of the local info, an installation with two
servers with un-synced times could still gen the same id. if you add
it, then the only way you could have a collision is if your time is
not granular enough or gets set back.

tcp sequence numbers can also be useful here as a replacement for
time, but might be hard to get within perl?

allan

-- 
The truth is an offense, but not a sin


Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
On Wed, 2007-08-29 at 11:53 -0400, m. allan noah wrote:
 On 8/29/07, JT Moree [EMAIL PROTECTED] wrote:
  Given that we are still disagreeing on what is the best way to do it;
  Can we use all information used so far to get the most unique possible
  for now?  Even if it's not perfect, it's a start.  Even if some of the
  information seems extraneous to some people (and may be) it's still
  better than nothing.
 
  Short of using UUID i'd say doing something like this.  I've tried to
  put the order of information from most static to most dynamic.
 
  Using Time::HiRes

i.e.

use Time::HiRes qw(time);

 
  my $ip = $conn->remote_ip;
  my $rport = $conn->remote_port || 0;
  my $lport = $conn->local_port || 0;
  my $start = time;
  my $id = $$ . "_${start}.${lport}_${ip}:${rport}";
 
  --
  JT Moree
 
 
 if you want to be paranoid, you have to have all 4 data points from

Why is there all this confusion about security ?  The goal is to have
a unique MessageID for logs ... 

[snip]
 tcp sequence numbers can also be useful here as a replacement for

I doubt it very much.  TCP sequence numbers have a history of poor
implementation.

 time, but might be hard to get within perl?
 
 allan

-- 
--gh




Re: Transaction ID suggestions

2007-08-29 Thread Michael Holzt
  Isn't localport always 25?
 Most of the time: yes.
 But it can also be 465

Also port 587 (message submission as per RFC2476).


Regards
Michael

-- 
It's an insane world, but i'm proud to be a part of it. -- Bill Hicks


Re: Transaction ID suggestions

2007-08-29 Thread JT Moree
 If you leave out any of the local info, an installation with two
 servers with un-synced times could still gen the same id. if you add
 it, then the only way you could have a collision is if your time is
 not granular enough or gets set back.

I'm ok with that

 Using Time::HiRes

my $lip = $conn->local_ip();
my $rip = $conn->remote_ip();
my $rport = $conn->remote_port || 0;
my $lport = $conn->local_port || 0;
my $start = time;
my $id = $$ . "_${start}_${lip}:${lport}_${rip}:${rport}";
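
A self-contained version of the same idea that can be run outside qpsmtpd. StubConnection is a hypothetical stand-in that models only the four accessors used in this thread, and connection_id is an illustrative name.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Stand-in for a connection object; hypothetical, for illustration only.
package StubConnection;
sub new         { my ($class, %args) = @_; return bless { %args }, $class }
sub local_ip    { $_[0]{local_ip} }
sub local_port  { $_[0]{local_port} }
sub remote_ip   { $_[0]{remote_ip} }
sub remote_port { $_[0]{remote_port} }

package main;

# pid, high-resolution time, then the local and remote address:port pairs,
# ordered from most static to most dynamic as suggested above.
sub connection_id {
    my ($conn) = @_;
    return sprintf('%d_%.6f_%s:%d_%s:%d',
        $$, time,
        $conn->local_ip,  $conn->local_port  || 0,
        $conn->remote_ip, $conn->remote_port || 0);
}

my $conn = StubConnection->new(
    local_ip  => '192.0.2.10', local_port  => 25,
    remote_ip => '192.0.2.1',  remote_port => 40000,
);
print connection_id($conn), "\n";
```

Using sprintf avoids the interpolation pitfalls of a string like "$$_$start", where Perl would try to read $$_ and $lport_ as variables.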

-- 
JT Moree


Re: Transaction ID suggestions

2007-08-29 Thread Tony L. Svanstrom
On Wed, 29 Aug 2007 the voices made Guy Hulbert write:

GH Why is there all this confusion about security ?  The goal is to have
GH a unique MessageID for logs ... 

 Then forget about the word security, and let's just say that people might 
want to have unique IDs that'll be unique even when they've got more than one 
server and centralized/aggregated logging... But we're not even there right 
now, we are still stuck on how to make the IDs 100% unique within a single 
server as it might be setup by any qpsmtpd-user.



/Tony
-- 
Generally speaking, taunting mentally unstable people is a bad idea.


Re: Transaction ID suggestions

2007-08-29 Thread m. allan noah
On 8/29/07, Guy Hulbert [EMAIL PROTECTED] wrote:
  if you want to be paranoid, you have to have all 4 data points from

 Why is there all this confusion about security ?  The goal is to have
 a unique MessageID for logs ...

i never said security. i said paranoid, specifically about collisions.

allan

-- 
The truth is an offense, but not a sin


Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
On Wed, 2007-08-29 at 12:23 -0400, m. allan noah wrote:
 On 8/29/07, Guy Hulbert [EMAIL PROTECTED] wrote:
   if you want to be paranoid, you have to have all 4 data points from
 
  Why is there all this confusion about security ?  The goal is to have
  a unique MessageID for logs ...
 
 i never said security. i said paranoid, specifically about collisions.

If the message ID is unique there will be no collisions.  So I
interpreted your paranoia ... my bad.

 
 allan

-- 
--gh




Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
On Wed, 2007-08-29 at 18:15 +0200, Tony L. Svanstrom wrote:
 On Wed, 29 Aug 2007 the voices made Guy Hulbert write:
 
 GH Why is there all this confusion about security ?  The goal is to have
 GH a unique MessageID for logs ... 
 
  Then forget about the word security, and let's just say that people might 
 want to have unique IDs that'll be unique even when they've got more than one 
 server and centralized/aggregated logging... But we're not even there right 
 now, we are still stuck on how to make the IDs 100% unique within a single 
 server as it might be setup by any qpsmtpd-user.

There have been several adequate suggestions.  This is only a problem if
it goes into the qpsmtpd core since some of the suggestions are reported
to be in use already.

Perhaps it would help to agree on a list of requirements.  From what I
can remember these are:

1. A unique ID per message (on one server).
2. Ability to distinguish per recipient.
3. Ability to identify the server.

A sequence solves (1) except for simultaneous processing of
incoming messages via:

a) async
b) threads/multiple cpus
c) local ports (possibly on multiple addresses)

Except with multiple CPUs, time with sufficient resolution is a
satisfactory replacement for a sequence.

It may be useful to log things like remote_port but it doesn't seem to
help directly to solve problem 1.

A counter solves 2.

Any tag which is unique per server solves 3.  It is probably simpler to
make this configurable by the end-user.

 
 
 
   /Tony
-- 
--gh




Re: Transaction ID suggestions

2007-08-29 Thread David Sparks
A UUID is preferable to the other solutions because you can condense it
down to 128 bits of binary data ... and put it in a database. :)

The other solutions are not as database friendly.  It seems to me if
we're trying to solve the problem of guaranteeing unique transaction ids
for extremely high volume sites, then we should make sure that the
transaction id itself is high volume friendly.

Cheers,

ds


Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
On Wed, 2007-08-29 at 10:14 -0700, David Sparks wrote:
 A UUID is preferable to the other solutions because you can condense it
 down to 128 bits of binary data ... and put it in a database. :)

A Time::HiRes timestamp is 64 bits ... leaving 64 bits for the server tag.
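
A sketch of that 128-bit layout, assuming 32 bits each for seconds and microseconds and an 8-byte server tag. The function name packed_id is hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Pack seconds and microseconds (32 bits each) plus an 8-byte server tag
# into 16 bytes; the 'a8' template truncates or NUL-pads the tag.
sub packed_id {
    my ($tag) = @_;
    my ($sec, $usec) = gettimeofday();
    return pack('N N a8', $sec, $usec, $tag);
}

my $id = packed_id('linux1');
print length($id) * 8, " bits: ", unpack('H*', $id), "\n";
```

On its own this is only unique per server as far as the clock resolution allows, so a pid or counter would have to share the 64-bit tag space.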

 
 The other solutions are not as database friendly.  It seems to me if
 we're trying to solve the problem of guaranteeing unique transaction ids
 for extremely high volume sites, then we should make sure that the
 transaction id itself is high volume friendly.
 
 Cheers
-- 
--gh




Re: Transaction ID suggestions

2007-08-29 Thread JT Moree
Guy Hulbert wrote:
 There have been several adequate suggestions.  This is only a problem if
 it goes into the qpsmtpd core since some of the suggestions are reported
 to be in use already.
how is this a problem.  those uses should still work even if we start
with the same variable because they would overwrite what is in core.
The plugin maintainers can update as they have time.

good idea about the requirements.

   1. A unique ID per message (on one server).
   2. Ability to distinguish per recipient.
   3. Ability to identify the server.

2) per recipient or per message?  I don't see a way to make an id per
recipient since any message can have multiple recipients.

3) which server are we talking about?

 A sequence solves (1) except for simultaneous processing of
snip
 A counter solves 2.
 
 Any tag which is unique per server solves 3.  It is probably simpler to
 make this configurable by the end-user.

if A solves 1, B solves 2, and C solves 3 then A+B+C should solve all
three and it's pretty simple to do so let's just do it.

While letting the end user make changes is nice it defeats the purpose
of putting a transaction ID into core where everyone can know and rely
on it working the same way.

-- 
JT Moree


Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

Tony L. Svanstrom wrote:
 Then forget about the word security, and let's just say that people might 
want to have unique IDs that'll be unique even when they've got more than one 
server and centralized/aggregated logging... But we're not even there right 
now, we are still stuck on how to make the IDs 100% unique within a single 
server as it might be setup by any qpsmtpd-user.


No, that much works, as far as I've been able to prove. It's just a 
bunch of bikeshed painting going on now :-)


I'd be happy to add a quick hash of the server in.

Matt.


Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
On Wed, 2007-08-29 at 11:16 -0700, JT Moree wrote:
 Guy Hulbert wrote:
  There have been several adequate suggestions.  This is only a problem if
  it goes into the qpsmtpd core since some of the suggestions are reported
  to be in use already.
 how is this a problem.  those uses should still work even if we start

I think you answered this at the end.

 with the same variable because they would overwrite what is in core.
 The plugin maintainers can update as they have time.
 
 good idea about the requirements.

Well if people restrict their input to the requirements it simplifies
things.

 
  1. A unique ID per message (on one server).
  2. Ability to distinguish per recipient.
  3. Ability to identify the server.
4. Well-defined format (e.g. UUID).
 
 2) per recipient or per message?  I don't see a way to make an id per
 recipient since any message can have multiple recipients.

There was a suggestion way back in the thread that this was required.  I
don't really know if it is required but it has been mentioned more than
once (by people besides me).

 
 3) which server are we talking about?

If you use syslog you can have all your logs in one place but if you are
running multiple mail servers then you might want to know which server
is responsible for a particular message ID.

 
  A sequence solves (1) except for simultaneous processing of
 snip
  A counter solves 2.
  
  Any tag which is unique per server solves 3.  It is probably simpler to
  make this configurable by the end-user.
 
 if A solves 1, B solves 2, and C solves 3 then A+B+C should solve all
 three and it's pretty simple to do so let's just do it.

I would just use either what Matt Sergeant is using 

http://www.nntp.perl.org/group/perl.qpsmtpd/2007/08/msg7116.html

Yeah, we use HiRes::time() . "." . $$ and we don't get any file stomping  
(and we're doing millions of emails/day).

or something like a UUID ... 

Here is an old UUID I have lying around:
f9c31c2d-b3fb-0310-82b0-c4cdd2013627
so we can make it look something like that.

use Time::HiRes qw( gettimeofday );
printf("%08x-%08x-%04x\n", gettimeofday, $$);

46d5cf96-45e3-3348

This, at least looks a bit like a UUID and can be extended with -%04x
formatted pieces.  As long as (2) and (3) are not needed, we are done.
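
A sketch of the extensible form, where extra integers (for example a transaction or recipient counter) are appended as further -%04x groups. The function name hexish_id is hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Hypothetical helper: seconds, microseconds and pid as hex groups, with
# any extra integers appended as further -%04x groups.
sub hexish_id {
    my @extra = @_;
    my ($sec, $usec) = gettimeofday();
    return join '-',
        sprintf('%08x', $sec),
        sprintf('%08x', $usec),
        sprintf('%04x', $$),
        map { sprintf '%04x', $_ } @extra;
}

print hexish_id(), "\n";        # time and pid only
print hexish_id(1, 2), "\n";    # with two extra counter fields
```

Note that %04x is a minimum width, so a pid above 65535 simply produces a longer group.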

I have to run now ...

 While letting the end user make changes is nice it defeats the purpose
 of putting a transaction ID into core where everyone can know and rely
 on it working the same way.

 
-- 
--gh




Re: Transaction ID suggestions

2007-08-29 Thread Peter J. Holzer
On 2007-08-29 13:07:06 -0400, Guy Hulbert wrote:
 On Wed, 2007-08-29 at 18:15 +0200, Tony L. Svanstrom wrote:
  On Wed, 29 Aug 2007 the voices made Guy Hulbert write:
  
  GH Why is there all this confusion about security ?  The goal is to have
  GH a unique MessageID for logs ... 
  
   Then forget about the word security, and let's just say that people might
  want to have unique IDs that'll be unique even when they've got more than one
  server and centralized/aggregated logging... But we're not even there right
  now, we are still stuck on how to make the IDs 100% unique within a single
  server as it might be setup by any qpsmtpd-user.
 
 There have been several adequate suggestions.  This is only a problem if
 it goes into the qpsmtpd core since some of the suggestions are reported
 to be in use already.
 
 Perhaps it would help to agree on a list of requirements.  From what I
 can remember these are:
 
   1. A unique ID per message (on one server).

I'd rephrase that as unique ID per transaction. Not every transaction
results in a message (indeed, on my systems 90+% of transactions don't
result in a message).


   2. Ability to distinguish per recipient.

I'm not even sure what per recipient should mean here. Does it mean
per RCPT command, so that a log file looks something like this:

abcdef.0 Accepted connection 1/15 from 192.0.2.1 /foo.example.com
abcdef.0 check_earlytalker plugin: remote host said nothing spontaneous, proceeding
abcdef.0 220 ns1.hjp.at ESMTP qpsmtpd 0.40 ready; send us your mail, but not your spam.
abcdef.0 dispatching EHLO foo.example.com
abcdef.0 250-ns1.hjp.at Hi foo.example.com [192.0.2.1]
abcdef.0 250-PIPELINING
abcdef.0 250-8BITMIME
abcdef.0 250 STARTTLS
abcdef.0 dispatching MAIL FROM:[EMAIL PROTECTED]
abcdef.0 from email address : [[EMAIL PROTECTED]]
abcdef.0 Plugin check_badmailfrom, hook mail returned DECLINED
abcdef.0 250 [EMAIL PROTECTED], sender OK - how exciting to get mail from you!
abcdef.1 dispatching RCPT TO:[EMAIL PROTECTED]
abcdef.1 to email address : [[EMAIL PROTECTED]]
abcdef.1 Plugin aliases_check, hook rcpt returned DECLINED,
abcdef.1 Plugin spamhaus, hook rcpt returned DECLINED,
abcdef.1 250 [EMAIL PROTECTED], recipient ok
abcdef.2 dispatching RCPT TO:[EMAIL PROTECTED]
abcdef.2 to email address : [[EMAIL PROTECTED]]
abcdef.2 Plugin aliases_check, hook rcpt returned DECLINED,
abcdef.2 Plugin spamhaus, hook rcpt returned DECLINED,
abcdef.2 250 [EMAIL PROTECTED], recipient ok
abcdef.0 dispatching DATA
...

or really distinguish recipients? The latter doesn't make much sense to
me (before the first RCPT there are 0 recipients, and after the second
(successful) RCPT there is more than one, so there are a lot of cases
where this is ambiguous). As for the former, I don't see that much use in
it, either. Grouping lines from "dispatching ..." to the response
together seems easy enough, and if you find that hard for some reason,
it doesn't apply only to recipients - you might want a command counter.


   3. Ability to identify the server.

4. Ability to identify the connection.

   A connection can contain several transactions, and I would not
   like to lose the information that two log entries are from
   the same connection.

If we want transaction (and possibly command) ids, I would derive them
from the connection id via simple counters:

$transaction_id = $connection_id.$transaction_counter
$command_id = $transaction_id.$command_counter

where the counters are local to their parent and start at 0.
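
A minimal sketch of this counter scheme in Perl. The child_id() helper and
the %next_child hash are hypothetical names for illustration only; they are
not part of qpsmtpd.

```perl
use strict;
use warnings;

# Hypothetical helper (not qpsmtpd code): returns "$parent_id.$counter",
# keeping one counter per parent id, each starting at 0.
my %next_child;

sub child_id {
    my ($parent_id) = @_;
    my $n = $next_child{$parent_id}++;   # post-increment of undef yields 0
    return "$parent_id.$n";
}

my $connection_id = 'abcdef';
my $t0 = child_id($connection_id);   # first transaction on the connection
my $t1 = child_id($connection_id);   # second transaction
my $c0 = child_id($t1);              # first command in the second transaction
my $c1 = child_id($t1);              # second command in the second transaction

print "$_\n" for $t0, $t1, $c0, $c1;
```

Because every id starts with its parent's id, grepping a log for the
connection id also finds all of its transactions and commands.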

 A sequence solves (1) except for simultaneous processing of
 incoming messages via:
 
   a) async
   b) threads/multiple cpus
   c) local ports (possibly on multiple addresses)

I think you'll have to define "sequence". If you have one global
sequence, that will work in all of these cases. Or you can have multiple
sequences, but then you need a prefix to distinguish them.

hp

-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-29 Thread Peter J. Holzer
On 2007-08-29 09:08:56 -0700, JT Moree wrote:
  If you leave out any of the local info, an installation with two
  servers with un-synced times could still gen the same id. if you add
  it, then the only way you could have a collision is if your time is
  not granular enough or gets set back.
 
 I'm ok with that
 
  Using HiRes::Time
 
 my $lip = $conn->local_ip();

up to 15 characters (39 with IPv6)

 my $rip = $conn->remote_ip();

up to 15 characters (39 with IPv6)

 my $rport = $conn->remote_port || 0;

up to 5 characters

 my $lport = $conn->local_port || 0;

up to 5 characters

 my $start = time;

up to 16 characters

$$

up to 5 characters (10 for 32bit PIDs)

 my $id = join '_', $$, $start, "$lip:$lport", "$rip:$rport";

5 + 1 + 16 + 1 + 15 + 1 + 5 + 1 + 15 + 1 + 5 = 66 characters

or even

10 + 1 + 16 + 1 + 39 + 1 + 5 + 1 + 39 + 1 + 5 = 119 characters

on some systems. Much too long for an ID which is included in each log
line. You could condense it by using base 36 instead of base 10, but
it's still quite bulky.
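
To get a feel for how much base 36 would save, here is a small sketch.
The to_base36() helper is a hypothetical name, written out by hand so that
it needs no extra modules.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a non-negative integer in base 36 (0-9, a-z).
sub to_base36 {
    my ($n) = @_;
    my @digits = (0 .. 9, 'a' .. 'z');
    return '0' if $n == 0;
    my $out = '';
    while ($n > 0) {
        $out = $digits[$n % 36] . $out;   # prepend the least significant digit
        $n   = int($n / 36);
    }
    return $out;
}

print to_base36(65535), "\n";        # largest port number: 4 characters
print to_base36(1188077487), "\n";   # epoch seconds from 2007: 6 characters
```

A port shrinks from 5 to 4 characters and the seconds from 10 to 6, so the
saving is real but does not change the overall picture.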

hp


-- 
   _  | Peter J. Holzer| I know I'd be respectful of a pirate 
|_|_) | Sysadmin WSR   | with an emu on his shoulder.
| |   | [EMAIL PROTECTED] |
__/   | http://www.hjp.at/ |-- Sam in Freefall




Re: Transaction ID suggestions

2007-08-29 Thread Charlie Brady


On Wed, 29 Aug 2007, Guy Hulbert wrote:


1. A unique ID per message (on one server).
2. Ability to distinguish per recipient.
3. Ability to identify the server.

A sequence solves (1) except for simultaneous processing of
incoming messages via:

a) async
b) threads/multiple cpus
c) local ports (possibly on multiple addresses)

Except with multiple CPUs, time with sufficient resolution is a
satisfactory replacement for a sequence.


"Except with multiple CPUs" is a big problem. OTOH, as has been mentioned 
multiple times, a four-tuple identifying the TCP connection plus a 
timestamp will be satisfactory with any number of CPUs, and with very fast 
networks.



It may be useful to log things like remote_port but it doesn't seem to
help directly to solve problem 1.

A counter solves 2.

Any tag which is unique per server solves 3.  It is probably simpler to
make this configurable by the end-user.


A four-tuple identifying the TCP connection also identifies the server.

---
Charlie


Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

On 29-Aug-07, at 1:07 PM, Guy Hulbert wrote:


On Wed, 2007-08-29 at 18:15 +0200, Tony L. Svanstrom wrote:


 Then forget about the word security, and let's just say that people might
want to have unique IDs that'll be unique even when they've got more than one
server and centralized/aggregated logging... But we're not even there right
now, we are still stuck on how to make the IDs 100% unique within a single
server as it might be setup by any qpsmtpd-user.


There have been several adequate suggestions.  This is only a problem if
it goes into the qpsmtpd core since some of the suggestions are reported
to be in use already.


That doesn't matter as they haven't created $tran->id - they've just
put something in ->notes() which will continue to work.



Perhaps it would help to agree on a list of requirements.  From what I
can remember these are:

1. A unique ID per message (on one server).
2. Ability to distinguish per recipient.
3. Ability to identify the server.


I think you've made #2 confusing... I think what you mean is we want  
a new id when the transaction is reset (i.e. same connection, new  
email). That's fine.



A sequence solves (1) except for simultaneous processing of
incoming messages via:

a) async
b) threads/multiple cpus
c) local ports (possibly on multiple addresses)


I don't think any of these break when using the timer. But to
settle that concern I've updated the implementation again to use even
finer grained time (microseconds) and add in rand() in case the
gettimeofday timer is on a slow clock.


So now it's:

   secs.microsecs.rand.pid

There's a requirement that I'd like to add in: the ability to use the  
id as a filename for storage, and have it sort by time.
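
A rough sketch of an id in the secs.microsecs.rand.pid shape that also
sorts by time when used as a filename. The field widths, the separator and
the make_id() name are assumptions for illustration, not the format that was
checked in, and the hostname part is left out.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday);

# Sketch only: widths and separator are assumptions.
sub make_id {
    my ($secs, $usecs) = gettimeofday();   # list context: (seconds, microseconds)
    my $rand = int(rand(10000));           # 0..9999, guards against a slow clock
    # Zero-padding the time fields keeps plain string order equal to time order.
    return sprintf('%010d.%06d.%04d.%d', $secs, $usecs, $rand, $$);
}

print make_id(), "\n";
```

The result contains only digits and dots, so it is safe as a filename. Two
ids created within the same microsecond are ordered by the random field
rather than by creation order.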



Except with multiple CPUs, time with sufficient resolution is a
satisfactory replacement for a sequence.


I don't see what difference multiple CPUs makes. Adding in pid takes  
care of that.



It may be useful to log things like remote_port but it doesn't seem to
help directly to solve problem 1.


Yup. I removed it now - it was stupid to add it in - I just wasn't  
thinking.



A counter solves 2.


Consider counter = rand().

Any tag which is unique per server solves 3.  It is probably simpler to
make this configurable by the end-user.


I've added in a basic hashed version of hostname now.

Matt.


Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

On 29-Aug-07, at 5:50 PM, Charlie Brady wrote:

Except with multiple CPUs is a big problem. OTOH, as has been  
mentioned multiple times, a four-tuple identifying the TCP  
connection plus a timestamp will be satisfactory with any number of  
CPUs, and with very fast networks.


pid entirely satisfies this problem.

Matt.


Re: Transaction ID suggestions

2007-08-29 Thread Charlie Brady


On Wed, 29 Aug 2007, Matt Sergeant wrote:


On 28-Aug-07, at 11:04 PM, Charlie Brady wrote:


 Err, actually I had a brain fart. It should be remote_port.

No, it should be remote_IP.remote_port.local_port and should include a 
transaction_within_connection count. I don't think that pid adds anything.


Please try any way you can to get the algorithm I've used to generate a 
duplicate transaction id. Feel free to use your fastest hardware.


My fastest hardware isn't relevant. And I don't have any fast hardware :-)

I've tried, and cannot conceive of any way to get a repeat with this 
algorithm.


"This algorithm" I take to mean your proposal of time() . $$ . $remote_port.


That is just asserting that no single process could receive two 
connections in the same tick of time() (because if it could, it's trivial 
to arrange for them to have the same remote port). I can conceive of that 
happening, so we should do better. Use the four-tuple.


Perhaps in 30 years maybe (when computers are that fast), but for 
now it works well.


But not perfectly :-) Nor as well as it could with a tiny bit more effort.

---
Charlie


Re: Transaction ID suggestions

2007-08-29 Thread Tony L. Svanstrom
On Wed, 29 Aug 2007 the voices made Matt Sergeant write:

MS I've added in a basic hashed version of hostname now.

 Would this be a bad time to mention that people might get the idea that they 
want to run two different setups of qpsmtpd on the same server? Like one for 
incoming e-mails and one for outgoing (logging, whitelisting, preventing 
spam/viruses from exiting).

 Yeah, I saw the crypt+rand, but if something is worth doing... =)


/Tony
-- 
Generally speaking, taunting mentally unstable people is a bad idea.


Re: Transaction ID suggestions

2007-08-29 Thread Guy Hulbert
Peter.

I think it might help if you were to just rewrite the requirements
properly.  I don't have strong opinions on what the solution should be
nor what the requirements should be.  As long as the total number is
small and they are written concisely they will either converge or, if
necessary, we can vote.

On Wed, 2007-08-29 at 23:13 +0200, Peter J. Holzer wrote:
1. A unique ID per message (on one server).
 
 I'd rephrase that as unique ID per transaction. Not every
 transaction
 results in a message (indeed, on my systems 90+% of transactions don't
 result in a message).

fine ... I was not clear on the distinction and I think the person who
started the thread has already started using transaction ID

 
 
2. Ability to distinguish per recipient.
 
 I'm not even sure what per recipient should mean here. Does it mean
 per RCPT command, so that a log file looks something like this:

Yes. [ Again I'm not clear but per RCPT command was the previous
context I was referring to. ]

-- 
--gh




Re: Transaction ID suggestions

2007-08-29 Thread Michael Holzt
  my $lip = $conn->local_ip();
 up to 15 characters (39 with IPv6)
  my $rip = $conn->remote_ip();
 up to 15 characters (39 with IPv6)
  my $rport = $conn->remote_port || 0;
 up to 5 characters
  my $lport = $conn->local_port || 0;
 up to 5 characters
  my $start = time;
 up to 16 characters
 $$
 up to 5 characters (10 for 32bit PIDs)
  my $id = join '_', $$, $start, "$lip:$lport", "$rip:$rport";
 5 + 1 + 16 + 1 + 15 + 1 + 5 + 1 + 15 + 1 + 5 = 66 characters
 or even
 10 + 1 + 16 + 1 + 39 + 1 + 5 + 1 + 39 + 1 + 5 = 119 characters

Better encode it binary. E.g. for IPv4:

my $id = pack(NC,$$,$start,$lip,$lport,$rip,$rport)

Sum: 21 Bytes. Encoded in Base64: 28 Bytes.
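
A sketch of the binary idea for IPv4, using only core modules. The pack
template here is an assumption (32-bit pid, 32-bit whole seconds, two packed
addresses, two 16-bit ports), not the template from the line above, so it
comes to 20 bytes rather than 21. packed_id() is a hypothetical name.

```perl
use strict;
use warnings;
use Socket qw(inet_aton);
use MIME::Base64 qw(encode_base64);

# Assumed layout: N pid, N start seconds, a4 local IP, n local port,
#                 a4 remote IP, n remote port  (4+4+4+2+4+2 = 20 bytes)
sub packed_id {
    my ($pid, $start, $lip, $lport, $rip, $rport) = @_;
    my $bin = pack('N N a4 n a4 n',
                   $pid, $start,
                   inet_aton($lip), $lport,
                   inet_aton($rip), $rport);
    return encode_base64($bin, '');   # '' suppresses the trailing newline
}

my $id = packed_id(4892, 1188077487, '192.0.2.25', 25, '10.1.253.1', 40911);
print length($id), " characters: $id\n";
```

Both 20 and 21 bytes encode to 28 Base64 characters. Note that the standard
Base64 alphabet contains '/' and '+', so such an id is not directly usable
as a filename.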


Regards
Michael

-- 
It's an insane world, but i'm proud to be a part of it. -- Bill Hicks


Re: Transaction ID suggestions

2007-08-29 Thread m. allan noah
On 8/29/07, Matt Sergeant [EMAIL PROTECTED] wrote:
 On 29-Aug-07, at 5:50 PM, Charlie Brady wrote:

  Except with multiple CPUs is a big problem. OTOH, as has been
  mentioned multiple times, a four-tuple identifying the TCP
  connection plus a timestamp will be satisfactory with any number of
  CPUs, and with very fast networks.

 pid entirely satisfies this problem.

not on multiple machines with centralized logging, which is a fairly
common design.

allan

-- 
The truth is an offense, but not a sin


Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

On 29-Aug-07, at 6:03 PM, Charlie Brady wrote:

That is just asserting that no single process could receive two
connections in the same tick of time() (because if it could, it's
trivial to arrange for them to have the same remote port). I can
conceive of that happening, so we should do better. Use the four-tuple.


Just because you can conceive of it doesn't make it so. I can  
conceive of flying monkeys too.


And yes, remote_port was dumb. It's gone now.

Matt.



Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

On 29-Aug-07, at 6:38 PM, Tony L. Svanstrom wrote:


On Wed, 29 Aug 2007 the voices made Matt Sergeant write:

MS I've added in a basic hashed version of hostname now.

 Would this be a bad time to mention that people might get the idea that they
want to run two different setups of qpsmtpd on the same server?


No that's fine. PID is still in there taking care of that.

Matt.


Re: Transaction ID suggestions

2007-08-29 Thread Matt Sergeant

On 29-Aug-07, at 7:02 PM, m. allan noah wrote:


On 8/29/07, Matt Sergeant [EMAIL PROTECTED] wrote:

On 29-Aug-07, at 5:50 PM, Charlie Brady wrote:


Except with multiple CPUs is a big problem. OTOH, as has been
mentioned multiple times, a four-tuple identifying the TCP
connection plus a timestamp will be satisfactory with any number of
CPUs, and with very fast networks.


pid entirely satisfies this problem.


not on multiple machines with centralized logging, which is a fairly
common design.


Hostname is also part of the id (hashed down to a few chars).

Matt.



Re: Transaction ID suggestions

2007-08-28 Thread JT Moree
James W. Abendschan wrote:
 The check_earlytalker plugin ensures at least a one
 second pause in every SMTP session, so time() + peer IP
 + peer port will be far more unique than a random number :-)

This has been suggested a few times but I'd rather not have to have ids
for the system depend on using a plugin.  I'm pushing for adding this id
to core qpsmtpd.

 This combo would be unique among all hosts attached to the same
 routable networks -- two hosts on two different, unconnected
 networks could possibly get a connection from the same
 private IP + local port at the same time, but this should
 be impossible if the networks are connected.

As in two clients behind a NAT sending to our server at the exact same
time?  Might be possible from server farms or distributed mailing list
systems?

What do you guys think?

-- 
JT Moree


Re: Transaction ID suggestions

2007-08-28 Thread m. allan noah
On 8/28/07, JT Moree [EMAIL PROTECTED] wrote:
 James W. Abendschan wrote:
  The check_earlytalker plugin ensures at least a one
  second pause in every SMTP session, so time() + peer IP
  + peer port will be far more unique than a random number :-)

 This has been suggested a few times but I'd rather not have to have ids
 for the system depend on using a plugin.  I'm pushing for adding this id
 to core qpsmtpd.

  This combo would be unique among all hosts attached to the same
  routable networks -- two hosts on two different, unconnected
  networks could possibly get a connection from the same
  private IP + local port at the same time, but this should
  be impossible if the networks are connected.

 As in two clients behind a NAT sending to our server at the exact same
 time?  Might be possible from server farms or distributed mailing list
 systems?

 What do you guys think?

that won't be an issue. the nat box will rewrite the outgoing packets
to say they are coming from a unique port on its external interface,
and that is all you can see on your end.

remoteIP + remotePort + fineGrainedTime is what we use in-house for
some high-speed http logging that needs a unique handle. it works just
fine with a fair number of concurrent clients behind a nat or proxy.
but, my installation is not massive :)

allan

-- 
The truth is an offense, but not a sin


Re: Transaction ID suggestions

2007-08-28 Thread Ernesto

Why not use something like Data::UUID?

http://search.cpan.org/~rjbs/Data-UUID-1.148/UUID.pm

There it reads:

It provides reasonably efficient and reliable framework for generating
UUIDs and supports fairly high allocation rates -- 10 million per second
per machine -- and therefore is suitable for identifying both extremely
short-lived and very persistent objects on a given system as well as
across the network.

I used this in a former project for unique persistent object ids.

--
Ernesto






Re: Transaction ID suggestions

2007-08-28 Thread Michael Holzt
 remoteIP + remotePort + fineGrainedTime is what we use in-house for
 some high-speed http logging that needs a unique handle. it works just
 fine with a fair number of concurrent clients behind a nat or proxy.
 but, my installation is not massive :)

Add PID and a per-process message-counter and you should always be
unique.


Regards
Michael

-- 
It's an insane world, but i'm proud to be a part of it. -- Bill Hicks


Re: Transaction ID suggestions

2007-08-28 Thread Matt Sergeant
I've checked in $transaction->id support now. Please let me know if  
you think it's OK.


Matt.


Re: Transaction ID suggestions

2007-08-28 Thread JT Moree
Matt Sergeant wrote:
 I've checked in $transaction->id support now. Please let me know if you
 think it's OK.

which method did you use?
-- 
JT Moree


Re: Transaction ID suggestions

2007-08-28 Thread Matt Sergeant

On 28-Aug-07, at 3:12 PM, JT Moree wrote:


Matt Sergeant wrote:
I've checked in $transaction->id support now. Please let me know if you
think it's OK.


which method did you use?


hires_time.pid.local_port

Matt.




Re: Transaction ID suggestions

2007-08-28 Thread JT Moree
Matt Sergeant wrote:
 On 28-Aug-07, at 3:12 PM, JT Moree wrote:
 
 Matt Sergeant wrote:
 I've checked in $transaction->id support now. Please let me know if you
 think it's OK.

 which method did you use?
 
 hires_time.pid.local_port

I found the svn web interface:

  # generate id
  my $conn = $args{connection};
  my $ip = $conn->local_port || 0;
  my $start = time;
  my $id = $start.$$.$ip;

Some people have suggested adding the remote IP address.  I'm curious
why use local port instead of remote port?  would both be better?

  my $ip = $conn->remote_ip;
  my $rport = $conn->remote_port || 0;
  my $lport = $conn->local_port || 0;
  my $start = time;
  my $id = $start . '_' . $$ . '.' . $lport . '_' . $ip . ':' . $rport;


Thanks for checking something in.  Progress is being made. ;)
-- 
JT Moree


Re: Transaction ID suggestions

2007-08-28 Thread Matt Sergeant

On 28-Aug-07, at 3:51 PM, JT Moree wrote:


I found the svn web interface:

  # generate id
  my $conn = $args{connection};
  my $ip = $conn->local_port || 0;
  my $start = time;
  my $id = $start.$$.$ip;

Some people have suggested adding the remote IP address.  I'm curious
why use local port instead of remote port?  would both be better?


Err, actually I had a brain fart. It should be remote_port.

Matt.




Re: Transaction ID suggestions

2007-08-28 Thread Charlie Brady


On Tue, 28 Aug 2007, Matt Sergeant wrote:


On 28-Aug-07, at 3:51 PM, JT Moree wrote:

hires_time.pid.local_port

...

   my $conn = $args{connection};
   my $ip = $conn->local_port || 0;
   my $start = time;
   my $id = $start.$$.$ip;

 Some people have suggested adding the remote IP address.  I'm curious
 why use local port instead of remote port?  would both be better?


Err, actually I had a brain fart. It should be remote_port.


No, it should be remote_IP.remote_port.local_port and should include a 
transaction_within_connection count. I don't think that pid adds anything.




Re: Transaction ID suggestions

2007-08-25 Thread Matt Sergeant

On 24-Aug-07, at 6:40 PM, David Sparks wrote:

I'm using the poll server which means that there aren't threads to worry
about.  However the future probably means running multiple daemons to
take advantage of multi-core systems so there would need to be a daemon
id encoded in there.


Yeah, we use HiRes::time() . "." . $$ and we don't get any file stomping
(and we're doing millions of emails/day).




Re: Transaction ID suggestions

2007-08-25 Thread James W. Abendschan
On Fri, 24 Aug 2007, Guy Hulbert wrote:

  fqdn + time + peer TCP port will be pretty unique, regardless of

 fqdn is the trivial part

 rand will be pretty unique ...

Initial connection time, peer IP, and peer port will only
repeat if the connection is torn down and reestablished with
the same peer reusing the same local port within the resolution
of the timer.

The check_earlytalker plugin ensures at least a one
second pause in every SMTP session, so time() + peer IP
+ peer port will be far more unique than a random number :-)

This combo would be unique among all hosts attached to the same
routable networks -- two hosts on two different, unconnected
networks could possibly get a connection from the same
private IP + local port at the same time, but this should
be impossible if the networks are connected.

Adding this to plugins/logging/syslog works pretty well for
forkserver:

use Time::HiRes;

...

if (!$self->{_logid})
{
    if ($self->connection->remote_ip)
    {
        $self->{_timestamp} = Time::HiRes::time();
        $self->{_logid} = "t=" . $self->{_timestamp} . "/peer=" .
            $self->connection->remote_ip . ":" . $self->connection->remote_port;
    }
}

if ($self->connection->remote_ip)
{
    $header = $self->{_logid} . " ";
}

syslog $priority, '%s%s', $header, join(' ', @log);


syslog messages look like this:

  Aug 25 14:31:27 mailfoo qpsmtpd[4892]: t=1188077487.69488/peer=10.1.253.1:40911 check_earlytalker


If there's an existing way to count the number of messages sent
during the connection, then append the count to _logid and it
becomes a message ID generator.  If this isn't already somewhere
in SMTP.pm, the queueing plugin could increment a counter..
or the logging plugin could watch for the string 'to email address :'
and increment a (thread-safe) counter.  That's a smidge brittle, tho..
a proper message counter would be less hacky.

James




Re: Transaction ID suggestions

2007-08-24 Thread JT Moree
JT Moree wrote:
 
 Is this unique enough?  what is the chance of getting the same random
 number again?  should it be a combination of the PID + time + rand?
 

my @sname = split(/\./, $self->qp->config("me"));
= $sname[0].$$.'r'.int( (( time ^ $$ ) * rand($$)) / rand(time/$$));

= sprintf("%08X", rand(2**32 - 1));

$self->qp->config("me") =~ m/\.(\d{1,3})$/;  #not tested
$self->{_id} = $1;
= sprintf("%.4f%d", time(), $self->{_id});

= sprintf("%.4f", time()) ... $self->qp->config("me") . \
  sprintf("%08X", rand(2**32 - 1));  #how expensive is this?

These are the approaches suggested so far.  I added the last one as a
combination of the others.  Can we see a show of hands for the one
people like the best?

Can we get Hanno to modify his patch if people like one of these
approaches?  Can we get it tested by some people?  Can we get it checked
into svn?

-- 
JT Moree


Re: Transaction ID suggestions

2007-08-24 Thread Guy Hulbert
On Fri, 2007-08-24 at 11:52 -0700, JT Moree wrote:
 JT Moree wrote:
  
  Is this unique enough?  what is the chance of getting the same random
  number again?  should it be a combination of the PID + time + rand?
  
 
 my @sname = split(/\./, $self->qp->config("me"));
 = $sname[0].$$.'r'.int( (( time ^ $$ ) * rand($$)) / rand(time/$$));
 
 = sprintf("%08X", rand(2**32 - 1));
 
 $self->qp->config("me") =~ m/\.(\d{1,3})$/;  #not tested
 $self->{_id} = $1;
 = sprintf("%.4f%d", time(), $self->{_id});
 
 = sprintf("%.4f", time()) ... $self->qp->config("me") . \
   sprintf("%08X", rand(2**32 - 1));  #how expensive is this?
 
 These are the approaches suggested so far.  I added the last one as a
 combination of the others.  Can we see a show of hands for the one

Using rand is bogus.  A random number generator will repeat values.

Time (with sufficient resolution) is equivalent to a sequence ... but
with threads, you would need a lock on the sequence generator.
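
How often a random id repeats can be estimated with the usual birthday
approximation, p = 1 - exp(-n(n-1)/(2N)) for n ids drawn uniformly from N
possible values. A small sketch for the 32-bit case (N = 2**32), assuming
rand() really were uniform over that range; collision_probability() is a
hypothetical name.

```perl
use strict;
use warnings;

# Birthday approximation: chance of at least one repeat among $n values
# drawn uniformly at random from $space possibilities.
sub collision_probability {
    my ($n, $space) = @_;
    return 1 - exp(-$n * ($n - 1) / (2 * $space));
}

for my $n (1_000, 77_163, 100_000, 1_000_000) {
    printf "%9d ids: %.4f\n", $n, collision_probability($n, 2**32);
}
```

With 32 random bits the chance of at least one repeat reaches about one
half after roughly 77,000 ids, which is well below a day's traffic on a busy
server.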

 people like the best?
 
 Can we get Hanno to modify his patch if people like one of these
 approaches?  Can we get it tested by some people?  Can we get it checked
 into svn?

-- 
--gh




Re: Transaction ID suggestions

2007-08-24 Thread James W. Abendschan
On Fri, 24 Aug 2007, Guy Hulbert wrote:

  These are the approaches suggested so far.  I added the last one as a
  combination of the others.  Can we see a show of hands for the one

 Using rand is bogus.  A random number generator will repeat values.

 Time (with sufficient resolution) is equivalent to a sequence ... but
 with threads, you would need a lock on the sequence generator.

fqdn + time + peer TCP port will be pretty unique, regardless of
whether you're forking, selecting, or threading.  (fortunately,
multiplexed SMTP does not yet exist.)

Looks like remote_port is set in qpsmtpd-forkserver, at least..

James






Re: Transaction ID suggestions

2007-08-24 Thread James W. Abendschan
On Fri, 24 Aug 2007, James W. Abendschan wrote:

 On Fri, 24 Aug 2007, Guy Hulbert wrote:

   These are the approaches suggested so far.  I added the last one as a
   combination of the others.  Can we see a show of hands for the one
 
  Using rand is bogus.  A random number generator will repeat values.
 
  Time (with sufficient resolution) is equivalent to a sequence ... but
  with threads, you would need a lock on the sequence generator.

 fqdn + time + peer TCP port will be pretty unique, regardless of
 whether you're forking, selecting, or threading.  (fortunately,
 multiplexed SMTP does not yet exist.)

whoops; s/fqdn/peer IP/

James




Re: Transaction ID suggestions

2007-08-24 Thread Jens Weibler

James W. Abendschan wrote:

On Fri, 24 Aug 2007, Guy Hulbert wrote:

  

These are the approaches suggested so far.  I added the last one as a
combination of the others.  Can we see a show of hands for the one
  

Using rand is bogus.  A random number generator will repeat values.

Time (with sufficient resolution) is equivalent to a sequence ... but
with threads, you would need a lock on the sequence generator.



fqdn + time + peer TCP port will be pretty unique, regardless of
whether you're forking, selecting, or threading.  (fortunately,
multiplexed SMTP does not yet exist.)
  

mmh, multiplexed?
A mailserver can send multiple mails within one tcp-connection:
There may be zero or more, transactions in a session. - RFC2821

--
Jens



Re: Transaction ID suggestions

2007-08-24 Thread James W. Abendschan
On Fri, 24 Aug 2007, Jens Weibler wrote:

 mmh, multiplexed?
 A mailserver can send multiple mails within one tcp-connection:
 There may be zero or more, transactions in a session. - RFC2821

Ah, good point.  Okay then, obviously qpsmtpd now needs to be rewritten
to make me right -- after leaving the DATA state, reject anything other
than QUIT :-)

I suppose a counter could be tacked on to the ID and incremented every
time a message is queued..

James




Re: Transaction ID suggestions

2007-08-24 Thread JT Moree
Guy Hulbert wrote:
 Using rand is bogus.  A random number generator will repeat values.

So you would definitely not like #2 and probably not #1.  How about #3
and #4?

 Time (with sufficient resolution) is equivalent to a sequence ... but
 with threads, you would need a lock on the sequence generator.

In our case a repetition is not a highly critical problem.  (Not enough
to justify using a centralized sequence generator.)  Repetition just
reduces the readability of the logs.  Given that the logs are even less
readable without these id's I'd say we are in a better position to
implement something rather than nothing.

-- 
JT Moree


Re: Transaction ID suggestions

2007-08-24 Thread David Sparks
 = sprintf("%.4f", time()) ... $self->qp->config("me") . \
   sprintf("%08X", rand(2**32 - 1));  #how expensive is this?

 These are the approaches suggested so far.  I added the last one as a
 combination of the others.  Can we see a show of hands for the one
 
 Using rand is bogus.  A random number generator will repeat values.
 
 Time (with sufficient resolution) is equivalent to a sequence ... but
 with threads, you would need a lock on the sequence generator.

I'm using the poll server which means that there aren't threads to worry
about.  However the future probably means running multiple daemons to
take advantage of multi-core systems so there would need to be a daemon
id encoded in there.

The big advantage to using time() + id as the least significant digit is
that you can put the id in a db server as a double or unixtime which
comes in quite handy when you've got a lot of volume.
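
A small sketch of that idea with made-up values: a timestamp with four
decimals and a one-digit daemon id appended as the fifth decimal. The
variable names and the single-digit id are assumptions for illustration.

```perl
use strict;
use warnings;

# Made-up example values; in practice $time would come from Time::HiRes::time().
my $time      = 1188077487.6948;
my $daemon_id = 3;                      # single digit, becomes the 5th decimal

my $id = sprintf('%.4f%d', $time, $daemon_id);
print "$id\n";

# The id has 15 significant digits, which a double can hold without loss,
# so converting it to a number and back gives the same string.
my $as_number = $id + 0;
printf "%.5f\n", $as_number;
```

This only works while the whole id stays within about 15 significant
digits; a longer daemon id or microsecond resolution would no longer
survive storage as a double.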

Cheers,

ds


Re: Transaction ID suggestions

2007-08-24 Thread Guy Hulbert
On Fri, 2007-08-24 at 13:18 -0700, James W. Abendschan wrote:
 On Fri, 24 Aug 2007, Guy Hulbert wrote:
 
   These are the approaches suggested so far.  I added the last one as a
   combination of the others.  Can we see a show of hands for the one
 
  Using rand is bogus.  A random number generator will repeat values.
 
  Time (with sufficient resolution) is equivalent to a sequence ... but
  with threads, you would need a lock on the sequence generator.
 
 fqdn + time + peer TCP port will be pretty unique, regardless of

fqdn is the trivial part

rand will be pretty unique ...

time by itself is sufficient if the resolution is fine enough ... the
problem is that when systems are fast enough, whatever fixed
resolution you picked will not be enough.

However, at present the linux kernel gives microseconds (%.6f rather
than %.4f) and it seems to take about .0001 seconds to fork a process so
if forking, microseconds seem to be sufficient for a few years.

... but threads may be able to get the same value ...

The problem with a sequence is to continue through a crash without
repeating values.

 whether you're forking, selecting, or threading.  (fortunately,
 multiplexed SMTP does not yet exist.)
 
 Looks like remote_port is set in qpsmtpd-forkserver, at least..
 
 James
 
 
 
 

-- 
--gh




Re: Transaction ID suggestions

2007-08-24 Thread Guy Hulbert
On Fri, 2007-08-24 at 13:22 -0700, JT Moree wrote:
 Guy Hulbert wrote:
  Using rand is bogus.  A random number generator will repeat values.
 
 So you would definitely not like #2 and probably not #1.  How about #3
 and #4?

I can't think of anything that guarantees a unique number ... except
pulling a sequence from an ACID database (where the problem of system
crashes is already solved).

 
  Time (with sufficient resolution) is equivalent to a sequence ... but
  with threads, you would need a lock on the sequence generator.
 
 In our case a repetition is not a highly critical problem.  (Not enough

Repetition will break anything using a hash to sort messages by ID.

 to justify using a centralized sequence generator.)  Repetition just
 reduces the readability of the logs.  Given that the logs are even less

Ah.  You never know.  DJB had a clever method to pick message IDs for
the queue by using the inode ... but it is useless for log analysis
where the mail queue has a dedicated reiser partition and the load is
very LOW.  I found that every message used the same inode when there was
only one message at a time on the system ... :-(  It solves his problem
of picking a message ID which will not conflict with any other in the
queue _at the same time_.

 readable without these id's I'd say we are in a better position to
 implement something rather than nothing.

-- 
--gh