Re: [Dbmail-dev] Replace unique_id with GUID for Load Balancing & Failover

Aaron Stone Fri, 27 May 2005 00:08:41 +0200 (CEST)

Here's a thought:

Mandatory mailbox compaction. If we go a route that uses time for message
insertion and allows only (say) 6 months of uniqueness, we can be fully
multimaster, no worries, BUT once every 6 months the UIDs MUST be
compacted and the UIDVALIDITY value of the mailbox incremented. The
"last compaction time" gets updated too; that is the value subtracted from
the current time to yield the truncated time (24 bits or whatever):


http://lists.ccil.org/pipermail/fetchmail-friends/2004-May/008697.html
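
To make the compaction idea concrete, here is a rough sketch in Python. It is
not DBMail code; the Mailbox class, its field names and the 24-bit width are
all made up for illustration. One detail is my own assumption: after
compaction the epoch is backdated by the number of surviving messages, so that
new time-based UIDs land above the renumbered ones and stay strictly ascending.

```python
from dataclasses import dataclass, field

UID_BITS = 24                    # assumed width of the truncated time
UID_MAX = (1 << UID_BITS) - 1    # 16,777,215 seconds, roughly 194 days


@dataclass
class Mailbox:
    """Hypothetical mailbox state; not the DBMail schema."""
    uidvalidity: int
    epoch: int                                  # "last compaction time", UNIX seconds
    uids: list = field(default_factory=list)    # kept in ascending order

    def next_uid(self, now: int) -> int:
        """UID for a delivery at `now`: seconds elapsed since the epoch."""
        uid = now - self.epoch
        if uid < 1:
            raise ValueError("clock is at or before the mailbox epoch")
        if uid > UID_MAX:
            raise OverflowError("window exhausted: compact the mailbox first")
        return uid

    def deliver(self, now: int) -> int:
        uid = self.next_uid(now)
        # Two deliveries in the same second would collide; the sketch just refuses.
        if self.uids and uid <= self.uids[-1]:
            raise ValueError("UID would not be strictly ascending")
        self.uids.append(uid)
        return uid

    def compact(self, now: int) -> dict:
        """Renumber UIDs densely from 1 and invalidate the old ones."""
        mapping = {old: new for new, old in enumerate(self.uids, start=1)}
        self.uids = list(mapping.values())
        self.uidvalidity += 1                   # clients must drop cached UIDs
        self.epoch = now - len(self.uids)       # assumption: keep new UIDs above old ones
        return mapping
```

The same-second collision in deliver() is the "one message per second" problem
in another guise; the sketch only detects it and does nothing to solve it.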

And (big ass RFC 3501 quote):


2.3.1.1.        Unique Identifier (UID) Message Attribute

   A 32-bit value assigned to each message, which when used with the
   unique identifier validity value (see below) forms a 64-bit value
   that MUST NOT refer to any other message in the mailbox or any
   subsequent mailbox with the same name forever.  Unique identifiers
   are assigned in a strictly ascending fashion in the mailbox; as each
   message is added to the mailbox it is assigned a higher UID than the
   message(s) which were added previously.  Unlike message sequence
   numbers, unique identifiers are not necessarily contiguous.

   The unique identifier of a message MUST NOT change during the
   session, and SHOULD NOT change between sessions.  Any change of
   unique identifiers between sessions MUST be detectable using the
   UIDVALIDITY mechanism discussed below.  Persistent unique identifiers
   are required for a client to resynchronize its state from a previous
   session with the server (e.g., disconnected or offline access
   clients); this is discussed further in [IMAP-DISC].

   Associated with every mailbox are two values which aid in unique
   identifier handling: the next unique identifier value and the unique
   identifier validity value.

   The next unique identifier value is the predicted value that will be
   assigned to a new message in the mailbox.  Unless the unique
   identifier validity also changes (see below), the next unique
   identifier value MUST have the following two characteristics.  First,
   the next unique identifier value MUST NOT change unless new messages
   are added to the mailbox; and second, the next unique identifier
   value MUST change whenever new messages are added to the mailbox,
   even if those new messages are subsequently expunged.

        Note: The next unique identifier value is intended to
        provide a means for a client to determine whether any
        messages have been delivered to the mailbox since the
        previous time it checked this value.  It is not intended to
        provide any guarantee that any message will have this
        unique identifier.  A client can only assume, at the time
        that it obtains the next unique identifier value, that
        messages arriving after that time will have a UID greater
        than or equal to that value.

   The unique identifier validity value is sent in a UIDVALIDITY
   response code in an OK untagged response at mailbox selection time.
   If unique identifiers from an earlier session fail to persist in this
   session, the unique identifier validity value MUST be greater than
   the one used in the earlier session.

        Note: Ideally, unique identifiers SHOULD persist at all
        times.  Although this specification recognizes that failure
        to persist can be unavoidable in certain server
        environments, it STRONGLY ENCOURAGES message store
        implementation techniques that avoid this problem.  For
        example:

         1) Unique identifiers MUST be strictly ascending in the
            mailbox at all times.  If the physical message store is
            re-ordered by a non-IMAP agent, this requires that the
            unique identifiers in the mailbox be regenerated, since
            the former unique identifiers are no longer strictly
            ascending as a result of the re-ordering.

         2) If the message store has no mechanism to store unique
            identifiers, it must regenerate unique identifiers at
            each session, and each session must have a unique
            UIDVALIDITY value.

         3) If the mailbox is deleted and a new mailbox with the
            same name is created at a later date, the server must
            either keep track of unique identifiers from the
            previous instance of the mailbox, or it must assign a
            new UIDVALIDITY value to the new instance of the
            mailbox.  A good UIDVALIDITY value to use in this case
            is a 32-bit representation of the creation date/time of
            the mailbox.  It is alright to use a constant such as
            1, but only if it guaranteed that unique identifiers
            will never be reused, even in the case of a mailbox
            being deleted (or renamed) and a new mailbox by the
            same name created at some future time.

         4) The combination of mailbox name, UIDVALIDITY, and UID
            must refer to a single immutable message on that server
            forever.  In particular, the internal date, [RFC-2822]
            size, envelope, body structure, and message texts
            (RFC822, RFC822.HEADER, RFC822.TEXT, and all BODY[...]
            fetch data items) must never change.  This does not
            include message numbers, nor does it include attributes
            that can be set by a STORE command (e.g., FLAGS).
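
What the quote means for a client, as I read it, can be sketched like this.
The function names and the set-based cache are hypothetical, and the only
thing taken from the RFC text is that UIDVALIDITY plus UID form one 64-bit
identity and that a changed UIDVALIDITY invalidates cached UIDs.

```python
def message_key(uidvalidity: int, uid: int) -> int:
    """Combine the two 32-bit values into a single 64-bit identity."""
    for value in (uidvalidity, uid):
        if not 0 <= value < 2**32:
            raise ValueError("value does not fit in 32 bits")
    return (uidvalidity << 32) | uid


def usable_cache(cached_validity: int, cached_uids: set,
                 server_validity: int, server_uids: set) -> set:
    """Return the cached UIDs a client can still trust after reconnecting."""
    if server_validity != cached_validity:
        # UIDs did not persist: everything cached under the old value is void.
        return set()
    # Same validity: cached UIDs still name the same messages, minus expunged ones.
    return cached_uids & server_uids
```

This is why mandatory compaction is expensive for clients: every bump of
UIDVALIDITY empties whatever usable_cache() would have returned.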



On Thu, May 26, 2005, "Aaron Stone" <[EMAIL PROTECTED]> said:

> I think an important question to ask is how many messages we really want
> to be able to fit into a mailbox. IMAP's 32-bit limit, plus the fact that
> some clients treat those 32 bits as signed (is this true? do we need to
> worry?), means that there's already a limit of about 2 billion messages
> per mailbox.
> 
> If we're comfortable with a limit of, say, 16 million messages per
> mailbox, then we can go with 24 bits incrementing, 7 bits server id, and 1
> bit lost.
> 
> 2^24 == 16,777,216
> 60 sec * 60 min * 24 hours * 365 days ==  31,536,000
> Unfortunately, 24 bits will only hold about 6 months worth of seconds.
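
The arithmetic above generalizes to any bit width. A quick sketch, using the
same 365-day year as the figures above (lifetime_years is just a name I made
up):

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000, as computed above


def lifetime_years(bits: int) -> float:
    """Years until a counter of `bits` bits, ticking once per second, wraps."""
    return (1 << bits) / SECONDS_PER_YEAR


# Widths of interest: 24 (the proposal), 27 and 29 (wider splits), 31 (signed 32-bit).
for bits in (24, 27, 29, 31):
    print(f"{bits} bits: {1 << bits:>13,} seconds = {lifetime_years(bits):6.2f} years")
```

By this measure 24 bits is a little over half a year, and even the full 31
usable bits run out in under 70 years.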
> 
> Using UNIX time only gives us until 2038 before we're screwed; and that's
> if there's one message per second. Using a time window could help with
> this, which is what I think Paul might have mentioned in a previous email:
> 
> next_uid = (current_time - mailbox_time)
> 
> By subtracting the mailbox's creation time from the current time, we
> effectively restart the sequence, and, in a weird way, we give each
> mailbox about 68 years (2^31 seconds) to live from the time of the mailbox's creation.
> 
> If we do this:
> 
> next_uid = (current_time - mailbox_time) (24 bits) . server id (7 bits)
> Then the mailbox only has 6 months, with a cluster of 128 machines.
> 
> next_uid = (current_time - mailbox_time) (27 bits) . server id (4 bits)
> Then the mailbox has about 4 years to live, with a cluster of 16 machines.
> 
> next_uid = (current_time - mailbox_time) (29 bits) . server id (2 bits)
> Then the mailbox has 17 years to live, with a cluster of 4 machines.
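
The "time . server id" layouts above amount to bit packing. A sketch, assuming
the time goes in the high bits and the server id in the low bits, with the
top bit of the 32 left unused for signed clients (pack_uid and unpack_uid are
hypothetical helpers, not anything in DBMail):

```python
def pack_uid(elapsed: int, server_id: int, time_bits: int, server_bits: int) -> int:
    """Pack seconds-since-mailbox-creation and a server id into one UID."""
    if time_bits + server_bits > 31:
        raise ValueError("layout must fit in 31 bits to stay safe for signed clients")
    if not 0 <= elapsed < (1 << time_bits):
        raise OverflowError("mailbox has outlived its time window")
    if not 0 <= server_id < (1 << server_bits):
        raise ValueError("server id does not fit in the server field")
    # Time in the high bits, so a later second always sorts above an earlier one.
    return (elapsed << server_bits) | server_id


def unpack_uid(uid: int, server_bits: int) -> tuple:
    """Split a packed UID back into (elapsed seconds, server id)."""
    return uid >> server_bits, uid & ((1 << server_bits) - 1)
```

Putting time in the high bits keeps UIDs ascending across seconds, but only
if all servers agree on the clock, and each server still gets just one UID
per second per mailbox.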
> 
> ---
> 
> Sadly, it's looking like 32 bits is just too small to combine time with
> anything else. We could go to a keyserver architecture, where there's one
> machine that has the task of doling out the next keys, which means we
> would be bound not by time but by the number of messages, but it also
> means that we'd lose significant clustering flexibility.
> 
> ***
> 
> Why isn't there a spec for 64-bit IMAP IDs yet?! Should we write one?
> 
> Aaron
> 
> 
> On Thu, May 26, 2005, "Kevin Baker" <[EMAIL PROTECTED]> said:
> 
>>> Geo Carncross wrote:
>>>> On Thu, 2005-05-26 at 13:49 +0200, Paul J Stevens wrote:
>>>>
>>>>>Ok, let me recapitulate:
>>>>>
>>>>>- we want to replace all auto-incremented fields with
>>>>> bigint fields to
>>>>>hold uuids in order to accommodate N-clustered databases.
>> 
>> So the problem seems to be generating a sequential 32bit
>> char based on time and server id.
>> 
>> I'll have to do some reading on 32bit chars... to get my
>> head around this...
>> 
>> 
>> Kevin
>>
>> <snip: comments that seem to go over stuff from last year's
>> thread>
>> 
>> _______________________________________________
>> Dbmail-dev mailing list
>> Dbmail-dev@dbmail.org
>> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>> 
> 
> 
