---- Original Message ----- 
  From: Ashley Sheridan 
  To: Angus Mann 
  Cc: php-general@lists.php.net 
  Sent: Friday, November 13, 2009 8:31 AM
  Subject: Re: [PHP] uniqid() and repetition of numbers generated


  On Fri, 2009-11-13 at 08:22 +1000, Angus Mann wrote: 
Hi all. I'm sure I can't be the first person to ask this question but a search 
of the net leaves me confused.

I need a unique identifier in an SQL table and for complicated reasons I don't 
want to use auto-increment.

So I thought I would use a pseudo-random method instead. I am NOT scared of 
people guessing the unique identifier, it just has to be unique in order for 
the database to work properly.

So I looked at the uniqid() function and see it is based on the "current time 
in microseconds" and when I test it out I see that it increments (very quickly) 
when run repeatedly.

If it is based on JUST the time, then it should repeat every 24 hours, thus 
making "collisions" possible, which I don't want.

If it is based on the time AND day, then that's fine....I can use it.

So here's the problem....
When I calculate the number of microseconds since 1970 I get a 16 digit number.
But uniqid() only gives a 13 digit number.
Calculating the number of microseconds in a day gives 11 digits.

So it seems to me that the numbering sequence will repeat every 100 days, which 
risks collisions also.
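
The digit counts above can be checked by doing the arithmetic directly. The 
sketch below uses Python purely as a calculator. It assumes the "Sent" time of 
this message, read as UTC, as the reference moment, and it treats the 13 
characters as decimal digits, as the reasoning above does.

```python
from datetime import datetime, timezone

# Assumed reference time: the "Sent" header of this message, taken as UTC.
sent = datetime(2009, 11, 13, 8, 31, tzinfo=timezone.utc)

micros_since_1970 = int(sent.timestamp()) * 1_000_000
micros_per_day = 24 * 60 * 60 * 1_000_000

digits_since_1970 = len(str(micros_since_1970))
digits_per_day = len(str(micros_per_day))

# A 13-digit decimal counter ticking once per microsecond would wrap
# after 10**13 microseconds; express that in days.
wrap_days = 10**13 / micros_per_day

print(digits_since_1970, digits_per_day, round(wrap_days, 1))
```

On the decimal reading, the wrap-around would come after roughly 116 days 
rather than exactly 100.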

Can someone explain how uniqid() is really calculated, so I can make a proper 
judgement about how to use it?

Please don't suggest using a hash of a number generated by uniqid(). Hashing a 
small number into a longer one does not add entropy, it just transforms the 
input number, so it does NOT alter the risk of collisions so there is no net 
advantage.

I had a thought to just append the current date to the uniqid() result but I'm 
interested to know if anyone has a more elegant solution.

Thanks in advance.

Angus





  Auto increment fields are designed to avoid collisions. I can't think of any 
sensible reason for not using them. If you're worried that users of the system 
will think a number like '65' is a 'silly' value for an id, why not pad it up 
with leading zeros, and maybe add in some text from their name or something. To 
me, one unique number is the same as another, whether it has 11 digits or 2. 
Also, without having numbers with many leading zeros in your 11-digit unique 
number, the value range will be dramatically reduced, thereby increasing the 
chance of you running out of unique values.

        Thanks,
        Ash
        http://www.ashleysheridan.co.uk


       

Thanks Ashley. To clarify, the reason I don't want to use auto-increment: 
different users with their own populated databases may wish to merge some or 
all of their data. The unique identifier needs to be carried along with the 
rest of the data, hence be unique not only on the database it currently resides 
in ... it still needs to be unique if it gets copied into another person's 
database, and auto-increment will not meet that requirement. I thought that 
using microtime (hence uniqid()) will solve the problem, and the only chance of 
a collision is the unlikely event that by chance, records are added to 2 
different people's databases at EXACTLY the same time, to within an accuracy of 
a millionth of a second. Possible, I realize, but very unlikely, given that each 
user will probably add fewer than 100 entries per day.

On balance I think I will generate an identifier consisting of a few 
things...uniqid() plus a few letters from the person's name plus a 
(pseudo)random 3 digit number. Probably there's enough entropy in that for my 
purpose.
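
A minimal sketch of such a composite identifier, written in Python rather than 
PHP. Everything in it is an assumption made for illustration: the function name 
make_identifier, the hyphen separators, the choice of the first three letters 
of the name, and the 13-character hex timestamp that stands in for the uniqid() 
part.

```python
import secrets
import time

def make_identifier(name: str) -> str:
    """Build an ID from a timestamp part, a name part and a random part."""
    now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    # Stand-in for uniqid(): 8 hex digits of seconds, 5 of microseconds.
    time_part = f"{seconds:08x}{micros:05x}"
    # First three letters of the name, lower-cased; padded with 'x'
    # when the name has fewer than three letters.
    letters = "".join(ch for ch in name.lower() if ch.isalpha())
    name_part = (letters + "xxx")[:3]
    # Three-digit random suffix in the range 000-999.
    random_part = f"{secrets.randbelow(1000):03d}"
    return f"{time_part}-{name_part}-{random_part}"

ident = make_identifier("Angus Mann")
print(ident)
```

The random suffix only adds a factor of 1000 between records created in the 
same microsecond by users whose names start with the same letters, so the 
column should still carry a unique constraint when databases are merged.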

But the question still remains....what exactly is being returned by uniqid() ? 
It is obviously not random, and not a hash function because it increments 
predictably. It's too short to be the number of microseconds since 1970 and too 
long to be the number of microseconds since midnight. Since it has a fixed 
length, and it increments, it will eventually get to the last possible number - 
when will that be, and what will happen - will an extra digit appear or will it 
go back to zero, or will the generating algorithm crash? 

If it's anything similar to the unix timestamp then we're all in trouble on 
January 19, 2038 !
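
For reference, uniqid() called with no arguments returns a 13-character 
hexadecimal string, not a decimal number: the first 8 characters are the Unix 
time in seconds and the last 5 are the microsecond part of the current second. 
A minimal sketch that takes such a string apart, in Python rather than PHP so 
the slicing is explicit. The function name decode_uniqid and the sample value 
are made up for illustration.

```python
from datetime import datetime, timezone

def decode_uniqid(value: str):
    """Split a 13-character uniqid()-style string into (seconds, microseconds)."""
    if len(value) != 13:
        raise ValueError("expected 13 hex characters")
    seconds = int(value[:8], 16)   # Unix time in seconds
    micros = int(value[8:], 16)    # microseconds within that second
    return seconds, micros

sample = "4afd1944c3b1a"  # hypothetical value, not real uniqid() output
seconds, micros = decode_uniqid(sample)
when = datetime.fromtimestamp(seconds, tz=timezone.utc)

# Largest moment the 8-hex-digit seconds field can represent.
last = datetime.fromtimestamp(0xFFFFFFFF, tz=timezone.utc)

print(when.isoformat(), micros, last.year)
```

On that reading the value carries the date as well as the time of day, so 
nothing repeats daily or every 100 days, and eight hex digits of seconds are 
not exhausted until 2106. How a particular PHP build behaves in 2038 or 2106 is 
a separate question that this sketch does not cover.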








