On Wed, Jan 23, 2013 at 12:57 AM, Ferrous Cranus <nikos.gr...@gmail.com> wrote:
> On Tuesday, January 22, 2013 at 3:04:41 PM UTC+2, Steven D'Aprano wrote:
>
>> What do you expect int("my-web-page.html") to return? Should it return 23
>> or 794 or 109432985462940911485 or 42?
>
> I expected a unique number to be produced from the given string, so I could 
> have a (number <=> string) relation. What does int(somestring) really 
> return? I don't have IDLE to test.

Just run python without any args, and you'll get interactive mode. You
can try things out there.
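
To answer the int() question directly, here is a minimal sketch of what
you would see. The helper name try_int is made up for illustration; the
only real call is the built-in int().

```python
def try_int(text):
    """Return int(text), or a description of the error if text is not
    a valid integer literal."""
    try:
        return int(text)
    except ValueError as exc:
        return f"ValueError: {exc}"

# A string of digits converts; a filename does not.
print(try_int("42"))
print(try_int("my-web-page.html"))
```

In other words, int() parses a string that already looks like a number.
It does not invent a number for an arbitrary string; it raises
ValueError instead.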

> This counter.py will run in a shared hosting environment, so absolute paths 
> are BIG and expected to look like this:
>
> /home/nikos/public_html/varsa.gr/articles/html/files/index.html

That's not big. Trust me, modern databases work just fine with unique
indexes like that. The most common way to organize the index is with a
balanced tree (a B-tree, in most databases), so a lookup only has to
examine on the order of log(N) entries rather than all N.
That's like figuring out if the two numbers 142857 and 857142 are the
same; you don't need to look through 1,000,000 possibilities, you just
need to look through the six digits each number has.
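
To make the log(N) point concrete, here is a rough sketch that binary
searches a sorted list of a million paths and counts how many entries it
touches. A real database index is more sophisticated than a sorted
list, and the page names below are invented, so treat this purely as an
illustration of the growth rate.

```python
import math

def lookups_needed(sorted_keys, target):
    """Binary-search sorted_keys for target.

    Returns (found, probes), where probes is the number of entries
    that were actually compared against target.
    """
    lo, hi = 0, len(sorted_keys)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, probes

# Hypothetical paths, made up for this example.
paths = sorted(
    f"/home/nikos/public_html/page{i:07d}.html" for i in range(1_000_000)
)
found, probes = lookups_needed(paths, "/home/nikos/public_html/page0142857.html")

# probes never exceeds about log2(N), which is 20 for a million entries.
print(found, probes, math.ceil(math.log2(len(paths))))
```

Long keys make each comparison slightly more expensive, but the number
of comparisons stays tiny.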

> 'pin' has to be a number because if I used the column 'page' instead, just 
> imagine the database capacity needed to hold detailed information for each 
> and every .html requested by visitors!!!

Not that bad actually. I've happily used keys easily that long, and
expected the database to ensure uniqueness without costing
performance.

> So I really, really need to associate a (4-digit integer <=> HTML page's 
> absolute path)

Is there any chance that you'll have more than 10,000 pages? If so, a
four-digit number is *guaranteed* to have duplicates. And if you
research the Birthday Paradox, you'll find that any sort of hashing
algorithm is likely to produce collisions a lot sooner than that.
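
If you want to see the numbers, here is a small sketch of the Birthday
Paradox arithmetic. It assumes an idealized hash that spreads pages
uniformly over the 10,000 possible four-digit values; real hash
functions can only do worse. The function name is made up for this
example.

```python
def collision_probability(num_items, num_slots):
    """Probability that at least two of num_items values, each drawn
    uniformly at random from num_slots possibilities, are equal."""
    if num_items > num_slots:
        return 1.0  # pigeonhole: more items than slots forces a duplicate
    p_all_unique = 1.0
    for i in range(num_items):
        # The (i+1)-th item must avoid the i slots already taken.
        p_all_unique *= (num_slots - i) / num_slots
    return 1.0 - p_all_unique

for pages in (10, 50, 120, 500, 10_001):
    print(pages, round(collision_probability(pages, 10_000), 3))
```

With 10,000 slots the chance of a collision passes even odds at around
120 pages, long before you get anywhere near 10,000.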

> Maybe it can be done by creating a MySQL association between the two columns, 
> but I don't know how such a thing can be done (if it can).
>
> So, that's why I need to get a "unique" number out of a string. Please help.

Ultimately, that unique number would end up being a foreign key into a
table of URLs and IDs. So just skip that table and use the URLs
directly - much easier. In this instance, there's no value in
normalizing.
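
As a rough sketch of what "use the URLs directly" could look like: the
table below is keyed on the page path itself, with no separate numeric
ID. I'm using the sqlite3 module only so the example runs on its own;
your setup is MySQL, and the table name, column names and helper
functions here are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The page path is the primary key, so the database enforces uniqueness.
conn.execute(
    "CREATE TABLE counters (page TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
)

def record_hit(conn, page):
    """Increment the counter for page, creating the row on first visit."""
    cur = conn.execute(
        "UPDATE counters SET hits = hits + 1 WHERE page = ?", (page,)
    )
    if cur.rowcount == 0:
        conn.execute("INSERT INTO counters (page, hits) VALUES (?, 1)", (page,))
    conn.commit()

def hits_for(conn, page):
    """Return the stored hit count for page, or 0 if it was never seen."""
    row = conn.execute(
        "SELECT hits FROM counters WHERE page = ?", (page,)
    ).fetchone()
    return row[0] if row else 0

index_page = "/home/nikos/public_html/varsa.gr/articles/html/files/index.html"
for _ in range(3):
    record_hit(conn, index_page)
print(hits_for(conn, index_page))
```

No hashing, no collisions to worry about, and the lookup goes through
the unique index on the path.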

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
