Thanks both.

But in theory, these two URLs may very well not
represent the same document:
http://www.uio.no/
http://uio.no/
even though they reside on the same server (same DNS entry).
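A cheap filter that needs no downloading is to treat a URL pair as a duplicate *candidate* only when the two URLs differ just by a leading www. on the host. Because one server can host different sites under both names, a match is only a hint, not proof. A minimal sketch, assuming REBOL/Core 2.x; `strip-www` and `www-variants?` are hypothetical helper names, not part of any existing script:

```rebol
REBOL [Title: "www-variant candidate check (sketch)"]

strip-www: func [
    "Lowercased URL text with a leading www. removed from the host"
    url [url!] /local s p
][
    s: lowercase to-string url
    ; p lands just past "://", i.e. at the start of the host name
    if all [p: find/tail s "://"  find/match p "www."] [
        remove/part p 4    ; drop the 4 characters "www."
    ]
    s
]

www-variants?: func [
    "True when two different URLs differ only by a leading www. on the host"
    a [url!] b [url!]
][
    all [
        not equal? a b
        equal? strip-www a strip-www b
    ]
]
```

For example, `www-variants? http://www.uio.no/ http://uio.no/` is true, so those two would go on to a content comparison; pairs that differ anywhere else are never compared.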

So... is it possible to _know_ whether these two URLs point
to the same document without downloading both and comparing
them? (I really don't think so myself, but someone might
know something I don't.)
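Failing that, the comparison itself can at least be cheap: store a fingerprint of each page's text with its record and look that up, instead of comparing whole documents. A minimal sketch, assuming REBOL/Core 2.x (where `checksum/secure` returns a SHA-1 digest); the block `seen` is a hypothetical in-memory stand-in for what would be a database column in the real engine, and `fingerprint` / `duplicate-of` are illustrative names:

```rebol
REBOL [Title: "Content-fingerprint duplicate check (sketch)"]

; Hypothetical index: alternating fingerprint / first URL seen with it.
seen: make block! 100

fingerprint: func [
    "SHA-1 of the page text after trim/lines (line breaks, extra spaces removed)"
    text [string!]
][
    checksum/secure trim/lines copy text    ; copy: trim modifies its argument
]

duplicate-of: func [
    "URL already stored with the same content, or none (then the page is recorded)"
    url [url!] text [string!] /local fp hit
][
    fp: fingerprint text
    either hit: select seen fp [hit] [
        append seen reduce [fp url]
        none
    ]
]
```

This still downloads every page; it only avoids keeping and diffing full documents. Pages that embed counters, dates or session IDs will fingerprint differently even when they are "the same" page, so it only catches exact duplicates.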

I suddenly realize this has got very little to do with 
Rebol. Sorry.

Hallvard

Tom Conlin <[EMAIL PROTECTED]> wrote (Wed, 22 Oct
2003 10:00:08 -0700 (PDT)):
>
>On Wed, 22 Oct 2003, Hallvard Ystad wrote:
>
>>
>> Hi list
>>
>> My search engine for REBOL material now has more than
>> 10000 entries, and works pretty fast thanks to
>> DocKimbel's mysql protocol.
>>
>> Here's a problem:
>> Some websites work both with and without the www prefix
>> (e.g. www.rebol.com and plain rebol.com). Sometimes this
>> gives duplicate records in my DB (see
>> http://www.oops-as.no/cgi-bin/rebsearch.r?q=mysql : you'll
>> see that both http://www.softinnov.com/bdd.html and
>> http://softinnov.com/bdd.html appear).
>>
>> Is there a way to detect such behaviour on a server? Or do
>> I have to compare each incoming document to whatever
>> documents I already have in the DB that _might_ be the
>> same document?
>>
>> Thanks,
>> Hallvard
>>
>> Præterea censeo Carthaginem esse delendam
>> --
>> To unsubscribe from this list, just send an email to
>> [EMAIL PROTECTED] with unsubscribe as the subject.
>>
>
>Hi Hallvard
>
>I ran into different reasons for finding more than one URL
>to a page (URLs expressed as relative links) and wrote a
>quick-and-dirty function that served my purpose at the time.
>
>I just added Anton's suggestion; maybe it will serve.
>
>
>do http://darkwing.uoregon.edu/~tomc/core/web/url-encode.r
>
>canonical-url: func [url /local t p q] [
>    url: to-string url          ; work on a copy, not the caller's URL
>    replace/all url "\" "/"
>    t: parse/all url "/"        ; split on "/" only, into segments
>    while [p: find t ".."] [remove remove back p]  ; drop "seg/.."
>    while [p: find t "."] [remove p]                ; drop "./"
>    p: find t ""                ; the empty segment after "http:"
>    while [p <> q: find/last t ""] [remove q]       ; drop later empties
>
>    ;;; this is untested
>    ;;; using Anton's suggestion
>    ;;; (same IP does not guarantee the same document: virtual hosts)
>
>    if not find/match t/3 "www." [
>       if equal? read join dns:// t/3 read join dns://www. t/3
>       [insert t/3 "www."]
>    ]
>
>    for i 1 (length? t) - 1 1 [append t/:i "/"]
>    to-url url-encode/re rejoin t
>]
>