On 03-Nov-99 Gilles Detillieux wrote:

>> It's hard to know what's "normal" or which option would be faster. 
>> Remember we're all digging very different servers, pages, etc. For 
>> example, you don't mention how many URLs you have or the size of your 
>> database.
>> 
>> I'm guessing the merging is taking a while because either (or both):
>> a) 1200 sites => many, many URLs => large databases
>> b) the machine you're using doesn't have much RAM and is swapping to merge
>> 
>> These are obviously intertwined. The amount of RAM you need is 
>> related to the size of your databases...
> 
> I'm wondering how Andrea is merging these 1200 separate databases.
> I don't know, but I'd guess that merging them hierarchically would be
> faster than merging them linearly.  E.g., for 8 databases (1-8), you
> could merge 2-8 in turn into database 1, but it seems it would be more
> efficient to merge 2 into 1, 4 into 3, 6 into 5, 8 into 7, 3 into 1,
> 7 into 5, and finally 5 into 1.  I'm guessing though.  I don't know that
> anyone ever benchmarked it.
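
To make the two merge orders above concrete, here is a rough Python sketch that only computes the order of merges; it does not call htmerge. The function names and the cost model are my own assumptions: every database starts at size 1, and a merge is assumed to cost the size of the combined result. Nobody has benchmarked this, so if a merge really costs only the size of the database being merged in, the comparison could come out differently.

```python
def linear_schedule(n):
    """Merge databases 2..n in turn into database 1."""
    return [(src, 1) for src in range(2, n + 1)]


def hierarchical_schedule(n):
    """Merge neighbours pairwise, then merge the survivors, and so on."""
    steps = []
    alive = list(range(1, n + 1))
    while len(alive) > 1:
        survivors = []
        for i in range(0, len(alive), 2):
            dst = alive[i]
            if i + 1 < len(alive):
                # Merge the right-hand neighbour into the left-hand one.
                steps.append((alive[i + 1], dst))
            survivors.append(dst)
        alive = survivors
    return steps


def total_cost(schedule, n):
    """ASSUMPTION: each merge costs the size of the merged result."""
    size = {db: 1 for db in range(1, n + 1)}
    cost = 0
    for src, dst in schedule:
        size[dst] += size.pop(src)
        cost += size[dst]
    return cost


if __name__ == "__main__":
    for name, make in (("linear", linear_schedule),
                       ("hierarchical", hierarchical_schedule)):
        schedule = make(8)
        print(name, schedule, total_cost(schedule, 8))
```

For 8 databases the hierarchical schedule comes out as 2 into 1, 4 into 3, 6 into 5, 8 into 7, 3 into 1, 7 into 5, 5 into 1, which is the order described above.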

I have to merge the sites separately because I need to know the
"originator" URL when I get a hit. The idea is that htdig uses the <url_list>
attribute so that I get a list of "derived" URLs from each "originator" URL. I
then merge all these small databases together. Is there an easier way to trace
a hit back to its originating URL?
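
For illustration, the bookkeeping I have in mind looks roughly like the Python sketch below. It assumes the "derived" URL lists have already been read into a dictionary keyed by "originator" URL; the function name, the dictionary layout and the example.com addresses are all hypothetical, and nothing here is part of htdig itself.

```python
def build_reverse_index(derived_by_originator):
    """Map every derived URL back to the originator it was reached from."""
    reverse = {}
    for originator, derived_urls in derived_by_originator.items():
        for url in derived_urls:
            # If two originators reach the same URL, keep the first one seen.
            reverse.setdefault(url, originator)
    return reverse


# Hypothetical data: one list of derived URLs per originator URL.
derived = {
    "http://site-a.example.com/": [
        "http://site-a.example.com/",
        "http://site-a.example.com/music.html",
    ],
    "http://site-b.example.com/": [
        "http://site-b.example.com/",
        "http://site-b.example.com/bands/index.html",
    ],
}

index = build_reverse_index(derived)


def originator_of(hit_url):
    """Return the originator for a search hit, or None if it is unknown."""
    return index.get(hit_url)
```

Given a hit URL from the merged database, originator_of() then returns the site it came from, or None if the URL was never recorded.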

----------------------------------
Andrea Carpani
E-Mail: <[EMAIL PROTECTED]>

Vitaminic -- The Music Evolution --
http://www.vitaminic.it
----------------------------------
