> Just submit my patch and try to compile; you will see what you need to
> change.
> It's just a few changes from new Properties() to ContentProperties(), and maybe
> the import of this class.

Cool, I'll have a look at your patch :)

>
>> It's much better than what I have right now.  However, it's still not
>> 100% and fetching all the urls would mean implementing some sort of
>> iterative process until all the urls are finally fetched.
>> Do you have an idea why we are still missing 10 to 20%?
>
>
> Well, since I started with dmoz, those are the urls that no longer exist
> but are still listed in dmoz. You also get some general errors,
> like unable to parse, host down, etc.
> So a 10% error rate is not too bad; once you have some hundred
> million urls, you will see that this error rate drops below 5%.


In my results I didn't include the urls that failed to fetch, regardless
of the error. The percentages were of fetch attempts (so they include the
errors), which should be 100%.
So, with your patch, did you see 100% of urls *attempting* a fetch?

Thanks,
--Flo
