I think I solved the problem once I set depth to 50 and topN to 2000, though
the crawling is far from finished. But I can see that outlinks are fetched in
the log! Thank you very much.

2009/3/6 Yves Yu <[email protected]>

> OK, I appended "*" at the tail to skip those files. It seems the pages I
> fetched look much better now. ~~)
>
>
> 2009/3/6 Alexander Aristov <[email protected]>
>
>> 2009/3/5 Yves Yu <[email protected]>
>>
>> > Yes, I saw a lot of css, gif, and js files here, but I did set the
>> > following configuration in my crawl-urlfilter.txt,
>> > so I will enlarge depth to 50 and topN to 1000 and see what happens.
>> >
>> > Thank you very much.
>> >
>> > # skip image and other suffixes we can't yet parse
>> >
>> >
>> -\.(js|JS|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>> >
>>
>> This affects only suffixes, but in your case the CSS and JS URLs end with
>> random digits/letters, so the pattern never matches; you need to disable
>> that MIME type instead.
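>> For example (a sketch only, not tested against your site's URLs), the
>> suffix rule could be extended to also match URLs where the extension is
>> followed by a query string, by allowing an optional "?..." part before
>> the end-of-line anchor:
>>
>> # also skip e.g. style.css?v=12345 or script.js?20090306 (hypothetical URLs)
>> -\.(js|JS|css|CSS|gif|GIF|jpg|JPG|png|PNG)(\?[^/]*)?$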
>>
>>
>>
>> >
>> > 2009/3/6 Alexander Aristov <[email protected]>
>> >
>> >
>>
>
>
