I think I solved the problem by setting depth to 50 and topN to 2000, though the crawl is far from finished. But I can see in the log that outlinks are being fetched! Thank you very much.
2009/3/6 Yves Yu <[email protected]>:
> OK, I appended "*" at the tail to skip those files. The pages I fetched
> look much better now.
>
> 2009/3/6 Alexander Aristov <[email protected]>:
>> 2009/3/5 Yves Yu <[email protected]>:
>>> Yes, I saw a lot of CSS, GIF, and JS files here, but I do set the
>>> following configuration in my crawl-urlfilter.txt. So I will enlarge
>>> depth to 50 and topN to 1000 and see what happens.
>>>
>>> Thank you very much.
>>>
>>> # skip image and other suffixes we can't yet parse
>>> -\.(js|JS|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>>
>> This affects only suffixes, but in your case the CSS and JS URLs end with
>> random digits/letters, so you need to disable those MIME types instead.
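For readers hitting the same issue: the `$` anchor in the filter above only matches when the URL ends exactly at the suffix, so URLs like `style.css?v=123` slip through. A sketch of one way to extend the pattern (the exact suffix list here is illustrative; adapt it to your own crawl-urlfilter.txt) is to allow an optional query string or fragment after the suffix:

    # skip these suffixes even when followed by a query string or fragment
    -\.(js|css|gif|jpg|jpeg|png|ico|bmp|zip|gz|exe)([?#].*)?$

This is in the same Java-regex syntax the regex URL filter uses; appending ".*" after the suffix group, as done in the thread, achieves a similar effect but will also match suffixes that merely start with one of the listed extensions.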
