OK, I appended "*" to the end of the filter pattern to skip those files. It seems the pages I fetched look a lot nicer now. :)

2009/3/6 Alexander Aristov <[email protected]>

> 2009/3/5 Yves Yu <[email protected]>
>
> > Yes. I saw a lot of css, gif and js files here, but I did set the following
> > configuration in my crawl-urlfilter.txt,
> > so ... I will increase depth to 50 and topN to 1000 and see what happens.
> >
> > Thank you very much.
> >
> > # skip image and other suffixes we can't yet parse
> > -\.(js|JS|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>
> This affects only suffixes, but in your case the CSS and JS URLs end with
> random digits/letters.
> You need to disable such MIME types.
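
For anyone finding this thread later, here is a minimal sketch of why the `$`-anchored suffix rule misses those files. It uses Python's `re.search` only as a stand-in for the regex matching done by the URL filter, and the suffix list is shortened from the rule quoted above. The URLs are invented for illustration, and the "relaxed" pattern assumes the random tail is a query string; it is one possible variant, not necessarily the change described at the top of this mail.

```python
import re

# Suffix alternation, shortened from the crawl-urlfilter.txt rule quoted above.
SUFFIXES = r"\.(js|JS|gif|GIF|css)"

# Original form: the suffix must be the very last thing in the URL.
anchored = re.compile(SUFFIXES + r"$")

# Hypothetical relaxed form: the suffix may also be followed by a query string.
relaxed = re.compile(SUFFIXES + r"(\?|$)")

# Hypothetical URLs, invented for illustration only.
urls = [
    "http://example.com/style.css",
    "http://example.com/style.css?1236297600",
    "http://example.com/index.html",
]


def is_skipped(pattern, url):
    """Return True if a '-' rule with this pattern would exclude the URL."""
    return pattern.search(url) is not None


for url in urls:
    print(url, is_skipped(anchored, url), is_skipped(relaxed, url))
```

The second URL is the interesting one: the anchored rule lets it through because the URL no longer ends in `.css`, while the relaxed rule excludes it. If the random tail is not a query string, the pattern would need adjusting accordingly.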
