Can you please tell me what this command means? What are the top 35 links, and how does Nutch rank them?

"bin/nutch readdb crawl/crawldb -topN 35 test"

On 4/19/07, Briggs <[EMAIL PROTECTED]> wrote:
> Those links are links that were discovered. It does not mean that they
> were fetched; they weren't.
>
> On 4/12/07, Meryl Silverburgh <[EMAIL PROTECTED]> wrote:
> > I think I found the answer to my previous question by doing this:
> >
> > bin/nutch readlinkdb crawl/linkdb/ -dump test
> >
> > But my next question is why the result shows URLs ending in 'gif', 'js', etc.
> >
> > I have this line in my crawl-urlfilter.txt, so I don't expect to
> > crawl things like images or JavaScript files:
> >
> > # skip image and other suffixes we can't yet parse
> > -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|rss|swf)$
> >
> > Can you please tell me how to fix my problem?
> >
> > Thank you.
> >
> > On 4/11/07, Meryl Silverburgh <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I read this article about Nutch crawling:
> > > http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
> > >
> > > How can I dump the valid links that have been crawled?
> > > The command described in the article does not work in Nutch 0.9. What
> > > should I use instead?
> > >
> > > bin/nutch readdb crawl-tinysite/db -dumplinks
> > >
> > > Thank you for any help.
>
> --
> "Conscious decisions by conscious minds are what make reality real"

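The suffix rule quoted in this thread can be tried out on its own, outside Nutch, to see which URLs it would reject. The following is only a rough sketch in Python, not Nutch code: it assumes Python's `re` treats this particular pattern the same way Java's regex engine does, and the helper name `is_skipped` and the example.com URLs are made up for illustration. It tests the pattern only and says nothing about which stage of the crawl applies the filter.

```python
import re

# Pattern copied from the "-" rule in the quoted crawl-urlfilter.txt line.
# The leading "-" (reject) is Nutch rule syntax and is not part of the regex.
SKIP_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
    r"|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|rss|swf)$"
)


def is_skipped(url: str) -> bool:
    """Return True if the suffix rule matches, i.e. the URL would be rejected.

    Hypothetical helper for illustration; it is not part of Nutch.
    """
    return SKIP_SUFFIXES.search(url) is not None


if __name__ == "__main__":
    # Made-up sample URLs.
    samples = [
        "http://example.com/logo.gif",
        "http://example.com/index.html",
        "http://example.com/app.js?v=2",
    ]
    for url in samples:
        print(url, "->", "skipped" if is_skipped(url) else "kept")
```

Because the pattern is anchored with `$`, a URL such as `app.js?v=2` is not matched by this rule: the text after the suffix is the query string, so the URL no longer ends in `.js`.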