Hi,
I haven't been following Nutch development for some time (so my
answer may not be accurate), but I don't think such a tool/report exists.
Anyway, your contribution would be warmly welcomed! :-)
On the other hand, based on your short description of the features you are
looking for, my personal opinion is that you want a tool that
provides exact information about something that is very variable
(mutable) by nature and heavily dependent on the Nutch setup.
For example, the size of a parsed document (an HTML document, say) can be
limited to a specific size. So can the number of links extracted from a
document, and so on. Such settings have a major impact on the crawl
result, and thus on the result of your report as well.
Just my 2 cents.
Regards,
Lukas
On 12/2/06, karthik085 <[EMAIL PROTECTED]> wrote:
Hello,
How do I check that all pages have been fetched? Is there a command or tool
that gives a report along these lines:
the number of pages in the website, the number of pages fetched,
the number of pages filtered...
and, if there were errors, how many, with a brief description of each?
I understand that analyzing the log and readdb with stats/dumppageurl is one
option, but it is time-consuming and requires unwanted manual work. If there
were a tool/command that did the above, I could easily parse the report
for my web services.
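
As a rough illustration of the "parse the report" idea, here is a minimal
Python sketch that turns the text printed by `bin/nutch readdb <crawldb> -stats`
into a dictionary and derives a small summary. It assumes the statistics appear
as plain "name: number" lines (for example "TOTAL urls:" and
"status 2 (db_fetched):"); the exact labels and whether they go to stdout or to
the log vary between Nutch versions, so treat the sample text and the helper
names (parse_stats, summarize, read_stats) as hypothetical.

```python
import re
import subprocess

# Assumed line shape: "<label>:<whitespace><number>", e.g. "TOTAL urls:\t120".
_STAT_LINE = re.compile(r"^\s*(?P<key>[^:]+?):\s*(?P<value>-?\d+(?:\.\d+)?)\s*$")

# Illustrative sample only; not captured from a real crawl.
SAMPLE = """\
TOTAL urls:\t120
retry 0:\t118
retry 1:\t2
status 1 (db_unfetched):\t20
status 2 (db_fetched):\t95
status 3 (db_gone):\t5
"""


def parse_stats(text):
    """Collect every 'label: number' line into a dict; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        match = _STAT_LINE.match(line)
        if match is None:
            continue
        raw = match.group("value")
        stats[match.group("key")] = float(raw) if "." in raw else int(raw)
    return stats


def _count_for(stats, status_name):
    """Return the count whose label contains '(status_name)', or 0 if absent."""
    needle = "({})".format(status_name).lower()
    for key, value in stats.items():
        if needle in key.lower():
            return value
    return 0


def summarize(stats):
    """Derive totals and the fetched ratio from the parsed statistics."""
    total = stats.get("TOTAL urls", 0)
    fetched = _count_for(stats, "db_fetched")
    return {
        "total": total,
        "fetched": fetched,
        "unfetched": _count_for(stats, "db_unfetched"),
        "fetched_ratio": (fetched / total) if total else 0.0,
    }


def read_stats(crawldb, nutch="bin/nutch"):
    """Run readdb -stats and parse whatever it prints (stdout and stderr)."""
    result = subprocess.run(
        [nutch, "readdb", crawldb, "-stats"],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout + result.stderr)


if __name__ == "__main__":
    print(summarize(parse_stats(SAMPLE)))
```

Note that these numbers describe only what the crawl database knows about, so
a summary like this cannot report "all pages in the website"; pages that were
never discovered, or were cut off by size and link limits, do not appear.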
--
View this message in context:
http://www.nabble.com/Nutch-Data-Testing-tf2742246.html#a7651128
Sent from the Nutch - User mailing list archive at Nabble.com.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general