Re: [Nutch-general] 0.8 Intranet Crawl Output/Logging?

Renaud Richardet Thu, 14 Sep 2006 13:59:41 -0700

Hello Jared,

[EMAIL PROTECTED] wrote:
> Everyone, thanks for the help with this.  I hope to return the
> assistance, once I am more familiar with 0.8.  I am using tail -f now to
> monitor my test crawls.  It also looks like you can use
> conf/hadoop-env.sh to redirect log file output to a different location
> for each of your configurations.
>
> One follow up question:
> Now that I can actually see the log, I am finding some of the output
> rather annoying/noisy.  Specifically, I am referring to the Registered
> Plugins and Registered Extension-Points output.  It's nice to see that
> once at crawl start, but not with every step of the crawl.
>
> So does anyone know if I can disable that output?
Please see http://issues.apache.org/jira/browse/NUTCH-346
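
In the meantime, a rough workaround on the viewing side (my own sketch, not something taken from the JIRA issue) is to filter those lines out of the tail output. The function name below is made up for illustration, and it assumes every noisy line carries the "plugin.PluginRepository" logger name, as in your excerpt:

```shell
# Hypothetical helper: drop PluginRepository lines from a log stream on stdin.
# Assumes the logger name appears verbatim on each noisy line.
filter_plugin_noise() {
    grep -v 'plugin\.PluginRepository'
}

# Example use while a crawl is running (adjust the path to your setup):
#   tail -f logs/hadoop.log | filter_plugin_noise
```

This only hides the lines while you watch; they are still written to hadoop.log.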


HTH,
Renaud

> Here's the output to
> which I refer:
>
> 2006-09-14 14:03:42,852 INFO  plugin.PluginRepository - Plugins: looking in: /var/nutch/nutch-0.8/plugins
> 2006-09-14 14:03:43,030 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2006-09-14 14:03:43,030 INFO  plugin.PluginRepository - Registered Plugins:
> 2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
> 2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Site Query Filter (query-site)
> 2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
> [snip]
> 2006-09-14 14:03:43,031 INFO  plugin.PluginRepository - Registered Extension-Points:
> 2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
> 2006-09-14 14:03:43,031 INFO  plugin.PluginRepository -         Nutch 
> [snip]
> Search Results Clustering Plugin
> (org.apache.nutch.clustering.OnlineClusterer)
> 2006-09-14 14:03:43,032 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 2006-09-14 14:03:43,032 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
> [snip]
>
> Jared-
>
> -----Original Message-----
> From: Jacob Brunson [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 14, 2006 1:24 AM
> To: [email protected]
> Subject: Re: 0.8 Intranet Crawl Output/Logging?
>
> On my system, I run the crawl command in one shell while running this
> command in another shell to monitor the crawl:
> tail -f logs/hadoop.log
> Of course this does about the same thing as listed below, but "tail
> -f" is a little easier to remember.
>
> On 9/13/06, Tomi NA <[EMAIL PROTECTED]> wrote:
>> On 9/13/06, wmelo <[EMAIL PROTECTED]> wrote:
>>> I have the same original doubt.  I know that the log shows information,
>>> but how can I see things happening in real time, like in Nutch 0.7.2 when
>>> you use the crawl command in the terminal?
>> try something like this (assuming you know what's good for you so you
>> use a *n*x):
>> watch -n 1 "tail -n 20 /home/wmelo/nutch-0.8/logs/hadoop.log"
>>
>> Please replace the path to your "logs" directory to match your
>> environment and report back if there's a problem.
>> Hope it helps.
>>
>> t.n.a.

-- 
Renaud Richardet
COO America
Wyona    -   Open Source Content Management   -   Apache Lenya
office +1 857 776-3195                  mobile +1 617 230 9112
renaud.richardet <at> wyona.com           http://www.wyona.com

