[jira] Created: (NUTCH-309) Uses commons logging Code Guards

2006-06-22 Thread Jerome Charron (JIRA)
Uses commons logging Code Guards Key: NUTCH-309 URL: http://issues.apache.org/jira/browse/NUTCH-309 Project: Nutch Type: Improvement Versions: 0.8-dev Reporter: Jerome Charron Assigned to: Jerome Charron Priority:

[jira] Created: (NUTCH-310) Review Log Levels

2006-06-22 Thread Jerome Charron (JIRA)
Review Log Levels - Key: NUTCH-310 URL: http://issues.apache.org/jira/browse/NUTCH-310 Project: Nutch Type: Improvement Versions: 0.8-dev Reporter: Jerome Charron Assigned to: Jerome Charron Priority: Minor Fix For: 0.8-dev

Problem opening checksum file

2006-06-22 Thread anton
I create a file on DFS (for example, a file named done). Then I try to copy this file from DFS to the local filesystem. As a result, I get the file in the local filesystem, along with this error: Problem opening checksum file: /user/root/crawl/done. Ignoring with exception org.apache.hadoop.ipc.RemoteException: jav
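The report does not say why the checksum file could not be opened. As background only, the Java sketch below shows the general idea behind a separate checksum file: one CRC32 value is kept per fixed-size chunk of the data, and a reader that cannot open those values can still return the data but cannot verify it. This loosely mirrors the "Ignoring" behaviour in the error above. The chunk size, the `ChunkChecksums` class and its `Status` values are hypothetical; this is not Hadoop's actual checksum file format or API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ChunkChecksums {
    // Hypothetical chunk size, chosen only for illustration.
    static final int BYTES_PER_CHECKSUM = 512;

    enum Status { VERIFIED, CORRUPT, UNVERIFIED }

    // One CRC32 value per fixed-size chunk of the data; this list plays the
    // role of the separate checksum file.
    static List<Long> checksums(byte[] data) {
        List<Long> sums = new ArrayList<>();
        for (int off = 0; off < data.length; off += BYTES_PER_CHECKSUM) {
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums.add(crc.getValue());
        }
        return sums;
    }

    // A null sidecar models a checksum file that cannot be opened: the data
    // is still usable by the caller, but it is reported as unverified.
    static Status verify(byte[] data, List<Long> sidecar) {
        if (sidecar == null) {
            return Status.UNVERIFIED;
        }
        return checksums(data).equals(sidecar) ? Status.VERIFIED : Status.CORRUPT;
    }

    public static void main(String[] args) {
        byte[] data = "contents of done".getBytes(StandardCharsets.UTF_8);
        List<Long> sidecar = checksums(data);
        System.out.println(verify(data, sidecar)); // VERIFIED
        System.out.println(verify(data, null));    // UNVERIFIED
        data[0] ^= 1;                              // flip one bit
        System.out.println(verify(data, sidecar)); // CORRUPT
    }
}
```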

[jira] Resolved: (NUTCH-309) Uses commons logging Code Guards

2006-06-22 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-309?page=all ] Jerome Charron resolved NUTCH-309: Resolution: Fixed. Logging code guards added. http://svn.apache.org/viewvc?view=rev&revision=416346 Uses commons logging Code Guards

<noindex>do not index</noindex>

2006-06-22 Thread Stefan Groschupf
Hi, as far as I can see, Nutch's HTML parser only supports the robots meta tag noindex (<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">), but there is an unofficial HTML <noindex> tag. http://www.webmasterworld.com/forum10003/2703.htm Maybe this would be another way to make Nutch more polite. Also
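To make the two mechanisms concrete, here is a minimal Java sketch, assuming raw HTML strings and regular expressions: `pageIsNoIndex` checks the robots meta tag for a `noindex` directive, and `stripNoIndexRegions` drops everything between `<noindex>` and `</noindex>`. Both method names and the `NoIndexFilter` class are hypothetical; this is not Nutch's HTML parser API. A real parser would work on the parsed DOM, and this regex only handles `name` appearing before `content` and does not handle nested `<noindex>` blocks.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoIndexFilter {
    // Matches <meta name="robots" content="...">, quoted or unquoted,
    // case-insensitively; captures the content value.
    private static final Pattern ROBOTS_META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']?robots[\"']?\\s+content\\s*=\\s*[\"']?([^\"'>]*)",
        Pattern.CASE_INSENSITIVE);

    // Matches an unofficial <noindex>...</noindex> region, across lines, non-greedy.
    private static final Pattern NOINDEX_BLOCK = Pattern.compile(
        "<noindex>.*?</noindex>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // True if a robots meta tag asks for the whole page not to be indexed.
    static boolean pageIsNoIndex(String html) {
        Matcher m = ROBOTS_META.matcher(html);
        while (m.find()) {
            for (String directive : m.group(1).split(",")) {
                if (directive.trim().toLowerCase(Locale.ROOT).equals("noindex")) {
                    return true;
                }
            }
        }
        return false;
    }

    // Removes the text inside <noindex>...</noindex> before indexing.
    static String stripNoIndexRegions(String html) {
        return NOINDEX_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<html><head><meta name=\"ROBOTS\" content=\"NOINDEX,NOFOLLOW\"></head>"
            + "<body><p>Visible<noindex> hidden</noindex> text</p></body></html>";
        System.out.println(pageIsNoIndex(html)); // true
        System.out.println(stripNoIndexRegions(html));
    }
}
```

The meta tag decides whether the page is indexed at all, while the `<noindex>` handling only removes marked regions from an otherwise indexable page.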

Re: <noindex>do not index</noindex>

2006-06-22 Thread Jérôme Charron
as far as I can see, Nutch's HTML parser only supports the robots meta tag noindex (<meta name="ROBOTS" content="NOINDEX,NOFOLLOW">), but there is an unofficial HTML <noindex> tag. http://www.webmasterworld.com/forum10003/2703.htm Hello Stefan, Here is a previous discussion about this:

Re: svn commit: r416346 [1/3] - in /lucene/nutch/trunk/src: java/org/apache/nutch/analysis/ java/org/apache/nutch/clustering/ java/org/apache/nutch/crawl/ java/org/apache/nutch/fetcher/ java/org/apach

2006-06-22 Thread Doug Cutting
[EMAIL PROTECTED] wrote: NUTCH-309 : Added logging code guards [ ... ] + if (LOG.isWarnEnabled()) { + LOG.warn("Line does not contain a field name: " + line); + } [ ... ] -1 I don't think guards should be added everywhere. They make the code bigger and provide

RE: [jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread Teruhiko Kurosaka
Thank you for your reply, Sami. I do not intend to run Hadoop at all, so this hadoop-site.xml is empty. ... You should at least set values for 'mapred.system.dir' and 'mapred.local.dir' and point them to a dir that has enough space available (I think they default to under /tmp at least on
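Sami's suggestion amounts to a hadoop-site.xml along these lines. This is a minimal sketch: only the two property names come from the thread, and the /data/hadoop/... paths are hypothetical placeholders for a directory with enough free space.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Hypothetical paths: point both at a disk with enough free space,
       instead of the default location under /tmp. -->
  <property>
    <name>mapred.system.dir</name>
    <value>/data/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/data/hadoop/mapred/local</value>
  </property>
</configuration>
```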

Re: svn commit: r416346 [1/3] - in /lucene/nutch/trunk/src: java/org/apache/nutch/analysis/ java/org/apache/nutch/clustering/ java/org/apache/nutch/crawl/ java/org/apache/nutch/fetcher/ java/org/apach

2006-06-22 Thread Jérôme Charron
I don't think guards should be added everywhere. That's right, Doug. It was a rough first pass on logging. The next, finer-grained pass will be done with NUTCH-310. Rather, guards should only be added in performance-critical code, and then only for Debug-level output. Info and Warn levels are
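Doug's suggestion, which Jérôme agrees with (guards only in performance-critical code, and only for Debug-level output), can be sketched as follows. The sketch swaps in java.util.logging so that it runs without extra jars: `LOG.isLoggable(Level.FINE)` plays the role of commons-logging's `LOG.isDebugEnabled()`. The `GuardDemo` class, `expensiveDump`, and the `renderCount` counter are hypothetical, added only to show that a guarded message is never built when its level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardDemo {
    static final Logger LOG = Logger.getLogger(GuardDemo.class.getName());

    // Counts how often the (hypothetical) expensive debug message is built.
    static int renderCount = 0;

    static String expensiveDump(int[] state) {
        renderCount++;
        StringBuilder sb = new StringBuilder();
        for (int v : state) {
            sb.append(v).append(',');
        }
        return sb.toString();
    }

    // Performance-critical loop: debug-level output is guarded, so the
    // message string is only built when FINE is enabled on this logger.
    static void process(int[] state, int iterations) {
        for (int i = 0; i < iterations; i++) {
            state[i % state.length]++;
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("state after step " + i + ": " + expensiveDump(state));
            }
        }
    }

    // Rare warning: left unguarded, since WARNING is normally enabled
    // and the message is cheap to build.
    static void parseLine(String line) {
        if (!line.contains(":")) {
            LOG.warning("Line does not contain a field name: " + line);
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);          // FINE (debug) disabled
        process(new int[8], 100_000);
        System.out.println(renderCount);   // 0
        parseLine("no field here");        // logs one WARNING to stderr
    }
}
```

With the logger at INFO, the loop never calls `expensiveDump`; the unguarded warning only costs a string concatenation when a malformed line actually occurs.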

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417387 ] KuroSaka TeruHiko commented on NUTCH-266: Both Eugine's case and my case fail in the call chain that starts at line 101 of LocalJobRunner.java, which reads:

[jira] Commented: (NUTCH-266) hadoop bug when doing updatedb

2006-06-22 Thread KuroSaka TeruHiko (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-266?page=comments#action_12417391 ] KuroSaka TeruHiko commented on NUTCH-266: I'm sorry for adding so many comments. This will be the last one for today. As an experiment, I replaced hadoop-0.2-dev.jar that