Re: java.net.MalformedURLException: no protocol for parse-plugins.xml

2005-10-03 Thread Jérôme Charron
Likely missing file:/. If I get rid of lines 617-622 of conf/nutch-default.xml Oops, sorry. I made this last change just after testing the whole patch, and I didn't test it again since I was sure it was a minor change. I'll correct this right now. Sorry. Regards, Jérôme

umbilical.done is called two times

2005-10-03 Thread Stefan Groschupf
Hi, umbilical.done is called twice when a task finishes. The map and reduce task implementations call done in the last line of their run methods (MapTask: 132, ReduceTask: 273). But the tasktracker calls umbilical.done a second time in line 585. Is this a bug?

Re: what contibute to fetch slowing down

2005-10-03 Thread Doug Cutting
Fuad Efendi wrote: I found this in J2SE API for setReuseAddress (default: false): When a TCP connection is closed the connection may remain in a timeout state for a period of time after the connection is closed (typically known as the TIME_WAIT state or 2MSL wait state). For applications

Re: tasks is not killed

2005-10-03 Thread Doug Cutting
Stefan Groschupf wrote: I noticed that it can happen that a task is still running when the job has already been killed. The web GUI says there is no running job, and the processes keep the nodes busy. I haven't found the source of the problem yet. I have seen this too. I think the solution is that, when

IlTrovatore check: e' SPAM? Re: [Fwd: Fetch list priority]

2005-10-03 Thread massimo miccoli
+1 I have read the paper about OPIC and it seems very good. I think it is a must for Nutch to have a good (and fast) webgraph-based ranking algorithm. I have fetched about 250 million pages, and what I see is that the inlink count alone is not good enough for big crawls and quality results. Thanks, Massimo

Re: Nutch 0.7.1 and Nutch web site

2005-10-03 Thread Doug Cutting
Piotr Kosiorowski wrote: Should we have version independent site - always modified in trunk? Or should we think about having a site (eg. JavaDocs, tutorial etc) versioned and available for all versions at the same time? The practice I've followed is to have the website reflect the latest

[jira] Commented: (NUTCH-99) ports are hardcoded or random

2005-10-03 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12331224 ] Stefan Groschupf commented on NUTCH-99: --- OK, makes sense. Do you prefer command line args for the ports for this 'let's search for a port' code? I personally would prefer

[jira] Commented: (NUTCH-99) ports are hardcoded or random

2005-10-03 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12331225 ] Doug Cutting commented on NUTCH-99: --- What command line would you add this to? I think this should simply start at the default port (e.g., 7030) and loop trying port+1 until
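The "start at the default port and try port+1" idea from this comment could look roughly like the sketch below. This is only an illustration, not code from Nutch: the class name `PortProbe`, the method `bindFirstFree`, and the `maxAttempts` limit are all hypothetical, and the snippet above is cut off before it says when the loop should stop.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of Nutch: probes upward from a start port.
public class PortProbe {

    // Tries startPort, startPort + 1, ... and returns the first socket
    // that binds. maxAttempts is an assumed stop condition.
    public static ServerSocket bindFirstFree(int startPort, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            int port = startPort + i;
            if (port > 65535) {
                break; // ran past the valid TCP port range
            }
            try {
                return new ServerSocket(port); // success: caller owns the socket
            } catch (IOException e) {
                last = e; // port busy or not permitted; try the next one
            }
        }
        throw new IOException("no free port found starting at " + startPort, last);
    }

    public static void main(String[] args) throws IOException {
        // 7030 is the example default port mentioned in the comment above.
        ServerSocket socket = bindFirstFree(7030, 100);
        System.out.println("bound to port " + socket.getLocalPort());
        socket.close();
    }
}
```

Returning the bound socket, rather than just a port number, avoids a race where another process takes the port between the probe and the actual bind.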

RE: what contibute to fetch slowing down

2005-10-03 Thread Fuad Efendi
Doug, thanks for the reply. I'll try to perform specific tests against an in-house Apache during this week(end) (limited in time slightly... Sorry!). Everything is possible; usually Apache httpd has a timeout setting for keep-alive, and the default setting is (I don't remember) probably 600 seconds. I performed

DNS

2005-10-03 Thread Fuad Efendi
Another cause of another problem: By default, Java 1.4 caches DNS-to-IP mappings forever... java.security.Security.setProperty("networkaddress.cache.ttl", "1");
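A minimal sketch of the setting mentioned above, for anyone who wants to try it. The class and method names (`DnsCacheTtl`, `setDnsCacheTtl`) are made up for illustration; only `Security.setProperty` and the `networkaddress.cache.ttl` property come from the message. As far as I know the property should be set early, before the JVM does its first name lookup, since the cache policy may be read only once.

```java
import java.security.Security;

// Hypothetical wrapper, not part of Nutch.
public class DnsCacheTtl {

    // Sets the JVM-wide TTL, in seconds, for successful DNS lookups.
    // Both arguments to Security.setProperty must be strings.
    public static void setDnsCacheTtl(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // Call this before any InetAddress lookup is made.
        setDnsCacheTtl(1);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

A TTL of 1 second, as in the message, effectively disables caching; a crawler doing many lookups per host would probably want something larger, but that is a tuning choice.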