Re: [Nutch-cvs] svn commit: r397320 - /lucene/nutch/trunk/src/plugin/parse-oo/plugin.xml

2006-04-27 Jérôme Charron
parse-oo plugin manifest is valid with plugin.dtd Oops, I didn't catch that... Thanks! No problem Andrzej. It is just a cosmetic change, since the plugin.xml files are not validated at runtime (it is on my todo list), and the contentType and pathSuffix parameters are more or less deprecated. Jérôme
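Jérôme notes that plugin.xml manifests are not yet validated against plugin.dtd at runtime. The following is a minimal sketch of how such a check could be done with the standard JAXP API, assuming each manifest declares its DTD through a DOCTYPE. The class `ManifestValidator` and its `isValid` method are hypothetical names for illustration, not existing Nutch code, and the tiny inline DTD in `main` merely stands in for plugin.dtd.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks an XML manifest against the DTD named in its DOCTYPE.
public class ManifestValidator {

    /** Returns true if the document is well-formed and valid against its DTD. */
    public static boolean isValid(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are only reported by default; rethrow them so parse() fails.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Toy internal DTD standing in for plugin.dtd (an assumption for this example).
        String dtd = "<!DOCTYPE plugin [<!ELEMENT plugin EMPTY>"
                + "<!ATTLIST plugin id CDATA #REQUIRED>]>";
        System.out.println(isValid(dtd + "<plugin id=\"parse-oo\"/>"));
        System.out.println(isValid(dtd + "<plugin/>")); // missing required id attribute
    }
}
```

A custom `ErrorHandler` is needed because a validating parser only reports validity errors unless the handler turns them into exceptions.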

RE: exception

2006-04-27 anton
We updated Hadoop from the trunk branch. But now we get new errors: On the tasktracker side: skipped java.io.IOException: timed out waiting for response at org.apache.hadoop.ipc.Client.call(Client.java:305) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:149) at

Re: Content-Type inconsistency?

2006-04-27 Jérôme Charron
Are you mainly concerned with the charset in Content-Type? Not specifically. But while looking at these content-type inconsistencies, I noticed that there are some possible troubles with the charset in content-type. Currently, what happens when Content-Type exists in both the HTTP layer and in a META tag
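For the case Jérôme raises, where a charset appears both in the HTTP Content-Type header and in a META tag, HTML 4.01 gives the HTTP header's charset parameter priority over a META http-equiv declaration. The sketch below applies that precedence; `CharsetResolver` and its methods are hypothetical illustrations, not Nutch's actual parsing code, and upper-casing the result is just one way to normalize case-insensitive charset names.

```java
import java.util.Locale;

// Hypothetical sketch of charset precedence; not taken from Nutch.
public class CharsetResolver {

    /** Extracts the charset parameter from a Content-Type value, or null if absent. */
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = p.substring(eq + 1).trim();
                // Strip optional quotes, e.g. charset="utf-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                // Charset names are case-insensitive; normalize to upper case.
                return value.isEmpty() ? null : value.toUpperCase(Locale.ROOT);
            }
        }
        return null;
    }

    /** The HTTP header's charset wins; the META tag is only a fallback. */
    public static String resolve(String httpContentType, String metaContentType) {
        String fromHttp = charsetOf(httpContentType);
        return fromHttp != null ? fromHttp : charsetOf(metaContentType);
    }
}
```

For example, `resolve("text/html; charset=ISO-8859-1", "text/html; charset=utf-8")` keeps the header's ISO-8859-1, while a header of plain `text/html` falls back to the META tag's charset.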

Re: exception

2006-04-27 Doug Cutting
[EMAIL PROTECTED] wrote: We updated hadoop from trunk branch. But now we get new errors: Oops. Looks like I introduced a bug yesterday. Let me fix it... Sorry, Doug

TRUNK IllegalArgumentException: Argument is not an array (WAS: Re: exception)

2006-04-27 Michael Stack
I'm getting the same error as Anton below when trying to launch a new job with the latest from TRUNK. Logic in ObjectWriteable#readObject seems a little off. On the way in we test for a null instance. If null, we set it to NullWriteable. Next we test declaredClass to see if it's an array. We then try to do an
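The subject line's "Argument is not an array" is the kind of IllegalArgumentException that `java.lang.reflect.Array.getLength` throws when it is handed an object that is not an array. The sketch below is a hypothetical, simplified model of the ordering Michael describes (substitute a placeholder for null, then take the array branch based only on the declared class); it is not the actual Hadoop source, and `NullPlaceholder`, `buggyLength` and `fixedLength` are invented names. Returning -1 for "null or not an array" is also just a convention of this sketch.

```java
import java.lang.reflect.Array;

// Hypothetical model of the null/array ordering problem; not Hadoop's ObjectWritable.
public class NullThenArray {

    /** Stand-in for a "null" placeholder object. */
    static final class NullPlaceholder { }

    /** Buggy ordering: null is replaced by a placeholder, but the array branch still runs. */
    public static int buggyLength(Object instance, Class<?> declaredClass) {
        if (instance == null) {
            instance = new NullPlaceholder();
        }
        if (declaredClass.isArray()) {
            // Throws IllegalArgumentException: the placeholder is not an array.
            return Array.getLength(instance);
        }
        return -1;
    }

    /** Fixed ordering: the null case is handled first and never reaches the array branch. */
    public static int fixedLength(Object instance, Class<?> declaredClass) {
        if (instance == null) {
            return -1; // would be written as a null marker, no array handling
        }
        if (declaredClass.isArray()) {
            return Array.getLength(instance);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(fixedLength(new String[] {"a", "b"}, String[].class));
        try {
            buggyLength(null, String[].class);
        } catch (IllegalArgumentException e) {
            System.out.println("buggy ordering failed: " + e);
        }
    }
}
```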

Re: TRUNK IllegalArgumentException: Argument is not an array (WAS: Re: exception)

2006-04-27 Doug Cutting
I just fixed this. Sorry for the inconvenience! Doug Michael Stack wrote: I'm getting the same error as Anton below when trying to launch a new job with the latest from TRUNK. Logic in ObjectWriteable#readObject seems a little off. On the way in we test for a null instance. If null, we set it to

Re: Content-Type inconsistency?

2006-04-27 Jérôme Charron
I'm not sure if that is the right thing. If the site administrator did a poor job and a wrong media type is advertised, it's the site's problem and Nutch shouldn't be fixing it, in my opinion. Those sites would not work properly with browsers anyway, and Nutch doesn't need to work

Re: Content-Type inconsistency?

2006-04-27 Doug Cutting
Jérôme Charron wrote: Finally, it is good news that Nutch seems to be more intelligent at content-type guessing than Firefox or IE, no? I'm not so sure. When crawling Apache we had trouble with this feature. Some HTML files that had an XML header and the server identified as text/html
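Doug's example (HTML files that start with an XML header but are served as text/html) shows how content sniffing can override a correct server declaration. The sketch below illustrates one possible policy in the spirit of the thread: trust the server's Content-Type unless it is missing or the generic application/octet-stream, and only then fall back to sniffing. `TypeResolver`, its two-rule `sniff` method and the octet-stream exception are all hypothetical simplifications, not Nutch's MIME detection.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical policy sketch; not Nutch's MimeTypes code.
public class TypeResolver {

    /** Toy "magic" sniffer looking only at the first 256 bytes, for illustration. */
    public static String sniff(byte[] content) {
        String head = new String(content, 0, Math.min(content.length, 256),
                StandardCharsets.ISO_8859_1).trim().toLowerCase(Locale.ROOT);
        if (head.startsWith("<?xml")) {
            return "text/xml";
        }
        if (head.contains("<html")) {
            return "text/html";
        }
        return "application/octet-stream";
    }

    /** Prefers the server-declared type; sniffs only when it is missing or uninformative. */
    public static String resolve(String declared, byte[] content) {
        if (declared != null) {
            // Drop parameters such as "; charset=utf-8".
            String base = declared.split(";")[0].trim().toLowerCase(Locale.ROOT);
            if (!base.isEmpty() && !base.equals("application/octet-stream")) {
                return base;
            }
        }
        return sniff(content);
    }
}
```

With this ordering, an XHTML page served as `text/html` stays `text/html`, even though the sniffer alone would call it `text/xml`.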

[jira] Created: (NUTCH-256) Cannot open filename ....index.done.crc

2006-04-27 [EMAIL PROTECTED] (JIRA)
Cannot open filename index.done.crc
---
Key: NUTCH-256
URL: http://issues.apache.org/jira/browse/NUTCH-256
Project: Nutch
Type: Bug
Components: indexer
Versions: 0.8-dev
Reporter: [EMAIL PROTECTED]

[jira] Updated: (NUTCH-256) Cannot open filename ....index.done.crc

2006-04-27 [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-256?page=all ]
[EMAIL PROTECTED] updated NUTCH-256:
Attachment: index.done.crc.patch
Ensure creation of companion index.done .crc file
Cannot open filename index.done.crc
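The attached patch is described as ensuring that the companion .crc file for index.done is created, so that a later open does not fail looking for it. The sketch below shows the idea on a local filesystem, assuming Hadoop's convention of a hidden `.<name>.crc` file next to each data file. `DoneMarker`, `checksumFile` and `writeMarker` are hypothetical names. The checksum payload written here (a CRC32 value as decimal text) is purely illustrative; Hadoop's real .crc files use their own binary format.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical local-filesystem sketch; not Hadoop's checksum format or API.
public class DoneMarker {

    /** Companion checksum path: ".<name>.crc" in the same directory (assumed convention). */
    public static Path checksumFile(Path file) {
        return file.resolveSibling("." + file.getFileName() + ".crc");
    }

    /** Writes an empty done marker and always creates its companion checksum file too. */
    public static void writeMarker(Path marker) throws IOException {
        byte[] data = new byte[0];
        Files.write(marker, data);
        CRC32 crc = new CRC32();
        crc.update(data);
        // Illustrative payload only: the CRC32 of the marker's contents as decimal text.
        Files.write(checksumFile(marker),
                Long.toString(crc.getValue()).getBytes(StandardCharsets.US_ASCII));
    }
}
```

Writing both files in the same method means that any code path creating the marker also creates the file a later reader expects.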