Thanks, that helped solve my problem too. Does this mean we need to update the config file in the trunk?
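For anyone who wants to script the workaround quoted below, here is a minimal sketch in Python. It assumes only that the Tika jar contains an entry whose file name is tika-mimetypes.xml. The function name and the example paths are hypothetical, and this is a stopgap, not a confirmed fix for the trunk.

```python
import shutil
import zipfile
from pathlib import Path


def extract_mimetypes(jar_path, conf_dir, name="tika-mimetypes.xml"):
    """Copy the jar's copy of `name` into conf_dir, backing up any existing file.

    Hypothetical helper: it searches the jar by file name instead of
    assuming a particular package path inside the archive.
    """
    target = Path(conf_dir) / name
    with zipfile.ZipFile(jar_path) as jar:
        # Match on the last path component so the package path does not matter.
        matches = [e for e in jar.namelist() if e.rsplit("/", 1)[-1] == name]
        if not matches:
            raise FileNotFoundError(f"{name} not found in {jar_path}")
        if target.exists():
            # Keep the old config next to the new one as <name>.bak.
            shutil.copy2(target, target.with_name(target.name + ".bak"))
        with jar.open(matches[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target


# Example call (both paths are placeholders for your own checkout):
# extract_mimetypes("path/to/tika.jar", "nutch-trunk/conf")
```

The previous file is kept as tika-mimetypes.xml.bak so the change can be reverted if the jar's copy turns out not to be the right one.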
> I get the same error (same setup as you, no changes to the Nutch defaults).
> I doubt it's the right way to do it, but I found just now that if I extract
> tika-mimetypes.xml from the jar file and copy it over the one in
> nutch-trunk/conf, at least I don't see those errors any more.
>
> Emmanuel wrote:
>> Hi Chris,
>>
>> FYI, I used the version provided by Nutch without changing it.
>>
>> Anyway, please find it attached.
>>
>> Thanks,
>> E
>>
>> > Hi Emmanuel,
>> >
>> > Could you please post your
>> > /data/sengine/search/conf/tika-mimetypes.xml file?
>> >
>> > Thanks,
>> > Chris
>> >
>> > On 2/14/08 6:07 AM, "Emmanuel" <[EMAIL PROTECTED]> wrote:
>> >
>> >> Hi Guys,
>> >>
>> >> I've updated my Nutch version to use the latest trunk with the new
>> >> TIKA jar.
>> >>
>> >> I ran a crawl and got a lot of errors like this:
>> >>
>> >> 2008-02-14 22:02:51,494 INFO conf.Configuration - found resource tika-mimetypes.xml at file:/data/sengine/search/conf/tika-mimetypes.xml
>> >> 2008-02-14 22:02:51,499 WARN mime.MimeTypesReader - Invalid media type alias: text/xml
>> >> org.apache.tika.mime.MimeTypeException: Media type alias already exists: text/xml
>> >>     at org.apache.tika.mime.MimeTypes.addAlias(MimeTypes.java:312)
>> >>     at org.apache.tika.mime.MimeType.addAlias(MimeType.java:238)
>> >>     at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:168)
>> >>     at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
>> >>     at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
>> >>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
>> >>     at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:58)
>> >>     at org.apache.nutch.protocol.Content.<init>(Content.java:85)
>> >>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
>> >>     at org.apache.nutch.fetcher.Fetcher2$FetcherThread.run(Fetcher2.java:523)
>> >> 2008-02-14 22:02:51,500 WARN mime.MimeTypesReader - Invalid media type alias: application/x-dosexec;exe
>> >> org.apache.tika.mime.MimeTypeException: Invalid media type alias: application/x-dosexec;exe
>> >>     at org.apache.tika.mime.MimeType.addAlias(MimeType.java:242)
>> >>     at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:168)
>> >>     at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
>> >>     at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
>> >>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
>> >>     at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:58)
>> >>     at org.apache.nutch.protocol.Content.<init>(Content.java:85)
>> >>     at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
>> >>     at org.apache.nutch.fetcher.Fetcher2$FetcherThread.run(Fetcher2.java:523)
>> >>
>> >> Is that normal?
>> >> Am I missing something?
>> >
>> > Chris Mattmann, Ph.D.
>> > [EMAIL PROTECTED]
>> > Cognizant Development Engineer
>> > Early Detection Research Network Project
>> > Jet Propulsion Laboratory, Pasadena, CA
>> > Office: 171-266B  Mailstop: 171-246
>> >
>> > Disclaimer: The opinions presented within are my own and do not reflect
>> > those of either NASA, JPL, or the California Institute of Technology.
