Hi Guys,
I've updated my Nutch version to use the latest trunk with the new Tika jar.
I ran a crawl and got a lot of errors like these:
2008-02-14 22:02:51,494 INFO conf.Configuration - found resource tika-mimetypes.xml at file:/data/sengine/search/conf/tika-mimetypes.xml
2008-02-14 22:02:51,499 WARN mime.MimeTypesReader - Invalid media type alias: text/xml
org.apache.tika.mime.MimeTypeException: Media type alias already exists: text/xml
    at org.apache.tika.mime.MimeTypes.addAlias(MimeTypes.java:312)
    at org.apache.tika.mime.MimeType.addAlias(MimeType.java:238)
    at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:168)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
    at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:58)
    at org.apache.nutch.protocol.Content.<init>(Content.java:85)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
    at org.apache.nutch.fetcher.Fetcher2$FetcherThread.run(Fetcher2.java:523)
2008-02-14 22:02:51,500 WARN mime.MimeTypesReader - Invalid media type alias: application/x-dosexec;exe
org.apache.tika.mime.MimeTypeException: Invalid media type alias: application/x-dosexec;exe
    at org.apache.tika.mime.MimeType.addAlias(MimeType.java:242)
    at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:168)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
    at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:58)
    at org.apache.nutch.protocol.Content.<init>(Content.java:85)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
    at org.apache.nutch.fetcher.Fetcher2$FetcherThread.run(Fetcher2.java:523)
Is that normal?
Am I missing something?