[ 
https://issues.apache.org/jira/browse/NUTCH-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris A. Mattmann updated NUTCH-618:
------------------------------------

    Attachment: NUTCH-618.Mattmann.patch.060108.txt

Hey Guys:

Okey doke: here's a candidate patch. Could someone who already has an 
environment set up in which these errors were showing up please try this 
patch and see if it makes them go away? I suspect the root of the issue is 
not so much that the MimeTypes object was being reinstantiated many times as 
that it wasn't being cached in the ObjectCache. We'll see.
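
To make the caching idea concrete, here is a minimal, self-contained sketch. 
It is not the code in the patch: Registry and SimpleObjectCache are 
hypothetical stand-ins for the MimeTypes object and for Nutch's ObjectCache, 
and the key name is made up for illustration. It only shows the pattern of 
building an expensive object once and handing back the cached instance on 
later lookups.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CachedRegistryDemo {

    // Hypothetical stand-in for an expensive-to-build object
    // (for example, a MIME type registry parsed from an XML file).
    static final class Registry {
        static final AtomicInteger BUILDS = new AtomicInteger();

        Registry() {
            // Count constructions so repeated builds are visible.
            BUILDS.incrementAndGet();
        }
    }

    // Hypothetical cache; an assumption for illustration,
    // not the actual ObjectCache API in Nutch.
    static final class SimpleObjectCache {
        private final Map<String, Object> objects = new ConcurrentHashMap<>();

        @SuppressWarnings("unchecked")
        <T> T getOrCreate(String key, Supplier<T> factory) {
            // computeIfAbsent runs the factory only when the key is missing.
            return (T) objects.computeIfAbsent(key, k -> factory.get());
        }
    }

    // Made-up cache key, used only in this sketch.
    static final String KEY = "demo.registry";

    static Registry lookup(SimpleObjectCache cache) {
        return cache.getOrCreate(KEY, Registry::new);
    }

    public static void main(String[] args) {
        SimpleObjectCache cache = new SimpleObjectCache();
        Registry first = lookup(cache);
        Registry second = lookup(cache);
        System.out.println(first == second);       // prints "true"
        System.out.println(Registry.BUILDS.get()); // prints "1"
    }
}
```

Without the cache, every caller would run the constructor again, which in 
our case would mean re-reading the mime types file each time.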

The attached patch passes all unit tests, so please let me know what you 
think.

Thanks!

Cheers,
 Chris


> Tika error "Media type alias already exists"
> --------------------------------------------
>
>                 Key: NUTCH-618
>                 URL: https://issues.apache.org/jira/browse/NUTCH-618
>             Project: Nutch
>          Issue Type: Bug
>          Components: mime_type_detector
>    Affects Versions: 1.0.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Chris A. Mattmann
>         Attachments: NUTCH-618.Mattmann.patch.060108.txt
>
>
> After the upgrade to the latest Tika jar we see a lot of errors like this:
> 2008-03-06 08:07:20,659 WARN org.apache.tika.mime.MimeTypesReader: Invalid media type alias: text/xml
> org.apache.tika.mime.MimeTypeException: Media type alias already exists: text/xml
>       at org.apache.tika.mime.MimeTypes.addAlias(MimeTypes.java:312)
>       at org.apache.tika.mime.MimeType.addAlias(MimeType.java:238)
>       at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:168)
>       at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
>       at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
>       at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
>       at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:58)
>       at org.apache.nutch.protocol.Content.<init>(Content.java:85)
>       at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:226)
>       at org.apache.nutch.fetcher.Fetcher2$FetcherThread.run(Fetcher2.java:523)
> This is caused most likely by the duplicate tika-mimetypes.xml file - one 
> copy is embedded inside the Tika jar, the other is found in Nutch conf/ 
> directory. The one inside the jar seems to be more recent, so I propose to 
> simply remove the one we have in conf.
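
One way to check the duplicate-file theory is to ask the class loader for 
every copy of the resource it can see. The sketch below uses only the 
standard ClassLoader.getResources call; the class name, the helper names, 
and the two temporary directories (standing in for the Tika jar and the 
conf/ directory) are assumptions made for illustration, not part of Nutch or 
Tika.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public class ClasspathCopies {

    // Returns every location from which the loader can serve the named resource.
    static List<URL> findCopies(ClassLoader loader, String resourceName) {
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: two temporary directories each hold an empty file
    // with the same name, imitating one copy in a jar and one in conf/.
    // The temporary directories are left behind for the OS to clean up.
    static int demoCount() {
        try {
            Path jarLike = Files.createTempDirectory("jar-like");
            Path confLike = Files.createTempDirectory("conf-like");
            Files.createFile(jarLike.resolve("tika-mimetypes.xml"));
            Files.createFile(confLike.resolve("tika-mimetypes.xml"));
            URL[] roots = { jarLike.toUri().toURL(), confLike.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(roots, null)) {
                return findCopies(loader, "tika-mimetypes.xml").size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // On a real installation, list the copies visible to the application.
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        for (URL url : findCopies(loader, "tika-mimetypes.xml")) {
            System.out.println(url);
        }
        System.out.println(demoCount()); // prints "2"
    }
}
```

If more than one URL is printed on an affected installation, that would be 
consistent with the duplicate-file explanation above, though it does not by 
itself prove that both copies are being loaded.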

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
