Are you sure your new parser is on the classpath?

E.g. put a breakpoint on getSupportedTypes() and check that it's getting called. If not, then the parser isn't being "found" by Tika.
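Before reaching for the debugger, a quick way to rule out a plain classpath problem is to ask the JVM to load the class by name. This is only a minimal sketch; the class name `com.example.GeoParser` is a placeholder for wherever the parser actually lives, and `ClasspathCheck` is a made-up helper, not part of Tika:

```java
// Minimal classpath check. "com.example.GeoParser" is a placeholder name,
// and ClasspathCheck is a hypothetical helper, not a Tika class.
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "com.example.GeoParser";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Even if this reports true, Tika still has to pick the parser up, so the breakpoint on getSupportedTypes() remains the decisive test.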

Ken

On Jun 21, 2010, at 3:34am, Arturo Beltran wrote:

Hi Ken,

Thanks for your quick response.
This is exactly what I'm doing, but even though Tika recognizes the new MIME type, my new parser is not called.

I added to tika-mimetypes.xml:

<mime-type type="application/shp">
<!--sub-class-of type="application/octet-stream"/-->
<glob pattern="*.shp"/>
</mime-type>

I created a new class GeoParser:

public class GeoParser implements Parser {

   private static final Set<MediaType> SUPPORTED_TYPES =
           Collections.singleton(MediaType.application("shp"));
   public static final String SHP_MIME_TYPE = "application/shp";

   public Set<MediaType> getSupportedTypes(ParseContext context) {
       return SUPPORTED_TYPES;
   }

   public void parse(
           InputStream stream, ContentHandler handler,
           Metadata metadata, ParseContext context)
           throws IOException, SAXException, TikaException {

       metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
       metadata.set("Hello", "World");

       System.out.println("HELLO WORLD");
       System.err.println("ERR Hello world");

       XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
   }
}

And that's the result:

Content-Length:  755072
Content-Type:  application/shp
resourceName:  comarques250.shp

I don't know what exactly is failing, but I can't make it work.

Thanks in advance for your help.

On 17/06/2010 18:25, Ken Krugler wrote:
Hi Arturo,

Some of you already know that I'm working on a new parser ( ). After spending all day trying to set up a workspace in Eclipse, I implemented the typical "hello world" class as a Tika parser. My problem now is how to configure Tika to call my new parser when a file with a specific extension (e.g. *.shp) is found. I read something about a configuration file (tika-config.xml), but I couldn't find it in the source code.

You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.

E.g. something like this was done for mailbox files.

<mime-type type="application/mbox">
<sub-class-of type="text/plain"/>
<glob pattern="*.mbox"/>
</mime-type>

That maps the suffix to the mime-type.
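To make the suffix mapping concrete, here is a toy lookup table in plain Java. It is an illustration only and not how Tika implements glob matching; `GlobLookup`, the case-insensitive comparison, and the octet-stream fallback are all assumptions made for the sketch:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustration only: a toy suffix-to-type table, not Tika's own code.
class GlobLookup {
    // One entry per <glob pattern="*.ext"/> element in the XML.
    private final Map<String, String> suffixToType = new LinkedHashMap<>();

    void addGlob(String pattern, String mimeType) {
        // Only the simple "*.ext" form is handled in this sketch.
        if (!pattern.startsWith("*.")) {
            throw new IllegalArgumentException("unsupported pattern: " + pattern);
        }
        // Store ".ext" in lower case (assumption: matching ignores case).
        suffixToType.put(pattern.substring(1).toLowerCase(Locale.ROOT), mimeType);
    }

    String typeFor(String resourceName) {
        String lower = resourceName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> e : suffixToType.entrySet()) {
            if (lower.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return "application/octet-stream"; // assumed fallback for unknown names
    }
}
```

With `addGlob("*.shp", "application/shp")` registered, `typeFor("comarques250.shp")` returns `application/shp`. Note that this step only assigns a type to the file; it says nothing yet about which parser handles that type.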

Then you define a SUPPORTED_TYPES static field in your parser class that lists the mime-types it supports.
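The idea behind declaring supported types can be sketched with a toy registry: a parser is only reachable for the types it declares, and only once it has been registered. `ToyParser`, `ToyRegistry` and `ToyShpParser` are hypothetical names used for illustration; this is not Tika's dispatch code:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustration only: hypothetical types showing dispatch by declared mime-type.
interface ToyParser {
    Set<String> getSupportedTypes();
    String parse(String input);
}

class ToyRegistry {
    private final Map<String, ToyParser> byType = new HashMap<>();

    // Index the parser under every type it declares.
    void register(ToyParser parser) {
        for (String type : parser.getSupportedTypes()) {
            byType.put(type, parser);
        }
    }

    // Returns null when no registered parser claims the type.
    ToyParser parserFor(String mimeType) {
        return byType.get(mimeType);
    }
}

class ToyShpParser implements ToyParser {
    private static final Set<String> SUPPORTED_TYPES =
            Collections.singleton("application/shp");

    public Set<String> getSupportedTypes() { return SUPPORTED_TYPES; }
    public String parse(String input) { return "parsed shp: " + input; }
}
```

In this sketch a type can be detected correctly and still have no parser behind it if `register` was never called for that parser.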

E.g. for MboxParser:

public class MboxParser implements Parser {

   private static final Set<MediaType> SUPPORTED_TYPES =

