Hi Arturo,

Some of you already know that I'm working on a new parser (https://issues.apache.org/jira/browse/TIKA-443). After spending all day setting up an Eclipse workspace, I implemented the typical "hello world" class as a Tika Parser. My problem now is how to configure Tika so that it calls my new parser when it finds a file with a specific extension (e.g. *.shp). I read something about a configuration file (tika-config.xml), but I couldn't find it in the source code.

You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.

For example, this is roughly what was added for mailbox files:

  <mime-type type="application/mbox">
    <sub-class-of type="text/plain"/>
    <glob pattern="*.mbox"/>
  </mime-type>

That maps the file suffix (through the glob pattern) to the mime-type.

Then, in your parser class, you define a static SUPPORTED_TYPES field that lists the mime-types the parser supports.

E.g. for MboxParser:

public class MboxParser implements Parser {

    // The mime-types this parser handles: application/mbox
    private static final Set<MediaType> SUPPORTED_TYPES =
        Collections.singleton(MediaType.application("mbox"));

    // ... rest of the parser omitted
}
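
To make the two steps concrete, here is a minimal sketch of the lookup chain (suffix -> mime-type -> parser) using only the JDK. It is not Tika's actual code: DispatchSketch, SketchParser and the two parser classes are hypothetical stand-ins, and "application/x-shapefile" is only an assumed type name for *.shp files.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the two-step lookup; not part of Tika.
public class DispatchSketch {

    // Stand-in for the one part of a parser that matters here.
    interface SketchParser {
        Set<String> supportedTypes();
    }

    static class MboxSketchParser implements SketchParser {
        private static final Set<String> SUPPORTED_TYPES =
            Collections.singleton("application/mbox");
        public Set<String> supportedTypes() { return SUPPORTED_TYPES; }
    }

    // The type name below is an assumption made for illustration.
    static class ShapefileSketchParser implements SketchParser {
        private static final Set<String> SUPPORTED_TYPES =
            Collections.singleton("application/x-shapefile");
        public Set<String> supportedTypes() { return SUPPORTED_TYPES; }
    }

    // Step 1: glob pattern -> mime-type (the role of the <glob> entries).
    private final Map<String, String> globs = new LinkedHashMap<>();
    // Step 2: mime-type -> parser (the role of SUPPORTED_TYPES).
    private final Map<String, SketchParser> parsers = new HashMap<>();

    void addGlob(String pattern, String type) {
        globs.put(pattern, type);
    }

    void addParser(SketchParser parser) {
        for (String type : parser.supportedTypes()) {
            parsers.put(type, parser);
        }
    }

    // Returns the type of the first matching glob, or a generic fallback.
    String detect(String fileName) {
        Path name = Paths.get(fileName).getFileName();
        for (Map.Entry<String, String> e : globs.entrySet()) {
            PathMatcher m =
                FileSystems.getDefault().getPathMatcher("glob:" + e.getKey());
            if (m.matches(name)) {
                return e.getValue();
            }
        }
        return "application/octet-stream";
    }

    // Returns null when no registered parser supports the detected type.
    SketchParser parserFor(String fileName) {
        return parsers.get(detect(fileName));
    }

    static DispatchSketch demo() {
        DispatchSketch d = new DispatchSketch();
        d.addGlob("*.mbox", "application/mbox");
        d.addGlob("*.shp", "application/x-shapefile");
        d.addParser(new MboxSketchParser());
        d.addParser(new ShapefileSketchParser());
        return d;
    }

    public static void main(String[] args) {
        DispatchSketch d = demo();
        System.out.println(d.detect("roads.shp"));
        System.out.println(d.parserFor("roads.shp").getClass().getSimpleName());
    }
}
```

The point of the sketch is that the parser never sees the file suffix. The glob only decides the mime-type, and the mime-type decides the parser, so a new *.shp parser needs both the mime-type entry and a matching SUPPORTED_TYPES value.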


-- Ken

--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g



