Hi Arturo,
Some of you already know that I'm working on a new parser
(https://issues.apache.org/jira/browse/TIKA-443). After spending all day
setting up an Eclipse workspace, I implemented the typical "hello world"
class as a Tika parser. My problem now is how to configure Tika to call
my new parser when a file with a specific extension (e.g. *.shp) is
found. I read something about a configuration file (tika-config.xml),
but I couldn't find it in the source code.
You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.
For example, this entry was added for mailbox files:
<mime-type type="application/mbox">
  <sub-class-of type="text/plain"/>
  <glob pattern="*.mbox"/>
</mime-type>
That maps the suffix to the mime-type.
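To make the suffix-to-type step concrete, here is a small stand-alone model of what a glob entry does. It is only a sketch and not Tika's own code: the GlobTable class and its methods are made up for illustration, and the *.shp entry with the type name application/x-shapefile is an assumption. Only the java.nio glob matcher is a real API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the glob table described by tika-mimetypes.xml.
public class GlobTable {
    // Matchers are kept in insertion order; the first match wins.
    private final Map<PathMatcher, String> globs = new LinkedHashMap<>();

    // Register one glob pattern (e.g. "*.mbox") for a mime-type.
    public void add(String pattern, String mimeType) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
        globs.put(matcher, mimeType);
    }

    // Return the type of the first matching glob, or a generic fallback.
    public String typeFor(String fileName) {
        for (Map.Entry<PathMatcher, String> entry : globs.entrySet()) {
            if (entry.getKey().matches(Paths.get(fileName))) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        GlobTable table = new GlobTable();
        table.add("*.mbox", "application/mbox");
        table.add("*.shp", "application/x-shapefile"); // assumed type name
        System.out.println(table.typeFor("inbox.mbox"));
        System.out.println(table.typeFor("roads.shp"));
    }
}
```

In Tika itself this lookup is driven by the XML entries, so for a new format you would add a mime-type element with a matching glob rather than write code like this.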
Then define a static SUPPORTED_TYPES field in your parser class that
lists the mime-types it supports.
E.g. for MboxParser:
public class MboxParser implements Parser {
    private static final Set<MediaType> SUPPORTED_TYPES =
        Collections.singleton(MediaType.application("mbox"));
    // parse(...) and the rest of the class omitted
}
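The reason the supported-types field matters is that a parser is picked by mime-type, not by file name. The following is a simplified model of that dispatch and nothing more: SimpleParser, MboxLikeParser and DispatchSketch are hypothetical names, plain strings stand in for MediaType, and Tika's real classes are deliberately not used so the sketch runs on its own.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of type-based parser dispatch; NOT Tika's real classes.
public class DispatchSketch {
    // Hypothetical minimal contract: a parser advertises the types it handles.
    public interface SimpleParser {
        Set<String> supportedTypes();
        String parse(String fileName);
    }

    // Mirrors the SUPPORTED_TYPES pattern from the MboxParser excerpt.
    public static class MboxLikeParser implements SimpleParser {
        private static final Set<String> SUPPORTED_TYPES =
            Collections.singleton("application/mbox");

        public Set<String> supportedTypes() {
            return SUPPORTED_TYPES;
        }

        public String parse(String fileName) {
            return "mbox parser handled " + fileName;
        }
    }

    private final Map<String, SimpleParser> byType = new HashMap<>();

    // Index the parser under every type it declares.
    public void register(SimpleParser parser) {
        for (String type : parser.supportedTypes()) {
            byType.put(type, parser);
        }
    }

    // Look up the parser for a mime-type; null means no parser is registered.
    public SimpleParser parserFor(String mimeType) {
        return byType.get(mimeType);
    }

    public static void main(String[] args) {
        DispatchSketch dispatch = new DispatchSketch();
        dispatch.register(new MboxLikeParser());
        SimpleParser parser = dispatch.parserFor("application/mbox");
        System.out.println(parser.parse("inbox.mbox"));
    }
}
```

A shapefile parser would follow the same shape: declare the type you added to tika-mimetypes.xml in its supported types, and it becomes the candidate for files detected as that type.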
-- Ken
--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225
--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c w e b m i n i n g