Hi, I'm still stuck on the same problem. I think everything is set up correctly: I run "mvn install" and my new class is included in the generated JAR, but it is never called. It should be very simple, and I feel a little silly. I don't know how to make Tika find my new parser.
Thanks in advance
Arturo
On 21/06/2010 19:04, Ken Krugler wrote:
Are you sure your new parser is on the classpath?

E.g. put a breakpoint on getSupportedTypes() and make sure that's getting called - if not, then the parser isn't being "found" by Tika.

-- Ken

On Jun 21, 2010, at 3:34am, Arturo Beltran wrote:

Hi Ken,

First of all, thanks for your quick response.

This is exactly what I'm doing, but although Tika recognizes the new MIME type, my new parser is not called.

I added to tika-mimetypes.xml:

    <mime-type type="application/shp">
      <!--sub-class-of type="application/octet-stream"/-->
      <glob pattern="*.shp"/>
    </mime-type>

I created a new class GeoParser:

    public class GeoParser implements Parser {

        private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("shp"));

        public static final String SHP_MIME_TYPE = "application/shp";

        public Set<MediaType> getSupportedTypes(ParseContext context) {
            return SUPPORTED_TYPES;
        }

        public void parse(
                InputStream stream, ContentHandler handler,
                Metadata metadata, ParseContext context)
                throws IOException, SAXException, TikaException {
            metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
            metadata.set("Hello", "World");
            System.out.println("HELLO WORLD");
            System.err.println("ERR Hello world");

            XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
            xhtml.startDocument();
            xhtml.endDocument();
        }
        ...
    }

And that's the result:

    Content-Length: 755072
    Content-Type: application/shp
    resourceName: comarques250.shp

I don't know what exactly is failing, but I can't make it work.

Greetings and thanks in advance for your help.

Arturo

On 17/06/2010 18:25, Ken Krugler wrote:

Hi Arturo,

> Some of you already know that I'm working on a new parser
> (https://issues.apache.org/jira/browse/TIKA-443). After spending all day
> trying to set up a workspace for Eclipse, I implemented the typical
> "hello world" class, in the Tika Parser version. My problem now is how to
> configure Tika to call my new parser when a file with a specific
> extension (e.g. *.shp) is found.
>
> I read something about a configuration file (tika-config.xml) but I
> couldn't find it in the source code.

You first need to modify tika-core/src/main/resources/tika-mimetypes.xml.

E.g. something like this was done for mailbox files.

    <mime-type type="application/mbox">
      <sub-class-of type="text/plain"/>
      <glob pattern="*.mbox"/>
    </mime-type>

That maps the suffix to the mime-type.

Then you define the SUPPORTED_TYPES static class field in your parser class that defines what mime-types it supports.

E.g. for MboxParser:

    public class MboxParser implements Parser {

        private static final Set<MediaType> SUPPORTED_TYPES =
            Collections.singleton(MediaType.application("mbox"));

-- Ken

--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
--
Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: [email protected]
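
The symptom in this thread (Content-Type comes back as application/shp, yet GeoParser never runs) is what one would expect if detection and parser lookup are two separate steps. The following is a minimal sketch of that idea using only the Java standard library. It is a toy model, not Tika code: ParserDispatchSketch, ToyParser, addGlob, register, detect and parserFor are all hypothetical names invented for illustration, and the assumption that lookup happens by exact MIME type string is a simplification.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Toy model of two-stage dispatch: file name -> MIME type -> parser. Not Tika code. */
final class ParserDispatchSketch {

    /** Hypothetical stand-in for a parser; only what the lookup needs. */
    interface ToyParser {
        Set<String> supportedTypes();
        String name();
    }

    // Stage 1: suffix-to-type table, playing the role of the <glob> entries.
    private final Map<String, String> typeBySuffix = new LinkedHashMap<>();

    // Stage 2: type-to-parser table, filled only by explicit registration.
    private final Map<String, ToyParser> parserByType = new LinkedHashMap<>();

    public void addGlob(String suffix, String mimeType) {
        typeBySuffix.put(suffix, mimeType);
    }

    public void register(ToyParser parser) {
        for (String type : parser.supportedTypes()) {
            parserByType.put(type, parser);
        }
    }

    /** Returns the mapped type for the first matching suffix, else a generic fallback. */
    public String detect(String fileName) {
        for (Map.Entry<String, String> entry : typeBySuffix.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    /** Detection can succeed while this lookup still comes back empty. */
    public Optional<ToyParser> parserFor(String fileName) {
        return Optional.ofNullable(parserByType.get(detect(fileName)));
    }

    /** Hypothetical parser that claims application/shp. */
    public static ToyParser geoParser() {
        return new ToyParser() {
            public Set<String> supportedTypes() {
                return Collections.singleton("application/shp");
            }
            public String name() {
                return "GeoParser";
            }
        };
    }

    public static void main(String[] args) {
        ParserDispatchSketch dispatch = new ParserDispatchSketch();
        dispatch.addGlob(".shp", "application/shp");

        // The type is detected, but no parser has been registered for it yet.
        System.out.println(dispatch.detect("comarques250.shp"));
        System.out.println(dispatch.parserFor("comarques250.shp").isPresent());

        // Only after registration does the lookup reach the parser.
        dispatch.register(geoParser());
        System.out.println(
            dispatch.parserFor("comarques250.shp").map(ToyParser::name).orElse("none"));
    }
}
```

In this model, adding the glob is enough to get the right content type reported, but the parser stays unreachable until something registers it. That matches the suggestion above to check whether getSupportedTypes() is ever called: if it is not, the registration step is the likely place to look.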
