Hi Chris and all,

On 07/07/2010 16:04, Mattmann, Chris A (388J) wrote:
Hi Arturo,

How exactly are you calling your parser? Are you using the AutoDetectParser? If
so, can you put some print statements in the public void parse(...) method
of CompositeParser? Specifically, add a line right after:
I'm calling my parser through the bundled Tika-app, so I think I'm using AutoDetectParser.


Parser parser = getParser(metadata);
// print out the returned parser
System.out.println("Parser returned is: ["+parser.getClass().getName()+"]");

What does that return? Also, have you done the work to map your incoming 
document type in the tika-mimetypes.xml file?
Yes, sure.
That is, if you're using AutoDetectParser or anything that extends
CompositeParser, the MIME type of the incoming document is used to determine
which parser gets called. Is the MIME type being detected correctly? You can
check this by putting a println right before getParser in the parse(...) method:
Yes, it returns "application/shp"
// print the mime type
System.out.println("The MIME type is: [" +
    metadata.get(Metadata.CONTENT_TYPE) + "]");
Parser parser = getParser(metadata);

What does that print out?
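To make the two println checks concrete, here is a toy model of that lookup in plain Java. ToyCompositeParser and its methods are hypothetical, not Tika classes. The sketch assumes that a composite parser quietly falls back to a default parser when nothing is registered for the detected type, so a correct "application/shp" detection on its own would not guarantee that GeoParser gets called. The real CompositeParser may also consult supertypes of the detected type, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of how a composite parser picks a delegate.
// It is NOT Tika's CompositeParser; it only illustrates why the detected
// MIME type and the chosen parser are worth printing separately.
class ToyCompositeParser {
    // Maps a MIME type string (e.g. "application/shp") to a parser name.
    private final Map<String, String> parsers = new HashMap<>();
    // Returned when no registered parser supports the detected type.
    private final String fallback;

    ToyCompositeParser(String fallback) {
        this.fallback = fallback;
    }

    void register(String mimeType, String parserName) {
        parsers.put(mimeType, parserName);
    }

    // Mirrors the idea of getParser(metadata): look up the detected type,
    // otherwise fall back silently without any error.
    String getParser(String contentType) {
        return parsers.getOrDefault(contentType, fallback);
    }
}
```

With nothing registered, getParser("application/shp") returns the fallback name even though the type was detected; only after register("application/shp", "GeoParser") does it return "GeoParser".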

Finally if both of these printlns check out, you should check and make sure 
that your new parser is correctly mapped to the media type it supports, in 
other words what Ken said below. Does your parser declare that it supports your 
expected MIME type?
Yes, I declared this MIME type in my parser, but the getSupportedTypes(context) method is never called.

I uploaded a file with the Tika source code, including my modified tika-mimetypes.xml file and my new parser GeoParser.java. Perhaps one of you could try it and find out where I'm going wrong.
Here is the link: http://elcano.dlsi.uji.es/arturo/tika_geo.zip


Greetings and thanks in advance for your help,
     Arturo
Let me know and thanks!

Cheers,
Chris




On 7/7/10 4:25 AM, "Arturo Beltran"<arturo.belt...@uji.es>  wrote:

Hi,

I still have the same problem.
I think everything is fine: I run "mvn install" and my new class is
included in the generated JAR, but it is never called.
It should be very simple, and I feel a little silly, but I don't know
how to make Tika find my new parser.

Thanks in advance
       Arturo


On 21/06/2010 19:04, Ken Krugler wrote:
Are you sure your new parser is on the classpath?

E.g. put a break on getSupportedTypes() and make sure that's getting
called - if not, then the parser isn't being "found" by Tika.
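Ken's point can be illustrated with a small, self-contained Java model. All names here (ToyParser, ToyGeoParser, ToyRegistry) are hypothetical and not the Tika API. The registry builds its type-to-parser map by calling getSupportedTypes() only on parsers it already knows about, so a parser class that sits in the JAR but is never handed to the registry is never asked, and a breakpoint in its getSupportedTypes() is never reached. How a given Tika version assembles its list of known parsers is not modelled here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a parser that declares its supported MIME types.
interface ToyParser {
    Set<String> getSupportedTypes();
}

class ToyGeoParser implements ToyParser {
    // Counts calls so the effect of (non-)registration is observable.
    static int supportedTypesCalls = 0;

    public Set<String> getSupportedTypes() {
        supportedTypesCalls++;
        return Collections.singleton("application/shp");
    }
}

class ToyRegistry {
    // Builds the MIME type -> parser map from an explicit list of known parsers.
    // A parser missing from this list is never queried at all.
    static Map<String, ToyParser> build(List<ToyParser> knownParsers) {
        Map<String, ToyParser> byType = new HashMap<>();
        for (ToyParser parser : knownParsers) {
            for (String type : parser.getSupportedTypes()) {
                byType.put(type, parser);
            }
        }
        return byType;
    }
}
```

Building the registry from an empty list leaves "application/shp" unmapped and ToyGeoParser.supportedTypesCalls at 0; passing a ToyGeoParser instance maps the type and increments the counter once.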

-- Ken

On Jun 21, 2010, at 3:34am, Arturo Beltran wrote:

Hi Ken,

First of all, thanks for your quick response.
This is exactly what I'm doing, but even though Tika recognizes the
new MIME type, my new parser is not called.

I added to tika-mimetypes.xml:

<mime-type type="application/shp">
<!--sub-class-of type="application/octet-stream"/-->
<glob pattern="*.shp"/>
</mime-type>

I created a new class GeoParser:

public class GeoParser implements Parser {

    private static final Set<MediaType> SUPPORTED_TYPES =
        Collections.singleton(MediaType.application("shp"));
    public static final String SHP_MIME_TYPE = "application/shp";

    public Set<MediaType> getSupportedTypes(ParseContext context) {
        return SUPPORTED_TYPES;
    }

    public void parse(
            InputStream stream, ContentHandler handler,
            Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {

        metadata.set(Metadata.CONTENT_TYPE, SHP_MIME_TYPE);
        metadata.set("Hello", "World");

        System.out.println("HELLO WORLD");
        System.err.println("ERR Hello world");

        XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
        xhtml.startDocument();
        xhtml.endDocument();
    }
...
}

And that's the result:

Content-Length:  755072
Content-Type:  application/shp
resourceName:  comarques250.shp

I don't know what exactly is failing, but I can't make it work.

Greetings and thanks in advance for your help.
     Arturo


On 17/06/2010 18:25, Ken Krugler wrote:
Hi Arturo,

Some of you already know that I'm working on a new parser
(https://issues.apache.org/jira/browse/TIKA-443). After spending all day
trying to set up an Eclipse workspace, I implemented the typical
"hello world" class as a Tika Parser. My problem now is
how to configure Tika so that it calls my new parser when a file
with a specific extension (e.g. *.shp) is found. I read something
about a configuration file (tika-config.xml), but I couldn't find it
in the source code.
You first need to modify
tika-core/src/main/resources/tika-mimetypes.xml.

E.g. something like this was done for mailbox files.

<mime-type type="application/mbox">
<sub-class-of type="text/plain"/>
<glob pattern="*.mbox"/>
</mime-type>

That maps the suffix to the mime-type.
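As a rough illustration of what a glob entry such as <glob pattern="*.mbox"/> expresses, the sketch below maps file-name globs to MIME types using java.nio's PathMatcher. ToyGlobTable is hypothetical: Tika does its own glob matching internally, and returning application/octet-stream for unmatched names is an assumption of this sketch.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a glob -> MIME type table, in insertion order.
// Uses java.nio's glob syntax purely for illustration, not Tika's matcher.
class ToyGlobTable {
    private final Map<PathMatcher, String> globs = new LinkedHashMap<>();

    void addGlob(String pattern, String mimeType) {
        globs.put(FileSystems.getDefault().getPathMatcher("glob:" + pattern), mimeType);
    }

    // Returns the MIME type of the first matching glob, or a generic default.
    String typeForName(String fileName) {
        for (Map.Entry<PathMatcher, String> entry : globs.entrySet()) {
            if (entry.getKey().matches(Paths.get(fileName))) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }
}
```

After addGlob("*.shp", "application/shp"), a name like comarques250.shp resolves to application/shp, which is only the first half of the chain: a parser still has to declare that type for anything to be called.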

Then, in your parser class, you define the static SUPPORTED_TYPES
field that declares which MIME types it supports.

E.g. for MboxParser:

public class MboxParser implements Parser {

    private static final Set<MediaType>  SUPPORTED_TYPES =
        Collections.singleton(MediaType.application("mbox"));


-- Ken

--------------------------------------------
<http://ken-blog.krugler.org>
+1 530-265-2225






--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g






--
Arturo Beltran Fonollosa
Institute of New Imaging Technologies (INIT): http://www.init.uji.es
Geographic Information research group: http://www.geoinfo.uji.es
Universitat Jaume I, Avda. de Vicente Sos Baynat s/n
E-12071, Castellón, Spain
mailto: arturo.belt...@uji.es











++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.mattm...@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




