
On Fri, Sep 23, 2011 at 4:19 PM, Ken Krugler
<kkrugler_li...@transpac.com> wrote:
> From my fairly naive perspective, it seems like one of the challenges
> here is that Tika tries to normalize/simplify interacting with data. [...]
> Whereas RDF is more focused on precision, in being explicit about
> the relationships between data.

Yep, as you mention that's obviously an issue that needs work and
sometimes tricky tradeoffs.

That said, I'm pretty confident that there is no fundamental
disconnect between these two goals, and I think over time (years most
likely) we will be able to work out all the details. We're already
taking steps along that road, with our parsers exposing increasingly
detailed document structure and our metadata model already handling
things like dates in a more structured manner.

At least that seems to me like an obvious candidate for inclusion in a
future roadmap for post-1.0 Tika.

> It would be great to get patches from that Mythical Someone who knows RDF

Agreed. :-) As Antoni said, this is an area where we could and should
be able to do better. There are quite a few RDF experts already at and
around Apache, and it shouldn't be too hard to position Tika more
prominently on their radars. The Any23 proposal that Chris is
championing is one good opportunity for this.

Also, now that I work at Adobe, my XMP itch has been growing quite a
bit, so I wouldn't be surprised if I ended up working on better XMP
(and thus RDF) support soon after Tika 1.0 is out.


Jukka Zitting
