Hi,

On Fri, Sep 23, 2011 at 4:19 PM, Ken Krugler <[email protected]> wrote:
> From my fairly naive perspective, it seems like one of the challenges
> here is that Tika tries to normalize/simplify interacting with data.
[...]
> Whereas RDF is more focused on precision, in being explicit about
> the relationships between data.

Yep, as you mention that's obviously an issue that needs work and
sometimes tricky tradeoffs. That said, I'm pretty confident that there
is no fundamental disconnect between these two goals, and I think over
time (years most likely) we will be able to work out all the details.
We're already taking steps along that road with our parsers exposing
increasingly more detailed document structure and our metadata model
already handling things like dates in a more structured manner. At
least that seems to me like an obvious candidate for inclusion in a
future roadmap for post-1.0 Tika.

> It would be great to get patches from that Mythical Someone who knows RDF

Agreed. :-) As Antoni said, this is an area where we could and should
be able to do better. There are quite a few RDF experts already at and
around Apache, and it shouldn't be too hard to position Tika more
prominently on their radars. The Any23 proposal that Chris is
championing is one good chance for this.

Also, now that I work at Adobe, my XMP itch has been growing quite a
bit, so I wouldn't be surprised if I ended up working on better XMP
(and thus RDF) support soon after Tika 1.0 is out.

BR,

Jukka Zitting
