On Sep 23, 2011, at 7:00am, Antoni Mylka wrote:

> On 2011-09-23 15:12, Jukka Zitting wrote:
>>> So I think I'll just patch my local copy to do the Q&D thing, and wait for
>>> someone with more XML/RDF-fu to deal with it properly.
>> 
>> Until Someone (TM, :-) does that, I'd be very happy to see the simple
>> property=xxx mapping you described added to HtmlParser.
> 
> There seems to be a long tradition at the ASF of appealing to Someone whenever 
> RDF comes up. Chris Mattmann wrote back in November 2007:
> 
> "... it's reasonable that someone may need to rewrite the ability to 
> represent metadata in RDF ..."
> 
> Whoever that Someone is - he has my support. ;-)
> 
> On a more serious note, though, in the four years since that metadata 
> discussion, three separate RDF-related projects have appeared in or around the 
> ASF: Clerezza, Jena and Any23. Two are already in incubation, and the third is 
> trying to get in. Jeremias Maerki noticed the lack of coordination in the 
> metadata field four years ago. It's not getting any better.

Agreed.

From my fairly naive perspective, it seems like one of the challenges here is 
that Tika tries to normalize/simplify interacting with data. E.g. I just want 
the text from any document I come across. That seems to be the primary use case.

Whereas RDF is more focused on precision, in being explicit about the 
relationships between data. So I would expect to see many interesting tradeoffs 
in figuring out how best to straddle both worlds. Heck, figuring out how best 
to map fairly simple document elements to XHTML 1.0 has proven challenging.

It would be great to get patches from that Mythical Someone who knows RDF - 
versus, say, me, where the end result is likely to be horribly wrong.

For better or worse, RDF has never been an itch that I've needed to scratch.

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr


