Ok, that clears some things up. So is there a good class to extend, like JenaReader? Or should I start from scratch and implement RDFReader?
I think most mainstream Linked Data publishing methods should be supported, at least these:
http://linkeddatabook.com/editions/1.0/#htoc65

Maybe the implementation could be broken into several levels that extend each other:
a) content negotiation only
b) heuristics (like using file extension) not involving content-sniffing
c) GRDDL
d) HTML-sniffing to find <link>s etc

Martynas

On Fri, Jan 27, 2012 at 8:17 PM, Andy Seaborne <[email protected]> wrote:
> On 27/01/12 13:44, Martynas Jusevicius wrote:
>>
>> Hey list,
>>
>> I am looking for an implementation doing what looks like a simple task
>> (but probably isn't): given a URI, try to extract RDF Model from it in
>> all possible ways.
>> It should use content negotiation: ask for RDF/XML as first priority,
>> Turtle/N-Triples as the second, and try GRDDL on HTML as the last
>> option.
>>
>> I can see Jena's RDFReader, JenaReader, and GRDDLReader that all seem
>> to do a part of what is needed, but I wonder if there already is some
>> code that combines it all?
>>
>> Martynas
>> http://graphity.org
>
>
> Ah. This is something that's been talked about several times and I went as
> far as looking for old notes on this for a JIRA moderately recently.
>
> What we need (IMO) is a single reader that opens streams then decides which
> parser to dispatch to.
>
> FileManager+typed streams.
>
> Add a locator to the filemanager to do conneg.
> Streams are typed by any MIME info
>
> then the decision on MIME type to believe is based on
> 1/ MIME type
> 2/ file extension
> 3/ user hint
>
> probably in the order 3-1-2. Except for text/plain when 2 overrides 1 or we
> route it to Turtle regardless.
>
> Given that, look in a registry and call the real parser.
>
> I'm not completely sure it will work for RDFa and GRDDL - maybe if the
> system is told to read one of those, the dispatching reader believes that
> over any conneg and just does it.
>
> What I think we should avoid unless really, really necessary is sniffing the
> content.
>
> org.openjena.riot.web.HttpOp for some code that does HTTP GETs and
> dispatches to a handler. I don't think this is the way to go; it's not nice
> to pick the results out of the operation.
>
> org.openjena.riot.WebContent has lots of constants.
>
> Andy
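
To make the 3-1-2 ordering from Andy's message concrete, here is a minimal sketch in plain Java. It is not Jena code: the class SyntaxChooser, the method choose, the two small lookup tables and the syntax labels ("RDF/XML", "TURTLE", "N-TRIPLES") are all made up for illustration. It also assumes one particular reading of the text/plain exception: the file extension overrides the MIME type, and if the extension is unknown the stream is routed to Turtle.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not part of Jena): decides which syntax to believe for a typed stream. */
final class SyntaxChooser {
    // Illustrative registries; a real one would be larger and pluggable.
    private static final Map<String, String> BY_MIME = Map.of(
            "application/rdf+xml", "RDF/XML",
            "text/turtle", "TURTLE",
            "application/n-triples", "N-TRIPLES");

    private static final Map<String, String> BY_EXT = Map.of(
            "rdf", "RDF/XML",
            "ttl", "TURTLE",
            "nt", "N-TRIPLES");

    /** Returns a syntax label, or null if nothing matched. */
    static String choose(String userHint, String mimeType, String uri) {
        // 3/ a user hint is believed over everything else
        if (userHint != null && !userHint.isEmpty()) {
            return userHint;
        }
        String byExt = BY_EXT.get(extension(uri));
        String mime = normalise(mimeType);
        // text/plain exception (assumed reading): extension overrides, else Turtle
        if ("text/plain".equals(mime)) {
            return byExt != null ? byExt : "TURTLE";
        }
        // 1/ MIME type reported for the stream
        String byMime = mime == null ? null : BY_MIME.get(mime);
        if (byMime != null) {
            return byMime;
        }
        // 2/ file extension as the last resort (may be null)
        return byExt;
    }

    // Drops parameters such as "; charset=utf-8" and lower-cases the type.
    private static String normalise(String mimeType) {
        if (mimeType == null) {
            return null;
        }
        return mimeType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }

    // Extension of the last path segment, ignoring query and fragment; "" if none.
    private static String extension(String uri) {
        if (uri == null) {
            return "";
        }
        String path = uri.split("[?#]", 2)[0];
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
    }
}
```

With this sketch, SyntaxChooser.choose(null, "text/plain", "http://example.org/a.nt") picks the extension over the MIME type, while a null result (for example for text/html with no known extension) is where a GRDDL or RDFa step would have to be requested explicitly rather than guessed by sniffing.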
