Ok, that clears some things up.

So is there a good class to extend, like JenaReader?
Or should I start from scratch and implement RDFReader?

I think most mainstream Linked Data publishing methods should be
supported, at least these:
http://linkeddatabook.com/editions/1.0/#htoc65

Maybe the implementation could be broken into several levels that
extend each other:
a) content negotiation only
b) heuristics (such as the file extension) that do not involve content sniffing
c) GRDDL
d) HTML sniffing to find <link> elements, etc.
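Very roughly, I imagine levels a) and b) layered like the untested sketch below. FetchedResource, Level and LayeredSyntaxDetector are hypothetical names of my own, not Jena API, and no HTTP is done here. Each level either names a syntax or passes to the next one. The ACCEPT string encodes the priorities from my first mail (RDF/XML first, Turtle/N-Triples second, HTML for GRDDL last) as q-values. The exact media types, especially application/n-triples, are my assumption. Levels c) and d) are left out.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch, not Jena API: detection levels tried in order.
public class LayeredSyntaxDetector {

    // Minimal stand-in for a fetched HTTP response (URI plus Content-Type header).
    public record FetchedResource(String uri, String contentType) {}

    // One level of detection; returns empty when it cannot decide.
    public interface Level {
        Optional<String> detect(FetchedResource r);
    }

    // Accept header for conneg: RDF/XML preferred, then Turtle/N-Triples, then HTML (GRDDL).
    public static final String ACCEPT =
        "application/rdf+xml, text/turtle;q=0.9, application/n-triples;q=0.9, text/html;q=0.5";

    // Level a): trust the Content-Type returned by content negotiation.
    public static final Level CONNEG = r -> {
        if (r.contentType() == null) return Optional.empty();
        String mime = r.contentType().split(";")[0].trim().toLowerCase(Locale.ROOT);
        switch (mime) {
            case "application/rdf+xml": return Optional.of("RDF/XML");
            case "text/turtle": return Optional.of("TURTLE");
            case "application/n-triples": return Optional.of("N-TRIPLES");
            default: return Optional.empty();
        }
    };

    // Level b): file-extension heuristics only, no content sniffing.
    public static final Level EXTENSION = r -> {
        String path = r.uri().toLowerCase(Locale.ROOT);
        if (path.endsWith(".rdf") || path.endsWith(".owl")) return Optional.of("RDF/XML");
        if (path.endsWith(".ttl")) return Optional.of("TURTLE");
        if (path.endsWith(".nt")) return Optional.of("N-TRIPLES");
        return Optional.empty();
    };

    private final List<Level> levels;

    public LayeredSyntaxDetector(List<Level> levels) {
        this.levels = List.copyOf(levels);
    }

    // Ask each level in turn; the first one that decides wins.
    public Optional<String> detect(FetchedResource r) {
        for (Level level : levels) {
            Optional<String> lang = level.detect(r);
            if (lang.isPresent()) return lang;
        }
        return Optional.empty();
    }
}
```

A caller would send ACCEPT with the GET and then build new LayeredSyntaxDetector(List.of(CONNEG, EXTENSION)). Adding GRDDL or HTML sniffing would just mean appending more levels to the list.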

Martynas

On Fri, Jan 27, 2012 at 8:17 PM, Andy Seaborne wrote:
> On 27/01/12 13:44, Martynas Jusevicius wrote:
>>
>> Hey list,
>>
>> I am looking for an implementation doing what looks like a simple task
>> (but probably isn't): given a URI, try to extract an RDF Model from it in
>> all possible ways.
>> It should use content negotiation: ask for RDF/XML as first priority,
>> Turtle/N-Triples as the second, and try GRDDL on HTML as the last
>> option.
>>
>> I can see Jena's RDFReader, JenaReader, and GRDDLReader that all seem
>> to do a part of what is needed, but I wonder if there already is some
>> code that combines it all?
>>
>> Martynas
>> http://graphity.org
>
>
> Ah. This is something that's been talked about several times, and I went as
> far as looking for old notes on it for a JIRA issue fairly recently.
>
> What we need (IMO) is a single reader that opens streams then decides which
> parser to dispatch to.
>
> FileManager+typed streams.
>
>  Add a locator to the filemanager to do conneg.
>  Streams are typed by any MIME info
>
> then the decision on MIME type to believe is based on
> 1/ MIME type
> 2/ file extension
> 3/ user hint
>
> probably in the order 3-1-2, except for text/plain, where either 2 overrides 1
> or we route it to Turtle regardless.
>
> Given that, look in a registry and call the real parser.
>
> I'm not completely sure it will work for RDFa and GRDDL - maybe if the
> system is told to read one of those, the dispatching reader believes that
> over any conneg and just does it.
>
> What I think we should avoid unless really, really necessary is sniffing the
> content.
>
> See org.openjena.riot.web.HttpOp for some code that does HTTP GETs and
> dispatches to a handler.  I don't think that is the way to go; it's awkward
> to pick the results out of the operation.
>
> org.openjena.riot.WebContent has lots of constants.
>
>        Andy
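Andy's dispatch rule (user hint, then MIME type, then file extension, with text/plain as a special case, followed by a registry lookup) could be sketched roughly as below. This is an untested illustration, not Jena code: DispatchingReader, Parser and the language labels are hypothetical, and the parser returns a string instead of a Model. For text/plain it combines both of Andy's options, preferring a known extension and otherwise falling back to Turtle. A hint such as "RDFA" or "GRDDL" wins over anything conneg says, matching his suggestion for those formats.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch, not Jena API: decide a syntax, then look up a parser.
public class DispatchingReader {

    // Stand-in for a real parser; returns a description instead of a Model.
    public interface Parser {
        String parse(String uri);
    }

    private static final Map<String, String> MIME_TO_LANG = Map.of(
        "application/rdf+xml", "RDF/XML",
        "text/turtle", "TURTLE",
        "application/n-triples", "N-TRIPLES");
    private static final Map<String, String> EXT_TO_LANG = Map.of(
        "rdf", "RDF/XML", "ttl", "TURTLE", "nt", "N-TRIPLES");

    private final Map<String, Parser> registry = new HashMap<>();

    public void register(String lang, Parser parser) {
        registry.put(lang, parser);
    }

    // Extension of the last path segment, mapped to a language if known.
    static Optional<String> fromExtension(String uri) {
        int dot = uri.lastIndexOf('.');
        if (dot < 0 || dot < uri.lastIndexOf('/')) return Optional.empty();
        return Optional.ofNullable(EXT_TO_LANG.get(uri.substring(dot + 1).toLowerCase(Locale.ROOT)));
    }

    // Order 3-1-2: user hint, then MIME type, then file extension.
    public static Optional<String> decide(String hint, String contentType, String uri) {
        if (hint != null) return Optional.of(hint);
        String mime = contentType == null ? null
            : contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        if ("text/plain".equals(mime)) {
            // text/plain: a known extension overrides the MIME type, else assume Turtle.
            return Optional.of(fromExtension(uri).orElse("TURTLE"));
        }
        if (mime != null && MIME_TO_LANG.containsKey(mime)) return Optional.of(MIME_TO_LANG.get(mime));
        return fromExtension(uri);
    }

    // Decide the language, then dispatch to whichever parser is registered for it.
    public String read(String uri, String contentType, String hint) {
        String lang = decide(hint, contentType, uri)
            .orElseThrow(() -> new IllegalArgumentException("Cannot determine syntax for " + uri));
        Parser parser = registry.get(lang);
        if (parser == null) throw new IllegalStateException("No parser registered for " + lang);
        return parser.parse(uri);
    }
}
```

The Content-Type here would come from a typed stream opened by a conneg-aware locator, as Andy describes. No content sniffing is involved.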
