Hi William,
> Following through the same usage with the federated queries. Sometimes
> we get rubbish back. Things like <http://54233.1*B> come out of
> dbpedia. ARQ faithfully takes these, binds them to results and outputs
> them.
Whether checking is the right thing to do depends on the application
usage. Some might want to see the bad data (e.g. to fix it, or to
complain); in your case, you want it suppressed.
And one person's error is another person's useful data. Encoding errors
and ill-formed literals are common.
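If you do want bad terms suppressed on the client side, a filter over the
result bindings is one option. The following is only a rough sketch under
stated assumptions: it uses the JDK's java.net.URI rather than any Jena
checking, the class and method names (TermCheck, isUsableIri, keepUsable)
are made up for illustration, and the policy (absolute IRIs only, and
http/https IRIs must have a DNS-style host) is one possible choice that
your application may want to loosen or tighten.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Jena or ARQ.
final class TermCheck {

    // Assumed policy: the string must parse as an absolute URI, and an
    // http/https URI must have a host that java.net.URI accepts as a
    // server-based host name.
    static boolean isUsableIri(String iri) {
        try {
            URI uri = new URI(iri);
            if (!uri.isAbsolute()) {
                return false; // relative references are rejected
            }
            String scheme = uri.getScheme();
            boolean isHttp = scheme.equalsIgnoreCase("http")
                    || scheme.equalsIgnoreCase("https");
            // getHost() is null when the authority is not a valid host name.
            return !isHttp || uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // not parseable at all
        }
    }

    // Keep only the IRIs that pass the check, preserving order.
    static List<String> keepUsable(List<String> iris) {
        List<String> kept = new ArrayList<>();
        for (String iri : iris) {
            if (isUsableIri(iri)) {
                kept.add(iri);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> bindings = Arrays.asList(
                "http://dbpedia.org/resource/Berlin",
                "http://54233.1*B");
        System.out.println(keepUsable(bindings));
    }
}
```

Doing this in application code keeps the policy decision with the
application, which is where it belongs if different users want different
treatment of bad data.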
DBpedia has a lot of junk in it and it's a somewhat difficult service to
work with in a process. (It is also an appreciable support cost to
Jena.) I don't want to put in workarounds for DBpedia if DBpedia should
fix the data - from your POV it would be nice if the client code fixed
the problems of the remote end ... but ARQ is a general library.
I have worked with those guys to fix problems at source and, last time I
checked, the data was at least legal, including legal URIs. If you are
accessing a recent version, reporting it might get it fixed.
> Of course the problem then comes when you try to take the results and
> feed them into the very strict Jena parsers, and end up, in our setup,
> with entire batches of statements rejected when we try to put it into
> stable storage.
Which parser? Some are configurable.
> Suggest making the output routines of arq.query check to make sure the
> terms are valid, and in the case of CONSTRUCT and DESCRIBE, additional
> checks that make sure we don't have things like literals in the
> predicate position and suchlike, with the aim of guaranteeing that you
> can always insert the results of a CONSTRUCT into a Jena/TDB store.
Graphs should never end up with illegal triples - this is a spec matter.
Could you provide a complete, minimal example, please, so it can be
fixed? The code should simply drop ill-formed literals - it is in the spec.
Andy
> Cheers,
> -w