Hi Monika, (Hi Ian),
Ian has already answered your question.
However, I want to add a similar use case we have, related to errors or
malformed RDF input files. When loading large RDF files we typically use
N-Triples or N-Quads, and we want to continue parsing the file even if there
are a few errors (i.e. invalid lines).
We use RIOT and, even though there is no feature to tell the parser to ignore
an error, skip the line and continue parsing, it's not expensive to construct
a LangNQuads object for each line of your input. So, this is what we often do:
    String line = ...
    // One tokenizer and one parser per input line
    Tokenizer tokenizer = TokenizerFactory.makeTokenizerString(line);
    LangNQuads parser = new LangNQuads(tokenizer, profile, sink);
    parser.parse();
You can then catch any exception and continue processing with the next line.
We do the same when we write MapReduce jobs, for example here [1] or here
[2]. (*)
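Outside of MapReduce, the loop around the per-line parse could look roughly
like the sketch below. It uses only the JDK so that it runs on its own: the
class LenientLineLoader, its Result type and the toy parser in main are
made-up names for illustration, not part of RIOT. In real code the
Consumer<String> would build the Tokenizer and LangNQuads for the line as
shown above, and I am assuming that parse errors surface as unchecked
exceptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper: parse input one line at a time, warn on bad lines, keep going.
final class LenientLineLoader {

    // Counts of lines that parsed cleanly and lines that were skipped.
    static final class Result {
        final long parsed;
        final long skipped;

        Result(long parsed, long skipped) {
            this.parsed = parsed;
            this.skipped = skipped;
        }
    }

    // lineParser stands in for "construct a parser for this line and call parse()".
    static Result load(BufferedReader reader, Consumer<String> lineParser) {
        long parsed = 0;
        long skipped = 0;
        long lineNumber = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.trim().isEmpty()) {
                    continue; // blank line, nothing to parse
                }
                try {
                    lineParser.accept(line);
                    parsed++;
                } catch (RuntimeException e) {
                    // Report the problem as a warning and move on to the next line.
                    skipped++;
                    System.err.println("WARN line " + lineNumber + ": " + e.getMessage());
                }
            }
        } catch (IOException e) {
            // Reading the input itself failed; this is not a per-line parse error.
            throw new UncheckedIOException(e);
        }
        return new Result(parsed, skipped);
    }

    public static void main(String[] args) {
        // Toy stand-in for a real parser: only checks for the terminating '.'.
        Consumer<String> toyParser = l -> {
            if (!l.trim().endsWith(".")) {
                throw new IllegalArgumentException("missing final '.'");
            }
        };
        String data = "<s> <p> <o> <g> .\nthis line is broken\n<s> <p> \"x\" <g> .\n";
        Result r = load(new BufferedReader(new StringReader(data)), toyParser);
        System.out.println(r.parsed + " parsed, " + r.skipped + " skipped");
    }
}
```

Only the parse call sits inside the inner try block, so a bad line is counted
and reported, while a failure to read the input still stops the load.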
Maybe it's not that difficult to add a feature to RIOT's LangNQuads parser to
report errors, skip to the next line and continue parsing. However, I think
this is close to impossible for the RDF/XML or Turtle serializations.
Paolo
[1] https://github.com/castagna/tdbloader3/blob/master/src/main/java/com/talis/labs/tdb/tdbloader3/FirstMapper.java
[2] https://github.com/castagna/tdbloader3/blob/master/src/main/java/com/talis/labs/tdb/tdbloader3/io/QuadRecordReader.java
(*)
By the way, if someone wants to help me remove the bottleneck caused by the
fact that I am using a single reducer in the first MapReduce job of
tdbloader3, or has ideas on how it could be done, let me know.
Monika Solanki wrote:
> Is it possible to check if the incoming data is legal RDF before reading
> into the model? I do not want my program to throw an error via
> RDFDefaultErrorHandler if the incoming data is illegal RDF. I only want
> a warning to be issued and the program should continue execution. If
> there are any supporting examples, that would be very helpful.
>
> Thanks,
>
> Monika