Re: Writing a whole lot of RDF to TDB versus Jena

Benson Margulies Sun, 22 Jan 2012 12:44:08 -0800

On Sun, Jan 22, 2012 at 3:21 PM, Andy Seaborne <[email protected]> wrote:
> On 21/01/12 18:21, Benson Margulies wrote:
>>
>> This isn't really a 'load' scenario. I've done more profiling since I
>> started that thread.
>>
>> A process is creating new RDF on the fly. It just makes 'add' calls to
>> the model obtained from the TDB default graph.
>>
>> There is, sadly, one case which I implemented with reification. When a
>> document wanders by which triggers thousands of these events, the code
>> bogs down. Not so much in adding the reifications, as in checking for
>> existing ones, which is what it has to do.
>
>
> Maybe there is a better way - can you share the profiling?  It may be better
> not to check ... and let TDB suppress duplicates.


What I can share off the top of my head is the following. Perhaps I
can do a better job than usual of explaining myself. More likely, in
the process of explaining myself I'll solve the problem.

So, based on an input, I construct a statement, which is sitting in
stmt. By convention, if this statement exists, it is reified. All the
time disappears into the listReifiedStatements call below.

The problem for me is that the URI for an existing reified statement
can't be derived from the data available at this point as this code is
written. If I used my hashing idea to derive it instead of deriving it
from ephemeral information, I wouldn't need to call
listReifiedStatements! In other words, if I go on using 'real'
reification, but use the hash trick to derive the URI, I think all
will be swell. No need for partial reification or other abuses.

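            ReifiedStatement rstmt = null;
            if (model.contains(stmt)) {
                // By convention an existing statement is already reified;
                // this lookup is where all the time goes.
                RSIterator reit = model.listReifiedStatements(stmt);
                try {
                    if (reit.hasNext()) {
                        rstmt = reit.nextRS();
                    }
                } finally {
                    reit.close();
                }
            } else {
                model.add(stmt);
                // The URI is derived from ephemeral per-document data.
                String reUrl = RdfUtils.relationshipUri(docId, ordinal);
                rstmt = model.createReifiedStatement(reUrl, stmt);
            }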
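
For concreteness, here is a minimal sketch of the hash trick, using only
the JDK. The class name StatementHash, the method name statementUrn, the
"urn:md5:" prefix, and the newline separator between the three URIs are
all illustrative assumptions, not part of Jena or of my existing code.
It also assumes the object is a URI rather than a literal.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a stable URI for a statement from its
// subject, predicate, and object URIs, so the same statement always
// maps to the same reification URI.
final class StatementHash {
    private StatementHash() {}

    static String statementUrn(String subject, String predicate, String object) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Assumption: a newline cannot occur inside a URI, so it is a
            // safe separator that keeps ("ab","c") distinct from ("a","bc").
            String key = subject + "\n" + predicate + "\n" + object;
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Assumption: "urn:md5:" is an illustrative prefix only.
            StringBuilder sb = new StringBuilder("urn:md5:");
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With something like this, the URI passed to createReifiedStatement could
be computed from the statement itself rather than from docId and ordinal,
so an existing reification could be addressed directly instead of being
searched for.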


>
> TDB reification support is special - it's pure code and implements the
> contract but being stateless the DB knows nothing of reification.  We have
> been thinking of making this the usual way because reification in RDF
> generally is nowadays for specialised use only.
>
>
>> I'm considering a scheme in which I feed the three URIs of the
>> statement into MD5, and 'reify' by adding statements like:
>>
>>   urn:<md5>      HAS_PROVENANCE WHATEVER
>>
>> instead of using the formal reification system.
>>
>> Of course, in an imaginary perfect world, TDB would somehow know about
>> reification.
>
>
> Been there, done that [not me personally] :-)
>
> Once upon a time, RDB (the old relational DB engine) did reification. While
> it gets good compactness, the complexity of managing partial reifications is
> huge and the payback is small.
>
> Named graphs can be used for keeping statements separated.
>
> All that said, I'd like to do property tables for TDB (string a set of
> properties per subject together) but it's not a priority.
>
>        Andy
>
>
>
