On 15/10/13 15:17, Simon Helsen wrote:
Hi all,

Regarding some sort of migration utility, wouldn't that be a must? Or is
the expectation that all previously built databases are thrown out and
recreated?

I don't understand: any migration utility makes a permanent, one-way change to the database. Why "thrown out"?

TDB has a backup facility that is neutral to the RDF-2004/RDF-1.1 change (it writes a compressed N-Quads file).

From our point of view, we've usually had ways to recreate TDB databases,
but the cost can be enormous (depending on the size of the DB). I would
think a migration utility would be able to convert a database much faster.

Do you use xsd:strings at all?  Do you mix simple literals and xsd:strings?
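One way to answer these two questions for an existing database is to scan its N-Quads backup. The minimal Python sketch below assumes one quad per line with no comments; `scan_nquads` and `StringLiteralReport` are hypothetical helpers, and the regular expression is an approximation rather than a full N-Quads parser. It counts literals written explicitly as xsd:string, simple literals, and lexical forms that occur both ways (under RDF 1.1 those pairs become a single term).

```python
import re
from typing import FrozenSet, Iterable, NamedTuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Approximate literal matcher: quoted lexical form (escapes kept as written),
# then an optional ^^<datatype> or @language-tag.
LITERAL_RE = re.compile(
    r'"((?:[^"\\]|\\.)*)"'
    r'(?:\^\^<([^>]*)>|@([A-Za-z]+(?:-[A-Za-z0-9]+)*))?'
)


class StringLiteralReport(NamedTuple):
    explicit_xsd_string: int  # occurrences of "..."^^xsd:string
    simple: int               # occurrences of "..." with no datatype or tag
    mixed: FrozenSet[str]     # lexical forms written both ways


def scan_nquads(lines: Iterable[str]) -> StringLiteralReport:
    explicit_forms, simple_forms = set(), set()
    n_explicit = n_simple = 0
    for line in lines:
        for m in LITERAL_RE.finditer(line):
            lexical, datatype, lang = m.groups()
            if lang:
                continue  # language-tagged strings are not part of the xsd:string question
            if datatype == XSD_STRING:
                n_explicit += 1
                explicit_forms.add(lexical)
            elif datatype is None:
                n_simple += 1
                simple_forms.add(lexical)
    return StringLiteralReport(n_explicit, n_simple,
                               frozenset(explicit_forms & simple_forms))
```

A non-empty `mixed` set means some triples will merge when the data is reloaded under RDF 1.1, so the reloaded database can hold fewer triples than the backup.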

Normal practice before making a format change on any database technology would be to take a backup first. This isn't strictly a format change for the database; it is for the data.

Chasing uses of xsd:strings is going to be expensive if done as I described (which was not transactional, so it's an offline change). Do you have a better way?

It involves jumping all over the indexes with little pattern. Reloading from the backup you have already taken may be significantly faster; it all depends on the data.

Of course, if someone contributes such a utility ...

(Small-scale migrations could be done with SPARQL Update, but that is not going to scale very well.)

        Andy


Simon





From:    Andy Seaborne <a...@apache.org>
To:      dev@jena.apache.org
Date:    10/14/2013 10:41 AM
Subject: Re: RDF 1.1 -- changes to plain literals





On 14/10/13 09:11, Rob Vesse wrote:
Andy

Thanks for the great overview. I've been looking at supporting this in
dotNetRDF lately as well, so I have been thinking along much the same
lines.

I think the "check language first" point needs to be emphasized in
messaging to users about this change. dotNetRDF has the same issue, and
I've recently seen that Sesame was also affected by this. Therefore I
think we need to be clear about this required change in usage.
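To make the "check language first" point concrete, here is a minimal Python sketch. The `Literal` class and both classify functions are hypothetical, not Jena or dotNetRDF API. It relies on the RDF 1.1 rules that a language-tagged literal has the datatype rdf:langString and a simple literal has the datatype xsd:string, so testing for "no datatype" no longer identifies plain literals.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal term; not a Jena or dotNetRDF class."""
    lexical: str
    datatype: Optional[str] = None  # None only for RDF-2004 plain literals
    lang: str = ""                  # "" means no language tag


def classify_rdf2004_style(lit: Literal) -> str:
    # Old habit: "no datatype" meant a plain literal, tagged or not.
    # Under RDF 1.1 every literal has a datatype, so tagged literals fall through to "typed".
    if lit.datatype is None:
        return "lang-string" if lit.lang else "string"
    return "typed"


def classify(lit: Literal) -> str:
    # Check the language tag first: under RDF 1.1 a tagged literal has datatype rdf:langString.
    if lit.lang:
        return "lang-string"
    # Simple literals and explicit xsd:string are the same kind of term under RDF 1.1.
    if lit.datatype in (None, XSD_STRING):
        return "string"
    return "typed"
```

Given an RDF 1.1 literal such as `Literal("chat", RDF_LANG_STRING, "fr")`, the old-style check reports it as "typed", while the language-first check reports "lang-string".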

My feeling is that we should make this behavior configurable: the
default going forward should be RDF 1.1, but it would be nice if users
could toggle back to RDF-2004 behavior if they need to produce data for
older systems.

Some way of reverting to old behaviour would be good.  As long as it's
system-wide I don't foresee any problems.  On a per-graph basis it would
be very hard; per parser run is possible but does not catch
API-created data.
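A system-wide switch can be sketched as one global setting read at term-creation time. In this minimal Python sketch `RDF11_MODE`, `set_rdf11_mode` and `make_literal` are hypothetical names, not Jena API, and a literal is modelled as a plain (lexical form, datatype, language tag) tuple.

```python
from typing import Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Hypothetical system-wide switch: True = RDF 1.1 behaviour (the proposed default),
# False = RDF-2004 behaviour for producing data for older systems.
RDF11_MODE = True

LiteralTerm = Tuple[str, Optional[str], str]  # (lexical form, datatype IRI or None, language tag)


def set_rdf11_mode(enabled: bool) -> None:
    """Flip the one global setting; it applies to parser output and API code alike."""
    global RDF11_MODE
    RDF11_MODE = enabled


def make_literal(lexical: str, datatype: Optional[str] = None, lang: str = "") -> LiteralTerm:
    """Create a literal term according to the current system-wide mode."""
    if lang:
        # RDF 1.1 gives language-tagged strings the datatype rdf:langString;
        # RDF-2004 treats them as plain literals with no datatype.
        return (lexical, RDF_LANG_STRING if RDF11_MODE else None, lang)
    if RDF11_MODE:
        # A simple literal is shorthand for xsd:string, so both spellings give one term.
        return (lexical, datatype or XSD_STRING, "")
    # RDF-2004: "abc" and "abc"^^xsd:string stay distinct terms.
    return (lexical, datatype, "")
```

Because the mode is consulted whenever any term is created, it covers API-created data as well as parser output, which a per-parser-run switch would miss.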

Once data has passed through in RDF 1.1 mode and been written out,
whether to a database or as syntax on disk, it gets confusing to
mix and match and go back to RDF-2004 style.

If there is a reasonable need for some compatibility style, then, yes,
let's put it in.

One thing I think is worth avoiding is too much "speculative
compatibility" (i.e. guessing!), like putting all variations of Node
creation into NodeFactory as different factory methods.  These tend to
end up with a life beyond the transition.

On the database side, particularly for TDB, would it be feasible to
produce a migration utility which would check a database to see if it
is affected and, if so, produce a migrated version of the database?

Backing up to N-Quads in RDF-2004 style, updating the software and
restoring in RDF 1.1 style will work, and it leaves a backup should the
deployment wish to reverse the process.

A special utility to convert TDB databases would be possible: look in
the node table for explicit xsd:strings, then look in the indexes for
the internal value of each such term and change it (delete-add).
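The steps of such a utility can be sketched with a toy in-memory model. Everything here (`ToyStore`, its three index sets, `migrate_xsd_strings`) is hypothetical and far simpler than TDB's real node table and indexes; it also assumes the migrated form of an explicit xsd:string is the node for the simple literal with the same lexical form.

```python
from typing import Dict, List, Optional, Set, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical term encoding: ("iri", iri, None) or ("lit", lexical, datatype-or-None).
Term = Tuple[str, str, Optional[str]]
Ids = Tuple[int, int, int]


class ToyStore:
    """Hypothetical miniature of a node table plus SPO/POS/OSP indexes (not TDB's layout)."""

    def __init__(self) -> None:
        self.id_to_term: Dict[int, Term] = {}
        self.term_to_id: Dict[Term, int] = {}
        self.spo: Set[Ids] = set()
        self.pos: Set[Ids] = set()
        self.osp: Set[Ids] = set()

    def node_id(self, term: Term) -> int:
        """Look up a term's internal id, allocating a new one if needed."""
        if term not in self.term_to_id:
            new_id = len(self.id_to_term)
            self.id_to_term[new_id] = term
            self.term_to_id[term] = new_id
        return self.term_to_id[term]

    def add_ids(self, s: int, p: int, o: int) -> None:
        self.spo.add((s, p, o))
        self.pos.add((p, o, s))
        self.osp.add((o, s, p))

    def delete_ids(self, s: int, p: int, o: int) -> None:
        self.spo.discard((s, p, o))
        self.pos.discard((p, o, s))
        self.osp.discard((o, s, p))

    def add(self, s: Term, p: Term, o: Term) -> None:
        self.add_ids(self.node_id(s), self.node_id(p), self.node_id(o))

    def triples(self) -> Set[Tuple[Term, Term, Term]]:
        t = self.id_to_term
        return {(t[s], t[p], t[o]) for s, p, o in self.spo}


def migrate_xsd_strings(store: ToyStore) -> int:
    """Offline delete-add migration; returns the number of triples rewritten."""
    # Step 1: scan the node table for explicit xsd:string literals (snapshot first,
    # because node_id() below may allocate new entries).
    targets: List[Tuple[int, Term]] = [
        (nid, term) for nid, term in list(store.id_to_term.items())
        if term[0] == "lit" and term[2] == XSD_STRING
    ]
    rewritten = 0
    for old_id, (_, lexical, _) in targets:
        new_id = store.node_id(("lit", lexical, None))  # simple-literal node
        # Step 2: literals occur only as objects, so OSP entries with O == old_id
        # form one contiguous range in a sorted index (here: a filter over a set).
        for o, s, p in sorted(k for k in store.osp if k[0] == old_id):
            store.delete_ids(s, p, o)    # the matching SPO and POS entries are scattered
            store.add_ids(s, p, new_id)  # may coincide with a triple already present
            rewritten += 1
    # The old node-table entries stay in place, unreferenced, in this sketch.
    return rewritten
```

The OSP range for each old id is contiguous, but the SPO and POS entries it points to are spread across those indexes, which is where the cost of this approach comes from.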

Doing a backup first is a "good thing" (tm) at that point anyway.

It would be an offline process, as it is munging the internal tables
directly.  A transactional version is also doable, but each layer of
complexity increases the risk of getting it wrong in some corner case.
A special utility also has the disadvantage of not being well used, so
it is at risk of bugs.

So, currently, I would want to see a significant need for this before
embarking on something other than backup-upgrade-restore.

                  Andy


Rob




