On 19/10/16 10:51, Rob Vesse wrote:
On 14/10/2016 17:09, "Andy Seaborne" <a...@apache.org> wrote:

    I don't understand what capabilities are enabled by transaction
    granularity if there are multiple transactions in a single patch.
    Concrete examples of where it helps?

    However, I've normally been working with one transaction per patch anyway.

    Allowing multiple transactions per patch is for making a collection of
    (semantically) related changes into a unit, by consolidating small
    patches into "today's changes" (c.f. git squash).

    Leaving the transaction boundaries in gives internal checkpoints, not
    just one big transaction. It also makes the consolidated patch
    decomposable (unlike squash).

    Internal checkpoints are useful not just for keeping the transaction
    manageable but also to be able to restart a very large update in case it
    failed part way through for system reasons (server power cut, user
    reboots laptop by accident, ...)  Imagine keeping a DBpedia copy up to date.

I think the thought is that a producer of a patch can decide whether
each transaction being recorded should be reversible or not. For
example if you are importing a very large dataset into an already large database
you probably don’t want to slow down the import process by having to
check whether every triple/quad is already in the database as you
import it. Therefore you might choose to output a non-reversible
transaction for performance reasons.

On the other hand if you’re accepting a small change to the data then
that cost is probably acceptable and you would output a reversible
transaction.
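To make the trade-off concrete, here is a minimal sketch of why a reversible record needs the existence check. The names `record_add`, `reverse` and `apply_ops` are hypothetical, the dataset is a plain Python set, and ops are ("A", triple) / ("D", triple) tuples; none of this is taken from an actual implementation.

```python
def record_add(dataset, triple, reversible):
    """Add `triple` to the dataset and return the op to record (or None)."""
    if reversible:
        if triple in dataset:     # the per-triple check that costs time
            return None           # no real change, so nothing is recorded
        dataset.add(triple)
        return ("A", triple)
    dataset.add(triple)           # no explicit check: the add may be a no-op
    return ("A", triple)          # ...but it is recorded anyway


def reverse(ops):
    """Build the undo ops: swap A and D, and replay in reverse order."""
    swap = {"A": "D", "D": "A"}
    return [(swap[kind], triple) for kind, triple in reversed(ops)]


def apply_ops(dataset, ops):
    for kind, triple in ops:
        if kind == "A":
            dataset.add(triple)
        else:
            dataset.discard(triple)


t = ("s", "p", "o")

# Non-reversible recording: t was already present, yet an add is logged.
ds_fast = {t}
ops_fast = [record_add(ds_fast, t, reversible=False)]
apply_ops(ds_fast, reverse(ops_fast))   # "undo" deletes a triple that pre-existed

# Reversible recording: the no-op add is not logged, so undo is exact.
ds_safe = {t}
ops_safe = [op for op in [record_add(ds_safe, t, reversible=True)] if op]
apply_ops(ds_safe, reverse(ops_safe))
```

In the first case the undo removes a triple that was in the database before the import; in the second the database ends up exactly as it started, at the price of one lookup per triple.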

I am not arguing that you shouldn’t have transaction boundaries, in
fact I think they are essential, but simply that you may want to be
able to annotate the properties of a transaction beyond just stating the
boundaries.

Rob,

I agree the producer needs to have control. What I am asking is why one patch unit (packet) would have multiple transactions with different characteristics in it. The properties of a patch packet include reversibility of its contents. A patch overall isn't reversible unless each transaction within it is, so there is now an opportunity for errors.
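A small sketch of that dependency, assuming a hypothetical per-transaction `reversible` flag of the kind being proposed (the `Transaction` and `Patch` classes are illustrative only):

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    ops: list = field(default_factory=list)
    reversible: bool = True      # hypothetical per-transaction annotation


@dataclass
class Patch:
    transactions: list = field(default_factory=list)

    @property
    def reversible(self):
        # The patch as a whole can be undone only if every
        # transaction in it can be undone.
        return all(t.reversible for t in self.transactions)


mixed = Patch([Transaction(reversible=True), Transaction(reversible=False)])
uniform = Patch([Transaction(), Transaction()])
```

Here `mixed.reversible` is false even though one of its transactions is reversible, which is the mismatch a consumer could trip over if it only inspected part of the patch.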

I think the unit of a patch packet is enough - it is supposed to be a sensible set of changes to move the dataset from one consistent state to another. In developing that set of changes, there may have been several transactions (c.f. git squash). It happens to give a checkpoint effect on large patches as well.

An analogy that may not help: a "TB/TC" is a database-transaction and a "patch" is more like a "business transaction".


(The use of "transaction" may not be the best - "action"? but with a need for "abort" as well as "commit", "transaction"

        Andy
