On 08/07/13 23:42, Rob Vesse wrote:
I've been looking at doing various query optimizations lately and one
which we would like to do involves combining adjacent quad patterns
together.  ARQ will already combine adjacent BGPs but does not do
this for quad patterns.

Part of the issue in doing this ourselves seems to be the fact that
ARQ treats an OpQuadPattern as a wrapper around a graph Node and a
BasicPattern and produces the QuadPattern on the fly by using the
graph node to form quads.  This means that a merger can only be made
where the adjacent quad patterns have the same graph node as
otherwise we lose part of the graph information.  It would be useful
to us if OpQuadPattern instead just held a QuadPattern and did not
have a fixed graph node associated with it.  However I suspect this
would have a lot of knock-on effects on other implementations so this
is not an implementation detail which I would lightly change.
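
To make the restriction concrete, here is a minimal sketch in plain Java. The types (Triple, GraphPattern) and the merge() helper are hypothetical stand-ins invented for illustration, not ARQ's classes; the sketch only assumes the shape described above, one graph node plus a list of triples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins, NOT the ARQ classes: a quad pattern modelled
// as a single graph node plus a list of triples.
class QuadPatternMergeSketch {
    record Triple(String s, String p, String o) {}
    record GraphPattern(String graphNode, List<Triple> triples) {}

    // Two adjacent patterns can be merged only when they share a graph
    // node; otherwise the merged pattern could not say which graph each
    // triple belongs to.
    static Optional<GraphPattern> merge(GraphPattern left, GraphPattern right) {
        if (!left.graphNode().equals(right.graphNode())) {
            return Optional.empty();
        }
        List<Triple> all = new ArrayList<>(left.triples());
        all.addAll(right.triples());
        return Optional.of(new GraphPattern(left.graphNode(), all));
    }

    public static void main(String[] args) {
        Triple t1 = new Triple("?s", ":p", "?o");
        Triple t2 = new Triple("?o", ":q", "?z");
        GraphPattern a = new GraphPattern("?g", List.of(t1));
        GraphPattern b = new GraphPattern("?g", List.of(t2));
        GraphPattern c = new GraphPattern("<http://example/g2>", List.of(t2));
        System.out.println(merge(a, b).isPresent()); // same graph node
        System.out.println(merge(a, c).isPresent()); // different graph nodes
    }
}
```

In this model the first merge succeeds and the second is refused, which is the case the proposed change is meant to address.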

Is there value in making this change longer term and what would the
knock-on effects be?

Or is it better to introduce a new operator which is a true wrapper
around a QuadPattern and allows for different graph nodes on
different quads within the pattern?  This way we don't propagate the
change to implementations where it would not make any sense or would
create unnecessary work.
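
A rough sketch of what such an operator's data might look like, again in plain Java with hypothetical types (Quad, QuadBlock) that are not part of ARQ. It assumes only that each quad carries its own graph node, so merging adjacent blocks becomes plain concatenation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, NOT an ARQ class: a block that holds quads
// directly, so each quad keeps its own graph node.
class QuadBlockSketch {
    record Quad(String g, String s, String p, String o) {}

    record QuadBlock(List<Quad> quads) {
        // Merging never fails: no graph information is lost because the
        // graph node travels with each quad.
        QuadBlock merge(QuadBlock other) {
            List<Quad> all = new ArrayList<>(quads);
            all.addAll(other.quads());
            return new QuadBlock(List.copyOf(all));
        }
    }

    public static void main(String[] args) {
        QuadBlock a = new QuadBlock(List.of(new Quad("?g1", "?s", ":p", "?o")));
        QuadBlock b = new QuadBlock(List.of(new Quad("?g2", "?o", ":q", "?z")));
        QuadBlock merged = a.merge(b);
        for (Quad q : merged.quads()) {
            System.out.println(q);
        }
    }
}
```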

If the latter is preferable we can probably do this completely in our
code base by subclassing OpExt and not affect ARQ itself, but I thought
I'd throw the idea out there to see if there was any value in making
the change in ARQ.

Rob


I agree that long term, a different OpQuadPattern would be good. It's getting there from here that matters.

And, yes, changing too quickly would have knock-on effects, not just on Jena but maybe (=probably) on extensions. I looked at the calls to getGraphNode(), getBasicPattern() and getPattern(), and there are enough to see it's not a simple switch, but it's not huge either.

A couple of caveats:

* Some boundaries are special, like the default union graph.

* Entailment works on graphs: keeping boundaries can matter to some systems. I don't know if such systems do quad-things.

* OpAsQuery may be affected.  Should not be too bad - it can regroup quads.

* Odd corner cases like crossing storage boundaries as the graph boundary changes. Probably shouldn't happen.


Thought: What about going the other way?
Convert to joins of OpTriple/OpQuad everywhere and mark trees of pure joins of triples/quads.

See TransformPattern2Join
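
A minimal sketch of that direction, in plain Java. This is not the code of TransformPattern2Join; the types (Op, QuadOp, Join, Unit) and helpers (toJoins, isPureJoin) are hypothetical and only illustrate the shape: expand a list of quads into a left-deep tree of joins over single-quad ops, then recognise trees that consist of nothing but such joins.

```java
import java.util.List;

// Hypothetical illustration, NOT ARQ code.
class PatternToJoinSketch {
    interface Op {}
    record Quad(String g, String s, String p, String o) {}
    record QuadOp(Quad quad) implements Op {}           // one quad per op
    record Join(Op left, Op right) implements Op {}
    record Unit() implements Op {}                      // stands in for an empty pattern

    // Build a left-deep join tree with one leaf per quad.
    static Op toJoins(List<Quad> quads) {
        Op result = null;
        for (Quad q : quads) {
            Op leaf = new QuadOp(q);
            result = (result == null) ? leaf : new Join(result, leaf);
        }
        return (result == null) ? new Unit() : result;
    }

    // True when the tree contains only joins whose leaves are single quads.
    static boolean isPureJoin(Op op) {
        if (op instanceof QuadOp) {
            return true;
        }
        if (op instanceof Join j) {
            return isPureJoin(j.left()) && isPureJoin(j.right());
        }
        return false;
    }

    public static void main(String[] args) {
        Op op = toJoins(List.of(
                new Quad("?g1", "?s", ":p", "?o"),
                new Quad("?g2", "?o", ":q", "?z")));
        System.out.println(op);
        System.out.println(isPureJoin(op));
    }
}
```

A tree marked as a pure join could then be handed to whatever evaluates a block of quads, without the algebra needing a multi-graph pattern op at all.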



If you want to do that under OpExt, that's a "no barrier" route. I would be interested in comments on how well that works - extensible Ops are nice but they interact with the visitor/transform pattern. I don't know of a better way but I could well be missing a design pattern.

I'm also happy to (myself) add OpQuadBlock soon and wire it in properly as a first class Op - it does not take too long and it would force me to look at the code.

        Andy
