Re: Suggestions for learning more about SPARQL query performance?

2021-04-06 Thread Andy Seaborne




On 06/04/2021 12:23, Steve Vestal wrote:
> One more question.  What is the default interface used to query an
> InfModel?

Graph.find

> Is a tailored StageGenerator used

No (but could be - it just isn't ATM)

> or a straight-up Graph or InfGraph interface?

Yes.

Andy
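
A minimal sketch of why a plain find-style interface is enough to query an inference graph. Everything here is an assumption for illustration: Findable, BaseGraph and SubclassInfGraph are made-up names (not Jena classes), triples are plain string arrays, and only a single one-step subclass rule is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: Findable, BaseGraph and SubclassInfGraph are made-up
// names for illustration, not Jena classes.
final class InfFindSketch {
    interface Findable {
        // Single-pattern lookup; null means "any term".
        List<String[]> find(String s, String p, String o);
    }

    static boolean matches(String[] t, String s, String p, String o) {
        return (s == null || s.equals(t[0]))
            && (p == null || p.equals(t[1]))
            && (o == null || o.equals(t[2]));
    }

    static final class BaseGraph implements Findable {
        final List<String[]> triples = new ArrayList<>();

        void add(String s, String p, String o) {
            triples.add(new String[] {s, p, o});
        }

        public List<String[]> find(String s, String p, String o) {
            List<String[]> out = new ArrayList<>();
            for (String[] t : triples) {
                if (matches(t, s, p, o)) out.add(t);
            }
            return out;
        }
    }

    // Adds one rule on top of a base graph, applied for one step only:
    // (x rdf:type C) and (C rdfs:subClassOf D) => (x rdf:type D).
    static final class SubclassInfGraph implements Findable {
        private final Findable base;

        SubclassInfGraph(Findable base) {
            this.base = base;
        }

        public List<String[]> find(String s, String p, String o) {
            List<String[]> out = new ArrayList<>(base.find(s, p, o));
            for (String[] sub : base.find(null, "rdfs:subClassOf", null)) {
                for (String[] typed : base.find(null, "rdf:type", sub[0])) {
                    String[] inferred = {typed[0], "rdf:type", sub[2]};
                    boolean seen = false;
                    for (String[] t : out) {
                        if (Arrays.equals(t, inferred)) seen = true;
                    }
                    if (!seen && matches(inferred, s, p, o)) out.add(inferred);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        BaseGraph base = new BaseGraph();
        base.add("ex:rex", "rdf:type", "ex:Dog");
        base.add("ex:Dog", "rdfs:subClassOf", "ex:Animal");
        Findable inf = new SubclassInfGraph(base);
        // The caller only uses find(); it cannot tell stored from inferred triples.
        for (String[] t : inf.find(null, "rdf:type", null)) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

The caller never needs to know that inference is involved, which is the property a query engine relies on when it only calls find.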




Re: Suggestions for learning more about SPARQL query performance?

2021-04-06 Thread Steve Vestal
One more question.  What is the default interface used to query an 
InfModel?  Is a tailored StageGenerator used or a straight-up Graph or 
InfGraph interface?



Re: Suggestions for learning more about SPARQL query performance?

2021-04-06 Thread Andy Seaborne




On 05/04/2021 17:39, Steve Vestal wrote:

> Thanks.  Are the following impressions sort of correct?
>
> If I have a moderately sizeable RDF model (say over 1M triples), I am
> likely to get faster queries by loading it into TDB rather than reading
> it into an in-memory model.

Hard to tell which is faster.

At 1M triples, TDB will very likely be running with everything cached, so
it's effectively "in-memory" for queries.


Any speed difference will come down to the fact that it is different code
in places. Algorithmically, they are much the same (comparing TIM, the
transactional in-memory dataset, vs TDB).


If you try it out, do report your findings.

> If I interface to an external database using the Graph interface (I
> assume this is much easier than writing my own StageGenerator or
> OpExecutor), then I should pay particular attention to how triples are
> ordered in a query or in BGPs in the resulting algebra.

Yes.

But the default StageGenerator does some light reordering and then calls
Graph.find.  See the code:


For RDF-star:
StageGeneratorGenericStar
QueryIterBlockTriplesStar

but if there are no embedded triple terms, these call the same code as:

StageGeneratorGeneric
QueryIterBlockTriples
QueryIterTriplePattern

> Does the default StageGenerator execute finds in the order they appear
> in a BGP in the algebra?

StageGeneratorGeneric - no.

There is a simple, fixed reordering.

It orders triple patterns by most grounded terms first (being less keen on
rdf:type).
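
A rough sketch of a "most grounded first, less keen on rdf:type" ordering. This is an assumption-laden simplification, not Jena's actual reordering code: patterns are string arrays, variables start with "?", the scoring function is invented, and it ignores variables that become bound by earlier patterns.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: the scoring below is invented for illustration and is
// not the weighting used by any real engine.
final class ReorderSketch {
    static final String RDF_TYPE = "rdf:type";

    static boolean isVar(String term) {
        return term.startsWith("?");
    }

    // Score = number of concrete (non-variable) terms. A concrete rdf:type
    // predicate is deliberately not counted, so such patterns rank lower.
    static int groundedness(String[] t) {
        int score = 0;
        if (!isVar(t[0])) score++;
        if (!isVar(t[1]) && !t[1].equals(RDF_TYPE)) score++;
        if (!isVar(t[2])) score++;
        return score;
    }

    // Stable sort, most grounded first; ties keep their original order.
    static List<String[]> reorder(List<String[]> bgp) {
        List<String[]> out = new ArrayList<>(bgp);
        out.sort(Comparator.comparingInt((String[] t) -> -groundedness(t)));
        return out;
    }

    public static void main(String[] args) {
        List<String[]> bgp = List.of(
            new String[] {"?s", "rdf:type", "ex:Person"},
            new String[] {"?s", "ex:name", "?name"},
            new String[] {"?s", "ex:worksFor", "ex:acme"});
        for (String[] t : reorder(bgp)) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

With a fixed ordering like this, a query author backing ARQ with a custom Graph still benefits from writing selective patterns, since the heuristic knows nothing about the actual data.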



> Is there any reason to pick TDB over TDB2?


Not really, especially for query use cases.

Andy




Re: Suggestions for learning more about SPARQL query performance?

2021-04-05 Thread Steve Vestal

Thanks.  Are the following impressions sort of correct?

If I have a moderately sizeable RDF model (say over 1M triples), I am 
likely to get faster queries by loading it into TDB rather than reading it 
into an in-memory model.


If I interface to an external database using the Graph interface (I 
assume this is much easier than writing my own StageGenerator or 
OpExecutor), then I should pay particular attention to how triples are 
ordered in a query or in BGPs in the resulting algebra.


Does the default StageGenerator execute finds in the order they appear 
in a BGP in the algebra?


Is there any reason to pick TDB over TDB2?


Re: Suggestions for learning more about SPARQL query performance?

2021-04-05 Thread Andy Seaborne

Hi Steve,

> QueryExecutionFactory lets me create a query on either a Model or a
> Dataset. What is the difference? Which one does ARQ actually operate
> on? Will ARQ create a Dataset and DataGraph if given (say) an in-memory
> Model to be queried?

All SPARQL execution is on a dataset (actually a DatasetGraph, which is
lower level than Dataset; Dataset maps it to Models and Resources, built
from the building blocks of Graphs and DatasetGraph).


Model is a presentation API - it has no state and the RDF is stored in 
Graphs or DatasetGraphs.


Execution is of the algebra, using class OpExecutor.

The algebra is extended to have quads and blocks of quads. OpExecutor 
has both triple/basic graph pattern and quad/quad pattern steps. TDB 
rewrites the algebra to quad form and then executes that.
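
As a rough illustration of what "quad form" means, the sketch below turns triple patterns into quad patterns by adding a graph slot, and back again. The names are assumptions for illustration only: DEFAULT_GRAPH is a made-up marker (not a Jena constant) and patterns are plain string arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; DEFAULT_GRAPH is a made-up marker, not a Jena constant.
final class QuadFormSketch {
    static final String DEFAULT_GRAPH = "urn:example:default-graph";

    // Turn each triple pattern {s, p, o} into a quad pattern {g, s, p, o}.
    // graph == null means the BGP was not inside a GRAPH clause.
    static List<String[]> toQuads(String graph, List<String[]> bgp) {
        String g = (graph == null) ? DEFAULT_GRAPH : graph;
        List<String[]> quads = new ArrayList<>();
        for (String[] t : bgp) {
            quads.add(new String[] {g, t[0], t[1], t[2]});
        }
        return quads;
    }

    // The reverse direction: drop the graph slot to get triple patterns back.
    static List<String[]> toTriples(List<String[]> quads) {
        List<String[]> triples = new ArrayList<>();
        for (String[] q : quads) {
            triples.add(new String[] {q[1], q[2], q[3]});
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String[]> bgp = List.of(
            new String[] {"?s", "ex:knows", "?o"},
            new String[] {"?o", "ex:name", "?n"});
        // Same BGP, once in the default graph and once inside GRAPH ?g { ... }.
        for (String[] q : toQuads(null, bgp)) System.out.println(String.join(" ", q));
        for (String[] q : toQuads("?g", bgp)) System.out.println(String.join(" ", q));
    }
}
```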


So actually touching the data comes down to "execute OpQuadPattern" or 
"execute OpBGP" in OpExecutor. TDB executes QuadPattern natively; it has 
its own subclass of OpExecutor. (TDB also has an adapter to execute over 
a single graph as well.)


When executing in triple form for an in-memory, general-purpose dataset (a 
collection of graphs), the "execute OpBGP" step calls a registered 
StageGenerator, which is an interface and can be set per-dataset.


A StageGenerator receives a multi-triple BGP and returns the solutions - 
it gets to see the whole BGP.


(Currently so does TIM, the native in-memory DatasetGraph, even though it 
is a quad store.)


To extend for a custom graph implementation:

* Do nothing - the default StageGenerator will turn the BGP execution 
into graph.find calls.


If the graph can do better for a BGP:

* Provide your own StageGenerator.

And if the dataset is concerned with quads, or you want to see more of the 
execution beyond BGPs:


* Provide your own OpExecutor subclass.

Everything maps back and forth between triples and quads so full SPARQL 
functionality is available uniformly, but maybe not as efficiently as 
possible.
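
To make the "do nothing" option concrete, here is a minimal sketch of how a multi-triple BGP can be answered using only single-pattern find calls. It is a sketch under stated assumptions, not ARQ's implementation: TinyGraph, solve and extend are hypothetical names, terms are strings, and variables start with "?".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; TinyGraph is not a Jena class.
final class BgpByFindSketch {
    static boolean isVar(String t) {
        return t.startsWith("?");
    }

    static final class TinyGraph {
        final List<String[]> triples = new ArrayList<>();

        void add(String s, String p, String o) {
            triples.add(new String[] {s, p, o});
        }

        // Single-pattern lookup; null means "any term".
        List<String[]> find(String s, String p, String o) {
            List<String[]> out = new ArrayList<>();
            for (String[] t : triples) {
                if ((s == null || s.equals(t[0]))
                        && (p == null || p.equals(t[1]))
                        && (o == null || o.equals(t[2]))) {
                    out.add(t);
                }
            }
            return out;
        }
    }

    // Concrete terms pass through; bound variables are replaced by their value;
    // unbound variables become null (wildcard).
    static String ground(String term, Map<String, String> binding) {
        return isVar(term) ? binding.get(term) : term;
    }

    // Extend a binding with the variables matched by triple t, or return null
    // if the same variable would get two different values.
    static Map<String, String> extend(Map<String, String> b, String[] pattern, String[] t) {
        Map<String, String> ext = new HashMap<>(b);
        for (int i = 0; i < 3; i++) {
            if (!isVar(pattern[i])) continue;
            String prev = ext.putIfAbsent(pattern[i], t[i]);
            if (prev != null && !prev.equals(t[i])) return null;
        }
        return ext;
    }

    // One find call per (pattern, partial solution): a nested-loop join.
    static List<Map<String, String>> solve(TinyGraph g, List<String[]> bgp) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>());
        for (String[] p : bgp) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> b : solutions) {
                for (String[] t : g.find(ground(p[0], b), ground(p[1], b), ground(p[2], b))) {
                    Map<String, String> ext = extend(b, p, t);
                    if (ext != null) next.add(ext);
                }
            }
            solutions = next;
        }
        return solutions;
    }

    public static void main(String[] args) {
        TinyGraph g = new TinyGraph();
        g.add("ex:alice", "ex:knows", "ex:bob");
        g.add("ex:bob", "ex:knows", "ex:carol");
        List<String[]> bgp = List.of(
            new String[] {"?a", "ex:knows", "?b"},
            new String[] {"?b", "ex:knows", "?c"});
        System.out.println(solve(g, bgp));
    }
}
```

Because each partial solution triggers another find, the number of calls depends heavily on pattern order, which is why a graph that can answer a whole BGP at once may prefer its own StageGenerator.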



> When/where are indexes created and used?

Indexes are a feature of the storage, not the general execution strategy.
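
A small sketch of indexes living entirely behind the storage's find call. This is illustrative only and rests on assumptions: IndexedStoreSketch is a made-up class, it uses hash maps keyed on a single term, and a real persistent store would more likely keep sorted multi-column indexes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a store that maintains its own indexes on add().
final class IndexedStoreSketch {
    private final List<String[]> all = new ArrayList<>();
    private final Map<String, List<String[]>> bySubject = new HashMap<>();
    private final Map<String, List<String[]>> byPredicate = new HashMap<>();
    private final Map<String, List<String[]>> byObject = new HashMap<>();

    // Indexes are created here, at load time, by the storage itself.
    void add(String s, String p, String o) {
        String[] t = {s, p, o};
        all.add(t);
        bySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(t);
        byPredicate.computeIfAbsent(p, k -> new ArrayList<>()).add(t);
        byObject.computeIfAbsent(o, k -> new ArrayList<>()).add(t);
    }

    // Pick a candidate list from an index based on which terms are bound
    // (null means "any"); fall back to a full scan if nothing is bound.
    List<String[]> candidates(String s, String p, String o) {
        if (s != null) return bySubject.getOrDefault(s, List.of());
        if (o != null) return byObject.getOrDefault(o, List.of());
        if (p != null) return byPredicate.getOrDefault(p, List.of());
        return all;
    }

    // Same find contract as an unindexed graph; callers never see the indexes.
    List<String[]> find(String s, String p, String o) {
        List<String[]> out = new ArrayList<>();
        for (String[] t : candidates(s, p, o)) {
            if ((s == null || s.equals(t[0]))
                    && (p == null || p.equals(t[1]))
                    && (o == null || o.equals(t[2]))) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        IndexedStoreSketch store = new IndexedStoreSketch();
        store.add("ex:alice", "ex:knows", "ex:bob");
        store.add("ex:bob", "ex:knows", "ex:carol");
        store.add("ex:alice", "ex:name", "Alice");
        System.out.println(store.find(null, "ex:knows", "ex:bob").size());
    }
}
```

The query engine issues the same find calls either way; only the storage decides whether a lookup is a scan or an index probe.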


> optimizing the SPARQL algebra

Optimization has two parts. The first is rewriting the algebra into 
"better" algebra. This may include moving filters about.


The second is reordering BGPs/QuadPatterns, which is done at the storage 
or by the default StageGenerator.
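
One way to picture "moving filters about" is to find the earliest point in a BGP at which every variable used by a filter is bound, so the filter can run there instead of at the end. The sketch below is a simplified illustration under assumed names (FilterPlacementSketch, placeAfter); it is not ARQ's filter placement transform.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of filter placement within a single BGP.
final class FilterPlacementSketch {
    // Returns the index of the first pattern after which all filter variables
    // are bound, or -1 if the BGP never binds them all.
    static int placeAfter(List<String[]> bgp, Set<String> filterVars) {
        Set<String> bound = new HashSet<>();
        for (int i = 0; i < bgp.size(); i++) {
            for (String term : bgp.get(i)) {
                if (term.startsWith("?")) bound.add(term);
            }
            if (bound.containsAll(filterVars)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String[]> bgp = List.of(
            new String[] {"?s", "ex:name", "?n"},
            new String[] {"?s", "ex:age", "?a"},
            new String[] {"?s", "ex:knows", "?o"});
        // A filter such as FILTER(?a > 18) only needs ?a, so it can be applied
        // before the ex:knows pattern is evaluated.
        System.out.println(placeAfter(bgp, Set.of("?a")));
    }
}
```

Applying the filter early shrinks the set of partial solutions that later patterns have to extend.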


    Andy









Re: Suggestions for learning more about SPARQL query performance?

2021-04-05 Thread Steve Vestal
Thanks for the pointers to the excellent tutorial materials on how 
SPARQL queries are processed and how various things affect 
performance. After a little bit of digging and experimenting with ARQ, 
I’m not clear on how things go from an optimized SPARQL algebra 
expression to evaluation of leaf BGPs and when/where indexes are created 
and used.


I assumed ARQ uses the Graph interface to access an underlying Model, 
which can be backed by any of an in-memory model, a TDB, my own class 
that implements the Graph interface, or an inference or union model 
backed by any of these. The Graph interface does not have a “find” method 
that accepts a multi-triple BGP as an input (as one of the tutorials 
described), just finds for single-triple patterns.


QueryExecutionFactory lets me create a query on either a Model or a 
Dataset. What is the difference? Which one does ARQ actually operate 
on? Will ARQ create a Dataset and DataGraph if given (say) an in-memory 
Model to be queried? Graph, Dataset, and DatasetGraph all support only 
single-triple query patterns. The TDB documentation talks about 
optimizing the SPARQL algebra, but it is the ARQ API that has 
optimization configuration options. Some initial experiments with a 
couple of ARQ Context settings resulted in little impact on a test query 
issued to a series of increasingly larger in-memory models. When/where 
are indexes created and used?


Thanks for any insight.



Re: Suggestions for learning more about SPARQL query performance?

2021-03-18 Thread Steve Vestal

Never mind earlier response, thanks again!



Re: Suggestions for learning more about SPARQL query performance?

2021-03-18 Thread Steve Vestal

Thanks.  I get a "That didn't work for some reason" error from dropbox.

On 3/18/2021 9:35 AM, Rob Vesse wrote:

Steve

Think I've shared this before on-list, I produced a slide deck a long time ago 
(2014) that covers this topic more focused on ARQ

https://www.dropbox.com/s/ixetdcfesqse893/SPARQL%20Optimization%20101.pptx?dl=0

Some of the details have changed in the interim (e.g. new optimizations added, 
default order of optimizations changed etc) but a lot of the core material is 
still relevant

I would also recommend Pavel's talk that Andy linked, as Andy says it covers 
the need for query authors to frame their queries appropriately but it also 
goes into more depth around some of the core low level implementation details 
of SPARQL engines e.g. join types

Rob



Re: Suggestions for learning more about SPARQL query performance?

2021-03-18 Thread Steve Vestal
Thanks.  I'm looking to get smarter in general about formulating 
queries, particularly those with non-trivial graph structure, e.g., more 
than just a shallow tree of properties rooted in one resource: maybe 
DAGs, maybe graphs with cycles.  I am open to post-processing query results.  
(I do that already; generating queries and post-processing their results are 
steps in the overall algorithm.)


On 3/18/2021 9:19 AM, Andy Seaborne wrote:



On 17/03/2021 22:45, Steve Vestal wrote:
I'd like to dig a bit deeper into SPARQL query performance, better 
understand how different query formulations affect that, how ARQ 
configuration parameters might be used to tune that.  Can anyone 
recommend a place to start reading beyond the SPARQL book and 
language definition?


Hi Steve,

It's a bit "it depends on the query".

There was a presentation recently and, while it's not about ARQ, the 
fundamental point of getting the basic graph pattern matching 
working efficiently applies.


http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov 



Do you have specific queries in mind or is this a general enquiry?

    Andy






Re: Suggestions for learning more about SPARQL query performance?

2021-03-18 Thread Rob Vesse
I realised I had linked the wrong version; the correct link is below:

https://www.dropbox.com/s/knudzewbiuqkqvy/SPARQL%20Optimisation%20101%20Tutorial.pptx?dl=0

Apologies for the confusion,

Rob

On 18/03/2021, 14:37, "Rob Vesse"  wrote:

Steve

I think I've shared this on-list before. A long time ago (2014) I produced a 
slide deck that covers this topic with more of a focus on ARQ:


https://www.dropbox.com/s/ixetdcfesqse893/SPARQL%20Optimization%20101.pptx?dl=0

Some of the details have changed in the interim (e.g. new optimizations 
added, default order of optimizations changed etc) but a lot of the core 
material is still relevant

I would also recommend Pavel's talk that Andy linked. As Andy says, it 
covers the need for query authors to frame their queries appropriately, but it 
also goes into more depth on some of the core low-level implementation 
details of SPARQL engines, e.g. join types.

Rob

On 18/03/2021, 14:20, "Andy Seaborne"  wrote:



On 17/03/2021 22:45, Steve Vestal wrote:
> I'd like to dig a bit deeper into SPARQL query performance, better 
> understand how different query formulations affect that, how ARQ 
> configuration parameters might be used to tune that.  Can anyone 
> recommend a place to start reading beyond the SPARQL book and language 
> definition?

Hi Steve,

It's a bit "it depends on the query".

There was a presentation recently and, while it's not about ARQ, the 
fundamental point of getting the basic graph pattern matching working 
efficiently applies.


http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov

Do you have specific queries in mind or is this a general enquiry?

 Andy










Re: Suggestions for learning more about SPARQL query performance?

2021-03-18 Thread Rob Vesse
Steve

I think I've shared this on-list before. A long time ago (2014) I produced a 
slide deck that covers this topic with more of a focus on ARQ:

https://www.dropbox.com/s/ixetdcfesqse893/SPARQL%20Optimization%20101.pptx?dl=0

Some of the details have changed in the interim (e.g. new optimizations added, 
default order of optimizations changed etc) but a lot of the core material is 
still relevant

I would also recommend Pavel's talk that Andy linked. As Andy says, it covers 
the need for query authors to frame their queries appropriately, but it also 
goes into more depth on some of the core low-level implementation details 
of SPARQL engines, e.g. join types.

Rob

On 18/03/2021, 14:20, "Andy Seaborne"  wrote:



On 17/03/2021 22:45, Steve Vestal wrote:
> I'd like to dig a bit deeper into SPARQL query performance, better 
> understand how different query formulations affect that, how ARQ 
> configuration parameters might be used to tune that.  Can anyone 
> recommend a place to start reading beyond the SPARQL book and language 
> definition?

Hi Steve,

It's a bit "it depends on the query".

There was a presentation recently and, while it's not about ARQ, the 
fundamental point of getting the basic graph pattern matching working 
efficiently applies.

http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov

Do you have specific queries in mind or is this a general enquiry?

 Andy






Re: Suggestions for learning more about SPARQL query performance?

2021-03-18 Thread Andy Seaborne




On 17/03/2021 22:45, Steve Vestal wrote:
I'd like to dig a bit deeper into SPARQL query performance, better 
understand how different query formulations affect that, how ARQ 
configuration parameters might be used to tune that.  Can anyone 
recommend a place to start reading beyond the SPARQL book and language 
definition?


Hi Steve,

It's a bit "it depends on the query".

There was a presentation recently and, while it's not about ARQ, the 
fundamental point of getting the basic graph pattern matching working 
efficiently applies.


http://www.lotico.com/index.php/SPARQL_Query_Optimization_with_Pavel_Klinov

Do you have specific queries in mind or is this a general enquiry?

Andy