Re: ARQ Sparql Algebra Extension

2017-11-29 Thread Andy Seaborne



On 29/11/17 12:07, anuj kumar wrote:
Hi,
  So I am working on a performance issue with our Triple Store (which is
based on HBase)
To give a background, the query I am executing looks like:

SELECT ?s
WHERE {
  ?s a file:File .
  ?s ex:modified ?modified .
  FILTER(?modified >= "2017-11-05T00:00:00.0"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
}



Looking at the ARQ Execution plan, it is like this:


It's an algebra expression - it may or may not have been through the 
optimizer. In this case the high-level (algebra) optimizer doesn't do 
much with this query.


This does not stop your system doing some more optimization in its own 
OpExecutor.
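
If you want to see what the standard optimizer does to the query, a quick 
way is to compile and optimize the algebra yourself and print both forms. 
A rough, untested sketch (put your query text where the "..." is):

import org.apache.jena.query.Query ;
import org.apache.jena.query.QueryFactory ;
import org.apache.jena.sparql.algebra.Algebra ;
import org.apache.jena.sparql.algebra.Op ;

public class PrintAlgebra {
    public static void main(String... args) {
        String queryString = "..." ;                     // your SELECT query
        Query query = QueryFactory.create(queryString) ;
        Op raw = Algebra.compile(query) ;                // straight translation to algebra
        Op optimized = Algebra.optimize(raw) ;           // after the high-level optimizer
        System.out.println(raw) ;
        System.out.println(optimized) ;
    }
}

Algebra.optimize applies the standard ARQ high-level optimizer, so you can 
compare the result against what your engine actually executes.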




(slice 0 1000


Not in your query.


  (project (?s)
    (filter (>= ?modified "2017-11-05T00:00:00.0"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
      (bgp
        (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/File#File>)
        (triple ?s ex:modified ?modified)))))



And I have around 45000 File Objects in my Triple Store.

As you can see from the above execution plan, I first get the Subject IDs
for these 45000 File objects and then I fire a query per File Id to get the
modified date for the same. This clearly is not performant.


Not good for two reasons:

All the round trips to get the "ex:modified" when it should be server-side 
(OK - that means putting something in the HBase machine).


And also, it could do a range scan:
(think of that as a physical execution plan and the algebra as a logical 
execution plan)







My Questions:

1. Is there a better way to create a SELECT query to have a good execution
plan.


Ideally, no, but try this:

 SELECT ?s
 WHERE {
  ?s ex:modified ?modified .
  FILTER(?modified >= "2017-11-05T00:00:00.0"^^xsd:dateTime)
  ?s a file:File .
 }


changing the BGP order and doing filter placement to get:

(project (?s)
  (sequence
    (filter (>= ?modified "2017-11-05T00:00:00.0"^^xsd:dateTime)
      (bgp (triple ?s ex:modified ?modified)))
    (bgp (triple ?s rdf:type file:File))))


then in your code do:

(filter (>= ?modified "2017-11-05T00:00:00.0"^^xsd:dateTime)
  (bgp (triple ?s ex:modified ?modified)))


all in HBase (it's a single range scan).

Subclass OpExecutor and implement OpFilter to spot such cases.
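
Very roughly - an untested sketch; HBaseRangeScan is a made-up placeholder 
for whatever your storage layer provides, not a real class:

import org.apache.jena.sparql.algebra.Op ;
import org.apache.jena.sparql.algebra.op.OpBGP ;
import org.apache.jena.sparql.algebra.op.OpFilter ;
import org.apache.jena.sparql.engine.ExecutionContext ;
import org.apache.jena.sparql.engine.QueryIterator ;
import org.apache.jena.sparql.engine.main.OpExecutor ;

public class HBaseOpExecutor extends OpExecutor {

    public HBaseOpExecutor(ExecutionContext execCxt) {
        super(execCxt) ;
    }

    @Override
    protected QueryIterator execute(OpFilter opFilter, QueryIterator input) {
        Op sub = opFilter.getSubOp() ;
        // Spot (filter ... (bgp ...)) where the BGP is a single triple pattern,
        // e.g. (filter (>= ?modified ...) (bgp (triple ?s ex:modified ?modified))),
        // and hand the whole thing to the store as one range scan.
        if ( sub instanceof OpBGP && ((OpBGP)sub).getPattern().size() == 1 ) {
            // Hypothetical helper standing in for your HBase access code.
            return HBaseRangeScan.exec(opFilter.getExprs(), (OpBGP)sub, input, execCxt) ;
        }
        // Everything else: default behaviour.
        return super.execute(opFilter, input) ;
    }
}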


2. If not, then can I somehow change the generation of execution plan?
3. Is it advisable to re-write the ARQ Execution Plan to suit our needs, and
how complicated might this be?


How sophisticated do you want it to be?!

It's an open-ended question - more work, better optimization!
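
Wiring a custom OpExecutor in is then one registration call. A sketch, 
using the HBaseOpExecutor sketched above and registering it globally 
(registering on a per-dataset context should also be possible):

import org.apache.jena.query.ARQ ;
import org.apache.jena.sparql.engine.main.QC ;

QC.setFactory(ARQ.getContext(), HBaseOpExecutor::new) ;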



Thanks and please let me know if you need more information.

Thanks,
Anuj Kumar



Re: Error 500: Not in a transaction

2017-11-29 Thread Andy Seaborne
Extracting a single graph from TDB2 and using it in a general dataset 
isn't supported (yet, if ever).


Andy

On 29/11/17 11:56, Laura Morales wrote:

So, this is the Assembler file that I'm trying to use:

==
PREFIX :        <#>
PREFIX fuseki:  <http://jena.apache.org/fuseki#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tdb2:    <http://jena.apache.org/2016/tdb#>
PREFIX ja:      <http://jena.hpl.hp.com/2005/11/Assembler#>

[] rdf:type fuseki:Server ;
fuseki:services (
  <#service_tdb2>
) .

<#service_tdb2> a fuseki:Service ;
 rdfs:label  "TDB2 Service" ;
 fuseki:name "tdb2" ;
 fuseki:serviceQuery "query", "sparql" ;
 fuseki:dataset  <#dataset> ;
 .

<#dataset> a ja:RDFDataset ;
 # tdb2:location  "" ;
 # tdb2:unionDefaultGraph true ;
 ja:namedGraph[ ja:graphName  ;
   ja:graph <#g1> ] ;
 .

<#g1> a tdb2:GraphTDB2 ;
 tdb2:location "g1" ;
 .
==

What I'm trying to do, basically, is to have a dataset with multiple graphs but 
where each graph is stored in its own location. Adding another graph would be 
(ideally) like this:

<#g2> a tdb2:GraphTDB2 ;
 tdb2:location "g2" ;
 .

With the above assembler:

# select ?g where { graph ?g {} }

this query works, I get a list of graphs

# select * from ex:g1 where { ?s ?p ?o } limit 10

this query doesn't work, I get "Error 500: Not in a transaction" but I really 
don't understand how to fix this. Is it possible to do what I'm trying to achieve?



Where are the jena vocabularies defined?

2017-11-29 Thread Laura Morales
ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>
fuseki: <http://jena.apache.org/fuseki#>
tdb:    <http://jena.hpl.hp.com/2008/tdb#>
tdb2:   <http://jena.apache.org/2016/tdb#>

Visiting their URLs I only get 404.


ARQ Sparql Algebra Extension

2017-11-29 Thread anuj kumar
Hi,
 So I am working on a performance issue with our Triple Store (which is
based on HBase)
To give a background, the query I am executing looks like:

SELECT ?s
WHERE {
  ?s a file:File .
  ?s ex:modified ?modified .
  FILTER(?modified >= "2017-11-05T00:00:00.0"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
}


Looking at the ARQ Execution plan, it is like this:

(slice 0 1000
  (project (?s)
    (filter (>= ?modified "2017-11-05T00:00:00.0"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
      (bgp
        (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.com/File#File>)
        (triple ?s ex:modified ?modified)))))


And I have around 45000 File Objects in my Triple Store.

As you can see from the above execution plan, I first get the Subject IDs
for these 45000 File objects and then I fire a query per File Id to get the
modified date for the same. This clearly is not performant.

My Questions:

1. Is there a better way to create a SELECT query to have a good execution
plan.
2. If not, then can I somehow change the generation of execution plan?
3. Is it advisable to re-write the ARQ Execution Plan to suit our needs, and
how complicated might this be?

Thanks and please let me know if you need more information.

Thanks,
Anuj Kumar
-- 
*Anuj Kumar*


Error 500: Not in a transaction

2017-11-29 Thread Laura Morales
So, this is the Assembler file that I'm trying to use:

==
PREFIX :        <#>
PREFIX fuseki:  <http://jena.apache.org/fuseki#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tdb2:    <http://jena.apache.org/2016/tdb#>
PREFIX ja:      <http://jena.hpl.hp.com/2005/11/Assembler#>

[] rdf:type fuseki:Server ;
   fuseki:services (
 <#service_tdb2>
   ) .

<#service_tdb2> a fuseki:Service ;
rdfs:label  "TDB2 Service" ;
fuseki:name "tdb2" ;
fuseki:serviceQuery "query", "sparql" ;
fuseki:dataset  <#dataset> ;
.

<#dataset> a ja:RDFDataset ;
# tdb2:location  "" ;
# tdb2:unionDefaultGraph true ;
ja:namedGraph[ ja:graphName  ;
   ja:graph <#g1> ] ;
.

<#g1> a tdb2:GraphTDB2 ;
tdb2:location "g1" ;
.
==

What I'm trying to do, basically, is to have a dataset with multiple graphs but 
where each graph is stored in its own location. Adding another graph would be 
(ideally) like this:

<#g2> a tdb2:GraphTDB2 ;
tdb2:location "g2" ;
.

With the above assembler:

# select ?g where { graph ?g {} }

this query works, I get a list of graphs

# select * from ex:g1 where { ?s ?p ?o } limit 10

this query doesn't work, I get "Error 500: Not in a transaction" but I really 
don't understand how to fix this. Is it possible to do what I'm trying to 
achieve?


Re: Fuseki Reasoner

2017-11-29 Thread Andy Seaborne



On 29/11/17 08:13, Dave Reynolds wrote:
With the Jena API there's a separate validate() call on InfModel (and 
thus on OntModel) which will return a list of validation error reports.


I don't think there's any way to invoke that from Fuseki.


Thread drift 

Pull request 316 [1] is an addition to Fuseki (service side) to allow 
extension services to be added - like an "invoke inference" service or 
"validation report" service.


"/datasets/query" is the query service on /dataset - an extension 
service would be "/datasets/validation-report" and it can have an HTTP 
query string.


PR#316 is not an implementation of any new services but it means code 
can be added and wired into Fuseki for features not in the core system.


If there are customization features that don't fit into the "add a 
service" style, do please give a use case and describe what is required.


Andy

[1] https://github.com/apache/jena/pull/316

Whether the builtin rule reasoners would detect your specific issue I'm 
not sure but probably. However, in general if you want complete DL level 
validation you need to use a DL reasoner such as Pellet which is not 
included in Jena.


Dave


On 28/11/17 20:08, Hélio Azevedo wrote:

Hi

I have an ontology that was elaborated with the support of the Protégé 
tool.
In this ontology I use a data property, named "objectId". This property was
flagged as "Functional". If I insert two "objectId" for the same concept
and activate the Pellet reasoner the Protégé tool signals an error.

When loading the same ontology in Fuseki environment and inserting two
"objectId" for the same concept, Fuseki does not acknowledge an error!
The new objectId is inserted with the use of a SPARQL endpoint.

How should I proceed ?

The reasoner activation on Fuseki is done by the configuration below:

@prefix :   .
@prefix tdb:    .
@prefix rdf:    .
@prefix ja:     .
@prefix rdfs:   .
@prefix fuseki:  .


@prefix ontsense:  .
@prefix xsd:  .
@prefix owl:  .



:service1  a  fuseki:Service ;
 fuseki:dataset    :dataset ;
 fuseki:name   "ontsense" ;
 fuseki:serviceQuery   "query" , "sparql" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadWriteGraphStore
 "data" ;
 fuseki:serviceUpdate  "update" ;
 fuseki:serviceUpload  "upload" .

:dataset  a ja:DatasetTxnMem ;
   ja:defaultGraph <#model_inf_1> ;
   .

<#model_inf_1> rdfs:label "Inf-1" ;
  ja:reasoner
  [ ja:reasonerURL
  ];








Re: Fuseki Reasoner

2017-11-29 Thread Dave Reynolds
With the Jena API there's a separate validate() call on InfModel (and 
thus on OntModel) which will return a list of validation error reports.


I don't think there's any way to invoke that from Fuseki.
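
From Java it looks roughly like this - a minimal sketch; the file name and 
the OWL_MEM_RULE_INF spec are just placeholders for however you actually 
build the model:

import org.apache.jena.ontology.OntModel ;
import org.apache.jena.ontology.OntModelSpec ;
import org.apache.jena.rdf.model.Model ;
import org.apache.jena.rdf.model.ModelFactory ;
import org.apache.jena.reasoner.ValidityReport ;
import org.apache.jena.riot.RDFDataMgr ;

public class ValidateExample {
    public static void main(String... args) {
        // Load ontology + data (placeholder file name).
        Model data = RDFDataMgr.loadModel("ontology-and-data.ttl") ;
        // Wrap it in an inference model using one of the builtin rule reasoners.
        OntModel ont = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_RULE_INF, data) ;
        // Run validation and print each report entry.
        ValidityReport report = ont.validate() ;
        System.out.println("valid: " + report.isValid() + ", clean: " + report.isClean()) ;
        report.getReports().forEachRemaining(System.out::println) ;
    }
}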

Whether the builtin rule reasoners would detect your specific issue I'm 
not sure but probably. However, in general if you want complete DL level 
validation you need to use a DL reasoner such as Pellet which is not 
included in Jena.


Dave


On 28/11/17 20:08, Hélio Azevedo wrote:

Hi

I have an ontology that was elaborated with the support of the Protégé tool.
In this ontology I use a data property, named "objectId". This property was
flagged as "Functional". If I insert two "objectId" for the same concept
and activate the Pellet reasoner the Protégé tool signals an error.

When loading the same ontology in Fuseki environment and inserting two
"objectId" for the same concept, Fuseki does not acknowledge an error!
The new objectId is inserted with the use of a SPARQL endpoint.

How should I proceed ?

The reasoner activation on Fuseki is done by the configuration below:

@prefix :   .
@prefix tdb:    .
@prefix rdf:    .
@prefix ja: .
@prefix rdfs:   .
@prefix fuseki:  .


@prefix ontsense:  .
@prefix xsd:  .
@prefix owl:  .



:service1  a  fuseki:Service ;
 fuseki:dataset    :dataset ;
 fuseki:name   "ontsense" ;
 fuseki:serviceQuery   "query" , "sparql" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadWriteGraphStore
 "data" ;
 fuseki:serviceUpdate  "update" ;
 fuseki:serviceUpload  "upload" .

:dataset  a ja:DatasetTxnMem ;
   ja:defaultGraph <#model_inf_1> ;
   .

<#model_inf_1> rdfs:label "Inf-1" ;
  ja:reasoner
  [ ja:reasonerURL
  ];