Reasoners for RDFS + owl:sameAs: performance, stability & best practices

2018-02-19 Thread Andreas Kahl
Hello everyone, 

I am currently developing a small Jena Model that should be able to do
RDFS inferencing plus owl:sameAs. From the documentation I learned that
the minimal reasoner for that is OWLMini. During development I
experienced severe performance bottlenecks whenever a runtime model
contains too many owl:sameAs links, and generally for nearly all models
exceeding 1,000 statements. When these bottlenecks occur, most of the
tests simply freeze at some point; sometimes selecting a Statement with
a SimpleSelector consisting of a subject URI, a predicate URI and a
null object takes 20 seconds.
Thread blocking should not be the problem, as I run my integration
tests single-threaded, especially when I am experiencing failures.
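One way to see why owl:sameAs links hurt so much: if the owl:sameAs closure is fully materialized (reflexive, symmetric and transitive), a group of k mutually equivalent resources yields k*k sameAs triples, and every triple whose subject has a aliases and whose object has b aliases is replicated a*b times. The stdlib-only Java sketch below only does that arithmetic; the class and method names are hypothetical, and it makes no claim about whether OWLMini derives every copy eagerly or on demand.

```java
import java.util.List;

/** Hypothetical helper: size of a fully materialized owl:sameAs closure. */
final class SameAsBlowup {

    /**
     * A group of k resources that are all owl:sameAs each other yields k * k
     * sameAs triples once reflexivity, symmetry and transitivity are applied.
     */
    static long sameAsTriples(List<Integer> groupSizes) {
        long total = 0;
        for (int k : groupSizes) {
            total += (long) k * k;
        }
        return total;
    }

    /**
     * One asserted triple (s p o) is replicated for every alias of s and every
     * alias of o, giving a * b triples (the original included).
     */
    static long copiesOfTriple(int subjectAliases, int objectAliases) {
        return (long) subjectAliases * objectAliases;
    }
}
```

For example, ten resources that are all owl:sameAs each other already give 100 sameAs triples, and a single triple linking two such groups of ten becomes 100 triples.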

I was able to contain this by using models without inferencing while
collecting and adding data spidered from the web, and by adding
ontologies last, only where absolutely needed. I also use an internal
whitelist of domains my spider is allowed to fetch data from; therefore
I remove all owl:sameAs statements whose object URIs are not in this
whitelist. Finally, in my querying methods, I clone that basic model
with the collected data and wrap it in an InfModel:

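import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;

// LOG is the enclosing class's logger (not shown)
protected static Model getInfModelFrom(Model model) {
    final long size = model.size();
    LOG.debug("getInfModelFrom: Input size: " + size);
    // Copy only the asserted triples (the raw model, if the input is already an InfModel)
    final Model copy = ModelFactory.createDefaultModel();
    copy.add(model instanceof InfModel ? ((InfModel) model).getRawModel() : model);
    // Wrap the copy with Jena's OWL Mini rule reasoner
    final InfModel infModel =
            ModelFactory.createInfModel(ReasonerRegistry.getOWLMiniReasoner(), copy);
    return infModel;
}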
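The whitelist step described above might look roughly like this. It is a stdlib-only sketch: SimpleTriple is a hypothetical stand-in for a Jena Statement, and it assumes the whitelist holds host names (such as d-nb.info) rather than full URI prefixes. In Jena itself, the candidates would come from model.listStatements(null, OWL.sameAs, (RDFNode) null), and the rejected ones could be dropped with model.remove(...) before the model is passed to getInfModelFrom.

```java
import java.net.URI;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a Jena Statement (all terms as strings). */
record SimpleTriple(String subject, String predicate, String object) {}

final class SameAsWhitelist {
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    /** Keeps all triples except owl:sameAs links whose object host is not whitelisted. */
    static List<SimpleTriple> filter(List<SimpleTriple> triples, Set<String> allowedHosts) {
        return triples.stream()
                .filter(t -> keep(t, allowedHosts))
                .collect(Collectors.toList());
    }

    static boolean keep(SimpleTriple t, Set<String> allowedHosts) {
        if (!OWL_SAME_AS.equals(t.predicate())) {
            return true; // only owl:sameAs links are subject to the whitelist
        }
        String host = hostOf(t.object());
        // null check first: immutable sets from Set.of reject contains(null)
        return host != null && allowedHosts.contains(host);
    }

    /** Host of an absolute URI, or null if the string has none or is not a valid URI. */
    static String hostOf(String uri) {
        try {
            return URI.create(uri).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

Filtering before the InfModel is built keeps the reasoner from ever seeing the rejected links, which matches the order of steps described in the message.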

The only Ontology I am using is
http://d-nb.info/standards/elementset/gnd# . 

I suppose that the reasoner I use is much too powerful for the seemingly
simple owl:sameAs. Is there a more basic option that understands
owl:sameAs in addition to RDFS? No other OWL axioms are needed.
Are there any best practices for inferencing over relatively small
in-memory models (<10,000 statements, most <5,000)? I found some
information on the web that a simple 'Equality Reasoner' is in the
works. Would that be a good choice? Will it be available any time soon?
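On the question of a more basic option for owl:sameAs: a common alternative to letting a reasoner materialize equality is canonicalization (sometimes called smushing). owl:sameAs pairs are merged into equivalence classes with a union-find structure, and every resource is rewritten to one representative URI before the data reaches a plain RDFS reasoner. This is a swapped-in technique, not the 'Equality Reasoner' mentioned above and not a Jena API; SameAsIndex is a hypothetical name used only for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical union-find index over owl:sameAs pairs (URIs as strings). */
final class SameAsIndex {
    private final Map<String, String> parent = new HashMap<>();

    /** Representative of the class containing uri, with path compression. */
    String find(String uri) {
        String p = parent.getOrDefault(uri, uri);
        if (p.equals(uri)) {
            return uri; // unseen URIs and roots represent themselves
        }
        String root = find(p);
        parent.put(uri, root);
        return root;
    }

    /** Records (a owl:sameAs b) by merging the two classes. */
    void sameAs(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (ra.equals(rb)) {
            return;
        }
        // Deterministic choice: the lexicographically smaller root becomes representative
        if (ra.compareTo(rb) < 0) {
            parent.put(rb, ra);
        } else {
            parent.put(ra, rb);
        }
    }

    boolean same(String a, String b) {
        return find(a).equals(find(b));
    }
}
```

Each remaining triple (s, p, o) would then be added as (find(s), p, find(o)). The trade-off is that queries must also use the representative URIs, and the original aliases are only recoverable from the index.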

Thanks for any hints
Andreas



Re: Configuring fuseki with TDB2 and OWL reasoning

2018-02-19 Thread Eric Boisvert
made the change.

Now I get

Result: failed with message "Not in a transaction"

when I try to load from the interface


log:

(...)

[2018-02-19 17:21:31] Fuseki INFO  [5] Filename: test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=4 Triples=4 Quads=0
[2018-02-19 17:21:31] Fuseki INFO  [5] 500 Not in a transaction (29 ms)

Thank you very much for helping me with this.

2018-02-19 9:00 GMT-05:00 Andy Seaborne :

>
>
> On 19/02/18 12:46, Eric Boisvert wrote:
>
>> :dataset a ja:RDFDataset ;
>>  tdb2:defaultGraph   <#model_inf> .
>> ## tdb2:location  "c:\\fuseki/run/databases/gsip".
>>
>>
> ja:defaultGraph, not tdb2:defaultGraph
>
> :dataset is a plain, in-memory dataset to hold the InfModel
>
> Andy
>


Re: Configuring fuseki with TDB2 and OWL reasoning

2018-02-19 Thread Andy Seaborne



On 19/02/18 12:46, Eric Boisvert wrote:

:dataset a ja:RDFDataset ;
 tdb2:defaultGraph   <#model_inf> .
## tdb2:location  "c:\\fuseki/run/databases/gsip".



ja:defaultGraph, not tdb2:defaultGraph

:dataset is a plain, in-memory dataset to hold the InfModel

Andy


Re: Configuring fuseki with TDB2 and OWL reasoning

2018-02-19 Thread Eric Boisvert
Sorry.

Here's my new config file:

:service_tdb_all  a   fuseki:Service ;
    fuseki:dataset                    :dataset ;
    fuseki:name                       "gsip" ;
    fuseki:serviceQuery               "query" , "sparql" ;
    fuseki:serviceReadGraphStore      "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:serviceUpdate              "update" ;
    fuseki:serviceUpload              "upload" .

:dataset a ja:RDFDataset ;
    tdb2:defaultGraph   <#model_inf> .
## tdb2:location  "c:\\fuseki/run/databases/gsip".

<#model_inf> a ja:InfModel ;
    ja:baseModel <#graph> ;
    ja:reasoner [
        ja:reasonerURL
    ] .

<#graph> rdf:type tdb2:GraphTDB ;
    tdb2:dataset :datasetTDB2 .

## Storage
:datasetTDB2 rdf:type tdb2:DatasetTDB2 ;
    tdb2:location "c:/fuseki/run/databases/gsip" .



log file:

[2018-02-19 07:41:12] Server INFO  Apache Jena Fuseki 3.6.0
[2018-02-19 07:41:12] Config INFO  FUSEKI_HOME=C:\fuseki\.
[2018-02-19 07:41:12] Config INFO  FUSEKI_BASE=C:\fuseki\run
[2018-02-19 07:41:12] Config INFO  Shiro file: file://C:\fuseki\run\shiro.ini
[2018-02-19 07:41:12] Config INFO  Configuration file: C:\fuseki\run\config.ttl
[2018-02-19 07:41:13] Config INFO  Load configuration: file:///C:/fuseki/run/configuration/gsip.ttl
[2018-02-19 07:41:13] Config INFO  Register: /gsip
[2018-02-19 07:41:13] Server INFO  Started 2018/02/19 07:41:13 EST on port 3030
[2018-02-19 07:41:34] Fuseki INFO  [1] POST http://localhost:3030/gsip/sparql
[2018-02-19 07:41:34] Fuseki INFO  [1] POST /gsip :: 'sparql' :: [application/x-www-form-urlencoded charset=utf-8] ?
[2018-02-19 07:41:34] Fuseki INFO  [1] Query = DESCRIBE <http://domain.org/id/rechargeArea/r1>
[2018-02-19 07:41:34] Fuseki INFO  [1] 200 OK (40 ms)
[2018-02-19 07:41:47] Fuseki INFO  [2] POST http://localhost:3030/gsip/data
[2018-02-19 07:41:47] Fuseki INFO  [2] POST /gsip :: 'data' :: [multipart/form-data] ?
[2018-02-19 07:41:47] Fuseki INFO  [2] Filename: test.ttl, Content-Type=application/octet-stream, Charset=null => Turtle : Count=4 Triples=4 Quads=0
[2018-02-19 07:41:47] Fuseki INFO  [2] 200 OK (41 ms)
[2018-02-19 07:41:51] Fuseki INFO  [3] POST http://localhost:3030/gsip/sparql
[2018-02-19 07:41:51] Fuseki INFO  [3] POST /gsip :: 'sparql' :: [application/x-www-form-urlencoded charset=utf-8] ?
[2018-02-19 07:41:51] Fuseki INFO  [3] Query = DESCRIBE <http://domain.org/id/rechargeArea/r1>
[2018-02-19 07:41:51] Fuseki INFO  [3] 200 OK (24 ms)


I loaded my test file and there is still no inference.

On an interesting note: no database was created in the /databases/ folder.

Thanks for your help.

Eric
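For reference, the entailment expected from the test data quoted below (hy:rechargeZone declared owl:inverseOf hy:sourceOf) is the owl:inverseOf rule: from (p owl:inverseOf q) and (x p y) infer (y q x), and from (x q y) infer (y p x). The stdlib-only Java sketch below forward-chains just that rule to a fixpoint over a hypothetical Triple record. It is not how Jena's rule reasoners are implemented; it only makes explicit what the DESCRIBE result should contain once the InfModel is really the dataset's default graph.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical minimal triple type used only for this sketch. */
record Triple(String s, String p, String o) {}

final class InverseOfRule {
    static final String INVERSE_OF = "http://www.w3.org/2002/07/owl#inverseOf";

    /** Input plus every triple entailed by owl:inverseOf, repeated until nothing new is added. */
    static Set<Triple> closure(Collection<Triple> input) {
        Set<Triple> out = new LinkedHashSet<>(input);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so new triples can be added safely
            List<Triple> snapshot = List.copyOf(out);
            for (Triple decl : snapshot) {
                if (!INVERSE_OF.equals(decl.p())) {
                    continue;
                }
                String p = decl.s();
                String q = decl.o();
                for (Triple t : snapshot) {
                    if (t.p().equals(p)) {
                        // (x p y) => (y q x)
                        changed |= out.add(new Triple(t.o(), q, t.s()));
                    }
                    if (t.p().equals(q)) {
                        // (x q y) => (y p x)
                        changed |= out.add(new Triple(t.o(), p, t.s()));
                    }
                }
            }
        }
        return out;
    }
}
```

The hy: namespace URI is elided in the archive, so any namespace used to try this out is a placeholder.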


2018-02-18 10:10 GMT-05:00 Andy Seaborne :

> Barry NL's comment on StackOverflow looks relevant:
>
> > :dataset a ja:RDFDataset ;
> >  ja:defaultGraph   <#model_inf> ;
> > tdb:location  "c:\\fuseki/run/databases/gsip".
> >
> >
> > <#model_inf> a ja:InfModel ;
> >   ja:baseModel <#graph> ;
> >   ja:reasoner [
> >   ja:reasonerURL 
> >   ] .
> >
> > <#graph> rdf:type tdb:GraphTDB ;
> >   tdb:dataset :dataset .
> >
>
> What we want is:
>
> service
>   -> Dataset with inference graph as default graph
>   -> inference graph uses TDB2 default graph
>   -> TDB2 dataset whose default graph is the storage
>
> -
> ## Dataset to hold the inference graph
> :dataset a ja:RDFDataset ;
> ja:defaultGraph   <#model_inf> .
>
> <#model_inf> a ja:InfModel ;
>   # Inference graph uses #graph as base
>   ja:baseModel <#graph> ;
>   ja:reasoner [
>   ja:reasonerURL 
>   ] .
>
> ## Base is a TDB2 graph.
> <#graph> rdf:type tdb2:GraphTDB ;
>    tdb2:dataset :datasetTDB2 .
>
> ## Storage
> :datasetTDB2 rdf:type tdb2:DatasetTDB2 ;
>    tdb2:location "c:/fuseki/run/databases/gsip" .
> -
>
> I have a suspicion that the :dataset loopback results in non-deterministic
> behaviour, because one resource has two roles, and that it working with
> TDB1 was down to luck.
>
> Andy
>
> On 16/02/18 13:51, Eric Boisvert wrote:
>
>> So, here's my complete test
>>
>> small naive dataset
>>
>> @prefix rdf:  .
>> @prefix rdfs:  .
>> @prefix owl:  .
>> @prefix hy: .
>>
>> hy:rechargeZone a owl:ObjectProperty;
>> owl:inverseOf hy:sourceOf.
>> hy:sourceOf a owl:ObjectProperty.
>>
>> 
>> hy:rechargeZone .
>>
>> so,
>>
>> DESCRIBE .
>>
>> should return
>>
>>  hy:sourceOf <http://domain.org/id/waterwells/0001>.
>>
>> This configuration runs, but no infere