Re: Re: Starting Fuseki server from Jena

2024-09-24 Thread Lorenz Buehmann
There was an issue on this which seems to affect only Windows users:
https://github.com/jetty/jetty.project/issues/6661


But in fact this is just a DEBUG-level log message telling you that your JVM does not support SO_REUSEPORT on that network; it should not be an issue in general.
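
If the message clutters the log output, the Jetty logger can simply be raised above DEBUG. A minimal sketch for an embedded Fuseki setup, assuming Log4j2 is the logging backend (the logger name, level and dataset are illustrative, not from the original setup):

```java
import org.apache.jena.fuseki.main.FusekiServer;
import org.apache.jena.sparql.core.DatasetGraphFactory;
import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.config.Configurator;

public class QuietJetty {
    public static void main(String[] args) {
        // Raise the Jetty logger above DEBUG so the SO_REUSEPORT message is not printed.
        Configurator.setLevel("org.eclipse.jetty", Level.INFO);

        // ... then build and start the embedded Fuseki server as usual ...
        FusekiServer server = FusekiServer.create()
                .port(3330)
                .add("/ds", DatasetGraphFactory.createTxnMem())
                .build();
        server.start();
    }
}
```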


On 25.09.24 03:50, Zlatareva, Neli (Computer Science) wrote:

java.lang.UnsupportedOperationException: 'SO_REUSEPORT' not supported


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany


Re: Re: Where did import org.apache.jena.rdf.model.SimpleSelector go?

2024-09-23 Thread Lorenz Buehmann

As mentioned in the docs, you can simply do

listStatements( S, P, O )

instead of

listStatements( new SimpleSelector( S, P, O ) )
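
For example, a minimal migration sketch (the model and the S/P resources below are placeholders, not from the original code):

```java
import org.apache.jena.rdf.model.*;

public class SelectorMigration {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        Resource s = model.createResource("http://example.org/s");
        Property p = model.createProperty("http://example.org/p");

        // Jena 4.x and earlier (SimpleSelector was removed in 5.x):
        // StmtIterator it = model.listStatements(new SimpleSelector(s, p, (RDFNode) null));

        // Jena 5.x: pass subject/predicate/object directly; null acts as a wildcard.
        StmtIterator it = model.listStatements(s, p, (RDFNode) null);
        it.forEachRemaining(System.out::println);
    }
}
```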



On 23.09.24 18:52, Simon Bin wrote:

It was deprecated here https://github.com/apache/jena/issues/1970 and
removed here https://github.com/apache/jena/issues/2021. I guess the
tutorials and websites still need to be freshened up. Basically you
don't need it anymore.

On Mon, 2024-09-23 at 16:58 +0200, Andreas Kahl wrote:

Hello everyone,

after a long time, I came back to my old Jena code, which used Jena 3.17, and updated the dependency to Jena 5.1.0.
Now import org.apache.jena.rdf.model.SimpleSelector is gone...

I can't find any trace of this class in versions newer than 4.4.0.
Nonetheless, the Tutorial still contains this class:
https://jena.apache.org/tutorials/rdf_api.html#ch-Querying-a-Model

Is there something wrong with my Maven repo, or have Selector / SimpleSelector moved to another package that Google can't find?

Thanks & Best Regards
Andreas





--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Identifying the currently-running queries

2024-09-05 Thread Lorenz Buehmann

Hi,

we're running Fuseki 5.1.0 and are using the logging of Jena. That works 
fine for us.


There is no logging of SPARQL Update statements though, because those can get too large in terms of text.



By the way, I would not run an ancient Fuseki 3.4.0 in production anymore - is there a reason for this? Just because it works?



Cheers,

Lorenz

On 05.09.24 14:04, Hugo Mills wrote:


Hi, all,

We’ve got a heavily-used webapp backed by Fuseki, and we’re having 
issues with the load on the Fuseki server. It frequently heads into a 
storm of high load average, with the CPU usage pegged at 600% (on 6 
cores), and then the app grinds to a halt and we have to restart the 
database. We’re trying to understand why this is happening. Is there 
the ability in Fuseki to get a list of the currently-running queries 
at any given point in time, including the query text itself, and 
preferably also the amount of time each one has been running for?


We’re running Fuseki from Jena 3.4.0, if that makes a difference to 
the answer.


Thanks,

Hugo.

Hugo Mills

Development Team Leader

agrimetrics.co.uk 

Reading Enterprise Centre, Whiteknights Road, Reading, UK, RG6 6BU







--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany


Re: Re: RDFS subPropertyOf property path query performance

2024-05-14 Thread Lorenz Buehmann

Hi Christian,

thanks for sharing a self-contained project.

What happens if you avoid the FILTER ... IN expression(s)? They can be expensive because the filter is only applied after the pattern has been evaluated. Maybe use inline data (VALUES) instead to restrict the evaluation to the given resources:



PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <https://stati-cal.com/ontology/al/>
PREFIX aln: <https://stati-cal.com/ontology/al/bnode/>
SELECT (COUNT(1) AS ?cnt)
WHERE
{

  ?cmit a :StaticMethod ;
  :name "Commit" .
  ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
 a ?procedureOrTrigger .
  VALUES ?procedureOrTrigger {:Procedure :Trigger}

  VALUES ?ownerType {
    :Table :TableExtension :Page :PageExtension
    :Report :ReportExtension :Codeunit :XmlPort
    :Query :ControlAddIn :Enum :EnumExtension
    :PageCustomization :Profile
    :DotNetPackage :Interface
    :PermissionSet :PermissionSetExtension
    :Entitlement :DotNet
  }
  ?owner  :contains+  ?decl ;
  a ?ownerType .

  ?decl :localKey ?localkey .
}


That should at least be a bit faster if I'm not wrong.

You should also provide the TDB database with some statistics about the data. Use tdb2.tdbstats to create a stats.opt file, which you put in the Data-001 directory. This helps the optimizer reorder triple patterns. It won't help for property paths, but in general it's a good idea to give it a try.



Cheers,
Lorenz

On 14.05.24 10:21, Christian Clausen wrote:

Hi Lorenz,

I have shared a Java project which includes data here:
https://drive.google.com/file/d/1MOQXNmTEmJBnzLIgQ3pQViQbiyvTT76q/view?usp=sharing

In GraphServer.java there is a variable USE_RDFS, which you can use to
switch between using RDFS and not.

In preparing the repro I realized that the performance difference only
occurs on more complex queries than what I originally thought.

The test query is this:

PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX :<https://stati-cal.com/ontology/al/>
PREFIX aln:<https://stati-cal.com/ontology/al/bnode/>

SELECT (COUNT(1) AS ?cnt)
WHERE
   { ?cmit a :StaticMethod ;
   :name "Commit" .
 ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
   a ?procedureOrTrigger .
 FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
 ?owner  :contains+  ?decl ;
 a ?ownerType .
 FILTER(?ownerType in (:Table, :TableExtension, :Page, :PageExtension,
:Report, :ReportExtension, :Codeunit, :XmlPort,
:Query, :ControlAddIn, :Enum, :EnumExtension,
:PageCustomization, :Profile,
:DotNetPackage, :Interface,
:PermissionSet, :PermissionSetExtension,
:Entitlement, :DotNet))
 ?decl :localKey ?localkey .
   }

(For simplicity, I have used count instead of selecting ?owner and
?localKey which we use in our application.)

This is how I ran the tests:

curl -v -X POST --header "Content-Type: application/sparql-query"
--data-binary @test1.sparql http://localhost:3030/CodeGraph/query

With RDFS enabled, the query runs in about 80 seconds.

With RDFS disabled, it takes about 1-2 seconds.

Interestingly, if I leave out the part that begins with ?owner...:

SELECT (COUNT(1) AS ?cnt)
WHERE
   { ?cmit a :StaticMethod ;
   :name "Commit" .
 ?decl (:possiblyCommittingProceduralFlow)+ ?cmit ;
   a ?procedureOrTrigger .
 FILTER(?procedureOrTrigger in (:Procedure, :Trigger))
   }

Then performance is similar (and good) with and without RDFS.

/Christian


On Mon, 13 May 2024 at 12:04, Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:


Hi,

does it mean the ?origin is always bound to a resource in the graph? Can
you share the whole query maybe?

How long are the sequences in the graph? How many paths starting from a
node, i.e. what's the out degree in general per node?

Also, would it be possible to share some kind of data for investigation?

In general, the RDFS inference you're using is pretty light-weight, running at query eval time - all it does at triple pattern eval time is to incorporate, in your case, the rdfs:subPropertyOf triples from the schema, but that expansion might indeed grow at each step on the path.


Cheers,

Lorenz

On 13.05.24 09:41, Christian Clausen wrote:

In our graph we have :flow properties and need to distinguish different
kinds of flows, :flowA and :flowB.

We modelled this in RDFS:

  :flowA rdfs:subPropertyOf :flow
  :flowB rdfs:subPropertyOf :flow

Some of our SPARQL queries use :flow+ and some use :flowA+, always from

an

origin:

  ?origin :flowA+ :?result

or

  ?origin :flow+ :?result

If we start Fusek

Re: RDFS subPropertyOf property path query performance

2024-05-13 Thread Lorenz Buehmann

Hi,

does it mean the ?origin is always bound to a resource in the graph? Can 
you share the whole query maybe?


How long are the sequences in the graph? How many paths starting from a 
node, i.e. what's the out degree in general per node?


Also, would it be possible to share some kind of data for investigation?

In general, the RDFS inference you're using is pretty light-weight, running at query eval time - all it does at triple pattern eval time is to incorporate, in your case, the rdfs:subPropertyOf triples from the schema, but that expansion might indeed grow at each step on the path.



Cheers,

Lorenz

On 13.05.24 09:41, Christian Clausen wrote:

In our graph we have :flow properties and need to distinguish different
kinds of flows, :flowA and :flowB.

We modelled this in RDFS:

 :flowA rdfs:subPropertyOf :flow
 :flowB rdfs:subPropertyOf :flow

Some of our SPARQL queries use :flow+ and some use :flowA+, always from an
origin:

 ?origin :flowA+ :?result

or

 ?origin :flow+ :?result

If we start Fuseki *without* RDFS, the following queries finish in a second
or two:

 ?origin :flowA+ :?result
 ?origin :(flowA | :flowB)+ :?result

If we start Fuseki *with* RDFS, the following queries take about 85 seconds:

 ?origin :flowA+ :?result
 ?origin :flow+ :?result




What is causing this difference in performance? Are we missing something or
should we avoid RDFS for optimal performance? Any other alternatives?

Our overall process is:

1. Generate TTL files with :flowA and :flowB properties (not :flow other
than implied by rdfs:subPropertyOf)
2. Load with TDB2 loader
3. Start Fuseki (with RDFS vocabulary or not)

Here follows the code we use to start Fuseki.

Without RDFS:

 *Dataset data = TDB2Factory.connectDataset(options.directory);*

 FusekiServer server = FusekiServer.create()
 .port(options.port)
 .loopback(true)
 *.addDataset(options.datasetName, data.asDatasetGraph())*
 .addEndpoint(options.datasetName, "query", Operation.Query)
 // shortestPath
 .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
new ShortestPathService())
 .addEndpoint(options.datasetName, "shortestPath",
shortestPathOp)
 // diagnostics
 .verbose(true)
 .enablePing(true)
 .enableStats(true)
 .enableMetrics(true)
 .enableTasks(true)
 .build();

 // Start
 server.start();

With RDFS:



*Dataset data = TDB2Factory.connectDataset(options.directory);Graph
vocabulary = RDFDataMgr.loadGraph(options.vocabularyFileName);
DatasetGraph dsg = RDFSFactory.datasetRDFS(data.asDatasetGraph(),
vocabulary);*

 FusekiServer server = FusekiServer.create()
 .port(options.port)
 .loopback(true)
 *.addDataset(options.datasetName,dsg)*
 .addEndpoint(options.datasetName, "query", Operation.Query)
 // shortestPath
 .registerOperation(shortestPathOp, WebContent.contentTypeJSON,
new ShortestPathService())
 .addEndpoint(options.datasetName, "shortestPath",
shortestPathOp)
 // diagnostics
 .verbose(true)
 .enablePing(true)
 .enableStats(true)
 .enableMetrics(true)
 .enableTasks(true)
 .build();

 // Start
 server.start();


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Decompose a query into Star, Path and Sink shapes

2024-04-18 Thread Lorenz Buehmann

Hi,

can you give an example of what you're trying to achieve? And what is the use case? Query optimization? Text-to-SPARQL? Are you talking about SPARQL, ShEx or SHACL?


In the case of SPARQL queries, decomposing a query into sub-structures depends on what you consider the parts to be for your use case.


Also, I'm not sure what you consider [Star|Path|Sink] patterns to be - are those common terms?



On 17.04.24 23:44, Hashim Khan wrote:

Hi,

Using Apache Jena, decomposing a SPARQL query into sub-queries based on its
shape is what I need.


Indeed, you can see a SPARQL query as a graph - some people talk about 
query graphs - and then you could apply common graph-based techniques to 
divide the graph into parts. Traversing the SPARQL structure can be done 
on the "Element" layer in Jena code.


But personally, I would simply convert the SPARQL query (or maybe the algebra) to some JGraphT graph and then work with the methods from this graph-driven framework. At least that's how we were doing it in the past. Indeed, this is mostly



  So far, I found a class for StartPattern and I am
wondering if someone can tell me about such a resource for "path
pattern" and "Sink pattern" ... Thanks in advance
I don't see a class StartPattern in the code; that's only something in the context of ShEx.



Best,


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: RE: Performance question with joins

2024-04-03 Thread Lorenz Buehmann

To share some numbers with and without an artificial join key variable:

Query used from initial thread:

select (count(*) as ?C)
where {
  {
    select ?X ?Y (struuid() as ?UUID)
    where {
      values ?X_i { 0 1 2 3 4 5 6 7 8 9 }
      values ?X_j { 0 1 2 3 4 5 6 7 8 9 }
      bind ( ?X_i + 10 * ?X_j as ?X)
      values ?Y_i { 0 1 2 3 4 5 6 7 8 9 }
      values ?Y_j { 0 1 2 3 4 5 6 7 8 9 }
      bind ( ?Y_i + 10 * ?Y_j as ?Y)
    }
  }
  {
    select ?X ?Y
    where {
      {
        select ?X ?Y (rand() as ?RAND)
        where {
          values ?X_i { 0 1 2 3 4 5 6 7 8 9 }
          values ?X_j { 0 1 2 3 4 5 6 7 8 9 }
          bind ( ?X_i + 10 * ?X_j as ?X)
          values ?Y_i { 0 1 2 3 4 5 6 7 8 9 }
          values ?Y_j { 0 1 2 3 4 5 6 7 8 9 }
          bind ( ?Y_i + 10 * ?Y_j as ?Y)
        }
      }
      filter (?RAND < 0.95)
    }
  }
}


and with an artificial join key

select (count(*) AS ?cnt)
where {
  {
    select
    (CONCAT(STR(?X), "|", STR(?Y)) AS ?key) (struuid() as ?UUID)
    where {
  values ?X_i { 0 1 2 3 4 5 6 7 8 9 10 }
  values ?X_j { 0 1 2 3 4 5 6 7 8 9 10 }
  bind ( ?X_i + 10 * ?X_j as ?X)
  values ?Y_i { 0 1 2 3 4 5 6 7 8 9 10 }
  values ?Y_j { 0 1 2 3 4 5 6 7 8 9 10 }
  bind ( ?Y_i + 10 * ?Y_j as ?Y)
    }
  }
  {
    select
    (CONCAT(STR(?X), "|", STR(?Y)) AS ?key)
    where {
  {
    select ?X ?Y (rand() as ?RAND)
    where {
      values ?X_i { 0 1 2 3 4 5 6 7 8 9 10 }
      values ?X_j { 0 1 2 3 4 5 6 7 8 9 10 }
      bind ( ?X_i + 10 * ?X_j as ?X)
      values ?Y_i { 0 1 2 3 4 5 6 7 8 9 10 }
      values ?Y_j { 0 1 2 3 4 5 6 7 8 9 10 }
      bind ( ?Y_i + 10 * ?Y_j as ?Y)
    }
  }
  filter (?RAND < 0.95)
    }
  }
}

To increase data I also increased the sequence from 10 to 20 and 30 in 
further runs.


I disabled the join rewriter via flag because otherwise the join is 
being rewritten to a sequence, so I'm forcing the hash join in this query:


arq --explain --time --set arq:optIndexJoinStrategy=false --query test.rq


#0..10:

##Without:

---------
| cnt   |
=========
| 18880 |
---------
Time: 2.283 sec

##With:

---------
| cnt   |
=========
| 18849 |
---------
Time: 0.868 sec


#0..20:

##Without:

----------
| cnt    |
==========
| 802195 |
----------
Time: 47.565 sec

##With:

----------
| cnt    |
==========
| 802318 |
----------
Time: 3.282 sec


#0..30:

##Without:

-----------
| cnt     |
===========
| 8066735 |
-----------
Time: 688.268 sec

##With:

-----------
| cnt     |
===========
| 8072254 |
-----------
Time: 13.429 sec



So for the current query it's way faster to force the join to be done on all matching variables (X, Y) - but I don't know the implications for other queries.
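
For completeness, the same experiment can be run from Java by setting the optimizer flag on the global ARQ context. A minimal sketch; it assumes the context symbol ARQ.optIndexJoinStrategy corresponds to the arq:optIndexJoinStrategy CLI flag, and the dataset and query below are placeholders:

```java
import org.apache.jena.query.ARQ;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class ForceHashJoin {
    public static void main(String[] args) {
        // Global equivalent of "arq --set arq:optIndexJoinStrategy=false":
        // disable the index/substitution join so ARQ falls back to a hash join.
        ARQ.getContext().set(ARQ.optIndexJoinStrategy, false);

        Dataset ds = DatasetFactory.create();   // placeholder dataset
        String q = "SELECT (COUNT(*) AS ?cnt) WHERE { ?s ?p ?o }"; // placeholder query

        try (QueryExecution qexec = QueryExecutionFactory.create(q, ds)) {
            ResultSetFormatter.out(qexec.execSelect());
        }
    }
}
```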



Cheers,
Lorenz

On 02.04.24 21:13, John Walker wrote:

Hi James



-Original Message-
From: James Anderson
Sent: Tuesday, 2 April 2024 18:53
To: usersjena. apache. org
Subject: Re: Performance question with joins

good evening;


On 2. Apr 2024, at 12:27, Lorenz Buehmann 
leipzig.de> wrote:

if this description is accurate


according to the hash join implementation in Jena in class

AbstractIterHashJoin a join key is created via line

 joinKey = JoinKey.createVarKey(varsLeft, varsRight) ; That method
does take only the first variable in both bindings as join key instead of all

matching variables. In our case that would probably be ?wafer I guess?

and the estimate of constituent cardinality is correct,


  The cardinality on the left and right side of the join is around 125k.

then, depending on the distribution of the ?wafer values, this could produce a
large intermediate cross-join.

In my test case, every solution will have the same value for ?wafer variable.
So, if that is being used as the join key for a hash join, then it can explain 
the problem.

I tried running the query without projecting ?wafer from the subqueries (so 
forcing another variable to be used as the join key) and it completes in 13 
seconds.

I also tried other things like moving ?wafer to be the last variable in the 
select of both subqueries and renaming the variables ?X_ and ?Y_ to ?a and ?b, 
but that does not appear to help.
So, the logic for taking the "first" variable in both bindings is something of 
a mystery.


what is the cardinality of the results prior to the distinct operation?

Somewhere around 120k.

That distinct is essentially redundant, each position on each wafer will be 
associated to a globally unique uid value.


---
james anderson |ja...@dydra.com  |https://dydra.com


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany


Re: RE: Performance question with joins

2024-04-02 Thread Lorenz Buehmann

Hi,

according to the hash join implementation in Jena in class 
AbstractIterHashJoin a join key is created via line


joinKey = JoinKey.createVarKey(varsLeft, varsRight) ;

That method takes only the first variable in both bindings as the join key instead of all matching variables. In our case that would probably be ?wafer, I guess?


Is that a trade-off between hashing costs and join-performance @Andy?

The join key (wafer) sounds like the least selective one for joining 
data of both subqueries compared to (x) or (y) or even better (x, y) .



What happens if you try to generate an artificial key by concatenating 
the variables to a single one? Probably doesn't change anything, but who 
knows?


```
prefix mfg: 
prefix rdfs: 
prefix xsd: 

select distinct ?uid
where {
  { select (CONCAT( STR(?wafer), STR(?X - ?ref_X),  STR(?Y - ?ref_Y) ) 
AS ?key) ?uid {

    graph ?g_uid {
  [] a mfg:ReferenceDevice ;
    rdfs:label "ref die"^^xsd:string ;
    mfg:location [ mfg:xPos ?ref_X ; mfg:yPos ?ref_Y ] .

  [] mfg:object [ rdfs:label ?uid ] ;
    mfg:location [ mfg:xPos ?X ; mfg:yPos ?Y ; mfg:containedIn ?wafer ]
    }
  }}
  { select (CONCAT( STR(?wafer), STR(?X - ?ref_X),  STR(?Y - ?ref_Y) ) 
AS ?key) {

    graph ?g_fwt {
  [] a mfg:ReferenceDevice ;
    rdfs:label "Prober Reference"^^xsd:string ;
    mfg:location [ mfg:xPos ?ref_X ; mfg:yPos ?ref_Y ] .

  [] mfg:binCode [ mfg:pick true ] ;
    mfg:location [ mfg:xPos ?X ; mfg:yPos ?Y ; mfg:containedIn ?wafer ]
    }
  }}
}
```


Cheers,
Lorenz

On 01.04.24 21:18, John Walker wrote:

Hi Andy,

Thanks for the pointers.

I have been digging a bit more into the plans for the actual queries against 
the actual datasets and see that a join is used.

Here's an example of the query:

```
prefix mfg: 
prefix rdfs: 
prefix xsd: 

select distinct ?uid
where {
   { select ?wafer ?uid (?X - ?ref_X as ?X_) (?Y - ?ref_Y as ?Y_) {
 graph ?g_uid {
   [] a mfg:ReferenceDevice ;
 rdfs:label "ref die"^^xsd:string ;
 mfg:location [ mfg:xPos ?ref_X ; mfg:yPos ?ref_Y ] .

   [] mfg:object [ rdfs:label ?uid ] ;
 mfg:location [ mfg:xPos ?X ; mfg:yPos ?Y ; mfg:containedIn ?wafer ]
 }
   }}
   { select ?wafer (?X - ?ref_X as ?X_) (?Y - ?ref_Y as ?Y_) {
 graph ?g_fwt {
   [] a mfg:ReferenceDevice ;
 rdfs:label "Prober Reference"^^xsd:string ;
 mfg:location [ mfg:xPos ?ref_X ; mfg:yPos ?ref_Y ] .

   [] mfg:binCode [ mfg:pick true ] ;
 mfg:location [ mfg:xPos ?X ; mfg:yPos ?Y ; mfg:containedIn ?wafer ]
 }
   }}
}
```

The gist of this is that we try to normalize the coordinates from two maps 
based on a common reference point.

And the algebra generated from running that query against some example data:

```
   (distinct
 (project (?uid)
   (join
 (project (?wafer ?uid ?X_ ?Y_)
   (extend ((?X_ (- ?/X ?/ref_X)) (?Y_ (- ?/Y ?/ref_Y)))
 (graph ?/g_uid
   (bgp
 (triple ?/?0  
)
 (triple ?/?0  "ref 
die")
 (triple ?/?0  ?/?1)
 (triple ?/?1  ?/ref_X)
 (triple ?/?1  ?/ref_Y)
 (triple ?/?2  ?/?3)
 (triple ?/?3  ?uid)
 (triple ?/?2  ?/?4)
 (triple ?/?4  ?/X)
 (triple ?/?4  ?/Y)
 (triple ?/?4  ?wafer)
   
 (project (?wafer ?X_ ?Y_)
   (extend ((?X_ (- ?/X ?/ref_X)) (?Y_ (- ?/Y ?/ref_Y)))
 (graph ?/g_fwt
   (bgp
 (triple ?/?5  
)
 (triple ?/?5  "Prober 
Reference")
 (triple ?/?5  ?/?6)
 (triple ?/?6  ?/ref_X)
 (triple ?/?6  ?/ref_Y)
 (triple ?/?7  ?/?8)
 (triple ?/?8  true)
 (triple ?/?7  ?/?9)
 (triple ?/?9  ?/X)
 (t

Re: Re: [EXTERNAL] Re: Query Performance Degrade With Sorting In Subquery

2024-03-20 Thread Lorenz Buehmann
This sounds more like a use case for correlated queries, which are on the way to being added to the SPARQL standard in 1.2.


I also think that your current query doesn't do what you expect, as the subquery is evaluated independently and there will be a cartesian product given that you do not propagate the join variable - e.g. for the first subquery you need ?concept as a projected variable.



You could try to use a LATERAL clause in order to inline the bound data into the subquery. I'm not sure if this happens currently, as subqueries will otherwise be evaluated first.


Like

OPTIONAL {
   LATERAL {
{Select ?alternate ?concept  {
  ?concept skosxl:altLabel ?alternateSkosxl.
   ?alternateSkosxl skosxl:literalForm ?alternate;
   relations:hasUserCount ?alternateUserCount.
}
ORDER BY DESC (?alternateUserCount) LIMIT 10}
   }
}


Lorenz

On 20.03.24 08:23, Chirag Ratra wrote:

Hi,

Before I share the background, @Rob the answer to your question is we are
using tdb2 and the object type for relation:hasuserCount  is  <
http://www.w3.org/2001/XMLSchema#integer>

BACKGROUND  :

So the use case is we need to do a full text search for the search term on
the skos xl-prefLabel and skos xl-altLabel. After resolving the search term
through fulltext search, we need to return the metadata which includes all
the skos xl-altLabel. Since there could be many skos xl-altLabels we need
to return the top 10 skos xl-altLabel as a collection of arrays.

So we have another triple corresponding to each skos xl label which has
predicate  and
object is integer value () which
is basically the rank of label.

So I need to return the higher rank  skos xl-altLabel in the collection .
Similar is the use case for related skosxl label



Here is ttl file

@prefix :  .
@prefix fuseki:  .
@prefix rdf:  .
@prefix text:  .
@prefix tdb2:  .
@prefix skos:  .
@prefix skosxl:  .
@prefix relations: 

:service_tdb_all  rdf:type  fuseki:Service;
 fuseki:name  "cdm";
fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name "sparql" ];
fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name "query" ];
fuseki:endpoint [ fuseki:operation fuseki:update; fuseki:name "update"];
fuseki:endpoint [ fuseki:operation fuseki:gsp-r; ];
fuseki:endpoint [ fuseki:operation fuseki:gsp-r; fuseki:name "get" ];
fuseki:endpoint [ fuseki:operation fuseki:gsp-rw; fuseki:name "data" ];

 fuseki:dataset <#myTextDS>.

<#myTextDS> rdf:type text:TextDataset ;
 text:dataset <#myDatasetReadWrite> ;
 text:index <#indexLucene> ;
 .

<#indexLucene> a text:TextIndexLucene ;
 text:analyzer [ a text:StandardAnalyzer ];
 text:directory "run/databases/cdm-text-index";
 text:storeValues true ;
 text:entityMap <#entMap> ;
 .

<#entMap> a text:EntityMap ;
 text:entityField "uri" ;
 text:graphField "graph" ;
 text:defaultField "title" ;
 text:map (
 [ text:field "title"; text:predicate skosxl:literalForm; ]

 ) .

<#myDatasetReadWrite>
 rdf:type   tdb2:DatasetTDB2;
 tdb2:location
  "/apache-jena-fuseki/apache-jena-fuseki-5.0.0-rc1/run/databases/cdm" .



Here is my current query

PREFIX text: 
PREFIX skos: 
PREFIX skosxl: 
PREFIX relations: 

SELECT ?concept ?titleSkosxl ?title ?languageCode (GROUP_CONCAT(DISTINCT
?relatedTitle; separator=", ") AS ?relatedTitles) (GROUP_CONCAT(DISTINCT
?alternate; separator=", ") AS ?alternates)
WHERE
{
   (?titleSkosxl ?score) text:query ('cashier').

?concept skosxl:prefLabel ?titleSkosxl.
   ?titleSkosxl skosxl:literalForm ?title.
   ?titleSkosxl relations:usedInLocale ?controlledList.
   ?controlledList relations:languageMarketCode ?languageCode
FILTER(?languageCode = 'en-US').


#  get alternate title
OPTIONAL
   {
 Select ?alternate  {
 ?concept skosxl:altLabel ?alternateSkosxl.
 ?alternateSkosxl skosxl:literalForm ?alternate;
   relations:hasUserCount ?alternateUserCount.
 }
ORDER BY DESC (?alternateUserCount) LIMIT 10
}

#  get related titles
   OPTIONAL
   {
   Select ?relatedTitle
   {
 ?titleSkosxl relations:isRelatedTo ?relatedSkosxl.
 ?relatedSkosxl skosxl:literalForm ?relatedTitle;
 relations:hasUserCount ?relatedUserCount.
   }
ORDER BY DESC (?relatedUserCount) LIMIT 10
}
}
GROUP BY ?concept ?titleSkosxl ?title ?languageCode ?alternateJobTitle
?notation
ORDER BY DESC(?jobtitleWeight) 

Re: Re: query performance on named graph vs. default graph

2024-03-20 Thread Lorenz Buehmann

Hi,

what about

SELECT *
FROM NAMED 
FROM NAMED 
FROM NAMED  ...
FROM NAMED 
{
  GRAPH ?g {
  ...
  }
}

or

SELECT *
{
 VALUES ?g {  ... }
  GRAPH ?g {
    ...
  }
}


does that work better?

On 19.03.24 15:21, Jim Balhoff wrote:

Hi Andy,


On Mar 19, 2024, at 5:02 AM, Andy Seaborne  wrote:
Hi Jim,

What happens if you use GRAPH rather than FROM?

WHERE {
   GRAPH  {
 ?cell rdfs:subClassOf cell: .
 ?cell part_of: ?organ .
 ?organ rdfs:subClassOf organ: .
 ?organ part_of: abdomen: .
 ?cell rdfs:label ?cell_label .
 ?organ rdfs:label ?organ_label .
   }
}


This does help. With TDB this is actually faster than using the default graph. 
With the HDT setup it’s about the same (fast). But it doesn’t work that well 
for what I’m trying to do (below).


FROM builds a "view dataset" which is general purpose (e.g. multiple FROM are 
possible) but which is less efficient for basic graph pattern matching. It does not use 
the TDB2 basic graph pattern matcher.

GRAPH restricts to a single graph and the query goes direct to TDB2 basic graph 
pattern matcher.



If there is only one name graph, is here a reason to have it as a named graph? 
Using the default graph and no unionDefaultGraph may be

What I am really trying to do is have suite of large graphs that I can choose 
to include or not in a particular query, depending on what data sources I want 
to use in the query. I have several HDT files, one for each data source. I set 
this up as a dataset with a named graph for each data file, and was at first 
very happy with how it performed while turning on and off graphs using FROM 
lines. For example I have Wikidata in one HDT file, and it looks like having it 
available doesn’t slow down queries on other graphs when it’s not included. 
However I did see that performance issue in the query I asked about, and found 
it wasn’t related to having multiple graphs loaded; it happens even with just 
that one graph configured.

If I wrote my own server that accepted a list of data source names in a query 
parameter, and then for each request constructed a union model for executing 
the query over the required HDT graphs, would that work any better? Or is that 
basically the same as what FROM is doing?

Thank you,
Jim



--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Re: Problems when querying the SPARQL with Jena

2024-03-13 Thread Lorenz Buehmann

Hi,

OK, that's what I assumed: the packaging of your project is the issue.

Andy provided you the appropriate pointers.

Good luck and feel free to ask further questions once your project is 
running.



Cheers,
Lorenz

On 12.03.24 14:02, Anna P wrote:

Hi Lorenz,
Thank you for your reply. Yes, I used maven to build the project. Here are
dependencies details:
Hi Lorenz,

Yes, I used maven to build the project. Here are the dependencies details:

UTF-8
1.8
1.8




junit
junit
4.11
test


org.apache.jena
apache-jena-libs
5.0.0-rc1
pom


org.apache.maven.plugins
maven-assembly-plugin
3.6.0
maven-plugin



Best regards,
Pan

On Tue, Mar 12, 2024 at 7:13 AM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:


Hi,

how did you setup your project? Which Jena version? Do you use Maven?
Which dependencies? It looks like ARQ.init() hasn't been called which
should happen automatically if the setup of the project is correct.


Cheers,
Lorenz

On 11.03.24 14:44, Anna P wrote:

Dear Jena support team,

Currently I just started to work on a SPARQL project using Jena and I

could

not get a solution when I query a model.
I imported a turtle file and ran a simple query, and the snippet code is
shown below. However, I got the error.

public class App {
  public static void main(String[] args) {
  try {
  Model model = RDFDataMgr.loadModel("data.ttl", Lang.TURTLE);
  RDFDataMgr.write(System.out, model, Lang.TURTLE);
  String queryString = "SELECT * { ?s ?p ?o }";
  Query query = QueryFactory.create(queryString);
  QueryExecution qe = QueryExecutionFactory.create(query,

model);

  ResultSet results = qe.execSelect();
  ResultSetFormatter.out(System.out, results, query);
  qe.close();
  } catch (Exception e) {
  e.printStackTrace();
  }
  }
}

Here is the error message:

org.apache.jena.riot.RiotException: Not registered as a SPARQL result set
output syntax: Lang:SPARQL-Results-JSON
  at


org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:179)

  at


org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:156)

  at


org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:149)

  at


org.apache.jena.sparql.resultset.ResultsWriter$Builder.write(ResultsWriter.java:96)

  at


org.apache.jena.query.ResultSetFormatter.output(ResultSetFormatter.java:308)

  at


org.apache.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:516)

  at de.unistuttgart.ki.esparql.App.main(App.java:46)


Thank you for your time and help!

Best regards,

Pan


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
Leipzig | Germany



--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Problems when querying the SPARQL with Jena

2024-03-11 Thread Lorenz Buehmann

Hi,

how did you set up your project? Which Jena version? Do you use Maven? Which dependencies? It looks like ARQ.init() hasn't been called, which should happen automatically if the setup of the project is correct.
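
One quick check is to force Jena's initialization explicitly before any other Jena call. A minimal sketch; note this is only a diagnostic - it will not fix a fat jar whose META-INF/services registration files were overwritten during assembly (those need to be merged, e.g. by a shade/assembly services transformer):

```java
import org.apache.jena.sys.JenaSystem;

public class App {
    public static void main(String[] args) {
        // Force Jena's subsystem initialization up front (normally automatic).
        JenaSystem.init();

        // ... load the model and run the query as before ...
    }
}
```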



Cheers,
Lorenz

On 11.03.24 14:44, Anna P wrote:

Dear Jena support team,

Currently I just started to work on a SPARQL project using Jena and I could
not get a solution when I query a model.
I imported a turtle file and ran a simple query, and the snippet code is
shown below. However, I got the error.

public class App {
 public static void main(String[] args) {
 try {
 Model model = RDFDataMgr.loadModel("data.ttl", Lang.TURTLE);
 RDFDataMgr.write(System.out, model, Lang.TURTLE);
 String queryString = "SELECT * { ?s ?p ?o }";
 Query query = QueryFactory.create(queryString);
 QueryExecution qe = QueryExecutionFactory.create(query, model);
 ResultSet results = qe.execSelect();
 ResultSetFormatter.out(System.out, results, query);
 qe.close();
 } catch (Exception e) {
 e.printStackTrace();
 }
 }
}

Here is the error message:

org.apache.jena.riot.RiotException: Not registered as a SPARQL result set
output syntax: Lang:SPARQL-Results-JSON
 at
org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:179)
 at
org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:156)
 at
org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:149)
 at
org.apache.jena.sparql.resultset.ResultsWriter$Builder.write(ResultsWriter.java:96)
 at
org.apache.jena.query.ResultSetFormatter.output(ResultSetFormatter.java:308)
 at
org.apache.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:516)
 at de.unistuttgart.ki.esparql.App.main(App.java:46)


Thank you for your time and help!

Best regards,

Pan


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Re: [EXTERNAL] Re: Apache Jena Full Text Search With Lucene

2024-02-28 Thread Lorenz Buehmann

Hi,

just having the text index declared in the assembler doesn't trigger a full rebuild, for obvious reasons. For SPARQL updates while Fuseki is running, the full-text index will indeed be updated, as it listens to any changes in the underlying dataset.


In your case, you can run an offline indexing step for your existing dataset; it is documented here:
https://jena.apache.org/documentation/query/text-query.html#step-2---build-the-text-index


Hope this helps.


Cheers,
Lorenz

On 29.02.24 07:05, Chirag Ratra wrote:

Hi David,

Hurrah!! Query worked when I added new data and queried it. *Again BUT*
But I already have the dataset, I need to create an index on existing, how
can it be done?

Regards
Chirag

On Thu, Feb 29, 2024 at 11:25 AM David Habgood  wrote:


Hi Chirag,

How are you generating the database initially? You need to use the same
assembler file when doing this (alternately data you add now should be
added to the index - you can test this).

Thanks

On Thu, Feb 29, 2024 at 1:24 PM Chirag Ratra 
wrote:


Hi David,

I am able to query RDF data , but not FTS index

This query works :

PREFIX text: 
PREFIX relation: 
PREFIX skos: 
PREFIX skosxl: 
PREFIX relations: 

select *  WHERE
{
?prefLabel skosxl:literalForm ?literalForm
   FILTER(?literalForm = 'Cashier')
}
limit 10


But this query doesn't

SELECT * WHERE {
  ?s text:query 'Cashier'
} LIMIT 10


Here is the ttl file


@prefix :  .
@prefix fuseki:  .
@prefix rdf:  .
@prefix text:  .
@prefix tdb2:  .
@prefix skos:  .
@prefix skosxl:  .

:service_my_ds rdf:type fuseki:Service;
 fuseki:name "tq-dataset";
 fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name "sparql"
];
 fuseki:endpoint [ fuseki:operation fuseki:query; fuseki:name "query"

];

 fuseki:endpoint [ fuseki:operation fuseki:update; fuseki:name

"update"

];
 fuseki:endpoint [ fuseki:operation fuseki:gsp-r; ];
 fuseki:endpoint [ fuseki:operation fuseki:gsp-r; fuseki:name "get" ];
 fuseki:endpoint [ fuseki:operation fuseki:gsp-rw; fuseki:name "data"

];

 fuseki:dataset <#myTextDS> .

<#myTextDS> rdf:type text:TextDataset ;
 text:dataset <#myDatasetReadWrite> ;
 text:index <#indexLucene> ;
 .

<#indexLucene> a text:TextIndexLucene ;
 text:analyzer [ a text:StandardAnalyzer ];
 text:directory "run/databases/text-index-with-rdfs";
 text:storeValues true ;
 text:entityMap <#entMap> ;
 .

<#entMap> a text:EntityMap ;
 text:entityField "uri" ;
 text:graphField "graph" ;
 text:defaultField "prefLabel" ;
 text:map (
 [ text:field "prefLabel"; text:predicate skosxl:literalForm; ]
 ) .

<#myDatasetReadWrite> rdf:type tdb2:DatasetTDB2 ;
tdb2:location



"C:\\Users\\chirag.ratra\\Downloads\\apache-jena-fuseki-4.10.0\\apache-jena-fuseki-4.10.0\\run/databases/tq-dataset"

.

On Tue, Feb 27, 2024 at 6:49 PM David Habgood 
wrote:


Hi Chirag,

There's an extra triple in the last block:

<#myDatasetReadWrite> rdf:type tdb2:DatasetTDB2 ;
   tdb2:unionDefaultGraph true ;
tdb2:location



"C:\\Users\\chirag.ratra\\Downloads\\apache-jena-fuseki-4.10.0\\apache-jena-fuseki-4.10.0\\run/databases/tq-dataset"

.

Try removing the unionDefaultGraph true statement, so the last block
becomes:

<#myDatasetReadWrite> rdf:type tdb2:DatasetTDB2 ;
tdb2:location



"C:\\Users\\chirag.ratra\\Downloads\\apache-jena-fuseki-4.10.0\\apache-jena-fuseki-4.10.0\\run/databases/tq-dataset"

.
I'm guessing this is your issue but if not can you go to the info tab

in

the fuseki UI and tell us what count triples returns, and this SPARQL
query:
SELECT * { GRAPH  {?s ?p ?o} } LIMIT 10

Thanks

On Tue, Feb 27, 2024 at 9:54 PM Chirag Ratra 
wrote:


Hi David,

fuseski-server.jar ran successfully but I am not able to see any

data,

please let me know If am i missing something?
Since I had the configuration for my rdf dataset in tq-dataset.ttl

file

and

had fuseki:name "tq-dataset" and I was able to query for my rdf data

(it

had 30 triples)

If I update  tq-dataset.ttl  with the content you shared, I am not

able

to

find any triple.

PREFIX rdf: 
PREFIX rdfs: 
SELECT * WHERE {
   ?sub ?pred ?obj .
} LIMIT 10

Here is the ttl file


@prefix :  .
@prefix fuseki:  .
@prefix rdf:  .
@prefix text:  .
@prefix tdb2:  .
@prefix skos: <

Re: Re: Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-12 Thread Lorenz Buehmann

Open issue: https://github.com/apache/jena/issues/2267

On 12.02.24 09:26, Lorenz Buehmann wrote:

Hi all,


I can reproduce it using Apache Jena Fuseki 4.10.0 and generating a dummy N-Quads file, e.g. using Bash:



for i in {1..100..1}; do echo "<http://example.org/s> 
<http://example.org/p> <http://example.org/o> <http://example.org/g$i> 
." >> data.nq; done


tdb2.tdbloader --loc /tmp/tdb2/debug data.nq

$FUSEKI_HOME/fuseki-server --loc /tmp/tdb2/debug --port  /ds


Looks like the larger number of graphs breaks the UI in the "edit" tab 
because the page size is 5 and the number of pages is then beyond what 
fits the left-side of the widget.



Cheers,

Lorenz


On 12.02.24 06:32, jaa...@kolumbus.fi wrote:
Hi Andy, it really seems that there had been an UI update as the 
olden versions look correct. The problem with the latest jena-fuseki 
UI is that I have some hundred named graphs and I cannot access them 
through the latest UI.


What is Vue ?

Br, Jaana


09.02.2024 13.37 EET Andy Seaborne  kirjoitti:

  Hi Jaana,

Glad you got it sorted out.

The Fuseki UI does not do anything special about browser caches. There
was a major UI update with implementing it in Vue and all the HTML
assets that go with that.

  Andy

On 09/02/2024 05:37, jaa...@kolumbus.fi wrote:
Hi, I just noticed that it's not question about podman or docker 
but about browser cache. After deleting everything in browser cache 
I managed to get the correct user interface when running 
stain/jena-fuseki:3.14.0 and stain/jena-fuseki:4.0.0 by both podman 
and docker, but when I tried the latest stain/jena-fuseki (4.8.0) I 
got the incorrect interface (shown here 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png).


Jaana M



08.02.2024 13.23 EET jaa...@kolumbus.fi kirjoitti:

   Hi, I've running jena-fuseki with docker:
   docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
   and rootless podman:
   podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 
docker.io/stain/jena-fuseki
   when excuted the same version 4.8.0 of jena-fuseki with podman 
the UI looks totally different from the UI of the instance excuted 
with docker.
   see file fuseki-podman.png 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png in 
https://github.com/jamietti/jena/

What can cause this problem ?
   Br, Jaana M



--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-12 Thread Lorenz Buehmann

Hi all,


I can reproduce it using Apache Jena Fuseki 4.10.0 and generating a dummy N-Quads file, e.g. using Bash:



for i in {1..100..1}; do echo "<http://example.org/s> <http://example.org/p> <http://example.org/o> <http://example.org/g$i> ." >> data.nq; done


tdb2.tdbloader --loc /tmp/tdb2/debug data.nq

$FUSEKI_HOME/fuseki-server --loc /tmp/tdb2/debug --port  /ds


Looks like the larger number of graphs breaks the UI in the "edit" tab 
because the page size is 5 and the number of pages is then beyond what 
fits the left-side of the widget.



Cheers,

Lorenz


On 12.02.24 06:32, jaa...@kolumbus.fi wrote:

Hi Andy, it really seems that there had been an UI update as the olden versions 
look correct. The problem with the latest jena-fuseki UI is that I have some 
hundred named graphs and I cannot access them through the latest UI.

What is Vue ?

Br, Jaana


09.02.2024 13.37 EET Andy Seaborne  kirjoitti:

  
Hi Jaana,


Glad you got it sorted out.

The Fuseki UI does not do anything special about browser caches. There
was a major UI update with implementing it in Vue and all the HTML
assets that go with that.

  Andy

On 09/02/2024 05:37, jaa...@kolumbus.fi wrote:

Hi, I just noticed that it's not  question about podman or docker but about 
browser cache. After deleting everything in browser cache I managed to get the 
correct user interface when running stain/jena-fuseki:3.14.0 and 
stain/jena-fuseki:4.0.0 by both podman and docker, but when I tried the latest 
stain/jena-fuseki (4.8.0) I got the incorrect interface (shown here 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png).

Jaana M



08.02.2024 13.23 EET jaa...@kolumbus.fi kirjoitti:

   
Hi, I've running jena-fuseki with docker:
   
docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
   
and rootless podman:
   
podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
   
when excuted the same version 4.8.0 of jena-fuseki with podman the UI looks totally different from the UI of the instance excuted with docker.
   
see file fuseki-podman.png https://github.com/jamietti/jena/blob/main/fuseki-podman.png in https://github.com/jamietti/jena/

What can cause this problem ?
   
Br, Jaana M


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Unable to build the below query using jena query builder

2023-12-06 Thread Lorenz Buehmann

Hi,

did you try addGraph method on the SelectBuilder object with another 
SelectBuilder object on which you add the VALUES clause and the triple 
pattern?


In your case with the UNION I think the flow should be something like


SelectBuilder sb = ...

SelectBuilder sb_g1 = ... // first graph pattern
sb_g1.addValueVar(

SelectBuilder sb_g2 = ... // second graph pattern

sb.addUnion(sb_g1)
sb.addUnion(sb_g2)


But, I think there might be some bug or limitation:

While

SelectBuilder sb1 = new SelectBuilder()
    .addWhere("?s", "?p", "?o")
    .addWhereValueVar("?s", "foo1")
    .addWhereValueVar("?p", "foo1")
    .addWhereValueVar("?o", "foo1");
System.out.println(sb1.buildString());


Returns the expected query

SELECT  *
WHERE
  { ?s  ?p  ?o
    VALUES ( ?s ?p ?o ) {
  ( "foo1" "foo1" "foo1" )
    }
  }

passing that builder to another builder where it is supposed to make the 
graph


Node g1 = NodeFactory.createURI("http://example/g1");
SelectBuilder sb2 = new SelectBuilder().addGraph(g1,
    new SelectBuilder()
    .addWhere("?s", "?p", "?o")
    .addWhereValueVar("?s", "foo1")
    .addWhereValueVar("?p", "foo1")
    .addWhereValueVar("?o", "foo1"));
System.out.println(sb2.buildString());


leads to only

SELECT  *
WHERE
  { GRAPH <http://example/g1>
  { ?s  ?p  ?o}}


From the code I'd say the issue is that the build() method isn't being 
called before adding the query pattern as element to the graph. But 
that's just a guess.



So what works is to force a build() on the WhereHandler explicitly 
before passing it to the graph:



Node g1 = NodeFactory.createURI("http://example/g1");
SelectBuilder sbG1 = new SelectBuilder()
    .addWhere("?s", "?p", "?o")
    .addWhereValueVar("?s", "foo1")
    .addWhereValueVar("?p", "foo1")
    .addWhereValueVar("?o", "foo1");
sbG1.getWhereHandler().build(); // the important line
SelectBuilder sb2 = new SelectBuilder().addGraph(g1,
    sbG1);
System.out.println(sb2.buildString());


then we get


SELECT  *
WHERE
  { GRAPH <http://example/g1>
  { ?s  ?p  ?o
    VALUES ( ?s ?p ?o ) {
  ( "foo1" "foo1" "foo1" )
    }
  }}


I don't know if this is intended; you may want to open an issue, or wait for Claude Warren, who is the principal maintainer of the query builder code; he'll provide a better answer than me.



I have one question: any reason for using Jena 3.5.0, which is 6 years old?

On 06.12.23 01:33, Dhamotharan, Kishan wrote:

Hi All,

I have been trying to construct the below query using Jena query builder. I 
have tried multiple different ways to build it, nothing seems to work. Adding 
values block inside Graph seems to be not possible. We are using Jena 3.5.


SELECT ?subject ?predicate ?object ?graph
WHERE {

{
   GRAPH  {
 ?subject ?predicate ?object

   VALUES (?subject ?predicate ?object) {
   (  )
   (  )
   (  )
  }

 BIND( AS ?graph)
   }
}

UNION

{
   GRAPH  {
 ?subject ?predicate ?object

   VALUES (?subject ?predicate ?object) {
   (  )
   (  )
   (  )
}

 BIND( AS ?graph)
   }
}

Can anyone suggest on how this can be done ? Any help is appreciated 😊


Thanks
Kishan Dhamotharan


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Re: Query features info

2023-09-20 Thread Lorenz Buehmann
@Hashim in the future you should reply to the mailing list such that all 
people can see your response and follow the thread. Now we can just see 
your answer because James replied to you and the mailing list. We also 
can't see your attachment.


Which paper are you talking about?

As James said, there is a difference between a syntax tree, like the Jena algebra tree (without optimizations), and a query execution plan (with optimizations).



Regarding LSQ, given that you're working at the DICE group, I'd suggest having a look at it or simply talking to Saleem. But it only extracts the "simple" features based on the syntax tree and makes them accessible as RDF data.


Lorenz

On 20.09.23 23:05, James Anderson wrote:

good evening;

if you want to reproduce those results, you will have to examine the parsed 
syntax tree.
that should comprise just two bgps, as that is the immediate syntax.
if, on the other hand, you examine the results of a query planner, you are not looking at a syntax tree, you are looking at the query processor's prospective execution plan.
the execution model permits the transformations to which i alluded.
you will more likely get your desired representation by having jena emit an 
sse, rather than an execution plan.

best regards, from berlin,


On 20. Sep 2023, at 17:48, Hashim Khan  wrote:

Thanks for the quick reply.

To be precise, I want to clarify the table on page 7 of the attached paper. 
Here, the No. of BGPs is 2, and also some more values. I want to extract all 
the info using Jena. But I could not till now. About the LSQ, I will check it, 
but I am following this paper and want to reproduce the results.

Best Regards,
Hashim

On Tue, Sep 19, 2023 at 4:18 PM James Anderson  
wrote:
good afternoon;

you have to consider that a query processor is free to consolidate statement 
patterns in a nominal bgp - which itself implicitly joins them, or separate 
them in order to either apply a different join strategy or - as in this case, 
to interleave an operation under the suspicion that it will reduce solution set 
cardinality.

best regards, from berlin,


On 19. Sep 2023, at 13:20, Hashim Khan  wrote:

Hi,

Having a look on this SPARQL query:
---
prefix dbo:
prefix dbr:
prefix foaf:

SELECT DISTINCT ?name ?birth ?death
WHERE { ?person dbo:birthPlace  dbr:Berlin .
?person dbo:birthDate ?birth .
?person foaf:name ?name .
OPTIONAL { ?person dbo:deathDate ?death . }
FILTER (?birth < "1900-01-01") .
}
LIMIT 100
-
Using Apache Jena ARQ command, ./arq --query exampleQuery.sparql --explain
I got this result.

13:11:41 INFO  exec:: ALGEBRA
  (slice _ 100
(distinct
  (project (?name ?birth ?death)
(conditional
  (sequence
(filter (< ?birth "1900-01-01")
  (bgp
(triple ?person  <
http://dbpedia.org/resource/Berlin>)
(triple ?person 
?birth)
  ))
(bgp (triple ?person  ?name)))
  (bgp (triple ?person 
?death))
13:11:41 INFO  exec:: BGP
  ?person  <
http://dbpedia.org/resource/Berlin>
  ?person  ?birth
13:11:41 INFO  exec:: Reorder/generic
  ?person  <
http://dbpedia.org/resource/Berlin>
  ?person  ?birth
13:11:41 INFO  exec:: BGP ::   ?person <
http://xmlns.com/foaf/0.1/name> ?name

| name | birth | death |

 I have a question about the Basic Graph Patterns.
I think, in this query there are two BGPs. But here i shows 3. Can anyone
explain it to me? Also, I want to know, the number of joins, no of
projection variables, number of left joins, depth, and such other relevant
info about the query features. How can I get all at one place?

Best Regards,


*Hashim Khan*

---
james anderson | ja...@dydra.com | https://dydra.com




--
Hashim Khan



---
james anderson | ja...@dydra.com | https://dydra.com




--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Query features info

2023-09-19 Thread Lorenz Buehmann

Hi,

Getting the data you're interested in requires traversing either the algebra (which is shown in your "explain" output) or the query structure. For both you'll have to write Java code - luckily Jena already provides the necessary tools via visitors.
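
A minimal sketch (the query string is a placeholder) of counting BGPs, triple patterns and left joins by walking the compiled algebra; note it walks the unoptimized algebra, whereas --explain shows the optimized plan:

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;
import org.apache.jena.sparql.algebra.OpVisitorBase;
import org.apache.jena.sparql.algebra.OpWalker;
import org.apache.jena.sparql.algebra.op.OpBGP;
import org.apache.jena.sparql.algebra.op.OpLeftJoin;

import java.util.concurrent.atomic.AtomicInteger;

public class AlgebraFeatures {
    public static void main(String[] args) {
        Query query = QueryFactory.create(
            "SELECT ?s WHERE { ?s a ?t . ?s ?p ?o . OPTIONAL { ?o ?p2 ?o2 } } LIMIT 10");

        Op op = Algebra.compile(query);                        // unoptimized algebra
        // Op op = Algebra.optimize(Algebra.compile(query));   // optimized, as in --explain

        AtomicInteger bgps = new AtomicInteger();
        AtomicInteger triplePatterns = new AtomicInteger();
        AtomicInteger leftJoins = new AtomicInteger();

        OpWalker.walk(op, new OpVisitorBase() {
            @Override public void visit(OpBGP opBGP) {
                bgps.incrementAndGet();
                triplePatterns.addAndGet(opBGP.getPattern().getList().size());
            }
            @Override public void visit(OpLeftJoin opLeftJoin) {
                leftJoins.incrementAndGet();   // OPTIONAL becomes a left join
            }
        });

        System.out.println("BGPs: " + bgps + ", triple patterns: " + triplePatterns
                + ", left joins: " + leftJoins
                + ", projection vars: " + query.getProjectVars().size());
    }
}
```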


By the way, why do you think there are only two BGPs? I mean the number 
of triple patterns in the query is clearly not 2, right?



PS: did you check out the LSQ project resp. the LSQ code? I think there 
is already some SPARQL query feature extraction code somewhere in the 
framework.



Lorenz

On 19.09.23 13:20, Hashim Khan wrote:

Hi,

Having a look on this SPARQL query:
---
prefix dbo:
prefix dbr:
prefix foaf:

SELECT DISTINCT ?name ?birth ?death
WHERE { ?person dbo:birthPlace  dbr:Berlin .
 ?person dbo:birthDate ?birth .
 ?person foaf:name ?name .
OPTIONAL { ?person dbo:deathDate ?death . }
FILTER (?birth < "1900-01-01") .
}
LIMIT 100
-
Using Apache Jena ARQ command, ./arq --query exampleQuery.sparql --explain
I got this result.

13:11:41 INFO  exec:: ALGEBRA
   (slice _ 100
 (distinct
   (project (?name ?birth ?death)
 (conditional
   (sequence
 (filter (< ?birth "1900-01-01")
   (bgp
 (triple ?person  <
http://dbpedia.org/resource/Berlin>)
 (triple ?person 
?birth)
   ))
 (bgp (triple ?person  ?name)))
   (bgp (triple ?person 
?death))
13:11:41 INFO  exec:: BGP
   ?person  <
http://dbpedia.org/resource/Berlin>
   ?person  ?birth
13:11:41 INFO  exec:: Reorder/generic
   ?person  <
http://dbpedia.org/resource/Berlin>
   ?person  ?birth
13:11:41 INFO  exec:: BGP ::   ?person <
http://xmlns.com/foaf/0.1/name> ?name

| name | birth | death |

 I have a question about the Basic Graph Patterns.
I think, in this query there are two BGPs. But here i shows 3. Can anyone
explain it to me? Also, I want to know, the number of joins, no of
projection variables, number of left joins, depth, and such other relevant
info about the query features. How can I get all at one place?

Best Regards,


*Hashim Khan*


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Transactions over http (fuseki)

2023-08-16 Thread Lorenz Buehmann

Hi,

that is an open issue in the SPARQL standard, and Andy already opened a ticket [1] regarding this, maybe w.r.t. an upcoming SPARQL 1.2.


I think mixed query types are still not possible via standard Fuseki in 
a single transaction, but indeed an extension like you're planning 
should be possible. Andy is already working on a newer Fuseki extension 
mechanism (it's basically already there) where you can plug in so-called 
Fuseki modules. This would be the way I'd try to add this extension to 
Fuseki.


Indeed, Andy knows better and can give you more specific code or 
pointers - maybe he even has such a module or code part implemented 
somewhere.



Regards,

Lorenz


[1] https://github.com/w3c/sparql-dev/issues/83

On 16.08.23 17:20, Gaspar Bartalus wrote:

Hi,

We’ve been using jena-fuseki to store and interact with RDF data by running
queries over the http endpoints.
We are now facing the challenge to use transactional operations on the
triple store, i.e. running multiple sparql queries (both select and update
queries) in a single transaction.
I would like to ask what your suggestion might be to achieve this.

The idea we have in mind is to extend jena-fuseki with new http endpoints
for handling transactions.
Would this be technically feasible, i.e. could we reach the internal
transaction API (store API?) from jena-fuseki?
Would you agree with this approach conceptually, or would you recommend
something different?

Thanks in advance,
Gaspar

PS: Sorry for the duplicate, I have the feeling that my other email address
is blocked somehow.


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: In using RIOT I encounter the "64000" entity expansions error.

2023-06-28 Thread Lorenz Buehmann
It is not a Jena-specific parameter, thus you have to set it via standard Java JVM arguments:


riot does make use of the system var JVM_ARGS, so you can use that


export JVM_ARGS="$JVM_ARGS  -DentityExpansionLimit=25"


or just prepend your call


JVM_ARGS="$JVM_ARGS  -DentityExpansionLimit=250" riot --set 
ttl:entityExpansionLimit=0 --validate ../../foodon.owl



On 28.06.23 10:26, Damion Dooley wrote:

I'm using RIOT to parse a large food ontology in OWL RDF/XML format. It's giving me an error:



“JAXP00010001: The parser has encountered more than "64000" entity expansions 
in this document; this is the limit imposed by the JDK.”



How can I increase the entityExpansionsLimit or whatever its called as a 
variable ? I was guessing:



riot --set ttl:entityExpansionLimit=0 --validate ../../foodon.owl



but of course that didn’t work.


I’m on a Mac powerbook btw.

Many thanks for the info,

Damion

Damion Dooley, Ontology Development Lead
Centre for Infectious Disease Genomics and One Health
Faculty of Health Sciences, SFU, Canada
Mobile 778-688-0049


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Confirming on usage of andrewoma dexx collection from Jena-base

2023-06-26 Thread Lorenz Buehmann

Hi,

are you talking about your own fork of Jena in your company? And you're 
asking if there is anything preventing you from modifying the POM file 
in jena-base module? Is that something the Apache 2 License would care 
about? Isn't it more about your Marklogic product in the end?


Or do you want to redistribute that adapted Jena version somewhere and you're not sure?



Cheers,

Lorenz

On 26.06.23 20:26, Abika Chitra wrote:

Hi There,

I would like to confirm on the usage of the 
com.github.andrewoma.dexx:collection:jar in the jena-base jar. Our application 
needs this to be mentioned in our pom for runtime execution and we also have to 
package it with our product for commandline dependency usage. Since the 
andrewoma repository looked a little different than an official library. We 
would like to check with the team if this is okay to be packaged as a third 
party library addition.

Regards,
Abika


This message and any attached documents contain information of MarkLogic and/or 
its customers that may be confidential and/or privileged. If you are not the 
intended recipient, you may not read, copy, distribute, or use this 
information. If you have received this transmission in error, please notify the 
sender immediately by reply e-mail and then delete this message. This email may 
contain pricing or other suggested contract terms related to MarkLogic software 
or services. Any such terms are not binding on MarkLogic unless and until they 
are included in a definitive agreement executed by MarkLogic.


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany



Re: Combine two columns in SPARQL

2023-04-25 Thread Lorenz Buehmann

I cannot reproduce it:


Data (test.ttl):

PREFIX : 
PREFIX skos: 

:Animals
  skos:prefLabel "animals"@en ;
  skos:altLabel "fauna"@en ;
  skos:hiddenLabel "aminals"@en ;
  skos:prefLabel "animaux"@fr ;
  skos:altLabel "faune"@fr .

Query (test.rq):


PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

select ?s ?pl_al
where {
    ?s skos:prefLabel ?pl .
    ?s skos:altLabel ?al .
    bind(concat(?pl, ?al) as ?pl_al)
  }


Usage (Jena CLI):

sparql --data test.ttl --query test.rq

Result:

---------------------------------
| s          | pl_al             |
=================================
|            | "animauxfauna"    |
|            | "animauxfaune"@fr |
|            | "animalsfauna"@en |
|            | "animalsfaune"    |
---------------------------------


Do you have named graphs or something? I mean, is just one column empty 
or the whole resultset?




On 24.04.23 14:18, Mikael Pesonen wrote:


Not Jena question but hope someone can help. I have two columns with 
always equal amount of rows. How can they be combined into one column 
(variable)? This method doesn't work (example has different predicates):


select ?s ?pl_al
where {
    ?s skos:prefLabel ?pl .
    ?s skos:altLabel ?al .
    bind(concat(?pl, ?al) as ?pl_al)
  }


--
Lorenz Bühmann
Research Associate/Scientific Developer

emailbuehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany


Re: GeoSPARQL and Weighted graph to find the Shortest path

2023-03-25 Thread Lorenz Buehmann

Hi, comments inline

On 24.03.23 18:30, Yang-Min KIM wrote:

Dear Jena community,

I would like to use GeoSPARQL, but before I proceed I need your 
valuable advice.


What I want to do:
- Using GeoSPARQL, nodes have their own position (or zone).
In RDF, and thus in Jena, those are still plain RDF nodes in the RDF 
graph; it doesn't matter whether those nodes have geospatial information 
assigned or not

- Graph is weighted: edge has a value
Are you talking about edges aka triples in the RDF graph? The weight is 
an RDF triple then? Or an annotation on the RDF triple itself? Or an RDF 
Star triple?

- Find the shortest path between nodes in a weighted graph


How? Writing your own custom algorithm traversing the RDF graph? Neither 
SPARQL as the dedicated query language for RDF nor RDF itself has 
algorithms or a concept of paths.


On the other hand, some triple stores nowadays provide extensions to 
traverse the graph more efficiently, but there is no standardized 
method. For example, even Jena has some Java code to compute the/some 
shortest path: 
https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.core/org/apache/jena/ontology/OntTools.html#findShortestPath(org.apache.jena.rdf.model.Model,org.apache.jena.rdf.model.Resource,org.apache.jena.rdf.model.RDFNode,java.util.function.Predicate)


But this implementation is not weighted, so you would have to extend the 
code or just write your own custom Java code based on Jena's Graph::find 
method for example. Even simpler, use JGraphT and just write a 
wrapper for Jena - we did the same, so we could apply JGraphT algorithms 
on a Jena Graph object. Indeed, efficiency can't be the same as for 
other graph databases with optimized index structures.
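
For illustration, a minimal sketch of such a wrapper plus Dijkstra via JGraphT 
(the ex:from / ex:to / ex:weight property names are made up - adapt them to 
however your data actually encodes the weighted edges):

import org.apache.jena.rdf.model.*;
import org.jgrapht.Graph;
import org.jgrapht.GraphPath;
import org.jgrapht.alg.shortestpath.DijkstraShortestPath;
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.DirectedWeightedPseudograph;

public class WeightedShortestPathSketch {

    // every resource that has an ex:from property is treated as one weighted edge
    static Graph<RDFNode, DefaultWeightedEdge> toJGraphT(Model m, Property from, Property to, Property weight) {
        Graph<RDFNode, DefaultWeightedEdge> g = new DirectedWeightedPseudograph<>(DefaultWeightedEdge.class);
        m.listSubjectsWithProperty(from).forEachRemaining(edge -> {
            RDFNode a = edge.getProperty(from).getObject();
            RDFNode b = edge.getProperty(to).getObject();
            double w  = edge.getProperty(weight).getDouble();
            g.addVertex(a);
            g.addVertex(b);
            g.setEdgeWeight(g.addEdge(a, b), w);
        });
        return g;
    }

    static GraphPath<RDFNode, DefaultWeightedEdge> shortestPath(Graph<RDFNode, DefaultWeightedEdge> g,
                                                                RDFNode start, RDFNode end) {
        return new DijkstraShortestPath<>(g).getPath(start, end);
    }
}

With the GraphPath in hand, getVertexList() and getWeight() give you the node 
sequence and the total cost.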



My question is, what is the relation to GeoSPARQL in your current 
problem? Does it matter for your weighted graph? Can you give an example 
fore the RDF graph?




For instance, there is Dijkstra's algorithm allowing to find the 
shortest paths considering weighted edges.



Here are my questions as a beginner:
1. What is the best way to integrate a weighted graph in Jena?
2. Maybe RDF* (RDF-star)? Is it compatible with GeoSPARQL?
3. How can we find a shortest path considering edge values?

Thank you for your time and have a great weekend.

Min




Re: number of HTTP requests

2023-03-19 Thread Lorenz Buehmann

Hi,

In Fuseki you can gather stats as JSON: 
https://jena.apache.org/documentation/fuseki2/fuseki-server-info.html


This would give you the number of requests which should be what you need.
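
For example, a minimal sketch of polling those counters from Java (host, port 
and the dataset name "ds" are assumptions for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FusekiStatsSketch {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:3030/$/stats/ds")).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // the JSON body contains the per-endpoint request counters for the dataset
        System.out.println(response.body());
    }
}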


Cheers,
Lorenz

On 19.03.23 23:14, Hashim Khan wrote:

Hi all,
In case of client server model for executing SPARQL queries,
e.g. Linked Data Fragments or brTPF etc, I want to know how I can note the
number of HTTP requests that have been sent to the server in order to
execute a query. I can calculate the query execution time through bash
commands, but I need to know the no. of http requests as well. If anybody
can help me please.

Regards,


Re: Inference reasoner

2023-02-09 Thread Lorenz Buehmann
Your config indicates some RDFS reasoner, but apparently transitivity as 
well as inverse relations are both part of OWL semantics only (also 
indicated by the owl: prefix of course). And OWL is way more expressive 
than RDFS, and inference way more expensive of course.


 Thus, you should use an appropriate OWL reasoner, see the coverage of 
the existing reasoner profiles: 
https://jena.apache.org/documentation/inference/#OWLcoverage
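
Programmatically, a minimal sketch would look like this (the file names and 
the pizza namespace IRI are made up, since the originals are not shown here):

import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.riot.RDFDataMgr;

public class OwlInferenceSketch {
    public static void main(String[] args) {
        Model schema = RDFDataMgr.loadModel("pizza-ontology.ttl");
        Model data   = RDFDataMgr.loadModel("pizza-data.ttl");
        Reasoner owl = ReasonerRegistry.getOWLReasoner();  // OWL rule reasoner profile
        InfModel inf = ModelFactory.createInfModel(owl, schema, data);
        // hasIngredient statements inferred via owl:inverseOf from isIngredientOf
        inf.listStatements(null, inf.getProperty("http://example.org/pizza#hasIngredient"), (RDFNode) null)
           .forEachRemaining(System.out::println);
    }
}

In Fuseki the same is achieved by wiring such a reasoner into the assembler 
configuration, see the "Inference" section of the Fuseki configuration 
documentation mentioned later in this thread.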


On 08.02.23 17:45, Yang-Min KIM wrote:

Dear Jena community,

As a beginner in Jena (and I do not code in Java), I would like to ask 
you a question about ontology integration.


- Data: small test data and corresponding OWL ontology
- Method: put data into GraphDB and Jena servers then run SPARQL query
- Expected result: initial missing information is complemented by 
ontology transitivity


I will describe you step by step, the questions will come at the end 
of the mail.


/ Report: what I did 
/


1. Test data: pizza!

1.1. Pizza ontology
As example, we use Pizza ontology provided by "OWL Example with RDF 
Graph"
 




We focus on the inverse relationships: `:isIngredientOf owl:inverseOf 
:hasIngredient .` i.e. the property `isIngredientOf` is the inverse of 
the property `hasIngredient`.

---
:isIngredientOf
   a owl:TransitiveProperty , owl:ObjectProperty ;
   owl:inverseOf :hasIngredient .
---


1.2. Pizza data
Two pizza `Margherita` and `Cheese Bacon`: the relationship between a 
pizza and an ingredient is declared by the property either 
`isIngredientOf` or `hasIngredient`. In summary:

- Margherita contains tomato and basilic
- Cheese Bacon contains cheese, bacon, and tomato

---
@prefix ex: <> .
@prefix pizza:  <> .
@prefix rdf: <> .

# Margherita
ex:margherita rdf:type pizza:Pizza .
ex:margherita rdf:type pizza:VegetarianPizza .
ex:tomato pizza:isIngredientOf ex:margherita .
ex:margherita pizza:hasIngredient ex:basilic .

# Cheese Bacon
ex:cheese_bacon rdf:type pizza:Pizza .
ex:cheese_bacon rdf:type pizza:NonVegetarianPizza .
ex:cheese pizza:isIngredientOf ex:cheese_bacon .
ex:cheese_bacon pizza:hasIngredient ex:bacon .
ex:cheese_bacon pizza:hasIngredient ex:tomato .
---



2. GraphDB

2.1. Dataset (repository) creation:

As shown in the image, the `RDFS-Plus (Optimized)` are set up by 
default when creating a repository (dataset in Jena term).


2.2. Upload test data
Both test data and ontology are uploaded in a same repository (as 
default or named graphes, results are unchanged).


We can see reciprocal relationships between pizza and ingredients 
(`hasIngredient` and `isIngredientOf`).


2.3. SPARQL query

Got expected results: all ingredients are presents i.e. initial 
missing information is complemented: `ex:tomato pizza:isIngredientOf 
ex:margherita` is equivalent to `ex:margherita pizza:hasIngredient 
ex:tomato`




3. Jena

3.1. Dataset creation
Fuseki UI does not allow to set default reasoner as seen in 2.1.
It would be awesome if we could specify at this step!

3.2. Upload test data
Each data is uploaded on TDB as default graph into:
- A same dataset
- Separated dataset

3.3. SPARQL query

Misisng relationships are still missing, e.g. tomato is missing as 
Margherita's ingredient.


3.4. Set inference model via config
According to your previous suggestions (thank you Dave and Lorenz!), I 
have tried to follow the configuration steps mentioned in Fuseki's 
configuration documentation.
 
-> section "Inference"
I uploaded ontology file into a separated TDB dataset named 
"test_ontology_pizza".
Then I modified Fuseki's `config.ttl` by adding the part after ## 
--- as below.
The results are the same as the previous ones despite restarting the 
server.


---
# Licensed under the terms of 



## Fuseki Server configuration file.

@prefix :    <#> .
@prefix fuseki:  <> .
@prefix rdf: <> .
@prefix rdfs: <> .
@prefix ja: <> .
@prefix tdb2:    <> 

Re: Re: Re: How to implicitly integrate OWL ontology?

2023-02-06 Thread Lorenz Buehmann
Ok, but if your example isn't related to the link, how is the link 
related to your current issue?


Can you share the ontology with the father relation where the inference 
doesn't work for you?


On 06.02.23 10:42, Yang-Min KIM wrote:
Le lun., févr. 6 2023 at 10:30:17 +0100, Lorenz Buehmann 
 a écrit :

SWRL


Dear Dave and Lorenz,

Thank you for your reply!
As I am a beginner in ontology, I do not yet know all the different 
terms, e.g. SWRL, I'm checking!


My exemple (father-child...etc) is not related to the link (Biolink 
Model), and actually Biolink Model provides also in OWL (ttl) format: 
<https://github.com/biolink/biolink-model/blob/master/biolink-model.owl.ttl>


I don't code in Java, but as advised by Dave, I'll see the Fuseki 's 
"Inference" section.

I will be back to you if I have any further problems.

Have a nice day,
Min






Re: Re: How to implicitly integrate OWL ontology?

2023-02-06 Thread Lorenz Buehmann

Hi,

in addition to the comments of Dave:

On 03.02.23 11:16, Yang-Min KIM wrote:


OWL-B includes:
Daughter is subclass of Children. (rdfs:subClassOf)

Minor: use singular form for class names, i.e. Child

If X is Male and has Children Y, X is father of Y. (owl:inverseOf)


That doesn't sound like an owl:inverseOf statement, but more like an SWRL 
rule. An owl:inverseOf axiom only states that property p is the inverse of 
property q; thus, for any p(x, y) we can also infer q(y, x)


Is it a rule or are those really OWL axioms? I doubt you can express 
that in OWL without using SWRL. If it is SWRL, then you will have to either a) 
use Pellet as reasoner or b) write that rule as a custom Jena rule
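
For option (b), a minimal sketch (all IRIs and the file name are made up - 
replace them with the ones from your data):

import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.riot.RDFDataMgr;

public class FatherRuleSketch {
    public static void main(String[] args) {
        // "if X is Male and has child Y, then X is father of Y" as a Jena rule
        String rules =
            "[father: (?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Male>) " +
            "         (?x <http://example.org/hasChild> ?y) " +
            "      -> (?x <http://example.org/isFatherOf> ?y) ]";
        Model data = RDFDataMgr.loadModel("family.ttl");
        Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.listStatements(null, inf.getProperty("http://example.org/isFatherOf"), (RDFNode) null)
           .forEachRemaining(System.out::println);
    }
}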


As Dave mentioned, your example ontology doesn't cover the domain you 
describe. At least I could not find any entity denoted "father" or similar.



Lorenz

On 06.02.23 10:19, Dave Reynolds wrote:
To configure use of a reasoner with fuseki see 
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html 
under the section "Inference".


The reasoners are not graph-aware so the union of your ontology and 
your instance data all need to appear in the default graph. Either by 
loading them there directly OR by loading them as separate graphs and 
setting default union flag.


However, the link you provide to your ontology doesn't match your 
prose example in any way at all. In particular it seems to be a mix of 
skos and linkml (whatever that is) and I see virtually no OWL in 
there.[*] Though it is a 1.3Mb turtle file so who knows what's 
lurking. So on the face of it there's no OWL to reason over and you 
won't get any useful results.


My advice would be to isolate a smaller test example of the kind of 
reasoning you are trying to do and check that programmatically see 
https://jena.apache.org/documentation/inference/index.html#OWLexamples


Then, if it seems like inference does work, you can tackle the 
separate problem of setting that up within fuseki.


Dave

[*] In particular there's no use of rdfs:subClassOf. There are 188 
owl:inverseOf statements but they are applied to things of type 
linkml:SlotDefinition which makes no sense at all.


On 03/02/2023 10:16, Yang-Min KIM wrote:

Dear Jena community,

I hope your day is going great.
I have a question about the ontology: we want to request an ontoogy 
data A that also import another ontology OWL-B.


e.g.

A includes:
 John is Male.
 John has a daughter called Monica.

OWL-B includes:
 Daughter is subclass of Children. (rdfs:subClassOf)
 If X is Male and has Children Y, X is father of Y. (owl:inverseOf)

What I want to query:
 Who is Monica's father?

Expected response:
 John


To get expected response, Jena needs to include OWL-B then manage 
implicit statement. However, I got results by explicit querying only: 
Who is John's daughter? -> Monica


I'm sure there is a solution since I see "The OWL reasoner" in 



Are there additional steps to include Ontology structure? (We are 
using Fuseki's API REST)
Is it better to import OWL-B as a default graph or a named graph? and 
what if we have several OWL files to import?


P.S. here is an example of our OWL file, BIolink Model, downloadable 
via 



Thank you for your time.




Re: Re: Why does the OSPG.dat file grows so much more than all other files?

2023-01-31 Thread Lorenz Buehmann
 213531 (Approximately 213 thousand)
  - Predicates: 153

Disk Stats:
- my-dataset/Data-0002: 23GB
- my-dataset/Data-0002/OSPG.dat: 3.7GB
- my-dataset/Data-0002/nodes.dat: 680MB
- my-dataset/Data-0002/POSG.dat: 3.8GB
- my-dataset/Data-0002/nodes.idn: 8.0M
- my-dataset/Data-0002/POSG.idn: 40M
- my-dataset/Data-0002/OSPG.idn: 32M
- ...

## Comparison

RDF Stats:
  - Triples: Same Count
  - Subjects: Same Count
  - Objects: Same Count
  - Graphs: Same Count
  - Predicates: Same Count

Disk Stats:
- Total Space: ~29x reduction with both strategies
- OSPG.dat: ~69x reduction with replication and ~65x reduction with 
compression

- nodes.dat: ~111x reduction with both strategies
- POSG.dat: ~9,7x reduction with replication and ~7,6x reduction with 
compression

- nodes.idn: ~4125x reduction with both strategies
- POSG.idn: ~906x reduction with replication and ~725x reduction with 
compression

- OSPG.idn: ~843,75 reduction with both strategies

## Queries used to obtain the RDF Stats

### Triples
```
SELECT (COUNT(*) as ?count)
WHERE {
   GRAPH ?graph {
 ?subject ?predicate ?object
   }
}
```

### Graphs
```
SELECT (COUNT(DISTINCT ?graph) as ?count)
WHERE {
   GRAPH ?graph {
 ?subject ?predicate ?object
   }
}
```

### Subjects

```
SELECT (COUNT(DISTINCT ?subject) as ?count)
WHERE {
   GRAPH ?graph {
 ?subject ?predicate ?object
   }
}
```

### Predicates
```
SELECT (COUNT(DISTINCT ?predicate) as ?count)
WHERE {
   GRAPH ?graph {
 ?subject ?predicate ?object
   }
}
```

### Objects
```
SELECT (COUNT(DISTINCT ?object) as ?count)
WHERE {
   GRAPH ?graph {
 ?subject ?predicate ?object
   }
}
```

## Comands used to measure the Disk Stats

### File Sizes
```
ls -lh --sort=size
```

### Directory Sizes
```
du -h
```

Best Regards

On 28/01/23 11:01, "Andy Seaborne" <mailto:a...@apache.org>> wrote:



I don't know how OSPG can be a considerably different size. Small variations
happen but this does not look small.


Lorenz's advice to run a compaction and see what the indexes sizes are
is a good idea. A backup would also be a good idea because something is
unexpected (backup uses GSPO).


There has been some fixes in compaction since 4.4.0 related to
compacting while also active in Fuseki.


This index does not store the literals strings representations - they
are referenced via the 8 byte entries. In OSPG, the index entries are 4
slots of 8 bytes.


Andy


(Unrelated comment below)


On 28/01/2023 07:47, Lorenz Buehmann wrote:

Hi Elton,

Do you have lots of (maybe large) literals in your data?

Also, did you try a compaction on the database? If not, can you try it
and post the new file sizes afterwards? Note, they will be located in a
new ./Data- directory, e.g. before Data-0001 and afterwards 
Data-0002


By the way, we're now at Jena 4.7.0 - you might have a look at release
notes of the last 3 versions, maybe things you have recognized while
running you current Fuseki. If not, just keep it running if you're 
happy

with it of course.








Cheers,
Lorenz

On 28.01.23 03:10, Elton Soares wrote:

Dear Jena Community,

I'm running Jena Fuseki Version 4.4.0 as a container on an OpenShift
Cluster.
OS Version Info (cat /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Ootpa)"
ID="rhel"
ID_LIKE="fedora" ="8.5"
...

Hardware Info (from Jena Fuseki initialization log):
[2023-01-27 20:08:59] Server INFO Memory: 32.0 GiB
[2023-01-27 20:08:59] Server INFO Java: 11.0.14.1
[2023-01-27 20:08:59] Server INFO OS: Linux
3.10.0-1160.76.1.el7.x86_64 amd64
[2023-01-27 20:08:59] Server INFO PID: 1


Disk Info (df -h):
Filesystem
Size Used Avail Use% Mounted on
overlay
99G 76G 18G 82% /
tmpfs
64M 0 64M 0% /dev
tmpfs
63G 0 63G 0% /sys/fs/cgroup
shm
64M 0 64M 0% /dev/shm
/dev/mapper/docker_data
99G 76G 18G 82% /config
/data
1.0T 677G 348G 67% /usr/app/run
tmpfs
40G 24K 40G 1%


My dataset is built using TDB2, and currently has the following RDF
Stats:
· Triples: 65KK (Approximately 65 million)
· Subjects: ~20KK (Aproximately 20 million)
· Objects: ~8KK (Aproximately 8 million)
· Graphs: ~213K (Aproximately 213 thousand)
· Predicates: 153


The files corresponding to this dataset alone on disk sum up to
approximately 671GB (measured with du -h). From these, the largest
files are:
· /usr/app/run/databases/my-dataset/Data-0001/OSPG.dat: 243GB
· /usr/app/run/databases/my-dataset/Data-0001/nodes.dat: 76GB
· /usr/app/run/databases/my-dataset/Data-0001/POSG.dat: 35GB
· /usr/app/run/databases/my-dataset/Data-0001/nodes.idn: 33GB
· /usr/app/run/databases/my-dataset/Data-0001/POSG.idn: 29GB
· /usr/app/run/databases/my-dataset/Data-0001/OSPG.idn: 27GB


I've looked into several documentation pages, source code, forums, ...
nowhere I was able to find some explanation to why OSPG.dat is so much
larger than all other files.
I've been using Jena for quite some time now and I'm well aware that
its indexes 

Re: Why does the OSPG.dat file grows so much more than all other files?

2023-01-27 Thread Lorenz Buehmann

Hi Elton,

Do you have lots of (maybe large) literals in your data?

Also, did you try a compaction on the database? If not, can you try it 
and post the new file sizes afterwards? Note, they will be located in a 
new ./Data- directory, e.g. before Data-0001 and afterwards Data-0002
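
A compaction can also be triggered from Java - a minimal sketch (the database 
path is the one from your mail and may differ; don't run this against a 
database that is open in a running Fuseki at the same time):

import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb2.DatabaseMgr;

public class CompactTdb2Sketch {
    public static void main(String[] args) {
        DatasetGraph dsg = DatabaseMgr.connectDatasetGraph("/usr/app/run/databases/my-dataset");
        DatabaseMgr.compact(dsg);  // writes a fresh Data-NNNN generation next to the old one
    }
}

Recent Fuseki versions also expose the same operation via the admin protocol 
(POST to /$/compact/*name*).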


By the way, we're now at Jena 4.7.0 - you might have a look at the release 
notes of the last 3 versions, maybe they address things you have noticed while 
running your current Fuseki. If not, just keep it running if you're happy 
with it of course.



Cheers,
Lorenz

On 28.01.23 03:10, Elton Soares wrote:

Dear Jena Community,

I'm running Jena Fuseki Version 4.4.0 as a container on an OpenShift Cluster.
OS Version Info (cat /etc/os-release):
NAME="Red Hat Enterprise Linux"
VERSION="8.5 (Ootpa)"
ID="rhel"
ID_LIKE="fedora" ="8.5"
...

Hardware Info (from Jena Fuseki initialization log):
[2023-01-27 20:08:59] Server INFOMemory: 32.0 GiB
[2023-01-27 20:08:59] Server INFOJava:   11.0.14.1
[2023-01-27 20:08:59] Server INFOOS: Linux 
3.10.0-1160.76.1.el7.x86_64 amd64
[2023-01-27 20:08:59] Server INFOPID:1


Disk Info (df -h):
Filesystem  Size  Used 
Avail Use% Mounted on
overlay  99G   76G  
 18G  82% /
tmpfs64M 0  
 64M   0% /dev
tmpfs63G 0  
 63G   0% /sys/fs/cgroup
shm  64M 0  
 64M   0% /dev/shm
/dev/mapper/docker_data  99G   76G  
 18G  82% /config
/data1.0T  677G 
 348G  67% /usr/app/run
tmpfs40G   24K  
 40G   1%


My dataset is built using TDB2, and currently has the following RDF Stats:
· Triples: 65KK (Approximately 65 million)
· Subjects: ~20KK (Aproximately 20 million)
· Objects: ~8KK (Aproximately 8 million)
· Graphs: ~213K (Aproximately 213 thousand)
· Predicates: 153


The files corresponding to this dataset alone on disk sum up to approximately 
671GB (measured with du -h). From these, the largest files are:
· /usr/app/run/databases/my-dataset/Data-0001/OSPG.dat: 243GB
· /usr/app/run/databases/my-dataset/Data-0001/nodes.dat: 76GB
· /usr/app/run/databases/my-dataset/Data-0001/POSG.dat: 35GB
· /usr/app/run/databases/my-dataset/Data-0001/nodes.idn: 33GB
· /usr/app/run/databases/my-dataset/Data-0001/POSG.idn: 29GB
· /usr/app/run/databases/my-dataset/Data-0001/OSPG.idn: 27GB


I've looked into several documentation pages, source code, forums, ... nowhere 
I was able to find some explanation to why OSPG.dat is so much larger than all 
other files.
I've been using Jena for quite some time now and I'm well aware that its 
indexes grow significantly during usage, specially when triples are being added 
across multiple requests (transactional workloads).
Even though, the size of this particular file (OSPG.dat) surprised me, as in my 
prior experience the indexes would never get larger than the nodes.dat file.
Is there a reasonable explanation for this based on the content of the dataset 
or the way it was generated? Could this be an indexing bug within TDB2?
Thank you for your support!
For completeness, here is the assembler configuration for my dataset:
@prefix :   http://base/# .
@prefix fuseki: http://jena.apache.org/fuseki# .
@prefix ja: http://jena.hpl.hp.com/2005/11/Assembler# .
@prefix rdf:http://www.w3.org/1999/02/22-rdf-syntax-ns# .
@prefix rdfs:   http://www.w3.org/2000/01/rdf-schema# .
@prefix root:   http://dev-test-jena-fuseki/$/datasets .
@prefix tdb2:   http://jena.apache.org/2016/tdb# .

tdb2:GraphTDB  rdfs:subClassOf  ja:Model .

ja:ModelRDFS  rdfs:subClassOf  ja:Model .

ja:RDFDatasetSink  rdfs:subClassOf  ja:RDFDataset .

http://jena.hpl.hp.com/2008/tdb#DatasetTDB
rdfs:subClassOf  ja:RDFDataset .

tdb2:GraphTDB2  rdfs:subClassOf  ja:Model .

http://jena.apache.org/text#TextDataset
rdfs:subClassOf  ja:RDFDataset .

ja:RDFDatasetZero  rdfs:subClassOf  ja:RDFDataset .

:service_tdb_my-dataset
rdf:type  fuseki:Service ;
rdfs:label"TDB my-dataset" ;
fuseki:dataset:ds_my-dataset ;
fuseki:name   "my-dataset" ;
fuseki:serviceQuery   "sparql" , "query" ;
fuseki:serviceReadGraphStore  "get" ;
fuseki:serviceReadWriteGraphStore
"data" ;
fuseki:serviceUpdate  "update" ;
fuseki:serviceUpload  "upload" .

ja:ViewGraph  rdfs:subClassOf  ja:Model .

ja:GraphRDFS  rdfs:subClassOf  ja:Model .

tdb2:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .

http://jena.hpl.hp.com/2008/tdb#GraphTDB
rdfs:subClassOf  ja:Model .

ja:DatasetTx

Re: Re: Builtin primitives in rule not triggered

2023-01-26 Thread Lorenz Buehmann

I cannot reproduce this. For example, the test code


public static void main(String[] args) {
    // the IRIs of the original example triple were stripped by the mail archive;
    // any N-Triples statement will do, e.g.:
    String raw = "<http://example.org/s> <http://example.org/p> <http://example.org/o> .";

    Model rawData = ModelFactory.createDefaultModel();
    rawData.read(new StringReader(raw), null, "N-Triples");
    String rules =
    "[test1: now(?x) -> print(\"now test\") ]";
    Reasoner reasoner = new 
GenericRuleReasoner(Rule.parseRules(rules));

    InfModel inf = ModelFactory.createInfModel(reasoner, rawData);
    System.out.println("A * * =>");
    StmtIterator iterator = inf.listStatements(null, null, 
(RDFNode) null);

    while (iterator.hasNext()) {
    System.out.println(" - " + iterator.next());
    }
}


does in fact print "now test" to the console.


On 26.01.23 19:43, L B wrote:

test1: now(?x) -> print("now test")


Re: Re: MOVE GRAPH … TO GRAPH … ~  Server Error 500 Iterator: started at 5, now 6 (SPARQL Update)

2023-01-24 Thread Lorenz Buehmann



On 24.01.23 10:31, Andreas Plank wrote:

it failed
first attempting loading the entire 8GB zipped data backup,

what was the reason for failing here?


Re: Re: RDF-star support in Jena

2023-01-18 Thread Lorenz Buehmann
Can't you use an IDE? How are you writing the Java code? If you're using 
Eclipse or IntelliJ and import the Jena libs via Maven, the IDE should 
be able to render the Javadoc as well as give you the source code.


On 18.01.23 09:28, Bruno Kinoshita wrote:

where can we find the implementation of the class RDFStar?


https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/system/RDFStar.java

Just go to Jena's source code in GitHub, https://github.com/apache/jena/,
click on "Go to file", type the name of the class (e.g. "RDFStar") and
normally you will find a file with the matching name. Otherwise use
GitHub's "Search in this repository" and that may lead you to the source
code you are looking for.

-Bruno

On Wed, 18 Jan 2023 at 09:02, Ghinwa FAKIH 
wrote:


where can we find the implementation of the class RDFStar?

On 17/01/2023 13:13, Andy Seaborne wrote:


On 17/01/2023 10:32, Ghinwa FAKIH wrote:

Hello,

I find this documentation on jena :
https://jena.apache.org/documentation/rdf-star/

In this documentation, it is stated that the package
/org.apache.jena.system.RDFStar//**/can be used for converting from
rdf into rdf-star and vice versa.

But I didn't find any details about how this package is used.

RDFStar is a class:

/**
  * Library for RDF-star translation to and from reification form.
  * There is one reification for each unique quoted triple term.
  * This is especially important when decoding.
  */

It provides a specialist (non-standard - there isn't a standard)
translation of quoted triple terms into RDF reification triples that
can be read by a RDF system that does not support RDF star syntax.

It also can reverse the mapping - take data it has encoded and produce
a graph with quoted triples.

It enables transferring data between systems if one does not support
RDF-star.

It is not a general tool to convert already existing reification into
RDF quoted triple terms.

Any Jena graph (model) supports RDF star in the Jena current release
(4.7.0) with no additional steps.


In addition, there is no information in this documentation about the
functions related to rdf-star in this package. I need a help for
writing code that can load RDF data and convert it into rdf-star and
then executing sparql-star queries.

RDF-star is always on in Jena.

Just load the data as you would any other data.  Turtle data can
include quoted triples.
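
A minimal sketch of loading data with quoted triples and running a SPARQL-star 
query over it (the file name and the example IRI are made up):

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class RdfStarSketch {
    public static void main(String[] args) {
        Model m = RDFDataMgr.loadModel("data-star.ttl");  // Turtle may contain << s p o >> terms
        String q = "SELECT ?s ?conf WHERE { << ?s ?p ?o >> <http://example.org/confidence> ?conf }";
        try (QueryExecution qe = QueryExecutionFactory.create(q, m)) {
            ResultSetFormatter.out(qe.execSelect());
        }
    }
}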

Jena supports the W3C CG final report:

https://w3c.github.io/rdf-star/cg-spec

including SPARQL functions and results.


Thank you in advance for any help.

Regards

 Andy


Re: Re: MOVE GRAPH … TO GRAPH … ~  Server Error 500 Iterator: started at 5, now 6 (SPARQL Update)

2023-01-17 Thread Lorenz Buehmann
You can dump the data within the running Fuseki, then reload it into the 
latest Fuseki 4.7.0?


On 16.01.23 17:46, Andreas Plank wrote:

Is there any data repair tool or to test the data?

Am Mo., 16. Jan. 2023 um 17:23 Uhr schrieb Andy Seaborne :

Without a minimal, reproducible example it's not possible to say much.

As it is TDB1, it might be a broken database due to an earlier problem.
You say it works on a development server.
Is it using the same database?

yes, the database GRAPH was set up on 4.0.0 (in August 2022)

today (16. Jan. 2023) jena fuseki is on 4.6.0 running on that same data.

Usually I use the configuration defaults, that are shipped with the
docker image, and from reading the docker image Dockerfile
(https://github.com/stain/jena-docker/blob/226b7348509233bfc1b6dcd760c4d281188ec3fe/jena-fuseki/Dockerfile),
it gets the official version, saves it, extracts it and starts the
server

Do you need other (server config) setup files? The directory structure is:

.
├── LICENSE
├── NOTICE
├── README
├── bin
│   ├── s-delete
│   ├── s-get
│   ├── s-head
│   ├── s-post
│   ├── s-put
│   ├── s-query
│   ├── s-update
│   ├── s-update-form
│   └── soh
├── fuseki
├── fuseki-backup
├── fuseki-server
├── fuseki-server.bat
├── fuseki-server.jar
├── fuseki.service
├── load.sh
├── log4j2.properties
├── shiro.ini
├── tdbloader
├── tdbloader2
└── webapp
├── WEB-INF
│   └── web.xml
├── favicon.ico
├── index.html
├── log4j2.properties
└── static
├── css
├── img
└── js




On 16/01/2023 16:15, Andreas Plank wrote:

Version 4.7.0 I could not get started running with, but 4.6.0 worked with
the old data
(that version 4.7.0 did not work fine is perhaps another issue to solve
from this anyhow, in the meantime I use 4.6.0 with a docker image see e.g.
https://github.com/stain/jena-docker/issues/70 … it downloads the official
fuseki 4.6.0 aso.)

“What else is happeing on the server at the same time?”

- the server throws the afore mentioned error log, and HTTP response is 500

So durign the load there is no overlapping operation?
(what does the log show - is every previous action finished?)


- and some seconds later one can query as expected

The configuration is as follows (shiro.ini)

That's only the security setup.


8< /fuseki/shiro.ini >8--
[main]
# Development
ssl.enabled = false

plainMatcher=org.apache.shiro.authc.credential.SimpleCredentialsMatcher
#iniRealm=org.apache.shiro.realm.text.IniRealm
iniRealm.credentialsMatcher = $plainMatcher

#localhost=org.apache.jena.fuseki.authz.LocalhostFilter

[users]
admin =onepassword, administrator
wwwuser=anotherpassword, guest

[roles]
administrator=*
wwwuser=rest:read

[urls]
## Control functions open to anyone
/$/status  = anon
/$/ping= anon
/$/server  = anon
/$/stats   = anon
/$/stats/* = anon
/*/query/**  = anon
/*/sparql/** = anon
/*/get/**= anon

## and the rest are restricted
/$/** = authcBasic,roles[administrator]

## Sparql update is restricted

# /manage.html** = authcBasic,roles[administrator] # old interface
/**/manage = authcBasic,roles[administrator]
/**/manage** = authcBasic,roles[administrator]
/*/data/** = authcBasic,roles[administrator]
/*/upload/** = authcBasic,roles[administrator]
/*/update/** = authcBasic,roles[administrator]
/*/delete/** = authcBasic,roles[administrator]

# Everything else
/**=anon

8< /fuseki/shiro.ini >8--


Re: Is it possible to implement logical OR in Jena rule?

2023-01-13 Thread Lorenz Buehmann
You can emulate logical OR in the premise of the rule by just adding a 
rule for each operand with the same conclusion


(a b c) -> 

(d e f) -> 
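
A minimal sketch of this in code (the property IRIs are made up for 
illustration; both rules share the same head, which gives the OR):

import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class RuleOrSketch {
    public static void main(String[] args) {
        String rules =
            "[or1: (?x <http://example.org/p> ?y) -> (?x <http://example.org/matched> 'yes') ]\n" +
            "[or2: (?x <http://example.org/q> ?y) -> (?x <http://example.org/matched> 'yes') ]";

        Model data = ModelFactory.createDefaultModel();
        Resource s = data.createResource("http://example.org/s");
        s.addProperty(data.createProperty("http://example.org/q"), "some value");

        Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        // true: the second branch matched, which is enough
        System.out.println(inf.contains(s, inf.getProperty("http://example.org/matched")));
    }
}

For nested combinations like ((a b c) | (a x w)) ((d e f) | (w y z)) you can 
either enumerate the combinations as separate rules, or let each OR-group 
derive an intermediate helper triple and have one final rule join those 
helpers.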


On 13.01.23 22:42, L B wrote:

The Jena rule is composed like logical AND.

(a b c) (d e f) -> 

I wonder if we could implement logical OR like below

(a b c) | (a x w) -> xxx

where either (a b c) or (a x w) are true, then the rule is triggered.  I
know we could split it into two rules, but consider the complex cases.

((a b c) | (a x w))   ((d e f) | (w y z)) -> 

I am searching for the best practice to do it.  Tried to google but no
luck.

Any suggestions?

Regards,
DDD



Re: Increasing verbosity of shacl cli tool to show sh:message or sh:resultMessage

2023-01-13 Thread Lorenz Buehmann
How do you decide which part of the OR failed in order to print your expected 
message? I mean, technically the first part also leads to FALSE, so we could 
just as well decide to print a message assigned to that part - it doesn't 
exist in your case, but it is also a reason why the validation fails.


Or do you expect all messages to be propagated up to the root node which 
failed?


On 11.01.23 23:02, Kyle Lawlor-Bagcal wrote:

Hello,

I'm interested to know if there is a way to configure the apache jena 
shacl cli tool to output the values of sh:message or sh:resultMessage. 
I have a property shape where I tried adding contextual info into 
sh:message/sh:resultMessage. I would like to see this info in the 
output from the shacl cli in the case of failed validation. Here is an 
minimal example to show the situation, in case I'm just doing 
something wrong.


schema.ttl:


@prefix schema:  .
@prefix sh:  .
@prefix xsd:  .
@prefix regen:  .

regen:ProjectPageShape a sh:NodeShape ;
  sh:targetClass regen:Project-Page ;

  sh:or (
    [ sh:not
  [ sh:path regen:landStewardStory ;
    sh:minCount 1 ;
    sh:maxCount 1
  ]
    ]
    [
  sh:and (
    [ sh:path regen:landStewardStory ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:minLength 1 ;
  sh:maxLength 500 ;
  sh:datatype xsd:string
    ]
    [
  sh:path regen:landStewardStoryTitle ;
  sh:resultMessage "regen:landStewardStoryTitle missing" ;
  sh:minCount 1 ;
  sh:maxCount 1 ;
  sh:minLength 1 ;
  sh:maxLength 160 ;
  sh:datatype xsd:string
    ]
  )
    ]
  );
.

data.jsonld:

{
  "@context": {
    "regen": "http://regen.network/";,
  },
  "@type": "regen:Project-Page",
  "regen:landStewardStory": "In 1998, the local community supported 
our plan to establish the Rukinga Wildlife Sanctuary that covers 
80,000 acres of forest. We established a community works project so 
local residents had an alternative income stream in place of poaching 
and clear cutting​. We brought on locally hired rangers and trained 
them to be wilderness guardians. We convinced the owners of the 
cattle to remove the cattle from the land to reduce conflict over 
resources."

}
And here is the output from running "shacl validate --text 
--shapes=schema.ttl --data=data.jsonld":



[ rdf:type sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ rdf:type sh:ValidationResult ;
 sh:focusNode  _:b0 ;
 sh:resultMessage  "Or at focusNode 
_:Bccda076ea2a1aedde6ef53f415f1b148" ;

 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent sh:OrConstraintComponent ;
 sh:sourceShape regen:ProjectPageShape ;
 sh:value  _:b0
]
] .

The result I am looking for would look like this:

[ rdf:type sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ rdf:type sh:ValidationResult ;
 sh:focusNode  _:b0 ;
 sh:resultMessage "regen:landStewardStoryTitle 
missing" ;

 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent sh:OrConstraintComponent ;
 sh:sourceShape regen:ProjectPageShape ;
 sh:value  _:b0
]
] . 

Any suggestions on how this could be achieved?

Thanks,
Kyle



Re: Re: Use rsparl to request a spaql database with login/password authentication

2023-01-08 Thread Lorenz Buehmann

It's a commandline tool in the Jena distribution [1]

./bin/rset

[1] https://dlcdn.apache.org/jena/binaries/apache-jena-4.7.0.tar.gz

On 09.01.23 01:42, Justin wrote:

Hi Andy,

What repo or package contains the "rset" utility?

Not this one I assume?
https://github.com/eradman/rset


On Fri, Jan 6, 2023 at 5:16 PM Andy Seaborne  wrote:


rsparql does not have authentication support.

You can use curl which has support for HTTP authentication:

curl -d query='SELECT * {}' "http://localhost:3030/ds";

that returns application/sparql-results+json

Use "-u user:password" or .netrc to put user and password in a secured
file.


curl -d query='SELECT * {}' "http://localhost:3030/ds?format=text";

returns the text format if asking Fuseki.

For some other triple store, you can format the results locally with "rset"

curl -s -d query='SELECT * {}' "http://localhost:3030/ds"; | \
  rset -in json -out text -- -

  Andy

On 06/01/2023 16:30, Steven Blanchard wrote:

Hello Jena Team,

I currently use jena to perform sparql queries on local files with arq
and on remote public databases with rsparql.
Now, i would like to perform sparql queries with rsparql on a remote
private database that requires authentication by login and password.
In your documentation, you describe how to identify yourself using your
java api
().

Is

it possible to perform this identification with your rsparql program?
Having never programmed in java, I would like if possible to use as much
as possible the programs that you have created.

I use Jena v4.6.1 on a fedora 36 with openjdk 17.0.5.

Thank you for your answer and your great work,
Regards,

Steven




Re: listIndividuals versus listInstances

2023-01-01 Thread Lorenz Buehmann
I can't reproduce this with a dummy example just containing a single OWL 
class with a single individual.


Both ways return  the individual. Do you have some inference enabled? 
Can you share sample data and code? All entities are strongly typed 
(don't know if this matters here)?



OntModel::listIndividuals(type) calls


getGraph().find( null, RDF.type.asNode(), type.asNode() );

OntClass::listInstances() does


getModel()
    .listStatements( null, RDF.type, this )
    .mapWith( s -> s.getSubject().as( Individual.class ) )
    .filterKeep( o -> o.hasRDFType( OntClassImpl.this, 
direct ))

    .filterKeep( new UniqueFilter());


On 31.12.22 21:07, Steve Vestal wrote:
Given an OntModel myModel and an OntClass myClass contained in 
myModel, the call myModel.listIndividuals(myClass) provides the 
expected list of Individuals that are members of myClass; but the call 
myClass.listInstances() doesn't list anything.  I am curious what the 
difference is between the two (I was not able to make that out from 
the javadoc).





Re: How to list OntModel Individual OntProperty triples?

2023-01-01 Thread Lorenz Buehmann

Hi Steve!

Looks like you're looking for something similar to 
Individual::getOntClasses, but I think there is no such method in the code.


Also, once getting statements, the type itself would be opaque.

What you can do is to call .canAs() and then .as() methods, for example

OntModel m = ...

ExtendedIterator<Individual> individuals = m.listIndividuals();
individuals.forEach(i -> {
    i.listProperties().forEach(stmt -> {
    Property predicate = stmt.getPredicate();
    if (predicate.canAs(OntProperty.class)) {
    OntProperty property = predicate.as(OntProperty.class);
    }
    });
    });


But I think this is not what you want, but I can't think of why you 
would need this?



Cheers,

Lorenz

On 31.12.22 17:45, Steve Vestal wrote:
I am using Individual#listProperties (inherited 
Resource#listProperties) to get the property instances (triples) for 
Individuals in an OntModel.  This method returns an iterator of 
Statements.  The statement objects are returned as ResourceImpl java 
objects, and the predicates are returned as PropertyImpl java 
objects.  What I would like to get as statement objects and predicates 
are the OntResource and OntProperty implementations from the 
OntModel.  I could find no methods such as Resource#isOntResource or 
Resource#asOntResource.  What is a good way to list property triples 
that maintains existing Ontology natures of statement objects, 
predicates, subjects?




Re: Question about RDF Frames

2022-12-29 Thread Lorenz Buehmann

Hi Steve,

In the documentation you referred to, the important statement is at the 
bottom:


Global properties listDeclaredProperties will treat properties with no 
specified domain as global, and regard them as properties of all 
classes. The use of the direct flag can hide global properties from 
non-root classes.
The general idea is that if no domain is defined for the property, then 
we cannot assume it doesn't "belong" to any class A - indeed, using

cls.listDeclaredProperties(true)

would avoid that assumption and return the properties that have no 
domain for the root classes only.
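
A minimal sketch of how giving the property a domain changes the outcome 
(the rf: namespace IRI and the file name are assumptions, and the ontology is 
expected in an RDF syntax Jena can read, e.g. Turtle):

import org.apache.jena.ontology.*;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

public class DeclaredPropertiesSketch {
    static final String NS = "http://example.org/rdf-frames#";

    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        RDFDataMgr.read(m, "rdf-frames-example.ttl");
        // scope the formerly global property to the Mammal hierarchy
        m.getObjectProperty(NS + "unused").addDomain(m.getOntClass(NS + "Mammal"));
        m.getOntClass(NS + "LivingThing")
         .listDeclaredProperties(true)
         .forEachRemaining(System.out::println);  // rf:unused is no longer listed here
    }
}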



Cheers,

Lorenz

On 29.12.22 15:31, Steve Vestal wrote:
Below is an example from 
https://jena.apache.org/documentation/notes/rdf-frames.html (rewritten 
in ofn due to minor syntax error in example and my greater familiarity 
with ofn), with one minor addition.  I declared an object property 
that is not used anywhere else.


Prefix(purl:=)
Prefix(rf:=)
Ontology( 
    Annotation( purl:title "Test RDF Frames" )

 Declaration( Class( rf:LivingThing ) )
 Declaration( Class( rf:Animal ) )
 SubClassOf(rf:Animal rf:LivingThing  )
 Declaration( Class( rf:Mammal ) )
 SubClassOf(rf:Mammal rf:Animal  )
     Declaration( ObjectProperty ( rf:hasSkeleton ) )
     ObjectPropertyDomain( rf:hasSkeleton rf:Animal )

     Declaration( ObjectProperty ( rf:unused ) )    # added to example
)

When I call OntClass.listDeclaredProperties, the rf:unused property 
appears in the list for all the classes in the ontology. Otherwise it 
behaves as in the example.  I have done some other simple tests, and 
it seems to list almost all the properties in the ontology for all 
classes.  What I would like to do is have it list for a class only the 
properties that are known (can be proven) to be used in the definition 
of that class, e.g., where removal of that property might change what 
appears in the ABox for a particular knowledge base.  I would 
appreciate help understanding this behavior and how I might get 
something closer to the desired list.  Where am I getting bitten by 
the open world assumption?





Re: How to handle optional lists in SPARQL

2022-12-12 Thread Lorenz Buehmann
I don't know your full query restrictions, but without given properties 
it would be a "simple" property path, no? Something like


owl:someValuesFrom/((owl:intersectionOf|owl:unionOf)/list:member)?

where the list closure is optional and if you want to make it nested

(owl:someValuesFrom/((owl:intersectionOf|owl:unionOf)/list:member)?)*

So a query like

prefix rdfs: 
prefix owl: 
prefix list: 

select * {
?subclass rdfs:subClassOf|owl:equivalentClass 
[(owl:intersectionOf|owl:unionOf)/list:member/(owl:someValuesFrom/((owl:intersectionOf|owl:unionOf)/list:member)?)* 
?m]

FILTER(isIRI(?m))
}

could work. You could even try to make it more generic.


But maybe you have different requirements, in that case it would be 
easier to help with sample data. My sample data now was


@prefix : 
 
.

@prefix owl:  .
@prefix rdf:  .
@prefix xml:  .
@prefix xsd:  .
@prefix rdfs:  .

:Foo rdf:type owl:Class ;
 owl:equivalentClass [ rdf:type owl:Class ;
   owl:unionOf ( :A
 [ rdf:type owl:Restriction ;
   owl:onProperty :p ;
   owl:someValuesFrom [ 
rdf:type owl:Class ;

owl:unionOf ( :B
[ rdf:type owl:Restriction ;
owl:onProperty :q ;
owl:someValuesFrom :C
]
)
  ]
 ]
   )
 ] .


Cheers,

Lorenz

On 12.12.22 13:49, Mikael Pesonen wrote:


Is there a shortcut for making queries where a data value can be 
single item or list of items?


For example this is how I do a query now using UNION. Both parts are 
identical except for the single/list section in owl:someValuesFrom []. 
This is still somewhat readable but if there are multiple occurences, 
query lenght and complexity grows exponentially.


{
    ?finding owl:equivalentClass|rdfs:subClassOf [
        owl:intersectionOf [
            list:member [
                rdf:type owl:Restriction ;
                owl:onProperty id:609096000 ;
                owl:someValuesFrom [
                    rdf:type owl:Restriction ;
                    owl:onProperty id:363698007 ;
                    owl:someValuesFrom ?site
                ]
            ]
        ]
    ]
    }
    UNION
    {
    ?finding owl:equivalentClass|rdfs:subClassOf [
        owl:intersectionOf [
            list:member [
                rdf:type owl:Restriction ;
                owl:onProperty id:609096000 ;
                owl:someValuesFrom [
                    owl:intersectionOf [
                        list:member [
                            rdf:type owl:Restriction ;
                            owl:onProperty id:363698007 ;
                            owl:someValuesFrom ?site
                        ]
                    ]
                ]
            ]
        ]
    ]
    }

The data is not ours so we can't make everything lists.


Re: Re: density of GraphUtil not recognized

2022-12-11 Thread Lorenz Buehmann

Made my day ...

StackOverflow is already disallowing ChatGPT comments ...


there are of course tools for graph measures beyond Jena, e.g. you could 
use Python rdflib which allows for converting the graph to a  networkx 
graph which in fact is a standard API for network analytics.


Or you use Apache Jena, convert the Jena model to a JGraphT directed 
pseudo graph [1] and use JGraphT's methods or its extensions


[1] 
https://jgrapht.org/javadoc/org.jgrapht.core/org/jgrapht/graph/DirectedPseudograph.html
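
A minimal sketch of that conversion, assuming JGraphT on the classpath (the 
file name is made up; every triple becomes one directed edge between its 
subject and its object, literals included):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.riot.RDFDataMgr;
import org.jgrapht.Graph;
import org.jgrapht.alg.scoring.BetweennessCentrality;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.graph.DirectedPseudograph;

public class JenaToJGraphTSketch {
    public static void main(String[] args) {
        Model m = RDFDataMgr.loadModel("data.ttl");
        Graph<RDFNode, DefaultEdge> g = new DirectedPseudograph<>(DefaultEdge.class);
        m.listStatements().forEachRemaining(stmt -> {
            g.addVertex(stmt.getSubject());
            g.addVertex(stmt.getObject());
            g.addEdge(stmt.getSubject(), stmt.getObject());
        });
        // density of a directed graph: |E| / (|V| * (|V| - 1))
        long v = g.vertexSet().size();
        double density = (double) g.edgeSet().size() / (v * (v - 1.0));
        System.out.println("density: " + density);
        // centrality scores per node via JGraphT's scoring algorithms
        new BetweennessCentrality<>(g).getScores()
            .forEach((node, score) -> System.out.println(node + " -> " + score));
    }
}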


On 11.12.22 19:43, emri mbiemri wrote:

The ChatGPT suggested using these kinds of solutions for getting some of
the above-mentioned graph metrics.
Anyway, is there any other tool that calculated some graph metrics on RDF
graphs?

On Sun, Dec 11, 2022 at 8:13 PM Andy Seaborne  wrote:



On 11/12/2022 13:15, emri mbiemri wrote:

ok, thanks, but is there any function within Apache Jena that calculates
the graph metrics such as centrality, density, and clustering

coefficient?

No.

What led you to thinking there was?

 Andy



Re: ARQ: query from passed string

2022-12-07 Thread Lorenz Buehmann

the syntax should be

arq --data /path/to/file.rdf  "SELECT * WHERE { ?s ?p ?o }"

--query only if you provide it as file

On 07.12.22 08:54, Adrian Gschwend wrote:

Hi,

I just wanted to extract some stuff from a local file where I need a 
super simple one-liner query. I tried this after checking the docs:


arq --query "SELECT * WHERE { ?s ?p ?o }" --data

But it seems to want this passed as file.

Is there a reason that this cannot be passed as simple one-liner 
argument, as in this example?


regards

Adrian



Re: Re: How to Use Fuseki Backup API

2022-11-27 Thread Lorenz Buehmann

does it work with setting the encoding in curl maybe like with

curl -H 'Accept-Encoding: gzip, deflate'

?

On 27.11.22 22:21, Tim McIver wrote:
Thanks Andy.  I probably won't have a large amount of data for some 
time but I can imagine getting a timeout or other error if there is a 
huge amount of data.  Can it be returned compressed?


Tim

On 11/27/22 13:53, Andy Seaborne wrote:



On 27/11/2022 18:28, Tim McIver wrote:
Thanks Bruno!  Yeah, that tracks since I know that LinkedDataHub 



and to be clear - provided by AtomGraph, not the Apache Jena project.

 uses the "no UI" version
 but I hadn't realized 
that it did not come with the admin functions.


So a follow up question would be: can I still somehow back up the data?


The backup is an RDF file

Another way to get an RDF file of all the data is

   curl http://:3030/ds

will return the data as well. It's just as valid and is a isolation 
snapshot.


This is the simplest way.

The backup operation, if it were available, schedules a background 
task. That task writes to a local file which you can copy after the 
backup has finished.




I have access to the data files from Fuseki and I considered just 
copying or rsyncing them but I know that, in general, that's not 
safe as I can't expect them to /not/ be updated while I'm backing 
them up.


Correct - copying the database files while it is running is, in 
general, not safe.


If the database is compacted, the old generation data area is safe to 
copy.


    Andy



Tim


On 11/27/22 13:18, Bruno Kinoshita wrote:

Ah, at the top of this page:
https://jena.apache.org/documentation/fuseki2/fuseki-main#fuseki-docker 



It says: "Fuseki main is a packaging of Fuseki as a triple store 
without a
UI for administration." And further down: "The main server does not 
depend

on any files on disk (other than for databases provided by the
application), and does not provide the Fuseki UI or admins 
functions to

create dataset via HTTP.". I had forgotten about that change.

So I believe you are right Tim, what you must have in your 
container is

Fuseki main without the UI, so without the backup servlet & endpoint
binding (thus the 404). You can have a look at the page about 
Fuseki + UI

for options for running it separately with access to admin features:
https://jena.apache.org/documentation/fuseki2/fuseki-webapp.html

-Bruno

On Sun, 27 Nov 2022 at 19:12, Bruno Kinoshita
wrote:


I got the same result following the docs for the Docker compose
installation:
https://jena.apache.org/documentation/fuseki2/fuseki-main#fuseki-docker 



Adding --update didn't solve it. So there might be something that 
needs to

be enabled in the dataset assembler configuration when you create the
dataset in the container.

On Sun, 27 Nov 2022 at 18:56, Tim McIver  wrote:


It's not working for me.  I even tried doing it from the fuseki
container.  It seems this image does not have curl so I tried 
wget using
'wgethttp://localhost:3030/$/backup/ds  --post-data ""'. Again, I 
get a

404.


Do the admin endpoints have to be specifically enabled? Or could 
they

have been disabled?

Tim

On 11/27/22 12:07, Bruno Kinoshita wrote:

Hi Tim,

I am not using a container, but I just tested the latest version 
from

Git

on Eclipse, and tested the endpoints with curl to query and backup.

Maybe

your endpoint URL is missing something?

1. Create ds in-memory dataset
2. Load some dummy data
3. curl a query: $ curl 'http://localhost:3030/ds/' -X POST 
--data-raw

'query=...' (success, data returned as expected)
4. curl to trigger a backup: $ curl 
'http://localhost:3030/$/backup/ds'

-X

POST

Then, if you want, you can also query for the tasks (a back up 
creates

an

async task on the server):

$ curlhttp://localhost:3030/$/tasks
[ {
"task" : "Backup" ,
"taskId" : "1" ,
"started" : "2022-11-27T18:06:01.868+01:00" ,
"finished" : "2022-11-27T18:06:01.893+01:00" ,
"success" : true
}
]

-Bruno

On Sun, 27 Nov 2022 at 17:55, Tim McIver  wrote:

I should mention also that the Docker image that I'm using in 
this case

comes from here.

On 11/27/22 11:43, Tim McIver wrote:

Hello,

I'd like to backup my Fuseki data using the web API. I found
documentation about how to do that here
<
https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html#backup 


.
But when I try use the listed endpoints, they all result in a 
404.

I'm using curl from a container in a Docker network to do this. I
know that I can connect to the server because a call like "curl
http:/:3030/ds" returns data with content type
"application/trig".

What am I missing? Any help would be appreciated.

Tim



Re: Re: Weird sparql problem

2022-11-08 Thread Lorenz Buehmann
Andy changed something in the algebra optimizer in latest develop, maybe 
you can try this though I don't know if it will change anything as it 
was more related to FILTER expressions.


On 08.11.22 12:04, Mikael Pesonen wrote:
Both your suggestions for rewriting the query worked. I'm lost with 
the reasons, but for future cases, breaking problematic queries with 
{} is they way to go?


On 04/11/2022 11.25, rve...@dotnetrdf.org wrote:
So yes as suspected the triple patterns are being reordered badly in 
the BGP:


   (sequence
 (table (vars ?sct_code)
   (row [?sct_code "298314008"])
 )
 (bgp
   (triple ?c skos:inScheme lsu:SNOMEDCT_US)
   (triple ?c skosxl:prefLabel ??0)
   (triple ??0 lsu:code ?sct_code)
 )))

The optimizer doesn’t take into account the fact that the ?sct_code 
variable is going to be bound by the VALUES clause (table in the 
algebra) so considers that the least specific triple pattern (as it 
has two variables) causing it to evaluate a much less specific triple 
pattern first.


Lorenz’s suggestion of generating statistics for your dataset is a 
good one, statistics would likely guide the optimiser that the ?c 
skos:inScheme lsu:SNOMEDCT_US triple is actually very non-specific 
for your dataset.


You could also try Andy’s suggestion else-thread i.e. --set 
arq:optReorderBGP=false passed to the CLI command in question, or if 
this is being called from code 
ARQ.getContext().set(ARQ.optReorderBGP, false);


The other thing you can do is explicitly break up your query further 
i.e.


{ VALUES ?sct_code { "298314008" }
   {  _:b0  lsu:code  ?sct_code .
 ?c    skosxl:prefLabel  _:b0 . }
   {  ?c    skos:inScheme lsu:SNOMEDCT_US }
   }

Essentially forcing the engine to evaluate that very unspecific 
triple pattern last


Another possibility would be to change that triple pattern to be in a 
FILTER EXISTS condition, so it’d only be evaluated for matches to 
your other triple patterns i.e.


{ VALUES ?sct_code { "298314008" }
 _:b0  lsu:code  ?sct_code .
 ?c    skosxl:prefLabel  _:b0 .
    FILTER EXISTS {  ?c    skos:inScheme lsu:SNOMEDCT_US }
   }

Hope this helps,

Rob

From: Lorenz Buehmann 
Date: Thursday, 3 November 2022 at 11:12
To: users@jena.apache.org 
Subject: Re: Re: Weird sparql problem
tdbquery --explain --loc  $TDB_LOC  "query here"

would also work to see the plan - maybe also increase log level to see
more: https://jena.apache.org/documentation/tdb/optimizer.html

Another question, did you generate the TDB stats such those could be
used by the optimizer?

for debugging purpose, you could also disable query optimization (put an
empty none.opt file into $TDB_LOC/Data-0001 dir)  and reorder your query
manually, i.e.


WHERE
   { VALUES ?sct_code { "298314008" }
   _:b0  lsu:code  ?sct_code .
 ?c    skosxl:prefLabel  _:b0 .
 ?c    skos:inScheme lsu:SNOMEDCT_US
   }

without stats and based on heuristics (e.g. number of variables in
triple pattern), otherwise the last triple pattern might always be
evaluated first


On 03.11.22 11:11, Mikael Pesonen wrote:

Here's the parse, hope it helps:

WHERE
   { VALUES ?sct_code { "298314008" }
 ?c    skosxl:prefLabel  _:b0 .
 _:b0  lsu:code  ?sct_code .
 ?c    skos:inScheme lsu:SNOMEDCT_US
   }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(prefix ((owl: 
<http://www.w3.org/2002/07/owl#<http://www.w3.org/2002/07/owl>>)
  (rdf: 
<http://www.w3.org/1999/02/22-rdf-syntax-ns#<http://www.w3.org/1999/02/22-rdf-syntax-ns>>)
  (skosxl: 
<http://www.w3.org/2008/05/skos-xl#<http://www.w3.org/2008/05/skos-xl>>)
  (skos: 
<http://www.w3.org/2004/02/skos/core#<http://www.w3.org/2004/02/skos/core>>)

  (dcterms: <http://purl.org/dc/terms/>)
  (rdfs: 
<http://www.w3.org/2000/01/rdf-schema#<http://www.w3.org/2000/01/rdf-schema>>)

  (lsr: <https://resource.lingsoft.fi/>)
  (id: <http://snomed.info/id/>)
  (dcat: 
<http://www.w3.org/ns/dcat#<http://www.w3.org/ns/dcat>>)

  (dc: <http://purl.org/dc/elements/1.1/>)
  (lsu: <https://www.lingsoft.fi/ns/umls/>))
   (sequence
 (table (vars ?sct_code)
   (row [?sct_code "298314008"])
 )
 (bgp
   (triple ?c skos:inScheme lsu:SNOMEDCT_US)
   (triple ?c skosxl:prefLabel ??0)
   (triple ??0 lsu:code ?sct_code)
 )))


On 02/11/2022 12.32, rve...@dotnetrdf.org wrote:

For these kind of performance issues it is useful to see the SPARQL
algebra for the whole query, not just fragments of the query.  You
can use the qparse command for the version of Jena you are using to
see how it is optimising your queries e.g.

qparse --explain --query example.rq

As Lorenz suggests this may be the optimiser making 

Re: Re: Weird sparql problem

2022-11-08 Thread Lorenz Buehmann

tdbstats --loc $PATH_TO_TDB_LOCATION

tdbstats --desc $PATH_TO_ASSEMBLER_FILE

--loc points directly at the TDB database directory on disk, --desc at an 
assembler file describing the dataset; the generated statistics file then goes 
into the database directory as described in the optimizer documentation linked 
earlier in this thread.


On 08.11.22 11:57, Mikael Pesonen wrote:

I ran your version of the query with none.opt and no change. For

|tdbstats --loc=DIR|--desc=assemblerFile [--graph=URI] Could you 
please explain loc and desc parameters? |




On 03/11/2022 13.11, Lorenz Buehmann wrote:

tdbquery --explain --loc  $TDB_LOC  "query here"

would also work to see the plan - maybe also increase log level to 
see more: https://jena.apache.org/documentation/tdb/optimizer.html


Another question, did you generate the TDB stats such those could be 
used by the optimizer?


for debugging purpose, you could also disable query optimization (put 
an empty none.opt file into $TDB_LOC/Data-0001 dir)  and reorder your 
query manually, i.e.



WHERE
  { VALUES ?sct_code { "298314008" }
  _:b0  lsu:code  ?sct_code .
    ?c    skosxl:prefLabel  _:b0 .
    ?c    skos:inScheme lsu:SNOMEDCT_US
  } 


without stats and based on heuristics (e.g. number of variables in 
triple pattern), otherwise the last triple pattern might always be 
evaluated first



On 03.11.22 11:11, Mikael Pesonen wrote:

Here's the parse, hope it helps:

WHERE
  { VALUES ?sct_code { "298314008" }
    ?c    skosxl:prefLabel  _:b0 .
    _:b0  lsu:code  ?sct_code .
    ?c    skos:inScheme lsu:SNOMEDCT_US
  }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(prefix ((owl: <http://www.w3.org/2002/07/owl#>)
 (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
 (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
 (skos: <http://www.w3.org/2004/02/skos/core#>)
 (dcterms: <http://purl.org/dc/terms/>)
 (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
 (lsr: <https://resource.lingsoft.fi/>)
 (id: <http://snomed.info/id/>)
 (dcat: <http://www.w3.org/ns/dcat#>)
 (dc: <http://purl.org/dc/elements/1.1/>)
 (lsu: <https://www.lingsoft.fi/ns/umls/>))
  (sequence
    (table (vars ?sct_code)
  (row [?sct_code "298314008"])
    )
    (bgp
  (triple ?c skos:inScheme lsu:SNOMEDCT_US)
  (triple ?c skosxl:prefLabel ??0)
  (triple ??0 lsu:code ?sct_code)
    )))


On 02/11/2022 12.32, rve...@dotnetrdf.org wrote:
For these kind of performance issues it is useful to see the SPARQL 
algebra for the whole query, not just fragments of the query.  You 
can use the qparse command for the version of Jena you are using to 
see how it is optimising your queries e.g.


qparse --explain --query example.rq

As Lorenz suggests this may be the optimiser making a bad guess at 
the appropriate order in which to evaluate the triple patterns 
within the BGP but without the larger query context or the algebra 
all we can do is guess.


Rob

From: Mikael Pesonen 
Date: Tuesday, 1 November 2022 at 12:53
To: users@jena.apache.org 
Subject: Re: Weird sparql problem
Diferent case, but again hanging makes no sense to user, whatever are
the technical reasons.

   VALUES ?sct_code { "298314008" }
 ?c skosxl:prefLabel [ lsu:code ?sct_code ]

returns one row immediately, but

   VALUES ?sct_code { "298314008" }
 ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme
lsu:SNOMEDCT_US

hangs forever


   skos:inScheme lsu:SNOMEDCT_US;

On 18/10/2022 9.08, Lorenz Buehmann wrote:

Hi,

comments inline

On 17.10.22 14:35, Mikael Pesonen wrote:

This works as a separate query, but not in a the middle, since ?s
gets new values instead of binding to previous ?s.

{ select ?t where {
?s a ?t .
  } limit 10}
   ?t skos:prefLabel ?l


In the middle of what? Subqueries will be evaluated first - if you
really want labels for classes, you should use a DISTINCT in the
subquery such that the intermediate result is small, there shouldn't
be that many classes, but many instances with the same class, thus,
the join would be more expensive than necessary.



On 17/10/2022 14.56, Mikael Pesonen wrote:

?s a ?t .
   ?t skos:prefLabel ?l

returns 3 million triples. Maybe it's related to this?

I don't see how this should be related to your initial query where ?s
was bound, which in my opinion should be an easy join. Is it possible
for you to share the dataset somehow? Also, what you can do is to
compute statistics for the TDB database with tdbstats tool [1] from
commandline and put it into the TDB folder. But even without the 
query

plan should take the first triple pattern, use the spo index as s and
p are bound, then pass the bindings of ?o to the evaluation of the
second triple pattern

[1]
https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 






On 21/09/2022 9.15, Lorenz Buehmann wrote:

Weird, only 10M triples and each triple pattern returns only 1
binding, thus, the size is tiny - honestly I can't think of
anything except for open connection

Re: Re: Weird sparql problem

2022-11-03 Thread Lorenz Buehmann

tdbquery --explain --loc  $TDB_LOC  "query here"

would also work to see the plan - maybe also increase log level to see 
more: https://jena.apache.org/documentation/tdb/optimizer.html


Another question, did you generate the TDB stats such those could be 
used by the optimizer?


for debugging purpose, you could also disable query optimization (put an 
empty none.opt file into $TDB_LOC/Data-0001 dir)  and reorder your query 
manually, i.e.



WHERE
  { VALUES ?sct_code { "298314008" }
  _:b0  lsu:code  ?sct_code .
    ?c    skosxl:prefLabel  _:b0 .
    ?c    skos:inScheme lsu:SNOMEDCT_US
  } 


without stats and based on heuristics (e.g. number of variables in 
triple pattern), otherwise the last triple pattern might always be 
evaluated first
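
A minimal sketch of the none.opt trick (assuming the default TDB2 layout):

    touch $TDB_LOC/Data-0001/none.opt   # disables triple pattern reordering
    # delete the file again (and restart) to re-enable the optimizer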



On 03.11.22 11:11, Mikael Pesonen wrote:

Here's the parse, hope it helps:

WHERE
  { VALUES ?sct_code { "298314008" }
    ?c    skosxl:prefLabel  _:b0 .
    _:b0  lsu:code  ?sct_code .
    ?c    skos:inScheme lsu:SNOMEDCT_US
  }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(prefix ((owl: <http://www.w3.org/2002/07/owl#>)
 (rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
 (skosxl: <http://www.w3.org/2008/05/skos-xl#>)
 (skos: <http://www.w3.org/2004/02/skos/core#>)
 (dcterms: <http://purl.org/dc/terms/>)
 (rdfs: <http://www.w3.org/2000/01/rdf-schema#>)
 (lsr: <https://resource.lingsoft.fi/>)
 (id: <http://snomed.info/id/>)
 (dcat: <http://www.w3.org/ns/dcat#>)
 (dc: <http://purl.org/dc/elements/1.1/>)
 (lsu: <https://www.lingsoft.fi/ns/umls/>))
  (sequence
    (table (vars ?sct_code)
  (row [?sct_code "298314008"])
    )
    (bgp
  (triple ?c skos:inScheme lsu:SNOMEDCT_US)
  (triple ?c skosxl:prefLabel ??0)
  (triple ??0 lsu:code ?sct_code)
    )))


On 02/11/2022 12.32, rve...@dotnetrdf.org wrote:
For these kind of performance issues it is useful to see the SPARQL 
algebra for the whole query, not just fragments of the query.  You 
can use the qparse command for the version of Jena you are using to 
see how it is optimising your queries e.g.


qparse --explain --query example.rq

As Lorenz suggests this may be the optimiser making a bad guess at 
the appropriate order in which to evaluate the triple patterns within 
the BGP but without the larger query context or the algebra all we 
can do is guess.


Rob

From: Mikael Pesonen 
Date: Tuesday, 1 November 2022 at 12:53
To: users@jena.apache.org 
Subject: Re: Weird sparql problem
Different case, but again hanging makes no sense to the user, whatever
the technical reasons are.

   VALUES ?sct_code { "298314008" }
 ?c skosxl:prefLabel [ lsu:code ?sct_code ]

returns one row immediately, but

   VALUES ?sct_code { "298314008" }
 ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme
lsu:SNOMEDCT_US

hangs forever


   skos:inScheme lsu:SNOMEDCT_US;

On 18/10/2022 9.08, Lorenz Buehmann wrote:

Hi,

comments inline

On 17.10.22 14:35, Mikael Pesonen wrote:

This works as a separate query, but not in the middle, since ?s
gets new values instead of binding to previous ?s.

{ select ?t where {
?s a ?t .
  } limit 10}
   ?t skos:prefLabel ?l


In the middle of what? Subqueries will be evaluated first - if you
really want labels for classes, you should use a DISTINCT in the
subquery such that the intermediate result is small, there shouldn't
be that many classes, but many instances with the same class, thus,
the join would be more expensive than necessary.



On 17/10/2022 14.56, Mikael Pesonen wrote:

?s a ?t .
   ?t skos:prefLabel ?l

returns 3 million triples. Maybe it's related to this?

I don't see how this should be related to  your initial query where ?s
was bound, which in my opinion should be an easy join. Is it possible
for you to share the dataset somehow? Also, what you can do is to
compute statistics for the TDB database with tdbstats tool [1] from
commandline and put it into the TDB folder. But even without the query
plan should take the first triple pattern, use the spo index as s and
p are bound, then pass the bindings of ?o to the evaluation of the
second triple pattern

[1]
https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file 






On 21/09/2022 9.15, Lorenz Buehmann wrote:

Weird, only 10M triples and each triple pattern returns only 1
binding, thus, the size is tiny - honestly I can't think of
anything except for open connections, but as you mentioned, running
the queries with only one triple pattern works as expected, so that
too many open connections shouldn't be an issue most likely.

Can you reproduce this behavior with newer Jena versions like 4.6.1?

Or can you reproduce this on different servers as well?

Is it also stuck if you run the query directly after you restart Fuseki?

Re: Re: Weird sparql problem

2022-11-02 Thread Lorenz Buehmann

I think if you use

BIND( "298314008" AS ?sct_code )

it would work for the second query?
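
i.e. something along these lines (prefixes as in your earlier mails):

    SELECT ?c WHERE {
      BIND( "298314008" AS ?sct_code )
      ?c  skosxl:prefLabel  [ lsu:code ?sct_code ] ;
          skos:inScheme     lsu:SNOMEDCT_US .
    }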

Looks like the query optimizer does the join in the wrong order

@Andy?

On 01.11.22 13:52, Mikael Pesonen wrote:
Different case, but again hanging makes no sense to the user, whatever
the technical reasons are.


 VALUES ?sct_code { "298314008" }
   ?c skosxl:prefLabel [ lsu:code ?sct_code ]

returns one row immediately, but

 VALUES ?sct_code { "298314008" }
   ?c skosxl:prefLabel [ lsu:code ?sct_code ]; skos:inScheme 
lsu:SNOMEDCT_US


hangs forever


 skos:inScheme lsu:SNOMEDCT_US;

On 18/10/2022 9.08, Lorenz Buehmann wrote:

Hi,

comments inline

On 17.10.22 14:35, Mikael Pesonen wrote:
This works as a separate query, but not in the middle, since ?s 
gets new values instead of binding to previous ?s.


{ select ?t where {
?s a ?t .
 } limit 10}
  ?t skos:prefLabel ?l



In the middle of what? Subqueries will be evaluated first -  if you 
really want labels for classes, you should use a DISTINCT in the 
subquery such that the intermediate result is small, there shouldn't 
be that many classes, but many instances with the same class, thus, 
the join would be more expensive than necessary.





On 17/10/2022 14.56, Mikael Pesonen wrote:


?s a ?t .
  ?t skos:prefLabel ?l

returns 3 million triples. Maybe it's related to this?


I don't see how this should be related to  your initial query where 
?s was bound, which in my opinion should be an easy join. Is it 
possible for you to share the dataset somehow? Also, what you can do 
is to compute statistics for the TDB database with tdbstats tool [1] 
from commandline and put it into the TDB folder. But even without the 
query plan should take the first triple pattern, use the spo index as 
s and p are bound, then pass the bindings of ?o to the evaluation of 
the second triple pattern


[1] 
https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file






On 21/09/2022 9.15, Lorenz Buehmann wrote:
Weird, only 10M triples and each triple pattern returns only 1 
binding, thus, the size is tiny - honestly I can't think of 
anything except for open connections, but as you mentioned, 
running the queries with only one triple pattern works as 
expected, so that too many open connections shouldn't be an issue 
most likely.


Can you reproduce this behavior with newer Jena versions like 4.6.1?

Or can you reproduce this on different servers as well?

Is it also stuck if you run the query directly after you restart 
Fuseki?



On 19.09.22 13:49, Mikael Pesonen wrote:



On 15/09/2022 17.48, Lorenz Buehmann wrote:

Forgot:

- size of result for each triple pattern? Might affect if hash 
join can be used.

It's one row for each.


- your hardware?

Normal server with 16gigs mem.


- is it just the first query after starting Fuseki? Connections 
have been closed? Note, there was also a bug in a recent Jena 
version, but only with TDB and too many open connections. It has 
been resolved with release 4.6.1.

Jena has been running quite a while.


Might not be related, but I'm mentioning all things here 
nevertheless.



On 15.09.22 11:16, Mikael Pesonen wrote:


This returns one row fast, say :C1

SELECT *
FROM <https://a.b.c>
WHERE {
  <https://x.y.z> a ?t .
  #?t skos:prefLabel ?l
}


and this too:

SELECT *
FROM <https://a.b.c>
WHERE {
  #<https://x.y.z> a ?t .
  :C1 skos:prefLabel ?l
}


But this always hangs until timeout

SELECT *
FROM <https://a.b.c>
WHERE {
  <https://x.y.z> a ?t .
  ?t skos:prefLabel ?l
}

What am I missing here? I'm using Fuseki web GUI. Thanks!










Re: Model#remove statement not working for me

2022-10-28 Thread Lorenz Buehmann

your mistake is here:


Triple sameAsTriple = new Triple(
 ontModel.getResource("http://test/kbA#memberA").asNode(), 
ontModel.getResource("owl:sameAs").asNode(), 
ontModel.getResource("http://test/kbB#memberB").asNode());


the method expects a proper URI, not a prefixed one - the reason for 
this is that there is nothing that would prevent users from using the 
prefix owl: for some other namespace, so Jena can't guess here


You could indeed also make use of Jena built-in vocab methods:


Triple sameAsTriple = new Triple(
 ontModel.getResource("http://test/kbA#memberA").asNode(), OWL.sameAs.asNode(), 
ontModel.getResource("http://test/kbB#memberB").asNode());

or indeed just use the full URI of owl:sameAs
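
If all you need is to drop that single statement from the base model, a 
sketch like the following should also work (using the vocabulary class, so 
no IRI typos are possible):

    Resource memA = ontModel.getResource("http://test/kbA#memberA");
    Resource memB = ontModel.getResource("http://test/kbB#memberB");
    ontModel.getBaseModel().remove(memA, OWL.sameAs, memB);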


Cheers,

Lorenz

On 28.10.22 14:26, Steve Vestal wrote:
I make a call model.remove(statement) for a statement that is listed 
in that model, but that statement stays listed.


The following example code is a bit lengthy due to setup and 
alternatives tried, but it is executable (with Jena imports added).  I 
think this is a simple misunderstanding of models, statements, and 
triples.  Look for the "Why doesn't this work?" comment and the print 
of "base contents after remove".   How should this be done?  I am 
using Jena 4.5.0.


public class BaseModelEdits {

    public static void main(String args[]) {

        // Jena documentation says,
        // "And when we update the model, only the base model 
changes."


        // Create two knowledge bases that will not change during 
use.
        OntModel kbA = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

        OntClass classA = kbA.createClass("http://test/kbA#classA");
        kbA.createIndividual("http://test/kbA#memberA", classA);
        kbA.createIndividual("http://test/kbA#memberC", classA);
        OntModel kbB = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

        OntClass classB = kbB.createClass("http://test/kbB#classB");
        kbB.createIndividual("http://test/kbB#memberB", classB);
        kbB.createIndividual("http://test/kbB#memberD", classB);

        // Create initial content for an editable base model
        OntModel initialBase = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        OntResource memA = 
initialBase.createOntResource("http://test/kbA#memberA");
        OntResource memB = 
initialBase.createOntResource("http://test/kbB#memberB");

        memA.addSameAs(memB);

        // Create an OntModel with the initial base plain Model 
content.
        OntModel ontModel = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, 
initialBase.getBaseModel());


        // None of the above have import statements (for reasons I 
won't go into).

        // Add the knowledge bases as plan Model subModels.
        ontModel.addSubModel(kbA.getBaseModel());
        ontModel.addSubModel(kbB.getBaseModel());

        Individual memberA = 
ontModel.getIndividual("http://test/kbA#memberA");
        Individual memberB = 
ontModel.getIndividual("http://test/kbB#memberB");
        Individual memberC = 
ontModel.getIndividual("http://test/kbA#memberC");
        Individual memberD = 
ontModel.getIndividual("http://test/kbB#memberD");
        System.out.println("memberA sameAs memberB true = " + 
memberA.isSameAs(memberB));
        System.out.println("memberA sameAs memberC false = " + 
memberA.isSameAs(memberC));
        System.out.println("memberA sameAs memberD false = " + 
memberA.isSameAs(memberD));
        System.out.println("memberB sameAs memberC false = " + 
memberB.isSameAs(memberC));
        System.out.println("memberB sameAs memberD false = " + 
memberB.isSameAs(memberD));

        System.out.println();

        // Approach 1: Selectively remove the sameAs assertion
        // Why doesn't this work?

        System.out.println("initial base contents");
        StmtIterator baseStmts = 
ontModel.getBaseModel().listStatements();

        while (baseStmts.hasNext()) {
            System.out.println("    " + 
baseStmts.next().asTriple().toString());

        }
        // Removing the sameAs assertion will require knowing how 
it is encoded.

        // None of the following three Triple formulations work.
//            Triple sameAsTriple = new 
Triple(NodeFactory.createURI("http://test/kbA#memberA"),
//                    NodeFactory.createURI("owl:sameAs"), 
NodeFactory.createURI("http://test/kbB#memberB"));

        Triple sameAsTriple = new Triple(
ontModel.getResource("http://test/kbA#memberA").asNode(),
ontModel.getResource("owl:sameAs").asNode(),
ontModel.getResource("http://test/kbB#memberB").asNode());
//            Triple sameAsTriple = new Triple(
// 
ontModel.getBaseModel().getResource("http://test/kbA#memberA").asNode(),

// ontModel.getBaseModel().getResource("owl:

Re: Re: What is killing my Fuseki instance?

2022-10-24 Thread Lorenz Buehmann
I have the machine running now for hours, but to be fair, I didn't 
produce any load in the meantime.



On 24.10.22 14:17, Andy Seaborne wrote:

Hi Bob, good article!

Especially the "check your data before loading" bit.


https://bobdc.com/miscfiles/dataset2.ttl
You can remove all those "rdfs:subClassOf" triples. That all happens 
automatically.


On 23/10/2022 20:36, Bob DuCharme wrote:
> The good news is that I have gotten Fuseki running on a free tier AWS
> EC2 instance with very little trouble and was able to use the HTML
> interface and the SPARQL endpoint, as described at
> https://www.bobdc.com/blog/ec2fuseki/
>
> The bad news: it just randomly stops, even when there has been no
> querying activity, typically after 30-60 minutes of being up:
>
>    17:17:50 INFO  Server  ::   OS: Linux
> 5.10.144-127.601.amzn2.x86_64 amd64
>    17:17:50 INFO  Server  ::   PID:    3314
>    17:17:51 INFO  Server  :: Started 2022/10/23 17:17:51 UTC on
> port 3030
>    Killed
>
> The instance has 1GB of memory. I had only loaded 162K of data.
>
> Should I set JVM_ARGS different from the default?

Yes - as Lorenz says.

It needs to be less than the machine size, and allow a bit of other 
space (OS, file system cache).  Guess: 0.75G.
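
For example (just a sketch - the exact value depends on what else runs on 
the box, and paths/service name are placeholders):

    JVM_ARGS="-Xmx768m" ./fuseki-server --loc databases/DB /dataset2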



What I think is happening is that even when "nothing" is happening, 
there is still some small amount of work going on. Not from Fuseki 
itself but, for example, UI pings the server, a bit of Java runs.


The heap will slowly increase because there is no pressure to do a 
full GC and, if the heap size is set larger than the machine, 
eventually a request to grow the heap larger than the OS allows 
happens and the OS kills the process. No java/Fuseki log message.


Even though this work is very small, on a t2.micro "eventually" might 
be quite soon.



Another factor you may come across later, when using TDB2 on a small 
instance, is that the TDB2 caches will need tuning smaller for safety. 
Most likely, at 162K all the data ends up in RAM and the node table 
cache isn't large so it won't be a problem because it never gets very 
big.


For 162K of data, and it's read-only ("publishing"), I'd try putting 
everything in-memory at startup.


# Transactional in-memory dataset.
:dataset rdf:type ja:MemoryDataset ;
    ja:data "data1.trig"; ## Or a triples format like .ttl.
    .

which is

   fuseki-server --file DATA --update /dataset2 ## --update optional

or load with a script. Downside updates are lost.
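
Such a load script could be as simple as a POST to the graph store 
endpoint (a sketch, assuming the /dataset2 service from above):

    curl -X POST --data-binary @dataset2.ttl \
         -H 'Content-Type: text/turtle' \
         'http://localhost:3030/dataset2/data?default'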


It is possible to tune down TDB cache sizes.  And if anyone is really 
desperate, 32-bit JVMs (but don't unless you really have to).


The mechanism is rather clunky to apply from Fuseki at the moment.

    Andy

>
> Thanks,
>
> Bob
>


Re: Re: What is killing my Fuseki instance?

2022-10-24 Thread Lorenz Buehmann

I set up my own EC2 instance now with the same data and Fuseki version.

The only difference is that I used OpenJDK 11 from the OS repository - 
I'll keep it running and see if and when it crashes.


By the way, you can also start the fuseki-server with --loc param such 
that you wouldn't need an assembler file (just for simplicity).


On 24.10.22 10:22, Lorenz Buehmann wrote:

Hi,

I'm pretty sure this was just OOM killed. Default Fuseki settings are 
nowadays 4GB unless you overwrite the JVM_ARGS environment variable [1]


Indeed, you could also check the OS logs to find the reason for 
killing the process but I would assume it's because of low memory



Cheers,

Lorenz

[1] 
https://github.com/apache/jena/blob/main/jena-fuseki2/apache-jena-fuseki/fuseki-server#L105


On 23.10.22 21:36, Bob DuCharme wrote:
The good news is that I have gotten Fuseki running on a free tier AWS 
EC2 instance with very little trouble and was able to use the HTML 
interface and the SPARQL endpoint, as described at 
https://www.bobdc.com/blog/ec2fuseki/


The bad news: it just randomly stops, even when there has been no 
querying activity, typically after 30-60 minutes of being up:


  17:17:50 INFO  Server  ::   OS: Linux 
5.10.144-127.601.amzn2.x86_64 amd64

  17:17:50 INFO  Server  ::   PID:    3314
  17:17:51 INFO  Server  :: Started 2022/10/23 17:17:51 UTC 
on port 3030

  Killed

The instance has 1GB of memory. I had only loaded 162K of data.

Should I set JVM_ARGS different from the default?

Thanks,

Bob



Re: What is killing my Fuseki instance?

2022-10-24 Thread Lorenz Buehmann

Hi,

I'm pretty sure this was just OOM killed. Default Fuseki settings are 
nowadays 4GB unless you overwrite the JVM_ARGS environment variable [1]


Indeed, you could also check the OS logs to find the reason for killing 
the process but I would assume it's because of low memory



Cheers,

Lorenz

[1] 
https://github.com/apache/jena/blob/main/jena-fuseki2/apache-jena-fuseki/fuseki-server#L105


On 23.10.22 21:36, Bob DuCharme wrote:
The good news is that I have gotten Fuseki running on a free tier AWS 
EC2 instance with very little trouble and was able to use the HTML 
interface and the SPARQL endpoint, as described at 
https://www.bobdc.com/blog/ec2fuseki/


The bad news: it just randomly stops, even when there has been no 
querying activity, typically after 30-60 minutes of being up:


  17:17:50 INFO  Server  ::   OS: Linux 
5.10.144-127.601.amzn2.x86_64 amd64

  17:17:50 INFO  Server  ::   PID:    3314
  17:17:51 INFO  Server  :: Started 2022/10/23 17:17:51 UTC on 
port 3030

  Killed

The instance has 1GB of memory. I had only loaded 162K of data.

Should I set JVM_ARGS different from the default?

Thanks,

Bob



Re: Re: SPARQL limit doesn't work

2022-10-19 Thread Lorenz Buehmann



On 19.10.22 13:44, Mikael Pesonen wrote:




On 19/10/2022 10.18, Lorenz Buehmann wrote:
Honestly - probably because of lack of knowledge - I don't see how 
that can happen with the text index. You have a single triple pattern 
that is querying the Lucene index for the given pattern and returns 
by default at most 10 000 documents.



text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )

translates to


( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.


I still don't get how a query that returns 560 results with limit 1000 
then doesn't return 100 results when using limit 100


Currently, I find your results quite counter-intuitive, but I still 
have to learn a lot when using RDF, SPARQL and Jena.



Can you share some data please to reproduce?
Unfortunately I can't share the data. Of course, when I find the time, I 
could create a similar dummy index.


What happens for a single property only? 


What does this mean?
you're querying two properties aka two fields in the Lucene query. What 
if you just use skos:prefLabel ?


Pagination should work as you're doing, the Lucene query is 
internally executed once, then cached - for later requests the same 
Lucene documents hits should be reused


On 19.10.22 08:21, Mikael Pesonen wrote:


Hi,

yes, when run as the only query, the same select gets exactly the limit amount of triples.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, 
the default limit of the Lucene text query is at most 10 000 
documents - and I don't think that the outer LIMIT would make it to 
the Lucene request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx 
yy\"" "lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 
100 ~75 results. How do I page results correctly?






Re: Re: SPARQL limit doesn't work

2022-10-19 Thread Lorenz Buehmann
Honestly - probably because of lack of knowledge - I don't see how that 
can happen with the text index. You have a single triple pattern that is 
querying the Lucene index for the given pattern and returns by default 
at most 10 000 documents.



text:query (skos:prefLabel skos:altLabel "\"xx yy\"" "lang:en" )

translates to


( (prefLabel:"\"xx yy\"" OR altLabel:"\"xx yy\"") AND lang:en)
which indeed can return duplicate documents as for each triple a 
separate document is created and indexed.


I still don't get how a query that returns 560 results with limit 1000 then doesn't 
return 100 results when using limit 100


Currently, I find your results quite counter-intuitive, but I still have 
to learn a lot when using RDF, SPARQL and Jena.



Can you share some data please to reproduce?

What happens for a single property only? Pagination should work as 
you're doing, the Lucene query is internally executed once, then cached 
- for later requests the same Lucene documents hits should be reused


On 19.10.22 08:21, Mikael Pesonen wrote:


Hi,

yes, when run as the only query, the same select gets exactly the limit amount of triples.

On 18/10/2022 16.48, Lorenz Buehmann wrote:
did you get those results when running only this subquery? Afaik, the 
default limit of the Lucene text query is at most 10 000 documents - 
and I don't think that the outer LIMIT would make it to the Lucene 
request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
"lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 100 
~75 results. How do I page results correctly?




Re: SPARQL limit doesn't work

2022-10-18 Thread Lorenz Buehmann
did you get those results when running only this subquery? Afaik, the 
default limit of the Lucene text query is at most 10 000 documents - and 
I don't think that the outer LIMIT would make it to the Lucene request



On 18.10.22 13:35, Mikael Pesonen wrote:


I have a bigger query that starts with inner select

 { SELECT ?s ?score WHERE {
    (?s ?score) text:query (skos:prefLabel skos:altLabel "\"xx yy\"" 
"lang:en" ) .

    } order by desc(?score) offset 0 limit 1000 }

There are about 1 results. limit 1000 returns ~560 and limit 100 
~75 results. How do I page results correctly?


Re: Re: Weird sparql problem

2022-10-17 Thread Lorenz Buehmann

Hi,

comments inline

On 17.10.22 14:35, Mikael Pesonen wrote:
This works as a separate query, but not in the middle, since ?s gets 
new values instead of binding to previous ?s.


{ select ?t where {
?s a ?t .
 } limit 10}
  ?t skos:prefLabel ?l



In the middle of what? Subqueries will be evaluated first -  if you 
really want labels for classes, you should use a DISTINCT in the 
subquery such that the intermediate result is small, there shouldn't be 
that many classes, but many instances with the same class, thus, the 
join would be more expensive than necessary.
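
e.g. something along the lines of

    SELECT ?t ?l WHERE {
      { SELECT DISTINCT ?t WHERE { ?s a ?t } }
      ?t skos:prefLabel ?l
    }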





On 17/10/2022 14.56, Mikael Pesonen wrote:


?s a ?t .
  ?t skos:prefLabel ?l

returns 3 million triples. Maybe it's related to this?


I don't see how this should be related to  your initial query where ?s 
was bound, which in my opinion should be an easy join. Is it possible 
for you to share the dataset somehow? Also, what you can do is to 
compute statistics for the TDB database with tdbstats tool [1] from 
commandline and put it into the TDB folder. But even without the query 
plan should take the first triple pattern, use the spo index as s and p 
are bound, then pass the bindings of ?o to the evaluation of the second 
triple pattern


[1] 
https://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file






On 21/09/2022 9.15, Lorenz Buehmann wrote:
Weird, only 10M triples and each triple pattern returns only 1 
binding, thus, the size is tiny - honestly I can't think of anything 
except for open connections, but as you mentioned, running the 
queries with only one triple pattern works as expected, so that too 
many open connections shouldn't be an issue most likely.


Can you reproduce this behavior with newer Jena versions like 4.6.1?

Or can you reproduce this on different servers as well?

Is it also stuck if you run the query directly after you restart 
Fuseki?



On 19.09.22 13:49, Mikael Pesonen wrote:



On 15/09/2022 17.48, Lorenz Buehmann wrote:

Forgot:

- size of result for each triple pattern? Might affect if hash 
join can be used.

It's one row for each.


- your hardware?

Normal server with 16gigs mem.


- is it just the first query after starting Fuseki? Connections 
have been closed? Note, there was also a bug in a recent Jena 
version, but only with TDB and too many open connections. It has 
been resolved with release 4.6.1.

Jena has been running quite a while.


Might not be related, but I'm mentioning all things here 
nevertheless.



On 15.09.22 11:16, Mikael Pesonen wrote:


This returns one row fast, say :C1

SELECT *
FROM <https://a.b.c>
WHERE {
  <https://x.y.z> a ?t .
  #?t skos:prefLabel ?l
}


and this too:

SELECT *
FROM <https://a.b.c>
WHERE {
  #<https://x.y.z> a ?t .
  :C1 skos:prefLabel ?l
}


But this always hangs until timeout

SELECT *
FROM <https://a.b.c>
WHERE {
  <https://x.y.z> a ?t .
  ?t skos:prefLabel ?l
}

What am I missing here? I'm using Fuseki web GUI. Thanks!








Re: Re: Fuseki container is OOMKilled

2022-09-29 Thread Lorenz Buehmann
From my understanding, a larger heap space for Fuseki should only be 
necessary when doing reasoning or e.g. loading the geospatial index. A 
TDB database on the other hand is backed by memory mapped files, i.e. 
makes use of off-heap memory and lets the OS do all the work.


Indeed, I cannot explain why assigning more heap makes the Fuseki thread 
consume as much until OOM is reached.


We also have a Fuseki Docker deployment, and we assigned Fuseki way more 
memory (64GB) because of generating a large scale spatial index needs 
all geometry objects in memory once. But it didn't crash because of what 
you describe with the query load (maybe we never had such a constant load).


Indeed, comparison is difficult, different machines, different Docker 
container, different Fuseki version ...



I think Andy will have better explanations and maybe also others like 
Rob or people already using Fuseki@Docker


On 29.09.22 16:45, Martynas Jusevičius wrote:

Still hasn't crashed, so less heap could be the solution in this case.

On Thu, Sep 29, 2022 at 3:12 PM Martynas Jusevičius
 wrote:

I've lowered the heap size to 4GB to leave more off-heap memory (6GB).
It's been an hour and OOMKilled hasn't happened yet unlike before.
MEM% in docker stats peaks around 70%.

On Thu, Sep 29, 2022 at 12:41 PM Martynas Jusevičius
 wrote:

OK the findings are weird so far...

Under constant query load on my local Docker, MEM% of the Fuseki
container reached 100% within 45 minutes and it got OOMKilled.

However, the Used heap "teeth" in VisualVM were below 3GB of the total
~8GB Heap size the whole time.

What does that tell us?


On Thu, Sep 29, 2022 at 11:58 AM Martynas Jusevičius
 wrote:

Hi Eugen,

I have the debugger working, I was trying to connect the profiler :)
Finally I managed to connect from VisualVM on Windows thanks to this
answer: 
https://stackoverflow.com/questions/66222727/how-to-connect-to-jmx-server-running-inside-wsl2/71881475#71881475

I've launched an infinite curl loop to create some query load, but
what now? What should I be looking for in VisualVM?

On Thu, Sep 29, 2022 at 11:33 AM Eugen Stan  wrote:

For debugging, you need to do the following:

* pass JVM options to enable debugging
* expose docker port for JVM debug you chose

https://stackoverflow.com/questions/138511/what-are-java-command-line-options-to-set-to-allow-jvm-to-be-remotely-debugged

You should be able to do all this without changing the image: docker env
variables and docker port option.
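
For example, roughly like this (port number, option string and image name 
are placeholders, not tested against this particular image):

    docker run -p 5005:5005 \
      -e JAVA_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" \
      your-fuseki-image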

Once container is started and port is listening, open (confirm with
docker ps) connect to it to debug.

Good luck,

On 29.09.2022 11:22, Martynas Jusevičius wrote:

On Thu, Sep 29, 2022 at 9:41 AM Lorenz Buehmann
 wrote:

You're working on an in-memory dataset?

No the datasets are TDB2-backed


Does it also happen with Jena 4.6.1?

Don't know :)

I wanted to run a profiler and tried connecting from VisualVM on
Windows to the Fuseki container but neither jstatd nor JMX connections
worked...
Now I want to run VisualVM inside the container itself but this
requires changing the Docker image in a way that I haven't figured out
yet.


On 28.09.22 20:23, Martynas Jusevičius wrote:

Hi,

We have a dockerized Fuseki 4.5.0 instance that is gradually running
out of memory over the course of a few hours.

3 datasets, none larger than 10 triples. The load is negligible
(maybe a few bursts x 10 simple queries per minute), no updates.

Dockerfile: https://github.com/AtomGraph/fuseki-docker/blob/master/Dockerfile
Memory settings:
mem_limit: 10240m
JAVA_OPTIONS=-Xmx7700m -Xms7700m

Any advice?

Martynas

--
Eugen Stan

+40770 941 271  / https://www.netdava.com


Re: Fuseki container is OOMKilled

2022-09-29 Thread Lorenz Buehmann

You're working on an in-memory dataset? Does it also happen with Jena 4.6.1?

On 28.09.22 20:23, Martynas Jusevičius wrote:

Hi,

We have a dockerized Fuseki 4.5.0 instance that is gradually running
out of memory over the course of a few hours.

3 datasets, none larger than 10 triples. The load is negligible
(maybe a few bursts x 10 simple queries per minute), no updates.

Dockerfile: https://github.com/AtomGraph/fuseki-docker/blob/master/Dockerfile
Memory settings:
mem_limit: 10240m
JAVA_OPTIONS=-Xmx7700m -Xms7700m

Any advice?

Martynas


Re: Persist SHACL shapes in dataset possible?

2022-09-21 Thread Lorenz Buehmann
Interesting question. I think currently the Fuseki SHACL service expects 
a Turtle file for the shapes according to [1]. I agree that this seems to 
be rather inefficient.
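
For reference, the current usage per the docs is roughly the following, so 
the ~300 MB shapes file would travel over the wire on every validation call:

    curl -X POST --data-binary @shapes.ttl \
         -H 'Content-Type: text/turtle' \
         'http://localhost:3030/ds/shacl?graph=default'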


@Andy:

- the code expects only Turtle format, right? RDF/XML would fail to 
parse the shapes then, correct?


- the shapes are parsed into a Graph, wouldn't it be possible to reuse a 
named graph of the backend dataset containing the shapes? Or would it be 
too slow to use shaped from e.g. a TDB backend?



Cheers,

Lorenz


[1] 
https://github.com/apache/jena/blob/main/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/SHACL_Validation.java#L66


On 20.09.22 17:22, Sebastian Faubel wrote:

Hello everyone,

I am using Jena Fuseki 4.6.1 with a dataset that I want to validate using
SHACL. I've seen the documentation on the SHACL feature in Apache Jena
Fuseki here:

https://jena.apache.org/documentation/shacl/

My issue is that my SHACL shapes graph has around 300mb. Uploading this
every time I want to validate would be pretty slow and inefficient. I was
wondering if it is possible to persist the shapes graph in the dataset
somehow?

Thank you! :)

~Sebastian

*Semiodesk GmbH | *Werner-von-Siemens-Str. 6 Geb. 15k, 86159 Augsburg,
Germany | Phone: +49 821 8854401 | Fax: +49 821 8854410 | www.semiodesk.com


This e-mail message may contain confidential or legally privileged
information and is intended only for the use of the intended recipient(s).
Any unauthorized disclosure, dissemination, distribution, copying or the
taking of any action in reliance on the information herein is prohibited.
E-mails are not secure and cannot be guaranteed to be error free as they
can be intercepted, amended, or contain viruses. Anyone who communicates
with us by e-mail is deemed to have accepted these risks. Semiodesk GmbH is
not responsible for errors or omissions in this message and denies any
responsibility for any damage arising from the use of e-mail. Any opinion
and other statement contained in this message and any attachment are solely
those of the author and do not necessarily represent those of the company.



Re: Re: Weird sparql problem

2022-09-20 Thread Lorenz Buehmann
Weird, only 10M triples and each triple pattern returns only 1 binding, 
thus, the size is tiny - honestly I can't think of anything except for 
open connections, but as you mentioned, running the queries with only 
one triple pattern works as expected, so that too many open connections 
shouldn't be an issue most likely.


Can you reproduce this behavior with newer Jena versions like 4.6.1?

Or can you reproduce this on different servers as well?

Is it also stuck if you run the query directly after you restart Fuseki?


On 19.09.22 13:49, Mikael Pesonen wrote:



On 15/09/2022 17.48, Lorenz Buehmann wrote:

Forgot:

- size of result for each triple pattern? Might affect if hash join 
can be used.

It's one row for each.


- your hardware?

Normal server with 16gigs mem.


- is it just the first query after starting Fuseki? Connections have 
been closed? Note, there was also a bug in a recent Jena version, but 
only with TDB and too many open connections. It has been resolved 
with release 4.6.1.

Jena has been running quite a while.


Might not be related, but I'm mentioning all things here nevertheless.


On 15.09.22 11:16, Mikael Pesonen wrote:


This returns one row fast, say :C1

SELECT *
FROM <https://a.b.c>
WHERE {
  <https://x.y.z> a ?t .
  #?t skos:prefLabel ?l
}


and this too:

SELECT *
FROM <https://a.b.c>
WHERE {
  #<https://x.y.z> a ?t .
  :C1 skos:prefLabel ?l
}


But this always hangs until timeout

SELECT *
FROM <https://a.b.c>
WHERE {
  <https://x.y.z> a ?t .
  ?t skos:prefLabel ?l
}

What am I missing here? I'm using Fuseki web GUI. Thanks!




Re: Weird sparql problem

2022-09-15 Thread Lorenz Buehmann

Forgot:

- size of result for each triple pattern? Might affect if hash join can 
be used.


- your hardware?

- is it just the first query after starting Fuseki? Connections have 
been closed? Note, there was also a bug in a recent Jena version, but 
only with TDB and too many open connections. It has been resolved with 
release 4.6.1.


Might not be related, but I'm mentioning all things here nevertheless.


On 15.09.22 11:16, Mikael Pesonen wrote:


This returns one row fast, say :C1

SELECT *
FROM 
WHERE {
   a ?t .
  #?t skos:prefLabel ?l
}


and this too:

SELECT *
FROM 
WHERE {
  # a ?t .
  :C1 skos:prefLabel ?l
}


But this always hangs until timeout

SELECT *
FROM 
WHERE {
   a ?t .
  ?t skos:prefLabel ?l
}

What am I missing here? I'm using Fuseki web GUI. Thanks!


Re: Weird sparql problem

2022-09-15 Thread Lorenz Buehmann

Fuseki with in-memory backend or TDB?

Which version?

How large is the dataset? Not that I see how this simple query with a 
single join should lead to a timeout, but any numbers are usually helpful.


Did you try the query without defining the default graph but using a 
graph pattern, i.e.


SELECT *
WHERE {
  GRAPH  { a ?t .
  ?t skos:prefLabel ?l }
}


And/or did you try to reorder the triple patterns? The query optimizer 
should prefer the first one though anyways as it can make use of spo 
index (if it would be TDB)


On 15.09.22 11:16, Mikael Pesonen wrote:


This returns one row fast, say :C1

SELECT *
FROM 
WHERE {
   a ?t .
  #?t skos:prefLabel ?l
}


and this too:

SELECT *
FROM 
WHERE {
  # a ?t .
  :C1 skos:prefLabel ?l
}


But this always hangs until timeout

SELECT *
FROM 
WHERE {
   a ?t .
  ?t skos:prefLabel ?l
}

What am I missing here? I'm using Fuseki web GUI. Thanks!


Re: Re: Re: Re: Re: TDB2 bulk loader - multiple files into different graph per file

2022-08-29 Thread Lorenz Buehmann
I spotted an interesting difference in performance gap/gain when using a 
smaller dataset for Europe:


On the server we have

- the ZFS raid with less powerful hard-disks, i.e. only SATA with 4 x 
Samsung 870 QVO


- an 2TB NVMe mounted separately


On the ZFS raid:

    with Jena 4.6.0:

        Triples = 54,821,333
        3,047.89 sec : 54,821,333 Triples : 17,986.64 per second : 0 
errors : 10 warnings


    with Jena 4.7.0 patched with the BufferedInputStream wrapper:

        Triples = 54,821,333
        308.05 sec : 54,821,333 Triples : 177,963.61 per second : 0 
errors : 10 warnings



On the NVMe

    with Jena 4.6.0:

        Triples = 54,821,333
        824.11 sec : 54,821,333 Triples : 66,521.62 per second : 0 
errors : 10 warnings


    with Jena 4.7.0 patched with the BufferedInputStream wrapper:

        Triples = 54,821,333
        303.07 sec : 54,821,333 Triples : 180,888.49 per second : 0 
errors : 10 warnings



Observation:

- the difference on the ZFS raid is factor 10

- on the NVMe disk it is "only" 3x faster with the buffered stream


Looks like the Bzip2 implementation of Apache Commons Compress is doing 
lots of IO, which is why it suffers much more from the missing buffered 
stream on the ZFS raid than on the faster NVMe disk.


Nevertheless, it's always worth using the buffered stream


On 29.08.22 15:53, Simon Bin wrote:

I was asked to try it on my system (samsung 970 evo+ nvme, intel
11850h), but I used a slightly smaller data set (river_europe); it is
not quite as bad as on Lorenz' but the buffering would help
nevertheless:

main  : river_europe-latest.osm.pbf.ttl.bz2   : 815.14 sec : 72,098,221 
Triples :  88,449.21 per second : 0 errors : 10 warnings
fix/bzip2 : river_europe-latest.osm.pbf.ttl.bz2   : 376.64 sec : 72,098,221 
Triples : 191,424.76 per second : 0 errors : 10 warnings
pbzip2 -dc  river_europe-latest.osm.pbf.ttl.bz2 | : 155.24 sec : 72,098,221 
Triples : 464,442.66 per second : 0 errors : 10 warnings
 river_europe-latest.osm.pbf.ttl   : 136.92 sec : 72,098,221 
Triples : 526,587.26 per second : 0 errors : 10 warnings

Cheers,

On Mon, 2022-08-29 at 13:09 +0200, Lorenz Buehmann wrote:

In addition I used the OS tool in a pipe:

bunzip2 -c river_planet-latest.osm.pbf.ttl.bz2 | riot --time --count
--syntax "Turtle"

Triples = 163,310,838
stdin   : 717.78 sec : 163,310,838 Triples : 227,523.09 per
second : 0 errors : 31 warnings


unsurprisingly more or less exactly the time of decompression + the
parsing time of the uncompressed file - still way faster than the
Apache
Commons one, even with my suggested fix the OS variant is ~5min
faster


On 29.08.22 11:24, Lorenz Buehmann wrote:

riot --time --count river_planet-latest.osm.pbf.ttl

Triples = 163,310,838
351.00 sec : 163,310,838 Triples : 465,271.72 per second : 0 errors
:
31 warnings


riot --time --count river_planet-latest.osm.pbf.ttl.gz

Triples = 163,310,838
431.74 sec : 163,310,838 Triples : 378,258.50 per second : 0 errors
:
31 warnings


riot --time --count river_planet-latest.osm.pbf.ttl.bz2

Triples = 163,310,838
9,948.17 sec : 163,310,838 Triples : 16,416.17 per second : 0
errors :
31 warnings


Takes ages with Bzip2 ... there must be something going wrong ...


We checked code and the Apache Commons Compress docs, a colleague
spotted the hint at
https://commons.apache.org/proper/commons-compress/examples.html#Buffering :



The stream classes all wrap around streams provided by the
calling
code and they work on them directly without any additional
buffering.
On the other hand most of them will benefit from buffering so it
is
highly recommended that users wrap their stream in
Buffered(In|Out)putStreams before using the Commons Compress API.

we were curious about this statement, checked
org.apache.jena.atlas.io.IO class and added one line in openFileEx

in = new BufferedInputStream(in);

which wraps the file stream before it's passed to the decompressor
streams


Run again the parsing:


riot --time --count river_planet-latest.osm.pbf.ttl.bz2 (Jena
4.7.0-SNAPSHOT fork with a BufferedInputStream wrapping the file
stream in IO class)

Triples = 163,310,838
1,004.68 sec : 163,310,838 Triples : 162,550.10 per second : 0
errors
: 31 warnings


What do you think?


On 28.08.22 14:22, Andy Seaborne wrote:

If you are relying on Jena to do the bz2 decompress, then it is
using Commons Compress.

gz is done (via Commons Compress) in native code. I use gz and
if I
get a bz2 file, I decompress it with OS tools.

Could you try an experiment please?

Run on the same hardware as the loader was run:

riot --time --count river_planet-latest.osm.pbf.ttl
riot --time --count river_planet-latest.osm.pbf.ttl.bz2

     Andy

gz vs plain: NVMe m2 SSD : Dell XPS 13 9310

riot --time --count .../BSBM/bsbm-25m.nt.gz
Triples = 24,997,044
118.02 sec : 24,997,044 Triples : 211,808.84 per second

riot --time --count .../BSBM/bsbm-25m.nt
Triples = 24,997,044
109.97 sec : 24,997,044 Triples : 227,314.05 per second

Re: Re: Re: TDB2 bulk loader - multiple files into different graph per file

2022-08-29 Thread Lorenz Buehmann

In addition I used the OS tool in a pipe:

bunzip2 -c river_planet-latest.osm.pbf.ttl.bz2 | riot --time --count 
--syntax "Turtle"


Triples = 163,310,838
stdin   : 717.78 sec : 163,310,838 Triples : 227,523.09 per 
second : 0 errors : 31 warnings



unsurprisingly more or less exactly the time of decompression + the 
parsing time of the uncompressed file - still way faster than the Apache 
Commons one, even with my suggested fix the OS variant is ~5min faster



On 29.08.22 11:24, Lorenz Buehmann wrote:

riot --time --count river_planet-latest.osm.pbf.ttl

Triples = 163,310,838
351.00 sec : 163,310,838 Triples : 465,271.72 per second : 0 errors : 
31 warnings



riot --time --count river_planet-latest.osm.pbf.ttl.gz

Triples = 163,310,838
431.74 sec : 163,310,838 Triples : 378,258.50 per second : 0 errors : 
31 warnings



riot --time --count river_planet-latest.osm.pbf.ttl.bz2

Triples = 163,310,838
9,948.17 sec : 163,310,838 Triples : 16,416.17 per second : 0 errors : 
31 warnings



Takes ages with Bzip2 ... there must be something going wrong ...


We checked code and the Apache Commons Compress docs, a colleague 
spotted the hint at 
https://commons.apache.org/proper/commons-compress/examples.html#Buffering 
:


The stream classes all wrap around streams provided by the calling 
code and they work on them directly without any additional buffering. 
On the other hand most of them will benefit from buffering so it is 
highly recommended that users wrap their stream in 
Buffered(In|Out)putStreams before using the Commons Compress API.
we were curious about this statement, checked 
org.apache.jena.atlas.io.IO class and added one line in openFileEx


in = new BufferedInputStream(in);

which wraps the file stream before it's passed to the decompressor streams


Run again the parsing:


riot --time --count river_planet-latest.osm.pbf.ttl.bz2 (Jena 
4.7.0-SNAPSHOT fork with a BufferedInputStream wrapping the file 
stream in IO class)


Triples = 163,310,838
1,004.68 sec : 163,310,838 Triples : 162,550.10 per second : 0 errors 
: 31 warnings



What do you think?


On 28.08.22 14:22, Andy Seaborne wrote:




If you are relying on Jena to do the bz2 decompress, then it is 
using Commons Compress.


gz is done (via Commons Compress) in native code. I use gz and if I 
get a bz2 file, I decompress it with OS tools.


Could you try an experiment please?

Run on the same hardware as the loader was run:

riot --time --count river_planet-latest.osm.pbf.ttl
riot --time --count river_planet-latest.osm.pbf.ttl.bz2

    Andy

gz vs plain: NVMe m2 SSD : Dell XPS 13 9310

riot --time --count .../BSBM/bsbm-25m.nt.gz
Triples = 24,997,044
118.02 sec : 24,997,044 Triples : 211,808.84 per second

riot --time --count .../BSBM/bsbm-25m.nt
Triples = 24,997,044
109.97 sec : 24,997,044 Triples : 227,314.05 per second


Re: Re: TDB2 bulk loader - multiple files into different graph per file

2022-08-29 Thread Lorenz Buehmann

riot --time --count river_planet-latest.osm.pbf.ttl

Triples = 163,310,838
351.00 sec : 163,310,838 Triples : 465,271.72 per second : 0 errors : 31 
warnings



riot --time --count river_planet-latest.osm.pbf.ttl.gz

Triples = 163,310,838
431.74 sec : 163,310,838 Triples : 378,258.50 per second : 0 errors : 31 
warnings



riot --time --count river_planet-latest.osm.pbf.ttl.bz2

Triples = 163,310,838
9,948.17 sec : 163,310,838 Triples : 16,416.17 per second : 0 errors : 
31 warnings



Takes ages with Bzip2 ... there must be something going wrong ...


We checked code and the Apache Commons Compress docs, a colleague 
spotted the hint at 
https://commons.apache.org/proper/commons-compress/examples.html#Buffering :


The stream classes all wrap around streams provided by the calling 
code and they work on them directly without any additional buffering. 
On the other hand most of them will benefit from buffering so it is 
highly recommended that users wrap their stream in 
Buffered(In|Out)putStreams before using the Commons Compress API.
we were curious about this statement, checked 
org.apache.jena.atlas.io.IO class and added one line in openFileEx


in = new BufferedInputStream(in);

which wraps the file stream before it's passed to the decompressor streams
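
For illustration, roughly the pattern the Commons Compress docs recommend 
(a standalone sketch, not the actual Jena code):

    import java.io.*;
    import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

    public class BufferedBzip2Read {
        public static void main(String[] args) throws IOException {
            InputStream in = new FileInputStream(args[0]);   // e.g. some .ttl.bz2 file
            in = new BufferedInputStream(in);                // buffer the raw file reads
            try (InputStream bz = new BZip2CompressorInputStream(in)) {
                long n = bz.transferTo(OutputStream.nullOutputStream());
                System.out.println("decompressed bytes: " + n);
            }
        }
    }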


Run again the parsing:


riot --time --count river_planet-latest.osm.pbf.ttl.bz2 (Jena 
4.7.0-SNAPSHOT fork with a BufferedInputStream wrapping the file stream 
in IO class)


Triples = 163,310,838
1,004.68 sec : 163,310,838 Triples : 162,550.10 per second : 0 errors : 
31 warnings



What do you think?


On 28.08.22 14:22, Andy Seaborne wrote:




If you are relying on Jena to do the bz2 decompress, then it is using 
Commons Compress.


gz is done (via Commons Compress) in native code. I use gz and if I 
get a bz2 file, I decompress it with OS tools.


Could you try an experiment please?

Run on the same hardware as the loader was run:

riot --time --count river_planet-latest.osm.pbf.ttl
riot --time --count river_planet-latest.osm.pbf.ttl.bz2

    Andy

gz vs plain: NVMe m2 SSD : Dell XPS 13 9310

riot --time --count .../BSBM/bsbm-25m.nt.gz
Triples = 24,997,044
118.02 sec : 24,997,044 Triples : 211,808.84 per second

riot --time --count .../BSBM/bsbm-25m.nt
Triples = 24,997,044
109.97 sec : 24,997,044 Triples : 227,314.05 per second


Re: Re: TDB2 bulk loader - multiple files into different graph per file

2022-08-28 Thread Lorenz Buehmann

Yep, I already recognized that I forgot to mention hardware and details:


- file size compressed: 5,9G

- file size uncompressed: 23G


- Server:

    - AMD EPYC 7443P 24-Core Processor
    - 256GB RAM
    - 4 x 8TB SSD  Samsung_SSD_870 as a ZFS raid, i.e. ~30TB


- Jena version (latest release .4.6.0):

TDB2:   VERSION: 4.6.0
TDB2:   BUILD_DATE: 2022-08-20T08:22:47Z

- TDB2 loader is the default one, i.e. it should be 'phased'?

- I rerun the loader phased vs parallel on compress vs uncompressed:

https://gist.github.com/LorenzBuehmann/27f232a1fd2c2a95600115b18958458b


-> compressed one degrades immediately to an avg of 16,000/s vs 
140,000/s on the uncompressed data - looks horrible



And I yes, I also tend to decompress via OS tool before loading




On 28.08.22 13:55, Andy Seaborne wrote:



On 28/08/2022 09:58, Lorenz Buehmann wrote:

Hi Andy,

thanks for fast response.

I see - the only drawback with wrapping the streams into TriG is when 
we have Turtle syntax files (or lets say any non N-Triples format) - 
afaik, prefixes aren't allowed inside graphs, i.e. at that point 
you're lost. 

What I did now is to pipe those files into riot first, which generates 
N-Triples that can then be wrapped in TriG graphs. Indeed, we have the 
riot overhead here, i.e. the data is parsed twice. Still faster, though, 
than loading graphs in separate TDB loader calls, so I guess I can live 
with this.



Exercise in text processing :-)

Spit out the prefixes into a separate TTL file (grep!) and load that 
file as well.




Having a follow up question:

I could see a huge difference between read compressed (Bzip) vs 
uncompressed file:


I put the output until the triples have been loaded here as the index 
creating should be affected by the compression:



# uncompressed with tdb2.tdbloader


Which loader?
And what hardware?

(--loader=parallel may not make much of a difference at 100m)


14:24:40 INFO  loader  :: Add: 163,000,000 
river_planet-latest.osm.pbf.ttl (Batch: 144,320 / Avg: 140,230)
14:24:42 INFO  loader  :: Finished: 
output/river_planet-latest.osm.pbf.ttl: 163,310,838 tuples in 
1165.30s (Avg: 140,145)



# compressed with tdb2.tdbloader

17:37:37 INFO  loader  :: Add: 163,000,000 
river_planet-latest.osm.pbf.ttl.bz2 (Batch: 19,424 / Avg: 16,050)
17:37:40 INFO  loader  :: Finished: 
output/river_planet-latest.osm.pbf.ttl.bz2: 163,310,838 tuples in 
10158.16s (Avg: 16,076)


That is bad!
Was it consistently slow through the load?

If you are relying on Jena to do the bz2 decompress, then it is using 
Commons Compress.


gz is done (via Commons Compress) in native code. I use gz and if I 
get a bz2 file, I decompress it with OS tools.


So loading the compressed file is ~9x slower than the uncompressed one. 
Can we consider this as expected? Note, here we have a geospatial 
dataset with millions of geometry literals. Not sure if this is also 
something that makes things worse.


What are your experiences with loading compressed vs uncompressed data?


bz2 is expensive - it focuses on max compression. Coupled with 
being java (not so much the java, as being not highly tuned 
decompression code) it could be a factor.


Usually (gz) there is a slight slow down if using SSD as source. HDD 
can be either way.


    Andy




Cheers,

Lorenz


On 26.08.22 17:02, Andy Seaborne wrote:

Hi Lorenz,

No - there isn't an option.

The way to do it is to prepare the load as quads by, for example, 
wrapping in TriG syntax around the files or adding the G to N-triples.


This can be done streaming and piped into the loader (with --syntax= 
if not N-quads).


> By the way, the tdb2.xloader has no option for named graphs at all?

The input needs to be prepared as quads.

    Andy

On 26/08/2022 15:03, Lorenz Buehmann wrote:

Hi all,

is there any option to use TDB2 bulk loader (tdb2.xloader or just 
tdb2.loader) to load multiple files into multiple different named 
graphs? Like


tdb2.loader --loc ./tdb2/dataset --graph  file1 --graph  
file2 ...


I'm asking because I thought the initial loading is way faster than 
iterating over multiple (graph, file) pairs and running the TDB2 
loader for each pair?



By the way, the tdb2.xloader has no option for named graphs at all?


Cheers,

Lorenz



Re: Re: TDB2 bulk loader - multiple files into different graph per file

2022-08-28 Thread Lorenz Buehmann

Hi Andy,

thanks for fast response.

I see - the only drawback with wrapping the streams into TriG is when we 
have Turtle syntax files (or let's say any non-N-Triples format) - afaik, 
prefixes aren't allowed inside graphs, i.e. at that point you're lost. 
What I did now is to pipe those files into riot first, which generates 
N-Triples that can then be wrapped in TriG graphs. Indeed, we have the 
riot overhead here, i.e. the data is parsed twice. Still faster, though, 
than loading graphs in separate TDB loader calls, so I guess I can live 
with this.
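
For the record, the wrapping itself is a one-liner per file (a sketch, 
assuming N-Triples input and an example graph IRI):

    { echo '<http://example.org/graph1> {'; cat file1.nt; echo '}'; } > file1.trig
    tdb2.tdbloader --loc ./tdb2/dataset file1.trig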


Having a follow up question:

I could see a huge difference between read compressed (Bzip) vs 
uncompressed file:


I put the output until the triples have been loaded here as the index 
creating should be affected by the compression:



# uncompressed with tdb2.tdbloader

14:24:40 INFO  loader  :: Add: 163,000,000 
river_planet-latest.osm.pbf.ttl (Batch: 144,320 / Avg: 140,230)
14:24:42 INFO  loader  :: Finished: 
output/river_planet-latest.osm.pbf.ttl: 163,310,838 tuples in 1165.30s 
(Avg: 140,145)



# compressed with tdb2.tdbloader

17:37:37 INFO  loader  :: Add: 163,000,000 
river_planet-latest.osm.pbf.ttl.bz2 (Batch: 19,424 / Avg: 16,050)
17:37:40 INFO  loader  :: Finished: 
output/river_planet-latest.osm.pbf.ttl.bz2: 163,310,838 tuples in 
10158.16s (Avg: 16,076)



So loading the compressed file is ~9x slower than the uncompressed one. 
Can we consider this as expected? Note, here we have a geospatial 
dataset with millions of geometry literals. Not sure if this is also 
something that makes things worse.


What are your experiences with loading compressed vs uncompressed data?


Cheers,

Lorenz


On 26.08.22 17:02, Andy Seaborne wrote:

Hi Lorenz,

No - there isn't an option.

The way to do it is to prepare the load as quads by, for example, 
wrapping in TriG syntax around the files or adding the G to N-triples.


This can be done streaming and piped into the loader (with --syntax= 
if not N-quads).


> By the way, the tdb2.xloader has no option for named graphs at all?

The input needs to be prepared as quads.

    Andy

On 26/08/2022 15:03, Lorenz Buehmann wrote:

Hi all,

is there any option to use TDB2 bulk loader (tdb2.xloader or just 
tdb2.loader) to load multiple files into multiple different named 
graphs? Like


tdb2.loader --loc ./tdb2/dataset --graph  file1 --graph  
file2 ...


I'm asking because I thought the initial loading is way faster than 
iterating over multiple (graph, file) pairs and running the TDB2 
loader for each pair?



By the way, the tdb2.xloader has no option for named graphs at all?


Cheers,

Lorenz



TDB2 bulk loader - multiple files into different graph per file

2022-08-26 Thread Lorenz Buehmann

Hi all,

is there any option to use TDB2 bulk loader (tdb2.xloader or just 
tdb2.loader) to load multiple files into multiple different named 
graphs? Like


tdb2.loader --loc ./tdb2/dataset --graph  file1 --graph  file2 ...

I'm asking because I thought the initial loading is way faster than 
iterating over multiple (graph, file) pairs and running the TDB2 loader 
for each pair?



By the way, the tdb2.xloader has no option for named graphs at all?


Cheers,

Lorenz



Re: Re: How to fix wrong semantic rule?

2022-07-26 Thread Lorenz Buehmann
Apparently I'm a bit lost as I still do not understand how the current 
inferences are generated.


You did not mention how you run Jena at all and how you setup 
inferencing. I also mentioned that there is no such reasoner class in 
the official Jena source code and we're not aware of your custom code.


All I got so far is that

?x rdfs:subClassOf ?y => ?x rdf:type ?y


It's not uncommon to have this kind of inference e.g. in biological domain

If you don't want to get that, the question would be more why you get 
that. This is **not** a standard inference in RDFS.



Nevertheless I cannot help as I do not understand neither

i) the current setup in Jena

ii) what inferences are currently returned

iii) what inferences are intended to be returned


Please answer all those points as precise as possible.

On 26.07.22 10:30, Dương Hồ wrote:

Oh Sorry !
That isn't what i'm wondering.
that is true inferences. But i don't want get that.

Vào Th 3, 26 thg 7, 2022 vào lúc 14:45 Andy Seaborne  đã
viết:



On 26/07/2022 08:36, Dương Hồ wrote:

If X is a subclass of Y,
then an instance of X is of type Y.

yes.
In Class X I have instance :X

If you say something is a class and also it is an instance (it is in the
class), you are going to get some weird inferences. You have a set that
is a member of itself.

As already asked: Show us the data and rules (a small, complete example).

  Andy


so this look like

Vào Th 3, 26 thg 7, 2022 vào lúc 14:33 Andy Seaborne 

đã

viết:



On 25/07/2022 16:17, Lorenz Buehmann wrote:

Good Afternoon.


There is no such RDFSExptRuleReasoner reasoner in standard Jena, or I
just cannot find the code in https://github.com/apache/jena

So I don't know what you're referring to. Can you explain this please?



hi all.
I'm using the reasoner RDFSExptRuleReasoner to enforce the rule:
With two classes X, Y if X is a subclass of Y, then X also has type Y.

Ok, so where is the rule Jena rule syntax?

If X is a subclass of Y,
then an instance of X is of type Y.

not, X is of type Y.

:a rdf:type :X .
=>
:a rdf:type :Y .


Let's say I have 3 classes X,Y,Z:
X is a subclass of Y
Z is a subclass of Y

And I execute the query :
If A has type Y
And B has type Y
then A and B are the same.

Now you are talking about queries. How do you execute the "query"?

Also, in which domain does this hold? If John is a person and Mary is a
person, saying both are the same individual wouldn't make sense, so I'm
interested in your data.


=> Then I get X same as Y

I don't get that conclusion via A and B, also what do you mean by

"get"?

But semantically, X is completely different from Y.
How can I handle this case?


We should start with sample data and the sample rules I guess


Re: How to fix wrong semantic rule?

2022-07-25 Thread Lorenz Buehmann

Good Afternoon.


There is no such RDFSExptRuleReasoner reasoner in standard Jena, or I 
just cannot find the code in https://github.com/apache/jena


So I don't know what you're referring to. Can you explain this please?



hi all.
I'm using the reasoner RDFSExptRuleReasoner to enforce the rule:
With two classes X, Y if X is a subclass of Y, then X also has type Y.

Ok, so where is the rule Jena rule syntax?

Let's say I have 3 classes X,Y,Z:
X is a subclass of Y
Z is a subclass of Y

And I execute the query :
If A has type Y
And B has type Y
then A and B are the same.


Now you are talking about queries. How do you execute the "query"?

Also, in which domain does this hold? If John is a person and Mary is a 
person, saying both are the same individual wouldn't make sense, so I'm 
interested in your data.




=> Then I get X same as Y

I don't get that conclusion via A and B, also what do you mean by "get"?

But semantically, X is completely different from Y.
How can I handle this case?


We should start with sample data and the sample rules I guess


Re: Re: [MASSMAIL]Re: Large *.dat files in Fuseki

2022-07-07 Thread Lorenz Buehmann
I think we should wait for Andy here for further input, as he's the 
person who basically designed and implemented all the fancy stuff and 
can surely give better advice.


@Andy Did you read the whole discussion, and can you verify that it's 
expected behavior that lots of daily updates lead to such a big growth 
of the node table files?


On 07.07.22 10:53, Bartalus Gáspár wrote:

Hi Lorenz,

Would you recommend using tdb1 instead of tdb2 for our use case? What would be 
the differences?
We are using fuseki 4.5.0 btw.

Gaspar


On 6 Jul 2022, at 14:39, Bartalus Gáspár 
 wrote:

Hi,

Most of the updates are DELETE/INSERT queries, i.e

DELETE {?s ?p ?oldValue}
INSERT {?s ?p ?newValue}
WHERE {
  OPTIONAL {?s ?p ?oldValue}
  #derive ?newValue from somewhere
}

We also have some separate DELETE queries and INSERT queries.

I’ve tried HTTP POST /$/compact/db_name and as a result the files are getting 
back to normal size. However, as far as I can tell the old files are also kept. 
This is the folder structure I see:
- databases/db_name/Data-0001 - with the old large files
- databases/db_name/Data-0002 - presumably the result of the compact operation 
with normal file sizes.

Is there also some operation (http or cli) that would keep only one (the 
latest) data folder, i.e. delete the old files from Data-0001?

Gaspar


On 6 Jul 2022, at 12:52, Lorenz Buehmann  
wrote:

Ok, interesting

so

we have

- 150k triples, rather small dataset

- loaded into 10MB node table files

- 10 updates every 5s

- which makes up to 24 * 60 * 60 / 5 * 10 ~ 200k updates per day

- and leads to 10GB node table files


Can you share the shape of those update queries?


After doing a "compact" operation, the files are getting back to "normal" size?


On 06.07.22 10:36, Bartalus Gáspár wrote:

Hi Lorenz,

Thanks for quick feedback and clarification on lucene indexes.

Here are my answers to your questions:
- We are uploading 7 ttl files to our dataset, where 1 is larger 6Mb, the 
others are below 200Kb.
- The overall number of triples after data upload is ~150k.
- We have around 10 SPARQL UPDATE queries that are executed on a regular and 
frequent basis, i.e. every 5 seconds. We also have 5 such queries that are 
executed each minute. But most of the time they do not produce any outcome, 
i.e. the dataset is not altered, and when they do, there are just a couple of 
triples that are added to the dataset.
- These *.dat files start from ~10Mb in size, and after a day or so some of 
them grow to ~10Gb.

We have ~300 blank nodes, and ~half of the triples have a literal in the object 
position, so ~75000.

Best regards,
Gaspar




On 6 Jul 2022, at 10:55, Lorenz Buehmann  
wrote:

Hi and welcome Gaspar.


Those files do contain the node tables.

A Lucene index is never computed by default and would be contained in Lucene 
specific index files.


Can you give some details about the

- size of the files
- the number of triples
- the number triples added/removed/changed
- the frequency of updates
- how much the files grow
- what kind of data you insert? Lots of blank nodes? Or literals?

Also, did you try a compact operation during time?

Lorenz

On 06.07.22 09:40, Bartalus Gáspár wrote:

Hi Jena support team,

We are experiencing an issue with Jena Fuseki databases. In the databases 
folder we see some files called SPO.dat, OSP.dat, etc., and the size of these 
files are growing quickly. From our understanding these files are containing 
the Lucene indexes. We would have two questions:

1. Why are these files growing rapidly, although the underlying data (triples) 
are not being changed, or only slightly changed?
2. Can we disable indexing easily, since we are not using full text searches in 
our SPARQL queries?

Our usage of Jena Fuseki:

* Start the server with `fuseki-server —port 3030`
* Create databases with HTTP POST to 
`/$/datasets?state=active&dbType=tdb2&dbName=db_name`
* Upload ttl files with HTTP POST to /db_name/data

Thanks in advance for your feedback, and if you’d require more input from our 
side, please let me know.

Best regards,
Gaspar Bartalus



Re: Re: [MASSMAIL]Re: Large *.dat files in Fuseki

2022-07-06 Thread Lorenz Buehmann

Hi,


you should open another thread where we can discuss your question; 
please don't mix up threads - it gets confusing.


Also, did you check SPARQL 1.1 Update W3C documents? They are online and 
have lots of examples

On 06.07.22 13:50, Dương Hồ wrote:

DELETE {?s ?p ?oldValue}
INSERT {?s ?p ?newValue}
WHERE {
   OPTIONAL {?s ?p ?oldValue}
   #derive ?newValue from somewhere
}
If i want update 3 triples how to use this formats?
Can you help me?


Vào 18:39, Th 4, 6 thg 7, 2022 Bartalus Gáspár
 đã viết:


Hi,

Most of the updates are DELETE/INSERT queries, i.e

DELETE {?s ?p ?oldValue}
INSERT {?s ?p ?newValue}
WHERE {
   OPTIONAL {?s ?p ?oldValue}
   #derive ?newValue from somewhere
}

We also have some separate DELETE queries and INSERT queries.

I’ve tried HTTP POST /$/compact/db_name and as a result the files are
getting back to normal size. However, as far as I can tell the old files
are also kept. This is the folder structure I see:
- databases/db_name/Data-0001 - with the old large files
- databases/db_name/Data-0002 - presumably the result of the compact
operation with normal file sizes.

Is there also some operation (http or cli) that would keep only one (the
latest) data folder, i.e. delete the old files from Data-0001?

Gaspar


On 6 Jul 2022, at 12:52, Lorenz Buehmann <

buehm...@informatik.uni-leipzig.de> wrote:

Ok, interesting

so

we have

- 150k triples, rather small dataset

- loaded into 10MB node table files

- 10 updates every 5s

- which makes up to 24 * 60 * 60 / 5 * 10 ~ 200k updates per day

- and leads to 10GB node table files


Can you share the shape of those update queries?


After doing a "compact" operation, the files are getting back to

"normal" size?


On 06.07.22 10:36, Bartalus Gáspár wrote:

Hi Lorenz,

Thanks for quick feedback and clarification on lucene indexes.

Here are my answers to your questions:
- We are uploading 7 ttl files to our dataset, where 1 is larger 6Mb,

the others are below 200Kb.

- The overall number of triples after data upload is ~150k.
- We have around 10 SPARQL UPDATE queries that are executed on a

regular and frequent basis, i.e. every 5 seconds. We also have 5 such
queries that are executed each minute. But most of the time they do not
produce any outcome, i.e. the dataset is not altered, and when they do,
there are just a couple of triples that are added to the dataset.

- These *.dat files start from ~10Mb in size, and after a day or so

some of them grow to ~10Gb.

We have ~300 blank nodes, and ~half of the triples have a literal in

the object position, so ~75000.

Best regards,
Gaspar




On 6 Jul 2022, at 10:55, Lorenz Buehmann <

buehm...@informatik.uni-leipzig.de> wrote:

Hi and welcome Gaspar.


Those files do contain the node tables.

A Lucene index is never computed by default and would be contained in

Lucene specific index files.


Can you give some details about the

- size of the files
- the number of triples
- the number triples added/removed/changed
- the frequency of updates
- how much the files grow
- what kind of data you insert? Lots of blank nodes? Or literals?

Also, did you try a compact operation during time?

Lorenz

On 06.07.22 09:40, Bartalus Gáspár wrote:

Hi Jena support team,

We are experiencing an issue with Jena Fuseki databases. In the

databases folder we see some files called SPO.dat, OSP.dat, etc., and the
size of these files are growing quickly. From our understanding these files
are containing the Lucene indexes. We would have two questions:

1. Why are these files growing rapidly, although the underlying data

(triples) are not being changed, or only slightly changed?

2. Can we disable indexing easily, since we are not using full text

searches in our SPARQL queries?

Our usage of Jena Fuseki:

* Start the server with `fuseki-server —port 3030`
* Create databases with HTTP POST to

`/$/datasets?state=active&dbType=tdb2&dbName=db_name`

* Upload ttl files with HTTP POST to /db_name/data

Thanks in advance for your feedback, and if you’d require more input

from our side, please let me know.

Best regards,
Gaspar Bartalus





Re: Re: Re: [MASSMAIL]Re: Large *.dat files in Fuseki

2022-07-06 Thread Lorenz Buehmann
You can trigger compaction from the CLI via tdb2.tdbcompact (needs Fuseki 
to be down, I think) or, with Fuseki running, as a POST request:


https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html#datasets-and-services
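
For completeness, compaction can also be triggered programmatically against 
the database directory while Fuseki is not holding it open - a hedged sketch, 
the path is a placeholder and the boolean "delete old" argument only exists in 
more recent Jena versions:

import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb2.DatabaseMgr;

public class CompactDb {
    public static void main(String[] args) {
        // Open the TDB2 database directory (placeholder path) and compact it;
        // 'true' asks Jena to delete the superseded Data-000X generation afterwards.
        DatasetGraph dsg = DatabaseMgr.connectDatasetGraph("run/databases/db_name");
        DatabaseMgr.compact(dsg, true);
    }
}

If I remember correctly, newer Fuseki versions also accept a deleteOld=true 
query parameter on the /$/compact/db_name endpoint, which removes the old 
Data-000X folder for you.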

On 06.07.22 11:52, Lorenz Buehmann wrote:

Ok, interesting

so

we have

- 150k triples, rather small dataset

- loaded into 10MB node table files

- 10 updates every 5s

- which makes up to 24 * 60 * 60 / 5 * 10 ~ 200k updates per day

- and leads to 10GB node table files


Can you share the shape of those update queries?


After doing a "compact" operation, the files are getting back to 
"normal" size?



On 06.07.22 10:36, Bartalus Gáspár wrote:

Hi Lorenz,

Thanks for quick feedback and clarification on lucene indexes.

Here are my answers to your questions:
- We are uploading 7 ttl files to our dataset, where 1 is larger 6Mb, 
the others are below 200Kb.

- The overall number of triples after data upload is ~150k.
- We have around 10 SPARQL UPDATE queries that are executed on a 
regular and frequent basis, i.e. every 5 seconds. We also have 5 such 
queries that are executed each minute. But most of the time they do 
not produce any outcome, i.e. the dataset is not altered, and when 
they do, there are just a couple of triples that are added to the 
dataset.
- These *.dat files start from ~10Mb in size, and after a day or so 
some of them grow to ~10Gb.


We have ~300 blank nodes, and ~half of the triples have a literal in 
the object position, so ~75000.


Best regards,
Gaspar



On 6 Jul 2022, at 10:55, Lorenz Buehmann 
 wrote:


Hi and welcome Gaspar.


Those files do contain the node tables.

A Lucene index is never computed by default and would be contained 
in Lucene specific index files.



Can you give some details about the

- size of the files
- the number of triples
- the number triples added/removed/changed
- the frequency of updates
- how much the files grow
- what kind of data you insert? Lots of blank nodes? Or literals?

Also, did you try a compact operation during time?

Lorenz

On 06.07.22 09:40, Bartalus Gáspár wrote:

Hi Jena support team,

We are experiencing an issue with Jena Fuseki databases. In the 
databases folder we see some files called SPO.dat, OSP.dat, etc., 
and the size of these files are growing quickly. From our 
understanding these files are containing the Lucene indexes. We 
would have two questions:


1. Why are these files growing rapidly, although the underlying 
data (triples) are not being changed, or only slightly changed?
2. Can we disable indexing easily, since we are not using full text 
searches in our SPARQL queries?


Our usage of Jena Fuseki:

* Start the server with `fuseki-server —port 3030`
* Create databases with HTTP POST to 
`/$/datasets?state=active&dbType=tdb2&dbName=db_name`

* Upload ttl files with HTTP POST to /db_name/data

Thanks in advance for your feedback, and if you’d require more 
input from our side, please let me know.


Best regards,
Gaspar Bartalus



Re: Re: [MASSMAIL]Re: Large *.dat files in Fuseki

2022-07-06 Thread Lorenz Buehmann

Ok, interesting

so

we have

- 150k triples, rather small dataset

- loaded into 10MB node table files

- 10 updates every 5s

- which makes up to 24 * 60 * 60 / 5 * 10 ~ 200k updates per day

- and leads to 10GB node table files


Can you share the shape of those update queries?


After doing a "compact" operation, the files are getting back to 
"normal" size?



On 06.07.22 10:36, Bartalus Gáspár wrote:

Hi Lorenz,

Thanks for quick feedback and clarification on lucene indexes.

Here are my answers to your questions:
- We are uploading 7 ttl files to our dataset, where 1 is larger 6Mb, the 
others are below 200Kb.
- The overall number of triples after data upload is ~150k.
- We have around 10 SPARQL UPDATE queries that are executed on a regular and 
frequent basis, i.e. every 5 seconds. We also have 5 such queries that are 
executed each minute. But most of the time they do not produce any outcome, 
i.e. the dataset is not altered, and when they do, there are just a couple of 
triples that are added to the dataset.
- These *.dat files start from ~10Mb in size, and after a day or so some of 
them grow to ~10Gb.

We have ~300 blank nodes, and ~half of the triples have a literal in the object 
position, so ~75000.

Best regards,
Gaspar




On 6 Jul 2022, at 10:55, Lorenz Buehmann  
wrote:

Hi and welcome Gaspar.


Those files do contain the node tables.

A Lucene index is never computed by default and would be contained in Lucene 
specific index files.


Can you give some details about the

- size of the files
- the number of triples
- the number triples added/removed/changed
- the frequency of updates
- how much the files grow
- what kind of data you insert? Lots of blank nodes? Or literals?

Also, did you try a compact operation during time?

Lorenz

On 06.07.22 09:40, Bartalus Gáspár wrote:

Hi Jena support team,

We are experiencing an issue with Jena Fuseki databases. In the databases 
folder we see some files called SPO.dat, OSP.dat, etc., and the size of these 
files are growing quickly. From our understanding these files are containing 
the Lucene indexes. We would have two questions:

1. Why are these files growing rapidly, although the underlying data (triples) 
are not being changed, or only slightly changed?
2. Can we disable indexing easily, since we are not using full text searches in 
our SPARQL queries?

Our usage of Jena Fuseki:

* Start the server with `fuseki-server —port 3030`
* Create databases with HTTP POST to 
`/$/datasets?state=active&dbType=tdb2&dbName=db_name`
* Upload ttl files with HTTP POST to /db_name/data

Thanks in advance for your feedback, and if you’d require more input from our 
side, please let me know.

Best regards,
Gaspar Bartalus



Re: Large *.dat files in Fuseki

2022-07-06 Thread Lorenz Buehmann

Hi and welcome Gaspar.


Those files do contain the node tables.

A Lucene index is never computed by default and would be contained in 
Lucene specific index files.



Can you give some details about the

- size of the files
- the number of triples
- the number triples added/removed/changed
- the frequency of updates
- how much the files grow
- what kind of data you insert? Lots of blank nodes? Or literals?

Also, did you try a compact operation during time?

Lorenz

On 06.07.22 09:40, Bartalus Gáspár wrote:

Hi Jena support team,

We are experiencing an issue with Jena Fuseki databases. In the databases 
folder we see some files called SPO.dat, OSP.dat, etc., and the size of these 
files are growing quickly. From our understanding these files are containing 
the Lucene indexes. We would have two questions:

1. Why are these files growing rapidly, although the underlying data (triples) 
are not being changed, or only slightly changed?
2. Can we disable indexing easily, since we are not using full text searches in 
our SPARQL queries?

Our usage of Jena Fuseki:

* Start the server with `fuseki-server —port 3030`
* Create databases with HTTP POST to 
`/$/datasets?state=active&dbType=tdb2&dbName=db_name`
* Upload ttl files with HTTP POST to /db_name/data

Thanks in advance for your feedback, and if you’d require more input from our 
side, please let me know.

Best regards,
Gaspar Bartalus



Re: How to use reasoning in Jena Fuseki?

2022-07-01 Thread Lorenz Buehmann
I guess you have to use an assembler file to configure reasoning, at 
least for anything beyond RDFS (for just RDFS Simple, you can start 
Fuseki with the --rdfs param).


Here is a blog post how to do this: 
https://apothem.blog/apache-jena-fuseki-adding-reasoning-and-full-text-search-capabilities-to-a-dataset.html
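
If you prefer code over an assembler file, roughly the same effect can be 
achieved programmatically and exposed via an embedded Fuseki server - a hedged 
sketch only (requires the jena-fuseki-main artifact; the data file, reasoner 
choice and port are placeholders, not the only options):

import org.apache.jena.fuseki.main.FusekiServer;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.riot.RDFDataMgr;

public class InferenceFuseki {
    public static void main(String[] args) {
        Model base = RDFDataMgr.loadModel("data.ttl");   // placeholder data file
        // Wrap the base data with a reasoner (RDFS/OWL variants are available in ReasonerRegistry).
        InfModel inf = ModelFactory.createInfModel(ReasonerRegistry.getOWLMicroReasoner(), base);

        Dataset ds = DatasetFactory.create(inf);         // expose the inference model as a dataset
        FusekiServer.create()
                    .port(3030)
                    .add("/ds", ds)
                    .build()
                    .start();
        // Queries against http://localhost:3030/ds now also see the entailed triples.
    }
}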


On 01.07.22 15:48, Dương Hồ wrote:

Hi all !
I'm use Jena Fuseki webapp but i can't find setup reasoning in UI.
If i have (?x rdfs:subClassOf ?y) and (?y rdfs:subClassOf ?z)
so how to run reasoning to (?x rdf:type ?z)
Can you help me ?



Re: Re: Jena Full Text Search poor performance

2022-06-16 Thread Lorenz Buehmann
Wouldn't it already be sufficient to move the text:query pattern to the top of 
the query? I thought Jena doesn't optimize custom property functions, i.e. 
won't reorder them?


On 15.06.22 22:26, Øyvind Gjesdal wrote:

Hi Pawel,

I think this could be due to the text:query being evaluated late in the
query, and other statements first computing many results, before the text
query limits it down. Maybe the contains filter gets applied earlier?

Would reordering the statements, expanding the property path and/or
enclosing the statement with the text:query in curly brackets help?

SELECT DISTINCT  ?this ?json WHERE   {
   { ?name text:query (tes:indexedValue '*Allergy*') .}
?this fhir:CodeSystem.name ?name.
   ?this rdf:type  fhir:CodeSystem .  ?this
fhir:Resource.jsonContent/fhir:value ?json .}

Another approach I use on text queries is using subqueries, for smaller
batched results, but you may have to expand the default text:query lucene
limit to walk through all results.

SELECT DISTINCT  ?this ?json WHERE {
   {SELECT ?name { ?name text:query (tes:indexedValue '*Allergy*') .}
#  LIMIT N OFFSET 0
}
?this fhir:CodeSystem.name ?name.
   ?this rdf:type fhir:CodeSystem .?this
fhir:Resource.jsonContent/fhir:value ?json .}

I do use text:query on larger indexes on a similar server configuration,
without experiencing any issues, but I haven't compared results for filter
contains and text:query.

Best regards,
Øyvind

On Wed, Jun 15, 2022 at 3:37 PM Goławski, Paweł 
wrote:


Hi,

I’m trying to use Jena Full Text Search feature according to
https://jena.apache.org/documentation/query/text-query.html

I’ve noticed that queries using “text:query” are very slow: ~20 times
slower than similar queries using a “FILTER contains” clause.

There are ~5.5M triples in database, 18230 triples with indexed predicate.

Database takes 1.3GB and index 4.2M disc space.

Available memory for fuseki server is 16GB.



My config is quite easy, there is nothing special configured:



PREFIX :        <#>
PREFIX fuseki:  <http://jena.apache.org/fuseki#>
PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ja:      <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX tdb:     <http://jena.hpl.hp.com/2008/tdb#>
PREFIX tdb2:    <http://jena.apache.org/2016/tdb#>
PREFIX text:    <http://jena.apache.org/text#>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX fhir:    <http://hl7.org/fhir/>
PREFIX tes:     <http://mycompany/tes/>

[] rdf:type fuseki:Server ;
   fuseki:services (
     :service
   ) .

:service rdf:type fuseki:Service ;
    fuseki:name                       "tes" ;
    fuseki:serviceQuery               "query" , "sparql" ;   # SPARQL query service
    fuseki:serviceUpdate              "update" ;             # SPARQL update service
    fuseki:serviceReadWriteGraphStore "data" ;               # SPARQL Graph store protocol (read and write)
    fuseki:serviceReadGraphStore      "get" ;
    fuseki:serviceUpload              "upload" ;
    fuseki:dataset                    :text_dataset ;
    .

# A TextDataset is a regular dataset with a text index.
:text_dataset rdf:type text:TextDataset ;
    text:dataset  :tdb2_dataset_readwrite ;
    text:index    :indexLucene ;
    .

# A TDB dataset used for RDF storage
:tdb2_dataset_readwrite rdf:type tdb2:DatasetTDB ;
    tdb2:location  "databases/db" ;
    .

:indexLucene a text:TextIndexLucene ;
    text:directory    "databases/db-index" ;
    text:entityMap    :entMap ;
    text:storeValues  true ;
    text:analyzer [
        a text:StandardAnalyzer ;
        #   text:stopWords ("the" "a" "an" "and" "but")
    ] ;
    # text:queryAnalyzer [ a text:StandardAnalyzer ] ;
    text:queryParser  text:QueryParser ;
    # text:multilingualSupport true ; # optional
    .

# Entity map (see documentation for other options)
:entMap a text:EntityMap ;
    text:defaultField  "tesValue" ;
    text:entityField   "uri" ;
    text:uidField      "uid" ;
    text:langField     "lang" ;
    text:graphField    "graph" ;
    text:map (
        [ text:field      "tesValue" ;
          text:predicate  tes:indexedValue
        ]
    )
    .



There are very similar SPARQL queries:

· with “text:query” clause:



PREFIX  tes:  <http://mycompany/tes/>
PREFIX  fhir: <http://hl7.org/fhir/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  owl:  <http://www.w3.org/2002/07/owl#>

Re: Semantics of SERVICE w.r.t. slicing

2022-06-02 Thread Lorenz Buehmann
The semantics should be in a separate document: 
https://www.w3.org/TR/sparql11-federated-query/#fedSemantics

On 02.06.22 22:19, Claus Stadler wrote:

Hi,

I noticed some interesting results when using SERVICE with a sub query 
with a slice (limit / offset).



Preliminary Remark:

Because SPARQL semantics is bottom up, a query such as the following 
will not yield bindings for ?x:


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }

  SERVICE  { BIND(?s AS ?x) }
}


Query engines, such as Jena, attempt to optimize execution. For 
instance, in the following query,


instead of retrieving all labels, jena uses each binding for a Musical 
Artist to perform a lookup at the service.


The result is semantically equivalent to bottom up evaluation (without 
result set limits) - just much faster.


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
  SERVICE  { ?s 
 ?x }

}


The main point:

However, the following query with ARQ interestingly yields one binding 
for every musical artist - which contradicts the bottom-up paradigm:


SELECT * {
  SERVICE  { SELECT * { ?s a 
 } LIMIT 5 }
  SERVICE  { SELECT * { ?s 
 ?x } LIMIT 1 }

}


 "Aarti Mukherjee"@en
 "Abatte Barihun"@en
... 3 more results ...


With bottom-up semantics, the second service clause would only fetch a 
single binding so in the unlikely event that it happens to join with a 
musical artist I'd expect at most one binding


in the overall result set.

Now I wonder whether this is a bug or a feature.

I know that Jena's VarFinder is used to decide whether to perform a 
bottom-up evaluation using OpJoin or a correlated join using 
OpSequence which results in the different outcomes.


The SPARQL spec doesn't say much about the semantics of Service 
(https://www.w3.org/TR/sparql11-query/#sparqlAlgebraEval)


So I wonder which behavior is expected when using SERVICE with SLICE'd 
queries.



Cheers,

Claus




Re: Graph Store compared to tdb2.tdbloader

2022-05-29 Thread Lorenz Buehmann

Hi David,

On 29.05.22 15:34, David Habgood wrote:

Hi,

I've been running Apache Jena Fuseki 4.5.0 in a docker container. I've
loaded data to it two ways: through the graph store protocol, and using
tdb2.tdbloader before starting Jena Fuseki. No issues with either, however
I'm interested in what differences the two methods have.

With the graph store protocol, I can put larger RDF files 'close' to where
the docker container is running and handle any network issues, so the loads
have been fine. Loading data this way is convenient and allows updates
while Jena Fuseki is running. Are indexes continually updated as more data
is loaded through the graph store protocol? Are there any other
disadvantages to this method or reasons it (may) not be advised for large
datasets? Conversely, I'm aware tdb2.tdbloader can load large datasets, is
there any reason/s it should be used over graph store protocol?

Are there any other methods I should be considering (other than SPARQL
INSERT)?

I'll also be running GeoSPARQL Jena for some instances, and needing to
spatially index data. I think this will necessitate using tdb2.tdbloader
and generating the spatial index 'offline' before starting Jena/Fuseki - or
are there other ways?


At least when you have the GeoSPARQL layer enabled in your Fuseki 
assembler config, the spatial index should be computed just once on the 
first start of Fuseki and serialized at the configured destination. Only 
the text index has to be generated offline beforehand.
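
Regarding other loading methods: besides the Graph Store Protocol and SPARQL 
INSERT, you can also push files from Java code through RDFConnection, which 
talks to the same HTTP endpoints - a hedged sketch, where the dataset URL, 
graph IRI and file names are placeholders:

import org.apache.jena.rdfconnection.RDFConnection;

public class UploadToFuseki {
    public static void main(String[] args) {
        // Dataset URL of the running Fuseki instance (placeholder).
        try (RDFConnection conn = RDFConnection.connect("http://localhost:3030/ds")) {
            conn.load("data.ttl");                                   // into the default graph
            conn.load("http://example.org/graph1", "more-data.ttl"); // into a named graph
        }
    }
}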




Thanks
David Habgood



Re: TDB2 persistence and jena fuseki upgrades

2022-05-09 Thread Lorenz Buehmann

Hi,

can you verify that querying the TDB2 database from the command line still 
works? Can you show the Fuseki assembler config, or how do you start Fuseki in general?
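
One quick way to check the on-disk database independently of Fuseki is 
tdb2.tdbquery; the programmatic equivalent would be roughly the following - a 
hedged sketch, the database path is a placeholder:

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class CheckTdb2 {
    public static void main(String[] args) {
        // Point this at the database directory Fuseki is configured to use.
        Dataset ds = TDB2Factory.connectDataset("run/databases/mydb");
        String q = "SELECT (COUNT(*) AS ?c) { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }";
        Txn.executeRead(ds, () -> {
            try (QueryExecution qe = QueryExecutionFactory.create(q, ds)) {
                qe.execSelect().forEachRemaining(row -> System.out.println("tuples: " + row.get("c")));
            }
        });
    }
}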


On 09.05.22 14:54, Øyvind Gjesdal wrote:

I've recently completed some upgrades using the fuseki binaries, first with
major versions, from different 3.X to 4.4 and then again from 4.4 to 4.5.

* I noticed that on the upgrades my tdb2 databases returned no results in
sparql after upgrading, but they both had similar file size in the
DATA- dir.
* I downgraded once from 4.5 to 4.4 and noticed that the tdb2 database
still gave no sparql results, after first having been started with 4.5,
then downgrading, restarting and querying again.
* The database can still be updated with rest calls, and returns what seems
correct data when I repopulate using the graph store rest api.

Is this the expected behaviour, or is it a bug that I should try to
recreate?

In the first case, maybe the documentation at
https://jena.apache.org/documentation/fuseki2/ could have a small
"Upgrading section" mentioning ways of upgrading without losing persistence.


... bulk uploads into a live Fuseki can be 100’s of millions of triples.

On another note, it is really cool to see the bulk upload RDF data into
tdb2, coming from using tdb1: we only update in the 1s to 10s of millions,
but it is both fast and seems to consistently work, and we don't have to
play around with expanding JVM memory as the requests grow.

Øyvind



Re: Re: tdbloader - how to indicate destination graph?

2022-04-11 Thread Lorenz Buehmann
Ok, but did you understand what I was trying to say? As long as your 
data contains quads, the TDB loader will load them into the corresponding 
graphs defined in the data. Only triples that are not mapped to a 
non-default graph will be loaded into the graph given with --graph.


As I said, either avoid generating quads in the first place, or convert 
the quad stream into a triple stream. The TDB loader can also read from 
stdin, so you could pipe the converted triples into it.


On 11.04.22 09:11, robert.ba...@tiscali.it wrote:
   


Hi,
I load triples in RDF Thrift format generated using Jena's
StreamRDF2Thrift class.
Loading the same triples with the ttl format I
have no problems.

Il 11.04.2022 07:33 Lorenz Buehmann ha scritto:

There is no error but a warning or do I misunderstand you?

The

problem is that your data is already quads, so for each triple a
graph is already given and thus put in the corresponding graph (if the


graph IRI is mentioned). Only for triples without a graph those

triples

will be put into the graph you set via --graph parameter



What kind of data do you load? N-Quads or TriG? Not sure if the TDB
loader has flag to ignore the graphs in the file, if not simply
converting to N-Triples in advance would be an option which would just


drop the graph from the quad and makes it a plain triple.

On

10.04.22 18:06, robert.ba...@tiscali.it [2]wrote:

hi i am using

jena-fuseki version 4.3.2 i have a dataset with default graph and
another graph: when i trying to run the command:
E:apache-jena-4.3.2battdb2_tdbloader.bat -loader=parallel
-graph=http://host/graphname1 -loc
E:apache-jena-fuseki-4.3.2rundatabasesbatch E:sharedttlcustomer.rt but I
get the error: "Warning: Quads format given - only the default graph is
loaded into the graph for --graph" Suggestion? Does the new version
(4.4.0) have any improvements for the loading phase (tdbloader)? thanks
   







Re: tdbloader - how to indicate destination graph?

2022-04-10 Thread Lorenz Buehmann

There is no error but a warning or do I misunderstand you?

The problem is that your data is already quads, so each triple already comes 
with a graph and is thus put into that graph (if the graph IRI is mentioned). 
Only triples without a graph will be put into the graph you set via the 
--graph parameter.


What kind of data do you load? N-Quads or TriG? I'm not sure if the TDB 
loader has a flag to ignore the graphs in the file; if not, simply 
converting to N-Triples in advance would be an option, which just drops 
the graph from each quad and makes it a plain triple.


On 10.04.22 18:06, robert.ba...@tiscali.it wrote:
   


hi i am using jena-fuseki version 4.3.2
i have a dataset with
default graph and another graph:

when i trying to run the
command:
E:apache-jena-4.3.2battdb2_tdbloader.bat -loader=parallel
-graph=http://host/graphname1 -loc
E:apache-jena-fuseki-4.3.2rundatabasesbatch E:sharedttlcustomer.rt

but
I get the error:
"Warning: Quads format given - only the default graph
is loaded into the graph for --graph"

Suggestion?

Does the new version
(4.4.0) have any improvements for the loading phase (tdbloader)?

thanks
   







Re: ARQ variables with dashes

2022-04-05 Thread Lorenz Buehmann

Hi Barry,


Did you try the SPARQL 1.1 parser instead? Afaik, ARQ's own syntax was always a 
superset of SPARQL 1.1 - or better said, it already existed before SPARQL 1.1 and comes with some extensions.


Indeed, Andy will correct me soon :D

The grammar files for JavaCC are here:

https://github.com/apache/jena/tree/main/jena-arq/Grammar

You can check arq.jj and sparql_11.jj
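
To see quickly which parser accepts the variable name, you can also parse the 
pattern wrapped into a query with both syntaxes - a hedged sketch:

import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QueryParseException;
import org.apache.jena.query.Syntax;

public class DashVarCheck {
    public static void main(String[] args) {
        String q = "SELECT * WHERE { ?community-ID <https://www.tno.nl/example/b> ?o . }";
        for (Syntax syntax : new Syntax[] { Syntax.syntaxSPARQL_11, Syntax.syntaxARQ }) {
            try {
                QueryFactory.create(q, syntax);  // throws if the parser rejects the query
                System.out.println(syntax + ": accepted");
            } catch (QueryParseException e) {
                System.out.println(syntax + ": rejected - " + e.getMessage());
            }
        }
    }
}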


Or just wait for Andy's response ...


Cheers,

Lorenz



On 05.04.22 13:21, Nouwt, B. (Barry) wrote:

Hi everyone,

We are using ARQ's SPARQL parser to parse graph patterns and noticed that it 
allows dashes in variable names if these variables occur as the *object* 
location of a triple pattern. If the variable names at the *subject* location 
of a triple pattern contains dashes, it fails with a ParseException. As far as 
we could tell the SPARQL specification does not allow dashes in variable names 
at all (https://www.w3.org/TR/sparql11-query/#rVARNAME). The pattern1 and 
pattern2 below should both fail, but the first one does not fail and the second 
does fail.

String pattern1 = " https://www.tno.nl/example/b ?community-ID .";
ARQParser parser1 = new ARQParser(new StringReader(pattern1));
parser1.GroupGraphPatternSub();

String pattern2 = "?community-ID https://www.tno.nl/example/b  .";
ARQParser parser2 = new ARQParser(new StringReader(pattern2));
parser2.GroupGraphPatternSub();

Is this a bug?

Best regards,

Barry
This message may contain information that is not intended for you. If you are 
not the addressee or if this message was sent to you by mistake, you are 
requested to inform the sender and delete the message. TNO accepts no liability 
for the content of this e-mail, for the manner in which you use it and for 
damage of any kind resulting from the risks inherent to the electronic 
transmission of messages.



Re: SPARQL optional limiting results

2022-03-15 Thread Lorenz Buehmann

Hi,

I'm probably misunderstanding the query, but what is the purpose of the 
OPTIONAL here?


?graph is bound because of VALUES clause, ?concept is bound because of 
the graph pattern before the OPTIONAL as well.


So ?graph and ?concept are bound on the left hand side of the left-join 
aka OPTIONAL


Here is the algebra:

(join
  (table (vars ?graph)
(row [?graph])
(row [?graph])
  )
  (assign ((?graph ?*g0))
(leftjoin
  (distinct
(project (?concept ?prefLabelm ?altLabelm)
  (filter (= (lang ?prefLabelm) "fi")
(quadpattern
  (quad ?*g0 ??0 rdf:first ?concept)
  (quad ?*g0 ??0 rdf:rest ??1)
  (quad ?*g0 ??1 rdf:first ?score1)
  (quad ?*g0 ??1 rdf:rest ??2)
  (quad ?*g0 ??2 rdf:first ?prefLabelm)
  (quad ?*g0 ??2 rdf:rest rdf:nil)
  (quad ?*g0 ??0 text:query ??3)
  (quad ?*g0 ??3 rdf:first skos:prefLabel)
  (quad ?*g0 ??3 rdf:rest ??4)
  (quad ?*g0 ??4 rdf:first "aamiainen*")
  (quad ?*g0 ??4 rdf:rest rdf:nil)

  (sequence
(graph ?*g0
  (path ?concept (path* skos:broader) ??5))
(quadpattern (quad ?*g0 ??5 skos:topConceptOf ?graph)
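
For reference, the algebra output above can be reproduced with a few lines of 
Java (or with the qparse command line tool); a minimal sketch:

import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

public class ShowAlgebra {
    public static void main(String[] args) {
        // Pass the SPARQL query text as the first argument.
        Query query = QueryFactory.create(args[0]);
        Op op = Algebra.compile(query);   // query -> algebra
        op = Algebra.optimize(op);        // apply ARQ's algebra optimizations
        op = Algebra.toQuadForm(op);      // quad form, as evaluated over a dataset (TDB)
        System.out.println(op);
    }
}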


Can you say what you want to achieve with the OPTIONAL? It won't 
return any additional data as far as I can see.


On 14.03.22 14:30, Mikael Pesonen wrote:
Hi, not directly related to Jena, but I have a query in which an OPTIONAL 
clause limits the number of results. I thought that was never possible. The 
query below returns fewer results with the OPTIONAL enabled. I wonder why that 
is, and what would be the correct way to get the optional data so that 
all rows are returned?


SELECT *
WHERE
{
    VALUES ?graph { 
}

    GRAPH ?graph
    {
    {
        SELECT DISTINCT ?concept ?prefLabelm ?altLabelm WHERE
        {
            {
                (?concept ?score1 ?prefLabelm) text:query 
(skos:prefLabel "aamiainen*") .

    FILTER ( (lang(?prefLabelm) = "fi" ))
            }
    }
    }
   # OPTIONAL { ?concept skos:broader* [ skos:topConceptOf ?graph] }
    }
}

Re: Re: Geo indexing Wikidata

2022-02-23 Thread Lorenz Buehmann

Hi Greg,

thanks for providing such an informative answer.

On 23.02.22 17:59, Greg wrote:

Hi Lorenz,

Regarding your final point on the use of Euclidean distance for the
geof:distance, this is derived from Requirement A.3.14 on page 38 of the
GeoSPARQL standard (quoted below). The definition of the distance and
other query functions follows that of the Simple Features standard (ISO
19125-1). The Simple Features standard uses a two dimensional planar
approach, the distance calculation is Euclidean and Great Circle is out
of scope. Applying the distance function to non-planar SRS coordinates
is regarded as an acceptable error.

Ok, I see - now I'm understanding better.


The Jena implementation follows the GeoSPARQL standard by converting the
second Geometry Literal's coordinates to the first Geometry Literal's
SRS, if required.

A Great Circle distance filter function has been provided as
*spatialF:greatCircleGeom(...)*
(https://jena.apache.org/documentation/geosparql/). This is an extension
namepsace for Jena as it is outside the GeoSPARQL standard.
Yep, as you can see from my query I did use spatialF:distance in the end 
which maps to Haversine


Could you provide the WKT Geometry Literals returned by your query, so
that they can be tested directly for the asymmetry?


Sure, the data comes from Wikidata but here is a self-contained query 
with just the WKT literals in the VALUES clause:



PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX spatialF: <http://jena.apache.org/function/spatial#>
PREFIX afn: <http://jena.apache.org/ARQ/function#>

SELECT * {
  VALUES ?wkt1 {"Point(11.4167 
53.6333)"^^geo:wktLiteral "Point(11.575 48.1375)"^^geo:wktLiteral}
  VALUES ?wkt2 {"Point(11.4167 
53.6333)"^^geo:wktLiteral "Point(11.575 48.1375)"^^geo:wktLiteral}

  #FILTER(?wkt1 != ?wkt2 && str(?wkt1) < str(?wkt2))
  BIND(geof:distance(?wkt1, ?wkt2, uom:kilometer) as ?d1)
  BIND(geof:distance(?wkt2, ?wkt1, uom:kilometer) as ?d2)
  BIND(abs(?d1 - ?d2) as ?diff_d1_d2)
  BIND(spatialF:distance(?wkt1, ?wkt2, uom:kilometer) as ?d_hav)
  BIND(afn:max(abs(?d1 - ?d_hav), abs(?d2 - ?d_hav)) as ?diff_eucl_hav)
}

Note, I commented out the filter because there must be some bug here: 
FILTER(?wkt1 != ?wkt2) always leads to an error or to false. Can somebody 
verify this?


I also checked the source code, indeed the raw euclidean measure is the 
same for two points p1 and p2 - but the post-processing to map the value 
to a unit like kilometers does more math and depends on the starting 
longitude value.





Thanks,

Greg

*A.3.1.4 /conf/geometry-extension/query-functions*

Requirement: /req/geometry-extension/query-functions
Implementations shall support geof:distance, geof:buffer,
geof:convexHull, geof:intersection, geof:union, geof:difference,
geof:symDifference, geof:envelope and geof:boundary as SPARQL extension
functions, consistent with the definitions of the corresponding
functions (distance, buffer, convexHull, intersection, difference,
symDifference, envelope and boundary respectively) in Simple Features
[ISO 19125-1].


On 23/02/2022 08:56, Lorenz Buehmann wrote:

Thanks both for your very helpful input - I'm still a GeoSPARQL novice
and trying to learn stuff and first of all just use the Jena
implementation as efficient as possible.

On 21.02.22 15:22, Andy Seaborne wrote:



On 21/02/2022 09:07, Lorenz Buehmann wrote:

Any experience or comments so far?


Using SubsystemLifecycle, could make the conversions by

    GeoSPARQLOperations.convertGeoPredicates

extensible.

    Andy

But having coordinate location (P625), located on astronomical body
(P376) as properties of a thing, is dangerous because of monotonicity
in RDF:

   SELECT * { ?x wdt:P625 ?coords }

the association of P625 and P376 is lost.


Yep, I could simply omit the extra-terrestrial entities for now when
storing the GeoSPARQL conform triples in the separate graph - clearly,
this would need the Wikidata full dump as qualifiers are not contained
in truthy.

As Marco pointed out there is ongoing discussion on Wikidata
community:
https://www.wikidata.org/wiki/Wikidata:Property_proposal/planetary_coordinates 





What is the range of P625? It is not "earth geometry" any more.
What if there is no P376 on ?x?


Wikidata doesn't really have a concept of range or let's say they do
not make use of RDFS at all. They use "property constraints" and if I
look at https://www.wikidata.org/wiki/Property:P625 they more or less
define some kind of domain

"not being human or company or railway" and some other more weird like
"not being a female given name" etc. - I

Re: Re: Re: Re: Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann
Just use this example for in-memory dataset with inference: 
https://github.com/apache/jena/blob/main/jena-fuseki2/examples/config-inference-1.ttl


On 23.02.22 12:28, Luca Turchet wrote:

For the moment I don't use a TDB2, but a regular .ttl file.
Could you please send me the assembler modified to use the ttl file?

Also, is there a way to use both the --conf and --file options together?

Cheers

Luca

-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 12:19 Luca Turchet 
ha scritto:


Ok received. I proceed and let you know

Luca


-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 12:16 Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> ha scritto:


To speedup the process I uploaded a tarball containing

- the extracted Fuseki 4.4.0 with the necessary Openllet Jars

- the assembler config

- please set FUSEKI_BASE then run the server with --conf assembler.ttl

- and of course modify the assembler file to link to your TDB2 location
path

Link: https://www.file.io/R2ls/download/7YKjIFR0eWyb


On 23.02.22 12:05, Lorenz Buehmann wrote:

On 23.02.22 11:54, Luca Turchet wrote:

So, firstly mvn --version provided the JDK 17.02, but the JDK version
can
be set with "export JAVA_HOME= "

I installed JDK 11 and used the POM.xml configuration you suggested.
I also
exported the JAVA_HOME in the shell session to make sure
that JDK 11 is used (with export


JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.13.jdk/Contents/Home/).


I got only the error :
ERROR: Type 'openllet help' for usage.

what did you do here? why would you get this error? This looks more
like a commandline script call?

You should simply call (just skip the test for speedup) to build the
Openllet project:

mvn clean install -Dmaven.test.skip=true

but follow the instruction of my previous email, you have to put the
Jar file generated in openllet/distribution/target  to the Fuseki
classpath


No more information in the output. I repeated with JDK 13 getting the
same
error.

it would be great to port fuseki to JDK 17 as JDK 11 is pretty old.

Luca



-



*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions

Laboratory

*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 10:29 Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> ha scritto:


Ok, I'm still on JDK 11 and this worked for me - so you could give it

a

try. If not then can't you set the compiler plugin to 13? Although I'm
wondering why it failed with JDK 17 if you have it installed. Did you
also set this as your current JDK?

mvn --version should have shown Java 17 then

But let's try with JDK 11 first, Fuseki distribution is currently also
on Java 11

On 23.02.22 10:22, Luca Turchet wrote:

I have amended the file as you suggested (and even modifying
the maven-enforcer-plugin to 3.0.0) but the result is the
same.

I attach the pom POM file.

However, I don't have installed JDK version 11, or 15. I have 13 14,
16 and 17. Do I need to install JDK version 11?

Cheers

Luca



-


*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions
Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/
Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 10:04 Lorenz Buehmann
 ha scritto:

  I checked Openllet, it has been set to Java 17 - you can
change it in
  the POM file:

  - set maven-compiler-plugin source and target entry to 11
  - change maven-enforcer-plugin Java ru

Re: Re: Re: Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann

To speedup the process I uploaded a tarball containing

- the extracted Fuseki 4.4.0 with the necessary Openllet Jars

- the assembler config

- please set FUSEKI_BASE then run the server with --conf assembler.ttl

- and of course modify the assembler file to link to your TDB2 location path

Link: https://www.file.io/R2ls/download/7YKjIFR0eWyb


On 23.02.22 12:05, Lorenz Buehmann wrote:


On 23.02.22 11:54, Luca Turchet wrote:
So, firstly mvn --version provided the JDK 17.02, but the JDK version 
can

be set with "export JAVA_HOME= "

I installed JDK 11 and used the POM.xml configuration you suggested. 
I also

exported the JAVA_HOME in the shell session to make sure
that JDK 11 is used (with export
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.13.jdk/Contents/Home/). 


I got only the error :
ERROR: Type 'openllet help' for usage.


what did you do here? why would you get this error? This looks more 
like a commandline script call?


You should simply call (just skip the test for speedup) to build the 
Openllet project:


mvn clean install -Dmaven.test.skip=true

but follow the instruction of my previous email, you have to put the 
Jar file generated in openllet/distribution/target  to the Fuseki 
classpath




No more information in the output. I repeated with JDK 13 getting the 
same

error.

it would be great to port fuseki to JDK 17 as JDK 11 is pretty old.

Luca

- 



*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 10:29 Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> ha scritto:


Ok, I'm still on JDK 11 and this worked for me - so you could give it a
try. If not then can't you set the compiler plugin to 13? Although I'm
wondering why it failed with JDK 17 if you have it installed. Did you
also set this as your current JDK?

mvn --version should have shown Java 17 then

But let's try with JDK 11 first, Fuseki distribution is currently also
on Java 11

On 23.02.22 10:22, Luca Turchet wrote:

I have amended the file as you suggested (and even modifying
the maven-enforcer-plugin to 3.0.0) but the result is the 
same.


I attach the pom POM file.

However, I don't have installed JDK version 11, or 15. I have 13 14,
16 and 17. Do I need to install JDK version 11?

Cheers

Luca


- 


*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions 
Laboratory

*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 10:04 Lorenz Buehmann
 ha scritto:

 I checked Openllet, it has been set to Java 17 - you can 
change it in

 the POM file:

 - set maven-compiler-plugin source and target entry to 11
 - change maven-enforcer-plugin Java rule to

[10,15)
 Note, the fork is currently set to Jena 4.2.X, not sure if it 
will

 have
     conflicts when you use it with latest Jena

 On 23.02.22 08:02, Lorenz Buehmann wrote:
 > Hi,
 >
 > follow up from your Stackoverflow thread, the Jena built-in
 reasoners
 > do not support SWRL rules - what exactly is supported by 
which OWL

 > reasoner is documented here:
 > https://jena.apache.org/documentation/inference/#owl
 >
 > None of them is a full OWL DL reasoner, that's only covered via
 > Pellet. And Pellet does also support SWRL. I suggested to use a
 Pellet
 > fork like Openllet because the official Pellet reasoner is 
still on

 > Jena 2.x/3.x and any further version of Pellet is closed source
 being
 > integrated in Stardog triple store.
 >
 > Openllet does support Jena 4.x so in theory it should work. It
 would
 > be helpful to show your Java/Maven issues, otherwise it's a 
wild

 guess.
 >
 > The other option I suggested was to use the Jena rules 
instead of

 > SWRL. I understand though that if you want to stick to W3C
 standards
 > (technically SWRL isn't) this won't be an option for you.
 >
 > Note, the whole reasoner will happen mostly in-memory - don't
 expect
 > OWL DL reasoning to scale in a large dataset in a triple store-
 that's
 > why people designed profil

Re: Re: Re: Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann
FUSEKI_BASE is an environment variable used for server startup - when 
you start the Fuseki server it will look for additional Jar files at 
$FUSEKI_BASE/extra and put them on the Java classpath - this is 
necessary as Openllet is an external lib.


On 23.02.22 12:03, Luca Turchet wrote:

- set FUSEKI_BASE to the path of the Fuseki distribution -> what do you
mean exactly with "set FUSEKI_BASE" ?
in my .bash_profile I added the line export
FUSEKI_BASE=/Users/luca/Documents/Development/semantic_web/apache_jena_fuseki/apache-jena-fuseki-4.4.0:$FUSEKI_BASE
Is this what you mean?


FUSEKI_BASE=/Users/luca/Documents/Development/semantic_web/apache_jena_fuseki/apache-jena-fuseki-4.4.0

would be sufficient



Luca

-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 11:29 Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> ha scritto:


Ok, looks like I got it working - took more effort than expected:

- set FUSEKI_BASE to the path of the Fuseki distribution
- create a directory $FUSEKI_BASE/extra - this will be used for the
additional Jars on classpath
- given that it is JDK 11 we have to either modify the distribution
pom.xml, or for now just download from the Maven repo and put into
$FUSEKI_BASE/extra:

   * javax.xml.bind:jaxb-api:2.3.0
   * com.sun.xml.bind:jaxb-core:2.3.0
   * com.sun.xml.bind:jaxb-impl:2.3.0

- modify the openllet/distribution/pom.xml to **not** exclude the
JGraphT dependency but instead include it (you have to explicitly
include it)

- rebuild and then put the openllet-distribution-2.6.6-SNAPSHOT.jar file
to $FUSEKI_BASE/extra

- use the attached assembler and adapt it - currently it assumes a TDB2
database at /tmp/DB folder where I loaded some data via tdb2.tdbloader
--loc /tmp/DB

On 23.02.22 10:29, Lorenz Buehmann wrote:

Ok, I'm still on JDK 11 and this worked for me - so you could give it
a try. If not then can't you set the compiler plugin to 13? Although
I'm wondering why it failed with JDK 17 if you have it installed. Did
you also set this as your current JDK?

mvn --version should have shown Java 17 then

But let's try with JDK 11 first, Fuseki distribution is currently also
on Java 11

On 23.02.22 10:22, Luca Turchet wrote:

I have amended the file as you suggested (and even modifying
the  maven-enforcer-plugin to 3.0.0) but the result is the
same.

I attach the pom POM file.

However, I don't have installed JDK version 11, or 15. I have 13 14,
16 and 17. Do I need to install JDK version 11?

Cheers

Luca



-



*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Il giorno mer 23 feb 2022 alle ore 10:04 Lorenz Buehmann
 ha scritto:

 I checked Openllet, it has been set to Java 17 - you can change
it in
 the POM file:

 - set maven-compiler-plugin source and target entry to 11
 - change maven-enforcer-plugin Java rule to
[10,15)

 Note, the fork is currently set to Jena 4.2.X, not sure if it will
 have
 conflicts when you use it with latest Jena

 On 23.02.22 08:02, Lorenz Buehmann wrote:
 > Hi,
 >
 > follow up from your Stackoverflow thread, the Jena built-in
 reasoners
 > do not support SWRL rules - what exactly is supported by which OWL
 > reasoner is documented here:
 > https://jena.apache.org/documentation/inference/#owl
 >
 > None of them is a full OWL DL reasoner, that's only covered via
 > Pellet. And Pellet does also support SWRL. I suggested to use a
 Pellet
 > fork like Openllet because the official Pellet reasoner is
still on
 > Jena 2.x/3.x and any further version of Pellet is closed source
 being
 > integrated in Stardog triple store.
 >
 > Openllet does support Jena 4.x so in theory it should work. It
 would
 > be helpful to show your Java/Maven issues, otherwise it's a wild
 guess.
 >
 > The other option I suggested was to use the Jena rules instead of
 > SWRL. I understand though that if you want to stick to W3C
 standards
 > (technically SWRL isn't) this won't be an option for you.
 >
 &

Re: Re: Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann



On 23.02.22 11:54, Luca Turchet wrote:

So, firstly mvn --version provided the JDK 17.02, but the JDK version can
be set with "export JAVA_HOME= "

I installed JDK 11 and used the POM.xml configuration you suggested. I also
exported the JAVA_HOME in the shell session to make sure
that JDK 11 is used (with export
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk-11.0.13.jdk/Contents/Home/).
I got only the error :
ERROR: Type 'openllet help' for usage.


what did you do here? why would you get this error? This looks more like 
a commandline script call?


You should simply call (just skip the test for speedup) to build the 
Openllet project:


mvn clean install -Dmaven.test.skip=true

but follow the instruction of my previous email, you have to put the Jar 
file generated in openllet/distribution/target  to the Fuseki classpath




No more information in the output. I repeated with JDK 13 getting the same
error.

it would be great to port fuseki to JDK 17 as JDK 11 is pretty old.

Luca

-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



On Wed, 23 Feb 2022 at 10:29, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:


Ok, I'm still on JDK 11 and this worked for me - so you could give it a
try. If not then can't you set the compiler plugin to 13? Although I'm
wondering why it failed with JDK 17 if you have it installed. Did you
also set this as your current JDK?

mvn --version should have shown Java 17 then

But let's try with JDK 11 first, Fuseki distribution is currently also
on Java 11

On 23.02.22 10:22, Luca Turchet wrote:

I have amended the file as you suggested (and even updated
the maven-enforcer-plugin to 3.0.0), but the result is the same.

I attach the POM file.

However, I don't have JDK version 11 or 15 installed. I have 13, 14,
16 and 17. Do I need to install JDK version 11?

Cheers

Luca



-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



On Wed, 23 Feb 2022 at 10:04, Lorenz Buehmann wrote:

 I checked Openllet, it has been set to Java 17 - you can change it in
 the POM file:

 - set maven-compiler-plugin source and target entry to 11
 - change maven-enforcer-plugin Java rule to

[10,15)

 Note, the fork is currently set to Jena 4.2.X, not sure if it will
 have
     conflicts when you use it with latest Jena

 On 23.02.22 08:02, Lorenz Buehmann wrote:
 > Hi,
 >
 > follow up from your Stackoverflow thread, the Jena built-in
 reasoners
 > do not support SWRL rules - what exactly is supported by which OWL
 > reasoner is documented here:
 > https://jena.apache.org/documentation/inference/#owl
 >
 > None of them is a full OWL DL reasoner, that's only covered via
 > Pellet. And Pellet does also support SWRL. I suggested to use a
 Pellet
 > fork like Openllet because the official Pellet reasoner is still on
 > Jena 2.x/3.x and any further version of Pellet is closed source
 being
 > integrated in Stardog triple store.
 >
 > Openllet does support Jena 4.x so in theory it should work. It
 would
 > be helpful to show your Java/Maven issues, otherwise it's a wild
 guess.
 >
 > The other option I suggested was to use the Jena rules instead of
 > SWRL. I understand though that if you want to stick to W3C
 standards
 > (technically SWRL isn't) this won't be an option for you.
 >
 > Note, the whole reasoner will happen mostly in-memory - don't
 expect
 > OWL DL reasoning to scale in a large dataset in a triple store-
 that's
 > why people designed profiles like OWL RL which can be easily
 mapped to
 > rule based inference and don't need a tableau algorithm or the

like.

 >
 > Cheers,
 >
 > Lorenz
 >
 > On 22.02.22 18:36, Luca Turchet wrote:
 >> Dear list members,
 >> I am trying to integrate a reasoner in the Fuseki server. I
 first tried
 >> openllet but there are some techni

Re: Re: Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann

Ok, looks like I got it working - took more effort than expected:

- set FUSEKI_BASE to the path of the Fuseki distribution
- create a directory $FUSEKI_BASE/extra - this will be used for the 
additional JARs on the classpath
- given that it is JDK 11 we have to either modify the distribution 
pom.xml, or for now just download from the Maven repo and put into 
$FUSEKI_BASE/extra:


 * javax.xml.bind:jaxb-api:2.3.0
 * com.sun.xml.bind:jaxb-core:2.3.0
 * com.sun.xml.bind:jaxb-impl:2.3.0

- modify the openllet/distribution/pom.xml to **not** exclude the 
JGraphT dependency but instead include it (you have to explicitly 
include it)


- rebuild and then put the openllet-distribution-2.6.6-SNAPSHOT.jar file 
to $FUSEKI_BASE/extra


- use the attached assembler and adapt it - currently it assumes a TDB2 
database in the /tmp/DB folder, where I loaded some data via tdb2.tdbloader 
--loc /tmp/DB
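
For reference, a minimal assembler along these lines could look like the
sketch below (placeholder names; for brevity it loads an in-memory model
from a file, whereas the attached file points the base model at the TDB2
graph instead; the reasoner factory class name is Openllet's and assumes
its distribution JAR is on the classpath as described above):

PREFIX :       <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja:     <http://jena.hpl.hp.com/2005/11/Assembler#>

:service a fuseki:Service ;
    fuseki:name     "ds" ;
    fuseki:endpoint [ fuseki:operation fuseki:query ] ;
    fuseki:dataset  :dataset .

:dataset a ja:RDFDataset ;
    ja:defaultGraph :infModel .

:infModel a ja:InfModel ;
    ja:baseModel :baseModel ;
    ja:reasoner  [ ja:reasonerClass "openllet.jena.PelletReasonerFactory" ] .

:baseModel a ja:MemoryModel ;
    ja:content [ ja:externalContent <file:///tmp/data.ttl> ] .

Adapt the data location and the service name to your setup; queries sent
to /ds then run over the inference model on top of the base data.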


On 23.02.22 10:29, Lorenz Buehmann wrote:
Ok, I'm still on JDK 11 and this worked for me - so you could give it 
a try. If not then can't you set the compiler plugin to 13? Although 
I'm wondering why it failed with JDK 17 if you have it installed. Did 
you also set this as your current JDK?


mvn --version should have shown Java 17 then

But let's try with JDK 11 first, Fuseki distribution is currently also 
on Java 11


On 23.02.22 10:22, Luca Turchet wrote:
I have amended the file as you suggested (and even updated 
the maven-enforcer-plugin to 3.0.0), but the result is the 
same.


I attach the POM file.

However, I don't have JDK version 11 or 15 installed. I have 13, 14, 
16 and 17. Do I need to install JDK version 11?


Cheers

Luca

- 



*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



On Wed, 23 Feb 2022 at 10:04, Lorenz Buehmann wrote:


    I checked Openllet, it has been set to Java 17 - you can change 
it in

    the POM file:

    - set maven-compiler-plugin source and target entry to 11
    - change maven-enforcer-plugin Java rule to 
[10,15)


    Note, the fork is currently set to Jena 4.2.X, not sure if it will
    have
    conflicts when you use it with latest Jena

    On 23.02.22 08:02, Lorenz Buehmann wrote:
    > Hi,
    >
    > follow up from your Stackoverflow thread, the Jena built-in
    reasoners
    > do not support SWRL rules - what exactly is supported by which OWL
    > reasoner is documented here:
    > https://jena.apache.org/documentation/inference/#owl
    >
    > None of them is a full OWL DL reasoner, that's only covered via
    > Pellet. And Pellet does also support SWRL. I suggested to use a
    Pellet
    > fork like Openllet because the official Pellet reasoner is 
still on

    > Jena 2.x/3.x and any further version of Pellet is closed source
    being
    > integrated in Stardog triple store.
    >
    > Openllet does support Jena 4.x so in theory it should work. It
    would
    > be helpful to show your Java/Maven issues, otherwise it's a wild
    guess.
    >
    > The other option I suggested was to use the Jena rules instead of
    > SWRL. I understand though that if you want to stick to W3C
    standards
    > (technically SWRL isn't) this won't be an option for you.
    >
    > Note, the whole reasoner will happen mostly in-memory - don't
    expect
    > OWL DL reasoning to scale in a large dataset in a triple store-
    that's
    > why people designed profiles like OWL RL which can be easily
    mapped to
    > rule based inference and don't need a tableau algorithm or the 
like.

    >
    > Cheers,
    >
    > Lorenz
    >
    > On 22.02.22 18:36, Luca Turchet wrote:
    >> Dear list members,
    >> I am trying to integrate a reasoner in the Fuseki server. I
    first tried
    >> openllet but there are some technical issues with java and
    maven which
    >> currently prevent the openllet installation on a mac.
    >>
    >> So I tried to launch fuseki with the --conf option using one 
of the

    >> reasoners listed at the bottom of this page:
    >>
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
    >>
    >> in particular, I tried the examples
    >>
    >>     - config-inference-1.ttl
    >>
<https://github.com/apache/jena/blob/main/jena-fuseki2/examples/config-inference-1.ttl>
    >>     - config-inference-2.ttl
    >>
<https://github.com/apache/jena/blob/main/jena-fuseki2/examples/co

Re: Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann
Ok, I'm still on JDK 11 and this worked for me - so you could give it a 
try. If not then can't you set the compiler plugin to 13? Although I'm 
wondering why it failed with JDK 17 if you have it installed. Did you 
also set this as your current JDK?


mvn --version should have shown Java 17 then

But let's try with JDK 11 first, Fuseki distribution is currently also 
on Java 11


On 23.02.22 10:22, Luca Turchet wrote:
I have amended the file as you suggested (and even updated 
the maven-enforcer-plugin to 3.0.0), but the result is the same.


I attach the POM file.

However, I don't have JDK version 11 or 15 installed. I have 13, 14, 
16 and 17. Do I need to install JDK version 11?


Cheers

Luca

-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



On Wed, 23 Feb 2022 at 10:04, Lorenz Buehmann wrote:


I checked Openllet, it has been set to Java 17 - you can change it in
the POM file:

- set maven-compiler-plugin source and target entry to 11
- change maven-enforcer-plugin Java rule to [10,15)

Note, the fork is currently set to Jena 4.2.X, not sure if it will
have
conflicts when you use it with latest Jena

    On 23.02.22 08:02, Lorenz Buehmann wrote:
> Hi,
>
> follow up from your Stackoverflow thread, the Jena built-in
reasoners
> do not support SWRL rules - what exactly is supported by which OWL
> reasoner is documented here:
> https://jena.apache.org/documentation/inference/#owl
>
> None of them is a full OWL DL reasoner, that's only covered via
> Pellet. And Pellet does also support SWRL. I suggested to use a
Pellet
> fork like Openllet because the official Pellet reasoner is still on
> Jena 2.x/3.x and any further version of Pellet is closed source
being
> integrated in Stardog triple store.
>
> Openllet does support Jena 4.x so in theory it should work. It
would
> be helpful to show your Java/Maven issues, otherwise it's a wild
guess.
>
> The other option I suggested was to use the Jena rules instead of
> SWRL. I understand though that if you want to stick to W3C
standards
> (technically SWRL isn't) this won't be an option for you.
>
> Note, the whole reasoner will happen mostly in-memory - don't
expect
> OWL DL reasoning to scale in a large dataset in a triple store-
that's
> why people designed profiles like OWL RL which can be easily
mapped to
> rule based inference and don't need a tableau algorithm or the like.
>
> Cheers,
>
> Lorenz
>
> On 22.02.22 18:36, Luca Turchet wrote:
>> Dear list members,
>> I am trying to integrate a reasoner in the Fuseki server. I
first tried
>> openllet but there are some technical issues with java and
maven which
>> currently prevent the openllet installation on a mac.
>>
>> So I tried to launch fuseki with the --conf option using one of the
>> reasoners listed at the bottom of this page:
>>
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
>>
>> in particular, I tried the examples
>>
>>     - config-inference-1.ttl
>>

<https://github.com/apache/jena/blob/main/jena-fuseki2/examples/config-inference-1.ttl>
>>     - config-inference-2.ttl
>>

<https://github.com/apache/jena/blob/main/jena-fuseki2/examples/config-inference-2.ttl>
>>
>> which are provided in the documentation:
>> https://github.com/apache/jena/tree/main/jena-fuseki2/examples
>>
>> When performing the query which should return the result of an
>> inference I
>> don't get the expected result, like if the reasoner was not
integrated.
>> I am sure that the triplestore I am using is correct and
contains the
>> rule
>> as I have tested it in Protégé using the Snap SPARQL query tab
with the
>> Pellet reasoner activated.
>>
>> What am I doing wrong? I launch the server with
>> ./fuseki-server
>> --conf=/Users/luca/semanticweb/prova/config-inference-1.ttl
>>
>> Thanks in advance
>>
>> Best wishes
>>
>> Luca
>>

Re: Re: Integrating a reasoner in Fuseki

2022-02-23 Thread Lorenz Buehmann
I checked Openllet, it has been set to Java 17 - you can change it in 
the POM file:


- set maven-compiler-plugin source and target entry to 11
- change maven-enforcer-plugin Java rule to [10,15)

Note, the fork is currently set to Jena 4.2.X, not sure if it will have 
conflicts when you use it with latest Jena


On 23.02.22 08:02, Lorenz Buehmann wrote:

Hi,

follow up from your Stackoverflow thread, the Jena built-in reasoners 
do not support SWRL rules - what exactly is supported by which OWL 
reasoner is documented here: 
https://jena.apache.org/documentation/inference/#owl


None of them is a full OWL DL reasoner, that's only covered via 
Pellet. And Pellet does also support SWRL. I suggested to use a Pellet 
fork like Openllet because the official Pellet reasoner is still on 
Jena 2.x/3.x and any further version of Pellet is closed source being 
integrated in Stardog triple store.


Openllet does support Jena 4.x so in theory it should work. It would 
be helpful to show your Java/Maven issues, otherwise it's a wild guess.


The other option I suggested was to use the Jena rules instead of 
SWRL. I understand though that if you want to stick to W3C standards 
(technically SWRL isn't) this won't be an option for you.


Note, the whole reasoner will happen mostly in-memory - don't expect 
OWL DL reasoning to scale in a large dataset in a triple store- that's 
why people designed profiles like OWL RL which can be easily mapped to 
rule based inference and don't need a tableau algorithm or the like.


Cheers,

Lorenz

On 22.02.22 18:36, Luca Turchet wrote:

Dear list members,
I am trying to integrate a reasoner in the Fuseki server. I first tried
openllet but there are some technical issues with java and maven which
currently prevent the openllet installation on a mac.

So I tried to launch fuseki with the --conf option using one of the
reasoners listed at the bottom of this page:
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

in particular, I tried the examples

    - config-inference-1.ttl
<https://github.com/apache/jena/blob/main/jena-fuseki2/examples/config-inference-1.ttl>
    - config-inference-2.ttl
<https://github.com/apache/jena/blob/main/jena-fuseki2/examples/config-inference-2.ttl>

which are provided in the documentation:
https://github.com/apache/jena/tree/main/jena-fuseki2/examples

When performing the query which should return the result of an 
inference I

don't get the expected result, like if the reasoner was not integrated.
I am sure that the triplestore I am using is correct and contains the 
rule

as I have tested it in Protégé using the Snap SPARQL query tab with the
Pellet reasoner activated.

What am I doing wrong? I launch the server with
./fuseki-server 
--conf=/Users/luca/semanticweb/prova/config-inference-1.ttl


Thanks in advance

Best wishes

Luca

- 



*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* <https://www.cimil.disi.unitn.it/>

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Re: Re: Geo indexing Wikidata

2022-02-23 Thread Lorenz Buehmann
Thanks both for your very helpful input - I'm still a GeoSPARQL novice 
and trying to learn, and first of all just want to use the Jena 
implementation as efficiently as possible.


On 21.02.22 15:22, Andy Seaborne wrote:



On 21/02/2022 09:07, Lorenz Buehmann wrote:

Any experience or comments so far?


Using SubsystemLifecycle, could make the conversions by

    GeoSPARQLOperations.convertGeoPredicates

extensible.

    Andy

But having coordinate location (P625), located on astronomical body 
(P376) as properties of a thing, is dangerous because of monotonicity 
in RDF:


   SELECT * { ?x wdt:P625 ?coords }

the association of P625 and P376 is lost.


Yep, I could simply omit the extra-terrestrial entities for now when 
storing the GeoSPARQL-conformant triples in a separate graph - clearly, 
this would need the full Wikidata dump, as qualifiers are not contained 
in truthy.


As Marco pointed out there is ongoing discussion on Wikidata community: 
https://www.wikidata.org/wiki/Wikidata:Property_proposal/planetary_coordinates




What is the range of P625? It is not "earth geometry" any more.
What if there is no P376 on ?x?


Wikidata doesn't really have a concept of range, or let's say they do not 
make use of RDFS at all. They use "property constraints", and if I look 
at https://www.wikidata.org/wiki/Property:P625 they more or less define 
some kind of domain:

"not being human or company or railway" and some other weirder ones like 
"not being a female given name" etc. - I can't see any range, at least 
not in a structured data format, maybe only in some discussion.


Currently, I'd treat absence of P376 as "on Earth", but that's just my 
interpretation.




As with any n-ary-like relationship, the indirection keeps the related 
properties together.


This is not unique to geo. Temperatures with units for example



-

This brings me to another "issue" - or let's call it unexpected behavior 
which for me is counter-intuitive:


I used the geof:distance function, and according to the GeoSPARQL standard 
it is defined as:

"Returns the shortest distance in units between any two Points in the 
two geometric objects as calculated in the spatial reference system of geom1."

So I'd expect a metric based on the CRS in use, and if absent it should 
be CRS84. But according to the source code, Jena implements just the 
Euclidean distance - is this intended? Here is an example of a few 
cities in Germany with their pairwise distances as well as the Haversine 
distance:



PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX uom: <http://www.opengis.net/def/uom/OGC/1.0/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX spatialF: <http://jena.apache.org/function/spatial#>
PREFIX afn: <http://jena.apache.org/ARQ/function#>

SELECT ?s ?o ?d1 ?d2 ?diff_d1_d2 ?d_hav ?diff_eucl_hav {
  VALUES ?s {wd:Q1709 wd:Q64 wd:Q1729 wd:Q1718 wd:Q1726}
  VALUES ?o {wd:Q1709 wd:Q64 wd:Q1729 wd:Q1718 wd:Q1726}
  ?s wdt:P625 ?wkt1 .
  ?o wdt:P625 ?wkt2 .
  FILTER(?s != ?o && str(?s) < str(?o))
  BIND(geof:distance(?wkt1, ?wkt2, uom:kilometer) as ?d1)
  BIND(geof:distance(?wkt2, ?wkt1, uom:kilometer) as ?d2)
  BIND(abs(?d1 - ?d2) as ?diff_d1_d2)
  BIND(spatialF:distance(?wkt1, ?wkt2, uom:kilometer) as ?d_hav)
  BIND(afn:max(abs(?d1 - ?d_hav), abs(?d2 - ?d_hav)) as ?diff_eucl_hav)
}

with result


+----------+----------+--------------+--------------+------------------+--------------+------------------+
|    s     |    o     |      d1      |      d2      |    diff_d1_d2    |    d_hav     |  diff_eucl_hav   |
+----------+----------+--------------+--------------+------------------+--------------+------------------+
| wd:Q1709 | wd:Q64   | 149.280218e0 | 153.202637e0 | 3.92241900019e0  | 180.75785e0  | 31.477632e0      |
| wd:Q1709 | wd:Q1729 | 177.123944e0 | 188.077111e0 | 10.9531678e0     | 296.42569e0  | 119.3017459998e0 |
| wd:Q1709 | wd:Q1718 | 345.13558e0  | 364.477344e0 | 19.34176400012e0 | 412.752229e0 | 67.616649e0      |
| wd:Q1709 | wd:Q1726 | 362.915021e0 | 408.448278e0 | 45.533256e0      | 611.210126e0 | 248.2951049992e0 |
| wd:Q1729 | wd:Q64   | 197.116217e0 | 190.514338e0 | 6.6018787e0      | 235.639289e0 | 45.1249509998e0  |
| wd:Q1718 | wd:Q64   | 469.456614e0 | 456.224537e0 | 13.2320774e0     | 475.626349e0 | 19.4018127e0     |
| wd:Q1718 | wd:Q1729 | 297.248804e0 | 298.880777e0 | 1.631973000163e0 | 298.493316e0 | 1.24451199986e0  |
| wd:Q1718 | wd:Q1726 | 398.21636e0  | 424.395158e0 | 26.17879799972e0 | 487.365165e0 | 89.1488049998e0  |
| wd:Q1726 | wd:Q64   | 351.968792e0 | 320.94899e0  | 31.01980200027

Re: Integrating a reasoner in Fuseki

2022-02-22 Thread Lorenz Buehmann

Hi,

follow up from your Stackoverflow thread, the Jena built-in reasoners do 
not support SWRL rules - what exactly is supported by which OWL reasoner 
is documented here: https://jena.apache.org/documentation/inference/#owl


None of them is a full OWL DL reasoner; that's only covered by Pellet. 
And Pellet also supports SWRL. I suggested using a Pellet fork like 
Openllet because the official Pellet reasoner is still on Jena 2.x/3.x, 
and any further version of Pellet is closed source, being integrated into 
the Stardog triple store.


Openllet does support Jena 4.x so in theory it should work. It would be 
helpful to show your Java/Maven issues, otherwise it's a wild guess.


The other option I suggested was to use the Jena rules instead of SWRL. 
I understand though that if you want to stick to W3C standards 
(technically SWRL isn't) this won't be an option for you.


Note, the whole reasoning will happen mostly in-memory - don't expect 
OWL DL reasoning to scale to a large dataset in a triple store - that's 
why people designed profiles like OWL RL, which can easily be mapped to 
rule-based inference and don't need a tableau algorithm or the like.


Cheers,

Lorenz

On 22.02.22 18:36, Luca Turchet wrote:

Dear list members,
I am trying to integrate a reasoner in the Fuseki server. I first tried
openllet but there are some technical issues with java and maven which
currently prevent the openllet installation on a mac.

So I tried to launch fuseki with the --conf option using one of the
reasoners listed at the bottom of this page:
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

in particular, I tried the examples

- config-inference-1.ttl


- config-inference-2.ttl



which are provided in the documentation:
https://github.com/apache/jena/tree/main/jena-fuseki2/examples

When performing the query which should return the result of an inference I
don't get the expected result, like if the reasoner was not integrated.
I am sure that the triplestore I am using is correct and contains the rule
as I have tested it in Protégé using the Snap SPARQL query tab with the
Pellet reasoner activated.

What am I doing wrong? I launch the server with
./fuseki-server --conf=/Users/luca/semanticweb/prova/config-inference-1.ttl

Thanks in advance

Best wishes

Luca

-

*Luca Turchet*
Associate Professor
Head of the Creative, Intelligent & Multisensory Interactions Laboratory
*https://www.cimil.disi.unitn.it/* 

Department of Information Engineering and Computer Science
University of Trento
Via Sommarive 9 - 38123 Trento - Italy

E-mail: luca.turc...@unitn.it
Tel: +39 0461 283792



Geo indexing Wikidata

2022-02-21 Thread Lorenz Buehmann

Hi,

we can use this as a complementary thread for the ongoing "loading 
Wikidata" threads, this time with a focus on the geospatial part.


Joachim already did the same for the text index and it works as 
expected, though loading time could still be improved.



For the geospatial index things are different, summary and current state 
here:


- Wikidata stores the coordinates in wdt:P625 property

- the literal values are of type geo:wktLiteral

- so far so good? Well, not really ... Jena's geospatial components 
expect to have the data either


 a) following the GeoSPARQL standard, i.e. having a geometry object 
with a serialization linking to the WKT literal


 b) having the data as WGS lat/lon literals and doing the conversion 
before indexing


- apparently neither a) nor b) holds for Wikidata, as the WKT literal is 
simply attached directly to an entity via the wdt:P625 property, so we do 
not have it in a form like


wd:Q3150 geo:hasDefaultGeometry [ geo:asWKT "Point(11.58638 50.92722)"^^geo:wktLiteral ] .

nor do we have it as

wd:Q3150 wgs:lon "11.58638"^^xsd:double ;
         wgs:lat "50.92722"^^xsd:double .

all we have is

wd:Q3150 wdt:P625 "Point(11.58638 50.92722)"^^geo:wktLiteral .

So what does this mean? Well, you'll see the following output when 
starting Fuseki with the GeoSPARQL assembler:



./fuseki-server --conf ~/fuseki-wikidata-geosparql-assembler.ttl
09:20:46 WARN  system  :: The “SIS_DATA” environment variable 
is not set.
09:20:46 INFO  Server  :: Apache Jena Fuseki 4.5.0-SNAPSHOT 
2022-02-17T09:59:26Z
09:20:46 INFO  Config  :: 
FUSEKI_HOME=/home/user/apache-jena-fuseki-4.5.0-SNAPSHOT/.
09:20:46 INFO  Config  :: 
FUSEKI_BASE=/home/user/apache-jena-fuseki-4.5.0-SNAPSHOT/run
09:20:46 INFO  Config  :: Shiro file: 
file:///home/user/apache-jena-fuseki-4.5.0-SNAPSHOT/run/shiro.ini

09:20:47 INFO  GeoSPARQLOperations :: Find Mode SRS - Started
09:20:47 INFO  GeoSPARQLOperations :: Find Mode SRS - Completed
09:20:47 WARN  GeoAssembler    :: No SRS found. Check 
'http://www.opengis.net/ont/geosparql#hasSerialization' or 
'http://www.w3.org/2003/01/geo/wgs84_pos#lat'/'http://www.w3.org/2003/01/geo/wgs84_pos#lon' 
predicates are present in the source data. Hint: Inferencing with 
GeoSPARQL schema may be required.

09:20:47 INFO  Server  :: Path = /ds
09:20:47 INFO  Server  :: System
09:20:47 INFO  Server  ::   Memory: 4.0 GiB
09:20:47 INFO  Server  ::   Java:   11.0.11
09:20:47 INFO  Server  ::   OS: Linux 5.4.0-90-generic amd64
09:20:47 INFO  Server  ::   PID:    1866352
09:20:47 INFO  Server  :: Started 2022/02/19 09:20:47 CET on 
port 3030
So technically nothing happens, because the data is not in one of the 
expected formats.


So what could be a workaround here? Clearly, we could execute a SPARQL 
Update statement to add the expected triples. There are ~9 million 
wdt:P625 triples, which means 18 million additional triples (2 triples 
per entity), or 36 million triples if we want to avoid the need for 
inference (2 more triples per entity to add geo:Feature and 
geo:Geometry). So for each entity we add triples like


wd:Q3150 a geo:Feature ;
         geo:hasDefaultGeometry [ a geo:Geometry ;
                                  geo:asWKT "Point(11.58638 50.92722)"^^geo:wktLiteral ] .
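
A single SPARQL Update along these lines could materialise that shape for
all entities (just a sketch - the FILTER that skips non-literal values
anticipates the data quality issues listed below, and non-terrestrial
coordinates would still need extra handling):

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>

INSERT {
  ?s a geo:Feature ;
     geo:hasDefaultGeometry [ a geo:Geometry ;
                              geo:asWKT ?wkt ] .
}
WHERE {
  ?s wdt:P625 ?wkt .
  FILTER(isLiteral(?wkt) && datatype(?wkt) = geo:wktLiteral)
}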


If we do not care about the dataset size, we could say that's not a big 
deal, I guess. But for querying it matters, as we have to consider this 
different, non-Wikidata format. Indeed, we don't have to write

?subj geo:hasDefaultGeometry ?subjGeom . ?subjGeom geo:asWKT ?subjLit .
?obj  geo:hasDefaultGeometry ?objGeom  . ?objGeom  geo:asWKT ?objLit .
FILTER(geof:sfContains(?subjLit, ?objLit))

because when query rewriting is enabled, this is fine for the topological 
functions, as we can simply write

?subj geo:sfContains ?obj .


But for queries using non-topological functions like distance it 
matters, i.e. we either have to walk the full path from entity to literal, 
or we just take the literal from the original wdt:P625 triple, which then 
would be fine. Here I noticed that no spatial index is used for 
distance functions.


So far so good. Now we come to the data quality, which I didn't check in 
the first step. Some observations I made that have to be considered:


- there are some non-literal wdt:P625 triples; those should be omitted 
in the SPARQL Update statement
- some wdt:P625 triples, or rather their WKT literal values, refer to 
coordinates on other planets, like Mars. The problem here 
is that Wikidata decided to use the corresponding Wikidata planet entity 
URI as the CRS inside the WKT literal. Clearly Jena can't parse those, as 
this is misleading. There are some CRS URIs for planets, but those 
haven't been used for the non-terrestrial geo literals. For example


wd:Q2267142 wdt:P31 wd:Q1439394 ;
            wdt:P376 wd:Q3303 ;
            wdt:P2824 "2727" ;

Re: Text indexing Wikidata

2022-02-19 Thread Lorenz Buehmann

Hi,

so far you can't do anything else - the whole indexing pipeline is 
single-threaded as far as I know. It simply iterates over all properties 
declared to be used for fetching the RDF triple values - Lucene indexing 
itself would be thread-safe, so the easiest thing would be to apply one 
writer thread per property. This clearly would not help here, where you 
just set rdfs:label as the only property. Thus, we would also have to split 
the dataset somehow for the given property and then would be able to 
distribute each split to a separate writer thread.


The main loop is here and makes it rather easy to understand where we 
could introduce parallelism: 
https://github.com/apache/jena/blob/main/jena-text/src/main/java/org/apache/jena/query/text/cmd/textindexer.java#L125-L143


Multiple reads from a dataset are trivial, we just have to get appropriate 
splits - not sure how easy this is, maybe a cursor/iterator on the 
subjects with different offsets or something?


@Andy what do you think?

On 18.02.22 09:59, Neubert, Joachim wrote:

Text indexing the truthy Wikidata dump took 13:10 h for 1.5b labels (in parts 
using text:LowerCaseKeywordAnalyzer) on the massive parallel machine.

I observed a CPU usage of 100-250 %, and wonder if I could do something to 
speed up. My command line simply was

java -cp /opt/fuseki/fuseki-server.jar jena.textindexer --debug 
--desc=/tmp/temp.ttl

(apache-jena-fuseki-4.5.0-SNAPSHOT)

Cheers, Joachim

--
Joachim Neubert

ZBW - Leibniz Information Centre for Economics
Neuer Jungfernstieg 21
20354 Hamburg
Phone +49-40-42834-462




Re: How to resolve a transaction error

2022-02-14 Thread Lorenz Buehmann

Hi

On 14.02.22 09:26, Erik Bijsterbosch wrote:

Hi,

I want to resolve the transaction error I mentioned  before in an earlier
post/conversation.
This question was cluttered too much with context to get noticed, I guess.
So here's a new attempt...

After starting a (4.4.0 docker) fuseki server or a fuseki geosparql server
with inference enabled on my large dataset I get the following error
message:

fuseki_1| Write transaction with no commit() or abort() before
end() - forced abort

Inference seemed to work earlier on this dataset with my
previous implementation and I assume now this is data related.
What can I do to debug this?


What was your previous implementation? What did you change?

Do we already know if it works without Docker?



Regards,
Erik



Re: Re: Configure fuseki-server with geosparql assembler

2022-02-11 Thread Lorenz Buehmann
I'm pretty sure the inference will be computed in-memory, so maybe it's 
a memory issue?


On the other hand, do you really need the inferences computed for the 
GeoSPARQL schema? If not, it's possible to disable it in the assembler 
file.
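
For example, something like this in the GeoSPARQL dataset description of
the assembler should switch it off (a sketch from memory - please
double-check the exact property names against the GeoSPARQL assembler
documentation for your Jena version):

PREFIX geosparql: <http://jena.apache.org/geosparql#>

:geo_ds geosparql:dataset      :base_dataset ;
        geosparql:inference    false ;   # do not apply the GeoSPARQL schema inferencing
        geosparql:queryRewrite true .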


On 11.02.22 10:08, Erik Bijsterbosch wrote:

Hi Adrian,

Thanks.
I noticed this repo yesterday.
I will check it out later.

For now I really want to know what's happening in my implementation,
which works fine on my local machine with a small dataset.
Some more tests show that fuseki only aborts while *inferencing *my 256
million triple dataset.

Maybe someone can provide me with a way to debug this...

Regards,
Erik









On Fri, 11 Feb 2022 at 06:29, Adrian Gschwend wrote:


On 10.02.22 22:18, Erik Bijsterbosch wrote:

Hi Erik,


I pursued my attempt to set up a dockerised fuseki-server and
fuseki-geosparql combi application.
I created one image for both services which I can start with
docker-compose arguments.

You might want to try the work my colleague Ludovic did:

https://github.com/zazuko/fuseki-geosparql

regards

Adrian



Re: Request for adding log4j 2.17.1 to Fuseki Jena

2022-01-03 Thread Lorenz Buehmann

That has already been addressed and will be provided with Jena 4.4.0:

https://issues.apache.org/jira/browse/JENA-2233?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel

I doubt there will be another minor version 4.3.3, Andy?

You could build the Docker image from sources, just check out the latest 
code. Indeed, it's still a SNAPSHOT version, but you could also make 
your own version out of it if you have no time to wait for 4.4.0.


On 03.01.22 10:42, Erik Bijsterbosch wrote:

Hi there,

I ran a docker scan on a Fuseki Jena 4.3.2 image which I built with the
latest version:
https://repo1.maven.org/maven2/org/apache/jena/jena-fuseki-server/4.3.2/

This image still contains log4j vulnerabilities from version 2.16.0.
These are supposed to be fixed in version 2.17.1
I also had to upgrade versions in the Dockerfile for openjdk and alpine to
get rid of more vulnerabilities:

ARG OPENJDK_VERSION=17
ARG ALPINE_VERSION=3.15.0

1) Is there a way to set the log4j version yourself?

2) Can log4j version 2.17.1 be implemented in Fuseki Jena 4.3.3?

Regards,
Erik

scan.log
  - - - - - -

Testing docker.io/library/fuskeki-local...

Tested 58 dependencies for known issues, found 3 issues.


Issues with no direct upgrade or patch:
   ✗ Denial of Service (DoS) [Medium Severity][
https://snyk.io/vuln/SNYK-JAVA-COMFASTERXMLJACKSONCORE-2326698] in
com.fasterxml.jackson.core:jackson-databind@2.13.0
 introduced by org.apache.jena:jena-fuseki-server@4.3.2 >
com.fasterxml.jackson.core:jackson-databind@2.13.0
   This issue was fixed in versions: 2.13.1, 2.12.6
   ✗ Denial of Service (DoS) [High Severity][
https://snyk.io/vuln/SNYK-JAVA-ORGAPACHELOGGINGLOG4J-2321524] in
org.apache.logging.log4j:log4j-core@2.16.0
 introduced by org.apache.jena:jena-fuseki-server@4.3.2 >
org.apache.logging.log4j:log4j-core@2.16.0
   This issue was fixed in versions: 2.3.1, 2.12.3, 2.17.0
   ✗ Arbitrary Code Execution [Medium Severity][
https://snyk.io/vuln/SNYK-JAVA-ORGAPACHELOGGINGLOG4J-2327339] in
org.apache.logging.log4j:log4j-core@2.16.0
 introduced by org.apache.jena:jena-fuseki-server@4.3.2 >
org.apache.logging.log4j:log4j-core@2.16.0
   This issue was fixed in versions: 2.3.2, 2.12.4, 2.17.1



Re: K-fold validation on TDB

2021-12-16 Thread Lorenz Buehmann
There is nothing TDB-specific here I think - the same would hold for any 
database, whether it holds the full data or not.


Not sure if I understand what you're doing, nor do I understand how you 
generated the folds on a graph, but RDF datasets can manage so-called 
named graphs, and so does Jena.


So my question is, why can't you split the dataset into n graphs and 
store them all in the TDB? I mean, just keep the graphs stored. You then 
only have to select the n-1 graphs for training and the remaining 
graph for validation, or not? We also don't know how you assign weights 
to the RDF graph, but I'm pretty sure in RDF this has to be done via 
some kind of property attached to each node. You can add and delete 
those triples each time via SPARQL 1.1 Update statements, or via Jena 
API methods of course.
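
For example, with hypothetical graph names <urn:fold:1> ... <urn:fold:5>,
a training run over four folds is just a matter of listing them as FROM
graphs and leaving the validation fold out:

SELECT ?s ?p ?o
FROM <urn:fold:1>
FROM <urn:fold:2>
FROM <urn:fold:3>
FROM <urn:fold:4>
WHERE { ?s ?p ?o }

The next iteration uses a different combination of four graphs; nothing
has to be deleted or reloaded.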


Long story short, it would be helpful to explain what exactly you're 
doing and, even better, show the current source code and/or queries.


On 15.12.21 14:18, emri mbiemri wrote:

Hello all,

I am interested to conduct a k-fold validation for an algorithm that uses
TDB as its database. The stored graph is weighted based on some criteria.
The point is that when performing k-fold cross validation I have for each
iteration (k times) to create the TDB repo, to load the training models, to
weight the graph, calculate the Precision of the algorithm with the
remaining test models, delete the complete graph again, and so it iterates
for each step.

My question is if I have to completely delete for each time all the files
and create a new dataset for each iteration? Or, is there maybe any other
more appropriate way to perform k-fold cross-validation with a TDB?

Thanks.



Re: Re: Error initializing geosparql

2021-12-11 Thread Lorenz Buehmann
It's on the way with GeoSPARQL 1.1, isn't it? At least there are tickets 
related to it, e.g. [1] and many functions will be stated to work on 3D 
as well [2]



I personally think we should go beyond GeoSPARQL soon with Jena to provide
users with more advanced features. Possibly flag it as geosparql++ or the
like.
You mean because there was already this GeoSPARQL+ thing from Steffen 
Staab's group with support for rasterized data? It's a shame that such 
stuff never makes it into the main public projects it is based 
on. What a waste of resources and time (from my point of view).


[1] https://github.com/opengeospatial/ogc-geosparql/issues/238
[2] 
https://opengeospatial.github.io/ogc-geosparql/geosparql11/spec.html#_b_1_functions_summary_table


On 11.12.21 17:39, Marco Neumann wrote:

That's correct Jean-Marc, no comma.

And yes the OGC GeoSPARQL spec is not supporting 3D access methods. And if
you record a third dimension, which is of course possible, it will be
ignored in Jena. Unfortunately the entire record will be. We could record
this as a bug but it's really not supported at the moment by the spec. Many
of the spatial functions in the OGC GeoSPARQL spec operate with a 2D
reference system.

I personally think we should go beyond GeoSPARQL soon with Jena to provide
users with more advanced features. Possibly flag it as geosparql++ or the
like.

Best,
Marco




On Sun, Dec 5, 2021 at 4:15 PM Jean-Marc Vanel 
wrote:


I fixed the WKT not having the right datatype, as said before; here are the
SPARQL used to check and fix:
COUNT-spatial-wkt-as-string.rq
<
https://github.com/jmvanel/semantic_forms/blob/master/sparql/COUNT-spatial-wkt-as-string.rq
FIX-spatial-wkt-as-string.upd.rq
<
https://github.com/jmvanel/semantic_forms/blob/master/sparql/FIX-spatial-wkt-as-string.upd.rq
Now this is not the end of the road . Another imperfect data causing
geosparql initialization to fail :

*Exception: Build WKT Geometry Exception - Type: point, Coordinates:
(2.353821,48.83399,0). Index 1 out of bounds for length 1*
2021-12-05T15:48:54.166Z [application-akka.actor.default-dispatcher-5]
ERROR jena - Exception class:class
org.apache.jena.datatypes.DatatypeFormatException
2021-12-05T15:48:54.167Z [application-akka.actor.default-dispatcher-5]
ERROR jena - Exception
org.apache.jena.datatypes.DatatypeFormatException: Build WKT Geometry
Exception - Type: point, Coordinates: (2.353821,48.83399,0). Index 1 out of
bounds for length 1

org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.buildGeometry(WKTReader.java:141)

org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.(WKTReader.java:50)

org.apache.jena.geosparql.implementation.parsers.wkt.WKTReader.extract(WKTReader.java:292)


org.apache.jena.geosparql.implementation.datatype.WKTDatatype.read(WKTDatatype.java:89)

org.apache.jena.geosparql.implementation.index.GeometryLiteralIndex.retrieveMemoryIndex(GeometryLiteralIndex.java:69)

org.apache.jena.geosparql.implementation.index.GeometryLiteralIndex.retrieve(GeometryLiteralIndex.java:51)

org.apache.jena.geosparql.implementation.datatype.GeometryDatatype.parse(GeometryDatatype.java:57)

org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1176)


org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1137)

org.apache.jena.geosparql.implementation.GeometryWrapper.extract(GeometryWrapper.java:1147)
org.apache.jena.geosparql.configuration.ModeSRS.search(ModeSRS.java:61)

org.apache.jena.geosparql.configuration.GeoSPARQLOperations.findModeSRS(GeoSPARQLOperations.java:520)

Is it because the WKT separator should be a space instead of a comma, or
because 3D is not allowed ?

Jean-Marc Vanel
<
http://semantic-forms.cc:9112/display?displayuri=http://jmvanel.free.fr/jmv.rdf%23me
+33
(0)6 89 16 29 52


On Sun, 5 Dec 2021 at 13:03, Jean-Marc Vanel wrote:


After looking at this code, failing line in bold:



jena-geosparql/src/main/java/org/apache/jena/geosparql/configuration/ModeSRS.java



https://github.com/apache/jena/blob/main/jena-geosparql/src/main/java/org/apache/jena/geosparql/configuration/ModeSRS.java

        ExtendedIterator nodeIter =
                model.listObjectsOfProperty(Geo.HAS_SERIALIZATION_PROP);
        boolean isGeometryLiteralsFound = nodeIter.hasNext();
        if (!isGeometryLiteralsFound) {
            NodeIterator wktNodeIter = model.listObjectsOfProperty(Geo.AS_WKT_PROP);
            NodeIterator gmlNodeIter = model.listObjectsOfProperty(Geo.AS_GML_PROP);
            nodeIter = wktNodeIter.andThen(gmlNodeIter);
        }

        while (nodeIter.hasNext()) {
            RDFNode node = nodeIter.next();
            if (node.isLiteral()) {
                *GeometryWrapper geometryWrapper =
                        GeometryWrapper.extract(node.asLiteral());*

I did SELECT queries to try to understand what is wrong.
It appears that these triples are not present:
?S  ?O .
?S 
