Re: Ever-increasing memory usage in Fuseki

2023-11-04 Thread Dave Reynolds

Hi Hugo,

A data size of 20Mt is not big and shouldn't need anything like 30G of 
heap, let alone an 18G initial heap (-Xms). As others have said, you need 
a much bigger container limit if you're going to pack both of those plus 
the JVM itself plus direct buffers in.


For fuseki in kubernetes at that scale we would typically run with a 
heap more like 2G and a container limit of 4G (assuming TDB1), and no 
need to increase the stack defaults. For larger sizes in the 100-200Mt 
region, 3-4G of heap and an 8G container limit typically works for us. 
We only get up to needing machine sizes in the 30G range for 600Mt and 
very high request rates (millions a day), and even then we keep the heap 
relatively small (~12G) to leave space for the OS caching. For TDB1 the 
heap size, once there's enough to work in, is more critical for updates 
than for query, and at 20Mt your updates can't be that big.
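
For illustration, that shape of deployment looks something like the 
sketch below. It assumes the stock fuseki-server script (which picks up 
JVM_ARGS from the environment); the dataset path and the limits are 
placeholders, not tuned recommendations:

  # ~20Mt TDB1 store: modest heap, container limit roughly 2x the heap
  export JVM_ARGS="-Xmx2G"
  fuseki-server --loc=/fuseki/databases/ds /ds
  # ...with the kubernetes pod memory limit set to ~4G, leaving the other
  # ~2G for the JVM itself, Jetty's direct buffers and OS file caching.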


The progressive jetty memory leak mentioned in the threads Martynas 
referenced was solved by switching off Jetty's use of direct memory, 
which you can do with configuration files in your container. I don't 
think this is the cause of your problems in any case unless you are 
dealing with a very high request rate.
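
The shape of such a configuration is sketched below. Treat it as an 
untested sketch: the XML follows the jetty-bytebufferpool.xml example 
linked later in this digest, and the exact constructor arguments (and 
which values actually switch off direct-buffer pooling) are assumptions 
to verify against the Jetty 10 javadoc for ArrayByteBufferPool.

  # point stock fuseki at a custom Jetty server configuration
  fuseki-server --jetty-config=jetty-config.xml --loc=DB /ds

  # jetty-config.xml:
  <?xml version="1.0"?>
  <!DOCTYPE Configure PUBLIC "-//Jetty//Configure//EN"
    "https://www.eclipse.org/jetty/configure_10_0.dtd">
  <Configure id="Server" class="org.eclipse.jetty.server.Server">
    <Arg name="bufferPool">
      <New class="org.eclipse.jetty.io.ArrayByteBufferPool">
        <Arg type="int">0</Arg>   <!-- minCapacity -->
        <Arg type="int">-1</Arg>  <!-- factor: -1 = default -->
        <Arg type="int">-1</Arg>  <!-- maxCapacity: -1 = default -->
        <Arg type="int">0</Arg>   <!-- maxQueueLength: 0 = no buffer reuse -->
        <Arg type="long">0</Arg>  <!-- maxHeapMemory -->
        <Arg type="long">0</Arg>  <!-- maxDirectMemory -->
      </New>
    </Arg>
  </Configure>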


Hope this helps.
Dave

On 03/11/2023 10:08, Hugo Mills wrote:

Thanks to all who replied. We're trying out all the recommendations (slowly), 
and will update when we've got something to report.

Hugo.

Dr. Hugo Mills
Senior Data Scientist
hugo.mi...@agrimetrics.co.uk






-----Original Message-----
From: Andy Seaborne 
Sent: Thursday, November 2, 2023 8:40 AM
To: users@jena.apache.org
Subject: Re: Ever-increasing memory usage in Fuseki


Hi Hugo,

On 01/11/2023 19:43, Hugo Mills wrote:

Hi,

We've got an application we've inherited recently which uses a Fuseki
database. It was originally Fuseki 3.4.0, and has been upgraded to
4.9.0 recently. The 3.4.0 server needed regular restarts (once a day)
in order to keep working; the 4.9.0 server is even more unreliable,
and has been running out of memory and being OOM-killed multiple times
a day. This afternoon, it crashed enough times, fast enough, to make
Kubernetes go into a back-off loop, and brought the app down for some time.

We're using OpenJDK 19. The JVM options are: "-Xmx30g -Xms18g", and
the container we're running it in has a memory limit of 31 GiB.


Setting Xmx close to the container limit can cause problems.

The JVM itself takes space and the operating system needs space.
The JVM also has ~1G of extra space for direct memory, which networking uses.

The Java heap will almost certainly grow to reach Xmx at some point because 
Java delays running full garbage collections. The occasional drops you see are 
likely incremental garbage collections happening.

If Xmx is very close to the container limit, the heap will naturally grow 
(it does not know about the container limit) until the total in-use memory 
for the machine is reached and the container is killed.
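
As a rough, illustrative budget for the settings quoted above (the 
non-heap figures are typical orders of magnitude, not measurements):

  30.0G   Java heap (-Xmx), which will eventually be fully used
  ~1.0G   direct memory for networking buffers
  ~0.5G   metaspace, code cache, thread stacks, GC bookkeeping
  ------
  31.5G+  total JVM footprint against a 31 GiB container limit -> OOM-kill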

A 30G heap inside a 31G container limit is a very tight setting. Is there 
anything customized running in Fuseki? Is the server dedicated to Fuseki?

As Conal mentioned, TDB uses memory-mapped files - these are not part of the 
heap. They are part of the OS virtual memory.

Is this a single database?
One TDB database needs about 4G of heap space. Try a setting of -Xmx4G.

Only if you have a high proportion of very large literals will that setting not 
work.

More is not better from TDB's point of view. Space for memory-mapped files is 
handled elsewhere, and that space will expand and contract as needed. If 
that space is squeezed out the system will slow down.


We tried the
"-XX:+UseSerialGC" option this evening, but it didn't seem to help
much. We see the RAM usage of the java process rising steadily as
queries are made, with occasional small, but insufficient, drops.



The store is somewhere around 20M triples in size.


Is this a TDB database or in-memory? (I'm guessing TDB but could you confirm 
that.)

Query processing can lead to a lot of memory use if the queries are inefficient 
and there is a high, overlapping query load.

What is the query load on the server? Are there many overlapping requests?


Could anyone suggest any tweaks or options we could do to make this
more stable, and not leak memory? We've downgraded to 3.4.0 again, and
it's not running out of space every few minutes at least, but it still
has an ever-growing memory usage.

Thanks,

Hugo.

*Dr. Hugo Mills*

Senior Data Scientist

hugo.mi...@agrimetrics.co.uk 


Re: Java 11 vs Java 17

2023-08-30 Thread Dave Reynolds




On 29/08/2023 12:26, Andy Seaborne wrote:



On 29/08/2023 08:46, Dave Reynolds wrote:

Hi Andy,

On 27/08/2023 10:36, Andy Seaborne wrote:


On 25/08/2023 15:18, Dave Reynolds wrote: [1]
 > We've been testing some of our troublesome queries on 4.9.0 on java
 > 11 vs java 17 and see a 10-15% performance hit on java 17 (even after
 > we take control of the GC by forcing both to use the old parallel GC
 > instead of G1). No idea why, seems wrong! Makes us inclined to stick
 > with java 11 and thus jena 4.x series as long as we can.

Dave,

Is this 4.9.0 specific or across multiple Jena versions?


Seems to be multiple versions (at least 4.8.0 and 4.9.0), but not 
tested exhaustively.



Is G1 worse than the old parallel GC on Java17?


It is definitely worse on Java11 for a particular narrow type of query 
that is an issue for us. Believe the same is true on Java17 but 
haven't collected definitive data on this.


It may be possible to tune G1 to better match our particular test case 
but the testing and tuning is time consuming and the parallel GC does 
the trick.
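
(For reference, the switch in question is the standard JVM flag 
-XX:+UseParallelGC. Wired through the stock fuseki-server script's 
JVM_ARGS hook it would look something like the sketch below; the heap 
size and dataset path are placeholders:)

  JVM_ARGS="-Xmx2G -XX:+UseParallelGC" fuseki-server --loc=DB /ds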


Our aim was to replace a system running on 3.x era fuseki with a 4.x 
era one without significant loss of performance. Out of the box there was 
a 20% hit. Switching GC reduced much of that; switching to java11 
instead of 17 brought us basically to parity - for this special case. 
This is a case where legitimate queries get close to the timeout 
threshold we run at, so a 20% performance drop is particularly visible 
in having currently working queries time out on a newer version.


The query itself is trivial - return large numbers of resources 
(10k-1m) found by a simple lucene query along with a few (~15) 
properties of each. Performance in this case seems to be dominated by 
the time to render the large results stream rather than lucene or TDB 
query performance. So it makes some sense that in this specific case a 
GC tuned for throughput rather than pause time would help.


Which result format is this? JSON? XML?


XML. Also tested JSON which is around 10% slower.

Dave





No suggestion that our case is representative of any broader pattern.

Dave


     Andy


Re: Java 11 vs Java 17

2023-08-29 Thread Dave Reynolds

Hi Andy,

On 27/08/2023 10:36, Andy Seaborne wrote:


On 25/08/2023 15:18, Dave Reynolds wrote: [1]
 > We've been testing some of our troublesome queries on 4.9.0 on java
 > 11 vs java 17 and see a 10-15% performance hit on java 17 (even after
 > we take control of the GC by forcing both to use the old parallel GC
 > instead of G1). No idea why, seems wrong! Makes us inclined to stick
 > with java 11 and thus jena 4.x series as long as we can.

Dave,

Is this 4.9.0 specific or across multiple Jena versions?


Seems to be multiple versions (at least 4.8.0 and 4.9.0), but not tested 
exhaustively.



Is G1 worse than the old parallel GC on Java17?


It is definitely worse on Java11 for a particular narrow type of query 
that is an issue for us. Believe the same is true on Java17 but haven't 
collected definitive data on this.


It may be possible to tune G1 to better match our particular test case 
but the testing and tuning is time consuming and the parallel GC does 
the trick.


Our aim was to replace a system running on 3.x era fuseki with a 4.x era 
one without significant loss of performance. Out of the box there was a 20% 
hit. Switching GC reduced much of that; switching to java11 instead of 
17 brought us basically to parity - for this special case. This is a 
case where legitimate queries get close to the timeout threshold we run 
at, so a 20% performance drop is particularly visible in having 
currently working queries time out on a newer version.


The query itself is trivial - return large numbers of resources (10k-1m) 
found by a simple lucene query along with a few (~15) properties of 
each. Performance in this case seems to be dominated by the time to 
render the large results stream rather than lucene or TDB query 
performance. So it makes some sense that in this specific case a GC 
tuned for throughput rather than pause time would help.


No suggestion that our case is representative of any broader pattern.

Dave


Re: Mystery memory leak in fuseki

2023-08-25 Thread Dave Reynolds

On 25/08/2023 11:44, Andy Seaborne wrote:



On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


 From the threads here, it does seem to be Jetty related.


Yes.

We've followed up on Rob's suggestions for tuning the jetty settings so 
we can use a stock fuseki. On 4.9.0, if we switch off direct buffer use 
in jetty altogether the problem does seem to go away. The performance 
hit we see is small and barely above noise.


We currently have a soak test of leaving direct buffers on but limiting 
the max and retained levels; that looks promising but it's too early to be sure.


I haven't managed to reproduce the situation on my machine in any sort 
of predictable way where I can look at what's going on.


Understood. While we can reproduce some effects in desktop test set-ups, 
the only real test has been to leave configurations running for days at 
a time in the real dev setting with all its monitoring and 
instrumentation. Which makes testing any changes very painful, let alone 
deeper investigations.


For Jena5, there will be a switch to a Jetty that uses jakarta.* 
packages. That's no more than a rename of imports. The migration 
EE8->EE9 is only repackaging. That's Jetty10->Jetty11.


There is now Jetty12. It is a major re-architecture of Jetty, including 
its network handling, for better HTTP/2 and HTTP/3.


If there has been some behaviour of Jetty involved in the memory growth, 
it is quite unlikely to be carried over to Jetty12.


Jetty12 is not a simple switch of artifacts for Fuseki. APIs have 
changed, but it's a step that's going to be needed sometime.


If it does not turn out that Fuseki needs a major re-architecture, I 
think that Jena5 should be based on Jetty12. So far, it looks doable.


Sounds promising. Agreed that jetty12 is enough of a new build that it's 
unlikely to have the same behaviour.


We've been testing some of our troublesome queries on 4.9.0 on java 11 
vs java 17 and see a 10-15% performance hit on java 17 (even after we 
take control of the GC by forcing both to use the old parallel GC 
instead of G1). No idea why, seems wrong! Makes us inclined to stick 
with java 11 and thus the jena 4.x series as long as we can.


Dave



Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds

Hi Rob,

Good point. Will try to find time to experiment with that but given the 
testing cycle time that will take a while and can't start immediately.


I'm a little sceptical though. As mentioned before, all the metrics we 
see show the direct memory pool that Jetty uses cycling up to the max heap 
size and then being collected, but with no long term growth to match the 
process size growth. This really feels more like a bug (though not sure 
where) than tuning. The fact that actual behaviour doesn't match the 
documentation isn't encouraging.


It's also pretty hard to figure what the right pool configuration would 
be. This thing is just being asked to deliver a few metrics (12KB per 
request) several times a minute but manages to eat 500MB of direct 
buffer space every 5mins. So what the right pool parameters are to 
support real usage peaks is not going to be easy to figure out.


Nonetheless you are right. That's something that should be explored.

Dave


On 11/07/2023 09:45, Rob @ DNR wrote:

Dave

Thanks for the further information.

Have you experimented with using Jetty 10 but providing more detailed 
configuration?  Fuseki supports providing detailed Jetty configuration if 
needed via the --jetty-config option

The following section looks relevant:

https://eclipse.dev/jetty/documentation/jetty-10/operations-guide/index.html#og-module-bytebufferpool

It looks like the default is that Jetty uses a heuristic to determine these 
values, sadly the heuristic in question is not detailed in that documentation.

Best guess from digging through their code is that the “heuristic” is this:

https://github.com/eclipse/jetty.project/blob/jetty-10.0.x/jetty-io/src/main/java/org/eclipse/jetty/io/AbstractByteBufferPool.java#L78-L84

i.e., ¼ of the configured max heap size.  This doesn’t necessarily align with 
the exact sizes of process growth you see but I note the documentation does 
explicitly say that buffers used can go beyond these limits but that those will 
just be GC’d rather than pooled for reuse.

Example byte buffer configuration at 
https://github.com/eclipse/jetty.project/blob/9a05c75ad28ebad4abbe624fa432664c59763747/jetty-server/src/main/config/etc/jetty-bytebufferpool.xml#L4

Any chance you could try customising this for your needs with stock Fuseki and 
see if this allows you to make the process size smaller and sufficiently 
predictable for your use case?

Rob

From: Dave Reynolds 
Date: Tuesday, 11 July 2023 at 08:58
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
For interest[*] ...

This is what the core JVM metrics look like when transitioning from a
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
asked any queries yet.

https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0

Here are the same metrics around the time of triggering a TDB backup. Shows
the mapped buffer use for TDB but no significant impact on heap etc.

https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:

https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

[*] I've been staring at metric graphs for so many days I may have a
distorted notion of what's interesting :)

On 11/07/2023 08:39, Dave Reynolds wrote:

After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
production, containerized, environment, it is indeed very stable.

Running at less than 6% of memory on a 4GB machine compared to peaks of
~50% for versions with Jetty 10. RES shows as 240K with 35K shared
(presume mostly libraries).

Copy of trace is:
https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

The high spikes on the left of the image are the prior run with
out-of-the-box 4.7.0 on the same JVM.

The small spike at 06:00 is a dump so TDB was able to touch and scan all
the (modest) data with a very minor blip in resident size (as you'd hope).
JVM stats show the mapped buffers for TDB jumping up but confirm heap is
stable at < 60M, non-heap 60M.

Dave

On 10/07/2023 20:52, Dave Reynolds wrote:

Since this thread has got complex, I'm posting this update here at the
top level.

Thanks to folks, especially Andy and Rob for suggestions and for
investigating.

After a lot more testing at our end I believe we now have some
workarounds.

First, at least on java 17, the process growth does seem to level out.
Despite what I just said to Rob, having just checked our soak tests, a
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
Process size oscillates between 1.5GB and 2GB but hasn't gone above
that in a week. The oscillation is almost entir

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds

Hi Marco,

On 11/07/2023 09:04, Marco Neumann wrote:

Dave, can you say a bit more about the profiling methodology? Are you using
a tool such as VisualVM to collect the data? Or do you just use the system
monitor?


The JVM metrics here are from prometheus scraping the metrics exposed by 
fuseki via the built-in micrometer (displayed using grafana). They give a 
*lot* of detail on things like GC behaviour etc which I'm not showing.


Ironically the only thing this fuseki was doing when it died originally 
was supporting these metric scans, and the health check ping.


The overall memory curve is picked up by Telegraf scanning the OS-level 
stats for the docker processes (collected via InfluxDB and again 
displayed in grafana). These are what you would get with e.g. top on the 
machine or a system monitor, but it means we have longer term records which 
we can access remotely. When I quoted 240K RES, 35K shared that was actually 
just top on the machine.


When running locally we can also use things like jconsole or VisualVM, but 
I actually find the prometheus + Telegraf metrics we have in our 
production monitoring more detailed and easier to work with. We run lots 
of services, so the monitoring and alerting stack, while all industry 
standard, has been a life saver for us.


For doing the debugging locally I also tried setting the JVM flags to 
enable finer-grained native memory tracking and using jcmd (in a scripted 
loop) to pull out those more detailed metrics, though they are not that 
much more detailed than the micrometer/prometheus metrics.
That use of jcmd and the caution on how to interpret RES came from the 
blog item I mentioned earlier:

https://poonamparhar.github.io/troubleshooting_native_memory_leaks/
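
(For reference, the native memory tracking recipe is roughly the 
following; the flag and the jcmd subcommand are standard JDK ones, and 
the dataset path is a placeholder:)

  # start fuseki with NMT enabled ('detail' gives finer grain than 'summary')
  JVM_ARGS="-Xmx500M -XX:NativeMemoryTracking=summary" fuseki-server --loc=DB /ds

  # then, in a scripted loop, snapshot native allocations by category
  jcmd <fuseki-pid> VM.native_memory summary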

For the memory leak checking I used valgrind but there's lots of others.

Dave



Marco

On Tue, Jul 11, 2023 at 8:57 AM Dave Reynolds 
wrote:


For interest[*] ...

This is what the core JVM metrics look like when transitioning from a
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been
asked any queries yet.


https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0

Here are the same metrics around the time of triggering a TDB backup. Shows
the mapped buffer use for TDB but no significant impact on heap etc.


https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:


https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

[*] I've been staring at metric graphs for so many days I may have a
distorted notion of what's interesting :)

On 11/07/2023 08:39, Dave Reynolds wrote:

After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the
production, containerized, environment, it is indeed very stable.

Running at less than 6% of memory on a 4GB machine compared to peaks of
~50% for versions with Jetty 10. RES shows as 240K with 35K shared
(presume mostly libraries).

Copy of trace is:


https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0


The high spikes on the left of the image are the prior run with
out-of-the-box 4.7.0 on the same JVM.

The small spike at 06:00 is a dump so TDB was able to touch and scan all
the (modest) data with a very minor blip in resident size (as you'd hope).
JVM stats show the mapped buffers for TDB jumping up but confirm heap is
stable at < 60M, non-heap 60M.

Dave

On 10/07/2023 20:52, Dave Reynolds wrote:

Since this thread has got complex, I'm posting this update here at the
top level.

Thanks to folks, especially Andy and Rob for suggestions and for
investigating.

After a lot more testing at our end I believe we now have some
workarounds.

First, at least on java 17, the process growth does seem to level out.
Despite what I just said to Rob, having just checked our soak tests, a
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days.
Process size oscillates between 1.5GB and 2GB but hasn't gone above
that in a week. The oscillation is almost entirely the cycling of the
direct memory buffers used by Jetty. Empirically those cycle up to
something comparable to the set max heap size, at least for us.

While this week long test was 4.7.0, based on earlier tests I suspect
4.8.0 (and now 4.9.0) would also level out at least on a timescale of
days.

The key has been setting the max heap low. At 2GB and even 1GB (the
default on a 4GB machine) we see higher peak levels of direct buffers
and overall process size grew to around 3GB at which point the
container is killed on the small machines. Though java 17 does seem to
be better behaved than java 11, so switching to that probably also
helped.

Given the act

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds

For interest[*] ...

This is what the core JVM metrics look like when transitioning from a 
Jetty10 to a Jetty9.4 instance. You can see the direct buffer cycling up 
to 500MB (which happens to be the max heap setting) on Jetty 10, nothing 
on Jetty 9. The drop in Mapped buffers is just because TDB hadn't been 
asked any queries yet.


https://www.dropbox.com/scl/fi/9afhrztbb36fvzqkuw996/fuseki-jetty10-jetty9-transition.png?rlkey=7fpj4x1pn5mjnf3jjwenmp65m=0

Here are the same metrics around the time of triggering a TDB backup. Shows 
the mapped buffer use for TDB but no significant impact on heap etc.


https://www.dropbox.com/scl/fi/0s40vpizf94c4w3m2awna/fuseki-jetty10-backup.png?rlkey=ai31m6z58w0uex8zix8e9ctna=0

These are all on the same instance as the RES memory trace:

https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0

Dave

[*] I've been staring at metric graphs for so many days I may have a 
distorted notion of what's interesting :)


On 11/07/2023 08:39, Dave Reynolds wrote:
After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the 
production, containerized, environment, it is indeed very stable.


Running at less than 6% of memory on a 4GB machine compared to peaks of 
~50% for versions with Jetty 10. RES shows as 240K with 35K shared 
(presume mostly libraries).


Copy of trace is: 
https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0


The high spikes on the left of the image are the prior run with 
out-of-the-box 4.7.0 on the same JVM.


The small spike at 06:00 is a dump so TDB was able to touch and scan all 
the (modest) data with a very minor blip in resident size (as you'd hope). 
JVM stats show the mapped buffers for TDB jumping up but confirm heap is 
stable at < 60M, non-heap 60M.


Dave

On 10/07/2023 20:52, Dave Reynolds wrote:
Since this thread has got complex, I'm posting this update here at the 
top level.


Thanks to folks, especially Andy and Rob for suggestions and for 
investigating.


After a lot more testing at our end I believe we now have some 
workarounds.


First, at least on java 17, the process growth does seem to level out. 
Despite what I just said to Rob, having just checked our soak tests, a 
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days. 
Process size oscillates between 1.5GB and 2GB but hasn't gone above 
that in a week. The oscillation is almost entirely the cycling of the 
direct memory buffers used by Jetty. Empirically those cycle up to 
something comparable to the set max heap size, at least for us.


While this week long test was 4.7.0, based on earlier tests I suspect 
4.8.0 (and now 4.9.0) would also level out at least on a timescale of 
days.


The key has been setting the max heap low. At 2GB and even 1GB (the 
default on a 4GB machine) we see higher peak levels of direct buffers 
and overall process size grew to around 3GB at which point the 
container is killed on the small machines. Though java 17 does seem to 
be better behaved than java 11, so switching to that probably also 
helped.


Given the actual heap is low (50MB heap, 60MB non-heap), needing 
2GB to run in feels high but is workable. So my previously suggested 
rule of thumb (in this low memory regime, allow 4x the max heap 
size) seems to work.


Second, we're now pretty confident the issue is jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 
9.4.51.v20230217. This required some minor source changes to compile 
and pass tests. On a local bare metal test where we saw process growth 
up to 1.5-2GB this build has run stably using less than 500MB for 4 
hours.


We'll set a longer term test running in the target containerized 
environment to confirm things but quite hopeful this will be long term 
stable.


I realise Jetty 9.4.x is out of community support but Eclipse say EOL 
is "unlikely to happen before 2025". So, while this may not be a 
solution for the Jena project, it could give us a workaround at the 
cost of doing custom builds.


Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB 
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of 
a day or so to reach ~3GB of memory at which point the 4GB machine 
becomes unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks 
and (prometheus) metrics scrapes from the monitoring systems.

Re: Mystery memory leak in fuseki

2023-07-11 Thread Dave Reynolds
After a 10 hour test of 4.9.0 with Jetty 9.4 on java 17 in the 
production, containerized, environment, it is indeed very stable.


Running at less than 6% of memory on a 4GB machine compared to peaks of 
~50% for versions with Jetty 10. RES shows as 240K with 35K shared 
(presume mostly libraries).


Copy of trace is: 
https://www.dropbox.com/scl/fi/c58nqkr2hi193a84btedg/fuseki-4.9.0-jetty-9.4.png?rlkey=b7osnj6k1oy1xskl4j25zz6o8=0


The high spikes on the left of the image are the prior run with 
out-of-the-box 4.7.0 on the same JVM.


The small spike at 06:00 is a dump so TDB was able to touch and scan all 
the (modest) data with a very minor blip in resident size (as you'd hope). 
JVM stats show the mapped buffers for TDB jumping up but confirm heap is 
stable at < 60M, non-heap 60M.


Dave

On 10/07/2023 20:52, Dave Reynolds wrote:
Since this thread has got complex, I'm posting this update here at the 
top level.


Thanks to folks, especially Andy and Rob for suggestions and for 
investigating.


After a lot more testing at our end I believe we now have some workarounds.

First, at least on java 17, the process growth does seem to level out. 
Despite what I just said to Rob, having just checked our soak tests, a 
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days. 
Process size oscillates between 1.5GB and 2GB but hasn't gone above that 
in a week. The oscillation is almost entirely the cycling of the direct 
memory buffers used by Jetty. Empirically those cycle up to something 
comparable to the set max heap size, at least for us.


While this week long test was 4.7.0, based on earlier tests I suspect 
4.8.0 (and now 4.9.0) would also level out at least on a timescale of days.


The key has been setting the max heap low. At 2GB and even 1GB (the 
default on a 4GB machine) we see higher peak levels of direct buffers 
and overall process size grew to around 3GB at which point the container 
is killed on the small machines. Though java 17 does seem to be better 
behaved than java 11, so switching to that probably also helped.


Given the actual heap is low (50MB heap, 60MB non-heap), needing 2GB 
to run in feels high but is workable. So my previously suggested rule of 
thumb (in this low memory regime, allow 4x the max heap size) seems 
to work.


Second, we're now pretty confident the issue is jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 
9.4.51.v20230217. This required some minor source changes to compile and 
pass tests. On a local bare metal test where we saw process growth up to 
1.5-2GB this build has run stably using less than 500MB for 4 hours.


We'll set a longer term test running in the target containerized 
environment to confirm things but quite hopeful this will be long term 
stable.


I realise Jetty 9.4.x is out of community support but Eclipse say EOL is 
"unlikely to happen before 2025". So, while this may not be a solution 
for the Jena project, it could give us a workaround at the cost of doing 
custom builds.


Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when 
running (in docker containers) on small machines. Suspect a jetty 
issue but it's not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB 
as NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of 
a day or so to reach ~3GB of memory at which point the 4GB machine 
becomes unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks 
and (prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly 
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 
0 and 500MB on approx a 10min cycle but is stable over a period of 
days and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the c

Re: Mystery memory leak in fuseki

2023-07-10 Thread Dave Reynolds
Since this thread has got complex, I'm posting this update here at the 
top level.


Thanks to folks, especially Andy and Rob for suggestions and for 
investigating.


After a lot more testing at our end I believe we now have some workarounds.

First, at least on java 17, the process growth does seem to level out. 
Despite what I just said to Rob, having just checked our soak tests, a 
jena 4.7.0/java 17 test with 500MB max heap has lasted for 7 days. 
Process size oscillates between 1.5GB and 2GB but hasn't gone above that 
in a week. The oscillation is almost entirely the cycling of the direct 
memory buffers used by Jetty. Empirically those cycle up to something 
comparable to the set max heap size, at least for us.


While this week long test was 4.7.0, based on earlier tests I suspect 
4.8.0 (and now 4.9.0) would also level out at least on a timescale of days.


The key has been setting the max heap low. At 2GB and even 1GB (the 
default on a 4GB machine) we see higher peak levels of direct buffers 
and overall process size grew to around 3GB at which point the container 
is killed on the small machines. Though java 17 does seem to be better 
behaved than java 11, so switching to that probably also helped.


Given the actual heap is low (50MB heap, 60MB non-heap), needing 2GB 
to run in feels high but is workable. So my previously suggested rule of 
thumb (in this low memory regime, allow 4x the max heap size) seems 
to work.


Second, we're now pretty confident the issue is jetty 10+.

We've built a fuseki-server 4.9.0 with Jetty replaced by version 
9.4.51.v20230217. This required some minor source changes to compile and 
pass tests. On a local bare metal test where we saw process growth up to 
1.5-2GB this build has run stably using less than 500MB for 4 hours.


We'll set a longer term test running in the target containerized 
environment to confirm things but quite hopeful this will be long term 
stable.


I realise Jetty 9.4.x is out of community support but Eclipse say EOL is 
"unlikely to happen before 2025". So, while this may not be a solution 
for the Jena project, it could give us a workaround at the cost of doing 
custom builds.


Dave


On 03/07/2023 14:20, Dave Reynolds wrote:
We have a very strange problem with recent fuseki versions when running 
(in docker containers) on small machines. Suspect a jetty issue but it's 
not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as 
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a 
day or so to reach ~3GB of memory at which point the 4GB machine becomes 
unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks and 
(prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly 
non-heap metaspace).

- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being 
reclaimed. Since there are no sparql queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable, it cycles between 0 
and 500MB on approx a 10min cycle but is stable over a period of days 
and shows no leaks.


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had less problems with elsewhere, that seems 
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory 
buffers) and then stays stable (at least on a three day soak test).  We 
could live with allocating 1.5GB to a system that should only need a few 
100MB but concerned that it may not be stable in the really long term 
and, in any case, would rather be able to update to more recent fuseki 
versions.


Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then 
keeps ticking up slowly at random intervals. We project that it would 
take a few weeks to grow the scale it did under java 11 but it will 
still eventually kill the machine.


Anyone seen anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries and 
that should still leave plenty of space for OS buffers etc in the 
remaining memory on a 4GB machine.






Re: OOM Killed

2023-07-10 Thread Dave Reynolds

Hi Andy,

On 10/07/2023 12:18, Andy Seaborne wrote:

Laura, Dave,

This doesn't sound like the same issue but let's see.


It may well be different; if so, apologies for causing noise.


Dave - your situation isn't under high load is it?


We see the process size growth under no load other than metric scrapes. 
However, growth seems faster if there's more traffic (faster scrapes) so 
expecting that high query load will make it worse. Whether "worse" means 
it'll just get to some asymptote faster or actually go higher is unproven.



- Is it in a container? If so:


The original problem was in a container. But as I've said we can 
reproduce the process growth on bare metal (i.e. local desktop).



   Is it the container being killed OOM or
     Java throwing an OOM exception?


For the original problem it's the container being OOM killed, no java 
exception.


For local tests, both in a container and on bare metal, we've just been 
looking at the process size growth; we haven't run it long enough to reach 
an OOM state, and on the size of machine I'm using I doubt it will on a 
timescale I can wait for.



   Much RAM does the container get? How many threads?


For the original problem the container had no memory limit other than 
machine total of 4GB. No constraints set on threads.



- If not a container, how many CPU Threads are there? How many cores?


For local tests 6 cores, should mean 12 CPU threads but checking just 
now I suspect hyperthreading isn't working on my current install so call 
it 6 of both.



- Which form of Fuseki are you using?


fuseki-server


what does
   java -XX:+PrintFlagsFinal -version \
    | grep -i 'M..HeapSize'

say?


E.g. in the container:

   size_t ErgoHeapSizeLimit              = 0             {product}     {default}
   size_t HeapSizePerGCThread            = 43620760      {product}     {default}
   size_t InitialHeapSize                = 65011712      {product}     {ergonomic}
   size_t LargePageHeapSizeThreshold     = 134217728     {product}     {default}
   size_t MaxHeapSize                    = 1019215872    {product}     {ergonomic}
   size_t MinHeapSize                    = 8388608       {product}     {ergonomic}
    uintx NonNMethodCodeHeapSize         = 5826188       {pd product}  {ergonomic}
    uintx NonProfiledCodeHeapSize        = 122916026     {pd product}  {ergonomic}
    uintx ProfiledCodeHeapSize           = 122916026     {pd product}  {ergonomic}
   size_t SoftMaxHeapSize                = 1019215872    {manageable}  {ergonomic}


But we are overriding the MaxHeapSize with the -Xmx flag in the actual 
running process.

How are you sending the queries to the server?


For the original problems this occurred on a system with no queries at 
all just the metrics scraping. The /$/ping point was getting checked by 
a healthcheck monitoring tool (sensu) and /$/metrics by prometheus.


On the local bare metal checks where we can reproduce process growth at 
least medium term we are just checking those by curl in a watch loop.


We've made some progress at our end, and will update on that back on the 
previous thread rather than further confuse this one.


Dave


On 09/07/2023 20:33, Laura Morales wrote:
I'm running a job that is submitting a lot of queries to a Fuseki 
server, in parallel. My problem is that Fuseki is OOM-killed and I 
don't know how to fix this. Some details:


- Fuseki is queried as fast as possible. Queries take around 50-100ms 
to complete so I think it's serving 10s of queries each second


Are all the queries about the same amount of work or are some going to 
cause significantly more memory use?


It is quite possible to send queries faster than the server can process 
them - there is little point sending in parallel more than there are 
real CPU threads to service them.


They will interfere and the machine can end up going slower (in terms of 
queries per second).


I don't know exactly the impact on the GC, but I think the JVM delays 
minor GCs when very busy, which pushes it to do major ones earlier.


A thing to try is use less parallelism.

- Fuseki 4.8. OS is Debian 12 (minimal installation with only OS, 
Fuseki, no desktop environments, uses only ~100MB of RAM)
- all the queries are read queries. No updates, inserts, or other 
write queries

- all the queries are over HTTP to the Fuseki endpoint
- database is TDB2 (created with tdb2.tdbloader)
- database contains around 2.5M triples
- the machine has 8GB RAM. I've tried on another PC with 16GB and it 
completes the job. On 8GB though, it won't
- with -Xmx6G it's killed earlier. With -Xmx2G it's killed later. 
Either way it's always killed.


Is it getting OOM at random or do certain queries tend to push it over 
the edge?


Is that the machine 

Re: Mystery memory leak in fuseki

2023-07-10 Thread Dave Reynolds

Hi Rob,

On 10/07/2023 14:05, Rob @ DNR wrote:

Dave

Poked around a bit today but not sure I’ve reproduced anything as such or found 
any smoking guns

I ran a Fuseki instance with the same watch command you showed in your last 
message.  JVM Heap stays essentially static even after hours, there’s some 
minor fluctuation up and down in used heap space but the heap itself doesn’t 
grow at all.  Did this with a couple of different versions of 4.x to see if 
there’s any discernible difference but nothing meaningful showed up.  I also 
used 3.17.0 but again couldn’t reproduce the behaviour you are describing.


I too reported the heap (and non-heap) remain stable so not sure in what 
way the behaviour you are seeing is different. The issue is process size.



For reference I’m on OS X 13.4.1 using OpenJDK 17


For reference I have similar behaviour on the follow combinations:

Containerized:
   Amazon Linux 2, Amazon Corretto 11
   Amazon Linux 2, Amazon Corretto 17
   Amazon Linux 2, Eclipse Temurin 17
   Ubuntu 22.04, Eclipse Temurin 17

Bare metal:
   Ubuntu 22.04, OpenJDK 11


The process memory (for all versions I tested) seems to peak at about 1.5G 
as reported by the vmmap tool.  Ongoing monitoring, i.e., OS X Activity Monitor 
shows the memory usage of the process fluctuating over time, but I don’t ever 
see the unlimited growth that your original report suggested.  Also, I didn’t 
set heap explicitly at all so I’m getting the default max heap of 4GB, and my 
actual heap usage was around 100 MB.


If I set the heap max to 500M the process size growth seems to largely 
level off around 1.5-2GB over the space of hours. So comparable to 
yours. However, it's not clear that it is absolutely stable by then. Our 
original failures only occurred after several days and the graphs for 
24hour tests are noisy enough to not be confident it's reached any 
absolute stability by 2GB.


If I set the heap max to 4GB then the process grows larger and we've 
certainly seen instances where it reached 3GB. Even though the heap 
size itself is stable (small fluctuations but no trends) and remains 
under 100MB. Not left it going longer than that because that's already 
no good for us.


Note almost all of these tests have been with data in TDB, even though 
not running any queries. If I run a fuseki with just --mem and no data 
loaded at all, the growth is slower, and that may be closer to your 
test setup, but the growth is still there.



I see from vmmap that most of the memory appears to be virtual memory related 
to the many shared native libraries that the JVM links against which on a real 
OS is often swapped out as it’s not under active usage.


I don't have vmmap available (that's a BSD tool I think) but clearly 
virtual memory is a different matter. I'm only concerned with the 
resident set size.


To check resident size the original graphs showing the issue were based 
on memory % metrics from docker runtime (via prometheus metrics scrape) 
when testing as a container. Testing bare metal, I've used both RSS 
and the so-called PSS mentioned in:

https://poonamparhar.github.io/troubleshooting_native_memory_leaks/

PSS didn't show any noticeably different curve than RSS so while RSS can 
be misleading it seems accurate here.
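
(On Linux both figures can be read straight from /proc and ps; a sketch, 
assuming a kernel recent enough to have smaps_rollup:)

  # RSS as top/ps report it (in KiB)
  ps -o rss= -p <fuseki-pid>

  # PSS: resident pages, with shared pages divided among their users
  grep '^Pss:' /proc/<fuseki-pid>/smaps_rollup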



In a container, where swap is likely disabled, that’s obviously more 
problematic as everything occupies memory even if much of it might be for 
native libraries that are never needed by anything Fuseki does.  Again, I don’t 
see how that would lead to the apparently unbounded memory usage you’re 
describing.


Not sure swap is relevant, even on the bare metal there's no swapping 
going on.


Certainly agree that virtual memory space will fill up with mappings for 
native libraries but it's not VSS I'm worried about.



You could try using jlink to build a minimal image where you only have the 
parts of the JDK that you need in the image.  I found the following old Jena 
thread - https://lists.apache.org/thread/dmmkndmy2ds8pf95zvqbcxpv84bj7cz6 - 
which actually describes an apparently similar memory issue but also has an 
example of a Dockerfile linked at the start of the thread that builds just such 
a minimal JRE for Fuseki.


Interesting thought, but pretty sure at this point the issue is Jetty, and 
that gives us the best workaround so far. I'll post separately about that.


Note that I also ran the leaks tool against the long running Fuseki processes 
and that didn’t find anything of note, 5.19KB of memory leaks over a 3.5 hr run 
so no smoking gun there.


Agreed, we've also run leak testers but didn't find any issue and didn't 
expect to. As we've said several times, throughout all this - heap, 
non-heap, thread count, thread stack and direct memory buffers (at least 
as visible to the JVM) are all stable.


Cheers,
Dave


Regards,

Rob

From: Dave Reynolds 
Date: Friday, 7 July 2023 at 11:11
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Hi Andy

Re: OOM Killed

2023-07-10 Thread Dave Reynolds
There is an issue with memory growth in fuseki, though it's growth 
outside of normal java heap and non-heap space.


See https://www.mail-archive.com/users@jena.apache.org/msg20362.html

For that scale of data and scale of machine I suggest setting the heap 
smaller: -Xmx1G or -Xmx500M. Empirically the process growth seems to 
largely level off at around 4x the given heap size (though this very 
much depends on the usage model and we have no clear explanation for this).


You might also try -XX:MaxDirectMemorySize=1G or less though exactly 
what size to set will depend on how much data is involved in your 
queries. If the process dies with an exception about unable to allocate 
new direct memory then increase it.
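
Putting those two settings together with the stock fuseki-server script 
(which reads JVM_ARGS from the environment), the invocation would look 
something like this sketch, with a placeholder TDB2 dataset:

  JVM_ARGS="-Xmx1G -XX:MaxDirectMemorySize=1G" fuseki-server --tdb2 --loc=DB /ds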


If this is not a public service liable to security issues and you are 
able to use a 3.17.0 (or earlier) version of fuseki then those are not 
subject to this growth issue. Or at least not to the version of the 
issue that we are seeing in our own usage.


Dave

On 09/07/2023 20:33, Laura Morales wrote:

I'm running a job that is submitting a lot of queries to a Fuseki server, in 
parallel. My problem is that Fuseki is OOM-killed and I don't know how to fix 
this. Some details:

- Fuseki is queried as fast as possible. Queries take around 50-100ms to 
complete so I think it's serving 10s of queries each second
- Fuseki 4.8. OS is Debian 12 (minimal installation with only OS, Fuseki, no 
desktop environments, uses only ~100MB of RAM)
- all the queries are read queries. No updates, inserts, or other write queries
- all the queries are over HTTP to the Fuseki endpoint
- database is TDB2 (created with tdb2.tdbloader)
- database contains around 2.5M triples
- the machine has 8GB RAM. I've tried on another PC with 16GB and it completes 
the job. On 8GB though, it won't
- with -Xmx6G it's killed earlier. With -Xmx2G it's killed later. Either way 
it's always killed.

Is there anything that I can tweak to avoid Fuseki getting killed? Something that isn't 
"just buy more RAM".
Thank you


Re: Mystery memory leak in fuseki

2023-07-07 Thread Dave Reynolds

Hi Andy,

Thanks for looking.

Good thought on some issue with stacked requests causing thread leak but 
don't think that matches our data.


From the metrics the number of threads and total thread memory used is 
not that great and is stable long term while the process size grows, at 
least in our situation.


This is based on both the JVM metrics from the prometheus scrape and by 
switching on native memory checking and using jcmd to do various low 
level dumps.


In a test set up we can replicate the long term (~3 hours) process 
growth (while the heap, non-heap and threads stay stable) by just doing 
something like:


watch -n 1 'curl -s http://localhost:3030/$/metrics'

With no other requests at all. So I think that makes it less likely the 
root cause is triggered by stacked concurrent requests. Certainly the 
curl process has exited completely each time. Though I guess there could 
be some connection cleanup going on in the linux kernel still.


> Is the OOM kill the container runtime or Java exception?

We're not limiting the container memory but the OOM error is from the 
docker runtime itself:

fatal error: out of memory allocating heap arena map

We have replicated the memory growth outside a container but not left 
that to soak on a small machine to provoke an OOM, so not sure if the 
OOM killer would hit first or get a java OOM exception first.


One curiosity we've found on the recent tests is that, when the process 
has grown to a dangerous level for the server, we do randomly sometimes 
see the JVM (Temurin 17.0.7) spit out a thread dump and heap summary as 
if there were a low level exception. However, there's no exception 
message at all - just a timestamp, the thread dump and nothing else. The 
JVM seems to just carry on and the process doesn't exit. We're not 
setting any debug flags and not requesting any thread dump, and there's 
no obvious triggering event. This is before the server gets completely 
out of memory, causing the docker runtime to barf.


Dave


On 07/07/2023 09:56, Andy Seaborne wrote:
I tried running without any datasets. I get the same heap effect of 
growing slowly then dropping back.


Fuseki Main (fuseki-server did the same but the figures are from main - 
there is less going on)

Version 4.8.0

fuseki -v --ping --empty    # No datasets

4G heap.
71M allocated
4 threads (+ Daemon system threads)
2 are not parked (i.e. they are blocked)
The heap grows slowly to 48M then a GC runs then drops to 27M
This repeats.

Run one ping.
Heap now 142M, 94M/21M GC cycle
and 2 more threads at least for a while. They seem to go away after time.
2 are not parked.

Now pause the JVM process, queue 100 pings and continue the process.
Heap now 142M, 80M/21M GC cycle
and no more threads.

Thread stacks are not heap so there may be something here.

Same except -Xmx500M
RSS is 180M
Heap is 35M actual.
56M/13M heap cycle
and after one ping:
I saw 3 more threads, and one quickly exited.
2 are not parked

100 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
RSS is 273M

With -Xmx250M -Xss170k
The Fuseki command failed below 170k during classloading.

1000 concurrent ping requests.
Maybe 15 more threads. 14 parked. One is marked "running" by visualvm.
The threads aren't being gathered.
RSS is 457M.

So a bit of speculation:

Is the OOM kill the container runtime or Java exception?

There aren't many moving parts.

Maybe under some circumstances, the metrics gatherer or ping caller
causes more threads. This could be bad timing, several operations 
arriving at the same time, or it could be the client end isn't releasing 
the HTTP connection in a timely manner or is delayed/failing to read the 
entire response. This is HTTP/1.1; HTTP/2 probably isn't at risk.


Together with a dataset, memory mapped files etc, it is pushing the 
process size up and on a small machine that might become a problem 
especially if the container host is limiting RAM.


But speculation.

     Andy



Re: Mystery memory leak in fuseki

2023-07-06 Thread Dave Reynolds
a memory leak.

Test setup:
* Debian 12, Linux kernel 6.1.0-9-amd64
* `java -version`:
   openjdk version "17.0.7" 2023-04-18
   OpenJDK Runtime Environment (build 17.0.7+7-Debian-1deb12u1)
* fresh install of apache-jena-fuseki-4.8.0, default configuration (run 
directory is created automatically), no datasets
* Fuseki is started via the `fuseki-server` script; no extra JVM_ARGS (i.e. it 
becomes -Xmx4G)

Test execution:
* Call the ping endpoint a few hundred times, e.g. via `for i in {1..100}; do 
curl http://127.0.0.1:3030/$/ping; done`.

Observation:
* Memory consumption of the java process increases significantly, up to around 
4GB, then some GC steps in and reduces memory consumption to less than 1GB. 
The cycle repeats with more pings.
* Tweaking the JVM arg -Xmx can change when GC steps in.

Can anyone reproduce my observations?

I tried that with all versions from v4.8.0 down to v4.0.0 and I'm happy to 
give you some clues:
The erratic behaviour starts with version 4.3.0, so it's advisable to check what happened 
between v4.2.0 and v4.3.0. Another impression is that v4.1.0 is even less 
"memory-leaky" than v4.2.0.

I also analyzed with VisualVM in this test setup, but to be honest I don't see 
any suspicious memory leak situation there.


Best regards,
Frank



____
From: Dave Reynolds 
Sent: Tuesday, 4 July 2023 12:16
To: users@jena.apache.org
Subject: Re: Mystery memory leak in fuseki

Try that again:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/g1ih98kprnvjvxx/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GCs back. Lots of churn just for reporting the metrics, but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond.

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave

On 04/07/2023 10:56, Dave Reynolds wrote:

For interest this is what the JVM metrics look like. The main
heap/non-heap ones are:

https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So gets up to a size comparable with the allowed max heap size (500MB)
then GCs back. Lots of churn just for reporting the metrics, but no sign
of the upward trend which dominates the MEM% curves and nothing to
explain the growth to 1.8GB and beyond.

Guess could try doing a heap dump anyway in case that gives a clue but
not sure that's the right haystack.

Dave


On 04/07/2023 10:41, Dave Reynolds wrote:

  >  Does this only happen in a container?  Or can you reproduce it
running locally as well?

Not reproduced locally yet, partly because it's harder to set up the
equivalent metrics monitoring there.

Can try harder at that.

  > If you can reproduce it locally then attaching a profiler like
VisualVM so you can take a heap snapshot and see where the memory is
going that would be useful

Thanks, aware of that option but I thought that would just allow us to
probe the heap, non-heap and buffer JVM memory pools. We have quite
detailed monitoring traces on all the JVM metrics which confirm heap
and non-heap are all fine, sitting stably at a low level and not
reflecting the leak.

That's also what tells us the direct memory buffers are cycling but
being properly collected and not leaking. Assuming the JVM metrics are
accurate then the leak is somewhere in native memory beyond the ken of
the JVM metrics.

Dave


On 04/07/2023 10:11, Rob @ DNR wrote:

Does this only happen in a container?  Or can you reproduce it
running locally as well?

If you can reproduce it locally then attaching a profiler like
VisualVM so you can take a heap snapshot and see where the memory is
going that would be useful

Rob

From: Dave Reynolds 
Date: Tuesday, 4 July 2023 at 09:31
To: users@jena.apache.org 
Subject: Re: Mystery memory leak in fuseki
Tried 4.7.0 under the most up to date java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly, but not completely,
flattened off.

For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for
comparison. The curve from then onwards is 4.7.0.

The spikes on the 4.7.0 match the allocation and recovery of the direct
memory buffers. The JVM metrics show those cycling around every 10mins
and being reclaimed each time with no leaking visible at that level.
Heap, non-heap and mapped buffers are all basically unchanging which is
to be expected

Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds

Try that again:

For interest this is what the JVM metrics look like. The main 
heap/non-heap ones are:


https://www.dropbox.com/s/g1ih98kprnvjvxx/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So it gets up to a size comparable with the allowed max heap size (500MB), 
then is GCed back. Lots of churn just from reporting the metrics, but no 
sign of the upward trend which dominates the MEM% curves, and nothing to 
explain the growth to 1.8GB and beyond.


I guess we could try doing a heap dump anyway in case that gives a clue, 
but I'm not sure that's the right haystack.


Dave


Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds
For interest this is what the JVM metrics look like. The main 
heap/non-heap ones are:


https://www.dropbox.com/s/8auux5v352ur04m/fusdeki-metrics-1.png?dl=0

So stable at around 75MB used, 110MB committed.

Whereas the buffer pools are:

https://www.dropbox.com/s/c77b2oarzxjlsa7/fuseki-buffer-metrics.png?dl=0

So it gets up to a size comparable with the allowed max heap size (500MB), 
then is GCed back. Lots of churn just from reporting the metrics, but no 
sign of the upward trend which dominates the MEM% curves, and nothing to 
explain the growth to 1.8GB and beyond.


I guess we could try doing a heap dump anyway in case that gives a clue, 
but I'm not sure that's the right haystack.


Dave



Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds
>  Does this only happen in a container?  Or can you reproduce it 
running locally as well?


Not reproduced locally yet, partly because it's harder to set up the 
equivalent metrics monitoring there.


Can try harder at that.

> If you can reproduce it locally then attaching a profiler like 
VisualVM so you can take a heap snapshot and see where the memory is 
going that would be useful


Thanks, I'm aware of that option, but I thought that would just allow us 
to probe the heap, non-heap and buffer JVM memory pools. We have quite 
detailed monitoring traces on all the JVM metrics, which confirm heap 
and non-heap are both fine, sitting stably at a low level and not 
reflecting the leak.


That's also what tells us the direct memory buffers are cycling but 
being properly collected and not leaking. Assuming the JVM metrics are 
accurate then the leak is somewhere in native memory beyond the ken of 
the JVM metrics.


Dave


On 04/07/2023 10:11, Rob @ DNR wrote:

Does this only happen in a container?  Or can you reproduce it running locally 
as well?

If you can reproduce it locally then attaching a profiler like VisualVM so you 
can take a heap snapshot and see where the memory is going that would be useful

Rob


Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds
Tried 4.7.0 under the most up-to-date Java 17 and it acts like 4.8.0. After
16 hours it gets to about 1.6GB and by eye has nearly flattened off,
but not completely.


For interest here's a MEM% curve on a 4GB box (hope the link works).

https://www.dropbox.com/s/xjmluk4o3wlwo0y/fuseki-mem-percent.png?dl=0

The flattish curve from 12:00 to 17:20 is a run using 3.16.0 for 
comparison. The curve from then onwards is 4.7.0.


The spikes on the 4.7.0 match the allocation and recovery of the direct 
memory buffers. The JVM metrics show those cycling around every 10mins 
and being reclaimed each time with no leaking visible at that level. 
Heap, non-heap and mapped buffers are all basically unchanging which is 
to be expected since it's doing nothing apart from reporting metrics.


Whereas this curve (again from 17:20 onwards) shows basically the same
4.7.0 set up on a separate host, showing that despite flattening out
somewhat, usage continues to grow - at least on a 16-hour timescale.


https://www.dropbox.com/s/k0v54yq4kexklk0/fuseki-mem-percent-2.png?dl=0


Both of those runs were using Eclipse Temurin on a base Ubuntu jammy
container. Previous runs used AWS Corretto on an AL2 base container.
Behaviour was basically unchanged, which eliminates this being some
Corretto-specific issue or a weird base container OS issue.


Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap 
and not direct memory (I thought that was a hard bound set at start up), I
don't see how it can be involved.


     Andy







Re: Mystery memory leak in fuseki

2023-07-04 Thread Dave Reynolds

Thanks for the suggestion, that could be useful.

Not managed to make that work yet. From within the container we get 
permission denied, and running it on the host is no use because the 
relevant .so files aren't where ltrace expects, so it crashes out.


Similarly strace can't attach to the process in the container and 
running on the host gives no info.


I guess we would have to replicate the setup without using containers. 
Certainly possible, but a fair amount of work, and it loses all the metrics 
we get from the container stack. We may have to resort to that.


Dave

On 03/07/2023 22:22, Justin wrote:

You might try running `ltrace` to watch the library calls and system calls
the jvm is making.
e.g.
ltrace -S -f -p 

I think the `sbrk` system call is used to allocate memory. It might be
interesting to see if you can catch the jvm invoking that system call and
also see what is happening around it.













Re: Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds

On 03/07/2023 14:36, Martynas Jusevičius wrote:

There have been a few similar threads:

https://www.mail-archive.com/users@jena.apache.org/msg19871.html

https://www.mail-archive.com/users@jena.apache.org/msg18825.html



Thanks, I've seen those and I'm not sure they quite match our case, but 
maybe I'm mistaken.


We already have a smallish heap allocation (500MB), which seems to be a 
key conclusion of both those threads. Though I guess we could try even 
lower.


Furthermore the second thread was related to 3.16.0, which is completely 
stable for us at 150MB (rather than the 1.5GB that 4.6.* gets to, let 
alone the 3+GB that got 4.8.0 killed).


Dave




On Mon, 3 Jul 2023 at 15.20, Dave Reynolds 
wrote:


We have a very strange problem with recent fuseki versions when running
(in docker containers) on small machines. Suspect a jetty issue but it's
not clear.

Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].

We used to run using 3.16 on jdk 8 (AWS Corretto for the long term
support) with no problems.

Switching to fuseki 4.8.0 on jdk 11 the process grows in the space of a
day or so to reach ~3GB of memory at which point the 4GB machine becomes
unviable and things get OOM killed.

The strange thing is that this growth happens when the system is
answering no Sparql queries at all, just regular health ping checks and
(prometheus) metrics scrapes from the monitoring systems.

Furthermore the space being consumed is not visible to any of the JVM
metrics:
- Heap and and non-heap are stable at around 100MB total (mostly
non-heap metaspace).
- Mapped buffers stay at 50MB and remain long term stable.
- Direct memory buffers being allocated up to around 500MB then being
reclaimed. Since there are no sparql queries at all we assume this is
jetty NIO buffers being churned as a result of the metric scrapes.
However, this direct buffer behaviour seems stable, it cycles between 0
and 500MB on approx a 10min cycle but is stable over a period of days
and shows no leaks.

Yet the java process grows from an initial 100MB to at least 3GB. This
can occur in the space of a couple of hours or can take up to a day or
two with no predictability in how fast.

Presumably there is some low level JNI space allocated by Jetty (?)
which is invisible to all the JVM metrics and is not being reliably
reclaimed.

Trying 4.6.0, which we've had less problems with elsewhere, that seems
to grow to around 1GB (plus up to 0.5GB for the cycling direct memory
buffers) and then stays stable (at least on a three day soak test).  We
could live with allocating 1.5GB to a system that should only need a few
100MB but concerned that it may not be stable in the really long term
and, in any case, would rather be able to update to more recent fuseki
versions.

Trying 4.8.0 on java 17 it grows rapidly to around 1GB again but then
keeps ticking up slowly at random intervals. We project that it would
take a few weeks to grow the scale it did under java 11 but it will
still eventually kill the machine.

Anyone seem anything remotely like this?

Dave

[1]  500M heap may be overkill but there can be some complex queries and
that should still leave plenty of space for OS buffers etc in the
remaining memory on a 4GB machine.








Re: Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds

On 03/07/2023 15:07, Andy Seaborne wrote:

A possibility:

https://www.nickebbitt.com/blog/2022/01/26/the-story-of-a-java-17-native-memory-leak/

suggests workaround

-XX:-UseStringDeduplication

https://bugs.openjdk.org/browse/JDK-8277981
https://github.com/openjdk/jdk/pull/6613

which may be in Java 17.0.2


Ah, thanks, I hadn't spotted that. Though I was testing with 17.0.7 and, 
as you say, they claim that was fixed in 17.0.2.


Dave


Re: Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds

Hi Andy,

> Could you try 4.7.0?

Will do, though each test takes quite a while :)

> This is an in-memory database?

No, TDB1 - sorry, I should have said that.

Though, as I say, we are leaving the system to soak with absolutely no 
queries arriving, so it's not TDB churn, and it's RSS that's filling up.


FWIW 3.16.0 runs at 150MB with the same max heap setting, completely 
stable. So that's 10x smaller than what 4.6.0 stabilizes at. If nothing 
else that confirms that the container setup itself is not to blame.


> Micrometer/Prometheus has had several upgrades but if it is not heap and
> not direct memory (I though that was a hard bound set at start up), I
> don't see how it can be involved.

Likewise.

Dave

On 03/07/2023 14:54, Andy Seaborne wrote:

Hi Dave,

Could you try 4.7.0?

4.6.0 was 2022-08-20
4.7.0 was 2022-12-27
4.8.0 was 2023-04-20

This is an in-memory database?

Micrometer/Prometheus has had several upgrades but if it is not heap and 
not direct memory (I thought that was a hard bound set at start up), I 
don't see how it can be involved.


     Andy







Mystery memory leak in fuseki

2023-07-03 Thread Dave Reynolds
We have a very strange problem with recent fuseki versions when running 
(in docker containers) on small machines. Suspect a jetty issue but it's 
not clear.


Wondering if anyone has seen anything like this.

This is a production service but with tiny data (~250k triples, ~60MB as 
NQuads). Runs on 4GB machines with java heap allocation of 500MB[1].


We used to run using 3.16 on jdk 8 (AWS Corretto for the long term 
support) with no problems.


Switching to fuseki 4.8.0 on jdk 11, the process grows in the space of a 
day or so to reach ~3GB of memory, at which point the 4GB machine becomes 
unviable and things get OOM killed.


The strange thing is that this growth happens when the system is 
answering no Sparql queries at all, just regular health ping checks and 
(prometheus) metrics scrapes from the monitoring systems.


Furthermore the space being consumed is not visible to any of the JVM 
metrics:
- Heap and non-heap are stable at around 100MB total (mostly non-heap 
metaspace).

- Mapped buffers stay at 50MB and remain long-term stable.
- Direct memory buffers are allocated up to around 500MB then 
reclaimed. Since there are no SPARQL queries at all we assume this is 
jetty NIO buffers being churned as a result of the metric scrapes. 
However, this direct buffer behaviour seems stable: it cycles between 
0 and 500MB on approx a 10min cycle but is stable over a period of 
days and shows no leaks (one way to watch these pools is sketched below).


Yet the java process grows from an initial 100MB to at least 3GB. This 
can occur in the space of a couple of hours or can take up to a day or 
two with no predictability in how fast.


Presumably there is some low level JNI space allocated by Jetty (?) 
which is invisible to all the JVM metrics and is not being reliably 
reclaimed.


Trying 4.6.0, which we've had fewer problems with elsewhere, the process 
seems to grow to around 1GB (plus up to 0.5GB for the cycling direct memory 
buffers) and then stays stable (at least on a three-day soak test). We 
could live with allocating 1.5GB to a system that should only need a few 
100MB, but we're concerned that it may not be stable in the really long 
term and, in any case, would rather be able to update to more recent 
fuseki versions.


Trying 4.8.0 on Java 17, it grows rapidly to around 1GB again but then 
keeps ticking up slowly at random intervals. We project that it would 
take a few weeks to grow to the scale it did under Java 11, but it will 
still eventually kill the machine.


Anyone seen anything remotely like this?

Dave

[1] 500M heap may be overkill, but there can be some complex queries, and 
that should still leave plenty of space for OS buffers etc. in the 
remaining memory on a 4GB machine.






Re: Performance of rebinds after multiple changes?

2023-06-14 Thread Dave Reynolds

Hi Steve,

On 14/06/2023 13:38, Steve Vestal wrote:
The help pages say a rebind will happen automatically when things are 
added to or removed from an OntModel (except in odd cases). I'm curious 
about the performance impact when a sequence of multiple changes is made. 
Is the rebind itself fast, so that a sequence of changes and rebinds has 
no significant performance impact until the next request for information?


Yes, rebind() is very cheap: it just sets a flag (or rather unsets the 
isPrepared flag). So it's not until the next query that the work will be 
redone.
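
To illustrate (a sketch, not tested; the model contents are invented), a
batch of additions stays cheap and the inference cost lands on the first
subsequent query:

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;

public class RebindBatch {
    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
        String ns = "http://example.org/";
        // Each add just invalidates the isPrepared flag - cheap, no
        // inference happens yet.
        for (int i = 0; i < 1000; i++) {
            m.createClass(ns + "C" + i).addSuperClass(m.createClass(ns + "Top"));
        }
        // The (potentially expensive) prepare happens here, on first use.
        System.out.println(m.listStatements().toList().size());
    }
}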


Dave


Re: Add rule at runtime, without trigger not-related rule?

2023-04-06 Thread Dave Reynolds

Hi,

Responses inline ...

On 04/04/2023 23:19, L B wrote:

Here is the API location.
image.png


Note that this list doesn't support attachments so we can't see your images.

As you suggested, I started to investigate the backward rules.  Here is 
some experimental code I did


...

I could successfully add a backward rule. Thanks for the suggestion.  
When adding the triple (a C1 t) as below


infgraph.performAdd(Triple.create(a, C1, t));

I could see "a @C2 t." printed.  <-- the backward rule triggered.

Regarding backward rule, I have a few questions:
1. It seems the backward rule does not support builtin primitives, such 
as print()?  I did not see C2Print print anything.


Correct. A backward rule, by definition, is only invoked when a query 
matches the consequent of the rule. If a rule doesn't infer any triples 
then it'll never (need to be) invoked.


You should be able to use a primitive in the body of the rule even if 
it's just a side-effecting primitive like print. So something like (not 
tested):


  (?a MYDeduction ?c) <- (?a MYPremise ?c), print('I was run') .
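
Wired into a hybrid GenericRuleReasoner that might look like this (equally
untested; the URIs are invented):

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class BackwardPrint {
    public static void main(String[] args) {
        String rules =
            "[r1: (?a <http://ex.org/MYDeduction> ?c) <- " +
            "     (?a <http://ex.org/MYPremise> ?c), print('I was run') ]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.HYBRID);
        Model data = ModelFactory.createDefaultModel();
        data.add(data.createResource("http://ex.org/a"),
                 data.createProperty("http://ex.org/MYPremise"),
                 data.createResource("http://ex.org/t"));
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        // The print only fires when a query touches the rule's consequent:
        inf.listStatements(null, inf.createProperty("http://ex.org/MYDeduction"),
                (RDFNode) null).forEachRemaining(System.out::println);
    }
}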

2. I created a forward rule (C2ForwardPrint) at the very beginning. If 
I add a triple as below


infgraph.performAdd(Triple.create(a, C2, t));

'check check forward'  will be printed

However, if I add the (a C1 t) triple above, nothing happens. I assume 
(a C1 t) generates (a C2 t) from backward rule C1, which then recursively 
calls the C2ForwardPrint rule to print something?


No. The hybrid reasoner is just feed-forward. The forward rules run over 
the data, make deductions and can generate additional backward rules.


When you ask a query of the model then the original data, the extra 
forward deductions plus the backward rules are used to answer it. The 
backward rules are only used on demand and they don't materialize any 
new deduction triples.


Summary: the forward rules don't "see" the results of any backward rules.


Is there anything wrong?


Well, it seems like it's working as expected. Whether that's what you need 
might be a different question :)


Dave




Re: Add rule at runtime, without trigger not-related rule?

2023-04-04 Thread Dave Reynolds

Hi,

Very mysterious. If I check the github repository as tagged for 4.2.0 I 
can't see an addRule method in BasicForwardRuleInfGraph. Clearly I'm 
missing something.


You can add rules to a reasoner but you would then have to rebuild the 
InfGraph/InfModel.


Like I say, you can add backward rules to a running hybrid engine, if 
that's any use but I don't believe there's any support for dynamically 
adding forward rules.


Dave

On 03/04/2023 19:52, L B wrote:

Hi Dave,

Thanks for the update.

The Jena version I am using is 4.2.0.

Checked Jena version 4.5.0. This API seems gone.

The system we are developing may have thousands of rules. To make the 
engine more efficient,  we want some rules to be loaded/unloaded 
dynamically.  For example, the driving related rules are loaded when the 
user is driving. When the user gets home, those driving rules will be 
unloaded.


In Jena 4.5 API, the reasoner has addRules API, but I do not know how to 
make it work.


//inf is the InfModel created already.
GenericRuleReasoner reasoner = (GenericRuleReasoner) inf.getReasoner();
reasoner.setMode(GenericRuleReasoner.HYBRID);
reasoner.setDerivationLogging(true);
reasoner.setTraceOn(true);

List<Rule> rules = Rule.parseRules(ruleStr);
reasoner.addRules(rules);
inf.prepare();

The rule added is never triggered. Is there any way we could add/remove 
rules programmatically without causing a rebuild of the InfModel and 
rechecking all the existing rules?






Re: Add rule at runtime, without trigger not-related rule?

2023-03-30 Thread Dave Reynolds
There's no support for dynamically adding forward rules to a "running" 
reasoner that I recall.


You can dynamically add backward rules to a hybrid reasoner, but that's 
not relevant here.


Your code example suggests you are calling addRule on a 
BasicForwardRuleInfGraph but, unless I'm missing something, that method 
is not in the jena source. Is it something you have added?


You can add rules to a reasoner but would then essentially have to build 
a new InfGraph or reset the engine state of the current infgraph. Either 
way, that's why you are seeing the original deduction retrigger.
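
To sketch that rebuild route (not tested; the rule text and names are
illustrative):

import java.util.List;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class AddRuleRebuild {
    public static InfModel withExtraRule(GenericRuleReasoner reasoner, Model base) {
        List<Rule> extra = Rule.parseRules(
            "[rule2: (?s <http://ex.org/Y> <http://ex.org/Z>) -> print('rule2 fired') ]");
        reasoner.addRules(extra); // extends the reasoner's rule set
        // The old InfModel won't see the new rules; build a fresh one so
        // the forward engine restarts cleanly over the base data.
        return ModelFactory.createInfModel(reasoner, base);
    }
}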


If you need to fire some action when adding triples to a model, and 
dynamically change the processing, I'd suggest using the listener 
machinery rather than the rule system.
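
And a sketch of the listener route (not tested; URIs invented):

import org.apache.jena.rdf.listeners.StatementListener;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;

public class ListenerRoute {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        m.register(new StatementListener() {
            @Override
            public void addedStatement(Statement s) {
                // Dispatch on exactly the triple that changed.
                System.out.println("added: " + s);
            }
        });
        m.add(m.createResource("http://ex.org/X"),
              m.createProperty("http://ex.org/Y"),
              m.createResource("http://ex.org/Z"));
    }
}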


Dave

On 28/03/2023 18:43, L B wrote:

I am currently still trying to figure out the solution.

1. If I do not call coreInfModel.prepare(), nothing happens during the
triple update, which is WAD, I believe.
2. When one triple is updated, all the rules whose conditions are true will
be triggered, rather than only the related rules. I am expecting the other
rules not to be triggered unless their related triples are updated. Is that
possible?


L B  于2023年3月24日周五 16:10写道:



1) Init code:

   GenericRuleReasoner reasoner = new GenericRuleReasoner(new ArrayList<Rule>());
   reasoner.setMode(GenericRuleReasoner.FORWARD);
   mCoreInfModel = ModelFactory.createInfModel(reasoner, mCoreModel);

2) Statement exists:
   (:A :B :C)

3) Rule exists and was triggered before:

   [ alwaysTrue:
       (:A :B :C)
       -> triggerAction
   ]

4) Now I am trying to add rules at runtime:

   [ rule2:
       (:X :Y :Z)
       -> triggerAction2
   ]

   InfModel infModel = getCoreInfModel();
   BasicForwardRuleInfGraph infGraph = (BasicForwardRuleInfGraph) infModel.getGraph();
   infGraph.addRule(rule);
   coreInfModel.prepare();

5) Add triple (:X :Y :Z):

   val infGraph = infModel.graph as BasicForwardRuleInfGraph
   infGraph.add(triple)

The problem: both "alwaysTrue" and "rule2" are triggered.

I am expecting only "rule2" to be triggered.

Is there any way that only "rule2" can be triggered, since the updated
triple is only (:X :Y :Z)?

Many thanks







Re: Who's using Tomcat 9 or earlier?

2023-02-23 Thread Dave Reynolds

On 22/02/2023 16:28, Andy Seaborne wrote:
Java EE is becoming Jakarta EE as part of the transfer to the Eclipse 
Foundation.


APIs javax.* become jakarta.*

This affects the Fuseki war file. The migration should not be 
user-visible for Fuseki as a runnable jar file.


Because the war file runs in a webapp container, the container execution 
environment matters.


Apache Tomcat version 10 switches from javax.* to jakarta.*.

For the runnable jars, it is a matter of renaming and switching to Jetty 11.
Jetty version 10 and Jetty version 11 are the same except for the 
package renaming.


Who's using Tomcat 9 or earlier to run the Fuseki warfile?
Is there anything stopping you upgrading to Tomcat 10?


We generally use Tomcat 9 or earlier due to the cost of migrating to 10 
(the sheer number of systems rather than a hard blocker). Though, to be 
fair, our current production use of fuseki is mostly based on the 
standalone version and not the war.


However, we make significant use of embedded fuseki in tests and in some 
cases do so with Spring Boot 2. Attempts to migrate some systems to 
Spring Boot 3 got mired in many backward incompatibilities so I don't 
see us managing to ditch SB 2 wholesale any time soon.



Has anyone tried the Tomcat-provided migration tool?
https://github.com/apache/tomcat-jakartaee-migration


No, sorry.

Dave


Re: How to implicitly integrate OWL ontology?

2023-02-06 Thread Dave Reynolds

On 06/02/2023 09:42, Yang-Min KIM wrote:
On Mon, Feb 6 2023 at 10:30:17 +0100, Lorenz Buehmann wrote:

SWRL


Dear Dave and Lorenz,

Thank you for your reply!
As I am a beginner in ontology, I do not yet know all the different 
terms, e.g. SWRL; I'm reading up on them!


My example (father-child, etc.) is not related to the link (Biolink 
Model); the Biolink Model is actually also provided in OWL (Turtle) format: 



I don't code in Java, but as advised by Dave, I'll look at Fuseki's 
"Inference" section.


Ah, if you don't code in java then ignore that bit of advice. Stick to 
using fuseki and just configuring it. But do start with a smaller example.


Dave




Re: How to implicitly integrate OWL ontology?

2023-02-06 Thread Dave Reynolds
To configure use of a reasoner with fuseki see 
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html 
under the section "Inference".


The reasoners are not graph-aware, so the union of your ontology and your 
instance data all needs to appear in the default graph - either by loading 
them there directly, or by loading them as separate graphs and setting the 
default union flag.
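
As an illustrative sketch only (not tested; the file names are
placeholders, and the usual route is the assembler configuration linked
above), the embedded equivalent looks like:

import org.apache.jena.fuseki.main.FusekiServer;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.Reasoner;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.riot.RDFDataMgr;

public class InferenceService {
    public static void main(String[] args) {
        Model schema = RDFDataMgr.loadModel("owl-b.ttl"); // placeholder names
        Model data   = RDFDataMgr.loadModel("a.ttl");
        Reasoner owl = ReasonerRegistry.getOWLMicroReasoner().bindSchema(schema);
        InfModel inf = ModelFactory.createInfModel(owl, data);
        // The InfModel becomes the default graph of the served dataset.
        FusekiServer.create().port(3330)
                .add("/ds", DatasetFactory.create(inf))
                .build().start();
    }
}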


However, the link you provide to your ontology doesn't match your prose 
example in any way at all. In particular it seems to be a mix of skos 
and linkml (whatever that is), and I see virtually no OWL in there.[*] 
Though it is a 1.3Mb turtle file, so who knows what's lurking. So on the 
face of it there's no OWL to reason over and you won't get any useful 
results.


My advice would be to isolate a smaller test example of the kind of 
reasoning you are trying to do and check it programmatically - see 
https://jena.apache.org/documentation/inference/index.html#OWLexamples
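
For instance, a small programmatic check (not tested; a simplified
owl:inverseOf example rather than your full model):

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.vocabulary.OWL;

public class SmallOwlCheck {
    public static void main(String[] args) {
        String ns = "http://ex.org/";
        Model schema = ModelFactory.createDefaultModel();
        Property hasChild  = schema.createProperty(ns + "hasChild");
        Property hasFather = schema.createProperty(ns + "hasFather");
        schema.add(hasFather, OWL.inverseOf, hasChild);

        Model data = ModelFactory.createDefaultModel();
        Resource john   = data.createResource(ns + "John");
        Resource monica = data.createResource(ns + "Monica");
        data.add(john, hasChild, monica);

        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getOWLMicroReasoner().bindSchema(schema), data);
        System.out.println(inf.contains(monica, hasFather, john)); // expect true
    }
}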


Then, if it seems like inference does work, you can tackle the separate 
problem of setting that up within fuseki.


Dave

[*] In particular there's no use of rdfs:subClassOf. There are 188 
owl:inverseOf statements but they are applied to things of type 
linkml:SlotDefinition, which makes no sense at all.


On 03/02/2023 10:16, Yang-Min KIM wrote:

Dear Jena community,

I hope your day is going great.
I have a question about ontologies: we want to query ontology data 
A that also imports another ontology, OWL-B.


e.g.

A includes:
 John is Male.
 John has a daughter called Monica.

OWL-B includes:
 Daughter is subclass of Children. (rdfs:subClassOf)
 If X is Male and has Children Y, X is father of Y. (owl:inverseOf)

What I want to query:
 Who is Monica's father?

Expected response:
 John


To get the expected response, Jena needs to include OWL-B and then handle 
implicit statements. However, I got results from explicit querying only: 
Who is John's daughter? -> Monica


I'm sure there is a solution since I see "The OWL reasoner" in 



Are there additional steps to include the ontology structure? (We are 
using Fuseki's REST API.)
Is it better to import OWL-B as a default graph or a named graph? And 
what if we have several OWL files to import?


P.S. here is an example of our OWL file, Biolink Model, downloadable via 



Thank you for your time.




Re: Builtin primitives in rule not triggered

2023-02-01 Thread Dave Reynolds

On 31/01/2023 22:25, L B wrote:

The current implementation only matches the triples when facts are
updated.  


Correct, that's kind of the point of it.


Do we have any plan to support built-in primitives?


? There are built-in primitives and you can write your own primitives. 
That doesn't change the structure of rule evaluation.
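
For example, a sketch of a user-defined primitive (not tested; the name is
invented):

import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

public class NotePair extends BaseBuiltin {
    @Override
    public String getName() { return "notePair"; } // notePair(?x, ?y) in rules

    @Override
    public int getArgLength() { return 2; }

    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        checkArgs(length, context);
        System.out.println("saw " + getArg(0, args, context)
                + " with " + getArg(1, args, context));
        return true; // succeed so the rest of the body can match
    }
}

Register it once with BuiltinRegistry.theRegistry.register(new NotePair())
and it can then be used like any built-in - but it still only runs when the
rule's triple patterns match, which is the point being made here.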



such as the rule below, where the rule would be evaluated when uri1 or 
uri2 is updated:

[rule1:
built-inName1(uri1, built-inName2(uri2)) -> 
]


What does it mean to "update a uri"? Or match on one for that matter?

RDF is a set of triples, uris are just one of the types of atom that can 
be used to construct triples but don't themselves change. It's only 
facts about them that change. So if you want to fire some rule when a 
fact about either uri1 or uri2 changes then that's just:


(uri1 ?p1 ?o1), (uri2 ?p2 ?o2), f(uri1, uri2, ?result) -> x

The rule systems in Jena were developed to enable deductions to be made 
over a set of triples. If the base triples change the rules are supposed 
to figure out the new deduced triples. That's it.


They were designed to be relatively efficient when you are just adding 
more facts, and monotonic. So when you add more facts (triples) only the 
new facts need to be processed and you only get more deduced triples 
added; no old deductions get removed. Over time we added some 
non-monotonic features (e.g. "remove", cough) because people found that 
useful, but the system wasn't designed with that in mind and those 
non-monotonic features can be clunky.


If you are doing something too far outside this design centre then, as I 
said before, you may be better off using the generic listener events.


Dave







Re: Builtin primitives in rule not triggered

2023-01-28 Thread Dave Reynolds
Once a rule has triggered it will only retrigger if the data changes in 
a way that affects that rule's bindings. So a rule which doesn't match 
anything on the LHS will not normally be retriggered.


The behaviour you are seeing is as expected for the rule system.

If your goal is to do something whenever data changes irrespective of 
what the change is then you either need a rule with a pattern that will 
match any data i.e. (?s ?p ?o) or don't use rules but use graph 
listeners via the GraphEventManager interface.
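
A sketch of the catch-all variant (not tested; URIs invented):

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class CatchAllRule {
    public static void main(String[] args) {
        // (?s ?p ?o) matches every triple, so the rule fires on each addition.
        String rules = "[logAll: (?s ?p ?o) -> print('changed: ', ?s, ?p, ?o) ]";
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.FORWARD);
        InfModel inf = ModelFactory.createInfModel(reasoner,
                ModelFactory.createDefaultModel());
        inf.add(inf.createResource("http://ex.org/a"),
                inf.createProperty("http://ex.org/b"),
                inf.createResource("http://ex.org/c")); // triggers logAll
    }
}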


Dave

On 27/01/2023 21:31, L B wrote:

Thanks for the test code.

Please correct me if I am wrong. Per the doc of listStatements:  Find all
the statements matching a pattern.

My problem is that when I update the facts, this rule is not triggered
(executed). I assumed this rule would be triggered every time the facts
are updated.

On the other hand, a leading triple pattern can bypass this. For example, with

[test1: (a b ?c) now(?x) -> print(\"now test\") ]

when you update the triple (a, b, xxx), the rule will be executed.

Lorenz Buehmann wrote on Thursday, 26 January 2023 at 23:13:


I cannot reproduce this. For example, the test code


public static void main(String[] args) {
    String raw = "<http://ex.org/a> <http://ex.org/p> <http://ex.org/b> .";
    Model rawData = ModelFactory.createDefaultModel();
    rawData.read(new StringReader(raw), null, "N-Triples");
    String rules = "[test1: now(?x) -> print(\"now test\") ]";
    Reasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
    InfModel inf = ModelFactory.createInfModel(reasoner, rawData);
    System.out.println("A * * =>");
    StmtIterator iterator = inf.listStatements(null, null, (RDFNode) null);
    while (iterator.hasNext()) {
        System.out.println(" - " + iterator.next());
    }
}


does in fact print "now test" to the console.


On 26.01.23 19:43, L B wrote:

test1: now(?x) -> print("now test")






Re: all(?P, ?D) function in doc

2023-01-13 Thread Dave Reynolds

There isn't an all function as such.

That's the notation for a structured value in the rule system. The rule 
system supports structured values which are essentially named tuples, 
using the syntax:


   name(value1, value2, ..., valueN)

Think of them as being like Prolog predicates, but limited to a single 
level. Since "predicate" means something else in RDF land, and "tuple" 
would also have been confusing, we called these structures functors, 
which may or may not have been a good choice of name.


Internally they are represented as a special kind of structured literal 
but they are not supposed to escape out into query results. They are 
just used to pass information between sets of rules.


So the allID rule is gathering all the information from the triples that 
make up an allValues restriction into one data structure, then the later 
allX rules are triggered off that data structure. Basically helps with 
performance and readability.
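
As a small illustration (a sketch only; the URIs are invented, and the
product builtin is from the standard builtin library), one rule can gather
values into a functor and a second rule can consume it:

[sizeID: (?x <http://ex.org/width> ?w), (?x <http://ex.org/height> ?h)
            -> size(?x, ?w, ?h)]
[sizeUse: size(?x, ?w, ?h), product(?w, ?h, ?a)
            -> (?x <http://ex.org/area> ?a)]

The size(...) functor never shows up in query results; it just carries the
bindings from the first rule to the second.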


Dave


On 13/01/2023 01:38, L B wrote:

In the official documentation, there is some example code, shown below.

https://jena.apache.org/documentation/inference/

My question is: where is the all() function defined?

[allID: (?C rdf:type owl:Restriction), (?C owl:onProperty ?P),
  (?C owl:allValuesFrom ?D)  -> (?C owl:equivalentClass all(?P, ?D)) ]

[all2: (?C rdfs:subClassOf all(?P, ?D)) -> print('Rule for ', ?C)
[all1b: (?Y rdf:type ?D) <- (?X ?P ?Y), (?X rdf:type ?C) ] ]

[max1: (?A rdf:type max(?P, 1)), (?A ?P ?B), (?A ?P ?C)
-> (?B owl:sameAs ?C) ]



Re: How to implement min/max cardinality in apache Jena Generic Rule Reasoner?

2022-09-18 Thread Dave Reynolds
If you just want a rule to fire if a min/max cardinality is declared 
then you have to match on the triples that make up the declaration 
syntax. So I guess (unchecked) something like:


(?myclass rdfs:subClassOf ?restriction) (?restriction 
owl:minCardinality 5) (?restriction owl:onProperty ?prop) ...
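
Filled out as a complete, still unchecked, sketch (the print action is just
a placeholder for whatever the rule should actually do):

[minCardDecl: (?myclass rdfs:subClassOf ?restriction),
    (?restriction rdf:type owl:Restriction),
    (?restriction owl:onProperty ?prop),
    (?restriction owl:minCardinality ?n)
    -> print('min cardinality ', ?n, ' declared on ', ?prop, ' for ', ?myclass)]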


If you want to test whether a given instance satisfies such a constraint 
then that's non-monotonic and not well suited to Jena rules. It's 
possible, but so painful as to not be worth bothering.


Dave

On 15/09/2022 13:21, Roman Shvetsov wrote:


Hello, guys!

I'm comparing performance across different reasoning engines.

For now I've got stuck on the problem of how to implement min/max 
cardinality rules for the generic rule reasoner.


With SWRL rules I could do it like this: “Rule: Subject(?p), 
(hasRelation min 5 owl:Thing)(?s) -> ReasonedType(?p)”. How can I do 
this with rules in generic rule reasoner?






Roman Shvetsov
Senior Data Scientist
www.bayanat.ai

Re: How to use reasoning in Jena Fuseki?

2022-07-02 Thread Dave Reynolds

Also note that:

   (?x rdfs:subClassOf ?y) and (?y rdfs:subClassOf ?z)

do not entail

   (?x rdf:type ?z)

Perhaps you meant you are starting with:

   (?x rdf:type ?y) and (?y rdfs:subClassOf ?z)

Dave

On 01/07/2022 15:14, Lorenz Buehmann wrote:
I guess you have to use an assembler file to configure reasoning, at 
least for anything beyond RDFS (for just RDFS Simple, you can start 
fuseki with --rdfs param)


Here is a blog post how to do this: 
https://apothem.blog/apache-jena-fuseki-adding-reasoning-and-full-text-search-capabilities-to-a-dataset.html 
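
For reference, the core of such an assembler file looks roughly like this
(an unverified sketch; node names like <#infModel> and the choice of the
OWL FB reasoner are placeholders to adapt):

@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .

<#service> a fuseki:Service ;
    fuseki:name         "ds" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:dataset      <#dataset> .

<#dataset> a ja:RDFDataset ;
    ja:defaultGraph <#infModel> .

<#infModel> a ja:InfModel ;
    ja:baseModel [ a ja:MemoryModel ] ;
    ja:reasoner  [ ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ] .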



On 01.07.22 15:48, Dương Hồ wrote:

Hi all !
I'm using the Jena Fuseki webapp but I can't find where to set up reasoning in the UI.
If I have (?x rdfs:subClassOf ?y) and (?y rdfs:subClassOf ?z),
how do I run reasoning to get (?x rdf:type ?z)?
Can you help me?



Re: [TDB2 & Fuseki 4.4.0] Huge tdb2 database disk size when performing incremental SPARQL update to endpoint.

2022-02-10 Thread Dave Reynolds

While I can't help with the substance of this question ...

> Since, as far as I know, the latest fuseki (4.4.0) no longer supports TDB1


I don't think that's correct. While there are new features of TDB2 in 
the new release (the faster loader) I don't believe TDB 1 has been 
deprecated let alone dropped.


Dave

On 10/02/2022 16:58, Cédric Viaccoz wrote:

Hello everyone,

I deploy a data treatment pipeline at the University of Geneva where a 
linked data platform, the Fedora Commons Repository 
(https://duraspace.org/fedora/) database, 
is loaded with researchers’ data, and then its RDF metadata is 
synchronized/uploaded to a fuseki triplestore. The synchronization tool 
I use is the fcrepo-indexing-triplestore messaging application from the 
fcrepo-camel-toolbox 
(https://github.com/fcrepo-exts/fcrepo-camel-toolbox), basically an 
Apache Camel application designed to synchronize Fedora with an external 
triplestore.


Since, as far as I know, the latest fuseki (4.4.0) no longer supports 
TDB1, I opted to migrate all the projects’ data to TDB2, meaning 
synchronizing the whole of the data from Fedora to Fuseki, this time 
making the camel app pointing to TDB2 based endpoints.



However, I noticed that the data volume as it is stored in fuseki in the 
“/databases” folder increased drastically in TDB2 compared 
to TDB1. For instance, a dataset which used to occupy 74Mb of data on 
TDB1 now weighs more than 11Gb! After some investigation I hypothesized 
that incremental insertion of triples into a TDB2 endpoint creates a bigger 
disk footprint than a single batch load (whereas in TDB1 both loading 
strategies lead to the same disk footprint).


It is quite tiresome to replicate my precise use case, because it 
requires deploying a Fedora repository and a camel application, so 
instead I included to this mail a zip containing a small sample of our 
data as a turtle file and a python script that “emulates” the behavior 
of the data synchronization between fedora and fuseki. If you create a 
persistent TDB2 dataset on your local fuseki listening on localhost port 
3030, and name this dataset “gypso”, then running the Python script 
“triplestore_incremental_update.py” will, for each single triple from 
the “gypso.ttl” file, send an INSERT DATA {} sparql query to the fuseki 
gypso/update endpoint. Please note that the Python script uses the 
package rdflib, so installing it through “pip install rdflib” beforehand 
might be necessary. On my Debian server, the resulting size of the 
database (can be checked by the Linux command “du -h 
/databases/gypso/Data-001”) was 50Mb, whereas directly 
uploading the “gypso.ttl” file to the endpoint results in a size of 
only 538Kb, even though the data and query performance is identical after 
either loading strategy.


I know that as a workaround I could serialize all the data from our 
infrastructure into compact turtle files and then directly upload them 
to TDB2 endpoints, but the data on the Fedora side gets updated regularly, 
so having the camel application taking care of doing automatic 
synchronization is necessary; besides, this was not an issue at all on 
TDB1. Would anyone have an idea what might be the culprit behind this 
behavior?


If you need additional details, by looking at the individual file size 
under “Data-001” I noticed that only the following files grow between 
the two different loading strategies : “SPO.idn”, “nodes.idn”, 
“nodes.dat”, “OSP.dat”, “POS.idn”, “OSP.idn”, “POS.dat” and “SPO.dat”. I 
also have included with this mail a screenshot displaying a side-by-side 
comparison of the size of the database files between gypso.ttl loaded 
incrementally on the left, and as a single file upload on the right. 
Hope this can give a more low-level view of the issue.


Best regards,

Cédric Viaccoz

*Concepteur-Développeur au sein du domaine fonctionnel “Recherche et 
Information Scientifique (RISe)”*


Division du système et des technologies de l'information et de la 
communication/ IT Services (DISTIC)


Université de Genève | 24 rue Général-Dufour | Bureau 338

Tél : +41 22 379 71 10



Re: How do you determine whether a triple from a query result has been inferred?

2021-12-06 Thread Dave Reynolds

On 06/12/2021 08:16, Simon Gray wrote:

I see, thanks for the swift reply.

Would you agree that a decent approach could be making a secondary query into a 
graph of the raw data, comparing the result sets using set 
difference/intersection? The main issue I see with this approach is that the 
execution time effectively becomes N*M which could be a problem for 
particularly heavy queries.


Yes, you would need to test each result triple for presence in the base 
graph. I'm not sure what the indexing structure of in-memory graphs in 
Jena is these days, but hopefully the index is good enough to bring that 
down to more like m·log(n).


If this operation is a significant bottleneck then you could, in 
principle, use a bloom filter to speed up the average membership test.
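
A sketch of that test (variable names are invented; getRawModel() returns
the unreasoned base data of an InfModel):

Model base = infModel.getRawModel();
StmtIterator it = infModel.listStatements();
while (it.hasNext()) {
    Statement s = it.next();
    boolean inferred = !base.contains(s);
    // flag s as inferred in the UI here
}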


Dave




On 6 Dec 2021 at 09.08, Dave Reynolds wrote:

On 06/12/2021 07:57, Simon Gray wrote:

I would like to display inferred triples differently in my UI, but I’m unsure 
how to programmatically ascertain which triple is from inference and which is 
raw data. The only way I can think of that might work would be to make a 
secondary look-up in the raw data graph and compare the difference of the 
result sets, but I thought maybe there is a utility method or some kind of 
attached metadata I could use on the individual triple instead.


Sorry no, there's no triples-with-inference-metadata API in jena.

There is the derivations API but that's just for drilling down into single 
deductions and is way more expensive than a simple test against the base model.

With forward inference the deductions are stored separately from the base 
triples (getDeductionsGraph()) but that doesn't really help you.

Dave






Re: How do you determine whether a triple from a query result has been inferred?

2021-12-06 Thread Dave Reynolds

On 06/12/2021 07:57, Simon Gray wrote:

I would like to display inferred triples differently in my UI, but I’m unsure 
how to programmatically ascertain which triple is from inference and which is 
raw data. The only way I can think of that might work would be to make a 
secondary look-up in the raw data graph and compare the difference of the 
result sets, but I thought maybe there is a utility method or some kind of 
attached metadata I could use on the individual triple instead.


Sorry no, there's no triples-with-inference-metadata API in jena.

There is the derivations API but that's just for drilling down into 
single deductions and is way more expensive than a simple test against 
the base model.


With forward inference the deductions are stored separately from the 
base triples (getDeductionsGraph()) but that doesn't really help you.


Dave




Re: abox, tbox for owl in jena

2021-11-05 Thread Dave Reynolds

On 04/11/2021 09:34, Luis Enrique Ramos García wrote:

Dear All,

I found this information regarding the use of the so-called ABox and TBox
with a reasoner in Jena:
https://jena.apache.org/documentation/ontology/

However, I wonder if there is the possibility, after creating and
populating an ontology with instances, to split the ontology into
ABox and TBox?


There's certainly no built in operation in Jena to make that sort of 
separation. In principle someone could develop such a thing but not sure 
how useful it would be - especially given how some communities use 
classes for things that others would regard as instance data.



I also wonder if loading these separated graphs to a reasoner would be
more or less efficient in computational terms, in comparison to loading
both (ABox and TBox) in just one file or ontology, or does it not
make any difference?


Would make very little difference - the separation would slow things 
down fractionally but not enough to notice.


Dave


Re: Reference Data

2021-11-02 Thread Dave Reynolds
This isn't really a question about Jena usage but about application and 
data design.


Typically for reference data such as geography you would want your 
reference data to be linkable to existing managed IDs for these things. 
At least for UK geography there's a wealth of well-maintained identifier 
schemes including support and open data resources for using them in 
linked data.


There's the OS Linked data service which provides URIs and descriptions 
for both administrative geography (from Boundary Line) and places names 
from both OS Names and the 50K Gazetteer. These URIs build on and link 
to the OS identifiers like TOIDS. https://data.ordnancesurvey.co.uk/


Then for statistical geography there's the ONS linked data service which 
provides URIs that build on and link to GSS codes: 
https://www.ons.gov.uk/methodology/geography/geographicalproducts/geographylinkeddata


So you at least have the option to reuse either or both of these schemes 
and include in your store whatever level of description of these 
entities you need for your purpose. The whole of the OS linked data, 
excluding postcodes (CodePoint open) is only about 30M triples.


Dave


On 02/11/2021 16:46, Matt Whitby wrote:

I'll try and keep my question relatively succinct if I can.

The top level question is we're trying to decide whether to have reference
data within the triplestore, or whether we have it externally in a standard
relational database.

Wikidata implements each SUBJECT as a URI (Q Code), which we would assume
is allocated a number in an RDBMS, etc. somewhere. It then resolves the
code back to a label with its label service. We can certainly do this, but
it's an overhead to have to resolve all the names back.

Alternatively, do we have - say - our County, District, Parish data entries
within the triplestore? So, if we go that route how would we construct a
URI without going outside of Jena?

We can't have a URL along the lines of:
www.test.com/schema/spatial/parish/abberton because Parish names are
non-unique.

I hope that makes some semblance of sense.

Matt.



Re: Suggestions for OWL_MEM_MICRO_RULE_INF ConcurrentModificationException?

2021-11-01 Thread Dave Reynolds

On 01/11/2021 17:18, Steve Vestal wrote:

Thanks, this started me down the path to a solution.

For the record, I searched through my code, and I could not find 
multiple threads accessing the model.  This was occurring during a 
SPARQL query of an OntModel.  Is it possible ARQ uses threads? Below is 
what I tried, which it seems does add triples to the model. 


Yes, createClass will add a statement so that the given resource is 
indeed a class; in this case presumably a triple of the form 
(thisClassResource rdf:type owl:Class).


So you are adding statements to a model that you are iterating over; the 
reasoner will be testing for things like type statements, hence the CME. 
No threads needed.
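
One way round it is to exhaust the iterator before touching the model,
e.g. (a sketch reusing the variable names from your code below; "cls"
stands in for your STRUCTURE_CLASS_VAR, and the usual java.util and Jena
imports are assumed):

List<Resource> found = new ArrayList<>();
ResultSet selectResult = selectExec.execSelect();
while (selectResult.hasNext()) {
    found.add(selectResult.next().getResource("cls"));
}
// the iterator is finished, so modifying the model is now safe
for (Resource r : found) {
    ontoModel.createClass(r.getURI());
}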


I still got 
a ConcurrentModificationException at the second iteration of the 
selectResult.hasNext() call, with these transaction statements added. 



When I commented-out the createClass (and its subsequent use), the 
exception disappeared. I could restructure the code so that the recovery 
of the Resource as an OntClass happened outside this loop.  I'm not sure 
why it didn't recognize that thisClassResource was an OntClass in the 
ontoModel, though.


Presumably the resources found by your query aren't (or aren't all) 
explicitly declared as instances owl:Class.


Dave



    QueryExecution selectExec = QueryExecutionFactory.create(textSelectQuery, ontoModel);

    Dataset selectData = selectExec.getDataset();
    selectData.begin(ReadWrite.WRITE);
    ResultSet selectResult = selectExec.execSelect();
    while (selectResult.hasNext()) {
        QuerySolution selectSolution = selectResult.next(); // declaration implied by the use below
        RDFNode thisClassNode = selectSolution.get(STRUCTURE_CLASS_VAR);
        Resource thisClassResource = thisClassNode.asResource();
        //OntClass thisClass = thisClassNode.as(OntClass.class); // didn't work for me.
        OntClass thisClass = ontoSetModel.createClass(thisClassResource.getURI()); // attempted workaround for above
        selectData.commit();
        // other stuff...
    }
    selectData.end();  // Doesn't get this far

On 10/29/2021 4:00 AM, Andy Seaborne wrote:

Steve,

Is your usage multithreaded?  If so, you'll need to make sure that 
usage is multi-reader or single-writer.


Using the Jena transaction mechanism is best - transactions work with 
datasets and choose the best implementation for the datasets.  For ones 
containing inference, that's MRSW locking.


Another approach is to not reuse the inf model but to create a new 
one. Any operation gets the model to work with from some 
AtomicReference<>.


Then outstanding threads finish what they are doing with the old setup 
while new requests get the new view. The garbage collector will reclaim 
space sometime after all the outstanding old operations have finished.


    Andy

On 28/10/2021 13:26, Steve Vestal wrote:
Does anyone have any suggestions on things to try to avoid a 
ConcurrentModificationException when using 
OWLReasoner.OWL_MEM_MICRO_RULE_INF?  Or what the potential 
consequences of that are?  (The below stack dump only goes back to 
where my code made the call, the full one is fairly lengthy and full 
of Eclipse stuff as well as mine.  This is Jena 3.17.)


I am doing something a bit odd.  There is one imported model that 
gets reloaded from time-to-time, at the end of which I do an 
ontModel.getBaseModel().rebind().  (The overall intent is sort of a 
backwards base v schema workflow, where it is a small set of 
definitions and axioms applied to the same big base model that get 
changed.)  I get this exception shortly after doing a reload/rebind, 
such as the first one or few queries (as in this stackdump).  After 
that things seem to work OK. I only get the one exception after a 
reload/rebind.  I'd still like to (someday) understand what I'm 
doing, though.


Openllet/Pellet doesn't do this, but that is overkill and noticeably 
slower for many workflows.


There is some punning done in the big base model, but works OK in 
many workflows.  This is the only case where I have seen anything 
other than a few "not supported" warnings.


java.util.ConcurrentModificationException
     at 
org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.checkCME(LPTopGoalIterator.java:248) 

     at 
org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:222) 

     at 
org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90) 

     at 
org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90) 

     at 
org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55) 

     at 
org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90) 

     at 
org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55) 

     at 
org.apache.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:143) 

     at 
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114) 

     at 

Re: How to pun a URI?

2021-10-25 Thread Dave Reynolds

On 24/10/2021 20:33, Steve Vestal wrote:

I tried the following.  All get a ConversionException.

     //public static final OntModelSpec INFERENCE_RULES = 
OntModelSpec.OWL_DL_MEM;
     public static final OntModelSpec INFERENCE_RULES = 
OntModelSpec.OWL_MEM;
     //public static final OntModelSpec INFERENCE_RULES = 
OntModelSpec.OWL_MEM_MICRO_RULE_INF;
     //public static final OntModelSpec INFERENCE_RULES = 
OntModelSpec.OWL_MEM_MINI_RULE_INF;
     //public static final OntModelSpec INFERENCE_RULES = 
OntModelSpec.OWL_LITE_MEM;


Odd. I had tried what I thought was your test case before answering.

OntModel m = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

String uri = "http://ex.org/A";
OntClass cls = m.createClass(uri);
cls.createIndividual(uri);
System.out.println("Test model " + uri);
m.write(System.out, "Turtle");

works for me with current Jena. Giving:

Test model http://ex.org/A
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<http://ex.org/A>  rdf:type  <http://ex.org/A> , owl:Class .

Dave



On 10/24/2021 1:28 PM, Dave Reynolds wrote:

On 24/10/2021 12:42, Steve Vestal wrote:

I omitted a detail that seems to be important.

                     //OntModel m = 
ModelFactory.createOntologyModel();
                     OntModel m = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);

                     String uri = "http://ex.org/A";
                     OntClass cls = m.createClass(uri);
                     cls.createIndividual(uri);
                     System.out.println("Test model " + uri);
                     m.write(System.out, "Turtle");

If I run the version without an OntModelSpec, it works.  If I run the 
version with OntModelSpec.OWL_DL_MEM, then I get a 
ConversionException. I am running Jena 3.17. 


Use the OWL_MEM* OntModelSpecs not OWL_DL_MEM. OWL_MEM is set up to 
support OWL Full and so allows punning.


OWL_MEM itself does not include any reasoner configuration, but then 
none is needed for this test case. The OWL_MEM rule reasoner 
configurations support various subsets of OWL Full so are also 
compatible with punning.


The Javadoc says the default is a "weak reasoner."  I may be able to 
work around this in this one app, but I'm wondering what will happen 
when the model is later loaded with another choice of reasoner such 
as OWL_DL_MEM or Openllet.


OWL_DL_MEM doesn't include any reasoner, I can't speak to what 
Openllet supports in the way of punning.


Dave


Re: How to pun a URI?

2021-10-24 Thread Dave Reynolds

On 24/10/2021 15:16, Steve Vestal wrote:
I searched a bit for a way to declare a HasKey axiom.  Does this mean 
that is a futile exercise?


As Lorenz says, there is no API or reasoner support in Jena for OWL 2 so 
no there's no convenient API for declaring a HasKey axiom.


HasKey, like all OWL axioms, is expressed syntactically as just a set of 
RDF statements so you could manually create the correct triples using 
the RDF API which would allow you to create a model that you could 
export to other tools. However, there's no reasoner support in Jena 
itself for it.


Dave


On 10/24/2021 6:57 AM, Lorenz Buehmann wrote:
yeah, punning is a new feature of OWL 2; Jena only supports OWL 1 
and some parts of OWL 2. Whether there is a workaround or room for a 
feature request, Dave Reynolds knows better for sure.


On 24.10.21 13:48, Martynas Jusevičius wrote:

Could this be the reason?
“ OWL DL includes all OWL language constructs with restrictions such as
type separation (a class can not also be an individual or property, a
property can not also be an individual or class).”

https://www.w3.org/TR/owl-guide/

On Sun, 24 Oct 2021 at 13.43, Steve Vestal 


wrote:


I omitted a detail that seems to be important.

  //OntModel m = 
ModelFactory.createOntologyModel();

  OntModel m =
ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);
  String uri = "http://ex.org/A";
  OntClass cls = m.createClass(uri);
  cls.createIndividual(uri);
  System.out.println("Test model " + uri);
  m.write(System.out, "Turtle");

If I run the version without an OntModelSpec, it works.  If I run the
version with OntModelSpec.OWL_DL_MEM, then I get a ConversionException.
I am running Jena 3.17.  The Javadoc says the default is a "weak
reasoner."  I may be able to work around this in this one app, but I'm
wondering what will happen when the model is later loaded with another
choice of reasoner such as OWL_DL_MEM or Openllet.

On 10/20/2021 5:30 AM, Lorenz Buehmann wrote:

OntModel m = ModelFactory.createOntologyModel();
String uri = "http://ex.org/A";
OntClass cls = m.createClass(uri);
cls.createIndividual(uri);
m.write(System.out, "Turtle");


Re: How to pun a URI?

2021-10-24 Thread Dave Reynolds

On 24/10/2021 12:42, Steve Vestal wrote:

I omitted a detail that seems to be important.

                     //OntModel m = ModelFactory.createOntologyModel();
                     OntModel m = 
ModelFactory.createOntologyModel(OntModelSpec.OWL_DL_MEM);

                     String uri = "http://ex.org/A";
                     OntClass cls = m.createClass(uri);
                     cls.createIndividual(uri);
                     System.out.println("Test model " + uri);
                     m.write(System.out, "Turtle");

If I run the version without an OntModelSpec, it works.  If I run the 
version with OntModelSpec.OWL_DL_MEM, then I get a ConversionException. 
I am running Jena 3.17.  


Use the OWL_MEM* OntModelSpecs not OWL_DL_MEM. OWL_MEM is set up to 
support OWL Full and so allows punning.


OWL_MEM itself does not include any reasoner configuration, but then 
none is needed for this test case. The OWL_MEM rule reasoner 
configurations support various subsets of OWL Full so are also 
compatible with punning.


The Javadoc says the default is a "weak 
reasoner."  I may be able to work around this in this one app, but I'm 
wondering what will happen when the model is later loaded with another 
choice of reasoner such as OWL_DL_MEM or Openllet.


OWL_DL_MEM doesn't include any reasoner, I can't speak to what Openllet 
supports in the way of punning.


Dave


Re: Subclass caching has some problems on Fuseki startup

2021-08-29 Thread Dave Reynolds

On 27/08/2021 21:09, Brandon Sara wrote:
I’ve finally tracked down the problem (at least at a high level). When using the Transitive Reasoner, there is a block of code which caches all sub class triples (https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/reasoner/transitiveReasoner/TransitiveEngine.java#L316-L326). Part of this code searches for all sub properties of `subClassOf` and begins caching triples for those sub-properties. In my situation, I’ve added `owl:equivalentClass` manually (since only TransitiveReasoner` is being used) and manually made it a sub property of `subClassOf`. The data that I’m uploading right now has a lot of equivalent class triples (~>300k). It seems, if I’m understanding the code correctly as I’ve been debugging it, that not only is the triple cached…but a traversal of many other triples occurs when the caching occurs for even a single triple, is that correct? This would explain why (1) it never seems to finish what it is doing and (2) the memory grows very, very large while doing it. I ran a single query last night and after more than 6 hours, 8 CPUs, and 20GB of RAM, it still never finished loading the cache. It seems as though that the runtime of this could be exponential in nature. 


Indeed it can be expensive. The transitive reasoner is doing a 
transitive reduction (finding direct links) not just a transitive 
closure. If I remember correctly this is somewhere between quadratic and 
cubic (something like O(|V|(|V| + |E|)) in the best case). It uses a 
standard but rather old algorithm for this but I think the algorithm is 
still polynomial not exponential. However, (a) there could be some cases 
that throw it off and (b) at 20m records even quadratic would be 
high cost, and it's likely to be closer to a power of 2.5.



My dataset is well over 20 million records (maybe even more, I still haven’t 
gotten a full count yet, but I know for a fact that it is well over 10 million 
and believe it to be well more than 20 million). Like I’ve mentioned before, 
there are basically no individuals in the dataset, it’s all ontology because it 
is health care industry coding systems and classifications.



Another strange thing, which I’ve mentioned before, is that I don’t have any of 
these issues when I initially load the data, I can load everything with just 4 
GB of RAM, it loads in a reasonable amount of time, and I can submit queries of 
pretty much any complexity after the upload is complete with no issues, and 
they are very fast too. This only occurs when the server has been restarted and 
the first query that actually pulls something from the dataset (I.E. not an 
empty query) is submitted (no matter how simple or complex that query may be).


Can't explain that.


Is this a bug, or should `owl:equivalentClass` work without my own manual 
specification of it?


Depends what you mean by "work".

I'm afraid I've not been following your earlier posts so I'm not clear 
on what you are trying to achieve.


If you want to deduce new subClassOf (and perhaps equivalentClass) 
relationships from a mix of subClassOf and equivalentClass assertions 
just using the transitive reasoner, with no rule engine, then you would 
have to insert that manual relationship.
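
Concretely, that manual assertion is the single triple (Turtle):

owl:equivalentClass rdfs:subPropertyOf rdfs:subClassOf .

which, as I understand it, is what you have already done; it tells the
transitive reasoner to fold equivalentClass links into its subClassOf
caching.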


If you just want to store and retrieve the equivalentClass relationships 
with no inference then clearly you wouldn't need that extra assertion.


Dave


Re: Inference rule selection for dummies?

2021-08-25 Thread Dave Reynolds

Hi Simon,


So now I’ve instantiated a GenericRuleReasoner and mutated it in a similar 
fashion. There is one bit I’m unsure about which is the implementation of the 
bind method that I had copied from the OWLMicroReasoner:


@Override
public InfGraph bind(Graph data) throws ReasonerException {
 InfGraph graph = super.bind(data);
 ((FBRuleInfGraph)graph).setDatatypeRangeValidation(true);
 return graph;
}

This method seems to mutate the graph before returning. I wonder if it is 
necessary for me to even retain this functionality... and how would I go about 
doing that using a GenericRuleReasoner?


This is a feature/hack to (as the name suggests) do some validation of 
OWL data type ranges in a way that can't be conveniently done in the 
rules. Given your goals of just supporting inverse reasoning you can 
safely ignore this.


Dave


Re: Inference rule selection for dummies?

2021-08-23 Thread Dave Reynolds

On 23/08/2021 08:16, Simon Gray wrote:

Thanks Dave, that is great to know!

Rght now I’m using one of the built-in OntModelSpec instances, calling 
`setBaseModelMaker` and `setImportModelMaker` on it with an instance created by 
`ModelFactory.createMemModelMaker` as the argument. It is my understanding that 
I will then also have to extend (or implement) both a Reasoner, e.g. 
org.apache.jena.reasoner.rulesys.OWLMicroReasoner, as well as a 
ReasonerFactory, e.g. OWLMicroReasonerFactory to make fresh OntModelSpec 
instance.


You shouldn't need to extend or implement a Reasoner.

I think you can simply create a reasoner instance - instantiate a 
GenericRuleReasoner with the rules you want - and can then use 
setReasoner on your copy of the OntModelSpec to install it for use.
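
For example, a sketch of that wiring (using the inverse-only rule from
earlier in this thread; baseModel is whatever base Model you already have):

import org.apache.jena.ontology.*;
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.rulesys.*;

String rules =
    "[inverseOf2: (?P owl:inverseOf ?Q) -> table(?P), table(?Q), " +
    "    [inverseOf2b: (?X ?P ?Y) <- (?Y ?Q ?X)] ]";
GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM); // copy, then customise
spec.setReasoner(reasoner);
OntModel m = ModelFactory.createOntologyModel(spec, baseModel);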


Note that a reasoner can be reused multiple times - it's the 
InfModel/InfGraph which holds all the reasoner state for a given 
underlying base Model/Graph. The reasoner is just a reusable engine.


If you find that inelegant then you could indeed install a 
ReasonerFactory (either the GenericRuleReasonerFactory passing in a 
configuration model or create your own instance of a ReasonerFactory) 
but I don't think that's necessary and certainly don't need to create a 
new Reasoner class.


Dave




I wonder if this is the way to go?


On 20 Aug 2021 at 17.24, Dave Reynolds wrote:


To create a reasoner with just a few rules of your choice then use the
GenericRuleReasoner. See
https://jena.apache.org/documentation/inference/index.html#rules

You can write your own rules or pick and choose from those in the source
code you've found.

To only compute the inverse relationships, and do so on demand, you
probably just need:

[inverseOf2: (?P owl:inverseOf ?Q) -> table(?P), table(?Q),
[inverseOf2b: (?X ?P ?Y) <- (?Y ?Q ?X)] ]

That is a "hyrid" rule (mix of forward and backward) see the docs, but
that's the default for GenericRuleReasoner anyway.

Dave

On 20/08/2021 14:01, Simon Gray wrote:

Hi everyone,

I'm on Jena 3.14 experimenting with the built-in OntologySpecs.

My main requirement is inferring reverse relations (when applicable). Only the 
OWL reasoners do this, but they also infer so many more triples which balloons 
the size of the InfModel so much that it is impossible for me to infer every 
single triple within the memory constraints of my laptop. I have 16GB 
available, with 12GB set aside for the JVM heap, but every attempt at copying 
all inferred triples to a TDB2 model eventually fails with an OutOfMemoryError. 
The dataset is a modest-sized WordNet.

I wonder if it's possible for me to create an OntologySpec with a Reasoner that 
is a bit more basic, possibly only inferring reverse relation triples. Looking 
into this, it seems like one needs knowledge of the Logic DSL used to define 
the various built-in reasoners...? Or at least I never found an easy way to 
create a Reasoner that just does this.

Maybe it's possible to use one of the built-in OWL reasoners and simply disable 
some functionality? I'm not sure how, though. The only way I've found looking 
at the source code and the documentation seems to be editing a .rules file 
which contains the aforementioned logic DSL.

Kind regards
Simon






Re: Inference rule selection for dummies?

2021-08-20 Thread Dave Reynolds
To create a reasoner with just a few rules of your choice then use the 
GenericRuleReasoner. See 
https://jena.apache.org/documentation/inference/index.html#rules


You can write your own rules or pick and choose from those in the source 
code you've found.


To only compute the inverse relationships, and do so on demand, you 
probably just need:


[inverseOf2: (?P owl:inverseOf ?Q) -> table(?P), table(?Q), 
[inverseOf2b: (?X ?P ?Y) <- (?Y ?Q ?X)] ]


That is a "hyrid" rule (mix of forward and backward) see the docs, but 
that's the default for GenericRuleReasoner anyway.


Dave

On 20/08/2021 14:01, Simon Gray wrote:

Hi everyone,

I'm on Jena 3.14 experimenting with the built-in OntologySpecs.

My main requirement is inferring reverse relations (when applicable). Only the 
OWL reasoners do this, but they also infer so many more triples which balloons 
the size of the InfModel so much that it is impossible for me to infer every 
single triple within the memory constraints of my laptop. I have 16GB 
available, with 12GB set aside for the JVM heap, but every attempt at copying 
all inferred triples to a TDB2 model eventually fails with an OutOfMemoryError. 
The dataset is a modest-sized WordNet.

I wonder if it's possible for me to create an OntologySpec with a Reasoner that 
is a bit more basic, possibly only inferring reverse relation triples. Looking 
into this, it seems like one needs knowledge of the Logic DSL used to define 
the various built-in reasoners...? Or at least I never found an easy way to 
create a Reasoner that just does this.

Maybe it's possible to use one of the built-in OWL reasoners and simply disable 
some functionality? I'm not sure how, though. The only way I've found looking 
at the source code and the documentation seems to be editing a .rules file 
which contains the aforementioned logic DSL.

Kind regards
Simon




Re: Precomputing OWL inferences

2021-07-09 Thread Dave Reynolds

On 08/07/2021 10:05, Simon Gray wrote:

So I have a follow-up question...

What I really want is an updatable graph that persists on disk as TDB and then 
an expanded view that contains all of the inferred triples too - this may very 
well be an in-memory graph. Basically, I want to be able to add to the 
underlying TDB graph and then rely on inference to create additional triples, 
keeping a separation. I am not interested in persisting any inferred triples to 
a new TDB like some of the replies here assume. To me, the advantage of 
inference is having the flexibility of expanding the dataset on-demand while 
having a separation between man-made, curated triples and some varying set of 
inferred triples.

Is this at all possible to do with Jena?


Possible but not necessarily performant or convenient.

A major limitation of the Jena inference support is that it is in-memory 
only. There's no mechanism to persist/reload the internal state of the 
inference engines, you can only query for the resulting materialized 
triples and persist those as discussed on this thread. And the inference 
scaling is limited to memory, whether or not the base data is held on 
scalable persistent storage.


So you *can* create an inference model over a TDB model and updates made 
through the inference model will be persisted by the TDB base model, and 
also result in new inferences. However, the inference engine will be 
querying TDB for every query made by the rules. The performance of that 
will be much worse than performance of a purely in-memory configuration. 
When you first start up your service the first query (or any explicit 
initial prepare() call) will be very slow. After that, once the forward 
inferences have been completed, performance should be better but still 
significantly slower than a purely in-memory solution.


Depending on your application structure and scale of data you may be 
able to run with a dual in-memory-with-reasoning and 
copied-to-TDB-for-persistence architecture. Where on start up you copy 
the TDB data to the memory InfModel once and updates are written to both 
copies. That would still have high start up latency but not as high.


What you want to do is entirely reasonable but not well supported by 
Jena inference as it stands.
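
For the copy-to-TDB step of that architecture, a sketch (TDB2 shown; the
location string is illustrative):

import org.apache.jena.query.*;
import org.apache.jena.tdb2.TDB2Factory;

Dataset ds = TDB2Factory.connectDataset("/data/materialized");
ds.begin(ReadWrite.WRITE);
try {
    // enumerating the InfModel forces full materialization of the inferences
    ds.getDefaultModel().add(infModel);
    ds.commit();
} finally {
    ds.end();
}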


Dave


On 5 Jul 2021 at 10.38, Dave Reynolds wrote:

On 05/07/2021 08:03, Simon Gray wrote:

Thank you for that answer, Dave! I think this provides the missing link in my 
understanding of the matter.
Is there a single method call to use when copying the inference model to a 
plain model or do I need to make copies of every triple myself and add them to 
a new model?


Model.add does it for you, so you should just need something like:

plain.add( infModel );

and it will enumerate all triples and add them to the new model. Potentially 
taking some time!

Dave


On 3 Jul 2021 at 18.34, Dave Reynolds wrote:


On 02/07/2021 13:29, Simon Gray wrote:

Hmm… I am not sure how my rules are modeled. I just use the built-in 
OWL_MEM_MICRO_RULE_INF OntModelSpec.
Anyway, my question is still this: how do I get all of those inferences 
computed *before* I start querying the Model. It’s great if I can just store 
them later, but I still need to *compute* them before I can think about 
persisting anything. Running a single query doesn’t seem to compute them all, 
just relevant ones to that specific query… I think?


Short answer is there's no built in way to precompute everything that's 
precomputable for the OWL reasoners other than that which the others have 
pointed out - copy the inferred model.

The OWL rules use a mix of forward and backward reasoning. The forward 
reasoning can all be invoked in one go via prepare() but the backward reasoning 
is mostly done on demand. Some of the backward rules are tabled/memoized so 
once they've been run once future runs are supposed to be quicker. Others are 
always run on demand.

If you have a few particular query patterns then to warm up the relevant 
memoization run those queries.

The most comprehensive way to ensure everything has been computed is to copy 
the model to a plain model (in memory or persistent). That copy is essentially 
running the query (?s ?p ?o) and will compute everything the rules can reach. 
After that the inference model is as warm as it's going to get. But since at 
that point you've already materialized everything you might as well keep the 
materialized copy as the others have said.

There'd be nothing to stop you doing the general query (e.g. via an unbounded 
listStatements() call) and throwing the results away. That *could* be 
beneficial if the materialized model is too big but the tabling/memoization is 
proving useful and smaller - but no guarantees.

Dave


On 2 Jul 2021 at 14.06, Lorenz Buehmann wrote:

But can't you do this inference just once and then somewhere store those 
inferences? Next time you can simply load the inferred model instead of the raw 
dataset

Re: Precomputing OWL inferences

2021-07-05 Thread Dave Reynolds

On 05/07/2021 08:03, Simon Gray wrote:

Thank you for that answer, Dave! I think this provides the missing link in my 
understanding of the matter.

Is there a single method call to use when copying the inference model to a 
plain model or do I need to make copies of every triple myself and add them to 
a new model?


Model.add does it for you, so you should just need something like:

plain.add( infModel );

and it will enumerate all triples and add them to the new model. 
Potentially taking some time!


Dave


On 3 Jul 2021 at 18.34, Dave Reynolds wrote:


On 02/07/2021 13:29, Simon Gray wrote:

Hmm… I am not sure how my rules are modeled. I just use the built-in 
OWL_MEM_MICRO_RULE_INF OntModelSpec.
Anyway, my question is still this: how do I get all of those inferences 
computed *before* I start querying the Model. It’s great if I can just store 
them later, but I still need to *compute* them before I can think about 
persisting anything. Running a single query doesn’t seem to compute them all, 
just relevant ones to that specific query… I think?


Short answer is there's no built in way to precompute everything that's 
precomputable for the OWL reasoners other than that which the others have 
pointed out - copy the inferred model.

The OWL rules use a mix of forward and backward reasoning. The forward 
reasoning can all be invoked in one go via prepare() but the backward reasoning 
is mostly done on demand. Some of the backward rules are tabled/memoized so 
once they've been run once future runs are supposed to be quicker. Others are 
always run on demand.

If you have a few particular query patterns then to warm up the relevant 
memoization run those queries.

The most comprehensive way to ensure everything has been computed is to copy 
the model to a plain model (in memory or persistent). That copy is essentially 
running the query (?s ?p ?o) and will compute everything the rules can reach. 
After that the inference model is as warm as it's going to get. But since at 
that point you've already materialized everything you might as well keep the 
materialized copy as the others have said.

There'd be nothing to stop you doing the general query (e.g. via an unbounded 
listStatements() call) and throwing the results away. That *could* be 
beneficial if the materialized model is too big but the tabling/memoization is 
proving useful and smaller - but no guarantees.

Dave


On 2 Jul 2021 at 14.06, Lorenz Buehmann wrote:

But can't you do this inference just once and then somewhere store those 
inferences? Next time you can simply load the inferred model instead of the raw 
dataset. It is not specific to TDB, you can load dataset A, compute the 
inferred model in a slow process once, materialize it as dataset B, and later 
on always work on dataset B - this is standard forward chaining with writing 
the data back to disk or database. Can you try this procedure, maybe it works 
for you?

Indeed this won't work if your rules are currently modeled as backward 
chaining rules, as those are always computed at query time.


On 02.07.21 13:37, Simon Gray wrote:

Thank you Lorenz, although this seems to be a reply to my side comment about 
TDB rather than the question I had, right?

The main issue right now is that I would like to use inferencing to get e.g. 
inverse relations, but doing this is very slow the first time a query is run, 
likely due to some preprocessing step that needs to run first. I would like to 
run the preprocessing step in advance rather than running it implicitly.


On 2 Jul 2021 at 13.30, Lorenz Buehmann wrote:

you can just add the inferred model to the dataset, i.e. add all triple to your 
TDB. Then you can disable the reasoner afterwards or just omit the rules that 
you do not need anymore

On 02.07.21 13:13, Simon Gray wrote:

Hi there,

I’m using Apache Jena from Clojure to create a new home for the Danish WordNet. I 
use the Arachne Aristotle library + some additional Java interop code of my own.

I would like to use OWL inferencing to query e.g. transitive or inverse 
relations. This does seem to work fine although I’ve only tried using the 
supplied in-memory model for now (and it looks like I will have to create my 
own instance of a ModelMaker to integrate with TDB 1 or 2).

However, the first query always seems to run really, really slow. Is there any 
way to precompute inferred relations so that I don’t have to wait? I’ve tried 
calling `rebind` and `prepare`, but they don’t seem to do anything.

Kind regards,

Simon Gray
Research Officer
Centre for Language Technology, University of Copenhagen







Re: Precomputing OWL inferences

2021-07-03 Thread Dave Reynolds



On 02/07/2021 13:29, Simon Gray wrote:

Hmm… I am not sure how my rules are modeled. I just use the built-in 
OWL_MEM_MICRO_RULE_INF OntModelSpec.

Anyway, my question is still this: how do I get all of those inferences 
computed *before* I start querying the Model. It’s great if I can just store 
them later, but I still need to *compute* them before I can think about 
persisting anything. Running a single query doesn’t seem to compute them all, 
just relevant ones to that specific query… I think?


Short answer is there's no built in way to precompute everything that's 
precomputable for the OWL reasoners other than that which the others 
have pointed out - copy the inferred model.


The OWL rules use a mix of forward and backward reasoning. The forward 
reasoning can all be invoked in one go via prepare() but the backward 
reasoning is mostly done on demand. Some of the backward rules are 
tabled/memoized so once they've been run once future runs are supposed 
to be quicker. Others are always run on demand.


If you have a few particular query patterns then to warm up the relevant 
memoization run those queries.


The most comprehensive way to ensure everything has been computed is to 
copy the model to a plain model (in memory or persistent). That copy is 
essentially running the query (?s ?p ?o) and will compute everything the 
rules can reach. After that the inference model is as warm as it's going 
to get. But since at that point you've already materialized everything 
you might as well keep the materialized copy as the others have said.


There'd be nothing to stop you doing the general query (e.g. via an 
unbounded listStatements() call) and throwing the results away. That 
*could* be beneficial if the materialized model is too big but the 
tabling/memoization is proving useful and smaller - but no guarantees.


Dave




On 2 Jul 2021 at 14.06, Lorenz Buehmann wrote:

But can't you do this inference just once and then somewhere store those 
inferences? Next time you can simply load the inferred model instead of the raw 
dataset. It is not specific to TDB, you can load dataset A, compute the 
inferred model in a slow process once, materialize it as dataset B, and later 
on always work on dataset B - this is standard forward chaining with writing 
the data back to disk or database. Can you try this procedure, maybe it works 
for you?

Indeed this won't work if your rules are currently modeled as backward 
chaining rules, as those are always computed at query time.


On 02.07.21 13:37, Simon Gray wrote:

Thank you Lorenz, although this seems to be a reply to my side comment about 
TDB rather than the question I had, right?

The main issue right now is that I would like to use inferencing to get e.g. 
inverse relations, but doing this is very slow the first time a query is run, 
likely due to some preprocessing step that needs to run first. I would like to 
run the preprocessing step in advance rather than running it implicitly.


On 2 Jul 2021 at 13.30, Lorenz Buehmann wrote:

you can just add the inferred model to the dataset, i.e. add all triple to your 
TDB. Then you can disable the reasoner afterwards or just omit the rules that 
you do not need anymore

On 02.07.21 13:13, Simon Gray wrote:

Hi there,

I’m using Apache Jena from Clojure to create a new home for the Danish WordNet. I 
use the Arachne Aristotle library + some additional Java interop code of my own.

I would like to use OWL inferencing to query e.g. transitive or inverse 
relations. This does seem to work fine although I’ve only tried using the 
supplied in-memory model for now (and it looks like I will have to create my 
own instance of a ModelMaker to integrate with TDB 1 or 2).

However, the first query always seems to run really, really slow. Is there any 
way to precompute inferred relations so that I don’t have to wait? I’ve tried 
calling `rebind` and `prepare`, but they don’t seem to do anything.

Kind regards,

Simon Gray
Research Officer
Centre for Language Technology, University of Copenhagen







Re: [GenericRuleReasoner] print builtins in forward rules

2021-04-13 Thread Dave Reynolds

Hi Barry,

Yes, the builtins on the LHS of a forward rule will not run until the 
triple patterns match. It's only when there's a binding vector from the 
tree of patterns that it gets submitted to the builtins. Which makes 
sense if you think of LHS builtins as normally being guards.
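
For example, in a rule like this (an invented example):

[r1: (?x <http://ex.org/age> ?a), greaterThan(?a, 17), print('adult: ', ?x)
    -> (?x rdf:type <http://ex.org/Adult>)]

the greaterThan guard and the print only run once the (?x ... ?a) triple
pattern has produced a binding, not while the pattern is being matched.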


Dave

On 13/04/2021 12:00, Nouwt, B. (Barry) wrote:

Hi all, I am working with forward rules for a project and I noticed that the 
print(...) builtins in the LHS of a forward rule only get executed when the 
full LHS matches. I was expecting the LHS to be matched in a per-triple/builtin 
manner, so that the print(...) builtins would get executed until a triple or 
builtin failed to match.

Can anyone confirm or deny that this is indeed the way the forwardRETE engine 
works?

Kind regards, Barry



Re: Java 8 or 11?

2021-01-11 Thread Dave Reynolds

+1 to bumping version number, it is clearly a breaking change.

Dave

On 11/01/2021 12:59, Andrii Berezovskyi wrote:

Hello,

Just noticed that the discussion went really fast. I am a maintainer for 
Eclipse Lyo and as an integration SDK, we ship JDK 8 library builds for wide 
compat (with Jena dependency). Our GH Actions build matrix succeeds on JDK 8, 
11, 15, 16-ea, and 17-ea but this change will be breaking for us. To be clear, 
we build our libraries under JDK 8 and use them under JDK 11+ where possible, 
so we do take advantage of better Docker compat and TLS improvements.

1) Could you please consider bumping the Jena version to 4.0 as this is a 
breaking change?
2) Is it possible to designate some 3.x version to receive security fixes (I 
think Jackson is the biggest offender we see in our GH/Snyk reports) for some 
time after 4.0 release? I have seen reports that RDF* brings some problems to 
old users, so perhaps a version before that? Lyo 4.0 is on Jena 3.15 and Lyo 
4.1.alpha is on 3.17 - for now without issues. I think JDK 8 support 
(non-Oracle) will generally stop around 2026 
(https://aws.amazon.com/corretto/faqs/ and 
https://adoptopenjdk.net/support.html) and many integration projects are not 
eager to move (I just forwarded this thread to our mailing list and asked our 
users to begin testing their integration projects with JDK 11 but we will see).
3) How much trouble would it be to keep a JDK8 build of Jena without a new 
JSON-LD library? Are you switching libs or did Titanium drop JDK 8?

Thank you.

--
Best regards,
Andrew Berezovskyi

On 2021-01-08 , at 23:45, Andy Seaborne 
mailto:a...@apache.org>> wrote:

The Jena build has been switched to produce Java11 bytecode.

Nothing else in the codebase has been changed so this is easily reversible at 
the moment.

Using SNAPSHOT artifacts will get you Java11 bytecode.

There are currently some problems producing javadoc.

One problem is [1] on early Java11 releases (11.0.1, 11.0.2, but not the GA 
release 11.0.0). Up-to-date Java11 is now 11.0.9 and works.

Another is overlapping packages across modules using automatic module naming.

These do not affect the running of Jena.

Andy

[1] https://bugs.openjdk.java.net/browse/JDK-8212233




Re: Java 8 or 11?

2021-01-06 Thread Dave Reynolds

tl;dr It'd be inconvenient but we could cope.

As you say, there is likely to remain a bimodal distribution.

We currently remain with the java 8 runtime (increasingly using AWS 
Corretto). Mostly this is due to the time cost of qualifying and 
updating an increasingly large number of different running systems. I 
know of no technical barriers to moving to java11.


We do have at least one customer who, last we checked, was restricted 
(organizationally rather than technically) to java8. To be fair, the 
tool they use is not on the latest jena version anyway so we could move 
to different JVM versions for that vs. other components, but targeting 
one platform is operationally easier, hence sticking to java8 as the default.


Dave

On 05/01/2021 20:38, Andy Seaborne wrote:

Currently, Jena is compiled to run on any JVM from Java8 onwards.

Java8 was released March 2014.
Java11 (Sept 2018) is LTS (long term support)
Java17 (due Sept 2021) is probably going to be LTS.

Should Jena switch to Java11 going forward?

This message is to ask:

Are there deployments that do regularly upgrade but can not for some reason 
move to the Java11 LTS platform?



There are the usual issues of moving to a newer Java. There seems 
likely to be an emerging bimodal distribution of systems remaining with 
Java8 and systems moving to Java11 and Java17 (likely an LTS - 
September 2021).


The question is how many systems would upgrade their Jena version and 
are restricted to Java8 (and why!).


Java is evolving to better fit in the new tech landscape (e.g. better 
container usage), more compact strings (significant for Jena), and 
JDK-provided HTTP/2.


Some dependencies or potential dependencies are Java11:

Titanium - for JSON-LD 1.1 (JENA-1948 - titanium-json-ld )

Eclipse Jetty 10 and 11 now depend on Java11.

     Andy


Re: Backup Jena Without Inferred Data

2020-12-05 Thread Dave Reynolds
If you want to write out the base data in an inference model, without 
any inferences, then use InfModel.getBaseModel() and write that out.
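
i.e. roughly (a sketch; on an OntModel the accessor is getBaseModel(), on
a plain InfModel it is getRawModel()):

Model base = infModel.getRawModel(); // asserted triples only, no inferences
RDFDataMgr.write(out, base, Lang.TRIG);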


Dave


On 05/12/2020 03:35, Sushanth Vaddaram wrote:

Hi All,

We were seeing performance issues after restoring data from the backup files; 
upon checking the backup file we found that the inferred data was also being 
added. Is there any way to exclude the inferred data during backup?
Backup sample:
===

 a dip:DPLMSWorkpoint , dip:InformationObject , 
dip:SoftPoint , dip:ParameterPoint , rdfs:Resource , dip:TargetValueSetPoint , 
dip:SetPoint , owl:Thing , dip:DataPoint ;

I just want to get the following


   a dip:DPLMSWorkpoint

My concern is that if we load the inferred triples, they too will be loaded 
into memory by the reasoner, and time will be wasted by the reasoner 
analyzing inferred data during query execution.
 //fetching data from jena after creating a connection
 Model model = conn.fetch();
// write model to the backup file
RDFDataMgr.write(out, model, Lang.TRIG);


Thanks 



Re: Jena Performance Issue with Reasoner

2020-12-03 Thread Dave Reynolds

Hi,

If your data doesn't change then yes you can compute the results ahead 
of time. Set up your data and reasoner, write the whole resulting model 
out to a dump file. Then you can load that dump file into another TDB 
and serve it as a fuseki dataset with no reasoner configuration.


That doesn't help if your data is continuously changing though.

The other option would be to run in-memory instead of from TDB. Given 
the reasoners are intrinsically limited to data that fits in memory, you 
could switch to running in memory, reading the data in from files 
when fuseki starts up. That would mean start up is slow but the reasoner 
performance will be a little better.


If neither of those options work or are open to you then I'm afraid it 
may simply be that the reasoner is not up to the job.


Dave

On 03/12/2020 17:47, Sushanth Vaddaram wrote:

We have 6 lakh (600,000) statements and when trying to hit the query with the 
configuration below, it is taking almost 90 secs to load the results. Is there 
anything I can do to improve the performance? Also, is there any way inferencing 
can be done during loading of the statements rather than at runtime, as I was 
getting the result in 2 seconds without the reasoner in the configuration.

Query:

PREFIX dip:
PREFIX map-onto:
PREFIX auth:
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT
   ?datapointURI  ?datapointTYPE  ?datapointLABEL  ?datapointSYSTEM  
?datapointPROVIDER  ?datapointDATAMOD  ?datapointABSTPROP
WHERE
{
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  ?datapointURI  rdf:type   ?datapointTYPE .
  }
UNION
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  ?datapointURI  rdfs:label   ?datapointLABEL .
  }
UNION
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  ?datapointURI  dip:data_source_url   ?datapointDATAMOD .
  }
UNION
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   ?datapointSYSTEM .
  }
UNION
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  }
UNION
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  ?datapointURI  dip:abstracts   ?datapointABSTPROP .
  }
{


SELECT DISTINCT
   ?datapointURI
WHERE
{
  {
  ?zoneURI  dip:covers  
 .
  ?zoneURI  dip:has_scene_property   ?zonePROP .
  ?datapointURI  dip:abstracts   ?zonePROP .
  ?datapointURI  rdf:type   dip:DataPoint  .
  }
UNION
  {
  
dip:spatial_containment   ?deviceURI .
  ?deviceURI  dip:provides_info_object   ?datapointURI .
  ?datapointURI  rdf:type   dip:DataPoint  .
  }
UNION
  {
  ?datapointURI  rdf:type   dip:DataPoint  .
  ?datapointPROVIDER  dip:provides_info_object   ?datapointURI .
  ?datapointPROVIDER  dip:in_system   .
  
dip:contains_hardware / dip:contains_hardware_device   ?deviceURI .
  ?deviceURI  dip:provides_info_object   ?datapointURI .
  ?datapointURI  rdf:type   dip:DataPoint  .
  }
}

}
}



Configuration:
==
@prefix :       <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .
@prefix tdb2:   <http://jena.apache.org/2016/tdb#> .
@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;
fuseki:services (   
  <#dip>
) .

# TDB
#[] ja:loadClass "org.apache.jena.tdb.TDB" .
#tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
#tdb:GraphTDB    rdfs:subClassOf  ja:Model .


Re: Rule Engine Help

2020-12-03 Thread Dave Reynolds

Hi,

No, there's no better solution I'm afraid.

For clean "monotonic" rules the rule engine should deal changes in the 
data automatically.


However, the noValue builtin is intrinsically non-monotonic and depends 
entirely on the state of the database at the exact time it fires and 
there's no support for rolling back such non-monotonic rules. In 
retrospect it would have been better to develop a separate rule engine 
for these sort of state-dependent rules rather than bolt them on top of 
an engine not designed for them. But that's water under the bridge.


Dave

On 03/12/2020 17:34, Jenson Joseph wrote:

Hi,

I'm using a General Rule Reasoner with forward rules and forwardRETE
engine. I am wondering if there is a way to invalidate/rollback the triples
of a rule that uses the 'noValue' builtin when a triple is
introduced invalidating it. It appears to do this for regular rule items,
eg. if an object is no longer an rdf:type a rule requiring it could be
rolled back.

The question is: does the same happen for noValue rule items? Currently my
method around this is to look for invalid instances and use the 'remove'
builtin to remove and roll them back but that needs to be done exhaustively
for each 'noValue' instance. If there were a better solution for that that
would be preferable.

Thanks



Re: Aw: RE: Wich reasoner do i have to use to validate the given schema violation?

2020-10-23 Thread Dave Reynolds
Without seeing the schema (that seems to be only the data) it's 
hard to be sure but, as Han says, unless there are some disjointness 
expressions that's not an inconsistency.


I'm afraid OWL really isn't a schema language and is not very good at 
expressing these sorts of constraints. The net effect is just that your 
example instance is inferred to be a Component as well as being a Topic. 
Unless there's an explicit axiom somewhere that prevents something being 
both a Component and a Topic at the same time, there's no violation.


To answer your original question: if there are any disjointness 
axioms then you will need at least the OWLMini configuration.


Dave

On 23/10/2020 16:53, alexander.fan...@web.de wrote:

Hi Han,
thanks for your fast answer.
I am not sure if I understood you right, but my model just contains a few 
Topics and Components. I will post it below, maybe it helps. I am not an expert 
in this semantic ontology by the way :D
The schema is an external schema.rdf...
The problem is the schema is from an external company and I have to use exactly 
this, so I am not allowed to change it.

My complete model ( data.rdf):

http://www.w3.org/1999/02/22-rdf-syntax-ns#;
 xmlns:pifan="http://customer.lebfy.de/pifan#;
 xmlns:ixsr="http://ixsr.abokom.de/ixsr#;
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#;>
   http://customer.lebfy.de/pifan#Heizung;>
 http://ixsr.abokom.de/ixsr#ProductFeature
 Heizung
   
   http://customer.lebfy.de/pifan#AusstattungDisplay;>
 http://ixsr.abokom.de/ixsr#ProductFeature
 Ausstattung Display
   
   http://customer.lebfy.de/pifan#Antrieb;>
 http://ixsr.abokom.de/ixsr#ProductFeature
 Antrieb
   
   http://ixsr.lebfy.de/topics/0d8b9d4a4aa5056274596bf9a7cb2a68/1/de-de;>
 Ventilator „PI-Fan“ X5-DH2
 http://ixsr.abokom.de/ixsr#OperatingInstructions"/>
 
   http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c;>
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687fxyz
 PI-Fan – Ventilator
   
 
 
   http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17;>
 5-stufig
   
 
 
   http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410;>
 mit Display
   
 
 
   http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88;>
 Heizung 2 Stufen
   
 
 2020-08-06T12:24:07.285+02:00
   
   http://ixsr.lebfy.de/topics/2083fd4098849c6cd642d3ee6f5cfd32/1/de-de;>
 Inbetriebnahme
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c"/>
 http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17"/>
 http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410"/>
 http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88"/>
 2020-08-06T12:24:07.285+02:00
   
   http://ixsr.lebfy.de/topics/b583cdcfa6f24ae5b309cb8b429bdb83/1/de-de;>
 Bestimmungsgemäße Verwendung
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c"/>
 http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17"/>
 http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410"/>
 http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88"/>
 2020-08-06T12:24:07.285+02:00
   
   http://ixsr.lebfy.de/topics/6e4f5396e2209b9886b71340a6a6487f/1/de-de;>
 Teleskopstange und Standplatte 
montieren
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c"/>
 http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17"/>
 http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410"/>
 http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88"/>
 2020-08-06T12:24:07.285+02:00
   
   http://ixsr.lebfy.de/topics/e5508049e2a330281bb59096647ca61c/1/de-de;>
 Höhe einstellen
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c"/>
 http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17"/>
 http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410"/>
 http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88"/>
 2020-08-06T12:24:07.285+02:00
   
   http://ixsr.lebfy.de/topics/f495278ccd8b0624658d519aaea6d5f8/1/de-de;>
 Fehlercodes am Display
 http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410"/>
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c"/>
 http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17"/>
 http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88"/>
 2020-08-06T12:24:07.285+02:00
   
   http://ixsr.lebfy.de/topics/d263130eda2ed87a61da6a16ff965f1e/1/de-de;>
 TD 1
 http://ixsr.lebfy.de/metadata/85e7506eb2e33f90c0a802687f559f0c"/>
 http://ixsr.lebfy.de/metadata/dbbb7334b2e2a9e5c0a8026860d20c17"/>
 http://ixsr.lebfy.de/metadata/10b4c299b2ddae19c0a8026824aa7410"/>
 http://ixsr.lebfy.de/metadata/b63e4b15b2dc2d83c0a8026877f6aa88"/>
 2020-08-06T12:24:07.285+02:00
   
   

Re: Reasoning profiles and performance

2020-10-20 Thread Dave Reynolds
tl;dr I'm afraid there's no support for pre-materialized but 
incrementally updated OWL reasoning over persistent models in Jena.


On 20/10/2020 02:00, Zalan Kemenczy wrote:

Hi there,

I've been experimenting with various reasoning profiles to get better
performance in my project, and I'm looking for guidance on how to push
materialization towards data creation time, rather than query time. I
believe this would suit my use-case: data mutations are infrequent (they do
happen), but queries need to be fast.

Some additional configuration context:

1. Loading some owl ontologies (~80k triples) into an in memory model
2. Bind it to a OWLMicro reasoner using bindSchema
3. Bind the reasoner to a TDB2 backed model

With just the ontologies loaded, even before I loaded any instance data,
this led to fairly slow queries (> 1 min). To confirm the issue was with
the inference layer, I serialized all the ground and inferred triples to a
second tdb2 backed dataset, with no inference layer, and my reference
queries were much faster (~100ms) and returned identical results.

Reading the docs, I gathered I could try to push materialization towards
data mutation time with a combination of forward reasoning and prepare. The
inference doc had this to say about the GenericRuleReasoner:

"When run in forward mode all rules are treated as forward even if they
were written in backward ("<-") syntax. This allows the same rule set to be
used in different modes to explore the performance tradeoffs."


If you just have a plain rule set with simple A -> B or B <- A rules 
then you can indeed run the rules in either direction. However, if you 
have a hybrid rule set which mixes directions, and in particular uses 
forward rules to create instantiated backward rules, that won't work - 
those intrinsically need the hybrid engine.



However, I've run into a couple of issues:

1. If I setMode on the OWLMicroReasoner too FORWARD, I get the following
exception when I try to bind the reasoner to a graph:

org.apache.jena.reasoner.rulesys.BasicForwardRuleInfGraph cannot be cast to
org.apache.jena.reasoner.rulesys.FBRuleInfGraph

Due to the following line:(
https://github.com/apache/jena/blob/bfce1741cb12f9cf544235d32fba6598bc7341b5/jena-core/src/main/java/org/apache/jena/reasoner/rulesys/OWLMicroReasoner.java#L94
)


Yes the OWLMicroReasoner is intrinsically a hybrid ("FB") rule reasoner 
and that can't be changed.



2. If I use a GenericRuleReasoner, loaded with OWLMicro rules set to
FORWARD mode, I can bind the reasoner, but then I get the following
execution error at query execution time:

Forward reasoner does not support hybrid rules - [ (?x owl:intersectionOf
?y) -> (?x rdf:type owl:Class) ]

Which I don't understand because that does not seem like a backward rule.


As noted above the OWLMicro rules are hybrid and can't be run via a pure 
forward engine.


That particular example looks like a plain fact and forward mode should 
support that, possibly a bug. However, those cases are the least of 
your worries; it's true hybrid rules like:


[inverseOf2: (?P owl:inverseOf ?Q)
-> table(?P), table(?Q), [inverseOf2b: (?X ?P ?Y) <- (?Y ?Q ?X)] ]

that simply have no pure forward equivalent.


So to sum up, I have two questions:

1. What would be your recommended approach to pushing materialization to
data creation time


I'm afraid there's no good support for this in jena.

If the rate of data updates is very low compared to the rate of queries 
then you could re-run the entire materialization from scratch each time 
the data changes. Unsubtle and slow at materialization time but queries 
would then be faster.


If the rate of data updates is high but your data fits in memory then 
use the in-memory reasoner and let its (limited) incremental reasoning 
handle the changes.


But I'm afraid Jena has no support for incrementally updating inference 
results when the data is beyond memory limits and persisted to e.g. TDB.



2. How would you create forward rules reasoner that implements OWLmicro, or
closest to


You would need to write a custom pure-forward ruleset to implement the 
axioms you want, perhaps starting from etc/rdfs-noresource.rules and 
adding the relevant OWL axioms. Depending on which axioms you want, 
performance may or may not be problematic, and the pure forward engine 
will still hold all its data in memory so that won't scale any better.
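
The wiring would look something like this (untested; the rules file name 
is hypothetical - the ruleset itself is the part you would have to write):

import java.util.List;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.riot.RDFDataMgr;

public class PureForward {
    public static void main(String[] args) {
        List<Rule> rules = Rule.rulesFromURL("file:my-owl-subset.rules");
        GenericRuleReasoner reasoner = new GenericRuleReasoner(rules);
        reasoner.setMode(GenericRuleReasoner.FORWARD_RETE); // pure forward engine
        Model data = RDFDataMgr.loadModel("data.ttl");
        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        inf.prepare(); // run the forward rules eagerly, up front
    }
}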


Dave


Re: Jena InfModel Listener Help

2020-09-22 Thread Dave Reynolds



On 21/09/2020 15:27, Jenson Joseph wrote:

Thanks.

That explains a lot. Like why the difference in the size variable of the graphs 
only includes the new statements and not implied ones.

Knowing this I have 2 options, run the listStatements command every time 
there’s changes or only use forward chaining, but that one is limiting.

Do you think a 3rd option is viable where I create a new InfModel class that 
takes this into account and somehow traces through and keeps a cache of those 
changes from the listStatement process, notifying the difference?


I guess it would be possible to construct such a wrapper Graph with a 
cache and a notifier on triples which hadn't been cached before. 
However, it would be very inefficient and would only notify based on 
whatever queries had been run.


If you need some notification of all inferences then I would have 
thought that pure forward chaining would be the way to go.


Dave


Sent from my iPhone


On Sep 21, 2020, at 6:21 AM, Dave Reynolds  wrote:

I don't believe there's a custom notification arrangement for InfModels so 
indeed you'd have to attach the listener to both the base model and the 
deductions model.

For pure forward chaining rules that should be sufficient.

If you are using backward chaining rules, or the hybrid rule system (as used 
for all the default configurations for RDFS and OWL), then it's not
possible to get notifications for everything. In those setups then some of the 
triples you see from listStatements have been implicitly generated on-demand 
prompted by the query pattern and are never materialized or stored anywhere and 
so can't trigger a listener.

Dave


On 20/09/2020 15:13, Jenson Joseph wrote:
HI,
I'm hoping I could get some help on this. I have an InfModel that I add a 
listener to where I'm looking for all changes to the model, whether it was 
inferred or not. On the model, I only get notified of changes that were 
directly added, inferred changes are not notified. I switched to listening to 
the deductions model and I don't get any notifications when I add new 
statements.
I tried using combinations of the .rebind, .reset, .prepare functions with no 
luck and I searched the internet on this with no luck.
I know the inferences are being made because they show up if I list the 
statements of the inference model. I would like to know if it is currently 
possible to have a listener that receives all changes made to an InfModel. The 
reason for this is I want to keep track of the incremental changes and perform 
queries on a small subset of the model. If I do the queries on the entire model 
it would be more expensive while returning redundant results that have already 
been seen and processed.
If this is an expected use case in Jena, that's handled differently I'm open to 
hearing about that but I would love to have this meticulous listener within the 
InfModel if you can help me figure that out.
Thanks,
Jenson


Re: Jena InfModel Listener Help

2020-09-21 Thread Dave Reynolds
I don't believe there's a custom notification arrangement for InfModels 
so indeed you'd have to attach the listener to both the base model and 
the deductions model.


For pure forward chaining rules that should be sufficient.
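
Something like this sketch (untested; the RDFS reasoner is just for 
illustration):

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.listeners.StatementListener;
import org.apache.jena.reasoner.ReasonerRegistry;

public class ListenToInferences {
    public static void main(String[] args) {
        Model base = ModelFactory.createDefaultModel();
        InfModel inf = ModelFactory.createInfModel(
                ReasonerRegistry.getRDFSReasoner(), base);
        StatementListener listener = new StatementListener() {
            @Override public void addedStatement(Statement s) {
                System.out.println("added: " + s);
            }
        };
        base.register(listener);                     // directly asserted triples
        inf.getDeductionsModel().register(listener); // forward-chained deductions
    }
}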

If you are using backward chaining rules, or the hybrid rule system (as 
used for all the default configurations for RDFS and OWL), then it's 
not possible to get notifications for everything. In those setups then 
some of the triples you see from listStatements have been implicitly 
generated on-demand prompted by the query pattern and are never 
materialized or stored anywhere and so can't trigger a listener.


Dave

On 20/09/2020 15:13, Jenson Joseph wrote:

HI,

I'm hoping I could get some help on this. I have an InfModel that I add a 
listener to where I'm looking for all changes to the model, whether it was 
inferred or not. On the model, I only get notified of changes that were 
directly added, inferred changes are not notified. I switched to listening to 
the deductions model and I don't get any notifications when I add new 
statements.

I tried using combinations of the .rebind, .reset, .prepare functions with no 
luck and I searched the internet on this with no luck.

I know the inferences are being made because they show up if I list the 
statements of the inference model. I would like to know if it is currently 
possible to have a listener that receives all changes made to an InfModel. The 
reason for this is I want to keep track of the incremental changes and perform 
queries on a small subset of the model. If I do the queries on the entire model 
it would be more expensive while returning redundant results that have already 
been seen and processed.

If this is an expected use case in Jena, that's handled differently I'm open to 
hearing about that but I would love to have this meticulous listener within the 
InfModel if you can help me figure that out.

Thanks,
Jenson



Re: Builtin & RuleContext

2020-09-04 Thread Dave Reynolds

Hi Barry,

I had been going to joke that you could throw/catch an exception and look 
at the stack trace; I may have under-estimated how ugly you were 
prepared to get :)


I guess looking at threads *might* work. Each concurrent fuseki request 
will be run in a separate thread by jetty or your servlet container (if 
running as a war). I don't think ARQ forks any separate threads for the 
query, so all the infgraph calls for one query will indeed be on one thread.


So you might think you could just mark the first time you see a call to 
the Builtin by setting a threadlocal and then test that on later invocations.


The problem is ThreadPools.

If you are running fuseki as a war in your own container you may have 
enough access to the thread pool to add a before/after Execute hook to 
clear the threadlocal. If you are running using the embedded Jetty then 
I just don't know enough about Jetty to know if you can access the 
thread pool or plugin your own version. Seems plausible but not 
something I've ever tried.
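
To make the idea concrete, an entirely hypothetical sketch of such a 
builtin - subject to the thread-pool caveat above, the ThreadLocal would 
have to be cleared between requests:

import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

public class SessionMarker extends BaseBuiltin {
    private static final ThreadLocal<Long> session = new ThreadLocal<>();

    @Override public String getName() { return "sessionMarker"; }

    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        if (session.get() == null) {
            session.set(System.nanoTime()); // first call seen on this thread
        }
        // Calls sharing this id *probably* belong to the same query,
        // modulo thread reuse by the pool.
        System.out.println("session " + session.get());
        return true;
    }
}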


Dave

On 04/09/2020 09:59, Nouwt, B. (Barry) wrote:

Hi Dave,

Thanks for your answer! What about (the very ugly solution of) looking at the 
thread name from within the Builtin...is every SPARQL query handled by a single 
thread? So, if I notice subsequent calls of the same Builtin from the same 
thread, could I assume that they belong to the same query? You would only need 
a way to know when the first query is finished and the second begins, but I 
think I can handle that.

Regards, Barry

-Original Message-
From: Dave Reynolds 
Sent: donderdag 3 september 2020 10:09
To: users@jena.apache.org
Subject: Re: Builtin & RuleContext

On 01/09/2020 09:06, Nouwt, B. (Barry) wrote:

Hi everyone,

I have an Apache Jena Fuseki server running with a set of backward rules each 
with a custom BuiltIn. The rules get applied whenever we post a SPARQL query 
and this works as expected. Since backward rules are only allowed to have a 
single triple pattern as head, a single SPARQL query triggers multiple backward 
rules. We would like to know whether we can distinguish between different 
SPARQL query posts from within the custom BuiltIn. I am aware that this would 
not result in an elegant solution (i.e. a hack), but that is not a problem for 
now. Is this somehow possible? I've looked at the RuleContext and 
BindingEnvironment, whether there is some way to recognize or identify the 
query instance, but I haven't found one.

So, if I would post the same query twice, I would like to be able to identify 
these two separate sessions. Any ideas?


Sorry, I can't think of a way to do it from within the rule system or your 
custom Builtin. All the rule system sees is a triple pattern query to the 
InfGraph, it doesn't know anything about the calling context and has no way to 
pass that on to the builtin if it did.

You might be able to table some of your predicates to prevent some patterns 
getting rerun unnecessarily but that still wouldn't allow the custom builtin to 
figure out what the context is when it is run.

Dave



Re: Builtin & RuleContext

2020-09-03 Thread Dave Reynolds

On 01/09/2020 09:06, Nouwt, B. (Barry) wrote:

Hi everyone,

I have an Apache Jena Fuseki server running with a set of backward rules each 
with a custom BuiltIn. The rules get applied whenever we post a SPARQL query 
and this works as expected. Since backward rules are only allowed to have a 
single triple pattern as head, a single SPARQL query triggers multiple backward 
rules. We would like to know whether we can distinguish between different 
SPARQL query posts from within the custom BuiltIn. I am aware that this would 
not result in an elegant solution (i.e. a hack), but that is not a problem for 
now. Is this somehow possible? I've looked at the RuleContext and 
BindingEnvironment, whether there is some way to recognize or identify the 
query instance, but I haven't found one.

So, if I would post the same query twice, I would like to be able to identify 
these two separate sessions. Any ideas?


Sorry, I can't think of a way to do it from within the rule system or 
your custom Builtin. All the rule system sees is a triple pattern query 
to the InfGraph, it doesn't know anything about the calling context and 
has no way to pass that on to the builtin if it did.


You might be able to table some of your predicates to prevent some patterns 
getting rerun unnecessarily but that still wouldn't allow the custom 
builtin to figure out what the context is when it is run.


Dave


Re: OWL inverseOf inference

2020-05-28 Thread Dave Reynolds



On 28/05/2020 21:37, Kenneth Keefe wrote:

The Jena OWL Micro inference engine is not doing what I'm expecting when
faced with an objectProperty with an inverseOf statement. I explain a
simple test ontology and provide the owl and java to test it below.

I created an ontology with a Person class and childOf and parentOf object
properties. When I create two nodes and make one the parent of the other,
I'm expecting the inference engine to infer the appropriate child property,
but it doesn't.


General rule of thumb in such cases - check your URIs ...

and, if at all possible, use turtle not RDF/XML, makes it easier to see 
errors like this.



inverseOf.owl

http://example.com/ont/roster/;
xml:base="http://example.com/ont/roster/;
xmlns:pr="http://example.com/ont/roster/;
xmlns:owl="http://www.w3.org/2002/07/owl#;
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#;
xmlns:xml="http://www.w3.org/XML/1998/namespace;
xmlns:xsd="http://www.w3.org/2001/XMLSchema#;
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#;>
http://example.com/ont/inverse/;>
http://example.com/ont/inverse/0.0.1; />


http://example.com/ont/inverse/Person; />

http://example.com/ont/inverse/parentOf;>
   http://example.com/ont/inverse/childOf; />
   http://example.com/ont/inverse/Person; />
   http://example.com/ont/inverse/Person; />



This declares an inverse of: http://example.com/ont/inverse/parentOf


http://example.com/ont/inverse/childOf;>
   http://example.com/ont/inverse/Person; />
   http://example.com/ont/inverse/Person; />


http://example.com/ont/inverse/Sally;>
   http://example.com/ont/inverse/Person"/>
   http://example.com/ont/inverse/Bob; />


This uses a different relation from the one you meant, this uses:
http://example.com/ont/roster/parentOf




http://example.com/ont/inverse/Bob;>
   http://example.com/ont/inverse/Person"/>




Here is the Jena code I use:

OntModel model =
ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF);
model.read("file:///inverseTest.owl");

printAllStatements(model);

The resulting model contains these nodes with Bob or Sally as the subject:

http://example.com/ont/inverse/Bob
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://example.com/ont/inverse/Person .
http://example.com/ont/inverse/Bob
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#Resource .
http://example.com/ont/inverse/Bob
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2002/07/owl#Thing .
http://example.com/ont/inverse/Sally http://example.com/ont/roster/parentOf
http://example.com/ont/inverse/Bob .
http://example.com/ont/inverse/Sally
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://example.com/ont/inverse/Person .
http://example.com/ont/inverse/Sally
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2000/01/rdf-schema#Resource .
http://example.com/ont/inverse/Sally
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.w3.org/2002/07/owl#Thing .


No, that's not all the statements. It doesn't include the following 
triple from your base model:


http://example.com/ont/inverse/Sally
http://example.com/ont/roster/parentOf
http://example.com/ont/inverse/Bob .

If you had included that it would have been easier to spot that you had two 
different namespaces for parentOf in play.


Dave


Re: Help with inference rules

2020-05-27 Thread Dave Reynolds

On 27/05/2020 04:54, Kenneth Keefe wrote:

Thank you! Can you suggest any texts that teach how to effectively write
inference rules for Jena? 


I don't know of any, though no doubt there's the odd blog post example.


However, I'm not having success changing the node URIs. When I change the
rule to this:

  [r1: (?x rdf:type ro:RosterEntry), (?x ro:hasSignature ?sig),
       regex(?sig, '(.*) (.*)', ?first, ?last), makeSkolem(?y, ?x),
       uriConcat('http://example.com/ont/roster/people/', ?first, ?last, ?y)
    -> (?y ro:hasFirstName ?first), (?y ro:hasLastName ?last),
       (?y rdf:type ro:Person)]



Jena doesn't seem to infer any new nodes.

Why have both makeSkolem and uriConcat? You either want to create a URI 
resource as your subject or create a bNode. That's trying to make it 
both, which can never happen, so the rule never fires.



I tried separating it into another rule:

  [r1: (?x rdf:type ro:RosterEntry), (?x ro:hasSignature ?sig),
       regex(?sig, '(.*) (.*)', ?first, ?last), makeSkolem(?y, ?x)
    -> (?y ro:hasFirstName ?first), (?y ro:hasLastName ?last),
       (?y rdf:type ro:Person)]

  [r2: (?x rdf:type ro:Person), (?x ro:hasFirstName ?first),
       (?x ro:hasLastName ?last),
       uriConcat('http://example.com/ont/roster/people/', ?first, ?last, ?y)
    -> print('woot', ?y)]



What is strange is that with the second rule, the inference engine prints
out:

'woot' <http://example.com/ont/roster/people/BobSmith>
'woot' <http://example.com/ont/roster/people/SallyJones>

But, when I listStatements and print out the InfModel,

I get the same node URIs:

1PXx9jabIZ6w9mfO0GygJA== type Person .

1PXx9jabIZ6w9mfO0GygJA== hasLastName "Jones" .
1PXx9jabIZ6w9mfO0GygJA== hasFirstName "Sally" .
lOqL7Rx2Lwv5OK6BWD/jSA== type Person .
lOqL7Rx2Lwv5OK6BWD/jSA== hasLastName "Smith" .
lOqL7Rx2Lwv5OK6BWD/jSA== hasFirstName "Bob" .



So, does uriConcat not set the node's uri permanently?


Ah, I think you might be imagining that uriConcat somehow rewrites the 
node in an RDF graph so it has a name. No. Like all the rule builtins, 
all it does is bind a value to a variable or, if the variable already 
has a value, test that it matches.


So you should just need something like (untested):

 [r1: (?x rdf:type ro:RosterEntry),
  (?x ro:hasSignature ?sig),
  regex(?sig, '(.*) (.*)', ?first, ?last),
 uriConcat('http://example.com/ont/roster/people/', ?first, ?last, ?y)
   -> (?y ro:hasFirstName ?first),
  (?y ro:hasLastName ?last),
  (?y rdf:type ro:Person)]

Dave


On Sun, May 24, 2020 at 6:03 AM Dave Reynolds 
wrote:


Sorry, only just noticed this question. Responses below:

On 20/05/2020 10:41, Kenneth Keefe wrote:

Thanks for all the help so far! I've made some good progress on this
example I'm trying to forge. Here are the files for this example:

Ontology: https://cioi.iti.illinois.edu/ont/examples/roster.owl
Rules: https://cioi.iti.illinois.edu/ont/examples/roster.rules

Using this code:

OntModel model = ModelFactory.*createOntologyModel*(OntModelSpec.*OWL_MEM*);


model.read("file:roster.owl");

List rules = Rule.*rulesFromURL*("file:roster.rules");

Reasoner reasoner = new GenericRuleReasoner(rules);

InfModel inf = ModelFactory.*createInfModel*(reasoner, model);


printAllStatements(inf);

I get this output (I use the LocalName for the predicates and objects for
readability):

b04230c7-55dc-4e08-92d7-26559c0c478f hasLastName "Smith" .
c33de24f-82f1-4aa2-a78e-73201341b274 type Person .
c812ff54-30bf-49e7-8e35-3544dad096b7 hasFirstName "Sally" .
16b7a887-2fb9-4bdb-a529-34ca8b3137e4 hasFirstName "Bob" .
b31eb7d5-a8ad-469e-ae83-366bdf46fd72 hasLastName "Jones" .
8b46f0c7-4eb2-41e2-8789-8967e8f63eec type Person .
http://example.com/ont/roster/ versionIRI .
http://example.com/ont/roster/ type Ontology .
http://example.com/ont/roster/RosterEntry type Class .
http://example.com/ont/roster/bobEntry hasSignature "Bob Smith" .
http://example.com/ont/roster/bobEntry type RosterEntry .
http://example.com/ont/roster/bobEntry type NamedIndividual .
http://example.com/ont/roster/Person type Class .
http://example.com/ont/roster/hasLastName range string .
http://example.com/ont/roster/hasLastName domain Person .
http://example.com/ont/roster/hasLastName type DatatypeProperty .
http://example.com/ont/roster/sallyEntry hasSignature "Sally Jones" .
http://example.com/ont/roster/sallyEntry type RosterEntry .
http://example.com/ont/roster/sallyEntry type NamedIndividual .
http://example.com/ont/roster/hasSignature range string .
http://example.com/ont/roster/hasSignature domain RosterEntry .
http://example.com/ont/roster/hasSignature type DatatypeProperty .
http://example.com/ont/roster/hasFirstName range string .
http://example.com/ont/roster/hasFirstName domain Person .
http://example.com/ont/roster/hasFirstName type DatatypeProperty .

Here are my questions:

Focusing on just the Bob Smith entry:

b0

Re: Help with inference rules

2020-05-24 Thread Dave Reynolds

Sorry, only just noticed this question. Responses below:

On 20/05/2020 10:41, Kenneth Keefe wrote:

Thanks for all the help so far! I've made some good progress on this
example I'm trying to forge. Here are the files for this example:

Ontology: https://cioi.iti.illinois.edu/ont/examples/roster.owl
Rules: https://cioi.iti.illinois.edu/ont/examples/roster.rules

Using this code:

OntModel model = ModelFactory.*createOntologyModel*(OntModelSpec.*OWL_MEM*);

model.read("file:roster.owl");

List rules = Rule.*rulesFromURL*("file:roster.rules");

Reasoner reasoner = new GenericRuleReasoner(rules);

InfModel inf = ModelFactory.*createInfModel*(reasoner, model);


printAllStatements(inf);

I get this output (I use the LocalName for the predicates and objects for
readability):

b04230c7-55dc-4e08-92d7-26559c0c478f hasLastName "Smith" .
c33de24f-82f1-4aa2-a78e-73201341b274 type Person .
c812ff54-30bf-49e7-8e35-3544dad096b7 hasFirstName "Sally" .
16b7a887-2fb9-4bdb-a529-34ca8b3137e4 hasFirstName "Bob" .
b31eb7d5-a8ad-469e-ae83-366bdf46fd72 hasLastName "Jones" .
8b46f0c7-4eb2-41e2-8789-8967e8f63eec type Person .
http://example.com/ont/roster/ versionIRI .
http://example.com/ont/roster/ type Ontology .
http://example.com/ont/roster/RosterEntry type Class .
http://example.com/ont/roster/bobEntry hasSignature "Bob Smith" .
http://example.com/ont/roster/bobEntry type RosterEntry .
http://example.com/ont/roster/bobEntry type NamedIndividual .
http://example.com/ont/roster/Person type Class .
http://example.com/ont/roster/hasLastName range string .
http://example.com/ont/roster/hasLastName domain Person .
http://example.com/ont/roster/hasLastName type DatatypeProperty .
http://example.com/ont/roster/sallyEntry hasSignature "Sally Jones" .
http://example.com/ont/roster/sallyEntry type RosterEntry .
http://example.com/ont/roster/sallyEntry type NamedIndividual .
http://example.com/ont/roster/hasSignature range string .
http://example.com/ont/roster/hasSignature domain RosterEntry .
http://example.com/ont/roster/hasSignature type DatatypeProperty .
http://example.com/ont/roster/hasFirstName range string .
http://example.com/ont/roster/hasFirstName domain Person .
http://example.com/ont/roster/hasFirstName type DatatypeProperty .

Here are my questions:

Focusing on just the Bob Smith entry:

b04230c7-55dc-4e08-92d7-26559c0c478f hasLastName "Smith" .
c33de24f-82f1-4aa2-a78e-73201341b274 type Person .
16b7a887-2fb9-4bdb-a529-34ca8b3137e4 hasFirstName "Bob" .

1. Are these unique ids of anonymous nodes?


Yes.


2. Why are they not identical across these three lines?


Because your ?y variable hasn't been bound in the rule body, each triple 
pattern finds there's no value for ?y and separately treats it as a bNode.


If you want a shared bNode then use makeTemp(?y) in the body (or 
makeSkolem).



3. Is there a way to name these new nodes in the rule? For example, make
the new node http://example.com/ont/roster/people/BobSmith.


Sure, use uriConcat, something like (untested):

   uriConcat('http://example.com/ont/roster/people/', ?first, ?last, ?y) -> ...


Dave



Thank you!

Ken





On Tue, May 12, 2020 at 1:44 AM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:


Hi,

I think the rule would be basically

[r1: (?x rdf:type ex:RosterEntry), (?x ex:hasSignature ?sig),
regex(?sig, '(.*) (.*)', ?first, ?last)  -> (?x ex:hasFirstName
?first),  (?x ex:hasLastName ?last),  (?x rdf:type ex:Person) ) ]

Note, it's untested and you have to define your prefix ex: in the rules
file. You might also have to adapt the regex pattern to cover different
white space chars.

On 12.05.20 00:56, Kenneth Keefe wrote:

I am pretty new to using Jena and OWL. I have read many great tutorials
regarding RDF and OWL. The focus of those tutorials has largely been how to
structure the ontology and define restrictions on properties and such.
However, I have not been able to find good tutorials that explain how
inference is done and how I can define my own inference rules. I'm
wondering if I am simply not searching for the right thing.

Regardless, here is a significant example that I think will really help me
get started with inference using Jena. I created a minimal example to
enable discussion. Here is a pastebin:  https://pastebin.com/ScTGcbcZ

The ontology has two classes, RosterEntry and Person and three data
properties, Signature (associated with RosterEntry), and FirstName and
LastName (both associated with Person). The example also has two
RosterEntry individuals with signatures of "Bob Smith" and "Sally Jones."

I would like to write a rule that causes Jena to infer the following new
facts:



 http://example.com/ont/roster/Person; />
 Bob

 Smith

 




 http://example.com/ont/roster/Person; />
 Sally

 Jones

 


How do I do that? Full answers or nudges in the right direction are both
very welcome. Thank you!

Ken








Re: questions about reasoning with TDB

2020-04-04 Thread Dave Reynolds

Hi,

On 03/04/2020 15:38, Benjamin Geer wrote:

I’ve been reading the documentation and list archives about Fuseki assembler 
configurations with TDB and reasoners, and I’m trying to figure out whether the 
setup I’d like to use is possible. I have three questions:

1. I’d like to use a forward-chaining reasoner to improve query performance 
with a large TDB dataset by inferring some frequently queried relations. To 
avoid having to recompute all the inferred triples every time Fuseki is started 
(which could take a long time), I’d like to persist the inferred triples in TDB 
as well. Is that possible? I looked for this scenario in the Jena documentation 
but didn’t find it.


Basically this isn't supported, sorry.

The forward chaining engine keeps a *lot* of state in memory in the 
RETE-like network. Which means unless you have very selective patterns 
in your rules you can end up with large parts of the data in memory. In 
worst cases you can have multiple copies.


This has several implications:

First, it means that it's not scalable. If you have a very large TDB 
dataset then the reasoner is likely to run out memory. Plus the internal 
format is really not optimised for large scale data and inference speed 
will take a hit.


Second, it means that there's no point persisting the inference results 
on their own, unless they are static. If, as in your case, you want to 
continue to add new data and get incremental inferencing then you would 
need some way to preserve and restore the intermediate state in the 
engine, which is not supported.


So given this there's little point in supporting having the deductions 
graph in TDB because that doesn't solve the problems of scaling and restart.



2. For queries, I’d like a default graph containing the union of all named 
graphs plus the inferred statements. Can this be done along with (1)?


The first part can be done manually but not along with (1).

It's possible to use some offline process to generate a static set of 
inferences (whether using the rule engine or e.g. SPARQL construct 
queries) to one named graph, put the data in another graph and then have 
the default graph be the union.


However, your data isn't static so this doesn't help.


3. The named graphs in the base model need to be continually updated (always 
using SPARQL quad patterns), and I’d like the reasoner to update its inferences 
when that happens. After reading some old messages on this list, I think this 
might not be possible, because if I understand correctly, the only way to 
update the base model would be via a separate Fuseki service that updates the 
underlying TDB dataset directly, and in that case, the reasoner won’t see those 
updates until Fuseki is restarted. Did I understand that correctly, and if so, 
is it still true?


I thought you could configure fuseki to have a reasoner as the source 
model and so have updates go to the reasoner rather than a base graph. 
However, given none of the rest of what you need to do is supported this 
point is moot.


Sorry to not be able to support your use case.

Dave


Re: unexpected output in rule

2020-03-16 Thread Dave Reynolds

Hi Luis,

On 16/03/2020 09:41, Luis Enrique Ramos García wrote:

Hi again Dave,

sorry if I have not explained appropriately,

let me tell you at first my goal: I am inspecting a dataset of 1.5 million
individuals, against another dataset with 5k *search_ID* values, where I
have to get an individual with a given value in a property. I use rule 2
with search_ID in the *registration_Authority_entity_ID* property, in order to
identify individuals with the search_ID value. I think rule 2 could be
rewritten as follows:

rule 2=  (?b rdf_ns:type   Entity)  (?b registration_Authority_entity_ID "
*search_ID'*) -> (?b has_ord ?ord)

  rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns;
  rdf_ns = rdf+"#"


Unclear why you are using rules for this rather than simply a Sparql query.

That example is *still* not a legal rule syntax.
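
For comparison, a sketch of the plain Sparql route (untested; the 
namespace and property names are taken from the data posted in this 
thread, the file name is illustrative):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class FindOrd {
    public static void main(String[] args) {
        Model model = RDFDataMgr.loadModel("data.ttl");
        String q = "PREFIX g: <http://www.example.com/onto/gleif1.owl#> "
                 + "SELECT ?b ?ord WHERE { "
                 + "  ?b g:registration_Authority_entity_ID 'search_ID' ; "
                 + "     g:has_ord ?ord . }";
        try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution s = rs.next();
                System.out.println(s.get("b") + " -> " + s.get("ord"));
            }
        }
    }
}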


In my experience with the owl api, when a rule is triggered, no change occurs
in the original data; that means the rule does not affect the original model, and
an inferred model is generated, and this model can be stored separately. 


Um, this is Jena not the OWL API. It is true that if a forward rule 
fires the result of the rule will be asserted in the deductions model.



When
I trigger rule 2, I obtained this inferred model, and when I inspect it, I
can see the individual with the property value, but wrong,


Not sure what you mean by "wrong". It's doing what it is supposed to do. 
You haven't bound ?ord so it can't assert a value for it in the 
deductions model.



in the same way,
when I execute rule 1, I do not obtain any result, and the model stays blank. I
coded the necessary control element to verify that the inferred model of rule 1
stays blank; in other words, I confirm that there is no triggering.


I repeat, if you can't get your rule 1 to fire and you still need help 
then show us actual minimal test data (need only be a few statements, 
sufficient to match the rule) and actual rule syntax not isolated 
fragments of your source code with key values missing. The chances are 
you have a namespace/URI error somewhere.


Dave


Currently I am executing the following rule:

rule_3 = (?b rdf_ns:type Entity)  (?b registration_Authority_entity_ID
*search_ID*) -> (?b rdf_ns:type Entity)

So, I obtain all ?b individuals with value search_ID, however I require
extra work to obtain the target property value, that will let me identify
my individual later, and I think I should be able to get it with the rule.


Hope this could clarify a little more my use case.


Luis Ramos











El lun., 16 mar. 2020 a las 10:04, Dave Reynolds ()
escribió:



On 16/03/2020 08:49, Luis Enrique Ramos García wrote:

Hi Dave,

thanks for your quick response,

I thought that was the cause of the problem, however when I add the bind to
?ord in the body, then the rule is not triggered. I changed as in rule 1:



String rule 1=  (?b rdf_ns:type  GLEIF1_NS Entity)  (?b GLEIF1_NS has_ord
?ord) (?b GLEIF1_NS registration_Authority_entity_ID "search_ID') -> (?b
GLEIF1_NS has_ord ?ord)


What's that supposed to do? You seem to be binding ?ord then asserting
the same value back again. So that will have no effect on the data.


String rule 2=  (?b rdf_ns:type  GLEIF1_NS Entity)  (?b GLEIF1_NS
registration_Authority_entity_ID "search_ID') -> (?b GLEIF1_NS has_ord

?ord)

What's that supposed to do? There's no binding for ?ord so why would you
expect it to have a specific value?


as shown above the rule 1 does not trigger, rule 2 does trigger as
expected, but the output value does not correspond to the value of ?ord. I
obtained this value > *ae791d81-7538-49ac-9436-898ede09d7b5*], and should
have been > ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]

Sorry I can't follow what you are attempting to do. You aren't showing
us the actual data (with namespaces) or actual rules (with all these
variables expanded) so it's hard to spot the details. Also unclear what
you mean by "output", whether you are looking at a dump of the whole
model, just the deductions graph or something else.

How do you know the first rule isn't firing? Since it makes no
change to the data it's going to be hard to tell. Use the print builtin
to help debug.

If you can't work it out then generate a minimal complete test case with
just minimal test data (ideally in turtle with all prefixes defined) and
minimal rule example (actual rule, not source code snippet that
generates the rule) then post that. Then maybe someone can spot what's
happening.

Dave


El lun., 16 mar. 2020 a las 9:13, Dave Reynolds (<

dave.e.reyno...@gmail.com>)

escribió:



On 16/03/2020 06:58, Luis Enrique Ramos García wrote:

Dear friends,

I am running a rule in a data set, which has the following format:

 http://www.example.com/onto/gleif1.owl#097900BHID080614;>
   


xml:lang="ia">ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]

   scope s.r.o.



Re: unexpected output in rule

2020-03-16 Thread Dave Reynolds



On 16/03/2020 08:49, Luis Enrique Ramos García wrote:

Hi Dave,

thanks for your quick response,

I thought that was the cause of the problem, however when I add the bind to
?ord in the body, then the rule is not triggered. I changed as in rule 1:


String rule 1=  (?b rdf_ns:type  GLEIF1_NS Entity)  (?b GLEIF1_NS has_ord
?ord) (?b GLEIF1_NS registration_Authority_entity_ID "search_ID') -> (?b
GLEIF1_NS has_ord ?ord)


What's that supposed to do? You seem to be binding ?ord then asserting 
the same value back again. So that will have no effect on the data.



String rule 2=  (?b rdf_ns:type  GLEIF1_NS Entity)  (?b GLEIF1_NS
registration_Authority_entity_ID "search_ID') -> (?b GLEIF1_NS has_ord ?ord)


What's that supposed to do? There's no binding for ?ord so why would you 
expect it to have a specific value?



as shown above the rule 1 does not trigger, rule 2 does trigger as
expected, but the output value does not correspond to the value of ?ord. I
obtained this value > *ae791d81-7538-49ac-9436-898ede09d7b5*], and should
have been > ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]


Sorry I can't follow what you are attempting to do. You aren't showing 
us the actual data (with namespaces) or actual rules (with all these 
variables expanded) so it's hard to spot the details. Also unclear what 
you mean by "output", whether you are looking at a dump of the whole 
model, just the deductions graph or something else.


How do you know the first rule isn't firing? Since it makes no 
change to the data it's going to be hard to tell. Use the print builtin 
to help debug.


If you can't work it out then generate a minimal complete test case with 
just minimal test data (ideally in turtle with all prefixes defined) and 
minimal rule example (actual rule, not source code snippet that 
generates the rule) then post that. Then maybe someone can spot what's 
happening.


Dave


El lun., 16 mar. 2020 a las 9:13, Dave Reynolds ()
escribió:



On 16/03/2020 06:58, Luis Enrique Ramos García wrote:

Dear friends,

I am running a rule in a data set, which has the following format:

http://www.example.com/onto/gleif1.owl#097900BHID080614;>
  
xml:lang="ia">ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]

  scope s.r.o.



search_ID



RA000526

  scope s.r.o.
*


ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]*

  1


where I want to obtain the *has_ord* property value with the rule:

   String rule_rid= "[rule1: (?b "+rdf_ns+"type "+GLEIF1_NS+"Entity) "
+ "(?b "+GLEIF1_NS+"registration_Authority_entity_ID
"+"'"+search_ID+"')"//get all gleif entities ID
+ "-> (?b "+GLEIF1_NS+"has_ord ?ord)]";//put the output


That's very hard to read but unless I'm missing something there's
nothing in the body of the rule to bind ?ord.

Dave



The rule is triggered as expected, however the value in the output does
not correspond to the real value:

output:

http://www.example.com/onto/gleif1.owl#has_ord,
*ae791d81-7538-49ac-9436-898ede09d7b5*]

but, it should be:

http://www.example.com/onto/gleif1.owl#has_ord,*
ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]
]*


I am running the jena rule against a model stored in a tdb database.

Thanks in advance for your support.


Luis Ramos







Re: unexpected output in rule

2020-03-16 Thread Dave Reynolds



On 16/03/2020 06:58, Luis Enrique Ramos García wrote:

Dear friends,

I am running a rule in a data set, which has the following format:

   http://www.example.com/onto/gleif1.owl#097900BHID080614;>
 ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]
 scope s.r.o.

search_ID
 RA000526
 scope s.r.o.
*
ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]*
 1


where I want to obtain the *has_ord* property value with the rule:

  String rule_rid= "[rule1: (?b "+rdf_ns+"type "+GLEIF1_NS+"Entity) "
+ "(?b "+GLEIF1_NS+"registration_Authority_entity_ID
"+"'"+search_ID+"')"//get all gleif entities ID
+ "-> (?b "+GLEIF1_NS+"has_ord ?ord)]";//put the output


That's very hard to read but unless I'm missing something there's 
nothing in the body of the rule to bind ?ord.


Dave



The rule is triggered as expected, however the value in the output does not
correspond to the real value:

output:

http://www.example.com/onto/gleif1.owl#has_ord,
*ae791d81-7538-49ac-9436-898ede09d7b5*]

but, it should be:

http://www.example.com/onto/gleif1.owl#has_ord,*
ord:urn:iso:std:iso:17442:2019:ed-1:v2:en:[097900BHID080614]
]*


I am running the jena rule against a model stored in a tdb database.

Thanks in advance for your support.


Luis Ramos



Re: SPARQL performance question

2020-02-24 Thread Dave Reynolds



On 24/02/2020 13:55, Steve Vestal wrote:

Responses and questions inserted...

On 2/24/2020 3:02 AM, Dave Reynolds wrote:

On 23/02/2020 23:11, Steve Vestal wrote:

If I comment out the FILTER clause that prevents variable aliasing, the
query is processed almost immediately.  The number of rows goes from 192
to 576, but it's fast.


Interesting. That does suggest it might actually be Sparql rather than
inference that's the bottleneck. The materialization experiment will
be a test of that.


Have you done that test?


I earlier iterated over statements.  To make sure that I fully
materialize all possible entailments, do I need to query for ?s ?p ?o?
Any suggestions on the most efficient way to do this materialization?


As has been said several times now - copy the whole model to a plain 
in-memory model and query that.


Querying all statements in the inf model will not, itself, fill in all 
the different goal patterns for the backward chainer tables. Hence the 
suggestion to materialize to a separate model not just "warm up the 
backward rule caches" in an inf model.
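
In code, something like (untested; 'inf' stands in for your 
inference-backed OntModel):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class QueryMaterialized {
    static void run(InfModel inf, String selectQuery) {
        // One-off full materialization: copies every (asserted and inferred)
        // triple into a plain, well-indexed in-memory model.
        Model plain = ModelFactory.createDefaultModel().add(inf);
        try (QueryExecution qe = QueryExecutionFactory.create(selectQuery, plain)) {
            ResultSet rs = qe.execSelect();
            ResultSetFormatter.out(System.out, rs);
        }
    }
}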



Though looking at your query I wonder if you need inference at all -
we can't see your data to be sure since the list doesn't allow
attachments.
Have you tried without any inference? Do you know what inference you
are relying on?

I tried the following.

 OntModelSpec.OWL_DL_MEM_RULE_INF
 OntModelSpec.OWL_MEM_RULE_INF
 OntModelSpec.OWL_LITE_MEM_TRANS_INF
 OntModelSpec.OWL_LITE_MEM_RULES_INF
 OntModelSpec.OWL_MEM_RDFS_INF
 OntModelSpec.OWL_MEM_MICRO_RULE_INF
 OntModelSpec.OWL_MEM

I do need some reasoning, minimally chasing through some shallow type
hierarchies and transitive properties/predicates/roles.  


Sounds like the sort of inference that can be done with some custom 
forward rules (which would eliminate all this backward chainer caching) 
or some SPARQL construct queries.



What is the proper way to write a query when you
want a particular set of variables to have distinct solution values?


Not sure there is a better way in general. However, I wonder if you
can partition your query into subgroups, filter within the groups then
do a simpler join on the results. That might reduce the combinatorics.



I had earlier thought briefly about coming up with a more general
pre-fetch query that would collect a set of asserted triples guaranteed
to include triples of possible interest into a separate model of
(hopefully) much smaller size, and then running my sequence of
queries-with-reasoning on that.  Has this sort of thing been done
successfully?  What gave me pause is that some triples derived from
query results will need to be added back into the original model, and
I'm not sure how blank nodes would play into that. But pre-fetch models
in practice would likely not be smaller than this test case model.


Sounds like a different notion from that which I was suggesting. I was 
suggesting partitioning the query to reduce the combinatorics not 
partitioning the data. May or may not be possible/relevant.



In one or two earlier postings, mention was made of Pellet as being more
efficient and complete in some cases.  My impression is that a Pellet
reasoner is not bundled with Jena, and I would have to find and install
one myself (although the Protege wiki mentions one is available in
Jena).  Is that correct?  A general web search turned up a number of
sources, e.g., openpellet, mindswap, stardog.   Does anyone have any
recommendations and a link to a site that has the master version
compatible with Jena 3 and having a reasonably clear and smooth
install?  Are any of the other OWL reasoners out there packaged for use
with Jena?


Pellet is a full DL reasoner and so both complete and, for many 
challenging cases, higher performance. You are correct it is not part of 
Jena. I believe OpenPellet is the right version to look at but I've no 
direct experience with it; it's not something the Jena team can support. 
That's the only third-party open source Jena reasoner I'm aware of.


However, your first job is to check if it really is the reasoner or the 
query that's the bottleneck by doing the materialize-to-plain-model 
test. You could probably even do that with test data that doesn't need 
any inference.


Dave


Re: SPARQL performance question

2020-02-24 Thread Dave Reynolds

On 23/02/2020 23:11, Steve Vestal wrote:

If I comment out the FILTER clause that prevents variable aliasing, the
query is processed almost immediately.  The number of rows goes from 192
to 576, but it's fast.  


Interesting. That does suggest it might actually be Sparql rather than 
inference that's the bottleneck. The materialization experiment will be 
a test of that.


Though looking at your query I wonder if you need inference at all - we 
can't see your data to be sure since the list doesn't allow attachments.
Have you tried without any inference? Do you know what inference you are 
relying on?



What is the proper way to write a query when you
want a particular set of variables to have distinct solution values?


Not sure there is a better way in general. However, I wonder if you can 
partition your query into subgroups, filter within the groups then do a 
simpler join on the results. That might reduce the combinatorics.


However, I don't understand your query nor the modelling (especially 
around simplexConnect, which looks odd) so I might be wrong about that.



I speculated that when I iterated over the statements in the OntModel,
and the number went from a model size() of ~1500 to ~4700 iterated
statements, that I was materializing the entire inference closure (which
was fast).  Is there some other set of calls needed to do that?


The jena inference engines supports a mix of forward and backward 
inference rules. The forward inference rules will run once and store all 
the results. That's the growth you are probably seeing. That's then 
efficient to query.


The backward rules are run on-demand. They generally (this is 
controllable) cache the results of the particular triple patterns that 
are requested. Because they only cache against the specific patterns 
("goals") they see then, depending on what order the goals come in, you 
can get cases where there's redundancy in those caches. Those caches 
aren't particularly well indexed either. You can certainly query one way 
and fill up one set of caches but then a different query asks for 
different patterns and more rules still need to fire.


*If* multiple overlapping caches in the backward rules is the issue 
*then* materializing everything and not using inference after that  can 
help. It's a balance of whether you are going to query for most of the 
data or just do a bunch of point probes. In the former case it's better 
to work everything out once. In the latter case better to use on demand 
rules.


Your query pattern looks like it's going to touch everything.


Are there circumstances where it is faster to materialize the entire
closure and query a plain model than to query the inference model itself?


Yes, see earlier message, and above.

Dave


On 2/23/2020 3:33 PM, Dave Reynolds wrote:

The issue is not performance of SPARQL but performance of the
inference engines.

If you need some OWL inference then your best bet is OWLMicro.

If that's too slow to query directly then one option to try is to
materialize the entire inference closure and then query that. You can do
that by simply copying the inference model to a plain model, as sketched below.
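
As a rough sketch (model and reasoner names assumed), the copy can be as
simple as:

    InfModel infModel = ModelFactory.createInfModel(reasoner, baseModel);
    Model materialized = ModelFactory.createDefaultModel();
    materialized.add(infModel);   // pulls in the full closure, asserted plus inferred
    // then query 'materialized' with no reasoner attached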

If that's too slow then you'll need a higher performance third party
reasoner.

Dave

On 23/02/2020 18:57, Steve Vestal wrote:

I'm looking for suggestions on a SPARQL performance issue.  My test
model has ~800 sentences, and processing of one select query takes about
25 minutes.  The query is a basic graph pattern with 9 variables and 20
triples, plus a filter that forces distinct variables to have distinct
solutions using pair-wise not-equals constraints.  No option clause or
anything else fancy.

I am issuing the query against an inference model.  Most of the asserted
sentences are in imported models.  If I iterate over all the statements
in the OntModel, I get ~1500 almost instantly.  I experimented with
several of the reasoners.

Below is the basic control flow.  The thing I found curious is that the
execSelect() method finishes almost instantly.  It is the iteration over
the ResultSet that is taking all the time, it seems in the call to
selectResult.hasNext(). The result has 192 rows, 9 columns.  The results
are provided in bursts of 8 rows each, with ~1 minute between bursts.

      OntModel ontologyModel = getMyOntModel();   // tried various reasoners
      String selectQuery = getMySelectQuery();
      QueryExecution selectExec = QueryExecutionFactory.create(selectQuery, ontologyModel);
      ResultSet selectResult = selectExec.execSelect();
      while (selectResult.hasNext()) {   // time seems to be spent in hasNext()
          QuerySolution selectSolution = selectResult.next();
          for (String var : getMyVariablesOfInterest()) {
              RDFNode varValue = selectSolution.get(var);
              // process varValue
          }
      }

Any suggestions would be appreciated.





Re: disable lazy infgraph

2020-02-06 Thread Dave Reynolds

Hi,

The prepare() call you are using is the right way to go. This is the 
closest to incremental processing the engine supports - the reset() and 
rebind() calls basically start over from scratch.


Triggering it on every update will be inefficient but is the best you can 
do in the most general case to achieve eager evaluation. Depending on your 
application you may be able to batch these (with a count and/or time 
threshold) or trigger after some application-specific batch of updates.
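
A minimal sketch of the batched variant (names assumed, not from your setup):

    InfModel infModel = ModelFactory.createInfModel(reasoner, baseModel);
    infModel.add(batchOfStatements);   // hypothetical batch of updates
    infModel.prepare();                // forward rules fire now, not at the next query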


Dave


On 06/02/2020 16:20, Nouwt, B. (Barry) wrote:

Hi all,

We are using Apache Jena in our project and noticed that the 
GenericRuleReasoner InfGraph is lazy. Namely, after a SPARQL INSERT query on 
the dataset, the matching rules do not get applied immediately. Only after 
performing a SPARQL SELECT query on the dataset, the matching rules will be 
applied.

We would like to disable this lazy behavior, because it conflicts with our use 
case, but we are not sure how to do it properly (and less brute force). 
Currently, we attach a GraphListener to that graph in which we force a 
prepare() each time one of GraphListener's update methods is called, but this 
has some side effects (among others the reapplying of rules that were already 
applied) that we would like to prevent. I assume there is a less brute force 
manner to kick the reasoner into action (similar to what a SPARQL SELECT query 
achieves), but I'm not sure what this is. If I look at the source code of Jena 
I see several candidates, like the BaseInfGraph.rebind() and 
BaseInfGraph.reset() method, are those better options?

Does anyone have an idea how to achieve non-laziness of the InfGraph some other 
way?

Thanks in advance!

Barry






Re: super slow filter

2020-01-22 Thread Dave Reynolds

On 21/01/2020 20:37, Élie Roux wrote:


which I believe might result in a penalty... although frankly, I still
can't understand how a very basic bgp like

  21 (bgp
  22   (triple bdr:G844 ?rel ?res)
  23   (triple ?res rdf:type :Place)
  24   (triple ?res skos:prefLabel ?reso)

can take 100ms. Is there a way to tune the optimization level of
features in queries or at the Fuseki level?


As Lorenz says, do you have a stats.opt file?

A possible explanation is that you might be using fixed.opt instead of 
stats.opt (or have some really out of date stats file).


With fixed.opt the optimizer will reorder based on the more grounded 
triples. In your case this is the second pattern in that block:


(triple ?res rdf:type :Place)

If there are a lot of :Places compared to properties of your particular 
place bdr:G844 then this isn't optimal.


Apart from using stats.opt, with the option to manually tune the rules, 
you have the option to use none.opt to stop any reordering of triple 
patterns in bgps. That allows you to write the triple patterns in 
optimal order for your data (which you have in this case).
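
For reference, a stats.opt file is normally generated with the tdbstats
command line tool, writing to a temporary file first and then moving it
into the database directory (paths assumed):

    tdbstats --loc=/path/to/DB > /tmp/stats.opt
    mv /tmp/stats.opt /path/to/DB/stats.opt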


Dave


Re: unexpected output when running rule

2020-01-20 Thread Dave Reynolds
In your example code you have an OntModel which is specified to include 
a reasoner using the default OWL rules. You then have an InfModel on top 
of that with your own rule. So the OWL rules are generating entailments 
from the domain declarations for born_in which are then visible through 
your enclosing rule InfModel.


I'm not sure exactly what you want to achieve ...

If you want an OntModel interface but with just your rule(s) then one 
easy way is to generate your own OntModelSpec based off OWL_MEM and 
attach your reasoner to it using setReasoner, then use that to create 
your OntModel. Don't bother with a separate InfModel. That way you just 
have one layer of model and avoid any confusion from nesting models.
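
A minimal sketch of that setup (rule source assumed):

    OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
    spec.setReasoner(new GenericRuleReasoner(Rule.parseRules(ruleSrc)));
    OntModel m = ModelFactory.createOntologyModel(spec);
    // m now gives the OntModel API plus your rules, and nothing else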


If you just want a plain InfModel interface with just your rules then 
you can create the InfModel as you are doing, over either a plain Model 
or over an OntModel with no second reasoning layer.


If you want OWL inference and then want to run your own rules on top of 
that then the setup you have is one way to do that.


Comments inline below ...

On 20/01/2020 10:06, Luis Enrique Ramos García wrote:

Dear friends of jena community,



I am testing the syntax of apache jena rules, with a very simple example,
where I create the following model and  classes:


   OntModel m = ModelFactory.createOntologyModel();

OntClass c0 = m.createClass( NS + "c0" );
OntClass c1 = m.createClass( NS + "c1" );
OntClass c2 = m.createClass( NS + "c2" );

and the following individuals as members of the its respective classes:

//creation of individual

Individual i0 = m.createIndividual( NS + "individual0", c0 );
Individual i1 = m.createIndividual( NS + "individual1", c1 );

  when I run the rule, that says:

if individual ?i is a member of c0, then it has to be a member of c2

String *ruleSrc* = "[rule1: (?a
http://www.w3.org/1999/02/22-rdf-syntax-ns#type www.example.com#c0) -> "
+ "(?a http://www.w3.org/1999/02/22-rdf-syntax-ns#type www.example.com#c2)
]";


More readable if you use the builtin prefixes like rdf:type

Also www.example.com# is a (scheme-) relative URL, not that it matters 
in this case but better to get used to using absolute URLs including the 
http:// bit.




the rule is triggered as expected and gives me the following result:

  <[www.example.com#individual0,
http://www.w3.org/1999/02/22-rdf-syntax-ns#type, www.example.com#c2]>

Nevertheless, when I add more information to the model, and say that
individuals i0 and i1 have birthday:

i0.addProperty(born_in, date1.toString());
i1.addProperty(born_in, date2.toString());

The behavior of the rule output changed, and I obtain a different result:
<[www.example.com#individual1, http://www.w3.org/1999/02/22-rdf-syntax-ns#type,
www.example.com#c2] [www.example.com#individual0,
http://www.w3.org/1999/02/22-rdf-syntax-ns#type, www.example.com#c2]>



Where individual1 is declared as a member of c2, it means that individual1
is a member of c0, something that I did not declare.


You did but indirectly. You declared that the domain of born_in includes 
c0 and c1. So the with-OWL-inference OntModel will deduce, among other 
things:


   :i0 rdf:type  :c0

Then your own rule, running in the InfModel, sees that and, since it 
states that anything of type c0 is also of type c2, it deduces that:


   :i0 rdf:type  :c2



I changed model declaration as follows:

   OntModel m = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);

and obtained the expected result, however the documentation says there is
no reasoning there, 


Correct. That way the OntModel itself is not doing any reasoning, so the 
only reasoning you see out of your InfModel is from your own rule set 
and nothing more. Just one reasoner in play.



thus I implemented a reasoner model and obtained an
unexpected result again.


I don't follow what you did, but it sounds no different from what you show 
below with two layers of inference, one on top of the other.



My main concern is that when I inspect the asserted model, I see it
contains individual1 declared as part of c0, something that, according to
my understanding, should not occur, because I have not declared that.


The base model you are giving to the InfModel is the OntModel *with OWL 
inference* (by virtue of the OWL_DL_MEM_RULE_INF spec). It is not an 
"asserted model".


In general avoid having lots of layers of different models with multiple 
reasoners running over the top of each other unless that is absolutely 
what you need.


Dave



Any comment and recommendation is welcomed.

Bellow is the whole java code.


Luis Ramos


CODE
**

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

import org.apache.jena.ontology.DatatypeProperty;
import org.apache.jena.ontology.Individual;
import org.apache.jena.ontology.OntClass;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import 

Re: How to enter OWL cardinality restrictions?

2020-01-18 Thread Dave Reynolds

Yes, that's right.

The Jena OWL API doesn't support OWL 2, only OWL 1.

While OWL 1 had plain cardinality restrictions, among others, qualified 
cardinality restrictions didn't come in until OWL 2. You would have to 
create the corresponding triples using the RDF API.
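
For illustration, a sketch of asserting an OWL 2 qualified minimum
cardinality restriction directly as triples (the property p and class c
are assumed; the URIs are the standard OWL 2 vocabulary):

    String OWL2 = "http://www.w3.org/2002/07/owl#";
    Resource restriction = model.createResource()
        .addProperty(RDF.type, OWL.Restriction)
        .addProperty(OWL.onProperty, p)
        .addProperty(model.createProperty(OWL2 + "onClass"), c)
        .addProperty(model.createProperty(OWL2 + "minQualifiedCardinality"),
                     model.createTypedLiteral("1", XSDDatatype.XSDnonNegativeInteger));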


Dave

On 18/01/2020 08:44, Lorenz Buehmann wrote:

I'd say none of the profiles does support it given that it would be
beyond OWL DL, OWL Lite and RDFS and part of OWL 2.

Source code also indicates this [1].

Not sure if this was ever added to Jena, Dave or Andy know for sure
better than me. At least the docs [2] would be confusing if this feature
isn't available via Ontology API.


[1]
https://github.com/apache/jena/blob/master/jena-core/src/main/java/org/apache/jena/ontology/impl/OWLProfile.java#L166-L172

[2]
https://jena.apache.org/documentation/ontology/#restriction-class-expressions

On 18.01.20 03:07, Steve Vestal wrote:

I am trying to create an OntModel that contains a
CardinalityQRestriction.  If the OntModel is an OWL_MEM model, then I
get the exception "Attempted to use language construct CARDINALITY_Q
that is not supported in the current language profile: OWL Full."   If I
create the OntModel as an RDFS_MEM model, I get the exception "Attempted
to use language construct ONTOLOGY that is not supported in the current
language profile: RDFS."   What model configuration is needed for
cardinality Q restrictions?

My impression from the OntModel API is that the W3C OWL cardinality
restrictions map to the OntModel cardinality Q restrictions, except
maybe the Q restrictions also impose universal quantification as well as
cardinality restrictions.  Is that so?  How would theOWL standard
cardinality restriction ObjectMinCardinality(nonNegativeInteger
ObjectPropertyExpression ClassExpression) be entered?




Re: Missed inference in Jena Fuseki

2020-01-09 Thread Dave Reynolds

Hi,

On 08/01/2020 20:57, Andrea Leofreddi wrote:

Hello,
I'm using Jena Fuseki 3.13.1 (with OWLFBRuleReasoner), and I have 
asserted (uploaded) the following triples:


@prefix rdfs:  .
@prefix owl:  .
@prefix f:  .

f:Bob f:hasWife f:Alice .
f:Bob f:hasWife f:Alice2 .
f:Alice2 f:hasHusband f:Bob2 .

f:hasWife a owl:FunctionalProperty .
f:hasWife a owl:InverseFunctionalProperty .
f:hasHusband owl:inverseOf f:hasWife .


Now, If I query and ASK { f:Alice owl:sameAs f:Alice2 }, I get 
true. However, If I ASK { f:Bob owl:sameAs f:Bob2 }, I get false! 
Loading the same triples on another reasoner (owl-rl), I get the 
triple f:Bob owl:sameAs f:Bob2 inferred.


I have asked this very question on StackOverflow 
(https://stackoverflow.com/questions/59603569/jena-fuseki-missed-inference?noredirect=1#comment105379894_59603569), 
and I got pointed out to the owl-fb rules file. Tweaking it a bit I 
noticed that If I explicitly add the forward version of inverseOf, I get 
that f:Bob owl:sameAs f:Bob2:


[inverseOf2b: (?P owl:inverseOf ?Q), (?X ?P ?Y) -> (?Y ?Q ?X) ]


Am I missing something?


No, that seems right.

The issue is that the default jena rules file for OWL (the fb rules) 
uses a mix of forward and backward rules in order to make the various 
performance/completeness trade-offs.


Those rules are "stratified". All the forward rules run. Then when you 
ask queries those use the backward rules (including extra backward rules 
generated by the forward rules) which expand on the results of the 
forward phase.


This approach allowed us to get usable performance for a range of use 
cases but with the risk of incomplete results in some cases where the 
stratification is wrong. The challenge has always been to make the right 
trade-off for the average use cases while leaving the areas of 
incompleteness understandable enough that developers can work with it.


Originally sameAs reasoning was done in the backward rules and that 
proved unscalable due to limitations of the engine. So the sameAs rules 
were moved to the forward phase. Which means that the existing inverseOf 
backward phase is missed.


Duplicating the inverseOf reasoning into the forward phase, in the way 
you've done, is one way to solve this. Since it's working for you, stick 
with it.


The general solution is not so clear since the cost of this simple 
approach is that all inverseOf implications are fully materialized in 
the forward phase, rather than on demand. This means that people who 
don't care about sameAs but do care about inverses may see a performance 
hit. Since equality reasoning in jena is really slow then I doubt anyone 
is relying on that in production whereas the simple things like 
inverseOf may be in use.


An alternative could be to look at augmenting the sameAs rules with some 
backward rules which handle the interaction with inverseOf. Should be 
possible.


If you'd like to create a ticket for this to track the issue that would 
be great. Though no promises how quickly I could look at this.


Dave


Re: reasoner performance

2020-01-07 Thread Dave Reynolds

On 07/01/2020 08:31, Luis Enrique Ramos García wrote:

Dear friends,

I am currently working on an application where I have to implement a
reasoner, with which I have had some experience; the difference is that this
time I have to implement it in a big data environment, where I have to deal
with a data set of some gigabytes.

About that, my questions are the following:

1. is there a benchmark or evaluation of performance of jena with some
reasoners, which considers memory or quantity of triples, and
execution time?


Depends what sort of inference you are talking about.

Apart from the OWL benchmarks you mention, some of the Sparql benchmarks 
do require small amounts of reasoning loosely around RDFS++. For 
example, I seem to remember LUBM requires this but I've never worked 
with it.


Jena's inference is not designed to scale to billions of triples, it's a 
memory-only solution (though "gigabytes" might mean just millions of 
triples and might fit in memory). So reasoning-at-scale benchmarks on 
Jena are not going to be much use to you. Look at the results for 
commercial stores that do claim inference at scale.



2. is elephas, and a map reduce approach a good alternative to deal with a
big data environment?


Depends what sort of inference you are talking about and whether you 
care about latency or just overall throughput at scale. Map reduce is 
not good for low latency interactive queries.



3. is necessary a triple store to use with reasoner and rule engine?, in
that case what do you recommend?


Don't understand the question. Triple stores and reasoners are different 
things. You can have reasoners that have nothing to do with 
RDF/triple-stores and you can have triple stores with no reasoner. There 
are a fair number of commercial and open source tools in both categories 
and in the overlap.


Dave


Re: Question about adding sub-models

2019-11-25 Thread Dave Reynolds



On 25/11/2019 00:23, Steve Vestal wrote:

If I had three OntModels, each with their own OntDocumentManager,
FileManager, and choice of reasoner,

         OntModel ontoA
         OntModel ontoB
         OntModel ontoCommon

and I then do the following

         ontoA.addSubModel(ontoCommon)
         ontoB.addSubModel(ontoCommon)

how is ontoCommon loaded into ontoA and ontoB?  Whose settings for
import closure, alternate URL lookup, caching, etc., are used?  Which
reasoner?  Will the settings for ontoA determine how import closure,
reasoning, etc., is handled when ontoCommon is added to ontoA?


The OntModels act as if they were Models (with a richer API) but 
potentially with additional triples in them as a result of import 
processing and inference.


When you create ontoCommon there may have been some import processing 
depending on what the settings for it are. If you have configured 
inference for it then it will act as if there are more inferred triples 
present (some may have been manifest, some may be lazily evaluated).


When you add ontoCommon to ontoA then none of that goes away. Whatever 
triples are in (appear to be in) ontoCommon will now be visible as part 
of the ontoA union.


Separately the settings for import processing etc for ontoA will control 
how ontoA was created. Note that addSubModel itself does not trigger any 
import processing, it just adds the model as one more element of the 
union. However, it does optionally reset ("rebind") the ontoA reasoner 
state. So if you have inference configured for ontoA then that inference 
will still be run but it'll now have, as the base triples to work from, 
those in ontoCommon, which will include any resulting from *its* import 
and inference settings.
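
For reference, the rebind can be requested explicitly via the two-argument
form (a sketch using the models above):

    ontoA.addSubModel(ontoCommon, true);   // true = rebind the reasoner now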


In general, having one reasoner work over the top of another is a bad idea 
in terms of performance.


Dave


Re: Question about multiple OntModel reads

2019-10-23 Thread Dave Reynolds

On 23/10/2019 03:16, Steve Vestal wrote:

The description at https://jena.apache.org/documentation/ontology/ talks
about reading an ontology in as the base graph.  Import closure puts
each imported ontology into its own graph to produce a "compound
document."  What happens if I read more than one ontology into the same
OntModel?  Are they merged into a single base graph?


Yes. If you add a triple to an OntModel that triple goes into the base 
graph. Read is just adding a bunch of triples.



It says all updates change the base model, in the event a write is
done.  Am I correct in assuming "change" refers to things like calling
OntModel methods, and that while reasoners may affect results returned
by SPARQL queries or Graph contains methods, reasoning will not "change"
the base model in that sense.


Correct. Reasoning is packaged as a wrapper model which makes additional 
inferred triples available but doesn't change the base model.


Dave



Re: Jena OWL Reasoner question

2019-10-03 Thread Dave Reynolds

There's not a lot of detail there to go on.

As a general principle Pellet is a complete DL reasoner whereas the jena 
OWL rules are much more limited. If you want complete and performant DL 
inference then use a proper DL reasoner. In simple cases the jena rules 
can give useful results in reasonable time and they serve a useful 
purpose in many settings. However, for hard cases they can easily fall 
into exponential behaviour. Even just equality reasoning can be 
problematic. Ontology size is not itself an indication of reasoning 
complexity.


Dave

On 02/10/2019 13:37, Zlatareva, Neli (Computer Science) wrote:

Hi, I am trying to run a very small ontology with Jena OWLReasoner using Jena 
libraries in Eclipse and it runs forever. The memory was initially an issue, 
but after I added extra 16GB of RAM, it was still working after 24 hours. This 
is an ontology that runs in under a second with Pellet. Any idea why is that? I 
will really appreciate any suggestions.
Thank you SO MUCH.
Regards, Neli.

Neli P. Zlatareva, PhD
Professor of Computer Science
Department of Computer Science
Central Connecticut State University
New Britain, CT 06050
Phone: (860) 832-2723
Fax: (860) 832-2712
Web site: cs.ccsu.edu/~neli/



Re: is it possible to combine kleene paths and rules?

2019-09-05 Thread Dave Reynolds

Hi James,

On 05/09/2019 09:19, james anderson wrote:

good morning;


On 2019-09-05, at 09:46:27, Dave Reynolds  wrote:

Hi,

In principle, I could imagine it would be possible to allow property paths in 
the predicate position in body patterns.


yes, that i would expect.
i am interested in the other variant: where the effect of a predicate is 
defined by a rule.


Sorry don't follow what you have in mind here. However, that likely 
means that the answer ("not aware of any work on this") doesn't change :)


Cheers,
Dave


Personally I'm not aware of any work like this.

Dave

On 03/09/2019 09:40, James Anderson wrote:

good morning;
has there been any experience combining the general purpose rule engine with 
arbitrary length property paths?
when i read the jena documentation, i fail to find "path" on the page
 https://jena.apache.org/documentation/query/property_paths.html
and neither does either "rule" or "reason" appear on
 https://jena.apache.org/documentation/inference/
but the descriptions do leave the impression that it should be possible to 
combine the two.
are there any well known examples?
best regards, from berlin,




Re: is it possible to combine kleene paths and rules?

2019-09-05 Thread Dave Reynolds

Hi,

In principle, I could imagine it would be possible to allow property 
paths in the predicate position in body patterns.


Personally I'm not aware of any work like this.

Dave

On 03/09/2019 09:40, James Anderson wrote:

good morning;

has there been any experience combining the general purpose rule engine with 
arbitrary length property paths?


when i read the jena documentation, i fail to find "path" on the page

 https://jena.apache.org/documentation/query/property_paths.html

and neither does either "rule" or "reason" appear on

 https://jena.apache.org/documentation/inference/

but the descriptions do leave the impression that it should be possible to 
combine the two.
are there any well known examples?

best regards, from berlin,



Re: Any way to enforce constraints at assertion-time?

2019-08-13 Thread Dave Reynolds

On 13/08/2019 04:09, Jeff Lerman wrote:

Is there any way, with Jena (either the distributed version or via any
additional software anyone is aware of) to implement enforcement of
constraints on assertions, at the time of assertion?


Nothing built in. The reasoners do support a validation method which is 
implemented in the OWL reasoners and you can create your own validation 
rules in the generic rule reasoner [1]. However, there's no automatic 
checking of updates against such reasoner validation. For the OWL 
reasoner such enforcement would be impractically slow.
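
For reference, a minimal sketch of driving that validation method by hand
(model names assumed):

    InfModel inf = ModelFactory.createInfModel(ReasonerRegistry.getOWLReasoner(), data);
    ValidityReport report = inf.validate();
    if (!report.isValid()) {
        report.getReports().forEachRemaining(System.out::println);
        // reject the update here
    }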



In particular, it’d be very helpful to be able to protect the graph(s) from
any assertion that breaks the assumption (in RDFS and OWL reasoners) that
rdfs:subClassOf “edges” form a directed acyclic graph.


There's no such assumption.


 Ideally, an UPDATE
(or other operation adding content to the store) would fail if that rule
(and maybe a small collection of similarly “basic” rules) would be violated
by storing the updated data.  For example, this would fail with an error:

prefix rdfs: 
prefix owl: 
PREFIX rdf: 
PREFIX local: 

INSERT DATA {
   GRAPH local:notadag {
  rdfs:subClassOf  .
  rdfs:subClassOf  .
   }
}


That's a perfectly legal set of assertions and equivalent to asserting

 owl:equivalentClass  .

Indeed applications which limit themselves to RDFS for one reason or 
another have this as their only idiom for expressing class equivalence. 
Furthermore the OWL reasoners will effectively infer this pair from any 
direct or indirect expression of class equivalence.


It's also always the case that:

 rdfs:subClassOf  .

Dave

[1] https://jena.apache.org/documentation/inference/#RULEnotes


Re: Combining inferences from GRR and RDFS or OWL reasoner

2019-07-22 Thread Dave Reynolds

Hi Pierre,

Good luck with tracking down the issues.

It's certainly the case that the rule system can be slow even over 
memory based stores, so when running over a TDB store then performance 
can definitely be an issue and could make some queries look like they'll 
never finish.


There was a recent report of a transaction error with the rule systems:
https://issues.apache.org/jira/projects/JENA/issues/JENA-1719

That instance has been fixed for 3.13.0 so if you continue to run into 
those "in transaction" messages then try a nightly build. However, as it 
says in that Jira, the underlying issue runs fairly deep (due to the age 
of the rules systems which were built in an era of in-memory stores and 
less clear transaction management) so there may be other manifestations 
of the problem which haven't been caught.


Regards,
Dave

On 22/07/2019 17:55, Pierre Grenon wrote:

Hi Dave,

Thanks so much for your detailed reply, it helped.

I’ve looked into 1 so far (using @include) and I seem to have some good results 
whether using <rdfs> or some file with the rule set. (In fact I tried different 
variants of the rdfs rule set and it seemed to be doing about the same.) Yes, indeed, 
it has to be all config based I’m afraid as java isn’t the way to go in this context.

However, I haven’t ironed out yet my config and I ran into a few crippling 
inconveniences with Fuseki 3.12 (some apparently non terminating queries – you 
need to hit ‘enter’ in the fuseki-server prompt to get a response, some 
annoying white space issues in the file names -- %20 doesn’t help – and I had 
some ‘in transaction’ message erratically at some point – no clue and I’m sorry 
that I can’t reproduce this now.)

I need to look more into https://jena.apache.org/documentation/inference/#rules 
I suppose as this is all a bit too much shooting in the dark.

I’m trying again from scratch with a clean example, sticking with Fuseki 3.10. 
I’ll post my config file if I manage.

I have not looked at 2 yet.

With many thanks and kind regards,
Pierre



From: Dave Reynolds [mailto:dave.e.reyno...@gmail.com]
Sent: 21 July 2019 09:54
To: users@jena.apache.org
Subject: Re: Combining inferences from GRR and RDFS or OWL reasoner

Hi Pierre,

You've two options for combining a GRR rule set and RDFS inference
either combine the rules into a single rule set and use a GRR
configuration or, as you say, layer the GRR reasoner over the top of an
RDFS reasoner.

1. To combine the rules then in your GRR rule set use the directive:

@include <rdfs>.

to include the rules for RDFS inference before your own rules. There are
some restrictions here though. Firstly, the default RDFS rules which are
included by this method need the transitive reasoner enabled.
Programmatically that's easy:

reasoner.setTransitiveClosureCaching(true);

However, I'm not sure if the assembler notation supports that. I'm not
really familiar with the assembler machinery but a quick glance through
the documentation and schema didn't turn up any obvious way to set this.

If there really is no way to set this flag in assemblers, and if you are
restricted to using assemblers, then a workaround (other than option 2,
below) would be to use an RDFS rule set which doesn't need the
transitive reasoner. Jena includes such a rule set:

https://github.com/apache/jena/blob/master/jena-core/src/main/resources/etc/rdfs-fb.rules

You could copy those somewhere visible to your application and @include
them from there.

The other limitation of the single ruleset approach is that, because the
RDFS rules use a mix of forward and backward chaining, the rules you
add on top all[*] need to be written as backward chaining rules.
Otherwise they won't "see" the results of the RDFS backward chaining rules.

Re: Combining inferences from GRR and RDFS or OWL reasoner

2019-07-21 Thread Dave Reynolds

Hi Pierre,

You've two options for combining a GRR rule set and RDFS inference 
either combine the rules into a single rule set and use a GRR 
configuration or, as you say, layer the GRR reasoner over the top of an 
RDFS reasoner.


1. To combine the rules then in your GRR rule set use the directive:

@include <rdfs>.

to include the rules for RDFS inference before your own rules. There are 
some restrictions here though. Firstly, the default RDFS rules which are 
included by this method need the transitive reasoner enabled. 
Programmatically that's easy:


   reasoner.setTransitiveClosureCaching(true);

However, I'm not sure if the assembler notation supports that. I'm not 
really familiar with the assembler machinery but a quick glance through 
the documentation and schema didn't turn up any obvious way to set this.


If there really is no way to set this flag in assemblers, and if you are 
restricted to using assemblers, then a workaround (other than option 2, 
below) would be to use an RDFS rule set which doesn't need the 
transitive reasoner. Jena includes such a rule set:


https://github.com/apache/jena/blob/master/jena-core/src/main/resources/etc/rdfs-fb.rules

You could copy those somewhere visible to your application and @include 
them from there.


The other limitation of the single ruleset approach is that, because the 
RDFS rules use a mix of forward and backward chaining, the rules you 
add on top all[*] need to be written as backward chaining rules. 
Otherwise they won't "see" the results of the RDFS backward chaining rules.
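
For illustration, a minimal combined rule file might look like this (the
ex: namespace and rule are made up), with the custom rule written as a
backward rule so it can see the RDFS results:

    @prefix ex: <http://example.org/>.
    @include <rdfs>.

    [grandParent: (?x ex:grandParentOf ?z) <- (?x ex:parentOf ?y), (?y ex:parentOf ?z)]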


2. The alternative is, as you say, to configure an RDFS reasoner, then 
configure a GRR instance whose base model is the RDFS inference model. You 
wouldn't then need a union as well - all the triples visible in the RDFS 
InfGraph would be visible through the GRR InfGraph.


Dave

[*] Well, at least those that might be affected by the results of RDFS 
inference.


On 19/07/2019 22:36, Pierre Grenon wrote:

apologies for piecemeal post -- didn't copy the whole file at first, so the 
RDFS reasoner, in particular, wasn't there.

I'm wondering if I need to have an inf model with the GRR reasoner with an RDFS 
reasoner submodel and also the reverse and then union these... sounds a bit 
weird.

With many thanks,
Pierre

*Rest of config file*



<#theModel_RDFS> a ja:InfModel ;
 ja:baseModel <#theGraph> ;
ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/RDFSExptRuleReasoner>
] ;
.

<#theGraph> rdf:type tdb2:GraphTDB ;
tdb2:dataset :theTDB2Dataset .

:theTDB2Dataset
 a  tdb2:DatasetTDB2 ;
 tdb2:location  
"C:\\dev\\apache-jena-fuseki-3.10.0\\run/databases/Conference1" ;
tdb2:unionDefaultGraph true.


-Original Message-
From: Pierre Grenon
Sent: 19 July 2019 17:48
To: 'users@jena.apache.org'
Subject: Combining inferences from GRR and RDFS or OWL reasoner

Hello,

(I apologise for I am not sure if this has been addressed before and I have
not found the right thread or documentation.)

The configuration file below seems to allow reasoning with either a GRR or
an out of the box reasoner on the same dataset.

However, I don't think it allows combining inferences from both reasoners at
the same time. I am not sure how to achieve this through configuration.

(Happy to provide an example of data and rule for the GRR. I've noticed this
when adding a rule to classify undeclared individuals, i.e., individuals
appearing in the subject position of a triple. The rule to the effect that if
they do, they are instances of a class A. It is possible to derive the
instantiation. However, it is not possible to combine it with type inheritance
from a reasoner. If class A is a subclass of class B, there is no inference to 
the
effect 

Re: Combining inferences from GRR and RDFS or OWL reasoner

2019-07-21 Thread Dave Reynolds
These kind of messages have been sent by Adrian to pretty much every 
list related to rules for at least 15 years. Definitely spam.


Dave

On 21/07/2019 08:35, Lorenz B. wrote:

Ehm, is this supposed to be SPAM? I remember that you also made this
same weird suggestion in a previous thread without giving any background
why. And I still don't understand what Executable English is supposed to
be ... I never heard about anybody using it nor do I think that any
customer will think about using it given that - well, let's call it
"quite old" fashioned web appearance ...

So, for me this attempt is more than off-topic. And sorry, I don't want
to be rude, I'm just confused by your messages on the Apache Jena
mailing list, when a user asks for support and not about alternatives


@others: do you know anything about this tool/project/framework whatever?


Hi Pierre,

You may like to write your example in Executable English.  (The vocabulary
is open,
so you could also use phrases in French).

Executable English is a platform for cooperative writing of self-serve,
self-explaining analytics in open vocabulary English.  It's live online
with many examples.   You are cordially invited to write and run your own
examples too.  Just point your browser to executable-english.com  .  Shared
use is free, and there are no commercials.

Enjoy!  - Adrian

Adrian Walker
Executable English LLC
San Jose, CA, USA
(USA) 860 830 2085 (California time)
www.executable-english.com





On Fri, Jul 19, 2019 at 9:48 AM Pierre Grenon 
wrote:


Hello,

(I apologise for I am not sure if this has been addressed before and I
have not found the right thread or documentation.)

The configuration file below seems to allow reasoning with either a GRR or
an out of the box reasoner on the same dataset.

However, I don't think it allows combining inferences from both reasoners
at the same time. I am not sure how to achieve this through configuration.

(Happy to provide an example of data and rule for the GRR. I've noticed
this when adding a rule to classify undeclared individuals, i.e.,
individuals appearing in the subject position of a triple. The rule to the
effect that if they do, they are instances of a class A. It is possible to
derive the instantiation. However, it is not possible to combine it with
type inheritance from a reasoner. If class A is a subclass of class B,
there is no inference to the effect that the individual is also an instance
of class B.)

With many thanks and kind regards,
Pierre




-


@prefix :   .
@prefix rdf:    .
@prefix tdb2:   .
@prefix ja: .
@prefix rdfs:   .
@prefix fuseki:  .

:theService a   fuseki:Service ;
 rdfs:label"Service with update and query to
test minimal dataset with inference using an instance of generic rule
reasoner and RDFSExptRuleReasoner" ;
 fuseki:dataset:theDataset ;
 #:tdb_dataset_readwrite ;
 fuseki:name   "Conference2" ;
 fuseki:serviceQuery   "query" , "sparql" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadWriteGraphStore
 "data" ;
 fuseki:serviceUpdate  "update" ;
 fuseki:serviceUpload  "upload" .

:theDataset a ja:RDFDataset ;
 ja:defaultGraph <#theUnionModel>
 .

<#theUnionModel> a ja:UnionModel ;
 ja:rootModel <#theRootModel> ;
 ja:subModel <#theModel_GRR> , <#theModel_RDFS> .

<#theRootModel> a ja:Model ;
 ja:baseModel <#theGraph> ;
.


<#theModel_GRR> a ja:InfModel ;
 ja:baseModel <#theGraph> ;
 ja:reasoner [
 ja:reasonerURL <
http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
 ja:rulesFrom


 ] ;
.


Re: RDFDataset dump inferred triples only

2019-07-02 Thread Dave Reynolds

On 02/07/2019 11:09, Laura Morales wrote:

Can I do this with one of the Jena command line tools?


Not that I know of, don't think there's a command line tool for running 
a set rules over data.


Dave


Sent: Tuesday, July 02, 2019 at 11:34 AM
From: "Dave Reynolds" 
To: users@jena.apache.org
Subject: Re: RDFDataset dump inferred triples only

On 02/07/2019 09:19, Laura Morales wrote:

How can I dump to a .nt files *only* the inferred triples in a RDFDataset?
In other words, I have a RDFDataset with a GenericRuleReasoner InfModel, and I 
would like to export all the inferred triples to a file.


If you are only using forward rules then use getDeductionsModel to get
just the inferred triples and serialize that.

If you are using backward or hybrid rules then it's trickier. You would
have to materialize everything the backward rules can find as well,
remove the starting graph and then serialize that.

Dave




Re: RDFDataset dump inferred triples only

2019-07-02 Thread Dave Reynolds

On 02/07/2019 09:19, Laura Morales wrote:

How can I dump to a .nt files *only* the inferred triples in a RDFDataset?
In other words, I have a RDFDataset with a GenericRuleReasoner InfModel, and I 
would like to export all the inferred triples to a file.


If you are only using forward rules then use getDeductionsModel to get 
just the inferred triples and serialize that.
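
A minimal sketch of the forward-rules case (names assumed):

    InfModel inf = ModelFactory.createInfModel(reasoner, baseModel);
    inf.prepare();   // make sure the forward engine has run
    RDFDataMgr.write(new FileOutputStream("inferred.nt"),
                     inf.getDeductionsModel(), Lang.NTRIPLES);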


If you are using backward or hybrid rules then it's trickier. You would 
have to materialize everything the backward rules can find as well, 
remove the starting graph and then serialize that.


Dave



Re: Fw: GenericRuleReasoner live rule update

2019-06-26 Thread Dave Reynolds
I don't think fuseki has any built in autoreload for rules and the rule 
engine definitely can't just skip broken rules on its own.


You might be able to create a wrapper class which acts like a Dataset 
but behind the scenes uses a dynamically replaceable InfGraph as the 
default graph of the dataset. Then your backend application could use 
fuseki as an embedded server (Fuseki main) and programmatically install 
your wrapper dataset as the dataset to serve.


Dave

On 26/06/2019 07:01, Laura Morales wrote:

To explain my problem a little better, I have a program (website) that is used 
by several people with Fuseki in the backend, and I would like to accept 
user-defined inference rules. The only way I know to add new rules is by 
changing the configuration files and reloading Fuseki. This is not ideal for 
two reasons: 1st it requires a database restart for reading in the new 
configuration files, and 2nd if a rule has a syntax error Fuseki stops with an 
exception. I've read in the documentation about ja:rule but I feel like it 
doesn't solve the problem since it too must be defined in the configuration 
files.
I would like to know if there's a way that I can add inference rules simply by updating a 
graph (some kind of Fuseki "configuration graph" with ja:rule maybe?) instead 
of writing the configuration files, or if broken rules can be skipped instead of blocking 
Fuseki.
Thank you so much!



Sent: Tuesday, June 25, 2019 at 10:39 AM
From: "Laura Morales" 
To: jena-users-ml 
Subject: GenericRuleReasoner live rule update

Is it possible to live-reload GenericRuleReasoner rules? That is without 
restarting Fuseki?


Re: Cannot setup GenericRuleReasoner

2019-06-26 Thread Dave Reynolds
Not a bug, in that the syntax was never based on Sparql; indeed it 
predates Sparql. However, I'm sure someone could add that extension or 
redesign the syntax entirely to be more Sparql compatible.


Dave

On 25/06/2019 12:41, Nouwt, B. (Barry) wrote:

I can confirm that rules cannot use the 'a' keyword to replace rdf:type, like 
you can in, for example, SPARQL. Not sure if it’s a bug...

You already found the workaround: use rdf:type instead of 'a'

Regards, Barry

-Original Message-
From: Laura Morales 
Sent: dinsdag 25 juni 2019 10:08
To: jena-users-ml 
Subject: Fw: Cannot setup GenericRuleReasoner

It seems to work if I replace the rule

 [ okrule: (?s a ex:Person) -> (?s ex:works "OK") ]

with this rule

 [ okrule: (?s rdf:type ex:Person) -> (?s ex:works "OK") ]

is this a bug?





Sent: Tuesday, June 25, 2019 at 9:40 AM
From: "Laura Morales" 
To: jena-users-ml 
Subject: Cannot setup GenericRuleReasoner

What's wrong with this configuration? It doesn't seem to infer any triples when 
I query the dataset (Fuseki 3.6.0). It doesn't show any errors either.

config.ttl

PREFIX :   <#>
PREFIX fuseki: <http://jena.apache.org/fuseki#>
PREFIX ja: <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX tdb: <http://jena.hpl.hp.com/2008/tdb#>

:service a fuseki:Service ;
 rdfs:label"test" ;
 fuseki:name   "test" ;
 fuseki:serviceQuery   "query" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadWriteGraphStore "data" ;
 fuseki:serviceUpdate  "update" ;
 fuseki:serviceUpload  "upload" ;
 fuseki:dataset:dataset ;
 .

:dataset a ja:RDFDataset ;
 ja:defaultGraph :model_inf .

:model_inf a ja:InfModel ;
 ja:baseModel :g ;
 ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
 ja:rulesFrom  ;
 ] .

:ds a tdb:DatasetTDB ;
 tdb:location "/opt/fuseki/run/databases/ds/" .

:g a tdb:GraphTDB ;
 tdb:dataset :ds .


rules

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:    .

[ okrule: (?s a ex:Person) -> (?s ex:works "OK") ]





Re: Documentation/tutorial on using dates in Jena rules with GenericRuleReasoner

2019-06-03 Thread Dave Reynolds

On 03/06/2019 16:36, Pierre Grenon wrote:

So can I just edit

\jena-core\src\main\java\org\apache\jena\reasoner\rulesys\ BuiltinRegistry.java

?

Or is that too hackish?


Definitely too hackish.

After some google searching it I see some old mentions that fuseki 
loadClass will call any public static void init() method on the class 
you load. If that's still true then you should be able to put the


BuiltinRegistry.theRegistry.register(...)

call(s) in such an init method.
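
Something along these lines (class and builtin names are made up):

    public class MyBuiltins {
        public static void init() {
            BuiltinRegistry.theRegistry.register( new StringEqualIgnoreCase() );
        }
    }

and then point ja:loadClass at MyBuiltins in the assembler file.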

Dave


Thanks ,
Pierre



From: Dave Reynolds [mailto:dave.e.reyno...@gmail.com]
Sent: 03 June 2019 11:17
To: users@jena.apache.org
Subject: Re: Documentation/tutorial on using dates in Jena rules with 
GenericRuleReasoner

Hi Pierre,

I'm afraid I've lost track of what you are trying to do. Originally it
seemed to be a problem running comparisons on dateTime values but
Lorenz has already answered that.

In terms on writing your own new Builtins, then before you can use a new
rule Builtin it needs to be registered. That's what the line:

BuiltinRegistry.theRegistry.register( new StringEqualIgnoreCase() )

was doing in my (non-fuseki) example.

If you need your new builtin to run within fuseki then you would need
some way to trigger such registration code. No doubt that's possible but
not something I've any first hand knowledge of.

By the way, for simply getting code loaded into fuseki you don't need to
repack the jar. Just add your new jar to the classpath and use the
ja:loadClass function to get your class loaded when fuseki starts up.
See last example in:

https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

Dave

On 03/06/2019 07:01, Pierre Grenon wrote:

Hi Dave,

Executive summary:

I'm not a java coder. I did what I could to try to do this using fuseki.

I get this:
[2019-05-31 18:47:30] Functor WARN Invoking undefined functor testBuilt in r1

I understand this may be related to RuleContext. I don't understand any further.
Details below.

With many thanks,
Pierre

Details ---

1. I unzipped my fuseki-server.jar

2. I placed the code below into a ..\rulesys\testBuilt.java as


package org.apache.jena.reasoner.rulesys.builtins;


import org.apache.jena.graph.* ;
import org.apache.jena.reasoner.rulesys.* ;

/**
* Tests if the two literal arguments are equal, ignoring case.
*/

class testBuilt extends BaseBuiltin implements Builtin {
public String getName() {
return "testBuilt";
}

@Override
public int getArgLength() {
return 2;
}

@Override
public boolean bodyCall(Node[] args, int length, RuleContext context) {
checkArgs(length, context);
Node n1 = getArg(0, args, context);
Node n2 = getArg(1, args, context);
if (n1.isLiteral() && n2.isLiteral()) {
return n1.getLiteralLexicalForm().equalsIgnoreCase(
n2.getLiteralLexicalForm() );
} else {
return false;
}
}
}



3. Compiled that:

C:\dev\apache-jena-fuseki-3.10.0\woot>javac 
org\apache\jena\reasoner\rulesys\builtins\testBuilt.java

4. Jar-ed the whole thing back

C:\dev\apache-jena-fuseki-3.10.0\woot>jar cmvf 
fuseki-server\META-INF\MANIFEST.MF fuseki-server.jar -C fuseki-server/ .

5. Replaced my fuseki-server.jar

6. Created a rule file


@prefix ns: <http://test.org#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

[r1:
(?x ns:p ?pl)
(?x ns:q ?ql)
testBuilt(?pl, ?ql)
->
(?x ns:r 'equal')
]


7. Created a dataset file


@prefix ns: <http://test.org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://test.org#Conference0> ns:p "

Re: Documentation/tutorial on using dates in Jena rules with GenericRuleReasoner

2019-06-03 Thread Dave Reynolds
m built-in inference using an instance of generic rule 
reasoner" ;
 fuseki:dataset:theDatasetBI ;
#:tdb_dataset_readwrite ;
 fuseki:name   "ConferenceBuiltIn" ;
 fuseki:serviceQuery   "query" , "sparql" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadWriteGraphStore
 "data" ;
 fuseki:serviceUpdate  "update" ;
 fuseki:serviceUpload  "upload" .

:theDatasetBI a ja:RDFDataset ;
 ja:defaultGraph <#theModel_GRRBI> .

<#theModel_GRRBI> a ja:InfModel ;
 ja:baseModel <#theGraphBI> ;
ja:reasoner [
ja:reasonerURL 
<http://jena.hpl.hp.com/2003/GenericRuleReasoner> ;
ja:rulesFrom 

] ;
.

<#theGraphBI> rdf:type tdb2:GraphTDB ;
tdb2:dataset :theTDB2DatasetBI .

:theTDB2DatasetBI
 a  tdb2:DatasetTDB2 ;
 tdb2:location  
"C:\\dev\\apache-jena-fuseki-3.10.0\\run/databases/ConferenceBuiltIn" ;
tdb2:unionDefaultGraph true.



This is my query:
---

prefix ns: <http://test.org#>
select *
where
{?x ns:r ?z}
limit 5

This is Fuskei's log:
--

[2019-05-31 18:47:22] Server INFO  Started 2019/05/31 18:47:22 BST on port 
3030
[2019-05-31 18:47:30] Fuseki INFO  [1] POST 
http://localhost:3030/ConferenceBuiltIn/sparql
[2019-05-31 18:47:30] Fuseki INFO  [1] Query = prefix ns: 
<http://test.org#> select * where  {?x ns:r ?z} limit 5
[2019-05-31 18:47:30] FunctorWARN  Invoking undefined functor testBuilt in 
r1
[2019-05-31 18:47:30] FunctorWARN  Invoking undefined functor testBuilt in 
r1
[2019-05-31 18:47:30] FunctorWARN  Invoking undefined functor testBuilt in 
r1
[2019-05-31 18:47:30] FunctorWARN  Invoking undefined functor testBuilt in 
r1
[2019-05-31 18:47:30] FunctorWARN  Invoking undefined functor testBuilt in 
r1
[2019-05-31 18:47:30] Fuseki INFO  [1] 200 OK (78 ms)

## END OF MESSAGE



From: Dave Reynolds [mailto:dave.e.reyno...@gmail.com]
Sent: 17 May 2019 09:01
To: users@jena.apache.org
Subject: Re: Documentation/tutorial on using dates in Jena rules with 
GenericRuleReasoner

Hi Pierre,

I can't offer to hold you by the hand I'm afraid, snowed under with
work. But a minimal example might help. Here's an example of a minimal
extension builtin:

class StringEqualIgnoreCase extends BaseBuiltin implements Builtin {

public String getName() {
return "stringEqualIgnoreCase";
}

@Override
public int getArgLength() {
return 2;
}

@Override
public boolean bodyCall(Node[] args, int length, RuleContext context) {
checkArgs(length, context);
Node n1 = getArg(0, args, context);
Node n2 = getArg(1, args, context);
if (n1.isLiteral() && n2.isLiteral()) {
return n1.getLiteralLexicalForm().equalsIgnoreCase(
n2.getLiteralLexicalForm() );
} else {
return false;
}
}

}

and an example driver class for demonstrating it operating:

/**
* Rule test.
*/
public void testRuleSet2() {
String NS = "http://ont.com/";
BuiltinRegistry.theRegistry.register( new
StringEqualIgnoreCase() );
String rules = "[r1: (?x ns:p ?pl) (?x ns:q ?ql)
stringEqualIgnoreCase(?pl, ?ql) -> (?x ns:r 'equal') ]";
Model m = ModelFactory.createDefaultModel();
Resource a = m.createResource(NS + "a");
Resource b = m.createResource(NS + "b");
Property p = m.createProperty(NS + "p");
Property q = m.createProperty(NS + "q");
m.add(a, p, "FOO");
m.add(a, q, "foo");
m.add(b, p, "FOO");
m.add(b, q, "foobar");
GenericRuleReason

Re: Documentation/tutorial on using dates in Jena rules with GenericRuleReasoner

2019-05-17 Thread Dave Reynolds

Hi Pierre,

I can't offer to hold you by the hand I'm afraid, snowed under with 
work. But a minimal example might help. Here's an example of a minimal 
extension builtin:


class StringEqualIgnoreCase extends BaseBuiltin implements Builtin {

public String getName() {
return "stringEqualIgnoreCase";
}

@Override
public int getArgLength() {
return 2;
}

@Override
public boolean bodyCall(Node[] args, int length, RuleContext context) {
checkArgs(length, context);
Node n1 = getArg(0, args, context);
Node n2 = getArg(1, args, context);
if (n1.isLiteral() && n2.isLiteral()) {
return n1.getLiteralLexicalForm().equalsIgnoreCase( 
n2.getLiteralLexicalForm() );

} else {
return false;
}
}

}

and an example driver class for demonstrating it operating:

/**
 * Rule test.
 */
public void testRuleSet2() {
String NS = "http://ont.com/";
BuiltinRegistry.theRegistry.register( new 
StringEqualIgnoreCase() );
String rules = "[r1: (?x ns:p ?pl) (?x ns:q ?ql) 
stringEqualIgnoreCase(?pl, ?ql) -> (?x ns:r 'equal') ]";

Model m = ModelFactory.createDefaultModel();
Resource a = m.createResource(NS + "a");
Resource b = m.createResource(NS + "b");
Property p = m.createProperty(NS + "p");
Property q = m.createProperty(NS + "q");
m.add(a, p, "FOO");
m.add(a, q, "foo");
m.add(b, p, "FOO");
m.add(b, q, "foobar");
GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule
.parseRules(rules));
InfModel infModel = ModelFactory.createInfModel(reasoner, m);
infModel.write(System.out, "Turtle");
}

These are cut/paste from some very ancient examples but hopefully should 
work; if not, let me know and I can see about assembling it into a 
self-contained working example.


As it says in the documentation, for examples of how to write particular 
sorts of builtin, the best place to look is the source code for the 
current builtins.
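
[Aside, not from the original message: since this thread started from date
handling, below is a minimal, untested sketch of a comparison builtin for
date-time literals, along the same lines as the example above. It assumes
the values of xsd:dateTime literals materialise as
org.apache.jena.datatypes.xsd.XSDDateTime (which is comparable), and the
name dateBefore is made up for illustration.

import org.apache.jena.datatypes.xsd.XSDDateTime;
import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

class DateBefore extends BaseBuiltin {

    @Override
    public String getName() {
        return "dateBefore";
    }

    @Override
    public int getArgLength() {
        return 2;
    }

    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        checkArgs(length, context);
        Node n1 = getArg(0, args, context);
        Node n2 = getArg(1, args, context);
        // Only fire when both values are date-times Jena can compare
        if (n1.isLiteral() && n2.isLiteral()
                && n1.getLiteralValue() instanceof XSDDateTime
                && n2.getLiteralValue() instanceof XSDDateTime) {
            return ((XSDDateTime) n1.getLiteralValue())
                    .compareTo((XSDDateTime) n2.getLiteralValue()) < 0;
        }
        return false;
    }
}

Registered with BuiltinRegistry.theRegistry.register(new DateBefore()), it
should then be callable as dateBefore(?start, ?end) in a rule body, exactly
like stringEqualIgnoreCase above.]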


Dave

On 17/05/2019 07:53, Pierre Grenon wrote:

Hi

Thanks again.

Hear you.

I think this is becoming a bit too meta, perhaps. Maybe there are a couple of ways 
to go forward.


a.  Is anybody a taker to hold me by the hand and use this thread to come 
up with a complete cycle for making a new builtin and adding it to my Fuseki? 
If somebody has the time to do this (and I'm happy for it to take what it 
takes; I can't make it a high priority on my end), we could reuse the thread 
for the purpose of a detailed how-to for noobs like me.

b.  I think I actually tried the rule below and didn't get any inference 
result. I don't know if it's my config, my rule or my data. I could start a. by 
trying to provide a dataset and config file as well. Again, anybody willing to 
hold my hand?

Give a shout.

Thanks,
Pierre

From: Lorenz B. [mailto:buehm...@informatik.uni-leipzig.de]
Sent: 17 May 2019 07:24
To: users@jena.apache.org
Subject: Re: Documentation/tutorial on using dates in Jena rules with 
GenericRuleReasoner

Hi,


Hi Lorenz,

Thank you for your answer.

Quick follow up.

I think the issue for me is that the documentation of the built-ins is too 
abstract, or relies on understanding the source code. So I suppose documentation 
or a tutorial seems somewhat superfluous when you can do that; only I can't 
understand what's there, or the source, at the moment.

I can see that it might be too abstract for people coming from different
areas, sure. But the question is who is able to provide such a tutorial,
and also who has the time. It's always a trade-off in Open Source
projects like Jena - I guess most of the devs or other project-related
people here are not getting paid, and clearly such a tutorial for most
if not all of the built-ins definitely needs some effort. Ideally, the
community could take over those things, but it looks like nobody has ever
written blog posts or tutorials about the Jena rule system and its built-ins.




1. Yes, I seem to understand that difference is a no-go, but I was wondering if 
there might be some workaround coercing the dateTime to something else. I'm not 
sure I understood that very well, but it looks like I can't use functions in 
arguments of built-ins (so no xsd:year(?date) or whatever).

I don't think you can use functions or expressions from the SPARQL
engine or its XPath constructors. Both are totally different
implementations, I guess - but again, I'm not a developer, so I can't
make a valid statement beyond looking into the code and the docs.
From my point of view, only the built-ins mentioned in the docs are
valid so far.




But then, on greaterThan, something should be workable if I have xsd:dateTime, 
no?

What’s wrong with :



[ruleMissedDeadline2:

(?conference ns:hasDeadline ?date)

now(?now)

greaterThan(?now, ?date)

->

(?conference ns:status ns:DeadlinePassed)

]


Well I was clearly 

Re: [GenericRuleReasoner] inner workings

2019-03-13 Thread Dave Reynolds

Hi Marco,

Not a "consensus" that I'm part of so not something I could comment on.

Dave

On 13/03/2019 10:49, Marco Neumann wrote:

Correct me if I am wrong, but from my vantage point I seem to notice
a silent consensus in the RDF community to go from SPIN rules to SHACL,
which now comes with its own SHACL rule engine [1].

The new SHACL efforts are mostly guided by TopQuadrant, and a change
from the initial layered approach to go with SPARQL RDF
(SPIN+(SHACL-rules)). So I presume the current game plan is that SHACL
will "rule" them all in the end.

If so, it would be nice to have a feature list for SHACL rules. And
does this mean it will be rules without validation, just CONSTRUCT
queries, or are the rule semantic restrictions built into SHACL? I am
sure this will work fine for many of the use cases we have, but since we
are starting to blur the lines between rules/reasoner/SPARQL, it would be
nice to have some general authoritative clarification here.

[1] 
https://github.com/TopQuadrant/shacl/blob/master/src/main/java/org/topbraid/shacl/rules/RuleEngine.java


On Wed, Mar 13, 2019 at 10:14 AM Dave Reynolds
 wrote:


Hi Marco,

Sorry, I'm not aware of other rule engines having been wired to Jena but
that doesn't mean it hasn't been done. In particular I'm surprised
there's not a drools-for-jena project somewhere. People have certainly
experimented with that, even written papers comparing performance [1],
but I'm not aware of any supported tooling.

Dave

[1] https://ieeexplore.ieee.org/document/7516153

On 12/03/2019 22:18, Marco Neumann wrote:

So what's your current recommendation for a superior third-party rules
reasoner that works efficiently with the Jena tooling? Free & commercial
options welcome

Marco



On Mon 14. Jan 2019 at 19:16, Dave Reynolds 
wrote:


Hi Barry,

[Agreed that dev is probably the better place to discuss this.]

The two engines in Jena are indeed loosely styled on RETE and on tabled
datalog. However, I wouldn't claim they were particularly complete or
good implementations of either. So while looking at some of the source
literature that inspired them might be helpful, don't expect very much of
what's covered in the literature to be present in the code.

For RETE, the Wikipedia article [1] is a good summary and source of
starting references. I had a copy of the original Forgy paper (ref 1 in
[1]), among others, when I was doing the work. There has been a *lot* of
work on improvements to RETE since the 80s, and while there were times
when we might have done a new forward engine using more modern
techniques, it never happened.

For the backward engine, the approach is a variant of SLG-WAM as used for
XSB, but highly, highly simplified, since we can't express general tuples
or recursive data structures within Jena's triples. A few Google
searches haven't turned up the exact paper that originally inspired the
approach. The closest I've found are [2] and [3], which probably cover
the same ground.

Let me reinforce that the Jena engines are really simplified. They were
enough to get the job done at the time (over a decade ago now) and have
proved useful for some people since, but I wouldn't want to defend any of
the implementation choices.

Dave

[1] https://en.wikipedia.org/wiki/Rete_algorithm
[2]

https://pdfs.semanticscholar.org/2078/96964ee85f983cd861a4f8c5dff0bfc9f03e.pdf
[3]

https://pdfs.semanticscholar.org/6c6d/26e8fe1b755140ffcb57025b021a046b2a3b.pdf

On 14/01/2019 16:33, ajs6f wrote:

I have no useful general information about the reasoning framework, but

I am copying this over to dev@. Discussions of how to extend Jena
definitely have a place there.


ajs6f


On Jan 14, 2019, at 6:40 AM, Nouwt, B. (Barry)

 wrote:


Hi all, I want to investigate the inner workings of the

GenericRuleReasoner (with the purpose of extending it in the future). In
Jena's documentation I read:


"Jena includes a general purpose rule-based reasoner which is used to

implement both the RDFS and OWL reasoners but is also available for general
use. This reasoner supports rule-based inference over RDF graphs and
provides forward chaining, backward chaining and a hybrid execution model.
To be more exact, there are two internal rule engines one forward chaining
RETE engine and one tabled datalog engine - they can be run separately or
the forward engine can be used to prime the backward engine which in turn
will be used to answer queries."

source: https://jena.apache.org/documentation/inference/#rules

Apart from Jena's documentation, Jena's mailing lists and its source

code, are there any resources that can better help me grasp what is
happening inside the generic rule reasoner? For example, the text above
mentions the forward chaining RETE engine and the tabled datalog engine,
are there any scientific papers that I might read to better understand
their inner workings?


Maybe this question is better suited for the d...@jena.apache.org

<mailto:d...@jena.apache.org>?


Re: [GenericRuleReasoner] inner workings

2019-03-13 Thread Dave Reynolds

Hi Marco,

Sorry, I'm not aware of other rule engines having been wired to Jena but 
that doesn't mean it hasn't been done. In particular I'm surprised 
there's not a drools-for-jena project somewhere. People have certainly 
experimented with that, even written papers comparing performance [1], 
but I'm not aware of any supported tooling.


Dave

[1] https://ieeexplore.ieee.org/document/7516153

On 12/03/2019 22:18, Marco Neumann wrote:

So what's your current recommendation for a superior third-party rules
reasoner that works efficiently with the Jena tooling? Free & commercial
options welcome

Marco



On Mon 14. Jan 2019 at 19:16, Dave Reynolds 
wrote:


Hi Barry,

[Agreed that dev is probably the better place to discuss this.]

The two engines in Jena are indeed loosely styled on RETE and on tabled
datalog. However, I wouldn't claim they were particularly complete or
good implementations of either. So while looking at some of the source
literature that inspired them might be helpful, don't expect very much of
what's covered in the literature to be present in the code.

For RETE, the Wikipedia article [1] is a good summary and source of
starting references. I had a copy of the original Forgy paper (ref 1 in
[1]), among others, when I was doing the work. There has been a *lot* of
work on improvements to RETE since the 80s, and while there were times
when we might have done a new forward engine using more modern
techniques, it never happened.

For the backward engine, the approach is a variant of SLG-WAM as used for
XSB, but highly, highly simplified, since we can't express general tuples
or recursive data structures within Jena's triples. A few Google
searches haven't turned up the exact paper that originally inspired the
approach. The closest I've found are [2] and [3], which probably cover
the same ground.

Let me reinforce that the Jena engines are really simplified. They were
enough to get the job done at the time (over a decade ago now) and have
proved useful for some people since, but I wouldn't want to defend any of
the implementation choices.

Dave

[1] https://en.wikipedia.org/wiki/Rete_algorithm
[2]

https://pdfs.semanticscholar.org/2078/96964ee85f983cd861a4f8c5dff0bfc9f03e.pdf
[3]

https://pdfs.semanticscholar.org/6c6d/26e8fe1b755140ffcb57025b021a046b2a3b.pdf

On 14/01/2019 16:33, ajs6f wrote:

I have no useful general information about the reasoning framework, but

I am copying this over to dev@. Discussions of how to extend Jena
definitely have a place there.


ajs6f


On Jan 14, 2019, at 6:40 AM, Nouwt, B. (Barry)

 wrote:


Hi all, I want to investigate the inner workings of the

GenericRuleReasoner (with the purpose of extending it in the future). In
Jena's documentation I read:


"Jena includes a general purpose rule-based reasoner which is used to

implement both the RDFS and OWL reasoners but is also available for general
use. This reasoner supports rule-based inference over RDF graphs and
provides forward chaining, backward chaining and a hybrid execution model.
To be more exact, there are two internal rule engines one forward chaining
RETE engine and one tabled datalog engine - they can be run separately or
the forward engine can be used to prime the backward engine which in turn
will be used to answer queries."

source: https://jena.apache.org/documentation/inference/#rules

Apart from Jena's documentation, Jena's mailing lists and its source

code, are there any resources that can better help me grasp what is
happening inside the generic rule reasoner? For example, the text above
mentions the forward chaining RETE engine and the tabled datalog engine,
are there any scientific papers that I might read to better understand
their inner workings?


Maybe this question is better suited for the d...@jena.apache.org

<mailto:d...@jena.apache.org>?


Regards, Barry






Re: [GenericRuleReasoner] inner workings

2019-01-14 Thread Dave Reynolds

Hi Barry,

[Agreed that dev is probably the better place to discuss this.]

The two engines in Jena are indeed loosely styled on RETE and on tabled 
datalog. However, I wouldn't claim they were particularly complete or 
good implementations of either. So while looking at some of the source 
literature that inspired them might be helpful, don't expect very much of 
what's covered in the literature to be present in the code.


For RETE, the Wikipedia article [1] is a good summary and source of 
starting references. I had a copy of the original Forgy paper (ref 1 in 
[1]), among others, when I was doing the work. There has been a *lot* of 
work on improvements to RETE since the 80s, and while there were times 
when we might have done a new forward engine using more modern 
techniques, it never happened.


For the backward engine, the approach is a variant of SLG-WAM as used for 
XSB, but highly, highly simplified, since we can't express general tuples 
or recursive data structures within Jena's triples. A few Google 
searches haven't turned up the exact paper that originally inspired the 
approach. The closest I've found are [2] and [3], which probably cover 
the same ground.


Let me reinforce that the Jena engines are really simplified. They were 
enough to get the job done at the time (over a decade ago now) and have 
proved useful for some people since, but I wouldn't want to defend any of 
the implementation choices.


Dave

[1] https://en.wikipedia.org/wiki/Rete_algorithm
[2] 
https://pdfs.semanticscholar.org/2078/96964ee85f983cd861a4f8c5dff0bfc9f03e.pdf
[3] 
https://pdfs.semanticscholar.org/6c6d/26e8fe1b755140ffcb57025b021a046b2a3b.pdf


On 14/01/2019 16:33, ajs6f wrote:

I have no useful general information about the reasoning framework, but I am 
copying this over to dev@. Discussions of how to extend Jena definitely have a 
place there.
  
ajs6f



On Jan 14, 2019, at 6:40 AM, Nouwt, B. (Barry)  
wrote:

Hi all, I want to investigate the inner workings of the GenericRuleReasoner 
(with the purpose of extending it in the future). In Jena's documentation I 
read:

"Jena includes a general purpose rule-based reasoner which is used to implement both 
the RDFS and OWL reasoners but is also available for general use. This reasoner supports 
rule-based inference over RDF graphs and provides forward chaining, backward chaining and 
a hybrid execution model. To be more exact, there are two internal rule engines one 
forward chaining RETE engine and one tabled datalog engine - they can be run separately 
or the forward engine can be used to prime the backward engine which in turn will be used 
to answer queries."
source: https://jena.apache.org/documentation/inference/#rules

Apart from Jena's documentation, Jena's mailing lists and its source code, are 
there any resources that can better help me grasp what is happening inside the 
generic rule reasoner? For example, the text above mentions the forward 
chaining RETE engine and the tabled datalog engine, are there any scientific 
papers that I might read to better understand their inner workings?

Maybe this question is better suited for the 
d...@jena.apache.org?

Regards, Barry




Re: ModelChangedListener not triggered when new inferred statements are added.

2019-01-07 Thread Dave Reynolds
Actually it depends on the rules and configuration of the 
GenericRuleReasoner.


For rules executed in backward mode it's exactly as ajs6f says: no extra 
statements are added to a model.


For rules executed in forward mode, the inferred triples are added 
to the deductions graph (which is separate from the base graph that 
holds the data). So I imagine that a listener on the deductions graph 
should work. I haven't tested it myself though - the rule system largely 
predates the current version of listeners.


However, note that even in forward mode the inferences are only triggered 
when you ask a query or call prepare(). They are not immediately 
triggered by just adding new base triples.
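
[Aside, not from the original message: a minimal, untested sketch of the
suggestion above; the namespace and rule are illustrative, and whether the
listener fires during prepare() is exactly the untested part.

import org.apache.jena.rdf.listeners.StatementListener;
import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class DeductionsListenerDemo {
    public static void main(String[] args) {
        String NS = "http://example.org/";   // illustrative namespace
        String rules = "@prefix ex: <" + NS + "> . "
                + "[r1: (?x ex:p ?v) -> (?x ex:q ?v) ]";

        GenericRuleReasoner reasoner =
                new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.FORWARD);

        Model base = ModelFactory.createDefaultModel();
        InfModel inf = ModelFactory.createInfModel(reasoner, base);

        // Forward inferences land in the deductions graph, so listen
        // there, not on the InfModel itself
        inf.getDeductionsModel().register(new StatementListener() {
            @Override
            public void addedStatement(Statement s) {
                System.out.println("Inferred: " + s);
            }
        });

        inf.add(inf.createResource(NS + "a"),
                inf.createProperty(NS + "p"), "v");
        inf.prepare();   // inferences only fire on prepare() or a query
    }
}
]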


Dave

On 07/01/2019 18:36, ajs6f wrote:

Dave Reynolds will be able to say more, but my understanding is that inferred triples are 
_not_ added to the model in the sense in which we might use Model::add to add a 
Statement. They are simply presented by the implementation as the model is scanned, 
"injected" by the inference machinery, so to speak.

As for your use case, perhaps you could use a rule to do this? Again, I don't 
know a lot about this end of Jena, but perhaps you could write a custom 
built-in function for the rules system [1].

ajs6f

[1] https://jena.apache.org/documentation/inference/index.html#builtins


On Jan 7, 2019, at 10:32 AM, Nouwt, B. (Barry)  
wrote:

Hi everyone,
  
We would like Apache Jena to raise an event whenever a particular triple is inferred by the GenericRuleReasoner.
  
We tried to achieve this by registering our implementation of the ModelChangedListener interface to an InfModel, but our implementation does not get triggered when inferred statements are added by the reasoner.
  
Is there any (other) way to raise an event whenever an inferred statement is added to an InfModel?
  
Regards, Barry
  
  
  




Re: [GenericRuleReasoner] multi-headed backward chaining

2018-09-04 Thread Dave Reynolds

Hi Barry,

On 04/09/18 12:47, Nouwt, B. (Barry) wrote:

Hi Dave,

Thanks for your answers and the pointer towards implementing such a feature. I 
assume the syntactical solution you mention would not solve the limitation I 
describe below. I'm unsure how to circumvent this limitation without 
multi-headed backward rule support, but I'm of course open to suggestions.


See below ...


Do you maybe have an "alternative general purpose backward rule engine" in mind 
that either already supports multiple heads, or that might be easier to extend to support 
it? I've also been looking into other engines, like TopQuadrant's SHACL engine 
(https://github.com/TopQuadrant/shacl/issues/44) or Openllet 
(https://github.com/Galigator/openllet), but for now none of them beats Jena's 
GenericRuleEngine.


I was thinking of non-RDF rule engines which would then need 
(non-trivial) work to wire up to a triple representation.


If you want serious backward chaining then XSB is the one to look at. 
The Jena backward engine is based loosely on a simplified version of the 
SLG-WAM approach that XSB uses.


The alternative is Drools, which has in general a good reputation, but I 
know nothing about its backward rule support (indeed next to nothing 
about it at all).



I'll try to explain one of the limitations we encounter in more detail. Imagine 
the following (untested and not that useful) scenario with a SPARQL query that 
retrieves a particular measured value based on a start and end date:

SELECT ?value
WHERE {
?measurement :hasValue ?value .
?measurement :validFor ?interval .
?interval rdf:type :DateInterval .
?interval :startDate "2018-06-10"^^xsd:date .
?interval :endDate "2018-07-07"^^xsd:date .
?interval :valid "true"^^xsd:boolean .
}

The data could look like this (note the missing ":interval1 :valid 
"true"^^xsd:boolean" triple):

:measure1 :hasValue "123" .
:measure1 :validFor :interval1 .
:interval1 rdf:type :DateInterval .
:interval1 :startDate "2018-06-10"^^xsd:date .
:interval1 :endDate "2018-07-07"^^xsd:date .

Now we would like to have a backward rule that checks whether the start date lies before the 
end date of the interval and adds the valid = "true" triple.

[IntRule:
(?int rdf:type :DateInterval)
(?int :startDate ?start)
(?int :endDate ?end)
<-
lessThan(?start, ?end)
(?int :valid "true"^^xsd:boolean)
]


Not sure I follow that. If you want to conclude that a measurement with 
correctly ordered dates is valid, and do so using a forward rule, then 
you would use:


[IntRule:
(?int rdf:type :DateInterval)
(?int :startDate ?start)
(?int :endDate ?end)
lessThan(?start, ?end)
  ->
(?int :valid "true"^^xsd:boolean)
]

That rule is valid both forward and backward. There is only one head 
(conclusion) from a set of three body terms and a condition. So you 
could set the engine to run in backward mode and run that same rule 
backwards with no problems.
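
[Aside, not from the original message: an untested sketch of running that
same forward-form rule under the pure backward engine. The ex: namespace
and setup are illustrative, and it assumes lessThan can order the xsd:date
values, as the example data already does.

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.vocabulary.RDF;
import org.apache.jena.vocabulary.XSD;

public class BackwardIntervalDemo {
    public static void main(String[] args) {
        String NS = "http://example.org/";
        Model data = ModelFactory.createDefaultModel();
        Resource interval = data.createResource(NS + "interval1");
        interval.addProperty(RDF.type, data.createResource(NS + "DateInterval"));
        interval.addProperty(data.createProperty(NS + "startDate"),
                data.createTypedLiteral("2018-06-10", XSD.date.getURI()));
        interval.addProperty(data.createProperty(NS + "endDate"),
                data.createTypedLiteral("2018-07-07", XSD.date.getURI()));

        // The single-headed rule, still written in forward syntax
        String rules = "@prefix ex: <" + NS + "> . "
                + "[IntRule: (?i rdf:type ex:DateInterval) "
                + " (?i ex:startDate ?s) (?i ex:endDate ?e) "
                + " lessThan(?s, ?e) -> (?i ex:valid 'true'^^xsd:boolean) ]";

        GenericRuleReasoner reasoner =
                new GenericRuleReasoner(Rule.parseRules(rules));
        reasoner.setMode(GenericRuleReasoner.BACKWARD); // tabled engine only

        InfModel inf = ModelFactory.createInfModel(reasoner, data);
        // Asking about ex:valid poses it as a goal to the backward engine;
        // this should print true if the goal is proved
        System.out.println(inf.contains(interval,
                inf.createProperty(NS + "valid"), inf.createTypedLiteral(true)));
    }
}
]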


If you want to use the explicit backward rule syntax, so you can use the 
generic reasoner in its default hybrid mode, then the corresponding 
backward rule is:


[IntRule:
(?int :valid "true"^^xsd:boolean)
<-
(?int rdf:type :DateInterval)
(?int :startDate ?start)
(?int :endDate ?end)
lessThan(?start, ?end)
]

That's valid and works for me on a trivial test case.

Maybe I'm misunderstanding what you are trying to do, or maybe there's 
some confusion caused by the backward rule syntax.


Dave


So, to answer the above query, the head of the backward rule would match with 
the corresponding goal triples from the SPARQL query and bind the ?start and 
?end variables in the rule to the dates mentioned in the SPARQL query. We have 
difficulties getting this to work with single-headed backward rules, since the 
split single-headed backward rules look like the ones below, and this means none of 
the rules can make the lessThan(?start, ?end) comparison, since none of them has 
more than one triple in its head.

[IntRule1:
(?int rdf:type :DateInterval)
<-
lessThan(?start, ?end)  #MISSES BOTH ?start AND ?end
(?int :valid "true"^^xsd:boolean)
]

[IntRule2:
(?int :startDate ?start)
<-
lessThan(?start, ?end)  #MISSES ?end
(?int :valid "true"^^xsd:boolean)
]

[IntRule3:
(?int :endDate ?end)
<-
lessThan(?start, ?end)  #MISSES ?start
(?int :valid "true"^^xsd:boolean)
]

Thanks in advance!

Regards, Barry

-Original Message-
From: Dave Reynolds 
Sent: dinsdag 4 september 2018 12:41
To: users@jena.apache.org
Subject: Re: [GenericRuleReasoner] multi-headed backward chaining

Hi,

On 03/09/18 13:05, Nouwt, B. (Barry) wrote:

Hi all,

We are using Apache Jena's G
