Re: Database Migrations in Fuseki

2024-02-10 Thread Paul Tyson
I don't know if my experience is helpful, but I went a different way to 
solve these sorts of problems.


I avoid adding business logic while generating the RDF from the source 
data systems. It is almost entirely a simple transliteration from one 
format to another (I use an R2RML mapping). The purpose is to combine 
data about the same subject from different source systems.
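
For readers unfamiliar with R2RML, a minimal mapping looks roughly like 
this (the table name and vocabulary are hypothetical, purely for 
illustration, not the actual mapping):

```turtle
# Hypothetical mapping: each row of table ORDERS becomes an ex:Order resource.
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/ns#> .

<#OrdersMap>
  rr:logicalTable [ rr:tableName "ORDERS" ] ;
  rr:subjectMap [
    rr:template "http://example.org/order/{ORDER_ID}" ;
    rr:class ex:Order
  ] ;
  rr:predicateObjectMap [
    rr:predicate ex:customerName ;
    rr:objectMap [ rr:column "CUSTOMER_NAME" ]
  ] .
```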


Separately, I wrote rules that check for several different desired 
constraints on the data. I think this corresponds to your "application 
logic". The source rules are written in RIF, then translated to HTML for 
display and SPARQL for execution.


Using the rules (in SPARQL format) I materialize additional triples and 
add them to the datastore.
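
A rule of this kind, once translated to SPARQL, is essentially an 
INSERT/WHERE update. A minimal sketch (the vocabulary is hypothetical, 
not the actual rules):

```sparql
# Hypothetical rule: every item on an order is a deliverable for that
# order's customer; materialize the derived triples into the datastore.
PREFIX ex: <http://example.org/ns#>

INSERT {
  ?item a ex:Deliverable ;
        ex:deliveredTo ?customer .
}
WHERE {
  ?order ex:lineItem ?item ;
         ex:customer ?customer .
}
```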


This approach decouples data from business logic and lets each evolve 
separately. When the source data schema changes, I change the R2RML 
mapping. When the business logic changes, I change the RIF rules.


The logic (rules) may be applied in any of several different ways to 
suit your needs (for example, using a different rules engine, or ShEx or 
SHACL, or generating results on the fly instead of materializing them 
back into the datastore).


Regards,
--Paul

On 2/9/24 08:17, balduin.landolt@dasch.swiss.INVALID wrote:

Hi Andy,


If I understand correctly, this is a schema change requiring the data to change.

Correct, but we don't enforce any schema at the database level (no SHACL 
involved); that's only done programmatically in the application.


The transformation of the data to the updated data model could be done offline, 
that would reduce downtime. If the data is being continuously updated, that's 
harder because the offline copy will get out of step with the live data.
How often does the data change (not due to application logic changes)?

The data might theoretically change constantly, so just doing it offline on a 
copy isn't really possible.
A compromise I've been thinking about, which would still be better than 
downtime, would be a read-only mode for the duration of the migration. But 
the application doesn't support anything like that yet.
(And if we can get the downtime to something reasonable, that would be good 
enough.)


Do you have a concrete example of such a change?

These changes can vary from very simple to very complex:
- The simplest case would maybe be that a certain property that used to be 
optional on a certain type of resource becomes mandatory; for all instances 
where this is not present, a default value needs to be supplied.
   => this we could easily do with a SPARQL update.
- The most complex case I encountered so far was roughly this:
   Given that in graph A (representing the data model for a subset of the data) 
a particular statement on something of type P (defining some kind of property) 
is present, and that in graph B (the subset of data corresponding to the model 
in A) a certain statement holds true for all V (which have a reference to P), 
then P should be modified. If the statement does not hold true for all V, then 
each V where it does not must be modified to become a more complex object.
   (More concretely: V represents a text value. If P says that V may contain 
markup, then check whether any V contains markup. If not, change P to say that 
it does not contain markup; if any V contains markup, then all Vs that 
represent text without markup need to be changed to contain text with markup. 
Text without markup here is a bit of reification around a string literal; 
text with markup follows a sophisticated standoff markup model, and even if no 
markup is present, it needs to carry information on the nature of the markup 
that is used.)
   => this is something I would not know how to do, or would not feel 
comfortable attempting, in SPARQL, so it needs to happen in code.
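
The simple case above (an optional property becoming mandatory) can be 
sketched as a single SPARQL update; the vocabulary and default value here 
are hypothetical:

```sparql
# Give every resource that lacks the now-mandatory property a default value.
PREFIX ex: <http://example.org/ns#>

INSERT {
  ?r ex:status "unknown" .
}
WHERE {
  ?r a ex:Resource .
  FILTER NOT EXISTS { ?r ex:status ?any }
}
```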

Long story short: some simple changes I could easily do in SPARQL; the more 
complex ones would require doing the changes in code, but it might be possible 
to set things up so that the code essentially has read access to the data and 
can generate update queries from that.
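
For the complex cases, the read side of that pattern could be a SELECT 
query that the code runs against the store to decide what to rewrite; 
sketched here with hypothetical vocabulary, loosely following the markup 
example:

```sparql
# For each property definition ?p that allows markup, find the text
# values ?v that still lack markup and would need rewriting in code.
PREFIX ex: <http://example.org/ns#>

SELECT ?p ?v
WHERE {
  GRAPH ?modelGraph { ?p a ex:Property ;
                         ex:allowsMarkup true . }
  GRAPH ?dataGraph  { ?v ex:definedBy ?p ;
                         ex:textWithoutMarkup ?text . }
}
```

The application would then iterate over the results and emit targeted 
DELETE/INSERT updates per value.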

Our previous setup worked like this (durations not measured, just from 
experience):
On application start, if a migration needs doing, the application won't start 
right away but will kick that process off first:
- download an entire dump of fuseki to a file on disk (ca. 20 min.)
- load the dump into an in-memory Jena model (10 min. -> plus huge memory 
consumption that will always grow proportional to our data growing)
- perform the migration on the in-memory model (1 sec. - 1 min.)
- dump the model to a file on disk
- drop all graphs from fuseki (20 min.)
- upload the dump into fuseki (20 min.)
Then the application would start... so at least 1h downtime, clearly room for 
improvement.
The good thing about this approach is that if the migration fails, the data 
would not be corrupted because the data loaded in fuseki is not affected.
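
(For what it's worth, the "drop all graphs" step at least can be expressed 
as a single SPARQL update rather than dropping graphs one by one:

```sparql
# Remove the default graph and all named graphs in one operation.
DROP ALL
```

)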

My best bet at this point is to say, we take the risk of data 

Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-10 Thread Andrii Berezovskyi
A bit unrelated, but I could also recommend the secoresearch/fuseki image, which 
is maintained by Jouni Tuominen and is currently at Jena 4.10.0 & JDK 21.0.2.

–Andrew.

On 9 Feb 2024, at 12:37, Andy Seaborne wrote:

Hi Jaana,

Glad you got it sorted out.

The Fuseki UI does not do anything special about browser caches. There was a 
major UI update when the UI was reimplemented in Vue, together with all the 
HTML assets that go with that.

   Andy

On 09/02/2024 05:37, jaa...@kolumbus.fi wrote:
Hi, I just noticed that it's not a question about podman or docker but about 
the browser cache. After deleting everything in the browser cache I managed to 
get the correct user interface when running stain/jena-fuseki:3.14.0 and 
stain/jena-fuseki:4.0.0 with both podman and docker, but when I tried the latest 
stain/jena-fuseki (4.8.0) I got the incorrect interface (shown here 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png).
Jaana M
08.02.2024 13.23 EET jaa...@kolumbus.fi wrote:

 Hi, I've been running jena-fuseki with docker:
 docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
 and rootless podman:
 podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
 when executing the same version 4.8.0 of jena-fuseki with podman, the UI looks 
totally different from the UI of the instance executed with docker.
 see file fuseki-podman.png 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png in 
https://github.com/jamietti/jena/
What can cause this problem?
 Br, Jaana M