Hi list,
I am using BaseX here at my institution a lot as an XML database that
provides data using RESTful APIs [1] or serves HTML, JS and CSS with
RestXQ which in turn uses HTTP requests to fetch more data [2]. Our main
use of BaseX databases is about TEI/XML encoded dictionaries of various
sizes. That means we mainly want to search for, get and change small
tei:entry parts in larger TEI/XML documents.
BaseX has been amazingly stable over the last few years (compared to
some other open source XML databases around today) and has a very solid
set of built-in functions that are almost always sufficient to get a job
done. I also like BaseX because I am pretty sure I could always
understand why things did not work and what I could do about it, even
without doing any Java programming (although I have seen the odd bit of
weirdness over the last few years).
I created or ported some XQuery modules [3][4][6][7] and a
containerization environment [5] that make my life easier and I would
like to share them here and maybe have a discussion about my
implementations and whether and how others can make use of them (for
example, can [6] and [7] be EXPath packages? I see some obstacles in the
way RestXQ annotations work).
I will try to give an introduction to each of the modules in some
separate mails.
Finally, I am writing to the list now because I have a performance
problem with a CRUD API I created [1] for the task mentioned above when
using it with a dataset of about 7 GB (measured by the size of the BaseX
databases that make it up).
This API uses many of the modules and techniques I came up with so I
thought it might be helpful to first talk about those parts. I hope that
they may be useful to others as well.
I tried to get creative in finding a way to optimize querying the data
without many long-lasting global lock situations and with BaseX using
indexes as much as possible (this started before the db:enforceindex
pragma was introduced and still works for me as expected without it),
while still writing the RESTful API in BaseX' implementation of RestXQ.
That is why I created [7]. It heavily uses (abuses?) BaseX' jobs module.
It allows me to query several smaller BaseX databases in parallel and
present the results as if they came from one big XML DB, which vastly
improves performance on update and reindex, to a point.
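
To make the fan-out idea concrete for readers who do not want to open
[7] right away, here is a minimal sketch in plain Java. It is not the
XQuery in util.xqm and does not use the BaseX jobs module; the class
name ShardFanOut, the method queryAll and the per-shard function are
all hypothetical and only illustrate "query every small database in
parallel, then present the results as one sequence".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: names are illustrative and not taken from util.xqm [7].
final class ShardFanOut {

  /**
   * Runs queryOneShard against every shard in parallel and concatenates
   * the partial results in the order of shardNames.
   */
  static <T> List<T> queryAll(List<String> shardNames,
                              Function<String, List<T>> queryOneShard,
                              int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // One task per small database.
      List<Callable<List<T>>> tasks = new ArrayList<>();
      for (String shard : shardNames) {
        tasks.add(() -> queryOneShard.apply(shard));
      }
      List<T> merged = new ArrayList<>();
      // invokeAll waits for all tasks and returns the futures in task order,
      // so the merged list looks as if it came from one big database.
      for (Future<List<T>> partial : pool.invokeAll(tasks)) {
        merged.addAll(partial.get());
      }
      return merged;
    } finally {
      pool.shutdown();
    }
  }
}
```

A caller would pass the names of the small databases and a function
that runs the actual query against one of them. The point of the
design is that an update or reindex only touches one small database
instead of the whole dataset.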
Still, JVM profiling seems to tell me that there is a file-based lock
(or is it even class-based?) [8] that severely limits the number of
(read) operations that can be done over the API before users have to
wait so long that they think the operation failed. As I see it, this is
a multithreading problem.
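
To illustrate what such a lock would mean in practice, here is a small
self-contained Java sketch. It assumes, without my having confirmed it
against the profiler output, that the read methods in [8] behave like
ordinary synchronized instance methods. SharedReader and measure are
hypothetical names, not BaseX classes. With a synchronized read, no
two threads are ever inside read() on the same object at once, however
many reader threads exist.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a file accessor whose read methods are synchronized.
final class SharedReader {
  private final AtomicInteger inside = new AtomicInteger();
  private final AtomicInteger maxInside = new AtomicInteger();

  // 'synchronized' on an instance method locks this object,
  // not the class and not the underlying file.
  synchronized int read(long pos) throws InterruptedException {
    int now = inside.incrementAndGet();
    maxInside.accumulateAndGet(now, Math::max); // remember the peak overlap
    Thread.sleep(2);                            // simulated I/O
    inside.decrementAndGet();
    return (int) (pos & 0xFF);
  }

  int maxConcurrentReaders() {
    return maxInside.get();
  }

  /** Starts the given number of reader threads against ONE shared instance. */
  static int measure(int threads, int readsPerThread) throws InterruptedException {
    SharedReader reader = new SharedReader();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        for (int i = 0; i < readsPerThread; i++) {
          reader.read(i);
        }
        return null; // makes this a Callable, so read() may throw
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    return reader.maxConcurrentReaders();
  }
}
```

SharedReader.measure(4, 5) reports a peak of one reader inside read()
even though four threads are running. If something similar happens per
database file, then splitting the data into more files gives more
independent locks, which would fit the "to a point" improvement I see.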
Or maybe I overlooked something that would solve my problems without all
the creative stuff I tried? That probably will be obvious when I have
explained the current implementation in more detail, which I intend to do
in the next few days.
[1] https://vle-curation.acdh.oeaw.ac.at/openapi/,
https://github.com/acdh-oeaw/vleserver_basex
[2] https://vicav.acdh.oeaw.ac.at/, https://github.com/acdh-oeaw/vicav-app
[3] https://github.com/acdh-oeaw/openapi4restxq
[4] https://github.com/acdh-oeaw/api-problem4restxq
[5] https://github.com/simar0at/heroku-buildpack-basex
[6] https://github.com/acdh-oeaw/vicav-app/blob/master/http.xqm,
https://github.com/acdh-oeaw/openapi4restxq/blob/master_basex/swagger-ui.xqm
[7]
https://github.com/acdh-oeaw/vleserver_basex/blob/main/vleserver/util.xqm
[8]
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/io/random/DataAccess.java#L184
and other read methods there
Best regards
--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for
disabled persons
Wohllebengasse 12-14, 1040 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.s...@oeaw.ac.at |www.oeaw.ac.at/acdh