Hi list,

I use BaseX a lot at my institution as an XML database that provides data through RESTful APIs [1] or serves HTML, JS and CSS with RestXQ, which in turn use HTTP requests to fetch more data [2]. We mainly use BaseX databases for TEI/XML encoded dictionaries of various sizes. That means we mostly want to search for, retrieve and change small tei:entry parts in larger TEI/XML documents.

BaseX was amazingly stable (compared to some other open source XML databases existing today) over the last few years and has a very solid set of built-in functions that are almost always sufficient to get a job done. I also like BaseX because I am pretty sure I could always work out why things did not work and what I could do about it, even without doing any Java programming (although I did see the odd bit of weirdness over the last few years).

I created or ported some XQuery modules [3][4][6][7] and a containerization environment [5] that make my life easier. I would like to share them here and maybe have a discussion about my implementations, and about whether and how others can make use of them. For example, can [6] and [7] be EXPath packages? I see some obstacles in the way RestXQ annotations work.

I will try to give an introduction to each of the modules in separate mails.

The immediate reason I am writing to the list now is a performance problem with a CRUD API I created [1] for the task mentioned above. It shows up with a dataset of about 7 GB, measured by the size of the BaseX databases that make it up. This API uses many of the modules and techniques I came up with, so I thought it might be helpful to talk about those parts first. I hope that they may be useful to others as well.

I tried to get creative in finding a way to optimize querying the data without causing many long-lasting global locks, and with BaseX using indexes as much as possible, while still writing the RESTful API in BaseX's implementation of RestXQ. (This started before the db:enforceindex pragma was introduced, and it still works for me as expected without it.) That is why I created [7]. It heavily uses (abuses?) BaseX's jobs module. It allows me to query several smaller BaseX databases in parallel and present the results as if they came from one big XML DB, which vastly improves performance on update and reindex, up to a point. There still seems to be a file-based lock (or is it even class-based?) [8], at least that is what I think the JVM profiling tells me. It severely limits the number of (read) operations that can be done over the API before users have to wait so long that they think the operation failed. As I see it, this is a multithreading problem.
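
To give a rough idea of the pattern before the detailed mails, here is a minimal sketch. It is not the code in [7]. The database names and the entry id are made up, and I am assuming the BaseX 9 function names (jobs:eval, jobs:wait, jobs:result, db:open); newer releases may name them differently. One cached job is started per small database, and the results are merged once all jobs have finished.

```xquery
(: Hypothetical database names, for illustration only. :)
let $dbs := ('dict_part_1', 'dict_part_2', 'dict_part_3')

(: Query evaluated in each job; $db and $id are bound from outside. :)
let $query := '
  declare namespace tei = "http://www.tei-c.org/ns/1.0";
  declare variable $db external;
  declare variable $id external;
  db:open($db)//tei:entry[@xml:id = $id]
'

(: Start one job per database. The cache option keeps the result
   available so that jobs:result can fetch it afterwards. :)
let $jobs :=
  for $db in $dbs
  return jobs:eval(
    $query,
    map { 'db': $db, 'id': 'entry_0001' },  (: hypothetical entry id :)
    map { 'cache': true() }
  )

(: Wait for every job, then return the results in database order. :)
return (
  $jobs ! jobs:wait(.),
  $jobs ! jobs:result(.)
)
```

The calling query does not open a database itself, so in this sketch it should not hold a lock that the jobs are waiting for. Whether each job only locks its own database depends on how BaseX resolves the database name, so please treat that part as an assumption.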

Or maybe I overlooked something that would solve my problems without all the creative stuff I tried? That will probably become obvious once I have explained the current implementation in more detail, which I intend to do in the next few days.

[1] https://vle-curation.acdh.oeaw.ac.at/openapi/, https://github.com/acdh-oeaw/vleserver_basex
[2] https://vicav.acdh.oeaw.ac.at/, https://github.com/acdh-oeaw/vicav-app
[3] https://github.com/acdh-oeaw/openapi4restxq
[4] https://github.com/acdh-oeaw/api-problem4restxq
[5] https://github.com/simar0at/heroku-buildpack-basex
[6] https://github.com/acdh-oeaw/vicav-app/blob/master/http.xqm, https://github.com/acdh-oeaw/openapi4restxq/blob/master_basex/swagger-ui.xqm
[7] https://github.com/acdh-oeaw/vleserver_basex/blob/main/vleserver/util.xqm
[8] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/io/random/DataAccess.java#L184 and other read methods there

Best regards

--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for 
disabled persons
Wohllebengasse 12-14, 1040 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.s...@oeaw.ac.at  |www.oeaw.ac.at/acdh
