[basex-talk] I would like to share some code I use for easier RestXQ development

Omar Siam Wed, 06 Apr 2022 09:40:28 -0700

Hi list,

I am using BaseX here at my institution a lot as an XML database that provides data using RESTful APIs [1] or serves HTML, JS and CSS with RestXQ, which in turn uses HTTP requests to fetch more data [2]. Our main use of BaseX databases is about TEI/XML encoded dictionaries of various sizes. That means we mainly want to search for, get and change small tei:entry parts in larger TEI/XML documents.

BaseX has been amazingly stable over the last few years (compared to some other open source XML databases existing today) and has a very solid set of built-in functions that are almost always sufficient to get a job done. I also like BaseX because I am pretty sure I could always understand why things did not work and what I could do about it, even without doing any Java programming (although I saw some weirdness or other over the last few years).

I created or ported some XQuery modules [3][4][6][7] and a containerization environment [5] that make my life easier. I would like to share them here and maybe have a discussion about my implementations, and about whether and how others can make use of them (for example, can [6] and [7] be EXPath packages? I see some obstacles in the way RestXQ annotations work).

I will try to give an introduction to each of the modules in separate mails.

Finally, I am writing to the list now because I have a performance problem with a CRUD API I created [1] for the task mentioned above when using it with a dataset that is about 7 GB, judging by the size of the BaseX databases that make it up. This API uses many of the modules and techniques I came up with, so I thought it might be helpful to first talk about those parts. I hope that they may be useful to others as well.

I tried to get creative in finding a way to optimize querying the data without long-lasting global locks and with BaseX using indexes as much as possible (this started before the db:enforceindex pragma was introduced and still works for me as expected without it), while still writing the RESTful API in BaseX' implementation of RestXQ. That is why I created [7]. It heavily uses (abuses?) BaseX' jobs module. It allows me to query smaller BaseX databases in parallel and present them as if they were one big XML DB, which vastly improves performance on update and reindex, to a point. Still, there is a file-based lock (or is it even class-based?) [8], as I think the JVM profiling tells me, that severely limits the number of (read) operations that can be done over the API without the user having to wait so long that they think the operation failed. This is a multithreading problem as I see it.

Or maybe I overlooked something that would solve my problems without all the creative stuff I tried? That will probably be obvious once I have explained the current implementation in more detail, which I intend to do in the next few days.

[1] https://vle-curation.acdh.oeaw.ac.at/openapi/, https://github.com/acdh-oeaw/vleserver_basex
[2] https://vicav.acdh.oeaw.ac.at/, https://github.com/acdh-oeaw/vicav-app
[3] https://github.com/acdh-oeaw/openapi4restxq
[4] https://github.com/acdh-oeaw/api-problem4restxq
[5] https://github.com/simar0at/heroku-buildpack-basex
[6] https://github.com/acdh-oeaw/vicav-app/blob/master/http.xqm, https://github.com/acdh-oeaw/openapi4restxq/blob/master_basex/swagger-ui.xqm
[7] https://github.com/acdh-oeaw/vleserver_basex/blob/main/vleserver/util.xqm
[8] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/io/random/DataAccess.java#L184 and other read methods there


Best regards

--
Mag. Ing. Omar Siam
Austrian Center for Digital Humanities and Cultural Heritage
Österreichische Akademie der Wissenschaften | Austrian Academy of Sciences
Stellvertretende Behindertenvertrauensperson | Deputy representative for 
disabled persons
Wohllebengasse 12-14, 1040 Wien, Österreich | Vienna, Austria
T: +43 1 51581-7295
omar.s...@oeaw.ac.at  |www.oeaw.ac.at/acdh
