Hi Bram,

Thanks for the summary on your work on Treebank and BaseX!

> The problem that I have encountered is that BaseX seems to
> cache very efficiently. Obviously this is not a problem on production
> websites but for benchmarking it may not be ideal. My first question to you,
> then, is: is it possible to disable caching when testing queries locally?
> And how exactly does BaseX handle the caching? Or more specifically, if I
> enter a query: what is cached, and for how long? This information may be
> useful to analyse our logs with.

You may be surprised to hear that BaseX does not have any particular
caching strategies for queries and query results. Various
optimizations exist for caching IO data on a lower level, though. As
these strategies reach down to the OS and hardware disk access level,
it’s hardly possible to disable all of them. Usually, it’s simply your
main memory that distorts your performance measurements, because the
relevant disk data will only be pulled once from disk as long as
enough main memory is available. Besides that, Java programs generally
get faster the longer they run (due to just-in-time compilation, JIT),
and so on.

In practice, if you do benchmarking, it’s usually good to “warm up”
your BaseX instance by running various initial queries. It also helps
to use the client/server architecture and to look, e.g., at the
execution time output by the -v or -V command-line flag. In order to
simulate real-life query patterns, you should run your test queries in
random order, and run a great number of different queries. Moreover,
it’s advisable to run your queries multiple times and eventually take
the mean or minimum value as the result. If this value differs by more
than 5% when repeating the test, then you should possibly increase the
number of runs.
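
As a rough sketch of the procedure above, something like the following
Python harness could be used. Note that run_query is a hypothetical
placeholder for however you send a query to your server (a client call,
a command-line invocation, ...); it is not part of BaseX, and the
default numbers of rounds are arbitrary assumptions.

```python
import random
import statistics
import time


def benchmark(queries, run_query, warmup_rounds=3, runs=10, seed=None):
    """Time each query several times in shuffled order; return per-query stats.

    run_query is a hypothetical callable that sends one query string to
    the server and blocks until the result has arrived.
    """
    rng = random.Random(seed)

    # Warm-up: let JIT compilation and OS/disk caches settle.
    # These timings are deliberately discarded.
    for _ in range(warmup_rounds):
        for q in queries:
            run_query(q)

    timings = {q: [] for q in queries}
    for _ in range(runs):
        order = list(queries)
        rng.shuffle(order)  # new random order in every round
        for q in order:
            start = time.perf_counter()
            run_query(q)
            timings[q].append(time.perf_counter() - start)

    return {
        q: {"mean": statistics.mean(ts), "min": min(ts)}
        for q, ts in timings.items()
    }


def is_stable(first, second, key="min", tolerance=0.05):
    """True if no query's value differs by more than tolerance between two runs."""
    return all(
        abs(first[q][key] - second[q][key]) <= tolerance * first[q][key]
        for q in first
    )
```

You would call benchmark() twice with the same queries and compare the
two results with is_stable(); if it returns False, increase runs and
try again.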

I hope this helps a bit; I invite you to report back on your experiences,
Christian
