Good morning Christian

Thank you for the quick reply! I am indeed surprised that BaseX does not do 
much caching of its own. Now that I think of it, it does make sense: if 
results are loaded into memory, they will be accessible much faster for 
subsequent queries, and they will reside in memory until overwritten or wiped - 
or at least that is how I see it, I am no computer expert!

I have now gathered all the XPath structures that I would like to benchmark (~100; 
I'm not sure if this is enough?). Considering I am no hero in XQuery, I will 
ask my supervisor if he can write a script for this purpose (he loves Perl, so 
I assume he'll come up with something). I did read on your website that it is 
possible to communicate with BaseX from Java. Is there any documentation or 
guidance on this? I am familiar with Java, so I assume I should be able 
to put together a benchmark script in Java. The only thing that I don't know is 
how to connect to the database and submit a query. Could you point me to a 
tutorial-like source, if available? If not, I will ask for my supervisor's help.

Finally I'd like to thank you for the tips for benchmarking, they are very 
useful!


Kind regards

Bram
https://be.linkedin.com/in/bramvanroy

________________________________________
From: Christian Grün [christian.gr...@gmail.com]
Sent: Monday 15 February 2016 13:26
To: Bram Vanroy
CC: BaseX
Subject: Re: [basex-talk] Benchmarking and caching in BaseX

Hi Bram,

Thanks for the summary on your work on Treebank and BaseX!

> The problem that I have encountered is that BaseX seems to
> cache very efficiently. Obviously this is not a problem on production
> websites but for benchmarking it may not be ideal. My first question to you,
> then, is: is it possible to disable caching when testing queries locally?
> And how exactly does BaseX handle the caching? Or more specifically, if I
> enter a query: what is cached, and for how long? This information may be
> useful for analysing our logs.

You may be surprised to hear that BaseX does not have any particular
caching strategies for queries and query results. Various
optimizations exist for caching IO data on a lower level, though. As
these strategies reach down to the OS and hardware disk access level,
it’s hardly possible to disable all of them. Usually, it’s simply your
main memory that distorts your performance measurements, because the
relevant disk data will only be pulled once from disk as long as
enough main memory is available. Besides that, Java programs
generally get faster the longer they run (due
to Just-in-Time Compilation – JIT)… and so on.

In practice, if you do benchmarking, it’s usually good to “warm up”
your BaseX instance by running various initial queries, to use the
client/server architecture, and to look at e.g. the execution time
output by the -v or -V command-line flag. In order to simulate
real-life query patterns, you should run your test queries in random
order, and run a great number of different queries. Moreover, it’s
advisable to run your queries multiple times and eventually take
the mean or minimum value as result. If this value differs by more than
5% when repeating the test, then you should possibly increase the
number of runs.
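
As a rough illustration of the procedure above (warm-up, random order,
repeated runs, minimum per query, 5% stability check), here is a minimal
Java sketch. It is not BaseX code: the class QueryBenchmark, the methods
benchmark and stable, and the placeholder queries are all hypothetical
names made up for this example, and the runner is a stand-in that you
would replace with whatever call actually sends a query to your BaseX
server. Measuring the 5% difference relative to the larger of two values
is also just one possible reading of that rule of thumb.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Consumer;

// Hypothetical helper class; none of these names come from BaseX.
class QueryBenchmark {

    /**
     * Warms up by running every query warmupRounds times (untimed), then runs
     * all queries `runs` times, reshuffling the order before each round.
     * Returns the minimum elapsed time in nanoseconds observed per query.
     */
    static Map<String, Long> benchmark(List<String> queries, Consumer<String> runner,
                                       int warmupRounds, int runs, long seed) {
        Random random = new Random(seed);

        // Warm-up: gives the JIT compiler and the OS disk cache a chance to settle.
        for (int i = 0; i < warmupRounds; i++) {
            for (String query : queries) {
                runner.accept(query);
            }
        }

        Map<String, Long> best = new LinkedHashMap<>();
        for (int r = 0; r < runs; r++) {
            // Random order per round, to avoid always timing queries in the same sequence.
            List<String> order = new ArrayList<>(queries);
            Collections.shuffle(order, random);
            for (String query : order) {
                long start = System.nanoTime();
                runner.accept(query);
                long elapsed = System.nanoTime() - start;
                // Keep the smallest time seen so far for this query.
                best.merge(query, elapsed, Long::min);
            }
        }
        return best;
    }

    /**
     * Returns true if two measurements differ by at most `tolerance`
     * (e.g. 0.05 for 5%), relative to the larger of the two.
     * Assumption: "relative to the larger value" is this sketch's choice.
     */
    static boolean stable(long a, long b, double tolerance) {
        long max = Math.max(a, b);
        return max == 0 || Math.abs(a - b) <= tolerance * max;
    }

    public static void main(String[] args) {
        // Placeholder queries; substitute your own XPath expressions.
        List<String> queries = Arrays.asList("//a", "//b", "//c");
        // Stand-in runner that does no real work; replace it with a call to your database.
        Consumer<String> runner = query -> query.hashCode();

        Map<String, Long> first = benchmark(queries, runner, 3, 20, 1L);
        Map<String, Long> second = benchmark(queries, runner, 0, 20, 2L);
        for (String query : queries) {
            boolean ok = stable(first.get(query), second.get(query), 0.05);
            System.out.println(query + ": " + first.get(query) + " ns / "
                + second.get(query) + " ns, within 5%: " + ok);
        }
    }
}
```

If the check reports false for many queries, that would be the signal to
increase the number of runs, as described above. Timing on the client side
like this also includes network and serialization overhead, so it will not
match the execution time reported by the server exactly.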

I hope this helps a bit; I invite you to report back on your experiences,
Christian
