Hi Christian,

I've compared the persistence time series obtained by running your example 
code and mine, in all possible combinations of the following scenarios (a 
rough sketch of my benchmark loop follows the list): 
- with and without "set intparse on"
- using my prepared test data and your test data
- closing and reopening the DB connection after every n-th insert operation 
(where n in {5, 100, 500, 1000})
- with and without "set autoflush on".
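
For reference, the benchmark loop is structured roughly as sketched below. 
This is only a sketch: it assumes the BaseXClient example class from the 
BaseX client documentation, and the host/port/credentials, database name and 
generated documents are placeholders rather than my actual test setup.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public final class InsertBenchmark {

  public static void main(final String[] args) throws IOException {
    final int total = 50000;         // length of the insert sequence
    final int reconnectEvery = 500;  // n in {5, 100, 500, 1000}
    final boolean autoflush = false; // scenario: with/without "set autoflush on"
    final boolean intparse = true;   // scenario: with/without "set intparse on"

    BaseXClient session = open(autoflush, intparse);
    for (int i = 1; i <= total; i++) {
      final byte[] doc = testDocument(i);
      final long start = System.nanoTime();
      session.add("doc" + i + ".xml", new ByteArrayInputStream(doc));
      System.out.println(i + ": " + (System.nanoTime() - start) / 1_000_000.0 + " ms");

      // scenario: close and reopen the DB connection after every n-th insertion
      if (i % reconnectEvery == 0) {
        session.close();
        session = open(autoflush, intparse);
      }
    }
    session.close();
  }

  // opens a client session, applies the options and opens the test database
  private static BaseXClient open(final boolean autoflush, final boolean intparse)
      throws IOException {
    final BaseXClient session = new BaseXClient("localhost", 1984, "admin", "admin");
    session.execute("SET AUTOFLUSH " + (autoflush ? "ON" : "OFF"));
    session.execute("SET INTPARSE " + (intparse ? "ON" : "OFF"));
    session.execute("OPEN test"); // database "test" is created beforehand
    return session;
  }

  // builds an XML document of roughly 160 KB (stand-in for the real test data)
  private static byte[] testDocument(final int i) {
    final StringBuilder sb = new StringBuilder("<root id='").append(i).append("'>");
    while (sb.length() < 160_000) sb.append("<item>some test payload</item>");
    sb.append("</root>");
    return sb.toString().getBytes(StandardCharsets.UTF_8);
  }
}

The per-insert timings printed by this loop are what I refer to above as the 
persistence time series.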

I finally found out that the only variable with a relevant influence on the 
insert operation duration is the value of the AUTOFLUSH option. 

If AUTOFLUSH = OFF when opening a database, then the persistence durations 
remain relatively constant (about 43 ms on my machine) over the entire 
sequence of insert operations (50.000 or 100.000 inserts), for all possible 
combinations named above.

If AUTOFLUSH = ON when opening a database, then the persistence durations 
increase monotonically, for all possible combinations named above. 

With AUTOFLUSH = ON, the persistence duration grows in proportion both to the 
number of DB clients executing these insert operations and to the length of 
the insert sequence executed by each DB client.

In my opinion, this behaviour is an issue in BaseX, because AUTOFLUSH is set 
to ON by default (see the BaseX documentation, 
http://docs.basex.org/wiki/Options#AUTOFLUSH), so DB clients must explicitly 
set AUTOFLUSH = OFF in order to keep the insert operation durations relatively 
constant over time. Additionally, not flushing data explicitly increases the 
risk of data loss (see the same documentation page), but clients that 
repeatedly execute the FLUSH command in turn increase the durations of the 
subsequent insert operations, so there is a trade-off; a sketch of the pattern 
I mean follows below.
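
For illustration, the pattern looks roughly like this (continuing from the 
benchmark sketch above, so the BaseXClient session, total and testDocument() 
are reused from there; the flush interval of 1000 inserts is an arbitrary 
placeholder):

// Keep AUTOFLUSH off so that insert durations stay constant, but flush
// explicitly from time to time to limit the amount of unflushed data.
session.execute("SET AUTOFLUSH OFF");
session.execute("OPEN test");
for (int i = 1; i <= total; i++) {
  session.add("doc" + i + ".xml", new ByteArrayInputStream(testDocument(i)));
  // each explicit FLUSH writes the pending data to disk, but it also slows
  // down the subsequent inserts again, which is the trade-off described above
  if (i % 1000 == 0) session.execute("FLUSH");
}
session.execute("FLUSH"); // flush the remaining data before closing
session.close();

How often to flush then becomes a trade-off between durability and insert 
throughput.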

Regards,
Lucian

________________________________________
From: Christian Grün [christian.gr...@gmail.com]
Sent: Tuesday, 10 January 2017 17:33
To: Bularca, Lucian
Cc: Dirk Kirsten; basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] Severe performance degradation when persisting more 
than 5000 XML data structures of 160 KB each.

Hi Lucian,

I couldn’t run your code example out of the box. 24 hours sounds
pretty alarming, though, so I have written my own example (attached).
It creates 50.000 XML documents, each sized around 160 KB. It’s not as
fast as I had expected, but the total runtime is around 13 minutes,
and it only slows down a little when adding more documents...

10000: 125279.45 ms
20000: 128244.23 ms
30000: 130499.9 ms
40000: 132286.05 ms
50000: 134814.82 ms

Maybe you could compare the code with yours, and we can find out what
causes the delay?

Best,
Christian


On Tue, Jan 10, 2017 at 4:44 PM, Bularca, Lucian
<lucian.bula...@mueller.de> wrote:
> Hi Dirk,
>
>  Of course, querying millions of data entries in a single database raises
> problems. This is equally problematic for all databases, not only for
> BaseX, and certain storage strategies will be mandatory at production
> time.
>
> The actual problem is that adding 50.000 XML structures of 160 KB each took
> 24 hours, because of that inexplicable monotonic increase of the insert
> operation durations.
>
> I would really appreciate it if someone could explain this behaviour, or if
> a counterexample could demonstrate that the cause of this behaviour lies in
> the test case and not in the database itself.
>
> Regards,
> Lucian
