Hi Michael,

IMHO, that is not the right way to handle data changes in a 
document-oriented database.



A more efficient approach may be to add new versions as they come in.



There is usually a way to sort the related documents: sometimes by an 
attribute in the data, or by a part of the filename.

If not, you might have to build an index database containing the tuples 
<object_id, version, pre-id> (because the pre-id of a node is constant in an 
append-only db).

Then I would write a simple function(object_id) that returns the top element 
of the version list ordered by descending version (using hof:top-k-by, for 
example).
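
A minimal sketch of that lookup, written in Python only to illustrate the logic (in BaseX it would be an XQuery function, with hof:top-k-by playing the role that heapq.nlargest plays here). The IndexEntry type, the sample data and the function name latest are all hypothetical.

```python
import heapq
from typing import Iterable, NamedTuple, Optional


class IndexEntry(NamedTuple):
    """One index tuple <object_id, version, pre-id>."""
    object_id: str
    version: int
    pre_id: int


# Hypothetical index contents; in practice these would be read from the index database.
INDEX = [
    IndexEntry("doc-a", 1, 10),
    IndexEntry("doc-a", 3, 57),
    IndexEntry("doc-b", 1, 24),
    IndexEntry("doc-a", 2, 31),
]


def latest(index: Iterable[IndexEntry], object_id: str) -> Optional[IndexEntry]:
    """Return the entry with the highest version for object_id, or None if there is none."""
    candidates = (e for e in index if e.object_id == object_id)
    # Top-1 by version, i.e. the first element of the list ordered by descending version.
    top = heapq.nlargest(1, candidates, key=lambda e: e.version)
    return top[0] if top else None
```

The pre_id of the returned entry is what you would then use to fetch the actual node from the data database.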



You can also split your data in two:

a big read-only database containing the data before a given point in time 
(with its indexes already built).

a light append-only database containing the data after that point in time 
(where index updates are fast, or where the UPDINDEX option is even set).

On a schedule, you would build a new read-only database aggregating the back 
and front data.

Note that with two (or even more!) databases, you would have to add the 
database name to the index tuple: <object_id, version, db-name, pre-id>
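
As a rough illustration of why the database name has to be part of the tuple, here is a small Python sketch of a lookup across both indexes. It only models the logic and is not BaseX code; the database names "archive" and "recent", the Entry type and the function latest_across are assumptions made for the example.

```python
from typing import Iterable, NamedTuple, Optional


class Entry(NamedTuple):
    """One index tuple <object_id, version, db-name, pre-id>."""
    object_id: str
    version: int
    db_name: str
    pre_id: int


# Hypothetical index of the big read-only database (data before the cut-off).
BACK_INDEX = [
    Entry("doc-a", 1, "archive", 10),
    Entry("doc-a", 2, "archive", 31),
    Entry("doc-b", 1, "archive", 24),
]

# Hypothetical index of the light append-only database (data after the cut-off).
FRONT_INDEX = [
    Entry("doc-a", 3, "recent", 4),
    Entry("doc-c", 1, "recent", 9),
]


def latest_across(back: Iterable[Entry], front: Iterable[Entry],
                  object_id: str) -> Optional[Entry]:
    """Return the newest entry for object_id over both indexes, or None."""
    candidates = [e for e in (*back, *front) if e.object_id == object_id]
    return max(candidates, key=lambda e: e.version, default=None)
```

A pre-id is only meaningful inside one database, so the db_name field of the result tells you which database to open before resolving pre_id.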



I had success with that update strategy when working with the EPO DOCDB 
collection 
(https://www.epo.org/searching-for-patents/data/bulk-data-sets/docdb.html#tab-2).

Thanks to Christian for giving me the right pointers when I needed them!



Hope this helps,



Best regards,



Fabrice ETANCHAUD

From: @pyschny.de <[email protected]>
To: [email protected]
Subject: [basex-talk] dB:update()
Date: 11/09/2018 16:09:01 CEST

I want to solve the following problem:
For $doc in $list-of-docs,
detect the differences between $doc and the basex-db, and add the changed 
records to the basex-db. 
After the differences of each doc have been added to the basex-db, create a 
new index for the basex-db, which is required for the next $doc.

How can I solve the problem that the added records are not visible for the 
index creation?
Michael 


