In my Mirabel system, I create a link database that records all the links made in a set of documents. This becomes a “where used” index over the content.
We have on the order of 200K links for one content set, so at 0.1 seconds per link it takes about 7 hours to build this index. I’m currently doing this in one process that builds the whole index and then stores it in a database. This is failing in hard-to-diagnose ways; for example, a database has a write lock on it when I go to rename it from its temp name to its production name (to replace the current production version).

The data is such that I could parallelize the processing, but I’m not sure how I would do that in BaseX so that I can safely write to a single database from multiple threads. The fork-join() docs clearly say “non-updating” functions, so that doesn’t seem to be an option. I have multiple BaseX HTTP servers running, so I could farm processing across them, but I think I would then run into write-lock issues. I could create a separate database for each thread of operation and then combine those at the end. That seems like it might be the best option.

Have I missed anything?

Thanks,

Eliot

Eliot Kimber
Sr. Staff Content Engineer
ServiceNow
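
To make the last option concrete, the partition-then-merge shape can be sketched as follows. This is Python rather than XQuery and does not touch BaseX at all: the per-worker dictionaries only stand in for the separate per-thread databases, and every name here (`partition`, `build_partial_index`, `merge_indexes`, `build_where_used`, the sample file names) is hypothetical. It assumes the links can be split into disjoint chunks that are processed independently, which is the property described above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(links, n):
    """Split the list of (source, target) pairs into n disjoint chunks."""
    return [links[i::n] for i in range(n)]


def build_partial_index(chunk):
    """Build a where-used index for one chunk: target -> set of sources.

    Each worker writes only to its own dict (a stand-in for its own
    database), so no two workers ever write to the same store.
    """
    index = defaultdict(set)
    for source, target in chunk:
        index[target].add(source)
    return index


def merge_indexes(partials):
    """Single-writer step: combine the partial indexes into one index."""
    merged = defaultdict(set)
    for partial in partials:
        for target, sources in partial.items():
            merged[target] |= sources
    return merged


def build_where_used(links, workers=4):
    """Index the chunks in parallel, then merge the results in one place."""
    chunks = partition(links, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial_index, chunks))
    return merge_indexes(partials)


if __name__ == "__main__":
    sample = [
        ("a.dita", "t1"),
        ("b.dita", "t1"),
        ("a.dita", "t2"),
    ]
    for target, sources in sorted(build_where_used(sample, workers=2).items()):
        print(target, sorted(sources))
```

The point of the shape is that the only contended write is the final merge, which runs in a single process, so the rename to the production name would happen after all parallel work has finished.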

