Hi, in general that would be possible, because another process could simply open a collection's memory-mapped datafiles and read them.
Still, there are a few issues with this approach:

- All data is written into the write-ahead log (WAL) first, and then eventually transferred to the datafiles of a collection. If an external reader wants to process all data of a collection by reading the collection's datafiles, it must trigger a WAL flush first and wait until all data has been transferred from the WAL to the datafiles.

- The server may compact existing datafiles of collections at any time. Datafiles may become obsolete because of the compaction, and will get deleted eventually. ArangoDB is not aware of external processes reading its datafiles, so an external process may crash when reading a datafile that ArangoDB is currently unmapping or physically deleting. A simple fix for this is to turn off the compaction for a collection, but that will lead to ever-growing datafiles for this collection. This may not be a problem if there are only a few update/remove operations on this collection.

- The procedure relies on the datafiles being arranged in a certain way, with a certain storage format. The storage format of ArangoDB may change in future versions, and external processes that read ArangoDB's datafiles may need to be adjusted then.

- The datafiles written by ArangoDB are append-only and are not indexed. Using datafiles to quickly locate documents or connected documents is not ideal without maintaining a separate index in the external process (which effectively requires reading the collection datafiles completely once to build up the index first).

As there are several disadvantages with this approach, I suggest looking for an alternative. Is it possible to run the graph traversals/document lookups in JavaScript inside the server, and expose that over a REST API? That would minimize the number of HTTP requests and would not require any modifications to the server code.

Best regards
Jan

On Thursday, 23 February 2017 at 20:24:22 UTC+1, Alexandre Rostovtsev wrote:
>
> Hi all,
>
> With ArangoDB, is it feasible for an external process to open the data
> files of a live database server in read-only mode, and perform graph
> traversal using them?
>
> Assuming competent c++ developers and a willingness to read, patch and
> customize Arango source, but not rewrite half of it.
>
> (We are trying to traverse large graphs with filters that cannot be
> efficiently implemented on the server side. We can live with the overhead
> of mmap over nfs, but wish to avoid the overhead of tens of thousands of
> http requests.)
>
> -Alexandre Rostovtsev.
>
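To illustrate the last issue in the reply above (append-only, unindexed datafiles), here is a minimal Python sketch of what an external reader would have to do before it can look anything up: scan the whole log once and build its own index. The record layout used here (one JSON object per line with `op`, `key` and `doc` fields) and the function names `build_index` and `build_adjacency` are purely hypothetical and chosen for readability; this is not ArangoDB's actual datafile format. Only the general idea carries over: later markers supersede earlier ones, and removals have to be replayed too.

```python
import json
from typing import Dict, Iterable, List


def build_index(lines: Iterable[str]) -> Dict[str, dict]:
    """Scan an append-only log once and keep only the latest state per key.

    Assumed (hypothetical) record format: one JSON object per line with
    "op" ("insert", "update" or "remove"), "key", and for non-removals "doc".
    """
    index: Dict[str, dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        marker = json.loads(line)
        key = marker["key"]
        if marker["op"] == "remove":
            # A removal marker invalidates every earlier version of the key.
            index.pop(key, None)
        else:
            # A later insert/update supersedes any earlier version.
            index[key] = marker["doc"]
    return index


def build_adjacency(index: Dict[str, dict]) -> Dict[str, List[str]]:
    """Build an outgoing-edge lookup from documents that carry _from/_to."""
    adjacency: Dict[str, List[str]] = {}
    for doc in index.values():
        if "_from" in doc and "_to" in doc:
            adjacency.setdefault(doc["_from"], []).append(doc["_to"])
    return adjacency


# Tiny made-up log: two vertices, one edge, one update, one insert+remove.
LOG = [
    '{"op": "insert", "key": "v1", "doc": {"name": "Alice"}}',
    '{"op": "insert", "key": "v2", "doc": {"name": "Bob"}}',
    '{"op": "insert", "key": "e1", "doc": {"_from": "v1", "_to": "v2"}}',
    '{"op": "update", "key": "v1", "doc": {"name": "Alice B."}}',
    '{"op": "insert", "key": "v3", "doc": {"name": "Carol"}}',
    '{"op": "remove", "key": "v3"}',
]

index = build_index(LOG)
adjacency = build_adjacency(index)
```

Note that nothing can be answered until the full scan has finished, and the index goes stale as soon as the server appends more data or compacts the files, which is why the reply suggests running the traversal inside the server instead.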
