Thanks a lot for the explanation.

On Friday, February 24, 2017 at 5:50:12 AM UTC-5, Jan wrote:
> Hi,
>
> in general that would be possible, because another process could simply
> open a collection's memory-mapped datafiles and read them.
>
> Still, there are a few issues with this approach:
>
> - all data is written into the write-ahead log (WAL) first, and only
>   eventually transferred to the datafiles of a collection. If an external
>   reader wants to process all data of a collection by reading the
>   collection's datafiles, it must trigger a WAL flush first and wait until
>   all data has been transferred from the WAL to the datafiles.
>
> - the server may compact existing datafiles of collections at any time.
>   Datafiles may become obsolete because of the compaction and will
>   eventually get deleted. ArangoDB is not aware of external processes
>   reading its datafiles, so an external process may crash when reading a
>   datafile that ArangoDB is currently unmapping or physically deleting. A
>   simple fix for this is to turn off compaction for a collection, but that
>   will lead to ever-growing datafiles for this collection. This may not be
>   a problem if there are only few update/remove operations on the
>   collection.
>
> - the procedure relies on the datafiles being arranged in a certain way,
>   with a certain storage format. The storage format of ArangoDB may change
>   in future versions, and external processes that read ArangoDB's
>   datafiles may need to be adjusted then.

Our usage pattern is frequent reads and rare updates, so waiting for the
WAL flush and for compaction to finish is no problem. And of course we
realize we will be relying on a particular Arango version's storage format.
Triggering the flush and disabling compaction would, I assume, look
something like the sketch below.
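To make that concrete, here is a minimal arangosh sketch of what I have in
mind. It assumes ArangoDB 3.1 with the MMFiles storage engine, and the
collection name "snapshots" is just a placeholder:

    // Flush the write-ahead log and block until the collector has moved
    // all operations from the WAL into the collections' datafiles.
    // (The same is available over REST as PUT /_admin/wal/flush.)
    require('internal').wal.flush(true, true);

    // Compaction is a per-collection setting (doCompact). Setting it at
    // creation time keeps datafiles from being rewritten or deleted
    // underneath an external reader, at the cost of ever-growing files.
    db._create('snapshots', { doCompact: false });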
> - the datafiles written by ArangoDB are append-only and are not indexed.
>   Using the datafiles to quickly locate documents or connected documents
>   is not ideal without maintaining a separate index in the external
>   process (which effectively requires reading the collection datafiles
>   completely once to build up the index first).

I was hoping that Arango's own hash indexes are also stored on disk and
that we could make use of them. Or is that not possible?

> As there are several disadvantages with this approach, I suggest looking
> for an alternative. Is it possible to run the graph traversals/document
> lookups in JavaScript inside the server, and expose that over a REST API?
> That would minimize the number of HTTP requests and does not require any
> modifications to the server code.

If I understand the suggestion correctly, it would look something like the
Foxx sketch at the end of this message. Unfortunately, the traversal
filters will make use of extremely large non-Arango data files that live
on other nodes in the cluster. Keeping the graph database and the other
raw data together on one node would be difficult.

Best regards

> Jan
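For reference, here is the kind of in-server service I understand Jan to
be suggesting, as a minimal sketch. It assumes the ArangoDB 3.x Foxx
router API; the collection names ("vertices", "edges") and the route path
are made up:

    'use strict';
    // Minimal Foxx service: runs inside arangod and exposes a graph
    // traversal over a single REST endpoint, so one HTTP request
    // replaces many individual document lookups.
    const db = require('@arangodb').db;
    const aql = require('@arangodb').aql;
    const createRouter = require('@arangodb/foxx/router');

    const router = createRouter();
    module.context.use(router);

    // GET /neighbors/:key returns all vertices reachable within two
    // outbound steps of the given start vertex.
    router.get('/neighbors/:key', function (req, res) {
      const start = 'vertices/' + req.pathParams.key;
      const edges = db._collection('edges');
      const result = db._query(aql`
        FOR v IN 1..2 OUTBOUND ${start} ${edges}
          RETURN DISTINCT v
      `).toArray();
      res.json(result);
    });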
