[arangodb-google] Re: Read-only external process access to a live server's storage

Jan Fri, 24 Feb 2017 02:51:13 -0800

Hi,

In general, that would be possible, because another process could simply 
open a collection's memory-mapped datafiles and read them.
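
Here is a minimal sketch of the read-only access itself, written in Python and assuming a POSIX-style file system and a hypothetical datafile path. It maps a file so that the external process cannot modify it. It does not interpret ArangoDB's datafile format in any way.

```python
import mmap


def map_readonly(path):
    """Map an entire file into memory with read-only access.

    The file object may be closed once the mapping exists; the mapping
    stays valid until mm.close() is called. Mapping an empty file
    raises ValueError.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ rejects any write attempt.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


# Usage (the path is hypothetical):
# mm = map_readonly("/path/to/collection/datafile-123.db")
# header = mm[:64]
# mm.close()
```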


Still there are a few issues with this approach:
- all data is written into the write-ahead log (WAL) first, and then 
eventually transferred to the datafiles of a collection. If an external 
reader wants to process all data of a collection by reading the 
collection's datafiles, it must trigger a WAL flush first and wait until 
all data has been transferred from the WAL to the datafiles.
- the server may compact existing datafiles of collections at any time. 
Datafiles may become obsolete because of the compaction, and will get 
deleted eventually. ArangoDB is not aware of external processes reading its 
datafiles, so an external process may crash when reading a datafile that 
ArangoDB is currently unmapping or physically deleting. A simple fix for 
this is to turn off the compaction for a collection, but that will lead to 
ever-growing datafiles for this collection. This may not be a problem if 
there are only a few update/remove operations on this collection.
- the procedure relies on the datafiles having a specific layout and 
storage format. The storage format of ArangoDB may change in 
future versions and external processes that read ArangoDB's datafiles may 
need to be adjusted then.
- the datafiles written by ArangoDB are append-only and are not indexed. 
Using datafiles to quickly locate documents or connected documents is not 
ideal without maintaining a separate index in the external process (which 
effectively requires reading the collection datafiles completely once to 
build up the index first).
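
The last point can be illustrated as follows. When records are only ever appended, an external reader has to scan the whole file once before it can look up a document by key. The sketch below assumes a made-up record layout: two little-endian 32-bit lengths followed by the key bytes and the value bytes. This is not ArangoDB's actual datafile format, and removal markers are not modeled. The sketch works on any bytes-like buffer, including an `mmap.mmap` object.

```python
import struct

# Hypothetical record header: key length and value length as unsigned
# 32-bit little-endian integers. NOT ArangoDB's real datafile layout.
HEADER = struct.Struct("<II")


def append_record(key, value):
    """Encode one record in the hypothetical layout."""
    return HEADER.pack(len(key), len(value)) + key + value


def build_index(buf):
    """Scan an append-only buffer once and map each key to the
    (offset, length) of its most recent value.

    Later records for the same key overwrite earlier index entries,
    mimicking how updates land in an append-only file.
    """
    index = {}
    pos = 0
    while pos + HEADER.size <= len(buf):
        key_len, val_len = HEADER.unpack_from(buf, pos)
        key_start = pos + HEADER.size
        val_start = key_start + key_len
        end = val_start + val_len
        if end > len(buf):
            break  # truncated trailing record, e.g. a write still in progress
        index[bytes(buf[key_start:val_start])] = (val_start, val_len)
        pos = end
    return index


def lookup(buf, index, key):
    """Return the latest value for key, or None if the key was never written."""
    entry = index.get(key)
    if entry is None:
        return None
    start, length = entry
    return bytes(buf[start:start + length])
```

Once the index exists, each lookup is a dictionary access plus a slice. Building the index still costs one full pass over every datafile, and the index goes stale as soon as the server appends or compacts.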

As there are several disadvantages to this approach, I suggest looking 
for an alternative. Is it possible to run the graph traversals/document 
lookups in JavaScript inside the server and expose them over a REST API? 
That would minimize the number of HTTP requests and would not require any 
modifications to the server code.
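
On the client side, this approach replaces tens of thousands of requests with one request per traversal. The sketch below assumes a hypothetical server-side service, for example a Foxx service, mounted at `/graph-tools` in the `_system` database. It also assumes a hypothetical `POST /traverse` route that accepts a start vertex, a depth limit and filter parameters, and runs the filtered traversal in JavaScript on the server. Only the request construction is shown. The route, mount point and field names are illustrative, not an existing ArangoDB API.

```python
import json
import urllib.request


def build_traversal_request(base_url, start_vertex, max_depth, filters,
                            database="_system", mount="/graph-tools"):
    """Build (without sending) a single POST request for a hypothetical
    server-side traversal route.

    The mount point, route name and JSON field names are assumptions made
    for illustration; they must match whatever service is actually deployed.
    """
    payload = {
        "start": start_vertex,   # e.g. "vertices/123"
        "maxDepth": max_depth,   # depth limit for the traversal
        "filter": filters,       # parameters for the server-side filter
    }
    url = f"{base_url.rstrip('/')}/_db/{database}{mount}/traverse"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (not executed here; requires the hypothetical service):
# req = build_traversal_request("http://localhost:8529", "vertices/123", 5,
#                               {"minWeight": 0.5})
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

The filter logic then runs next to the data, and only the final result crosses the network.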

Best regards
Jan

On Thursday, 23 February 2017 20:24:22 UTC+1, Alexandre Rostovtsev wrote:
>
> Hi all,
>
> With ArangoDB, is it feasible for an external process to open the data 
> files of a live database server in read-only mode, and perform graph 
> traversal using them?
>
> Assuming competent C++ developers and a willingness to read, patch and 
> customize Arango source, but not rewrite half of it.
>
> (We are trying to traverse large graphs with filters that cannot be 
> efficiently implemented on the server side. We can live with the overhead 
> of mmap over NFS, but wish to avoid the overhead of tens of thousands of 
> HTTP requests.)
>
> -Alexandre Rostovtsev.
>

-- 
You received this message because you are subscribed to the Google Groups 
"ArangoDB" group.
