Hi, it depends a bit on how big the documents are. For smaller documents it will make sense to insert/import data with multiple parallel client threads.
If the documents are "big" and writing them to the storage engine becomes the bottleneck, then parallelizing the insert/import will not help as much.

You can find out how much parallelization helps by importing the data in parallel with the bundled arangoimport binary. arangoimport provides an option `--threads`, which defaults to 2. Try values for this option from 1 up to whatever upper bound you think makes sense, and see whether the runtime of the import process changes.

Apart from this, it will very likely make sense to insert documents in parallel if the single-document APIs are used. This is because the actual insertion time will only be a small fraction of each request; most of the time is spent processing requests, putting together responses and waiting for the network. Here parallelization should help a lot.

It may be different if you are already sending multiple documents to the server in a single batch, e.g. using the import API at POST /_api/import, or by sending an array of documents to POST /_api/document. Here the server may already be quite busy, but parallelization may still help to some extent.

I suggest trying with arangoimport first to assess the potential benefits (if any). If you are using arangoimport, please use the import format that has a single JSON document per line (jsonl).

Best regards
Jan

On Tuesday, 28 May 2019 14:02:33 UTC+2, Andreas Jung wrote:
> We are currently using ArangoDB as a migration database (100.000 JSON files, 50 GB data, about 25% of the JSON files contain base64 encoded images, PDF files etc.).
> I wrote a custom import script for the data that takes about 90 minutes for the import using pyArango - one JSON file at a time...working nicely so far.
> Question: would it make sense to parallelize the import in order to speed up the import process? Or is the performance of ArangoDB CPU/IO bound for such mass imports?
> We are running a standard standalone installation of ArangoDB 3.4.5 on a local SSD...no fancy setup.
>
> Andreas
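
In case it helps with the arangoimport experiment, here is a rough Python sketch, not a tested recipe. It assumes the JSON files sit in one directory with one document per file, that arangoimport is on the PATH, and that you add your own connection options (such as `--server.endpoint` and `--server.password`) to the command. The function names are made up for this example.

```python
import json
import subprocess
import time
from pathlib import Path


def write_jsonl(source_dir, target_file):
    """Merge every *.json file in source_dir into one JSONL file.

    Assumes each input file holds exactly one JSON document.
    Returns the number of documents written.
    """
    count = 0
    with open(target_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(source_dir).glob("*.json")):
            with open(path, encoding="utf-8") as src:
                doc = json.load(src)
            # json.dumps without indent never emits raw newlines,
            # so each document ends up on exactly one line.
            out.write(json.dumps(doc) + "\n")
            count += 1
    return count


def arangoimport_command(jsonl_file, collection, threads):
    """Build the arangoimport call (connection/auth options left out)."""
    return [
        "arangoimport",
        "--file", str(jsonl_file),
        "--type", "jsonl",
        "--collection", collection,
        "--create-collection", "true",
        # Empties the collection first so every run starts from the
        # same state. This deletes existing data in that collection.
        "--overwrite", "true",
        "--threads", str(threads),
    ]


def time_import(jsonl_file, collection, threads):
    """Run one import and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(arangoimport_command(jsonl_file, collection, threads), check=True)
    return time.perf_counter() - start
```

Calling `write_jsonl` once and then `time_import` with, say, 1, 2, 4 and 8 threads against a scratch collection should show whether the runtime changes at all. Because of `--overwrite`, do not point this at a collection whose data you want to keep.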
