[MarkLogic Dev General] Marklogic Java API - Retrieving All Documents From Database

Robert Kennedy Thu, 15 Oct 2015 18:48:39 -0700

Hi,

I have written a Java program that returns all documents from a specified 
MarkLogic database, flattens them into pipe-delimited format and writes them 
to a file. My question is whether there is a faster way of returning all the 
documents than the one I am using. For a large database of 30 million 
documents it takes about 15 hours. The time does not seem to scale linearly, 
as a database with 2.5 million of the same documents takes only 20 minutes to 
extract.
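
For reference, the figures above work out as follows: scaling linearly from 
the 2.5 million document run would predict about 4 hours for 30 million 
documents, so the observed 15 hours is roughly 3.75 times slower than that. A 
small sketch of the arithmetic (the class and method names are illustrative 
only):

```java
public class ScalingCheck {

    /** Average throughput in documents per second. */
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    /** Time the large run would take if cost grew linearly with document count. */
    static double linearEstimateMinutes(long smallDocs, double smallMinutes, long largeDocs) {
        return smallMinutes * ((double) largeDocs / smallDocs);
    }

    public static void main(String[] args) {
        long smallDocs = 2_500_000L;      // 2.5 million documents
        double smallMinutes = 20.0;       // observed: 20 minutes
        long largeDocs = 30_000_000L;     // 30 million documents
        double largeMinutes = 15 * 60.0;  // observed: about 15 hours

        double expected = linearEstimateMinutes(smallDocs, smallMinutes, largeDocs);
        System.out.printf("Linear estimate: %.0f minutes%n", expected);
        System.out.printf("Observed / estimate: %.2f%n", largeMinutes / expected);
        System.out.printf("Throughput, small run: %.0f docs/s%n",
                docsPerSecond(smallDocs, smallMinutes));
        System.out.printf("Throughput, large run: %.0f docs/s%n",
                docsPerSecond(largeDocs, largeMinutes));
    }
}
```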


I set a page size of 100000 (I have tried different values, and 100000 seems 
to work best) and run an empty search on the database. The search returns one 
page at a time; the program hands each page of documents to a separate thread 
for processing and goes back to MarkLogic to get the next page.
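
The loop described above can be sketched roughly as follows. This is not the 
actual program: PageFetcher is a hypothetical stand-in for the paged search 
call (it is not part of the MarkLogic Java API), documents are assumed to be 
already reduced to arrays of field values, and the in-memory list in main 
stands in for the database.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PagedExport {

    /** Hypothetical stand-in for one paged search call; start is 1-based. */
    interface PageFetcher {
        List<String[]> fetch(long start, int pageLength) throws Exception;
    }

    /** Joins the fields of one document into a pipe-delimited line. */
    static String flatten(String[] fields) {
        return String.join("|", fields);
    }

    /** Fetches pages on the calling thread and writes them on a worker thread. */
    static long exportAll(PageFetcher fetcher, int pageLength, Writer out) throws Exception {
        // A single worker keeps the output in page order.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Future<Integer>> pending = new ArrayList<>();
        try {
            long start = 1;
            while (true) {
                List<String[]> page = fetcher.fetch(start, pageLength);
                if (page.isEmpty()) {
                    break; // an empty page means there are no more results
                }
                // Hand the page off, then go straight back for the next one.
                pending.add(worker.submit(() -> {
                    for (String[] doc : page) {
                        out.write(flatten(doc));
                        out.write('\n');
                    }
                    return page.size();
                }));
                start += page.size();
            }
            long total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // waits for the page; rethrows worker failures
            }
            return total;
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for the database: five two-field documents.
        List<String[]> docs = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            docs.add(new String[] {"doc" + i, "value" + i});
        }
        PageFetcher fetcher = (start, len) -> {
            int from = (int) Math.min(start - 1, docs.size());
            int to = Math.min(from + len, docs.size());
            return docs.subList(from, to);
        };
        StringWriter out = new StringWriter();
        long written = exportAll(fetcher, 2, out);
        System.out.print(out);
        System.out.println(written + " documents written");
    }
}
```

In the real program the Writer would be a buffered file writer and the 
fetcher would wrap the search call with the chosen page size.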

If there is a faster way to return all documents via the Java API without doing 
a search, please let me know! Alternatively, if there are settings I can use, 
either in the API or directly in MarkLogic, that might make a big difference 
to this kind of task, that would be great to know.

I should say that I am running the program directly on the same server as 
MarkLogic to minimize any network delay.

Thanks,
Robert
