Hi, I have written a Java program that returns all documents from a specified MarkLogic database, flattens them into pipe-delimited format, and writes them to a file. My question is whether there is a faster way of returning all the documents than the one I am using, because it is slow for a large database: 30 million documents take about 15 hours. The time does not seem to scale linearly, since a database with 2.5 million of the same documents takes only 20 minutes to extract.
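To put numbers on the scaling, here is a small back-of-envelope check that uses only the two timings quoted above. The class and method names are just for illustration. It works out to roughly 2,083 documents per second for the small database versus roughly 556 per second for the large one, so at the small-database rate I would have expected 30 million documents to take about 4 hours rather than 15.

```java
// Back-of-envelope throughput check using only the timings quoted above.
final class ExtractThroughput {

    // Average extraction rate in documents per second.
    static double docsPerSecond(long docs, long seconds) {
        return (double) docs / seconds;
    }

    // Hours needed to extract `docs` documents at a given rate.
    static double hoursAtRate(long docs, double docsPerSecond) {
        return docs / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        double small = docsPerSecond(2_500_000L, 20 * 60);     // 2.5M docs in 20 minutes
        double large = docsPerSecond(30_000_000L, 15 * 3600);  // 30M docs in ~15 hours

        System.out.printf("small database: %.0f docs/s%n", small);
        System.out.printf("large database: %.0f docs/s%n", large);
        System.out.printf("30M docs at the small-database rate: %.1f hours%n",
                hoursAtRate(30_000_000L, small));
    }
}
```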
My approach is to set a page size of 100000 (I have tried different values, and 100000 seems to work roughly best) and run an empty search against the database. The search returns one page at a time; the program hands each page of documents to a separate thread for processing and goes back to MarkLogic for the next page. If there is a faster way to return all documents via the Java API without doing a search, please let me know! Alternatively, if there are settings, either in the API or in MarkLogic itself, that might make a big difference for this kind of task, that would be great to know. I should add that I am running the program on the same server as MarkLogic to minimize network delay. Thanks, Robert
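P.S. In case it helps to see the shape of the loop, here is a simplified sketch of the approach described above, not my actual program. `PageFetcher` is a hypothetical stand-in for the real search call and is not part of the MarkLogic Java API; the 1-based `start` offset, the `String[]` document representation, and the in-memory fetcher are likewise assumptions made so the example runs on its own.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PagedExtract {

    // Hypothetical stand-in for the search call: returns up to pageSize
    // documents beginning at a 1-based offset, or an empty list when done.
    interface PageFetcher {
        List<String[]> fetch(long start, int pageSize);
    }

    // Flatten one document's fields into a pipe-delimited line.
    static String flatten(String[] fields) {
        return String.join("|", fields);
    }

    // Fetch pages on the calling thread and hand each page to a single
    // worker thread, which flattens and writes it. Returns the document count.
    static long extractAll(PageFetcher fetcher, int pageSize, Writer out) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Future<Integer>> pending = new ArrayList<>();
        try {
            long start = 1;
            while (true) {
                List<String[]> page = fetcher.fetch(start, pageSize);
                if (page.isEmpty()) break;
                // One worker thread means pages are written in fetch order.
                pending.add(worker.submit(() -> {
                    for (String[] doc : page) {
                        out.write(flatten(doc));
                        out.write('\n');
                    }
                    return page.size();
                }));
                start += page.size();
                if (page.size() < pageSize) break; // short page: nothing left
            }
            long total = 0;
            for (Future<Integer> f : pending) total += f.get(); // wait for writes
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("extract failed", e);
        } finally {
            worker.shutdown();
        }
    }

    // In-memory fetcher used only to make the sketch runnable.
    static PageFetcher inMemory(List<String[]> docs) {
        return (start, pageSize) -> {
            int from = (int) start - 1;
            if (from >= docs.size()) return new ArrayList<>();
            int to = Math.min(docs.size(), from + pageSize);
            return new ArrayList<>(docs.subList(from, to));
        };
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        for (int i = 0; i < 5; i++) docs.add(new String[] {"id" + i, "value" + i});
        StringWriter out = new StringWriter();
        long count = extractAll(inMemory(docs), 2, out);
        System.out.print(out);
        System.out.println(count + " documents written");
    }
}
```

Note that in this sketch nothing limits how many fetched pages can queue up behind the worker, so if fetching outpaces processing, memory use grows with the number of pending pages.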
