You might also try using info:load, which loads things in batches.

http://docs.marklogic.com/4.2doc/docapp.xqy#display.xqy?fname=http://pubs/4.2doc/apidoc/info.xml&category=Information%20Studio&function=info:load

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of Damon Feldman
Sent: Tuesday, April 19, 2011 10:59 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Loading xml files in mark logic server

Rajesh,

Each module invocation, such as yours below, runs as a single transaction with 
all the data held in memory. For thousands of XML documents, you should break 
the work up into smaller chunks.
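
A minimal sketch of that kind of chunking, not taken from the original post: 
it splits the directory listing into batches and loads each batch through 
xdmp:eval with different-transaction isolation, so every batch commits on its 
own. The batch size of 1000 and the <batch> wrapper element are assumptions 
chosen for illustration. Listing a directory with millions of entries in one 
call may itself be expensive, so the files may also need to be split across 
several directories.

```xquery
xquery version "1.0-ml";

declare namespace dir = "http://marklogic.com/xdmp/directory";

(: Assumed batch size; tune it for the memory available. :)
declare variable $BATCH-SIZE as xs:integer := 1000;

let $entries := xdmp:filesystem-directory("/filePath/")/dir:entry
let $count := fn:count($entries)
let $batches := xs:integer(fn:ceiling($count div $BATCH-SIZE))
for $i in 0 to $batches - 1
(: Take the next slice of at most $BATCH-SIZE directory entries. :)
let $batch := fn:subsequence($entries, $i * $BATCH-SIZE + 1, $BATCH-SIZE)
return
  (: Each eval runs, and commits, as its own transaction. :)
  xdmp:eval('
    xquery version "1.0-ml";
    declare namespace dir = "http://marklogic.com/xdmp/directory";
    declare variable $batch as element() external;
    for $entry in $batch/dir:entry
    return
      xdmp:document-load(fn:string($entry/dir:pathname),
        <options xmlns="xdmp:document-load">
          <uri>{fn:string($entry/dir:filename)}</uri>
          <permissions>{xdmp:default-permissions()}</permissions>
          <format>xml</format>
          <repair>none</repair>
        </options>)
    ',
    (: <batch> is a hypothetical wrapper used to pass the slice as one node. :)
    (xs:QName("batch"), <batch>{$batch}</batch>),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>)
```

The batches here still run one after another. Spreading them across threads, 
for example with xdmp:spawn and a separate loader module, is a possible next 
step that this sketch does not cover.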

The Information Studio flows available in version 4.2 will do this 
automatically, and also provide a nice GUI for viewing progress, unloading the 
data later, and checking for errors.

Also, the Java-based RecordLoader utility 
(http://developer.marklogic.com/code/recordloader, 
http://marklogic.github.com/recordloader/tutorial.html) will insert documents 
in smaller chunks. It does not provide all the power of Information Studio, but 
can be faster in some instances.

Yours,
Damon

________________________________
From: [email protected] 
[[email protected]] On Behalf Of Rajesh Marklogic 
[[email protected]]
Sent: Tuesday, April 19, 2011 1:03 PM
To: [email protected]
Subject: [MarkLogic Dev General] Loading xml files in mark logic server

Hi,

We are trying to load 14 million XML files into a MarkLogic database. The 
xdmp:document-load script below could load at most 5000 XML files at a time. 
Anything more than 5000 XML files threw memory exceptions.

xquery version "1.0-ml";

let $files := xdmp:filesystem-directory("/filePath/")
for $filepath in $files//dir:entry[1 to 5000]
return
  xdmp:document-load($filepath//dir:pathname,
    <options xmlns="xdmp:document-load">
      <uri>{$filepath//dir:filename/text()}</uri>
      <permissions>{xdmp:default-permissions()}</permissions>
      <format>xml</format>
      <repair>none</repair>
    </options>)


Are any configuration changes required in the admin settings to load all 14 
million XML files in 3 to 4 hours? The total size of the content will be 
around 4 GB, and we have a Unix server with 250 GB of memory (RAM).

It would be great if you could suggest the best approach to load all 14 million 
XML files within a time frame of 3 to 4 hours.

Thanks and Regards

Rajesh
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general