The extra space is for the indexes.

-jh-

On Apr 26, 2011, at 9:51 AM, Rajesh Marklogic wrote:

> Hi Damon,
> 
> Using RecordLoader, I was able to load the million XML documents successfully.
> The total size of the documents is 40 MB, but the forest size has grown to
> 70 MB.
> 
> Any idea why the forest is nearly double the total file size?
> 
> Thanks and Regards
> 
> Rajesh Govindan
> 
> On Tue, Apr 19, 2011 at 11:28 PM, Damon Feldman <[email protected]> 
> wrote:
> Rajesh,
>  
> Each module invocation such as yours below runs as a single transaction with
> all the data in memory. For thousands of XML documents, you should break the
> work up into smaller chunks.
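> 
> A minimal sketch of such chunking (an illustration, not a tested recipe): the
> driver below lists the directory once and loads each batch through xdmp:eval
> with different-transaction isolation, so every batch commits on its own. The
> batch size of 1000 and the <batch> wrapper element are assumptions made for
> this example; "/filePath/" is the placeholder path from the original post.
> 
> ```xquery
> xquery version "1.0-ml";
> 
> (: Assumed values; tune the batch size to what memory allows. :)
> declare variable $DIR := "/filePath/";
> declare variable $BATCH-SIZE := 1000;
> 
> (: Query evaluated once per batch, each time in its own transaction. :)
> declare variable $LOAD-BATCH := '
>   xquery version "1.0-ml";
>   declare variable $batch as element(batch) external;
>   for $entry in $batch/dir:entry
>   return xdmp:document-load(fn:string($entry/dir:pathname),
>     <options xmlns="xdmp:document-load">
>       <uri>{fn:string($entry/dir:filename)}</uri>
>       <format>xml</format>
>       <repair>none</repair>
>     </options>)
> ';
> 
> (: List the directory once, then hand out fixed-size slices. :)
> let $entries := xdmp:filesystem-directory($DIR)/dir:entry
> for $i in 0 to ((fn:count($entries) - 1) idiv $BATCH-SIZE)
> let $chunk :=
>   <batch>{
>     fn:subsequence($entries, $i * $BATCH-SIZE + 1, $BATCH-SIZE)
>   }</batch>
> return
>   xdmp:eval($LOAD-BATCH, (xs:QName("batch"), $chunk),
>     <options xmlns="xdmp:eval">
>       <isolation>different-transaction</isolation>
>     </options>)
> ```
> 
> The driver request still runs for the duration of the whole load, so for
> millions of files it is likely to hit the request time limit unless the work
> is split across several requests or handed to one of the tools below.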
>  
> The InformationStudio flows available in version 4.2 will do this 
> automatically, and also provide a nice GUI for viewing progress, unloading 
> the data later, and checking on errors.
>  
> Also, the Java-based RecordLoader utility 
> (http://developer.marklogic.com/code/recordloader, 
> http://marklogic.github.com/recordloader/tutorial.html) will insert documents 
> in smaller chunks. It does not provide all the power of InformationStudio, 
> but can be faster in some instances.
>  
> Yours,
> Damon
>  
> From: [email protected] 
> [[email protected]] On Behalf Of Rajesh Marklogic 
> [[email protected]]
> Sent: Tuesday, April 19, 2011 1:03 PM
> To: [email protected]
> Subject: [MarkLogic Dev General] Loading xml files in mark logic server
> 
> Hi 
> 
> We are trying to load 14 million XML files into a MarkLogic database. The
> xdmp:document-load script below can load at most 5000 XML files at a time.
> Anything more than 5000 XML files throws memory exceptions.
> 
> xquery version "1.0-ml";
> 
> let $files := xdmp:filesystem-directory("/filePath/")
> for $filepath in $files//dir:entry[1 to 5000]
> return
>   xdmp:document-load($filepath//dir:pathname,
>     <options xmlns="xdmp:document-load">
>       <uri>{$filepath//dir:filename/text()}</uri>
>       <permissions>{xdmp:default-permissions()}</permissions>
>       <format>xml</format>
>       <repair>none</repair>
>     </options>)
> 
> 
> Are any configuration changes required in the admin settings to load all 14
> million XML files in 3 to 4 hours? The total size of the content will be
> around 4 GB, and we have a Unix server with 250 GB of memory (RAM).
> 
> It would be great if you could suggest the best approach for loading all 14
> million XML files within a time frame of 3 to 4 hours.
> 
> Thanks and Regards
> 
> Rajesh 
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 
> 
