The extra space is for the indexes. -jh-
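
On the earlier part of this thread (quoted below) about breaking the load into smaller chunks: if you want to stay in XQuery rather than use RecordLoader or InformationStudio, a rough sketch is to slice the directory listing into batches and load each batch through xdmp:eval with different-transaction isolation, so that each batch commits on its own. The batch size of 1000 and the names $batch-size, $load-batch and the <batch> wrapper element are placeholders of mine, not anything from the original script, and I have not run this against a directory of that size; listing 14 million entries in one call may itself be too heavy, and request time limits may still apply.

```xquery
xquery version "1.0-ml";

(: Hypothetical batch size; tune it to what one transaction can hold. :)
declare variable $batch-size as xs:integer := 1000;

(: Query text evaluated once per batch, each time in its own transaction.
   The options mirror the ones in the original script. :)
declare variable $load-batch as xs:string := '
  xquery version "1.0-ml";
  declare variable $batch as element(batch) external;
  for $entry in $batch/dir:entry
  return xdmp:document-load($entry/dir:pathname,
    <options xmlns="xdmp:document-load">
      <uri>{$entry/dir:filename/text()}</uri>
      <permissions>{xdmp:default-permissions()}</permissions>
      <format>xml</format>
      <repair>none</repair>
    </options>)
';

let $entries := xdmp:filesystem-directory("/filePath/")/dir:entry
let $batch-count := xs:integer(fn:ceiling(fn:count($entries) div $batch-size))
for $i in 1 to $batch-count
(: Entries ($i - 1) * $batch-size + 1 through $i * $batch-size :)
let $batch := fn:subsequence($entries, ($i - 1) * $batch-size + 1, $batch-size)
return
  xdmp:eval(
    $load-batch,
    (xs:QName("batch"), <batch>{$batch}</batch>),
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
    </options>)
```

The point of the separate transactions is that only one batch of documents is held in memory at a time, instead of the whole load as in the single-transaction version.
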
On Apr 26, 2011, at 9:51 AM, Rajesh Marklogic wrote:

> Hi Damon,
>
> Using Record loader, i could upload the million xml documents successfully.
> The total size of the document is 40 mb, but the forest size is increased to
> 70 mb.
>
> Any idea why the forest size is double than actual file size?
>
> Thanks and Regards
>
> Rajesh Govindan
>
> On Tue, Apr 19, 2011 at 11:28 PM, Damon Feldman <[email protected]>
> wrote:
> Rajesh,
>
> Each module invoke such as yours below runs as a single transaction with all
> the data in memory. For thousands of XML documents, you should break the work
> up into smaller chunks.
>
> The InformationStudio flows available in version 4.2 will do this
> automatically, and also provide a nice GUI for viewing progress, unloading
> the data later, and checking on errors.
>
> Also, the Java-based RecordLoader utility
> (http://developer.marklogic.com/code/recordloader,
> http://marklogic.github.com/recordloader/tutorial.html) will insert documents
> in smaller chunks. It does not provide all the power of InformationStudio,
> but can be faster in some instances.
>
> Yours,
> Damon
>
> From: [email protected]
> [[email protected]] On Behalf Of Rajesh Marklogic
> [[email protected]]
> Sent: Tuesday, April 19, 2011 1:03 PM
> To: [email protected]
> Subject: [MarkLogic Dev General] Loading xml files in mark logic server
>
> Hi
>
> We are trying to load 14 million xml files in Mark logic database. The below
> xdmp:document-load script could load maximum 5000 xml files at a time.
> Anything more than 5000 xml files threw Memory exceptions.
>
> xquery version "1.0-ml";
>
> let $files:=xdmp:filesystem-directory("/filePath/")
> for $filepath in $files//dir:entry[1 to 5000]
> return (xdmp:document-load($filepath//dir:pathname,
>   <options xmlns="xdmp:document-load">
>     <uri>{$filepath//dir:filename/text()}</uri>
>     <permissions>{xdmp:default-permissions()}</permissions>
>     <format>xml</format>
>     <repair>none</repair>
>   </options>))
>
>
> Is there any configuration changes required in admin setting to load all the
> 14 million xml files in 3 to 4 hours?. The total size of the content will be
> around 4GB and we have Unix server with 250 GB memory (RAM)
>
> It would be great, if you suggest an best approach to load all the 14
> million xml files in the time frame of 3-4 hours.
>
> Thanks and Regards
>
> Rajesh
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
