Thanks for your reply. We haven't configured any indexes for the database yet. Do the default indexes alone double the size?
Regards,
Rajesh

On Tue, Apr 26, 2011 at 10:30 PM, Jason Hunter <[email protected]> wrote:

> The extra space is for the indexes.
>
> -jh-
>
> On Apr 26, 2011, at 9:51 AM, Rajesh Marklogic wrote:
>
> Hi Damon,
>
> Using Record loader, i could upload the million xml documents successfully.
> The total size of the document is 40 mb, but the forest size is increased to
> 70 mb.
>
> Any idea why the forest size is double than actual file size?
>
> Thanks and Regards
>
> Rajesh Govindan
>
> On Tue, Apr 19, 2011 at 11:28 PM, Damon Feldman <[email protected]> wrote:
>
>> Rajesh,
>>
>> Each module invoke such as yours below runs as a single transaction with
>> all the data in memory. For thousands of XML documents, you should break the
>> work up into smaller chunks.
>>
>> The InformationStudio flows available in version 4.2 will do this
>> automatically, and also provide a nice GUI for viewing progress, unloading
>> the data later, and checking on errors.
>>
>> Also, the Java-based RecordLoader utility (
>> http://developer.marklogic.com/code/recordloader,
>> http://marklogic.github.com/recordloader/tutorial.html) will insert
>> documents in smaller chunks. It does not provide all the power of
>> InformationStudio, but can be faster in some instances.
>>
>> Yours,
>> Damon
>>
>> ------------------------------
>> *From:* [email protected] [[email protected]] On Behalf Of Rajesh Marklogic [[email protected]]
>> *Sent:* Tuesday, April 19, 2011 1:03 PM
>> *To:* [email protected]
>> *Subject:* [MarkLogic Dev General] Loading xml files in mark logic server
>>
>> Hi
>>
>> We are trying to load 14 million xml files in Mark logic database. The
>> below xdmp:document-load script could load maximum 5000 xml files at a time.
>> Anything more than 5000 xml files threw Memory exceptions.
>>
>> xquery version "1.0-ml";
>>
>> let $files:=xdmp:filesystem-directory("/filePath/")
>> for $filepath in $files//dir:entry[1 to 5000]
>> return (xdmp:document-load($filepath//dir:pathname,
>>     <options xmlns="xdmp:document-load">
>>         <uri>{$filepath//dir:filename/text()}</uri>
>>         <permissions>{xdmp:default-permissions()}</permissions>
>>         <format>xml</format>
>>         <repair>none</repair>
>>     </options>))
>>
>> Is there any configuration changes required in admin setting to load all
>> the 14 million xml files in 3 to 4 hours?. The total size of the content
>> will be around 4GB and we have Unix server with 250 GB memory (RAM)
>>
>> It would be great, if you suggest an best approach to load all the 14
>> million xml files in the time frame of 3-4 hours.
>>
>> Thanks and Regards
>>
>> Rajesh
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
