A number of indexes are enabled by default, even if you haven't configured any yourself. You have admin control over them, but MarkLogic chooses a good default set. A doubling of size isn't abnormal (your 40 MB of documents growing to a 70 MB forest is within that). In fact, compared to other products it's actually very good. MarkLogic works hard to provide so many rich indexes at such a small size.

Sent from my iPhone

On Apr 26, 2011, at 8:03 PM, Rajesh Marklogic <[email protected]> wrote:

> Thanks for your reply.
>
> We didn't configure any index for the database yet. Does the default index
> double the size?
>
> regards
>
> Rajesh
>
> On Tue, Apr 26, 2011 at 10:30 PM, Jason Hunter <[email protected]> wrote:
> The extra space is for the indexes.
>
> -jh-
>
> On Apr 26, 2011, at 9:51 AM, Rajesh Marklogic wrote:
>
>> Hi Damon,
>>
>> Using Record loader, i could upload the million xml documents successfully.
>> The total size of the document is 40 mb, but the forest size is increased to
>> 70 mb.
>>
>> Any idea why the forest size is double than actual file size?
>>
>> Thanks and Regards
>>
>> Rajesh Govindan
>>
>> On Tue, Apr 19, 2011 at 11:28 PM, Damon Feldman
>> <[email protected]> wrote:
>> Rajesh,
>>
>> Each module invoke such as yours below runs as a single transaction with all
>> the data in memory. For thousands of XML documents, you should break the
>> work up into smaller chunks.
>>
>> The InformationStudio flows available in version 4.2 will do this
>> automatically, and also provide a nice GUI for viewing progress, unloading
>> the data later, and checking on errors.
>>
>> Also, the Java-based RecordLoader utility
>> (http://developer.marklogic.com/code/recordloader,
>> http://marklogic.github.com/recordloader/tutorial.html) will insert
>> documents in smaller chunks. It does not provide all the power of
>> InformationStudio, but can be faster in some instances.
>>
>> Yours,
>> Damon
>>
>> From: [email protected]
>> [[email protected]] On Behalf Of Rajesh Marklogic
>> [[email protected]]
>> Sent: Tuesday, April 19, 2011 1:03 PM
>> To: [email protected]
>> Subject: [MarkLogic Dev General] Loading xml files in mark logic server
>>
>> Hi
>>
>> We are trying to load 14 million xml files in Mark logic database. The below
>> xdmp:document-load script could load maximum 5000 xml files at a time.
>> Anything more than 5000 xml files threw Memory exceptions.
>>
>> xquery version "1.0-ml";
>>
>> let $files:=xdmp:filesystem-directory("/filePath/")
>> for $filepath in $files//dir:entry[1 to 5000]
>> return (xdmp:document-load($filepath//dir:pathname,
>> <options xmlns="xdmp:document-load">
>> <uri>{$filepath//dir:filename/text()}</uri>
>> <permissions>{xdmp:default-permissions()}</permissions>
>> <format>xml</format>
>> <repair>none</repair>
>> </options>))
>>
>>
>> Is there any configuration changes required in admin setting to load all the
>> 14 million xml files in 3 to 4 hours?. The total size of the content will be
>> around 4GB and we have Unix server with 250 GB memory (RAM)
>>
>> It would be great, if you suggest an best approach to load all the 14
>> million xml files in the time frame of 3-4 hours.
>>
>> Thanks and Regards
>>
>> Rajesh
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
