You are probably running out of "inodes", or running into a limit on directory size, which gives the same error message when you have large numbers of files. You can check using "df -i" (assuming you are on a Unix-like system, which your filename suggests), like this:

    [dlee@z600 ~]$ df -i
    Filesystem               Inodes  IUsed   IFree IUse% Mounted on
    /dev/mapper/fedora-root 3276800 231268 3045532    8% /
    devtmpfs                3081228    525 3080703    1% /dev
    tmpfs                   3084004      1 3084003    1% /dev/shm
    ...

The "IUse%" column will show if you're getting close to the limit (note that unless you're root you can't get to 100%).

Some links:
http://serverfault.com/questions/482173/is-there-any-other-reason-for-no-space-left-on-device
http://stackoverflow.com/questions/466521/how-many-files-can-i-put-in-a-directory

In any case, if you get into the multi-thousands of files in a directory you will run into performance problems. There are several ways to solve this; the easiest is to run xsplit iteratively. First split into big chunks (depending on how many elements you have, you want at most about 1000 files at once). So if you had a million elements, this gives you 1000 files of 1000 elements each:

    xsplit -c 1000

Then process each of these files one by one, loading them into ML and deleting (or zipping or otherwise combining) the temp files, so that at no time do you have millions of files.

With xmlsh you would do something like this:

    import ml=marklogic   # for ml:put below
    mkdir big
    xsplit -c 1000 -o big file.xml
    for f in big/*.xml ; do
        rm -rf temp
        mkdir temp
        xsplit -o temp $f
        cd temp
        ### Load the 1000 files here then delete them
        ### This uses the marklogic extension for xmlsh
        ### You could use mlcp or other tool here or zip the files to a zip
        ml:put -baseuri /dir/ -maxfiles 100 -maxthreads 5 *.xml
        cd ..
        rm -rf temp
    done

There are more efficient ways to do this, but they are trickier; try something like the above first and see if it helps. With lots of small files you need to batch them during the insert or it will go slowly, hence the arguments to ml:put, but most MarkLogic upload tools have options for this. mlcp is a good one and can read directly from zip files, so instead of doing the upload there you can just zip the files into a zip and delete the temp files.
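For the "zip the files and delete the temp files" step, here is a minimal Python sketch. It is only an illustration, not part of xmlsh or mlcp: the function name zip_and_remove and the directory and archive names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_and_remove(src_dir: Path, archive: Path, pattern: str = "*.xml") -> int:
    """Pack every file matching `pattern` in `src_dir` into `archive`,
    then delete the originals. Returns the number of files archived."""
    files = sorted(p for p in src_dir.glob(pattern) if p.is_file())
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store only the file name so entries are not nested under src_dir.
            zf.write(path, arcname=path.name)
    # Delete the originals only after the archive has been written and closed.
    for path in files:
        path.unlink()
    return len(files)
```

Called once per chunk, for example zip_and_remove(Path("temp"), Path("zips/chunk-0001.zip")) with an existing "zips" directory, this keeps the number of loose files in any one directory at roughly one chunk's worth.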
Then you will have a bunch of zip files to upload instead of a million XML files.

From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of irisDeveloper
Sent: Wednesday, August 20, 2014 4:59 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Attribute indexing

Hi David,

Thanks for the detailed explanation. I was working on your guidance of splitting the large XML file into smaller XML files, each carrying 1 transaction. I used the xmlsh xsplit utility, the one you recommended. I got the following error:

    /xxx/yyyy/x6548087.xml (No space left on device)

even though it has 70GB of space available.

Thanks
Samby

On 08/19/2014 09:27 PM, David Lee wrote:

///////////////
MarkLogic always indexes element values and element-attribute values in a hash index. No extra configuration is needed, and it can't be turned off.

Element, attribute, and path range indexes are value indexes. These are only needed for fast sorting, inequality lookups, facets, and similar operations.

[DAL:] ///
This is true ... but the poster is experiencing unusual slowness. For data of this size, results should be extremely quick -- but -- the devil is in the details.

So why is it slow? The answer first, then the fishing pole.

1) You say you split your documents into smaller files: "I split 1.1GB xml file into small pieces of xmls (1000 elements each)." Could you be more precise? Ideally you should split the XML file into separate documents that are logical units ... I am guessing that "1000" was picked just to get the files smaller; if this is true, use 1, not 1000 ... each document should be like a table row ... it should contain one (hierarchical) collection of self-contained information.

2) You're using a "*" for the element, but it must be under another specific element. Indexes and indexed searches default to element/attribute *pairs* ... (not precisely, but useful to think this way).
To see what can be indexed efficiently, it's sometimes useful to look at the primitive search APIs. A good guide is here:
http://docs.marklogic.com/guide/search-dev/cts_query#chapter

But a quick look for anything that starts with cts: and has the word "query" in it is useful. Go here:
http://docs.marklogic.com/guide/search-dev
Click the "XQuery/XSLT" tab and type "cts:" (wait a few secs for your browser to update).

These are the query-related primitive APIs, and they give a good clue as to what's efficient out of the box and what needs help. Note there is no cts:attribute-query ... only cts:element-attribute items. This is a close match:
http://docs.marklogic.com/cts:element-attribute-value-query

This is why the suggestion for a path index (which can explicitly add a new index for your attributes). But why is this needed? Because your XPath has a * for the element name:

    /transaction/*[@transInfoRef='ti1']

This won't optimize with the default indexes ... because the system has no idea what element/attribute pair you're looking for ... Add to that my suspicion that you didn't break down your XML files into individual transactions.

So what the server has to do is:
1) Find all element/attribute index matches with @transInfoRef='ti1' in all elements.
2) Since it is not sure whether that element is a direct child of /transaction, it needs to load every document.
3) Load each and every document, re-parse it, and then search to see if the "*" associated with the @transInfoRef matches an element that is a direct child of /transaction/
4) Return all documents to you ... not able to stop until the entire DB is searched.

Not so good ... If you add a path range index this will optimize, but there are other ways. For example, if you know all possible (or useful) element names which are associated with your attribute, you can enumerate them in the search. This will allow the search to be resolved 100% from indexes (provided you split your documents into 1 transaction per document).
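If the element names are not known up front, one way to collect candidates outside the server is to scan a sample of the source XML. The following is a minimal Python sketch under stated assumptions: the sample document and its child element names (lineItem, payment, note) are made up for illustration, and the helper name child_names_with_attribute is hypothetical.

```python
import xml.etree.ElementTree as ET


def child_names_with_attribute(xml_text: str, attr: str) -> list:
    """Return the sorted, distinct names of the root element's direct
    children that carry the attribute `attr`."""
    root = ET.fromstring(xml_text)
    return sorted({child.tag for child in root if attr in child.attrib})


# Hypothetical sample; the child element names are illustrative only.
SAMPLE = """\
<transaction>
  <lineItem transInfoRef="ti1"/>
  <payment transInfoRef="ti1"/>
  <lineItem transInfoRef="ti2"/>
  <note>no attribute here</note>
</transaction>
"""

print(child_names_with_attribute(SAMPLE, "transInfoRef"))
# prints ['lineItem', 'payment']
```

The resulting list is what you would enumerate in the search. Note that ElementTree reports namespaced elements as "{uri}local", so real data with namespaces needs the names converted to QNames before use.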
So first, resplit your docs down to 1 document per "main XML element" ... in your snippet I would guess this is <transaction>. Ideally don't use more than 1 transaction XML element per document, or the server will still have to dig into documents where it finds possibly 1 match to locate them all. It can work with bigger groups, but it's better not to.

An easy way to try (prove/disprove) this is to use QConsole:
http://localhost:8000/qconsole/

Now since I don't know your data - I copied the one element in and just had the system find the names for me. You don't want to do this for every query - but it's a way to prove the queries can be fast ... If you don't know all the element names at coding time, then either use a path index, or you can use this trick and store the results ... but that gets more advanced. Still, it's worth a try to see what difference this makes.

    let $elems := distinct-values(/transaction/*/node-name(.))
    return cts:search(/transaction,
        cts:element-attribute-value-query(
            $elems, xs:QName("transInfoRef"), "ti1"))

Try using the Query Console "Profile" to get an idea of what has to load documents and what can go to indexes. For deeper research the query plan is useful ...
https://docs.marklogic.com/guide/performance

You may find you can use a slightly different query that doesn't require extra tuning ... or you may find that you need to add a range or path index ...

Finally ... how much data is in your results? A fully optimized query tends to be linear in the output data size ... if you have a large number of matching rows, then the results take a long time to get to you. This is another reason to use the search:search or cts:search functions, which make it easy to limit the result set and "paginate" it ... Or you can add [n to m] at the end of your XPath, like:

    (/transaction/*[@transInfoRef eq "ti1"])[1 to 10]

If you are not sure, always limit your results until you discover a good size.
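The [n to m] arithmetic for successive pages can be sketched as follows. This is a minimal Python illustration of the range calculation only; the helper names page_range and paginated_xpath are hypothetical, and it just builds the query string rather than talking to the server.

```python
def page_range(page: int, page_size: int) -> tuple:
    """Return the 1-based inclusive (start, end) positions for a page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size + 1
    return start, start + page_size - 1


def paginated_xpath(xpath: str, page: int, page_size: int) -> str:
    """Wrap `xpath` in parentheses and append an [n to m] range predicate."""
    start, end = page_range(page, page_size)
    return f"({xpath})[{start} to {end}]"


print(paginated_xpath('/transaction/*[@transInfoRef eq "ti1"]', 1, 10))
# prints (/transaction/*[@transInfoRef eq "ti1"])[1 to 10]
```

Page 3 with a page size of 10 gives [21 to 30]. The parentheses matter: without them the range predicate would apply within each parent's children instead of to the whole result sequence.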
-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general