/////////////// MarkLogic always indexes element values and element-attribute values in a hash index. No extra configuration is needed, and it can't be turned off.
Element, attribute, and path range indexes are value indexes. These are only needed for fast sorting, inequality lookups, facets, and similar operations.

[DAL:] This is true, but the poster is experiencing unusual slowness. For data of this size, results should be extremely quick, but the devil is in the details. So why is it slow? The answer first, then the fishing pole.

1) You say you split your documents into smaller files: "I split 1.1GB xml file into small pieces of xmls (1000 elements each)." Could you be more precise? Ideally you should split the XML file into separate documents that are logical units. I am guessing that "1000" was picked just to get the files smaller. If that is true, use 1, not 1000. Each document should be like a table row: it should contain one (hierarchical) collection of self-contained information.

2) You're using a "*" for the element, but it must be under another specific element. Indexes and indexed searches default to element/attribute *pairs* (not precisely, but it is useful to think of it this way).

To see what can be indexed efficiently, it is sometimes useful to look at the primitive search APIs. A good guide is here:
http://docs.marklogic.com/guide/search-dev/cts_query#chapter

A quick look for anything that starts with cts: and has the word "query" in it is also useful. Go here:
http://docs.marklogic.com/guide/search-dev
Click the "XQuery/XSLT" tab and type "cts:" (wait a few seconds for your browser to update). These are the query-related primitive APIs, and they give a good clue as to what is efficient out of the box and what needs help. Note there is no cts:attribute-query, only cts:element-attribute items. This is a close match:
http://docs.marklogic.com/cts:element-attribute-value-query

This is why a path index was suggested (it can explicitly add a new index for your attributes). But why is it needed? Because your XPath has a * for the element name:

    /transaction/*[@transInfoRef='ti1']

This won't optimize with the default indexes, because the system has no idea which element/attribute pair you're looking for. Add to that my suspicion that you didn't break down your XML files into individual transactions. So what the server has to do is:

1) Find all element/attribute index matches with @transInfoRef='ti1' in all elements.
2) Since it can't be sure that the element is a direct child of /transaction, mark every document that step 1 turned up for loading.
3) Load each of those documents, re-parse it, and then search it to see if the "*" associated with the @transInfoRef matches an element that is a direct child of /transaction.
4) Return all matching documents to you, without being able to stop until the entire DB is searched.

Not so good. If you add a path range index this will optimize, but there are other ways. For example, if you know all possible (or useful) element names that are associated with your attribute, you can enumerate them in the search. This allows the search to be resolved 100% from indexes (provided you split your documents into 1 transaction per document).

So first do that: re-split your docs down to 1 document per "main XML element". In your snippet I would guess this is <transaction>. Ideally don't use more than 1 transaction element per document, or the server will still have to dig into documents where it finds possibly 1 match in order to locate them all. It can work with bigger groups, but it's better not to.

An easy way to try this (prove/disprove it) is to use QConsole:
http://localhost:8000/qconsole/

Since I don't know your data, I copied the one element in and just had the system find the names for me. You don't want to do this for every query, but it's a way to prove the queries can be fast. If you don't know all the element names at coding time, then either use a path index, or use this trick and store the results, but that gets more advanced. Still, it's worth a try to see what difference this makes.
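For the case where the element names are known when the query is written, the enumeration might look like the sketch below. The names "payment" and "refund" are hypothetical placeholders, not taken from the poster's data; substitute the real children of <transaction>. It assumes one <transaction> per document and no namespaces, and it limits the result to the first 10 matches.

```xquery
xquery version "1.0-ml";

(: Hypothetical element names: replace with the actual child
   elements of <transaction> that carry @transInfoRef. :)
let $elems := (xs:QName("payment"), xs:QName("refund"))
return
  (: Explicit element/attribute pairs give the server something it can
     look up in the default indexes; [1 to 10] limits the result set. :)
  cts:search(
    /transaction,
    cts:element-attribute-value-query(
      $elems, xs:QName("transInfoRef"), "ti1")
  )[1 to 10]
```

The query that follows takes the other route and has the server discover the element names itself.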
    let $elems := distinct-values(/transaction/*/node-name(.))
    return
      cts:search(/transaction,
        cts:element-attribute-value-query($elems, xs:QName("transInfoRef"), "ti1"))

Try using the Query Console "Profile" to get an idea of what has to load documents and what can go to indexes. For deeper research the Query Plan is useful:
https://docs.marklogic.com/guide/performance

You may find you can use a slightly different query that doesn't require extra tuning, or you may find that you need to add a range or path index.

Finally, how much data is in your results? A fully optimized query tends to be linear with the output data size. If you have a large number of matching rows, then the results take a long time to get to you. This is another reason to use the search:search or cts:search functions, which make it easy to limit the result set and "paginate" it. Or you can add [n to m] at the end of your XPath, like:

    (/transaction/*[@transInfoRef eq "ti1"])[1 to 10]

If you are not sure, always limit your results until you discover a good size.

-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
d...@marklogic.com
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com