That query doesn't do what I want, because (shame on me) I have
multiple docs with //row elements.
And do those row elements have RXAUI child elements that would match
the value?
But just for testing I ran it, and it performs about the same as the
cts:search case (4.5 sec), so it seems to be using indexes in that case.
Even that seems too slow for me, given that the result set is 8
records of about 100 bytes each.
Does the query I gave you return 8 results or does it return a lot
more because of your spurious row entries?
My real question here is one I'm still trying to work out, and one
that I think many people are asking: can I get MarkLogic to perform
like an RDBMS in the (hopefully rare) cases where the data really is
like RDB data?
You can definitely get MarkLogic to return 8 records in a fraction of
a second, and faster still if the 8 records have their expanded tree
cache records in memory. The fact that you're not seeing that makes
me think something else is going on.
Your original query, for example, was actually crossing fragments
because of the full path you were using, which required fragment join
work that will slow things down. The query I provided had the
possibility of reducing that (and if it had been faster we could have
added a more efficient rule to limit to the right document). Doing
what Kelly suggested and breaking the fragments into their own docs
will definitely eliminate the join. Let's see what happens there.
Also, what does query-trace() say about index utilization?
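If tracing isn't on already, something like this sketch will show what's
happening (element name RXAUI is from your earlier mail; the key value
here is just a placeholder):

```xquery
(: Enable query tracing for this request, then run the search.
   Trace output showing how each constraint resolved against the
   indexes appears in the server ErrorLog. :)
xdmp:query-trace(fn:true()),
cts:search(//row,
           cts:element-value-query(xs:QName("RXAUI"), "12345"))
```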
Note that philosophically, tiny fragments are just fine if the
fragments are the unit of retrieval, as they are in your case. The
advice not to make your fragments too small assumes that you might
want to sometimes retrieve a full document made up of fragments and in
that case you don't want to have to build your doc out of lots of tiny
fragment retrievals. You won't be doing that here, so tiny fragments
are fine.
That is: lots (millions) of small, identical "rows" of data where I'd
like to 'simply' look up a row by an exact key match. Not word or
phrase or wildcard searching of big docs in the haystack, but a real
RDBMS-style single-key-lookup index.
Should be able to run extremely fast. MarkMail does it all the time
fetching one mail out of millions, with each mail being fairly small.
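For that shape of data, the lookup itself can be a one-liner. A minimal
sketch, assuming each row ends up in its own document with an RXAUI
child (the key value is a placeholder):

```xquery
(: Exact-key lookup resolved from the element-value index.
   The "unfiltered" option skips the per-fragment filtering pass,
   which is safe when the index alone identifies the right docs. :)
cts:search(fn:doc(),
           cts:element-value-query(xs:QName("RXAUI"), "12345"),
           "unfiltered")
```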
What I'm going to experiment with next is sticking this particular
file in an RDBMS and using the SQL connector code ... Yuck. I was
really hoping not to do that.
You shouldn't have to.
Another idea, which I think is pretty ugly, but might help, is to
artificially create structure where none exists. For example, group
the records by the first 2 digits of the key value into a document and
reduce the fragmentation by 100x. But even getting this restructuring
done is painful, because the doc is too big to load into memory, so I
need to use a DB just to get at it.
Which probably means I load it into an RDBMS to restructure the XML or
maybe just leave it there.
You shouldn't have to do that either.
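If you do decide you want one small doc per row, the restructuring can
happen inside MarkLogic once the big file is loaded — no RDBMS needed.
A sketch (the /big.xml URI and directory layout are assumptions; for
millions of rows you'd run this in batches rather than one transaction):

```xquery
(: Split a loaded document into one small doc per row, using each
   row's key value as part of the new document URI. :)
for $row in fn:doc("/big.xml")//row
return xdmp:document-insert(
  fn:concat("/rows/", fn:string($row/RXAUI), ".xml"),
  $row)
```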
-jh-
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general