David,

Can you try loading each of your "rows" as a separate document and report back?
For this test, it may be easiest to do this in a new database. Alternatively, you
can put all of these documents in a common directory or collection, which will
let you query them efficiently.
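
Something along these lines should work as a starting point (a sketch only; it
assumes your flat file is loaded at a made-up URI, that each row is a <row>
element with a <key> child, and that "rows" is the collection name you pick):

    (: Sketch: split a flat file of <row> elements into one document per row,
       all placed in a "rows" collection. The source URI, the <key> child, and
       the collection name are placeholders. For millions of rows you would
       batch this (e.g. via xdmp:spawn) rather than run a single transaction. :)
    for $row in doc("/flat/rows.xml")//row
    let $uri := concat("/rows/", string($row/key), ".xml")
    return xdmp:document-insert($uri, $row, (), "rows")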
 
Kelly

Message: 2
Date: Sun, 6 Dec 2009 04:55:49 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses
        to optimize XPath
To: "General Mark Logic Developer Discussion"
        <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a71467b...@postoffice>
Content-Type: text/plain; charset="us-ascii"

That query doesn't do what I want, because (shame on me) I have multiple docs 
with //row elements.
But just for testing I ran it, and it performs about the same as the cts:search 
case (4.5 sec), so it does seem to be using indexes there.
Even that is too slow for me, given that the result set is 8 records of 
about 100 bytes each.


My real question is one I'm still trying to pin down, and one that I
think many people are asking:
can I get MarkLogic to perform like an RDBMS in the (hopefully rare) cases 
where the data really is like RDB data?
That is, lots (millions) of small, identical "rows" of data where I'd like to 
'simply' look up a row by an exact key match. Not word or phrase or wildcard 
searching of big docs in the haystack, but a real RDBMS-style single-key lookup 
type index.
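
For what it's worth, the lookup I have in mind looks roughly like this (assuming
each row is its own document in a "rows" collection and the key element is
literally named <key>; both names are made up):

    (: Exact-match lookup on a key element. cts:element-value-query resolves
       against the universal index; with an element range index on <key>,
       cts:element-range-query(xs:QName("key"), "=", "12345") is an option too. :)
    cts:search(collection("rows")/row,
      cts:element-value-query(xs:QName("key"), "12345"),
      "unfiltered")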

I tried creating a FIELD, but that didn't seem to do much good.
(The field search wasn't any faster than element-word searches.)
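
For reference, the field attempt was roughly this (the field name is made up):

    (: What I tried, roughly: a field ("row-key") defined over the key element,
       queried with a word query. This is still a word search, not an exact
       key lookup, and it ran about as fast as the element-word version. :)
    cts:search(/row, cts:field-word-query("row-key", "12345"))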

What's interesting is that I can search other document sets and return hundreds of 
results in < 200ms, but this one is really thrashing ML, I suspect due to the 
high fragmentation (3 million fragments).
But what's the suggestion when the data really is flat like this? If
I don't fragment it,
it makes a 1 GB XML file, which blows up ML. There's no structure in between.

What I'm going to experiment with next is sticking this particular file in an 
RDBMS and using the SQL connector code ... yuck. I was really hoping not to 
do that.


Another idea, which I think is pretty ugly but might help, is to
artificially create structure where none exists. For example, group
the records by the first 2 digits of the key value into a document and
reduce the fragmentation by 100x (a sketch follows below). But even getting this 
restructuring done is painful, because the doc is too big to load into memory, so 
I need to use a DB just to get at it.
Which probably means I load it into an RDBMS to restructure the XML, or maybe 
just leave it there.
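
If I do get the rows into the database in some queryable form, the grouping
itself would be something like this (element names and URIs are made up, and a
real run would have to be batched rather than done in one statement):

    (: Sketch: bucket rows by the first two digits of their key, producing
       ~100 documents instead of millions of fragments. Assumes the rows are
       already loaded into a "rows" collection; all names are placeholders. :)
    let $rows := collection("rows")/row
    for $prefix in distinct-values($rows/substring(key, 1, 2))
    return
      xdmp:document-insert(
        concat("/grouped/", $prefix, ".xml"),
        element rows { $rows[starts-with(key, $prefix)] })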

I'm sure others have had this kind of problem. Any suggestions for techniques 
for handling millions of "rows" of very small "records"?
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
