I have RecordLoader chugging away at this; I'm putting the records into a
separate DB/forest
and will report back when I have some stats on using separate docs.



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lee, David
Sent: Sunday, December 06, 2009 11:05 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses
to optimize XPath

I will try to do this, but it's 3 *million* documents.
I don't think I can create that many files on my filesystem and then load
them remotely. Maybe dynamically somehow.
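(For what it's worth, something along these lines, run server-side against
the copy that's already loaded, might avoid the filesystem entirely. This
is an untested sketch; the URI /flat.xml, the <key> child, and the "rows"
collection are just placeholders, and 3 million inserts won't fit in one
transaction, so it would need batching, e.g. via xdmp:spawn or a driver
like RecordLoader.)

    (: split the flat file, assumed loaded at /flat.xml, into one document
       per row; this only does a first batch of 50,000, so the rest would
       have to be repeated or spawned :)
    for $row in fn:doc("/flat.xml")//row[position() le 50000]
    return xdmp:document-insert(
      fn:concat("/rows/", fn:string($row/key), ".xml"),
      $row,
      xdmp:default-permissions(),
      "rows")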


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Kelly
Stirman
Sent: Sunday, December 06, 2009 10:39 AM
To: [email protected]
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses
to optimize XPath

David,

Can you try loading each of your "rows" as a separate document and
report back? For this test, it may be easiest to do this in a new
database. Alternatively, you can put all these documents in a common
directory or collection, which will allow you to query these new
documents efficiently.
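For example, a query along these lines (untested; the element name "key"
and the directory "/rows/" are placeholders for whatever you end up using)
should be resolvable directly from the indexes:

    cts:search(
      fn:collection(),
      cts:and-query((
        cts:directory-query("/rows/", "1"),
        cts:element-value-query(xs:QName("key"), "12345678")
      )))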
 
Kelly

Message: 2
Date: Sun, 6 Dec 2009 04:55:49 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses
        to optimize XPath
To: "General Mark Logic Developer Discussion"
        <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a71467b...@postoffice>
Content-Type: text/plain; charset="us-ascii"

That query doesn't do what I want, because (shame on me) I have multiple
docs with //row elements.
But just for testing I ran it, and it performs about the same as the
cts:search case (about 4.5 seconds), so it seems to be using indexes in
that case. Even that seems too slow for me, given that the result set is
8 records of about 100 bytes each.


My real question here is one I'm still trying to answer, and one that I
think many people are asking:
can I get MarkLogic to perform like an RDBMS in the (hopefully rare)
cases where the data really is like RDB data?
That is, lots (millions) of small, identical "rows" of data where I'd like
to simply look up a row by an exact key match. Not word, phrase, or
wildcard searching for big docs in the haystack, but a real RDBMS-style
single-key lookup index.
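(What I have in mind is something like the following, assuming a string
range index on a <key> element; the names are made up and I haven't tried
this yet:)

    cts:search(
      //row,
      cts:element-range-query(xs:QName("key"), "=", "12345678"),
      "unfiltered")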

I tried creating a FIELD, but that didn't seem to do much good.
(The field search wasn't any faster than element-word searches.)

What's interesting is that I can search other document sets and return
hundreds of results in < 200 ms, but this one is really thrashing ML. I
suspect that's due to the high fragmentation (3 million fragments).
But what's the suggestion when the data really is flat like this? If
I don't fragment it,
it makes a 1 GB XML file, which blows up ML. There's no structure
in between.

What I'm going to experiment with next is sticking this particular file in
an RDBMS and using the SQL connector code ... Yuck. I was really
hoping not to do that.


Another idea, which I think is pretty ugly but might help, is to
artificially create structure where none exists. For example, group
the records by the first two digits of the key value into one document
per group, reducing the fragmentation by 100x. But even getting this
restructuring done is painful, because the doc is too big to load into
memory, so I need a DB just to get at it.
Which probably means I load it into an RDBMS to restructure the XML, or
maybe just leave it there.
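(A rough, untested sketch of that bucketing, done server-side against the
already-loaded copy so I never have to hold the file in memory. The URIs
and element names are placeholders, and in practice each prefix would
probably get its own spawned task rather than one big transaction:)

    for $prefix in fn:distinct-values(
      for $k in fn:doc("/flat.xml")//row/key
      return fn:substring($k, 1, 2))
    return xdmp:document-insert(
      fn:concat("/buckets/", $prefix, ".xml"),
      element rows {
        fn:doc("/flat.xml")//row[fn:starts-with(key, $prefix)]
      })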

I'm sure others have had this kind of problem. Any suggested techniques
for handling millions of "rows" of very small "records"?
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
