I have RecordLoader chugging away at this; I'm putting them into a separate DB/Forest and will report back when I have some stats on using separate docs.
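For context, here is a minimal sketch of the kind of exact-key lookup being tested against the one-row-per-document layout. It assumes each row was loaded as its own small document with a root <row> element, a <key> child holding the lookup value, and membership in a "rows" collection, and that a string range index exists on <key>; the element names, collection URI, and index configuration are assumptions, not details from the thread.

    xquery version "1.0-ml";
    (: Exact-key lookup over one-row-per-document data.
       cts:element-range-query requires an element range index on <key>;
       without one, the cts:element-value-query form in the trailing
       comment uses the universal index instead. :)
    declare variable $key as xs:string external;

    cts:search(
      collection("rows")/row,
      cts:element-range-query(xs:QName("key"), "=", $key)
    )

    (: Alternative without a range index:
       cts:search(collection("rows")/row,
                  cts:element-value-query(xs:QName("key"), $key, "exact")) :)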
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Lee, David
Sent: Sunday, December 06, 2009 11:05 AM
To: General Mark Logic Developer Discussion
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath

I will try to do this, but it's 3 *million* documents. I don't think I can do that on my filesystem and then load them remotely. Maybe dynamically somehow.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kelly Stirman
Sent: Sunday, December 06, 2009 10:39 AM
To: [email protected]
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath

David,

Can you try loading each of your "rows" as a separate document and report back? For this test, it may be easiest to do this in a new database. Alternately, you can put all these documents in a common directory or collection, which will allow you to efficiently query these new documents.

Kelly

Message: 2
Date: Sun, 6 Dec 2009 04:55:49 -0800
From: "Lee, David" <[email protected]>
Subject: RE: [MarkLogic Dev General] Interesting case where ML refuses to optimize XPath
To: "General Mark Logic Developer Discussion" <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a71467b...@postoffice>
Content-Type: text/plain; charset="us-ascii"

That query doesn't do what I want, because (shame on me) I have multiple docs with //row elements. But just for testing I ran it, and it performs about the same as the cts:search case (4.5 sec), so it seems to be using indexes in that case. Even that seems too slow for me when the result set is 8 records of about 100 bytes each.

My real question here is one I'm trying to answer, and one that I think many people are asking: can I get MarkLogic to perform like an RDBMS in the (hopefully rare) cases where the data really is like RDB data? That is, lots (millions) of small, identical "rows" of data where I'd like to 'simply' look up a row by an exact key match. Not word, phrase, or wildcard searching of big docs in the haystack, but a real RDBMS-style single-key lookup index. I tried creating a FIELD, but that didn't seem to do much good (the field search wasn't any faster than element-word searches).

What's interesting is that I can search other document sets and return hundreds of results in < 200 ms, but this one is really thrashing ML, I suspect due to the high fragmentation (3 million fragments). But what's the suggestion when the data really is flat like this? If I don't fragment it, it makes a 1 GB XML file, which blows up ML. There's no structure in between.

What I'm going to experiment with next is sticking this particular file in an RDBMS and using the SQL connector code... Yuck. I was really hoping not to do that.

Another idea, which I think is pretty ugly but might help, is to artificially create structure where none exists: for example, group the records by the first 2 digits of the key value into a document and reduce the fragmentation by 100x. But even getting this restructuring done is painful, because the doc is too big to load into memory, so I need to use a DB just to get at it. Which probably means I load it into an RDBMS to restructure the XML, or maybe just leave it there.

I'm sure others have had this kind of problem? Any suggestions for techniques for handling millions of "rows" of very small "records"?
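For the "artificially create structure" idea above (grouping records into documents by the first two digits of the key), here is a rough sketch of what that regrouping could look like once the rows are reachable in a database. The <row>/<key> element names and the /grouped/ URI scheme are assumptions, and the query is purely illustrative: over 3 million rows it would have to be run in batches (for example one xdmp:spawn per prefix) rather than as a single transaction.

    xquery version "1.0-ml";
    (: Regroup flat <row> records into roughly 100 documents keyed by the
       first two digits of <key>, cutting the fragment count by ~100x.
       Illustrative only; a real run over millions of rows needs batching. :)
    for $prefix in distinct-values(//row/key/substring(., 1, 2))
    return
      xdmp:document-insert(
        concat("/grouped/", $prefix, ".xml"),
        element rows {
          attribute prefix { $prefix },
          //row[starts-with(key, $prefix)]
        }
      )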
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
