Re: [MarkLogic Dev General] xpath string construction

Michael Blakeley Tue, 14 Oct 2008 11:08:28 -0700

Eric,

I don't understand why you say that random ids don't scale well. I'd argue that sequential ids don't scale at all, and that the recursive approach scales pretty well. Maybe we mean different things when we talk about scaling?

For small numbers of ids per transaction, where small is less than a few hundred, the recursive random approach will perform well and will very rarely lock any extra documents at all. The queries to see if a document already exists will be indexed, and will never return more than 1 fragment. So I would expect this approach to scale at O(log n) with the number of documents in the database.
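
A minimal sketch of the recursive random approach described above, written in Python rather than XQuery so that it runs on its own. The names here are assumptions made for illustration: `new_random_id` is a hypothetical helper, the `existing` set stands in for the database, and the 64-bit id width is an arbitrary choice. In a real database the membership test would be the indexed "does this document already exist" query.

```python
import secrets


def new_random_id(existing, bits=64):
    """Return a hex id that is not in `existing`, recursing on a collision.

    `existing` is a plain set standing in for the database (an assumption
    for this sketch); the membership test plays the role of the indexed
    existence query.
    """
    candidate = format(secrets.randbits(bits), "x")
    if candidate in existing:
        # Collision: try again. With 64 random bits this branch is
        # expected to be taken extremely rarely.
        return new_random_id(existing, bits)
    existing.add(candidate)
    return candidate


if __name__ == "__main__":
    ids = set()
    print(new_random_id(ids))
    print(new_random_id(ids))
```

Each call draws an independent candidate, so concurrent callers only conflict when two of them happen to pick the same id.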

The sequential approach can do better with large numbers of ids per transaction. But this approach requires a lock on the sequence document, for every transaction, effectively serializing everything. You can mitigate this with multiple sequence documents, but then you don't, strictly speaking, have a sequence anymore.
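
To illustrate the serialization point, here is a rough sketch in Python. The `SequenceDocument` class is hypothetical: an in-memory counter guarded by a `threading.Lock` stands in for a sequence document and its update lock. It is not how any particular database implements sequences.

```python
import threading


class SequenceDocument:
    """In-memory stand-in for a sequence document (an assumption)."""

    def __init__(self, start=1):
        self._lock = threading.Lock()
        self._next = start

    def take(self, count):
        """Reserve `count` consecutive ids and return them as a range."""
        # Every caller must hold the same lock, so all transactions that
        # need ids pass through this section one at a time.
        with self._lock:
            first = self._next
            self._next += count
        return range(first, first + count)


if __name__ == "__main__":
    seq = SequenceDocument()
    print(list(seq.take(3)))
    print(list(seq.take(2)))
```

Reserving a whole block per call keeps the cost per id low, which is why this style can do well for thousands of ids per transaction, but the single lock is still shared by every caller.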

I suppose it depends on the workload you need to optimize: the random approach is better for highly-parallel workloads, while the sequential approach might do better for workloads that need thousands of ids per transaction and don't mind being single-threaded.

In today's world, I'd rather use a solution that is capable of a high degree of parallelism. So I rarely, if ever, use sequential ids.

To generate thousands of ids per transaction (say, 1 for each element in a document), I might start with a single random id. Any check would be highly unlikely to take more than zero iterations. Then I'd concatenate my own, in-memory sequence onto it, for each descendant element.

So the root element might have an id "123af097324-1", its first child would be "123af097324-2", etc (apply string-pad if you want the ids to have a fixed length). This approach would preserve the scalability of random ids, without requiring extra work to check any but the first id.
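
The prefix-plus-counter scheme can be sketched as follows, again in Python rather than XQuery. The `element_ids` helper and its `width` parameter are hypothetical names for this illustration; zero-padding with `str.zfill` stands in for string-pad, and the prefix is assumed to be a random id that has already been checked for uniqueness.

```python
def element_ids(prefix, count, width=0):
    """Return `count` ids of the form "<prefix>-<n>", with n starting at 1.

    `prefix` is assumed to be an already-verified random id. `width`
    zero-pads the counter so that all ids have the same length; 0 means
    no padding.
    """
    return [f"{prefix}-{str(n).zfill(width)}" for n in range(1, count + 1)]


if __name__ == "__main__":
    print(element_ids("123af097324", 3))
    print(element_ids("123af097324", 3, width=4))
```

With the prefix from the example, the unpadded ids are "123af097324-1", "123af097324-2", "123af097324-3", and with `width=4` the first becomes "123af097324-0001". Only the prefix ever needs an existence check.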


-- Mike

Eric Palmitesta wrote:

Wow, thanks for the reply, Michael. I'll probably be using some variation of one of your examples.
Michael Blakeley wrote:
Many people ask about sequential ids. It is possible to model an id sequence as a database document. But as with RDBMS sequences, there are serialization penalties. I don't see the advantage of sequential ids, so I rarely, if ever, use this approach.
Assuming the recursive check isn't feasible (it doesn't scale well), the advantage of sequential ids is being able to sleep at night knowing collisions are simply impossible, and are not reliant on a 'good-enough' random() function. I'm nit-picking of course, I'm sure random() is fine. :)
Cheers,

Eric
_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general


