Eric,

I don't understand why you say that random ids don't scale well. I'd argue that sequential ids don't scale at all, and that the recursive approach scales pretty well. Maybe we mean different things when we talk about scaling?

For small numbers of ids per transaction, where small is less than a few hundred, the recursive random approach will perform well and will very rarely lock any extra documents at all. The queries to see if a document already exists will be indexed, and will never return more than 1 fragment. So I would expect this approach to scale at O(log n) with the number of documents in the database.

The sequential approach can do better with large number of ids per transaction. But this approach requires a lock on the sequence document, for every transaction, effectively serializing everything. You can mitigate this with multiple sequence documents, but then you don't, strictly speaking, have a sequence anymore.

I suppose it depends on the workload you need to optimize: the random approach is better for highly-parallel workloads, while the sequential approach might do better for workloads that need thousands of ids per transaction and don't mind being single-threaded.

In today's world, I'd rather use a solution that is capable of a high degree of parallelism. So I rarely, if ever, use sequential ids.

To generate thousands of ids per transaction (say, 1 for each element in a document), I might start with a single random id. Any check would be highly unlikely to take more than zero iterations. Then I'd concatenate my own, in-memory sequence onto it, for each descendant element.

So the root element might have an id "123af097324-1", its first child would be "123af097324-2", etc (apply string-pad if you want the ids to have a fixed length). This approach would preserve the scalability of random ids, without requiring extra work to check any but the first id.

-- Mike

Eric Palmitesta wrote:
Wow, thanks for the reply, Michael. I'll probably be using some variation of one of your examples.

Michael Blakeley wrote:
Many people ask about sequential ids. It is possible to model an id sequence as a database document. But as with RDBMS sequences, there are serialization penalties. I don't see the advantage of sequential ids, so I rarely, if ever, use this approach.

Assuming the recursive check isn't feasible (it doesn't scale well), the advantage of sequential ids is being able to sleep at night knowing collisions are simply impossible, and are not reliant on a 'good-enough' random() function. I'm nit-picking of course, I'm sure random() is fine. :)

Cheers,

Eric
_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general

_______________________________________________
General mailing list
General@developer.marklogic.com
http://xqzone.com/mailman/listinfo/general

Reply via email to