Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-11 Thread Geert Josten

I'm sure you will find lots of mentions of this on MarkMail if you search
for unique id and random. MarkLogic was using that same method internally
as well for creating its own objects. The idea is indeed that you run it
in update mode, in the same transaction in which you plan to do the
insert. The 'lookahead' will take a read lock, which causes writes from
other transactions to wait and retry if necessary.
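
A minimal sketch of that idea, assuming the check and the insert sit in one main module. Because the module contains a call to xdmp:document-insert, MarkLogic should treat the whole request as an update, so the fn:doc-available() lookahead runs with locks rather than at a timestamp. The local:unique-uri name, the /doc/ prefix, and the document body are illustrative only.

```xquery
xquery version "1.0-ml";

(: Illustrative helper: keep drawing random URIs until one is unused. :)
declare function local:unique-uri() as xs:string
{
  let $uri := "/doc/" || xdmp:random() || ".xml"
  return
    if (fn:doc-available($uri))
    then local:unique-uri()
    else $uri
};

(: The insert below makes this request an update, so the availability
   check above happens in the same transaction as the write. :)
let $uri := local:unique-uri()
return (
  xdmp:document-insert($uri,
    <doc><created>{ fn:current-dateTime() }</created></doc>),
  $uri
)
```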

Cheers,
Geert

-Original Message-
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Thursday, 5 June 2014 00:19
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
Range Indexes


   Unless your unique-uri() function is running in a non-update query, in
which case it runs lock free at a timestamp.  If you're using the pattern of
main code as a query and updates delegated to invoked/eval'ed transactions,
you could get bit by this.  It would work fine the vast majority of the
time, but you wouldn't be protected from someone else's update happening
between your check in the query and the execution of your invoked update.
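
A sketch of how that pattern can be kept safe, under the assumption that the existence check is moved into the eval'ed update instead of being done in the calling query. The local:insert-if-absent helper and the example URI are hypothetical; xdmp:eval runs the string as a separate transaction by default, and the string contains an update call, so it should run with locks.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper. The check and the insert live in the SAME eval'ed
   update transaction, so the check is not a lock-free timestamp read. :)
declare function local:insert-if-absent(
  $uri as xs:string,
  $doc as element()
) as xs:boolean
{
  xdmp:eval('
    declare variable $uri as xs:string external;
    declare variable $doc as element() external;
    if (fn:doc-available($uri))
    then fn:false()
    else (xdmp:document-insert($uri, $doc), fn:true())
  ',
  (xs:QName("uri"), $uri, xs:QName("doc"), $doc))
};

(: The calling module has no update calls of its own, so it runs as a
   query at a timestamp. A fn:doc-available() call placed here, before
   the eval, would not protect against a concurrent insert. :)
local:insert-if-absent("/doc/example.xml", <doc/>)
```

The function returns false when the URI was already taken, which lets the caller pick another ID and try again.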

   DOIs are a perfect example of what I'm talking about.  Or account
numbers, or patient record IDs, or aircraft tail numbers, etc.  The impact
of non-unique record identifiers can range from annoying all the way to
legally/financially costly or even life-threatening if you're managing
medication records, for example.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-11 Thread Geert Josten

The general topic of generating unique IDs is even a lot older. I like the
idea of the database being able to impose a uniqueness constraint on
anything stored in it. It is much more difficult to guarantee that code is
behaving correctly than to impose such an assertion.

 

Interesting thought to use (range) indexes for that, hadn't heard that one
before!

 

Cheers,

Geert

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Wayne Feick
Sent: Thursday, 5 June 2014 00:12
To: general@developer.marklogic.com
Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
Range Indexes

 

Fair points, Ron. We have RFE 2322 filed back in Feb 2012 to track this.
I'll add a note indicating your interest as well.

Wayne.



On 06/04/2014 03:00 PM, Ron Hitchens wrote:

 

Wayne, 

 

   Thanks for this.  It's a useful code pattern for this sort of thing and I
will probably use it for the specific requirement I have at the moment (I
was planning to do something similar anyway).

 

   But this code, or any user-level code, does not fully implement the
uniqueness guarantee I'd like to have and that I think a specialized range
index could easily provide.  This will work, but as you say it would be
necessary to always use this code convention.  It would not prevent creation
of duplicate values by code that doesn't follow the convention.  If
uniqueness were enforced by the index, then I could be confident that
uniqueness is absolutely guaranteed and I don't need to trust anyone
(including my future self) to always follow the same locking protocol.


---

Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

 

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Ron Hitchens


Rob,

   I believe there is a race condition here.  A document may not exist as of the 
timestamp when this request starts running, but some other request could create 
one while it's running.  This request would then overwrite that document.

   I'm actually more concerned about element values inside documents than 
generating unique document URIs.  It's easy to generate document URIs with 
64-bit random numbers that are very unlikely to collide.  But I want to 
guarantee that some meaningful value inside a document is unique across all 
documents.

   In my case, the naming space is actually quite small because I want the IDs 
to be meaningful but unique.  For example images:cats:fluffy:XX.png, where XX 
can increment or be set randomly until the ID is unique.  One way to check for 
uniqueness is to make the document URI from this ID, then test for an existing 
document.

   But this doesn't solve the general problem.  I could conceivably have 
multiple elements in the document that I want to be unique.  To check for 
unique element values it's necessary to run a cts query against the element(s). 
 And I'm not sure if you can completely close the race window between checking 
for an existing instance and inserting a new one if the query comes back empty.

   Someone from ML pointed out privately that checking for uniqueness in the 
index would require cross-cluster communication.  I'm sure that's true, but I'm 
also pretty sure that any user-level code solution is going to be far less 
efficient.  I'd be happy to pay that ingestion time penalty for the guarantee 
that indexed element values are unique.  At query time, such a unique value 
index should perform like any other range index.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

On Jun 4, 2014, at 6:59 PM, Whitby, Rob rob.whi...@springer.com wrote:

 How about something like this?
 
 declare function unique-uri() {
  let $uri := "/doc/" || xdmp:random() || ".xml"
  return if (fn:not(fn:doc-available($uri))) then $uri else unique-uri()
 };
 
 I guess because indexes are distributed across forests, ensuring uniqueness 
 is not that easy?
 
 Rob
 
 From: general-boun...@developer.marklogic.com 
 [general-boun...@developer.marklogic.com] on behalf of Ron Hitchens 
 [r...@ronsoft.com]
 Sent: 04 June 2014 18:01
 To: MarkLogic Developer Discussion
 Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range  
   Indexes
 
   I'm working on a project, one aspect of which requires minting unique IDs 
 and assuring that no two documents with the same ID wind up in the database.  
 I know how to accomplish this using locks (I'm pretty sure) but any such 
 implementation is awkward and prone to subtle edge case errors, and can be 
 difficult to test.
 
   It seems to me that this is something that MarkLogic could do much more 
 reliably and quickly than any user-level code.  The thought that occurred to 
 me is a variation on range indexes which only allow a single instance of any 
 given value.
 
   Conventional range indexes work by creating term lists that look like this 
 (see Jason Hunter's ML Architecture paper), where each term list contains an 
 element (or attribute) value and a list of fragment IDs where that term 
 exists.
 
 aardvark | 23, 135, 469, 611
 ant      | 23, 469, 558, 611, 750
 baboon   | 53, 97, 469, 621
 etc...
 
   By making a range index like this but which only allows a single fragment 
 ID in the list, that would ensure that no two documents in the database 
 contain a given element with the same value.  That is, attempting to add a 
 second document with the same element or attribute value would cause an 
 exception.  And being a range index, it would provide a fast lexicon of all 
 the current unique values in the DB.
 
   Such an index would look something like this:
 
 abc3vk34 | 17
 bkx46lkd | 52
 bz1d34nm | 37
 etc...
 
   Usage could be something like this:
 
 declare function create-new-id-doc ($id-root as xs:string) as xs:string
 {
     try {
         let $id := $id-root || "-" || mylib:random-string(8)
         let $uri := "/idregistry/id-" || $id
         let $_ :=
             xdmp:document-insert ($uri,
                 <registered-id>
                     <id>{ $id }</id>
                     <created>{ fn:current-dateTime() }</created>
                 </registered-id>)
         return $id
     } catch ($e) {
         (: with the proposed index, a duplicate value would raise an
            exception; retry with a fresh random suffix :)
         create-new-id-doc ($id-root)
     }
 };
 
   This doesn't require that I write any (possibly buggy) mutual exclusion 
 code and I can be confident that once the xdmp:document-insert succeeds that 
 the ID is unique in the database and that the type (as configured for the 
 range index) is correct.
 
   Any love for Unique Value Range Indexes in the next version of MarkLogic?
 
 ---
 Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212
 
 ___
 General mailing list
 General@developer.marklogic.com

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread John Snelson

Maybe you could consider using sem:uuid() in MarkLogic 7? You are much 
better off with a statistically unique ID than actually taking the time 
and massive concurrency reduction to check uniqueness.
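
A minimal sketch of that suggestion, assuming MarkLogic 7 or later, where sem:uuid-string() is available as a built-in. The /idregistry/ URI prefix and the registered-id element are borrowed from the original post for illustration; no uniqueness check is performed here, the code relies on the UUID being statistically unique.

```xquery
xquery version "1.0-ml";

(: Mint an ID from a random UUID and store a registry document for it. :)
let $id  := sem:uuid-string()
let $uri := "/idregistry/id-" || $id || ".xml"
return (
  xdmp:document-insert($uri,
    <registered-id>
      <id>{ $id }</id>
      <created>{ fn:current-dateTime() }</created>
    </registered-id>),
  $id
)
```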

John

-- 
John Snelson, Lead Engineer  http://twitter.com/jpcs
MarkLogic Corporation http://www.marklogic.com

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread John Snelson

On 04/06/2014 19:31, Ron Hitchens wrote:
 In my case, the naming space is actually quite small because I want the 
 IDs to be meaningful but unique.  For example images:cats:fluffy:XX.png, 
 where XX can increment or be set randomly until the ID is unique.

Make XX a random number. Or two or more random numbers - until the 
statistical likelihood of a collision is small enough that you don't 
care about checking uniqueness anymore.
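
To judge when the likelihood is "small enough", the usual birthday-bound approximation can be used: with n IDs drawn at random from N possible values, the chance of at least one collision is roughly 1 - exp(-n(n-1)/(2N)). The sketch below is only an estimate under the assumption of uniformly random IDs; local:collision-probability is a hypothetical helper name.

```xquery
xquery version "1.0-ml";

(: Birthday-bound approximation: p ~ 1 - exp(-n(n-1) / (2N)) :)
declare function local:collision-probability(
  $n as xs:double,      (: number of IDs minted :)
  $space as xs:double   (: number of possible ID values, N :)
) as xs:double
{
  1 - math:exp(-($n * ($n - 1)) div (2 * $space))
};

(
  (: a million IDs drawn from a 64-bit random space :)
  local:collision-probability(1e6, math:pow(2, 64)),
  (: ten IDs drawn from a two-digit suffix (100 values) :)
  local:collision-probability(10, 100)
)
```

The first estimate is vanishingly small, while the second comes out at roughly one in three, which is why a two-character suffix on its own cannot do without a uniqueness check.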

John

-- 
John Snelson, Lead Engineer  http://twitter.com/jpcs
MarkLogic Corporation http://www.marklogic.com

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Whitby, Rob

I thought 2 simultaneous transactions would both get read locks on the uri, 
then one would get a write lock and the other would fail and retry. Maybe I'm 
missing something though.

But anyway, I agree unique indexes would be a handy feature. e.g. our docs have 
a DOI element which *should* be unique but occasionally aren't, would be nice 
to enforce that rather than have to code defensively.
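
Until something enforces uniqueness, existing duplicates can at least be found after the fact. The sketch below assumes a string element range index already exists on a doi element in no namespace (an assumption; the real element name, namespace, and collation may differ) and lists every value that occurs in more than one fragment.

```xquery
xquery version "1.0-ml";

(: Assumes an element range index of type string on <doi>. :)
for $value in cts:element-values(xs:QName("doi"), (), ("fragment-frequency"))
where cts:frequency($value) gt 1
return $value || " appears in " || cts:frequency($value) || " fragments"
```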

Rob

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Wayne Feick

The simplest is to have the document URI correspond to the element 
value, and if you can use a random value it's good for concurrency.


If you can't do that, but you want to ensure only one document can have 
a particular value for an element, I think it's pretty easy using 
xdmp:lock-for-update() on an URI that corresponds to the element value. 
You don't actually need to create a document at that URI, just use it to 
serialize transactions. Here's one way to do it.


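   declare function lock-element-value($qn as xs:QName, $v as item())
   {
     (: Lock a synthetic URI derived from the element name and its value.
        No document needs to exist at this URI. :)
     xdmp:lock-for-update(
       "http://acme.com/"
       || xdmp:hash64(fn:namespace-uri-from-QName($qn))
       || "/"
       || xdmp:hash64(fn:local-name-from-QName($qn))
       || "/"
       (: $v is included on the assumption that its omission from the
          posted code was an oversight :)
       || xdmp:hash64(fn:string($v)))
   };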

You'd then do something like the following.

   let $lock := lock-element-value($qn, $v)
   let $existing := cts:search(fn:collection(),
     cts:element-range-query($qn, "=", $v), "unfiltered")
   return
     if (fn:exists($existing))
     then ... do whatever you need to do with the existing document
     else ... create a new document, safe from a race with another transaction

You'd want to use lock-element-value() in any updates that could affect 
a change in the element value (insert, update, delete). I think you 
could get away with ignoring deletes since those would automatically 
serialize with any transaction that would modify the existing document.


We use this sort of pattern internally to ensure uniqueness of IDs.
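
Put together, the pattern above might look like the following sketch. It assumes an element range index of type string on the element in question; local:insert-if-value-unused, the example URI, and the article/doi elements are hypothetical names, and the lock URI simply follows the http://acme.com/ scheme from the snippet above with the value hash appended.

```xquery
xquery version "1.0-ml";

(: Hypothetical wrapper: insert $doc only if no document already holds
   value $v in element $qn. Returns true if the insert happened. :)
declare function local:insert-if-value-unused(
  $uri as xs:string,
  $doc as element(),
  $qn as xs:QName,
  $v as xs:string
) as xs:boolean
{
  (: Serialize all writers of this (element, value) pair on a synthetic
     URI. No document is ever created at that URI. :)
  let $lock := xdmp:lock-for-update(
    "http://acme.com/"
    || xdmp:hash64(fn:namespace-uri-from-QName($qn))
    || "/"
    || xdmp:hash64(fn:local-name-from-QName($qn))
    || "/"
    || xdmp:hash64($v))
  let $existing := cts:search(fn:collection(),
    cts:element-range-query($qn, "=", $v), "unfiltered")
  return
    if (fn:exists($existing))
    then fn:false()
    else (xdmp:document-insert($uri, $doc), fn:true())
};

local:insert-if-value-unused(
  "/articles/example.xml",
  <article><doi>10.1000/example</doi></article>,
  xs:QName("doi"),
  "10.1000/example")
```

As noted in the thread, this only helps if every code path that writes the element goes through the same lock; code that skips it can still create duplicates.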

Wayne.


Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Analyze That | Johan van den Brink

Hi guys,

How can I unsubscribe from this mailing list?


Kind regards,


Johan van den Brink
Consultant
Analyze That -  Analytics | Data Integration | Reporting | Process Mining
Kerkewijk 8
3901 EG Veenendaal
T: (06) 49 92 30 30
T: (0318) 52 55 87
M: jo...@analyzethat.nl
W: www.analyzethat.nl
L: http://nl.linkedin.com/in/brinkjohanvanden

-Original Message-
From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Whitby, Rob
Sent: woensdag 4 juni 2014 19:59
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value Range 
Indexes

How about something like this?

declare function unique-uri() {
  let $uri := /doc/ || xdmp:random() || .xml
  return if (fn:not(fn:doc-available($uri))) then $uri else unique-uri() };

I guess because indexes are distributed across forests, ensuring uniqueness is 
not that easy?

Rob

From: general-boun...@developer.marklogic.com 
[general-boun...@developer.marklogic.com] on behalf of Ron Hitchens 
[r...@ronsoft.com]
Sent: 04 June 2014 18:01
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range
Indexes

   I'm working on a project, one aspect of which requires minting unique IDs 
and assuring that no two documents with the same ID wind up in the database.  I 
know how to accomplish this using locks (I'm pretty sure) but any such 
implementation is awkward and prone to subtle edge case errors, and can be 
difficult to test.

   It seems to me that this is something that MarkLogic could do much more 
reliably and quickly than any user-level code.  The thought that occurred to me 
is a variation on range indexes which only allow a single instance of any given 
value.

   Conventional range indexes work by creating term lists that look like this 
(see Jason Hunter's ML Architecture paper), where each term list contains an 
element (or attribute) value and a list of fragment IDs where that term exists.

aardvark | 23, 135, 469, 611
ant  | 23, 469, 558, 611, 750
baboon   | 53, 97, 469, 621
etc...

   By making a range index like this but which only allows a single fragment ID 
in the list, that would ensure that no two documents in the database contain a 
given element with the same value.  That is, attempting to add a second 
document with the same element or attribute value would cause an exception.  
And being a range index, it would provide a fast lexicon of all the current 
unique values in the DB.

   Such an index would look something like this:

abc3vk34 | 17
bkx46lkd | 52
bz1d34nm | 37
etc...

   Usage could be something like this:

declare function create-new-id-doc ($id-root as xs:string) as xs:string {
try {
let $id := $id-root || - || mylib:random-string(8)
let $uri := /idregistry/id- || $id
let $_ :=
xdmp:document-insert ($uri,
registered-id
id{ $id }/id
created{ fn:current-dateTime() }/created
/registered-id
 return $id
} catch (e) {
create-new-id-doc ($id-root)
}
};

   This doesn't require that I write any (possibly buggy) mutual exclusion code 
and I can be confident that once the xdmp:document-insert succeeds that the ID 
is unique in the database and that the type (as configured for the range index) 
is correct.

   Any love for Unique Value Range Indexes in the next version of MarkLogic?

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread David Ennis

Hi.

I believe you can do that here:

http://developer.marklogic.com/mailman/listinfo/general

Kind Regards,
David Ennis


On 4 June 2014 23:09, Analyze That | Johan van den Brink 
jo...@analyzethat.nl wrote:

 Hi guys,

 How can I unsubscribe from this mailing list?


 Kind regards,


 Johan van den Brink
 Consultant
 Analyze That -  Analytics | Data Integration | Reporting | Process Mining
 Kerkewijk 8
 3901 EG Veenendaal
 T: (06) 49 92 30 30
 T: (0318) 52 55 87
 M: jo...@analyzethat.nl
 W: www.analyzethat.nl
 L: http://nl.linkedin.com/in/brinkjohanvanden

 -Original Message-
 From: general-boun...@developer.marklogic.com [mailto:
 general-boun...@developer.marklogic.com] On Behalf Of Whitby, Rob
 Sent: Wednesday 4 June 2014 19:59
 To: MarkLogic Developer Discussion
 Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value
 Range Indexes

 How about something like this?

 declare function unique-uri() {
   let $uri := "/doc/" || xdmp:random() || ".xml"
   return if (fn:not(fn:doc-available($uri))) then $uri else unique-uri()
 };

 I guess because indexes are distributed across forests, ensuring
 uniqueness is not that easy?

 Rob
 
 From: general-boun...@developer.marklogic.com [
 general-boun...@developer.marklogic.com] on behalf of Ron Hitchens [
 r...@ronsoft.com]
 Sent: 04 June 2014 18:01
 To: MarkLogic Developer Discussion
 Subject: [MarkLogic Dev General] New Feature Request: Unique Value Range
  Indexes


Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Analyze That | Johan van den Brink

Thanks David.


Kind regards,


Johan van den Brink
Consultant
Analyze That -  Analytics | Data Integration | Reporting | Process Mining
Kerkewijk 8
3901 EG Veenendaal
T: (06) 49 92 30 30
T: (0318) 52 55 87
M: jo...@analyzethat.nl
W: www.analyzethat.nl
L: http://nl.linkedin.com/in/brinkjohanvanden


Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Ron Hitchens


Wayne,

   Thanks for this.  It's a useful code pattern for this sort of thing and I 
will probably use it for the specific requirement I have at the moment (I was 
planning to do something similar anyway).

   But this code, or any user-level code, does not fully implement the 
uniqueness guarantee I'd like to have and that I think a specialized range 
index could easily provide.  This will work, but as you say it would be 
necessary to always use this code convention.  It would not prevent creation of 
duplicate values by code that doesn't follow the convention.  If uniqueness 
were enforced by the index, then I could be confident that uniqueness is 
absolutely guaranteed and I don't need to trust anyone (including my future 
self) to always follow the same locking protocol.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

On Jun 4, 2014, at 9:19 PM, Wayne Feick wayne.fe...@marklogic.com wrote:

 The simplest is to have the document URI correspond to the element value, and 
 if you can use a random value it's good for concurrency.
 
 If you can't do that, but you want to ensure only one document can have a 
 particular value for an element, I think it's pretty easy using 
 xdmp:lock-for-update() on a URI that corresponds to the element value. You 
 don't actually need to create a document at that URI, just use it to 
 serialize transactions. Here's one way to do it.
 
 declare function lock-element-value($qn as xs:QName, $v as item())
 {
   xdmp:lock-for-update(
     "http://acme.com/"
     || xdmp:hash64(fn:namespace-uri-from-QName($qn))
     || "/"
     || xdmp:hash64(fn:local-name-from-QName($qn)))
 };
 
 You'd then do something like the following.
 
 let $lock := lock-element-value($qn, $v)
 let $existing := cts:search(fn:collection(),
   cts:element-range-query($qn, "=", $v), "unfiltered")
 return
   if (fn:exists($existing))
   then ... do whatever you need to do with the existing document
   else ... create a new document, safe from a race with another transaction
 
 You'd want to use lock-element-value() in any updates that could affect a 
 change in the element value (insert, update, delete). I think you could get 
 away with ignoring deletes since those would automatically serialize with any 
 transaction that would modify the existing document.
 
 We use this sort of pattern internally to ensure uniqueness of IDs.
 
 Wayne.
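
The locking pattern described above can be sketched end to end. This is a minimal sketch under stated assumptions, not the code MarkLogic uses internally: the local: function names, the lock URI root, and the sample document are hypothetical; folding the value into the lock URI is an assumption about the intent of "a URI that corresponds to the element value"; and a string range index on the element (here an unqualified id element) is assumed to exist with the default collation.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: serialize transactions on a lock URI derived from
   the element name and the value. No document is created at this URI. :)
declare function local:lock-element-value($qn as xs:QName, $v as xs:string)
  as empty-sequence()
{
  xdmp:lock-for-update(
    "http://acme.com/locks/"
    || xdmp:hash64(fn:string(fn:namespace-uri-from-QName($qn)))
    || "/" || xdmp:hash64(fn:local-name-from-QName($qn))
    || "/" || xdmp:hash64($v))
};

(: Insert $doc at $uri only if no document already holds $v in element $qn.
   Returns true() if the insert was issued, false() if the value is taken. :)
declare function local:insert-if-unique(
  $uri as xs:string, $doc as node(), $qn as xs:QName, $v as xs:string
) as xs:boolean
{
  let $_ := local:lock-element-value($qn, $v)
  let $existing :=
    cts:search(fn:collection(),
      cts:element-range-query($qn, "=", $v), "unfiltered")
  return
    if (fn:exists($existing))
    then fn:false()
    else (xdmp:document-insert($uri, $doc), fn:true())
};

(: Illustrative call: the URI and ID value are made up. :)
local:insert-if-unique(
  "/idregistry/id-fluffy-01.xml",
  <registered-id><id>fluffy-01</id></registered-id>,
  xs:QName("id"),
  "fluffy-01")
```

Because the lock, the search, and the insert all sit in one update statement, two requests racing on the same value are serialized by the lock. As the thread notes, this protects only code paths that go through the helper.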
 
 
 On 06/04/2014 12:49 PM, Whitby, Rob wrote:
 I thought 2 simultaneous transactions would both get read locks on the uri, 
 then one would get a write lock and the other would fail and retry. Maybe 
 I'm missing something though.
 
 But anyway, I agree unique indexes would be a handy feature. e.g. our docs 
 have a DOI element which *should* be unique but occasionally aren't, would 
 be nice to enforce that rather than have to code defensively.
 
 Rob
 
 From: general-boun...@developer.marklogic.com 
 [general-boun...@developer.marklogic.com] on behalf of Ron Hitchens 
 [r...@ronsoft.com]
 Sent: 04 June 2014 19:31
 To: MarkLogic Developer Discussion
 Subject: Re: [MarkLogic Dev General] New Feature Request: Unique Value Range 
Indexes
 
 Rob,
 
I believe there is a race condition here.  A document may not exist as of 
 the timestamp when this request starts running, but some other request could 
 create one while it's running.  This request would then over-write that 
 document.
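
One way to narrow this race for the URI case is to do the check and the insert in the same update transaction and to lock the candidate URI before testing it. The following is a minimal sketch under those assumptions; the local: prefix, the /doc/ URI pattern, and the <doc/> payload are illustrative only.

```xquery
xquery version "1.0-ml";

(: Pick a random candidate URI, lock it for update, then check that no
   document exists there. Retries with a new candidate on collision. :)
declare function local:unique-uri() as xs:string
{
  let $uri := "/doc/" || xdmp:random() || ".xml"
  let $_ := xdmp:lock-for-update($uri)
  return
    if (fn:doc-available($uri)) then local:unique-uri() else $uri
};

(: The insert belongs in the same transaction as the check: a lock taken
   here does not carry over to a separately invoked or eval'ed update. :)
let $uri := local:unique-uri()
return (xdmp:document-insert($uri, <doc/>), $uri)
```

This covers only uniqueness of document URIs. It does nothing for element values inside documents, which is the more general problem raised in the rest of this message.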
 
I'm actually more concerned about element values inside documents than 
 generating unique document URIs.  It's easy to generate document URIs with 
 64-bit random numbers that are very unlikely to collide.  But I want to 
 guarantee that some meaningful value inside a document is unique across all 
 documents.
 
In my case, the naming space is actually quite small because I want the 
 IDs to be meaningful but unique.  For example images:cats:fluffy:XX.png, 
 where XX can increment or be set randomly until the ID is unique.  One way 
 to check for uniqueness is to make the document URI from this ID, then test 
 for an existing document.
 
But this doesn't solve the general problem.  I could conceivably have 
 multiple elements in the document that I want to be unique.  To check for 
 unique element values it's necessary to run a cts query against the 
 element(s).  And I'm not sure if you can completely close the race window 
 between checking for an existing instance and inserting a new one if the 
 query comes back empty.
 
Someone from ML pointed out privately that checking for uniqueness in the 
 index would require cross-cluster communication.  I'm sure that's true, but 
 I'm also pretty sure that any user-level code solution is going to be far 
 less efficient.  I'd be happy to pay that ingestion time penalty for the 
 guarantee that indexed element values are unique.  At query time, such a 
 unique value index should perform like any other range index.
 
 ---
 Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212
 
 On Jun 4, 2014

Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Wayne Feick

Fair points, Ron. We have RFE 2322 filed back in Feb 2012 to track this. 
I'll add a note indicating your interest as well.


Wayne.



Re: [MarkLogic Dev General] New Feature Request: Unique Value Range Indexes

2014-06-04 Thread Ron Hitchens


   Thanks Wayne.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

