Hello devs

Since CASSANDRA-7423 (Cassandra 3.6), it is possible to declare a non-frozen
UDT at the 1st level and, more importantly, to atomically update individual
fields of a UDT (without having to rewrite the whole UDT).

This JIRA opens tremendous new opportunities in terms of data modeling. It
is now sensible to store hierarchical data (think JSON/document) using UDTs
and collections.

However, updating individual fields at level 2 and deeper is still not
supported (UDTs must be frozen from level 2 onward).

This limitation makes updating individual fields at deeper levels a
nightmare.

Let's take a contrived example:

CREATE TYPE phone_number (
    international_prefix text,
    local_prefix text,
    suffix text,
    type text // HOME, WORK, MOBILE, LANDLINE ...
);

CREATE TYPE contact (
    firstname text,
    lastname text,
    phone frozen<phone_number>,
    email text,
    ...
);

CREATE TABLE user_contacts(
   user_id uuid,
   contact_id uuid,
   contact_details contact,
   PRIMARY KEY ((user_id), contact_id)
);

Now, to update a single field of a contact's phone number, one has to:

- perform a SELECT contact_details.phone FROM user_contacts WHERE
user_id=.. AND contact_id = ...
- perform an UPDATE user_contacts SET contact_details.phone = ...
WHERE user_id=.. AND contact_id = ...

Not only is this update procedure bad (though mandatory) because it implies
the read-before-write anti-pattern, it also exposes the end-user to
horrendous concurrency issues.

If there are 2 concurrent updates on the same nested phone_number UDT, one
changing the type field and the other changing the suffix value, we will
face data loss:
  - either the 1st update wins (in terms of LWW) and the type is updated but
not the suffix
  - or the 2nd update wins and the suffix is updated but not the type
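
To make the lost update concrete, here is a small Python toy model of a single frozen cell under last-write-wins. It is only a sketch and not Cassandra code: the FrozenCell class, the sample phone values and the integer timestamps are made up for illustration.

```python
# Toy model of one frozen cell under last-write-wins (LWW); not Cassandra code.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhoneNumber:
    international_prefix: str
    local_prefix: str
    suffix: str
    type: str


class FrozenCell:
    """Holds one frozen value; the write with the highest timestamp wins."""

    def __init__(self, value, timestamp):
        self.value = value
        self.timestamp = timestamp

    def read(self):
        return self.value

    def write(self, value, timestamp):
        # LWW keeps the WHOLE value carrying the newest timestamp.
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp


# Hypothetical starting value.
cell = FrozenCell(PhoneNumber("+33", "6", "12345678", "HOME"), timestamp=1)

# Both clients read the same snapshot before either of them writes.
snapshot_a = cell.read()
snapshot_b = cell.read()

# Client A changes only the type, client B changes only the suffix,
# but each one has to write back the entire frozen value.
cell.write(replace(snapshot_a, type="MOBILE"), timestamp=2)
cell.write(replace(snapshot_b, suffix="87654321"), timestamp=3)

final = cell.read()
print(final)
```

In this run the 2nd write wins: the suffix is changed but the type silently goes back to HOME, because client B wrote the whole value from a stale snapshot.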

Ideally we would like both fields, type and suffix, to be updated. The only
fix currently available is to rely on LWT, using a version column for
optimistic concurrency locking:

UPDATE user_contacts SET contact_details.phone = ..., version=2
WHERE user_id=.. AND contact_id = ... IF version=1

This guarantees that only 1 concurrent update succeeds at a time and forces
the failing updates to re-fetch the fresh data and retry.
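
The retry loop that this implies on the client side could look roughly like the Python sketch below. Again this is a toy model, not driver code: VersionedRow, update_if_version and update_field are hypothetical names, and update_if_version merely stands in for the conditional UPDATE ... IF version = N.

```python
# Toy model of optimistic locking with a version column; not Cassandra code.
class VersionedRow:
    def __init__(self, phone):
        self.phone = dict(phone)
        self.version = 1

    def read(self):
        # Return a copy so that callers cannot mutate the stored value.
        return dict(self.phone), self.version

    def update_if_version(self, new_phone, expected_version):
        """Apply the write only if the version is unchanged (the IF condition)."""
        if self.version != expected_version:
            return False
        self.phone = dict(new_phone)
        self.version = expected_version + 1
        return True


def update_field(row, field, value, max_retries=5):
    """Read, modify and conditionally write, retrying on a version conflict."""
    for attempt in range(1, max_retries + 1):
        phone, version = row.read()
        phone[field] = value
        if row.update_if_version(phone, version):
            return attempt
    raise RuntimeError("too many conflicting updates")


# Hypothetical starting value.
row = VersionedRow({"suffix": "12345678", "type": "HOME"})

# Client B reads, then client A commits first, so B's version becomes stale.
stale_phone, stale_version = row.read()
update_field(row, "type", "MOBILE")
stale_phone["suffix"] = "87654321"
rejected = not row.update_if_version(stale_phone, stale_version)

# B retries from fresh data, and this time both changes survive.
attempts = update_field(row, "suffix", "87654321")
print(rejected, row.phone, row.version)
```

Both changes are kept here, but only because the stale write is rejected and the client goes through another read/write round trip.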

Of course, LWT has a huge cost and is overkill to solve such a problem.

Thus my questions are:

- is there any plan to extend UDT field updates to deeper levels?
- is it complicated to do so? I'm tempted to take a look and attempt a
patch, but I would like to know whether it is going to be a gigantic task or not

Thanks

Regards

Duy Hai DOAN
