[ https://issues.apache.org/jira/browse/CASSANDRA-11258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252011#comment-15252011 ]

Marcus Olsson commented on CASSANDRA-11258:
-------------------------------------------

bq. From the code it seems that when an LWT insert times out, the CasLockFactory 
assumes the lock was not acquired, but maybe the operation succeeded and there 
was a timeout, so we will not be able to re-acquire the lock before it expires. 
So we should perform a read at SERIAL level in this situation to make sure any 
previous in-progress operations are committed and we get the most recent value.
Good catch, I'll add that. 
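Roughly what I have in mind (sketched with the Java driver API just for 
illustration; the class, statement and column names are made up and the actual 
CasLockFactory code will look different):
{code}
import java.util.UUID;

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

// Hypothetical sketch of the acquisition path, not the actual patch code.
class CasLeaseAcquisition
{
    private final Session session;
    private final PreparedStatement competeStatement; // the LWT INSERT
    private final PreparedStatement getStatement;     // plain SELECT of the lease row
    private final UUID hostId;

    CasLeaseAcquisition(Session session, PreparedStatement competeStatement,
                        PreparedStatement getStatement, UUID hostId)
    {
        this.session = session;
        this.competeStatement = competeStatement;
        this.getStatement = getStatement;
        this.hostId = hostId;
    }

    boolean tryAcquire(String resource)
    {
        try
        {
            Row row = session.execute(competeStatement.bind(resource, hostId)).one();
            return row.getBool("[applied]");
        }
        catch (WriteTimeoutException e)
        {
            // The CAS write may still have been applied even though it timed
            // out. Reading at SERIAL commits any in-progress Paxos operation
            // and returns the most recent value, so we can tell whether we
            // actually hold the lease.
            Row row = session.execute(getStatement.bind(resource)
                                          .setConsistencyLevel(ConsistencyLevel.SERIAL))
                             .one();
            return row != null && hostId.equals(row.getUUID("host"));
        }
    }
}
{code}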

bq. Is the sufficientNodesForLocking check necessary?
It is mostly there to avoid attempting CAS operations that we know will fail; 
however, that check would be done further down in StorageProxy anyway, so it 
might be redundant.

bq. I noticed that we are doing non-LWT reads at ONE, but we should use QUORUM 
instead, and then that check will be done automatically when reading or writing.
I'll change that.
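For illustration, the read would essentially become (hypothetical names, same 
sketch style as above):
{code}
// Hypothetical: a plain (non-LWT) read of a lease row done at QUORUM instead
// of ONE, so the check for enough available replicas happens implicitly.
Row row = session.execute(getStatement.bind(resource)
                              .setConsistencyLevel(ConsistencyLevel.QUORUM))
                 .one();
{code}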

bq. I think we should adjust our nomenclature and mindset from distributed 
locks to expiring leases, since this is what we are doing rather than 
distributed locking. If you agree, can you rename classes to reflect that?
I agree, "leases" seems to be a more reasonable term for it.

{quote}
When renewing the lease we should also insert the current lease holder priority 
into the resource_lock_priority table, otherwise other nodes might try to 
acquire the lease while it's being held (the operation will fail, but the load 
on the system will be higher due to LWT).

We should also probably let lease holders renew leases explicitly rather than 
auto-renewing leases at the lease service, so for example the job scheduler can 
abort the job if it cannot renew the lease. For that matter, we should probably 
extend the DistributedLease interface with methods to renew the lease and/or 
check if it's still valid (perhaps we should have a look at the JINI lease spec 
for inspiration, although it looks a bit verbose).
{quote}
I've taken a look at the JINI lease spec and I think there are some parts of it 
that we wouldn't need, for instance {{setSerialFormat()}} and {{canBatch()}}. 
But the interface could perhaps look like this instead:
{code}
interface Lease {
 long getExpiration();
 void renew(long duration) throws LeaseException;
 void cancel() throws LeaseException;
 boolean valid();
}

interface LeaseGrantor { // Or LeaseFactory
 Lease newLease(long duration, String resource, int priority,
                Map<String, String> metadata) throws LeaseException;
}
{code}
I think the {{LeaseMap}} (mentioned in the JINI lease spec) or a similar 
interface will be useful for locking multiple data centers. Maybe it's enough 
to create some kind of {{LeaseCollection}} that bundles the leases together and 
performs renew()/cancel() on all underlying leases?
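Something like this could work as a starting point (a rough, hypothetical 
sketch assuming the {{Lease}} interface above; none of this exists yet):
{code}
import java.util.List;

// Hypothetical LeaseCollection: bundles the per-data-center leases and
// performs renew()/cancel() on all of them. Error handling for partial
// renewals/cancels is left out of this sketch.
class LeaseCollection implements Lease
{
    private final List<Lease> leases;

    LeaseCollection(List<Lease> leases)
    {
        this.leases = leases;
    }

    @Override
    public long getExpiration()
    {
        // The bundle is only valid until the earliest underlying expiration.
        long expiration = Long.MAX_VALUE;
        for (Lease lease : leases)
            expiration = Math.min(expiration, lease.getExpiration());
        return expiration;
    }

    @Override
    public void renew(long duration) throws LeaseException
    {
        for (Lease lease : leases)
            lease.renew(duration);
    }

    @Override
    public void cancel() throws LeaseException
    {
        for (Lease lease : leases)
            lease.cancel();
    }

    @Override
    public boolean valid()
    {
        for (Lease lease : leases)
            if (!lease.valid())
                return false;
        return true;
    }
}
{code}
That way the caller (e.g. the job scheduler) wouldn't need to know how many 
data centers are involved.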

--
I'll also change the keyspace name to {{system_leases}} and the tables to 
{{resource_lease}} and {{resource_lease_priority}}.

> Repair scheduling - Resource locking API
> ----------------------------------------
>
>                 Key: CASSANDRA-11258
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11258
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>
> Create a resource locking API & implementation that is able to lock a 
> resource in a specified data center. It should handle priorities to avoid 
> node starvation.


