Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-24 Thread horschi
Oh yes it is, like Counters :-)


On Sat, Dec 24, 2016 at 4:02 AM, Edward Capriolo 
wrote:

> Anecdotally, CAS works differently than the typical Cassandra workload.
> If you run a stress instance against 3 nodes on one host, you find that
> you typically run into CPU issues, but if you are doing a CAS workload
> you see things timing out before you hit 100% CPU. It is a strange beast.

Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread Edward Capriolo
Anecdotally, CAS works differently than the typical Cassandra workload. If
you run a stress instance against 3 nodes on one host, you find that you
typically run into CPU issues, but if you are doing a CAS workload you see
things timing out before you hit 100% CPU. It is a strange beast.


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread horschi
Update: I replaced all quorum reads on that table with serial reads, and now
these errors occur much less often. Somehow quorum reads on CAS values cause
most of these WTEs (WriteTimeoutExceptions).

Also I found two tickets on that topic:
https://issues.apache.org/jira/browse/CASSANDRA-9328
https://issues.apache.org/jira/browse/CASSANDRA-8672
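
For anyone wanting to try the same change, here is a minimal sketch against
the DataStax Java driver 2.x API (keyspace, table and key names are
assumptions taken from the log excerpts below):

  import com.datastax.driver.core.*;

  public class SerialReadExample {
      public static void main(String[] args) {
          Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
          Session session = cluster.connect();

          Statement read = new SimpleStatement(
                  "SELECT value FROM \"MDS\".\"Lock\" WHERE lockname = 'locktest_1'");

          // Before: a QUORUM read, which bypasses Paxos and can observe a
          // proposal that only a minority of replicas has accepted so far.
          // read.setConsistencyLevel(ConsistencyLevel.QUORUM);

          // After: a SERIAL read, which runs the Paxos read path and
          // completes any in-progress round before returning a result.
          read.setConsistencyLevel(ConsistencyLevel.SERIAL);
          session.execute(read);

          cluster.close();
      }
  }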



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-15 Thread horschi
Hi,

I would like to warm up this old thread. I did some debugging and found out
that the timeouts are coming from StorageProxy.proposePaxos():
callback.isFullyRefused() returns false and therefore triggers a
WriteTimeout.

Looking at my ccm cluster logs, I can see that two replica nodes return
different results in their ProposeVerbHandler. In my opinion the
coordinator should not throw an exception in such a case, but instead retry
the operation.
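
To make the failure mode concrete, here is a minimal model of the tally as I
understand it (names are assumed; this is not the actual
StorageProxy/ProposeCallback source):

  // Simplified model of a propose round; illustration only.
  final class ProposeTally {
      final int targets;          // replicas the proposal was sent to
      final int requiredAccepts;  // a quorum of those replicas
      int accepts, refusals;

      ProposeTally(int targets, int requiredAccepts) {
          this.targets = targets;
          this.requiredAccepts = requiredAccepts;
      }

      boolean isSuccessful()   { return accepts >= requiredAccepts; }
      boolean isFullyRefused() { return refusals == targets; }
  }

  // Coordinator-side outcome:
  //   isSuccessful()   -> commit the proposal
  //   isFullyRefused() -> the round was cleanly lost; Paxos is restarted
  //   neither (mixed)  -> currently surfaces as a WriteTimeout, which is
  //                       the case that could be retried instead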

What do the CAS/Paxos experts on this list say about this? Feel free to
instruct me to do further tests/code changes. I'd be glad to help.

Log:

node1/logs/system.log:WARN  [SharedPool-Worker-5] 2016-12-15 14:48:36,896
PaxosState.java:124 - Rejecting proposal for
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node1/logs/system.log-Row: id=@ | value=) because inProgress
is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
key=locktest_ 1 columns=[[] | [value]]
--
node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15 14:48:36,980
StorageProxy.java:506 - proposePaxos:
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node1/logs/system.log-Row: id=@ | value=)//1//0
--
node2/logs/system.log:WARN  [SharedPool-Worker-7] 2016-12-15 14:48:36,969
PaxosState.java:117 - Accepting proposal:
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node2/logs/system.log-Row: id=@ | value=)
--
node3/logs/system.log:WARN  [SharedPool-Worker-2] 2016-12-15 14:48:36,897
PaxosState.java:124 - Rejecting proposal for
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node3/logs/system.log-Row: id=@ | value=) because inProgress
is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
key=locktest_ 1 columns=[[] | [value]]


kind regards,
Christian




Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread Denise Rogers
My thinking was that, due to the size of the data, there may be I/O issues.
But it sounds more like you're competing for locks and hitting a deadlock
issue.

Regards,
Denise
Cell - (860)989-3431

Sent from my iPhone



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Denise,

in my case it's a small blob I am writing (should be around 100 bytes):

 CREATE TABLE "Lock" (
 lockname varchar,
 id varchar,
 value blob,
 PRIMARY KEY (lockname, id)
 ) WITH COMPACT STORAGE
 AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
'chunk_length_kb' : '8' };

Are you asking because large values are known to cause issues? Anything
special you have in mind?

kind regards,
Christian






Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread Denise Rogers
Also, what type of data were you reading/writing?

Regards,
Denise

Sent from my iPad



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Jan,

were you able to resolve your problem?

We are trying the same and also see a lot of WriteTimeouts:
WriteTimeoutException: Cassandra timeout during write query at consistency
SERIAL (2 replica were required but only 1 acknowledged the write)
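
For illustration, here is a minimal sketch of how such a timeout can be
handled client-side, assuming the Java driver 2.x API and the "Lock" schema
shown above (hypothetical names; a CAS timeout leaves the outcome unknown,
so the safe move is to re-check with a SERIAL read):

  import java.nio.ByteBuffer;
  import com.datastax.driver.core.*;
  import com.datastax.driver.core.exceptions.WriteTimeoutException;

  final class LockClient {
      // Returns true if this client holds the lock afterwards.
      static boolean tryAcquire(Session session, ByteBuffer myValue) {
          try {
              Row r = session.execute(new SimpleStatement(
                      "UPDATE \"Lock\" SET value = ? " +
                      "WHERE lockname = 'lock1' AND id = 'a' IF value = null",
                      myValue)).one();
              return r.getBool("[applied]");
          } catch (WriteTimeoutException e) {
              // The proposal may still commit after the timeout. A SERIAL
              // read settles any in-progress Paxos round before we decide
              // whether we actually got the lock.
              Statement check = new SimpleStatement(
                      "SELECT value FROM \"Lock\" " +
                      "WHERE lockname = 'lock1' AND id = 'a'");
              check.setConsistencyLevel(ConsistencyLevel.SERIAL);
              Row row = session.execute(check).one();
              return row != null && myValue.equals(row.getBytes("value"));
          }
      }
  }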

How many clients were competing for a lock in your case? In our case it's
only two :-(

cheers,
Christian


>


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2013-09-23 Thread Robert Coli
On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen  wrote:

> I am experimenting with C* 2.0 ( and today's java-driver 2.0 snapshot) for
> implementing distributed locks.
>

[ and I'm experiencing the problem described in the subject ... ]


> Any idea how to approach this problem?
>

1) Upgrade to 2.0.1 release.
2) Try to reproduce symptoms.
3) If able to, file a JIRA at
https://issues.apache.org/jira/secure/Dashboard.jspa including repro steps
4) Reply to this thread with the JIRA ticket URL

=Rob


All subsequent CAS requests time out after heavy use of new CAS feature

2013-09-16 Thread Jan Algermissen
Hi,

I am experimenting with C* 2.0 (and today's java-driver 2.0 snapshot) for
implementing distributed locks.

Basically, I have a table of 'states' I want to serialize access to:

  create table state ( id text, lock uuid, data text, primary key (id) );

(3 nodes, replication factor 3)

  insert into state (id) values ('foo');

I try to acquire the lock for state 'foo' like this:

  update state set lock = myUUID where id = 'foo' if lock = null;

and check whether I got it by comparing the lock against my supplied UUID:

  select lock from state where id = 'foo';

... do work on 'foo' state ...

release lock:

  update state set lock = null where id = 'foo' if lock = myUUID;
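
(Side note: the conditional update itself already reports whether it
applied, so the separate select can be skipped. A minimal java-driver
sketch, with hypothetical names:)

  import java.util.UUID;
  import com.datastax.driver.core.*;

  class AcquireExample {
      // The LWT result set carries an "[applied]" column that tells us
      // whether the condition held, without a separate SELECT.
      static boolean acquire(Session session, UUID myUUID) {
          Row r = session.execute(
                  "update state set lock = " + myUUID
                  + " where id = 'foo' if lock = null").one();
          return r.getBool("[applied]");
      }
  }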


This works pretty well and if I increase the number of clients competing for 
the lock I start seeing timeouts on the client side. Natural so far and the 
lock also remains in a consistent state (it works to work around the failing 
clients and the uncertainty whether they got the lock or not).

However, after pausing the clients for a while the timeouts do not disappear. 
Meaning that when I send a single request after everything calms down , I still 
get a timeout:

   Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: 
Cassandra timeout during write query at consistency SERIAL (-1 replica were 
required but only -1 acknowledged the write)

I do not see any reaction in the C* logs for these follow-up requests that 
still time out.

Any idea how to approach this problem?

Jan