Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-24 Thread horschi
Oh yes it is, like Counters :-)


On Sat, Dec 24, 2016 at 4:02 AM, Edward Capriolo 
wrote:

> Anecdotally, CAS works differently than the typical Cassandra workload.
> If you run a stress instance against 3 nodes on one host, you find that
> you typically run into CPU issues, but if you are doing a CAS workload
> you see things timing out before you hit 100% CPU. It is a strange beast.

Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread Edward Capriolo
Anecdotally, CAS works differently than the typical Cassandra workload. If
you run a stress instance against 3 nodes on one host, you find that you
typically run into CPU issues, but if you are doing a CAS workload you see
things timing out before you hit 100% CPU. It is a strange beast.


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-23 Thread horschi
Update: I replaced all quorum reads on that table with serial reads, and now
these errors occur much less often. Somehow quorum reads on CAS values cause
most of these WTEs (WriteTimeoutExceptions).

Also I found two tickets on that topic:
https://issues.apache.org/jira/browse/CASSANDRA-9328
https://issues.apache.org/jira/browse/CASSANDRA-8672
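
For anyone wanting to try the same change, here is a minimal sketch against
the DataStax Java driver 2.x API (keyspace, table and key names are
assumptions taken from the log excerpts below):

  import com.datastax.driver.core.*;

  public class SerialReadExample {
      public static void main(String[] args) {
          Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
          Session session = cluster.connect();

          Statement read = new SimpleStatement(
                  "SELECT value FROM \"MDS\".\"Lock\" WHERE lockname = 'locktest_1'");

          // Before: a QUORUM read, which bypasses Paxos and can observe a
          // proposal that only a minority of replicas has accepted so far.
          // read.setConsistencyLevel(ConsistencyLevel.QUORUM);

          // After: a SERIAL read, which runs the Paxos read path and
          // completes any in-progress round before returning a result.
          read.setConsistencyLevel(ConsistencyLevel.SERIAL);
          session.execute(read);

          cluster.close();
      }
  }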



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-12-15 Thread horschi
Hi,

I would like to warm up this old thread. I did some debugging and found out
that the timeouts are coming from StorageProxy.proposePaxos():
callback.isFullyRefused() returns false and therefore triggers a
WriteTimeout.

Looking at my ccm cluster logs, I can see that two replica nodes return
different results in their ProposeVerbHandler. In my opinion the
coordinator should not throw an exception in such a case, but instead retry
the operation.
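
To make the failure mode concrete, here is a minimal model of the tally as I
understand it (names are assumed; this is not the actual
StorageProxy/ProposeCallback source):

  // Simplified model of a propose round; illustration only.
  final class ProposeTally {
      final int targets;          // replicas the proposal was sent to
      final int requiredAccepts;  // a quorum of those replicas
      int accepts, refusals;

      ProposeTally(int targets, int requiredAccepts) {
          this.targets = targets;
          this.requiredAccepts = requiredAccepts;
      }

      boolean isSuccessful()   { return accepts >= requiredAccepts; }
      boolean isFullyRefused() { return refusals == targets; }
  }

  // Coordinator-side outcome:
  //   isSuccessful()   -> commit the proposal
  //   isFullyRefused() -> the round was cleanly lost; Paxos is restarted
  //   neither (mixed)  -> currently surfaces as a WriteTimeout, which is
  //                       the case that could be retried instead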

What do the CAS/Paxos experts on this list say about this? Feel free to
instruct me to do further tests/code changes. I'd be glad to help.

Log:

node1/logs/system.log:WARN  [SharedPool-Worker-5] 2016-12-15 14:48:36,896
PaxosState.java:124 - Rejecting proposal for
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node1/logs/system.log-Row: id=@ | value=) because inProgress
is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
key=locktest_ 1 columns=[[] | [value]]
--
node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15 14:48:36,980
StorageProxy.java:506 - proposePaxos:
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node1/logs/system.log-Row: id=@ | value=)//1//0
--
node2/logs/system.log:WARN  [SharedPool-Worker-7] 2016-12-15 14:48:36,969
PaxosState.java:117 - Accepting proposal:
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node2/logs/system.log-Row: id=@ | value=)
--
node3/logs/system.log:WARN  [SharedPool-Worker-2] 2016-12-15 14:48:36,897
PaxosState.java:124 - Rejecting proposal for
Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc, [MDS.Lock] key=locktest_ 1
columns=[[] | [value]]
node3/logs/system.log-Row: id=@ | value=) because inProgress
is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4, [MDS.Lock]
key=locktest_ 1 columns=[[] | [value]]


kind regards,
Christian




Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread Denise Rogers
My thinking was that, due to the size of the data, there may be I/O issues.
But it sounds more like you're competing for locks and hitting a deadlock
issue.

Regards,
Denise
Cell - (860)989-3431

Sent from my iPhone



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Denise,

in my case it's a small blob I am writing (should be around 100 bytes):

 CREATE TABLE "Lock" (
 lockname varchar,
 id varchar,
 value blob,
 PRIMARY KEY (lockname, id)
 ) WITH COMPACT STORAGE
 AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
'chunk_length_kb' : '8' };

Are you asking because large values are known to cause issues? Anything
special you have in mind?

kind regards,
Christian






Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread Denise Rogers
Also, what type of data were you reading/writing?

Regards,
Denise

Sent from my iPad



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Jan,

were you able to resolve your problem?

We are trying the same and also see a lot of WriteTimeouts:
WriteTimeoutException: Cassandra timeout during write query at consistency
SERIAL (2 replica were required but only 1 acknowledged the write)
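
For illustration, here is a minimal sketch of how such a timeout can be
handled client-side, assuming the Java driver 2.x API and the "Lock" schema
shown above (hypothetical names; a CAS timeout leaves the outcome unknown,
so the safe move is to re-check with a SERIAL read):

  import java.nio.ByteBuffer;
  import com.datastax.driver.core.*;
  import com.datastax.driver.core.exceptions.WriteTimeoutException;

  final class LockClient {
      // Returns true if this client holds the lock afterwards.
      static boolean tryAcquire(Session session, ByteBuffer myValue) {
          try {
              Row r = session.execute(new SimpleStatement(
                      "UPDATE \"Lock\" SET value = ? " +
                      "WHERE lockname = 'lock1' AND id = 'a' IF value = null",
                      myValue)).one();
              return r.getBool("[applied]");
          } catch (WriteTimeoutException e) {
              // The proposal may still commit after the timeout. A SERIAL
              // read settles any in-progress Paxos round before we decide
              // whether we actually got the lock.
              Statement check = new SimpleStatement(
                      "SELECT value FROM \"Lock\" " +
                      "WHERE lockname = 'lock1' AND id = 'a'");
              check.setConsistencyLevel(ConsistencyLevel.SERIAL);
              Row row = session.execute(check).one();
              return row != null && myValue.equals(row.getBytes("value"));
          }
      }
  }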

How many clients were competing for a lock in your case? In our case it's
only two :-(

cheers,
Christian


>


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2013-09-23 Thread Robert Coli
On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen  wrote:

> I am experimenting with C* 2.0 ( and today's java-driver 2.0 snapshot) for
> implementing distributed locks.
>

[ and I'm experiencing the problem described in the subject ... ]


> Any idea how to approach this problem?
>

1) Upgrade to 2.0.1 release.
2) Try to reproduce symptoms.
3) If able to, file a JIRA at
https://issues.apache.org/jira/secure/Dashboard.jspa including repro steps
4) Reply to this thread with the JIRA ticket URL

=Rob


All subsequent CAS requests time out after heavy use of new CAS feature

2013-09-16 Thread Jan Algermissen
Hi,

I am experimenting with C* 2.0 (and today's java-driver 2.0 snapshot) for
implementing distributed locks.

Basically, I have a table of 'states' I want to serialize access to:

  create table state ( id text, lock uuid, data text, primary key (id) );

(3 nodes, replication factor 3)

  insert into state (id) values ('foo');

I try to acquire the lock for state 'foo' like this:

  update state set lock = myUUID where id = 'foo' if lock = null;

and check whether I got it by comparing the lock against my supplied UUID:

  select lock from state where id = 'foo';

... do work on 'foo' state ...

release lock:

  update state set lock = null where id = 'foo' if lock = myUUID;
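
(Side note: the conditional update itself already reports whether it
applied, so the separate select can be skipped. A minimal java-driver
sketch, with hypothetical names:)

  import java.util.UUID;
  import com.datastax.driver.core.*;

  class AcquireExample {
      // The LWT result set carries an "[applied]" column that tells us
      // whether the condition held, without a separate SELECT.
      static boolean acquire(Session session, UUID myUUID) {
          Row r = session.execute(
                  "update state set lock = " + myUUID
                  + " where id = 'foo' if lock = null").one();
          return r.getBool("[applied]");
      }
  }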


This works pretty well and if I increase the number of clients competing for 
the lock I start seeing timeouts on the client side. Natural so far and the 
lock also remains in a consistent state (it works to work around the failing 
clients and the uncertainty whether they got the lock or not).

However, after pausing the clients for a while the timeouts do not disappear. 
Meaning that when I send a single request after everything calms down , I still 
get a timeout:

   Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: 
Cassandra timeout during write query at consistency SERIAL (-1 replica were 
required but only -1 acknowledged the write)

I do not see any reaction in the C* logs for these follow-up requests that 
still time out.

Any idea how to approach this problem?

Jan