[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-30 Thread Hiroyuki Yamada (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241218#comment-17241218
 ] 

Hiroyuki Yamada commented on CASSANDRA-12126:
-

> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.

Sorry, I'm not fully sure about the current implementation or how realistic my 
proposal is, but could we read all the replicas when doing the read recovery 
in step 3 to solve the issue?

Step 3 only reads A and B, but if we read C as well, we would know that the 
proposal was not accepted by a majority.
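
To make the idea concrete, here is a minimal sketch of the check this would 
enable, assuming the coordinator gets a prepare response from every replica; 
the types and names are invented for illustration and are not Cassandra's 
internals:

{code:java}
import java.util.List;

// Hypothetical types; none of these names come from the Cassandra codebase.
record PrepareResponse(Object inProgressProposal) {}

class CasReadRecovery {
    // Sketch of the idea above: ask all RF replicas during the prepare phase
    // of a serial read, then decide what to do with an in-flight proposal.
    static boolean chosenByMajority(List<PrepareResponse> allReplicaResponses, int rf) {
        int quorum = rf / 2 + 1;
        long acceptedCount = allReplicaResponses.stream()
                .filter(r -> r.inProgressProposal() != null)
                .count();
        // This only proves anything if every replica answered: then
        // acceptedCount < quorum shows the proposal was never accepted by a
        // majority, so it could be invalidated instead of re-proposed.
        return acceptedCount >= quorum;
    }
}
{code}

The obvious catch is availability: requiring responses from all replicas makes 
the recovery path fail whenever a single node is down, which is presumably why 
the read normally contacts only a quorum.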

 

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 4.0-beta4, 3.0.24, 3.11.10
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS reads. Here is how it can happen with RF=3.
> 1) You issue a CAS write and it fails in the propose phase. Machine A 
> replies true to the propose and saves the commit in its accepted field. The 
> other two machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in its paxos table as 
> accepted but not committed, while B and C do not. 
> 2) Issue a CAS read and it goes only to B and C. You won't be able to read 
> the value written in step 1. This read behaves as if nothing is in flight. 
> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the 
> value written in step 1. 
> 4) Issue a CAS write that involves only B and C. It will succeed and commit 
> a different value than step 1. The step 1 value will never be seen again and 
> was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 discusses 
> exactly this issue: how learners can find out whether a majority of the 
> acceptors have accepted a proposal. 
> In step 3 it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority 
> of acceptors and some of them, but not all, have something in flight, we 
> have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In that case we know that a majority 
> of acceptors have no in-flight commit, which means we know that nothing was 
> accepted by a majority. I think we should run a propose step here with an 
> empty commit, and that will cause the write from step 1 never to become 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or we will never see it, which is what we want.
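
A minimal sketch of the decision described in the last two paragraphs, 
assuming a coordinator that has collected promises from a quorum for the 
current ballot; all names are invented for illustration, not taken from 
Cassandra's code:

{code:java}
import java.util.List;

// Illustrative only; the types and names are invented, not Cassandra's.
record PromiseResponse(Object inProgressProposal) {}

class SerialRead {
    enum Action { COMPLETE_INFLIGHT, PROPOSE_EMPTY_COMMIT }

    // The coordinator holds promises from a quorum for the current ballot.
    static Action onQuorumPromised(List<PromiseResponse> quorumResponses) {
        boolean anyInflight = quorumResponses.stream()
                .anyMatch(r -> r.inProgressProposal() != null);
        if (anyInflight)
            // Step 3: we cannot tell whether a majority accepted it, so
            // re-propose and commit it under the current ballot.
            return Action.COMPLETE_INFLIGHT;
        // The fix for step 2: a quorum promised this ballot while reporting
        // nothing in flight, so no older proposal can ever have reached a
        // majority. Committing an empty update at this ballot pins that
        // down: the step-1 value can never surface on a later read.
        return Action.PROPOSE_EMPTY_COMMIT;
    }
}
{code}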






[jira] [Commented] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped

2019-05-23 Thread Hiroyuki Yamada (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846779#comment-16846779
 ] 

Hiroyuki Yamada commented on CASSANDRA-15138:
-

[~jmeredithco] Yes, that is correct. Sorry for not stating it.

> A cluster (RF=3) not recovering after two nodes are stopped
> ---
>
> Key: CASSANDRA-15138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Hiroyuki Yamada
>Priority: Normal
>
> I faced a weird issue when recovering a cluster after two nodes are stopped.
>  It is easily reproducible and looks like a bug or an issue to fix.
>  The following are the steps to reproduce it.
> === STEPS TO REPRODUCE ===
>  * Create a 3-node cluster with RF=3
>     - node1 (seed), node2, node3
>  * Start requests to the cluster with cassandra-stress (it continues
>  until the end)
>     - what we did: cassandra-stress mixed cl=QUORUM duration=10m
>  -errors ignore -node node1,node2,node3 -rate threads>=16
>  threads<=256
>  - (It doesn't have to be this many threads. It can be 1.)
>  * Stop node3 normally (with systemctl stop or kill (without -9))
>     - the system is still available, as expected, because a quorum of nodes
>  is still available
>  * Stop node2 normally (with systemctl stop or kill (without -9))
>     - the system is NOT available after it's stopped, as expected
>     - the client gets `UnavailableException: Not enough replicas
>  available for query at consistency QUORUM`
>     - the client gets the errors right away (within a few ms)
>     - so far it's all expected
>  * Wait for 1 minute
>  * Bring node2 back up
>     - {color:#FF0000}The issue happens here.{color}
>     - the client gets `ReadTimeoutException` or `WriteTimeoutException`
>  depending on whether the request is a read or a write, even after node2
>  is up
>     - the client gets the errors after about 5000 ms or 2000 ms, which are
>  the request timeouts for write and read requests respectively
>     - what node1 reports with `nodetool status` and what node2 reports
>  are not consistent (node2 thinks node1 is down)
>     - it takes a very long time to recover from this state
> === STEPS TO REPRODUCE ===
> Some additional important information to note:
>  * If we don't start cassandra-stress, the issue doesn't occur.
>  * Restarting node1 makes it recover its state right after the restart.
>  * Setting a lower value in dynamic_snitch_reset_interval_in_ms (to 6
>  or something) fixes the issue.
>  * If we `kill -9` the nodes, the issue doesn't occur.
>  * Hints seem unrelated. I tested with hints disabled and it didn't make any 
> difference.
>  






[jira] [Updated] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped

2019-05-22 Thread Hiroyuki Yamada (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-15138:

Description: 
I faced a weird issue when recovering a cluster after two nodes are stopped.
 It is easily reproducible and looks like a bug or an issue to fix.
 The following are the steps to reproduce it.

=== STEPS TO REPRODUCE ===
 * Create a 3-node cluster with RF=3
    - node1 (seed), node2, node3
 * Start requests to the cluster with cassandra-stress (it continues
 until the end)
    - what we did: cassandra-stress mixed cl=QUORUM duration=10m
 -errors ignore -node node1,node2,node3 -rate threads>=16
 threads<=256

 - (It doesn't have to be this many threads. It can be 1.)

 * Stop node3 normally (with systemctl stop or kill (without -9))
    - the system is still available, as expected, because a quorum of nodes
 is still available
 * Stop node2 normally (with systemctl stop or kill (without -9))
    - the system is NOT available after it's stopped, as expected
    - the client gets `UnavailableException: Not enough replicas
 available for query at consistency QUORUM`
    - the client gets the errors right away (within a few ms)
    - so far it's all expected
 * Wait for 1 minute
 * Bring node2 back up
    - {color:#FF0000}The issue happens here.{color}
    - the client gets `ReadTimeoutException` or `WriteTimeoutException`
 depending on whether the request is a read or a write, even after node2
 is up
    - the client gets the errors after about 5000 ms or 2000 ms, which are
 the request timeouts for write and read requests respectively
    - what node1 reports with `nodetool status` and what node2 reports
 are not consistent (node2 thinks node1 is down)
    - it takes a very long time to recover from this state

=== STEPS TO REPRODUCE ===

Some additional important information to note:
 * If we don't start cassandra-stress, the issue doesn't occur.
 * Restarting node1 makes it recover its state right after the restart.
 * Setting a lower value in dynamic_snitch_reset_interval_in_ms (to 6
 or something) fixes the issue.
 * If we `kill -9` the nodes, the issue doesn't occur.
 * Hints seem unrelated. I tested with hints disabled and it didn't make any 
difference.

 

  was:
I faced a weird issue when recovering a cluster after two nodes are stopped.
 It is easily reproducible and looks like a bug or an issue to fix.
 The following are the steps to reproduce it.

=== STEPS TO REPRODUCE ===
 * Create a 3-node cluster with RF=3
    - node1 (seed), node2, node3
 * Start requests to the cluster with cassandra-stress (it continues
 until the end)
    - what we did: cassandra-stress mixed cl=QUORUM duration=10m
 -errors ignore -node node1,node2,node3 -rate threads>=16
 threads<=256

 - (It doesn't have to be this many threads. It can be 1.)

 * Stop node3 normally (with systemctl stop or kill (not without -9))
    - the system is still available, as expected, because a quorum of nodes
 is still available
 * Stop node2 normally (with systemctl stop or kill (not without -9))
    - the system is NOT available after it's stopped, as expected
    - the client gets `UnavailableException: Not enough replicas
 available for query at consistency QUORUM`
    - the client gets the errors right away (within a few ms)
    - so far it's all expected
 * Wait for 1 minute
 * Bring node2 back up
    - {color:#FF0000}The issue happens here.{color}
    - the client gets `ReadTimeoutException` or `WriteTimeoutException`
 depending on whether the request is a read or a write, even after node2
 is up
    - the client gets the errors after about 5000 ms or 2000 ms, which are
 the request timeouts for write and read requests respectively
    - what node1 reports with `nodetool status` and what node2 reports
 are not consistent (node2 thinks node1 is down)
    - it takes a very long time to recover from this state

=== STEPS TO REPRODUCE ===

Some additional important information to note:
 * If we don't start cassandra-stress, the issue doesn't occur.
 * Restarting node1 makes it recover its state right after the restart.
 * Setting a lower value in dynamic_snitch_reset_interval_in_ms (to 6
 or something) fixes the issue.
 * If we `kill -9` the nodes, the issue doesn't occur.
 * Hints seem unrelated. I tested with hints disabled and it didn't make any 
difference.

 


> A cluster (RF=3) not recovering after two nodes are stopped
> ---
>
> Key: CASSANDRA-15138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Hiroyuki Yamada
>Priority: Normal
>
> I faced a weird issue when recovering a cluster after two nodes are stopped.
>  It is easily reproducible and looks like a bug or an issue to fix.
>  The following are the steps to reproduce it.
> === STEPS TO REPRODUCE ===

[jira] [Updated] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped

2019-05-22 Thread Hiroyuki Yamada (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-15138:

Discovered By: User Report

> A cluster (RF=3) not recovering after two nodes are stopped
> ---
>
> Key: CASSANDRA-15138
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: Hiroyuki Yamada
>Priority: Normal
>
> I faced a weird issue when recovering a cluster after two nodes are stopped.
>  It is easily reproducible and looks like a bug or an issue to fix.
>  The following are the steps to reproduce it.
> === STEPS TO REPRODUCE ===
>  * Create a 3-node cluster with RF=3
>     - node1 (seed), node2, node3
>  * Start requests to the cluster with cassandra-stress (it continues
>  until the end)
>     - what we did: cassandra-stress mixed cl=QUORUM duration=10m
>  -errors ignore -node node1,node2,node3 -rate threads>=16
>  threads<=256
>  - (It doesn't have to be this many threads. It can be 1.)
>  * Stop node3 normally (with systemctl stop or kill (without -9))
>     - the system is still available, as expected, because a quorum of nodes
>  is still available
>  * Stop node2 normally (with systemctl stop or kill (without -9))
>     - the system is NOT available after it's stopped, as expected
>     - the client gets `UnavailableException: Not enough replicas
>  available for query at consistency QUORUM`
>     - the client gets the errors right away (within a few ms)
>     - so far it's all expected
>  * Wait for 1 minute
>  * Bring node2 back up
>     - {color:#FF0000}The issue happens here.{color}
>     - the client gets `ReadTimeoutException` or `WriteTimeoutException`
>  depending on whether the request is a read or a write, even after node2
>  is up
>     - the client gets the errors after about 5000 ms or 2000 ms, which are
>  the request timeouts for write and read requests respectively
>     - what node1 reports with `nodetool status` and what node2 reports
>  are not consistent (node2 thinks node1 is down)
>     - it takes a very long time to recover from this state
> === STEPS TO REPRODUCE ===
> Some additional important information to note:
>  * If we don't start cassandra-stress, the issue doesn't occur.
>  * Restarting node1 makes it recover its state right after the restart.
>  * Setting a lower value in dynamic_snitch_reset_interval_in_ms (to 6
>  or something) fixes the issue.
>  * If we `kill -9` the nodes, the issue doesn't occur.
>  * Hints seem unrelated. I tested with hints disabled and it didn't make any 
> difference.
>  






[jira] [Created] (CASSANDRA-15138) A cluster (RF=3) not recovering after two nodes are stopped

2019-05-22 Thread Hiroyuki Yamada (JIRA)
Hiroyuki Yamada created CASSANDRA-15138:
---

 Summary: A cluster (RF=3) not recovering after two nodes are 
stopped
 Key: CASSANDRA-15138
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15138
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Membership
Reporter: Hiroyuki Yamada


I faced a weird issue when recovering a cluster after two nodes are stopped.
 It is easily reproducible and looks like a bug or an issue to fix.
 The following are the steps to reproduce it.

=== STEPS TO REPRODUCE ===
 * Create a 3-node cluster with RF=3
    - node1 (seed), node2, node3
 * Start requests to the cluster with cassandra-stress (it continues
 until the end)
    - what we did: cassandra-stress mixed cl=QUORUM duration=10m
 -errors ignore -node node1,node2,node3 -rate threads>=16
 threads<=256

 - (It doesn't have to be this many threads. It can be 1.)

 * Stop node3 normally (with systemctl stop or kill (without -9))
    - the system is still available, as expected, because a quorum of nodes
 is still available
 * Stop node2 normally (with systemctl stop or kill (without -9))
    - the system is NOT available after it's stopped, as expected
    - the client gets `UnavailableException: Not enough replicas
 available for query at consistency QUORUM`
    - the client gets the errors right away (within a few ms)
    - so far it's all expected
 * Wait for 1 minute
 * Bring node2 back up
    - {color:#FF0000}The issue happens here.{color}
    - the client gets `ReadTimeoutException` or `WriteTimeoutException`
 depending on whether the request is a read or a write, even after node2
 is up
    - the client gets the errors after about 5000 ms or 2000 ms, which are
 the request timeouts for write and read requests respectively
    - what node1 reports with `nodetool status` and what node2 reports
 are not consistent (node2 thinks node1 is down)
    - it takes a very long time to recover from this state

=== STEPS TO REPRODUCE ===

Some additional important information to note:
 * If we don't start cassandra-stress, the issue doesn't occur.
 * Restarting node1 makes it recover its state right after the restart.
 * Setting a lower value in dynamic_snitch_reset_interval_in_ms (to 6
 or something) fixes the issue.
 * If we `kill -9` the nodes, the issue doesn't occur.
 * Hints seem unrelated. I tested with hints disabled and it didn't make any 
difference.

 






[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService

2017-10-05 Thread Hiroyuki Yamada (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193849#comment-16193849
 ] 

Hiroyuki Yamada commented on CASSANDRA-13530:
-

Thank you for thinking about it seriously, [~aweisberg].
Thank you for the good proposal, [~yuji].

Personally, having commitlog_sync_group_window_in_ms in the config sounds good 
to me, but I have no problem with the alternative way either.
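
For reference, turning the proposed mode on would then look something like 
this in cassandra.yaml; a sketch assuming the option names from this ticket's 
description (not from a released yaml), with the 10 ms window being just the 
value used in the benchmark below, not a recommendation:

{code}
# Sketch based on this ticket's description:
commitlog_sync: group
commitlog_sync_group_window_in_ms: 10
{code}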


> GroupCommitLogService
> -
>
> Key: CASSANDRA-13530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuji Ito
>Assignee: Yuji Ito
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: groupAndBatch.png, groupCommit22.patch, 
> groupCommit30.patch, groupCommit3x.patch, 
> groupCommitLog_noSerial_result.xlsx, groupCommitLog_result.xlsx, 
> GuavaRequestThread.java, MicroRequestThread.java
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve 
> throughput when lots of requests are received.
> It improved throughput by up to 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select either of 2 CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log entries which haven't been 
> written to the disk.
> In Batch, we write the commit log to the disk every time. The size of each 
> commit log write is very small (< 4KB). Under high concurrency, these 
> writes are gathered and persisted to the disk at once. But under 
> insufficient concurrency, many small writes are issued and performance 
> decreases due to the latency of the disk. Even if you use an SSD, 
> processing many IO commands decreases performance.
> GroupCommitLogService writes several commit log entries to the disk at once.
> The patch adds GroupCommitLogService (it is enabled by setting 
> `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the waiting on the semaphore.
> By waiting on the semaphore, several commit log writes are executed at the 
> same time.
> In GroupCommitLogService, latency becomes worse if there is no concurrency.
> I measured performance with my microbenchmark (MicroRequestThread.java) by 
> increasing the number of threads. The cluster has 3 nodes (replication 
> factor: 3). Each node is an AWS EC2 m4.large instance + a 200 IOPS io1 
> volume.
> The results are below. GroupCommitLogService with a 10 ms window improved 
> updates with Paxos by 94% and selects with Paxos by 76%.
> h6. SELECT / sec
> ||# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|
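
The waiting mechanism the description mentions can be pictured roughly as 
follows; this is a toy model of the group-commit idea with invented names, 
not the actual patch:

{code:java}
import java.util.concurrent.TimeUnit;

// Toy model of the group-commit idea described above; all names invented.
class GroupCommitLog {
    private final Object syncLock = new Object();
    private final long windowNanos = TimeUnit.MILLISECONDS.toNanos(10);
    private long writtenPosition = 0; // bytes appended to the segment
    private long syncedPosition = 0;  // bytes durably on disk

    // A writer appends its mutation, then blocks here until the background
    // syncer has fsynced past it. Because the syncer runs only once per
    // window, every write arriving in the same window shares one flush.
    void appendAndAwaitSync(int size) throws InterruptedException {
        synchronized (syncLock) {
            writtenPosition += size;
            long myPosition = writtenPosition;
            while (syncedPosition < myPosition)
                syncLock.wait();
        }
    }

    // Run by a single background thread: wait out the window, fsync once,
    // then wake every writer that queued up in the meantime.
    void syncLoop() throws InterruptedException {
        while (true) {
            TimeUnit.NANOSECONDS.sleep(windowNanos);
            long target;
            synchronized (syncLock) { target = writtenPosition; }
            fsyncSegment(); // one flush covers everything up to target
            synchronized (syncLock) {
                syncedPosition = target;
                syncLock.notifyAll();
            }
        }
    }

    private void fsyncSegment() { /* placeholder for FileChannel.force() */ }
}
{code}

This also matches the single-thread numbers above: a lone writer always pays 
the full window before its flush, so Group 10ms loses to Batch 2ms at 1 
thread and wins once many writers share each flush.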






[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService

2017-09-06 Thread Hiroyuki Yamada (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155319#comment-16155319
 ] 

Hiroyuki Yamada commented on CASSANDRA-13530:
-

Sorry to jump in from the outside.

What do you mean by `That documentation in the YAML looks wrong to me.`?
The Apache docs also state that 2ms is the default value:
http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html

I'm not really sure what you are trying to say here, 
but the current batch mode is not working as expected, as I described below, 
and it is not very useful.
https://issues.apache.org/jira/browse/CASSANDRA-12864

Even if `commitlog_sync_batch_window_in_ms` is set to a big number, 
it is the maximum length of time that queries may be batched together, not 
the minimum, so it is pretty nondeterministic and the behavior is not 
predictable. You can't really balance latency against throughput, as the 
sketch below illustrates.

On the other hand, GroupCommitLogService makes more sense, actually makes a 
real difference from a performance perspective, and seems to behave quite 
predictably. Also, it is only a few lines of code changes without much 
complication.
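
A toy contrast with the group sketch earlier in this thread, showing why the 
batch window acts as an upper bound; the names are invented and this is not 
Cassandra's actual sync loop:

{code:java}
import java.util.concurrent.TimeUnit;

// Toy model of the batch semantics described above; invented names.
class BatchCommitLogLoop {
    private final long windowNanos = TimeUnit.MILLISECONDS.toNanos(2);

    void syncLoop() throws InterruptedException {
        while (true) {
            waitUntilAnyWritePending(); // returns as soon as one write arrives
            fsyncSegment();             // low concurrency: one tiny fsync per write
            // The window is applied only here, capping how often we fsync.
            // A lone writer therefore returns almost immediately; the window
            // never forces writes to accumulate, so it is a maximum, not a
            // minimum, coalescing time.
            TimeUnit.NANOSECONDS.sleep(windowNanos);
        }
    }

    private void waitUntilAnyWritePending() { /* placeholder */ }
    private void fsyncSegment() { /* placeholder for FileChannel.force() */ }
}
{code}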

Sorry again from the outside, but this discussion seems unnecessarily long 
and without a clear direction, even though the proposal looks pretty good.

> GroupCommitLogService
> -
>
> Key: CASSANDRA-13530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuji Ito
>Assignee: Yuji Ito
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: groupCommit22.patch, groupCommit30.patch, 
> groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, 
> groupCommitLog_result.xlsx, GuavaRequestThread.java, MicroRequestThread.java
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve 
> throughput when lots of requests are received.
> It improved throughput by up to 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select either of 2 CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log entries which haven't been 
> written to the disk.
> In Batch, we write the commit log to the disk every time. The size of each 
> commit log write is very small (< 4KB). Under high concurrency, these 
> writes are gathered and persisted to the disk at once. But under 
> insufficient concurrency, many small writes are issued and performance 
> decreases due to the latency of the disk. Even if you use an SSD, 
> processing many IO commands decreases performance.
> GroupCommitLogService writes several commit log entries to the disk at once.
> The patch adds GroupCommitLogService (it is enabled by setting 
> `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the waiting on the semaphore.
> By waiting on the semaphore, several commit log writes are executed at the 
> same time.
> In GroupCommitLogService, latency becomes worse if there is no concurrency.
> I measured performance with my microbenchmark (MicroRequestThread.java) by 
> increasing the number of threads. The cluster has 3 nodes (replication 
> factor: 3). Each node is an AWS EC2 m4.large instance + a 200 IOPS io1 
> volume.
> The results are below. GroupCommitLogService with a 10 ms window improved 
> updates with Paxos by 94% and selects with Paxos by 76%.
> h6. SELECT / sec
> ||# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|






[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService

2017-05-15 Thread Hiroyuki Yamada (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011560#comment-16011560
 ] 

Hiroyuki Yamada commented on CASSANDRA-13530:
-

I vote for accepting this.
As I discussed in
https://issues.apache.org/jira/browse/CASSANDRA-12864
the current batch mode does not work the way most of us assume, because you 
can't actually control the size of the batch even though there is a parameter 
called commitlog_sync_batch_window_in_ms.
So it can't control the balance between throughput and latency, whereas that 
is the whole point of group commit in general.


> GroupCommitLogService
> -
>
> Key: CASSANDRA-13530
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Yuji Ito
>Assignee: Yuji Ito
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
> Attachments: groupCommit22.patch, groupCommit30.patch, 
> groupCommit3x.patch, MicroRequestThread.java
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve 
> throughput when lots of requests are received.
> It improved throughput by up to 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select either of 2 CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log entries which haven't been 
> written to the disk.
> In Batch, we write the commit log to the disk every time. The size of each 
> commit log write is very small (< 4KB). Under high concurrency, these 
> writes are gathered and persisted to the disk at once. But under 
> insufficient concurrency, many small writes are issued and performance 
> decreases due to the latency of the disk. Even if you use an SSD, 
> processing many IO commands decreases performance.
> GroupCommitLogService writes several commit log entries to the disk at once.
> The patch adds GroupCommitLogService (it is enabled by setting 
> `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the waiting on the semaphore.
> By waiting on the semaphore, several commit log writes are executed at the 
> same time.
> In GroupCommitLogService, latency becomes worse if there is no concurrency.
> I measured performance with my microbenchmark (MicroRequestThread.java) by 
> increasing the number of threads. The cluster has 3 nodes (replication 
> factor: 3). Each node is an AWS EC2 m4.large instance + a 200 IOPS io1 
> volume.
> The results are below. GroupCommitLogService with a 10 ms window improved 
> updates with Paxos by 94% and selects with Paxos by 76%.
> h6. SELECT / sec
> ||# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|






[jira] [Commented] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1, 2.2 and 3.9

2016-10-31 Thread Hiroyuki Yamada (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624234#comment-15624234
 ] 

Hiroyuki Yamada commented on CASSANDRA-12864:
-

OK, thank you for pointing that out, Benjamin.
In that case, the documents should be corrected.
They say it waits for commitlog_sync_batch_window_in_ms.

The Apache Cassandra docs:
http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html?highlight=sync#commitlog-sync

The DataStax docs:
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html#reference_ds_qfg_n1r_1k__commitlog_sync
http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html#configCassandra_yaml__commitlog_sync

Also, if this is the expected behavior, I think it rather misses the point of 
group commit, because it can't really control the window size and almost all 
of the mutations are committed right after they are issued.
So it can't control the balance between latency and throughput.



> "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 
> 2.1, 2.2 and 3.9
> --
>
> Key: CASSANDRA-12864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Hiroyuki Yamada
>
> "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in 
> the latest versions in 2.1.16, 2.2.8 and 3.9.
> Here is the way to reproduce the bug.
> 1.  set the following parameters in cassandra.yaml
> * commitlog_sync: batch
> * commitlog_sync_batch_window_in_ms: 1 (10s)
> 2. issue an insert from cqlsh
> 3. it immediately returns instead of waiting for 10 seconds.
> Please refer to the communication in the mailing list.
> http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html





[jira] [Updated] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1, 2.2 and 3.9

2016-10-30 Thread Hiroyuki Yamada (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-12864:

Summary: "commitlog_sync_batch_window_in_ms" parameter is not working 
correctly in 2.1, 2.2 and 3.9  (was: "commitlog_sync_batch_window_in_ms" 
parameter is not working correctly in 2.1 and 2.2)

> "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 
> 2.1, 2.2 and 3.9
> --
>
> Key: CASSANDRA-12864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Hiroyuki Yamada
>
> "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in 
> the latest versions in 2.1.16, 2.2.8 and 3.9.
> Here is the way to reproduce the bug.
> 1.  set the following parameters in cassandra.yaml
> * commitlog_sync: batch
> * commitlog_sync_batch_window_in_ms: 1 (10s)
> 2. issue an insert from cqlsh
> 3. it immediately returns instead of waiting for 10 seconds.
> Please refer to the communication in the mailing list.
> http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html





[jira] [Updated] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 and 2.2

2016-10-30 Thread Hiroyuki Yamada (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-12864:

Description: 
"commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in the 
latest versions in 2.1.16, 2.2.8 and 3.9.

Here is the way to reproduce the bug.

1.  set the following parameters in cassandra.yaml
* commitlog_sync: batch
* commitlog_sync_batch_window_in_ms: 1 (10s)
2. issue an insert from cqlsh
3. it immediately returns instead of waiting for 10 seconds.

Please refer to the communication in the mailing list.
http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html

  was:
"commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in the 
latest versions in 2.1 and 2.2. (2.1.16 and 2.2.8 respectively)

Here is the way to reproduce the bug.

1.  set the following parameters in cassandra.yaml
* commitlog_sync: batch
* commitlog_sync_batch_window_in_ms: 1 (10s)
2. issue an insert from cqlsh
3. it immediately returns instead of waiting for 10 seconds.

Please refer to the communication in the mailing list.
http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html


> "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 
> and 2.2
> -
>
> Key: CASSANDRA-12864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Hiroyuki Yamada
>
> "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in 
> the latest versions in 2.1.16, 2.2.8 and 3.9.
> Here is the way to reproduce the bug.
> 1.  set the following parameters in cassandra.yaml
> * commitlog_sync: batch
> * commitlog_sync_batch_window_in_ms: 1 (10s)
> 2. issue an insert from cqlsh
> 3. it immediately returns instead of waiting for 10 seconds.
> Please refer to the communication in the mailing list.
> http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html





[jira] [Updated] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 and 2.2

2016-10-30 Thread Hiroyuki Yamada (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-12864:

Component/s: Core

> "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 
> and 2.2
> -
>
> Key: CASSANDRA-12864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Hiroyuki Yamada
>
> "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in 
> the latest versions in 2.1 and 2.2. (2.1.16 and 2.2.8 respectively)
> Here is the way to reproduce the bug.
> 1.  set the following parameters in cassandra.yaml
> * commitlog_sync: batch
> * commitlog_sync_batch_window_in_ms: 1 (10s)
> 2. issue an insert from cqlsh
> 3. it immediately returns instead of waiting for 10 seconds.
> Please refer to the communication in the mailing list.
> http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html





[jira] [Updated] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 and 2.2

2016-10-30 Thread Hiroyuki Yamada (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-12864:

Fix Version/s: (was: 2.1.16)
   (was: 2.2.8)

> "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 
> and 2.2
> -
>
> Key: CASSANDRA-12864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hiroyuki Yamada
>
> "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in 
> the latest versions in 2.1 and 2.2. (2.1.16 and 2.2.8 respectively)
> Here is the way to reproduce the bug.
> 1.  set the following parameters in cassandra.yaml
> * commitlog_sync: batch
> * commitlog_sync_batch_window_in_ms: 1 (10s)
> 2. issue an insert from cqlsh
> 3. it immediately returns instead of waiting for 10 seconds.
> Please refer to the communication in the mailing list.
> http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html





[jira] [Updated] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 and 2.2

2016-10-30 Thread Hiroyuki Yamada (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroyuki Yamada updated CASSANDRA-12864:

Summary: "commitlog_sync_batch_window_in_ms" parameter is not working 
correctly in 2.1 and 2.2  (was: "commitlog_sync_batch_window_in_ms" parameter 
is not working correctly)

> "commitlog_sync_batch_window_in_ms" parameter is not working correctly in 2.1 
> and 2.2
> -
>
> Key: CASSANDRA-12864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Hiroyuki Yamada
> Fix For: 2.1.16, 2.2.8
>
>
> "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in 
> the latest versions in 2.1 and 2.2. (2.1.16 and 2.2.8 respectively)
> Here is the way to reproduce the bug.
> 1.  set the following parameters in cassandra.yaml
> * commitlog_sync: batch
> * commitlog_sync_batch_window_in_ms: 1 (10s)
> 2. issue an insert from cqlsh
> 3. it immediately returns instead of waiting for 10 seconds.
> Please refer to the communication in the mailing list.
> http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html





[jira] [Created] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not working correctly

2016-10-30 Thread Hiroyuki Yamada (JIRA)
Hiroyuki Yamada created CASSANDRA-12864:
---

 Summary: "commitlog_sync_batch_window_in_ms" parameter is not 
working correctly
 Key: CASSANDRA-12864
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12864
 Project: Cassandra
  Issue Type: Bug
Reporter: Hiroyuki Yamada
 Fix For: 2.2.8, 2.1.16


"commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in the 
latest versions in 2.1 and 2.2. (2.1.16 and 2.2.8 respectively)

Here is the way to reproduce the bug.

1.  set the following parameters in cassandra.yaml
* commitlog_sync: batch
* commitlog_sync_batch_window_in_ms: 1 (10s)
2. issue an insert from cqlsh
3. it immediately returns instead of waiting for 10 seconds.

Please refer to the communication in the mailing list.
http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html


