[ 
https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155485#comment-16155485
 ] 

Ariel Weisberg commented on CASSANDRA-13530:
--------------------------------------------

{quote}I'm sorry from outside.
What do you mean `That documentation in the YAML looks wrong to me.` ?
In the apache doc, it also states 2ms is the default value.
http://cassandra.apache.org/doc/latest/configuration/cassandra_config_file.html
I'm not really sure what you are trying to say here, 
but the current batch is not working as expected as I described below, and it 
is not very useful.
https://issues.apache.org/jira/browse/CASSANDRA-12864{quote}
I agree with Benedict: the documentation is wrong everywhere.

{quote}
Even if `commitlog_sync_batch_window_in_ms` is set a big number, 
it is the maximum length of time that queries may be batched together for, not 
the minimum,
so, it is pretty nondeterministic and the behavior is not predictable.{quote}
Predictability isn't the goal, though. The goal is the lowest average latency 
and the lowest P99 (or whatever). More variability with a lower average and 
P99 is still better.

{quote}You can't really balance between latency and throughput.{quote}
Is this true? Fsync latency has a fixed cost as well as a variable cost that is 
linear with the amount of data being written. So calling fsync as often as 
possible with whatever data is available seems like a reasonable strategy if 
you have a dedicated device that is doing nothing but waiting for the commit 
log to flush.
 
This should balance latency and throughput at any load level. The more 
throughput you have the more latency you have as the fsyncs take a little 
longer. The less throughput you have the less latency you have as the fsyncs 
are a little faster. But either way the latency is determined by the 
willingness of the device to sync data and not by a hard coded configuration 
which may or may not be optimal. Devices also don't have predictable fsync 
latency over time. As they fill up or run out of erase blocks or are contended 
by other IO the optimal batch size may change.
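
To make the fixed-plus-linear cost argument concrete, here is a minimal sketch of a toy cost model. It is not Cassandra code and not a measurement: the class name, the 1 ms fixed cost, the 0.01 ms/KB variable cost, and the 4 KB write size are all hypothetical values picked for illustration.

```java
/** Toy fsync cost model: a fixed cost per sync plus a cost linear in the data flushed. */
final class FsyncCostModel {
    // Hypothetical device parameters, chosen only for illustration.
    static final double FIXED_MS = 1.0;    // paid by every fsync regardless of size
    static final double MS_PER_KB = 0.01;  // additional cost per KB flushed

    /** Latency of one fsync that flushes the given number of KB. */
    static double syncLatencyMs(double kb) {
        return FIXED_MS + MS_PER_KB * kb;
    }

    /** fsync time attributed to each write when `writes` writes share a single fsync. */
    static double amortizedMsPerWrite(int writes, double kbPerWrite) {
        return syncLatencyMs(writes * kbPerWrite) / writes;
    }

    public static void main(String[] args) {
        double kbPerWrite = 4.0; // assumed size of one commit log write
        for (int writes : new int[] {1, 10, 100}) {
            System.out.printf("writes=%d syncMs=%.2f amortizedMsPerWrite=%.4f%n",
                    writes,
                    syncLatencyMs(writes * kbPerWrite),
                    amortizedMsPerWrite(writes, kbPerWrite));
        }
    }
}
```

Under these assumed numbers, each fsync gets a little slower as the batch grows, while the cost per write drops sharply. That is the sense in which latency and throughput trade off without a configured window.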

We see this effect at low concurrency, where I expect it to be pronounced. 
What's unexpected is that we are also seeing worse throughput as the op rate 
increases. I would expect the batches to grow as fsync latency increases 
until they converge on the optimal batch size for the device.
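
The convergence I have in mind can be sketched with a small simulation. This is a toy model under stated assumptions, not the commit log implementation: writes arrive at a constant rate, each fsync flushes exactly the writes that arrived during the previous fsync, and the cost constants (1 ms fixed, 0.04 ms per write) are hypothetical.

```java
/** Toy model of a sync loop that flushes whatever arrived during the previous fsync. */
final class SelfClockingBatch {
    // Hypothetical device parameters, chosen only for illustration.
    static final double FIXED_MS = 1.0;       // fixed cost of every fsync
    static final double MS_PER_WRITE = 0.04;  // variable cost per write in the batch

    /** Latency of one fsync that flushes `writes` writes. */
    static double syncLatencyMs(double writes) {
        return FIXED_MS + MS_PER_WRITE * writes;
    }

    /** Batch size after `rounds` back-to-back fsyncs, starting from a single write. */
    static double batchAfter(double writesPerMs, int rounds) {
        double batch = 1.0;
        for (int i = 0; i < rounds; i++) {
            // The next batch is everything that arrived while the last fsync ran.
            batch = writesPerMs * syncLatencyMs(batch);
        }
        return batch;
    }

    /** Fixed point of the iteration; exists only while the device can keep up. */
    static double steadyStateBatch(double writesPerMs) {
        double utilization = writesPerMs * MS_PER_WRITE;
        if (utilization >= 1.0) {
            throw new IllegalArgumentException("arrival rate exceeds device capacity");
        }
        return writesPerMs * FIXED_MS / (1.0 - utilization);
    }

    public static void main(String[] args) {
        for (double rate : new double[] {1.0, 10.0, 20.0}) {
            System.out.printf("rate=%.0f/ms simulated=%.3f steadyState=%.3f%n",
                    rate, batchAfter(rate, 200), steadyStateBatch(rate));
        }
    }
}
```

In this model the batch size settles at a value set by the arrival rate and the device's cost curve, and it grows as the rate grows. The model says nothing about why the measured throughput is worse. It only illustrates the behavior I would expect.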


> GroupCommitLogService
> ---------------------
>
>                 Key: CASSANDRA-13530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuji Ito
>            Assignee: Yuji Ito
>             Fix For: 2.2.x, 3.0.x, 3.11.x
>
>         Attachments: groupCommit22.patch, groupCommit30.patch, 
> groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, 
> groupCommitLog_result.xlsx, GuavaRequestThread.java, MicroRequestThread.java
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve the 
> throughput when lots of requests are received.
> It improved throughput by up to 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select either of 2 CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log entries which haven't been 
> written to the disk.
> In Batch, we can write the commit log to the disk every time. The size of 
> each commit log write is too small (< 4KB). Under high concurrency, these 
> writes are gathered and persisted to the disk at once. But under 
> insufficient concurrency, many small writes are issued and the performance 
> decreases due to the latency of the disk. Even if you use an SSD, processing 
> many IO commands decreases the performance.
> GroupCommitLogService writes several commit log entries to the disk at once.
> The patch adds GroupCommitLogService (It is enabled by setting 
> `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the wait on the semaphore.
> By waiting on the semaphore, several commit log writes are executed at the 
> same time.
> In GroupCommitLogService, the latency becomes worse if there is no 
> concurrency.
> I measured the performance with my microbenchmark (MicroRequestThread.java) 
> while increasing the number of threads. The cluster has 3 nodes (replication 
> factor: 3). Each node is an AWS EC2 m4.large instance with a 200 IOPS io1 
> volume.
> The results are below. The GroupCommitLogService with a 10ms window improved 
> update with Paxos by up to 94% and select with Paxos by up to 76%.
> h6. SELECT / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
