[ https://issues.apache.org/jira/browse/KAFKA-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061843#comment-15061843 ]
Arkadiusz Firus commented on KAFKA-2997:
----------------------------------------
[~granthenke]
We are currently considering Kafka as a message backbone and a general log
(source of truth) accessible to all systems. Because we work with financial
data, we need stronger guarantees than in-memory replication provides.
[~fpj]
I want a guarantee that when the client call returns, the message has been
persisted to disk. On the other hand, I do not want to invoke flush after every
message because that has a very negative impact on performance. VoltDB
(https://voltdb.com/) had a similar problem: they wanted both persistence to
disk and high performance. Their solution is to gather a few writes to disk
(from different sessions) into one batch and then invoke fsync. I want to use
this approach in Kafka. A thread that wants to write data will wait a few ms,
because during that time other threads may want to write data to the same
partition.
Running a thread in a loop (instead of a timer) could also be a good solution.
I have to think about this.
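To make the batching idea concrete, here is a minimal sketch with a flusher
thread running in a loop. It is written in Python purely for illustration and
is not Kafka code: the class name GroupCommitLog, the parameters path and
window_s, and the fsync_calls counter are all hypothetical. It also assumes
that os.write on a regular file writes the whole record in one call.

```python
import os
import threading
import time


class GroupCommitLog:
    """Append-only file; append() returns only after an fsync covers the write.

    Hypothetical illustration of group commit, not part of Kafka.
    """

    def __init__(self, path, window_s=0.005):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._cond = threading.Condition()
        self._written = 0      # sequence number of the last record written
        self._flushed = 0      # sequence number of the last record fsynced
        self._closed = False
        self._window_s = window_s
        self.fsync_calls = 0   # exposed only to observe the batching effect
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def append(self, record):
        """Write one record and block until it is durable on disk."""
        with self._cond:
            if self._closed:
                raise ValueError("log is closed")
            os.write(self._fd, record)
            self._written += 1
            my_seq = self._written
            self._cond.notify_all()           # wake the flusher thread
            while self._flushed < my_seq:     # released by the flusher
                self._cond.wait()

    def _flush_loop(self):
        while True:
            with self._cond:
                # Sleep until there is unflushed data or the log is closed.
                while self._written == self._flushed and not self._closed:
                    self._cond.wait()
                if self._written == self._flushed:
                    return                    # closed and nothing left to flush
            time.sleep(self._window_s)        # let other writers join the batch
            with self._cond:
                target = self._written        # every record up to here is written
            os.fsync(self._fd)                # one fsync for the whole batch
            with self._cond:
                self._flushed = target
                self.fsync_calls += 1
                self._cond.notify_all()       # release all waiting writers

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._flusher.join()
        os.close(self._fd)
```

In this sketch the writers never call fsync themselves, so the added latency
per write is bounded by roughly the batching window plus one fsync, while the
number of fsync calls does not grow with the number of concurrent writers.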
Thank you very much for the link.
> Synchronous write to disk
> -------------------------
>
> Key: KAFKA-2997
> URL: https://issues.apache.org/jira/browse/KAFKA-2997
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 0.9.0.0
> Reporter: Arkadiusz Firus
> Priority: Minor
> Labels: features, patch
>
> Hi All,
> I am currently working on a mechanism that allows efficient synchronous
> writes to the file system. My idea is to gather a few write requests for one
> partition and then call fsync.
> Reading the code, I found that the best place to do this is the
> kafka.log.Log.append
> method. Currently, at the end of the method (line 368), there is a check
> whether the number of unflushed messages is greater than the flush interval
> (configuration parameter).
> I am thinking of extending this condition. I want to add an additional
> boolean configuration parameter (sync write or something like this). If this
> parameter is set to true, the thread should block on a lock at the end of
> this method. In addition, there will be a timer thread (one for every
> partition) that is invoked every 10 ms (configuration parameter). On each
> invocation, the thread will call the flush method and then release all
> blocked threads.
> I am writing here because I would like to know your opinion of this
> approach. Do you think it is a good one, or does someone have a better (more
> permanent) one? I would also like to know whether this approach fits the
> general Kafka architecture.