[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15618319#comment-15618319
 ] 

Venkateswararao Jujjuri (JV) commented on BOOKKEEPER-934:
---------------------------------------------------------

I gave this more thought. The good path is pretty easy and straightforward.
I believe the error path has multiple things we need to worry about.

Scenario-1:
#1 - Non-durable write, L1-E1, goes to bookies b1, b2, b3.
#2 - Some failover event happens and the new ensemble is now b3, b4, b5.
#3 - Durable write, L1-E2, comes in, is written to b3, b4, b5, and is persisted.
#4 - Now we have only 1 copy of L1-E1 (on b3), and it stays like that until we 
detect and re-replicate it.

To make things worse:
In #2, all 3 bookies (b1, b2, b3) go down and a completely new ensemble comes 
up: b4, b5, b6.
In this scenario, we persist L1-E2 and assume everything is fine.
This scenario becomes a silent data loss issue. With the current code, it is 
just a data availability issue.
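
Both variants of the scenario can be replayed with a toy model. This is only a sketch, not BookKeeper code: the `Bookie` class, the `run` helper, and the rule that a bookie leaving the ensemble loses whatever it had not flushed are all assumptions made for illustration (the worst case for a non-durable write).

```python
class Bookie:
    """Toy bookie: entries sit in a buffer until a sync add flushes them."""

    def __init__(self, name):
        self.name = name
        self.buffered = set()  # acked to the client, not yet on disk
        self.durable = set()   # flushed to disk

    def add(self, entry, sync):
        self.buffered.add(entry)
        if sync:
            # A sync add flushes everything buffered on this bookie.
            self.durable |= self.buffered
            self.buffered.clear()

    def crash(self):
        # Assumption: unflushed entries are lost when the bookie goes down.
        self.buffered.clear()


def durable_copies(bookies, entry):
    """Names of the bookies that hold `entry` on disk."""
    return sorted(b.name for b in bookies.values() if entry in b.durable)


def run(first_ensemble, second_ensemble):
    names = set(first_ensemble) | set(second_ensemble)
    bookies = {n: Bookie(n) for n in names}
    for n in first_ensemble:                        # #1 non-durable write
        bookies[n].add("L1-E1", sync=False)
    for n in set(first_ensemble) - set(second_ensemble):
        bookies[n].crash()                          # #2 failover
    for n in second_ensemble:                       # #3 durable write
        bookies[n].add("L1-E2", sync=True)
    return durable_copies(bookies, "L1-E1"), durable_copies(bookies, "L1-E2")


scenario_1 = run(["b1", "b2", "b3"], ["b3", "b4", "b5"])
worse_case = run(["b1", "b2", "b3"], ["b4", "b5", "b6"])
print(scenario_1)  # (['b3'], ['b3', 'b4', 'b5'])
print(worse_case)  # ([], ['b4', 'b5', 'b6'])
```

In the second run L1-E2 is fully persisted while L1-E1 has no durable copy anywhere, and nothing in the write path reports it.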

To make this feature work:
We need to make sure that every sync write is more like a recovery operation, 
i.e., make sure that *all* previous entries are persisted before persisting 
the current write and acking back. This becomes extremely expensive.
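
To give a feel for the cost, here is a rough sketch of the check a sync write would have to pass before it can be acked. It is a model under assumptions, not a proposed implementation: the function names, the `durable` map (bookie name to the entry ids it has on disk), and the `quorum` parameter are all hypothetical.

```python
def under_replicated(durable, ensemble, entry_id, quorum):
    """Earlier entry ids with fewer than `quorum` durable copies in `ensemble`.

    Every sync write scans all previous entries against every bookie in
    the current ensemble, which is why this gets expensive.
    """
    missing = []
    for prev in range(1, entry_id):
        copies = sum(1 for b in ensemble if prev in durable.get(b, set()))
        if copies < quorum:
            missing.append(prev)
    return missing


def can_ack_sync(durable, ensemble, entry_id, quorum):
    """A sync write may be acked only if no earlier entry is under-replicated."""
    return not under_replicated(durable, ensemble, entry_id, quorum)


# State after the failover in Scenario-1: only b3 has entry 1 (L1-E1) on disk.
durable = {"b3": {1}, "b4": set(), "b5": set()}
ensemble = ["b3", "b4", "b5"]

print(under_replicated(durable, ensemble, entry_id=2, quorum=2))  # [1]
print(can_ack_sync(durable, ensemble, entry_id=2, quorum=2))      # False
```

Entry 2 could be acked only after entry 1 has been re-replicated to enough bookies of the new ensemble, which is the recovery-like work described above.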

[~zhaijia] [~ayegorov]

> Relax durability
> ----------------
>
>                 Key: BOOKKEEPER-934
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-934
>             Project: Bookkeeper
>          Issue Type: Improvement
>            Reporter: Jia Zhai
>            Assignee: Jia Zhai
>
> I am thinking of adding a new flag to bookkeeper#addEntry(..., Boolean sync), so 
> the application can control whether to sync or not for individual entries.
> - On the write protocol, add a flag to indicate whether this write should 
> sync to disk or not.
> - On the bookie side, if the addEntry request is sync, go through the original 
> pipeline. If the addEntry disables sync, complete the add callbacks after 
> writing to the journal file and before flushing the journal.
> - Those add entries (with sync disabled) will be flushed to disk with subsequent 
> sync add entries.
> There is already a discussion in the mail thread; this ticket can gather 
> ideas and provide the discussion materials.
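
The bookie-side behaviour proposed above can be sketched as a toy journal. The `Journal` class and its fields are hypothetical and only model the ordering of write, flush, and callback; they are not the actual bookie journal code.

```python
class Journal:
    """Toy journal: tracks what is written, what is flushed, what is acked."""

    def __init__(self):
        self.written = []  # appended to the journal file, not yet flushed
        self.flushed = []  # flushed to disk
        self.acked = []    # add callbacks already completed

    def flush(self):
        # Flushing persists everything written so far, including earlier
        # non-sync entries.
        self.flushed.extend(self.written)
        self.written.clear()

    def add_entry(self, entry, sync):
        self.written.append(entry)
        if sync:
            # Original pipeline: flush first, then complete the callback.
            self.flush()
        # Non-sync adds reach this point without a flush, so the callback
        # completes while the entry is still only in `written`.
        self.acked.append(entry)


journal = Journal()
journal.add_entry("e1", sync=False)
state_after_e1 = (list(journal.acked), list(journal.flushed))
journal.add_entry("e2", sync=True)
state_after_e2 = (list(journal.acked), list(journal.flushed))
print(state_after_e1)  # (['e1'], [])
print(state_after_e2)  # (['e1', 'e2'], ['e1', 'e2'])
```

Between the two calls, e1 is acked but not on disk, which is the window the error scenarios in the comment depend on.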



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
