ctubbsii commented on code in PR #5988:
URL: https://github.com/apache/accumulo/pull/5988#discussion_r2557850588


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -1280,7 +1280,9 @@ public enum Property {
       "The durability used to write to the write-ahead log. Legal values are:"
           + " none, which skips the write-ahead log; log, which sends the data to the"
           + " write-ahead log, but does nothing to make it durable; flush, which pushes"
-          + " data to the file system; and sync, which ensures the data is written to disk.",
+          + " data to the file system; and sync, which ensures the data is written to disk."

Review Comment:
   > Introducing a bug is a one thing, I was also worried about the performance 
impact being unknown.
   
   I think there are 3 kinds of risks that users might want to consider:
   
   1. Risks to data integrity
   2. Risks to data availability
   3. Risks to the system
   
   I think the way we have talked about `table.durability` in the past, and 
called it by that name (vs. something like `wal.durability`), is to communicate 
that the main trade-off here is data integrity vs. data availability. We accept 
some loss of availability (in terms of speed) in exchange for lower risk of 
integrity loss. And, specifically, I think the trade-off of most concern is the 
overall performance, not the performance of individual components (RFile vs. 
WAL). The pressing issue for users, at least the way we've talked about it 
before, is to protect against integrity loss, and for that, overall performance 
is what matters. If this property controls both, I don't think that 
conversation is going to change at all, and users are still going to be 
weighing the overall performance against the risk of integrity loss when 
deciding which setting to use... even if the overall performance calculation is 
a little different with this change in behavior.
   
   However, what hasn't yet been brought up is that when things fail, they fail 
very differently if there is a loss of integrity in the RFile vs. loss of 
integrity in the WAL. If something is appended to a WAL, but not sync'd 
properly, Accumulo will simply not have the data when it recovers from a 
failure. That's a pretty low risk to the system, regardless of which durability 
setting you choose. However, if the same thing happens to an RFile, now 
Accumulo will not work correctly... data won't just be missing... there are 
likely to be corrupt files that prevent Accumulo from behaving correctly. The 
data is not just lost, but interfering with proper operations. This is an 
entirely different calculation of risk for users. While they may be willing to 
tolerate a lower durability in exchange for performance, they may not be 
willing to tolerate a lower durability in exchange for the potential to have to 
perform "surgery" on an Accumulo system to get it back into working order.
   
   For me, the argument for making them different properties would be to ensure 
that the RFile durability is *at least as high* as the WAL durability, in order 
to mitigate the recovery time/effort risks. But, I think I would strongly 
prefer max durability on the RFiles, regardless of what is used for the WALs. 
If Accumulo's behavior were to automatically recover from corrupt RFiles, based 
on the durability, to achieve a similar effect as an unfinished WAL, then I can 
see the argument for always making them equal. However, if there were two 
separate properties, I would be inclined to always select the maximum 
durability for RFiles, and not relax it for performance reasons.
   
   So, the options are:
   
   1. Have one property that controls both behaviors (best if the focus is on 
data integrity vs. overall performance trade-offs only, or if we were to 
automate more graceful recovery when encountering a corrupt RFile on a table 
configured for a lower durability)
   2. Have two properties (good if we also wish to consider risks to 
system integrity)
   3. Have one property that only controls the WALs, like we do now, and just 
always do max durability possible for RFiles (perhaps best if we consider 
system integrity to be paramount)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
