[jira] [Commented] (HBASE-4282) Potential data loss in retries of WAL close introduced in HBASE-4222

Gary Helmling (Commented) (JIRA) Thu, 06 Oct 2011 14:43:52 -0700

    [ https://issues.apache.org/jira/browse/HBASE-4282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122326#comment-13122326 ]


Gary Helmling commented on HBASE-4282:
--------------------------------------

bq. On v3, the txids are pretty useless at least out in logs? No harm logging 
them I suppose but there is nothing I can infer given a txid? Is that so?

Yes, txids are not so useful.  I can drop them from the logs.  I left them in 
as the analog of the previous version's deferred seqNum, which was moderately 
more useful.

{code}
-            if (unflushedEntries.get() <= syncedTillHere) {
-              Thread.sleep(this.optionalFlushInterval);
-            }
+            Thread.sleep(this.optionalFlushInterval);
{code}

This reverts what I think is a dangerous change introduced by HBASE-4487.  
If the sync fails, unflushedEntries stays greater than syncedTillHere, so the 
if condition is false and the sleep is skipped.  The LogSyncer thread then 
spins in a hard loop until a sync succeeds.  This is going to interfere with 
attempting to perform the log roll, so I think it at least needs to be 
throttled.  The simplest change seemed to be restoring the previous behavior.  
I can move this into a separate issue, if you think broader discussion would 
be good.
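
To make the hard loop concrete, here is a minimal sketch, not the actual HLog 
code.  The class and method names are hypothetical, the sync is simulated as 
always failing, and a counter stands in for Thread.sleep().  It only shows 
that the conditional sleep never throttles the loop while syncs keep failing, 
whereas the unconditional sleep throttles every iteration.

```java
// Simplified, hypothetical model of the LogSyncer loop; not the actual HLog code.
class LogSyncerSketch {

    /**
     * Runs the loop body `iterations` times while every sync fails and
     * returns how many of those iterations would have slept first.
     */
    static int sleepsWhileSyncFails(boolean conditionalSleep, int iterations) {
        long unflushedEntries = 3; // edits appended but not yet synced
        long syncedTillHere = 0;   // only advances when a sync succeeds
        int sleeps = 0;
        for (int i = 0; i < iterations; i++) {
            // conditionalSleep=true models the removed check,
            // conditionalSleep=false models the restored plain sleep.
            if (!conditionalSleep || unflushedEntries <= syncedTillHere) {
                sleeps++; // stands in for Thread.sleep(optionalFlushInterval)
            }
            boolean synced = false; // assumption: the sync fails every time
            if (synced) {
                syncedTillHere = unflushedEntries;
            }
        }
        return sleeps;
    }

    public static void main(String[] args) {
        System.out.println("conditional sleep:   " + sleepsWhileSyncFails(true, 1000));
        System.out.println("unconditional sleep: " + sleepsWhileSyncFails(false, 1000));
    }
}
```

With the conditional variant the count stays at zero for as long as the sync 
keeps failing, which is the unthrottled retry described above.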

{code}
+    TEST_UTIL.cleanupTestDir();
+    TEST_UTIL.shutdownMiniCluster();
{code}

cleanupTestDir() actually deletes the test directory in HDFS, so the cluster 
would need to be running for it.  But shutdownMiniCluster() does its own 
cleanup of the local FS dirs for testing, so I don't think we need the 
additional cleanupTestDir() at all.

{code}
+    assertTrue("Need HDFS-826 for this test", log.canGetCurReplicas());
{code}

Sure, I'll add that in.
                
> Potential data loss in retries of WAL close introduced in HBASE-4222
> --------------------------------------------------------------------
>
>                 Key: HBASE-4282
>                 URL: https://issues.apache.org/jira/browse/HBASE-4282
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0, 0.94.0, 0.90.5
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Blocker
>             Fix For: 0.92.0, 0.90.5
>
>         Attachments: HBASE-4282_0.90_2.patch, HBASE-4282_trunk_2.patch, 
> HBASE-4282_trunk_3.patch, HBASE-4282_trunk_prelim.patch
>
>
> The ability to ride over WAL close errors on log rolling added in HBASE-4222 
> could lead to missing HLog entries if:
> * A table has DEFERRED_LOG_FLUSH=true
> * There are unflushed WALEdit entries for that table in the current 
> SequenceFile writer buffer
> Since the writes were already acknowledged to the client, just ignoring the 
> close error to allow for another log roll doesn't seem like the right thing 
> to do here.
> We could easily flag this state and only ride over the close error if there 
> aren't unflushed entries.  This would bring the above condition back to the 
> previous behavior of aborting the region server.  However, aborting the 
> region server in this state is still guaranteeing data loss.  Is there 
> anything we can do better in this case?  

