[jira] [Comment Edited] (HBASE-13877) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL

2015-06-09 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579764#comment-14579764
 ] 

Enis Soztutar edited comment on HBASE-13877 at 6/10/15 12:06 AM:
-

bq.  There is no other caller? Just call shutdown rather than shutdownNow?
Yeah, it looks like that method was written with the intention of 
interrupting, but it is called from cancelTasks(), which calls cancel(false). 
Let me update the patch. 
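
To illustrate the distinction discussed above, here is a minimal sketch using only java.util.concurrent. It is not the HBase code: the class name CancelVsShutdownNow and the helper runAndStop are made up for this example, and the latch merely stands in for a long-running flush. It shows that Future.cancel(false) followed by shutdown() leaves a running task uninterrupted, while shutdownNow() interrupts the worker thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelVsShutdownNow {

  /** Hypothetical helper: returns true if the running task observed an interrupt. */
  static boolean runAndStop(boolean useShutdownNow) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicBoolean interrupted = new AtomicBoolean(false);

    Future<?> task = pool.submit(() -> {
      started.countDown();
      try {
        release.await();             // stands in for a long-running flush
      } catch (InterruptedException e) {
        interrupted.set(true);       // the task saw an interrupt
      } finally {
        finished.countDown();
      }
    });

    try {
      started.await();               // make sure the task is already running
      if (useShutdownNow) {
        pool.shutdownNow();          // interrupts the running worker thread
      } else {
        task.cancel(false);          // marks the future cancelled, no interrupt
        pool.shutdown();             // rejects new tasks, lets running ones finish
        release.countDown();         // allow the task to complete normally
      }
      finished.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return interrupted.get();
  }

  public static void main(String[] args) {
    System.out.println("cancel(false) + shutdown interrupted task: " + runAndStop(false));
    System.out.println("shutdownNow interrupted task: " + runAndStop(true));
  }
}
```

If the method is only reached through a path that calls cancel(false), using shutdown() keeps the two consistent: neither one interrupts a flush that is already in progress.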


was (Author: enis):
bq.  There is no other caller? Just call shutdown rather than shutdownNow?
Yeah, looked like the intention for writing that method to do interruption, but 
it is called from cancelTasks() which calls cancel(false). Let me update the 
patch. 

 Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL
 

 Key: HBASE-13877
 URL: https://issues.apache.org/jira/browse/HBASE-13877
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Blocker
 Fix For: 2.0.0, 1.2.0, 1.1.1

 Attachments: hbase-13877_v1.patch, hbase-13877_v2-branch-1.1.patch


 ITBLL with 1.25B rows failed for me (and Stack as reported in 
 https://issues.apache.org/jira/browse/HBASE-13811?focusedCommentId=14577834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14577834)
  
 HBASE-13811 and HBASE-13853 fixed an issue with WAL edit filtering. 
 The root cause this time seems to be different. It is due to the 
 procedure-based flush interrupting the flush request when the procedure is 
 cancelled because of an exception elsewhere. This leaves the memstore 
 snapshot intact without aborting the server. The next flush then flushes the 
 previous memstore with the current seqId (as opposed to the seqId from the 
 memstore snapshot). This creates an hfile with a larger seqId than that of 
 its contents. The previous behavior in 0.98 and 1.0 (I believe) is that, 
 after flush prepare, an interruption / exception will cause an RS abort.
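
The sequence described above can be sketched with a toy model. This is only an illustration under stated assumptions, not HBase code: ToyStore, FlushedFile, prepareFlush, commitFlush and scenario are hypothetical names invented for this example, and the real flush path is far more involved. The sketch shows how a snapshot left behind by an interrupted flush ends up in a file stamped with a seqId larger than anything the file contains.

```java
import java.util.ArrayList;
import java.util.List;

public class StaleSnapshotSketch {

  /** Toy flush result: the seqId stamped on the file vs. the highest seqId inside it. */
  static final class FlushedFile {
    final long labelSeqId;
    final long maxContentSeqId;

    FlushedFile(long labelSeqId, long maxContentSeqId) {
      this.labelSeqId = labelSeqId;
      this.maxContentSeqId = maxContentSeqId;
    }
  }

  /** Hypothetical stand-in for a store with an active memstore and a snapshot. */
  static final class ToyStore {
    private List<Long> active = new ArrayList<>();
    private List<Long> snapshot = null;
    private long snapshotSeqId = -1;
    private long currentSeqId = 0;

    void put() {
      active.add(++currentSeqId);    // each edit gets the next sequence id
    }

    void prepareFlush() {
      if (snapshot != null) {
        return;                      // leftover snapshot from an interrupted flush is reused
      }
      snapshot = active;
      active = new ArrayList<>();
      snapshotSeqId = currentSeqId;
    }

    FlushedFile commitFlush(boolean useSnapshotSeqId) {
      long max = 0;
      for (long seqId : snapshot) {
        max = Math.max(max, seqId);
      }
      // The behavior described in the report corresponds to useSnapshotSeqId == false.
      long label = useSnapshotSeqId ? snapshotSeqId : currentSeqId;
      snapshot = null;
      return new FlushedFile(label, max);
    }
  }

  static FlushedFile scenario(boolean useSnapshotSeqId) {
    ToyStore store = new ToyStore();
    for (int i = 0; i < 3; i++) {
      store.put();
    }
    store.prepareFlush();            // snapshot taken while the current seqId is 3
    // The flush is interrupted here: no commit and no abort, so the snapshot stays.
    for (int i = 0; i < 2; i++) {
      store.put();                   // more edits arrive, current seqId advances to 5
    }
    store.prepareFlush();            // next flush finds the old snapshot
    return store.commitFlush(useSnapshotSeqId);
  }

  public static void main(String[] args) {
    FlushedFile f = scenario(false);
    System.out.println("label=" + f.labelSeqId + " maxContent=" + f.maxContentSeqId);
  }
}
```

In this model, passing true to scenario stamps the file with the seqId recorded at snapshot time, which matches its contents; passing false reproduces the mismatch described in the report.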



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-13877) Interrupt to flush from TableFlushProcedure causes dataloss in ITBLL

2015-06-09 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579870#comment-14579870
 ] 

Duo Zhang edited comment on HBASE-13877 at 6/10/15 2:16 AM:


{quote}
i've added extra abort() statement to safeguard against cases where the caller 
does not handle this exception explicitly.
{quote}
Then this is the caller's fault? Why not fix it on the caller side?
Or we could remove the abort-regionserver semantics of 
DroppedSnapshotException so the caller does not need to deal with it anymore? 
We'd better make the rule clear, otherwise people may get confused...


was (Author: apache9):
{code}
i've added extra abort() statement to safeguard against cases where the caller 
does not handle this exception explicitly.
{code}
Then this is the caller's fault? Why not fix it at the caller side?
Or we could remove the abort regionserver semantics of 
DroppedSnapshotException so the caller do not need to deal with it anymore? 
We'd better make the rule clear otherwise people may get confusing...
