[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156858#comment-13156858 ]

Todd Lipcon commented on HBASE-4862:

Wait, wait -- _why_ is this happening concurrently? A region should never be opened until the split process is done for that region. If this is happening, we have a much larger issue, which we shouldn't be working around with tmp file names, etc.

> Splitting hlog and opening region concurrently may cause data loss
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A now, and in the process replayRecoveredEditsIfAny() deletes the file region A/recovered.edits/123456.
> 3. The split hlog thread catches the IO exception and stops parsing this log file. If skipError = true, it adds the file to the corrupt logs. However, data for other regions in this log file will be lost.
> 4. Or, if skipError = false, it checks the filesystem. Of course, the filesystem is OK, so it only prints an error log and continues assigning regions. Therefore, data in other log files will also be lost!
> The case may happen as follows:
> 1. Move region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits file that is being appended to by the split hlog thread.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
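
For readers unfamiliar with the "tmp file names" idea mentioned in the comment above, here is a minimal sketch of that general technique: write to a temporary name, rename when the writer is closed, and have the reader skip temporary names. This is not the attached patch and not HBase code. It uses java.nio.file on a local directory as a stand-in for HDFS, and the ".temp" suffix, the class name and both method names are assumptions made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class RecoveredEditsSketch {
    // Hypothetical marker for files that a writer is still appending to.
    static final String TEMP_SUFFIX = ".temp";

    // Writer side: write everything under the temporary name, then rename.
    // The final name only becomes visible once the file is complete.
    static Path writeEdits(Path dir, String name, List<String> edits) throws IOException {
        Path temp = dir.resolve(name + TEMP_SUFFIX);
        Files.write(temp, edits, StandardCharsets.UTF_8);
        return Files.move(temp, dir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: list only completed files, ignoring in-progress ones,
    // so the reader never replays or deletes a file that is still being written.
    static List<Path> completedFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (!p.getFileName().toString().endsWith(TEMP_SUFFIX)) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("recovered.edits");
        // Simulate a writer that is still in progress.
        Files.write(dir.resolve("123456" + TEMP_SUFFIX), List.of("partial"), StandardCharsets.UTF_8);
        System.out.println("visible while writing: " + completedFiles(dir).size());
        writeEdits(dir, "123457", List.of("edit-1", "edit-2"));
        System.out.println("visible after close:   " + completedFiles(dir).size());
    }
}
```

Note that this only hides the in-progress file from the reader. It does not answer the question raised above of why the region is opened before the split has finished.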
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156866#comment-13156866 ]

Ted Yu commented on HBASE-4862:

@Chunhui: Can you attach master and region server log snippets which would show us what happened ? Thanks
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156965#comment-13156965 ]

chunhui shen commented on HBASE-4862:

@Ted Yu @Todd Lipcon
It will happen concurrently in the following case:
1. Move region from server A to server B (for example, during a balance).
2. Kill server A and server B.
3. Restart server A and server B immediately.
Before we restart server A and server B, log data for this region appears in both servers' log files.
4. After we restart server B, ServerShutdownHandler processes this dead server and assigns this region.
5. At the same time, ServerShutdownHandler processes dead server B and splits server B's hlog.
Because 4 and 5 are concurrent, replayRecoveredEditsIfAny in 4 and the appending of log entries to this region's recovered.edits file are concurrent.
So, when the recovered.edits file is deleted by replayRecoveredEdits, an exception is thrown.

The master and region server logs in this case are as follows:

master log:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist. [Lease. Holder: DFSClient_hb_m_dw75.kgb.sqa.cm4:6_1321413286871, pendingcreates: 54]
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1542)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1533)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1449)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
    at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:96)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:49)
    at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:962)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:926)
    at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:898)

regionserver log:
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
2011-11-16 11:49:49,732 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Deleted recovered.edits file=hdfs://dw74.kgb.sqa.cm4:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156800103
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156967#comment-13156967 ]

chunhui shen commented on HBASE-4862:

After the region has been successfully moved from server A to server B, this region's entries in server A's log file are safe because they have already been flushed. However, if this exception is encountered while splitting the hlog, it affects the log data of the other regions in server A's log file.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156991#comment-13156991 ]

chunhui shen commented on HBASE-4862:

@Ted @Todd
I'm sorry my explanation was not clear. I think I should describe the detailed case first.
Throughout the following process, a client is putting data to region C.
1. Region C is successfully moved from server A to server B. At this moment, there are log entries for region C in both server A's log file and server B's log file.
2. Kill server A and server B.
3. Restart server B. Now the master starts ServerShutdownHandler for server B and assigns region C to server D.
4. Before region C is opened on server D, restart server A. Now the master starts ServerShutdownHandler for server A and splits server A's log file. Because there are log entries for region C in server A's log file (why? see 1), the split hlog thread creates a file F in region C's recovered.edits directory.
5. The region C opening process executes replayRecoveredEdits() and then deletes file F.
6. Therefore, step 4 throws an IOException saying that file F does not exist, which stops the parsing of the current server A hlog file. The other data in this hlog file is lost.

The posted region server log is server B's log, and it is doing replayRecoveredEditsIfAny(). Although it prints "Failed delete" for the file recovered.edits/13156791680, in fact this file has been deleted, and the master throws a "File does not exist" exception:
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist.

I'm not sure whether this is clear now. I'm waiting for your questions. Thanks!
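
The six steps above can be sketched as a small, deterministic model. This is only an illustration under simplifying assumptions, not HLogSplitter code: the log is a list of (region, edit) pairs, per-region output files are in-memory lists, the region open is injected at a chosen position, and the first failed append aborts the whole split. The class name, method name and region names X and Y are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SplitAbortSketch {
    /**
     * Splits one log into per-region outputs. Just before the entry at
     * openAtIndex is processed, the concurrent region open deletes the
     * output file of openedRegion (use -1 for "no concurrent open").
     * Returns what had been written when the split stopped.
     */
    static Map<String, List<String>> split(List<String[]> log, String openedRegion, int openAtIndex) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        Set<String> deleted = new HashSet<>();
        for (int i = 0; i < log.size(); i++) {
            if (i == openAtIndex) {
                // Step 5: the region open replays file F and deletes it.
                outputs.remove(openedRegion);
                deleted.add(openedRegion);
            }
            String region = log.get(i)[0];
            String edit = log.get(i)[1];
            if (deleted.contains(region)) {
                // Step 6: the append fails ("File does not exist") and the
                // split of this log stops; later entries are never written.
                break;
            }
            outputs.computeIfAbsent(region, r -> new ArrayList<>()).add(edit);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String[]> log = List.of(
            new String[] {"C", "e1"}, new String[] {"X", "e2"},
            new String[] {"C", "e3"}, new String[] {"X", "e4"},
            new String[] {"Y", "e5"});
        System.out.println("no race: " + split(log, "C", -1));
        System.out.println("race:    " + split(log, "C", 2));
    }
}
```

In the second call, the edits for regions X and Y that follow the failed append never reach their recovered.edits files, which is the loss described in step 6.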
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157250#comment-13157250 ]

Ted Yu commented on HBASE-4862:

Log snippets from Chunhui.
Region C was 3591e9867a4c125493dc82168854ea0c
File F was 13156791680

Master log:
{code}
2011-11-16 11:47:23,134 INFO org.apache.hadoop.hbase.master.ServerManager: Triggering server recovery; existingServer serverB,60020,1321415172631 looks stale
2011-11-16 11:47:23,134 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=serverB,60020,1321415172631 to dead servers, submitted shutdown handler to be executed, root=false, meta=true
2011-11-16 11:47:29,305 INFO org.apache.hadoop.hbase.master.ServerManager: Triggering server recovery; existingServer serverA,60020,1321415179549 looks stale
2011-11-16 11:47:29,305 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=serverA,60020,1321415179549 to dead servers, submitted shutdown handler to be executed, root=false, meta=false
2011-11-16 11:48:28,700 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 28 hlog(s) in hdfs://serverX:9000/hbase-common/.logs/serverB,60020,1321414043798
2011-11-16 11:48:30,657 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156800103 region=3591e9867a4c125493dc82168854ea0c
2011-11-16 11:49:17,855 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Closed path hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156800103 (wrote 75875 edits in 3228ms)
2011-11-16 11:49:19,629 INFO org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting 28 hlog(s) in hdfs://serverX:9000/hbase-common/.logs/serverA,60020,1321414056134
2011-11-16 11:49:20,650 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Creating writer path=hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 region=3591e9867a4c125493dc82168854ea0c
2011-11-16 11:49:36,731 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c. to serverD,60020,1321415224381
2011-11-16 11:49:49,755 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c. on serverD,60020,1321415224381
2011-11-16 11:50:13,030 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist.
2011-11-16 11:50:13,037 FATAL org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-1 Got while writing log entry to log org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680 File does not exist.
2011-11-16 11:50:13,051 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://serverX:9000/hbase-common/.logs/serverA,60020,1321414056134
{code}

Log from region server D:
{code}
2011-11-16 11:49:36,730 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.
2011-11-16 11:49:49,727 ERROR org.apache.hadoop.hbase.regionserver.HRegion: Failed delete of hdfs://serverX:9000/hbase-common/writetest1/3591e9867a4c125493dc82168854ea0c/recovered.edits/13156791680
2011-11-16 11:49:49,733 INFO org.apache.hadoop.hbase.regionserver.HRegion: Onlined writetest1,19ILNKUHRKQ3BT0FLC9CMVWBP2ZPRV4W7XYA491BE6ZS2JE9132BO5GABIHNJHDU79TXBA4OOAP8OEIVTQ0PDHZB26QI5XHY17BK,1321267032810.3591e9867a4c125493dc82168854ea0c.; next sequenceid=13160672878
{code}
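
To make the overlap in these snippets explicit, the following small sketch checks that the region server's delete attempt on file F falls between the master's "Creating writer" entry for F and the split failure. The three timestamps are copied from the logs above. The class and method names are made up for illustration, and the sketch assumes the master and region server clocks are comparable.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

class SplitTimelineSketch {
    // Time-of-day format used in the log lines above, e.g. 11:49:20,650.
    static final DateTimeFormatter LOG_TIME = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    // Milliseconds elapsed from one log timestamp to another (same day).
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from, LOG_TIME), LocalTime.parse(to, LOG_TIME)).toMillis();
    }

    // True if the event happened inside the [start, end] window.
    static boolean within(String start, String event, String end) {
        return millisBetween(start, event) >= 0 && millisBetween(event, end) >= 0;
    }

    public static void main(String[] args) {
        String writerCreated = "11:49:20,650"; // master: Creating writer ... recovered.edits/13156791680
        String deleteLogged = "11:49:49,727";  // server D: Failed delete of ... recovered.edits/13156791680
        String splitFailed = "11:50:13,037";   // master: FATAL ... File does not exist.

        System.out.println(within(writerCreated, deleteLogged, splitFailed)); // true
        System.out.println(millisBetween(writerCreated, deleteLogged));       // 29077
        System.out.println(millisBetween(deleteLogged, splitFailed));         // 23310
    }
}
```

So the writer for file F had been open for about 29 seconds when region server D touched the file, and the master reported the failure about 23 seconds later.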
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157270#comment-13157270 ]

ramkrishna.s.vasudevan commented on HBASE-4862:

If the scenario is valid do we need to up the priority of this defect? But may not be common.

> Splitting hlog and opening region concurrently may cause data loss
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157289#comment-13157289 ]

Hadoop QA commented on HBASE-4862:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505060/hbase-4862v1+for+trunk.diff
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:
       org.apache.hadoop.hbase.master.TestRollingRestart
       org.apache.hadoop.hbase.master.TestRestartCluster
       org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler
       org.apache.hadoop.hbase.regionserver.wal.TestHLogBench
       org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
       org.apache.hadoop.hbase.regionserver.TestAtomicOperation
       org.apache.hadoop.hbase.TestInfoServers
       org.apache.hadoop.hbase.regionserver.TestParallelPut
       org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
       org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
       org.apache.hadoop.hbase.TestRegionRebalancing
       org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
       org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
       org.apache.hadoop.hbase.ipc.TestDelayedRpc
       org.apache.hadoop.hbase.master.TestDistributedLogSplitting
       org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
       org.apache.hadoop.hbase.master.TestHMasterRPCException
       org.apache.hadoop.hbase.regionserver.TestHRegion
       org.apache.hadoop.hbase.client.TestMultipleTimestamps
       org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
       org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
       org.apache.hadoop.hbase.client.TestMetaScanner
       org.apache.hadoop.hbase.master.TestMaster
       org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
       org.apache.hadoop.hbase.TestDrainingServer
       org.apache.hadoop.hbase.regionserver.TestSplitLogWorker
       org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
       org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion
       org.apache.hadoop.hbase.avro.TestAvroServer
       org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
       org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit
       org.apache.hadoop.hbase.thrift.TestThriftServer
       org.apache.hadoop.hbase.regionserver.TestRegionServerMetrics
       org.apache.hadoop.hbase.master.TestMasterFailover
       org.apache.hadoop.hbase.regionserver.wal.TestHLog
       org.apache.hadoop.hbase.TestMultiVersions
       org.apache.hadoop.hbase.master.TestMasterTransitions
       org.apache.hadoop.hbase.master.TestSplitLogManager
       org.apache.hadoop.hbase.master.TestOpenedRegionHandler
       org.apache.hadoop.hbase.thrift.TestThriftServerCmdLine

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/369//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/369//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/369//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157327#comment-13157327 ]

Hadoop QA commented on HBASE-4862:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505162/4862.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/371//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/371//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/371//console

This message is automatically generated.

> Splitting hlog and opening region concurrently may cause data loss
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for trunk.diff
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157334#comment-13157334 ]

Hadoop QA commented on HBASE-4862:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505167/4862.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/373//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/373//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/373//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157339#comment-13157339 ]

Ted Yu commented on HBASE-4862:

I could run test suite by executing 'mvn test' on MacBook.
PreCommit builds 371 and 373 didn't run any tests.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157341#comment-13157341 ]

Ted Yu commented on HBASE-4862:

When attaching a patch, please grant license to the ASF.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157367#comment-13157367 ]

Hadoop QA commented on HBASE-4862:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505172/hbase-4862v1+for+trunk.diff
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/374//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/374//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/374//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157379#comment-13157379 ]

Ted Yu commented on HBASE-4862:

Chunhui ran the patch through the test suite.
The OS is: Red Hat Enterprise Linux Server release 5.4 (Tikanga)
{code}
Results :
Failed tests:
  testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)
Tests run: 1173, Failures: 2, Errors: 0, Skipped: 8
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 2:02:44.930s
{code}
The testClosing failure is captured in HBASE-4874.
TestReplicationPeer passed when run manually.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157385#comment-13157385 ]

Ted Yu commented on HBASE-4862:

@Todd:
Do you need more details from Chunhui?

Thanks
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157388#comment-13157388 ]

Ted Yu commented on HBASE-4862:

{code}
+if (fileName.endsWith(HLog.RECOVERED_LOG_TMPFILE_SUFFIX))
+  fileName = fileName.split(HLog.RECOVERED_LOG_TMPFILE_SUFFIX)[0];
{code}
Please enclose the second line above in curly braces.

W.r.t. the fs.rename() call, here is the javadoc from ClientProtocol.rename (which is called by fs.rename):
{code}
   * @return true if successful, or false if the old name does not exist
   * or if the new name already belongs to the namespace.
{code}
We should check the return value in addition to catching the exception.
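The two review points in this comment can be sketched in plain Java. This is only an illustration under stated assumptions, not the code from the attached patches: `java.io.File.renameTo` stands in for Hadoop's `fs.rename` (both report failure through a boolean return value), the class and method names are hypothetical, and the suffix value `.temp` is assumed because the thread shows only the constant's name. The sketch also strips the suffix with `substring` rather than `String.split`, since `split` interprets its argument as a regular expression.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper class, for illustration only.
class RecoveredEditsNames {
    // Assumed value; the thread names the constant but not its contents.
    static final String TMP_SUFFIX = ".temp";

    // Strip the temporary suffix without going through a regex.
    static String stripTmpSuffix(String fileName) {
        if (fileName.endsWith(TMP_SUFFIX)) {
            return fileName.substring(0, fileName.length() - TMP_SUFFIX.length());
        }
        return fileName;
    }

    // Treat a false return value as a failure instead of ignoring it.
    static void renameOrThrow(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("Failed renaming " + src + " to " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("recovered.edits").toFile();
        File tmp = new File(dir, "123456" + TMP_SUFFIX);
        if (!tmp.createNewFile()) {
            throw new IOException("Could not create " + tmp);
        }
        File dst = new File(dir, stripTmpSuffix(tmp.getName()));
        renameOrThrow(tmp, dst);
        System.out.println(dst.getName());
    }
}
```

Throwing on a false return makes a failed rename visible to the caller, which is the behaviour the review asks for; whether an exception or a retry is the right reaction in the real split path is a separate design decision.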
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157406#comment-13157406 ]

Ted Yu commented on HBASE-4862:

Thanks for the quick turnaround.
{code}
+throw new IOException("Failed rename " + wap.p + " to " + dst);
{code}
The above should read 'Failed renaming '.

For HLog.java:
{code}
+  if (p.getName().endsWith(RECOVERED_LOG_TMPFILE_SUFFIX))
+result = false;
{code}
Please add curly braces for the above as well.
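The HLog.java check quoted in this comment appears to exclude files that still carry the temporary suffix, so that a region being opened does not pick up a file the split thread is still writing. A minimal sketch of that kind of filter, with braces as requested in the review, might look as follows. The class name, method names, and the `.temp` suffix value are assumptions made for illustration; this is not the patch itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical filter, for illustration only.
class RecoveredEditsFilter {
    // Assumed value; the thread names the constant but not its contents.
    static final String TMP_SUFFIX = ".temp";

    // A file is replayable only once it no longer has the temporary suffix.
    static boolean isReplayable(String fileName) {
        boolean result = true;
        if (fileName.endsWith(TMP_SUFFIX)) {
            result = false;
        }
        return result;
    }

    // Keep only the names that are safe to replay, preserving their order.
    static List<String> replayable(List<String> fileNames) {
        List<String> out = new ArrayList<>();
        for (String name : fileNames) {
            if (isReplayable(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("123456", "123789.temp");
        System.out.println(replayable(names)); // prints [123456]
    }
}
```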
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157407#comment-13157407 ]

Hadoop QA commented on HBASE-4862:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505178/hbase-4862v2fortrunk.diff
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/376//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/376//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/376//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157410#comment-13157410 ]

chunhui shen commented on HBASE-4862:

@Ted
I have amended the patch again.
Please check.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157411#comment-13157411 ]

Ted Yu commented on HBASE-4862:

+1 on patch v3.

Please run the patch for 0.90 through the test suite and let us know the results.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157412#comment-13157412 ]

Hadoop QA commented on HBASE-4862:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505180/hbase-4862v3fortrunk.diff
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/377//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/377//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/377//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157525#comment-13157525 ]

shenchunhui commented on HBASE-4862:

Ted,
I found that patch v3 causes some test failures after changing
  fs.rename(wap.p, dst)
to
  if (!fs.rename(wap.p, dst)) {
    throw new IOException("Failed renaming " + wap.p + " to " + dst);
  }

I will amend it and give you the test results later.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157636#comment-13157636 ]

Jonathan Hsieh commented on HBASE-4862:

How feasible is it to add testing to this patch? Maybe simulate the failure situation by aborting RS's and then starting them like in the TestSplitTransactionOnCluster tests?
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157638#comment-13157638 ]

chunhui shen commented on HBASE-4862:

@Jonathan
I think we could add a test to this patch by replaying the region's recovered edits after the writer has been created during log splitting.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157652#comment-13157652 ]

chunhui shen commented on HBASE-4862:
-------------------------------------

@Ted
I added a test to this patch in patchV5.
On Red Hat Enterprise Linux Server release 5.4 (Tikanga), the test results are as follows:

For trunk with patchV5:

Results :
Failed tests:
  testResetZooKeeperSession(org.apache.hadoop.hbase.replication.TestReplicationPeer): ReplicationPeer ZooKeeper session was not properly expired.
  testClosing(org.apache.hadoop.hbase.client.TestHCM)
Tests run: 1174, Failures: 2, Errors: 0, Skipped: 8
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 2:00:49.122s
[INFO] Finished at: Sun Nov 27 02:41:40 CST 2011
[INFO] Final Memory: 35M/361M
[INFO]

For 0.90 with patchV5:

Results :
Tests run: 702, Failures: 0, Errors: 0, Skipped: 9
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 1:15:37.342s
[INFO] Finished at: Sun Nov 27 11:00:07 CST 2011
[INFO] Final Memory: 26M/525M
[INFO]

The two failed tests in trunk are the same as in the last run. One of them (TestReplicationPeer#testResetZooKeeperSession) passes when run separately, and the other is related to HBASE-4874.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157662#comment-13157662 ]

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505225/hbase-4862v5fortrunk.diff
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:
        org.apache.hadoop.hbase.master.TestRollingRestart
        org.apache.hadoop.hbase.util.TestRegionSplitter
        org.apache.hadoop.hbase.client.TestMultiParallel
        org.apache.hadoop.hbase.master.TestRestartCluster
        org.apache.hadoop.hbase.thrift2.TestThriftHBaseServiceHandler
        org.apache.hadoop.hbase.client.TestInstantSchemaChange
        org.apache.hadoop.hbase.regionserver.wal.TestHLogBench
        org.apache.hadoop.hbase.rest.TestGzipFilter
        org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
        org.apache.hadoop.hbase.regionserver.TestAtomicOperation
        org.apache.hadoop.hbase.rest.TestScannersWithFilters
        org.apache.hadoop.hbase.TestInfoServers
        org.apache.hadoop.hbase.regionserver.TestParallelPut
        org.apache.hadoop.hbase.coprocessor.TestClassLoading
        org.apache.hadoop.hbase.client.TestAdmin
        org.apache.hadoop.hbase.regionserver.wal.TestLogRolling
        org.apache.hadoop.hbase.filter.TestColumnRangeFilter
        org.apache.hadoop.hbase.mapred.TestTableInputFormat
        org.apache.hadoop.hbase.client.TestHCM
        org.apache.hadoop.hbase.regionserver.TestStoreFileBlockCacheSummary
        org.apache.hadoop.hbase.util.hbck.TestOfflineMetaRebuildHole
        org.apache.hadoop.hbase.coprocessor.TestMasterObserver
        org.apache.hadoop.hbase.rest.TestStatusResource
        org.apache.hadoop.hbase.TestRegionRebalancing
        org.apache.hadoop.hbase.regionserver.wal.TestLogRollAbort
        org.apache.hadoop.hbase.rest.TestVersionResource
        org.apache.hadoop.hbase.client.TestScannerTimeout
        org.apache.hadoop.hbase.client.TestFromClientSide
        org.apache.hadoop.hbase.regionserver.TestFSErrorsExposed
        org.apache.hadoop.hbase.coprocessor.TestAggregateProtocol
        org.apache.hadoop.hbase.rest.TestRowResource
        org.apache.hadoop.hbase.rest.TestScannerResource
        org.apache.hadoop.hbase.ipc.TestDelayedRpc
        org.apache.hadoop.hbase.rest.client.TestRemoteAdmin
        org.apache.hadoop.hbase.util.TestFSUtils
        org.apache.hadoop.hbase.master.TestDistributedLogSplitting
        org.apache.hadoop.hbase.rest.TestTableResource
        org.apache.hadoop.hbase.regionserver.wal.TestWALReplay
        org.apache.hadoop.hbase.util.TestIdLock
        org.apache.hadoop.hbase.catalog.TestCatalogTrackerOnCluster
        org.apache.hadoop.hbase.rest.TestTransform
        org.apache.hadoop.hbase.coprocessor.TestCoprocessorEndpoint
        org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
        org.apache.hadoop.hbase.regionserver.TestHRegion
        org.apache.hadoop.hbase.client.TestMultipleTimestamps
        org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
        org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
        org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
        org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
        org.apache.hadoop.hbase.client.TestMetaScanner
        org.apache.hadoop.hbase.io.hfile.TestHFileBlock
        org.apache.hadoop.hbase.client.TestTimestampsFilter
        org.apache.hadoop.hbase.client.TestInstantSchemaChangeFailover
        org.apache.hadoop.hbase.client.TestShell
        org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
        org.apache.hadoop.hbase.coprocessor.TestRegionObserverBypass
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157671#comment-13157671 ]

Jonathan Hsieh commented on HBASE-4862:
---------------------------------------

@chunhui
I have a question and a few nits.

What happens if the .temp gets left behind without being renamed?

You might want to mention that hlog files in progress (.temp file suffixed) are excluded here.
{code}
+    // After creating writer, simulate partial region's
+    // replayRecoveredEditsIfAny() which gets SplitEditFiles of this
+    // region,and delete them.
{code}

Also, you probably want to update the javadoc of getSplitEditFilesSorted.

The comment should probably say "most likely" instead of "mostly":
{code}
+    try{
+      logSplitter.splitLog();
+    } catch (IOException e) {
+      LOG.info(e);
+      Assert.fail("Throws IOException when spliting "
+          + "log, it is mostly because writing file does not "
+          + "exist which is caused by concurrent replayRecoveredEditsIfAny()");
+    }
+    if (fs.exists(corruptDir)) {
+      if (fs.listStatus(corruptDir).length > 0) {
+        Assert.fail("There are some corrupt logs, "
+            + "it is mostly caused by concurrent replayRecoveredEditsIfAny()");
+      }
+    }
+  }
{code}
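The exclusion Jonathan mentions (in-progress files carrying a .temp suffix are skipped when a region collects its split edit files) can be sketched roughly as below. This is only an illustration on the local filesystem using java.nio, not the HBase code from the patch: the class name SplitEditFiles and the method listCompletedSorted are hypothetical, and only the ".temp" suffix comes from the thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase: lists the edit files a region
// could safely replay, ignoring files that a splitter is still writing.
final class SplitEditFiles {
    // Suffix taken from the discussion; everything else is an assumption.
    static final String TEMP_SUFFIX = ".temp";

    /** Returns completed edit files in sorted order, skipping ".temp" files. */
    static List<Path> listCompletedSorted(Path editsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(editsDir)) {
            return result; // nothing to replay
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(editsDir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // A ".temp" file is still being appended to by the splitter,
                // so the opening region must neither replay nor delete it.
                if (Files.isRegularFile(p) && !name.endsWith(TEMP_SUFFIX)) {
                    result.add(p);
                }
            }
        }
        // Path ordering is lexicographic here; zero-padded names sort numerically.
        Collections.sort(result);
        return result;
    }
}
```

With such a filter in place, a region that opens while a split is still running sees only finished files, so it cannot delete the file the splitter is appending to.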
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157678#comment-13157678 ]

chunhui shen commented on HBASE-4862:
-------------------------------------

@Jonathan
What happens if the .temp gets left behind without being renamed?

If the .temp file gets left behind, it means the log split failed, and the .temp file will be deleted the next time that log is split.
You can see that, for the same hlog file being split, a file with the same name is created in the region's recoverd.edits directory.

Thanks for your suggestion.
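The behaviour chunhui describes (write under a .temp name, rename when done, and let the next split remove a leftover .temp file of the same name) can be sketched as follows. This is a minimal local-filesystem illustration using java.nio, assuming plain text entries; the class TempThenRenameWriter and method writeEdits are hypothetical names and are not taken from the patch.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch, not HBase code: publish an edits file only after
// it has been fully written, using a ".temp" name while in progress.
final class TempThenRenameWriter {
    static final String TEMP_SUFFIX = ".temp"; // suffix named in the thread

    static Path writeEdits(Path editsDir, String fileName, List<String> entries)
            throws IOException {
        Files.createDirectories(editsDir);
        Path tmp = editsDir.resolve(fileName + TEMP_SUFFIX);
        Path dst = editsDir.resolve(fileName);

        // A leftover temp file means an earlier split of the same log failed.
        // The same log always maps to the same name, so it is safe to discard.
        Files.deleteIfExists(tmp);

        // All appends go to the temp file, which readers are expected to ignore.
        Files.write(tmp, entries, StandardCharsets.UTF_8);

        // Only a completely written file becomes visible under the final name.
        return Files.move(tmp, dst, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the temporary name is derived from the final name, a retried split overwrites its own debris rather than accumulating orphaned files.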
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157680#comment-13157680 ]

Ted Yu commented on HBASE-4862:
-------------------------------

{code}
    // Skip the test which creates a splitter that reads and writes the
    // data without touching disk. testThreading#TestHLogSplit .etc
    if (fs.exists(wap.p)) {
{code}
The javadoc should read:
{code}
    // Skip the unit tests which create a splitter that reads and writes the
    // data without touching disk. TestHLogSplit#testThreading is an example.
{code}
A specific test is represented by classname#testname.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157918#comment-13157918 ]

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505251/4862-v6-trunk.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/379//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/379//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/379//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157919#comment-13157919 ]

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505252/4862-v6-90.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/380//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158141#comment-13158141 ]

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505283/4862-v6-trunk.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/387//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/387//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/387//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158146#comment-13158146 ]

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505285/hbase-4862v7fortrunk.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 javadoc. The javadoc tool appears to have generated -162 warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    -1 findbugs. The patch appears to introduce 67 new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/388//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/388//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/388//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158151#comment-13158151 ]

Hadoop QA commented on HBASE-4862:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12505287/4862-0.92.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/389//console

This message is automatically generated.
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158177#comment-13158177 ]

Ted Yu commented on HBASE-4862:
-------------------------------

Integrated to 0.90, 0.92 branches and TRUNK.
Thanks for the patch Chunhui.
Thanks for the review Jonathan.

> Splitting hlog and opening region concurrently may cause data loss
> ------------------------------------------------------------------
>
> Key: HBASE-4862
> URL: https://issues.apache.org/jira/browse/HBASE-4862
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.2
> Reporter: chunhui shen
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.92.0, 0.94.0, 0.90.5
>
> Attachments: 4862-0.92.txt, 4862-v6-90.txt, 4862-v6-trunk.patch,
> 4862.patch, 4862.txt, hbase-4862v1 for 0.90.diff, hbase-4862v1 for 0.90.diff,
> hbase-4862v1 for trunk.diff, hbase-4862v1 for trunk.diff,
> hbase-4862v2for0.90.diff, hbase-4862v2fortrunk.diff,
> hbase-4862v3for0.90.diff, hbase-4862v3fortrunk.diff,
> hbase-4862v5for0.90.diff, hbase-4862v5fortrunk.diff,
> hbase-4862v7for0.90.patch, hbase-4862v7fortrunk.patch
>
> Case Description:
> 1. The split hlog thread creates a writer for the file region A/recovered.edits/123456 and is appending log entries.
> 2. The regionserver is opening region A, and in replayRecoveredEditsIfAny() it deletes the file region A/recovered.edits/123456.
> 3. The split hlog thread catches the IOException and stops parsing this log file. If skipError = true, it adds the file to the corrupt logs. However, data for other regions in this log file will be lost.
> 4. If skipError = false, it checks the filesystem. The filesystem is of course fine, so it only prints an error log and continues assigning regions. Therefore, data in other log files will also be lost.
> The case may happen as follows:
> 1. Move a region from server A to server B
> 2. Kill server A and server B
> 3. Restart server A and server B
> We could prevent this exception by forbidding deletion of a recovered.edits file that the split hlog thread is still appending to.
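
The protection described in the case description (region open must never delete a recovered.edits file that the split thread is still appending to) can be illustrated with a temp-name convention: the splitter writes under a temporary name and renames the file only when it is complete, and the region-open side ignores temporary names. This is only a minimal sketch under stated assumptions, not necessarily what the attached patches do. The class name, the method names and the `.temp` suffix are hypothetical, and plain `java.nio.file` stands in for the HDFS FileSystem API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical illustration only; not HBase code. */
final class RecoveredEditsSketch {
    // Assumed marker for a file the splitter has not finished writing.
    static final String TEMP_SUFFIX = ".temp";

    /** Splitter side: write all edits under a temp name, then publish by rename. */
    static Path writeEdits(Path editsDir, String name, List<String> edits) throws IOException {
        Files.createDirectories(editsDir);
        Path tmp = editsDir.resolve(name + TEMP_SUFFIX);
        Files.write(tmp, edits, StandardCharsets.UTF_8);
        // The final name becomes visible only after the file is complete.
        return Files.move(tmp, editsDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
    }

    /** Region-open side: return completed files only, skipping in-progress ones. */
    static List<Path> completedEdits(Path editsDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> files = Files.list(editsDir)) {
            files.filter(p -> !p.getFileName().toString().endsWith(TEMP_SUFFIX))
                 .sorted()
                 .forEach(result::add);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("regionA").resolve("recovered.edits");
        Files.createDirectories(dir);
        // Simulate a file that another split thread is still appending to.
        Files.write(dir.resolve("123457" + TEMP_SUFFIX), List.of("partial"));
        writeEdits(dir, "123456", List.of("edit-1", "edit-2"));
        // Only the completed file is offered for replay and later deletion.
        System.out.println(completedEdits(dir));
    }
}
```

With this convention the replay code can delete whatever `completedEdits` returns without racing the writer. It does not answer the broader question of why a region is opened before its log split has finished.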
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158178#comment-13158178 ]

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-TRUNK #2490 (See [https://builds.apache.org/job/HBase-TRUNK/2490/])
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158266#comment-13158266 ]

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-0.92-security #20 (See [https://builds.apache.org/job/HBase-0.92-security/20/])
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)
move JIRA to 0.90 section in CHANGES.txt
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158321#comment-13158321 ]

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-TRUNK-security #12 (See [https://builds.apache.org/job/HBase-TRUNK-security/12/])
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)
Move JIRA to 0.90 section
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)

tedyu :
Files :
* /hbase/trunk/CHANGES.txt

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158347#comment-13158347 ]

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-TRUNK #2491 (See [https://builds.apache.org/job/HBase-TRUNK/2491/])
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)
Move JIRA to 0.90 section

tedyu :
Files :
* /hbase/trunk/CHANGES.txt
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158835#comment-13158835 ]

stack commented on HBASE-4862:
------------------------------

This is integrated. Can we close it?
[jira] [Commented] (HBASE-4862) Splitting hlog and opening region concurrently may cause data loss
[ https://issues.apache.org/jira/browse/HBASE-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159790#comment-13159790 ]

Hudson commented on HBASE-4862:
-------------------------------

Integrated in HBase-0.92 #163 (See [https://builds.apache.org/job/HBase-0.92/163/])
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)
move JIRA to 0.90 section in CHANGES.txt
HBASE-4862 Splitting hlog and opening region concurrently may cause data loss (Chunhui Shen)

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt

tedyu :
Files :
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java