[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510470#comment-14510470 ] Jeffrey Zhong commented on HBASE-13389:
---
{quote} I don't see the WALEdit sequenceid being used when we replicate. Is this something to implement? (Sounds like a good idea...) {quote}
[~saint@gmail.com] I thought we were already using it, since replication already did; otherwise I can take a first stab at implementing it.

[REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
Key: HBASE-13389
URL: https://issues.apache.org/jira/browse/HBASE-13389
Project: HBase
Issue Type: Sub-task
Components: Performance
Reporter: stack
Attachments: 13389.txt

HBASE-12600 moved the edit sequenceid from tags to instead exploit the mvcc/sequenceid slot in a key. Now Cells near-always have an associated mvcc/sequenceid, where previously it was rare or the mvcc was kept at the file level. This is sort of how it should be, many of us would argue, but as a side effect of the change, read-time optimizations that helped speed scans were undone. In this issue, let's see if we can get the optimizations back -- or just remove them altogether. The parse of mvcc/sequenceid is expensive; it was noticed over in HBASE-13291. The optimizations undone by this change are (to quote the optimizer himself, Mr [~lhofhansl]):
{quote} Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166. We're always storing the mvcc readpoints, and we never compare them against the actual smallestReadpoint, and hence we're always performing all the checks, tests, and comparisons that these jiras removed, in addition to actually storing the data - which at up to 8 bytes per Cell is not trivial. {quote}
This is the 'breaking' change: https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508495#comment-14508495 ] Jeffrey Zhong commented on HBASE-13389:
---
[~saint@gmail.com] Well said and good examples! As of today there are two cases where we can get out-of-order puts: DLR or replication, where the order in which WAL files are replayed isn't guaranteed. For non-adjacent hfile compactions, it seems we have to keep mvcc at the KV level. For example, take hfile1 (max mvcc=1), hfile2 (max mvcc=2), and hfile3 (max mvcc=3). If we compact just hfile1 and hfile3, we can't set the newly compacted hfile's max mvcc to 3, because hfile2 may contain the same rows as hfile1 or hfile3. Keeping mvcc makes the haunting out-of-order issue go away, which is one less concern. Let me know which option we should go with; I can also help on the fix.
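The non-adjacent compaction concern above can be sketched with a toy model (hypothetical types and a simplified highest-seqId-wins rule, not HBase's actual compaction code):

```java
import java.util.List;
import java.util.Map;

// Toy model: each HFile maps row -> seqId, and the winning version of a row
// is the one with the highest sequence id across all files that contain it.
public class NonAdjacentCompaction {
    static long winner(List<Map<String, Long>> files, String row) {
        long best = -1;
        for (Map<String, Long> f : files) {
            Long seq = f.get(row);
            if (seq != null && seq > best) {
                best = seq;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Long> hfile1 = Map.of("row1", 1L); // max mvcc = 1
        Map<String, Long> hfile2 = Map.of("row1", 2L); // max mvcc = 2
        Map<String, Long> hfile3 = Map.of("rowX", 3L); // max mvcc = 3

        // Compacting only hfile1 and hfile3 must keep row1's per-cell seqId=1;
        // if the merged output were stamped with a file-level mvcc of 3 instead,
        // row1 from hfile2 (seqId=2) would wrongly lose to the compacted copy.
        long w = winner(List.of(hfile1, hfile2, hfile3), "row1");
        if (w != 2L) {
            throw new AssertionError("hfile2's newer row1 must win, got seqId=" + w);
        }
    }
}
```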
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502173#comment-14502173 ] Jeffrey Zhong commented on HBASE-13389:
---
{quote} All other cases we should be covering with metadata in the HFiles trailer, not on individual Cells. {quote}
This may be hard to achieve because out-of-order puts can be flushed at different times. Say row1/logSeqId=2 is flushed earlier than row1/logSeqId=1: the mvcc ranges in the HFile trailer metadata will then overlap across multiple HFiles. One option is to reinstate your original code, checking against the oldest running scanner, and only keep mvcc around during region recovery, so we can still meet the HBASE-12600 goal. If there isn't much overall read-performance degradation (this part may not be the bottleneck in the read path), I think it's better to keep the current approach so that all the out-of-order-put cases work correctly. What do you think? Thanks.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502373#comment-14502373 ] Jeffrey Zhong commented on HBASE-13389:
---
That sounds good. We can shorten the time period to 2 or 3 days. In one case, keeping mvcc longer can even gain some performance, because it makes it possible to compact HFiles out of order in minor compactions.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482558#comment-14482558 ] Jeffrey Zhong commented on HBASE-13389:
---
Changing the comparison order to row - column - ts - seqId - type makes things more consistent and doesn't change HBase's current idempotence. For example, for puts with the same timestamp the last put wins, while if we do put, delete, put (or delete, put, put) the delete always wins. I think it's better to treat a delete like a put, so users get the same expectations as with puts. Otherwise, on an OS with low time resolution, or when a put goes missing, we often have to check whether a delete is overshadowing newer puts. Yeah, keeping mvcc for 3 days is good enough.
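A sketch of the proposed ordering (hypothetical Cell record, not HBase's actual KeyValue comparator): comparing seqId, descending, before type means the most recently written edit sorts first, whether it is a Put or a Delete:

```java
// Proposed comparison order: row, column, timestamp (newest first), then
// seqId (highest first) before type. With seqId ahead of type, "last write
// wins" holds for deletes exactly as it does for puts.
public class CellOrder {
    public record Cell(String row, String col, long ts, long seqId, int type) {}

    public static int compare(Cell a, Cell b) {
        int c = a.row().compareTo(b.row());
        if (c != 0) return c;
        c = a.col().compareTo(b.col());
        if (c != 0) return c;
        c = Long.compare(b.ts(), a.ts());       // newer timestamp sorts first
        if (c != 0) return c;
        c = Long.compare(b.seqId(), a.seqId()); // higher seqId (later write) first
        if (c != 0) return c;
        return Integer.compare(a.type(), b.type());
    }
}
```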
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14481348#comment-14481348 ] Jeffrey Zhong commented on HBASE-13389:
---
{quote} To what does the above statement apply? To all three of your 'cases' or just to the last case, case #3? {quote}
Just case #3. The other two cases need mvcc around for a little while.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395994#comment-14395994 ] Jeffrey Zhong commented on HBASE-13389:
---
Thanks [~lhofhansl] for looking at this. I think your patch can help a bit.
{quote} Do we need valid (non 0) mvcc readpoints for committed data (i.e. data that was flushed to an HFile and hence we'll never need to replay any HLogs for those)? Do we need these anywhere but in the memstore? {quote}
There are three cases (that I could think of, and maybe more) where we need the logSeqId (mvcc) around to keep the put order. Assume all puts/deletes are on the same row with the same timestamp (version).

Case 1) Region server recovery. We need mvcc (logSeqId) only while the region is in recovery mode, not after recovery.

Case 2) Replication receiving side. We need logSeqId to maintain order, because a region move or recovery on the replication source side can send puts out of order. We need mvcc for a couple of days (to be safe) so that the data on the receiving side is at least eventually correct.

Case 3) Put, delete, put. Currently the delete overshadows the later put, but with logSeqId we can easily solve the issue, because logSeqId is the real version of a put. Seems to me this one isn't needed (before, I thought we needed to keep mvcc around until a major compaction).
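Case 2 above can be sketched as a last-writer-wins apply keyed by logSeqId (a toy model with hypothetical names, not HBase's replication sink): even when replicated edits arrive out of order, keeping the seqId lets the receiving side converge on the correct value:

```java
import java.util.HashMap;
import java.util.Map;

// Toy replication sink: rows map to {seqId, value}. An incoming edit only
// replaces the stored value if it carries a higher seqId, so out-of-order
// delivery still converges on the latest write.
public class SeqIdSink {
    private final Map<String, long[]> store = new HashMap<>();

    public void apply(String row, long seqId, long value) {
        long[] cur = store.get(row);
        if (cur == null || seqId > cur[0]) {
            store.put(row, new long[] { seqId, value });
        }
    }

    public long valueOf(String row) {
        return store.get(row)[1];
    }
}
```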
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14395995#comment-14395995 ] Jeffrey Zhong commented on HBASE-13389:
---
Another thought: if we keep mvcc as part of the key byte array (logically it already is, though not in key serialization/deserialization), then we could use a lazy-read approach, because the mvcc is hardly ever used during key comparison.
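The lazy-read idea can be sketched as follows (hypothetical class and a simplified varint decoder for illustration, not HBase's actual encoding): the mvcc bytes stay in the serialized buffer and are only decoded on first access.

```java
// Sketch: keep the mvcc bytes in the serialized key region but decode them
// lazily, since the sequence id is rarely needed during key comparison.
public class LazyMvccCell {
    private final byte[] buf;
    private final int mvccOffset;
    private long mvcc = -1; // -1 = not yet decoded

    public LazyMvccCell(byte[] buf, int mvccOffset) {
        this.buf = buf;
        this.mvccOffset = mvccOffset;
    }

    public long getSequenceId() {
        if (mvcc < 0) {
            mvcc = decodeVLong(buf, mvccOffset); // parse on first use only
        }
        return mvcc;
    }

    // Simplified little-endian base-128 varint decoder, for illustration.
    static long decodeVLong(byte[] b, int off) {
        long v = 0;
        int shift = 0;
        while (true) {
            byte x = b[off++];
            v |= (long) (x & 0x7f) << shift;
            if ((x & 0x80) == 0) {
                return v;
            }
            shift += 7;
        }
    }
}
```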
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394111#comment-14394111 ] Jeffrey Zhong commented on HBASE-13389:
---
[~stack] The performance regression is due to keeping mvcc values longer (HBASE-11315), which led to the later change https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96. I'm surprised the extra mvcc value caused so much perf regression. Here is the code in Compactor.java which calculates the minSeqId to keep in the file during compaction:
{code}
// when isAllFiles is true, all files are compacted so we can calculate the smallest
// MVCC value to keep
if (fd.minSeqIdToKeep < file.getMaxMemstoreTS()) {
  fd.minSeqIdToKeep = file.getMaxMemstoreTS();
}

// output to writer:
for (Cell c : cells) {
  if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
    CellUtil.setSequenceId(c, 0);
  }
}
{code}
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394114#comment-14394114 ] Jeffrey Zhong commented on HBASE-13389:
---
Should we make the time-period configuration shorter, or revert all the related changes? Thanks.
[jira] [Commented] (HBASE-13172) TestDistributedLogSplitting.testThreeRSAbort fails several times on branch-1
[ https://issues.apache.org/jira/browse/HBASE-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352559#comment-14352559 ] Jeffrey Zhong commented on HBASE-13172:
---
I just skimmed through the thread. It seems the test was stuck in isServerReachable(). [~Apache9] To make the test case stable, you can set the config hbase.master.maximum.ping.server.attempts to 3 (by default it's 10). For the isServerReachable() call, inside the IOException catch block we should check the following conditions and return false immediately when either is true: 1) the current server has already been put in deadServers; 2) the IOException is a RegionServerStoppedException or ServerNotRunningYetException.
[~jxiang] The following code inside RegionStates seems unnecessary and should just return false (because the result of the isServerReachable call may still return false-positive info after retries). In addition, should we expire the server instead of directly putting it in deadServers? Thanks.
{code}
if (serverManager.isServerReachable(server)) {
  return false;
}
// The size of deadServers won't grow unbounded.
deadServers.put(hostAndPort, Long.valueOf(startCode));
{code}

TestDistributedLogSplitting.testThreeRSAbort fails several times on branch-1
Key: HBASE-13172
URL: https://issues.apache.org/jira/browse/HBASE-13172
Project: HBase
Issue Type: Bug
Components: test
Affects Versions: 1.1.0
Reporter: zhangduo

The direct reason is we are stuck in ServerManager.isServerReachable.
https://builds.apache.org/job/HBase-1.1/253/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testThreeRSAbort/
{noformat}
2015-03-06 04:06:19,430 DEBUG [AM.-pool300-t1] master.ServerManager(855): Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=0 of 10
2015-03-06 04:07:10,545 DEBUG [AM.-pool300-t1] master.ServerManager(855): Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=9 of 10
{noformat}
The interval between the first and last retry log is about 1 minute, and we only wait 1 minute, so the test times out. Still do not know why this happens. And at the end there are lots of these:
{noformat}
2015-03-06 04:07:21,529 DEBUG [AM.-pool300-t1] master.ServerManager(855): Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=9 of 10
org.apache.hadoop.hbase.ipc.StoppedRpcClientException
at org.apache.hadoop.hbase.ipc.RpcClientImpl.getConnection(RpcClientImpl.java:1261)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1146)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.getServerInfo(AdminProtos.java:22031)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getServerInfo(ProtobufUtil.java:1797)
at org.apache.hadoop.hbase.master.ServerManager.isServerReachable(ServerManager.java:850)
at org.apache.hadoop.hbase.master.RegionStates.isServerDeadAndNotProcessed(RegionStates.java:843)
at org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1969)
at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1576)
at org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
{noformat}
I think the problem is here:
{code:title=ServerManager.java}
while (retryCounter.shouldRetry()) {
  ...
  try {
    retryCounter.sleepUntilNextRetry();
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
  }
  ...
}
{code}
We need to break out of the while loop when we get an InterruptedException, not just mark the current thread as interrupted.
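The suggested fix can be sketched like this (hypothetical Probe interface and method names, simplified from ServerManager's retry logic): restore the interrupt flag and break out of the retry loop instead of only marking the thread interrupted.

```java
// Sketch: a retry loop that gives up immediately on interruption rather
// than spinning through the remaining attempts.
public class RetryLoop {
    interface Probe {
        boolean ping() throws Exception;
    }

    static boolean pingWithRetries(Probe probe, int maxAttempts, long sleepMs) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (probe.ping()) {
                    return true;
                }
            } catch (Exception e) {
                // ping failed; fall through and retry
            }
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                break; // stop retrying instead of just setting the flag
            }
        }
        return false;
    }
}
```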
[jira] [Commented] (HBASE-13160) SplitLogWorker does not pick up the task immediately
[ https://issues.apache.org/jira/browse/HBASE-13160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350757#comment-14350757 ] Jeffrey Zhong commented on HBASE-13160:
---
+1. Two very minor things: 1) it would be good to change the condition
{code}
if (seq_start == taskReadySeq) {
{code}
to
{code}
if (seq_start == taskReadySeq && numTasks == 0) {
{code}
2) the following isn't needed any more:
{noformat}
if (childrenPaths != null) {
  return childrenPaths;
}
{noformat}

SplitLogWorker does not pick up the task immediately
Key: HBASE-13160
URL: https://issues.apache.org/jira/browse/HBASE-13160
Project: HBase
Issue Type: Bug
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 2.0.0, 1.1.0
Attachments: hbase-13160_v1.patch

We were reading some code with Jeffrey, and we realized that the SplitLogWorker's internal task loop is weird. It does {{ls}} every second and sleeps, but has another mechanism to learn about new tasks, and does not make effective use of the zk notification. I have a simple patch which might improve this area.
[jira] [Commented] (HBASE-13121) Async wal replication for region replicas and dist log replay does not work together
[ https://issues.apache.org/jira/browse/HBASE-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349872#comment-14349872 ] Jeffrey Zhong commented on HBASE-13121:
---
Looks good to me (+1), with two minor comments: 1) Could you still set recovering to false first and then submit the rest of the work to the executor? 2) openSeqNum in the following code may still use the old value?
{code}
status.setStatus("Writing region open event marker to WAL because recovery is finished");
try {
  writeRegionOpenMarker(wal, openSeqNum);
} catch (IOException e) {
{code}

Async wal replication for region replicas and dist log replay does not work together
Key: HBASE-13121
URL: https://issues.apache.org/jira/browse/HBASE-13121
Project: HBase
Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 2.0.0, 1.1.0
Attachments: hbase-13121_v1.patch

We had not tested dist log replay while testing async wal replication for region replicas. There seem to be a couple of issues, but they are fixable. The distinction for dist log replay is that the region will be opened for recovery and regular writes when a primary fails over. This causes the region open event marker to be written to the WAL, but at this time the region does not actually contain all the flushed edits (since it is still recovering). If secondary regions see this event and pick up all the files in the region open event marker, they can drop edits. The solution is:
- Only write the region open event marker to the WAL when the region is out of recovering mode.
- Force a flush when coming out of recovering mode. This ensures that all data is force-flushed in this case. Before the region open event marker is written, we guarantee that all data in the region is flushed, so the list of files in the event marker is complete.
- Edits coming from recovery are re-written to the WAL while recovery is in action. These edits will have a larger seqId than their original seqId.
If this is the case, we do not replicate these edits to the secondary replicas. Since dist log replay recovers edits out of order (coming from parallel replays of WAL-file split tasks), this ensures that TIMELINE consistency is respected and edits are not seen out of order in the secondaries. These edits are seen by the secondaries via the forced flush event.
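The "do not replicate re-written recovery edits" rule above can be sketched as follows (a hypothetical WalEdit shape for illustration; the marker HBase actually attaches to replayed edits may differ):

```java
// Toy model: a fresh edit carries origSeqId == 0; an edit re-written to the
// WAL during dist log replay keeps its original seqId and is assigned a new,
// larger one. Replication skips any edit whose seqId was re-assigned.
public class ReplicationFilter {
    public record WalEdit(long seqId, long origSeqId) {}

    public static boolean shouldReplicate(WalEdit e) {
        return e.origSeqId() == 0 || e.seqId() <= e.origSeqId();
    }
}
```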
[jira] [Commented] (HBASE-12562) Handling memory pressure for secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-12562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348056#comment-14348056 ] Jeffrey Zhong commented on HBASE-12562:
---
+1. Looks good to me, with some minor comments:
1)
{code}
+    if (store.getSnapshotSize() > 0) {
+      canDrop = false;
+    }
{code}
You can break the loop after setting canDrop to false.
2) Just to check: are the locks on writestate and memstore always acquired in this order?
3) There may be no need for the following condition:
{code}
+if (region.writestate.flushing
{code}
4) Renaming getBiggestMemstoreOfSecondaryRegion to getBiggestMemstoreOfRegionReplica may be better.

Handling memory pressure for secondary region replicas
Key: HBASE-12562
URL: https://issues.apache.org/jira/browse/HBASE-12562
Project: HBase
Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 2.0.0, 1.1.0
Attachments: hbase-12562_v1.patch

This issue will track the implementation of how to handle memory pressure for secondary region replicas. Since the replicas cannot flush by themselves, the region server might get blocked or cause extensive flushing for its primary regions. The design doc attached at HBASE-11183 contains two possible solutions that we can pursue. The first one is to not allow secondary region replicas to flush by themselves, but instead, when needed, allow them to refresh their store files on demand (which possibly allows them to drop their memstore snapshots or memstores). The second approach is to allow the secondaries to flush to a temporary space. Both have pros and cons, but for simplicity, and to not cause extra write amplification, we have implemented the first approach. More details can be found in the design doc, but we can also discuss other options here.
[jira] [Commented] (HBASE-11571) Bulk load handling from secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14345997#comment-14345997 ] Jeffrey Zhong commented on HBASE-11571: --- Thanks [~enis] for the reviews! I've integrated the patch into master and branch-1. Bulk load handling from secondary region replicas - Key: HBASE-11571 URL: https://issues.apache.org/jira/browse/HBASE-11571 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Jeffrey Zhong Fix For: 2.0.0, 1.1.0 Attachments: HBASE-11571-rebase.patch, HBASE-11571-v2.patch, hbase-11571.patch We should be replaying the bulk load events from the primary region replica in the secondary region replica so that the bulk loaded files will be made visible in the secondaries. This will depend on HBASE-11567 and HBASE-11568 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11571) Bulk load handling from secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11571: -- Resolution: Fixed Fix Version/s: 1.1.0 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Bulk load handling from secondary region replicas - Key: HBASE-11571 URL: https://issues.apache.org/jira/browse/HBASE-11571 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Jeffrey Zhong Fix For: 2.0.0, 1.1.0 Attachments: HBASE-11571-rebase.patch, HBASE-11571-v2.patch, hbase-11571.patch We should be replaying the bulk load events from the primary region replica in the secondary region replica so that the bulk loaded files will be made visible in the secondaries. This will depend on HBASE-11567 and HBASE-11568 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13136) TestSplitLogManager.testGetPreviousRecoveryMode is flakey
[ https://issues.apache.org/jira/browse/HBASE-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344005#comment-14344005 ] Jeffrey Zhong commented on HBASE-13136: --- [~Apache9] There does exist a race condition. Since SplitLogManager has a chore (TimeoutMonitor) which creates rescan znodes, the newly created rescan znode causes the flakiness. Below are the suggested changes, or we could also just fix the test case to make sure there is no znode under splitLogZNode: {code} diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZKSplitLogManagerCoordination.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZKSplitLogManagerCoordination.java index 694ccff..8ed4357 100644 --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZKSplitLogManagerCoordination.java +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZKSplitLogManagerCoordination.java @@ -27,6 +27,7 @@ import static org.apache.hadoop.hbase.master.SplitLogManager.TerminationStatus.S import java.io.IOException; import java.io.InterruptedIOException; +import java.util.ArrayList; import java.util.List; import java.util.Set; import java.util.concurrent.ConcurrentMap; @@ -801,7 +802,16 @@ public class ZKSplitLogManagerCoordination extends ZooKeeperListener implements } if (previousRecoveryMode == RecoveryMode.UNKNOWN) { // Secondly check if there are outstanding split log task -List<String> tasks = ZKUtil.listChildrenNoWatch(watcher, watcher.splitLogZNode); +List<String> tmpTasks = ZKUtil.listChildrenNoWatch(watcher, watcher.splitLogZNode); +// Remove rescan nodes +List<String> tasks = new ArrayList<String>(); +for (String tmpTask : tmpTasks) { + String znodePath = ZKUtil.joinZNode(watcher.splitLogZNode, tmpTask); + if (ZKSplitLog.isRescanNode(watcher, znodePath)) { +continue; + } + tasks.add(tmpTask); +} if (tasks != null && !tasks.isEmpty()) { {code} TestSplitLogManager.testGetPreviousRecoveryMode is flakey - Key: 
HBASE-13136 URL: https://issues.apache.org/jira/browse/HBASE-13136 Project: HBase Issue Type: Bug Reporter: zhangduo Add test code to run it 100 times, then we can make it fail always. {code:title=TestSplitLogManager.java} @Test public void test() throws Exception { for (int i = 0; i < 100; i++) { setup(); testGetPreviousRecoveryMode(); teardown(); } } {code} And then add some ugly debug logs (yeah, I usually debug in this way...) {code:title=ZKSplitLogManagerCoordination.java} @Override public void setRecoveryMode(boolean isForInitialization) throws IOException { synchronized(this) { if (this.isDrainingDone) { // when there is no outstanding splitlogtask after master start up, we already have up to // date recovery mode return; } } if (this.watcher == null) { // when watcher is null(testing code) and recovery mode can only be LOG_SPLITTING synchronized(this) { this.isDrainingDone = true; this.recoveryMode = RecoveryMode.LOG_SPLITTING; } return; } boolean hasSplitLogTask = false; boolean hasRecoveringRegions = false; RecoveryMode previousRecoveryMode = RecoveryMode.UNKNOWN; RecoveryMode recoveryModeInConfig = (isDistributedLogReplay(conf)) ? 
RecoveryMode.LOG_REPLAY : RecoveryMode.LOG_SPLITTING; // Firstly check if there are outstanding recovering regions try { List<String> regions = ZKUtil.listChildrenNoWatch(watcher, watcher.recoveringRegionsZNode); LOG.debug("===" + regions); if (regions != null && !regions.isEmpty()) { hasRecoveringRegions = true; previousRecoveryMode = RecoveryMode.LOG_REPLAY; } if (previousRecoveryMode == RecoveryMode.UNKNOWN) { // Secondly check if there are outstanding split log task List<String> tasks = ZKUtil.listChildrenNoWatch(watcher, watcher.splitLogZNode); LOG.debug("===" + tasks); if (tasks != null && !tasks.isEmpty()) { hasSplitLogTask = true; if (isForInitialization) { // during initialization, try to get recovery mode from splitlogtask int listSize = tasks.size(); for (int i = 0; i < listSize; i++) { String task = tasks.get(i); try { byte[] data = ZKUtil.getData(this.watcher, ZKUtil.joinZNode(watcher.splitLogZNode, task)); if (data == null) continue; SplitLogTask slt = SplitLogTask.parseFrom(data); previousRecoveryMode =
[jira] [Commented] (HBASE-13136) TestSplitLogManager.testGetPreviousRecoveryMode is flakey
[ https://issues.apache.org/jira/browse/HBASE-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14344191#comment-14344191 ] Jeffrey Zhong commented on HBASE-13136: --- Looks good to me. (+1). TestSplitLogManager.testGetPreviousRecoveryMode is flakey - Key: HBASE-13136 URL: https://issues.apache.org/jira/browse/HBASE-13136 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.1.0 Reporter: zhangduo Assignee: zhangduo Attachments: HBASE-13136.patch Add test code to run it 100 times, then we can make it fail always. {code:title=TestSplitLogManager.java} @Test public void test() throws Exception { for (int i = 0; i < 100; i++) { setup(); testGetPreviousRecoveryMode(); teardown(); } } {code} And then add some ugly debug logs (yeah, I usually debug in this way...) {code:title=ZKSplitLogManagerCoordination.java} @Override public void setRecoveryMode(boolean isForInitialization) throws IOException { synchronized(this) { if (this.isDrainingDone) { // when there is no outstanding splitlogtask after master start up, we already have up to // date recovery mode return; } } if (this.watcher == null) { // when watcher is null(testing code) and recovery mode can only be LOG_SPLITTING synchronized(this) { this.isDrainingDone = true; this.recoveryMode = RecoveryMode.LOG_SPLITTING; } return; } boolean hasSplitLogTask = false; boolean hasRecoveringRegions = false; RecoveryMode previousRecoveryMode = RecoveryMode.UNKNOWN; RecoveryMode recoveryModeInConfig = (isDistributedLogReplay(conf)) ? 
RecoveryMode.LOG_REPLAY : RecoveryMode.LOG_SPLITTING; // Firstly check if there are outstanding recovering regions try { List<String> regions = ZKUtil.listChildrenNoWatch(watcher, watcher.recoveringRegionsZNode); LOG.debug("===" + regions); if (regions != null && !regions.isEmpty()) { hasRecoveringRegions = true; previousRecoveryMode = RecoveryMode.LOG_REPLAY; } if (previousRecoveryMode == RecoveryMode.UNKNOWN) { // Secondly check if there are outstanding split log task List<String> tasks = ZKUtil.listChildrenNoWatch(watcher, watcher.splitLogZNode); LOG.debug("===" + tasks); if (tasks != null && !tasks.isEmpty()) { hasSplitLogTask = true; if (isForInitialization) { // during initialization, try to get recovery mode from splitlogtask int listSize = tasks.size(); for (int i = 0; i < listSize; i++) { String task = tasks.get(i); try { byte[] data = ZKUtil.getData(this.watcher, ZKUtil.joinZNode(watcher.splitLogZNode, task)); if (data == null) continue; SplitLogTask slt = SplitLogTask.parseFrom(data); previousRecoveryMode = slt.getMode(); if (previousRecoveryMode == RecoveryMode.UNKNOWN) { // created by old code base where we don't set recovery mode in splitlogtask // we can safely set to LOG_SPLITTING because we're in master initialization code // before SSH is enabled there is no outstanding recovering regions previousRecoveryMode = RecoveryMode.LOG_SPLITTING; } break; } catch (DeserializationException e) { LOG.warn("Failed parse data for znode " + task, e); } catch (InterruptedException e) { throw new InterruptedIOException(); } } } } } } catch (KeeperException e) { throw new IOException(e); } synchronized (this) { if (this.isDrainingDone) { return; } if (!hasSplitLogTask && !hasRecoveringRegions) { this.isDrainingDone = true; LOG.debug("set to " + recoveryModeInConfig); this.recoveryMode = recoveryModeInConfig; return; } else if (!isForInitialization) { // splitlogtask hasn't drained yet, keep existing recovery mode return; } if (previousRecoveryMode != RecoveryMode.UNKNOWN) { LOG.debug("set 
to " + previousRecoveryMode); this.isDrainingDone = (previousRecoveryMode == recoveryModeInConfig); this.recoveryMode = previousRecoveryMode; } else { LOG.debug("set to " + recoveryModeInConfig); this.recoveryMode = recoveryModeInConfig; } } } {code} When failing, I got this {noformat} 2015-03-02
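The race fix discussed in the comment above boils down to ignoring "rescan" znodes that the TimeoutMonitor chore may create concurrently. A minimal sketch of that filter, assuming rescan nodes are identifiable by name (the "RESCAN" prefix here is illustrative; HBase uses its own ZKSplitLog.isRescanNode() check):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RescanFilterSketch {
    // Drop rescan znodes from the child list before deciding whether any
    // real split-log tasks are outstanding; otherwise a rescan node created
    // by the TimeoutMonitor chore makes the check flaky.
    static List<String> filterRescanNodes(List<String> taskNodes) {
        List<String> tasks = new ArrayList<>();
        for (String node : taskNodes) {
            if (node.startsWith("RESCAN")) {
                continue; // not a real split-log task; skip it
            }
            tasks.add(node);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList("RESCAN0000000001", "WALs%2Fserver%2C16020");
        System.out.println(filterRescanNodes(children)); // [WALs%2Fserver%2C16020]
    }
}
```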
[jira] [Updated] (HBASE-11571) Bulk load handling from secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11571: -- Attachment: HBASE-11571-v2.patch The v2 patch addressed [~enis] comments. Bulk load handling from secondary region replicas - Key: HBASE-11571 URL: https://issues.apache.org/jira/browse/HBASE-11571 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Jeffrey Zhong Attachments: HBASE-11571-rebase.patch, HBASE-11571-v2.patch, hbase-11571.patch We should be replaying the bulk load events from the primary region replica in the secondary region replica so that the bulk loaded files will be made visible in the secondaries. This will depend on HBASE-11567 and HBASE-11568 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11571) Bulk load handling from secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11571: -- Attachment: HBASE-11571-rebase.patch Rebase patch. [~enis] Please review it. Thanks. Bulk load handling from secondary region replicas - Key: HBASE-11571 URL: https://issues.apache.org/jira/browse/HBASE-11571 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Jeffrey Zhong Attachments: HBASE-11571-rebase.patch, hbase-11571.patch We should be replaying the bulk load events from the primary region replica in the secondary region replica so that the bulk loaded files will be made visible in the secondaries. This will depend on HBASE-11567 and HBASE-11568 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11571) Bulk load handling from secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11571: -- Status: Patch Available (was: Open) Bulk load handling from secondary region replicas - Key: HBASE-11571 URL: https://issues.apache.org/jira/browse/HBASE-11571 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Jeffrey Zhong Attachments: HBASE-11571-rebase.patch, hbase-11571.patch We should be replaying the bulk load events from the primary region replica in the secondary region replica so that the bulk loaded files will be made visible in the secondaries. This will depend on HBASE-11567 and HBASE-11568 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11580) Failover handling for secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341098#comment-14341098 ] Jeffrey Zhong commented on HBASE-11580: --- I've reviewed the patch and left some comments on the review board. +1 assuming unit tests pass. For the flush amplifications, it's more of an optimization issue, which can be addressed by having the secondary replica send the seqId it first sees as part of the flush request. The primary region can check if its last flushed seqId is larger than the passed seqId from the replica to decide whether to perform a flush. Failover handling for secondary region replicas --- Key: HBASE-11580 URL: https://issues.apache.org/jira/browse/HBASE-11580 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar With the async wal approach (HBASE-11568), the edits are not persisted (to wal) in the secondary region replicas. However this means that we have to deal with secondary region replica failures. We can seek to re-replicate the edits from primary to the secondary when the secondary region is opened in another server but this would mean to setup a replication queue again, and holding on to the wals for longer. Instead, we can design it so that the edits for the secondaries are not persisted to wal, and if the secondary replica fails over, it will not start serving reads until it has guaranteed that it has all the past data. For guaranteeing that the secondary replica has all the edits before serving reads, we can use flush and region opening markers. Whenever a region open event is seen, it writes all the files at the time of opening to wal (HBASE-11512). In case of flush, the flushed file is written as well, and the secondary replica can do an ls for the store files and pick up all the files before the seqId of the flushed file. So, in this design, the secondary replica will wait until it sees and replays a flush or region open marker from wal from primary 
and then start serving. For speeding up replica opening time, we can trigger a flush to the primary whenever the secondary replica opens as an optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
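The flush-amplification optimization suggested in the comment above could look roughly like this. All names here are hypothetical, not HBase's actual API: the replica reports the first sequence id it observed, and the primary flushes only if its last flush does not already cover that point.

```java
public class ReplicaFlushRequestSketch {
    // The primary skips a requested flush when its last flushed seqId is
    // already >= the first seqId the replica saw: in that case the existing
    // flush marker will appear in the replica's replay stream anyway.
    static boolean primaryShouldFlush(long primaryLastFlushedSeqId,
                                      long firstSeqIdSeenByReplica) {
        return primaryLastFlushedSeqId < firstSeqIdSeenByReplica;
    }

    public static void main(String[] args) {
        // Primary flushed through seqId 600; replica began observing at 450,
        // so the marker at 600 reaches the replica -- no new flush needed.
        System.out.println(primaryShouldFlush(600, 450)); // false
        // Replica began at 700, past the last flush: trigger a fresh flush.
        System.out.println(primaryShouldFlush(600, 700)); // true
    }
}
```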
[jira] [Updated] (HBASE-13077) BoundedCompletionService doesn't pass trace info to server
[ https://issues.apache.org/jira/browse/HBASE-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-13077: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've integrated the fix into branch-1.0, branch-1 and master branch. Thanks [~enis] for the review and [~ndimiduk] for the help! BoundedCompletionService doesn't pass trace info to server -- Key: HBASE-13077 URL: https://issues.apache.org/jira/browse/HBASE-13077 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 2.0.0, 1.0.1, 1.1.0 Attachments: HBASE-13077.patch Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. This issue causes scans to not pass trace info to the server. [~enis] FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13077) BoundedCompletionService doesn't pass trace info to server
[ https://issues.apache.org/jira/browse/HBASE-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-13077: -- Attachment: HBASE-13077.patch This patch is for 1.0. Thanks. BoundedCompletionService doesn't pass trace info to server -- Key: HBASE-13077 URL: https://issues.apache.org/jira/browse/HBASE-13077 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-13077.patch Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. [~enis] FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13077) BoundedCompletionService doesn't pass trace info to server
Jeffrey Zhong created HBASE-13077: - Summary: BoundedCompletionService doesn't pass trace info to server Key: HBASE-13077 URL: https://issues.apache.org/jira/browse/HBASE-13077 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. [~enis] FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13077) BoundedCompletionService doesn't pass trace info to server
[ https://issues.apache.org/jira/browse/HBASE-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-13077: -- Status: Patch Available (was: Open) BoundedCompletionService doesn't pass trace info to server -- Key: HBASE-13077 URL: https://issues.apache.org/jira/browse/HBASE-13077 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-13077.patch Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. [~enis] FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13077) BoundedCompletionService doesn't pass trace info to server
[ https://issues.apache.org/jira/browse/HBASE-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-13077: -- Description: Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. This issue causes scans to not pass trace info to the server. [~enis] FYI. was: Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. [~enis] FYI. BoundedCompletionService doesn't pass trace info to server -- Key: HBASE-13077 URL: https://issues.apache.org/jira/browse/HBASE-13077 Project: HBase Issue Type: Bug Components: hbase Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-13077.patch Today [~ndimiduk] I found that BoundedCompletionService doesn't pass htrace info to server. This issue causes scans to not pass trace info to the server. [~enis] FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11569) Flush / Compaction handling from secondary region replicas
[ https://issues.apache.org/jira/browse/HBASE-11569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315467#comment-14315467 ] Jeffrey Zhong commented on HBASE-11569: --- Looks good to me(+1). I posted few minor comments in the review board. Thanks. Flush / Compaction handling from secondary region replicas -- Key: HBASE-11569 URL: https://issues.apache.org/jira/browse/HBASE-11569 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.1.0 Attachments: hbase-11569-master-v3.patch We should be handling flushes and compactions from the primary region replica being replayed to the secondary region replica via HBASE-11568. Some initial thoughts for how can this be done is discussed in HBASE-11183. More details will come together with the patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11567) Write bulk load COMMIT events to WAL
[ https://issues.apache.org/jira/browse/HBASE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11567: -- Resolution: Fixed Fix Version/s: 1.1.0 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks [~posix4e] for the contributions! I've integrated the v4-rebase patch into master and branch-1. Write bulk load COMMIT events to WAL Key: HBASE-11567 URL: https://issues.apache.org/jira/browse/HBASE-11567 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Alex Newman Fix For: 2.0.0, 1.1.0 Attachments: HBASE-11567-v1.patch, HBASE-11567-v2.patch, HBASE-11567-v4-rebase.patch, hbase-11567-branch-1.0-partial.patch, hbase-11567-v3.patch, hbase-11567-v4.patch Similar to writing flush (HBASE-11511), compaction(HBASE-2231) to WAL and region open/close (HBASE-11512) , we should persist bulk load events to WAL. This is especially important for secondary region replicas, since we can use this information to pick up primary regions' files from secondary replicas. A design doc for secondary replica replication can be found at HBASE-11183. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
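A minimal sketch of what replaying a bulk-load commit event on a secondary replica amounts to, under the assumption that the WAL event carries the list of loaded store files. The types and names here are illustrative, not HBase's actual WAL event classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkLoadEventSketch {
    // Hypothetical stand-in for a bulk-load "commit" marker written to the
    // WAL by the primary: which family was loaded and which store files.
    static final class BulkLoadEvent {
        final String family;
        final List<String> storeFilePaths;
        BulkLoadEvent(String family, List<String> storeFilePaths) {
            this.family = family;
            this.storeFilePaths = storeFilePaths;
        }
    }

    // A replica replaying the event makes the same files visible in its own
    // view. Replay must be idempotent, since WAL entries can be re-applied.
    static List<String> replay(List<String> visibleFiles, BulkLoadEvent event) {
        List<String> updated = new ArrayList<>(visibleFiles);
        for (String path : event.storeFilePaths) {
            if (!updated.contains(path)) {
                updated.add(path);
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        BulkLoadEvent e = new BulkLoadEvent("cf1", Arrays.asList("hfile-A", "hfile-B"));
        List<String> files = replay(Arrays.asList("hfile-0"), e);
        System.out.println(files); // [hfile-0, hfile-A, hfile-B]
        // Replaying the same event twice does not duplicate files.
        System.out.println(replay(files, e)); // [hfile-0, hfile-A, hfile-B]
    }
}
```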
[jira] [Updated] (HBASE-11567) Write bulk load COMMIT events to WAL
[ https://issues.apache.org/jira/browse/HBASE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11567: -- Attachment: HBASE-11567-v4-rebase.patch Write bulk load COMMIT events to WAL Key: HBASE-11567 URL: https://issues.apache.org/jira/browse/HBASE-11567 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Alex Newman Attachments: HBASE-11567-v1.patch, HBASE-11567-v2.patch, HBASE-11567-v4-rebase.patch, hbase-11567-v3.patch, hbase-11567-v4.patch Similar to writing flush (HBASE-11511), compaction(HBASE-2231) to WAL and region open/close (HBASE-11512) , we should persist bulk load events to WAL. This is especially important for secondary region replicas, since we can use this information to pick up primary regions' files from secondary replicas. A design doc for secondary replica replication can be found at HBASE-11183. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11567) Write bulk load COMMIT events to WAL
[ https://issues.apache.org/jira/browse/HBASE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11567: -- Attachment: (was: HBASE-11567-v4-rebase.patch) Write bulk load COMMIT events to WAL Key: HBASE-11567 URL: https://issues.apache.org/jira/browse/HBASE-11567 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Alex Newman Attachments: HBASE-11567-v1.patch, HBASE-11567-v2.patch, hbase-11567-v3.patch, hbase-11567-v4.patch Similar to writing flush (HBASE-11511), compaction(HBASE-2231) to WAL and region open/close (HBASE-11512) , we should persist bulk load events to WAL. This is especially important for secondary region replicas, since we can use this information to pick up primary regions' files from secondary replicas. A design doc for secondary replica replication can be found at HBASE-11183. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-11567) Write bulk load COMMIT events to WAL
[ https://issues.apache.org/jira/browse/HBASE-11567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-11567: -- Attachment: HBASE-11567-v4-rebase.patch Rebased the v4 patch against master. The patch is long overdue. [~enis] could you please give a quick review? I think the old v4 patch was in a ready state but somehow it was left over. Thanks. Write bulk load COMMIT events to WAL Key: HBASE-11567 URL: https://issues.apache.org/jira/browse/HBASE-11567 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Alex Newman Attachments: HBASE-11567-v1.patch, HBASE-11567-v2.patch, HBASE-11567-v4-rebase.patch, hbase-11567-v3.patch, hbase-11567-v4.patch Similar to writing flush (HBASE-11511), compaction (HBASE-2231) to WAL and region open/close (HBASE-11512), we should persist bulk load events to WAL. This is especially important for secondary region replicas, since we can use this information to pick up primary regions' files from secondary replicas. A design doc for secondary replica replication can be found at HBASE-11183. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12782) ITBLL fails for me if generator does anything but 5M per maptask
[ https://issues.apache.org/jira/browse/HBASE-12782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14299325#comment-14299325 ] Jeffrey Zhong commented on HBASE-12782: --- [~saint@gmail.com] Great findings! I previously reviewed the patch. The intention was good and it should do flush |= restoreEdit(store, cell); as [~lhofhansl] mentioned above, but apparently the fix did more than that. Thanks. ITBLL fails for me if generator does anything but 5M per maptask Key: HBASE-12782 URL: https://issues.apache.org/jira/browse/HBASE-12782 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 0.98.9 Reporter: stack Assignee: stack Priority: Critical Fix For: 2.0.0, 1.0.1, 1.1.0, 0.98.11 Attachments: 12782.fix.txt, 12782.search.plus.archive.recovered.edits.txt, 12782.search.plus.txt, 12782.search.txt, 12782.unit.test.and.it.test.txt, 12782.unit.test.writing.txt, 12782v2.0.98.txt, 12782v2.txt Anyone else seeing this? If I do an ITBLL with generator doing 5M rows per maptask, all is good -- verify passes. I've been running 5 servers and had one slot per server. So below works: HADOOP_CLASSPATH=/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath` ./hadoop/bin/hadoop --config ~/conf_hadoop org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey serverKilling Generator 5 500 g1.tmp or if I double the map tasks, it works: HADOOP_CLASSPATH=/home/stack/conf_hbase:`/home/stack/hbase/bin/hbase classpath` ./hadoop/bin/hadoop --config ~/conf_hadoop org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList --monkey serverKilling Generator 10 500 g2.tmp ...but if I change the 5M to 50M or 25M, Verify fails. Looking into it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12743) [ITBLL] Master fails rejoining cluster stuck splitting logs; Distributed log replay=true
[ https://issues.apache.org/jira/browse/HBASE-12743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280997#comment-14280997 ] Jeffrey Zhong commented on HBASE-12743: --- For the error org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da. is not online, master won't start. But it should be unrelated to log recovery, either splitting or replay. [~saint@gmail.com] could you share more master logs so that I can check why hbase:namespace wasn't assigned online for two hours? Thanks. [ITBLL] Master fails rejoining cluster stuck splitting logs; Distributed log replay=true Key: HBASE-12743 URL: https://issues.apache.org/jira/browse/HBASE-12743 Project: HBase Issue Type: Bug Reporter: stack Fix For: 1.0.0, 2.0.0, 1.1.0 Master is stuck for two days trying to rejoin cluster after monkey killed and restarted it. After retrying to get namespace 350 times, Master goes down: {code} 2014-12-20 18:43:54,285 INFO [c2020:16020.activeMasterManager] client.RpcRetryingCaller: Call exception, tries=349, retries=350, started=6885331 ms ago, cancelled=false, msg=row 'default' on table 'hbase:namespace' at region=hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da., hostname=c2023.halxg.cloudera.com,16020,1418988286696, seqNum=600190 2014-12-20 18:43:54,303 WARN [c2020:16020.activeMasterManager] master.TableNamespaceManager: Caught exception in initializing namespace table manager org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=350, exceptions: Sat Dec 20 16:49:08 PST 2014, RpcRetryingCaller{globalStartTime=1419122948954, pause=100, retries=350}, org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region hbase:namespace,,1417551886199.ecdcd0172cd3e32d291bc282771895da. 
is not online on c2023.halxg.cloudera.com,16020,1418988286696 at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2722) at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:851) at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1695) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30434) {code} Seems like 2014-12-20 16:49:03,665 INFO [RS_LOG_REPLAY_OPS-c2021:16020-0] wal.WALSplitter: DistributedLogReplay = true Seems easy enough to reproduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12746) [1.0.0RC0] Distributed Log Replay is on (HBASE-12577 was insufficient)
[ https://issues.apache.org/jira/browse/HBASE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12746: -- Attachment: 12746-v2.patch [~saint@gmail.com] I amended your patch to address the three test failures for your reference. Thanks. [1.0.0RC0] Distributed Log Replay is on (HBASE-12577 was insufficient) -- Key: HBASE-12746 URL: https://issues.apache.org/jira/browse/HBASE-12746 Project: HBase Issue Type: Bug Components: wal Affects Versions: 1.0.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 1.0.0 Attachments: 12746-v2.patch, 12746.txt, 12746.txt Testing the 1.0.0RC0 candidate, I noticed DLR was on (because I was bumping into HBASE-12743). I thought it was my environment but apparently not. If I add this to HMaster diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java index a85c2e7..d745f94 100644 --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java @@ -416,6 +416,10 @@ public class HMaster extends HRegionServer implements MasterServices, Server { throw new IOException("Failed to start redirecting jetty server", e); } masterInfoPort = connector.getPort(); + boolean dlr = + conf.getBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, + HConstants.DEFAULT_DISTRIBUTED_LOG_REPLAY_CONFIG); + LOG.info("Distributed log replay=" + dlr); } It says DLR is on. HBASE-12577 was not enough it seems. The hbase-default.xml still has DLR as true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12746) [1.0.0RC0] Distributed Log Replay is on (HBASE-12577 was insufficient)
[ https://issues.apache.org/jira/browse/HBASE-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259242#comment-14259242 ] Jeffrey Zhong commented on HBASE-12746: --- The only remaining change is the following: {quote} +ds = new DummyServer(zkw, testConf); {quote} Because we want to use testConf, which has DISTRIBUTED_LOG_REPLAY_KEY on, as at the beginning of the test case we have testConf.setBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, true); [1.0.0RC0] Distributed Log Replay is on (HBASE-12577 was insufficient) -- Key: HBASE-12746 URL: https://issues.apache.org/jira/browse/HBASE-12746 Project: HBase Issue Type: Bug Components: wal Affects Versions: 1.0.0 Reporter: stack Assignee: stack Priority: Critical Fix For: 1.0.0 Attachments: 12746-v2.patch, 12746.txt, 12746.txt Testing the 1.0.0RC0 candidate, I noticed DLR was on (because I was bumping into HBASE-12743). I thought it was my environment but apparently not. If I add this to HMaster diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java index a85c2e7..d745f94 100644 --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java @@ -416,6 +416,10 @@ public class HMaster extends HRegionServer implements MasterServices, Server { throw new IOException("Failed to start redirecting jetty server", e); } masterInfoPort = connector.getPort(); + boolean dlr = + conf.getBoolean(HConstants.DISTRIBUTED_LOG_REPLAY_KEY, + HConstants.DEFAULT_DISTRIBUTED_LOG_REPLAY_CONFIG); + LOG.info("Distributed log replay=" + dlr); } It says DLR is on. HBASE-12577 was not enough it seems. The hbase-default.xml still has DLR as true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248725#comment-14248725 ] Jeffrey Zhong commented on HBASE-10201: --- Looks good to me (+1) for the master branch. For branch-1 we should rely on [~enis]'s feedback. Port 'Make flush decisions per column family' to trunk -- Key: HBASE-10201 URL: https://issues.apache.org/jira/browse/HBASE-10201 Project: HBase Issue Type: Improvement Components: wal Reporter: Ted Yu Assignee: zhangduo Fix For: 1.0.0, 2.0.0 Attachments: 3149-trunk-v1.txt, HBASE-10201-0.98.patch, HBASE-10201-0.98_1.patch, HBASE-10201-0.98_2.patch, HBASE-10201-0.99.patch, HBASE-10201.patch, HBASE-10201_1.patch, HBASE-10201_10.patch, HBASE-10201_11.patch, HBASE-10201_12.patch, HBASE-10201_13.patch, HBASE-10201_13.patch, HBASE-10201_14.patch, HBASE-10201_15.patch, HBASE-10201_16.patch, HBASE-10201_17.patch, HBASE-10201_18.patch, HBASE-10201_19.patch, HBASE-10201_2.patch, HBASE-10201_3.patch, HBASE-10201_4.patch, HBASE-10201_5.patch, HBASE-10201_6.patch, HBASE-10201_7.patch, HBASE-10201_8.patch, HBASE-10201_9.patch, compactions.png, count.png, io.png, memstore.png Currently the flush decision is made using the aggregate size of all column families. When large and small column families co-exist, this causes many small flushes of the smaller CF. We need to make per-CF flush decisions.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243274#comment-14243274 ] Jeffrey Zhong commented on HBASE-10201: --- {quote} Now I always generate a new flushSeqId and use this as the seqId of flushed StoreFiles. And use a maxFlushedSeqId to record completeSequenceId that passed to HMaster. Is it OK? {quote} Sounds good to me.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243528#comment-14243528 ] Jeffrey Zhong commented on HBASE-10201: --- [~saint@gmail.com] {quote} Are you referring to the following: Will this mean we drop edits because region thinks its sequenceid is higher than it should be? {quote} Yes. As of today, when replaying edits in both modes, we drop WAL edits whose seqId is less than the corresponding store's seqId. There are some edge cases (like a new PUT, the region moves to a different RS, a DELETE on the new PUT, a major compaction, the region moves back to the original RS, and then that RS crashes) where we have to know the hFile seqId accurately, otherwise the PUT may be restored after recovery. We need to pass flushed seqIds per store to the master so that we can optimize the recovery process without impacting correctness.
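The skip rule described above, dropping replayed WAL edits that a store file already persists, can be sketched as follows. This is a simplified stand-in with hypothetical method names, not the actual HRegion replay code:

```java
import java.util.Map;

public class ReplaySkipDemo {
    /**
     * Returns true when a replayed WAL edit for the given family can be
     * dropped because its seqId is at or below the store's largest
     * flushed seqId (i.e. an hstore file already contains it).
     */
    static boolean canSkipEdit(Map<String, Long> maxFlushedSeqIdPerStore,
                               String family, long editSeqId) {
        Long flushed = maxFlushedSeqIdPerStore.get(family);
        // No flush record for this store: must replay to be safe.
        if (flushed == null) return false;
        return editSeqId <= flushed;
    }

    public static void main(String[] args) {
        Map<String, Long> flushed = Map.of("cf1", 100L, "cf2", 50L);
        System.out.println(canSkipEdit(flushed, "cf1", 90L)); // true: already flushed
        System.out.println(canSkipEdit(flushed, "cf2", 60L)); // false: must replay
    }
}
```

The edge case in the comment is exactly what goes wrong when the recorded flushed seqId is stale: the PUT's edit looks "already flushed" and gets skipped, or conversely the DELETE gets skipped and the PUT resurfaces.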
[jira] [Commented] (HBASE-12465) HBase master start fails due to incorrect file creations
[ https://issues.apache.org/jira/browse/HBASE-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243538#comment-14243538 ] Jeffrey Zhong commented on HBASE-12465: --- Ping [~saint@gmail.com], any thoughts on this? Thanks. HBase master start fails due to incorrect file creations Key: HBASE-12465 URL: https://issues.apache.org/jira/browse/HBASE-12465 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.96.0 Environment: Ubuntu Reporter: Biju Nair Assignee: Alicia Ying Shu Labels: hbase, hbase-bulkload - Start of the HBase master fails due to the following error found in the log: 2014-11-11 20:25:58,860 WARN org.apache.hadoop.hbase.backup.HFileArchiver: Failed to archive class org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, file:hdfs:///hbase/.tmp/data/default/tbl/00820520f5cb7839395e83f40c8d97c2/e/52bf9eee7a27460c8d9e2a26fa43c918_SeqId_282271246_ on try #1 org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=WRITE, inode=/hbase/.tmp/data/default/tbl/00820520f5cb7839395e83f40c8d97c2/e/52bf9eee7a27460c8d9e2a26fa43c918_SeqId_282271246_:devuser:supergroup:-rwxr-xr-x - All the files that the HBase master was complaining about were created under a user's user-id instead of the hbase user, resulting in incorrect access permissions for the master to act on. - Looks like this was due to a bulk load done using the LoadIncrementalHFiles program. - HBASE-12052 is another scenario similar to this one.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14243631#comment-14243631 ] Jeffrey Zhong commented on HBASE-10201: --- [~saint@gmail.com] Besides what [~Apache9] mentioned, we skip edits using the seqId of each corresponding store; #4 (which is #3) is only set after the region is fully recovered (i.e. all WAL edits have already been replayed). {quote} If master crash and loss the information, then we will not skip any edits? {quote} Yes, we'll lose the info and will replay more edits.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14241806#comment-14241806 ] Jeffrey Zhong commented on HBASE-10201: --- {quote} because we are not doing DLR in 0.98 or for some other reason? This patch is unlikely to make it back to 0.98 I'd say. {quote} It's because we defer mvcc value clean-up (by HBASE-11315), but in any case we should maintain the semantics that an HStore file's seqId is the largest flushed SeqId for the file. {quote} And do I need to change original log split policy to also use a familyName-seqId map to filter out cells that already flushed? {quote} Yes, we should, but you could do that in a separate issue.
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for all the review comments! I've integrated the fix into the branch-1 and master branches. Maintain SeqId monotonically increasing --- Key: HBASE-12485 URL: https://issues.apache.org/jira/browse/HBASE-12485 Project: HBase Issue Type: Sub-task Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 1.0.0, 2.0.0 Attachments: HBASE-12485-v2.patch, HBASE-12485-v2.patch, HBASE-12485.patch We added FLUSH and REGION CLOSE events into the WAL; for each of those events the region SeqId is bumped. The issue comes from the region close operation. When opening a region, we use the flushed SeqId from store files, but after the store flush during region close we still write COMMIT_FLUSH, REGION_CLOSE events etc., each of which bumps up the SeqId. Therefore, the region opening SeqId is lower than it should be.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239958#comment-14239958 ] Jeffrey Zhong commented on HBASE-10201: --- This is a nice feature. I scanned through the patch and below are my comments: 1) There may be a correctness issue for same-version (same row key and version) updates. Because you use the following code as the store file flush id, we could end up with multiple hstore files having exactly the same flush seq id, while HBase resolves same-version updates by the store files' seqid (flush id). Therefore, we may end up with incorrect results. This issue may only happen in 0.98, though. {code} + long oldestUnflushedSeqId = wal + .getEarliestMemstoreSeqNum(encodedRegionName); {code} To fix the issue, we should use the current store's max flushed seq id as its real hstore seq id, while changing HRegion.lastFlushSeqId to use oldestUnflushedSeqId when reporting back to the Master; otherwise we may have a data loss issue. 2) We have a feature where we force a flush by hbase.regionserver.optionalcacheflushinterval or hbase.regionserver.flush.per.changes, but I didn't see you handle either case in the selectStoresToFlush() function. This may cause HRegion.shouldFlush() to always return true and end up with small hstore files. 3) For region server recovery, we have an optimization that uses the lastFlushSeqId reported by region servers to skip writing edits into recovered.edits files. With this feature, we may unnecessarily write much more data into recovered.edits. This issue doesn't happen in the log replay case. 4) Relating to your FlushMarker question, FlushMarker (and similar RegionEventWALEdits) are used for the region replica feature and for reasoning about region/store state. As you can see (in the WALEdit class), those special events use the special column family METAFAMILY, which doesn't exist for data regions.
You should handle those events specially in getFamilyNames(), otherwise they may affect your bookkeeping of the oldest un-flushed seqid.
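Point 4 above can be illustrated with a sketch of the bookkeeping it is warning about: when tracking the oldest un-flushed seqId per family, marker edits carrying the special metadata family must be excluded, or they pollute the per-family map. The class and field names below are illustrative stand-ins, not the actual WAL or WALEdit API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class OldestUnflushedDemo {
    // Stand-in for WALEdit.METAFAMILY, the family used by flush/region markers.
    static final String METAFAMILY = "METAFAMILY";

    // Oldest seqId not yet flushed, per column family.
    final Map<String, Long> oldestUnflushedSeqId = new HashMap<>();

    /**
     * Record the seqId of an appended edit. Appends arrive with increasing
     * seqIds, so putIfAbsent keeps the oldest one. Metadata-only marker
     * edits are skipped, as the review comment recommends.
     */
    void onAppend(Set<String> families, long seqId) {
        for (String family : families) {
            if (METAFAMILY.equals(family)) continue; // marker edit: skip
            oldestUnflushedSeqId.putIfAbsent(family, seqId);
        }
    }

    /** After a family flushes, it has no un-flushed edits until the next append. */
    void onFlush(String family) {
        oldestUnflushedSeqId.remove(family);
    }
}
```

Without the METAFAMILY check, every flush marker would re-register itself as an un-flushed edit and the computed oldest seqId would never advance correctly.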
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240461#comment-14240461 ] Jeffrey Zhong commented on HBASE-10201: --- {quote} (and the format of zk data in distributed log replay) {quote} You don't have to change this because log replay already gets max seqId per store before sending edits for replay.
[jira] [Commented] (HBASE-12465) HBase master start fails due to incorrect file creations
[ https://issues.apache.org/jira/browse/HBASE-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14240556#comment-14240556 ] Jeffrey Zhong commented on HBASE-12465: --- This issue might be that a user used the hbase tmp folder as the Import tool's temporary output folder, while HBase tries to recreate (delete and then create) the tmp folder during startup. Therefore it causes HMaster to fail to start. [~saint@gmail.com] Do you think any error from checkTempDir inside HMaster#createInitialFileSystemLayout is fatal? If it's fatal then we don't need to do anything for this JIRA; otherwise we catch the error, log it, and move on.
[jira] [Commented] (HBASE-10201) Port 'Make flush decisions per column family' to trunk
[ https://issues.apache.org/jira/browse/HBASE-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239008#comment-14239008 ] Jeffrey Zhong commented on HBASE-10201: --- [~saint@gmail.com] Sure. Let me take a look at this patch!
[jira] [Commented] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14237468#comment-14237468 ] Jeffrey Zhong commented on HBASE-12485: --- Thanks [~saint@gmail.com] for the comments! {quote} should be just 'return isSequenceIdFile(p);' {quote} That's a good point. I'll change that part when committing the patch. {quote} That is because if old style, its stale... not pertinent to this recovery? {quote} Yes. {quote} that is the reasoning? {quote} Yes.
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Attachment: PHOENIX-1498-v2.patch The v2 patch addresses [~saint@gmail.com]'s comments by using .seqid as the seqid file name suffix.
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Attachment: HBASE-12485-v2.patch
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Attachment: (was: PHOENIX-1498-v2.patch)
[jira] [Commented] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14236454#comment-14236454 ] Jeffrey Zhong commented on HBASE-12485: --- [~saint@gmail.com] if we use '.seqid' then hbck reports an "ERROR: Found lingering reference file" error. It's because we have a bug in the code, FSUtils#getTableStoreFilePathMap(), where we didn't skip recovered.edits when it sits at the same folder level as a column family. I fixed the issue in the attached patch as follows.
{code}
--- hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
@@ -1508,6 +1508,9 @@ public abstract class FSUtils {
   FileStatus[] familyDirs = fs.listStatus(dd, familyFilter);
   for (FileStatus familyDir : familyDirs) {
     Path family = familyDir.getPath();
+    if (family.getName().equals(HConstants.RECOVERED_EDITS_DIR)) {
+      continue;
+    }
     // now in family, iterate over the StoreFiles and
{code}
However, if we use '.seqid', the old hbck won't work. This will cause issues for rollback and during upgrade if a user runs the old hbck. Should we still keep _seqid? What do you suggest? Thanks.
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Summary: Maintain SeqId monotonically increasing (was: Maintain SeqId monotonically increasing when Region Replica is on)
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Attachment: (was: HBASE-12485.patch)
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Attachment: HBASE-12485.patch Resubmit for QA run.
[jira] [Commented] (HBASE-12485) Maintain SeqId monotonically increasing
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14234729#comment-14234729 ] Jeffrey Zhong commented on HBASE-12485: --- Thanks [~saint@gmail.com] for the review! {quote} Throw exception rather than WARN. {quote} This is a good point. If we do this then the region won't be opened anymore without human intervention (which might also be hard, as it requires getting rid of certain edits from the recovered-edits files). {quote} dot prefix like other special files {quote} I tried this before but hbck gives some errors. Let me try it again to see if I can make hbck happy.
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing when Region Replica is on
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-12485) Maintain SeqId monotonically increasing when Region Replica is on
[ https://issues.apache.org/jira/browse/HBASE-12485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12485: -- Attachment: HBASE-12485.patch Submitting the patch for a QA run. The patch uses a SeqId file to store the region's latest SeqId across region close and open. Thanks.
[jira] [Updated] (HBASE-12600) Remove REPLAY tag dependency in Distributed Replay Mode
[ https://issues.apache.org/jira/browse/HBASE-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12600: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the reviews! I've integrated the fix into the branch-1 and master branches. Remove REPLAY tag dependency in Distributed Replay Mode --- Key: HBASE-12600 URL: https://issues.apache.org/jira/browse/HBASE-12600 Project: HBase Issue Type: Bug Components: wal Affects Versions: 2.0.0, 0.99.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 2.0.0, 0.99.2 Attachments: HBASE-12600.patch After HBASE-11315 and HBASE-8763, each edit has a unique 'version', i.e. its SequenceId (the old mvcc value). Therefore, we no longer need the REPLAY tag to handle out-of-order updates of the same version.
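The reason the REPLAY tag becomes unnecessary is that the SequenceId itself acts as the edit's version: a replayed cell only wins if its SequenceId is higher, so out-of-order replays resolve themselves. A simplified last-write-wins sketch of that idea (illustrative code, not the HBase internals):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch: since every edit carries a unique SequenceId, replays can
// arrive in any order and only the newest version per cell is applied.
public class ReplayDedup {
  private final Map<String, Long> latestSeqIdPerCell = new HashMap<>();

  // Returns true when the edit should be applied (it is newer than anything seen so far).
  public boolean shouldApply(String cellKey, long seqId) {
    Long seen = latestSeqIdPerCell.get(cellKey);
    if (seen != null && seen >= seqId) {
      return false; // stale or duplicate replay; drop it
    }
    latestSeqIdPerCell.put(cellKey, seqId);
    return true;
  }
}
```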
[jira] [Commented] (HBASE-12600) Remove REPLAY tag dependency in Distributed Replay Mode
[ https://issues.apache.org/jira/browse/HBASE-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228953#comment-14228953 ] Jeffrey Zhong commented on HBASE-12600: --- Yes. Thanks [~saint@gmail.com] for the reviews! I checked, and the checkstyle errors seem unrelated to my patch.
[jira] [Commented] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14229003#comment-14229003 ] Jeffrey Zhong commented on HBASE-12588: --- I agree with [~Apache9]. batchMutate is all right; we just need to make sure our own code checks the result of each update operation after a batchMutate call. Thanks. Need to fail writes when row lock can't be acquired --- Key: HBASE-12588 URL: https://issues.apache.org/jira/browse/HBASE-12588 Project: HBase Issue Type: Bug Affects Versions: 0.98.8, 0.99.1 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Attachments: HBASE-12588.patch Currently we don't fail write operations when we can't acquire row locks, as shown below in HRegion#doMiniBatchMutation. {code} ... RowLock rowLock = null; try { rowLock = getRowLock(mutation.getRow(), shouldBlock); } catch (IOException ioe) { LOG.warn("Failed getting lock in batch put, row=" + Bytes.toStringBinary(mutation.getRow()), ioe); } if (rowLock == null) { // We failed to grab another lock assert !shouldBlock : "Should never fail to get lock when blocking"; break; // stop acquiring more rows for this batch } else { acquiredRowLocks.add(rowLock); } ... {code} We saw this issue when there was a meta corruption problem and checkRow failed with: {noformat} org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for row lock on HRegion {noformat} Yet the current code still continues with the writes. This is dangerous because row locks have to be acquired before update operations to guarantee row-update atomicity.
[jira] [Commented] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228652#comment-14228652 ] Jeffrey Zhong commented on HBASE-12588: --- [~saint@gmail.com] Yes, it's similar. In my case a batch update partially failed because it could not get a RowLock, due to meta corruption. In addition, I searched the code and it seems we have an issue in HBaseFsck#rebuildMeta, where we don't check the return code: {code} meta.batchMutate(puts.toArray(new Put[puts.size()])); {code}
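The practice this thread converges on — callers must inspect the per-mutation results of a batchMutate-style call instead of assuming the whole batch succeeded — can be sketched generically. `OpStatus` here is a stand-in for HBase's per-operation status type, not the exact API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: after a batch call, inspect every per-mutation result.
// OpStatus is an assumed stand-in type, not HBase's actual OperationStatus API.
public class BatchResultCheck {
  public enum OpStatus { SUCCESS, NOT_RUN, FAILURE }

  // Returns indices of mutations that did not succeed, so the caller can retry or fail loudly.
  public static List<Integer> failedIndices(OpStatus[] statuses) {
    List<Integer> failed = new ArrayList<>();
    for (int i = 0; i < statuses.length; i++) {
      if (statuses[i] != OpStatus.SUCCESS) {
        failed.add(i);
      }
    }
    return failed;
  }
}
```

A caller like HBaseFsck#rebuildMeta would then surface an error when the returned list is non-empty rather than silently accepting a partial batch.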
[jira] [Created] (HBASE-12600) Remove REPLAY tag dependency in Distributed Replay Mode
Jeffrey Zhong created HBASE-12600: - Summary: Remove REPLAY tag dependency in Distributed Replay Mode Key: HBASE-12600 URL: https://issues.apache.org/jira/browse/HBASE-12600 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.99.1, 2.0.0 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong
[jira] [Updated] (HBASE-12600) Remove REPLAY tag dependency in Distributed Replay Mode
[ https://issues.apache.org/jira/browse/HBASE-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12600: -- Attachment: HBASE-12600.patch Submit the patch for QA run.
[jira] [Updated] (HBASE-12600) Remove REPLAY tag dependency in Distributed Replay Mode
[ https://issues.apache.org/jira/browse/HBASE-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12600: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-12600) Remove REPLAY tag dependency in Distributed Replay Mode
[ https://issues.apache.org/jira/browse/HBASE-12600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228658#comment-14228658 ] Jeffrey Zhong commented on HBASE-12600: --- [~enis] I'd like to get this into branch-1. Please take a look. Thanks.
[jira] [Updated] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12533: -- Attachment: HBASE-12533-v2.patch I tested the patch in a secure environment and verified that it solves the issue. In the v2 patch, I amended the existing test case by adding a check to verify we don't leave staging folders behind. staging directories are not deleted after secure bulk load -- Key: HBASE-12533 URL: https://issues.apache.org/jira/browse/HBASE-12533 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Environment: CDH5.2 + Kerberos Reporter: Andrejs Dubovskis Assignee: Jeffrey Zhong Attachments: HBASE-12533-v2.patch, HBASE-12533.patch We use secure bulk load heavily in our environment, and it had been working with no problem for some time. But last week I found that clients hang while calling *doBulkLoad*. After some investigation I found that HDFS keeps more than 1,000,000 directories in the /tmp/hbase-staging directory. When the directory's content was purged, the load process ran successfully. According to the [hbase book|http://hbase.apache.org/book/ch08s03.html#hbase.secure.bulkload]: {code} HBase manages creation and deletion of this directory. {code}
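The cleanup discipline the fix enforces — the per-load staging directory is always removed, even when the load step fails — follows a standard try/finally pattern. A rough sketch with assumed paths and names, not SecureBulkLoadEndpoint's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Illustrative sketch: run a bulk-load step and guarantee the staging directory
// is removed afterwards, success or failure. Names/paths are assumptions.
public class StagingCleanup {
  public static void withStagingDir(Path stagingDir, Runnable loadStep) throws IOException {
    Files.createDirectories(stagingDir);
    try {
      loadStep.run();
    } finally {
      // Delete deepest entries first so each directory is empty when removed.
      try (Stream<Path> walk = Files.walk(stagingDir)) {
        walk.sorted(Comparator.reverseOrder()).forEach(p -> {
          try {
            Files.delete(p);
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        });
      }
    }
  }
}
```

Without the finally-style guarantee, every failed or interrupted load leaks one directory, which is how /tmp/hbase-staging can accumulate the million-plus directories reported here.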
[jira] [Created] (HBASE-12588) Need to fail writes when row lock can't be acquired
Jeffrey Zhong created HBASE-12588: - Summary: Need to fail writes when row lock can't be acquired Key: HBASE-12588 URL: https://issues.apache.org/jira/browse/HBASE-12588 Project: HBase Issue Type: Bug Affects Versions: 0.99.1, 0.98.8 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong
[jira] [Updated] (HBASE-12577) Disable distributed log replay by default
[ https://issues.apache.org/jira/browse/HBASE-12577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12577: -- Attachment: HBASE-12567.patch Submitting for a QA run. Thanks. Disable distributed log replay by default - Key: HBASE-12577 URL: https://issues.apache.org/jira/browse/HBASE-12577 Project: HBase Issue Type: Sub-task Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Fix For: 0.99.2 Attachments: HBASE-12567.patch Distributed log replay is an awesome feature, but because of HBASE-11094, the rolling-upgrade story from 0.98 is hard to explain / enforce. The fix for HBASE-11094 only went into 0.98.4, meaning rolling upgrades from 0.98.4- might lose data during the upgrade. I feel no matter how much documentation / warning we do, we cannot prevent users from doing rolling upgrades from 0.98.4- to 1.0. And we do not want to inconvenience the user by requiring a two-step rolling upgrade. Thus I think we should disable dist log replay for 1.0, and re-enable it for 1.1 (if rolling upgrade from 0.98 is not supported). ie. undo: HBASE-10888
[jira] [Updated] (HBASE-12577) Disable distributed log replay by default
[ https://issues.apache.org/jira/browse/HBASE-12577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12577: -- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12588: -- Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12588: -- Attachment: HBASE-12588.patch
[jira] [Commented] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14226993#comment-14226993 ] Jeffrey Zhong commented on HBASE-12588: --- Yes, you're right: updates do happen under row-lock protection, so it might be all right after all. In that case, though, the caller won't know about a partial failure unless it checks the return status of every mutation.
[jira] [Resolved] (HBASE-12053) SecurityBulkLoadEndPoint set 777 permission on input data files
[ https://issues.apache.org/jira/browse/HBASE-12053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong resolved HBASE-12053. --- Resolution: Fixed Hadoop Flags: Reviewed Thanks for the comments! I've integrated the fix into the 0.98, 0.99, and master branches. SecurityBulkLoadEndPoint set 777 permission on input data files Key: HBASE-12053 URL: https://issues.apache.org/jira/browse/HBASE-12053 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12053.patch We have code in SecureBulkLoadEndpoint#secureBulkLoadHFiles: {code} LOG.trace("Setting permission for: " + p); fs.setPermission(p, PERM_ALL_ACCESS); {code} This defeats the point of using a staging folder for secure bulk load. Currently we create a hidden staging folder that has ALL_ACCESS permission, and we use doAs to move input files into the staging folder. Therefore, we should not set 777 permission on the original input data files, only on the files in the staging folder after the move. Otherwise this may compromise security settings, especially when, on an error, we move a file back with 777 permissions.
[jira] [Updated] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12533: -- Resolution: Fixed Fix Version/s: 0.99.2 0.98.9 2.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review/comments! I've integrated the fix into the 0.98, 0.99, and master branches.
[jira] [Commented] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227081#comment-14227081 ] Jeffrey Zhong commented on HBASE-12588: --- [~Apache9] It's a similar cause. This call gives the wrong impression that the whole batch is atomically committed, while the same function fails the whole batch for other errors.
[jira] [Commented] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227281#comment-14227281 ] Jeffrey Zhong commented on HBASE-12588: --- Adding a FAILURE status would be better. I'm thinking of closing the JIRA as by-design, because it seems that letting partial updates go through is the behavior we want.
[jira] [Updated] (HBASE-12588) Need to fail writes when row lock can't be acquired
[ https://issues.apache.org/jira/browse/HBASE-12588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12588: -- Resolution: Not a Problem Status: Resolved (was: Patch Available) It seems that's the expected behavior: allowing partial updates of a batch and relying on the client to handle the partial-update scenario.
[jira] [Commented] (HBASE-12522) Backport WAL refactoring to branch-1
[ https://issues.apache.org/jira/browse/HBASE-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221464#comment-14221464 ] Jeffrey Zhong commented on HBASE-12522: --- {quote} Or do you think Phoenix's secondary index needs to have WAL edits that span regions? {quote} WAL edits can't span regions, because our log SeqId is only guaranteed to increase monotonically per region. The local index doesn't span edits across regions. For transaction support, some higher-level support is needed, but not at the WAL level. Backport WAL refactoring to branch-1 Key: HBASE-12522 URL: https://issues.apache.org/jira/browse/HBASE-12522 Project: HBase Issue Type: Task Components: wal Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 0.99.2 Backport HBASE-10378 to branch-1. This will let us remove the deprecated stuff in master, allow some baking time within the 1.x line, and give us the option of pulling back follow-on performance improvements.
[jira] [Commented] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14221552#comment-14221552 ] Jeffrey Zhong commented on HBASE-12533: --- [~dubislv] Is it possible for you to try the patch to see if the issue is addressed? The patch should apply to the 0.98 code base as well. Thanks.
[jira] [Commented] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220079#comment-14220079 ] Jeffrey Zhong commented on HBASE-12533: --- {quote} Can you add a comment when we are calling the coprocessors to say that we are only calling the first region {quote} Yes, we use the first region to call prepareBulkLoad once.
[jira] [Commented] (HBASE-11099) Two situations where we could open a region with smaller sequence number
[ https://issues.apache.org/jira/browse/HBASE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220334#comment-14220334 ] Jeffrey Zhong commented on HBASE-11099: --- {quote} Is this speculation or something from phoenix or so? {quote} Currently it's a possible scenario based on reading the code. {quote} this a 0.98 issue too? {quote} Yes, it's a 0.98 issue too. [~apurtell] This is a low-risk fix. It's better to get it into 0.98 as well. Thanks. Two situations where we could open a region with smaller sequence number Key: HBASE-11099 URL: https://issues.apache.org/jira/browse/HBASE-11099 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.99.1 Reporter: Jeffrey Zhong Assignee: Stephen Yuan Jiang Fix For: 2.0.0, 0.99.2 Attachments: HBASE-11099.v1-2.0.patch Recently I happened to run into code where we could potentially open a region with a smaller sequence number: 1) Inside HRegion#internalFlushcache. This is due to the change in the way WAL sync works, where we use late binding (assigning the sequence number right before WAL sync). The flushSeqId may be less than the sequence number of a change included in the flush, which may cause the later region-opening code to use a smaller than expected sequence number when we reopen the region. {code} flushSeqId = this.sequenceId.incrementAndGet(); ... mvcc.waitForRead(w); {code} 2) HRegion#replayRecoveredEdits, where we have the following code: {code} ... if (coprocessorHost != null) { status.setStatus("Running pre-WAL-restore hook in coprocessors"); if (coprocessorHost.preWALRestore(this.getRegionInfo(), key, val)) { // if bypass this log entry, ignore it ... continue; } } ... currentEditSeqId = key.getLogSeqNum(); {code} If a coprocessor skips some tail WALEdits, the function will return a smaller currentEditSeqId. In the end, a region may open with a smaller sequence number. 
This may cause data loss because the Master may record a larger flushed sequence id, and some WALEdits may be skipped during recovery if the region fails again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
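The second scenario above comes down to where currentEditSeqId is assigned relative to the coprocessor bypass check. A minimal sketch of the intended ordering (illustrative names and a standalone class, not the actual HRegion code): record each edit's sequence id before honoring a bypass, so that skipped tail edits still advance the recovered sequence id.

```java
import java.util.List;

public class ReplaySeqIdTracker {
    /**
     * Returns the sequence id a region should reopen with after replaying
     * recovered edits: the maximum id over ALL replayed edits, regardless of
     * whether a coprocessor asked to skip ("bypass") any of them.
     */
    public static long replay(List<Long> editSeqIds, List<Boolean> bypassed) {
        long currentEditSeqId = -1;
        for (int i = 0; i < editSeqIds.size(); i++) {
            // Record the id BEFORE honoring the bypass, so a skipped tail
            // edit still advances the region's recovered sequence id.
            currentEditSeqId = Math.max(currentEditSeqId, editSeqIds.get(i));
            if (bypassed.get(i)) {
                continue; // edit skipped by the coprocessor, id already counted
            }
            // ... apply the edit to the memstore (elided) ...
        }
        return currentEditSeqId;
    }
}
```

With the assignment moved above the bypass, a coprocessor skipping the last edits can no longer cause the region to reopen with a smaller id than the WAL contains.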
[jira] [Commented] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218280#comment-14218280 ] Jeffrey Zhong commented on HBASE-12533: --- [~dubislv] What kind of folders are left in the staging folder? Could you show some examples? I'm assuming you saw an issue where the bulk load runs successfully but still leaves some folders in the staging folder afterwards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218920#comment-14218920 ] Jeffrey Zhong commented on HBASE-12533: --- From the pasted folder names, the leftovers are the root staging folders used by bulk load. I ran a small test against the 0.98 code and they seem to be cleared after a bulk load. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong reassigned HBASE-12533: - Assignee: Jeffrey Zhong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219087#comment-14219087 ] Jeffrey Zhong commented on HBASE-12533: --- I think I found the root cause of the issue, and it's a serious one. Below is the culprit: {code} public String prepareBulkLoad(final TableName tableName) throws IOException { try { return table.coprocessorService(SecureBulkLoadProtos.SecureBulkLoadService.class, EMPTY_START_ROW, LAST_ROW, ... {code} prepareBulkLoad is fired against all data regions, so it creates as many staging folders as the bulk-loaded table has regions, while we only use the first one. That's why you see so many staging folders left behind. There are a couple of bugs in SecureBulkLoadEndpoint#cleanupBulkLoad as well: 1) it fires the same request to all data regions; 2) it first creates an already-existing folder and then deletes it. Too many unnecessary NN operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
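The fan-out described in the comment can be modeled with a toy region selector (illustrative only, not the HBase client code): a coprocessor service call is invoked once per region whose key range intersects [startRow, endRow], so an (EMPTY_START_ROW, LAST_ROW) range hits every region, while a degenerate single-point range hits only the first. Here "" plays the role of EMPTY_START_ROW and "\uffff" stands in for LAST_ROW.

```java
import java.util.List;

public class RegionRangeSelector {
    // Toy model: region i covers [startKeys.get(i), startKeys.get(i+1)),
    // and the last region is unbounded on the right.
    public static int regionsHit(List<String> startKeys, String startRow, String endRow) {
        int hit = 0;
        for (int i = 0; i < startKeys.size(); i++) {
            String regionStart = startKeys.get(i);
            String regionEnd = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : null;
            // The call range intersects this region iff it begins before the
            // region ends and ends at or after the region starts.
            boolean beginsBeforeRegionEnds = regionEnd == null || startRow.compareTo(regionEnd) < 0;
            boolean endsAtOrAfterRegionStart = endRow.compareTo(regionStart) >= 0;
            if (beginsBeforeRegionEnds && endsAtOrAfterRegionStart) {
                hit++; // this region would receive the coprocessor call
            }
        }
        return hit;
    }
}
```

For a four-region table, the (EMPTY_START_ROW, LAST_ROW) range invokes the endpoint four times and creates four staging folders; restricting the range to a single point invokes it once, which is all prepareBulkLoad needs.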
[jira] [Updated] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12533: -- Attachment: HBASE-12533.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12533: -- Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12533) staging directories are not deleted after secure bulk load
[ https://issues.apache.org/jira/browse/HBASE-12533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219100#comment-14219100 ] Jeffrey Zhong commented on HBASE-12533: --- This issue also makes bulk load slow because it fires unnecessary RPC requests to all data regions to create/delete staging folders. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11099) Two situations where we could open a region with smaller sequence number
[ https://issues.apache.org/jira/browse/HBASE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215628#comment-14215628 ] Jeffrey Zhong commented on HBASE-11099: --- +1. Looks good to me as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12485) Maintain SeqId monotonically increasing when Region Replica is on
Jeffrey Zhong created HBASE-12485: - Summary: Maintain SeqId monotonically increasing when Region Replica is on Key: HBASE-12485 URL: https://issues.apache.org/jira/browse/HBASE-12485 Project: HBase Issue Type: Sub-task Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong We added FLUSH and REGION_CLOSE events into the WAL, and for each of those events the region SeqId is bumped. The issue comes from the region close operation: when opening a region we use the flushed SeqId from store files, but after the store flush during region close we still write COMMIT_FLUSH, REGION_CLOSE events, etc., each of which bumps the SeqId again. Therefore, the region-opening SeqId is lower than it should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
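The monotonicity requirement above reduces to one comparison (a hypothetical helper, not the actual HRegion code): the opening SeqId must account for both the store files and the trailing WAL event markers written after the final flush, taking whichever is larger.

```java
public class OpeningSeqId {
    /**
     * On close, the flush happens first and then COMMIT_FLUSH / REGION_CLOSE
     * markers are appended to the WAL, each consuming a sequence id. Deriving
     * the opening id from store files alone would lose those trailing ids, so
     * take the max of the two sources.
     */
    public static long openingSeqId(long maxFlushedSeqIdFromStoreFiles,
                                    long lastWalEventSeqId) {
        // The reopened region must never hand out a sequence id that the
        // WAL has already used.
        return Math.max(maxFlushedSeqIdFromStoreFiles, lastWalEventSeqId);
    }
}
```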
[jira] [Commented] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14207395#comment-14207395 ] Jeffrey Zhong commented on HBASE-12319: --- The v2 patch attached is for 0.98 only. The issue affects both branch-1 and 0.98, so let me commit it to 0.98 and branch-1 if you don't mind. Inconsistencies during region recovery due to close/open of a region during recovery Key: HBASE-12319 URL: https://issues.apache.org/jira/browse/HBASE-12319 Project: HBase Issue Type: Bug Affects Versions: 0.98.7, 0.99.1 Reporter: Devaraj Das Assignee: Jeffrey Zhong Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12319-v2.patch, HBASE-12319.patch In one of my test runs, I saw the following: {noformat} 2014-10-14 13:45:30,782 DEBUG [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04, isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Found 3 recovered edits file(s) under hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d . . 2014-10-14 13:45:31,916 WARN [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Null or non-existent edits file: hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0198080 {noformat} The above logs are from a regionserver, say RS2. From the initial analysis it seemed that the master asked a certain regionserver (let's say RS1) to open the region and, for some reason, asked it to close soon after. The open was still proceeding on RS1, but the master reassigned the region to RS2. 
RS2 also started the recovery, but it ended up seeing an inconsistent view of the recovered-edits files (it reports missing files, as per the logs above), since the first regionserver (RS1) deleted some files after it completed the recovery. When RS2 really opens the region, it might not see the recent data written by flushes on hor9n10 during the recovery process. Reads of that data would be inconsistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong resolved HBASE-12319. --- Resolution: Fixed Fix Version/s: (was: 2.0.0) I've integrated the fix into 0.98 and branch-1. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12319: -- Priority: Critical (was: Major) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203690#comment-14203690 ] Jeffrey Zhong commented on HBASE-12319: --- Since this issue may cause data loss or inconsistent reads, I marked it as critical. The symptom is that a region open doesn't wait for the previous region close to complete, so the newly opened region may not open all store files if the previous close flushed more data to disk. The testOpenCloseRacing failure after the fix is a test issue: during the test the region is opened twice, so after the fix the region is opened on another RS while the AM returns the first RS the region was previously assigned to. Before the fix, the test case didn't wait for the previous region-open cancellation to complete, so it could see the second region assignment immediately. If you put a sleep after the final assertion in the test case, you will see the meta location updated again by the previously canceled region opening. Below is the log after I put a 60-second sleep after the final assert; you can see region ff976daf00708ecad200b113349fc4b4 in OPEN state still getting another OPENED, which was from the previous assignment. {noformat} 2014-11-08 12:48:32,238 DEBUG [FifoRpcScheduler.handler1-thread-2] master.AssignmentManager(4077): Got transition OPENED for {ff976daf00708ecad200b113349fc4b4 state=PENDING_OPEN, ts=1415479712217, server=10.10.8.224,55613,1415479709023} from 10.10.8.224,55613,1415479709023 … 2014-11-08 12:48:32,936 DEBUG [FifoRpcScheduler.handler1-thread-4] master.AssignmentManager(4077): Got transition OPENED for {ff976daf00708ecad200b113349fc4b4 state=OPEN, ts=1415479712238, server=10.10.8.224,55613,1415479709023} from 10.10.8.224,55609,1415479708922 {noformat} The v2 patch amends the test case and makes sure that the region-opening cleanupFailedOpen waits for the region close before returning NotServingRegionException. Thanks. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
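The ordering the v2 patch enforces can be sketched with a small gate (an illustrative model built on java.util.concurrent, not the actual HRegion code): the open path blocks until the close path signals completion, and only then surfaces NotServingRegionException to the master.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class RegionCloseGate {
    private final CountDownLatch closed = new CountDownLatch(1);

    /** Called by the close path as its final step, after flushes finish. */
    public void markClosed() {
        closed.countDown();
    }

    /**
     * cleanupFailedOpen-style wait: returns true once the close has fully
     * completed. Only after that should NotServingRegionException be thrown,
     * so the master never reassigns the region while files are still moving.
     */
    public boolean awaitClose(long timeoutMs) {
        try {
            return closed.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Without such a gate, RS2 can begin recovery while RS1's close is still deleting recovered-edits files, which is exactly the inconsistent view described in the issue.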
[jira] [Updated] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12319: -- Attachment: HBASE-12319-v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-12319: -- Status: Patch Available (was: Reopened) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12053) SecurityBulkLoadEndPoint set 777 permission on input data files
[ https://issues.apache.org/jira/browse/HBASE-12053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203693#comment-14203693 ] Jeffrey Zhong commented on HBASE-12053: --- I've tested the patch in a secure environment and it worked. If there is no objection, I'll commit it later next week. Thanks. SecurityBulkLoadEndPoint set 777 permission on input data files Key: HBASE-12053 URL: https://issues.apache.org/jira/browse/HBASE-12053 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12053.patch We have code in SecureBulkLoadEndpoint#secureBulkLoadHFiles {code} LOG.trace("Setting permission for: " + p); fs.setPermission(p, PERM_ALL_ACCESS); {code} This defeats the point of using a staging folder for secure bulk load. Currently we create a hidden staging folder with ALL_ACCESS permission and use doAs to move the input files into it. Therefore, we should not set 777 permission on the original input data files, only on the files in the staging folder after the move. This may compromise security settings, especially when an error causes us to move a file back with 777 permission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
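The intended behavior can be illustrated with a toy permission model (hypothetical paths, names, and an in-memory map instead of a real FileSystem, not the actual SecureBulkLoadEndpoint): permissions are widened only on the copy inside the staging directory, after the move, while the caller's original file keeps its mode.

```java
import java.util.HashMap;
import java.util.Map;

public class StagingPermissions {
    /**
     * Models staging one input file: the source path keeps its existing
     * permission; the staged copy under stagingDir gets ALL_ACCESS (777),
     * since multiple principals must read it during the bulk load.
     */
    public static Map<String, String> stage(String srcPath, String srcPerm,
                                            String stagingDir) {
        Map<String, String> perms = new HashMap<>();
        perms.put(srcPath, srcPerm); // original input file left untouched
        String staged = stagingDir + srcPath.substring(srcPath.lastIndexOf('/'));
        perms.put(staged, "777");    // widened only after the move
        return perms;
    }
}
```

The bug in the snippet above was the opposite order: setPermission ran on the input path itself, so an aborted load could hand the file back to the caller world-writable.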
[jira] [Commented] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203696#comment-14203696 ] Jeffrey Zhong commented on HBASE-12319: --- The v2 patch passed all tests against 0.98 branch. Inconsistencies during region recovery due to close/open of a region during recovery Key: HBASE-12319 URL: https://issues.apache.org/jira/browse/HBASE-12319 Project: HBase Issue Type: Bug Affects Versions: 0.98.7, 0.99.1 Reporter: Devaraj Das Assignee: Jeffrey Zhong Priority: Critical Fix For: 2.0.0, 0.98.9, 0.99.2 Attachments: HBASE-12319-v2.patch, HBASE-12319.patch In one of my test runs, I saw the following: {noformat} 2014-10-14 13:45:30,782 DEBUG [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04, isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Found 3 recovered edits file(s) under hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d . . 2014-10-14 13:45:31,916 WARN [RS_OPEN_REGION-hor9n01:60020-1] regionserver.HRegion: Null or non-existent edits file: hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0198080 {noformat} The above logs is from a regionserver, say RS2. From the initial analysis it seemed like the master asked a certain regionserver to open the region (let's say RS1) and for some reason asked it to close soon after. The open was still proceeding on RS1 but the master reassigned the region to RS2. 
RS2 also started recovery, but it ended up seeing an inconsistent view of the recovered-edits files (it reported missing files, as in the logs above) because the first regionserver (RS1) deleted some of those files after completing its own recovery. When RS2 actually opens the region, it might not see the recent data written by flushes on hor9n10 during the recovery process; reads of that data would be inconsistent.
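The stale-listing failure mode can be reproduced in miniature with plain java.nio standing in for HDFS (all paths and class names here are illustrative, not HBase code): one actor snapshots the recovered-edits directory listing, another actor deletes a file from it, and the first actor's later access trips over the missing path, mirroring the "Null or non-existent edits file" warning above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class StaleListingSketch {
    public static void main(String[] args) throws IOException {
        Path editsDir = Files.createTempDirectory("recovered.edits");
        Files.createFile(editsDir.resolve("0198080"));
        Files.createFile(editsDir.resolve("0198081"));

        // RS2 snapshots the recovered-edits listing before replay...
        List<Path> snapshot = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(editsDir)) {
            ds.forEach(snapshot::add);
        }

        // ...meanwhile RS1, which finished its own replay, deletes a file.
        Files.delete(editsDir.resolve("0198080"));

        // RS2's replay now hits a stale entry from its snapshot.
        for (Path p : snapshot) {
            if (!Files.exists(p)) {
                System.out.println("Null or non-existent edits file: " + p.getFileName());
            }
        }
    }
}
```

The point of the sketch is that the listing is a point-in-time snapshot: nothing ties its validity to the files still existing when they are finally read, which is exactly the window the close/open race opens up.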
[jira] [Commented] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery
[ https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203722#comment-14203722 ] Jeffrey Zhong commented on HBASE-12319:
---
{quote}
I'm not inclined to do it over except for a blocker so this may wait for the next release.
{quote}
That's fine. This is an existing issue, and there is no reason to hold off the current release. It's also better to bake the fix for a little while before releasing it. Once all is good, I'll hold off and check the fix in after 0.98.8 is out. Thanks.

Inconsistencies during region recovery due to close/open of a region during recovery
Key: HBASE-12319
URL: https://issues.apache.org/jira/browse/HBASE-12319
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.7, 0.99.1
Reporter: Devaraj Das
Assignee: Jeffrey Zhong
Priority: Critical
Fix For: 2.0.0, 0.98.9, 0.99.2
Attachments: HBASE-12319-v2.patch, HBASE-12319.patch