[jira] [Created] (HBASE-10577) Remove unnecessary looping in FSHLog
Himanshu Vashishtha created HBASE-10577: --- Summary: Remove unnecessary looping in FSHLog Key: HBASE-10577 URL: https://issues.apache.org/jira/browse/HBASE-10577 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.99.0 Reporter: Himanshu Vashishtha

In the new disruptor-based FSHLog, the Syncer threads are handed a batch of SyncFuture objects by the RingBufferHandler. The Syncer then invokes a sync call on the current writer instance. The batches are handed out serially in RingBufferHandler, that is, every Syncer receives a non-overlapping batch of SyncFutures. Once synced, the Syncer thread updates highestSyncedSequence. In the run method of Syncer, we have:
{code}
long currentHighestSyncedSequence = highestSyncedSequence.get();
if (currentSequence < currentHighestSyncedSequence) {
  syncCount += releaseSyncFuture(takeSyncFuture, currentHighestSyncedSequence, null);
  // Done with the 'take'. Go around again and do a new 'take'.
  continue;
}
{code}
I find this logic of polling the BlockingQueue again in this condition unnecessary. When currentHighestSyncedSequence is already greater than currentSequence, doesn't that mean some other Syncer has already synced the SyncFutures of these ops? In that case, we should just go ahead and release all the SyncFutures for this batch to unblock the handlers. That would avoid polling the BlockingQueue for every SyncFuture object in this case.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
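To make the proposal in HBASE-10577 concrete, here is a minimal sketch of releasing a whole batch in one pass once the highest synced sequence already covers it. This is not the FSHLog code: `PendingSync` and `releaseCovered` are hypothetical names, and treating a sequence equal to the highest synced one as covered is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BatchReleaseSketch {
    /** Hypothetical stand-in for a SyncFuture: a sequence id plus a done flag. */
    static final class PendingSync {
        final long sequence;
        boolean done;
        PendingSync(long sequence) { this.sequence = sequence; }
    }

    /**
     * Releases every entry at the head of the batch whose sequence is at or
     * below highestSynced (assumption: "equal" counts as already synced),
     * and returns how many entries were released.
     */
    static int releaseCovered(Queue<PendingSync> batch, long highestSynced) {
        int released = 0;
        while (!batch.isEmpty() && batch.peek().sequence <= highestSynced) {
            batch.poll().done = true; // unblock the waiting handler
            released++;
        }
        return released;
    }

    public static void main(String[] args) {
        Queue<PendingSync> batch = new ArrayDeque<>();
        for (long seq = 1; seq <= 5; seq++) {
            batch.add(new PendingSync(seq));
        }
        // Another syncer is assumed to have synced up to sequence 3 already.
        System.out.println(releaseCovered(batch, 3)); // prints 3
        System.out.println(batch.size());             // prints 2
    }
}
```

The point of the sketch is only that covered entries are drained in a single loop instead of going back to the queue once per entry.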
[jira] [Created] (HBASE-10563) Set name for FlushHandler thread
Himanshu Vashishtha created HBASE-10563: --- Summary: Set name for FlushHandler thread Key: HBASE-10563 URL: https://issues.apache.org/jira/browse/HBASE-10563 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.0 Reporter: Himanshu Vashishtha

The FlushHandler thread in the MemStoreFlusher class uses the default thread name (Thread-XX). This is unintentional and also confusing when there are multiple handlers. The current stack trace looks like this:
{code}
"Thread-18" prio=10 tid=0x7f4e8cb21800 nid=0x356e waiting on condition [0x7f4e6d49a000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0004e5684b00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:201)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:39)
    at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:228)
    at java.lang.Thread.run(Thread.java:662)
{code}
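As an illustration of the fix requested in HBASE-10563, the sketch below gives each handler thread an explicit name so it is identifiable in a stack dump. The helper `newNamedHandler` and the name pattern `MemStoreFlusher.N` are assumptions for this example, not the names used by the actual patch.

```java
public class NamedHandlerSketch {
    /**
     * Builds a daemon thread with an explicit name instead of the JVM default
     * "Thread-N". The prefix and index scheme is hypothetical.
     */
    static Thread newNamedHandler(String prefix, int index, Runnable task) {
        Thread t = new Thread(task);
        t.setName(prefix + "." + index);
        t.setDaemon(true);
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread handler = newNamedHandler("MemStoreFlusher", 0,
            () -> System.out.println("running in " + Thread.currentThread().getName()));
        handler.start();
        handler.join();
    }
}
```

With several handlers, distinct indexes make it possible to tell them apart in a thread dump.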
[jira] [Created] (HBASE-10378) Divide HLog interface into User and Implementor specific interfaces
Himanshu Vashishtha created HBASE-10378: --- Summary: Divide HLog interface into User and Implementor specific interfaces Key: HBASE-10378 URL: https://issues.apache.org/jira/browse/HBASE-10378 Project: HBase Issue Type: Sub-task Components: wal Reporter: Himanshu Vashishtha

HBASE-5937 introduces the HLog interface as a first step to support multiple WAL implementations. This interface is a good start, but has some limitations/drawbacks in its current state, such as:
1) There is no clear distinction between User and Implementor APIs: it provides APIs both for WAL users (append, sync, etc.) and for WAL implementors (Reader/Writer interfaces, etc.). Some APIs are very implementation specific (getFileNum, etc.), and a user such as a RegionServer shouldn't know about them.
2) There are about 14 methods in FSHLog which are not present in the HLog interface but are used in several places in the unit test code. These tests typecast HLog to FSHLog, which makes it very difficult to test multiple WAL implementations without doing some ugly checks.

I'd like to propose some changes in the HLog interface that would ease the multi-WAL story:
1) Have two interfaces, WAL and WALService. WAL provides APIs for implementors. WALService provides APIs for users (such as RegionServer).
2) A skeleton implementation of the above two interfaces as the base class for other WAL implementations (AbstractWAL). It provides the fields required by all subclasses (fs, conf, log dir, etc.). Make a minimal set of test-only methods and add this set to AbstractWAL.
3) HLogFactory returns a WALService reference when creating a WAL instance; if a user needs to access impl-specific APIs (there are unit tests which get the WAL from an HRegionServer and then call impl-specific APIs), use AbstractWAL typecasting.
4) Make TestHLog abstract and let all implementors provide their respective test class which extends TestHLog (TestFSHLog, for example).
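The split proposed in HBASE-10378 could look roughly like the sketch below. Only the type names WAL, WALService and AbstractWAL and the method names append, sync and getFileNum come from the message; every signature, the `InMemoryWAL` class and the `createWAL` factory are hypothetical and exist only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.List;

public class WalInterfaceSketch {
    /** User-facing API, e.g. for a region server. Signatures are assumed. */
    interface WALService {
        long append(String edit);
        void sync();
    }

    /** Implementor-facing API. */
    interface WAL {
        long getFileNum();
    }

    /** Skeleton base class holding state shared by all implementations. */
    abstract static class AbstractWAL implements WAL, WALService {
        protected final String logDir;
        protected AbstractWAL(String logDir) { this.logDir = logDir; }
    }

    /** Toy in-memory implementation, present only so the sketch runs. */
    static final class InMemoryWAL extends AbstractWAL {
        private final List<String> edits = new ArrayList<>();
        private int syncedUpTo = 0;

        InMemoryWAL(String logDir) { super(logDir); }

        @Override public long append(String edit) { edits.add(edit); return edits.size(); }
        @Override public void sync() { syncedUpTo = edits.size(); }
        @Override public long getFileNum() { return 1L; }
        int syncedCount() { return syncedUpTo; }
    }

    /** Stand-in for a factory that exposes only the user-facing type. */
    static WALService createWAL(String logDir) {
        return new InMemoryWAL(logDir);
    }

    public static void main(String[] args) {
        WALService wal = createWAL("/tmp/wal-sketch");
        long seq = wal.append("put row1");
        wal.sync();
        // Test code that needs implementation details casts explicitly.
        AbstractWAL impl = (AbstractWAL) wal;
        System.out.println(seq + " " + impl.getFileNum());
    }
}
```

Regular callers see only `WALService`; the explicit cast to `AbstractWAL` marks the places that depend on implementation details.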
[jira] [Created] (HBASE-10278) Provide better write predictability
Himanshu Vashishtha created HBASE-10278: --- Summary: Provide better write predictability Key: HBASE-10278 URL: https://issues.apache.org/jira/browse/HBASE-10278 Project: HBase Issue Type: New Feature Reporter: Himanshu Vashishtha

Currently, HBase has one WAL per region server. Whenever there is any latency in the write pipeline (for whatever reason, such as a network blip or a node in the pipeline having a bad disk), the overall write latency suffers. Jonathan Hsieh and I analyzed various approaches to tackle this issue. We also looked at HBASE-5699, which talks about adding concurrent multi WALs. Along with performance numbers, we also focused on design simplicity, minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. Considering all these parameters, we propose a new HLog implementation with WAL Switching functionality. Please find attached the design doc for the same. It introduces the WAL Switching feature and presents experiments/results of a prototype implementation, showing the benefits of this feature. The second goal of this work is to serve as a building block for the concurrent multiple WALs feature. Please review the doc.
[jira] [Created] (HBASE-10004) Some fixes for scoping sequence Ids to region level
Himanshu Vashishtha created HBASE-10004: --- Summary: Some fixes for scoping sequence Ids to region level Key: HBASE-10004 URL: https://issues.apache.org/jira/browse/HBASE-10004 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.98.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.98.0

While looking at trunk, I found some issues related to the fix provided in HBASE-8741. This jira is to fix them:
1) Not so helpful log message in FSHLog#getFileNumFromFileName.
2) In the HLogPE verify method, the region:sequenceId map was not getting updated.
3) Reversed assertEquals arguments in TestHLog#testFindMemStoresEligibleForFlush.

-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9925) Don't close a file if doesn't EOF while replicating
Himanshu Vashishtha created HBASE-9925: -- Summary: Don't close a file if doesn't EOF while replicating Key: HBASE-9925 URL: https://issues.apache.org/jira/browse/HBASE-9925 Project: HBase Issue Type: Bug Affects Versions: 0.96.0, 0.98.0 Reporter: Himanshu Vashishtha

While doing replication, we open and close the WAL file _every_ time we read entries to send. We could instead open/close the reader only when we hit EOF. That would alleviate some NN load, especially on a write-heavy cluster. This came up while discussing our current open/close heuristic in replication with [~jdcryans].
[jira] [Created] (HBASE-9785) Fix heap size reporting in HRegion
Himanshu Vashishtha created HBASE-9785: -- Summary: Fix heap size reporting in HRegion Key: HBASE-9785 URL: https://issues.apache.org/jira/browse/HBASE-9785 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Fix For: 0.98.0, 0.96.1

The size reported by the Fixed_Overhead variable in HRegion misses a boolean variable. TestHeapSize doesn't report it because of the alignment we do to make the size a multiple of 8. The equation for HRegion heap usage from Fixed_Overhead is:
Fixed_Overhead = align(100 + 42*ref + 1*arr) = align(284) = 288 on a 32-bit VM
(On a 32-bit VM, 1 ref = 4 bytes, 1 arr = 16 bytes.)
The equation formed using reflection (in ClassSize) is:
Expected_Overhead = align(101 + 42*ref + 1*arr) = align(285) = 288
So testHeapSize doesn't fail currently. But if I add a reference (as I did in the last patch in 8741), it starts failing because the equations are now:
Fixed_Overhead = align(100 + 43*ref + 1*arr) = align(288) = 288
Expected_Overhead = align(101 + 43*ref + 1*arr) = align(289) = 296
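The arithmetic in HBASE-9785 can be checked with a few lines of Java. This is a standalone sketch, not the HBase ClassSize code: `align` and `overhead` are written here for illustration, using the 32-bit sizes quoted in the message (4 bytes per reference, 16 bytes per array).

```java
public class HeapAlignSketch {
    static final int REF = 4;    // reference size on a 32-bit VM, per the message
    static final int ARRAY = 16; // array size on a 32-bit VM, per the message

    /** Rounds size up to the next multiple of 8. */
    static long align(long size) {
        return (size + 7) & ~7L;
    }

    /** Aligned size of: base bytes + refs references + one array. */
    static long overhead(long base, int refs) {
        return align(base + (long) refs * REF + ARRAY);
    }

    public static void main(String[] args) {
        // 42 references: the missing boolean byte is hidden by alignment.
        System.out.println(overhead(100, 42)); // align(284) = 288
        System.out.println(overhead(101, 42)); // align(285) = 288
        // 43 references: the extra byte now crosses an 8-byte boundary.
        System.out.println(overhead(100, 43)); // align(288) = 288
        System.out.println(overhead(101, 43)); // align(289) = 296
    }
}
```

The one-byte discrepancy only becomes visible when the unaligned total is already a multiple of 8, which is why adding a single reference exposed it.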
[jira] [Resolved] (HBASE-6515) Setting request size with protobuf
[ https://issues.apache.org/jira/browse/HBASE-6515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-6515. Resolution: Duplicate > Setting request size with protobuf > -- > > Key: HBASE-6515 > URL: https://issues.apache.org/jira/browse/HBASE-6515 > Project: HBase > Issue Type: Bug > Components: IPC/RPC, Replication >Affects Versions: 0.95.2 >Reporter: Himanshu Vashishtha >Priority: Critical > > While running replication on upstream code, I am hitting the size-limit > exception while sending WALEdits to a different cluster. > {code} > com.google.protobuf.InvalidProtocolBufferException: IPC server unable to read > call parameters: Protocol message was too large. May be malicious. Use > CodedInputStream.setSizeLimit() to increase the size limit. > {code} > Do we have a property to set some max size or something? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9539) Handle post namespace snapshot files when checking for HFile V1
Himanshu Vashishtha created HBASE-9539: -- Summary: Handle post namespace snapshot files when checking for HFile V1 Key: HBASE-9539 URL: https://issues.apache.org/jira/browse/HBASE-9539 Project: HBase Issue Type: Bug Components: migration Reporter: Himanshu Vashishtha

When checking for HFileV1 before upgrading to 96, the snapshot file links try to read from post-namespace locations. The migration script needs to be run on a 94 cluster, so it must read the old (94) layout to check for HFileV1.
[jira] [Created] (HBASE-9509) Fix HFile V1 Detector to handle AccessControlException for non-existant files
Himanshu Vashishtha created HBASE-9509: -- Summary: Fix HFile V1 Detector to handle AccessControlException for non-existant files Key: HBASE-9509 URL: https://issues.apache.org/jira/browse/HBASE-9509 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Fix For: 0.98.0, 0.96.0

On some hadoop versions, fs.exists() throws an AccessControlException if there is a non-searchable inode in the file path. Other versions, such as 2.1.0-beta, just return false. This jira is to fix the HFile V1 detector tool to avoid making such calls. See the exception below, seen when running the tool on one hadoop version:
{code}
ERROR util.HFileV1Detector: org.apache.hadoop.security.AccessControlException: Permission denied: user=hbase, access=EXECUTE, inode="/hbase/.META./.tableinfo.01":hbase:supergroup:-rw-r--r--
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:187)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:150)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5141)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5123)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:5102)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3265)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:719)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:692)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:59628)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
{code}
[jira] [Created] (HBASE-9497) Old .META. .tableinfo file kills HMaster
Himanshu Vashishtha created HBASE-9497: -- Summary: Old .META. .tableinfo file kills HMaster Key: HBASE-9497 URL: https://issues.apache.org/jira/browse/HBASE-9497 Project: HBase Issue Type: Bug Components: master, migration Affects Versions: 0.95.2 Reporter: Himanshu Vashishtha Fix For: 0.98.0, 0.96.0

In pre-0.96, .META. has .tableinfo files which refer to .META. On startup, the master tries to read such a file and aborts since the table name has changed. The .META. .tableinfo files are not being created in 0.94.x (fixed for 96 in HBASE-6971), but this can be reproduced when migrating from 0.92 -> 0.94 -> 0.96. Our old users would be affected by this.
{code}
java.lang.IllegalArgumentException: .META. no longer exists. The table has been renamed to hbase:meta
    at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:291)
    at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:283)
    at org.apache.hadoop.hbase.HTableDescriptor.readFields(HTableDescriptor.java:960)
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:131)
    at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:101)
    at org.apache.hadoop.hbase.HTableDescriptor.parseFrom(HTableDescriptor.java:1407)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.readTableDescriptor(FSTableDescriptors.java:521)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.createTableDescriptorForTableDirectory(FSTableDescriptors.java:707)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.createTableDescriptor(FSTableDescriptors.java:683)
    at org.apache.hadoop.hbase.util.FSTableDescriptors.createTableDescriptor(FSTableDescriptors.java:670)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:485)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:145)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:129)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:761)
{code}
[jira] [Resolved] (HBASE-9297) Fix HFileV1Detector tool post HBASE-9126
[ https://issues.apache.org/jira/browse/HBASE-9297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-9297. Resolution: Fixed I folded it in HBase-9311. > Fix HFileV1Detector tool post HBASE-9126 > > > Key: HBASE-9297 > URL: https://issues.apache.org/jira/browse/HBASE-9297 > Project: HBase > Issue Type: Bug > Components: migration >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > > We need to detect any HFileV1 before upgrading to 0.96. The code to read > HFileV1 version is removed in HBASE-9126. It breaks HFileV1Detector tool. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-8414) Hbck still refers to -ROOT- table to locate .META.
[ https://issues.apache.org/jira/browse/HBASE-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Himanshu Vashishtha resolved HBASE-8414. Resolution: Duplicate

It was resolved in HBase-8627.

> Hbck still refers to -ROOT- table to locate .META.
> --
>
> Key: HBASE-8414
> URL: https://issues.apache.org/jira/browse/HBASE-8414
> Project: HBase
> Issue Type: Bug
> Components: hbck
> Affects Versions: 0.95.0
> Reporter: Himanshu Vashishtha
> Assignee: Himanshu Vashishtha
>
> In the current ROOT-less trunk, hbck still tries to fix meta by looking up its
> location in the .ROOT. table. This happens if there is no .META. assigned
> when hbck is run.
> HbaseFsck.java:
> {code}
> boolean checkMetaRegion() {
>   ...
>   HRegionLocation rootLocation = connection.locateRegion(
>       HConstants.ROOT_TABLE_NAME, HConstants.EMPTY_START_ROW);
>   ...
> }
> {code}
> Running hbck while meta is in transition:
> {code}
> bin/hbase hbck
> Version: 0.95.0-SNAPSHOT
> ERROR: META region or some of its attributes are null.
> ERROR: Fatal error: unable to get root region location. Exiting...
> Summary:
> 2 inconsistencies detected.
> Status: INCONSISTENT
> {code}
[jira] [Created] (HBASE-9297) Fix HFileV1Detector tool post HBASE-9126
Himanshu Vashishtha created HBASE-9297: -- Summary: Fix HFileV1Detector tool post HBASE-9126 Key: HBASE-9297 URL: https://issues.apache.org/jira/browse/HBASE-9297 Project: HBase Issue Type: Bug Components: migration Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha

We need to detect any HFileV1 before upgrading to 0.96. The code to read the HFileV1 version was removed in HBASE-9126, which breaks the HFileV1Detector tool.
[jira] [Created] (HBASE-9278) Reading Pre-namespace meta table edits kills the reader
Himanshu Vashishtha created HBASE-9278: -- Summary: Reading Pre-namespace meta table edits kills the reader Key: HBASE-9278 URL: https://issues.apache.org/jira/browse/HBASE-9278 Project: HBase Issue Type: Bug Components: migration, wal Affects Versions: 0.95.2 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Critical Fix For: 0.95.3 In upgrading to 0.96, there might be some meta/root table edits. Currently, we are just killing SplitLogWorker thread in case it sees any META, or ROOT waledit, which blocks log splitting/replaying of remaining WALs. {code} 2013-08-20 15:45:16,998 ERROR regionserver.SplitLogWorker (SplitLogWorker.java:run(210)) - unexpected error java.lang.IllegalArgumentException: .META. no longer exists. The table has been renamed to hbase:meta at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:269) at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:261) at org.apache.hadoop.hbase.regionserver.wal.HLogKey.readFields(HLogKey.java:338) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1898) at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1938) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.readNext(SequenceFileLogReader.java:215) at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:98) at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:85) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:582) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:292) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:209) at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:138) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:358) at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:245) at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:205) at java.lang.Thread.run(Thread.java:662) 2013-08-20 15:45:16,999 INFO regionserver.SplitLogWorker (SplitLogWorker.java:run(212)) - SplitLogWorker localhost,60020,1377035111898 exiting {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9195) TestFSHDFSUtils is too aggressive
Himanshu Vashishtha created HBASE-9195: -- Summary: TestFSHDFSUtils is too aggressive Key: HBASE-9195 URL: https://issues.apache.org/jira/browse/HBASE-9195 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha Priority: Minor

The recoverLease test in this class sets hbase.lease.recovery.pause to 10 ms. As a result, it calls isFileClosed (if it is available) every 10 ms. Though the test takes only 3-4 sec, it makes about 270 isFileClosed calls. This causes the test to become somewhat flaky in some internal testing. The proposed fix is just to increase the pause interval to 100 ms. This reduces the number of calls to about 30.
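A rough upper bound on the call counts mentioned in HBASE-9195 follows from dividing the polling time by the pause interval. The sketch below assumes about 3 seconds of polling; that figure and the helper `maxPolls` are assumptions for illustration, not values read from the test.

```java
public class PollCountSketch {
    /** Upper bound on polls: at most one per pause interval over the duration. */
    static long maxPolls(long durationMs, long pauseMs) {
        if (pauseMs <= 0) {
            throw new IllegalArgumentException("pauseMs must be positive");
        }
        return durationMs / pauseMs;
    }

    public static void main(String[] args) {
        long assumedDurationMs = 3000; // assumed polling time, about 3 sec
        System.out.println(maxPolls(assumedDurationMs, 10));  // prints 300
        System.out.println(maxPolls(assumedDurationMs, 100)); // prints 30
    }
}
```

The reported figure of about 270 calls at 10 ms sits just under this bound, presumably because each call itself takes some time, while a 100 ms pause gives the roughly 30 calls quoted in the message.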
[jira] [Created] (HBASE-9141) Save replication znodes while migrating to 0.96
Himanshu Vashishtha created HBASE-9141: -- Summary: Save replication znodes while migrating to 0.96 Key: HBASE-9141 URL: https://issues.apache.org/jira/browse/HBASE-9141 Project: HBase Issue Type: Improvement Components: migration, Replication Affects Versions: 0.94.10 Reporter: Himanshu Vashishtha Fix For: 0.95.2

While migrating to 0.96, we recommend deleting old znodes so users do not face issues like HBASE-7766, and letting HBase create them out of the box. Though HBase tends to store only ephemeral data in zookeeper, replication has a different approach. Almost all of its data (state, peer info, logs, etc.) is present in zookeeper. We would like to preserve this data to avoid re-adding peers and to ensure complete replication after we have migrated to 0.96. This jira adds a tool to serialize/de-serialize replication znodes to the underlying filesystem. This could be used while migrating to 0.96.0.
[jira] [Created] (HBASE-9122) NPE in MasterMonitorCallable.close()
Himanshu Vashishtha created HBASE-9122: -- Summary: NPE in MasterMonitorCallable.close() Key: HBASE-9122 URL: https://issues.apache.org/jira/browse/HBASE-9122 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha Priority: Minor Found this npe while running Integration test suite: {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.client.HBaseAdmin$MasterMonitorCallable.close(HBaseAdmin.java:2691) at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:2721) at org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:1949) at org.apache.hadoop.hbase.util.ChaosMonkey$MoveRegionsOfTable.perform(ChaosMonkey.java:349) at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:490) at org.apache.hadoop.hbase.mttr.IntegrationTestMTTR$ActionCallable.call(IntegrationTestMTTR.java:481) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9110) Meta region edits not recovered while migrating to 0.96.0
Himanshu Vashishtha created HBASE-9110: -- Summary: Meta region edits not recovered while migrating to 0.96.0 Key: HBASE-9110 URL: https://issues.apache.org/jira/browse/HBASE-9110 Project: HBase Issue Type: Bug Components: migration Affects Versions: 0.94.10, 0.95.2 Reporter: Himanshu Vashishtha

I was doing the migration testing from 0.94.11-snapshot to 0.95.0, and faced this issue.
1) Do some edits in the meta table (e.g., create a table).
2) Kill the cluster. (I used kill because we would be doing log splitting when upgrading anyway.)
3) There is some dependency on WALs. Upgrade the bits to 0.95.2-snapshot. Start the cluster.

Everything comes up. I see log splitting happening as expected. But the WAL data for the meta table is missing. I could see the recovered.edits file for meta created and placed at the right location. It is just that the new HMaster code tries to recover meta by looking for the meta prefix in the log name, and if it doesn't find one, it just opens the meta region. So the recovered.edits file, created afterwards, is not honored. Opening this jira to let folks give their opinions about how to tackle this migration issue.
[jira] [Created] (HBASE-8911) Inject MTTR specific traces to get a break up of various steps
Himanshu Vashishtha created HBASE-8911: -- Summary: Inject MTTR specific traces to get a break up of various steps Key: HBASE-8911 URL: https://issues.apache.org/jira/browse/HBASE-8911 Project: HBase Issue Type: Bug Components: MTTR Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha

There are various steps involved in a regionserver recovery process. This jira adds instrumentation at various places in order to show which steps are involved in a regionserver recovery and how much time is spent in each of them.
[jira] [Created] (HBASE-8750) MetaServerShutdownHandler stucks if .META. assignment fails in previous attempt
Himanshu Vashishtha created HBASE-8750: -- Summary: MetaServerShutdownHandler stucks if .META. assignment fails in previous attempt Key: HBASE-8750 URL: https://issues.apache.org/jira/browse/HBASE-8750 Project: HBase Issue Type: Bug Components: MTTR Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha

While running log replay on a one-node setup, I killed the meta regionserver. The MetaSSH tried to assign the meta table, but it failed as there were no other regionservers to assign to. But the meta server znode was already updated to null. When the assignment fails, the MetaSSH is retried. But from the next iteration on, it does not try to assign the meta region and instead keeps waiting for it. This keeps going even after the regionserver is brought up again.
[jira] [Created] (HBASE-8741) Mutations on Regions in recovery mode might have same sequenceIDs
Himanshu Vashishtha created HBASE-8741: -- Summary: Mutations on Regions in recovery mode might have same sequenceIDs Key: HBASE-8741 URL: https://issues.apache.org/jira/browse/HBASE-8741 Project: HBase Issue Type: Bug Components: MTTR Affects Versions: 0.95.1 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha

Currently, when opening a region, we find the maximum sequence ID from all its HFiles and then set the LogSequenceId of the log (in case the latter is at a smaller value). This works well in the recovered.edits case, as we are not writing to the region until we have replayed all of its previous edits. With distributed log replay, if we want to enable writes while a region is under recovery, we need to make sure that the logSequenceId > maximum logSequenceId of the old regionserver. Otherwise, we might have a situation where new edits have the same (or smaller) sequenceIds. If we store region-level information in the WALTrailer, then this scenario could be avoided by:
a) reading the trailer of the "last completed" file, i.e., the last wal file which has a trailer, and
b) completely reading the last wal file (this file would not have the trailer, so it needs to be read completely).

In future, if we switch to multiple wal files, we could read the trailer of every completed WAL file and read the remaining incomplete files completely.
[jira] [Created] (HBASE-8689) Cover all mutations rather than only Put while reporting for mutations not writing to WAL
Himanshu Vashishtha created HBASE-8689: -- Summary: Cover all mutations rather than only Put while reporting for mutations not writing to WAL Key: HBASE-8689 URL: https://issues.apache.org/jira/browse/HBASE-8689 Project: HBase Issue Type: Bug Components: metrics, regionserver Affects Versions: 0.95.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha

Currently, only Puts are reported, rather than all mutations (increment, append, delete), when a mutation is not written to the WAL. The bookkeeping should cover the other mutations too.
[jira] [Resolved] (HBASE-8612) Fix TestMetaScanner.testConcurrentMetaScannerAndCatalogJanitor failure
[ https://issues.apache.org/jira/browse/HBASE-8612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-8612. Resolution: Duplicate > Fix TestMetaScanner.testConcurrentMetaScannerAndCatalogJanitor failure > -- > > Key: HBASE-8612 > URL: https://issues.apache.org/jira/browse/HBASE-8612 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 0.94.8 >Reporter: Himanshu Vashishtha > > Got this test failure: > REGRESSION: > org.apache.hadoop.hbase.client.TestMetaScanner.testConcurrentMetaScannerAndCatalogJanitor > Error Message: > Split daughter region > testConcurrentMetaScannerAndCatalogJanitor,q\xFF\xFF\xFF\xFF\xFF\xFF\xFF,1369373178944.aa8d1dc3daf7fae3ec55a940f9848e42. > cannot be found in META. > Stack Trace: > org.apache.hadoop.hbase.client.RegionOfflineException: Split daughter region > testConcurrentMetaScannerAndCatalogJanitor,q\xFF\xFF\xFF\xFF\xFF\xFF\xFF,1369373178944.aa8d1dc3daf7fae3ec55a940f9848e42. > cannot be found in META. 
> at > org.apache.hadoop.hbase.client.MetaScanner$BlockingMetaScannerVisitor.processRow(MetaScanner.java:433) > at > org.apache.hadoop.hbase.client.MetaScanner$TableMetaScannerVisitor.processRow(MetaScanner.java:495) > at > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:224) > at > org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54) > at > org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133) > at > org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) > at > org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:383) > at > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:130) > at > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:105) > at > org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:83) > at > org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:323) > at > org.apache.hadoop.hbase.client.TestMetaScanner$1MetaScannerVerifier.run(TestMetaScanner.java:194) > at java.lang.Thread.run(Thread.java:662)
[jira] [Created] (HBASE-8612) Fix TestMetaScanner.testConcurrentMetaScannerAndCatalogJanitor failure
Himanshu Vashishtha created HBASE-8612: -- Summary: Fix TestMetaScanner.testConcurrentMetaScannerAndCatalogJanitor failure Key: HBASE-8612 URL: https://issues.apache.org/jira/browse/HBASE-8612 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.8, 0.95.1 Reporter: Himanshu Vashishtha Got this test failure: REGRESSION: org.apache.hadoop.hbase.client.TestMetaScanner.testConcurrentMetaScannerAndCatalogJanitor Error Message: Split daughter region testConcurrentMetaScannerAndCatalogJanitor,q\xFF\xFF\xFF\xFF\xFF\xFF\xFF,1369373178944.aa8d1dc3daf7fae3ec55a940f9848e42. cannot be found in META. Stack Trace: org.apache.hadoop.hbase.client.RegionOfflineException: Split daughter region testConcurrentMetaScannerAndCatalogJanitor,q\xFF\xFF\xFF\xFF\xFF\xFF\xFF,1369373178944.aa8d1dc3daf7fae3ec55a940f9848e42. cannot be found in META. at org.apache.hadoop.hbase.client.MetaScanner$BlockingMetaScannerVisitor.processRow(MetaScanner.java:433) at org.apache.hadoop.hbase.client.MetaScanner$TableMetaScannerVisitor.processRow(MetaScanner.java:495) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:224) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:383) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:105) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:83) at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:323) at org.apache.hadoop.hbase.client.TestMetaScanner$1MetaScannerVerifier.run(TestMetaScanner.java:194) at java.lang.Thread.run(Thread.java:662)
[jira] [Created] (HBASE-8591) Doc Improvement: Replication blog
Himanshu Vashishtha created HBASE-8591: -- Summary: Doc Improvement: Replication blog Key: HBASE-8591 URL: https://issues.apache.org/jira/browse/HBASE-8591 Project: HBase Issue Type: Task Components: documentation, Replication Affects Versions: 0.95.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Add a section for source level metrics and some truth about table batch ops at sink.
[jira] [Resolved] (HBASE-8476) locateRegionInMeta should check the cache before doing the prefetch
[ https://issues.apache.org/jira/browse/HBASE-8476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-8476. Resolution: Fixed Assignee: Himanshu Vashishtha (was: Amitanand Aiyer) Folded it in HBASE-8346 > locateRegionInMeta should check the cache before doing the prefetch > --- > > Key: HBASE-8476 > URL: https://issues.apache.org/jira/browse/HBASE-8476 > Project: HBase > Issue Type: Bug >Reporter: Amitanand Aiyer >Assignee: Himanshu Vashishtha >Priority: Minor > Fix For: 0.89-fb, 0.95.2 > > > locateRegionInMeta uses a regionLockObject to synchronize all accesses to > prefetch the RegionCache. > synchronized (regionLockObject) { > // If the parent table is META, we may want to pre-fetch some > // region info into the global region cache for this table. > if (Bytes.equals(parentTable, HConstants.META_TABLE_NAME) && > (getRegionCachePrefetch(tableName)) ) { > prefetchRegionCache(tableName, row); > } > // Check the cache again for a hit in case some other thread made > the > // same query while we were waiting on the lock. If not supposed > to > // be using the cache, delete any existing cached location so it > won't > // interfere. > if (useCache) { > location = getCachedLocation(tableName, row); > if (location != null) { > return location; > } > } else { > deleteCachedLocation(tableName, row); > } > > However, for this to be effective, we need to check the cache as soon as we > grab the lock; before doing the prefetch. Checking the cache after doing the > prefetch does not help the current thread, in case another thread has done > the prefetch.
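The ordering suggested in HBASE-8476 (re-check the cache right after taking the lock, and only then prefetch) might look like the sketch below. It is not the HConnectionManager code: the class, the string-keyed cache, and the `prefetchCalls` counter are assumptions made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a region-location cache; not an HBase class. */
final class LocateWithCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Object regionLockObject = new Object();
    final AtomicInteger prefetchCalls = new AtomicInteger();

    String locate(String row) {
        String location = cache.get(row);
        if (location != null) {
            return location;
        }
        synchronized (regionLockObject) {
            // Check the cache first: another thread may have prefetched
            // this row while we were waiting for the lock.
            location = cache.get(row);
            if (location != null) {
                return location;
            }
            prefetch(row);
            return cache.get(row);
        }
    }

    // Stand-in for the expensive meta scan that fills the cache.
    private void prefetch(String row) {
        prefetchCalls.incrementAndGet();
        cache.put(row, "server-for-" + row);
    }

    public static void main(String[] args) {
        LocateWithCacheSketch sketch = new LocateWithCacheSketch();
        sketch.locate("row-1");
        sketch.locate("row-1");
        System.out.println("prefetch calls: " + sketch.prefetchCalls.get());
    }
}
```

A thread that waited on the lock while another thread prefetched returns from the second cache check without scanning again, which is the benefit the description asks for.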
[jira] [Created] (HBASE-8414) Hbck still refers to -ROOT- table to locate .META.
Himanshu Vashishtha created HBASE-8414: -- Summary: Hbck still refers to -ROOT- table to locate .META. Key: HBASE-8414 URL: https://issues.apache.org/jira/browse/HBASE-8414 Project: HBase Issue Type: Bug Components: hbck Reporter: Himanshu Vashishtha In the current ROOT-less trunk, hbck still tries to fix meta by looking up its location in the -ROOT- table. This happens if no .META. is assigned when hbck is run.
[jira] [Created] (HBASE-8402) ScanMetrics depends on number of rpc calls to the server.
Himanshu Vashishtha created HBASE-8402: -- Summary: ScanMetrics depends on number of rpc calls to the server. Key: HBASE-8402 URL: https://issues.apache.org/jira/browse/HBASE-8402 Project: HBase Issue Type: Bug Reporter: Himanshu Vashishtha Priority: Minor I find it a bit odd, while testing scan metrics, that no metrics are published when the scan needs only one trip to the server. Why would they depend on the number of rpc calls? I was testing a small table with caching set to a large value, but the scan metrics came back null.
[jira] [Created] (HBASE-8395) Remove TestFromClientSide.testPoolBehavior
Himanshu Vashishtha created HBASE-8395: -- Summary: Remove TestFromClientSide.testPoolBehavior Key: HBASE-8395 URL: https://issues.apache.org/jira/browse/HBASE-8395 Project: HBase Issue Type: Task Components: test Affects Versions: 0.95.0 Reporter: Himanshu Vashishtha Priority: Trivial This test tests the underlying ThreadPoolExecutor's thread management, and has nothing to do with HBase functionality. I suggest we should delete it.
[jira] [Created] (HBASE-8346) Prefetching .META. rows only when useCache is set to true
Himanshu Vashishtha created HBASE-8346: -- Summary: Prefetching .META. rows only when useCache is set to true Key: HBASE-8346 URL: https://issues.apache.org/jira/browse/HBASE-8346 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.95.0 Reporter: Himanshu Vashishtha Priority: Minor While doing a .META. lookup (HCM#locateRegionInMeta), we also prefetch the info of some other regions of that table. The usual call to the meta lookup has the useCache variable set to true. Currently, it prefetches irrespective of the value of the useCache flag: {code} if (Bytes.equals(parentTable, HConstants.META_TABLE_NAME) && (getRegionCachePrefetch(tableName))) { prefetchRegionCache(tableName, row); } {code} Later on, if the useCache flag is set to false, it deletes the entry for that row from the cache with a forceDeleteCachedLocation() call. This always results in two calls to the .META. table in this case. The useCache variable is set to false when we are retrying to find a region (regionserver failover). This can be verified from the log statements of a client during a regionserver failover. In the example below, the client was connected to a1217 when a1217 got killed. The region in question moved to a1215. The client got this info from the META scan and cached it, but then deleted it from the cache because it wanted the latest info. As a result, even the latest info provided by META is deleted. {code} 13/04/15 09:49:12 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for t,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c. 
is a1217.abc.com:40020 13/04/15 09:49:12 WARN client.ServerCallable: Received exception, tries=1, numRetries=30 message=Connection refused 13/04/15 09:49:12 DEBUG client.HConnectionManager$HConnectionImplementation: Removed all cached region locations that map to a1217.abc.com,40020,1365621947381 13/04/15 09:49:13 DEBUG client.MetaScanner: Current INFO from scan results = {NAME => 't,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c.', STARTKEY => 'user7225973201630273569', ENDKEY => '', ENCODED => 40382355b8c45e1338d620c018f8ff6c,} 13/04/15 09:49:13 DEBUG client.MetaScanner: Scanning .META. starting at row=t,user7225973201630273569,00 for max=10 rows using hconnection-0x7786df0f 13/04/15 09:49:13 DEBUG client.MetaScanner: Current INFO from scan results = {NAME => 't,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c.', STARTKEY => 'user7225973201630273569', ENDKEY => '', ENCODED => 40382355b8c45e1338d620c018f8ff6c,} 13/04/15 09:49:13 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for t,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c. is *a1215.abc.com:40020* 13/04/15 09:49:13 DEBUG client.HConnectionManager$HConnectionImplementation: *Removed a1215.abc.com:40020* as a location of t,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c. for tableName=t from cache 13/04/15 09:49:13 DEBUG client.MetaScanner: Current INFO from scan results = {NAME => 't,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c.', STARTKEY => 'user7225973201630273569', ENDKEY => '', ENCODED => 40382355b8c45e1338d620c018f8ff6c,} 13/04/15 09:49:13 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for t,user7225973201630273569,1365536809331.40382355b8c45e1338d620c018f8ff6c. 
is *a1215.abc.com:40020* 13/04/15 09:49:13 WARN client.ServerCallable: Received exception, tries=2, numRetries=30 message=org.apache.hadoop.hbase.exceptions.UnknownScannerException: Name: -6313340536390503703, already closed? 13/04/15 09:49:13 DEBUG client.ClientScanner: Advancing internal scanner to startKey at 'user760712450403198900' {code}
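One way to read the fix proposed in HBASE-8346 is sketched below: prefetch only when useCache is true, and when it is false drop the stale entry before the single meta lookup rather than after it. Everything here is hypothetical. The maps stand in for the client cache and the .META. table, and `metaCalls` only counts simulated round trips.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the lookup path; not the HConnectionManager code. */
final class PrefetchGateSketch {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> meta = new HashMap<>(); // stand-in for .META.
    int metaCalls = 0;

    String locate(String row, boolean useCache) {
        if (useCache) {
            String cached = cache.get(row);
            if (cached != null) {
                return cached;
            }
            prefetch(); // only worthwhile when the cache may be used
            cached = cache.get(row);
            if (cached != null) {
                return cached;
            }
        } else {
            // Drop the stale entry before the lookup, so the fresh
            // result stored below is not deleted afterwards.
            cache.remove(row);
        }
        metaCalls++;
        String fresh = meta.get(row);
        if (fresh != null) {
            cache.put(row, fresh);
        }
        return fresh;
    }

    // Simulated prefetch: one meta scan that caches every known row.
    private void prefetch() {
        metaCalls++;
        cache.putAll(meta);
    }

    public static void main(String[] args) {
        PrefetchGateSketch sketch = new PrefetchGateSketch();
        sketch.meta.put("row", "a1217.abc.com:40020");
        sketch.locate("row", true);
        sketch.meta.put("row", "a1215.abc.com:40020"); // region moved
        System.out.println(sketch.locate("row", false) + " after " + sketch.metaCalls + " meta calls");
    }
}
```

In this model the retry with useCache set to false costs one meta call, and the fresh location stays in the cache for the next request.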
[jira] [Reopened] (HBASE-7954) Fix the retrying logic of memstore flushes to avoid extra sleep
[ https://issues.apache.org/jira/browse/HBASE-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reopened HBASE-7954: > Fix the retrying logic of memstore flushes to avoid extra sleep > --- > > Key: HBASE-7954 > URL: https://issues.apache.org/jira/browse/HBASE-7954 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5, 0.95.0 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha >Priority: Minor > > Matteo pointed out: > "We can avoid the redundant sleep in the retrying logic."
[jira] [Resolved] (HBASE-7954) Fix the retrying logic of memstore flushes to avoid extra sleep
[ https://issues.apache.org/jira/browse/HBASE-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-7954. Resolution: Invalid With HBASE-7507 rolled back, we no longer need this. > Fix the retrying logic of memstore flushes to avoid extra sleep > --- > > Key: HBASE-7954 > URL: https://issues.apache.org/jira/browse/HBASE-7954 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.94.5, 0.95.0 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha >Priority: Minor > > Matteo pointed out: > "We can avoid the redundant sleep in the retrying logic."
[jira] [Created] (HBASE-8294) Make HBaseConfiguration a singleton class
Himanshu Vashishtha created HBASE-8294: -- Summary: Make HBaseConfiguration a singleton class Key: HBASE-8294 URL: https://issues.apache.org/jira/browse/HBASE-8294 Project: HBase Issue Type: Improvement Components: master, regionserver Affects Versions: 0.95.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha HBaseConfiguration.create() creates a new Configuration object on every call. Ideally, we would like to have just one configuration object in the JVM.
[jira] [Created] (HBASE-8288) HBaseFileSystem: Refactoring and correct semantics for createPath methods
Himanshu Vashishtha created HBASE-8288: -- Summary: HBaseFileSystem: Refactoring and correct semantics for createPath methods Key: HBASE-8288 URL: https://issues.apache.org/jira/browse/HBASE-8288 Project: HBase Issue Type: Bug Components: Filesystem Integration Affects Versions: 0.94.6 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.94.7 This jira is for two issues I see in the HBaseFileSystem class: 1) Load testing on a 7 node cluster using a ycsb insert workload shows that static initialization of the conf properties results in slightly better throughput. The initialization uses the HBaseConfiguration.create() call, which is expensive (and which I tried to avoid in the first version). However, this class is used for most of the filesystem calls, and it had to invoke an additional checkAndSetXX call before making each fs call because it was not certain whether the retry properties were set. Initializing them in a static block removes that limitation. 2) Correct semantics for the CreatePathXXX methods. In case the overwrite flag is false and the file already exists, the underlying fs throws an exception. It should be re-thrown to the caller.
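The second point of HBASE-8288 can be illustrated with a small sketch. It uses java.nio.file as a stand-in for the Hadoop FileSystem API, and the class and method names are hypothetical; it is not the HBaseFileSystem code. The idea shown is that a retry loop keeps retrying transient I/O errors but re-throws "file already exists" to the caller when overwrite is false.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; java.nio.file stands in for the Hadoop FileSystem. */
final class CreatePathSketch {

    static void create(Path path, boolean overwrite, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                if (overwrite) {
                    Files.write(path, new byte[0]); // creates or truncates the file
                } else {
                    Files.createFile(path); // fails if the file already exists
                }
                return;
            } catch (FileAlreadyExistsException e) {
                // Not a transient failure: surface it to the caller.
                throw e;
            } catch (IOException e) {
                last = e; // possibly transient, try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("createpath", ".tmp");
        try {
            create(path, false, 3);
        } catch (FileAlreadyExistsException e) {
            System.out.println("re-thrown to caller: " + e.getFile());
        } finally {
            Files.delete(path);
        }
    }
}
```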
[jira] [Created] (HBASE-8211) Support for NN HA for 0.94
Himanshu Vashishtha created HBASE-8211: -- Summary: Support for NN HA for 0.94 Key: HBASE-8211 URL: https://issues.apache.org/jira/browse/HBASE-8211 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.94.6 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.94.7 HBase-8156 is for adding support for retrying non-idempotent operations. This is useful in case NN is suffering from n/w hiccups, etc. This jira is to add similar support for 0.94.x branch.
[jira] [Created] (HBASE-8156) Support for Namenode HA for non-idempotent operations
Himanshu Vashishtha created HBASE-8156: -- Summary: Support for Namenode HA for non-idempotent operations Key: HBASE-8156 URL: https://issues.apache.org/jira/browse/HBASE-8156 Project: HBase Issue Type: Improvement Components: Filesystem Integration Affects Versions: 0.95.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.98.0 In hadoop 2 HA, non-idempotent operations are not retried at the hdfs side. This is by design, as retrying a non-idempotent operation might not be a good design choice for some use cases. HBase needs to handle the retries for such operations at its end. With HBase-7806, there is already some work going on for file system abstractions. There, HRegionFileSystem sits as an abstraction between region and FS. This jira is a move in the same direction: it adds retry functionality for non-idempotent calls such as create, rename and delete in the HRegionFileSystem class.
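A client-side retry wrapper of the kind HBASE-8156 describes could be sketched as below. This is a generic illustration only: the `FsCall` interface, the `withRetries` name, and the fixed sleep are assumptions, not the HRegionFileSystem implementation. A real wrapper for create, rename, or delete would also have to decide what a failure on a later attempt means if an earlier attempt actually succeeded on the namenode, which this sketch does not address.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical retry helper; names are illustrative, not HBase APIs. */
final class FsRetrySketch {

    /** A filesystem call such as create, rename, or delete. */
    interface FsCall {
        boolean run() throws IOException;
    }

    static boolean withRetries(FsCall call, int maxAttempts, long sleepMillis) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMillis); // back off before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new InterruptedIOException("interrupted while retrying");
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        boolean ok = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated namenode hiccup");
            }
            return true;
        }, 5, 10L);
        System.out.println("succeeded=" + ok + " after " + calls[0] + " attempts");
    }
}
```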
[jira] [Created] (HBASE-8106) Test to check replication log znodes move is done correctly
Himanshu Vashishtha created HBASE-8106: -- Summary: Test to check replication log znodes move is done correctly Key: HBASE-8106 URL: https://issues.apache.org/jira/browse/HBASE-8106 Project: HBase Issue Type: Test Components: Replication Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Fix For: 0.94.7 ReplicationZookeeper#copyQueuesFromRSUsingMulti moves the znodes in a regionserver failover scenario. This jira is to add a test that verifies the move is done correctly.
[jira] [Created] (HBASE-8102) Replication NodeFailoverWorker should check other rs znodes before proceeding
Himanshu Vashishtha created HBASE-8102: -- Summary: Replication NodeFailoverWorker should check other rs znodes before proceeding Key: HBASE-8102 URL: https://issues.apache.org/jira/browse/HBASE-8102 Project: HBase Issue Type: Improvement Components: Replication Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.95.0, 0.94.7 NodeFailoverWorker takes the dead rs znode and starts processing it (moving the log znodes under it, etc). Even a regionserver restart will trigger this znode movement. This causes some other regionserver to read the log files remotely, even though the original regionserver is up. Ideally, it should check the available regionserver znodes after it comes out of its sleep in its run method, in order to decide whether it should really process the znode or not.
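The check proposed in HBASE-8102 might be sketched as follows. The issue does not say how a restarted regionserver would be recognized; matching on host and port of a "host,port,startcode" server name is an assumption made here for illustration, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical check a failover worker could run after its sleep; not HBase code. */
final class FailoverCheckSketch {

    // Assumes server names of the form "host,port,startcode".
    static String hostAndPort(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected host,port[,startcode]: " + serverName);
        }
        return parts[0] + "," + parts[1];
    }

    /** Returns false when a live regionserver znode has the dead server's host and port. */
    static boolean shouldProcess(String deadServer, Collection<String> liveServers) {
        String dead = hostAndPort(deadServer);
        for (String live : liveServers) {
            if (hostAndPort(live).equals(dead)) {
                return false; // the server restarted; leave its queues alone
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String dead = "a1217.abc.com,40020,1365621947381";
        System.out.println(shouldProcess(dead, Arrays.asList("a1215.abc.com,40020,1365621947000")));
        System.out.println(shouldProcess(dead, Arrays.asList("a1217.abc.com,40020,1365700000000")));
    }
}
```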
[jira] [Resolved] (HBASE-8070) Rollback support for Increments
[ https://issues.apache.org/jira/browse/HBASE-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-8070. Resolution: Duplicate Looked at the Increment code path. It can easily be folded into 8028. Will upload a consolidated patch there. > Rollback support for Increments > --- > > Key: HBASE-8070 > URL: https://issues.apache.org/jira/browse/HBASE-8070 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.94.5 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > Fix For: 0.95.0 > > > Add rollback support for Increments. > This is basically a subtask of HBase-8028. > See the discussion up there to see how Append handles its rollback mechanism.
[jira] [Created] (HBASE-8070) Rollback support for Increments
Himanshu Vashishtha created HBASE-8070: -- Summary: Rollback support for Increments Key: HBASE-8070 URL: https://issues.apache.org/jira/browse/HBASE-8070 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.95.0 Add rollback support for Increments. This is basically a subtask of HBase-8028. See the discussion up there to see how Append handles its rollback mechanism.
[jira] [Created] (HBASE-8028) Append, Increment doesn't handle WAL-sync exceptions correctly
Himanshu Vashishtha created HBASE-8028: -- Summary: Append, Increment doesn't handle WAL-sync exceptions correctly Key: HBASE-8028 URL: https://issues.apache.org/jira/browse/HBASE-8028 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.95.0 In case there is an exception while doing the log-sync, the memstore is not rolled back, while the mvcc is _always_ forwarded to the writeentry created at the beginning of the operation. This may lead to scanners seeing results which are not synced to the fs.
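The rollback that HBASE-8028 asks for can be sketched in miniature. This is not the HRegion code: the list standing in for the memstore, the `Wal` interface, and the method names are assumptions, and mvcc handling is left out entirely. It shows only the shape of the fix, where an edit added to the memstore is removed again if the log sync throws.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of append-then-sync; not HBase's HRegion. */
final class AppendRollbackSketch {

    /** Stand-in for the write-ahead log. */
    interface Wal {
        void sync() throws IOException;
    }

    final List<String> memstore = new ArrayList<>();

    void append(String cell, Wal wal) throws IOException {
        memstore.add(cell);
        boolean synced = false;
        try {
            wal.sync();
            synced = true;
        } finally {
            if (!synced) {
                // Roll back the edit added above so readers never see
                // data that was not synced to the filesystem.
                memstore.remove(memstore.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        AppendRollbackSketch region = new AppendRollbackSketch();
        try {
            region.append("row1/cf:q=v", () -> {
                throw new IOException("simulated sync failure");
            });
        } catch (IOException e) {
            System.out.println("memstore size after failed sync: " + region.memstore.size());
        }
    }
}
```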
[jira] [Created] (HBASE-7970) Improve file descriptor usage: Currently, there are two fds per one storefile
Himanshu Vashishtha created HBASE-7970: -- Summary: Improve file descriptor usage: Currently, there are two fds per one storefile Key: HBASE-7970 URL: https://issues.apache.org/jira/browse/HBASE-7970 Project: HBase Issue Type: Bug Reporter: Himanshu Vashishtha This is because there are two open calls in the HFile in v2: one with checksum support and another without it. See the method HFile:createReaderWithEncoding().
[jira] [Created] (HBASE-7954) Fix the retrying logic of memstore flushes to avoid extra sleep
Himanshu Vashishtha created HBASE-7954: -- Summary: Fix the retrying logic of memstore flushes to avoid extra sleep Key: HBASE-7954 URL: https://issues.apache.org/jira/browse/HBASE-7954 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.5, 0.95.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Minor Matteo pointed out: "We can avoid the redundant sleep in the retrying logic."
[jira] [Created] (HBASE-7937) Retry log rolling to support HA NN scenario
Himanshu Vashishtha created HBASE-7937: -- Summary: Retry log rolling to support HA NN scenario Key: HBASE-7937 URL: https://issues.apache.org/jira/browse/HBASE-7937 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.94.5 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha A failure in log rolling causes regionserver abort. In case of HA NN, it will be good if there is a retry mechanism to roll the logs. A corresponding jira for MemStore retries is HBASE-7507.
[jira] [Created] (HBASE-7883) Update memstore size when removing the entries in append operation
Himanshu Vashishtha created HBASE-7883: -- Summary: Update memstore size when removing the entries in append operation Key: HBASE-7883 URL: https://issues.apache.org/jira/browse/HBASE-7883 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.96.0 The memstore size is not updated when the previous entries are removed from the memstore.
[jira] [Created] (HBASE-7792) HLog: Removing unnecessary attributes: METAFAMILY, METAROW
Himanshu Vashishtha created HBASE-7792: -- Summary: HLog: Removing unnecessary attributes: METAFAMILY, METAROW Key: HBASE-7792 URL: https://issues.apache.org/jira/browse/HBASE-7792 Project: HBase Issue Type: Bug Components: wal Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Minor I don't see any judicious use of these two attributes in the code base. Why not remove them?
[jira] [Resolved] (HBASE-7765) A tool to replay HLog entries in case a log file is missed while log splitting
[ https://issues.apache.org/jira/browse/HBASE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-7765. Resolution: Fixed > A tool to replay HLog entries in case a log file is missed while log splitting > -- > > Key: HBASE-7765 > URL: https://issues.apache.org/jira/browse/HBASE-7765 > Project: HBase > Issue Type: New Feature > Components: wal >Affects Versions: 0.94.4 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > > There may be a case where a HLog can escape the Log splitting. > It will be good to have a standalone tool that reads entries from a HLog, and > replays it back to HBase.
[jira] [Reopened] (HBASE-7765) A tool to replay HLog entries in case a log file is missed while log splitting
[ https://issues.apache.org/jira/browse/HBASE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reopened HBASE-7765: Agreed Kevin. > A tool to replay HLog entries in case a log file is missed while log splitting > -- > > Key: HBASE-7765 > URL: https://issues.apache.org/jira/browse/HBASE-7765 > Project: HBase > Issue Type: New Feature > Components: wal >Affects Versions: 0.94.4 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > > There may be a case where a HLog can escape the Log splitting. > It will be good to have a standalone tool that reads entries from a HLog, and > replays it back to HBase.
[jira] [Resolved] (HBASE-7765) A tool to replay HLog entries in case a log file is missed while log splitting
[ https://issues.apache.org/jira/browse/HBASE-7765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha resolved HBASE-7765. Resolution: Duplicate > A tool to replay HLog entries in case a log file is missed while log splitting > -- > > Key: HBASE-7765 > URL: https://issues.apache.org/jira/browse/HBASE-7765 > Project: HBase > Issue Type: New Feature > Components: wal >Affects Versions: 0.94.4 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > > There may be a case where a HLog can escape the Log splitting. > It will be good to have a standalone tool that reads entries from a HLog, and > replays it back to HBase.
[jira] [Created] (HBASE-7765) A tool to replay HLog entries in case a log file is missed while log splitting
Himanshu Vashishtha created HBASE-7765: -- Summary: A tool to replay HLog entries in case a log file is missed while log splitting Key: HBASE-7765 URL: https://issues.apache.org/jira/browse/HBASE-7765 Project: HBase Issue Type: New Feature Components: wal Affects Versions: 0.94.4 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha There may be a case where a HLog can escape the Log splitting. It will be good to have a standalone tool that reads entries from a HLog, and replays it back to HBase.
[jira] [Created] (HBASE-7722) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in trunk
Himanshu Vashishtha created HBASE-7722: -- Summary: Fix TestRegionServerCoprocessorExceptionWithAbort flakiness in trunk Key: HBASE-7722 URL: https://issues.apache.org/jira/browse/HBASE-7722 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha In trunk, the test fails because the table.put() statement in the test passes even though BuggyRegionCoprocessor failed the transaction: "The put should have failed, as the coprocessor is buggy".
[jira] [Reopened] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness
[ https://issues.apache.org/jira/browse/HBASE-7607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha reopened HBASE-7607: Re-opening as per the last comment. > Fix TestRegionServerCoprocessorExceptionWithAbort flakiness > > > Key: HBASE-7607 > URL: https://issues.apache.org/jira/browse/HBASE-7607 > Project: HBase > Issue Type: Bug > Components: Client, test >Affects Versions: 0.94.4 >Reporter: Himanshu Vashishtha >Assignee: Himanshu Vashishtha > > TestRegionServerCoprocessorExceptionWithAbort fails sometimes both on trunk > and 0.94.X. The codebase is different in both. > In trunk, table.put() passes even though BuggyRegionCoprocessor failed the > transaction: > "The put should have failed, as the coprocessor is buggy" > In 0.94.x, the client retries to look at the root region, while the cluster is > down and the /hbase znode is no longer present. > "Check the value configured in 'zookeeper.znode.parent'. There could be a > mismatch with the one configured in the master."
[jira] [Created] (HBASE-7657) Make ModifyTableHandler synchronous
Himanshu Vashishtha created HBASE-7657: -- Summary: Make ModifyTableHandler synchronous Key: HBASE-7657 URL: https://issues.apache.org/jira/browse/HBASE-7657 Project: HBase Issue Type: Bug Components: Admin, Client Affects Versions: 0.96.0 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.96.0 Make ModifyTableHandler a synchronous op, along the lines of other admin operations such as modifyColumnFamily and AddColumnFamily.
[jira] [Created] (HBASE-7607) Fix TestRegionServerCoprocessorExceptionWithAbort flakiness
Himanshu Vashishtha created HBASE-7607: -- Summary: Fix TestRegionServerCoprocessorExceptionWithAbort flakiness Key: HBASE-7607 URL: https://issues.apache.org/jira/browse/HBASE-7607 Project: HBase Issue Type: Bug Components: Client, test Affects Versions: 0.94.4 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.5 TestRegionServerCoprocessorExceptionWithAbort fails intermittently on both trunk and 0.94.x. The codebase differs between the two. In trunk, table.put() succeeds even though BuggyRegionCoprocessor failed the transaction: "The put should have failed, as the coprocessor is buggy" In 0.94.x, the client retries looking up the root region while the cluster is down and the /hbase znode is no longer present: "Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master."
[jira] [Created] (HBASE-7562) ZKUtil: missing "else condition" in multi processing
Himanshu Vashishtha created HBASE-7562: -- Summary: ZKUtil: missing "else condition" in multi processing Key: HBASE-7562 URL: https://issues.apache.org/jira/browse/HBASE-7562 Project: HBase Issue Type: Bug Reporter: Himanshu Vashishtha Priority: Minor The method multiOrSequential is missing an else branch, so it processes the list of Ops both via multi and sequentially.
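The missing-else pattern reported here can be illustrated with a minimal, self-contained sketch. It is not the actual ZKUtil code: the class name, the `useMulti` flag, the string-based ops, and the helper methods are all hypothetical stand-ins, chosen only to show how a missing else branch makes every op run twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the pattern described in the issue; not HBase code.
public class MultiOrSequentialSketch {
    // Records every op that gets applied, so double processing is visible.
    static final List<String> applied = new ArrayList<>();

    // Stand-in for submitting all ops as one batch.
    static void multi(List<String> ops) {
        applied.addAll(ops);
    }

    // Stand-in for applying the ops one by one.
    static void sequential(List<String> ops) {
        for (String op : ops) {
            applied.add(op);
        }
    }

    // Buggy shape: without an else, the sequential path also runs after multi.
    static void buggy(List<String> ops, boolean useMulti) {
        if (useMulti) {
            multi(ops);
        }
        sequential(ops);
    }

    // Fixed shape: exactly one of the two paths runs.
    static void fixed(List<String> ops, boolean useMulti) {
        if (useMulti) {
            multi(ops);
        } else {
            sequential(ops);
        }
    }

    // Runs one variant on two ops and returns how many ops were applied.
    static int countApplied(boolean fixedVersion, boolean useMulti) {
        applied.clear();
        List<String> ops = Arrays.asList("create", "delete");
        if (fixedVersion) {
            fixed(ops, useMulti);
        } else {
            buggy(ops, useMulti);
        }
        return applied.size();
    }

    public static void main(String[] args) {
        System.out.println("buggy, multi enabled: " + countApplied(false, true)); // 4
        System.out.println("fixed, multi enabled: " + countApplied(true, true));  // 2
    }
}
```

With the multi path enabled, the buggy variant applies the two ops four times in total, while the fixed variant applies them twice, once each.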
[jira] [Created] (HBASE-7540) Make znode dump to print a dump of replication znodes
Himanshu Vashishtha created HBASE-7540: -- Summary: Make znode dump to print a dump of replication znodes Key: HBASE-7540 URL: https://issues.apache.org/jira/browse/HBASE-7540 Project: HBase Issue Type: Improvement Components: Replication, UI Affects Versions: 0.94.3 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.4 It would be nice to have a dump of replication-related znodes on the master UI (along with the other znode dump). It helps when using replication.
[jira] [Created] (HBASE-7502) TestScannerTimeout fails on snapshot branch
Himanshu Vashishtha created HBASE-7502: -- Summary: TestScannerTimeout fails on snapshot branch Key: HBASE-7502 URL: https://issues.apache.org/jira/browse/HBASE-7502 Project: HBase Issue Type: Bug Components: test Affects Versions: hbase-7290 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: hbase-7290 TestScannerTimeout#test3686a fails consistently on the snapshot branch. The number of watches on the rs znode has increased, so its deletion now takes longer. As a result, when test3686a starts and ensures that there are two regionservers, it counts the previously aborted regionserver as a live one. During the test, it kills one of its servers, and the znode of the previously aborted server also expires. The overall effect is that no regionservers remain, and the client hangs. {code} Error Message test timed out after 30 milliseconds Stacktrace java.lang.Exception: test timed out after 30 milliseconds at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.close(HConnectionManager.java:1769) at org.apache.hadoop.hbase.client.HTable.close(HTable.java:961) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:180) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:54) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:133) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:360) {code}
[jira] [Created] (HBASE-7440) ReplicationZookeeper#addPeer is racy
Himanshu Vashishtha created HBASE-7440: -- Summary: ReplicationZookeeper#addPeer is racy Key: HBASE-7440 URL: https://issues.apache.org/jira/browse/HBASE-7440 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.3 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.4 While adding a peer, ReplicationZookeeper creates the znodes in three separate transactions: a) the peers znode, b) the peerId-specific znode, and c) the peerState znode. There is a PeersWatcher that invokes getPeer() (after steps b) and c)). If control reaches getPeer() while a peer is being added and step c) has not yet been processed, the peer may end up not being added. This happens while running TestMasterReplication#testCyclicReplication(). {code} 2012-12-26 07:36:35,187 INFO [RegionServer:0;p0120.X,38423,1356536179470-EventThread] zookeeper.RecoverableZooKeeper(447): Node /2/replication/peers/1/peer-state already exists and this is not a retry 2012-12-26 07:36:35,188 ERROR [RegionServer:0;p0120.X,38423,1356536179470-EventThread] regionserver.ReplicationSourceManager$PeersWatcher(527): Error while adding a new peer org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /2/replication/peers/1/peer-state at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:428) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:410) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:1044) at org.apache.hadoop.hbase.replication.ReplicationPeer.startStateTracker(ReplicationPeer.java:82) at org.apache.hadoop.hbase.replication.ReplicationZookeeper.getPeer(ReplicationZookeeper.java:344) at org.apache.hadoop.hbase.replication.ReplicationZookeeper.connectToPeer(ReplicationZookeeper.java:307) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$PeersWatcher.nodeChildrenChanged(ReplicationSourceManager.java:519) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:315) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 2012-12-26 07:36:35,188 DEBUG [RegionServer:0;p0120.X,55742,1356536171947-EventThread] zookeeper.ZKUtil(1545): regionserver:55742-0x13bd7db39580004 Retrieved 36 byte(s) of data from znode /1/hbaseid; data=9ce66123-d3e8-4ae9-a249-afe03... {code}
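The NodeExists failure in the log above arises when two code paths both try to create the same peer-state znode. As a rough illustration only, the sketch below models a znode store with an in-memory map and contrasts a strict create (which fails if the node exists) with a create-if-absent variant that tolerates an existing node. This is one possible way to make such a step idempotent, not necessarily the fix adopted for this issue; the class, method names, path, and "ENABLED" value are hypothetical and no real ZooKeeper API is used.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of a znode store; not ZooKeeper or HBase code.
public class PeerStateRaceSketch {
    static final ConcurrentMap<String, String> znodes = new ConcurrentHashMap<>();
    static final String PEER_STATE = "/replication/peers/1/peer-state";

    // Strict create: fails when the node is already present,
    // mirroring the NodeExists error seen in the log.
    static void createStrict(String path, String data) {
        if (znodes.putIfAbsent(path, data) != null) {
            throw new IllegalStateException("NodeExists for " + path);
        }
    }

    // Tolerant create: returns true if this call created the node,
    // false if another caller had already created it.
    static boolean createIfAbsent(String path, String data) {
        return znodes.putIfAbsent(path, data) == null;
    }

    // Two callers using the strict create: the second one fails.
    static boolean strictSecondCreateFails() {
        znodes.clear();
        createStrict(PEER_STATE, "ENABLED");
        try {
            createStrict(PEER_STATE, "ENABLED");
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Two callers using the tolerant create: neither fails and the node exists.
    static boolean tolerantSecondCreateSucceeds() {
        znodes.clear();
        boolean first = createIfAbsent(PEER_STATE, "ENABLED");
        boolean second = createIfAbsent(PEER_STATE, "ENABLED");
        return first && !second && znodes.containsKey(PEER_STATE);
    }

    public static void main(String[] args) {
        System.out.println("strict second create fails: " + strictSecondCreateFails());
        System.out.println("tolerant second create is a no-op: " + tolerantSecondCreateSucceeds());
    }
}
```

The map's putIfAbsent is atomic, so whichever caller arrives second sees the existing node regardless of ordering. That is the property the sketch relies on.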