[jira] [Created] (HBASE-11868) Data loss in hlog when the hdfs is unavailable
Liu Shaohui created HBASE-11868: --- Summary: Data loss in hlog when the hdfs is unavailable Key: HBASE-11868 URL: https://issues.apache.org/jira/browse/HBASE-11868 Project: HBase Issue Type: Bug Affects Versions: 0.98.5 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Blocker

When using the new write thread model in HBase, we found a bug that may cause data loss when HDFS is unavailable. When writing WAL edits to the hlog in doMiniBatchMutation of HRegion, the hlog first calls appendNoSync to write the edits and then calls sync with the txid. Assume the txid of the current write is 10, syncedTillHere in the hlog is 9, and failedTxid is 0. When HDFS is unavailable, the AsyncWriter or AsyncSyncer fails to append the edits or sync, but still updates syncedTillHere to 10 and failedTxid to 10. When the hlog then calls sync with txid 10, failedTxid is never checked, because txid is no longer greater than syncedTillHere and the wait loop below is never entered. The client thinks the write succeeded, but the data was only written to the memstore, not the hlog. If the regionserver goes down before the memstore is flushed, the data is lost.
{code}
// sync all transactions up to the specified txid
private void syncer(long txid) throws IOException {
  synchronized (this.syncedTillHere) {
    while (this.syncedTillHere.get() < txid) {
      try {
        this.syncedTillHere.wait();
        if (txid <= this.failedTxid.get()) {
          assert asyncIOE != null :
              "current txid is among(under) failed txids, but asyncIOE is null!";
          throw asyncIOE;
        }
      } catch (InterruptedException e) {
        LOG.debug("interrupted while waiting for notification from AsyncNotifier");
      }
    }
  }
}
{code}
We can fix this issue by moving the comparison of txid and failedTxid outside the while block.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
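The proposed fix can be sketched as a minimal, self-contained simulation (plain Java; the AtomicLong fields and the failUpTo helper are illustrative stand-ins for the HLog internals, not HBase code). The failedTxid check is moved outside the wait loop, so a write that failed after syncedTillHere was advanced still surfaces the IOException:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

public class SyncerFix {
    private final AtomicLong syncedTillHere = new AtomicLong(0);
    private final AtomicLong failedTxid = new AtomicLong(0);
    private volatile IOException asyncIOE;

    // Simulate the AsyncWriter/AsyncSyncer failing: it still advances
    // syncedTillHere, and records the highest failed txid plus the exception.
    void failUpTo(long txid, IOException e) {
        asyncIOE = e;
        failedTxid.set(txid);
        synchronized (syncedTillHere) {
            syncedTillHere.set(txid);
            syncedTillHere.notifyAll();
        }
    }

    // Fixed syncer: the failedTxid check now sits OUTSIDE the wait loop,
    // so it runs even when the loop body is never entered.
    void syncer(long txid) throws IOException {
        synchronized (syncedTillHere) {
            while (syncedTillHere.get() < txid) {
                try {
                    syncedTillHere.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            if (txid <= failedTxid.get()) {
                assert asyncIOE != null : "txid failed but asyncIOE is null!";
                throw asyncIOE;
            }
        }
    }

    public static void main(String[] args) {
        SyncerFix hlog = new SyncerFix();
        hlog.failUpTo(10, new IOException("hdfs unavailable"));
        boolean threw = false;
        try {
            // Loop is skipped (10 <= 10), but the check still fires.
            hlog.syncer(10);
        } catch (IOException expected) {
            threw = true;
        }
        System.out.println(threw);
    }
}
```

With the buggy ordering, syncer(10) would return normally here and the client would believe the write was durable.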
[jira] [Created] (HBASE-11869) Support snapshot owner
Liu Shaohui created HBASE-11869: --- Summary: Support snapshot owner Key: HBASE-11869 URL: https://issues.apache.org/jira/browse/HBASE-11869 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

In the current codebase, table snapshot operations can only be done by a global admin, not by the table admin. In a multi-tenant hbase cluster, each table has a different snapshot policy, e.g. take a snapshot every week, or take a snapshot after new data is imported. We want to delegate the snapshot permission to each table admin. Following [~mbertozzi]'s suggestion, we implemented the snapshot owner feature:
* A user with table admin permission can create a snapshot, and the owner of that snapshot is this user.
* The owner of a snapshot can delete and restore the snapshot.
* Only a user with global admin permission can clone a snapshot, because that operation creates a new table.
[jira] [Created] (HBASE-11877) Make TableSplit more readable
Liu Shaohui created HBASE-11877: --- Summary: Make TableSplit more readable Key: HBASE-11877 URL: https://issues.apache.org/jira/browse/HBASE-11877 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor

When debugging MR jobs that read from an hbase table, it's important to figure out which region a map task is reading from, but the table split object is hard to read. E.g.:
{code}
2014-09-01 20:58:39,783 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: lg-hadoop-prc-st40.bj:,0
{code}
We should make it more readable.
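A more readable rendering might look like the sketch below (the describe helper and its field labels are purely illustrative, not the TableSplit API; the point is that table name, row range, and region location should all appear in the task log):

```java
public class SplitDescription {
    // Hypothetical formatting helper: render a table split the way a
    // human would want to read it in MR task logs.
    static String describe(String table, String startRow, String endRow,
                           String location, long length) {
        return "HBase table split(table name: " + table
                + ", start row: " + startRow
                + ", end row: " + endRow
                + ", region location: " + location
                + ", encoded region length: " + length + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("webtable", "row-1000", "row-2000",
                "lg-hadoop-prc-st40.bj", 0));
    }
}
```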
[jira] [Created] (HBASE-11897) Add append and remove table-cfs cmds for replication
Liu Shaohui created HBASE-11897: --- Summary: Add append and remove table-cfs cmds for replication Key: HBASE-11897 URL: https://issues.apache.org/jira/browse/HBASE-11897 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor

HBASE-8751 introduced the tables/table-column-families config for a replication peer. It's very flexible for practical replication in hbase clusters. But it is easy to make mistakes when adding or removing a table/table-column family for an existing peer, especially when the table-cfs list is very long: we need to copy the peer's current table-cfs first, then add or remove a table/table-column family to/from it, and finally set the table-cfs back using the cmd set_peer_tableCFs. So we implemented two new cmds, append_peer_tableCFs and remove_peer_tableCFs, to add and remove a table/table-column family. They are useful operation tools.
[jira] [Created] (HBASE-11957) Backport HBASE-5974 to 0.94
Liu Shaohui created HBASE-11957: --- Summary: Backport HBASE-5974 to 0.94 Key: HBASE-11957 URL: https://issues.apache.org/jira/browse/HBASE-11957 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical Fix For: 0.94.24

HBASE-5974: Scanner retry behavior with RPC timeout on next() seems incorrect, which causes data to go missing in hbase scans. I think we should fix it in 0.94. [~lhofhansl]
[jira] [Created] (HBASE-11958) Add documents about snapshot owner
Liu Shaohui created HBASE-11958: --- Summary: Add documents about snapshot owner Key: HBASE-11958 URL: https://issues.apache.org/jira/browse/HBASE-11958 Project: HBase Issue Type: Sub-task Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

HBASE-11869 introduced the snapshot owner feature. We need to add documentation about it.
[jira] [Created] (HBASE-12241) The crash of a regionserver when taking over a dead server's queue breaks replication
Liu Shaohui created HBASE-12241: --- Summary: The crash of a regionserver when taking over a dead server's queue breaks replication Key: HBASE-12241 URL: https://issues.apache.org/jira/browse/HBASE-12241 Project: HBase Issue Type: Bug Components: Replication Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical

When a regionserver crashes, another regionserver will try to take over its replication hlog queue and help the dead regionserver finish the replication. See NodeFailoverWorker in ReplicationSourceManager. Currently hbase.zookeeper.useMulti is false in the default configuration, so the operation of taking over a replication queue is not atomic. The ReplicationSourceManager first locks the replication node of the dead regionserver, then copies the replication queue, and deletes the replication node of the dead regionserver at last. The lockOtherRS operation just creates a persistent zk node named "lock", which prevents other regionservers from taking over the replication queue. See:
{code}
public boolean lockOtherRS(String znode) {
  try {
    String parent = ZKUtil.joinZNode(this.rsZNode, znode);
    if (parent.equals(rsServerNameZnode)) {
      LOG.warn("Won't lock because this is us, we're dead!");
      return false;
    }
    String p = ZKUtil.joinZNode(parent, RS_LOCK_ZNODE);
    ZKUtil.createAndWatch(this.zookeeper, p, Bytes.toBytes(rsServerNameZnode));
  } catch (KeeperException e) {
    ...
    return false;
  }
  return true;
}
{code}
But if a regionserver crashes after creating this "lock" zk node and before copying the replication queue into its own queue, the "lock" zk node will be left forever and no other regionserver can take over the replication queue. We encountered this problem in our production cluster: the replication queue was still there, no regionserver took it over, and a "lock" zk node was left behind.
{quote}
hbase.32561.log:2014-09-24,14:09:28,790 INFO org.apache.hadoop.hbase.replication.ReplicationZookeeper: Won't transfer the queue, another RS took care of it because of: KeeperErrorCode = NoNode for /hbase/hhsrv-micloud/replication/rs/hh-hadoop-srv-st09.bj,12610,1410937824255/lock
hbase.32561.log:2014-09-24,14:14:45,148 INFO org.apache.hadoop.hbase.replication.ReplicationZookeeper: Won't transfer the queue, another RS took care of it because of: KeeperErrorCode = NoNode for /hbase/hhsrv-micloud/replication/rs/hh-hadoop-srv-st10.bj,12600,1410937795685/lock
{quote}
A quick solution is for the lock operation to create an ephemeral "lock" zookeeper node instead; when the lock node is deleted, other regionservers will be notified to check whether a replication queue was left behind. Suggestions are welcomed! Thanks.
[jira] [Created] (HBASE-12263) RegionServer listens on localhost in distributed cluster when DNS is unavailable
Liu Shaohui created HBASE-12263: --- Summary: RegionServer listens on localhost in distributed cluster when DNS is unavailable Key: HBASE-12263 URL: https://issues.apache.org/jira/browse/HBASE-12263 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Priority: Minor

When DNS is unavailable, newly started regionservers will listen on localhost (127.0.0.1) in a distributed cluster, with the result that the hmaster fails to assign regions to those regionservers.
{quote}
2014-10-15,04:26:42,273 WARN org.apache.hadoop.net.DNS: Unable to determine local hostname -falling back to "localhost"
java.net.UnknownHostException: xx-hadoop-srv-st01.bj: xx-hadoop-srv-st01.bj
 at java.net.InetAddress.getLocalHost(InetAddress.java:1360)
 at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:260)
 at org.apache.hadoop.net.DNS.(DNS.java:58)
 at org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:472)
{quote}
{quote}
$ netstat -nap | grep 13748
tcp0 0 127.0.0.1:12610 0.0.0.0:* LISTEN 13748/java
tcp0 0 0.0.0.0:12611 0.0.0.0:* LISTEN 13748/java
{quote}
In this situation, I think we should throw an exception and make the regionserver startup fail.
[jira] [Created] (HBASE-12336) RegionServer failed to shutdown for NodeFailoverWorker thread
Liu Shaohui created HBASE-12336: --- Summary: RegionServer failed to shutdown for NodeFailoverWorker thread Key: HBASE-12336 URL: https://issues.apache.org/jira/browse/HBASE-12336 Project: HBase Issue Type: Bug Affects Versions: 0.94.11 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

After enabling hbase.zookeeper.useMulti in our hbase cluster, we found that a regionserver failed to shut down. All other threads had exited except a NodeFailoverWorker thread.
{code}
"ReplicationExecutor-0" prio=10 tid=0x7f0d40195ad0 nid=0x73a in Object.wait() [0x7f0dc8fe6000]
 java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:485)
 at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
 - locked <0x0005a16df080> (a org.apache.zookeeper.ClientCnxn$Packet)
 at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:930)
 at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:912)
 at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1518)
 at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:804)
 at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:612)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
{code}
We are sure that the executor's shutdown method is called in ReplicationSourceManager#join. I am looking for the root cause; suggestions are welcomed. Thanks
[jira] [Created] (HBASE-12361) Show data locality of region in table page
Liu Shaohui created HBASE-12361: --- Summary: Show data locality of region in table page Key: HBASE-12361 URL: https://issues.apache.org/jira/browse/HBASE-12361 Project: HBase Issue Type: New Feature Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

Data locality is an important metric, added in HBASE-4114, for read performance. It would be useful to show it on the table page.
[jira] [Created] (HBASE-12434) Add a command to compact all the regions in a regionserver
Liu Shaohui created HBASE-12434: --- Summary: Add a command to compact all the regions in a regionserver Key: HBASE-12434 URL: https://issues.apache.org/jira/browse/HBASE-12434 Project: HBase Issue Type: Improvement Components: shell Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0

When a new regionserver is added to the hbase cluster, the data locality of the regions on the new regionserver is very low: most HDFS read requests from these regions are remote reads, not local reads, so the latency of read requests to these regions is not stable. Usually we can compact all the regions on the new regionserver during off-peak hours to improve data locality. So we added a compact_rs command to the hbase shell.
[jira] [Created] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region split in rolling update of cluster
Liu Shaohui created HBASE-12451: --- Summary: IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region split in rolling update of cluster Key: HBASE-12451 URL: https://issues.apache.org/jira/browse/HBASE-12451 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0

Currently IncreasingToUpperBoundRegionSplitPolicy is the default region split policy. In this policy, the split size is the number of regions on this server that belong to the same table, cubed, times twice the region flush size. But when unloading the regions of a regionserver with region_mover.rb, the number of that table's regions on the server decreases, so the split size decreases too, which may cause the remaining regions on the regionserver to split. Region splits also happen when loading regions onto a regionserver. An improvement may be to set a minimum split size in IncreasingToUpperBoundRegionSplitPolicy. Suggestions are welcomed. Thanks~
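The size computation, plus the proposed floor, can be sketched as follows (a simplification of the policy's formula as described above; the minSplitSize parameter is the suggested improvement, not an existing HBase config):

```java
public class SplitSizeSketch {
    // Split size = min(regionCount^3 * 2 * flushSize, maxFileSize),
    // clamped below by a proposed minimum so that unloading regions
    // cannot collapse the threshold and trigger spurious splits.
    static long splitSize(int regionCount, long flushSize,
                          long maxFileSize, long minSplitSize) {
        long cubed = (long) regionCount * regionCount * regionCount;
        long size = Math.min(cubed * 2 * flushSize, maxFileSize);
        return Math.max(size, minSplitSize);
    }

    public static void main(String[] args) {
        long flush = 128L << 20;  // 128 MB region flush size
        long max = 10L << 30;     // 10 GB max store file size
        // With 3 regions of the table on the server: 27 * 2 * 128 MB = 6.75 GB
        System.out.println(splitSize(3, flush, max, 0));
        // After unloading down to 1 region the threshold collapses to 256 MB,
        // which is what triggers the unwanted splits; a 2 GB floor prevents it.
        System.out.println(splitSize(1, flush, max, 2L << 30));
    }
}
```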
[jira] [Created] (HBASE-12462) Support deleting all columns of the specified family of a row in hbase shell
Liu Shaohui created HBASE-12462: --- Summary: Support deleting all columns of the specified family of a row in hbase shell Key: HBASE-12462 URL: https://issues.apache.org/jira/browse/HBASE-12462 Project: HBase Issue Type: New Feature Components: shell Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0

Currently, the HBase shell only supports deleting a single column of a row in a table. In some scenarios, we want to delete all the columns under a column family of a row, but there may be many columns there, and it's difficult to delete them one by one in the shell. It's easy to add this feature to the shell since the Delete class already has an API for deleting a whole family.
[jira] [Created] (HBASE-12534) Wrong region location cache in client after regions are moved
Liu Shaohui created HBASE-12534: --- Summary: Wrong region location cache in client after regions are moved Key: HBASE-12534 URL: https://issues.apache.org/jira/browse/HBASE-12534 Project: HBase Issue Type: Bug Affects Versions: 0.94.24 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical

In our 0.94 hbase cluster, we found that the client got a wrong region location cache and did not update it after a region was moved to another regionserver. The reason is a wrong client config combined with a bug in RpcRetryingCaller of the hbase client. The rpc configs are the following:
{code}
hbase.rpc.timeout=1000
hbase.client.pause=200
hbase.client.operation.timeout=1200
{code}
But the client retry number is 3:
{code}
hbase.client.retries.number=3
{code}
Assume a region was at regionserver A before and is then moved to regionserver B. The client tries to make a call to regionserver A and gets a NotServingRegionException. Because the retry number is not 1, the region location cache is not cleaned. See RpcRetryingCaller.java#141 and RegionServerCallable.java#127:
{code}
@Override
public void throwable(Throwable t, boolean retrying) {
  if (t instanceof SocketTimeoutException ||
  } else if (t instanceof NotServingRegionException && !retrying) {
    // Purge cache entries for this specific region from hbase:meta cache
    // since we don't call connect(true) when number of retries is 1.
    getConnection().deleteCachedRegionLocation(location);
  }
}
{code}
But the call does not actually retry: it throws a SocketTimeoutException because the time the call would take is larger than the operation timeout. See RpcRetryingCaller.java#152:
{code}
expectedSleep = callable.sleep(pause, tries + 1);
// If, after the planned sleep, there won't be enough time left, we stop now.
long duration = singleCallDuration(expectedSleep);
if (duration > callTimeout) {
  String msg = "callTimeout=" + callTimeout + ", callDuration=" + duration
      + ": " + callable.getExceptionMessageAdditionalDetail();
  throw (SocketTimeoutException)(new SocketTimeoutException(msg).initCause(t));
}
{code}
As a result, the wrong region location is never cleaned up. [~lhofhansl] In hbase 0.94, MIN_RPC_TIMEOUT in singleCallDuration is 2000 by default, which triggers this bug.
{code}
private long singleCallDuration(final long expectedSleep) {
  return (EnvironmentEdgeManager.currentTimeMillis() - this.globalStartTime)
      + MIN_RPC_TIMEOUT + expectedSleep;
}
{code}
But there is a risk in the master code too.
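The failing interplay can be sketched with the 0.94 numbers from the description (a simplification: only MIN_RPC_TIMEOUT and the config values quoted above are used, not the full RpcRetryingCaller logic):

```java
public class RetryBudgetSketch {
    static final long MIN_RPC_TIMEOUT = 2000; // 0.94 default noted above

    // Mirrors singleCallDuration(): elapsed time so far, plus the minimum
    // rpc timeout, plus the sleep planned before the next attempt.
    static long singleCallDuration(long elapsed, long expectedSleep) {
        return elapsed + MIN_RPC_TIMEOUT + expectedSleep;
    }

    public static void main(String[] args) {
        long callTimeout = 1200;  // hbase.client.operation.timeout
        long pause = 200;         // hbase.client.pause
        long elapsed = 100;       // first attempt failed quickly with NSRE
        long duration = singleCallDuration(elapsed, pause);
        // 100 + 2000 + 200 = 2300 > 1200: the caller gives up before ever
        // retrying, so the !retrying branch that would purge the stale
        // location cache is never taken.
        System.out.println(duration > callTimeout);
    }
}
```

With these configs, MIN_RPC_TIMEOUT alone already exceeds the operation timeout, so no retry can ever be scheduled and the stale cache entry survives indefinitely.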
[jira] [Created] (HBASE-12542) Delete a family of table online will crash regionserver
Liu Shaohui created HBASE-12542: --- Summary: Delete a family of table online will crash regionserver Key: HBASE-12542 URL: https://issues.apache.org/jira/browse/HBASE-12542 Project: HBase Issue Type: Bug Components: regionserver Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical Fix For: 2.0.0, 0.94.25

Using the alter command to delete a family of a table online will crash the regionservers that serve the regions of the table.
{code}
alter 't', NAME => 'f', METHOD => 'delete'
{code}
The reason is that TableDeleteFamilyHandler in HMaster deletes the family dir first and then reopens all the regions of the table. When a regionserver reopens a region, it crashes on the exception thrown while flushing the memstore of the deleted family to an hfile during the region close, because the parent dir of that hfile has already been deleted by TableDeleteFamilyHandler. See TableDeleteFamilyHandler.java#57. A simple solution is to change the order of operations in TableDeleteFamilyHandler:
- update the table descriptor first,
- reopen all the regions,
- delete the family dir last.
Suggestions are welcomed.
[jira] [Created] (HBASE-12635) Delete acl notify znode of table after the table is deleted
Liu Shaohui created HBASE-12635: --- Summary: Delete acl notify znode of table after the table is deleted Key: HBASE-12635 URL: https://issues.apache.org/jira/browse/HBASE-12635 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

In our multi-tenant hbase cluster, we found over 1M znodes under the acl node. The reason is that users frequently create and delete tables with different names, and the acl notify znodes are left behind after the tables are deleted. A simple solution is to delete the acl notify znode of a table in AccessController when the table is deleted.
[jira] [Created] (HBASE-12636) Avoid too many write operations on zookeeper in replication
Liu Shaohui created HBASE-12636: --- Summary: Avoid too many write operations on zookeeper in replication Key: HBASE-12636 URL: https://issues.apache.org/jira/browse/HBASE-12636 Project: HBase Issue Type: Improvement Affects Versions: 0.94.11 Reporter: Liu Shaohui Assignee: Liu Shaohui Fix For: 1.0.0

In our production cluster, we found over 1k write operations per second on zookeeper coming from hbase replication. The reason is that the replication source writes the log position to zookeeper for every edit shipment. If the WAL currently being replicated is the one the regionserver is writing to, each shipment is very small but the frequency is very high, which causes many write operations on zookeeper. A simple solution is to write the log position to zookeeper only when the position diff or the number of shipped edits exceeds a threshold, instead of on every shipment. Suggestions are welcomed, thx~
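The threshold idea can be sketched as a small decision helper (illustrative only; the class, its field names, and the 64 KB / 100-edit thresholds are assumptions, not the eventual patch):

```java
public class PositionUpdateThrottle {
    private final long positionThreshold;  // bytes of WAL progress per zk write
    private final int editCountThreshold;  // shipped edits per zk write
    private long lastWrittenPosition = 0;
    private int editsSinceLastWrite = 0;

    PositionUpdateThrottle(long positionThreshold, int editCountThreshold) {
        this.positionThreshold = positionThreshold;
        this.editCountThreshold = editCountThreshold;
    }

    // Called after each shipment; returns true only when the position
    // should actually be persisted to zookeeper.
    boolean onShipment(long newPosition, int shippedEdits) {
        editsSinceLastWrite += shippedEdits;
        boolean flush = newPosition - lastWrittenPosition >= positionThreshold
                || editsSinceLastWrite >= editCountThreshold;
        if (flush) {
            lastWrittenPosition = newPosition;
            editsSinceLastWrite = 0;
        }
        return flush;
    }

    public static void main(String[] args) {
        PositionUpdateThrottle t = new PositionUpdateThrottle(64 * 1024, 100);
        System.out.println(t.onShipment(1024, 10));   // tiny diff: skip zk write
        System.out.println(t.onShipment(70000, 10));  // crossed 64 KB: write
    }
}
```

The tradeoff is that after a regionserver failure, replication may re-ship the edits since the last persisted position, which is acceptable because replication already tolerates duplicates.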
[jira] [Created] (HBASE-12641) Grant all permissions of hbase zookeeper node to hbase superuser in a secure cluster
Liu Shaohui created HBASE-12641: --- Summary: Grant all permissions of hbase zookeeper node to hbase superuser in a secure cluster Key: HBASE-12641 URL: https://issues.apache.org/jira/browse/HBASE-12641 Project: HBase Issue Type: Improvement Components: Zookeeper Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

Currently in a secure cluster, only the master/regionserver kerberos user can manage the znodes of hbase. But the master/regionserver kerberos user is for rpc connections, and we usually use another super user to manage the cluster. In some special scenarios, we need to manage the znode data with that super user, e.g.:
a, To get the data of a znode for debugging.
b, HBASE-8253: We need to delete the znode for a corrupted hlog to avoid it blocking replication.
So we grant all permissions on the hbase zookeeper nodes to the hbase superuser when creating these znodes. Suggestions are welcomed. [~apurtell]
[jira] [Created] (HBASE-12739) Avoid too large identifier of ZooKeeperWatcher
Liu Shaohui created HBASE-12739: --- Summary: Avoid too large identifier of ZooKeeperWatcher Key: HBASE-12739 URL: https://issues.apache.org/jira/browse/HBASE-12739 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

For each SyncConnected event, the ZooKeeperWatcher appends the session id to its identifier. During a zk failover, the zookeeper client can connect to the zk server, but the zk server cannot serve the requests, so the client retries continually, which produces many SyncConnected events and a very large ZooKeeperWatcher identifier in the hbase log.
{code}
2014-12-22,12:38:56,296 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:16500-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-0x349cbb4e4a7f0ba-...
{code}
A simple patch can fix this problem.
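One simple shape the fix could take is to rebuild the identifier from a fixed prefix on each reconnect instead of appending to the previous value (a sketch; the class and method names are illustrative, not the actual patch):

```java
public class WatcherIdentifier {
    private final String prefix;  // e.g. "master:16500"
    private volatile String identifier;

    WatcherIdentifier(String prefix) {
        this.prefix = prefix;
        this.identifier = prefix;
    }

    // On every SyncConnected event, derive the identifier from the fixed
    // prefix plus the CURRENT session id, instead of appending to the old
    // identifier, so a flapping session cannot make it grow without bound.
    void onSyncConnected(long sessionId) {
        identifier = prefix + "-0x" + Long.toHexString(sessionId);
    }

    String getIdentifier() {
        return identifier;
    }

    public static void main(String[] args) {
        WatcherIdentifier w = new WatcherIdentifier("master:16500");
        // Simulate a flapping zk session delivering 1000 SyncConnected events.
        for (int i = 0; i < 1000; i++) {
            w.onSyncConnected(0x349cbb4e4a7f0baL);
        }
        System.out.println(w.getIdentifier()); // stays one session id long
    }
}
```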
[jira] [Created] (HBASE-12801) Failed to truncate a table while maintaining binary region boundaries
Liu Shaohui created HBASE-12801: --- Summary: Failed to truncate a table while maintaining binary region boundaries Key: HBASE-12801 URL: https://issues.apache.org/jira/browse/HBASE-12801 Project: HBase Issue Type: Bug Components: shell Affects Versions: 0.94.11 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor

Binary region boundaries become corrupted when they are converted from byte arrays to normal strings and back to byte arrays in truncate_preserve of admin.rb, which makes the table truncation fail. See the truncate_preserve method in admin.rb:
{code}
splits = h_table.getRegionLocations().keys().map{|i| Bytes.toString(i.getStartKey)}.delete_if{|k| k == ""}.to_java :String
splits = org.apache.hadoop.hbase.util.Bytes.toByteArrays(splits)
{code}
E.g.:
{code}
\xFA\x00\x00\x00\x00\x00\x00\x00 -> \xEF\xBF\xBD\x00\x00\x00\x00\x00\x00\x00
\xFC\x00\x00\x00\x00\x00\x00\x00 -> \xEF\xBF\xBD\x00\x00\x00\x00\x00\x00\x00
{code}
A simple patch is to use binary strings instead of normal strings.
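The corruption itself is easy to reproduce in plain Java (a standalone demonstration of the round trip, not HBase code): any byte that is not valid UTF-8 decodes to U+FFFD and re-encodes as EF BF BD, exactly as in the example above, so distinct boundaries like \xFA... and \xFC... collapse into the same key:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BoundaryRoundTrip {
    // Decode a binary region start key to a String via UTF-8 (as
    // Bytes.toString does) and encode it back, the way the old
    // truncate_preserve effectively did.
    static byte[] roundTrip(byte[] key) {
        String s = new String(key, StandardCharsets.UTF_8);
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = {(byte) 0xFA, 0, 0, 0, 0, 0, 0, 0};
        byte[] damaged = roundTrip(key);
        // 0xFA is not a valid UTF-8 sequence, so the decoder replaces it
        // with U+FFFD, which re-encodes as the three bytes EF BF BD.
        System.out.println(Arrays.toString(Arrays.copyOf(damaged, 3)));
        // \xFC suffers the same replacement, so two different boundaries
        // now map to identical split keys and the truncation fails.
    }
}
```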
[jira] [Created] (HBASE-12865) Wals may be deleted before they are replicated to peers
Liu Shaohui created HBASE-12865: --- Summary: Wals may be deleted before they are replicated to peers Key: HBASE-12865 URL: https://issues.apache.org/jira/browse/HBASE-12865 Project: HBase Issue Type: Bug Reporter: Liu Shaohui

By design, the ReplicationLogCleaner guarantees that WALs in a replication queue can't be deleted by the HMaster. The ReplicationLogCleaner gets the WAL set from zookeeper by scanning the replication zk node, but it may get an incomplete WAL set during replication failover because the scan operation is not atomic. For example: there are three region servers, rs1, rs2, rs3, and a peer with id 10. The layout of the replication zookeeper nodes is:
{code}
/hbase/replication/rs/rs1/10/wals
                     /rs2/10/wals
                     /rs3/10/wals
{code}
- t1: the ReplicationLogCleaner finishes scanning the replication queue of rs1 and starts to scan the queue of rs2.
- t2: region server rs3 goes down, and rs1 takes over rs3's replication queue. The new layout is:
{code}
/hbase/replication/rs/rs1/10/wals
                     /rs1/10-rs3/wals
                     /rs2/10/wals
                     /rs3
{code}
- t3: the ReplicationLogCleaner finishes scanning the queue of rs2 and starts to scan the node of rs3, but the queue has already been moved to /hbase/replication/rs/rs1/10-rs3/wals.
So the ReplicationLogCleaner will miss the WALs of rs3 in peer 10, and the hmaster may delete these WALs before they are replicated to the peer clusters. We encountered this problem in our cluster, and I think it's a serious bug for replication. Suggestions are welcomed to fix this bug. thx~
[jira] [Created] (HBASE-12884) Asynchronous Event Notification in HBase
Liu Shaohui created HBASE-12884: --- Summary: Asynchronous Event Notification in HBase Key: HBASE-12884 URL: https://issues.apache.org/jira/browse/HBASE-12884 Project: HBase Issue Type: New Feature Reporter: Liu Shaohui

*Background*
In many scenarios, we need an asynchronous event notification mechanism on HBase to know which data has changed, so that users can trigger pre-defined reactions to these events. For example:
* Incremental statistics of data in HBase
* Auditing changes of important data
* Cleaning invalid data in other cache systems

*Possible features*
* The mechanism is scalable.
* The notification is asynchronous. We don't want to affect the write performance of HBase.
* The notification is reliable. Events can't be lost, but we can tolerate duplicated events.

*Possible solution*
* Event notification based on replication: transform the WAL edits to events and replicate them to a special peer that users implement.

This is just a brief thought about this feature. Discussions and suggestions are welcomed! Thanks.
[jira] [Created] (HBASE-12916) No access control for replicating WAL entries
Liu Shaohui created HBASE-12916: --- Summary: No access control for replicating WAL entries Key: HBASE-12916 URL: https://issues.apache.org/jira/browse/HBASE-12916 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.94.26, 2.0.0, 0.98.12 Reporter: Liu Shaohui Assignee: Liu Shaohui

Currently, there is no access control for replicating WAL entries into a secure HBase cluster. Any authenticated user can write any data they want to any table of a secure cluster by using the replication api. A simple solution is to add a permission check before replicating WAL entries, so that only a user with the global write permission can replicate WAL entries to the cluster. Another option is to add a "Replication" action in hbase, so that only a user with the "Replication" permission can replicate WAL entries to the cluster. [~apurtell] What's your suggestion? Thanks
[jira] [Created] (HBASE-12921) Port HBASE-5356 'region_mover.rb can hang if table region it belongs to is deleted' to 0.94
Liu Shaohui created HBASE-12921: --- Summary: Port HBASE-5356 'region_mover.rb can hang if table region it belongs to is deleted' to 0.94 Key: HBASE-12921 URL: https://issues.apache.org/jira/browse/HBASE-12921 Project: HBase Issue Type: Bug Affects Versions: 0.94.11 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 0.94.28

This is a backport of HBASE-5356 'region_mover.rb can hang if table region it belongs to is deleted' to 0.94. [~lhofhansl]
[jira] [Created] (HBASE-12943) Set sun.net.inetaddr.ttl in HBase
Liu Shaohui created HBASE-12943: --- Summary: Set sun.net.inetaddr.ttl in HBase Key: HBASE-12943 URL: https://issues.apache.org/jira/browse/HBASE-12943 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui

The default value of the config sun.net.inetaddr.ttl is -1, so Java processes will cache the mapping of hostname to ip address forever. See: http://docs.oracle.com/javase/7/docs/technotes/guides/net/properties.html
But things go wrong when a regionserver with the same hostname and a different ip address rejoins the hbase cluster: the HMaster gets the wrong ip address for the regionserver from this cache, and every region assignment to this regionserver is blocked for a time because the HMaster can't communicate with the regionserver. A tradeoff is to set sun.net.inetaddr.ttl to 10m or 1h so that the stale cache entry expires. Suggestions are welcomed. Thanks~
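Besides passing -Dsun.net.inetaddr.ttl=600 on the JVM command line, the same knob has a documented spelling as a JDK security property; a minimal sketch (the 600-second value is the tradeoff suggested above, not an HBase default, and the property must be set before the first lookup is cached to take effect):

```java
import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // networkaddress.cache.ttl is the security-property form of the
        // sun.net.inetaddr.ttl system property: seconds to cache a
        // successful hostname-to-address lookup (-1 means forever).
        Security.setProperty("networkaddress.cache.ttl", "600");
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```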
[jira] [Created] (HBASE-13199) Some small improvements on canary tool
Liu Shaohui created HBASE-13199: --- Summary: Some small improvements on canary tool Key: HBASE-13199 URL: https://issues.apache.org/jira/browse/HBASE-13199 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui

Improvements:
- Make the sniffing of regions and regionservers parallel using a thread pool, to support large clusters with 1+ regions and 500+ regionservers.
- Set cacheBlocks to false in gets and scans to avoid polluting the block cache.
- Add a FirstKeyOnlyFilter to gets and scans to avoid reading and transferring too much data from HBase; there may be many columns under a column family in a flat-wide table.
- Randomly select the region when sniffing a regionserver.
[~stack] Suggestions are welcomed. Thanks~
Another question: why check each column family with a separate request when sniffing a region? Can we just check a single column family of a region?
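The parallel-sniff idea can be sketched with a plain thread pool (a simulation: the task body is a placeholder where the real canary would issue a Get with setCacheBlocks(false) and a FirstKeyOnlyFilter; class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSniff {
    // Sniff every region concurrently and collect one latency per region,
    // instead of probing thousands of regions one after another.
    static List<Long> sniffAll(List<String> regions, int poolSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (final String region : regions) {
                futures.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long start = System.nanoTime();
                        // Real code would issue a lightweight read against
                        // `region` here (cacheBlocks=false, FirstKeyOnlyFilter).
                        return System.nanoTime() - start;
                    }
                }));
            }
            List<Long> latencies = new ArrayList<>();
            for (Future<Long> f : futures) {
                latencies.add(f.get()); // propagate any sniff failure
            }
            return latencies;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> regions = java.util.Arrays.asList("r1", "r2", "r3");
        System.out.println(sniffAll(regions, 2).size());
    }
}
```

The pool size caps the concurrent load the canary puts on the cluster while still letting the total sniff time scale with the slowest probe rather than the sum of all probes.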
[jira] [Created] (HBASE-13216) Add version info in RPC connection header
Liu Shaohui created HBASE-13216: --- Summary: Add version info in RPC connection header Key: HBASE-13216 URL: https://issues.apache.org/jira/browse/HBASE-13216 Project: HBase Issue Type: Improvement Components: Client, rpc Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0

When operating a cluster, we usually want to know which clients are using an HBase client version with critical bugs, or a version too old to be supported in the future. By adding version info to the RPC connection header, we can get this information from the audit log and prompt those users to upgrade before a deadline. Discussions and suggestions are welcomed. Thanks.
[jira] [Created] (HBASE-13280) TestSecureRPC failed
Liu Shaohui created HBASE-13280: --- Summary: TestSecureRPC failed Key: HBASE-13280 URL: https://issues.apache.org/jira/browse/HBASE-13280 Project: HBase Issue Type: Test Reporter: Liu Shaohui Priority: Minor

{code}
Running org.apache.hadoop.hbase.security.TestSecureRPC
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 33.795 sec <<< FAILURE! - in org.apache.hadoop.hbase.security.TestSecureRPC
testRpc(org.apache.hadoop.hbase.security.TestSecureRPC) Time elapsed: 14.963 sec <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at org.apache.hadoop.hbase.security.TestSecureRPC.testRpcCallWithEnabledKerberosSaslAuth(TestSecureRPC.java:160)
 at org.apache.hadoop.hbase.security.TestSecureRPC.testRpc(TestSecureRPC.java:102)

testAsyncRpc(org.apache.hadoop.hbase.security.TestSecureRPC) Time elapsed: 5.15 sec <<< ERROR!
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at org.apache.hadoop.hbase.security.TestSecureRPC.testRpcCallWithEnabledKerberosSaslAuth(TestSecureRPC.java:160)
 at org.apache.hadoop.hbase.security.TestSecureRPC.testAsyncRpc(TestSecureRPC.java:107)
{code}
From the log we saw:
{code}
2015-03-19 11:27:02,271 WARN [Thread-5] ipc.RpcClientImpl$Connection$1(662): Couldn't setup connection for hbase/liushaohui-optiplex-...@example.com to hbase/liushaohui-optiplex-...@example.com
Exception in thread "Thread-5" java.lang.RuntimeException: com.google.protobuf.ServiceException: java.io.IOException: Couldn't setup connection for hbase/liushaohui-optiplex-...@example.com to hbase/liushaohui-optiplex-...@example.com
 at org.apache.hadoop.hbase.ipc.TestDelayedRpc$TestThread.run(TestDelayedRpc.java:275)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Couldn't setup connection for hbase/liushaohui-optiplex-...@example.com to hbase/liushaohui-optiplex-...@example.com
 at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
 at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
 at org.apache.hadoop.hbase.ipc.protobuf.generated.TestDelayedRpcProtos$TestDelayedService$BlockingStub.test(TestDelayedRpcProtos.java:1115)
 at org.apache.hadoop.hbase.ipc.TestDelayedRpc$TestThread.run(TestDelayedRpc.java:272)
Caused by: java.io.IOException: Couldn't setup connection for hbase/liushaohui-optiplex-...@example.com to hbase/liushaohui-optiplex-...@example.com
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$1.run(RpcClientImpl.java:663)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.handleSaslConnectionFailure(RpcClientImpl.java:635)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:743)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:885)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:854)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1170)
 at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
 ... 3 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - Server not found in Kerberos database)]
 at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
 at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:609)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:735)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:732)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstr
[jira] [Created] (HBASE-13319) Support 64-bit total row number in PerformanceEvaluation
Liu Shaohui created HBASE-13319: --- Summary: Support 64-bit total row number in PerformanceEvaluation Key: HBASE-13319 URL: https://issues.apache.org/jira/browse/HBASE-13319 Project: HBase Issue Type: Improvement Components: Performance Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Currently the total row number in PerformanceEvaluation is a 32-bit value, which is not enough when testing a large hbase cluster with PerformanceEvaluation in mapreduce mode. Suggestions are welcome. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
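A minimal sketch of the 32-bit problem: computing the total row count with int arithmetic silently wraps around, while widening to long keeps the true value. The numbers and method names below are illustrative, not taken from PerformanceEvaluation itself.

```java
// Demonstrates why a 32-bit total row counter is not enough for large runs.
public class RowCountOverflow {

    // 32-bit computation, as with an int total: silently wraps past 2^31 - 1.
    public static int totalRowsInt(int rowsPerClient, int clients) {
        return rowsPerClient * clients;
    }

    // Widening one operand to long before multiplying preserves the real total.
    public static long totalRowsLong(int rowsPerClient, int clients) {
        return (long) rowsPerClient * clients;
    }

    public static void main(String[] args) {
        int perClient = 10_000_000;
        int clients = 1_000; // 10 billion rows intended
        System.out.println(totalRowsInt(perClient, clients));  // wrapped value
        System.out.println(totalRowsLong(perClient, clients)); // 10000000000
    }
}
```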
[jira] [Created] (HBASE-13348) Separate the thread number configs for meta server and server operations
Liu Shaohui created HBASE-13348: --- Summary: Separate the thread number configs for meta server and server operations Key: HBASE-13348 URL: https://issues.apache.org/jira/browse/HBASE-13348 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Currently, the config keys for the thread numbers of meta server and server operations in HMaster are the same. See: HMaster.java #993 {code}
this.service.startExecutorService(ExecutorType.MASTER_SERVER_OPERATIONS,
    conf.getInt("hbase.master.executor.serverops.threads", 5));
this.service.startExecutorService(ExecutorType.MASTER_META_SERVER_OPERATIONS,
    conf.getInt("hbase.master.executor.serverops.threads", 5));
{code} In a large cluster, we usually enlarge the thread number for server operations separately, so that the master handles regionserver shutdown events quickly in some extreme cases. So I think we need to separate the thread number configs for the two operations. Suggestions are welcome. Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
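A standalone sketch of the proposed separation, using `java.util.Properties` in place of HBase's Configuration. The key name `hbase.master.executor.meta.serverops.threads` is a proposed name, not necessarily what the final patch used; falling back to the generic server-ops value keeps old configs working.

```java
import java.util.Properties;

// Sketch: read two separate thread-count keys instead of one shared key.
public class ExecutorThreadConfig {
    static final String SERVER_OPS_KEY = "hbase.master.executor.serverops.threads";
    // Proposed new key for meta server operations (hypothetical name).
    static final String META_SERVER_OPS_KEY = "hbase.master.executor.meta.serverops.threads";

    public static int serverOpsThreads(Properties conf) {
        return Integer.parseInt(conf.getProperty(SERVER_OPS_KEY, "5"));
    }

    // Falls back to the generic server-ops value so existing configs keep working.
    public static int metaServerOpsThreads(Properties conf) {
        return Integer.parseInt(conf.getProperty(META_SERVER_OPS_KEY,
                String.valueOf(serverOpsThreads(conf))));
    }

    // Convenience used below: parse "key=value" pairs into a Properties object.
    static Properties confOf(String... pairs) {
        Properties p = new Properties();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            p.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return p;
    }

    public static void main(String[] args) {
        Properties conf = confOf(SERVER_OPS_KEY + "=30");
        System.out.println(serverOpsThreads(conf));     // 30
        System.out.println(metaServerOpsThreads(conf)); // 30 (fallback)
        conf.setProperty(META_SERVER_OPS_KEY, "10");
        System.out.println(metaServerOpsThreads(conf)); // 10
    }
}
```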
[jira] [Created] (HBASE-13366) Throw DoNotRetryIOException instead of read only IOException
Liu Shaohui created HBASE-13366: --- Summary: Throw DoNotRetryIOException instead of read only IOException Key: HBASE-13366 URL: https://issues.apache.org/jira/browse/HBASE-13366 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Currently, a read-only region just throws an IOException to clients that send write requests to it, which causes the clients to retry the configured number of times or until the operation times out. Changing this exception to DoNotRetryIOException will make the client fail fast. Suggestions are welcome. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
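A self-contained sketch of the fail-fast behavior. `DoNotRetryIOException` is stubbed here so the example runs on its own; in HBase it is `org.apache.hadoop.hbase.DoNotRetryIOException`, and the check would live in the region's write path rather than this illustrative class.

```java
import java.io.IOException;

// Sketch: a read-only region throwing a non-retriable exception on writes.
public class ReadOnlyRegionSketch {
    // Stub standing in for org.apache.hadoop.hbase.DoNotRetryIOException.
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    private final boolean readOnly;
    ReadOnlyRegionSketch(boolean readOnly) { this.readOnly = readOnly; }

    void checkWriteAllowed() throws IOException {
        if (readOnly) {
            // A plain IOException makes clients retry until timeout;
            // a DoNotRetryIOException tells them to give up immediately.
            throw new DoNotRetryIOException("region is read only");
        }
    }

    // True when a write to a read-only region fails with the fail-fast type.
    public static boolean failsFast() {
        try {
            new ReadOnlyRegionSketch(true).checkWriteAllowed();
            return false;
        } catch (IOException e) {
            return e instanceof DoNotRetryIOException;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsFast()); // true
    }
}
```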
[jira] [Created] (HBASE-13367) Add a replication label to mutations from replication
Liu Shaohui created HBASE-13367: --- Summary: Add a replication label to mutations from replication Key: HBASE-13367 URL: https://issues.apache.org/jira/browse/HBASE-13367 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui In some scenarios, regions need to distinguish mutations issued by actual users from mutations shipped by replication from a peer cluster:
- Lower the priority of mutations from replication, to improve the latency of requests from actual users.
- Put a table into a replicated state to keep data consistent. In this state, the table rejects mutations from users but accepts mutations from replication and read requests from users.
So we need to add a replication label to mutations from replication. Suggestions and discussion are welcome. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13396) Cleanup unclosed writers in later writer rolling
Liu Shaohui created HBASE-13396: --- Summary: Cleanup unclosed writers in later writer rolling Key: HBASE-13396 URL: https://issues.apache.org/jira/browse/HBASE-13396 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Currently, the default value of hbase.regionserver.logroll.errors.tolerated is 2, which means a regionserver can tolerate at most two consecutive failures to close writers. Temporary network or namenode problems may cause such failures. After them, the HDFS client in the RS may keep renewing the lease on the writer's hlog, and the namenode will not recover the lease of that hlog. So the last block of the hlog stays in RBW (replica being written) state until the regionserver goes down. Blocks in this state block datanode decommission and other operations in HDFS. So I think we need a mechanism to clean up those unclosed writers afterwards. A simple solution is to record the unclosed writers and keep attempting to close them until it succeeds. Discussion and suggestions are welcome. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
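The "record and retry on later rolls" idea can be sketched as below. The class and method names (`UnclosedWriterCleaner`, `remember`, `retryCloses`) are illustrative, not from the HBase patch; `flaky` is a demo helper simulating a writer whose close fails a few times before succeeding.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: writers that fail to close during a log roll are remembered,
// and each later roll retries closing them until it succeeds.
public class UnclosedWriterCleaner {
    private final List<Closeable> unclosed = new ArrayList<>();

    // Called when a log roll fails to close the old writer.
    public void remember(Closeable writer) { unclosed.add(writer); }

    // Called on every later roll: retry each pending close,
    // keeping the ones that still fail for the next attempt.
    public void retryCloses() {
        for (Iterator<Closeable> it = unclosed.iterator(); it.hasNext(); ) {
            try {
                it.next().close();
                it.remove();
            } catch (IOException e) {
                // Still failing (e.g. transient network issue); keep for next roll.
            }
        }
    }

    public int pending() { return unclosed.size(); }

    // Demo helper: a writer whose close() fails the first `failures` calls.
    static Closeable flaky(int failures) {
        int[] left = { failures };
        return () -> { if (left[0]-- > 0) throw new IOException("transient close failure"); };
    }

    public static void main(String[] args) {
        UnclosedWriterCleaner cleaner = new UnclosedWriterCleaner();
        cleaner.remember(flaky(1));
        cleaner.retryCloses();                 // first retry still fails
        System.out.println(cleaner.pending()); // 1
        cleaner.retryCloses();                 // second retry succeeds
        System.out.println(cleaner.pending()); // 0
    }
}
```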
[jira] [Created] (HBASE-13988) Add exception handler for lease thread
Liu Shaohui created HBASE-13988: --- Summary: Add exception handler for lease thread Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor In a prod cluster, a region server exited because some important threads were no longer alive. After ruling out other threads from the log, we suspected the lease thread was the root cause. So we need to add an exception handler to the lease thread, so that we can see why it exits in the future. {quote}
2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop
2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ...
2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting
2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting
2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting
{quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
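The standard way to capture why a background thread died is `Thread.setUncaughtExceptionHandler`, sketched below. The thread body and class name are stand-ins, not the actual `Leases` implementation; only the handler wiring is the point.

```java
// Sketch: attach an uncaught-exception handler to a daemon thread (such as the
// lease checker) so the reason for a silent thread death is at least logged.
public class LeaseThreadHandlerSketch {
    static volatile Throwable lastUncaught;

    public static Thread startWithHandler(Runnable body, String name) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        // Without this handler, a RuntimeException kills the thread with only
        // a default stack trace on stderr and no record of the cause.
        t.setUncaughtExceptionHandler((thread, e) -> {
            lastUncaught = e;
            System.err.println("Thread " + thread.getName() + " died: " + e);
        });
        t.start();
        return t;
    }

    // Demo: run a crashing thread and return the captured exception's message.
    public static String crashAndCapture() {
        Thread t = startWithHandler(
                () -> { throw new IllegalStateException("boom"); },
                "regionserver.leaseChecker-demo");
        try { t.join(); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
        return lastUncaught == null ? null : lastUncaught.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(crashAndCapture()); // boom
    }
}
```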
[jira] [Created] (HBASE-13996) Add write sniffing in canary
Liu Shaohui created HBASE-13996: --- Summary: Add write sniffing in canary Key: HBASE-13996 URL: https://issues.apache.org/jira/browse/HBASE-13996 Project: HBase Issue Type: New Feature Components: canary Reporter: Liu Shaohui Assignee: Liu Shaohui Currently the canary tool only sniffs read operations, so it is hard to find problems in the write path. To support write sniffing, we create a system table named '_canary_' in the canary tool. The tool makes sure that the number of regions is larger than the number of regionservers and that the regions are distributed onto all regionservers. Periodically, the tool writes data to these regions to calculate the write availability of HBase and sends alerts if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-9526) LocalHBaseCluster.shutdown hangs while the regionserver thread waits on the zk deleteMyEphemeralNode packet
Liu Shaohui created HBASE-9526: -- Summary: LocalHBaseCluster.shutdown hangs while the regionserver thread waits on the zk deleteMyEphemeralNode packet Key: HBASE-9526 URL: https://issues.apache.org/jira/browse/HBASE-9526 Project: HBase Issue Type: Test Affects Versions: 0.94.3 Reporter: Liu Shaohui Priority: Minor Attachments: stack.log When LocalHBaseCluster is shut down, it joins all regionserver threads. A regionserver thread tries to delete its ephemeral node and waits for the zk packet, but the ZooKeeper send thread no longer exists, so no one notifies the regionserver thread. {noformat} "RegionServer:0;10.237.14.236,43311,1378958529812" prio=10 tid=0x7f0a9d02 nid=0x18db in Object.wait() [0x7f0a8aa35000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309) - locked <0x8ece3de8> (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:866) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:127) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1038) at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1027) at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1073) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:851) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:147) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:100) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:131) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:337) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1340) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.util.Methods.call(Methods.java:37) at org.apache.hadoop.hbase.security.User.call(User.java:603) at org.apache.hadoop.hbase.security.User.access$700(User.java:50) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:443) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:129) {noformat} This situation occurs randomly; if I rerun the test, it may pass. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9543) Impl unique aggregation
Liu Shaohui created HBASE-9543: -- Summary: Impl unique aggregation Key: HBASE-9543 URL: https://issues.apache.org/jira/browse/HBASE-9543 Project: HBase Issue Type: New Feature Components: Coprocessors Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Implement unique aggregation: return the set of all distinct column values in a scan. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9568) backport HBASE-6508 to 0.94
Liu Shaohui created HBASE-9568: -- Summary: backport HBASE-6508 to 0.94 Key: HBASE-9568 URL: https://issues.apache.org/jira/browse/HBASE-9568 Project: HBase Issue Type: Improvement Components: MTTR Reporter: Liu Shaohui Priority: Minor Backport HBASE-6508: Filter out edits at log split time to hbase 0.94 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9764) htable AutoFlush is hardcoded as false in PerformanceEvaluation
Liu Shaohui created HBASE-9764: -- Summary: htable AutoFlush is hardcoded as false in PerformanceEvaluation Key: HBASE-9764 URL: https://issues.apache.org/jira/browse/HBASE-9764 Project: HBase Issue Type: Bug Components: Performance, test Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor In PerformanceEvaluation, the htable AutoFlush option is hardcoded as false: {code:title=PerformanceEvaluation.java|borderStyle=solid}
void testSetup() throws IOException {
  this.admin = new HBaseAdmin(conf);
  this.table = new HTable(conf, tableName);
  this.table.setAutoFlush(false);
  this.table.setScannerCaching(30);
}
{code} This makes the write performance unrealistic. Should we add an autoflush option to PerformanceEvaluation? -- This message was sent by Atlassian JIRA (v6.1#6144)
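A minimal sketch of such an option, assuming a `--autoflush=` command-line flag; the flag name and parsing style are a proposal, not necessarily what the eventual patch added. The parsed value would then feed `this.table.setAutoFlush(...)` in `testSetup` instead of the hardcoded `false`.

```java
// Sketch: parse an --autoflush flag for PerformanceEvaluation instead of
// hardcoding setAutoFlush(false).
public class AutoFlushOption {
    public static boolean parseAutoFlush(String[] args) {
        for (String arg : args) {
            if (arg.startsWith("--autoflush=")) {
                return Boolean.parseBoolean(arg.substring("--autoflush=".length()));
            }
        }
        return false; // default preserves today's behavior: buffered writes
    }

    public static void main(String[] args) {
        System.out.println(parseAutoFlush(new String[] {"--autoflush=true", "randomWrite"})); // true
        System.out.println(parseAutoFlush(new String[] {"randomWrite"}));                     // false
    }
}
```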
[jira] [Created] (HBASE-9780) Total row number to write in PerformanceEvaluation varies with thread number
Liu Shaohui created HBASE-9780: -- Summary: Total row number to write in PerformanceEvaluation varies with thread number Key: HBASE-9780 URL: https://issues.apache.org/jira/browse/HBASE-9780 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor The total number of rows to write in PerformanceEvaluation varies with the thread number: {code}
// Set total number of rows to write.
this.R = this.R * N;
{code} A different row count may result in different random read performance: more threads mean more rows, and thus a lower block cache hit ratio. Should we keep the total row count constant across different thread numbers? -- This message was sent by Atlassian JIRA (v6.1#6144)
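One way to keep the total fixed is to divide the requested total among the clients instead of multiplying by their count. The variable names mirror PE's `R` (rows) and `N` (clients), but the split logic below is our proposal, not PE's current code.

```java
// Sketch: a fixed total row count, split across N clients
// (the last client absorbs the remainder).
public class FixedTotalRows {
    public static long rowsForClient(long totalRows, int clients, int clientIndex) {
        long share = totalRows / clients;
        return clientIndex == clients - 1 ? totalRows - share * (clients - 1) : share;
    }

    // Sum of all shares, showing the total is independent of the client count.
    public static long totalAcrossClients(long totalRows, int clients) {
        long sum = 0;
        for (int i = 0; i < clients; i++) {
            sum += rowsForClient(totalRows, clients, i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalAcrossClients(1_000_001, 3)); // 1000001
        System.out.println(totalAcrossClients(1_000_001, 7)); // 1000001
    }
}
```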
[jira] [Created] (HBASE-9873) Some improvements in hlog and hlog split
Liu Shaohui created HBASE-9873: -- Summary: Some improvements in hlog and hlog split Key: HBASE-9873 URL: https://issues.apache.org/jira/browse/HBASE-9873 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Some improvements in hlog and hlog split:
1) Try to clean old hlogs after each memstore flush, to avoid unnecessary hlog splits in failover. Currently hlog cleaning only runs when rolling the hlog writer.
2) Add a background hlog compaction thread that compacts hlogs by removing the entries whose data have already been flushed to hfiles. The scenario: in a shared cluster, the write requests of a table may be very small and periodic, so many hlogs cannot be cleaned because they contain entries of that table.
3) Rely on the smallest of the largest hfile seqIds of the previously served regions to skip some entries. Facebook implemented this in HBASE-6508 and we backported it to hbase 0.94 in HBASE-9568.
4) Support running multiple hlog splitters on a single RS and on the master (the latter can boost split efficiency for a tiny cluster).
5) Enable multiple splitters on a 'big' hlog file by logically splitting the hlog into slices of configurable size (e.g. the hdfs block size, 64M), and support concurrent split tasks on a single hlog file slice.
6) Do not cancel a timed-out split task until another task reports success (this avoids the scenario where the split of an hlog file fails because no single task can succeed within the timeout period), and reschedule an identical split task to reduce split time (to avoid stragglers in hlog split).
7) Consider hlog data locality when scheduling split tasks: schedule the hlog to a splitter that is near the hlog data.
8) Support multiple hlog writers and switch to another hlog writer when write latency to the current hlog grows, due to a possible temporary network spike.
This is a draft listing the hlog improvements we plan to implement in the near future. Comments and discussion are welcome. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9892) Add info port to ServerName to support multiple instances on a node
Liu Shaohui created HBASE-9892: -- Summary: Add info port to ServerName to support multiple instances on a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor The full GC time of a regionserver with a big heap (> 30G) usually cannot be kept under 30s, while servers with 64G memory are common. So we deploy multiple RS instances (2-3) on a single node, each with a heap of about 20G ~ 24G. Most things work fine, except the hbase web UI: the master gets the RS info port from the conf, which is not suitable when a node runs multiple RS instances. So we add the info port to ServerName:
a. At startup, the RS reports its info port to the HMaster.
b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node.
c. For meta regions, the RS writes the servername with info port to the root region.
d. For user regions, the RS writes the servername with info port to the meta regions.
So the HMaster and clients can get the info port from the servername. To test this feature, I changed the RS number from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-9974) Rest sometimes returns incomplete xml/json data
Liu Shaohui created HBASE-9974: -- Summary: Rest sometimes returns incomplete xml/json data Key: HBASE-9974 URL: https://issues.apache.org/jira/browse/HBASE-9974 Project: HBase Issue Type: Bug Components: REST Reporter: Liu Shaohui Rest sometimes returns incomplete xml/json data. We found these exceptions in the REST server. 13/11/15 11:40:51 ERROR mortbay.log:/log/1A:23:11:0C:06:22* javax.ws.rs.WebApplicationException: javax.xml.bind.MarshalException - with linked exception: [org.mortbay.jetty.EofException] at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:159) at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699) at javax.servlet.http.HttpServlet.service(HttpServlet.java:847) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.hbase.rest.filter.GzipFilter.doFilter(GzipFilter.java:73) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at 
org.mortbay.jetty.Server.handle(Server.java:322) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: javax.xml.bind.MarshalException - with linked exception: [org.mortbay.jetty.EofException] at com.sun.xml.bind.v2.runtime.MarshallerImpl.write(MarshallerImpl.java:325) at com.sun.xml.bind.v2.runtime.MarshallerImpl.marshal(MarshallerImpl.java:249) at javax.xml.bind.helpers.AbstractMarshallerImpl.marshal(AbstractMarshallerImpl.java:75) at com.sun.jersey.json.impl.JSONMarshallerImpl.marshal(JSONMarshallerImpl.java:74) at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:179) at com.sun.jersey.core.provider.jaxb.AbstractRootElementProvider.writeTo(AbstractRootElementProvider.java:157) ... 
24 more Caused by: org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791) at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:551) at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:572) at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012) at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:651) at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580) at com.sun.jersey.spi.container.servlet.WebComponent$Writer.write(WebComponent.java:307) at com.sun.jersey.spi.container.ContainerResponse$CommittingOutputStream.write(ContainerResponse.java:134) at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.flushBuffer(UTF8XmlOutput.java:416) at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.text(UTF8XmlOutput.java:369) at com.sun.xml.bind.v2.runtime.unmarshaller.Base64Data.writeTo(Base64Data.java:303) at com.sun.xml.bind.v2.runtime.output.UTF8XmlOutput.text(UTF8XmlOutput.java:310) at com.sun.xml.bind.v2.runtime.XMLSerializer.text(XMLSerializer.java:425) at com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl$PcdataImpl.writeText(RuntimeBuiltinLeafInfoImpl.java:
[jira] [Resolved] (HBASE-9974) Rest sometimes returns incomplete xml/json data
[ https://issues.apache.org/jira/browse/HBASE-9974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui resolved HBASE-9974. Resolution: Not A Problem Assignee: Liu Shaohui
[jira] [Created] (HBASE-10048) Add hlog number metric in regionserver
Liu Shaohui created HBASE-10048: --- Summary: Add hlog number metric in regionserver Key: HBASE-10048 URL: https://issues.apache.org/jira/browse/HBASE-10048 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Add an hlog number metric in the regionserver. We can use this metric to alert on memstore flushes forced by too many hlogs. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10049) Small improvements in region_mover.rb
Liu Shaohui created HBASE-10049: --- Summary: Small improvements in region_mover.rb Key: HBASE-10049 URL: https://issues.apache.org/jira/browse/HBASE-10049 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor We use region_mover.rb for graceful upgrades of an hbase cluster. Here are some small improvements:
a. Remove the table.close(), because the htable can be reused.
b. Add more info to the log message for moving a region.
c. Add a 20s sleep to the load command to make sure the RS has finished initializing its RPC server; there is a time gap between the RS startup report and the RPC server initialization.
-- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10054) Add the default column compression option
Liu Shaohui created HBASE-10054: --- Summary: Add the default column compression option Key: HBASE-10054 URL: https://issues.apache.org/jira/browse/HBASE-10054 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Add a cluster-level default column compression option. If users do not set compression for a column family, the default compression should be used. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HBASE-10055) Add option to limit the scan speed in CopyTable and VerifyReplication
Liu Shaohui created HBASE-10055: --- Summary: Add option to limit the scan speed in CopyTable and VerifyReplication Key: HBASE-10055 URL: https://issues.apache.org/jira/browse/HBASE-10055 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Add an option to limit the scan speed in CopyTable and VerifyReplication. When adding a new replication peer, we use 'CopyTable' to copy old data from the online cluster to the peer cluster. After that, we use 'VerifyReplication' to check the data consistency between the two clusters. To reduce the impact on the online cluster's service, we add an option to limit the scan speed. -- This message was sent by Atlassian JIRA (v6.1#6144)
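A simple way to implement such a limit is a per-mapper throttle: after each batch of rows, sleep long enough that the average rate stays at or below a configured rows-per-second ceiling. The sketch below is our illustration of the idea; the class and method names are not from the HBase patch.

```java
// Sketch: a rows-per-second throttle for a scanning mapper.
public class ScanThrottle {
    // Pure helper: pause (ms) needed after `rowsSeen` rows in `elapsedMs` ms
    // to keep the average rate at `rowsPerSecond`.
    public static long pauseMillis(long rowsSeen, long rowsPerSecond, long elapsedMs) {
        long expectedMs = rowsSeen * 1000 / rowsPerSecond;
        return Math.max(0, expectedMs - elapsedMs);
    }

    private final long rowsPerSecond;
    private final long startNanos = System.nanoTime();
    private long rowsSeen;

    public ScanThrottle(long rowsPerSecond) { this.rowsPerSecond = rowsPerSecond; }

    // Called by the mapper after processing a batch of rows.
    public void throttle(long batchRows) throws InterruptedException {
        rowsSeen += batchRows;
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        long pause = pauseMillis(rowsSeen, rowsPerSecond, elapsedMs);
        if (pause > 0) Thread.sleep(pause);
    }

    public static void main(String[] args) {
        // 500 rows at a 1000 rows/s limit with only 100 ms elapsed -> wait 400 ms.
        System.out.println(pauseMillis(500, 1000, 100)); // 400
        // Already behind schedule -> no pause needed.
        System.out.println(pauseMillis(500, 1000, 600)); // 0
    }
}
```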
[jira] [Created] (HBASE-10335) AuthFailedException in zookeeper may block replication forever
Liu Shaohui created HBASE-10335: --- Summary: AuthFailedException in zookeeper may block replication forever Key: HBASE-10335 URL: https://issues.apache.org/jira/browse/HBASE-10335 Project: HBase Issue Type: Bug Components: Replication, security Reporter: Liu Shaohui ReplicationSource rechooses sinks when it encounters exceptions while shipping edits to the current sink. But if the zookeeper client for the peer cluster goes into the AUTH_FAILED state, ReplicationSource will always get AuthFailedException. ReplicationSource does not reconnect the peer, because reconnectPeer only handles ConnectionLossException and SessionExpiredException. As a result, replication just logs: {quote}
2014-01-14,12:07:06,892 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # 20
2014-01-14,12:07:06,892 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: 20 has 0 region servers
{quote} and is blocked forever. I think other places may have the same problem of not handling AuthFailedException from zookeeper, e.g. HBASE-8675. [~apurtell] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10370) Compaction in out-of-date Store causes region split failed
Liu Shaohui created HBASE-10370: --- Summary: Compaction in an out-of-date Store causes region split to fail Key: HBASE-10370 URL: https://issues.apache.org/jira/browse/HBASE-10370 Project: HBase Issue Type: Bug Components: Compaction Reporter: Liu Shaohui Priority: Critical In our production cluster, we encountered a problem where two daughter regions could not be opened because of FileNotFoundException. {quote} 2014-01-14,20:12:46,927 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of user_profile,x,1389671863815.99e016485b0bc142d67ae07a884f6966.; Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98 java.io.IOException: Failed lg-hadoop-st34.bj,21600,1389060755669-daughterOpener=ec8bbda0f132c481b451fa40e7152b98 at org.apache.hadoop.hbase.regionserver.SplitTransaction.openDaughters(SplitTransaction.java:375) at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:467) at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:69) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/lgprc-xiaomi/user_profile/99e016485b0bc142d67ae07a884f6966/A/5e05d706e4a84f34acc2cf00f089a4cf {quote} The reason is that a compaction in an out-of-date Store deletes the hfiles which are referenced by the daughter regions after the split. This causes the daughter regions to never be opened. The timeline is as follows. Assumption: there are two hfiles, a and b, in Store A in Region R. t0: A compaction request for Store A(a+b) in Region R is sent. t1: A split for Region R. But the split times out and is rolled back. In the rollback, the region reinitializes all store objects, see SplitTransaction #824. 
Now the store in Region R is A'(a+b). t2: Run compaction(a + b -> c): A(a+b) -> A(c). Hfiles a and b are archived. t3: A split for Region R. R splits into two regions R.0 and R.1, which create hfile references to hfiles a and b from Store A'(a + b). t4: Because hfiles a and b have been deleted, the opening of regions R.0 and R.1 will fail with FileNotFoundException. I have added a test to reproduce this problem. After searching JIRA, HBASE-8502 may be the same problem. [~goldin] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10535) Table trash to recover table deleted by mistake
Liu Shaohui created HBASE-10535: --- Summary: Table trash to recover table deleted by mistake Key: HBASE-10535 URL: https://issues.apache.org/jira/browse/HBASE-10535 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor When a table is deleted, only the hfiles are moved to the archive dir; the table and region infos are deleted immediately. So it's very difficult to recover tables which are deleted by mistake. I think we can introduce a table trash dir in HDFS. When a table is deleted, the entire table dir is moved to the trash dir, and after a configurable TTL the dir is actually deleted. This can be done by the HMaster. If we want to recover a deleted table, we can use a tool which moves the table dir out of the trash and recovers the table's metadata. The recovery tool will encounter many problems, eg: parent and daughter regions are all in the table dir. But I think this feature is useful to handle some special cases. Discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10583) backport HBASE-8402 to 0.94
Liu Shaohui created HBASE-10583: --- Summary: backport HBASE-8402 to 0.94 Key: HBASE-10583 URL: https://issues.apache.org/jira/browse/HBASE-10583 Project: HBase Issue Type: Bug Reporter: Liu Shaohui see HBASE-8402 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10617) Value lost if "$" element is before "column" element in json when posted to Rest Server
Liu Shaohui created HBASE-10617: --- Summary: Value lost if "$" element is before "column" element in json when posted to Rest Server Key: HBASE-10617 URL: https://issues.apache.org/jira/browse/HBASE-10617 Project: HBase Issue Type: Bug Components: REST Affects Versions: 0.94.11 Reporter: Liu Shaohui Priority: Minor When posting the following json data to the rest server, it returns 200, but the value is null in HBase {code} {"Row": { "key":"cjI=", "Cell": {"$":"ZGF0YTE=", "column":"ZjE6YzI="}}} {code} From the rest server log, we found the length of the value is 0 after the server parses the json into a RowModel object {code} 14/02/26 17:52:14 DEBUG rest.RowResource: PUT {"totalColumns":1,"families":{"f1":[{"timestamp":9223372036854775807,"qualifier":"c2","vlen":0}]},"row":"r2"} {code} When the order is "column" before "$", it works fine. {code} {"Row": { "key":"cjI=", "Cell": {"column":"ZjE6YzI=", "$":"ZGF0YTE=" }}} {code} Different json libs may produce a different order of these two elements even if "column" is put before "$". -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10627) A logic mistake in HRegionServer isHealthy
Liu Shaohui created HBASE-10627: --- Summary: A logic mistake in HRegionServer isHealthy Key: HBASE-10627 URL: https://issues.apache.org/jira/browse/HBASE-10627 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Priority: Minor After reading isHealthy in HRegionServer, I think there is a logic mistake. {code} // Verify that all threads are alive if (!(leases.isAlive() && cacheFlusher.isAlive() && hlogRoller.isAlive() && this.compactionChecker.isAlive()) < logic wrong here && this.periodicFlusher.isAlive()) { stop("One or more threads are no longer alive -- stop"); return false; } {code} which should be {code} // Verify that all threads are alive if (!(leases.isAlive() && cacheFlusher.isAlive() && hlogRoller.isAlive() && this.compactionChecker.isAlive() && this.periodicFlusher.isAlive())) { stop("One or more threads are no longer alive -- stop"); return false; } {code} Please point it out if I am wrong. Thx -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10692) The Multi TableMap job doesn't support secure HBase clusters
Liu Shaohui created HBASE-10692: --- Summary: The Multi TableMap job doesn't support secure HBase clusters Key: HBASE-10692 URL: https://issues.apache.org/jira/browse/HBASE-10692 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Liu Shaohui Priority: Minor HBASE-3996 adds support for multiple tables and scanners as input to the mapper in map/reduce jobs. But it doesn't support secure HBase clusters. [~erank] [~bbaugher] Ps: HBASE-3996 only supports multiple tables from the same HBase cluster. Should we support multiple tables from different clusters? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10774) Restore TestMultiTableInputFormat
Liu Shaohui created HBASE-10774: --- Summary: Restore TestMultiTableInputFormat Key: HBASE-10774 URL: https://issues.apache.org/jira/browse/HBASE-10774 Project: HBase Issue Type: Test Reporter: Liu Shaohui Priority: Minor TestMultiTableInputFormat was removed in HBASE-9009 because this test made the CI fail. But in HBASE-10692 we need to add a new test, TestSecureMultiTableInputFormat, which depends on it. So we try to restore it in this issue. I reran the test several times and it passed. {code} Running org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 314.163 sec {code} [~stack] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10782) Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in the job conf
Liu Shaohui created HBASE-10782: --- Summary: Hadoop2 MR tests fail occasionally because mapreduce.jobhistory.address is not set in the job conf Key: HBASE-10782 URL: https://issues.apache.org/jira/browse/HBASE-10782 Project: HBase Issue Type: Test Reporter: Liu Shaohui Priority: Minor Hadoop2 MR tests fail occasionally with output like this: {code} --- Test set: org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1 --- Tests run: 5, Failures: 0, Errors: 5, Skipped: 0, Time elapsed: 347.57 sec <<< FAILURE! testScanEmptyToAPP(org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1) Time elapsed: 50.047 sec <<< ERROR! java.io.IOException: java.net.ConnectException: Call From liushaohui-OptiPlex-990/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:419) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:524) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:314) at org.apache.hadoop.mapreduce.Job$1.run(Job.java:311) at java.security.AccessController.doPrivileged(Native Method) ... {code} The reason is that while the MR job was running, the job client pulled the job status from the AppMaster. When the job is completed, the AppMaster exits. At this point, if the job client has not yet received the job-completed event from the AppMaster, it will try to get the job report from the history server. But in HBaseTestingUtility#startMiniMapReduceCluster, the config mapreduce.jobhistory.address is not copied to the TestUtil's config. CRUNCH-249 reported the same problem. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10788) Add 99th percentile of latency in PE
Liu Shaohui created HBASE-10788: --- Summary: Add 99th percentile of latency in PE Key: HBASE-10788 URL: https://issues.apache.org/jira/browse/HBASE-10788 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor In a production env, the 99th percentile of latency is more important than the average. The 99th percentile is helpful to measure the influence of GC and of slow reads/writes on HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
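As a sketch of the metric itself, here is a nearest-rank 99th percentile over collected latency samples. This is only an illustration of the statistic; the eventual PE patch may use a different interpolation method.

```java
import java.util.Arrays;

public class LatencyStats {
  /**
   * Nearest-rank percentile: the smallest sample value that covers at least
   * fraction p of all samples (p in (0, 1], e.g. 0.99 for the 99th percentile).
   */
  public static long percentile(long[] latenciesMs, double p) {
    long[] sorted = latenciesMs.clone();
    Arrays.sort(sorted);
    // 1-based nearest rank ceil(p * n), converted to a 0-based index and clamped
    int idx = (int) Math.ceil(p * sorted.length) - 1;
    return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
  }
}
```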
[jira] [Created] (HBASE-10790) make assembly:single as default in pom.xml
Liu Shaohui created HBASE-10790: --- Summary: make assembly:single as default in pom.xml Key: HBASE-10790 URL: https://issues.apache.org/jira/browse/HBASE-10790 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Now, to compile an HBase tar release package, we must use the cmd: {code} mvn clean package assembly:single {code}, which is not convenient. We can make assembly:single the default and run the assembly plugin in the maven package phase. Then we can just use the cmd {code} mvn clean package {code} to get a release package. Other suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10806) Two protos missing in hbase-protocol/pom.xml
Liu Shaohui created HBASE-10806: --- Summary: Two protos missing in hbase-protocol/pom.xml Key: HBASE-10806 URL: https://issues.apache.org/jira/browse/HBASE-10806 Project: HBase Issue Type: Bug Environment: VisibilityLabels.proto and Encryption.proto are missing from hbase-protocol/pom.xml. The corresponding classes are not regenerated by the maven cmd: {code} mvn compile -Pcompile-protobuf {code} Reporter: Liu Shaohui -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10839) NullPointerException in construction of RegionServer in Security Cluster
Liu Shaohui created HBASE-10839: --- Summary: NullPointerException in construction of RegionServer in Security Cluster Key: HBASE-10839 URL: https://issues.apache.org/jira/browse/HBASE-10839 Project: HBase Issue Type: Bug Components: regionserver Reporter: Liu Shaohui Priority: Critical The initialization of the secure rpc server depends on the regionserver's servername and zooKeeper watcher. But after HBASE-10569, they are null when the secure rpc services are created. [~jxiang] {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.hbase.ipc.RpcServer.createSecretManager(RpcServer.java:1974) at org.apache.hadoop.hbase.ipc.RpcServer.start(RpcServer.java:1945) at org.apache.hadoop.hbase.regionserver.RSRpcServices.(RSRpcServices.java:706) at org.apache.hadoop.hbase.master.MasterRpcServices.(MasterRpcServices.java:190) at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:297) at org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:431) at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:234) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10846) Links between active and backup masters are broken
Liu Shaohui created HBASE-10846: --- Summary: Links between active and backup masters are broken Key: HBASE-10846 URL: https://issues.apache.org/jira/browse/HBASE-10846 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Priority: Minor Links between active and backup masters are broken because of the blank before the info port in the url. {code} href="//wcc-hadoop-tst-ct01.bj: 12501/master-status" {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10881) Support reverse scan in thrift2
Liu Shaohui created HBASE-10881: --- Summary: Support reverse scan in thrift2 Key: HBASE-10881 URL: https://issues.apache.org/jira/browse/HBASE-10881 Project: HBase Issue Type: New Feature Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Support reverse scan in thrift2. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-10943) Backport HBASE-7329 to 0.94
Liu Shaohui created HBASE-10943: --- Summary: Backport HBASE-7329 to 0.94 Key: HBASE-10943 URL: https://issues.apache.org/jira/browse/HBASE-10943 Project: HBase Issue Type: Improvement Affects Versions: 0.94.18 Reporter: Liu Shaohui Priority: Minor See HBASE-7329 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11043) Users with table's read/write permission can't get table's description
Liu Shaohui created HBASE-11043: --- Summary: Users with table's read/write permission can't get table's description Key: HBASE-11043 URL: https://issues.apache.org/jira/browse/HBASE-11043 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.99.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor AccessController#preGetTableDescriptors only allows users with admin or create permission to get a table's description. {quote} requirePermission("getTableDescriptors", nameAsBytes, null, null, Permission.Action.ADMIN, Permission.Action.CREATE); {quote} I think users with a table's read/write permission should also be able to get the table's description. Eg: when creating a Hive table on HBase, Hive gets the table description to check whether the mapping is right. Usually the Hive users only have read permission on the table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11095) Add ip restriction in user permissions
Liu Shaohui created HBASE-11095: --- Summary: Add ip restriction in user permissions Key: HBASE-11095 URL: https://issues.apache.org/jira/browse/HBASE-11095 Project: HBase Issue Type: New Feature Components: security Reporter: Liu Shaohui Priority: Minor For some sensitive data, users want to restrict the source IPs of hbase users, like mysql access control. One direct solution is to add the candidate IPs when granting user permissions. {quote} grant [ [ [ ] ] ] {quote} Any comments and suggestions are welcomed. [~apurtell] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11115) Support setting max version per column family in Get
Liu Shaohui created HBASE-11115: --- Summary: Support setting max version per column family in Get Key: HBASE-11115 URL: https://issues.apache.org/jira/browse/HBASE-11115 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor The Get operation only supports setting the max versions for all column families. But different column families may have different versions of data, and users may want to get data with different versions from different column families in a single Get operation. Though we can translate this kind of Get into multiple single-column-family Gets, those Gets are sequential in the regionserver and have different mvcc. Comments and suggestions are welcomed. Thx -- This message was sent by Atlassian JIRA (v6.2#6252)
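The requested semantics can be modeled with plain collections: given all cell timestamps per column family, keep only the newest maxVersions per family. This is an illustrative sketch of the behavior only, not the server-side implementation or the real HBase API; the class, method, and parameter names are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerFamilyVersions {
  /**
   * Keep only the newest timestamps for each column family, honoring a
   * per-family max-versions override and a default for unlisted families.
   */
  public static Map<String, List<Long>> trim(Map<String, List<Long>> timestampsByFamily,
                                             Map<String, Integer> maxVersionsByFamily,
                                             int defaultMaxVersions) {
    Map<String, List<Long>> out = new HashMap<>();
    for (Map.Entry<String, List<Long>> e : timestampsByFamily.entrySet()) {
      List<Long> ts = new ArrayList<>(e.getValue());
      Collections.sort(ts, Collections.reverseOrder());   // newest first
      Integer max = maxVersionsByFamily.get(e.getKey());
      int keep = (max != null) ? max : defaultMaxVersions;
      out.put(e.getKey(), new ArrayList<>(ts.subList(0, Math.min(keep, ts.size()))));
    }
    return out;
  }
}
```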
[jira] [Created] (HBASE-11218) Data loss in HBase standalone mode
Liu Shaohui created HBASE-11218: --- Summary: Data loss in HBase standalone mode Key: HBASE-11218 URL: https://issues.apache.org/jira/browse/HBASE-11218 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Fix For: 0.99.0 Data loss in HBase standalone mode. *How to reproduce it* # Start HBase in standalone mode. # Create a table using the hbase shell. # Scan '.META.' and you will find data in the meta table # Kill the HBase process with the -9 option # Start HBase again # Scan '.META.' and you will find nothing in the meta table. *There are three main reasons.* # FSDataOutputStream.sync should call flush() if the underlying wrapped stream is not Syncable. See HADOOP-8861 # writeChecksum is true in the default LocalFileSystem and the ChecksumFSOutputSummer buffers the data, which means the WAL edits are not written to the OS's filesystem immediately by the sync method, and those edits will be lost in the regionserver's failover. # The MiniZooKeeperCluster deletes the old zk data at startup, which may cause data loss in the meta table. The failover procedure is: split the previous root regionserver's hlog -> assign root -> split the previous meta regionserver's hlog -> assign meta -> split all other regionservers' hlogs -> assign other regions. If there is no data in zookeeper, we will get null for the root regionserver and then assign the root table. Some data in the root table may be lost because some of root's WAL edits have not been split and replayed. The same applies to the meta table. I finished the patch for 0.94 and am working on the patch for trunk. Suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11223) Limit the actions number of a call in the batch
Liu Shaohui created HBASE-11223: --- Summary: Limit the actions number of a call in the batch Key: HBASE-11223 URL: https://issues.apache.org/jira/browse/HBASE-11223 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.99.0 Reporter: Liu Shaohui Assignee: Liu Shaohui A huge batch operation can make the regionserver crash due to GC. Extreme code like this: {code} final List<Delete> deletes = new ArrayList<Delete>(); final long rows = 400; for (long i = 0; i < rows; ++i) { deletes.add(new Delete(Bytes.toBytes(i))); } table.delete(deletes); {code} We should limit the number of actions in a single batch call. -- This message was sent by Atlassian JIRA (v6.2#6252)
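A minimal client-side sketch of the proposed limit: split an oversized action list into bounded chunks before submitting them. The helper name and limit value are illustrative assumptions, not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchLimiter {
  /** Split a list of actions into chunks of at most maxPerCall actions each. */
  public static <T> List<List<T>> partition(List<T> actions, int maxPerCall) {
    List<List<T>> chunks = new ArrayList<>();
    for (int i = 0; i < actions.size(); i += maxPerCall) {
      int end = Math.min(i + maxPerCall, actions.size());
      // copy the view so each chunk is independent of the source list
      chunks.add(new ArrayList<>(actions.subList(i, end)));
    }
    return chunks;
  }
}
```

The client would then issue one RPC per chunk instead of one call carrying the whole batch.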
[jira] [Created] (HBASE-11232) Region fails to release the update lock for an illegal CF in multi-row mutations
Liu Shaohui created HBASE-11232: --- Summary: Region fails to release the update lock for an illegal CF in multi-row mutations Key: HBASE-11232 URL: https://issues.apache.org/jira/browse/HBASE-11232 Project: HBase Issue Type: Bug Components: regionserver Reporter: Liu Shaohui Assignee: Liu Shaohui Fix For: 0.99.0 The rollback code in processRowsWithLocks does not check the column family. If there is an illegal CF in the mutation, it will throw a NullPointerException and the update lock will not be released, so the region cannot be flushed or compacted. HRegion #4946 {code} if (!mutations.isEmpty() && !walSyncSuccessful) { LOG.warn("Wal sync failed. Roll back " + mutations.size() + " memstore keyvalues for row(s):" + processor.getRowsToLock().iterator().next() + "..."); for (KeyValue kv : mutations) { stores.get(kv.getFamily()).rollback(kv); } } // 11. Roll mvcc forward if (writeEntry != null) { mvcc.completeMemstoreInsert(writeEntry); writeEntry = null; } if (locked) { this.updatesLock.readLock().unlock(); locked = false; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11240) Print hdfs pipeline when hlog's sync is slow
Liu Shaohui created HBASE-11240: --- Summary: Print hdfs pipeline when hlog's sync is slow Key: HBASE-11240 URL: https://issues.apache.org/jira/browse/HBASE-11240 Project: HBase Issue Type: Improvement Components: wal Reporter: Liu Shaohui Assignee: Liu Shaohui Sometimes a slow sync of the hlog writer is caused by an abnormal datanode in the pipeline. So it will be helpful to print the pipeline on a slow sync to diagnose those problems. The ultimate solution is to join the traces of HBase and HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11255) Negative request num in region load
Liu Shaohui created HBASE-11255: --- Summary: Negative request num in region load Key: HBASE-11255 URL: https://issues.apache.org/jira/browse/HBASE-11255 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor We found that the request number of a region can be negative in a long-running hbase cluster. This is because of an improper cast in HRegionServer#createRegionLoad {code} ... .setReadRequestsCount((int)r.readRequestsCount.get()) .setWriteRequestsCount((int) r.writeRequestsCount.get()) {code} The patch is simple and just removes the cast. -- This message was sent by Atlassian JIRA (v6.2#6252)
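A small demonstration of why the narrowing cast is the bug: once the long counter passes Integer.MAX_VALUE, the narrowed int wraps negative. The class below just isolates the cast for illustration.

```java
public class CastOverflow {
  /** The problematic narrowing cast from the snippet above, isolated. */
  public static int narrowed(long requestCount) {
    // wraps negative once the count exceeds 2^31 - 1
    return (int) requestCount;
  }
}
```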
[jira] [Created] (HBASE-11263) Share the open/close store file thread pool for all stores in a region
Liu Shaohui created HBASE-11263: --- Summary: Share the open/close store file thread pool for all stores in a region Key: HBASE-11263 URL: https://issues.apache.org/jira/browse/HBASE-11263 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Liu Shaohui Priority: Minor Currently, the open/close store file thread pool is divided equally among all stores of a region. {code} protected ThreadPoolExecutor getStoreFileOpenAndCloseThreadPool( final String threadNamePrefix) { int numStores = Math.max(1, this.htableDescriptor.getFamilies().size()); int maxThreads = Math.max(1, conf.getInt(HConstants.HSTORE_OPEN_AND_CLOSE_THREADS_MAX, HConstants.DEFAULT_HSTORE_OPEN_AND_CLOSE_THREADS_MAX) / numStores); return getOpenAndCloseThreadPool(maxThreads, threadNamePrefix); } {code} This is not optimal in the following scenarios: # The data of some column families are very large and there are many hfiles in those stores, while others may be very small or in-memory column families. # Usually we reserve some column families for later needs. The threads for these column families are wasted. The simple way is to share one big thread pool across all stores to open/close hfiles. Suggestions are welcomed. Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
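The per-store division can be made concrete. Assuming a configured max of 10 threads (the value is an assumption for illustration, not the actual default), a region with 8 column families gives each store only one thread, no matter how unevenly the hfiles are distributed across stores:

```java
public class StorePoolSizing {
  /** Mirrors the per-store division in getStoreFileOpenAndCloseThreadPool above. */
  public static int threadsPerStore(int maxThreads, int numStores) {
    return Math.max(1, maxThreads / Math.max(1, numStores));
  }
}
```

With a single shared pool, all 10 threads would instead be available to whichever store has the most hfiles to open.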
[jira] [Created] (HBASE-11274) More general single-row Condition Mutation
Liu Shaohui created HBASE-11274: --- Summary: More general single-row Condition Mutation Key: HBASE-11274 URL: https://issues.apache.org/jira/browse/HBASE-11274 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Priority: Minor Currently, the checkAndDelete and checkAndPut interfaces only support an atomic mutation with a single condition. But in actual apps, we need a more general condition-mutation that supports multiple conditions and logical expressions over those conditions. For example, to support the following sql {quote} insert row where (column A == 'X' and column B == 'Y') or (column C == 'z') {quote} Suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11300) Wrong permission check for checkAndPut in AccessController
Liu Shaohui created HBASE-11300: --- Summary: Wrong permission check for checkAndPut in AccessController Key: HBASE-11300 URL: https://issues.apache.org/jira/browse/HBASE-11300 Project: HBase Issue Type: Bug Components: security Affects Versions: 0.99.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor For the checkAndPut operation, the AccessController only checks the read and write permissions for the family and qualifier to check, but ignores the write permission for the family map of the "put". What's more, we don't need the write permission for the family and qualifier to check. See the code AccessController.java #1538 {code} Map<byte[], ? extends Collection<byte[]>> families = makeFamilyMap(family, qualifier); User user = getActiveUser(); AuthResult authResult = permissionGranted(OpType.CHECK_AND_PUT, user, env, families, Action.READ, Action.WRITE); {code} The same problem exists for the checkAndDelete operation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11369) The procedure of interrupting the current split task should be updated after HBASE-9736.
Liu Shaohui created HBASE-11369: --- Summary: The procedure of interrupting the current split task should be updated after HBASE-9736. Key: HBASE-11369 URL: https://issues.apache.org/jira/browse/HBASE-11369 Project: HBase Issue Type: Bug Components: wal Reporter: Liu Shaohui Priority: Minor Before HBASE-9736, SplitLogWorker only split one hlog at a time. When the data of the znode for this task is changed (the task has timed out and been resigned by the SplitLogManager), zookeeper will notify the SplitLogWorker. If this log task is owned by another regionserver, the SplitLogWorker will interrupt the current task and try to get another task. HBASE-9736 allows multiple log splitters per RS, so there will be multiple current tasks running in the thread pool in SplitLogWorker. So the procedure of interrupting the current split task needs to be updated. [~jeffreyz] [~stack] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11373) hbase-protocol compile failed for name conflict of RegionTransition
Liu Shaohui created HBASE-11373: --- Summary: hbase-protocol compile failed for name conflict of RegionTransition Key: HBASE-11373 URL: https://issues.apache.org/jira/browse/HBASE-11373 Project: HBase Issue Type: Bug Components: Protobufs Reporter: Liu Shaohui Priority: Minor The compile of hbase-protocol fails because there are two messages named RegionTransition, in ZooKeeper.proto and RegionServerStatus.proto {quote} $mvn clean package -Pcompile-protobuf -X \[DEBUG\] RegionServerStatus.proto:81:9: "RegionTransition" is already defined in file "ZooKeeper.proto". \[DEBUG\] RegionServerStatus.proto:114:12: "RegionTransition" seems to be defined in "ZooKeeper.proto", which is not imported by "RegionServerStatus.proto". To use it here, please add the necessary import. \[ERROR\] protoc compiler error {quote} Though it would be OK if we compiled ZooKeeper.proto and RegionServerStatus.proto separately, that is not very convenient. The new RegionTransition is in RegionServerStatus.proto and was introduced in HBASE-11059. [~jxiang] What's your suggestion about this issue? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11410) A tool for deleting data of a column using BulkDeleteEndpoint
Liu Shaohui created HBASE-11410: --- Summary: A tool for deleting data of a column using BulkDeleteEndpoint Key: HBASE-11410 URL: https://issues.apache.org/jira/browse/HBASE-11410 Project: HBase Issue Type: Improvement Components: Coprocessors Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Sometimes we need a tool to delete unused or wrongly formatted data in some columns, so we added a tool using BulkDeleteEndpoint. Usage: delete column f1:c1 in table t1 {quote} ./hbase org.apache.hadoop.hbase.coprocessor.example.BulkDeleteTool t1 f1:c1 {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11536) Puts of region location to Meta may be out of order, which causes inconsistent region locations
Liu Shaohui created HBASE-11536: --- Summary: Puts of region location to Meta may be out of order, which causes inconsistent region locations Key: HBASE-11536 URL: https://issues.apache.org/jira/browse/HBASE-11536 Project: HBase Issue Type: Bug Components: Region Assignment Reporter: Liu Shaohui Priority: Critical In a production hbase cluster, we found an inconsistency of region location in the meta table. Region cdfa2ed711bbdf054d9733a92fd43eb5 is online on regionserver 10.237.12.13:11600 but the region location in the Meta table is 10.237.12.15:11600. This is because of out-of-order puts to the meta table. # HMaster tries to assign the region to 10.237.12.15:11600. # RegionServer 10.237.12.15:11600: While opening the region, the put of the region location (10.237.12.15:11600) to the meta table times out (60s) and the htable retries a second time. (The regionserver serving meta has received the request of the put. The timeout is because there is a bad disk in this regionserver and the sync of the hlog is very slow.) During the retry in the htable, the OpenRegionHandler times out (100s) and the PostOpenDeployTasksThread is interrupted. Though the htable is finally closed in MetaEditor, the shared connection the htable used is not closed and the call of the put for the meta table is in flight on the connection. Assume this in-flight put to meta is named call A. # RegionServer 10.237.12.15:11600: Because of the timeout of the OpenRegionHandler, the OpenRegionHandler marks the assign state of this region as FAILED_OPEN. # HMaster watches this FAILED_OPEN event and assigns the region to another regionserver: 10.237.12.13:11600 # RegionServer 10.237.12.13:11600: This regionserver opens the region successfully. Assume the put of the region location (10.237.12.13:11600) to the meta table from this regionserver is named call B. There is no order guarantee for calls A and B. If call A is processed after call B in the regionserver serving the meta region, the region location in the meta table will be wrong. 
From the raw scan of the meta table we found: {code} scan '.META.', {RAW => true, LIMIT => 1, VERSIONS => 10, STARTROW => 'xxx.adfa2ed711bbdf054d9733a92fd43eb5.'} {code} {quote} xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885460553(=> Wed Jul 09 13:57:40 +0800 2014), value=10.237.12.15:11600 --> Retry put from 10.237.12.15 xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885456731(=> Wed Jul 09 13:57:36 +0800 2014), value=10.237.12.13:11600 --> put from 10.237.12.13 xxx.adfa2ed711bbdf054d9733a92fd43eb5. column=info:server, timestamp=1404885353122( Wed Jul 09 13:55:53 +0800 2014), value=10.237.12.15:11600 --> First put from 10.237.12.15 {quote} The related hbase log is attached to this issue and discussions are welcomed. Since there is no order guarantee for puts from different htables, one solution for this issue is to give an increasing id to each assignment of a region and use this id as the timestamp of the put of the region location to the meta table. The region location with the largest assign id will then be returned to hbase clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11541) Wrong result when scanning meta with startRow
Liu Shaohui created HBASE-11541: --- Summary: Wrong result when scanning meta with startRow Key: HBASE-11541 URL: https://issues.apache.org/jira/browse/HBASE-11541 Project: HBase Issue Type: Bug Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor When we scan the meta with the STARTROW option, a wrong result may be returned. For example: if there are two tables named "a" and "b" in hbase, when we scan the meta with startrow = 'b', the region location of table "a" is returned but we expect the region location of table "b". {code} > create 'a', {NAME => 'f'} > create 'b', {NAME => 'f'} > scan '.META.', {STARTROW => 'b', LIMIT => 1} a,,1405655897758.f8b547476b6dc80545e6413c31396, {code} The reason is a wrong assumption in MetaKeyComparator. See: KeyValue.java#2011 {code} int leftDelimiter = getDelimiter(left, loffset, llength, HRegionInfo.DELIMITER); int rightDelimiter = getDelimiter(right, roffset, rlength, HRegionInfo.DELIMITER); if (leftDelimiter < 0 && rightDelimiter >= 0) { // Nothing between .META. and regionid. Its first key. return -1; } else if (rightDelimiter < 0 && leftDelimiter >= 0) { return 1; } else if (leftDelimiter < 0 && rightDelimiter < 0) { return 0; } {code} It's a little troublesome to fix this problem: given a start row that contains more than two "," for meta, it's not easy to extract the startKey of the region, eg: STARTROW => 'aaa,bbb,ccc,xxx'. Comments and suggestions are welcomed. Thanks -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11648) Typo of config: hbase.hstore.compaction.ratio in book.xml
Liu Shaohui created HBASE-11648: --- Summary: Typo of config: hbase.hstore.compaction.ratio in book.xml Key: HBASE-11648 URL: https://issues.apache.org/jira/browse/HBASE-11648 Project: HBase Issue Type: Bug Components: Compaction Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor When looking at the parameters used by the compaction algorithm in http://hbase.apache.org/book/regions.arch.html, we found there is a typo. In the hbase code, the config key for the compaction ratio is hbase.hstore.compaction.ratio, but in the hbase book it's hbase.store.compaction.ratio. CompactSelection.java#66 {code} this.conf = conf; this.compactRatio = conf.getFloat("hbase.hstore.compaction.ratio", 1.2F); this.compactRatioOffPeak = conf.getFloat("hbase.hstore.compaction.ratio.offpeak", 5.0F); {code} Just fix it to avoid misleading readers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11685) Incr/decr on the reference count of HConnectionImplementation need to be atomic
Liu Shaohui created HBASE-11685: --- Summary: Incr/decr on the reference count of HConnectionImplementation need to be atomic Key: HBASE-11685 URL: https://issues.apache.org/jira/browse/HBASE-11685 Project: HBase Issue Type: Bug Components: Client Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Currently, the incr/decr operations on the ref count of HConnectionImplementation are not atomic. This may cause the ref count to stay larger than 0 forever, so the connection is never closed. {code} /** * Increment this client's reference count. */ void incCount() { ++refCount; } /** * Decrement this client's reference count. */ void decCount() { if (refCount > 0) { --refCount; } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
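A fix along these lines would use an AtomicInteger so concurrent increments and decrements cannot lose updates. The method names mirror the quoted incCount/decCount, but this is a standalone illustration, not the actual patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: lock-free, thread-safe reference counting.
public class RefCounter {
    private final AtomicInteger refCount = new AtomicInteger();

    /** Increment this client's reference count atomically. */
    void incCount() {
        refCount.incrementAndGet();
    }

    /** Decrement this client's reference count, never going below zero. */
    void decCount() {
        // updateAndGet retries the CAS loop internally, so the
        // "don't go below zero" check and the decrement are atomic.
        refCount.updateAndGet(c -> c > 0 ? c - 1 : 0);
    }

    int getCount() {
        return refCount.get();
    }
}
```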
[jira] [Created] (HBASE-11707) Using Map instead of list in FailedServers of RpcClient
Liu Shaohui created HBASE-11707: --- Summary: Using Map instead of list in FailedServers of RpcClient Key: HBASE-11707 URL: https://issues.apache.org/jira/browse/HBASE-11707 Project: HBase Issue Type: Improvement Components: Client Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Currently, FailedServers uses a list to record the black list of servers and iterates the list to check whether a server is in the list. This is not efficient when the list is very large, and the list is not thread safe for concurrent add and iteration operations. RpcClient.java#175 {code} // iterate, looking for the search entry and cleaning expired entries Iterator<Pair<Long, String>> it = failedServers.iterator(); while (it.hasNext()) { Pair<Long, String> cur = it.next(); if (cur.getFirst() < now) { it.remove(); } else { if (lookup.equals(cur.getSecond())) { return true; } } } {code} A simple change is to replace this list with a ConcurrentHashMap. -- This message was sent by Atlassian JIRA (v6.2#6252)
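The proposed map-based FailedServers can be sketched with a ConcurrentHashMap mapping a server address to its expiry time: lookups become O(1) and are safe under concurrent add/check. Names and the lazy-expiry policy here are illustrative, not the actual patch:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: blacklist of failed servers keyed by address, with each entry
// carrying the time at which it stops counting as "failed".
public class FailedServersMap {
    private final ConcurrentHashMap<String, Long> failed = new ConcurrentHashMap<>();
    private final long retryPauseMillis;

    public FailedServersMap(long retryPauseMillis) {
        this.retryPauseMillis = retryPauseMillis;
    }

    public void addToFailedServers(String address, long now) {
        failed.put(address, now + retryPauseMillis);
    }

    public boolean isFailedServer(String address, long now) {
        Long expiry = failed.get(address);
        if (expiry == null) {
            return false;
        }
        if (expiry < now) {
            // Entry expired: clean it up lazily; remove(key, value) only
            // removes if no concurrent writer refreshed the entry.
            failed.remove(address, expiry);
            return false;
        }
        return true;
    }
}
```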
[jira] [Created] (HBASE-14237) Meta region may be onlined on multiple regionservers due to bugs in assigning meta
Liu Shaohui created HBASE-14237: --- Summary: Meta region may be onlined on multiple regionservers due to bugs in assigning meta Key: HBASE-14237 URL: https://issues.apache.org/jira/browse/HBASE-14237 Project: HBase Issue Type: Bug Affects Versions: 0.94.11 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical When a regionserver fails to open the meta region and crashes after setting the RS_ZK_REGION_FAILED_OPEN state of the meta region in zookeeper, the master will handle the RS_ZK_REGION_FAILED_OPEN event and try to assign the meta region again in AssignmentManager#handleRegion. But at the same time, the master will handle the regionserver-expired event and start a MetaServerShutdownHandler for the regionserver, because the servername of the regionserver is the same as the servername in the unassigned node of the meta region. In the MetaServerShutdownHandler, the meta region may be assigned a second time. [~heliangliang] We have encountered this problem in our production cluster, which resulted in an inconsistent region location in the meta table. You can see the log in the attachment. The code of AssignmentManager is so complex that I have not yet found a solution to fix this problem. Could someone kindly give some suggestions? Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14247) Separate the old WALs into different regionserver directories
Liu Shaohui created HBASE-14247: --- Summary: Separate the old WALs into different regionserver directories Key: HBASE-14247 URL: https://issues.apache.org/jira/browse/HBASE-14247 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Currently all old WALs of regionservers are archived into the single directory oldWALs. In big clusters, because of a long WAL TTL or disabled replication, the number of files under oldWALs may reach the max-directory-items limit of HDFS, which will crash the hbase cluster. {code} Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemsExceededException): The directory item limit of /hbase/lgprc-xiaomi/.oldlogs is exceeded: limit=1048576 items=1048576 {code} A simple solution is to separate the old WALs into different directories according to the server name of the WAL. Suggestions are welcome~ Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
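The proposed layout amounts to inserting a per-regionserver subdirectory between oldWALs and the archived file, so no single directory accumulates every WAL. A minimal sketch of the path construction, with hypothetical method names and paths:

```java
// Sketch: build the archive path oldWALs/<serverName>/<walFile> instead
// of the flat oldWALs/<walFile>, bounding each directory's item count by
// the number of WALs of one regionserver.
public class OldWalLayout {
    public static String archivedPath(String oldWalsDir, String serverName, String walFileName) {
        return oldWalsDir + "/" + serverName + "/" + walFileName;
    }
}
```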
[jira] [Created] (HBASE-14254) Wrong error message when throwing NamespaceNotFoundException in shell
Liu Shaohui created HBASE-14254: --- Summary: Wrong error message when throwing NamespaceNotFoundException in shell Key: HBASE-14254 URL: https://issues.apache.org/jira/browse/HBASE-14254 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Wrong error message when throwing NamespaceNotFoundException in shell {code} hbase(main):004:0> create 'ns:t1', {NAME => 'f1'} ERROR: Unknown namespace ns:t1! {code} The namespace should be {color:red}ns{color}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14277) TestRegionServerHostname.testRegionServerHostname may fail on a host with a case-sensitive name
Liu Shaohui created HBASE-14277: --- Summary: TestRegionServerHostname.testRegionServerHostname may fail on a host with a case-sensitive name Key: HBASE-14277 URL: https://issues.apache.org/jira/browse/HBASE-14277 Project: HBase Issue Type: Test Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor After HBASE-13995, the hostname is converted to lower case in ServerName. This may cause the test TestRegionServerHostname.testRegionServerHostname to fail on a host with a case-sensitive name. Just fix it in the test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-14404) Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui reopened HBASE-14404: - [~apurtell] There are typos in patch v2, which made the tests fail. All failing tests pass with patch v3. You can see the diff of v2 and v3 in the file: v3-v2.diff > Backport HBASE-14098 (Allow dropping caches behind compactions) to 0.98 > --- > > Key: HBASE-14404 > URL: https://issues.apache.org/jira/browse/HBASE-14404 > Project: HBase > Issue Type: Task >Reporter: Andrew Purtell > Attachments: HBASE-14404-0.98.patch, HBASE-14404-0.98.patch > > > HBASE-14098 adds a new configuration toggle - > "hbase.hfile.drop.behind.compaction" - which if set to "true" tells > compactions to drop pages from the OS blockcache after write. It's on by > default where committed so far but a backport to 0.98 would default it to > off. (The backport would also retain compat methods to LimitedPrivate > interface StoreFileScanner.) What could make it a controversial change in > 0.98 is it changes the default setting of > 'hbase.regionserver.compaction.private.readers' from "false" to "true". I > think it's fine, we use private readers in production. They're stable and do > not present perf issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14517) Show regionserver's version in master status page
Liu Shaohui created HBASE-14517: --- Summary: Show regionserver's version in master status page Key: HBASE-14517 URL: https://issues.apache.org/jira/browse/HBASE-14517 Project: HBase Issue Type: Improvement Components: monitoring Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor In production environments, regionservers may be removed from the cluster for hardware problems and rejoin the cluster after repair. There is a potential risk that the version of a rejoined regionserver may differ from the others, because the cluster has been upgraded through many versions. To solve this, we can show every regionserver's version in the server list of the master's status page, and highlight a regionserver when its version differs from the master's version, similar to HDFS-3245. Suggestions are welcome~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14591) Region with reference hfile may split after a forced split in IncreasingToUpperBoundRegionSplitPolicy
Liu Shaohui created HBASE-14591: --- Summary: Region with reference hfile may split after a forced split in IncreasingToUpperBoundRegionSplitPolicy Key: HBASE-14591 URL: https://issues.apache.org/jira/browse/HBASE-14591 Project: HBase Issue Type: Bug Affects Versions: 0.98.15 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0 In IncreasingToUpperBoundRegionSplitPolicy, a region with a store containing an hfile reference may split after a forced split. This breaks many design assumptions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15312) Update the dependencies of pom for mini cluster in HBase Book
Liu Shaohui created HBASE-15312: --- Summary: Update the dependencies of pom for mini cluster in HBase Book Key: HBASE-15312 URL: https://issues.apache.org/jira/browse/HBASE-15312 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor In the HBase book, the pom dependencies for the mini cluster are outdated after version 0.96. See: http://hbase.apache.org/book.html#_integration_testing_with_an_hbase_mini_cluster -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15338) Add an option to disable the data block cache for testing the performance of the underlying file system
Liu Shaohui created HBASE-15338: --- Summary: Add an option to disable the data block cache for testing the performance of the underlying file system Key: HBASE-15338 URL: https://issues.apache.org/jira/browse/HBASE-15338 Project: HBase Issue Type: Improvement Components: integration tests Reporter: Liu Shaohui Assignee: Liu Shaohui When testing and comparing the performance of different file systems (HDFS, Azure Blob storage, AWS S3 and so on) for HBase, it's better to avoid the effect of the HBase BlockCache and get the actual random read latency when a data block is read from the underlying file system. (Usually, the index blocks and meta blocks should still be cached in memory during the testing.) So we add an option in CacheConfig to disable the data block cache. Suggestions are welcome~ Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15385) A failed atomic folder rename operation can never recover when the destination file is deleted in the Wasb filesystem
Liu Shaohui created HBASE-15385: --- Summary: A failed atomic folder rename operation can never recover when the destination file is deleted in the Wasb filesystem Key: HBASE-15385 URL: https://issues.apache.org/jira/browse/HBASE-15385 Project: HBase Issue Type: Bug Components: hadoop-azure Reporter: Liu Shaohui Priority: Critical Fix For: 3.0.0 When using the Wasb file system, we found that a failed atomic folder rename operation can never recover when the destination file has been deleted. {quote} ls: Attempting to complete rename of file hbase/azurtst-xiaomi/data/default/YCSBTest/.tabledesc during folder rename redo, and file was not found in source or destination. {quote} The reason is that the file was renamed to the destination file before the crash, and the destination file was deleted by another process after the crash. So the recovery is blocked while finishing the rename operation of this file, because it finds that neither the source nor the destination file exists. See: NativeAzureFileSystem.java #finishSingleFileRename Another serious problem is that the recovery of an atomic rename operation may delete a newly created file with the same name as the source file, because the file system doesn't check whether there is a rename operation that needs to be redone. Suggestions are welcome~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-15385) A failed atomic folder rename operation can never recover when the destination file is deleted in the Wasb filesystem
[ https://issues.apache.org/jira/browse/HBASE-15385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui resolved HBASE-15385. - Resolution: Invalid Fix Version/s: (was: 3.0.0) > A failed atomic folder rename operation can never recover when the > destination file is deleted in the Wasb filesystem > - > > Key: HBASE-15385 > URL: https://issues.apache.org/jira/browse/HBASE-15385 > Project: HBase > Issue Type: Bug > Components: hadoop-azure >Reporter: Liu Shaohui >Priority: Critical > > When using the Wasb file system, we found that a failed atomic folder rename > operation can never recover when the destination file has been deleted. > {quote} > ls: Attempting to complete rename of file > hbase/azurtst-xiaomi/data/default/YCSBTest/.tabledesc during folder rename > redo, and file was not found in source or destination. > {quote} > The reason is that the file was renamed to the destination file before the > crash, and the destination file was deleted by another process after the crash. > So the recovery is blocked while finishing the rename operation of this file, > because it finds that neither the source nor the destination file exists. > See: NativeAzureFileSystem.java #finishSingleFileRename > Another serious problem is that the recovery of an atomic rename operation may > delete a newly created file with the same name as the source file, because the > file system doesn't check whether there is a rename operation that needs to be > redone. > Suggestions are welcome~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15391) Avoid too large "deleted from META" info log
Liu Shaohui created HBASE-15391: --- Summary: Avoid too large "deleted from META" info log Key: HBASE-15391 URL: https://issues.apache.org/jira/browse/HBASE-15391 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0 When deleting a large table in HBase, there will be a very large info log entry in the HMaster. {code} 2016-02-29,05:58:45,920 INFO org.apache.hadoop.hbase.catalog.MetaEditor: Deleted [{ENCODED => 4b54572150941cd03f5addfdeab0a754, NAME => 'YCSBTest,,1453186492932.4b54572150941cd03f5addfdeab0a754.', STARTKEY => '', ENDKEY => 'user01'}, {ENCODED => 715e142bcd6a31d7842abf286ef8a5fe, NAME => 'YCSBTest,user01,1453186492933.715e142bcd6a31d7842abf286ef8a5fe.', STARTKEY => 'user01', ENDKEY => 'user02'}, {ENCODED => 5f9cef5714973f13baa63fba29a68d70, NAME => 'YCSBTest,user02,1453186492933.5f9cef5714973f13baa63fba29a68d70.', STARTKEY => 'user02', ENDKEY => 'user03'}, {ENCODED => 86cf3fa4c0a6b911275512c1d4b78533, NAME => 'YCSBTest,user0... {code} The reason is that MetaTableAccessor logs all regions when deleting them from meta. See MetaTableAccessor.java#deleteRegions {code} public static void deleteRegions(Connection connection, List<HRegionInfo> regionsInfo, long ts) throws IOException { List<Delete> deletes = new ArrayList<Delete>(regionsInfo.size()); for (HRegionInfo hri: regionsInfo) { Delete e = new Delete(hri.getRegionName()); e.addFamily(getCatalogFamily(), ts); deletes.add(e); } deleteFromMetaTable(connection, deletes); LOG.info("Deleted " + regionsInfo); } {code} Just change the info log to debug and add an info log with the number of deleted regions. Other suggestions are welcome~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15409) TestHFileBackedByBucketCache failed randomly on jdk8
Liu Shaohui created HBASE-15409: --- Summary: TestHFileBackedByBucketCache failed randomly on jdk8 Key: HBASE-15409 URL: https://issues.apache.org/jira/browse/HBASE-15409 Project: HBase Issue Type: Test Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor When running the small tests, we found that TestHFileBackedByBucketCache fails randomly {code} mvn clean package install -DrunSmallTests -Dtest=TestHFileBackedByBucketCache Running org.apache.hadoop.hbase.io.hfile.TestHFileBackedByBucketCache Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.262 sec <<< FAILURE! - in org.apache.hadoop.hbase.io.hfile.TestHFileBackedByBucketCache testBucketCacheCachesAndPersists(org.apache.hadoop.hbase.io.hfile.TestHFileBackedByBucketCache) Time elapsed: 0.69 sec <<< FAILURE! java.lang.AssertionError: expected:<5> but was:<4> at org.apache.hadoop.hbase.io.hfile.TestHFileBackedByBucketCache.testBucketCacheCachesAndPersists(TestHFileBackedByBucketCache.java:161) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15420) TestCacheConfig failed after HBASE-15338
Liu Shaohui created HBASE-15420: --- Summary: TestCacheConfig failed after HBASE-15338 Key: HBASE-15420 URL: https://issues.apache.org/jira/browse/HBASE-15420 Project: HBase Issue Type: Test Components: test Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0 TestCacheConfig failed after HBASE-15338. Fix it in this issue~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-8675) Two active HMasters for AUTH_FAILED in secure hbase cluster
Liu Shaohui created HBASE-8675: -- Summary: Two active HMasters for AUTH_FAILED in secure hbase cluster Key: HBASE-8675 URL: https://issues.apache.org/jira/browse/HBASE-8675 Project: HBase Issue Type: Bug Components: master Reporter: Liu Shaohui Priority: Critical In our production cluster, because of a network problem reaching the Kerberos server, the ZooKeeperWatcher in the active hmaster fails to authenticate, gets a connection event of AUTH_FAILED and loses the master lock. But the zookeeper watcher ignores the event, so the old active hmaster remains active. After the network problem is fixed, the backup hmaster gets the master lock and becomes active. There are then two active hmasters in the cluster. 2013-05-30 09:44:21,004 ERROR org.apache.zookeeper.client.ZooKeeperSaslClient: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: krb1.xiaomi.net)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state. 
2013-05-30 09:54:07,755 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x3e10d98be405bc Unable to set watcher on znode /hbase/master org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/master at org.apache.zookeeper.KeeperException.create(KeeperException.java:123) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1036) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:166) at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:231) at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:76) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:595) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:850) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:825) at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:286) at org.apache.hadoop.hbase.client.HTable.(HTable.java:201) at org.apache.hadoop.hbase.catalog.MetaReader.getHTable(MetaReader.java:200) at org.apache.hadoop.hbase.catalog.MetaReader.getMetaHTable(MetaReader.java:226) at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:705) at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:183) at org.apache.hadoop.hbase.catalog.MetaReader.fullScan(MetaReader.java:168) at org.apache.hadoop.hbase.master.CatalogJanitor.getSplitParents(CatalogJanitor.java:123) at org.apache.hadoop.hbase.master.CatalogJanitor.scan(CatalogJanitor.java:134) at org.apache.hadoop.hbase.master.CatalogJanitor.chore(CatalogJanitor.java:92) at org.apache.hadoop.hbase.Chore.run(Chore.java:67) at java.lang.Thread.run(Thread.java:662) I want to just abort the hmaster server 
if AuthFailed or SaslAuthenticated occurs. Any better ideas about this issue? Since ZooKeeperWatcher is used in many classes, will the aborting bring more problems? Any other problems we need to consider? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8690) Reduce unnecessary getFileStatus hdfs calls in TTL hfile and hlog cleaners
Liu Shaohui created HBASE-8690: -- Summary: Reduce unnecessary getFileStatus hdfs calls in TTL hfile and hlog cleaners Key: HBASE-8690 URL: https://issues.apache.org/jira/browse/HBASE-8690 Project: HBase Issue Type: Improvement Components: master Reporter: Liu Shaohui Priority: Minor For each file in the archive dir, TimeToLiveHFileCleaner needs to call getFileStatus to get the modification time of the file. Actually, the CleanerChore already has the file status from listing the parent dir. When we set the TTL to 7 days in our cluster for data security, the number of files left in the archive dir is up to 65 thousand. In each clean period, TimeToLiveHFileCleaner will generate tens of thousands of getFileStatus calls in a short time, which is very heavy for the hdfs namenode. Fix: change the path param to FileStatus in the isFileDeletable method and reduce unnecessary getFileStatus hdfs calls in the TTL cleaners. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8707) Add LongComparator for filter
Liu Shaohui created HBASE-8707: -- Summary: Add LongComparator for filter Key: HBASE-8707 URL: https://issues.apache.org/jira/browse/HBASE-8707 Project: HBase Issue Type: New Feature Reporter: Liu Shaohui Priority: Minor Add LongComparator for filter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
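The requested comparator can be sketched in a few lines: decode the cell value as a big-endian 8-byte long (the encoding produced by Bytes.toBytes(long)) and compare it with a fixed operand. This is a standalone illustration with a hypothetical class name, not the comparator committed to HBase:

```java
import java.nio.ByteBuffer;

// Minimal sketch of a long comparator for filters.
public class LongComparatorSketch {
    private final long value;

    public LongComparatorSketch(long value) {
        this.value = value;
    }

    // Negative if the stored operand is less than the decoded cell value,
    // zero if equal, positive otherwise.
    public int compareTo(byte[] bytes, int offset, int length) {
        long other = ByteBuffer.wrap(bytes, offset, length).getLong();
        return Long.compare(this.value, other);
    }

    // Helper mirroring Bytes.toBytes(long) for the demo: big-endian 8 bytes.
    public static byte[] toBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array();
    }
}
```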