[jira] [Updated] (HBASE-4095) Hlog may not be rolled in a long time if checkLowReplication's request of LogRoll is blocked
[ https://issues.apache.org/jira/browse/HBASE-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jieshan Bean updated HBASE-4095: Attachment: HBase-4095-V9-trunk.patch HBase-4095-V9-branch.patch Hlog may not be rolled in a long time if checkLowReplication's request of LogRoll is blocked Key: HBASE-4095 URL: https://issues.apache.org/jira/browse/HBASE-4095 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.3 Reporter: Jieshan Bean Assignee: Jieshan Bean Fix For: 0.90.5 Attachments: HBASE-4095-90-v2.patch, HBASE-4095-90.patch, HBASE-4095-trunk-v2.patch, HBASE-4095-trunk.patch, HBase-4095-V4-Branch.patch, HBase-4095-V5-Branch.patch, HBase-4095-V5-trunk.patch, HBase-4095-V6-branch.patch, HBase-4095-V6-trunk.patch, HBase-4095-V7-branch.patch, HBase-4095-V7-trunk.patch, HBase-4095-V8-branch.patch, HBase-4095-V8-trunk.patch, HBase-4095-V9-branch.patch, HBase-4095-V9-trunk.patch, HlogFileIsVeryLarge.gif, LatestLLTResults-20110810.rar, RelatedLogs2011-07-28.txt, TestResultForPatch-V4.rar, flowChart-IntroductionToThePatch.gif, hbase-root-regionserver-193-195-5-111.rar, surefire-report-V5-trunk.html, surefire-report-branch.html
Some very large Hlog files (larger than 10G) appeared in our environment, and I found the reason why they grew so huge:
1. The replica count fell below the expected number, so checkLowReplication() is called on every sync.
2. checkLowReplication() requests a log roll and sets logRollRequested to true:
{noformat}
private void checkLowReplication() {
  // if the number of replicas in HDFS has fallen below the initial
  // value, then roll logs.
  try {
    int numCurrentReplicas = getLogReplication();
    if (numCurrentReplicas != 0 && numCurrentReplicas < this.initialReplication) {
      LOG.warn("HDFS pipeline error detected. " +
          "Found " + numCurrentReplicas + " replicas but expecting " +
          this.initialReplication + " replicas. " +
          " Requesting close of hlog.");
      requestLogRoll();
      logRollRequested = true;
    }
  } catch (Exception e) {
    LOG.warn("Unable to invoke DFSOutputStream.getNumCurrentReplicas " + e +
        " still proceeding ahead...");
  }
}
{noformat}
3. requestLogRoll() only submits the roll request; it may not execute promptly, because the roll must acquire the unfair cacheFlushLock, which may be held by the cache-flush threads.
4. logRollRequested stays true until the log roll actually executes, so during that window every log-roll request from sync() is skipped.
Here are the logs from when the problem happened (please note the file size of hlog 193-195-5-111%3A20020.1309937386639 in the last row):
2011-07-06 15:28:59,284 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas. Requesting close of hlog.
2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119, entries=32434, filesize=239589754. New hlog /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937386639
2011-07-06 15:29:56,929 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 2 replicas but expecting 3 replicas. Requesting close of hlog.
2011-07-06 15:29:56,933 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed file at hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/.tmp/4656903854447026847 to hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983 2011-07-06 15:29:57,391 INFO org.apache.hadoop.hbase.regionserver.Store: Added hdfs://193.195.5.112:9000/hbase/Htable_UFDR_034/a3780cf0c909d8cf8f8ed618b290cc95/value/8603005630220380983, entries=445880, sequenceid=248900, memsize=207.5m, filesize=130.1m 2011-07-06 15:29:57,478 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished memstore flush of ~207.5m for region Htable_UFDR_034,07664,1309936974158.a3780cf0c909d8cf8f8ed618b290cc95. in 10839ms, sequenceid=248900, compaction requested=false 2011-07-06 15:28:59,236 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309926531955, entries=216459, filesize=2370387468. New hlog /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119 2011-07-06 15:29:46,714 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/193-195-5-111,20020,1309922880081/193-195-5-111%3A20020.1309937339119, entries=32434,
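The skip behavior in steps 2-4 can be modeled with a minimal sketch. This is not the actual HLog code; it is a toy class (names mirror the snippet above) showing that once logRollRequested is set, no further roll requests are issued until the roll itself runs and clears the flag, so a blocked roll silences all later requests:

```java
// Toy model of the logRollRequested flag described above (illustrative only).
public class LogRollFlagSketch {
    boolean logRollRequested = false;
    int rollRequests = 0;

    // Called on every sync() while the HDFS pipeline is degraded.
    void checkLowReplication(int numCurrentReplicas, int initialReplication) {
        if (numCurrentReplicas != 0
            && numCurrentReplicas < initialReplication
            && !logRollRequested) {      // the flag gates further requests
            rollRequests++;              // stands in for requestLogRoll()
            logRollRequested = true;
        }
    }

    // Only the roll itself clears the flag. If the roll is blocked
    // (e.g. waiting on cacheFlushLock), the flag stays set indefinitely.
    void rollWriter() {
        logRollRequested = false;
    }

    public static void main(String[] args) {
        LogRollFlagSketch hlog = new LogRollFlagSketch();
        for (int i = 0; i < 1000; i++) {
            hlog.checkLowReplication(2, 3); // pipeline stays at 2 of 3 replicas
        }
        // Only the first sync requested a roll; the other 999 were skipped
        // because the roll never executed to clear the flag.
        System.out.println(hlog.rollRequests); // 1
    }
}
```

Under this model, a fix must either let the roll run promptly or reset the flag so the request can be re-issued.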
[jira] [Updated] (HBASE-4176) Exposing HBase Filters to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anirudh Todi updated HBASE-4176: Attachment: book.xml Hi Stack - Thanks for the +1. I worked on book.xml for a while; attached is what I have so far. Let me know if you feel I'm going in the right direction, and I can then polish it and finish it off. Exposing HBase Filters to the Thrift API Key: HBASE-4176 URL: https://issues.apache.org/jira/browse/HBASE-4176 Project: HBase Issue Type: Improvement Components: thrift Reporter: Anirudh Todi Assignee: Anirudh Todi Priority: Minor Attachments: Filter Language (3).xml, Filter Language(2).docx, Filter Language(2).xml, Filter Language(3).docx, Filter Language.docx, HBASE-4176.patch, book.xml Currently, to use any of the filters, one has to explicitly add a scanner for the filter in the Thrift API, which is messy and verbose. With this patch, I am trying to add support for all the filters in a clean way: the user specifies a filter via a string, and the string is parsed on the server to construct the filter. More information can be found in the attached document named Filter Language. This patch extends the progress made by the patches in the HBASE-1744 JIRA (https://issues.apache.org/jira/browse/HBASE-1744) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
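The "filter specified as a string, parsed on the server" idea can be illustrated with a toy parser. This is not the grammar from the attached Filter Language document; the names (SimpleFilter, parse) and the two filter kinds are hypothetical stand-ins:

```java
// Toy illustration of parsing a filter string into a filter object.
// Hypothetical sketch; the real grammar is in the Filter Language document.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterStringSketch {
    interface SimpleFilter { boolean accept(String value); }

    // Accepts strings like: PrefixFilter('abc')  or  ValueFilter('xyz')
    static SimpleFilter parse(String spec) {
        Matcher m = Pattern.compile("(\\w+)\\('([^']*)'\\)").matcher(spec.trim());
        if (!m.matches()) throw new IllegalArgumentException("bad filter: " + spec);
        String name = m.group(1);
        String arg = m.group(2);
        switch (name) {
            case "PrefixFilter": return v -> v.startsWith(arg);
            case "ValueFilter":  return v -> v.equals(arg);
            default: throw new IllegalArgumentException("unknown filter: " + name);
        }
    }

    public static void main(String[] args) {
        SimpleFilter f = parse("PrefixFilter('row1')");
        System.out.println(f.accept("row123")); // true
        System.out.println(f.accept("other"));  // false
    }
}
```

The appeal of this design is that the Thrift interface only needs one string parameter instead of one scanner-open method per filter type.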
[jira] [Commented] (HBASE-4211) Do init-sizing of the StringBuilder making a ServerName.
[ https://issues.apache.org/jira/browse/HBASE-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086191#comment-13086191 ] Hudson commented on HBASE-4211: --- Integrated in HBase-TRUNK #2122 (See [https://builds.apache.org/job/HBase-TRUNK/2122/]) HBASE-4211 Do init-sizing of the StringBuilder making a ServerName stack : Files : * /hbase/trunk/CHANGES.txt * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ServerName.java Do init-sizing of the StringBuilder making a ServerName. Key: HBASE-4211 URL: https://issues.apache.org/jira/browse/HBASE-4211 Project: HBase Issue Type: Bug Reporter: stack Assignee: Benoit Sigoure Priority: Minor Fix For: 0.92.0 Simple patch from Benoît.
--- .../java/org/apache/hadoop/hbase/ServerName.java | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/src/main/java/org/apache/hadoop/hbase/ServerName.java b/src/main/java/org/apache/hadoop/hbase/ServerName.java
index 6b03832..4ddb5b7 100644
--- a/src/main/java/org/apache/hadoop/hbase/ServerName.java
+++ b/src/main/java/org/apache/hadoop/hbase/ServerName.java
@@ -128,7 +128,8 @@ public class ServerName implements Comparable<ServerName> {
    * startcode formatted as <code>&lt;hostname&gt; ',' &lt;port&gt; ',' &lt;startcode&gt;</code>
    */
   public static String getServerName(String hostName, int port, long startcode) {
-    StringBuilder name = new StringBuilder(hostName);
+    final StringBuilder name = new StringBuilder(hostName.length() + 1 + 5 + 1 + 13);
+    name.append(hostName);
     name.append(SERVERNAME_SEPARATOR);
     name.append(port);
     name.append(SERVERNAME_SEPARATOR);
-- 1.7.6.434.g1d2b3
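The patch above can be read as a standalone sketch. The capacity estimate reserves room for the hostname, two separators, a port of up to 5 digits, and a 13-digit epoch-millis startcode, so the builder never reallocates while the name is assembled (the class and constant names below are simplified from ServerName):

```java
// Sketch of the init-sizing idea from the HBASE-4211 patch: size the
// StringBuilder up front instead of letting it grow as pieces are appended.
public class ServerNameSketch {
    static final String SERVERNAME_SEPARATOR = ",";

    static String getServerName(String hostName, int port, long startcode) {
        // hostname + ',' + port (<= 5 digits) + ',' + startcode (13-digit millis)
        final StringBuilder name =
            new StringBuilder(hostName.length() + 1 + 5 + 1 + 13);
        name.append(hostName);
        name.append(SERVERNAME_SEPARATOR);
        name.append(port);
        name.append(SERVERNAME_SEPARATOR);
        name.append(startcode);
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(getServerName("host.example.org", 60020, 1313495126115L));
        // host.example.org,60020,1313495126115
    }
}
```

Without the sizing, `new StringBuilder(hostName)` starts at the hostname's length plus 16, so appending the port and startcode can trigger an internal array copy; the pre-sized version avoids that on this hot path.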
[jira] [Created] (HBASE-4212) TestMasterFailover fails occasionally
TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Fix For: 0.90.5
It seems to be a bug: the root region stuck in RIT can't be moved. In the failover process, the master forces root online but does not clean up the zk node, so the test waits forever.
void processFailover() throws KeeperException, IOException, InterruptedException {
  // we enforce on-line root.
  HServerInfo hsi =
      this.serverManager.getHServerInfo(this.catalogTracker.getMetaLocation());
  regionOnline(HRegionInfo.FIRST_META_REGIONINFO, hsi);
  hsi = this.serverManager.getHServerInfo(this.catalogTracker.getRootLocation());
  regionOnline(HRegionInfo.ROOT_REGIONINFO, hsi);
It seems that we should wait for the assignment to finish, as is done for the meta region:
int assignRootAndMeta() throws InterruptedException, IOException, KeeperException {
  int assigned = 0;
  long timeout = this.conf.getLong("hbase.catalog.verification.timeout", 1000);
  // Work on ROOT region. Is it in zk in transition?
  boolean rit = this.assignmentManager.
      processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
  if (!catalogTracker.verifyRootRegionLocation(timeout)) {
    this.assignmentManager.assignRoot();
    this.catalogTracker.waitForRoot();
    // we need to add this code to guarantee that the transition has completed
    this.assignmentManager.waitForAssignment(HRegionInfo.ROOT_REGIONINFO);
    assigned++;
  }
logs:
2011-08-16 07:45:40,715 DEBUG [RegionServer:0;C4S2.site,47710,1313495126115-EventThread] zookeeper.ZooKeeperWatcher(252): regionserver:47710-0x131d2690f780004 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052
2011-08-16 07:45:40,715 DEBUG [RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] zookeeper.ZKAssign(712): regionserver:47710-0x131d2690f780004 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENING
2011-08-16 07:45:40,715 DEBUG [Thread-760-EventThread] zookeeper.ZooKeeperWatcher(252): master:60701-0x131d2690f780009 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052
2011-08-16 07:45:40,716 INFO [PostOpenDeployTasks:70236052] catalog.RootLocationEditor(62): Setting ROOT region location in ZooKeeper as C4S2.site:47710
2011-08-16 07:45:40,716 DEBUG [Thread-760-EventThread] zookeeper.ZKUtil(1109): master:60701-0x131d2690f780009 Retrieved 52 byte(s) of data from znode /hbase/unassigned/70236052 and set watcher; region=-ROOT-,,0, server=C4S2.site,47710,1313495126115, state=RS_ZK_REGION_OPENING
2011-08-16 07:45:40,717 DEBUG [Thread-760-EventThread] master.AssignmentManager(477): Handling transition=RS_ZK_REGION_OPENING, server=C4S2.site,47710,1313495126115, region=70236052/-ROOT-
2011-08-16 07:45:40,725 DEBUG [RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] zookeeper.ZKAssign(661): regionserver:47710-0x131d2690f780004 Attempting to transition node 70236052/-ROOT- from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
2011-08-16 07:45:40,727 DEBUG
[RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] zookeeper.ZKUtil(1109): regionserver:47710-0x131d2690f780004 Retrieved 52 byte(s) of data from znode /hbase/unassigned/70236052; data=region=-ROOT-,,0, server=C4S2.site,47710,1313495126115, state=RS_ZK_REGION_OPENING 2011-08-16 07:45:40,740 DEBUG [RegionServer:0;C4S2.site,47710,1313495126115-EventThread] zookeeper.ZooKeeperWatcher(252): regionserver:47710-0x131d2690f780004 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052 2011-08-16 07:45:40,740 DEBUG [Thread-760-EventThread] zookeeper.ZooKeeperWatcher(252): master:60701-0x131d2690f780009 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052 2011-08-16 07:45:40,740 DEBUG [RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] zookeeper.ZKAssign(712): regionserver:47710-0x131d2690f780004 Successfully transitioned node 70236052 from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED 2011-08-16 07:45:40,741 DEBUG [RS_OPEN_ROOT-C4S2.site,47710,1313495126115-0] handler.OpenRegionHandler(121): Opened -ROOT-,,0.70236052 2011-08-16 07:45:40,741 DEBUG [Thread-760-EventThread] zookeeper.ZKUtil(1109): master:60701-0x131d2690f780009 Retrieved 52 byte(s) of data from znode /hbase/unassigned/70236052 and set watcher; region=-ROOT-,,0, server=C4S2.site,47710,1313495126115, state=RS_ZK_REGION_OPENED 2011-08-16 07:45:40,741 DEBUG [Thread-760-EventThread] master.AssignmentManager(477): Handling transition=RS_ZK_REGION_OPENED, server=C4S2.site,47710,1313495126115, region=70236052/-ROOT- //.It said that zk node can't be cleaned because of we have
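The essence of the proposed fix is to block until the assignment actually completes (the zk node is cleared) rather than treating "root location set" as "transition finished". A minimal concurrency model of that wait, with illustrative names standing in for assignmentManager.waitForAssignment():

```java
// Toy model of "wait until the region assignment completes" from the
// proposed fix. Names are illustrative, not the real AssignmentManager API.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class WaitForAssignmentSketch {
    private final CountDownLatch assigned = new CountDownLatch(1);

    // Region-server side: the open completes and the RIT state is cleared.
    void markRegionOpened() {
        assigned.countDown();
    }

    // Master side: block (with a timeout) until the open has completed,
    // mirroring the added waitForAssignment(HRegionInfo.ROOT_REGIONINFO) call.
    boolean waitForAssignment(long timeoutMs) {
        try {
            return assigned.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        WaitForAssignmentSketch rit = new WaitForAssignmentSketch();
        new Thread(rit::markRegionOpened).start();
        System.out.println(rit.waitForAssignment(5000));
    }
}
```

Without such a wait, the master can proceed while the znode is still in RS_ZK_REGION_OPENING/OPENED, which is exactly the stuck-RIT state the test logs show.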
[jira] [Updated] (HBASE-4212) TestMasterFailover fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4212: -- Attachment: HBASE-4212_branch90V1.patch TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4212_branch90V1.patch
[jira] [Commented] (HBASE-4212) TestMasterFailover fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086199#comment-13086199 ] gaojinchao commented on HBASE-4212: --- I have made a patch. Please review it. Thanks. TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4212_branch90V1.patch
[jira] [Commented] (HBASE-4212) TestMasterFailover fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086202#comment-13086202 ] gaojinchao commented on HBASE-4212: --- I tested 10 times, and the logs show that META is assigned only after root has finished.
2011-08-17 05:06:51,419 DEBUG [MASTER_OPEN_REGION-C4S2.site:47578-0] zookeeper.ZKUtil(1109): master:47578-0x131d6fe02e50009 Retrieved 52 byte(s) of data from znode /hbase/unassigned/70236052; data=region=-ROOT-,,0, server=C4S2.site,60960,1313571996605, state=RS_ZK_REGION_OPENED
2011-08-17 05:06:51,425 DEBUG [Thread-755-EventThread] zookeeper.ZooKeeperWatcher(252): master:47578-0x131d6fe02e50009 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/70236052
2011-08-17 05:06:51,425 DEBUG [MASTER_OPEN_REGION-C4S2.site:47578-0] zookeeper.ZKAssign(420): master:47578-0x131d6fe02e50009 Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
2011-08-17 05:06:51,426 INFO [Master:0;C4S2.site:47578] master.HMaster(437): -ROOT- assigned=1, rit=false, location=C4S2.site:60960
2011-08-17 05:06:51,426 DEBUG [MASTER_OPEN_REGION-C4S2.site:47578-0] handler.OpenedRegionHandler(108): Opened region -ROOT-,,0.70236052 on C4S2.site,60960,1313571996605
2011-08-17 05:06:51,427 DEBUG [Master:0;C4S2.site:47578] zookeeper.ZKUtil(553): master:47578-0x131d6fe02e50009 Unable to get data of znode /hbase/unassigned/1028785192 because node does not exist (not an error)
2011-08-17 05:06:51,429 INFO [Master:0;C4S2.site:47578] catalog.CatalogTracker(421): Passed metaserver is null
TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4212_branch90V1.patch
[jira] [Updated] (HBASE-4212) TestMasterFailover fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4212: -- Assignee: gaojinchao Status: Patch Available (was: Open) TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4212_branch90V1.patch
[jira] [Updated] (HBASE-4212) TestMasterFailover fails occasionally
[ https://issues.apache.org/jira/browse/HBASE-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4212: -- Attachment: HBASE-4212_TrunkV1.patch TestMasterFailover fails occasionally - Key: HBASE-4212 URL: https://issues.apache.org/jira/browse/HBASE-4212 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.90.4 Reporter: gaojinchao Assignee: gaojinchao Fix For: 0.90.5 Attachments: HBASE-4212_TrunkV1.patch, HBASE-4212_branch90V1.patch
[jira] [Updated] (HBASE-4124) ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'.
[ https://issues.apache.org/jira/browse/HBASE-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] gaojinchao updated HBASE-4124: -- Attachment: HBASE-4124_Branch90V1_trial.patch I tried to make a patch to fix this issue, but I have only run the UT test. Please review it first and give me some suggestions. I will test it tomorrow. Thanks. ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'. Key: HBASE-4124 URL: https://issues.apache.org/jira/browse/HBASE-4124 Project: HBase Issue Type: Bug Components: master Reporter: fulin wang Attachments: HBASE-4124_Branch90V1_trial.patch, log.txt Original Estimate: 0.4h Remaining Estimate: 0.4h ZK restarted while assigning a region, new active HM re-assign it but the RS warned 'already online on this server'. Issue: The RS failed because of 'already online on this server' and returned; the HM cannot receive the message and reports 'Regions in transition timed out'. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Attachment: HBASE-4175_2_with catch block.patch Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4175.patch, HBASE-4175_1.patch, HBASE-4175_2_with catch block.patch Currently createTableDescriptor() doesn't return anything, so the caller wouldn't know whether the descriptor was created or not. See the exception handling: {code} } catch (IOException ioe) { LOG.info("IOException while trying to create tableInfo in HDFS", ioe); } {code} We should return a boolean. If the table descriptor already exists, maybe we should deserialize it from HDFS and compare it with the htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, the existing table descriptor would be overwritten.
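The proposed contract can be sketched in a few lines. This is an illustrative, in-memory stand-in (the class name, the Map standing in for the HDFS directory, and the String descriptor are all assumptions, not the actual FSUtils API): return false when the descriptor already exists and force is false, otherwise write and return true.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed createTableDescriptor() contract.
// A Map stands in for the HDFS directory of .tableinfo files.
public class TableDescriptorSketch {
  private final Map<String, String> fs = new HashMap<>();

  // Returns false if the descriptor exists and force is false;
  // overwrites and returns true when force is true.
  public boolean createTableDescriptor(String table, String htd, boolean force) {
    if (fs.containsKey(table) && !force) {
      return false; // caller now knows nothing was written
    }
    fs.put(table, htd);
    return true;
  }
}
```

With a boolean return, callers can distinguish "created" from "already present" without inspecting the filesystem again.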
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Status: Open (was: Patch Available)
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Attachment: HBASE-4175_2_without catch block.patch
[jira] [Commented] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086256#comment-13086256 ] ramkrishna.s.vasudevan commented on HBASE-4175: --- I have submitted two versions, one with a catch block and one without. The catch block I have mainly used for logging in one place, and it also limits the number of changes through the default APIs.
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-4199) blockCache summary - backend
[ https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086317#comment-13086317 ] Ted Yu commented on HBASE-4199: --- Please use better table names below: {code} + private static final String TEST_TABLE = "testFamily"; + private static final String TEST_TABLE2 = "testFamily2"; {code} Javadoc for BlockCacheSummaryEntry should mention entry: {code} +/** + * Represents a summary of the blockCache by Table and ColumnFamily + * + */ +public class BlockCacheSummaryEntry implements Writable, Comparable<BlockCacheSummaryEntry> { {code} I think the code below: {code} + bcse = new BlockCacheSummaryEntry(); + bcse.setTable(s[ s.length - 4]); // 4th from the end + bcse.setColumnFamily(s[ s.length - 2]); // 2nd from the end {code} should be replaced with calling the two-parameter ctor. The default ctor should be made package private. blockCache summary - backend Key: HBASE-4199 URL: https://issues.apache.org/jira/browse/HBASE-4199 Project: HBase Issue Type: Sub-task Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: java_HBASE_4199.patch, java_HBASE_4199_v2.patch This is the backend work for the blockCache summary. Change to BlockCache interface, Summarization in LruBlockCache, BlockCacheSummaryEntry, addition to HRegionInterface, and HRegionServer. This will NOT include any of the web UI or anything else like that. That is for another sub-task.
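The two-parameter constructor Ted suggests could look roughly like this. A minimal sketch under stated assumptions: the class name mirrors the patch discussion, the fromPath helper and the path layout (…/table/region/columnFamily/file) are illustrative, and the real class would also implement Writable and Comparable.

```java
// Hypothetical sketch of the two-argument constructor suggested in the
// review; not the actual patch code.
public class BlockCacheSummaryEntry {
  private final String table;
  private final String columnFamily;

  BlockCacheSummaryEntry(String table, String columnFamily) {
    this.table = table;
    this.columnFamily = columnFamily;
  }

  // Parse an HFile path of the assumed form .../table/region/cf/file:
  // the table is 4th from the end, the column family 2nd from the end.
  static BlockCacheSummaryEntry fromPath(String path) {
    String[] s = path.split("/");
    return new BlockCacheSummaryEntry(s[s.length - 4], s[s.length - 2]);
  }

  public String getTable() { return table; }
  public String getColumnFamily() { return columnFamily; }
}
```

Keeping the index arithmetic in a named factory method preserves the "why" that the setter comments carried, while the fields themselves become final.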
[jira] [Created] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions.
[jira] [Updated] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subbu M Iyer updated HBASE-4213: Attachment: HBASE-4213-Instant_schema_change.patch
[jira] [Commented] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086325#comment-13086325 ] Ted Yu commented on HBASE-4175: --- +1 on HBASE-4175_2_without catch block.patch Minor comment: the second F in ForceFul should be lower-cased: {code} + public void testShouldAllowForceFulCreationOfAlreadyExistingTableDescriptor() {code}
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086328#comment-13086328 ] Subbu M Iyer commented on HBASE-4213: - As earlier, for some reason I am not able to attach my patch to the review board. Ted/Stack: can one of you help me with this? Thanks.
[jira] [Commented] (HBASE-4203) While master restarts and if the META region's state is OPENING then master cannot assign META until timeout monitor deducts
[ https://issues.apache.org/jira/browse/HBASE-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086332#comment-13086332 ] ramkrishna.s.vasudevan commented on HBASE-4203: --- @Stack, I am planning to implement the same logic that happens in the timeout monitor when it finds a node in OPENING. The existing logic takes care of checking whether the node has changed to OPENED or not; if not, it forces the node to OFFLINE and starts the assignment again. So we can do the same here. Also, this change can be incorporated into the current changes I am trying out in the timeout monitor (HBASE-4015). Or do you want me to submit a separate patch for this? While master restarts and if the META region's state is OPENING then master cannot assign META until timeout monitor deducts Key: HBASE-4203 URL: https://issues.apache.org/jira/browse/HBASE-4203 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Priority: Minor 1. Start Master and 2 RS. 2. If any exception happens while opening the META region, the state in the znode will be OPENING. 3. If at this point the master restarts, then the master will start processing the regions in RIT. 4. If the znode is found to be in OPENING, the master waits for the timeout monitor to detect it and then retries the open. 5. If the default timeout monitor is configured (180 sec/30 min), then it will take 30 mins to open the META region itself. Soln: better not to wait for the timeout monitor period to open catalog tables on master restart.
[jira] [Commented] (HBASE-1730) Near-instantaneous online schema and table state updates
[ https://issues.apache.org/jira/browse/HBASE-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086333#comment-13086333 ] Subbu M Iyer commented on HBASE-1730: - In continuation of HBASE-451, I was working on a patch for this issue and just realized that there is already a patch submitted for this Jira. I created a related patch, HBASE-4213, that follows a slightly different approach to the same problem, and thought I would submit my patch anyway. Near-instantaneous online schema and table state updates Key: HBASE-1730 URL: https://issues.apache.org/jira/browse/HBASE-1730 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: stack Priority: Critical Fix For: 0.92.0 Attachments: 1730-v2.patch, 1730-v3.patch, 1730.patch, HBASE-1730.patch We should not need to take a table offline to update HCD or HTD. One option for that is putting HTDs and HCDs up into ZK, with mirror on disk catalog tables to be used only for cold init scenarios, as discussed on IRC. In this scheme, regionservers hosting regions of a table would watch permanent nodes in ZK associated with that table for schema updates and take appropriate actions out of the watcher. In effect, schema updates become another item in the ToDo list. {{/hbase/tables/table-name/schema}} Must be associated with a write locking scheme also handled with ZK primitives to avoid situations where one concurrent update clobbers another.
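The watch-a-schema-node scheme described above can be sketched in plain Java. This is an in-memory stand-in under stated assumptions: SchemaNode and its broadcast are illustrative, not the real ZooKeeper client, whose watches are one-shot and must be re-registered after each notification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal in-memory sketch of "region servers watch the table's schema
// node": writing a new schema notifies every registered watcher, which
// would apply the update without taking the table offline.
public class SchemaNode {
  private byte[] schema;
  private final List<Consumer<byte[]>> watchers = new ArrayList<>();

  // A region server registers interest in schema changes.
  public void watch(Consumer<byte[]> regionServerCallback) {
    watchers.add(regionServerCallback);
  }

  public byte[] getSchema() {
    return schema;
  }

  // Master writes a new schema; all watchers are told (ZK-style
  // notification, simplified here to a broadcast).
  public void setSchema(byte[] newSchema) {
    this.schema = newSchema;
    for (Consumer<byte[]> w : watchers) {
      w.accept(newSchema);
    }
  }
}
```

The write-locking concern in the comment maps onto serializing setSchema calls; in real ZK this would be an exclusive lock recipe over the same subtree.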
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086356#comment-13086356 ] Ted Yu commented on HBASE-4213: --- I haven't gone through the patch yet. bq. 5. Master will recursively delete the node /hbase/schema/table name, if the number of childrens of /hbase/schema/table name is greater than or equal to current number of active region servers. You meant recursively deleting the children of node /hbase/schema/table name, right? For a region server which joins the cluster after the creation of /hbase/schema/table name, it should be able to find out that it already reads the most recent HTD. Does it create a child node under /hbase/schema/table name?
[jira] [Issue Comment Edited] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086356#comment-13086356 ] Ted Yu edited comment on HBASE-4213 at 8/17/11 2:47 PM: I haven't gone through the patch yet. bq. 5. Master will recursively delete the node /hbase/schema/table name, if the number of childrens of /hbase/schema/table name is greater than or equal to current number of active region servers. You meant recursively deleting the children of node /hbase/schema/table name, right? For a region server which joins the cluster after the creation of /hbase/schema/table name, it should be able to find out that it already reads the most recent HTD. Does it create a child node under /hbase/schema/table name?
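The acknowledgement rule being discussed (delete the schema node once its child count reaches the number of active region servers) can be sketched without ZooKeeper. An in-memory sketch under assumed semantics: SchemaChangeTracker and its methods are illustrative names, not the patch's API; a Set stands in for the children of /hbase/schema/table name.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the coordination rule from the comment above: each region
// server acks a schema change by creating a child node; the master
// tears the node down once every active server has acked.
public class SchemaChangeTracker {
  private final Set<String> acks = new HashSet<>();

  // Region-server side: create a child under the schema node.
  public void ack(String regionServer) {
    acks.add(regionServer);
  }

  // Master side: all done when children >= active region servers.
  public boolean complete(int activeRegionServers) {
    return acks.size() >= activeRegionServers;
  }
}
```

Ted's question about late joiners corresponds to a server calling ack after complete() has already fired; the real design must decide whether that child is ignored or cleaned up.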
[jira] [Commented] (HBASE-4195) Possible inconsistency in a memstore read after a reseek, possible performance improvement
[ https://issues.apache.org/jira/browse/HBASE-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086358#comment-13086358 ] nkeywal commented on HBASE-4195: I can do a simple patch (removing all the code around numIterReseek). However, it would conflict with the patch for HBASE-4188/HBASE-1938. Is it possible for you to commit this one first? Note that I have been able to make this reseek implementation fail as well by adding a Thread.sleep between the search on the two iterators. In other words, there is a race condition somewhere. It could be a conflict with the flush process. I noticed that a flush cannot happen during a put (lock on hregion.update) or a seek (lock on store), but there is nothing to prevent a reseek from taking place during the snapshot. But I don't know how long it will take to find the real issue behind all this, so a partial fix lowering the probability of having an issue makes sense... Possible inconsistency in a memstore read after a reseek, possible performance improvement -- Key: HBASE-4195 URL: https://issues.apache.org/jira/browse/HBASE-4195 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.4 Environment: all Reporter: nkeywal Priority: Critical This follows the discussion around HBASE-3855, and the random errors (20% failure on trunk) on the unit test org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting I saw some points related to numIterReseek, used in MemStoreScanner#getNext (line 690): {noformat}679 protected KeyValue getNext(Iterator<KeyValue> it) { 680 KeyValue ret = null; 681 long readPoint = ReadWriteConsistencyControl.getThreadReadPoint(); 682 // DebugPrint.println("MS@" + hashCode() + ": threadpoint = " + readPoint); 683 684 while (ret == null && it.hasNext()) { 685 KeyValue v = it.next(); 686 if (v.getMemstoreTS() <= readPoint) { 687 // keep it. 
688 ret = v; 689 } 690 numIterReseek--; 691 if (numIterReseek == 0) { 692 break; 693 } 694 } 695 return ret; 696 }{noformat} This function is called by seek, reseek, and next. The numIterReseek is only useful for reseek. There are some issues; I am not totally sure it's the root cause of the test case error, but it could partly explain the randomness of the error, and one point is for sure a bug. 1) In getNext, numIterReseek is decreased, then compared to zero. The seek function sets numIterReseek to zero before calling getNext. It means that the value will actually be negative, hence the test will always fail, and the loop will continue. It is the expected behaviour, but it's quite smart. 2) In reseek, numIterReseek is not set between the loops on the two iterators. If numIterReseek equals zero after the loop on the first one, the loop on the second one will never call seek, as numIterReseek will be negative. 3) Still in reseek, the test to call seek is (kvsetNextRow == null && numIterReseek == 0). In other words, if kvsetNextRow is not null when numIterReseek equals zero, numIterReseek will start to be negative at the next iteration and seek will never be called. 4) You can have side effects if reseek ends with a numIterReseek > 0: the following calls to the next function will decrease numIterReseek to zero, and getNext will break instead of continuing the loop. As a result, later calls to next() may return null or not, depending on how the default value for numIterReseek is configured. To check if the issue comes from point 4, you can set numIterReseek to zero before returning in reseek: {noformat} numIterReseek = 0; return (kvsetNextRow != null || snapshotNextRow != null); }{noformat} On my env, on trunk, it seems to work, but as it's random I am not really sure. I also had to modify the test (I added a loop) to make it fail more often; the original test was working quite well here. 
It has to be confirmed that this totally fixes (the fix could be partial or unrelated) org.apache.hadoop.hbase.regionserver.TestHRegion.testWritesWhileGetting before implementing a complete solution.
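Point 1 above can be reproduced in isolation. A toy sketch (an assumed simplification; ReseekCounterDemo and visited are illustrative names, not HBase code) of the decrement-then-compare pattern: a counter that starts at zero goes negative before the == 0 check, so the early break never fires and the loop runs to the end of the iterator.

```java
import java.util.Iterator;
import java.util.List;

// Toy reproduction of the numIterReseek counter pattern described in
// the issue: decrement first, then compare to zero.
public class ReseekCounterDemo {
  // Returns how many elements are visited before the loop exits.
  static int visited(List<Integer> items, int numIterReseek) {
    int count = 0;
    Iterator<Integer> it = items.iterator();
    while (it.hasNext()) {
      it.next();
      count++;
      numIterReseek--;
      if (numIterReseek == 0) {
        break; // only reached when the counter started positive
      }
    }
    return count;
  }
}
```

With a starting value of 0 (as after seek) the break condition is skipped forever, which matches the reporter's observation that the loop always continues.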
[jira] [Updated] (HBASE-4206) jenkins hash implementation uses longs unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron Yang updated HBASE-4206: Status: Patch Available (was: Open) jenkins hash implementation uses longs unnecessarily Key: HBASE-4206 URL: https://issues.apache.org/jira/browse/HBASE-4206 Project: HBase Issue Type: Improvement Components: util Reporter: Ron Yang Priority: Minor I don't believe you need to use long for a, b, c, and as a result you no longer need to AND against INT_MASK. At a minimum the private static longs should be made final, and the main method should not print the absolute value of the hash but instead use something like Integer.toHexString
[jira] [Updated] (HBASE-4206) jenkins hash implementation uses longs unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ron Yang updated HBASE-4206: Attachment: ryang.patch
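The core observation can be demonstrated in a few lines. This is an illustrative sketch, not the actual JenkinsHash code: 32-bit int arithmetic already wraps modulo 2^32, so the explicit mask that the long-based version needs after every operation becomes unnecessary.

```java
// Demonstrates why longs (plus masking) are unnecessary for a 32-bit
// mixing function: int overflow already wraps mod 2^32.
public class IntVsLongHash {
  private static final long INT_MASK = 0x00000000ffffffffL;

  // Long version: must mask after every arithmetic op to stay 32-bit.
  static long mixLong(long a, long b) {
    return (a + b) & INT_MASK;
  }

  // Int version: the same result falls out of two's-complement wrap.
  static int mixInt(int a, int b) {
    return a + b;
  }
}
```

The widening comparison below shows the two agree even when the int addition overflows, which is the whole point of the proposed simplification.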
[jira] [Commented] (HBASE-4199) blockCache summary - backend
[ https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086405#comment-13086405 ] Doug Meil commented on HBASE-4199: -- Thanks Ted. I'll fix the Javadoc and the unit test table constants. I'm not sure I agree about the overloaded constructor. Doing this... {code} bcse = new BlockCacheSummaryEntry( s[ s.length - 4], s[s.length - 2]); {code} ... seems less clear to me. I think the 'setTable' with the comment reminder on why it's being done makes more sense. And doing this... {code} String table = ...; String cf = ...; bcse = new BlockCacheSummaryEntry(table, cf); {code} ... results in basically the same 3 lines of code that exist now.
[jira] [Assigned] (HBASE-4202) Check filesystem permissions on startup
[ https://issues.apache.org/jira/browse/HBASE-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan reassigned HBASE-4202: - Assignee: ramkrishna.s.vasudevan Check filesystem permissions on startup --- Key: HBASE-4202 URL: https://issues.apache.org/jira/browse/HBASE-4202 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.4 Environment: debian squeeze Reporter: Matthias Hofschen Assignee: ramkrishna.s.vasudevan Labels: noob We added a new node to a 44-node cluster, starting the datanode, mapred and regionserver processes on it. The Unix filesystem was configured incorrectly, i.e. /tmp was not writable by processes. All three processes had issues with this. Datanode and mapred shut down on exception. Regionserver did not stop; in fact it reported to the master that it's up without regions. So the master assigned regions to it. Regionserver would not accept them, resulting in a constant assign, reject, reassign cycle that put many regions into a state of not being available. There are no logs about this, but we could observe the region count fluctuate by hundreds of regions and the application throwing many NotServingRegion exceptions. In fact, to the master process the regionserver looked fine, so it was trying to send regions its way. Regionserver rejected them. So the master/balancer was going into an assign/reassign cycle, destabilizing the cluster. Many puts and gets simply failed with NotServingRegionExceptions and took a long time to complete.
Exception from regionserver: 2011-08-06 23:57:13,953 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: SyncConnected, type: NodeCreated, path: /hbase/master 2011-08-06 23:57:13,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 17.1.0.1:6 that we are up 2011-08-06 23:57:13,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 17.1.0.1:6 that we are up 2011-08-07 00:07:39.648::INFO: Logging to STDERR via org.mortbay.log.StdErrLog 2011-08-07 00:07:39.712::INFO: jetty-6.1.14 2011-08-07 00:07:39.742::WARN: tmpdir java.io.IOException: Permission denied at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.checkAndCreate(File.java:1704) at java.io.File.createTempFile(File.java:1792) at java.io.File.createTempFile(File.java:1828) at org.mortbay.jetty.webapp.WebAppContext.getTempDirectory(WebAppContext.java:745) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:458) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at org.mortbay.jetty.Server.doStart(Server.java:222) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.apache.hadoop.http.HttpServer.start(HttpServer.java:461) at org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1168) at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:792) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:430) at java.lang.Thread.run(Thread.java:619) Exception from datanode: 2011-08-06 23:37:20,444 INFO org.apache.hadoop.http.HttpServer: 
Jetty bound to port 50075 2011-08-06 23:37:20,444 INFO org.mortbay.log: jetty-6.1.14 2011-08-06 23:37:20,469 WARN org.mortbay.log: tmpdir java.io.IOException: Permission denied at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.checkAndCreate(File.java:1704) at java.io.File.createTempFile(File.java:1792) at java.io.File.createTempFile(File.java:1828) at org.mortbay.jetty.webapp.WebAppContext.getTempDirectory(WebAppContext.java:745) at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:458) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152) at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156) at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at
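The failure mode above suggests the fail-fast check the ticket title asks for: probe the temp directory at startup and abort instead of reporting "up" to the master. A hypothetical sketch (the TmpDirCheck class and tmpDirWritable helper are invented for illustration, not HBase code):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of a startup permission check for HBASE-4202: try to
// create a probe file in the JVM temp directory and fail fast if that
// raises "Permission denied", as in the report above.
public class TmpDirCheck {
    public static boolean tmpDirWritable(String dir) {
        try {
            File probe = File.createTempFile("hbase-probe", null, new File(dir));
            probe.delete();
            return true;
        } catch (IOException e) {
            return false;   // e.g. java.io.IOException: Permission denied
        }
    }

    public static void main(String[] args) {
        String tmp = System.getProperty("java.io.tmpdir");
        if (!tmpDirWritable(tmp)) {
            // a real regionserver would abort here rather than report "up"
            throw new IllegalStateException("temp dir not writable: " + tmp);
        }
    }
}
```

Aborting here would have kept the misconfigured node out of the assign/reject/reassign cycle entirely, the same way the datanode and mapred daemons took themselves out.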
[jira] [Commented] (HBASE-4199) blockCache summary - backend
[ https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086421#comment-13086421 ] Ted Yu commented on HBASE-4199: --- Please use curly braces for the code directive. The advantage of using two parameter ctor is that table and column family would be set at the same time, reducing the chance of inconsistency between them now that the two fields carry default values. This is my personal opinion. blockCache summary - backend Key: HBASE-4199 URL: https://issues.apache.org/jira/browse/HBASE-4199 Project: HBase Issue Type: Sub-task Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: java_HBASE_4199.patch, java_HBASE_4199_v2.patch This is the backend work for the blockCache summary. Change to BlockCache interface, Summarization in LruBlockCache, BlockCacheSummaryEntry, addition to HRegionInterface, and HRegionServer. This will NOT include any of the web UI or anything else like that. That is for another sub-task. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
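Ted's argument can be made concrete: if both fields are assigned through one constructor (and made final), there is no window in which only one of them holds its default value. A toy sketch with illustrative names, not the actual BlockCacheSummaryEntry API:

```java
// Toy sketch of the two-argument constructor Ted suggests (names are
// illustrative, not the real BlockCacheSummaryEntry): final fields set
// together in one constructor cannot be observed half-initialized.
public class SummaryEntry {
    private final String table;
    private final String columnFamily;

    public SummaryEntry(String table, String columnFamily) {
        this.table = table;           // both fields assigned at once,
        this.columnFamily = columnFamily; // so they cannot drift apart
    }

    public String getTable() { return table; }
    public String getColumnFamily() { return columnFamily; }
}
```

With a setter-based design, the compiler cannot rule out an entry whose table was set but whose column family still carries the default; with final fields it cannot even compile.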
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Status: Open (was: Patch Available) Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4175.patch, HBASE-4175_1.patch, HBASE-4175_2_with catch block.patch, HBASE-4175_2_without catch block.patch Currently createTableDescriptor() doesn't return anything. The caller wouldn't know whether the descriptor is created or not. See exception handling: {code} } catch(IOException ioe) { LOG.info(IOException while trying to create tableInfo in HDFS, ioe); } {code} We should return a boolean. If the table descriptor exists already, maybe we should deserialize from hdfs and compare with htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, existing table descriptor would be overwritten. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
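A sketch of the proposed semantics, using a plain java.io.File in place of the HDFS tableInfo path purely for illustration: return whether anything was written, and overwrite an existing descriptor only when force is true.

```java
import java.io.File;
import java.io.IOException;

// Illustrative sketch of the HBASE-4175 proposal, with a local File standing
// in for the HDFS tableInfo file: the boolean return tells the caller whether
// a descriptor was created, and force controls overwriting an existing one.
public class DescriptorWriter {
    public static boolean createDescriptor(File f, boolean force) {
        if (f.exists() && !force) {
            return false;             // caller now knows nothing was written
        }
        try {
            f.delete();               // drop any stale descriptor first
            return f.createNewFile(); // true once the new descriptor exists
        } catch (IOException ioe) {
            return false;             // instead of swallowing into LOG.info
        }
    }
}
```

The point of the boolean is exactly the one raised in the description: today the IOException is logged and the caller proceeds without knowing whether the tableInfo was created.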
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Attachment: HBASE-4175_3.patch Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4175.patch, HBASE-4175_1.patch, HBASE-4175_2_with catch block.patch, HBASE-4175_2_without catch block.patch, HBASE-4175_3.patch Currently createTableDescriptor() doesn't return anything. The caller wouldn't know whether the descriptor is created or not. See exception handling: {code} } catch(IOException ioe) { LOG.info(IOException while trying to create tableInfo in HDFS, ioe); } {code} We should return a boolean. If the table descriptor exists already, maybe we should deserialize from hdfs and compare with htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, existing table descriptor would be overwritten. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086424#comment-13086424 ] ramkrishna.s.vasudevan commented on HBASE-4175: --- @Ted Thanks for your review. Resubmitted the patch with the mentioned change. Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4175.patch, HBASE-4175_1.patch, HBASE-4175_2_with catch block.patch, HBASE-4175_2_without catch block.patch, HBASE-4175_3.patch Currently createTableDescriptor() doesn't return anything. The caller wouldn't know whether the descriptor is created or not. See exception handling: {code} } catch(IOException ioe) { LOG.info(IOException while trying to create tableInfo in HDFS, ioe); } {code} We should return a boolean. If the table descriptor exists already, maybe we should deserialize from hdfs and compare with htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, existing table descriptor would be overwritten. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-4175: -- Status: Patch Available (was: Open) Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4175.patch, HBASE-4175_1.patch, HBASE-4175_2_with catch block.patch, HBASE-4175_2_without catch block.patch, HBASE-4175_3.patch Currently createTableDescriptor() doesn't return anything. The caller wouldn't know whether the descriptor is created or not. See exception handling: {code} } catch(IOException ioe) { LOG.info(IOException while trying to create tableInfo in HDFS, ioe); } {code} We should return a boolean. If the table descriptor exists already, maybe we should deserialize from hdfs and compare with htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, existing table descriptor would be overwritten. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4175) Fix FSUtils.createTableDescriptor()
[ https://issues.apache.org/jira/browse/HBASE-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086425#comment-13086425 ] Ted Yu commented on HBASE-4175: --- +1 on HBASE-4175_3.patch Fix FSUtils.createTableDescriptor() --- Key: HBASE-4175 URL: https://issues.apache.org/jira/browse/HBASE-4175 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: Ted Yu Assignee: ramkrishna.s.vasudevan Attachments: HBASE-4175.patch, HBASE-4175_1.patch, HBASE-4175_2_with catch block.patch, HBASE-4175_2_without catch block.patch, HBASE-4175_3.patch Currently createTableDescriptor() doesn't return anything. The caller wouldn't know whether the descriptor is created or not. See exception handling: {code} } catch(IOException ioe) { LOG.info(IOException while trying to create tableInfo in HDFS, ioe); } {code} We should return a boolean. If the table descriptor exists already, maybe we should deserialize from hdfs and compare with htableDescriptor argument. If they differ, I am not sure what the proper action would be. Maybe we can add a boolean argument, force, to createTableDescriptor(). When force is true, existing table descriptor would be overwritten. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4199) blockCache summary - backend
[ https://issues.apache.org/jira/browse/HBASE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-4199: - Attachment: java_HBASE_4199_v3.patch Uploading v3 with all requested changes. blockCache summary - backend Key: HBASE-4199 URL: https://issues.apache.org/jira/browse/HBASE-4199 Project: HBase Issue Type: Sub-task Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: java_HBASE_4199.patch, java_HBASE_4199_v2.patch, java_HBASE_4199_v3.patch This is the backend work for the blockCache summary. Change to BlockCache interface, Summarization in LruBlockCache, BlockCacheSummaryEntry, addition to HRegionInterface, and HRegionServer. This will NOT include any of the web UI or anything else like that. That is for another sub-task. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4214) Per-region request counters should be clearer about scope
Per-region request counters should be clearer about scope - Key: HBASE-4214 URL: https://issues.apache.org/jira/browse/HBASE-4214 Project: HBase Issue Type: Bug Components: metrics, regionserver Affects Versions: 0.92.0 Reporter: Todd Lipcon Fix For: 0.92.0 In testing trunk, I noticed that per-region request counters shown on table.jsp are lifetime-scoped, rather than per-second or some other time range. However, I'm pretty sure they reset when the region is moved. So, it's hard to use them to judge relative hotness of regions from the web UI without hooking it up to something like OpenTSDB. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
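Since the counter is a lifetime total (and resets on region move), judging hotness from the web UI means differencing two snapshots yourself. The arithmetic is trivial but worth pinning down; this helper is hypothetical, not an HBase API:

```java
// Hypothetical helper: a lifetime request counter only becomes a hotness
// signal once you difference two samples taken some interval apart.
public class RequestRate {
    public static double perSecond(long countThen, long countNow, long millisElapsed) {
        return (countNow - countThen) * 1000.0 / millisElapsed;
    }
}
```

This is essentially what hooking the counters up to OpenTSDB does for you, and it also shows why a reset on region move corrupts the first sample interval after a move.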
[jira] [Updated] (HBASE-4213) Support instant schema updates without master's intervention (i.e. without enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-4213: -- Fix Version/s: 0.92.0 Whatever the resolution, this should have the same fix version as HBASE-1730 Support instant schema updates without master's intervention (i.e. without enable/disable and bulk assign/unassign) Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: HBASE-4213-Instant_schema_change.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4206) jenkins hash implementation uses longs unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086471#comment-13086471 ] Andrew Purtell commented on HBASE-4206: --- I'm curious whether there are before-and-after microbenchmarks? jenkins hash implementation uses longs unnecessarily Key: HBASE-4206 URL: https://issues.apache.org/jira/browse/HBASE-4206 Project: HBase Issue Type: Improvement Components: util Reporter: Ron Yang Priority: Minor Attachments: ryang.patch I don't believe you need to use long for a,b,c and as a result no longer need to mask against INT_MASK. At a minimum the private static longs should be made final, and the main method should not print the absolute value of the hash but instead use something like Integer.toHexString. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4216) IllegalArgumentException prefetching from META
IllegalArgumentException prefetching from META -- Key: HBASE-4216 URL: https://issues.apache.org/jira/browse/HBASE-4216 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Received one of these while doing a YCSB test on 26 nodes on trunk: java.io.IOException: java.lang.IllegalArgumentException: hostname can't be null -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4216) IllegalArgumentException prefetching from META
[ https://issues.apache.org/jira/browse/HBASE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086476#comment-13086476 ] Todd Lipcon commented on HBASE-4216: {noformat} 11/08/17 10:44:59 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: java.io.IOException: java.lang.IllegalArgumentException: hostname can't be null at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$2.processRow(HConnectionManager.java:822) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:212) at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130) at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127) at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:341) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:828) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:882) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:770) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:740) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:659) at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:70) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1238) at 
org.apache.hadoop.hbase.client.HTable.get(HTable.java:612) at com.yahoo.ycsb.db.HBaseClient.read(HBaseClient.java:160) at com.yahoo.ycsb.DBWrapper.read(DBWrapper.java:86) at com.yahoo.ycsb.workloads.CoreWorkload.doTransactionRead(CoreWorkload.java:444) at com.yahoo.ycsb.workloads.CoreWorkload.doTransaction(CoreWorkload.java:391) at com.yahoo.ycsb.ClientTask.call(ClientTask.java:47) at com.yahoo.ycsb.RateLimiter.call(RateLimiter.java:53) at com.yahoo.ycsb.RateLimiter.call(RateLimiter.java:13) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.lang.IllegalArgumentException: hostname can't be null at java.net.InetSocketAddress.<init>(InetSocketAddress.java:121) at org.apache.hadoop.hbase.HServerAddress.getResolvedAddress(HServerAddress.java:89) at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:75) at org.apache.hadoop.hbase.HRegionLocation.getServerAddress(HRegionLocation.java:101) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.cacheLocation(HConnectionManager.java:1123) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.access$000(HConnectionManager.java:439) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$2.processRow(HConnectionManager.java:818) ... 
27 more {noformat} Seems like a race since it went away IllegalArgumentException prefetching from META -- Key: HBASE-4216 URL: https://issues.apache.org/jira/browse/HBASE-4216 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Received one of these while doing a YCSB test on 26 nodes on trunk: java.io.IOException: java.lang.IllegalArgumentException: hostname can't be null -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4217) HRS.closeRegion should be able to close regions with only the encoded name
HRS.closeRegion should be able to close regions with only the encoded name -- Key: HBASE-4217 URL: https://issues.apache.org/jira/browse/HBASE-4217 Project: HBase Issue Type: Improvement Affects Versions: 0.90.4 Reporter: Jean-Daniel Cryans Fix For: 0.92.0 We had some sort of an outage this morning due to a few racks losing power, and some regions were left in the following state: ERROR: Region UNKNOWN_REGION on sv4r17s9:60020, key=e32bbe1f48c9b3633c557dc0291b90a3, not on HDFS or in META but deployed on sv4r17s9:60020 That region was deleted by the master but the region server never got the memo. Right now there's no way to force-close it because HRS.closeRegion requires an HRI and the only way to create one is to get it from .META., which in our case doesn't contain a row for that region. Basically we have to wait until that server is dead to get rid of the region and make hbck happy. The required change is to have closeRegion accept an encoded name in both HBA (when the RS address is provided) and HRS, since it's able to find it anyway from its list of live regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
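The change JD describes is essentially a map lookup: the regionserver already indexes its live regions, so an encoded name is enough to find and close one without consulting .META.. A toy sketch (the Map-of-Strings stand-in is illustrative, not the HRegionServer internals):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the HBASE-4217 idea: the regionserver holds its
// live regions keyed (here) by encoded name, so a full HRI reconstructed
// from .META. should not be required just to close one of them.
public class OnlineRegions {
    private final Map<String, String> onlineByEncodedName = new HashMap<String, String>();

    public void open(String encodedName, String regionName) {
        onlineByEncodedName.put(encodedName, regionName);
    }

    public boolean closeRegion(String encodedName) {
        // found in the live-region map: close it, no .META. row needed
        return onlineByEncodedName.remove(encodedName) != null;
    }
}
```

This is exactly the situation in the outage above: the region's .META. row is gone, but the encoded name (e32bbe1f48c9b3633c557dc0291b90a3) is known and the server still holds the region.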
[jira] [Commented] (HBASE-4176) Exposing HBase Filters to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086486#comment-13086486 ] stack commented on HBASE-4176: -- Anirudh. Keep going. It looks great. Run xmllint every so often to ensure you haven't damaged well-formedness. Run 'mvn -DskipTests site' to actually generate the doc. to look at it. Great stuff. Attach a patch only when done... and then I'll commit the whole shebang. Good stuff. Exposing HBase Filters to the Thrift API Key: HBASE-4176 URL: https://issues.apache.org/jira/browse/HBASE-4176 Project: HBase Issue Type: Improvement Components: thrift Reporter: Anirudh Todi Assignee: Anirudh Todi Priority: Minor Attachments: Filter Language (3).xml, Filter Language(2).docx, Filter Language(2).xml, Filter Language(3).docx, Filter Language.docx, HBASE-4176.patch, book.xml Currently, to use any of the filters, one has to explicitly add a scanner for the filter in the Thrift API making it messy and long. With this patch, I am trying to add support for all the filters in a clean way. The user specifies a filter via a string. The string is parsed on the server to construct the filter. More information can be found in the attached document named Filter Language This patch is trying to extend and further the progress made by the patches in the HBASE-1744 JIRA (https://issues.apache.org/jira/browse/HBASE-1744) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4176) Exposing HBase Filters to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086501#comment-13086501 ] Anirudh Todi commented on HBASE-4176: - Great! I can continue working on it. If it's okay with you - can we go ahead and commit the code? And add this when I finish the book in a separate commit? They seem fairly unrelated. Exposing HBase Filters to the Thrift API Key: HBASE-4176 URL: https://issues.apache.org/jira/browse/HBASE-4176 Project: HBase Issue Type: Improvement Components: thrift Reporter: Anirudh Todi Assignee: Anirudh Todi Priority: Minor Attachments: Filter Language (3).xml, Filter Language(2).docx, Filter Language(2).xml, Filter Language(3).docx, Filter Language.docx, HBASE-4176.patch, book.xml Currently, to use any of the filters, one has to explicitly add a scanner for the filter in the Thrift API making it messy and long. With this patch, I am trying to add support for all the filters in a clean way. The user specifies a filter via a string. The string is parsed on the server to construct the filter. More information can be found in the attached document named Filter Language This patch is trying to extend and further the progress made by the patches in the HBASE-1744 JIRA (https://issues.apache.org/jira/browse/HBASE-1744) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4209) The HBase hbase-daemon.sh SIGKILLs master when stopping it
[ https://issues.apache.org/jira/browse/HBASE-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086513#comment-13086513 ] Roman Shaposhnik commented on HBASE-4209: - stack, before I submit the patch, I would really appreciate it if you could let me know the relationship between bin/stop-hbase.sh and bin/hbase-daemon.sh. I was under the impression that whatever stop-hbase.sh triggers in the master code would also be triggered by the JVM shutdown hook upon receiving SIGTERM, but it doesn't seem to be that way. Do we have to call bin/stop-hbase.sh manually from within hbase-daemon.sh before stopping daemons? The HBase hbase-daemon.sh SIGKILLs master when stopping it -- Key: HBASE-4209 URL: https://issues.apache.org/jira/browse/HBASE-4209 Project: HBase Issue Type: Bug Components: master Reporter: Roman Shaposhnik There's a bit of code in hbase-daemon.sh that causes the HBase master to be SIGKILLed when stopping it, rather than trying SIGTERM first (as it does for other daemons). When HBase is executed in standalone mode (and the only daemon you need to run is the master), that causes newly created tables to go missing, as unflushed data is thrown out. If there wasn't a good reason to kill the master with SIGKILL, perhaps we can take that special case out and rely on SIGTERM. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
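Part of the answer to Roman's question is the JVM itself: shutdown hooks registered with Runtime.addShutdownHook run on normal exit and on SIGTERM, but a SIGKILLed process is torn down by the kernel and never executes them, so kill -9 skips whatever cleanup was registered. A minimal demonstration of the mechanism (names invented for illustration; FLUSH_HOOK stands in for the master's real cleanup):

```java
// Shutdown hooks fire on normal exit and on SIGTERM; a SIGKILLed JVM never
// runs them, which is how unflushed data can be lost when hbase-daemon.sh
// uses kill -9. FLUSH_HOOK is a stand-in for the master's real cleanup.
public class HookDemo {
    public static volatile boolean flushed = false;

    public static final Thread FLUSH_HOOK = new Thread() {
        @Override
        public void run() {
            flushed = true;   // stand-in for flushing/cleanup work
        }
    };

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(FLUSH_HOOK);
        // normal termination (or SIGTERM) reaches the hook; kill -9 would not
    }
}
```

This matches Roman's impression: whatever stop-hbase.sh triggers would indeed also run from a SIGTERM-driven shutdown hook, provided the script sends SIGTERM and not SIGKILL.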
[jira] [Commented] (HBASE-4206) jenkins hash implementation uses longs unnecessarily
[ https://issues.apache.org/jira/browse/HBASE-4206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086526#comment-13086526 ] Ron Yang commented on HBASE-4206: - Seems about 35% faster on my MBP core i7 osx 10.6: java version 1.6.0_26 Java(TM) SE Runtime Environment (build 1.6.0_26-b03-384-10M3425) Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02-384, mixed mode)
{noformat}
0% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=5} 29.96 ns; σ=0.45 ns @ 10 trials
6% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=5} 15.03 ns; σ=0.13 ns @ 3 trials
13% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=10} 32.73 ns; σ=0.06 ns @ 3 trials
19% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=10} 17.75 ns; σ=0.04 ns @ 3 trials
25% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=15} 55.01 ns; σ=0.20 ns @ 3 trials
31% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=15} 26.48 ns; σ=0.26 ns @ 3 trials
38% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=20} 59.97 ns; σ=0.17 ns @ 3 trials
44% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=20} 29.21 ns; σ=0.12 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=500} 1103.94 ns; σ=5.87 ns @ 3 trials
56% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=500} 710.87 ns; σ=0.73 ns @ 3 trials
63% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=1000} 2206.56 ns; σ=5.04 ns @ 3 trials
69% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=1000} 1400.48 ns; σ=5.44 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=1} 21632.90 ns; σ=38.49 ns @ 3 trials
81% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=1} 13975.43 ns; σ=65.42 ns @ 3 trials
88% Scenario{vm=java, trial=0, benchmark=JenkinsOld, len=10} 216426.33 ns; σ=1378.41 ns @ 3 trials
94% Scenario{vm=java, trial=0, benchmark=JenkinsNew, len=10} 139348.44 ns; σ=594.38 ns @ 3 trials

len   benchmark        ns  linear runtime
5     JenkinsOld     30.0  =
5     JenkinsNew     15.0  =
10    JenkinsOld     32.7  =
10    JenkinsNew     17.7  =
15    JenkinsOld     55.0  =
15    JenkinsNew     26.5  =
20    JenkinsOld     60.0  =
20    JenkinsNew     29.2  =
500   JenkinsOld   1103.9  =
500   JenkinsNew    710.9  =
1000  JenkinsOld   2206.6  =
1000  JenkinsNew   1400.5  =
1     JenkinsOld  21632.9  ==
1     JenkinsNew  13975.4  =
10    JenkinsOld 216426.3  ==
10    JenkinsNew 139348.4  ===
{noformat}
Caliper benchmark source:
{code}
public static class Benchmark6 extends SimpleBenchmark {
  @Param({5, 10, 15, 20, 500, 1000, 1, 10}) int len;
  byte[] bs;

  @Override
  protected void setUp() {
    Random r = new Random();
    bs = new byte[len];
    r.nextBytes(bs);
  }

  public boolean timeJenkinsOld(int reps) {
    int h = 0;
    for (int x = 0; x < reps; x++) {
      h += JenkinsHashOld.hash(bs, h);
    }
    return true;
  }

  public boolean timeJenkinsNew(int reps) {
    int h = 0;
    JenkinsHashNew jh = new JenkinsHashNew();
    for (int x = 0; x < reps; x++) {
      h += jh.hash(bs, 0, len, h);
    }
    return true;
  }
}
{code}
jenkins hash implementation uses longs unnecessarily Key: HBASE-4206 URL: https://issues.apache.org/jira/browse/HBASE-4206 Project: HBase Issue Type: Improvement Components: util Reporter: Ron Yang Priority: Minor Attachments: ryang.patch I don't believe you need to use long for a,b,c and as a result no longer need to mask against INT_MASK. At a minimum the private static longs should be made final, and the main method should not print the absolute value of the hash but instead use something like Integer.toHexString. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Reporter: Jacek Migdal A compression scheme for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms achieve. It is an additional step designed to be used in memory. It aims to save memory in cache as well as speed up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92% total compression ratio: 85% LZO on the same data: 85% LZO after delta encoding: 91%, while having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication. That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: -solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance -extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
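To illustrate the prefix-compression idea described in the issue, here is a minimal sketch of delta-encoding a sorted list of keys as (shared-prefix length, suffix) pairs. The `PrefixEncoder` class and its method names are purely illustrative, not the API proposed in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Toy prefix (delta) encoder: because adjacent sorted keys share long
// prefixes, each key is stored as the length of the prefix it shares with
// the previous key plus only the differing suffix.
public class PrefixEncoder {

    // Encode sorted keys as (shared-prefix length, remaining suffix) pairs.
    static List<Object[]> encode(List<String> sortedKeys) {
        List<Object[]> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Object[] { shared, key.substring(shared) });
            prev = key;
        }
        return out;
    }

    // Decode by prepending the recorded shared prefix of the previous key.
    static List<String> decode(List<Object[]> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Object[] entry : encoded) {
            String key = prev.substring(0, (Integer) entry[0]) + entry[1];
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

For keys like `row001:cf:a` followed by `row001:cf:b`, only a one-character suffix plus a small length survives per key, which is the kind of redundancy behind the ~92% key compression ratio reported above.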
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086545#comment-13086545 ] Ted Yu commented on HBASE-4218: --- bq. Moreover, it should allow far more efficient seeking which should improve performance a bit.

Can the performance improvement be quantified? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086556#comment-13086556 ] Jacek Migdal commented on HBASE-4218: - Yes, I plan to measure seek performance within one block. I haven't implemented it yet, but I expect that it will make seeking and decompressing KeyValues as fast as operating on uncompressed bytes. The primary goal is to save memory in buffers. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086574#comment-13086574 ] Matt Corgan commented on HBASE-4218: Sorry I haven't chimed in on this in a while, but I've made significant progress implementing some of the ideas I mentioned in the discussion you linked to: taking a sorted List<KeyValue>, converting it to a compressed byte[], and then providing fast mechanisms for reading the byte[] back into KeyValues. It should work for block indexes and data blocks. I don't think I'll be able to do the full integration into HBase, but I'm trying to get the code to a point where it's well designed, tested, and easy (possible) to start working into the code base. I'll try to get it on github in the next couple of weeks. I wish I could dedicate more time, but it's been a nights/weekends project.

Here's a quick storage format overview. Class names begin with Pt for Prefix Trie. A block of KeyValues gets converted to a byte[] composed of 5 sections:
1) PtBlockMeta stores some offsets into the block, the width of some byte-encoded integers, etc. http://pastebin.com/iizJz3f4
2) PtRowNodes are the bulk of the complexity. They store a trie structure for rebuilding the row keys in the block. Each leaf node has a list of offsets that point to the corresponding columns, timestamps, and data offsets/lengths in that row. The row data is structured for efficient sequential iteration and/or individual row lookups. http://pastebin.com/cb79N0Ge
3) PtColNodes store a trie structure that provides random access to column qualifiers. A PtRowNode points at one of these and traverses its parents backwards through the trie to rebuild the full column qualifier. Important for wide rows. http://pastebin.com/7rsq7epp
4) TimestampDeltas are byte-encoded deltas from the minimum timestamp in the block. The PtRowNodes contain pointers to these deltas. The width of all deltas is determined by the longest one. It supports having all timestamps equal to the minTimestamp, resulting in zero storage cost.
5) A data section made of all data values concatenated together. The PtRowNodes contain the offsets/lengths.

My first priority is getting the storage format right, then optimizing the read path, then the write path. I'd love to hear any comments, and will continue to work on getting the full code ready. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
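Point 4 of Matt's storage-format overview (byte-encoded deltas from the block's minimum timestamp, with the width fixed by the longest delta and zero cost when every timestamp equals the minimum) can be sketched as follows. This is an illustrative helper under those stated assumptions, not the actual TimestampDeltas code.

```java
import java.util.Arrays;

// Computes the fixed byte width needed to store every timestamp in a block
// as a delta from the block's minimum timestamp. If all timestamps equal
// the minimum, the width is zero and the deltas cost no storage at all.
public class TimestampDeltaWidth {

    static int deltaWidth(long[] timestamps) {
        long min = Arrays.stream(timestamps).min().getAsLong();
        long maxDelta = Arrays.stream(timestamps).map(t -> t - min).max().getAsLong();
        int width = 0;
        while (maxDelta != 0) {   // count bytes needed for the largest delta
            width++;
            maxDelta >>>= 8;
        }
        return width;
    }
}
```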
[jira] [Created] (HBASE-4219) Add Per-Column Family Metrics
Add Per-Column Family Metrics - Key: HBASE-4219 URL: https://issues.apache.org/jira/browse/HBASE-4219 Project: HBase Issue Type: New Feature Reporter: Nicolas Spiegelberg Assignee: David Goode Fix For: 0.92.0 Right now, we have region server level statistics. However, the read/write flow varies a lot based on the column family involved. We should add dynamic, per column family metrics to JMX so we can track each column family individually. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4176) Exposing HBase Filters to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anirudh Todi updated HBASE-4176: Attachment: book2.html book2.xml Attaching book2.xml and book2.html, containing my Filter Language document. Exposing HBase Filters to the Thrift API Key: HBASE-4176 URL: https://issues.apache.org/jira/browse/HBASE-4176 Project: HBase Issue Type: Improvement Components: thrift Reporter: Anirudh Todi Assignee: Anirudh Todi Priority: Minor Attachments: Filter Language (3).xml, Filter Language(2).docx, Filter Language(2).xml, Filter Language(3).docx, Filter Language.docx, HBASE-4176.patch, book.xml, book2.html, book2.xml Currently, to use any of the filters, one has to explicitly add a scanner for the filter in the Thrift API, making it messy and long. With this patch, I am trying to add support for all the filters in a clean way: the user specifies a filter via a string, and the string is parsed on the server to construct the filter. More information can be found in the attached document named Filter Language. This patch extends and furthers the progress made by the patches in the HBASE-1744 JIRA (https://issues.apache.org/jira/browse/HBASE-1744) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086626#comment-13086626 ] Ted Yu commented on HBASE-4213: --- In TableEventHandler.java, exception handling should be enhanced:
{code}
+ } catch (KeeperException e) {
+   LOG.warn("Instant schema change failed for table " + tableName);
+ }
{code}
Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: HBASE-4213-Instant_schema_change.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. Without enabling/disabling the table. 2. Without bulk unassign/assign of regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-2321) Support RPC interface changes at runtime
[ https://issues.apache.org/jira/browse/HBASE-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoit Sigoure updated HBASE-2321: -- Hadoop Flags: [Incompatible change, Reviewed] (was: [Reviewed]) This breaks RPC compatibility. Support RPC interface changes at runtime Key: HBASE-2321 URL: https://issues.apache.org/jira/browse/HBASE-2321 Project: HBase Issue Type: Improvement Components: coprocessors Reporter: Andrew Purtell Assignee: Gary Helmling Fix For: 0.92.0 Now we are able to append methods to interfaces without breaking RPC compatibility with earlier releases. However there is no way that I am aware of to dynamically add entire new RPC interfaces. Methods/parameters are fixed to the class used to instantiate the server at that time. Coprocessors need this. They will extend functionality on regions in arbitrary ways. How to support that on the client side? A couple of options: 1. New RPC from scratch. 2. Modify HBaseServer such that multiple interface objects can be used for reflection and objects can be added or removed at runtime. 3. Have the coprocessor host instantiate new HBaseServer instances on ephemeral ports and publish the endpoints to clients via Zookeeper. Couple this with a small modification to HBaseServer to support elastic thread pools to minimize the number of threads that might be kept around in the JVM. 4. Add a generic method to HRegionInterface, an ioctl-like construction, which accepts a ImmutableBytesWritable key and an array of Writable as parameters. My opinion is we should opt for #4 as it is the simplest and most expedient approach. I could also do #3 if consensus prefers. Really we should do #1 but it's not clear who has the time for that at the moment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086634#comment-13086634 ] Matt Corgan commented on HBASE-4218: That sounds great Jacek. Let me know how to get the interfaces, tests, and benchmarks when you're ready to share them. They would be really helpful. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4220) Lots of DNS queries from client
[ https://issues.apache.org/jira/browse/HBASE-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086643#comment-13086643 ] Todd Lipcon commented on HBASE-4220: I managed to snag the following stack trace:
{noformat}
"pool-1-thread-1" prio=10 tid=0x2aac380e4800 nid=0x7797 runnable [0x40d4d000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Throwable.fillInStackTrace(Native Method)
        - locked <0x2aabb74d06b8> (a java.lang.NumberFormatException)
        at java.lang.Throwable.<init>(Throwable.java:196)
        at java.lang.Exception.<init>(Exception.java:41)
        at java.lang.RuntimeException.<init>(RuntimeException.java:43)
        at java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:36)
        at java.lang.NumberFormatException.<init>(NumberFormatException.java:38)
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
        at java.lang.Integer.parseInt(Integer.java:449)
        at java.lang.Integer.parseInt(Integer.java:499)
        at sun.net.util.IPAddressUtil.textToNumericFormatV4(IPAddressUtil.java:94)
        at java.net.InetAddress.getAllByName(InetAddress.java:1051)
        at java.net.InetAddress.getAllByName(InetAddress.java:1020)
        at java.net.InetAddress.getByName(InetAddress.java:970)
        at java.net.InetSocketAddress.<init>(InetSocketAddress.java:124)
        at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:75)
        at org.apache.hadoop.hbase.HRegionLocation.getServerAddress(HRegionLocation.java:101)
        at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:71)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1238)
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:612)
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4220) Lots of DNS queries from client
Lots of DNS queries from client --- Key: HBASE-4220 URL: https://issues.apache.org/jira/browse/HBASE-4220 Project: HBase Issue Type: Bug Components: client Affects Versions: 0.92.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.92.0 In running a YCSB workload, I managed to DDOS a DNS server since it seems to be flooding lots of DNS requests. Installing nscd on the client machines improved throughput by a factor of 6 and stopped killing the server. These are long-running clients, so it's not clear why we do so many lookups. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086645#comment-13086645 ] Ted Yu commented on HBASE-4213: --- In HRegionServer.java, the following method would always return true if there is no intervening IOException:
{code}
+ public boolean refreshSchema(byte[] tableName) throws IOException {
{code}
I wonder if the return type should be void. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086651#comment-13086651 ] Ted Yu commented on HBASE-4213: --- In EventHandler.java, should isSchemaChangeEvent() include C_M_DELETE_FAMILY as well? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086650#comment-13086650 ] Jacek Migdal commented on HBASE-4218: - So far the implemented interface looks like:
{noformat}
/**
 * Fast compression of KeyValue. It aims to be fast and efficient
 * using assumptions:
 * - the KeyValues are stored sorted by key
 * - we know the structure of KeyValue
 * - the values are always iterated forward from the beginning of the block
 * - application specific knowledge
 *
 * It is designed to work fast enough to be feasible as in-memory compression.
 */
public interface DeltaEncoder {
  /**
   * Compress KeyValues and write them to output buffer.
   * @param writeHere Where to write compressed data.
   * @param rawKeyValues Source of KeyValue for compression.
   * @throws IOException If there is an error in writeHere.
   */
  public void compressKeyValue(OutputStream writeHere, ByteBuffer rawKeyValues)
      throws IOException;

  /**
   * Uncompress assuming that original size is known.
   * @param source Compressed stream of KeyValues.
   * @param decompressedSize Size in bytes of uncompressed KeyValues.
   * @return Uncompressed block of KeyValues.
   * @throws IOException If there is an error in source.
   * @throws DeltaEncoderToSmallBufferException If the specified uncompressed
   *         size is too small.
   */
  public ByteBuffer uncompressKeyValue(DataInputStream source, int decompressedSize)
      throws IOException, DeltaEncoderToSmallBufferException;
}
{noformat}
I also need some kind of interface for iterating and seeking. I haven't got it yet, but would like to have something like:
{noformat}
public Iterator<KeyValue> getIterator(ByteBuffer encodedKeyValues);
public Iterator<KeyValue> getIteratorStartingFrom(ByteBuffer encodedKeyValues,
    byte[] keyBuffer, int offset, int length);
{noformat}
For me that would work, but for you I might have to change it to something like:
{noformat}
public EncodingIterator getState(ByteBuffer encodedKeyValues);

class EncodingIterator implements Iterator<KeyValue> {
  ...
  public void seekToBeginning();
  public void seekTo(byte[] keyBuffer, int offset, int length);
}
{noformat}
I will figure out how we could share the code. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
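A minimal pass-through implementation of the DeltaEncoder interface posted above illustrates its round-trip contract (compress to a stream, then uncompress given the known original size). It performs no actual compression and omits the DeltaEncoderToSmallBufferException case, so it is a sketch of the contract only, not a real encoder.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

// Pass-through "encoder": copies the KeyValue bytes unchanged, but follows
// the compress/uncompress shape of the DeltaEncoder interface above.
public class CopyDeltaEncoder {

    public void compressKeyValue(OutputStream writeHere, ByteBuffer rawKeyValues)
            throws IOException {
        byte[] buf = new byte[rawKeyValues.remaining()];
        rawKeyValues.get(buf);          // drain the source buffer
        writeHere.write(buf);           // "compressed" output = raw copy
    }

    public ByteBuffer uncompressKeyValue(DataInputStream source, int decompressedSize)
            throws IOException {
        byte[] buf = new byte[decompressedSize];
        source.readFully(buf);          // original size is known up front
        return ByteBuffer.wrap(buf);
    }
}
```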
[jira] [Commented] (HBASE-4220) Lots of DNS queries from client
[ https://issues.apache.org/jira/browse/HBASE-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086654#comment-13086654 ] Todd Lipcon commented on HBASE-4220: I think the constructor of HRegionLocation should create and cache the HServerAddress. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
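Todd's suggested fix, resolving the address once in the constructor and reusing it, can be sketched like this. CachedLocation is a hypothetical stand-in for HRegionLocation, whose getServerAddress() appears (per the stack trace in this thread) to construct a fresh HServerAddress, and thus potentially trigger a DNS lookup, on every call.

```java
import java.net.InetSocketAddress;

// Resolve the server address once at construction time and hand out the
// cached object afterwards, instead of re-resolving on every call.
public class CachedLocation {

    private final InetSocketAddress address;

    public CachedLocation(String host, int port) {
        // Single resolution; subsequent getServerAddress() calls do no lookups.
        this.address = new InetSocketAddress(host, port);
    }

    public InetSocketAddress getServerAddress() {
        return address;
    }
}
```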
[jira] [Created] (HBASE-4221) Changes necessary to build and run against Hadoop 0.23
Changes necessary to build and run against Hadoop 0.23 -- Key: HBASE-4221 URL: https://issues.apache.org/jira/browse/HBASE-4221 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 A few modifications necessary to run against today's trunk: - copy-paste VersionedProtocol into the hbase IPC package - upgrade protobufs to 2.4.0a - fix one of the tests in TestHFileOutputFormat for new TaskAttemptContext API - remove illegal accesses to private members of FSNamesystem in tests (use reflection) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3917) Separate the Avro schema definition file from the code
[ https://issues.apache.org/jira/browse/HBASE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-3917: --- Attachment: 0001-HBASE-3917.-Separate-the-Avro-schema-definition-file.patch Separate the Avro schema definition file from the code -- Key: HBASE-3917 URL: https://issues.apache.org/jira/browse/HBASE-3917 Project: HBase Issue Type: Improvement Components: avro Affects Versions: 0.90.3 Reporter: Lars George Priority: Trivial Labels: noob Fix For: 0.90.5 Attachments: 0001-HBASE-3917.-Separate-the-Avro-schema-definition-file.patch The Avro schema files are in the src/main/java path, but should be in /src/main/resources just like the Hbase.thrift is. Makes the separation the same and cleaner. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HBASE-3917) Separate the Avro schema definition file from the code
[ https://issues.apache.org/jira/browse/HBASE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman reassigned HBASE-3917: -- Assignee: Alex Newman -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3917) Separate the Avro schema definition file from the code
[ https://issues.apache.org/jira/browse/HBASE-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated HBASE-3917: --- Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-2750) Add sanity check for system configs in hbase-daemon wrapper
[ https://issues.apache.org/jira/browse/HBASE-2750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086687#comment-13086687 ] Alex Newman commented on HBASE-2750: I assume it should only prevent the regionserver/master daemons from starting? Add sanity check for system configs in hbase-daemon wrapper --- Key: HBASE-2750 URL: https://issues.apache.org/jira/browse/HBASE-2750 Project: HBase Issue Type: New Feature Components: scripts Affects Versions: 0.90.0 Reporter: Todd Lipcon Priority: Minor Labels: noob We should add a config variable like MIN_ULIMIT_TO_START in hbase-env.sh. If the daemon script finds ulimit below this value, it will print a warning and refuse to start. We can make the default 0 so that this doesn't affect non-production clusters, but in the tuning guide recommend that people change it to the expected ulimit. (I've seen it happen all the time: people configure ulimit on some nodes, add a new node to the cluster, forget to re-tune it on the new one, and then that new node borks the whole cluster when it joins.) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4221) Changes necessary to build and run against Hadoop 0.23
[ https://issues.apache.org/jira/browse/HBASE-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086703#comment-13086703 ] Andrew Purtell commented on HBASE-4221: --- bq. This patch doesn't include the protobuf update - talking with Andrew and Gary about updating our protobufs to 2.4.0a to match Hadoop's. We can just up the protobuf dep for REST to the latest and regenerate. This use is quite self-contained. Changes necessary to build and run against Hadoop 0.23 -- Key: HBASE-4221 URL: https://issues.apache.org/jira/browse/HBASE-4221 Project: HBase Issue Type: Improvement Affects Versions: 0.92.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 0.92.0 Attachments: hbase-4221.txt A few modifications necessary to run against today's trunk: - copy-paste VersionedProtocol into the hbase IPC package - upgrade protobufs to 2.4.0a - fix one of the tests in TestHFileOutputFormat for new TaskAttemptContext API - remove illegal accesses to private members of FSNamesystem in tests (use reflection) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086704#comment-13086704 ] Matt Corgan commented on HBASE-4218: I should be able to work with ByteBuffer as the backing block data. Like you said above, we'll have to work on smarter iterators and comparators that can do most things without instantiating a full KeyValue in its current form. Sounds like it will be a longer-term project to make KeyValue into a more flexible interface, so in the meantime there will be places it has to cut a full KeyValue by copying bytes. Delta Encoding of KeyValues (aka prefix compression) - Key: HBASE-4218 URL: https://issues.apache.org/jira/browse/HBASE-4218 Project: HBase Issue Type: Improvement Components: io Reporter: Jacek Migdal Labels: compression A compression for keys. Keys are sorted in HFile and they are usually very similar. Because of that, it is possible to design better compression than general-purpose algorithms. It is an additional step designed to be used in memory. It aims to save memory in cache as well as speeding up seeks within HFileBlocks. It should improve performance a lot if key lengths are larger than value lengths. For example, it makes a lot of sense to use it when the value is a counter. Initial tests on real data (key length = ~90 bytes, value length = 8 bytes) show that I could achieve a decent level of compression: key compression ratio: 92% total compression ratio: 85% LZO on the same data: 85% LZO after delta encoding: 91% While having much better performance (20-80% faster decompression than LZO). Moreover, it should allow far more efficient seeking, which should improve performance a bit. It seems that simple compression algorithms are good enough. Most of the savings are due to prefix compression, int128 encoding, timestamp diffs and bitfields to avoid duplication.
That way, comparisons of compressed data can be much faster than a byte comparator (thanks to prefix compression and bitfields). In order to implement it in HBase, two important changes in design will be needed: -solidify the interface to HFileBlock / HFileReader Scanner to provide seeking and iterating; access to the uncompressed buffer in HFileBlock will have bad performance -extend comparators to support comparison assuming that the first N bytes are equal (or some fields are equal) Link to a discussion about something similar: http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
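The prefix-compression core of the proposal can be illustrated with a small sketch. This is not the HBase delta-encoding code, just the idea under simplified assumptions (String keys instead of byte arrays; no int128, timestamp-diff, or bitfield tricks): since the keys in a block are sorted, each key is stored as the length of the prefix it shares with the previous key plus only the remaining suffix.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sketch of prefix (delta) encoding over sorted keys; names are illustrative. */
public class PrefixDeltaSketch {

    /** One encoded entry: how many leading chars to reuse from the previous key, plus the new tail. */
    static final class Entry {
        final int shared;
        final String suffix;
        Entry(int shared, String suffix) { this.shared = shared; this.suffix = suffix; }
    }

    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            // length of the common prefix with the previous key
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new Entry(shared, key.substring(shared)));
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            // rebuild each key from the previous key's prefix plus the stored suffix
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

With similar keys such as `row0001/cf:a`, `row0001/cf:b`, most of each entry collapses to a shared-length integer and a one-character suffix, which is where the quoted ~92% key compression comes from on long, similar keys.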
[jira] [Commented] (HBASE-4176) Exposing HBase Filters to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086707#comment-13086707 ] stack commented on HBASE-4176: -- Can you add the patch as a diff against src/docbk/book.xml? Seems like there are a bunch of changes outside of the scope of your filter addition (and your book.html is missing stuff added recently). Thanks Anirudh. Exposing HBase Filters to the Thrift API Key: HBASE-4176 URL: https://issues.apache.org/jira/browse/HBASE-4176 Project: HBase Issue Type: Improvement Components: thrift Reporter: Anirudh Todi Assignee: Anirudh Todi Priority: Minor Attachments: Filter Language (3).xml, Filter Language(2).docx, Filter Language(2).xml, Filter Language(3).docx, Filter Language.docx, HBASE-4176.patch, book.xml, book2.html, book2.xml Currently, to use any of the filters, one has to explicitly add a scanner for the filter in the Thrift API, making it messy and long. With this patch, I am trying to add support for all the filters in a clean way. The user specifies a filter via a string. The string is parsed on the server to construct the filter. More information can be found in the attached document named Filter Language. This patch is trying to extend and further the progress made by the patches in the HBASE-1744 JIRA (https://issues.apache.org/jira/browse/HBASE-1744) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
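As a rough illustration of the filter-string idea in the description (the real grammar lives in the patch's ParseFilter and is much richer), the server splits a spec such as PrefixFilter('row1') into a filter name and a quoted argument. The class and method names below are hypothetical, not the patch's API:

```java
/** Toy parser for a single Name('arg') filter spec; illustrative only. */
public class FilterStringSketch {

    /** Returns { filterName, argument } or throws if the spec is malformed. */
    static String[] parse(String spec) {
        int open = spec.indexOf('(');
        int close = spec.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("expected Name('arg'): " + spec);
        }
        String name = spec.substring(0, open).trim();
        String arg = spec.substring(open + 1, close).trim();
        // strip the surrounding single quotes, if present
        if (arg.length() >= 2 && arg.charAt(0) == '\'' && arg.endsWith("'")) {
            arg = arg.substring(1, arg.length() - 1);
        }
        return new String[] { name, arg };
    }
}
```

A full implementation also has to handle operators combining filters (AND/OR), comparison operators, and escaping inside quoted strings, which is what the attached Filter Language document specifies.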
[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)
[ https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086708#comment-13086708 ] Jonathan Gray commented on HBASE-4218: -- bq. in the mean time there will be places it has to cut a full KeyValue by copying bytes Agreed. There's some other work going on around slab allocators and object reuse that could be paired with this to ameliorate some of that overhead. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-4222) Make HLog more resilient to write pipeline failures
Make HLog more resilient to write pipeline failures --- Key: HBASE-4222 URL: https://issues.apache.org/jira/browse/HBASE-4222 Project: HBase Issue Type: Improvement Components: wal Reporter: Gary Helmling Fix For: 0.92.0 The current implementation of HLog rolling to recover from transient errors in the write pipeline seems to have two problems: # When {{HLog.LogSyncer}} triggers an {{IOException}} during time-based sync operations, it triggers a log rolling request in the corresponding catch block, but only after escaping from the internal while loop. As a result, the {{LogSyncer}} thread will exit and never be restarted from what I can tell, even if the log rolling was successful. # Log rolling requests triggered by an {{IOException}} in {{sync()}} or {{append()}} never happen if no entries have yet been written to the log. This means that write errors are not immediately recovered, which extends the exposure to more errors occurring in the pipeline. In addition, it seems like we should be able to better handle transient problems, like a rolling restart of DataNodes while the HBase RegionServers are running. Currently this will reliably cause RegionServer aborts during log rolling: either an append or time-based sync triggers an initial {{IOException}}, initiating a log rolling request. However the log rolling then fails in closing the current writer (All datanodes are bad), causing a RegionServer abort. In this case, it seems like we should at least allow you an option to continue with the new writer and only abort on subsequent errors. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
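The first problem above comes down to where the catch block sits relative to the sync loop. A minimal sketch of the two shapes, with hypothetical names standing in for the real {{HLog.LogSyncer}} code and a counter standing in for the thread:

```java
import java.io.IOException;

/** Sketch of catch-outside-loop vs catch-inside-loop; names are illustrative. */
public class LogSyncerSketch {

    /** Simulated sync() that fails exactly once, at iteration failAt. */
    private static void sync(int i, int failAt) throws IOException {
        if (i == failAt) throw new IOException("HDFS pipeline error");
    }

    /** Broken shape: the catch wraps the whole loop, so one error ends the syncer for good. */
    static int brokenLoop(int iterations, int failAt) {
        int completed = 0;
        try {
            for (int i = 0; i < iterations; i++) {
                sync(i, failAt);
                completed++;
            }
        } catch (IOException e) {
            // a log roll is requested here, but the loop has already been escaped:
            // even if the roll succeeds, no further time-based syncs ever run
        }
        return completed;
    }

    /** Resilient shape: catch inside the loop, request the roll, and keep syncing. */
    static int resilientLoop(int iterations, int failAt) {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                sync(i, failAt);
                completed++;
            } catch (IOException e) {
                // stand-in for requestLogRoll(); the loop survives the error
            }
        }
        return completed;
    }
}
```

In the broken shape only the iterations before the error complete; in the resilient shape every iteration except the failing one does, which is the behavior the issue argues the syncer thread should have.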
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086728#comment-13086728 ] gaojinchao commented on HBASE-3845: --- Hi, has the patch been applied to the branch yet? data loss because lastSeqWritten can miss memstore edits Key: HBASE-3845 URL: https://issues.apache.org/jira/browse/HBASE-3845 Project: HBase Issue Type: Bug Affects Versions: 0.90.3 Reporter: Prakash Khemani Assignee: ramkrishna.s.vasudevan Priority: Critical Fix For: 0.90.5 Attachments: 0001-HBASE-3845-data-loss-because-lastSeqWritten-can-miss.patch, HBASE-3845-fix-TestResettingCounters-test.txt, HBASE-3845_1.patch, HBASE-3845_2.patch, HBASE-3845_4.patch, HBASE-3845_5.patch, HBASE-3845_6.patch, HBASE-3845__trunk.patch, HBASE-3845_trunk_2.patch, HBASE-3845_trunk_3.patch (I don't have a test case to prove this yet but I have run it by Dhruba and Kannan internally and wanted to put this up for some feedback.) In this discussion let us assume that the region has only one column family. That way I can use region/memstore interchangeably. After a memstore flush it is possible for lastSeqWritten to have a log-sequence-id for a region that is not the earliest log-sequence-id for that region's memstore. HLog.append() does a putIfAbsent into lastSeqWritten. This is to ensure that we only keep track of the earliest log-sequence-number that is present in the memstore. Every time the memstore is flushed we remove the region's entry in lastSeqWritten and wait for the next append to populate this entry again. This is where the problem happens. step 1: flusher.prepare() snapshots the memstore under HRegion.updatesLock.writeLock(). step 2: as soon as the updatesLock.writeLock() is released, new entries will be added into the memstore. step 3: wal.completeCacheFlush() is called. This method removes the region's entry from lastSeqWritten. step 4: the next append will create a new entry for the region in lastSeqWritten().
But this will be the log seq id of the current append. All the edits that were added in step 2 are missing. == as a temporary measure, instead of removing the region's entry in step 3 I will replace it with the log-seq-id of the region-flush-event. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
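The proposed temporary measure can be sketched with a toy map. Method names are hypothetical stand-ins for the real HLog internals: completeCacheFlush() replaces the region's entry with the flush's own sequence id instead of removing it, so the edits added in step 2 stay covered.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of the lastSeqWritten bookkeeping and the proposed fix; illustrative only. */
public class LastSeqWrittenSketch {
    private final ConcurrentHashMap<String, Long> lastSeqWritten = new ConcurrentHashMap<>();

    void append(String region, long seqId) {
        // track only the earliest outstanding seq id per region
        lastSeqWritten.putIfAbsent(region, seqId);
    }

    /** Buggy variant (step 3 today): remove, then wait for the next append to repopulate. */
    void completeCacheFlushBuggy(String region) {
        lastSeqWritten.remove(region);
    }

    /** Proposed variant: pin the flush's seq id instead of removing the entry. */
    void completeCacheFlush(String region, long flushSeqId) {
        lastSeqWritten.put(region, flushSeqId);
    }

    Long oldestSeq(String region) {
        return lastSeqWritten.get(region);
    }
}
```

With the buggy variant, an append racing between steps 1 and 3 is forgotten once the entry is removed, and the next append records too high a starting point; pinning the flush seq id keeps the lower bound honest.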
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086745#comment-13086745 ] Subbu M Iyer commented on HBASE-4213: - Ted, Yes. DELETE_FAMILY also needs to be added. Will take care of that. You meant recursively deleting the children of node /hbase/schema/table name, right ? Yes. For region server which joins the cluster after the creation of /hbase/schema/table name, it should be able to find out that it already reads the most recent HTD. Does it create a child node under /hbase/schema/table name ? The newly joined RS will not create a child under /hbase/schema/table name as it will not be processing the schema change event. Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign) Key: HBASE-4213 URL: https://issues.apache.org/jira/browse/HBASE-4213 Project: HBase Issue Type: Improvement Reporter: Subbu M Iyer Assignee: Subbu M Iyer Fix For: 0.92.0 Attachments: HBASE-4213-Instant_schema_change.patch This Jira is a slight variation in approach to what is being done as part of https://issues.apache.org/jira/browse/HBASE-1730 Support instant schema updates such as Modify Table, Add Column, Modify Column operations: 1. With out enable/disabling the table. 2. With out bulk unassign/assign of regions. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4213) Support instant schema updates with out master's intervention (i.e with out enable/disable and bulk assign/unassign)
[ https://issues.apache.org/jira/browse/HBASE-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086751#comment-13086751 ] Subbu M Iyer commented on HBASE-4213: - Regarding: public boolean refreshSchema(byte[] tableName) throws IOException { I agree that it could be a void instead of a boolean. Will change that as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-4223) Support the ability to return a set of rows using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nichole Treadway updated HBASE-4223: Affects Version/s: 0.92.0 Status: Patch Available (was: Open) Support the ability to return a set of rows using Coprocessors -- Key: HBASE-4223 URL: https://issues.apache.org/jira/browse/HBASE-4223 Project: HBase Issue Type: Improvement Components: coprocessors Affects Versions: 0.92.0 Reporter: Nichole Treadway Priority: Minor Currently HBase supports returning the results of aggregation operations using coprocessors with the AggregationClient. It would be useful to include a client and implementation which would return a set of rows which match a certain criteria using coprocessors as well. We have a use case in our business process for this. We have an initial implementation of this, which I've attached. The only limitation that we've found is that it cannot be used to return very large sets of rows. If the result set is very large, it would probably require some sort of pagination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
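A rough server-side sketch of the idea, using plain collections instead of the coprocessor API (names are illustrative, not the attached patch): collect the row keys whose value matches a predicate, with a hard cap standing in for the pagination the description says very large result sets would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy region-side row matcher; illustrative only, not the HBase coprocessor API. */
public class RowMatchSketch {

    /** Scan rows in order, returning keys whose value matches, up to limit. */
    static List<String> matchingRows(Map<String, String> rows,
                                     Predicate<String> valueMatches, int limit) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : rows.entrySet()) {
            if (valueMatches.test(e.getValue())) {
                out.add(e.getKey());
                if (out.size() >= limit) break;  // pagination boundary
            }
        }
        return out;
    }
}
```

The appeal over client-side scanning is that the matching runs next to the data and only the qualifying row keys cross the wire; the cap is what keeps a region from shipping an unbounded result set back in one RPC.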
[jira] [Updated] (HBASE-4223) Support the ability to return a set of rows using Coprocessors
[ https://issues.apache.org/jira/browse/HBASE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nichole Treadway updated HBASE-4223: Attachment: HBASE-4223.patch -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086799#comment-13086799 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- Review request for Ian Varley. Summary --- A min versions coupled with TTL. Note that I unified the GC logic inside the ColumnTrackers. Previously they did the versioning and TTL was handled outside. What is still open is what to do with Store.getKeyAtOrBefore(...) This addresses bug HBASE-4071. https://issues.apache.org/jira/browse/HBASE-4071 Diffs - http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java PRE-CREATION http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
1158860 http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 1158860 Diff: https://reviews.apache.org/r/1582/diff Testing --- See TestMinVersions. Thanks, Lars Data GC: Remove all versions > TTL EXCEPT the last written version -- Key: HBASE-4071 URL: https://issues.apache.org/jira/browse/HBASE-4071 Project: HBase Issue Type: New Feature Reporter: stack Attachments: MinVersions.diff We were chatting today about our backup cluster. What we want is to be able to restore the dataset from any point in time, but only within a limited timeframe -- say one week. Thereafter, if the versions are older than one week, rather than as we do with TTL where we let go of all versions older than TTL, instead let go of all versions EXCEPT the last one written. So, it's like versions==1 for anything older than the one-week TTL. We want to allow that if an error is caught within a week of its happening -- a user mistakenly removes a critical table -- then we'll be able to restore up to the moment just before catastrophe hit; otherwise, we keep one version only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
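The retention rule under review can be summarized in a few lines. This is an illustrative sketch, not the patch's ColumnTracker logic: expire versions older than the TTL, but always keep at least the newest minVersions of them, so the last written value survives past its TTL.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of TTL expiry with a min-versions floor; illustrative only. */
public class MinVersionsSketch {

    /**
     * Given version timestamps sorted newest-first, return the timestamps to keep:
     * anything within the TTL, plus the newest minVersions even if expired.
     */
    static List<Long> retain(List<Long> newestFirstTs, long nowMs, long ttlMs, int minVersions) {
        List<Long> keep = new ArrayList<>();
        for (int i = 0; i < newestFirstTs.size(); i++) {
            long ts = newestFirstTs.get(i);
            boolean withinTtl = nowMs - ts <= ttlMs;
            if (withinTtl || i < minVersions) {
                keep.add(ts);
            }
        }
        return keep;
    }
}
```

With minVersions=1 this is exactly the backup-cluster behavior described above: within the TTL window every version is restorable, and beyond it a single newest version remains instead of nothing.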
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086804#comment-13086804 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 05:01:59.054304) Review request for Ian Varley. Diff: https://reviews.apache.org/r/1582/diff Testing --- See TestMinVersions. Thanks, Lars -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3845) data loss because lastSeqWritten can miss memstore edits
[ https://issues.apache.org/jira/browse/HBASE-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086805#comment-13086805 ] ramkrishna.s.vasudevan commented on HBASE-3845: --- Yes Gao. The fix has not gone into the 0.90.x branch; it is available in trunk only. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
[ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086808#comment-13086808 ] jirapos...@reviews.apache.org commented on HBASE-4071: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1582/ --- (Updated 2011-08-18 05:07:35.742598) Review request for Ian Varley. Diff: https://reviews.apache.org/r/1582/diff Testing --- See TestMinVersions. Thanks, Lars -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-4176) Exposing HBase Filters to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086821#comment-13086821 ] jirapos...@reviews.apache.org commented on HBASE-4176: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1326/ --- (Updated 2011-08-18 05:35:32.078889) Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jonathan Gray. Changes --- Included the Filter Language document in the docbook. Please do let me know if I should add more documentation or any formatting issues I can try and address. P.S. Is trunk broken right now? I get an error saying - BaseMasterObserver is not abstract and does not override abstract method stop(org.apache.hadoop.hbase.coprocessor.CoprocessorEnvironment) in org.apache.hadoop.hbase.coprocessor.Coprocessor This is completely unrelated to my diff - and I checked that the only difference b/w this diff and the previous one that I had submitted is the change to book.xml. Thus I am going ahead and uploading the patch. Summary --- https://issues.apache.org/jira/browse/HBASE-4176: Exposing HBase Filters to the Thrift API Currently, to use any of the filters, one has to explicitly add a scanner for the filter in the Thrift API making it messy and long. With this patch, I am trying to add support for all the filters in a clean way. The user specifies a filter via a string. The string is parsed on the server to construct the filter. More information can be found in the attached document named Filter Language This patch is trying to extend and further the progress made by the patches in HBASE-1744 There is document attached to the HBASE-4176 JIRA that describes this patch in further detail This addresses bug HBASE-4176. 
    https://issues.apache.org/jira/browse/HBASE-4176


Diffs (updated)
---

  /src/docbkx/book.xml 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ColumnCountGetFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ColumnPaginationFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ColumnPrefixFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ColumnRangeFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/CompareFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/DependentColumnFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/FamilyFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/InclusiveStopFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/KeyOnlyFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/MultipleColumnPrefixFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/PageFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ParseConstants.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ParseFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/PrefixFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/QualifierFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/RowFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueExcludeFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/TimestampsFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/filter/ValueFilter.java 1158977
  /src/main/java/org/apache/hadoop/hbase/thrift/ThriftServer.java 1158977
  /src/main/java/org/apache/hadoop/hbase/thrift/generated/Hbase.java 1158977
  /src/main/java/org/apache/hadoop/hbase/thrift/generated/TScan.java 1158977
  /src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift 1158977
  /src/main/ruby/hbase/table.rb 1158977
  /src/main/ruby/shell/commands/scan.rb 1158977
  /src/test/java/org/apache/hadoop/hbase/filter/TestParseFilter.java 1158977

Diff: https://reviews.apache.org/r/1326/diff


Testing
---

The patch includes one test: TestParseFilter.java

Thanks,

Anirudh


Exposing HBase Filters to the Thrift API
----------------------------------------

                 Key: HBASE-4176
                 URL: https://issues.apache.org/jira/browse/HBASE-4176
             Project: HBase
          Issue Type: Improvement
          Components: thrift
            Reporter: Anirudh Todi
            Assignee: Anirudh Todi
            Priority: Minor
         Attachments: Filter Language (3).xml, Filter Language(2).docx, Filter Language(2).xml, Filter Language(3).docx, Filter Language.docx,
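The approach described above, where a client-supplied string is parsed on the server into a filter, can be illustrated with a toy parser. This is only a sketch of the idea under an assumed `FilterName('argument')` call shape; it is not the ParseFilter grammar or implementation from the patch, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: extract a filter name and its quoted argument from a
// filter-string expression such as "PrefixFilter('row')", the general shape
// a server-side parser would turn into a concrete Filter instance.
class FilterStringSketch {
    private static final Pattern CALL =
        Pattern.compile("([A-Za-z]+)\\s*\\(\\s*'([^']*)'\\s*\\)");

    // returns [filterName, argument] for the first filter call found
    static List<String> parse(String expr) {
        Matcher m = CALL.matcher(expr);
        if (!m.find()) {
            throw new IllegalArgumentException("unparseable filter: " + expr);
        }
        List<String> out = new ArrayList<>();
        out.add(m.group(1));
        out.add(m.group(2));
        return out;
    }
}
```

A real implementation would additionally handle operators such as AND/OR, nesting, and multiple arguments, then map the parsed name onto the corresponding filter class.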
[jira] [Commented] (HBASE-4202) Check filesystem permissions on startup
[ https://issues.apache.org/jira/browse/HBASE-4202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086827#comment-13086827 ]

ramkrishna.s.vasudevan commented on HBASE-4202:
-----------------------------------------------

@Stack, I think 0.90.x behaves correctly. I checked the behaviour. Below are the logs:

{noformat}
2011-08-18 10:57:49,345 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-08-18 10:57:49,475 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Failed initialization
2011-08-18 10:57:49,479 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
java.lang.IllegalArgumentException: Bad temp directory: /tmp/hadoop-test666/Jetty/regionserver
	at org.mortbay.jetty.webapp.WebAppContext.setTempDirectory(WebAppContext.java:1201)
	at org.apache.hadoop.http.HttpServer.init(HttpServer.java:128)
	at org.apache.hadoop.hbase.util.InfoServer.init(InfoServer.java:54)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1262)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:880)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1481)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571)
	at java.lang.Thread.run(Thread.java:619)
2011-08-18 10:57:49,488 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=linux-kxjl,60020,1313645267001, load=(requests=0, regions=0, usedHeap=22, maxHeap=995): Unhandled exception: Region server startup failed
java.io.IOException: Region server startup failed
	at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:987)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:889)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1481)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:571)
	at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IllegalArgumentException: Bad temp directory: /tmp/hadoop-test666/Jetty/regionserver
	at org.mortbay.jetty.webapp.WebAppContext.setTempDirectory(WebAppContext.java:1201)
	at org.apache.hadoop.http.HttpServer.init(HttpServer.java:128)
	at org.apache.hadoop.hbase.util.InfoServer.init(InfoServer.java:54)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1262)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:880)
	... 3 more
---
2011-08-18 10:57:49,770 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-14,5,main]
2011-08-18 10:57:49,771 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2011-08-18 10:57:49,771 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2011-08-18 10:57:49,874 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
{noformat}

This defect may not be valid in the 0.90.x version.


Check filesystem permissions on startup
---------------------------------------

                 Key: HBASE-4202
                 URL: https://issues.apache.org/jira/browse/HBASE-4202
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.20.4
         Environment: debian squeeze
            Reporter: Matthias Hofschen
            Assignee: ramkrishna.s.vasudevan
              Labels: noob

We added a new node to a 44-node cluster, starting the datanode, mapred, and regionserver processes on it. The Unix filesystem was configured incorrectly, i.e. /tmp was not writable by these processes. All three processes had issues with this. The datanode and mapred processes shut down on the exception. The regionserver did not stop; in fact, it reported to the master that it was up without regions, so the master assigned regions to it.
The regionserver would not accept them, resulting in a constant assign, reject, reassign cycle that put many regions into a state of not being available. There are no logs about this, but we could observe the region count fluctuate by hundreds of regions and the application throwing many NotServingRegionExceptions. In fact, to the master process the regionserver looked fine, so it kept trying to send regions its way, and the regionserver rejected them. So the master/balancer went into an assign/reassign cycle, destabilizing the cluster. Many puts and gets simply failed with NotServingRegionExceptions and took a long time to complete. Exception from
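The fail-fast behaviour the issue asks for amounts to probing required directories before the regionserver reports for duty, so a misconfigured node aborts instead of entering the assign/reject cycle described above. The sketch below shows one way such a check could look; the class and method names are illustrative, not the actual HBase startup code path.

```java
import java.io.File;
import java.io.IOException;

// Hedged sketch of a startup filesystem check: verify a directory exists and
// is actually writable (via a real write probe, since File.canWrite() can be
// unreliable on some platforms) before the server registers with the master.
class StartupFsCheck {
    static boolean dirWritable(File dir) {
        if (!dir.exists() && !dir.mkdirs()) {
            return false;                       // cannot create the directory
        }
        try {
            // probe with an actual temp-file write, then clean it up
            File probe = File.createTempFile("hbase-probe", null, dir);
            probe.delete();
            return true;
        } catch (IOException e) {
            return false;                       // e.g. /tmp not writable
        }
    }
}
```

A regionserver could run this on its temp and log directories during initialization and abort with a clear error when the probe fails, rather than registering as healthy.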