[jira] Commented: (HBASE-3413) DNS Configs may completely break HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977685#action_12977685 ] ryan rawson commented on HBASE-3413: how would we ensure the uuid would be generated the same upon every regionserver startup? Another thing to consider is that dns names are inserted into META tables, and this is used by clients to find the machine. If we detect a DNS change we would have to do a bunch of fancy work to ensure the META table is correct, no? DNS Configs may completely break HBase cluster -- Key: HBASE-3413 URL: https://issues.apache.org/jira/browse/HBASE-3413 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Environment: all Reporter: Mathias Herberts I recently experienced a cluster malfunction which was caused by a change in DNS config for services co-hosted on the machines running region servers. The RS are specified using IP addresses in the 'regionservers' file. Those machines are 1.example.com to N.example.com (there are A RRs for those names to each of the N IP addresses in 'regionservers'). Until recently, the PTR RRs for the RS IPs were those x.example.com names. Then a service was deployed on some of the x.example.com machines, and new A RRs were added for svc.example.com which point to each of the IPs used for the service. Jointly new PTR records were added too for the given IPs. Those PTR records have 'svc.example.com' as their PTRDATA, and this is causing the HBase cluster to get completely confused. Since it is perfectly legal to have multiple PTR records, it seems important to make the canonicalization of RS more robust to DNS tweaks. Maybe generating a UUID when a RS is started would help, this UUID could be used to register the RS in ZK and we would not rely on DNS for obtaining a stable canonical name (which may not even exist...). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3413) DNS Configs may completely break HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-3413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977689#action_12977689 ] Mathias Herberts commented on HBASE-3413: - The idea of using a UUID was to be able to detect that two RS accessed with different DNS names were truly the same (with the same UUID). As for using names in META tables, I've still to understand why we do that instead of using IPs. DNS Configs may completely break HBase cluster -- Key: HBASE-3413 URL: https://issues.apache.org/jira/browse/HBASE-3413 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Environment: all Reporter: Mathias Herberts I recently experienced a cluster malfunction which was caused by a change in DNS config for services co-hosted on the machines running region servers. The RS are specified using IP addresses in the 'regionservers' file. Those machines are 1.example.com to N.example.com (there are A RRs for those names to each of the N IP addresses in 'regionservers'). Until recently, the PTR RRs for the RS IPs were those x.example.com names. Then a service was deployed on some of the x.example.com machines, and new A RRs were added for svc.example.com which point to each of the IPs used for the service. Jointly new PTR records were added too for the given IPs. Those PTR records have 'svc.example.com' as their PTRDATA, and this is causing the HBase cluster to get completely confused. Since it is perfectly legal to have multiple PTR records, it seems important to make the canonicalization of RS more robust to DNS tweaks. Maybe generating a UUID when a RS is started would help, this UUID could be used to register the RS in ZK and we would not rely on DNS for obtaining a stable canonical name (which may not even exist...). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3415) When scanners have readers updated we should use original file selection algorithm rather than include all files
When scanners have readers updated we should use original file selection algorithm rather than include all files Key: HBASE-3415 URL: https://issues.apache.org/jira/browse/HBASE-3415 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.7, 0.90.0 Reporter: Jonathan Gray Fix For: 0.90.1 Currently when a {{StoreScanner}} is instantiated we use a {{getScanner(scan, columns)}} call that looks at things like bloom filters and memstore only flags. But when we get a changed readers notification, we use {{getScanner()}} which just grabs everything. We should always use the original file selection algorithm. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
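A rough sketch of the idea being proposed, assuming placeholder types (Scan, Columns, FileScanner and selectScanners are illustrative names, not the actual 0.90 StoreScanner API): capture the selection criteria once at construction time and reuse the exact same selection when readers change, instead of a variant that blindly grabs every file.

{code}
import java.util.List;

// Hedged sketch only; the real StoreScanner/Store code differs.
abstract class ScannerResetSketch<Scan, Columns, FileScanner> {
  private final Scan scan;        // criteria captured when the scanner was built
  private final Columns columns;

  ScannerResetSketch(Scan scan, Columns columns) {
    this.scan = scan;
    this.columns = columns;
  }

  /** Bloom-filter / memstore-only-flag aware selection used at construction time. */
  protected abstract List<FileScanner> selectScanners(Scan scan, Columns columns);

  /** On a changed-readers notification, reuse the original selection rather
   *  than a select-everything call. */
  final List<FileScanner> scannersAfterReaderChange() {
    return selectScanners(scan, columns);
  }
}
{code}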
[jira] Updated: (HBASE-3415) When scanners have readers updated we should use original file selection algorithm rather than include all files
[ https://issues.apache.org/jira/browse/HBASE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3415: - Attachment: HBASE-3415-v1.patch First go. There are other bugs in this code and updating readers, especially with intra-row scans. Going to file more jiras. When scanners have readers updated we should use original file selection algorithm rather than include all files Key: HBASE-3415 URL: https://issues.apache.org/jira/browse/HBASE-3415 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.7, 0.90.0 Reporter: Jonathan Gray Fix For: 0.90.1 Attachments: HBASE-3415-v1.patch Currently when a {{StoreScanner}} is instantiated we use a {{getScanner(scan, columns)}} call that looks at things like bloom filters and memstore only flags. But when we get a changed readers notification, we use {{getScanner()}} which just grabs everything. We should always use the original file selection algorithm. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3416) For intra-row scanning, the update readers notification resets the query matcher and can lead to incorrect behavior
For intra-row scanning, the update readers notification resets the query matcher and can lead to incorrect behavior --- Key: HBASE-3416 URL: https://issues.apache.org/jira/browse/HBASE-3416 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.7, 0.90.0 Reporter: Jonathan Gray Fix For: 0.90.1 In {{StoreScanner.resetScannerStack()}}, which is called on the first {{next()}} call after readers have been updated, we do a query matcher reset. Normally this is not an issue because the query matcher does not need to maintain state between rows. However, if doing intra-row scanning w/ the specified limit, we could have the query matcher reset in the middle of reading a row. This could lead to incorrect behavior (too many versions coming back, etc). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3415) When scanners have readers updated we should use original file selection algorithm rather than include all files
[ https://issues.apache.org/jira/browse/HBASE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977815#action_12977815 ] stack commented on HBASE-3415: -- Am I missing something? You remove the getScanner(scan, columns) in the patch, the thing you would seem to want to preserve going by your comment above. When scanners have readers updated we should use original file selection algorithm rather than include all files Key: HBASE-3415 URL: https://issues.apache.org/jira/browse/HBASE-3415 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.20.7, 0.90.0 Reporter: Jonathan Gray Fix For: 0.90.1 Attachments: HBASE-3415-v1.patch Currently when a {{StoreScanner}} is instantiated we use a {{getScanner(scan, columns)}} call that looks at things like bloom filters and memstore only flags. But when we get a changed readers notification, we use {{getScanner()}} which just grabs everything. We should always use the original file selection algorithm. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3406) Region stuck in transition after RS failed while opening
[ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3406: - Fix Version/s: (was: 0.90.0) 0.90.1 Moving to 0.90.1 I cannot explain how in-memory state has OPENING for the node but the znode content is M_ZK_REGION_OFFLINE without more context. Region stuck in transition after RS failed while opening Key: HBASE-3406 URL: https://issues.apache.org/jira/browse/HBASE-3406 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Todd Lipcon Priority: Critical Fix For: 0.90.1 I had a RS fail due to GC pause while it was in the midst of opening a region, apparently. This got the region stuck in the following repeating sequence in the master log: 2011-01-03 17:24:33,884 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70. 2011-01-03 17:24:33,885 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6-0x12ce26f6c0600e3 Retrieved 113 byte(s) of data from znode /hbase/unassigned/c6a54b4d07a44e113b3a4d2ab22daa70; data=region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70., server=haus03.sf.cloudera.com:6, state=M_ZK_REGION_OFFLINE 2011-01-03 17:24:43,886 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70. state=OPENING, ts=1293840977790 2011-01-03 17:24:43,886 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70. 2011-01-03 17:24:43,887 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6-0x12ce26f6c0600e3 Retrieved 113 byte(s) of data from znode /hbase/unassigned/c6a54b4d07a44e113b3a4d2ab22daa70; data=region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70., server=haus03.sf.cloudera.com:6, state=M_ZK_REGION_OFFLINE etc... repeating every 10 seconds. Eventually I ran hbck -fix which forced it to OFFLINE in ZK and it reassigned just fine. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles
Increment operations can break when qualifiers are split between memstore/snapshot and storefiles - Key: HBASE-3418 URL: https://issues.apache.org/jira/browse/HBASE-3418 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 Doing investigation around some observed resetting counter behavior. An optimization was added to check memstore/snapshots first and then check storefiles if not all counters were found. However it looks like this introduced a bug when columns for a given row/family in a single increment operation are spread across memstores and storefiles. The results from get operations on both memstores and storefiles are appended together but when processed are expected to be fully sorted. This can lead to invalid results. Need to sort the combined result of memstores + storefiles. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
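The fix described above amounts to sorting the concatenated results before the increment logic walks them. A minimal, self-contained illustration of that step (the KV class below is a stand-in for HBase's KeyValue, not the real type):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedMergeSketch {
  /** Stand-in for a KeyValue: ordered by qualifier, newest timestamp first. */
  static class KV implements Comparable<KV> {
    final String qualifier;
    final long timestamp;
    final long counterValue;
    KV(String qualifier, long timestamp, long counterValue) {
      this.qualifier = qualifier;
      this.timestamp = timestamp;
      this.counterValue = counterValue;
    }
    @Override
    public int compareTo(KV other) {
      int c = qualifier.compareTo(other.qualifier);
      return c != 0 ? c : Long.compare(other.timestamp, timestamp);
    }
  }

  /** Results from the memstore/snapshot pass and the store-file pass are
   *  appended; re-sorting them restores the fully ordered view the
   *  qualifier-by-qualifier increment processing expects. */
  static List<KV> combine(List<KV> fromMemstore, List<KV> fromStoreFiles) {
    List<KV> all = new ArrayList<>(fromMemstore);
    all.addAll(fromStoreFiles);
    Collections.sort(all);   // the missing step the patch adds
    return all;
  }
}
{code}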
[jira] Updated: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles
[ https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3418: - Attachment: HBASE-3418-v1.patch Unit test which reproduces bad behavior and small fix which seems to work / fixes test. Increment operations can break when qualifiers are split between memstore/snapshot and storefiles - Key: HBASE-3418 URL: https://issues.apache.org/jira/browse/HBASE-3418 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 Attachments: HBASE-3418-v1.patch Doing investigation around some observed resetting counter behavior. An optimization was added to check memstore/snapshots first and then check storefiles if not all counters were found. However it looks like this introduced a bug when columns for a given row/family in a single increment operation are spread across memstores and storefiles. The results from get operations on both memstores and storefiles are appended together but when processed are expected to be fully sorted. This can lead to invalid results. Need to sort the combined result of memstores + storefiles. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles
[ https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977864#action_12977864 ] Todd Lipcon commented on HBASE-3418: Can we put this in 0.90? seems like inaccurate counters are a bad problem! Increment operations can break when qualifiers are split between memstore/snapshot and storefiles - Key: HBASE-3418 URL: https://issues.apache.org/jira/browse/HBASE-3418 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 Attachments: HBASE-3418-v1.patch Doing investigation around some observed resetting counter behavior. An optimization was added to check memstore/snapshots first and then check storefiles if not all counters were found. However it looks like this introduced a bug when columns for a given row/family in a single increment operation are spread across memstores and storefiles. The results from get operations on both memstores and storefiles are appended together but when processed are expected to be fully sorted. This can lead to invalid results. Need to sort the combined result of memstores + storefiles. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles
[ https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3418: - Fix Version/s: (was: 0.90.1) 0.90.0 Yeah, looks like we're doing at least one more RC. Increment operations can break when qualifiers are split between memstore/snapshot and storefiles - Key: HBASE-3418 URL: https://issues.apache.org/jira/browse/HBASE-3418 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.0, 0.92.0 Attachments: HBASE-3418-v1.patch Doing investigation around some observed resetting counter behavior. An optimization was added to check memstore/snapshots first and then check storefiles if not all counters were found. However it looks like this introduced a bug when columns for a given row/family in a single increment operation are spread across memstores and storefiles. The results from get operations on both memstores and storefiles are appended together but when processed are expected to be fully sorted. This can lead to invalid results. Need to sort the combined result of memstores + storefiles. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.
If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open. - Key: HBASE-3419 URL: https://issues.apache.org/jira/browse/HBASE-3419 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.90.0, 0.92.0 Reporter: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 The {{Progressable}} used on region open to tickle the ZK OPENING node to prevent the master from timing out a region open operation will currently abort the RegionServer if this fails for some reason. However it could be normal for an RS to have a region open operation aborted by the master, so should just handle as it does other places by reverting the open. We had a cluster trip over some other issue (for some reason, the tickle was not happening in 30 seconds, so master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself eventually taking down the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.
[ https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977870#action_12977870 ] Jonathan Gray commented on HBASE-3419: -- Currently the tickle happens on a number-of-replayed-edits interval (does not count edits skipped). This is probably not the best idea since edits can be wildly different sizes (in this case, an all increment cluster where there are very high numbers of small edits). The tickle is really about time not number of edits. Maybe a Chore instead set at 1/2 master timeout? Or some other way of doing it based on time instead of edits? If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open. - Key: HBASE-3419 URL: https://issues.apache.org/jira/browse/HBASE-3419 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.90.0, 0.92.0 Reporter: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 The {{Progressable}} used on region open to tickle the ZK OPENING node to prevent the master from timing out a region open operation will currently abort the RegionServer if this fails for some reason. However it could be normal for an RS to have a region open operation aborted by the master, so should just handle as it does other places by reverting the open. We had a cluster trip over some other issue (for some reason, the tickle was not happening in 30 seconds, so master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself eventually taking down the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
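One way to picture the time-based alternative floated above, as a hedged sketch (TickleTarget, the method names, and the half-master-timeout interval are assumptions, not the existing Progressable wiring):

{code}
// Rough sketch: call progress() on every edit, replayed or skipped; it
// re-transitions the OPENING znode at most once per interval.
class TimeBasedTickleSketch {
  interface TickleTarget {
    /** e.g. re-transition the ZK node to OPENING; returns false on failure. */
    boolean tickleOpening();
  }

  private final long intervalMs;   // e.g. half the master's timeout
  private final TickleTarget target;
  private long lastTickleMs = System.currentTimeMillis();

  TimeBasedTickleSketch(long intervalMs, TickleTarget target) {
    this.intervalMs = intervalMs;
    this.target = target;
  }

  /** Returns false if the tickle failed, in which case the caller would
   *  cancel this region open rather than abort the whole regionserver. */
  boolean progress() {
    long now = System.currentTimeMillis();
    if (now - lastTickleMs < intervalMs) {
      return true;                 // within the interval, nothing to do yet
    }
    lastTickleMs = now;
    return target.tickleOpening();
  }
}
{code}

Because the caller still drives it from the replay loop, progress only advances while edits are actually flowing, which keeps the "no tickle if we are stuck on HDFS" property.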
[jira] Created: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run, confusing state. Long version: I have a messy cluster. It's 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server; the closes are taking so long that they time out in RIT. We then do this: {code} case CLOSED: LOG.info("Region has been CLOSED for too long, " + "retriggering ClosedRegionHandler"); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning the same server because the plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when the plan is cleared, we assign a new server and we have a double assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-3412) HLogSplitter should handle missing HLogs
[ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-3412. --- Resolution: Fixed Assignee: Jean-Daniel Cryans Hadoop Flags: [Reviewed] Committed to branch and trunk, thanks for the review Stack! HLogSplitter should handle missing HLogs Key: HBASE-3412 URL: https://issues.apache.org/jira/browse/HBASE-3412 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Critical Fix For: 0.90.0 Attachments: HBASE-3412-2.patch, HBASE-3412.patch In build #48 (https://hudson.apache.org/hudson/job/hbase-0.90/48/), TestReplication failed because of missing rows on the slave cluster. The reason is that a region server that was killed was able to archive a log at the same time the master was trying to recover it: {noformat} [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] util.FSUtils(625): Recovering file hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 ... [RegionServer:0;vesta.apache.org,58598,1294117333857.logRoller] wal.HLog(740): moving old hlog file /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 whose highest sequenceid is 422 to /user/hudson/.oldlogs/vesta.apache.org%3A58598.1294117406909 ... [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] master.MasterFileSystem(204): Failed splitting hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857 java.io.IOException: Failed to open hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 for append Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 File does not exist. [Lease. Holder: DFSClient_-986975908, pendingcreates: 1] {noformat} We should probably just handle the fact that a file could have been archived (maybe even check in .oldlogs to be sure) and move on to the next log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977890#action_12977890 ] stack commented on HBASE-3420: -- Its timeout of a close. Here is sequence: {code} 2011-01-05 00:49:37,670 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041., src=sv2borg181,60020,1294096110452, dest=sv2borg188,60020,1294187735582 2011-01-05 00:49:37,670 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of region TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. (offlining) 2011-01-05 00:49:37,671 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=sv2borg181,60020,1294096110452, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region TestTable,0487405776,1294125523541. b1fa38bb610943e9eadc604babe4d041. 2011-01-05 00:49:38,310 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:6-0x12d3de9e7c60e37 Retrieved 112 byte(s) of data from znode /hbase/unassigned/b1fa38bb610943e9eadc604babe4d041 and set watcher; region=TestTable,0487405776,1294125523541. b1fa38bb610943e9eadc604babe4d041., server=sv2borg181,60020,1294096110452, state=RS_ZK_REGION_CLOSED 2011-01-05 00:49:38,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned node: /hbase/unassigned/b1fa38bb610943e9eadc604babe4d041 (region=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041., server=sv2borg181,60020, 1294096110452, state=RS_ZK_REGION_CLOSED) 2011-01-05 00:49:38,385 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_CLOSED, server=sv2borg181,60020,1294096110452, region=b1fa38bb610943e9eadc604babe4d041 2011-01-05 00:50:12,412 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=CLOSED, ts=1294188578211 2011-01-05 00:50:12,412 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been CLOSED for too long, retriggering ClosedRegionHandler {code} Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. 
Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3418) Increment operations can break when qualifiers are split between memstore/snapshot and storefiles
[ https://issues.apache.org/jira/browse/HBASE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977896#action_12977896 ] stack commented on HBASE-3418: -- +1 on patch. Increment operations can break when qualifiers are split between memstore/snapshot and storefiles - Key: HBASE-3418 URL: https://issues.apache.org/jira/browse/HBASE-3418 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.0, 0.92.0 Attachments: HBASE-3418-v1.patch Doing investigation around some observed resetting counter behavior. An optimization was added to check memstore/snapshots first and then check storefiles if not all counters were found. However it looks like this introduced a bug when columns for a given row/family in a single increment operation are spread across memstores and storefiles. The results from get operations on both memstores and storefiles are appended together but when processed are expected to be fully sorted. This can lead to invalid results. Need to sort the combined result of memstores + storefiles. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977897#action_12977897 ] stack commented on HBASE-3420: -- Looking more, the CLOSED event had been queued over on the master but tens of seconds elapsed before it had a chance to run (This was a rebalance of thousands of regions on constrained server). Meantime, we were requeuing CloseRegionHandlers every ten seconds as the CLOSED timeeout in RIT. I'm going to post patch that removes the adding new CRH to event queue on timeout of CLOSED. Either the queued original CRH will run or server will crash and region state will be altered appropriately at that time. Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3420: - Attachment: 3420.txt This should address most egregious issue turned up by these logs. Other things to add are maximum regions to assign per balance. We should add that too. Will make a new issue for that once this goes in. Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977906#action_12977906 ] Jonathan Gray commented on HBASE-3420: -- So this just updates the timestamp. Seems like it would be equivalent to logging and doing a NO-OP on CLOSED timeout (only point of updating timestamp is to prevent another timeout). I guess this is fine since we will get a log message once per timeout period though. So once the CRH runs, the RegionState goes to OFFLINE huh? Makes sense then. +1 and +1 on a maxregionstobalanceatonce or the like Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
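For reference, the shape of the change being discussed, as a hedged sketch (RegionState here is a stand-in carrying only a timestamp; the real AssignmentManager fields and names differ): on a CLOSED-for-too-long timeout, refresh the timestamp so the already-queued ClosedRegionHandler gets a chance to run, instead of submitting a duplicate handler that later tramples the region's state.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ClosedTimeoutSketch {
  static class RegionState {
    private volatile long stamp = System.currentTimeMillis();
    void updateTimestampToNow() { stamp = System.currentTimeMillis(); }
    long getStamp() { return stamp; }
  }

  private final Map<String, RegionState> regionsInTransition = new ConcurrentHashMap<>();

  /** Called when a CLOSED region has sat in RIT past the timeout. */
  void onClosedTimeout(String encodedRegionName) {
    RegionState state = regionsInTransition.get(encodedRegionName);
    if (state != null) {
      // Previously a new ClosedRegionHandler was submitted here; now we only
      // push the timeout forward, so at most one log line per timeout period.
      state.updateTimestampToNow();
    }
  }
}
{code}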
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977929#action_12977929 ] stack commented on HBASE-3420: -- @Ted Balancing works differently in 0.90. Where before, when a RS would heartbeat, in the response we'd give it a set of regions to open/close. The new region assignment goes via zk. The balancer looks at total cluster state and comes up w/ a plan. It then starts the plan rolling which instigates a cascade of closings done via zk. Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977930#action_12977930 ] Jonathan Gray commented on HBASE-3420: -- It's unrelated to the notion of checkins (which is almost completely gone now) so not sure why we would reuse this config param. We could set per-RS limits but that would probably require significantly more hack-up of the balancing algo. Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977943#action_12977943 ] Jonathan Gray commented on HBASE-3417: -- One idea from discussion with stack is to use a UUID for the filename. That way we can generate it once for the temporary file and then just move it in place without doing a rename. Would then just use UUID + blockNumber as the blockName. CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
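A minimal sketch of that naming idea (names are illustrative and this is not the HFile writer API): generate the id once when the writer is created, so block cache keys are identical whether the file still lives under the temporary output path or has been moved into place.

{code}
import java.util.UUID;

class BlockNameSketch {
  // One id per store file, fixed at write time, independent of the file's path.
  private final String fileId = UUID.randomUUID().toString();

  /** Cache key for the block at the given index within this file. */
  String blockName(int blockNumber) {
    return fileId + "_" + blockNumber;
  }
}
{code}

If the same id also served as the final file name, the tmp file could be moved into its permanent location without the cache keys ever changing, which is the point of the comment above.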
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977968#action_12977968 ] stack commented on HBASE-3420: -- Ok... with this patch in place, master was able to join the cluster w/o aborting and live through the rebalance (all regions cleared from RIT). I'm going to commit. Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Fix For: 0.90.1 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-3420. -- Resolution: Fixed Fix Version/s: (was: 0.90.1) 0.90.0 Assignee: stack Hadoop Flags: [Reviewed] Committed to branch and trunk. Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Assignee: stack Fix For: 0.90.0 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run confusing state. Long version: I have a messy cluster. Its 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server are taking so long, the close times out in RIT. We then do this: {code} case CLOSED: LOG.info(Region has been CLOSED for too long, + retriggering ClosedRegionHandler); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning same server because plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when plan is cleared, we assign new server and we have dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3421) Very wide rows -- 30M plus -- cause us OOME
Very wide rows -- 30M plus -- cause us OOME --- Key: HBASE-3421 URL: https://issues.apache.org/jira/browse/HBASE-3421 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack From the list (see 'jvm oom' in http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/browser), it looks like wide rows -- 30M or so -- cause an OOME during compaction. We should check it out. Can the scanner used during compactions use the 'limit' when nexting? If so, this should save us from OOME'ing (or we need to add a max size, rather than a count of KVs, to next). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
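One way to read the size-based suggestion, as a hedged sketch (the Sizer callback and the maxBatchBytes knob are assumptions, not existing HBase API): batch KVs by accumulated byte size rather than by count, so a single 30M-column row is never materialized in one next() call.

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SizeBoundedBatchSketch<KV> {
  interface Sizer<T> {
    long heapSize(T kv);
  }

  private final long maxBatchBytes;
  private final Sizer<KV> sizer;

  SizeBoundedBatchSketch(long maxBatchBytes, Sizer<KV> sizer) {
    this.maxBatchBytes = maxBatchBytes;
    this.sizer = sizer;
  }

  /** Drains the scanner into batches no larger than the byte budget. */
  List<KV> nextBatch(Iterator<KV> scanner) {
    List<KV> batch = new ArrayList<>();
    long bytes = 0;
    while (scanner.hasNext() && bytes < maxBatchBytes) {
      KV kv = scanner.next();
      batch.add(kv);
      bytes += sizer.heapSize(kv);
    }
    return batch;
  }
}
{code}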
[jira] Created: (HBASE-3422) Balancer will willingly try to rebalance thousands of regions in one go; needs an upper bound added.
Balancer will willingly try to rebalance thousands of regions in one go; needs an upper bound added. -- Key: HBASE-3422 URL: https://issues.apache.org/jira/browse/HBASE-3422 Project: HBase Issue Type: Improvement Affects Versions: 0.90.0 Reporter: stack See HBASE-3420. Therein, a wonky cluster had 5k regions on one server and 1k on others. The balancer ran and wanted to redistribute 3k+ all in one go. Madness. If a load of rebalancing is to be done, it should be done somewhat piecemeal. At a minimum, we need an upper bound on how many regions to rebalance at a time. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
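A trivial sketch of the proposed cap (the method and names below are made up for illustration; the real LoadBalancer produces RegionPlan objects): truncate the plan list so a single balance run only moves a bounded number of regions, leaving the rest for later runs.

{code}
import java.util.List;

class BalanceCapSketch {
  /** Cap how many region moves one balance run may emit. */
  static <Plan> List<Plan> capPlans(List<Plan> plans, int maxRegionsPerBalance) {
    if (plans.size() <= maxRegionsPerBalance) {
      return plans;
    }
    return plans.subList(0, maxRegionsPerBalance);
  }
}
{code}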
[jira] Commented: (HBASE-3409) Failed server shutdown processing when retrying hlog split
[ https://issues.apache.org/jira/browse/HBASE-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977981#action_12977981 ] Hudson commented on HBASE-3409: --- Integrated in HBase-TRUNK #1703 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/]) Failed server shutdown processing when retrying hlog split -- Key: HBASE-3409 URL: https://issues.apache.org/jira/browse/HBASE-3409 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Todd Lipcon Assignee: stack Priority: Blocker Fix For: 0.90.0 Attachments: 3409.txt 2011-01-04 01:14:17,353 WARN org.apache.hadoop.hbase.master.MasterFileSystem: Retrying splitting because of: org.apache.hadoop.hbase.regionserver.wal.OrphanHLogAfterSplitException: Discovered orphan hlog after split. Maybe the HRegionServer was not dead when we started at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:286) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:187) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:96) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2011-01-04 01:14:17,353 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN java.lang.IllegalStateException: An HLogSplitter instance may only be used once at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:170) at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:199) at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:96) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3412) HLogSplitter should handle missing HLogs
[ https://issues.apache.org/jira/browse/HBASE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977978#action_12977978 ] Hudson commented on HBASE-3412: --- Integrated in HBase-TRUNK #1703 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/]) HBASE-3412 HLogSplitter should handle missing HLogs HLogSplitter should handle missing HLogs Key: HBASE-3412 URL: https://issues.apache.org/jira/browse/HBASE-3412 Project: HBase Issue Type: Bug Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Critical Fix For: 0.90.0 Attachments: HBASE-3412-2.patch, HBASE-3412.patch In build #48 (https://hudson.apache.org/hudson/job/hbase-0.90/48/), TestReplication failed because of missing rows on the slave cluster. The reason is that a region server that was killed was able to archive a log at the same time the master was trying to recover it: {noformat} [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] util.FSUtils(625): Recovering file hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 ... [RegionServer:0;vesta.apache.org,58598,1294117333857.logRoller] wal.HLog(740): moving old hlog file /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 whose highest sequenceid is 422 to /user/hudson/.oldlogs/vesta.apache.org%3A58598.1294117406909 ... [MASTER_META_SERVER_OPERATIONS-vesta.apache.org:47907-0] master.MasterFileSystem(204): Failed splitting hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857 java.io.IOException: Failed to open hdfs://localhost:50121/user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 for append Caused by: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/hudson/.logs/vesta.apache.org,58598,1294117333857/vesta.apache.org%3A58598.1294117406909 File does not exist. [Lease. Holder: DFSClient_-986975908, pendingcreates: 1] {noformat} We should probably just handle the fact that a file could have been archived (maybe even check in .oldlogs to be sure) and move on to the next log. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
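As a sketch of that "check .oldlogs and move on" idea (illustrative only, with hypothetical helper names getReader and oldLogDir; not the actual HBASE-3412 patch):

{code}
// Inside the loop over logs in the dead server's .logs directory.
// getReader and oldLogDir are hypothetical names used for this sketch.
try {
  in = getReader(fs, logPath, conf);
} catch (IOException e) {
  // The regionserver may have archived the log between listing and open.
  Path archived = new Path(oldLogDir, logPath.getName());
  if (!fs.exists(logPath) && fs.exists(archived)) {
    LOG.info("Log " + logPath + " was already moved to " + archived + "; skipping it");
    continue;   // move on to the next log instead of failing the whole split
  }
  throw e;
}
{code}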
[jira] Commented: (HBASE-3402) Web UI shows two META regions
[ https://issues.apache.org/jira/browse/HBASE-3402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977979#action_12977979 ] Hudson commented on HBASE-3402: --- Integrated in HBase-TRUNK #1703 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1703/]) Web UI shows two META regions - Key: HBASE-3402 URL: https://issues.apache.org/jira/browse/HBASE-3402 Project: HBase Issue Type: Bug Components: master, regionserver Affects Versions: 0.90.0 Reporter: Todd Lipcon Assignee: stack Priority: Critical Fix For: 0.90.0 Attachments: two-metas.png Running 0...@r1052112 I see two regions for META on the same server. Both have start key '-' and end key '-'. Things seem to work OK, but it's very strange. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.
[ https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12977983#action_12977983 ] stack commented on HBASE-3419: -- Chatting about this up on IRC, the tickle does not happen if we are skipping edits. That's wrong. We should tickle even if we skip edits. Regarding making Progressable a Chore, I'd say not exactly. Progressable is about whether or not progress is being made. We don't want the tickle to happen if we are stuck on HDFS. Chatting w/ Jon, the tickle should happen not after N edits but after P milliseconds AS LONG AS we're making progress. Also, killing the regionserver if we fail to replay recovered.edits in time is wrong. Instead we should fail the region open. If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open. - Key: HBASE-3419 URL: https://issues.apache.org/jira/browse/HBASE-3419 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.90.0, 0.92.0 Reporter: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 The {{Progressable}} used on region open to tickle the ZK OPENING node to prevent the master from timing out a region open operation will currently abort the RegionServer if this fails for some reason. However it could be normal for an RS to have a region open operation aborted by the master, so should just handle as it does other places by reverting the open. We had a cluster trip over some other issue (for some reason, the tickle was not happening in 30 seconds, so master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself eventually taking down the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
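A sketch of what the time-based tickle described above could look like, assuming hypothetical names (reporter, period, shouldSkip, applyEdit); it illustrates the idea rather than the patch.

{code}
// Sketch of replay over a recovered.edits file. "reporter", "period",
// "shouldSkip" and "applyEdit" are illustrative names, not the real code.
long lastReport = System.currentTimeMillis();
HLog.Entry entry;
while ((entry = reader.next()) != null) {
  long now = System.currentTimeMillis();
  if (reporter != null && (now - lastReport) > period) {
    reporter.progress();            // tickle even when the edit below is skipped
    lastReport = now;
  }
  if (shouldSkip(entry)) {          // e.g. the edit was already flushed
    continue;
  }
  applyEdit(entry);
}
{code}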
[jira] Assigned: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.
[ https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray reassigned HBASE-3419: Assignee: Jonathan Gray Working on implementing what stack outlined above. If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open. - Key: HBASE-3419 URL: https://issues.apache.org/jira/browse/HBASE-3419 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.90.0, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 The {{Progressable}} used on region open to tickle the ZK OPENING node to prevent the master from timing out a region open operation will currently abort the RegionServer if this fails for some reason. However it could be normal for an RS to have a region open operation aborted by the master, so should just handle as it does other places by reverting the open. We had a cluster trip over some other issue (for some reason, the tickle was not happening in 30 seconds, so master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself eventually taking down the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3421) Very wide rows -- 30M plus -- cause us OOME
[ https://issues.apache.org/jira/browse/HBASE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978016#action_12978016 ] Nicolas Spiegelberg commented on HBASE-3421: Note that you can limit the number of StoreFiles that can be compacted at one time... Store.java#204: this.maxFilesToCompact = conf.getInt("hbase.hstore.compaction.max", 10) 30M * 10 SF == 300MB. What is your RAM capacity? You are likely stuck on a merging outlier that exists in every SF. I would run: bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -f FILE_NAME -p |sed 's/V:.*$//g'|less on the HFiles in that Store to see what your high watermark is. Very wide rows -- 30M plus -- cause us OOME --- Key: HBASE-3421 URL: https://issues.apache.org/jira/browse/HBASE-3421 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack From the list, see 'jvm oom' in http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/browser, it looks like wide rows -- 30M or so -- causes OOME during compaction. We should check it out. Can the scanner used during compactions use the 'limit' when nexting? If so, this should save our OOME'ing (or, we need to add to the next a max size rather than count of KVs). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.
[ https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3419: - Attachment: HBASE-3419-v1.patch As outlined. Had to add new {{CancelableProgressable}} interface because we needed to be able to tell the caller to cancel the operation. If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open. - Key: HBASE-3419 URL: https://issues.apache.org/jira/browse/HBASE-3419 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.90.0, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 Attachments: HBASE-3419-v1.patch The {{Progressable}} used on region open to tickle the ZK OPENING node to prevent the master from timing out a region open operation will currently abort the RegionServer if this fails for some reason. However it could be normal for an RS to have a region open operation aborted by the master, so should just handle as it does other places by reverting the open. We had a cluster trip over some other issue (for some reason, the tickle was not happening in 30 seconds, so master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself eventually taking down the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
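For reference, a minimal sketch of the shape such an interface takes, plus a hypothetical call site; the patch itself defines the real CancelableProgressable.

{code}
/**
 * Sketch of the interface shape: progress() both tickles the caller and
 * reports whether the operation should keep going, so a failed ZK
 * re-transition can cancel the region open instead of aborting the server.
 */
public interface CancelableProgressable {
  /** @return true to keep going, false to cancel the enclosing operation */
  boolean progress();
}

// Hypothetical call site inside recovered-edits replay:
if (reporter != null && !reporter.progress()) {
  throw new IOException("Region open cancelled while replaying recovered edits");
}
{code}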
[jira] Commented: (HBASE-3403) Region orphaned after failure during split
[ https://issues.apache.org/jira/browse/HBASE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978035#action_12978035 ] stack commented on HBASE-3403: -- bq. + cluster.getMaster().catalogJanitorSwitch(false); I added above switch to avoid the unlikely but *perhaps* possible case of split, compaction in each daughter, and run of catalogjanitor happens before we get our edit of .META. in. Just trying to do all I can to avoid a debug of failed tests up on hudson. NP on changing name of method. Will do. bq. Does this change introduce a new bug? Yes. That could happen. Unlikely, but perhaps. Let me spin a new patch. Region orphaned after failure during split -- Key: HBASE-3403 URL: https://issues.apache.org/jira/browse/HBASE-3403 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Todd Lipcon Priority: Blocker Fix For: 0.90.0 Attachments: 3403.txt, broken-split.txt, hbck-fix-missing-in-meta.txt, master-logs.txt.gz ERROR: Region hdfs://haus01.sf.cloudera.com:11020/hbase-normal/usertable/2ad8df700eea55f70e02ea89178a65a2 on HDFS, but not listed in META or deployed on any region server. ERROR: Found inconsistency in table usertable Not sure how I got into this state, will look through logs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3419) If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open.
[ https://issues.apache.org/jira/browse/HBASE-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3419: - Attachment: HBASE-3419-v2.patch Squashed the v1 patch with another patch. v2 is just this stuff. If re-transition to OPENING during log replay fails, server aborts. Instead, should just cancel region open. - Key: HBASE-3419 URL: https://issues.apache.org/jira/browse/HBASE-3419 Project: HBase Issue Type: Bug Components: regionserver, zookeeper Affects Versions: 0.90.0, 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.90.1, 0.92.0 Attachments: HBASE-3419-v1.patch, HBASE-3419-v2.patch The {{Progressable}} used on region open to tickle the ZK OPENING node to prevent the master from timing out a region open operation will currently abort the RegionServer if this fails for some reason. However it could be normal for an RS to have a region open operation aborted by the master, so should just handle as it does other places by reverting the open. We had a cluster trip over some other issue (for some reason, the tickle was not happening in 30 seconds, so master was timing out every time). Because of the abort on BadVersion, this eventually led to every single RS aborting itself eventually taking down the cluster. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3421) Very wide rows -- 30M plus -- cause us OOME
[ https://issues.apache.org/jira/browse/HBASE-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978054#action_12978054 ] Nicolas Spiegelberg commented on HBASE-3421: For interested parties... From: Ted Yu Hi, I used the command you suggested in HBASE-3421 on a table and got: K: 0012F2157E58883070B9814047048E8B/v:_/1283909035492/Put/vlen=1308 K: 0041A80A545C4CBF412865412065BF5E/v:_/1283909035492/Put/vlen=1311 K: 00546F4AA313020E551E049E848949C6/v:_/1283909035492/Put/vlen=1866 K: 0068CC263C81CE65B65FC5425EFEBBCD/v:_/1283909035492/Put/vlen=1191 K: 006DB8745D6D1B624F77E0F06C177C0B/v:_/1283909035492/Put/vlen=1021 K: 006F9037BD7A8F081B54C5B03756C143/v:_/1283909035492/Put/vlen=1382 ... Can you briefly describe what conclusion can be drawn here ? ~~~ From: Nicolas Spiegelberg You're basically seeing all the KeyValues in that HFile. The format is basically: K: KeyValue.toString() If you look at KeyValue.toString(), you'll see that the format is roughly: row/family:qualifier/timestamp/type/value_length So, it looks like you only have one qualifier per row and each row is roughly ~1500 bytes of data. For the user with the 30K columns per row, you should see an output that contains a ton of lines with the same row. If you grep that row, cut the number after vlen=, and sum the values, you can see the size of your rows on a per-Hfile basis. Very wide rows -- 30M plus -- cause us OOME --- Key: HBASE-3421 URL: https://issues.apache.org/jira/browse/HBASE-3421 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack From the list, see 'jvm oom' in http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/browser, it looks like wide rows -- 30M or so -- causes OOME during compaction. We should check it out. Can the scanner used during compactions use the 'limit' when nexting? If so, this should save our OOME'ing (or, we need to add to the next a max size rather than count of KVs). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
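In the same spirit as the grep/cut/sum suggestion, a small standalone program that sums vlen= per row from the -p output; it assumes the "K: row/family:qualifier/timestamp/type/vlen=N" line format shown above and is only an illustration, not an HBase tool.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads HFile pretty-printer output on stdin and prints total value bytes per row. */
public class RowSizeSummer {
  private static final Pattern LINE = Pattern.compile("^K: ([^/]+)/.*vlen=(\\d+)");

  public static void main(String[] args) throws IOException {
    Map<String, Long> perRow = new HashMap<String, Long>();
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      Matcher m = LINE.matcher(line);
      if (!m.find()) continue;          // skip anything that is not a K: line
      String row = m.group(1);
      long vlen = Long.parseLong(m.group(2));
      Long sum = perRow.get(row);
      perRow.put(row, sum == null ? vlen : sum + vlen);
    }
    for (Map.Entry<String, Long> e : perRow.entrySet()) {
      System.out.println(e.getKey() + "\t" + e.getValue());
    }
  }
}
{code}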
[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3417: - Attachment: HBASE-3417-v1.patch Changes storefile names to be UUIDs. Makes it so we use the same name for the tmp file and the permanent file. Updates a regex which now matches against 32 char word string instead of digits. Changes HFile to use the file name for block cache block names rather than full path. CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HBASE-3423) hbase-env.sh over-rides HBASE_OPTS incorrectly.
hbase-env.sh over-rides HBASE_OPTS incorrectly. --- Key: HBASE-3423 URL: https://issues.apache.org/jira/browse/HBASE-3423 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Ted Dunning Fix For: 0.90.0 conf/hbase-env.sh has the following line: export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" This should be export HBASE_OPTS="$HBASE_OPTS -ea -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Gray updated HBASE-3417: - Attachment: HBASE-3417-v2.patch Makes it so we don't have to parse fileName for each block when doing CacheOnWrite. CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978122#action_12978122 ] stack commented on HBASE-3417: -- As discussed up on IRC, this is not backward compatible: {code} +Pattern.compile("^(\\w{32})(?:\\.(.+))?$"); {code} You can do a range IIRC 20-32 (was old length 20 chars?) The below is a little bit messy: {code} +return new Path(dir, UUID.randomUUID().toString().replaceAll("-", "") + + ((suffix == null || suffix.length() <= 0) ? "" : suffix)); {code} Up on IRC, was thinking we should base64 because then it'd be more compact. See http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string. There is also in hbase util a Base64#encodeBytes method that will take the 128 UUID bits and emit them as base64 (Possible to get it all down to 22 chars). But looking at the base64 vocabulary, http://en.wikipedia.org/wiki/Base64, it includes '+' and '/' which are illegal in a URL, a hdfs filepath. Base32? http://en.wikipedia.org/wiki/Base32? But that won't work either. Has to be multiples of 40 bits. Maybe leave it as it comes out of UUID.toString w/ hyphens. Then it's plain it's a UUID and it's easier to read? CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978124#action_12978124 ] Jonathan Gray commented on HBASE-3417: -- I changed regex to be {{([0-9a-z]+)}} I kind of like how it is. It looks just like the encoded region names used for region directory names. CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
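As an illustration of the scheme being discussed, a sketch that generates hyphen-stripped UUID store file names and matches them with a permissive pattern like the {{([0-9a-z]+)}} above; class and method names are invented for the sketch and this is not the committed patch.

{code}
import java.util.UUID;
import java.util.regex.Pattern;
import org.apache.hadoop.fs.Path;

public class StoreFileNames {
  /** Permissive enough to match old numeric names and new 32-char UUID-hex names. */
  static final Pattern STORE_FILE_NAME = Pattern.compile("^([0-9a-z]+)(?:\\.(.+))?$");

  /** Generate a unique store file path under dir, with an optional suffix (including any dot). */
  static Path uniqueFile(Path dir, String suffix) {
    String name = UUID.randomUUID().toString().replaceAll("-", "");
    if (suffix != null && suffix.length() > 0) {
      name = name + suffix;
    }
    return new Path(dir, name);
  }
}
{code}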
[jira] Commented: (HBASE-3417) CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme
[ https://issues.apache.org/jira/browse/HBASE-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978125#action_12978125 ] Jonathan Gray commented on HBASE-3417: -- Old random file name was using rand.nextLong() so it could be any length >= 1. CacheOnWrite is using the temporary output path for block names, need to use a more consistent block naming scheme -- Key: HBASE-3417 URL: https://issues.apache.org/jira/browse/HBASE-3417 Project: HBase Issue Type: Bug Components: io, regionserver Affects Versions: 0.92.0 Reporter: Jonathan Gray Assignee: Jonathan Gray Priority: Critical Fix For: 0.92.0 Attachments: HBASE-3417-v1.patch, HBASE-3417-v2.patch Currently the block names used in the block cache are built using the filesystem path. However, for cache on write, the path is a temporary output file. The original COW patch actually made some modifications to block naming stuff to make it more consistent but did not do enough. Should add a separate method somewhere for generating block names using some more easily mocked scheme (rather than just raw path as we generate a random unique file name twice, once for tmp and then again when moved into place). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3420) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state
[ https://issues.apache.org/jira/browse/HBASE-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978156#action_12978156 ] Hudson commented on HBASE-3420: --- Integrated in HBase-TRUNK #1705 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1705/]) Handling a big rebalance, we can queue multiple instances of a Close event; messes up state --- Key: HBASE-3420 URL: https://issues.apache.org/jira/browse/HBASE-3420 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: stack Assignee: stack Fix For: 0.90.0 Attachments: 3420.txt This is pretty ugly. In short, on a heavily loaded cluster, we are queuing multiple instances of region close. They all try to run, confusing state. Long version: I have a messy cluster. It's 16k regions on 8 servers. One node has 5k or so regions on it. Heaps are 1G all around. My master had OOME'd. Not sure why but not too worried about it for now. So, new master comes up and is trying to rebalance the cluster: {code} 2011-01-05 00:48:07,385 INFO org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance in 14ms. Moving 3666 regions off of 6 overloaded servers onto 3 less loaded servers {code} The balancer ends up sending many closes to a single overloaded server; the closes are taking so long that the close times out in RIT. We then do this: {code} case CLOSED: LOG.info("Region has been CLOSED for too long, " + "retriggering ClosedRegionHandler"); AssignmentManager.this.executorService.submit( new ClosedRegionHandler(master, AssignmentManager.this, regionState.getRegion())); break; {code} We queue a new close (Should we?). We time out a few more times (9 times) and each time we queue a new close. Eventually the close succeeds, the region gets assigned a new location. Then the next close pops off the eventhandler queue. Here is the telltale signature of stuff gone amiss: {code} 2011-01-05 00:52:19,379 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. state=OPEN, ts=1294188709030 {code} Notice how state is OPEN when we are forcing offline (It was actually just successfully opened). We end up assigning the same server because the plan was still around: {code} 2011-01-05 00:52:20,705 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open of TestTable,0487405776,1294125523541.b1fa38bb610943e9eadc604babe4d041. but already online on this server {code} But later when the plan is cleared, we assign a new server and we have a dbl-assignment. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3423) hbase-env.sh over-rides HBASE_OPTS incorrectly.
[ https://issues.apache.org/jira/browse/HBASE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978157#action_12978157 ] Hudson commented on HBASE-3423: --- Integrated in HBase-TRUNK #1705 (See [https://hudson.apache.org/hudson/job/HBase-TRUNK/1705/]) HBASE-3423 hbase-env.sh overrides HBASE_OPTS incorrectly hbase-env.sh over-rides HBASE_OPTS incorrectly. --- Key: HBASE-3423 URL: https://issues.apache.org/jira/browse/HBASE-3423 Project: HBase Issue Type: Bug Affects Versions: 0.90.0 Reporter: Ted Dunning Fix For: 0.90.0, 0.92.0 conf/hbase-env.sh has the following line: export HBASE_OPTS="-ea -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" This should be export HBASE_OPTS="$HBASE_OPTS -ea -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HBASE-3379) Log splitting slowed by repeated attempts at connecting to downed datanode
[ https://issues.apache.org/jira/browse/HBASE-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12978160#action_12978160 ] Hairong Kuang commented on HBASE-3379: -- Stack, HBASE-3285 should be able to fix the problem by avoiding this code path. This is the solution that our fb internal trunk uses. Log splitting slowed by repeated attempts at connecting to downed datanode -- Key: HBASE-3379 URL: https://issues.apache.org/jira/browse/HBASE-3379 Project: HBase Issue Type: Bug Components: wal Reporter: stack Priority: Critical Testing if I kill RS and DN on a node, log splitting takes longer as we doggedly try connecting to the downed DN to get WAL blocks. Here's the cycle I see: {code} 2010-12-21 17:34:48,239 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_900551257176291912_1203821 failed because recovery from primary datanode 10.20.20.182:10010 failed 5 times.Pipeline was 10.20.20.184:10010, 10.20.20.186:10010, 10.20.20.182:10010. Will retry... 2010-12-21 17:34:50,240 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 0 time(s). 2010-12-21 17:34:51,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 1 time(s). 2010-12-21 17:34:52,241 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 2 time(s). 2010-12-21 17:34:53,242 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 3 time(s). 2010-12-21 17:34:54,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 4 time(s). 2010-12-21 17:34:55,243 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 5 time(s). 2010-12-21 17:34:56,244 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 6 time(s). 2010-12-21 17:34:57,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 7 time(s). 2010-12-21 17:34:58,245 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 8 time(s). 2010-12-21 17:34:59,246 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.20.20.182:10020. Already tried 9 time(s). 2010-12-21 17:34:59,246 WARN org.apache.hadoop.hdfs.DFSClient: Failed recovery attempt #5 from primary datanode 10.20.20.182:10010 java.net.ConnectException: Call to /10.20.20.182:10020 failed on connection exception: java.net.ConnectException: Connection refused at org.apache.hadoop.ipc.Client.wrapException(Client.java:767) at org.apache.hadoop.ipc.Client.call(Client.java:743) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy8.getProtocolVersion(Unknown Source) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346) at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383) ... {code} because recovery from primary datanode is done 5 times (hardcoded). Within these retries we'll do {code} this.maxRetries = conf.getInt("ipc.client.connect.max.retries", 10); {code} The hardcoding of 5 attempts we should get fixed and we should doc the ipc.client.connect.max.retries as important config. We should recommend bringing it down from default. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
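As an illustration only, lowering that retry knob programmatically; the key is the standard Hadoop IPC setting quoted above, while the value 3 and the in-code setting are just for demonstration (in a deployment this would go in core-site.xml or hbase-site.xml).

{code}
import org.apache.hadoop.conf.Configuration;

/** Demonstration of setting the IPC connect retry count; 3 is an example value,
 *  not a recommendation from this issue. */
public class LowerIpcRetries {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setInt("ipc.client.connect.max.retries", 3);
    System.out.println("ipc.client.connect.max.retries = "
        + conf.getInt("ipc.client.connect.max.retries", 10));
  }
}
{code}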