[jira] [Created] (HBASE-3830) dumb JVM figure out a deadlock on hbase
dumb JVM figure out a deadlock on hbase --- Key: HBASE-3830 URL: https://issues.apache.org/jira/browse/HBASE-3830 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.1 Reporter: zhoushuaifeng

Found one Java-level deadlock:
==============================
IPC Server handler 9 on 60020:
  waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by IPC Server handler 7 on 60020
IPC Server handler 7 on 60020:
  waiting for ownable synchronizer 0x7fe7cbb06228, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by regionserver60020.cacheFlusher
regionserver60020.cacheFlusher:
  waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by IPC Server handler 7 on 60020

Java stack information for the threads listed above:
====================================================
IPC Server handler 9 on 60020:
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java)
  - waiting to lock 0x7fe7cbacbd48 (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
IPC Server handler 7 on 60020:
  at sun.misc.Unsafe.$$YJP$$park(Native Method)
  - parking to wait for 0x7fe7cbb06228 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at sun.misc.Unsafe.park(Unsafe.java)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
  - locked 0x7fe7cbacbd48 (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
regionserver60020.cacheFlusher:
  at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
  - waiting to lock 0x7fe7cbacbd48 (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
  at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
  at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
  at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
  at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
  at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
  at java.security.AccessController.doPrivileged(AccessController.java)
  at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
  at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
  at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
  at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
  at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
  at java.util.TimeZone.getDisplayName(TimeZone.java:350)
  at java.util.Date.toString(Date.java:1025)
  at
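The report above is a classic lock-ordering deadlock: handler 7 enters the synchronized {{reclaimMemStoreMemory()}} (taking the {{MemStoreFlusher}} monitor) and then parks on a {{ReentrantLock}}, while the cacheFlusher thread holds that {{ReentrantLock}} and blocks trying to enter a synchronized block on the same {{MemStoreFlusher}}. A minimal, self-contained sketch of the same pattern, with hypothetical class and method names rather than the actual HBase code:

{code}
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDeadlock {
  private final ReentrantLock flushLock = new ReentrantLock();

  // Analogue of the IPC handler path: object monitor first, then ReentrantLock.
  public synchronized void reclaim() {
    flushLock.lock(); // parks here if the flusher thread owns flushLock
    try { /* wait for enough flushes */ } finally { flushLock.unlock(); }
  }

  // Analogue of the cacheFlusher path: ReentrantLock first, then object monitor.
  public void flushLoop() {
    flushLock.lock();
    try {
      synchronized (this) { /* blocks if reclaim() owns the monitor */ }
    } finally {
      flushLock.unlock();
    }
  }

  public static void main(String[] args) {
    final LockOrderDeadlock d = new LockOrderDeadlock();
    new Thread(new Runnable() { public void run() { for (;;) d.reclaim(); } }, "ipc-handler").start();
    new Thread(new Runnable() { public void run() { for (;;) d.flushLoop(); } }, "cacheFlusher").start();
    // With unlucky timing each thread ends up holding the lock the other
    // wants, and jstack reports a Java-level deadlock like the one above.
  }
}
{code}

The usual fix is to make both paths acquire the two locks in the same order, or to avoid holding the monitor while acquiring the lock.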
[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration
[ https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027084#comment-13027084 ] Karthick Sankarachary commented on HBASE-3777: -- {quote}The mapping should really be cluster uuid (if such a thing exists) to connection. Perhaps there's a hmaster md5 that can be used in lieu of cluster-uuid sitting in ZK that can be probed?{quote} The thing is that a {{HConnection}}'s behavior is determined not just by the server-side cluster it goes against, but also its client-side properties, such as hbase.client.retries.number, hbase.client.prefetch.limit, and so on. Ergo, we really need a different connection for every unique set of connection-specific config properties, whether it be client- or server-specific. {quote}Perhaps there's a hmaster md5 that can be used in lieu of cluster-uuid sitting in ZK that can be probed?{quote} As per the [ZK/HBase use cases|http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases] wiki, in theory we can have multiple masters registered with the ZK (to eliminate any SPOFs perhaps?). So, I'm not sure we can presuppose what hmaster we'll be going to at any given point in time. {quote}Then, an alternative other way is to go ahead and make the extra connection and use it to determine which cluster the client is going against. If it's a previously-seen cluster, close this newly-created connection, and use the stashed one. Else this is a new cluster and create a new mapping entry.{quote} The whole purpose of this patch was to reduce the number of connections by reusing them to the extent possible. At one point, the config's {{equals}} method was treated as the key to the connection, which promoted reuse to some extent, but started breaking down if the config was changed after the fact. Currently, the config's identity (object reference) is treated as the key, but that suffers from connection overload. Hopefully, the {{HConnectionKey}} defined in the HCM will serve as a happy medium between the two ends of the spectrum. Redefine Identity Of HBase Configuration Key: HBASE-3777 URL: https://issues.apache.org/jira/browse/HBASE-3777 Project: HBase Issue Type: Improvement Components: client, ipc Affects Versions: 0.90.2 Reporter: Karthick Sankarachary Assignee: Karthick Sankarachary Priority: Minor Fix For: 0.92.0 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, HBASE-3777.patch Judging from the javadoc in {{HConnectionManager}}, sharing connections across multiple clients going to the same cluster is supposedly a good thing. However, the fact that there is a one-to-one mapping between a configuration and connection instance, kind of works against that goal. Specifically, when you create {{HTable}} instances using a given {{Configuration}} instance and a copy thereof, we end up with two distinct {{HConnection}} instances under the covers. Is this really expected behavior, especially given that the configuration instance gets cloned a lot? Here, I'd like to play devil's advocate and propose that we deep-compare {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} instances that have the same properties map to the same {{HConnection}} instance. In case one is concerned that a single {{HConnection}} is insufficient for sharing amongst clients, to quote the javadoc, then one should be able to mark a given {{HBaseConfiguration}} instance as being uniquely identifiable. 
Note that sharing connections makes clean up of {{HConnection}} instances a little awkward, unless of course, you apply the change described in HBASE-3766. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
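The {{HConnectionKey}} idea above amounts to keying the connection cache on only those configuration properties that actually affect connection behavior, rather than on the {{Configuration}} object's identity or its full contents. A minimal sketch of such a key, assuming a hand-picked property list (the list and class shape are illustrative, not the committed {{HConnectionKey}}):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

final class ConnectionKey {
  // Only properties that affect connection behavior participate in
  // equals()/hashCode(), so equivalent Configuration copies share one
  // connection. This list is an illustrative assumption.
  private static final String[] CONNECTION_PROPERTIES = {
    "hbase.zookeeper.quorum",
    "hbase.zookeeper.property.clientPort",
    "hbase.client.retries.number",
    "hbase.client.prefetch.limit",
  };

  private final Map<String, String> props = new HashMap<String, String>();

  ConnectionKey(Configuration conf) {
    for (String name : CONNECTION_PROPERTIES) {
      String value = conf.get(name);
      if (value != null) {
        props.put(name, value);
      }
    }
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof ConnectionKey && props.equals(((ConnectionKey) o).props);
  }

  @Override
  public int hashCode() {
    return props.hashCode();
  }
}
{code}

Two {{Configuration}} copies with equal values for those properties then share one connection, while configs that differ in any connection-relevant property get their own.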
[jira] [Created] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs
docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs -- Key: HBASE-3831 URL: https://issues.apache.org/jira/browse/HBASE-3831 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor To improve readability... regionserver, region server == RegionServer datanode, data node == DataNode zookeeper == ZooKeeper -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs
[ https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3831: - Attachment: book_HBASE_3831.xml.patch docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs -- Key: HBASE-3831 URL: https://issues.apache.org/jira/browse/HBASE-3831 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_3831.xml.patch, configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch To improve readability... regionserver, region server == RegionServer datanode, data node == DataNode zookeeper == ZooKeeper -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs
[ https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3831: - Attachment: performance_HBASE_3831.xml.patch docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs -- Key: HBASE-3831 URL: https://issues.apache.org/jira/browse/HBASE-3831 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_3831.xml.patch, configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch To improve readability... regionserver, region server == RegionServer datanode, data node == DataNode zookeeper == ZooKeeper -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs
[ https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3831: - Attachment: troubleshooting_HBASE_3831.xml.patch docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs -- Key: HBASE-3831 URL: https://issues.apache.org/jira/browse/HBASE-3831 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_3831.xml.patch, configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch To improve readability... regionserver, region server == RegionServer datanode, data node == DataNode zookeeper == ZooKeeper -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs
[ https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3831: - Attachment: getting_started_HBASE_3831.xml.patch docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs -- Key: HBASE-3831 URL: https://issues.apache.org/jira/browse/HBASE-3831 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_3831.xml.patch, configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch To improve readability... regionserver, region server == RegionServer datanode, data node == DataNode zookeeper == ZooKeeper -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs
[ https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Meil updated HBASE-3831: - Attachment: configuration_HBASE_3831.xml.patch docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs -- Key: HBASE-3831 URL: https://issues.apache.org/jira/browse/HBASE-3831 Project: HBase Issue Type: Improvement Reporter: Doug Meil Assignee: Doug Meil Priority: Minor Attachments: book_HBASE_3831.xml.patch, configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch To improve readability... regionserver, region server == RegionServer datanode, data node == DataNode zookeeper == ZooKeeper -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3830) MemStoreFlusher deadlock detected by JVM
[ https://issues.apache.org/jira/browse/HBASE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-3830: -- Summary: MemStoreFlusher deadlock detected by JVM (was: dumb JVM figure out a deadlock on hbase) MemStoreFlusher deadlock detected by JVM Key: HBASE-3830 URL: https://issues.apache.org/jira/browse/HBASE-3830 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.90.1 Reporter: zhoushuaifeng

Found one Java-level deadlock:
==============================
IPC Server handler 9 on 60020:
  waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by IPC Server handler 7 on 60020
IPC Server handler 7 on 60020:
  waiting for ownable synchronizer 0x7fe7cbb06228, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by regionserver60020.cacheFlusher
regionserver60020.cacheFlusher:
  waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by IPC Server handler 7 on 60020

Java stack information for the threads listed above:
====================================================
IPC Server handler 9 on 60020:
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java)
  - waiting to lock 0x7fe7cbacbd48 (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
IPC Server handler 7 on 60020:
  at sun.misc.Unsafe.$$YJP$$park(Native Method)
  - parking to wait for 0x7fe7cbb06228 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at sun.misc.Unsafe.park(Unsafe.java)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
  at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
  - locked 0x7fe7cbacbd48 (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
  at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
regionserver60020.cacheFlusher:
  at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
  - waiting to lock 0x7fe7cbacbd48 (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
  at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
  at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
  at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
  at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
  at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
  at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
  at java.security.AccessController.doPrivileged(AccessController.java)
  at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
  at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
  at
[jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles
[ https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027178#comment-13027178 ] jirapos...@reviews.apache.org commented on HBASE-3721: --

--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/572/ ---

(Updated 2011-04-29 20:48:41.082584)

Review request for hbase and Todd Lipcon.

Changes
---
Simplified the changes for this JIRA according to Todd's review. TestLoadIncrementalHFiles and TestHFileOutputFormat pass.

Summary
---
I refactored LoadIncrementalHFiles so that tryLoad() queues work items in a List<ServerCallable<Void>>. doBulkLoad() periodically sends batches of ServerCallables to the HBase cluster. I added the following method to HConnection/HConnectionManager:

public <T> void getRegionServerWithRetries(ExecutorService pool, List<ServerCallable<T>> callables, Object[] results)

This method uses a thread pool to send multiple ServerCallables through getRegionServerWithRetries(ServerCallable<T> callable). I introduced two new config parameters: hbase.loadincremental.threads.max and hbase.loadincremental.batch.size. hbase.loadincremental.batch.size configures the batch size above which HConnection.getRegionServerWithRetries() is called. In Adam's case, there are many small HFiles; LoadIncrementalHFiles shouldn't wait until all HFiles have been scanned. hbase.loadincremental.threads.max controls the maximum number of threads in the thread pool.

This addresses bug HBASE-3721. https://issues.apache.org/jira/browse/HBASE-3721

Diffs (updated)
---
/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 1097897

Diff: https://reviews.apache.org/r/572/diff

Testing
---
TestLoadIncrementalHFiles and TestHFileOutputFormat pass.

Thanks, Ted

Speedup LoadIncrementalHFiles - Key: HBASE-3721 URL: https://issues.apache.org/jira/browse/HBASE-3721 Project: HBase Issue Type: Improvement Components: util Reporter: Ted Yu Assignee: Ted Yu Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt From Adam Phelps: from the logs it looks like 1% of the hfiles we're loading have to be split. Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually thinking our problem is that this code loads the hfiles sequentially. Our largest table has over 2500 regions and the data being loaded is fairly well distributed across them, so there end up being around 2500 HFiles for each load period. At 1-2 seconds per HFile that means the loading process is very time consuming. Currently server.bulkLoadHFile() is a blocking call. We can utilize ExecutorService to achieve better parallelism on multi-core computers. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
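To make the batching above concrete, here is a minimal sketch of the submit-in-batches pattern the review describes. The two config names come from the review text; everything else (class names, the fake HFile discovery) is a simplified stand-in, not the actual patch:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedBulkLoad {
  public static void main(String[] args) throws Exception {
    int maxThreads = 8;  // cf. hbase.loadincremental.threads.max
    int batchSize = 32;  // cf. hbase.loadincremental.batch.size
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);

    List<Callable<Void>> pending = new ArrayList<Callable<Void>>();
    for (final String hfile : discoverHFiles()) {
      pending.add(new Callable<Void>() {
        public Void call() {
          // stand-in for the blocking server.bulkLoadHFile(...) call
          System.out.println("loading " + hfile);
          return null;
        }
      });
      // Flush a batch as soon as it fills up instead of waiting for
      // every HFile to be scanned first.
      if (pending.size() >= batchSize) {
        for (Future<Void> f : pool.invokeAll(pending)) f.get();
        pending.clear();
      }
    }
    for (Future<Void> f : pool.invokeAll(pending)) f.get(); // final partial batch
    pool.shutdown();
  }

  private static List<String> discoverHFiles() {
    List<String> files = new ArrayList<String>();
    for (int i = 0; i < 100; i++) files.add("hfile-" + i);
    return files;
  }
}
{code}

Flushing each batch as soon as it fills keeps region servers busy while the remaining HFiles are still being scanned, which is where the sequential version lost its time.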
[jira] [Resolved] (HBASE-3796) Per-Store Entries in Compaction Queue
[ https://issues.apache.org/jira/browse/HBASE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Spiegelberg resolved HBASE-3796. Resolution: Fixed Fix Version/s: 0.92.0 +1. Peer reviewed; applied Karthik's fix. Per-Store Entries in Compaction Queue - Key: HBASE-3796 URL: https://issues.apache.org/jira/browse/HBASE-3796 Project: HBase Issue Type: Bug Reporter: Nicolas Spiegelberg Assignee: Karthik Ranganathan Priority: Minor Fix For: 0.92.0 Attachments: HBASE-3796-fixed.patch, HBASE-3796.patch Although compaction is decided on a per-store basis, right now the CompactSplitThread only deals at the Region level for queueing. Store-level compaction queue entries will give us more visibility into compaction workload + allow us to stop summarizing priorities. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-3832) Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins
Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins Key: HBASE-3832 URL: https://issues.apache.org/jira/browse/HBASE-3832 Project: HBase Issue Type: Bug Reporter: stack

Root region is stuck in RIT. Seems to be because of this:

3316 2011-04-29 05:53:11,941 WARN [Thread-642-EventThread] master.AssignmentManager(518): Received OPENED for region 70236052/-ROOT- from server vesta.apache.org,57336,1304056370834 but region was in the state null and not in expected PENDING_OPEN or OPENING states

Later I see this:

3334 2011-04-29 05:53:12,014 DEBUG [Master:0;vesta.apache.org,36450,1304056384388] master.AssignmentManager(260): Found REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}=vesta.apache.org,57336,1304056370834 in RITs

The former makes it so we don't clear a successfully opened -ROOT- from RIT, so we get the second line, and then the test fails with:

5192 2011-04-29 05:55:42,181 DEBUG [Thread-642] zookeeper.ZKAssign(815): ZK RIT - 70236052

printed over and over again. I don't get why the data is null in the zk when the RS has updated it a couple of times. I see we do a double regionOnline in master code. This clears in-memory master state. I don't think this is it, but will commit this and more logging to help w/ the debug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-3832) Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins
[ https://issues.apache.org/jira/browse/HBASE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-3832: - Attachment: 3832.txt Removes an extraneous regionOnline. Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins Key: HBASE-3832 URL: https://issues.apache.org/jira/browse/HBASE-3832 Project: HBase Issue Type: Bug Reporter: stack Attachments: 3832.txt

Root region is stuck in RIT. Seems to be because of this:

3316 2011-04-29 05:53:11,941 WARN [Thread-642-EventThread] master.AssignmentManager(518): Received OPENED for region 70236052/-ROOT- from server vesta.apache.org,57336,1304056370834 but region was in the state null and not in expected PENDING_OPEN or OPENING states

Later I see this:

3334 2011-04-29 05:53:12,014 DEBUG [Master:0;vesta.apache.org,36450,1304056384388] master.AssignmentManager(260): Found REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}=vesta.apache.org,57336,1304056370834 in RITs

The former makes it so we don't clear a successfully opened -ROOT- from RIT, so we get the second line, and then the test fails with:

5192 2011-04-29 05:55:42,181 DEBUG [Thread-642] zookeeper.ZKAssign(815): ZK RIT - 70236052

printed over and over again. I don't get why the data is null in the zk when the RS has updated it a couple of times. I see we do a double regionOnline in master code. This clears in-memory master state. I don't think this is it, but will commit this and more logging to help w/ the debug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3832) Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins
[ https://issues.apache.org/jira/browse/HBASE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027235#comment-13027235 ] stack commented on HBASE-3832: -- Committed patch. Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins Key: HBASE-3832 URL: https://issues.apache.org/jira/browse/HBASE-3832 Project: HBase Issue Type: Bug Reporter: stack Attachments: 3832.txt

Root region is stuck in RIT. Seems to be because of this:

3316 2011-04-29 05:53:11,941 WARN [Thread-642-EventThread] master.AssignmentManager(518): Received OPENED for region 70236052/-ROOT- from server vesta.apache.org,57336,1304056370834 but region was in the state null and not in expected PENDING_OPEN or OPENING states

Later I see this:

3334 2011-04-29 05:53:12,014 DEBUG [Master:0;vesta.apache.org,36450,1304056384388] master.AssignmentManager(260): Found REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '10', TTL => '2147483647', BLOCKSIZE => '8192', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}}=vesta.apache.org,57336,1304056370834 in RITs

The former makes it so we don't clear a successfully opened -ROOT- from RIT, so we get the second line, and then the test fails with:

5192 2011-04-29 05:55:42,181 DEBUG [Thread-642] zookeeper.ZKAssign(815): ZK RIT - 70236052

printed over and over again. I don't get why the data is null in the zk when the RS has updated it a couple of times. I see we do a double regionOnline in master code. This clears in-memory master state. I don't think this is it, but will commit this and more logging to help w/ the debug. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3827) hbase-1502, removing heartbeats, broke master joining a running cluster and was returning master hostname for rs to use
[ https://issues.apache.org/jira/browse/HBASE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027254#comment-13027254 ] Hudson commented on HBASE-3827: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) hbase-1502, removing heartbeats, broke master joining a running cluster and was returning master hostname for rs to use --- Key: HBASE-3827 URL: https://issues.apache.org/jira/browse/HBASE-3827 Project: HBase Issue Type: Bug Affects Versions: 0.92.0 Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: 3827.txt A couple of silly issues in hbase-1502 turned up by cluster testing TRUNK. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3629) Update our thrift to 0.6
[ https://issues.apache.org/jira/browse/HBASE-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027257#comment-13027257 ] Hudson commented on HBASE-3629: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Update our thrift to 0.6 Key: HBASE-3629 URL: https://issues.apache.org/jira/browse/HBASE-3629 Project: HBase Issue Type: Task Reporter: stack Assignee: Moaz Reyad Fix For: 0.92.0 Attachments: HBASE-3629.patch.zip, pom.diff HBASE-3117 was about updating to 0.5. Moaz Reyad over in that issue is trying to move us to 0.6. Let's move the 0.6 upgrade effort here. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1921) When the Master's session times out and there's only one, cluster is wedged
[ https://issues.apache.org/jira/browse/HBASE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027258#comment-13027258 ] Hudson commented on HBASE-1921: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) When the Master's session times out and there's only one, cluster is wedged --- Key: HBASE-1921 URL: https://issues.apache.org/jira/browse/HBASE-1921 Project: HBase Issue Type: Bug Affects Versions: 0.20.1 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.20.2, 0.90.0 Attachments: HBASE-1921-trunk.patch, HBASE-1921.patch On IRC, some fella had a session expiration on his Master and had only one. Maybe in this case the Master should first try to re-get the znode? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3805) Log RegionState that are processed too late in the master
[ https://issues.apache.org/jira/browse/HBASE-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027256#comment-13027256 ] Hudson commented on HBASE-3805: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Log RegionState that are processed too late in the master -- Key: HBASE-3805 URL: https://issues.apache.org/jira/browse/HBASE-3805 Project: HBase Issue Type: Improvement Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Minor Fix For: 0.90.3 Attachments: HBASE-3805.patch Working on all the weird delayed processing in the master, I saw that it was hard to figure out when a zookeeper event is processed too late. For example, cases where the processing of the events gets too slow and the master takes more than a minute after the event is triggered in the region server to get to its processing. We should at least print that out. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
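The improvement boils down to stamping each event and comparing that stamp against the clock when the master finally handles it. A minimal sketch of such a check, with illustrative names (this is not the attached patch):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class LateEventCheck {
  private static final Log LOG = LogFactory.getLog(LateEventCheck.class);
  private static final long LATENESS_WARN_MS = 60 * 1000; // "more than a minute"

  /** Call at the top of the handler with the time the event was created. */
  public static void warnIfLate(String region, long eventCreatedMs) {
    long latenessMs = System.currentTimeMillis() - eventCreatedMs;
    if (latenessMs > LATENESS_WARN_MS) {
      LOG.warn("Handling event for " + region + " " + latenessMs
          + "ms after it was created; the master may be falling behind");
    }
  }
}
{code}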
[jira] [Commented] (HBASE-3794) TestRpcMetrics fails on machine where region server is running
[ https://issues.apache.org/jira/browse/HBASE-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027262#comment-13027262 ] Hudson commented on HBASE-3794: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) TestRpcMetrics fails on machine where region server is running -- Key: HBASE-3794 URL: https://issues.apache.org/jira/browse/HBASE-3794 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.90.2 Reporter: Ted Yu Assignee: Alex Newman Fix For: 0.90.3 Attachments: HBASE-3794.patch Since the whole test suite takes over an hour to run, I ran the tests on Linux where a region server is running. Here is the consistent TestRpcMetrics failure I saw:

{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.196 sec FAILURE!
testCustomMetrics(org.apache.hadoop.hbase.regionserver.TestRpcMetrics)  Time elapsed: 0.079 sec  ERROR!
java.net.BindException: Problem binding to /10.202.50.107:60020 : Address already in use
	at org.apache.hadoop.hbase.ipc.HBaseServer.bind(HBaseServer.java:216)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.<init>(HBaseServer.java:283)
	at org.apache.hadoop.hbase.ipc.HBaseServer.<init>(HBaseServer.java:1189)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.<init>(WritableRpcEngine.java:266)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:233)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getServer(HBaseRPC.java:379)
	at org.apache.hadoop.hbase.ipc.HBaseRPC.getServer(HBaseRPC.java:368)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:336)
	at org.apache.hadoop.hbase.regionserver.TestRpcMetrics$TestRegionServer.<init>(TestRpcMetrics.java:58)
	at org.apache.hadoop.hbase.regionserver.TestRpcMetrics.testCustomMetrics(TestRpcMetrics.java:119)
{code}

-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
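The failure is a fixed-port collision: the test constructs a server on 60020 while a live region server already owns that port. A common remedy in tests, sketched here only as one possible approach (not necessarily what the attached patch does), is to let the OS pick a free ephemeral port:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class FreePort {
  /** Returns a currently unused port chosen by the OS. */
  public static int pickFreePort() throws IOException {
    ServerSocket ss = new ServerSocket();
    try {
      ss.bind(new InetSocketAddress(0)); // port 0 asks the OS for an ephemeral port
      return ss.getLocalPort();
    } finally {
      ss.close();
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("test server could bind to port " + pickFreePort());
  }
}
{code}

There is still a small window between releasing the probe socket and the test server binding, but in practice it removes collisions with long-running daemons like the one above.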
[jira] [Commented] (HBASE-3773) Set ZK max connections much higher in 0.90
[ https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027264#comment-13027264 ] Hudson commented on HBASE-3773: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Set ZK max connections much higher in 0.90 -- Key: HBASE-3773 URL: https://issues.apache.org/jira/browse/HBASE-3773 Project: HBase Issue Type: Improvement Affects Versions: 0.90.2 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.90.3 I think by now we can all acknowledge that 0.90 has an issue with ZK connections, in that we create too many of them and it's also too easy for our users to shoot themselves in the foot. For 0.90.3, I think we should change the default configuration of 30 that we ship with and set it much much higher, I'm thinking of 32k. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions
[ https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027260#comment-13027260 ] Hudson commented on HBASE-1512: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Coprocessors: Support aggregate functions - Key: HBASE-1512 URL: https://issues.apache.org/jira/browse/HBASE-1512 Project: HBase Issue Type: Sub-task Components: coprocessors Reporter: stack Assignee: Himanshu Vashishtha Fix For: 0.92.0 Attachments: 1512.zip, AggregateCpProtocol.java, AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, patch-1512-9.txt, patch-1512.txt Chatting with jgray and holstad at the kitchen table about counts, sums, and other aggregating facility, facility generally where you want to calculate some meta info on your table, it seems like it wouldn't be too hard making a filter type that could run a function server-side and return the result ONLY of the aggregation or whatever. For example, say you just want to count rows, currently you scan, server returns all data to client and count is done by client counting up row keys. A bunch of time and resources have been wasted returning data that we're not interested in. With this new filter type, the counting would be done server-side and then it would make up a new result that was the count only (kinda like mysql when you ask it to count, it returns a 'table' with a count column whose value is count of rows). We could have it so the count was just done per region and return that. Or we could maybe make a small change in scanner too so that it aggregated the per-region counts. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
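For a sense of where the attached patches (AggregationClient.java, ColumnInterpreter.java) ended up, here is a client-side usage sketch of the server-side row count. Class locations and signatures reflect what eventually shipped in 0.92 and should be read as assumptions in the context of this discussion; the table must have the aggregate coprocessor loaded:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class ServerSideCount {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    AggregationClient aggregationClient = new AggregationClient(conf);
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("f")); // the family whose rows we count
    // Each region counts its own rows and only the partial counts cross the
    // wire, instead of every row coming back to be counted client-side.
    long rows = aggregationClient.rowCount(Bytes.toBytes("mytable"),
        new LongColumnInterpreter(), scan);
    System.out.println("row count: " + rows);
  }
}
{code}

This is exactly the saving the description asks for: the aggregation runs where the data lives, and the client only merges per-region results.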
[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart
[ https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027259#comment-13027259 ] Hudson commented on HBASE-3674: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart -- Key: HBASE-3674 URL: https://issues.apache.org/jira/browse/HBASE-3674 Project: HBase Issue Type: Bug Components: wal Reporter: stack Assignee: stack Priority: Critical Fix For: 0.90.2 Attachments: 3674-distributed.txt, 3674-v2.txt, 3674.txt In short, a ChecksumException will fail log processing for a server so we skip out w/o archiving logs. On restart, we'll then reprocess the logs -- hit the checksumexception anew, usually -- and so on. Here is the splitLog method (edited):

{code}
private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {
  outputSink.startWriterThreads(entryBuffers);
  try {
    int i = 0;
    for (FileStatus log : logfiles) {
      Path logPath = log.getPath();
      long logLength = log.getLen();
      splitSize += logLength;
      LOG.debug("Splitting hlog " + (i++ + 1) + " of " + logfiles.length +
          ": " + logPath + ", length=" + logLength);
      try {
        recoverFileLease(fs, logPath, conf);
        parseHLog(log, entryBuffers, fs, conf);
        processedLogs.add(logPath);
      } catch (EOFException eof) {
        // truncated files are expected if a RS crashes (see HBASE-2643)
        LOG.info("EOF from hlog " + logPath + ". Continuing");
        processedLogs.add(logPath);
      } catch (FileNotFoundException fnfe) {
        // A file may be missing if the region server was able to archive it
        // before shutting down. This means the edits were persisted already
        LOG.info("A log was missing " + logPath +
            ", probably because it was moved by the " +
            "now dead region server. Continuing");
        processedLogs.add(logPath);
      } catch (IOException e) {
        // If the IOE resulted from bad file format,
        // then this problem is idempotent and retrying won't help
        if (e.getCause() instanceof ParseException ||
            e.getCause() instanceof ChecksumException) {
          LOG.warn("ParseException from hlog " + logPath + ". continuing");
          processedLogs.add(logPath);
        } else {
          if (skipErrors) {
            LOG.info("Got while parsing hlog " + logPath +
                ". Marking as corrupted", e);
            corruptedLogs.add(logPath);
          } else {
            throw e;
          }
        }
      }
    }
    if (fs.listStatus(srcDir).length > processedLogs.size() + corruptedLogs.size()) {
      throw new OrphanHLogAfterSplitException(
          "Discovered orphan hlog after split. Maybe the " +
          "HRegionServer was not dead when we started");
    }
    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
  } finally {
    splits = outputSink.finishWritingAndClose();
  }
  return splits;
}
{code}

Notice how we'll only archive logs if we successfully split all logs. We won't archive 31 of 35 files if we happen to get a checksum exception on file 32. I think we should treat a ChecksumException the same as a ParseException; a retry will not fix it if HDFS could not get around the ChecksumException (seems like in our case all replicas were corrupt).

Here is a play-by-play from the logs:

{code}
813572 2011-03-18 20:31:44,687 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 35: hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481, length=15065662
813573 2011-03-18 20:31:44,687 INFO org.apache.hadoop.hbase.util.FSUtils: Recovering file hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481
813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: b[0, 512]=00cd00502037383661376439656265643938636463343433386132343631323633303239371d6170695f6163636573735f746f6b656e5f73746174735f6275636b6574000d9fa4d5dc012ec9c7cbaf000001006d005d0008002337626262663764626431616561366234616130656334383436653732333132643a32390764656661756c746170695f616e64726f69645f6c6f67676564
{code}
[jira] [Commented] (HBASE-3819) TestSplitLogWorker has too many SLWs running -- makes for contention and occasional failures
[ https://issues.apache.org/jira/browse/HBASE-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027267#comment-13027267 ] Hudson commented on HBASE-3819: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) TestSplitLogWorker has too many SLWs running -- makes for contention and occasional failures Key: HBASE-3819 URL: https://issues.apache.org/jira/browse/HBASE-3819 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.92.0 Attachments: tslw.patch I noticed that TSPLW has a background SLW running. Sometimes it wins the race for tasks messing up tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-3741) Make HRegionServer aware of the regions it's opening/closing
[ https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027265#comment-13027265 ] Hudson commented on HBASE-3741: --- Integrated in HBase-TRUNK #1888 (See [https://builds.apache.org/hudson/job/HBase-TRUNK/1888/]) Make HRegionServer aware of the regions it's opening/closing Key: HBASE-3741 URL: https://issues.apache.org/jira/browse/HBASE-3741 Project: HBase Issue Type: Bug Affects Versions: 0.90.1 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Priority: Blocker Fix For: 0.90.3 Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch, HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch This is a serious issue about a race between regions being opened and closed in region servers. We had this situation where the master tried to unassign a region for balancing, failed, force unassigned it, force assigned it somewhere else, failed to open it on another region server (took too long), and then reassigned it back to the original region server. A few seconds later, the region server processed the first close and the region was left unassigned. This is from the master log:

{quote}
2011-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, regions=574, usedHeap=3918, maxHeap=6973) for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_CLOSE, ts=1302041477758
2011-04-05 15:12:10,021 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_CLOSE for too long, running forced unassign again on region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
...
2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=CLOSED, ts=1302041685733
2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:6-0x42ec2cece810b68 Creating (or updating) unassigned node for 1470298961 with OFFLINE state
...
2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961; plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196
2011-04-05 15:14:45,885 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg40,60020,1302041218196
2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_OPEN, ts=1302041700944
2011-04-05 15:15:39,410 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 state=PENDING_OPEN, ts=1302041700944
...
2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 so generated a random one; hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961, src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) available servers
2011-04-05 15:15:39,410 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961 to sv4borg42,60020,1300920459477
2011-04-05 15:15:40,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
{quote}
[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration
[ https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027281#comment-13027281 ] M. C. Srivas commented on HBASE-3777: - bq. The thing is that a HConnection's behavior is determined not just by the server-side cluster it goes against, but also its client-side properties, such as hbase.client.retries.number, hbase.client.prefetch.limit, and so on. Ergo, we really need a different connection for every unique set of connection-specific config properties, whether it be client- or server-specific. I am beginning to understand the reasons behind taking this approach. Thanks for explaining. bq. As per the ZK/HBase use cases wiki, in theory we can have multiple masters registered with the ZK (to eliminate any SPOFs perhaps?). So, I'm not sure we can presuppose what hmaster we'll be going to at any given point in time. Even in the presence of multiple hmasters, does it really matter if we connect back to the same hmaster? It probably is important for the hmasters themselves which hmaster they connect to (and perhaps for region-servers as well). But it should not matter for clients. Agree? (of course, I am stating all this without knowing any details about Hbase, so don't kill me for it). bq. The whole purpose of this patch was to reduce the number of connections by reusing them to the extent possible. At one point, the config's equals method was treated as the key to the connection, which promoted reuse to some extent, but started breaking down if the config was changed after the fact. Currently, the config's identity (object reference) is treated as the key, but that suffers from connection overload. Hopefully, the HConnectionKey defined in the HCM will serve as a happy medium between the two ends of the spectrum. Ted Yu pointed out the work being done here, so I started reading the JIRA. I am not familiar with where/how the HConnection instance gets used, and this JIRA was pretty long to understand with the code changes and all. I started to comment on this Jira due to the problems we faced trying to scale up the YCSB benchmark. We tried to run about 500 threads in the YCSB HBase client, and ran out of connections to ZK. It was a complete, unexpected, surprise that the HBase client needed to maintain multiple connections to ZK, and it seemed to be using one per thread (ie, per HTable). We share the same goal: with this patch, we hope to be able to scale YCSB to 50 client machines, with 500 threads per client, and see how HBase holds up. Would you agree, that in the long run, the HBase client should use ZK only to find the hmaster and region-servers, but not keep the connection to ZK open? Otherwise ZK may go under as we try to scale the number of HBase clients. Redefine Identity Of HBase Configuration Key: HBASE-3777 URL: https://issues.apache.org/jira/browse/HBASE-3777 Project: HBase Issue Type: Improvement Components: client, ipc Affects Versions: 0.90.2 Reporter: Karthick Sankarachary Assignee: Karthick Sankarachary Priority: Minor Fix For: 0.92.0 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, HBASE-3777.patch Judging from the javadoc in {{HConnectionManager}}, sharing connections across multiple clients going to the same cluster is supposedly a good thing. However, the fact that there is a one-to-one mapping between a configuration and connection instance, kind of works against that goal. 
Specifically, when you create {{HTable}} instances using a given {{Configuration}} instance and a copy thereof, we end up with two distinct {{HConnection}} instances under the covers. Is this really expected behavior, especially given that the configuration instance gets cloned a lot? Here, I'd like to play devil's advocate and propose that we deep-compare {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} instances that have the same properties map to the same {{HConnection}} instance. In case one is concerned that a single {{HConnection}} is insufficient for sharing amongst clients, to quote the javadoc, then one should be able to mark a given {{HBaseConfiguration}} instance as being uniquely identifiable. Note that sharing connections makes clean up of {{HConnection}} instances a little awkward, unless of course, you apply the change described in HBASE-3766. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
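A small sketch of the client pattern at issue: many worker threads, as in YCSB, each opening an {{HTable}}. If every thread passes the same {{Configuration}} (or, with this patch, any {{Configuration}} whose connection-relevant properties are equal), they share one underlying {{HConnection}} and one ZooKeeper session instead of one per thread. The table name and thread count here are illustrative:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class SharedConnectionClient {
  public static void main(String[] args) {
    final Configuration conf = HBaseConfiguration.create(); // one shared config
    for (int i = 0; i < 500; i++) {
      new Thread(new Runnable() {
        public void run() {
          try {
            // HTable itself is not thread-safe, so each thread gets its own,
            // but all of them can share the connection underneath.
            HTable table = new HTable(conf, "usertable");
            // ... do gets/puts ...
            table.close();
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}
{code}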
[jira] [Created] (HBASE-3833) ability to support includes/excludes list in Hbase
ability to support includes/excludes list in Hbase -- Key: HBASE-3833 URL: https://issues.apache.org/jira/browse/HBASE-3833 Project: HBase Issue Type: Improvement Components: client, regionserver Reporter: dhruba borthakur Assignee: dhruba borthakur An HBase cluster currently does not have the ability to specify that the master should accept regionservers only from a specified list. This would help prevent administrative errors where the same machine could be included in two clusters. It would also allow the administrator to easily remove un-ssh-able machines from the cluster. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
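HDFS handles the same problem with its dfs.hosts / dfs.hosts.exclude files. A hedged sketch of what the corresponding master-side check could look like; every name here (class, file contents, semantics) is a hypothetical illustration, not an existing HBase API:

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class HostsFileChecker {
  private final Set<String> includes = new HashSet<String>();
  private final Set<String> excludes = new HashSet<String>();

  public HostsFileChecker(String includesFile, String excludesFile) throws IOException {
    readInto(includesFile, includes);
    readInto(excludesFile, excludes);
  }

  /** Master-side gate: would this regionserver be allowed to register? */
  public boolean isAllowed(String hostname) {
    if (excludes.contains(hostname)) return false;             // explicit exclude wins
    return includes.isEmpty() || includes.contains(hostname);  // empty include list allows all
  }

  private static void readInto(String file, Set<String> hosts) throws IOException {
    BufferedReader in = new BufferedReader(new FileReader(file));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        line = line.trim();
        if (!line.isEmpty()) hosts.add(line);
      }
    } finally {
      in.close();
    }
  }
}
{code}

An excluded host is refused even if it also appears in the include list, and an empty include list means accept everyone, mirroring the HDFS behavior.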