[jira] [Created] (HBASE-3830) dumb JVM figure out a deadlock on hbase

2011-04-29 Thread zhoushuaifeng (JIRA)
dumb JVM figure out a deadlock on hbase
---

 Key: HBASE-3830
 URL: https://issues.apache.org/jira/browse/HBASE-3830
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.1
Reporter: zhoushuaifeng


Found one Java-level deadlock:
=
IPC Server handler 9 on 60020:
  waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by IPC Server handler 7 on 60020
IPC Server handler 7 on 60020:
  waiting for ownable synchronizer 0x7fe7cbb06228, (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by regionserver60020.cacheFlusher
regionserver60020.cacheFlusher:
  waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
  which is held by IPC Server handler 7 on 60020

Java stack information for the threads listed above:
===
IPC Server handler 9 on 60020:
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java)
- waiting to lock 0x7fe7cbacbd48 (a 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
IPC Server handler 7 on 60020:
at sun.misc.Unsafe.$$YJP$$park(Native Method)
- parking to wait for  0x7fe7cbb06228 (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
at sun.misc.Unsafe.park(Unsafe.java)
at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
at 
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
at 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
- locked 0x7fe7cbacbd48 (a 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
regionserver60020.cacheFlusher:
at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
- waiting to lock 0x7fe7cbacbd48 (a 
org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
at 
java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
at java.security.AccessController.$$YJP$$doPrivileged(Native 
Method)
at 
java.security.AccessController.doPrivileged(AccessController.java)
at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
at 
sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
at 
sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
at 
sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
at java.util.TimeZone.getDisplayName(TimeZone.java:350)
at java.util.Date.toString(Date.java:1025)
at 
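The cycle reported at the top of the dump is an inconsistent lock-ordering problem: an IPC handler holds the MemStoreFlusher monitor while waiting on the flusher's ReentrantLock, and the cacheFlusher thread holds that ReentrantLock while waiting on the same monitor. A minimal sketch of the same shape, with made-up class and method names (not the actual HBase code):

{code}
// Hypothetical illustration of the cycle: monitor -> ReentrantLock on one path,
// ReentrantLock -> monitor on the other.
import java.util.concurrent.locks.ReentrantLock;

public class FlusherLockCycleSketch {
    private final ReentrantLock flushLock = new ReentrantLock();

    // IPC-handler path: takes the object monitor first, then the ReentrantLock.
    public synchronized void reclaimMemory() {
        flushLock.lock();               // blocks while the flusher holds flushLock
        try {
            // ... queue a flush ...
        } finally {
            flushLock.unlock();
        }
    }

    // Flusher path: takes the ReentrantLock first, then the object monitor.
    public void flushOne() {
        flushLock.lock();
        try {
            wakeWaiters();              // blocks while a handler holds the monitor
        } finally {
            flushLock.unlock();
        }
    }

    private synchronized void wakeWaiters() {
        notifyAll();
    }
}
{code}

Taking the two locks in one consistent order, or releasing the monitor before acquiring the ReentrantLock, would break the cycle.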

[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-29 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027084#comment-13027084
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

{quote}The mapping should really be cluster uuid (if such a thing exists) to 
connection. Perhaps there's a hmaster md5 that can be used in lieu of 
cluster-uuid sitting in ZK that can be probed?{quote}

The thing is that a {{HConnection}}'s behavior is determined not just by the 
server-side cluster it goes against, but also its client-side properties, such 
as hbase.client.retries.number, hbase.client.prefetch.limit, and so on. 
Ergo, we really need a different connection for every unique set of 
connection-specific config properties, whether it be client- or server-specific.

{quote}Perhaps there's a hmaster md5 that can be used in lieu of cluster-uuid 
sitting in ZK that can be probed?{quote}
As per the [ZK/HBase use 
cases|http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases] wiki, in theory we 
can have multiple masters registered with the ZK (to eliminate any SPOFs 
perhaps?). So, I'm not sure we can presuppose what hmaster we'll be going to at 
any given point in time.

{quote}Then, an alternative other way is to go ahead and make the extra 
connection and use it to determine which cluster the client is going against. 
If it's a previously-seen cluster, close this newly-created connection, and use 
the stashed one. Else this is a new cluster and create a new mapping 
entry.{quote}
The whole purpose of this patch was to reduce the number of connections by 
reusing them to the extent possible. At one point, the config's {{equals}} 
method was treated as the key to the connection, which promoted reuse to some 
extent, but started breaking down if the config was changed after the fact. 
Currently, the config's identity (object reference) is treated as the key, but 
that suffers from connection overload. Hopefully, the {{HConnectionKey}} 
defined in the HCM will serve as a happy medium between the two ends of the 
spectrum.
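As a rough illustration of that middle ground, such a key could compare the values of a fixed set of connection-relevant properties rather than the Configuration object's identity; a sketch under assumed property names, not the actual HConnectionKey code:

{code}
// Hedged sketch: key connections on the values of selected properties.
import java.util.Map;
import java.util.TreeMap;

public final class ConnectionKeySketch {
    // Assumed, illustrative list of properties that shape connection behavior.
    private static final String[] CONNECTION_PROPERTIES = {
        "hbase.zookeeper.quorum",
        "hbase.client.retries.number",
        "hbase.client.prefetch.limit"
    };

    private final Map<String, String> properties = new TreeMap<String, String>();

    public ConnectionKeySketch(Map<String, String> conf) {
        for (String name : CONNECTION_PROPERTIES) {
            properties.put(name, conf.get(name));   // null if unset
        }
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ConnectionKeySketch
            && properties.equals(((ConnectionKeySketch) other).properties);
    }

    @Override
    public int hashCode() {
        return properties.hashCode();
    }
}
{code}

Two configuration copies with the same values then map to the same connection, while changing a connection-specific property still forces a separate one.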

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, 
 HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, 
 HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs

2011-04-29 Thread Doug Meil (JIRA)
docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in 
several xml docs
--

 Key: HBASE-3831
 URL: https://issues.apache.org/jira/browse/HBASE-3831
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


To improve readability...

regionserver, region server  == RegionServer
datanode, data node  == DataNode
zookeeper == ZooKeeper





[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs

2011-04-29 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3831:
-

Attachment: book_HBASE_3831.xml.patch

 docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in 
 several xml docs
 --

 Key: HBASE-3831
 URL: https://issues.apache.org/jira/browse/HBASE-3831
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3831.xml.patch, 
 configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, 
 performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch


 To improve readability...
 regionserver, region server  == RegionServer
 datanode, data node  == DataNode
 zookeeper == ZooKeeper



[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs

2011-04-29 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3831:
-

Attachment: performance_HBASE_3831.xml.patch

 docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in 
 several xml docs
 --

 Key: HBASE-3831
 URL: https://issues.apache.org/jira/browse/HBASE-3831
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3831.xml.patch, 
 configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, 
 performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch


 To improve readability...
 regionserver, region server  == RegionServer
 datanode, data node  == DataNode
 zookeeper == ZooKeeper



[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs

2011-04-29 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3831:
-

Attachment: troubleshooting_HBASE_3831.xml.patch

 docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in 
 several xml docs
 --

 Key: HBASE-3831
 URL: https://issues.apache.org/jira/browse/HBASE-3831
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3831.xml.patch, 
 configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, 
 performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch


 To improve readability...
 regionserver, region server  == RegionServer
 datanode, data node  == DataNode
 zookeeper == ZooKeeper



[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs

2011-04-29 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3831:
-

Attachment: getting_started_HBASE_3831.xml.patch

 docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in 
 several xml docs
 --

 Key: HBASE-3831
 URL: https://issues.apache.org/jira/browse/HBASE-3831
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3831.xml.patch, 
 configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, 
 performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch


 To improve readability...
 regionserver, region server  == RegionServer
 datanode, data node  == DataNode
 zookeeper == ZooKeeper



[jira] [Updated] (HBASE-3831) docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in several xml docs

2011-04-29 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3831:
-

Attachment: configuration_HBASE_3831.xml.patch

 docbook xml files - standardized RegionServer, DataNode, and ZooKeeper in 
 several xml docs
 --

 Key: HBASE-3831
 URL: https://issues.apache.org/jira/browse/HBASE-3831
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3831.xml.patch, 
 configuration_HBASE_3831.xml.patch, getting_started_HBASE_3831.xml.patch, 
 performance_HBASE_3831.xml.patch, troubleshooting_HBASE_3831.xml.patch


 To improve readability...
 regionserver, region server  == RegionServer
 datanode, data node  == DataNode
 zookeeper == ZooKeeper



[jira] [Updated] (HBASE-3830) MemStoreFlusher deadlock detected by JVM

2011-04-29 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-3830:
--

Summary: MemStoreFlusher deadlock detected by JVM  (was: dumb JVM figure 
out a deadlock on hbase)

 MemStoreFlusher deadlock detected by JVM
 

 Key: HBASE-3830
 URL: https://issues.apache.org/jira/browse/HBASE-3830
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.1
Reporter: zhoushuaifeng

 Found one Java-level deadlock:
 =
 IPC Server handler 9 on 60020:
   waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
   which is held by IPC Server handler 7 on 60020
 IPC Server handler 7 on 60020:
   waiting for ownable synchronizer 0x7fe7cbb06228, (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync),
   which is held by regionserver60020.cacheFlusher
 regionserver60020.cacheFlusher:
   waiting to lock monitor 0x409f3908 (object 0x7fe7cbacbd48, a 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
   which is held by IPC Server handler 7 on 60020
 Java stack information for the threads listed above:
 ===
 IPC Server handler 9 on 60020:
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java)
 - waiting to lock 0x7fe7cbacbd48 (a 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
 IPC Server handler 7 on 60020:
 at sun.misc.Unsafe.$$YJP$$park(Native Method)
 - parking to wait for  0x7fe7cbb06228 (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync)
 at sun.misc.Unsafe.park(Unsafe.java)
 at 
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
 at 
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
 at 
 java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
 at 
 java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
 at 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
 - locked 0x7fe7cbacbd48 (a 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2558)
 at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
 regionserver60020.cacheFlusher:
 at 
 java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
 - waiting to lock 0x7fe7cbacbd48 (a 
 org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
 at 
 java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
 at 
 java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
 at 
 java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
 at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
 at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
 at java.security.AccessController.$$YJP$$doPrivileged(Native 
 Method)
 at 
 java.security.AccessController.doPrivileged(AccessController.java)
 at 
 sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
 at 
 sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
 at 
 

[jira] [Commented] (HBASE-3721) Speedup LoadIncrementalHFiles

2011-04-29 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027178#comment-13027178
 ] 

jirapos...@reviews.apache.org commented on HBASE-3721:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/572/
---

(Updated 2011-04-29 20:48:41.082584)


Review request for hbase and Todd Lipcon.


Changes
---

Simplified the changes for this JIRA according to Todd's review.
TestLoadIncrementalHFiles and TestHFileOutputFormat pass.


Summary
---

I refactored LoadIncrementalHFiles so that tryLoad() queues work items in 
List<ServerCallable<Void>>. doBulkLoad() periodically sends a batch of 
ServerCallables to the HBase cluster.
I added the following method to HConnection/HConnectionManager:
public <T> void getRegionServerWithRetries(ExecutorService pool,
List<ServerCallable<T>> callables, Object[] results)
This method uses a thread pool to send multiple ServerCallables through 
getRegionServerWithRetries(ServerCallable<T> callable).

I introduced two new config parameters: hbase.loadincremental.threads.max and 
hbase.loadincremental.batch.size.
hbase.loadincremental.batch.size configures the batch size above which 
HConnection.getRegionServerWithRetries() is called. In Adam's case, 
there are many small HFiles; LoadIncrementalHFiles shouldn't wait until all 
HFiles have been scanned.
hbase.loadincremental.threads.max controls the maximum number of threads in 
thread pool.
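A rough sketch of the batching idea, with plain java.util.concurrent types standing in for ServerCallable (names and values here are illustrative, not the patch itself):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BulkLoadBatchSketch {
    // Fan a batch of work items out to the pool and block until all complete.
    public static <T> void runBatch(ExecutorService pool,
                                    List<Callable<T>> callables,
                                    Object[] results) throws Exception {
        List<Future<T>> futures = new ArrayList<Future<T>>();
        for (Callable<T> callable : callables) {
            futures.add(pool.submit(callable));
        }
        for (int i = 0; i < futures.size(); i++) {
            results[i] = futures.get(i).get();   // rethrows any worker failure
        }
    }

    public static void main(String[] args) throws Exception {
        // e.g. hbase.loadincremental.threads.max = 4 (illustrative value)
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Void>> batch = new ArrayList<Callable<Void>>();
        Object[] results = new Object[batch.size()];
        runBatch(pool, batch, results);
        pool.shutdown();
    }
}
{code}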


This addresses bug HBASE-3721.
https://issues.apache.org/jira/browse/HBASE-3721


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
1097897 

Diff: https://reviews.apache.org/r/572/diff


Testing
---

TestLoadIncrementalHFiles and TestHFileOutputFormat pass.


Thanks,

Ted



 Speedup LoadIncrementalHFiles
 -

 Key: HBASE-3721
 URL: https://issues.apache.org/jira/browse/HBASE-3721
 Project: HBase
  Issue Type: Improvement
  Components: util
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 3721-v2.txt, 3721-v3.txt, 3721-v4.txt, 3721.txt


 From Adam Phelps:
 from the logs it looks like 1% of the hfiles we're loading have to be split. 
  Looking at the code for LoadIncrementalHFiles (hbase v0.90.1), I'm actually 
 thinking our problem is that this code loads the hfiles sequentially.  Our 
 largest table has over 2500 regions and the data being loaded is fairly well 
 distributed across them, so there end up being around 2500 HFiles for each 
 load period.  At 1-2 seconds per HFile that means the loading process is very 
 time consuming.
 Currently server.bulkLoadHFile() is a blocking call.
 We can utilize ExecutorService to achieve better parallelism on multi-core 
 computers.



[jira] [Resolved] (HBASE-3796) Per-Store Entries in Compaction Queue

2011-04-29 Thread Nicolas Spiegelberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg resolved HBASE-3796.


   Resolution: Fixed
Fix Version/s: 0.92.0

+1 Peer reviewed & applied Karthik's fix

 Per-Store Entries in Compaction Queue
 -

 Key: HBASE-3796
 URL: https://issues.apache.org/jira/browse/HBASE-3796
 Project: HBase
  Issue Type: Bug
Reporter: Nicolas Spiegelberg
Assignee: Karthik Ranganathan
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3796-fixed.patch, HBASE-3796.patch


 Although compaction is decided on a per-store basis, right now the 
 CompactSplitThread only deals at the Region level for queueing.  Store-level 
 compaction queue entries will give us more visibility into compaction 
 workload + allow us to stop summarizing priorities.



[jira] [Created] (HBASE-3832) Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins

2011-04-29 Thread stack (JIRA)
Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins


 Key: HBASE-3832
 URL: https://issues.apache.org/jira/browse/HBASE-3832
 Project: HBase
  Issue Type: Bug
Reporter: stack


Root region is stuck in RIT.

Seems to be because of this:

3316 2011-04-29 05:53:11,941 WARN  [Thread-642-EventThread] 
master.AssignmentManager(518): Received OPENED for region 70236052/-ROOT- from 
server vesta.apache.org,57336,1304056370834 but region was in the state null 
and not in expected PENDING_OPEN or OPENING states

Later I see this:

3334 2011-04-29 05:53:12,014 DEBUG 
[Master:0;vesta.apache.org,36450,1304056384388] master.AssignmentManager(260): 
Found REGION = {NAME = '-ROOT-,,0', STARTKEY = '', ENDKEY = '', ENCODED = 
70236052,   TABLE = {{NAME = '-ROOT-', IS_ROOT = 'true', IS_META = 
'true', FAMILIES = [{NAME = 'info', BLOOMFILTER = 'NONE', REPLICATION_SCOPE 
= '0', COMPRESSION = 'NONE', VERSIONS = '10', TTL = '2147483647', BLOCKSIZE 
= '8192', IN_MEMORY = 'true', BLOCKCACHE = 
'true'}]}}=vesta.apache.org,57336,1304056370834 in RITs

The former makes it so we don't clear a successfully opened -ROOT- from RIT so 
we get the second line and then the test fails with:

5192 2011-04-29 05:55:42,181 DEBUG [Thread-642] zookeeper.ZKAssign(815): ZK RIT 
- 70236052

printed over and over again.

I don't get why the data is null in the zk when RS has updated it a couple of 
times.  I see we do a double regionOnline in master code.  This clears 
in-memory master state.  I don't think this is it, but will commit this and more 
logging to help w/ the debug.



[jira] [Updated] (HBASE-3832) Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins

2011-04-29 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3832:
-

Attachment: 3832.txt

Removes an extraneous regionOnline.

 Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on 
 jenkins
 

 Key: HBASE-3832
 URL: https://issues.apache.org/jira/browse/HBASE-3832
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 3832.txt


 Root region is stuck in RIT.
 Seems to be because of this:
 3316 2011-04-29 05:53:11,941 WARN  [Thread-642-EventThread] 
 master.AssignmentManager(518): Received OPENED for region 70236052/-ROOT- 
 from server vesta.apache.org,57336,1304056370834 but region was in the state 
 null and not in expected PENDING_OPEN or OPENING states
 Later I see this:
 3334 2011-04-29 05:53:12,014 DEBUG 
 [Master:0;vesta.apache.org,36450,1304056384388] 
 master.AssignmentManager(260): Found REGION = {NAME = '-ROOT-,,0', STARTKEY 
 = '', ENDKEY = '', ENCODED = 70236052,   TABLE = {{NAME = '-ROOT-', 
 IS_ROOT = 'true', IS_META = 'true', FAMILIES = [{NAME = 'info', 
 BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', 
 VERSIONS = '10', TTL = '2147483647', BLOCKSIZE = '8192', IN_MEMORY = 
 'true', BLOCKCACHE = 'true'}]}}=vesta.apache.org,57336,1304056370834 in RITs
 The former makes it so we don't clear a successfully opened -ROOT- from RIT 
 so we get the second line and then the test fails with:
 5192 2011-04-29 05:55:42,181 DEBUG [Thread-642] zookeeper.ZKAssign(815): ZK 
 RIT - 70236052
 printed over and over again.
 I don't get why the data is null in the zk when RS has updated it a couple of 
 times.  I see we do a double regionOnline in master code.  This clears 
 in-memory master state.  I don't think this is it, but will commit this and more 
 logging to help w/ the debug.



[jira] [Commented] (HBASE-3832) Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on jenkins

2011-04-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027235#comment-13027235
 ] 

stack commented on HBASE-3832:
--

Committed patch.

 Failing TestMasterFailover.testMasterFailoverWithMockedRITOnDeadRS up on 
 jenkins
 

 Key: HBASE-3832
 URL: https://issues.apache.org/jira/browse/HBASE-3832
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 3832.txt


 Root region is stuck in RIT.
 Seems to be because of this:
 3316 2011-04-29 05:53:11,941 WARN  [Thread-642-EventThread] 
 master.AssignmentManager(518): Received OPENED for region 70236052/-ROOT- 
 from server vesta.apache.org,57336,1304056370834 but region was in the state 
 null and not in expected PENDING_OPEN or OPENING states
 Later I see this:
 3334 2011-04-29 05:53:12,014 DEBUG 
 [Master:0;vesta.apache.org,36450,1304056384388] 
 master.AssignmentManager(260): Found REGION = {NAME = '-ROOT-,,0', STARTKEY 
 = '', ENDKEY = '', ENCODED = 70236052,   TABLE = {{NAME = '-ROOT-', 
 IS_ROOT = 'true', IS_META = 'true', FAMILIES = [{NAME = 'info', 
 BLOOMFILTER = 'NONE', REPLICATION_SCOPE = '0', COMPRESSION = 'NONE', 
 VERSIONS = '10', TTL = '2147483647', BLOCKSIZE = '8192', IN_MEMORY = 
 'true', BLOCKCACHE = 'true'}]}}=vesta.apache.org,57336,1304056370834 in RITs
 The former makes it so we don't clear a successfully opened -ROOT- from RIT 
 so we get the second line and then the test fails with:
 5192 2011-04-29 05:55:42,181 DEBUG [Thread-642] zookeeper.ZKAssign(815): ZK 
 RIT - 70236052
 printed over and over again.
 I don't get why the data is null in the zk when RS has updated it a couple of 
 times.  I see we do a double regionOnline in master code.  This clears 
 in-memory master state.  I don't think this is it, but will commit this and more 
 logging to help w/ the debug.



[jira] [Commented] (HBASE-3827) hbase-1502, removing heartbeats, broke master joining a running cluster and was returning master hostname for rs to use

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027254#comment-13027254
 ] 

Hudson commented on HBASE-3827:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 hbase-1502, removing heartbeats, broke master joining a running cluster and 
 was returning master hostname for rs to use
 ---

 Key: HBASE-3827
 URL: https://issues.apache.org/jira/browse/HBASE-3827
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.0
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: 3827.txt


 A couple of silly issues in hbase-1502 turned up by cluster testing TRUNK.



[jira] [Commented] (HBASE-3629) Update our thrift to 0.6

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027257#comment-13027257
 ] 

Hudson commented on HBASE-3629:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 Update our thrift to 0.6
 

 Key: HBASE-3629
 URL: https://issues.apache.org/jira/browse/HBASE-3629
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Moaz Reyad
 Fix For: 0.92.0

 Attachments: HBASE-3629.patch.zip, pom.diff


 HBASE-3117 was about updating to 0.5.  Moaz Reyad over in that issue is 
 trying to move us to 0.6.  Let's move the 0.6 upgrade effort here.



[jira] [Commented] (HBASE-1921) When the Master's session times out and there's only one, cluster is wedged

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027258#comment-13027258
 ] 

Hudson commented on HBASE-1921:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 When the Master's session times out and there's only one, cluster is wedged
 ---

 Key: HBASE-1921
 URL: https://issues.apache.org/jira/browse/HBASE-1921
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.20.2, 0.90.0

 Attachments: HBASE-1921-trunk.patch, HBASE-1921.patch


 On IRC, some fella had a session expiration on his Master and had only one. 
 Maybe in this case the Master should first try to re-get the znode?



[jira] [Commented] (HBASE-3805) Log RegionState that are processed too late in the master

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027256#comment-13027256
 ] 

Hudson commented on HBASE-3805:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 Log RegionState that are processed too late in the master 
 --

 Key: HBASE-3805
 URL: https://issues.apache.org/jira/browse/HBASE-3805
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.90.3

 Attachments: HBASE-3805.patch


 Working on all the weird delayed processing in the master, I saw that it was 
 hard to figure out when a zookeeper event is processed too late. For example, 
 there are cases where the processing of events gets too slow and the master takes 
 more than a minute after the event is triggered in the region server to get 
 to its processing.
 We should at least print that out.
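 A tiny illustration of the kind of logging being asked for (names and threshold are assumptions, not the attached patch):

{code}
public class LateEventLogger {
    // Assumed threshold matching the "more than a minute" case described above.
    private static final long TOO_LATE_MS = 60 * 1000L;

    public static void logIfLate(long eventCreatedMs, String regionState) {
        long lagMs = System.currentTimeMillis() - eventCreatedMs;
        if (lagMs > TOO_LATE_MS) {
            System.err.println("Processing " + regionState + " " + lagMs
                + " ms after the region server triggered it");
        }
    }
}
{code}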



[jira] [Commented] (HBASE-3794) TestRpcMetrics fails on machine where region server is running

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027262#comment-13027262
 ] 

Hudson commented on HBASE-3794:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 TestRpcMetrics fails on machine where region server is running
 --

 Key: HBASE-3794
 URL: https://issues.apache.org/jira/browse/HBASE-3794
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.90.2
Reporter: Ted Yu
Assignee: Alex Newman
 Fix For: 0.90.3

 Attachments: HBASE-3794.patch


 Since the whole test suite takes over an hour to run, I ran the tests on Linux where 
 a region server is running.
 Here is the consistent TestRpcMetrics failure I saw: 
 {code}
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.196 sec  
 FAILURE!
 testCustomMetrics(org.apache.hadoop.hbase.regionserver.TestRpcMetrics)  Time 
 elapsed: 0.079 sec   ERROR!
 java.net.BindException: Problem binding to /10.202.50.107:60020 : Address 
 already in use
 at org.apache.hadoop.hbase.ipc.HBaseServer.bind(HBaseServer.java:216)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.<init>(HBaseServer.java:283)
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer.<init>(HBaseServer.java:1189)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.<init>(WritableRpcEngine.java:266)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:233)
 at 
 org.apache.hadoop.hbase.ipc.WritableRpcEngine.getServer(WritableRpcEngine.java:46)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getServer(HBaseRPC.java:379)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getServer(HBaseRPC.java:368)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:336)
 at 
 org.apache.hadoop.hbase.regionserver.TestRpcMetrics$TestRegionServer.<init>(TestRpcMetrics.java:58)
 at 
 org.apache.hadoop.hbase.regionserver.TestRpcMetrics.testCustomMetrics(TestRpcMetrics.java:119)
 {code}
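 One common way to keep such a test from colliding with a live daemon is to bind to an ephemeral port; a hedged illustration of the general technique, not the attached HBASE-3794 patch:

{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class EphemeralPortSketch {
    // Binding to port 0 lets the OS pick a free port, so the test never
    // contends with a region server already listening on 60020.
    public static int pickFreePort() throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress("127.0.0.1", 0));
            return socket.getLocalPort();
        } finally {
            socket.close();
        }
    }
}
{code}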



[jira] [Commented] (HBASE-3773) Set ZK max connections much higher in 0.90

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027264#comment-13027264
 ] 

Hudson commented on HBASE-3773:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 Set ZK max connections much higher in 0.90
 --

 Key: HBASE-3773
 URL: https://issues.apache.org/jira/browse/HBASE-3773
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3


 I think by now we can all acknowledge that 0.90 has an issue with ZK 
 connections, in that we create too many of them and it's also too easy for 
 our users to shoot themselves in the foot.
 For 0.90.3, I think we should change the default configuration of 30 that we 
 ship with and set it much, much higher; I'm thinking of 32k.
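 For reference, the cap in question is presumably the ZooKeeper maxClientCnxns setting, which HBase forwards to a managed quorum via hbase.zookeeper.property.*; a small illustrative way to set it from code (the usual place would be hbase-site.xml), with 32768 as the figure floated above:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ZkMaxConnectionsExample {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // hbase.zookeeper.property.* values are passed through to the
        // HBase-managed ZooKeeper quorum.
        conf.setInt("hbase.zookeeper.property.maxClientCnxns", 32768);
        System.out.println(conf.getInt("hbase.zookeeper.property.maxClientCnxns", 30));
    }
}
{code}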



[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027260#comment-13027260
 ] 

Hudson commented on HBASE-1512:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
Assignee: Himanshu Vashishtha
 Fix For: 0.92.0

 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 addendum_1512.txt, patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, 
 patch-1512-5.txt, patch-1512-6.txt, patch-1512-7.txt, patch-1512-8.txt, 
 patch-1512-9.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facilities, generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
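 A toy sketch of the per-region counting idea, with hypothetical interfaces rather than the coprocessor API this issue eventually added:

{code}
import java.util.List;

public class RowCountSketch {
    /** Hypothetical per-region endpoint: count rows server-side, return one long. */
    interface RegionCountEndpoint {
        long countRows();
    }

    // The client only sums small partial results instead of pulling every row.
    public static long totalRows(List<RegionCountEndpoint> regions) {
        long total = 0;
        for (RegionCountEndpoint region : regions) {
            total += region.countRows();
        }
        return total;
    }
}
{code}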



[jira] [Commented] (HBASE-3674) Treat ChecksumException as we would a ParseException splitting logs; else we replay split on every restart

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027259#comment-13027259
 ] 

Hudson commented on HBASE-3674:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 Treat ChecksumException as we would a ParseException splitting logs; else we 
 replay split on every restart
 --

 Key: HBASE-3674
 URL: https://issues.apache.org/jira/browse/HBASE-3674
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.2

 Attachments: 3674-distributed.txt, 3674-v2.txt, 3674.txt


 In short, a ChecksumException will fail log processing for a server so we 
 skip out w/o archiving logs.  On restart, we'll then reprocess the logs -- 
 hit the checksumexception anew, usually -- and so on.
 Here is the splitLog method (edited):
 {code}
   private List<Path> splitLog(final FileStatus[] logfiles) throws IOException {

     outputSink.startWriterThreads(entryBuffers);

     try {
       int i = 0;
       for (FileStatus log : logfiles) {
         Path logPath = log.getPath();
         long logLength = log.getLen();
         splitSize += logLength;
         LOG.debug("Splitting hlog " + (i++ + 1) + " of " + logfiles.length
             + ": " + logPath + ", length=" + logLength);
         try {
           recoverFileLease(fs, logPath, conf);
           parseHLog(log, entryBuffers, fs, conf);
           processedLogs.add(logPath);
         } catch (EOFException eof) {
           // truncated files are expected if a RS crashes (see HBASE-2643)
           LOG.info("EOF from hlog " + logPath + ". Continuing");
           processedLogs.add(logPath);
         } catch (FileNotFoundException fnfe) {
           // A file may be missing if the region server was able to archive it
           // before shutting down. This means the edits were persisted already
           LOG.info("A log was missing " + logPath +
               ", probably because it was moved by the " +
               "now dead region server. Continuing");
           processedLogs.add(logPath);
         } catch (IOException e) {
           // If the IOE resulted from bad file format,
           // then this problem is idempotent and retrying won't help
           if (e.getCause() instanceof ParseException ||
               e.getCause() instanceof ChecksumException) {
             LOG.warn("ParseException from hlog " + logPath + ".  continuing");
             processedLogs.add(logPath);
           } else {
             if (skipErrors) {
               LOG.info("Got while parsing hlog " + logPath +
                   ". Marking as corrupted", e);
               corruptedLogs.add(logPath);
             } else {
               throw e;
             }
           }
         }
       }
       if (fs.listStatus(srcDir).length > processedLogs.size()
           + corruptedLogs.size()) {
         throw new OrphanHLogAfterSplitException(
             "Discovered orphan hlog after split. Maybe the "
             + "HRegionServer was not dead when we started");
       }
       archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
     } finally {
       splits = outputSink.finishWritingAndClose();
     }
     return splits;
   }
 {code}
 Notice how we'll only archive logs if we successfully split all logs.  
 We won't archive 31 of 35 files if we happen to get a checksum exception on 
 file 32.
 I think we should treat a ChecksumException the same as a ParseException; a 
 retry will not fix it if HDFS could not get around the ChecksumException 
 (seems like in our case all replicas were corrupt).
 Here is a play-by-play from the logs:
 {code}
 813572 2011-03-18 20:31:44,687 DEBUG 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Splitting hlog 34 of 
 35: 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481,
  length=15065662
 813573 2011-03-18 20:31:44,687 INFO 
 org.apache.hadoop.hbase.util.FSUtils: Recovering file 
 hdfs://sv2borg170:9000/hbase/.logs/sv2borg182,60020,1300384550664/sv2borg182%3A60020.1300461329481
 
 813617 2011-03-18 20:31:46,238 INFO org.apache.hadoop.fs.FSInputChecker: 
 Found checksum error: b[0, 
 512]=00cd00502037383661376439656265643938636463343433386132343631323633303239371d6170695f6163636573735f746f6b656e5f7374

 6174735f6275636b6574000d9fa4d5dc012ec9c7cbaf000001006d005d0008002337626262663764626431616561366234616130656334383436653732333132643a32390764656661756c746170695f616e64726f69645f6c6f67676564

 

[jira] [Commented] (HBASE-3819) TestSplitLogWorker has too many SLWs running -- makes for contention and occasional failures

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027267#comment-13027267
 ] 

Hudson commented on HBASE-3819:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 TestSplitLogWorker has too many SLWs running -- makes for contention and 
 occasional failures
 

 Key: HBASE-3819
 URL: https://issues.apache.org/jira/browse/HBASE-3819
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.92.0

 Attachments: tslw.patch


 I noticed that TSPLW has a background SLW running.  Sometimes it wins the 
  race for tasks, messing up tests.



[jira] [Commented] (HBASE-3741) Make HRegionServer aware of the regions it's opening/closing

2011-04-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027265#comment-13027265
 ] 

Hudson commented on HBASE-3741:
---

Integrated in HBase-TRUNK #1888 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1888/])


 Make HRegionServer aware of the regions it's opening/closing
 

 Key: HBASE-3741
 URL: https://issues.apache.org/jira/browse/HBASE-3741
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.1
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3

 Attachments: HBASE-3741-rsfix-v2.patch, HBASE-3741-rsfix-v3.patch, 
 HBASE-3741-rsfix.patch, HBASE-3741-trunk.patch


 This is a serious issue about a race between regions being opened and closed 
 in region servers. We had this situation where the master tried to unassign a 
 region for balancing, failed, force unassigned it, force assigned it 
 somewhere else, failed to open it on another region server (took too long), 
 and then reassigned it back to the original region server. A few seconds 
 later, the region server processed the first close and the region was left 
 unassigned.
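 Before the log excerpt below, a hedged sketch of what the title asks for: tracking in-flight opens and closes so a stale close for a region that is being (re)opened can be refused. Names here are made up, not the attached patch:

{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class RegionTransitionTracker {
    private final Set<String> opening =
        Collections.synchronizedSet(new HashSet<String>());
    private final Set<String> closing =
        Collections.synchronizedSet(new HashSet<String>());

    public boolean startOpening(String encodedRegionName) {
        return opening.add(encodedRegionName);
    }

    public void finishOpening(String encodedRegionName) {
        opening.remove(encodedRegionName);
    }

    // Refuse a close that races with an in-flight open of the same region.
    public boolean requestClose(String encodedRegionName) {
        if (opening.contains(encodedRegionName)) {
            return false;
        }
        return closing.add(encodedRegionName);
    }

    public void finishClosing(String encodedRegionName) {
        closing.remove(encodedRegionName);
    }
}
{code}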
 This is from the master log:
 {quote}
 11-04-05 15:11:17,758 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
 Sent CLOSE to serverName=sv4borg42,60020,1300920459477, load=(requests=187, 
 regions=574, usedHeap=3918, maxHeap=6973) for region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
 2011-04-05 15:12:10,021 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=PENDING_CLOSE, ts=1302041477758
 2011-04-05 15:12:10,021 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_CLOSE for too long, running forced unassign again on 
 region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
 ...
 2011-04-05 15:14:45,783 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=CLOSED, ts=1302041685733
 2011-04-05 15:14:45,783 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x42ec2cece810b68 Creating (or updating) unassigned node for 
 1470298961 with OFFLINE state
 ...
 2011-04-05 15:14:45,885 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for 
 region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961;
  
 plan=hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961,
  src=sv4borg42,60020,1300920459477, dest=sv4borg40,60020,1302041218196
 2011-04-05 15:14:45,885 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  to sv4borg40,60020,1302041218196
 2011-04-05 15:15:39,410 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed 
 out:  
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=PENDING_OPEN, ts=1302041700944
 2011-04-05 15:15:39,410 INFO 
 org.apache.hadoop.hbase.master.AssignmentManager: Region has been 
 PENDING_OPEN for too long, reassigning 
 region=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
 2011-04-05 15:15:39,410 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  state=PENDING_OPEN, ts=1302041700944
 ...
 2011-04-05 15:15:39,410 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan 
 was found (or we are ignoring an existing plan) for 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  so generated a random one; 
 hri=stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961,
  src=, dest=sv4borg42,60020,1300920459477; 19 (online=19, exclude=null) 
 available servers
 2011-04-05 15:15:39,410 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Assigning region 
 stumbles_by_userid2,\x00'\x8E\xE8\x7F\xFF\xFE\xE7\xA9\x97\xFC\xDF\x01\x10\xCC6,1266566087256.1470298961
  to sv4borg42,60020,1300920459477
 2011-04-05 15:15:40,951 DEBUG 
 org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
 

[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-29 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027281#comment-13027281
 ] 

M. C. Srivas commented on HBASE-3777:
-

bq. The thing is that a HConnection's behavior is determined not just by the 
server-side cluster it goes against, but also its client-side properties, such 
as hbase.client.retries.number, hbase.client.prefetch.limit, and so on. 
Ergo, we really need a different connection for every unique set of 
connection-specific config properties, whether it be client- or server-specific.

I am beginning to understand the reasons behind taking this approach. Thanks 
for explaining.

bq. As per the ZK/HBase use cases wiki, in theory we can have multiple masters 
registered with the ZK (to eliminate any SPOFs perhaps?). So, I'm not sure we 
can presuppose what hmaster we'll be going to at any given point in time.

Even in the presence of multiple hmasters, does it really matter if we connect 
back to the same hmaster? It probably is important for the hmasters themselves 
which hmaster they connect to (and perhaps for region-servers as well). But it 
should not matter for clients. Agree?  (of course, I am stating all this 
without knowing any details about Hbase, so don't kill me for it).

bq. The whole purpose of this patch was to reduce the number of connections by 
reusing them to the extent possible. At one point, the config's equals method 
was treated as the key to the connection, which promoted reuse to some extent, 
but started breaking down if the config was changed after the fact. Currently, 
the config's identity (object reference) is treated as the key, but that 
suffers from connection overload. Hopefully, the HConnectionKey defined in the 
HCM will serve as a happy medium between the two ends of the spectrum.


Ted Yu pointed out the work being done here, so I started reading the JIRA. I 
am not familiar with where/how the HConnection instance gets used, and this 
JIRA was pretty long to understand with the code changes and all.

I started to comment on this Jira due to the problems we faced trying to scale 
up the YCSB benchmark. We tried to run about 500 threads in the YCSB HBase 
client, and ran out of connections to ZK. It was a complete, unexpected 
surprise that the HBase client needed to maintain multiple connections to ZK, 
and it seemed to be using one per thread (i.e., per HTable).
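For what it's worth, since connections are currently keyed on the Configuration object's identity, the usual client-side workaround is to hand every HTable the same Configuration instance so they share one underlying HConnection (and one ZK session); a minimal sketch, assuming the standard 0.90 client API:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class SharedConfClient {
    // One shared Configuration instance => one shared connection under the covers.
    private static final Configuration CONF = HBaseConfiguration.create();

    // HTable itself is not thread-safe, so each worker thread still gets its own,
    // but all of them are keyed off the same CONF.
    public static HTable newTableHandle(String tableName) throws Exception {
        return new HTable(CONF, tableName);
    }
}
{code}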

We share the same goal: with this patch, we hope to be able to scale YCSB to 50 
client machines, with 500 threads per client, and see how HBase holds up.

Would you agree that, in the long run, the HBase client should use ZK only to 
find the hmaster and region-servers, but not keep the connection to ZK open? 
Otherwise ZK may go under as we try to scale the number of HBase clients.


 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, 
 HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, 
 HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.



[jira] [Created] (HBASE-3833) ability to support includes/excludes list in Hbase

2011-04-29 Thread dhruba borthakur (JIRA)
ability to support includes/excludes list in Hbase
--

 Key: HBASE-3833
 URL: https://issues.apache.org/jira/browse/HBASE-3833
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur


An HBase cluster currently does not have the ability to specify that the master 
should accept region servers only from a specified list. Such a list helps prevent 
administrative errors where the same machine could be included in two clusters. 
It also allows the administrator to easily remove un-ssh-able machines from the 
cluster.
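A hypothetical sketch of what an includes-list check could look like on the master side (file format and names are assumptions, not a proposed design):

{code}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class RegionServerIncludesList {
    private final Set<String> allowedHosts = new HashSet<String>();

    // One hostname per line; blank lines ignored.
    public RegionServerIncludesList(String includesFile) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(includesFile));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String host = line.trim();
                if (!host.isEmpty()) {
                    allowedHosts.add(host);
                }
            }
        } finally {
            reader.close();
        }
    }

    /** True if the reporting region server may join; empty list means allow all. */
    public boolean mayJoin(String hostname) {
        return allowedHosts.isEmpty() || allowedHosts.contains(hostname);
    }
}
{code}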
