[jira] Commented: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd

2010-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928138#action_12928138
 ] 

stack commented on HBASE-3196:
--

@Prakash It says 'org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol 
org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol version mismatch. 
(client = 5, server = 3)'

This RS is up because it's waiting on all regions to close before it goes out:

{code}
at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:645)
{code}

Seems like there are closer handlers waiting to do work.  Does the 
regionserver log say which region is not closing?  If so, can you grep for it 
and try to figure out some history on the region?

Thanks.

> Regionserver stuck when after all IPC Server handlers fatal'd
> -
>
> Key: HBASE-3196
> URL: https://issues.apache.org/jira/browse/HBASE-3196
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: Prakash Khemani
>Assignee: Jonathan Gray
>
> The region server is stuck with the following jstack
> 2010-11-03 22:23:41
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.0-b16 mixed mode):
> "Attach Listener" daemon prio=10 tid=0x2aaeb6774000 nid=0x3974 waiting on condition [0x]
>    java.lang.Thread.State: RUNNABLE
> "RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2" prio=10 tid=0x2aaeb8449000 nid=0x3bbc waiting on condition [0x43f67000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x2aaab7fd1130> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:619)
> "RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-1" prio=10 tid=0x2aaeb843f800 nid=0x3bbb waiting on condition [0x43e66000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x2aaab7fd1130> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:619)
> "RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-0" prio=10 tid=0x2aaeb8447800 nid=0x3bba waiting on condition [0x44068000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x2aaab7fd1130> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>     at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>     at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:619)
> "RMI Scheduler(0)" daemon prio=10 tid=0x2aaeb48c4800 nid=0x1c97 waiting on condition [0x580a7000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for  <0x2aaab773a118> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
>     at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
>     at java.util.concurrent.ScheduledThreadP

[jira] Updated: (HBASE-2819) hbck should have the ability to repair basic problems

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2819:
-

Attachment: 2819-addendum.txt

Something to add to this patch -- being able to deal with empty cells in .META. 
 HBCK should fix these up.

> hbck should have the ability to repair basic problems
> -
>
> Key: HBASE-2819
> URL: https://issues.apache.org/jira/browse/HBASE-2819
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Reporter: Todd Lipcon
>Assignee: stack
>Priority: Critical
> Fix For: 0.90.0
>
> Attachments: 2819-addendum.txt, 2819-v10.txt, 2819-v11.txt, 
> 2819-v12.txt, HBASE-2819.patch
>
>
> Right now, the hbck utility can detect issues with region deployment but 
> can't fix them.
> It should be able to handle basic things like closing one side of a double 
> assignment, re-adding something to META, etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-3192) Test that HBase runs when a .META. row without an HRI

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3192.
--

Resolution: Won't Fix

Resolving as 'won't fix'.  There are so many places in the code that presume a 
non-null regioninfo in .META. -- MetaScanner, MetaReader, AssignmentManager, 
CatalogJanitor, etc. -- that a test would be hard to write.  We would need to 
test w/ an empty HRI during master joining cluster, during bulk startup, and 
during 'normal' operation.

> Test that HBase runs when a .META. row without an HRI
> -
>
> Key: HBASE-3192
> URL: https://issues.apache.org/jira/browse/HBASE-3192
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.90.0
>
> Attachments: 3192.txt
>
>
> A .META. without an HRI entry should never happen but if it does, it should 
> not cause master shutdown (master is on a hair-trigger at mo. so that issues 
> are noticed quickly).  HBASE-3151 fixed being able to deal w/ empty HRI.  
> This issue is about adding a test to verify hbase stays up (make sure chore 
> runs and that test does meta scanning with MetaScanner and MetaReader).




[jira] Updated: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'd

2010-11-03 Thread Prakash Khemani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prakash Khemani updated HBASE-3196:
---

  Description: 
The region server is stuck with the following jstack

2010-11-03 22:23:41
Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.0-b16 mixed mode):

"Attach Listener" daemon prio=10 tid=0x2aaeb6774000 nid=0x3974 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2" prio=10 tid=0x2aaeb8449000 nid=0x3bbc waiting on condition [0x43f67000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaab7fd1130> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)

"RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-1" prio=10 tid=0x2aaeb843f800 nid=0x3bbb waiting on condition [0x43e66000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaab7fd1130> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)

"RS_CLOSE_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-0" prio=10 tid=0x2aaeb8447800 nid=0x3bba waiting on condition [0x44068000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaab7fd1130> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)

"RMI Scheduler(0)" daemon prio=10 tid=0x2aaeb48c4800 nid=0x1c97 waiting on condition [0x580a7000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaab773a118> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
    at java.util.concurrent.DelayQueue.take(DelayQueue.java:164)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:583)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:576)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)

"RS_OPEN_REGION-pumahbase028.snc5.facebook.com,60020,1288733355197-2" daemon prio=10 tid=0x2aaeb4804800 nid=0x17a0 waiting on condition [0x582a9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x2aaab7fca538> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
    at java.util.

[jira] Updated: (HBASE-2328) Make important configurations more obvious to new users

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2328:
-

Attachment: notsoquick.html

I just committed an edit that adds an example config for distributed hbase and 
that adds a requirements section from the overview with some extra fill.  Still 
a bunch TODO.  I attached what the page currently looks like.

> Make important configurations more obvious to new users
> ---
>
> Key: HBASE-2328
> URL: https://issues.apache.org/jira/browse/HBASE-2328
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.0
>
> Attachments: notsoquick.html
>
>
> Over the last 2 weeks, I encountered many situations where people didn't set 
> file descriptors and xcievers higher and that was causing a ton of problems 
> that are hard to debug if you're not used to them. To improve that we should:
>  - Refuse to start HBase if ulimit -n returns a number smaller than 
> 2048, or at least print out in big red blinking letters that the current 
> configuration is bad and then link to a simple troubleshooting entry on the 
> wiki.
>  - Write a clearer Getting Started document where we don't give as many 
> explanations but add more stuff like "this is what your 
> hbase-site.xml/hdfs-site.xml should look like now" and give a complete file 
> example. At this point we don't even give a number for xcievers and we expect 
> new users to come up with one.
> Any other low hanging fruit others can think of?
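
A minimal sketch of the first suggestion in the description above, assuming the 
startup script hands the text printed by `ulimit -n` to the JVM.  The class and 
method names are hypothetical and are not part of HBase; only the 2048 
threshold comes from the description.

```java
// Hypothetical startup check for illustration; not HBase code.
final class FileDescriptorLimitCheck {
    // Threshold taken from the issue description.
    static final long MIN_OPEN_FILES = 2048;

    private FileDescriptorLimitCheck() {}

    /**
     * Decides whether the open-file limit is too low.
     *
     * @param ulimitOutput the text printed by `ulimit -n`: either a number
     *                     or the word "unlimited"
     * @return true if the limit is below MIN_OPEN_FILES
     * @throws NumberFormatException if the text is neither of those forms
     */
    static boolean isLimitTooLow(String ulimitOutput) {
        String value = ulimitOutput.trim();
        if (value.equals("unlimited")) {
            return false;
        }
        return Long.parseLong(value) < MIN_OPEN_FILES;
    }
}
```

A caller could refuse to start, or print the loud warning the description asks 
for, whenever `isLimitTooLow` returns true.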




[jira] Created: (HBASE-3196) Regionserver stuck when after all IPC Server handlers fatal'

2010-11-03 Thread Prakash Khemani (JIRA)
Regionserver stuck when after all IPC Server handlers fatal'


 Key: HBASE-3196
 URL: https://issues.apache.org/jira/browse/HBASE-3196
 Project: HBase
  Issue Type: Bug
Reporter: Prakash Khemani







[jira] Commented: (HBASE-3189) Stagger Major Compactions

2010-11-03 Thread Kannan Muthukkaruppan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928122#action_12928122
 ] 

Kannan Muthukkaruppan commented on HBASE-3189:
--

I think, usability-wise, jitter (and its default) should be specified as a 
fraction (% value) of the major compaction cycle time, instead of in absolute 
terms (like 4 hours).

Otherwise, you have a backward compat issue with this change for someone who is 
running a major compaction, say, every three hours but has forgotten to set the 
jitter parameter when they upgrade to 0.90.  They'll be compacting anywhere 
from 3hrs +/- (2 * 4 hours jitter default).  Specifying jitter as a fraction 
will also ensure you don't return -ve values for "get next compaction time".
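
A minimal sketch of the fraction-based jitter proposed above.  This is not the 
HBASE-3189 patch; the class, method, and parameter names are hypothetical.  It 
assumes the jitter is a fraction in [0, 1] of the compaction period and is 
applied symmetrically around it.

```java
import java.util.Random;

// Hypothetical helper for illustration; not HBase's actual implementation.
final class MajorCompactionJitter {
    private MajorCompactionJitter() {}

    /**
     * Returns the delay until the next major compaction, in milliseconds.
     *
     * @param periodMs       the configured major compaction period
     * @param jitterFraction jitter as a fraction of the period, in [0, 1]
     * @param rng            source of randomness
     */
    static long nextCompactionDelayMs(long periodMs, double jitterFraction, Random rng) {
        if (periodMs <= 0) {
            throw new IllegalArgumentException("periodMs must be positive");
        }
        // Written this way so that NaN is rejected too.
        if (!(jitterFraction >= 0.0 && jitterFraction <= 1.0)) {
            throw new IllegalArgumentException("jitterFraction must be in [0, 1]");
        }
        // Largest offset allowed on either side of the period.
        long maxJitterMs = (long) (periodMs * jitterFraction);
        // Uniform value in [-1, 1).
        double unit = 2.0 * rng.nextDouble() - 1.0;
        long offsetMs = (long) (unit * maxJitterMs);
        // Because the fraction is capped at 1, the offset cannot exceed the
        // period for any realistic period length, so the result is not negative.
        return periodMs + offsetMs;
    }
}
```

With a three-hour period and a fraction of 0.2, the delay stays within 36 
minutes of three hours, whatever the period is later changed to.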


> Stagger Major Compactions
> -
>
> Key: HBASE-3189
> URL: https://issues.apache.org/jira/browse/HBASE-3189
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
>Priority: Minor
> Fix For: 0.90.0
>
> Attachments: HBASE-3189.patch
>
>
> For pre-split regions, we can get into a case where the oldest HFile in a 
> Store is pretty large and will not encounter a compaction within the 24hr 
> major compact window.  If that's the case, we don't want multiple multi-GB 
> major compactions being triggered at the same time.  Add ability to stagger 
> the major compaction expiration window.




[jira] Commented: (HBASE-3168) Sanity date and time check when a region server joins the cluster

2010-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928117#action_12928117
 ] 

stack commented on HBASE-3168:
--

@Jeff

#1 could be a legitimate problem in the case where a regionserver came up but 
there was no master to connect to, so the regionserver just hung out twiddling 
its thumbs for five or ten minutes.

#2 is not an issue.  You say "If each region server then calls 
reportsForDuty...".  That's not what happens.  A regionserver when it comes up 
calls reportForDuty/regionServerStartup.  Thereafter, it heartbeats by calling 
regionServerReport (until it dies).  When a master joins an already running 
cluster, the regionservers will just call the new master's regionServerReport 
-- not the initializing regionServerStartup -- and the master just registers 
the regionserver at that time (TODO: do away with regionServerStartup or, when 
a new master joins the cluster, have the regionserver call regionServerStartup 
rather than regionServerReport.  In the interests of simplicity, it seems as 
though regionServerStartup is no longer necessary so we should just axe it).

I like Jon's suggestion of changing the signature on reportsForDuty to add 
regionServerCurrentTimeMillis param.

You might argue that regionServerReport should be modified too to also take the 
regionserver timestamp but that's probably overdoing it.
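
A minimal sketch of the master-side check this would enable, assuming the 
regionserver passes its regionServerCurrentTimeMillis at startup.  The class, 
the exception (named after the "ClockOutOfSync-like exception" suggested in the 
issue), and the maxSkewMillis threshold are all hypothetical; none of this is 
the attached patch.

```java
// Hypothetical sketch for illustration; not the HBASE-3168 patch.
final class ClockSkewCheck {
    private ClockSkewCheck() {}

    /** Hypothetical exception type; the name follows the suggestion in the issue. */
    static final class ClockOutOfSyncException extends Exception {
        private static final long serialVersionUID = 1L;

        ClockOutOfSyncException(String message) {
            super(message);
        }
    }

    /**
     * Rejects a regionserver whose clock differs from the master's by more
     * than maxSkewMillis (an assumed, configurable threshold).
     */
    static void checkClockSkew(String serverName,
                               long regionServerCurrentTimeMillis,
                               long masterCurrentTimeMillis,
                               long maxSkewMillis) throws ClockOutOfSyncException {
        long skew = Math.abs(masterCurrentTimeMillis - regionServerCurrentTimeMillis);
        if (skew > maxSkewMillis) {
            throw new ClockOutOfSyncException("Server " + serverName
                + " rejected: clock skew of " + skew + "ms exceeds the allowed "
                + maxSkewMillis + "ms");
        }
    }
}
```

Such a check would presumably sit next to the existing startup checks in 
ServerManager quoted in the description below, with the regionserver logging 
the error and exiting when it is rejected.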

Thanks for working on this.

> Sanity date and time check when a region server joins the cluster
> -
>
> Key: HBASE-3168
> URL: https://issues.apache.org/jira/browse/HBASE-3168
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.89.20100924
> Environment: RHEL 5.5 64bit, 1 Master 4 Region Servers
>Reporter: Jeff Whiting
> Fix For: 0.90.0
>
> Attachments: HBASE-3168-trunk-v1.txt
>
>
> Introduce a sanity check when a RS joins the cluster to make sure its clock 
> isn't too far out of skew with the rest of the cluster.  If the RS's time is 
> too far out of skew then the master would prevent it from joining and RS 
> would die and log the error. 
> Having a RS with even small differences in time can cause huge problems due 
> to how HBase stores values with timestamps.
> According to J-D in ServerManager we are already doing: 
> {code}
> HServerInfo info = new HServerInfo(serverInfo);
> checkIsDead(info.getServerName(), "STARTUP");
> checkAlreadySameHostPort(info);
> recordNewServer(info, false, null);
> {code}
> And that the new check would fit in nicely there.
> JG suggests we add a "ClockOutOfSync-like exception"




[jira] Resolved: (HBASE-3193) Regression: HBASE_MANAGES_ZK=false broken

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3193.
--

Resolution: Invalid

I just tried this.  I set the flag to false in hbase-env.sh.  I started hbase. 
It failed to start because there was no zk.  I then shut it all down.  I then 
started a zk instance and then started the cluster again.  This time it 
launched.  Seems like this is not an issue.  Closing for now as invalid till we 
get more info (Charles?).

> Regression: HBASE_MANAGES_ZK=false broken
> -
>
> Key: HBASE-3193
> URL: https://issues.apache.org/jira/browse/HBASE-3193
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
> Fix For: 0.90.0
>
>
> From Charles Thayer up on the list:
> {code}
> I haven't seen any replies, which is probably because the master seems to
> be changing rapidly at the moment.  However, if anyone needs this for
> hbase 0.89.20100726, here's a patch to work around the issue temporarily
> until 0.90.0 (which will probably fix the problem).
> /charles thayer
> --- src/main/java/org/apache/hadoop/hbase/master/HMaster.java   2010-07-30 
> 21:09:11.0 +
> +++ src/main/java/org/apache/hadoop/hbase/master/HMaster.java   2010-10-11 
> 20:51:30.821519000 +
> @@ -1297,11 +1297,18 @@
>   runtime.getVmVendor() + ", vmVersion=" + 
> runtime.getVmVersion());
> LOG.info("vmInputArguments=" + runtime.getInputArguments());
>   }
> +
> + boolean hbase_manages_zk = true;
> + if (System.getenv("HBASE_MANAGES_ZK") != null
> + && System.getenv("HBASE_MANAGES_ZK").equals("false"))
> +   hbase_manages_zk = false;
> +
>   // If 'local', defer to LocalHBaseCluster instance.  Starts master
>   // and regionserver both in the one JVM.
>   if (LocalHBaseCluster.isLocal(conf)) {
> final MiniZooKeeperCluster zooKeeperCluster =
>   new MiniZooKeeperCluster();
> +   if (hbase_manages_zk) {  // thayer
> File zkDataPath = new 
> File(conf.get("hbase.zookeeper.property.dataDir"));
> int zkClientPort = 
> conf.getInt("hbase.zookeeper.property.clientPort", 0);
> if (zkClientPort == 0) {
> @@ -1319,11 +1326,15 @@
> }
> conf.set("hbase.zookeeper.property.clientPort",
>   Integer.toString(clientPort));
> +   } // thayer
> +
> // Need to have the zk cluster shutdown when master is shutdown.
> // Run a subclass that does the zk cluster shutdown on its way 
> out.
> LocalHBaseCluster cluster = new LocalHBaseCluster(conf, 1,
>   LocalHMaster.class, HRegionServer.class);
> +   if (hbase_manages_zk) {
> 
> ((LocalHMaster)cluster.getMaster()).setZKCluster(zooKeeperCluster);
> +   }
> cluster.startup();
>   } else {
> HMaster master = constructMaster(masterClass, conf);
> {code}




[jira] Created: (HBASE-3195) Fix the new TestTransform breakage up on hudson

2010-11-03 Thread stack (JIRA)
Fix the new TestTransform breakage up on hudson
---

 Key: HBASE-3195
 URL: https://issues.apache.org/jira/browse/HBASE-3195
 Project: HBase
  Issue Type: Bug
Reporter: stack


This new test has been failing up on hudson since it was introduced at #1606.  I 
took a look.  It looks reasonable but it's failing in an odd way -- can't find 
blocks in hdfs.

I'm moving it aside for now till test gets some loving.  Breakage lasted till 
at least #1613.




[jira] Commented: (HBASE-3194) HBase should run on both secure and vanilla versions of Hadoop 0.20

2010-11-03 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928074#action_12928074
 ] 

Gary Helmling commented on HBASE-3194:
--

Using reflection for isolation should work fine and should allow running 
against both versions without rebuilding.  I'm working it out now.

The easy part is getting the current UGI.  The harder part is "setting" the 
current UGI (only needed by MiniHBaseCluster and test code at the moment), 
since secure Hadoop changed this to UGI.doAs() with a PrivilegedAction instance 
wrapping the actual execution.  I'll sort out an initial attempt at isolating 
that and we can discuss the general approach.
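
A minimal sketch of the detection half of this approach, assuming that the 
presence of a public doAs method on UserGroupInformation is enough to tell the 
two Hadoop variants apart.  The class and method names here are hypothetical, 
and this is not the eventual patch.

```java
import java.lang.reflect.Method;

// Hypothetical probe for illustration; not the HBASE-3194 patch.
final class HadoopVariantProbe {
    private static final String UGI_CLASS =
        "org.apache.hadoop.security.UserGroupInformation";

    private HadoopVariantProbe() {}

    /**
     * Returns true if the named class can be loaded and exposes at least one
     * public method with the given name.  The class is not initialized.
     */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> clazz = Class.forName(className, false,
                HadoopVariantProbe.class.getClassLoader());
            for (Method m : clazz.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or unloadable: treat the method as absent.
            return false;
        }
    }

    /** Assumption: only secure Hadoop has UserGroupInformation.doAs(). */
    static boolean isSecureHadoop() {
        return hasPublicMethod(UGI_CLASS, "doAs");
    }
}
```

Because the check runs at runtime against whatever Hadoop jar is on the 
classpath, the same HBase build could run on either variant without rebuilding.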



> HBase should run on both secure and vanilla versions of Hadoop 0.20
> ---
>
> Key: HBASE-3194
> URL: https://issues.apache.org/jira/browse/HBASE-3194
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>
> There have been a couple cases recently of folks trying to run HBase trunk 
> (or 0.89 DRs) on CDH3b3 or secure Hadoop.  While HBase security is in the 
> works, it currently only runs on secure Hadoop versions.  Meanwhile HBase 
> trunk won't compile on secure Hadoop due to backward incompatible changes in 
> org.apache.hadoop.security.UserGroupInformation.
> This issue is to work out the minimal set of changes necessary to allow HBase 
> to build and run on both secure and non-secure versions of Hadoop.  Though, 
> with secure Hadoop, I don't even think it's important to target running with 
> HDFS security enabled (and krb authentication).  Just allow HBase to build 
> and run in both versions.
> I think mainly this amounts to abstracting usage of UserGroupInformation and 
> UnixUserGroupInformation.




[jira] Commented: (HBASE-3194) HBase should run on both secure and vanilla versions of Hadoop 0.20

2010-11-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928072#action_12928072
 ] 

Andrew Purtell commented on HBASE-3194:
---

It should be possible to wrap UGI and UUGI with something that uses reflection 
to determine what platform variant is below. Anyone foresee a problem with that 
approach?

> HBase should run on both secure and vanilla versions of Hadoop 0.20
> ---
>
> Key: HBASE-3194
> URL: https://issues.apache.org/jira/browse/HBASE-3194
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>
> There have been a couple cases recently of folks trying to run HBase trunk 
> (or 0.89 DRs) on CDH3b3 or secure Hadoop.  While HBase security is in the 
> works, it currently only runs on secure Hadoop versions.  Meanwhile HBase 
> trunk won't compile on secure Hadoop due to backward incompatible changes in 
> org.apache.hadoop.security.UserGroupInformation.
> This issue is to work out the minimal set of changes necessary to allow HBase 
> to build and run on both secure and non-secure versions of Hadoop.  Though, 
> with secure Hadoop, I don't even think it's important to target running with 
> HDFS security enabled (and krb authentication).  Just allow HBase to build 
> and run in both versions.
> I think mainly this amounts to abstracting usage of UserGroupInformation and 
> UnixUserGroupInformation.




[jira] Updated: (HBASE-2819) hbck should have the ability to repair basic problems

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2819:
-

Attachment: 2819-v12.txt

> hbck should have the ability to repair basic problems
> -
>
> Key: HBASE-2819
> URL: https://issues.apache.org/jira/browse/HBASE-2819
> Project: HBase
>  Issue Type: New Feature
>  Components: scripts
>Reporter: Todd Lipcon
>Assignee: stack
>Priority: Critical
> Fix For: 0.90.0
>
> Attachments: 2819-v10.txt, 2819-v11.txt, 2819-v12.txt, 
> HBASE-2819.patch
>
>
> Right now, the hbck utility can detect issues with region deployment but 
> can't fix them.
> It should be able to handle basic things like closing one side of a double 
> assignment, re-adding something to META, etc.




[jira] Resolved: (HBASE-2471) Splitting logs, we'll make an output file though the region no longer exists

2010-11-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-2471.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed patch to trunk. I expect some sort of destabilization as some tests 
looked flaky when I ran the full suite, so I might have to fix more tests in 
the near future.

> Splitting logs, we'll make an output file though the region no longer exists
> 
>
> Key: HBASE-2471
> URL: https://issues.apache.org/jira/browse/HBASE-2471
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.0
>
> Attachments: HBASE-2471-v2.patch
>
>
> The "human unit tester" (Kannan) last night wondered what happens splitting 
> logs and we come across an edit whose region has since been removed.  Taking 
> a look, it looks like we'll create the output file and write the edits for 
> the no-longer-extant region anyways.  This will leave litter in the 
> filesystem -- region split files that will never be used nor removed.  This 
> issue is about verifying that indeed this is what's happening (We do 
> SequenceFile.createWriter with the overwrite flag set to true which tracing 
> seems to mean create all intermediary directories -- to be verified) and if 
> it indeed is happening, fixing split so unless the region dir exists, don't 
> write out edits.. just drop them.
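
A minimal sketch of the guard described above: write split edits only when the 
region directory still exists, otherwise drop them.  It uses java.nio.file as a 
stand-in for the Hadoop filesystem API that HBase actually goes through, and 
the class and method names are hypothetical; this is not the committed patch.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical guard for illustration; not the HBASE-2471 patch.
final class SplitEditFilter {
    private SplitEditFilter() {}

    /**
     * Returns true if edits for a region should be written out during log
     * splitting, i.e. the region directory is still present.  Edits for
     * regions whose directory is gone are dropped by the caller.
     */
    static boolean shouldWriteEdits(Path regionDir) {
        return Files.isDirectory(regionDir);
    }
}
```

The point of checking before creating the writer is that no intermediate 
directories get created for a region that no longer exists.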




[jira] Updated: (HBASE-2471) Splitting logs, we'll make an output file though the region no longer exists

2010-11-03 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-2471:
--

Attachment: HBASE-2471-v2.patch

Patch that I'm about to commit. It's different from what I posted on RB because 
some other unit tests needed to be changed and I only figured that out when 
running the full test suite.

> Splitting logs, we'll make an output file though the region no longer exists
> 
>
> Key: HBASE-2471
> URL: https://issues.apache.org/jira/browse/HBASE-2471
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.0
>
> Attachments: HBASE-2471-v2.patch
>
>
> The "human unit tester" (Kannan) last night wondered what happens splitting 
> logs and we come across an edit whose region has since been removed.  Taking 
> a look, it looks like we'll create the output file and write the edits for 
> the no-longer-extant region anyways.  This will leave litter in the 
> filesystem -- region split files that will never be used nor removed.  This 
> issue is about verifying that indeed this is what's happening (We do 
> SequenceFile.createWriter with the overwrite flag set to true which tracing 
> seems to mean create all intermediary directories -- to be verified) and if 
> it indeed is happening, fixing split so unless the region dir exists, don't 
> write out edits.. just drop them.




[jira] Commented: (HBASE-3194) HBase should run on both secure and vanilla versions of Hadoop 0.20

2010-11-03 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928062#action_12928062
 ] 

ryan rawson commented on HBASE-3194:


It would also be nice to run on both without rebuilding.

> HBase should run on both secure and vanilla versions of Hadoop 0.20
> ---
>
> Key: HBASE-3194
> URL: https://issues.apache.org/jira/browse/HBASE-3194
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>
> There have been a couple cases recently of folks trying to run HBase trunk 
> (or 0.89 DRs) on CDH3b3 or secure Hadoop.  While HBase security is in the 
> works, it currently only runs on secure Hadoop versions.  Meanwhile HBase 
> trunk won't compile on secure Hadoop due to backward incompatible changes in 
> org.apache.hadoop.security.UserGroupInformation.
> This issue is to work out the minimal set of changes necessary to allow HBase 
> to build and run on both secure and non-secure versions of Hadoop.  Though, 
> with secure Hadoop, I don't even think it's important to target running with 
> HDFS security enabled (and krb authentication).  Just allow HBase to build 
> and run in both versions.
> I think mainly this amounts to abstracting usage of UserGroupInformation and 
> UnixUserGroupInformation.




[jira] Created: (HBASE-3194) HBase should run on both secure and vanilla versions of Hadoop 0.20

2010-11-03 Thread Gary Helmling (JIRA)
HBase should run on both secure and vanilla versions of Hadoop 0.20
---

 Key: HBASE-3194
 URL: https://issues.apache.org/jira/browse/HBASE-3194
 Project: HBase
  Issue Type: Bug
Reporter: Gary Helmling


There have been a couple cases recently of folks trying to run HBase trunk (or 
0.89 DRs) on CDH3b3 or secure Hadoop.  While HBase security is in the works, 
it currently only runs on secure Hadoop versions.  Meanwhile HBase trunk won't 
compile on secure Hadoop due to backward incompatible changes in 
org.apache.hadoop.security.UserGroupInformation.

This issue is to work out the minimal set of changes necessary to allow HBase 
to build and run on both secure and non-secure versions of Hadoop.  Though, 
with secure Hadoop, I don't even think it's important to target running with 
HDFS security enabled (and krb authentication).  Just allow HBase to build and 
run in both versions.

I think mainly this amounts to abstracting usage of UserGroupInformation and 
UnixUserGroupInformation.
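
One way to abstract the differing APIs without rebuilding is to probe for them at runtime with reflection. The sketch below is only an illustration of that idea, not what the issue settled on: ApiProbe is a hypothetical name, and the assumption that secure Hadoop exposes a getCurrentUser() method on UserGroupInformation is mine.

```java
import java.lang.reflect.Method;

// Hypothetical shim: picks an API at runtime instead of at compile time.
final class ApiProbe {
  private ApiProbe() {}

  // Returns true if the named class is loadable and has a public
  // no-argument method with the given name.
  static boolean hasNoArgMethod(String className, String methodName) {
    try {
      Method m = Class.forName(className).getMethod(methodName);
      return m != null;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumption: secure Hadoop exposes getCurrentUser() on UserGroupInformation.
    boolean secure = hasNoArgMethod(
        "org.apache.hadoop.security.UserGroupInformation", "getCurrentUser");
    System.out.println(secure ? "secure-style API" : "vanilla-style API or Hadoop absent");
  }
}
```

Callers would then go through a small wrapper class chosen by the probe, so no HBase code references either Hadoop API directly.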




[jira] Updated: (HBASE-3192) Test that HBase runs when a .META. row without an HRI

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3192:
-

Attachment: 3192.txt

Add this in too when I make the test; it makes HBaseAdmin immune to odd .META. 
rows.

> Test that HBase runs when a .META. row without an HRI
> -
>
> Key: HBASE-3192
> URL: https://issues.apache.org/jira/browse/HBASE-3192
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
> Fix For: 0.90.0
>
> Attachments: 3192.txt
>
>
> A .META. row without an HRI entry should never happen, but if it does, it should 
> not cause master shutdown (master is on a hair-trigger at the moment so that 
> issues are noticed quickly).  HBASE-3151 fixed being able to deal w/ empty HRI.  
> This issue is about adding a test to verify hbase stays up (make sure chore 
> runs and that test does meta scanning with MetaScanner and MetaReader).




[jira] Commented: (HBASE-2828) HTable unnecessarily coupled with HMaster

2010-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928051#action_12928051
 ] 

stack commented on HBASE-2828:
--

done

> HTable unnecessarily coupled with HMaster
> -
>
> Key: HBASE-2828
> URL: https://issues.apache.org/jira/browse/HBASE-2828
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-2828-0.90.patch, HBASE-2828.a.patch, 
> HBASE-2828.patch
>
>
> HTable constructor calls "getCurrentNrHRS()" to get the region server count 
> for thread pool creation.  This code calls HBaseAdmin.getClusterStatus() 
> [aka: the HMaster] to get the server count.  This information can be scraped 
> from counting the ZooKeeper /hbase/rs/--- ZNodes.  Need to remove unnecessary 
> master queries when ZooKeeper can do the same job.




[jira] Created: (HBASE-3193) Regression: HBASE_MANAGES_ZK=false broken

2010-11-03 Thread stack (JIRA)
Regression: HBASE_MANAGES_ZK=false broken
-

 Key: HBASE-3193
 URL: https://issues.apache.org/jira/browse/HBASE-3193
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Fix For: 0.90.0


From Charles Thayer up on the list:

{code}
I haven't seen any replies, which is probably because the master seems to
be changing rapidly at the moment.  However, if anyone needs this for
hbase 0.89.20100726, here's a patch to work around the issue temporarily
until 0.90.0 (which will probably fix the problem).

/charles thayer

--- src/main/java/org/apache/hadoop/hbase/master/HMaster.java   2010-07-30 21:09:11.0 +
+++ src/main/java/org/apache/hadoop/hbase/master/HMaster.java   2010-10-11 20:51:30.821519000 +
@@ -1297,11 +1297,18 @@

  runtime.getVmVendor() + ", vmVersion=" + runtime.getVmVersion());
LOG.info("vmInputArguments=" + runtime.getInputArguments());
  }
+
+ boolean hbase_manages_zk = true;
+ if (System.getenv("HBASE_MANAGES_ZK") != null
+ && System.getenv("HBASE_MANAGES_ZK").equals("false"))
+   hbase_manages_zk = false;

+
  // If 'local', defer to LocalHBaseCluster instance.  Starts master
  // and regionserver both in the one JVM.
  if (LocalHBaseCluster.isLocal(conf)) {
final MiniZooKeeperCluster zooKeeperCluster =
  new MiniZooKeeperCluster();
+   if (hbase_manages_zk) {  // thayer

File zkDataPath = new File(conf.get("hbase.zookeeper.property.dataDir"));
int zkClientPort = conf.getInt("hbase.zookeeper.property.clientPort", 0);
if (zkClientPort == 0) {
@@ -1319,11 +1326,15 @@

}
conf.set("hbase.zookeeper.property.clientPort",
  Integer.toString(clientPort));
+   } // thayer

+
// Need to have the zk cluster shutdown when master is shutdown.
// Run a subclass that does the zk cluster shutdown on its way out.
LocalHBaseCluster cluster = new LocalHBaseCluster(conf, 1,
  LocalHMaster.class, HRegionServer.class);
+   if (hbase_manages_zk) {

((LocalHMaster)cluster.getMaster()).setZKCluster(zooKeeperCluster);
+   }
cluster.startup();
  } else {
HMaster master = constructMaster(masterClass, conf);
{code}
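
The environment check in the patch above can be isolated into a small, testable helper. This is a sketch only; ZkEnv and hbaseManagesZk are hypothetical names, and the environment is passed in as a map so the logic can be exercised without touching the real process environment.

```java
import java.util.Map;

// Hypothetical helper mirroring the environment check in the patch above.
final class ZkEnv {
  private ZkEnv() {}

  // HBase manages ZooKeeper unless HBASE_MANAGES_ZK is set to exactly "false".
  // An unset variable keeps the default of true, as in the patch.
  static boolean hbaseManagesZk(Map<String, String> env) {
    return !"false".equals(env.get("HBASE_MANAGES_ZK"));
  }

  public static void main(String[] args) {
    System.out.println(hbaseManagesZk(System.getenv()));
  }
}
```

Putting the literal on the left of equals() avoids the separate null check and the second getenv call.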




[jira] Commented: (HBASE-2828) HTable unnecessarily coupled with HMaster

2010-11-03 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928012#action_12928012
 ] 

Nicolas Spiegelberg commented on HBASE-2828:


Can we add an explicit comment in there about purposefully not going to the 
Master for the RS count?  I wouldn't want a 3rd occurrence of this...

> HTable unnecessarily coupled with HMaster
> -
>
> Key: HBASE-2828
> URL: https://issues.apache.org/jira/browse/HBASE-2828
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-2828-0.90.patch, HBASE-2828.a.patch, 
> HBASE-2828.patch
>
>
> HTable constructor calls "getCurrentNrHRS()" to get the region server count 
> for thread pool creation.  This code calls HBaseAdmin.getClusterStatus() 
> [aka: the HMaster] to get the server count.  This information can be scraped 
> from counting the ZooKeeper /hbase/rs/--- ZNodes.  Need to remove unnecessary 
> master queries when ZooKeeper can do the same job.




[jira] Created: (HBASE-3192) Test that HBase runs when a .META. row without an HRI

2010-11-03 Thread stack (JIRA)
Test that HBase runs when a .META. row without an HRI
-

 Key: HBASE-3192
 URL: https://issues.apache.org/jira/browse/HBASE-3192
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.0


A .META. row without an HRI entry should never happen, but if it does, it should 
not cause master shutdown (master is on a hair-trigger at the moment so that 
issues are noticed quickly).  HBASE-3151 fixed being able to deal w/ empty HRI.  
This issue is about adding a test to verify hbase stays up (make sure chore runs 
and that test does meta scanning with MetaScanner and MetaReader).
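
The behaviour such a test would verify can be sketched as below. This is an illustration under simplifying assumptions, not HBase code: a catalog row is reduced to its serialized HRI bytes, and MetaRowScan and rowsWithRegionInfo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a catalog scan; a row is just its serialized HRI bytes.
final class MetaRowScan {
  private MetaRowScan() {}

  // Collects the rows that carry region info. Rows with a null or empty
  // HRI cell are skipped instead of raising a NullPointerException.
  static List<byte[]> rowsWithRegionInfo(List<byte[]> hriCells) {
    List<byte[]> result = new ArrayList<>();
    for (byte[] cell : hriCells) {
      if (cell == null || cell.length == 0) {
        continue; // odd row: no HRI, keep scanning
      }
      result.add(cell);
    }
    return result;
  }
}
```

A test along these lines would feed the scanner one row with everything but the HRI and assert that the scan completes and returns the remaining rows.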




[jira] Commented: (HBASE-3151) NPE when trying to read regioninfo from .META.

2010-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928010#action_12928010
 ] 

stack commented on HBASE-3151:
--

Well, I think I can add a test for the case of a .META. row that has everything 
but the HRI.  That'd be good for testing that we don't crap out as we were doing.  
Let me make a new issue to do that.

> NPE when trying to read regioninfo from .META.
> --
>
> Key: HBASE-3151
> URL: https://issues.apache.org/jira/browse/HBASE-3151
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.0
>Reporter: stack
>Assignee: stack
> Fix For: 0.90.0
>
> Attachments: offline.txt
>
>
> This is an old issue perhaps in a new guise.  From the list, Sebastian Bauer 
> reports:
> {code}
> > 2010-10-25 08:13:01,690 ERROR
> > org.apache.hadoop.hbase.master.CatalogJanitor: Caught exception
> > java.lang.NullPointerException
> > 2010-10-25 08:13:24,385 INFO
> > org.apache.hadoop.hbase.master.ServerManager: regionservers=2,
> > averageload=2538
> > 2010-10-23 20:16:17,890 DEBUG
> >  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> >  Cached location for .META.,,1.1028785192 is
> >  db2a.goldenline.pl:60020
> >  2010-10-23 20:16:18,432 FATAL org.apache.hadoop.hbase.master.HMaster:
> >  Unhandled exception. Starting
> >  shutdown.
> >
> >  java.lang.NullPointerException
> >
> >        at
> >  org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
> >
> >        at
> >  org.apache.hadoop.hbase.util.Writables.getHRegionInfo(Writables.java:119)
> >
> >        at
> >  org.apache.hadoop.hbase.client.MetaScanner$1.processRow(MetaScanner.java:188)
> >
> >        at
> >  org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
> >
> >        at
> >  org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:69)
> >
> >        at
> >  org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:54)
> >
> >        at
> >  org.apache.hadoop.hbase.client.MetaScanner.listAllRegions(MetaScanner.java:195)
> >
> >       at
> >  org.apache.hadoop.hbase.master.AssignmentManager.assignAllUserRegions(AssignmentManager.java:1048)
> >
> >        at
> >  org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:379)
> >
> >        at
> >  org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:265)
> >
> >  2010-10-23 20:16:18,433 INFO org.apache.hadoop.hbase.master.HMaster:
> >  Aborting
> >
> >  2010-10-23 20:16:18,433 DEBUG org.apache.hadoop.hbase.master.HMaster:
> >  Stopping service threads
> {code}
> I think he has an old master... checking.




[jira] Commented: (HBASE-2770) Major compactions from shell may not major compact all families

2010-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12928009#action_12928009
 ] 

stack commented on HBASE-2770:
--

Shall I close this, Dave?

> Major compactions from shell may not major compact all families
> ---
>
> Key: HBASE-2770
> URL: https://issues.apache.org/jira/browse/HBASE-2770
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.20.4
>Reporter: Dave Latham
>Priority: Critical
> Fix For: 0.92.0
>
>
> As part of a data center migration, I initiated a major_compaction request on 
> all tables from the shell.  A few hours later, all the region servers in the 
> cluster appeared to have completed the compactions and all compactionQueue 
> metrics were back to 0.  However, some column families of some regions had 
> not actually done a major compaction.
> Digging through logs and code, it looks like the following happened.  The 
> shell makes a major compaction request which sets 
> HRegion.forceMajorCompaction to true for every region.  Periodically, the 
> HRegionServer.MajorCompactionChecker checks to see if a major compaction is 
> needed in any family's store.  If so, calls 
> CompactSplitThread.compactionRequested which ends up setting the region 
> forceMajorCompaction to false, even if it is already in the compaction queue 
> and set to true.  Then, when that region comes off the queue to be compacted, 
> each family/store separately checks for whether it should do a major 
> compaction, so some families may not do so.
> (This is not good if, for example, you're doing a DistCp of the hbase dir and 
> later on the cluster decides to do a compaction on those files and deletes 
> ones the DistCp job is looking for, causing it to fail.)
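
The race described above comes down to a later, non-major request overwriting a pending major one. One possible guard, sketched here with hypothetical names (MajorCompactionFlag, request, takeForCompaction) and not taken from any HBase patch, is to let requests only raise the flag and to clear it once when the region is dequeued.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a region's "force major compaction" flag.
final class MajorCompactionFlag {
  private final AtomicBoolean forceMajor = new AtomicBoolean(false);

  // A request may raise the flag but never lowers it, so a later periodic
  // (non-major) request cannot cancel a pending request from the shell.
  void request(boolean major) {
    if (major) {
      forceMajor.set(true);
    }
  }

  // Called when the region comes off the queue: reads and clears the flag in
  // one atomic step, so one decision can be applied to every family.
  boolean takeForCompaction() {
    return forceMajor.getAndSet(false);
  }
}
```

The value returned by takeForCompaction() would then be passed to each family's store rather than having each store decide separately.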




[jira] Updated: (HBASE-2828) HTable unnecessarily coupled with HMaster

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2828:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I committed your patch Jon.

> HTable unnecessarily coupled with HMaster
> -
>
> Key: HBASE-2828
> URL: https://issues.apache.org/jira/browse/HBASE-2828
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-2828-0.90.patch, HBASE-2828.a.patch, 
> HBASE-2828.patch
>
>
> HTable constructor calls "getCurrentNrHRS()" to get the region server count 
> for thread pool creation.  This code calls HBaseAdmin.getClusterStatus() 
> [aka: the HMaster] to get the server count.  This information can be scraped 
> from counting the ZooKeeper /hbase/rs/--- ZNodes.  Need to remove unnecessary 
> master queries when ZooKeeper can do the same job.




[jira] Resolved: (HBASE-3191) FilterList with MUST_PASS_ONE and SCVF isn't working

2010-11-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3191.
--

   Resolution: Fixed
Fix Version/s: 0.90.0
 Hadoop Flags: [Reviewed]

Committed to TRUNK.  Thank you for the patch Stefan.

> FilterList with MUST_PASS_ONE and SCVF isn't working
> 
>
> Key: HBASE-3191
> URL: https://issues.apache.org/jira/browse/HBASE-3191
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.89.20100924, 0.90.0
>Reporter: Stefan Seelmann
>Priority: Minor
> Fix For: 0.90.0
>
> Attachments: HBASE-3191.patch
>
>
> In a special case the FilterList with MUST_PASS_ONE operator doesn't work 
> correctly:
> - a filter in the list is a SingleColumnValueFilter with filterIfMissing=true
> - FilterList.filterKeyValue(KeyValue) is called
> - SingleColumnValueFilter.filterKeyValue(KeyValue) is called
> - SingleColumnValueFilter.filterKeyValue(KeyValue) returns ReturnCode.INCLUDE 
> if the KeyValue doesn't match a column (to support filterIfMissing)
> - FilterList.filterKeyValue(KeyValue) immediately returns ReturnCode.INCLUDE, 
> remaining filters in the list aren't evaluated.
> However it is required to evaluate remaining filters, otherwise filterRow() 
> filters out rows in case the filter's filterKeyValue() saves state that is 
> used by filterRow(). (SingleColumnValueFilter, SkipFilter, WhileMatchFilter do 
> so)




[jira] Updated: (HBASE-3191) FilterList with MUST_PASS_ONE and SCVF isn't working

2010-11-03 Thread Stefan Seelmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Seelmann updated HBASE-3191:
---

Attachment: HBASE-3191.patch

Patch with Test

> FilterList with MUST_PASS_ONE and SCVF isn't working
> 
>
> Key: HBASE-3191
> URL: https://issues.apache.org/jira/browse/HBASE-3191
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.89.20100924, 0.90.0
>Reporter: Stefan Seelmann
>Priority: Minor
> Attachments: HBASE-3191.patch
>
>
> In a special case the FilterList with MUST_PASS_ONE operator doesn't work 
> correctly:
> - a filter in the list is a SingleColumnValueFilter with filterIfMissing=true
> - FilterList.filterKeyValue(KeyValue) is called
> - SingleColumnValueFilter.filterKeyValue(KeyValue) is called
> - SingleColumnValueFilter.filterKeyValue(KeyValue) returns ReturnCode.INCLUDE 
> if the KeyValue doesn't match a column (to support filterIfMissing)
> - FilterList.filterKeyValue(KeyValue) immediately returns ReturnCode.INCLUDE, 
> remaining filters in the list aren't evaluated.
> However it is required to evaluate remaining filters, otherwise filterRow() 
> filters out rows in case the filter's filterKeyValue() saves state that is 
> used by filterRow(). (SingleColumnValueFilter, SkipFilter, WhileMatchFilter do 
> so)




[jira] Created: (HBASE-3191) FilterList with MUST_PASS_ONE and SCVF isn't working

2010-11-03 Thread Stefan Seelmann (JIRA)
FilterList with MUST_PASS_ONE and SCVF isn't working


 Key: HBASE-3191
 URL: https://issues.apache.org/jira/browse/HBASE-3191
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.89.20100924, 0.90.0
Reporter: Stefan Seelmann
Priority: Minor


In a special case the FilterList with MUST_PASS_ONE operator doesn't work 
correctly:
- a filter in the list is a SingleColumnValueFilter with filterIfMissing=true
- FilterList.filterKeyValue(KeyValue) is called
- SingleColumnValueFilter.filterKeyValue(KeyValue) is called
- SingleColumnValueFilter.filterKeyValue(KeyValue) returns ReturnCode.INCLUDE if 
the KeyValue doesn't match a column (to support filterIfMissing)
- FilterList.filterKeyValue(KeyValue) immediately returns ReturnCode.INCLUDE, 
remaining filters in the list aren't evaluated.

However, the remaining filters must still be evaluated; otherwise filterRow() 
may wrongly filter out rows, because some filters' filterKeyValue() saves state 
that filterRow() later uses (SingleColumnValueFilter, SkipFilter and 
WhileMatchFilter do so).
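
The intended MUST_PASS_ONE behaviour can be sketched as follows. The types here are minimal stand-ins and not the HBase classes: KvFilter, MustPassOne and the String-typed KeyValue are hypothetical, and only the INCLUDE and SKIP return codes are modelled.

```java
import java.util.List;

// Minimal stand-ins for the filter API; not the HBase classes.
enum ReturnCode { INCLUDE, SKIP }

interface KvFilter {
  ReturnCode filterKeyValue(String kv);
}

final class MustPassOne {
  private MustPassOne() {}

  // Consults every filter even after one returns INCLUDE, so that stateful
  // filters still see each KeyValue before filterRow() is called.
  static ReturnCode filterKeyValue(List<KvFilter> filters, String kv) {
    ReturnCode rc = ReturnCode.SKIP;
    for (KvFilter f : filters) {
      if (f.filterKeyValue(kv) == ReturnCode.INCLUDE) {
        rc = ReturnCode.INCLUDE; // remember the result, but do not return early
      }
    }
    return rc;
  }
}
```

The only difference from the buggy behaviour is the absence of an early return inside the loop.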





[jira] Updated: (HBASE-2828) HTable unnecessarily coupled with HMaster

2010-11-03 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-2828:
-

Status: Patch Available  (was: Reopened)

> HTable unnecessarily coupled with HMaster
> -
>
> Key: HBASE-2828
> URL: https://issues.apache.org/jira/browse/HBASE-2828
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-2828-0.90.patch, HBASE-2828.a.patch, 
> HBASE-2828.patch
>
>
> HTable constructor calls "getCurrentNrHRS()" to get the region server count 
> for thread pool creation.  This code calls HBaseAdmin.getClusterStatus() 
> [aka: the HMaster] to get the server count.  This information can be scraped 
> from counting the ZooKeeper /hbase/rs/--- ZNodes.  Need to remove unnecessary 
> master queries when ZooKeeper can do the same job.




[jira] Commented: (HBASE-3160) Compactions: Use more intelligent priorities for PriorityCompactionQueue

2010-11-03 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927969#action_12927969
 ] 

Nicolas Spiegelberg commented on HBASE-3160:


@Jeff: I forgot yesterday, but I also did fix an issue with the compaction 
priorities.  If a flush happened for the store that was being compacted, it 
would be added to the compaction queue at the pre-compact priority instead of 
post-compact.  We now do:
{code}
  if (!this.server.isStopRequested()) {
    // requests that were added during compaction will have a
    // stale priority. remove and re-insert to update priority
    boolean hadCompaction = compactionQueue.remove(r);
    if (midKey != null) {
      split(r, midKey);
    } else if (hadCompaction) {
      compactionQueue.add(r);
    }
  }
{code}
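
The remove and re-insert idiom above can be shown in isolation with java.util.PriorityQueue. This is a sketch, not the PriorityCompactionQueue itself: Region, CompactionQueueDemo and the integer priority field are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical region whose compaction priority changes over time.
final class Region {
  final String name;
  int priority; // lower value is served first

  Region(String name, int priority) {
    this.name = name;
    this.priority = priority;
  }
}

final class CompactionQueueDemo {
  private CompactionQueueDemo() {}

  static PriorityQueue<Region> newQueue() {
    return new PriorityQueue<>(Comparator.comparingInt((Region x) -> x.priority));
  }

  // java.util.PriorityQueue does not notice priority changes on queued
  // elements, so a stale entry is removed and re-added to reposition it.
  // A region that was not queued is left out of the queue.
  static void refresh(PriorityQueue<Region> queue, Region r, int newPriority) {
    boolean wasQueued = queue.remove(r);
    r.priority = newPriority;
    if (wasQueued) {
      queue.add(r);
    }
  }
}
```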



> Compactions: Use more intelligent priorities for PriorityCompactionQueue
> 
>
> Key: HBASE-3160
> URL: https://issues.apache.org/jira/browse/HBASE-3160
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.89.20100924, 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-3160.patch
>
>
> One of the problems with the current compaction queue is that we have a very 
> low granularity on the importance of the various compactions in the queue.  
> If a StoreFile count exceeds 15 files, only then do we bump via enum change.  
> We should instead look into more intelligent, granular priority metrics for 
> choosing the next compaction.  




[jira] Updated: (HBASE-2828) HTable unnecessarily coupled with HMaster

2010-11-03 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-2828:
-

Attachment: HBASE-2828-0.90.patch

Looks like the documentation changes made it, just not the 
HTable.getCurrentNrHRS().  Patch changes implementation of that method (same 
principle as original patch but slightly different).
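
The principle can be sketched as below. This is not the attached patch: ChildLister is a hypothetical interface standing in for a ZooKeeper client, and the "/hbase/rs" path follows the issue description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical abstraction over a ZooKeeper client; only child listing is needed.
interface ChildLister {
  List<String> getChildren(String path);
}

final class RegionServerCount {
  private RegionServerCount() {}

  // Counts region servers by listing the children of the RS znode
  // instead of asking the master for cluster status.
  static int currentNrHRS(ChildLister zk, String rsZNode) {
    return zk.getChildren(rsZNode).size();
  }

  public static void main(String[] args) {
    ChildLister fake = path -> Arrays.asList("rs1,60020,1", "rs2,60020,2");
    System.out.println(currentNrHRS(fake, "/hbase/rs"));
  }
}
```

Keeping the client behind an interface also makes it easy to add the explicit "do not go to the master here" comment in one place.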

> HTable unnecessarily coupled with HMaster
> -
>
> Key: HBASE-2828
> URL: https://issues.apache.org/jira/browse/HBASE-2828
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-2828-0.90.patch, HBASE-2828.a.patch, 
> HBASE-2828.patch
>
>
> HTable constructor calls "getCurrentNrHRS()" to get the region server count 
> for thread pool creation.  This code calls HBaseAdmin.getClusterStatus() 
> [aka: the HMaster] to get the server count.  This information can be scraped 
> from counting the ZooKeeper /hbase/rs/--- ZNodes.  Need to remove unnecessary 
> master queries when ZooKeeper can do the same job.




[jira] Reopened: (HBASE-2828) HTable unnecessarily coupled with HMaster

2010-11-03 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray reopened HBASE-2828:
--


This is in some 0.89 branches but was left out from 0.90 (I guess during master 
rewrite commit).

> HTable unnecessarily coupled with HMaster
> -
>
> Key: HBASE-2828
> URL: https://issues.apache.org/jira/browse/HBASE-2828
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-2828.a.patch, HBASE-2828.patch
>
>
> HTable constructor calls "getCurrentNrHRS()" to get the region server count 
> for thread pool creation.  This code calls HBaseAdmin.getClusterStatus() 
> [aka: the HMaster] to get the server count.  This information can be scraped 
> from counting the ZooKeeper /hbase/rs/--- ZNodes.  Need to remove unnecessary 
> master queries when ZooKeeper can do the same job.




[jira] Updated: (HBASE-2445) Clean up client retry policies

2010-11-03 Thread Dave Latham (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Latham updated HBASE-2445:
---

Fix Version/s: 0.92.0

Here's hoping it can get picked up for 0.92

> Clean up client retry policies
> --
>
> Key: HBASE-2445
> URL: https://issues.apache.org/jira/browse/HBASE-2445
> Project: HBase
>  Issue Type: Improvement
>  Components: client
>Reporter: Todd Lipcon
>Priority: Critical
> Fix For: 0.92.0
>
>
> Right now almost all retry behavior is governed by a single parameter that 
> determines the number of retries. In a few places, there are also conf for 
> the number of millis to sleep between retries. This isn't quite flexible 
> enough. If we can refactor some of the retry logic into a RetryPolicy class, 
> we could introduce exponential backoff where appropriate, clean up some of 
> the config, etc
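
A policy class of the kind suggested above might look like this. It is a sketch only; the class name, constructor parameters and method names are all hypothetical and no such class is implied to exist in HBase.

```java
// Hypothetical policy object; the names and parameters are illustrative only.
final class ExponentialBackoffPolicy {
  private final int maxRetries;
  private final long baseSleepMillis;
  private final long maxSleepMillis;

  ExponentialBackoffPolicy(int maxRetries, long baseSleepMillis, long maxSleepMillis) {
    this.maxRetries = maxRetries;
    this.baseSleepMillis = baseSleepMillis;
    this.maxSleepMillis = maxSleepMillis;
  }

  // True while the caller may try again; attempt is zero-based.
  boolean shouldRetry(int attempt) {
    return attempt < maxRetries;
  }

  // Sleep doubles with each attempt and is capped at maxSleepMillis.
  // The shift is bounded so that modest base values cannot overflow.
  long sleepMillis(int attempt) {
    int shift = Math.min(attempt, 30);
    return Math.min(maxSleepMillis, baseSleepMillis << shift);
  }
}
```

With one object per call site, the retry count and the sleep settings no longer need to be separate configuration parameters read in different places.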




[jira] Commented: (HBASE-3160) Compactions: Use more intelligent priorities for PriorityCompactionQueue

2010-11-03 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927887#action_12927887
 ] 

Dave Latham commented on HBASE-3160:


Does that mean https://issues.apache.org/jira/browse/HBASE-2770 is fixed then?

> Compactions: Use more intelligent priorities for PriorityCompactionQueue
> 
>
> Key: HBASE-3160
> URL: https://issues.apache.org/jira/browse/HBASE-3160
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.89.20100924, 0.90.0
>Reporter: Nicolas Spiegelberg
>Assignee: Nicolas Spiegelberg
> Fix For: 0.90.0
>
> Attachments: HBASE-3160.patch
>
>
> One of the problems with the current compaction queue is that we have a very 
> low granularity on the importance of the various compactions in the queue.  
> If a StoreFile count exceeds 15 files, only then do we bump via enum change.  
> We should instead look into more intelligent, granular priority metrics for 
> choosing the next compaction.  




[jira] Commented: (HBASE-3190) Problem with disabling and droping table

2010-11-03 Thread Sebastian Bauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927778#action_12927778
 ] 

Sebastian Bauer commented on HBASE-3190:


I'm using revision 1030348.

> Problem with disabling and droping table
> 
>
> Key: HBASE-3190
> URL: https://issues.apache.org/jira/browse/HBASE-3190
> Project: HBase
>  Issue Type: Bug
>Reporter: Sebastian Bauer
> Fix For: 0.90.0
>
>
> Table disabling was interrupted by a kill -9 of all parts of HBase, and now we 
> cannot do anything with this table. Disabling doesn't show any exception:
> hbase(main):019:0> disable 'NGolden_CTU'
> 0 row(s) in 0.0250 seconds
> but dropping shows this:
> hbase(main):020:0> drop 'NGolden_CTU'   
> ERROR: org.apache.hadoop.hbase.TableNotDisabledException: 
> org.apache.hadoop.hbase.TableNotDisabledException: NGolden_CTU
> at 
> org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:861)
> at 
> org.apache.hadoop.hbase.master.handler.TableEventHandler.<init>(TableEventHandler.java:52)
> at 
> org.apache.hadoop.hbase.master.handler.DeleteTableHandler.<init>(DeleteTableHandler.java:42)
> at 
> org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:779)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
> Here is some help for this command:
>   Drop the named table. Table must first be disabled. If table has
>   more than one region, run a major compaction on .META.:
>   hbase> major_compact ".META."
> after this nothing strange is in logs
> when we restart hbase we get this:
> 2010-11-03 08:56:37,892 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing 
> open of 
> NGolden_CTU,3065-d_2010_10_14_245FF1A15F4E236002ED3AB651BAB97E,1288046281444.0c8579e52b0ea3f2dab5b6a857ad030b.
> 
> 2010-11-03 08:56:37,892 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x12c10b5fb780005 Attempting to transition node 
> 0c8579e52b0ea3f2dab5b6a857ad030b from M_ZK_REGION_OFFLINE to 
> RS_ZK_REGION_OPENING 
> 2010-11-03 08:56:37,892 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
> Caught throwable while processing event M_RS_OPEN_REGION
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
> at 
> org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:669)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:549)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:542)
> at 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:208)

[jira] Created: (HBASE-3190) Problem with disabling and droping table

2010-11-03 Thread Sebastian Bauer (JIRA)
Problem with disabling and droping table


 Key: HBASE-3190
 URL: https://issues.apache.org/jira/browse/HBASE-3190
 Project: HBase
  Issue Type: Bug
Reporter: Sebastian Bauer
 Fix For: 0.90.0


Table disabling was interrupted by a kill -9 of all parts of HBase, and now we cannot 
do anything with this table. Disabling doesn't show any exception:
hbase(main):019:0> disable 'NGolden_CTU'
0 row(s) in 0.0250 seconds


but dropping shows this:
hbase(main):020:0> drop 'NGolden_CTU'   

ERROR: org.apache.hadoop.hbase.TableNotDisabledException: 
org.apache.hadoop.hbase.TableNotDisabledException: NGolden_CTU
at org.apache.hadoop.hbase.master.HMaster.checkTableModifiable(HMaster.java:861)
at org.apache.hadoop.hbase.master.handler.TableEventHandler.<init>(TableEventHandler.java:52)
at org.apache.hadoop.hbase.master.handler.DeleteTableHandler.<init>(DeleteTableHandler.java:42)
at org.apache.hadoop.hbase.master.HMaster.deleteTable(HMaster.java:779)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)

Here is some help for this command:
  Drop the named table. Table must first be disabled. If table has
  more than one region, run a major compaction on .META.:

  hbase> major_compact ".META."

After this, nothing unusual appears in the logs.

When we restart HBase, we get this:

2010-11-03 08:56:37,892 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open 
of 
NGolden_CTU,3065-d_2010_10_14_245FF1A15F4E236002ED3AB651BAB97E,1288046281444.0c8579e52b0ea3f2dab5b6a857ad030b.

2010-11-03 08:56:37,892 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:60020-0x12c10b5fb780005 Attempting to transition node 
0c8579e52b0ea3f2dab5b6a857ad030b from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_OPENING 
2010-11-03 08:56:37,892 ERROR org.apache.hadoop.hbase.executor.EventHandler: 
Caught throwable while processing event M_RS_OPEN_REGION
java.lang.NullPointerException
at org.apache.hadoop.hbase.util.Writables.getWritable(Writables.java:75)
at org.apache.hadoop.hbase.executor.RegionTransitionData.fromBytes(RegionTransitionData.java:198)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNode(ZKAssign.java:669)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:549)
at org.apache.hadoop.hbase.zookeeper.ZKAssign.transitionNodeOpening(ZKAssign.java:542)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.transitionZookeeperOfflineToOpening(OpenRegionHandler.java:208)
at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:89)
at 
org.apa

[jira] Commented: (HBASE-1956) Export HDFS read and write latency as a metric

2010-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927759#action_12927759
 ] 

stack commented on HBASE-1956:
--

Thanks for looking into this Gary.

> Export HDFS read and write latency as a metric
> --
>
> Key: HBASE-1956
> URL: https://issues.apache.org/jira/browse/HBASE-1956
> Project: HBase
>  Issue Type: Improvement
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
>Priority: Minor
> Fix For: 0.90.0
>
> Attachments: HBASE-1956.patch, HBASE-1956.patch
>
>
> HDFS write latency spikes especially are an indicator of general cluster 
> overloading. We see this where the WAL writer complains about writes taking > 
> 1 second, sometimes > 4, etc.  If for example the average write latency over 
> the monitoring period is exported as a metric, then this can feed into 
> alerting for or automatic provisioning of additional cluster hardware. While 
> we're at it, export read side metrics as well.
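
For illustration only, here is a minimal sketch of the "average latency over 
the monitoring period" idea from the description above. The class and method 
names (LatencyAverager, record, averageMillisAndReset) are hypothetical and 
are not taken from the attached patch or from any HBase or Hadoop API. It 
assumes the caller times each HDFS write itself and that a metrics poller 
calls averageMillisAndReset() once per period.

```java
/**
 * Hypothetical helper, not HBase code: accumulates operation latencies and
 * reports their average once per monitoring period.
 */
public class LatencyAverager {
    private long totalNanos; // sum of latencies recorded in the current period
    private long count;      // number of operations recorded in the current period

    /** Record one operation's elapsed time, e.g. measured with System.nanoTime(). */
    public synchronized void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        count++;
    }

    /**
     * Return the average latency in milliseconds for the period that just
     * ended and start a new period. Returns 0.0 if nothing was recorded.
     */
    public synchronized double averageMillisAndReset() {
        double avgMillis = (count == 0) ? 0.0 : (totalNanos / (double) count) / 1e6;
        totalNanos = 0;
        count = 0;
        return avgMillis;
    }
}
```

Resetting on read keeps each reported value scoped to a single period, so a 
latency spike is not diluted by older samples. A read-side metric could use a 
second instance of the same class.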

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.