date:20110412


[ 
https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019223#comment-13019223
 ] 

Nicolas Spiegelberg commented on HBASE-3763:


@stack: we ran into a problem where our bloom sizes were getting quite 
substantial (100 MB; believe it or not, blooms still make sense here). When 
such a bloom is not in the LRU cache, read requests stall until the entire 
bloom is loaded into memory, and sometimes this is a non-local read. If we can 
build a block index for blooms and only have to load a 64 KB shard, our read 
stalls will diminish substantially.
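To put the comment's figures side by side (assuming binary units, so 1 MB = 1024 KB, and ignoring the small index that must also be read), a cold lookup against a monolithic bloom loads the whole 100 MB, whereas a sharded bloom loads a single 64 KB shard:

```latex
\frac{100\ \text{MB}}{64\ \text{KB}} = \frac{100 \times 1024\ \text{KB}}{64\ \text{KB}} = 1600
```

That is, a cold lookup would read roughly 1/1600 as much bloom data.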

> Add Bloom Block Index Support
> -
>
> Key: HBASE-3763
> URL: https://issues.apache.org/jira/browse/HBASE-3763
> Project: HBase
>  Issue Type: Improvement
>  Components: io, regionserver
>Affects Versions: 0.89.20100924, 0.90.0, 0.90.1, 0.90.2
>Reporter: mikhail
>Assignee: mikhail
>Priority: Minor
>  Labels: hbase, performance
> Fix For: 0.89.20100924
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add a way to save HBase Bloom filters into an array of Meta blocks instead of 
> one big Meta block, and load only the blocks required to answer a query.  
> This will give us faster bloom load times for large StoreFiles and pave the 
> path for adding Bloom Filter support to HFileOutputFormat bulk load.
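A minimal sketch of the idea, in Java: the bloom is cut into fixed-size chunks while keys are written in sorted order, a small index records each chunk's first key, and a lookup binary-searches that index so it touches exactly one chunk. The class and field names are hypothetical, in-memory `BitSet`s stand in for the Meta blocks on disk, and the double-hashing scheme is an arbitrary choice for illustration, not what HBase uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not HBase code): a bloom filter stored as an array of
 * fixed-size chunks plus a block index holding each chunk's first key.
 */
public class ChunkedBloomSketch {
  static final int NUM_HASHES = 3;

  final int bitsPerChunk;
  final List<String> firstKeys = new ArrayList<>(); // block index, sorted
  final List<BitSet> chunks = new ArrayList<>();    // stand-ins for meta blocks
  int chunksRead = 0;                               // chunks touched by lookups

  ChunkedBloomSketch(int bitsPerChunk) {
    this.bitsPerChunk = bitsPerChunk;
  }

  /** Writer side: keys arrive sorted, as they do when a StoreFile is written. */
  void build(List<String> sortedKeys, int keysPerChunk) {
    for (int start = 0; start < sortedKeys.size(); start += keysPerChunk) {
      int end = Math.min(start + keysPerChunk, sortedKeys.size());
      BitSet bits = new BitSet(bitsPerChunk);
      for (String key : sortedKeys.subList(start, end)) {
        for (int i = 0; i < NUM_HASHES; i++) {
          bits.set(position(key, i));
        }
      }
      firstKeys.add(sortedKeys.get(start)); // index entry for this chunk
      chunks.add(bits);                     // would be written as one meta block
    }
  }

  /** Reader side: locate the single chunk whose key range covers the key. */
  boolean mightContain(String key) {
    int pos = Collections.binarySearch(firstKeys, key);
    int chunk = pos >= 0 ? pos : -pos - 2; // last chunk with firstKey <= key
    if (chunk < 0) {
      return false; // key sorts before every chunk, so it cannot be present
    }
    BitSet bits = chunks.get(chunk); // the only chunk that has to be loaded
    chunksRead++;
    for (int i = 0; i < NUM_HASHES; i++) {
      if (!bits.get(position(key, i))) {
        return false;
      }
    }
    return true;
  }

  /** Double hashing: bit i is (h1 + i * h2) mod bitsPerChunk. */
  int position(String key, int i) {
    int h1 = key.hashCode();
    int h2 = (Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8)) * 0x9E3779B9) | 1;
    return Math.floorMod(h1 + i * h2, bitsPerChunk);
  }
}
```

The design leans on StoreFile keys being sorted: because each chunk covers a contiguous key range, the index needs only one entry per chunk, and a key that sorts before the first chunk is rejected without loading anything.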

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()


 [ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling resolved HBASE-3759.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Committed to trunk.  Thanks for the review, Stack!

> Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
> complete()
> 
>
> Key: HBASE-3759
> URL: https://issues.apache.org/jira/browse/HBASE-3759
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Fix For: 0.92.0
>
> Attachments: HBASE-3759.patch, cp_bypass.tar.gz
>
>
> In the current coprocessor framework, ThreadLocal objects are used for the 
> bypass and complete booleans in CoprocessorEnvironment.  This allows the 
> *CoprocessorHost implementations to identify when to short-circuit processing 
> of the preXXX and postXXX hook methods.
> Profiling the region server, however, shows that these ThreadLocals can 
> become a contention point when on a hot code path (such as prePut()).  We 
> should refactor the CoprocessorHost pre/post implementations to remove usage 
> of the ThreadLocal variables and replace them with locally scoped variables 
> to eliminate contention between handler threads.
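A simplified sketch of the proposed refactoring: instead of reading bypass/complete from ThreadLocals, the host creates a small context object for each call and passes it to every observer, so the flags are ordinary fields on an object that never leaves the handler thread. The names echo the discussion (`ObserverContext`, `prePut`), but the signatures are hypothetical and much simpler than the real coprocessor API; the environment is just a `String` here.

```java
import java.util.List;

/**
 * Hypothetical, simplified sketch of replacing ThreadLocal flags with a
 * per-call context object. Not the real HBase classes or signatures.
 */
public class ObserverContextSketch {

  /** Created per hook invocation, so it is confined to one handler thread. */
  static final class ObserverContext<E> {
    private final E environment;
    private boolean bypass;   // plain fields: no ThreadLocal lookup needed
    private boolean complete;

    ObserverContext(E environment) {
      this.environment = environment;
    }

    E getEnvironment() { return environment; }
    void bypass() { bypass = true; }
    void complete() { complete = true; }
    boolean shouldBypass() { return bypass; }
    boolean shouldComplete() { return complete; }
  }

  interface RegionObserver {
    void prePut(ObserverContext<String> ctx, String row);
  }

  /**
   * Host-side loop: one local context per call. Returns true if any observer
   * asked to bypass the default put; stops early once one asks to complete.
   */
  static boolean prePut(List<RegionObserver> observers, String env, String row) {
    ObserverContext<String> ctx = new ObserverContext<>(env);
    for (RegionObserver observer : observers) {
      observer.prePut(ctx, row);
      if (ctx.shouldComplete()) {
        break; // remaining observers are skipped
      }
    }
    return ctx.shouldBypass();
  }
}
```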


[jira] [Updated] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()


 [ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-3759:
-

Attachment: HBASE-3759.patch

Patch from review

> Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
> complete()
> 
>
> Key: HBASE-3759
> URL: https://issues.apache.org/jira/browse/HBASE-3759
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Attachments: HBASE-3759.patch, cp_bypass.tar.gz

[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019209#comment-13019209
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review436
---


I read half the patch.  Will finish in the morning.  Comments below.  This utility 
looks really great.  Hurry up and finish it!


/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


It's 2011!



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


There is extra white space here and elsewhere in this block.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


should be 'handler'



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Do you want to make this an actual javadoc link, e.g. {@link Aggr}?  Is 
AggrationClient misspelled?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Is this comment still right?  It says 8-byte long (Ted's blog seems to 
indicate this is no longer the case).



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Nice javadoc.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Why this constructor?  We'll have a null conf?  Will that be dangerous 
later?  NPEs?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


White space



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Looks like this comment is no longer true?  The method has been genericized?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Should you reuse the passed configuration?  Otherwise you are making a new 
Connection per invocation.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What's this?  The return?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Reuse passed conf?



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What's this?  Extra white space.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


Reuse conf creating HTable.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


What's this?  This is probably in all subsequent methods... the extra white 
space too.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java


This needs to be passed the conf.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java


2011


- Michael
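Several comments above ask that the client reuse the configuration it was given. A hedged sketch of why that matters, assuming (as the review suggests) that connections are cached per configuration object: creating a new configuration for each call defeats the cache and yields a new connection every time. `Config`, `Connection`, and the cache are hypothetical stand-ins, not HBase's connection manager.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: Config and Connection are stand-ins, NOT HBase classes.
 * It assumes connections are cached per configuration instance.
 */
public class ConfReuseSketch {
  static final class Config {}

  static final class Connection {
    final Config conf;
    Connection(Config conf) { this.conf = conf; }
  }

  // Cache keyed by configuration identity (IdentityHashMap compares with ==).
  private static final Map<Config, Connection> CACHE = new IdentityHashMap<>();
  static int connectionsCreated = 0;

  static synchronized Connection getConnection(Config conf) {
    Connection c = CACHE.get(conf);
    if (c == null) {
      c = new Connection(conf);
      CACHE.put(conf, c);
      connectionsCreated++;
    }
    return c;
  }

  /** Anti-pattern: a fresh Config per call means a fresh Connection per call. */
  static Connection connectionPerCall() {
    return getConnection(new Config());
  }

  /** Preferred: keep the caller's Config and reuse it for every call. */
  static final class Client {
    private final Config conf;
    Client(Config conf) { this.conf = conf; }
    Connection connection() { return getConnection(conf); }
  }
}
```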


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides a reference implementation of aggregate function 
support through the Coprocessor framework.
bq.  The ColumnInterpreter interface allows the client to specify how the 
value's byte array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq
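A simplified sketch of the ColumnInterpreter idea described in the review request: the aggregation code is generic, and a client-supplied interpreter decides how a cell's bytes map to a value and how values combine. The names and method set are hypothetical, not the interface in the patch; the long interpreter assumes values were written as 8-byte big-endian longs, the encoding HBase's `Bytes.toBytes(long)` produces.

```java
import java.nio.ByteBuffer;
import java.util.List;

/**
 * Hypothetical, simplified sketch of the ColumnInterpreter idea: the client
 * says how to decode cell bytes and how to combine values. This is NOT the
 * interface in the patch under review.
 */
public class ColumnInterpreterSketch {

  interface ColumnInterpreter<T> {
    T decode(byte[] value); // returns null if the bytes are not a valid T
    T add(T a, T b);
    T zero();
  }

  /** Reads a cell value as an 8-byte big-endian long. */
  static final class LongInterpreter implements ColumnInterpreter<Long> {
    @Override public Long decode(byte[] value) {
      if (value == null || value.length != Long.BYTES) {
        return null;
      }
      return ByteBuffer.wrap(value).getLong(); // ByteBuffer is big-endian by default
    }
    @Override public Long add(Long a, Long b) { return a + b; }
    @Override public Long zero() { return 0L; }
  }

  /** Generic aggregation: the same code sums any type the interpreter handles. */
  static <T> T sum(List<byte[]> cells, ColumnInterpreter<T> ci) {
    T total = ci.zero();
    for (byte[] cell : cells) {
      T value = ci.decode(cell);
      if (value != null) {
        total = ci.add(total, value); // cells that do not decode are skipped
      }
    }
    return total;
  }
}
```

Keeping decoding and combining behind one interface is what lets a single server-side sum (or max, min, and so on) serve any value encoding the client chooses.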

[jira] [Resolved] (HBASE-3722) A lot of data is lost when name node crashed


 [ 
https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3722.
--

   Resolution: Fixed
Fix Version/s: 0.90.3
 Hadoop Flags: [Reviewed]

Applied to branch and trunk.  Makes sense.  Thanks for the patch and 
substantiating evidence, gaojinchao.

>  A lot of data is lost when name node crashed
> -
>
> Key: HBASE-3722
> URL: https://issues.apache.org/jira/browse/HBASE-3722
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.1
>Reporter: gaojinchao
> Fix For: 0.90.3
>
> Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what caused it. There are some failed log splits.
> The master should shut itself down when HDFS has crashed.
>  The logs are:
>  2011-03-22 13:21:55,056 WARN 
>  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
>  logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
> connection exception: java.net.ConnectException: Connection refused
>  at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>  at org.apache.hadoop.ipc.Client.call(Client.java:820)
>  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>  at $Proxy5.getListing(Unknown Source)
>  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  at java.lang.reflect.Method.invoke(Method.java:597)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>  at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>  at $Proxy5.getListing(Unknown Source)
>  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>  at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>  at 
> org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>  at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>  at 
>  org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
>  Caused by: java.net.ConnectException: Connection refused
>  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>  at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>  at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>  at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>  at 
> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>  at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>  at org.apache.hadoop.ipc.Client.call(Client.java:788)
>  ... 13 more
>  2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>  2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>  2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>  2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>  2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>  2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>  2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>  2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>  2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>  2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>  2011-03-22 13:22:05,060 ERROR 
>  org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting 
>  hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
> connection exception: java.net.ConnectException: Connection refused
>  at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>  at org.apache.hadoop.ipc.Client.call(Client.java:820)
>  at org.ap
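The reporter's point, that the master should shut itself down when HDFS is gone rather than carry on after failed log splits, can be sketched as a bounded filesystem check followed by an abort. This is a hypothetical illustration only: `FsProbe`, `Master`, and `checkFileSystem` are invented names, and the attached patch may work differently. (The log above shows the IPC client's own connection retries, tries 0 through 9, before the split fails.)

```java
import java.io.IOException;

/**
 * Hypothetical sketch, NOT the HBASE-3722 patch: check the filesystem with a
 * bounded number of attempts and stop the master if it stays unreachable,
 * instead of continuing after failed log splits.
 */
public class FsCheckSketch {

  /** Stand-in for "is the filesystem usable?" (e.g. listing a directory). */
  interface FsProbe {
    void probe() throws IOException;
  }

  static final class Master {
    boolean stopped = false;
    String stopReason = null;

    void abort(String reason) {
      stopped = true;
      stopReason = reason;
    }

    /** Returns true if the filesystem answered within maxAttempts tries. */
    boolean checkFileSystem(FsProbe fs, int maxAttempts) {
      IOException last = null;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          fs.probe();
          return true;
        } catch (IOException e) {
          last = e; // real code would log and back off before retrying
        }
      }
      abort("Filesystem unavailable after " + maxAttempts + " attempts: " + last);
      return false;
    }
  }
}
```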

[jira] [Commented] (HBASE-3609) Improve the selection of regions to balance; part 2


[ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019204#comment-13019204
 ] 

stack commented on HBASE-3609:
--

bq. Empty server can be detected within balanceCluster(). But this detection 
has been performed by Master, hence the flag.

That's fair.  It's nice having the balance invocation method as simple as 
possible, though.

bq. The static regionId helps make each region Id unique. I actually utilized 
this fact to debug my code.

I missed that it was being used.

bq. Preliminary response from Stan Barton showed improvement over random 
selector.
I am waiting for further feedback from gaojinc...@huawei.com and Stan.

I think that if you get good feedback from others, that'll help get this 
patch committed.

Good stuff Ted.

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.


[jira] [Updated] (HBASE-3765) metrics.xml - small format change and adding nav to hbase book metrics section


 [ 
https://issues.apache.org/jira/browse/HBASE-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3765:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Whoops.  My fault, Doug.  Sorry.  I thought this was a patch for the docbook, 
the stuff under src/docbook/*.xml, but it's for src/site/xdoc, the wanna-be 
docbook mess.  Sorry about that.  Thanks for the patch.  Applied to TRUNK.

> metrics.xml - small format change and adding nav to hbase book metrics section
> --
>
> Key: HBASE-3765
> URL: https://issues.apache.org/jira/browse/HBASE-3765
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: metrics_HBASE-3765.xml.patch
>
>
> (in src\site\xdoc)
> There was a section header near the top of the page that wasn't formatted in 
> bold, which I changed.
> Added a small section at the bottom that refers to the HBase book metrics 
> section for more info.


[jira] [Updated] (HBASE-3768) Add best practice to book for loading row key only


 [ 
https://issues.apache.org/jira/browse/HBASE-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3768:
-

   Resolution: Fixed
Fix Version/s: 0.92.0
 Assignee: Erik Onnen
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Applied to TRUNK.  Thank you for the patch, Erik.

> Add best practice to book for loading row key only
> --
>
> Key: HBASE-3768
> URL: https://issues.apache.org/jira/browse/HBASE-3768
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Assignee: Erik Onnen
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-3768.patch
>
>
> Book and wiki FAQs are missing guidance on the recommended practice for 
> loading row keys only during a scan.
> Patch attached based on jdcryans' feedback from IRC.


[jira] [Commented] (HBASE-3609) Improve the selection of regions to balance; part 2


[ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019199#comment-13019199
 ] 

Ted Yu commented on HBASE-3609:
---

An empty server can be detected within balanceCluster(), but this detection 
has already been performed by the Master, hence the flag.

The static regionId helps make each region Id unique. I actually utilized this 
fact to debug my code.

Let me think more about writing unit test(s) that verify the distribution 
over underloaded servers.

Both TestLoadBalancer and TestAdmin pass.

Preliminary response from Stan Barton showed improvement over random selector.
I am waiting for further feedback from gaojinc...@huawei.com and Stan.

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt

[jira] [Updated] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name


 [ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3769:
-

   Resolution: Fixed
Fix Version/s: (was: 0.90.3)
   0.92.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thank you for the patch, Erik (I made you a contributor 
and have assigned you the issues you've fixed).

> TableMapReduceUtil is inconsistent with other table-related classes that 
> accept byte[] as a table name
> --
>
> Key: HBASE-3769
> URL: https://issues.apache.org/jira/browse/HBASE-3769
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Assignee: Erik Onnen
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: HBASE-3769.patch
>
>
> Minor gripe, but we define our entire schema as a set of byte[] constants for 
> tables and CFs. This works well with HTable and HTablePool, but 
> TableMapReduceUtil requires conversion to a String, whereas most table-related 
> classes do not.
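One way to remove this gripe without duplicating logic is a thin byte[] overload that decodes the name and delegates to the String version. A minimal sketch, in which `configureTableInput` is a hypothetical stand-in for TableMapReduceUtil's job-setup methods:

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the convenience requested in HBASE-3769: a byte[]
 * overload that decodes the name and delegates to the String version.
 * configureTableInput is an invented stand-in, not a TableMapReduceUtil method.
 */
public class ByteArrayTableNameSketch {

  static String configureTableInput(String table) {
    return "input.table=" + table; // stand-in for the real job setup
  }

  static String configureTableInput(byte[] table) {
    // Decode as UTF-8, the same charset HBase's Bytes.toString uses.
    return configureTableInput(new String(table, StandardCharsets.UTF_8));
  }
}
```

Keeping the String version as the single implementation means the two overloads cannot drift apart.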


[jira] [Updated] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience


 [ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3770:
-

Fix Version/s: (was: 0.90.3)
   0.92.0

> Make FilterList accept var arg Filters in its constructor as a convenience
> --
>
> Key: HBASE-3770
> URL: https://issues.apache.org/jira/browse/HBASE-3770
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Assignee: Erik Onnen
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: HBASE-3770.patch
>
>
> When using a small number of Filters for a FilterList, it's cleaner to use 
> var args rather than forcing a list on the client. Compare:
> scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL,
>     new FirstKeyOnlyFilter(), new KeyOnlyFilter()));
> vs:
> List<Filter> filters = new ArrayList<Filter>(2);
> filters.add(new FirstKeyOnlyFilter());
> filters.add(new KeyOnlyFilter());
> scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));
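The convenience constructor can simply wrap its arguments and delegate to the List-based one, so both entry points share one implementation. A minimal sketch using hypothetical stand-in types rather than the real HBase `Filter`/`FilterList` classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stand-ins (NOT the HBase classes) showing how a var-args
 * constructor can delegate to an existing List-based constructor.
 */
public class VarargsFilterListSketch {
  interface Filter {}
  static final class FirstKeyOnly implements Filter {}
  static final class KeyOnly implements Filter {}
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  static final class FilterList implements Filter {
    final Operator operator;
    final List<Filter> filters;

    FilterList(Operator operator, List<Filter> filters) {
      this.operator = operator;
      this.filters = new ArrayList<>(filters); // defensive copy
    }

    /** Convenience: wrap the var-args array and reuse the List constructor. */
    FilterList(Operator operator, Filter... filters) {
      this(operator, Arrays.asList(filters));
    }
  }
}
```

Copying the list also means later changes to the caller's list (or to the var-args array) do not leak into the filter.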


[jira] [Assigned] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name


 [ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-3769:


Assignee: Erik Onnen

> TableMapReduceUtil is inconsistent with other table-related classes that 
> accept byte[] as a table name
> --
>
> Key: HBASE-3769
> URL: https://issues.apache.org/jira/browse/HBASE-3769
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Assignee: Erik Onnen
>Priority: Trivial
> Fix For: 0.90.3
>
> Attachments: HBASE-3769.patch

[jira] [Assigned] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience


 [ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reassigned HBASE-3770:


Assignee: Erik Onnen

> Make FilterList accept var arg Filters in its constructor as a convenience
> --
>
> Key: HBASE-3770
> URL: https://issues.apache.org/jira/browse/HBASE-3770
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Assignee: Erik Onnen
>Priority: Minor
> Fix For: 0.90.3
>
> Attachments: HBASE-3770.patch

[jira] [Updated] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience


 [ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3770:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Applied to TRUNK.  Thank you for the patch, Erik.

> Make FilterList accept var arg Filters in its constructor as a convenience
> --
>
> Key: HBASE-3770
> URL: https://issues.apache.org/jira/browse/HBASE-3770
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Priority: Minor
> Fix For: 0.90.3
>
> Attachments: HBASE-3770.patch

[jira] [Commented] (HBASE-3609) Improve the selection of regions to balance; part 2


[ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019193#comment-13019193
 ] 

stack commented on HBASE-3609:
--

Ted:

Why do we pass the emptyRS flag?  Didn't we just insert an HServerInfo for the 
server with no regions into the assignments Map?  Isn't an HSI that has an 
empty array of Regions enough of a flag that you don't need this extra 
boolean?

Is this used?

{code}
+  static int regionId = 0;
{code}

The new javadoc helps.

Otherwise the patch looks good, Ted.  It seems to be an ornamentation on what we 
had previously: rather than taking regions at random, it alternately takes the 
newest and then the oldest off the regionserver.  Is that right?  (Is that 
explained in the patch?  I don't think I can see it.)  You've also added this 
enhancement: "Basically I find the new regions and put them on different 
underloaded servers. Previously one underloaded server would be filled up 
before the next underloaded server is considered."  Any chance of your proving 
the last enhancement with a unit test?
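The two behaviours described in this comment can be sketched separately: picking regions alternately newest then oldest from an age-ordered list, and handing the picked regions out round-robin across underloaded servers instead of filling one server first. This is a hypothetical illustration with invented names and plain strings for regions and servers, not the patch's LoadBalancer code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not HBase's LoadBalancer. */
public class BalancerSketch {

  /** regionsByAge is oldest first; picks alternate newest, oldest, newest, ... */
  static List<String> alternateNewestOldest(List<String> regionsByAge, int count) {
    Deque<String> deque = new ArrayDeque<>(regionsByAge);
    List<String> picked = new ArrayList<>();
    boolean takeNewest = true;
    while (picked.size() < count && !deque.isEmpty()) {
      picked.add(takeNewest ? deque.pollLast() : deque.pollFirst());
      takeNewest = !takeNewest;
    }
    return picked;
  }

  /** Spreads regions round-robin; room gives each underloaded server's free slots. */
  static Map<String, List<String>> roundRobin(List<String> regions, Map<String, Integer> room) {
    Map<String, List<String>> plan = new LinkedHashMap<>();
    for (String server : room.keySet()) {
      plan.put(server, new ArrayList<>());
    }
    List<String> servers = new ArrayList<>(room.keySet());
    int next = 0; // index of the server to try first for the next region
    for (String region : regions) {
      boolean placed = false;
      for (int tries = 0; tries < servers.size() && !placed; tries++) {
        String server = servers.get(next % servers.size());
        next++;
        if (plan.get(server).size() < room.get(server)) {
          plan.get(server).add(region);
          placed = true;
        }
      }
      if (!placed) {
        break; // every underloaded server is already full
      }
    }
    return plan;
  }
}
```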

All the other load balancer tests pass?

Thanks.

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt

[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()


[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019188#comment-13019188
 ] 

stack commented on HBASE-3759:
--

++1 on commit then (2:15 vs 1:40)

> Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
> complete()
> 
>
> Key: HBASE-3759
> URL: https://issues.apache.org/jira/browse/HBASE-3759
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Attachments: cp_bypass.tar.gz

[jira] [Commented] (HBASE-3775) Unique transient names for processes


[ 
https://issues.apache.org/jira/browse/HBASE-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019187#comment-13019187
 ] 

stack commented on HBASE-3775:
--

HBASE-1502 marks HServerAddress as deprecated and replaces most instances of 
it (all but those that bubble up to the API) with a ServerName class. 
ServerName is just a host for a String formatted as hostname ',' port ',' 
startcode.  1502 removes heartbeats.  The RS volunteers its port and startcode; 
the Master tells it what its hostname is.  Thereafter, the RS uses the 'String' 
the Master gave it for registering itself in ZK and for passing the Master its 
load ever after.

This was how 0.90.x was supposed to work, but the above protocol used 
HServerAddress, which does a lookup on each deserialization.  Now we just use 
Strings for identity, with the startcode serving as a sort-of UUID, but with the 
ServerName human-readable rather than opaque like a UUID.

If you are game, let's try 1502 before we resort to UUIDs.  Agree, though, that 
identity loss, theft, and change have caused way too much grief over the life 
of HBase.
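A hedged sketch of the identity scheme described here, reusing the log-directory name `C4C9.site,60020,1300767633398` from the HBASE-3722 report as an example: the name is just a string, so comparing identities never needs a hostname lookup, and the startcode distinguishes a restarted process on the same host and port, the role a UUID would otherwise play. The class is hypothetical and far simpler than HBase's ServerName.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of a ServerName-style identity: a plain
 * "hostname,port,startcode" string, parsed without any DNS lookup.
 * Not HBase's ServerName class.
 */
public class ServerNameSketch {
  final String hostname;
  final int port;
  final long startcode; // process start time; changes on every restart

  ServerNameSketch(String hostname, int port, long startcode) {
    this.hostname = hostname;
    this.port = port;
    this.startcode = startcode;
  }

  static ServerNameSketch parse(String s) {
    String[] parts = s.split(",");
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected hostname,port,startcode: " + s);
    }
    return new ServerNameSketch(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
  }

  /** True for two incarnations of a server on the same host and port. */
  boolean sameHostAndPort(ServerNameSketch other) {
    return hostname.equals(other.hostname) && port == other.port;
  }

  @Override public String toString() {
    return hostname + "," + port + "," + startcode;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof ServerNameSketch)) {
      return false;
    }
    ServerNameSketch other = (ServerNameSketch) o;
    return sameHostAndPort(other) && startcode == other.startcode;
  }

  @Override public int hashCode() {
    return Objects.hash(hostname, port, startcode);
  }
}
```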

> Unique transient names for processes
> 
>
> Key: HBASE-3775
> URL: https://issues.apache.org/jira/browse/HBASE-3775
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Andrew Purtell
>
> HBASE-3772 is the latest of several incidents where regionservers and master 
> map their identities to hostnames yet hostname resolution is inconsistent 
> cluster wide. With HBase 0.20 we have seen this lead to conditions like META 
> being hosted on 11 servers at once. The situation with HBase 0.90 is better 
> but it concerns me a lot. Confusion about identity cannot be anything but bad.
> Why don't we have the processes generate for themselves a random UUID upon 
> startup, or similar, and have all processes on the cluster map these UUIDs to 
> identities? Critically, region assignment state should hold the UUID of the 
> current assignee. This would not remove the need to resolve region locations 
> to network addresses, nor determine liveness of assignments, but will prevent 
> the specific double assignment scenarios we have seen if hostname resolution 
> is flaky.


[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019185#comment-13019185
]

jirapos...@reviews.apache.org commented on HBASE-3759:
--

bq. On 2011-04-13 03:23:28, Michael Stack wrote:
bq. > +1 Looks good Gary. Agree do it now before 0.92. By chance did you see
if it made a difference profiling?

Yeah, if you grab
https://issues.apache.org/jira/secure/attachment/12475852/cp_bypass.tar.gz you
can see the call trees I grabbed from profiling with the context object
(Call_Tree_context_xxx) vs. ThreadLocals (Call_Tree_tl_xxx). If you look at
Call_Tree_tl_run.html vs. Call_Tree_context_run.html, you'll see ~20% of the
runnable thread time spent in ThreadLocal.get() (under shouldComplete() and
shouldBypass()). This is completely eliminated in the context version, at the
cost of a small overhead for the object instantiation: 0.4% in
CallContext.createAndPrepare(). (This was before a rename of CallContext ->
ObserverContext.)

Granted, this is only runnable thread time, so it's skewed in terms of overall
impact. But at the macro level, the MR put-based import that generated this
ran in ~2h15m with the ThreadLocal version, but only ~1h40m with the context
version. So it seems a pretty substantial improvement.
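Converting the two wall-clock times to minutes (2h15m = 135 min, 1h40m = 100 min) makes the size of the improvement for this one import job concrete:

```latex
\frac{135 - 100}{135} \approx 0.26 \qquad\text{and}\qquad \frac{135}{100} = 1.35
```

That is, roughly 26% less wall-clock time, or about 1.35x the import throughput for this run.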

- Gary

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/588/#review434
---

On 2011-04-13 01:08:50, Gary Helmling wrote:
bq.
bq. ---
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/588/
bq. ---
bq.
bq. (Updated 2011-04-13 01:08:50)
bq.
bq.
bq. Review request for hbase.
bq.
bq.
bq. Summary
bq. ---
bq.
bq. Profiling the HRegionServer process with a RegionObserver coprocessor
loaded shows a fair amount of runnable thread CPU time spent getting the bypass
and complete flag ThreadLocal values by RegionCoprocessorHost. See the
HBASE-3759 JIRA for some attached graphs.
bq.
bq. With the caveat that this is runnable CPU time and not threads in all
states, this still seems like a significant processing bottleneck on a hot call
path. The workload profiled was a put-based bulk load, so for each multi-put
request, RegionCoprocessorHost.prePut() could be called many times.
bq.
bq. Instead of using ThreadLocal variables for bypass/complete, which will
incur contention on the underlying map of values, I think we can eliminate the
bottleneck by using locally scoped variables for each preXXX/postXXX method
called in the RegionCoprocessorHost, MasterCoprocessorHost and
WALCoprocessorHost classes.
bq.
bq. The attached patch refactors the current RegionObserver, MasterObserver
and WALObserver APIs to provide a locally scoped ObserverContext object for
storing and checking the bypass and complete values.
bq.
bq. Summary of changes:
bq.
bq. * adds a new ObserverContext class,
containing references for bypass, complete and the environment instance
bq. * in each pre/post method in RegionObserver, the
RegionCoprocessorEnvironment parameter is replaced by
ObserverContext
bq. * in each pre/post method in MasterObserver, the
MasterCoprocessorEnvironment parameter is replaced by
ObserverContext
bq. * in each pre/post method in WALObserver, the WALCoprocessorEnvironment
parameter is replaced by ObserverContext
bq.
bq.
bq. This is obviously a large bulk change to the existing API. I could avoid
the API change with a hacky modification underneath the *CoprocessorEnvironment
interfaces. But since we do not yet have a public release with coprocessors, I
would prefer to take the time to make the initial API the best we can before we
push it out.
bq.
bq. Please let me know your thoughts on this approach.
bq.
bq.
bq. This addresses bug HBASE-3759.
bq. https://issues.apache.org/jira/browse/HBASE-3759
bq.
bq.
bq. Diffs
bq. -
bq.
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 9576c48
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java 5a0f095
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java d45b950
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java a82f62b
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java db0870b
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 3501958
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java 7a34d18
bq.    src/main/java/org/apache/ha

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019183#comment-13019183
 ] 

Jean-Daniel Cryans commented on HBASE-3767:
---

My expectation is that in a long-lived JVM the HTables are created early 
rather than late; for example, with our Thrift servers, once you've served each 
table from each thread you already have all your HTables.

In the case of a bulk upload, the processes are usually MR tasks, which are short 
lived and create the HTable up front.

bq. If we had oversized TPE it'd grow as servers grew.

I'd actually prefer that to setting the max to the number of CPUs times some number. Default 
max to 1000 and lower bound at 1?
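A minimal sketch of an "oversized pool that grows and shrinks back", using only java.util.concurrent. The core size of 1 and maximum of 1000 are the numbers floated above; the 60-second keep-alive and the class name are my own assumptions, and this is not the executor HTable actually constructs:

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class OversizedPool {
    // Core 1 and max 1000 come from the discussion; the keep-alive is an assumption.
    static final int CORE_THREADS = 1;
    static final int MAX_THREADS = 1000;
    static final long KEEP_ALIVE_SECONDS = 60;

    static ThreadPoolExecutor create() {
        // A SynchronousQueue holds no tasks: once the core thread exists, each execute()
        // hands the task to an idle thread or, if none is idle, starts a new thread
        // (up to MAX_THREADS). Threads beyond CORE_THREADS exit after KEEP_ALIVE_SECONDS
        // of idleness, so the pool shrinks back down when it is not being used.
        return new ThreadPoolExecutor(
                CORE_THREADS,
                MAX_THREADS,
                KEEP_ALIVE_SECONDS,
                TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }
}
```

The SynchronousQueue is what lets the pool grow: with an unbounded LinkedBlockingQueue, a ThreadPoolExecutor never starts more than its core threads, so a core size of 1 would mean one thread forever. The trade-off is that once all 1000 threads are busy, further submissions are rejected (RejectedExecutionException under the default policy) instead of queued.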

bq. Aside: Whose idea was the passing of an ExecutorService from HTable down 
into HCM for it to use? That's a little perverse

Your favorite Canadian, guess which one :)

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think instead we should do it once per JVM and then reuse that number for all 
> the others.


[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019180#comment-13019180
 ] 

stack commented on HBASE-3767:
--

bq. If the HTable is already created, currently you still have just 1 thread in 
the TPE.

But because we are not caching, the next HTable creation will have a 
right-sized TPE.

If we had an oversized TPE, it'd grow as servers grew.

Aside: Whose idea was the passing of an ExecutorService from HTable down into 
HCM for it to use?  That's a little perverse.

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think instead we should do it once per JVM and then reuse that number for all 
> the others.


[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019178#comment-13019178
 ] 

Jean-Daniel Cryans commented on HBASE-3767:
---

bq. So if we create an HTable with one RegionServer in the cluster and then add 
ten nodes, we'll have cached 1 rather than 10

If the HTable is already created, currently you still have just 1 thread in the 
TPE.

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think instead we should do it once per JVM and then reuse that number for all 
> the others.


[jira] [Commented] (HBASE-3773) Set ZK max connections much higher in 0.90


[ 
https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019177#comment-13019177
 ] 

stack commented on HBASE-3773:
--

Nice work proving out which higher number to use.

> Set ZK max connections much higher in 0.90
> --
>
> Key: HBASE-3773
> URL: https://issues.apache.org/jira/browse/HBASE-3773
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> I think by now we can all acknowledge that 0.90 has an issue with ZK 
> connections, in that we create too many of them and it's also too easy for 
> our users to shoot themselves in the foot.
> For 0.90.3, I think we should change the default configuration of 30 that we 
> ship with and set it much, much higher; I'm thinking of 32k.


[jira] [Commented] (HBASE-3103) investigate/improve compaction performance


[ 
https://issues.apache.org/jira/browse/HBASE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019175#comment-13019175
 ] 

stack commented on HBASE-3103:
--

CityHash looks sweet, N.  I like the name.  It says Strings only, but I'm sure that 
can be worked around?

> investigate/improve compaction performance
> --
>
> Key: HBASE-3103
> URL: https://issues.apache.org/jira/browse/HBASE-3103
> Project: HBase
>  Issue Type: Improvement
>  Components: performance
>Reporter: Kannan Muthukkaruppan
> Attachments: profiler_data.jpg
>
>
> I was running some tests and am seeing that major compacting about 100M of 
> data seems to take around 40-50 seconds. 
> My simplified test case is something like:
> * Created about a 100M store file (800M uncompressed).
> * 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 
> bytes) 
> * Compression and ROWCOL bloom were turned on.
> The test was to major compact this single store file into a new file.
> Added some nanoTime() calls around these three stages:
> * Scanner.next operations
> * bloom computation logic in: StoreFile:append()
> * StoreFile.Writer.append()
> This is what I saw for these three stages:
> {code}
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction scanTime (ns) 4338103000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction bloom only time (ns) 14433821000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction append time (ns) 23191478000
> {code}
> The HFile.getReadTime() and HFile.getWriteTime() themselves seem pretty low 
> (under 1 second). These are the times for the parts that interact with 
> the DFS (readBlock() and finishBlock() mostly).
> Are these numbers roughly in line with what others are seeing normally? 
> Will double-check my instrumentation and try to get more data. Might try to 
> run it under a profiler. But wanted to put it out there for additional 
> input/ideas on improvement.
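Converting the three logged stage times to seconds shows where the reported 40-50 s goes. This assumes the three timed stages do not overlap; their sum landing in the reported range suggests they don't, but the description does not say so:

```latex
\begin{aligned}
t_{\mathrm{scan}}   &= 4\,338\,103\,000\ \mathrm{ns} \approx 4.34\ \mathrm{s} \\
t_{\mathrm{bloom}}  &= 14\,433\,821\,000\ \mathrm{ns} \approx 14.43\ \mathrm{s} \\
t_{\mathrm{append}} &= 23\,191\,478\,000\ \mathrm{ns} \approx 23.19\ \mathrm{s} \\
t_{\mathrm{total}}  &\approx 4.34 + 14.43 + 23.19 = 41.96\ \mathrm{s} \\
\frac{t_{\mathrm{bloom}}}{t_{\mathrm{total}}} &\approx 0.34, \qquad
\frac{t_{\mathrm{append}}}{t_{\mathrm{total}}} \approx 0.55, \qquad
\frac{t_{\mathrm{scan}}}{t_{\mathrm{total}}} \approx 0.10
\end{aligned}
```

On these numbers the bloom computation alone is about a third of the compaction, while reading through the scanner is only about a tenth.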


[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support


[ 
https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019176#comment-13019176
 ] 

stack commented on HBASE-3763:
--

So we'd load and unload blooms as we went?

> Add Bloom Block Index Support
> -
>
> Key: HBASE-3763
> URL: https://issues.apache.org/jira/browse/HBASE-3763
> Project: HBase
>  Issue Type: Improvement
>  Components: io, regionserver
>Affects Versions: 0.89.20100924, 0.90.0, 0.90.1, 0.90.2
>Reporter: mikhail
>Assignee: mikhail
>Priority: Minor
>  Labels: hbase, performance
> Fix For: 0.89.20100924
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add a way to save HBase Bloom filters into an array of Meta blocks instead of 
> one big Meta block, and load only the blocks required to answer a query.  
> This will allow us faster bloom load times for large StoreFiles & pave the 
> path for adding Bloom Filter support to HFileOutputFormat bulk load.
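A minimal sketch of the idea in the description, under my own assumptions: keys arrive sorted (as they do when a StoreFile is written), each chunk is a small Bloom filter covering one contiguous key range, and a sorted array holding each chunk's first key serves as the block index. The IntFunction loader stands in for reading one meta block; the String keys, the hash, and all class names are illustrative rather than taken from the patch:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** A small Bloom filter over one contiguous range of sorted keys (one "shard"). */
class BloomChunk {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomChunk(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Illustrative double hashing derived from String.hashCode(); not HBase's hash.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 16) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }
}

/** In-memory index of each chunk's first key; chunks are loaded lazily on first use. */
class BloomBlockIndex {
    private final String[] firstKeys;               // sorted; firstKeys[i] starts chunk i
    private final IntFunction<BloomChunk> loader;   // stands in for reading meta block i
    private final Map<Integer, BloomChunk> loaded = new HashMap<>();
    private int loads = 0;

    BloomBlockIndex(String[] firstKeys, IntFunction<BloomChunk> loader) {
        this.firstKeys = firstKeys;
        this.loader = loader;
    }

    boolean mightContain(String key) {
        int pos = Arrays.binarySearch(firstKeys, key);
        int chunk = pos >= 0 ? pos : -pos - 2;      // last chunk whose first key <= key
        if (chunk < 0) return false;                // key sorts before every chunk
        BloomChunk bloom = loaded.computeIfAbsent(chunk, c -> {
            loads++;                                // only the needed chunk is read
            return loader.apply(c);
        });
        return bloom.mightContain(key);
    }

    int loadCount() {
        return loads;
    }
}
```

A lookup touches only the in-memory index and the one chunk whose key range could contain the key, so a cold read loads one small chunk instead of the whole filter; a key that sorts before every chunk is rejected without loading anything.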


[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019174#comment-13019174
 ] 

stack commented on HBASE-3767:
--

So if we create an HTable with one RegionServer in the cluster and then add ten 
nodes, we'll have cached 1 rather than 10.  That's going to make it a pain to debug 
why the upload is slow.

Oversize the executor pool and have it shrink back down when unused (as per Ted 
above)?

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think instead we should do it once per JVM and then reuse that number for all 
> the others.


[jira] [Commented] (HBASE-3771) All jsp pages don't clean their HBA


[ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019173#comment-13019173
 ] 

stack commented on HBASE-3771:
--

+1

> All jsp pages don't clean their HBA
> ---
>
> Key: HBASE-3771
> URL: https://issues.apache.org/jira/browse/HBASE-3771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
> Attachments: HBASE-3771.patch
>
>
> As noticed by Dave Latham, refreshing the zk web page will eventually make that 
> machine run out of connections to ZK. It's because we don't close the 
> connection created inside HBA.


[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019172#comment-13019172
 ] 

jirapos...@reviews.apache.org commented on HBASE-3759:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/588/#review434
---

Ship it!

+1. Looks good, Gary.  Agree we should do it now before 0.92.  By chance did you see whether it 
made a difference when profiling?

- Michael

On 2011-04-13 01:08:50, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/588/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 01:08:50)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Profiling the HRegionServer process with a RegionObserver coprocessor 
loaded shows a fair amount of runnable thread CPU time spent getting the bypass 
and complete flag ThreadLocal values by RegionCoprocessorHost.  See the 
HBASE-3759 JIRA for some attached graphs.
bq.  
bq.  With the caveat that this is runnable CPU time and not threads in all 
states, this still seems like a significant processing bottleneck on a hot call 
path.  The workload profiled was a put-based bulk load, so for each multi-put 
request, RegionCoprocessorHost.prePut() could be called many times.
bq.  
bq.  Instead of using ThreadLocal variables for bypass/complete, which will 
incur contention on the underlying map of values, I think we can eliminate the 
bottleneck by using locally scoped variables for each preXXX/postXXX method 
called in the RegionCoprocessorHost, MasterCoprocessorHost and 
WALCoprocessorHost classes.
bq.  
bq.  The attached patch refactors the current RegionObserver, MasterObserver 
and WALObserver APIs to provide a locally scoped ObserverContext object for 
storing and checking the bypass and complete values.
bq.  
bq.  Summary of changes:
bq.  
bq.  * adds a new ObserverContext class, 
containing references for bypass, complete and the environment instance
bq.  * in each pre/post method in RegionObserver, the 
RegionCoprocessorEnvironment parameter is replaced by 
ObserverContext
bq.  * in each pre/post method in MasterObserver, the 
MasterCoprocessorEnvironment parameter is replaced by 
ObserverContext
bq.  * in each pre/post method in WALObserver, the WALCoprocessorEnvironment 
parameter is replaced by ObserverContext
bq.  
bq.  
bq.  This is obviously a large bulk change to the existing API.  I could avoid 
the API change with a hacky modification underneath the *CoprocessorEnvironment 
interfaces.  But since we do not yet have a public release with coprocessors, I 
would prefer to take the time to make the initial API the best we can before we 
push it out.
bq.  
bq.  Please let me know your thoughts on this approach.
bq.  
bq.  
bq.  This addresses bug HBASE-3759.
bq.  https://issues.apache.org/jira/browse/HBASE-3759
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 9576c48
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java 5a0f095
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java d45b950
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java a82f62b
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java db0870b
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java PRE-CREATION
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 3501958
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java 7a34d18
bq.    src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 019bbde
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 60efa12
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java a3f3b31
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java 834283f
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 0ce2147
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 0db5001
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java a15d53a
bq.    src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java 2c1e4a0
bq.  
bq.  Diff: https://reviews.apache.org/r/588/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Gary
bq.  
bq.

> Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
> complete()
>

[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-12 Thread jirapos...@reviews.apache.org (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019159#comment-13019159
 ] 

jirapos...@reviews.apache.org commented on HBASE-3759:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/588/
---

Review request for hbase.


Summary
---

Profiling the HRegionServer process with a RegionObserver coprocessor loaded 
shows a fair amount of runnable thread CPU time spent getting the bypass and 
complete flag ThreadLocal values by RegionCoprocessorHost.  See the HBASE-3759 
JIRA for some attached graphs.

With the caveat that this is runnable CPU time and not threads in all states, 
this still seems like a significant processing bottleneck on a hot call path.  
The workload profiled was a put-based bulk load, so for each multi-put request, 
RegionCoprocessorHost.prePut() could be called many times.

Instead of using ThreadLocal variables for bypass/complete, which will incur 
contention on the underlying map of values, I think we can eliminate the 
bottleneck by using locally scoped variables for each preXXX/postXXX method 
called in the RegionCoprocessorHost, MasterCoprocessorHost and 
WALCoprocessorHost classes.

The attached patch refactors the current RegionObserver, MasterObserver and 
WALObserver APIs to provide a locally scoped ObserverContext object for storing 
and checking the bypass and complete values.

Summary of changes:

* adds a new ObserverContext class, 
containing references for bypass, complete and the environment instance
* in each pre/post method in RegionObserver, the RegionCoprocessorEnvironment 
parameter is replaced by ObserverContext
* in each pre/post method in MasterObserver, the MasterCoprocessorEnvironment 
parameter is replaced by ObserverContext
* in each pre/post method in WALObserver, the WALCoprocessorEnvironment 
parameter is replaced by ObserverContext
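A minimal sketch of the pattern this summary describes: the bypass and complete flags live in a context object created for each hook invocation and passed to the observer, so the host reads plain fields instead of ThreadLocal values. The names ObserverContextSketch, PutObserver and HostSketch are hypothetical, sharing one environment across observers is a simplification, and this is not the code under review:

```java
import java.util.List;

/** Per-invocation state: created fresh for each hook call instead of kept in ThreadLocals. */
class ObserverContextSketch<E> {
    private final E environment;
    private boolean bypass;
    private boolean complete;

    ObserverContextSketch(E environment) {
        this.environment = environment;
    }

    E getEnvironment() { return environment; }

    /** Ask the host to skip its default processing for this call. */
    void bypass() { bypass = true; }

    /** Ask the host not to invoke any further observers for this call. */
    void complete() { complete = true; }

    boolean shouldBypass() { return bypass; }

    boolean shouldComplete() { return complete; }
}

/** Hypothetical observer with a single pre-put hook. */
interface PutObserver<E> {
    void prePut(ObserverContextSketch<E> ctx, String row);
}

class HostSketch<E> {
    private final E environment;
    private final List<PutObserver<E>> observers;

    HostSketch(E environment, List<PutObserver<E>> observers) {
        this.environment = environment;
        this.observers = observers;
    }

    /** Runs the hooks in order; returns true if any observer requested a bypass. */
    boolean prePut(String row) {
        boolean bypass = false;
        for (PutObserver<E> observer : observers) {
            // Locally scoped context: plain field reads, no ThreadLocal lookups.
            ObserverContextSketch<E> ctx = new ObserverContextSketch<>(environment);
            observer.prePut(ctx, row);
            bypass |= ctx.shouldBypass();
            if (ctx.shouldComplete()) {
                break; // skip the remaining observers
            }
        }
        return bypass;
    }
}
```

Because each context is a local variable, its flags cannot leak between calls or threads, and the per-call cost is one short-lived allocation.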


This is obviously a large bulk change to the existing API.  I could avoid the 
API change with a hacky modification underneath the *CoprocessorEnvironment 
interfaces.  But since we do not yet have a public release with coprocessors, I 
would prefer to take the time to make the initial API the best we can before we 
push it out.

Please let me know your thoughts on this approach.


This addresses bug HBASE-3759.
https://issues.apache.org/jira/browse/HBASE-3759


Diffs
-

  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 9576c48
  src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java 5a0f095
  src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java d45b950
  src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java a82f62b
  src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java db0870b
  src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 3501958
  src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java 7a34d18
  src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 019bbde
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 60efa12
  src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCoprocessorHost.java a3f3b31
  src/test/java/org/apache/hadoop/hbase/coprocessor/SampleRegionWALObserver.java 834283f
  src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java 0ce2147
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorInterface.java 0db5001
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java a15d53a
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionObserverStacking.java 2c1e4a0

Diff: https://reviews.apache.org/r/588/diff


Testing
---


Thanks,

Gary



> Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
> complete()
> 
>
> Key: HBASE-3759
> URL: https://issues.apache.org/jira/browse/HBASE-3759
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Gary Helmling
>Assignee: Gary Helmling
> Attachments: cp_bypass.tar.gz
>
>
> In the current coprocessor framework, ThreadLocal objects are used for the 
> bypass and complete booleans in CoprocessorEnvironment.  This allows the 
> *CoprocessorHost implementations to identify when to short-circuit processing 
> the preXXX and postXXX hook methods.
> Profiling the region server, however, shows that these ThreadLocals can 
> become a contention point when on a hot code path (such

[jira] [Created] (HBASE-3776) Add Bloom Filter Support to HFileOutputFormat

Add Bloom Filter Support to HFileOutputFormat
-

 Key: HBASE-3776
 URL: https://issues.apache.org/jira/browse/HBASE-3776
 Project: HBase
  Issue Type: Improvement
Reporter: Nicolas Spiegelberg
Assignee: mikhail
Priority: Minor
 Fix For: 0.92.0


Add Bloom Filter support for bulk imports.  The lack of a bloom filter, even on a 
single imported file, can degrade performance.  Since we now set our 
compression type based on the HBase CF configuration, it would be good to 
follow this path for the bloom filter addition.
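A minimal sketch, assuming the bloom type is carried the way the issue says compression already is: the job client serializes a family-to-type map into a single configuration value, and each record writer decodes it. How HFileOutputFormat actually passes compression is not shown here; the configuration key, the encoding format, and the class name are all hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: packs each column family's Bloom filter type into one config string. */
class FamilyBloomConfig {
    // Hypothetical configuration key; not an actual HBase property.
    static final String CONF_KEY = "example.hfileoutputformat.families.bloomtype";

    /** Encodes {family -> bloom type} as "fam1=TYPE&fam2=TYPE", URL-escaping each part. */
    static String encode(Map<String, String> bloomTypeByFamily) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : bloomTypeByFamily.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return sb.toString();
    }

    /** Inverse of encode(); an empty or null string yields an empty map. */
    static Map<String, String> decode(String encoded) {
        Map<String, String> result = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) return result;
        for (String pair : encoded.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("malformed entry: " + pair);
            result.put(dec(pair.substring(0, eq)), dec(pair.substring(eq + 1)));
        }
        return result;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

URL-escaping each part means a family name containing '&' or '=' cannot break the format.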


[jira] [Updated] (HBASE-3771) All jsp pages don't clean their HBA


 [ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3771:
--

Attachment: HBASE-3771.patch

Patch that fixes the issue. I tested zk.jsp, and it no longer shows one more 
connection every time I hit refresh.
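The shape of the fix, sketched with a stand-in class. FakeAdmin is hypothetical and only counts open connections; it is not HBaseAdmin's API, and the actual patch may release the connection through a different call. The point is that every page render must release what it opened, even when rendering fails:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an admin client that opens one ZooKeeper connection per instance. */
class FakeAdmin implements AutoCloseable {
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    FakeAdmin() {
        OPEN_CONNECTIONS.incrementAndGet();  // "connect" on construction
    }

    String clusterStatus() {
        return "ok";
    }

    @Override
    public void close() {
        OPEN_CONNECTIONS.decrementAndGet();  // release the connection
    }
}

class StatusPage {
    /** Leaky version: each render opens a connection that is never released. */
    static String renderLeaky() {
        FakeAdmin admin = new FakeAdmin();
        return admin.clusterStatus();
    }

    /** Fixed version: the connection is released even if rendering throws. */
    static String renderClosing() {
        try (FakeAdmin admin = new FakeAdmin()) {
            return admin.clusterStatus();
        }
    }
}
```

try-with-resources needs Java 7; on Java 6 the same effect requires an explicit try/finally that calls close().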

> All jsp pages don't clean their HBA
> ---
>
> Key: HBASE-3771
> URL: https://issues.apache.org/jira/browse/HBASE-3771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
> Attachments: HBASE-3771.patch
>
>
> As noticed by Dave Latham, refreshing the zk web page will eventually make that 
> machine run out of connections to ZK. It's because we don't close the 
> connection created inside HBA.


[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019149#comment-13019149
 ] 

Nicolas Spiegelberg commented on HBASE-3767:


We were talking about this exact issue last week.  I think caching the RS count 
per cluster is the correct way to go.  I hacked together a Map but never finished it.
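A minimal sketch of a per-cluster cache, assuming some string key identifies a cluster (for example its ZooKeeper quorum and parent znode) and a lookup function stands in for the ZooKeeper query; both are hypothetical. The invalidate() method is one way to address the staleness concern raised in this thread, where a count cached at 1 server would not reflect newly added nodes:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical per-JVM cache of region server counts, one entry per cluster. */
class RegionServerCountCache {
    private final ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
    private final Function<String, Integer> lookup; // stands in for the ZooKeeper query

    RegionServerCountCache(Function<String, Integer> lookup) {
        this.lookup = lookup;
    }

    /** Runs the lookup only the first time a cluster key is seen; later calls hit the map. */
    int count(String clusterKey) {
        return counts.computeIfAbsent(clusterKey, lookup);
    }

    /** Forgets a cluster's count so the next call re-queries it (e.g. after adding servers). */
    void invalidate(String clusterKey) {
        counts.remove(clusterKey);
    }
}
```

Without invalidate() or a periodic refresh, the cache has exactly the stale-count problem described for the one-server-then-ten case.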

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think instead we should do it once per JVM and then reuse that number for all 
> the others.


[jira] [Commented] (HBASE-3103) investigate/improve compaction performance


[ 
https://issues.apache.org/jira/browse/HBASE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019145#comment-13019145
 ] 

Nicolas Spiegelberg commented on HBASE-3103:


Note:  we could look into using CityHash instead of MurmurHash to improve bloom 
insertion performance here: 

http://google-opensource.blogspot.com/2011/04/introducing-cityhash.html
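Why the hash function matters here: every key added during a compaction is hashed before its bits are set. A common scheme, double hashing, computes two hashes per key and derives all k bit positions from them, so insertion cost is dominated by those two hash evaluations and a faster function would cut it directly. The sketch below assumes a pluggable seeded hash; FNV-1a is only a simple stand-in (not MurmurHash or CityHash), and I'm not claiming HBase's Bloom filter is implemented exactly this way:

```java
import java.util.BitSet;

/** Pluggable 32-bit hash over a byte array with a seed (hypothetical interface). */
interface SeededHash {
    int hash(byte[] data, int seed);
}

/** 32-bit FNV-1a with the seed mixed into the offset basis; a stand-in hash only. */
class Fnv1a implements SeededHash {
    @Override
    public int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;       // FNV offset basis, perturbed by the seed
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;             // FNV prime; int overflow wraps mod 2^32
        }
        return h;
    }
}

class DoubleHashBloom {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private final SeededHash hash;

    DoubleHashBloom(int numBits, int numHashes, SeededHash hash) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.hash = hash;
    }

    /** Bit position i, derived as h1 + i * h2 (mod numBits). */
    private int position(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(byte[] key) {
        // Two hash evaluations per key, however many bit positions (numHashes) are set.
        int h1 = hash.hash(key, 0);
        int h2 = hash.hash(key, h1);
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(h1, h2, i));
        }
    }

    boolean mightContain(byte[] key) {
        int h1 = hash.hash(key, 0);
        int h2 = hash.hash(key, h1);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(h1, h2, i))) {
                return false;            // definitely absent
            }
        }
        return true;                     // possibly present
    }
}
```

Swapping in a different hash only means providing another SeededHash implementation; the bit-setting loop is unchanged.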

> investigate/improve compaction performance
> --
>
> Key: HBASE-3103
> URL: https://issues.apache.org/jira/browse/HBASE-3103
> Project: HBase
>  Issue Type: Improvement
>  Components: performance
>Reporter: Kannan Muthukkaruppan
> Attachments: profiler_data.jpg
>
>
> I was running some tests and am seeing that major compacting about 100M of 
> data seems to take around 40-50 seconds. 
> My simplified test case is something like:
> * Created about a 100M store file (800M uncompressed).
> * 10k keys with 1k columns each (avg. key size: 30 bytes; avg. value size: 45 
> bytes) 
> * Compression and ROWCOL bloom were turned on.
> The test was to major compact this single store file into a new file.
> Added some nanoTime() calls around these three stages:
> * Scanner.next operations
> * bloom computation logic in: StoreFile:append()
> * StoreFile.Writer.append()
> This is what I saw for these three stages:
> {code}
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction scanTime (ns) 4338103000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction bloom only time (ns) 14433821000
> 2010-10-11 11:25:39,774 INFO org.apache.hadoop.hbase.regionserver.Store: 
> major Compaction append time (ns) 23191478000
> {code}
> The HFile.getReadTime() and HFile.getWriteTime() themselves seem pretty low 
> (under 1 second). These are the times for the parts that interact with 
> the DFS (readBlock() and finishBlock() mostly).
> Are these numbers roughly in line with what others are seeing normally? 
> Will double-check my instrumentation and try to get more data. Might try to 
> run it under a profiler. But wanted to put it out there for additional 
> input/ideas on improvement.
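A minimal sketch of per-stage instrumentation like the nanoTime() calls described above, accumulating elapsed time per named stage over a whole compaction and printing lines in the same "stage time (ns) N" shape as the logs. The StageTimer class is hypothetical; the lambdas are a convenience, and at roughly 10 million cells (10k rows with 1k columns each) the cost of the timing calls themselves may be measurable:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Accumulates elapsed nanoseconds per named stage across many calls (illustrative helper). */
class StageTimer {
    private final Map<String, Long> totals = new LinkedHashMap<>();

    /** Runs the action, adds its elapsed time to the named stage, and returns its result. */
    <T> T time(String stage, Supplier<T> action) {
        long start = System.nanoTime();
        try {
            return action.get();
        } finally {
            // Recorded even if the action throws.
            totals.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    long totalNanos(String stage) {
        return totals.getOrDefault(stage, 0L);
    }

    /** One line per stage, in first-use order. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            sb.append(e.getKey()).append(" time (ns) ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```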


[jira] [Updated] (HBASE-3763) Add Bloom Block Index Support


 [ 
https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Spiegelberg updated HBASE-3763:
---

Component/s: io
Description: Add a way to save HBase Bloom filters into an array of Meta 
blocks instead of one big Meta block, and load only the blocks required to 
answer a query.  This will allow us faster bloom load times for large 
StoreFiles & pave the path for adding Bloom Filter support to HFileOutputFormat 
bulk load.  (was: Adding a way to save HBase Bloom filters into an array of 
Meta blocks instead of one big Meta block, and load only the blocks required to 
answer a query. This behavior is controlled by the io.storefile.bloom.lazy 
configuration option, which is set to false by default. Existing StoreFiles 
with single-block Bloom filters are handled the same way as before.)
   Assignee: mikhail
Summary: Add Bloom Block Index Support  (was: Splitting Bloom filters 
into multiple meta blocks and loading those blocks on demand to avoid blocking 
on large Bloom filter loads at read time)

> Add Bloom Block Index Support
> -
>
> Key: HBASE-3763
> URL: https://issues.apache.org/jira/browse/HBASE-3763
> Project: HBase
>  Issue Type: Improvement
>  Components: io, regionserver
>Affects Versions: 0.89.20100924, 0.90.0, 0.90.1, 0.90.2
>Reporter: mikhail
>Assignee: mikhail
>Priority: Minor
>  Labels: hbase, performance
> Fix For: 0.89.20100924
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add a way to save HBase Bloom filters into an array of Meta blocks instead of 
> one big Meta block, and load only the blocks required to answer a query.  
> This will allow us faster bloom load times for large StoreFiles & pave the 
> path for adding Bloom Filter support to HFileOutputFormat bulk load.


[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019125#comment-13019125
 ] 

Jean-Daniel Cryans commented on HBASE-3767:
---

@Ted, I guess the improvement provided by setting the core pool size is 
minimal, so I wouldn't even bother going all the way to using the number 
of CPUs; just start with 1. Cleaner code.

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think instead we should do it once per JVM and then reuse that number for all 
> the others.


[jira] [Updated] (HBASE-3755) Catch zk's ConnectionLossException and augment error message with more help


 [ 
https://issues.apache.org/jira/browse/HBASE-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3755:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to branch and trunk. Many thanks to Gary for the thorough review.

> Catch zk's ConnectionLossException and augment error message with more help
> ---
>
> Key: HBASE-3755
> URL: https://issues.apache.org/jira/browse/HBASE-3755
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.3
>
> Attachments: HBASE-3755-v2.patch, HBASE-3755.patch
>
>
> 0.90 behaves differently regarding ZK connections: it tends to create 
> too many of them, and it's not obvious to users what they should do to fix it. I 
> think I've helped at least 5 different users this week with this error.
> By catching ConnectionLossException and augmenting its message, we could say 
> something like "it's possible that the ZooKeeper server has too many 
> connections from this IP, see doc at blah" since the ZK server isn't nice 
> enough to let us know what's going on.

[jira] [Resolved] (HBASE-3773) Set ZK max connections much higher in 0.90


 [ 
https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3773.
---

  Resolution: Fixed
Assignee: Jean-Daniel Cryans
Release Note: The max connections per IP for ZK is now set at 2000, but 
only for 0.90

Committed to branch after making sure it's working as advertised.

> Set ZK max connections much higher in 0.90
> --
>
> Key: HBASE-3773
> URL: https://issues.apache.org/jira/browse/HBASE-3773
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> I think by now we can all acknowledge that 0.90 has an issue with ZK 
> connections, in that we create too many of them and it's also too easy for 
> our users to shoot themselves in the foot.
> For 0.90.3, I think we should change the default configuration of 30 that we 
> ship with and set it much, much higher; I'm thinking of 32k.

[jira] [Commented] (HBASE-3773) Set ZK max connections much higher in 0.90


[ 
https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019072#comment-13019072
 ] 

Jean-Daniel Cryans commented on HBASE-3773:
---

I tried creating as many connections as I could in the shell on my MBP using a 
line like this:

bq. 3000.times { HTable.isTableEnabled("t") }

During that time I also monitored the total number of threads used by ZK and by 
the shell. What happens is that the server doesn't create new threads, only the 
client does, and the client eventually dies with an OOME because it's unable to 
create new threads.

I wouldn't mind setting the max connections lower, to a number just under the 
MBP's default max number of threads, so that we would hit the 
ConnectionLossException before the OOME.

> Set ZK max connections much higher in 0.90
> --
>
> Key: HBASE-3773
> URL: https://issues.apache.org/jira/browse/HBASE-3773
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> I think by now we can all acknowledge that 0.90 has an issue with ZK 
> connections, in that we create too many of them and it's also too easy for 
> our users to shoot themselves in the foot.
> For 0.90.3, I think we should change the default configuration of 30 that we 
> ship with and set it much, much higher; I'm thinking of 32k.

[jira] [Created] (HBASE-3775) Unique transient names for processes

2011-04-12 Thread Andrew Purtell (JIRA)

Unique transient names for processes


 Key: HBASE-3775
 URL: https://issues.apache.org/jira/browse/HBASE-3775
 Project: HBase
  Issue Type: Brainstorming
Reporter: Andrew Purtell


HBASE-3772 is the latest of several incidents where regionservers and master 
map their identities to hostnames yet hostname resolution is inconsistent 
cluster-wide. With HBase 0.20 we have seen this lead to conditions like META 
being hosted on 11 servers at once. The situation with HBase 0.90 is better, but 
it still concerns me a lot. Confusion about identity cannot be anything but bad.

Why don't we have the processes generate a random UUID for themselves upon 
startup, or similar, and have all processes on the cluster map these UUIDs to 
identities? Critically, region assignment state should hold the UUID of the 
current assignee. This would not remove the need to resolve region locations to 
network addresses, nor to determine the liveness of assignments, but it would 
prevent the specific double-assignment scenarios we have seen if hostname 
resolution is flaky.
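A minimal sketch of the idea, assuming hypothetical `ServerIdentity` and `Assignments` classes (neither is an HBase API) and leaving out liveness detection, which the proposal says would still be needed. Each process draws a random UUID at startup, and the assignment table stores the assignee's UUID, so two processes whose hostnames resolve to the same string cannot both claim a region.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

/** Hypothetical process identity; not an HBase class. */
final class ServerIdentity {
  private final UUID id;          // random, generated once per process start
  private final String hostname;  // informational only; may resolve inconsistently

  private ServerIdentity(UUID id, String hostname) {
    this.id = id;
    this.hostname = hostname;
  }

  /** Called once when the process starts. */
  static ServerIdentity generate(String hostname) {
    return new ServerIdentity(UUID.randomUUID(), hostname);
  }

  UUID id() { return id; }
  String hostname() { return hostname; }

  // Equality is by UUID alone; two processes reporting the same hostname still differ.
  @Override public boolean equals(Object o) {
    return o instanceof ServerIdentity && ((ServerIdentity) o).id.equals(id);
  }

  @Override public int hashCode() { return id.hashCode(); }
}

/** Hypothetical assignment table: region name -> UUID of the current assignee. */
final class Assignments {
  private final Map<String, UUID> assignee = new HashMap<>();

  /** Assigns the region unless a different identity already holds it. */
  synchronized boolean assign(String region, ServerIdentity server) {
    UUID current = assignee.get(region);
    if (current != null && !current.equals(server.id())) {
      return false; // refuse a second assignee, whatever its hostname resolves to
    }
    assignee.put(region, server.id());
    return true;
  }

  /** Clears the assignment only if it is held by this identity. */
  synchronized void unassign(String region, ServerIdentity server) {
    assignee.remove(region, server.id());
  }

  synchronized boolean isAssignedTo(String region, ServerIdentity server) {
    return Objects.equals(assignee.get(region), server.id());
  }
}
```

The hostname is kept only for display and for resolving a network address; it plays no part in deciding who owns a region.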


[jira] [Commented] (HBASE-3755) Catch zk's ConnectionLossException and augment error message with more help


[ 
https://issues.apache.org/jira/browse/HBASE-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019065#comment-13019065
 ] 

Gary Helmling commented on HBASE-3755:
--

My last comment really should be handled more comprehensively as part of 
HBASE-3065.

Thanks for the update J-D.

+1 on patch.

> Catch zk's ConnectionLossException and augment error message with more help
> ---
>
> Key: HBASE-3755
> URL: https://issues.apache.org/jira/browse/HBASE-3755
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.3
>
> Attachments: HBASE-3755-v2.patch, HBASE-3755.patch
>
>
> 0.90 behaves differently regarding ZK connections: it tends to create 
> too many of them, and it's not obvious to users what they should do to fix it. I 
> think I've helped at least 5 different users this week with this error.
> By catching ConnectionLossException and augmenting its message, we could say 
> something like "it's possible that the ZooKeeper server has too many 
> connections from this IP, see doc at blah" since the ZK server isn't nice 
> enough to let us know what's going on.

[jira] [Commented] (HBASE-3755) Catch zk's ConnectionLossException and augment error message with more help


[ 
https://issues.apache.org/jira/browse/HBASE-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019061#comment-13019061
 ] 

Gary Helmling commented on HBASE-3755:
--

Do you think we need an additional check for 
KeeperException.ConnectionLossException at the bottom of 
ZooKeeperWatcher?  This way we could perform the same cleanup as above.

So instead of:
{code:java}
} catch (KeeperException e) {
  throw new ZooKeeperConnectionException(
      prefix("Unexpected KeeperException creating base node"), e);
}
{code}

do something like:
{code:java}
} catch (KeeperException e) {
  if (e instanceof KeeperException.ConnectionLossException) {
    try {
      this.zooKeeper.close();
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      LOG.warn("Interrupted while closing", ie);
    }
  }
  throw new ZooKeeperConnectionException(
      prefix("Unexpected KeeperException creating base node"), e);
}
{code}



> Catch zk's ConnectionLossException and augment error message with more help
> ---
>
> Key: HBASE-3755
> URL: https://issues.apache.org/jira/browse/HBASE-3755
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.3
>
> Attachments: HBASE-3755-v2.patch, HBASE-3755.patch
>
>
> 0.90 behaves differently regarding ZK connections: it tends to create 
> too many of them, and it's not obvious to users what they should do to fix it. I 
> think I've helped at least 5 different users this week with this error.
> By catching ConnectionLossException and augmenting its message, we could say 
> something like "it's possible that the ZooKeeper server has too many 
> connections from this IP, see doc at blah" since the ZK server isn't nice 
> enough to let us know what's going on.

[jira] [Commented] (HBASE-3773) Set ZK max connections much higher in 0.90


[ 
https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019059#comment-13019059
 ] 

Jean-Daniel Cryans commented on HBASE-3773:
---

I can try it.

> Set ZK max connections much higher in 0.90
> --
>
> Key: HBASE-3773
> URL: https://issues.apache.org/jira/browse/HBASE-3773
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> I think by now we can all acknowledge that 0.90 has an issue with ZK 
> connections, in that we create too many of them and it's also too easy for 
> our users to shoot themselves in the foot.
> For 0.90.3, I think we should change the default configuration of 30 that we 
> ship with and set it much, much higher; I'm thinking of 32k.

[jira] [Commented] (HBASE-3772) Hadoop DNS.reverseDns() doesn't canonicalize host names, leading to possible discrepancy in RS hostname vs. Master seen hostname for RS


[ 
https://issues.apache.org/jira/browse/HBASE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019043#comment-13019043
 ] 

stack commented on HBASE-3772:
--

I haven't seen it.  I've only seen issues where HSA resolves differently on 
either side of the connection, either because one side finds the FQDN and the 
other only a hostname, or because the other side is failing reverse DNS and so 
just uses the IP.  HBASE-1502 is going to deprecate HSA and just pass Strings 
rather than have an HSA do a resolve on deserialization (ISA creation).

> Hadoop DNS.reverseDns() doesn't canonicalize host names, leading to possible 
> discrepancy in RS hostname vs. Master seen hostname for RS
> ---
>
> Key: HBASE-3772
> URL: https://issues.apache.org/jira/browse/HBASE-3772
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>
> I ran across this issue on a 0.20-based branch, so I'm not sure if this is 
> still an issue for 0.90+.  However, 0.90 and current trunk do still make use 
> of DNS.getDefaultHost(), so I wanted to open this for discussion.
> In 0.20, the problem was:
>  1. configure hbase-site.xml with hbase.regionserver.dns.interface=xxx
>  2. IP bound on interface xxx has reverse DNS correctly configured
>  3. DNS.getDefaultHost() calls DNS.reverseDns() for this IP, which does a 
> JNDI bind to the DNS provider, returning the *absolute* hostname: 
> host1.my.domain.
>  4. RS reports startup to master as host1.my.domain.,60020,1234...
>  5. BaseScanner when scanning .META. sees region assignments as not valid 
> because the resolved hostname from IP goes through 
> InetSocketAddress.getHostName() which returns the canonicalized form 
> (host1.my.domain != host1.my.domain. though they are equivalent)
> I know the master <-> RS negotiated hostname has completely changed for 0.90. 
>  So hopefully this is no longer an issue and we can close as invalid and go 
> have a beer.  But given the underlying problem in DNS.getDefaultHost(), I 
> wanted to confirm this.

[jira] [Commented] (HBASE-3773) Set ZK max connections much higher in 0.90


[ 
https://issues.apache.org/jira/browse/HBASE-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019036#comment-13019036
 ] 

stack commented on HBASE-3773:
--

OK.  It will be interesting to see what kinds of issues folks run into when they 
have 32k connections to an ensemble. Hopefully their connection counts will level 
off well before we hit this big number.

> Set ZK max connections much higher in 0.90
> --
>
> Key: HBASE-3773
> URL: https://issues.apache.org/jira/browse/HBASE-3773
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> I think by now we can all acknowledge that 0.90 has an issue with ZK 
> connections, in that we create too many of them and it's also too easy for 
> our users to shoot themselves in the foot.
> For 0.90.3, I think we should change the default configuration of 30 that we 
> ship with and set it much, much higher; I'm thinking of 32k.

[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2


 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-double-alternation.txt)

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.

[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2


 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: 3609-double-alternation.txt

Removed statements used for debugging.

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.

[jira] [Created] (HBASE-3774) Manage ZK connections by components and ensembles

Manage ZK connections by components and ensembles
-

 Key: HBASE-3774
 URL: https://issues.apache.org/jira/browse/HBASE-3774
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.92.0


This is the real fix for HBASE-3773: we need to stop creating so many 
connections to ZK. Currently the problem is that we don't want to mix 
connections between the region server, a client, and whatever else in a JVM 
that requires talking to ZK. The current situation is an (unintended) tradeoff 
of worse usability for easier unit testing.

Another thing to take into account is that a client or a region server needs to 
be able to talk to more than one ensemble at the same time (CopyTable and 
replication are examples).
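One way to picture both requirements is a per-JVM registry keyed by component and ensemble: callers in the same component share one session per ensemble, while a region server and a client in the same JVM still get separate sessions, and a CopyTable-style client can hold sessions to two ensembles at once. This is a minimal sketch only, assuming a hypothetical `ZkSessionRegistry` with a stand-in `Session` type in place of a real ZooKeeper handle; reference counting decides when a shared session is actually closed.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; Session stands in for a real ZooKeeper connection. */
final class ZkSessionRegistry {
  /** Stand-in for a ZooKeeper handle. */
  static final class Session {
    final String component;
    final String quorum;
    boolean closed;

    Session(String component, String quorum) {
      this.component = component;
      this.quorum = quorum;
    }

    void close() { closed = true; }
  }

  private static final class Entry {
    final Session session;
    int refCount;

    Entry(Session session) { this.session = session; }
  }

  // Key combines the component (e.g. "regionserver", "client") and the ensemble quorum string.
  private final Map<String, Entry> sessions = new HashMap<>();

  private static String key(String component, String quorum) {
    return component + "@" + quorum;
  }

  /** Returns the shared session for this component/ensemble pair, opening it on first use. */
  synchronized Session acquire(String component, String quorum) {
    Entry e = sessions.computeIfAbsent(key(component, quorum),
        k -> new Entry(new Session(component, quorum)));
    e.refCount++;
    return e.session;
  }

  /** Drops one reference; the session is closed when its last user releases it. */
  synchronized void release(Session s) {
    String k = key(s.component, s.quorum);
    Entry e = sessions.get(k);
    if (e == null || e.session != s) {
      return; // unknown or already-closed session: nothing to do
    }
    if (--e.refCount == 0) {
      sessions.remove(k);
      s.close();
    }
  }

  synchronized int openSessions() { return sessions.size(); }
}
```

Keeping the component in the key preserves the separation that makes unit testing easy, while the sharing within a component is what cuts the connection count.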

[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2


 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-alternate.txt)

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.

[jira] [Created] (HBASE-3773) Set ZK max connections much higher in 0.90

Set ZK max connections much higher in 0.90
--

 Key: HBASE-3773
 URL: https://issues.apache.org/jira/browse/HBASE-3773
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3


I think by now we can all acknowledge that 0.90 has an issue with ZK 
connections, in that we create too many of them and it's also too easy for our 
users to shoot themselves in the foot.

For 0.90.3, I think we should change the default configuration of 30 that we 
ship with and set it much, much higher; I'm thinking of 32k.

[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2


 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: 3609-double-alternation.txt

Modified the unit test so that the generated regions have unique regionIds.
This is a first step toward better verification of balancing results.
Also fixed a bug based on Stanislav Barton's feedback.


> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-alternate.txt, 3609-double-alternation.txt, 
> 3609-empty-RS.txt, hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.

[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2


 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-double-alternation.txt)

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-alternate.txt, 3609-empty-RS.txt, 
> hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.

[jira] [Commented] (HBASE-3772) Hadoop DNS.reverseDns() doesn't canonicalize host names, leading to possible discrepancy in RS hostname vs. Master seen hostname for RS


[ 
https://issues.apache.org/jira/browse/HBASE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019010#comment-13019010
 ] 

Gary Helmling commented on HBASE-3772:
--

Looking further at HMaster and HRegionServer, both launder the returned 
hostname through HServerAddress, which uses InetSocketAddress to resolve the 
hostname.  So I believe we will *not* see this issue in 0.90/trunk.  Anyone 
want to confirm?

> Hadoop DNS.reverseDns() doesn't canonicalize host names, leading to possible 
> discrepancy in RS hostname vs. Master seen hostname for RS
> ---
>
> Key: HBASE-3772
> URL: https://issues.apache.org/jira/browse/HBASE-3772
> Project: HBase
>  Issue Type: Bug
>Reporter: Gary Helmling
>
> I ran across this issue on a 0.20-based branch, so I'm not sure if this is 
> still an issue for 0.90+.  However, 0.90 and current trunk do still make use 
> of DNS.getDefaultHost(), so I wanted to open this for discussion.
> In 0.20, the problem was:
>  1. configure hbase-site.xml with hbase.regionserver.dns.interface=xxx
>  2. IP bound on interface xxx has reverse DNS correctly configured
>  3. DNS.getDefaultHost() calls DNS.reverseDns() for this IP, which does a 
> JNDI bind to the DNS provider, returning the *absolute* hostname: 
> host1.my.domain.
>  4. RS reports startup to master as host1.my.domain.,60020,1234...
>  5. BaseScanner when scanning .META. sees region assignments as not valid 
> because the resolved hostname from IP goes through 
> InetSocketAddress.getHostName() which returns the canonicalized form 
> (host1.my.domain != host1.my.domain. though they are equivalent)
> I know the master <-> RS negotiated hostname has completely changed for 0.90. 
>  So hopefully this is no longer an issue and we can close as invalid and go 
> have a beer.  But given the underlying problem in DNS.getDefaultHost(), I 
> wanted to confirm this.

[jira] [Created] (HBASE-3772) Hadoop DNS.reverseDns() doesn't canonicalize host names, leading to possible discrepancy in RS hostname vs. Master seen hostname for RS

Hadoop DNS.reverseDns() doesn't canonicalize host names, leading to possible 
discrepancy in RS hostname vs. Master seen hostname for RS
---

 Key: HBASE-3772
 URL: https://issues.apache.org/jira/browse/HBASE-3772
 Project: HBase
  Issue Type: Bug
Reporter: Gary Helmling


I ran across this issue on a 0.20-based branch, so I'm not sure if this is 
still an issue for 0.90+.  However, 0.90 and current trunk do still make use of 
DNS.getDefaultHost(), so I wanted to open this for discussion.

In 0.20, the problem was:

 1. configure hbase-site.xml with hbase.regionserver.dns.interface=xxx
 2. IP bound on interface xxx has reverse DNS correctly configured
 3. DNS.getDefaultHost() calls DNS.reverseDns() for this IP, which does a JNDI 
bind to the DNS provider, returning the *absolute* hostname: host1.my.domain.
 4. RS reports startup to master as host1.my.domain.,60020,1234...
 5. BaseScanner when scanning .META. sees region assignments as not valid 
because the resolved hostname from IP goes through 
InetSocketAddress.getHostName() which returns the canonicalized form 
(host1.my.domain != host1.my.domain. though they are equivalent)

I know the master <-> RS negotiated hostname has completely changed for 0.90.  
So hopefully this is no longer an issue and we can close as invalid and go have 
a beer.  But given the underlying problem in DNS.getDefaultHost(), I wanted to 
confirm this.
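The mismatch in step 5 comes down to comparing an absolute DNS name (ending in the root dot) with the same name without it. A minimal sketch of a hypothetical normalization helper follows; it is not what HBase does (Gary's follow-up comment on this issue notes that 0.90 launders hostnames through HServerAddress instead). It strips one trailing dot and lower-cases, since DNS names are case-insensitive. It does not reconcile a short hostname with its FQDN, which is the other mismatch stack describes.

```java
import java.util.Locale;

/** Hypothetical helper for comparing hostnames reported by different resolvers. */
final class HostnameNormalizer {
  private HostnameNormalizer() {}

  /** Drops a single trailing root dot ("host1.my.domain." -> "host1.my.domain") and lower-cases. */
  static String normalize(String host) {
    String h = host.trim();
    if (h.length() > 1 && h.endsWith(".")) {
      h = h.substring(0, h.length() - 1);
    }
    return h.toLowerCase(Locale.ROOT);
  }

  /** True if both names spell the same fully qualified host. */
  static boolean sameHost(String a, String b) {
    return normalize(a).equals(normalize(b));
  }
}
```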

[jira] [Updated] (HBASE-3771) All jsp pages don't clean their HBA


 [ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3771:
--

Summary: All jsp pages don't clean their HBA  (was: All jsp oages don't 
clean their HBA)

> All jsp pages don't clean their HBA
> ---
>
> Key: HBASE-3771
> URL: https://issues.apache.org/jira/browse/HBASE-3771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> Noticed by Dave Latham: refreshing the zk web page will eventually make that 
> machine run out of connections to ZK. It's because we don't close the 
> connection created inside HBA.

[jira] [Updated] (HBASE-3771) All jsp oages don't clean their HBA


 [ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-3771:
--

Summary: All jsp oages don't clean their HBA  (was: zk.jsp doesn't clean 
it's HBA)

> All jsp oages don't clean their HBA
> ---
>
> Key: HBASE-3771
> URL: https://issues.apache.org/jira/browse/HBASE-3771
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
> Fix For: 0.90.3
>
>
> Noticed by Dave Latham: refreshing the zk web page will eventually make that 
> machine run out of connections to ZK. It's because we don't close the 
> connection created inside HBA.

[jira] [Created] (HBASE-3771) zk.jsp doesn't clean it's HBA

zk.jsp doesn't clean it's HBA
-

 Key: HBASE-3771
 URL: https://issues.apache.org/jira/browse/HBASE-3771
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3


Noticed by Dave Latham: refreshing the zk web page will eventually make that 
machine run out of connections to ZK. It's because we don't close the 
connection created inside HBA.
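The leak pattern is that every page render opens a connection and nothing ever closes it. A minimal sketch of the fix shape follows, using a hypothetical `FakeAdmin` stand-in that counts live connections. It is not HBaseAdmin, and the actual 0.90 fix may release the connection differently. The point is that the connection is released on every render, here via try-with-resources, so repeated refreshes leave the count unchanged.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an admin object that owns a ZK connection. */
final class FakeAdmin implements AutoCloseable {
  static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

  FakeAdmin() { OPEN_CONNECTIONS.incrementAndGet(); } // connection opened on construction

  String clusterStatus() { return "ok"; }

  @Override public void close() { OPEN_CONNECTIONS.decrementAndGet(); } // connection released
}

/** Hypothetical page renderer: one call per page refresh. */
final class StatusPage {
  /** The admin is always closed, even if rendering throws. */
  static String render() {
    try (FakeAdmin admin = new FakeAdmin()) {
      return "<p>" + admin.clusterStatus() + "</p>";
    }
  }
}
```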

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018982#comment-13018982
 ] 

Ted Yu commented on HBASE-3767:
---

We can set the core pool size to the number of available processors and set the 
max pool size to a (large) multiple of the number of available processors.
ThreadPoolExecutor is able to dynamically shrink its thread count when appropriate.
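Ted's suggestion can be sketched with the standard `java.util.concurrent.ThreadPoolExecutor`. The multiple passed in and the 60-second keep-alive are assumptions, not values from this thread. One detail matters for the "grow to max" part: a ThreadPoolExecutor only creates threads beyond its core size when its queue refuses a task, so an unbounded LinkedBlockingQueue would pin the pool at the core size. A SynchronousQueue hands each task straight to a thread, which lets the pool grow to the max and shrink back after the keep-alive.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Sketch of processor-based pool sizing; the multiple and keep-alive are assumptions. */
final class PoolSizing {
  private PoolSizing() {}

  static ThreadPoolExecutor newPool(int maxMultiple) {
    int cpus = Runtime.getRuntime().availableProcessors();
    return new ThreadPoolExecutor(
        cpus,                                        // core: one thread per available processor
        cpus * maxMultiple,                          // max: a (large) multiple of the processor count
        60, TimeUnit.SECONDS,                        // idle threads above core exit after this keep-alive
        new SynchronousQueue<Runnable>(),            // direct hand-off, so the pool can grow past core
        new ThreadPoolExecutor.CallerRunsPolicy());  // once at max, run the task in the caller
  }
}
```

J-D's alternative of starting with a core size of 1 is the same constructor with `1` as the first argument; the CallerRunsPolicy is an added choice here, since with a SynchronousQueue a saturated pool would otherwise reject tasks.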

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think we should instead do it once per JVM and then reuse that number for all 
> the others.

[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-04-12 Thread Prakash Khemani (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018973#comment-13018973
 ] 

Prakash Khemani commented on HBASE-1364:


updated patch at https://review.cloudera.org/r/1655/

> [performance] Distributed splitting of regionserver commit logs
> ---
>
> Key: HBASE-1364
> URL: https://issues.apache.org/jira/browse/HBASE-1364
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: stack
>Assignee: Prakash Khemani
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-1364.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> HBASE-1008 has some improvements to our log splitting on regionserver crash; 
> but it needs to run even faster.
> (Below is from HBASE-1008)
> In the bigtable paper, the split is distributed. If we're going to have 1000 
> logs, we need to distribute or at least multithread the splitting.
> 1. As is, regions starting up expect to find only one reconstruction log. 
> We need to make it so they pick up a bunch of edit logs, and it should be fine 
> that logs are elsewhere in hdfs in an output directory written by all split 
> participants, whether multithreaded or a mapreduce-like distributed process 
> (let's write our distributed sort first as an MR so we learn what's involved; 
> distributed sort, as much as possible, should use MR framework pieces). On 
> startup, regions go to this directory and pick up the files written by split 
> participants, deleting and clearing the dir when all have been read in. Making 
> it so we can take multiple logs for input can also make the split process more 
> robust than the current tenuous process, which loses all edits if it 
> doesn't make it to the end without error.
> 2. Each column family rereads the reconstruction log to find its edits. We need 
> to fix that. The split can sort the edits by column family so each store only 
> reads its edits.
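Point 2 can be illustrated with a small grouping step: if the splitter buckets edits by region and column family, each store reads only its own bucket instead of rereading the whole reconstruction log. This is a minimal sketch, assuming a hypothetical `Edit` type with region, family, and sequence id fields (not HBase's HLog classes). Edits within each bucket are ordered by sequence id so they can be replayed in order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of splitting log edits per region and column family. */
final class LogSplitSketch {
  /** Stand-in for one logged edit. */
  static final class Edit {
    final String region;
    final String family;
    final long seqId;

    Edit(String region, String family, long seqId) {
      this.region = region;
      this.family = family;
      this.seqId = seqId;
    }
  }

  /** Buckets edits by "region/family", each bucket sorted by sequence id. */
  static Map<String, List<Edit>> split(List<Edit> log) {
    Map<String, List<Edit>> buckets = new TreeMap<>();
    for (Edit e : log) {
      buckets.computeIfAbsent(e.region + "/" + e.family, k -> new ArrayList<>()).add(e);
    }
    for (List<Edit> bucket : buckets.values()) {
      bucket.sort(Comparator.comparingLong(e -> e.seqId));
    }
    return buckets;
  }
}
```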

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018969#comment-13018969
 ] 

Jean-Daniel Cryans commented on HBASE-3767:
---

bq. And if the number of region servers changes, are there repercussions?

Currently, once the HTable is created, its ThreadPoolExecutor stays the same 
size regardless of changes in the number of region servers. Caching the count has 
the same behavior. Where it differs is when an HTable is created after the 
number of region servers changes, but running with fewer threads than the total 
number of region servers is only less efficient in bulk load situations where 
you need to insert into all of them at the same time (which I believe isn't 
frequent when uploading; usually you create the HTables up front). That's the 
only repercussion I see, and it's still less bad than the following:

bq. Thats better than doing getCurrentNrHRS. Maybe 2* number of processors

So the reason we use the number of RS is to be able to insert into all the 
region servers at the same time in a bulk upload case. Using the number of CPUs 
by itself isn't particularly useful, since uploading isn't CPU intensive on the 
client (it's just threads waiting on region servers), and the fact that you 
usually have many HTables per JVM kinda defeats the purpose of limiting the 
number of executors.

I personally like the fact that we try to learn how many RS there are in order 
to tune the TPE; it's just that calling it every time is rather expensive 
and mostly useless. I still believe we should just cache it.
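A minimal sketch of caching the count once per JVM, assuming the ZK lookup (for example a call like getCurrentNrHRS) is passed in as a hypothetical `IntSupplier`; the real HTable wiring is not shown in this thread. The holder queries on first use and returns the cached number afterwards. This mirrors the trade-off described above: tables created after the cluster grows or shrinks still see the old count.

```java
import java.util.function.IntSupplier;

/** Hypothetical per-JVM cache for the region server count. */
final class RegionServerCount {
  private static volatile Integer cached; // null until the first lookup

  private RegionServerCount() {}

  /** Returns the cached count, invoking the lookup only on the first call in this JVM. */
  static int get(IntSupplier lookup) {
    Integer value = cached;
    if (value == null) {
      synchronized (RegionServerCount.class) {
        value = cached;
        if (value == null) {
          value = lookup.getAsInt(); // e.g. one ZK query per JVM
          cached = value;
        }
      }
    }
    return value;
  }
}
```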

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of 
> region servers in the cluster. That is done for every single one of them; I 
> think we should instead do it once per JVM and then reuse that number for all 
> the others.

[jira] [Updated] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience


 [ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3770:
--

Fix Version/s: 0.90.3
Affects Version/s: 0.90.3
   Status: Patch Available  (was: Open)

> Make FilterList accept var arg Filters in its constructor as a convenience
> --
>
> Key: HBASE-3770
> URL: https://issues.apache.org/jira/browse/HBASE-3770
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Priority: Minor
> Fix For: 0.90.3
>
> Attachments: HBASE-3770.patch
>
>
> When using a small number of Filters for a FilterList, it's cleaner to use 
> var args rather than forcing a list on the client. Compare:
> scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, new 
> FirstKeyOnlyFilter(), new KeyOnlyFilter()));
> vs:
> List<Filter> filters = new ArrayList<Filter>(2);
> filters.add(new FirstKeyOnlyFilter());
> filters.add(new KeyOnlyFilter());
> scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));

[jira] [Updated] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience


 [ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3770:
--

Attachment: HBASE-3770.patch

> Make FilterList accept var arg Filters in its constructor as a convenience
> --
>
> Key: HBASE-3770
> URL: https://issues.apache.org/jira/browse/HBASE-3770
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Priority: Minor
> Fix For: 0.90.3
>
> Attachments: HBASE-3770.patch
>
>
> When using a small number of Filters for a FilterList, it's cleaner to use 
> var args rather than forcing a list on the client. Compare:
> scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, new 
> FirstKeyOnlyFilter(), new KeyOnlyFilter()));
> vs:
> List<Filter> filters = new ArrayList<Filter>(2);
> filters.add(new FirstKeyOnlyFilter());
> filters.add(new KeyOnlyFilter());
> scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));

[jira] [Created] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience

Make FilterList accept var arg Filters in its constructor as a convenience
--

 Key: HBASE-3770
 URL: https://issues.apache.org/jira/browse/HBASE-3770
 Project: HBase
  Issue Type: Improvement
Reporter: Erik Onnen
Priority: Minor


When using a small number of Filters for a FilterList, it's cleaner to use var 
args rather than forcing a list on the client. Compare:

scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, new 
FirstKeyOnlyFilter(), new KeyOnlyFilter()));

vs:

List<Filter> filters = new ArrayList<Filter>(2);
filters.add(new FirstKeyOnlyFilter());
filters.add(new KeyOnlyFilter());
scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));
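The proposed convenience amounts to a varargs constructor that delegates to the existing `List`-based one. The sketch below uses stand-in types (`SketchFilter`, `SketchFilterList`, `FirstKeyOnly`, `KeyOnly`) rather than the real HBase classes. It only illustrates the overloading pattern, not the patch attached to this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in types for illustration only; these are not the HBase classes.
interface SketchFilter {}

final class FirstKeyOnly implements SketchFilter {}

final class KeyOnly implements SketchFilter {}

final class SketchFilterList implements SketchFilter {
  enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

  private final Operator operator;
  private final List<SketchFilter> filters;

  // Existing style: the caller builds the List.
  SketchFilterList(Operator operator, List<SketchFilter> filters) {
    this.operator = operator;
    this.filters = new ArrayList<SketchFilter>(filters);
  }

  // Proposed convenience: varargs simply delegates to the List constructor.
  SketchFilterList(Operator operator, SketchFilter... filters) {
    this(operator, Arrays.asList(filters));
  }

  Operator getOperator() {
    return operator;
  }

  List<SketchFilter> getFilters() {
    return Collections.unmodifiableList(filters);
  }
}
```

Because a `List` is not itself a filter, the two overloads never compete for the same call. Existing callers that pass a list are therefore unaffected.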

[jira] [Updated] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name


 [ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3769:
--

Priority: Trivial  (was: Major)

Changing priority

> TableMapReduceUtil is inconsistent with other table-related classes that 
> accept byte[] as a table name
> --
>
> Key: HBASE-3769
> URL: https://issues.apache.org/jira/browse/HBASE-3769
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Priority: Trivial
> Fix For: 0.90.3
>
> Attachments: HBASE-3769.patch
>
>
> Minor gripe, but we define our entire schema as a set of byte[] constants for
> tables and CFs. This works well with HTable and HTablePool, but
> TableMapReduceUtil requires conversion to a String, whereas most table-related
> classes do not.

[jira] [Updated] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name


 [ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3769:
--

Fix Version/s: 0.90.3
Affects Version/s: 0.90.3
   Status: Patch Available  (was: Open)

> TableMapReduceUtil is inconsistent with other table-related classes that 
> accept byte[] as a table name
> --
>
> Key: HBASE-3769
> URL: https://issues.apache.org/jira/browse/HBASE-3769
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
> Fix For: 0.90.3
>
> Attachments: HBASE-3769.patch
>
>
> Minor gripe, but we define our entire schema as a set of byte[] constants for
> tables and CFs. This works well with HTable and HTablePool, but
> TableMapReduceUtil requires conversion to a String, whereas most table-related
> classes do not.

[jira] [Created] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name

TableMapReduceUtil is inconsistent with other table-related classes that accept 
byte[] as a table name
--

 Key: HBASE-3769
 URL: https://issues.apache.org/jira/browse/HBASE-3769
 Project: HBase
  Issue Type: Improvement
Reporter: Erik Onnen
 Attachments: HBASE-3769.patch

Minor gripe, but we define our entire schema as a set of byte[] constants for
tables and CFs. This works well with HTable and HTablePool, but
TableMapReduceUtil requires conversion to a String, whereas most table-related classes
do not.
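One way to remove the String conversion is a `byte[]` overload that decodes the name and delegates to the existing `String` method. The class below is a hypothetical stand-in and does not reproduce `TableMapReduceUtil`'s actual signatures. It assumes the byte[] constants were encoded as UTF-8, which is what HBase's `Bytes.toBytes(String)` produces.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a job-setup helper; not TableMapReduceUtil's real API.
final class TableJobSetupSketch {
  private String inputTable;

  // Existing style: String table name.
  void setInputTable(String table) {
    if (table == null || table.isEmpty()) {
      throw new IllegalArgumentException("table name must be non-empty");
    }
    this.inputTable = table;
  }

  // Proposed convenience: accept the byte[] constants already used with HTable/HTablePool.
  // Assumption: the constants were encoded as UTF-8.
  void setInputTable(byte[] table) {
    if (table == null) {
      throw new IllegalArgumentException("table name must be non-null");
    }
    setInputTable(new String(table, StandardCharsets.UTF_8));
  }

  String getInputTable() {
    return inputTable;
  }
}
```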

[jira] [Updated] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name


 [ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3769:
--

Attachment: HBASE-3769.patch

> TableMapReduceUtil is inconsistent with other table-related classes that 
> accept byte[] as a table name
> --
>
> Key: HBASE-3769
> URL: https://issues.apache.org/jira/browse/HBASE-3769
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
> Fix For: 0.90.3
>
> Attachments: HBASE-3769.patch
>
>
> Minor gripe, but we define our entire schema as a set of byte[] constants for
> tables and CFs. This works well with HTable and HTablePool, but
> TableMapReduceUtil requires conversion to a String, whereas most table-related
> classes do not.

[jira] [Updated] (HBASE-3768) Add best practice to book for loading row key only


 [ 
https://issues.apache.org/jira/browse/HBASE-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3768:
--

Attachment: HBASE-3768.patch

> Add best practice to book for loading row key only
> --
>
> Key: HBASE-3768
> URL: https://issues.apache.org/jira/browse/HBASE-3768
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Priority: Minor
> Attachments: HBASE-3768.patch
>
>
> Book and wiki FAQs are missing guidance on the recommended practice for 
> loading row keys only during a scan.
> Patch attached based on jdcryans' feedback from IRC.

[jira] [Created] (HBASE-3768) Add best practice to book for loading row key only

Add best practice to book for loading row key only
--

 Key: HBASE-3768
 URL: https://issues.apache.org/jira/browse/HBASE-3768
 Project: HBase
  Issue Type: Improvement
Reporter: Erik Onnen
Priority: Minor


Book and wiki FAQs are missing guidance on the recommended practice for loading 
row keys only during a scan.

Patch attached based on jdcryans' feedback from IRC.

[jira] [Updated] (HBASE-3768) Add best practice to book for loading row key only


 [ 
https://issues.apache.org/jira/browse/HBASE-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Onnen updated HBASE-3768:
--

Affects Version/s: 0.90.3
   Status: Patch Available  (was: Open)

> Add best practice to book for loading row key only
> --
>
> Key: HBASE-3768
> URL: https://issues.apache.org/jira/browse/HBASE-3768
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Erik Onnen
>Priority: Minor
>
> Book and wiki FAQs are missing guidance on the recommended practice for 
> loading row keys only during a scan.
> Patch attached based on jdcryans' feedback from IRC.

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

2011-04-12 Thread Chris Tarnas (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018919#comment-13018919
 ] 

Chris Tarnas commented on HBASE-3680:
-

I have been seeing similar problems with MSLAB enabled. Under heavy
read/write workloads we've seen regionservers OOME, as well as what appear to be
long GC pauses. When I get a chance I'll do more testing with GC logging
re-enabled and MSLAB enabled.

> Publish more metrics about mslab
> 
>
> Key: HBASE-3680
> URL: https://issues.apache.org/jira/browse/HBASE-3680
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Todd Lipcon
> Fix For: 0.92.0
>
>
> We have been using mslab on all our clusters for a while now and it seems it 
> tends to OOME or send us into GC loops of death a lot more than it used to. 
> For example, one RS with mslab enabled and 7GB of heap died of an OOME this
> afternoon; it had 0.55GB in the block cache and 2.03GB in the memstores, which
> doesn't account for much... but it could be that because of mslab a lot of 
> space was lost in those incomplete 2MB blocks and without metrics we can't 
> really tell. Compactions were running at the time of the OOME and I see block 
> cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even 
> take actions based on that (like force flushing).
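To make the concern about incomplete 2MB chunks concrete, here is a toy bump-allocation model of the kind of metric the issue asks for. It compares bytes reserved in chunks against bytes that actually hold cells. It is not MemStoreLAB's implementation. The class `SlabWasteModel`, the rule that a cell which does not fit retires the current chunk, and the rejection of oversized cells are all assumptions for illustration.

```java
// Hypothetical model, not MemStoreLAB itself: cells are bump-allocated into
// fixed-size chunks, and the tail left over when the next cell does not fit
// is counted as wasted.
final class SlabWasteModel {
  private final int chunkSize;
  private int offsetInChunk; // bytes used in the current chunk
  private long chunks;       // chunks handed out so far
  private long wastedBytes;  // tail space abandoned in retired chunks
  private long liveBytes;    // bytes actually occupied by cells

  SlabWasteModel(int chunkSize) {
    this.chunkSize = chunkSize;
  }

  void allocate(int cellSize) {
    if (cellSize > chunkSize) {
      // Assumption for this sketch: oversized cells are not placed in the slab at all.
      throw new IllegalArgumentException("cell larger than chunk");
    }
    if (chunks == 0 || offsetInChunk + cellSize > chunkSize) {
      if (chunks > 0) {
        wastedBytes += chunkSize - offsetInChunk; // retire the current chunk
      }
      chunks++;
      offsetInChunk = 0;
    }
    offsetInChunk += cellSize;
    liveBytes += cellSize;
  }

  long reservedBytes() { return chunks * chunkSize; }

  long liveBytes() { return liveBytes; }

  long wastedBytes() { return wastedBytes; }

  // Free tail of the chunk still being filled: not wasted yet, but not holding data either.
  long openTailBytes() { return chunks == 0 ? 0 : chunkSize - offsetInChunk; }
}
```

Publishing `reservedBytes()` alongside `liveBytes()` would expose the gap the issue suspects. In this model, the gap is always `wastedBytes() + openTailBytes()`.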

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018872#comment-13018872
 ] 

Ted Yu commented on HBASE-3767:
---

The following call would end up in native code and give us the answer:
{code}
Runtime.getRuntime().availableProcessors()
{code}
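With this call, a default for `hbase.htable.threads.max` could be derived without any ZooKeeper round trip. The sketch also applies the "2 * number of processors" idea raised in this thread. It is a hypothetical helper: `HTableThreadDefaults` is not an HBase class, and the multiplier of 2 is only the value floated here, not a tested recommendation.

```java
// Hypothetical helper, not HBase code: picks a default for
// hbase.htable.threads.max when the user has not set one.
final class HTableThreadDefaults {
  // Multiplier taken from the "2 * number of processors" suggestion in this thread.
  static final int PROCESSOR_MULTIPLIER = 2;

  private HTableThreadDefaults() {}

  // `configured` is the raw config value, or null when the key is unset.
  static int maxThreads(String configured, int availableProcessors) {
    if (configured != null && !configured.trim().isEmpty()) {
      return Integer.parseInt(configured.trim());
    }
    return Math.max(1, PROCESSOR_MULTIPLIER * availableProcessors);
  }

  static int maxThreads(String configured) {
    // Plain JDK call; the caller does not need to shell out to inspect the machine.
    return maxThreads(configured, Runtime.getRuntime().availableProcessors());
  }
}
```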

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of
> region servers in the cluster. That is done for every single one of them; I
> think we should instead do it once per JVM and then reuse that number for all
> the others.

[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2


 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: 3609-double-alternation.txt

This patch combines 3609-empty-RS.txt with my earlier enhancement.
Basically, I find the new regions and spread them across different underloaded servers.
Previously, one underloaded server would be filled up before the next
underloaded server was considered.
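The difference between the two strategies can be sketched as round-robin placement over servers that still have room. This is a hypothetical illustration, not the attached patch or the HBase balancer. `RoundRobinPlacement` is invented for the example, as is the per-server `room` count (how many more regions a server can take before reaching the target load).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the HBase LoadBalancer: spread new regions across
// underloaded servers round-robin instead of filling one server before the next.
final class RoundRobinPlacement {
  private RoundRobinPlacement() {}

  // `room` maps server name -> how many more regions it can take (assumed input).
  static Map<String, List<String>> assign(List<String> regions, Map<String, Integer> room) {
    Map<String, List<String>> plan = new LinkedHashMap<String, List<String>>();
    Map<String, Integer> left = new LinkedHashMap<String, Integer>(room);
    List<String> servers = new ArrayList<String>(room.keySet());
    int next = 0;
    for (String region : regions) {
      String chosen = null;
      // Try each server at most once per region, starting after the last one used.
      for (int tries = 0; tries < servers.size(); tries++) {
        String candidate = servers.get((next + tries) % servers.size());
        if (left.get(candidate) > 0) {
          chosen = candidate;
          next = (next + tries + 1) % servers.size();
          break;
        }
      }
      if (chosen == null) {
        break; // every underloaded server is at its target; leave the rest unassigned
      }
      left.put(chosen, left.get(chosen) - 1);
      List<String> assigned = plan.get(chosen);
      if (assigned == null) {
        assigned = new ArrayList<String>();
        plan.put(chosen, assigned);
      }
      assigned.add(region);
    }
    return plan;
  }
}
```

Consider room {rs1: 2, rs2: 2, rs3: 1} with four new regions. Fill-first would put r1 and r2 on rs1 and r3 and r4 on rs2. This version yields rs1: [r1, r4], rs2: [r2], rs3: [r3].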

> Improve the selection of regions to balance; part 2
> ---
>
> Key: HBASE-3609
> URL: https://issues.apache.org/jira/browse/HBASE-3609
> Project: HBase
>  Issue Type: Improvement
>Reporter: stack
>Assignee: Ted Yu
> Attachments: 3609-alternate.txt, 3609-double-alternation.txt, 
> 3609-empty-RS.txt, hbase-3609-by-region-age.txt, hbase-3609.txt
>
>
> See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
> of algorithms that improve on current random assignment.

[jira] [Commented] (HBASE-3765) metrics.xml - small format change and adding nav to hbase book metrics section

2011-04-12 Thread Doug Meil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018867#comment-13018867
 ] 

Doug Meil commented on HBASE-3765:
--

It was like that when I got it, honest!   :-)

But seriously, all those other section definitions (i.e.,  ) 
were already like that before my change.  The one that wasn't was the "HOWTO", 
but that one also didn't look right in the generated page.

> metrics.xml - small format change and adding nav to hbase book metrics section
> --
>
> Key: HBASE-3765
> URL: https://issues.apache.org/jira/browse/HBASE-3765
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: metrics_HBASE-3765.xml.patch
>
>
> (in src\site\xdoc)
> There was a section header near the top of the page that wasn't formatted in bold,
> which I changed.
> Adding a small section at the bottom that refers to the HBase book metrics section for
> more info.

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable


[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018863#comment-13018863
 ] 

stack commented on HBASE-3767:
--

bq. Can we default the value for hbase.htable.threads.max using a multiple of 
the available processors ?

That's better than doing getCurrentNrHRS. Maybe 2 * the number of processors. We'd
have to make a call outside of Java to figure out system characteristics?

> Cache the number of RS in HTable
> 
>
> Key: HBASE-3767
> URL: https://issues.apache.org/jira/browse/HBASE-3767
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.2
>Reporter: Jean-Daniel Cryans
> Fix For: 0.90.3
>
>
> When creating a new HTable we have to query ZK to learn about the number of
> region servers in the cluster. That is done for every single one of them; I
> think we should instead do it once per JVM and then reuse that number for all
> the others.

[jira] [Commented] (HBASE-3765) metrics.xml - small format change and adding nav to hbase book metrics section


[ 
https://issues.apache.org/jira/browse/HBASE-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018860#comment-13018860
 ] 

stack commented on HBASE-3765:
--

I don't see 'name' as an attribute of section -- see 
http://docbook.org/tdg51/en/html/section.html. It's not among the common or linkable
attributes either. Do you know where you got it? Maybe it was in the doc before
you showed up, a mistake I made way back?

> metrics.xml - small format change and adding nav to hbase book metrics section
> --
>
> Key: HBASE-3765
> URL: https://issues.apache.org/jira/browse/HBASE-3765
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: metrics_HBASE-3765.xml.patch
>
>
> (in src\site\xdoc)
> There was a section header near the top of the page that wasn't formatted in bold,
> which I changed.
> Adding a small section at the bottom that refers to the HBase book metrics section for
> more info.

[jira] [Commented] (HBASE-3765) metrics.xml - small format change and adding nav to hbase book metrics section

2011-04-12 Thread Doug Meil (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018836#comment-13018836
 ] 

Doug Meil commented on HBASE-3765:
--

That was intentional.  The current version of the page looks a little goofy 
because that first section title isn't bold, but the rest of the section 
headers show up bold.

http://hbase.apache.org/metrics.html

... and the rest of the sections are defined like this...




... so I made that section definition look like the rest.  After generating 
this change locally it looked OK (i.e., that first header was bold like the 
rest).


> metrics.xml - small format change and adding nav to hbase book metrics section
> --
>
> Key: HBASE-3765
> URL: https://issues.apache.org/jira/browse/HBASE-3765
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: metrics_HBASE-3765.xml.patch
>
>
> (in src\site\xdoc)
> There was a section header near the top of the page that wasn't formatted in bold,
> which I changed.
> Adding a small section at the bottom that refers to the HBase book metrics section for
> more info.

[jira] [Commented] (HBASE-3529) Add search to HBase

2011-04-12 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018835#comment-13018835
 ] 

Jason Rutherglen commented on HBASE-3529:
-

I'm working on profiling and optimizing HDFS random access, so that
Lucene queries over HDFS perform as well as native file system access using
NIOFSDirectory.

I think one very direct approach is to set the max block size to something
above the size of every Lucene segment file (at runtime via the DFSClient.create method).
This will guarantee that there is only one underlying java.io.File per HDFS 
file, and so random access will avoid navigating block structures (which 
require expensive network calls, a binary search, and object creation overhead).
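The block-size idea reduces to simple arithmetic: pick a size at least as large as the biggest segment file, so every file fits in a single block. The helper below is hypothetical. It also assumes the block size must be a multiple of the checksum chunk size (`io.bytes.per.checksum`, commonly 512 bytes), which should be checked against the HDFS version in use.

```java
// Hypothetical helper: choose an HDFS block size no smaller than any Lucene
// segment file, so that each file occupies exactly one block.
final class SingleBlockSizing {
  private SingleBlockSizing() {}

  // Assumption: the block size must be a multiple of bytesPerChecksum.
  static long blockSizeFor(long[] segmentFileSizes, long bytesPerChecksum) {
    long largest = 0;
    for (long size : segmentFileSizes) {
      largest = Math.max(largest, size);
    }
    long atLeast = Math.max(largest, 1);
    // Round up to the next multiple of bytesPerChecksum.
    return ((atLeast + bytesPerChecksum - 1) / bytesPerChecksum) * bytesPerChecksum;
  }

  // Number of blocks a file of fileSize bytes occupies at the given block size.
  static long blocksNeeded(long fileSize, long blockSize) {
    return fileSize == 0 ? 0 : (fileSize + blockSize - 1) / blockSize;
  }
}
```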

> Add search to HBase
> ---
>
> Key: HBASE-3529
> URL: https://issues.apache.org/jira/browse/HBASE-3529
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.0
>Reporter: Jason Rutherglen
> Attachments: HBASE-3529.patch, 
> lucene-analyzers-common-4.0-SNAPSHOT.jar, lucene-core-4.0-SNAPSHOT.jar, 
> lucene-misc-4.0-SNAPSHOT.jar
>
>
> Using the Apache Lucene library we can add freetext search to HBase.  The 
> advantages of this are:
> * HBase is highly scalable and distributed
> * HBase is realtime
> * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
> * Lucene offers many types of queries not currently available in HBase (e.g.,
> AND, OR, NOT, phrase, etc.)
> * It's easier to build scalable realtime systems on top of an already
> architecturally sound, scalable realtime data system, e.g., HBase.
> * Scaling realtime search will be as simple as scaling HBase.
> Phase 1 - Indexing:
> * Integrate Lucene into HBase such that an index mirrors a given region.  
> This means cascading add, update, and deletes between a Lucene index and an 
> HBase region (and vice versa).
> * Define meta-data to mark a region as indexed, and use a Solr schema to 
> allow the user to define the fields and analyzers.
> * Integrate with the HLog to ensure that index recovery can occur properly
> (e.g., on region server failure)
> * Mirror region splits with indexes (use Lucene's IndexSplitter?)
> * When a region is written to HDFS, also write the corresponding Lucene index 
> to HDFS.
> * A row key will be the ID of a given Lucene document. The Lucene docstore
> will explicitly not be used because the document/row data is stored in HBase.
> We will need to determine the best data structure for efficiently mapping a
> docid -> row key. It could be a docstore, field cache, column stride
> fields, or some other mechanism.
> * Write unit tests for the above
> Phase 2 - Queries:
> * Enable distributed Lucene queries
> * Regions that have Lucene indexes are inherently available and may be 
> searched on, meaning there's no need for a separate search related system in 
> Zookeeper.
> * Integrate search with HBase's RPC mechanism

[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

2011-04-12 Thread gaojinchao (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018791#comment-13018791
 ] 

gaojinchao commented on HBASE-3722:
---

In my cluster:
1. The HDFS cluster uses an HA namenode pair (ANN and BNN).
2. HBase version 0.90.1:
  Active HMaster: C4C1
  Backup HMaster: C4C2
  Region servers: C4C3, C4C4, C4C5, ...

Operations:
1. The ANN crashed and the BNN became active (which takes some time).
2. Some region servers that the HBase client was writing to crashed (e.g., C4C3,
which hosted the META table), while other region servers stayed up.
3. The HMaster failed to split the HLog and skipped it.
4. The BNN became active and the HMaster finished processing the shutdown event.
5. A lot of the data on the crashed region server was lost.


Logs:
At 14:57:58, C4C3 shut itself down because the ANN crashed.
The master skipped splitting its log and reassigned the META table:

2011-04-12 14:57:58,782 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs 
for C4C3.site,60020,1302590910433
2011-04-12 14:57:59,790 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: C4C1/157.5.100.1:9000. Already tried 0 time(s).

2011-04-12 14:58:08,793 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
2011-04-12 14:58:08,795 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: 
Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C3.site,60020,1302590910433
java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection 
exception: java.net.ConnectException: Connection refused
2011-04-12 14:58:08,805 INFO 
org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region 
location in ZooKeeper
2011-04-12 14:58:08,880 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
Failed verification of .META.,,1 at address=C4C3.site:60020; 
java.net.ConnectException: Connection refused
2011-04-12 14:58:08,880 INFO org.apache.hadoop.hbase.catalog.CatalogTracker: 
Current cached META location is not valid, resetting

The HMaster finished processing the shutdown event after the BNN became active, and
the META table was reassigned:

2011-04-12 15:00:31,681 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
2011-04-12 15:00:32,682 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
2011-04-12 15:00:40,698 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, 
ts=1302591600701
2011-04-12 15:00:40,699 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2011-04-12 15:00:40,709 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Successfully transitioned region=.META.,,1.1028785192 into OFFLINE and forcing 
a new assignment
2011-04-12 15:00:40,712 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  -ROOT-,,0.70236052 state=OPENING, 
ts=1302591600718
2011-04-12 15:00:40,712 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been OPENING for too long, reassigning region=-ROOT-,,0.70236052
2011-04-12 15:00:40,725 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Successfully transitioned region=-ROOT-,,0.70236052 into OFFLINE and forcing a 
new assignment
2011-04-12 15:00:40,892 INFO org.apache.hadoop.hbase.zookeeper.MetaNodeTracker: 
Detected completed assignment of META, notifying catalog tracker
2011-04-12 15:00:45,870 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 0 
region(s) that C4C3.site,60020,1302590910433 was carrying (skipping 0 
regions(s) that are already in transition)
2011-04-12 15:00:45,870 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
processing of shutdown of C4C3.site,60020,1302590910433



The data in the skipped HLog is lost if the HMaster does not restart after the NN
recovers. So I think the HMaster should shut itself down when the NN crashes, just as
a region server shuts itself down when it catches an IOException while rolling its HLog.
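The behavior proposed here can be sketched as follows: try to split the dead server's logs, and abort the master instead of continuing when HDFS is unreachable. `ShutdownLogHandling`, `LogSplitter` and `Abortable` are stand-ins defined only for this illustration. The real ServerShutdownHandler and MasterFileSystem code paths are not reproduced.

```java
import java.io.IOException;

// Hypothetical sketch, not HMaster code: if the HLog of a dead region server
// cannot be split because HDFS is unreachable, abort instead of skipping it.
final class ShutdownLogHandling {
  // Stand-in for the component that splits a dead server's HLog.
  interface LogSplitter {
    void split(String serverName) throws IOException;
  }

  // Stand-in for the master's abort hook.
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  private ShutdownLogHandling() {}

  // Returns true only when the logs were split and reassigning regions is safe.
  static boolean splitOrAbort(String serverName, LogSplitter splitter, Abortable master) {
    try {
      splitter.split(serverName);
      return true;
    } catch (IOException e) {
      // Skipping here would let regions reopen without the edits in the
      // unsplit HLog, which is the data loss reported in this issue.
      master.abort("Failed splitting logs for " + serverName, e);
      return false;
    }
  }
}
```

Aborting trades availability for safety. The cluster stops serving until the master is restarted, but the unsplit HLog can then be processed instead of being skipped.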

>  A lot of data is lost when name node crashed
> -
>
> Key: HBASE-3722
> URL: https://issues.apache.org/jira/browse/HBASE-3722
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.1
>Reporter: gaojinchao
> Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what caused it. There are some failed log-split messages.
> The master should shut itself down when HDFS has crashed.
>  The logs are:
>  2011-03-22 13:21:55,056 WARN 
>  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
>  logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
> connection exception: java.net.ConnectException: Connection refused
>  at or

[jira] [Commented] (HBASE-3740) hbck doesn't reset the number of errors when retrying


[ 
https://issues.apache.org/jira/browse/HBASE-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018780#comment-13018780
 ] 

Hudson commented on HBASE-3740:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> hbck doesn't reset the number of errors when retrying
> -
>
> Key: HBASE-3740
> URL: https://issues.apache.org/jira/browse/HBASE-3740
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.90.3
>
> Attachments: HBASE-3740.patch
>
>
> Using hbck to fix a problem, I see that when it retries it doesn't reset the
> number of inconsistencies, so the number doubles.
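The fix described here amounts to clearing the error state at the start of each pass. Below is a minimal sketch with a hypothetical `ConsistencyChecker`, where the `findings` list stands in for whatever one hbck scan detects. It is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not hbck itself: each pass starts from an empty error
// list, so a retry reports only what it found instead of doubling the count.
final class ConsistencyChecker {
  private final List<String> errors = new ArrayList<String>();

  private void reportError(String message) {
    errors.add(message);
  }

  // One check pass; `findings` stands in for the inconsistencies a scan detects.
  int runPass(List<String> findings) {
    errors.clear(); // reset before re-checking
    for (String finding : findings) {
      reportError(finding);
    }
    return errors.size();
  }

  List<String> getErrors() {
    return new ArrayList<String>(errors);
  }
}
```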

[jira] [Commented] (HBASE-3624) Only one coprocessor of each priority type can be loaded for a table


[ 
https://issues.apache.org/jira/browse/HBASE-3624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018777#comment-13018777
 ] 

Hudson commented on HBASE-3624:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> Only one coprocessor of each priority type can be loaded for a table
> 
>
> Key: HBASE-3624
> URL: https://issues.apache.org/jira/browse/HBASE-3624
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors
>Affects Versions: 0.92.0
> Environment: Standalone HBase, linux
>Reporter: Jesse Daniels
>Assignee: Andrew Purtell
> Fix For: 0.92.0
>
> Attachments: HBASE-3624.patch
>
>
> Coprocessors are added to HBase using a TreeSet that is initialized with an 
> EnvironmentPriorityComparator. The net effect is that only one coprocessor of 
> a given priority can be loaded at a time for a given table. This appears to 
> be due to how the TreeSet uses the EnvironmentPriorityComparator to determine 
> whether there are duplicate entries - if the coprocessors have the same 
> priority (e.g., User), they are considered the same and won't be added to the 
> Set.
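The failure mode comes from `TreeSet` using the comparator, not `equals`, to detect duplicates. Two entries that compare as 0 are treated as the same element, so the second `add` is silently ignored. The sketch reproduces that and shows one possible fix, breaking ties by load order. `CoprocessorEntry` and `PriorityOrdering` are hypothetical stand-ins, not the HBase environment classes, and the attached patch may fix it differently.

```java
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a loaded coprocessor environment; not the HBase class.
final class CoprocessorEntry {
  private static final AtomicInteger LOAD_ORDER = new AtomicInteger();

  final String name;
  final int priority; // lower value sorts first (assumption for this sketch)
  final int loadSeq;  // unique tie-breaker assigned at load time

  CoprocessorEntry(String name, int priority) {
    this.name = name;
    this.priority = priority;
    this.loadSeq = LOAD_ORDER.getAndIncrement();
  }
}

final class PriorityOrdering {
  private PriorityOrdering() {}

  // Buggy shape: equal priorities compare as 0, so TreeSet drops the second entry.
  static final Comparator<CoprocessorEntry> PRIORITY_ONLY =
      (a, b) -> Integer.compare(a.priority, b.priority);

  // Fixed shape: break ties by load order so distinct entries never compare as 0.
  static final Comparator<CoprocessorEntry> PRIORITY_THEN_LOAD_ORDER =
      (a, b) -> {
        int byPriority = Integer.compare(a.priority, b.priority);
        return byPriority != 0 ? byPriority : Integer.compare(a.loadSeq, b.loadSeq);
      };
}
```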

[jira] [Commented] (HBASE-3764) Book.xml - adding 2 FAQs (SQL and arch question)


[ 
https://issues.apache.org/jira/browse/HBASE-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018779#comment-13018779
 ] 

Hudson commented on HBASE-3764:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])
HBASE-3764 Book.xml - adding 2 FAQs (SQL and arch question)


> Book.xml - adding 2 FAQs (SQL and arch question)
> 
>
> Key: HBASE-3764
> URL: https://issues.apache.org/jira/browse/HBASE-3764
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Fix For: 0.92.0
>
> Attachments: book_HBASE-3764.xml.patch
>
>
> Adding 2 general FAQs.
> 1) Does HBase support SQL?  (Hive, but not really for most cases)...
> 2) How does HBase work on HDFS?  (If HDFS is for large files without fast
> lookup, how does HBase work?)  Doesn't answer the question inline but refers 
> to DataModel and Arch.

[jira] [Commented] (HBASE-3762) HTableFactory.releaseHTableInterface() wraps IOException in RuntimeException


[ 
https://issues.apache.org/jira/browse/HBASE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018778#comment-13018778
 ] 

Hudson commented on HBASE-3762:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> HTableFactory.releaseHTableInterface() wraps IOException in RuntimeException
> 
>
> Key: HBASE-3762
> URL: https://issues.apache.org/jira/browse/HBASE-3762
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.2
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: HBASE-3762.patch
>
>
> Currently HTableFactory.releaseHTableInterface() wraps IOException in 
> RuntimeException.
> We should let HTableInterfaceFactory.releaseHTableInterface() throw 
> IOException explicitly.
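The proposed signature change can be sketched with stand-in types. Declaring `IOException` on the release method lets an implementation propagate a close or flush failure unchanged, instead of wrapping it in a `RuntimeException`. `TableFactorySketch` and `ClosingTableFactory` are hypothetical and are not the real `HTableInterfaceFactory`.

```java
import java.io.Closeable;
import java.io.IOException;

// Stand-in types for illustration; not the real HBase interfaces.
interface TableFactorySketch<T extends Closeable> {
  // Declaring IOException lets callers handle release failures directly
  // instead of unwrapping a RuntimeException.
  void releaseTable(T table) throws IOException;
}

final class ClosingTableFactory<T extends Closeable> implements TableFactorySketch<T> {
  @Override
  public void releaseTable(T table) throws IOException {
    table.close(); // any IOException propagates as-is
  }
}
```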

[jira] [Commented] (HBASE-3756) Can't move META or ROOT from shell


[ 
https://issues.apache.org/jira/browse/HBASE-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018773#comment-13018773
 ] 

Hudson commented on HBASE-3756:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> Can't move META or ROOT from shell
> --
>
> Key: HBASE-3756
> URL: https://issues.apache.org/jira/browse/HBASE-3756
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 0.90.3
>
> Attachments: 3756.txt
>
>
> Fails with UnknownRegionException:
> {code}
> ERROR: java.lang.reflect.UndeclaredThrowableException: 
> org.apache.hadoop.hbase.UnknownRegionException: -ROOT-,,0,70236052
> at org.apache.hadoop.hbase.master.HMaster.move(HMaster.java:729)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> {code}

[jira] [Commented] (HBASE-3652) Speed up tests by lowering some sleeps


[ 
https://issues.apache.org/jira/browse/HBASE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018774#comment-13018774
 ] 

Hudson commented on HBASE-3652:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> Speed up tests by lowering some sleeps
> --
>
> Key: HBASE-3652
> URL: https://issues.apache.org/jira/browse/HBASE-3652
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.1
>Reporter: Jean-Daniel Cryans
>Priority: Critical
> Fix For: 0.90.3
>
> Attachments: HBASE-3652.patch
>
>
> While running TestAdmin in the scope of HBASE-3650, I saw that it takes much 
> longer to run than it used to. Upon inspection I found that there are two 
> hardcoded 1-second sleeps, in the waitUntilDone methods of DisableTableHandler 
> and EnableTableHandler (the two methods are nearly identical). Lowering them 
> to 50ms cut the run time in half, and I'm sure there are a few other sleeps 
> that we could get rid of.
> I think that at least those 1-second sleeps should be configurable so that we 
> can tune them down in the tests, but I wonder if we need them at all.
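One way to make the interval tunable is sketched below under stated assumptions. `PollingWaiter.waitUntilDone` and its parameters are hypothetical. They only model the polling pattern described above, not the actual DisableTableHandler/EnableTableHandler code or any real configuration key. Production code could keep the 1-second default while tests pass something like 50 ms.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition with a caller-supplied sleep interval
// instead of a hardcoded 1-second sleep.
final class PollingWaiter {
  // Mirrors the currently hardcoded interval.
  static final long DEFAULT_POLL_MILLIS = 1000L;

  private PollingWaiter() {}

  /**
   * Waits until done returns true or timeoutMillis elapses.
   * Returns true if the condition was met, false on timeout.
   */
  static boolean waitUntilDone(BooleanSupplier done, long pollMillis, long timeoutMillis)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (!done.getAsBoolean()) {
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false;
      }
      // Sleep for the configured interval, but never past the deadline.
      Thread.sleep(Math.min(pollMillis, remaining));
    }
    return true;
  }

  /** Same as above, using the default 1-second interval. */
  static boolean waitUntilDone(BooleanSupplier done, long timeoutMillis)
      throws InterruptedException {
    return waitUntilDone(done, DEFAULT_POLL_MILLIS, timeoutMillis);
  }
}
```

Passing the interval in, rather than reading a constant, lets a test shrink it without touching the wait logic itself.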

[jira] [Commented] (HBASE-3750) HTablePool.putTable() should call tableFactory.releaseHTableInterface() for discarded table


[ 
https://issues.apache.org/jira/browse/HBASE-3750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018775#comment-13018775
 ] 

Hudson commented on HBASE-3750:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> HTablePool.putTable() should call tableFactory.releaseHTableInterface() for 
> discarded table
> ---
>
> Key: HBASE-3750
> URL: https://issues.apache.org/jira/browse/HBASE-3750
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 0.90.3
>
> Attachments: 3750-addendum.patch, 3750-v2.patch, 3750.txt
>
>
> Currently HTablePool.putTable() doesn't call table.flushCommits().
> When an HTable instance is discarded in putTable(), we should call 
> tableFactory.releaseHTableInterface().
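Below is a simplified sketch of the proposed putTable() behavior. `SimpleTablePool` is hypothetical. It pools a single kind of table rather than keeping one queue per table name as HTablePool does, and its `releaser` callback stands in for tableFactory.releaseHTableInterface(). When the pool is already full, the returned table is released rather than silently dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical, single-table simplification of HTablePool.
final class SimpleTablePool<T> {
  private final Deque<T> queue = new ArrayDeque<>();
  private final int maxSize;
  // Stand-in for tableFactory.releaseHTableInterface().
  private final Consumer<T> releaser;

  SimpleTablePool(int maxSize, Consumer<T> releaser) {
    this.maxSize = maxSize;
    this.releaser = releaser;
  }

  /** Returns a pooled table, or null if the pool is empty. */
  synchronized T getTable() {
    return queue.pollFirst();
  }

  /** Puts a table back; a table that does not fit is released, not leaked. */
  synchronized void putTable(T table) {
    if (queue.size() < maxSize) {
      queue.addLast(table);
    } else {
      releaser.accept(table);
    }
  }

  synchronized int size() {
    return queue.size();
  }
}
```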

[jira] [Commented] (HBASE-3744) createTable blocks until all regions are out of transition


[ 
https://issues.apache.org/jira/browse/HBASE-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018772#comment-13018772
 ] 

Hudson commented on HBASE-3744:
---

Integrated in HBase-TRUNK #1846 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1846/])


> createTable blocks until all regions are out of transition
> --
>
> Key: HBASE-3744
> URL: https://issues.apache.org/jira/browse/HBASE-3744
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.90.1
>Reporter: Todd Lipcon
>Assignee: Ted Yu
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3744-addendum.txt, 3744-v2.txt, 3744-v3.txt, 3744.txt, 
> create_big_tables.rb, create_big_tables.rb, create_big_tables.rb
>
>
> In HBASE-3305, the behavior of createTable was changed, which introduced this 
> bug: createTable now blocks until all regions have been assigned, since it 
> uses BulkStartupAssigner. BulkStartupAssigner.waitUntilDone calls 
> assignmentManager.waitUntilNoRegionsInTransition, which waits across all 
> regions, not just the regions of the table that has just been created.
> We saw an issue where one table had a region that could not be opened, 
> so it was stuck in RegionsInTransition permanently (every open was failing). 
> As a result, waitUntilDone blocked indefinitely even though the newly created 
> table had been fully assigned.
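The difference between the two wait conditions can be sketched as follows. The sketch models regions in transition as a plain set of region names. `CreateTableWait` and both of its methods are hypothetical; they do not reproduce the AssignmentManager API or the committed fix.

```java
import java.util.Collection;
import java.util.Set;

// Hypothetical model of the wait condition used after createTable.
final class CreateTableWait {
  private CreateTableWait() {}

  // Global condition (sketch): done only when no region anywhere is in transition,
  // so one permanently failing region of another table blocks forever.
  static boolean allRegionsSettled(Set<String> regionsInTransition) {
    return regionsInTransition.isEmpty();
  }

  // Scoped condition (sketch): done once none of the new table's regions is in transition.
  static boolean tableRegionsSettled(Set<String> regionsInTransition,
                                     Collection<String> regionsOfTable) {
    for (String region : regionsOfTable) {
      if (regionsInTransition.contains(region)) {
        return false;
      }
    }
    return true;
  }
}
```

With the scoped check, a stuck region belonging to another table no longer prevents createTable from returning.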

[jira] [Commented] (HBASE-3734) HBaseAdmin creates new configurations in getCatalogTracker