[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

2011-04-13 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019223#comment-13019223
 ] 

Nicolas Spiegelberg commented on HBASE-3763:


@stack: we ran into a problem where our bloom sizes were getting quite 
substantial (100 MB; believe it or not, blooms still make sense at that size). 
When such a bloom is not in the LRU cache, read requests stall until the entire 
bloom is loaded into memory, and sometimes that is a non-local read.  If we can 
do a block index for blooms and only have to load a 64 KB shard, our read 
stalls will diminish severely.
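The idea can be sketched in plain Java. This is an illustrative model only; the class, its layout, and its single hash function are hypothetical stand-ins, not the actual HFile bloom format:

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not the actual HFile bloom layout: split the
// filter into fixed-size shards and index them, so a lookup touches
// only the one shard its key hashes into.
class ShardedBloomSketch {
    static final int SHARD_BITS = 64 * 1024 * 8; // one 64 KB shard, in bits
    final BitSet[] shards;                       // each shard is loadable on its own

    ShardedBloomSketch(int numShards) {
        shards = new BitSet[numShards];
    }

    // Route a key to exactly one shard; a real bloom would also apply
    // several hash functions per key.
    int shardFor(byte[] key) {
        return Math.floorMod(Arrays.hashCode(key), shards.length);
    }

    int bitFor(byte[] key) {
        return Math.floorMod(31 * Arrays.hashCode(key), SHARD_BITS);
    }

    void add(byte[] key) {
        int s = shardFor(key);
        if (shards[s] == null) shards[s] = new BitSet(SHARD_BITS);
        shards[s].set(bitFor(key));
    }

    boolean mightContain(byte[] key) {
        int s = shardFor(key);               // only this shard must be in memory
        if (shards[s] == null) return false; // shard never written: definite miss
        return shards[s].get(bitFor(key));
    }
}
```

A read that misses the cache then pays for one 64 KB shard load rather than the full filter, which is the stall reduction described above.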

 Add Bloom Block Index Support
 -

 Key: HBASE-3763
 URL: https://issues.apache.org/jira/browse/HBASE-3763
 Project: HBase
  Issue Type: Improvement
  Components: io, regionserver
Affects Versions: 0.89.20100924, 0.90.0, 0.90.1, 0.90.2
Reporter: mikhail
Assignee: mikhail
Priority: Minor
  Labels: hbase, performance
 Fix For: 0.89.20100924

   Original Estimate: 0h
  Remaining Estimate: 0h

 Add a way to save HBase Bloom filters into an array of Meta blocks instead of 
 one big Meta block, and load only the blocks required to answer a query.  
 This will give us faster bloom load times for large StoreFiles and pave the 
 path for adding Bloom Filter support to HFileOutputFormat bulk load.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019227#comment-13019227
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review438
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment855

ill effects of copy-paste. will change.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment854

Removed it.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment852

I use the Eclipse formatter (which says it is using Apache's standard), and it 
is inserting these spaces. I tried to edit the settings to make it work, but 
couldn't find a way to remove the extra spaces between the doc and the arg 
list. I removed them manually, but want to know the standard approach.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment853

Yes, will do it. Thanks.


- himanshu


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
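The proposal above can be sketched in plain Java. This is an illustrative model only, not the coprocessor API: each region counts its own rows server-side and ships back only a long, and the client sums those per-region longs instead of receiving every row:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the proposal, not the coprocessor API: each
// region counts its own rows server-side and returns only a long; the
// client aggregates the small per-region results.
class RegionCountSketch {
    // Stand-in for one region's server-side scan: a count, not rows.
    static long countRegion(List<String> regionRows) {
        return regionRows.size();
    }

    // Client-side aggregation over the per-region results.
    static long rowCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRegion(region);   // one long per region crosses the wire
        }
        return total;
    }
}
```

The bandwidth saving is the point: for a count over N rows, only one number per region travels to the client instead of N rows of data.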

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-1512:
---

Attachment: patch-1512-5.txt

Stack reviewed half of it on rb (https://reviews.apache.org/r/585/). I 
incorporated his suggestions and am uploading the new patch here, as the rb 
request was initiated by Ted (and I don't think I can upload this under the 
same rb request). 
Ted, please post this version on rb for further reviews. 

Thanks.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3762) HTableFactory.releaseHTableInterface() wraps IOException in RuntimeException

2011-04-13 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019240#comment-13019240
 ] 

Lars George commented on HBASE-3762:


Sorry for bumping the issue, but wouldn't it make sense to add IOException to 
getTable() too? It eventually creates an HTable, which can throw an IOE, and 
currently that is also wrapped in an ugly catch-all RTE.

 HTableFactory.releaseHTableInterface() wraps IOException in RuntimeException
 

 Key: HBASE-3762
 URL: https://issues.apache.org/jira/browse/HBASE-3762
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.2
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: HBASE-3762.patch


 Currently HTableFactory.releaseHTableInterface() wraps IOException in 
 RuntimeException.
 We should let HTableInterfaceFactory.releaseHTableInterface() throw 
 IOException explicitly.
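The change can be sketched as follows (hypothetical class, using java.io.Closeable as a stand-in for the table type; see the attached patch for the real diff):

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the proposed signature change: declare the checked
// IOException instead of smothering it in a RuntimeException.
class ReleaseSketch {
    // Current behavior: callers only ever see an opaque RuntimeException.
    static void releaseWrapping(Closeable table) {
        try {
            table.close();
        } catch (IOException e) {
            throw new RuntimeException(e); // loses the checked-exception contract
        }
    }

    // Proposed behavior: callers can catch and handle the IOException.
    static void releaseThrowing(Closeable table) throws IOException {
        table.close();
    }
}
```

With the second form, a caller that cares about flush failures on close can handle them specifically instead of catching RuntimeException.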

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019248#comment-13019248
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review440
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment863

This is the type parameter for return value.


- Ted


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019250#comment-13019250
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-13 08:37:14.182698)


Review request for hbase and Gary Helmling.


Changes
---

Himanshu updated the patch according to Stack's suggestions.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2

2011-04-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-double-alternation.txt)

 Improve the selection of regions to balance; part 2
 ---

 Key: HBASE-3609
 URL: https://issues.apache.org/jira/browse/HBASE-3609
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Ted Yu
 Attachments: 3609-empty-RS.txt, hbase-3609-by-region-age.txt, 
 hbase-3609.txt


 See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
 of algorithms that improve on current random assignment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3771) All jsp pages don't clean their HBA

2011-04-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3771.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to branch and trunk, thanks for the review Stack.

 All jsp pages don't clean their HBA
 ---

 Key: HBASE-3771
 URL: https://issues.apache.org/jira/browse/HBASE-3771
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3

 Attachments: HBASE-3771.patch


 Noticed by Dave Latham, refreshing the zk web page will eventually make that 
 machine run out of connections with ZK. It's because we don't close the 
 connection created inside HBA.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019464#comment-13019464
 ] 

Dmitriy V. Ryaboy commented on HBASE-3708:
--

I posted the patch a while back, but I guess something was wrong with the RB 
integration? Here it is: https://review.cloudera.org/r/1672/

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy

 Clients on startup create a ZKWatcher instance.  Part of construction is 
 check that hbase dirs are all up in zk.  Its done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that its making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase
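One way to quiet this, sketched below with an in-memory stand-in for the ensemble (illustrative only, not the ZKUtil API): check whether the node exists before attempting the create, so the server never has to reject a create of an existing node:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch, not the ZKUtil API: an exists() check before
// the create avoids the server-side NodeExists rejection that shows
// up as a KeeperException log line on every client startup.
class CreateSilentSketch {
    final Set<String> nodes = new HashSet<>();
    int createAttempts = 0;  // stands in for server-side log lines

    // Noisy variant: always attempt the create and swallow the failure;
    // the ensemble still logs NodeExists each time.
    void createAndFailSilentNoisy(String path) {
        createAttempts++;
        nodes.add(path);
    }

    // Quiet variant: only attempt the create when the node is absent.
    void createAndFailSilentQuiet(String path) {
        if (!nodes.contains(path)) {
            createAttempts++;
            nodes.add(path);
        }
    }
}
```

There is a benign race (another client can create the node between the check and the create), so a real fix would still tolerate NodeExists, just no longer trigger it on every startup.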

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-13 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019468#comment-13019468
 ] 

Jerry Chen commented on HBASE-3685:
---

@stack, can you take a look at this? Kannan and Jonathan have reviewed it 
internally. 

 when multiple columns are combined with TimestampFilter, only one column is 
 returned
 

 Key: HBASE-3685
 URL: https://issues.apache.org/jira/browse/HBASE-3685
 Project: HBase
  Issue Type: Bug
  Components: filters, regionserver
Reporter: Jerry Chen
Priority: Minor
  Labels: noob
 Attachments: 3685-missing-column.patch


 As reported by an Hbase user: 
 I have a ThreadMetadata column family, and there are two columns in it: 
 v12:th: and v12:me. The following code only returns v12:me
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
 List<Long> threadIds = new ArrayList<Long>();
 threadIds.add(10709L);
 TimestampFilter filter = new TimestampFilter(threadIds);
 get.setFilter(filter);
 get.setMaxVersions();
 Result result = table.get(get);
 I checked hbase for the key/values; they are present. With other combinations, 
 such as no TimestampFilter, it returns both columns.
 Kannan was able to do a small repro of the issue and commented that if we 
 drop the get.setMaxVersions(), the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019469#comment-13019469
 ] 

Ted Yu commented on HBASE-3708:
---

In the future, please use https://reviews.apache.org; your review request 
won't be bounced there.

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy

 Clients on startup create a ZKWatcher instance.  Part of construction is 
 check that hbase dirs are all up in zk.  Its done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that its making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019471#comment-13019471
 ] 

stack commented on HBASE-3708:
--

Sorry Dmitriy, the feedback loop between Cloudera's RB and JIRA has been broken 
for a while, so fellas have been manually flagging postings to RB by noting them 
here in JIRA.  Let me take a look.  We've also since moved to 
reviews.apache.org for our RB -- for the future.

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy

 Clients on startup create a ZKWatcher instance.  Part of construction is 
 check that hbase dirs are all up in zk.  Its done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that its making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019517#comment-13019517
 ] 

Hudson commented on HBASE-3769:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 TableMapReduceUtil is inconsistent with other table-related classes that 
 accept byte[] as a table name
 --

 Key: HBASE-3769
 URL: https://issues.apache.org/jira/browse/HBASE-3769
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Erik Onnen
Assignee: Erik Onnen
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-3769.patch


 Minor gripe, but we define our entire schema as a set of byte[] constants for 
 tables and CFs. This works well with HTable and HTablePool, but 
 TableMapReduceUtil requires conversion to a String; most table-related 
 classes do not.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3768) Add best practice to book for loading row key only

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019519#comment-13019519
 ] 

Hudson commented on HBASE-3768:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 Add best practice to book for loading row key only
 --

 Key: HBASE-3768
 URL: https://issues.apache.org/jira/browse/HBASE-3768
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Erik Onnen
Assignee: Erik Onnen
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3768.patch


 Book and wiki FAQs are missing guidance on the recommended practice for 
 loading row keys only during a scan.
 Patch attached based on jdcryans' feedback from IRC.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3765) metrics.xml - small format change and adding nav to hbase book metrics section

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019521#comment-13019521
 ] 

Hudson commented on HBASE-3765:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 metrics.xml - small format change and adding nav to hbase book metrics section
 --

 Key: HBASE-3765
 URL: https://issues.apache.org/jira/browse/HBASE-3765
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.92.0

 Attachments: metrics_HBASE-3765.xml.patch


 (in src\site\xdoc)
 There was a section header near the top of page that wasn't formatted in bold 
 which I changed.
 Adding small section at bottom to refer to the HBase book metrics section for 
 more info.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019520#comment-13019520
 ] 

Hudson commented on HBASE-3722:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


  A lot of data is lost when name node crashed
 -

 Key: HBASE-3722
 URL: https://issues.apache.org/jira/browse/HBASE-3722
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.1
Reporter: gaojinchao
 Fix For: 0.90.3

 Attachments: HmasterFilesystem_PatchV1.patch


 I'm not sure exactly what caused it; there are some split-failure logs.
 The master should shut itself down when HDFS has crashed.
  The logs are:
  2011-03-22 13:21:55,056 WARN 
  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
  logs
  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
 connection exception: java.net.ConnectException: Connection refused
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
  at org.apache.hadoop.ipc.Client.call(Client.java:820)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
  at $Proxy5.getListing(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
  at $Proxy5.getListing(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
  at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
  at 
 org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
  at 
  org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
  Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
  at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
  at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
  at 
 org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
  at org.apache.hadoop.ipc.Client.call(Client.java:788)
  ... 13 more
  2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
  2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
  2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
  2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
  2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
  2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
  2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
  2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
  2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
  2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
  2011-03-22 13:22:05,060 ERROR 
  org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting 
  hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
 connection exception: java.net.ConnectException: Connection refused
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
  at org.apache.hadoop.ipc.Client.call(Client.java:820)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
  at $Proxy5.getFileInfo(Unknown Source)
  at 

[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019518#comment-13019518
 ] 

Hudson commented on HBASE-3759:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
 complete()
 

 Key: HBASE-3759
 URL: https://issues.apache.org/jira/browse/HBASE-3759
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-3759.patch, cp_bypass.tar.gz


 In the current coprocessor framework, ThreadLocal objects are used for the 
 bypass and complete booleans in CoprocessorEnvironment.  This allows the 
 *CoprocessorHost implementations to identify when to short-circuit processing 
 the preXXX and postXXX hook methods.
 Profiling the region server, however, shows that these ThreadLocals can 
 become a contention point when on a hot code path (such as prePut()).  We 
 should refactor the CoprocessorHost pre/post implementations to remove usage 
 of the ThreadLocal variables and replace them with locally scoped variables 
 to eliminate contention between handler threads.
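The refactoring direction described above can be sketched with stub types (hypothetical names, not the actual CoprocessorHost code): a context object created locally per invocation replaces the per-thread flags, so handler threads never touch shared mutable state.

```java
import java.util.ArrayList;
import java.util.List;

// Locally scoped per call, replacing the ThreadLocal bypass/complete flags.
class ObserverContext {
    private boolean bypass;
    private boolean complete;

    void bypass()   { this.bypass = true; }
    void complete() { this.complete = true; }
    boolean shouldBypass()   { return bypass; }
    boolean shouldComplete() { return complete; }
}

interface Observer {
    void prePut(ObserverContext ctx, String row);
}

class Host {
    private final List<Observer> observers = new ArrayList<>();

    void add(Observer o) { observers.add(o); }

    // Returns true if some observer requested bypassing the default action.
    boolean prePut(String row) {
        // One context per invocation: no ThreadLocal lookup, no contention.
        ObserverContext ctx = new ObserverContext();
        for (Observer o : observers) {
            o.prePut(ctx, row);
            if (ctx.shouldComplete()) break; // short-circuit remaining hooks
        }
        return ctx.shouldBypass();
    }
}
```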

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019523#comment-13019523
 ] 

Hudson commented on HBASE-3770:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 Make FilterList accept var arg Filters in its constructor as a convenience
 --

 Key: HBASE-3770
 URL: https://issues.apache.org/jira/browse/HBASE-3770
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Erik Onnen
Assignee: Erik Onnen
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3770.patch


 When using a small number of Filters for a FilterList, it's cleaner to use 
 var args rather than forcing a list on the client. Compare:
 scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, new 
 FirstKeyOnlyFilter(), new KeyOnlyFilter()));
 vs:
 List&lt;Filter&gt; filters = new ArrayList&lt;Filter&gt;(2);
 filters.add(new FirstKeyOnlyFilter());
 filters.add(new KeyOnlyFilter());
 scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));
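The convenience constructor being proposed reduces to a one-line delegation; a self-contained sketch with stub types (the real FilterList signature may differ):

```java
import java.util.Arrays;
import java.util.List;

interface Filter { }

class FilterList implements Filter {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    private final Operator operator;
    private final List<Filter> filters;

    FilterList(Operator operator, List<Filter> filters) {
        this.operator = operator;
        this.filters = filters;
    }

    // Proposed convenience: accept var args and delegate to the list form.
    FilterList(Operator operator, Filter... filters) {
        this(operator, Arrays.asList(filters));
    }

    int size() { return filters.size(); }
}
```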

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3771) All jsp pages don't clean their HBA

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019522#comment-13019522
 ] 

Hudson commented on HBASE-3771:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])
HBASE-3771  All jsp pages don't clean their HBA


 All jsp pages don't clean their HBA
 ---

 Key: HBASE-3771
 URL: https://issues.apache.org/jira/browse/HBASE-3771
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3

 Attachments: HBASE-3771.patch


 As Dave Latham noticed, refreshing the zk web page will eventually make that 
 machine run out of connections with ZK, because we don't close the connection 
 created inside HBA.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)
Redefine Identity Of HBase Configuration


 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0


Judging from the javadoc in {{HConnectionManager}}, sharing connections across 
multiple clients going to the same cluster is supposedly a good thing. However, 
the fact that there is a one-to-one mapping between a configuration and 
connection instance, kind of works against that goal. Specifically, when you 
create {{HTable}} instances using a given {{Configuration}} instance and a copy 
thereof, we end up with two distinct {{HConnection}} instances under the 
covers. Is this really expected behavior, especially given that the 
configuration instance gets cloned a lot?

Here, I'd like to play devil's advocate and propose that we deep-compare 
{{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
instances that have the same properties map to the same {{HConnection}} 
instance. In case one is concerned that a single {{HConnection}} is 
insufficient for sharing amongst clients,  to quote the javadoc, then one 
should be able to mark a given {{HBaseConfiguration}} instance as being 
uniquely identifiable.

Note that sharing connections makes clean up of {{HConnection}} instances a 
little awkward, unless of course, you apply the change described in HBASE-3766.
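The deep-compare idea can be illustrated with a small sketch (hypothetical, not the actual HConnectionManager code): derive the connection-cache key from the configuration's property contents rather than from object identity, so a configuration and a copy of it resolve to the same connection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ConnectionKey {
    // Identify a configuration by its property contents rather than by
    // object identity. A TreeMap gives a deterministic ordering of the
    // properties regardless of insertion order, so equal property sets
    // always produce equal keys.
    static String keyFor(Map<String, String> props) {
        return new TreeMap<>(props).toString();
    }
}
```

A client-supplied unique id property, as discussed below in this thread, would simply become one more entry in the map and thereby force a distinct connection.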

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick Sankarachary updated HBASE-3777:
-

Attachment: HBASE-3777.patch

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick Sankarachary updated HBASE-3777:
-

Status: Patch Available  (was: Open)

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2011-04-13 Thread Ted Dunning (JIRA)
HBaseAdmin.create doesn't create empty boundary keys


 Key: HBASE-3778
 URL: https://issues.apache.org/jira/browse/HBASE-3778
 Project: HBase
  Issue Type: Bug
Reporter: Ted Dunning


In my ycsb stuff, I have code that looks like this:
{code}
String startKey = "user102000";
String endKey = "user94000";
admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
regions);
{code}
The result, however, is a table where the first and last region has defined 
first and last keys rather than empty keys.

The patch I am about to attach fixes this, I think.  I have some worries about 
other uses of Bytes.split, however, and would like some eyes on this patch.  
Perhaps we need a new dialect of split.
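The intended semantics can be sketched with stub code (hypothetical names, not the actual HBaseAdmin/Bytes implementation): the supplied start and end keys should become *interior* split points, while the outermost regions keep empty boundary keys so that every possible row has a home region.

```java
import java.util.ArrayList;
import java.util.List;

class RegionBoundaries {
    static final String EMPTY = "";

    // Given the interior split keys (including the user-supplied start and
    // end keys), produce [startKey, endKey) pairs per region. The first
    // region starts at the empty key and the last ends at the empty key.
    static List<String[]> regionsFor(List<String> splitKeys) {
        List<String[]> regions = new ArrayList<>();
        String prev = EMPTY;
        for (String split : splitKeys) {
            regions.add(new String[] { prev, split });
            prev = split;
        }
        regions.add(new String[] { prev, EMPTY }); // last region is open-ended
        return regions;
    }
}
```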


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2011-04-13 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated HBASE-3778:
---

Attachment: HBASE-3778.patch

Proposed patch.

Almost certainly breaks current tests.

 HBaseAdmin.create doesn't create empty boundary keys
 

 Key: HBASE-3778
 URL: https://issues.apache.org/jira/browse/HBASE-3778
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Ted Dunning
 Attachments: HBASE-3778.patch


 In my ycsb stuff, I have code that looks like this:
 {code}
 String startKey = "user102000";
 String endKey = "user94000";
 admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
 regions);
 {code}
 The result, however, is a table where the first and last region has defined 
 first and last keys rather than empty keys.
 The patch I am about to attach fixes this, I think.  I have some worries 
 about other uses of Bytes.split, however, and would like some eyes on this 
 patch.  Perhaps we need a new dialect of split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2011-04-13 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated HBASE-3778:
---

Affects Version/s: 0.90.2
   Status: Patch Available  (was: Open)

 HBaseAdmin.create doesn't create empty boundary keys
 

 Key: HBASE-3778
 URL: https://issues.apache.org/jira/browse/HBASE-3778
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Ted Dunning
 Attachments: HBASE-3778.patch


 In my ycsb stuff, I have code that looks like this:
 {code}
 String startKey = "user102000";
 String endKey = "user94000";
 admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
 regions);
 {code}
 The result, however, is a table where the first and last region has defined 
 first and last keys rather than empty keys.
 The patch I am about to attach fixes this, I think.  I have some worries 
 about other uses of Bytes.split, however, and would like some eyes on this 
 patch.  Perhaps we need a new dialect of split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019550#comment-13019550
 ] 

Ted Yu commented on HBASE-3777:
---

I think this JIRA and HBASE-3766 combined can be expressed by my comment on 
HBASE-3734 at 05/Apr/11 05:20

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3609) Improve the selection of regions to balance; part 2

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019560#comment-13019560
 ] 

Ted Yu commented on HBASE-3609:
---

From Stan Barton who helps me experiment with my changes:

There is no easy way to check how many regions are assigned to a particular RS, 
so we will probably need to write some small parser to prove that.

I think we should backport HBASE-3704 (at least "Regions by Region Server") to 
0.90.3 so that people can easily tell how (un)evenly the load is distributed.

 Improve the selection of regions to balance; part 2
 ---

 Key: HBASE-3609
 URL: https://issues.apache.org/jira/browse/HBASE-3609
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Ted Yu
 Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
 hbase-3609-by-region-age.txt, hbase-3609.txt


 See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
 of algorithms that improve on current random assignment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019570#comment-13019570
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

Ted, 

I saw your comment on HBASE-3734. It:

a) Proposes a neater way of comparing {{Configuration}} instances, for the 
purposes of {{HConnection}} lookup. In fact, the thought of comparing just the 
cluster-specific properties in {{HBaseConfiguration}} did cross my mind. 
However, at times, you may want the ability to have multiple connections per 
cluster, which would not be possible using your approach. 

b) Validates the need for having a reference count on the connection. Instead 
of using a (refcount, connection) tuple as the value in HBASE_INSTANCES though, 
HBASE-3766 puts the refcount in the connection itself. Do you see a specific 
advantage to separating out the refcount from the connection?

Regards,
Karthick



 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019574#comment-13019574
 ] 

Ted Yu commented on HBASE-3777:
---

For a), I like the idea of adding a uniquifier to HBaseConfiguration. This can 
be standardized through a well-known configuration parameter, such as 
hbase.zookeeper.uniquifier (really a secondary key).

For b), I don't have a strong opinion about the particular implementation. What 
I have yet to propose is that we can implement an (optional) timeout mechanism 
for connections to address the issue under the thread "hbase-0.90.x upgrade - 
zookeeper exception in mapreduce job" on the user mailing list.
Maybe it's easier to enforce timeout policy in HCM, hence the centralized 
reference counting.
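Centralized reference counting can be sketched as follows (hypothetical names, not the actual HCM code): the (refcount, connection) pair lives in the manager's cache, so an eviction or timeout policy can be enforced in one place rather than inside each connection.

```java
import java.util.HashMap;
import java.util.Map;

class ConnectionCache {
    static final class Entry {
        int refCount;
        final Object connection; // stand-in for a real HConnection
        Entry(Object connection) { this.connection = connection; }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Returns the shared connection for this key, creating it on first use.
    synchronized Object acquire(String key) {
        Entry e = cache.computeIfAbsent(key, k -> new Entry(new Object()));
        e.refCount++;
        return e.connection;
    }

    // Returns true if the last reference was released and the connection
    // was evicted from the cache (where it could also be closed).
    synchronized boolean release(String key) {
        Entry e = cache.get(key);
        if (e == null) return false;
        if (--e.refCount == 0) {
            cache.remove(key);
            return true;
        }
        return false;
    }
}
```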

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019592#comment-13019592
 ] 

Ted Yu commented on HBASE-3777:
---

J-D informed me that my initial proposal mirrors what used to be done in 0.89. 
The current design is to bypass certain issues encountered in 0.89.

Shall we do the following ?
Step 1, agree upon mechanism for determining identity of HBaseConfiguration's 
and reference counting. Enumerate the possibilities of error from experience of 
0.89 development.
Step 2, implement the new mechanism in trunk.
Step 3, thoroughly test (YCSB, etc) before publishing.

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019597#comment-13019597
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

That sounds like a plan. Are there any threads that talk about the error cases 
we ran into in 0.89?

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019601#comment-13019601
 ] 

Jean-Daniel Cryans commented on HBASE-3777:
---

This is one of the most important ones; it also removed both hashCode and 
equals from HBaseConfiguration: HBASE-2925.

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019619#comment-13019619
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

I see. In that case, using a combination of 
{{conf.get("hbase.zookeeper.quorum")}} and 
{{conf.get("hbase.client.uniqueid")}} as the key, like Ted suggested, may be 
the way to go.

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019630#comment-13019630
 ] 

Ted Yu commented on HBASE-3777:
---

Allow me to add step 2.5:
apply the implementation from step 2 on existing (and new) unit tests for 
validation.



[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2

2011-04-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-double-alternation.txt)

 Improve the selection of regions to balance; part 2
 ---

 Key: HBASE-3609
 URL: https://issues.apache.org/jira/browse/HBASE-3609
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Ted Yu
 Attachments: 3609-empty-RS.txt, hbase-3609-by-region-age.txt, 
 hbase-3609.txt


 See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
 of algorithms that improve on current random assignment.



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019635#comment-13019635
 ] 

Ted Yu commented on HBASE-3373:
---

Suggestion from Stan Barton:
This JIRA can be generalized as a new policy for the load balancer. That is, 
balance the number of regions per RS per table, not just the total number of 
regions across all tables.
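As a sketch of that policy (illustrative names only, not the actual LoadBalancer API): for each table, the balanced per-server load is computed from that table's region count alone, independent of the other tables.

```java
import java.util.HashMap;
import java.util.Map;

public class PerTableBalanceSketch {
  // For one table with `regions` regions on the given servers, each server
  // should carry floor(regions/servers) regions, with the remainder spread
  // as one extra region per server.
  static Map<String, Integer> targets(int regions, String[] servers) {
    Map<String, Integer> t = new HashMap<>();
    int floor = regions / servers.length;
    int remainder = regions % servers.length;
    for (int i = 0; i < servers.length; i++) {
      t.put(servers[i], floor + (i < remainder ? 1 : 0));
    }
    return t;
  }
}
```

Running this per table would rule out the HBASE-3373 scenario of one table parking 202 of its regions on a single server while the cluster as a whole looks balanced.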

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.92.0

 Attachments: HbaseBalancerTest2.java


 From our experience, a cluster can be well balanced and yet one table's 
 regions may be badly concentrated on a few region servers.
 For example, one table has 839 regions (380 regions at the time of table 
 creation), out of which 202 are on one server.
 It would be desirable for the load balancer to distribute regions of specified 
 tables evenly across the cluster. Each such table has a number of regions 
 many times the cluster size.



[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019676#comment-13019676
 ] 

jirapos...@reviews.apache.org commented on HBASE-3759:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/588/#review458
---



src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java
https://reviews.apache.org/r/588/#comment885

First, please let me know if I am thinking in the right direction:

In the ThreadLocal version, we set it to false because this variable is 
shared by the registered CPs in all their pre/postXXX hooks, and it was used 
to decide whether to continue with the CP chain or return from the currently 
executing CP. So, to reuse this variable, it had to be set to false again.

If that is the case, then in this version we have a separate instance of 
ObserverContext per hook, and I don't see why we need to reset these 
variables.

The same goes for the current variable.

Am I getting it right?
(I want to come up with a CP observer for 3607, so I want to grok it a bit; 
hope you don't mind.)
Thanks.


- himanshu
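For reference, the core of the per-invocation design under discussion reduces to something like the following. This is a simplified sketch, not the exact ObserverContext class from the patch: because the host constructs a fresh instance for each hook invocation, the flags start out false by construction and no ThreadLocal reset is needed.

```java
public class ObserverContextSketch<E> {
  private final E env;
  private boolean bypass;    // starts false on every new instance
  private boolean complete;  // ditto; no explicit reset required

  public ObserverContextSketch(E env) { this.env = env; }

  public E getEnvironment() { return env; }
  public void bypass() { bypass = true; }
  public void complete() { complete = true; }
  public boolean shouldBypass() { return bypass; }
  public boolean shouldComplete() { return complete; }
}
```

Compared with a shared ThreadLocal, the flags' lifetime here matches exactly one pre/post chain, which is why the reset calls disappear in the patch.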


On 2011-04-13 01:08:50, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/588/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 01:08:50)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Profiling the HRegionServer process with a RegionObserver coprocessor 
loaded shows a fair amount of runnable thread CPU time spent getting the bypass 
and complete flag ThreadLocal values by RegionCoprocessorHost.  See the 
HBASE-3759 JIRA for some attached graphs.
bq.  
bq.  With the caveat that this is runnable CPU time and not threads in all 
states, this still seems like a significant processing bottleneck on a hot call 
path.  The workload profiled was a put-based bulk load, so for each multi-put 
request, RegionCoprocessorHost.prePut() could be called many times.
bq.  
bq.  Instead of using ThreadLocal variable for bypass/complete, which will 
incur contention on the underlying map of values, I think we can eliminate the 
bottleneck by using locally scoped variables for each preXXX/putXXX method 
called in the RegionCoprocessorHost, MasterCoprocessorHost and 
WALCoprocessorHost classes.
bq.  
bq.  The attached patch refactors the current RegionObserver, MasterObserver 
and WALObserver APIs to provide a locally scoped ObserverContext object for 
storing and checking the bypass and complete values.
bq.  
bq.  Summary of changes:
bq.  
bq.  * adds a new ObserverContext<T extends CoprocessorEnvironment> class, 
containing references for bypass, complete and the environment instance
bq.  * in each pre/post method in RegionObserver, the 
RegionCoprocessorEnvironment parameter is replaced by 
ObserverContext<RegionCoprocessorEnvironment>
bq.  * in each pre/post method in MasterObserver, the 
MasterCoprocessorEnvironment parameter is replaced by 
ObserverContext<MasterCoprocessorEnvironment>
bq.  * in each pre/post method in WALObserver, the WALCoprocessorEnvironment 
parameter is replaced by ObserverContext<WALCoprocessorEnvironment>
bq.  
bq.  
bq.  This is obviously a large bulk change to the existing API.  I could avoid 
the API change with a hacky modification underneath the *CoprocessorEnvironment 
interfaces.  But since we do not yet have a public release with coprocessors, I 
would prefer to take the time to make the initial API the best we can before we 
push it out.
bq.  
bq.  Please let me know your thoughts on this approach.
bq.  
bq.  
bq.  This addresses bug HBASE-3759.
bq.  https://issues.apache.org/jira/browse/HBASE-3759
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 9576c48 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java 5a0f095 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java d45b950 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java a82f62b 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java db0870b 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 3501958 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java 7a34d18 
bq.    src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 019bbde 
bq.

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019684#comment-13019684
 ] 

Ted Yu commented on HBASE-3767:
---

The javadoc fragment doesn't mention allowCoreThreadTimeOut.

From 
http://fuseyism.com/classpath/doc/java/util/concurrent/ThreadPoolExecutor-source.html:
{code:java}
 494:     Runnable getTask() {
 495:         for (;;) {
 496:             try {
 497:                 switch (runState) {
 498:                 case RUNNING: {
 499:                     // untimed wait if core and not allowing core timeout
 500:                     if (poolSize <= corePoolSize && !allowCoreThreadTimeOut)
 501:                         return workQueue.take();
 502: 
 503:                     long timeout = keepAliveTime;
 504:                     if (timeout <= 0) // die immediately for 0 timeout
 505:                         return null;
 506:                     Runnable r = workQueue.poll(timeout, TimeUnit.NANOSECONDS);
 507:                     if (r != null)
 508:                         return r;
 509:                     if (poolSize > corePoolSize || allowCoreThreadTimeOut)
 510:                         return null; // timed out
 511:                     // Else, after timeout, the pool shrank. Retry
 512:                     break;
 513:                 }
{code}
In HTable(), allowCoreThreadTimeOut is set to true. So we're not bounded by 
corePoolSize threads.
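The effect described here can be demonstrated directly with the standard ThreadPoolExecutor API (the pool size below is an arbitrary example, not HTable's actual pool construction):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
  static ThreadPoolExecutor newPool(int coreSize) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        coreSize, coreSize,              // core == max
        60L, TimeUnit.SECONDS,           // idle threads die after 60s
        new LinkedBlockingQueue<Runnable>());
    // Without this, core threads never time out and the pool stays at
    // coreSize forever; with it, an idle pool can shrink to zero threads.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}
```

So with allowCoreThreadTimeOut(true), corePoolSize is a ceiling on timed-waiting workers rather than a permanent floor, which is the point being made about HTable's pool.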

 Cache the number of RS in HTable
 

 Key: HBASE-3767
 URL: https://issues.apache.org/jira/browse/HBASE-3767
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.3

 Attachments: HBASE-3767.patch


 When creating a new HTable we have to query ZK to learn the number of 
 region servers in the cluster. That is done for every single one of them; 
 instead, I think we should do it once per JVM and then reuse that number for 
 all the others.
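A per-JVM cache along the proposed lines might look like this. It is a sketch only: the names and the IntSupplier stand-in for the ZooKeeper query are illustrative, not the attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class ServerCountCache {
  private static final AtomicInteger CACHED = new AtomicInteger(-1);

  // First caller pays the ZK round trip; later HTable constructions in the
  // same JVM reuse the cached count.
  static int getServerCount(IntSupplier zkQuery) {
    int n = CACHED.get();
    if (n < 0) {
      CACHED.compareAndSet(-1, zkQuery.getAsInt());
      n = CACHED.get();
    }
    return n;
  }
}
```

A real version would also need invalidation when servers join or leave, which is why caching only an initial sizing hint (here, for the thread pool) is the safer use.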



[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019685#comment-13019685
 ] 

Ted Yu commented on HBASE-3767:
---

Re-pasting source code due to garbled display above:
{code}
 497:                 switch (runState) {
 498:                 case RUNNING: {
 499:                     // untimed wait if core and not allowing core timeout
 500:                     if (poolSize <= corePoolSize && !allowCoreThreadTimeOut)
 501:                         return workQueue.take();
 502: 
 503:                     long timeout = keepAliveTime;
 504:                     if (timeout <= 0) // die immediately for 0 timeout
 505:                         return null;
 506:                     Runnable r = workQueue.poll(timeout, TimeUnit.NANOSECONDS);
 507:                     if (r != null)
 508:                         return r;
 509:                     if (poolSize > corePoolSize || allowCoreThreadTimeOut)
 510:                         return null; // timed out
 511:                     // Else, after timeout, the pool shrank. Retry
 512:                     break;
 513:                 }
{code}



[jira] [Resolved] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3708.
--

   Resolution: Fixed
Fix Version/s: 0.90.3
 Hadoop Flags: [Reviewed]

Committed to branch and trunk. Thanks for the patch, Dmitriy.

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.90.3


 Clients on startup create a ZKWatcher instance.  Part of construction is a 
 check that the hbase dirs are all up in zk.  It's done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that it's making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase
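One way to quiet those logs can be sketched with a Set standing in for the ensemble's znodes (illustrative only; the committed fix lives in ZKUtil and uses the real ZooKeeper client API): check existence first, so the failing create is no longer issued on every client start.

```java
import java.util.Set;

public class CreateAndFailSilentSketch {
  // Returns true when a create was actually attempted. After the first
  // client succeeds, later clients take the exists() fast path and the
  // ensemble never sees (or logs) a failing create from them.
  static boolean createAndFailSilent(Set<String> znodes, String znode) {
    if (znodes.contains(znode)) {   // analogous to zk.exists(znode, false)
      return false;
    }
    // Real code must still catch KeeperException.NodeExistsException here,
    // since another client can win the race between exists() and create().
    znodes.add(znode);
    return true;
  }
}
```

The NodeExists race remains possible, but it now happens at most once per znode instead of once per client startup, which is what removes the "dozens per second" log lines.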



[jira] [Commented] (HBASE-3210) HBASE-1921 for the new master

2011-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019693#comment-13019693
 ] 

stack commented on HBASE-3210:
--

Subbu: Your patch looks great (as does your reenabling of TZK).  I'm up for 
committing it -- I was going to run all tests first though, since it's a pretty 
significant change -- but your patch is 4x the size it needs to be since the 
bulk is formatting-only changes.  Would you mind resubmitting the patch absent 
the formatting changes?  Try also to keep lines < 80.  Good stuff Subbu.

 HBASE-1921 for the new master
 -

 Key: HBASE-3210
 URL: https://issues.apache.org/jira/browse/HBASE-3210
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: 
 HBASE-3210-When_the_Master_s_session_times_out_and_there_s_only_one,_cluster_is_wedged.patch


 HBASE-1921 was lost when writing the new master code. I guess it's going to 
 be much harder to implement now, but I think it's a critical feature to have 
 considering the reasons that brought me to do it in the old master. There's 
 already a test in TestZooKeeper which was disabled a while ago.



[jira] [Created] (HBASE-3779) Allow split regions to be placed on different region servers

2011-04-13 Thread Ted Yu (JIRA)
Allow split regions to be placed on different region servers


 Key: HBASE-3779
 URL: https://issues.apache.org/jira/browse/HBASE-3779
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2
Reporter: Ted Yu
Assignee: Ted Yu


Currently daughter regions are placed on the same region server where the 
parent region was.
Stanislav Barton mentioned the idea that load information should be considered 
when placing the daughter regions.
The rationale is that the daughter regions tend to receive more writes. So it 
would be beneficial to place at least one daughter region on a different region 
server.
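A placement heuristic along these lines might look like the following sketch (illustrative names, not the actual AssignmentManager code): keep one daughter on the parent's server for locality, and send the other to the least-loaded different server.

```java
import java.util.Map;

public class DaughterPlacementSketch {
  static String placeSecondDaughter(String parentServer,
                                    Map<String, Integer> regionCountByServer) {
    String best = parentServer;           // fall back to the parent's server
    int bestLoad = Integer.MAX_VALUE;
    for (Map.Entry<String, Integer> e : regionCountByServer.entrySet()) {
      if (!e.getKey().equals(parentServer) && e.getValue() < bestLoad) {
        best = e.getKey();
        bestLoad = e.getValue();
      }
    }
    return best;
  }
}
```

A region count is only a crude proxy for the write load Stanislav mentions; a fuller version would weight request rates, but the shape of the decision is the same.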
