[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

2011-04-13 Thread Nicolas Spiegelberg (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019223#comment-13019223
 ] 

Nicolas Spiegelberg commented on HBASE-3763:


@stack: we ran into a problem where our bloom sizes were getting quite 
substantial (100 MB; believe it or not, blooms still make sense at that size). 
When such a bloom is not in the LRU cache, read requests stall until the entire 
bloom is loaded into memory, and sometimes that is a non-local read.  If we can 
do a block index for blooms and only have to load a 64 KB shard, our read 
stalls will diminish severely.
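The idea can be sketched in plain Java. This is an illustrative model only; the class, its layout, and its single hash function are hypothetical stand-ins, not the actual HFile bloom format:

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch, not the actual HFile bloom layout: split the
// filter into fixed-size shards and index them, so a lookup touches
// only the one shard its key hashes into.
class ShardedBloomSketch {
    static final int SHARD_BITS = 64 * 1024 * 8; // one 64 KB shard, in bits
    final BitSet[] shards;                       // each shard is loadable on its own

    ShardedBloomSketch(int numShards) {
        shards = new BitSet[numShards];
    }

    // Route a key to exactly one shard; a real bloom would also apply
    // several hash functions per key.
    int shardFor(byte[] key) {
        return Math.floorMod(Arrays.hashCode(key), shards.length);
    }

    int bitFor(byte[] key) {
        return Math.floorMod(31 * Arrays.hashCode(key), SHARD_BITS);
    }

    void add(byte[] key) {
        int s = shardFor(key);
        if (shards[s] == null) shards[s] = new BitSet(SHARD_BITS);
        shards[s].set(bitFor(key));
    }

    boolean mightContain(byte[] key) {
        int s = shardFor(key);               // only this shard must be in memory
        if (shards[s] == null) return false; // shard never written: definite miss
        return shards[s].get(bitFor(key));
    }
}
```

A read that misses the cache then pays for one 64 KB shard load rather than the full filter, which is the stall reduction described above.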

 Add Bloom Block Index Support
 -

 Key: HBASE-3763
 URL: https://issues.apache.org/jira/browse/HBASE-3763
 Project: HBase
  Issue Type: Improvement
  Components: io, regionserver
Affects Versions: 0.89.20100924, 0.90.0, 0.90.1, 0.90.2
Reporter: mikhail
Assignee: mikhail
Priority: Minor
  Labels: hbase, performance
 Fix For: 0.89.20100924

   Original Estimate: 0h
  Remaining Estimate: 0h

 Add a way to save HBase Bloom filters into an array of Meta blocks instead of 
 one big Meta block, and load only the blocks required to answer a query.  
 This will give us faster bloom load times for large StoreFiles and pave the 
 path for adding Bloom Filter support to HFileOutputFormat bulk load.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019227#comment-13019227
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review438
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment855

ill effects of copy-paste. will change.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment854

Removed it.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment852

I use the Eclipse formatter (which says it is using Apache's standard), and it 
is inserting these spaces. I tried to edit the settings to make it work, but 
couldn't find a way to remove the extra spaces between the doc and the arg 
list. I removed them manually, but want to know the standard approach.



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment853

Yes, will do it. Thanks.


- himanshu


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  
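The proposal above can be sketched in plain Java. This is an illustrative model only, not the coprocessor API: each region counts its own rows server-side and ships back only a long, and the client sums those per-region longs instead of receiving every row:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the proposal, not the coprocessor API: each
// region counts its own rows server-side and returns only a long; the
// client aggregates the small per-region results.
class RegionCountSketch {
    // Stand-in for one region's server-side scan: a count, not rows.
    static long countRegion(List<String> regionRows) {
        return regionRows.size();
    }

    // Client-side aggregation over the per-region results.
    static long rowCount(List<List<String>> regions) {
        long total = 0;
        for (List<String> region : regions) {
            total += countRegion(region);   // one long per region crosses the wire
        }
        return total;
    }
}
```

The bandwidth saving is the point: for a count over N rows, only one number per region travels to the client instead of N rows of data.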

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-1512:
---

Attachment: patch-1512-5.txt

Stack reviewed half of it on rb (https://reviews.apache.org/r/585/). I 
incorporated his suggestions and am uploading the new patch here, as the rb 
request was initiated by Ted (and I don't think I can upload this under the 
same rb request). 
Ted, please post this version on rb for further reviews. 

Thanks.

 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3762) HTableFactory.releaseHTableInterface() wraps IOException in RuntimeException

2011-04-13 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019240#comment-13019240
 ] 

Lars George commented on HBASE-3762:


Sorry for bumping the issue, but wouldn't it make sense to add IOException to 
getTable() too? It eventually creates an HTable, which can throw an IOE, and 
currently that is also wrapped in an ugly catch-all RTE.

 HTableFactory.releaseHTableInterface() wraps IOException in RuntimeException
 

 Key: HBASE-3762
 URL: https://issues.apache.org/jira/browse/HBASE-3762
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.2
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.92.0

 Attachments: HBASE-3762.patch


 Currently HTableFactory.releaseHTableInterface() wraps IOException in 
 RuntimeException.
 We should let HTableInterfaceFactory.releaseHTableInterface() throw 
 IOException explicitly.
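The change can be sketched as follows (hypothetical class, using java.io.Closeable as a stand-in for the table type; see the attached patch for the real diff):

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the proposed signature change: declare the checked
// IOException instead of smothering it in a RuntimeException.
class ReleaseSketch {
    // Current behavior: callers only ever see an opaque RuntimeException.
    static void releaseWrapping(Closeable table) {
        try {
            table.close();
        } catch (IOException e) {
            throw new RuntimeException(e); // loses the checked-exception contract
        }
    }

    // Proposed behavior: callers can catch and handle the IOException.
    static void releaseThrowing(Closeable table) throws IOException {
        table.close();
    }
}
```

With the second form, a caller that cares about flush failures on close can handle them specifically instead of catching RuntimeException.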

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019248#comment-13019248
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/#review440
---



/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
https://reviews.apache.org/r/585/#comment863

This is the type parameter for return value.


- Ted


On 2011-04-12 04:41:49, Ted Yu wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/585/
bq.  ---
bq.  
bq.  (Updated 2011-04-12 04:41:49)
bq.  
bq.  
bq.  Review request for hbase and Gary Helmling.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  This patch provides reference implementation for aggregate function 
support through Coprocessor framework.
bq.  ColumnInterpreter interface allows client to specify how the value's byte 
array is interpreted.
bq.  Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html
bq.  
bq.  Himanshu Vashishtha started the work. I provided some review comments and 
some of the code.
bq.  
bq.  
bq.  This addresses bug HBASE-1512.
bq.  https://issues.apache.org/jira/browse/HBASE-1512
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
bq.
/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
bq./src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/585/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  TestAggFunctions passes.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ted
bq.  
bq.



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1512) Coprocessors: Support aggregate functions

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019250#comment-13019250
 ] 

jirapos...@reviews.apache.org commented on HBASE-1512:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/585/
---

(Updated 2011-04-13 08:37:14.182698)


Review request for hbase and Gary Helmling.


Changes
---

Himanshu updated the patch according to Stack's suggestions.


Summary
---

This patch provides reference implementation for aggregate function support 
through Coprocessor framework.
ColumnInterpreter interface allows client to specify how the value's byte array 
is interpreted.
Some of the thoughts are summarized at 
http://zhihongyu.blogspot.com/2011/03/genericizing-endpointcoprocessor.html

Himanshu Vashishtha started the work. I provided some review comments and some 
of the code.


This addresses bug HBASE-1512.
https://issues.apache.org/jira/browse/HBASE-1512


Diffs (updated)
-

  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
 PRE-CREATION 
  
/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
 PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateCpProtocol.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocolImpl.java 
PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java 
PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/coprocessor/TestAggFunctions.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/585/diff


Testing
---

TestAggFunctions passes.


Thanks,

Ted



 Coprocessors: Support aggregate functions
 -

 Key: HBASE-1512
 URL: https://issues.apache.org/jira/browse/HBASE-1512
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: stack
 Attachments: 1512.zip, AggregateCpProtocol.java, 
 AggregateProtocolImpl.java, AggregationClient.java, ColumnInterpreter.java, 
 patch-1512-2.txt, patch-1512-3.txt, patch-1512-4.txt, patch-1512-5.txt, 
 patch-1512.txt


 Chatting with jgray and holstad at the kitchen table about counts, sums, and 
 other aggregating facility, facility generally where you want to calculate 
 some meta info on your table, it seems like it wouldn't be too hard making a 
 filter type that could run a function server-side and return the result ONLY 
 of the aggregation or whatever.
 For example, say you just want to count rows, currently you scan, server 
 returns all data to client and count is done by client counting up row keys.  
 A bunch of time and resources have been wasted returning data that we're not 
 interested in.  With this new filter type, the counting would be done 
 server-side and then it would make up a new result that was the count only 
 (kinda like mysql when you ask it to count, it returns a 'table' with a count 
 column whose value is count of rows).   We could have it so the count was 
 just done per region and return that.  Or we could maybe make a small change 
 in scanner too so that it aggregated the per-region counts.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2

2011-04-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-double-alternation.txt)

 Improve the selection of regions to balance; part 2
 ---

 Key: HBASE-3609
 URL: https://issues.apache.org/jira/browse/HBASE-3609
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Ted Yu
 Attachments: 3609-empty-RS.txt, hbase-3609-by-region-age.txt, 
 hbase-3609.txt


 See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
 of algorithms that improve on current random assignment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3771) All jsp pages don't clean their HBA

2011-04-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-3771.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Committed to branch and trunk, thanks for the review Stack.

 All jsp pages don't clean their HBA
 ---

 Key: HBASE-3771
 URL: https://issues.apache.org/jira/browse/HBASE-3771
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3

 Attachments: HBASE-3771.patch


 Noticed by Dave Latham, refreshing the zk web page will eventually make that 
 machine run out of connections with ZK. It's because we don't close the 
 connection created inside HBA.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019464#comment-13019464
 ] 

Dmitriy V. Ryaboy commented on HBASE-3708:
--

I posted the patch a while back, but I guess something was wrong with the RB 
integration? Here it is: https://review.cloudera.org/r/1672/

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy

 Clients on startup create a ZKWatcher instance.  Part of construction is 
 check that hbase dirs are all up in zk.  Its done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that its making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase
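One way to quiet this, sketched below with an in-memory stand-in for the ensemble (illustrative only, not the ZKUtil API): check whether the node exists before attempting the create, so the server never has to reject a create of an existing node:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch, not the ZKUtil API: an exists() check before
// the create avoids the server-side NodeExists rejection that shows
// up as a KeeperException log line on every client startup.
class CreateSilentSketch {
    final Set<String> nodes = new HashSet<>();
    int createAttempts = 0;  // stands in for server-side log lines

    // Noisy variant: always attempt the create and swallow the failure;
    // the ensemble still logs NodeExists each time.
    void createAndFailSilentNoisy(String path) {
        createAttempts++;
        nodes.add(path);
    }

    // Quiet variant: only attempt the create when the node is absent.
    void createAndFailSilentQuiet(String path) {
        if (!nodes.contains(path)) {
            createAttempts++;
            nodes.add(path);
        }
    }
}
```

There is a benign race (another client can create the node between the check and the create), so a real fix would still tolerate NodeExists, just no longer trigger it on every startup.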

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3685) when multiple columns are combined with TimestampFilter, only one column is returned

2011-04-13 Thread Jerry Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019468#comment-13019468
 ] 

Jerry Chen commented on HBASE-3685:
---

@stack, can you take a look at this? Kannan and Jonathan have reviewed it 
internally. 

 when multiple columns are combined with TimestampFilter, only one column is 
 returned
 

 Key: HBASE-3685
 URL: https://issues.apache.org/jira/browse/HBASE-3685
 Project: HBase
  Issue Type: Bug
  Components: filters, regionserver
Reporter: Jerry Chen
Priority: Minor
  Labels: noob
 Attachments: 3685-missing-column.patch


 As reported by an Hbase user: 
 I have a ThreadMetadata column family, and there are two columns in it: 
 v12:th: and v12:me. The following code only returns v12:me
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:th:"));
 get.addColumn(Bytes.toBytes("ThreadMetadata"), Bytes.toBytes("v12:me:"));
 List<Long> threadIds = new ArrayList<Long>();
 threadIds.add(10709L);
 TimestampFilter filter = new TimestampFilter(threadIds);
 get.setFilter(filter);
 get.setMaxVersions();
 Result result = table.get(get);
 I checked hbase for the key/values; they are present. With other combinations, 
 such as no TimestampFilter, it returns both columns.
 Kannan was able to do a small repro of the issue and commented that if we 
 drop the get.setMaxVersions(), the problem goes away. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019469#comment-13019469
 ] 

Ted Yu commented on HBASE-3708:
---

In the future, please use https://reviews.apache.org; your review request 
won't be bounced there.

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy

 Clients on startup create a ZKWatcher instance.  Part of construction is 
 check that hbase dirs are all up in zk.  Its done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that its making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019471#comment-13019471
 ] 

stack commented on HBASE-3708:
--

Sorry Dmitriy, the feedback loop between Cloudera's RB and JIRA has been broken 
for a while, so fellas have been manually flagging postings to RB by noting them 
here in JIRA.  Let me take a look.  We've also since moved to 
reviews.apache.org for our RB -- for the future.

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy

 Clients on startup create a ZKWatcher instance.  Part of construction is 
 check that hbase dirs are all up in zk.  Its done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that its making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3769) TableMapReduceUtil is inconsistent with other table-related classes that accept byte[] as a table name

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019517#comment-13019517
 ] 

Hudson commented on HBASE-3769:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 TableMapReduceUtil is inconsistent with other table-related classes that 
 accept byte[] as a table name
 --

 Key: HBASE-3769
 URL: https://issues.apache.org/jira/browse/HBASE-3769
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Erik Onnen
Assignee: Erik Onnen
Priority: Trivial
 Fix For: 0.92.0

 Attachments: HBASE-3769.patch


 Minor gripe, but we define our entire schema as a set of byte[] constants for 
 tables and CFs. This works well with HTable and HTablePool, but 
 TableMapReduceUtil requires conversion to a String; most table-related 
 classes do not.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3768) Add best practice to book for loading row key only

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019519#comment-13019519
 ] 

Hudson commented on HBASE-3768:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 Add best practice to book for loading row key only
 --

 Key: HBASE-3768
 URL: https://issues.apache.org/jira/browse/HBASE-3768
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Erik Onnen
Assignee: Erik Onnen
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3768.patch


 Book and wiki FAQs are missing guidance on the recommended practice for 
 loading row keys only during a scan.
 Patch attached based on jdcryans' feedback from IRC.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3765) metrics.xml - small format change and adding nav to hbase book metrics section

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019521#comment-13019521
 ] 

Hudson commented on HBASE-3765:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 metrics.xml - small format change and adding nav to hbase book metrics section
 --

 Key: HBASE-3765
 URL: https://issues.apache.org/jira/browse/HBASE-3765
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Fix For: 0.92.0

 Attachments: metrics_HBASE-3765.xml.patch


 (in src\site\xdoc)
 There was a section header near the top of page that wasn't formatted in bold 
 which I changed.
 Adding small section at bottom to refer to the HBase book metrics section for 
 more info.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019520#comment-13019520
 ] 

Hudson commented on HBASE-3722:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


  A lot of data is lost when name node crashed
 -

 Key: HBASE-3722
 URL: https://issues.apache.org/jira/browse/HBASE-3722
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.1
Reporter: gaojinchao
 Fix For: 0.90.3

 Attachments: HmasterFilesystem_PatchV1.patch


 I'm not sure exactly what caused it; there are some split-failure logs.
 The master should shut itself down when HDFS has crashed.
  The logs are:
  2011-03-22 13:21:55,056 WARN 
  org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the 
  logs
  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
 connection exception: java.net.ConnectException: Connection refused
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
  at org.apache.hadoop.ipc.Client.call(Client.java:820)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
  at $Proxy5.getListing(Unknown Source)
  at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
  at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
  at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
  at $Proxy5.getListing(Unknown Source)
  at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
  at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
  at 
 org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
  at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
  at 
  org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
  Caused by: java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at 
 sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
  at 
 org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
  at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
  at 
 org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
  at org.apache.hadoop.ipc.Client.call(Client.java:788)
  ... 13 more
  2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
  2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
  2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
  2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
  2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
  2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
  2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
  2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
  2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
  2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect 
 to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
  2011-03-22 13:22:05,060 ERROR 
  org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting 
  hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on 
 connection exception: java.net.ConnectException: Connection refused
  at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
  at org.apache.hadoop.ipc.Client.call(Client.java:820)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
  at $Proxy5.getFileInfo(Unknown Source)
  at 

[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019518#comment-13019518
 ] 

Hudson commented on HBASE-3759:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and 
 complete()
 

 Key: HBASE-3759
 URL: https://issues.apache.org/jira/browse/HBASE-3759
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: Gary Helmling
Assignee: Gary Helmling
 Fix For: 0.92.0

 Attachments: HBASE-3759.patch, cp_bypass.tar.gz


 In the current coprocessor framework, ThreadLocal objects are used for the 
 bypass and complete booleans in CoprocessorEnvironment.  This allows the 
 *CoprocessorHost implementations to identify when to short-circuit processing 
 the preXXX and postXXX hook methods.
 Profiling the region server, however, shows that these ThreadLocals can 
 become a contention point when on a hot code path (such as prePut()).  We 
 should refactor the CoprocessorHost pre/post implementations to remove usage 
 of the ThreadLocal variables and replace them with locally scoped variables 
 to eliminate contention between handler threads.
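The refactoring direction described above can be sketched with stub types (hypothetical names, not the actual CoprocessorHost code): a context object created locally per invocation replaces the per-thread flags, so handler threads never touch shared mutable state.

```java
import java.util.ArrayList;
import java.util.List;

// Locally scoped per call, replacing the ThreadLocal bypass/complete flags.
class ObserverContext {
    private boolean bypass;
    private boolean complete;

    void bypass()   { this.bypass = true; }
    void complete() { this.complete = true; }
    boolean shouldBypass()   { return bypass; }
    boolean shouldComplete() { return complete; }
}

interface Observer {
    void prePut(ObserverContext ctx, String row);
}

class Host {
    private final List<Observer> observers = new ArrayList<>();

    void add(Observer o) { observers.add(o); }

    // Returns true if some observer requested bypassing the default action.
    boolean prePut(String row) {
        // One context per invocation: no ThreadLocal lookup, no contention.
        ObserverContext ctx = new ObserverContext();
        for (Observer o : observers) {
            o.prePut(ctx, row);
            if (ctx.shouldComplete()) break; // short-circuit remaining hooks
        }
        return ctx.shouldBypass();
    }
}
```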

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3770) Make FilterList accept var arg Filters in its constructor as a convenience

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019523#comment-13019523
 ] 

Hudson commented on HBASE-3770:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])


 Make FilterList accept var arg Filters in its constructor as a convenience
 --

 Key: HBASE-3770
 URL: https://issues.apache.org/jira/browse/HBASE-3770
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Erik Onnen
Assignee: Erik Onnen
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3770.patch


 When using a small number of Filters for a FilterList, it's cleaner to use 
 var args rather than forcing a list on the client. Compare:
 scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, new 
 FirstKeyOnlyFilter(), new KeyOnlyFilter()));
 vs:
 List&lt;Filter&gt; filters = new ArrayList&lt;Filter&gt;(2);
 filters.add(new FirstKeyOnlyFilter());
 filters.add(new KeyOnlyFilter());
 scan.setFilter(new FilterList(FilterList.Operator.MUST_PASS_ALL, filters));
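The convenience constructor being proposed reduces to a one-line delegation; a self-contained sketch with stub types (the real FilterList signature may differ):

```java
import java.util.Arrays;
import java.util.List;

interface Filter { }

class FilterList implements Filter {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    private final Operator operator;
    private final List<Filter> filters;

    FilterList(Operator operator, List<Filter> filters) {
        this.operator = operator;
        this.filters = filters;
    }

    // Proposed convenience: accept var args and delegate to the list form.
    FilterList(Operator operator, Filter... filters) {
        this(operator, Arrays.asList(filters));
    }

    int size() { return filters.size(); }
}
```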

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3771) All jsp pages don't clean their HBA

2011-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019522#comment-13019522
 ] 

Hudson commented on HBASE-3771:
---

Integrated in HBase-TRUNK #1850 (See 
[https://hudson.apache.org/hudson/job/HBase-TRUNK/1850/])
HBASE-3771  All jsp pages don't clean their HBA


 All jsp pages don't clean their HBA
 ---

 Key: HBASE-3771
 URL: https://issues.apache.org/jira/browse/HBASE-3771
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.3

 Attachments: HBASE-3771.patch


 As Dave Latham noticed, refreshing the zk web page will eventually make that 
 machine run out of connections with ZK, because we don't close the connection 
 created inside HBA.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)
Redefine Identity Of HBase Configuration


 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0


Judging from the javadoc in {{HConnectionManager}}, sharing connections across 
multiple clients going to the same cluster is supposedly a good thing. However, 
the fact that there is a one-to-one mapping between a configuration and 
connection instance, kind of works against that goal. Specifically, when you 
create {{HTable}} instances using a given {{Configuration}} instance and a copy 
thereof, we end up with two distinct {{HConnection}} instances under the 
covers. Is this really expected behavior, especially given that the 
configuration instance gets cloned a lot?

Here, I'd like to play devil's advocate and propose that we deep-compare 
{{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
instances that have the same properties map to the same {{HConnection}} 
instance. In case one is concerned that a single {{HConnection}} is 
insufficient for sharing amongst clients,  to quote the javadoc, then one 
should be able to mark a given {{HBaseConfiguration}} instance as being 
uniquely identifiable.

Note that sharing connections makes clean up of {{HConnection}} instances a 
little awkward, unless of course, you apply the change described in HBASE-3766.
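The deep-compare idea can be illustrated with a small sketch (hypothetical, not the actual HConnectionManager code): derive the connection-cache key from the configuration's property contents rather than from object identity, so a configuration and a copy of it resolve to the same connection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class ConnectionKey {
    // Identify a configuration by its property contents rather than by
    // object identity. A TreeMap gives a deterministic ordering of the
    // properties regardless of insertion order, so equal property sets
    // always produce equal keys.
    static String keyFor(Map<String, String> props) {
        return new TreeMap<>(props).toString();
    }
}
```

A client-supplied unique id property, as discussed below in this thread, would simply become one more entry in the map and thereby force a distinct connection.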

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick Sankarachary updated HBASE-3777:
-

Attachment: HBASE-3777.patch

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthick Sankarachary updated HBASE-3777:
-

Status: Patch Available  (was: Open)

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2011-04-13 Thread Ted Dunning (JIRA)
HBaseAdmin.create doesn't create empty boundary keys


 Key: HBASE-3778
 URL: https://issues.apache.org/jira/browse/HBASE-3778
 Project: HBase
  Issue Type: Bug
Reporter: Ted Dunning


In my ycsb stuff, I have code that looks like this:
{code}
String startKey = "user102000";
String endKey = "user94000";
admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
regions);
{code}
The result, however, is a table where the first and last region has defined 
first and last keys rather than empty keys.

The patch I am about to attach fixes this, I think.  I have some worries about 
other uses of Bytes.split, however, and would like some eyes on this patch.  
Perhaps we need a new dialect of split.
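The intended semantics can be sketched with stub code (hypothetical names, not the actual HBaseAdmin/Bytes implementation): the supplied start and end keys should become *interior* split points, while the outermost regions keep empty boundary keys so that every possible row has a home region.

```java
import java.util.ArrayList;
import java.util.List;

class RegionBoundaries {
    static final String EMPTY = "";

    // Given the interior split keys (including the user-supplied start and
    // end keys), produce [startKey, endKey) pairs per region. The first
    // region starts at the empty key and the last ends at the empty key.
    static List<String[]> regionsFor(List<String> splitKeys) {
        List<String[]> regions = new ArrayList<>();
        String prev = EMPTY;
        for (String split : splitKeys) {
            regions.add(new String[] { prev, split });
            prev = split;
        }
        regions.add(new String[] { prev, EMPTY }); // last region is open-ended
        return regions;
    }
}
```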


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2011-04-13 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated HBASE-3778:
---

Attachment: HBASE-3778.patch

Proposed patch.

Almost certainly breaks current tests.

 HBaseAdmin.create doesn't create empty boundary keys
 

 Key: HBASE-3778
 URL: https://issues.apache.org/jira/browse/HBASE-3778
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Ted Dunning
 Attachments: HBASE-3778.patch


 In my ycsb stuff, I have code that looks like this:
 {code}
 String startKey = "user102000";
 String endKey = "user94000";
 admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
 regions);
 {code}
 The result, however, is a table where the first and last region has defined 
 first and last keys rather than empty keys.
 The patch I am about to attach fixes this, I think.  I have some worries 
 about other uses of Bytes.split, however, and would like some eyes on this 
 patch.  Perhaps we need a new dialect of split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys

2011-04-13 Thread Ted Dunning (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Dunning updated HBASE-3778:
---

Affects Version/s: 0.90.2
   Status: Patch Available  (was: Open)

 HBaseAdmin.create doesn't create empty boundary keys
 

 Key: HBASE-3778
 URL: https://issues.apache.org/jira/browse/HBASE-3778
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.2
Reporter: Ted Dunning
 Attachments: HBASE-3778.patch


 In my ycsb stuff, I have code that looks like this:
 {code}
 String startKey = "user102000";
 String endKey = "user94000";
 admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), 
 regions);
 {code}
 The result, however, is a table where the first and last region has defined 
 first and last keys rather than empty keys.
 The patch I am about to attach fixes this, I think.  I have some worries 
 about other uses of Bytes.split, however, and would like some eyes on this 
 patch.  Perhaps we need a new dialect of split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019550#comment-13019550
 ] 

Ted Yu commented on HBASE-3777:
---

I think this JIRA and HBASE-3766 combined can be expressed by my comment on 
HBASE-3734 at 05/Apr/11 05:20

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3609) Improve the selection of regions to balance; part 2

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019560#comment-13019560
 ] 

Ted Yu commented on HBASE-3609:
---

From Stan Barton who helps me experiment with my changes:

There is no easy way to check how many regions are assigned to a particular RS, 
so we will probably need to write some small parser to prove that.

I think we should backport HBASE-3704 (at least "Regions by Region Server") to 
0.90.3 so that people can easily tell how (un)evenly the load is distributed.

 Improve the selection of regions to balance; part 2
 ---

 Key: HBASE-3609
 URL: https://issues.apache.org/jira/browse/HBASE-3609
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Ted Yu
 Attachments: 3609-double-alternation.txt, 3609-empty-RS.txt, 
 hbase-3609-by-region-age.txt, hbase-3609.txt


 See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
 of algorithms that improve on current random assignment.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019570#comment-13019570
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

Ted, 

I saw your comment on HBASE-3734. It:

a) Proposes a neater way of comparing {{Configuration}} instances, for the 
purposes of {{HConnection}} lookup. In fact, the thought of comparing just the 
cluster-specific properties in {{HBaseConfiguration}} did cross my mind. 
However, at times, you may want the ability to have multiple connections per 
cluster, which would not be possible using your approach. 

b) Validates the need for having a reference count on the connection. Instead 
of using a (refcount, connection) tuple as the value in HBASE_INSTANCES though, 
HBASE-3766 puts the refcount in the connection itself. Do you see a specific 
advantage to separating out the refcount from the connection?

Regards,
Karthick



 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019574#comment-13019574
 ] 

Ted Yu commented on HBASE-3777:
---

For a), I like the idea of adding a uniquifier to HBaseConfiguration. This can 
be standardized through a well-known configuration parameter, such as 
hbase.zookeeper.uniquifier (really a secondary key).

For b), I don't have a strong opinion about the particular implementation. What 
I have yet to propose is that we can implement an (optional) timeout mechanism 
for connections to address the issue under the thread "hbase-0.90.x upgrade - 
zookeeper exception in mapreduce job" on the user mailing list.
Maybe it's easier to enforce timeout policy in HCM, hence the centralized 
reference counting.
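Centralized reference counting can be sketched as follows (hypothetical names, not the actual HCM code): the (refcount, connection) pair lives in the manager's cache, so an eviction or timeout policy can be enforced in one place rather than inside each connection.

```java
import java.util.HashMap;
import java.util.Map;

class ConnectionCache {
    static final class Entry {
        int refCount;
        final Object connection; // stand-in for a real HConnection
        Entry(Object connection) { this.connection = connection; }
    }

    private final Map<String, Entry> cache = new HashMap<>();

    // Returns the shared connection for this key, creating it on first use.
    synchronized Object acquire(String key) {
        Entry e = cache.computeIfAbsent(key, k -> new Entry(new Object()));
        e.refCount++;
        return e.connection;
    }

    // Returns true if the last reference was released and the connection
    // was evicted from the cache (where it could also be closed).
    synchronized boolean release(String key) {
        Entry e = cache.get(key);
        if (e == null) return false;
        if (--e.refCount == 0) {
            cache.remove(key);
            return true;
        }
        return false;
    }
}
```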

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019592#comment-13019592
 ] 

Ted Yu commented on HBASE-3777:
---

J-D informed me that my initial proposal mirrors what used to be done in 0.89. 
The current design is to bypass certain issues encountered in 0.89.

Shall we do the following ?
Step 1, agree upon mechanism for determining identity of HBaseConfiguration's 
and reference counting. Enumerate the possibilities of error from experience of 
0.89 development.
Step 2, implement the new mechanism in trunk.
Step 3, thoroughly test (YCSB, etc) before publishing.

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019597#comment-13019597
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

That sounds like a plan. Are there any threads that talk about the error cases 
we ran into in 0.89?

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019601#comment-13019601
 ] 

Jean-Daniel Cryans commented on HBASE-3777:
---

This is one of the most important ones; it also removed both hashCode and 
equals from HBaseConfiguration: HBASE-2925.

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019619#comment-13019619
 ] 

Karthick Sankarachary commented on HBASE-3777:
--

I see. In that case, using a combination of 
{{conf.get("hbase.zookeeper.quorum")}} and 
{{conf.get("hbase.client.uniqueid")}} as the key, like Ted suggested, may be 
the way to go.

 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019630#comment-13019630
 ] 

Ted Yu commented on HBASE-3777:
---

Allow me to add step 2.5:
apply the implementation from step 2 on existing (and new) unit tests for 
validation.



[jira] [Updated] (HBASE-3609) Improve the selection of regions to balance; part 2

2011-04-13 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3609:
--

Attachment: (was: 3609-double-alternation.txt)

 Improve the selection of regions to balance; part 2
 ---

 Key: HBASE-3609
 URL: https://issues.apache.org/jira/browse/HBASE-3609
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Ted Yu
 Attachments: 3609-empty-RS.txt, hbase-3609-by-region-age.txt, 
 hbase-3609.txt


 See 'HBASE-3586  Improve the selection of regions to balance' for discussion 
 of algorithms that improve on current random assignment.



[jira] [Commented] (HBASE-3373) Allow regions of specific table to be load-balanced

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019635#comment-13019635
 ] 

Ted Yu commented on HBASE-3373:
---

Suggestion from Stan Barton:
This JIRA can be generalized as a new policy for the load balancer. That is, 
balance the number of regions per RS per table, not just the total number of 
regions across all tables.
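As a sketch of that policy (illustrative names only, not the actual LoadBalancer API): for each table, the balanced per-server load is computed from that table's region count alone, independent of the other tables.

```java
import java.util.HashMap;
import java.util.Map;

public class PerTableBalanceSketch {
  // For one table with `regions` regions on the given servers, each server
  // should carry floor(regions/servers) regions, with the remainder spread
  // as one extra region per server.
  static Map<String, Integer> targets(int regions, String[] servers) {
    Map<String, Integer> t = new HashMap<>();
    int floor = regions / servers.length;
    int remainder = regions % servers.length;
    for (int i = 0; i < servers.length; i++) {
      t.put(servers[i], floor + (i < remainder ? 1 : 0));
    }
    return t;
  }
}
```

Running this per table would rule out the HBASE-3373 scenario of one table parking 202 of its regions on a single server while the cluster as a whole looks balanced.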

 Allow regions of specific table to be load-balanced
 ---

 Key: HBASE-3373
 URL: https://issues.apache.org/jira/browse/HBASE-3373
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.20.6
Reporter: Ted Yu
 Fix For: 0.92.0

 Attachments: HbaseBalancerTest2.java


 From our experience, a cluster can be well balanced and yet one table's 
 regions may be badly concentrated on a few region servers.
 For example, one table has 839 regions (380 regions at the time of table 
 creation), out of which 202 are on one server.
 It would be desirable for the load balancer to distribute regions of specified 
 tables evenly across the cluster. Each such table has a number of regions 
 many times the cluster size.



[jira] [Commented] (HBASE-3759) Eliminate use of ThreadLocals for CoprocessorEnvironment bypass() and complete()

2011-04-13 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019676#comment-13019676
 ] 

jirapos...@reviews.apache.org commented on HBASE-3759:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/588/#review458
---



src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java
https://reviews.apache.org/r/588/#comment885

First, please let me know if I am thinking in the right direction:

In the ThreadLocal version, we set it to false because this variable is 
shared by the registered CPs in all their pre/postXXX hooks, and it was used 
to decide whether to continue with the CP chain or return from the currently 
executing CP. So, to reuse this variable, it had to be set to false again.

If that is the case, then in this version we have a separate instance of 
ObserverContext per hook, and I don't see why we need to reset these 
variables.

The same goes for the current variable.

Am I getting it right?
(I want to come up with a CP observer for 3607, so I want to grok it a bit; 
hope you don't mind.)
Thanks.


- himanshu
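For reference, the core of the per-invocation design under discussion reduces to something like the following. This is a simplified sketch, not the exact ObserverContext class from the patch: because the host constructs a fresh instance for each hook invocation, the flags start out false by construction and no ThreadLocal reset is needed.

```java
public class ObserverContextSketch<E> {
  private final E env;
  private boolean bypass;    // starts false on every new instance
  private boolean complete;  // ditto; no explicit reset required

  public ObserverContextSketch(E env) { this.env = env; }

  public E getEnvironment() { return env; }
  public void bypass() { bypass = true; }
  public void complete() { complete = true; }
  public boolean shouldBypass() { return bypass; }
  public boolean shouldComplete() { return complete; }
}
```

Compared with a shared ThreadLocal, the flags' lifetime here matches exactly one pre/post chain, which is why the reset calls disappear in the patch.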


On 2011-04-13 01:08:50, Gary Helmling wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/588/
bq.  ---
bq.  
bq.  (Updated 2011-04-13 01:08:50)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Profiling the HRegionServer process with a RegionObserver coprocessor 
loaded shows a fair amount of runnable thread CPU time spent getting the bypass 
and complete flag ThreadLocal values by RegionCoprocessorHost.  See the 
HBASE-3759 JIRA for some attached graphs.
bq.  
bq.  With the caveat that this is runnable CPU time and not threads in all 
states, this still seems like a significant processing bottleneck on a hot call 
path.  The workload profiled was a put-based bulk load, so for each multi-put 
request, RegionCoprocessorHost.prePut() could be called many times.
bq.  
bq.  Instead of using ThreadLocal variable for bypass/complete, which will 
incur contention on the underlying map of values, I think we can eliminate the 
bottleneck by using locally scoped variables for each preXXX/putXXX method 
called in the RegionCoprocessorHost, MasterCoprocessorHost and 
WALCoprocessorHost classes.
bq.  
bq.  The attached patch refactors the current RegionObserver, MasterObserver 
and WALObserver APIs to provide a locally scoped ObserverContext object for 
storing and checking the bypass and complete values.
bq.  
bq.  Summary of changes:
bq.  
bq.  * adds a new ObserverContext<T extends CoprocessorEnvironment> class, 
containing references for bypass, complete and the environment instance
bq.  * in each pre/post method in RegionObserver, the 
RegionCoprocessorEnvironment parameter is replaced by 
ObserverContext<RegionCoprocessorEnvironment>
bq.  * in each pre/post method in MasterObserver, the 
MasterCoprocessorEnvironment parameter is replaced by 
ObserverContext<MasterCoprocessorEnvironment>
bq.  * in each pre/post method in WALObserver, the WALCoprocessorEnvironment 
parameter is replaced by ObserverContext<WALCoprocessorEnvironment>
bq.  
bq.  
bq.  This is obviously a large bulk change to the existing API.  I could avoid 
the API change with a hacky modification underneath the *CoprocessorEnvironment 
interfaces.  But since we do not yet have a public release with coprocessors, I 
would prefer to take the time to make the initial API the best we can before we 
push it out.
bq.  
bq.  Please let me know your thoughts on this approach.
bq.  
bq.  
bq.  This addresses bug HBASE-3759.
bq.  https://issues.apache.org/jira/browse/HBASE-3759
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java 9576c48 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/BaseRegionObserverCoprocessor.java 5a0f095 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorEnvironment.java d45b950 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java a82f62b 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java db0870b 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/ObserverContext.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/RegionObserver.java 3501958 
bq.    src/main/java/org/apache/hadoop/hbase/coprocessor/WALObserver.java 7a34d18 
bq.    src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 019bbde 
bq.

[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019684#comment-13019684
 ] 

Ted Yu commented on HBASE-3767:
---

The javadoc fragment doesn't mention allowCoreThreadTimeOut.

From 
http://fuseyism.com/classpath/doc/java/util/concurrent/ThreadPoolExecutor-source.html:
{code:java}
 494:     Runnable getTask() {
 495:         for (;;) {
 496:             try {
 497:                 switch (runState) {
 498:                 case RUNNING: {
 499:                     // untimed wait if core and not allowing core timeout
 500:                     if (poolSize <= corePoolSize && !allowCoreThreadTimeOut)
 501:                         return workQueue.take();
 502: 
 503:                     long timeout = keepAliveTime;
 504:                     if (timeout <= 0) // die immediately for 0 timeout
 505:                         return null;
 506:                     Runnable r = workQueue.poll(timeout, TimeUnit.NANOSECONDS);
 507:                     if (r != null)
 508:                         return r;
 509:                     if (poolSize > corePoolSize || allowCoreThreadTimeOut)
 510:                         return null; // timed out
 511:                     // Else, after timeout, the pool shrank. Retry
 512:                     break;
 513:                 }
{code}
In HTable(), allowCoreThreadTimeOut is set to true. So we're not bounded by 
corePoolSize threads.
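The effect described here can be demonstrated directly with the standard ThreadPoolExecutor API (the pool size below is an arbitrary example, not HTable's actual pool construction):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CoreTimeoutDemo {
  static ThreadPoolExecutor newPool(int coreSize) {
    ThreadPoolExecutor pool = new ThreadPoolExecutor(
        coreSize, coreSize,              // core == max
        60L, TimeUnit.SECONDS,           // idle threads die after 60s
        new LinkedBlockingQueue<Runnable>());
    // Without this, core threads never time out and the pool stays at
    // coreSize forever; with it, an idle pool can shrink to zero threads.
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}
```

So with allowCoreThreadTimeOut(true), corePoolSize is a ceiling on timed-waiting workers rather than a permanent floor, which is the point being made about HTable's pool.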

 Cache the number of RS in HTable
 

 Key: HBASE-3767
 URL: https://issues.apache.org/jira/browse/HBASE-3767
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
 Fix For: 0.90.3

 Attachments: HBASE-3767.patch


 When creating a new HTable we have to query ZK to learn the number of 
 region servers in the cluster. That is done for every single one of them; 
 instead, I think we should do it once per JVM and then reuse that number for 
 all the others.
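A per-JVM cache along the proposed lines might look like this. It is a sketch only: the names and the IntSupplier stand-in for the ZooKeeper query are illustrative, not the attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class ServerCountCache {
  private static final AtomicInteger CACHED = new AtomicInteger(-1);

  // First caller pays the ZK round trip; later HTable constructions in the
  // same JVM reuse the cached count.
  static int getServerCount(IntSupplier zkQuery) {
    int n = CACHED.get();
    if (n < 0) {
      CACHED.compareAndSet(-1, zkQuery.getAsInt());
      n = CACHED.get();
    }
    return n;
  }
}
```

A real version would also need invalidation when servers join or leave, which is why caching only an initial sizing hint (here, for the thread pool) is the safer use.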



[jira] [Commented] (HBASE-3767) Cache the number of RS in HTable

2011-04-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019685#comment-13019685
 ] 

Ted Yu commented on HBASE-3767:
---

Re-pasting source code due to garbled display above:
{code}
 497:                 switch (runState) {
 498:                 case RUNNING: {
 499:                     // untimed wait if core and not allowing core timeout
 500:                     if (poolSize <= corePoolSize && !allowCoreThreadTimeOut)
 501:                         return workQueue.take();
 502: 
 503:                     long timeout = keepAliveTime;
 504:                     if (timeout <= 0) // die immediately for 0 timeout
 505:                         return null;
 506:                     Runnable r = workQueue.poll(timeout, TimeUnit.NANOSECONDS);
 507:                     if (r != null)
 508:                         return r;
 509:                     if (poolSize > corePoolSize || allowCoreThreadTimeOut)
 510:                         return null; // timed out
 511:                     // Else, after timeout, the pool shrank. Retry
 512:                     break;
 513:                 }
{code}



[jira] [Resolved] (HBASE-3708) createAndFailSilent is not so silent; leaves lots of logging in ensemble logs

2011-04-13 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3708.
--

   Resolution: Fixed
Fix Version/s: 0.90.3
 Hadoop Flags: [Reviewed]

Committed to branch and trunk. Thanks for the patch, Dmitriy.

 createAndFailSilent is not so silent; leaves lots of logging in ensemble logs
 -

 Key: HBASE-3708
 URL: https://issues.apache.org/jira/browse/HBASE-3708
 Project: HBase
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 0.90.1
Reporter: stack
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.90.3


 Clients on startup create a ZKWatcher instance.  Part of construction is a 
 check that the hbase dirs are all up in zk.  It's done by making the following 
 call: 
 http://hbase.apache.org/xref/org/apache/hadoop/hbase/zookeeper/ZKUtil.html#898
 A user complains that it's making for lots of logging every second over on the 
 zk ensemble:
 14:59 seeing lots of these in the ZK log though, dozens per second of 
 Got user-level KeeperException when processing sessionid:0x42daa1daab0ecbe 
 type:create cxid:0x1 zxid:0xfffe txntype:unknown reqpath:n/a 
 Error Path:/hbase Error:KeeperErrorCode = NodeExists for /hbase
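One way to quiet those logs can be sketched with a Set standing in for the ensemble's znodes (illustrative only; the committed fix lives in ZKUtil and uses the real ZooKeeper client API): check existence first, so the failing create is no longer issued on every client start.

```java
import java.util.Set;

public class CreateAndFailSilentSketch {
  // Returns true when a create was actually attempted. After the first
  // client succeeds, later clients take the exists() fast path and the
  // ensemble never sees (or logs) a failing create from them.
  static boolean createAndFailSilent(Set<String> znodes, String znode) {
    if (znodes.contains(znode)) {   // analogous to zk.exists(znode, false)
      return false;
    }
    // Real code must still catch KeeperException.NodeExistsException here,
    // since another client can win the race between exists() and create().
    znodes.add(znode);
    return true;
  }
}
```

The NodeExists race remains possible, but it now happens at most once per znode instead of once per client startup, which is what removes the "dozens per second" log lines.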



[jira] [Commented] (HBASE-3210) HBASE-1921 for the new master

2011-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019693#comment-13019693
 ] 

stack commented on HBASE-3210:
--

Subbu: Your patch looks great (as does your reenabling of TZK).  I'm up for 
committing it -- I was going to run all tests first though, since it's a pretty 
significant change -- but your patch is 4x the size it needs to be since the 
bulk is formatting-only changes.  Would you mind resubmitting the patch absent 
the formatting changes?  Try also to keep lines < 80.  Good stuff Subbu.

 HBASE-1921 for the new master
 -

 Key: HBASE-3210
 URL: https://issues.apache.org/jira/browse/HBASE-3210
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.92.0

 Attachments: 
 HBASE-3210-When_the_Master_s_session_times_out_and_there_s_only_one,_cluster_is_wedged.patch


 HBASE-1921 was lost when writing the new master code. I guess it's going to 
 be much harder to implement now, but I think it's a critical feature to have 
 considering the reasons that brought me to do it in the old master. There's 
 already a test in TestZooKeeper which was disabled a while ago.



[jira] [Created] (HBASE-3779) Allow split regions to be placed on different region servers

2011-04-13 Thread Ted Yu (JIRA)
Allow split regions to be placed on different region servers


 Key: HBASE-3779
 URL: https://issues.apache.org/jira/browse/HBASE-3779
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2
Reporter: Ted Yu
Assignee: Ted Yu


Currently daughter regions are placed on the same region server where the 
parent region was.
Stanislav Barton mentioned the idea that load information should be considered 
when placing the daughter regions.
The rationale is that the daughter regions tend to receive more writes. So it 
would be beneficial to place at least one daughter region on a different region 
server.
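A placement heuristic along these lines might look like the following sketch (illustrative names, not the actual AssignmentManager code): keep one daughter on the parent's server for locality, and send the other to the least-loaded different server.

```java
import java.util.Map;

public class DaughterPlacementSketch {
  static String placeSecondDaughter(String parentServer,
                                    Map<String, Integer> regionCountByServer) {
    String best = parentServer;           // fall back to the parent's server
    int bestLoad = Integer.MAX_VALUE;
    for (Map.Entry<String, Integer> e : regionCountByServer.entrySet()) {
      if (!e.getKey().equals(parentServer) && e.getValue() < bestLoad) {
        best = e.getKey();
        bestLoad = e.getValue();
      }
    }
    return best;
  }
}
```

A region count is only a crude proxy for the write load Stanislav mentions; a fuller version would weight request rates, but the shape of the decision is the same.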
