[jira] [Commented] (HBASE-3833) ability to support includes/excludes list in Hbase

2011-05-23 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037791#comment-13037791
 ] 

dhruba borthakur commented on HBASE-3833:
-

There are actually two use-cases that triggered this JIRA.

1. There are times when the administrator wants to shut down a few region 
servers, usually to upgrade hardware or for some similar reason. In this case, 
the administrator can put these machines in the excludes list and wait for 
those region servers to shut down gracefully.

2. The other use-case is when certain region servers become unresponsive. Twice 
it has happened that a regionserver was still heartbeating with ZK, but its 
capacity to process hbase workload suddenly fell to almost zero. We could not 
ssh into the machine to debug what was wrong (the suspicion is that the machine 
started swapping). In this case, it would be nice if the administrator had an 
option to put the machine in the excludes list, wait a few minutes for it to 
exit gracefully, and if it still does not exit, forcefully declare the 
regionserver dead.

In short, maybe we need a force option for decommissioning which, if used, 
does not wait for a graceful shutdown of the specified regionserver but 
instead declares it dead immediately and then follows the normal course of 
action (lease recovery, region reassignment, etc.)
  

 ability to support includes/excludes list in Hbase
 --

 Key: HBASE-3833
 URL: https://issues.apache.org/jira/browse/HBASE-3833
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Affects Versions: 0.90.2
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: excl-patch.txt, excl-patch.txt


 An HBase cluster currently does not have the ability to specify that the 
 master should accept regionservers only from a specified list. This helps 
 prevent administrative errors where the same machine could be included in 
 two clusters. It also allows the administrator to easily remove un-ssh-able 
 machines from the cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances(far greater than the regions),it has risk of OOME.

2011-05-23 Thread jian zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037816#comment-13037816
 ] 

jian zhang commented on HBASE-3906:
---

1. Ted, this patch is only for branch.
2. Andrew, in my hmaster dump there are 1481 HServerInfo and HServerLoad 
objects and 24,423,058 RegionLoad objects; one RegionLoad occupies 136B 
(24,423,058 x 136B is roughly 3.3GB, consistent with the dump). I'm not a 
native speaker, so I'm not sure I understand your question correctly. Can I 
read "live objects" as the objects which can't be garbage collected by the 
jvm? If so, I think all these objects are live.
3. stack, I tested several scenarios; the patch works correctly, and I found 
no issues with the synchronized blocks.
Indeed, refreshing hserverinfo is not graceful enough, and I think balancing 
does not use the load of the HSI in the regions map. Following your 
suggestion, I cleared the load and applied the patch on my cluster to test; 
so far it works ok. I will test more scenarios and then provide a new patch 
for you to review again.
BTW, one hserverinfo object occupies about 350B of memory even with the load 
cleared. If we don't use my ugly refreshing solution, then in the worst case 
one region needs one hserverinfo object; if a big hbase cluster has 500,000 
regions, the hserverinfo objects will occupy about 175,000,000B of memory. Do 
you think this is acceptable?


 When HMaster is running,there are a lot of RegionLoad instances(far greater 
 than the regions),it has risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 hmaster,4 regionserver,about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

 Attachments: HBASE-3906.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 1. Start the hbase cluster;
 2. After hmaster finishes region assignment, use jmap to dump the memory of 
 hmaster;
 3. Use MAT to analyse the dump file; there are too many RegionLoad 
 instances, and these instances occupy more than 3G of memory;

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3908) TableSplit not implementing hashCode problem

2011-05-23 Thread Daniel Iancu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Iancu updated HBASE-3908:


 Tags: mapreduce
Fix Version/s: 0.90.4
   Status: Patch Available  (was: Open)

 TableSplit not implementing hashCode problem
 --

 Key: HBASE-3908
 URL: https://issues.apache.org/jira/browse/HBASE-3908
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.1
Reporter: Daniel Iancu
 Fix For: 0.90.4


 Reported by Lucian Iordache on the hbase-user mailing list. Will attach the 
 patch asap.
 ---
 Hi guys,
 I've just found a problem with the class TableSplit. It implements equals,
 but it does not implement hashCode also, as it should have.
 I've discovered it by trying to use a HashSet of TableSplit's, and I've
 noticed that some duplicate splits are added to the set.
 The only option I have for now is to extend TableSplit and to use the
 subclass.
 I use cloudera hbase cdh3u0 version.
 Do you know about this problem? Should I open a Jira issue for that, or it
 already exists?
 Thanks,
 Lucian
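
For readers hitting the same issue, a minimal sketch of the equals/hashCode 
contract being violated, using a simplified stand-in for TableSplit (field 
names here are assumed for illustration; the real fix is in the attached 
patch):

{code}
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified stand-in for TableSplit (fields assumed for illustration).
class SplitKey {
  final byte[] tableName, startRow, endRow;

  SplitKey(byte[] t, byte[] s, byte[] e) { tableName = t; startRow = s; endRow = e; }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SplitKey)) return false;
    SplitKey other = (SplitKey) o;
    return Arrays.equals(tableName, other.tableName)
        && Arrays.equals(startRow, other.startRow)
        && Arrays.equals(endRow, other.endRow);
  }

  // The missing piece: without this, equal splits land in different hash
  // buckets and a HashSet keeps duplicates, as Lucian observed.
  @Override
  public int hashCode() {
    int result = Arrays.hashCode(tableName);
    result = 31 * result + Arrays.hashCode(startRow);
    result = 31 * result + Arrays.hashCode(endRow);
    return result;
  }

  public static void main(String[] args) {
    Set<SplitKey> set = new HashSet<SplitKey>();
    set.add(new SplitKey("t".getBytes(), "a".getBytes(), "m".getBytes()));
    set.add(new SplitKey("t".getBytes(), "a".getBytes(), "m".getBytes()));
    System.out.println(set.size()); // 1 with hashCode(); 2 without it
  }
}
{code}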

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3908) TableSplit not implementing hashCode problem

2011-05-23 Thread Daniel Iancu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Iancu updated HBASE-3908:


Attachment: HBASE-3908-TableSplit-hashCode.patch

 TableSplit not implementing hashCode problem
 --

 Key: HBASE-3908
 URL: https://issues.apache.org/jira/browse/HBASE-3908
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.1
Reporter: Daniel Iancu
 Fix For: 0.90.4

 Attachments: HBASE-3908-TableSplit-hashCode.patch


 Reported by Lucian Iordache on the hbase-user mailing list. Will attach the 
 patch asap.
 ---
 Hi guys,
 I've just found a problem with the class TableSplit. It implements equals,
 but it does not implement hashCode also, as it should have.
 I've discovered it by trying to use a HashSet of TableSplit's, and I've
 noticed that some duplicate splits are added to the set.
 The only option I have for now is to extend TableSplit and to use the
 subclass.
 I use cloudera hbase cdh3u0 version.
 Do you know about this problem? Should I open a Jira issue for that, or it
 already exists?
 Thanks,
 Lucian

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3912) [Stargate] Columns not handle by Scan

2011-05-23 Thread Lars George (JIRA)
[Stargate] Columns not handle by Scan
-

 Key: HBASE-3912
 URL: https://issues.apache.org/jira/browse/HBASE-3912
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.90.3
Reporter: Lars George
Priority: Minor
 Fix For: 0.90.4, 0.92.0


There is an issue with ScannerModel only adding the column families to the scan 
model, not actual columns. Easy fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3912) [Stargate] Columns not handle by Scan

2011-05-23 Thread Lars George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars George updated HBASE-3912:
---

Attachment: HBASE-3912.patch

Patch adds an iteration over the actual family map, adding all column families 
and qualifiers.
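
For illustration, a hedged sketch of the iteration described above, assuming 
Scan.getFamilyMap()'s usual semantics (a null or empty qualifier set means the 
whole family was requested); this is not the committed code:

{code}
import java.util.Map;
import java.util.NavigableSet;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: walk the scan's family map and emit family:qualifier pairs,
// falling back to the bare family when no qualifiers were requested.
public class FamilyMapWalk {
  public static void printColumns(Scan scan) {
    for (Map.Entry<byte[], NavigableSet<byte[]>> e : scan.getFamilyMap().entrySet()) {
      byte[] family = e.getKey();
      NavigableSet<byte[]> quals = e.getValue();
      if (quals == null || quals.isEmpty()) {
        System.out.println(Bytes.toString(family));          // whole family
      } else {
        for (byte[] q : quals) {
          System.out.println(Bytes.toString(family) + ":" + Bytes.toString(q));
        }
      }
    }
  }
}
{code}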

 [Stargate] Columns not handle by Scan
 -

 Key: HBASE-3912
 URL: https://issues.apache.org/jira/browse/HBASE-3912
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.90.3
Reporter: Lars George
Priority: Minor
 Fix For: 0.90.4, 0.92.0

 Attachments: HBASE-3912.patch


 There is an issue with ScannerModel only adding the column families to the 
 scan model, not actual columns. Easy fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-3912) [Stargate] Columns not handle by Scan

2011-05-23 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-3912:
-

Assignee: Andrew Purtell

 [Stargate] Columns not handle by Scan
 -

 Key: HBASE-3912
 URL: https://issues.apache.org/jira/browse/HBASE-3912
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.90.4, 0.92.0

 Attachments: HBASE-3912.patch


 There is an issue with ScannerModel only adding the column families to the 
 scan model, not actual columns. Easy fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3912) [Stargate] Columns not handle by Scan

2011-05-23 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-3912.
---

Resolution: Fixed
  Assignee: Lars George  (was: Andrew Purtell)

Committed to trunk and 0.90. Passes tests (after finding and fixing an NPE).

 [Stargate] Columns not handle by Scan
 -

 Key: HBASE-3912
 URL: https://issues.apache.org/jira/browse/HBASE-3912
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Lars George
Priority: Minor
 Fix For: 0.90.4, 0.92.0

 Attachments: HBASE-3912.patch


 There is an issue with ScannerModel only adding the column families to the 
 scan model, not actual columns. Easy fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3908) TableSplit not implementing hashCode problem

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3908:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch.  Thanks for the patch Daniel

 TableSplit not implementing hashCode problem
 --

 Key: HBASE-3908
 URL: https://issues.apache.org/jira/browse/HBASE-3908
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.1
Reporter: Daniel Iancu
 Fix For: 0.90.4

 Attachments: HBASE-3908-TableSplit-hashCode.patch


 Reported by Lucian Iordache on the hbase-user mailing list. Will attach the 
 patch asap.
 ---
 Hi guys,
 I've just found a problem with the class TableSplit. It implements equals,
 but it does not implement hashCode also, as it should have.
 I've discovered it by trying to use a HashSet of TableSplit's, and I've
 noticed that some duplicate splits are added to the set.
 The only option I have for now is to extend TableSplit and to use the
 subclass.
 I use cloudera hbase cdh3u0 version.
 Do you know about this problem? Should I open a Jira issue for that, or it
 already exists?
 Thanks,
 Lucian

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3906) When HMaster is running,there are a lot of RegionLoad instances(far greater than the regions),it has risk of OOME.

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038061#comment-13038061
 ] 

stack commented on HBASE-3906:
--

Jian: Yes, Andrew is asking how many of the 24M objects are not collectable by 
the JVM.  Does your heap analysis tool have a means of cleaning out dead 
objects and showing only 'live objects'?

 When HMaster is running,there are a lot of RegionLoad instances(far greater 
 than the regions),it has risk of OOME.
 --

 Key: HBASE-3906
 URL: https://issues.apache.org/jira/browse/HBASE-3906
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.2, 0.90.3
 Environment: 1 hmaster,4 regionserver,about 100,000 regions.
Reporter: jian zhang
 Fix For: 0.90.4

 Attachments: HBASE-3906.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 1. Start the hbase cluster;
 2. After hmaster finishes region assignment, use jmap to dump the memory of 
 hmaster;
 3. Use MAT to analyse the dump file; there are too many RegionLoad 
 instances, and these instances occupy more than 3G of memory;

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3833) ability to support includes/excludes list in Hbase

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038067#comment-13038067
 ] 

stack commented on HBASE-3833:
--

bq. In short, maybe we need a force option for decommissioning which, if used, 
does not wait for a graceful shutdown of the specified regionserver but 
instead declares it dead immediately and then follows the normal course of 
action (lease recovery, region reassignment, etc.)

Agreed.  Seems like this patch does the 'forced' option.  Missing from this 
patch (though available as a script) is the graceful decommission.  Vishal, 
were you going for the 'forced' option only?  If so, should we commit this 
patch as is (though I think it dangerous if misunderstood and the user is not 
clear that excludes means 'forced' decommission)?

 ability to support includes/excludes list in Hbase
 --

 Key: HBASE-3833
 URL: https://issues.apache.org/jira/browse/HBASE-3833
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Affects Versions: 0.90.2
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: excl-patch.txt, excl-patch.txt


 An HBase cluster currently does not have the ability to specify that the 
 master should accept regionservers only from a specified list. This helps 
 prevent administrative errors where the same machine could be included in 
 two clusters. It also allows the administrator to easily remove un-ssh-able 
 machines from the cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3911) book.xml - schema design, added comment about supported datatypes

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3911:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Thanks for the patch Doug.  Applied to TRUNK

 book.xml - schema design, added comment about supported datatypes
 -

 Key: HBASE-3911
 URL: https://issues.apache.org/jira/browse/HBASE-3911
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3911.xml.patch


 The supported datatypes in HBase are whatever can be converted to a 
 byte-array.  There are practical limits on size obviously, but this question 
 comes up enough on the dist-list to warrant inclusion in the book.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3913) Expose ColumnPaginationFilter to the Thrift Server

2011-05-23 Thread Matthew Ward (JIRA)
Expose ColumnPaginationFilter to the Thrift Server
--

 Key: HBASE-3913
 URL: https://issues.apache.org/jira/browse/HBASE-3913
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Reporter: Matthew Ward
Priority: Minor


Expose the ColumnPaginationFilter to the thrift server by implementing the 
following methods:
public List<TRowResult> getRowWithColumnsPaginated(byte[] tableName, byte[] 
row, List<byte[]> columns, int limit, int offset);
public List<TRowResult> getRowWithColumnsTsPaginated(byte[] tableName, byte[] 
row, List<byte[]> columns, long timestamp, int limit, int offset)

Also look into adding a thrift method that exposes the number of columns in a 
particular row's family. 

Original improvement idea submitted on the dev list and approved by Stack.
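
For context, a hedged sketch of the server-side idea behind the proposed 
methods: a plain Get with a ColumnPaginationFilter returns at most 'limit' 
columns per row, skipping the first 'offset'. The Thrift handler would then 
convert the Result into TRowResult (conversion omitted; the class and method 
names below are illustrative, not the attached patch):

{code}
import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;

// Illustrative sketch only: pagination happens server-side in the filter,
// so at most 'limit' columns come back per row.
public class PaginatedGet {
  public static Result getPaginated(HTable table, byte[] row,
                                    int limit, int offset) throws IOException {
    Get get = new Get(row);
    get.setFilter(new ColumnPaginationFilter(limit, offset));
    return table.get(get);
  }
}
{code}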

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038173#comment-13038173
 ] 

stack commented on HBASE-3904:
--

Adding the getter and setter getNumberOfInitialRegions to HTD is a little odd.  
You are using HTD to carry a message used by create table when you pass lots 
of regions.  Seems like an odd thing to expose as new public methods.  If you 
need to pass this message, perhaps use the generic get/set attributes.



 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904-v2.txt, 3904.txt


 This function, as per the javadoc, is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created, 
 this function may return inconsistent results (for example, when a table 
 with a large number of split keys is created). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-23 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-3904:
--

Attachment: 3904-v3.txt

I removed the getNumberOfInitialRegions and switched to the generic 
getValue/setValue methods.
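
For illustration, a sketch of the generic-attribute approach (the attribute 
key below is assumed for illustration; the real key is in 3904-v3.txt):

{code}
import org.apache.hadoop.hbase.HTableDescriptor;

// Sketch of stashing the initial region count in HTD's generic
// attributes instead of adding new public getters/setters.
public class InitialRegions {
  static final String KEY = "NUMBER_OF_INITIAL_REGIONS"; // assumed key name

  static void stash(HTableDescriptor htd, int numRegions) {
    htd.setValue(KEY, String.valueOf(numRegions));
  }

  static int read(HTableDescriptor htd) {
    String v = htd.getValue(KEY);
    return v == null ? -1 : Integer.parseInt(v); // -1: attribute absent
  }
}
{code}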

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904-v2.txt, 3904-v3.txt, 3904.txt


 This function, as per the javadoc, is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created, 
 this function may return inconsistent results (for example, when a table 
 with a large number of split keys is created). 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1316) ZooKeeper: use native threads to avoid GC stalls (JNI integration)

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038208#comment-13038208
 ] 

stack commented on HBASE-1316:
--

bq. We'll need to upload these artifacts somewhere, along with other supported 
OSes/architectures in the future.

This should be fine.  We've been doing this for various libs up to this point.  
Can add this np.

Do you think this patch will be generally useful, Joey?  If so, maybe once 
it's up and working in hbase, it can be contrib'd back to zk?

bq. I haven't modified the packaging part of the build. I'm not sure how we'll 
want the build to generate versions of the native library for multiple 
platforms.

Tell me more about this?  Are you thinking we need to build the native libs 
in-line with a build each time?

Do you think this feature can be optionally enabled?  If we fail to load the 
required native lib, do we default to old-school session handling?  Or is it 
always on, with the new style used only if we find the native libs?

How does this timeout relate to the zk session timeout?

{code}
+  public static int DEFAULT_HBASE_ZOOKEEPER_NATIVE_SESSION_TIMEOUT = 5000;
{code}

That's cool that you have unit tests in place for your new methods already.

Patch so far looks great to me.



 ZooKeeper: use native threads to avoid GC stalls (JNI integration)
 --

 Key: HBASE-1316
 URL: https://issues.apache.org/jira/browse/HBASE-1316
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.20.0
Reporter: Andrew Purtell
Assignee: Berk D. Demir
 Attachments: HBASE-1316-1.patch, zk_wrapper.tar.gz, 
 zookeeper-native-Linux-amd64-64.tgz, zookeeper-native-headers.tgz


 From Joey Echeverria up on hbase-users@:
 We've used zookeeper in a write-heavy project we've been working on and 
 experienced issues similar to what you described. After several days of 
 debugging, we discovered that our issue was garbage collection. There was no 
 way to guarantee we wouldn't have long pauses especially since our 
 environment was the worst case for garbage collection, millions of tiny, 
 short-lived objects. I suspect HBase sees similar workloads frequently, if 
 not constantly. With anything shorter than a 30 second session timeout, 
 we got session expiration events extremely frequently. We needed to use 60 
 seconds for any real confidence that an ephemeral node disappearing meant 
 something was unavailable.
 We really wanted quick recovery so we ended up writing a light-weight wrapper 
 around the C API and used swig to auto-generate a JNI interface. It's not 
 perfect, but since we switched to this method we've never seen a session 
 expiration event and ephemeral nodes only disappear when there are network 
 issues or a machine/process goes down.
 I don't know if it's worth doing the same kind of thing for HBase as it adds 
 some unnecessary native code, but it's a solution that I found works.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038210#comment-13038210
 ] 

stack commented on HBASE-2937:
--

Are you going to upload a new patch, Karthick, or is this ready to go?

 Facilitate Timeouts In HBase Client
 ---

 Key: HBASE-2937
 URL: https://issues.apache.org/jira/browse/HBASE-2937
 Project: HBase
  Issue Type: New Feature
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-2937.patch, HBASE-2937.patch


 Currently, there is no way to force an operation on the HBase client (viz. 
 HTable) to time out if a certain amount of time has elapsed.  In other words, 
 all invocations on the HTable class are veritable blocking calls, which will 
 not return until a response (successful or otherwise) is received. 
 In general, there are two ways to handle timeouts:  (a) call the operation in 
 a separate thread, until it returns a response or the wait on the thread 
 times out and (b) have the underlying socket unblock the operation if the 
 read times out.  The downside of the former approach is that it consumes more 
 resources in terms of threads and callables. 
 Here, we describe a way to specify and handle timeouts on the HTable client, 
 which relies on the latter approach (i.e., socket timeouts). Right now, the 
 HBaseClient sets the socket timeout to the value of the ipc.ping.interval 
 parameter, which is also how long it waits before pinging the server in case 
 of a failure. The goal is to allow clients to set that timeout on the fly 
 through HTable. Rather than adding an optional timeout argument to every 
 HTable operation, we chose to make it a property of HTable which effectively 
 applies to every method that involves a remote operation.
 In order to propagate the timeout  from HTable to HBaseClient, we replaced 
 all occurrences of ServerCallable in HTable with an extension called 
 ClientCallable, which sets the timeout on the region server interface, once 
 it has been instantiated, through the HConnection object. The latter, in 
 turn, asks HBaseRPC to pass that timeout to the corresponding Invoker, so 
 that it may inject the timeout at the time the invocation is made on the 
 region server proxy. Right before the request is sent to the server, we set 
 the timeout specified by the client on the underlying socket.
 In conclusion, this patch will afford clients the option of performing an 
 HBase operation until it completes or a specified timeout elapses. Note that 
 a timeout of zero is interpreted as an infinite timeout.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3903) A successful write to client write-buffer may be lost or not visible

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3903:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to TRUNK.  Thanks for the patch Doug and thanks for the reviews 
Tallat.

 A successful write to client write-buffer may be lost or not visible
 

 Key: HBASE-3903
 URL: https://issues.apache.org/jira/browse/HBASE-3903
 Project: HBase
  Issue Type: Bug
  Components: documentation
 Environment: Any.
Reporter: Tallat
Assignee: Doug Meil
Priority: Minor
  Labels: documentation
 Attachments: acid-semantics_HBASE_3903.xml.patch, 
 book_HBASE_3903.xml.patch


 A client can do a write to a client side 'write buffer' if enabled via 
 hTable.setAutoFlush(false). Now, assume a client puts value v under key k. 
 Two wrong things can happen, violating the ACID semantics of Hbase given 
 at: http://hbase.apache.org/acid-semantics.html
 1) Say the client fails immediately after the put succeeds. In this case, the 
 put will be lost, violating the durability property:
 <quote>Any operation that returns a success code (eg does not throw an 
 exception) will be made durable.</quote>
  
 2) Say the client issues a read for k immediately after writing k. The put 
 will be stored in the client side write buffer, while the read will go to the 
 region server, returning an older value, instead of v, violating the 
 visibility property:
 <quote>
 When a client receives a success response for any mutation, that mutation
 is immediately visible to both that client and any client with whom it later
 communicates through side channels.
 </quote>
 Thanks,
 Tallat
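
A short sketch reproducing the two hazards described above, using the stock 
0.90 client API (the table, family, and values are illustrative):

{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteBufferPitfall {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "t");
    table.setAutoFlush(false);            // writes go to the client buffer

    Put put = new Put(Bytes.toBytes("k"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
    table.put(put);                       // "succeeds", but only locally

    // Hazard 2: this read goes to the region server and may miss 'v'.
    table.get(new Get(Bytes.toBytes("k")));

    // Hazard 1: a client crash before this line loses the buffered put.
    table.flushCommits();                 // now the write is really durable
  }
}
{code}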

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3907) make it easier to add per-CF metrics; add some key per-CF metrics to start with

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3907:
-

Component/s: metrics

 make it easier to add per-CF metrics; add some key per-CF metrics to start 
 with
 ---

 Key: HBASE-3907
 URL: https://issues.apache.org/jira/browse/HBASE-3907
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Kannan Muthukkaruppan
Assignee: Kannan Muthukkaruppan

 Add plumbing needed to add various types of per-ColumnFamily metrics, and to 
 start with add a bunch of per-CF metrics such as:
 1) Blocks read, cache hit, avg time of read for a column family.
 2) Similar stats for compaction related reads.
 3) Stats for meta block reads per CF
 4) Bloom Filter stats per CF
 etc.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3899) enhance HBase RPC to support free-ing up server handler threads even if response is not ready

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038223#comment-13038223
 ] 

stack commented on HBASE-3899:
--

This patch looks fine to me.  Not too invasive.  How does it work? Something 
has to implement WritableDelayed?  Thanks D.

 enhance HBase RPC to support free-ing up server handler threads even if 
 response is not ready
 -

 Key: HBASE-3899
 URL: https://issues.apache.org/jira/browse/HBASE-3899
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: asyncRpc.txt, asyncRpc.txt


 In the current implementation, the server handler thread picks up an item 
 from the incoming callqueue, processes it and then wraps the response as a 
 Writable and sends it back to the IPC server module. This wastes 
 thread-resources when the thread is blocked for disk IO (transaction logging, 
 read into block cache, etc).
 It would be nice if we could make the RPC Server Handler threads pick up a call 
 from the IPC queue, hand it over to the application (e.g. HRegion), the 
 application can queue it to be processed asynchronously and send a response 
 back to the IPC server module saying that the response is not ready. The RPC 
 Server Handler thread is now ready to pick up another request from the 
 incoming callqueue. When the queued call is processed by the application, it 
 indicates to the IPC module that the response is now ready to be sent back to 
 the client.
 The RPC client continues to experience the same behaviour as before. A RPC 
 client is synchronous and blocks till the response arrives.
 This RPC enhancement allows us to do very powerful things with the 
 RegionServer. In future, we can enhance the RegionServer's threading 
 model into a message-passing model for better performance. We will not be 
 limited by the number of threads in the RegionServer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (HBASE-1960) Master should wait for DFS to come up when creating hbase.version

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-1960:
--


Reopening until we look into Naresh's report.

 Master should wait for DFS to come up when creating hbase.version
 -

 Key: HBASE-1960
 URL: https://issues.apache.org/jira/browse/HBASE-1960
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.90.2, 0.92.0

 Attachments: HBASE-1960-redux.patch, HBASE-1960.patch


 The master does not wait for DFS to come up in the circumstance where the DFS 
 master is started for the first time after format and no datanodes have been 
 started yet. 
 {noformat}
 2009-11-07 11:47:28,115 INFO org.apache.hadoop.hbase.master.HMaster: 
 vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems Inc., 
 vmVersion=14.2-b01
 2009-11-07 11:47:28,116 INFO org.apache.hadoop.hbase.master.HMaster: 
 vmInputArguments=[-Xmx1000m, -XX:+HeapDumpOnOutOfMemoryError, 
 -XX:+UseConcMarkSweepGC, -XX:+CMSIncrementalMode, 
 -Dhbase.log.dir=/mnt/hbase/logs, 
 -Dhbase.log.file=hbase-root-master-ip-10-242-15-159.log, 
 -Dhbase.home.dir=/usr/local/hbase-0.20.1/bin/.., -Dhbase.id.str=root, 
 -Dhbase.root.logger=INFO,DRFA, 
 -Djava.library.path=/usr/local/hbase-0.20.1/bin/../lib/native/Linux-amd64-64]
 2009-11-07 11:47:28,247 INFO org.apache.hadoop.hbase.master.HMaster: My 
 address is ip-10-242-15-159.ec2.internal:6
 2009-11-07 11:47:28,728 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
 Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
 /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 [...]
 2009-11-07 11:47:28,728 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
 for block null bad datanode[0] nodes == null
 2009-11-07 11:47:28,728 WARN org.apache.hadoop.hdfs.DFSClient: Could not get 
 block locations. Source file /hbase/hbase.version - Aborting...
 2009-11-07 11:47:28,729 FATAL org.apache.hadoop.hbase.master.HMaster: Not 
 starting HMaster because:
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
 /hbase/hbase.version could only be replicated to 0 nodes, instead of 1
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 {noformat}
 Should probably sleep and retry the write a few times.
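
A hedged sketch of the suggested sleep-and-retry (the retry count and backoff 
here are made-up values, not the committed fix):

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VersionFileWriter {
  // Retry the hbase.version create until DFS has datanodes to accept it.
  static FSDataOutputStream createWithRetries(FileSystem fs, Path versionFile)
      throws IOException, InterruptedException {
    IOException last = null;
    for (int attempt = 0; attempt < 10; attempt++) {
      try {
        return fs.create(versionFile); // fails while DFS has 0 datanodes
      } catch (IOException e) {
        last = e;                      // e.g. "replicated to 0 nodes"
        Thread.sleep(3000);            // wait for datanodes to report in
      }
    }
    throw last;
  }
}
{code}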

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038229#comment-13038229
 ] 

stack commented on HBASE-3789:
--

You should remove rather than comment out code.

There is more to be done on this still, right J-D?

 Cleanup the locking contention in the master
 

 Key: HBASE-3789
 URL: https://issues.apache.org/jira/browse/HBASE-3789
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.92.0

 Attachments: HBASE-3789.patch


 The new master uses a lot of synchronized blocks to be safe, but it only 
 takes a few jstacks to see that there are multiple layers of lock contention 
 when a bunch of regions are moving (like when the balancer runs). The main 
 culprits are regionInTransition in AssignmentManager, ZKAssign that uses 
 ZKW.getZNnodes (basically another set of region in transitions), and locking 
 at the RegionState level. 
 My understanding is that even though we have multiple threads to handle regions 
 in transition, everything is actually serialized. Most of the time, lock 
 holders are talking to ZK or a region server, which can take a few 
 milliseconds.
 A simple example is when AssignmentManager wants to update the timers for all 
 the regions on a RS, it will usually be waiting on another thread that's 
 holding the lock while talking to ZK.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3789) Cleanup the locking contention in the master

2011-05-23 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038231#comment-13038231
 ] 

Jean-Daniel Cryans commented on HBASE-3789:
---

Yes, like I wrote in the first comment, it's a dirty WIP.

 Cleanup the locking contention in the master
 

 Key: HBASE-3789
 URL: https://issues.apache.org/jira/browse/HBASE-3789
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.2
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.92.0

 Attachments: HBASE-3789.patch


 The new master uses a lot of synchronized blocks to be safe, but it only 
 takes a few jstacks to see that there are multiple layers of lock contention 
 when a bunch of regions are moving (like when the balancer runs). The main 
 culprits are regionInTransition in AssignmentManager, ZKAssign that uses 
 ZKW.getZNnodes (basically another set of region in transitions), and locking 
 at the RegionState level. 
 My understanding is that even though we have multiple threads to handle regions 
 in transition, everything is actually serialized. Most of the time, lock 
 holders are talking to ZK or a region server, which can take a few 
 milliseconds.
 A simple example is when AssignmentManager wants to update the timers for all 
 the regions on a RS, it will usually be waiting on another thread that's 
 holding the lock while talking to ZK.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-2937) Facilitate Timeouts In HBase Client

2011-05-23 Thread Karthick Sankarachary (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038241#comment-13038241
 ] 

Karthick Sankarachary commented on HBASE-2937:
--

I believe the current patch in https://reviews.apache.org/r/755 is ready to go. 

Note that the {{ServerCallable}} will retry the call in the case of 
{{ConnectException}} and {{IOException}} if there appears to be time for 
one more retry. However, in the case of a {{SocketTimeoutException}}, it will 
not retry the call, because that would entail pausing for an inordinately long 
period of time. 

A final caveat - under heavy load, the {{HTable}} operation may take slightly 
longer than hbase.client.operation.timeout to complete, since it could take a 
while for the response thread in {{HBaseClient}} to hand over control to the 
request thread.
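
A hypothetical usage sketch under the patch's model (the API here is assumed 
from the description and the hbase.client.operation.timeout key above, not 
committed code):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class TimeoutExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed key from the discussion: the timeout is carried down to the
    // underlying socket, so a hung region server surfaces as a
    // SocketTimeoutException instead of an indefinite block.
    conf.setLong("hbase.client.operation.timeout", 5000);
    HTable table = new HTable(conf, "mytable");
    table.get(new Get(Bytes.toBytes("row1")));
  }
}
{code}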

 Facilitate Timeouts In HBase Client
 ---

 Key: HBASE-2937
 URL: https://issues.apache.org/jira/browse/HBASE-2937
 Project: HBase
  Issue Type: New Feature
  Components: client
Affects Versions: 0.89.20100621
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-2937.patch, HBASE-2937.patch


 Currently, there is no way to force an operation on the HBase client (viz. 
 HTable) to time out if a certain amount of time has elapsed.  In other words, 
 all invocations on the HTable class are veritable blocking calls, which will 
 not return until a response (successful or otherwise) is received. 
 In general, there are two ways to handle timeouts:  (a) call the operation in 
 a separate thread, until it returns a response or the wait on the thread 
 times out and (b) have the underlying socket unblock the operation if the 
 read times out.  The downside of the former approach is that it consumes more 
 resources in terms of threads and callables. 
 Here, we describe a way to specify and handle timeouts on the HTable client, 
 which relies on the latter approach (i.e., socket timeouts). Right now, the 
 HBaseClient sets the socket timeout to the value of the ipc.ping.interval 
 parameter, which is also how long it waits before pinging the server in case 
 of a failure. The goal is to allow clients to set that timeout on the fly 
 through HTable. Rather than adding an optional timeout argument to every 
 HTable operation, we chose to make it a property of HTable which effectively 
 applies to every method that involves a remote operation.
 In order to propagate the timeout  from HTable to HBaseClient, we replaced 
 all occurrences of ServerCallable in HTable with an extension called 
 ClientCallable, which sets the timeout on the region server interface, once 
 it has been instantiated, through the HConnection object. The latter, in 
 turn, asks HBaseRPC to pass that timeout to the corresponding Invoker, so 
 that it may inject the timeout at the time the invocation is made on the 
 region server proxy. Right before the request is sent to the server, we set 
 the timeout specified by the client on the underlying socket.
 In conclusion, this patch will afford clients the option of performing an 
 HBase operation until it completes or a specified timeout elapses. Note that 
 a timeout of zero is interpreted as an infinite timeout.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2011-05-23 Thread John Heitmann (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038252#comment-13038252
 ] 

John Heitmann commented on HBASE-3691:
--

In the new instructions this:

COMPRESSION = 'snappy'

should be this:

COMPRESSION = 'SNAPPY'


 Add compressor support for 'snappy', google's compressor
 

 Key: HBASE-3691
 URL: https://issues.apache.org/jira/browse/HBASE-3691
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: hbase-snappy-3691-trunk-002.patch, 
 hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, 
 hbase-snappy-3691-trunk.patch


 http://code.google.com/p/snappy/ is apache licensed.
 bq. Snappy is a compression/decompression library. It does not aim for 
 maximum compression, or compatibility with any other compression library; 
 instead, it aims for very high speeds and reasonable compression. For 
 instance, compared to the fastest mode of zlib, Snappy is an order of 
 magnitude faster for most inputs, but the resulting compressed files are 
 anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 
 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses 
 at about 500 MB/sec or more.
 bq. Snappy is widely used inside Google, in everything from BigTable and 
 MapReduce to our internal RPC systems. (Snappy has previously been referred 
 to as Zippy in some presentations and the likes.)
 Lets get it in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3691) Add compressor support for 'snappy', google's compressor

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038260#comment-13038260
 ] 

stack commented on HBASE-3691:
--

Thanks John.  I fixed it in the book.  (We should also fix case sensitivity 
for compressor names.)
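
For reference, the Java-API equivalent of the corrected shell setting (a 
hedged sketch assuming the SNAPPY algorithm value this issue's patches 
introduce):

{code}
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class SnappyTable {
  static HTableDescriptor descriptor() {
    HTableDescriptor htd = new HTableDescriptor("t1");
    HColumnDescriptor cf = new HColumnDescriptor("cf");
    cf.setCompressionType(Compression.Algorithm.SNAPPY); // per-CF compression
    htd.addFamily(cf);
    return htd;
  }
}
{code}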

 Add compressor support for 'snappy', google's compressor
 

 Key: HBASE-3691
 URL: https://issues.apache.org/jira/browse/HBASE-3691
 Project: HBase
  Issue Type: Task
Reporter: stack
Priority: Critical
 Fix For: 0.92.0

 Attachments: hbase-snappy-3691-trunk-002.patch, 
 hbase-snappy-3691-trunk-003.patch, hbase-snappy-3691-trunk-004.patch, 
 hbase-snappy-3691-trunk.patch


 http://code.google.com/p/snappy/ is apache licensed.
 bq. Snappy is a compression/decompression library. It does not aim for 
 maximum compression, or compatibility with any other compression library; 
 instead, it aims for very high speeds and reasonable compression. For 
 instance, compared to the fastest mode of zlib, Snappy is an order of 
 magnitude faster for most inputs, but the resulting compressed files are 
 anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 
 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses 
 at about 500 MB/sec or more.
 bq. Snappy is widely used inside Google, in everything from BigTable and 
 MapReduce to our internal RPC systems. (Snappy has previously been referred 
 to as Zippy in some presentations and the likes.)
 Lets get it in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3911) book.xml - schema design, added comment about supported datatypes

2011-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038304#comment-13038304
 ] 

Hudson commented on HBASE-3911:
---

Integrated in HBase-TRUNK #1933 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1933/])


 book.xml - schema design, added comment about supported datatypes
 -

 Key: HBASE-3911
 URL: https://issues.apache.org/jira/browse/HBASE-3911
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_3911.xml.patch


 The supported datatypes in HBase are whatever can be converted to a 
 byte-array.  There are practical limits on size obviously, but this question 
 comes up enough on the dist-list to warrant inclusion in the book.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3908) TableSplit not implementing hashCode problem

2011-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038305#comment-13038305
 ] 

Hudson commented on HBASE-3908:
---

Integrated in HBase-TRUNK #1933 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1933/])


 TableSplit not implementing hashCode problem
 --

 Key: HBASE-3908
 URL: https://issues.apache.org/jira/browse/HBASE-3908
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce
Affects Versions: 0.90.1
Reporter: Daniel Iancu
 Fix For: 0.90.4

 Attachments: HBASE-3908-TableSplit-hashCode.patch


 Reported by Lucian Iordache on the hbase-user mailing list. Will attach the 
 patch asap.
 ---
 Hi guys,
 I've just found a problem with the class TableSplit. It implements equals,
 but it does not implement hashCode also, as it should have.
 I've discovered it by trying to use a HashSet of TableSplit's, and I've
 noticed that some duplicate splits are added to the set.
 The only option I have for now is to extend TableSplit and to use the
 subclass.
 I use cloudera hbase cdh3u0 version.
 Do you know about this problem? Should I open a Jira issue for that, or it
 already exists?
 Thanks,
 Lucian

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3912) [Stargate] Columns not handle by Scan

2011-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038303#comment-13038303
 ] 

Hudson commented on HBASE-3912:
---

Integrated in HBase-TRUNK #1933 (See 
[https://builds.apache.org/hudson/job/HBase-TRUNK/1933/])


 [Stargate] Columns not handle by Scan
 -

 Key: HBASE-3912
 URL: https://issues.apache.org/jira/browse/HBASE-3912
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.90.3
Reporter: Lars George
Assignee: Lars George
Priority: Minor
 Fix For: 0.90.4, 0.92.0

 Attachments: HBASE-3912.patch


 There is an issue with ScannerModel only adding the column families to the 
 scan model, not actual columns. Easy fix.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3425) HMaster sends duplicate ports to regionserver in HServerAddress

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3425.
--

Resolution: Won't Fix

HServerAddress has been deprecated in TRUNK.  We shouldn't have this problem 
going forward.  Closing as 'won't fix'.

 HMaster sends duplicate ports to regionserver in HServerAddress
 ---

 Key: HBASE-3425
 URL: https://issues.apache.org/jira/browse/HBASE-3425
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.0
Reporter: Matt Corgan
 Fix For: 0.92.0

 Attachments: HBASE-3425[0.90.0].patch


 On regionserver startup, the regionserver receives an HServerAddress from the 
 master as a Writable.  It's a string hostname and an integer port.  Our 
 master is also appending the port to the string, so when they are 
 concatenated it becomes hadoopnode98:60020:60020 and the HServerAddress 
 cannot be instantiated.  
 This should probably be fixed in the master as well, but I don't know where 
 it happens.  The attached patch handles it in the regionserver.
 Regionserver startup log:
 2011-01-06 15:55:48,813 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Connected to master at 
 hadoopmaster.hotpads.srv:6
 2011-01-06 15:55:48,857 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 
 hadoopmaster.hotpads.srv:6 that we are up
 2011-01-06 15:55:48,910 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: 
 hbase.regionserver.address=HadoopNode98.hotpads.srv:60020
 2011-01-06 15:55:48,910 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: 
 fs.default.name=hdfs://hadoopmaster.hotpads.srv:54310/hbase
 2011-01-06 15:55:48,910 DEBUG 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Config from master: 
 hbase.rootdir=hdfs://hadoopmaster.hotpads.srv:54310/hbase
 2011-01-06 15:55:48,945 ERROR org.apache.hadoop.hbase.HServerAddress: Could 
 not resolve the DNS name of HadoopNode98.hotpads.srv:60020:60020
 2011-01-06 15:55:48,945 INFO 
 org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Failed 
 initialization
 2011-01-06 15:55:48,947 ERROR 
 org.apache.hadoop.hbase.regionserver.HRegionServer: Failed init
 java.lang.IllegalArgumentException: Could not resolve the DNS name of 
 HadoopNode98.hotpads.srv:60020:60020
 at 
 org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
 at 
 org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:76)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:798)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.tryReportForDuty(HRegionServer.java:1394)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:522)
 at java.lang.Thread.run(Thread.java:619)
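
A hedged sketch of the kind of defensive parsing the description implies 
(illustration only, not the attached patch): strip a duplicated ":port" 
suffix before handing the address to HServerAddress.

{code}
public class AddressFix {
  static String dedupePort(String hostAndPort) {
    int first = hostAndPort.indexOf(':');
    int last = hostAndPort.lastIndexOf(':');
    if (first != last) {
      String p1 = hostAndPort.substring(first + 1, last);
      String p2 = hostAndPort.substring(last + 1);
      if (p1.equals(p2)) {                       // "host:60020:60020"
        return hostAndPort.substring(0, last);   // -> "host:60020"
      }
    }
    return hostAndPort;
  }

  public static void main(String[] args) {
    System.out.println(dedupePort("HadoopNode98.hotpads.srv:60020:60020"));
  }
}
{code}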

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3913) Expose ColumnPaginationFilter to the Thrift Server

2011-05-23 Thread Matthew Ward (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Ward updated HBASE-3913:


Status: Patch Available  (was: Open)

 Expose ColumnPaginationFilter to the Thrift Server
 --

 Key: HBASE-3913
 URL: https://issues.apache.org/jira/browse/HBASE-3913
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Reporter: Matthew Ward
Priority: Minor
  Labels: filter, thrift
 Attachments: YF-3913.patch


 Expose the ColumnPaginationFilter to the thrift server by implementing the 
 following methods:
 public List<TRowResult> getRowWithColumnsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, int limit, int offset);
 public List<TRowResult> getRowWithColumnsTsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, long timestamp, int limit, int offset)
 Also look into adding a thrift method that exposes the number of columns in 
 a particular row's family. 
 Original improvement idea submitted on the dev list and approved by Stack.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3913) Expose ColumnPaginationFilter to the Thrift Server

2011-05-23 Thread Matthew Ward (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Ward updated HBASE-3913:


Attachment: YF-3913.patch

Attaching patch.

 Expose ColumnPaginationFilter to the Thrift Server
 --

 Key: HBASE-3913
 URL: https://issues.apache.org/jira/browse/HBASE-3913
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Reporter: Matthew Ward
Priority: Minor
  Labels: filter, thrift
 Attachments: YF-3913.patch


 Expose the ColumnPaginationFilter to the thrift server by implementing the 
 following methods:
 public List<TRowResult> getRowWithColumnsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, int limit, int offset);
 public List<TRowResult> getRowWithColumnsTsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, long timestamp, int limit, int offset)
 Also look into adding a thrift method that exposes the number of columns in 
 a particular row's family. 
 Original improvement idea submitted on the dev list and approved by Stack.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3913) Expose ColumnPaginationFilter to the Thrift Server

2011-05-23 Thread Matthew Ward (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Ward updated HBASE-3913:


Status: Patch Available  (was: Open)

 Expose ColumnPaginationFilter to the Thrift Server
 --

 Key: HBASE-3913
 URL: https://issues.apache.org/jira/browse/HBASE-3913
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Reporter: Matthew Ward
Priority: Minor
  Labels: filter, thrift
 Attachments: YF-3913.patch


 Expose the ColumnPaginationFilter to the thrift server by implementing the 
 following methods:
 public List<TRowResult> getRowWithColumnsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, int limit, int offset);
 public List<TRowResult> getRowWithColumnsTsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, long timestamp, int limit, int offset)
 Also look into adding a thrift method that exposes the number of columns in 
 a particular row's family. 
 Original improvement idea submitted on the dev list and approved by Stack.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3913) Expose ColumnPaginationFilter to the Thrift Server

2011-05-23 Thread Matthew Ward (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Ward updated HBASE-3913:


Status: Open  (was: Patch Available)

 Expose ColumnPaginationFilter to the Thrift Server
 --

 Key: HBASE-3913
 URL: https://issues.apache.org/jira/browse/HBASE-3913
 Project: HBase
  Issue Type: New Feature
  Components: thrift
Reporter: Matthew Ward
Priority: Minor
  Labels: filter, thrift
 Attachments: YF-3913.patch


 Expose the ColumnPaginationFilter to the thrift server by implementing the 
 following methods:
 public List<TRowResult> getRowWithColumnsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, int limit, int offset);
 public List<TRowResult> getRowWithColumnsTsPaginated(byte[] tableName, byte[] 
 row, List<byte[]> columns, long timestamp, int limit, int offset)
 Also look into adding a thrift method that exposes the number of columns in 
 a particular row's family. 
 Original improvement idea submitted on the dev list and approved by Stack.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3899) enhance HBase RPC to support free-ing up server handler threads even if response is not ready

2011-05-23 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038340#comment-13038340
 ] 

dhruba borthakur commented on HBASE-3899:
-

Thanks for the comments, stack. The HBase code that uses it can return 
WritableDelayed. I have this code, but thought it better to get the ipc code 
in there first, because it is a standalone piece of code without any 
dependencies, so the rest of the patch is not bloated. Also, this patch does 
not change any existing APIs. 

 enhance HBase RPC to support free-ing up server handler threads even if 
 response is not ready
 -

 Key: HBASE-3899
 URL: https://issues.apache.org/jira/browse/HBASE-3899
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: asyncRpc.txt, asyncRpc.txt


 In the current implementation, the server handler thread picks up an item 
 from the incoming callqueue, processes it and then wraps the response as a 
 Writable and sends it back to the IPC server module. This wastes 
 thread-resources when the thread is blocked for disk IO (transaction logging, 
 read into block cache, etc).
 It would be nice if we could make the RPC Server Handler threads pick up a call 
 from the IPC queue, hand it over to the application (e.g. HRegion), the 
 application can queue it to be processed asynchronously and send a response 
 back to the IPC server module saying that the response is not ready. The RPC 
 Server Handler thread is now ready to pick up another request from the 
 incoming callqueue. When the queued call is processed by the application, it 
 indicates to the IPC module that the response is now ready to be sent back to 
 the client.
 The RPC client continues to experience the same behaviour as before. An RPC 
 client is synchronous and blocks till the response arrives.
 This RPC enhancement allows us to do very powerful things with the 
 RegionServer. In future, we can enhance the RegionServer's threading 
 model to a message-passing model for better performance. We will not be 
 limited by the number of threads in the RegionServer.
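
For illustration, a minimal, self-contained sketch of the delayed-response 
idea in plain Java, with hypothetical names; it is not the asyncRpc.txt patch. 
The handler thread hands the call to the application and immediately returns 
to the queue, while the client still blocks until the response is completed. 
The sketch uses java.util.concurrent futures purely for brevity; per the 
description above, the patch itself signals readiness inside the ipc server 
module.

import java.util.concurrent.*;

// Sketch only; not the asyncRpc.txt patch.  Models a handler thread that
// never blocks on slow work, while the client stays synchronous.
public class DelayedRpcSketch {
  // Stand-in for an RPC call whose response may be produced asynchronously.
  static class Call {
    final String request;
    final CompletableFuture<String> response = new CompletableFuture<>();
    Call(String request) { this.request = request; }
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<Call> callQueue = new LinkedBlockingQueue<>();
    ScheduledExecutorService app = Executors.newScheduledThreadPool(1);

    // Handler thread: takes a call, queues it with the "application", and is
    // immediately free to take the next call from the incoming callqueue.
    Thread handler = new Thread(() -> {
      try {
        while (true) {
          Call call = callQueue.take();
          app.schedule(() -> call.response.complete("done: " + call.request),
              100, TimeUnit.MILLISECONDS);  // simulated slow disk IO
        }
      } catch (InterruptedException ignored) {
      }
    });
    handler.setDaemon(true);
    handler.start();

    // The client continues to experience synchronous behaviour: it blocks
    // here until the application marks the response ready.
    Call call = new Call("put row1");
    callQueue.put(call);
    System.out.println(call.response.get());
    app.shutdown();
  }
}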

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3833) ability to support includes/excludes list in Hbase

2011-05-23 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038342#comment-13038342
 ] 

dhruba borthakur commented on HBASE-3833:
-

It would be nice to get both forced and graceful decommissioning into this 
jira, if Vishal agrees.

 ability to support includes/excludes list in Hbase
 --

 Key: HBASE-3833
 URL: https://issues.apache.org/jira/browse/HBASE-3833
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Affects Versions: 0.90.2
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: excl-patch.txt, excl-patch.txt


 An HBase cluster currently does not have the ability to specify that the 
 master should accept regionservers only from a specified list. This helps 
 prevent administrative errors where the same machine could be included in 
 two clusters. It also allows the administrator to easily remove un-ssh-able 
 machines from the cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3914) ROOT region appeared in two regionserver's onlineRegions at the same time

2011-05-23 Thread Jieshan Bean (JIRA)
ROOT region appeared in two regionserver's onlineRegions at the same time
-

 Key: HBASE-3914
 URL: https://issues.apache.org/jira/browse/HBASE-3914
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: Jieshan Bean
 Fix For: 0.90.4


This can happen under the following steps, with low probability:
(Suppose the cluster nodes are named RS1/RS2/HM, and there are more than 10,000 
regions in the cluster.)

1. Root region was opened on RS1.
2. For some reason (maybe the hdfs process became abnormal), RS1 aborted.
3. ServerShutdownHandler processing started.
4. HMaster was restarted; during finishInitialization's handling, the ROOT 
region was unset and assigned to RS2.
5. Root region was opened successfully on RS2.
6. But after a while, the ROOT region was unset again by RS1's 
ServerShutdownHandler and then reassigned. Before that, RS1 had been 
restarted. So there are two possibilities:
 Case a:
   ROOT region was assigned to RS1.
   Nothing seemed affected, but the root region was still online on RS2.

 Case b:
   ROOT region was assigned to RS2.
   The ROOT region couldn't be opened until it was reassigned to another 
regionserver, because it showed as already online on this regionserver.

This can be seen from the logs:

1. The ROOT region was opened twice:
2011-05-17 10:32:59,188 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
-ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
2011-05-17 10:33:01,536 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
-ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212

2. Regionserver 162-2-16-6 aborted, so the region was reassigned to 162-2-77-0, 
but was already online on that server:
10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received 
request to open region: -ROOT-,,0.70236052 10:49:30,920 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open 
of -ROOT-,,0.70236052 10:49:30,920 WARN 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted open 
of -ROOT-,,0.70236052 but already online on this server

This can leave the ROOT region offline for a long time, though it happened 
under a special scenario. I have checked the code, and there seems to be a small bug here.

There are 2 callers of assignRoot():

1.
HMaster# assignRootAndMeta:

if (!catalogTracker.verifyRootRegionLocation(timeout)) {
  this.assignmentManager.assignRoot();
  this.catalogTracker.waitForRoot();
  assigned++;
}

2.
ServerShutdownHandler# process: 

  if (isCarryingRoot()) { // -ROOT-  
try {
   this.services.getAssignmentManager().assignRoot();
} catch (KeeperException e) {
   this.server.abort("In server shutdown processing, assigning root", 
e);
   throw new IOException("Aborting", e);
}
  }

I think each time we call assignRoot(), we should verify the ROOT 
region's location first, because before the assignment the ROOT region could 
already have been assigned elsewhere.
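
For illustration, the verify-then-assign idea could look roughly like the 
sketch below. The method name, the timeout key, and its default are 
assumptions drawn from this discussion, not necessarily what HBASE-3914.patch 
does; stack's review comments later in this thread touch on exactly these 
points (method visibility and the timeout default).

// Sketch only.  Check whether ROOT is already served somewhere before
// assigning it, so ServerShutdownHandler cannot double-assign a region that
// another regionserver has already opened.
private void verifyAndAssignRoot()
throws KeeperException, IOException, InterruptedException {
  long timeout = this.server.getConfiguration()
      .getLong("hbase.catalog.verification.timeout", 1000);
  if (!this.server.getCatalogTracker().verifyRootRegionLocation(timeout)) {
    this.services.getAssignmentManager().assignRoot();
  }
}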





--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3833) ability to support includes/excludes list in Hbase

2011-05-23 Thread Vishal Kathuria (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038364#comment-13038364
 ] 

Vishal Kathuria commented on HBASE-3833:


Stack:
My intention was to build a graceful shutdown (that is why I assign the 
regions first and then expire the server). I assumed the assign* API unloads 
regions from the source region server when assigning them to the new region server, 
but I guess that assumption was wrong. Let me take another look at the code 
to see how I can make the shutdown graceful and avoid the 
situation you mentioned of two servers owning the same region - that is clearly not 
desirable.

Yuzhi:
If any of the meta regions are involved, the 
MetaServerShutdownHandler is dispatched. That class has both isCarryingRoot() 
and isCarryingMeta() defined.

I will publish another patch in a couple of days. Stack, please hold off on 
hacking this in. 

Thanks!
Vishal

 ability to support includes/excludes list in Hbase
 --

 Key: HBASE-3833
 URL: https://issues.apache.org/jira/browse/HBASE-3833
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Affects Versions: 0.90.2
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: excl-patch.txt, excl-patch.txt


 An HBase cluster currently does not have the ability to specify that the 
 master should accept regionservers only from a specified list. This helps 
 prevent administrative errors where the same machine could be included in 
 two clusters. It also allows the administrator to easily remove un-ssh-able 
 machines from the cluster.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3899) enhance HBase RPC to support free-ing up server handler threads even if response is not ready

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3899.
--

   Resolution: Fixed
Fix Version/s: 0.92.0
 Hadoop Flags: [Reviewed]

Committed to TRUNK.  Thanks for the patch Dhruba.

 enhance HBase RPC to support free-ing up server handler threads even if 
 response is not ready
 -

 Key: HBASE-3899
 URL: https://issues.apache.org/jira/browse/HBASE-3899
 Project: HBase
  Issue Type: Improvement
  Components: ipc
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.92.0

 Attachments: asyncRpc.txt, asyncRpc.txt


 In the current implementation, the server handler thread picks up an item 
 from the incoming callqueue, processes it and then wraps the response as a 
 Writable and sends it back to the IPC server module. This wastes 
 thread-resources when the thread is blocked for disk IO (transaction logging, 
 read into block cache, etc).
 It would be nice if we could make the RPC Server Handler threads pick up a call 
 from the IPC queue, hand it over to the application (e.g. HRegion), the 
 application can queue it to be processed asynchronously and send a response 
 back to the IPC server module saying that the response is not ready. The RPC 
 Server Handler thread is now ready to pick up another request from the 
 incoming callqueue. When the queued call is processed by the application, it 
 indicates to the IPC module that the response is now ready to be sent back to 
 the client.
 The RPC client continues to experience the same behaviour as before. An RPC 
 client is synchronous and blocks till the response arrives.
 This RPC enhancement allows us to do very powerful things with the 
 RegionServer. In future, we can enhance the RegionServer's threading 
 model to a message-passing model for better performance. We will not be 
 limited by the number of threads in the RegionServer.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3914) ROOT region appeared in two regionserver's onlineRegions at the same time

2011-05-23 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-3914:


Attachment: HBASE-3914.patch

 ROOT region appeared in two regionserver's onlineRegions at the same time
 -

 Key: HBASE-3914
 URL: https://issues.apache.org/jira/browse/HBASE-3914
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: Jieshan Bean
 Fix For: 0.90.4

 Attachments: HBASE-3914.patch


 This can happen under the following steps, with low probability:
 (Suppose the cluster nodes are named RS1/RS2/HM, and there are more than 
 10,000 regions in the cluster.)
 1. Root region was opened on RS1.
 2. For some reason (maybe the hdfs process became abnormal), RS1 aborted.
 3. ServerShutdownHandler processing started.
 4. HMaster was restarted; during finishInitialization's handling, the ROOT 
 region was unset and assigned to RS2.
 5. Root region was opened successfully on RS2.
 6. But after a while, the ROOT region was unset again by RS1's 
 ServerShutdownHandler and then reassigned. Before that, RS1 had been 
 restarted. So there are two possibilities:
  Case a:
ROOT region was assigned to RS1.
Nothing seemed affected, but the root region was still online 
 on RS2.

  Case b:
ROOT region was assigned to RS2.
The ROOT region couldn't be opened until it was reassigned to another 
 regionserver, because it showed as already online on this regionserver.
 This can be seen from the logs:
 1. The ROOT region was opened twice:
 2011-05-17 10:32:59,188 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 -ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
 2011-05-17 10:33:01,536 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 -ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212
 2. Regionserver 162-2-16-6 aborted, so the region was reassigned to 162-2-77-0, 
 but was already online on that server:
 10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: 
 Received request to open region: -ROOT-,,0.70236052 10:49:30,920 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing 
 open of -ROOT-,,0.70236052 10:49:30,920 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of -ROOT-,,0.70236052 but already online on this server
 This can leave the ROOT region offline for a long time, though it happened 
 under a special scenario. I have checked the code, and there seems to be a 
 small bug here.
 There are 2 callers of assignRoot():
 1.
 HMaster# assignRootAndMeta:
 if (!catalogTracker.verifyRootRegionLocation(timeout)) {
   this.assignmentManager.assignRoot();
   this.catalogTracker.waitForRoot();
   assigned++;
 }
 2.
 ServerShutdownHandler# process: 
 
   if (isCarryingRoot()) { // -ROOT-  
 try {
this.services.getAssignmentManager().assignRoot();
 } catch (KeeperException e) {
this.server.abort("In server shutdown processing, assigning root", 
 e);
throw new IOException("Aborting", e);
 }
   }
 I think each time we call assignRoot(), we should verify the ROOT 
 region's location first, because before the assignment the ROOT region could 
 already have been assigned elsewhere.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-3915) Binary row keys in hbck and other miscellaneous binary key display issues

2011-05-23 Thread stack (JIRA)
Binary row keys in hbck and other miscellaneous binary key display issues
-

 Key: HBASE-3915
 URL: https://issues.apache.org/jira/browse/HBASE-3915
 Project: HBase
  Issue Type: Bug
Reporter: stack


This is a patch of miscellany that addresses the printout of binary keys in zk and 
in hbck.  Also fixes a small issue in hbck where it says all tables are 
inconsistent when later in its display it says they are not (and they are 
not).
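
For context, the usual fix for such display problems is to escape keys with 
Bytes.toStringBinary rather than printing raw bytes. A tiny sketch of the 
technique (illustrative only; the attached 3195.txt may do this differently):

import org.apache.hadoop.hbase.util.Bytes;

public class BinaryKeyDisplay {
  public static void main(String[] args) {
    byte[] rowKey = new byte[] { 'r', 0x00, (byte) 0xff, '1' };
    // toStringBinary escapes non-printable bytes (e.g. as \x00) instead of
    // writing raw control characters to the terminal or web UI.
    System.out.println(Bytes.toStringBinary(rowKey));
  }
}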

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3808) Implement Executor.toString for master handlers at least

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038378#comment-13038378
 ] 

stack commented on HBASE-3808:
--

@Brock That looks great.  One thing.  I wonder if there is a way to distinguish 
the Executor instances, a means that you could add to the toString?  We usually 
have a pool of them and it would be good to distinguish between the pool 
members, even if only by a numeral or just the class instance name... so 
toString would look like 'TestEventHandler-localhost,8080,0-1', 
'TestEventHandler-localhost,8080,0-2', etc.  Would that be possible?  I wonder 
what Thread.currentThread().getName() says?  It probably varies with how the 
Executor is run, but maybe it would help if it were part of the toString?  
Thanks for looking at this Brock.
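
For illustration, one way to get such names is a per-class sequence number 
baked into toString. A sketch under that assumption, not Brock's patch:

import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: number handler instances so pool members are distinguishable
// in ExecutorService listings.
public class TestEventHandler {
  private static final AtomicInteger seq = new AtomicInteger();
  private final String serverName;          // e.g. "localhost,8080,0"
  private final int instanceId = seq.incrementAndGet();

  public TestEventHandler(String serverName) {
    this.serverName = serverName;
  }

  @Override
  public String toString() {
    // Yields e.g. "TestEventHandler-localhost,8080,0-1"
    return getClass().getSimpleName() + "-" + serverName + "-" + instanceId;
  }
}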

 Implement Executor.toString for master handlers at least
 

 Key: HBASE-3808
 URL: https://issues.apache.org/jira/browse/HBASE-3808
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Minor

 On shutdown, if Executors are still queued, then when ExecutorService 
 lists what is outstanding, the list should be something more informative than a 
 list of default toString implementations of ServerShutdownHandler objects.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3904) HConnection.isTableAvailable returns true even with not all regions available.

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038380#comment-13038380
 ] 

stack commented on HBASE-3904:
--

@Ted Seems good.  Let's get Vidhya to try it.

 HConnection.isTableAvailable returns true even with not all regions available.
 --

 Key: HBASE-3904
 URL: https://issues.apache.org/jira/browse/HBASE-3904
 Project: HBase
  Issue Type: Bug
  Components: client
Reporter: Vidhyashankar Venkataraman
Priority: Minor
 Attachments: 3904-v2.txt, 3904-v3.txt, 3904.txt


 This function, as per the javadoc, is supposed to return true iff all the 
 regions in the table are available. But if the table is still being created, 
 this function may return inconsistent results (for example, when a table with 
 a large number of split keys is created). 
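
For illustration, a client-side work-around sketch (hypothetical; not the 
attached patch): poll until the region count matches what the create call 
asked for, instead of trusting isTableAvailable alone while the table is 
still being created.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

// Sketch only.  A table created with N split keys should end up with N + 1
// regions, so wait until that many regions are visible.
public class WaitForAllRegions {
  public static void waitForRegions(Configuration conf, String tableName,
      int expectedRegions) throws Exception {
    HTable table = new HTable(conf, tableName);
    try {
      // getStartKeys() returns one entry per region currently known.
      while (table.getStartKeys().length < expectedRegions) {
        Thread.sleep(500);
      }
    } finally {
      table.close();
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    waitForRegions(conf, "t1", 10 + 1);  // e.g. created with 10 split keys
  }
}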

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3915) Binary row keys in hbck and other miscellaneous binary key display issues

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3915:
-

Attachment: 3195.txt

Minor fixes.

 Binary row keys in hbck and other miscellaneous binary key display issues
 -

 Key: HBASE-3915
 URL: https://issues.apache.org/jira/browse/HBASE-3915
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 3195.txt


 This is a patch of miscellany that addresses the printout of binary keys in zk 
 and in hbck.  Also fixes a small issue in hbck where it says all tables are 
 inconsistent when later in its display it says they are not (and they are 
 not).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3915) Binary row keys in hbck and other miscellaneous binary key display issues

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3915:
-

Attachment: 3195.txt

Update.  Don't bulk assign new table regions if only one region (We were 
running assigns of zero regions).

 Binary row keys in hbck and other miscellaneous binary key display issues
 -

 Key: HBASE-3915
 URL: https://issues.apache.org/jira/browse/HBASE-3915
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 3195.txt, 3195.txt


 This is a patch of miscellany that addresses the printout of binary keys in zk 
 and in hbck.  Also fixes a small issue in hbck where it says all tables are 
 inconsistent when later in its display it says they are not (and they are 
 not).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-3915) Binary row keys in hbck and other miscellaneous binary key display issues

2011-05-23 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3915.
--

   Resolution: Fixed
Fix Version/s: 0.90.4
 Assignee: stack

Applied to branch and trunk.

 Binary row keys in hbck and other miscellaneous binary key display issues
 -

 Key: HBASE-3915
 URL: https://issues.apache.org/jira/browse/HBASE-3915
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
 Fix For: 0.90.4

 Attachments: 3195.txt, 3195.txt


 This is a patch of miscellany that addresses the printout of binary keys in zk 
 and in hbck.  Also fixes a small issue in hbck where it says all tables are 
 inconsistent when later in its display it says they are not (and they are 
 not).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-3914) ROOT region appeared in two regionserver's onlineRegions at the same time

2011-05-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038395#comment-13038395
 ] 

stack commented on HBASE-3914:
--

Thanks for looking into this Jieshan.

Why not keep the change local to ServerShutdownHandler?  If it is the only 
class that uses the new verifyAndAssignRoot, why not make it private to the 
ServerShutdownHandler class?

hbase.catalog.verification.timeout is new?  And the default is 1 second only?  Is 
that enough time?

Good stuff

 ROOT region appeared in two regionserver's onlineRegions at the same time
 -

 Key: HBASE-3914
 URL: https://issues.apache.org/jira/browse/HBASE-3914
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: Jieshan Bean
 Fix For: 0.90.4

 Attachments: HBASE-3914.patch


 This can happen under the following steps, with low probability:
 (Suppose the cluster nodes are named RS1/RS2/HM, and there are more than 
 10,000 regions in the cluster.)
 1. Root region was opened on RS1.
 2. For some reason (maybe the hdfs process became abnormal), RS1 aborted.
 3. ServerShutdownHandler processing started.
 4. HMaster was restarted; during finishInitialization's handling, the ROOT 
 region was unset and assigned to RS2.
 5. Root region was opened successfully on RS2.
 6. But after a while, the ROOT region was unset again by RS1's 
 ServerShutdownHandler and then reassigned. Before that, RS1 had been 
 restarted. So there are two possibilities:
  Case a:
ROOT region was assigned to RS1.
Nothing seemed affected, but the root region was still online 
 on RS2.

  Case b:
ROOT region was assigned to RS2.
The ROOT region couldn't be opened until it was reassigned to another 
 regionserver, because it showed as already online on this regionserver.
 This can be seen from the logs:
 1. The ROOT region was opened twice:
 2011-05-17 10:32:59,188 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 -ROOT-,,0.70236052 on 162-2-77-0,20020,1305598359031
 2011-05-17 10:33:01,536 DEBUG 
 org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
 -ROOT-,,0.70236052 on 162-2-16-6,20020,1305597548212
 2. Regionserver 162-2-16-6 aborted, so the region was reassigned to 162-2-77-0, 
 but was already online on that server:
 10:49:30,920 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: 
 Received request to open region: -ROOT-,,0.70236052 10:49:30,920 DEBUG 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing 
 open of -ROOT-,,0.70236052 10:49:30,920 WARN 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Attempted 
 open of -ROOT-,,0.70236052 but already online on this server
 This can leave the ROOT region offline for a long time, though it happened 
 under a special scenario. I have checked the code, and there seems to be a 
 small bug here.
 There are 2 callers of assignRoot():
 1.
 HMaster# assignRootAndMeta:
 if (!catalogTracker.verifyRootRegionLocation(timeout)) {
   this.assignmentManager.assignRoot();
   this.catalogTracker.waitForRoot();
   assigned++;
 }
 2.
 ServerShutdownHandler# process: 
 
   if (isCarryingRoot()) { // -ROOT-  
 try {
this.services.getAssignmentManager().assignRoot();
 } catch (KeeperException e) {
this.server.abort("In server shutdown processing, assigning root", 
 e);
throw new IOException("Aborting", e);
 }
   }
 I think each time we call assignRoot(), we should verify the ROOT 
 region's location first, because before the assignment the ROOT region could 
 already have been assigned elsewhere.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira