[jira] [Commented] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration

2012-08-09 Thread terry zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431640#comment-13431640
 ] 

terry zhang commented on HBASE-6533:


Sorry for creating so many duplicates of this issue; my IE browser misbehaved. 
Could anyone help me delete them?

> [replication] replication will be block if WAL compress set differently in 
> master and slave configuration
> -
>
> Key: HBASE-6533
> URL: https://issues.apache.org/jira/browse/HBASE-6533
> Project: HBase
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.94.0
>Reporter: terry zhang
>
> As we know, in HBase 0.94.0 there is the configuration below:
>   <property>
>     <name>hbase.regionserver.wal.enablecompression</name>
>     <value>true</value>
>   </property>
> If we enable it in the master cluster and disable it in the slave cluster, 
> then replication will not work: the master cluster throws 
> unwrapRemoteException again and again.
> 2012-08-09 12:49:55,892 WARN 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't 
> replicate because of an error
>  on the remote cluster: 
> java.io.IOException: IPC server unable to read call parameters: Error in 
> readFields
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
> Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read 
> call parameters: Error in readFields
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
> at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
> at $Proxy13.replicateLogEntries(Unknown Source)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
> ... 1 more 
> This is because the slave cluster cannot parse the HLog entry:
> 2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to 
> read call parameters for client 10.232.98.89
> java.io.IOException: Error in readFields
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
> at 
> org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
> at 
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
> ... 11 more 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6533) [replication] replication will be block if WAL compress set differently in master and slave configuration

2012-08-09 Thread terry zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431643#comment-13431643
 ] 

terry zhang commented on HBASE-6533:


  For now, the only way to work around this issue is to set the master cluster 
back to uncompressed mode, delete the ZooKeeper node replication/rs, and 
restart the master cluster, because the replication slave does not support 
reading compressed HLogs.
  But if we have multiple master clusters and some of them write compressed 
HLogs, we cannot handle the situation at all.
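In hbase-site.xml terms, the first step of the workaround above amounts to 
switching the master cluster's region servers back to uncompressed WALs 
(property name taken from the issue description):

```xml
<!-- hbase-site.xml on the master cluster's region servers -->
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>false</value>
</property>
```

followed by deleting the replication/rs znode and restarting the master 
cluster, as described above.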


> [replication] replication will be block if WAL compress set differently in 
> master and slave configuration
> -
>
> Key: HBASE-6533
> URL: https://issues.apache.org/jira/browse/HBASE-6533
> Project: HBase
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.94.0
>Reporter: terry zhang
>
> As we know, in HBase 0.94.0 there is the configuration below:
>   <property>
>     <name>hbase.regionserver.wal.enablecompression</name>
>     <value>true</value>
>   </property>
> If we enable it in the master cluster and disable it in the slave cluster, 
> then replication will not work: the master cluster throws 
> unwrapRemoteException again and again.
> 2012-08-09 12:49:55,892 WARN 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Can't 
> replicate because of an error
>  on the remote cluster: 
> java.io.IOException: IPC server unable to read call parameters: Error in 
> readFields
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:635)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:365)
> Caused by: org.apache.hadoop.ipc.RemoteException: IPC server unable to read 
> call parameters: Error in readFields
> at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:921)
> at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:151)
> at $Proxy13.replicateLogEntries(Unknown Source)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:616)
> ... 1 more 
> This is because the slave cluster cannot parse the HLog entry:
> 2012-08-09 14:46:05,891 WARN org.apache.hadoop.ipc.HBaseServer: Unable to 
> read call parameters for client 10.232.98.89
> java.io.IOException: Error in readFields
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:685)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:586)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:635)
> at 
> org.apache.hadoop.hbase.ipc.Invocation.readFields(Invocation.java:125)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:1292)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:1207)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:735)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:524)
> at 
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:499)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.hbase.KeyValue.readFields(KeyValue.java:2254)
> at 
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFields(WALEdit.java:146)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLog$Entry.readFields(HLog.java:1767)
> at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:682)
> ... 11 more 


[jira] [Commented] (HBASE-6520) MSLab May cause the Bytes.toLong not work correctly for increment

2012-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431655#comment-13431655
 ] 

Hudson commented on HBASE-6520:
---

Integrated in HBase-0.94 #389 (See 
[https://builds.apache.org/job/HBase-0.94/389/])
HBASE-6520 MSLab May cause the Bytes.toLong not work correctly for 
increment (ShiXing) (Revision 1371045)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestHRegion.java


> MSLab May cause the Bytes.toLong not work correctly for increment
> -
>
> Key: HBASE-6520
> URL: https://issues.apache.org/jira/browse/HBASE-6520
> Project: HBase
>  Issue Type: Bug
>Reporter: ShiXing
>Assignee: ShiXing
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6520-0.94-v1.patch, HBASE-6520-trunk-v1.patch
>
>
> When MemStoreLAB is used, KeyValues share the byte array allocated by the 
> MemStoreLAB; all the KeyValues' "bytes" attributes point at the same byte 
> array. Consider functions such as Bytes.toLong(byte[] bytes, int offset):
> {code}
>   public static long toLong(byte[] bytes, int offset) {
> return toLong(bytes, offset, SIZEOF_LONG);
>   }
>   public static long toLong(byte[] bytes, int offset, final int length) {
> if (length != SIZEOF_LONG || offset + length > bytes.length) {
>   throw explainWrongLengthOrOffset(bytes, offset, length, SIZEOF_LONG);
> }
> long l = 0;
> for(int i = offset; i < offset + length; i++) {
>   l <<= 8;
>   l ^= bytes[i] & 0xFF;
> }
> return l;
>   }
> {code}
> If we do not put a long value into the KeyValue, and then read it as a long 
> value in HRegion.increment(), the check
> {code}
> offset + length > bytes.length
> {code}
> has no effect, because bytes.length is not keyLength + valueLength; it is 
> the MemStoreLAB chunk size, which defaults to 2048 * 1024.
> I will paste the patch later.
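To make the failure mode concrete, here is a minimal self-contained sketch of 
the guard described above (the toLong logic mirrors the code quoted in the 
description; the chunk size is the 2048 * 1024 default mentioned there; 
everything else is illustrative, not HBase source):

```java
// Illustrative sketch: why the bounds check in Bytes.toLong cannot catch a
// short value when the backing array is a large shared MemStoreLAB chunk.
public class ToLongDemo {
    static final int SIZEOF_LONG = 8;

    // Mirrors the Bytes.toLong logic quoted in the description above.
    static long toLong(byte[] bytes, int offset, int length) {
        if (length != SIZEOF_LONG || offset + length > bytes.length) {
            throw new IllegalArgumentException("wrong length or offset");
        }
        long l = 0;
        for (int i = offset; i < offset + length; i++) {
            l <<= 8;
            l ^= bytes[i] & 0xFF;
        }
        return l;
    }

    public static void main(String[] args) {
        // A 4-byte value written into a large shared chunk, as with MemStoreLAB.
        byte[] chunk = new byte[2048 * 1024]; // default chunk size from the report
        int offset = 100;
        chunk[offset + 3] = 42; // pretend a 4-byte int (value 42) was stored here

        // offset + 8 is still well inside the 2 MB chunk, so the guard does not
        // fire, and toLong silently reads 4 bytes past the logical value.
        System.out.println(toLong(chunk, offset, SIZEOF_LONG)); // prints 180388626432, not 42
    }
}
```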





[jira] [Updated] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Jie Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Huang updated HBASE-6529:
-

Attachment: hbase-6529.diff

Oh, so that is why bulk load has become quite slow since 0.94, even for data 
stored on the same HDFS cluster. A quick fix is attached.
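For context, the check in question is an identity comparison between the 
region server's wrapped filesystem and the source DistributedFileSystem. A 
minimal sketch of why identity fails while a URI comparison succeeds (the 
types are illustrative stand-ins, not HBase classes):

```java
import java.net.URI;

// Illustrative sketch: an identity comparison between a wrapped filesystem
// and the filesystem it wraps is always false, while comparing the
// filesystems' URIs identifies them as the same cluster.
public class BulkLoadFsCheck {
    interface Fs { URI getUri(); }

    // Stands in for DistributedFileSystem.
    static class DistFs implements Fs {
        public URI getUri() { return URI.create("hdfs://cluster:8020"); }
    }

    // Stands in for HFileSystem, which wraps the real filesystem.
    static class WrappingFs implements Fs {
        private final Fs backing;
        WrappingFs(Fs backing) { this.backing = backing; }
        public URI getUri() { return backing.getUri(); }
    }

    // Comparing by URI treats wrapper and wrapped as the same filesystem.
    static boolean sameFs(Fs a, Fs b) { return a.getUri().equals(b.getUri()); }

    public static void main(String[] args) {
        Fs srcFs = new DistFs();                          // where the HFiles live
        Fs regionServerFs = new WrappingFs(new DistFs()); // what the RS holds
        System.out.println(srcFs == regionServerFs);       // false: triggers the extra copy
        System.out.println(sameFs(srcFs, regionServerFs)); // true: no copy needed
    }
}
```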

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With the HFile v2 implementation in HBase 0.94 & 0.96, the region server uses 
> HFileSystem as its {color:blue}fs{color}. When it performs a bulk load in 
> Store.bulkLoadHFile(), it checks whether its {color:blue}fs{color} is the 
> same as {color:blue}srcFs{color}, which, however, will be a 
> DistributedFileSystem. Consequently, it always performs an extra copy of the 
> source files.





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431688#comment-13431688
 ] 

stack commented on HBASE-6529:
--

+1 on patch.  Good one Jason.

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With the HFile v2 implementation in HBase 0.94 & 0.96, the region server uses 
> HFileSystem as its {color:blue}fs{color}. When it performs a bulk load in 
> Store.bulkLoadHFile(), it checks whether its {color:blue}fs{color} is the 
> same as {color:blue}srcFs{color}, which, however, will be a 
> DistributedFileSystem. Consequently, it always performs an extra copy of the 
> source files.





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Jie Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431697#comment-13431697
 ] 

Jie Huang commented on HBASE-6529:
--

BTW, all tests pass with this patch. Thanks.

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With the HFile v2 implementation in HBase 0.94 & 0.96, the region server uses 
> HFileSystem as its {color:blue}fs{color}. When it performs a bulk load in 
> Store.bulkLoadHFile(), it checks whether its {color:blue}fs{color} is the 
> same as {color:blue}srcFs{color}, which, however, will be a 
> DistributedFileSystem. Consequently, it always performs an extra copy of the 
> source files.





[jira] [Updated] (HBASE-6537) Balancer compete with disable table will lead to cluster inconsistent

2012-08-09 Thread Zhou wenjian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhou wenjian updated HBASE-6537:


Summary: Balancer compete with disable table will lead to cluster 
inconsistent   (was: Balancer compete with disable table will lead to master 
abort)

> Balancer compete with disable table will lead to cluster inconsistent 
> --
>
> Key: HBASE-6537
> URL: https://issues.apache.org/jira/browse/HBASE-6537
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0
>Reporter: Zhou wenjian
> Fix For: 0.94.1
>
>
> This appears in 0.94; trunk is OK.
> The balancer collects the region plans to move (unassign and then assign).
> Before the unassign, a disable-table request arrives. After the region is 
> closed on the region server, the master deletes the znode, removes the 
> region from RIT, and then cleans the region out of the online regions.
> Between removing the region from RIT and cleaning it out of the online 
> regions, the balancer begins its unassign and gets a 
> NotServingRegionException. If the table were still disabling, it would fix 
> up the state in the master and delete the znode; however, the table is 
> already disabled, so the RIT entry and the znode remain, and the 
> TimeoutMonitor draws a blank on it.
> This blocks enabling the table and the balancer until a restart.





[jira] [Updated] (HBASE-6537) Balancer compete with disable table will lead to cluster inconsistent

2012-08-09 Thread Zhou wenjian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhou wenjian updated HBASE-6537:


Attachment: HBASE-6537-94.patch

> Balancer compete with disable table will lead to cluster inconsistent 
> --
>
> Key: HBASE-6537
> URL: https://issues.apache.org/jira/browse/HBASE-6537
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0
>Reporter: Zhou wenjian
> Fix For: 0.94.1
>
> Attachments: HBASE-6537-94.patch
>
>
> This appears in 0.94; trunk is OK.
> The balancer collects the region plans to move (unassign and then assign).
> Before the unassign, a disable-table request arrives. After the region is 
> closed on the region server, the master deletes the znode, removes the 
> region from RIT, and then cleans the region out of the online regions.
> Between removing the region from RIT and cleaning it out of the online 
> regions, the balancer begins its unassign and gets a 
> NotServingRegionException. If the table were still disabling, it would fix 
> up the state in the master and delete the znode; however, the table is 
> already disabled, so the RIT entry and the znode remain, and the 
> TimeoutMonitor draws a blank on it.
> This blocks enabling the table and the balancer until a restart.





[jira] [Commented] (HBASE-6522) Expose locks and leases to Coprocessors

2012-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431750#comment-13431750
 ] 

Hudson commented on HBASE-6522:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #127 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/127/])
HBASE-6522 Expose locks and leases to Coprocessors (Revision 1371024)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockRegionServer.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/MockRegionServerServices.java


> Expose locks and leases to Coprocessors
> ---
>
> Key: HBASE-6522
> URL: https://issues.apache.org/jira/browse/HBASE-6522
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6522-v2.txt, 6522.txt
>
>
> Currently it is not possible for a CP to implement any checkAndMutate-type 
> operation, because coprocessors have no way to create a lock: getLock is 
> private to HRegion (interestingly, releaseLock is public).
> In addition it would be nice if a coprocessor could hook into the region 
> server's lease management.
> Here I propose two trivial changes:
> # Make HRegion.getLock public
> # Add {code}Leases getLeases(){code} to RegionServerServices (and hence to 
> HRegionServer)





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431774#comment-13431774
 ] 

nkeywal commented on HBASE-6364:


For the tests, I tried 2 options:
- Implementing a specific SocketFactory that would return configurable sockets. 
Too fragile & too complicated
- Adding a hook to get a specific HBaseClient implementation, that would add 
the sleep when necessary. That's in the v6.

In the long term, I think the best place to add test hooks is NetUtils, but 
that class is made of static methods and is in the hadoop.net package, not in 
HBase.


I'm not a big fan of adding these new tests to the main build, but it can now 
be done.

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, stacktrace.txt
>
>
> When a server host with a Region Server holding the .META. table is powered 
> down on a live cluster, the HBase cluster itself detects and reassigns the 
> .META. table, but connected HBase clients take an excessively long time to 
> detect this and re-discover the reassigned .META.
> Workaround: decrease ipc.socket.timeout on the HBase client side to a low 
> value (the default of 20s leads to a 35-minute recovery time; we got 
> acceptable results with 100ms, giving a 3-minute recovery).
> This was found during some hardware failure testing scenarios. 
> Test Case:
> 1) Apply load via client app on HBase cluster for several minutes
> 2) Power down the region server holding the .META. server (i.e. power off ... 
> and keep it off)
> 3) Measure how long it takes for cluster to reassign META table and for 
> client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
> and DN on that host).
> Observation:
> 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
> recover (i.e. for the thread count to go back to normal) - no client calls 
> are serviced - they just back up on a synchronized method (see #2 below)
> 2) All the client app threads queue up behind the 
> oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
> After taking several thread dumps we found that the thread within this 
> synchronized method was blocked on  NetUtils.connect(this.socket, 
> remoteId.getAddress(), getSocketTimeout(conf));
> The client thread that gets the synchronized lock would try to connect to the 
> dead RS (till socket times out after 20s), retries, and then the next thread 
> gets in and so forth in a serial manner.
> Workaround:
> ---
> Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
> (1000 ms,  100 ms, etc) on the client side hbase-site.xml. With this setting, 
> the client threads recovered in a couple of minutes by failing fast and 
> re-discovering the .META. table on a reassigned RS.
> Assumption: this ipc.socket.timeout is only ever used during the initial 
> "HConnection" setup via NetUtils.connect, and should only ever be used when 
> connectivity to a region server is lost and needs to be re-established; 
> i.e. it does not affect normal "RPC" activity, as this is just the connect 
> timeout.
> During RS GC periods, any _new_ clients trying to connect will fail and will 
> require .META. table re-lookups.
> This above timeout workaround is only for the HBase client side.
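The client-side workaround described above is just a single hbase-site.xml 
entry (property name and 1000 ms value taken from the report; tune the value 
for your environment):

```xml
<!-- hbase-site.xml on the client side only -->
<property>
  <name>ipc.socket.timeout</name>
  <value>1000</value> <!-- milliseconds; the default is 20000 -->
</property>
```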





[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6364:
---

Status: Open  (was: Patch Available)

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.94.0, 0.92.1, 0.90.6
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, stacktrace.txt
>
>
> When a server host with a Region Server holding the .META. table is powered 
> down on a live cluster, the HBase cluster itself detects and reassigns the 
> .META. table, but connected HBase clients take an excessively long time to 
> detect this and re-discover the reassigned .META.
> Workaround: decrease ipc.socket.timeout on the HBase client side to a low 
> value (the default of 20s leads to a 35-minute recovery time; we got 
> acceptable results with 100ms, giving a 3-minute recovery).
> This was found during some hardware failure testing scenarios. 
> Test Case:
> 1) Apply load via client app on HBase cluster for several minutes
> 2) Power down the region server holding the .META. server (i.e. power off ... 
> and keep it off)
> 3) Measure how long it takes for cluster to reassign META table and for 
> client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
> and DN on that host).
> Observation:
> 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
> recover (i.e. for the thread count to go back to normal) - no client calls 
> are serviced - they just back up on a synchronized method (see #2 below)
> 2) All the client app threads queue up behind the 
> oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
> After taking several thread dumps we found that the thread within this 
> synchronized method was blocked on  NetUtils.connect(this.socket, 
> remoteId.getAddress(), getSocketTimeout(conf));
> The client thread that gets the synchronized lock would try to connect to the 
> dead RS (till socket times out after 20s), retries, and then the next thread 
> gets in and so forth in a serial manner.
> Workaround:
> ---
> Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
> (1000 ms,  100 ms, etc) on the client side hbase-site.xml. With this setting, 
> the client threads recovered in a couple of minutes by failing fast and 
> re-discovering the .META. table on a reassigned RS.
> Assumption: this ipc.socket.timeout is only ever used during the initial 
> "HConnection" setup via NetUtils.connect, and should only ever be used when 
> connectivity to a region server is lost and needs to be re-established; 
> i.e. it does not affect normal "RPC" activity, as this is just the connect 
> timeout.
> During RS GC periods, any _new_ clients trying to connect will fail and will 
> require .META. table re-lookups.
> This above timeout workaround is only for the HBase client side.





[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6364:
---

Attachment: 6364.v6.withtests.patch

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.withtests.patch, stacktrace.txt
>
>
> When a server host with a Region Server holding the .META. table is powered 
> down on a live cluster, the HBase cluster itself detects and reassigns the 
> .META. table, but connected HBase clients take an excessively long time to 
> detect this and re-discover the reassigned .META.
> Workaround: decrease ipc.socket.timeout on the HBase client side to a low 
> value (the default of 20s leads to a 35-minute recovery time; we got 
> acceptable results with 100ms, giving a 3-minute recovery).
> This was found during some hardware failure testing scenarios. 
> Test Case:
> 1) Apply load via client app on HBase cluster for several minutes
> 2) Power down the region server holding the .META. server (i.e. power off ... 
> and keep it off)
> 3) Measure how long it takes for cluster to reassign META table and for 
> client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
> and DN on that host).
> Observation:
> 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
> recover (i.e. for the thread count to go back to normal) - no client calls 
> are serviced - they just back up on a synchronized method (see #2 below)
> 2) All the client app threads queue up behind the 
> oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
> After taking several thread dumps we found that the thread within this 
> synchronized method was blocked on  NetUtils.connect(this.socket, 
> remoteId.getAddress(), getSocketTimeout(conf));
> The client thread that gets the synchronized lock would try to connect to the 
> dead RS (till socket times out after 20s), retries, and then the next thread 
> gets in and so forth in a serial manner.
> Workaround:
> ---
> Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
> (1000 ms, 100 ms, etc.) in the client-side hbase-site.xml. With this setting, 
> the client threads recovered in a couple of minutes by failing fast and 
> re-discovering the .META. table on a reassigned RS.
> Assumption: This ipc.socket.timeout is only ever used during the initial 
> "HConnection" setup via NetUtils.connect and should only ever be used 
> when connectivity to a region server is lost and needs to be re-established, 
> i.e. it does not affect normal "RPC" activity as this is just the connect 
> timeout.
> During RS GC periods, any _new_ clients trying to connect will fail and will 
> require .META. table re-lookups.
> This above timeout workaround is only for the HBase client side.
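The workaround above is just a client-side configuration change. A minimal hbase-site.xml fragment might look like the following (the property name and client-side placement are taken from the description above; 100 ms is the aggressive value quoted, so tune it for your environment):

```xml
<!-- Client-side hbase-site.xml: fail fast when connecting to a dead RS.
     Value is in milliseconds; the default is 20000 (20s). -->
<property>
  <name>ipc.socket.timeout</name>
  <value>100</value>
</property>
```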

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6364:
---

Attachment: 6364.v6.patch

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt





[jira] [Updated] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6364:
---

Status: Patch Available  (was: Open)

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.94.0, 0.92.1, 0.90.6
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431796#comment-13431796
 ] 

Zhihong Ted Yu commented on HBASE-6529:
---

What if srcFs is an HFileSystem and fs is a DistributedFileSystem?

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With the HFile v2 implementation in HBase 0.94 & 0.96, the region server will 
> use HFileSystem as its {color:blue}fs{color}. When it performs a bulk load in 
> Store.bulkLoadHFile(), it checks whether its {color:blue}fs{color} is the same 
> as {color:blue}srcFs{color}, which, however, will be a DistributedFileSystem. 
> Consequently, it will always perform an extra copy of the source files.





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Jason Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431799#comment-13431799
 ] 

Jason Dai commented on HBASE-6529:
--

bq. What if srcFs is an HFileSystem and fs is DistributedFileSystem ?
If fs is a DistributedFileSystem, then the RegionServer does not support HFile 
v2, yes? 


> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431800#comment-13431800
 ] 

Hadoop QA commented on HBASE-6364:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540017/6364.v6.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.  Please justify why no new tests are needed for this patch.  Also 
please list what manual steps were performed to verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.TestLocalHBaseCluster
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2538//console

This message is automatically generated.

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt

[jira] [Commented] (HBASE-6496) Example ZK based scan policy

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431801#comment-13431801
 ] 

Zhihong Ted Yu commented on HBASE-6496:
---

@Lars:
The current way is fine too.

> Example ZK based scan policy
> 
>
> Key: HBASE-6496
> URL: https://issues.apache.org/jira/browse/HBASE-6496
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6496-v2.txt, 6496.txt
>
>
> Provide an example of a RegionServer that listens to a ZK node to learn about 
> what set of KVs can safely be deleted during a compaction.





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431837#comment-13431837
 ] 

Zhihong Ted Yu commented on HBASE-6529:
---

I think this JIRA is related, if not identical, to HBASE-6358.
The root cause is that we cannot rely on Hadoop semantics for judging equality 
between HFileSystem and DistributedFileSystem.

We should not tie the solution to HFile v2, which appears only in a comment in 
Store.java.
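The equality pitfall described here can be reproduced with plain stand-in classes (hypothetical names, not the HBase API): a wrapper filesystem object is never equals() to the filesystem it wraps, so an equality check in bulkLoadHFile concludes the filesystems differ and takes the extra-copy path. Comparing the underlying filesystem URIs is one shape a fix could take:

```java
import java.net.URI;

// Stand-in types for illustration only; HFileSystem/DistributedFileSystem
// are the real HBase/Hadoop classes this models.
class Fs {
    final URI uri;
    Fs(URI uri) { this.uri = uri; }
}

class WrapperFs extends Fs {  // plays the role of HFileSystem
    final Fs backing;
    WrapperFs(Fs backing) { super(backing.uri); this.backing = backing; }
}

public class FsEqualityDemo {
    public static void main(String[] args) {
        Fs dfs = new Fs(URI.create("hdfs://namenode:8020"));  // like DistributedFileSystem
        Fs regionServerFs = new WrapperFs(dfs);               // like HFileSystem wrapping it

        // Object equality fails, so the bulk load would take the extra-copy path:
        System.out.println(regionServerFs.equals(dfs));         // false
        // Comparing by filesystem URI treats wrapper and wrapped as the same:
        System.out.println(regionServerFs.uri.equals(dfs.uri)); // true
    }
}
```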

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff





[jira] [Commented] (HBASE-6537) Balancer compete with disable table will lead to cluster inconsistent

2012-08-09 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431847#comment-13431847
 ] 

rajeshbabu commented on HBASE-6537:
---

@Zhou wenjian
Clearing the region from the online regions before removing it from RIT may 
not solve the problem completely.
It is better to also check whether the table is disabling or disabled in case 
of a NotServingRegionException.

Maybe this is applicable to trunk as well, because there we also only check 
whether the table is disabling.
Please correct me if I am wrong.

> Balancer compete with disable table will lead to cluster inconsistent 
> --
>
> Key: HBASE-6537
> URL: https://issues.apache.org/jira/browse/HBASE-6537
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0
>Reporter: Zhou wenjian
> Fix For: 0.94.1
>
> Attachments: HBASE-6537-94.patch
>
>
> Appears in 0.94; trunk is OK for this issue.
> The balancer collects the region plans to move (unassign and then assign).
> Before the unassign, a disable-table request arrives. After the region is 
> closed on the RS, the master deletes the znode, removes the region from RIT, 
> and then cleans the region from the online regions.
> Between removing the region from RIT and cleaning the region out of the 
> online regions, the balancer begins to unassign. It gets a 
> NotServingRegionException; if the table were disabling, it would handle the 
> state in the master and delete the znode. However, the table is now disabled, 
> so the RIT entry and the znode remain, and the TimeoutMonitor draws a blank 
> on it.
> This holds back enabling the table or running the balancer unless the master 
> is restarted.





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431866#comment-13431866
 ] 

stack commented on HBASE-6364:
--

We already have a class named DeadServers here: 
./hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java  
Maybe name this something else?

Do you think the deadservers will hold reference to the original backing Map?  
If so, if we keep adding, are we adding to backing Map or to the tailMap?  I'm 
talking about here:

+deadServers = deadServers.tailMap(now);

I'm afraid we will be retaining reference to backing Map and it will grow 
without bound?  Is that possible?
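The tailMap question can be answered with a small self-contained check of the JDK behavior (deadServers here is a local stand-in, not the patch's actual field): TreeMap.tailMap returns a view backed by the original map, so re-pointing a reference at successive tail views hides expired entries without ever removing them from the backing map:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class TailMapRetention {
    public static void main(String[] args) {
        TreeMap<Long, String> backing = new TreeMap<>();
        NavigableMap<Long, String> deadServers = backing;
        for (long now = 0; now < 5; now++) {
            deadServers.put(now, "rs-" + now);
            // Re-pointing at the tail view hides expired entries but does not
            // remove them from the backing TreeMap:
            deadServers = deadServers.tailMap(now, false);
        }
        System.out.println(deadServers.size()); // 0 entries visible through the view
        System.out.println(backing.size());     // 5 entries still retained

        // Clearing a headMap view actually removes the expired entries:
        backing.headMap(5L).clear();
        System.out.println(backing.size());     // 0
    }
}
```

So the retained-backing-map worry is well founded: an expiry that only reassigns to the tail view grows without bound, while clearing a headMap view (or removing entries explicitly) does not.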

So, if default timeout for items in list is two seconds and socket timeout is 
20 seconds, will items timeout in the dead list before we ever make use of it?

You should fix the formatting, adding spaces around '+' etc., so it has 
formatting like the rest of the code in this class... or is that how it's done 
elsewhere in this class?   If so, fine.

Your addition where we can override HBaseClient looks generally useful.

Use EnvironmentEdge instead of System.currentMillis...

+final long start = System.currentTimeMillis();

Test looks good.

Is the HBaseRecoveryTestingUtility generally useful do you think?  Maybe we 
don't add test but add this? (This class needs class comment explaining what 
goodies it has).

Good stuff.


> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt


[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6437:
--

Attachment: HBASE-6437_trunk.patch

Patch for trunk. If it's OK I will prepare patches for the other versions as well.

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_trunk.patch
>
>
> In HBASE-5850 many of the admin operations were blocked until the master 
> initializes, but the balancer was not.  So this JIRA is to extend the 
> PleaseHoldException to the case of an admin.balance() call before the master 
> is initialized.
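The guard this JIRA proposes can be sketched with stand-in types (illustrative names, not the actual HMaster code): balance() throws a PleaseHoldException until initialization completes, the same pattern HBASE-5850 applied to the other admin operations.

```java
// Hypothetical sketch of the proposed guard; Master and markInitialized()
// are stand-ins, not the real HBase classes.
class PleaseHoldException extends Exception {
    PleaseHoldException(String msg) { super(msg); }
}

class Master {
    private volatile boolean initialized = false;

    void markInitialized() { initialized = true; }

    boolean balance() throws PleaseHoldException {
        if (!initialized) {
            // Reject the call instead of balancing a half-initialized cluster.
            throw new PleaseHoldException("Master is initializing");
        }
        // ... run the load balancer here ...
        return true;
    }
}
```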





[jira] [Assigned] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu reassigned HBASE-6437:
-

Assignee: rajeshbabu  (was: ramkrishna.s.vasudevan)

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_trunk.patch





[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6437:
--

Status: Patch Available  (was: Open)

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_trunk.patch





[jira] [Updated] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6438:
--

Attachment: HBASE-6438_trunk.patch

Patch for trunk. Please review and provide your comments/suggestions.

> RegionAlreadyInTransitionException needs to give more info to avoid 
> assignment inconsistencies
> --
>
> Key: HBASE-6438
> URL: https://issues.apache.org/jira/browse/HBASE-6438
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Attachments: HBASE-6438_trunk.patch
>
>
> Seeing some of the recent issues in region assignment, 
> RegionAlreadyInTransitionException is one failure after which the region 
> assignment may or may not happen (in the sense that we need to wait for the 
> TM to assign).
> In HBASE-6317 we hit one such problem with RegionAlreadyInTransitionException 
> on master restart.
> Consider the following case: due to some reason like a master restart or an 
> external assign call, we try to assign a region that is already being opened 
> on a RS.
> The next call to assign has already changed the state of the znode, so the 
> open currently in progress on the RS is affected and fails.  The second 
> assignment also fails, getting a RAITE exception.  In the end, neither 
> assignment completes.  The idea is to determine whether a given RAITE can be 
> retried.
> There are several possible cases:
> -> The znode is yet to be transitioned from OFFLINE to OPENING on the RS.
> -> The RS may be in the middle of openRegion().
> -> The RS may be transitioning OPENING to OPENED.
> -> The RS is yet to add the region to its online regions.
> On any failure in openRegion() and updateMeta() we move the znode to 
> FAILED_OPEN, so in those cases getting a RAITE should be ok.  But in the 
> other cases the assignment is stopped.
> The idea is to record the current state of the region assignment in the RIT 
> map on the RS side and use that information to decide whether the assignment 
> can be retried on getting a RAITE.
> Considering the current work going on in the AM, please share whether this is 
> needed at least in the 0.92/0.94 versions.
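The retry decision sketched in the description might be encoded as follows. The state names follow the bullet list above, but the class, the enum, and its retryable flags are hypothetical illustrations, not part of any attached patch:

```java
// Hypothetical encoding of the "can this RAITE be retried?" decision: the RS
// records how far the open has progressed, and a RAITE is considered safe to
// retry only in the phases whose failures move the znode to FAILED_OPEN.
public class RaiteRetry {
    enum OpeningState {
        PRE_TRANSITION(false),   // znode not yet moved OFFLINE -> OPENING
        OPENING_REGION(true),    // inside openRegion(): failures -> FAILED_OPEN
        UPDATING_META(true),     // inside updateMeta(): failures -> FAILED_OPEN
        POST_OPEN(false);        // OPENING -> OPENED / adding to online regions

        final boolean retryable;
        OpeningState(boolean retryable) { this.retryable = retryable; }
    }

    static boolean canRetry(OpeningState state) {
        return state.retryable;
    }

    public static void main(String[] args) {
        System.out.println(canRetry(OpeningState.OPENING_REGION)); // true
        System.out.println(canRetry(OpeningState.POST_OPEN));      // false
    }
}
```

The master would consult this recorded state on receiving the RAITE instead of giving up unconditionally.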

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu reassigned HBASE-6438:
-

Assignee: rajeshbabu  (was: ramkrishna.s.vasudevan)

> RegionAlreadyInTransitionException needs to give more info to avoid 
> assignment inconsistencies
> --
>
> Key: HBASE-6438
> URL: https://issues.apache.org/jira/browse/HBASE-6438
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Attachments: HBASE-6438_trunk.patch
>
>
> Seeing some of the recent issues in region assignment, 
> RegionAlreadyInTransitionException is one failure after which the region 
> assignment may or may not happen (in the sense that we need to wait for the 
> TM to assign).
> In HBASE-6317 we hit one such problem with RegionAlreadyInTransitionException 
> on master restart.
> Consider the following case: due to some reason like a master restart or an 
> external assign call, we try to assign a region that is already being opened 
> on a RS.
> The next call to assign has already changed the state of the znode, so the 
> open currently in progress on the RS is affected and fails.  The second 
> assignment also fails, getting a RAITE exception.  In the end, neither 
> assignment completes.  The idea is to determine whether a given RAITE can be 
> retried.
> There are several possible cases:
> -> The znode is yet to be transitioned from OFFLINE to OPENING on the RS.
> -> The RS may be in the middle of openRegion().
> -> The RS may be transitioning OPENING to OPENED.
> -> The RS is yet to add the region to its online regions.
> On any failure in openRegion() and updateMeta() we move the znode to 
> FAILED_OPEN, so in those cases getting a RAITE should be ok.  But in the 
> other cases the assignment is stopped.
> The idea is to record the current state of the region assignment in the RIT 
> map on the RS side and use that information to decide whether the assignment 
> can be retried on getting a RAITE.
> Considering the current work going on in the AM, please share whether this is 
> needed at least in the 0.92/0.94 versions.





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431893#comment-13431893
 ] 

nkeywal commented on HBASE-6364:


bq. We already have a class named DeadServers here: 
./hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java 
Maybe name this something else?
You're right. Ok.

bq. I'm afraid we will be retaining reference to backing Map and it will grow 
without bound? Is that possible?
It's even probable :-). I need to fix this.

bq. So, if default timeout for items in list is two seconds and socket timeout 
is 20 seconds, will items timeout in the dead list before we ever make use of 
it?

No, because the server is added to the list only after the timeout. So:
t0: connect starts, will wait 20s
t<19: all threads get in the queue because of the synchronized block
t20: timeout; the server is added to the dead list, synchronized lock freed
t21: all waiting threads get in and out quickly because of the dead server list
t22: if a new thread comes in for the wrong server, it will wait again
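The timeline above can be sketched as a small expiring "recently dead server" set. All names here are illustrative rather than the actual patch, and taking the clock as a parameter keeps the class testable without System.currentTimeMillis():

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the expiring dead-server list discussed above;
// class and method names are invented, not the names used in the patch.
public class RecentlyDeadServers {
    private final Map<String, Long> deadUntil = new ConcurrentHashMap<>();
    private final long retentionMs;

    public RecentlyDeadServers(long retentionMs) {
        this.retentionMs = retentionMs;
    }

    // Called when a connect attempt to a server times out.
    public void markDead(String server, long nowMs) {
        deadUntil.put(server, nowMs + retentionMs);
    }

    // Threads queued behind the synchronized connection setup can fail fast
    // if the target server was declared dead within the retention window.
    public boolean isDead(String server, long nowMs) {
        Long until = deadUntil.get(server);
        if (until == null) {
            return false;
        }
        if (nowMs >= until) {
            deadUntil.remove(server); // entry expired; forget the server
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RecentlyDeadServers dead = new RecentlyDeadServers(2000);
        dead.markDead("rs1:60020", 0);
        System.out.println(dead.isDead("rs1:60020", 1000)); // true
        System.out.println(dead.isDead("rs1:60020", 3000)); // false: expired
    }
}
```

Bounding the map (expiring entries on read, as here, or on write) addresses the "grows without bound" concern raised earlier in the review.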

bq. Use EnvironmentEdge instead of System.currentMillis...
Even in a unit test? I didn't know. Ok.

bq. You should fix the formatting adding spaces around '+' etc., so it has 
formatting like the rest of the code in this class... or is that how its done 
elsewhere in this class? If so, fine.
I will recheck. There are already some '+' like this in HBaseClient :-), but I 
will add spaces to mine.


bq. Is the HBaseRecoveryTestingUtility generally useful do you think? Maybe we 
don't add test but add this? (This class needs class comment explaining what 
goodies it has).
Yes, I think so. It helps to write more readable failure tests. It still has 
rough edges, and maybe bugs. Also, some functions should be in 
HBaseTestingUtility. It will be ready soon, but it's better to wait a little 
before committing...

Thanks for the feedback.

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt
>
>
> When a server host with a Region Server holding the .META. table is powered 
> down on a live cluster, the HBase cluster itself detects and reassigns the 
> .META. table, but connected HBase Clients take an excessively long time to 
> detect this and re-discover the reassigned .META. 
> Workaround: Decrease ipc.socket.timeout on the HBase Client side to a low 
> value (the default of 20s leads to a 35 minute recovery time; we got 
> acceptable results with 100ms, giving a 3 minute recovery). 
> This was found during some hardware failure testing scenarios. 
> Test Case:
> 1) Apply load via client app on HBase cluster for several minutes
> 2) Power down the region server holding the .META. server (i.e. power off ... 
> and keep it off)
> 3) Measure how long it takes for cluster to reassign META table and for 
> client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
> and DN on that host).
> Observation:
> 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
> recover (i.e. for the thread count to go back to normal) - no client calls 
> are serviced - they just back up on a synchronized method (see #2 below)
> 2) All the client app threads queue up behind the 
> oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
> After taking several thread dumps we found that the thread within this 
> synchronized method was blocked on  NetUtils.connect(this.socket, 
> remoteId.getAddress(), getSocketTimeout(conf));
> The client thread that gets the synchronized lock would try to connect to the 
> dead RS (till socket times out after 20s), retries, and then the next thread 
> gets in and so forth in a serial manner.
> Workaround:
> ---
> Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
> (1000 ms,  100 ms, etc) on the client side hbase-site.xml. With this setting, 
> the client threads recovered in a couple of minutes by failing fast and 
> re-discovering the .META. table on a reassigned RS.
> Assumption: This ipc.socket.timeout is only ever used during the initial 
> "HConnection" s
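The client-side workaround described above would be an hbase-site.xml fragment along these lines; the 1000 ms value is one of the examples from the description, not a recommendation:

```xml
<!-- Client-side hbase-site.xml: fail fast when connecting to a dead server. -->
<property>
  <name>ipc.socket.timeout</name>
  <!-- Milliseconds; the default of 20000 led to ~35 minute recovery. -->
  <value>1000</value>
</property>
```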

[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431905#comment-13431905
 ] 

stack commented on HBASE-6529:
--

bq. What if srcFs is an HFileSystem and fs is DistributedFileSystem ?

How could that happen?   HFileSystem is the fs used by the running 
HRegionServer.  The srcFs is a String passed into the server... You'd have to 
do something perverse having that be an HFileSystem specification?

To guard against such a thing happening, I suppose you could do the same 
getBackingFs if srcFs is an HFileSystem.

bq. The root cause is that we cannot rely on hadoop semantics for judging 
equality between HFileSystem and DistributedFileSystem.

How so?

HBase-6358 is about different FSs.  This issue seems to be about a case where 
the filesystems are the same, only the one used by the regionserver is wrapped 
in an HFileSystem, so the equals check fails.

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With HFile v2 implementation in HBase 0.94 & 0.96, the region server will use 
> HFileSystem as its {color:blue}fs{color}. When it performs bulk load in 
> Store.bulkLoadHFile(), it checks if its {color:blue}fs{color} is the same as 
> {color:blue}srcFs{color}, which however will be DistributedFileSystem. 
> Consequently, it will always perform an extra copy of source files.





[jira] [Commented] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431913#comment-13431913
 ] 

stack commented on HBASE-6321:
--

+1 on patch.  Does running of TestReplication/Source/Manager test this patch?

> ReplicationSource dies reading the peer's id
> 
>
> Key: HBASE-6321
> URL: https://issues.apache.org/jira/browse/HBASE-6321
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6321-0.94.patch
>
>
> This is what I saw:
> {noformat}
> 2012-07-01 05:04:01,638 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
> source 8 because an error occurred: Could not read peer's cluster id
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /va1-backup/hbaseid
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
> at 
> org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
> {noformat}
> The session should just be reopened.
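"Reopen and retry" can be sketched generically as below; the helper and its names are invented for illustration, and all ZooKeeper specifics (session re-creation, the actual exception type) are elided:

```java
import java.util.concurrent.Callable;

// Generic sketch of "reopen the session and retry" instead of letting the
// source die; callWithRetry and reconnect are hypothetical helpers.
public class RetryOnSessionExpiry {
    // Runs `action`, invoking `reconnect` (which would recreate the ZK
    // session) and retrying up to `maxRetries` extra times on failure.
    static <T> T callWithRetry(Callable<T> action, Runnable reconnect,
                               int maxRetries) throws Exception {
        Exception last = null;
        for (int i = 0; i <= maxRetries; i++) {
            try {
                return action.call();
            } catch (Exception e) { // e.g. SessionExpiredException
                last = e;
                reconnect.run();    // reopen the session rather than dying
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String id = callWithRetry(() -> {
            if (attempts[0]++ == 0) {
                throw new IllegalStateException("session expired");
            }
            return "peer-cluster-id";
        }, () -> System.out.println("reconnecting..."), 3);
        System.out.println(id); // peer-cluster-id
    }
}
```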





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431921#comment-13431921
 ] 

Zhihong Ted Yu commented on HBASE-6529:
---

bq.  The srcFs is a String passed into the server
srcPathStr is a String passed into the method. srcFs is the FileSystem 
corresponding to srcPathStr.

bq. you could do the same getBackingFs if srcFs is a HFileSystem.
Exactly. I suggest separating the logic of calling getBackingFs() and 
performing the equality check into a new method in, e.g., FSUtils.

bq. HBase-6358 is about different FSs
Fixes for both JIRAs center around the following line in Store.java:
{code}
-if (!srcFs.equals(fs)) {
{code}
Before HFileSystem came into play, the above check worked adequately.
As identified above, one filesystem is now wrapped in an HFileSystem, 
defeating the original intent of the check.

Please correct me if I am wrong.
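The unwrap-then-compare idea can be modeled in a few self-contained lines. HFileSystem and DistributedFileSystem are stand-in classes here, and the helper names are invented, so this is a sketch of the proposal rather than the actual FSUtils change:

```java
// Simplified model of the proposed check: unwrap any filesystem wrapper
// before comparing, so a wrapped and an unwrapped handle to the same
// filesystem compare equal. The real helper would live in FSUtils and
// operate on org.apache.hadoop.fs.FileSystem instances.
public class FsEquality {
    interface Fs {}
    static class DistributedFs implements Fs {}
    // Models HFileSystem, which wraps a backing filesystem.
    static class WrappingFs implements Fs {
        final Fs backing;
        WrappingFs(Fs backing) { this.backing = backing; }
    }

    static Fs unwrap(Fs fs) {
        return (fs instanceof WrappingFs) ? ((WrappingFs) fs).backing : fs;
    }

    // Proposed replacement for the bare `srcFs.equals(fs)` in bulkLoadHFile().
    static boolean sameFs(Fs a, Fs b) {
        return unwrap(a).equals(unwrap(b));
    }

    public static void main(String[] args) {
        Fs dfs = new DistributedFs();
        Fs wrapped = new WrappingFs(dfs);
        System.out.println(sameFs(wrapped, dfs)); // true: same backing fs
        System.out.println(wrapped.equals(dfs));  // false: the naive check
    }
}
```

With the naive check, the wrapped and unwrapped handles never compare equal, which is exactly the extra-copy bug this issue describes.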

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With HFile v2 implementation in HBase 0.94 & 0.96, the region server will use 
> HFileSystem as its {color:blue}fs{color}. When it performs bulk load in 
> Store.bulkLoadHFile(), it checks if its {color:blue}fs{color} is the same as 
> {color:blue}srcFs{color}, which however will be DistributedFileSystem. 
> Consequently, it will always perform an extra copy of source files.





[jira] [Commented] (HBASE-6308) Coprocessors should be loaded in a custom ClassLoader to prevent dependency conflicts with HBase

2012-08-09 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431936#comment-13431936
 ] 

Andrew Purtell commented on HBASE-6308:
---

I'll take this. Will add a simple test and commit.

> Coprocessors should be loaded in a custom ClassLoader to prevent dependency 
> conflicts with HBase
> 
>
> Key: HBASE-6308
> URL: https://issues.apache.org/jira/browse/HBASE-6308
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Affects Versions: 0.92.1, 0.94.0
>Reporter: James Baldassari
>Assignee: Andrew Purtell
> Fix For: 0.96.0
>
> Attachments: 6308-v2.txt, HBASE-6308-0.92.patch, 
> HBASE-6308-trunk.patch
>
>
> Currently each coprocessor is loaded with a URLClassLoader that puts the 
> coprocessor's jar at the beginning of the classpath.  The URLClassLoader 
> always tries to load classes from the parent ClassLoader first and only 
> attempts to load from its own configured URLs if the class was not found by 
> the parent.  This class loading behavior can be problematic for coprocessors 
> that have common dependencies with HBase but whose versions are incompatible. 
>  For example, I have a coprocessor that depends on a different version of 
> Avro than the version used by HBase.  The current class loading behavior 
> results in NoSuchMethodErrors in my coprocessor because some Avro classes 
> have already been loaded by HBase, and the ClassLoader for my coprocessor 
> picks up HBase's loaded classes first.
> My proposed solution to this problem is to use a custom ClassLoader when 
> instantiating coprocessor instances.  This custom ClassLoader would always 
> attempt to load classes from the coprocessor's jar first and would only 
> delegate to the parent ClassLoader if the class were not found in the 
> coprocessor jar.  However, certain classes would need to be exempt from this 
> behavior.  As an example, if the Coprocessor interface were loaded by both the 
> region server's ClassLoader and the coprocessor's custom ClassLoader, then 
> the region server would get a ClassCastException when attempting to cast the 
> coprocessor instance to the Coprocessor interface.  This problem can be 
> avoided by defining a set of class name prefixes that would be exempt from 
> loading by the custom ClassLoader.  When loading a class, if the class starts 
> with any of these prefixes (e.g. "org.apache.hadoop"), then the ClassLoader 
> would delegate immediately to the parent ClassLoader.
> I've already implemented a patch to provide this functionality which I'll 
> attach shortly.
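The loading order described above might look roughly like the following; the class name and the exempt-prefix list are illustrative, not the attached patch:

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch of a child-first loader with exempt prefixes, per the description.
public class ChildFirstClassLoader extends URLClassLoader {
    // Classes under these prefixes always come from the parent, so shared
    // interfaces like Coprocessor are loaded exactly once.
    private static final String[] EXEMPT_PREFIXES =
        { "java.", "org.apache.hadoop." };

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected synchronized Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        for (String prefix : EXEMPT_PREFIXES) {
            if (name.startsWith(prefix)) {
                return super.loadClass(name, resolve); // parent-first
            }
        }
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                c = findClass(name);                   // coprocessor jar first
            } catch (ClassNotFoundException e) {
                c = super.loadClass(name, resolve);    // fall back to parent
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }

    public static void main(String[] args) throws Exception {
        ChildFirstClassLoader cl = new ChildFirstClassLoader(
            new URL[0], ChildFirstClassLoader.class.getClassLoader());
        // Exempt prefix: java.lang.String comes from the parent as usual.
        System.out.println(cl.loadClass("java.lang.String") == String.class);
    }
}
```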





[jira] [Assigned] (HBASE-6308) Coprocessors should be loaded in a custom ClassLoader to prevent dependency conflicts with HBase

2012-08-09 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-6308:
-

Assignee: Andrew Purtell

> Coprocessors should be loaded in a custom ClassLoader to prevent dependency 
> conflicts with HBase
> 
>
> Key: HBASE-6308
> URL: https://issues.apache.org/jira/browse/HBASE-6308
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Affects Versions: 0.92.1, 0.94.0
>Reporter: James Baldassari
>Assignee: Andrew Purtell
> Fix For: 0.96.0
>
> Attachments: 6308-v2.txt, HBASE-6308-0.92.patch, 
> HBASE-6308-trunk.patch
>
>
> Currently each coprocessor is loaded with a URLClassLoader that puts the 
> coprocessor's jar at the beginning of the classpath.  The URLClassLoader 
> always tries to load classes from the parent ClassLoader first and only 
> attempts to load from its own configured URLs if the class was not found by 
> the parent.  This class loading behavior can be problematic for coprocessors 
> that have common dependencies with HBase but whose versions are incompatible. 
>  For example, I have a coprocessor that depends on a different version of 
> Avro than the version used by HBase.  The current class loading behavior 
> results in NoSuchMethodErrors in my coprocessor because some Avro classes 
> have already been loaded by HBase, and the ClassLoader for my coprocessor 
> picks up HBase's loaded classes first.
> My proposed solution to this problem is to use a custom ClassLoader when 
> instantiating coprocessor instances.  This custom ClassLoader would always 
> attempt to load classes from the coprocessor's jar first and would only 
> delegate to the parent ClassLoader if the class were not found in the 
> coprocessor jar.  However, certain classes would need to be exempt from this 
> behavior.  As an example, if the Coprocessor interface were loaded by both the 
> region server's ClassLoader and the coprocessor's custom ClassLoader, then 
> the region server would get a ClassCastException when attempting to cast the 
> coprocessor instance to the Coprocessor interface.  This problem can be 
> avoided by defining a set of class name prefixes that would be exempt from 
> loading by the custom ClassLoader.  When loading a class, if the class starts 
> with any of these prefixes (e.g. "org.apache.hadoop"), then the ClassLoader 
> would delegate immediately to the parent ClassLoader.
> I've already implemented a patch to provide this functionality which I'll 
> attach shortly.





[jira] [Commented] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431937#comment-13431937
 ] 

Zhihong Ted Yu commented on HBASE-6437:
---

{code}
+// if master not initialized, don't run balancer.
+if (!this.initialized) {
+  LOG.debug("Master not yet initialized to run balancer.");
{code}
If you put the comment in the log, it would be clearer.
+1 overall.
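Folding the comment into the log message, the guard could read like this self-contained toy; the real check lives in HMaster.balance() and uses its LOG and initialized flag, so the names here are stand-ins:

```java
// Toy model of the "don't balance before the master is initialized" guard.
public class BalanceGuard {
    boolean initialized = false;

    boolean balance() {
        if (!initialized) {
            // Per the review comment, the explanation goes into the log
            // message itself rather than a separate code comment.
            System.out.println(
                "Master has not been initialized; skipping balance request.");
            return false;
        }
        System.out.println("Running balancer.");
        return true;
    }

    public static void main(String[] args) {
        BalanceGuard master = new BalanceGuard();
        System.out.println(master.balance()); // false: not yet initialized
        master.initialized = true;
        System.out.println(master.balance()); // true
    }
}
```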

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_trunk.patch
>
>
> In HBASE-5850 many of the admin operations were blocked until the master 
> initializes, but the balancer was not.  This JIRA extends the use of 
> PleaseHoldException to admin.balance() calls made before the master is 
> initialized.





[jira] [Commented] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431942#comment-13431942
 ] 

Hadoop QA commented on HBASE-6437:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12540040/HBASE-6437_trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2539//console

This message is automatically generated.

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_trunk.patch
>
>
> In HBASE-5850 many of the admin operations were blocked until the master 
> initializes, but the balancer was not.  This JIRA extends the use of 
> PleaseHoldException to admin.balance() calls made before the master is 
> initialized.





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431950#comment-13431950
 ] 

stack commented on HBASE-6529:
--

You do not address my comment asking how this could happen at all.  How could 
'srcPath.getFileSystem(conf);' return an HFileSystem instance?

You also make the statement '...we cannot rely on hadoop semantics for judging 
equality between HFileSystem and DistributedFileSystem...'  I'd be interested 
in hearing more about how that is so. 

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With HFile v2 implementation in HBase 0.94 & 0.96, the region server will use 
> HFileSystem as its {color:blue}fs{color}. When it performs bulk load in 
> Store.bulkLoadHFile(), it checks if its {color:blue}fs{color} is the same as 
> {color:blue}srcFs{color}, which however will be DistributedFileSystem. 
> Consequently, it will always perform an extra copy of source files.





[jira] [Assigned] (HBASE-6526) Document FuzzyRowFilter

2012-08-09 Thread Alex Baranau (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Baranau reassigned HBASE-6526:
---

Assignee: Alex Baranau

> Document FuzzyRowFilter
> ---
>
> Key: HBASE-6526
> URL: https://issues.apache.org/jira/browse/HBASE-6526
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Ted Yu
>Assignee: Alex Baranau
>
> We should document the usage of FuzzyRowFilter in HBase book / manual





[jira] [Commented] (HBASE-6526) Document FuzzyRowFilter

2012-08-09 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431966#comment-13431966
 ] 

Alex Baranau commented on HBASE-6526:
-

As far as I understand, I need to add the documentation in src/docbkx/book.xml, 
right?
Any special guides, format guides, etc.? Or should I just do it by analogy 
with the existing entries?

I also found filters mentioned in external_apis.xml. Should I add 
documentation there too?

Thanks 

> Document FuzzyRowFilter
> ---
>
> Key: HBASE-6526
> URL: https://issues.apache.org/jira/browse/HBASE-6526
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Ted Yu
>Assignee: Alex Baranau
>
> We should document the usage of FuzzyRowFilter in HBase book / manual





[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6437:
--

Attachment: HBASE-6437_94.patch
HBASE-6437_92.patch

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_94.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations were blocked until the master 
> initializes, but the balancer was not.  This JIRA extends the use of 
> PleaseHoldException to admin.balance() calls made before the master is 
> initialized.





[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6437:
--

Attachment: HBASE-6437_trunk_2.patch

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_94.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations were blocked until the master 
> initializes, but the balancer was not.  This JIRA extends the use of 
> PleaseHoldException to admin.balance() calls made before the master is 
> initialized.





[jira] [Commented] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431976#comment-13431976
 ] 

rajeshbabu commented on HBASE-6437:
---

@Ted
Updated. Thanks for review.

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_94.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations were blocked until the master 
> initializes, but the balancer was not.  This JIRA extends the use of 
> PleaseHoldException to admin.balance() calls made before the master is 
> initialized.





[jira] [Updated] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6165:
---

Attachment: HBase-6165-v1.patch

I hit this problem while testing a long-running replication setup. All priority 
handlers were blocked by the replicateLogEntries method, and the cluster became 
unresponsive.

Attached is a patch which does the following:
a) Adds a different QOS level, customQOS. Methods with this attribute will be 
processed by a new set of handlers.
b) Adds customPriorityHandlers, a new set of handlers in the regionserver.

ReplicationSink#replicateLogEntries uses this attribute. 

Testing: Jenkins is green. I have a long-running replication setup, and it has 
been up for a few days.
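The handler split described above can be sketched generically (the names QosLevel, customQOS-style pool sizing, and the dispatcher class below are illustrative, not the actual patch): requests are routed to separate executor pools by their QOS level, so a flood of one call type cannot starve the handlers reserved for another.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: route calls to separate handler pools keyed by QOS level, so
// a flood of replication calls cannot occupy the handlers reserved for meta
// operations. Names (QosDispatcher, QosLevel) are illustrative, not HBase's.
public class QosDispatcher {
  enum QosLevel { NORMAL, CUSTOM, HIGH }

  private final Map<QosLevel, ExecutorService> pools = new EnumMap<>(QosLevel.class);

  QosDispatcher(int normalHandlers, int customHandlers, int highHandlers) {
    pools.put(QosLevel.NORMAL, Executors.newFixedThreadPool(normalHandlers));
    pools.put(QosLevel.CUSTOM, Executors.newFixedThreadPool(customHandlers));
    pools.put(QosLevel.HIGH,   Executors.newFixedThreadPool(highHandlers));
  }

  <T> Future<T> submit(QosLevel level, Callable<T> call) {
    return pools.get(level).submit(call);
  }

  void shutdown() {
    pools.values().forEach(ExecutorService::shutdown);
  }

  public static void main(String[] args) throws Exception {
    QosDispatcher d = new QosDispatcher(2, 1, 1);
    // Even if the CUSTOM pool were saturated by replication calls, the HIGH
    // (meta) pool still has a free handler to serve this request.
    Future<String> meta = d.submit(QosLevel.HIGH, () -> "meta scan ok");
    System.out.println(meta.get());
    d.shutdown();
  }
}
```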

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431984#comment-13431984
 ] 

Zhihong Ted Yu commented on HBASE-6529:
---

Looking at HFileSystem more closely, I don't see HFileSystem being cached in 
FileSystem.Cache, so srcPath.getFileSystem(conf) wouldn't return an HFileSystem 
instance.

This explains why the following check would always be true (fs being an 
HFileSystem instance):
{code}
if (!srcFs.equals(fs)) {
{code}
I searched through the FilterFileSystem and FileSystem source code: neither 
implements an equals() method, meaning identity equality is used in the above 
check.
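The identity-equality pitfall can be shown with a minimal sketch (a plain class with no Hadoop dependency, purely illustrative): a class that does not override equals() inherits Object's reference comparison, so two logically equivalent instances never compare equal.

```java
// Illustrative only: like FileSystem/FilterFileSystem, the Fs class below does
// not override equals(), so Object's identity comparison is used.
public class IdentityEqualsDemo {
  static class Fs {
    final String uri;
    Fs(String uri) { this.uri = uri; }
    // No equals()/hashCode() override: inherits identity semantics.
  }

  public static void main(String[] args) {
    Fs a = new Fs("hdfs://cluster");
    Fs b = new Fs("hdfs://cluster");
    // Same logical target, but identity equality says they differ. This is
    // why a check like `!srcFs.equals(fs)` can be true even when both
    // filesystems point at the same cluster.
    System.out.println(a.equals(b)); // prints false
    System.out.println(a.equals(a)); // prints true
  }
}
```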

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With HFile v2 implementation in HBase 0.94 & 0.96, the region server will use 
> HFileSystem as its {color:blue}fs{color}. When it performs bulk load in 
> Store.bulkLoadHFile(), it checks if its {color:blue}fs{color} is the same as 
> {color:blue}srcFs{color}, which however will be DistributedFileSystem. 
> Consequently, it will always perform an extra copy of source files.





[jira] [Commented] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431986#comment-13431986
 ] 

Zhihong Ted Yu commented on HBASE-6437:
---

How about the following:
{code}
+  LOG.debug("Master has not been initialized, don't run balancer.");
{code}
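The guard being discussed can be sketched minimally (the class and method names below are hypothetical, not the actual patch): the balance entry point rejects calls until initialization completes, mirroring how HBASE-5850 guards other admin operations with PleaseHoldException.

```java
// Sketch only: reject balance() until the master has finished initializing.
// Class name and exception wiring are illustrative, not HBase's actual code.
public class MasterSketch {
  static class PleaseHoldException extends Exception {
    PleaseHoldException(String msg) { super(msg); }
  }

  private volatile boolean initialized = false;

  void markInitialized() { initialized = true; }

  boolean balance() throws PleaseHoldException {
    if (!initialized) {
      // The debug message suggested in the review comment above.
      System.out.println("Master has not been initialized, don't run balancer.");
      throw new PleaseHoldException("Master is initializing");
    }
    return true; // a real master would invoke the load balancer here
  }

  public static void main(String[] args) throws Exception {
    MasterSketch m = new MasterSketch();
    try {
      m.balance();
    } catch (PleaseHoldException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    m.markInitialized();
    System.out.println("balanced: " + m.balance());
  }
}
```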

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_94.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations have been blocked till the master 
> initializes.  But the balancer is not.  So this JIRA is to extend the 
> PleaseHoldException in case of admin.balance() call before master is 
> initialized.





[jira] [Commented] (HBASE-6526) Document FuzzyRowFilter

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431990#comment-13431990
 ] 

Zhihong Ted Yu commented on HBASE-6526:
---

Where is external_apis.xml located in the source repo? I can't seem to find it.

> Document FuzzyRowFilter
> ---
>
> Key: HBASE-6526
> URL: https://issues.apache.org/jira/browse/HBASE-6526
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Ted Yu
>Assignee: Alex Baranau
>
> We should document the usage of FuzzyRowFilter in HBase book / manual





[jira] [Commented] (HBASE-6526) Document FuzzyRowFilter

2012-08-09 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431995#comment-13431995
 ] 

Jonathan Hsieh commented on HBASE-6526:
---

I don't know the exact filenames, but if what you refer to turns into these 
sections, doing it by analogy would be fine.  You should make sure to add a 
little info about why/when you would use this instead of just a RowKeyFilter + 
RegexStringComparator:

http://hbase.apache.org/book.html#client.filter
http://hbase.apache.org/book.html#external_apis
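As background for the doc, FuzzyRowFilter's core matching idea can be sketched without the HBase API (this illustrates the rule only, not the actual class or its fuzzy-info encoding): a mask marks which row-key positions are fixed and which may vary, exactly the fixed-layout-key case that a regex comparator handles less efficiently.

```java
import java.nio.charset.StandardCharsets;

// Illustration of FuzzyRowFilter's matching idea, not the HBase API:
// mask byte 0 = position must match the pattern, 1 = position may be anything.
public class FuzzyMatchDemo {
  static boolean fuzzyMatch(byte[] row, byte[] pattern, byte[] mask) {
    if (row.length < pattern.length) return false;
    for (int i = 0; i < pattern.length; i++) {
      if (mask[i] == 0 && row[i] != pattern[i]) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    // Hypothetical key layout "<4-char userid>_login"; match any user's
    // login rows while the userid bytes stay fuzzy.
    byte[] pattern = "????_login".getBytes(StandardCharsets.UTF_8);
    byte[] mask    = {1, 1, 1, 1, 0, 0, 0, 0, 0, 0};
    System.out.println(
        fuzzyMatch("ab01_login".getBytes(StandardCharsets.UTF_8), pattern, mask)); // true
    System.out.println(
        fuzzyMatch("ab01_click".getBytes(StandardCharsets.UTF_8), pattern, mask)); // false
  }
}
```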

> Document FuzzyRowFilter
> ---
>
> Key: HBASE-6526
> URL: https://issues.apache.org/jira/browse/HBASE-6526
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhihong Ted Yu
>Assignee: Alex Baranau
>
> We should document the usage of FuzzyRowFilter in HBase book / manual





[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

2012-08-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432004#comment-13432004
 ] 

ramkrishna.s.vasudevan commented on HBASE-6317:
---

Please review the patch and provide your comments.

> Master clean start up and Partially enabled tables make region assignment 
> inconsistent.
> ---
>
> Key: HBASE-6317
> URL: https://issues.apache.org/jira/browse/HBASE-6317
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6317_94.patch, HBASE-6317_94_3.patch, 
> HBASE-6317_trunk_2.patch
>
>
> If we have a  table in partially enabled state (ENABLING) then on HMaster 
> restart we treat it as a clean cluster start up and do a bulk assign.  
> Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios and it 
> leads to region assignment problems.  Analysing more on this we found that we 
> have better way to handle these scenarios.
> {code}
> if (false == checkIfRegionBelongsToDisabled(regionInfo)
> && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
>   synchronized (this.regions) {
> regions.put(regionInfo, regionLocation);
> addToServers(regionLocation, regionInfo);
>   }
> {code}
> We dont add to regions map so that enable table handler can handle it.  But 
> as nothing is added to regions map we think it as a clean cluster start up.
> Will come up with a patch tomorrow.





[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6437:
--

Attachment: HBASE-6437_94.patch
HBASE-6437_92.patch

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_92.patch, 
> HBASE-6437_94.patch, HBASE-6437_94.patch, HBASE-6437_trunk.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations have been blocked till the master 
> initializes.  But the balancer is not.  So this JIRA is to extend the 
> PleaseHoldException in case of admin.balance() call before master is 
> initialized.





[jira] [Updated] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread rajeshbabu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rajeshbabu updated HBASE-6437:
--

Attachment: HBASE-6437_trunk.patch

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_92.patch, 
> HBASE-6437_94.patch, HBASE-6437_94.patch, HBASE-6437_trunk.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations have been blocked till the master 
> initializes.  But the balancer is not.  So this JIRA is to extend the 
> PleaseHoldException in case of admin.balance() call before master is 
> initialized.





[jira] [Commented] (HBASE-6438) RegionAlreadyInTransitionException needs to give more info to avoid assignment inconsistencies

2012-08-09 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432008#comment-13432008
 ] 

ramkrishna.s.vasudevan commented on HBASE-6438:
---

This patch solves the problem of depending on the TM in case of a RIT exception. 
Please provide your suggestions.

> RegionAlreadyInTransitionException needs to give more info to avoid 
> assignment inconsistencies
> --
>
> Key: HBASE-6438
> URL: https://issues.apache.org/jira/browse/HBASE-6438
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Attachments: HBASE-6438_trunk.patch
>
>
> Seeing some of the recent issues in region assignment, 
> RegionAlreadyInTransitionException is one reason after which the region 
> assignment may or may not happen(in the sense we need to wait for the TM to 
> assign).
> In HBASE-6317 we got one problem due to RegionAlreadyInTransitionException on 
> master restart.
> Consider the following case, due to some reason like master restart or 
> external assign call, we try to assign a region that is already getting 
> opened in a RS.
> Now the next call to assign has already changed the state of the znode, so 
> the current assign going on in the RS is affected and it fails.  The 
> second assignment that started also fails, getting a RAITE exception.  
> Finally, neither assignment carries on.  The idea is to find whether any such 
> RAITE exception can be retried or not.
> Here again we have following cases like where
> -> The znode is yet to transitioned from OFFLINE to OPENING in RS
> -> RS may be in the step of openRegion.
> -> RS may be trying to transition OPENING to OPENED.
> -> RS is yet to add to online regions in the RS side.
> Here, any failure in openRegion() or updateMeta() moves the znode to 
> FAILED_OPEN, so in these cases getting a RAITE should be ok.  But in other 
> cases the assignment is stopped.
> The idea is to add the current state of the region assignment to the RIT 
> map on the RS side; using that info we can determine whether the 
> assignment can be retried on getting a RAITE.
> Considering the current work going on in the AM, please do share whether this 
> is needed at least in the 0.92/0.94 versions.
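The retry decision proposed above can be sketched as follows (the enum, stage names, and method below are hypothetical, not the eventual patch): the RS records how far an open has progressed, and the master retries after a RAITE only for the stages where a failure moves the znode to FAILED_OPEN.

```java
// Sketch only: decide whether an assignment that hit RAITE can be retried,
// based on how far the region server got. All names are illustrative.
public class RaiteRetrySketch {
  enum OpenStage {
    OFFLINE_TO_OPENING,  // znode not yet transitioned
    OPEN_REGION,         // failures here move the znode to FAILED_OPEN
    UPDATE_META,         // failures here also move the znode to FAILED_OPEN
    OPENING_TO_OPENED,   // transition in progress
    ADD_TO_ONLINE        // region about to go online on the RS
  }

  static boolean canRetry(OpenStage stage) {
    switch (stage) {
      case OPEN_REGION:
      case UPDATE_META:
        return true;  // on failure the znode ends up FAILED_OPEN, retry is safe
      default:
        return false; // let the in-flight open finish instead of retrying
    }
  }

  public static void main(String[] args) {
    System.out.println(canRetry(OpenStage.OPEN_REGION));        // true
    System.out.println(canRetry(OpenStage.OFFLINE_TO_OPENING)); // false
  }
}
```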





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432014#comment-13432014
 ] 

Elliott Clark commented on HBASE-6165:
--

A better name is probably needed for the queue.  "Custom" doesn't really get 
across what can go into that QOS level (replication).
Since this starts 0 "custom" priority handlers by default, it will add another 
undocumented step when enabling replication.  We should either make the number 
of handlers started by default > 0, or have the number depend on whether 
replication is enabled.
Why choose the number 5 for the priority?  Because the QOS_THRESHOLD is 10? 
(Even if the numbers are arbitrary, it seems like we should have some reason 
and a comment about the numbering scheme.)


Thanks for doing this.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432015#comment-13432015
 ] 

Lars Hofhansl commented on HBASE-6165:
--

Patch looks good generally. A few comments:
# The naming is weird. These are not "Custom"QOS but "Medium"QOS methods, 
right?
# Is there a way to generalize this to sets of handlers with different 
priorities (not important, though)?
# By default now (if hbase.regionserver.custom.priority.handler.count is not 
set), replicateWALEntry would use non-priority handlers... which is not right, 
I think. It should revert to the current behavior in that case (which is to 
use the priorityQOS).

What I still do not understand... Does this problem always happen? Does it 
happen because replicateWALEntry takes too long to finish? Does this only 
happen when the slave is already degraded for other reasons? Should we also 
work on replicateWALEntry failing faster in case of problems (shorter/fewer 
retries, etc)?

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432016#comment-13432016
 ] 

Zhihong Ted Yu commented on HBASE-6437:
---

+1 on latest patch.

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_92.patch, 
> HBASE-6437_94.patch, HBASE-6437_94.patch, HBASE-6437_trunk.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations have been blocked till the master 
> initializes.  But the balancer is not.  So this JIRA is to extend the 
> PleaseHoldException in case of admin.balance() call before master is 
> initialized.





[jira] [Updated] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6165:
--

Status: Patch Available  (was: Open)

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6520) MSLab May cause the Bytes.toLong not work correctly for increment

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432020#comment-13432020
 ] 

Lars Hofhansl commented on HBASE-6520:
--

Hmm... The OOM failures in the 0.94 build must be unrelated (I don't see how 
this change can use up more memory than before).

> MSLab May cause the Bytes.toLong not work correctly for increment
> -
>
> Key: HBASE-6520
> URL: https://issues.apache.org/jira/browse/HBASE-6520
> Project: HBase
>  Issue Type: Bug
>Reporter: ShiXing
>Assignee: ShiXing
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6520-0.94-v1.patch, HBASE-6520-trunk-v1.patch
>
>
> When MemStoreLAB is used, the KeyValues share the byte array allocated by 
> the MemStoreLAB: all the KeyValues' "bytes" attributes refer to the same byte 
> array. When we use functions such as Bytes.toLong(byte[] bytes, int offset):
> {code}
>   public static long toLong(byte[] bytes, int offset) {
> return toLong(bytes, offset, SIZEOF_LONG);
>   }
>   public static long toLong(byte[] bytes, int offset, final int length) {
> if (length != SIZEOF_LONG || offset + length > bytes.length) {
>   throw explainWrongLengthOrOffset(bytes, offset, length, SIZEOF_LONG);
> }
> long l = 0;
> for(int i = offset; i < offset + length; i++) {
>   l <<= 8;
>   l ^= bytes[i] & 0xFF;
> }
> return l;
>   }
> {code}
> If we do not put a long value into the KeyValue, but read it as a long value 
> in HRegion.increment(), the check 
> {code}
> offset + length > bytes.length
> {code}
> takes no effect, because bytes.length is not equal to 
> keyLength+valueLength; it is actually the MemStoreLAB chunk size, which 
> defaults to 2048 * 1024.
> I will paste the patch later.
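The failing guard can be reproduced with a minimal standalone copy of the quoted toLong() (the chunk size, offsets, and neighbor bytes below are illustrative): when the value lives inside a large shared chunk, `offset + length > bytes.length` never fires, and the read silently picks up the next KeyValue's bytes.

```java
// Minimal reproduction of the quoted toLong() against an MSLAB-style shared
// chunk. Offsets and sizes are illustrative.
public class SharedChunkToLongDemo {
  static final int SIZEOF_LONG = 8;

  static long toLong(byte[] bytes, int offset, int length) {
    if (length != SIZEOF_LONG || offset + length > bytes.length) {
      throw new IllegalArgumentException("wrong length or offset");
    }
    long l = 0;
    for (int i = offset; i < offset + length; i++) {
      l <<= 8;
      l ^= bytes[i] & 0xFF;
    }
    return l;
  }

  public static void main(String[] args) {
    byte[] chunk = new byte[2048 * 1024]; // one big MSLAB-style chunk
    int valueOffset = 100;
    chunk[valueOffset + 3] = 1;  // a 4-byte int value: 1
    chunk[valueOffset + 4] = 42; // first byte of the *next* KeyValue

    // A standalone 4-byte value array would trip the guard:
    boolean threw = false;
    try {
      toLong(new byte[4], 0, SIZEOF_LONG);
    } catch (IllegalArgumentException e) {
      threw = true;
    }
    System.out.println(threw); // prints true

    // But inside the shared chunk the guard passes, and the returned long
    // mixes in the neighbor's bytes instead of the stored int value.
    long l = toLong(chunk, valueOffset, SIZEOF_LONG);
    System.out.println(l != 1L); // prints true: not the int value 1
  }
}
```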





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432022#comment-13432022
 ] 

Zhihong Ted Yu commented on HBASE-6165:
---

w.r.t. the default value for hbase.regionserver.custom.priority.handler.count, 
I agree with Lars and Elliott that the default should be > 0.
Actually we should check the actual value: if the user specifies 0 and either 
replication or security is enabled, we should raise the value to, say, 3.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432023#comment-13432023
 ] 

Elliott Clark commented on HBASE-6165:
--

@Lars
We had this happen when a large cluster was replicating to a small cluster:
Source (large cluster)
Sink (small cluster)

After the sink goes down or restarts, the source waits for meta to come up.  
After that, lots of replicated WAL edits are shipped to all the servers; so 
many, in fact, that the server holding meta does not have any handlers left to 
answer meta scans or edits.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

2012-08-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432024#comment-13432024
 ] 

Jimmy Xiang commented on HBASE-6317:


I was thinking about the issue. When rebuilding the user regions, if a region 
is already assigned, why shouldn't we put it in the map even if the table is 
disabled or enabling? We can't pretend it is not assigned, right? So, if such 
regions are in the map, when bulk assignment happens, they are already online.

In forceRegionStateToOffline we currently just check the region transition 
state.  We need to check the current region state too when the region is not in 
transition: if it is not in transition but assigned, we don't need to assign it 
any more.  Will this solve the issue?


> Master clean start up and Partially enabled tables make region assignment 
> inconsistent.
> ---
>
> Key: HBASE-6317
> URL: https://issues.apache.org/jira/browse/HBASE-6317
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6317_94.patch, HBASE-6317_94_3.patch, 
> HBASE-6317_trunk_2.patch
>
>
> If we have a  table in partially enabled state (ENABLING) then on HMaster 
> restart we treat it as a clean cluster start up and do a bulk assign.  
> Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios and it 
> leads to region assignment problems.  Analysing more on this we found that we 
> have better way to handle these scenarios.
> {code}
> if (false == checkIfRegionBelongsToDisabled(regionInfo)
> && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
>   synchronized (this.regions) {
> regions.put(regionInfo, regionLocation);
> addToServers(regionLocation, regionInfo);
>   }
> {code}
> We dont add to regions map so that enable table handler can handle it.  But 
> as nothing is added to regions map we think it as a clean cluster start up.
> Will come up with a patch tomorrow.





[jira] [Updated] (HBASE-6407) Investigate moving to DI (guice) framework for plugin arch.

2012-08-09 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6407:
-

Attachment: HBASE-6407-5.patch

Fixed some tests, and showed how to mock up some more of the internals of 
HRegionServer.

> Investigate moving to DI (guice) framework for plugin arch.
> ---
>
> Key: HBASE-6407
> URL: https://issues.apache.org/jira/browse/HBASE-6407
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6407-1.patch, HBASE-6407-2.patch, 
> HBASE-6407-3.patch, HBASE-6407-4.patch, HBASE-6407-5.patch
>
>
> Investigate using Guice to inject the correct compat object provided by 
> compat plugins





[jira] [Updated] (HBASE-6407) Investigate moving to DI (guice) framework for plugin arch.

2012-08-09 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6407:
-

Status: Patch Available  (was: Open)

> Investigate moving to DI (guice) framework for plugin arch.
> ---
>
> Key: HBASE-6407
> URL: https://issues.apache.org/jira/browse/HBASE-6407
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6407-1.patch, HBASE-6407-2.patch, 
> HBASE-6407-3.patch, HBASE-6407-4.patch, HBASE-6407-5.patch
>
>
> Investigate using Guice to inject the correct compat object provided by 
> compat plugins





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432029#comment-13432029
 ] 

Himanshu Vashishtha commented on HBASE-6165:


[~eclark]: I used "custom" because the current naming scheme is not appropriate 
in my opinion (I started with medium/semi QOS, but then changed it to custom). 
Using "priority" is kind of a misnomer, as there is no priority as such; it's 
just a different set of handlers serving the requests.
Though we call them priorityHandlers, etc., they are just regular handlers used 
for meta operations. I think we should change their name to metaOpsHandlers 
(or metaHandlers). Yes, I just used a threshold between 0 and 10.

bq. Since this starts 0 "custom" priority handlers by default it will add 
another undocumented step when enabling replication. We should either make the 
number of handlers start by default > 0, or have the number depend on if 
replication is enabled.
I am ok with a default > 0; I don't think it should be tied to replication, as 
the handlers can be used for other methods too (such as security, etc.).

@Lars: 
bq. The naming is weird. These are not "Custom"QOS, but "Medium"QOS methods, 
right?
I hope the rationale is clearer now.

bq. By default now (if hbase.regionserver.custom.priority.handler.count is not 
set), replicateWALEntry would use non-priority handlers... Which is not right, 
I think. It should revert back to the current behavior in that case (which is 
to do use the priorityQOS.
default > 0 sounds good?


bq. What I still do not understand... Does this problem always happen? Does it 
happen because replicateWALEntry takes too long to finish? Does this only 
happen when the slave is already degraded for other reasons? Should we also 
work on replicateWALEntry failing faster in case of problems (shorter/fewer 
retries, etc)?

It can occur when the slave cluster is slow, and whenever it happens it makes 
the entire cluster unresponsive. I have a patch which adds fail-fast behavior 
in the sink and have been testing it too; it looks good so far. I tried 
creating a new JIRA but got an IOE while creating it (see INFRA-5131). I will 
attach the patch once it is created.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6524) Hooks for hbase tracing

2012-08-09 Thread Jonathan Leavitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Leavitt updated HBASE-6524:


Attachment: (was: patch4.txt)

> Hooks for hbase tracing
> ---
>
> Key: HBASE-6524
> URL: https://issues.apache.org/jira/browse/HBASE-6524
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Leavitt
> Attachments: patch6.txt
>
>
> Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
> library to add dapper-like tracing to hbase.





[jira] [Updated] (HBASE-6524) Hooks for hbase tracing

2012-08-09 Thread Jonathan Leavitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Leavitt updated HBASE-6524:


Status: Patch Available  (was: Open)

uploading updated patch with changes from stack and todd

> Hooks for hbase tracing
> ---
>
> Key: HBASE-6524
> URL: https://issues.apache.org/jira/browse/HBASE-6524
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Leavitt
> Attachments: patch6.txt
>
>
> Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
> library to add dapper-like tracing to hbase.





[jira] [Updated] (HBASE-6524) Hooks for hbase tracing

2012-08-09 Thread Jonathan Leavitt (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Leavitt updated HBASE-6524:


Attachment: patch6.txt

new patch with changes from stack and todd.

> Hooks for hbase tracing
> ---
>
> Key: HBASE-6524
> URL: https://issues.apache.org/jira/browse/HBASE-6524
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Jonathan Leavitt
> Attachments: patch6.txt
>
>
> Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
> library to add dapper-like tracing to hbase.





[jira] [Commented] (HBASE-6437) Avoid admin.balance during master initialize

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432043#comment-13432043
 ] 

Hadoop QA commented on HBASE-6437:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12540079/HBASE-6437_trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.catalog.TestMetaReaderEditor

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2541//console

This message is automatically generated.

> Avoid admin.balance during master initialize
> 
>
> Key: HBASE-6437
> URL: https://issues.apache.org/jira/browse/HBASE-6437
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6437_92.patch, HBASE-6437_92.patch, 
> HBASE-6437_94.patch, HBASE-6437_94.patch, HBASE-6437_trunk.patch, 
> HBASE-6437_trunk.patch, HBASE-6437_trunk_2.patch
>
>
> In HBASE-5850 many of the admin operations have been blocked till the master 
> initializes.  But the balancer is not.  So this JIRA is to extend the 
> PleaseHoldException in case of admin.balance() call before master is 
> initialized.





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432045#comment-13432045
 ] 

Lars Hofhansl commented on HBASE-6165:
--

@Himanshu: Thanks. Yes, that makes sense. I like MetaHandlers.
Re: failing fast: I think instead of using an HTablePool, the sink should create 
a Connection and ThreadPool and then create HTables on demand using these (see 
HBASE-4805), together with short timeouts and few retries.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Updated] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6165:
-

Fix Version/s: 0.94.2
   0.96.0

This should be in 0.94

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432050#comment-13432050
 ] 

Zhihong Ted Yu commented on HBASE-6165:
---

+1 on shifting away from using HTablePool in the JIRA for fail-fast.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Updated] (HBASE-6414) Remove the WritableRpcEngine & associated Invocation classes

2012-08-09 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6414:
---

Attachment: 6414-3.patch.txt

Patch from RB. Attaching here to get it through Hudson (the previous attempt 
failed).

> Remove the WritableRpcEngine & associated Invocation classes
> 
>
> Key: HBASE-6414
> URL: https://issues.apache.org/jira/browse/HBASE-6414
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6414-1.patch.txt, 6414-3.patch.txt, 
> 6414-initial.patch.txt, 6414-initial.patch.txt
>
>
> Remove the WritableRpcEngine & Invocation classes once HBASE-5705 gets 
> committed and all the protocols are rebased to use PB.
> Raising this jira in advance..





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432053#comment-13432053
 ] 

Himanshu Vashishtha commented on HBASE-6165:


Lars, Ted, and Elliott: thanks for the feedback.


@Lars: Changing the name is beyond the scope of this jira, no? Another jira for 
that?
Re: fail-fast: Yeah, the patch still uses HTablePool, but it submits the batch in 
a thread pool (owned by ReplicationSink). Meanwhile, while waiting for the task 
to finish, the handler keeps checking whether the client is still alive; if the 
client is gone, it cancels the task.
Also, ReplicationSink now has its own conf object, which it can decorate with 
its own timeout, number of retries, etc. Is there an open jira for 
ReplicationSink (can't create a jira yet)?
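The submit-then-poll-then-cancel pattern described above can be sketched in plain Java. This is an illustrative sketch, not the actual patch: `applyBatch`, the poll interval, and the `clientAlive` check are all assumed names standing in for the real ReplicationSink plumbing.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

/**
 * Sketch: run a replication batch in a thread pool and poll whether the
 * calling client is still connected while waiting; cancel the batch when
 * the client has gone away, so the handler fails fast instead of blocking.
 */
public class FailFastSinkSketch {
    static final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** Returns true if the batch completed; false if cancelled or timed out. */
    static boolean applyBatch(Callable<Void> batch, BooleanSupplier clientAlive,
                              long pollMillis, int maxPolls) throws Exception {
        Future<Void> f = pool.submit(batch);
        for (int i = 0; i < maxPolls; i++) {
            try {
                f.get(pollMillis, TimeUnit.MILLISECONDS);
                return true;                        // batch finished
            } catch (TimeoutException e) {
                if (!clientAlive.getAsBoolean()) {  // client disconnected: fail fast
                    f.cancel(true);
                    return false;
                }
            }
        }
        f.cancel(true);                             // overall deadline exceeded
        return false;
    }

    public static void main(String[] args) throws Exception {
        // A fast batch with a live client completes normally.
        System.out.println(applyBatch(() -> null, () -> true, 50, 10));
        // A slow batch with a departed client gets cancelled on the first poll.
        System.out.println(applyBatch(
            () -> { Thread.sleep(5000); return null; }, () -> false, 50, 10));
        pool.shutdownNow();
    }
}
```

The handler thread itself never blocks for longer than one poll interval, which is what keeps the server responsive when the batch stalls.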

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6282) The introspection, etc. of objects in the RPC has to be handled for PB objects

2012-08-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432055#comment-13432055
 ] 

Devaraj Das commented on HBASE-6282:


BTW the patch in HBASE-6414 addresses the QosFunction part.

> The introspection, etc. of objects in the RPC has to be handled for PB objects
> --
>
> Key: HBASE-6282
> URL: https://issues.apache.org/jira/browse/HBASE-6282
> Project: HBase
>  Issue Type: Bug
>  Components: ipc
>Reporter: Devaraj Das
>Priority: Blocker
> Fix For: 0.96.0
>
>
> The places where the type of objects are inspected need to be updated to take 
> into consideration PB types. I have noticed Objects.describeQuantity being 
> used, and the private WritableRpcEngine.Server.logResponse method also needs 
> updating (in the PB world, all information about operations/tablenames is 
> contained in one PB argument).





[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432056#comment-13432056
 ] 

Hadoop QA commented on HBASE-6165:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540074/HBase-6165-v1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.master.TestAssignmentManager
  org.apache.hadoop.hbase.TestLocalHBaseCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2542//console

This message is automatically generated.

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432059#comment-13432059
 ] 

Zhihong Ted Yu commented on HBASE-6364:
---

{code}
+  deadServers.put(expiry, address.toString());
{code}
Do we need a MultiMap as the backing store for deadServers (possibly two 
servers having the same expiration time) ?

The hbase.ipc.client.recheckServersTimeout is for dead servers. Release note is 
needed so that people know what to look for.
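The multimap concern above is easy to demonstrate: a plain `Map<Long, String>` keyed by expiry silently drops one of two servers that expire at the same instant. A sketch of the suggested fix, using a `TreeMap` of expiry to server list as a poor man's multimap (the class and method names here are illustrative, not the patch's):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Sketch: track dead servers by expiration time. Keeping a list per expiry
 * key means two servers sharing an expiration time are both retained, and a
 * sorted map makes sweeping everything expired "up to now" a single headMap.
 */
public class DeadServersSketch {
    // expiry timestamp -> servers expiring at that time
    static final NavigableMap<Long, List<String>> deadServers = new TreeMap<>();

    static void add(long expiry, String server) {
        deadServers.computeIfAbsent(expiry, k -> new ArrayList<>()).add(server);
    }

    /** Drop every server whose expiry is at or before now. */
    static void expire(long now) {
        deadServers.headMap(now, true).clear();
    }

    static int size() {
        return deadServers.values().stream().mapToInt(List::size).sum();
    }

    public static void main(String[] args) {
        add(1000L, "rs1:60020");
        add(1000L, "rs2:60020");  // same expiry: both survive with the list
        add(2000L, "rs3:60020");
        System.out.println("tracked: " + size());             // prints "tracked: 3"
        expire(1500L);
        System.out.println("after sweep: " + size());         // prints "after sweep: 1"
    }
}
```

With `Map.put(expiry, server)` the second `add(1000L, ...)` would have overwritten the first, which is exactly the hazard being pointed out.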


> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt
>
>
> When a server host with a Region Server holding the .META. table is powered 
> down on a live cluster, while the HBase cluster itself detects and reassigns 
> the .META. table, connected HBase Client's take an excessively long time to 
> detect this and re-discover the reassigned .META. 
> Workaround: Decrease the ipc.socket.timeout on HBase Client side to a low  
> value (default is 20s leading to 35 minute recovery time; we were able to get 
> acceptable results with 100ms getting a 3 minute recovery) 
> This was found during some hardware failure testing scenarios. 
> Test Case:
> 1) Apply load via client app on HBase cluster for several minutes
> 2) Power down the region server holding the .META. server (i.e. power off ... 
> and keep it off)
> 3) Measure how long it takes for cluster to reassign META table and for 
> client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
> and DN on that host).
> Observation:
> 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
> recover (i.e. for the thread count to go back to normal) - no client calls 
> are serviced - they just back up on a synchronized method (see #2 below)
> 2) All the client app threads queue up behind the 
> oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
> After taking several thread dumps we found that the thread within this 
> synchronized method was blocked on  NetUtils.connect(this.socket, 
> remoteId.getAddress(), getSocketTimeout(conf));
> The client thread that gets the synchronized lock would try to connect to the 
> dead RS (till socket times out after 20s), retries, and then the next thread 
> gets in and so forth in a serial manner.
> Workaround:
> ---
> Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
> (1000 ms,  100 ms, etc) on the client side hbase-site.xml. With this setting, 
> the client threads recovered in a couple of minutes by failing fast and 
> re-discovering the .META. table on a reassigned RS.
> Assumption: This ipc.socket.timeout is only ever used during the initial 
> "HConnection" setup via the NetUtils.connect and should only ever be used 
> when connectivity to a region server is lost and needs to be re-established. 
> i.e it does not affect the normal "RPC" actiivity as this is just the connect 
> timeout.
> During RS GC periods, any _new_ clients trying to connect will fail and will 
> require .META. table re-lookups.
> This above timeout workaround is only for the HBase client side.
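The client-side workaround described in the report amounts to overriding one property in the client's hbase-site.xml. A minimal sketch, using the 1000 ms value that the reporters tried (tune to your environment; the default is 20 seconds):

```xml
<!-- Client-side hbase-site.xml: fail fast when connecting to a dead RS. -->
<property>
  <name>ipc.socket.timeout</name>
  <value>1000</value> <!-- milliseconds; default 20000 -->
</property>
```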





[jira] [Commented] (HBASE-6052) Convert .META. and -ROOT- content to pb

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432062#comment-13432062
 ] 

Zhihong Ted Yu commented on HBASE-6052:
---

I ran TestMetaMigrationConvertToPB and it passed locally:
{code}
[INFO] HBase - Server  SUCCESS [4:51.820s]
{code}
TestSplitLogManager#testOrphanTaskAcquisition passed locally as well.

> Convert .META. and -ROOT- content to pb
> ---
>
> Key: HBASE-6052
> URL: https://issues.apache.org/jira/browse/HBASE-6052
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: Enis Soztutar
>Priority: Blocker
> Fix For: 0.96.0
>
> Attachments: 6052-v5.txt, HBASE-6052_v1.patch, HBASE-6052_v2.patch, 
> HBASE-6052_v3.patch, HBASE-6052_v4.patch, HBASE-6052_v4.patch, 
> TestMetaMigrationConvertToPB.tgz
>
>






[jira] [Commented] (HBASE-6165) Replication can overrun .META scans on cluster re-start

2012-08-09 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432068#comment-13432068
 ] 

Himanshu Vashishtha commented on HBASE-6165:


Created the fail-fast ReplicationSink jira: HBASE-6550 
(https://issues.apache.org/jira/browse/HBASE-6550)

> Replication can overrun .META scans on cluster re-start
> ---
>
> Key: HBASE-6165
> URL: https://issues.apache.org/jira/browse/HBASE-6165
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBase-6165-v1.patch
>
>
> When restarting a large set of regions on a reasonably small cluster the 
> replication from another cluster tied up every xceiver meaning nothing could 
> be onlined.





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432071#comment-13432071
 ] 

nkeywal commented on HBASE-6364:


bq. Do we need a MultiMap as the backing store for deadServers (possibly two 
servers having the same expiration time) ?
You're right. I will need to fix this as well.

bq. The hbase.ipc.client.recheckServersTimeout is for dead servers. Release 
note is needed so that people know what to look for.
Will do. But the default should be good for most people, IMHO. I will put a 
warning in the release notes (there could be an issue if a node disappears and 
comes back while the setting is too high: the client will have to wait for the 
end of the recheck interval before seeing the node come back).

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt
>
>





[jira] [Commented] (HBASE-6364) Powering down the server host holding the .META. table causes HBase Client to take excessively long to recover and connect to reassigned .META. table

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432080#comment-13432080
 ] 

Zhihong Ted Yu commented on HBASE-6364:
---

@N:
Thanks for the quick response. Appreciate it.
nit: please add javadoc for remoteId:
{code}
+   * Creates a connection. Can be overridden by a subclass for testing.
+   */
+  protected Connection createConnection(ConnectionId remoteId) throws 
IOException {
{code}
{code}
+IOException e = new DeadServerIOException(
+"This server is is the dead server list: "+server);
{code}
'is is' -> 'is in'
DeadServerIOException extends IOException; I think 'IO' doesn't have to appear 
in the name of the exception.

It would be nice if the next patch were put up on Review Board.

> Powering down the server host holding the .META. table causes HBase Client to 
> take excessively long to recover and connect to reassigned .META. table
> -
>
> Key: HBASE-6364
> URL: https://issues.apache.org/jira/browse/HBASE-6364
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Suraj Varma
>Assignee: nkeywal
>  Labels: client
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6364-host-serving-META.v1.patch, 6364.v1.patch, 
> 6364.v1.patch, 6364.v2.patch, 6364.v3.patch, 6364.v3.patch, 6364.v5.patch, 
> 6364.v5.withtests.patch, 6364.v6.patch, 6364.v6.withtests.patch, 
> stacktrace.txt
>
>
> When a server host with a Region Server holding the .META. table is powered 
> down on a live cluster, while the HBase cluster itself detects and reassigns 
> the .META. table, connected HBase Client's take an excessively long time to 
> detect this and re-discover the reassigned .META. 
> Workaround: Decrease the ipc.socket.timeout on HBase Client side to a low  
> value (default is 20s leading to 35 minute recovery time; we were able to get 
> acceptable results with 100ms getting a 3 minute recovery) 
> This was found during some hardware failure testing scenarios. 
> Test Case:
> 1) Apply load via client app on HBase cluster for several minutes
> 2) Power down the region server holding the .META. server (i.e. power off ... 
> and keep it off)
> 3) Measure how long it takes for cluster to reassign META table and for 
> client threads to re-lookup and re-orient to the lesser cluster (minus the RS 
> and DN on that host).
> Observation:
> 1) Client threads spike up to maxThreads size ... and take over 35 mins to 
> recover (i.e. for the thread count to go back to normal) - no client calls 
> are serviced - they just back up on a synchronized method (see #2 below)
> 2) All the client app threads queue up behind the 
> oahh.ipc.HBaseClient#setupIOStreams method http://tinyurl.com/7js53dj
> After taking several thread dumps we found that the thread within this 
> synchronized method was blocked on  NetUtils.connect(this.socket, 
> remoteId.getAddress(), getSocketTimeout(conf));
> The client thread that gets the synchronized lock would try to connect to the 
> dead RS (till socket times out after 20s), retries, and then the next thread 
> gets in and so forth in a serial manner.
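The serialized connects explain the long recovery. A back-of-the-envelope estimate shows why lowering the connect timeout matters so much; the thread count and retry figures below are assumptions for illustration, not measurements from this report:

```java
// Illustrative arithmetic only: with one synchronized setupIOStreams lock,
// each queued client thread times out its connect attempts in series,
// so the stalls add up across the whole pool.
public class ConnectStallEstimate {
    static long worstCaseStallMs(long connectTimeoutMs, int attemptsPerThread,
                                 int queuedThreads) {
        return connectTimeoutMs * attemptsPerThread * queuedThreads;
    }

    public static void main(String[] args) {
        // Default 20s connect timeout: tens of minutes for a modest pool.
        System.out.println(
            ConnectStallEstimate.worstCaseStallMs(20_000, 1, 100) / 60_000 + " min");
        // 100ms connect timeout: the same pool drains in seconds.
        System.out.println(
            ConnectStallEstimate.worstCaseStallMs(100, 1, 100) / 1_000 + " s");
    }
}
```

With 100 queued threads and one 20s attempt each, the worst case is roughly half an hour, which is in line with the 35-minute recovery observed above.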
> Workaround:
> ---
> Default ipc.socket.timeout is set to 20s. We dropped this to a low number 
> (1000 ms,  100 ms, etc) on the client side hbase-site.xml. With this setting, 
> the client threads recovered in a couple of minutes by failing fast and 
> re-discovering the .META. table on a reassigned RS.
> Assumption: This ipc.socket.timeout is only ever used during the initial 
> "HConnection" setup via NetUtils.connect, and should only come into play 
> when connectivity to a region server is lost and needs to be re-established; 
> i.e. it does not affect normal "RPC" activity, as it is just the connect 
> timeout.
> During RS GC periods, any _new_ clients trying to connect will fail and will 
> require .META. table re-lookups.
> This above timeout workaround is only for the HBase client side.
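The client-side change described in the workaround would go into the client's hbase-site.xml; a sketch (the 100ms value is just the one that gave acceptable results in the testing above, so tune it for your environment):

```xml
<!-- hbase-site.xml on the HBase *client* side only.
     Lowers the IPC connect timeout so client threads fail fast
     when a region server host is unreachable. -->
<property>
  <name>ipc.socket.timeout</name>
  <value>100</value> <!-- milliseconds; default is 20000 -->
</property>
```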

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

2012-08-09 Thread rajeshbabu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432081#comment-13432081
 ] 

rajeshbabu commented on HBASE-6317:
---

@Jimmy
bq. why shouldn't we put them in the map if the table is disabled or enabling?
In the disabled case we are sure that all regions of the table are offline, so 
we should not add them to regionAssignments. Maybe we can add them to region 
states with state CLOSED.
In the enabling case, adding ENABLING table regions to regionAssignments can 
cause data loss (regions are never assigned until the master is restarted).
Consider a scenario:
1) Take a table T with regions A, B, C; after table creation A, B, C are on 
RS1.
2) Now disable the table. (All regions are offline; per META, A, B, C are on 
RS1.)
3) Enable the table and suppose it is only partially enabled. (A is in 
transition, B is assigned to RS2, C's assignment has not yet started.)
4) Now only the master is restarted. (All region servers stay online.)
5) If, during rebuildUserRegions, we add ENABLING table regions on online 
servers to regionAssignments, then A, B, C all count as online regions. (For 
B this is correct, but not for A and C, because they were never assigned to 
any region server.)
6) B will anyway be handled as part of processRIT.
7) While recovering the enabling table we check for regions not in 
regionAssignments and assign those through bulk assign. (C won't be served by 
any region server but shows up as an online region, which is unexpected.)

bq. If not in transition, but assigned, we don't need to assign it any more. 
This will help avoid double assignments for regions already in transition 
during EnableTableHandler, but may not solve the problem fully.

> Master clean start up and Partially enabled tables make region assignment 
> inconsistent.
> ---
>
> Key: HBASE-6317
> URL: https://issues.apache.org/jira/browse/HBASE-6317
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6317_94.patch, HBASE-6317_94_3.patch, 
> HBASE-6317_trunk_2.patch
>
>
> If we have a  table in partially enabled state (ENABLING) then on HMaster 
> restart we treat it as a clean cluster start up and do a bulk assign.  
> Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios and it 
> leads to region assignment problems.  Analysing this further, we found that 
> we have a better way to handle these scenarios.
> {code}
> if (false == checkIfRegionBelongsToDisabled(regionInfo)
> && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
>   synchronized (this.regions) {
> regions.put(regionInfo, regionLocation);
> addToServers(regionLocation, regionInfo);
>   }
> {code}
> We don't add to the regions map so that the enable table handler can handle 
> it.  But as nothing is added to the regions map, we treat it as a clean 
> cluster start up.
> Will come up with a patch tomorrow.





[jira] [Commented] (HBASE-6407) Investigate moving to DI (guice) framework for plugin arch.

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432087#comment-13432087
 ] 

Hadoop QA commented on HBASE-6407:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540085/HBASE-6407-5.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 84 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 6 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
   org.apache.hadoop.hbase.master.TestDistributedLogSplitting
   org.apache.hadoop.hbase.coprocessor.TestClassLoading
   org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
   org.apache.hadoop.hbase.replication.TestMasterReplication
   org.apache.hadoop.hbase.replication.TestMultiSlaveReplication
   org.apache.hadoop.hbase.master.TestAssignmentManager
   org.apache.hadoop.hbase.regionserver.TestRpcMetrics
   org.apache.hadoop.hbase.replication.TestReplication
   org.apache.hadoop.hbase.security.access.TestAccessController
   org.apache.hadoop.hbase.security.access.TestAccessControlFilter
   org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithRemove
   org.apache.hadoop.hbase.mapreduce.TestImportExport
   org.apache.hadoop.hbase.coprocessor.TestMasterCoprocessorExceptionWithAbort
   org.apache.hadoop.hbase.coprocessor.TestMasterObserver
   org.apache.hadoop.hbase.security.access.TestTablePermissions
   org.apache.hadoop.hbase.regionserver.TestRSKilledWhenMasterInitializing

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2543//console

This message is automatically generated.

> Investigate moving to DI (guice) framework for plugin arch.
> ---
>
> Key: HBASE-6407
> URL: https://issues.apache.org/jira/browse/HBASE-6407
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6407-1.patch, HBASE-6407-2.patch, 
> HBASE-6407-3.patch, HBASE-6407-4.patch, HBASE-6407-5.patch
>
>
> Investigate using Guice to inject the correct compat object provided by 
> compat plugins





[jira] [Commented] (HBASE-6340) HBase RPC does not allow protocol extension with common interfaces.

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432091#comment-13432091
 ] 

Hadoop QA commented on HBASE-6340:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12539947/RPCInvocation.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.catalog.TestMetaReaderEditor
  org.apache.hadoop.hbase.master.TestAssignmentManager
  org.apache.hadoop.hbase.TestLocalHBaseCluster

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2545//console

This message is automatically generated.

> HBase RPC does not allow protocol extension with common interfaces.
> ---
>
> Key: HBASE-6340
> URL: https://issues.apache.org/jira/browse/HBASE-6340
> Project: HBase
>  Issue Type: Bug
>  Components: coprocessors, regionserver
>Affects Versions: 0.92.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: RPCInvocation.patch
>
>
> HBase RPC fails if MyProtocol extends an interface, which is not a 
> VersionedProtocol even if MyProtocol also directly extends VersionedProtocol. 
> The reason is that rpc Invocation uses Method.getDeclaringClass(), which 
> returns the interface class rather than the class of MyProtocol.
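The reflection behavior at the root of this can be reproduced with plain JDK code (a self-contained illustration with made-up interface names, not HBase code):

```java
import java.lang.reflect.Method;

// Mirrors the shape of the problem: an RPC interface that inherits a
// method from a common super-interface.
interface Base { void ping(); }
interface My extends Base { void extra(); }

public class DeclaringClassDemo {
    // Returns the simple name of the class that declares the method.
    static String declaringOf(String name) {
        try {
            Method m = My.class.getMethod(name);
            return m.getDeclaringClass().getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Although the method is looked up through My, reflection reports the
        // declaring interface, which is what trips up the rpc Invocation.
        System.out.println(declaringOf("ping")); // prints "Base"
    }
}
```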





[jira] [Commented] (HBASE-6414) Remove the WritableRpcEngine & associated Invocation classes

2012-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432094#comment-13432094
 ] 

Hadoop QA commented on HBASE-6414:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12540093/6414-3.patch.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestHMasterRPCException
  org.apache.hadoop.hbase.client.TestFromClientSide

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//console

This message is automatically generated.

> Remove the WritableRpcEngine & associated Invocation classes
> 
>
> Key: HBASE-6414
> URL: https://issues.apache.org/jira/browse/HBASE-6414
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6414-1.patch.txt, 6414-3.patch.txt, 
> 6414-initial.patch.txt, 6414-initial.patch.txt
>
>
> Remove the WritableRpcEngine & Invocation classes once HBASE-5705 gets 
> committed and all the protocols are rebased to use PB.
> Raising this jira in advance..





[jira] [Commented] (HBASE-6414) Remove the WritableRpcEngine & associated Invocation classes

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432096#comment-13432096
 ] 

Zhihong Ted Yu commented on HBASE-6414:
---

@Devaraj:
I think the following test failure is related to the patch:
https://builds.apache.org/job/PreCommit-HBASE-Build/2546//testReport/org.apache.hadoop.hbase.master/TestHMasterRPCException/testRPCException/

> Remove the WritableRpcEngine & associated Invocation classes
> 
>
> Key: HBASE-6414
> URL: https://issues.apache.org/jira/browse/HBASE-6414
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6414-1.patch.txt, 6414-3.patch.txt, 
> 6414-initial.patch.txt, 6414-initial.patch.txt
>
>
> Remove the WritableRpcEngine & Invocation classes once HBASE-5705 gets 
> committed and all the protocols are rebased to use PB.
> Raising this jira in advance..





[jira] [Assigned] (HBASE-6552) TestAcidGuarantees system test should flush more aggresively

2012-08-09 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan reassigned HBASE-6552:
-

Assignee: Gregory Chanan

> TestAcidGuarantees system test should flush more aggresively
> 
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.
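Going through the admin interface for flushes would look roughly like the following (a pseudocode sketch against the 0.94-era HBaseAdmin API; the table name is a placeholder and method names are from memory):

```
// Force a flush through the public admin interface instead of the
// test utility's util.flush(), so the unit test and the system test
// share one code path.
HBaseAdmin admin = new HBaseAdmin(conf);
try {
  admin.flush(Bytes.toBytes("TestAcidGuarantees"));  // flush the whole table
} finally {
  admin.close();
}
```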





[jira] [Assigned] (HBASE-6553) Remove Avro Gateway

2012-08-09 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark reassigned HBASE-6553:


Assignee: Elliott Clark

> Remove Avro Gateway
> ---
>
> Key: HBASE-6553
> URL: https://issues.apache.org/jira/browse/HBASE-6553
> Project: HBase
>  Issue Type: Task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>
> The Avro gateway was deprecated in 0.94.  Remove it in 0.96.





[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

2012-08-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432111#comment-13432111
 ] 

Jimmy Xiang commented on HBASE-6317:


The point is that if a table is in the enabling state, its regions may or may 
not be assigned. It is not good to assume they are all online, or all offline.
Currently we assume they are offline.  Your patch changes the 
EnableTableHandler to make sure they are assigned to the same region server, 
to avoid possible double assignments. Am I right?

> Master clean start up and Partially enabled tables make region assignment 
> inconsistent.
> ---
>
> Key: HBASE-6317
> URL: https://issues.apache.org/jira/browse/HBASE-6317
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: rajeshbabu
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6317_94.patch, HBASE-6317_94_3.patch, 
> HBASE-6317_trunk_2.patch
>
>
> If we have a  table in partially enabled state (ENABLING) then on HMaster 
> restart we treat it as a clean cluster start up and do a bulk assign.  
> Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios and it 
> leads to region assignment problems.  Analysing this further, we found that 
> we have a better way to handle these scenarios.
> {code}
> if (false == checkIfRegionBelongsToDisabled(regionInfo)
> && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
>   synchronized (this.regions) {
> regions.put(regionInfo, regionLocation);
> addToServers(regionLocation, regionInfo);
>   }
> {code}
> We don't add to the regions map so that the enable table handler can handle 
> it.  But as nothing is added to the regions map, we treat it as a clean 
> cluster start up.
> Will come up with a patch tomorrow.





[jira] [Updated] (HBASE-6553) Remove Avro Gateway

2012-08-09 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6553:
-

Attachment: HBASE-6553-0.patch

Remove avro.

> Remove Avro Gateway
> ---
>
> Key: HBASE-6553
> URL: https://issues.apache.org/jira/browse/HBASE-6553
> Project: HBase
>  Issue Type: Task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6553-0.patch
>
>
> The Avro gateway was deprecated in 0.94.  Remove it in 0.96.





[jira] [Updated] (HBASE-6552) TestAcidGuarantees system test should flush more aggresively

2012-08-09 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6552:
--

Attachment: HBASE-6552-trunk.patch

> TestAcidGuarantees system test should flush more aggresively
> 
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.





[jira] [Updated] (HBASE-6552) TestAcidGuarantees system test should flush more aggresively

2012-08-09 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6552:
--

Attachment: HBASE-6552-94-92.patch

> TestAcidGuarantees system test should flush more aggresively
> 
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.





[jira] [Updated] (HBASE-6552) TestAcidGuarantees system test should flush more aggressively

2012-08-09 Thread Gregory Chanan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6552:
--

Summary: TestAcidGuarantees system test should flush more aggressively  
(was: TestAcidGuarantees system test should flush more aggresively)

> TestAcidGuarantees system test should flush more aggressively
> -
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.





[jira] [Updated] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Himanshu Vashishtha (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-6550:
---

Attachment: HBase-6550-v1.patch

Attached is a patch to incorporate the suggestions mentioned in the description.

Testing: Jenkins is green; ran replication for a few days (intermittently 
running YCSB write load on the master), in tandem with HBASE-6165.

> Refactoring ReplicationSink to make it more responsive of cluster health
> 
>
> Key: HBASE-6550
> URL: https://issues.apache.org/jira/browse/HBASE-6550
> Project: HBase
>  Issue Type: New Feature
>  Components: replication
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Attachments: HBase-6550-v1.patch
>
>
> ReplicationSink replicates the WALEdits in the local cluster. It uses the 
> native HBase client to insert the mutations. Sometimes it takes a while to 
> process them (maybe due to region splitting, a GC pause, etc.) and it goes 
> through the retry phase. 
> This has two repercussions:
> a) The regionserver handler which is serving the request (till now, a 
> priority handler) is blocked for this period.
> b) The caller may get timed out and will retry anyway, but the handler 
> serving the ReplicationSink request is still working.
> Refactoring ReplicationSink to have the following features:
> a) Make it more configurable (have its own retry limit, connection timeout, 
> etc.)
> b) Add fail-fast behavior so that it bails out in case the caller has timed 
> out, or on any exception while processing the mutation batch.





[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432144#comment-13432144
 ] 

Lars Hofhansl commented on HBASE-6550:
--

Looks like this should work.

I had something simpler in mind:
# Have a decorated conf (like you do); set client pause/retry and also lower 
the client rpc timeout.
# Create an unmanaged HConnectionImplementation and an Executor.
# For each batch, create a new HTable(connection, executor).
# Apply the batch.
# Close the created HTable.

Seems that would be more readable...?
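Sketched out, that flow would be roughly the following (pseudocode against the 0.94-era client API; the configuration keys and constructor signatures here are from memory and may not match exactly):

```
// Decorated conf: fewer retries, shorter pause, lower rpc timeout.
Configuration sinkConf = HBaseConfiguration.create(conf);
sinkConf.setInt("hbase.client.retries.number", 1);
sinkConf.setInt("hbase.rpc.timeout", shortTimeoutMs);

// One unmanaged connection and executor, shared across batches.
HConnection connection = HConnectionManager.createConnection(sinkConf);
ExecutorService pool = Executors.newFixedThreadPool(nThreads);

// Per batch: a short-lived HTable bound to the shared connection.
HTable table = new HTable(tableName, connection, pool);
try {
  table.batch(mutations);   // apply the batch, failing fast
} finally {
  table.close();            // close the created HTable
}
```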


> Refactoring ReplicationSink to make it more responsive of cluster health
> 
>
> Key: HBASE-6550
> URL: https://issues.apache.org/jira/browse/HBASE-6550
> Project: HBase
>  Issue Type: New Feature
>  Components: replication
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Attachments: HBase-6550-v1.patch
>
>
> ReplicationSink replicates the WALEdits in the local cluster. It uses the 
> native HBase client to insert the mutations. Sometimes it takes a while to 
> process them (maybe due to region splitting, a GC pause, etc.) and it goes 
> through the retry phase. 
> This has two repercussions:
> a) The regionserver handler which is serving the request (till now, a 
> priority handler) is blocked for this period.
> b) The caller may get timed out and will retry anyway, but the handler 
> serving the ReplicationSink request is still working.
> Refactoring ReplicationSink to have the following features:
> a) Make it more configurable (have its own retry limit, connection timeout, 
> etc.)
> b) Add fail-fast behavior so that it bails out in case the caller has timed 
> out, or on any exception while processing the mutation batch.





[jira] [Updated] (HBASE-6414) Remove the WritableRpcEngine & associated Invocation classes

2012-08-09 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6414:
---

Attachment: 6414-4.patch.txt

The new patch from RB.

> Remove the WritableRpcEngine & associated Invocation classes
> 
>
> Key: HBASE-6414
> URL: https://issues.apache.org/jira/browse/HBASE-6414
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.96.0
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6414-1.patch.txt, 6414-3.patch.txt, 6414-4.patch.txt, 
> 6414-initial.patch.txt, 6414-initial.patch.txt
>
>
> Remove the WritableRpcEngine & Invocation classes once HBASE-5705 gets 
> committed and all the protocols are rebased to use PB.
> Raising this jira in advance..





[jira] [Commented] (HBASE-6552) TestAcidGuarantees system test should flush more aggressively

2012-08-09 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432156#comment-13432156
 ] 

Jonathan Hsieh commented on HBASE-6552:
---

+1 lgtm.  I'm assuming the test passed (as a unit test and against a real 
cluster for some period of time)?

> TestAcidGuarantees system test should flush more aggressively
> -
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.





[jira] [Commented] (HBASE-6527) Make custom filters plugin

2012-08-09 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432159#comment-13432159
 ] 

Lars George commented on HBASE-6527:


What is the scope here? I am not sure what custom filters are supposed to be, 
and how they "plug in". We have JIRAs open to do a scripted filter, or to use 
something like Groovy or other class compilers to load filters, but those had 
issues that resulted in OSGi discussions. Is this the same? If so, I'd say it is a dupe.

> Make custom filters plugin
> --
>
> Key: HBASE-6527
> URL: https://issues.apache.org/jira/browse/HBASE-6527
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Ted Yu
>
> More and more custom Filters are being created.
> We should provide a plugin mechanism for these custom Filters.





[jira] [Commented] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-08-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432158#comment-13432158
 ] 

Jean-Daniel Cryans commented on HBASE-6321:
---

bq. Does running of TestReplication/Source/Manager test this patch?

It exercises it, but only in a way that makes sure I didn't add bugs; it 
doesn't verify whether it actually works or not.

> ReplicationSource dies reading the peer's id
> 
>
> Key: HBASE-6321
> URL: https://issues.apache.org/jira/browse/HBASE-6321
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6321-0.94.patch
>
>
> This is what I saw:
> {noformat}
> 2012-07-01 05:04:01,638 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
> source 8 because an error occurred: Could not read peer's cluster id
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /va1-backup/hbaseid
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
> at 
> org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
> {noformat}
> The session should just be reopened.
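The suggested fix -- reopen the session and retry instead of letting the source die -- can be sketched generically with plain JDK types. This is an illustration only; the class and exception names below are hypothetical stand-ins, not ReplicationSource's or ZooKeeper's actual API:

```java
import java.util.concurrent.Callable;

public class RetryOnExpirySketch {
    static class SessionExpiredException extends Exception {}

    // Retry the ZK read on session expiry, up to maxAttempts, instead of
    // closing the replication source on the first expired session.
    static String readClusterId(Callable<String> zkRead, int maxAttempts) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return zkRead.call();
            } catch (SessionExpiredException e) {
                // In the real fix this is where a fresh ZooKeeper session
                // would be opened before retrying.
            }
        }
        throw new SessionExpiredException();
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // First read hits an expired session, second succeeds -- the source survives.
        String id = readClusterId(() -> {
            if (calls[0]++ == 0) throw new SessionExpiredException();
            return "peer-cluster-id";
        }, 3);
        System.out.println(id);
    }
}
```

The key design point is that session expiry is treated as a recoverable condition at the read site rather than a fatal error for the whole source thread.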





[jira] [Commented] (HBASE-6529) With HFile v2, the region server will always perform an extra copy of source files

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432163#comment-13432163
 ] 

stack commented on HBASE-6529:
--

So, it sounds like you've found a valid objection to this patch -- that it 
doesn't actually fix the issue?  It worked for you, Jason?

We should include a unit test it seems.

> With HFile v2, the region server will always perform an extra copy of source 
> files
> --
>
> Key: HBASE-6529
> URL: https://issues.apache.org/jira/browse/HBASE-6529
> Project: HBase
>  Issue Type: Bug
>  Components: performance, regionserver
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6529.diff
>
>
> With the HFile v2 implementation in HBase 0.94 & 0.96, the region server will 
> use HFileSystem as its {color:blue}fs{color}. When it performs a bulk load in 
> Store.bulkLoadHFile(), it checks whether its {color:blue}fs{color} is the same 
> as {color:blue}srcFs{color}, which however will be a DistributedFileSystem. 
> Consequently, it will always perform an extra copy of the source files.
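The failed check can be illustrated without HBase itself: what matters is whether two filesystem handles point at the same underlying store, which an object-identity or class comparison on a wrapper such as HFileSystem gets wrong. A minimal, self-contained sketch (all names hypothetical, not HBase's actual API) comparing filesystems by URI instead:

```java
import java.net.URI;

public class FsCompareSketch {
    // Hypothetical stand-in for the check in Store.bulkLoadHFile():
    // comparing by filesystem URI rather than by object identity avoids
    // treating a wrapper and its delegate as different filesystems.
    static boolean sameFileSystem(URI fsUri, URI srcFsUri) {
        // Scheme and authority identify the filesystem; the wrapper class does not.
        if (!fsUri.getScheme().equalsIgnoreCase(srcFsUri.getScheme())) {
            return false;
        }
        String a = fsUri.getAuthority();
        String b = srcFsUri.getAuthority();
        return (a == null && b == null) || (a != null && a.equalsIgnoreCase(b));
    }

    public static void main(String[] args) {
        URI hfileSystemView = URI.create("hdfs://namenode:8020"); // region server's wrapper view
        URI distributedView = URI.create("hdfs://namenode:8020"); // srcFs resolved directly
        URI otherCluster = URI.create("hdfs://othernn:8020");
        System.out.println(sameFileSystem(hfileSystemView, distributedView)); // same store: no copy needed
        System.out.println(sameFileSystem(hfileSystemView, otherCluster));    // different store: copy required
    }
}
```

Under this comparison the HFileSystem wrapper and the DistributedFileSystem it delegates to resolve as the same filesystem, so the cheap rename path could be taken instead of a full copy.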





[jira] [Commented] (HBASE-6527) Make custom filters plugin

2012-08-09 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432169#comment-13432169
 ] 

Zhihong Ted Yu commented on HBASE-6527:
---

@Lars:
I logged this according to Jonathan H's request.

If you can provide the JIRA numbers, it would be easier to see whether they're 
dupes or not.

> Make custom filters plugin
> --
>
> Key: HBASE-6527
> URL: https://issues.apache.org/jira/browse/HBASE-6527
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Ted Yu
>
> More and more custom Filters are being created.
> We should provide a plugin mechanism for these custom Filters.





[jira] [Commented] (HBASE-6550) Refactoring ReplicationSink to make it more responsive of cluster health

2012-08-09 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432185#comment-13432185
 ] 

Himanshu Vashishtha commented on HBASE-6550:


I see :)

I will be glad to make it simpler. But it's not that difficult...  :P
It basically adds two things: a bail-out mechanism, and, to achieve it, the use 
of a Callable submitted to a RepSink#threadpool.

I wanted to have the bail-out functionality for the regionserver handler as part 
of the patch. With this, it gets the opportunity to do cleanup etc. in case the 
client goes away. Decorating the config solves only half the purpose. 
Another way is making similar changes at the master cluster regionserver side 
(decorating its config with a lower rpc timeout etc.), but that's not desirable 
as it's not intra-cluster and we want to give a full try before resending the 
shipment.


bq. Create an unmanaged HConnectionImplementation and an Executor
You mean at class level? In case another master cluster regionserver calls the 
method via another handler, will it wait then?
Or at method level? 

bq. For each batch create new HTable(connection, executor), apply the batch, 
close the created HTable.

Yes, that also happens in the current patch. It closes out the connection and 
the HTable's pool after the batch op.


> Refactoring ReplicationSink to make it more responsive of cluster health
> 
>
> Key: HBASE-6550
> URL: https://issues.apache.org/jira/browse/HBASE-6550
> Project: HBase
>  Issue Type: New Feature
>  Components: replication
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Attachments: HBase-6550-v1.patch
>
>
> ReplicationSink replicates the WALEdits in the local cluster. It uses the 
> native HBase client to insert the mutations. Sometimes it takes a while to 
> process them (maybe due to region splitting, GC pauses, etc.) and it enters 
> the retry phase. 
> This has two repercussions:
> a) The regionserver handler which is serving the request (till now, a 
> priority handler) is blocked for this period.
> b) The caller may time out and will retry anyway, but the handler 
> serving the ReplicationSink request is still working.
> Refactoring ReplicationSink to have the following features:
> a) Making it more configurable (have its own retry limit, 
> connection timeout, etc.)
> b) Add fail-fast behavior so that it bails out in case the caller has timed 
> out, or on any exception while processing the mutation batch.





[jira] [Commented] (HBASE-6552) TestAcidGuarantees system test should flush more aggressively

2012-08-09 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432190#comment-13432190
 ] 

Jonathan Hsieh commented on HBASE-6552:
---

Hm.. actually, in testing I was looking for a cmdline arg and saw a few things 
we should clean up:

{code}
 // cannot run flusher in real cluster case.
{code} 

I will remove upon commit.

> TestAcidGuarantees system test should flush more aggressively
> -
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.





[jira] [Commented] (HBASE-6552) TestAcidGuarantees system test should flush more aggressively

2012-08-09 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432191#comment-13432191
 ] 

Jonathan Hsieh commented on HBASE-6552:
---

Tested as a system test against 0.94.0.  Seems good.

> TestAcidGuarantees system test should flush more aggressively
> -
>
> Key: HBASE-6552
> URL: https://issues.apache.org/jira/browse/HBASE-6552
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.92.2, 0.96.0, 0.94.1
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6552-94-92.patch, HBASE-6552-trunk.patch
>
>
> HBASE-5887 allowed TestAcidGuarantees to be run as a system test by avoiding 
> the call to util.flush().
> It would be better to go through the HBaseAdmin interface to force flushes.  
> This would unify the code path between the unit test and the system test, as 
> well as forcing more frequent flushes, which have previously been the source 
> of ACID guarantee problems, e.g. HBASE-2856.





[jira] [Commented] (HBASE-6527) Make custom filters plugin

2012-08-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432197#comment-13432197
 ] 

stack commented on HBASE-6527:
--

The description on this feature request (filed as a bug) needs filling out, or 
we just close it till there is more to say on this topic.  It also leads off 
with 'We should...'.  I'd think something we 'should' do would have some 
justification attached.

> Make custom filters plugin
> --
>
> Key: HBASE-6527
> URL: https://issues.apache.org/jira/browse/HBASE-6527
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Ted Yu
>
> More and more custom Filters are being created.
> We should provide a plugin mechanism for these custom Filters.




