[jira] [Commented] (HBASE-12657) The Region is not being split and far exceeds the desired maximum size.

2014-12-09 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240783#comment-14240783
 ] 

Qiang Tian commented on HBASE-12657:


RatioBasedCompactionPolicy.getCurrentEligibleFiles does look suspicious (I am 
still not sure when it is triggered).



> The Region is not being split and far exceeds the desired maximum size.
> ---
>
> Key: HBASE-12657
> URL: https://issues.apache.org/jira/browse/HBASE-12657
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 0.94.25
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 0.94.26
>
> Attachments: HBASE-12657-0.94.patch
>
>
> We are seeing this behavior when creating indexes in one of our environments.
> When an index is being created, most of the "requests" go into a single 
> region.  The amount of time to create an index seems to take longer than 
> usual and it can take days for the regions to compact and split after the 
> index is created.
> Here is a du of the HBase index table:
> {code}
> -bash-4.1$ sudo -su hdfs hadoop fs -du /hbase/43681
> 705          /hbase/43681/.tableinfo.01
> 0            /hbase/43681/.tmp
> 27981697293  /hbase/43681/0492e22092e21d35fca8e779b21ec797
> 539687093    /hbase/43681/832298c4e975fc47210feb6bac3d2f71
> 560660531    /hbase/43681/be9bdb3bdf9365afe5fe90db4247d82c
> 7081938297   /hbase/43681/cd440e524f96fbe0719b2fe969848560
> 6297860287   /hbase/43681/dc893a2d8daa08c689dc69e6bb2c5b50
> 7189607722   /hbase/43681/ffbceaea5e2f142dbe6cd4cbeacc00e8
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12657) The Region is not being split and far exceeds the desired maximum size.

2014-12-11 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243652#comment-14243652
 ] 

Qiang Tian commented on HBASE-12657:


Hi [~lhofhansl],
sorry, I do not quite understand that part. We expect filesCompacting to be 
empty in getCurrentEligibleFiles, right?
Regarding HStore#filesCompacting: it looks like we always remove the files in the 
CompactionRequest via finishCompactionRequest, so I am not sure why we need to 
maintain it.
 


> The Region is not being split and far exceeds the desired maximum size.
> ---
>
> Key: HBASE-12657
> URL: https://issues.apache.org/jira/browse/HBASE-12657
> Project: HBase
>  Issue Type: Bug
>  Components: Compaction
>Affects Versions: 0.94.25
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 0.94.26
>
> Attachments: HBASE-12657-0.94.patch, HBASE-12657-0.94.patch.2
>
>
> We are seeing this behavior when creating indexes in one of our environments.
> When an index is being created, most of the "requests" go into a single 
> region.  The amount of time to create an index seems to take longer than 
> usual and it can take days for the regions to compact and split after the 
> index is created.
> Here is a du of the HBase index table:
> {code}
> -bash-4.1$ sudo -su hdfs hadoop fs -du /hbase/43681
> 705          /hbase/43681/.tableinfo.01
> 0            /hbase/43681/.tmp
> 27981697293  /hbase/43681/0492e22092e21d35fca8e779b21ec797
> 539687093    /hbase/43681/832298c4e975fc47210feb6bac3d2f71
> 560660531    /hbase/43681/be9bdb3bdf9365afe5fe90db4247d82c
> 7081938297   /hbase/43681/cd440e524f96fbe0719b2fe969848560
> 6297860287   /hbase/43681/dc893a2d8daa08c689dc69e6bb2c5b50
> 7189607722   /hbase/43681/ffbceaea5e2f142dbe6cd4cbeacc00e8
> ...
> {code}





[jira] [Commented] (HBASE-12742) ClusterStatusPublisher crashes with a IPv6 network interface.

2014-12-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259874#comment-14259874
 ] 

Qiang Tian commented on HBASE-12742:


Ah, the IPv4 address.
Thanks [~stack] for telling me about the progress!

> ClusterStatusPublisher crashes with a IPv6 network interface.
> -
>
> Key: HBASE-12742
> URL: https://issues.apache.org/jira/browse/HBASE-12742
> Project: HBase
>  Issue Type: Bug
>Reporter: Jurriaan Mous
>Assignee: Jurriaan Mous
> Fix For: 1.0.0, 2.0.0, 1.1.0
>
> Attachments: HBASE-12742-v1.patch, HBASE-12742.patch
>
>
> On my dev machine the first network interface is an IPv6 tunnel. HBase works 
> internally with IPv4 addresses. Addressing selects the first tunnel to use. 
> This causes the ClusterStatusPublisher's DatagramChannel group join to 
> crash. 
> Stack trace:
> {code}
> java.io.IOException: Shutting down
>   at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:231)
>   at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:93)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:976)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:936)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:810)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:792)
>   at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:779)
>   at org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:140)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
>   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:74)
>   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:211)
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:67)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> Caused by: java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
>   at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143)
>   at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
>   at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
>   at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:213)
>   ... 26 more
> Caused by: java.io.IOException: Network interface not configured for IPv4
>   at sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:860)
>   at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:885)
>   at io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:409)
>   at org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:286)
>   at org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:129)
>   at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:379)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:139)
>   ... 29 more
> {code}
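
The failure described above (an IPv4 multicast join attempted on an interface that only carries IPv6) can be avoided by selecting the interface explicitly rather than taking the first one. The following is a minimal sketch using only java.net; the class and method names are hypothetical, and it is not the HBASE-12742 patch.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;

// Hypothetical helper, for illustration only (not HBase code).
final class Ipv4InterfacePicker {

    // True if the interface has at least one IPv4 address bound to it.
    static boolean hasIpv4(NetworkInterface ni) {
        for (InetAddress addr : Collections.list(ni.getInetAddresses())) {
            if (addr instanceof Inet4Address) {
                return true;
            }
        }
        return false;
    }

    // Returns the first interface that is up, supports multicast and has an
    // IPv4 address, or null if no such interface exists.
    static NetworkInterface firstIpv4Multicast() throws SocketException {
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            return null; // no interfaces reported on this machine
        }
        for (NetworkInterface ni : Collections.list(all)) {
            if (ni.isUp() && ni.supportsMulticast() && hasIpv4(ni)) {
                return ni;
            }
        }
        return null;
    }

    public static void main(String[] args) throws SocketException {
        NetworkInterface ni = firstIpv4Multicast();
        System.out.println(ni == null ? "no IPv4 multicast interface" : ni.getName());
    }
}
```

An IPv6-only tunnel is skipped by the hasIpv4 check, so a later IPv4 group join would be attempted only on an interface that can support it.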




[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-04-23 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978028#comment-13978028
 ] 

Qiang Tian commented on HBASE-10289:


Hi Demai Ni, stack,
I took a look. The details are below.

The problem is that JMX opens 2 additional random ports besides the port specified 
by "com.sun.management.jmxremote.port". 

One is the RMI server port; the other is used by the Java Attach API for local 
attach (see more details below).

It looks like this has been the case for a long time. Prior efforts focused on 
customizing the RMI server port, because that port is used by the client: when 
a firewall is configured, the client simply cannot connect to JMX, for example:
http://olegz.wordpress.com/2009/03/23/jmx-connectivity-through-the-firewall/

There are also existing artifacts that address it, e.g.:
http://tomcat.apache.org/tomcat-7.0-doc/config/listeners.html#JMX_Remote_Lifecycle_Listener_-_org.apache.catalina.mbeans.JmxRemoteLifecycleListener

There, both the registry port (rmiRegistryPortPlatform) and the RMI server 
port (rmiServerPortPlatform) can be configured, and authentication is supported.

The Sun JDK recognized this problem: since JDK 7u4, the RMI server port can be 
configured via "com.sun.management.jmxremote.rmi.port".

Then what about the local attach port, which is also random?

According to 
http://stackoverflow.com/questions/20884353/why-java-opens-3-ports-when-jmx-is-configured,
 this feature can be disabled with "-XX:+DisableAttachMechanism" (there is no 
clear description of it on the official page 
http://docs.oracle.com/javase/7/docs/technotes/tools/windows/java.html), but 
the person who tested it found that it does not work, and recently opened a 
defect against Java: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8035404

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Priority: Minor
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch
>
>
> If we enable the JMX MBean server for HMaster or RegionServer through VM 
> arguments, the process will use one random port which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting a custom JMX server.
> The ports can then be configured. If no ports are configured, it will 
> continue to work the same way as now.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-04-23 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979257#comment-13979257
 ] 

Qiang Tian commented on HBASE-10289:


Hi [~stack],
I am new to this area as well. :-)
My 2 cents:
1) Our QA tested the patch and it does not work. I think it is not a complete 
implementation: other options (the settings in HBASE_JMX_BASE), including 
security, are not considered. The wheel already exists, so I would rather not 
reinvent it.

2) The real problem is that the random ports JMX opens can cause conflicts. 
Even if we implement the server ourselves, the random local attach port is 
still there unless bug 8035404 is fixed. 

3) To avoid port conflicts, we do need to try to eliminate random ports as we 
did in this JIRA. From a long-term perspective, I am not sure whether we need 
port management at the multi-component level (e.g. there is worldwide port 
registration and management: 
http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?&page=132)

4) I googled "hadoop port in use" and got quite a few hits, and the high page 
ranking of "default ports" might show that users need centralized management 
of ports across the various components. Anyway, just my thoughts. :-)




> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch
>
>
> If we enable the JMX MBean server for HMaster or RegionServer through VM 
> arguments, the process will use one random port which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting a custom JMX server.
> The ports can then be configured. If no ports are configured, it will 
> continue to work the same way as now.





[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-04-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979398#comment-13979398
 ] 

Qiang Tian commented on HBASE-10289:


Hi [~nijel],
/proc/sys/net/ipv4/ip_local_port_range does not look like a good solution as it 
limits port resources.

I just happened to find that when "com.sun.management.jmxremote.local.only=false" 
is set, there is only 1 random port, i.e.:
export HBASE_MASTER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=61100 -Dcom.sun.management.jmxremote.local.only=false"

without "com.sun.management.jmxremote.local.only=false":

[root@test tmp]# netstat -nltp |grep 61100
tcp   0   0 :::61100             :::*   LISTEN   1989249/java
[root@test tmp]# netstat -nltp |grep 1989249
tcp   0   0 :::61100             :::*   LISTEN   1989249/java
tcp   0   0 :::4159              :::*   LISTEN   1989249/java
tcp   0   0 :::9.181.64.235:6    :::*   LISTEN   1989249/java
tcp   0   0 :::61320             :::*   LISTEN   1989249/java
tcp   0   0 :::60010             :::*   LISTEN   1989249/java

with "com.sun.management.jmxremote.local.only=false":

[root@test tmp]# netstat -nltp |grep 61100
tcp   0   0 :::61100             :::*   LISTEN   2021776/java
[root@test tmp]# netstat -nltp |grep 2021776
tcp   0   0 :::61100             :::*   LISTEN   2021776/java
tcp   0   0 :::2174              :::*   LISTEN   2021776/java
tcp   0   0 :::9.181.64.235:6    :::*   LISTEN   2021776/java
tcp   0   0 :::60010             :::*   LISTEN   2021776/java

I tried jconsole and it works both locally and remotely. Could you also give it a try?
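
For a scripted version of the jconsole check, a small client can connect through the registry port configured with "com.sun.management.jmxremote.port". This is a minimal sketch using the standard javax.management.remote API; the class name JmxPing is hypothetical, port 61100 is taken from the example above, and authentication and SSL are assumed to be disabled.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, for illustration only.
final class JmxPing {

    // Builds the usual RMI/JNDI service URL for a JMX agent whose RMI
    // registry listens on host:port.
    static JMXServiceURL serviceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    // Connects, asks the remote MBean server how many MBeans it has,
    // and closes the connection again.
    static int mbeanCount(String host, int port) throws IOException {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            return conn.getMBeanCount();
        }
    }
}
```

Calling JmxPing.mbeanCount("localhost", 61100) on the master host, and the same call with the master's host name from another machine, would cover the local and remote cases.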


PS: below is the official description:
http://www.oracle.com/technetwork/java/javase/compatibility-417013.html
Area: JMX
Synopsis: New Property for JMX RMI Connector Server
Description: The new property, com.sun.management.jmxremote.local.only, when 
true (the default) indicates that the local JMX RMI connector will only accept 
connection requests from local interfaces. Setting this property to false 
restores JDK 6 behavior, but is not recommended because the local JMX RMI 
connector server will accept connection requests from both local and remote 
interfaces. For remote management, the remote JMX RMI connector server should 
be used with authentication and SSL/TLS encryption enabled.
Nature of Incompatibility: behavioral


Regarding the RMI server port, we could:
a) use the parameter "com.sun.management.jmxremote.rmi.port" after upgrading to 
JDK 7. This is the simplest way.
b) use the existing artifact catalina-jmx-remote.jar
c) implement it ourselves as you mentioned.
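
Option c) could look roughly like the sketch below: create the RMI registry on a fixed port and encode a fixed RMI server port in the JMX service URL, so that neither port is chosen at random. It uses only standard JDK classes; the class name and the choice of ports are hypothetical, no authentication or SSL is configured, and it is not the HBASE-10289 patch.

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, for illustration only (no security configured).
final class FixedPortJmxServer {

    // Starts a JMX connector server whose RMI registry listens on
    // registryPort and whose RMI server objects are exported on serverPort.
    static JMXConnectorServer start(String host, int registryPort, int serverPort)
            throws Exception {
        // The registry that clients look up "jmxrmi" in.
        LocateRegistry.createRegistry(registryPort);

        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

        // The port after the first host is the RMI server (export) port;
        // the port inside the /jndi/rmi:// part is the registry port.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi://" + host + ":" + serverPort
                + "/jndi/rmi://" + host + ":" + registryPort + "/jmxrmi");

        // Empty environment: an assumption for brevity. A real deployment
        // would add authentication and SSL settings here.
        Map<String, Object> env = new HashMap<String, Object>();

        JMXConnectorServer server =
            JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs);
        server.start();
        return server;
    }
}
```

The returned JMXConnectorServer should be stopped with stop() during shutdown so that the exported RMI objects are released. This sketch does not affect the local attach port discussed earlier in the thread.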








> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch
>
>
> If we enable the JMX MBean server for HMaster or RegionServer through VM 
> arguments, the process will use one random port which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting a custom JMX server.
> The ports can then be configured. If no ports are configured, it will 
> continue to work the same way as now.





[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-04-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979408#comment-13979408
 ] 

Qiang Tian commented on HBASE-10289:


PS: the official description talks about JDK 7, but I use JDK 6. The 
jre/lib/management/management.properties file has a similar description in its 
"RMI connector settings for local management" section.



> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch
>
>
> If we enable the JMX MBean server for HMaster or RegionServer through VM 
> arguments, the process will use one random port which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting a custom JMX server.
> The ports can then be configured. If no ports are configured, it will 
> continue to work the same way as now.





[jira] [Commented] (HBASE-10581) ACL znode are left without PBed during upgrading hbase0.94* to hbase0.96+

2014-04-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979527#comment-13979527
 ] 

Qiang Tian commented on HBASE-10581:


Pasting the stack trace so that it is easier for users to find this issue.
Thanks.
---

org.apache.hadoop.hbase.security.access.HbaseObjectWritableFor96Migration: Error in readFields
java.io.EOFException
  at java.io.DataInputStream.readBoolean(DataInputStream.java:238)
  at org.apache.hadoop.hbase.security.access.TablePermission.readFields(TablePermission.java:397)
  at org.apache.hadoop.hbase.security.access.HbaseObjectWritableFor96Migration.readObject(HbaseObjectWritableFor96Migration.java:689)
  at org.apache.hadoop.hbase.security.access.HbaseObjectWritableFor96Migration.readObject(HbaseObjectWritableFor96Migration.java:589)
  at org.apache.hadoop.hbase.security.access.HbaseObjectWritableFor96Migration.readObject(HbaseObjectWritableFor96Migration.java:650)
  at org.apache.hadoop.hbase.security.access.HbaseObjectWritableFor96Migration.readObject(HbaseObjectWritableFor96Migration.java:589)
  at org.apache.hadoop.hbase.security.access.AccessControlLists.readPermissions(AccessControlLists.java:614)
  at org.apache.hadoop.hbase.security.access.TableAuthManager.refreshTableCacheFromWritable(TableAuthManager.java:158)
  at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshAuthManager(ZKPermissionWatcher.java:152)
  at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.refreshNodes(ZKPermissionWatcher.java:135)
  at org.apache.hadoop.hbase.security.access.ZKPermissionWatcher.start(ZKPermissionWatcher.java:64)
  at org.apache.hadoop.hbase.security.access.TableAuthManager.<init>(TableAuthManager.java:114)
  at org.apache.hadoop.hbase.security.access.TableAuthManager.get(TableAuthManager.java:662)
  at org.apache.hadoop.hbase.security.access.AccessController.start(AccessController.java:525)
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$Environment.startup(CoprocessorHost.java:634)
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadInstance(CoprocessorHost.java:258)
  at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.loadSystemCoprocessors(CoprocessorHost.java:158)
  at org.apache.hadoop.hbase.master.MasterCoprocessorHost.<init>(MasterCoprocessorHost.java:69)
  at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:827)
  at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:612)

> ACL znode are left without PBed during upgrading hbase0.94* to hbase0.96+
> -
>
> Key: HBASE-10581
> URL: https://issues.apache.org/jira/browse/HBASE-10581
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0, 0.96.0, 0.96.1
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 0.96.2, 0.98.1, 0.99.0
>
> Attachments: hbase-10581.patch
>
>
> ACL znodes are left in the upgrade process when upgrading 0.94 to 0.96+
> Those 0.94 znodes will choke HMaster because their data aren't PBed.  





[jira] [Commented] (HBASE-10581) ACL znode are left without PBed during upgrading hbase0.94* to hbase0.96+

2014-04-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980593#comment-13980593
 ] 

Qiang Tian commented on HBASE-10581:


hi [~enis], yes, it is.


> ACL znode are left without PBed during upgrading hbase0.94* to hbase0.96+
> -
>
> Key: HBASE-10581
> URL: https://issues.apache.org/jira/browse/HBASE-10581
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.0, 0.96.0, 0.96.1
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
>Priority: Critical
> Fix For: 0.96.2, 0.98.1, 0.99.0
>
> Attachments: hbase-10581.patch
>
>
> ACL znodes are left in the upgrade process when upgrading 0.94 to 0.96+
> Those 0.94 znodes will choke HMaster because their data aren't PBed.  





[jira] [Comment Edited] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-04-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979398#comment-13979398
 ] 

Qiang Tian edited comment on HBASE-10289 at 4/25/14 6:07 AM:
-

Hi [~nijel],
/proc/sys/net/ipv4/ip_local_port_range does not look so good as it limits port 
resources.

I just happened to find that when "com.sun.management.jmxremote.local.only=false" 
is set, there is only 1 random port, i.e.:
export HBASE_MASTER_OPTS="$HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=61100 -Dcom.sun.management.jmxremote.local.only=false"

without "com.sun.management.jmxremote.local.only=false":

[root@test tmp]# netstat -nltp |grep 61100
tcp   0   0 :::61100             :::*   LISTEN   1989249/java
[root@test tmp]# netstat -nltp |grep 1989249
tcp   0   0 :::61100             :::*   LISTEN   1989249/java
tcp   0   0 :::4159              :::*   LISTEN   1989249/java
tcp   0   0 :::192.168.1.101:6   :::*   LISTEN   1989249/java
tcp   0   0 :::61320             :::*   LISTEN   1989249/java
tcp   0   0 :::60010             :::*   LISTEN   1989249/java

with "com.sun.management.jmxremote.local.only=false":

[root@test tmp]# netstat -nltp |grep 61100
tcp   0   0 :::61100             :::*   LISTEN   2021776/java
[root@test tmp]# netstat -nltp |grep 2021776
tcp   0   0 :::61100             :::*   LISTEN   2021776/java
tcp   0   0 :::2174              :::*   LISTEN   2021776/java
tcp   0   0 :::192.168.1.101:6   :::*   LISTEN   2021776/java
tcp   0   0 :::60010             :::*   LISTEN   2021776/java

I tried jconsole and it works both locally and remotely. Could you also give it a try?


PS: below is the official description:
http://www.oracle.com/technetwork/java/javase/compatibility-417013.html
Area: JMX
Synopsis: New Property for JMX RMI Connector Server
Description: The new property, com.sun.management.jmxremote.local.only, when 
true (the default) indicates that the local JMX RMI connector will only accept 
connection requests from local interfaces. Setting this property to false 
restores JDK 6 behavior, but is not recommended because the local JMX RMI 
connector server will accept connection requests from both local and remote 
interfaces. For remote management, the remote JMX RMI connector server should 
be used with authentication and SSL/TLS encryption enabled.
Nature of Incompatibility: behavioral


Regarding the RMI server port, we could:
a) use the parameter "com.sun.management.jmxremote.rmi.port" after upgrading to 
JDK 7. This is the simplest way.
b) use the existing artifact catalina-jmx-remote.jar
c) implement it ourselves as you mentioned.










[jira] [Created] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-29 Thread Qiang Tian (JIRA)
Qiang Tian created HBASE-11096:
--

 Summary: stop method of Master and RegionServer coprocessor  is 
not invoked
 Key: HBASE-11096
 URL: https://issues.apache.org/jira/browse/HBASE-11096
 Project: HBase
  Issue Type: Bug
Reporter: Qiang Tian
Assignee: Qiang Tian


The stop method of coprocessors specified by "hbase.coprocessor.master.classes" 
and "hbase.coprocessor.regionserver.classes" is not invoked.
If a coprocessor allocates OS resources, this can cause a master/regionserver 
resource leak or a hang during exit.
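
To illustrate why a missing stop call matters, here is a minimal sketch in which a component acquires an OS resource (a listening socket) in start and only releases it in stop. The Lifecycle interface and all class names are hypothetical stand-ins, not the HBase Coprocessor API.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical stand-in for a coprocessor-style lifecycle; not the HBase API.
interface Lifecycle {
    void start() throws IOException;
    void stop() throws IOException;
}

// A component that holds an OS resource between start() and stop().
final class PortHoldingComponent implements Lifecycle {
    private ServerSocket socket;

    @Override
    public void start() throws IOException {
        socket = new ServerSocket(0); // binds an ephemeral port
    }

    @Override
    public void stop() throws IOException {
        if (socket != null) {
            socket.close(); // releases the port
        }
    }

    boolean holdsPort() {
        return socket != null && !socket.isClosed();
    }
}

// A host that guarantees stop() is invoked on the way out.
final class LifecycleHost {
    static void runAndShutDown(Lifecycle component) throws IOException {
        component.start();
        try {
            // ... the host would do its normal work here ...
        } finally {
            component.stop();
        }
    }
}
```

If the host skipped the finally block, the socket would stay bound until the process exits, which is the kind of leak the description warns about.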






[jira] [Assigned] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-04-29 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian reassigned HBASE-10289:
--

Assignee: Qiang Tian

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch
>
>
> If we enable the JMX MBean server for HMaster or RegionServer through VM 
> arguments, the process will use one random port which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting a custom JMX server.
> The ports can then be configured. If no ports are configured, it will 
> continue to work the same way as now.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-29 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985069#comment-13985069
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell],
Thanks for the response.

Yes, I will. The fix is simple. This is part of the work for HBASE-10289; 
during my tests I also found another small bug: HRegionServer.stop is called 
twice. The fix looks simple as well, but I had better study the related code 
first. :-)


The first time, in HRegionServer.run:
      } else if (this.stopping) {
        boolean allUserRegionsOffline = areAllUserRegionsOffline();
        if (allUserRegionsOffline) {
          // Set stopped if no requests since last time we went around the loop.
          // The remaining meta regions will be closed on our way out.
          if (oldRequestCount == this.requestCount.get()) {
            stop("Stopped; only catalog regions remaining online");
            break;
          }


The second time, in the shutdown hook:
        at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.shutdown(CoprocessorHost.java:273)
        at org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:63)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1680)
        at org.apache.hadoop.hbase.regionserver.ShutdownHook$ShutdownHookThread.run(ShutdownHook.java:114)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
2014-04-29 18:56:02,617 WARN org.apache.hadoop.hbase.coprocessor.CoprocessorHost: Not stopping coprocessor org.apache.hadoop.hbase.jmxAgent because not active (state=STOPPED)
2014-04-29 18:56:02,617 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2014-04-29 18:56:02,617 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
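
One common way to make a stop method safe against a second call (for example from a shutdown hook) is an atomic "already stopped" flag. The sketch below shows that general technique only; the class name is hypothetical and it is not necessarily what the HBASE-11096 patch does.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of an idempotent stop(); not HBase code.
final class StopOnce {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final AtomicInteger cleanupRuns = new AtomicInteger();
    private volatile String reason;

    // Returns true only for the call that actually performed the shutdown.
    boolean stop(String why) {
        // compareAndSet lets exactly one caller flip false -> true,
        // even if several threads call stop() concurrently.
        if (!stopped.compareAndSet(false, true)) {
            return false; // an earlier call already stopped us
        }
        reason = why;
        cleanupRuns.incrementAndGet(); // stands in for the real cleanup work
        return true;
    }

    int cleanupRuns() {
        return cleanupRuns.get();
    }

    String reason() {
        return reason;
    }
}
```

With this guard, the call from the run loop does the cleanup and the later call from the shutdown hook becomes a no-op.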

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-30 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Attachment: HBASE-11096-trunk-v0.patch

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-30 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Patch Available  (was: Open)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-30 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985250#comment-13985250
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell],
1) For MasterCoprocessorHost, I can see that preShutdown is called in my test, but I am not 
sure how preStopMaster gets called (it looks like a command invoked via RPC?).
2) The HRegionServer.stop change is for the bug mentioned above.

thanks!


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Comment Edited] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-30 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985250#comment-13985250
 ] 

Qiang Tian edited comment on HBASE-11096 at 4/30/14 7:52 AM:
-

Hi [~apurtell],
1) For MasterCoprocessorHost, I can see that preShutdown is called in my test, but I am not 
sure how preStopMaster gets called (it looks like a command invoked via RPC?).
2) The HRegionServer.stop change is for the bug mentioned above.

I will upload the other patches after review.

thanks!



was (Author: tianq):
Hi [~apurtell],
1) For MasterCoprocessorHost, I can see that preShutdown is called in my test, but I am not 
sure how preStopMaster gets called (it looks like a command invoked via RPC?).
2) The HRegionServer.stop change is for the bug mentioned above.

thanks!


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-30 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985271#comment-13985271
 ] 

Qiang Tian commented on HBASE-11096:


Oops, I just noticed that the "stop coprocessor.." message is logged per region for 
"hbase.coprocessor.region.classes" coprocessors. It needs to be logged at a lower level or 
removed.


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-04-30 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985338#comment-13985338
 ] 

Qiang Tian commented on HBASE-11096:


Hi, 
I ran the regression tests with "mvn clean package". One test did fail, but 
it also failed without the fix:
Failed tests:   testNativeSizes(org.apache.hadoop.hbase.io.TestHeapSize): expected:<64> but was:<56>
Tests in error: 
  org.apache.hadoop.hbase.procedure.TestProcedureManager: org/apache/hadoop/io/nativeio/SharedFileDescriptorFactory.createDescriptor0(Ljava/lang/String;Ljava/lang/String;I)Ljava/io/FileDescriptor;


However, I did not see a result file for TestFlushSnapshotFromClient.
I also tried "mvn test". Did I miss something?

thanks!



> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-01 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Open  (was: Patch Available)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-01 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Attachment: HBASE-11096-trunk-v1.patch

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-01 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Patch Available  (was: Open)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-01 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986473#comment-13986473
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell],
I took a look at the current testcase. There is a TestMasterObserver.wasStopped 
method, but it is never called; it looks like there is no good place to test it 
given the scenario. HBASE-10289 is a better place, since it will cover the stop 
test naturally.
Please let me know if that is acceptable.
Thanks.

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-12 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Open  (was: Patch Available)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch, HBASE-11096-trunk-v2.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Created] (HBASE-11146) HMaster instantiates both MasterCoprocessorHost and RegionServerCoprocessorHost

2014-05-13 Thread Qiang Tian (JIRA)
Qiang Tian created HBASE-11146:
--

 Summary: HMaster instantiates both MasterCoprocessorHost and 
RegionServerCoprocessorHost
 Key: HBASE-11146
 URL: https://issues.apache.org/jira/browse/HBASE-11146
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.99.0
Reporter: Qiang Tian
Assignee: Qiang Tian


See HBASE-11096.

In 0.99, HRegionServer is the base class of HMaster. 
Because the master and the regionserver share the same run method, the master 
instantiates both MasterCoprocessorHost and RegionServerCoprocessorHost.


Below are example logs. A coprocessor is started and stopped twice: once for the 
real regionserver, and once for the RegionServerCoprocessorHost in the master.

2014-05-08 00:33:51,632 INFO [M:0;bdvm135:36021] coprocessor.TestCoprocessorStop$FooCoprocessor(66): start coprocessor on regionserver

2014-05-08 00:33:51,633 INFO [RS:0;bdvm135:47513] coprocessor.TestCoprocessorStop$FooCoprocessor(66): start coprocessor on regionserver

...
2014-05-08 00:34:03,166 INFO [main] regionserver.HRegionServer(1624): call stack of stop
java.io.IOException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1624)
    at org.apache.hadoop.hbase.master.ServerManager.shutdownCluster(ServerManager.java:975)
    at org.apache.hadoop.hbase.master.HMaster.shutdown(HMaster.java:1623)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:256)
    at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:437)
    at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:519)
    at org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop.testStopped(TestCoprocessorStop.java:114)
...
2014-05-08 00:34:03,215 INFO [main] regionserver.HRegionServer(1629): rsHost code path called

2014-05-08 00:34:03,228 DEBUG [main] coprocessor.CoprocessorHost(258): Stop coprocessor org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop$FooCoprocessor
2014-05-08 00:34:03,462 INFO [main] coprocessor.TestCoprocessorStop$FooCoprocessor(88): create file hdfs://localhost:8155/user/tianq/test-data/f0c9423c-e505-4feb-907e-c7bd6e16545b/regionserver1399534399680 return rc true

...

2014-05-08 00:34:03,482 INFO [main] regionserver.HRegionServer(1624): call stack of stop
java.io.IOException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1624)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:264)
    at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:437)
    at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:519)
    at org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop.testStopped(TestCoprocessorStop.java:114)
2014-05-08 00:34:03,485 INFO [main] regionserver.HRegionServer(1629): rsHost code path called

2014-05-08 00:34:03,485 DEBUG [main] coprocessor.CoprocessorHost(258): Stop coprocessor org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop$FooCoprocessor
2014-05-08 00:34:03,493 INFO [main] coprocessor.TestCoprocessorStop$FooCoprocessor(88): create file hdfs://localhost:8155/user/tianq/test-data/f0c9423c-e505-4feb-907e-c7bd6e16545b/regionserver1399534399680 return rc false




[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-14 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996228#comment-13996228
 ] 

Qiang Tian commented on HBASE-11096:


Thanks for Ted's comment. Sure, I will remove it.

Hi [~apurtell], [~yuzhih...@gmail.com], [~jxiang],
Regarding HBASE-11146: after studying HBASE-10569 and the code, I see that it is by 
design. 
My concern is that, although the master is also a regionserver, from the user's 
perspective the master still has a 'master role'. So if they deploy a master 
coprocessor and a regionserver coprocessor on the master and regionserver 
respectively, they might not want to see 2 coprocessors created on the master, in 
which case the master coprocessor and the regionserver coprocessor might 
conflict. I just saw that in 0.99 we removed the master infoserver and redirect 
its requests to the regionserver infoserver. Maybe we could follow a similar 
approach, i.e. disable RegionServerCoprocessorHost on the master?

Could you please let me know your thoughts?

Thanks!
Qiang
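
The proposal above could look roughly like the following sketch. It is only an illustration under stated assumptions: SketchServer, SketchMasterServer, and isMasterRole are hypothetical names, and the counters stand in for creating the real coprocessor hosts.

```java
/** Hypothetical server base class; names are illustrative, not HBase's real classes. */
class SketchServer {
    int rsHostsCreated = 0;

    /** Subclasses override this to report their role. */
    boolean isMasterRole() { return false; }

    /** Shared startup path: create the regionserver-side host only for non-master roles. */
    void initialize() {
        if (!isMasterRole()) {
            rsHostsCreated++;
        }
    }
}

/** Hypothetical master: reuses the shared startup path but only gets a master-side host. */
class SketchMasterServer extends SketchServer {
    int masterHostsCreated = 0;

    @Override boolean isMasterRole() { return true; }

    @Override void initialize() {
        super.initialize();
        masterHostsCreated++;
    }
}
```

With this guard, a master built on the shared code path ends up with one master-side host and no regionserver-side host, while a plain regionserver is unchanged.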


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch, HBASE-11096-trunk-v2.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-15 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Patch Available  (was: Open)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch, HBASE-11096-trunk-v2.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-15 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992587#comment-13992587
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell], 

I happened to find another new bug.

It looks like 0.99 is undergoing a BIG refactoring: HRegionServer has become the 
base class of HMaster. For coprocessors, since the two share the same run method, 
the master will call the coprocessor twice.

See the example logs below: one is for the real regionserver, the other is for the 
regionserver code path in the master.


2014-05-08 00:33:51,632 INFO  [M:0;bdvm135:36021] coprocessor.TestCoprocessorStop$FooCoprocessor(66): start coprocessor on regionserver

2014-05-08 00:33:51,633 INFO  [RS:0;bdvm135:47513] coprocessor.TestCoprocessorStop$FooCoprocessor(66): start coprocessor on regionserver

...
2014-05-08 00:34:03,166 INFO  [main] regionserver.HRegionServer(1624): call stack of stop
java.io.IOException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1624)
    at org.apache.hadoop.hbase.master.ServerManager.shutdownCluster(ServerManager.java:975)
    at org.apache.hadoop.hbase.master.HMaster.shutdown(HMaster.java:1623)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:256)
    at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:437)
    at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:519)
    at org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop.testStopped(TestCoprocessorStop.java:114)
...
2014-05-08 00:34:03,215 INFO  [main] regionserver.HRegionServer(1629): rsHost code path called

2014-05-08 00:34:03,228 DEBUG [main] coprocessor.CoprocessorHost(258): Stop coprocessor org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop$FooCoprocessor
2014-05-08 00:34:03,462 INFO  [main] coprocessor.TestCoprocessorStop$FooCoprocessor(88): create file hdfs://localhost:8155/user/tianq/test-data/f0c9423c-e505-4feb-907e-c7bd6e16545b/regionserver1399534399680 return rc true

...

2014-05-08 00:34:03,482 INFO  [main] regionserver.HRegionServer(1624): call stack of stop
java.io.IOException
    at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1624)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:264)
    at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:437)
    at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:519)
    at org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop.testStopped(TestCoprocessorStop.java:114)
2014-05-08 00:34:03,485 INFO  [main] regionserver.HRegionServer(1629): rsHost code path called

2014-05-08 00:34:03,485 DEBUG [main] coprocessor.CoprocessorHost(258): Stop coprocessor org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop$FooCoprocessor
2014-05-08 00:34:03,493 INFO  [main] coprocessor.TestCoprocessorStop$FooCoprocessor(88): create file hdfs://localhost:8155/user/tianq/test-data/f0c9423c-e505-4feb-907e-c7bd6e16545b/regionserver1399534399680 return rc false




> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch, HBASE-11096-trunk-v2.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-15 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Attachment: HBASE-11096-trunk-v2.patch

Adding a testcase for the stop method.

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v1.patch, HBASE-11096-trunk-v2.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-16 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Patch Available  (was: Open)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-16 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999677#comment-13999677
 ] 

Qiang Tian commented on HBASE-11096:


Hi guys,
I updated and uploaded the patches for the other branches (mvn test passed).
Thanks.


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-16 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Open  (was: Patch Available)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-16 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Attachment: HBASE-11096-0.94.patch
HBASE-11096-0.96.patch
HBASE-11096-0.98.patch
HBASE-11096-trunk-v3.patch

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Resolved] (HBASE-11146) HMaster instantiates both MasterCoprocessorHost and RegionServerCoprocessorHost

2014-05-18 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian resolved HBASE-11146.


Resolution: Won't Fix

Works as designed.

> HMaster instantiates both MasterCoprocessorHost and 
> RegionServerCoprocessorHost
> ---
>
> Key: HBASE-11146
> URL: https://issues.apache.org/jira/browse/HBASE-11146
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors
>Affects Versions: 0.99.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>
> See HBASE-11096.
> In 0.99, HRegionServer is the base class of HMaster. 
> Because the master and the regionserver share the same run method, the master 
> instantiates both MasterCoprocessorHost and RegionServerCoprocessorHost.
> Below are example logs. A coprocessor is started and stopped twice: once for the 
> real regionserver, and once for the RegionServerCoprocessorHost in the master.
> 2014-05-08 00:33:51,632 INFO [M:0;bdvm135:36021] coprocessor.TestCoprocessorStop$FooCoprocessor(66): start coprocessor on regionserver
> 2014-05-08 00:33:51,633 INFO [RS:0;bdvm135:47513] coprocessor.TestCoprocessorStop$FooCoprocessor(66): start coprocessor on regionserver
> ...
> 2014-05-08 00:34:03,166 INFO [main] regionserver.HRegionServer(1624): call stack of stop
> java.io.IOException
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1624)
>     at org.apache.hadoop.hbase.master.ServerManager.shutdownCluster(ServerManager.java:975)
>     at org.apache.hadoop.hbase.master.HMaster.shutdown(HMaster.java:1623)
>     at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:256)
>     at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:437)
>     at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:519)
>     at org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop.testStopped(TestCoprocessorStop.java:114)
> ...
> 2014-05-08 00:34:03,215 INFO [main] regionserver.HRegionServer(1629): rsHost code path called
> 2014-05-08 00:34:03,228 DEBUG [main] coprocessor.CoprocessorHost(258): Stop coprocessor org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop$FooCoprocessor
> 2014-05-08 00:34:03,462 INFO [main] coprocessor.TestCoprocessorStop$FooCoprocessor(88): create file hdfs://localhost:8155/user/tianq/test-data/f0c9423c-e505-4feb-907e-c7bd6e16545b/regionserver1399534399680 return rc true
> ...
> 2014-05-08 00:34:03,482 INFO [main] regionserver.HRegionServer(1624): call stack of stop
> java.io.IOException
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1624)
>     at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:264)
>     at org.apache.hadoop.hbase.LocalHBaseCluster.shutdown(LocalHBaseCluster.java:437)
>     at org.apache.hadoop.hbase.MiniHBaseCluster.shutdown(MiniHBaseCluster.java:519)
>     at org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop.testStopped(TestCoprocessorStop.java:114)
> 2014-05-08 00:34:03,485 INFO [main] regionserver.HRegionServer(1629): rsHost code path called
> 2014-05-08 00:34:03,485 DEBUG [main] coprocessor.CoprocessorHost(258): Stop coprocessor org.apache.hadoop.hbase.coprocessor.TestCoprocessorStop$FooCoprocessor
> 2014-05-08 00:34:03,493 INFO [main] coprocessor.TestCoprocessorStop$FooCoprocessor(88): create file hdfs://localhost:8155/user/tianq/test-data/f0c9423c-e505-4feb-907e-c7bd6e16545b/regionserver1399534399680 return rc false



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-18 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001345#comment-14001345
 ] 

Qiang Tian commented on HBASE-11096:


Thanks [~lhofhansl]

[~apurtell], is it OK for trunk?
thanks!

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-19 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002717#comment-14002717
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell],
I ran "mvn test -Dtest=org.apache.hadoop.hbase.coprocessor.*" several 
times...all good.
Could you paste the full log file?
thanks.


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-22 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Attachment: HBASE-11096-trunk-v3.patch

Local test passes; kicking off the test run again.

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.98.3, 0.94.21
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch, 
> HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-22 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Patch Available  (was: Open)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.98.3, 0.94.21
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch, 
> HBASE-11096-trunk-v3.patch
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-25 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008311#comment-14008311
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell],
it looks like the shutdown method is not called. Could you please check whether the 
patch applied to MasterCoprocessorHost.java correctly? I just ran "patch 
-p0< ../backup/HBASE-11096-trunk-v3.patch" against trunk, and the change was not 
applied correctly. 

PS: when I first saw your update on 20/5, I merged my code with the latest code 
using the "git merge" command, and the test ran fine. 
Today I used the "git rebase" command to merge my local branch with the latest code 
(since you recommended rebase on the mailing list), and I got the error. After 
investigation, it turned out that MasterCoprocessorHost.java was not merged correctly.

I did a search, and it looks like git rebase uses the patch command to do the final merge, 
so if patch does not work correctly, the rebase will fail too.

PPS: this is the second time I have seen the patch command fail. Did I use the wrong option?




> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.21, 0.98.4
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch, 
> HBASE-11096-trunk-v3.patch, TestCoprocessorStop-failed-output.txt
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-26 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Open  (was: Patch Available)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.21, 0.98.4
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.patch, 
> HBASE-11096-trunk-v3.patch, TestCoprocessorStop-failed-output.txt
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-26 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Status: Patch Available  (was: Open)

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.19, 0.98.1, 0.96.2
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.21, 0.98.4
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.1.patch, 
> HBASE-11096-trunk-v3.patch, HBASE-11096-trunk-v3.patch, 
> TestCoprocessorStop-failed-output.txt
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-26 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11096:
---

Attachment: HBASE-11096-trunk-v3.1.patch

> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.21, 0.98.4
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.1.patch, 
> HBASE-11096-trunk-v3.patch, HBASE-11096-trunk-v3.patch, 
> TestCoprocessorStop-failed-output.txt
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Commented] (HBASE-11096) stop method of Master and RegionServer coprocessor is not invoked

2014-05-26 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008680#comment-14008680
 ] 

Qiang Tian commented on HBASE-11096:


Hi [~apurtell],
it looks like the code base has changed a lot since HBASE-11096-trunk-v3.patch was 
uploaded, so the merge against the latest trunk fails to pick up the fix 
correctly (without reporting any error).
I created a new patch based on the latest commit I can see:
Author: Ramkrishna 
Date:   Mon May 26 11:47:04 2014 +0530

HBASE-11251-LoadTestTool should grant READ permission for the users that
are given READ access for specific cells (Ram)

The local test "mvn test -Dtest=org.apache.hadoop.hbase.coprocessor.*" passed.
thanks.


> stop method of Master and RegionServer coprocessor  is not invoked
> --
>
> Key: HBASE-11096
> URL: https://issues.apache.org/jira/browse/HBASE-11096
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.96.2, 0.98.1, 0.94.19
>Reporter: Qiang Tian
>Assignee: Qiang Tian
> Fix For: 0.99.0, 0.96.3, 0.94.21, 0.98.4
>
> Attachments: HBASE-11096-0.94.patch, HBASE-11096-0.96.patch, 
> HBASE-11096-0.98.patch, HBASE-11096-trunk-v0.patch, 
> HBASE-11096-trunk-v0.patch, HBASE-11096-trunk-v1.patch, 
> HBASE-11096-trunk-v2.patch, HBASE-11096-trunk-v3.1.patch, 
> HBASE-11096-trunk-v3.patch, HBASE-11096-trunk-v3.patch, 
> TestCoprocessorStop-failed-output.txt
>
>
> the stop method of coprocessor specified by 
> "hbase.coprocessor.master.classes" and  
> "hbase.coprocessor.regionserver.classes"  is not invoked.
> If coprocessor allocates OS resources, it could cause master/regionserver 
> resource leak or hang during exit.





[jira] [Updated] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-28 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-10289:
---

Status: Open  (was: Patch Available)

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Updated] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-28 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-10289:
---

Attachment: HBase10289-master.patch

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Updated] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-28 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-10289:
---

Status: Patch Available  (was: Open)

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010963#comment-14010963
 ] 

Qiang Tian commented on HBASE-10289:


Hi guys,
sorry for the late response.
The patch I just uploaded is a pluggable solution based on a coprocessor, so 
users can easily turn it off or switch back.
1) It supports password authentication.

2) It supports a subset of SSL (enforcing SSL for communication) with the 
default configuration. See 
http://ddweerasiri.blogspot.com/2011/08/ssl-enabled-jconsole-to-monitor-wso2.html
 for an example.

3) The registry port and connector port can be configured separately, or combined 
into one when SSL is not enabled. (If SSL is enabled, the connector port must use 
a different value from the registry port.)

4) The coprocessor is loaded only for the regionserver, since "master is also a regionserver" 
in 0.99.

5) The code follows the example on the official site: 
http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html 
(see the section "Example of Mimicking Out-of-the-Box Management").

6) Password authentication and SSL were tested manually, since the code does almost 
nothing for them; most of the work is related to the environment.

7) I confirmed that the random port issue is not resolved in JDK 7 unless bug 8035404 is 
fixed.

I would appreciate your comments.
Thanks.
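
A minimal sketch of the idea in points 3) and 5), assuming the standard javax.management.remote API rather than the code in the attached patch. The class name FixedPortJmxSketch, its method names, and the port numbers are made up for illustration. Both the RMI registry and the RMI connector are bound to explicitly chosen ports, so no random port should be needed:

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of HBase or of the patch.
class FixedPortJmxSketch {

    // Builds a service URL that names both ports explicitly:
    // the connector port comes first, the registry port sits in the JNDI part.
    static String serviceUrl(String host, int registryPort, int connectorPort) {
        return "service:jmx:rmi://" + host + ":" + connectorPort
            + "/jndi/rmi://" + host + ":" + registryPort + "/jmxrmi";
    }

    // Starts an RMI registry and a JMX connector server on the given ports.
    static JMXConnectorServer start(String host, int registryPort, int connectorPort)
            throws Exception {
        // The registry listens on the port we choose, not on a random one.
        LocateRegistry.createRegistry(registryPort);

        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();

        // Empty environment: no password authentication and no SSL in this sketch.
        Map<String, Object> env = new HashMap<>();

        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, registryPort, connectorPort));
        JMXConnectorServer server =
            JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs);
        server.start();
        return server;
    }
}
```

With these assumptions, FixedPortJmxSketch.start("localhost", 10101, 10102) would expose the platform MBean server on the two chosen ports; passing the same value for both ports corresponds to the combined case described in 3).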

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-29 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013296#comment-14013296
 ] 

Qiang Tian commented on HBASE-10289:


Thanks [~stack],
Yes, it is OFF unless the user configures it in hbase-site.xml. (The current JMX 
configuration in hbase-env.sh is also OFF by default.)

What about adding a JMX part to the HBase book section "2.5.3. Other 
Configurations", e.g. a description of the problem, examples of supported 
configurations, etc.?
Thanks.


> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Commented] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-29 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013314#comment-14013314
 ] 

Qiang Tian commented on HBASE-10289:


PS: making it coprocessor-based just takes advantage of the class 
instantiation provided by the coprocessor framework, without any other cost, so 
we get real pluggability without modifying other code (I still think the 
native JDK solution is the best one once the bug is fixed). 
But if you guys think the original way is better, I can change it back.
Thanks.


> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Updated] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-30 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-10289:
---

Status: Open  (was: Patch Available)

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch, hbase10289-master-v1.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Updated] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-30 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-10289:
---

Attachment: hbase10289-master-v1.patch

add doc change

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch, hbase10289-master-v1.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Updated] (HBASE-10289) Avoid random port usage by default JMX Server. Create Custome JMX server

2014-05-30 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-10289:
---

Status: Patch Available  (was: Open)

> Avoid random port usage by default JMX Server. Create Custome JMX server
> 
>
> Key: HBASE-10289
> URL: https://issues.apache.org/jira/browse/HBASE-10289
> Project: HBase
>  Issue Type: Improvement
>Reporter: nijel
>Assignee: Qiang Tian
>Priority: Minor
>  Labels: stack
> Fix For: 0.99.0
>
> Attachments: HBASE-10289-v4.patch, HBASE-10289.patch, 
> HBASE-10289_1.patch, HBASE-10289_2.patch, HBASE-10289_3.patch, 
> HBase10289-master.patch, hbase10289-master-v1.patch
>
>
> If we enable JMX MBean server for HMaster or Region server  through VM 
> arguments, the process will use one random which we cannot configure.
> This can be a problem if that random port is configured for some other 
> service.
> This issue can be avoided by supporting  a custom JMX Server.
> The ports can be configured. If there is no ports configured, it will 
> continue the same way as now.





[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-07 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163024#comment-14163024
 ] 

Qiang Tian commented on HBASE-11368:


As [~stack] mentioned in http://search-hadoop.com/m/DHED4NR0wT, 
HRegion#lock is there to protect region close. The comments in HRegion.java, and the 
fact that only HRegion#doClose takes the writelock (if we do not count 
HRegion#startBulkRegionOperation), also show that.

So using HRegion#lock to protect multi-CF bulkload in HBASE-4552 looks too 
heavyweight?
From the stack trace in HBASE-10882, all reads/scans are blocked because 
bulkload is waiting for lock.writelock, while compaction has already acquired 
lock.readlock and is reading data, which is a time-consuming operation.

A related topic is discussed again in http://search-hadoop.com/m/DHED4I11p31. 
Perhaps we need another region-level lock.
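
The contention described above can be illustrated with a plain java.util.concurrent lock. This is only a sketch under stated assumptions: the class RegionLockContentionSketch and its method names are invented, and one thread plays both roles, whereas in HBase the compaction and the bulk load run on different threads. It shows that a writer cannot obtain a ReentrantReadWriteLock while any read lock is still held:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the region lock; not HBase code.
class RegionLockContentionSketch {

    // Returns whether a writer could take the lock while a reader still holds it.
    static boolean writerGetsLockWhileReaderHolds() {
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        regionLock.readLock().lock();   // stands in for a long-running compaction
        try {
            // Stands in for the bulk load asking for the write lock.
            boolean acquired = regionLock.writeLock().tryLock();
            if (acquired) {
                regionLock.writeLock().unlock();
            }
            return acquired;
        } finally {
            regionLock.readLock().unlock();
        }
    }

    // Returns whether a writer could take the lock when nobody holds it.
    static boolean writerGetsLockWhenIdle() {
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        boolean acquired = regionLock.writeLock().tryLock();
        if (acquired) {
            regionLock.writeLock().unlock();
        }
        return acquired;
    }
}
```

The first method is expected to return false and the second true, which matches the observation that the bulk load has to wait for as long as the compaction keeps its read lock.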

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?





[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-07 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163162#comment-14163162
 ] 

Qiang Tian commented on HBASE-11368:


Ideas for making the locking finer-grained (based on the 0.98.5 code base):
1) read/scan 
Is this the primary goal of atomic multi-CF bulkload in HBASE-4552?

After DefaultStoreFileManager#storefiles is updated in HStore#bulkLoadHFile, 
notifyChangedReadersObservers is called to reset StoreScanner#heap, so 
checkReseek->resetScannerStack will be triggered on the next scan/read to recreate 
the store scanners based on the new storefiles.

So we could introduce a new region-level rwlock, multiCFLock: 
HRegion#bulkLoadHFiles acquires the writelock before the multi-CF 
HStore.bulkLoadHFile calls, and StoreScanner#resetScannerStack acquires the 
readlock. This way the scanners are recreated only after all CFs' store files are 
populated.

2) split region
The region will be closed in SplitTransaction#stepsBeforePONR, which falls within 
the area protected by HRegion#lock. Bulk load still needs to acquire its 
readlock at the start.

3) memstore flush
We flush to a new file, which is not related to the loaded files.

4) compaction
Compaction is performed store by store. If bulkload inserts new files into 
{{storefiles}} during the selectCompaction process, the list of files to be 
compacted might be affected: e.g., the compaction for some CFs might not include the 
newly loaded files, while for others it might. But this does not affect data 
integrity or read behavior?
At the end of compaction, {{storefiles}} access is still protected by 
HStore#lock if a bulk load changes the same CF.

Comments?
Thanks
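
A rough sketch of the multiCFLock idea from 1), written as a toy model rather than as HBase code. Only the name multiCFLock comes from the proposal; the class MultiCfLockSketch, its fields, and its methods are assumptions made for illustration. The bulk load updates both column families under the write lock, and a reader that rebuilds its view takes the read lock, so it should never see one family loaded without the other:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy region with two column families; not HBase code.
class MultiCfLockSketch {
    private final ReentrantReadWriteLock multiCFLock = new ReentrantReadWriteLock();
    private final List<String> cf1Files = new ArrayList<>();
    private final List<String> cf2Files = new ArrayList<>();

    // Stands in for HRegion#bulkLoadHFiles: all families change under one write lock.
    void bulkLoad(String cf1File, String cf2File) {
        multiCFLock.writeLock().lock();
        try {
            cf1Files.add(cf1File);
            cf2Files.add(cf2File);
        } finally {
            multiCFLock.writeLock().unlock();
        }
    }

    // Stands in for StoreScanner#resetScannerStack: read the file lists under the read lock.
    int[] snapshotCounts() {
        multiCFLock.readLock().lock();
        try {
            return new int[] { cf1Files.size(), cf2Files.size() };
        } finally {
            multiCFLock.readLock().unlock();
        }
    }
}
```

Because this lock would be held only around the file-list update and the scanner reset, a long-running compaction would not need to take it at all, which is the point of separating it from HRegion#lock.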

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?





[jira] [Assigned] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-09 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian reassigned HBASE-11368:
--

Assignee: Qiang Tian

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?





[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-09 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164957#comment-14164957
 ] 

Qiang Tian commented on HBASE-11368:


Thanks [~jinghe],
is this the right way to run the bulkload test? {{mvn test 
-Dtest=TestHRegionServerBulkLoad}}
the test is supposed to run for 5 minutes, but it exits after only about 1 
minute. is that expected?



[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-09 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166466#comment-14166466
 ] 

Qiang Tian commented on HBASE-11368:


update: 
the idea will cause a deadlock, since bulkload and the scanner acquire the 
bulkload lock and StoreScanner.lock in different orders. I will look at 
whether we can make the StoreScanner lock finer-grained.
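
A minimal sketch of the lock-ordering problem, assuming two plain ReentrantLocks stand in for the bulkload lock and StoreScanner.lock. The class and method names are hypothetical and this is not HBase code. It uses tryLock with a timeout so that the conflicting order shows up as a failed acquisition instead of a hang:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins for the two locks; this is not HBase code.
class LockOrderSketch {
    static final ReentrantLock bulkLoadLock = new ReentrantLock();
    static final ReentrantLock storeScannerLock = new ReentrantLock();

    // Holds `first`, waits until the other thread holds its own first lock,
    // then tries `second` with a timeout. Returns true if `second` was acquired.
    static boolean takeBoth(ReentrantLock first, ReentrantLock second,
                            CyclicBarrier barrier) throws Exception {
        first.lock();
        try {
            barrier.await();   // both threads now hold their first lock
            boolean gotSecond = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (gotSecond) {
                second.unlock();
            }
            barrier.await();   // keep `first` until both attempts have finished
            return gotSecond;
        } finally {
            first.unlock();
        }
    }

    // Runs a "bulkload" and a "scanner" thread that take the locks in opposite
    // orders and returns how many of them managed to get both locks.
    static int threadsThatGotBothLocks() {
        CyclicBarrier barrier = new CyclicBarrier(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> bulkLoad =
                pool.submit(() -> takeBoth(bulkLoadLock, storeScannerLock, barrier));
            Future<Boolean> scanner =
                pool.submit(() -> takeBoth(storeScannerLock, bulkLoadLock, barrier));
            return (bulkLoad.get() ? 1 : 0) + (scanner.get() ? 1 : 0);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads that got both locks: " + threadsThatGotBothLocks());
    }
}
```

With blocking lock() calls instead of tryLock, neither thread would ever proceed. Acquiring the two locks in the same order on both paths, or not holding one while taking the other, avoids the cycle.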




[jira] [Commented] (HBASE-11890) HBase REST Client is hard coded to http protocol

2014-10-11 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168065#comment-14168065
 ] 

Qiang Tian commented on HBASE-11890:


Hi [~stack], there is also a doc update :-)

HBase-11890-doc.patch


> HBase REST Client is hard coded to http protocol
> 
>
> Key: HBASE-11890
> URL: https://issues.apache.org/jira/browse/HBASE-11890
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.96.2
>Reporter: Eric Yang
>Assignee: Qiang Tian
> Fix For: 2.0.0, 0.98.7, 0.99.1
>
> Attachments: HBase-11890-doc.patch, HBase-11890-master-v1.patch, 
> HBase-11890-master.patch
>
>
> HBase REST Client executePathOnly only supports http.  It would be nice if 
> there is a option to enable REST API client to connect through SSL.  
> org.apache.hadoop.hbase.rest.client.Cluster class does not indicate which 
> protocol can be used, we can either set flag in Cluster class or introduce a 
> parameter in Client class to toggle SSL.  





[jira] [Updated] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-14 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11368:
---
Attachment: hbase-11368-0.98.5.patch

I forgot that StoreScanner is per CF, so the earlier analysis is wrong:
{quote}
After DefaultStoreFileManager#storefiles is updated in HStore#bulkLoadHFile, 
notifyChangedReadersObservers is called to reset the StoreScanner#heap, so 
checkReseek->resetScannerStack will be triggered in next scan/read to recreate 
store scanners based on new storefiles.

so we could introduce a new region level rwlock multiCFLock, 
HRegion#bulkLoadHFiles acquires the writelock before multi-CF 
HStore.bulkLoadHFile call. and StoreScanner#resetScannerStack acquires the 
readlock. this way the scanners are recreated after all CFs' store files are 
populated.
{quote}

instead, the new lock should be placed at the RegionScanner layer. see the 
attached patch.

"mvn test" and "TestHRegionServerBulkLoad" (the large test for atomic 
bulkload) passed. I still need to run the large tests and a performance test 
(any suggestions for it? YCSB?).

the scope of the lock can be narrowed further by splitting 
HStore#bulkLoadHFile into 2 parts: 1) rename the bulkload files and put the 
new files into the store files list, 2) notifyChangedReadersObservers. only 
#2 needs the lock. 
if the HDFS file rename is fast, the split may not be needed.
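
A simplified model of the two-part split, assuming a region-level ReentrantReadWriteLock named multiCFLock as in the earlier comment. Everything else (class name, the "pending" and "visible" maps, method names) is hypothetical and only illustrates the idea, not the attached patch. Part 1 stages files per CF without the region-level lock, part 2 publishes them for all CFs under the write lock, and a scan reads under the read lock:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model, not HBase code: "pending" stands for files already renamed
// into the store, "visible" for the file set that scanners currently read from.
class MultiCfBulkLoadSketch {
    private final ReentrantReadWriteLock multiCFLock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> pending = new HashMap<>();
    private final Map<String, List<String>> visible = new HashMap<>();

    // Part 1: per-CF work that needs no region-level lock (e.g. the HDFS rename).
    void stageFile(String cf, String hfile) {
        synchronized (pending) {
            pending.computeIfAbsent(cf, k -> new ArrayList<>()).add(hfile);
        }
    }

    // Part 2: publish the staged files of all CFs at once under the write lock.
    void publishAll() {
        multiCFLock.writeLock().lock();
        try {
            synchronized (pending) {
                for (Map.Entry<String, List<String>> e : pending.entrySet()) {
                    visible.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                           .addAll(e.getValue());
                }
                pending.clear();
            }
        } finally {
            multiCFLock.writeLock().unlock();
        }
    }

    // A scan takes the read lock, so it sees none or all of a multi-CF load.
    Map<String, List<String>> scanView() {
        multiCFLock.readLock().lock();
        try {
            Map<String, List<String>> copy = new TreeMap<>();
            visible.forEach((cf, files) -> copy.put(cf, new ArrayList<>(files)));
            return copy;
        } finally {
            multiCFLock.readLock().unlock();
        }
    }
}
```

In this model the write lock is held only for the in-memory publish step, which is why the rename time does not add to the time scanners are blocked.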



> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch
>
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?





[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-14 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170649#comment-14170649
 ] 

Qiang Tian commented on HBASE-11368:


it looks to me like the patch shows its value only when a long compaction 
coincides with gets/scans. not sure if [~victorunique] wants to try it in a 
test env?
thanks.




[jira] [Created] (HBASE-12266) Slow Scan can cause dead loop in ClientScanner

2014-10-15 Thread Qiang Tian (JIRA)
Qiang Tian created HBASE-12266:
--

 Summary: Slow Scan can cause dead loop in ClientScanner 
 Key: HBASE-12266
 URL: https://issues.apache.org/jira/browse/HBASE-12266
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.96.0
Reporter: Qiang Tian
Priority: Minor


see http://search-hadoop.com/m/DHED45SVsC1.






[jira] [Updated] (HBASE-12266) Slow Scan can cause dead loop in ClientScanner

2014-10-15 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12266:
---
Attachment: HBASE-12266-master.patch

any particular purpose to set it to true there?
thanks.

> Slow Scan can cause dead loop in ClientScanner 
> ---
>
> Key: HBASE-12266
> URL: https://issues.apache.org/jira/browse/HBASE-12266
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Affects Versions: 0.96.0
>Reporter: Qiang Tian
>Priority: Minor
> Attachments: HBASE-12266-master.patch
>
>
> see http://search-hadoop.com/m/DHED45SVsC1.





[jira] [Updated] (HBASE-12266) Slow Scan can cause dead loop in ClientScanner

2014-10-15 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12266:
---
Status: Patch Available  (was: Open)



[jira] [Commented] (HBASE-12266) Slow Scan can cause dead loop in ClientScanner

2014-10-15 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173259#comment-14173259
 ] 

Qiang Tian commented on HBASE-12266:


Thanks guys,
frankly, it looks to me like such retries (including HBASE-7070) just make the 
code harder to read and make it easy to introduce new bugs in a complex 
system. they are also hard to cover with tests.

As mentioned in that jira: 
bq. 1.A next request is very large, so first time it is failed because of 
timeout

since it is caused by a client-side timeout, why not just throw an exception 
so that the user (or app-layer code) knows about it and can set a bigger 
value? the right timeout value varies case by case, which is why we made it 
configurable, right?  







> Slow Scan can cause dead loop in ClientScanner 
> ---
>
> Key: HBASE-12266
> URL: https://issues.apache.org/jira/browse/HBASE-12266
> Project: HBase
>  Issue Type: Bug
>  Components: Scanners
>Affects Versions: 0.96.0
>Reporter: Qiang Tian
>Priority: Minor
> Attachments: 12266-v2.txt, HBASE-12266-master.patch
>
>
> see http://search-hadoop.com/m/DHED45SVsC1.





[jira] [Commented] (HBASE-12274) Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() may produce null pointer exception

2014-10-15 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173292#comment-14173292
 ] 

Qiang Tian commented on HBASE-12274:


Hi Ted,
is it worth finding out who closes the scanner (could it be closed because of 
a bug)? I ask because I have the same question about StoreScanner: it looks 
like only StoreScanner itself can call close.

Regarding synchronized, I did not test it myself. the cost of a synchronized 
method may be high, but from what I found by googling, a synchronized block is 
fine, even better than a lock.
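
A minimal sketch of guarding next() and close() with the same monitor, assuming a hypothetical scanner class whose heap is set to null on close. It is not RegionScannerImpl and not necessarily what the attached patches do:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical scanner, not RegionScannerImpl: storeHeap is nulled by close(),
// which is what makes an unsynchronized next() race-prone.
class GuardedScannerSketch {
    private Queue<String> storeHeap;

    GuardedScannerSketch(String... rows) {
        storeHeap = new ArrayDeque<>(Arrays.asList(rows));
    }

    // next() and close() share one monitor, so close() cannot null the heap
    // between the null check and the poll.
    synchronized String next() {
        if (storeHeap == null) {
            throw new IllegalStateException("scanner is closed");
        }
        return storeHeap.poll();
    }

    synchronized void close() {
        storeHeap = null;
    }
}
```

A synchronized block on a private lock object inside each method would give the same mutual exclusion. In both forms a late next() gets a clear "closed" error instead of a NullPointerException.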


> Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() 
> may produce null pointer exception
> --
>
> Key: HBASE-12274
> URL: https://issues.apache.org/jira/browse/HBASE-12274
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12274-v2.txt, 12274-v2.txt, 12274-v3.txt
>
>
> I saw the following in region server log:
> {code}
> 2014-10-15 03:28:36,976 ERROR 
> [B.DefaultRpcServer.handler=0,queue=0,port=60020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5023)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4932)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4923)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3245)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is where the NPE happened:
> {code}
> // Let's see what we have in the storeHeap.
> KeyValue current = this.storeHeap.peek();
> {code}
> The cause was race between nextInternal(called through nextRaw) and close 
> methods.
> nextRaw() is not synchronized.





[jira] [Comment Edited] (HBASE-12274) Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() may produce null pointer exception

2014-10-15 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173292#comment-14173292
 ] 

Qiang Tian edited comment on HBASE-12274 at 10/16/14 3:27 AM:
--

Hi Ted,
is it worth finding out who closes the scanner (could it be closed because of 
a bug)? I ask because I have the same question about StoreScanner: it looks 
like only StoreScanner itself can call close.

Regarding synchronized, I did not test it myself. the cost of a synchronized 
method may be high, but from what I found by googling, a synchronized block is 
fine, even better than a lock. 
http://t.cn/R7zVKKB, http://t.cn/R7zVKK1






[jira] [Commented] (HBASE-12274) Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() may produce null pointer exception

2014-10-16 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174650#comment-14174650
 ] 

Qiang Tian commented on HBASE-12274:


Hi Ted,
I also ran mvn test with 0.98.6. I did not hit the scanner error, but I did 
get some other strange failure. the unit tests do not look very clean.

in the RS log, the lease failure does not look expected either.

{code}
org.apache.hadoop.hbase.regionserver.LeaseException: lease '8' does not exist
at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)
at org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3305)
{code}
it comes from a different rpc handler, just before the NPE. 
did we get a NotServingRegionException? do we have more of the log?
thanks.




> Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() 
> may produce null pointer exception
> --
>
> Key: HBASE-12274
> URL: https://issues.apache.org/jira/browse/HBASE-12274
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6.1
>Reporter: Ted Yu
>Assignee: Ted Yu
> Fix For: 2.0.0, 0.98.8, 0.99.2
>
> Attachments: 12274-region-server.log, 12274-v2.txt, 12274-v2.txt, 
> 12274-v3.txt
>
>
> I saw the following in region server log:
> {code}
> 2014-10-15 03:28:36,976 ERROR 
> [B.DefaultRpcServer.handler=0,queue=0,port=60020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5023)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4932)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:4923)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3245)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This is where the NPE happened:
> {code}
> // Let's see what we have in the storeHeap.
> KeyValue current = this.storeHeap.peek();
> {code}
> The cause was race between nextInternal(called through nextRaw) and close 
> methods.
> nextRaw() is not synchronized.





[jira] [Commented] (HBASE-12274) Race between RegionScannerImpl#nextInternal() and RegionScannerImpl#close() may produce null pointer exception

2014-10-16 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174674#comment-14174674
 ] 

Qiang Tian commented on HBASE-12274:


Hi Ted,
perhaps I misunderstood. sorry for that. please go ahead.
thanks.




[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-23 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181168#comment-14181168
 ] 

Qiang Tian commented on HBASE-11368:


initial YCSB test:

Env:
---
hadoop 2.2.0
YCSB 1.0.4(Andrew's branch)
3 nodes: 1 master, 2 RS  // cluster details omitted, since the goal is only to 
evaluate the new lock

Steps:
---
Followed Andrew's steps (see http://search-hadoop.com/m/DHED4hl7pC/)
the seed table has 3 CFs and is pre-split into 20 regions
load 1 million rows into CF 'f1' using workloada
run 3 iterations each of workloadc and workloada. the parameters for each 
run:
bq. -p columnfamily=f1 -p operationcount=100 -s -threads 10


Results:
---
0.98.5:
workload c:
[READ], AverageLatency(us), 496.225811
[READ], AverageLatency(us), 510.206831
[READ], AverageLatency(us), 501.256123

workload a:
[READ], AverageLatency(us), 676.4527555821747
[READ], AverageLatency(us), 622.5544771452717
[READ], AverageLatency(us), 628.1365657163067


0.98.5+patch:
workload c:
[READ], AverageLatency(us), 536.334437
[READ], AverageLatency(us), 508.40
[READ], AverageLatency(us), 491.416182


workload a:
[READ], AverageLatency(us), 640.3625218319231
[READ], AverageLatency(us), 642.9719823488798
[READ], AverageLatency(us), 631.7491770928287

the performance penalty looks small.

I also ran PE in the cluster. since the test table has only 1 CF, the new lock 
is not actually used. interestingly, with the patch the performance is even a 
bit better...
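
To put the YCSB numbers above side by side, here is a small sketch that averages the three READ AverageLatency(us) samples per run and computes the relative change. The class and method names are made up for illustration, and the sample 508.40 is used exactly as it appears above. By this arithmetic workload c goes from about 502.6 us to about 512.1 us (roughly +1.9%) and workload a from about 642.4 us to about 638.4 us (roughly -0.6%). With only three samples per run, differences of this size may well be noise:

```java
import java.util.Arrays;

// Averages the READ AverageLatency(us) samples listed in the results above.
class YcsbLatencySummary {
    static double mean(double... samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    // Relative change of `patched` against `baseline`, in percent.
    static double percentChange(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double baseC = mean(496.225811, 510.206831, 501.256123);
        double patchC = mean(536.334437, 508.40, 491.416182);
        double baseA = mean(676.4527555821747, 622.5544771452717, 628.1365657163067);
        double patchA = mean(640.3625218319231, 642.9719823488798, 631.7491770928287);
        System.out.printf("workload c: %.1f -> %.1f us (%+.1f%%)%n",
                baseC, patchC, percentChange(baseC, patchC));
        System.out.printf("workload a: %.1f -> %.1f us (%+.1f%%)%n",
                baseA, patchA, percentChange(baseA, patchA));
    }
}
```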



[jira] [Updated] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-24 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11368:
---
Attachment: performance_improvement_verification_98.5.patch

A simple comparison test using an updated TestHRegionServerBulkLoad.java. the 
numbers are just for reference. the real perf improvement may depend on a 
combination of factors, such as compaction time, bulkload time, scan/read 
workload type, request concurrency, etc.


98.5:
---
2014-10-24 02:30:03,399 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(345):   loaded 16
2014-10-24 02:30:03,399 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(346):   compations 16

2014-10-24 02:30:03,399 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(348): Scanners:
//average # with 50 scanners
2014-10-24 02:30:03,399 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(350):   scanned 73
2014-10-24 02:30:03,400 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(351):   verified 18000 rows


98.5+patch:
---
//since bulkload conflicts less with compaction, we get more 
bulkload/compaction requests in the fixed test cycle (5 minutes)
2014-10-24 02:41:19,071 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(344): Loaders:
2014-10-24 02:41:19,072 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(345):   loaded 43
2014-10-24 02:41:19,072 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(346):   compations 43

2014-10-24 02:41:19,073 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(348): Scanners:
//since bulkload conflicts less with scans, we get more scans in the fixed 
test cycle (5 minutes)
//average # for 50 scanners
2014-10-24 02:41:19,073 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(350):   scanned 92  
2014-10-24 02:41:19,073 INFO  [main] 
regionserver.TestHRegionServerBulkLoad(351):   verified 25000 rows



> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, 
> performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?





[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-24 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182647#comment-14182647
 ] 

Qiang Tian commented on HBASE-11368:


Hi [~stack], [~apurtell],
any comments?
thanks!




[jira] [Updated] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-25 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11368:
---
Attachment: key_stacktrace_hbase10882.TXT

Hi [~stack],
Sorry for the confusion. let me explain from scratch:
1) the root cause of the problem - HRegion#lock.
From the stacktrace in HBASE-10882 (also see the attached 
key_stacktrace_hbase10882.TXT), the event sequence is: 
1.1) the compaction acquires the readlock of HRegion#lock. 
1.2) the bulkload tries to acquire the writelock of HRegion#lock if there are 
multiple CFs. it has to wait for the compaction to release the readlock.
1.3) scanners try to acquire the readlock of HRegion#lock. they have to wait 
for the bulkload to get and release the writelock.
so both bulkload and scanners are blocked on HRegion#lock by the compaction.
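
The event sequence in #1 can be reproduced in miniature with a plain ReentrantReadWriteLock. This is only a sketch: the thread roles and names are hypothetical, and fair mode is an assumption made here so that the queueing order is deterministic. It is not a claim about how HRegion#lock is constructed:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RegionLockQueueSketch {
    // Returns true if a new reader ("scanner") could NOT get the read lock while
    // a writer ("bulkload") is queued behind an existing reader ("compaction").
    static boolean scannerBlockedBehindBulkLoad() {
        // Assumption: fair mode, chosen only to make the queueing deterministic.
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock(true);
        CountDownLatch compactionHoldsLock = new CountDownLatch(1);
        CountDownLatch compactionDone = new CountDownLatch(1);

        Thread compaction = new Thread(() -> {
            regionLock.readLock().lock();        // 1.1 compaction takes the readlock
            try {
                compactionHoldsLock.countDown();
                compactionDone.await();          // simulate a long compaction
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                regionLock.readLock().unlock();
            }
        });
        Thread bulkLoad = new Thread(() -> {
            regionLock.writeLock().lock();       // 1.2 bulkload waits for the writelock
            regionLock.writeLock().unlock();
        });

        try {
            compaction.start();
            compactionHoldsLock.await();
            bulkLoad.start();
            while (!regionLock.hasQueuedThread(bulkLoad)) {
                Thread.sleep(1);                 // wait until the writer is queued
            }
            // 1.3 a scanner asks for the readlock and times out behind the writer
            boolean acquired = regionLock.readLock().tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                regionLock.readLock().unlock();
            }
            compactionDone.countDown();
            compaction.join();
            bulkLoad.join();
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The scanner here is compatible with the compaction (both only need the read lock), yet it still stalls, because it queues behind the pending write request.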

2) what is HRegion#lock used for?
Investigation of HRegion#lock shows that it was originally designed to protect 
region close ONLY. if someone, such as a region split, wants to close the 
region, it needs to wait for the others to release the readlock.  
Then HBASE-4552 used the lock to solve the multi-CF bulkload consistency 
issue. now we see that it is too heavy.

3) can we avoid using HRegion#lock in bulkload?
the answer is yes. 
Internally, HStore#DefaultStoreFileManager#storefiles keeps track of the 
on-disk HFiles for a CF. the bulkload has the steps below:
3.1) move the HFiles directly into the region directory
3.2) add them to the {{storefiles}} list
3.3) notify StoreScanner that the HFile list has changed, which is done by 
resetting StoreScanner#heap to null. this forces existing StoreScanner 
instances to reinitialize based on the new HFiles seen on disk in the next 
scan/read request.
steps 3.2 and 3.3 are synchronized by HStore#lock, so we have CF-level 
scan-bulkload consistency.
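
Steps 3.2 and 3.3 can be modelled roughly as follows. This is a hypothetical single-CF model, not the HStore or StoreScanner implementation; the class, field and method names are assumptions that only loosely mirror the ones mentioned above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of one column family's store; not the HStore implementation.
class StoreSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storefiles = new ArrayList<>();
    private final List<StoreScannerSketch> observers = new CopyOnWriteArrayList<>();

    final class StoreScannerSketch {
        // null means "rebuild from storefiles on the next read"
        private volatile List<String> heap;

        void updateReaders() {
            heap = null;
        }

        List<String> filesForNextRead() {
            List<String> current = heap;
            if (current == null) {
                current = currentFiles();
                heap = current;
            }
            return current;
        }
    }

    StoreScannerSketch newScanner() {
        StoreScannerSketch scanner = new StoreScannerSketch();
        observers.add(scanner);
        return scanner;
    }

    // Steps 3.2 and 3.3 under one store-level write lock.
    void bulkLoadHFile(String movedFile) {
        lock.writeLock().lock();
        try {
            storefiles.add(movedFile);                      // 3.2 add to the list
            for (StoreScannerSketch scanner : observers) {
                scanner.updateReaders();                    // 3.3 reset scanner heaps
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    List<String> currentFiles() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(storefiles);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

A scanner created before the bulk load keeps its old view until its heap is reset, and then picks up the new file on its next read. This gives consistency for one CF only, which is the gap the region-level lock is meant to close.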
 
To achieve multi-CF scan-bulkload consistency without HRegion#lock, we still 
need another region-level lock. a RegionScanner is composed of multiple 
StoreScanners, and a StoreScanner (a CF scanner) is composed of a 
MemStoreScanner and multiple StoreFileScanners.

RegionScannerImpl#storeHeap (and joinedHeap) is just the entry point to the 
multiple StoreScanners. to have multi-CF consistency, we need synchronization 
here - a lock is needed, but it is used only between scan and bulkload.



Regarding the code change you referenced, 
performance_improvement_verification_98.5.patch simulates the event sequence 
described in #1 and is for testing purposes only.

currently I use 98.5 for testing since it is stable and makes it easy to 
evaluate the effect of the change.
thanks.
thanks.









> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, key_stacktrace_hbase10882.TXT, 
> performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-25 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184066#comment-14184066
 ] 

Qiang Tian commented on HBASE-11368:


the attachments:
{{key_stacktrace_hbase10882.TXT}} : the problem stacktrace
{{hbase-11368-0.98.5.patch}} : the fix
{{performance_improvement_verification_98.5.patch}}: the testcase to verify 
performance improvement






[jira] [Commented] (HBASE-12336) RegionServer failed to shutdown for NodeFailoverWorker thread

2014-10-27 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184939#comment-14184939
 ] 

Qiang Tian commented on HBASE-12336:


The ZooKeeper stack trace looks similar to ZOOKEEPER-2012, where 
ClientCnxn.submitRequest never returns:
http://pastebin.com/xU4MSq9k
Is there any ZooKeeper error message in the RS log?
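
As general background, and not a diagnosis of this issue: 
ExecutorService.shutdown() does not interrupt a task that is already running, 
so a task blocked in Object.wait() keeps its non-daemon pool thread alive, 
while shutdownNow() interrupts it. The sketch below shows only that JDK 
behavior. BlockedWorkerShutdownSketch and its monitor object are hypothetical 
names, and whether the real ZooKeeper call reacts to an interrupt is not 
covered here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not use HBase or ZooKeeper code.
class BlockedWorkerShutdownSketch {
    // Returns {terminated after shutdown(), terminated after shutdownNow()}.
    static boolean[] run() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Object monitor = new Object();
        CountDownLatch started = new CountDownLatch(1);
        executor.execute(() -> {
            synchronized (monitor) {
                started.countDown();
                try {
                    // Nobody ever notifies this monitor, so the wait only
                    // ends when the thread is interrupted.
                    while (true) {
                        monitor.wait();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        try {
            started.await();
            // shutdown() stops new submissions but leaves the running task alone.
            executor.shutdown();
            boolean afterShutdown = executor.awaitTermination(200, TimeUnit.MILLISECONDS);
            // shutdownNow() interrupts the worker, which ends the wait.
            executor.shutdownNow();
            boolean afterShutdownNow = executor.awaitTermination(5, TimeUnit.SECONDS);
            return new boolean[] {afterShutdown, afterShutdownNow};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
            return new boolean[] {false, false};
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("terminated after shutdown(): " + result[0]);
        System.out.println("terminated after shutdownNow(): " + result[1]);
    }
}
```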



> RegionServer failed to shutdown for NodeFailoverWorker thread
> -
>
> Key: HBASE-12336
> URL: https://issues.apache.org/jira/browse/HBASE-12336
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.11
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Attachments: stack
>
>
> After enabling hbase.zookeeper.useMulti in hbase cluster, we found that 
> regionserver failed to shutdown. Other threads have exited except a 
> NodeFailoverWorker thread.
> {code}
> "ReplicationExecutor-0" prio=10 tid=0x7f0d40195ad0 nid=0x73a in 
> Object.wait() [0x7f0dc8fe6000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
> - locked <0x0005a16df080> (a 
> org.apache.zookeeper.ClientCnxn$Packet)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:930)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:912)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1518)
> at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:804)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:612)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> It's sure that the shutdown method of the executor is called in  
> ReplicationSourceManager#join.
>  
> I am looking for the root cause and suggestions are welcomed. Thanks





[jira] [Updated] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-27 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11368:
---
Status: Patch Available  (was: Open)

> Multi-column family BulkLoad fails if compactions go on too long
> 
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
>  Issue Type: Bug
>Reporter: stack
>Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, hbase11368-master.patch, 
> key_stacktrace_hbase10882.TXT, performance_improvement_verification_98.5.patch
>
>
> Compactions take a read lock.  If a multi-column family region, before bulk 
> loading, we want to take a write lock on the region.  If the compaction takes 
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in 
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long?  Does the bulk load need 
> a full write lock when multiple column families?  Can we fail more gracefully 
> at least?





[jira] [Updated] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long

2014-10-27 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-11368:
---
Attachment: hbase11368-master.patch

patch for master branch



[jira] [Commented] (HBASE-12357) TestHCM#testClusterStatus is continuosly failing in jenkins

2014-10-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186563#comment-14186563
 ] 

Qiang Tian commented on HBASE-12357:


The failure appears to be here :-)
{quote}
Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 
multicast group
   at sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
   at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
   at 
io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
   at 
org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
   at 
org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
   at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
   at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
{quote}

{code}
private MembershipKey innerJoin(InetAddress group,
                                NetworkInterface interf,
                                InetAddress source)
    throws IOException
{
    if (!group.isMulticastAddress())
        throw new IllegalArgumentException("Group not a multicast address");

    // check multicast address is compatible with this socket
    if (group instanceof Inet4Address) {
        if (family == StandardProtocolFamily.INET6 &&
            !Net.canIPv6SocketJoinIPv4Group())
            throw new IllegalArgumentException(
                "IPv6 socket cannot join IPv4 multicast group");
    } else if (group instanceof Inet6Address) {
        if (family != StandardProtocolFamily.INET6)
            throw new IllegalArgumentException(
                "Only IPv6 sockets can join IPv6 multicast group");
    } else {
        throw new IllegalArgumentException("Address type not supported");
    }
{code}

It looks like {{StandardProtocolFamily.INET6}} needs to be specified for an IPv6 address?



> TestHCM#testClusterStatus is continuosly failing in jenkins
> ---
>
> Key: HBASE-12357
> URL: https://issues.apache.org/jira/browse/HBASE-12357
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.0
>Reporter: Ashish Singhi
>Assignee: Dima Spivak
>
> {noformat}Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 187.475 sec <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
> testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 
> 41.477 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected 
> but was
>   at junit.framework.Assert.fail(Assert.java:57)
>   at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537)
>   at 
> org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273)
> {noformat}





[jira] [Updated] (HBASE-12357) TestHCM#testClusterStatus is continuosly failing in jenkins

2014-10-28 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12357:
---
Description: 
{}


{noformat}Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
187.475 sec <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 41.477 
sec  <<< ERROR!
java.lang.Exception: Unexpected exception, 
expected but 
was
at junit.framework.Assert.fail(Assert.java:57)
at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537)
at 
org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273)
{noformat}

  was:
{noformat}Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
187.475 sec <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 41.477 
sec  <<< ERROR!
java.lang.Exception: Unexpected exception, 
expected but 
was
at junit.framework.Assert.fail(Assert.java:57)
at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537)
at 
org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273)
{noformat}


> TestHCM#testClusterStatus is continuosly failing in jenkins
> ---
>
> Key: HBASE-12357
> URL: https://issues.apache.org/jira/browse/HBASE-12357
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.0
>Reporter: Ashish Singhi
>Assignee: Dima Spivak
>
> {}
> {noformat}Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 187.475 sec <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
> testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 
> 41.477 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected 
> but was
>   at junit.framework.Assert.fail(Assert.java:57)
>   at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537)
>   at 
> org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273)
> {noformat}





[jira] [Updated] (HBASE-12357) TestHCM#testClusterStatus is continuosly failing in jenkins

2014-10-28 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12357:
---
Description: 
{noformat}
014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error starting 
cluster
java.lang.RuntimeException: Failed construction of Master: class 
org.apache.hadoop.hbase.master.HMaster
   at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
   at 
org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
   at 
org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
   at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
   at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
   at 
org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
   at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 
multicast group
   at sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
   at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
   at 
io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
   at 
org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
   at 
org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
   at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
   at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
   ... 26 more


Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 187.475 sec 
<<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 41.477 
sec  <<< ERROR!
java.lang.Exception: Unexpected exception, 
expected but 
was
at junit.framework.Assert.fail(Assert.java:57)
at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537)
at 
org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273)
{noformat}

  was:
{}


{noformat}Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
187.475 sec <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 41.477 
sec  <<< ERROR!
java.lang.Exception: Unexpected exception, 
expected but 
was
at junit.framework.Assert.fail(Assert.java:57)
  

[jira] [Commented] (HBASE-12357) TestHCM#testClusterStatus is continuosly failing in jenkins

2014-10-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186580#comment-14186580
 ] 

Qiang Tian commented on HBASE-12357:


Oops, the stack trace comes from Ashish's email.
The code is the JDK code that throws:
bq.  java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 multicast 
group


see the error here:
http://docs.oracle.com/javase/7/docs/api/java/nio/channels/MulticastChannel.html

We are hitting this check:
{code}
if (group instanceof Inet4Address) {
    if (family == StandardProtocolFamily.INET6 &&
        !Net.canIPv6SocketJoinIPv4Group())
        throw new IllegalArgumentException(
            "IPv6 socket cannot join IPv4 multicast group");
{code}

The parameter group is IPv4 and family is IPv6 (StandardProtocolFamily.INET6), 
but I do not see the family being set to IPv6 in ClusterStatusPublisher.java. 
Strange..
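
A minimal sketch of the idea, assuming plain java.net/java.nio types rather 
than the Netty channel used by ClusterStatusPublisher: choose the 
ProtocolFamily from the type of the multicast group address, so that an IPv4 
group is never paired with an INET6 socket. MulticastFamilySketch and its 
methods are hypothetical names, not HBase code, and the addresses in main are 
arbitrary multicast literals.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.ProtocolFamily;
import java.net.StandardProtocolFamily;
import java.net.UnknownHostException;

// Hypothetical helper; not part of HBase or the JDK.
class MulticastFamilySketch {
    // Picks the protocol family that matches the multicast group's address type.
    static ProtocolFamily familyFor(InetAddress group) {
        if (!group.isMulticastAddress()) {
            throw new IllegalArgumentException("Group not a multicast address");
        }
        if (group instanceof Inet4Address) {
            return StandardProtocolFamily.INET;
        }
        if (group instanceof Inet6Address) {
            return StandardProtocolFamily.INET6;
        }
        throw new IllegalArgumentException("Address type not supported");
    }

    // Convenience wrapper for IP literals; wraps the checked exception.
    static ProtocolFamily familyForLiteral(String literal) {
        try {
            return familyFor(InetAddress.getByName(literal));
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(familyForLiteral("226.1.1.3"));
        System.out.println(familyForLiteral("ff02::1"));
    }
}
```

With plain NIO, the returned family could then be passed to 
java.nio.channels.DatagramChannel.open(ProtocolFamily) before joining the 
group. How this would map onto the Netty bootstrap is not shown here.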




> TestHCM#testClusterStatus is continuosly failing in jenkins
> ---
>
> Key: HBASE-12357
> URL: https://issues.apache.org/jira/browse/HBASE-12357
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.0
>Reporter: Ashish Singhi
>Assignee: Dima Spivak
>
> {noformat}
> Tests run: 21, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 187.475 sec 
> <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHCM
> testClusterStatus(org.apache.hadoop.hbase.client.TestHCM)  Time elapsed: 
> 41.477 sec  <<< ERROR!
> java.lang.Exception: Unexpected exception, 
> expected 
> but was
>   at junit.framework.Assert.fail(Assert.java:57)
>   at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
>   at 
> org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3537)
>   at 
> org.apache.hadoop.hbase.client.TestHCM.testClusterStatus(TestHCM.java:273)
> {noformat}





[jira] [Commented] (HBASE-12357) TestHCM#testClusterStatus is continuosly failing in jenkins

2014-10-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186584#comment-14186584
 ] 

Qiang Tian commented on HBASE-12357:


[~ashish singhi],
Can you confirm whether it is the same problem?



[jira] [Commented] (HBASE-12357) TestHCM#testClusterStatus is continuosly failing in jenkins

2014-10-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186602#comment-14186602
 ] 

Qiang Tian commented on HBASE-12357:


Ah, sorry about that [~dimaspivak].
That looks like a bug too. I will open a new JIRA.





[jira] [Created] (HBASE-12359) TestHCM.java failed on windows

2014-10-28 Thread Qiang Tian (JIRA)
Qiang Tian created HBASE-12359:
--

 Summary: TestHCM.java failed on windows 
 Key: HBASE-12359
 URL: https://issues.apache.org/jira/browse/HBASE-12359
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.0.0
Reporter: Qiang Tian
Priority: Minor


see http://osdir.com/ml/general/2014-10/msg56689.html
{code}
014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error starting 
cluster
java.lang.RuntimeException: Failed construction of Master: class 
org.apache.hadoop.hbase.master.HMaster
   at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
   at 
org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
   at 
org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
   at 
org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
   at 
org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
   at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
   at 
org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
   at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 
multicast group
   at sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
   at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
   at 
io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
   at 
org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
   at 
org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
   at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
   at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
   ... 26 more
{code}

The exception comes from the JDK code below:
{code}
if (group instanceof Inet4Address) {
    if (family == StandardProtocolFamily.INET6 &&
        !Net.canIPv6SocketJoinIPv4Group())
        throw new IllegalArgumentException(
            "IPv6 socket cannot join IPv4 multicast group");
{code}


According to the documentation 
(http://docs.oracle.com/javase/7/docs/api/java/nio/channels/MulticastChannel.html):

{quote}
The multicast implementation is intended to map directly to the native 
multicasting facility. Consequently, the following items should be considered 
when developing an application that receives IP multicast datagrams:

The creation of the channel should specify the ProtocolFamily that 
corresponds

[jira] [Assigned] (HBASE-12359) TestHCM.java failed on windows

2014-10-28 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian reassigned HBASE-12359:
--

Assignee: Qiang Tian

> TestHCM.java failed on windows 
> ---
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
>
> see http://osdir.com/ml/general/2014-10/msg56689.html
> {code}
> 014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error 
> starting cluster
> java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMaster
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
>at 
> org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 
> multicast group
>at 
> sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
>at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
>at 
> io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
>at 
> org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
>at 
> org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
>at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
>at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
>... 26 more
> {code}
> the exception comes from below JDK code:
> {code}
> if (group instanceof Inet4Address) {
>     if (family == StandardProtocolFamily.INET6 &&
>         !Net.canIPv6SocketJoinIPv4Group())
>         throw new IllegalArgumentException(
>             "IPv6 socket cannot join IPv4 multicast group");
> {code}
> according to 
> document(

[jira] [Commented] (HBASE-12359) TestHCM.java failed on windows

2014-10-28 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186632#comment-14186632
 ] 

Qiang Tian commented on HBASE-12359:


We should support both the IPv4 and IPv6 protocol families, right?


> TestHCM.java failed on windows 
> ---
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
>
> see http://osdir.com/ml/general/2014-10/msg56689.html
> {code}
> 2014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error starting cluster
> java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
>     at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>     at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
>     at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
>     at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
>     at org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>     at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
>     at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
>     at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
>     at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
>     at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
>     at org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>     at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>     at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>     at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 multicast group
>     at sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
>     at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
>     at io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
>     at org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
>     at org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
>     at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>     at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
>     ... 26 more
> {code}
> the exception comes from below JDK code:
> {code}
> if (group instanceof Inet4Address) {
>     if (family == StandardProtocolFamily.INET6 && !Net.canIPv6SocketJoinIPv4Group())
>         throw new IllegalArgumentException("IPv6 soc

[jira] [Updated] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-02 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12359:
---
Summary: MulticastPublisher should specify IPv4/v6 protocol family when 
creating multicast channel  (was: TestHCM.java failed on windows )

> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
>

[jira] [Updated] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-02 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12359:
---
Attachment: hbase-12359-master.patch

Uploaded the patch for master.

> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>

[jira] [Updated] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-02 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12359:
---
Status: Patch Available  (was: Open)

> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>

[jira] [Commented] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-02 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194325#comment-14194325
 ] 

Qiang Tian commented on HBASE-12359:


Hi [~ashish singhi],
Could you try the patch in your Windows environment?


> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>

[jira] [Commented] (HBASE-12336) RegionServer failed to shutdown for NodeFailoverWorker thread

2014-11-02 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194334#comment-14194334
 ] 

Qiang Tian commented on HBASE-12336:


bq. Maybe we need to look at server.isStopped inside ReplicationSourceManager 
more often than we do?

[~stack], I think it is a ZooKeeper issue: the stack trace shows that the
ZooKeeper client thread is stuck
{code}
Object.wait() [0x7f0dc8fe6000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:485)
at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
{code}

so it cannot shut down. 

As described in ZOOKEEPER-2012, there is a timing hole between the ZooKeeper
client's 'main' thread and its 'main-SendThread' thread.
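
For reference, here is a minimal sketch of why calling shutdown on the executor is not enough on its own when a worker is parked in Object.wait() with nothing to notify it. This is plain java.util.concurrent code, not HBase or ZooKeeper code; the class and method names (StuckWorkerExample, run) are made up for illustration, and it says nothing about whether an interrupt would actually unblock ClientCnxn.submitRequest.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StuckWorkerExample {
    /**
     * Returns two flags: whether the executor terminated after shutdown(),
     * and whether it terminated after shutdownNow().
     */
    static boolean[] run() throws InterruptedException {
        final Object lock = new Object();
        final CountDownLatch started = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Hypothetical worker that waits for a notification that never arrives.
        executor.submit(() -> {
            synchronized (lock) {
                started.countDown();
                try {
                    while (true) {
                        lock.wait();
                    }
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and let the task end.
                    Thread.currentThread().interrupt();
                }
            }
        });
        started.await();

        // shutdown() stops new submissions but does not interrupt running tasks.
        executor.shutdown();
        boolean afterShutdown = executor.awaitTermination(200, TimeUnit.MILLISECONDS);

        // shutdownNow() interrupts the worker, which ends the wait in this sketch.
        executor.shutdownNow();
        boolean afterShutdownNow = executor.awaitTermination(5, TimeUnit.SECONDS);

        return new boolean[] { afterShutdown, afterShutdownNow };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] result = run();
        System.out.println("terminated after shutdown(): " + result[0]);
        System.out.println("terminated after shutdownNow(): " + result[1]);
    }
}
```

In this sketch the worker only exits because it reacts to the interrupt; a thread blocked inside a library call ends only if that call handles interruption or is eventually notified.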


> RegionServer failed to shutdown for NodeFailoverWorker thread
> -
>
> Key: HBASE-12336
> URL: https://issues.apache.org/jira/browse/HBASE-12336
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.11
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Fix For: 2.0.0, 0.94.26, 0.98.9, 0.99.2
>
> Attachments: HBASE-12336-trunk-v1.diff, stack
>
>
> After enabling hbase.zookeeper.useMulti in an HBase cluster, we found that the 
> regionserver failed to shut down. All other threads have exited except a 
> NodeFailoverWorker thread.
> {code}
> "ReplicationExecutor-0" prio=10 tid=0x7f0d40195ad0 nid=0x73a in Object.wait() [0x7f0dc8fe6000]
>    java.lang.Thread.State: WAITING (on object monitor)
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:485)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>     - locked <0x0005a16df080> (a org.apache.zookeeper.ClientCnxn$Packet)
>     at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:930)
>     at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:912)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1518)
>     at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:804)
>     at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:612)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> {code}
> The shutdown method of the executor is definitely called in 
> ReplicationSourceManager#join.
>  
> I am looking for the root cause, and suggestions are welcome. Thanks





[jira] [Commented] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-03 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194370#comment-14194370
 ] 

Qiang Tian commented on HBASE-12359:


Thanks [~ashish singhi].
I just noticed there are only 2 watchers on this issue! :-) Pinging [~stack].

Consider this passage from the documentation:
bq. The creation of the channel should specify the ProtocolFamily that 
corresponds to the address type of the multicast groups that the channel will 
join.

Given that, I think it would be better to put the new factory code into Netty...
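
To make the quoted passage concrete, here is a minimal sketch using only the plain JDK NIO API. It is not the attached patch and not Netty code; the class and method names (MulticastFamilyExample, familyFor, openFor) and the group address are illustrative assumptions. The idea is to pick the ProtocolFamily from the group's address type and pass it when opening the channel.

```java
import java.io.IOException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.ProtocolFamily;
import java.net.StandardProtocolFamily;
import java.nio.channels.DatagramChannel;

public class MulticastFamilyExample {
    /** Picks the protocol family that matches the multicast group's address type. */
    static ProtocolFamily familyFor(InetAddress group) {
        return (group instanceof Inet6Address)
            ? StandardProtocolFamily.INET6
            : StandardProtocolFamily.INET;
    }

    /** Opens a datagram channel whose protocol family matches the group. */
    static DatagramChannel openFor(InetAddress group) throws IOException {
        return DatagramChannel.open(familyFor(group));
    }

    public static void main(String[] args) throws IOException {
        // Illustrative IPv4 multicast address; an IP literal needs no DNS lookup.
        InetAddress group = InetAddress.getByName("226.1.1.3");
        try (DatagramChannel channel = openFor(group)) {
            System.out.println(familyFor(group) + " channel open: " + channel.isOpen());
        }
    }
}
```

With the family chosen this way, an IPv4 group is paired with an IPv4 socket, so the check in DatagramChannelImpl.innerJoin that throws "IPv6 socket cannot join IPv4 multicast group" should not be reached.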



> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>

[jira] [Commented] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-05 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199630#comment-14199630
 ] 

Qiang Tian commented on HBASE-12359:


Hi [~stack],
It runs fine on Linux.
Thanks.


> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>
> see http://osdir.com/ml/general/2014-10/msg56689.html
> {code}
> 014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error 
> starting cluster
> java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMaster
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
>at 
> org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 
> multicast group
>at 
> sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
>at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
>at 
> io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
>at 
> org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
>at 
> org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
>at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
>at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
>... 26 more
> {code}
> the exception comes from below JDK code:
> {code}
>  765   if (group instanceof Inet4Address) {
>  766   

[jira] [Commented] (HBASE-12336) RegionServer failed to shutdown for NodeFailoverWorker thread

2014-11-05 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199654#comment-14199654
 ] 

Qiang Tian commented on HBASE-12336:


Hi [~stack], 
As I understand it, ZOOKEEPER-2012 could apply to this issue as well.
The root of the problem is that ZK uses 2 queues for request handling. When a 
packet is on neither of the 2 queues, an exception in the send thread (in this 
case, possibly because the cluster restarted?) just ignores the packet, so the 
main thread never gets a response and hangs there. But we need more data as 
proof (so far the occurrence is rare).
Thanks.
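
To make the suspected hang concrete, here is a toy model of the scenario described above. It is not ZooKeeper's client code; the class, the queue names and the timed wait are assumptions made only for illustration. A packet that has left one queue but has not reached the other is missed by the error cleanup, so its waiter is never notified.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical toy model; not ZooKeeper's ClientCnxn.
final class DroppedPacketSketch {
    /** Stand-in for a request packet; the caller waits on it until finished is set. */
    static final class Packet {
        boolean finished;
    }

    /** Waits for the packet to finish; returns false if timeoutMillis elapses first. */
    static boolean awaitResponse(Packet p, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (p) {
            while (!p.finished) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // with an untimed wait() this caller would block forever
                }
                p.wait(remainingMillis);
            }
            return true;
        }
    }

    /** Error cleanup: wakes the waiters of every packet still sitting on the queue. */
    static void failQueued(Queue<Packet> queue) {
        Packet p;
        while ((p = queue.poll()) != null) {
            synchronized (p) {
                p.finished = true;
                p.notifyAll();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<Packet> outgoing = new ArrayDeque<>();
        Queue<Packet> pending = new ArrayDeque<>();
        Packet queued = new Packet();
        Packet inFlight = new Packet(); // taken off outgoing, not yet added to pending
        outgoing.add(queued);

        // The send thread hits an exception and cleans up both queues.
        failQueued(outgoing);
        failQueued(pending);

        System.out.println("queued packet answered: " + awaitResponse(queued, 100));
        System.out.println("in-flight packet answered: " + awaitResponse(inFlight, 100));
    }
}
```

The timed wait exists only so the sketch terminates; the thread dump in the issue shows an untimed Object.wait(), which is why the real thread never returns.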


> RegionServer failed to shutdown for NodeFailoverWorker thread
> -
>
> Key: HBASE-12336
> URL: https://issues.apache.org/jira/browse/HBASE-12336
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.11
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Fix For: 2.0.0, 0.94.26, 0.98.9, 0.99.2
>
> Attachments: HBASE-12336-trunk-v1.diff, stack
>
>
> After enabling hbase.zookeeper.useMulti in hbase cluster, we found that 
> regionserver failed to shutdown. Other threads have exited except a 
> NodeFailoverWorker thread.
> {code}
> "ReplicationExecutor-0" prio=10 tid=0x7f0d40195ad0 nid=0x73a in 
> Object.wait() [0x7f0dc8fe6000]
>java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at java.lang.Object.wait(Object.java:485)
> at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
> - locked <0x0005a16df080> (a 
> org.apache.zookeeper.ClientCnxn$Packet)
> at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:930)
> at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:912)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1518)
> at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:804)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:612)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> It is certain that the shutdown method of the executor is called in 
> ReplicationSourceManager#join.
>  
> I am looking for the root cause; suggestions are welcome. Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster

2014-11-10 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14205828#comment-14205828
 ] 

Qiang Tian commented on HBASE-12451:


A minimum split size is good and simple enough for me. Users can trade off 
between automatic tuning and customization based on knowledge of their 
workload (we often do not want to expose too many configuration parameters, 
but this one looks really useful in some cases :-)).

Basing it on the total region count looks hard to control: if the user 
pre-splits many regions, e.g. the 240 regions in 
http://search-hadoop.com/m/DHED4aS08G1, the size will be quite big unless 
"hbase.increasing.policy.initial.size" is also configured.



> IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits 
> in rolling update of cluster
> 
>
> Key: HBASE-12451
> URL: https://issues.apache.org/jira/browse/HBASE-12451
> Project: HBase
>  Issue Type: Bug
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently IncreasingToUpperBoundRegionSplitPolicy is the default region split 
> policy. In this policy, split size is the number of regions that are on this 
> server that all are of the same table, cubed, times 2x the region flush size.
> But when unloading regions of a regionserver in a cluster using 
> region_mover.rb, the number of regions on this server that belong to the 
> same table will decrease, and the split size will decrease too, which may 
> cause the remaining regions on the regionserver to split. Region splits also 
> happen when loading regions onto a regionserver in a cluster. 
> An improvement may be to set a minimum split size in 
> IncreasingToUpperBoundRegionSplitPolicy.
> Suggestions are welcome. Thanks~





[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster

2014-11-10 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206040#comment-14206040
 ] 

Qiang Tian commented on HBASE-12451:


Thanks Duo, I forgot that. 
With default values, it looks like the region split size will hit the upper 
limit once there are more than 3 regions.
According to http://hbase.apache.org/book/ops.capacity.html, region count and 
region size are the most important factors, but is there no clear answer for 
region count?
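
The "more than 3 regions" figure can be checked with a few lines of arithmetic. A minimal sketch, assuming a 128 MB flush size (so an initial size of 256 MB) and a 10 GB maximum file size; these values and the class name are assumptions for illustration, not read from any cluster configuration.

```java
// Hypothetical illustration class; mirrors the formula quoted in the issue description.
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    /** min(max file size, initialSize * regionCount^3), with the 0 and >100 special cases. */
    static long splitSize(long regionCount, long initialSize, long maxFileSize) {
        if (regionCount == 0 || regionCount > 100) {
            return maxFileSize;
        }
        return Math.min(maxFileSize, initialSize * regionCount * regionCount * regionCount);
    }

    public static void main(String[] args) {
        long initialSize = 2 * 128 * MB;    // assumed: 2 x a 128 MB flush size
        long maxFileSize = 10L * 1024 * MB; // assumed: 10 GB upper limit
        for (long count = 1; count <= 5; count++) {
            long sizeMb = splitSize(count, initialSize, maxFileSize) / MB;
            System.out.println(count + " region(s) -> " + sizeMb + " MB");
        }
    }
}
```

Under these assumptions the sizes are 256 MB, 2048 MB and 6912 MB for 1 to 3 regions, and the 10240 MB cap applies from 4 regions on.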

bq. If we already have 240 regions of a table, and there is only one region of 
this table on a regionserver, should the region have a small split size?

The regions should be evenly spread across the RSs (8 RSs in that case).











[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster

2014-11-14 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212007#comment-14212007
 ] 

Qiang Tian commented on HBASE-12451:


Shaohui boss, this doesn't look like a small change? :-)
Even if the average region count is used, the code below will use the max file 
size in most cases, e.g. when tableRegionsCount = 4 the value is 16384M, which 
is bigger than DEFAULT_MAX_FILE_SIZE.
{code}
return tableRegionsCount == 0 || tableRegionsCount > 100
    ? getDesiredMaxFileSize()
    : Math.min(getDesiredMaxFileSize(),
        this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);
{code}
Personally I like KISS: a simple configuration parameter can resolve this 
case. If we have many complaints about the existing region split policy, that 
is another story... my 2 cents.


> IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits 
> in rolling update of cluster
> 
>
> Key: HBASE-12451
> URL: https://issues.apache.org/jira/browse/HBASE-12451
> Project: HBase
>  Issue Type: Bug
>Reporter: Liu Shaohui
>Assignee: Liu Shaohui
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-12451-v1.diff
>
>
> Currently IncreasingToUpperBoundRegionSplitPolicy is the default region split 
> policy. In this policy, split size is the number of regions that are on this 
> server that all are of the same table, cubed, times 2x the region flush size.
> But when unloading regions of a regionserver in a cluster using 
> region_mover.rb, the number of regions on this server that belong to the 
> same table will decrease, and the split size will decrease too, which may 
> cause the remaining regions on the regionserver to split. Region splits also 
> happen when loading regions onto a regionserver in a cluster. 
> An improvement may be to set a minimum split size in 
> IncreasingToUpperBoundRegionSplitPolicy.
> Suggestions are welcome. Thanks~





[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster

2014-11-14 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212065#comment-14212065
 ] 

Qiang Tian commented on HBASE-12451:


bq. Add a minimum split size will break the first goal, right? 
It is configurable: we can set the default minimum size to 0, and the 
calculation will ignore the minimum size in that case (pseudocode):
{code}
long splitSize = tableRegionsCount == 0 || tableRegionsCount > 100
    ? getDesiredMaxFileSize()
    : Math.min(getDesiredMaxFileSize(),
        this.initialSize * tableRegionsCount * tableRegionsCount * tableRegionsCount);

// A minimum of 0 (the default) leaves the existing calculation unchanged.
long minimumSize = conf.getLong("hbase.hregion.split.minimum.size", 0);
return minimumSize > 0 ? Math.max(splitSize, minimumSize) : splitSize;
{code}
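
A self-contained version of the pseudocode above, as a rough sketch only. The 256 MB initial size, 10 GB maximum and 4 GB minimum are assumed example values, and the class and method names are hypothetical. It walks through the rolling-update case where the per-server region count of a table drops from 3 to 1.

```java
// Hypothetical illustration class; not the HBase split policy itself.
final class MinimumSplitSizeSketch {
    static final long MB = 1024L * 1024L;

    /** Existing formula: min(max file size, initialSize * regionCount^3). */
    static long splitSize(long regionCount, long initialSize, long maxFileSize) {
        if (regionCount == 0 || regionCount > 100) {
            return maxFileSize;
        }
        return Math.min(maxFileSize, initialSize * regionCount * regionCount * regionCount);
    }

    /** Applies the proposed floor; a minimum of 0 keeps the existing behavior. */
    static long withMinimum(long splitSize, long minimumSize) {
        return minimumSize > 0 ? Math.max(splitSize, minimumSize) : splitSize;
    }

    public static void main(String[] args) {
        long initialSize = 256 * MB;        // assumed example value
        long maxFileSize = 10L * 1024 * MB; // assumed example value
        long minimumSize = 4L * 1024 * MB;  // assumed example value for the new setting

        // Regions of the table on this server go 3 -> 2 -> 1 while it is being unloaded.
        for (long count = 3; count >= 1; count--) {
            long plain = splitSize(count, initialSize, maxFileSize);
            long floored = withMinimum(plain, minimumSize);
            System.out.println(count + " region(s): " + plain / MB + " MB without minimum, "
                + floored / MB + " MB with minimum");
        }
    }
}
```

With these numbers the unfloored threshold falls to 256 MB when one region is left, which is the situation where a normally sized region would be split; the floor keeps the threshold at 4096 MB.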

bq. And actually, there is a "hbase.increasing.policy.initial.size" 
configuration in IncreasingToUpperBoundRegionSplitPolicy which specify the 
initial(minimum) split size. 

Besides initial.size, tableRegionsCount is also a variable.

bq. And if you do not like the curve of split size, I think we should introduce 
a new split policy instead?
I do not mean that I do not like it. It looks to me that the case described in 
the jira is not a common case? So a simple fix can resolve it quickly. If 
there are other issues I am open to ideas :-)









[jira] [Commented] (HBASE-12451) IncreasingToUpperBoundRegionSplitPolicy may cause unnecessary region splits in rolling update of cluster

2014-11-14 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212154#comment-14212154
 ] 

Qiang Tian commented on HBASE-12451:



Had an offline discussion with Duo and got more details about the scenario. 
Now I understand the requirement. Thanks.







[jira] [Commented] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-14 Thread Qiang Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213383#comment-14213383
 ] 

Qiang Tian commented on HBASE-12359:


Hi Stack,
That is strange. The Jenkins run and the local run were both fine. My local 
run against the latest master branch on a Linux VM is also fine.
From the OpenJDK code (presumably similar to the Oracle JDK code), we hit the 
null below:

{code}
/**
 * Returns any IPv4 address of the given network interface, or
 * null if the interface does not have any IPv4 addresses.
 */
static Inet4Address anyInet4Address(final NetworkInterface interf) {
    return AccessController.doPrivileged(new PrivilegedAction<Inet4Address>() {
        public Inet4Address run() {
            Enumeration<InetAddress> addrs = interf.getInetAddresses();
            while (addrs.hasMoreElements()) {
                InetAddress addr = addrs.nextElement();
                if (addr instanceof Inet4Address) {
                    return (Inet4Address)addr;
                }
            }
            return null;
        }
    });
}
{code}


> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>
> see http://osdir.com/ml/general/2014-10/msg56689.html
> {code}
> 2014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error 
> starting cluster
> java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMaster
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
>at 
> org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>at 
> org.eclipse.jdt.i

[jira] [Updated] (HBASE-12359) MulticastPublisher should specify IPv4/v6 protocol family when creating multicast channel

2014-11-14 Thread Qiang Tian (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiang Tian updated HBASE-12359:
---
Status: Open  (was: Patch Available)

> MulticastPublisher should specify IPv4/v6 protocol family when creating 
> multicast channel
> -
>
> Key: HBASE-12359
> URL: https://issues.apache.org/jira/browse/HBASE-12359
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Qiang Tian
>Assignee: Qiang Tian
>Priority: Minor
> Attachments: hbase-12359-master.patch
>
>
> see http://osdir.com/ml/general/2014-10/msg56689.html
> {code}
> 2014-10-28 12:21:47,337 ERROR [main] hbase.MiniHBaseCluster(230): Error 
> starting cluster
> java.lang.RuntimeException: Failed construction of Master: class 
> org.apache.hadoop.hbase.master.HMaster
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:145)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:153)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:215)
>at 
> org.apache.hadoop.hbase.MiniHBaseCluster.<init>(MiniHBaseCluster.java:94)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:914)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:877)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:794)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:765)
>at 
> org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:752)
>at 
> org.apache.hadoop.hbase.client.TestHCM.setUpBeforeClass(TestHCM.java:138)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:601)
>at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>at 
> org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>at 
> org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>at 
> org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: java.lang.IllegalArgumentException: IPv6 socket cannot join IPv4 
> multicast group
>at 
> sun.nio.ch.DatagramChannelImpl.innerJoin(DatagramChannelImpl.java:779)
>at sun.nio.ch.DatagramChannelImpl.join(DatagramChannelImpl.java:865)
>at 
> io.netty.channel.socket.nio.NioDatagramChannel.joinGroup(NioDatagramChannel.java:394)
>at 
> org.apache.hadoop.hbase.master.ClusterStatusPublisher$MulticastPublisher.connect(ClusterStatusPublisher.java:273)
>at 
> org.apache.hadoop.hbase.master.ClusterStatusPublisher.<init>(ClusterStatusPublisher.java:121)
>at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:307)
>at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>at 
> org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:142)
>... 26 more
> {code}
> the exception comes from below JDK code:
> {code}
>  765   if (group instanceof Inet4Address) {
>  766   if (family == StandardProtocolFamily.INET6 && 
> !Net.
