[jira] [Created] (HBASE-27035) failed to set file permission when node crash
lujie created HBASE-27035:

Summary: failed to set file permission when node crash
Key: HBASE-27035
URL: https://issues.apache.org/jira/browse/HBASE-27035
Project: HBase
Issue Type: Bug
Reporter: lujie

In SecureBulkLoadManager#secureBulkLoadHFiles, we have code like this:

{code:java}
for (Pair<byte[], String> el : familyPaths) {
  Path stageFamily = new Path(bulkToken, Bytes.toString(el.getFirst()));
  if (!fs.exists(stageFamily)) {
    fs.mkdirs(stageFamily);
    fs.setPermission(stageFamily, PERM_ALL_ACCESS);
  }
}
{code}

If the process crashes after mkdirs but before setPermission, then after a restart the directory already exists, the branch is skipped, and the permission is never set. We should restructure this code like SnapshotScannerHDFSAclHelper#setCommonDirectoryPermission, which sets the permission unconditionally:

{code:java}
for (Path path : paths) {
  createDirIfNotExist(path);
  fs.setPermission(path, new FsPermission(
    conf.get(COMMON_DIRECTORY_PERMISSION, COMMON_DIRECTORY_PERMISSION_DEFAULT)));
}
{code}

-- This message was sent by Atlassian Jira (v8.20.7#820007)
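The crash window described above can be sketched without a Hadoop dependency. This is a minimal illustration, not the HBase patch: `FakeFs` is a hypothetical in-memory stand-in for Hadoop's `FileSystem`, the permission strings are simplified, and `ensureStagingDir` is an assumed helper name. The point it shows is that the permission call sits outside the existence check, so a restart after a crash between `mkdirs` and `setPermission` still repairs the directory.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a file system; not the Hadoop FileSystem API.
final class FakeFs {
    // path -> permission string; a directory exists if it has an entry
    private final Map<String, String> dirs = new HashMap<>();

    boolean exists(String path) {
        return dirs.containsKey(path);
    }

    // Creates the directory with a restrictive default, as if a umask applied.
    void mkdirs(String path) {
        dirs.putIfAbsent(path, "rwx------");
    }

    void setPermission(String path, String perm) {
        if (!dirs.containsKey(path)) {
            throw new IllegalStateException("no such directory: " + path);
        }
        dirs.put(path, perm);
    }

    String getPermission(String path) {
        return dirs.get(path);
    }
}

final class StagingDirSketch {
    static final String PERM_ALL_ACCESS = "rwxrwxrwx";

    // Idempotent: create only if missing, but always (re)apply the permission.
    static void ensureStagingDir(FakeFs fs, String path) {
        if (!fs.exists(path)) {
            fs.mkdirs(path);
        }
        fs.setPermission(path, PERM_ALL_ACCESS);
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs();
        // Simulate a crash after mkdirs but before setPermission.
        fs.mkdirs("/staging/cf1");
        // After the restart the directory already exists; the permission is still applied.
        ensureStagingDir(fs, "/staging/cf1");
        System.out.println(fs.getPermission("/staging/cf1"));
    }
}
```

Calling `ensureStagingDir` repeatedly is harmless, which is what makes the recovery path safe.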
[jira] [Created] (HBASE-25877) Add access check for switchCompaction
lujie created HBASE-25877: - Summary: Add access check for switchCompaction Key: HBASE-25877 URL: https://issues.apache.org/jira/browse/HBASE-25877 Project: HBase Issue Type: Bug Reporter: lujie Should we add an access check for org.apache.hadoop.hbase.regionserver.CompactSplit.switchCompaction?
[jira] [Created] (HBASE-25558) Adding audit log for execMasterService
lujie created HBASE-25558: - Summary: Adding audit log for execMasterService Key: HBASE-25558 URL: https://issues.apache.org/jira/browse/HBASE-25558 Project: HBase Issue Type: Bug Reporter: lujie
[jira] [Resolved] (HBASE-25422) update_all_config should not be executed by non-admin user!!!
[ https://issues.apache.org/jira/browse/HBASE-25422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie resolved HBASE-25422. --- Resolution: Duplicate > update_all_config should not be executed by non-admin user!!! > - > > Key: HBASE-25422 > URL: https://issues.apache.org/jira/browse/HBASE-25422 > Project: HBase > Issue Type: Bug > Reporter: lujie > Priority: Critical > Attachments: image-2020-12-20-12-50-23-433.png > > > !image-2020-12-20-12-50-23-433.png!
[jira] [Created] (HBASE-25456) setRegionStateInMeta need security check
lujie created HBASE-25456: - Summary: setRegionStateInMeta need security check Key: HBASE-25456 URL: https://issues.apache.org/jira/browse/HBASE-25456 Project: HBase Issue Type: Bug Reporter: lujie Assignee: lujie
[jira] [Created] (HBASE-25441) Unauthorized client can shutdown the regionserver
lujie created HBASE-25441: - Summary: Unauthorized client can shutdown the regionserver Key: HBASE-25441 URL: https://issues.apache.org/jira/browse/HBASE-25441 Project: HBase Issue Type: Bug Reporter: lujie
[jira] [Reopened] (HBASE-25432) we should add security checks for setTableStateInMeta
[ https://issues.apache.org/jira/browse/HBASE-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie reopened HBASE-25432: --- > we should add security checks for setTableStateInMeta > - > > Key: HBASE-25432 > URL: https://issues.apache.org/jira/browse/HBASE-25432 > Project: HBase > Issue Type: Bug >Reporter: lujie >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25432) we should add security checks for list_namespace_tables
[ https://issues.apache.org/jira/browse/HBASE-25432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie resolved HBASE-25432. --- Resolution: Not A Problem > we should add security checks for list_namespace_tables > --- > > Key: HBASE-25432 > URL: https://issues.apache.org/jira/browse/HBASE-25432 > Project: HBase > Issue Type: Bug >Reporter: lujie >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25432) we should add missing security checks for list_namespace_tables and listTableDescriptorsByNamespace
lujie created HBASE-25432: - Summary: we should add missing security checks for list_namespace_tables and listTableDescriptorsByNamespace Key: HBASE-25432 URL: https://issues.apache.org/jira/browse/HBASE-25432 Project: HBase Issue Type: Bug Reporter: lujie
[jira] [Created] (HBASE-25422) update_all_config can be executed by non-admin user
lujie created HBASE-25422: - Summary: update_all_config can be executed by non-admin user Key: HBASE-25422 URL: https://issues.apache.org/jira/browse/HBASE-25422 Project: HBase Issue Type: Bug Reporter: lujie Attachments: image-2020-12-20-12-50-23-433.png !image-2020-12-20-12-50-23-433.png!
[jira] [Created] (HBASE-25407) list_regions make potential sensitive information disclosure
lujie created HBASE-25407: - Summary: list_regions make potential sensitive information disclosure Key: HBASE-25407 URL: https://issues.apache.org/jira/browse/HBASE-25407 Project: HBase Issue Type: Bug Reporter: lujie Attachments: image-2020-12-18-13-00-20-126.png I found that I can read other users' region information, which is not expected. For example, I create a table as sysadmin, and then I can read its region information as user1. !image-2020-12-18-13-00-20-126.png! list_regions was introduced by https://issues.apache.org/jira/browse/HBASE-14925
[jira] [Resolved] (HBASE-25332) one NPE
[ https://issues.apache.org/jira/browse/HBASE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie resolved HBASE-25332.
Resolution: Fixed

> one NPE
>
> Key: HBASE-25332
> URL: https://issues.apache.org/jira/browse/HBASE-25332
> Project: HBase
> Issue Type: Bug
> Components: Zookeeper
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
> * getData can return null at
> [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L615]
> or
> [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L619]
> All of its callers have a null check except the one at
> [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java#L467]
> We should add a null check there to avoid a potential NPE.
[jira] [Reopened] (HBASE-25332) one NPE
[ https://issues.apache.org/jira/browse/HBASE-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie reopened HBASE-25332:

> one NPE
>
> Key: HBASE-25332
> URL: https://issues.apache.org/jira/browse/HBASE-25332
> Project: HBase
> Issue Type: Bug
> Components: Zookeeper
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.2.7, 2.3.4
>
> * getData can return null at
> [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L615]
> or
> [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-zookeeper/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L619]
> All of its callers have a null check except the one at
> [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-server/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupInfoManagerImpl.java#L467]
> We should add a null check there to avoid a potential NPE.
[jira] [Created] (HBASE-25332) One pontential NPE
lujie created HBASE-25332: - Summary: One pontential NPE Key: HBASE-25332 URL: https://issues.apache.org/jira/browse/HBASE-25332 Project: HBase Issue Type: Bug Reporter: lujie peek can return null at [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java#L108] All of its callers have a null check except the one at [https://github.com/apache/hbase/blob/1726160839368df14602da1618e3538955b25f74/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ReversedKeyValueHeap.java#L110] We should add a null check there to avoid a potential NPE.
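The pattern reported above is a peek-style accessor that returns null when the container is empty, with one caller that dereferences the result unguarded. The sketch below uses `java.util.PriorityQueue`, whose `peek()` returns `null` on an empty queue, as a stand-in for `KeyValueHeap`. The class name, method names, and the `-1` sentinel are hypothetical; this is not the HBase code or its fix.

```java
import java.util.PriorityQueue;

final class PeekNullCheckSketch {
    // Unsafe variant: throws NullPointerException when the queue is empty.
    static int headLengthUnsafe(PriorityQueue<String> heap) {
        return heap.peek().length();
    }

    // Guarded variant: treat an empty heap as "no current element".
    static int headLengthSafe(PriorityQueue<String> heap) {
        String head = heap.peek();   // may be null
        if (head == null) {
            return -1;               // sentinel chosen for this sketch only
        }
        return head.length();
    }

    public static void main(String[] args) {
        PriorityQueue<String> heap = new PriorityQueue<>();
        System.out.println(headLengthSafe(heap));
        heap.add("row1");
        System.out.println(headLengthSafe(heap));
    }
}
```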
[jira] [Resolved] (HBASE-25023) NPE while shutdown master node
[ https://issues.apache.org/jira/browse/HBASE-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujie resolved HBASE-25023.
Fix Version/s: 2.2.6
Resolution: Fixed

> NPE while shutdown master node
>
> Key: HBASE-25023
> URL: https://issues.apache.org/jira/browse/HBASE-25023
> Project: HBase
> Issue Type: Bug
> Reporter: lujie
> Assignee: Junhong Xu
> Priority: Major
> Fix For: 2.2.6
>
> While shutting down the master node, we see this exception:
> {code:java}
> 2020-09-14 06:48:29,530 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=111, ppid=64, state=RUNNABLE, locked=true; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:276)
>   at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
>   at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
> {code}
[jira] [Created] (HBASE-25023) NPE while shutdown master node
lujie created HBASE-25023:

Summary: NPE while shutdown master node
Key: HBASE-25023
URL: https://issues.apache.org/jira/browse/HBASE-25023
Project: HBase
Issue Type: Bug
Reporter: lujie

While shutting down the master node, we see this exception:

{code:java}
2020-09-14 06:48:29,530 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception: pid=111, ppid=64, state=RUNNABLE, locked=true; org.apache.hadoop.hbase.master.assignment.OpenRegionProcedure
java.lang.NullPointerException
  at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:276)
  at org.apache.hadoop.hbase.master.assignment.RegionRemoteProcedureBase.execute(RegionRemoteProcedureBase.java:58)
  at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:962)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1648)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1395)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:78)
  at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1965)
{code}
[jira] [Created] (HBASE-24976) REST Server failes to start without any error message
lujie created HBASE-24976: - Summary: REST Server failes to start without any error message Key: HBASE-24976 URL: https://issues.apache.org/jira/browse/HBASE-24976 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.2.1 Reporter: lujie
[jira] [Created] (HBASE-22050) NPE happens while RS shutdown, due to atomic violation
lujie created HBASE-22050:

Summary: NPE happens while RS shutdown, due to atomic violation
Key: HBASE-22050
URL: https://issues.apache.org/jira/browse/HBASE-22050
Project: HBase
Issue Type: Bug
Reporter: lujie

While the RS is shutting down, RS#abort is called because of:

{code:java}
handler.AssignRegionHandler: Fatal error occured while opening region hbase:meta,,1.1588230740, aborting...
{code}

In abort we have:

{code:java}
2428 if (rssStub != null && this.serverName != null) {
2429   ReportRSFatalErrorRequest.Builder builder =
2430     ReportRSFatalErrorRequest.newBuilder();
2431   builder.setServer(ProtobufUtil.toServerName(this.serverName));
2432   builder.setErrorMessage(msg);
2433   rssStub.reportRSFatalError(null, builder.build());
2434 }
{code}

Lines 2428-2434 are assumed to be atomic, but while one thread is executing lines 2429-2433, RS#run can concurrently execute:

{code:java}
1149 // Make sure the proxy is down.
1150 if (this.rssStub != null) {
1151   this.rssStub = null;
1152 }
{code}

rssStub then becomes null between the check on line 2428 and the call on line 2433, and the NPE happens:

{code:java}
2019-03-14 04:49:53,016 WARN [RS_CLOSE_META-regionserver/hadoop12:16020-0] regionserver.HRegionServer: Unable to report fatal error to master
java.lang.NullPointerException
  at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:2433)
  at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.handleException(AssignRegionHandler.java:154)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:106)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
{code}

I think we should avoid the NPE.
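The race above is a check-then-act on a shared field: the field is tested once and read again later, and another thread can null it in between. One common remedy, sketched here under assumptions, is to copy the field into a local variable once and then check and use only the local. `Stub`, `reportSafe`, and `clearStub` are hypothetical names, the field is assumed to be `volatile`, and this is an illustration of the technique rather than the change made in HBase.

```java
final class StubRaceSketch {
    // Hypothetical stand-in for the RPC stub held by the region server.
    interface Stub {
        String report(String msg);
    }

    // volatile so that the reporting thread sees the write made by the shutdown thread
    private volatile Stub stub = msg -> "reported: " + msg;

    // Racy shape: the field is read twice and may become null between the reads.
    String reportRacy(String msg) {
        if (stub != null) {
            return stub.report(msg);   // second read; can throw NullPointerException
        }
        return "skipped";
    }

    // Safer shape: read the field once into a local, then check and use the local.
    String reportSafe(String msg) {
        Stub local = stub;
        if (local != null) {
            return local.report(msg);
        }
        return "skipped";
    }

    // Called by the shutdown path.
    void clearStub() {
        stub = null;
    }

    public static void main(String[] args) {
        StubRaceSketch rs = new StubRaceSketch();
        System.out.println(rs.reportSafe("fatal"));
        rs.clearStub();
        System.out.println(rs.reportSafe("fatal"));
    }
}
```

The local copy removes the NPE, but the call can still reach a stub that is being torn down, so the caller should keep handling the resulting exceptions.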
[jira] [Created] (HBASE-22041) Master stuck in startup and print "FailedServerException" forever
lujie created HBASE-22041:

Summary: Master stuck in startup and print "FailedServerException" forever
Key: HBASE-22041
URL: https://issues.apache.org/jira/browse/HBASE-22041
Project: HBase
Issue Type: Bug
Reporter: lujie
Attachments: fixedlogs.zip

While the master was doing a fresh boot, we shut down the RS that held meta. The master startup then fails and prints thousands of log lines like:

{code:java}
2019-03-13 01:09:54,896 WARN [RSProcedureDispatcher-pool4-t1] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to java.net.ConnectException: Call to hadoop14/172.16.1.131:16020 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: hadoop14/172.16.1.131:16020, try=0, retrying...
2019-03-13 01:09:55,004 WARN [RSProcedureDispatcher-pool4-t2] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=1, retrying...
2019-03-13 01:09:55,114 WARN [RSProcedureDispatcher-pool4-t3] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=2, retrying...
2019-03-13 01:09:55,219 WARN [RSProcedureDispatcher-pool4-t4] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=3, retrying...
2019-03-13 01:09:55,324 WARN [RSProcedureDispatcher-pool4-t5] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=4, retrying...
2019-03-13 01:09:55,428 WARN [RSProcedureDispatcher-pool4-t6] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=5, retrying...
2019-03-13 01:09:55,533 WARN [RSProcedureDispatcher-pool4-t7] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=6, retrying...
2019-03-13 01:09:55,638 WARN [RSProcedureDispatcher-pool4-t8] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=7, retrying...
2019-03-13 01:09:55,755 WARN [RSProcedureDispatcher-pool4-t9] procedure.RSProcedureDispatcher: request to server hadoop14,16020,1552410583724 failed due to org.apache.hadoop.hbase.ipc.FailedServerException: Call to hadoop14/172.16.1.131:16020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: hadoop14/172.16.1.131:16020, try=8, retrying...
{code}
[jira] [Created] (HBASE-22023) similar to HBASE-21740: NPE happens while shutdown the RS
lujie created HBASE-22023:

Summary: similar to HBASE-21740: NPE happens while shutdown the RS
Key: HBASE-22023
URL: https://issues.apache.org/jira/browse/HBASE-22023
Project: HBase
Issue Type: Bug
Reporter: lujie
Assignee: lujie

The shutdown command arrives before startServices runs:

{code:java}
if (!isStopped() && !isAborted()) {
  initializeThreads();
}
{code}

so initializeThreads is skipped and leases stays null. leases is then used at line 1996 without a null check, hence the NPE. I will provide a simple fix.

{code:java}
2019-03-10 14:17:12,690 ERROR [regionserver/hadoop15:16020] regionserver.HRegionServer: Failed init
java.lang.NullPointerException
  at org.apache.hadoop.hbase.regionserver.HRegionServer.startServices(HRegionServer.java:1996)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1575)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:976)
  at java.lang.Thread.run(Thread.java:745)
2019-03-10 14:17:12,719 ERROR [regionserver/hadoop15:16020] regionserver.HRegionServer: * ABORTING region server hadoop15,16020,1552198622594: Unhandled: Region server startup failed *
java.io.IOException: Region server startup failed
  at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:3398)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1594)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:976)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hbase.regionserver.HRegionServer.startServices(HRegionServer.java:1996)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1575)
  ... 2 more
{code}
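The ordering problem above can be reproduced in a few lines: a stop request arrives first, initialization is skipped, and a later start step dereferences a component that was never created. The sketch below shows one possible guard, under assumptions. `Leases` is a hypothetical stand-in that only records whether it was started, `initializeIfRunning` and `startServices` are illustrative names, and the actual fix in HBase may differ.

```java
final class StartupGuardSketch {
    // Hypothetical stand-in for the lease manager; only records that it was started.
    static final class Leases {
        boolean started;

        void start() {
            started = true;
        }
    }

    private volatile boolean stopped;
    private Leases leases;   // stays null if initialization is skipped

    void stop() {
        stopped = true;
    }

    // Mirrors the quoted guard: initialization is skipped once a stop was requested.
    void initializeIfRunning() {
        if (!stopped) {
            leases = new Leases();
        }
    }

    // Returns false instead of throwing when the component was never created.
    boolean startServices() {
        Leases local = leases;
        if (local == null) {
            return false;
        }
        local.start();
        return true;
    }

    public static void main(String[] args) {
        StartupGuardSketch rs = new StartupGuardSketch();
        rs.stop();                  // shutdown arrives first
        rs.initializeIfRunning();   // skipped, so leases stays null
        System.out.println(rs.startServices());
    }
}
```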
[jira] [Created] (HBASE-22017) Failed to become active master due to lease 'XXX' does not exist
lujie created HBASE-22017: - Summary: Failed to become active master due to lease 'XXX' does not exist Key: HBASE-22017 URL: https://issues.apache.org/jira/browse/HBASE-22017 Project: HBase Issue Type: Bug Reporter: lujie {code:java} 2019-03-06 01:36:17,040 ERROR [master/hadoop11:16000:becomeActiveMaster] master.HMaster: * ABORTING master hadoop11,16000,1551807353275: Unhandled exception. Starting shutdown. * org.apache.hadoop.hbase.regionserver.LeaseException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '3449673378019934209' does not exist at org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:224) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3434) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42002) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.instantiateException(RemoteWithExtrasException.java:100) at org.apache.hadoop.hbase.ipc.RemoteWithExtrasException.unwrapRemoteException(RemoteWithExtrasException.java:90) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.makeIOExceptionOfException(ProtobufUtil.java:361) at org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil.handleRemoteException(ProtobufUtil.java:349) at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:344) 
at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:242) at org.apache.hadoop.hbase.client.ScannerCallable.rpcCall(ScannerCallable.java:58) at org.apache.hadoop.hbase.client.RegionServerCallable.call(RegionServerCallable.java:127) at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:387) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:361) at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107) at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-21740) NPE happens while shutdown the RS
lujie created HBASE-21740:

Summary: NPE happens while shutdown the RS
Key: HBASE-21740
URL: https://issues.apache.org/jira/browse/HBASE-21740
Project: HBase
Issue Type: Bug
Reporter: lujie

While shutting down a NM, we hit this NPE:

{code:java}
2019-01-18 16:52:05,500 INFO [Thread-4] regionserver.HRegionServer: STOPPED: Shutdown hook
2019-01-18 16:52:05,896 INFO [regionserver/hadoop15:16020] regionserver.MetricsRegionServerWrapperImpl: Computing regionserver metrics every 5000 milliseconds
2019-01-18 16:52:05,978 INFO [regionserver/hadoop15:16020.Chore.1] hbase.ScheduledChore: Chore: CompactedHFilesCleaner was stopped
2019-01-18 16:52:05,996 ERROR [regionserver/hadoop15:16020] regionserver.HRegionServer: Failed init
java.lang.NullPointerException
  at org.apache.hadoop.hbase.regionserver.HRegionServer.startServices(HRegionServer.java:1978)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1572)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:975)
  at java.lang.Thread.run(Thread.java:745)
2019-01-18 16:52:06,011 ERROR [regionserver/hadoop15:16020] regionserver.HRegionServer: * ABORTING region server hadoop15,16020,1547801516426: Unhandled: Region server startup failed *
java.io.IOException: Region server startup failed
  at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:3392)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1591)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:975)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.hbase.regionserver.HRegionServer.startServices(HRegionServer.java:1978)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1572)
  ... 2 more
{code}
[jira] [Created] (HBASE-20420) Fix Some Potential NPE
lujie created HBASE-20420: - Summary: Fix Some Potential NPE Key: HBASE-20420 URL: https://issues.apache.org/jira/browse/HBASE-20420 Project: HBase Issue Type: Bug Affects Versions: 2.0.0-beta-2 Reporter: lujie Attachments: hbase-20420.patch We used the tool [NPEDetector|https://github.com/lujiefsi/NPEDetector] to find another six problems similar to HBASE-20419. They are listed here, and a patch is attached: CommonFSUtils#listStatus, RSGroupInfoManagerImpl#getRSGroupOfServer, BackupSystemTable#readBackupInfo, SnapshotManifest#getRegionManifestsMap, HRegionFileSystem#getFamilies, Result#getFamilyMap.
[jira] [Created] (HBASE-20419) Two Potential NPE
lujie created HBASE-20419:

Summary: Two Potential NPE
Key: HBASE-20419
URL: https://issues.apache.org/jira/browse/HBASE-20419
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0-beta-2
Reporter: lujie
Attachments: HBASE-20419_1.patch

The callee ZKUtil#listChildrenAndWatchForNewChildren may return null. It has 8 callers; 6 of them have a null check like:

{code:java}
List<String> children = ZKUtil.listChildrenAndWatchForNewChildren(zkw, zkw.znodePaths.rsZNode);
if (children == null) {
  return Collections.emptyList();
}
{code}

but the other two callers do not: RSGroupInfoManagerImpl#retrieveGroupListFromZookeeper and ZKProcedureMemberRpcs#watchForAbortedProcedures. We attach a patch to fix this problem. (We found this bug with the tool [NPEDetector|https://github.com/lujiefsi/NPEDetector].)
[jira] [Created] (HBASE-19004) master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected
lujie created HBASE-19004:

Summary: master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected
Key: HBASE-19004
URL: https://issues.apache.org/jira/browse/HBASE-19004
Project: HBase
Issue Type: Bug
Reporter: lujie

When we send the stop regionserver command, the master logs the following:

{code:java}
2017-10-13 16:28:28,366 INFO [ProcedureExecutor-1] zookeeper.ZKTableStateManager: Moving table TestTable state from null to ENABLING
2017-10-13 16:28:28,387 INFO [ProcedureExecutor-1] master.AssignmentManager: Bulk assigning 1 region(s) across 3 server(s), round-robin=true
2017-10-13 16:28:28,388 INFO [hadoop11,16000,1507883241250-GeneralBulkAssigner-0] master.AssignmentManager: Assigning 1 region(s) to hadoop11,16020,1507883241942
2017-10-13 16:28:28,394 INFO [hadoop11,16000,1507883241250-GeneralBulkAssigner-0] master.RegionStates: Transition {2aaaf8304f2b09288f528ac0f105cc01 state=OFFLINE, ts=1507883308388, server=null} to {2aaaf8304f2b09288f528ac0f105cc01 state=PENDING_OPEN, ts=1507883308394, server=hadoop11,16020,1507883241942}
2017-10-13 16:28:28,585 INFO [AM.ZK.Worker-pool2-t10] master.RegionStates: Transition {2aaaf8304f2b09288f528ac0f105cc01 state=PENDING_OPEN, ts=1507883308394, server=hadoop11,16020,1507883241942} to {2aaaf8304f2b09288f528ac0f105cc01 state=OPENING, ts=1507883308585, server=hadoop11,16020,1507883241942}
2017-10-13 16:28:29,163 INFO [AM.ZK.Worker-pool2-t11] master.RegionStates: Transition {2aaaf8304f2b09288f528ac0f105cc01 state=OPENING, ts=1507883308585, server=hadoop11,16020,1507883241942} to {2aaaf8304f2b09288f528ac0f105cc01 state=OPEN, ts=1507883309163, server=hadoop11,16020,1507883241942}
2017-10-13 16:28:36,517 INFO [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [hadoop11,16020,1507883241942]
2017-10-13 16:28:37,428 INFO [ProcedureExecutor-2] procedure.ServerCrashProcedure: Start processing crashed hadoop11,16020,1507883241942
2017-10-13 16:28:37,689 INFO [ProcedureExecutor-4] master.SplitLogManager: dead splitlog workers [hadoop11,16020,1507883241942]
2017-10-13 16:28:37,693 INFO [ProcedureExecutor-4] master.SplitLogManager: hdfs://hadoop11:29000/hbase/WALs/hadoop11,16020,1507883241942-splitting is empty dir, no logs to split
2017-10-13 16:28:37,695 INFO [ProcedureExecutor-4] master.SplitLogManager: Started splitting 0 logs in [hdfs://hadoop11:29000/hbase/WALs/hadoop11,16020,1507883241942-splitting] for [hadoop11,16020,1507883241942]
2017-10-13 16:28:37,701 INFO [ProcedureExecutor-4] master.SplitLogManager: finished splitting (more than or equal to) 0 bytes in 0 log files in [hdfs://hadoop11:29000/hbase/WALs/hadoop11,16020,1507883241942-splitting] in 6ms
2017-10-13 16:28:37,807 WARN [ProcedureExecutor-4] master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected {2aaaf8304f2b09288f528ac0f105cc01 state=OPEN, ts=1507883309163, server=hadoop11,16020,1507883241942}
2017-10-13 16:28:37,923 INFO [ProcedureExecutor-4] procedure.ServerCrashProcedure: Finished processing of crashed hadoop11,16020,1507883241942
{code}