[jira] [Reopened] (HBASE-23595) HMaster abort when write to meta failed
[ https://issues.apache.org/jira/browse/HBASE-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez reopened HBASE-23595:
---------------------------------------

> HMaster abort when write to meta failed
> ---------------------------------------
>
> Key: HBASE-23595
> URL: https://issues.apache.org/jira/browse/HBASE-23595
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.2.2
> Reporter: Lijin Bin
> Priority: Major
>
> RegionStateStore
> {code}
>   private void updateRegionLocation(RegionInfo regionInfo, State state, Put put)
>       throws IOException {
>     try (Table table = master.getConnection().getTable(TableName.META_TABLE_NAME)) {
>       table.put(put);
>     } catch (IOException e) {
>       // TODO: Revisit. Means that if a server is loaded, then we will abort our host!
>       // In tests we abort the Master!
>       String msg = String.format("FAILED persisting region=%s state=%s",
>         regionInfo.getShortNameToLog(), state);
>       LOG.error(msg, e);
>       master.abort(msg, e);
>       throw e;
>     }
>   }
> {code}
> When a regionserver (carrying meta) stops or crashes, if the ServerCrashProcedure has not yet started processing, writes to meta will fail and abort the master.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
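[Editor's note] One direction the report above implies is to retry the meta write while the ServerCrashProcedure recovers meta, aborting only after retries are exhausted. The sketch below is illustrative plain Java, not HBase code; the class and method names are assumptions.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry a failed meta write with bounded linear backoff
// instead of aborting the master on the first IOException.
public class MetaWriteRetrier {
    private final int maxAttempts;
    private final long backoffMillis;

    public MetaWriteRetrier(int maxAttempts, long backoffMillis) {
        this.maxAttempts = maxAttempts;   // assumed >= 1
        this.backoffMillis = backoffMillis;
    }

    /** Runs the write, retrying IOExceptions; rethrows the last failure. */
    public <T> T callWithRetries(Callable<T> write) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return write.call();
            } catch (IOException e) {
                last = e; // meta may be unavailable while SCP recovers it
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // linear backoff
                }
            }
        }
        throw last; // caller decides whether to abort at this point
    }
}
```

The caller (here, the master) still owns the abort decision; the retrier only narrows the window in which a transient meta outage is fatal.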
[jira] [Resolved] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags
[ https://issues.apache.org/jira/browse/HBASE-19352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez resolved HBASE-19352.
---------------------------------------
Fix Version/s: 2.2.6
               2.4.0
               2.3.3
               3.0.0-alpha-1
Tags: security
Resolution: Fixed

> Port HADOOP-10379: Protect authentication cookies with the HttpOnly and
> Secure flags
> ------------------------------------------------------------------------
>
> Key: HBASE-19352
> URL: https://issues.apache.org/jira/browse/HBASE-19352
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Esteban Gutierrez
> Assignee: Esteban Gutierrez
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.6
>
> Attachments: HBASE-19352.master.v0.patch
>
> This came via a security scanner. Since we have a fork of HttpServer2 in HBase, we should include it too.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (HBASE-24041) [regression] Increase RESTServer buffer size back to 64k
[ https://issues.apache.org/jira/browse/HBASE-24041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez resolved HBASE-24041.
---------------------------------------
Fix Version/s: 2.2.5
               2.4.0
               2.3.0
               3.0.0
Resolution: Fixed

> [regression] Increase RESTServer buffer size back to 64k
> --------------------------------------------------------
>
> Key: HBASE-24041
> URL: https://issues.apache.org/jira/browse/HBASE-24041
> Project: HBase
> Issue Type: Bug
> Components: REST
> Affects Versions: 3.0.0, 2.2.0, 2.3.0, 2.4.0
> Reporter: Esteban Gutierrez
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.4.0, 2.2.5
>
> HBASE-14492 is no longer present in our current releases after HBASE-12894. Unfortunately our RESTServer does not extend HttpServer, which means that {{DEFAULT_MAX_HEADER_SIZE}} is not being set, and HTTP requests with a very large header can still cause connection issues for clients. A quick fix is to add the setting to the {{HttpConfiguration}} configuration object. A long-term solution would be to refactor the services that create an HTTP server and normalize all configuration settings across them.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (HBASE-24041) [regression] Increase RESTServer buffer size back to 64k
Esteban Gutierrez created HBASE-24041:
--------------------------------------

Summary: [regression] Increase RESTServer buffer size back to 64k
Key: HBASE-24041
URL: https://issues.apache.org/jira/browse/HBASE-24041
Project: HBase
Issue Type: Bug
Affects Versions: 2.2.0, 3.0.0, 2.3.0, 2.4.0
Reporter: Esteban Gutierrez

HBASE-14492 is no longer present in our current releases after HBASE-12894. Unfortunately our RESTServer does not extend HttpServer, which means that {{DEFAULT_MAX_HEADER_SIZE}} is not being set, and HTTP requests with a very large header can still cause connection issues for clients. A quick fix is to add the setting to the {{HttpConfiguration}} configuration object. A long-term solution would be to refactor the services that create an HTTP server and normalize all configuration settings across them.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
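[Editor's note] The guard the issue above describes (a 64 KB request-header cap, normally supplied via Jetty's HttpConfiguration) can be illustrated with a standalone check. This is a conceptual sketch only; the constant value mirrors a 64 KB cap and the class is not the HBase REST server code.

```java
import java.util.Map;

// Illustrative sketch: reject requests whose combined header bytes exceed a
// configured cap, mirroring what setting the max header size on the HTTP
// server's configuration achieves. Names here are assumptions.
public class HeaderSizeGuard {
    // A 64 KB cap, matching the "back to 64k" size discussed in the issue.
    static final int DEFAULT_MAX_HEADER_SIZE = 64 * 1024;

    /** Returns true if the combined header bytes fit under the cap. */
    static boolean withinLimit(Map<String, String> headers, int cap) {
        int total = 0;
        for (Map.Entry<String, String> e : headers.entrySet()) {
            // "Name: value\r\n" is roughly name + value + 4 framing bytes.
            total += e.getKey().length() + e.getValue().length() + 4;
            if (total > cap) {
                return false; // reject early instead of failing the connection
            }
        }
        return true;
    }
}
```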
[jira] [Created] (HBASE-22926) REST server should return 504 Gateway Timeout Error on scanner timeout
Esteban Gutierrez created HBASE-22926:
--------------------------------------

Summary: REST server should return 504 Gateway Timeout Error on scanner timeout
Key: HBASE-22926
URL: https://issues.apache.org/jira/browse/HBASE-22926
Project: HBase
Issue Type: Bug
Components: REST
Affects Versions: 2.2.0, 2.1.0, 3.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez

Currently, when a scanner timeout occurs on the RS side, the client gets a RetriesExhaustedException that makes it fail; however, from the REST server's point of view that is just an IOE:

org.apache.hadoop.hbase.rest.ScannerResultGenerator#next
{code}
    } else {
      Result result = null;
      try {
        result = scanner.next();
      } catch (UnknownScannerException e) {
        throw new IllegalArgumentException(e);
      } catch (TableNotEnabledException tnee) {
        throw new IllegalStateException(tnee);
      } catch (TableNotFoundException tnfe) {
        throw new IllegalArgumentException(tnfe);
      } catch (IOException e) {
        LOG.error(StringUtils.stringifyException(e));
      }
{code}

Now, that empty result is handled as an HTTP 204 response back to the client:

org.apache.hadoop.hbase.rest.ScannerInstanceResource#get
{code}
    ...
    Cell value = null;
    try {
      value = generator.next();
    } catch (IllegalStateException e) {
      ...
    } catch (IllegalArgumentException e) {
      ...
    }
    ...
    if (value == null) {
      if (LOG.isTraceEnabled()) {
        LOG.trace("generator exhausted");
      }
      // respond with 204 (No Content) if an empty cell set would be
      // returned
      if (count == limit) {
        return Response.noContent().build();
      }
      break;
{code}

Obviously this is wrong, since a RetriesExhaustedException is most likely due to a failure on the RS side. The correct behavior should be a 504 Gateway Timeout error.

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
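[Editor's note] The behavior the issue argues for can be sketched as an explicit exception-to-status mapping. This is illustrative plain Java, not the REST server's API; the status choices other than 504 (and the name-based check standing in for RetriesExhaustedException) are assumptions.

```java
import java.io.IOException;

// Hedged sketch: surface a regionserver scanner timeout as 504 Gateway
// Timeout instead of letting it degrade into an empty 204 response.
public class ScannerErrorMapper {
    static int statusFor(Exception e) {
        if (e instanceof IllegalArgumentException) return 400; // unknown scanner / table not found
        if (e instanceof IllegalStateException) return 409;    // table not enabled (assumed mapping)
        // RetriesExhaustedException is an IOException subclass; check it first.
        if (e.getClass().getSimpleName().contains("RetriesExhausted")) {
            return 504; // upstream (RegionServer) timed out: Gateway Timeout
        }
        if (e instanceof IOException) return 500;              // other server-side failure
        return 500;
    }
}
```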
[jira] [Created] (HBASE-22253) An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader
Esteban Gutierrez created HBASE-22253:
--------------------------------------

Summary: An AuthenticationTokenSecretManager leader won't step down if another RS claims to be a leader
Key: HBASE-22253
URL: https://issues.apache.org/jira/browse/HBASE-22253
Project: HBase
Issue Type: Bug
Components: security
Affects Versions: 2.1.0, 3.0.0, 2.2.0
Reporter: Esteban Gutierrez

We ran into a situation where a rogue Lily HBase Indexer [SEP Consumer|https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl/src/main/java/com/ngdata/sep/impl/SepConsumer.java#L169] sharing the same {{zookeeper.znode.parent}} claimed to be the AuthenticationTokenSecretManager for an HBase cluster. This situation is undesirable since the leader running on the HBase cluster doesn't step down when the rogue leader registers in the HBase cluster, and both start rolling keys with the same IDs, causing authentication errors. Even though a reasonable "fix" is to point to a different {{zookeeper.znode.parent}}, we should make sure that we step down as leader correctly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
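[Editor's note] The step-down behavior the issue asks for can be sketched as a membership check: on any change to the set of leadership claimants, a current leader relinquishes leadership unless it is the sole claimant. Illustrative only; this is not the AuthenticationTokenSecretManager code and the names are assumptions.

```java
import java.util.Set;

// Hedged sketch: a leader that steps down when another claimant appears,
// instead of continuing to roll keys concurrently with the rogue leader.
public class LeaderGuard {
    private final String self;
    private boolean leader;

    public LeaderGuard(String self) {
        this.self = self;
        this.leader = true; // assume we start as the elected leader
    }

    /** Called on each membership change; returns whether we remain leader. */
    public boolean onLeadersChanged(Set<String> claimants) {
        if (leader && (claimants.size() > 1 || !claimants.contains(self))) {
            leader = false; // step down: stop rolling keys until re-elected
        }
        return leader;
    }
}
```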
[jira] [Resolved] (HBASE-22019) Ability to remotely connect to hbase when hbase/zook is hosted on dynamic IP addresses
[ https://issues.apache.org/jira/browse/HBASE-22019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez resolved HBASE-22019.
---------------------------------------
Resolution: Invalid

Thanks for reporting this, [~toopt4]. Please reach out to the HBase user [mailing list|https://hbase.apache.org/mailing-lists.html] for this type of problem, since many users have been able to connect to HBase clusters without any problem even when HBase is on a different network.

> Ability to remotely connect to hbase when hbase/zook is hosted on dynamic IP
> addresses
> ----------------------------------------------------------------------------
>
> Key: HBASE-22019
> URL: https://issues.apache.org/jira/browse/HBASE-22019
> Project: HBase
> Issue Type: New Feature
> Components: IPC/RPC, Zookeeper
> Reporter: t oo
> Priority: Major
>
> Our team's need for this is purely for remote connections (ie personal laptops) to HBASE (hosted on EC2) to work, as hbase connections under the cover connect to zookeeper (also running on EC2) and attempt to resolve the hostname (not DNS!) of the machine running zookeeper. From what I've read others are facing the issue:
> https://forums.aws.amazon.com/thread.jspa?threadID=119915
> https://stackoverflow.com/questions/30751187/unable-to-connect-to-hbase-stand-alone-server-from-windows-remote-client
> https://sematext.com/opensee/m/HBase/YGbbw6MGk1B9nCv?subj=Re:+Remote+Java+client+connection+into+EC2+instance
> https://community.cloudera.com/t5/Storage-Random-Access-HDFS/Problem-in-connectivity-between-HBase-amp-JAVA/td-p/1693
> https://stackoverflow.com/questions/9413481/hbase-node-could-not-be-reached-from-hbase-java-api-client
> https://groups.google.com/forum/#!topic/opentsdb/3w4FCnPYRDg
> Between EC2s I don't get the below error because I can edit /etc/hosts to add the hostname below, but I don't have root/admin access on other machines to do the same. The problem is that if we have 100s of users wanting to connect to hbase data then they would all face this /etc/hosts issue.
> Example of the error:
> 19/03/01 17:02:14 WARN client.ConnectionUtils: Can not resolve ip-10x.com, please check your network
> java.net.UnknownHostException: ip-10x.com: Name or service not known
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>   at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>   at java.net.InetAddress.getByName(InetAddress.java:1077)
>   at org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:233)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.getClient(ConnectionImplementation.java:1189)
>   at org.apache.hadoop.hbase.client.ClientServiceCallable.setStubByServiceName(ClientServiceCallable.java:44)
>   at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:229)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
>   at org.apache.hadoop.hbase.client.HTable.get(HTable.java:386)
>   at org.apache.hadoop.hbase.client.HTable.get(HTable.java:360)
>   at org.apache.hadoop.hbase.MetaTableAccessor.getTableState(MetaTableAccessor.java:1066)
>   at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:389)
>   at org.apache.hadoop.hbase.client.HBaseAdmin$6.rpcCall(HBaseAdmin.java:437)
>   at org.apache.hadoop.hbase.client.HBaseAdmin$6.rpcCall(HBaseAdmin.java:434)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallable.call(RpcRetryingCallable.java:58)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:107)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3055)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3047)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:434)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (HBASE-21134) Add guardrails to cell tags in order to avoid the tags length to overflow
Esteban Gutierrez created HBASE-21134:
--------------------------------------

Summary: Add guardrails to cell tags in order to avoid the tags length to overflow
Key: HBASE-21134
URL: https://issues.apache.org/jira/browse/HBASE-21134
Project: HBase
Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Esteban Gutierrez

We found that per-cell tags can easily overflow and cause failures while reading HFiles. If a mutation has more than 32KB in the byte array with the tags, we should reject the operation on the client side (proactively) and on the server side as we deserialize the request.

{code}
2018-08-21 11:08:45,387 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed Request = regionName=table1,,1534870486680.9112ca53504084152da5e28116f40ec2., storeName=c1, fileCount=4, fileSize=254.2 K (138.0 K, 33.5 K, 34.0 K, 48.7 K), priority=1, time=8555785624243
java.lang.IllegalStateException: Invalid currTagsLen -20658. Block offset: 0, block length: 44912, position: 0 (without header).
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.checkTagsLen(HFileReaderV3.java:226)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV3$ScannerV3.readKeyValueLen(HFileReaderV3.java:251)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.updateCurrBlock(HFileReaderV2.java:956)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:919)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:304)
  at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:200)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:350)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:269)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:231)
  at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:414)
  at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:91)
  at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:125)
  at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1247)
  at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1915)
  at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:529)
  at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:566)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
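[Editor's note] The guardrail the issue proposes can be sketched as a simple length check applied before a mutation is accepted. Illustrative plain Java, not HBase code; using Short.MAX_VALUE as the bound is an assumption that matches the negative currTagsLen overflow seen above.

```java
// Hedged sketch: reject tag payloads that cannot be safely serialized, so
// the tags length never overflows and corrupts later HFile reads.
public class TagGuardrail {
    // Assumed bound: a signed-short length field overflows past this.
    static final int MAX_TAGS_LENGTH = Short.MAX_VALUE; // 32767 bytes

    /** Throws if the tag payload exceeds the serializable maximum. */
    static void checkTagsLength(byte[] tags) {
        if (tags != null && tags.length > MAX_TAGS_LENGTH) {
            throw new IllegalArgumentException(
                "Tags length " + tags.length + " exceeds max " + MAX_TAGS_LENGTH);
        }
    }
}
```

Applied on both the client (before the RPC) and the server (while deserializing), the same check rejects the mutation instead of writing an HFile that fails during compaction.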
[jira] [Created] (HBASE-20761) FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in some edge cases
Esteban Gutierrez created HBASE-20761:
--------------------------------------

Summary: FSReaderImpl#readBlockDataInternal can fail to switch to HDFS checksums in some edge cases
Key: HBASE-20761
URL: https://issues.apache.org/jira/browse/HBASE-20761
Project: HBase
Issue Type: Bug
Components: HFile
Reporter: Esteban Gutierrez

One of our users reported this problem on HBase 1.2, before and after HBASE-11625:

{code}
Caused by: java.io.IOException: On-disk size without header provided is 131131, but block header contains 0. Block offset: 2073954793, data starts with: \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:526)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$700(HFileBlock.java:92)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1699)
  at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1542)
  at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:445)
  at org.apache.hadoop.hbase.util.CompoundBloomFilter.contains(CompoundBloomFilter.java:100)
{code}

The problem occurs when we read a block without HDFS checksums enabled and, due to data corruption, end up with an empty headerBuf before the HDFS checksum failover code runs. This causes further attempts to read the block to fail, since we keep retrying the corrupt replica instead of reporting the corrupt replica and trying a different one.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
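[Editor's note] The failover the issue says is being skipped can be sketched abstractly: if the fast path (no HDFS checksums) yields an empty or missing header, fall back to a checksum-verified read so a bad replica gets detected instead of retried forever. Illustrative types only; this is not the FSReaderImpl API.

```java
import java.io.IOException;
import java.util.Optional;

// Hedged sketch of the intended read path with checksum failover.
public class ChecksumFailover {
    interface BlockReader {
        Optional<byte[]> read(boolean verifyChecksum) throws IOException;
    }

    /** Tries the fast path first; on a bad header, retries with checksums. */
    static byte[] readBlock(BlockReader reader) throws IOException {
        Optional<byte[]> fast = reader.read(false);
        if (fast.isPresent() && fast.get().length > 0) {
            return fast.get();
        }
        // Empty/corrupt header: fail over to a checksum-verified read so the
        // corrupt replica can be reported rather than silently retried.
        return reader.read(true)
            .orElseThrow(() -> new IOException("corrupt block on all paths"));
    }
}
```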
[jira] [Created] (HBASE-20604) ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled
Esteban Gutierrez created HBASE-20604:
--------------------------------------

Summary: ProtobufLogReader#readNext can incorrectly loop to the same position in the stream until the WAL is rolled
Key: HBASE-20604
URL: https://issues.apache.org/jira/browse/HBASE-20604
Project: HBase
Issue Type: Bug
Components: Replication, wal
Affects Versions: 3.0.0, 2.1.0, 1.5.0
Reporter: Esteban Gutierrez

Every time we call {{ProtobufLogReader#readNext}} we consume the input stream associated with the {{FSDataInputStream}} from the WAL that we are reading. Under certain conditions, e.g. when using encryption at rest ({{CryptoInputStream}}), the stream can return partial data, which can cause a premature EOF that makes {{inputStream.getPos()}} return the same original position, causing {{ProtobufLogReader#readNext}} to retry the reads until the WAL is rolled. The side effect of this issue is that {{ReplicationSource}} can get stuck until the WAL is rolled, causing replication delays of up to an hour in some cases.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
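[Editor's note] One way to break the loop described above is a no-progress guard: if consecutive read attempts leave the stream at the same position, stop treating it as a recoverable EOF after a bounded number of tries. This is a hedged illustrative sketch, not the ProtobufLogReader code; the counter-based policy is an assumption.

```java
// Hedged sketch: detect when a reader is stuck at the same stream position
// so it can bail out instead of spinning until the WAL is rolled.
public class ProgressGuard {
    private long lastPos = -1;
    private int stuckCount = 0;

    /** Records an attempt at this position; returns whether to keep retrying. */
    boolean onAttempt(long pos, int maxStuckAttempts) {
        if (pos == lastPos) {
            stuckCount++;           // no progress since the last attempt
        } else {
            lastPos = pos;          // progress was made; reset the counter
            stuckCount = 0;
        }
        return stuckCount < maxStuckAttempts;
    }
}
```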
[jira] [Created] (HBASE-19572) RegionMover should use the configured default port number and not the one from HConstants
Esteban Gutierrez created HBASE-19572:
--------------------------------------

Summary: RegionMover should use the configured default port number and not the one from HConstants
Key: HBASE-19572
URL: https://issues.apache.org/jira/browse/HBASE-19572
Project: HBase
Issue Type: Bug
Reporter: Esteban Gutierrez

The issue I ran into in HBASE-19499 was due to RegionMover not using the port configured in {{hbase-site.xml}}. The tool should use the value from the configuration before falling back to the hardcoded value {{HConstants.DEFAULT_REGIONSERVER_PORT}}.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
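[Editor's note] The config-then-fallback lookup described above can be sketched in plain Java. The property key `hbase.regionserver.port` and the hardcoded fallback are the real HBase names as I understand them, but the `Map`-based configuration type here is a stand-in, not the HBase `Configuration` API.

```java
import java.util.Map;

// Hedged sketch: prefer the configured regionserver port, falling back to
// the HConstants-style default only when the configuration has no value.
public class PortResolver {
    static final String REGIONSERVER_PORT_KEY = "hbase.regionserver.port";
    static final int DEFAULT_REGIONSERVER_PORT = 16020; // assumed default

    static int resolvePort(Map<String, String> conf) {
        String v = conf.get(REGIONSERVER_PORT_KEY);
        return v != null ? Integer.parseInt(v) : DEFAULT_REGIONSERVER_PORT;
    }
}
```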
[jira] [Resolved] (HBASE-19499) RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully
[ https://issues.apache.org/jira/browse/HBASE-19499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez resolved HBASE-19499.
---------------------------------------
Resolution: Not A Bug

> RegionMover#stripMaster in RegionMover needs to handle HBASE-18511 gracefully
> -----------------------------------------------------------------------------
>
> Key: HBASE-19499
> URL: https://issues.apache.org/jira/browse/HBASE-19499
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.0
> Reporter: Esteban Gutierrez
>
> Probably this is the first of a few issues found during some tests with RegionMover. After HBASE-13014 we ship the new RegionMover tool, but it currently assumes that the master will be hosting regions, so it attempts to remove the master from the list, and that causes an issue similar to this:
> {code}
> 17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of RS
> java.lang.Exception: Server host1.example.com:22001 is not in list of online servers(Offline/Incorrect)
>   at org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
>   at org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
>   at org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
>   at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
>   at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> Basically

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-19499) RegionMover#stripMaster is no longer necessary in RegionMover
Esteban Gutierrez created HBASE-19499:
--------------------------------------

Summary: RegionMover#stripMaster is no longer necessary in RegionMover
Key: HBASE-19499
URL: https://issues.apache.org/jira/browse/HBASE-19499
Project: HBase
Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Esteban Gutierrez

Probably this is the first of a few issues found during some tests with RegionMover. After HBASE-13014 we ship the new RegionMover tool, but it currently assumes that the master will be hosting regions, so it attempts to remove the master from the list, and that causes an issue similar to this:

{code}
17/12/12 11:01:06 WARN util.RegionMover: Could not remove master from list of RS
java.lang.Exception: Server host1.example.com:22001 is not in list of online servers(Offline/Incorrect)
  at org.apache.hadoop.hbase.util.RegionMover.stripServer(RegionMover.java:818)
  at org.apache.hadoop.hbase.util.RegionMover.stripMaster(RegionMover.java:757)
  at org.apache.hadoop.hbase.util.RegionMover.access$1800(RegionMover.java:78)
  at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:339)
  at org.apache.hadoop.hbase.util.RegionMover$Unload.call(RegionMover.java:314)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
{code}

Basically

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-19391) Calling HRegion#initializeRegionInternals from a region replica can still re-create a region directory
Esteban Gutierrez created HBASE-19391:
--------------------------------------

Summary: Calling HRegion#initializeRegionInternals from a region replica can still re-create a region directory
Key: HBASE-19391
URL: https://issues.apache.org/jira/browse/HBASE-19391
Project: HBase
Issue Type: Bug
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez

This is a follow-up from HBASE-18024. There is still a chance that attempting to open a region that is not the default region replica can re-create a region directory already GC'd by the CatalogJanitor, causing inconsistencies with hbck.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-19390) Revert to older version of Jetty 9.3
Esteban Gutierrez created HBASE-19390:
--------------------------------------

Summary: Revert to older version of Jetty 9.3
Key: HBASE-19390
URL: https://issues.apache.org/jira/browse/HBASE-19390
Project: HBase
Issue Type: Bug
Reporter: Esteban Gutierrez

As discussed in HBASE-19256, we will have to temporarily revert to Jetty 9.3 due to existing issues with 9.4 and Hadoop 3. Once HBASE-19256 is resolved we can move back to 9.4.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-19352) Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags
Esteban Gutierrez created HBASE-19352:
--------------------------------------

Summary: Port HADOOP-10379: Protect authentication cookies with the HttpOnly and Secure flags
Key: HBASE-19352
URL: https://issues.apache.org/jira/browse/HBASE-19352
Project: HBase
Issue Type: Bug
Reporter: Esteban Gutierrez

This came via a security scanner. Since we have a fork of HttpServer2 in HBase, we should include it too.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Resolved] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH
[ https://issues.apache.org/jira/browse/HBASE-18987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Esteban Gutierrez resolved HBASE-18987.
---------------------------------------
Resolution: Later

Resolving as Later since we could only do this with a new HFile format.

> Raise value of HConstants#MAX_ROW_LENGTH
> ----------------------------------------
>
> Key: HBASE-18987
> URL: https://issues.apache.org/jira/browse/HBASE-18987
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Esteban Gutierrez
> Assignee: Esteban Gutierrez
> Priority: Minor
> Attachments: HBASE-18987.master.001.patch, HBASE-18987.master.002.patch
>
> Short.MAX_VALUE hasn't been a problem for a long time, but one of our customers ran into an edge case when the midKey used for the split point was very close to Short.MAX_VALUE. When the split is submitted, we attempt to create the two new daughter regions and we name those regions via {{HRegionInfo.createRegionName()}} in order to add them to META. Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the startKey, the {{Put}} will fail because the row key length now fails checkRow, thus causing the split to fail. I tried a couple of alternatives to address this problem, e.g. truncating the startKey, but the number of changes in the code isn't justified for this edge condition. Since we already use {{Integer.MAX_VALUE - 1}} for {{HConstants#MAXIMUM_VALUE_LENGTH}}, it should be OK to use the same limit for the maximum row key.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-19309) Lower HConstants#MAX_ROW_LENGTH as guardrail in order to avoid HBASE-18987
Esteban Gutierrez created HBASE-19309:
--------------------------------------

Summary: Lower HConstants#MAX_ROW_LENGTH as guardrail in order to avoid HBASE-18987
Key: HBASE-19309
URL: https://issues.apache.org/jira/browse/HBASE-19309
Project: HBase
Issue Type: Bug
Components: HFile, regionserver
Reporter: Esteban Gutierrez

As discussed in HBASE-18987: a problem with having a row near the maximum row size (Short.MAX_VALUE) is that when a split happens, the midkey could be that row, and the Put created to add the new entry in META will exceed the maximum row size, since the new row key will include the table name, causing the split to abort. Since it is not possible to raise that row key size in HFileV3, a reasonable solution is to reduce the maximum row key size in order to avoid exceeding Short.MAX_VALUE.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
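[Editor's note] The guardrail can be sketched as a lowered client-facing row length limit that leaves headroom for the META row key (table name plus start key plus suffix). Illustrative plain Java; the 512-byte headroom constant is an assumption, not a value from the issue.

```java
// Hedged sketch: cap user row keys below Short.MAX_VALUE so that the META
// row key built from them during a split still passes checkRow.
public class RowLengthGuardrail {
    static final int MAX_ROW_LENGTH = Short.MAX_VALUE;   // on-disk limit
    static final int META_ROW_OVERHEAD = 512;            // assumed headroom for table name + suffix
    static final int MAX_USER_ROW_LENGTH = MAX_ROW_LENGTH - META_ROW_OVERHEAD;

    /** Rejects row keys too long to survive being embedded in a META row key. */
    static void checkRow(byte[] row) {
        if (row.length > MAX_USER_ROW_LENGTH) {
            throw new IllegalArgumentException(
                "Row length " + row.length + " > " + MAX_USER_ROW_LENGTH);
        }
    }
}
```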
[jira] [Created] (HBASE-18987) Raise value of HConstants#MAX_ROW_LENGTH
Esteban Gutierrez created HBASE-18987:
--------------------------------------

Summary: Raise value of HConstants#MAX_ROW_LENGTH
Key: HBASE-18987
URL: https://issues.apache.org/jira/browse/HBASE-18987
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 1.0.0, 2.0.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Minor

Short.MAX_VALUE hasn't been a problem for a long time, but one of our customers ran into an edge case when the midKey used for the split point was very close to Short.MAX_VALUE. When the split is submitted, we attempt to create the two new daughter regions and we name those regions via {{HRegionInfo.createRegionName()}} in order to add them to META. Unfortunately, since {{HRegionInfo.createRegionName()}} uses midKey as the startKey, the {{Put}} will fail because the row key length now fails checkRow, thus causing the split to fail. I tried a couple of alternatives to address this problem, e.g. truncating the startKey, but the number of changes in the code isn't justified for this edge condition. Since we already use {{Integer.MAX_VALUE - 1}} for {{HConstants#MAXIMUM_VALUE_LENGTH}}, it should be OK to use the same limit for the maximum row key.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-18563) Fix RAT License complaint about website jenkins scripts
Esteban Gutierrez created HBASE-18563:
--------------------------------------

Summary: Fix RAT License complaint about website jenkins scripts
Key: HBASE-18563
URL: https://issues.apache.org/jira/browse/HBASE-18563
Project: HBase
Issue Type: Bug
Reporter: Esteban Gutierrez
Priority: Trivial

{code}
2 Unknown Licenses

Files with unapproved licenses:

  dev-support/jenkins-scripts/check-website-links.sh
  dev-support/jenkins-scripts/generate-hbase-website.sh
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Created] (HBASE-18177) FanOutOneBlockAsyncDFSOutputHelper fails to compile against Hadoop 3
Esteban Gutierrez created HBASE-18177:
--------------------------------------

Summary: FanOutOneBlockAsyncDFSOutputHelper fails to compile against Hadoop 3
Key: HBASE-18177
URL: https://issues.apache.org/jira/browse/HBASE-18177
Project: HBase
Issue Type: Bug
Components: wal
Reporter: Esteban Gutierrez

After HDFS-10996, ClientProtocol#create() needs to specify the erasure coding policy to use. In the meantime we should add a workaround to FanOutOneBlockAsyncDFSOutputHelper so it can compile against both Hadoop 3 and Hadoop 2.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (HBASE-18025) CatalogJanitor should collect outdated RegionStates from the AM
Esteban Gutierrez created HBASE-18025:
--------------------------------------

Summary: CatalogJanitor should collect outdated RegionStates from the AM
Key: HBASE-18025
URL: https://issues.apache.org/jira/browse/HBASE-18025
Project: HBase
Issue Type: Bug
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez

I don't think this will matter in the long run for HBase 2, but at least in branch-1 and the current master we keep copies of the region states in multiple places in the master, and these copies include information like the HRI. A problem we have observed is that, when region replicas are in use and there is a split, the region replica of the parent doesn't get collected from the region states, and when the balancer tries to assign the old parent's region replica, the RegionServer creates a new HRI with the details of the parent, causing an inconsistency (see HBASE-18024).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (HBASE-18024) HRegion#initializeRegionInternals should not re-create .hregioninfo file when the region directory no longer exists
Esteban Gutierrez created HBASE-18024:
--------------------------------------

Summary: HRegion#initializeRegionInternals should not re-create .hregioninfo file when the region directory no longer exists
Key: HBASE-18024
URL: https://issues.apache.org/jira/browse/HBASE-18024
Project: HBase
Issue Type: Bug
Components: Region Assignment, regionserver
Affects Versions: 1.2.5, 1.3.1, 1.4.0
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez

When a RegionServer attempts to open a region, during initialization the RS tries to open the {{/data///.hregioninfo}} file; however, if the {{.hregioninfo}} file doesn't exist, the RegionServer will create a new one in {{HRegionFileSystem#checkRegionInfoOnFilesystem}}. A side effect of that is that tools like hbck will incorrectly report an inconsistency due to the presence of this new {{.hregioninfo}} file.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (HBASE-17799) HBCK region boundaries check can return false negatives when IOExceptions are thrown
Esteban Gutierrez created HBASE-17799:
--------------------------------------

Summary: HBCK region boundaries check can return false negatives when IOExceptions are thrown
Key: HBASE-17799
URL: https://issues.apache.org/jira/browse/HBASE-17799
Project: HBase
Issue Type: Bug
Components: hbck
Affects Versions: 2.0.0, 1.4.0, 1.3.1, 1.2.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez

When enabled via {{-boundaries}}, HBaseFsck#checkRegionBoundaries will crawl all HFiles across all namespaces and tables. However, if an IOException is thrown while accessing a corrupt HFile, an unhandled HLink, or for any other reason, we only log the exception and stop crawling the HFiles, potentially reporting the wrong result.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
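[Editor's note] The fix direction implied above is continue-on-error crawling: record a per-file failure and keep going instead of aborting the whole check. A minimal sketch under illustrative types (not the HBaseFsck API):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: crawl every file, collecting both boundary violations and
// unreadable files, so one bad HFile cannot hide problems in the rest.
public class BoundaryCheck {
    interface FileChecker {
        boolean check(String file) throws IOException; // true = boundaries OK
    }

    /** Returns files that failed the check or could not be read at all. */
    static List<String> crawl(List<String> files, FileChecker checker) {
        List<String> problems = new ArrayList<>();
        for (String f : files) {
            try {
                if (!checker.check(f)) problems.add(f);
            } catch (IOException e) {
                problems.add(f); // record and continue; do not abort the crawl
            }
        }
        return problems;
    }
}
```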
[jira] [Created] (HBASE-17756) We should have better introspection of HFiles
Esteban Gutierrez created HBASE-17756:
--------------------------------------

Summary: We should have better introspection of HFiles
Key: HBASE-17756
URL: https://issues.apache.org/jira/browse/HBASE-17756
Project: HBase
Issue Type: Brainstorming
Components: HFile
Reporter: Esteban Gutierrez

[~saint@gmail.com] was suggesting using DataSketches (https://datasketches.github.io) to write additional statistics to the HFiles. This could be used to improve our split decisions, aid troubleshooting, or potentially enable other interesting analysis without having to perform full table scans. The statistics could be stored as part of the HFile, but we could initially improve the visibility of the data by adding some statistics to HFilePrettyPrinter.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (HBASE-17755) CellBasedKeyBlockIndexReader#midkey should exhaust search of the target middle key on skewed regions
Esteban Gutierrez created HBASE-17755: - Summary: CellBasedKeyBlockIndexReader#midkey should exhaust search of the target middle key on skewed regions Key: HBASE-17755 URL: https://issues.apache.org/jira/browse/HBASE-17755 Project: HBase Issue Type: Bug Components: HFile Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez We have always returned the middle key of the block index regardless of the distribution of the data in an HFile. A side effect of that approach is that when millions of rows share the same key, it is quite easy for the start key or the end key to equal the middle key, making that HFile nearly impossible to split until enough data is written into the region that the middle key shifts to another row, or until an operator uses a custom split point to split that region. Instead we should exhaust the search for the middle key in the block index so that we can split an HFile earlier when possible, even in the edge case of a region holding a single key with millions of versions of a row or millions of qualifiers on the same row. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
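A minimal sketch of the exhaustive-search idea, assuming a sorted block index modeled as a list of string keys (the real reader works over byte[] cells inside CellBasedKeyBlockIndexReader; all names here are illustrative):

```java
import java.util.List;

public class SplitKeySearch {
    // Naive midkey: the middle entry of the (sorted) block index.
    // When millions of cells share one row key, that entry often equals
    // the first or last key and the region cannot be split.
    // Exhaustive fallback: walk outward from the middle until a key
    // strictly between the endpoints is found.
    static String findSplitKey(List<String> sortedKeys) {
        int n = sortedKeys.size();
        String first = sortedKeys.get(0), last = sortedKeys.get(n - 1);
        int mid = n / 2;
        for (int off = 0; mid - off >= 0 || mid + off < n; off++) {
            for (int i : new int[] { mid - off, mid + off }) {
                if (i < 0 || i >= n) continue;
                String k = sortedKeys.get(i);
                if (k.compareTo(first) > 0 && k.compareTo(last) < 0) return k;
            }
        }
        return null; // only one distinct key: genuinely unsplittable
    }

    public static void main(String[] args) {
        // Heavily skewed index: the plain middle entry equals the start key.
        List<String> keys = List.of("a", "a", "a", "a", "a", "a", "b", "c");
        System.out.println(findSplitKey(keys)); // b (not the unusable "a")
    }
}
```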
[jira] [Resolved] (HBASE-17679) Log arguments passed to hbck
[ https://issues.apache.org/jira/browse/HBASE-17679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-17679. --- Resolution: Duplicate Duplicate of HBASE-12678. Perhaps PrintingErrorReporter sends that information to stdout while our log4j properties make the console write to stderr. > Log arguments passed to hbck > > > Key: HBASE-17679 > URL: https://issues.apache.org/jira/browse/HBASE-17679 > Project: HBase > Issue Type: Bug >Reporter: Esteban Gutierrez >Assignee: Esteban Gutierrez >Priority: Trivial > > Sometimes hbck arguments get lost and we only end up with the output of hbck. > This should log some basic info about our arguments passed to hbck for better > supportability. Additional server side logging will be added later on HBase > Admin calls in a different JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17679) Log arguments passed to hbck
Esteban Gutierrez created HBASE-17679: - Summary: Log arguments passed to hbck Key: HBASE-17679 URL: https://issues.apache.org/jira/browse/HBASE-17679 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Trivial Sometimes hbck arguments get lost and we only end up with the output of hbck. This should log some basic info about our arguments passed to hbck for better supportability. Additional server side logging will be added later on HBase Admin calls in a different JIRA. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17622) Add hbase-metrics package to TableMapReduceUtil
Esteban Gutierrez created HBASE-17622: - Summary: Add hbase-metrics package to TableMapReduceUtil Key: HBASE-17622 URL: https://issues.apache.org/jira/browse/HBASE-17622 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 2.0.0 Reporter: Esteban Gutierrez Priority: Trivial HBASE-9774 recently moved our metrics into their own package; unfortunately, running an MR job against snapshots will fail since org.apache.hadoop.hbase.metrics.impl.FastLongHistogram is not present in the classpath and is needed by the ClientSideRegionScanner (HStore keeps track of cache statistics from LruBlockCache). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17544) Expose metrics for the CatalogJanitor
Esteban Gutierrez created HBASE-17544: - Summary: Expose metrics for the CatalogJanitor Key: HBASE-17544 URL: https://issues.apache.org/jira/browse/HBASE-17544 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Currently there is no way to know what the CatalogJanitor is doing except through the logs. We should have better visibility into when the CatalogJanitor last ran, how long it took to scan meta, the number of merged and parent regions cleaned on the last run, and whether it is in maintenance mode (see HBASE-16008). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17305) Two active HBase Masters can run at the same time under certain circumstances
Esteban Gutierrez created HBASE-17305: - Summary: Two active HBase Masters can run at the same time under certain circumstances Key: HBASE-17305 URL: https://issues.apache.org/jira/browse/HBASE-17305 Project: HBase Issue Type: Bug Components: master Affects Versions: 2.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Critical This needs a little more investigation, but we found a rare edge case: when the active master is restarted and a stand-by master tries to become active, the original master can become active again just before the stand-by passes the point of no return in its transition, leaving two active masters running at the same time. Assuming the clocks on both masters were accurate to the millisecond, this race happened in less than 85 ms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-9913) weblogic deployment project implementation under the mapreduce hbase reported a NullPointerException
[ https://issues.apache.org/jira/browse/HBASE-9913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-9913. -- Resolution: Duplicate Fixed in HBASE-12491 > weblogic deployment project implementation under the mapreduce hbase reported > a NullPointerException > > > Key: HBASE-9913 > URL: https://issues.apache.org/jira/browse/HBASE-9913 > Project: HBase > Issue Type: Bug > Components: hadoop2, mapreduce >Affects Versions: 0.94.10 > Environment: weblogic windows >Reporter: 刘泓 > Attachments: TableMapReduceUtil.class, TableMapReduceUtil.java > > > java.lang.NullPointerException > at java.io.File.(File.java:222) > at java.util.zip.ZipFile.(ZipFile.java:75) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.updateMap(TableMapReduceUtil.java:617) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:597) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:557) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:518) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:144) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:221) > at > org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:87) > at > com.easymap.ezserver6.map.source.hbase.convert.HBaseMapMerge.beginMerge(HBaseMapMerge.java:163) > at > com.easymap.ezserver6.app.servlet.EzMapToHbaseService.doPost(EzMapToHbaseService.java:32) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:227) > at > weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:125) > at > 
weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:292) > at > weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:175) > at > weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3594) > at > weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321) > at > weblogic.security.service.SecurityManager.runAs(SecurityManager.java:121) > at > weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2202) > at > weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2108) > at > weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1432) > at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201) > at weblogic.work.ExecuteThread.run(ExecuteThread.java:173) > > > My project is deployed under WebLogic 11, and when I run an HBase mapreduce job it throws a > NullPointerException. I found that the method > TableMapReduceUtil.findContainingJar() returns null, so I debugged it: > url.getProtocol() returns "zip", but the file is a jar file, so the if condition > if ("jar".equals(url.getProtocol())) is never entered. So I added an if condition to > handle the "zip" type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
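The reporter's workaround (treat a "zip" protocol like "jar") can be sketched as follows. Since "zip" is not a JDK-registered URL scheme, the check below operates on the raw protocol string rather than a constructed java.net.URL:

```java
public class JarProtocolCheck {
    // WebLogic can hand back classpath resource URLs with a "zip" protocol
    // even though the underlying file is a jar, so a check for
    // "jar".equals(url.getProtocol()) alone misses it and
    // findContainingJar() ends up returning null. The reporter's
    // workaround is to accept both protocols:
    static boolean looksLikeJar(String protocol) {
        return "jar".equals(protocol) || "zip".equals(protocol);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJar("jar"));  // true
        System.out.println(looksLikeJar("zip"));  // true (the WebLogic case)
        System.out.println(looksLikeJar("file")); // false
    }
}
```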
[jira] [Resolved] (HBASE-9925) Don't close a file if doesn't EOF while replicating
[ https://issues.apache.org/jira/browse/HBASE-9925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-9925. -- Resolution: Later Resolving for later; we should first fix other replication bottlenecks before we hit contention from the NN. > Don't close a file if doesn't EOF while replicating > --- > > Key: HBASE-9925 > URL: https://issues.apache.org/jira/browse/HBASE-9925 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.0 >Reporter: Himanshu Vashishtha > > While doing replication, we open and close the WAL file _every_ time we read > entries to send. We could open/close the reader only when we hit EOF. That > would alleviate some NN load, especially on a write-heavy cluster. > This came up while discussing our current open/close heuristic in replication > with [~jdcryans]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-9940) PerformanceEvaluation should have a test with many table options on (Bloom, compression, FAST_DIFF, etc.)
[ https://issues.apache.org/jira/browse/HBASE-9940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-9940. -- Resolution: Fixed Most of the features requested by [~jmspaggi] are already present in PerformanceEvaluation. Created HBASE-17116 to address the missing option to configure block size. > PerformanceEvaluation should have a test with many table options on (Bloom, > compression, FAST_DIFF, etc.) > - > > Key: HBASE-9940 > URL: https://issues.apache.org/jira/browse/HBASE-9940 > Project: HBase > Issue Type: Bug > Components: Performance, test >Affects Versions: 0.96.0, 0.94.13 >Reporter: Jean-Marc Spaggiari >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17116) [PerformanceEvaluation] Add option to configure block size
Esteban Gutierrez created HBASE-17116: - Summary: [PerformanceEvaluation] Add option to configure block size Key: HBASE-17116 URL: https://issues.apache.org/jira/browse/HBASE-17116 Project: HBase Issue Type: Bug Components: tooling Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.2.5 Reporter: Esteban Gutierrez Priority: Trivial Follow-up from HBASE-9940 to add an option to configure the block size for PerformanceEvaluation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-9968) Cluster is non operative if the RS carrying -ROOT- is expiring after deleting -ROOT- region transition znode and before adding it to online regions.
[ https://issues.apache.org/jira/browse/HBASE-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-9968. -- Resolution: Won't Fix We no longer have {{-ROOT-}}. > Cluster is non operative if the RS carrying -ROOT- is expiring after deleting > -ROOT- region transition znode and before adding it to online regions. > > > Key: HBASE-9968 > URL: https://issues.apache.org/jira/browse/HBASE-9968 > Project: HBase > Issue Type: Bug > Components: Region Assignment >Affects Versions: 0.94.11 >Reporter: rajeshbabu >Assignee: rajeshbabu > > When we check whether the dead server is carrying root or meta, first we check > whether a transition znode for the region exists. In this case it > got deleted, so we cannot find the region location from ZooKeeper. > {code} > try { > data = ZKAssign.getData(master.getZooKeeper(), hri.getEncodedName()); > } catch (KeeperException e) { > master.abort("Unexpected ZK exception reading unassigned node for > region=" > + hri.getEncodedName(), e); > } > {code} > Now we check from the AssignmentManager whether it is in the online regions or > not > {code} > ServerName addressFromAM = getRegionServerOfRegion(hri); > boolean matchAM = (addressFromAM != null && > addressFromAM.equals(serverName)); > LOG.debug("based on AM, current region=" + hri.getRegionNameAsString() + > " is on server=" + (addressFromAM != null ? addressFromAM : "null") + > " server being checked: " + serverName); > {code} > From the AM we get null because, while adding a region to the online regions, we > check whether the RS is in the online servers; if not, we do not > add the region to the online regions. 
> {code} > if (isServerOnline(sn)) { > this.regions.put(regionInfo, sn); > addToServers(sn, regionInfo); > this.regions.notifyAll(); > } else { > LOG.info("The server is not in online servers, ServerName=" + > sn.getServerName() + ", region=" + regionInfo.getEncodedName()); > } > {code} > Even though the dead regionserver was carrying the ROOT region, it returns false, > and after that the ROOT region is never assigned. > Here are the logs: > {code} > 2013-11-11 18:04:14,730 INFO > org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region > location in ZooKeeper > 2013-11-11 18:04:14,775 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan > was found (or we are ignoring an existing plan) for -ROOT-,,0.70236052 so > generated a random one; hri=-ROOT-,,0.70236052, src=, > dest=HOST-10-18-40-69,60020,1384173244404; 1 (online=1, available=1) > available servers > 2013-11-11 18:04:14,809 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Assigning region > -ROOT-,,0.70236052 to HOST-10-18-40-69,60020,1384173244404 > 2013-11-11 18:04:18,375 DEBUG > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: > Looked up root region location, > connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@12133926; > serverName=HOST-10-18-40-69,60020,1384173244404 > 2013-11-11 18:04:26,213 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: Handling > transition=RS_ZK_REGION_OPENED, server=HOST-10-18-40-69,60020,1384173244404, > region=70236052/-ROOT- > 2013-11-11 18:04:26,213 INFO > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED > event for -ROOT-,,0.70236052 from HOST-10-18-40-69,60020,1384173244404; > deleting unassigned node > 2013-11-11 18:04:31,553 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: based on AM, current > region=-ROOT-,,0.70236052 is on server=null server being checked: > HOST-10-18-40-69,60020,1384173244404 > 2013-11-11 
18:04:31,561 DEBUG org.apache.hadoop.hbase.master.ServerManager: > Added=HOST-10-18-40-69,60020,1384173244404 to dead servers, submitted > shutdown handler to be executed, root=false, meta=false > {code} > {code} > 2013-11-11 18:04:32,323 DEBUG > org.apache.hadoop.hbase.master.AssignmentManager: The znode of region > -ROOT-,,0.70236052 has been deleted. > 2013-11-11 18:04:32,323 INFO > org.apache.hadoop.hbase.master.AssignmentManager: The server is not in online > servers, ServerName=HOST-10-18-40-69,60020,1384173244404, region=70236052 > 2013-11-11 18:04:32,323 INFO > org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the > region -ROOT-,,0.70236052 that was online on > HOST-10-18-40-69,60020,1384173244404 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-6205) Support an option to keep data of dropped table for some time
[ https://issues.apache.org/jira/browse/HBASE-6205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-6205. -- Resolution: Later Resolving for later; we already have the archive and snapshots, and we could take care of this after HBASE-14439. > Support an option to keep data of dropped table for some time > - > > Key: HBASE-6205 > URL: https://issues.apache.org/jira/browse/HBASE-6205 > Project: HBase > Issue Type: New Feature >Affects Versions: 0.94.0, 0.95.2 >Reporter: chunhui shen >Assignee: chunhui shen > Attachments: HBASE-6205.patch, HBASE-6205v2.patch, > HBASE-6205v3.patch, HBASE-6205v4.patch, HBASE-6205v5.patch > > > Users may drop a table accidentally because of buggy code or other > reasons. > Unfortunately, it happened in our environment because one user confused the > production cluster with the testing cluster. > So, a suggestion: do we need to support an option to keep the data of a > dropped table for some time, e.g. 1 day? > In the patch: > We make a new dir named .trashtables in the root dir. > In the DeleteTableHandler, we move files in the dropped table's dir to the trash > table dir instead of deleting them directly. > We also create a new class, TrashCleaner, which periodically cleans dropped > tables once they time out. > The default keep time for dropped tables is 1 day, and the check period is 1 hour. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3991) Add Util folder for Utility Scripts
[ https://issues.apache.org/jira/browse/HBASE-3991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3991. -- Resolution: Won't Fix No progress on this in 5 years. We tend to unify things in the main hbase script or the hbase shell, and in some cases, like region_mover.rb, we ended up creating better tooling. > Add Util folder for Utility Scripts > --- > > Key: HBASE-3991 > URL: https://issues.apache.org/jira/browse/HBASE-3991 > Project: HBase > Issue Type: Brainstorming > Components: scripts, util >Affects Versions: 0.92.0 >Reporter: Nicolas Spiegelberg >Assignee: Nicolas Spiegelberg > > This JIRA is to start discussion around adding some sort of 'util' folder to > HBase for common operational scripts. We're starting to write a lot of HBase > analysis utilities that we'd love to share with open source, but don't want > to clutter the 'bin' folder, which seems like it should be reserved for > start/stop tasks. If we add a 'util' folder, how do we keep it from becoming > a cesspool of half-baked & duplicated operational hacks? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3975) NoServerForRegionException stalls write pipeline
[ https://issues.apache.org/jira/browse/HBASE-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3975. -- Resolution: Fixed The new async client is taking care of this. > NoServerForRegionException stalls write pipeline > > > Key: HBASE-3975 > URL: https://issues.apache.org/jira/browse/HBASE-3975 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.89.20100924, 0.90.3, 0.92.0 >Reporter: Nicolas Spiegelberg >Assignee: Nicolas Spiegelberg > > When we process a batch of puts, the current algorithm basically goes like > this: > 1. Find all servers for the Put requests > 2. Partition Puts by servers > 3. Make requests > 4. Collect success/error results > If we throw an IOE in step 1 or 2, we will abort the whole batch operation. > In our case, this was an NoServerForRegionException due to region > rebalancing. However, the asynchronous put case normally has requests going > to a wide variety of servers. We should fail all the put requests that throw > an IOE in Step 1 but continue to try all the put requests that succeed at > this stage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
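The proposed behavior (fail only the puts whose server lookup throws, and keep going with the rest) can be sketched as below. The types are simplified stand-ins: puts are row keys, servers are strings, and NoServerForRegionException is modeled as a RuntimeException.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PartialBatch {
    record Outcome(Map<String, List<String>> routed, List<String> failed) {}

    // Steps 1-2 of the batch algorithm: locate a server for each put and
    // partition the puts by server. A put whose lookup throws is recorded
    // as an individual failure instead of aborting the whole batch.
    static Outcome partition(List<String> puts, Function<String, String> locate) {
        Map<String, List<String>> routed = new HashMap<>();
        List<String> failed = new ArrayList<>();
        for (String put : puts) {
            try {
                String server = locate.apply(put);
                routed.computeIfAbsent(server, s -> new ArrayList<>()).add(put);
            } catch (RuntimeException e) { // stand-in for NoServerForRegionException
                failed.add(put);
            }
        }
        return new Outcome(routed, failed);
    }

    public static void main(String[] args) {
        Outcome o = partition(List.of("row1", "row2", "rowX"), put -> {
            if (put.equals("rowX")) throw new RuntimeException("no server for region");
            return "rs1";
        });
        System.out.println(o.routed()); // {rs1=[row1, row2]}
        System.out.println(o.failed()); // [rowX]
    }
}
```

The routed requests then proceed to steps 3-4 (make requests, collect per-request results), so a rebalanced region only fails its own puts.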
[jira] [Resolved] (HBASE-3854) [thrift] broken thrift examples
[ https://issues.apache.org/jira/browse/HBASE-3854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3854. -- Resolution: Later Resolving as later for now. We should fix coverage of the hbase-examples module. At least the code generation for PHP, Perl and others seems to work. > [thrift] broken thrift examples > --- > > Key: HBASE-3854 > URL: https://issues.apache.org/jira/browse/HBASE-3854 > Project: HBase > Issue Type: Bug > Components: Thrift >Affects Versions: 0.20.0 >Reporter: Alexey Diomin >Priority: Minor > > We introduced the NotFound exception in HBASE-1292, but dropped it in HBASE-1367. > As a result: > 1. incorrect docs in Hbase.thrift and, consequently, in the generated Java and javadoc > 2. broken examples in src/examples/thrift/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3792) TableInputFormat leaks ZK connections
[ https://issues.apache.org/jira/browse/HBASE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3792. -- Resolution: Won't Fix > TableInputFormat leaks ZK connections > - > > Key: HBASE-3792 > URL: https://issues.apache.org/jira/browse/HBASE-3792 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.90.1 > Environment: Java 1.6.0_24, Mac OS X 10.6.7 >Reporter: Bryan Keller > Attachments: patch0.90.4, tableinput.patch > > > The TableInputFormat creates an HTable using a new Configuration object, and > it never cleans it up. When running a Mapper, the TableInputFormat is > instantiated and the ZK connection is created. While this connection is not > explicitly cleaned up, the Mapper process eventually exits and thus the > connection is closed. Ideally the TableRecordReader would close the > connection in its close() method rather than relying on the process to die > for connection cleanup. This is fairly easy to implement by overriding > TableRecordReader, and also overriding TableInputFormat to specify the new > record reader. > The leak occurs when the JobClient is initializing and needs to retrieve the > splits. To get the splits, it instantiates a TableInputFormat. Doing so > creates a ZK connection that is never cleaned up. Unlike the mapper, however, > my job client process does not die. Thus the ZK connections accumulate. > I was able to fix the problem by writing my own TableInputFormat that does > not initialize the HTable in the getConf() method and does not have an HTable > member variable. Rather, it has a variable for the table name. The HTable is > instantiated where needed and then cleaned up. For example, in the > getSplits() method, I create the HTable, then close the connection once the > splits are retrieved. I also create the HTable when creating the record > reader, and I have a record reader that closes the connection when done. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
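The reporter's fix (hold only the table name, open the table where needed, and close it once the splits are retrieved) amounts to the try-with-resources pattern. A sketch with a hypothetical stand-in connection type, not the real HBase API:

```java
import java.util.List;

public class SplitsWithCleanup {
    // Stand-in for a ZK-backed HTable connection (hypothetical type);
    // the static counter lets us observe leaks.
    static class TableConnection implements AutoCloseable {
        static int open = 0;
        TableConnection() { open++; }
        List<String> getSplits() { return List.of("split-a", "split-b"); }
        @Override public void close() { open--; }
    }

    // Hold only the table name; open the connection per call and close it
    // as soon as the splits are computed, instead of caching a member
    // HTable (and its ZooKeeper connection) for the life of the JobClient.
    static List<String> getSplits(String tableName) {
        try (TableConnection conn = new TableConnection()) {
            return conn.getSplits();
        }
    }

    public static void main(String[] args) {
        List<String> splits = getSplits("mytable");
        System.out.println(splits);               // [split-a, split-b]
        System.out.println(TableConnection.open); // 0 (nothing leaked)
    }
}
```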
[jira] [Resolved] (HBASE-3791) Display total number of zookeeper connections on master.jsp
[ https://issues.apache.org/jira/browse/HBASE-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3791. -- Resolution: Fixed zk.jsp (ZKUtil.dump(); see HBASE-2692) in the Master UI already shows the total number of open ZK connections. > Display total number of zookeeper connections on master.jsp > --- > > Key: HBASE-3791 > URL: https://issues.apache.org/jira/browse/HBASE-3791 > Project: HBase > Issue Type: Improvement > Components: Zookeeper >Affects Versions: 0.90.2 >Reporter: Ted Yu > Attachments: 3791.patch > > > Quite often, a user needs to telnet to ZooKeeper and type 'stats' to get the > connections, or count the connections on zk.jsp. > We should display the total number of connections beside the link to zk.jsp > on master.jsp -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3786) Enhance MasterCoprocessorHost to include notification of balancing of each region
[ https://issues.apache.org/jira/browse/HBASE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3786. -- Resolution: Won't Fix HBASE-4552 was closed and, as [~apurtell] stated in HBASE-3529, with NGDATA's hbase-indexer we have some indexing functionality that relies on our replication infra. > Enhance MasterCoprocessorHost to include notification of balancing of each > region > - > > Key: HBASE-3786 > URL: https://issues.apache.org/jira/browse/HBASE-3786 > Project: HBase > Issue Type: Improvement > Components: Coprocessors >Affects Versions: 0.90.2 >Reporter: Ted Yu > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3782) Multi-Family support for bulk upload tools causes File Not Found Exception
[ https://issues.apache.org/jira/browse/HBASE-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3782. -- Resolution: Won't Fix Should be fixed by atomic bulk loading from HBASE-4552 > Multi-Family support for bulk upload tools causes File Not Found Exception > -- > > Key: HBASE-3782 > URL: https://issues.apache.org/jira/browse/HBASE-3782 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 0.90.3 >Reporter: Nichole Treadway > Attachments: HBASE-3782.patch > > > I've been testing HBASE-1861 in 0.90.2, which adds multi-family support for > bulk upload tools. > I found that when running the importtsv program, some reduce tasks fail with > a File Not Found exception if there are no keys in the input data which fall > into the region assigned to that reduce task. From what I can determine, it > seems that an output directory is created in the write() method and expected > to exist in the writeMetaData() method...if there are no keys to be written > for that reduce task, the write method is never called and the output > directory is never created, but writeMetaData is expecting the output > directory to exist...thus the FnF exception: > 2011-03-17 11:52:48,095 WARN org.apache.hadoop.mapred.TaskTracker: Error > running child > java.io.FileNotFoundException: File does not exist: > hdfs://master:9000/awardsData/_temporary/_attempt_201103151859_0066_r_00_0 > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:468) > at > org.apache.hadoop.hbase.regionserver.StoreFile.getUniqueFile(StoreFile.java:580) > at > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.writeMetaData(HFileOutputFormat.java:186) > at > org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.close(HFileOutputFormat.java:247) > at > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) > at 
org.apache.hadoop.mapred.Child.main(Child.java:170) > Simply checking if the file exists should fix the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
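The suggested one-line fix (check that the output directory exists before writing metadata) can be sketched as below, using java.nio in place of Hadoop's FileSystem API; all names are illustrative:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class SafeMetaWrite {
    // writeMetaData() assumed the reducer's output directory was created
    // by write(); an empty reduce task never calls write(), so the
    // directory is missing and getFileStatus() throws FileNotFoundException.
    // Guarding on existence avoids the failure for empty reduce tasks.
    static boolean writeMetaData(Path outputDir) {
        if (!Files.exists(outputDir)) {
            return false; // nothing was written by this task: skip metadata
        }
        // ... enumerate files under outputDir and write metadata ...
        return true;
    }

    public static void main(String[] args) {
        Path missing = Path.of("no-such-output-dir");
        System.out.println(writeMetaData(missing)); // false, no exception
    }
}
```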
[jira] [Resolved] (HBASE-3778) HBaseAdmin.create doesn't create empty boundary keys
[ https://issues.apache.org/jira/browse/HBASE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3778. -- Resolution: Duplicate > HBaseAdmin.create doesn't create empty boundary keys > > > Key: HBASE-3778 > URL: https://issues.apache.org/jira/browse/HBASE-3778 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.2 >Reporter: Ted Dunning > Attachments: HBASE-3778.patch > > > In my ycsb stuff, I have code that looks like this: > {code} > String startKey = "user102000"; > String endKey = "user94000"; > admin.createTable(descriptor, startKey.getBytes(), endKey.getBytes(), > regions); > {code} > The result, however, is a table where the first and last regions have defined > first and last keys rather than empty keys. > The patch I am about to attach fixes this, I think. I have some worries > about other uses of Bytes.split, however, and would like some eyes on this > patch. Perhaps we need a new dialect of split. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3725) HBase increments from old value after delete and write to disk
[ https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3725. -- Resolution: Resolved Resolving per last comment from [~larsh] > HBase increments from old value after delete and write to disk > -- > > Key: HBASE-3725 > URL: https://issues.apache.org/jira/browse/HBASE-3725 > Project: HBase > Issue Type: Bug > Components: io, regionserver >Affects Versions: 0.90.1 >Reporter: Nathaniel Cook >Assignee: ShiXing > Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, > HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, > HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, > HBASE-3725.patch > > > Deleted row values are sometimes used for starting points on new increments. > To reproduce: > Create a row "r". Set column "x" to some default value. > Force hbase to write that value to the file system (such as restarting the > cluster). > Delete the row. > Call table.incrementColumnValue with "some_value" > Get the row. > The returned value in the column was incremented from the old value before > the row was deleted instead of being initialized to "some_value". 
> Code to reproduce: > {code} > import java.io.IOException; > import org.apache.hadoop.conf.Configuration; > import org.apache.hadoop.hbase.HBaseConfiguration; > import org.apache.hadoop.hbase.HColumnDescriptor; > import org.apache.hadoop.hbase.HTableDescriptor; > import org.apache.hadoop.hbase.client.Delete; > import org.apache.hadoop.hbase.client.Get; > import org.apache.hadoop.hbase.client.HBaseAdmin; > import org.apache.hadoop.hbase.client.HTableInterface; > import org.apache.hadoop.hbase.client.HTablePool; > import org.apache.hadoop.hbase.client.Increment; > import org.apache.hadoop.hbase.client.Result; > import org.apache.hadoop.hbase.util.Bytes; > public class HBaseTestIncrement > { > static String tableName = "testIncrement"; > static byte[] infoCF = Bytes.toBytes("info"); > static byte[] rowKey = Bytes.toBytes("test-rowKey"); > static byte[] newInc = Bytes.toBytes("new"); > static byte[] oldInc = Bytes.toBytes("old"); > /** >* This code reproduces a bug with increment column values in hbase >* Usage: First run part one by passing '1' as the first arg >*Then restart the hbase cluster so it writes everything to disk >*Run part two by passing '2' as the first arg >* >* This will result in the old deleted data being found and used for > the increment calls >* >* @param args >* @throws IOException >*/ > public static void main(String[] args) throws IOException > { > if("1".equals(args[0])) > partOne(); > if("2".equals(args[0])) > partTwo(); > if ("both".equals(args[0])) > { > partOne(); > partTwo(); > } > } > /** >* Creates a table and increments a column value 10 times by 10 each > time. 
>* Results in a value of 100 for the column >* >* @throws IOException >*/ > static void partOne()throws IOException > { > Configuration conf = HBaseConfiguration.create(); > HBaseAdmin admin = new HBaseAdmin(conf); > HTableDescriptor tableDesc = new HTableDescriptor(tableName); > tableDesc.addFamily(new HColumnDescriptor(infoCF)); > if(admin.tableExists(tableName)) > { > admin.disableTable(tableName); > admin.deleteTable(tableName); > } > admin.createTable(tableDesc); > HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE); > HTableInterface table = pool.getTable(Bytes.toBytes(tableName)); > //Increment unitialized column > for (int j = 0; j < 10; j++) > { > table.incrementColumnValue(rowKey, infoCF, oldInc, > (long)10); > Increment inc = new Increment(rowKey); > inc.addColumn(infoCF, newInc, (long)10); > table.increment(inc); > } > Get get = new Get(rowKey); > Result r = table.get(get); > System.out.println("initial values: new " + > Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + > Bytes.toLong(r.getValue(infoCF, oldInc))); > } > /** >* First deletes the data then increments the column 10 times by 1 each > time >* >* Should result in
[jira] [Resolved] (HBASE-3432) [hbck] Add "remove table" switch
[ https://issues.apache.org/jira/browse/HBASE-3432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3432. -- Resolution: Won't Fix closing as stale. Not seen in a long time. > [hbck] Add "remove table" switch > > > Key: HBASE-3432 > URL: https://issues.apache.org/jira/browse/HBASE-3432 > Project: HBase > Issue Type: New Feature > Components: util >Affects Versions: 0.89.20100924 >Reporter: Lars George >Priority: Minor > > This happened before and I am not sure how the new Master improves on it > (this stuff is only available between the lines or buried in some people's > heads - one other thing I wish for is a better place to communicate what > each patch improves). Just so we do not miss it, there is an issue that > sometimes disabling large tables simply times out and the table gets stuck in > limbo. > From the CDH User list: > {quote} > On Fri, Jan 7, 2011 at 1:57 PM, Sean Sechrist wrote: > To get them out of META, you can just scan '.META.' for that table name, and > delete those rows. We had to do that a few months ago. > -Sean > That did it. For the benefit of others, here's the code. Beware the literal > table names, run at your own peril. 
> {quote} > {code} > import java.io.IOException; > import org.apache.hadoop.conf.Configuration; > import org.apache.hadoop.hbase.HBaseConfiguration; > import org.apache.hadoop.hbase.client.HTable; > import org.apache.hadoop.hbase.client.Delete; > import org.apache.hadoop.hbase.client.Result; > import org.apache.hadoop.hbase.client.MetaScanner; > import org.apache.hadoop.hbase.util.Bytes; > public class CleanFromMeta { > public static class Cleaner implements MetaScanner.MetaScannerVisitor { > public HTable meta = null; > public Cleaner(Configuration conf) throws IOException { > meta = new HTable(conf, ".META."); > } > public boolean processRow(Result rowResult) throws IOException { > String r = new String(rowResult.getRow()); > if (r.startsWith("webtable,")) { > meta.delete(new Delete(rowResult.getRow())); > System.out.println("Deleting row " + rowResult); > } > return true; > } > } > public static void main(String[] args) throws Exception { > String tname = ".META."; > Configuration conf = HBaseConfiguration.create(); > MetaScanner.metaScan(conf, new Cleaner(conf), > Bytes.toBytes("webtable")); > } > } > {code} > I suggest to move this into HBaseFsck. I do not like personally to have these > JRuby scripts floating around that may or may not help. This should be > available if a user gets stuck and knows what he is doing (they can delete > from .META. anyways). Maybe a "\-\-disable-table \-\-force" or > so? But since disable is already in the shell we could add an "\-\-force" > there? Or add a "\-\-delete-table " to the hbck? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-3307) Add checkAndPut to the Thrift API
[ https://issues.apache.org/jira/browse/HBASE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-3307. -- Resolution: Duplicate dup of HBASE-10960 > Add checkAndPut to the Thrift API > - > > Key: HBASE-3307 > URL: https://issues.apache.org/jira/browse/HBASE-3307 > Project: HBase > Issue Type: Improvement > Components: Thrift >Affects Versions: 0.89.20100924 >Reporter: Chris Tarnas >Priority: Minor > Labels: thrift > > It would be very useful to have the checkAndPut method available via the > Thrift API. This would both allow for easier atomic updates as well as cut > down on at least one Thrift roundtrip for quite a few common tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
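Since the thread above only names the method, here is a minimal, hypothetical Python model of the semantics checkAndPut would expose: compare a cell's current value and apply the put only on a match, atomically per row. `ToyTable` and its lock are illustrative stand-ins, not HBase or Thrift code.

```python
# Illustrative model of HBase checkAndPut semantics (not HBase code):
# atomically compare a cell's current value and apply the put only on a match.
import threading

class ToyTable:
    """In-memory stand-in for a single HBase table."""
    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()  # models the per-row atomicity HBase provides

    def check_and_put(self, row, column, expected, value):
        """Put `value` only if the cell currently equals `expected`.

        `expected=None` means "only put if the cell is absent".
        Returns True when the put was applied.
        """
        with self._lock:
            if self._cells.get((row, column)) != expected:
                return False
            self._cells[(row, column)] = value
            return True

    def get(self, row, column):
        return self._cells.get((row, column))
```

A single Thrift round trip doing compare-and-set server-side is what saves the extra get/put round trips mentioned above.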
[jira] [Resolved] (HBASE-2535) split hostname format should be consistent with tasktracker for locality
[ https://issues.apache.org/jira/browse/HBASE-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-2535. -- Resolution: Duplicate resolved by HBASE-7693 > split hostname format should be consistent with tasktracker for locality > > > Key: HBASE-2535 > URL: https://issues.apache.org/jira/browse/HBASE-2535 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 0.20.4 >Reporter: John Sichi > > I was running a mapreduce job (via Hive) against HBase, and noticed that I > wasn't getting any locality (the input split location and the task tracker > machine in the job tracker UI were always different, and "Rack-local map > tasks" in the job counters was 0). > I tracked this down to a discrepancy in the way hostnames were being compared. > The task tracker detail had a Host like > /f/s/1.2.3.4/h.s.f.com. > (with trailing dot) > But the Input Split Location had > /f/s/1.2.3.4/h.s.f.com > (without trailing dot) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
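The mismatch boils down to comparing host strings that differ only by the DNS root dot. A small sketch of the kind of normalization that makes the split location and the task tracker host compare equal (`normalize_host` is a made-up helper for illustration, not the actual fix in HBASE-7693):

```python
def normalize_host(location):
    """Strip the trailing root-dot from a FQDN so rack paths compare equal.

    A rack-annotated location like '/f/s/1.2.3.4/h.s.f.com.' and its
    undotted twin differ only by the DNS root dot, which defeats the
    plain string comparison used for locality matching.
    """
    return location.rstrip(".").lower()
```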
[jira] [Resolved] (HBASE-2434) Add scanner caching option to Export and write buffer option for Import
[ https://issues.apache.org/jira/browse/HBASE-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-2434. -- Resolution: Won't Fix No longer relevant, superseded by the buffered mutator and [~yangzhe1991]'s rationalization on sizing and timing scanners. > Add scanner caching option to Export and write buffer option for Import > --- > > Key: HBASE-2434 > URL: https://issues.apache.org/jira/browse/HBASE-2434 > Project: HBase > Issue Type: Improvement > Components: util >Affects Versions: 0.20.3 >Reporter: Ted Yu > Original Estimate: 1h > Remaining Estimate: 1h > > An option of number of rows to fetch every time we hit a region server should > be added to mapreduce.Export so that createSubmittableJob() calls > s.setCaching() with the specified value. > Also, an option of write buffer size should be added to mapreduce.Import so > that we can set write buffer. Sample calls: > +table.setAutoFlush(false); > +table.setWriteBufferSize(desired_buffer_size); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
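To illustrate what the proposed Import write-buffer option buys, here is a rough Python model of client-side write buffering: mutations accumulate until their byte size crosses the threshold, then flush to the server as one batch. `ToyWriteBuffer` is a made-up name; the real behavior lives in the old HBase client behind `setAutoFlush(false)` / `setWriteBufferSize(...)`.

```python
class ToyWriteBuffer:
    """Accumulate mutations and flush once the buffered bytes exceed a
    threshold, mimicking what table.setWriteBufferSize(...) enables."""
    def __init__(self, buffer_size_bytes, sink):
        self.buffer_size = buffer_size_bytes
        self.sink = sink          # callable receiving a list of mutations
        self.pending = []
        self.pending_bytes = 0

    def put(self, mutation_bytes):
        self.pending.append(mutation_bytes)
        self.pending_bytes += len(mutation_bytes)
        if self.pending_bytes >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0

batches = []
buf = ToyWriteBuffer(10, batches.append)
for _ in range(5):
    buf.put(b"xxxx")   # 4 bytes each; the first flush fires on the 3rd put (12 >= 10)
buf.flush()            # drain the remainder
```

Fewer, larger batches is exactly why a bigger write buffer speeds up Import; the scanner-caching option plays the symmetric role on the Export side.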
[jira] [Resolved] (HBASE-2376) Add special SnapshotScanner which presents view of all data at some time in the past
[ https://issues.apache.org/jira/browse/HBASE-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-2376. -- Resolution: Later Equivalent functionality can be achieved by using HBASE-4536 and HBASE-4071; if you think this is still necessary please re-open. > Add special SnapshotScanner which presents view of all data at some time in > the past > > > Key: HBASE-2376 > URL: https://issues.apache.org/jira/browse/HBASE-2376 > Project: HBase > Issue Type: New Feature > Components: Client, regionserver >Affects Versions: 0.20.3 >Reporter: Jonathan Gray >Assignee: Pritam Damania > > In order to support a particular kind of database "snapshot" feature which > doesn't require copying data, we came up with the idea for a special > SnapshotScanner that would present a view of your data at some point in the > past. The primary use case for this would be to be able to recover > particular data/rows (but not all data, like a global rollback) should they > have somehow been messed up (application fault, application bug, user error, > etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
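The point-in-time view described above can be sketched as version filtering: among a cell's retained versions, the newest one at or before the snapshot timestamp wins, and a delete marker hides the value. A minimal Python illustration, assuming versions and delete markers are retained (which is what HBASE-4536/HBASE-4071 make possible); `value_as_of` is a hypothetical helper, not HBase code:

```python
def value_as_of(versions, as_of_ts):
    """Return the live value of a cell as seen at time `as_of_ts`.

    `versions` is a list of (timestamp, value) tuples where value=None
    marks a delete marker; the newest version at or before `as_of_ts` wins.
    """
    visible = [(ts, v) for ts, v in versions if ts <= as_of_ts]
    if not visible:
        return None
    return max(visible, key=lambda tv: tv[0])[1]

history = [(100, b"a"), (200, b"b"), (300, None)]  # cell deleted at ts=300
```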
[jira] [Resolved] (HBASE-2213) HCD should only have those fields explicitly set by user while creating tables
[ https://issues.apache.org/jira/browse/HBASE-2213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-2213. -- Resolution: Won't Fix Stale, re-open if you consider this needs to be implemented. > HCD should only have those fields explicitly set by user while creating tables > -- > > Key: HBASE-2213 > URL: https://issues.apache.org/jira/browse/HBASE-2213 > Project: HBase > Issue Type: Bug >Affects Versions: 0.20.3 >Reporter: ryan rawson > > right now we take the default HCD fields and 'snapshot' them into every HCD. > So things like 'BLOCKCACHE' and 'FILESIZE' are in every table, even if they > don't differ from the defaults. If the default changes in a > meanful/important way, the user is left with the unenviable task of (a) > determining this happened and (b) actually going through and > disabling/altering the tables to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17058) Lower epsilon used for jitter verification from HBASE-15324
Esteban Gutierrez created HBASE-17058: - Summary: Lower epsilon used for jitter verification from HBASE-15324 Key: HBASE-17058 URL: https://issues.apache.org/jira/browse/HBASE-17058 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 1.2.4, 1.1.7, 2.0.0, 1.3.0, 1.4.0 Reporter: Esteban Gutierrez The current epsilon used is 1E-6 and it's too big: it might overflow the desiredMaxFileSize. A trivial fix is to lower the epsilon to 2^-52 or even 2^-53. An option to consider too is just to shift the jitter to always decrement hbase.hregion.max.filesize (MAX_FILESIZE) instead of increasing the size of the region and having to deal with the round-off. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
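A quick demonstration of why 1E-6 is too coarse, using plain Python floats (the same IEEE-754 64-bit doubles Java uses). The suggested 2^-52 is exactly the machine epsilon, while a 1E-6 relative cushion applied near Long.MAX_VALUE already exceeds the signed 64-bit range:

```python
LONG_MAX = 2**63 - 1

# Machine epsilon for a 64-bit double is 2**-52: the smallest eps with 1+eps != 1.
assert 1.0 + 2**-52 != 1.0
assert 1.0 + 2**-53 == 1.0   # below machine epsilon: lost to rounding

# A 1E-6 relative cushion near Long.MAX_VALUE pushes past the signed
# 64-bit range, so a jitter-adjusted desiredMaxFileSize can overflow;
# a 2**-52 cushion only moves the double by about one ULP.
assert LONG_MAX * (1 + 1e-6) > LONG_MAX
```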
[jira] [Created] (HBASE-17007) Move ZooKeeper logging to its own log file
Esteban Gutierrez created HBASE-17007: - Summary: Move ZooKeeper logging to its own log file Key: HBASE-17007 URL: https://issues.apache.org/jira/browse/HBASE-17007 Project: HBase Issue Type: Bug Components: Zookeeper Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Trivial ZooKeeper logging can be too verbose. Lets move ZooKeeper logging to a different log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16774) [shell] Add coverage to TestShell when ZooKeeper is not reachable
Esteban Gutierrez created HBASE-16774: - Summary: [shell] Add coverage to TestShell when ZooKeeper is not reachable Key: HBASE-16774 URL: https://issues.apache.org/jira/browse/HBASE-16774 Project: HBase Issue Type: Improvement Components: shell, test Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez While testing a couple of things in master I noticed that after some of the changes done in HBASE-16117 the hbase shell would die when there is no ZooKeeper server up or if we get another ZK exception. This is to add coverage to test the shell when ZK is not up or if we get another exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16450) Shell tool to dump replication queues
Esteban Gutierrez created HBASE-16450: - Summary: Shell tool to dump replication queues Key: HBASE-16450 URL: https://issues.apache.org/jira/browse/HBASE-16450 Project: HBase Issue Type: Improvement Components: Replication Affects Versions: 1.2.2, 1.1.5, 2.0.0, 1.3.0 Reporter: Esteban Gutierrez Currently there is no way to dump list of the configured queues and the replication queues when replication is enabled. Unfortunately the HBase master only offers an option to dump the whole content of the znodes but not details on the queues being processed on each RS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16379) [replication] Minor improvement to replication/copy_tables_desc.rb
Esteban Gutierrez created HBASE-16379: - Summary: [replication] Minor improvement to replication/copy_tables_desc.rb Key: HBASE-16379 URL: https://issues.apache.org/jira/browse/HBASE-16379 Project: HBase Issue Type: Improvement Components: Replication, shell Affects Versions: 1.2.2, 1.3.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Trivial copy_tables_desc.rb is helpful for quickly setting up a table remotely based on an existing schema. However, by default it copies all tables. Now you can pass a list of tables as an optional third argument, and it will also display which table descriptors were copied. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15612) Minor doc improvements to CellCounter and RowCounter documentation
Esteban Gutierrez created HBASE-15612: - Summary: Minor doc improvements to CellCounter and RowCounter documentation Key: HBASE-15612 URL: https://issues.apache.org/jira/browse/HBASE-15612 Project: HBase Issue Type: Improvement Components: documentation, mapreduce Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Trivial Both Javadoc and the HBase Book need to reflect that it is possible to specify an optional time range in the command line arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15511) ClusterStatus should be able
Esteban Gutierrez created HBASE-15511: - Summary: ClusterStatus should be able Key: HBASE-15511 URL: https://issues.apache.org/jira/browse/HBASE-15511 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15489) Improve handling of hbase.rpc.protection configuration mismatch when using replication
Esteban Gutierrez created HBASE-15489: - Summary: Improve handling of hbase.rpc.protection configuration mismatch when using replication Key: HBASE-15489 URL: https://issues.apache.org/jira/browse/HBASE-15489 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez This probably should be a sub-task of a major revamp of how we report our replication metrics in the HBase UI. After switching {{hbase.rpc.protection}} to {{privacy}} in one cluster I didn't immediately notice there was a mismatch across my other clusters, which caused replication to stop. Ideally if this happens we should have a better log message and show this mismatch in the RegionServer and Master UIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14952) hbase-assembly has hbase-external-blockcache missing
Esteban Gutierrez created HBASE-14952: - Summary: hbase-assembly has hbase-external-blockcache missing Key: HBASE-14952 URL: https://issues.apache.org/jira/browse/HBASE-14952 Project: HBase Issue Type: Bug Components: build, dependencies Affects Versions: 1.2.0 Reporter: Esteban Gutierrez Assignee: Sean Busbey Priority: Blocker After generating a tarball we noticed that hbase-external-blockcache was missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14500) Remove load of deprecated MOB ruby scripts after HBASE-14227
Esteban Gutierrez created HBASE-14500: - Summary: Remove load of deprecated MOB ruby scripts after HBASE-14227 Key: HBASE-14500 URL: https://issues.apache.org/jira/browse/HBASE-14500 Project: HBase Issue Type: Bug Components: shell Affects Versions: 2.0.0 Reporter: Esteban Gutierrez -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14405) region_mover.rb should verify the location of the region that is being scanned by isSuccessfulScan()
Esteban Gutierrez created HBASE-14405: - Summary: region_mover.rb should verify the location of the region that is being scanned by isSuccessfulScan() Key: HBASE-14405 URL: https://issues.apache.org/jira/browse/HBASE-14405 Project: HBase Issue Type: Bug Affects Versions: 2.0.0, 1.2.1, 1.0.3, 1.1.3, 0.98.16 Reporter: Esteban Gutierrez When we do isSuccessfulScan() to verify if the region can be scanned or not we never verify if the scanner is returning a result from the expected RegionServer, e.g. if unloading the regions from a RS, the scanner should return from the source and after moving the region the scanner should come from a different RS and if loading the regions we should verify in a similar way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14358) Parent region is not removed from regionstates after a successful split
Esteban Gutierrez created HBASE-14358: - Summary: Parent region is not removed from regionstates after a successful split Key: HBASE-14358 URL: https://issues.apache.org/jira/browse/HBASE-14358 Project: HBase Issue Type: Bug Affects Versions: 1.2.1, 1.0.3, 1.1.3 Reporter: Esteban Gutierrez Priority: Critical Ran into this while trying to find out why region_mover.rb was not catching an exception after a region was split. Digging further I found that the problem is happening in the handling of the region state in the Master since we don't remove the old state after the split is successful: {code} 2015-09-03 02:56:49,255 INFO org.apache.hadoop.hbase.master.AssignmentManager: Ignored moving region not assigned: {ENCODED => 9a4930ed41dc7013d9956240e6f5c03e, NAME => 'u,user3605,1432797255754.9a4930ed41dc7013d9956240e6f5c03e.', STARTKEY => 'user3605', ENDKEY => 'user3723'}, {9a4930ed41dc7013d9956240e6f5c03e state=SPLIT, ts=1441273152561, server=a2209.halxg.cloudera.com,22101,1441243232790} {code} I don't think the problem is happening in the master branch, but I've been able to confirm it is happening on branch-1 and branch-1.2 at least. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14354) Minor improvements for usage of the mlock agent
Esteban Gutierrez created HBASE-14354: - Summary: Minor improvements for usage of the mlock agent Key: HBASE-14354 URL: https://issues.apache.org/jira/browse/HBASE-14354 Project: HBase Issue Type: Bug Components: hbase, regionserver Reporter: Esteban Gutierrez Priority: Trivial 1. MLOCK_AGENT points to the wrong path in hbase-config.sh When the mlock agent is built, the binary is installed under $HBASE_HOME/lib/native and not under $HBASE_HOME/native 2. By default we pass $HBASE_REGIONSERVER_UID to the agent options, which causes the mlock agent to attempt to do a setuid in order to mlock the memory of the RS process. We should only pass that user if specified in the environment, not by default. (the agent currently handles that gracefully) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14347) Add a switch to DynamicClassLoader to be disabled and make that the default
Esteban Gutierrez created HBASE-14347: - Summary: Add a switch to DynamicClassLoader to be disabled and make that the default Key: HBASE-14347 URL: https://issues.apache.org/jira/browse/HBASE-14347 Project: HBase Issue Type: Bug Components: Client, defaults, regionserver Affects Versions: 2.0.0, 1.2.0, 1.1.2, 0.98.15, 1.0.3 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Since HBASE-1936 we have the option to load jars dynamically by default from HDFS or the local filesystem; however, hbase.dynamic.jars.dir points to a directory that could be world writable, which potentially opens a security problem on both the client side and the RS. We should consider having a switch to enable or disable this option, and it should be off by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14076) ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB
Esteban Gutierrez created HBASE-14076: - Summary: ResultSerialization and MutationSerialization can throw InvalidProtocolBufferException when serializing a cell larger than 64MB Key: HBASE-14076 URL: https://issues.apache.org/jira/browse/HBASE-14076 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez This was reported in CRUNCH-534 but it is a problem with how we handle deserialization of large Cells (> 64MB) in ResultSerialization and MutationSerialization. The fix is just re-using what was done in HBASE-13230. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14060) Add FuzzyRowFilter to ParseFilter
Esteban Gutierrez created HBASE-14060: - Summary: Add FuzzyRowFilter to ParseFilter Key: HBASE-14060 URL: https://issues.apache.org/jira/browse/HBASE-14060 Project: HBase Issue Type: Bug Components: Filters, Scanners, Usability Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez FuzzyRowFilter is not currently exposed in ParseFilter. I think it would be nice to have it there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14059) We should add a RS to the dead servers list if admin calls fail more than a threshold
Esteban Gutierrez created HBASE-14059: - Summary: We should add a RS to the dead servers list if admin calls fail more than a threshold Key: HBASE-14059 URL: https://issues.apache.org/jira/browse/HBASE-14059 Project: HBase Issue Type: Bug Components: master, regionserver, rpc Affects Versions: 0.98.13 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Critical I ran into this problem twice this week: calls from the HBase master to a RS can time out since the RS call queue size has been maxed out; however, since the RS is not dead (ephemeral znode still present) the master keeps attempting to perform admin tasks like trying to open or close a region, but those operations eventually fail after we run out of retries or the assignment manager attempts to re-assign to other RSs. As side effects of this I've noticed master operations being fully blocked, or RITs since we cannot close the region and open it in a new location while the RS is not considered dead. A potential solution for this is to add the RS to the list of dead RSs after a certain number of calls from the master to the RS fail. I've only noticed the problem in 0.98.x but it should be present in all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
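The proposed policy can be sketched as a per-server counter of consecutive failed admin calls; once it crosses a threshold, the RS is treated as dead even though its ephemeral znode is still present. `SuspectTracker` and the threshold of 3 are hypothetical, not HBase code:

```python
class SuspectTracker:
    """Mark a RegionServer dead after `threshold` consecutive failed admin
    calls, even if its ephemeral znode still exists (sketch of the proposal)."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}
        self.dead = set()

    def record_failure(self, server):
        self.failures[server] = self.failures.get(server, 0) + 1
        if self.failures[server] >= self.threshold:
            self.dead.add(server)

    def record_success(self, server):
        self.failures[server] = 0   # only consecutive failures count

    def is_dead(self, server):
        return server in self.dead
```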
[jira] [Created] (HBASE-13729) old hbase.regionserver.global.memstore.upperLimit is ignored if present
Esteban Gutierrez created HBASE-13729: - Summary: old hbase.regionserver.global.memstore.upperLimit is ignored if present Key: HBASE-13729 URL: https://issues.apache.org/jira/browse/HBASE-13729 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.1.0, 1.0.1, 2.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Critical If hbase.regionserver.global.memstore.upperLimit is present we should use it instead of hbase.regionserver.global.memstore.size. The current implementation of HeapMemorySizeUtil.getGlobalMemStorePercent() assumes that if hbase.regionserver.global.memstore.size is not defined then it should use the old configuration; however, it should be the other way around. This has a large impact, especially if doing a rolling upgrade of a cluster when the memstore upper limit has been changed from the default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13714) Add tracking of the total response queue size
Esteban Gutierrez created HBASE-13714: - Summary: Add tracking of the total response queue size Key: HBASE-13714 URL: https://issues.apache.org/jira/browse/HBASE-13714 Project: HBase Issue Type: Improvement Components: master, metrics, regionserver, rpc Affects Versions: 2.0.0, 1.0.2, 1.2.0 Reporter: Esteban Gutierrez I noticed this behavior while working on HBASE-13694: Once we are done processing a request, we decrement the call queue size on the RPC server. However, responses can be very large and sometimes sending them can take a long time. Since we don't keep track of the response queue via metrics, it is hard to spot when the responses are using too many resources on the RS. Ideally we should be tracking on the RS how much data we have in-flight in the response queue via metrics, and not just in the logs if the size of the response exceeds a threshold (e.g. hbase.ipc.warn.response.size or hbase.ipc.warn.response.time) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
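The proposed metric amounts to a gauge of in-flight response bytes: incremented when a response is queued for sending, decremented once it is fully sent, with a warn threshold comparable to hbase.ipc.warn.response.size. A single-threaded Python sketch (`ResponseQueueGauge` is a made-up name, not an HBase class):

```python
class ResponseQueueGauge:
    """Track bytes of responses queued for sending (sketch of the proposed
    metric; the real RPC server would update this under its own locking)."""
    def __init__(self):
        self.in_flight_bytes = 0

    def queued(self, response_size):
        self.in_flight_bytes += response_size

    def sent(self, response_size):
        self.in_flight_bytes -= response_size

    def over_threshold(self, warn_bytes):
        return self.in_flight_bytes > warn_bytes
```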
[jira] [Created] (HBASE-13694) CallQueueSize is incorrectly decremented after the response is sent
Esteban Gutierrez created HBASE-13694: - Summary: CallQueueSize is incorrectly decremented after the response is sent Key: HBASE-13694 URL: https://issues.apache.org/jira/browse/HBASE-13694 Project: HBase Issue Type: Bug Components: master, regionserver, rpc Affects Versions: 2.0.0, 1.1.0, 1.0.2, 1.2.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez We should decrement the CallQueueSize as soon as we no longer need the call around, e.g. after {{RpcServer.CurCall.set(null)}}; otherwise we will only be pushing back other client requests while we send the response back to the original caller. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
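A toy model of why the decrement timing matters: if the finished call's size is still counted against the queue limit while its (possibly slow) response drains to the client, a new same-sized call is rejected; decrementing as soon as processing ends admits it. Purely illustrative, not HBase code:

```python
def admitted_during_send(queue_limit, call_size, decrement_before_send):
    """Model whether a new call fits in the call queue while a finished
    call's response is still being written out to the client.

    decrement_before_send=True models the fix (decrement as soon as the
    call is processed); False models the buggy decrement-after-send.
    """
    occupied = 0 if decrement_before_send else call_size
    return occupied + call_size <= queue_limit
```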
[jira] [Created] (HBASE-13495) Create metrics for purged calls, abandoned calls and other RPC failures
Esteban Gutierrez created HBASE-13495: - Summary: Create metrics for purged calls, abandoned calls and other RPC failures Key: HBASE-13495 URL: https://issues.apache.org/jira/browse/HBASE-13495 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez Similar to HBASE-13477, this aims to add metrics to keep track of how many calls are abandoned, purged, or in other states before the call is executed. This would be helpful to track the rate of channel-closed exceptions when 100s or 1000s of clients disconnect or the calls are not correctly formed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13484) [docs] docs need to be specific about hbase.bucketcache.size range
Esteban Gutierrez created HBASE-13484: - Summary: [docs] docs need to be specific about hbase.bucketcache.size range Key: HBASE-13484 URL: https://issues.apache.org/jira/browse/HBASE-13484 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez This is not 100% clear for users: if hbase.bucketcache.size is between 0.0 and 1.0 then it is a fraction of the max heap. But if the value is above 1 then the value is expressed in MBs. From CacheConfig.getBucketCache(): {noformat} long bucketCacheSize = (long) (bucketCachePercentage < 1 ? mu.getMax() * bucketCachePercentage : bucketCachePercentage * 1024 * 1024); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
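Re-expressing that ternary in Python makes the two interpretations explicit (a sketch mirroring the quoted logic, not the actual CacheConfig code):

```python
def bucket_cache_bytes(configured, max_heap_bytes):
    """Mirror the CacheConfig.getBucketCache() ternary: values below 1.0 are
    a fraction of the max heap; values of 1 or more are megabytes
    (value * 1024 * 1024 bytes)."""
    if configured < 1:
        return int(max_heap_bytes * configured)
    return int(configured * 1024 * 1024)
```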
[jira] [Created] (HBASE-13483) [docs] onheap is not a valid bucket cache IO engine.
Esteban Gutierrez created HBASE-13483: - Summary: [docs] onheap is not a valid bucket cache IO engine. Key: HBASE-13483 URL: https://issues.apache.org/jira/browse/HBASE-13483 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 2.0.0 Reporter: Esteban Gutierrez From the HBase book: http://hbase.apache.org/book.html#hbase_default_configurations : {code} hbase.bucketcache.ioengine Description Where to store the contents of the bucketcache. One of: *onheap*, offheap, or file. If a file, set it to file:PATH_TO_FILE. See https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/io/hfile/CacheConfig.html for more information. {code} Instead of onheap it should be heap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-13461) RegionSever Hlog flush BLOCKED on hbase-0.96.2-hadoop2
[ https://issues.apache.org/jira/browse/HBASE-13461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-13461. --- Resolution: Invalid RegionSever Hlog flush BLOCKED on hbase-0.96.2-hadoop2 Key: HBASE-13461 URL: https://issues.apache.org/jira/browse/HBASE-13461 Project: HBase Issue Type: Bug Affects Versions: 0.96.2 Environment: hbase-0.96.2-hadoop2 hadoop2.2.0 Reporter: zhangjg I try to dump thread stack below: RpcServer.handler=63,port=60020 daemon prio=10 tid=0x7fdcddc5d000 nid=0x5f9 waiting for monitor entry [0x7fd289194000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:98) - waiting to lock 0x7fd36c023728 (a org.apache.hadoop.hdfs.DFSOutputStream) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:59) at java.io.DataOutputStream.write(DataOutputStream.java:90) - locked 0x7fd510cfdc28 (a org.apache.hadoop.hdfs.client.HdfsDataOutputStream) at com.google.protobuf.CodedOutputStream.refreshBuffer(CodedOutputStream.java:833) at com.google.protobuf.CodedOutputStream.flush(CodedOutputStream.java:843) at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:91) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.append(ProtobufLogWriter.java:87) at org.apache.hadoop.hbase.regionserver.wal.FSHLog$LogSyncer.hlogFlush(FSHLog.java:1026) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1075) - locked 0x7fd2d9bbfad0 (a java.lang.Object) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1240) at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:5593) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2315) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2028) at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4094) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3380) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3284) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26935) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2185) at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1889) RpcServer.handler=12,port=60020 daemon prio=10 tid=0x7fdcddf2c800 nid=0x5c6 in Object.wait() [0x7fd28c4c7000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1803) - locked 0x7fd45857c540 (a java.util.LinkedList) at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1697) at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1590) at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1575) at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:121) at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:135) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1098) at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1240) at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:5593) at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2315) at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2028) at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4094) at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3380) at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3284) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26935) at 
org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2185) at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1889) RpcServer.handler=11,port=60020 daemon prio=10 tid=0x7fdcdd9e1000 nid=0x5c5 in Object.wait() [0x7fd28c5c8000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at
[jira] [Created] (HBASE-13403) Make waitOnSafeMode configurable in MasterFileSystem
Esteban Gutierrez created HBASE-13403: - Summary: Make waitOnSafeMode configurable in MasterFileSystem Key: HBASE-13403 URL: https://issues.apache.org/jira/browse/HBASE-13403 Project: HBase Issue Type: Bug Components: master Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Minor We currently wait for whatever is the configured value of hbase.server.thread.wakefrequency, or the default 10 seconds. We should have a configuration to control how long we wait until HDFS is no longer in safe mode, since using the existing hbase.server.thread.wakefrequency property to tune that can have adverse side effects. My proposal is to add a new property called hbase.master.waitonsafemode and start with the current default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13407) Add a configurable jitter to MemStoreFlusher#FlushHandler in order to smooth write latency
Esteban Gutierrez created HBASE-13407: - Summary: Add a configurable jitter to MemStoreFlusher#FlushHandler in order to smooth write latency Key: HBASE-13407 URL: https://issues.apache.org/jira/browse/HBASE-13407 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez There is a very interesting behavior that I can reproduce consistently with many workloads from HBase 0.98 to HBase 1.0 since hbase.hstore.flusher.count was set by default to 2: when writes are evenly distributed across regions, memstores grow and flush at about the same rate, causing spikes in IO and CPU. The side effect of those spikes is loss in throughput, which in some cases can be above 10%, impacting write metrics. When the flushes get out of sync the spikes lower and throughput is very stable. Reverting hbase.hstore.flusher.count to 1 doesn't help too much with write heavy workloads since we end up with a large flush queue that eventually can block writers. Adding a small configurable jitter, hbase.server.thread.wakefrequency.jitter.pct (a percentage of the hbase.server.thread.wakefrequency frequency), can help to stagger the writes from FlushHandler to HDFS and smooth the write latencies when the memstores are flushed in multiple threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
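The proposal can be sketched as each flusher thread sleeping the base wake frequency plus a random offset of up to the jitter percentage of it, so concurrent flushers drift apart instead of flushing in lockstep. `jittered_wake_ms` is a hypothetical helper illustrating the idea, not the patch:

```python
import random

def jittered_wake_ms(wakefrequency_ms, jitter_pct, rng=random.random):
    """Sleep interval for a flusher thread: the base wake frequency plus a
    random offset of up to jitter_pct of it, so multiple flusher threads
    stagger their flushes instead of spiking IO/CPU together."""
    return wakefrequency_ms + wakefrequency_ms * jitter_pct * rng()

# With the default 10s wake frequency and a 10% jitter, intervals land
# in [10000, 11000) ms and differ across threads and wakeups.
intervals = [jittered_wake_ms(10000, 0.1) for _ in range(100)]
```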
[jira] [Resolved] (HBASE-13392) Hbase master dyeing out in 1.0 distributed mode with hadoop 2.6 HA
[ https://issues.apache.org/jira/browse/HBASE-13392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez resolved HBASE-13392. --- Resolution: Invalid Hbase master dyeing out in 1.0 distributed mode with hadoop 2.6 HA -- Key: HBASE-13392 URL: https://issues.apache.org/jira/browse/HBASE-13392 Project: HBase Issue Type: Brainstorming Components: hadoop2, hbase Affects Versions: 1.0.0 Environment: rhel 6 64bit Reporter: sridhararao mutluri Priority: Minor HBASE master is dyeing out speedily in cluster mode with this error: My HMASTER is dyeing out speedily as the hmaster log shows the below error: 2015-04-02 03:43:43,588 FATAL [vxa1:16020.activeMasterManager] master.HMaster: Failed to become active master java.net.ConnectException: Call From vxa1.cloud.com/10.1.178.86 to vxa1.cloud.com:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused Please suggest any solution. The hadoop 2.6 HA core-site.xml fs default shows the cluster name only, whereas hbase-site.xml shows hostname:9000 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13266) test-patch.sh can return false positives for zombie tests from tests running on the same host
Esteban Gutierrez created HBASE-13266: - Summary: test-patch.sh can return false positives for zombie tests from tests running on the same host Key: HBASE-13266 URL: https://issues.apache.org/jira/browse/HBASE-13266 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez Just saw this here https://builds.apache.org/job/PreCommit-HBASE-Build/13271//consoleFull {code}
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:27 h
[INFO] Finished at: 2015-03-16T23:58:30+00:00
[INFO] Final Memory: 93M/844M
[INFO]
Suspicious java process found - waiting 30s to see if there are just slow to stop
There are 1 zombie tests, they should have been killed by surefire but survived
BEGIN zombies jstack extract
2015-03-16 23:59:03
Full thread dump Java HotSpot(TM) Server VM (23.25-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0xaa400800 nid=0x17cc waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"IPC Client (47) connection to 0.0.0.0/0.0.0.0:4324 from jenkins" daemon prio=10 tid=0xa8d03400 nid=0x1759 in Object.wait() [0xa9c7d000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on 0xde1987c8 (a org.apache.hama.ipc.Client$Connection)
	at org.apache.hama.ipc.Client$Connection.waitForWork(Client.java:533)
	- locked 0xde1987c8 (a org.apache.hama.ipc.Client$Connection)
	at org.apache.hama.ipc.Client$Connection.run(Client.java:577)
...
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hama.bsp.TestBSPTaskFaults.tearDown(TestBSPTaskFaults.java:618)
	at junit.framework.TestCase.runBare(TestCase.java:140)
	at junit.framework.TestResult$1.protect(TestResult.java:110)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.framework.TestResult.run(TestResult.java:113)
	at junit.framework.TestCase.run(TestCase.java:124)
	at junit.framework.TestSuite.runTest(TestSuite.java:232)
	at junit.framework.TestSuite.run(TestSuite.java:227)
{code} The jstack above was taken from a Hama test running on the same host, not from an HBase test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13224) Fix minor formatting issue in AuthResult#toContextString
Esteban Gutierrez created HBASE-13224: - Summary: Fix minor formatting issue in AuthResult#toContextString Key: HBASE-13224 URL: https://issues.apache.org/jira/browse/HBASE-13224 Project: HBase Issue Type: Bug Components: Coprocessors, security Affects Versions: 1.0.0, 2.0.0 Reporter: Esteban Gutierrez Priority: Trivial Now that we handle namespace permissions, AuthResult#toContextString is not correctly formatted: {code} Access denied for user esteban; reason: Insufficient permissions; remote address: /10.20.30.1; request: createTable; context: (user=esteban@XXX, scope=defaultaction=CREATE) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13208) Patch build should match the patch file name and not the whole relative URL in findBranchNameFromPatchName
Esteban Gutierrez created HBASE-13208: - Summary: Patch build should match the patch file name and not the whole relative URL in findBranchNameFromPatchName Key: HBASE-13208 URL: https://issues.apache.org/jira/browse/HBASE-13208 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez Priority: Trivial In HBASE-1319 we saw that the patch got applied to the wrong branch. The problem is that findBranchNameFromPatchName matches a regex that contains wildcard symbols against the whole URL; in this case the regex is 0.94 and the relativePatchURL is /jira/secure/attachment/12703942/HBASE-13193-v4.patch, where 0394 is a match. Thanks to [~jonathan.lawlor] for reporting this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
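The mismatch is easy to demonstrate: because `.` in the branch regex matches any character, "0.94" finds a hit inside the attachment id of the full URL, but not inside the patch file name. A minimal sketch (the helper names are illustrative, not the actual script's):

```java
import java.util.regex.Pattern;

// Demonstrates why the branch regex must be tested against the patch file
// name only: "0.94" (where '.' matches any char) also matches "0394" in the
// attachment id of the relative URL.
public class BranchMatch {
    static boolean matchesAnywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    // Strip everything up to the last '/' to get just the patch file name.
    static String baseName(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }
}
```

Matching only the base name avoids the false positive while still matching genuinely branch-named patches like HBASE-12345-0.94.patch.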
[jira] [Created] (HBASE-13105) [hbck] Add option to reconstruct hbase:namespace if corrupt
Esteban Gutierrez created HBASE-13105: - Summary: [hbck] Add option to reconstruct hbase:namespace if corrupt Key: HBASE-13105 URL: https://issues.apache.org/jira/browse/HBASE-13105 Project: HBase Issue Type: Bug Reporter: Esteban Gutierrez If the HFile containing the namespaces gets corrupted, we don't have a way to gracefully fix it. hbck should handle this in a similar way to OfflineMetaRepair. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12983) HBase book mentions hadoop.ssl.enabled when it should be hbase.ssl.enabled
Esteban Gutierrez created HBASE-12983: - Summary: HBase book mentions hadoop.ssl.enabled when it should be hbase.ssl.enabled Key: HBASE-12983 URL: https://issues.apache.org/jira/browse/HBASE-12983 Project: HBase Issue Type: Bug Components: documentation Reporter: Esteban Gutierrez In the HBase book we say the following: {quote} A default HBase install uses insecure HTTP connections for web UIs for the master and region servers. To enable secure HTTP (HTTPS) connections instead, set *hadoop.ssl.enabled* to true in hbase-site.xml. This does not change the port used by the Web UI. To change the port for the web UI for a given HBase component, configure that port’s setting in hbase-site.xml. These settings are: {quote} The property should be *hbase.ssl.enabled* instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
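The corrected book text maps to this hbase-site.xml fragment:

```xml
<property>
  <!-- Enables HTTPS for the master and region server web UIs;
       this does not change the UI ports. -->
  <name>hbase.ssl.enabled</name>
  <value>true</value>
</property>
```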
[jira] [Created] (HBASE-12984) SSL cannot be used by the InfoPort in branch-1
Esteban Gutierrez created HBASE-12984: - Summary: SSL cannot be used by the InfoPort in branch-1 Key: HBASE-12984 URL: https://issues.apache.org/jira/browse/HBASE-12984 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 2.0.0, 1.1.0 Reporter: Esteban Gutierrez Priority: Blocker Setting {{hbase.ssl.enabled}} to {{true}} doesn't enable SSL on the InfoServer. Found that the problem is down in the InfoServer and HttpConfig, in how we set up the protocol in the HttpServer: {code}
for (URI ep : endpoints) {
  Connector listener = null;
  String scheme = ep.getScheme();
  if ("http".equals(scheme)) {
    listener = HttpServer.createDefaultChannelConnector();
  } else if ("https".equals(scheme)) {
    SslSocketConnector c = new SslSocketConnectorSecure();
    c.setNeedClientAuth(needsClientAuth);
    c.setKeyPassword(keyPassword);
{code} It depends on what endpoints have been added by the InfoServer: {code}
builder
  .setName(name)
  .addEndpoint(URI.create("http://" + bindAddress + ":" + port))
  .setAppDir(HBASE_APP_DIR).setFindPort(findPort).setConf(c);
{code} Basically we always use http and we don't check via HttpConfig whether {{hbase.ssl.enabled}} was set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12956) Binding to 0.0.0.0 is broken after HBASE-10569
Esteban Gutierrez created HBASE-12956: - Summary: Binding to 0.0.0.0 is broken after HBASE-10569 Key: HBASE-12956 URL: https://issues.apache.org/jira/browse/HBASE-12956 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Esteban Gutierrez After the Region Server and Master code was merged, we lost the ability to bind to 0.0.0.0 via hbase.regionserver.ipc.address, and znodes now get created with the wildcard address, which means clients cannot use them to reach the RSs and the master. Thanks to [~dimaspivak] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12950) Extend the truncate command to handle region ranges and not just the whole table
Esteban Gutierrez created HBASE-12950: - Summary: Extend the truncate command to handle region ranges and not just the whole table Key: HBASE-12950 URL: https://issues.apache.org/jira/browse/HBASE-12950 Project: HBase Issue Type: New Feature Components: Region Assignment, regionserver, shell Affects Versions: 2.0.0 Reporter: Esteban Gutierrez We have seen many times during the last few years that when key prefixes are time based and the access pattern only consists of writes to recent KVs, we can end up with tens of thousands of regions, and some of those regions will no longer be used. Even if users use TTLs and data is eventually deleted, we still have the old regions around, and only performing an online merge can help reduce the excess of regions. Extending the truncate command to also handle region ranges can help users that experience this issue to trim the old regions if required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12826) Expose draining servers into ClusterStatus
Esteban Gutierrez created HBASE-12826: - Summary: Expose draining servers into ClusterStatus Key: HBASE-12826 URL: https://issues.apache.org/jira/browse/HBASE-12826 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez We currently keep track of dead, live and in-transition RegionServers. I think we should also expose the list of servers that are being decommissioned via draining. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12806) [hbck] move admin.create() to HBaseTestingUtility.createTable in TestHBaseFsck
Esteban Gutierrez created HBASE-12806: - Summary: [hbck] move admin.create() to HBaseTestingUtility.createTable in TestHBaseFsck Key: HBASE-12806 URL: https://issues.apache.org/jira/browse/HBASE-12806 Project: HBase Issue Type: Bug Components: hbck Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Minor TestHBaseFsck should wait until all regions have been assigned after the table has been created. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12792) [backport] HBASE-5835: Catch and handle NotServingRegionException when close region attempt fails
Esteban Gutierrez created HBASE-12792: - Summary: [backport] HBASE-5835: Catch and handle NotServingRegionException when close region attempt fails Key: HBASE-12792 URL: https://issues.apache.org/jira/browse/HBASE-12792 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.26 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Trivial Fix For: 0.94.27 This one is around in 0.94 and it's low-hanging fruit: handle the NotServingRegionException we get if the region is not found when we attempt to close it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12793) [hbck] closeRegionSilentlyAndWait() should log cause of IOException and retry until hbase.hbck.close.timeout expires
Esteban Gutierrez created HBASE-12793: - Summary: [hbck] closeRegionSilentlyAndWait() should log cause of IOException and retry until hbase.hbck.close.timeout expires Key: HBASE-12793 URL: https://issues.apache.org/jira/browse/HBASE-12793 Project: HBase Issue Type: Bug Components: hbck Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Minor This is a subtask of HBASE-12131 to handle network partitions gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12458) Improve CellCounter command line parsing
Esteban Gutierrez created HBASE-12458: - Summary: Improve CellCounter command line parsing Key: HBASE-12458 URL: https://issues.apache.org/jira/browse/HBASE-12458 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Priority: Minor Command line option parsing in CellCounter is different from that of other tools like CopyTable, RowCounter or VerifyReplication. It should be consistent with the other tools. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12447) Add support for setTimeRange for CopyTable, RowCounter and CellCounter
Esteban Gutierrez created HBASE-12447: - Summary: Add support for setTimeRange for CopyTable, RowCounter and CellCounter Key: HBASE-12447 URL: https://issues.apache.org/jira/browse/HBASE-12447 Project: HBase Issue Type: Improvement Reporter: Esteban Gutierrez Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12380) Too many attempts to open a region can crash the RegionServer
Esteban Gutierrez created HBASE-12380: - Summary: Too many attempts to open a region can crash the RegionServer Key: HBASE-12380 URL: https://issues.apache.org/jira/browse/HBASE-12380 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Esteban Gutierrez Priority: Critical Noticed this while trying to fix a faulty test while working on a fix for HBASE-12219: {code} Tests in error: TestRegionServerNoMaster.testMultipleOpen:237 » Service java.io.IOException: R... TestRegionServerNoMaster.testCloseByRegionServer:211-closeRegionNoZK:201 » Service {code} Initially I thought the problem was in my patch for HBASE-12219, but I noticed that the issue was occurring on the 7th attempt to open the region. However, I was able to reproduce the same problem in the master branch after increasing the number of requests in testMultipleOpen(): {code}
2014-10-29 15:03:45,043 INFO [Thread-216] regionserver.RSRpcServices(1334): Receiving OPEN for the region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca., which we are already trying to OPEN - ignoring this new request for this region.
Submitting openRegion attempt: 16
2014-10-29 15:03:45,044 INFO [Thread-216] regionserver.RSRpcServices(1311): Open TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
2014-10-29 15:03:45,044 INFO [PostOpenDeployTasks:025198143197ea68803e49819eae27ca] hbase.MetaTableAccessor(1307): Updated row TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca. with server=192.168.1.105,63082,1414620220789
Submitting openRegion attempt: 17
2014-10-29 15:03:45,046 ERROR [RS_OPEN_REGION-192.168.1.105:63082-2] handler.OpenRegionHandler(88): Region 025198143197ea68803e49819eae27ca was already online when we started processing the opening.
Marking this new attempt as failed 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1931): ABORTING region server 192.168.1.105,63082,1414620220789: Received OPEN for the region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca., which is already online 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1937): RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint] 2014-10-29 15:03:45,054 WARN [Thread-216] regionserver.HRegionServer(1955): Unable to report fatal error to master com.google.protobuf.ServiceException: java.io.IOException: Call to /192.168.1.105:63079 failed on local exception: java.io.IOException: Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2 at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1707) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1757) at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:8301) at org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1952) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$100(MiniHBaseCluster.java:108) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528) at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:277) at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165) at 
org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1964) at org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1308) at org.apache.hadoop.hbase.regionserver.TestRegionServerNoMaster.testMultipleOpen(TestRegionServerNoMaster.java:237) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at
[jira] [Created] (HBASE-12365) [hbck] -fixVersionFile should not require a running master
Esteban Gutierrez created HBASE-12365: - Summary: [hbck] -fixVersionFile should not require a running master Key: HBASE-12365 URL: https://issues.apache.org/jira/browse/HBASE-12365 Project: HBase Issue Type: Bug Components: hbck Reporter: Esteban Gutierrez The current logic in hbck requires performing something like this: {code}
exec {
  ...
  connect();
  ...
  onlineHbck();
  ...
}

onlineHbck() {
  ...
  offlineHdfsIntegrityRepair();
  ...
}
{code} It should be possible to fix {{hbase.version}} without having to connect to the master first. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12369) Warn if hbase.bucketcache.size too close or equal to MaxDirectMemorySize
Esteban Gutierrez created HBASE-12369: - Summary: Warn if hbase.bucketcache.size too close or equal to MaxDirectMemorySize Key: HBASE-12369 URL: https://issues.apache.org/jira/browse/HBASE-12369 Project: HBase Issue Type: Bug Components: regionserver Reporter: Esteban Gutierrez Our ref guide currently says that it's required to leave some room below MaxDirectMemorySize. However, if hbase.bucketcache.size is too close or equal to MaxDirectMemorySize, it can trigger OOMEs: {code}
2014-10-28 16:14:41,585 INFO [master//172.16.0.101:16020] util.ByteBufferArray: Allocating buffers total=5.00 GB, sizePerBuffer=4 MB, count=1280, direct=true
2014-10-28 16:14:41,604 INFO [172.16.0.101:16020.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 99 ms, expecting minimum of 2, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2014-10-28 16:14:43,144 INFO [172.16.0.101:16020.activeMasterManager] master.ServerManager: Waiting for region servers count to settle; currently checked in 1, slept for 1639 ms, expecting minimum of 2, maximum of 2147483647, timeout of 4500 ms, interval of 1500 ms.
2014-10-28 16:14:44,057 INFO [master//172.16.0.101:16020] regionserver.HRegionServer: STOPPED: Failed initialization
2014-10-28 16:14:44,058 ERROR [master//172.16.0.101:16020] regionserver.HRegionServer: Failed init
java.lang.OutOfMemoryError: Direct buffer memory
	at java.nio.Bits.reserveMemory(Bits.java:658)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
	at org.apache.hadoop.hbase.util.ByteBufferArray.<init>(ByteBufferArray.java:65)
	at org.apache.hadoop.hbase.io.hfile.bucket.ByteBufferIOEngine.<init>(ByteBufferIOEngine.java:47)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getIOEngineFromName(BucketCache.java:310)
	at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.<init>(BucketCache.java:218)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.getL2(CacheConfig.java:513)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.instantiateBlockCache(CacheConfig.java:536)
	at org.apache.hadoop.hbase.io.hfile.CacheConfig.<init>(CacheConfig.java:213)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1259)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:818)
	at java.lang.Thread.run(Thread.java:724)
{code} It would be helpful to print a warning when hbase.bucketcache.size is too close or equal to MaxDirectMemorySize. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
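The suggested warning boils down to a simple comparison at startup; this is a hedged sketch with made-up names and an assumed headroom threshold, not the actual RegionServer initialization code:

```java
// Illustrative check: warn when the configured bucket cache leaves less
// than headroomPct of MaxDirectMemorySize free, which risks the
// "Direct buffer memory" OOME shown in the stack trace above.
public class BucketCacheCheck {
    static boolean needsWarning(long bucketCacheBytes, long maxDirectBytes,
                                double headroomPct) {
        return bucketCacheBytes >= maxDirectBytes * (1.0 - headroomPct);
    }
}
```

A 5 GB bucket cache with -XX:MaxDirectMemorySize=5g trips the check, while leaving a gigabyte of headroom does not.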
[jira] [Created] (HBASE-12219) Use optionally a TTL based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime()
Esteban Gutierrez created HBASE-12219: - Summary: Use optionally a TTL based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime() Key: HBASE-12219 URL: https://issues.apache.org/jira/browse/HBASE-12219 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.6.1, 0.94.24, 0.99.1 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Currently table descriptors and tables are cached once they are accessed for the first time. Subsequent calls to the master only require a trip to HDFS to look up the modified time in order to reload the table descriptors if modified. However, in clusters with a large number of tables or concurrent clients, this can be too aggressive to HDFS and the master, causing contention when processing other requests. A simple solution is to have a TTL based cache for FSTableDescriptors#getAll() and FSTableDescriptors#TableDescriptorAndModtime() that can allow the master to process those calls faster, without causing contention and without having to perform a trip to HDFS for every call to listtables() or getTableDescriptor(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
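The idea can be sketched as a small TTL cache; all names here are illustrative, and the loader function stands in for the HDFS modtime lookup that the real FSTableDescriptors change would wrap:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Minimal TTL-based cache sketch: entries are served from memory until
// they are older than ttlMs, so the expensive loader (a trip to HDFS in
// the issue's scenario) runs at most once per key per TTL window.
public class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMs;
    private final Function<K, V> loader;  // e.g. read descriptor from HDFS
    private final LongSupplier clock;     // injectable clock for testing

    public TtlCache(long ttlMs, Function<K, V> loader, LongSupplier clock) {
        this.ttlMs = ttlMs;
        this.loader = loader;
        this.clock = clock;
    }

    public V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = map.get(key);
        if (e == null || now - e.loadedAt >= ttlMs) {
            // Expired or missing: reload once and refresh the timestamp.
            e = new Entry<>(loader.apply(key), now);
            map.put(key, e);
        }
        return e.value;
    }
}
```

The trade-off is staleness bounded by the TTL: a descriptor modified on HDFS is served from cache for at most ttlMs before the next reload picks it up.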
[jira] [Created] (HBASE-12131) [hbck] undeployRegions should handle gracefully network partitions and other exceptions to avoid the same region deployed multiple times
Esteban Gutierrez created HBASE-12131: - Summary: [hbck] undeployRegions should handle gracefully network partitions and other exceptions to avoid the same region deployed multiple times Key: HBASE-12131 URL: https://issues.apache.org/jira/browse/HBASE-12131 Project: HBase Issue Type: Bug Components: hbck Affects Versions: 0.94.23 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Critical If we get an IOE (we currently ignore it) while regions are being undeployed by hbck, we should make sure that we don't re-assign that region in the master before we know the RS was marked as dead, and optionally let the user confirm that action, or we will end up in a split-brain situation with clients talking to different RSs serving the same region. The offending part is here in HBaseFsck.undeployRegions(): {code}
private void undeployRegions(HbckInfo hi) throws IOException, InterruptedException {
  for (OnlineEntry rse : hi.deployedEntries) {
    LOG.debug("Undeploy region " + rse.hri + " from " + rse.hsa);
    try {
      HBaseFsckRepair.closeRegionSilentlyAndWait(admin, rse.hsa, rse.hri);
      offline(rse.hri.getRegionName());
    } catch (IOException ioe) {
      LOG.warn("Got exception when attempting to offline region "
          + Bytes.toString(rse.hri.getRegionName()), ioe);
    }
  }
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-12099) TestScannerModel fails if using jackson 1.9.13
Esteban Gutierrez created HBASE-12099: - Summary: TestScannerModel fails if using jackson 1.9.13 Key: HBASE-12099 URL: https://issues.apache.org/jira/browse/HBASE-12099 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.0.0, 0.98.7, 0.99.1 Environment: hadoop-2.5.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez TestScannerModel fails if jackson 1.9.13 is used. (Hadoop 2.5 now uses that version, see HADOOP-10104): {code}
Failed tests: testToJSON(org.apache.hadoop.hbase.rest.model.TestScannerModel):
expected:{"batch":100,"caching":1000,"cacheBlocks":false,"endRow":"enp5eng=","endTime":1245393318192,"maxVersions":2147483647,"startRow":"YWJyYWNhZGFicmE=","startTime":1245219839331,"column":["Y29sdW1uMQ==","Y29sdW1uMjpmb28="],"labels":["private","public"]}
but was:{"startRow":"YWJyYWNhZGFicmE=","endRow":"enp5eng=","batch":100,"startTime":1245219839331,"endTime":1245393318192,"maxVersions":2147483647,"caching":1000,"cacheBlocks":false,"column":["Y29sdW1uMQ==","Y29sdW1uMjpmb28="],"label":["private","public"]}
{code} The problem is the annotation used for the labels element, which is 'label' instead of 'labels'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-11846) HStore#assertBulkLoadHFileOk should log if a full HFile verification will be performed during a bulkload
Esteban Gutierrez created HBASE-11846: - Summary: HStore#assertBulkLoadHFileOk should log if a full HFile verification will be performed during a bulkload Key: HBASE-11846 URL: https://issues.apache.org/jira/browse/HBASE-11846 Project: HBase Issue Type: Improvement Components: regionserver Affects Versions: 0.98.6, 0.99.0, 2.0.0 Reporter: Esteban Gutierrez Assignee: Esteban Gutierrez Priority: Trivial If hbase.hstore.bulkload.verify is set to true in the Region Server, we should log that we are about to perform a full scan of the HFiles that are going to be bulk loaded, it might be helpful to correlate other performance issues if the operator has enabled that feature. -- This message was sent by Atlassian JIRA (v6.2#6252)