[jira] [Updated] (HBASE-14539) Slight improvement of StoreScanner.optimize
[ https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-14539: -- Fix Version/s: (was: 0.98.12) (was: 1.1.0) (was: 1.0.1) 1.1.3 1.0.3 1.2.1 0.98.15 1.3.0 > Slight improvement of StoreScanner.optimize > --- > > Key: HBASE-14539 > URL: https://issues.apache.org/jira/browse/HBASE-14539 > Project: HBase > Issue Type: Sub-task >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 2.0.0, 1.3.0, 0.98.15, 1.2.1, 1.0.3, 1.1.3 > > > While looking at the code I noticed that StoreScanner.optimize does some > unnecessary work. This is a very tight loop, and even just looking up a > reference can throw off the CPU's cache lines. This does save a few percent of > performance (not a lot, though). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14539) Slight improvement of StoreScanner.optimize
Lars Hofhansl created HBASE-14539: - Summary: Slight improvement of StoreScanner.optimize Key: HBASE-14539 URL: https://issues.apache.org/jira/browse/HBASE-14539 Project: HBase Issue Type: Sub-task Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Minor While looking at the code I noticed that StoreScanner.optimize does some unnecessary work. This is a very tight loop, and even just looking up a reference can throw off the CPU's cache lines. This does save a few percent of performance (not a lot, though).
[jira] [Updated] (HBASE-14539) Slight improvement of StoreScanner.optimize
[ https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-14539: -- Attachment: 14539-0.98.txt Here's a trivial patch. It makes absolutely sure we do no work (other than the compares in the switch statements) unless we need to. I measured a 3-5% improvement in some cases. Trivial patch, no functional change. Will commit tomorrow unless I hear objections. > Slight improvement of StoreScanner.optimize > --- > > Key: HBASE-14539 > URL: https://issues.apache.org/jira/browse/HBASE-14539 > Project: HBase > Issue Type: Sub-task >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Minor > Fix For: 2.0.0, 1.3.0, 0.98.15, 1.2.1, 1.0.3, 1.1.3 > > Attachments: 14539-0.98.txt
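The patch itself (14539-0.98.txt) is not reproduced here. The following is a minimal sketch of the general pattern it describes, not HBase code. It assumes a simplified, hypothetical `MatchCode` enum and uses a `Supplier` to stand in for the next-indexed-key lookup; neither is HBase's actual API. The lookup is moved inside the switch cases that use it, so the common INCLUDE/SKIP path pays only for the switch compare:

```java
import java.util.function.Supplier;

// Simplified stand-in for the matcher's result codes (hypothetical, not HBase's real enum).
enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL, INCLUDE_AND_SEEK_NEXT_COL, DONE }

final class OptimizeSketch {
    // Counts how often the lookup actually runs, to show that it is skipped on the common path.
    static int lookups = 0;

    /**
     * Turns a column seek into a cheaper SKIP/INCLUDE when the seek target is not past
     * the next indexed key, i.e. the seek would not leave the current block.
     * The lookup happens only inside the cases that need it.
     */
    static MatchCode optimize(MatchCode code, Supplier<String> nextIndexedKey, String nextColumnKey) {
        switch (code) {
            case SEEK_NEXT_COL:
            case INCLUDE_AND_SEEK_NEXT_COL: {
                lookups++;
                String indexed = nextIndexedKey.get(); // deferred: only paid for seek codes
                if (indexed != null && nextColumnKey.compareTo(indexed) <= 0) {
                    return code == MatchCode.SEEK_NEXT_COL ? MatchCode.SKIP : MatchCode.INCLUDE;
                }
                return code;
            }
            default:
                // INCLUDE, SKIP, DONE: nothing to optimize, so no lookup at all.
                return code;
        }
    }
}
```

The comparison semantics are simplified (plain string keys). The point is only where the lookup sits relative to the switch.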
[jira] [Comment Edited] (HBASE-14509) Configurable sparse indexes?
[ https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935818#comment-14935818 ] Lars Hofhansl edited comment on HBASE-14509 at 10/2/15 6:16 AM: [~lhofhansl], FYI HBASE-14511 - StoreFile.Writer Meta plugin framework. I need only the Meta section and only for the Writer. For your sparse indexes, you will need a full Reader/Writer plugin (both meta and data blocks). It is just one way of doing indexes, of course. was (Author: vrodionov): [~lhofhansl], FYI https://issues.apache.org/jira/browse/HBASE-14511 - StoreFile.Writer Meta plugin framework. I need only the Meta section and only for the Writer. For your sparse indexes, you will need a full Reader/Writer plugin (both meta and data blocks). It is just one way of doing indexes, of course. > Configurable sparse indexes? > > > Key: HBASE-14509 > URL: https://issues.apache.org/jira/browse/HBASE-14509 > Project: HBase > Issue Type: Brainstorming >Reporter: Lars Hofhansl > > This idea just popped up today and I wanted to record it for discussion: > What if we kept sparse column indexes per region or HFile or per configurable > range? > I.e., for any given CQ we record the lowest and highest value for a particular > range (HFile, Region, or a custom range like the Phoenix guide post). > By tweaking the size of these ranges we can trade the size of the index against > its selectivity. > For example, if we kept it by HFile we could almost instantly decide whether we > need to scan a particular HFile at all to find a particular value in a Cell. > We can also collect min/max values for each n MB of data, for example when we > scan the region the first time. Assuming ranges are large enough, we can always > keep the index in memory together with the region. > Kind of a sparse local index. It might be much easier than the buddy region stuff > we've been discussing.
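Here is a minimal sketch of the min/max idea from the description. It assumes numeric (`long`) cell values and uses hypothetical `RangeStats`/`SparseIndex` classes, not an HBase or Phoenix API. Each range (HFile, region, or n MB chunk) keeps the lowest and highest value seen for one CQ. An exact-match lookup then only has to scan ranges whose [min, max] interval contains the value:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-range statistics for one column qualifier.
final class RangeStats {
    final String rangeId;
    long min = Long.MAX_VALUE; // an empty range contains nothing
    long max = Long.MIN_VALUE;

    RangeStats(String rangeId) { this.rangeId = rangeId; }

    // Called once per value while the range is written (or scanned for the first time).
    void observe(long value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // False means the range certainly does not contain the value and can be skipped.
    boolean mightContain(long value) {
        return value >= min && value <= max;
    }
}

final class SparseIndex {
    private final List<RangeStats> ranges = new ArrayList<>();

    void add(RangeStats stats) { ranges.add(stats); }

    // Ids of the ranges that still have to be scanned for an exact-match lookup.
    List<String> rangesToScan(long value) {
        List<String> result = new ArrayList<>();
        for (RangeStats r : ranges) {
            if (r.mightContain(value)) result.add(r.rangeId);
        }
        return result;
    }
}
```

Coarser ranges keep the index small enough to hold in memory, but overlapping [min, max] intervals rule out fewer ranges. That is the size vs. selectivity trade-off mentioned above.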
[jira] [Commented] (HBASE-13143) TestCacheOnWrite is flaky and needs a diet
[ https://issues.apache.org/jira/browse/HBASE-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940810#comment-14940810 ] Hudson commented on HBASE-13143: FAILURE: Integrated in HBase-TRUNK #6864 (See [https://builds.apache.org/job/HBase-TRUNK/6864/]) HBASE-13143 TestCacheOnWrite is flaky and needs a diet (apurtell: rev 030ae5f0415b97e5da688c1432ed53fd56990194) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java > TestCacheOnWrite is flaky and needs a diet > -- > > Key: HBASE-13143 > URL: https://issues.apache.org/jira/browse/HBASE-13143 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.11 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-13143.patch > > > TestCacheOnWrite passes locally but has been flaking in 0.98 builds on > Jenkins, most recently https://builds.apache.org/job/HBase-0.98/878/ > The test takes a long time to execute (338.492 sec) and is resource intensive > (216 tests). Neither of these characteristics endears it to Jenkins. > When I ran this unit test on a MacBook, after a minute the fan was running so > fast I thought it would take flight.
[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin
[ https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940808#comment-14940808 ] Lars Hofhansl commented on HBASE-14511: --- I'd like to use this for Phoenix to store min/max for some column qualifiers in the HFile itself. At scan time we can then efficiently rule out entire HFiles based on those (similar to how HBase does it with key ranges and timestamps) - that would be a cheap local secondary index. [~giacomotaylor], FYI. Can we make this accessible through coprocessor hooks somehow? (I'd need to think about this side, though.) > StoreFile.Writer Meta Plugin > > > Key: HBASE-14511 > URL: https://issues.apache.org/jira/browse/HBASE-14511 > Project: HBase > Issue Type: New Feature >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Attachments: HBASE-14511.v1.patch, HBASE-14511.v2.patch > > > During my work on new compaction policies (HBASE-14468, HBASE-14477) I had > to modify the existing code of StoreFile.Writer to add additional meta-info > required by these new policies. I think that it should be done by means of a > new plugin framework, because this seems to be a general capability/feature. > As a future enhancement this can become a part of a more general > StoreFileWriter/Reader plugin architecture. But I need only the Meta section of a > store file. > This could be used, for example, to collect row-key distribution information > during HFile creation. This info can be used later to find the optimal region > split key or to create an optimal set of sub-regions for M/R jobs or other jobs > which can operate on a sub-region level.
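The description suggests collecting row-key distribution information while the HFile is written, so that a split key can be chosen later. Here is a minimal sketch under stated assumptions, not the HBASE-14511 plugin interface. It assumes a hypothetical `RowKeySampler` hook that sees every appended row key in sorted order (as an HFile writer does) and keeps every k-th key. The middle sample approximates the median row key, which is a natural split-key candidate:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer-side collector: sees each row key as it is appended (in sorted
// order, as in an HFile) and keeps every k-th key as a sample of the distribution.
final class RowKeySampler {
    private final int every;
    private final List<String> samples = new ArrayList<>();
    private long seen = 0;

    RowKeySampler(int every) {
        if (every <= 0) throw new IllegalArgumentException("every must be positive");
        this.every = every;
    }

    void onAppend(String rowKey) {
        if (seen % every == 0) samples.add(rowKey);
        seen++;
    }

    // Approximate median row key: splitting here leaves roughly half the rows on each side.
    String suggestSplitKey() {
        if (samples.isEmpty()) return null;
        return samples.get(samples.size() / 2);
    }

    // Evenly spaced samples; these could also serve as sub-region boundaries for M/R jobs.
    List<String> samples() { return samples; }
}
```

In a real plugin, the samples would presumably be serialized into the file's Meta section at close time. How that is done is left open here.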
[jira] [Assigned] (HBASE-14536) Balancer & SSH interfering with each other leading to unavailability
[ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang reassigned HBASE-14536: -- Assignee: Stephen Yuan Jiang > Balancer & SSH interfering with each other leading to unavailability > > > Key: HBASE-14536 > URL: https://issues.apache.org/jira/browse/HBASE-14536 > Project: HBase > Issue Type: Bug > Components: master, Region Assignment >Affects Versions: 1.1.2 >Reporter: Devaraj Das >Assignee: Stephen Yuan Jiang > Fix For: 1.1.4 > > Attachments: master-log.tgz > > > Came across this in our cluster: > 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340 > {noformat} > 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56] > master.RegionStates: Onlined 1588230740 on > 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME => > 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} > {noformat} > 2. The server dies at some point: > {noformat} > 2015-09-29 06:18:25,952 INFO [main-EventThread] > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, > processing expiration [10.0.0.149,16020,1443507203340] > 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: > based on AM, current > region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 > server being checked: > 10.0.0.149,16020,1443507203340 > {noformat} > 3. The balancer had computed a plan that contained a move for the meta: > {noformat} > 2015-09-29 06:18:26,833 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster: > balance hri=hbase:meta,,1.1588230740, > src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905 > {noformat} > 4. 
The following ensues after this, leading to the meta remaining unassigned: > {noformat} > 2015-09-29 06:18:26,859 DEBUG > [B.defaultRpcServer.handler=12,queue=0,port=16000] > master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to > unassign since it's on a dead server: 10.0.0.149,16020,1443507203340 > .. > 2015-09-29 06:18:26,899 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates: > Offlined 1588230740 from 10.0.0.149,16020,1443507203340 > . > 2015-09-29 06:18:26,914 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] > master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is > on a dead but not processed yet server: 10.0.0.149,16020,1443507203340 > > 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] > master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted, > state: {1588230740 state=OFFLINE, ts=1443507506914, > server=10.0.0.149,16020,1443507203340} > > 2015-09-29 06:18:29,447 DEBUG > [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager: > based on AM, current > region=hbase:meta,,1.1588230740 is on server=null server being checked: > 10.0.0.149,16020,1443507203340 > 2015-09-29 06:18:29,451 INFO [MASTER_META_SERVER_OPERATIONS- > 10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been > assigned to otherwhere, skip assigning. > 2015-09-29 06:18:29,452 DEBUG > [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] > master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340 > {noformat}
[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy
[ https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940816#comment-14940816 ] Vladimir Rodionov commented on HBASE-14468: --- HBASE-14477 - now it's DateTieredCompaction :) > Compaction improvements: FIFO compaction policy > --- > > Key: HBASE-14468 > URL: https://issues.apache.org/jira/browse/HBASE-14468 > Project: HBase > Issue Type: Improvement >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Fix For: 2.0.0 > > Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, > HBASE-14468-v3.patch, HBASE-14468-v4.patch > > > h2. FIFO Compaction > h3. Introduction > The FIFO compaction policy selects only files in which all cells have expired. The > column family MUST have a non-default TTL. > Essentially, the FIFO compactor does only one job: it collects expired store files. > I see many applications for this policy: > # Use it for very high-volume raw data which has a low TTL and which is the > source of other data (after additional processing). Example: raw > time-series vs. time-based rollup aggregates and compacted time-series. We > collect raw time-series and store them in a CF with the FIFO compaction policy; > periodically we run a task which creates rollup aggregates and compacts the > time-series, after which the original raw data can be discarded. > # Use it for data which can be kept entirely in a block cache (RAM/SSD). > Say we have a local SSD (1TB) which we can use as a block cache. There is no need for > compaction of the raw data at all. > Because we do not do any real compaction, we use no CPU or IO (disk and > network) for it, and we do not evict hot data from the block cache. The result: improved > throughput and latency for both writes and reads. > See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style > h3.
To enable FIFO compaction policy > For table: > {code}
HTableDescriptor desc = new HTableDescriptor(tableName);
desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
    FIFOCompactionPolicy.class.getName());
{code} > For CF: > {code}
HColumnDescriptor desc = new HColumnDescriptor(family);
desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY,
    FIFOCompactionPolicy.class.getName());
{code} > h3. Limitations > Do not use FIFO compaction if: > * Table/CF has MIN_VERSIONS > 0 > * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL) > * Table/CF is MOB
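Here is a minimal sketch of the FIFO selection rule described above. It is not the FIFOCompactionPolicy code from the patch. It assumes each store file exposes the newest cell timestamp it contains (a hypothetical `StoreFileInfo` class, not HBase's) and that a cell counts as expired once its timestamp is older than now minus TTL. A file can be dropped only if even its newest cell has expired. Two of the listed limitations are enforced as guards:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical view of a store file: only the newest cell timestamp matters here.
final class StoreFileInfo {
    final String name;
    final long maxTimestamp; // newest cell timestamp in the file, in ms

    StoreFileInfo(String name, long maxTimestamp) {
        this.name = name;
        this.maxTimestamp = maxTimestamp;
    }
}

final class FifoSelection {
    // Stand-in for "no TTL configured" in this sketch.
    static final long FOREVER = Long.MAX_VALUE;

    /**
     * Selects files in which every cell is past the TTL. Nothing is rewritten:
     * the selected files are simply removed, so no CPU or IO is spent compacting.
     */
    static List<StoreFileInfo> selectExpired(List<StoreFileInfo> files, long ttlMs, int minVersions, long nowMs) {
        if (ttlMs == FOREVER) throw new IllegalStateException("FIFO needs a non-default TTL");
        if (minVersions > 0) throw new IllegalStateException("FIFO cannot honor MIN_VERSIONS > 0");
        long oldestUnexpired = nowMs - ttlMs;
        List<StoreFileInfo> expired = new ArrayList<>();
        for (StoreFileInfo f : files) {
            // If even the newest cell is older than the TTL window, the whole file has expired.
            if (f.maxTimestamp < oldestUnexpired) expired.add(f);
        }
        return expired;
    }
}
```

A file holding even one live cell is never selected. This is why the policy depends on TTL-driven, roughly time-ordered data such as raw time-series.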
[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin
[ https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940818#comment-14940818 ] Vladimir Rodionov commented on HBASE-14511: --- [~lhofhansl] {quote} Can we make this accessible through coprocessor hooks somehow (I'd need to think about this side, though). {quote} Sure, there is a sub-task for that. > StoreFile.Writer Meta Plugin > > > Key: HBASE-14511 > URL: https://issues.apache.org/jira/browse/HBASE-14511 > Project: HBase > Issue Type: New Feature >Reporter: Vladimir Rodionov >Assignee: Vladimir Rodionov > Attachments: HBASE-14511.v1.patch, HBASE-14511.v2.patch
[jira] [Created] (HBASE-14540) Write Ahead Log Batching Optimization
John Leach created HBASE-14540: -- Summary: Write Ahead Log Batching Optimization Key: HBASE-14540 URL: https://issues.apache.org/jira/browse/HBASE-14540 Project: HBase Issue Type: Improvement Reporter: John Leach The new write-ahead log mechanism seems to batch too few mutations when running inside the disruptor. As we scaled our load up (many threads with small writes), we saw the number of HDFS sync operations grow in concert with the number of writes. Generally, one would expect the size of the batches to grow but the number of actual sync operations to settle.
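The expectation stated above (batches grow while the sync count settles) is ordinary group-commit behavior. Here is a minimal deterministic model of it, not FSHLog or Disruptor code. It uses a hypothetical `GroupCommitModel` and assumes a single syncer with a fixed sync latency. Every append that arrives while a sync is in flight waits and joins the next sync:

```java
// Hypothetical model of group commit: one syncer, fixed sync latency, and appends that
// arrive while a sync is in flight wait and share the next sync.
final class GroupCommitModel {
    /**
     * @param arrivalsMs append arrival times in ms, sorted ascending
     * @param syncMs     how long one sync takes
     * @return number of syncs needed to make all appends durable
     */
    static int countSyncs(long[] arrivalsMs, long syncMs) {
        int syncs = 0;
        int i = 0;
        long syncerFreeAt = Long.MIN_VALUE; // time at which the syncer finishes its current sync
        while (i < arrivalsMs.length) {
            // The next sync starts once the syncer is free and at least one append is waiting.
            long start = Math.max(syncerFreeAt, arrivalsMs[i]);
            // Everything that arrived by the start of this sync rides along in the same batch.
            while (i < arrivalsMs.length && arrivalsMs[i] <= start) i++;
            syncs++;
            syncerFreeAt = start + syncMs;
        }
        return syncs;
    }
}
```

Under this model, 1,000 appends spread over one second need 101 syncs of 10 ms each. Doubling the load to 2,000 appends over the same second also needs 101 syncs, because each batch simply doubles. The issue reports the opposite: the observed sync count kept growing with the number of writes.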
[jira] [Updated] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Leach updated HBASE-14540: --- Attachment: HBaseWALBlockingWaitStrategy.java Here is a modified Wait Strategy to apply to the disruptor. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941155#comment-14941155 ] John Leach commented on HBASE-14540: I did not run this with HBase-based benchmarks, but I did run it while we (SpliceMachine) were running TPC-C benchmarks, and it showed a significant improvement (2x). We were also able to get rid of error messages of this type: {NOFORMAT} wal.FSHLog: Slow sync cost {NOFORMAT} > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941020#comment-14941020 ] Hadoop QA commented on HBASE-14367: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764743/HBASE-14367-branch-1.2.v3.patch against branch-1.2 branch at commit 030ae5f0415b97e5da688c1432ed53fd56990194. ATTACHMENT ID: 12764743 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + * rpc SetNormalizerRunning(.SetNormalizerRunningRequest) returns (.SetNormalizerRunningResponse); + * rpc IsNormalizerEnabled(.IsNormalizerEnabledRequest) returns (.IsNormalizerEnabledResponse); + * rpc SetNormalizerRunning(.SetNormalizerRunningRequest) returns (.SetNormalizerRunningResponse); + * rpc IsNormalizerEnabled(.IsNormalizerEnabledRequest) returns (.IsNormalizerEnabledResponse); {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. 
{color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15858//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15858//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15858//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15858//console This message is automatically generated. > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server-side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table.
[jira] [Commented] (HBASE-14490) [RpcServer] reuse request read buffer
[ https://issues.apache.org/jira/browse/HBASE-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940889#comment-14940889 ] Zephyr Guo commented on HBASE-14490: {quote} I should maintain a suitable buffer for each Reader. It's an easy way to optimize. {quote} This is wrong. Because the sockets are non-blocking, we can't use one buffer per Reader; each Connection still needs its own buffer. > [RpcServer] reuse request read buffer > - > > Key: HBASE-14490 > URL: https://issues.apache.org/jira/browse/HBASE-14490 > Project: HBase > Issue Type: Improvement > Components: IPC/RPC >Affects Versions: 2.0.0, 1.0.2 >Reporter: Zephyr Guo >Assignee: Zephyr Guo > Labels: performance > Fix For: 2.0.0, 1.0.2 > > Attachments: HBASE-14490-v1.patch, HBASE-14490-v10.patch, > HBASE-14490-v2.patch, HBASE-14490-v3.patch, HBASE-14490-v4.patch, > HBASE-14490-v5.patch, HBASE-14490-v6.patch, HBASE-14490-v7.patch, > HBASE-14490-v8.patch, HBASE-14490-v9.patch > > > Reuse the buffer used to read requests. It's not necessary to free the data buffer for each > request. The optimization reduces the number of ByteBuffer allocations. > *patch modification* > * {{saslReadAndProcess}} and {{processOneRpc}} accept a ByteBuffer instead of > byte[]. > * {{processUnwrappedData}} can reuse the same ByteBuffer that > {{saslReadAndProcess}} used. > * Maintain a reused ByteBuffer for each {{Connection}} for small requests. > ** Buffer size is fixed. > ** Use a SoftReference to reference the buffer. > ** If a request is too large, allocate a temporary ByteBuffer and free it > once {{process()}} has finished.
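Here is a minimal sketch of the buffer-reuse scheme listed under *patch modification*, not the patch's code. It uses a hypothetical `ConnectionReadBuffer` owned by each Connection rather than each Reader, following the comment above. The reuse size is an arbitrary value chosen by the caller. Small requests share one softly referenced buffer, and oversized requests get a temporary buffer:

```java
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

// Hypothetical per-connection read-buffer holder. There is one instance per connection, not
// per reader thread: with non-blocking sockets a reader interleaves partial reads from many
// connections, so each connection needs its own buffer to hold a half-read request.
final class ConnectionReadBuffer {
    private final int reusableSize;
    // A soft reference lets the GC reclaim the cached buffer under memory pressure.
    private SoftReference<ByteBuffer> cached = new SoftReference<>(null);

    ConnectionReadBuffer(int reusableSize) { this.reusableSize = reusableSize; }

    /** Returns a cleared buffer whose limit is exactly {@code length} bytes. */
    ByteBuffer acquire(int length) {
        if (length > reusableSize) {
            // Large request: a temporary buffer that becomes garbage once processing finishes.
            return ByteBuffer.allocate(length);
        }
        ByteBuffer buf = cached.get();
        if (buf == null) {
            // First use, or the GC reclaimed the old buffer: allocate and cache a fresh one.
            buf = ByteBuffer.allocate(reusableSize);
            cached = new SoftReference<>(buf);
        }
        buf.clear();
        buf.limit(length); // read exactly the request body, no more
        return buf;
    }
}
```

The caller must finish with one request before acquiring the next buffer on the same connection. Otherwise the shared buffer would be overwritten mid-processing.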
[jira] [Commented] (HBASE-13143) TestCacheOnWrite is flaky and needs a diet
[ https://issues.apache.org/jira/browse/HBASE-13143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940921#comment-14940921 ] Hudson commented on HBASE-13143: SUCCESS: Integrated in HBase-1.3-IT #201 (See [https://builds.apache.org/job/HBase-1.3-IT/201/]) HBASE-13143 TestCacheOnWrite is flaky and needs a diet (apurtell: rev 9b297493e2d87bfd3f93005fe16d26cdf847b0c3) * hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java > TestCacheOnWrite is flaky and needs a diet > -- > > Key: HBASE-13143 > URL: https://issues.apache.org/jira/browse/HBASE-13143 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.11 >Reporter: Andrew Purtell >Assignee: Andrew Purtell >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-13143.patch
[jira] [Commented] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940897#comment-14940897 ] Mikhail Antonov commented on HBASE-13336: - Since I reviewed the first patch, let me pick this one up. > Consistent rules for security meta table protections > > > Key: HBASE-13336 > URL: https://issues.apache.org/jira/browse/HBASE-13336 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Purtell > Fix For: 2.0.0, 1.3.0, 0.98.16 > > Attachments: HBASE-13336.patch, HBASE-13336_v2.patch > > > The AccessController and VisibilityController do different things regarding > protecting their meta tables. The AC allows schema changes and disable/enable > if the user has permission. The VC unconditionally disallows all admin > actions. Generally, bad things will happen if these meta tables are damaged, > disabled, or dropped. The likely outcome is random, frequent (or constant) > server-side op failures with nasty stack traces. On the other hand, some > things like column family and table attribute changes can have valid use > cases. We should have consistent and sensible rules for protecting security > meta tables.
[jira] [Assigned] (HBASE-13336) Consistent rules for security meta table protections
[ https://issues.apache.org/jira/browse/HBASE-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov reassigned HBASE-13336: --- Assignee: Mikhail Antonov > Consistent rules for security meta table protections > > > Key: HBASE-13336 > URL: https://issues.apache.org/jira/browse/HBASE-13336 > Project: HBase > Issue Type: Improvement >Reporter: Andrew Purtell >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.3.0, 0.98.16 > > Attachments: HBASE-13336.patch, HBASE-13336_v2.patch
[jira] [Updated] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-14367: Attachment: HBASE-14367-branch-1.2.v3.patch Fixed long lines except the protobuf-generated ones, fixed checkstyle errors, and added a test in TestAdmin2 for the region normalizer (shell commands are tested in TestShell, as they are picked up by the Ruby test runner automatically). > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch
[jira] [Updated] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-14367: Status: Patch Available (was: Open) > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch
[jira] [Updated] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-14367: Status: Open (was: Patch Available) > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941253#comment-14941253 ] Sean Busbey commented on HBASE-14367: - The core tests vote looks like it was caused by a timeout in hbase-server, and hbase-shell got skipped: {code}
[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache HBase .. SUCCESS [2.823s]
[INFO] Apache HBase - Checkstyle . SUCCESS [0.444s]
[INFO] Apache HBase - Resource Bundle SUCCESS [0.151s]
[INFO] Apache HBase - Annotations SUCCESS [0.834s]
[INFO] Apache HBase - Protocol ... SUCCESS [10.775s]
[INFO] Apache HBase - Common . SUCCESS [1:22.568s]
[INFO] Apache HBase - Procedure .. SUCCESS [1:52.079s]
[INFO] Apache HBase - Client . SUCCESS [1:20.418s]
[INFO] Apache HBase - Hadoop Compatibility ... SUCCESS [7.197s]
[INFO] Apache HBase - Hadoop Two Compatibility ... SUCCESS [6.854s]
[INFO] Apache HBase - Prefix Tree SUCCESS [9.750s]
[INFO] Apache HBase - Server . FAILURE [1:36:05.365s]
[INFO] Apache HBase - Testing Util ... SKIPPED
[INFO] Apache HBase - Thrift . SKIPPED
[INFO] Apache HBase - Rest ... SKIPPED
[INFO] Apache HBase - Shell .. SKIPPED
[INFO] Apache HBase - Integration Tests .. SKIPPED
[INFO] Apache HBase - Examples ... SKIPPED
[INFO] Apache HBase - External Block Cache ... SKIPPED
[INFO] Apache HBase - Assembly ... SKIPPED
[INFO] Apache HBase - Shaded . SKIPPED
[INFO] Apache HBase - Shaded - Client SKIPPED
[INFO] Apache HBase - Shaded - Server SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 1:41:21.117s
[INFO] Finished at: Fri Oct 02 11:09:37 UTC 2015
[INFO] Final Memory: 52M/645M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (secondPartTestsExecution) on project hbase-server: There was a timeout or other error in the fork -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hbase-server
{code} Can you have a go at figuring out which test it was, so you can run it locally? > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941327#comment-14941327 ] John Leach commented on HBASE-14540: Good point... Probably not a good idea then. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941276#comment-14941276 ] Elliott Clark commented on HBASE-14540: --- This will negatively impact the average response time, since on average everything will wait 2ms. So while it might work for a throughput-oriented workload, it won't be all that good for online workloads. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12911) Client-side metrics
[ https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-12911: - Attachment: 12911.yammer.v02.patch This one unpacks the {{Method}} and {{Message}} objects, instead of using dynamic registries, on the DML critical path. It's too tightly coupled to protobuf internal representation for my liking, but it was the only way I could find to determine the calling context without string comparisons. Let me know if you protobuf experts have better ideas. The performance of running with this patch was on par with the run from master (actually, 19s faster and 14 fewer allocations, but I think that's within the noise). RB is updated. > Client-side metrics > --- > > Key: HBASE-12911 > URL: https://issues.apache.org/jira/browse/HBASE-12911 > Project: HBase > Issue Type: New Feature > Components: Client, Operability, Performance >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk > Fix For: 2.0.0, 1.3.0 > > Attachments: 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, 12911-0.98.00.patch, > 12911-branch-1.00.patch, 12911.yammer.jpg, 12911.yammer.v00.patch, > 12911.yammer.v01.patch, 12911.yammer.v02.patch, am.jpg, client metrics > RS-Master.jpg, client metrics client.jpg, conn_agg.jpg, connection > attributes.jpg, ltt.jpg, standalone.jpg > > > There's very little visibility into the hbase client. Folks who care to add > some kind of metrics collection end up wrapping Table method invocations with > {{System.currentTimeMillis()}}. 
For a crude example of this, have a look at > what I did in {{PerformanceEvaluation}} for exposing requests latencies up to > {{IntegrationTestRegionReplicaPerf}}. The client is quite complex, there's a > lot going on under the hood that is impossible to see right now without a > profiler. Being a crucial part of the performance of this distributed system, > we should have deeper visibility into the client's function. > I'm not sure that wiring into the hadoop metrics system is the right choice > because the client is often embedded as a library in a user's application. We > should have integration with our metrics tools so that, i.e., a client > embedded in a coprocessor can report metrics through the usual RS channels, > or a client used in a MR job can do the same. > I would propose an interface-based system with pluggable implementations. Out > of the box we'd include a hadoop-metrics implementation and one other, > possibly [dropwizard/metrics|https://github.com/dropwizard/metrics]. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
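The interface-based, pluggable design proposed above could look roughly like the following minimal sketch. All names (`ClientMetricsSink`, `InMemoryClientMetrics`, `TimedCall`) are hypothetical and are not HBase APIs. The wrapper times calls with `System.nanoTime()`, which is monotonic and better suited to measuring elapsed time than `System.currentTimeMillis()`, and it records the latency even when the call throws. The in-memory sink only stands in for a hadoop-metrics or dropwizard/metrics backend.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical pluggable sink: a hadoop-metrics or dropwizard/metrics adapter could implement it. */
interface ClientMetricsSink {
  void recordLatency(String operation, long elapsedNanos);
}

/** Trivial in-memory sink standing in for a real metrics library. */
class InMemoryClientMetrics implements ClientMetricsSink {
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
  private final Map<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

  @Override
  public void recordLatency(String operation, long elapsedNanos) {
    counts.computeIfAbsent(operation, k -> new LongAdder()).increment();
    totalNanos.computeIfAbsent(operation, k -> new LongAdder()).add(elapsedNanos);
  }

  long count(String operation) {
    LongAdder c = counts.get(operation);
    return c == null ? 0L : c.sum();
  }

  long totalNanos(String operation) {
    LongAdder t = totalNanos.get(operation);
    return t == null ? 0L : t.sum();
  }
}

/** Wraps a client call and reports its latency whether it succeeds or throws. */
final class TimedCall {
  private TimedCall() {}

  static <T> T time(ClientMetricsSink sink, String operation, Callable<T> call) throws Exception {
    long start = System.nanoTime(); // monotonic clock, unlike currentTimeMillis()
    try {
      return call.call();
    } finally {
      sink.recordLatency(operation, System.nanoTime() - start);
    }
  }
}
```

A caller might write `TimedCall.time(sink, "get", () -> table.get(get))` for an existing `Table` and `Get`. Hooking the wrapper into the connection/RPC layer instead of around user calls would also capture retries; where exactly it belongs is an open assumption here.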
[jira] [Comment Edited] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941276#comment-14941276 ] Elliott Clark edited comment on HBASE-14540 at 10/2/15 3:46 PM: This will negatively impact the average response time, since on average everything will wait 2ms (1ms for the wait and 1ms for the sync). So while it might work for a throughput-oriented workload, it won't be all that good for online workloads. was (Author: eclark): This will negative impact the average response time since on average everything will wait 2ms. So it might work for a throughput oriented workload, it won't be all that good for a online workloads. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14541) TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed on internal rig; gave up after trying 10 times
stack created HBASE-14541: - Summary: TestHFileOutputFormat.testMRIncrementalLoadWithSplit failed on internal rig; gave up after trying 10 times Key: HBASE-14541 URL: https://issues.apache.org/jira/browse/HBASE-14541 Project: HBase Issue Type: Bug Reporter: stack This one seems worth a dig. We seem to be making progress, but here is what we are trying to load, which seems weird:
{code}
2015-10-01 17:19:41,322 INFO [main] mapreduce.LoadIncrementalHFiles(360): Split occured while grouping HFiles, retry attempt 10 with 4 files remaining to group or split
2015-10-01 17:19:41,323 ERROR [main] mapreduce.LoadIncrementalHFiles(402): -
Bulk load aborted with some files not yet loaded:
-
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.bottom
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-B/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/ce11cbe2490d444d8958264004286aff.top
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.bottom
hdfs://localhost:39540/user/jenkins/test-data/720ae36a-2495-456b-ba68-19e260685a35/testLocalMRIncrementalLoad/info-A/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/_tmp/30c58eeb23a6464da21117e6e1bc565c.top
{code}
What's that about? Making a note here. Will keep an eye on this one. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14536) Balancer & SSH interfering with each other leading to unavailability
[ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941301#comment-14941301 ] stack commented on HBASE-14536: --- Is HBASE-9665 related at all lads? > Balancer & SSH interfering with each other leading to unavailability > > > Key: HBASE-14536 > URL: https://issues.apache.org/jira/browse/HBASE-14536 > Project: HBase > Issue Type: Bug > Components: master, Region Assignment >Affects Versions: 1.1.2 >Reporter: Devaraj Das >Assignee: Stephen Yuan Jiang > Fix For: 1.1.4 > > Attachments: master-log.tgz > > > Came across this in our cluster: > 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340 > {noformat} > 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56] > master.RegionStates: Onlined 1588230740 on > 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME => > 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} > {noformat} > 2. The server dies at some point: > {noformat} > 2015-09-29 06:18:25,952 INFO [main-EventThread] > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, > processing expiration [10.0.0.149,16020,1443507203340] > 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: > based on AM, current > region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 > server being checked: > 10.0.0.149,16020,1443507203340 > {noformat} > 3. The balancer had computed a plan that contained a move for the meta: > {noformat} > 2015-09-29 06:18:26,833 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster: > balance hri=hbase:meta,,1.1588230740, > src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905 > {noformat} > 4. 
The following ensues after this, leading to the meta remaining unassigned: > {noformat} > 2015-09-29 06:18:26,859 DEBUG > [B.defaultRpcServer.handler=12,queue=0,port=16000] > master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to > unassign since it's on a dead server: 10.0.0.149,16020,1443507203340 > .. > 2015-09-29 06:18:26,899 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates: > Offlined 1588230740 from 10.0.0.149,16020,1443507203340 > . > 2015-09-29 06:18:26,914 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] > master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is > on a dead but not processed yet server: 10.0.0.149,16020,1443507203340 > > 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] > master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted, > state: {1588230740 state=OFFLINE, ts=1443507506914, > server=10.0.0.149,16020,1443507203340} > > 2015-09-29 06:18:29,447 DEBUG > [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager: > based on AM, current > region=hbase:meta,,1.1588230740 is on server=null server being checked: > 10.0.0.149,16020,1443507203340 > 2015-09-29 06:18:29,451 INFO [MASTER_META_SERVER_OPERATIONS- > 10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been > assigned to otherwhere, skip assigning. > 2015-09-29 06:18:29,452 DEBUG > [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] > master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941338#comment-14941338 ] Elliott Clark commented on HBASE-14540: --- I think something like this could really add throughput for people who have an OLAP workload, so we should totally try this out and then make it an option. Just allow online workloads to use something else. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941346#comment-14941346 ] John Leach commented on HBASE-14540: Clearly, I think we should make it configurable. The problem with the "smart batching" we have is that it is designed for in-memory processing rather than for a distributed WAL. I appreciate you thinking about this... > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941280#comment-14941280 ] stack commented on HBASE-14540: --- Let's get it in as an option. Let me try a WALPE workload against it. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941319#comment-14941319 ] Elliott Clark commented on HBASE-14540: --- So right now I have clusters with less than 1ms average response time. This will absolutely negatively impact those. Smart batching is great. This is just adding wait time. We already have the smart batching algorithm that the linked article describes. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941335#comment-14941335 ] Elliott Clark commented on HBASE-14540: --- To expand a bit more, this is the classic throughput vs. response time trade-off. We already have the smart batching method described in the linked article: if requests come in bunches, we'll sync fewer times with larger payloads. This just adds more wait time. The attached class will wait longer than the described smart batching algorithm. That will increase throughput, as it means fewer round trips to HDFS. However, since on average everything will wait twice while going through the ring, it will increase latency. For clusters that optimize for throughput (benchmarks and OLAP-based workloads) that's a good trade-off. For people with an online workload, this sets a floor: they can't go faster than 2ms on average. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
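To make the throughput vs. response time trade-off concrete, here is a minimal sketch. It uses a `BlockingQueue` instead of the Disruptor ring buffer the WAL actually runs on, and `SmartBatcher` and its methods are hypothetical. With `maxWaitNanos == 0` the consumer never waits once there is work: it takes everything already queued and issues one sync (the smart batching from the linked article). With a positive `maxWaitNanos`, each batch lingers for the full period before syncing. That mode is only a rough analogue of adding a fixed wait; it is not the attached `HBaseWALBlockingWaitStrategy`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical single-consumer batcher illustrating smart batching vs. a fixed linger. */
class SmartBatcher<T> {
  private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
  private final long maxWaitNanos;      // 0 = pure smart batching
  private volatile int syncCount;       // written only by the consumer thread

  SmartBatcher(long maxWaitNanos) {
    this.maxWaitNanos = maxWaitNanos;
  }

  /** Called by writer threads. */
  void append(T edit) {
    queue.add(edit);
  }

  /** Called by the single consumer thread; returns the batch that was synced. */
  List<T> syncNextBatch() throws InterruptedException {
    List<T> batch = new ArrayList<>();
    batch.add(queue.take());            // block only until there is work
    queue.drainTo(batch);               // take everything already waiting
    if (maxWaitNanos > 0) {
      // Linger variant: keep collecting until the deadline, so every batch
      // pays the full maxWaitNanos of extra latency before its sync.
      long deadline = System.nanoTime() + maxWaitNanos;
      long remaining;
      while ((remaining = deadline - System.nanoTime()) > 0) {
        T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
        if (next == null) {
          break;                        // deadline reached with nothing new
        }
        batch.add(next);
        queue.drainTo(batch);
      }
    }
    sync(batch);
    return batch;
  }

  private void sync(List<T> batch) {
    syncCount++;                        // stand-in for one round trip to HDFS
  }

  int syncCount() {
    return syncCount;
  }
}
```

Under bursty load both modes produce large batches. The linger mode only reduces sync count further when edits trickle in, and it does so by adding its wait to every write, which is the latency floor described in the comment.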
[jira] [Commented] (HBASE-14347) Add a switch to DynamicClassLoader to disable it and make that the default
[ https://issues.apache.org/jira/browse/HBASE-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941226#comment-14941226 ] Matteo Bertozzi commented on HBASE-14347: - +1 on the patch for the 1.x branches, since it does not change any behavior. For 2.x we probably want to make some changes. The DynamicClassLoader does not seem to be needed on the client side, so we should force it to "not enabled". But on the server side we probably still want it on, to allow user filters and so on. Do we have any alternative to copying locally, instead of forcing it to "not enabled" with security as the motivation? How is one supposed to use custom filters in a "secure" environment otherwise? > Add a switch to DynamicClassLoader to disable it and make that the default > -- > > Key: HBASE-14347 > URL: https://issues.apache.org/jira/browse/HBASE-14347 > Project: HBase > Issue Type: Bug > Components: Client, defaults, regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 0.98.15, 1.0.3 >Reporter: Esteban Gutierrez >Assignee: Esteban Gutierrez > Attachments: HBASE-14347-v001.patch > > > Since HBASE-1936 we have the option to load jars dynamically by default from > HDFS or the local filesystem, however hbase.dynamic.jars.dir points to a > directory that could be world writable it potentially opens a security > problem in both the client side and the RS. We should consider to have a > switch to enable or disable this option and it should be off by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
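The split suggested for 2.x (always off on the client, still available on the server) could be sketched as a lookup with a side-dependent default. The key name `hbase.use.dynamic.jars`, the `DynamicJarsPolicy` helper, and the use of `java.util.Properties` in place of Hadoop's `Configuration` are all assumptions for illustration.

```java
import java.util.Properties;

/** Hypothetical helper deciding whether DynamicClassLoader may fetch jars from hbase.dynamic.jars.dir. */
final class DynamicJarsPolicy {
  // Assumed configuration key, for illustration only.
  static final String USE_DYNAMIC_JARS_KEY = "hbase.use.dynamic.jars";

  private DynamicJarsPolicy() {}

  static boolean dynamicJarsEnabled(Properties conf, boolean serverSide) {
    if (!serverSide) {
      // Client side: the dynamic loader is not needed, so force it off regardless of config.
      return false;
    }
    // Server side: on by default so user filters keep working; operators can opt out.
    return Boolean.parseBoolean(conf.getProperty(USE_DYNAMIC_JARS_KEY, "true"));
  }
}
```

The issue itself asks for the switch to be off by default everywhere; changing the server-side default string to `"false"` would give that behavior instead.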
[jira] [Commented] (HBASE-14540) Write Ahead Log Batching Optimization
[ https://issues.apache.org/jira/browse/HBASE-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941309#comment-14941309 ] John Leach commented on HBASE-14540: Elliott, that is what I intuitively thought as well for a long time. A few implementations have changed my mind on this... FYI, here is a nice article on smart batching and why it is important even in low-latency systems. http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html Stack, let me know if I can help on the testing front. I know you put a ton of work into the disruptor piece. > Write Ahead Log Batching Optimization > - > > Key: HBASE-14540 > URL: https://issues.apache.org/jira/browse/HBASE-14540 > Project: HBase > Issue Type: Improvement >Reporter: John Leach > Attachments: HBaseWALBlockingWaitStrategy.java > > > The new write ahead log mechanism seems to batch too few mutations when > running inside the disruptor. As we scaled our load up (many threads with > small writes), we saw the number of hdfs sync operations grow in concert with > the number of writes. Generally, one would expect the size of the batches to > grow but the number of actual sync operations to settle. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14347) Add a switch to DynamicClassLoader to disable it and make that the default
[ https://issues.apache.org/jira/browse/HBASE-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941372#comment-14941372 ] Esteban Gutierrez commented on HBASE-14347: --- +1. Thanks for working on this one [~huaxiang]. [~mbertozzi], would it be an option for you to load those jars remotely from the RS or the master? For a secure environment I think the only requirement would be to have a signed jar. > Add a switch to DynamicClassLoader to disable it and make that the default > -- > > Key: HBASE-14347 > URL: https://issues.apache.org/jira/browse/HBASE-14347 > Project: HBase > Issue Type: Bug > Components: Client, defaults, regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 0.98.15, 1.0.3 >Reporter: Esteban Gutierrez >Assignee: huaxiang sun > Attachments: HBASE-14347-v001.patch > > > Since HBASE-1936 we have the option to load jars dynamically by default from > HDFS or the local filesystem, however hbase.dynamic.jars.dir points to a > directory that could be world writable it potentially opens a security > problem in both the client side and the RS. We should consider to have a > switch to enable or disable this option and it should be off by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14347) Add a switch to DynamicClassLoader to disable it and make that the default
[ https://issues.apache.org/jira/browse/HBASE-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941403#comment-14941403 ] huaxiang sun commented on HBASE-14347: -- [~mbertozzi] [~esteban] I will follow up, thanks for reviewing. > Add a switch to DynamicClassLoader to disable it and make that the default > -- > > Key: HBASE-14347 > URL: https://issues.apache.org/jira/browse/HBASE-14347 > Project: HBase > Issue Type: Bug > Components: Client, defaults, regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 0.98.15, 1.0.3 >Reporter: Esteban Gutierrez >Assignee: huaxiang sun > Attachments: HBASE-14347-v001.patch > > > Since HBASE-1936 we have the option to load jars dynamically by default from > HDFS or the local filesystem, however hbase.dynamic.jars.dir points to a > directory that could be world writable it potentially opens a security > problem in both the client side and the RS. We should consider to have a > switch to enable or disable this option and it should be off by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14542) test-util.sh completely broken
Elliott Clark created HBASE-14542: - Summary: test-util.sh completely broken Key: HBASE-14542 URL: https://issues.apache.org/jira/browse/HBASE-14542 Project: HBase Issue Type: Bug Reporter: Elliott Clark None of the flags work. It tries to find tests in src/test/java rather than in the modules. It leaves Maven processes running even if you Ctrl-C. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14347) Add a switch to DynamicClassLoader to disable it and make that the default
[ https://issues.apache.org/jira/browse/HBASE-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Esteban Gutierrez updated HBASE-14347: -- Assignee: huaxiang sun (was: Esteban Gutierrez) > Add a switch to DynamicClassLoader to disable it and make that the default > -- > > Key: HBASE-14347 > URL: https://issues.apache.org/jira/browse/HBASE-14347 > Project: HBase > Issue Type: Bug > Components: Client, defaults, regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 0.98.15, 1.0.3 >Reporter: Esteban Gutierrez >Assignee: huaxiang sun > Attachments: HBASE-14347-v001.patch > > > Since HBASE-1936 we have the option to load jars dynamically by default from > HDFS or the local filesystem, however hbase.dynamic.jars.dir points to a > directory that could be world writable it potentially opens a security > problem in both the client side and the RS. We should consider to have a > switch to enable or disable this option and it should be off by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12911) Client-side metrics
[ https://issues.apache.org/jira/browse/HBASE-12911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941358#comment-14941358 ] Hadoop QA commented on HBASE-12911: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764781/12911.yammer.v02.patch against master branch at commit 030ae5f0415b97e5da688c1432ed53fd56990194. ATTACHMENT ID: 12764781 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 31 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15859//console This message is automatically generated. > Client-side metrics > --- > > Key: HBASE-12911 > URL: https://issues.apache.org/jira/browse/HBASE-12911 > Project: HBase > Issue Type: New Feature > Components: Client, Operability, Performance >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk > Fix For: 2.0.0, 1.3.0 > > Attachments: 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, > 0001-HBASE-12911-Client-side-metrics.patch, 12911-0.98.00.patch, > 12911-branch-1.00.patch, 12911.yammer.jpg, 12911.yammer.v00.patch, > 12911.yammer.v01.patch, 12911.yammer.v02.patch, am.jpg, client metrics > RS-Master.jpg, client metrics client.jpg, conn_agg.jpg, connection > attributes.jpg, ltt.jpg, standalone.jpg > > > There's very little visibility into the hbase client. 
Folks who care to add > some kind of metrics collection end up wrapping Table method invocations with > {{System.currentTimeMillis()}}. For a crude example of this, have a look at > what I did in {{PerformanceEvaluation}} for exposing requests latencies up to > {{IntegrationTestRegionReplicaPerf}}. The client is quite complex, there's a > lot going on under the hood that is impossible to see right now without a > profiler. Being a crucial part of the performance of this distributed system, > we should have deeper visibility into the client's function. > I'm not sure that wiring into the hadoop metrics system is the right choice > because the client is often embedded as a library in a user's application. We > should have integration with our metrics tools so that, i.e., a client > embedded in a coprocessor can report metrics through the usual RS channels, > or a client used in a MR job can do the same. > I would propose an interface-based system with pluggable implementations. Out > of the box we'd include a hadoop-metrics implementation and one other, > possibly [dropwizard/metrics|https://github.com/dropwizard/metrics]. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14347) Add a switch to DynamicClassLoader to disable it and make that the default
[ https://issues.apache.org/jira/browse/HBASE-14347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941392#comment-14941392 ] Matteo Bertozzi commented on HBASE-14347: - [~esteban] In theory we are already supposed to do the "remote" load. The problem is the code that copies those "remote" jars locally. I think that was done because it was the easy way to load classes from a remote location: you have the friendly API that loads classes via addURL(), where the URL is expected to be something that Java understands, and hdfs:// is not. Looking at the ClassLoader API, there is a defineClass() that takes an array of bytes. In theory we can leverage that to open the HDFS stream (the jar we want to load), add the classes to our class loader, and avoid the copy-to-local step. That way we can even get rid of the tmp dir. https://docs.oracle.com/javase/7/docs/api/java/security/SecureClassLoader.html#defineClass(java.lang.String,%20byte[],%20int,%20int,%20java.security.CodeSource) I'll let [~huaxiang] look into whether that is possible or not. > Add a switch to DynamicClassLoader to disable it and make that the default > -- > > Key: HBASE-14347 > URL: https://issues.apache.org/jira/browse/HBASE-14347 > Project: HBase > Issue Type: Bug > Components: Client, defaults, regionserver >Affects Versions: 2.0.0, 1.2.0, 1.1.2, 0.98.15, 1.0.3 >Reporter: Esteban Gutierrez >Assignee: huaxiang sun > Attachments: HBASE-14347-v001.patch > > > Since HBASE-1936 we have the option to load jars dynamically by default from > HDFS or the local filesystem, however hbase.dynamic.jars.dir points to a > directory that could be world writable it potentially opens a security > problem in both the client side and the RS. We should consider to have a > switch to enable or disable this option and it should be off by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
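The defineClass() approach could look roughly like the sketch below. Every `.class` entry is read straight from the jar's `InputStream` into memory and defined on demand, so nothing is copied to a local tmp dir. `StreamJarClassLoader` is a hypothetical name. In HBase the stream would presumably come from Hadoop's `FileSystem.open()` on a jar under `hbase.dynamic.jars.dir`, which is not shown here. The sketch passes a null `CodeSource`; honoring signed jars, as suggested above, would instead mean building a `CodeSource` from each entry's `getCodeSigners()`, which this sketch does not attempt.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.CodeSource;
import java.security.SecureClassLoader;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;

/** Hypothetical loader that defines classes from a jar stream without a local copy. */
class StreamJarClassLoader extends SecureClassLoader {
  private final Map<String, byte[]> classBytes = new HashMap<>();

  /** Reads all class entries up front; closes jarStream when done. */
  StreamJarClassLoader(InputStream jarStream, ClassLoader parent) throws IOException {
    super(parent);
    try (JarInputStream jar = new JarInputStream(jarStream)) {
      JarEntry entry;
      while ((entry = jar.getNextJarEntry()) != null) {
        String path = entry.getName();
        if (entry.isDirectory() || !path.endsWith(".class")) {
          continue; // resources and directories are skipped in this sketch
        }
        // "com/example/MyFilter.class" -> "com.example.MyFilter"
        String name = path.substring(0, path.length() - ".class".length()).replace('/', '.');
        classBytes.put(name, readEntry(jar));
      }
    }
  }

  boolean hasClass(String name) {
    return classBytes.containsKey(name);
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    byte[] bytes = classBytes.get(name);
    if (bytes == null) {
      throw new ClassNotFoundException(name);
    }
    // SecureClassLoader.defineClass(String, byte[], int, int, CodeSource);
    // the cast picks this overload, and null means no code signers are attached.
    return defineClass(name, bytes, 0, bytes.length, (CodeSource) null);
  }

  /** Reads the current jar entry; JarInputStream returns -1 at the end of each entry. */
  private static byte[] readEntry(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }
}
```

Holding every class in memory keeps the sketch simple; a real loader for large jars might instead index entries and re-read the stream lazily, which is a separate design choice.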
[jira] [Commented] (HBASE-14536) Balancer & SSH interfering with each other leading to unavailability
[ https://issues.apache.org/jira/browse/HBASE-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941552#comment-14941552 ] Devaraj Das commented on HBASE-14536: - [~stack] it might be a different one. [~sjiang] checked. > Balancer & SSH interfering with each other leading to unavailability > > > Key: HBASE-14536 > URL: https://issues.apache.org/jira/browse/HBASE-14536 > Project: HBase > Issue Type: Bug > Components: master, Region Assignment >Affects Versions: 1.1.2 >Reporter: Devaraj Das >Assignee: Stephen Yuan Jiang > Fix For: 1.1.4 > > Attachments: master-log.tgz > > > Came across this in our cluster: > 1. The meta was assigned to a server 10.0.0.149,16020,1443507203340 > {noformat} > 2015-09-29 06:16:22,472 DEBUG [AM.ZK.Worker-pool2-t56] > master.RegionStates: Onlined 1588230740 on > 10.0.0.149,16020,1443507203340 {ENCODED => 1588230740, NAME => > 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} > {noformat} > 2. The server dies at some point: > {noformat} > 2015-09-29 06:18:25,952 INFO [main-EventThread] > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, > processing expiration [10.0.0.149,16020,1443507203340] > 2015-09-29 06:18:25,955 DEBUG [main-EventThread] master.AssignmentManager: > based on AM, current > region=hbase:meta,,1.1588230740 is on server=10.0.0.149,16020,1443507203340 > server being checked: > 10.0.0.149,16020,1443507203340 > {noformat} > 3. The balancer had computed a plan that contained a move for the meta: > {noformat} > 2015-09-29 06:18:26,833 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] master.HMaster: > balance hri=hbase:meta,,1.1588230740, > src=10.0.0.149,16020,1443507203340, dest=10.0.0.205,16020,1443507257905 > {noformat} > 4. 
The following ensues after this, leading to the meta remaining unassigned: > {noformat} > 2015-09-29 06:18:26,859 DEBUG > [B.defaultRpcServer.handler=12,queue=0,port=16000] > master.AssignmentManager: Offline hbase:meta,,1.1588230740, no need to > unassign since it's on a dead server: 10.0.0.149,16020,1443507203340 > .. > 2015-09-29 06:18:26,899 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] master.RegionStates: > Offlined 1588230740 from 10.0.0.149,16020,1443507203340 > . > 2015-09-29 06:18:26,914 INFO > [B.defaultRpcServer.handler=12,queue=0,port=16000] > master.AssignmentManager: Skip assigning hbase:meta,,1.1588230740, it is > on a dead but not processed yet server: 10.0.0.149,16020,1443507203340 > > 2015-09-29 06:18:26,915 DEBUG [AM.ZK.Worker-pool2-t58] > master.AssignmentManager: Znode hbase:meta,,1.1588230740 deleted, > state: {1588230740 state=OFFLINE, ts=1443507506914, > server=10.0.0.149,16020,1443507203340} > > 2015-09-29 06:18:29,447 DEBUG > [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] master.AssignmentManager: > based on AM, current > region=hbase:meta,,1.1588230740 is on server=null server being checked: > 10.0.0.149,16020,1443507203340 > 2015-09-29 06:18:29,451 INFO [MASTER_META_SERVER_OPERATIONS- > 10.0.0.148:16000-2] handler.MetaServerShutdownHandler: META has been > assigned to otherwhere, skip assigning. > 2015-09-29 06:18:29,452 DEBUG > [MASTER_META_SERVER_OPERATIONS-10.0.0.148:16000-2] > master.DeadServer: Finished processing 10.0.0.149,16020,1443507203340 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
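The sequence in the logs above can be reduced to a small, hypothetical state model (MetaRaceModel is not HBase code): the balancer offlines meta but skips assigning it because its server is "dead but not processed yet", and the meta shutdown handler then skips it because meta no longer appears to be on the dead server. The skipIfSourceDead flag shows one possible guard; it is an assumption for illustration, not the fix adopted for this issue.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified, hypothetical model of the balancer/SSH interleaving; not AssignmentManager. */
class MetaRaceModel {
    private final Set<String> deadNotProcessed = new HashSet<>();
    private String metaLocation; // null means offline / unassigned

    MetaRaceModel(String initialLocation) {
        this.metaLocation = initialLocation;
    }

    String metaLocation() {
        return metaLocation;
    }

    /** Step 2: the hosting server's ephemeral node is deleted. */
    void serverExpired(String server) {
        deadNotProcessed.add(server);
    }

    /** Steps 3-4: the balancer executes its now-stale plan to move meta. */
    void balancerMove(String dest, boolean skipIfSourceDead) {
        String src = metaLocation;
        boolean srcDead = deadNotProcessed.contains(src);
        if (srcDead && skipIfSourceDead) {
            return; // hypothetical guard: leave recovery to shutdown handling
        }
        metaLocation = null; // "Offline ..., no need to unassign since it's on a dead server"
        if (srcDead) {
            return; // "Skip assigning ..., it is on a dead but not processed yet server"
        }
        metaLocation = dest;
    }

    /** Meta shutdown handling: reassigns meta only if it still points at the dead server. */
    void processDeadServer(String server, String liveServer) {
        if (server.equals(metaLocation)) {
            metaLocation = liveServer;
        } // else: "META has been assigned to otherwhere, skip assigning."
        deadNotProcessed.remove(server);
    }
}
```

Run without the guard, the model ends with metaLocation() == null, matching the unassigned meta in the report; with the guard, shutdown handling reassigns it.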
[jira] [Resolved] (HBASE-14543) Have findHangingTests.py dump more info
[ https://issues.apache.org/jira/browse/HBASE-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-14543. --- Resolution: Fixed Fix Version/s: 2.0.0 Pushed some dev tooling hackery to master. > Have findHangingTests.py dump more info > --- > > Key: HBASE-14543 > URL: https://issues.apache.org/jira/browse/HBASE-14543 > Project: HBase > Issue Type: Sub-task > Components: tooling >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: 14543.patch > > > Running the hanging-tests dump, you can get a result that says no hanging tests > and no test failures, but the patch may not have applied, or the hangs may be > because the test was killed. It would be good to know what machine we were > running on, what branch, and what patch when we run the tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
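The core check a hanging-test finder needs can be sketched as follows. This is a Java re-sketch, not the code of dev-support/findHangingTests.py, and it assumes Maven Surefire's console format: a "Running <class>" line when a test class starts and a "Tests run: ... - in <class>" summary line when it finishes, with "<<< FAILURE!" on that line for failures. A class that started but never printed a summary is reported as hanging.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java re-sketch of the hanging/failing test check. */
class HangingTestFinder {
    // Assumed Surefire line printed when a test class starts running.
    private static final Pattern STARTED =
            Pattern.compile("^Running (org\\.apache\\.hadoop\\.hbase\\.\\S+)");
    // Assumed summary line printed when the class finishes.
    private static final Pattern FINISHED =
            Pattern.compile("^Tests run:.* in (org\\.apache\\.hadoop\\.hbase\\.\\S+)");

    private final Set<String> hanging = new LinkedHashSet<>();
    private final Set<String> failing = new LinkedHashSet<>();

    HangingTestFinder(List<String> consoleLines) {
        Set<String> started = new LinkedHashSet<>();
        Set<String> finished = new LinkedHashSet<>();
        for (String line : consoleLines) {
            Matcher start = STARTED.matcher(line);
            if (start.find()) {
                started.add(start.group(1));
                continue;
            }
            Matcher end = FINISHED.matcher(line);
            if (end.find()) {
                finished.add(end.group(1));
                if (line.contains("<<< FAILURE!")) {
                    failing.add(end.group(1));
                }
            }
        }
        // Started but never summarized: likely hung, or its JVM was killed.
        for (String test : started) {
            if (!finished.contains(test)) {
                hanging.add(test);
            }
        }
    }

    Set<String> hanging() {
        return hanging;
    }

    Set<String> failing() {
        return failing;
    }
}
```

As the issue points out, an empty result from this check alone cannot distinguish "all tests passed" from "no tests ran", which is why the extra build metadata is useful.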
[jira] [Updated] (HBASE-14543) Have findHangingTests.py dump more info
[ https://issues.apache.org/jira/browse/HBASE-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14543: -- Attachment: 14543.patch Some ugly hacking on your little python script. No one is allowed to look at it because it is so ugly. Here is what it dumps out now:
{code}
Fetching https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/15839/consoleText
Building remotely on H7 (Mapreduce Falcon Hadoop Pig Zookeeper Tez Hdfs) in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build
Testing patch for HBASE-13819.
Testing patch on branch branch-1.
[INFO] Apache HBase .. SUCCESS [2.840s]
[INFO] Apache HBase - Checkstyle . SUCCESS [0.499s]
[INFO] Apache HBase - Resource Bundle SUCCESS [0.170s]
[INFO] Apache HBase - Annotations SUCCESS [0.948s]
[INFO] Apache HBase - Protocol ... SUCCESS [11.783s]
[INFO] Apache HBase - Common . SUCCESS [1:28.325s]
[INFO] Apache HBase - Procedure .. SUCCESS [1:54.765s]
[INFO] Apache HBase - Client . SUCCESS [1:23.896s]
[INFO] Apache HBase - Hadoop Compatibility ... SUCCESS [7.361s]
[INFO] Apache HBase - Hadoop Two Compatibility ... SUCCESS [7.552s]
[INFO] Apache HBase - Prefix Tree SUCCESS [10.622s]
[INFO] Apache HBase - Server . FAILURE [1:51:00.004s]
[INFO] Apache HBase - Testing Util ... SKIPPED
[INFO] Apache HBase - Thrift . SKIPPED
[INFO] Apache HBase - Rest ... SKIPPED
[INFO] Apache HBase - Shell .. SKIPPED
[INFO] Apache HBase - Integration Tests .. SKIPPED
[INFO] Apache HBase - Examples ... SKIPPED
[INFO] Apache HBase - External Block Cache ... SKIPPED
[INFO] Apache HBase - Assembly ... SKIPPED
[INFO] Apache HBase - Shaded . SKIPPED
[INFO] Apache HBase - Shaded - Client SKIPPED
[INFO] Apache HBase - Shaded - Server SKIPPED
Printing hanging tests
Printing Failing tests
Failing test : org.apache.hadoop.hbase.master.handler.TestEnableTableHandler
{code}
Or, if the patch failed to apply:
{code}
Fetching https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/15859/consoleText
Building remotely on H0 (Hadoop Tez) in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build
Testing patch for HBASE-12911.
Testing patch on branch master.
PATCH APPLICATION FAILED
{code}
> Have findHangingTests.py dump more info > --- > > Key: HBASE-14543 > URL: https://issues.apache.org/jira/browse/HBASE-14543 > Project: HBase > Issue Type: Sub-task > Components: tooling >Reporter: stack >Assignee: stack > Attachments: 14543.patch > > > Running the hanging-tests dump, you can get a result that says no hanging tests > and no test failures, but the patch may not have applied, or the hangs may be > because the test was killed. It would be good to know what machine we were > running on, what branch, and what patch when we run the tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
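Extracting the extra build metadata shown in the dumps above (build host, JIRA issue, branch, and whether the patch applied) comes down to matching a few console lines. The BuildInfo class and its patterns are hypothetical and are inferred only from the sample output in this comment; the real script may match these lines differently.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for the build metadata seen in the sample dumps. */
class BuildInfo {
    private static final Pattern HOST = Pattern.compile("Building remotely on (\\S+)");
    private static final Pattern ISSUE = Pattern.compile("Testing patch for (HBASE-\\d+)\\.");
    // (\\S+) backtracks so the sentence-ending period is not captured.
    private static final Pattern BRANCH = Pattern.compile("Testing patch on branch (\\S+)\\.");

    String host;
    String issue;
    String branch;
    boolean patchApplied = true;

    static BuildInfo parse(List<String> consoleLines) {
        BuildInfo info = new BuildInfo();
        for (String line : consoleLines) {
            info.host = firstMatch(HOST, line, info.host);
            info.issue = firstMatch(ISSUE, line, info.issue);
            info.branch = firstMatch(BRANCH, line, info.branch);
            if (line.contains("PATCH APPLICATION FAILED")) {
                info.patchApplied = false;
            }
        }
        return info;
    }

    /** Keeps the first value seen; later matching lines do not overwrite it. */
    private static String firstMatch(Pattern p, String line, String current) {
        if (current != null) {
            return current;
        }
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }
}
```

With patchApplied reported alongside the hanging and failing lists, an empty test report for a patch that never applied no longer looks like a clean run.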
[jira] [Updated] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-14367: Status: Open (was: Patch Available) > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server-side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
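Treating the flag as a known attribute key means a shell that only writes string key/value pairs onto the descriptor can toggle it. A hypothetical sketch: TableAttributes is a stand-in, not HBase's HTableDescriptor, and both the key name and the disabled-by-default behavior are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a table descriptor's string-valued attribute map. */
class TableAttributes {
    /** Assumed key name; illustrative only. */
    static final String NORMALIZATION_ENABLED = "NORMALIZATION_ENABLED";

    private final Map<String, String> values = new HashMap<>();

    /** Generic setter, the only thing a string-based shell command needs. */
    TableAttributes setValue(String key, String value) {
        values.put(key, value);
        return this;
    }

    /** Typed convenience wrapper over the same key. */
    TableAttributes setNormalizationEnabled(boolean enabled) {
        return setValue(NORMALIZATION_ENABLED, Boolean.toString(enabled));
    }

    /** An absent key means disabled (assumed default); parsing is case-insensitive. */
    boolean isNormalizationEnabled() {
        return Boolean.parseBoolean(values.get(NORMALIZATION_ENABLED));
    }
}
```

Because the typed setter and the generic string setter write the same key, the server-side chore only has to read one attribute regardless of how it was set.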
[jira] [Updated] (HBASE-14420) Zombie Stomping Session
[ https://issues.apache.org/jira/browse/HBASE-14420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14420: -- Attachment: hangers.txt Little report on the last 20 patch runs, created by doing:
{code}
for i in `seq 15839 15859`; do
  python ./dev-support/findHangingTests.py https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/$i/consoleText >> /tmp/report.txt
done
{code}
4 of 20 passed. 1 failed because the patch did not apply. 1 was a 0.98 build that failed a DLR test. 4 had hanging tests/zombies. The others were test failures (flakies). Hanger incidence is falling but not cured yet. > Zombie Stomping Session > --- > > Key: HBASE-14420 > URL: https://issues.apache.org/jira/browse/HBASE-14420 > Project: HBase > Issue Type: Umbrella > Components: test >Reporter: stack >Assignee: stack >Priority: Critical > Attachments: hangers.txt > > > Patch builds are now failing most of the time because we are dropping zombies. > I've confirmed this happens on non-Apache build boxes too. > Left-over zombies consume resources on build boxes (OOME cannot create native > threads). Having to do multiple test runs in the hope that we can get a > non-zombie-making build or making (arbitrary) rulings that the zombies are > 'not related' is a productivity sink. And so on... > This is an umbrella issue for a zombie stomping session that started earlier > this week. Will hang sub-issues of this one. Am running builds back-to-back > on a little cluster to turn out the monsters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14519: -- Fix Version/s: 0.98.16 1.0.3 1.3.0 1.2.0 2.0.0 > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14519: -- Attachment: 14519v2.txt Ok. Disabled the test instead. Applied to 0.98+ > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Antonov updated HBASE-14367: Status: Patch Available (was: Open) > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server-side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14543) Have findHangingTests.py dump more info
stack created HBASE-14543: - Summary: Have findHangingTests.py dump more info Key: HBASE-14543 URL: https://issues.apache.org/jira/browse/HBASE-14543 Project: HBase Issue Type: Sub-task Components: tooling Reporter: stack Assignee: stack Running the hanging-tests dump, you can get a result that says no hanging tests and no test failures, but the patch may not have applied, or the hangs may be because the test was killed. It would be good to know what machine we were running on, what branch, and what patch when we run the tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14538) Remove TestVisibilityLabelsWithDistributedLogReplay, a test for an unsupported feature
[ https://issues.apache.org/jira/browse/HBASE-14538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-14538. --- Resolution: Fixed Assignee: stack I just removed the test from 0.98, branch-1.1 and branch-1.0. These branches still have some DLR tests running, but this one was using loads of resources, so I'm purging it. Got +1 from @apurtell over in HBASE-13744. > Remove TestVisibilityLabelsWithDistributedLogReplay, a test for an > unsupported feature > -- > > Key: HBASE-14538 > URL: https://issues.apache.org/jira/browse/HBASE-14538 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 1.0.3, 1.1.3, 0.98.16 > > > Remove tests that do DLR. I saw one hang over on branch 0.98 test just now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13744) TestCorruptedRegionStoreFile is flaky
[ https://issues.apache.org/jira/browse/HBASE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941703#comment-14941703 ] Andrew Purtell commented on HBASE-13744: bq. We can remove TestVisibilityLabelsWithDistributedLogReplay. It is testing a feature we no longer support. It is removed in master as part of HBASE-12751. Let me remove it as far back as 0.98. +1 to that. Ok, committing the change on this issue everywhere shortly. > TestCorruptedRegionStoreFile is flaky > - > > Key: HBASE-13744 > URL: https://issues.apache.org/jira/browse/HBASE-13744 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 0.98.15 > > Attachments: HBASE-13744-0.98.patch > > > TestCorruptedRegionStoreFile#testLosingFileAfterScannerInit is failing on > recent Jenkins 0.98 builds and I can reproduce it with a few runs locally, > though not every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14523) rolling-restart.sh --graceful will start regionserver process on master node
[ https://issues.apache.org/jira/browse/HBASE-14523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941768#comment-14941768 ] Ted Yu commented on HBASE-14523: Test didn't run: {code} /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/test-framework/dev-support/test-patch.sh: line 838: 10381 Killed $MVN clean test -Dsurefire.rerunFailingTestsCount=2 -P runAllTests -D${PROJECT_NAME}PatchProcess {code} > rolling-restart.sh --graceful will start regionserver process on master node > > > Key: HBASE-14523 > URL: https://issues.apache.org/jira/browse/HBASE-14523 > Project: HBase > Issue Type: Bug > Components: scripts >Affects Versions: 2.0.0 >Reporter: Samir Ahmic >Assignee: Samir Ahmic > Fix For: 2.0.0 > > Attachments: HBASE-14523.patch, HBASE-14523v2.patch > > > In the master branch, the master also acts as a regionserver hosting the 'hbase:meta' table > and has an ephemeral znode created in '/hbase/rs'. Because of this, > rolling-restart.sh --graceful will pick up the master server from ZK at this line: > {code} > online_regionservers=`$bin/hbase zkcli ls $zkrs 2>&1 | tail -1 | sed "s/\[//" > | sed "s/\]//"` > {code} > and will restart it along with the rest of the regionservers. > I'm planning to add some code to the rolling-restart.sh script to filter the master > server out of the above list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
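One way to express the proposed filtering, sketched in Java for consistency with the other examples here rather than as a change to the shell script itself. The helper name is hypothetical, and so is the assumed input format: zkcli prints entries as "[a, b, c]", entries are separated by a comma plus whitespace, and each entry is a host,port,startcode server name. The caller is assumed to already know the master's full server name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the filter; not part of rolling-restart.sh. */
class RegionServerListFilter {
    /**
     * Parses output like "[h1,16020,123, h2,16020,456]". Commas inside a server name
     * (host,port,startcode) are not followed by whitespace, so splitting on a comma
     * plus whitespace separates entries without breaking the names apart.
     */
    static List<String> withoutServer(String zkLsOutput, String excludedServerName) {
        String body = zkLsOutput.trim();
        if (body.startsWith("[")) {
            body = body.substring(1);
        }
        if (body.endsWith("]")) {
            body = body.substring(0, body.length() - 1);
        }
        List<String> servers = new ArrayList<>();
        for (String serverName : body.split(",\\s+")) {
            String name = serverName.trim();
            if (!name.isEmpty() && !name.equals(excludedServerName)) {
                servers.add(name);
            }
        }
        return servers;
    }
}
```

Matching the full server name rather than just the hostname avoids dropping a genuine region server that happens to run on the same host as the master.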
[jira] [Updated] (HBASE-14490) [RpcServer] reuse request read buffer
[ https://issues.apache.org/jira/browse/HBASE-14490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated HBASE-14490: -- Attachment: ByteBufferPool.java bq. Shared pool is a perfect way but we need a low cost way of sharing. Added an example class. It may end up holding many buffers, so you might want to change some parts of it. > [RpcServer] reuse request read buffer > - > > Key: HBASE-14490 > URL: https://issues.apache.org/jira/browse/HBASE-14490 > Project: HBase > Issue Type: Improvement > Components: IPC/RPC >Affects Versions: 2.0.0, 1.0.2 >Reporter: Zephyr Guo >Assignee: Zephyr Guo > Labels: performance > Fix For: 2.0.0, 1.0.2 > > Attachments: ByteBufferPool.java, HBASE-14490-v1.patch, > HBASE-14490-v10.patch, HBASE-14490-v2.patch, HBASE-14490-v3.patch, > HBASE-14490-v4.patch, HBASE-14490-v5.patch, HBASE-14490-v6.patch, > HBASE-14490-v7.patch, HBASE-14490-v8.patch, HBASE-14490-v9.patch > > > Reuse the buffer used to read requests. It is not necessary to free the data buffer for each > request; the optimization reduces the number of ByteBuffer allocations. > *patch modification* > * {{saslReadAndProcess}} and {{processOneRpc}} accept a ByteBuffer instead of > byte[]. > * {{processUnwrappedData}} can reuse the same ByteBuffer that > {{saslReadAndProcess}} used. > * Maintain a reusable ByteBuffer per {{Connection}} for small requests. > ** Buffer size is fixed. > ** Use a SoftReference to reference the buffer. > ** If a request is too large, allocate a temporary ByteBuffer and free it > once {{process()}} has finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
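The per-connection reuse scheme in the description can be sketched as below. ConnectionReadBuffer is a hypothetical name, not a class from the patch, and the sketch assumes each connection reads one request at a time, so the previous buffer is no longer in use when acquire() is called again. The SoftReference lets the GC reclaim an idle buffer under memory pressure, and requests larger than the fixed size get a throwaway buffer.

```java
import java.lang.ref.SoftReference;
import java.nio.ByteBuffer;

/** Hypothetical per-connection read buffer; assumes one in-flight request per connection. */
class ConnectionReadBuffer {
    private final int reusableCapacity;
    private SoftReference<ByteBuffer> cached = new SoftReference<ByteBuffer>(null);

    ConnectionReadBuffer(int reusableCapacity) {
        this.reusableCapacity = reusableCapacity;
    }

    /** Returns a buffer with position 0 and limit == requestLength. */
    ByteBuffer acquire(int requestLength) {
        if (requestLength > reusableCapacity) {
            // Too large to reuse: a temporary buffer that becomes garbage once
            // processing of this request has finished.
            return ByteBuffer.allocate(requestLength);
        }
        ByteBuffer buf = cached.get();
        if (buf == null) {
            // First use, or the GC cleared the soft reference: allocate a fresh buffer.
            buf = ByteBuffer.allocate(reusableCapacity);
            cached = new SoftReference<ByteBuffer>(buf);
        }
        buf.clear();
        buf.limit(requestLength); // expose exactly requestLength bytes to the reader
        return buf;
    }
}
```

A shared pool, as discussed in the comment, trades this per-connection simplicity for lower total memory when many connections are idle; the sketch does not attempt that.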
[jira] [Updated] (HBASE-14544) Allow HConnectionImpl to not refresh the dns on errors
[ https://issues.apache.org/jira/browse/HBASE-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14544: -- Attachment: HBASE-14544.patch > Allow HConnectionImpl to not refresh the dns on errors > -- > > Key: HBASE-14544 > URL: https://issues.apache.org/jira/browse/HBASE-14544 > Project: HBase > Issue Type: Bug >Reporter: Elliott Clark >Assignee: Elliott Clark > Attachments: HBASE-14544.patch > > > Some clusters will have static IP addresses, and forced DNS lookups can cause > extra instability. Allow users to turn that feature off, if wanted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
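A hypothetical sketch of the opt-out: ServerAddress and its refreshDnsOnError flag are illustrative names, not the class or configuration key the patch adds. With the switch on, a connection error triggers a fresh hostname lookup; with it off, the address resolved at construction time is kept.

```java
import java.net.InetSocketAddress;

/** Hypothetical client-side address holder; not HBase's connection implementation. */
class ServerAddress {
    private final String hostname;
    private final int port;
    private final boolean refreshDnsOnError;
    private InetSocketAddress resolved;

    ServerAddress(String hostname, int port, boolean refreshDnsOnError) {
        this.hostname = hostname;
        this.port = port;
        this.refreshDnsOnError = refreshDnsOnError;
        this.resolved = new InetSocketAddress(hostname, port); // resolves once up front
    }

    InetSocketAddress current() {
        return resolved;
    }

    /** Called when an RPC to this server fails. */
    void onConnectionError() {
        if (refreshDnsOnError) {
            // A new InetSocketAddress resolves the hostname again (subject to the JVM's
            // own DNS cache), so a server that came back on a new IP can be found.
            resolved = new InetSocketAddress(hostname, port);
        }
        // Switch off: keep the existing address. On clusters with static IPs this
        // avoids a DNS lookup, and any DNS hiccup, on every connection error.
    }
}
```

The trade-off is the one described in the issue: re-resolution helps when hosts change IPs, but on static-IP clusters it only adds a dependency on DNS being healthy during failures.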
[jira] [Commented] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941822#comment-14941822 ] Hudson commented on HBASE-14519: SUCCESS: Integrated in HBase-1.3-IT #202 (See [https://builds.apache.org/job/HBase-1.3-IT/202/]) HBASE-14519 Purge TestFavoredNodeAssignmentHelper, a test for an (stack: rev 54dc20e920318b98d661ccd567995d704ddbe555) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14544) Allow HConnectionImpl to not refresh the dns on errors
[ https://issues.apache.org/jira/browse/HBASE-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941843#comment-14941843 ] stack commented on HBASE-14544: --- +1 > Allow HConnectionImpl to not refresh the dns on errors > -- > > Key: HBASE-14544 > URL: https://issues.apache.org/jira/browse/HBASE-14544 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.0, 1.1.2 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14544.patch > > > Some clusters will have static IP addresses, and forced DNS lookups can cause > extra instability. Allow users to turn that feature off, if wanted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14292) Call Me Maybe HBase links haved moved
[ https://issues.apache.org/jira/browse/HBASE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941892#comment-14941892 ] Andrew Purtell commented on HBASE-14292: Committing this trivial change and regenerating site today. > Call Me Maybe HBase links haved moved > - > > Key: HBASE-14292 > URL: https://issues.apache.org/jira/browse/HBASE-14292 > Project: HBase > Issue Type: Bug > Components: documentation >Reporter: Robert Yokota >Assignee: Andrew Purtell >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14292.patch, HBASE-14292.patch > > > The links to the Yammer engineering blog have moved. > Please use the following links in section 83.5. Network Consistency and > Partition Tolerance > http://old.eng.yammer.com/call-me-maybe-hbase/ > http://old.eng.yammer.com/call-me-maybe-hbase-addendum/ > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14544) Allow HConnectionImpl to not refresh the dns on errors
Elliott Clark created HBASE-14544: - Summary: Allow HConnectionImpl to not refresh the dns on errors Key: HBASE-14544 URL: https://issues.apache.org/jira/browse/HBASE-14544 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Some clusters will have static IP addresses, and forced DNS lookups can cause extra instability. Allow users to turn that feature off, if wanted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941782#comment-14941782 ] Hadoop QA commented on HBASE-14519: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764843/14519v2.txt against master branch at commit 83f5663a229955b04a55b9b0d6cca71b4d597933. ATTACHMENT ID: 12764843 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15860//console This message is automatically generated. > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941806#comment-14941806 ] Sean Busbey commented on HBASE-14367: - okay, +1 presuming the above confirms that it isn't related. > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server-side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14327) TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
[ https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941863#comment-14941863 ] Hudson commented on HBASE-14327: SUCCESS: Integrated in HBase-1.0 #1067 (See [https://builds.apache.org/job/HBase-1.0/1067/]) HBASE-14327 TestIOFencing#testFencingAroundCompactionAfterWALSync is (stack: rev 0cc5a5d8cffd901c7e0883f24da4e6e808aefb39) * hbase-server/src/test/java/org/apache/hadoop/hbase/TestIOFencing.java > TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky > -- > > Key: HBASE-14327 > URL: https://issues.apache.org/jira/browse/HBASE-14327 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Dima Spivak >Assignee: Heng Chen >Priority: Critical > Fix For: 2.0.0, 1.2.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-14327.patch > > > I'm looking into some more of the flaky tests on trunk and this one seems to > be particularly gross, failing about half the time in recent days. Some > probably-relevant output from [a recent > run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]: > {noformat} > 2015-08-27 18:50:14,318 INFO [main] hbase.TestIOFencing(326): Allowing > compaction to proceed > 2015-08-27 18:50:14,318 DEBUG [main] > hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions > 2015-08-27 18:50:14,318 DEBUG > [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): > Removing store files after compaction... > 2015-08-27 18:50:14,323 DEBUG > [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): > Removing store files after compaction... > 2015-08-27 18:50:14,330 DEBUG > [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): > Archiving compacted store files. > 2015-08-27 18:50:14,331 DEBUG > [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): > Archiving compacted store files. 
> 2015-08-27 18:50:14,337 DEBUG > [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): > Finished archiving from class > org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, > file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, > to > hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464 > 2015-08-27 18:50:14,337 DEBUG > [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): > Finished archiving from class > org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, > file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, > to > hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe > 2015-08-27 18:50:14,341 DEBUG > [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): > Finished archiving from class > org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, > file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, > to > hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc > 2015-08-27 18:50:14,342 INFO > [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): > Completed compaction of 2 (all) file(s) in family of > tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. 
into > e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total size for store is 146.9 > K. This selection was in queue for 0sec, and took 10sec to execute. > 2015-08-27 18:50:14,343 INFO > [RS:0;hemera:35619-longCompactions-1440701391112] > regionserver.CompactSplitThread$CompactionRunner(527): Completed compaction: > Request = > regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., > storeName=family, fileCount=2, fileSize=73.1 K, priority=998, > time=525052314434020; duration=10sec > 2015-08-27 18:50:14,343 DEBUG > [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): > Finished archiving from class > org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, >
[jira] [Commented] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941862#comment-14941862 ] Hudson commented on HBASE-14519: SUCCESS: Integrated in HBase-1.0 #1067 (See [https://builds.apache.org/job/HBase-1.0/1067/]) HBASE-14519 Purge TestFavoredNodeAssignmentHelper, a test for an (stack: rev bda54c5c6d6ac5e56de419eccd2d70bec5268018) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14538) Remove TestVisibilityLabelsWithDistributedLogReplay, a test for an unsupported feature
[ https://issues.apache.org/jira/browse/HBASE-14538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941864#comment-14941864 ] Hudson commented on HBASE-14538: SUCCESS: Integrated in HBase-1.0 #1067 (See [https://builds.apache.org/job/HBase-1.0/1067/]) HBASE-14538 Remove TestVisibilityLabelsWithDistributedLogReplay, a test (stack: rev 5331b5484bbe833c236b5768cffbeec2b5459b7d) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDistributedLogReplay.java > Remove TestVisibilityLabelsWithDistributedLogReplay, a test for an > unsupported feature > -- > > Key: HBASE-14538 > URL: https://issues.apache.org/jira/browse/HBASE-14538 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 1.0.3, 1.1.3, 0.98.16 > > > Remove tests that do DLR. I saw one hang over on branch 0.98 test just now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14546) Backport stub DNS re-resolution options to 0.98
Andrew Purtell created HBASE-14546: -- Summary: Backport stub DNS re-resolution options to 0.98 Key: HBASE-14546 URL: https://issues.apache.org/jira/browse/HBASE-14546 Project: HBase Issue Type: Task Reporter: Andrew Purtell Priority: Minor Fix For: 0.98.16 HBASE-12943 and HBASE-13067 address infinite DNS caching, which prevents servers from rejoining a cluster with the same hostname but a different IP address. HBASE-14544 makes this re-resolution optional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941886#comment-14941886 ] Mikhail Antonov commented on HBASE-14367: - Thanks Sean. I can reproduce the TestMasterFailover failure without this patch as well (at least on branch-1.2); I'll file a jira for that. Any more reviews? cc [~ndimiduk] > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14545) TestMasterFailover often times out
Mikhail Antonov created HBASE-14545: --- Summary: TestMasterFailover often times out Key: HBASE-14545 URL: https://issues.apache.org/jira/browse/HBASE-14545 Project: HBase Issue Type: Bug Components: test Affects Versions: 1.2.0 Reporter: Mikhail Antonov Fix For: 1.2.0
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 301.644 sec <<< FAILURE! - in org.apache.hadoop.hbase.master.TestMasterFailover
testMasterFailoverWithMockedRIT(org.apache.hadoop.hbase.master.TestMasterFailover) Time elapsed: 240.112 sec <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 240000 milliseconds
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146)
    at org.apache.hadoop.hbase.MiniHBaseCluster.waitForActiveAndReadyMaster(MiniHBaseCluster.java:535)
    at org.apache.hadoop.hbase.HBaseCluster.waitForActiveAndReadyMaster(HBaseCluster.java:280)
    at org.apache.hadoop.hbase.master.TestMasterFailover.testMasterFailoverWithMockedRIT(TestMasterFailover.java:400)
Results :
Tests in error:
  TestMasterFailover.testMasterFailoverWithMockedRIT:400 » TestTimedOut test tim...
Tests run: 7, Failures: 0, Errors: 1, Skipped: 0
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed
[ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HBASE-12769: -- Assignee: Jianwei Cui > Replication fails to delete all corresponding zk nodes when peer is removed > --- > > Key: HBASE-12769 > URL: https://issues.apache.org/jira/browse/HBASE-12769 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 0.99.2 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, > HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch > > > When a peer is removed, the client deletes the peerId under the peersZNode; > live region servers are then notified and delete the corresponding > hlog queues under their replication rsZNodes. However, if there are failed > servers whose hlog queues have not yet been transferred to live servers (this > is likely when "replication.sleep.before.failover" is set to a large value > and many region servers restart), these hlog queues won't be deleted > after the peer is removed. I think remove_peer should guarantee that all > corresponding zk nodes have been removed when it completes; otherwise, if we > create a new peer with the same peerId as the removed one, unexpected data > might be replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
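The description argues that remove_peer should also clean up queues left under failed servers, not only those of live servers. A minimal sketch of that cleanup, assuming an in-memory map in place of ZooKeeper; `ReplicationQueuesModel`, its methods, and the queue-id convention (peer id, optionally followed by `-` and dead server names) are hypothetical and only illustrate the idea. A real implementation would also have to cope with live servers concurrently claiming those queues.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for the replication rsZNode tree:
 * server name -> (queue id -> hlog names). No ZooKeeper involved.
 */
class ReplicationQueuesModel {
  private final Map<String, Map<String, List<String>>> rsZNodes = new TreeMap<>();

  void addLog(String server, String queueId, String hlog) {
    rsZNodes.computeIfAbsent(server, s -> new TreeMap<>())
        .computeIfAbsent(queueId, q -> new ArrayList<>())
        .add(hlog);
  }

  /** Assumed convention: a queue id starts with the peer id, optionally followed by "-" and dead server names. */
  static String peerIdOf(String queueId) {
    int dash = queueId.indexOf('-');
    return dash < 0 ? queueId : queueId.substring(0, dash);
  }

  /**
   * Deletes every queue of peerId under every server, including failed servers whose
   * queues were never transferred, and returns the number of queues deleted.
   */
  int removePeerQueues(String peerId) {
    int removed = 0;
    for (Map<String, List<String>> queues : rsZNodes.values()) {
      Iterator<String> it = queues.keySet().iterator();
      while (it.hasNext()) {
        if (peerIdOf(it.next()).equals(peerId)) {
          it.remove(); // removing from keySet() removes the whole mapping
          removed++;
        }
      }
    }
    return removed;
  }

  int queueCount() {
    int n = 0;
    for (Map<String, List<String>> queues : rsZNodes.values()) {
      n += queues.size();
    }
    return n;
  }

  public static void main(String[] args) {
    ReplicationQueuesModel m = new ReplicationQueuesModel();
    m.addLog("rs1,16020,1", "1", "hlog.a");
    m.addLog("rs2,16020,1", "1", "hlog.b"); // failed server, queue never transferred
    m.addLog("rs1,16020,1", "2", "hlog.c"); // a different peer
    System.out.println(m.removePeerQueues("1") + " queues removed, " + m.queueCount() + " left");
  }
}
```

The point of iterating over every server entry, rather than only the caller's own rsZNode, is that nothing else will ever delete a dead server's unclaimed queue once the peer is gone.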
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941788#comment-14941788 ] Mikhail Antonov commented on HBASE-14367: - 'mvn test -P runLargeTests' passes on hbase-shell with this patch (there are no small or medium tests in this module). TestMasterFailover is flaky on my machine too. I'll test it more with and without this patch, as it doesn't look related. > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13744) TestCorruptedRegionStoreFile is flaky
[ https://issues.apache.org/jira/browse/HBASE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13744: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 1.3.0 1.2.0 2.0.0 Status: Resolved (was: Patch Available) Committed to all relevant branches. Test passes locally on every branch, looped 10 times. > TestCorruptedRegionStoreFile is flaky > - > > Key: HBASE-13744 > URL: https://issues.apache.org/jira/browse/HBASE-13744 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15 > > Attachments: HBASE-13744-0.98.patch > > > TestCorruptedRegionStoreFile#testLosingFileAfterScannerInit is failing on > recent Jenkins 0.98 builds and I can reproduce it with a few runs locally, > though not every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14542) test-util.sh completely broken
[ https://issues.apache.org/jira/browse/HBASE-14542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941877#comment-14941877 ] Andrew Purtell commented on HBASE-14542: Let's remove it. > test-util.sh completely broken > -- > > Key: HBASE-14542 > URL: https://issues.apache.org/jira/browse/HBASE-14542 > Project: HBase > Issue Type: Bug >Reporter: Elliott Clark > > None of the flags work. > It tries to find tests in src/test/java rather than in the modules. > It leaves maven processes running even if you ctrl-c. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed
[ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12769: --- Attachment: 12769-v4.txt > Replication fails to delete all corresponding zk nodes when peer is removed > --- > > Key: HBASE-12769 > URL: https://issues.apache.org/jira/browse/HBASE-12769 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 0.99.2 >Reporter: Jianwei Cui >Priority: Minor > Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, > HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch > > > When a peer is removed, the client deletes the peerId under the peersZNode; > live region servers are then notified and delete the corresponding > hlog queues under their replication rsZNodes. However, if there are failed > servers whose hlog queues have not yet been transferred to live servers (this > is likely when "replication.sleep.before.failover" is set to a large value > and many region servers restart), these hlog queues won't be deleted > after the peer is removed. I think remove_peer should guarantee that all > corresponding zk nodes have been removed when it completes; otherwise, if we > create a new peer with the same peerId as the removed one, unexpected data > might be replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14544) Allow HConnectionImpl to not refresh the dns on errors
[ https://issues.apache.org/jira/browse/HBASE-14544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14544: -- Fix Version/s: 1.3.0 1.2.0 2.0.0 Affects Version/s: 1.2.0 1.1.2 Status: Patch Available (was: Open) > Allow HConnectionImpl to not refresh the dns on errors > -- > > Key: HBASE-14544 > URL: https://issues.apache.org/jira/browse/HBASE-14544 > Project: HBase > Issue Type: Bug >Affects Versions: 1.1.2, 1.2.0 >Reporter: Elliott Clark >Assignee: Elliott Clark > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14544.patch > > > Some clusters have static IP addresses, and forced DNS lookups can cause > extra instability. Allow users to turn that feature off if wanted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
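HBASE-14544 (and the 0.98 backport tracked in HBASE-14546) makes re-resolution after a connection error optional. A minimal sketch of such a switch, assuming a pluggable lookup function instead of real DNS; `CachedAddress` and `refreshOnError` are hypothetical names, not the configuration key or classes used by the patch.

```java
import java.util.function.Function;

/**
 * Hypothetical address cache: after a connection error it re-resolves the hostname
 * only when refreshOnError is true.
 */
class CachedAddress {
  private final String hostname;
  private final Function<String, String> resolver; // hostname -> IP; a DNS lookup in practice
  private final boolean refreshOnError;
  private String cachedIp;

  CachedAddress(String hostname, Function<String, String> resolver, boolean refreshOnError) {
    this.hostname = hostname;
    this.resolver = resolver;
    this.refreshOnError = refreshOnError;
    this.cachedIp = resolver.apply(hostname); // resolve once up front
  }

  String ip() {
    return cachedIp;
  }

  /** Called when connecting to the cached IP fails. */
  void onConnectionError() {
    if (refreshOnError) {
      // Re-resolve so a server that came back under a new IP can be found again.
      cachedIp = resolver.apply(hostname);
    }
    // Otherwise keep the cached address: with static IPs the extra lookup buys nothing.
  }

  public static void main(String[] args) {
    int[] lookups = {0};
    Function<String, String> fakeDns = h -> "10.0.0." + (++lookups[0]);
    CachedAddress refreshing = new CachedAddress("rs1.example.com", fakeDns, true);
    refreshing.onConnectionError();
    CachedAddress pinned = new CachedAddress("rs1.example.com", fakeDns, false);
    pinned.onConnectionError();
    System.out.println(refreshing.ip() + " " + pinned.ip() + " lookups=" + lookups[0]);
  }
}
```

The trade-off is the one the two issues describe: re-resolving helps servers that rejoin with a new IP, while pinning avoids extra lookups (and the instability they can cause) on clusters with static addresses.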
[jira] [Commented] (HBASE-9049) Generalize ServerCallable creation to support custom callables
[ https://issues.apache.org/jira/browse/HBASE-9049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941883#comment-14941883 ] Jesse Yates commented on HBASE-9049: Thinking about this... it seems like we might want to keep the class around as a utility for users. If we do, then it should have some sort of docs (even if just minimal); otherwise we throw out the class. Yes, it's a simple class and anyone could rewrite it, but keeping it saves everyone from rewriting it. I dunno, maybe people don't need it? Or the PMC wants a slimmer codebase? What say you, mr. [~stack]? > Generalize ServerCallable creation to support custom callables > -- > > Key: HBASE-9049 > URL: https://issues.apache.org/jira/browse/HBASE-9049 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.95.2, 0.94.11 >Reporter: Jesse Yates >Assignee: Jesse Yates > Attachments: hbase-9049-trunk-v0.patch, hbase-9049-trunk-v1.patch, > hbase-9049-trunk-v2.patch, hbase-9049-trunk-v3.patch, > hbase-9049-trunk-v4.patch > > > Currently, server callables are instantiated via direct calls. Instead, we can > use a single factory, which allows more specialized callable > implementations, for instance ones using a circuit-breaker pattern (or the > Hystrix implementation!) to minimize attempts to contact the server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
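The description proposes creating callables through one factory so that a wrapper such as a circuit breaker can be slotted in. A minimal sketch of that shape using `java.util.concurrent.Callable`; `CallableFactory`, `DirectCallableFactory`, and `CircuitBreakerFactory` are hypothetical names, the breaker is hand-rolled rather than Hystrix, and none of this is the ServerCallable API from the attached patches.

```java
import java.util.concurrent.Callable;

/** Hypothetical factory: client code asks it for callables instead of constructing them directly. */
interface CallableFactory {
  <T> Callable<T> wrap(String server, Callable<T> call);
}

/** Default behavior: the call is used as-is. */
class DirectCallableFactory implements CallableFactory {
  @Override
  public <T> Callable<T> wrap(String server, Callable<T> call) {
    return call;
  }
}

/**
 * Hand-rolled circuit breaker (not Hystrix): after `threshold` consecutive failures,
 * calls fail fast for `openMillis` instead of contacting the server. One breaker is
 * shared by every server this factory wraps, to keep the sketch short.
 */
class CircuitBreakerFactory implements CallableFactory {
  private final int threshold;
  private final long openMillis;
  private int consecutiveFailures = 0;
  private long openedAt = -1; // -1 means the circuit is closed

  CircuitBreakerFactory(int threshold, long openMillis) {
    this.threshold = threshold;
    this.openMillis = openMillis;
  }

  @Override
  public <T> Callable<T> wrap(String server, Callable<T> call) {
    return () -> {
      synchronized (this) {
        if (openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis) {
          // Fail fast without touching the server; this does not count as a new failure.
          throw new IllegalStateException("circuit open, not contacting " + server);
        }
      }
      try {
        T result = call.call();
        synchronized (this) {
          consecutiveFailures = 0; // success closes the circuit
          openedAt = -1;
        }
        return result;
      } catch (Exception e) {
        synchronized (this) {
          if (++consecutiveFailures >= threshold) {
            openedAt = System.currentTimeMillis(); // open (or re-open) the circuit
          }
        }
        throw e;
      }
    };
  }
}
```

Call sites would only ever do `factory.wrap(serverName, call).call()`, so switching from `DirectCallableFactory` to `CircuitBreakerFactory` changes how unreachable servers are handled without touching those call sites, which is the benefit the description is after.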
[jira] [Commented] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed
[ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941885#comment-14941885 ] Ted Yu commented on HBASE-12769: [~apurtell]: Mind taking a look at patch v4? > Replication fails to delete all corresponding zk nodes when peer is removed > --- > > Key: HBASE-12769 > URL: https://issues.apache.org/jira/browse/HBASE-12769 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 0.99.2 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, > HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch > > > When a peer is removed, the client deletes the peerId under the peersZNode; > live region servers are then notified and delete the corresponding > hlog queues under their replication rsZNodes. However, if there are failed > servers whose hlog queues have not yet been transferred to live servers (this > is likely when "replication.sleep.before.failover" is set to a large value > and many region servers restart), these hlog queues won't be deleted > after the peer is removed. I think remove_peer should guarantee that all > corresponding zk nodes have been removed when it completes; otherwise, if we > create a new peer with the same peerId as the removed one, unexpected data > might be replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
[ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941922#comment-14941922 ] Andrew Purtell commented on HBASE-12790: I added a comment above as a reply to another comment. Basically, the perf gain and impact are not characterized well enough yet, but I think it's just a testing issue, and an improved test run and analysis will do the trick. See https://issues.apache.org/jira/browse/HBASE-12790?focusedCommentId=14941919=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14941919 > Support fairness across parallelized scans > -- > > Key: HBASE-12790 > URL: https://issues.apache.org/jira/browse/HBASE-12790 > Project: HBase > Issue Type: New Feature >Reporter: James Taylor >Assignee: ramkrishna.s.vasudevan > Labels: Phoenix > Attachments: AbstractRoundRobinQueue.java, HBASE-12790.patch, > HBASE-12790_1.patch, HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, > HBASE-12790_trunk_1.patch > > > Some HBase clients parallelize the execution of a scan to reduce latency in > getting back results. This can lead to starvation with a loaded cluster and > interleaved scans, since the RPC queue will be ordered and processed on a > FIFO basis. For example, suppose two clients, A and B, submit largish > scans at the same time. Say each scan is broken down into 100 scans by the > client (broken down into equal depth chunks along the row key), and the 100 > scans of client A are queued first, followed immediately by the 100 scans of > client B. In this case, client B will be starved out of getting any results > back until the scans for client A complete. > One solution to this is to use the attached AbstractRoundRobinQueue instead > of the standard FIFO queue. The queue to be used could be (maybe it already > is) configurable based on a new config parameter.
Using this queue would > require the client to use the same identifier for all of the 100 parallel > scans that represent a single logical scan from the client's point of view. > With this information, the round robin queue would pick tasks off the > queue in a round robin fashion (instead of strictly FIFO) to prevent > starvation across interleaved parallelized scans. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
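The round-robin idea can be sketched as a queue keyed by the logical-scan identifier the description asks clients to send. This is a minimal, single-threaded illustration under that assumption; `RoundRobinQueue` and its methods are hypothetical and do not reproduce the attached AbstractRoundRobinQueue or HBase's RPC scheduler interfaces.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Round-robin queue sketch: tasks are grouped by an id shared by all parallel chunks
 * of one logical scan, and poll() serves one task per group in turn.
 * Not thread-safe, and not the attached AbstractRoundRobinQueue.
 */
class RoundRobinQueue<E> {
  private final Map<String, Deque<E>> byGroup = new HashMap<>();
  private final Deque<String> rotation = new ArrayDeque<>(); // groups with pending tasks

  void offer(String groupId, E task) {
    Deque<E> tasks = byGroup.get(groupId);
    if (tasks == null) {
      tasks = new ArrayDeque<>();
      byGroup.put(groupId, tasks);
      rotation.addLast(groupId); // a new group joins the back of the rotation
    }
    tasks.addLast(task);
  }

  /** Returns the next task, or null when the queue is empty. */
  E poll() {
    String group = rotation.pollFirst();
    if (group == null) {
      return null;
    }
    Deque<E> tasks = byGroup.get(group);
    E task = tasks.pollFirst();
    if (tasks.isEmpty()) {
      byGroup.remove(group);
    } else {
      rotation.addLast(group); // still has work: go to the back of the line
    }
    return task;
  }

  public static void main(String[] args) {
    RoundRobinQueue<String> q = new RoundRobinQueue<>();
    for (int i = 1; i <= 3; i++) q.offer("clientA", "A" + i); // A's chunks arrive first
    for (int i = 1; i <= 3; i++) q.offer("clientB", "B" + i);
    StringBuilder order = new StringBuilder();
    for (String t = q.poll(); t != null; t = q.poll()) order.append(t).append(' ');
    System.out.println(order.toString().trim()); // A1 B1 A2 B2 A3 B3
  }
}
```

With client A's chunks queued before client B's, plain FIFO would serve A1 A2 A3 B1 B2 B3; the rotation interleaves the two logical scans, so B starts getting results immediately instead of waiting for all of A.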
[jira] [Commented] (HBASE-14475) Region split requests are always audited with "hbase" user rather than request user
[ https://issues.apache.org/jira/browse/HBASE-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941967#comment-14941967 ] stack commented on HBASE-14475: --- Seems to have broken TestMasterFailover.
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hbase-server
7aaef0f9208654acbb352804ee823337417e6488 is the first bad commit
commit 7aaef0f9208654acbb352804ee823337417e6488
Author: Andrew Purtell
Date: Thu Oct 1 12:09:13 2015 -0700
    HBASE-14475 Region split requests are always audited with hbase user rather than request user (Ted Yu)
:040000 040000 3f4a4707f472f34d6cbfcb29c0a8c93d64328a17 a22025f35cc6419ac9b61659df38445ff561a561 M hbase-server
bisect run success
Here is my little script:
#!/bin/sh
mvn clean && mvn install -DskipTests && mvn test -Dtest=TestMasterFailover
Fixing over in HBASE-14545. Looking at the patch, I'm not sure how... but when I remove it, TestMasterFailover no longer times out. > Region split requests are always audited with "hbase" user rather than > request user > --- > > Key: HBASE-14475 > URL: https://issues.apache.org/jira/browse/HBASE-14475 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 14475-0.98.txt, 14475-branch-1-v2.txt, > 14475-branch-1-v3.txt, 14475-v2.txt, 14475-v3.txt, 14475-v3.txt, > HBASE-14475-branch-1.0.patch > > > [~madhan.neethiraj] from Ranger reported that when a region split request is > initiated by a user, we always audit (and do the permission check) > against the hbase user, not the requesting user. > The issue is that a split request coming from the user is only > processed later by the CompactSplitThread, asynchronously to the > splitRegion RPC. > RSRpcServices.splitRegion() only does a flush from the handler thread and > then calls regionServer.compactSplitThread.requestSplit(), which puts a > SplitRequest on the split queue.
The split request is handled by the split > executor from CompactSplitThread. > Since the split is actually executed by the compact split thread, the > preSplit() for the AccessController is called from the executor thread. In > this thread, we no longer have the user who initially requested the split, so > the user in the context (UGI) is "hbase", causing the AC.preSplit() access > control check to always be performed against the hbase user, not the user > who submitted the request. The audit log also contains the "hbase" user > rather than the actual user. > Luckily, the split forces an in-line flush of the region (from the handler > thread), which requires {{CREATE|ADMIN}} permission. Split requires > {{ADMIN}}, but due to this bug {{CREATE}} is also sufficient (although we > have not verified it manually). {{CREATE}} permission can do flushes and > compactions, so this is not a security issue (I think). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
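The root cause described above is that the identity used for the check is tied to the thread, and the split runs later on a different thread. A minimal sketch of the problem and of one possible fix (capturing the requester when the request is queued and reinstating it when the task runs), assuming a `ThreadLocal` stand-in for UGI; `SplitAuditSketch` and its methods are hypothetical, not HBase code and not the committed patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical model of the bug: the identity a permission check sees is per-thread,
 * and the split runs later on a pool thread that only knows the service user.
 */
class SplitAuditSketch {
  /** Stand-in for the caller identity (UGI); pool threads default to "hbase". */
  static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "hbase");
  /** Users the stand-in preSplit() check was evaluated against, in order. */
  static final List<String> AUDIT = Collections.synchronizedList(new ArrayList<>());

  static void preSplitCheck() {
    AUDIT.add(CURRENT_USER.get());
  }

  /** Bug shape: the queued task reads whatever user the executor thread has. */
  static Runnable requestSplitBuggy() {
    return SplitAuditSketch::preSplitCheck;
  }

  /** Fix shape: capture the requester on the handler thread, reinstate it on the executor thread. */
  static Runnable requestSplitFixed() {
    final String requester = CURRENT_USER.get(); // captured while still serving the RPC
    return () -> {
      String previous = CURRENT_USER.get();
      CURRENT_USER.set(requester);
      try {
        preSplitCheck();
      } finally {
        CURRENT_USER.set(previous); // do not leak the identity into later pool tasks
      }
    };
  }

  public static void main(String[] args) throws Exception {
    ExecutorService splitPool = Executors.newSingleThreadExecutor();
    CURRENT_USER.set("alice"); // the RPC handler thread is serving alice
    splitPool.submit(requestSplitBuggy()).get();
    splitPool.submit(requestSplitFixed()).get();
    splitPool.shutdown();
    System.out.println(AUDIT); // [hbase, alice]
  }
}
```

Mapped back to the issue, this suggests carrying the request user along with the queued split request; the sketch does not show which HBase API would be used for that.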
[jira] [Updated] (HBASE-14432) Procedure V2 - enforce ACL on procedure admin tasks
[ https://issues.apache.org/jira/browse/HBASE-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14432: --- Attachment: HBASE-14432-draft.patch > Procedure V2 - enforce ACL on procedure admin tasks > --- > > Key: HBASE-14432 > URL: https://issues.apache.org/jira/browse/HBASE-14432 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0, 1.1.2, 1.3.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Labels: security > Attachments: HBASE-14432-draft.patch > > > In the Procedure class, the owner field is never set. We need to set it so > that we can enforce ACLs on admin tasks such as whether a user has privilege > to abort a procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14432) Procedure V2 - enforce ACL on procedure admin tasks
[ https://issues.apache.org/jira/browse/HBASE-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942015#comment-14942015 ] Stephen Yuan Jiang commented on HBASE-14432: Updated the earlier patch, adding Access Control coprocessor support and setOwner calls on procedures for procedure V2 admin tasks. The change can be reviewed on RB: https://reviews.apache.org/r/38974/ > Procedure V2 - enforce ACL on procedure admin tasks > --- > > Key: HBASE-14432 > URL: https://issues.apache.org/jira/browse/HBASE-14432 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0, 1.1.2, 1.3.0 >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Labels: security > Attachments: HBASE-14432-draft.patch > > > In the Procedure class, the owner field is never set. We need to set it so > that we can enforce ACLs on admin tasks, such as checking whether a user has the privilege > to abort a procedure. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
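The description says the Procedure owner field is never set, so an abort cannot be checked against the submitter. A minimal sketch of why setting the owner matters, assuming a policy where the owner or a global admin may abort; `ProcedureAcl`, `SketchProcedure`, and that policy are hypothetical and are not the rules implemented in the patch under review.

```java
import java.util.Set;

/** Hypothetical check: a procedure may be aborted by its owner or by a global admin. */
class ProcedureAcl {
  private final Set<String> admins;

  ProcedureAcl(Set<String> admins) {
    this.admins = admins;
  }

  boolean canAbort(String user, SketchProcedure proc) {
    if (admins.contains(user)) {
      return true;
    }
    // While the owner is never set (the bug), no ordinary user can ever match it.
    String owner = proc.getOwner();
    return owner != null && owner.equals(user);
  }

  public static void main(String[] args) {
    ProcedureAcl acl = new ProcedureAcl(Set.of("hbase"));
    SketchProcedure proc = new SketchProcedure();
    System.out.println("before setOwner: alice may abort = " + acl.canAbort("alice", proc));
    proc.setOwner("alice"); // set from the requesting user when the procedure is submitted
    System.out.println("after setOwner: alice may abort = " + acl.canAbort("alice", proc));
  }
}

/** Hypothetical procedure that remembers who submitted it. */
class SketchProcedure {
  private String owner;

  void setOwner(String owner) {
    this.owner = owner;
  }

  String getOwner() {
    return owner;
  }
}
```

Without the owner, the only usable rule is "admins only"; recording the submitter is what makes a finer-grained rule like this one possible.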
[jira] [Commented] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942019#comment-14942019 ] Hudson commented on HBASE-14519: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1092 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1092/]) HBASE-14519 Purge TestFavoredNodeAssignmentHelper, a test for an (stack: rev 931eb8c59c9abb4be4fc69ff881a909daf4c2f4e) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942021#comment-14942021 ] Hudson commented on HBASE-14519: FAILURE: Integrated in HBase-TRUNK #6866 (See [https://builds.apache.org/job/HBase-TRUNK/6866/]) HBASE-14519 Purge TestFavoredNodeAssignmentHelper, a test for an (stack: rev 83f5663a229955b04a55b9b0d6cca71b4d597933) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13744) TestCorruptedRegionStoreFile is flaky
[ https://issues.apache.org/jira/browse/HBASE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942022#comment-14942022 ] Hudson commented on HBASE-13744: FAILURE: Integrated in HBase-TRUNK #6866 (See [https://builds.apache.org/job/HBase-TRUNK/6866/]) HBASE-13744 TestCorruptedRegionStoreFile is flaky (apurtell: rev 39c0b8f6db2c609fcc74f09ece00fdfb8c5aa003) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCorruptedRegionStoreFile.java > TestCorruptedRegionStoreFile is flaky > - > > Key: HBASE-13744 > URL: https://issues.apache.org/jira/browse/HBASE-13744 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Andrew Purtell > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15 > > Attachments: HBASE-13744-0.98.patch > > > TestCorruptedRegionStoreFile#testLosingFileAfterScannerInit is failing on > recent Jenkins 0.98 builds and I can reproduce it with a few runs locally, > though not every time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14538) Remove TestVisibilityLabelsWithDistributedLogReplay, a test for an unsupported feature
[ https://issues.apache.org/jira/browse/HBASE-14538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942020#comment-14942020 ] Hudson commented on HBASE-14538: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1092 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1092/]) HBASE-14538 Remove TestVisibilityLabelsWithDistributedLogReplay, a test (stack: rev 5c57975d6ed0a8141b150b040c66d0a590cb9dcf) * hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithDistributedLogReplay.java > Remove TestVisibilityLabelsWithDistributedLogReplay, a test for an > unsupported feature > -- > > Key: HBASE-14538 > URL: https://issues.apache.org/jira/browse/HBASE-14538 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 1.0.3, 1.1.3, 0.98.16 > > > Remove tests that do DLR. I saw one hang over on branch 0.98 test just now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14543) Have findHangingTests.py dump more info
[ https://issues.apache.org/jira/browse/HBASE-14543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941902#comment-14941902 ] Hudson commented on HBASE-14543: FAILURE: Integrated in HBase-TRUNK #6865 (See [https://builds.apache.org/job/HBase-TRUNK/6865/]) HBASE-14543 Have findHangingTests.py dump more info (stack: rev 26dec4c60d60a868dccd28aabd06b16302491b1b) * dev-support/findHangingTests.py > Have findHangingTests.py dump more info > --- > > Key: HBASE-14543 > URL: https://issues.apache.org/jira/browse/HBASE-14543 > Project: HBase > Issue Type: Sub-task > Components: tooling >Reporter: stack >Assignee: stack > Fix For: 2.0.0 > > Attachments: 14543.patch > > > When dumping hanging tests, you can get a result that says there are no hanging tests > and no test failures, but the patch may not have applied, or the hangs may be > because the test was killed. It would be good to know what machine we were > running on, what branch, and what patch when we run the tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12790) Support fairness across parallelized scans
[ https://issues.apache.org/jira/browse/HBASE-12790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941919#comment-14941919 ] Andrew Purtell commented on HBASE-12790: I'm sorry, but this doesn't help. It looks like output captured from a single test run? What would help is several runs of the test, e.g. 10, with average/min/max/p99 statistics for the measured point and count query running times across all of the test runs, with and without the patch. /cc [~giacomotaylor] [~lhofhansl] > Support fairness across parallelized scans > -- > > Key: HBASE-12790 > URL: https://issues.apache.org/jira/browse/HBASE-12790 > Project: HBase > Issue Type: New Feature >Reporter: James Taylor >Assignee: ramkrishna.s.vasudevan > Labels: Phoenix > Attachments: AbstractRoundRobinQueue.java, HBASE-12790.patch, > HBASE-12790_1.patch, HBASE-12790_5.patch, HBASE-12790_callwrapper.patch, > HBASE-12790_trunk_1.patch > > > Some HBase clients parallelize the execution of a scan to reduce latency in > getting back results. This can lead to starvation with a loaded cluster and > interleaved scans, since the RPC queue will be ordered and processed on a > FIFO basis. For example, suppose two clients, A and B, submit largish > scans at the same time. Say each scan is broken down into 100 scans by the > client (broken down into equal depth chunks along the row key), and the 100 > scans of client A are queued first, followed immediately by the 100 scans of > client B. In this case, client B will be starved out of getting any results > back until the scans for client A complete. > One solution to this is to use the attached AbstractRoundRobinQueue instead > of the standard FIFO queue. The queue to be used could be (maybe it already > is) configurable based on a new config parameter. Using this queue would > require the client to use the same identifier for all of the 100 parallel > scans that represent a single logical scan from the client's point of view.
> With this information, the round robin queue would pick off a task from the > queue in a round robin fashion (instead of a strictly FIFO manner) to prevent > starvation over interleaved parallelized scans. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
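The comment above asks for summary statistics over repeated runs rather than a single run. A minimal sketch of computing them, assuming running times are collected as milliseconds per run; `RunStats` is a hypothetical helper, and p99 here uses the nearest-rank definition (interpolating definitions give slightly different values).

```java
import java.util.Arrays;

/** Summary statistics over repeated test runs (running times in milliseconds). */
final class RunStats {
  final double avg;
  final double min;
  final double max;
  final double p99;

  private RunStats(double avg, double min, double max, double p99) {
    this.avg = avg;
    this.min = min;
    this.max = max;
    this.p99 = p99;
  }

  static RunStats of(double[] timesMs) {
    if (timesMs.length == 0) {
      throw new IllegalArgumentException("need at least one run");
    }
    double[] sorted = timesMs.clone();
    Arrays.sort(sorted);
    double sum = 0;
    for (double t : sorted) {
      sum += t;
    }
    int n = sorted.length;
    // Nearest-rank p99: the value at 1-based rank ceil(0.99 * n), computed in integers.
    int rank = (99 * n + 99) / 100;
    return new RunStats(sum / n, sorted[0], sorted[n - 1], sorted[rank - 1]);
  }

  @Override
  public String toString() {
    return "avg=" + avg + " min=" + min + " max=" + max + " p99=" + p99;
  }

  public static void main(String[] args) {
    // Illustrative values only, not measurements from this issue.
    double[] runs = {12, 10, 11, 30, 10, 9, 10, 11, 12, 15};
    System.out.println(RunStats.of(runs)); // avg=13.0 min=9.0 max=30.0 p99=30.0
  }
}
```

One consequence worth knowing when planning the runs: with fewer than 100 samples, the nearest-rank p99 is simply the maximum, so 10 runs give a solid average and range but a p99 that is just the worst run.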
[jira] [Updated] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14545: -- Status: Patch Available (was: Open) > TestMasterFailover often times out > -- > > Key: HBASE-14545 > URL: https://issues.apache.org/jira/browse/HBASE-14545 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 1.2.0 >Reporter: Mikhail Antonov > Fix For: 1.2.0 > > Attachments: 14545.txt > > > Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 301.644 sec > <<< FAILURE! - in org.apache.hadoop.hbase.master.TestMasterFailover > testMasterFailoverWithMockedRIT(org.apache.hadoop.hbase.master.TestMasterFailover) > Time elapsed: 240.112 sec <<< ERROR! > org.junit.runners.model.TestTimedOutException: test timed out after 240000 > milliseconds > at java.lang.Thread.sleep(Native Method) > at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:146) > at > org.apache.hadoop.hbase.MiniHBaseCluster.waitForActiveAndReadyMaster(MiniHBaseCluster.java:535) > at > org.apache.hadoop.hbase.HBaseCluster.waitForActiveAndReadyMaster(HBaseCluster.java:280) > at > org.apache.hadoop.hbase.master.TestMasterFailover.testMasterFailoverWithMockedRIT(TestMasterFailover.java:400) > Results : > Tests in error: > TestMasterFailover.testMasterFailoverWithMockedRIT:400 » TestTimedOut test > tim... > Tests run: 7, Failures: 0, Errors: 1, Skipped: 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-14545: -- Attachment: 14545.txt [~mantonov] Does this fix it for you? Are you seeing the ConcurrentModificationException in your logs? I noticed this failing in the last few runs... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14499) Master coprocessors shutdown will not happen on master abort
[ https://issues.apache.org/jira/browse/HBASE-14499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14499: --- Hadoop Flags: Reviewed Fix Version/s: 2.0.0 > Master coprocessors shutdown will not happen on master abort > > > Key: HBASE-14499 > URL: https://issues.apache.org/jira/browse/HBASE-14499 > Project: HBase > Issue Type: Bug > Components: master >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar > Fix For: 2.0.0 > > Attachments: HBASE-14499.patch, HBASE-14499.patch > > > In HMaster, > {code} > @Override > public void abort(final String msg, final Throwable t) { > if (isAborted() || isStopped()) { > return; > } > if (cpHost != null) { > // HBASE-4014: dump a list of loaded coprocessors. > LOG.fatal("Master server abort: loaded coprocessors are: " + > getLoadedCoprocessors()); > } > if (t != null) LOG.fatal(msg, t); > stop(msg); > } > {code} > Here we are invoking stop(...) of HRegionServer, which will try to stop RS > coprocessors if rsHost is not NULL. > So Master coprocessors will not be stopped. We should invoke stopMaster() > instead of stop(...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
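The report's reasoning is about dispatch: HMaster inherits stop(...) from HRegionServer, so calling it from abort() runs only the region-server-level shutdown path. The sketch below models that with two hypothetical stand-in classes (RegionServerLike, MasterLike, and the method stopAsMaster are all invented for illustration and are not the real HBase classes or signatures); it only shows why routing abort through a master-specific stop path reaches the master-level hooks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the region-server base class; not HRegionServer. */
class RegionServerLike {
    final List<String> events = new ArrayList<>();

    void stop(String why) {
        events.add("rs coprocessors stopped"); // region-server-level hooks only
        events.add("stopped: " + why);
    }
}

/** Hypothetical stand-in for the master subclass; not HMaster. */
class MasterLike extends RegionServerLike {
    /** Hypothetical master-specific stop path: master hooks first, then the inherited stop. */
    void stopAsMaster(String why) {
        events.add("master coprocessors stopped");
        stop(why);
    }

    /** Shape of the reported bug: abort goes straight to the inherited stop(). */
    void abortBuggy(String msg) {
        stop(msg);
    }

    /** Direction proposed in the report: abort goes through the master-specific path. */
    void abortFixed(String msg) {
        stopAsMaster(msg);
    }
}
```

Calling abortBuggy leaves no "master coprocessors stopped" event, while abortFixed records both the master-level and region-server-level steps.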
[jira] [Commented] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941942#comment-14941942 ] Mikhail Antonov commented on HBASE-14545: - Yep! CME is exactly what I see - 2015-10-02 15:48:08,665 DEBUG [10.1.4.219:53572.activeMasterManager] regionserver.HRegionFileSystem(201): No StoreFiles for: hdfs://localhost:53425/user/mantonov/test-data/708ebe97-f5fe-414a-befa-bb9e21d602d0/data/default/tableWithMergingRegions/13be9d6cb1452830f14e71340681e978/family 2015-10-02 15:48:08,666 FATAL [10.1.4.219:53572.activeMasterManager] master.HMaster$1(1656): Failed to become active master java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169) at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1067) at org.apache.hadoop.hbase.master.balancer.RegionLocationFinder.scheduleFullRefresh(RegionLocationFinder.java:160) at org.apache.hadoop.hbase.master.balancer.RegionLocationFinder.setClusterStatus(RegionLocationFinder.java:133) This call to region states is used in RegionLocationFinder and in AM#processDeadServersAndRegionsInTransition; I wonder why we didn't see it there before. Let me try the patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
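The trace shows a TreeMap key iterator reached through Collections.unmodifiableCollection failing with ConcurrentModificationException. An unmodifiable wrapper only blocks writes through the view; it is still a live view, so any write to the backing map during iteration trips the fail-fast check. The sketch below reproduces that deterministically (a write inside the loop stands in for another thread, which in the real master makes the failure timing-dependent and intermittent) and shows one common remedy: copy the keys under the writers' lock and iterate the private copy. The class and method names are hypothetical, and whether 14545.txt takes exactly this approach is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.TreeMap;

class UnmodifiableViewCme {
    /** Iterates a read-only view while the backing map changes; returns true if a CME was thrown. */
    static boolean iterateLiveView() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("r1", "rs1");
        regions.put("r2", "rs2");
        regions.put("r3", "rs3");
        // The wrapper rejects writes through the view, but it still reflects the live map.
        Collection<String> view = Collections.unmodifiableCollection(regions.keySet());
        try {
            for (String region : view) {
                // Stand-in for another thread changing region state mid-iteration.
                regions.put(region + "-daughter", "rs4");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Copies the keys under the writers' lock, then iterates the private copy. */
    static List<String> iterateSnapshot(TreeMap<String, String> regions, Object lock) {
        List<String> snapshot;
        synchronized (lock) {
            snapshot = new ArrayList<>(regions.keySet());
        }
        List<String> seen = new ArrayList<>();
        for (String region : snapshot) {
            seen.add(region);
            synchronized (lock) {
                // Simulated concurrent writer; the snapshot is unaffected.
                regions.put(region + "-daughter", "rs4");
            }
        }
        return seen;
    }
}
```

The trade-off is that the snapshot can be slightly stale, which is usually acceptable for a periodic refresh such as the one in the trace.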
[jira] [Commented] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941943#comment-14941943 ] stack commented on HBASE-14545: --- Ok. Good. Trying to figure out why this is a branch-1-only issue. master seems the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941945#comment-14941945 ] stack commented on HBASE-14545: --- Patch good by you, [~mantonov]? (The change to the info port seems needed to run this in Eclipse... will see what the patch build says.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14292) Call Me Maybe HBase links haved moved
[ https://issues.apache.org/jira/browse/HBASE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-14292: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Call Me Maybe HBase links haved moved > - > > Key: HBASE-14292 > URL: https://issues.apache.org/jira/browse/HBASE-14292 > Project: HBase > Issue Type: Bug > Components: documentation >Reporter: Robert Yokota >Assignee: Andrew Purtell >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14292.patch, HBASE-14292.patch > > > The links to the Yammer engineering blog have moved. > Please use the following links in section 83.5. Network Consistency and > Partition Tolerance > http://old.eng.yammer.com/call-me-maybe-hbase/ > http://old.eng.yammer.com/call-me-maybe-hbase-addendum/ > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13770) Programmatic JAAS configuration option for secure zookeeper may be broken
[ https://issues.apache.org/jira/browse/HBASE-13770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941951#comment-14941951 ] Andrew Purtell commented on HBASE-13770: Thanks [~sukuna...@gmail.com], let me see about getting this in. > Programmatic JAAS configuration option for secure zookeeper may be broken > - > > Key: HBASE-13770 > URL: https://issues.apache.org/jira/browse/HBASE-13770 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.0.1, 1.1.0, 0.98.13, 1.2.0 >Reporter: Andrew Purtell >Assignee: Maddineni Sukumar > Fix For: 0.98.13 > > Attachments: HBASE-13770-0.98.patch, HBASE-13770-0.98.patch, > HBASE-13770-v1.patch, HBASE-13770-v2-0.98.patch, HBASE-13770-v2.patch, > HBASE-13770-v3-0.98.patch, HBASE-13770-v4-0.98.patch, > HBASE-13770-v4-master.patch > > > While verifying the patch fix for HBASE-13768 we were unable to successfully > test the programmatic JAAS configuration option for secure ZooKeeper > integration. Unclear if that was due to a bug or incorrect test configuration. > Update the security section of the online book with clear instructions for > setting up the programmatic JAAS configuration option for secure ZooKeeper > integration. > Verify it works. > Fix as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
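For readers unfamiliar with the mechanism, "programmatic JAAS configuration" means installing a javax.security.auth.login.Configuration object in code instead of pointing the JVM at a jaas.conf file. HBase's own option for secure ZooKeeper is driven by its configuration properties and is what this issue is about; the sketch below only illustrates the underlying JDK API, answering for the "Client" login section that the ZooKeeper client uses by default. The class name, keytab path, and principal are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical in-code JAAS configuration for a keytab-based Kerberos login. */
class KeytabJaasConfiguration extends Configuration {
    private final String contextName;
    private final String keytab;
    private final String principal;

    KeytabJaasConfiguration(String contextName, String keytab, String principal) {
        this.contextName = contextName;
        this.keytab = keytab;
        this.principal = principal;
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        if (!contextName.equals(name)) {
            return null; // no entry for other login sections
        }
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytab);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("useTicketCache", "false");
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry("com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED, options)
        };
    }
}
```

An application would install it with Configuration.setConfiguration(new KeytabJaasConfiguration("Client", keytabPath, principal)) before the ZooKeeper client connects; verifying that this actually authenticates against a secure ensemble is exactly the kind of check the issue asks for.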
[jira] [Commented] (HBASE-14547) Add more debug/trace to zk-procedure
[ https://issues.apache.org/jira/browse/HBASE-14547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941950#comment-14941950 ] stack commented on HBASE-14547: --- +1 > Add more debug/trace to zk-procedure > > > Key: HBASE-14547 > URL: https://issues.apache.org/jira/browse/HBASE-14547 > Project: HBase > Issue Type: Improvement > Components: snapshots >Affects Versions: 2.0.0, 1.2.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 2.0.0, 1.2.0 > > Attachments: HBASE-14547-v0.patch > > > add more debug/trace logs to the zk-procedure/online-snapshot flow -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941959#comment-14941959 ] Mikhail Antonov commented on HBASE-14367: - (for reference - flaky TestMasterFailover nailed down in HBASE-14545) > Add normalization support to shell > -- > > Key: HBASE-14367 > URL: https://issues.apache.org/jira/browse/HBASE-14367 > Project: HBase > Issue Type: Bug > Components: Balancer, shell >Affects Versions: 1.1.2 >Reporter: Lars George >Assignee: Mikhail Antonov > Fix For: 2.0.0, 1.2.0, 1.3.0 > > Attachments: HBASE-14367-branch-1.2.v1.patch, > HBASE-14367-branch-1.2.v2.patch, HBASE-14367-branch-1.2.v3.patch, > HBASE-14367.patch > > > https://issues.apache.org/jira/browse/HBASE-13103 adds support for setting a > normalization flag per {{HTableDescriptor}}, along with the server-side chore > to do the work. > What is lacking is an easy way to set this from the shell; right now you need to > use the Java API to modify the descriptor. This issue is to add the flag as a > known attribute key and/or other means to toggle this per table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941969#comment-14941969 ] stack commented on HBASE-14545: --- Git bisect says HBASE-14475 is the culprit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14475) Region split requests are always audited with "hbase" user rather than request user
[ https://issues.apache.org/jira/browse/HBASE-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941972#comment-14941972 ] stack commented on HBASE-14475: --- Ignore the above. Just tried again without this patch in place and it still fails. It must be intermittent. > Region split requests are always audited with "hbase" user rather than > request user > --- > > Key: HBASE-14475 > URL: https://issues.apache.org/jira/browse/HBASE-14475 > Project: HBase > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Ted Yu > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: 14475-0.98.txt, 14475-branch-1-v2.txt, > 14475-branch-1-v3.txt, 14475-v2.txt, 14475-v3.txt, 14475-v3.txt, > HBASE-14475-branch-1.0.patch > > > [~madhan.neethiraj] from Ranger reported that when a region split request is > initiated from the user, we always audit (and do the permission check) > against the hbase user, not the request user. > The issue is that a split request that is coming from the user is only > processed at a later time from the CompactSplitThread asynchronously to the > splitRegion RPC. > RSRpcServices.splitRegion() only does a flush from the handler thread and > then calls regionServer.compactSplitThread.requestSplit() which puts a > SplitRequest to the split queue. The split request is handled by the split > executor from CompactSplitThread. > Since the split is actually executed from the compact split thread, the > preSplit() for the AccessController is called from the executor thread. In > this thread, we no longer have the user who initially requested the split, so > the user in the context (UGI) is "hbase", causing the AC.preSplit() access > control check to always be performed against the hbase user, not the user > who submitted the request. The audit log also contains "hbase" user > rather than the actual user.
> Luckily, the split forces a flush to the region in-line (from the handler > thread), which requires a {{CREATE|ADMIN}} permission. split requires > {{ADMIN}}, but due to this bug {{CREATE}} is also sufficient (although we > have not verified it manually). {{CREATE}} permission can do flush and > compactions, so this is not a security issue (I think). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
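The root cause described above is a general pattern: the caller's identity lives in thread-bound context, and the work is handed to another thread that has its own context. The sketch below models that with a ThreadLocal standing in for the UGI (an assumption for illustration; HBase's real User/UGI types are not used), and shows the general remedy of capturing the requester on the handler thread and re-establishing it on the executor. SplitAuditSketch and all its methods are hypothetical, and the actual HBASE-14475 patches are not reproduced here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Hypothetical sketch: a ThreadLocal stands in for the caller's security context (UGI). */
class SplitAuditSketch {
    // Threads that never had a user set behave like the server's own "hbase" identity.
    static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "hbase");

    /** Runs an action with the given user installed on the current thread, then restores the old one. */
    static String runAs(String user, Supplier<String> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous);
        }
    }

    /** What a pre-split access check would see: whoever is current on the executing thread. */
    static String preSplitAuditedUser() {
        return CURRENT_USER.get();
    }

    /** Buggy shape: the request is queued without the caller, so the executor sees its own user. */
    static String requestSplitBuggy(ExecutorService splitExecutor) {
        Callable<String> task = SplitAuditSketch::preSplitAuditedUser;
        return await(splitExecutor.submit(task));
    }

    /** Fixed shape: capture the caller on the handler thread and re-establish it on the executor. */
    static String requestSplitWithUser(ExecutorService splitExecutor) {
        final String requester = CURRENT_USER.get(); // captured on the RPC handler thread
        Callable<String> task = () -> runAs(requester, SplitAuditSketch::preSplitAuditedUser);
        return await(splitExecutor.submit(task));
    }

    private static String await(Future<String> f) {
        try {
            return f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

If the handler thread's current user is "alice", the buggy path audits "hbase" and the capturing path audits "alice", which matches the symptom and the expected behaviour described in the report.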
[jira] [Commented] (HBASE-14545) TestMasterFailover often times out
[ https://issues.apache.org/jira/browse/HBASE-14545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941975#comment-14941975 ] stack commented on HBASE-14545: --- Git bisect seems to be lying. I suppose it's stumped by the intermittent nature of the failure. Giving up on trying to find where we introduced the prob... Looking at the code, it seems susceptible anyway, so we need this fix. Waiting on Hadoop QA. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14519) Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that can hang
[ https://issues.apache.org/jira/browse/HBASE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941977#comment-14941977 ] Hudson commented on HBASE-14519: FAILURE: Integrated in HBase-1.3 #226 (See [https://builds.apache.org/job/HBase-1.3/226/]) HBASE-14519 Purge TestFavoredNodeAssignmentHelper, a test for an (stack: rev 54dc20e920318b98d661ccd567995d704ddbe555) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestFavoredNodeAssignmentHelper.java > Purge TestFavoredNodeAssignmentHelper, a test for an abandoned feature that > can hang > > > Key: HBASE-14519 > URL: https://issues.apache.org/jira/browse/HBASE-14519 > Project: HBase > Issue Type: Sub-task > Components: test >Reporter: stack >Assignee: stack > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 0.98.16 > > Attachments: 14519.txt, 14519v2.txt > > > It came in here: > commit 7a7ab8b8da795177f42e434b1ab1b468e5cd035a > Author: Devaraj Das> Date: Sun May 12 06:47:39 2013 + > HBASE-7932. Introduces Favored Nodes for region files. Adds a balancer > called FavoredNodeLoadBalancer that will honor favored nodes in the process > of balancing but the balance operation is currently a no-op (Devaraj Das) > git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1481476 > 13f79535-47bb-0310-9956-ffa450edef68 > I've already purged the other test that came in on this patch... over in > HBASE-14486 > The test hung here: > https://builds.apache.org/job/PreCommit-HBASE-Build/15823//console > ... though we seemed to have exited abnormally. > Will let this issue hang around a while in case someone disagrees on removal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14367) Add normalization support to shell
[ https://issues.apache.org/jira/browse/HBASE-14367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941991#comment-14941991 ] Mikhail Antonov commented on HBASE-14367: - Tested on a 3-node cluster; the newly added commands work fine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12769) Replication fails to delete all corresponding zk nodes when peer is removed
[ https://issues.apache.org/jira/browse/HBASE-12769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942005#comment-14942005 ] Hadoop QA commented on HBASE-12769: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764847/12769-v4.txt against master branch at commit 83f5663a229955b04a55b9b0d6cca71b4d597933. ATTACHMENT ID: 12764847 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + doFsck(conf, false, true, false, false, false, false, false, false, false, false, false, null), + boolean fixEmptyMetaRegionInfo, boolean fixTableLocks, Boolean fixReplication, TableName table) {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.client.TestReplicationShell.testRunShellTests(TestReplicationShell.java:35) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15861//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15861//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15861//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15861//console This message is automatically generated. > Replication fails to delete all corresponding zk nodes when peer is removed > --- > > Key: HBASE-12769 > URL: https://issues.apache.org/jira/browse/HBASE-12769 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 0.99.2 >Reporter: Jianwei Cui >Assignee: Jianwei Cui >Priority: Minor > Attachments: 12769-v2.txt, 12769-v3.txt, 12769-v4.txt, > HBASE-12769-trunk-v0.patch, HBASE-12769-trunk-v1.patch > > > When removing a peer, the client side will delete peerId under peersZNode > node; then alive region servers will be notified and delete corresponding > hlog queues under their rsZNode of replication. However, if there are failed > servers whose hlog queues have not been transferred by alive servers (this > likely happens if setting a big value to "replication.sleep.before.failover" > and lots of region servers restarted), these hlog queues won't be deleted > after the peer is removed. I think remove_peer should guarantee all > corresponding zk nodes have been removed after it completes; otherwise, if we > create a new peer with the same peerId as the removed one, there might be > unexpected data to be replicated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
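The guarantee the reporter asks for can be stated as: after remove_peer, no queue belonging to that peer survives under any server's node, whether the server is alive or dead. The sketch below models the replication rs znodes as an in-memory map purely for illustration. Its naming convention is an assumption, not something stated in this thread: a queue id is the peer id, optionally followed by "-" and the names of dead servers it was recovered from, and peer ids themselves contain no "-". In the real system these removals would be ZooKeeper node deletions; ReplicationQueuesModel and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for the replication rs znodes: server name -> queue ids. */
class ReplicationQueuesModel {
    final Map<String, Set<String>> queuesByServer = new HashMap<>();

    void addQueue(String server, String queueId) {
        queuesByServer.computeIfAbsent(server, k -> new TreeSet<>()).add(queueId);
    }

    /** Assumed convention: the peer id is everything before the first '-'. */
    static String peerIdOf(String queueId) {
        int dash = queueId.indexOf('-');
        return dash < 0 ? queueId : queueId.substring(0, dash);
    }

    /** Removes every queue for peerId under every server, alive or dead; returns removed paths. */
    List<String> removePeerQueues(String peerId) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : queuesByServer.entrySet()) {
            Iterator<String> it = e.getValue().iterator();
            while (it.hasNext()) {
                String queueId = it.next();
                if (peerIdOf(queueId).equals(peerId)) {
                    it.remove();
                    removed.add(e.getKey() + "/" + queueId);
                }
            }
        }
        return removed;
    }
}
```

The key point is that the sweep covers queues left under failed servers and recovered queues on live servers, not only the live servers' own queues, so a later peer reusing the same peerId starts from a clean state.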