[jira] [Created] (HBASE-28157) hbck should report previously reported regions with null region location
Andrew Kyle Purtell created HBASE-28157: --- Summary: hbck should report previously reported regions with null region location Key: HBASE-28157 URL: https://issues.apache.org/jira/browse/HBASE-28157 Project: HBase Issue Type: Bug Affects Versions: 2.5.6 Reporter: Andrew Kyle Purtell Fix For: 2.6.0, 2.4.18, 3.0.0, 4.0.0-alpha-1, 2.5.7 Operators bypassed some in-progress TRSPs, leading to a state where some regions were persistently in transition but hidden. Because the master builds its list of regions in transition by tracking TRSPs, bypassing a TRSP removed its region from the RIT list. This was expected, but I will propose a change to RIT tracking in another issue. The online hbck chore also did not report the inconsistency. This was not expected. HBASE-28144 was another issue related to this incident, already fixed. Ensure that hbck reports as inconsistent any region for which a location was previously reported but the region location is now null, if the region is not expected to be offline. -- This message was sent by Atlassian Jira (v8.20.10#820010)
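The check described in HBASE-28157 reduces to a simple predicate. A minimal Java sketch, with all names hypothetical rather than the actual HbckChore API:

```java
// Sketch of the hbck consistency check described above: a region is flagged
// when a location was previously reported, the current location is null, and
// the region is not expected to be offline. Names are illustrative, not the
// actual HbckChore API.
public class NullLocationCheck {
    static boolean isInconsistent(String previousLocation, String currentLocation,
                                  boolean expectedOffline) {
        return previousLocation != null && currentLocation == null && !expectedOffline;
    }
}
```

A region that never had a reported location, or one that is expected to be offline (for example, a region of a disabled table), is not flagged.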
[jira] [Created] (HBASE-28158) Decouple RIT list management from TRSP invocation
Andrew Kyle Purtell created HBASE-28158: --- Summary: Decouple RIT list management from TRSP invocation Key: HBASE-28158 URL: https://issues.apache.org/jira/browse/HBASE-28158 Project: HBase Issue Type: Bug Affects Versions: 2.5.6 Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7 Operators bypassed some in-progress TRSPs, leading to a state where some regions were persistently in transition but hidden. Because the master builds its list of regions in transition by tracking TRSPs, bypassing a TRSP removed its region from the RIT list. Although I can see from reading the code that this is the expected behavior, it is surprising for operators and should be changed. We should only remove a region from the RIT map when assignment reaches a suitable terminal state. -- This message was sent by Atlassian Jira (v8.20.10#820010)
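The proposed change can be sketched as a RIT map that drops entries only on terminal assignment states, so a bypassed TRSP no longer hides its region. Class and state names below are illustrative, not the actual AssignmentManager API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the decoupling described above: the RIT map entry is removed only
// when a region reaches a terminal assignment state, not when the TRSP goes
// away (e.g. via bypass). Names are illustrative.
public class RitTracker {
    enum State { OPENING, OPEN, CLOSING, CLOSED, FAILED_OPEN }

    private final Map<String, State> regionsInTransition = new HashMap<>();

    void report(String region, State state) {
        if (state == State.OPEN || state == State.CLOSED) {
            regionsInTransition.remove(region);     // terminal: leave the RIT list
        } else {
            regionsInTransition.put(region, state); // still in transition
        }
    }

    // A bypassed TRSP simply stops reporting, so its region stays visible as RIT.
    boolean isInTransition(String region) {
        return regionsInTransition.containsKey(region);
    }
}
```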
[jira] [Created] (HBASE-28172) Update downloads.xml for release 2.5.6
Andrew Kyle Purtell created HBASE-28172: --- Summary: Update downloads.xml for release 2.5.6 Key: HBASE-28172 URL: https://issues.apache.org/jira/browse/HBASE-28172 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28172) Update downloads.xml for release 2.5.6
[ https://issues.apache.org/jira/browse/HBASE-28172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28172. - Resolution: Fixed > Update downloads.xml for release 2.5.6 > -- > > Key: HBASE-28172 > URL: https://issues.apache.org/jira/browse/HBASE-28172 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28178) Upgrade ZooKeeper on all branches for CVE-2023-44981
Andrew Kyle Purtell created HBASE-28178: --- Summary: Upgrade ZooKeeper on all branches for CVE-2023-44981 Key: HBASE-28178 URL: https://issues.apache.org/jira/browse/HBASE-28178 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7 CVE-2023-44981 is a high-severity (CVSS 9.1/10) authorization bypass vulnerability in ZooKeeper related to SASL quorum authentication. The bug is fixed in versions 3.7.2, 3.8.3, and 3.9.1. Upgrade the ZK version on all active branches, to at least 3.7.2. -- This message was sent by Atlassian Jira (v8.20.10#820010)
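The upgrade itself is typically a one-property change in the build. A hypothetical pom.xml fragment, assuming the branch exposes a `zookeeper.version` property (verify the property name against the actual pom of each branch):

```xml
<!-- Hypothetical pom.xml fragment: pin ZooKeeper to a release containing the
     CVE-2023-44981 fix. The property name is an assumption; check the
     branch's pom before applying. -->
<properties>
  <zookeeper.version>3.7.2</zookeeper.version>
</properties>
```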
[jira] [Created] (HBASE-28267) create-release should run spotless
Andrew Kyle Purtell created HBASE-28267: --- Summary: create-release should run spotless Key: HBASE-28267 URL: https://issues.apache.org/jira/browse/HBASE-28267 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Before committing generated files like CHANGES.md and RELEASENOTES.md we should run 'mvn spotless:apply' first to ensure what is committed is formatted per our rules and will not be modified when someone invokes spotless later. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28282) Update downloads.xml for release 2.5.7
Andrew Kyle Purtell created HBASE-28282: --- Summary: Update downloads.xml for release 2.5.7 Key: HBASE-28282 URL: https://issues.apache.org/jira/browse/HBASE-28282 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28282) Update downloads.xml for release 2.5.7
[ https://issues.apache.org/jira/browse/HBASE-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28282. - Resolution: Fixed > Update downloads.xml for release 2.5.7 > -- > > Key: HBASE-28282 > URL: https://issues.apache.org/jira/browse/HBASE-28282 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Trivial > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27694) Exclude the older versions of netty pulled in from Hadoop dependencies
[ https://issues.apache.org/jira/browse/HBASE-27694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-27694. - Fix Version/s: (was: 2.5.8) (was: 3.0.0-beta-2) (was: 2.6.1) Assignee: (was: Rajeshbabu Chintaguntla) Resolution: Won't Fix We can't fix this on our side because some Hadoop code still requires netty 3. We need to wait for HADOOP-15327, whose fix version is 3.4.0. > Exclude the older versions of netty pulled in from Hadoop dependencies > > > Key: HBASE-27694 > URL: https://issues.apache.org/jira/browse/HBASE-27694 > Project: HBase > Issue Type: Bug > Reporter: Rajeshbabu Chintaguntla > Priority: Major > > Currently netty 3.10.6 is pulled in from hdfs dependencies, and tools such as Sonatype report the CVEs against HBase. To get rid of this it would be better to exclude netty wherever the hdfs or mapred client jars are used.
> * org.apache.hbase : hbase-it : jar : tests : 2.5.2
> ** org.apache.hadoop : hadoop-mapreduce-client-core : 3.2.2
> *** io.netty : netty : 3.10.6.final
> ** org.apache.hbase : hbase-endpoint : 2.5.2
> *** org.apache.hadoop : hadoop-hdfs : jar : tests : 3.2.2
> **** io.netty : netty : 3.10.6.final
> *** org.apache.hadoop : hadoop-hdfs : 3.2.2
> **** io.netty : netty : 3.10.6.final
> * org.apache.hadoop : hadoop-mapreduce-client-jobclient : 3.2.2
> ** io.netty : netty : 3.10.6.final
> ** org.apache.hadoop : hadoop-mapreduce-client-common : 3.2.2
> *** io.netty : netty : 3.10.6.final
> * org.apache.hadoop : hadoop-mapreduce-client-jobclient : jar : tests : 3.2.2
> ** io.netty : netty : 3.10.6.final
> * org.apache.hadoop : hadoop-mapreduce-client-hs : 3.2.2
> ** io.netty : netty : 3.10.6.final
> ** org.apache.hadoop : hadoop-mapreduce-client-app : 3.2.2
> *** io.netty : netty : 3.10.6.final
> *** org.apache.hadoop : hadoop-mapreduce-client-shuffle : 3.2.2
> **** io.netty : netty : 3.10.6.final
-- This message was sent by Atlassian Jira (v8.20.10#820010)
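For reference, the kind of exclusion the reporter proposed looks like this in Maven; as the resolution notes, it cannot actually be applied while Hadoop code still needs netty 3. A hypothetical fragment:

```xml
<!-- Hypothetical exclusion fragment: drop the legacy netty 3 artifact when
     depending on hadoop-hdfs. Per the Won't Fix resolution above, this would
     break Hadoop code paths that still require netty 3. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```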
[jira] [Created] (HBASE-28441) Update downloads.xml for 2.5.8
Andrew Kyle Purtell created HBASE-28441: --- Summary: Update downloads.xml for 2.5.8 Key: HBASE-28441 URL: https://issues.apache.org/jira/browse/HBASE-28441 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28441) Update downloads.xml for 2.5.8
[ https://issues.apache.org/jira/browse/HBASE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28441. - Resolution: Fixed > Update downloads.xml for 2.5.8 > -- > > Key: HBASE-28441 > URL: https://issues.apache.org/jira/browse/HBASE-28441 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28506) Remove hbase-compression-xz
Andrew Kyle Purtell created HBASE-28506: --- Summary: Remove hbase-compression-xz Key: HBASE-28506 URL: https://issues.apache.org/jira/browse/HBASE-28506 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 2.6.0, 3.0.0-beta-2 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28507) Deprecate hbase-compression-xz
Andrew Kyle Purtell created HBASE-28507: --- Summary: Deprecate hbase-compression-xz Key: HBASE-28507 URL: https://issues.apache.org/jira/browse/HBASE-28507 Project: HBase Issue Type: Sub-task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 2.5.9 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-25972) Dual File Compaction
[ https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-25972. - Fix Version/s: 2.6.1 2.5.9 Hadoop Flags: Reviewed Release Note: The default compactor in HBase compacts HFiles into one file. This change introduces a new store file writer, called DualFileWriter, which writes the cells retained by compaction into two files. One of these files will include the live cells; it will be called a live-version file. The other file will include the rest of the cells, that is, the historical versions; it will be called a historical-version file. DualFileWriter will work with the default compactor. The historical files will not be read by scans that scan the latest row versions. This eliminates scanning unnecessary cell versions in compacted files and is thus expected to improve the performance of these scans. Resolution: Fixed > Dual File Compaction > > > Key: HBASE-25972 > URL: https://issues.apache.org/jira/browse/HBASE-25972 > Project: HBase > Issue Type: Improvement > Reporter: Kadir Ozdemir > Assignee: Kadir Ozdemir > Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9 > > > HBase stores tables row by row in its files, HFiles. An HFile is composed of blocks. The number of rows stored in a block depends on the row sizes. The number of rows per block gets lower when rows get larger on disk due to multiple row versions, since HBase stores all row versions sequentially in the same HFile after compaction. However, applications (e.g., Phoenix) mostly query the most recent row versions. > The default compactor in HBase compacts HFiles into one file. This Jira introduces a new store file writer, called DualFileWriter, which writes the cells retained by compaction into two files. One of these files will include the live cells; it will be called a live-version file. The other file will include the rest of the cells, that is, the historical versions; it will be called a historical-version file. DualFileWriter will work with the default compactor. > The historical files will not be read by scans that scan the latest row versions. This eliminates scanning unnecessary cell versions in compacted files and is thus expected to improve the performance of these scans. -- This message was sent by Atlassian Jira (v8.20.10#820010)
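The live/historical split described in the release note can be sketched with plain collections: walking cells from newest to oldest, the first version seen for a row goes to the live file and all later versions go to the historical file. Cells are modeled here as simple (row, timestamp) pairs; this is not the actual DualFileWriter implementation:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the dual-file split: newest version per row -> "live" output,
// older versions -> "historical" output. Illustrative only.
public class DualFileSketch {
    static class Cell {
        final String row;
        final long timestamp;
        Cell(String row, long timestamp) { this.row = row; this.timestamp = timestamp; }
    }

    static Map<String, List<Cell>> partition(List<Cell> cellsNewestFirst) {
        Map<String, List<Cell>> files = new LinkedHashMap<>();
        files.put("live", new ArrayList<>());
        files.put("historical", new ArrayList<>());
        Set<String> seenRows = new HashSet<>();
        for (Cell c : cellsNewestFirst) {
            // Set.add returns true only for the first (newest) version of a row.
            files.get(seenRows.add(c.row) ? "live" : "historical").add(c);
        }
        return files;
    }
}
```

A latest-version scan then only needs to read the live file; scans requesting older versions fall back to reading both.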
[jira] [Resolved] (HBASE-25244) Support splitting a region into N parts at a time
[ https://issues.apache.org/jira/browse/HBASE-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-25244. - Assignee: (was: zhuqi) Resolution: Duplicate While not exactly the same proposal, this issue is duplicated by HBASE-28438, and this one has had no activity. > Support splitting a region into N parts at a time > - > > Key: HBASE-25244 > URL: https://issues.apache.org/jira/browse/HBASE-25244 > Project: HBase > Issue Type: New Feature > Components: regionserver > Reporter: zhuqi > Priority: Major > > In the current reference file format, only one parent region split into two references can be recorded. As a result, if you want to continue splitting a daughter region, you must wait until the major compaction is over and the reference file is deleted before you can split the region again. > If a reference file could point to other reference files whose data has not yet been moved from the parent region to the folder of the corresponding region, a multi-level reference could be established. This would form a tree structure in which only the root contains physical data while the regions at the leaf nodes are serving. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-27635) Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started
[ https://issues.apache.org/jira/browse/HBASE-27635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-27635. - Fix Version/s: (was: 3.0.0-beta-2) (was: 2.6.1) (was: 2.5.9) Assignee: (was: Rajeshbabu Chintaguntla) Resolution: Not A Problem > Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started > > > Key: HBASE-27635 > URL: https://issues.apache.org/jira/browse/HBASE-27635 > Project: HBase > Issue Type: Improvement > Components: shell > Reporter: Rajeshbabu Chintaguntla > Priority: Major > > When the hbase shell is started with HBase 2.5.2, there is too much logging of ZK connection details, classpaths, etc., even though we enabled the ERROR log level for the zookeeper package.
> {noformat}
> 2023-02-10 17:34:25,211 INFO [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.5.9-5-a433770fc7b303332f10174221799495a26bbca2, built on 02/07/2023 13:02 GMT
> 2023-02-10 17:34:25,212 INFO [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] zookeeper.ZooKeeper: Client environment:host.name=host1
> 2023-02-10 17:34:25,212 INFO [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_352
> 2023-02-10 17:34:25,212 INFO [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] zookeeper.ZooKeeper: Client environment:java.vendor=Red Hat, Inc.
> 2023-02-10 17:34:25,212 INFO [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
> {noformat}
> It would be better to change the org.apache.hadoop.hbase.zookeeper package log level to ERROR as well:
> {noformat}
> # Set logging level to avoid verboseness
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.zookeeper', log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop', log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop.hbase.zookeeper', log_level)
> {noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28652) Backport HBASE-21785 master reports open regions as RITs and also messes up rit age metric
[ https://issues.apache.org/jira/browse/HBASE-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28652. - Resolution: Fixed > Backport HBASE-21785 master reports open regions as RITs and also messes up > rit age metric > -- > > Key: HBASE-28652 > URL: https://issues.apache.org/jira/browse/HBASE-28652 > Project: HBase > Issue Type: Sub-task >Reporter: Szucs Villo >Assignee: Szucs Villo >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 2.6.1, 2.5.9 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28661) Fix compatibility issue in SecurityHeadersFilter in branch-2.x
[ https://issues.apache.org/jira/browse/HBASE-28661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28661. - Hadoop Flags: Reviewed Resolution: Fixed > Fix compatibility issue in SecurityHeadersFilter in branch-2.x > -- > > Key: HBASE-28661 > URL: https://issues.apache.org/jira/browse/HBASE-28661 > Project: HBase > Issue Type: Task >Reporter: Szucs Villo >Assignee: Szucs Villo >Priority: Major > Labels: pull-request-available > Fix For: 2.7.0, 2.6.1, 2.5.9 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-26092) JVM core dump in the replication path
[ https://issues.apache.org/jira/browse/HBASE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-26092. - Resolution: Duplicate > JVM core dump in the replication path > - > > Key: HBASE-26092 > URL: https://issues.apache.org/jira/browse/HBASE-26092 > Project: HBase > Issue Type: Bug > Components: Replication > Affects Versions: 2.3.5 > Reporter: Huaxiang Sun > Priority: Critical > > When replication is turned on, we found the following core dump in the region server. > I checked the core dump for replication and I think I have some ideas. For replication, when an RS receives walEdits from a remote cluster, it needs to send them out to the final RS. In this case NettyRpcConnection is used, and calls are queued while they still refer to a ByteBuffer owned by the replication handler (which is returned to the pool once the handler returns). The core dump can then happen because the ByteBuffer has been reused. This asynchronous processing needs reference counting.
> Feel free to take it; otherwise, I will try to work on a patch later.
> {code:java}
> Stack: [0x7fb1bf039000,0x7fb1bf13a000], sp=0x7fb1bf138560, free space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> J 28175 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I (21 bytes) @ 0x7fd2663c [0x7fd263c0+0x27c]
> J 14912 C2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Lorg/apache/hadoop/hbase/ipc/Call;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (370 bytes) @ 0x7fdbbb94b590 [0x7fdbbb949c00+0x1990]
> J 14911 C2 org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (30 bytes) @ 0x7fdbb972d1d4 [0x7fdbb972d1a0+0x34]
> J 30476 C2 org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V (149 bytes) @ 0x7fdbbd4e7084 [0x7fdbbd4e6900+0x784]
> J 14914 C2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$6$1.run()V (22 bytes) @ 0x7fdbbb9344ec [0x7fdbbb934280+0x26c]
> J 23528 C2 org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z (106 bytes) @ 0x7fdbbcbb0efc [0x7fdbbcbb0c40+0x2bc]
> J 15987% C2 org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (461 bytes) @ 0x7fdbbbaf1580 [0x7fdbbbaf1360+0x220]
> j org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
> j org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11
> j org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28739) Update downloads.xml for 2.5.9
Andrew Kyle Purtell created HBASE-28739: --- Summary: Update downloads.xml for 2.5.9 Key: HBASE-28739 URL: https://issues.apache.org/jira/browse/HBASE-28739 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28739) Update downloads.xml for 2.5.9
[ https://issues.apache.org/jira/browse/HBASE-28739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28739. - Resolution: Fixed > Update downloads.xml for 2.5.9 > -- > > Key: HBASE-28739 > URL: https://issues.apache.org/jira/browse/HBASE-28739 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28740) Need to call parent class's serialization methods in CloseExcessRegionReplicasProcedure
[ https://issues.apache.org/jira/browse/HBASE-28740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28740. - Hadoop Flags: Reviewed Resolution: Fixed > Need to call parent class's serialization methods in > CloseExcessRegionReplicasProcedure > --- > > Key: HBASE-28740 > URL: https://issues.apache.org/jira/browse/HBASE-28740 > Project: HBase > Issue Type: Bug > Components: proc-v2 >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Blocker > Labels: pull-request-available > Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28755) Update downloads.xml for 2.5.10
Andrew Kyle Purtell created HBASE-28755: --- Summary: Update downloads.xml for 2.5.10 Key: HBASE-28755 URL: https://issues.apache.org/jira/browse/HBASE-28755 Project: HBase Issue Type: Task Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-28755) Update downloads.xml for 2.5.10
[ https://issues.apache.org/jira/browse/HBASE-28755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-28755. - Resolution: Fixed > Update downloads.xml for 2.5.10 > --- > > Key: HBASE-28755 > URL: https://issues.apache.org/jira/browse/HBASE-28755 > Project: HBase > Issue Type: Task >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (HBASE-23054) Remove synchronization block from MetaTableMetrics and fix LossyCounting algorithm
[ https://issues.apache.org/jira/browse/HBASE-23054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell reopened HBASE-23054: - > Remove synchronization block from MetaTableMetrics and fix LossyCounting algorithm > -- > > Key: HBASE-23054 > URL: https://issues.apache.org/jira/browse/HBASE-23054 > Project: HBase > Issue Type: Bug > Affects Versions: 2.1.5 > Reporter: Ankit Singhal > Assignee: Ankit Singhal > Priority: Major > Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2 > > Attachments: HBASE-23054.master.001.patch, HBASE-23054.master.002.patch > > > While trying to use LossyCounting for HBASE-15519, I found the following bugs in the current implementation:
> – Remove the synchronization block from MetaTableMetrics to avoid congestion in the code
> – Fix the license format
> – Fix the LossyCounting algorithm as per http://www.vldb.org/conf/2002/S10P03.pdf
> -- Avoid doing a sweep on every insert in LossyCounting
> – Remove extra redundant data structures from MetaTableMetrics.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-22988) Backport HBASE-11062 "hbtop" to branch-1
[ https://issues.apache.org/jira/browse/HBASE-22988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-22988. - Fix Version/s: 1.4.11 1.3.6 Hadoop Flags: Reviewed Assignee: Toshihiro Suzuki (was: Andrew Kyle Purtell) Resolution: Fixed > Backport HBASE-11062 "hbtop" to branch-1 > > > Key: HBASE-22988 > URL: https://issues.apache.org/jira/browse/HBASE-22988 > Project: HBase > Issue Type: Sub-task > Components: backport, hbtop >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 1.5.0, 1.3.6, 1.4.11 > > Attachments: HBASE-22988-branch-1.patch > > > Backport parent issue to branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23101) Backport HBASE-22380 to branch-1
[ https://issues.apache.org/jira/browse/HBASE-23101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23101. - Fix Version/s: 1.4.11 1.3.6 Hadoop Flags: Reviewed Resolution: Fixed > Backport HBASE-22380 to branch-1 > > > Key: HBASE-23101 > URL: https://issues.apache.org/jira/browse/HBASE-23101 > Project: HBase > Issue Type: Sub-task >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Blocker > Fix For: 1.5.0, 1.3.6, 1.4.11 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23116) LoadBalancer should log table name when balancing per table
Andrew Kyle Purtell created HBASE-23116: --- Summary: LoadBalancer should log table name when balancing per table Key: HBASE-23116 URL: https://issues.apache.org/jira/browse/HBASE-23116 Project: HBase Issue Type: Improvement Affects Versions: 1.5.0 Reporter: Andrew Kyle Purtell Fix For: 3.0.0, 1.5.0, 2.3.0, 1.3.6, 1.4.11, 2.1.7, 2.2.2 The load balancer logs lines like these: {noformat} 2019-10-02 23:18:47,664 INFO [46493_ChoreService_6] balancer.StochasticLoadBalancer - Skipping load balancing because balanced cluster; total cost is 46.68964334022376, sum multiplier is 1087.0 min cost which need balance is 0.05 {noformat} When balancing per table it would be useful if the table name was also printed in the log line. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23116) LoadBalancer should log table name when balancing per table
[ https://issues.apache.org/jira/browse/HBASE-23116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23116. - Hadoop Flags: Reviewed Resolution: Fixed > LoadBalancer should log table name when balancing per table > --- > > Key: HBASE-23116 > URL: https://issues.apache.org/jira/browse/HBASE-23116 > Project: HBase > Issue Type: Improvement >Affects Versions: 1.4.10, 2.2.1, 2.1.6 >Reporter: Andrew Kyle Purtell >Assignee: Bharath Vissapragada >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.2.2, 2.1.8 > > > The load balancer logs lines like these: > {noformat} > 2019-10-02 23:18:47,664 INFO [46493_ChoreService_6] > balancer.StochasticLoadBalancer - Skipping load balancing because balanced > cluster; total cost is 46.68964334022376, sum multiplier is 1087.0 min cost > which need balance is 0.05 > {noformat} > When balancing per table it would be useful if the table name was also > printed in the log line. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23128) Restore Region interface compatibility
Andrew Kyle Purtell created HBASE-23128: --- Summary: Restore Region interface compatibility Key: HBASE-23128 URL: https://issues.apache.org/jira/browse/HBASE-23128 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell Adding methods to a Public interface is OK for a minor release, but removing methods is not. We need to restore the abstract method boolean bulkLoadHFiles ( Collection<Pair<byte[], String>>, boolean, Region.BulkLoadListener ) in order to maintain binary compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23128) Restore Region interface compatibility
[ https://issues.apache.org/jira/browse/HBASE-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23128. - Resolution: Fixed > Restore Region interface compatibility > --- > > Key: HBASE-23128 > URL: https://issues.apache.org/jira/browse/HBASE-23128 > Project: HBase > Issue Type: Bug > Reporter: Andrew Kyle Purtell > Assignee: Andrew Kyle Purtell > Priority: Blocker > Fix For: 1.5.0 > > > Adding methods to a Public interface is OK for a minor release, but removing methods is not. We need to restore
> {code}
> abstract method boolean bulkLoadHFiles (
>   Collection<Pair<byte[], String>>, boolean, Region.BulkLoadListener)
> {code}
> to the Region interface in order to maintain binary compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23139) MapReduce jobs launched from convenience distribution are nonfunctional
Andrew Kyle Purtell created HBASE-23139: --- Summary: MapReduce jobs launched from convenience distribution are nonfunctional Key: HBASE-23139 URL: https://issues.apache.org/jira/browse/HBASE-23139 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 1.5.0 Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 1.3.6, 1.4.11, 1.5.0 ClassNotFoundException for the thirdparty Gson classes; we need to add the thirdparty jar to the job dependencies.
{noformat}
Error: java.lang.ClassNotFoundException: org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.hadoop.hbase.util.GsonUtil.createGson(GsonUtil.java:44)
    at org.apache.hadoop.hbase.util.JsonMapper.<clinit>(JsonMapper.java:37)
    at org.apache.hadoop.hbase.client.Operation.toJSON(Operation.java:70)
    at org.apache.hadoop.hbase.client.Operation.toString(Operation.java:96)
    at org.apache.hadoop.hbase.client.Operation.toString(Operation.java:110)
    at org.apache.hadoop.hbase.mapreduce.TableSplit.toString(TableSplit.java:368)
    at java.lang.String.valueOf(String.java:2994)
    at java.lang.StringBuilder.append(StringBuilder.java:131)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:762)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
{noformat}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23151) Backport HBASE-23083 (Collect Executor status info periodically and report to metrics system) to branch-1
Andrew Kyle Purtell created HBASE-23151: --- Summary: Backport HBASE-23083 (Collect Executor status info periodically and report to metrics system) to branch-1 Key: HBASE-23151 URL: https://issues.apache.org/jira/browse/HBASE-23151 Project: HBase Issue Type: Sub-task Reporter: Andrew Kyle Purtell Fix For: 1.6.0, 1.5.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23153) PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded
Andrew Kyle Purtell created HBASE-23153: --- Summary: PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded Key: HBASE-23153 URL: https://issues.apache.org/jira/browse/HBASE-23153 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell The PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded and, like the other region replica specific functions, return false when region replicas are not in use. Otherwise it will always report a cost of 0 even though its weight will be included in the sum of the weights. -- This message was sent by Atlassian Jira (v8.3.4#803005)
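The fix amounts to excluding a not-needed function from both the weighted cost sum and the weight sum; otherwise its constant cost of 0 still dilutes the overall cost. A sketch with illustrative interfaces, not the actual StochasticLoadBalancer API:

```java
import java.util.List;

// Sketch of the issue described above: a cost function that is not needed must
// be skipped in both the numerator (cost * weight) and the weight sum.
// Otherwise a function that always reports cost 0 still has its weight counted
// in the denominator, dragging the overall cost down. Illustrative only.
public class WeightedCost {
    interface CostFunction {
        boolean isNeeded();
        double cost();
        double weight();
    }

    static CostFunction of(boolean needed, double cost, double weight) {
        return new CostFunction() {
            public boolean isNeeded() { return needed; }
            public double cost() { return cost; }
            public double weight() { return weight; }
        };
    }

    static double overallCost(List<CostFunction> functions) {
        double weightedSum = 0, weightSum = 0;
        for (CostFunction f : functions) {
            if (!f.isNeeded()) {
                continue; // skip both the cost and its weight
            }
            weightedSum += f.cost() * f.weight();
            weightSum += f.weight();
        }
        return weightSum == 0 ? 0 : weightedSum / weightSum;
    }
}
```

With the skip in place, a heavily weighted but unneeded replica function no longer masks real imbalance reported by the other functions.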
[jira] [Created] (HBASE-23161) Invalid hostname tests can fail if the ISP hijacks NXDOMAIN
Andrew Kyle Purtell created HBASE-23161: --- Summary: Invalid hostname tests can fail if the ISP hijacks NXDOMAIN Key: HBASE-23161 URL: https://issues.apache.org/jira/browse/HBASE-23161 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell Some residential internet service providers take advantage of DNS record-not-found cases: instead of returning NXDOMAIN responses per the standard, they return A records that redirect the user to their own portal or search page. This breaks tests like TestConnectionImplementation.testGetClientBadHostname and TestRegionServerHostname.testInvalidRegionServerHostnameAbortsServer. We should detect this behavior and skip these cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
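The detection could amount to resolving a name that is guaranteed not to exist and checking whether the resolver fails as it should. A sketch with a pluggable resolver, illustrative rather than the actual test-skip logic in HBase:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

// Sketch of the detection described above: look up a random name that should
// not exist. If the resolver returns an address instead of failing, NXDOMAIN
// responses are being rewritten and the invalid-hostname tests should be
// skipped. Illustrative only.
public class NxdomainHijackCheck {
    interface Resolver {
        InetAddress resolve(String name) throws UnknownHostException;
    }

    static boolean nxdomainIsHijacked(Resolver resolver) {
        // .invalid is reserved (RFC 2606): a compliant resolver must fail this lookup.
        String bogus = UUID.randomUUID() + ".example.invalid";
        try {
            resolver.resolve(bogus);
            return true;  // a made-up name resolved: NXDOMAIN is being rewritten
        } catch (UnknownHostException e) {
            return false; // standard behavior: the lookup fails
        }
    }
    // Usage in a real environment: nxdomainIsHijacked(InetAddress::getByName)
}
```

The pluggable resolver keeps the check testable without touching the network; the real tests would pass InetAddress::getByName.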
[jira] [Created] (HBASE-23174) Upgrade jackson and jackson-databind to 2.9.10
Andrew Kyle Purtell created HBASE-23174: --- Summary: Upgrade jackson and jackson-databind to 2.9.10 Key: HBASE-23174 URL: https://issues.apache.org/jira/browse/HBASE-23174 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell Fix For: 2.3.0, 1.3.6, 1.4.11, 2.2.2, 2.1.8, 1.5.1 Two more CVEs (CVE-2019-16335 and CVE-2019-14540) are addressed in jackson-databind 2.9.10. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23151) Backport HBASE-23083 (Collect Executor status info periodically and report to metrics system) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23151. - Resolution: Fixed Pushed as 425d84dc14, thanks [~javaman_chen] > Backport HBASE-23083 (Collect Executor status info periodically and report to > metrics system) to branch-1 > - > > Key: HBASE-23151 > URL: https://issues.apache.org/jira/browse/HBASE-23151 > Project: HBase > Issue Type: Sub-task >Reporter: Andrew Kyle Purtell >Assignee: chenxu >Priority: Minor > Fix For: 1.5.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23206) ZK quorum redundancy with failover in RZK
Andrew Kyle Purtell created HBASE-23206: --- Summary: ZK quorum redundancy with failover in RZK Key: HBASE-23206 URL: https://issues.apache.org/jira/browse/HBASE-23206 Project: HBase Issue Type: Brainstorming Reporter: Andrew Kyle Purtell We have faced a few production issues where the reliability of the ZooKeeper quorum serving the cluster has not been as robust as expected. The most recent one was essentially ZOOKEEPER-2164 (and related: ZOOKEEPER-900). These can be mitigated by a ZK server configuration change but the incidents suggest it may be worth thinking about how to be less reliant on the service provided by a single ZK quorum instance. A solution would be holistic with several parts: - HBASE-18095 to get ZK dependencies out of the client - Related HBase replication improvements to track peer and position state in HBase tables instead of znodes - This brainstorming... For this part, we could consider the possibility that RecoverableZooKeeper (RZK) might be taught how to speak to two separate ZK quorums redundantly, and continue to offer service even if one of them is temporarily unable to provide service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23207) Log a region open journal
Andrew Kyle Purtell created HBASE-23207: --- Summary: Log a region open journal Key: HBASE-23207 URL: https://issues.apache.org/jira/browse/HBASE-23207 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Like HBASE-22828, but for region opening. Also, tweak the calls to enableStatusJournal to pass through 'true' as parameter to include the current status in the journal, for slightly more context. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-15519) Add per-user metrics
[ https://issues.apache.org/jira/browse/HBASE-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell reopened HBASE-15519: - > Add per-user metrics > - > > Key: HBASE-15519 > URL: https://issues.apache.org/jira/browse/HBASE-15519 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 1.2.0 >Reporter: Enis Soztutar >Assignee: Ankit Singhal >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15519.master.003.patch, hbase-15519_v0.patch, > hbase-15519_v1.patch, hbase-15519_v1.patch, hbase-15519_v2.patch > > > Per-user metrics will be useful in multi-tenant cases where we can emit > number of requests, operations, num RPCs etc at the per-user aggregate level > per regionserver. We currently have throttles per user, but no way to monitor > resource usage per-user. > Looking at these metrics, operators can adjust throttles, do capacity > planning, etc per-user. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-15519) Add per-user metrics
[ https://issues.apache.org/jira/browse/HBASE-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-15519. - Resolution: Fixed > Add per-user metrics > - > > Key: HBASE-15519 > URL: https://issues.apache.org/jira/browse/HBASE-15519 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 1.2.0 >Reporter: Enis Soztutar >Assignee: Ankit Singhal >Priority: Major > Fix For: 3.0.0, 2.3.0 > > Attachments: HBASE-15519.master.003.patch, hbase-15519_v0.patch, > hbase-15519_v1.patch, hbase-15519_v1.patch, hbase-15519_v2.patch > > > Per-user metrics will be useful in multi-tenant cases where we can emit > number of requests, operations, num RPCs etc at the per-user aggregate level > per regionserver. We currently have throttles per user, but no way to monitor > resource usage per-user. > Looking at these metrics, operators can adjust throttles, do capacity > planning, etc per-user. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23210) Backport HBASE-15519 (Add per-user metrics) to branch-1
Andrew Kyle Purtell created HBASE-23210: --- Summary: Backport HBASE-15519 (Add per-user metrics) to branch-1 Key: HBASE-23210 URL: https://issues.apache.org/jira/browse/HBASE-23210 Project: HBase Issue Type: New Feature Reporter: Andrew Kyle Purtell Fix For: 1.6.0 We will need HBASE-15519 in branch-1 for eventual backport of HBASE-23065. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23225) Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec
Andrew Kyle Purtell created HBASE-23225: --- Summary: Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec Key: HBASE-23225 URL: https://issues.apache.org/jira/browse/HBASE-23225 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade (aggregate-into-a-jar-with-relocated-third-parties) on project hbase-shaded-client: Error creating shaded jar: duplicate entry: META-INF/services/org.apache.hadoop.hbase.shaded.com.fasterxml.jackson.core.ObjectCodec {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23226) Backport HBASE-22460 (Reopen a region if store reader references may have leaked) to branch-1
Andrew Kyle Purtell created HBASE-23226: --- Summary: Backport HBASE-22460 (Reopen a region if store reader references may have leaked) to branch-1 Key: HBASE-23226 URL: https://issues.apache.org/jira/browse/HBASE-23226 Project: HBase Issue Type: Sub-task Affects Versions: 1.4.11, 1.3.6, 1.5.0 Reporter: Andrew Kyle Purtell Fix For: 1.6.0 Backport parent change to branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23225) Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec
[ https://issues.apache.org/jira/browse/HBASE-23225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23225. - Resolution: Cannot Reproduce After some other work and builds, returning to this to try and reproduce with -X was unsuccessful. Whatever may have happened before, the Maven black magic voodoo is working again for me > Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec > > > Key: HBASE-23225 > URL: https://issues.apache.org/jira/browse/HBASE-23225 > Project: HBase > Issue Type: Bug > Environment: $ mvn --version > Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; > 2019-08-27T08:06:16-07:00) > Maven home: /usr/local/Cellar/maven/3.6.2/libexec > Java version: 1.8.0_232, vendor: Azul Systems, Inc., runtime: > /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre > Default locale: en_US, platform encoding: UTF-8 > OS name: "mac os x", version: "10.14.6", arch: "x86_64", family: "mac" >Reporter: Andrew Kyle Purtell >Priority: Major > > {noformat} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade > (aggregate-into-a-jar-with-relocated-third-parties) on project > hbase-shaded-client: > Error creating shaded jar: > duplicate entry: > META-INF/services/org.apache.hadoop.hbase.shaded.com.fasterxml.jackson.core.ObjectCodec > > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23226) Backport HBASE-22460 (Reopen a region if store reader references may have leaked) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-23226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23226. - Fix Version/s: (was: 1.6.0) Resolution: Duplicate > Backport HBASE-22460 (Reopen a region if store reader references may have > leaked) to branch-1 > - > > Key: HBASE-23226 > URL: https://issues.apache.org/jira/browse/HBASE-23226 > Project: HBase > Issue Type: Sub-task >Affects Versions: 1.5.0, 1.3.6, 1.4.11 >Reporter: Andrew Kyle Purtell >Priority: Major > > Backport parent change to branch-1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23210) Backport HBASE-15519 (Add per-user metrics) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23210. - Resolution: Fixed > Backport HBASE-15519 (Add per-user metrics) to branch-1 > --- > > Key: HBASE-23210 > URL: https://issues.apache.org/jira/browse/HBASE-23210 > Project: HBase > Issue Type: New Feature >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 1.5.1 > > > We will need HBASE-15519 in branch-1 for eventual backport of HBASE-23065. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23246) Fix error prone warning in TestMetricsUserSourceImpl
Andrew Kyle Purtell created HBASE-23246: --- Summary: Fix error prone warning in TestMetricsUserSourceImpl Key: HBASE-23246 URL: https://issues.apache.org/jira/browse/HBASE-23246 Project: HBase Issue Type: Sub-task Reporter: Andrew Kyle Purtell Fix For: 1.6.0 TestMetricsUserSourceImpl.java:[50,29] [SelfComparison] An object is compared to itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23317) An option to fail only the region open if a coprocessor fails to load
Andrew Kyle Purtell created HBASE-23317: --- Summary: An option to fail only the region open if a coprocessor fails to load Key: HBASE-23317 URL: https://issues.apache.org/jira/browse/HBASE-23317 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell If a table coprocessor fails to load, rather than aborting, throw an exception which prevents the region from opening. This will lead to unresolvable regions in transition but in some circumstances this may be preferable to process aborts. On the other hand, there would be a new risk that the failure to load is a symptom of or a cause of regionserver global state corruption that eventually leads to other problems. Should at least be an option, though. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23318) LoadTestTool doesn't start
Andrew Kyle Purtell created HBASE-23318: --- Summary: LoadTestTool doesn't start Key: HBASE-23318 URL: https://issues.apache.org/jira/browse/HBASE-23318 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell ./bin/hbase ltt after unpacking a binary tarball distribution fails to start with a CNFE (ClassNotFoundException). We are missing the tests jar from hbase-zookeeper. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23288) Backport HBASE-23251 (Add Column Family and Table Names to HFileContext) to branch-1
[ https://issues.apache.org/jira/browse/HBASE-23288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23288. - Fix Version/s: 1.6.0 Hadoop Flags: Reviewed Resolution: Fixed > Backport HBASE-23251 (Add Column Family and Table Names to HFileContext) to > branch-1 > > > Key: HBASE-23288 > URL: https://issues.apache.org/jira/browse/HBASE-23288 > Project: HBase > Issue Type: Sub-task >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby >Priority: Major > Fix For: 1.6.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-23085) Network and Data related Actions
[ https://issues.apache.org/jira/browse/HBASE-23085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell reopened HBASE-23085: - This commit has a terminology problem. The universal technical term for network packet is packet, not "package". > Network and Data related Actions > > > Key: HBASE-23085 > URL: https://issues.apache.org/jira/browse/HBASE-23085 > Project: HBase > Issue Type: Sub-task > Components: integration tests >Reporter: Szabolcs Bukros >Assignee: Szabolcs Bukros >Priority: Minor > Fix For: 3.0.0, 2.3.0, 2.2.3 > > > Add additional actions to: > * manipulate network packages with tc (reorder, loose,...) > * add CPU load > * fill the disk > * corrupt or delete regionserver data files > Create new monkey factories for the new actions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23569) Validate that the log cleaner actually cleans oldWALs as expected
Andrew Kyle Purtell created HBASE-23569: --- Summary: Validate that the log cleaner actually cleans oldWALs as expected Key: HBASE-23569 URL: https://issues.apache.org/jira/browse/HBASE-23569 Project: HBase Issue Type: Test Components: integration tests, master, test Reporter: Andrew Kyle Purtell Fix For: 3.0.0, 2.3.0, 1.6.0 The fix for HBASE-23287 (LogCleaner is not added to choreService) is in but we are lacking test coverage that validates that the log cleaner actually cleans oldWALs as expected. Add the test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23678) Literate builder API for version management in schema
Andrew Kyle Purtell created HBASE-23678: --- Summary: Literate builder API for version management in schema Key: HBASE-23678 URL: https://issues.apache.org/jira/browse/HBASE-23678 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and KEEP_DELETED_CELLS with maximum flexibility. There is a lot of nuance regarding their usage. Almost all combinations of these four settings make sense for some use cases (exceptions are MIN_VERSIONS > 0 without TTL, and KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the behavior with TTL easier to conceive when creating the schema. This could take the form of a literate builder API for ColumnDescriptor or an extension to an existing one. Let me give you a motivating example: We may want to retain all versions for a given TTL, and then only a specific number of versions. This can be achieved with VERSIONS=INT_MAX, TTL=_retention_interval_, KEEP_DELETED_CELLS=TTL, MIN_VERSIONS=_num_versions_. This is not intuitive though because VERSIONS has been used to specify _num_versions_ in this example since version 0.1. A literate builder API, by way of its method names, could let a user describe more or less in speaking language how they want version retention to work, and internally the builder API could set the low level schema attributes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
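The motivating example above could be expressed with a builder along these lines. This is a hypothetical sketch, not a proposed HBase API: VersionRetentionBuilder and its method names are invented, and it emits the low-level attributes as a plain map rather than a ColumnDescriptor, just to show how literate methods could translate intent into the VERSIONS/TTL/KEEP_DELETED_CELLS/MIN_VERSIONS combination.

```java
// Hypothetical literate builder mapping speaking-language retention intent onto
// the low-level schema attributes. All names are invented for illustration.
import java.util.LinkedHashMap;
import java.util.Map;

public class VersionRetentionBuilder {
    private final Map<String, Object> attrs = new LinkedHashMap<>();

    // "Keep every version written within the last <seconds>."
    public VersionRetentionBuilder keepAllVersionsFor(int seconds) {
        attrs.put("VERSIONS", Integer.MAX_VALUE);
        attrs.put("TTL", seconds);
        attrs.put("KEEP_DELETED_CELLS", "TTL");
        return this;
    }

    // "...and after that window, keep at least <n> versions indefinitely."
    public VersionRetentionBuilder thenKeepAtLeast(int n) {
        attrs.put("MIN_VERSIONS", n);
        return this;
    }

    public Map<String, Object> build() {
        return attrs;
    }

    public static void main(String[] args) {
        // The motivating example: retain everything for 30 days, then 3 versions.
        Map<String, Object> schema = new VersionRetentionBuilder()
            .keepAllVersionsFor(30 * 24 * 3600)
            .thenKeepAtLeast(3)
            .build();
        System.out.println(schema);
    }
}
```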
[jira] [Resolved] (HBASE-16141) Unwind use of UserGroupInformation.doAs() to convey requester identity in coprocessor upcalls
[ https://issues.apache.org/jira/browse/HBASE-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-16141. - Fix Version/s: (was: 1.7.0) (was: 3.0.0) Assignee: (was: Gary Helmling) Resolution: Later > Unwind use of UserGroupInformation.doAs() to convey requester identity in > coprocessor upcalls > - > > Key: HBASE-16141 > URL: https://issues.apache.org/jira/browse/HBASE-16141 > Project: HBase > Issue Type: Improvement > Components: Coprocessors, security >Reporter: Gary Helmling >Priority: Major > > In discussion on HBASE-16115, there is some discussion of whether > UserGroupInformation.doAs() is the right mechanism for propagating the > original requester's identity in certain system contexts (splits, > compactions, some procedure calls). It has the unfortunate effect of overriding > the current user, which makes for very confusing semantics for coprocessor > implementors. We should instead find an alternate mechanism for conveying > the caller identity, which does not override the current user context. > I think we should instead look at passing this through as part of the > ObserverContext passed to every coprocessor hook. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-23948) Backport HBASE-23146 (Support CheckAndMutate with multiple conditions) to branch-1
Andrew Kyle Purtell created HBASE-23948: --- Summary: Backport HBASE-23146 (Support CheckAndMutate with multiple conditions) to branch-1 Key: HBASE-23948 URL: https://issues.apache.org/jira/browse/HBASE-23948 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell Fix For: 1.7.0 Backport HBASE-23146 (Support CheckAndMutate with multiple conditions) to branch-1, including updates to REST (HBASE-23924) and Thrift (HBASE-23925). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23220) Release 1.6.0
[ https://issues.apache.org/jira/browse/HBASE-23220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-23220. - Resolution: Fixed > Release 1.6.0 > - > > Key: HBASE-23220 > URL: https://issues.apache.org/jira/browse/HBASE-23220 > Project: HBase > Issue Type: Task > Components: community >Affects Versions: 1.5.1 >Reporter: Sean Busbey >Assignee: Andrew Kyle Purtell >Priority: Major > > let's roll 1.6.0 to get HBASE-23174 out on recent branch-1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24018) Access check for getTableDescriptors is too restrictive
[ https://issues.apache.org/jira/browse/HBASE-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24018. - Resolution: Won't Fix > Access check for getTableDescriptors is too restrictive > --- > > Key: HBASE-24018 > URL: https://issues.apache.org/jira/browse/HBASE-24018 > Project: HBase > Issue Type: Improvement >Reporter: Abhishek Singh Chouhan >Priority: Major > > Currently getTableDescriptor requires a user to have Admin or Create > permissions. A client might need to get table descriptors to act accordingly > eg. based on an attribute set or a CP loaded. It should not be necessary for > the client to have create or admin privileges just to read the descriptor, > execute and/or read permission should be sufficient? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24069) Extend HBASE-16209 strategy (Provide an ExponentialBackOffPolicy sleep between failed region open requests) to region close and split requests
Andrew Kyle Purtell created HBASE-24069: --- Summary: Extend HBASE-16209 strategy (Provide an ExponentialBackOffPolicy sleep between failed region open requests) to region close and split requests Key: HBASE-24069 URL: https://issues.apache.org/jira/browse/HBASE-24069 Project: HBase Issue Type: Improvement Components: Region Assignment Affects Versions: 1.6.0 Reporter: Andrew Kyle Purtell Fix For: 3.0.0, 1.7.0, 2.4.0 In HBASE-16209 we provide an ExponentialBackOffPolicy sleep between failed region open requests. This should be extended to also apply to region close and split requests. Will reduce the likelihood of FAILED_CLOSE transitions in production by being more tolerant of temporary regionserver loading issues, e.g. CallQueueTooBigException. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region lock used to guard closes
Andrew Kyle Purtell created HBASE-24099: --- Summary: Use a fair ReentrantReadWriteLock for the region lock used to guard closes Key: HBASE-24099 URL: https://issues.apache.org/jira/browse/HBASE-24099 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell Consider creating the region's ReentrantReadWriteLock with the fair locking policy. We have had a couple of production incidents where a regionserver stalled in shutdown for a very long time, leading to RIT (FAILED_CLOSE). The latest example is a 43 minute shutdown, of which ~40 minutes (2465280 ms) were spent waiting to acquire the write lock on the region in order to finish closing it. {quote} ... Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region . in 927ms, sequenceid=6091133815, compaction requested=false at 1585175635349 (+60 ms) Disabling writes for close at 1585178100629 (+2465280 ms) {quote} This time was spent in between the memstore flush and the task status change "Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6: {code} 1480: // block waiting for the lock for closing 1481: lock.writeLock().lock(); // FindBugs: Complains UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine {code} The close lock is operating in unfair mode. The table in question is under constant high query load. When the close request was received, there were active readers. After the close request there were more active readers, near-continuous contention. Although the clients would receive RegionServerStoppingException and other error notifications, because the region could not be reassigned, they kept coming; region (re-)location would find the region still hosted on the stuck server. Finally, after 40 minutes, the closing thread waiting for the write lock ceased (by chance) to be starved.
The ReentrantReadWriteLock javadoc is clear about the possibility of starvation when continuously contended: "_When constructed as non-fair (the default), the order of entry to the read and write lock is unspecified, subject to reentrancy constraints. A nonfair lock that is continuously contended may indefinitely postpone one or more reader or writer threads, but will normally have higher throughput than a fair lock._" We could try changing the acquisition semantics of this lock to fair. This is a one line change, where we call the RW lock constructor. Then: "_When constructed as fair, threads contend for entry using an approximately arrival-order policy. When the currently held lock is released, either the longest-waiting single writer thread will be assigned the write lock, or if there is a group of reader threads waiting longer than all waiting writer threads, that group will be assigned the read lock._" This could be better. The close process will have to wait until all readers and writers already waiting for acquisition either acquire and release or go away but won't be starved by future/incoming requests. There could be a throughput loss in request handling, though, because this is the global reentrant RW lock for the region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
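The proposed one-line change is just the boolean fairness argument to the ReentrantReadWriteLock constructor. A standalone sketch contrasting the two modes (the helper name makeRegionLock is illustrative; the real change would be wherever HRegion constructs its lock):

```java
// Contrast the default (unfair) and proposed (fair) lock constructions.
// In fair mode, a waiting writer (the close path) cannot be starved
// indefinitely by a continuous stream of newly arriving readers.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FairLockSketch {

    static ReentrantReadWriteLock makeRegionLock(boolean fair) {
        // The entire proposed change: pass true here.
        return new ReentrantReadWriteLock(fair);
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock unfair = makeRegionLock(false); // current behavior
        ReentrantReadWriteLock fair = makeRegionLock(true);    // proposed behavior
        System.out.println("unfair.isFair() = " + unfair.isFair());
        System.out.println("fair.isFair() = " + fair.isFair());
    }
}
```

The trade-off noted above applies: fair mode imposes approximately arrival-order entry on every read acquisition too, so throughput under heavy read load may drop.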
[jira] [Created] (HBASE-24115) Relocate test-only REST "client" from src/ to test/ and mark Private
Andrew Kyle Purtell created HBASE-24115: --- Summary: Relocate test-only REST "client" from src/ to test/ and mark Private Key: HBASE-24115 URL: https://issues.apache.org/jira/browse/HBASE-24115 Project: HBase Issue Type: Test Components: REST, security Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 1.7.0 Relocate test-only REST "client" from src/ to test/ and annotate as Private. The classes o.a.h.h.rest.Remote* were developed to facilitate REST unit tests and incorrectly committed to src/ . Although this "breaks" compatibility by moving public classes to the test jar and marking them private, no attention has been paid to these classes with respect to performance, convenience, or security. Consensus from various discussions over the years is to move them to test/ as was the intent of the original committer, but misplaced by the same individual. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24322) UnsafeAvailChecker should also check that required methods are available
Andrew Kyle Purtell created HBASE-24322: --- Summary: UnsafeAvailChecker should also check that required methods are available Key: HBASE-24322 URL: https://issues.apache.org/jira/browse/HBASE-24322 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell We had a weird test failure due to accidentally running tests with Java 11, where Unsafe is available, but the method signatures were different, leading to this: {noformat} 2020-05-02 14:57:15,145 ERROR [main] master.HMasterCommandLine: Master exiting java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143) at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:237) at org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:163) at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:225) at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:127) at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2911) Caused by: java.lang.NoSuchMethodError: 'void sun.misc.Unsafe.putInt(java.lang.Object, int, int)' at org.apache.hadoop.hbase.util.UnsafeAccess.putInt(UnsafeAccess.java:233) at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.putInt(Bytes.java:1499) at org.apache.hadoop.hbase.util.Bytes.putInt(Bytes.java:1021) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.appendMetaData(RecoverableZooKeeper.java:850) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:640) at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:1027) at 
org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.setMasterAddress(MasterAddressTracker.java:211) at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2095) at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:520) at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.(HMasterCommandLine.java:315) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:138) ... 7 more {noformat} We should also check that all methods that will be invoked on Unsafe in UnsafeAccess.java are available when deciding in UnsafeAvailChecker if Unsafe is available (and usable). -- This message was sent by Atlassian Jira (v8.3.4#803005)
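The proposed check amounts to probing with reflection for each exact method signature UnsafeAccess would invoke, rather than only probing for the Unsafe class itself. A sketch of that technique, demonstrated against java.nio.ByteBuffer so it runs without access to sun.misc.Unsafe (the helper name hasMethod is invented for illustration):

```java
// Probe for an exact public method signature via reflection; a missing or
// changed signature would mean Unsafe should be treated as unusable.
public class MethodAvailabilityCheck {

    // True only if clazz exposes a public method with exactly this signature.
    static boolean hasMethod(Class<?> clazz, String name, Class<?>... paramTypes) {
        try {
            clazz.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;  // the NoSuchMethodError case above, caught up front
        }
    }

    public static void main(String[] args) {
        // ByteBuffer.putInt(int, int) exists; putInt(Object, int, int) does not.
        System.out.println(hasMethod(java.nio.ByteBuffer.class, "putInt", int.class, int.class));
        System.out.println(hasMethod(java.nio.ByteBuffer.class, "putInt", Object.class, int.class, int.class));
    }
}
```

Running all such probes once at startup, and declaring Unsafe unavailable if any fails, would convert the late NoSuchMethodError into a clean fallback to the non-Unsafe code path.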
[jira] [Resolved] (HBASE-24350) HBase table level replication metrics for shippedBytes are always 0
[ https://issues.apache.org/jira/browse/HBASE-24350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24350. - Fix Version/s: 2.4.0 1.7.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed > HBase table level replication metrics for shippedBytes are always 0 > --- > > Key: HBASE-24350 > URL: https://issues.apache.org/jira/browse/HBASE-24350 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 3.0.0-alpha-1, master, 1.7.0, 2.4.0 >Reporter: Sandeep Pal >Assignee: Sandeep Pal >Priority: Major > Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0 > > > It was observed during some investigations that table level metrics for > shippedBytes are always 0 consistently even though data is getting shipped. > There are two problems with table-level metrics: > # There are no table-level metrics for shipped bytes. > # Another problem is that it's using `MetricsReplicationSourceSourceImpl` > which is creating all source-level metrics at table level as well but updates > only ageOfLastShippedOp. This reports a lot of false/incorrect replication > metrics at table level. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24380) Improve WAL splitting log lines to enable sessionization
Andrew Kyle Purtell created HBASE-24380: --- Summary: Improve WAL splitting log lines to enable sessionization Key: HBASE-24380 URL: https://issues.apache.org/jira/browse/HBASE-24380 Project: HBase Issue Type: Improvement Components: logging, Operability, wal Reporter: Andrew Kyle Purtell When trying to reconstruct a timeline from the write of a recovered.edits file back to the start of the WAL file split, with a bunch of unrelated activity in the meantime, there isn't a consistent token that links split file write messages (which include the store path including the region hash) to the beginning of WAL splitting activity. Sessionizing by host doesn't work because work can bounce around through retries. Thread context names in the logs vary and can be like [nds1-225-fra:60020-7] or [fb472085572ba72e96f1] (trailing digits of region hash) or [splits-1589016325868] . We could have WALSplitter get the current time when starting the split of a WAL file and have it log this timestamp in every line as a splitting session identifier. Related, we should track the time of split task execution end to end and export a metric that captures it. It might also be worthwhile to wire up more of WAL splitting to TaskMonitor status logging. If we do this we can also enable status journal logging, so when splitting is done, a line will appear in the log that has the list of all status messages recorded during splitting and the time delta in milliseconds between them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
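The session-identifier idea above can be sketched in a few lines: capture one timestamp when the split of a WAL file starts and stamp it into every log line for that split, so all lines belonging to one split can be grepped out together. The class and method names here are illustrative, not the WALSplitter API.

```java
// Sketch of a splitting-session identifier: one timestamp captured at split
// start becomes a stable token carried by every log line for that split.
public class SplitSessionSketch {

    private final String sessionId;

    SplitSessionSketch(long startTimeMs) {
        // Mirrors the [splits-1589016325868] thread-name form already seen in logs.
        this.sessionId = "splits-" + startTimeMs;
    }

    // Every message emitted during the split carries the same token.
    String format(String message) {
        return "[" + sessionId + "] " + message;
    }

    public static void main(String[] args) {
        SplitSessionSketch session = new SplitSessionSketch(1589016325868L);
        System.out.println(session.format("Splitting WAL file"));
        System.out.println(session.format("Wrote recovered.edits for region"));
    }
}
```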
[jira] [Created] (HBASE-24428) Priority compaction for recently split daughter regions
Andrew Kyle Purtell created HBASE-24428: --- Summary: Priority compaction for recently split daughter regions Key: HBASE-24428 URL: https://issues.apache.org/jira/browse/HBASE-24428 Project: HBase Issue Type: Improvement Components: Compaction Reporter: Andrew Kyle Purtell We observe that under hotspotting conditions splitting will proceed very slowly and the "_Cannot split region due to reference files being there_" log line will be logged excessively. (branch-1 based production.) This is because after a region is split it must be compacted before it can be split again. Reference files must be replaced by real HFiles, normal housekeeping performed during compaction. However if the regionserver is under excessive load, its compaction queues may become deep. The daughters of a recently split hotspotting region may themselves continue to hotspot and will rapidly need to split again. If the scheduled compaction work to remove/replace reference files is queued hundreds or thousands of compaction queue elements behind current, the recently split daughter regions will not be able to split again for a long time and may grow very large, producing additional complications (very large regions, very deep replication queues). To help avoid this condition we should prioritize the compaction of recently split daughter regions. Compaction requests include a {{priority}} field and CompactionRequest implements a comparator that sorts by this field. We already detect when a compaction request involves a region that has reference files, to ensure that it gets selected to be eligible for compaction, but we do not seem to prioritize the requests for post-split housekeeping. Split work should be placed at the top of the queue. Ensure that this is happening. -- This message was sent by Atlassian Jira (v8.3.4#803005)
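The prioritization being asked for can be modeled standalone: requests ordered by a priority field, with a request for a region that still has reference files (a post-split daughter) escalated to the top. The Request record and effectivePriority helper are illustrative stand-ins, not HBase's CompactionRequest API.

```java
// Standalone model of a compaction queue ordered by a priority field, where
// post-split daughters carrying reference files jump to the front.
import java.util.PriorityQueue;

public class CompactionQueueSketch {

    record Request(String region, int priority, boolean hasReferenceFiles) {}

    // Smaller value = compacted sooner, mirroring "top of the queue".
    static int effectivePriority(int requested, boolean hasReferenceFiles) {
        return hasReferenceFiles ? Integer.MIN_VALUE : requested;
    }

    public static void main(String[] args) {
        PriorityQueue<Request> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.priority(), b.priority()));

        queue.add(new Request("busy-region-1", 10, false));
        queue.add(new Request("busy-region-2", 10, false));
        // Post-split daughter: escalate so its reference files are replaced first.
        queue.add(new Request("daughter-a", effectivePriority(10, true), true));

        // daughter-a drains first despite being enqueued last.
        System.out.println(queue.poll().region());
    }
}
```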
[jira] [Created] (HBASE-24439) Replication queue recovery tool for rescuing deep queues
Andrew Kyle Purtell created HBASE-24439: --- Summary: Replication queue recovery tool for rescuing deep queues Key: HBASE-24439 URL: https://issues.apache.org/jira/browse/HBASE-24439 Project: HBase Issue Type: Improvement Components: Replication Reporter: Andrew Kyle Purtell In HBase cross site replication, on the source side, every regionserver places its WALs into a replication queue and then drains the queue to the remote sink cluster. At the source cluster every regionserver participates as a source. At the sink cluster, a configurable subset of regionservers volunteer to process inbound replication RPC. When data is highly skewed we can take certain steps to mitigate, such as pre-splitting, or manual splitting, and rebalancing. This can most effectively be done at the sink, because replication RPCs are randomly distributed over the set of receiving regionservers, and splitting on the sink side can effectively redistribute resulting writes there. On the source side we are more limited. If writes are deeply unbalanced, a regionserver's source replication queue may become very deep. Hotspotting can happen, despite mitigations. Unlike on the sink side, once hotspotting has happened at the source, it is not possible to increase parallelism or redistribute work among sources once WALs have already been enqueued. Increasing parallelism on the sink side will not help if there is a big rock at the source. Source side mitigations like splitting and redistribution cannot help deep queues already accumulated. Can we redistribute source work? Yes and no. If a source regionserver fails, its queues will be recovered by other regionservers. However the other regionserver must still serve the recovered queue as an atomic entity. We can move a deep queue, but we can't break it up. Where time is of the essence, and ordering semantics can be allowed to break, operators should have available to them a recovery tool that rescues their production from the consequences of deep source queues. 
A very large replication queue can be split into many smaller queues. Perhaps even one new queue for each WAL file. Then, these new synthetic queues can be distributed to any/all source regionservers through the normal recovery queue assignment protocol. This increases parallelism at the source. Of course this would break serial replication semantics, and sync replication semantics, and even in branch-1, which does not have these features, it would greatly increase the probability of reordering of edits. That is an unavoidable consequence of breaking up the queue for more parallelism, but as long as this is done by a separate tool, invoked by operators, it is a valid option for emergency drain of backed up replication queues. Every cell in the WAL entries carries a timestamp assigned at the source, and will be applied on the sink with this timestamp. When the queue is drained and all edits have been persisted at the target, there will be a complete and correct temporal data ordering at that time. An operator must be prepared to handle intermediate mis-/re-ordered states if they intend to invoke this tool. In many use cases the interim states are not important. The final state after all edits have transferred cross cluster and persisted at the sink, after invocation of the recovery tool, is the point where the operator would transition back into service. -- This message was sent by Atlassian Jira (v8.3.4#803005)
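The core step of the proposed rescue tool, splitting one deep queue into one synthetic queue per WAL file, could be sketched like this (hypothetical names, not the actual replication internals):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: break a deep replication queue (an ordered list
// of WAL file names) into many single-WAL synthetic queues that can be
// handed out to any source regionserver via the normal recovery queue
// assignment protocol, trading edit ordering for parallelism.
class QueueSplitter {
    static List<List<String>> splitQueue(List<String> deepQueue) {
        List<List<String>> synthetic = new ArrayList<>();
        for (String wal : deepQueue) {
            synthetic.add(Collections.singletonList(wal)); // one queue per WAL file
        }
        return synthetic;
    }
}
```

Coarser groupings (N WALs per synthetic queue) would be a straightforward variation if full per-WAL fan-out is more parallelism than needed.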
[jira] [Created] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick
Andrew Kyle Purtell created HBASE-24440: --- Summary: Prevent temporal misordering on timescales smaller than one clock tick Key: HBASE-24440 URL: https://issues.apache.org/jira/browse/HBASE-24440 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell When mutations are sent to the servers without a timestamp explicitly assigned by the client the server will substitute the current wall clock time. There are edge cases where it is at least theoretically possible for more than one mutation to be committed to a given row within the same clock tick. When this happens we have to track and preserve the ordering of these mutations in some other way besides the timestamp component of the key. Let me bypass most discussion here by noting that whether we do this or not, we do not pass such ordering information in the cross cluster replication protocol. We also have interesting edge cases regarding key type precedence when mutations arrive "simultaneously": we sort deletes ahead of puts. This, especially in the presence of replication, can lead to visible anomalies for clients able to interact with both source and sink. There is a simple solution that removes the possibility that these edge cases can occur: We can detect, when we are about to commit a mutation to a row, if we have already committed a mutation to this same row in the current clock tick. Occurrences of this condition will be rare. We are already tracking current time. We have to know this in order to assign the timestamp. Where this becomes interesting is how we might track the last commit time per row. Making the detection of this case efficient for the normal code path is the bulk of the challenge. We would do this somehow via the memstore. Assuming we can efficiently know if we are about to commit twice to the same row within a single clock tick, we would simply sleep/yield the current thread until the clock ticks over, and then proceed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
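The sleep/yield idea above could be sketched as follows. Note this hypothetical guard tracks a single last commit time rather than per-row state via the memstore, which the ticket identifies as the real challenge:

```java
// Hypothetical sketch: if a commit already happened in the current
// clock tick, yield until the clock ticks over so the next mutation
// gets a strictly larger timestamp and key ordering stays unambiguous.
class ClockTickGuard {
    private long lastCommitTs = -1;

    // Returns the timestamp to assign; never returns the same tick twice.
    synchronized long nextCommitTimestamp() {
        long now = System.currentTimeMillis();
        while (now <= lastCommitTs) {
            Thread.yield(); // wait for the clock to tick over
            now = System.currentTimeMillis();
        }
        lastCommitTs = now;
        return now;
    }
}
```

Since same-tick commits to the same row should be rare, the yield path would almost never be taken, keeping the normal code path cheap.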
[jira] [Created] (HBASE-24445) Improve default thread pool size for opening store files
Andrew Kyle Purtell created HBASE-24445: --- Summary: Improve default thread pool size for opening store files Key: HBASE-24445 URL: https://issues.apache.org/jira/browse/HBASE-24445 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell For each store open we create a CompletionService and also create a thread pool for opening and closing store files. See HStore#openStoreFiles and HRegion#getStoreFileOpenAndCloseThreadPool. By default this pool has only one thread. It can be increased with "hbase.hstore.open.and.close.threads.max" but this config value is then divided by the number of stores in the region. "hbase.hstore.open.and.close.threads.max" is also used to size other thread pools for opening and closing the stores themselves, so it's an unfortunate overloading. We should have a configuration parameter that directly and simply tunes the thread pool size for opening store files. Introduce a new configuration parameter: "hbase.hstore.hfile.open.threads.max" which will define the upper bound for a thread pool shared by the entire store for opening hfiles. The default should be 1 to preserve default behavior. Once this is done, we could increase this to 2, 4, 8, or more for increased parallelism when opening store files without impact on other activities. The time required to open all storefiles often dominates the total time for bringing a region online. The thread pool will be shut down and eligible for garbage collection once all files are loaded and the store is online. The number of open threads should scale with the number of stores, so allocating the pool at the store level continues to make sense. Longer term we might try recursively decomposing the region open task with a fork-join pool such that the opening of store files can be dynamically parallelized in a probably superior way (conjecture pending a real attempt with metrics). -- This message was sent by Atlassian Jira (v8.3.4#803005)
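A sketch of the proposed store-scoped pool, sized by the suggested "hbase.hstore.hfile.open.threads.max" parameter (the parameter name comes from the proposal above; the class is hypothetical):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: one pool per store for opening hfiles, sized
// directly by the new setting rather than by dividing the shared
// "hbase.hstore.open.and.close.threads.max" value across stores.
class StoreFileOpenPool {
    static ExecutorService create(int configuredMax) {
        int threads = Math.max(1, configuredMax); // default of 1 preserves old behavior
        return Executors.newFixedThreadPool(threads);
    }
}
```

As the ticket describes, the pool would be shut down once all store files are opened, making it eligible for garbage collection.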
[jira] [Created] (HBASE-24525) [branch-1] Support ZooKeeper 3.6.0+
Andrew Kyle Purtell created HBASE-24525: --- Summary: [branch-1] Support ZooKeeper 3.6.0+ Key: HBASE-24525 URL: https://issues.apache.org/jira/browse/HBASE-24525 Project: HBase Issue Type: Improvement Components: Zookeeper Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 1.7.0 Fix compilation issues against ZooKeeper 3.6.0. Backwards compatible changes with 3.4 and 3.5. Tested with: {{mvn clean install}} {{mvn clean install -Dzookeeper.version=3.5.8}} {{mvn clean install -Dzookeeper.version=3.6.0}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24527) Improve region housekeeping status observability
Andrew Kyle Purtell created HBASE-24527: --- Summary: Improve region housekeeping status observability Key: HBASE-24527 URL: https://issues.apache.org/jira/browse/HBASE-24527 Project: HBase Issue Type: New Feature Components: Admin, Compaction, shell, UI Reporter: Andrew Kyle Purtell We provide a coarse grained admin API and associated shell command for determining the compaction status of a table: {noformat} hbase(main):001:0> help "compaction_state" Here is some help for this command: Gets compaction status (MAJOR, MAJOR_AND_MINOR, MINOR, NONE) for a table: hbase> compaction_state 'ns1:t1' hbase> compaction_state 't1' {noformat} We also log compaction activity, including a compaction journal at completion, via log4j to whatever log aggregation solution is available in production. This is not sufficient for online and interactive observation, debugging, or performance analysis of current compaction activity. In this kind of activity an operator is attempting to observe and analyze compaction activity in real time. Log aggregation and presentation solutions have typical latencies (end to end visibility of log lines on the order of ~minutes) which make that not possible today. We don't offer any API or tools for directly interrogating split and merge activity in real time. Some indirect knowledge of split or merge activity can be inferred from RIT information via ClusterStatus. 
We should have new APIs and shell commands, and perhaps also new admin UI views, for:

At regionserver scope:
* listing the current state of a regionserver's compaction, split, and merge tasks and threads
* counting (simple view) and listing (detailed view) a regionserver's compaction queues
* listing a region's currently compacting, splitting, or merging status

At master scope, aggregations of the above detailed information into:
* listing the active compaction tasks and threads for a given table, the extension of _compaction_state_ with a new detailed view
* listing the active split or merge tasks and threads for a given table's regions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24528) Improve balancer decision observability
Andrew Kyle Purtell created HBASE-24528: --- Summary: Improve balancer decision observability Key: HBASE-24528 URL: https://issues.apache.org/jira/browse/HBASE-24528 Project: HBase Issue Type: New Feature Components: Admin, Balancer, shell, UI Reporter: Andrew Kyle Purtell We provide detailed INFO and DEBUG level logging of balancer decision factors, outcome, and reassignment planning, as well as similarly detailed logging of the resulting assignment manager activity. However, an operator may need to perform online and interactive observation, debugging, or performance analysis of current balancer activity. Scraping and correlating the many log lines resulting from a balancer execution is labor intensive and has a lot of latency (order of ~minutes to acquire and index, order of ~minutes to correlate). The balancer should maintain a rolling window of history, e.g. the last 100 region move plans, or last 1000 region move plans submitted to the assignment manager. This history should include decision factor details and weights and costs. The rsgroups balancer may be able to provide fairly simple decision factors, like for example "this table was reassigned to that regionserver group". The underlying or vanilla stochastic balancer on the other hand, after a walk over random assignment plans, will have considered a number of cost functions with various inputs (locality, load, etc.) and multipliers, including custom cost functions. We can devise an extensible class structure that represents explanations for balancer decisions, and for each region move plan that is actually submitted to the assignment manager, we can keep the explanations of all relevant decision factors alongside the other details of the assignment plan like the region name, and the source and destination regionservers. This history should be available via API for use by new shell commands and admin UI widgets. 
The new shell commands and UI widgets can unpack the representation of balancer decision components into human readable output. -- This message was sent by Atlassian Jira (v8.3.4#803005)
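The rolling window of history could be kept in a simple bounded structure; a minimal sketch (hypothetical class, with a string standing in for the proposed balancer-decision explanation objects):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a bounded deque of balancer decision records
// where the oldest entry is evicted once the configured window
// (e.g. the last 100 region move plans) is full.
class BalancerDecisionHistory {
    private final int limit;
    private final Deque<String> decisions = new ArrayDeque<>();

    BalancerDecisionHistory(int limit) {
        this.limit = limit;
    }

    synchronized void add(String decision) {
        if (decisions.size() == limit) {
            decisions.removeFirst(); // evict the oldest plan
        }
        decisions.addLast(decision);
    }

    synchronized String oldest() {
        return decisions.peekFirst();
    }

    synchronized int size() {
        return decisions.size();
    }
}
```

An API endpoint backing the new shell commands and UI widgets would simply serialize a snapshot of this window.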
[jira] [Created] (HBASE-24543) ScheduledChore logging is too chatty, replace with metrics
Andrew Kyle Purtell created HBASE-24543: --- Summary: ScheduledChore logging is too chatty, replace with metrics Key: HBASE-24543 URL: https://issues.apache.org/jira/browse/HBASE-24543 Project: HBase Issue Type: Improvement Components: metrics, Operability Reporter: Andrew Kyle Purtell ScheduledChore logs at DEBUG level the execution time of each chore. We used to log an average execution time across all chores every five minutes, which by consensus was judged to not be useful. Derived metrics like averages or histograms should be calculated per chore. So we modified the logging to dump the chore execution time each time it runs, to facilitate such calculations with the log aggregation and searching tool of choice. Per chore execution logging is more useful, in that sense, but may be too chatty. This is not unexpected but let me provide my observations so we can revisit this. On the master, for example, this is logged every second: {noformat} 2020-06-11 16:35:28,263 DEBUG [master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore: SplitLogManager Timeout Monitor execution time: 0 ms. {noformat} Does the value of these lines outweigh the cost of 86,400 log lines per day per master instance? (At least.) On the regionserver it is somewhat better, these are logged every 10 seconds: {noformat} 2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] hbase.ScheduledChore: CompactionChecker execution time: 0 ms. 2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms. {noformat} So that will be 17,280 log lines per day per regionserver. (At least.) Perhaps these should be moved to TRACE level. We should definitely replace this logging with histogram metrics. There should be a separate metric for each distinct chore classname, allocated as needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
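The per-chore metric allocation could be sketched as follows; a count/total pair stands in for a real histogram type from the metrics library (hypothetical class, not the actual ScheduledChore code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed replacement for per-run DEBUG
// logging: one lazily allocated metric per distinct chore class name,
// recording every execution time without emitting a log line.
class ChoreMetrics {
    private final Map<String, long[]> byChore = new ConcurrentHashMap<>();

    void record(String choreName, long execTimeMs) {
        long[] stat = byChore.computeIfAbsent(choreName, k -> new long[2]);
        synchronized (stat) {
            stat[0]++;              // run count
            stat[1] += execTimeMs;  // total time, for an average or histogram
        }
    }

    long count(String choreName) {
        long[] stat = byChore.get(choreName);
        return stat == null ? 0 : stat[0];
    }
}
```

This replaces the 17,000+ daily log lines per regionserver with a handful of metrics that a monitoring system can aggregate.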
[jira] [Created] (HBASE-24597) Port HBASE-24380 (Improve WAL splitting log lines to enable sessionization) to branch-1
Andrew Kyle Purtell created HBASE-24597: --- Summary: Port HBASE-24380 (Improve WAL splitting log lines to enable sessionization) to branch-1 Key: HBASE-24597 URL: https://issues.apache.org/jira/browse/HBASE-24597 Project: HBase Issue Type: Sub-task Components: logging, wal Reporter: Andrew Kyle Purtell Fix For: 1.7.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24598) Port HBASE-24380 (Improve WAL splitting log lines to enable sessionization) to branch-2.2
Andrew Kyle Purtell created HBASE-24598: --- Summary: Port HBASE-24380 (Improve WAL splitting log lines to enable sessionization) to branch-2.2 Key: HBASE-24598 URL: https://issues.apache.org/jira/browse/HBASE-24598 Project: HBase Issue Type: Sub-task Components: logging, wal Reporter: Andrew Kyle Purtell Fix For: 2.2.6 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24637) Filter SKIP hinting regression
Andrew Kyle Purtell created HBASE-24637: --- Summary: Filter SKIP hinting regression Key: HBASE-24637 URL: https://issues.apache.org/jira/browse/HBASE-24637 Project: HBase Issue Type: Bug Components: Filters, Performance, Scanners Reporter: Andrew Kyle Purtell I have been looking into reported performance regressions in HBase 2 relative to HBase 1. Depending on the test scenario, HBase 2 can demonstrate significantly better microbenchmarks in a number of cases, and usually shows improvement in whole cluster benchmarks like YCSB. To assist in debugging I added methods to RpcServer for updating per-call metrics that leverage the fact it puts a reference to the current Call into a thread local and that all activity for a given RPC is processed by a single thread context. I then instrumented ScanQueryMatcher (in branch-1) and its various friends (in branch-2.2), StoreScanner, HFileReaderV2 and HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per row were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6 and 2.2 versions under test operated on identical data files in HDFS. For tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to ensure only the server side differed. The results for pe --filterAll were revealing. See attached. It appears a refactor to ScanQueryMatcher and friends has disabled the ability of filters to provide meaningful SKIP hints, which disables an optimization that avoids reseeking, leading to a serious and proportional regression in reseek activity and time spent in that code path. So for queries that use filters, there can be a substantial regression. Other test cases that did not use filters did not show this regression. 
If filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was almost identical, as measured by counts of the hint types returned, whether or not column or version trackers are called, and counts of store seeks or reseeks. Regarding micro-timings, there was a 10% variance in my testing and results generally fell within this range, except for the filter all case of course. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-15519) Add per-user metrics
[ https://issues.apache.org/jira/browse/HBASE-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell reopened HBASE-15519: - > Add per-user metrics > - > > Key: HBASE-15519 > URL: https://issues.apache.org/jira/browse/HBASE-15519 > Project: HBase > Issue Type: Sub-task > Components: metrics >Affects Versions: 1.2.0 >Reporter: Enis Soztutar >Assignee: Ankit Singhal >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.0 > > Attachments: HBASE-15519.master.003.patch, hbase-15519_v0.patch, > hbase-15519_v1.patch, hbase-15519_v1.patch, hbase-15519_v2.patch > > > Per-user metrics will be useful in multi-tenant cases where we can emit > number of requests, operations, num RPCs etc at the per-user aggregate level > per regionserver. We currently have throttles per user, but no way to monitor > resource usage per-user. > Looking at these metrics, operators can adjust throttles, do capacity > planning, etc per-user. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24893) TestLogLevel failing on hadoop-ci (branch-1)
Andrew Kyle Purtell created HBASE-24893: --- Summary: TestLogLevel failing on hadoop-ci (branch-1) Key: HBASE-24893 URL: https://issues.apache.org/jira/browse/HBASE-24893 Project: HBase Issue Type: Bug Components: test Reporter: Andrew Kyle Purtell Fix For: 1.7.0 TestLogLevel is failing the branch-1 builds on hadoop-ci. The test needs some improvement. The code seems to be doing the right thing but the error condition the test is expecting varies by JVM or JVM version: {noformat} Expected to find 'Unrecognized SSL message' but got unexpected exception:javax.net.ssl.SSLException: Unsupported or unrecognized SSL message {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24898) Use EnvironmentEdge.currentTime() instead of System.currentTimeMillis() in CurrentHourProvider
[ https://issues.apache.org/jira/browse/HBASE-24898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell reopened HBASE-24898: - This test fails 100% of the time on branch-1 and the commit has been reverted. {noformat} [INFO] Running org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider [ERROR] testWithEnvironmentEdge(org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider) Time elapsed: 0.175 s <<< FAILURE! java.lang.AssertionError: expected:<11> but was:<12> at org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider.testWithEnvironmentEdge(TestCurrentHourProvider.java:53) [INFO] [INFO] Results: [INFO] [ERROR] Failures: [ERROR] TestCurrentHourProvider.testWithEnvironmentEdge:53 expected:<11> but was:<12> [INFO] [ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0 {noformat} It also fails 100% of the time for me on branch-2.3 and probably should be reverted elsewhere as well. > Use EnvironmentEdge.currentTime() instead of System.currentTimeMillis() in > CurrentHourProvider > -- > > Key: HBASE-24898 > URL: https://issues.apache.org/jira/browse/HBASE-24898 > Project: HBase > Issue Type: Bug > Components: tooling >Reporter: Zheng Wang >Assignee: Zheng Wang >Priority: Major > Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0, 2.2.7, 2.3.2 > > > In order to control the return value of getCurrentHour used by unit test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-24971) Upgrade JQuery to 3.5.1
Andrew Kyle Purtell created HBASE-24971: --- Summary: Upgrade JQuery to 3.5.1 Key: HBASE-24971 URL: https://issues.apache.org/jira/browse/HBASE-24971 Project: HBase Issue Type: Bug Components: security, UI Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7 JQuery <= 3.5.0 is subject to a known cross site scripting vulnerability. Upgrade our embedded minimized jquery library to 3.5.1. Upgrade embedded jquery-tablesorter while at it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24971) Upgrade JQuery to 3.5.1
[ https://issues.apache.org/jira/browse/HBASE-24971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24971. - Hadoop Flags: Reviewed Resolution: Fixed > Upgrade JQuery to 3.5.1 > --- > > Key: HBASE-24971 > URL: https://issues.apache.org/jira/browse/HBASE-24971 > Project: HBase > Issue Type: Bug > Components: security, UI >Reporter: Andrew Kyle Purtell >Assignee: Andrew Kyle Purtell >Priority: Major > Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7 > > > JQuery <= 3.5.0 is subject to a known cross site scripting vulnerability. > Upgrade our embedded minimized jquery library to 3.5.1. > Upgrade embedded jquery-tablesorter while at it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-24893) TestLogLevel failing on hadoop-ci (branch-1)
[ https://issues.apache.org/jira/browse/HBASE-24893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell reopened HBASE-24893: - > TestLogLevel failing on hadoop-ci (branch-1) > > > Key: HBASE-24893 > URL: https://issues.apache.org/jira/browse/HBASE-24893 > Project: HBase > Issue Type: Bug > Components: test >Reporter: Andrew Kyle Purtell >Assignee: Abhey Rana >Priority: Major > Labels: pull-request-available > Fix For: 1.7.0 > > > TestLogLevel is failing the branch-1 builds on hadoop-ci. > The test needs some improvement. The code seems to be doing the right thing > but the error condition the test is expecting varies by JVM or JVM version: > {noformat} > Expected to find 'Unrecognized SSL message' but got unexpected exception: > javax.net.ssl.SSLException: Unsupported or unrecognized SSL message > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25079) Upgrade Bootstrap to 3.3.7
Andrew Kyle Purtell created HBASE-25079: --- Summary: Upgrade Bootstrap to 3.3.7 Key: HBASE-25079 URL: https://issues.apache.org/jira/browse/HBASE-25079 Project: HBase Issue Type: Improvement Components: security, UI Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7 Our UI embeds Bootstrap 3.0.0. There are some reported security issues. Upgrade to Bootstrap 3.3.7. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25212) Optionally abort requests in progress after deciding a region should close
Andrew Kyle Purtell created HBASE-25212: --- Summary: Optionally abort requests in progress after deciding a region should close Key: HBASE-25212 URL: https://issues.apache.org/jira/browse/HBASE-25212 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0 After deciding a region should be closed, the regionserver will set the internal region state to closing and wait for all pending requests to complete, via a rendezvous on the region lock. In closing state the region will not accept any new requests but requests in progress will be allowed to complete before the close action takes place. In our production we see outlier wait times on this lock in excess of several minutes. During close when there are requests in flight the regionserver is subject to any conceivable reason for delay, like full scans over large regions, expensive filtering hierarchies, bugs, or store level performance problems like slow HDFS. The regionserver should interrupt requests in progress to facilitate smaller/shorter close times on an opt-in basis. Optionally, via configuration parameter -- which would be a system wide default set in hbase-site.xml in common practice but could be overridden in table schema for per table settings -- interrupt requests in progress holding the region lock rather than wait for completion of all operations in flight. Send back NotServingRegionException("region is closing") to the clients of the interrupted operations, like we do after the write lock is acquired. The client will transparently relocate the region data and resubmit the aborted requests per normal retry policy. This can be less disruptive than waiting for very long times for a region to close in extreme outlier cases (e.g. 50 minutes). After waiting for all requests to complete then we flush the region's memstore and finish the close. 
The flush portion of the close process is out of scope of this proposal. Under normal conditions the flush portion of the close completes quickly. It is specifically the wait on the close lock that has been an occasional issue in our production, causing difficulty achieving 99.99% availability. -- This message was sent by Atlassian Jira (v8.3.4#803005)
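A sketch of the opt-in interrupt behavior described above, using a plain ReentrantReadWriteLock to stand in for the region's close lock (hypothetical class, not the actual HRegion code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: instead of blocking indefinitely for in-flight
// requests to release the region lock, close waits a bounded time and
// then interrupts the request threads, which would surface to clients
// as NotServingRegionException("region is closing") and be retried.
class RegionCloseSketch {
    final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();

    // Returns true if the write lock was acquired without forcing.
    boolean acquireForClose(long waitMs, Iterable<Thread> inFlight) {
        try {
            if (regionLock.writeLock().tryLock(waitMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread t : inFlight) {
            t.interrupt(); // abort requests still holding read locks
        }
        regionLock.writeLock().lock(); // readers unwind, then close proceeds
        return false;
    }
}
```

With the configuration parameter unset, waitMs would effectively be unbounded, preserving today's wait-for-completion behavior.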
[jira] [Created] (HBASE-25227) [branch-1] Cast in UnsafeAccess to avoid Java 11 runtime issue
Andrew Kyle Purtell created HBASE-25227: --- Summary: [branch-1] Cast in UnsafeAccess to avoid Java 11 runtime issue Key: HBASE-25227 URL: https://issues.apache.org/jira/browse/HBASE-25227 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 1.7.0 When running tests with Java 11 UnsafeAccess is observed to throw NoSuchMethodErrors. Some of our methods accept 'int' parameters and use them as parameters to Unsafe methods which should take 'long'. The Java 8 compiler does the implicit conversion but the Java 11 compiler does not. Add casts to fix. Not an issue on branch-2 and up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25292) Update InetSocketAddress usage discipline
Andrew Kyle Purtell created HBASE-25292: --- Summary: Update InetSocketAddress usage discipline Key: HBASE-25292 URL: https://issues.apache.org/jira/browse/HBASE-25292 Project: HBase Issue Type: Bug Components: Client, HFile Reporter: Andrew Kyle Purtell We sometimes cache InetSocketAddress in data structures in an attempt to optimize away potential nameservice (DNS) lookups. This is, in general, an anti-pattern, because once an InetSocketAddress is resolved, resolution is never attempted again. The ideal pattern for connect() is ISA instantiation just before the connect() call, with no reuse of the ISA instance. For bind() we presume the local identity won't change while the process is live so usage and caching can be relaxed in that case. To restate the proposed usage convention for InetSocketAddress: network identities should be bound late. This means addresses should be resolved at the last possible moment. Also, network identity mappings can change, so our code should not inappropriately cache them; otherwise we might miss a change and fail to operate normally. I have reviewed the code for InetSocketAddress usage and in my opinion sometimes we are caching ISA acceptably, and in other cases we are not. Correct cases: * We cache ISA for RPC connections, so we don't potentially do a lookup for every Call. However, we resolve the address earlier than we need to. The code can be improved by moving resolution to just before where we connect(). Incorrect cases that can be fixed: * RPC stubs. Remote clients may be recycled and replaced with new instances where the network identities (DNS name to IP address mapping) have changed. HBASE-14544 attempts to work around DNS instability in data centers of years past in a way that, in my opinion, is the wrong thing to do in the modern era. This is just a technical opinion and not critical to the rest of the proposal. That said, I intend to propose a revert of HBASE-14544. 
Reverting this simplifies some code a bit. (If this part of the proposal is controversial it can be dropped.) When looking up the IP address of the remote host when creating a stub key we also make a key even if the resolution fails. This is the wrong thing to do. If we can't resolve the remote address, we can't contact the server. Making a stub that can't communicate is pointless. Throw an exception instead. * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't have to make up a list right away and cache it forever. We can use Address to record the list of favored nodes and convert from Address to InetSocketAddress on demand (when we go to create the HFile). This will allow us to resolve datanode hostnames just before they are needed. In public cloud, Kubernetes, or some private datacenter service deployment options, datanode servers may have their network identities (DNS name -> IP address mapping) changed over time. We can and should avoid inappropriate caching that may cause us to indefinitely use an incorrect address when contacting a favored node. * Sometimes we use ISA when Address is just as good. For example, the dead servers list. If we are going to pay some attention to ISA usage discipline, let's remove the cases where we use ISA as a host and port pair but do not need to do so. Address works just as well and doesn't present an opportunity for misuse. Another example would be the RPC client concurrentCounterCache. Incorrect cases that cannot be fixed: * hbase-external-blockcache: We have to resolve all of the memcached locations up front because the memcached client constructor requires ISA instances. So we have to hope that the network identities (DNS name -> IP address mapping) do not change for any address in the list. This is beyond our control. 
While in this area it is trivial to add new client connect metrics for number of potential nameservice lookups (whenever we instantiate an ISA) and number of failed nameservice lookups (if the instantiated ISA is unresolved). While in this area I also noticed we often directly access a field in ConnectionId where there is also a getter, so good practice is to use the getter instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
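The late-binding convention argued for above can be illustrated with a small sketch; the class name is hypothetical, standing in for an Address-like host and port pair:

```java
import java.net.InetSocketAddress;

// Hypothetical sketch of the late-binding convention: keep only the
// hostname and port (an "Address") in long-lived structures, and build
// the InetSocketAddress immediately before connect() so the name is
// re-resolved each time. Fail fast on resolution failure instead of
// caching (or using) an unresolved address.
class LateBoundAddress {
    final String host;
    final int port;

    LateBoundAddress(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Called just before connect(); the result is never cached.
    InetSocketAddress resolveNow() {
        InetSocketAddress isa = new InetSocketAddress(host, port);
        if (isa.isUnresolved()) {
            throw new IllegalStateException("Cannot resolve " + host + ":" + port);
        }
        return isa;
    }
}
```

The isUnresolved() check is also the natural hook for the failed-nameservice-lookup metric mentioned above.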
[jira] [Created] (HBASE-25308) [branch-1] Consume Guava from hbase-thirdparty hbase-shaded-miscellaneous
Andrew Kyle Purtell created HBASE-25308: --- Summary: [branch-1] Consume Guava from hbase-thirdparty hbase-shaded-miscellaneous Key: HBASE-25308 URL: https://issues.apache.org/jira/browse/HBASE-25308 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell Fix For: 1.7.0 We are again having classpath versioning issues related to Guava in our branch-1 based application. Hadoop 3, HBase 2, Phoenix 5, and other projects deal with Guava cross-version incompatibilities, as they manifest on a combined classpath with other components, via shading. I propose to do a global search and replace of all direct uses of Guava in our branch-1 code base and refer to Guava as provided in hbase-thirdparty's hbase-shaded-miscellaneous. This will protect HBase branch-1 from Guava cross-version vagaries just like the same technique protects branch-2 and branch-2 based releases. There are a couple of Public interfaces that incorporate Guava's HostAndPort and Service that will be indirectly impacted. We are about to release a new minor branch-1 version, 1.7.0, and this would be a great opportunity to introduce this kind of change in a manner consistent with semantic versioning and our compatibility policies. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25314) branch-1 docker mode for yetus fails
Andrew Kyle Purtell created HBASE-25314: --- Summary: branch-1 docker mode for yetus fails Key: HBASE-25314 URL: https://issues.apache.org/jira/browse/HBASE-25314 Project: HBase Issue Type: Bug Reporter: Andrew Kyle Purtell {noformat} 15:30:41 Step 28/33 : RUN gem install rubocop:'<= 0.81' 15:30:41 ---> Running in 21103fb7944c 15:30:42 Building native extensions. This could take a while... 15:30:43 ERROR: Error installing rubocop: 15:30:43 parallel requires Ruby version >= 2.5. 15:30:43 Successfully installed jaro_winkler-1.5.4 15:30:44 The command '/bin/sh -c gem install rubocop:'<= 0.81'' returned a non-zero code: 1 15:30:44 ERROR: Docker failed to build yetus/hbase:b249092a5f. {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25316) Release a hbase-thirdparty with hbase-shaded-miscellaneous suitable for branch-1
Andrew Kyle Purtell created HBASE-25316: --- Summary: Release a hbase-thirdparty with hbase-shaded-miscellaneous suitable for branch-1 Key: HBASE-25316 URL: https://issues.apache.org/jira/browse/HBASE-25316 Project: HBase Issue Type: Task Affects Versions: 1.7.0 Environment: R Reporter: Andrew Kyle Purtell Assignee: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24664) Some changing of split region by overall region size rather than only one store size
[ https://issues.apache.org/jira/browse/HBASE-24664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24664. - Hadoop Flags: Reviewed Resolution: Fixed > Some changing of split region by overall region size rather than only one > store size > > > Key: HBASE-24664 > URL: https://issues.apache.org/jira/browse/HBASE-24664 > Project: HBase > Issue Type: Improvement > Components: regionserver > Affects Versions: 3.0.0-alpha-1, 2.4.0 > Reporter: Zheng Wang > Assignee: Zheng Wang > Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > As a distributed cluster, HBase distributes load in units of regions, so if a > region grows too big, it brings some negative effects, such as: > 1. Harder to homogenize disk usage (considering locality) > 2. Might cost more time on region opening > 3. After a split, the daughter regions might cause more IO cost on compaction > in a short time (if writes are even) > I tried to introduce a new SteppingAllStoresSizeSplitPolicy in HBASE-24530, > but after discussion in the comments and the related > [thread|https://lists.apache.org/thread.html/r08a8103e2532eb667a0fcb4efa8a4117b3f82e6251bc4bd0bc157c26%40%3Cdev.hbase.apache.org%3E], > we finally decided to change the existing split policy with a new option > controlling whether it should count all store files; it would default to true > on the master branch and false elsewhere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
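The option agreed on in the quoted issue can be sketched like this; the method and parameter names are hypothetical, not the actual HBase split-policy API:

```java
import java.util.List;

// Illustrative sketch: the split check either considers only the largest
// store (the old behavior) or the sum of all store sizes (the new option).
final class SplitCheck {
  static boolean shouldSplit(List<Long> storeSizes, long threshold,
                             boolean countAllStores) {
    long size = 0;
    for (long s : storeSizes) {
      // Sum all stores when the option is on; otherwise track the max.
      size = countAllStores ? size + s : Math.max(size, s);
    }
    return size > threshold;
  }
}
```

The difference matters for regions with many medium-sized column families: no single store crosses the threshold, yet the overall region keeps growing unbounded under the old behavior.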
[jira] [Resolved] (HBASE-25298) hbase.rsgroup.fallback.enable should support dynamic configuration
[ https://issues.apache.org/jira/browse/HBASE-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-25298. - Fix Version/s: 3.0.0-alpha-1 Resolution: Fixed The PR was merged to master branch. > hbase.rsgroup.fallback.enable should support dynamic configuration > --- > > Key: HBASE-25298 > URL: https://issues.apache.org/jira/browse/HBASE-25298 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1, 2.4.0 >Reporter: Baiqiang Zhao >Assignee: Baiqiang Zhao >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Use update_config command to control the switch of RSGroup fallback. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-24877) Add option to avoid aborting RS process upon uncaught exceptions happen on replication source
[ https://issues.apache.org/jira/browse/HBASE-24877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24877. - Fix Version/s: 2.4.0 3.0.0-alpha-1 Resolution: Fixed PRs were merged to master and branch-2. Resolving. File new issues for any further backports. > Add option to avoid aborting RS process upon uncaught exceptions happen on > replication source > - > > Key: HBASE-24877 > URL: https://issues.apache.org/jira/browse/HBASE-24877 > Project: HBase > Issue Type: Improvement > Components: Replication > Affects Versions: 3.0.0-alpha-1, 2.4.0 > Reporter: Wellington Chevreuil > Assignee: Wellington Chevreuil > Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > Currently, we abort the entire RS process if any uncaught exception happens on > ReplicationSource initialization. This may be too extreme on certain > deployments, where custom replication endpoint implementations may choose to > throw when remote peers are unavailable, but the source cluster shouldn't be > brought down entirely. Similarly, source reader and shipper threads would > cause the RS to abort on any runtime exception occurrence while running. > This patch adds a configuration option (false by default, to keep the original > behaviour) to avoid aborting entire RS processes under these conditions. > Instead, if ReplicationSource initialization fails with a RuntimeException, > it keeps retrying the source startup. In the case of reader/shipper runtime > errors, it refreshes the replication source, terminating the current source and > its readers/shippers and creating new ones. -- This message was sent by Atlassian Jira (v8.3.4#803005)
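The configuration-gated retry behavior described above can be sketched as follows; the names and structure are illustrative, not the actual patch:

```java
// Illustrative sketch: on an uncaught RuntimeException during source
// startup, either rethrow (old behavior: the RS aborts) or keep retrying
// initialization (new, opt-in behavior).
final class SourceStartup {
  interface Initializer {
    void init();
  }

  // Returns the number of attempts made before init succeeded.
  static int startWithRetries(Initializer source, boolean abortOnError,
                              int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        source.init();
        return attempt;
      } catch (RuntimeException e) {
        if (abortOnError) {
          throw e; // old behavior: propagate and abort the RS
        }
        // new behavior: swallow, log, and retry the source startup
      }
    }
    throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
  }
}
```

The real change also refreshes the source on reader/shipper errors, which this sketch does not model.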
[jira] [Resolved] (HBASE-24081) Provide documentation for running Yetus with HBase
[ https://issues.apache.org/jira/browse/HBASE-24081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24081. - Fix Version/s: 2.4.0 Resolution: Fixed > Provide documentation for running Yetus with HBase > -- > > Key: HBASE-24081 > URL: https://issues.apache.org/jira/browse/HBASE-24081 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Nick Dimiduk >Assignee: Nick Dimiduk >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > A colleague asked how to use Yetus with HBase, so I wrote up a little how-to > doc. Maybe it's useful to someone else? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25187) Improve SizeCachedKV variants initialization
[ https://issues.apache.org/jira/browse/HBASE-25187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-25187. - Hadoop Flags: Reviewed Resolution: Fixed > Improve SizeCachedKV variants initialization > > > Key: HBASE-25187 > URL: https://issues.apache.org/jira/browse/HBASE-25187 > Project: HBase > Issue Type: Improvement > Reporter: ramkrishna.s.vasudevan > Assignee: ramkrishna.s.vasudevan > Priority: Minor > Fix For: 3.0.0-alpha-1, 2.3.4, 2.5.0 > > > Currently in SizeCachedKV we get the row length and key length from the > buffers. This can be optimized: we can pass the key length and row length > when actually creating the cell while reading it from the block. Sometimes > we see that SizeCachedKV takes the max width in a flame graph, considering > the fact that we also do a sanity check on the created KV. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25050) We initialize Filesystems more than once.
[ https://issues.apache.org/jira/browse/HBASE-25050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-25050. - Hadoop Flags: Reviewed Resolution: Fixed > We initialize Filesystems more than once. > - > > Key: HBASE-25050 > URL: https://issues.apache.org/jira/browse/HBASE-25050 > Project: HBase > Issue Type: Bug > Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6 > Reporter: ramkrishna.s.vasudevan > Assignee: ramkrishna.s.vasudevan > Priority: Minor > Fix For: 3.0.0-alpha-1, 2.4.0 > > > In HFileSystem > {code} > // Create the default filesystem with checksum verification switched on. > // By default, any operation to this FilterFileSystem occurs on > // the underlying filesystem that has checksums switched on. > this.fs = FileSystem.get(conf); > this.useHBaseChecksum = useHBaseChecksum; > fs.initialize(getDefaultUri(conf), conf); > {code} > We call fs.initialize(). Generally the FS would have been created and inited > either in the FileSystem.get() call above or even when we try to check > {code} > FileSystem fs = p.getFileSystem(c); > {code} > The FS that gets cached in the hadoop-common layer does the init for us. So > doing it again is redundant. -- This message was sent by Atlassian Jira (v8.3.4#803005)
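Why the second initialize() is redundant can be illustrated with a generic cache-backed factory, analogous to (but not) Hadoop's FileSystem cache: the get() path initializes an instance exactly once, on the first cache miss.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Generic sketch of a cache-backed get(), standing in for Hadoop's
// FileSystem.get(conf). The int[] holds a single "init count" so we can
// observe that initialization runs only once per key.
final class CachingFactory {
  static final Map<String, int[]> cache = new ConcurrentHashMap<>();

  static int[] get(String uri) {
    // computeIfAbsent runs the "initialize" step only on a cache miss;
    // later callers receive the already-initialized cached instance,
    // so re-initializing the returned object is redundant.
    return cache.computeIfAbsent(uri, u -> new int[] {1});
  }
}
```

In the HFileSystem code quoted above, FileSystem.get(conf) plays the role of this get(): the returned instance was already initialized when it entered the hadoop-common cache, so the explicit fs.initialize() call repeats work.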
[jira] [Resolved] (HBASE-25246) Backup/Restore hbase cell tags.
[ https://issues.apache.org/jira/browse/HBASE-25246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-25246. - Fix Version/s: 2.4.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed > Backup/Restore hbase cell tags. > --- > > Key: HBASE-25246 > URL: https://issues.apache.org/jira/browse/HBASE-25246 > Project: HBase > Issue Type: Improvement > Components: backup&restore >Reporter: Rushabh Shah >Assignee: Rushabh Shah >Priority: Major > Fix For: 3.0.0-alpha-1, 2.4.0 > > > In PHOENIX-6213 we are planning to add cell tags for Delete mutations. After > having a discussion with hbase community via dev mailing thread, it was > decided that we will pass the tags via an attribute in Mutation object and > persist them to hbase via phoenix co-processor. The intention of PHOENIX-6213 > is to store metadata in Delete marker so that while running Restore tool we > can selectively restore certain Delete markers and ignore others. For that to > happen we need to persist these tags in Backup and retrieve them in Restore > MR jobs (Import/Export tool). > Currently we don't persist the tags in Backup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25352) API compatibility checker fails with "Argument list too long"
Andrew Kyle Purtell created HBASE-25352: --- Summary: API compatibility checker fails with "Argument list too long" Key: HBASE-25352 URL: https://issues.apache.org/jira/browse/HBASE-25352 Project: HBase Issue Type: Bug Affects Versions: 2.4.0 Reporter: Andrew Kyle Purtell While working on the 2.4.0 RC I hit a stumbling block where the argument list passed to javap by the API compatibility checker is too large for Mac OS. Attempted execution of the forked process fails with "Argument list too long". -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25359) create-release scripts releasedocmaker step should be optional
Andrew Kyle Purtell created HBASE-25359: --- Summary: create-release scripts releasedocmaker step should be optional Key: HBASE-25359 URL: https://issues.apache.org/jira/browse/HBASE-25359 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell The create-release scripts assume, when invoking releasedocmaker and performing surgery on CHANGES.md and RELEASENOTES.md during the 'tag' stage, that the current RC step is RC0. The entirety of the generated CHANGES.md and RELEASENOTES.md files are stitched in at the head, just below the ASF notice. If we are at a RC step that is not zero, wouldn't this duplicate all CHANGES.md and RELEASENOTES.md content for the release? There would be all the content added for RC0, then the same content (with delta) added for RC1, and so on. For this reason the releasedocmaker invocation should itself be optional. For RC steps > 0, assume the RM has updated CHANGES.md and RELEASENOTES.md to reflect the delta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25465) Use javac --release option for supporting cross version compilation in create-release
Andrew Kyle Purtell created HBASE-25465: --- Summary: Use javac --release option for supporting cross version compilation in create-release Key: HBASE-25465 URL: https://issues.apache.org/jira/browse/HBASE-25465 Project: HBase Issue Type: Improvement Reporter: Andrew Kyle Purtell -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-12830) Unreadable HLogs stuck in replication queues
[ https://issues.apache.org/jira/browse/HBASE-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-12830. - Resolution: Duplicate > Unreadable HLogs stuck in replication queues > > > Key: HBASE-12830 > URL: https://issues.apache.org/jira/browse/HBASE-12830 > Project: HBase > Issue Type: Bug > Affects Versions: 0.98.9 > Reporter: Andrew Kyle Purtell > Priority: Major > > We had an incident where underlying infrastructure issues caused HDFS > namenodes to go down, not at the same time, leading to periods of HDFS > service outage and recovery as namenodes failed over. These clusters had > replication enabled. We had some Regionservers roll logs during partial HDFS > availability. Namespace entries for these HLogs were created but the first > block-being-written was lost or unable to complete, leading to dead 0-length > HLogs in the queues of active sources. When this happens the queue becomes > stuck on the dead 0-length HLog reporting EOFExceptions whenever the source > wakes up and tries to (re)open the current file like so: > {noformat} > 2015-01-08 18:50:40,956 WARN > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: > 1-,60020,1418764167084 Got: > java.io.EOFException > at java.io.DataInputStream.readFully(DataInputStream.java:197) > at java.io.DataInputStream.readFully(DataInputStream.java:169) > at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845) > at > org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810) > at > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759) > at > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773) > at > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:70) > at > org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:175) > at > 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:184) > at > org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:70) > at > org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:128) > at > org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:91) > at > org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:79) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:68) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:506) > at > org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:309) > {noformat} > This exception originates from where SequenceFile tries to read in the 4-byte > version header from position 0. > In ReplicationSource#run we have an active loop: > {code} > // Loop until we close down > while (isActive()) { > ... > } > {code} > Within this loop we iterate over paths in the replication queue. For each > path, we attempt to open it: > {code} > // Open a reader on it > if (!openReader(sleepMultiplier)) { > // Reset the sleep multiplier, else it'd be reused for the next file > sleepMultiplier = 1; > continue; > } > {code} > When we have a zero length file openReader returns TRUE but this.reader is > set to NULL (look at the catch of the outer try block) and we fall through > the conditional to: > {code} > // If we got a null reader but didn't continue, then sleep and continue > if (this.reader == null) { > if (sleepForRetries("Unable to open a reader", sleepMultiplier)) { > sleepMultiplier++; > } > continue; > } > {code} > We will keep trying to open the current file for a long time. 
The queue will > be stuck until sleepMultiplier == maxRetriesMultiplier (conf > "replication.source.maxretriesmultiplier", default 10), with a base sleep > time of "replication.source.sleepforretries" (default 1000) ms, then we will > call ReplicationSource#processEndOfFile(). > By default we will spin on opening the dead 0-length HLog for (1000*1) + > (1000*2) ... + (1000*10) milliseconds before processing the file out of the > queue. In HBASE-11964 we recommend increasing > "replication.source.maxretriesmultiplier" to 300. Using the updated > configuration we will wait for (1000*1) + (1000*2) ... + (1000*300) > milliseconds before processing the file out of the queue. > There should be some way to break out of this very long wait for a 0-length > or corrupt file that is blocking the q
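The arithmetic above can be checked with a small helper (illustrative, not HBase code): the total spin time is the base sleep multiplied by the sum of the multipliers 1 through the maximum.

```java
// Back-of-the-envelope check of the wait described above: with a base
// sleep of baseMs and a multiplier growing 1..maxMultiplier, the source
// spends baseMs * (1 + 2 + ... + maxMultiplier) ms before processing the
// dead file out of the queue.
final class RetryWait {
  static long totalWaitMs(long baseMs, int maxMultiplier) {
    long total = 0;
    for (int m = 1; m <= maxMultiplier; m++) {
      total += baseMs * m;
    }
    return total;
  }
}
```

With the default multiplier of 10 this comes to 55 seconds, but with the recommended value of 300 it is 45,150 seconds, roughly 12.5 hours of a stuck queue per dead 0-length HLog.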
[jira] [Resolved] (HBASE-24813) ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination
[ https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Kyle Purtell resolved HBASE-24813. - Resolution: Fixed > ReplicationSource should clear buffer usage on ReplicationSourceManager upon > termination > > > Key: HBASE-24813 > URL: https://issues.apache.org/jira/browse/HBASE-24813 > Project: HBase > Issue Type: Bug > Components: Replication > Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.4, 2.5.0 > Reporter: Wellington Chevreuil > Assignee: Wellington Chevreuil > Priority: Major > Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1 > > Attachments: TestReplicationSyncUpTool.log, > image-2020-10-09-10-50-00-372.png > > > Following investigations on the issue described by [~elserj] on HBASE-24779, > we found out that once a peer is removed, thus killing the peer's related > *ReplicationSource* instance, it may leave > *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if > *ReplicationSourceWALReader* had put some entries on its queue to be > processed by *ReplicationSourceShipper,* but the peer removal killed the > shipper before it could process the pending entries. When the > *ReplicationSourceWALReader* thread adds entries to the queue, it increments > *ReplicationSourceManager.totalBufferUsed* with the sum of the entries' sizes. > When those entries are read by *ReplicationSourceShipper,* > *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also > decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* > is terminated, otherwise the size of those unprocessed entries would consume > *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the RS gets > restarted. This may be a problem for deployments with multiple peers, or if > new peers are added. -- This message was sent by Atlassian Jira (v8.3.4#803005)
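The accounting fix described above can be sketched as follows; the field and method names are illustrative, not the actual HBase classes:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of the leak and its fix: the shared buffer counter must
// be decremented for entries still queued (read but not yet shipped) when
// a source terminates, or their size is held against the global budget
// until the RS restarts.
final class BufferAccounting {
  final AtomicLong totalBufferUsed = new AtomicLong(); // shared, manager-level
  private long pendingBytes;                           // queued but unshipped

  void onEntriesQueued(long bytes) {   // WAL reader side
    totalBufferUsed.addAndGet(bytes);
    pendingBytes += bytes;
  }

  void onEntriesShipped(long bytes) {  // shipper side
    totalBufferUsed.addAndGet(-bytes);
    pendingBytes -= bytes;
  }

  void onSourceTerminated() {          // the fix: release whatever is pending
    totalBufferUsed.addAndGet(-pendingBytes);
    pendingBytes = 0;
  }
}
```

Without the termination hook, killing a peer between queueing and shipping leaves pendingBytes permanently counted in totalBufferUsed, shrinking the buffer budget available to every other peer's source.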