[jira] [Created] (HBASE-28157) hbck should report previously reported regions with null region location

2023-10-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28157:
---

 Summary: hbck should report previously reported regions with null 
region location
 Key: HBASE-28157
 URL: https://issues.apache.org/jira/browse/HBASE-28157
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.6
Reporter: Andrew Kyle Purtell
 Fix For: 2.6.0, 2.4.18, 3.0.0, 4.0.0-alpha-1, 2.5.7


Operators bypassed some in-progress TRSPs, leading to a state where some 
regions were persistently in transition but hidden. 

Because the master builds its list of regions in transition by tracking TRSP, 
the bypass of TRSP removed the regions from the RIT list. This was expected, 
but I will propose a change to RIT tracking in another issue. 

The online hbck chore also did not report the inconsistency. This was not 
expected.

HBASE-28144 was another issue related to this incident, already fixed. 

Ensure that hbck will report as inconsistent any region for which a location 
was previously reported but the current region location is null, if the region 
is not expected to be offline.
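
A rough sketch of the intended check; RegionStateNode and getRegionLocation() 
exist in the master's assignment manager, but the last-reported-location map, 
the offline-expected predicate, and the reporting sink below are assumed names 
for illustration:

{code:java}
// Hedged sketch, not the actual HbckChore internals.
for (RegionStateNode node : regionStates.getRegionStateNodes()) {
  ServerName lastReported = lastReportedLocations.get(node.getRegionInfo());
  ServerName current = node.getRegionLocation();
  // "offline expected" should cover split/merged parents and disabled tables.
  if (lastReported != null && current == null && !isOfflineExpected(node)) {
    inconsistentRegions.put(node.getRegionInfo().getEncodedName(), lastReported);
  }
}
{code}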



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28158) Decouple RIT list management from TRSP invocation

2023-10-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28158:
---

 Summary: Decouple RIT list management from TRSP invocation
 Key: HBASE-28158
 URL: https://issues.apache.org/jira/browse/HBASE-28158
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.6
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.6.0, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7


Operators bypassed some in-progress TRSPs, leading to a state where some 
regions were persistently in transition but hidden. Because the master builds 
its list of regions in transition by tracking TRSP, the bypass of TRSP removed 
the regions from the RIT list. 

Although I can see from reading the code that this is the expected behavior, 
it is surprising for operators and should be changed. 

We should only remove a region from the RIT map when assignment reaches a 
suitable terminal state.
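
A minimal sketch of the idea, using the RegionState.State enum; the helper and 
the call site are illustrative, not the actual AssignmentManager code:

{code:java}
// Illustrative sketch: drop a region from the RIT map only once its state is
// terminal, regardless of how the TRSP itself ended (including bypass).
private static boolean isTerminal(RegionState.State state) {
  switch (state) {
    case OPEN:   // assigned and serving
    case CLOSED: // cleanly closed
    case SPLIT:  // parent of a completed split
    case MERGED: // parent of a completed merge
      return true;
    default:     // OPENING, CLOSING, FAILED_OPEN, FAILED_CLOSE, ...
      return false;
  }
}

// Hypothetical call site at TRSP completion or bypass:
if (isTerminal(regionNode.getState())) {
  regionsInTransition.remove(regionNode.getRegionInfo());
}
{code}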



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28172) Update downloads.xml for release 2.5.6

2023-10-20 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28172:
---

 Summary: Update downloads.xml for release 2.5.6
 Key: HBASE-28172
 URL: https://issues.apache.org/jira/browse/HBASE-28172
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28172) Update downloads.xml for release 2.5.6

2023-10-20 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28172.
-
Resolution: Fixed

> Update downloads.xml for release 2.5.6
> --
>
> Key: HBASE-28172
> URL: https://issues.apache.org/jira/browse/HBASE-28172
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28178) Upgrade ZooKeeper on all branches for CVE-2023-44981

2023-10-25 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28178:
---

 Summary: Upgrade ZooKeeper on all branches for CVE-2023-44981
 Key: HBASE-28178
 URL: https://issues.apache.org/jira/browse/HBASE-28178
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.6.0, 2.4.18, 3.0.0-beta-1, 4.0.0-alpha-1, 2.5.7


CVE-2023-44981 is a high-scoring (9.1/10) authorization bypass vulnerability in 
ZooKeeper related to SASL quorum authentication. The bug is fixed in versions 
3.7.2, 3.8.3, and 3.9.1. 
Upgrade the ZK version on all active branches, to at least 3.7.2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28267) create-release should run spotless

2023-12-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28267:
---

 Summary: create-release should run spotless
 Key: HBASE-28267
 URL: https://issues.apache.org/jira/browse/HBASE-28267
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell


Before committing generated files like CHANGES.md and RELEASENOTES.md, we 
should run 'mvn spotless:apply' to ensure that what is committed is formatted 
per our rules and will not be modified when someone invokes spotless later.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28282) Update downloads.xml for release 2.5.7

2023-12-24 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28282:
---

 Summary: Update downloads.xml for release 2.5.7
 Key: HBASE-28282
 URL: https://issues.apache.org/jira/browse/HBASE-28282
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28282) Update downloads.xml for release 2.5.7

2023-12-24 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28282.
-
Resolution: Fixed

> Update downloads.xml for release 2.5.7
> --
>
> Key: HBASE-28282
> URL: https://issues.apache.org/jira/browse/HBASE-28282
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Trivial
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27694) Exclude the older versions of netty pulling from Hadoop dependencies

2024-02-26 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-27694.
-
Fix Version/s: (was: 2.5.8)
   (was: 3.0.0-beta-2)
   (was: 2.6.1)
 Assignee: (was: Rajeshbabu Chintaguntla)
   Resolution: Won't Fix

We can't fix this on our side because some Hadoop code still requires netty 3. 
We need to wait for HADOOP-15327, which has a fix version of Hadoop 3.4.0. 

> Exclude the older versions of netty pulling from Hadoop dependencies
> 
>
> Key: HBASE-27694
> URL: https://issues.apache.org/jira/browse/HBASE-27694
> Project: HBase
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Priority: Major
>
> Currently netty 3.10.6 is getting pulled in from HDFS dependencies, and 
> tools like Sonatype report its CVEs against HBase. To get rid of this it 
> would be better to exclude netty wherever the HDFS or MapReduce client jars 
> are used.
>  * org.apache.hbase : hbase-it : jar : tests : 2.5.2
>  ** org.apache.hadoop : hadoop-mapreduce-client-core : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  ** org.apache.hbase : hbase-endpoint : 2.5.2
>  *** org.apache.hadoop : hadoop-hdfs : jar : tests : 3.2.2
> **** io.netty : netty : 3.10.6.final
>  *** org.apache.hadoop : hadoop-hdfs : 3.2.2
> **** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-jobclient : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  ** org.apache.hadoop : hadoop-mapreduce-client-common : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-jobclient : jar : tests : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  * org.apache.hadoop : hadoop-mapreduce-client-hs : 3.2.2
>  ** io.netty : netty : 3.10.6.final
>  ** org.apache.hadoop : hadoop-mapreduce-client-app : 3.2.2
>  *** io.netty : netty : 3.10.6.final
>  *** org.apache.hadoop : hadoop-mapreduce-client-shuffle : 3.2.2
> **** io.netty : netty : 3.10.6.final



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28441) Update downloads.xml for 2.5.8

2024-03-13 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28441:
---

 Summary: Update downloads.xml for 2.5.8
 Key: HBASE-28441
 URL: https://issues.apache.org/jira/browse/HBASE-28441
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28441) Update downloads.xml for 2.5.8

2024-03-13 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28441.
-
Resolution: Fixed

> Update downloads.xml for 2.5.8
> --
>
> Key: HBASE-28441
> URL: https://issues.apache.org/jira/browse/HBASE-28441
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28506) Remove hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28506:
---

 Summary: Remove hbase-compression-xz
 Key: HBASE-28506
 URL: https://issues.apache.org/jira/browse/HBASE-28506
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.6.0, 3.0.0-beta-2






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28507) Deprecate hbase-compression-xz

2024-04-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28507:
---

 Summary: Deprecate hbase-compression-xz
 Key: HBASE-28507
 URL: https://issues.apache.org/jira/browse/HBASE-28507
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 2.5.9






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-25972) Dual File Compaction

2024-05-17 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-25972.
-
Fix Version/s: 2.6.1
   2.5.9
 Hadoop Flags: Reviewed
 Release Note: The default compactor in HBase compacts HFiles into one 
file. This change introduces a new store file writer, DualFileWriter, which 
writes the cells retained by compaction into two files. One of these files 
includes the live cells; this file is called a live-version file. The other 
file includes the rest of the cells, that is, historical versions; this file 
is called a historical-version file. DualFileWriter works with the default 
compactor. The historical files are not read by scans that only need the 
latest row versions. This eliminates scanning unnecessary cell versions in 
compacted files and is thus expected to improve the performance of these 
scans.
   Resolution: Fixed
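
A conceptual sketch of the write-path split described above; the method shape, 
writer names, and predicate are paraphrased for illustration, not the 
committed API:

{code:java}
// Conceptual sketch of DualFileWriter's routing decision (illustrative names):
// live (latest, non-deleted) cells go to the live-version file, everything
// else goes to the historical-version file.
void append(Cell cell) throws IOException {
  if (isLiveVersion(cell)) {              // assumed predicate: newest non-deleted
    liveVersionWriter.append(cell);       // read by latest-version scans
  } else {
    historicalVersionWriter.append(cell); // skipped by latest-version scans
  }
}
{code}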

> Dual File Compaction
> 
>
> Key: HBASE-25972
> URL: https://issues.apache.org/jira/browse/HBASE-25972
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kadir Ozdemir
>Assignee: Kadir Ozdemir
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.9
>
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of 
> blocks. The number of rows stored in a block depends on the row sizes. The 
> number of rows per block gets lower when rows get larger on disk due to 
> multiple row versions since HBase stores all row versions sequentially in the 
> same HFile after compaction. However, applications (e.g., Phoenix) mostly 
> query the most recent row versions.
> The default compactor in HBase compacts HFiles into one file. This Jira 
> introduces a new store file writer, DualFileWriter, which writes the cells 
> retained by compaction into two files. One of these files will include the 
> live cells; this file will be called a live-version file. The other file 
> will include the rest of the cells, that is, historical versions; this file 
> will be called a historical-version file. DualFileWriter will work with the 
> default compactor.
> The historical files will not be read by scans that only need the latest row 
> versions. This eliminates scanning unnecessary cell versions in compacted 
> files and is thus expected to improve the performance of these scans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-25244) Support splitting a region into N parts at a time

2024-06-03 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-25244.
-
  Assignee: (was: zhuqi)
Resolution: Duplicate

While not exactly the same proposal, this issue is duplicated by HBASE-28438, 
and this one has had no activity. 

> Support splitting a region into N parts at a time
> -
>
> Key: HBASE-25244
> URL: https://issues.apache.org/jira/browse/HBASE-25244
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver
>Reporter: zhuqi
>Priority: Major
>
> In the current reference file format, only one parent region split into two 
> references can be recorded. At this time, if you want to continue splitting 
> the daughter region, you must wait until the majorCompaction is over and the 
> reference file is deleted before you can continue to split the region.
> If a reference file could point to other reference files whose data has not 
> yet been moved from the parent region into the corresponding region folder, 
> a multi-level reference would be established. This forms a tree structure in 
> which only the root contains physical data, while the regions at the leaf 
> nodes are serving.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27635) Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started

2024-06-11 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-27635.
-
Fix Version/s: (was: 3.0.0-beta-2)
   (was: 2.6.1)
   (was: 2.5.9)
 Assignee: (was: Rajeshbabu Chintaguntla)
   Resolution: Not A Problem

> Shutdown zookeeper logs coming via ReadOnlyZKClient when hbase shell started
> 
>
> Key: HBASE-27635
> URL: https://issues.apache.org/jira/browse/HBASE-27635
> Project: HBase
>  Issue Type: Improvement
>  Components: shell
>Reporter: Rajeshbabu Chintaguntla
>Priority: Major
>
> When the hbase shell is started with HBase 2.5.2 there is too much logging 
> of ZK connection details, classpaths, etc., even though we enabled the ERROR 
> log level for the zookeeper package.
> {noformat}
> 2023-02-10 17:34:25,211 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client 
> environment:zookeeper.version=3.5.9-5-a433770fc7b303332f10174221799495a26bbca2,
>  built on 02/07/2023 13:02 GMT
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:host.name=host1
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:java.version=1.8.0_352
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client environment:java.vendor=Red Hat, Inc.
> 2023-02-10 17:34:25,212 INFO  
> [ReadOnlyZKClient-host1:2181,host2:2181,host3:2181@0x15c16f19] 
> zookeeper.ZooKeeper: Client 
> environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.352.b08-2.el7_9.x86_64/jre
> {noformat}
> It would be better to change the org.apache.hadoop.hbase.zookeeper package 
> log level to ERROR.
> {noformat}
> # Set logging level to avoid verboseness
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.zookeeper',
>  log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop',
>  log_level)
> org.apache.logging.log4j.core.config.Configurator.setAllLevels('org.apache.hadoop.hbase.zookeeper',
>  log_level)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28652) Backport HBASE-21785 master reports open regions as RITs and also messes up rit age metric

2024-06-11 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28652.
-
Resolution: Fixed

> Backport HBASE-21785 master reports open regions as RITs and also messes up 
> rit age metric
> --
>
> Key: HBASE-28652
> URL: https://issues.apache.org/jira/browse/HBASE-28652
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Szucs Villo
>Assignee: Szucs Villo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 2.6.1, 2.5.9
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28661) Fix compatibility issue in SecurityHeadersFilter in branch-2.x

2024-06-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28661.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Fix compatibility issue in SecurityHeadersFilter in branch-2.x
> --
>
> Key: HBASE-28661
> URL: https://issues.apache.org/jira/browse/HBASE-28661
> Project: HBase
>  Issue Type: Task
>Reporter: Szucs Villo
>Assignee: Szucs Villo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.7.0, 2.6.1, 2.5.9
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-26092) JVM core dump in the replication path

2024-07-15 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-26092.
-
Resolution: Duplicate

> JVM core dump in the replication path
> -
>
> Key: HBASE-26092
> URL: https://issues.apache.org/jira/browse/HBASE-26092
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 2.3.5
>Reporter: Huaxiang Sun
>Priority: Critical
>
> When replication is turned on, we found the following core dump in the 
> region server. 
> I checked the core dump for replication. I think I got some ideas. For 
> replication, when an RS receives walEdits from a remote cluster, it needs to 
> send them out to the final RS. In this path NettyRpcConnection is used, and 
> calls are queued while they still refer to a ByteBuffer owned by the 
> replication handler (returned to the pool once the handler returns). A core 
> dump will happen since the ByteBuffer has been reused. Reference counting is 
> needed in this asynchronous processing.
>  
> Feel free to take it; otherwise, I will try to work on a patch later.
>  
>  
> {code:java}
> Stack: [0x7fb1bf039000,0x7fb1bf13a000],  sp=0x7fb1bf138560,  free 
> space=1021k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native 
> code)
> J 28175 C2 
> org.apache.hadoop.hbase.ByteBufferKeyValue.write(Ljava/io/OutputStream;Z)I 
> (21 bytes) @ 0x7fd2663c [0x7fd263c0+0x27c]
> J 14912 C2 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.writeRequest(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Lorg/apache/hadoop/hbase/ipc/Call;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (370 bytes) @ 0x7fdbbb94b590 [0x7fdbbb949c00+0x1990]
> J 14911 C2 
> org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.write(Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelHandlerContext;Ljava/lang/Object;Lorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (30 bytes) @ 0x7fdbb972d1d4 [0x7fdbb972d1a0+0x34]
> J 30476 C2 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.write(Ljava/lang/Object;ZLorg/apache/hbase/thirdparty/io/netty/channel/ChannelPromise;)V
>  (149 bytes) @ 0x7fdbbd4e7084 [0x7fdbbd4e6900+0x784]
> J 14914 C2 org.apache.hadoop.hbase.ipc.NettyRpcConnection$6$1.run()V (22 
> bytes) @ 0x7fdbbb9344ec [0x7fdbbb934280+0x26c]
> J 23528 C2 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(J)Z
>  (106 bytes) @ 0x7fdbbcbb0efc [0x7fdbbcbb0c40+0x2bc]
> J 15987% C2 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run()V (461 
> bytes) @ 0x7fdbbbaf1580 [0x7fdbbbaf1360+0x220]
> j  
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run()V+44
> j  
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run()V+11
> j  
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run()V+4
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28739) Update downloads.xml for 2.5.9

2024-07-17 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28739:
---

 Summary: Update downloads.xml for 2.5.9
 Key: HBASE-28739
 URL: https://issues.apache.org/jira/browse/HBASE-28739
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28739) Update downloads.xml for 2.5.9

2024-07-17 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28739.
-
Resolution: Fixed

> Update downloads.xml for 2.5.9
> --
>
> Key: HBASE-28739
> URL: https://issues.apache.org/jira/browse/HBASE-28739
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28740) Need to call parent class's serialization methods in CloseExcessRegionReplicasProcedure

2024-07-18 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28740.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Need to call parent class's serialization methods in 
> CloseExcessRegionReplicasProcedure
> ---
>
> Key: HBASE-28740
> URL: https://issues.apache.org/jira/browse/HBASE-28740
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.6.1, 2.5.10
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28755) Update downloads.xml for 2.5.10

2024-07-24 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-28755:
---

 Summary: Update downloads.xml for 2.5.10
 Key: HBASE-28755
 URL: https://issues.apache.org/jira/browse/HBASE-28755
 Project: HBase
  Issue Type: Task
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28755) Update downloads.xml for 2.5.10

2024-07-25 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-28755.
-
Resolution: Fixed

> Update downloads.xml for 2.5.10
> ---
>
> Key: HBASE-28755
> URL: https://issues.apache.org/jira/browse/HBASE-28755
> Project: HBase
>  Issue Type: Task
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-23054) Remove synchronization block from MetaTableMetrics and fix LossyCounting algorithm

2019-10-01 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-23054:
-

> Remove synchronization block from MetaTableMetrics and fix LossyCounting 
> algorithm
> --
>
> Key: HBASE-23054
> URL: https://issues.apache.org/jira/browse/HBASE-23054
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.5
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.1.7, 2.2.2
>
> Attachments: HBASE-23054.master.001.patch, 
> HBASE-23054.master.002.patch
>
>
> While trying to use LossyCounting for HBASE-15519, I found the following 
> bugs in the current implementation:
> - Remove the synchronization block from MetaTableMetrics to avoid congestion 
> at the code 
> - Fix the license format
> - Fix the LossyCounting algorithm as per 
> http://www.vldb.org/conf/2002/S10P03.pdf
> - Avoid doing a sweep on every insert in LossyCounting
> - Remove extra redundant data structures from MetaTableMetrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-22988) Backport HBASE-11062 "hbtop" to branch-1

2019-10-02 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-22988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-22988.
-
Fix Version/s: 1.4.11
   1.3.6
 Hadoop Flags: Reviewed
 Assignee: Toshihiro Suzuki  (was: Andrew Kyle Purtell)
   Resolution: Fixed

> Backport HBASE-11062 "hbtop" to branch-1
> 
>
> Key: HBASE-22988
> URL: https://issues.apache.org/jira/browse/HBASE-22988
> Project: HBase
>  Issue Type: Sub-task
>  Components: backport, hbtop
>Reporter: Toshihiro Suzuki
>Assignee: Toshihiro Suzuki
>Priority: Major
> Fix For: 1.5.0, 1.3.6, 1.4.11
>
> Attachments: HBASE-22988-branch-1.patch
>
>
> Backport parent issue to branch-1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23101) Backport HBASE-22380 to branch-1

2019-10-02 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23101.
-
Fix Version/s: 1.4.11
   1.3.6
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Backport HBASE-22380 to branch-1
> 
>
> Key: HBASE-23101
> URL: https://issues.apache.org/jira/browse/HBASE-23101
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Blocker
> Fix For: 1.5.0, 1.3.6, 1.4.11
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23116) LoadBalancer should log table name when balancing per table

2019-10-02 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23116:
---

 Summary: LoadBalancer should log table name when balancing per 
table
 Key: HBASE-23116
 URL: https://issues.apache.org/jira/browse/HBASE-23116
 Project: HBase
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: Andrew Kyle Purtell
 Fix For: 3.0.0, 1.5.0, 2.3.0, 1.3.6, 1.4.11, 2.1.7, 2.2.2


The load balancer logs lines like these:

{noformat}
2019-10-02 23:18:47,664 INFO  [46493_ChoreService_6] 
balancer.StochasticLoadBalancer - Skipping load balancing because balanced 
cluster; total cost is 46.68964334022376, sum multiplier is 1087.0 min cost 
which need balance is 0.05
{noformat}

When balancing per table it would be useful if the table name was also printed 
in the log line. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23116) LoadBalancer should log table name when balancing per table

2019-10-04 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23116.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> LoadBalancer should log table name when balancing per table
> ---
>
> Key: HBASE-23116
> URL: https://issues.apache.org/jira/browse/HBASE-23116
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.4.10, 2.2.1, 2.1.6
>Reporter: Andrew Kyle Purtell
>Assignee: Bharath Vissapragada
>Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.2.2, 2.1.8
>
>
> The load balancer logs lines like these:
> {noformat}
> 2019-10-02 23:18:47,664 INFO  [46493_ChoreService_6] 
> balancer.StochasticLoadBalancer - Skipping load balancing because balanced 
> cluster; total cost is 46.68964334022376, sum multiplier is 1087.0 min cost 
> which need balance is 0.05
> {noformat}
> When balancing per table it would be useful if the table name was also 
> printed in the log line. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23128) Restore Region interface compatibility

2019-10-07 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23128:
---

 Summary: Restore Region interface compatibility 
 Key: HBASE-23128
 URL: https://issues.apache.org/jira/browse/HBASE-23128
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


Adding methods to a Public interface is ok for a minor release, removing 
methods is not. We need to restore 

abstract method boolean bulkLoadHFiles ( Collection<Pair<byte[],String>>, 
boolean, Region.BulkLoadListener )

in order to maintain binary compatibility. 
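
The restored declaration would look roughly like this on the Region interface; 
this is a sketch based on the branch-1 signature, with the throws clause 
assumed:

{code:java}
// Sketch of the restored overload; parameter names and throws clause assumed.
boolean bulkLoadHFiles(Collection<Pair<byte[], String>> familyPaths,
    boolean assignSeqId, BulkLoadListener bulkLoadListener) throws IOException;
{code}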





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23128) Restore Region interface compatibility

2019-10-07 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23128.
-
Resolution: Fixed

> Restore Region interface compatibility 
> ---
>
> Key: HBASE-23128
> URL: https://issues.apache.org/jira/browse/HBASE-23128
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Blocker
> Fix For: 1.5.0
>
>
> Adding methods to a Public interface is ok for a minor release, removing 
> methods is not. We need to restore 
> {code}
> abstract method boolean bulkLoadHFiles (
> Collection<Pair<byte[],String>>, boolean, Region.BulkLoadListener)
> {code}
> to the Region interface in order to maintain binary compatibility. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23139) MapReduce jobs launched from convenience distribution are nonfunctional

2019-10-08 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23139:
---

 Summary: MapReduce jobs launched from convenience distribution are 
nonfunctional
 Key: HBASE-23139
 URL: https://issues.apache.org/jira/browse/HBASE-23139
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 1.5.0
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.3.6, 1.4.11, 1.5.0


CNFE for thirdparty GSON; we need to add the thirdparty jar to the job 
dependencies.

{noformat}
Error: java.lang.ClassNotFoundException: 
org.apache.hbase.thirdparty.com.google.gson.GsonBuilder
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.hadoop.hbase.util.GsonUtil.createGson(GsonUtil.java:44)
at org.apache.hadoop.hbase.util.JsonMapper.<clinit>(JsonMapper.java:37)
at org.apache.hadoop.hbase.client.Operation.toJSON(Operation.java:70)
at org.apache.hadoop.hbase.client.Operation.toString(Operation.java:96)
at org.apache.hadoop.hbase.client.Operation.toString(Operation.java:110)
at 
org.apache.hadoop.hbase.mapreduce.TableSplit.toString(TableSplit.java:368)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:762)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:171)
{noformat}
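
Until the packaging fix lands, a hedged sketch of a job-setup workaround; 
addDependencyJarsForClasses is the TableMapReduceUtil helper for shipping 
extra jars, assuming it is present in the version in use:

{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

// Sketch: ship the hbase thirdparty jar with the job so task classpaths can
// resolve the relocated GsonBuilder.
void addThirdpartyJar(Job job) throws IOException {
  TableMapReduceUtil.addDependencyJars(job); // standard HBase job deps
  TableMapReduceUtil.addDependencyJarsForClasses(job.getConfiguration(),
      org.apache.hbase.thirdparty.com.google.gson.GsonBuilder.class);
}
{code}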



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23151) Backport HBASE-23083 (Collect Executor status info periodically and report to metrics system) to branch-1

2019-10-11 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23151:
---

 Summary: Backport HBASE-23083 (Collect Executor status info 
periodically and report to metrics system) to branch-1
 Key: HBASE-23151
 URL: https://issues.apache.org/jira/browse/HBASE-23151
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Kyle Purtell
 Fix For: 1.6.0, 1.5.1






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23153) PrimaryRegionCountSkewCostFunction SLB function should implement CostFunction#isNeeded

2019-10-11 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23153:
---

 Summary: PrimaryRegionCountSkewCostFunction SLB function should 
implement CostFunction#isNeeded
 Key: HBASE-23153
 URL: https://issues.apache.org/jira/browse/HBASE-23153
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


The PrimaryRegionCountSkewCostFunction SLB function should implement 
CostFunction#isNeeded and, like the other region replica specific functions, 
return false when region replicas are not in use. Otherwise it will always 
report a cost of 0 even though its weight will still be included in the sum of 
the weights. 
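
A minimal sketch of the fix, assuming the balancer's cluster state exposes 
whether region replicas are in use (the field name is illustrative):

{code:java}
// Illustrative sketch: mirror the other region-replica cost functions and opt
// out when no region replicas are configured, so this function's weight is
// excluded from the sum of weights.
@Override
boolean isNeeded() {
  return cluster != null && cluster.hasRegionReplicas;
}
{code}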



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23161) Invalid hostname tests can fail if the ISP hijacks NXDOMAIN

2019-10-12 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23161:
---

 Summary: Invalid hostname tests can fail if the ISP hijacks 
NXDOMAIN
 Key: HBASE-23161
 URL: https://issues.apache.org/jira/browse/HBASE-23161
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


Some residential internet service providers hijack DNS "record not found" 
cases: instead of returning NXDOMAIN responses per the standard, they return A 
records that redirect the user to their own portal or search page. This breaks 
tests like TestConnectionImplementation.testGetClientBadHostname and 
TestRegionServerHostname.testInvalidRegionServerHostnameAbortsServer. We should 
detect this behavior and skip these cases.
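
A sketch of the guard, using only JDK and JUnit APIs; the probe hostname is 
arbitrary and uses the reserved .invalid TLD:

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;
import org.junit.Assume;

// Probe a hostname that cannot exist; if the resolver returns an address
// anyway, the ISP (or local resolver) is hijacking NXDOMAIN and the
// bad-hostname tests cannot be trusted, so skip them.
public static void assumeNoNxdomainHijacking() {
  String bogus = "bogus-" + UUID.randomUUID() + ".invalid";
  boolean resolved;
  try {
    InetAddress.getByName(bogus);
    resolved = true;  // hijacked: a bogus name resolved to an A record
  } catch (UnknownHostException e) {
    resolved = false; // standard-compliant NXDOMAIN behavior
  }
  Assume.assumeTrue("DNS resolver hijacks NXDOMAIN; skipping test", !resolved);
}
{code}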



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23174) Upgrade jackson and jackson-databind to 2.9.10

2019-10-14 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23174:
---

 Summary: Upgrade jackson and jackson-databind to 2.9.10
 Key: HBASE-23174
 URL: https://issues.apache.org/jira/browse/HBASE-23174
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell
 Fix For: 2.3.0, 1.3.6, 1.4.11, 2.2.2, 2.1.8, 1.5.1


Two more CVEs (CVE-2019-16335 and CVE-2019-14540) are addressed in 
jackson-databind 2.9.10.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23151) Backport HBASE-23083 (Collect Executor status info periodically and report to metrics system) to branch-1

2019-10-15 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23151.
-
Resolution: Fixed

Pushed as 425d84dc14, thanks [~javaman_chen]

> Backport HBASE-23083 (Collect Executor status info periodically and report to 
> metrics system) to branch-1
> -
>
> Key: HBASE-23151
> URL: https://issues.apache.org/jira/browse/HBASE-23151
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Kyle Purtell
>Assignee: chenxu
>Priority: Minor
> Fix For: 1.5.1
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23206) ZK quorum redundancy with failover in RZK

2019-10-23 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23206:
---

 Summary: ZK quorum redundancy with failover in RZK
 Key: HBASE-23206
 URL: https://issues.apache.org/jira/browse/HBASE-23206
 Project: HBase
  Issue Type: Brainstorming
Reporter: Andrew Kyle Purtell


We have faced a few production issues where the ZooKeeper quorum serving the 
cluster has not been as reliable as expected. The most recent one was 
essentially ZOOKEEPER-2164 (and related: ZOOKEEPER-900). These can be 
mitigated by a ZK server configuration change, but the incidents suggest it 
may be worth thinking about how to be less reliant on the service provided by 
a single ZK quorum instance. 

A solution would be holistic with several parts:
- HBASE-18095 to get ZK dependencies out of the client
- Related HBase replication improvements to track peer and position state in 
HBase tables instead of znodes
- This brainstorming...

For this part, we could consider the possibility that RecoverableZooKeeper 
(RZK) might be taught how to speak to two separate ZK quorums redundantly, and 
continue to offer service even if one of them is temporarily unable to provide 
service. 
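
To make the brainstorm concrete, a heavily simplified sketch of what a 
redundant read might look like; the class and fallback policy are purely 
hypothetical, and a real design would also have to address writes, watches, 
and cross-quorum consistency:

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

// Purely hypothetical: try the primary quorum, fall back to the secondary
// if the primary is unavailable.
class DualQuorumReader {
  private final ZooKeeper primary;
  private final ZooKeeper secondary;

  DualQuorumReader(ZooKeeper primary, ZooKeeper secondary) {
    this.primary = primary;
    this.secondary = secondary;
  }

  byte[] getData(String path) throws KeeperException, InterruptedException {
    try {
      return primary.getData(path, false, null);
    } catch (KeeperException.ConnectionLossException
        | KeeperException.OperationTimeoutException e) {
      return secondary.getData(path, false, null);
    }
  }
}
{code}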



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23207) Log a region open journal

2019-10-23 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23207:
---

 Summary: Log a region open journal
 Key: HBASE-23207
 URL: https://issues.apache.org/jira/browse/HBASE-23207
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell


Like HBASE-22828, but for region opening.

Also, tweak the calls to enableStatusJournal to pass 'true' as the parameter, 
to include the current status in the journal for slightly more context. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-15519) Add per-user metrics

2019-10-23 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-15519:
-

> Add per-user metrics 
> -
>
> Key: HBASE-15519
> URL: https://issues.apache.org/jira/browse/HBASE-15519
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 1.2.0
>Reporter: Enis Soztutar
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-15519.master.003.patch, hbase-15519_v0.patch, 
> hbase-15519_v1.patch, hbase-15519_v1.patch, hbase-15519_v2.patch
>
>
> Per-user metrics will be useful in multi-tenant cases where we can emit 
> number of requests, operations, num RPCs etc at the per-user aggregate level 
> per regionserver. We currently have throttles per user, but no way to monitor 
> resource usage per-user. 
> Looking at these metrics, operators can adjust throttles, do capacity 
> planning, etc per-user. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-15519) Add per-user metrics

2019-10-23 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-15519.
-
Resolution: Fixed

> Add per-user metrics 
> -
>
> Key: HBASE-15519
> URL: https://issues.apache.org/jira/browse/HBASE-15519
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 1.2.0
>Reporter: Enis Soztutar
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0, 2.3.0
>
> Attachments: HBASE-15519.master.003.patch, hbase-15519_v0.patch, 
> hbase-15519_v1.patch, hbase-15519_v1.patch, hbase-15519_v2.patch
>
>
> Per-user metrics will be useful in multi-tenant cases where we can emit 
> number of requests, operations, num RPCs etc at the per-user aggregate level 
> per regionserver. We currently have throttles per user, but no way to monitor 
> resource usage per-user. 
> Looking at these metrics, operators can adjust throttles, do capacity 
> planning, etc per-user. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23210) Backport HBASE-15519 (Add per-user metrics) to branch-1

2019-10-23 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23210:
---

 Summary: Backport HBASE-15519 (Add per-user metrics) to branch-1
 Key: HBASE-23210
 URL: https://issues.apache.org/jira/browse/HBASE-23210
 Project: HBase
  Issue Type: New Feature
Reporter: Andrew Kyle Purtell
 Fix For: 1.6.0


We will need HBASE-15519 in branch-1 for eventual backport of HBASE-23065.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23225) Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec

2019-10-28 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23225:
---

 Summary: Error building shaded-client: duplicate entry: 
META-INF/.../ObjectCodec 
 Key: HBASE-23225
 URL: https://issues.apache.org/jira/browse/HBASE-23225
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade
 (aggregate-into-a-jar-with-relocated-third-parties) on project 
hbase-shaded-client: 
Error creating shaded jar: 
duplicate entry: 
META-INF/services/org.apache.hadoop.hbase.shaded.com.fasterxml.jackson.core.ObjectCodec
 
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23226) Backport HBASE-22460 (Reopen a region if store reader references may have leaked) to branch-1

2019-10-28 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23226:
---

 Summary: Backport HBASE-22460 (Reopen a region if store reader 
references may have leaked) to branch-1
 Key: HBASE-23226
 URL: https://issues.apache.org/jira/browse/HBASE-23226
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 1.4.11, 1.3.6, 1.5.0
Reporter: Andrew Kyle Purtell
 Fix For: 1.6.0


Backport parent change to branch-1. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23225) Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec

2019-10-28 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23225.
-
Resolution: Cannot Reproduce

After some other work and builds, returning to this to try and reproduce with 
-X was unsuccessful. Whatever may have happened before, the Maven black magic 
voodoo is working again for me.

> Error building shaded-client: duplicate entry: META-INF/.../ObjectCodec 
> 
>
> Key: HBASE-23225
> URL: https://issues.apache.org/jira/browse/HBASE-23225
> Project: HBase
>  Issue Type: Bug
> Environment: $ mvn --version
> Apache Maven 3.6.2 (40f52333136460af0dc0d7232c0dc0bcf0d9e117; 
> 2019-08-27T08:06:16-07:00)
> Maven home: /usr/local/Cellar/maven/3.6.2/libexec
> Java version: 1.8.0_232, vendor: Azul Systems, Inc., runtime: 
> /Library/Java/JavaVirtualMachines/zulu-8.jdk/Contents/Home/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.14.6", arch: "x86_64", family: "mac"
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-shade-plugin:3.1.1:shade
>  (aggregate-into-a-jar-with-relocated-third-parties) on project 
> hbase-shaded-client: 
> Error creating shaded jar: 
> duplicate entry: 
> META-INF/services/org.apache.hadoop.hbase.shaded.com.fasterxml.jackson.core.ObjectCodec
>  
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23226) Backport HBASE-22460 (Reopen a region if store reader references may have leaked) to branch-1

2019-10-28 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23226.
-
Fix Version/s: (was: 1.6.0)
   Resolution: Duplicate

> Backport HBASE-22460 (Reopen a region if store reader references may have 
> leaked) to branch-1
> -
>
> Key: HBASE-23226
> URL: https://issues.apache.org/jira/browse/HBASE-23226
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 1.5.0, 1.3.6, 1.4.11
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> Backport parent change to branch-1. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23210) Backport HBASE-15519 (Add per-user metrics) to branch-1

2019-11-02 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23210.
-
Resolution: Fixed

> Backport HBASE-15519 (Add per-user metrics) to branch-1
> ---
>
> Key: HBASE-23210
> URL: https://issues.apache.org/jira/browse/HBASE-23210
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 1.5.1
>
>
> We will need HBASE-15519 in branch-1 for eventual backport of HBASE-23065.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23246) Fix error prone warning in TestMetricsUserSourceImpl

2019-11-02 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23246:
---

 Summary: Fix error prone warning in TestMetricsUserSourceImpl
 Key: HBASE-23246
 URL: https://issues.apache.org/jira/browse/HBASE-23246
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Kyle Purtell
 Fix For: 1.6.0


TestMetricsUserSourceImpl.java:[50,29] [SelfComparison] An object is compared 
to itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23317) An option to fail only the region open if a coprocessor fails to load

2019-11-18 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23317:
---

 Summary: An option to fail only the region open if a coprocessor 
fails to load
 Key: HBASE-23317
 URL: https://issues.apache.org/jira/browse/HBASE-23317
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell


If a table coprocessor fails to load, rather than aborting, throw an exception 
which prevents the region from opening. This will lead to unresolvable regions 
in transition but in some circumstances this may be preferable to process 
aborts. On the other hand, there would be a new risk that the failure to load 
is a symptom of or a cause of regionserver global state corruption that 
eventually leads to other problems. Should at least be an option, though.  
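
A hedged sketch of what the option could look like in the region coprocessor 
load path; the configuration key below is invented for illustration (only 
hbase.coprocessor.abortonerror exists today), and the surrounding method names 
are paraphrased:

{code:java}
// Hypothetical sketch; the config key is invented for illustration.
try {
  loadTableCoprocessors(conf);
} catch (Throwable t) {
  if (conf.getBoolean("hbase.coprocessor.region.fail.open.on.load.error",
      false)) {
    // Fail only this region open; the region stays in transition rather than
    // taking down the whole regionserver.
    throw new IOException("Failed to load coprocessor for region " +
        regionInfo.getRegionNameAsString(), t);
  }
  // Existing behavior: abort or skip per hbase.coprocessor.abortonerror.
  handleCoprocessorThrowable(t);
}
{code}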



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23318) LoadTestTool doesn't start

2019-11-18 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23318:
---

 Summary: LoadTestTool doesn't start
 Key: HBASE-23318
 URL: https://issues.apache.org/jira/browse/HBASE-23318
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


After unpacking a binary tarball distribution, ./bin/hbase ltt fails to start, 
throwing a CNFE. We are missing the tests jar from hbase-zookeeper. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23288) Backport HBASE-23251 (Add Column Family and Table Names to HFileContext) to branch-1

2019-11-18 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23288.
-
Fix Version/s: 1.6.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Backport HBASE-23251 (Add Column Family and Table Names to HFileContext) to 
> branch-1
> 
>
> Key: HBASE-23288
> URL: https://issues.apache.org/jira/browse/HBASE-23288
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
> Fix For: 1.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-23085) Network and Data related Actions

2019-12-09 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-23085:
-

This commit has a terminology problem. The universal technical term for a unit 
of network data is packet, not "package".

> Network and Data related Actions
> 
>
> Key: HBASE-23085
> URL: https://issues.apache.org/jira/browse/HBASE-23085
> Project: HBase
>  Issue Type: Sub-task
>  Components: integration tests
>Reporter: Szabolcs Bukros
>Assignee: Szabolcs Bukros
>Priority: Minor
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> Add additional actions to:
>  * manipulate network packages with tc (reorder, lose, ...)
>  * add CPU load
>  * fill the disk
>  * corrupt or delete regionserver data files
> Create new monkey factories for the new actions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23569) Validate that the log cleaner actually cleans oldWALs as expected

2019-12-12 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23569:
---

 Summary: Validate that the log cleaner actually cleans oldWALs as 
expected
 Key: HBASE-23569
 URL: https://issues.apache.org/jira/browse/HBASE-23569
 Project: HBase
  Issue Type: Test
  Components: integration tests, master, test
Reporter: Andrew Kyle Purtell
 Fix For: 3.0.0, 2.3.0, 1.6.0


The fix for HBASE-23287 (LogCleaner is not added to choreService) is in, but 
we are lacking test coverage that validates that the log cleaner actually 
cleans oldWALs as expected. Add the test. 
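
A rough sketch of such a test against a running mini cluster, assuming some 
WALs have already been rolled into oldWALs; the timeout is arbitrary:

{code:java}
// Sketch: TEST_UTIL is an HBaseTestingUtility with a mini cluster started.
FileSystem fs = TEST_UTIL.getTestFileSystem();
Path oldWals = new Path(TEST_UTIL.getDefaultRootDirPath(),
    HConstants.HREGION_OLDLOGDIR_NAME); // "oldWALs"
// Wait for the LogCleaner chore to delete every unreferenced old WAL.
TEST_UTIL.waitFor(60_000,
  () -> !fs.exists(oldWals) || fs.listStatus(oldWals).length == 0);
{code}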



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23678) Literate builder API for version management in schema

2020-01-10 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23678:
---

 Summary: Literate builder API for version management in schema
 Key: HBASE-23678
 URL: https://issues.apache.org/jira/browse/HBASE-23678
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell


Lars designed the combination of VERSIONS, TTL, MIN_VERSIONS, and 
KEEP_DELETED_CELLS for maximum flexibility. There is a lot of nuance regarding 
their usage. Almost all combinations of these four settings make sense for 
some use cases (exceptions are MIN_VERSIONS > 0 without TTL, and 
KEEP_DELETED_CELLS=TTL without TTL). There should be a way to make the 
behavior with TTL easier to reason about when creating the schema. This could 
take the form of a literate builder API for ColumnDescriptor or an extension 
to an existing one. 

Let me give you a motivating example: We may want to retain all versions for a 
given TTL, and then only a specific number of versions. This can be achieved 
with VERSIONS=INT_MAX, TTL=_retention_interval_, KEEP_DELETED_CELLS=TTL, 
MIN_VERSIONS=_num_versions_. This is not intuitive, though, because VERSIONS 
has been the setting used to specify _num_versions_ since version 0.1, while 
in this example that role falls to MIN_VERSIONS.

A literate builder API, by way of its method names, could let a user describe 
more or less in speaking language how they want version retention to work, and 
internally the builder API could set the low level schema attributes. 
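
For illustration, the motivating example could then read something like the 
following with a hypothetical builder; every method name here is invented:

{code:java}
// Purely hypothetical literate builder; none of these methods exist today.
// Internally this would set VERSIONS=Integer.MAX_VALUE, TTL=30 days,
// KEEP_DELETED_CELLS=TTL, and MIN_VERSIONS=3 on the column descriptor.
HColumnDescriptor cf = VersionPolicyBuilder.newBuilder("cf")
    .keepAllVersionsFor(30, TimeUnit.DAYS) // retain everything within the TTL
    .thenKeepVersions(3)                   // after the TTL, keep 3 versions
    .build();
{code}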



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-16141) Unwind use of UserGroupInformation.doAs() to convey requester identity in coprocessor upcalls

2020-03-06 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-16141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-16141.
-
Fix Version/s: (was: 1.7.0)
   (was: 3.0.0)
 Assignee: (was: Gary Helmling)
   Resolution: Later

> Unwind use of UserGroupInformation.doAs() to convey requester identity in 
> coprocessor upcalls
> -
>
> Key: HBASE-16141
> URL: https://issues.apache.org/jira/browse/HBASE-16141
> Project: HBase
>  Issue Type: Improvement
>  Components: Coprocessors, security
>Reporter: Gary Helmling
>Priority: Major
>
> In discussion on HBASE-16115, there is some question of whether 
> UserGroupInformation.doAs() is the right mechanism for propagating the 
> original requester's identity in certain system contexts (splits, 
> compactions, some procedure calls). It has the unfortunate effect of 
> overriding the current user, which makes for very confusing semantics for 
> coprocessor implementors. We should instead find an alternate mechanism for 
> conveying the caller identity, which does not override the current user 
> context.
> I think we should instead look at passing this through as part of the 
> ObserverContext passed to every coprocessor hook.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23948) Backport HBASE-23146 (Support CheckAndMutate with multiple conditions) to branch-1

2020-03-06 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-23948:
---

 Summary: Backport HBASE-23146 (Support CheckAndMutate with 
multiple conditions) to branch-1
 Key: HBASE-23948
 URL: https://issues.apache.org/jira/browse/HBASE-23948
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell
 Fix For: 1.7.0


Backport HBASE-23146 (Support CheckAndMutate with multiple conditions) to 
branch-1, including updates to REST (HBASE-23924) and Thrift (HBASE-23925). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-23220) Release 1.6.0

2020-03-06 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-23220.
-
Resolution: Fixed

> Release 1.6.0
> -
>
> Key: HBASE-23220
> URL: https://issues.apache.org/jira/browse/HBASE-23220
> Project: HBase
>  Issue Type: Task
>  Components: community
>Affects Versions: 1.5.1
>Reporter: Sean Busbey
>Assignee: Andrew Kyle Purtell
>Priority: Major
>
> let's roll 1.6.0 to get HBASE-23174 out on recent branch-1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24018) Access check for getTableDescriptors is too restrictive

2020-03-18 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24018.
-
Resolution: Won't Fix

> Access check for getTableDescriptors is too restrictive
> ---
>
> Key: HBASE-24018
> URL: https://issues.apache.org/jira/browse/HBASE-24018
> Project: HBase
>  Issue Type: Improvement
>Reporter: Abhishek Singh Chouhan
>Priority: Major
>
> Currently getTableDescriptor requires a user to have Admin or Create 
> permissions. A client might need to get table descriptors to act 
> accordingly, e.g. based on an attribute set or a CP loaded. It should not be 
> necessary for the client to have create or admin privileges just to read the 
> descriptor; execute and/or read permission should be sufficient. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24069) Extend HBASE-16209 strategy (Provide an ExponentialBackOffPolicy sleep between failed region open requests) to region close and split requests

2020-03-27 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24069:
---

 Summary: Extend HBASE-16209 strategy (Provide an 
ExponentialBackOffPolicy sleep between failed region open requests) to region 
close and split requests
 Key: HBASE-24069
 URL: https://issues.apache.org/jira/browse/HBASE-24069
 Project: HBase
  Issue Type: Improvement
  Components: Region Assignment
Affects Versions: 1.6.0
Reporter: Andrew Kyle Purtell
 Fix For: 3.0.0, 1.7.0, 2.4.0


In HBASE-16209 we provide an ExponentialBackOffPolicy sleep between failed 
region open requests. This should be extended to also apply to region close 
and split requests. It will reduce the likelihood of FAILED_CLOSE transitions 
in production by being more tolerant of temporary regionserver loading issues, 
e.g. CallQueueTooBigException.
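
A minimal sketch of the backoff computation such retries could use; the base 
delay, cap, and jitter fraction are illustrative:

{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Illustrative exponential backoff with jitter for retrying failed region
// close/split requests.
static long backoffMillis(int attempt) {
  final long baseMillis = 100;
  final long capMillis = 60_000;
  long backoff = Math.min(capMillis, baseMillis * (1L << Math.min(attempt, 16)));
  // Add up to 25% jitter so retries from many regions do not synchronize.
  long jitter = ThreadLocalRandom.current().nextLong(backoff / 4 + 1);
  return backoff + jitter;
}
{code}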



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24099) Use a fair ReentrantReadWriteLock for the region lock used to guard closes

2020-04-01 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24099:
---

 Summary: Use a fair ReentrantReadWriteLock for the region lock 
used to guard closes
 Key: HBASE-24099
 URL: https://issues.apache.org/jira/browse/HBASE-24099
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell


Consider creating the region's ReentrantReadWriteLock with the fair locking 
policy. We have had a couple of production incidents where a regionserver 
stalled in shutdown for a very long time, leading to RIT (FAILED_CLOSE). 
The latest example is a 43 minute shutdown; ~40 minutes (2465280 ms) of that 
time was spent waiting to acquire the write lock on the region in order to 
finish closing it.

{quote}
...
Finished memstore flush of ~66.92 MB/70167112, currentsize=0 B/0 for region . in 927ms, sequenceid=6091133815, compaction requested=false at 1585175635349 (+60 ms)
Disabling writes for close at 1585178100629 (+2465280 ms)
{quote}

This time was spent in between the memstore flush and the task status change 
"Disabling writes for close at...". This is at HRegion.java:1481 in 1.3.6:

{code}
1480:   // block waiting for the lock for closing
1481:   lock.writeLock().lock(); // FindBugs: Complains UL_UNRELEASED_LOCK_EXCEPTION_PATH but seems fine
{code}
 
The close lock is operating in unfair mode. The table in question is under 
constant high query load. When the close request was received, there were 
active readers. After the close request there were more active readers, 
near-continuous contention. Although the clients would receive 
RegionServerStoppingException and other error notifications, because the region 
could not be reassigned, they kept coming, region (re-)location would find the 
region still hosted on the stuck server. Finally the closing thread waiting for 
the write lock became no longer starved (by chance) after 40 minutes.

The ReentrantReadWriteLock javadoc is clear about the possibility of starvation 
when continuously contended: "_When constructed as non-fair (the default), the 
order of entry to the read and write lock is unspecified, subject to reentrancy 
constraints. A nonfair lock that is continuously contended may indefinitely 
postpone one or more reader or writer threads, but will normally have higher 
throughput than a fair lock._"

We could try changing the acquisition semantics of this lock to fair. This is a 
one line change, where we call the RW lock constructor. Then:

 "_When constructed as fair, threads contend for entry using an approximately 
arrival-order policy. When the currently held lock is released, either the 
longest-waiting single writer thread will be assigned the write lock, or if 
there is a group of reader threads waiting longer than all waiting writer 
threads, that group will be assigned the read lock._" 

This could be better. The close process will have to wait until all readers and 
writers already waiting for acquisition either acquire and release or go away 
but won't be starved by future/incoming requests.

There could be a throughput loss in request handling, though, because this is 
the global reentrant RW lock for the region. 
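
For reference, the proposed one line change is just the fairness flag at 
construction; a minimal sketch, assuming the field keeps roughly its current 
shape in HRegion:

{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Fair mode: threads contend in approximately arrival order, so a pending
// close (write lock) can no longer be starved by a continuous stream of new
// readers. Sketch only; the actual declaration site in HRegion may differ.
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true /* fair */);
{code}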



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24115) Relocate test-only REST "client" from src/ to test/ and mark Private

2020-04-03 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24115:
---

 Summary: Relocate test-only REST "client" from src/ to test/ and 
mark Private
 Key: HBASE-24115
 URL: https://issues.apache.org/jira/browse/HBASE-24115
 Project: HBase
  Issue Type: Test
  Components: REST, security
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.7.0


Relocate test-only REST "client" from src/ to test/ and annotate as Private. 
The classes o.a.h.h.rest.Remote* were developed to facilitate REST unit tests 
and were incorrectly committed to src/. 

Although this "breaks" compatibility by moving public classes to the test jar and 
marking them private, no attention has been paid to these classes with respect 
to performance, convenience, or security. Consensus from various discussions 
over the years is to move them to test/, as was the intent of the original 
committer, who misplaced them. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24322) UnsafeAvailChecker should also check that required methods are available

2020-05-04 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24322:
---

 Summary: UnsafeAvailChecker should also check that required 
methods are available
 Key: HBASE-24322
 URL: https://issues.apache.org/jira/browse/HBASE-24322
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


We had a weird test failure due to accidentally running tests with Java 11, 
where Unsafe is available, but the method signatures were different, leading to 
this:
{noformat}
2020-05-02 14:57:15,145 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143)
    at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:237)
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:163)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:225)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:127)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2911)
Caused by: java.lang.NoSuchMethodError: 'void sun.misc.Unsafe.putInt(java.lang.Object, int, int)'
    at org.apache.hadoop.hbase.util.UnsafeAccess.putInt(UnsafeAccess.java:233)
    at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.putInt(Bytes.java:1499)
    at org.apache.hadoop.hbase.util.Bytes.putInt(Bytes.java:1021)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.appendMetaData(RecoverableZooKeeper.java:850)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:640)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:1027)
    at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.setMasterAddress(MasterAddressTracker.java:211)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2095)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:520)
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:315)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:138)
    ... 7 more
{noformat}

We should also check that all methods that will be invoked on Unsafe in 
UnsafeAccess.java are available when deciding in UnsafeAvailChecker if Unsafe 
is available (and usable). 
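
A minimal sketch of such a probe, checking by reflection for the exact 
signature UnsafeAccess invokes; the probed method here is illustrative, not an 
exhaustive list:

{code}
public class UnsafeMethodProbe {
  // Returns true only if sun.misc.Unsafe exists AND exposes the expected
  // signature. A missing class or a signature mismatch (as in the
  // NoSuchMethodError above) makes Unsafe "unavailable" for our purposes.
  static boolean hasPutInt() {
    try {
      Class<?> clazz = Class.forName("sun.misc.Unsafe");
      clazz.getMethod("putInt", Object.class, long.class, int.class);
      return true;
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      return false;
    }
  }
}
{code}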



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24350) HBase table level replication metrics for shippedBytes are always 0

2020-05-14 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24350.
-
Fix Version/s: 2.4.0
   1.7.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

> HBase table level replication metrics for shippedBytes are always 0
> ---
>
> Key: HBASE-24350
> URL: https://issues.apache.org/jira/browse/HBASE-24350
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, master, 1.7.0, 2.4.0
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0
>
>
> It was observed during some investigations that table level metrics for 
> shippedBytes are consistently 0 even though data is getting shipped.
> There are two problems with table-level metrics:
>  # There are no table-level metrics for shipped bytes.
>  # Another problem is that it's using `MetricsReplicationSourceSourceImpl` 
> which creates all source-level metrics at the table level as well but updates 
> only ageOfLastShippedOp. This reports a lot of false/incorrect replication 
> metrics at the table level. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24380) Improve WAL splitting log lines to enable sessionization

2020-05-15 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24380:
---

 Summary: Improve WAL splitting log lines to enable sessionization
 Key: HBASE-24380
 URL: https://issues.apache.org/jira/browse/HBASE-24380
 Project: HBase
  Issue Type: Improvement
  Components: logging, Operability, wal
Reporter: Andrew Kyle Purtell


Looking to reconstruct a timeline from the write of a recovered.edits file back to 
the start of the WAL file split, with a bunch of unrelated activity in the meantime, 
there isn't a consistent token that links split file write messages (which 
include the store path including the region hash) to the beginning of WAL splitting 
activity. Sessionizing by host doesn't work because work can bounce around 
through retries. Thread context names in the logs vary and can be like 
[nds1-225-fra:60020-7] or [fb472085572ba72e96f1] (trailing digits of region 
hash) or [splits-1589016325868]. 

We could have WALSplitter get the current time when starting the split of a WAL 
file and have it log this timestamp in every line as a splitting session 
identifier.
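
A minimal sketch of that idea; the variable names and log line shapes are 
illustrative, not the actual WALSplitter code:

{code}
// One timestamp captured at split start becomes a session token carried in
// every log line for that WAL file, so all lines can be correlated later.
final long splitSessionId = System.currentTimeMillis();
LOG.info("sid={} Splitting WAL {}", splitSessionId, walPath);
// ... per-edit and per-writer work, all logging the same sid ...
LOG.info("sid={} Wrote recovered.edits {}", splitSessionId, editsPath);
{code}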

Related, we should track the time of split task execution end to end and export 
a metric that captures it.

It might also be worthwhile to wire up more of WAL splitting to TaskMonitor 
status logging. If we do this we can also enable status journal logging, so 
when splitting is done, a line will appear in the log that has the list of all 
status messages recorded during splitting and the time delta in milliseconds 
between them. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24428) Priority compaction for recently split daughter regions

2020-05-25 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24428:
---

 Summary: Priority compaction for recently split daughter regions
 Key: HBASE-24428
 URL: https://issues.apache.org/jira/browse/HBASE-24428
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: Andrew Kyle Purtell


We observe that under hotspotting conditions splitting will proceed very 
slowly and the "_Cannot split region due to reference files being there_" log 
line will be logged excessively. (branch-1 based production.) This is because 
after a region is split it must be compacted before it can be split again. 
Reference files must be replaced by real HFiles, normal housekeeping performed 
during compaction. However if the regionserver is under excessive load, its 
compaction queues may become deep. The daughters of a recently split 
hotspotting region may themselves continue to hotspot and will rapidly need to 
split again. If the scheduled compaction work to remove/replace reference files 
is queued hundreds or thousands of compaction queue elements behind current, 
the recently split daughter regions will not be able to split again for a long 
time and may grow very large, producing additional complications (very large 
regions, very deep replication queues).

To help avoid this condition we should prioritize the compaction of recently 
split daughter regions. Compaction requests include a {{priority}} field and 
CompactionRequest implements a comparator that sorts by this field. We already 
detect when a compaction request involves a region that has reference files, to 
ensure that it gets selected to be eligible for compaction, but we do not seem 
to prioritize the requests for post-split housekeeping. Split work should be 
placed at the top of the queue. Ensure that this is happening.
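
A rough sketch of the intended effect. hasReferences() exists on the store, but 
the priority constant and the exact request plumbing shown here are 
assumptions:

{code}
// When a store still has reference files (i.e. a recently split daughter),
// request its compaction at the most urgent priority so it sorts ahead of
// the existing backlog. Assumes lower values sort first in the comparator.
if (store.hasReferences()) {
  request.setPriority(Integer.MIN_VALUE); // assumed "front of queue" value
}
{code}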



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24439) Replication queue recovery tool for rescuing deep queues

2020-05-26 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24439:
---

 Summary: Replication queue recovery tool for rescuing deep queues
 Key: HBASE-24439
 URL: https://issues.apache.org/jira/browse/HBASE-24439
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: Andrew Kyle Purtell


In HBase cross site replication, on the source side, every regionserver places 
its WALs into a replication queue and then drains the queue to the remote sink 
cluster. At the source cluster every regionserver participates as a source. At 
the sink cluster, a configurable subset of regionservers volunteer to process 
inbound replication RPC. 

When data is highly skewed we can take certain steps to mitigate, such as 
pre-splitting, or manual splitting, and rebalancing. This can most effectively 
be done at the sink, because replication RPCs are randomly distributed over the 
set of receiving regionservers, and splitting on the sink side can effectively 
redistribute resulting writes there. On the source side we are more limited. 

If writes are deeply unbalanced, a regionserver's source replication queue may 
become very deep. Hotspotting can happen, despite mitigations. Unlike on the 
sink side, once hotspotting has happened at the source, it is not possible to 
increase parallelism or redistribute work among sources once WALs have already 
been enqueued. Increasing parallelism on the sink side will not help if there 
is a big rock at the source. Source side mitigations like splitting and 
redistribution cannot help with deep queues already accumulated.

Can we redistribute source work? Yes and no. If a source regionserver fails, 
its queues will be recovered by other regionservers. However the other 
regionserver must still serve the recovered queue as an atomic entity. We can 
move a deep queue, but we can't break it up. 

Where time is of the essence, and ordering semantics can be allowed to break, 
operators should have available to them a recovery tool that rescues their 
production from the consequences of deep source queues. A very large 
replication queue can be split into many smaller queues, perhaps even one new 
queue for each WAL file. Then these new synthetic queues can be distributed to 
any/all source regionservers through the normal recovery queue assignment 
protocol. This increases parallelism at the source.

Of course this would break serial replication semantics and sync replication 
semantics, and even in branch-1, which does not have these features, it would 
greatly increase the probability of reordering of edits. That is an unavoidable 
consequence of breaking up the queue for more parallelism, but as long as this 
is done by a separate tool, invoked by operators, it is a valid option for 
emergency drain of backed up replication queues. Every cell in the WAL entries 
carries a timestamp assigned at the source, and will be applied on the sink 
with this timestamp. When the queue is drained and all edits have been 
persisted at the target, there will be a complete and correct temporal data 
ordering at that time. An operator must be prepared to handle intermediate 
misordered or reordered states if they intend to invoke this tool. In many 
use cases the interim states are not important. The final state after all edits 
have transferred cross cluster and persisted at the sink, after invocation of 
the recovery tool, is the point where the operator would transition back into 
service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24440) Prevent temporal misordering on timescales smaller than one clock tick

2020-05-26 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24440:
---

 Summary: Prevent temporal misordering on timescales smaller than 
one clock tick
 Key: HBASE-24440
 URL: https://issues.apache.org/jira/browse/HBASE-24440
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell


When mutations are sent to the servers without a timestamp explicitly assigned 
by the client the server will substitute the current wall clock time. There are 
edge cases where it is at least theoretically possible for more than one 
mutation to be committed to a given row within the same clock tick. When this 
happens we have to track and preserve the ordering of these mutations in some 
other way besides the timestamp component of the key. Let me bypass most 
discussion here by noting that whether we do this or not, we do not pass such 
ordering information in the cross cluster replication protocol. We also have 
interesting edge cases regarding key type precedence when mutations arrive 
"simultaneously": we sort deletes ahead of puts. This, especially in the 
presence of replication, can lead to visible anomalies for clients able to 
interact with both source and sink. 

There is a simple solution that removes the possibility that these edge cases 
can occur: 

We can detect, when we are about to commit a mutation to a row, if we have 
already committed a mutation to this same row in the current clock tick. 
Occurrences of this condition will be rare. We are already tracking current 
time. We have to know this in order to assign the timestamp. Where this becomes 
interesting is how we might track the last commit time per row. Making the 
detection of this case efficient for the normal code path is the bulk of the 
challenge. We would do this somehow via the memstore. Assuming we can 
efficiently know if we are about to commit twice to the same row within a 
single clock tick, we would simply sleep/yield the current thread until the 
clock ticks over, and then proceed. 
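
A minimal sketch of that final wait, assuming the last commit timestamp for the 
row can be obtained cheaply (lastCommitTs below is that hypothetical value):

{code}
// If we already committed to this row in the current clock tick, yield until
// the wall clock advances so the new mutation gets a strictly later timestamp.
long now = System.currentTimeMillis();
while (now <= lastCommitTs) {
  Thread.yield();
  now = System.currentTimeMillis();
}
// 'now' is strictly greater than lastCommitTs; safe to use as the timestamp.
{code}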



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24445) Improve default thread pool size for opening store files

2020-05-27 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24445:
---

 Summary: Improve default thread pool size for opening store files
 Key: HBASE-24445
 URL: https://issues.apache.org/jira/browse/HBASE-24445
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell


For each store open we create a CompletionService and also create a thread pool 
for opening and closing store files. See HStore#openStoreFiles and 
HRegion#getStoreFileOpenAndCloseThreadPool. By default this pool has only one 
thread. It can be increased with "hbase.hstore.open.and.close.threads.max" but 
this config value is then divided by the number of stores in the region.

"hbase.hstore.open.and.close.threads.max" is also used to size other thread 
pools for opening and closing the stores themselves, so it's an unfortunate 
overloading.

We should have a configuration parameter that directly and simply tunes the 
thread pool size for opening store files. Introduce a new configuration 
parameter: "hbase.hstore.hfile.open.threads.max" which will define the upper 
bound for a thread pool shared by the entire store for opening hfiles. The 
default should be 1 to preserve default behavior.

Once this is done, we could increase this to 2, 4, 8, or more for increased 
parallelism when opening store files without impact on other activities. The 
time required to open all storefiles often dominates the total time for 
bringing a region online. The thread pool will be shut down and eligible for 
garbage collection once all files are loaded and the store is online.
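
For illustration, raising the proposed bound might look like the sketch below; 
the key name is the one proposed above and the value of 8 is only an example:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Example only: allow up to 8 concurrent hfile-open tasks per store.
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.hstore.hfile.open.threads.max", 8);
{code}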

The number of open threads should scale with the number of stores, so allocating 
the pool at the store level continues to make sense.

Longer term we might try recursively decomposing the region open task with a 
fork-join pool such that the opening of store files can be dynamically 
parallelized in a probably superior way (conjecture pending a real attempt with 
metrics).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24525) [branch-1] Support ZooKeeper 3.6.0+

2020-06-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24525:
---

 Summary: [branch-1] Support ZooKeeper 3.6.0+
 Key: HBASE-24525
 URL: https://issues.apache.org/jira/browse/HBASE-24525
 Project: HBase
  Issue Type: Improvement
  Components: Zookeeper
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.7.0


Fix compilation issues against ZooKeeper 3.6.0. The changes are backwards 
compatible with 3.4 and 3.5. Tested with:

{{mvn clean install}}

{{mvn clean install -Dzookeeper.version=3.5.8}}

{{mvn clean install -Dzookeeper.version=3.6.0}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24527) Improve region housekeeping status observability

2020-06-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24527:
---

 Summary: Improve region housekeeping status observability
 Key: HBASE-24527
 URL: https://issues.apache.org/jira/browse/HBASE-24527
 Project: HBase
  Issue Type: New Feature
  Components: Admin, Compaction, shell, UI
Reporter: Andrew Kyle Purtell


We provide a coarse grained admin API and associated shell command for 
determining the compaction status of a table:

{noformat}
hbase(main):001:0> help "compaction_state"
Here is some help for this command:
 Gets compaction status (MAJOR, MAJOR_AND_MINOR, MINOR, NONE) for a table:
 hbase> compaction_state 'ns1:t1'
 hbase> compaction_state 't1'
{noformat}

We also log compaction activity, including a compaction journal at completion, 
via log4j to whatever log aggregation solution is available in production.

This is not sufficient for online and interactive observation, debugging, or 
performance analysis of current compaction activity. In this kind of activity 
an operator is attempting to observe and analyze compaction activity in real 
time. Log aggregation and presentation solutions have typical latencies (end to 
end visibility of log lines on the order of ~minutes) which make that not 
possible today.

We don't offer any API or tools for directly interrogating split and merge 
activity in real time. Some indirect knowledge of split or merge activity can 
be inferred from RIT information via ClusterStatus. 

We should have new APIs and shell commands, and perhaps also new admin UI 
views, for

at regionserver scope:
* listing the current state of a regionserver's compaction, split, and merge 
tasks and threads
* counting (simple view) and listing (detailed view) a regionserver's 
compaction queues
* listing a region's currently compacting, splitting, or merging status

at master scope, aggregations of the above detailed information into:
* listing the active compaction tasks and threads for a given table, the 
extension of _compaction_state_ with a new detailed view
* listing the active split or merge tasks and threads for a given table's 
regions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24528) Improve balancer decision observability

2020-06-09 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24528:
---

 Summary: Improve balancer decision observability
 Key: HBASE-24528
 URL: https://issues.apache.org/jira/browse/HBASE-24528
 Project: HBase
  Issue Type: New Feature
  Components: Admin, Balancer, shell, UI
Reporter: Andrew Kyle Purtell


We provide detailed INFO and DEBUG level logging of balancer decision factors, 
outcome, and reassignment planning, as well as similarly detailed logging of 
the resulting assignment manager activity. However, an operator may need to 
perform online and interactive observation, debugging, or performance analysis 
of current balancer activity. Scraping and correlating the many log lines 
resulting from a balancer execution is labor intensive and has a lot of latency 
(order of ~minutes to acquire and index, order of ~minutes to correlate). 

The balancer should maintain a rolling window of history, e.g. the last 100 
region move plans, or last 1000 region move plans submitted to the assignment 
manager. This history should include decision factor details and weights and 
costs. The rsgroups balancer may be able to provide fairly simple decision 
factors, like for example "this table was reassigned to that regionserver 
group". The underlying or vanilla stochastic balancer on the other hand, after 
a walk over random assignment plans, will have considered a number of cost 
functions with various inputs (locality, load, etc.) and multipliers, including 
custom cost functions. We can devise an extensible class structure that 
represents explanations for balancer decisions, and for each region move plan 
that is actually submitted to the assignment manager, we can keep the 
explanations of all relevant decision factors alongside the other details of 
the assignment plan like the region name, and the source and destination 
regionservers. 

This history should be available via API for use by new shell commands and 
admin UI widgets.

The new shell commands and UI widgets can unpack the representation of balancer 
decision components into human readable output. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24543) ScheduledChore logging is too chatty, replace with metrics

2020-06-11 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24543:
---

 Summary: ScheduledChore logging is too chatty, replace with metrics
 Key: HBASE-24543
 URL: https://issues.apache.org/jira/browse/HBASE-24543
 Project: HBase
  Issue Type: Improvement
  Components: metrics, Operability
Reporter: Andrew Kyle Purtell


ScheduledChore logs at DEBUG level the execution time of each chore. 

We used to log an average execution time across all chores every five minutes, 
which by consensus was judged to not be useful. Derived metrics like averages 
or histograms should be calculated per chore. So we modified the logging to 
dump the chore execution time each time it runs, to facilitate such 
calculations with the log aggregation and searching tool of choice. Per-chore 
execution logging is more useful, in that sense, but may be too chatty. This is 
not unexpected but let me provide my observations so we can revisit this.

On the master, for example, this is logged every second:
{noformat}
2020-06-11 16:35:28,263 DEBUG [master/apurtell-ltm:8100.splitLogManager..Chore.1] hbase.ScheduledChore: SplitLogManager Timeout Monitor execution time: 0 ms.
{noformat}

Does the value of these lines outweigh the cost of 86,400 log lines per day per 
master instance? (At least.)

On the regionserver it is somewhat better, these are logged every 10 seconds:
{noformat}
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] hbase.ScheduledChore: CompactionChecker execution time: 0 ms.
2020-06-11 16:37:57,203 DEBUG [regionserver/apurtell-ltm:8120.Chore.1] hbase.ScheduledChore: MemstoreFlusherChore execution time: 0 ms.
{noformat}

So that will be 17,280 log lines per day per regionserver. (At least.)

Perhaps these should be moved to TRACE level. 

We should definitely replace this logging with histogram metrics. There should 
be a separate metric for each distinct chore classname, allocated as needed. 
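
A sketch of the per-chore histogram allocation; the registry call shown is 
generic Dropwizard-style and the field names are illustrative:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.MetricRegistry;

public class ChoreMetricsSketch {
  private final MetricRegistry metricRegistry = new MetricRegistry();
  private final ConcurrentMap<String, Histogram> choreHistograms = new ConcurrentHashMap<>();

  // Lazily allocate one execution-time histogram per chore class name and
  // record each run's duration there instead of logging it.
  void recordChoreTime(String choreClassName, long millis) {
    choreHistograms
        .computeIfAbsent(choreClassName, name -> metricRegistry.histogram("chore." + name))
        .update(millis);
  }
}
{code}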



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24597) Port HBASE-24380 (Improve WAL splitting log lines to enable sessionization) to branch-1

2020-06-19 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24597:
---

 Summary: Port HBASE-24380 (Improve WAL splitting log lines to 
enable sessionization) to branch-1
 Key: HBASE-24597
 URL: https://issues.apache.org/jira/browse/HBASE-24597
 Project: HBase
  Issue Type: Sub-task
  Components: logging, wal
Reporter: Andrew Kyle Purtell
 Fix For: 1.7.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24598) Port HBASE-24380 (Improve WAL splitting log lines to enable sessionization) to branch-2.2

2020-06-19 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24598:
---

 Summary: Port HBASE-24380 (Improve WAL splitting log lines to 
enable sessionization) to branch-2.2
 Key: HBASE-24598
 URL: https://issues.apache.org/jira/browse/HBASE-24598
 Project: HBase
  Issue Type: Sub-task
  Components: logging, wal
Reporter: Andrew Kyle Purtell
 Fix For: 2.2.6






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24637) Filter SKIP hinting regression

2020-06-25 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24637:
---

 Summary: Filter SKIP hinting regression
 Key: HBASE-24637
 URL: https://issues.apache.org/jira/browse/HBASE-24637
 Project: HBase
  Issue Type: Bug
  Components: Filters, Performance, Scanners
Reporter: Andrew Kyle Purtell


I have been looking into reported performance regressions in HBase 2 relative 
to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
significantly better microbenchmarks in a number of cases, and usually shows 
improvement in whole cluster benchmarks like YCSB.

To assist in debugging I added methods to RpcServer for updating per-call 
metrics that leverage the fact it puts a reference to the current Call into a 
thread local and that all activity for a given RPC is processed by a single 
thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
various friends (in branch-2.2), StoreScanner, HFileReaderV2 and HFileReaderV3 
(in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, and 
DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables with 
one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per row 
were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6 and 2.2 
versions under test operated on identical data files in HDFS. For tests with 
1.6 and 2.2 on the server side the same 1.6 PE client was used, to ensure only 
the server side differed.

The results for pe --filterAll were revealing. See attached. 

It appears a refactor to ScanQueryMatcher and friends has disabled the ability 
of filters to provide meaningful SKIP hints, which disables an optimization 
that avoids reseeking, leading to a serious and proportional regression in 
reseek activity and time spent in that code path. So for queries that use 
filters, there can be a substantial regression.
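
For context, the hint in question is the ReturnCode a filter emits per cell. A 
filter that skips every cell, which is roughly what pe --filterAll exercises, 
looks like this sketch against the branch-1 Filter API:

{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.filter.FilterBase;

// When the matcher honors this SKIP hint it advances cell by cell; when the
// hint is lost (the regression described above) it falls back to reseeks.
public class SkipAllFilter extends FilterBase {
  @Override
  public ReturnCode filterKeyValue(Cell c) {
    return ReturnCode.SKIP;
  }
}
{code}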

Other test cases that did not use filters did not show this regression. If 
filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
almost identical, as measured by counts of the hint types returned, whether or 
not column or version trackers are called, and counts of store seeks or 
reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
results generally fell within this range, except for the filter all case of 
course. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-15519) Add per-user metrics

2020-07-27 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-15519:
-

> Add per-user metrics 
> -
>
> Key: HBASE-15519
> URL: https://issues.apache.org/jira/browse/HBASE-15519
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Affects Versions: 1.2.0
>Reporter: Enis Soztutar
>Assignee: Ankit Singhal
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.0
>
> Attachments: HBASE-15519.master.003.patch, hbase-15519_v0.patch, 
> hbase-15519_v1.patch, hbase-15519_v1.patch, hbase-15519_v2.patch
>
>
> Per-user metrics will be useful in multi-tenant cases where we can emit 
> number of requests, operations, num RPCs etc at the per-user aggregate level 
> per regionserver. We currently have throttles per user, but no way to monitor 
> resource usage per-user. 
> Looking at these metrics, operators can adjust throttles, do capacity 
> planning, etc per-user. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24893) TestLogLevel failing on hadoop-ci (branch-1)

2020-08-17 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24893:
---

 Summary: TestLogLevel failing on hadoop-ci (branch-1)
 Key: HBASE-24893
 URL: https://issues.apache.org/jira/browse/HBASE-24893
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Andrew Kyle Purtell
 Fix For: 1.7.0


TestLogLevel is failing the branch-1 builds on hadoop-ci.

The test needs some improvement. The code seems to be doing the right thing but 
the error condition the test is expecting varies by JVM or JVM version:
{noformat}
Expected to find 'Unrecognized SSL message' but got unexpected exception:
javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
{noformat}
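
One way to make the test tolerant, as a minimal sketch (the thrown variable 
stands in for however the test captures the exception):

{code}
import static org.junit.Assert.assertTrue;

// Accept either message variant, since the exact SSL error text differs
// across JVM vendors and versions.
String msg = thrown.getMessage();
assertTrue("Unexpected SSL error: " + msg,
    msg.contains("Unrecognized SSL message")
        || msg.contains("Unsupported or unrecognized SSL message"));
{code}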



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-24898) Use EnvironmentEdge.currentTime() instead of System.currentTimeMillis() in CurrentHourProvider

2020-08-24 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-24898:
-

This test fails 100% of the time on branch-1 and the commit has been reverted. 

{noformat}
[INFO] Running org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.309 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider
[ERROR] testWithEnvironmentEdge(org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider)  Time elapsed: 0.175 s  <<< FAILURE!
java.lang.AssertionError: expected:<11> but was:<12>
    at org.apache.hadoop.hbase.regionserver.compactions.TestCurrentHourProvider.testWithEnvironmentEdge(TestCurrentHourProvider.java:53)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestCurrentHourProvider.testWithEnvironmentEdge:53 expected:<11> but was:<12>
[INFO] 
[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0

It also fails 100% of the time for me on branch-2.3 and probably should be 
reverted elsewhere as well. 

> Use EnvironmentEdge.currentTime() instead of System.currentTimeMillis() in 
> CurrentHourProvider
> --
>
> Key: HBASE-24898
> URL: https://issues.apache.org/jira/browse/HBASE-24898
> Project: HBase
>  Issue Type: Bug
>  Components: tooling
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0, 2.2.7, 2.3.2
>
>
> In order to control the return value of getCurrentHour used by unit test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-24971) Upgrade JQuery to 3.5.1

2020-08-31 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-24971:
---

 Summary: Upgrade JQuery to 3.5.1
 Key: HBASE-24971
 URL: https://issues.apache.org/jira/browse/HBASE-24971
 Project: HBase
  Issue Type: Bug
  Components: security, UI
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7


JQuery <= 3.5.0 is subject to a known cross site scripting vulnerability. 
Upgrade our embedded minimized jquery library to 3.5.1. 

Upgrade embedded jquery-tablesorter while at it. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24971) Upgrade JQuery to 3.5.1

2020-09-01 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24971.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Upgrade JQuery to 3.5.1
> ---
>
> Key: HBASE-24971
> URL: https://issues.apache.org/jira/browse/HBASE-24971
> Project: HBase
>  Issue Type: Bug
>  Components: security, UI
>Reporter: Andrew Kyle Purtell
>Assignee: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7
>
>
> JQuery <= 3.5.0 is subject to a known cross site scripting vulnerability. 
> Upgrade our embedded minimized jquery library to 3.5.1. 
> Upgrade embedded jquery-tablesorter while at it. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-24893) TestLogLevel failing on hadoop-ci (branch-1)

2020-09-02 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell reopened HBASE-24893:
-

> TestLogLevel failing on hadoop-ci (branch-1)
> 
>
> Key: HBASE-24893
> URL: https://issues.apache.org/jira/browse/HBASE-24893
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Andrew Kyle Purtell
>Assignee: Abhey Rana
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.7.0
>
>
> TestLogLevel is failing the branch-1 builds on hadoop-ci.
> The test needs some improvement. The code seems to be doing the right thing 
> but the error condition the test is expecting varies by JVM or JVM version:
> {noformat}
> Expected to find 'Unrecognized SSL message' but got unexpected exception:
> javax.net.ssl.SSLException: Unsupported or unrecognized SSL message
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25079) Upgrade Bootstrap to 3.3.7

2020-09-21 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25079:
---

 Summary: Upgrade Bootstrap to 3.3.7
 Key: HBASE-25079
 URL: https://issues.apache.org/jira/browse/HBASE-25079
 Project: HBase
  Issue Type: Improvement
  Components: security, UI
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 3.0.0-alpha-1, 2.3.3, 1.7.0, 2.4.0, 2.2.7


Our UI embeds Bootstrap 3.0.0. There are some reported security issues. 

Upgrade to Bootstrap 3.3.7. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25212) Optionally abort requests in progress after deciding a region should close

2020-10-21 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25212:
---

 Summary: Optionally abort requests in progress after deciding a 
region should close
 Key: HBASE-25212
 URL: https://issues.apache.org/jira/browse/HBASE-25212
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 3.0.0-alpha-1, 1.7.0, 2.4.0


After deciding a region should be closed, the regionserver will set the 
internal region state to closing and wait for all pending requests to complete, 
via a rendezvous on the region lock. In closing state the region will not 
accept any new requests but requests in progress will be allowed to complete 
before the close action takes place. In our production we see outlier wait 
times on this lock in excess of several minutes. 

During close when there are requests in flight the regionserver is subject to 
any conceivable reason for delay, like full scans over large regions, expensive 
filtering hierarchies, bugs, or store level performance problems like slow 
HDFS. The regionserver should interrupt requests in progress to facilitate 
smaller/shorter close times on an opt-in basis.

Optionally, via configuration parameter -- which would be a system wide default 
set in hbase-site.xml in common practice but could be overridden in table 
schema for per table settings -- interrupt requests in progress holding the 
region lock rather than wait for completion of all operations in flight. Send 
back NotServingRegionException("region is closing") to the clients of the 
interrupted operations, like we do after the write lock is acquired. The client 
will transparently relocate the region data and resubmit the aborted requests 
per normal retry policy. This can be less disruptive than waiting for very long 
times for a region to close in extreme outlier cases (e.g. 50 minutes).
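
For illustration only, the opt-in could be expressed as below; the 
configuration key is hypothetical, since this issue does not fix a name:

{code}
// Hypothetical key: site-wide default in hbase-site.xml, overridable in
// table schema for per-table behavior.
conf.setBoolean("hbase.regionserver.close.interrupt.inflight", true);
{code}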

After waiting for all requests to complete we flush the region's memstore 
and finish the close. The flush portion of the close process is out of scope of 
this proposal; under normal conditions it completes quickly. It is specifically 
the waits on the close lock that have been an occasional issue in our 
production, causing difficulty achieving 99.99% availability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25227) [branch-1] Cast in UnsafeAccess to avoid Java 11 runtime issue

2020-10-28 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25227:
---

 Summary: [branch-1] Cast in UnsafeAccess to avoid Java 11 runtime 
issue
 Key: HBASE-25227
 URL: https://issues.apache.org/jira/browse/HBASE-25227
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.7.0


When running tests with Java 11, UnsafeAccess is observed to throw 
NoSuchMethodErrors. Some of our methods accept 'int' parameters and pass them 
to Unsafe methods that take 'long'. The Java 8 compiler does the implicit 
conversion but the Java 11 compiler does not. Add casts to fix. 
Not an issue on branch-2 and up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25292) Update InetSocketAddress usage discipline

2020-11-16 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25292:
---

 Summary: Update InetSocketAddress usage discipline
 Key: HBASE-25292
 URL: https://issues.apache.org/jira/browse/HBASE-25292
 Project: HBase
  Issue Type: Bug
  Components: Client, HFile
Reporter: Andrew Kyle Purtell


We sometimes cache InetSocketAddress in data structures in an attempt to 
optimize away potential nameservice (DNS) lookups. This is, in general, an 
anti-pattern, because once an InetSocketAddress is resolved, resolution is 
never attempted again. The ideal pattern for connect() is ISA instantiation 
just before the connect() call, with no reuse of the ISA instance. For bind() 
we presume the local identity won't change while the process is live so usage 
and caching can be relaxed in that case.
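
A minimal sketch of the late-binding connect pattern described above (the host, 
port, socket, and timeout variables are illustrative):

{code}
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Resolve at the last possible moment, immediately before connect(), and
// fail fast instead of caching and reusing an unresolved address.
InetSocketAddress isa = new InetSocketAddress(host, port); // DNS lookup happens here
if (isa.isUnresolved()) {
  throw new UnknownHostException("Cannot resolve " + host);
}
socket.connect(isa, connectTimeoutMs);
{code}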

If I can restate my proposal for a usage convention for InetSocketAddress, it 
would be this: Network identities should be bound late. This means addresses 
should be resolved at the last possible moment. Also, network identity mappings 
can change, so our code should not inappropriately cache them; otherwise we 
might miss a change and fail to operate normally.

I have reviewed the code for InetSocketAddress usage and in my opinion 
sometimes we are caching ISA acceptably, and in other cases we are not.

Correct cases:
 * We cache ISA for RPC connections, so we don't potentially do a lookup for 
every Call. However, we resolve the address earlier than we need to. The code 
can be improved by moving resolution to just before where we connect().

Incorrect cases that can be fixed:
 * RPC stubs. Remote clients may be recycled and replaced with new instances 
where the network identities (DNS name to IP address mapping) have changed. 
HBASE-14544 attempts to work around DNS instability in data centers of years 
past in a way that, in my opinion, is the wrong thing to do in the modern era. 
This is just a technical opinion and not critical to the rest of the proposal. 
That said, I intend to propose a revert of HBASE-14544. Reverting this 
simplifies some code a bit. (If this part of the proposal is controversial it 
can be dropped.) When creating a stub key we look up the IP address of the 
remote host, and we make a key even if the resolution fails. This is 
the wrong thing to do. If we can't resolve the remote address, we can't contact 
the server. Making a stub that can't communicate is pointless. Throw an 
exception instead.
 * Favored nodes. Although the HDFS API requires InetSocketAddress, we don't 
have to make up a list right away and cache them forever. We can use Address to 
record the list of favored nodes and convert from Address to InetSocketAddress 
on demand (when we go to create the HFile). This will allow us to resolve 
datanode hostnames just before they are needed. In public cloud, kubernetes, 
or some private datacenter service deployment options, datanode servers may 
have their network identities (DNS name -> IP address mapping) changed over 
time. We can and should avoid inappropriate caching that may cause us to 
indefinitely use an incorrect address when contacting a favored node. 
 * Sometimes we use ISA when Address is just as good. For example, the dead 
servers list. If we are going to pay some attention to ISA usage discipline, 
let's remove the cases where we use ISA as a host and port pair but do not need 
to do so. Address works just as well and doesn't present an opportunity for 
misuse. Another example would be the RPC client concurrentCounterCache.

Incorrect cases that cannot be fixed:
 * hbase-external-blockcache: We have to resolve all of the memcached locations 
up front because the memcached client constructor requires ISA instances. So we 
have to hope that the network identities (DNS name -> IP address mapping) does 
not change for any in the list. This is beyond our control.

While in this area it is trivial to add new client connect metrics for number 
of potential nameservice lookups (whenever we instantiate an ISA) and number of 
failed nameservice lookups (if the instantiated ISA is unresolved).

While in this area I also noticed we often directly access a field in 
ConnectionId where there is also a getter, so good practice is to use the 
getter instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25308) [branch-1] Consume Guava from hbase-thirdparty hbase-shaded-miscellaneous

2020-11-19 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25308:
---

 Summary: [branch-1] Consume Guava from hbase-thirdparty 
hbase-shaded-miscellaneous
 Key: HBASE-25308
 URL: https://issues.apache.org/jira/browse/HBASE-25308
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell
 Fix For: 1.7.0


We are again having classpath versioning issues related to Guava in our 
branch-1 based application.

Hadoop 3, HBase 2, Phoenix 5, and other projects deal with Guava cross-version 
incompatibilities, as they manifest on a combined classpath with other 
components, via shading.

I propose to do a global search and replace of all direct uses of Guava in our 
branch-1 code base and refer to Guava as provided in hbase-thirdparty's 
hbase-shaded-miscellaneous. This will protect HBase branch-1 from Guava 
cross-version vagaries just like the same technique protects branch-2 and 
branch-2 based releases. 

There are a couple of Public interfaces that incorporate Guava's HostAndPort 
and Service that will be indirectly impacted. We are about to release a new 
minor branch-1 version, 1.7.0, and this would be a great opportunity to 
introduce this kind of change in a manner consistent with semantic versioning 
and our compatibility policies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25314) branch-1 docker mode for yetus fails

2020-11-20 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25314:
---

 Summary: branch-1 docker mode for yetus fails
 Key: HBASE-25314
 URL: https://issues.apache.org/jira/browse/HBASE-25314
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Kyle Purtell


{noformat}
15:30:41  Step 28/33 : RUN gem install rubocop:'<= 0.81'
15:30:41   ---> Running in 21103fb7944c
15:30:42  Building native extensions.  This could take a while...
15:30:43  ERROR:  Error installing rubocop:
15:30:43    parallel requires Ruby version >= 2.5.
15:30:43  Successfully installed jaro_winkler-1.5.4
15:30:44  The command '/bin/sh -c gem install rubocop:'<= 0.81'' returned a non-zero code: 1
15:30:44  ERROR: Docker failed to build yetus/hbase:b249092a5f. 
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25316) Release a hbase-thirdparty with hbase-shaded-miscellaneous suitable for branch-1

2020-11-20 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25316:
---

 Summary: Release a hbase-thirdparty with 
hbase-shaded-miscellaneous suitable for branch-1
 Key: HBASE-25316
 URL: https://issues.apache.org/jira/browse/HBASE-25316
 Project: HBase
  Issue Type: Task
Affects Versions: 1.7.0
Reporter: Andrew Kyle Purtell
Assignee: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24664) Some changing of split region by overall region size rather than only one store size

2020-11-23 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24664.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Some changing of split region by overall region size rather than only one 
> store size
> 
>
> Key: HBASE-24664
> URL: https://issues.apache.org/jira/browse/HBASE-24664
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: Zheng Wang
>Assignee: Zheng Wang
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> As a distributed cluster, HBase distributes load in units of regions, so if a 
> region grows too big,
>  it will bring some negative effects, such as:
>  1. Harder to homogenize disk usage (consider locality)
>  2. Might cost more time on region opening
>  3. After split, the daughter region might lead to more IO cost on compaction 
> in a short time (if writes are even)
> I tried to introduce a new SteppingAllStoresSizeSplitPolicy in HBASE-24530, 
> but after discussion in comments and the related 
> [thread|https://lists.apache.org/thread.html/r08a8103e2532eb667a0fcb4efa8a4117b3f82e6251bc4bd0bc157c26%40%3Cdev.hbase.apache.org%3E],
>  we finally decided to change the existing split policy with a new option 
> controlling whether it should count all store files; for master it would be 
> true, else false. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25298) hbase.rsgroup.fallback.enable should support dynamic configuration

2020-11-23 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-25298.
-
Fix Version/s: 3.0.0-alpha-1
   Resolution: Fixed

The PR was merged to master branch.

> hbase.rsgroup.fallback.enable should support dynamic configuration 
> ---
>
> Key: HBASE-25298
> URL: https://issues.apache.org/jira/browse/HBASE-25298
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Use update_config command to control the switch of RSGroup fallback.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24877) Add option to avoid aborting RS process upon uncaught exceptions happen on replication source

2020-11-23 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24877.
-
Fix Version/s: 2.4.0
   3.0.0-alpha-1
   Resolution: Fixed

PRs were merged to master and branch-2. Resolving. File new issues for any 
further backports.

> Add option to avoid aborting RS process upon uncaught exceptions happen on 
> replication source
> -
>
> Key: HBASE-24877
> URL: https://issues.apache.org/jira/browse/HBASE-24877
> Project: HBase
>  Issue Type: Improvement
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently, we abort entire RS process if any uncaught exceptions happens on 
> ReplicationSource initialization. This may be too extreme on certain 
> deployments, where custom replication endpoint implementations may choose to 
> do so when remote peers are unavailable, but source cluster shouldn't be 
> brought down entirely. Similarly, source reader and shipper threads would 
> cause RS to abort on any runtime exception occurrence while running. 
> This patch adds configuration option (false by default, to keep the original 
> behaviour), to avoid aborting entire RS processes under these conditions. 
> Instead, if ReplicationSource initialization fails with a RuntimeException, 
> it keeps retrying the source startup. In the case of readers/shippers runtime 
> errors, it refreshes the replication source, terminating current source and 
> its readers/shippers and creating new ones.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24081) Provide documentation for running Yetus with HBase

2020-11-23 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24081.
-
Fix Version/s: 2.4.0
   Resolution: Fixed

> Provide documentation for running Yetus with HBase
> --
>
> Key: HBASE-24081
> URL: https://issues.apache.org/jira/browse/HBASE-24081
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> A colleague asked how to use Yetus with HBase, so I wrote up a little how-to 
> doc. Maybe it's useful to someone else?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25187) Improve SizeCachedKV variants initialization

2020-11-24 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-25187.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Improve SizeCachedKV variants initialization
> 
>
> Key: HBASE-25187
> URL: https://issues.apache.org/jira/browse/HBASE-25187
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.4, 2.5.0
>
>
> Currently in SizeCachedKV we get the row length and key length from the 
> buffers. This can be optimized because we can pass the key len and row len 
> while actually creating the cell when reading the cell from the block. 
> Sometimes we see that SizeCachedKV takes the max width in a flame graph, 
> considering the fact we also do a sanity check on the created KV. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25050) We initialize Filesystems more than once.

2020-11-24 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-25050.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> We initialize Filesystems more than once.
> -
>
> Key: HBASE-25050
> URL: https://issues.apache.org/jira/browse/HBASE-25050
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> In HFileSystem
> {code}
> // Create the default filesystem with checksum verification switched on.
> // By default, any operation to this FilterFileSystem occurs on
> // the underlying filesystem that has checksums switched on.
> this.fs = FileSystem.get(conf);
> this.useHBaseChecksum = useHBaseChecksum;
> fs.initialize(getDefaultUri(conf), conf);
> {code}
> We call fs.initialize(). Generally the FS would have been created and inited 
> either in the FileSystem.get() call above or even when we try to check 
> {code}
>   FileSystem fs = p.getFileSystem(c);
> {code}
> The FS that gets cached in the hadoop-common layer does the init for us, so 
> doing it again is redundant. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-25246) Backup/Restore hbase cell tags.

2020-12-02 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-25246.
-
Fix Version/s: 2.4.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Backup/Restore hbase cell tags.
> ---
>
> Key: HBASE-25246
> URL: https://issues.apache.org/jira/browse/HBASE-25246
> Project: HBase
>  Issue Type: Improvement
>  Components: backup&restore
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> In PHOENIX-6213 we are planning to add cell tags for Delete mutations. After 
> a discussion with the HBase community on the dev mailing list, it was 
> decided that we will pass the tags via an attribute on the Mutation object 
> and persist them to HBase via a Phoenix coprocessor. The intention of 
> PHOENIX-6213 is to store metadata in the Delete marker so that while running 
> the Restore tool we can selectively restore certain Delete markers and 
> ignore others. For that to happen we need to persist these tags in Backup 
> and retrieve them in the Restore MR jobs (Import/Export tool). 
> Currently we don't persist the tags in Backup. 
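> A hedged sketch of the attribute hand-off (the attribute key 
> "phoenix.delete.tags" and the variable names are assumptions for 
> illustration):
> {code}
> // Client side: attach the tag metadata to the Delete as a Mutation
> // attribute, so it travels with the RPC like any other operation
> // attribute.
> Delete d = new Delete(row);
> d.setAttribute("phoenix.delete.tags", tagBytes);
> table.delete(d);
>
> // Server side, inside a Phoenix coprocessor hook: read the attribute
> // back and use it to build the cell tags before the delete marker is
> // persisted.
> byte[] tags = mutation.getAttribute("phoenix.delete.tags");
> {code}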



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25352) API compatibility checker fails with "Argument list too long"

2020-12-02 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25352:
---

 Summary: API compatibility checker fails with "Argument list too 
long"
 Key: HBASE-25352
 URL: https://issues.apache.org/jira/browse/HBASE-25352
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Andrew Kyle Purtell


While working on the 2.4.0 RC I hit a stumbling block where the argument list 
passed to javap by the API compatibility checker is too large for Mac OS. 
Attempted execution of the forked process fails with "Argument list too long".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25359) create-release scripts releasedocmaker step should be optional

2020-12-04 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25359:
---

 Summary: create-release scripts releasedocmaker step should be 
optional
 Key: HBASE-25359
 URL: https://issues.apache.org/jira/browse/HBASE-25359
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell


The create-release scripts assume, when invoking releasedocmaker and performing 
surgery on CHANGES.md and RELEASENOTES.md during the 'tag' stage, that the 
current RC step is RC0. The entirety of the generated CHANGES.md and 
RELEASENOTES.md files are stitched in at the head, just below the ASF notice. 
If we are at an RC step that is not zero, wouldn't this duplicate all CHANGES.md 
and RELEASENOTES.md content for the release? There would be all the content 
added for RC0, then the same content (with delta) added for RC1, and so on. 

For this reason the releasedocmaker invocation should itself be optional.

For RC steps > 0, assume the RM has updated CHANGES.md and RELEASENOTES.md to 
reflect the delta. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-25465) Use javac --release option for supporting cross version compilation in create-release

2021-01-05 Thread Andrew Kyle Purtell (Jira)
Andrew Kyle Purtell created HBASE-25465:
---

 Summary: Use javac --release option for supporting cross version 
compilation in create-release
 Key: HBASE-25465
 URL: https://issues.apache.org/jira/browse/HBASE-25465
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Kyle Purtell






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-12830) Unreadable HLogs stuck in replication queues

2021-01-11 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-12830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-12830.
-
Resolution: Duplicate

> Unreadable HLogs stuck in replication queues
> 
>
> Key: HBASE-12830
> URL: https://issues.apache.org/jira/browse/HBASE-12830
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.9
>Reporter: Andrew Kyle Purtell
>Priority: Major
>
> We had an incident where underlying infrastructure issues caused HDFS 
> namenodes to go down, not at the same time, leading to periods of HDFS 
> service outage and recovery as namenodes failed over. These clusters had 
> replication enabled. We had some Regionservers roll logs during partial HDFS 
> availability. Namespace entries for these HLogs were created but the first 
> block-being-written was lost or unable to complete, leading to dead 0-length 
> HLogs in the queues of active sources. When this happens the queue becomes 
> stuck on the dead 0-length HLog reporting EOFExceptions whenever the source 
> wakes up and tries to (re)open the current file like so:
> {noformat}
> 2015-01-08 18:50:40,956 WARN 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: 
> 1-,60020,1418764167084 Got: 
> java.io.EOFException
>   at java.io.DataInputStream.readFully(DataInputStream.java:197)
>   at java.io.DataInputStream.readFully(DataInputStream.java:169)
>   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1845)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1810)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1759)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1773)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.reset(SequenceFileLogReader.java:175)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.initReader(SequenceFileLogReader.java:184)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.init(ReaderBase.java:70)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:128)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:91)
>   at 
> org.apache.hadoop.hbase.regionserver.wal.HLogFactory.createReader(HLogFactory.java:79)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:68)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:506)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:309)
> {noformat}
> This exception originates from where SequenceFile tries to read in the 4-byte 
> version header from position 0.
> In ReplicationSource#run we have an active loop:
> {code}
> // Loop until we close down
> while (isActive()) {
> ...
> }
> {code}
> Within this loop we iterate over paths in the replication queue. For each 
> path, we attempt to open it:
> {code}
>   // Open a reader on it
>   if (!openReader(sleepMultiplier)) {
> // Reset the sleep multiplier, else it'd be reused for the next file
> sleepMultiplier = 1;
> continue;
>   }
> {code}
> When we have a zero-length file, openReader returns TRUE but this.reader is 
> set to NULL (look at the catch of the outer try block) and we fall through 
> the conditional to:
> {code}
>   // If we got a null reader but didn't continue, then sleep and continue
>   if (this.reader == null) {
> if (sleepForRetries("Unable to open a reader", sleepMultiplier)) {
>   sleepMultiplier++;
> }
> continue;
>   }
> {code}
> We will keep trying to open the current file for a long time. The queue will 
> be stuck until sleepMultiplier == maxRetriesMultiplier (conf 
> "replication.source.maxretriesmultiplier", default 10), with a base sleep 
> time of "replication.source.sleepforretries" (default 1000) ms, then we will 
> call ReplicationSource#processEndOfFile(). 
> By default we will spin on opening the dead 0-length HLog for (1000*1) + 
> (1000*2) + ... + (1000*10) = 55,000 ms, or 55 seconds, before processing the 
> file out of the queue. In HBASE-11964 we recommend increasing 
> "replication.source.maxretriesmultiplier" to 300. Using the updated 
> configuration we will wait for (1000*1) + (1000*2) + ... + (1000*300) = 
> 45,150,000 ms, roughly 12.5 hours, before processing the file out of the 
> queue. 
> There should be some way to break out of this very long wait for a 0-length 
> or corrupt file that is blocking the queue.
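> One possible shape for such an escape hatch, sketched against the loop 
> quoted above (isDeadEmptyFile is a hypothetical helper, not actual 
> ReplicationSource code):
> {code}
>   // If we got a null reader but didn't continue, first check whether the
>   // current file is a dead 0-length log that can never become readable,
>   // and if so drop it from the queue immediately instead of sleeping
>   // through the full retry backoff.
>   if (this.reader == null) {
>     if (isDeadEmptyFile(this.currentPath)) {
>       processEndOfFile();
>       continue;
>     }
>     if (sleepForRetries("Unable to open a reader", sleepMultiplier)) {
>       sleepMultiplier++;
>     }
>     continue;
>   }
> {code}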

[jira] [Resolved] (HBASE-24813) ReplicationSource should clear buffer usage on ReplicationSourceManager upon termination

2021-01-12 Thread Andrew Kyle Purtell (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-24813.
-
Resolution: Fixed

> ReplicationSource should clear buffer usage on ReplicationSourceManager upon 
> termination
> 
>
> Key: HBASE-24813
> URL: https://issues.apache.org/jira/browse/HBASE-24813
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.2.6, 2.3.4, 2.5.0
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.5.0, 2.4.1
>
> Attachments: TestReplicationSyncUpTool.log, 
> image-2020-10-09-10-50-00-372.png
>
>
> Following investigations on the issue described by [~elserj] on HBASE-24779, 
> we found out that once a peer is removed, thus killing the peer's related 
> *ReplicationSource* instance, it may leave 
> *ReplicationSourceManager.totalBufferUsed* inconsistent. This can happen if 
> *ReplicationSourceWALReader* had put some entries on its queue to be 
> processed by *ReplicationSourceShipper*, but the peer removal killed the 
> shipper before it could process the pending entries. When the 
> *ReplicationSourceWALReader* thread adds entries to the queue, it increments 
> *ReplicationSourceManager.totalBufferUsed* by the sum of the entry sizes. 
> When those entries are read by *ReplicationSourceShipper*, 
> *ReplicationSourceManager.totalBufferUsed* is then decreased. We should also 
> decrease *ReplicationSourceManager.totalBufferUsed* when *ReplicationSource* 
> is terminated, otherwise the unprocessed entries' size would consume 
> *ReplicationSourceManager.totalBufferUsed* indefinitely, unless the RS gets 
> restarted. This may be a problem for deployments with multiple peers, or if 
> new peers are added.
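> A minimal sketch of the proposed accounting fix (the collection of pending 
> entries and the counter access are assumptions; only the idea of crediting 
> back unshipped entry sizes on termination is from this issue):
> {code}
> // On ReplicationSource termination, walk the entries the reader queued
> // but the shipper never consumed, and return their accumulated size to
> // the manager's global counter so the quota is not leaked until restart.
> long pendingSize = 0;
> for (WAL.Entry entry : pendingEntries) {
>   pendingSize += entry.getEdit().heapSize();
> }
> totalBufferUsed.addAndGet(-pendingSize);
> {code}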



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

