[jira] [Created] (HBASE-27713) Remove numRegions in Region Metrics

2023-03-14 Thread tianhang tang (Jira)
tianhang tang created HBASE-27713:
-

 Summary: Remove numRegions in Region Metrics
 Key: HBASE-27713
 URL: https://issues.apache.org/jira/browse/HBASE-27713
 Project: HBase
  Issue Type: Improvement
Reporter: tianhang tang
Assignee: tianhang tang


After HBASE-27681, I wanted to refactor the region metrics and then found this issue.

We have a metric named `numRegions` in Region Metrics, which I think is a 
regionServer metric, not a region metric.

And we do have a `regionCount` metric in RegionServer Metrics.

So I think we should delete the `numRegions`.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27714) WALEntryStreamTestBase creates a new HBTU in startCluster method which causes all sub classes are testing default configurations

2023-03-14 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-27714:
-

 Summary: WALEntryStreamTestBase creates a new HBTU in startCluster 
method which causes all sub classes are testing default configurations
 Key: HBASE-27714
 URL: https://issues.apache.org/jira/browse/HBASE-27714
 Project: HBase
  Issue Type: Bug
  Components: Replication, test
Reporter: Duo Zhang
Assignee: Duo Zhang


Should be a typo...

Let's fix this...





[jira] [Created] (HBASE-27715) Refactoring the long tryAdvanceEntry method in WALEntryStream

2023-03-14 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-27715:
-

 Summary: Refactoring the long tryAdvanceEntry method in 
WALEntryStream
 Key: HBASE-27715
 URL: https://issues.apache.org/jira/browse/HBASE-27715
 Project: HBase
  Issue Type: Task
Reporter: Duo Zhang


Let's make it more readable and add more logs, for debugging.





[jira] [Created] (HBASE-27716) Fix TestWALOpenAfterDNRollingStart

2023-03-14 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-27716:
-

 Summary: Fix TestWALOpenAfterDNRollingStart
 Key: HBASE-27716
 URL: https://issues.apache.org/jira/browse/HBASE-27716
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Duo Zhang
Assignee: Duo Zhang


Just a test issue, should use NoEOF reader.





[jira] [Created] (HBASE-27717) Add rsgroup name for dead region servers on master UI

2023-03-14 Thread Xiaolin Ha (Jira)
Xiaolin Ha created HBASE-27717:
--

 Summary: Add rsgroup name for dead region servers on master UI
 Key: HBASE-27717
 URL: https://issues.apache.org/jira/browse/HBASE-27717
 Project: HBase
  Issue Type: Improvement
  Components: UI
Affects Versions: 2.5.3
Reporter: Xiaolin Ha


We also want to know the rsgroup name of dead region servers, which are shown 
in the `Dead Region Servers` section of the master UI.





Re: [DISCUSS] Kubernetes Orchestration for ZK, HDFS, and HBase

2023-03-14 Thread Mallikarjun
Hi Nick,

I agree with your thought that there is an increasing reliance on
kubernetes, more so for complex workloads like hbase deployments because of
the unavailability of reliable automation frameworks outside of k8s.

But I have a slightly different view in terms of how to achieve it. When I
was exploring the possibilities, such as kustomize, helm, or an operator, I
found it can get pretty complex to write extensible deployment manifests
(for different kinds of deployments) with tools like kustomize or helm.
Here is our attempt to containerise hbase with an operator
--> https://github.com/flipkart-incubator/hbase-k8s-operator

---
Mallikarjun

On Mon, Mar 13, 2023 at 3:58 PM Nick Dimiduk wrote:

> Heya team,
>
> Over here at $dayjob, we have an increasing reliance on Kubernetes for
> both development and production workloads. Our tools are maturing and
> we're hoping that they might be of interest to the wider community.
> I'd like to see if there's community interest in receiving some/any of
> them as a contribution. I think we'll also need a plan from ASF Infra
> that makes kubernetes available to us as a project.
>
> We have implemented a basic stack of tools for orchestrating ZK + HDFS
> + HBase on Kubernetes. We use this for running a small local dev
> cluster via MiniKube/KIND ; for ITBLL on smallish distributed clusters
> in a public cloud ; and in production for running clusters of ~100
> Data Nodes/Region Servers in a public cloud. There was an earlier
> discussion about using our donation of test hardware for running more
> thorough tests in our CI, but one of the limiting factors is full
> cluster deployment. I hope that the community might be interested in
> receiving this tooling as a foundation for more rigorous correctness
> and maybe even performance tests in the open. Furthermore, perhaps the
> wider community has interest in an Apache licensed cluster
> orchestration tool for other uses.
>
> Now for some details: The implementation is built on Kustomize, so
> it's fundamentally transparent resource specification with yaml
> patches for composability; this is in contrast to a solution using
> templates with defined capabilities and interfaces. There is no
> operator ; it's all coordinated via init/bootstrap containers, shell
> scripts, shared volumes for state, &c. For now.
>
> Such a donation will amount to a code drop, which will have its
> challenges. I'm motivated via internal processes to carve it into
> smaller pieces, and I think that will benefit community review as
> well. Perhaps this approach could be used to make the contribution via
> a feature branch.
>
> Is there community interest in adding such a capability to our
> maintained responsibilities? I'd hope that we have several volunteers
> to work with me through the contribution process, and who are
> reasonably confident that they'll be able to help maintain such a
> capability going forward. We'll also need someone who can work with
> Infra to get us access to Kubernetes cluster(s), via whatever means.
>
> What do you think?
>
> Thanks,
> Nick & the HBase team at Apple


[jira] [Created] (HBASE-27718) The regionStateNode only need remove one time in regionOffline

2023-03-14 Thread chaijunjie (Jira)
chaijunjie created HBASE-27718:
--

 Summary: The regionStateNode only need remove one time in 
regionOffline
 Key: HBASE-27718
 URL: https://issues.apache.org/jira/browse/HBASE-27718
 Project: HBase
  Issue Type: Bug
  Components: amv2
Affects Versions: 2.4.14
Reporter: chaijunjie


The regionStateNode only needs to be removed once in regionOffline when 
deleting a table.

See

org.apache.hadoop.hbase.master.assignment.AssignmentManager#deleteTable





Re: [DISCUSS] Kubernetes Orchestration for ZK, HDFS, and HBase

2023-03-14 Thread Lars Francke
Hi Nick,

I do not mean to derail your mail so I'll keep mine short: Yes, I
think testing & infrastructure on Kubernetes would be worthwhile and I
thank you for the offer.
We're happy to take a look and would try to review any incoming
contributions depending on how large/digestible they are :)

We[1] are developing our own operator[2] and during that mission have
learned how prone Kubernetes is to bit rot so it'd be great if your
team would continue helping out.
I'm happy to go into details on why we did what we did but that'd be a
separate thread.

Cheers,
Lars

[1] 
[2] 



[jira] [Resolved] (HBASE-27688) HFile splitting occurs during bulkload, the CREATE_TIME_TS of hfileinfo is 0

2023-03-14 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27688.
---
Fix Version/s: 2.6.0
   3.0.0-alpha-4
   2.4.17
   2.5.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.4+.

Thanks [~alanlemma] for contributing!

> HFile splitting occurs during bulkload, the CREATE_TIME_TS of hfileinfo is 0
> 
>
> Key: HBASE-27688
> URL: https://issues.apache.org/jira/browse/HBASE-27688
> Project: HBase
>  Issue Type: Bug
>Reporter: alan.zhao
>Assignee: alan.zhao
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
>
> If HFile splitting occurs during bulkload, the CREATE_TIME_TS of hfileinfo 
> is 0. When the HFile is copied after splitting, the CREATE_TIME_TS of the 
> original file is not copied.
> {code:java}
> // BulkLoadHFilesTool.class
> /**
>  * Copy half of an HFile into a new HFile.
>  */
> private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
>   Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
>   FileSystem fs = inFile.getFileSystem(conf);
>   CacheConfig cacheConf = CacheConfig.DISABLED;
>   HalfStoreFileReader halfReader = null;
>   StoreFileWriter halfWriter = null;
>   try {
>     ...
>     HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
>       .withChecksumType(StoreUtils.getChecksumType(conf))
>       .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
>       .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
>       .build();
>     // TODO .withCreateTime(EnvironmentEdgeManager.currentTime())
>     halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
>       .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
>     HFileScanner scanner = halfReader.getScanner(false, false, false);
>     scanner.seekTo();
>     do {
>       halfWriter.append(scanner.getCell());
>     } while (scanner.next());
>     for (Map.Entry<byte[], byte[]> entry : fileInfo.entrySet()) {
>       if (shouldCopyHFileMetaKey(entry.getKey())) {
>         halfWriter.appendFileInfo(entry.getKey(), entry.getValue());
>       }
>     }
>   } finally {
>     ...
>   }
> }
>
> // get lastMajorCompactionTs metric
>   lastMajorCompactionTs = this.region.getOldestHfileTs(true);
>   ...
>   long now = EnvironmentEdgeManager.currentTime();
>   return now - lastMajorCompactionTs;
>   ...
>
> public long getOldestHfileTs(boolean majorCompactionOnly) throws IOException {
>   long result = Long.MAX_VALUE;
>   for (HStore store : stores.values()) {
>     Collection<HStoreFile> storeFiles = store.getStorefiles();
>     ...
>     for (HStoreFile file : storeFiles) {
>       StoreFileReader sfReader = file.getReader();
>       ...
>       result = Math.min(result, reader.getFileContext().getFileCreateTime());
>     }
>   }
>   return result == Long.MAX_VALUE ? 0 : result;
> }{code}
>  
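
To make the effect on the metric concrete, here is a minimal, self-contained
sketch of the arithmetic in the quoted snippets. It does not use any HBase
classes: `OldestHFileTsSketch`, `oldestHFileTs`, and `ageMillis` are
hypothetical names, and the timestamps are made-up values. It only illustrates
that one file with a create time of 0 drags the minimum to 0, so
`now - lastMajorCompactionTs` ends up equal to the current epoch time.

```java
public class OldestHFileTsSketch {

  /**
   * Same min-over-files logic as the quoted getOldestHfileTs, but over a
   * plain array of create times (hypothetical stand-in for the store files).
   */
  public static long oldestHFileTs(long[] createTimes) {
    long result = Long.MAX_VALUE;
    for (long ts : createTimes) {
      result = Math.min(result, ts);
    }
    // No files at all: report 0, as the quoted method does.
    return result == Long.MAX_VALUE ? 0 : result;
  }

  /** Age as computed for the metric: now minus the oldest create time. */
  public static long ageMillis(long now, long[] createTimes) {
    return now - oldestHFileTs(createTimes);
  }

  public static void main(String[] args) {
    long now = 1_678_800_000_000L; // made-up "current time" in epoch millis

    // All files carry a real create time: the age is two minutes.
    long[] healthy = {now - 60_000L, now - 120_000L};
    System.out.println(ageMillis(now, healthy)); // prints 120000

    // One split half was written without a create time (0): the age
    // becomes the whole epoch timestamp.
    long[] withSplitHalf = {now - 60_000L, 0L};
    System.out.println(ageMillis(now, withSplitHalf)); // prints 1678800000000
  }
}
```

The second case is what the report describes: the file itself is recent, but
the missing create time makes the reported age look like decades.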





Re: [DISCUSS] Remove checkRefCnt for on-heap ByteBuffs?

2023-03-14 Thread Bryan Beaudreault
For those following along, I’ve submitted an initial PR for this issue.
https://github.com/apache/hbase/pull/5104

On Mon, Mar 13, 2023 at 11:46 AM Bryan Beaudreault wrote:

> Thanks everyone for chiming in. I submitted
> https://issues.apache.org/jira/browse/HBASE-27710 and will work on it in
> the near future.
>
> This does make me wonder what overhead checkRefCnt causes in off-heap
> cases too, but we just don't see it in profiling because the server does so
> much other work. It has me thinking about whether there are other
> optimizations we could do there. For example, currently we do checkRefCnt
> for every single ByteBuff operation (slice/duplicate/get/etc.). A lot of the
> buffer operations happen before creating a Cell for return to the user in
> HFileBlock/DBE. I wonder if we could do all the parsing we need in
> HFileBlock/DBE without checkRefCnt, do a single checkRefCnt call when
> we create the Cell, then enable checkRefCnt for all future ByteBuff calls
> before returning a cell to upper levels. This would probably be a larger
> project, so I'm not investigating it now. Just an idea I had.
>
> On Mon, Mar 13, 2023 at 2:06 AM Viraj Jasani wrote:
>
>> +1, as the numbers suggest significant perf improvement.
>>
>>
>> On Thu, Mar 9, 2023 at 9:36 AM Bryan Beaudreault wrote:
>>
>> > Hey all,
>> >
>> > We have a use-case for HFile.Reader at my company. The use-case in
>> > question is scanning many hfiles for a small subset of cells, so it
>> > mostly is just iterating a large number of HFiles and discarding most of
>> > the cells. We recently upgraded that deployable from super old cdh 5.16
>> > (hbase 1.2-ish) to hbase 2.5.3. In doing so, we noticed a performance
>> > regression of around 4x. I imagine this regression would also affect
>> > ClientSideRegionScanner.
>> >
>> > We did some profiling and noticed that a large amount of time is spent
>> > in SingleByteBuff.checkRefCnt. It seems like every SingleByteBuff method
>> > calls checkRefCnt, and this check compares a volatile in
>> > AtomicIntegerFieldUpdater in the netty code.
>> >
>> > I believe ref counting is mostly necessary for off-heap buffers, but
>> > on-heap buffers are also wrapped in SingleByteBuff and so also go
>> > through checkRefCnt. We removed the checkRefCnt call, and the regression
>> > disappeared.
>> >
>> > We created a simple test method which just does HFile.createReader,
>> > reader.getScanner(), and then iterates the scanner counting the total
>> > cells. The test reads an hfile with 100M cells and takes over 1 minute
>> > with checkRefCnt. Removing checkRefCnt brings the runtime down to 20s.
>> >
>> > It's worth noting that the regression is most prominent on java17. It's
>> > slightly less obvious on java11, with runtime being 40s vs 28s.
>> >
>> > Thoughts on updating SingleByteBuff to skip the checkRefCnt call for
>> > on-heap buffers? We can handle this in the constructor, when wrapping an
>> > on-heap buffer here [1].
>> >
>> > [1]
>> > https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/io/ByteBuffAllocator.java#L300
>> >
>>
>
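
The proposal in the original message (decide in the constructor whether a
buffer needs the ref-count check at all) can be sketched roughly as follows.
This is a simplified illustration under stated assumptions, not HBase's
`SingleByteBuff` and not necessarily what the PR does: `RefCheckSketch`,
`SketchBuff`, `ensureAccessible`, and the plain `AtomicInteger` counter are
all hypothetical stand-ins. The only real API used is `java.nio.ByteBuffer`.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCheckSketch {

  /** Hypothetical wrapper; it only mimics the shape of the idea. */
  public static final class SketchBuff {
    private final ByteBuffer buf;
    private final AtomicInteger refCnt = new AtomicInteger(1);
    // Decided once at construction: only direct (off-heap) buffers are checked.
    private final boolean checkRefCnt;

    public SketchBuff(ByteBuffer buf) {
      this.buf = buf;
      this.checkRefCnt = buf.isDirect();
    }

    private void ensureAccessible() {
      // For on-heap buffers this is a single boolean test on a final field,
      // so the volatile read of the counter is skipped on the hot path.
      if (checkRefCnt && refCnt.get() <= 0) {
        throw new IllegalStateException("buffer already released");
      }
    }

    public byte get(int index) {
      ensureAccessible();
      return buf.get(index);
    }

    public void release() {
      refCnt.decrementAndGet();
    }
  }

  public static void main(String[] args) {
    SketchBuff heap = new SketchBuff(ByteBuffer.wrap(new byte[] {7, 8}));
    heap.release();
    // Still readable: the garbage collector keeps the backing array alive.
    System.out.println(heap.get(1)); // prints 8

    SketchBuff direct = new SketchBuff(ByteBuffer.allocateDirect(2));
    direct.release();
    try {
      direct.get(0);
    } catch (IllegalStateException e) {
      System.out.println("direct buffer rejected after release");
    }
  }
}
```

The trade-off in this sketch is that a use-after-release bug on an on-heap
buffer is no longer detected. That is memory-safe because the heap array stays
reachable, but it could hide a logic error that the check would have caught.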


[jira] [Resolved] (HBASE-27714) WALEntryStreamTestBase creates a new HBTU in startCluster method which causes all sub classes are testing default configurations

2023-03-14 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27714.
---
Fix Version/s: 2.6.0
   3.0.0-alpha-4
   2.4.17
   2.5.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.4+.

Thanks [~zghao] for reviewing!

> WALEntryStreamTestBase creates a new HBTU in startCluster method which causes 
> all sub classes are testing default configurations
> 
>
> Key: HBASE-27714
> URL: https://issues.apache.org/jira/browse/HBASE-27714
> Project: HBase
>  Issue Type: Bug
>  Components: Replication, test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
>
> Should be a typo...
> Let's fix this...





[jira] [Resolved] (HBASE-27716) Fix TestWALOpenAfterDNRollingStart

2023-03-14 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-27716.
---
Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master and branch-2.

Thanks [~zghao] for reviewing!

> Fix TestWALOpenAfterDNRollingStart
> --
>
> Key: HBASE-27716
> URL: https://issues.apache.org/jira/browse/HBASE-27716
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> Just a test issue, should use NoEOF reader.


