[jira] [Resolved] (HBASE-27458) Use ReadWriteLock for region scanner readpoint map
[ https://issues.apache.org/jira/browse/HBASE-27458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaolin Ha resolved HBASE-27458. Resolution: Fixed Merged to master and branch-2+, thanks [~frostruan] for contributing, and thanks [~zhangduo] for reviewing. > Use ReadWriteLock for region scanner readpoint map > --- > > Key: HBASE-27458 > URL: https://issues.apache.org/jira/browse/HBASE-27458 > Project: HBase > Issue Type: Improvement > Components: Scanners >Affects Versions: 3.0.0-alpha-3 >Reporter: ruanhui >Assignee: ruanhui >Priority: Minor > Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4 > > Attachments: jstack-2.png > > > Currently we manage the concurrency between the RegionScanner and > getSmallestReadPoint by synchronizing on the scannerReadPoints object. In our > production environment, we find that many read threads are blocked by this lock under > heavy read load. > We need to get the smallest read point when: > a. flushing a memstore > b. compacting a memstore/storefile > c. doing a delta operation like increment/append > Usually these operations are much less frequent than read requests. > It's a little expensive to use an exclusive lock here because all a region > scanner needs to do is calculate its readpoint and put it in the scanner > readpoint map, which is thread-safe. Multiple read threads can do this in > parallel without synchronization. > Based on the above, maybe we can replace the synchronized lock > with a read-write lock. This will help improve read performance when the > bottleneck is this synchronization. > !jstack.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
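The idea in the ticket can be sketched with plain java.util.concurrent primitives. This is an illustrative stand-in, not the actual HBase patch (class and method names here are hypothetical): region scanners take the shared read lock to compute and publish their readpoint concurrently, while the rare flush/compaction/increment paths take the exclusive write lock to compute the smallest read point.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of replacing `synchronized (scannerReadPoints)` with a ReadWriteLock.
// Names are illustrative; the real logic lives in HRegion.
class ReadPointManager {
    private final ConcurrentHashMap<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile long memstoreReadPoint = 0;

    /** Many scanners can register in parallel: the map itself is thread-safe. */
    long registerScanner(long scannerId) {
        lock.readLock().lock();
        try {
            long readPoint = memstoreReadPoint;          // compute the readpoint
            scannerReadPoints.put(scannerId, readPoint); // publish it for flush/compaction
            return readPoint;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Flush/compaction/increment take the exclusive lock, which is rare. */
    long getSmallestReadPoint() {
        lock.writeLock().lock();
        try {
            long smallest = memstoreReadPoint;
            for (long rp : scannerReadPoints.values()) {
                smallest = Math.min(smallest, rp);
            }
            return smallest;
        } finally {
            lock.writeLock().unlock();
        }
    }

    void advance(long newReadPoint) { memstoreReadPoint = newReadPoint; }

    void unregisterScanner(long scannerId) { scannerReadPoints.remove(scannerId); }
}
```

Since registration only writes to a thread-safe map, readers never block each other; only the infrequent smallest-read-point computation excludes them.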
Re: [Discuss] Mapreduce based major compactions to minimise compactions overhead in HBase cluster
Hi Rajeshbabu, I think that compaction management and execution are important areas for experimentation and growth of HBase. I’m more interested in the harness and APIs that make an implementation possible than in any specific implementation. I’d also like to see consideration for a cluster-wide compaction scheduler, something to prioritize allocation of precious IO resources. I agree with Andrew that externalizing to MapReduce is unlikely to be a popular compute runtime for the feature, but I also have no statistics about which runtimes are commonly available. I look forward to seeing how your design proposal develops. Thanks, Nick On Thu, Mar 2, 2023 at 02:46 Andrew Purtell wrote: > Hi Rajesbabu, > > You have proposed a solution without describing the problem. Please do that > first. > > That said, compaction is fundamental to HBase operation and should have no > external dependency on a particular compute framework. Especially > MapReduce, which is out of favor and deprecated in many places. If this is > an optional feature it could be fine. So perhaps you could also explain how > you see this potential feature fitting into the long term roadmap for the > project. > > > > On Wed, Mar 1, 2023 at 3:54 PM rajeshb...@apache.org < > chrajeshbab...@gmail.com> wrote: > > > Hi Team, > > > > I would just like to discuss the new compactor implementation to run > major > > compactions through mapreduce job(which are best fit for merge sorting > > applications) > > > > I have a high level plan and would like to check with you before > proceeding > > with detailed design and implementation to know any challenges or any > > similar solutions you are aware of. > > > > High level plan: > > > > We should have a new compactor implementation which can create the > > mapreduce job > > for running major compaction and wait for job to complete in a thread. 
> > Mapreduce job implementation is as follows: > > 1) since we need to read through all the files in a column family for > major > > compaction > > we can pass the column family folder to the mapreduce job. > > If possible file filters might be required to not use newly created > hfiles. > > 2) we can identify the partitions or input splits based on hfiles > > boundaries and > > utilise existing HFileInputFormatter to scan through each hfile > partitions > > so that each mapper sorts data within the partition range. > > 3) If possible we can use combiner to remove old versions or deleted > cells. > > 4) we can use the HFileOutputFilter to create new HFile at tmp directory > > and write cells to it by reading the sorted data from mappers in the > > reducer. > > > > once the hfile is created in a tmp directory and mapreduce job completed > > we can move the compacted file to the column family location, move old > > files out and refresh the hfiles which is same as default implementation. > > > > There are tradeoffs with the solution where intermediate copies of data > > required > > while running the mapreduce job even though the hfiles have sorted data. > > > > Thanks, > > Rajeshbabu. > > > > > -- > Best regards, > Andrew > > Unrest, ignorance distilled, nihilistic imbeciles - > It's what we’ve earned > Welcome, apocalypse, what’s taken you so long? > Bring us the fitting end that we’ve been counting on >- A23, Welcome, Apocalypse >
Re: [Discuss] Mapreduce based major compactions to minimise compactions overhead in HBase cluster
Hi Rajeshbabu, You have proposed a solution without describing the problem. Please do that first. That said, compaction is fundamental to HBase operation and should have no external dependency on a particular compute framework. Especially MapReduce, which is out of favor and deprecated in many places. If this is an optional feature it could be fine. So perhaps you could also explain how you see this potential feature fitting into the long term roadmap for the project. On Wed, Mar 1, 2023 at 3:54 PM rajeshb...@apache.org < chrajeshbab...@gmail.com> wrote: > Hi Team, > > I would just like to discuss the new compactor implementation to run major > compactions through mapreduce job(which are best fit for merge sorting > applications) > > I have a high level plan and would like to check with you before proceeding > with detailed design and implementation to know any challenges or any > similar solutions you are aware of. > > High level plan: > > We should have a new compactor implementation which can create the > mapreduce job > for running major compaction and wait for job to complete in a thread. > Mapreduce job implementation is as follows: > 1) since we need to read through all the files in a column family for major > compaction > we can pass the column family folder to the mapreduce job. > If possible file filters might be required to not use newly created hfiles. > 2) we can identify the partitions or input splits based on hfiles > boundaries and > utilise existing HFileInputFormatter to scan through each hfile partitions > so that each mapper sorts data within the partition range. > 3) If possible we can use combiner to remove old versions or deleted cells. > 4) we can use the HFileOutputFilter to create new HFile at tmp directory > and write cells to it by reading the sorted data from mappers in the > reducer. 
> > once the hfile is created in a tmp directory and mapreduce job completed > we can move the compacted file to the column family location, move old > files out and refresh the hfiles which is same as default implementation. > > There are tradeoffs with the solution where intermediate copies of data > required > while running the mapreduce job even though the hfiles have sorted data. > > Thanks, > Rajeshbabu. > -- Best regards, Andrew Unrest, ignorance distilled, nihilistic imbeciles - It's what we’ve earned Welcome, apocalypse, what’s taken you so long? Bring us the fitting end that we’ve been counting on - A23, Welcome, Apocalypse
[Discuss] Mapreduce based major compactions to minimise compactions overhead in HBase cluster
Hi Team, I would like to discuss a new compactor implementation that runs major compactions through a MapReduce job (which is a good fit for merge-sort applications). I have a high-level plan and would like to check with you before proceeding with detailed design and implementation, to learn of any challenges or similar solutions you are aware of. High-level plan: We should have a new compactor implementation which can create the MapReduce job for running the major compaction and wait for the job to complete in a thread. The MapReduce job implementation is as follows: 1) Since we need to read through all the files in a column family for a major compaction, we can pass the column family folder to the MapReduce job. File filters might be required to exclude newly created hfiles. 2) We can identify the partitions or input splits based on hfile boundaries and utilise the existing HFileInputFormatter to scan through each hfile partition so that each mapper sorts data within its partition range. 3) If possible we can use a combiner to remove old versions or deleted cells. 4) We can use the HFileOutputFilter to create a new HFile in a tmp directory and write cells to it by reading the sorted data from the mappers in the reducer. Once the hfile is created in the tmp directory and the MapReduce job has completed, we can move the compacted file to the column family location, move the old files out, and refresh the hfiles, which is the same as the default implementation. There are trade-offs with this solution: intermediate copies of the data are required while running the MapReduce job, even though the hfiles already contain sorted data. Thanks, Rajeshbabu.
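The core of a major compaction is a k-way merge of already-sorted HFiles, which is why MapReduce's sort/merge machinery looks attractive for the plan above. A minimal, HBase-free illustration of that merge step follows (integers stand in for the cells that real compaction would merge, after version and delete handling):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Minimal k-way merge over sorted inputs: the operation at the heart of a
// major compaction. Real HBase merges Cells from HFile scanners instead.
class KWayMerge {
    static List<Integer> merge(List<List<Integer>> sortedInputs) {
        // Heap entries are {currentValue, inputIndex}, ordered by value.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        List<Iterator<Integer>> iters = new ArrayList<>();
        for (int i = 0; i < sortedInputs.size(); i++) {
            Iterator<Integer> it = sortedInputs.get(i).iterator();
            iters.add(it);
            if (it.hasNext()) {
                heap.add(new int[] { it.next(), i });
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();        // smallest head across all inputs
            out.add(top[0]);
            Iterator<Integer> it = iters.get(top[1]);
            if (it.hasNext()) {
                heap.add(new int[] { it.next(), top[1] }); // refill from same input
            }
        }
        return out;
    }
}
```

Because each input is already sorted, the merge is linear in the total data size (with a log-k heap factor), which is also why the intermediate-copy overhead of shipping that data through a MapReduce shuffle is the main trade-off the proposal notes.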
[jira] [Created] (HBASE-27681) Refactor Table Latency Metrics
tianhang tang created HBASE-27681: - Summary: Refactor Table Latency Metrics Key: HBASE-27681 URL: https://issues.apache.org/jira/browse/HBASE-27681 Project: HBase Issue Type: Improvement Reporter: tianhang tang Assignee: tianhang tang Attachments: image-2023-03-01-23-55-14-095.png, image-2023-03-01-23-56-16-819.png Benefit: # Table latency metrics could be removed after a table has been moved away. Fixes [HBASE-27617|https://issues.apache.org/jira/browse/HBASE-27617] # We could remove the hash lookup into the metrics map from the hot request path. # Reduce the output in JMX. If we use jmx_exporter to collect metrics into Prometheus, the overhead of performing regex matching on metric names is relatively high (especially for region metrics, which might be the next step). # I think this could be the first step after hbase-metrics is released. It seems that the roadmap indicates that we should replace hadoop-metrics2 with hbase-metrics. Influence: # The metrics structure in JMX will change. Old: !image-2023-03-01-23-55-14-095.png! New: !image-2023-03-01-23-56-16-819.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-26491) I can't read/write a Hbase table by spark-hbase connector when the table is in non-default namespace
[ https://issues.apache.org/jira/browse/HBASE-26491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Istvan Toth resolved HBASE-26491. - Resolution: Duplicate > I can't read/write a Hbase table by spark-hbase connector when the table is > in non-default namespace > - > > Key: HBASE-26491 > URL: https://issues.apache.org/jira/browse/HBASE-26491 > Project: HBase > Issue Type: Bug > Components: hbase-connectors >Affects Versions: 1.0.0 >Reporter: mengdou >Priority: Minor > Attachments: image-2021-11-26-17-32-53-507.png, > image-2021-12-01-20-52-11-664.png, image-2021-12-01-20-53-05-405.png > > > I found I can't read/write an HBase table via the spark-hbase connector when the > HBase table is in a non-default namespace. > > This is because when Spark opens a table (mapped to an HBase table), it first creates an > HBaseRelation instance and initializes an HBaseTableCatalog from the > table definition saved in the Spark catalog. But in the 'convert' function the > 'tableCatalog' field is constructed from a string template in which the > namespace is set to 'default', leading to a wrong namespace. This namespace > is not the one the user defined when creating the table. > > Please have a look: > !image-2021-11-26-17-32-53-507.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (HBASE-23983) Spotbugs warning complain on master build
[ https://issues.apache.org/jira/browse/HBASE-23983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk reopened HBASE-23983: -- Reopening for backport. > Spotbugs warning complain on master build > - > > Key: HBASE-23983 > URL: https://issues.apache.org/jira/browse/HBASE-23983 > Project: HBase > Issue Type: Improvement >Affects Versions: 3.0.0-alpha-1 >Reporter: Jan Hentschel >Assignee: Jan Hentschel >Priority: Major > Fix For: 3.0.0-alpha-1 > > > Spotbugs currently complains on every master build with an extant warning: > {code} > Return value of putIfAbsent is ignored, but node is reused in > org.apache.hadoop.hbase.master.assignment.RegionStates.createRegionStateNode(RegionInfo) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27680) Bump hbase to 2.4.16, hadoop to 3.1.2 and spark to 3.2.3 for hbase-connectors
Nihal Jain created HBASE-27680: -- Summary: Bump hbase to 2.4.16, hadoop to 3.1.2 and spark to 3.2.3 for hbase-connectors Key: HBASE-27680 URL: https://issues.apache.org/jira/browse/HBASE-27680 Project: HBase Issue Type: Task Components: hbase-connectors Reporter: Nihal Jain Assignee: Nihal Jain -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-23102) Improper Usage of Map putIfAbsent
[ https://issues.apache.org/jira/browse/HBASE-23102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk resolved HBASE-23102. -- Resolution: Fixed Backported to 2.6 and 2.5. > Improper Usage of Map putIfAbsent > - > > Key: HBASE-23102 > URL: https://issues.apache.org/jira/browse/HBASE-23102 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Minor > Labels: newbie, noob > Fix For: 2.6.0, 2.5.4, 3.0.0-alpha-1 > > Attachments: HBASE-23102.1.patch > > > When using {{Map#putIfAbsent}}, the argument should not be a {{new}} object. > Otherwise, if the item is present, the object that was instantiated is > immediately thrown away. Instead, use {{Map#computeIfAbsent}} so that the > object is only instantiated if it is needed. > There exists a good example in the {{Map}} JavaDoc: > https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function- > > h2. 
Locations > https://github.com/apache/hbase/blob/9370347efea5b09e2fa8f4e5d82fa32491e1181b/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L227-L236 > https://github.com/apache/hbase/blob/025ddce868eb06b4072b5152c5ffae5a01e7ae30/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/throttle/StoreHotnessProtector.java#L124-L129 > https://github.com/apache/hbase/blob/1170f28122d9d36e511ba504a5263ec62e11ef6a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L555 > https://github.com/apache/hbase/blob/4ca760fe9dd373b8d8a4c48db15e42424920653c/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java#L584-L586 > https://github.com/apache/hbase/blob/4ca760fe9dd373b8d8a4c48db15e42424920653c/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java#L585 > https://github.com/apache/hbase/blob/5b01e613fbbb92e243e99a1d199b4ffbb21ed2d9/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java#L834 -- This message was sent by Atlassian Jira (v8.20.10#820010)
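The difference the ticket describes can be shown with a small stand-in (the `Node` class here is hypothetical; the real call sites are linked above): with `putIfAbsent` the new object is constructed even when the key already exists and is then immediately discarded, while `computeIfAbsent` only runs the factory on a miss.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates why `map.putIfAbsent(k, new Node())` wastes an allocation
// when the key is already present, while computeIfAbsent constructs lazily.
class PutIfAbsentDemo {
    static final AtomicInteger constructed = new AtomicInteger();

    static class Node {
        Node() { constructed.incrementAndGet(); } // count every construction
    }

    /** Returns {constructions after putIfAbsent, constructions after computeIfAbsent}. */
    static int[] run() {
        constructed.set(0);
        ConcurrentHashMap<String, Node> map = new ConcurrentHashMap<>();

        map.put("region-1", new Node());                  // first construction
        map.putIfAbsent("region-1", new Node());          // argument built, then thrown away
        int afterPutIfAbsent = constructed.get();         // two constructions so far

        map.computeIfAbsent("region-1", k -> new Node()); // key present: factory not invoked
        int afterComputeIfAbsent = constructed.get();     // count unchanged

        return new int[] { afterPutIfAbsent, afterComputeIfAbsent };
    }
}
```

In the linked HBase call sites the discarded object is a state node or counter, so the waste is per-request allocation pressure rather than a correctness bug.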
[jira] [Resolved] (HBASE-27673) Fix mTLS client hostname verification
[ https://issues.apache.org/jira/browse/HBASE-27673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balazs Meszaros resolved HBASE-27673. - Fix Version/s: 2.6.0 3.0.0-alpha-4 Resolution: Fixed > Fix mTLS client hostname verification > - > > Key: HBASE-27673 > URL: https://issues.apache.org/jira/browse/HBASE-27673 > Project: HBase > Issue Type: Bug > Components: rpc >Affects Versions: 3.0.0-alpha-3 >Reporter: Balazs Meszaros >Assignee: Balazs Meszaros >Priority: Major > Fix For: 2.6.0, 3.0.0-alpha-4 > > > The exception that I get: > {noformat} > 23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify host address: > 127.0.0.1 > javax.net.ssl.SSLPeerUnverifiedException: Certificate for <127.0.0.1> doesn't > match any of the subject alternative names: [***] > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchIPAddress(HBaseHostnameVerifier.java:144) > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:117) > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:143) > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97) > ... > 23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify hostname: > localhost > javax.net.ssl.SSLPeerUnverifiedException: Certificate for doesn't > match any of the subject alternative names: [***] > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchDNSName(HBaseHostnameVerifier.java:159) > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:119) > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:171) > at > org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97) > ... 
> 23/02/22 15:18:06 WARN ipc.NettyRpcServer: Connection /100.100.124.2:47109; > caught unexpected downstream exception. > org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: > javax.net.ssl.SSLHandshakeException: Failed to verify both host address and > host name > at > org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) > at > org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) > at > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) > at > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) > at > org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) > at > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) > at > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) > at > org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) > at > org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) > at > org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) > at > org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397) > at > org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) > at > 
org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) > at > org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.lang.Thread.run(Thread.java:750) > Caused by: javax.net.ssl.SSLHandshakeException: Failed to verify both host > address and host name > at sun.security.ssl.Alert.createSSLException(Alert.java:131) > at sun.security.ssl.TransportContext.fatal(TransportContext.java:324) > at sun.security.ssl.TransportContext.fatal(TransportContext.java:267) > at sun.security.ssl.TransportContext.fatal(TransportContext.java:262) > at >
[jira] [Resolved] (HBASE-27639) Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark 3.2.3
[ https://issues.apache.org/jira/browse/HBASE-27639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi resolved HBASE-27639. --- Fix Version/s: hbase-connectors-1.1.0 Resolution: Fixed Merged to master. Thanks for the patch [~nihaljain.cs]. > Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark > 3.2.3 > --- > > Key: HBASE-27639 > URL: https://issues.apache.org/jira/browse/HBASE-27639 > Project: HBase > Issue Type: Improvement > Components: hbase-connectors, spark >Reporter: Nihal Jain >Assignee: Nihal Jain >Priority: Major > Fix For: hbase-connectors-1.1.0 > > > The goal is to allow hbase-connectors to compile with: > * HBase: 2.5.3 > * Hadoop: 3.2.4 and > * Spark: 3.2.3 > We could also discuss whether we want to bump the versions of the above-mentioned > components in the pom itself, > or just let the Spark connector compile with the above components, as the JIRA > title says. -- This message was sent by Atlassian Jira (v8.20.10#820010)