[jira] [Resolved] (HBASE-27458) Use ReadWriteLock for region scanner readpoint map

2023-03-01 Thread Xiaolin Ha (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaolin Ha resolved HBASE-27458.

Resolution: Fixed

Merged to master and branch-2+, thanks [~frostruan] for contributing, and 
thanks [~zhangduo] for reviewing.

> Use ReadWriteLock for region scanner readpoint map 
> ---
>
> Key: HBASE-27458
> URL: https://issues.apache.org/jira/browse/HBASE-27458
> Project: HBase
>  Issue Type: Improvement
>  Components: Scanners
>Affects Versions: 3.0.0-alpha-3
>Reporter: ruanhui
>Assignee: ruanhui
>Priority: Minor
> Fix For: 2.6.0, 3.0.0-alpha-4, 2.4.17, 2.5.4
>
> Attachments: jstack-2.png
>
>
> Currently we manage the concurrency between the RegionScanner and 
> getSmallestReadPoint by synchronizing on the scannerReadPoints object. In our 
> production clusters, we find that many read threads are blocked on this 
> monitor under a heavy read load.
> We need to get the smallest read point when we:
> a. flush a memstore 
> b. compact a memstore/storefile 
> c. do a delta operation like increment/append
> Usually these operations are far less frequent than read requests, so an 
> exclusive lock is expensive here: all a region scanner needs to do is 
> calculate its read point and put it into the scanner read-point map, which is 
> thread-safe. Multiple read threads can do this in parallel without 
> synchronization.
> Based on the above, maybe we can replace the synchronized block with a 
> read-write lock. This should improve read performance when the bottleneck is 
> this synchronization.
> !jstack.png!
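
A minimal sketch of the proposed locking pattern (class and method names here 
are illustrative, not the actual HRegion code): scanners take the shared read 
lock while registering their read points concurrently, and the infrequent 
flush/compaction/delta path takes the exclusive write lock to compute a stable 
minimum.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative sketch only -- names do not match the actual HRegion code. */
class ScannerReadPoints {
  private final ConcurrentMap<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile long memstoreReadPoint = 0;

  /** Hot path: many scanners can register their read points in parallel,
   *  since the map itself is thread-safe. */
  long registerScanner(long scannerId) {
    lock.readLock().lock();
    try {
      long readPoint = memstoreReadPoint; // calculate the scanner's read point
      scannerReadPoints.put(scannerId, readPoint);
      return readPoint;
    } finally {
      lock.readLock().unlock();
    }
  }

  /** Cold path (flush/compaction/increment/append): the exclusive write
   *  lock keeps new scanners out while we compute a stable minimum. */
  long getSmallestReadPoint() {
    lock.writeLock().lock();
    try {
      long minimum = memstoreReadPoint;
      for (long readPoint : scannerReadPoints.values()) {
        minimum = Math.min(minimum, readPoint);
      }
      return minimum;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}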



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Discuss] Mapreduce based major compactions to minimise compactions overhead in HBase cluster

2023-03-01 Thread Nick Dimiduk
Hi Rajeshbabu,

I think that compaction management and execution are important areas for
experimentation and growth of HBase. I’m more interested in the harness and
APIs that make an implementation possible than in any specific
implementation. I’d also like to see consideration for a cluster-wide
compaction scheduler, something to prioritize allocation of precious IO
resources.

I agree with Andrew that MapReduce is unlikely to be a popular compute
runtime for this feature, but I also have no statistics about which runtimes
are commonly available.

I look forward to seeing how your design proposal develops.

Thanks,
Nick

On Thu, Mar 2, 2023 at 02:46 Andrew Purtell  wrote:

>  Hi Rajeshbabu,
>
> You have proposed a solution without describing the problem. Please do that
> first.
>
> That said, compaction is fundamental to HBase operation and should have no
> external dependency on a particular compute framework. Especially
> MapReduce, which is out of favor and deprecated in many places. If this is
> an optional feature it could be fine. So perhaps you could also explain how
> you see this potential feature fitting into the long term roadmap for the
> project.
>
>
>
> On Wed, Mar 1, 2023 at 3:54 PM rajeshb...@apache.org <
> chrajeshbab...@gmail.com> wrote:
>
> > Hi Team,
> >
> > I would like to discuss a new compactor implementation that runs major
> > compactions through a MapReduce job (MapReduce being a good fit for
> > merge-sort applications).
> >
> > I have a high-level plan and would like to check with you, before
> > proceeding to a detailed design and implementation, about any challenges
> > or similar solutions you are aware of.
> >
> > High level plan:
> >
> > We should have a new compactor implementation which creates the MapReduce
> > job for running the major compaction and waits in a thread for the job to
> > complete. The MapReduce job implementation is as follows:
> > 1) Since we need to read through all the files in a column family for a
> > major compaction, we can pass the column family directory to the MapReduce
> > job. File filters may be required to exclude newly created HFiles.
> > 2) We can identify the partitions (input splits) from HFile boundaries and
> > utilise the existing HFileInputFormat to scan each HFile partition, so
> > that each mapper sorts the data within its partition range.
> > 3) If possible we can use a combiner to remove old versions and deleted
> > cells.
> > 4) We can use HFileOutputFormat2 to create a new HFile in a tmp directory
> > and write cells to it in the reducer, reading the sorted data from the
> > mappers.
> >
> > Once the HFile is created in the tmp directory and the MapReduce job has
> > completed, we can move the compacted file to the column family location,
> > move the old files out and refresh the HFiles, the same as the default
> > implementation.
> >
> > There are tradeoffs with this solution: intermediate copies of the data
> > are required while running the MapReduce job, even though the HFiles
> > already contain sorted data.
> >
> > Thanks,
> > Rajeshbabu.
> >
>
>
> --
> Best regards,
> Andrew
>
> Unrest, ignorance distilled, nihilistic imbeciles -
> It's what we’ve earned
> Welcome, apocalypse, what’s taken you so long?
> Bring us the fitting end that we’ve been counting on
>- A23, Welcome, Apocalypse
>


Re: [Discuss] Mapreduce based major compactions to minimise compactions overhead in HBase cluster

2023-03-01 Thread Andrew Purtell
Hi Rajeshbabu,

You have proposed a solution without describing the problem. Please do that
first.

That said, compaction is fundamental to HBase operation and should have no
external dependency on a particular compute framework. Especially
MapReduce, which is out of favor and deprecated in many places. If this is
an optional feature it could be fine. So perhaps you could also explain how
you see this potential feature fitting into the long term roadmap for the
project.



On Wed, Mar 1, 2023 at 3:54 PM rajeshb...@apache.org <
chrajeshbab...@gmail.com> wrote:

> Hi Team,
>
> I would like to discuss a new compactor implementation that runs major
> compactions through a MapReduce job (MapReduce being a good fit for
> merge-sort applications).
>
> I have a high-level plan and would like to check with you, before
> proceeding to a detailed design and implementation, about any challenges
> or similar solutions you are aware of.
>
> High level plan:
>
> We should have a new compactor implementation which creates the MapReduce
> job for running the major compaction and waits in a thread for the job to
> complete. The MapReduce job implementation is as follows:
> 1) Since we need to read through all the files in a column family for a
> major compaction, we can pass the column family directory to the MapReduce
> job. File filters may be required to exclude newly created HFiles.
> 2) We can identify the partitions (input splits) from HFile boundaries and
> utilise the existing HFileInputFormat to scan each HFile partition, so
> that each mapper sorts the data within its partition range.
> 3) If possible we can use a combiner to remove old versions and deleted
> cells.
> 4) We can use HFileOutputFormat2 to create a new HFile in a tmp directory
> and write cells to it in the reducer, reading the sorted data from the
> mappers.
>
> Once the HFile is created in the tmp directory and the MapReduce job has
> completed, we can move the compacted file to the column family location,
> move the old files out and refresh the HFiles, the same as the default
> implementation.
>
> There are tradeoffs with this solution: intermediate copies of the data
> are required while running the MapReduce job, even though the HFiles
> already contain sorted data.
>
> Thanks,
> Rajeshbabu.
>


-- 
Best regards,
Andrew

Unrest, ignorance distilled, nihilistic imbeciles -
It's what we’ve earned
Welcome, apocalypse, what’s taken you so long?
Bring us the fitting end that we’ve been counting on
   - A23, Welcome, Apocalypse


[Discuss] Mapreduce based major compactions to minimise compactions overhead in HBase cluster

2023-03-01 Thread rajeshb...@apache.org
Hi Team,

I would like to discuss a new compactor implementation that runs major
compactions through a MapReduce job (MapReduce being a good fit for
merge-sort applications).

I have a high-level plan and would like to check with you, before proceeding
to a detailed design and implementation, about any challenges or similar
solutions you are aware of.

High level plan:

We should have a new compactor implementation which creates the MapReduce job
for running the major compaction and waits in a thread for the job to
complete. The MapReduce job implementation is as follows:
1) Since we need to read through all the files in a column family for a major
compaction, we can pass the column family directory to the MapReduce job.
File filters may be required to exclude newly created HFiles.
2) We can identify the partitions (input splits) from HFile boundaries and
utilise the existing HFileInputFormat to scan each HFile partition, so that
each mapper sorts the data within its partition range.
3) If possible we can use a combiner to remove old versions and deleted
cells.
4) We can use HFileOutputFormat2 to create a new HFile in a tmp directory and
write cells to it in the reducer, reading the sorted data from the mappers.

Once the HFile is created in the tmp directory and the MapReduce job has
completed, we can move the compacted file to the column family location, move
the old files out and refresh the HFiles, the same as the default
implementation.

There are tradeoffs with this solution: intermediate copies of the data are
required while running the MapReduce job, even though the HFiles already
contain sorted data.
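
To make the plan concrete, here is a rough, hypothetical driver sketch.
HFileInputFormat and HFileOutputFormat2 are the real hbase-mapreduce classes
referred to above (HFileInputFormat is @InterfaceAudience.Private today, so
using it is an assumption); the driver, mapper and reducer are illustrative
stubs, and cell serialization wiring is omitted.

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileInputFormat;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/** Sketch only -- not a working compactor. */
public class MapReduceMajorCompaction {

  /** (2) Re-key each cell by row so the shuffle merge-sorts cells from all
   *  input HFiles. */
  public static class CompactionMapper
      extends Mapper<NullWritable, Cell, ImmutableBytesWritable, Cell> {
    @Override
    protected void map(NullWritable key, Cell cell, Context context)
        throws IOException, InterruptedException {
      context.write(new ImmutableBytesWritable(CellUtil.cloneRow(cell)), cell);
    }
  }

  /** (3)/(4) A real reducer would drop old versions and deleted cells;
   *  this stub just forwards everything to the output format. */
  public static class CompactionReducer
      extends Reducer<ImmutableBytesWritable, Cell, ImmutableBytesWritable, Cell> {
    @Override
    protected void reduce(ImmutableBytesWritable row, Iterable<Cell> cells,
        Context context) throws IOException, InterruptedException {
      for (Cell cell : cells) {
        context.write(row, cell);
      }
    }
  }

  public static Job createJob(Configuration conf, Path familyDir, Path tmpOut)
      throws IOException {
    Job job = Job.getInstance(HBaseConfiguration.create(conf), "mr-major-compaction");
    job.setJarByClass(MapReduceMajorCompaction.class);
    // (1) All HFiles under the column family directory are input; a path
    // filter would exclude files created after the compaction started.
    FileInputFormat.addInputPath(job, familyDir);
    job.setInputFormatClass(HFileInputFormat.class);
    job.setMapperClass(CompactionMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Cell.class);
    // Cell serialization (CellSerialization) setup is elided for brevity.
    job.setReducerClass(CompactionReducer.class);
    // (4) Write the merged cells as a new HFile into a tmp directory; the
    // compactor then moves it into the store and archives the old files.
    job.setOutputFormatClass(HFileOutputFormat2.class);
    FileOutputFormat.setOutputPath(job, tmpOut);
    return job;
  }
}
{code}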

Thanks,
Rajeshbabu.


[jira] [Created] (HBASE-27681) Refactor Table Latency Metrics

2023-03-01 Thread tianhang tang (Jira)
tianhang tang created HBASE-27681:
-

 Summary: Refactor Table Latency Metrics
 Key: HBASE-27681
 URL: https://issues.apache.org/jira/browse/HBASE-27681
 Project: HBase
  Issue Type: Improvement
Reporter: tianhang tang
Assignee: tianhang tang
 Attachments: image-2023-03-01-23-55-14-095.png, 
image-2023-03-01-23-56-16-819.png

Benefit:
# Table latency metrics can be removed after the table has moved away. Fixes 
[HBASE-27617|https://issues.apache.org/jira/browse/HBASE-27617]
# Removes the hash lookup in the metrics map from the hot request path.
# Reduces the output in JMX. If we use jmx_exporter to collect metrics into 
Prometheus, the overhead of regex matching on metric names is relatively high 
(especially for region metrics, which might be the next step).
# This could be the first step after hbase-metrics is released; the roadmap 
indicates that we should replace hadoop-metrics2 with hbase-metrics.


Influence:
# The metrics structure in JMX will change.

Old:  !image-2023-03-01-23-55-14-095.png! 
New:  !image-2023-03-01-23-56-16-819.png! 
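
As a hedged sketch of the hot-path benefit (illustrative classes, not the 
actual hbase-metrics API): resolve the per-table recorder once, record 
through the reference on the request path, and drop it when the table moves 
away.

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative only -- not the actual hbase-metrics classes.
class TableLatencyRecorder {
  final LongAdder getCount = new LongAdder();
  // ... per-operation histograms would live here ...
}

class TableMetricsRegistry {
  private final ConcurrentMap<String, TableLatencyRecorder> recorders =
    new ConcurrentHashMap<>();

  /** Resolve once (e.g. at region open); the hot request path then records
   *  through the returned reference, with no per-request map lookup. */
  TableLatencyRecorder forTable(String tableName) {
    return recorders.computeIfAbsent(tableName, t -> new TableLatencyRecorder());
  }

  /** Drop a table's metrics after its last region moves away, so stale
   *  tables stop showing up in JMX (the HBASE-27617 problem). */
  void removeTable(String tableName) {
    recorders.remove(tableName);
  }
}
{code}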



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-26491) I can't read/write a Hbase table by spark-hbase connector when the table is in non-default namespace

2023-03-01 Thread Istvan Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Istvan Toth resolved HBASE-26491.
-
Resolution: Duplicate

> I can't read/write a Hbase table by spark-hbase connector when the table is  
> in non-default namespace
> -
>
> Key: HBASE-26491
> URL: https://issues.apache.org/jira/browse/HBASE-26491
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors
>Affects Versions: 1.0.0
>Reporter: mengdou
>Priority: Minor
> Attachments: image-2021-11-26-17-32-53-507.png, 
> image-2021-12-01-20-52-11-664.png, image-2021-12-01-20-53-05-405.png
>
>
> I found that I can't read/write an HBase table via the spark-hbase connector 
> when the table is in a non-default namespace.
>  
> When Spark opens a table (backed by an HBase table), it first creates an 
> HBaseRelation instance and initializes an HBaseTableCatalog from the table 
> definition saved in the Spark catalog. But in the function 'convert' the 
> field 'tableCatalog' is constructed from a string template in which the 
> namespace is hard-coded to 'default', leading to the wrong namespace. This is 
> not the namespace the user specified when creating the table.
>  
> Please have a look:
> !image-2021-11-26-17-32-53-507.png!
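
For context, the catalog JSON that HBaseTableCatalog parses carries an 
explicit namespace field; a definition like the following (names are 
illustrative) is what should take effect instead of the hard-coded 'default':

{code}
{
  "table": { "namespace": "my_ns", "name": "my_table" },
  "rowkey": "key",
  "columns": {
    "key": { "cf": "rowkey", "col": "key", "type": "string" },
    "col1": { "cf": "cf1", "col": "col1", "type": "string" }
  }
}
{code}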



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HBASE-23983) Spotbugs warning complain on master build

2023-03-01 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-23983:
--

Reopening for backport.

> Spotbugs warning complain on master build
> -
>
> Key: HBASE-23983
> URL: https://issues.apache.org/jira/browse/HBASE-23983
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1
>Reporter: Jan Hentschel
>Assignee: Jan Hentschel
>Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Spotbugs currently complains on every master build with the following extant 
> warning:
> {code}
> Return value of putIfAbsent is ignored, but node is reused in 
> org.apache.hadoop.hbase.master.assignment.RegionStates.createRegionStateNode(RegionInfo)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27680) Bump hbase to 2.4.16, hadoop to 3.1.2 and spark to 3.2.3 for hbase-connectors

2023-03-01 Thread Nihal Jain (Jira)
Nihal Jain created HBASE-27680:
--

 Summary: Bump hbase to 2.4.16, hadoop to 3.1.2 and spark to 3.2.3 
for hbase-connectors
 Key: HBASE-27680
 URL: https://issues.apache.org/jira/browse/HBASE-27680
 Project: HBase
  Issue Type: Task
  Components: hbase-connectors
Reporter: Nihal Jain
Assignee: Nihal Jain






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-23102) Improper Usage of Map putIfAbsent

2023-03-01 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-23102.
--
Resolution: Fixed

Backported to 2.6 and 2.5.

> Improper Usage of Map putIfAbsent
> -
>
> Key: HBASE-23102
> URL: https://issues.apache.org/jira/browse/HBASE-23102
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
>  Labels: newbie, noob
> Fix For: 2.6.0, 2.5.4, 3.0.0-alpha-1
>
> Attachments: HBASE-23102.1.patch
>
>
> When using {{Map#putIfAbsent}}, the argument should not be a {{new}} object.  
> Otherwise, if the item is present, the object that was instantiated is 
> immediately thrown away.  Instead, use {{Map#computeIfAbsent}} so that the 
> object is only instantiated if it is needed.
> There exists a good example in the {{Map}} JavaDoc:
> https://docs.oracle.com/javase/8/docs/api/java/util/Map.html#computeIfAbsent-K-java.util.function.Function-
> 
> h2. Locations
> https://github.com/apache/hbase/blob/9370347efea5b09e2fa8f4e5d82fa32491e1181b/hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/QuotaCache.java#L227-L236
> https://github.com/apache/hbase/blob/025ddce868eb06b4072b5152c5ffae5a01e7ae30/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/throttle/StoreHotnessProtector.java#L124-L129
> https://github.com/apache/hbase/blob/1170f28122d9d36e511ba504a5263ec62e11ef6a/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java#L555
> https://github.com/apache/hbase/blob/4ca760fe9dd373b8d8a4c48db15e42424920653c/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java#L584-L586
> https://github.com/apache/hbase/blob/4ca760fe9dd373b8d8a4c48db15e42424920653c/hbase-rsgroup/src/main/java/org/apache/hadoop/hbase/rsgroup/RSGroupAdminServer.java#L585
> https://github.com/apache/hbase/blob/5b01e613fbbb92e243e99a1d199b4ffbb21ed2d9/hbase-procedure/src/main/java/org/apache/hadoop/hbase/procedure2/ProcedureExecutor.java#L834
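
A minimal illustration of the anti-pattern and the fix (ExpensiveValue is a 
hypothetical stand-in for the values built at the call sites above):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PutIfAbsentExample {
  static class ExpensiveValue {
    ExpensiveValue(String key) { /* costly construction elided */ }
  }

  private final Map<String, ExpensiveValue> cache = new ConcurrentHashMap<>();

  ExpensiveValue getBad(String key) {
    // Anti-pattern: a new ExpensiveValue is allocated on every call and is
    // immediately thrown away whenever the key is already present.
    cache.putIfAbsent(key, new ExpensiveValue(key));
    return cache.get(key);
  }

  ExpensiveValue getGood(String key) {
    // computeIfAbsent constructs the value only when the key is absent and
    // always returns the mapping that is actually in the map.
    return cache.computeIfAbsent(key, ExpensiveValue::new);
  }
}
{code}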



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-27673) Fix mTLS client hostname verification

2023-03-01 Thread Balazs Meszaros (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Meszaros resolved HBASE-27673.
-
Fix Version/s: 2.6.0
   3.0.0-alpha-4
   Resolution: Fixed

> Fix mTLS client hostname verification
> -
>
> Key: HBASE-27673
> URL: https://issues.apache.org/jira/browse/HBASE-27673
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 3.0.0-alpha-3
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Fix For: 2.6.0, 3.0.0-alpha-4
>
>
> The exception that I get:
> {noformat}
> 23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify host address: 
> 127.0.0.1
> javax.net.ssl.SSLPeerUnverifiedException: Certificate for <127.0.0.1> doesn't 
> match any of the subject alternative names: [***]
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchIPAddress(HBaseHostnameVerifier.java:144)
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:117)
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:143)
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97)
>   ...
> 23/02/22 15:18:06 ERROR tls.HBaseTrustManager: Failed to verify hostname: 
> localhost
> javax.net.ssl.SSLPeerUnverifiedException: Certificate for <localhost> doesn't 
> match any of the subject alternative names: [***]
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.matchDNSName(HBaseHostnameVerifier.java:159)
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseHostnameVerifier.verify(HBaseHostnameVerifier.java:119)
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.performHostVerification(HBaseTrustManager.java:171)
>   at 
> org.apache.hadoop.hbase.io.crypto.tls.HBaseTrustManager.checkClientTrusted(HBaseTrustManager.java:97)
>   ...
> 23/02/22 15:18:06 WARN ipc.NettyRpcServer: Connection /100.100.124.2:47109; 
> caught unexpected downstream exception.
> org.apache.hbase.thirdparty.io.netty.handler.codec.DecoderException: 
> javax.net.ssl.SSLHandshakeException: Failed to verify both host address and 
> host name
>   at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499)
>   at 
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: javax.net.ssl.SSLHandshakeException: Failed to verify both host 
> address and host name
>   at sun.security.ssl.Alert.createSSLException(Alert.java:131)
>   at sun.security.ssl.TransportContext.fatal(TransportContext.java:324)
>   at sun.security.ssl.TransportContext.fatal(TransportContext.java:267)
>   at sun.security.ssl.TransportContext.fatal(TransportContext.java:262)
>   at 
> 

[jira] [Resolved] (HBASE-27639) Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark 3.2.3

2023-03-01 Thread Peter Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Somogyi resolved HBASE-27639.
---
Fix Version/s: hbase-connectors-1.1.0
   Resolution: Fixed

Merged to master. Thanks for the patch [~nihaljain.cs].

> Support hbase-connectors compilation with HBase 2.5.3, Hadoop 3.2.4 and Spark 
> 3.2.3
> ---
>
> Key: HBASE-27639
> URL: https://issues.apache.org/jira/browse/HBASE-27639
> Project: HBase
>  Issue Type: Improvement
>  Components: hbase-connectors, spark
>Reporter: Nihal Jain
>Assignee: Nihal Jain
>Priority: Major
> Fix For: hbase-connectors-1.1.0
>
>
> The goal is to allow hbase-connectors to compile with:
>  * HBase: 2.5.3
>  * Hadoop: 3.2.4 and
>  * Spark: 3.2.3
> We could also discuss whether we want to bump these versions in the pom 
> itself, or just let the Spark connector compile with the above components, 
> as the JIRA title says.
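
For reference, a build invocation along these lines should exercise that 
combination, assuming the connector pom continues to expose the hbase.version, 
hadoop-three.version, spark.version and scala.version properties (property 
names follow the hbase-connectors README; treat the exact flags as an 
assumption):

{code}
mvn clean install -DskipTests \
  -Dhbase.version=2.5.3 \
  -Dhadoop-three.version=3.2.4 \
  -Dspark.version=3.2.3 \
  -Dscala.version=2.12.15 -Dscala.binary.version=2.12
{code}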



--
This message was sent by Atlassian Jira
(v8.20.10#820010)