[ 
https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615255#comment-16615255
 ] 

Nihal Jain commented on HBASE-21196:
------------------------------------

Submitted [^HBASE-21196.master.002.patch]. Please review..

> HTableMultiplexer clears the meta cache after every put operation
> -----------------------------------------------------------------
>
>                 Key: HBASE-21196
>                 URL: https://issues.apache.org/jira/browse/HBASE-21196
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 3.0.0, 1.3.3, 2.2.0
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HBASE-21196.master.001.patch, 
> HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, 
> HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations which use 
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, 
> MultiResponse, int)}} API with tablename set to null reset the meta cache of 
> the corresponding server after each call. One such operation is put operation 
> of HTableMultiplexer (Might not be the only one). This may impact the 
> performance of the system severely as all new ops directed to that server 
> will have to go to zk first to get the meta table address and then get the 
> location of the table region as it will become empty after every 
> htablemultiplexer put.
> From the logs below, one can see after every other put the cached region 
> locations are cleared. As a side effect of this, before every put the server 
> needs to contact zk and get meta table location and read meta to get region 
> locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): 
> Removed all cached region locations that map to 
> root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] 
> client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for 
> root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] 
> ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" 
> request_param: true priority: 0 timeout: 60000 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 
> 137 connection: 127.0.0.1:42338 param: region= 
> testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., 
> row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { 
> associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 
> totalTime: 0
> 2018-09-13 22:21:15,516 TRACE 
> [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] 
> io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, 
> count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, 
> callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan 
> table=hbase:meta, 
> startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): 
> Advancing internal small scanner to startKey at 
> 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up 
> meta region location in ZK, 
> connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see 
> that the string "Removed all cached region locations that map" and "Looking 
> up meta region location in ZK" are present for every put.
> *Analysis:*
>  The problem occurs as we call the {{cleanServerCache}} method always clears 
> the server cache in case tablename is null and exception is null. See 
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
>     if (tableName == null && 
> ClientExceptionsUtil.isMetaClearingException(regionException)) {
>       // For multi-actions, we don't have a table name, but we want to make 
> sure to clear the
>       // cache in case there were location-related exceptions. We don't to 
> clear the cache
>       // for every possible exception that comes through, however.
>       asyncProcess.connection.clearCaches(server);
>     }
>   }
> {code}
> The problem is  
> [ClientExceptionsUtil.isMetaClearingException(regionException))|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L51]
>  assumes that the caller should take care of null exception check before 
> calling the method i.e. it will return true if the passed exception is null, 
> which may not be a correct assumption.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to