[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615255#comment-16615255 ]
Nihal Jain commented on HBASE-21196:
------------------------------------

Submitted [^HBASE-21196.master.002.patch]. Please review.

> HTableMultiplexer clears the meta cache after every put operation
> -----------------------------------------------------------------
>
>                 Key: HBASE-21196
>                 URL: https://issues.apache.org/jira/browse/HBASE-21196
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 3.0.0, 1.3.3, 2.2.0
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Critical
>             Fix For: 3.0.0
>
>         Attachments: HBASE-21196.master.001.patch, HBASE-21196.master.001.patch, HBASE-21196.master.002.patch, HTableMultiplexer1000Puts.UT.txt
>
>
> *Problem:* Operations that call
> {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, MultiResponse, int)}}
> with the table name set to null reset the meta cache of the corresponding
> server after each call. One such operation is the put operation of
> HTableMultiplexer (it might not be the only one). This may severely degrade
> performance: because the cache is emptied after every HTableMultiplexer put,
> all new operations directed to that server must first go to ZooKeeper to get
> the meta table address and then read meta to get the location of the table
> region.
> From the logs below, one can see that the cached region locations are
> cleared after every put. As a side effect, before every put the client
> needs to contact ZooKeeper to get the meta table location and then read
> meta to get the region locations of the table.
> {noformat}
> 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): Removed all cached region locations that map to root1-thinkpad-t440p,35811,1536857446588
> 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5
> 2018-09-13 22:21:15,515 TRACE [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" request_param: true priority: 0 timeout: 60000 totalRequestSize: 137 bytes
> 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 executing as root1
> 2018-09-13 22:21:15,515 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: 137 connection: 127.0.0.1:42338 param: region= testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 totalTime: 0
> 2018-09-13 22:21:15,516 TRACE [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, count=0, allocations=1
> 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, callTime: 2ms
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan table=hbase:meta, startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999
> 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): Advancing internal small scanner to startKey at 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999'
> 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up meta region location in ZK, connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f
> {noformat}
> From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see
> that the strings "Removed all cached region locations that map" and
> "Looking up meta region location in ZK" appear for every put.
> *Analysis:*
> The problem occurs because the {{cleanServerCache}} method always clears
> the server cache when the table name is null and the exception is null. See
> [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918]
> {code:java}
> private void cleanServerCache(ServerName server, Throwable regionException) {
>   if (tableName == null && ClientExceptionsUtil.isMetaClearingException(regionException)) {
>     // For multi-actions, we don't have a table name, but we want to make sure to clear the
>     // cache in case there were location-related exceptions. We don't to clear the cache
>     // for every possible exception that comes through, however.
>     asyncProcess.connection.clearCaches(server);
>   }
> }
> {code}
> The root cause is that
> [ClientExceptionsUtil.isMetaClearingException(regionException)|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L51]
> assumes the caller has already checked for a null exception before calling
> it: it returns true when the passed exception is null, which may not be a
> correct assumption.
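
To make the analysis concrete, here is a minimal, self-contained sketch of the described behaviour. It is a simplified model, not HBase code: {{MetaCacheSketch}}, {{LocationException}}, {{isMetaClearing}} and both {{cleanServerCache*}} variants are hypothetical names introduced only for illustration, and the null guard is just one possible way to avoid the clear. The attached patch may take a different approach.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified, hypothetical model of the behaviour described in this issue.
 * None of these names are real HBase classes or methods.
 */
public class MetaCacheSketch {

  /** Stand-in for a location-related failure (assumption, not an HBase type). */
  static class LocationException extends RuntimeException {}

  /** Mirrors the reported behaviour: a null exception is treated as meta-clearing. */
  static boolean isMetaClearing(Throwable t) {
    if (t == null) {
      return true;
    }
    return t instanceof LocationException;
  }

  // region name -> server name
  private final Map<String, String> regionLocations = new HashMap<>();
  int cacheClears = 0;

  void cacheLocation(String region, String server) {
    regionLocations.put(region, server);
  }

  int cachedRegions() {
    return regionLocations.size();
  }

  /** Drops every cached region that maps to the given server. */
  void clearCaches(String server) {
    regionLocations.values().removeIf(server::equals);
    cacheClears++;
  }

  /** Unguarded: clears the cache even when the call succeeded (error == null). */
  void cleanServerCacheUnguarded(String tableName, String server, Throwable error) {
    if (tableName == null && isMetaClearing(error)) {
      clearCaches(server);
    }
  }

  /** Guarded: only a non-null, location-related exception clears the cache. */
  void cleanServerCacheGuarded(String tableName, String server, Throwable error) {
    if (tableName == null && error != null && isMetaClearing(error)) {
      clearCaches(server);
    }
  }

  public static void main(String[] args) {
    MetaCacheSketch unguarded = new MetaCacheSketch();
    MetaCacheSketch guarded = new MetaCacheSketch();
    // Three successful "puts" with no table name and no exception.
    for (int put = 0; put < 3; put++) {
      unguarded.cacheLocation("region-1", "server-a");
      guarded.cacheLocation("region-1", "server-a");
      unguarded.cleanServerCacheUnguarded(null, "server-a", null);
      guarded.cleanServerCacheGuarded(null, "server-a", null);
    }
    System.out.println("unguarded clears: " + unguarded.cacheClears);
    System.out.println("guarded clears:   " + guarded.cacheClears);
  }
}
```

In this model the unguarded variant clears the cache once per successful put, while the guarded variant leaves it intact unless a location-related exception is actually passed in.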