[ https://issues.apache.org/jira/browse/HBASE-21196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615180#comment-16615180 ]
Nihal Jain commented on HBASE-21196: ------------------------------------ bq. Please fix the assertion ('it is0'). Sure will remove the above string from the message as the actual value is anyways logged in case of failure bq. Since the other two tests don't fail without fix, I think you may take them out. I added the other 2 tests deliberately for greater coverage. I think it's good to have tests which can ensure we don't break anything like this in future. Should I keep them? > HTableMultiplexer clears the meta cache after every put operation > ----------------------------------------------------------------- > > Key: HBASE-21196 > URL: https://issues.apache.org/jira/browse/HBASE-21196 > Project: HBase > Issue Type: Bug > Components: Performance > Affects Versions: 3.0.0, 1.3.3, 2.2.0 > Reporter: Nihal Jain > Assignee: Nihal Jain > Priority: Critical > Fix For: 3.0.0 > > Attachments: HBASE-21196.master.001.patch, > HBASE-21196.master.001.patch, HTableMultiplexer1000Puts.UT.txt > > > *Problem:* Operations which use > {{AsyncRequestFutureImpl.receiveMultiAction(MultiAction, ServerName, > MultiResponse, int)}} API with tablename set to null reset the meta cache of > the corresponding server after each call. One such operation is put operation > of HTableMultiplexer (Might not be the only one). This may impact the > performance of the system severely as all new ops directed to that server > will have to go to zk first to get the meta table address and then get the > location of the table region as it will become empty after every > htablemultiplexer put. > From the logs below, one can see after every other put the cached region > locations are cleared. As a side effect of this, before every put the server > needs to contact zk and get meta table location and read meta to get region > locations of the table. > {noformat} > 2018-09-13 22:21:15,467 TRACE [htable-pool11-t1] client.MetaCache(283): > Removed all cached region locations that map to > root1-thinkpad-t440p,35811,1536857446588 > 2018-09-13 22:21:15,467 DEBUG [HTableFlushWorker-5] > client.HTableMultiplexer$FlushWorker(632): Processed 1 put requests for > root1-ThinkPad-T440p:35811 and 0 failed, latency for this send: 5 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.reader=1,bindAddress=root1-ThinkPad-T440p,port=35811] > ipc.RpcServer$Connection(1954): RequestHeader call_id: 218 method_name: "Get" > request_param: true priority: 0 timeout: 60000 totalRequestSize: 137 bytes > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.CallRunner(105): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 executing as root1 > 2018-09-13 22:21:15,515 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > ipc.RpcServer(2356): callId: 218 service: ClientService methodName: Get size: > 137 connection: 127.0.0.1:42338 param: region= > testHTableMultiplexer_1,,1536857451720.304d914b641a738624937c7f9b4d684f., > row=\x00\x00\x00\xC4 connection: 127.0.0.1:42338, response result { > associated_cell_count: 1 stale: false } queueTime: 0 processingTime: 0 > totalTime: 0 > 2018-09-13 22:21:15,516 TRACE > [RpcServer.FifoWFPBQ.default.handler=3,queue=0,port=35811] > io.BoundedByteBufferPool(106): runningAverage=16384, totalCapacity=0, > count=0, allocations=1 > 2018-09-13 22:21:15,516 TRACE [main] ipc.AbstractRpcClient(236): Call: Get, > callTime: 2ms > 2018-09-13 22:21:15,516 TRACE [main] client.ClientScanner(122): Scan > table=hbase:meta, > startRow=testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999 > 2018-09-13 22:21:15,516 TRACE [main] client.ClientSmallReversedScanner(179): > Advancing internal small scanner to startKey at > 'testHTableMultiplexer_1,\x00\x00\x00\xC5,99999999999999' > 2018-09-13 22:21:15,517 TRACE [main] client.ZooKeeperRegistry(59): Looking up > meta region location in ZK, > connection=org.apache.hadoop.hbase.client.ZooKeeperRegistry@599f571f > {noformat} > From the minicluster logs [^HTableMultiplexer1000Puts.UT.txt] one can see > that the string "Removed all cached region locations that map" and "Looking > up meta region location in ZK" are present for every put. > *Analysis:* > The problem occurs as we call the {{cleanServerCache}} method always clears > the server cache in case tablename is null and exception is null. See > [AsyncRequestFutureImpl.java#L918|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncRequestFutureImpl.java#L918] > {code:java} > private void cleanServerCache(ServerName server, Throwable regionException) { > if (tableName == null && > ClientExceptionsUtil.isMetaClearingException(regionException)) { > // For multi-actions, we don't have a table name, but we want to make > sure to clear the > // cache in case there were location-related exceptions. We don't to > clear the cache > // for every possible exception that comes through, however. > asyncProcess.connection.clearCaches(server); > } > } > {code} > The problem isĀ > [ClientExceptionsUtil.isMetaClearingException(regionException))|https://github.com/apache/hbase/blob/5d14c1af65c02f4e87059337c35e4431505de91c/hbase-client/src/main/java/org/apache/hadoop/hbase/exceptions/ClientExceptionsUtil.java#L51] > assumes that the caller should take care of null exception check before > calling the method i.e. it will return true if the passed exception is null, > which may not be a correct assumption. -- This message was sent by Atlassian JIRA (v7.6.3#76005)