Tanuj Khurana created PHOENIX-7727:
--------------------------------------

             Summary: Eliminate IndexMetadataCache rpcs and use server side 
cqsi PTable cache for index maintainer metadata
                 Key: PHOENIX-7727
                 URL: https://issues.apache.org/jira/browse/PHOENIX-7727
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Tanuj Khurana
            Assignee: Tanuj Khurana


The ServerCachingEndpointImpl coproc implements the server cache RPC protocol. One 
use case of these cache RPCs is server-side index updates. Whenever the client 
commits a batch of mutations, if the mutation count is greater than 
_*phoenix.index.mutableBatchSizeThreshold*_ (default value 3), then instead of 
sending the index maintainer metadata as an attribute on every mutation, the 
client uses the server cache RPCs to populate the server cache on the region 
servers and sends just the cache key as a mutation attribute. This was done as 
an optimization to avoid sending duplicate index maintainer metadata on every 
mutation of the batch. Batches of 100 - 1000 mutations are typical, so the 
optimization is useful.
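For concreteness, here is a rough sketch of that client-side branching. The 
attribute keys and the sendServerCacheRpcs helper below are hypothetical 
stand-ins, not the actual Phoenix client internals:
{code:java}
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Mutation;

public class IndexMetadataAnnotationSketch {
    // Hypothetical attribute keys; Phoenix defines its own constants for these.
    static final String INDEX_UUID_ATTRIB = "_IndexUUID";
    static final String INDEX_MD_ATTRIB = "_IndexMaintainers";

    void annotate(Configuration conf, List<Mutation> mutations, byte[] indexMetadata) {
        int threshold = conf.getInt("phoenix.index.mutableBatchSizeThreshold", 3);
        if (mutations.size() > threshold) {
            // Large batch: populate the cache on the target region servers
            // once, then reference the entry from every mutation by its key.
            byte[] cacheId = sendServerCacheRpcs(indexMetadata, mutations);
            for (Mutation m : mutations) {
                m.setAttribute(INDEX_UUID_ATTRIB, cacheId);
            }
        } else {
            // Small batch: inline the serialized metadata on each mutation.
            for (Mutation m : mutations) {
                m.setAttribute(INDEX_MD_ATTRIB, indexMetadata);
            }
        }
    }

    // Stand-in for the ServerCachingEndpointImpl cache-population RPCs.
    byte[] sendServerCacheRpcs(byte[] metadata, List<Mutation> mutations) {
        return new byte[16]; // cache key in the real protocol
    }
}
{code}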
However, this RPC approach has several downsides:
 # In order to determine which region servers to send the cache RPCs to, the 
client first builds a scan ranges object from the primary keys in the mutations. 
The size of this object grows linearly with the commit size, and since it is 
rebuilt on every commit batch it adds GC overhead.
 # The client then calls _*getAllTableRegions*_, which can make calls to meta if 
the table region locations are not cached in the hbase client meta cache, adding 
latency on the client side. Once it receives the region list, it intersects the 
region boundaries with the scan ranges it constructed to determine which region 
servers host the regions that will receive the mutations (see the sketch after 
this list).
 # The cache RPCs are then executed in parallel, but they are subject to the 
standard hbase client retry policies and can be retried on timeouts or regions 
in transition (RITs), potentially adding more latency overhead.
 # Furthermore, when the server processes these mutations in the 
IndexRegionObserver coproc and tries to fetch the index maintainer metadata 
from the cache, it is not guaranteed to find the cache entry. This happens when 
the region moves or splits after the cache RPC is sent but before the data 
table mutations arrive. It also happens when the server is overloaded and RPCs 
queue up long enough that the cache entry expires (default TTL 30s) before the 
batch RPC is processed. If the metadata is not found, a DoNotRetryIOException 
is returned to the client, which is handled within the Phoenix MutationState 
class: the client retries, repeating the entire sequence above. Worse, when the 
Phoenix client receives this error it first calls _*clearTableRegionCache*_ 
before retrying, so the region locations have to be fetched from meta all over 
again.
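To make steps 1-3 concrete, here is a rough sketch of the per-commit fan-out, 
written against the HBase 2.x client API. The class is illustrative; Phoenix 
performs the membership test with a ScanRanges intersection rather than the 
naive loop below:
{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.RegionLocator;

public class CacheRpcFanOutSketch {
    Set<ServerName> targetServers(RegionLocator locator, List<Mutation> mutations)
            throws IOException {
        Set<ServerName> servers = new HashSet<>();
        // Step 2: may go to hbase:meta when the client meta cache is cold.
        for (HRegionLocation loc : locator.getAllRegionLocations()) {
            for (Mutation m : mutations) {
                // Step 1: test whether this region will receive any of the
                // batch's row keys (Phoenix builds ScanRanges for this).
                if (loc.getRegion().containsRow(m.getRow())) {
                    servers.add(loc.getServerName());
                    break;
                }
            }
        }
        // Step 3: the caller then sends one cache-population RPC per server
        // in parallel, each subject to standard hbase retry policies.
        return servers;
    }
}
{code}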

Sample error logs that we have seen in production:
{code:java}
2025-10-20 07:38:21,800 INFO  [t.FPRWQ.Fifo.write.handler=120,queue=20,port=60020] util.IndexManagementUtil - Rethrowing 
org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find cached index metadata.  key=4619765145502425070 region=FOO.TEST1,00D1H000000N1TASDER,1708858336233.1ae49454ee9993697a7cc9e34c899b25.host=server.net,60020,1757812136389
 Index update failed
    at org.apache.phoenix.util.ClientUtil.createIOException(ClientUtil.java:166)
    at org.apache.phoenix.util.ClientUtil.throwIOException(ClientUtil.java:182)
    at org.apache.phoenix.index.PhoenixIndexMetaDataBuilder.getIndexMetaDataCache(PhoenixIndexMetaDataBuilder.java:101)
    at org.apache.phoenix.index.PhoenixIndexMetaDataBuilder.getIndexMetaData(PhoenixIndexMetaDataBuilder.java:51)
    at org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:92)
    at org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:69)
    at org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexMetaData(IndexBuildManager.java:85)
    at org.apache.phoenix.hbase.index.IndexRegionObserver.getPhoenixIndexMetaData(IndexRegionObserver.java:1090)
    at org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutateWithExceptions(IndexRegionObserver.java:1214)
    at org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutate(IndexRegionObserver.java:514)
    at
{code}
There is a better solution which addresses most of the above problems. 
Previously, the IndexRegionObserver coproc did not have the logical name of the 
table when processing a batch of mutations, so it could not tell whether the 
entity being upserted into is a table or a view. Because of this, the server 
could not determine whether the entity has an index, and it relied on the 
client to say so by annotating the mutations. But 
[PHOENIX-5521|https://issues.apache.org/jira/browse/PHOENIX-5521] started 
annotating each mutation with enough metadata for the server to 
deterministically figure out which Phoenix schema object the mutation targets. 
With this information the server can simply call _*getTable()*_ and rely on the 
server-side cqsi PTable cache; the UPDATE_CACHE_FREQUENCY set on the table then 
controls the schema freshness.
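A rough sketch of that server-side lookup is below. The mutation attribute key 
is hypothetical (PHOENIX-5521 defines the actual annotation names), and the 
PhoenixConnection is assumed to be a server-side connection backed by the 
region server's cqsi PTable cache:
{code:java}
import java.sql.SQLException;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.jdbc.PhoenixConnection;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.util.PhoenixRuntime;

public class ServerSideLookupSketch {
    PTable resolveTargetTable(PhoenixConnection serverConn, Mutation m)
            throws SQLException {
        byte[] name = m.getAttribute("_LogicalTableName"); // hypothetical key
        // Served from the cqsi PTable cache; UPDATE_CACHE_FREQUENCY bounds
        // how stale the returned schema may be.
        PTable table = PhoenixRuntime.getTable(serverConn, Bytes.toString(name));
        // table.getIndexes() then yields the index PTables from which the
        // IndexMaintainers can be built server side, with no cache RPCs.
        return table;
    }
}
{code}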

This will greatly simplify the implementation and should also improve batch 
write times on tables with indexes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
