[ 
https://issues.apache.org/jira/browse/PHOENIX-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanuj Khurana updated PHOENIX-7727:
-----------------------------------
    Description: 
ServerCachingEndpointImpl coproc implements the server cache RPC protocol. One 
use case of the cache RPCs is server-side index updates. Whenever the client 
commits a batch of mutations, if the mutation count is greater than 
_*phoenix.index.mutableBatchSizeThreshold*_ (default value 3), then instead of 
sending the index maintainer metadata as a mutation attribute, the client uses 
the server cache RPCs to populate the server cache on the region servers and 
only sends the cache key as a mutation attribute (see the sketch after the list 
below). This was done as an optimization to avoid sending duplicate index 
maintainer information on every mutation of the batch. Batches of 100 - 1000 
mutations are typical, so the optimization is useful, but this RPC approach has 
several downsides:
 # In order to determine which region servers to send the cache RPCs to, we 
first create the scan ranges object from the primary keys in the mutations. The 
size of the scan ranges object is the same as the commit batch size, which can 
add GC overhead since we do this on every commit batch.
 # Then the client calls _*getAllTableRegions*_, which can make calls to meta if 
the table region locations are not cached in the HBase client meta cache. This 
adds additional latency on the client side. Once it receives the region list, 
it intersects the region boundaries with the scan ranges it constructed to 
determine the region servers that host the regions receiving the mutations.
 # Then the actual RPCs are executed in parallel, but these caching RPCs are 
subject to the standard HBase client retry policies and can be retried in case 
of timeouts or RITs, potentially adding more latency overhead.
 # Furthermore, when the server processes these mutations in the 
IndexRegionObserver coproc and tries to fetch the index maintainer metadata 
from the cache, it is not guaranteed to find the cache entry. This happens when 
the region moves or splits after the cache RPC is sent but before the data 
table mutations arrive. It also happens when the server is overloaded and RPCs 
are queued, so that by the time the server processes the batch RPC the cache 
entry has expired (default TTL 30s). If the metadata is not found, a 
DoNotRetryIOException is returned to the client and handled within the Phoenix 
MutationState class. The Phoenix client then retries the whole sequence, and to 
make matters worse, when it receives this error it first calls 
_*clearTableRegionCache*_ before retrying.
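For illustration, here is a minimal sketch of the client-side decision described above, assuming a helper that already has the serialized index maintainers and the server cache key at hand. The class and attribute names are hypothetical, not the actual Phoenix code path:
{code:java}
import java.util.List;

import org.apache.hadoop.hbase.client.Mutation;

// Hypothetical sketch: illustrates the batch-size threshold decision described
// above; it is not the actual Phoenix client implementation.
public class IndexMetadataAttachSketch {

    public static void attachIndexMetadata(List<Mutation> batch,
                                           byte[] serializedIndexMaintainers,
                                           byte[] serverCacheKey,
                                           int mutableBatchSizeThreshold) {
        boolean useServerCache = batch.size() > mutableBatchSizeThreshold;
        // When the server cache is used, the client first builds scan ranges from
        // the mutation row keys, calls getAllTableRegions to locate the region
        // servers, and sends the cache RPCs to them (omitted in this sketch).
        for (Mutation m : batch) {
            if (useServerCache) {
                // Large batch: each mutation carries only the cache key
                // (attribute name is hypothetical).
                m.setAttribute("INDEX_METADATA_CACHE_KEY", serverCacheKey);
            } else {
                // Small batch: the serialized index maintainers are inlined on
                // every mutation (attribute name is hypothetical).
                m.setAttribute("INDEX_MAINTAINERS", serializedIndexMaintainers);
            }
        }
    }
}
{code}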

 

Sample error logs that we have seen in production:
{code:java}
2025-10-20 07:38:21,800 INFO  
[t.FPRWQ.Fifo.write.handler=120,queue=20,port=60020] util.IndexManagementUtil - 
Rethrowing 
org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
(INT10): Unable to find cached index metadata.  key=4619765145502425070 
region=FOO.TEST1,00D1H000000N1TASDER,1708858336233.1ae49454ee9993697a7cc9e34c899b25.host=server.net,60020,1757812136389
 Index update failed
    at org.apache.phoenix.util.ClientUtil.createIOException(ClientUtil.java:166)
    at org.apache.phoenix.util.ClientUtil.throwIOException(ClientUtil.java:182)
    at 
org.apache.phoenix.index.PhoenixIndexMetaDataBuilder.getIndexMetaDataCache(PhoenixIndexMetaDataBuilder.java:101)
    at 
org.apache.phoenix.index.PhoenixIndexMetaDataBuilder.getIndexMetaData(PhoenixIndexMetaDataBuilder.java:51)
    at 
org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:92)
    at 
org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:69)
    at 
org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexMetaData(IndexBuildManager.java:85)
    at 
org.apache.phoenix.hbase.index.IndexRegionObserver.getPhoenixIndexMetaData(IndexRegionObserver.java:1090)
    at 
org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutateWithExceptions(IndexRegionObserver.java:1214)
    at 
org.apache.phoenix.hbase.index.IndexRegionObserver.preBatchMutate(IndexRegionObserver.java:514)
    at {code}
There is a better solution that addresses most of the above problems. 
Previously, the IndexRegionObserver coproc did not have the logical name of the 
table when it was processing a batch of mutations, so it could not tell whether 
the entity being upserted into is a table or a view. Because of this, the 
server could not determine whether the entity in question has an index, so it 
relied on the client to tell it by annotating the mutations. However, 
PHOENIX-5521 started annotating each mutation with enough metadata for the 
server to deterministically figure out the Phoenix schema object the mutation 
targets. With this information the server can simply call _*getTable()*_ and 
rely on the cqsi cache (sketched below). The UPDATE_CACHE_FREQUENCY set on the 
table controls the schema freshness. There are already other places on the 
server where we make getTable calls, such as compaction and server metadata 
caching.
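
A minimal sketch of the proposed server-side lookup, assuming the PHOENIX-5521 mutation attributes carry the logical table name. The attribute name, class name, and connection plumbing below are assumptions, not the final design:
{code:java}
import java.sql.Connection;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.schema.PTable;
import org.apache.phoenix.util.PhoenixRuntime;

// Hypothetical sketch: resolve index metadata from the server-side cqsi PTable
// cache instead of the ServerCachingEndpointImpl cache.
public class ServerSideIndexMetadataSketch {

    public static List<PTable> resolveIndexes(Connection serverConnection, Mutation mutation)
            throws Exception {
        // PHOENIX-5521 annotates mutations with enough metadata to identify the
        // Phoenix schema object; the attribute name used here is hypothetical.
        byte[] logicalName = mutation.getAttribute("PHOENIX_LOGICAL_TABLE_NAME");
        if (logicalName == null) {
            return Collections.emptyList();
        }
        // getTable() is served from the cqsi cache; freshness is governed by the
        // table's UPDATE_CACHE_FREQUENCY, so no per-batch cache RPCs are needed.
        PTable dataTable = PhoenixRuntime.getTable(serverConnection, Bytes.toString(logicalName));
        // Index maintainers can then be built from dataTable.getIndexes()
        // (IndexMaintainer construction omitted from this sketch).
        return dataTable.getIndexes();
    }
}
{code}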

This will greatly simplify the implementation and should also improve batch 
write times on tables with indexes.



> Eliminate IndexMetadataCache rpcs and use server side cqsi PTable cache for 
> index maintainer metadata
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7727
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7727
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Tanuj Khurana
>            Assignee: Tanuj Khurana
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
