Kadir Ozdemir created PHOENIX-6761:
--------------------------------------

             Summary: Phoenix Client Side Metadata Caching Improvement
                 Key: PHOENIX-6761
                 URL: https://issues.apache.org/jira/browse/PHOENIX-6761
             Project: Phoenix
          Issue Type: Improvement
            Reporter: Kadir Ozdemir


CQSI (ConnectionQueryServicesImpl) maintains a client-side metadata cache of 
schemas, tables, and functions that evicts the least recently used table 
entries when the cache size grows beyond the configured limit.
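
For reference, the eviction behavior described above is essentially that of an 
LRU map. A minimal, self-contained sketch (not the actual Phoenix 
implementation, which evicts by estimated byte size rather than by entry 
count) could look like this:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal LRU cache sketch: an access-ordered map that drops the least
    // recently used entry once the configured capacity is exceeded.
    class LruMetadataCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruMetadataCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }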

Each time a Phoenix connection is created, the client-side metadata cache 
maintained by the CQSI object creating this connection is cloned for the 
connection. Thus, there are two levels of caches: one at the Phoenix 
connection level and the other at the CQSI level.
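
A simplified sketch of this two-level arrangement (all names here are 
illustrative, not the actual Phoenix classes):

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of the two cache levels; names are hypothetical.
    class QueryServicesSketch {
        // CQSI-level cache, shared by all connections this object creates.
        final Map<String, Object> sharedCache = new HashMap<>();

        ConnectionSketch connect() {
            // Each new connection starts with a full copy of the shared cache.
            return new ConnectionSketch(this, new HashMap<>(sharedCache));
        }
    }

    class ConnectionSketch {
        final QueryServicesSketch services;
        final Map<String, Object> localCache; // per-connection clone

        ConnectionSketch(QueryServicesSketch services, Map<String, Object> localCache) {
            this.services = services;
            this.localCache = localCache;
        }
    }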

When a Phoenix client needs to update the client-side cache, it updates both 
caches (on the connection object and on the CQSI object). The Phoenix client 
first attempts to retrieve a table from the connection-level cache. If the 
table is not there, the client does not fall back to the CQSI-level cache; 
instead, it retrieves the object from the server and then updates both the 
connection-level and CQSI-level caches.
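
Extending the ConnectionSketch above, the lookup path just described could be 
sketched as follows (fetchTableFromServer is a hypothetical stand-in for the 
server round trip, not a real Phoenix method):

    Object getTable(String tableKey) {
        Object table = localCache.get(tableKey);
        if (table == null) {
            // A connection-level miss goes straight to the server; the
            // CQSI-level cache is never consulted on the read path.
            table = fetchTableFromServer(tableKey);
            localCache.put(tableKey, table);           // update connection-level cache
            services.sharedCache.put(tableKey, table); // update CQSI-level cache
        }
        return table;
    }

    // Hypothetical stand-in for the server round trip that resolves metadata.
    Object fetchTableFromServer(String tableKey) {
        return new Object();
    }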

PMetaDataCache provides caching for tables, schemas, and functions, but it 
maintains separate caches internally, one for each type of metadata. The cache 
for tables is actually a cache of PTableRef objects. A PTableRef holds a 
reference to the table object as well as the estimated size of the table 
object, the create time, the last access time, and the resolved time. The 
create time is set to the last access time value provided when the PTableRef 
object is inserted into the cache. The resolved time is also provided at 
insertion. Both the create time and the resolved time are final fields (i.e., 
they are never updated). PTableRef provides a setter method to update the last 
access time, and PMetaDataCache updates the last access time whenever the 
table is retrieved from the cache. The LRU eviction policy is implemented 
using the last access time; no eviction policy is implemented for schemas and 
functions. The configuration parameter controlling how often the cache is 
refreshed is phoenix.default.update.cache.frequency, which can be set at the 
cluster or table level. When it is set to zero, the cache is not used.
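
A condensed sketch of PTableRef as described above (simplified field types; 
not the exact Phoenix source):

    // Simplified sketch of PTableRef. Only lastAccessTime is mutable; the
    // create and resolved times are fixed when the ref is inserted.
    class PTableRefSketch {
        final Object table;           // reference to the cached PTable object
        final long estimatedSize;     // estimated byte size of the table object
        final long createTime;        // set to the lastAccessTime given at insert
        final long resolvedTime;      // provided at insert; never updated
        private volatile long lastAccessTime;

        PTableRefSketch(Object table, long estimatedSize,
                        long lastAccessTime, long resolvedTime) {
            this.table = table;
            this.estimatedSize = estimatedSize;
            this.createTime = lastAccessTime; // create time mirrors the initial access time
            this.resolvedTime = resolvedTime;
            this.lastAccessTime = lastAccessTime;
        }

        long getLastAccessTime() { return lastAccessTime; }

        // Called by the cache on every table lookup to keep the LRU order current.
        void setLastAccessTime(long t) { this.lastAccessTime = t; }
    }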

Obviously, the purpose of cache eviction is to limit the memory consumed by 
the cache. The expected behavior is that when a table is removed from the 
cache, the table (PTableImpl) object is also garbage collected. However, this 
does not really happen, because multiple caches hold references to the same 
object while each cache maintains its own table refs and thus its own access 
times. This means that the access time for the same table may differ from one 
cache to another, and when one cache evicts an object, another cache may still 
hold on to it.

Although the individual caches implement the LRU eviction policy, the overall 
memory eviction policy for the actual table objects behaves more like an 
age-based cache. If a table is frequently accessed through the connection-level 
caches, the last access time maintained by the corresponding table ref objects 
for this table will be updated. However, these updates to the access times are 
not visible to the CQSI-level cache, whose table refs retain identical create 
and access times.
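
Using the PTableRefSketch above, the divergence can be shown directly: the 
same table object sits behind two independent refs, and only the 
connection-level ref ever sees the accesses:

    public class AccessTimeDivergenceDemo {
        public static void main(String[] args) {
            Object table = new Object(); // stands in for a shared PTableImpl
            PTableRefSketch connectionRef = new PTableRefSketch(table, 1024L, 100L, 100L);
            PTableRefSketch cqsiRef       = new PTableRefSketch(table, 1024L, 100L, 100L);

            connectionRef.setLastAccessTime(5000L); // connection-level reads bump this ref only
            System.out.println(connectionRef.getLastAccessTime()); // 5000
            System.out.println(cqsiRef.getLastAccessTime());       // still 100: age, not recency
        }
    }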

Since every object inserted into the local cache of a connection object is 
also inserted into the cache on the CQSI object, the CQSI-level cache grows 
faster than the caches on the connection objects. When the cache reaches its 
maximum size, each newly inserted table results in the eviction of an existing 
table. Since the access times of these tables are not updated on the 
CQSI-level cache, the table that has stayed in the cache the longest is the 
one likely to be evicted, regardless of how frequently it is accessed via the 
connection-level caches. This defeats the purpose of an LRU cache.

Another problem with the current cache is the choice of its internal data 
structures and its eviction implementation. The table refs in the cache are 
maintained in a hash map that maps a table key (a pair of tenant id and table 
name) to a table ref. When the size of a cache (the total byte size of the 
table objects referred to by the cache) reaches its configured limit, the 
overage that adding a new table would cause is computed. Then all the table 
refs in the cache are cloned into a priority queue as well as into a new 
cache. The queue uses the access time to determine the order of its elements 
(i.e., the table refs). The table refs that should not be evicted are removed 
from the queue, leaving only the table refs to be evicted. Finally, the table 
refs left in the queue are removed from the new cache, and the new cache 
replaces the old one. It is clear that this is an expensive operation in terms 
of memory allocation and CPU time. Worse, when the cache reaches its limit, 
nearly every insertion causes an eviction, and this expensive operation is 
repeated for each such insertion.
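
Reusing PTableRefSketch, the eviction path described above could be sketched 
roughly as follows (a simplification of the description, not the actual 
Phoenix source; the exact ordering and bookkeeping in PMetaDataCache differ in 
detail):

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.PriorityQueue;

    // Rough sketch: on an insert that overflows the byte limit, all refs are
    // cloned into a priority queue and a new map; survivors (most recently
    // used first) are polled off until the byte budget is spent, and whatever
    // remains in the queue is removed from the new map, which then replaces
    // the old cache. Every overflowing insert repeats all of this work.
    static Map<String, PTableRefSketch> evictDownTo(
            Map<String, PTableRefSketch> cache, long byteBudget) {
        PriorityQueue<PTableRefSketch> queue = new PriorityQueue<>(
            Comparator.comparingLong(PTableRefSketch::getLastAccessTime).reversed());
        queue.addAll(cache.values());                                 // clone refs into the queue
        Map<String, PTableRefSketch> newCache = new HashMap<>(cache); // clone the cache

        long kept = 0;
        while (!queue.isEmpty() && kept + queue.peek().estimatedSize <= byteBudget) {
            kept += queue.poll().estimatedSize; // refs that survive leave the queue
        }
        for (PTableRefSketch victim : queue) {  // whatever is left gets evicted
            newCache.values().remove(victim);
        }
        return newCache; // replaces the old cache wholesale
    }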

Since Phoenix connections are supposed to be short-lived, maintaining a 
separate cache for each connection object, and especially cloning the entire 
cache content (and then pruning the entries belonging to other tenants when 
the connection is a tenant-specific connection), is not justified. The cost of 
the clone operation by itself would offset the gain of not accessing the 
CQSI-level cache, since the number of such accesses per connection should be 
small precisely because Phoenix connections are short-lived.

The impact of Phoenix connection leaks (connections that are never closed by 
applications) and of simply long-lived connections is also exacerbated, since 
these connections hold references to a large set of table objects.


