Kadir Ozdemir created PHOENIX-6761:
--------------------------------------
Summary: Phoenix Client Side Metadata Caching Improvement
Key: PHOENIX-6761
URL: https://issues.apache.org/jira/browse/PHOENIX-6761
Project: Phoenix
Issue Type: Improvement
Reporter: Kadir Ozdemir
CQSI maintains a client-side metadata cache of schemas, tables, and functions
that evicts the least recently used table entries when the cache size grows
beyond the configured limit.
Each time a Phoenix connection is created, the client-side metadata cache
maintained by the CQSI object creating this connection is cloned for the
connection. Thus, there are two levels of caches: one at the Phoenix connection
level and the other at the CQSI level.
When a Phoenix client needs to update the client-side cache, it updates both
caches (on the connection object and on the CQSI object). To read a table, the
Phoenix client first attempts to retrieve it from the connection-level cache.
If the table is not there, the Phoenix client does not check the CQSI-level
cache; instead, it retrieves the object from the server and then updates both
the connection-level and the CQSI-level cache.
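In pseudo-Java, the lookup path described above behaves roughly as follows.
This is a minimal sketch; the names connectionCache, cqsiCache, and
fetchFromServer are illustrative, not Phoenix's actual API.

    // Sketch of the two-level lookup described above. Note that the
    // CQSI-level cache is deliberately skipped on a connection-level miss.
    PTable getTable(PTableKey key) throws SQLException {
        PTable table = connectionCache.get(key);  // connection-level lookup
        if (table != null) {
            return table;                          // hit: CQSI cache is not touched
        }
        table = fetchFromServer(key);              // fetch the metadata from the server
        connectionCache.put(key, table);           // update the connection-level cache
        cqsiCache.put(key, table);                 // update the CQSI-level cache too
        return table;
    }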
PMetaDataCache provides caching for tables, schemas, and functions, but it
maintains separate caches internally, one for each type of metadata. The cache
for tables is actually a cache of PTableRef objects. A PTableRef holds a
reference to the table object as well as the estimated size of the table
object, the create time, the last access time, and the resolved time. The
create time is set to the last access time value provided when the PTableRef
object is inserted into the cache. The resolved time is also provided when the
PTableRef object is inserted into the cache. Both the create time and the
resolved time are final fields (i.e., they are never updated). PTableRef
provides a setter method to update the last access time, and PMetaDataCache
updates the last access time whenever the table is retrieved from the cache.
The LRU eviction policy is implemented using the last access time. No eviction
policy is implemented for schemas and functions. The configuration parameter
for the frequency of updating the cache is
phoenix.default.update.cache.frequency. It can be defined at the cluster or
table level; when it is set to zero, the cache is not used.
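A simplified sketch of the time bookkeeping described above follows; the field
and method names mirror this description rather than the exact Phoenix source.

    // Simplified sketch of PTableRef's time bookkeeping.
    class PTableRefSketch {
        private final PTable table;           // the cached table object
        private final int estimatedSize;      // estimated byte size of the table
        private final long createTime;        // set from lastAccessTime at insertion, never updated
        private final long resolvedTime;      // provided at insertion, never updated
        private volatile long lastAccessTime; // updated on every cache hit; drives LRU eviction

        PTableRefSketch(PTable table, int estimatedSize, long lastAccessTime, long resolvedTime) {
            this.table = table;
            this.estimatedSize = estimatedSize;
            this.createTime = lastAccessTime; // create time mirrors the initial access time
            this.resolvedTime = resolvedTime;
            this.lastAccessTime = lastAccessTime;
        }

        void setLastAccessTime(long lastAccessTime) { this.lastAccessTime = lastAccessTime; }
        long getLastAccessTime() { return lastAccessTime; }
        int getEstimatedSize() { return estimatedSize; }
    }

For example, the frequency can be set cluster-wide through
phoenix.default.update.cache.frequency in hbase-site.xml, or per table with the
UPDATE_CACHE_FREQUENCY table property (e.g. CREATE TABLE t (k VARCHAR PRIMARY
KEY) UPDATE_CACHE_FREQUENCY=900000).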
Obviously, the purpose of cache eviction is to limit the memory consumed by the
cache. The expected behavior is that when a table is removed from the cache,
the table (PTableImpl) object is also garbage collected. However, this does not
really happen, because multiple caches hold references to the same object and
each cache maintains its own table refs and thus its own access times. This
means that the access time for the same table may differ from one cache to
another; when one cache evicts an object, another cache may still hold on to
the same object. Although each individual cache implements the LRU eviction
policy, the overall memory eviction policy for the actual table objects behaves
more like an age-based cache. If a table is frequently accessed through the
connection-level caches, the last access time maintained by the corresponding
table ref objects for this table is updated. However, these updates to the
access times are not visible to the CQSI-level cache: its table refs keep an
access time equal to their create time.
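To make the divergence concrete, here is a toy example built on the
PTableRefSketch above, with illustrative values only.

    // Illustrative only: two caches hold two independent refs to one PTable.
    long t0 = 1_000L;
    PTable sharedTable = null; // stands in for the single shared PTableImpl instance
    PTableRefSketch cqsiRef = new PTableRefSketch(sharedTable, 1024, t0, t0);
    PTableRefSketch connRef = new PTableRefSketch(sharedTable, 1024, t0, t0); // clone for a connection
    connRef.setLastAccessTime(5_000L); // frequent hits at the connection level
    // cqsiRef.getLastAccessTime() is still t0: to the CQSI-level LRU this hot
    // table looks like the coldest entry and is evicted first.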
Since every object inserted into the local cache of a connection object is also
inserted into the cache on the CQSI object, the CQSI-level cache grows faster
than the caches on the connection objects. When the cache reaches its maximum
size, each newly inserted table results in evicting one of the existing tables.
Since the access times of these tables are not updated in the CQSI-level cache,
the table that has stayed in the cache for the longest period of time is the
one most likely to be evicted, regardless of whether the same table is
frequently accessed via the connection-level caches. This obviously defeats the
purpose of an LRU cache.
Another problem with the current cache is related to the choice of its internal
data structures and its eviction implementation. The table refs in the cache
are maintained in a hash map which maps a table key (a pair of tenant id and
table name) to a table ref. When the size of a cache (the total byte size of
the table objects referenced by the cache) reaches its configured limit, the
overage that adding a new table would cause is computed. Then all the table
refs in this cache are cloned into a priority queue as well as into a new
cache. The queue orders its elements (i.e., table refs) by access time. The
table refs that should not be evicted are removed from the queue, which leaves
in the queue exactly the table refs to be evicted. Finally, the table refs left
in the queue are removed from the new cache, and the new cache replaces the old
one. It is clear that this is an expensive operation in terms of memory
allocations and CPU time. The bad news is that once the cache reaches its
limit, nearly every insertion causes an eviction, and this expensive operation
is repeated for each such insertion.
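The eviction path described above amounts to roughly the following. This is a
simplified sketch under the assumption that PTableRef exposes getters for its
access time, estimated size, and table as described earlier; it pops the
eviction victims directly, which is functionally equivalent to removing the
keepers from the queue.

    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.PriorityQueue;
    import org.apache.phoenix.schema.PTableKey;
    import org.apache.phoenix.schema.PTableRef;

    class EvictionSketch {
        // Clone the whole cache, then evict least recently used refs until the
        // new table fits. This runs on (nearly) every insertion once the cache is full.
        static Map<PTableKey, PTableRef> evictToFit(Map<PTableKey, PTableRef> cache,
                                                    long currentByteSize,
                                                    long maxByteSize,
                                                    long newTableSize) {
            long overage = currentByteSize + newTableSize - maxByteSize;
            Map<PTableKey, PTableRef> newCache = new HashMap<>(cache); // full clone
            PriorityQueue<PTableRef> byAccessTime =
                new PriorityQueue<>(Comparator.comparingLong(PTableRef::getLastAccessTime));
            byAccessTime.addAll(cache.values()); // second full copy, heap-ordered
            long freed = 0;
            while (freed < overage && !byAccessTime.isEmpty()) {
                PTableRef victim = byAccessTime.poll(); // least recently used first
                freed += victim.getEstimatedSize();
                newCache.remove(victim.getTable().getKey());
            }
            return newCache; // replaces the old cache
        }
    }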
Since Phoenix connections are supposed to be short-lived, maintaining a
separate cache for each connection object, and especially cloning the entire
cache content (and then pruning the entries belonging to other tenants when the
connection is a tenant-specific connection), is not justified. The cost of such
a clone operation by itself would offset the gain of not accessing the
CQSI-level cache, since the number of such accesses per connection should be
small precisely because Phoenix connections are short-lived.
Moreover, the impact of Phoenix connection leaks (connections that are not
closed by applications) and of simply long-lived connections will be
exacerbated, since these connections hold references to a large set of table
objects.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)