[
https://issues.apache.org/jira/browse/PHOENIX-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761659#comment-17761659
]
Viraj Jasani commented on PHOENIX-6883:
---
[~shahrs87] The PHOENIX-6883-feature branch has a merge conflict with the
latest commit from PHOENIX-7029. Since the conflicting changes are small, you
could resolve them directly.
If required, we can trigger a full build after the merge conflict is resolved at
[https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/PHOENIX-6883-feature/]
> Phoenix metadata caching redesign
> -
>
> Key: PHOENIX-6883
> URL: https://issues.apache.org/jira/browse/PHOENIX-6883
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Kadir Ozdemir
> Assignee: Rushabh Shah
> Priority: Major
> Fix For: 5.2.0
>
>
> PHOENIX-6761 improves the client side metadata caching by eliminating the
> separate cache for each connection. This improvement results in memory and
> compute savings since it eliminates copying the CQSI level cache every time a
> Phoenix connection is created, and also replaces the inefficient CQSI level
> cache implementation with Guava Cache from Google.
> Despite this improvement, the overall metadata caching architecture begs for
> a redesign. This is because every operation in Phoenix needs to make multiple
> RPCs to metadata servers for the SYSTEM.CATALOG table (please see
> PHOENIX-6860) to ensure the latest metadata changes are visible to clients.
> These constant RPCs make the region servers serving SYSTEM.CATALOG a hot spot
> and thus lead to poor performance and availability issues.
> The UPDATE_CACHE_FREQUENCY configuration parameter specifies how frequently
> the client cache is updated. However, setting this parameter to a non-zero
> value results in stale caching. Stale caching can cause data integrity
> issues. For example, if an index table creation is not visible to the client,
> Phoenix would skip updating the index table in the write path. That is why
> this parameter is typically set to zero. However, this defeats the purpose of
> client side metadata caching.
> The redesign of the metadata caching architecture addresses this issue
> directly by making sure that the client metadata cache is always used (that
> is, UPDATE_CACHE_FREQUENCY is set to NEVER) while still ensuring data
> integrity. This is achieved by three main changes.
> The first change is to introduce server side metadata caching in all region
> servers. Currently, server side metadata caching is used only on the region
> servers serving SYSTEM.CATALOG. This metadata caching should be strongly
> consistent, such that metadata updates include invalidating the
> corresponding entries in the server side caches. This would ensure that the
> server caches do not become stale.
> The second change is that the Phoenix client passes the LAST_DDL_TIMESTAMP
> table attribute along with scan and mutation operations to the server regions
> (more accurately, to the Phoenix coprocessors). The Phoenix coprocessors
> would then check the timestamp on a given operation against the timestamp in
> their server side cache to validate that the client did not use stale
> metadata when it prepared the operation. If the client did use stale metadata,
> the coprocessor would return an exception (this exception can be called
> StaleClientMetadataCacheException) to the client.
> The third change is that, upon receiving StaleClientMetadataCacheException,
> the Phoenix client makes an RPC call to the metadata server to update the
> client cache, reconstructs the operation with the updated cache, and retries
> the operation.
> This redesign would require updating client and server metadata caches only
> when metadata is stale, instead of updating the client metadata cache for each
> (scan or mutation) operation. This would eliminate hot spotting on the
> metadata servers, and with it the poor performance and availability issues
> that the hot spotting causes.
>
>
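The three changes described above can be sketched as a small in-memory
simulation. This is only an illustration under stated assumptions, not Phoenix
code: the classes Server and Client, their methods, and the metadataRpcs
counter are hypothetical, the metadata server and a region server are collapsed
into one object, and the exception is unchecked purely to keep the sketch
short. Only the names LAST_DDL_TIMESTAMP and StaleClientMetadataCacheException
come from the issue description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, single-threaded sketch; none of these types are Phoenix classes.
class StaleMetadataSketch {

    // Exception name suggested in the issue; unchecked here only for brevity.
    static class StaleClientMetadataCacheException extends RuntimeException {
        StaleClientMetadataCacheException(String table) {
            super("Stale client metadata for " + table);
        }
    }

    // Stand-in for the metadata server plus one region server's cache.
    static class Server {
        final Map<String, Long> catalog = new HashMap<>();     // source of truth
        final Map<String, Long> serverCache = new HashMap<>(); // server side cache
        int metadataRpcs = 0;                                  // counts client refreshes

        // Change 1: a metadata update also invalidates the server side entry.
        void applyDdl(String table, long lastDdlTimestamp) {
            catalog.put(table, lastDdlTimestamp);
            serverCache.remove(table);
        }

        // Metadata RPC a client uses to refresh its cache (table must exist).
        long fetchLastDdlTimestamp(String table) {
            metadataRpcs++;
            return catalog.get(table);
        }

        // Change 2: compare the client's LAST_DDL_TIMESTAMP with the cached one.
        String execute(String table, long clientTimestamp) {
            long serverTimestamp = serverCache.computeIfAbsent(table, catalog::get);
            if (clientTimestamp != serverTimestamp) {
                throw new StaleClientMetadataCacheException(table);
            }
            return "ok:" + table + "@" + serverTimestamp;
        }
    }

    // Client that never refreshes proactively (UPDATE_CACHE_FREQUENCY = NEVER).
    static class Client {
        final Map<String, Long> cache = new HashMap<>();
        final Server server;

        Client(Server server) {
            this.server = server;
        }

        // Change 3: on a stale-metadata exception, refresh and retry once.
        String run(String table) {
            long ts = cache.computeIfAbsent(table, server::fetchLastDdlTimestamp);
            try {
                return server.execute(table, ts);
            } catch (StaleClientMetadataCacheException e) {
                long fresh = server.fetchLastDdlTimestamp(table);
                cache.put(table, fresh);
                return server.execute(table, fresh);
            }
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        server.applyDdl("T", 100L);
        Client client = new Client(server);
        System.out.println(client.run("T")); // prints ok:T@100
        server.applyDdl("T", 200L);          // e.g. an index is created
        System.out.println(client.run("T")); // prints ok:T@200 after one retry
        System.out.println(server.metadataRpcs); // prints 2
    }
}
```

In this sketch the client contacts the metadata source only twice: once to
populate its cache and once after the DDL change. Operations against unchanged
metadata cost no metadata RPC, which is the effect the description aims for.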
--
This message was sent by Atlassian Jira
(v8.20.10#820010)