[
https://issues.apache.org/jira/browse/PHOENIX-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906670#comment-17906670
]
Sanjeet Malhotra commented on PHOENIX-7484:
-------------------------------------------
The current MetadataClient#updateCache() call, when used over a tenant
connection, first looks in the client metadata cache for a tenant view and, if
no entry is found, then looks for a global table. This logic fails in a corner
case: if a tenant view and a global table have the same name and the entry for
the tenant view somehow gets evicted, a tenant connection can wrongly end up
selecting the global table.
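To make the corner case concrete, here is a minimal sketch in Java. It is not
Phoenix's actual code: the class TenantThenGlobalLookup, its methods, and the
plain strings standing in for PTable objects are all assumptions made for
illustration. It only mimics the lookup order described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the client metadata cache; not a Phoenix class.
final class TenantThenGlobalLookup {
    // Key is (tenantId, tableName); tenantId is null for global tables.
    private final Map<List<String>, String> cache = new HashMap<>();

    private static List<String> key(String tenantId, String name) {
        return Arrays.asList(tenantId, name); // Arrays.asList permits null elements
    }

    void put(String tenantId, String name, String resolved) {
        cache.put(key(tenantId, name), resolved);
    }

    void evict(String tenantId, String name) {
        cache.remove(key(tenantId, name));
    }

    // Tenant-scoped key first, then the global (null tenant) key.
    String resolve(String tenantId, String name) {
        String hit = cache.get(key(tenantId, name));
        if (hit != null || tenantId == null) {
            return hit;
        }
        // Cache miss for the tenant: fall back to the global entry
        // without checking whether a tenant view exists elsewhere.
        return cache.get(key(null, name));
    }
}
```

With a global table "T" and a tenant view "T" both cached, resolve("tenant1",
"T") returns the tenant view. After evict("tenant1", "T"), the same call
silently returns the global table, which is the wrong object for that tenant.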
We can fix this by always hitting SYSCAT when we don't find the tenant view in
the client metadata cache, but then we would hit SYSCAT on every call whenever
there is no tenant view in SYSCAT. To avoid repeating these negative lookups,
we can maintain a negative cache at the CQSI level: once we hit SYSCAT for a
tenant view and find that no such view exists, we cache that fact and skip the
SYSCAT lookup on the next call.
The same fix is needed for MetadataClient#updateCache() and PHOENIX-7484. We
also discussed the following points to reach the final solution:
* Having negative entries in the same CQSI cache as the positive entries. But
this can penalize the non-multi-tenant tables, as they most likely won't be
accessed as frequently as a tenant's base table. So it is better to keep these
two caches separate.
* Having a tenant-connection-specific negative cache. Since we earlier had a
tenant-connection-specific positive cache and deprecated it, this time it is
good to start with a single CQSI-level tenant negative cache. Additionally, if
the negative cache is not shared between connections, then each connection
will be penalized at least once. Moreover, a tenant connection negative cache
would require us to maintain caches at two levels, i.e. CQSI and
PhoenixConnection, which we would like to avoid.
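The negative-cache idea can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual patch: the class
TenantViewResolver, its fields, and the BiFunction standing in for a SYSCAT
lookup are hypothetical names, and strings stand in for PTable objects. The
negative cache is kept separate from the positive one and is meant to be
shared by all connections, as discussed above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

// Hypothetical sketch; one instance would be shared across connections.
final class TenantViewResolver {
    private final Map<List<String>, String> positive = new ConcurrentHashMap<>();
    // Separate negative cache: (tenantId, name) pairs known to have no tenant view.
    private final Set<List<String>> negative = ConcurrentHashMap.newKeySet();
    // Stand-in for a SYSCAT lookup; returns null when no tenant view exists.
    private final BiFunction<String, String, String> syscat;
    final AtomicInteger syscatLookups = new AtomicInteger();

    TenantViewResolver(BiFunction<String, String, String> syscat) {
        this.syscat = syscat;
    }

    void putGlobal(String name, String table) {
        positive.put(Arrays.asList(null, name), table);
    }

    String resolve(String tenantId, String name) {
        List<String> key = Arrays.asList(tenantId, name);
        String hit = positive.get(key);
        if (hit != null) {
            return hit;
        }
        if (tenantId != null && !negative.contains(key)) {
            // Unknown either way: ask SYSCAT once and remember the answer.
            syscatLookups.incrementAndGet();
            String view = syscat.apply(tenantId, name);
            if (view != null) {
                positive.put(key, view);
                return view;
            }
            negative.add(key);
        }
        // No tenant view exists (or global connection): use the global entry.
        return positive.get(Arrays.asList(null, name));
    }
}
```

In this sketch, repeated calls for a tenant without a view reach the SYSCAT
stand-in only once, and a tenant whose view was evicted from the positive
cache gets it back from SYSCAT instead of falling through to the global table.
The sketch leaves out things a real implementation would presumably need, such
as invalidating a negative entry when a tenant view is created later and
bounding the size of the negative cache.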
This bug will be fixed automatically once PHOENIX-7490 gets fixed.
Thanks [~haridsv] [~kadir] for suggesting the solution to handle the bigger
problem.
> Upserts on a multi-tenant tables using tenant connection are taking 5K-6K %
> more time than non-tenant connection
> ----------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-7484
> URL: https://issues.apache.org/jira/browse/PHOENIX-7484
> Project: Phoenix
> Issue Type: Bug
> Reporter: Sanjeet Malhotra
> Assignee: Sanjeet Malhotra
> Priority: Major
>
> Upserts using a tenant connection on a multi-tenant table are taking 5K-6K%
> more time than upserts using a non-tenant connection for 2M rows. Here, the
> time taken means the total time spent in the `executeUpdate()` and `commit()`
> calls. The batch size and schema were the same when testing with the tenant
> connection and the non-tenant connection.
> On further analysis, we found that when doing upserts (for 2M rows) on a
> multi-tenant table over a tenant connection, 13K-14K% more time was being
> spent in the `executeUpdate()` call than over a non-tenant connection. This
> whole regression comes from the mutation plan creation phase of the
> `executeUpdate()` call.
> The root cause is that, with a tenant connection, we always hit SYSCAT to get
> the PTable object during mutation plan creation. So every call to
> `executeUpdate()` over a tenant connection results in a PTable lookup from
> SYSCAT during mutation plan creation, adding ~1ms to every call of
> `executeUpdate()`; for 2M rows this accumulates to 29-33 mins.
>
> For multi-tenant tables, the PTableKey in the metadata cache has a null
> tenant Id, as the table was created over a non-tenant connection. When we use
> a tenant connection for upserts, the PTableKey used to look up the PTableRef
> in the client metadata cache has the connection's tenant Id, i.e. non-null.
> Thus the lookup for the PTableRef results in a cache miss, and we immediately
> fall back to `getTableNoCache()`, which ends up hitting SYSCAT. Instead, we
> should first fall back to looking in the metadata cache again, but with a
> null tenant Id in the PTableKey used for the lookup, and only if we still
> don't find the PTableRef should we fall back to `getTableNoCache()`.
>
> Code pointer:
> https://github.com/apache/phoenix/blob/7682e3cee82e9cecb952eddaade1c544e6bd502d/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixConnection.java#L766-L768
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)