[ 
https://issues.apache.org/jira/browse/PHOENIX-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905622#comment-17905622
 ] 

Tanuj Khurana commented on PHOENIX-7484:
----------------------------------------

Isn't the code in MetaDataClient#updateCache() already doing something similar
to what you are suggesting? There is even a comment stating the same:
https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/schema/MetaDataClient.java#L615-L626

> Upserts on a multi-tenant tables using tenant connection are taking 5K-6K % 
> more time than non-tenant connection
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7484
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7484
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>
> Upserts over a tenant connection on a multi-tenant table take 5K-6K% more
> time than upserts over a non-tenant connection for 2M rows. Here, "time
> taken" means the total time spent in the `executeUpdate()` and `commit()`
> calls. The batch size and schema were the same in both tests.
> On further analysis, we found that for upserts of 2M rows on a multi-tenant
> table, 13K-14K% more time was spent in the `executeUpdate()` call over a
> tenant connection than over a non-tenant connection. The entire regression
> comes from the mutation plan creation phase of `executeUpdate()`.
> The root cause is that, with a tenant connection, we always hit SYSCAT to
> get the PTable object during mutation plan creation. So every call to
> `executeUpdate()` over a tenant connection results in a PTable lookup from
> SYSCAT, adding ~1 ms per call. For 2M rows this accumulates to 29-33
> minutes (2M calls at ~1 ms each is about 2,000 s).
>  
> For multi-tenant tables, the PTableKey in the client metadata cache has a
> null tenant Id because the table was created over a non-tenant connection.
> When we use a tenant connection for upserts, the PTableKey used to look up
> the PTableRef in the client metadata cache carries the connection's tenant
> Id, i.e. it is non-null. The lookup for the PTableRef therefore results in a
> cache miss, and we immediately fall back to `getTableNoCache()`, which ends
> up hitting SYSCAT. Instead, we should first fall back to looking in the
> metadata cache again with a null tenant Id in the PTableKey, and only if we
> still don't find the PTableRef should we fall back to `getTableNoCache()`.
>  
> Code pointer: 
> https://github.com/apache/phoenix/blob/7682e3cee82e9cecb952eddaade1c544e6bd502d/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixConnection.java#L766-L768
>  
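
For illustration only, the lookup order proposed in the description could be sketched roughly as follows. This is a self-contained toy model, not Phoenix code: `TenantCacheFallback`, `TableKey`, `put`, `getTable` and the `serverLookups` counter are hypothetical stand-ins, the cached value is a plain String rather than a PTableRef, and the toy `getTableNoCache()` only counts calls instead of querying SYSCAT.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TenantCacheFallback {
    // Hypothetical stand-in for PTableKey: (tenantId, tableName).
    static final class TableKey {
        final String tenantId; // null when the table was created over a non-tenant connection
        final String name;

        TableKey(String tenantId, String name) {
            this.tenantId = tenantId;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TableKey)) return false;
            TableKey other = (TableKey) o;
            return Objects.equals(tenantId, other.tenantId) && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(tenantId, name);
        }
    }

    // Toy client-side metadata cache.
    private final Map<TableKey, String> cache = new HashMap<>();
    // Counts how often the expensive path (SYSCAT in the real system) is taken.
    int serverLookups = 0;

    void put(String tenantId, String name, String table) {
        cache.put(new TableKey(tenantId, name), table);
    }

    String getTable(String tenantId, String name) {
        // 1. Look up with the connection's tenant Id.
        String table = cache.get(new TableKey(tenantId, name));
        // 2. On a miss, retry with a null tenant Id before leaving the cache.
        if (table == null && tenantId != null) {
            table = cache.get(new TableKey(null, name));
        }
        // 3. Only if both lookups miss, take the expensive path.
        if (table == null) {
            table = getTableNoCache(tenantId, name);
        }
        return table;
    }

    // Toy replacement for the server round trip; it only records that it was called.
    String getTableNoCache(String tenantId, String name) {
        serverLookups++;
        return "server:" + name;
    }

    public static void main(String[] args) {
        TenantCacheFallback client = new TenantCacheFallback();
        client.put(null, "T1", "cached:T1"); // cached under a null tenant Id
        System.out.println(client.getTable("tenantA", "T1"));
        System.out.println("server lookups: " + client.serverLookups);
    }
}
```

In this sketch, a tenant connection resolving `T1` is served from the entry stored under the null tenant Id, so the counter stays at zero. Whether the real client can safely skip the server round trip here (for example with respect to cache freshness) is a separate question that the sketch does not address.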



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
