[ 
https://issues.apache.org/jira/browse/PHOENIX-7484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905612#comment-17905612
 ] 

Tanuj Khurana commented on PHOENIX-7484:
----------------------------------------

There is also a correctness issue with your proposed solution. Assume you have
a tenant view that is not in the cache while its parent view is. When you
upsert into the tenant view, the lookup returns the PTable object of the
parent view. This causes problems when, for example, the tenant view has more
columns than the parent view. So you can't use the parent view's PTable.
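
To make the scenario concrete, here is a minimal Java sketch. It is not
Phoenix code: `Key`, `Meta` and `lookupWithFallback` are hypothetical
stand-ins for the client metadata cache, and it assumes, purely for
illustration, that the null-tenant lookup for the view's name lands on the
parent view's cache entry.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class TenantCacheSketch {
    // Hypothetical cache key (tenantId, name); NOT Phoenix's PTableKey.
    static final class Key {
        final String tenantId; // null for entries created over a non-tenant connection
        final String name;
        Key(String tenantId, String name) {
            this.tenantId = tenantId;
            this.name = name;
        }
        @Override public boolean equals(Object o) {
            return o instanceof Key
                && Objects.equals(tenantId, ((Key) o).tenantId)
                && name.equals(((Key) o).name);
        }
        @Override public int hashCode() {
            return Objects.hash(tenantId, name);
        }
    }

    // Hypothetical table metadata; NOT Phoenix's PTable.
    static final class Meta {
        final String name;
        final List<String> columns;
        Meta(String name, List<String> columns) {
            this.name = name;
            this.columns = columns;
        }
    }

    // Lookup order under discussion: tenant key first, then the null-tenant key.
    // Returns null on a miss, where a real client would go to the server.
    static Meta lookupWithFallback(Map<Key, Meta> cache, String tenantId, String name) {
        Meta hit = cache.get(new Key(tenantId, name));
        if (hit == null && tenantId != null) {
            hit = cache.get(new Key(null, name));
        }
        return hit;
    }

    public static void main(String[] args) {
        Map<Key, Meta> cache = new HashMap<>();
        // Only the parent view is cached. The tenant view (not cached) is
        // assumed to add an extra column, COL_B.
        cache.put(new Key(null, "V"), new Meta("V", Arrays.asList("PK", "COL_A")));

        Meta resolved = lookupWithFallback(cache, "tenant1", "V");
        // The fallback hands back the parent's metadata, which lacks COL_B.
        System.out.println(resolved.columns.contains("COL_B")); // prints "false"
    }
}
```

In this sketch an upsert that references COL_B would be planned against
metadata that does not contain that column, which is the kind of mismatch
described above.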

> Upserts on multi-tenant tables using tenant connection are taking 5K-6K % 
> more time than non-tenant connection
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7484
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7484
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Sanjeet Malhotra
>            Assignee: Sanjeet Malhotra
>            Priority: Major
>
> Upserts over a tenant connection on a multi-tenant table take 5K-6K% more
> time than upserts over a non-tenant connection for 2M rows. Time here means
> the total time spent in the `executeUpdate()` and `commit()` calls. The
> batch size and schema were the same for both connection types.
> Further analysis showed that, for 2M-row upserts on a multi-tenant table,
> 13K-14K% more time was spent in the executeUpdate() call over a tenant
> connection than over a non-tenant connection. The whole regression comes
> from the mutation plan creation phase of the executeUpdate() call.
> The root cause is that, with a tenant connection, we always hit SYSCAT to
> get the PTable object during mutation plan creation. Every call to
> executeUpdate() over a tenant connection therefore triggers a PTable lookup
> from SYSCAT, adding ~1ms per call; for 2M rows this accumulates to
> 29-33 mins.
>  
> For multi-tenant tables, the PTableKey in the client metadata cache has a
> null tenant Id because the table was created over a non-tenant connection.
> When we upsert over a tenant connection, the PTableKey used to look up the
> PTableRef in the client metadata cache carries the connection's tenant Id,
> i.e. non-null. The PTableRef lookup therefore misses the cache, and we
> immediately fall back to `getTableNoCache()`, which ends up hitting SYSCAT.
> Instead, we should first retry the metadata cache lookup with a null tenant
> Id in the PTableKey, and fall back to `getTableNoCache()` only if the
> PTableRef is still not found.
>  
> Code pointer: 
> https://github.com/apache/phoenix/blob/7682e3cee82e9cecb952eddaade1c544e6bd502d/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixConnection.java#L766-L768
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
