[jira] [Commented] (CALCITE-5187) "CalciteSchema call getSubSchemaMap when prepareSql" will Causes SQL execution to slow down

itxiangkui (Jira) Fri, 10 Jun 2022 03:17:04 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-5187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552661#comment-17552661
 ]


itxiangkui commented on CALCITE-5187:
-------------------------------------

The trace shows two things:
1. Before CatalogSchema.getSubSchemaMap is called for the first time, the 
Calcite framework executes some code that takes a noticeable amount of time.
2. There is also a noticeable delay between the calls to 
DatabaseSchema.getTable and CatalogSchema.getSubSchemaMap.

However, when the same SQL is executed a second time and afterwards, these 
delays seem to disappear. My guess is that this is due to caching in the 
JaninoRexCompiler process...
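
Separately from whatever Calcite caches internally, the schema lookups that go to MySQL could be cached on our side. The following is only a minimal sketch under assumptions: TtlCache and TtlCacheDemo are hypothetical names, not Calcite classes, and the supplier stands in for the real metadata query. A custom getSubSchemaMap() implementation could delegate to such a cache so that repeated calls do not each hit the backing database.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Calcite): caches one loaded value for a fixed time. */
final class TtlCache<T> {
  private final Supplier<T> loader;  // stand-in for the expensive metadata query
  private final long ttlNanos;
  private T value;
  private long loadedAt;
  private boolean loaded;

  TtlCache(Supplier<T> loader, long ttlMillis) {
    this.loader = loader;
    this.ttlNanos = ttlMillis * 1_000_000L;
  }

  /** Returns the cached value, reloading it only when missing or expired. */
  synchronized T get() {
    long now = System.nanoTime();
    if (!loaded || now - loadedAt >= ttlNanos) {
      value = loader.get();
      loadedAt = now;
      loaded = true;
    }
    return value;
  }

  /** Forces the next get() to reload, e.g. after a database is created or dropped. */
  synchronized void invalidate() {
    loaded = false;
  }
}

final class TtlCacheDemo {
  public static void main(String[] args) {
    AtomicInteger backendCalls = new AtomicInteger();
    // The map contents are made up; a real loader would query the metadata store.
    TtlCache<Map<String, String>> subSchemas = new TtlCache<>(() -> {
      backendCalls.incrementAndGet();
      return Map.of("database", "schema-for-database");
    }, 60_000L);

    subSchemas.get();
    subSchemas.get();
    subSchemas.get();
    System.out.println("backend calls: " + backendCalls.get()); // prints "backend calls: 1"
  }
}
```

This would not reduce how many times getSubSchemaMap is called during prepare. It would only make each repeated call cheap, at the cost of serving schema information that can be stale for up to the chosen TTL unless invalidate() is called on changes.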

> "CalciteSchema call getSubSchemaMap when prepareSql" will Causes SQL 
> execution to slow down
> -------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-5187
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5187
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: itxiangkui
>            Priority: Major
>         Attachments: Screenshot from 2022-06-10 11-25-06.png
>
>
>  
>  # We define the Schema tree structure as catalog->database->table.
>  # We are building a time series database whose schema information is stored 
> in a traditional MySQL database. As a result, methods such as 
> Schema.getSubSchemaNames() and Schema.getSubSchemaMap() are called frequently 
> when executing SQL, especially in the validate phase.
>  # A sample query is as follows:
> select * from `catalog`.`database`.`xxtable` limit 10
>  # We modified part of the code and connected it to an OpenTracing system 
> (Jaeger).
>  # We traced the entire SQL execution in DEBUG mode to analyze 
> why it is slow, so that we can optimize our system.
> Here is the trace of one of our SQL executions:
> !Screenshot from 2022-06-10 11-25-06.png!
>  
> Apparently, ICUserCatalogSchema.getSubSchemaMap() is called many times, and 
> the number of calls increases with the number of databases.
> Debugging shows that the prepareSql stage calls the gatherLattices 
> method, and gatherLattices visits every sub-schema, each of which 
> corresponds to a database.
>  
> {code:java}
> private static void gatherLattices(CalciteSchema schema,
>     List<CalciteSchema.LatticeEntry> list) {
>   list.addAll(schema.getLatticeMap().values());
>   for (CalciteSchema subSchema : schema.getSubSchemaMap().values()) {
>     gatherLattices(subSchema, list);
>   }
> }
> {code}
> My question is: why does Calcite care about the other databases' information 
> when querying a single table, and are there any suggestions for improvement 
> here?
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
