itxiangkui created CALCITE-5187:
-----------------------------------

             Summary: "CalciteSchema call getSubSchemaMap when prepareSql" will 
Causes SQL execution to slow down
                 Key: CALCITE-5187
                 URL: https://issues.apache.org/jira/browse/CALCITE-5187
             Project: Calcite
          Issue Type: Improvement
            Reporter: itxiangkui
         Attachments: Screenshot from 2022-06-10 11-17-02.png, Screenshot from 
2022-06-10 11-18-54.png

 

 
 # We define the structure of the Schema tree as: catalog -> database -> table.
 # We are building a time series database whose schema information is stored in 
a traditional MySQL database. Methods such as Schema.getSubSchemaNames() and 
Schema.getSubSchemaMap() are therefore backed by MySQL, and they are called 
frequently when executing SQL, especially in the validate phase.
 # A sample query looks like this:
select * from `catalog`.`database`.`xxtable` limit 10
 # We modified part of the code and connected it to an OpenTracing system (Jaeger).
 # With tracing enabled, we tracked the entire SQL execution process to find out 
why it is slow, so that we can optimize our system.

Here is the trace of one of our SQL executions:

!Screenshot from 2022-06-10 11-18-54.png!

 

Apparently, ICUserCatalogSchema.getSubSchemaMap() is called many times, and the 
number of calls grows with the number of databases.
While debugging, we found that the prepareSql stage calls the gatherLattices 
method, and gatherLattices recursively visits every sub-schema. In our setup, 
each sub-schema corresponds to a database.

 
{code:java}
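private static void gatherLattices(CalciteSchema schema,
    List<CalciteSchema.LatticeEntry> list) {
  // Collect the lattices defined directly on this schema.
  list.addAll(schema.getLatticeMap().values());
  // Recurse into every sub-schema; nothing here is limited to the schemas
  // that the query actually references.
  for (CalciteSchema subSchema : schema.getSubSchemaMap().values()) {
    gatherLattices(subSchema, list);
  }
}
{code}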
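To make the cost easier to see, here is a minimal, self-contained sketch of the same traversal shape. It does not use Calcite: SchemaNode, GatherLatticesDemo, gather and catalogWith are hypothetical names introduced only for this illustration, and the sketch assumes a flat catalog whose sub-schemas are all databases, as in our setup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a schema node; this is NOT a Calcite class. */
class SchemaNode {
  final String name;
  final Map<String, SchemaNode> children = new LinkedHashMap<>();
  final List<String> lattices = new ArrayList<>();

  SchemaNode(String name) {
    this.name = name;
  }

  SchemaNode add(SchemaNode child) {
    children.put(child.name, child);
    return this;
  }

  /** In our system, the equivalent call has to read from MySQL. */
  Map<String, SchemaNode> getSubSchemaMap() {
    return children;
  }
}

class GatherLatticesDemo {
  /**
   * Same recursive shape as gatherLattices above.
   * Returns how many times getSubSchemaMap() was invoked.
   */
  static int gather(SchemaNode schema, List<String> out) {
    out.addAll(schema.lattices);
    int calls = 1; // the getSubSchemaMap() call made for this node
    for (SchemaNode sub : schema.getSubSchemaMap().values()) {
      calls += gather(sub, out);
    }
    return calls;
  }

  /** Builds a catalog with the given number of empty databases. */
  static SchemaNode catalogWith(int databases) {
    SchemaNode catalog = new SchemaNode("catalog");
    for (int i = 0; i < databases; i++) {
      catalog.add(new SchemaNode("db" + i));
    }
    return catalog;
  }

  public static void main(String[] args) {
    for (int n : new int[] {1, 10, 100}) {
      int calls = gather(catalogWith(n), new ArrayList<>());
      System.out.println(n + " databases -> " + calls + " getSubSchemaMap() calls");
    }
  }
}
```

In this sketch the number of getSubSchemaMap() calls is the number of databases plus one, no matter which table the query touches. This matches the growth we see in the trace.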
My question is: why does Calcite need information about the other databases 
when querying a single table, and are there any suggestions for improving this?
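
One mitigation we could apply on our side, independent of any change in Calcite, is to cache the result of the expensive sub-schema lookup for a short time. The following is only a sketch under assumptions: CachedSubSchemas is a hypothetical helper and not a Calcite API, the loader stands in for our MySQL query, and a fixed time-to-live is assumed to be acceptable for schema metadata.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Calcite): caches one expensive map lookup for a fixed time. */
class CachedSubSchemas<V> {
  private final Supplier<Map<String, V>> loader; // e.g. the query against MySQL
  private final long ttlNanos;                   // how long a loaded map stays valid
  private final LongSupplier clock;              // injectable so expiry can be tested
  private Map<String, V> cached;
  private long loadedAt;

  CachedSubSchemas(Supplier<Map<String, V>> loader, long ttlNanos, LongSupplier clock) {
    this.loader = loader;
    this.ttlNanos = ttlNanos;
    this.clock = clock;
  }

  /** Returns the cached map, reloading it only when it is missing or expired. */
  synchronized Map<String, V> get() {
    long now = clock.getAsLong();
    if (cached == null || now - loadedAt >= ttlNanos) {
      cached = loader.get();
      loadedAt = now;
    }
    return cached;
  }
}

class CachedSubSchemasDemo {
  public static void main(String[] args) {
    int[] loads = {0}; // counts how often the expensive loader actually runs
    CachedSubSchemas<String> cache = new CachedSubSchemas<>(
        () -> {
          loads[0]++;
          return Collections.singletonMap("database", "schema");
        },
        60_000_000_000L, // 60 seconds, in nanoseconds
        System::nanoTime);
    for (int i = 0; i < 5; i++) {
      cache.get();
    }
    System.out.println("loads = " + loads[0]);
  }
}
```

A wrapper like this only reduces the cost of each traversal; it does not remove the traversal itself, which is what this issue asks about.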

 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
