[ https://issues.apache.org/jira/browse/CALCITE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
neoremind updated CALCITE-3873:
-------------------------------
Description:

Out of curiosity, I used a flame graph to profile a simple query. The code snippet is below.

{code:java}
String sql = "select empno, gender, name from EMPS where name = 'John'";
Connection connection = null;
Statement statement = null;
try {
  Properties info = new Properties();
  info.put("model", jsonPath("smart"));
  connection = DriverManager.getConnection("jdbc:calcite:", info);
  String x = null;
  long start = System.currentTimeMillis();
  for (int i = 0; i < 50000; i++) {
    statement = connection.createStatement();
    final ResultSet resultSet = statement.executeQuery(sql);
    while (resultSet.next()) {
      x = resultSet.getInt(1)
          + resultSet.getString(2)
          + resultSet.getString(3);
    }
  }
} catch (SQLException e) {
  e.printStackTrace();
} finally {
  close(connection, statement);
}
{code}

The generated flame graph is attached as [^pic1.svg]. Roughly:

{code:java}
3% on sql2rel,
9% on query optimizing,
62% on code generation and implementation,
20% on result-set iterating and checking,
…
{code}

I hope the graph is informative. Since I only started learning Calcite recently, I cannot tell where to begin tuning, but one small point in the graph caught my attention: there are many reflective invocations under _Prepare#trimUnusedFields_. So I spent some time trying to mitigate that small overhead.

I optimized _ReflectiveVisitDispatcher_ by introducing a global _Guava_ cache of limited size to hold the looked-up methods, and I added full unit tests for _ReflectUtil_.

I counted the callers of _ReflectUtil#createMethodDispatcher_ and _ReflectUtil#createDispatcher_ (see below): 68 possible invocations in total, so the cache size is bounded. By caching all the methods for the lifetime of the process, we can eliminate the overhead of reflective method lookup.

{code:java}
org.apache.calcite.rel.rel2sql.RelToSqlConverter: 18 possible invocations.
org.apache.calcite.sql2rel.RelDecorrelator: 15 possible invocations.
org.apache.calcite.sql2rel.RelFieldTrimmer: 11 possible invocations.
org.apache.calcite.sql2rel.RelStructuredTypeFlattener.RewriteRelVisitor: 22 possible invocations.
org.apache.calcite.interpreter (static class Interpreter.CompilerImpl): 2 possible invocations.
{code}

Before this change, the cache was held per _ReflectiveVisitDispatcher_ instance; now _ReflectiveVisitDispatcher_ instances in different threads can reuse the cached methods.

See [^pic2.svg]: after tuning, _trimUnusedFields_ takes only 0.64% of the sampling time, compared with 1.38% previously. I think this will help in many more places.


> Use global caching for ReflectiveVisitDispatcher implementation
> ---------------------------------------------------------------
>
>                 Key: CALCITE-3873
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3873
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.22.0
>            Reporter: neoremind
>            Priority: Minor
>         Attachments: pic1.svg, pic2.svg
>
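As a rough illustration of the idea described above, a process-wide method-lookup cache could look like the sketch below. This is not the actual patch: the real change uses a bounded Guava cache inside Calcite's _ReflectiveVisitDispatcher_, while this sketch substitutes a plain {{ConcurrentHashMap}} to stay dependency-free, and the names {{GlobalMethodCache}} and {{lookupVisitMethod}} are made up for illustration.

```java
import java.lang.reflect.Method;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch only: the actual patch uses a bounded Guava cache in
// Calcite's ReflectiveVisitDispatcher; class and method names here are invented.
public class GlobalMethodCache {

  // Process-wide cache shared by all dispatcher instances and all threads.
  // Optional.empty() records a failed lookup so it is never retried.
  private static final ConcurrentMap<String, Optional<Method>> CACHE =
      new ConcurrentHashMap<>();

  /** Resolves visitorClass.methodName(visiteeClass), caching the result. */
  public static Optional<Method> lookupVisitMethod(
      Class<?> visitorClass, Class<?> visiteeClass, String methodName) {
    String key = visitorClass.getName() + '#' + methodName
        + '(' + visiteeClass.getName() + ')';
    return CACHE.computeIfAbsent(key, k -> {
      try {
        // The expensive reflective lookup happens at most once per key.
        return Optional.of(visitorClass.getMethod(methodName, visiteeClass));
      } catch (NoSuchMethodException e) {
        return Optional.empty();
      }
    });
  }
}
```

Caching the negative result ({{Optional.empty()}}) matters as much as caching hits, since a visitor that lacks a specific overload would otherwise pay the lookup cost on every dispatch. With only about 68 call sites feeding the cache, the key space stays tiny, which is why a small bounded cache is sufficient for the whole process lifetime.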
--
This message was sent by Atlassian Jira
(v8.3.4#803005)