[jira] [Updated] (CALCITE-3937) Fire rule for RelSubset only when it is derived
[ https://issues.apache.org/jira/browse/CALCITE-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haisheng Yuan updated CALCITE-3937: --- Summary: Fire rule for RelSubset only when it is derived (was: Only fire rule for RelSubset when it is derived ) > Fire rule for RelSubset only when it is derived > > > Key: CALCITE-3937 > URL: https://issues.apache.org/jira/browse/CALCITE-3937 > Project: Calcite > Issue Type: Improvement > Components: core >Reporter: Haisheng Yuan >Priority: Major > > It is meaningless to fire rule for RelSubset when it is generated by parent's > trait requirement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (CALCITE-3937) Only fire rule for RelSubset when it is derived
Haisheng Yuan created CALCITE-3937: -- Summary: Only fire rule for RelSubset when it is derived Key: CALCITE-3937 URL: https://issues.apache.org/jira/browse/CALCITE-3937 Project: Calcite Issue Type: Improvement Components: core Reporter: Haisheng Yuan It is meaningless to fire rule for RelSubset when it is generated by parent's trait requirement. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (CALCITE-3873) Use global caching for ReflectiveVisitDispatcher implementation
[ https://issues.apache.org/jira/browse/CALCITE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] neoremind closed CALCITE-3873. -- Resolution: Not A Problem > Use global caching for ReflectiveVisitDispatcher implementation > --- > > Key: CALCITE-3873 > URL: https://issues.apache.org/jira/browse/CALCITE-3873 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.22.0 >Reporter: neoremind >Priority: Minor > Labels: pull-request-available > Attachments: ReflectVisitorDispatcherTest.java, jmh_result.txt, > pic1.svg, pic2.svg > > Time Spent: 10h 20m > Remaining Estimate: 0h > > For curiosity, I use flame graph to profiling a simple query. The code > snippet looks like below. > {code:java} > String sql = "select empno, gender, name from EMPS where name = 'John'"; > Connection connection = null; > Statement statement = null; > try { > Properties info = new Properties(); > info.put("model", jsonPath("smart")); > connection = DriverManager.getConnection("jdbc:calcite:", info); > String x = null; > for (int i = 0; i < 5; i++) { > statement = connection.createStatement(); > final ResultSet resultSet = > statement.executeQuery( > sql); > while (resultSet.next()) { > x = resultSet.getInt(1) > + resultSet.getString(2) > + resultSet.getString(3); > } > } > } catch (SQLException e) { > e.printStackTrace(); > } finally { > close(connection, statement); > } > {code} > > I attach the generated flame graph [^pic1.svg] > {code:java} > 3% on sql2rel > 9% on query optimizing, > 62% of the time is spent on code gen and implementation, > 20% on result set iterating and checking, > … > {code} > Hope this graph is informative. Since I start to learn Calcite recently, I > cannot tell where to start tuning, but from the graph one tiny point catches > my attention, I find there are many reflection invocations in > _Prepare#trimUnusedFields_. So, I spent some time trying to mitigate the > small overhead. > I optimize _ReflectiveVisitDispatcher_ by introducing a global _Guava_ cache > with limited size to cache methods, also I add full unit tests for > _ReflectUtil_. > I count the reference of the method: _ReflectUtil#createMethodDispatcher and_ > _ReflectUtil#createDispatcher (see below)._ Total 68 possible invocations, so > the cache size is limited, by caching all the methods during the lifecycle of > the process, we can eliminate reflection looking up methods overhead. > {code:java} > org.apache.calcite.rel.rel2sql.RelToSqlConverter: 18 possible invocations. > org.apache.calcite.sql2rel.RelDecorrelator: 15 possible invocations. > org.apache.calcite.sql2rel.RelFieldTrimmer: 11 possible invocations. > org.apache.calcite.sql2rel.RelStructuredTypeFlattener.RewriteRelVisitor: 22 > possible invocations. > org.apache.calcite.interpreter.Interpreter.CompilerImpl: 2 possible > invocations. > {code} > Before introducing the global caching, caching is shared per > _ReflectiveVisitDispatcher_ instance, now different > _ReflectiveVisitDispatcher_ in different thread is able to reuse the cached > methods. > See [^pic2.svg], after tuning, _trimUnusedFields_ only takes 0.64% of the > sampling time compared with 1.38% previously. I think this will help in a lot > more places. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3873) Use global caching for ReflectiveVisitDispatcher implementation
[ https://issues.apache.org/jira/browse/CALCITE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086507#comment-17086507 ] neoremind commented on CALCITE-3873: The following is copied conclusion from github PR. I have added new benchmark test case for baseline to disable guava caching. The result shows as below (code is in [my branch|https://github.com/neoremind/calcite/blob/refvisit_benchmark/ubenchmark/src/jmh/java/org/apache/calcite/benchmarks/ReflectiveVisitDispatcherTest2.java]). {{Benchmark (guavaCacheMaxSize) Mode CntScoreError Units ReflectiveVisitDispatcherTest102.testGlobalCachingUsingGuavaCache 0 avgt5 116.446 ± 8.957 ns/op ReflectiveVisitDispatcherTest102.testGlobalCachingUsingGuavaCache 128 avgt5 133.678 ± 8.251 ns/op ReflectiveVisitDispatcherTest102.testGlobalCachingUsingHashMap 0 avgt5 73.213 ± 7.614 ns/op ReflectiveVisitDispatcherTest102.testInstanceCachingWithReflection 0 avgt5 227.130 ± 9.033 ns/op}} Table title are operations performed by each solution. || ||lookup method||get by key from map||put key value to map||time cost|| |{{testGlobalCachingUsingGuavaCache with guavaCacheMaxSize = 0}} (Disable cache, looking up method every time for each call)|Y| | |116.446 ns/op| |{{testGlobalCachingUsingGuavaCache with guavaCacheMaxSize = 128}} (Global cache with guava cache)| |Y| |133.678 ns/op| |{{testGlobalCachingUsingHashMap}} (Global cache with JDK ConcurrentHashMap)| |Y| |73.213 ns/op| |{{testInstanceCachingWithReflection}} (Original implementation to cache per instance for each call)|Y|Y|Y|227.130 ns/op| The conclusion we could get is # Guava cache overhead is a little too big, compared to JDK ConcurrentHashMap. But JDK ConcurrentHashMap is not limit sized, so in the previous discussion, we do not choose JDK ConcurrentHashMap. # Guava cache overhead is even bigger than looking up method for each call. This is what surprised me. # The original implementation is the slowest one because it has to get from map, look up method then put method to map, it combines all the operations, the solution works well in the same instance, but for every Calcite invocation it has to sacrifice performance to look up method every time, that is where the PR tries to improve. The current master version does improve performance and balance memory usage. But it is slower than 1) JDK ConcurrentHashMap for caching (but not limit size), 2) looking up method direct in each call without map operations. Also, I have tested against MethodHandle and LambdaMetaFactory. The result is as below. {{Benchmark Mode Cnt Score Error Units ReflectiveVisitDispatcherTest.testGlobalCaching avgt3137.270 ±32.275 ns/op ReflectiveVisitDispatcherTest.testInstanceCachingWithLambdaFactory avgt3 92459.660 ± 56777.698 ns/op ReflectiveVisitDispatcherTest.testInstanceCachingWithLambdaFactoryThreadLocalInitialize avgt3199.687 ±24.755 ns/op ReflectiveVisitDispatcherTest.testInstanceCachingWithMethodHandle avgt3 2421.131 ± 633.756 ns/op ReflectiveVisitDispatcherTest.testInstanceCachingWithMethodHandleThreadLocalInitialize avgt3218.616 ±41.754 ns/op ReflectiveVisitDispatcherTest.testInstanceCachingWithReflection avgt3224.603 ±22.770 ns/op ReflectiveVisitDispatcherTest.testInstanceCachingWithReflectionThreadLocalInitialize avgt3218.406 ± 9.620 ns/op}} >From the result, it seems the pre-work is time consuming for both >{{MethodHandle}} and {{LambdaFactory}} (built upon MethodHandle). To eliminate >the initialization effect, I try to use ThreadLocal to initialize once, the >result do verifies that, in terms of time cost, LambdaFactory < Reflection <= >MethodHandle, but the gap is very small. With that, it will be a good option to stick to the current version. Hope the analysis and work could help people who interested in this. Many thanks to Vladimir, Haisheng, Danny, Chunwei, Stamatis to take time reviewing my code kindly :) > Use global caching for ReflectiveVisitDispatcher implementation > --- > > Key: CALCITE-3873 > URL: https://issues.apache.org/jira/browse/CALCITE-3873 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.22.0 >Reporter: neoremind >Priority: Minor > Labels: pull-request-available > Attachments: ReflectVisitorDispatcherTest.ja
[jira] [Commented] (CALCITE-3790) Make the URL of FileSource available
[ https://issues.apache.org/jira/browse/CALCITE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086455#comment-17086455 ] Vladimir Sitnikov commented on CALCITE-3790: The CI was perfectly fine when PR was merged. The problem was that ASF Jenkins is running with a weird encoding/filesystem that does not allow Unicode filenames. > Make the URL of FileSource available > > > Key: CALCITE-3790 > URL: https://issues.apache.org/jira/browse/CALCITE-3790 > Project: Calcite > Issue Type: Bug > Components: core >Reporter: Liya Fan >Priority: Minor > Labels: pull-request-available > Fix For: 1.23.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > When a FileSource is constructed with only a File object, its URL is left > null. This makes it inconvenient for some scenarios where a valid URL is > required. > This can be resolved, as each file in the local file system corresponds to a > file URL, and we fix it by converting a file object to a file URL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-3936) RelToSqlConverter changes target of ambiguous HAVING clause with a Project on Filter on Aggregate
[ https://issues.apache.org/jira/browse/CALCITE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086382#comment-17086382 ] Jin Xing commented on CALCITE-3936: --- Add a simpler test case: {code:java} // Sql: isHaving = false SELECT DEPTNO, SUM(SAL) SAL FROM EMP GROUP BY DEPTNO HAVING SUM(SAL) > 100 // Plan: LogicalFilter(condition=[>($1, 100)]) LogicalAggregate(group=[{0}], SAL=[SUM($1)]) LogicalProject(DEPTNO=[$7], SAL=[$5]) JdbcTableScan(table=[[JDBC_SCOTT, EMP]]) // Converted: isHaving = true SELECT DEPTNO, SUM(SAL) AS SAL FROM SCOTT.EMP GROUP BY DEPTNO HAVING SUM(SAL) > 100{code} [1] Calcite doesn't check if the filtering clause conflicts with the alias in Aggregate and convert the filtering condition directly, that's where the bug comes from; We need to add a checking logic when converting a Filter/Aggregate pattern. BTW, [~swtalbot], in the Jira title, do you mean "... for dialects with SqlConformance.isHavingAlias=true" ? I think the issue happen when parse the sql with config (isHaving=false) and convert rel with config (isHaving=true). [1] [https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L315] > RelToSqlConverter changes target of ambiguous HAVING clause with a Project on > Filter on Aggregate > - > > Key: CALCITE-3936 > URL: https://issues.apache.org/jira/browse/CALCITE-3936 > Project: Calcite > Issue Type: Bug >Reporter: Steven Talbot >Priority: Major > > ... for dialects with SqlConformance.isHavingAlias=false > Very, very similar to -CALCITE-3593.- > Reproducing test case in RelToSqlConverter: > {code:java} > @Test public void testHavingAlias2() { > final String query = "select \"product_id\" + 1, sum(\"gross_weight\") as > gross_weight\n" + > " from \"product\"\n" + > " group by \"product_id\"\n" + > " having sum(\"product\".\"gross_weight\") < 200"; > final String expected = "SELECT product_id + 1, GROSS_WEIGHT\n" + > "FROM (SELECT product_id, SUM(gross_weight) AS GROSS_WEIGHT\n" + > "FROM foodmart.product\n" + > "GROUP BY product_id\n" + > "HAVING SUM(product.gross_weight) < 200) AS t1" > // (or) "HAVING gross_weight < 200) AS t1" > // (or) ") AS t1\nWHERE t1.gross_weight < 200) AS t1" > // INSTEAD, we get "HAVING SUM(gross_weight) < 200) AS t1" > // which on BigQuery gives you an error about aggregating aggregates > ; > sql(query).withBigQuery().ok(expected); > } > {code} > In that one, the pattern was Project/Filter/Aggregate, here it is > Filter/Aggregate/Project. In 3593, the project created a new alias, which got > added to the same SELECT clause and caused the ambiguity. Here, the aggregate > creates an alias, but the filter will write a HAVING clause using the aliases > from before the Aggregate, and that will cause the SQL engine to think that > the filter is on the aggregate field, rather than on the underlying field. > Note that this is less an absurdly unlikely occurrence than it might seem > because when Calcite's default aliasing kicks in and everything gets the name > "$f6", "$f4", etc, so chances of a collision are higher if you have multiply > nested selects with default aliases. > Potential fixes: > # force a subselect, as was done for 3593. > # Force the expression in the HAVING to be fully aliased by table (works at > least in BigQuery, where I tested) > # Write the HAVING expression in terms of the aliases from the aggregate, > rather than what's coming from the aggregate (also works on BigQuery) -- This message was sent by Atlassian Jira (v8.3.4#803005)