date:20200418

[jira] [Updated] (CALCITE-3937) Fire rule for RelSubset only when it is derived

2020-04-18 Thread Haisheng Yuan (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haisheng Yuan updated CALCITE-3937:
---
Summary: Fire rule for RelSubset only when it is derived   (was: Only fire 
rule for RelSubset when it is derived )

> Fire rule for RelSubset only when it is derived 
> 
>
> Key: CALCITE-3937
> URL: https://issues.apache.org/jira/browse/CALCITE-3937
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Haisheng Yuan
>Priority: Major
>
> It is meaningless to fire rule for RelSubset when it is generated by parent's 
> trait requirement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (CALCITE-3937) Only fire rule for RelSubset when it is derived

2020-04-18 Thread Haisheng Yuan (Jira)

Haisheng Yuan created CALCITE-3937:
--

 Summary: Only fire rule for RelSubset when it is derived 
 Key: CALCITE-3937
 URL: https://issues.apache.org/jira/browse/CALCITE-3937
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


It is meaningless to fire rule for RelSubset when it is generated by parent's 
trait requirement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Closed] (CALCITE-3873) Use global caching for ReflectiveVisitDispatcher implementation

2020-04-18 Thread neoremind (Jira)



 [ 
https://issues.apache.org/jira/browse/CALCITE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

neoremind closed CALCITE-3873.
--
Resolution: Not A Problem

> Use global caching for ReflectiveVisitDispatcher implementation
> ---
>
> Key: CALCITE-3873
> URL: https://issues.apache.org/jira/browse/CALCITE-3873
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.22.0
>Reporter: neoremind
>Priority: Minor
>  Labels: pull-request-available
> Attachments: ReflectVisitorDispatcherTest.java, jmh_result.txt, 
> pic1.svg, pic2.svg
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> For curiosity, I use flame graph to profiling a simple query. The code 
> snippet looks like below.
> {code:java}
> String sql = "select empno, gender, name from EMPS where name = 'John'";
> Connection connection = null;
> Statement statement = null;
> try {
>   Properties info = new Properties();
>   info.put("model", jsonPath("smart"));
>   connection = DriverManager.getConnection("jdbc:calcite:", info);  
>   String x = null;
>   for (int i = 0; i < 5; i++) {
> statement = connection.createStatement();
> final ResultSet resultSet =
> statement.executeQuery(
> sql);
> while (resultSet.next()) {
>   x = resultSet.getInt(1)
>   + resultSet.getString(2)
>   + resultSet.getString(3);
> }  
>   }
> } catch (SQLException e) {
>   e.printStackTrace();
> } finally {
>   close(connection, statement);
> }
> {code}
>  
> I attach the generated flame graph [^pic1.svg]
> {code:java}
> 3% on sql2rel
> 9% on query optimizing,
> 62% of the time is spent on code gen and implementation,
> 20% on result set iterating and checking,
> … 
> {code}
> Hope this graph is informative. Since I start to learn Calcite recently, I 
> cannot tell where to start tuning, but from the graph one tiny point catches 
> my attention, I find there are many reflection invocations in 
> _Prepare#trimUnusedFields_. So, I spent some time trying to mitigate the 
> small overhead.
> I optimize _ReflectiveVisitDispatcher_ by introducing a global _Guava_ cache 
> with limited size to cache methods, also I add full unit tests for 
> _ReflectUtil_.
> I count the reference of the method: _ReflectUtil#createMethodDispatcher and_
> _ReflectUtil#createDispatcher (see below)._ Total 68 possible invocations, so 
> the cache size is limited, by caching all the methods during the lifecycle of 
> the process, we can eliminate reflection looking up methods overhead.
> {code:java}
> org.apache.calcite.rel.rel2sql.RelToSqlConverter: 18 possible invocations.
> org.apache.calcite.sql2rel.RelDecorrelator: 15 possible invocations.
> org.apache.calcite.sql2rel.RelFieldTrimmer: 11 possible invocations.
> org.apache.calcite.sql2rel.RelStructuredTypeFlattener.RewriteRelVisitor: 22 
> possible invocations.
> org.apache.calcite.interpreter.Interpreter.CompilerImpl: 2 possible 
> invocations.
> {code}
> Before introducing the global caching, caching is shared per 
> _ReflectiveVisitDispatcher_ instance, now different 
> _ReflectiveVisitDispatcher_ in different thread is able to reuse the cached 
> methods.
> See [^pic2.svg], after tuning, _trimUnusedFields_ only takes 0.64% of the 
> sampling time compared with 1.38% previously. I think this will help in a lot 
> more places.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-3873) Use global caching for ReflectiveVisitDispatcher implementation

2020-04-18 Thread neoremind (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086507#comment-17086507
 ] 

neoremind commented on CALCITE-3873:


The following is copied conclusion from github PR.

I have added new benchmark test case for baseline to disable guava caching. The 
result shows as below (code is in [my 
branch|https://github.com/neoremind/calcite/blob/refvisit_benchmark/ubenchmark/src/jmh/java/org/apache/calcite/benchmarks/ReflectiveVisitDispatcherTest2.java]).

{{Benchmark   
(guavaCacheMaxSize)  Mode  CntScoreError  Units
ReflectiveVisitDispatcherTest102.testGlobalCachingUsingGuavaCache   
  0  avgt5  116.446 ±  8.957  ns/op
ReflectiveVisitDispatcherTest102.testGlobalCachingUsingGuavaCache   
128  avgt5  133.678 ±  8.251  ns/op
ReflectiveVisitDispatcherTest102.testGlobalCachingUsingHashMap  
  0  avgt5   73.213 ±  7.614  ns/op
ReflectiveVisitDispatcherTest102.testInstanceCachingWithReflection  
  0  avgt5  227.130 ±  9.033  ns/op}}

Table title are operations performed by each solution.
|| ||lookup method||get by key from map||put key value to map||time cost||
|{{testGlobalCachingUsingGuavaCache with guavaCacheMaxSize = 0}} (Disable 
cache, looking up method every time for each call)|Y| | |116.446 ns/op|
|{{testGlobalCachingUsingGuavaCache with guavaCacheMaxSize = 128}} (Global 
cache with guava cache)| |Y| |133.678 ns/op|
|{{testGlobalCachingUsingHashMap}} (Global cache with JDK ConcurrentHashMap)| 
|Y| |73.213 ns/op|
|{{testInstanceCachingWithReflection}} (Original implementation to cache per 
instance for each call)|Y|Y|Y|227.130 ns/op|

The conclusion we could get is
 # Guava cache overhead is a little too big, compared to JDK ConcurrentHashMap. 
But JDK ConcurrentHashMap is not limit sized, so in the previous discussion, we 
do not choose JDK ConcurrentHashMap.
 # Guava cache overhead is even bigger than looking up method for each call. 
This is what surprised me.
 # The original implementation is the slowest one because it has to get from 
map, look up method then put method to map, it combines all the operations, the 
solution works well in the same instance, but for every Calcite invocation it 
has to sacrifice performance to look up method every time, that is where the PR 
tries to improve.

The current master version does improve performance and balance memory usage. 
But it is slower than 1) JDK ConcurrentHashMap for caching (but not limit 
size), 2) looking up method direct in each call without map operations.

Also, I have tested against MethodHandle and LambdaMetaFactory.

The result is as below.

{{Benchmark 
   Mode  Cnt  Score   Error  Units
ReflectiveVisitDispatcherTest.testGlobalCaching 
 avgt3137.270 ±32.275  ns/op
ReflectiveVisitDispatcherTest.testInstanceCachingWithLambdaFactory  
 avgt3  92459.660 ± 56777.698  ns/op
ReflectiveVisitDispatcherTest.testInstanceCachingWithLambdaFactoryThreadLocalInitialize
  avgt3199.687 ±24.755  ns/op
ReflectiveVisitDispatcherTest.testInstanceCachingWithMethodHandle   
 avgt3   2421.131 ±   633.756  ns/op
ReflectiveVisitDispatcherTest.testInstanceCachingWithMethodHandleThreadLocalInitialize
   avgt3218.616 ±41.754  ns/op
ReflectiveVisitDispatcherTest.testInstanceCachingWithReflection 
 avgt3224.603 ±22.770  ns/op
ReflectiveVisitDispatcherTest.testInstanceCachingWithReflectionThreadLocalInitialize
 avgt3218.406 ± 9.620  ns/op}}

>From the result, it seems the pre-work is time consuming for both 
>{{MethodHandle}} and {{LambdaFactory}} (built upon MethodHandle). To eliminate 
>the initialization effect, I try to use ThreadLocal to initialize once, the 
>result do verifies that, in terms of time cost, LambdaFactory < Reflection <= 
>MethodHandle, but the gap is very small.

With that, it will be a good option to stick to the current version. Hope the 
analysis and work could help people who interested in this. 

Many thanks to Vladimir, Haisheng, Danny, Chunwei, Stamatis to take time 
reviewing my code kindly :)

 

> Use global caching for ReflectiveVisitDispatcher implementation
> ---
>
> Key: CALCITE-3873
> URL: https://issues.apache.org/jira/browse/CALCITE-3873
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.22.0
>Reporter: neoremind
>Priority: Minor
>  Labels: pull-request-available
> Attachments: ReflectVisitorDispatcherTest.ja

[jira] [Commented] (CALCITE-3790) Make the URL of FileSource available

2020-04-18 Thread Vladimir Sitnikov (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086455#comment-17086455
 ] 

Vladimir Sitnikov commented on CALCITE-3790:


The CI was perfectly fine when PR was merged. The problem was that ASF Jenkins 
is running with a weird encoding/filesystem that does not allow Unicode 
filenames.

> Make the URL of FileSource available
> 
>
> Key: CALCITE-3790
> URL: https://issues.apache.org/jira/browse/CALCITE-3790
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Liya Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.23.0
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When a FileSource is constructed with only a File object, its URL is left 
> null. This makes it inconvenient for some scenarios where a valid URL is 
> required.
> This can be resolved, as each file in the local file system corresponds to a 
> file URL, and we fix it by converting a file object to a file URL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-3936) RelToSqlConverter changes target of ambiguous HAVING clause with a Project on Filter on Aggregate

2020-04-18 Thread Jin Xing (Jira)



[ 
https://issues.apache.org/jira/browse/CALCITE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086382#comment-17086382
 ] 

Jin Xing commented on CALCITE-3936:
---

Add a simpler test case:
{code:java}
// Sql: isHaving = false
SELECT DEPTNO, SUM(SAL) SAL
FROM EMP
GROUP BY DEPTNO
HAVING SUM(SAL) > 100

// Plan:
LogicalFilter(condition=[>($1, 100)])
  LogicalAggregate(group=[{0}], SAL=[SUM($1)])
LogicalProject(DEPTNO=[$7], SAL=[$5])
  JdbcTableScan(table=[[JDBC_SCOTT, EMP]])

// Converted: isHaving = true
SELECT DEPTNO, SUM(SAL) AS SAL
FROM SCOTT.EMP
GROUP BY DEPTNO
HAVING SUM(SAL) > 100{code}
 

[1] Calcite doesn't check if the filtering clause conflicts with the alias in 
Aggregate and convert the filtering condition directly, that's where the bug 
comes from;

We need to add a checking logic when converting a Filter/Aggregate pattern.

BTW, [~swtalbot], in the Jira title, do you mean "... for dialects with 
SqlConformance.isHavingAlias=true" ?

I think the issue happen when parse the sql with config (isHaving=false) and 
convert rel with config (isHaving=true).

 [1] 
[https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rel2sql/RelToSqlConverter.java#L315]

 

> RelToSqlConverter changes target of ambiguous HAVING clause with a Project on 
> Filter on Aggregate
> -
>
> Key: CALCITE-3936
> URL: https://issues.apache.org/jira/browse/CALCITE-3936
> Project: Calcite
>  Issue Type: Bug
>Reporter: Steven Talbot
>Priority: Major
>
> ... for dialects with SqlConformance.isHavingAlias=false
> Very, very similar to -CALCITE-3593.-
> Reproducing test case in RelToSqlConverter:
> {code:java}
> @Test public void testHavingAlias2() {
>   final String query = "select \"product_id\" + 1, sum(\"gross_weight\") as 
> gross_weight\n" +
>   " from \"product\"\n" +
>   " group by \"product_id\"\n" +
>   " having sum(\"product\".\"gross_weight\") < 200";
>   final String expected = "SELECT product_id + 1, GROSS_WEIGHT\n" +
>   "FROM (SELECT product_id, SUM(gross_weight) AS GROSS_WEIGHT\n" +
>   "FROM foodmart.product\n" +
>   "GROUP BY product_id\n" +
>   "HAVING SUM(product.gross_weight) < 200) AS t1"
>   // (or) "HAVING gross_weight < 200) AS t1"
>   // (or) ") AS t1\nWHERE t1.gross_weight < 200) AS t1"
>   // INSTEAD, we get "HAVING SUM(gross_weight) < 200) AS t1"
>   // which on BigQuery gives you an error about aggregating aggregates
>   ;
>   sql(query).withBigQuery().ok(expected);
> }
> {code}
> In that one, the pattern was Project/Filter/Aggregate, here it is 
> Filter/Aggregate/Project. In 3593, the project created a new alias, which got 
> added to the same SELECT clause and caused the ambiguity. Here, the aggregate 
> creates an alias, but the filter will write a HAVING clause using the aliases 
> from before the Aggregate, and that will cause the SQL engine to think that 
> the filter is on the aggregate field, rather than on the underlying field.
> Note that this is less an absurdly unlikely occurrence than it might seem 
> because when Calcite's default aliasing kicks in and everything gets the name 
> "$f6", "$f4", etc, so chances of a collision are higher if you have multiply 
> nested selects with default aliases.
> Potential fixes:
>  # force a subselect, as was done for 3593.
>  # Force the expression in the HAVING to be fully aliased by table (works at 
> least in BigQuery, where I tested)
>  # Write the HAVING expression in terms of the aliases from the aggregate, 
> rather than what's coming from the aggregate (also works on BigQuery)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Updated] (CALCITE-3937) Fire rule for RelSubset only when it is derived

[jira] [Created] (CALCITE-3937) Only fire rule for RelSubset when it is derived

[jira] [Closed] (CALCITE-3873) Use global caching for ReflectiveVisitDispatcher implementation

[jira] [Commented] (CALCITE-3873) Use global caching for ReflectiveVisitDispatcher implementation

[jira] [Commented] (CALCITE-3790) Make the URL of FileSource available

[jira] [Commented] (CALCITE-3936) RelToSqlConverter changes target of ambiguous HAVING clause with a Project on Filter on Aggregate

6 matches

Site Navigation

Mail list logo

Footer information