Re: [DISCUSS] State of the project 2019

2019-10-29 Thread Juan Pan
Sorry to disturb others.


 @Danny Chan Hi, i have not received your personal mail, and i sent you 
email(yuzhao@gmail.com?) as well, but no reply. :(



So i have to ping you in this way, please excuse me.






 Juan Pan


panj...@apache.org
Juan Pan(Trista), Apache ShardingSphere


On 10/25/2019 20:41,Danny Chan wrote:
Oh, you can add my weixin(send personal mail for that) and I have a free ticket 
for the conference !

Best,
Danny Chan
在 2019年10月25日 +0800 PM6:29,Juan Pan ,写道:
Hi Danny,


I am interested in your coming talk in Beijing China. How to take part in it, 
can you give me more detail?


Juan Pan


panj...@apache.org
Juan Pan(Trista), Apache ShardingSphere


On 10/23/2019 18:23,Danny Chan wrote:
I gave a talk last year in a university in
France, and nobody in the audience had ever heard of Calcite before.

Oops, that's a pity, I would also give a talk about Calcite on Flink Forward 
Asia 2019 of BeiJing China, hope more people would know Apache Calcite.

Best,
Danny Chan
在 2019年10月23日 +0800 PM2:36,dev@calcite.apache.org,写道:

I gave a talk last year in a university in
France, and nobody in the audience had ever heard of Calcite before.


[jira] [Created] (CALCITE-3459) AssertionError for using Timestamp/Time/Date in table function

2019-10-29 Thread Wang Yanlin (Jira)
Wang Yanlin created CALCITE-3459:


 Summary: AssertionError for using Timestamp/Time/Date in table 
function
 Key: CALCITE-3459
 URL: https://issues.apache.org/jira/browse/CALCITE-3459
 Project: Calcite
  Issue Type: Bug
Reporter: Wang Yanlin


Add the following test case in *TableFunctionTest* to reproduce, you need to 
add the implementation of user defined table function in *Smalls* before 
running the test.
{code:java}
@Test public void testTableFunctionWithTimeRelatedParameter() throws 
SQLException {
try (Connection connection = DriverManager.getConnection("jdbc:calcite:")) {
  CalciteConnection calciteConnection =
  connection.unwrap(CalciteConnection.class);
  SchemaPlus rootSchema = calciteConnection.getRootSchema();
  SchemaPlus schema = rootSchema.add("s", new AbstractSchema());

  final TableFunction table1 =
  TableFunctionImpl.create(Smalls.TIMESTAMP_STRING_LENGTH);
  schema.add("TimestampStringLength", table1);
  final String sql1 = "select *\n"
  + "from table(\"s\".\"TimestampStringLength\"(TIMESTAMP '2019-10-12 
19:00:35'))\n"
  + "as t(n, c) where n > 19";
  ResultSet resultSet1 = connection.createStatement().executeQuery(sql1);
  assertThat(CalciteAssert.toString(resultSet1),
  equalTo("N=20; C=abcdefg\n"));

  final TableFunction table2 =
  TableFunctionImpl.create(Smalls.DATE_STRING_LENGTH);
  schema.add("DateStringLength", table2);
  final String sql2 = "select *\n"
  + "from table(\"s\".\"DateStringLength\"(DATE '2019-10-12')) as t(n, 
c)\n"
  + "where n > 8";
  ResultSet resultSet2 = connection.createStatement().executeQuery(sql2);
  assertThat(CalciteAssert.toString(resultSet2),
  equalTo("N=9; C=abcdefghi\n"));

  final TableFunction table3 =
  TableFunctionImpl.create(Smalls.TIME_STRING_LENGTH);
  schema.add("TimeStringLength", table3);
  final String sql3 = "select *\n"
  + "from table(\"s\".\"TimeStringLength\"(TIME '19:00:35')) as t(n, 
c)\n"
  + "where n > 6";
  ResultSet resultSet3 = connection.createStatement().executeQuery(sql3);
  assertThat(CalciteAssert.toString(resultSet3),
  equalTo("N=7; C=abcdefg\n"));
}
  }
{code}

The stack trace of exception

{code:java}
java.lang.AssertionError: value 2019-10-12 19:00:35 does not match type class 
java.sql.Timestamp

at 
org.apache.calcite.linq4j.tree.ConstantExpression.(ConstantExpression.java:50)
at 
org.apache.calcite.linq4j.tree.Expressions.constant(Expressions.java:589)
at 
org.apache.calcite.linq4j.tree.OptimizeShuttle.visit(OptimizeShuttle.java:278)
at 
org.apache.calcite.linq4j.tree.UnaryExpression.accept(UnaryExpression.java:37)
at 
org.apache.calcite.linq4j.tree.GotoStatement.accept(GotoStatement.java:60)
at 
org.apache.calcite.linq4j.tree.BlockBuilder.optimize(BlockBuilder.java:437)
at 
org.apache.calcite.linq4j.tree.BlockBuilder.toBlock(BlockBuilder.java:321)
at 
org.apache.calcite.sql.validate.SqlUserDefinedTableMacro.coerce(SqlUserDefinedTableMacro.java:190)
at 
org.apache.calcite.sql.validate.SqlUserDefinedTableMacro.convertArguments(SqlUserDefinedTableMacro.java:110)
at 
org.apache.calcite.sql.validate.SqlUserDefinedTableFunction.getRowType(SqlUserDefinedTableFunction.java:70)
at 
org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:62)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3129)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3111)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3383)
at 
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at 
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:944)
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:651)
at 

[jira] [Created] (CALCITE-3458) Remove desc in AbstractRelNode

2019-10-29 Thread Haisheng Yuan (Jira)
Haisheng Yuan created CALCITE-3458:
--

 Summary: Remove desc in AbstractRelNode
 Key: CALCITE-3458
 URL: https://issues.apache.org/jira/browse/CALCITE-3458
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Haisheng Yuan


If the query is super large, e.g. contains tens of thousands of nodes or 
expressions, the RelNode digest and desc become very large. The content of desc 
and digest are almost the same, except that desc consists of id plus digest, 
which causes OOM in our production system. So remove desc, just use {{id + 
digest}} to produce description.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Quicksql

2019-10-29 Thread Julian Hyde
Anyone know anything about Quicksql? It seems to be quite a popular project, 
and they have an internal fork of Calcite.

https://github.com/Qihoo360/ 

https://github.com/Qihoo360/Quicksql/tree/master/analysis/src/main/java/org/apache/calcite
 


Julian



Re: Same rules fired for logical and physical nodes

2019-10-29 Thread Julian Hyde
This is probably a bigger discussion, but let me kick it off here anyway.

Some of our planner rules have public constructors with Class arguments, and 
therefore people can create their own rule instances that match particular 
sub-classes. Other planner rules do not have those public constructors, and 
therefore are hard-wired to accept logical RelNodes (e.g. LogicalProject) or 
the base RelNode (e.g. Project).

The reason that the latter, less modern, rules still exist is that it is a big 
task to refactor all existing rules, and so it hasn’t been done yet.

Is there a change we could make to the rule API that would make it easier to 
customize rules? I have been thinking for a while about adding an “operand 
builder” to replace the static methods (e.g. RelOptRule.operandJ) that we use 
to build the operands inside rules. Maybe as part of this change, we could 
allow rules to clone themselves with slightly different parameters.

I’m interested in hearing about a “big change” that would save us from a myriad 
of small changes in the coming years.

Julian


> On Oct 28, 2019, at 7:57 PM, Danny Chan  wrote:
> 
> Thanks Xiening Dai for bringing up this topic ~
> 
> I agree that for most of the planning rules, only matching logical nodes are 
> enough. There are 2 concerts from my side:
> 
> 
> • We have some plan rewrite that indeed happens on the “physical” node, so 
> some the rules may be still useful for such case, i.e. the ProjectMergeRule
> • I’m a little worried about the compatibility, maybe we can rule the test 
> cases like Apache Druid which give us more confidence that this is the right 
> way to go
> 
> 
> Best,
> Danny Chan
> 在 2019年10月29日 +0800 AM12:48,Xiening Dai ,写道:
>> Hi all,
>> 
>> While I was looking at CALCITE-2970, I noticed that some of the rules are 
>> fired for both logical and physical nodes. For example, ProjectMergeRule 
>> matches Project.class, so it’s fired for LogicalProject. But then after 
>> LogicalProject is converted into EnummerableProject, the same rule is fired 
>> again for the physical rels. Same for EnumerableLimitRule, 
>> SortRemoveConstantKeysRule, etc.
>> 
>> This seems to be unnecessary. When ProjectMerge is applied to LogicalProject 
>> nodes, we already generate all possible alternatives with merged projects. 
>> We just need to convert the LogicalProject into EnumerableProject. There’s 
>> no need to merge EnumerableProject again.
>> 
>> If I update those rules to only match logical nodes, the planning time of 
>> the case in CALCITE-2970 is reduced ~30%.
>> 
>> Any thoughts?



Re: [DISCUSSION] Cache Optimization in JdbcSchema

2019-10-29 Thread Julian Hyde
How much benefit is that massive cache giving you? If it’s not giving much 
benefit, maybe we should not be caching so much. Maybe in your case caching 
just the current schema (or the most recent 2 or 3 schemas) would be a better 
strategy.

As y’all know, I’m always in favor of removing caches. Or at least getting them 
to prove their worth.

Julian


> On Oct 28, 2019, at 11:50 PM, Feng Zhu  wrote:
> 
> Thanks, I will open a JIRA for discussion, with design doc and testing
> report.
> 
> Danny Chan  于2019年10月29日周二 下午12:11写道:
> 
>> Sounds very attractive, could you give an intuitive design doc to
>> illustrate how it works ? And we may review the design then ;)
>> 
>> Best,
>> Danny Chan
>> 在 2019年10月29日 +0800 AM10:36,Feng Zhu ,写道:
>>> Hi all,
>>> We made some optimizations in practice. But I'm not sure whether this
>> kind
>>> of change is necessary to the community, because it will make the code
>>> complex.
>>> 
>>> Current now, JdbcSchema caches all JdbcTables in tableMap (i.e.,*
>>> ImmutableMap tableMap*)
>>> 
>>> In our production environment, there are about 3000+ datasources and
>>> correspondingly creating 3000+ JdbcSchemas, while each JdbcSchema may
>>> contain up to 1+ tables.Consequently, the table map occupies nearly
>>> 10GB memory, bringing great pressure on the server.
>>> 
>>> We encode <*catalogName, schemaName, tableTypeName*> tuple as unique
>>> Integer, and simplify the table map as <*String, Integer*>. According to
>>> the Integer, we can find tuple and construct JdbcTable dynamically.
>> Benefit
>>> from this, the cached table map costs only about 800MB memory.
>>> 
>>> Best,
>>> DonnyZone
>> 



Re: [DISCUSS] Support Sql Hint for Calcite

2019-10-29 Thread Julian Hyde
Sure, we can make sure something gets into 1.22. There is consensus about the 
parser extensions, whereas the extensions to RelNode and the planner engine are 
a little more experimental. So let’s go forward with that, stating which parts 
we think are likely to change.

Julian


> On Oct 29, 2019, at 2:09 AM, Seliverstov Igor  wrote:
> 
> Colleagues,
> 
> Not only Hazelcast and Apache Flink are interested in SQL hints. Apache 
> Ignite community is working on Calcite integration too, it’s important for us 
> to have appropriate API at current development stage. This case we’ll be able 
> to adapt our solution for SQL hints usage, probably determining additional 
> approach weaknesses or inconveniences.
> 
> Regards,
> Igor
> 
>> 29 окт. 2019 г., в 11:51, Danny Chan  написал(а):
>> 
>> Julian, can we make  some effort to push this feature into release 1.22, 
>> there are users like Vladimir Ozerov from Hazelcast that are interesting on 
>> this feature, also the Apache Flink.
>> 
>> I agree that this internal design is not that perfect, at this moment, we 
>> may hardly to conclude a perfect solution, but at least, the syntax would 
>> remain unchanged in the future.
>> 
>> So can we mark this feature as experimental and we can promote the internal 
>> design when accept more feedbacks from the Calcite uses (from Apache Flink 
>> or from users like Vladimir).
>> 
>> Best,
>> Danny Chan
>> 在 2019年10月18日 +0800 AM4:55,Julian Hyde ,写道:
>>> I wonder whether it is possible to add some kind of “action handler” to the 
>>> planner engine, called, for example, when a rule has fired and is 
>>> registering the RelNode created by the rule. People can write their own 
>>> action handlers to copy hints around. Since the action handlers are the 
>>> user’s code, they can iterate faster to find a hint-propagation strategy 
>>> that works in practice.
>>> 
>>> Another idea is to use VolcanoPlanner.Provenance[1]. A RelNode can find its 
>>> ancestor RelNodes, and the rules that fired to create it. So it can grab 
>>> hints from those ancestors. It does not need to copy those hints onto 
>>> itself.
>>> 
>>> Julian
>>> 
>>> [1] 
>>> https://calcite.apache.org/apidocs/org/apache/calcite/plan/volcano/VolcanoPlanner.Provenance.html
>>>  
>>> 
>>> 
 On Oct 16, 2019, at 8:38 PM, Haisheng Yuan  wrote:
 
 Julian,
 Your concern is very valid, and that is also our main concern.
 I was thinking whether we can put hint into the MEMO group, so that both 
 logical and physical expression in the same group can share the same hint, 
 without copying the hint explicitly. But for newly generated expression 
 that doesn't belong to the original group, we still need to copy hints. 
 What's worse, in HepPlanner, there is no such concept, we may still need 
 to copy hints explicity in planner rules, if we want to keep the hint, 
 which is burdensome.
 
 - Haisheng
 
 --
 发件人:Danny Chan
 日 期:2019年10月16日 14:54:46
 收件人:
 主 题:Re: [DISCUSS] Support Sql Hint for Calcite
 
 Thanks for the clarification.
 
 I understand you worried. Yes, the effort/memory would be wasted or 
 meaningless if hints are not used. This is just what a hint does, it is a 
 “hint” and non-mandatory, but we should give the chance to let user see 
 them, it is the use that decide if to use the hints and how to use them. 
 For big queries I have no confidence to cover the corner cases. So can we 
 mark this feature as experimental and used for simple queries(no 
 decorrelation) first ?
 
 For “reversible”, during the implementation, I try to make the 
 modifications non-invasive with the current codes. That is why I made all 
 the interfaces about the hint into one class named RelWithHInt. Different 
 with trait, I didn’t force users to pass in the hints in the RelNode 
 constructor. I think if is not a bigwork if we want to remove the API.
 
 Best,
 Danny Chan
 在 2019年10月16日 +0800 AM11:14,Julian Hyde ,写道:
> By “skeptical” I mean that I think we can come up with a mechanism to 
> copy hints when applying planner rules, but even when we have implemented 
> that mechanism there will be many cases where people want a hint and that 
> hint is not copied to the RelNode where it is needed, and many other 
> cases where we spend the effort/memory of copying the hint to a RelNode 
> and the hint is not used.
> 
> By “reversible” I mean if we come up with an API that does not work, how 
> do we change or remove that API without people complaining?
> 
> Julian
> 
> 
>> On Oct 15, 2019, at 7:11 PM, Danny Chan  wrote:
>> 
>> Thanks Julian
>> 
>>> I am skeptical that RelWithHint will work for large queries.

[jira] [Created] (CALCITE-3457) RexSimplify incorrectly simplifies IS NOT NULL operator with ITEM call

2019-10-29 Thread Vova Vysotskyi (Jira)
Vova Vysotskyi created CALCITE-3457:
---

 Summary: RexSimplify incorrectly simplifies IS NOT NULL operator 
with ITEM call
 Key: CALCITE-3457
 URL: https://issues.apache.org/jira/browse/CALCITE-3457
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.22.0
Reporter: Vova Vysotskyi
 Fix For: 1.22.0


In CALCITE-3390 ITEM was marked with {{Policy.ANY}} strong policy, but 
according to its JavaDoc, the result may be null if and only if at least one of 
its arguments is null. This statement was used in 
{{RexSimplify.simplifyIsNotNull()}} method, so {{t1.c_nationkey[0] is not 
null}} will be simplified to {{IS NOT NULL($0)}} which is wrong, since array 
may be empty, or index may be less than the size of the array.

Unit test which helps to reproduce this issue:
{noformat}
  @Test public void testSimplifyItemIsNotNull() {
HepProgramBuilder programBuilder = HepProgram.builder()
.addRuleInstance(ReduceExpressionsRule.FILTER_INSTANCE);

String query = "select * from sales.customer as t1 where t1.c_nationkey[0] 
is not null";

checkPlanning(
createDynamicTester(),
null,
new HepPlanner(programBuilder.build()),
query,
false);
  }
{noformat}
Returns plan with incorrectly simplified ITEM expression:
{noformat}
LogicalProject(**=[$1])
  LogicalFilter(condition=[IS NOT NULL($0)])
LogicalTableScan(table=[[CATALOG, SALES, CUSTOMER]])
{noformat}

But the initial intention of CALCITE-3390 was to allow pushing ITEM expression 
to the right input of left-outer-join.

I propose to add a new element to the {{Policy}} which will have a relaxed 
condition - expression is null if at least one of its arguments is null.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Need help in adding keywords to Babel parser

2019-10-29 Thread Michael Mior
Adding to the list of keywords is a start, but you'll also have to
modify the parser to indicate where those keywords are valid to be
used. You'll also need to modify some of the data structures to store
information about what has been parsed.
--
Michael Mior
mm...@apache.org

Le mar. 29 oct. 2019 à 06:58, devjyoti patra  a écrit :
>
> Hi,
>
> I am writing a SQL statement (ANSI SQL or HiveQL) analyzer which will parse
> a given SQL statement into equivalent AST and I am using Babel for this use
> case. But when I try to parse statements such as  “lateral view explode”,
> which is a valid HiveQL syntax, Babel is throwing ParseException. Adding
> these as keywords to the default list of keywords for Babel also did not
> help. Can someone point me to an example where something similar has been
> done.
>
> Thanks,
> Devjyoti


[jira] [Created] (CALCITE-3456) AssertionError throws when aggregation has non-aggregate calls

2019-10-29 Thread Danny Chen (Jira)
Danny Chen created CALCITE-3456:
---

 Summary: AssertionError throws when aggregation has non-aggregate 
calls
 Key: CALCITE-3456
 URL: https://issues.apache.org/jira/browse/CALCITE-3456
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.22.0


Check this sql in SqlToRelConverterTest:

{code:java}
@Test public void testAggregateWithCaseWhen() {
final String sql = "select\n"
+ "  CASE WHEN job IN ('81', '82') THEN job\n"
+ "   ELSE 'error'\n"
+ "  END AS id,\n"
+ "  count(empno)\n"
+ "FROM emp\n"
+ "where job <> '' or job in ('81', '82')\n"
+ "GROUP by deptno, job";
sql(sql).ok();
  }
{code}

I tested PostgreSQL 9.6 and MySQL 5.6, this is a valid SQL and they both 
outputs the right result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Transaction Support - Guidelines

2019-10-29 Thread Kiran Purandara
Hi Stamatis!

Thank you for the clarification.
It gives me some direction as to where to start.

Regards,
Kiran

On Mon, Oct 28, 2019 at 6:25 PM Stamatis Zampetakis 
wrote:

> Hi Kiran,
>
> This is definitely the place to ask such questions!
>
> Unfortunately, there is no specific documentation or guidelines on how to
> support transactions for adapters.
>
> Below some very rough ideas on how this could be done, assuming that the
> complete query can be pushed in the underlying DBMS, which supports
> transactions.
>
> I guess you need your own implementation of the Connection interface that
> wraps internally the real connection and delegate the planning process to
> Calcite.
> You could use the various components of Calcite to arrive to an executable
> plan in the JdbcConvention.
> Then you can transform the plan to SQL using RelToSqlConverter and the
> appropriate dialect and then execute it using the real connection.
>
> Best,
> Stamatis
>
>
>
>
>
> On Mon, Oct 28, 2019 at 4:11 PM Kiran Purandara 
> wrote:
>
> > Hello,
> >
> > This is my 1st question/post so please excuse me in case this is not
> > the right forum for such questions or it should be posted elsewhere.
> >
> > I am working on writing a custom adapter. We need to have finer control
> on
> > both read/write pipelines. In short, we want the adapter to be an
> interface
> > for applications that until now worked with conventional RDBMS which
> means
> > support all the SQL constructs *but also provides us the capabilities
> that
> > calcite enables*.
> > Are there any guidelines published for what needs to be extended in order
> > to support say something as basic as transactions?
> > Since CalciteMetaImpl commit is explicitly throwing exceptions, its not
> > possible to do something like...
> >
> > Connection conn = ...
> > conn.setAutoCommit(false);
> > try{
> > PreparedStatement prep_stmt = conn.prepareStatement(sql.toString());
> >
> > while(...){
> >   //set values..
> >   prep_stmt.addBatch()/prep_stmt.executeBatch();
> > }
> > }
> > conn.commit();//or abort
> >
> > Regards,
> > Kiran
> >
> > PS:  This is the only reference I found when I looked for "calcite
> > transactions not working".
> >
> > [1]
> >
> >
> https://mail-archives.apache.org/mod_mbox/calcite-dev/201502.mbox/%3ccamctme+wsqq8ehzapyc9-3stzpb7d5ayxy0aa7pzbyucjnh...@mail.gmail.com%3E
> >
>


Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Ozerov
Hi Vladimir,

I am sorry. Pushed, it works now.

вт, 29 окт. 2019 г. в 14:41, Vladimir Sitnikov :

> > mvn clean test
>
> [ERROR] The goal you specified requires a project to execute but there is
> no POM in this directory
>
> Vladimir, please push missing files
>
> Vladimir
>


Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Sitnikov
> mvn clean test

[ERROR] The goal you specified requires a project to execute but there is
no POM in this directory

Vladimir, please push missing files

Vladimir


Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Ozerov
Hi colleagues,

I prepared (relatively) simple reproducer for the problem [1]. It can be
executed with three commands:
git clone https://github.com/devozerov/calcite-optimizer.git
cd calcite-optimizer
mvn clean test

Please do not pay attention to stuff located inside "org.apache.calcite"
package, as it is most likely of little interest for the given problem. I
removed a lot of unrelated stuff for the sake of simplicity, so some things
may seem strange or unnecessary.

So what is going on here?
1) We have a distributed engine that should assemble data from remote
nodes. Let's say, we have three distribution types [2]:
ANY - abstract unknown distribution, which is yet to be resolved during
physical planning. This is the default
REPLICATED - this is how the tables are organized - they are copied to all
nodes
SINGLETON - in this example, this is the distribution of a final aggregator
node which is enforced during physical planning

REPLICATED satisfies SINGLETON. SINGLETON doesn't satisfy REPLICATED.

2) There is a query "SELECT FUNC(f) FROM t", which is converted to the
following form with LOGICAL convention during logical planning:
RootLogicalRel[convention=LOGICAL, distribution=ANY]
-> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
  -> MapScanLogicalRel[convention=LOGICAL, distribution=ANY]

3) Then, during physical planning we need to resolve and propagate
distributions, and enforce [convention=PHYSICAL, distribution=SINGLETON],
possibly injecting exchange operators in between. In this specific case, no
exchanges are needed, because scans are always REPLICATED, projection
inherits scan distribution, and for the root node is ok to use REPLICATED
input instead of SINGLETON, since the first one satisfies the latter. The
expected physical tree is:
RootPhysicalRel[convention=PHYSICAL, distribution=SINGLETON]
-> ProjectPhysicalRel[convention=PHYSICAL, distribution=REPLICATED]
  -> MapScanPhysicalRel[convention=PHYSICAL, distribution=REPLICATED]

4) There are three physical rules for each operator, which are organized as
follows:
4.1) RootPhysicalRule [4] - converts RootLogicalRel node, enforcing
SINGLETON distribution on its input. As a result, an appropriate exchange
is injected later from DistributionTraitDef [3]. As I said, in this
specific query exchange is not needed
4.2) MapScanPhysicalRule [5] - converts MapScanLogicalRel. At this point,
we resolve input distribution.
4.3) ProjectPhysicalRule [6] - transforms logical project to physical
project *ONLY* if there is an underlying physical input with REPLICATED or
SINGLETON distribution.

5) Original case fails because the rules are executed as follows:
5.1) RootPhysicalRule - do transform
5.2) ProjectPhysicalRule - do nothing at this point
5.3) MapScanPhysicalRule - do transform
I would like ProjectPhysicalRule to be re-fired again, but this doesn't
happen.

There are two workarounds for this, and both concern me.

*Workaround 1*: set "canConvertConvention" to "true". In this case, rules
are executed in the way I would expect. But I observe too many rule
invocations and unnecessary trait conversions because enabling this flag
leads to matching every rel in a set with every other. For example, Volcano
tries to convert SINGLETON -> REPLICATED, while it is never actually needed
by my engine. And this is how rules are executed:
1) RootPhysicalRule - expected
2) RootPhysicalRule - fired again, because I created a converted node on
the previous step
3) ProjectPhysicalRule - expected
4) ProjectPhysicalRule - fired again for the same reason as p.2
5) MapScanPhysicalRule - expected
6) ProjectPhysicalRule - parent rule is called as I would expect - this is
where the missing physical project appear
7) RootPhysicalRule - called because its child was transformed
+ ExpandConversionRule is called multiple times, and sometimes it creates
nodes, which are otherwise completely unnecessary. Such as the
aforementioned SINGLETON -> REPLICATED conversion.
So in the end, it works as I would expect even for much more complex plans.
But produces a litter which I cannot control

Resulting plan:
RootPhysicalRel.PHYSICAL.SINGLETON.[](input=ProjectPhysicalRel#91)
  ProjectPhysicalRel.PHYSICAL.REPLICATED.[](input=MapScanPhysicalRel#81,
...))
MapScanPhysicalRel.PHYSICAL.REPLICATED.[](table=[t],projects=[0])

*Workaround 2*: add exchanges for ANY distribution and produce
transformation with ANY distribution in the project rule. This is what
Drill does. But it also produces a litter. This time these are unnecessary
exchanges; we pessimistically create them when input is ANY, and if it
eventually resolved to any compatible distribution, the exchange stays
there. As far as I understand, Drill's ExcessiveExchangeIdentifier removes
that stuff after planning separately.

Resulting plan:
RootPhysicalRel.PHYSICAL.SINGLETON.[](input=UnicastExchangePhysicalRel#30)

UnicastExchangePhysicalRel.PHYSICAL.SINGLETON.[](input=ProjectPhysicalRel#29,hashFields=[0])
// Not needed!

Need help in adding keywords to Babel parser

2019-10-29 Thread devjyoti patra
Hi,

I am writing a SQL statement (ANSI SQL or HiveQL) analyzer which will parse
a given SQL statement into equivalent AST and I am using Babel for this use
case. But when I try to parse statements such as  “lateral view explode”,
which is a valid HiveQL syntax, Babel is throwing ParseException. Adding
these as keywords to the default list of keywords for Babel also did not
help. Can someone point me to an example where something similar has been
done.

Thanks,
Devjyoti


Re: [DISCUSS] Support Sql Hint for Calcite

2019-10-29 Thread Seliverstov Igor
Colleagues,

Not only Hazelcast and Apache Flink are interested in SQL hints. Apache Ignite 
community is working on Calcite integration too, it’s important for us to have 
appropriate API at current development stage. This case we’ll be able to adapt 
our solution for SQL hints usage, probably determining additional approach 
weaknesses or inconveniences.

Regards,
Igor

> 29 окт. 2019 г., в 11:51, Danny Chan  написал(а):
> 
> Julian, can we make  some effort to push this feature into release 1.22, 
> there are users like Vladimir Ozerov from Hazelcast that are interesting on 
> this feature, also the Apache Flink.
> 
> I agree that this internal design is not that perfect, at this moment, we may 
> hardly to conclude a perfect solution, but at least, the syntax would remain 
> unchanged in the future.
> 
> So can we mark this feature as experimental and we can promote the internal 
> design when accept more feedbacks from the Calcite uses (from Apache Flink or 
> from users like Vladimir).
> 
> Best,
> Danny Chan
> 在 2019年10月18日 +0800 AM4:55,Julian Hyde ,写道:
>> I wonder whether it is possible to add some kind of “action handler” to the 
>> planner engine, called, for example, when a rule has fired and is 
>> registering the RelNode created by the rule. People can write their own 
>> action handlers to copy hints around. Since the action handlers are the 
>> user’s code, they can iterate faster to find a hint-propagation strategy 
>> that works in practice.
>> 
>> Another idea is to use VolcanoPlanner.Provenance[1]. A RelNode can find its 
>> ancestor RelNodes, and the rules that fired to create it. So it can grab 
>> hints from those ancestors. It does not need to copy those hints onto itself.
>> 
>> Julian
>> 
>> [1] 
>> https://calcite.apache.org/apidocs/org/apache/calcite/plan/volcano/VolcanoPlanner.Provenance.html
>>  
>> 
>> 
>>> On Oct 16, 2019, at 8:38 PM, Haisheng Yuan  wrote:
>>> 
>>> Julian,
>>> Your concern is very valid, and that is also our main concern.
>>> I was thinking whether we can put hint into the MEMO group, so that both 
>>> logical and physical expression in the same group can share the same hint, 
>>> without copying the hint explicitly. But for newly generated expression 
>>> that doesn't belong to the original group, we still need to copy hints. 
>>> What's worse, in HepPlanner, there is no such concept, we may still need to 
>>> copy hints explicity in planner rules, if we want to keep the hint, which 
>>> is burdensome.
>>> 
>>> - Haisheng
>>> 
>>> --
>>> 发件人:Danny Chan
>>> 日 期:2019年10月16日 14:54:46
>>> 收件人:
>>> 主 题:Re: [DISCUSS] Support Sql Hint for Calcite
>>> 
>>> Thanks for the clarification.
>>> 
>>> I understand you worried. Yes, the effort/memory would be wasted or 
>>> meaningless if hints are not used. This is just what a hint does, it is a 
>>> “hint” and non-mandatory, but we should give the chance to let user see 
>>> them, it is the use that decide if to use the hints and how to use them. 
>>> For big queries I have no confidence to cover the corner cases. So can we 
>>> mark this feature as experimental and used for simple queries(no 
>>> decorrelation) first ?
>>> 
>>> For “reversible”, during the implementation, I try to make the 
>>> modifications non-invasive with the current codes. That is why I made all 
>>> the interfaces about the hint into one class named RelWithHInt. Different 
>>> with trait, I didn’t force users to pass in the hints in the RelNode 
>>> constructor. I think if is not a bigwork if we want to remove the API.
>>> 
>>> Best,
>>> Danny Chan
>>> 在 2019年10月16日 +0800 AM11:14,Julian Hyde ,写道:
 By “skeptical” I mean that I think we can come up with a mechanism to copy 
 hints when applying planner rules, but even when we have implemented that 
 mechanism there will be many cases where people want a hint and that hint 
 is not copied to the RelNode where it is needed, and many other cases 
 where we spend the effort/memory of copying the hint to a RelNode and the 
 hint is not used.
 
 By “reversible” I mean if we come up with an API that does not work, how 
 do we change or remove that API without people complaining?
 
 Julian
 
 
> On Oct 15, 2019, at 7:11 PM, Danny Chan  wrote:
> 
> Thanks Julian
> 
>> I am skeptical that RelWithHint will work for large queries.
> 
> For “skeptical” do you mean how to transfer the hints during rule 
> planning ? I’m also not that confident yet.
> 
>> How do we introduce it in a reversible way
> Do you mean transform the RelWithHint back into the SqlHint ? I didn’t 
> implement it in current patch, but I think we have the ability to do that 
> because we have a inheritPath for each RelWithHint, we can collect all 
> the hints together and merge 

Re: [DISCUSS] Support Sql Hint for Calcite

2019-10-29 Thread Danny Chan
Julian, can we make  some effort to push this feature into release 1.22, there 
are users like Vladimir Ozerov from Hazelcast that are interesting on this 
feature, also the Apache Flink.

I agree that this internal design is not that perfect, at this moment, we may 
hardly to conclude a perfect solution, but at least, the syntax would remain 
unchanged in the future.

So can we mark this feature as experimental and we can promote the internal 
design when accept more feedbacks from the Calcite uses (from Apache Flink or 
from users like Vladimir).

Best,
Danny Chan
在 2019年10月18日 +0800 AM4:55,Julian Hyde ,写道:
> I wonder whether it is possible to add some kind of “action handler” to the 
> planner engine, called, for example, when a rule has fired and is registering 
> the RelNode created by the rule. People can write their own action handlers 
> to copy hints around. Since the action handlers are the user’s code, they can 
> iterate faster to find a hint-propagation strategy that works in practice.
>
> Another idea is to use VolcanoPlanner.Provenance[1]. A RelNode can find its 
> ancestor RelNodes, and the rules that fired to create it. So it can grab 
> hints from those ancestors. It does not need to copy those hints onto itself.
>
> Julian
>
> [1] 
> https://calcite.apache.org/apidocs/org/apache/calcite/plan/volcano/VolcanoPlanner.Provenance.html
>  
> 
>
> > On Oct 16, 2019, at 8:38 PM, Haisheng Yuan  wrote:
> >
> > Julian,
> > Your concern is very valid, and that is also our main concern.
> > I was thinking whether we can put hint into the MEMO group, so that both 
> > logical and physical expression in the same group can share the same hint, 
> > without copying the hint explicitly. But for newly generated expression 
> > that doesn't belong to the original group, we still need to copy hints. 
> > What's worse, in HepPlanner, there is no such concept, we may still need to 
> > copy hints explicity in planner rules, if we want to keep the hint, which 
> > is burdensome.
> >
> > - Haisheng
> >
> > --
> > 发件人:Danny Chan
> > 日 期:2019年10月16日 14:54:46
> > 收件人:
> > 主 题:Re: [DISCUSS] Support Sql Hint for Calcite
> >
> > Thanks for the clarification.
> >
> > I understand you worried. Yes, the effort/memory would be wasted or 
> > meaningless if hints are not used. This is just what a hint does, it is a 
> > “hint” and non-mandatory, but we should give the chance to let user see 
> > them, it is the use that decide if to use the hints and how to use them. 
> > For big queries I have no confidence to cover the corner cases. So can we 
> > mark this feature as experimental and used for simple queries(no 
> > decorrelation) first ?
> >
> > For “reversible”, during the implementation, I try to make the 
> > modifications non-invasive with the current codes. That is why I made all 
> > the interfaces about the hint into one class named RelWithHInt. Different 
> > with trait, I didn’t force users to pass in the hints in the RelNode 
> > constructor. I think if is not a bigwork if we want to remove the API.
> >
> > Best,
> > Danny Chan
> > 在 2019年10月16日 +0800 AM11:14,Julian Hyde ,写道:
> > > By “skeptical” I mean that I think we can come up with a mechanism to 
> > > copy hints when applying planner rules, but even when we have implemented 
> > > that mechanism there will be many cases where people want a hint and that 
> > > hint is not copied to the RelNode where it is needed, and many other 
> > > cases where we spend the effort/memory of copying the hint to a RelNode 
> > > and the hint is not used.
> > >
> > > By “reversible” I mean if we come up with an API that does not work, how 
> > > do we change or remove that API without people complaining?
> > >
> > > Julian
> > >
> > >
> > > > On Oct 15, 2019, at 7:11 PM, Danny Chan  wrote:
> > > >
> > > > Thanks Julian
> > > >
> > > > > I am skeptical that RelWithHint will work for large queries.
> > > >
> > > > For “skeptical” do you mean how to transfer the hints during rule 
> > > > planning ? I’m also not that confident yet.
> > > >
> > > > > How do we introduce it in a reversible way
> > > > Do you mean transform the RelWithHint back into the SqlHint ? I didn’t 
> > > > implement it in current patch, but I think we have the ability to do 
> > > > that because we have a inheritPath for each RelWithHint, we can collect 
> > > > all the hints together and merge them into the SqlHints, then propagate 
> > > > these SqlHints to the SqlNodes.
> > > >
> > > > > What are the other options?
> > > > Do you mean the way to transfer hints during planning ? I have no other 
> > > > options yet.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2019年10月16日 +0800 AM8:03,dev@calcite.apache.org,写道:
> > > > >
> > > > > I am skeptical that RelWithHint will work for large queries.
> > >
> >
>


Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Ozerov
Hi everybody,

First of all, thank you for answers and suggestions. Let me address them
briefly:
1) I use two conventions at the moment, LOGICAL and PHYSICAL. I agree with
you that this might be overkill, and there is a chance that in the final
solution we may end up with only one. But meanwhile having this separation
seems handy, because on the first stage I enforce the optimizer to
propagate NONE -> LOGICAL conversions. Then I have a clean logical tree
*before* any physical distribution stuff is involved, which I can use for
internal post-processing before going logical. I would propose to keep it
out of scope at the moment. Let's just consider that the goal is to convert
from one convention to another.
2) Operands with "any" matchers are already used
3) The reason why I would like to avoid the "Project" rule fire on the
first run, is that it doesn't enforce any distribution on its child
("Scan"). Instead, it needs to derive the distribution from the scan.

To make the problem more clear, let me prepare a simple reproducer for the
issue.

Regards,
Vladimir.

вт, 29 окт. 2019 г. в 10:01, Seliverstov Igor :

> Vladimir,
>
> I guess Project rule doesn't have a child matcher. Put into it child "any"
> match rule and it will apply on each child node transformation.
>
> Regards,
> Igor
>
>
> вт, 29 окт. 2019 г., 7:07 Danny Chan :
>
> > Vladimir, all you need to do is to change the convention of the root
> node,
> > the volcano would propagate the convention to all its input nodes when
> > registering them to the planner. You can take this code [1] for
> reference :)
> >
> > [1]
> >
> https://github.com/apache/calcite/blob/1ef2821695ca6e10fbad7b8efe7246c4a20143af/core/src/main/java/org/apache/calcite/tools/Programs.java#L324
> >
> > Best,
> > Danny Chan
> > 在 2019年10月29日 +0800 AM5:24,dev@calcite.apache.org,写道:
> > >
> > > n of the scan.
> >
>


Re: Problem with converters and possibly rule matching

2019-10-29 Thread Seliverstov Igor
Vladimir,

I guess Project rule doesn't have a child matcher. Put into it child "any"
match rule and it will apply on each child node transformation.

Regards,
Igor


вт, 29 окт. 2019 г., 7:07 Danny Chan :

> Vladimir, all you need to do is to change the convention of the root node,
> the volcano would propagate the convention to all its input nodes when
> registering them to the planner. You can take this code [1] for reference :)
>
> [1]
> https://github.com/apache/calcite/blob/1ef2821695ca6e10fbad7b8efe7246c4a20143af/core/src/main/java/org/apache/calcite/tools/Programs.java#L324
>
> Best,
> Danny Chan
> 在 2019年10月29日 +0800 AM5:24,dev@calcite.apache.org,写道:
> >
> > n of the scan.
>


Re: [DISCUSSION] Cache Optimization in JdbcSchema

2019-10-29 Thread Feng Zhu
Thanks, I will open a JIRA for discussion, with design doc and testing
report.

Danny Chan  于2019年10月29日周二 下午12:11写道:

> Sounds very attractive, could you give an intuitive design doc to
> illustrate how it works ? And we may review the design then ;)
>
> Best,
> Danny Chan
> 在 2019年10月29日 +0800 AM10:36,Feng Zhu ,写道:
> > Hi all,
> > We made some optimizations in practice. But I'm not sure whether this
> kind
> > of change is necessary to the community, because it will make the code
> > complex.
> >
> > Current now, JdbcSchema caches all JdbcTables in tableMap (i.e.,*
> > ImmutableMap tableMap*)
> >
> > In our production environment, there are about 3000+ datasources and
> > correspondingly creating 3000+ JdbcSchemas, while each JdbcSchema may
> > contain up to 1+ tables.Consequently, the table map occupies nearly
> > 10GB memory, bringing great pressure on the server.
> >
> > We encode <*catalogName, schemaName, tableTypeName*> tuple as unique
> > Integer, and simplify the table map as <*String, Integer*>. According to
> > the Integer, we can find tuple and construct JdbcTable dynamically.
> Benefit
> > from this, the cached table map costs only about 800MB memory.
> >
> > Best,
> > DonnyZone
>


Re: Query hints support and example of extended SQL syntax

2019-10-29 Thread Danny Chan
I’m trying to push it into release-1.22 (next release), wait for our good news !

Best,
Danny Chan
在 2019年10月29日 +0800 PM2:11,Vladimir Ozerov ,写道:
> Hi Danny,
>
> I looked at your design proposal and it looks great. All common hint cases
> that we may possibly need are covered there. Thank you.
>
> I hope the community will release it soon since hints are essential for
> production-grade systems. From what I see in other projects, such as Flink
> or Drill, they have to rely on properties set somewhere else
> (configuration, session), to achieve the same goals, which is not
> user-friendly.
>
> Regards,
> Vladimir.
>
> пн, 28 окт. 2019 г. в 13:26, Danny Chan :
>
> > Here is the discussion mailing list [1]
> >
> > [1]
> > https://ponymail-vm.apache.org/_GUI_/thread.html/db3799d70232ec85e294bb8885431a3f5c88bb28f4fd3337368b1480@%3Cdev.calcite.apache.org%3E
> >
> > Best,
> > Danny Chan
> > 在 2019年10月28日 +0800 PM5:09,Vladimir Ozerov ,写道:
> > > Hi colleagues,
> > >
> > > I am working on Calcite integration with Hazelcast. One particular
> > problem
> > > is that we need query hints. I found a couple of tickets [1] [2], and it
> > > seems that hints are not readily available at the moment. So I have two
> > > questions:
> > > 1) Is it true that there is no built-in way to provide query hints at the
> > > moment? Are there any plans to finally add hints to the engine?
> > > 2) As a possible solution, I am considering extending SELECT syntax with
> > a
> > > custom parser extension. Do you know any example of extended SELECT
> > syntax
> > > in existing Calcite integrations? I found an example of SqlInsert command
> > > extension in Apache Flink, but maybe somebody already extended SELECT as
> > > well?
> > >
> > > Thank you,
> > > Vladimir.
> > >
> > > [1] https://issues.apache.org/jira/browse/CALCITE-482
> > > [2] https://issues.apache.org/jira/browse/CALCITE-495
> >


Re: Query hints support and example of extended SQL syntax

2019-10-29 Thread Vladimir Ozerov
Hi Danny,

I looked at your design proposal and it looks great. All common hint cases
that we may possibly need are covered there. Thank you.

I hope the community will release it soon since hints are essential for
production-grade systems. From what I see in other projects, such as Flink
or Drill, they have to rely on properties set somewhere else
(configuration, session), to achieve the same goals, which is not
user-friendly.

Regards,
Vladimir.

пн, 28 окт. 2019 г. в 13:26, Danny Chan :

> Here is the discussion mailing list [1]
>
> [1]
> https://ponymail-vm.apache.org/_GUI_/thread.html/db3799d70232ec85e294bb8885431a3f5c88bb28f4fd3337368b1480@%3Cdev.calcite.apache.org%3E
>
> Best,
> Danny Chan
> 在 2019年10月28日 +0800 PM5:09,Vladimir Ozerov ,写道:
> > Hi colleagues,
> >
> > I am working on Calcite integration with Hazelcast. One particular
> problem
> > is that we need query hints. I found a couple of tickets [1] [2], and it
> > seems that hints are not readily available at the moment. So I have two
> > questions:
> > 1) Is it true that there is no built-in way to provide query hints at the
> > moment? Are there any plans to finally add hints to the engine?
> > 2) As a possible solution, I am considering extending SELECT syntax with
> a
> > custom parser extension. Do you know any example of extended SELECT
> syntax
> > in existing Calcite integrations? I found an example of SqlInsert command
> > extension in Apache Flink, but maybe somebody already extended SELECT as
> > well?
> >
> > Thank you,
> > Vladimir.
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-482
> > [2] https://issues.apache.org/jira/browse/CALCITE-495
>