Re: [question] how to do not include columns in select *

2023-09-01 Thread Julian Hyde
I’d write a jira case, specifying the feature in such a way that everyone who 
uses Calcite can benefit from it, then I’d write some unit tests (including 
some negative tests), and then the coding part would be relatively 
straightforward.

Probably not the answer you were hoping for, but that’s honestly how I’d 
approach it.

Julian

PS Please subscribe to dev@ so that you receive replies and so that your reply 
does not need to be moderated.

> On Sep 1, 2023, at 4:00 AM, Gonzalo Ortiz Jaureguizar  
> wrote:
> 
> Hi there,
> 
> I'm a contributor to Apache Pinot, which uses Apache Calcite under the
> hood. Like some other databases, Pinot defines some *private* columns that
> customers can use, but they are not included in select * by default. I
> guess this is a not so strange pattern. For example, Postgres does the same
> with `xmin` and `xmax`.
> 
> My question is: What is the correct way to implement this behavior? We
> initially override SqlValidatorImpl.expandStar in our own Validator, but
> recently we have enabled identifierExpansion and these columns are
> projected again.
> 
> Reading the code, it looks like we could
> override SqlValidatorImpl.addToSelectList doing something like:
> 
> ```
> protected void addToSelectList(
>  List list,
>  Set aliases,
>  List> fieldList,
>  SqlNode exp,
>  SelectScope scope,
>  final boolean includeSystemVars) {
>   if (!isPrivateColumn(exp)) {
> super.addToSelectList(list, aliases, fieldList, expr, scope,
> includeSystemVars);
>   }
> }
> ```
> 
> But the javadoc doesn't say that it is allowed to not add the column to
> list and fieldList, so that change may break something.



Re: PR CI Failures Question

2023-08-31 Thread Julian Hyde
Actually, no need to force-push, just restart the job.

> On Aug 31, 2023, at 2:41 PM, Julian Hyde  wrote:
> 
> I searched for the error message and it’s maybe a corrupted Gradle cache. 
> That sounds plausible, because the message doesn’t look specific to Windows 
> or your change. Just one container gone bad. I’d force-push and see whether I 
> get a healthier container next run.
> 
> Julian
> 
> [1] 
> https://stackoverflow.com/questions/65645510/cannot-access-script-base-class-org-gradle-kotlin-dsl-kotlinbuildscript
>  
> 
>> On Aug 31, 2023, at 9:13 AM, Tanner Clary  
>> wrote:
>> 
>> Hi Calcite Developers,
>> 
>> I have a PR [1] that is failing the CI checks for Windows [2]. I am not
>> familiar with these errors and am wondering if anyone has any suggestions
>> on how I could fix them. Thanks so much.
>> 
>> Best,
>> Tanner Clary
>> 
>> [1] https://github.com/apache/calcite/pull/3391
>> [2]
>> https://github.com/apache/calcite/actions/runs/6039592258/job/16388593881
> 



Re: PR CI Failures Question

2023-08-31 Thread Julian Hyde
I searched for the error message and it’s maybe a corrupted Gradle cache. That 
sounds plausible, because the message doesn’t look specific to Windows or your 
change. Just one container gone bad. I’d force-push and see whether I get a 
healthier container next run.

Julian

[1] 
https://stackoverflow.com/questions/65645510/cannot-access-script-base-class-org-gradle-kotlin-dsl-kotlinbuildscript
 

> On Aug 31, 2023, at 9:13 AM, Tanner Clary  
> wrote:
> 
> Hi Calcite Developers,
> 
> I have a PR [1] that is failing the CI checks for Windows [2]. I am not
> familiar with these errors and am wondering if anyone has any suggestions
> on how I could fix them. Thanks so much.
> 
> Best,
> Tanner Clary
> 
> [1] https://github.com/apache/calcite/pull/3391
> [2]
> https://github.com/apache/calcite/actions/runs/6039592258/job/16388593881



Re: [Question] How to test the correctness of an optimization rule in quidem?

2023-08-30 Thread Julian Hyde
How about creating a RelOptRules#EXTENDED_RULES (or using some existing 
collection) and provide a means within a Quidem test to turn on that set of 
rules.

> On Aug 28, 2023, at 9:56 PM, LakeShen  wrote:
> 
> Hi Julian, thanks for you reply.I have got some information from previous
> commits, a new rule are added to the RelOptRules#ABSTRACT_RULES collection,
> then in RelOptUtil#registerDefaultRules method,It is added to the default
> rules,so that we could add the test sql in .iq file and test the
> correctness of results for rule.
> 
> I'm not sure whether this is a good way,could we add a standard process to
> test the correctness of the results for the optimization rules  in calcite?
> 
> Best,
> LakeShen
> 
> 在 2023年8月29日 03:26:03 上,Julian Hyde  写道:
> 
>> The meta-answer: I would use ‘git log’ and identify commit that modify
>> both RelOptRulesTest.java and also at least one .iq file.
>> 
>> I don’t think there’s a way to selectively enable a planner rule in a .iq
>> file. I don’t know if there is a way to add such a mechanism which is
>> reasonably simple for people to understand.
>> 
>> If the rule is added to the standard set of rules then it’s easy to use
>> the rule in a Quidem test. I haven’t thought about whether this rule should
>> be in the standard rule set.
>> 
>> Julian
>> 
>> 
>> On Aug 28, 2023, at 5:36 AM, LakeShen  wrote:
>> 
>> 
>> Hi Community,
>> 
>> 
>> Now I have created the CALCITE-5889 and have added the
>> MinusToDistinctRule,I
>> 
>> want to test the correctness of my optimization rules in quidem,but I
>> 
>> didn't find any unit test that I could refer to. Could the community give
>> 
>> me some help?
>> 
>> 
>> Usually, we finished the optimization rules, we could add the unit test in
>> 
>> RelOptRulesTest , and then in RelOptRulesTest.xml file, we need add the
>> 
>> expected results of the plan.If we could add end-to-end testing to our
>> 
>> optimization rules, we could ensure that our rules are correct.
>> 
>> 
>> Best,
>> 
>> LakeShen
>> 
>> 
>> 



Re: Filter push-down below join not occurring for custom UDF

2023-08-30 Thread Julian Hyde
I wanted to figure out whether the transformation that you want is valid. Let’s 
convert your query

  select T1.id, T2.id
  from T1 left join T2 on T1.id = T2.C201
  where test(T1.id)

into one of the same shape:

  select d.deptno, e.empno
  from dept as d left join emp as e on d.deptno = e.deptno
  where is_odd_or_null(d.deptno)

Is this query equivalent to

  select d.deptno, e.empno
  from (select * from dept as d
where is_odd_or_null(d.deptno)
  ) as d left join emp as e on d.deptno = e.deptno

for all possible is_odd_or_null functions, and without assuming and primary or 
foreign key constraints, and without assuming that any columns are not null?

Yes, I think it is. In particular, we do not need any knowledge of the ’strong’ 
behavior of is_odd_or_null.

Julian



> On Aug 30, 2023, at 10:02 AM, Ian Bertolacci 
>  wrote:
> 
> Hi Benchao,
> Looking at this closer, I think I see what you mean.
> The distinguishing factor here is that the Policy [1] for the EQUALS operator 
> is ANY, which is used in Strong.isNull to check if any of the operands can be 
> null (which is true when above an outer join), which allows 
> RelOptUtil.simplifyJoin to simplify the join into an inner join.
> Then this allows the filter to be pushed down onto what use to be the null 
> producing side of the join.
> In the case of a custom function, the operator is given an AS_IS, which 
> causes Strong.isNull to base the nullability on the expression kind. The node 
> is an OTHER_FUNCTION which is not present in the switch, and defaults to 
> false [3], which does *not* allow the join to be converted to an inner join, 
> which disallows the rule to push down the filters below the join.
> 
> I think that if we were able to define the operator’s Policy (which I guess 
> we are only able to do by defining SqlOperators) then this problem would be 
> solved.
> 
> Thanks!
> -Ian J. Bertolacci
> 
> [1] 
> https://calcite.apache.org/javadocAggregate/org/apache/calcite/plan/Strong.Policy.html
> [2] 
> https://github.com/apache/calcite/blob/2bf1e7b2e8596a61b5ac19aa7bebb9910c4eea94/core/src/main/java/org/apache/calcite/plan/Strong.java#L211
> [3] 
> https://github.com/apache/calcite/blob/2bf1e7b2e8596a61b5ac19aa7bebb9910c4eea94/core/src/main/java/org/apache/calcite/plan/Strong.java#L242
> On 2023/08/30 12:39:24 Benchao Li wrote:
>> Hi Ian,
>> 
>> FilterJoinRule[1] is designed to not push filters to null-generating
>> side of Join, it has nothing to do with your custom UDF.
>> 
>> [1] 
>> https://github.com/apache/calcite/blob/2bf1e7b2e8596a61b5ac19aa7bebb9910c4eea94/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L111-L125
>> 
>> Alessandro Solimando mailto:al...@gmail.com>> 于2023年8月30日周三 
>> 14:13写道:
>>> 
>>> Hi Ian,
>>> I did not have time to dig more into your question (not very familiar with
>>> the way you register functions), but for RuleEventLogger you might want to
>>> check these slides (9 to 12 for how to activate it, the rest 15 to 23 on
>>> how to read the output):
>>> https://www.slideshare.net/StamatisZampetakis/debugging-planning-issues-using-calcites-builtin-loggers
>>> .
>>> 
>>> Hth,
>>> Alessandro
>>> 
>>> 
>>> 
>>> On Wed, 30 Aug 2023 at 04:14, Ian Bertolacci
>>> mailto:ia...@workday.com.inva>lid> wrote:
>>> 
 Hello,
 We have defined some extra comparison functions for our users.
 However, we’ve noticed that filter push-down (using `
 CoreRules.FILTER_INTO_JOIN`) does not occur when the operands to the
 functions come from the opposite side of an outer join (i.e. from the right
 side of a left outer join, or from the left side of a right outer join).
 
 Here is a demonstration of this with SQL, the logical plan, and the
 physical plan post optimization: https://pastebin.com/raw/KjE40z5X
 In this example, I’ve defined a simple function called “test” which takes
 a BigInt and returns Boolean.
 In the first query, the operand to `test` comes from the left side of the
 left join, and the tree before planning has the filter node above the join,
 and after planning the filter is below the join.
 In the second query, the operand to `test` comes from the left side of the
 right join, and the tree before planning has the filter node above the
 join, but after planning the filter is still above the join.
 (I was hoping to get the output from the RuleEventLogger, but I haven’t
 been able to get it working.)
 
 I figure the push down is not happening because we are not properly
 communicating how these functions handles null (which is to return null,
 which would be false-y, and therefore allow the join to be converted to an
 inner join and the filter pushed down below the join).
 We are not providing these functions as SqlOperator instances in a
 SqlOperatorTable; instead we use `ScalarFunctionImpl.create` on native
 methods, and provide those Function instances through the Schema, via
 

Re: [Question] How to test the correctness of an optimization rule in quidem?

2023-08-28 Thread Julian Hyde
The meta-answer: I would use ‘git log’ and identify commit that modify both 
RelOptRulesTest.java and also at least one .iq file.

I don’t think there’s a way to selectively enable a planner rule in a .iq file. 
I don’t know if there is a way to add such a mechanism which is reasonably 
simple for people to understand.

If the rule is added to the standard set of rules then it’s easy to use the 
rule in a Quidem test. I haven’t thought about whether this rule should be in 
the standard rule set.

Julian


> On Aug 28, 2023, at 5:36 AM, LakeShen  wrote:
> 
> Hi Community,
> 
> Now I have created the CALCITE-5889 and have added the MinusToDistinctRule,I
> want to test the correctness of my optimization rules in quidem,but I
> didn't find any unit test that I could refer to. Could the community give
> me some help?
> 
> Usually, we finished the optimization rules, we could add the unit test in
> RelOptRulesTest , and then in RelOptRulesTest.xml file, we need add the
> expected results of the plan.If we could add end-to-end testing to our
> optimization rules, we could ensure that our rules are correct.
> 
> Best,
> LakeShen



Re: LITERAL_AGG question

2023-08-28 Thread Julian Hyde
First off, can you create a jira case? This is clearly a bug. 

I think AggregateCall.rexList is new and will need to be explicitly handled in 
serialization and deserialization. Testing shouldn’t be difficult if you extend 
or modify existing RelJson tests. 

> On Aug 28, 2023, at 7:30 AM, stanilovsky evgeny  
> wrote:
> 
> hello, please can someone explain me how LITERAL_AGG need to be correctly 
> serialized\deserealized and how to write appropriate tests.
> 
> Problem :
> we have LITERAL_AGG
> further:
> toJson(org.apache.calcite.rel.core.AggregateCall) {
>Map map = map();
>map.put("agg", toJson(node.getAggregation()));
>map.put("type", toJson(node.getType()));
>map.put("distinct", node.isDistinct());
>map.put("operands", node.getArgList());
>map.put("filter", node.filterArg);
>map.put("name", node.getName());
>return map;
>}
> 
> not serialized "rexList" here ^
> 
> deserialization part:
> 
> RelJsonReader#toAggCall(Map jsonAggCall) {
>... cut ...
>return AggregateCall.create(aggregation, distinct, false, false,
>ImmutableList.of(), <-- empty list
>operands,
>filterOperand == null ? -1 : filterOperand,
>null, RelCollations.EMPTY, type, name);
> }
> 
> used ImmutableList.of() instead of "rexList" (no place where we can obtain it 
> here, ok)
> 
> and further call for assembly:
> 
> public static AggregateCall create(...
> ...
>  final List preTypes = RexUtil.types(rexList); <- pre types 
> are empty
>  final List types =
>  SqlTypeUtil.projectTypes(input.getRowType(), argList); <- argList is 
> empty too
>  final Aggregate.AggCallBinding callBinding =
>  new Aggregate.AggCallBinding(typeFactory, aggFunction, preTypes,
>  types, groupCount, filterArg >= 0);
>   type = aggFunction.inferReturnType(callBinding);
> 
> 
> 
>  private static RelDataType inferReturnType(SqlOperatorBinding opBinding) {
>// LITERAL_AGG takes one pre-operand and zero (post-)operands.
>if (opBinding.getPreOperandCount() != 1
>|| opBinding.getOperandCount() != 1) {
>  throw new AssertionError(); <-- brings this assertion
>}
>return opBinding.getOperandType(0);
>  }
> 
> it`s not clear for me how to write test for such a case.
> I realize that description is probably not clear but i tried to make it as 
> informative as i can.
> 
> If all above is not clear, brief question : how to write 
> serizlization\deserialization tests for such Operands, yes i found 
> RelWriterTest but it still unclear for me:
> 
>  @Test void testWriter1() {
>int i = Frameworks.withPlanner((cluster, relOptSchema, rootSchema) -> {
>  final RelDataTypeFactory typeFactory = cluster.getTypeFactory();
> 
>  RelDataTypeField field = new RelDataTypeFieldImpl(
>  "ID", 0, typeFactory.createSqlType(SqlTypeName.INTEGER)
>  );
>  RelRecordType rowType = new 
> RelRecordType(Collections.singletonList(field));
>  RelOptAbstractTable t1 = new RelOptAbstractTable(null, "t1", rowType) {
>@Override
>public  T unwrap(Class clazz) {
>  return null;
>}
>  };
> 
>  LogicalTableScan lt = new LogicalTableScan(cluster, null, List.of(), t1);
>  RelDataTypeSystem typeSys = cluster.getTypeFactory().getTypeSystem();
> 
>  SqlBasicAggFunction literalFunc = SqlLiteralAggFunction.INSTANCE;
>  RexLiteral rex = RexLiteral.fromJdbcString(
>  new BasicSqlType(typeSys, SqlTypeName.BOOLEAN), SqlTypeName.BOOLEAN, 
> "true");
>  AggregateCall type = AggregateCall.create(literalFunc, false, false, 
> false, List.of(rex),
>  ImmutableList.of(), -1, null,
>  RelCollations.of(new RelFieldCollation(1)), new 
> BasicSqlType(typeSys, SqlTypeName.BOOLEAN), "i");
> 
>  LogicalAggregate la = new LogicalAggregate(cluster, null, List.of(), lt, 
> ImmutableBitSet.of(), null, List.of(type));
> 
>  final JsonBuilder jsonBuilder = new JsonBuilder();
>  final RelJson json = RelJson.create().withJsonBuilder(jsonBuilder);
>  final Object o = json.toJson(la);
>  assertThat(o, notNullValue());
>  final String s = jsonBuilder.toJsonString(o);
>  //final String expectedJson = "";
>  //assertThat(s, is(expectedJson));
>  final RelDataType type2 = json.toType(typeFactory, o);
>  assertThat(type2, is(type));
>  return 0;
>});
>  }
> 
> Thanks !


Re: JDBC 4.2 or 4.3 support

2023-08-25 Thread Julian Hyde
We’re an open source project, so there is no formal roadmap.

Did you search jira? If there is no jira case, log one which describes your 
problem. I’m not familiar with what changed between JDBC 4.1, 4.2 and 4.3.

Julian


> On Aug 24, 2023, at 5:25 AM, Tony Falabella 
>  wrote:
> 
> Calcite Team,
> 
> Are there any plans to support JDBC 4.2 or 4.3 on the roadmap?  
> 
> Also we have code that adds a JdbcSchema to the rootSchema via:
> 
> rootSchema.add(DB_SCHEMA, JdbcSchema.create(rootSchema, DB_SCHEMA, 
> dataSource, null, DB_SCHEMA));
> 
> When querying via JPA against an H2 DB that has a table containing a “Binary 
> Large Object” (a Blob) CalciteJdbc41Connection creates a 
> CalciteJdbc41PreparedStatement which creates a CalciteResultSet (a subclass 
> of AvaticaResultSet) which then creates an ArrayIteratorCursor (a subclass of 
> AbstractCursor) which has this implementation:
> 
> public Blob getBlob() throws SQLException {
>  throw cannotConvert(“Blob”)
> }
> 
> Is there anything I can do to support Blobs/Clobs/Arrays/Refs/etc?  Maybe I’m 
> doing something incorrectly in my Connection initialization.  I am NOT 
> initializing anything with the RelOptPlanner like Rules. It didn’t appear 
> that I needed to. 
> 
> Thanks in advance. 
> Tony



Functions that cache state, e.g. compiled regular expressions

2023-08-23 Thread Julian Hyde
Can I get one or two reviews of
https://github.com/apache/calcite/pull/3394 /
https://issues.apache.org/jira/browse/CALCITE-5914 ?

I'm trying to introduce an architecture so that Java functions that
need to retain state for performance reasons (e.g. a cached compiled
regular expression) are straightforward to write and use.

My proposal is to implement such functions using a non-static method.
The code generator instantiates the "function object" so that it can
call the method, uses the same object throughout the query, and the
function can store state in that object. We were already doing it with
the RAND function, so it made sense to use the mechanism more widely.

Julian


Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.

2023-08-23 Thread Julian Hyde
I just took a look at
https://github.com/apache/calcite/pull/3393/checks. I see an autostyle
violation, and it looks valid. That error is there so that you can fix
it and not waste reviewers' time.

On Wed, Aug 23, 2023 at 2:15 PM Julian Hyde  wrote:
>
> There aren't many false positives or flaky tests in CI,
> checker-framework, error-prone. If it says that (say) you are passing
> a nullable object to a method that requires a not-nullable argument,
> you probably are.
>
> I find it useful to run checker-framework on my own computer, rather
> than waiting for CI. But you must use JDK 11:
>
>  ./gradlew --no-parallel --no-daemon -PenableCheckerframework
> :linq4j:classes :core:classes
>
> Julian
>
> On Wed, Aug 23, 2023 at 8:42 AM stanilovsky evgeny
>  wrote:
> >
> > PR is ready for review, plz explain did i need to fix all of: CI /
> > CheckerFramework issues ?
> > Seems some of them are false positive.
> > thanks !
> >
> > > Evgeny Stanilovsky created CALCITE-5950:


Re: [jira] [Created] (CALCITE-5950) Default column constraint is erroneously processed.

2023-08-23 Thread Julian Hyde
There aren't many false positives or flaky tests in CI,
checker-framework, error-prone. If it says that (say) you are passing
a nullable object to a method that requires a not-nullable argument,
you probably are.

I find it useful to run checker-framework on my own computer, rather
than waiting for CI. But you must use JDK 11:

 ./gradlew --no-parallel --no-daemon -PenableCheckerframework
:linq4j:classes :core:classes

Julian

On Wed, Aug 23, 2023 at 8:42 AM stanilovsky evgeny
 wrote:
>
> PR is ready for review, plz explain did i need to fix all of: CI /
> CheckerFramework issues ?
> Seems some of them are false positive.
> thanks !
>
> > Evgeny Stanilovsky created CALCITE-5950:


Re: Implementing HASH(*)

2023-08-22 Thread Julian Hyde
Fwiw, the “*” in the syntax of "COUNT(*)” was a mistake.

By analogy with “SELECT *” and with how Unix treats commands like “ls -l *”, 
people expect “*” to be expanded to all columns. But suppose I have T (x, y) 
with values (0, null), (1, 0), (null, 0). Do I expect “select count(*) from t” 
to return the same results as “select count(x, y) from t”?

No. The first query returns 3, and the second query returns 1 (because there is 
only one row where neither x and y is null). You can try it:

  $ ./sqlline -u 
'jdbc:calcite:model=core/src/test/resources/hsqldb-model.json;fun=calcite' -n 
'sa' -p 'sa'
  > with t (x, y) as (values (0,null),(1, 0),(null,1)) select count(x) from t;
  2
  1 row selected
  > with t (x, y) as (values (0,null),(1, 0),(null,1)) select count(x,y) from t;
  1
  1 row selected

When someone writes “COUNT(*)” Calcite converts it to “COUNT()” internally.

Julian


> On Aug 16, 2023, at 6:45 AM, Kian Nassre  wrote:
> 
> Also I had a few more follow up questions, if you have the availability to
> respond:
> - Is it `RelToSqlConverter` or `SqlToRelConverter` where the changes need
> to be made?
> - I already have the `HASH` function working in its non-star format (e.g.
> `HASH(A, B, C)` or `HASH(X)`). Would using `FUNCTION_STAR` interfere with
> that?
> Thank you so much for your time!
> 
> On Wed, Aug 16, 2023 at 9:35 AM Kian Nassre  wrote:
> 
>> Hello Hongyu Guo,
>> 
>> Interesting! I’ll definitely explore this as an option. I’m curious
>> though, do the convertlet tables in RelToSqlConverter have access to enough
>> information to be able to look at a call to HASH(*) and derive the table
>> names (+ their column names) corresponding the scope? My understanding is
>> that the context objects available have access to the validator, but then
>> we’re right back where we started with the problem of how to access the
>> table/column names within the validator.
>> 
>> On 2023/08/16 13:27:06 Hongyu Guo wrote:
>>> Hi Kian Nassre,
>>> I try to create a HASH function instance to solve this problem
>>> ```
>>> public static final SqlFunction HASH =
>>>new SqlFunction("HASH",
>>>SqlKind.OTHER_FUNCTION,
>>>ReturnTypes.BIGINT,
>>>null,
>>>OperandTypes.ONE_OR_MORE,
>>>SqlFunctionCategory.NUMERIC) {
>>>  @Override public SqlSyntax getSyntax() {
>>>return SqlSyntax.FUNCTION_STAR;
>>>  }
>>> 
>>>  @Override public RelDataType deriveType(
>>>  SqlValidator validator, SqlValidatorScope scope, SqlCall call) {
>>>return validator.getTypeFactory().createSqlType(
>>>SqlTypeName.BIGINT);
>>>  }
>>> };
>>> ```
>>> It can pass validation, but it will throw an exception in the
>>> `RelToSqlConverter`
>>> stage. I notice `count(*)` is a special case similar to `hash(*)`, so I
>>> have looked into the
>>> code for COUNT function and AggConverter, where a trick is used to remove
>>> the `*` identifier in `count(*)`
>>> (org/apache/calcite/sql2rel/AggConverter.java:447),
>>> we cannot use similar implementations in `hash(*)`.
>>> 
>>> So i think one possible solution could be expanding `*` in `hash(*)`
>> during
>>> the
>>> `SqlToRelConverter` stage (a litter complex, you should derive the type
>> of
>>> `*`
>>> or `tbl1.*`), or check why `hash(*)` throws an exception in the
>>> `RelToSqlConverter`
>>> stage.
>>> 
>>> I hope this helps :)
>>> 
>>> Best
>>> Hongyu Guo
>>> 
>>> On 2023/08/15 04:31:41 Kian Nassre wrote:
 Hello,
 
 I’m working on a project that utilizes Calcite and I have interest in
 supporting functions that allow using * to specify all of the existing
 columns in a function call. For example, one function that I am
>> looking at
 is the variadic Snowflake HASH function which can accept all columns
>> with
 HASH(*). Trying to add this function as I would any other function I
 encounter issues in the validator because it does not recognize * as a
 valid identifier.
 
 My understanding is that the *  is essentially syntactic sugar which
>>> should
 be expanded in the same way that SELECT * FROM t is expanded early by
>> the
 validator.
 
 I’m wondering if anyone has any advice on the best way to add support
>> for
 this functionality. I have done some experimentation with replicating
>> the
 approach that the validator uses for expanding SELECT * FROM T into
 multiple columns. I do this using a variant of the private method
 expandStar.
 
 The first problem comes when I try to write my variant of expandStar
 because it seems I need to access the package private method
>>> getChildNames,
 or to the .names attribute of the package private class ScopeChild, so
>> I’m
 not understanding if that’s what I should be doing or if there’s an
 alternate method. I’m using it because I need some way to access the
>> table
 names from a scope.
 
 The other problem is that to get my version of expandStar called in all
>>> the
 places I need it to be called (e.g. 

Re: A link in the code has expired

2023-08-22 Thread Julian Hyde
Yes, when eigenbase.org  shut down I copied the contents 
of the wiki onto my server, http://www.hydromatic.net/wiki. 
http://www.hydromatic.net/wiki/Eigenbase_Introduction is a good place to start 
reading.

Maybe there’s a better place to store those files for posterity. I have just 
checked the files into git; see my fork of LucidDB: 
https://github.com/julianhyde/luciddb/commits/master. I’m trying to figure out 
how to make those files appear via GitHub pages.

Julian


> On Aug 16, 2023, at 6:54 AM, Michael Mior  wrote:
> 
> Thanks for pointing this out! It looks like Julian might have a mirror of
> that page at the link below that could be swapped out. Unfortunately, with
> the original page unavailable, I can't confirm.
> 
> http://www.hydromatic.net/wiki/RelationalExpressionMetadata
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> On Wed, Aug 16, 2023 at 9:30 AM Hongyu Guo  wrote:
> 
>> Hello,
>> 
>> I found a link in RelMetadataProvider.java has expired.
>> 
>> The link is http://wiki.eigenbase.org/RelationalExpressionMetadata
>> 
>> Found in
>> 
>> https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataProvider.java
>> 
>> 
>> 
>> Best,
>> 
>> Hongyu Guo
>> 



[jira] [Created] (CALCITE-5951) PRECEDES function, for period-to-date calculations

2023-08-22 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5951:


 Summary: PRECEDES function, for period-to-date calculations
 Key: CALCITE-5951
 URL: https://issues.apache.org/jira/browse/CALCITE-5951
 Project: Calcite
  Issue Type: Improvement
Reporter: Julian Hyde


We propose a {{PRECEDES}} function, for period-to-date calculations in built-in 
and custom time frames.

Like {{{}EXTRACT{}}}, {{{}FLOOR{}}}, and {{{}CEIL{}}}, {{PRECEDES}} works with 
built-in and custom time frames (see CALCITE-5155). Specifically:
 * {{PRECEDES(d1, d2, timeFrame)}} returns whether {{d1}} is earlier or equal 
in the time frame than {{{}d2{}}};
 * {{PRECEDES(d1, timeFrame)}} is shorthand for {{PRECEDES(d1, NOW(), 
timeFrame)}}

Examples:
 * {{PRECEDES(hire_date, WEEK)}} expands to {{EXTRACT(DAY_OF_WEEK FROM 
hire_date) <= EXTRACT(DAY_OF_WEEK FROM Now())}}
 * Similarly, {{PRECEDES(hire_date, MONTH)}} expands using a {{DAY_OF_MONTH}} 
calculation
 * The calculation for {{YEAR}} is a little more complex; one date in a leap 
year may have a greater offset in its year than another date in a non-leap year 
and still precede it.

The behavior of {{timeFrame}} is defined by the calendar of the user’s 
organization. Organizations can define whether weeks start on Monday or 
Tuesday, the start of a quarter and a year, and can define new time frames 
(e.g. months in a [4-4-5 
calendar|https://en.wikipedia.org/wiki/4%E2%80%934%E2%80%935_calendar]).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5941) Support LITERAL_AGG in Interpreter

2023-08-20 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5941:


 Summary: Support LITERAL_AGG in Interpreter
 Key: CALCITE-5941
 URL: https://issues.apache.org/jira/browse/CALCITE-5941
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


In CALCITE-4334 we added {{LITERAL_AGG}}, an internal aggregate function that 
returns a constant, but we did not extend {{Interpreter}} to execute queries 
containing {{LITERAL_AGG}}. This change fixes that.

Without that change, {{Interpreter}} throws a {{NullPointerException}} during 
construction:
{noformat}
java.lang.NullPointerException: agg.state
at java.base/java.util.Objects.requireNonNull(Objects.java:233)
at 
org.apache.calcite.interpreter.AggregateNode.getAccumulator(AggregateNode.java:206)
at 
org.apache.calcite.interpreter.AggregateNode.(AggregateNode.java:95)
at 
org.apache.calcite.interpreter.Nodes$CoreCompiler.visit(Nodes.java:54)
at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:577)
at 
org.apache.calcite.util.ReflectUtil.invokeVisitorInternal(ReflectUtil.java:263)
at 
org.apache.calcite.util.ReflectUtil.invokeVisitor(ReflectUtil.java:220)
at 
org.apache.calcite.util.ReflectUtil$1.invokeVisitor(ReflectUtil.java:470)
at 
org.apache.calcite.interpreter.Interpreter$CompilerImpl.visit(Interpreter.java:358)
at 
org.apache.calcite.interpreter.Nodes$CoreCompiler.visit(Nodes.java:48)
at 
org.apache.calcite.interpreter.Interpreter$CompilerImpl.visitRoot(Interpreter.java:312)
at 
org.apache.calcite.interpreter.Interpreter.(Interpreter.java:88)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Discussion] How to get the table alias in RelNode tree or TableScan?

2023-08-17 Thread Julian Hyde
You can’t get the alias. In RelNode-land there is no such concept. Just the 
inputs to the current RelNode, identified by ordinal.

Maybe at the time of SQL-to-RelNode conversion you can convert the aliases in 
the hint into ordinals. That’s the last moment that aliases are available.  

Julian

> On Aug 16, 2023, at 7:51 PM, LakeShen  wrote:
> 
> Hi Community,
> 
> Currently I'm working on Calcite SQL Hint,and I want to know how to get the
> table alias in al RelNode tree.
> 
> For example,consider the following sql :
> 
> select /*+ BROADCAST(a,b) */ * from nation as a left join nation as b on
> a.N_REGIONKEY = b.N_REGIONKEY
> 
> The hint BROADCAST(a,b) indicates that the Shuffle type of Join uses
> BROADCAST,a is left table alias,b is right table alias.
> 
> SQL corresponding to RelNode Tree is :
> 
> LogicalProject(N_NATIONKEY=[$0], N_NAME=[$1], N_REGIONKEY=[$2],
> N_COMMENT=[$3], N_NATIONKEY0=[$4], N_NAME0=[$5], N_REGIONKEY0=[$6],
> N_COMMENT0=[$7]) LogicalJoin(condition=[=($2, $6)], joinType=[left])
> LogicalTableScan(table=[[tpch, NATION]]) LogicalTableScan(table=[[tpch,
> NATION]])
> 
> I want to know what table is alias a and what table is alias b,but I can't
> get the table for the aliases a and b in above RelNode tree.
> 
> In SqlToRelConverter#convertFrom method,it does not consider the alias
> information of the table.
> 
> Best,
> LakeShen


Re: [Discussion] Make SubTreeHintPropagateShuttle hint that match the search layer can be configured

2023-08-14 Thread Julian Hyde
I haven’t had time to review your code, but I want to point out that when you 
want things to propagate up the tree, the metadata system is often the best fit.

Hints are tricky. They originate as comments in SQL. Those comments are then 
applied to one RelNode when the SQL is translated. Now we are talking about 
propagating hints up the tree (to the direct and indirect consumers of a 
RelNode). And we also copy hints when we apply transformation rules. I continue 
to be worried that the number of hints will grow unbounded. We should entertain 
the possibility that some of these “hints” are of a different kind to others - 
explicit hints, that are part of the RelNode, vs derived hints, which are 
propagated by the metadata system (or just become existing forms of metadata).

Julian


> On Aug 13, 2023, at 8:51 PM, LakeShen  wrote:
> 
> Sorry, there is something wrong with the above email format.You could check
> it out below
> Hi community,
> 
> Now I am reading the source code of calcite sql hint,and I found in
> SubTreeHintPropagateShuttle class, from the root node, it will be the most
> search 3 layers down child nodes, trying to propagate
> the original RelNode hints, 3 are fixed.
> 
> SubTreeHintPropagateShuttle is a RelShuttle,and its purpose is that when a
> rule rewrited a RelNode Tree, need to propagate the original RelNode Hints
> to rewrite RelNode. More details could see RelOptUtil# propagateRelHints
> method.
> 
> My idea is that make SubTreeHintPropagateShuttle search child layers could
> let user configurable. One way is to add the setHintSearchNum and
> getHintSearchNum methods to the RelOptCluster class, and then in
> the RelOptUtil#propagateRelHints method, using
> originalRel.getCluster().getHintSearchNum
> to get the search the number of layers, and then transfer to
> SubTreeHintPropagateShuttle.This is just my idea, not the final
> implementation.
> 
> Code links: 1.
> https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L4199C7-L4202C8
> 2.
> https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L426C2-L435C34
> 
> Best,
> LakeShen
> 
> LakeShen  于2023年8月14日周一 11:47写道:
> 
>> Hi community, Now I am reading the source code of calcite sql hint,and I
>> found in SubTreeHintPropagateShuttle class, from the root node, it will be
>> the most search 3 layers down child nodes, trying to propagate
>> the original RelNode hints, 3 are fixed. SubTreeHintPropagateShuttle is a
>> RelShuttle,and its purpose is that when a rule rewrited a RelNode Tree,
>> need to propagate the original RelNode Hints to rewrite RelNode. More
>> details could see RelOptUtil# propagateRelHints method. My idea is that
>> make SubTreeHintPropagateShuttle search child layers could let user
>> configurable. One way is to add the setHintSearchNum and getHintSearchNum
>> methods to the RelOptCluster class, and then in the
>> RelOptUtil#propagateRelHints method, using 
>> originalRel.getCluster().getHintSearchNum
>> to get the search the number of layers, and then transfer to
>> SubTreeHintPropagateShuttle.This is just my idea, not the final
>> implementation. Code links: 1.
>> https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L4199C7-L4202C8
>> 2.
>> https://github.com/apache/calcite/blob/50c3edfc3d6630528ab51fe836bd50df82cc7db8/core/src/main/java/org/apache/calcite/plan/RelOptUtil.java#L426C2-L435C34
>> 
>> Best,
>> LakeShen
>> 



Re: [Question] Working with defaults in calcite.

2023-08-14 Thread Julian Hyde
If you way there are no execution tests, I wouldn’t be surprised. DDL was 
historically deemed out of scope, and what we didn’t implement, we couldn’t 
test. But now we have the ’server’  component, and there are tests such as 
table.iq [1].

Can you add some tests?

Julian

[1] 
https://github.com/apache/calcite/blob/main/server/src/test/resources/sql/table.iq
 

> On Aug 14, 2023, at 7:40 AM, stanilovsky evgeny  
> wrote:
> 
> Hello community !
> I just try a simple case, like :
> 
> create table foo (i int not null, j int default 100);
> 
> and obtain parser error: parse failed: Encountered "default" at line 1, 
> column 41.
> 
> while
> create table foo (i int not null, j int);
> insert into foo values (1, DEFAULT);
> 
> throws no exceptions in quidem test framework
> 
> i also find SqlValidatorTest#testInsertShouldNotCheckForDefaultValue that 
> contains newColumnDefaultValue:
> 
>// Now remove DEPTNO, which has a default value, from the target list.
>// Will generate an extra call to newColumnDefaultValue at sql-to-rel time,
>// just not yet.
>final String sql4 = "insert into ^emp^ (empno, ename, job, mgr, 
> hiredate,\n"
>+ "  sal, comm, slacker)\n"
>+ "values(1, 'nom', 'job', 0,\n"
>+ "  timestamp '1970-01-01 00:00:00', 1, 1, false)";
> 
> org.apache.calcite.sql2rel.InitializerExpressionFactory#newColumnDefaultValue
> 
> so is it correct that:
> 1. calcite has no execution tests for CREATE TABLE with DEFAULT columns ?
> 2. seems can`t process correctly such kind of columns with insertions ? (i 
> see no tests)
> 
> CREATE TABLE integers(i INTEGER PRIMARY KEY, col1 INTEGER DEFAULT 200, col2 
> INTEGER DEFAULT 100)
> 
> INSERT INTO integers VALUES (1, DEFAULT, DEFAULT)
> INSERT INTO integers(i, col2) VALUES (2, DEFAULT), (3, 4), (4, DEFAULT)");
> 
> thanks !



Re: Understanding the RelMdColumnUniqueness private constructor

2023-08-14 Thread Julian Hyde
I checked a few other RelMdXxx classes, and. they have private constructors 
too. Which makes sense, because you don’t extend metadata providers by 
subclassing. You just define your own provider and add it to the chain.

RelMdColumnUniqueness has

  public static final RelMetadataProvider SOURCE =
  ReflectiveRelMetadataProvider.reflectiveSource(
  new RelMdColumnUniqueness(),
  BuiltInMetadata.ColumnUniqueness.Handler.class);

You could define a similar SOURCE in your class that chains 
RelMdColumnUniqueness handlers with your own.

Julian


> On Aug 14, 2023, at 9:32 AM, Nick Riasanovsky  wrote:
> 
> Hi everyone,
> 
> I'm trying to understand why RelMdColumnUniqueness. I have interest in
> subclassing RelMdColumnUniqueness in my project and am unable to do so
> because the constructor is private. Other metadata queries don't appear to
> have private constructors, such as RelMdRowCount. If anyone has any
> insights or context that would be very helpful.
> 
> Thanks,
> Nick Riasanovsky



Re: TimeString

2023-08-11 Thread Julian Hyde
Calcite doesn’t use java.sql.Time internally to represent values of SQL type 
TIME (except when you read them using JDBC). It uses int.

Suppose we wanted to support TIME values to attosecond (10^-18 second) 
precision. We could process values in any type that can hold integers with 24 
decimal digits (there are 86,400 seconds in a day, which is 8 * 10^23 
attoseconds). So basically we need an efficient implementation of numbers with 
an arbitrary (but fixed) decimal precision. 

We would also have to provide a way to get arbitrary precision values from 
JDBC. Rather than calling ResultSet.getTime(), people could get the values by 
calling ResultSet.getObject() to return a BigDecimal value, or by calling 
ResultSet.getString().

Julian



> On Aug 11, 2023, at 3:09 PM, Mihai Budiu  wrote:
> 
> In this question I was referring to the TimeString class in particular.
> 
> Related to this, I have filed an issue with the compile-time evaluation of
> Time expressions:
> https://issues.apache.org/jira/browse/CALCITE-5919
> 
> Unfortunately it looks like in the Java type system Time values are
> represented using int and not long. This is a big problem, since it
> precludes using high precision time values.
> 
> Mihai
> 
> On Fri, Aug 11, 2023 at 2:50 PM Julian Hyde  wrote:
> 
>> In compilers it is a best practice to represent literals using exact,
>> unbounded precision values. In Calcite we use BigDecimal for numbers and
>> TimeString (TimestampString, DateString) for date and time literals.
>> 
>> At runtime there are different considerations. We need a representation
>> that is space and time efficient, and therefore will choose something like
>> a 64 bit signed integer for timestamp values, which gives us
>> millisecond precision over a reasonable range of dates. (No primitive Java
>> data type stores more than 64 bits.) If we are bounded by 64 bits, then
>> extending the precision to microseconds or nanoseconds or beyond will cause
>> us to have to cover a smaller range of dates. If we want unlimited
>> precision over a reasonable range, we have to use a representation,
>> analogous to BigDecimal, that stores values on the Java heap.
>> 
>> It would be nice if people could choose an alternative runtime
>> representation of date-time values. I would welcome such a contribution.
>> 
>> Julian
>> 
>> On Aug 11, 2023, at 00:54, Stamatis Zampetakis  wrote:
>> 
>> Hey Mihai,
>> 
>> I'm not sure to which part of calcite you refer to by saying that it
>> supports arbitrary precision strings but it is not uncommon for consumers
>> to use only a single component from calcite (say the parser) so I wouldn't
>> be surprised if there are implementation gaps in other places.
>> 
>> Best,
>> Stamatis
>> 
>> 
>> On Fri, Aug 11, 2023, 3:07 AM  wrote:
>> 
>> Why does Calcite support arbitrary precision time-strings and yet makes it
>> 
>> practically impossible to extract anything but milliseconds?
>> 
>> 
>> 
>> 
>> Mihai
>> 



Re: TimeString

2023-08-11 Thread Julian Hyde
In compilers it is a best practice to represent literals using exact,
unbounded precision values. In Calcite we use BigDecimal for numbers and
TimeString (TimestampString, DateString) for date and time literals.

At runtime there are different considerations. We need a representation
that is space and time efficient, and therefore will choose something like
a 64 bit signed integer for timestamp values, which gives us
millisecond precision over a reasonable range of dates. (No primitive Java
data type stores more than 64 bits.) If we are bounded by 64 bits, then
extending the precision to microseconds or nanoseconds or beyond will cause
us to have to cover a smaller range of dates. If we want unlimited
precision over a reasonable range, we have to use a representation,
analogous to BigDecimal, that stores values on the Java heap.

It would be nice if people could choose an alternative runtime
representation of date-time values. I would welcome such a contribution.

Julian

On Aug 11, 2023, at 00:54, Stamatis Zampetakis  wrote:

Hey Mihai,

I'm not sure to which part of calcite you refer to by saying that it
supports arbitrary precision strings but it is not uncommon for consumers
to use only a single component from calcite (say the parser) so I wouldn't
be surprised if there are implementation gaps in other places.

Best,
Stamatis


On Fri, Aug 11, 2023, 3:07 AM  wrote:

Why does Calcite support arbitrary precision time-strings and yet makes it

practically impossible to extract anything but milliseconds?




Mihai


[jira] [Created] (CALCITE-5916) In RelBuilder, add sample() method (equivalent to SQL TABLESAMPLE clause)

2023-08-10 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5916:


 Summary: In RelBuilder, add sample() method (equivalent to SQL 
TABLESAMPLE clause)
 Key: CALCITE-5916
 URL: https://issues.apache.org/jira/browse/CALCITE-5916
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


In RelBuilder, add sample() method (equivalent to SQL TABLESAMPLE clause).

In {{SqlToRelConverter}}, the code
{code}
if (tableSampleSpec.getSamplePercentage() == 0f) {
  bb.setRoot(relBuilder.push(bb.root()).empty().build(), true);
} else {
  RelOptSamplingParameters params =
  new RelOptSamplingParameters(
  tableSampleSpec.isBernoulli(),
  tableSampleSpec.getSamplePercentage(),
  tableSampleSpec.isRepeatable(),
  tableSampleSpec.getRepeatableSeed());
  bb.setRoot(new Sample(cluster, bb.root(), params), false);
}
{code}
would become
{code}
relBuilder.push(bb.root());
relBuilder.sample(tableSampleSpec.isBernoulli(),
tableSampleSpec.getSamplePercentage(),
tableSampleSpec.isRepeatable(),
tableSampleSpec.getRepeatableSeed());
bb.setRoot(relBuilder.build(), false);
{code}
Note that {{RelBuilder.sample}} calls {{empty()}} if the rate = 0. It should 
also skip sampling if rate = 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Why is Filter's condition required to be flat?

2023-08-09 Thread Julian Hyde
Flattened expressions - so that AND never contains a child that is
AND, and OR never contains a child that is OR - is a canonized form
that seems to have more advantages than disadvantages. It's never
larger than the original (unlike CNF and DNF), frequently smaller than
the original, doesn't take much effort to canonize, and allows the
multiple consumers of expressions (simplifications, planner rules) to
make some assumptions that yield good results efficiently.

If you are only producing expressions you want Calcite to be liberal
in what it accepts; but if you are consuming expressions (e.g.
maintaining a rule) you want Calcite to be conservative in what it
accepts. You can't please both producers and consumers - it's a zero
sum game - but flattened expressions seemed to be a reasonable
compromise.

I hope there's a simple process to get your expressions flattened. If
there's not, log a bug, so that there's a documented process.

By the way, your note reminds me that we're not as diligent at
flattening OR as we are at flattening AND. We should fix that.

Julian

On Wed, Aug 9, 2023 at 9:09 AM Ian Bertolacci
 wrote:
>
> Hello,
> I’m curious about why Filter requires that the condition be a flattened tree?
> We have some transformations which occasionally result in non-flat trees, 
> which causes issues for us in testing.
>
> I know we can avoid this by constructing expressions using RelBuilder or the 
> RexUtil.composeConjunction, but our transformers work very similar to 
> RexShuttle, which simply clones a RexCall on reconstruction.
> So there are situations where a subexpression to an `AND` or `OR` expression 
> result in a nested `AND` or `OR` expression.
> One solution is to use RexUtil.flatten, but that only applies flattening for 
> subexpressions with the same operator, so that expressions like `AND( A, AND( 
> B, C ), OR( E, OR( F, G ) )`  only get partially flattened to only gets 
> flattened to `AND( A, B, C, OR( E, OR( F, G) )`  which still isn’t flat; the 
> correct flattening would be `AND( A, B, C, OR( E, F, G ) )`
> Similarly, something like `AND( A, AND( B, C ) ) == …` wouldn’t get flattened 
> at all.
>
> Is there a real reason for having this assertion?
> Would removing it cause problems?
> Thanks!
> -Ian J. Bertolacci


[jira] [Created] (CALCITE-5914) Cache compiled regular expressions in SQL function runtime

2023-08-09 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5914:


 Summary: Cache compiled regular expressions in SQL function runtime
 Key: CALCITE-5914
 URL: https://issues.apache.org/jira/browse/CALCITE-5914
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Cache compiled regular expressions (and other amortized work) in SQL function 
runtime.

Consider the following query:
{code}
SELECT ename, job, RLIKE(ename, 'A.*'), RLIKE(ename, job || '.*')
FROM emp
{code}

The first regular expression, {{A.*}}, is constant and can be compiled at 
prepare time or at the start of execution; the second regular expression '{{job 
|| '.*'}}' might vary from one row to the next. However if the {{job}} column 
has a small number of values it still might be beneficial to cache the compiled 
regular expression.

If {{SqlFunctions.rlike}} could use a cache (mapping from {{String}} to 
{{java.util.regex.Pattern}}) then it would achieve benefits in both the 
constant and non-constant cases.

The cache needs to:
 * be thread-safe (in case queries are executing using multiple threads),
 * return thread-safe objects (as is {{Pattern}}),
 * have bounded space (so that a query doesn't blow memory with 1 million 
distinct regular expressions),
 * disposed after the query has terminated,
 * (ideally) share with regexes of the same language in the same query,
 * not conflict with regexes of different languages in the same query.

One possibility is to add an {{interface FunctionState}}, with subclasses 
including {{class RegexpCache}}, and if argument 1 of a function is a subclass 
of {{FunctionState}} the compiler would initialize the state in the generated 
code. The function can rely on the state argument being initialized, and being 
the same object from one call to the next. Example:

{code}
interface FunctionState {
}

class RegexpCache implements FunctionState {
  final Cache cache = ...;
}
{code}

This change should install the cache for all applicable functions, including 
LIKE, ILIKE, RLIKE, SIMILAR, posix regex, REGEXP_CONTAINS, REGEXP_REPLACE, 
other REGEXP_ functions, PARSE_URL.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [Question] What is meaning of tablesample substitute('medium')?

2023-08-08 Thread Julian Hyde
It substitutes a named data set for a table.

We added it to support SQLstream, a streaming SQL system. When developing 
streaming queries, it was sometimes convenient to substitute a table for a 
stream, because the table had predictable results and didn’t need another 
process inserting the rows.

I don’t t think anyone is using this functionality, so in a dev branch [1] I 
plan to remove the feature. See the commit 'Refactor: Remove TABLESAMPLE 
SUBSTITUTE('name’)’.

Julian

[1] 
https://github.com/apache/calcite/compare/main...julianhyde:calcite:-blackboard




> On Aug 8, 2023, at 4:27 AM, LakeShen  wrote:
> 
> Hi all,
> 
> Currently I see tablesample substitute('medium') in calcite's parser.jj
> file and some single tests, but I looked at the tablesample documentation
> for Postgresql and some other databases, Not seeing this SQL syntax, I was
> wondering what exactly tablesample substitute('medium') means?
> At the same time,consider the following SQL:
> select * from product TABLESAMPLE SUBSTITUTE('medium') After the SqlNode to
> RelNode, it also loses the TABLESAMPLE SUBSTITUTE('medium') information.
> Best, LakeShen



Re: [Discussion] Can we forbidden SEARCH operator when use other execution engine?

2023-08-08 Thread Julian Hyde
The optimizations are the reason that SEARCH (and Sarg) exist. For the 
simplifier to handle all of the combinations of <, <=, >, >=, =, <>, and AND, 
OR, NOT is prohibitively expensive; if the same expressions are converted to 
Sargs they can be optimized using simple set operations.


> On Aug 4, 2023, at 5:57 PM, P.F. ZHAN  wrote:
> 
> Thanks, Julian, for your answer. Initially, I was thinking that SEARCH
> might not be essential. Since Calcite first translates it into a SEARCH
> operator, and then we can use RexUtil#expandSearch to rewrite it into an
> operator supported by Spark, it seems like doing the same job twice. So,
> instead, why not consider disabling this operator, thus avoiding the need
> to reconvert the operator back into an expression supported by other
> execution engines, such as Spark, later on? Frankly speaking, I might not
> have taken into account the potential simplifications brought by the SEARCH
> operator for optimizing the execution plan. If these
> simplifications produce a more efficient execution plan or make the
> optimization stage more efficient, then I'm open to exploring ways to
> implement an equivalent transformation within Kylin, or even exploring the
> possibility of creating a similar implementation in other execution engines
> like Spark.
> 
> On Sat, Aug 5, 2023 at 2:57 AM Julian Hyde  wrote:
> 
>> I agree that it should be solved ‘by config’ but not by global config. The
>> mere fact that you are talking to Spark (i.e. using the JDBC adapter with
>> the Spark dialect) should be sufficient right?
>> 
>> Put another way. Calcite’s internal representation for expressions is what
>> it is. The fact that SEARCH is part of that representation has many
>> benefits for simplification. Just expect there to be a a translation step
>> from that representation to any backend.
>> 
>> Julian
>> 
>> 
>>> On Aug 4, 2023, at 7:22 AM, P.F. ZHAN  wrote:
>>> 
>>> Very nice suggestion. I wonder can we introduce this feature by config?
>>> Maybe it’s better for users using more than one query engine to interpret
>>> and execute query.
>>> 
>>> 
>>> On Fri, Aug 4, 2023 at 22:03 Alessandro Solimando <
>>> alessandro.solima...@gmail.com> wrote:
>>> 
>>>> Hello,
>>>> as LakeShen suggests, you can take a look into RexUtil#expandSearch, you
>>>> can see it in action in RexProgramTest tests, one example:
>>>> 
>>>> 
>>>> 
>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/test/java/org/apache/calcite/rex/RexProgramTest.java#L1710-L1727
>>>> 
>>>> Best regards,
>>>> Alessandro
>>>> 
>>>> On Fri, 4 Aug 2023 at 15:45, LakeShen 
>> wrote:
>>>> 
>>>>> Hi P.F.ZHAN,in calcite,it has a method RexUtil#expandSearch to expand
>>>>> Search,maybe you could get some information from this method.
>>>>> 
>>>>> There is also some logic to simplify Search in the
>>>>> RexSimplify#simplifySearch method. I hope this could help you.
>>>>> Here's the code: 1.
>>>>> 
>>>>> 
>>>> 
>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L593
>>>>> 2.
>>>>> 
>>>>> 
>>>> 
>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L2132
>>>>> 
>>>>> Best,
>>>>> LakeShen
>>>>> 
>>>>> Soumyadeep Mukhopadhyay  于2023年8月4日周五 20:29写道:
>>>>> 
>>>>>> Thank you, shall explore more on this! :)
>>>>>> 
>>>>>> 
>>>>>> On Fri, 4 Aug 2023 at 5:53 PM, P.F. ZHAN  wrote:
>>>>>> 
>>>>>>> Aha, I'm using Apache Kylin which uses Calcite to generate a logical
>>>>>> plan,
>>>>>>> then convert to Spark plan to execute a query. Given that Calcite has
>>>>>> more
>>>>>>> operations for aggregations, and Kylin  wants to take full advantage
>>>> of
>>>>>>> precomputed cubes (something like Calcite's materialized views), it
>>>>> uses
>>>>>>> both Calcite and Spark(for distribution computing). Maybe it's wild
>>>>> and a
>>>>>>> little fun, but it does works well on many scena

Re: Extending SqlColumnDeclaration

2023-08-07 Thread Julian Hyde
Intentional. See 
https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/sql/ddl/package-info.java#L25:

 * If you are writing a project that requires DDL it is likely that your
 * DDL syntax is different than ours. We recommend that you copy-paste this
 * the parser and its supporting classes into your own module, rather than try
 * to extend this one.

I could see small modifications being made to SqlColumnDeclaration. A public 
constructor would have made it harder to do those while preserving 
compatibility. 

> On Aug 7, 2023, at 4:35 PM,   wrote:
> 
> I notice that SqlColumnDeclaration has a non-public constructor, which makes
> it impossible to extend from a different package.
> 
> Is this design intentional or is it a bug?
> 
> 
> 
> (There are some good reasons for extending this class. For example, a
> comment on the class says "FOREIGN KEY when we support it".)
> 
> 
> 
> Thank you,
> 
> Mihai
> 



Re: [DISCUSS] Increase the default maximum precision and scale of the number type

2023-08-07 Thread Julian Hyde
Calcite works best when it is able to execute the SQL that it accepts. If we 
were to allow say DECIMAL(200, 5) we would need to be able to generate Java 
code that can handle values of that type. The current maximum, DECIMAL(19), can 
be handled by 64-bit signed integers (Java long). 

> On Aug 7, 2023, at 7:18 PM, Ran Tao  wrote:
> 
> hi, mbudiu.
> 
> I get your point.  Ofcourse, we can override `RelDataTypeSystemImpl`.
> however, my suggestion is that we change the maximum 19 of
> `RelDataTypeSystemImpl` to meet the mature engines,
> If there is no specific reason for setting the maximum value of 19.
> 
>  于2023年8月8日周二 01:03写道:
> 
>> Calcite gives you hooks to change these limits.
>> 
>> For example:
>> 
>> public static final RelDataTypeSystem TYPE_SYSTEM = new
>> RelDataTypeSystemImpl() {
>>@Override
>>public int getMaxNumericPrecision() {
>>return MAX_PRECISION;
>>}
>>@Override
>>public int getMaxNumericScale() {
>>return MAX_SCALE;
>>}
>>@Override
>>public boolean shouldConvertRaggedUnionTypesToVarying() { return
>> true; }
>>};
>> 
>> 
>> this.typeFactory = new SqlTypeFactoryImpl(TYPE_SYSTEM);
>> 
>> Then you can use this type factory in other structures you build, like the
>> CatalogReader, the Validator, and the Cluster.
>> 
>> 
>> -Original Message-
>> From: Ran Tao
>> Sent: Sunday, August 06, 2023 7:58 PM
>> To: dev@calcite.apache.org
>> Subject: [DISCUSS] Increase the default maximum precision and scale of the
>> number type
>> 
>> Hi, devs. I have noticed that currently calcite default maximum precision
>> and scale is 19.
>> If we set Decimal(50,2), it will reduce to Decimal(19,2).
>> while other mature database systems or bigdata engines have more bigger
>> value.
>> 
>> For consistency and to give calcite users a standard value that meets most
>> engines, can we adjust the value of calcite to 38?
>> 
>> Besides, the sql standard defines the meaning of precision and scale,
>> letting each system customize the maximum precision and scale. (sql1999,
>> page 125)
>> 
>> engine precision range scale range
>> calcite [1,19] [1,19]
>> oracle [1,38] [-84,127]
>> sqlserver [1,38] [0,38]
>> mysql [1,65] [0,30]
>> spark [1,38] [0,38]
>> hive [1,38] [0,38]
>> flink [1,38] [0,38]
>> We can see many systems or engines follow the sql standard to make maximum
>> precision and scale to be 38, except oracle and mysql.
>> 
>> If anyone can share some history or reasons why calcite set 19, i will
>> appreciate it.
>> 
>> [1]
>> 
>> https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlqr/Data-Types.html#GUID-DCCC6F18-15A0-4ECC-BA48-16F73F844844
>> 
>> [2]
>> 
>> https://learn.microsoft.com/en-us/sql/t-sql/data-types/decimal-and-numeric-transact-sql?view=sql-server-ver15
>> 
>> [3]
>> 
>> https://dev.mysql.com/doc/refman/8.0/en/precision-math-decimal-characteristics.html
>> 
>> [4]
>> 
>> https://spark.apache.org/docs/3.2.0/api/java/org/apache/spark/sql/types/DecimalType.html
>> 
>> [5]
>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82706456#LanguageManualTypes-decimal
>> 
>> [6]
>> 
>> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/table/api/DataTypes.html
>> 
>> 
>> Best Regards,
>> Ran Tao
>> https://github.com/chucheng92
>> 
>> 



Re: DECIMAL(2, 3) meaning

2023-08-06 Thread Julian Hyde
As I commented in https://issues.apache.org/jira/browse/CALCITE-5901, I don’t 
think it’s a bug to support behavior beyond what the standard requires. Which 
Calcite does, intentionally. 

Julian

> On Aug 6, 2023, at 08:35, stanilovsky evgeny  
> wrote:
> 
> Ok, seems like a bug.
> Feel free to fill the issue.
> 
>> I have added this test to SqlOperatorTest:
>> 
>>f.checkScalar("cast(0.012 as DECIMAL(2, 5))", new BigDecimal("0.012"),
>>"DECIMAL(2, 5) NOT NULL");
>> 
>> and it has passed. That's why I am asking. It should fail, but it doesn't.
>> 
>> Mihai
>> 
>> -Original Message-
>> From: stanilovsky evgeny
>> Sent: Friday, August 04, 2023 7:00 AM
>> To: dev@calcite.apache.org
>> Subject: Re: DECIMAL(2, 3) meaning
>> 
>> Hello Mihai.
>> A bit older standard describes Precision as : Precision of decimal 
>> floating-point values is a positive value that specifies the number of 
>> significant decimal digits in the mantissa.
>> 
>> Thus:
>> cast(0.012 as DECIMAL(3, 3)) - ok
>> cast(0.012 as DECIMAL(2, 3)) - fail
>> cast(0.012 as DECIMAL(1, 3)) - fail
>> cast(0.012 as DECIMAL(2, 5)) - fail
>> 
>> 
>>> Hello,
>>> 
>>> 
>>> I notice that Calcite happily accepts decimal type specifications
>>> where the scale is greater than the precision.
>>> 
>>> There are quite a few tests with such types.
>>> 
>>> 
>>> What is the meaning of such types?
>>> 
>>> 
>>> The SQL 92 standard has this statement on page 109:
>>> 
>>> 
>>> 15)The  of an  shall not be greater than
>>> 
>>>the  of the .
>>> 
>>> 
>>> Thank you,
>>> 
>>> Mihai


Re: [Discussion] Can we forbidden SEARCH operator when use other execution engine?

2023-08-04 Thread Julian Hyde
I agree that it should be solved ‘by config’ but not by global config. The mere 
fact that you are talking to Spark (i.e. using the JDBC adapter with the Spark 
dialect) should be sufficient right?

Put another way. Calcite’s internal representation for expressions is what it 
is. The fact that SEARCH is part of that representation has many benefits for 
simplification. Just expect there to be a a translation step from that 
representation to any backend.

Julian


> On Aug 4, 2023, at 7:22 AM, P.F. ZHAN  wrote:
> 
> Very nice suggestion. I wonder can we introduce this feature by config?
> Maybe it’s better for users using more than one query engine to interpret
> and execute query.
> 
> 
> On Fri, Aug 4, 2023 at 22:03 Alessandro Solimando <
> alessandro.solima...@gmail.com> wrote:
> 
>> Hello,
>> as LakeShen suggests, you can take a look into RexUtil#expandSearch, you
>> can see it in action in RexProgramTest tests, one example:
>> 
>> 
>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/test/java/org/apache/calcite/rex/RexProgramTest.java#L1710-L1727
>> 
>> Best regards,
>> Alessandro
>> 
>> On Fri, 4 Aug 2023 at 15:45, LakeShen  wrote:
>> 
>>> Hi P.F.ZHAN,in calcite,it has a method RexUtil#expandSearch to expand
>>> Search,maybe you could get some information from this method.
>>> 
>>> There is also some logic to simplify Search in the
>>> RexSimplify#simplifySearch method. I hope this could help you.
>>> Here's the code: 1.
>>> 
>>> 
>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexUtil.java#L593
>>> 2.
>>> 
>>> 
>> https://github.com/apache/calcite/blob/98f3048fb1407e2878162ffc80388d4f9dd094b2/core/src/main/java/org/apache/calcite/rex/RexSimplify.java#L2132
>>> 
>>> Best,
>>> LakeShen
>>> 
>>> Soumyadeep Mukhopadhyay  于2023年8月4日周五 20:29写道:
>>> 
 Thank you, shall explore more on this! :)
 
 
 On Fri, 4 Aug 2023 at 5:53 PM, P.F. ZHAN  wrote:
 
> Aha, I'm using Apache Kylin which uses Calcite to generate a logical
 plan,
> then convert to Spark plan to execute a query. Given that Calcite has
 more
> operations for aggregations, and Kylin  wants to take full advantage
>> of
> precomputed cubes (something like Calcite's materialized views), it
>>> uses
> both Calcite and Spark(for distribution computing). Maybe it's wild
>>> and a
> little fun, but it does works well on many scenarios.
> 
> On Fri, Aug 4, 2023 at 8:10 PM Soumyadeep Mukhopadhyay <
> soumyamy...@gmail.com> wrote:
> 
>> I am curious about your use case. Are you not losing out on the
>> optimisations of Calcite when you are using Spark? Is it possible
>> for
 you
>> to share a general approach where we will be able to keep the
> optimisations
>> done by Calcite and use Spark on top of it?
>> 
>> 
>> On Fri, 4 Aug 2023 at 5:19 PM, P.F. ZHAN 
>> wrote:
>> 
>>> Generally speaking, the SEARCH operator is very good, but when we
>>> use
>>> Calcite to optimize the logical plan and then use Spark to
>> execute,
> this
>> is
>>> unsupported. So is there a more elegant way to close the SEARCH
> operator?
>>> Or how to convert the SEARCH operator to the IN operator before
>> converting
>>> the Calcite logical plan to the Spark logical plan? If we do
>> this,
>>> we
>> need
>>> to consider Join / Filter, are there any other RelNodes?
>>> 
>>> Maybe, this optimization is optional more better at present for
>>> many
>> query
>>> execution engine does not support this operator?
>>> 
>> 
> 
 
>>> 
>> 



Re: AggregateProjectPullUpConstantsRule not working with the Volcano Planner

2023-08-03 Thread Julian Hyde
The constraints on a RelSubset should be the union of the constraints of all of 
the RelNodes in that subset. (I haven’t tested it, or read the code. But if it 
doesn’t do that, you’re probably seeing a bug, or you haven’r configured your 
metadata providers correctly.)

> On Aug 3, 2023, at 9:33 AM, Nick Riasanovsky  wrote:
> 
> Hello everyone,
> 
> I am attempting to use AggregateProjectPullUpConstantsRule within an
> optimizer that is using the VolcanoPlanner. When doing so I encounter
> issues with what should be valid constants on the line
> `mq.getPulledUpPredicates()` because aggregate.getInput() is a RelSubset.
> As a result, the predicates list is always empty. If I replace the uses of
> aggregate.getInput() with input then everything works as expected.
> 
> I am happy to submit a PR upstream with this fix, but I first want to
> confirm that my interpretation is correct as I am relatively new to using
> the volcano planner and could have missed a configuration step.
> 
> Thanks everyone,
> Nick Riasanovsky



Re: [Question] CAST possibility cases

2023-08-02 Thread Julian Hyde
I think there’s at least one jira case discussing Boolean cast. Can you locate 
it?

> On Aug 2, 2023, at 5:46 AM, stanilovsky evgeny  
> wrote:
> 
> hello all !
> According to sql standard, chapter: 6.22 
> We can found, for example, that casting from BOOLEAN is deprecated into Exact 
> Numeric, but according to SqlTypeCoercionRule
> it`s possible for now. Probably i miss something, can someone highlight me in 
> such a case? If no - did i need to fill the appropriate issue ?
> Also i can`t found sufficient CAST tests (
> 
> thanks !


Re: Creating user-defined functions dynamically

2023-08-01 Thread Julian Hyde
When you prepare a statement you can provide an instance of SqlOperatorTable 
that contains any functions you desire. (The ‘fun’ JDBC connect string 
parameter is just one means to construct such a table. It happens to be 
particularly convenient for people who wish to create a connection by writing a 
URI and not by writing code.)

You can also specify your own UDFs in a model file. See the ‘functions’ 
attribute of ‘Map Schema’ in [1]. These UDFs are not global, but live in a 
particular schema and are visible only if the user has that schema on their 
path.

Julian

[1] https://calcite.apache.org/docs/model.html#map-schema

> On Aug 1, 2023, at 1:40 PM, mbu...@gmail.com wrote:
> 
> Hello all,
> 
> 
> 
> I have a question about user-defined functions. I see that one can extend
> Calcite by adding new functions to various libraries.
> 
> My question is whether there exists a mechanism to add such functions
> *dynamically*, without changing the Calcite code.
> 
> This would allow users to write UDFs in other languages, as long as their
> SQL backends support them. What some PRs call UDFs are not really
> user-defined, they are dialect-defined. 
> 
> 
> 
> In principle, Calcite only needs to know the function type signatures to
> pass them through all the compilation stages, so a grammar extension for
> declaring function "prototypes" could be enough to make this work (at least
> for functions which are not polymorphic and thus require no type inference).
> There is a complication with constant evaluation: I think that some Calcite
> optimization stages try to evaluate functions with compile-time known
> arguments at compilation time. One workaround would be to declare such
> functions as "nondeterminstic", but perhaps there is a more principled way.
> 
> 
> 
> Thank you,
> 
> Mihai
> 



Re: PR CALCITE-5681 Review Request

2023-08-01 Thread Julian Hyde
I took a very quick look. This looks very well structured (with nice 
abstractions Principal, Grant and Revoke commands, and I like how you have made 
the parser extensible). This is definitely worth reviewing and getting to 
complation.

* Terminology. You have made Grant and Revoke sub-classes of Privilege. I would 
rename Privilege to AuthCommand (or something), because a privilege is a ’thing 
you can do’ and grant and revoke are ‘requests to change the things you can do’.
* CalcitePrincipal is used in too many places. We should just use Principal. 
E.g. CalciteSchema.getAccessType(CalcitePrincipal) should be 
getAccessType(Principal). We want to make it easy for people to plug in their 
own access scheme. CalcitePrincipal can be used for tests and simple demos.

Should CatalogReader.getAllowedAccess() be changed to 
CatalogReader.getAllowedAccess(Principal)? I can see arguments both ways. One 
would require CatalogReader to be a filtered view for the current statement’s 
principal(s).

I’ll add these comments to the Jira case. I encourage others to have a similar 
‘quick look’ so that we can uncover the major issues quickly, before we move to 
more detailed review.

Julian



 

> On Aug 1, 2023, at 5:00 AM, Hongyu Guo  wrote:
> 
> Hi, community!
> 
> I’ve submitted a PR for "supporting authorization via GRANT and REVOKE DDL 
> commands.”
> 
> In this PR, I implemented basic access control for calcite by adding GRANT 
> and REVOKE syntax.
> 
> JIRA case link: https://issues.apache.org/jira/browse/CALCITE-5816 
> 
> Github PR link: https://github.com/apache/calcite/pull/3284 
> 
> 
> Can someone make a review? 
> 
> This is the first new feature I submitted in the community. 
> 
> I need some feedback and suggestions to improve it. Thank you!
> 
> Best
> Hongyu Guo



Re: Empty array literal

2023-08-01 Thread Julian Hyde
I don’t know - maybe it’s not in the sql standard? Can’t deduce a type? Any 
existing issues logged?

Julian

> On Jul 31, 2023, at 6:35 PM, mbu...@gmail.com wrote:
> 
> Hello,
> 
> 
> 
> It looks to me like the Calcite grammar does not accept empty array literals
> "ARRAY[]".
> 
> This sounds like a serious omission.
> 
> Is there a fundamental reason why this isn't in the grammar?
> 
> If there isn't, I can file an issue for it.
> 
> I can see a workaround using the Spark "ARRAY()" function.
> 
> 
> 
> Thank you,
> 
> Mihai
> 


Re: Community over Code (ApacheCon) East Asia

2023-07-31 Thread Julian Hyde
Great news. Thank you for submitting a talk!

I believe there’s a way to order Calcite laptop stickers before the event. You 
can leave the stickers at the ComDev booth so that anyone attending can have a 
sticker for their laptop. I’ve forgotten the details but it’s worth starting 
the process a few weeks before the event.

You could also consider hosting a Calcite BoF (birds of a feather) session.

Julian


> On Jul 31, 2023, at 12:19 AM, Benchao Li  wrote:
> 
> I'll attend, and give a talk[1] in the keynote track about my experiences
> contributing to Apache projects (Calcite and Flink).
> 
> Hope to see you guys there!
> 
> [1] https://apachecon.com/acasia2023/sessions/keynote-1234.html
> 
> Julian Hyde  于2023年6月12日周一 05:15写道:
> 
>> Community over Code (ApacheCon) East Asia is happening in August, and
>> the call for papers closes in LESS THAN A WEEK [1].
>> 
>> I'm proud of how many Calcite community members we have in China and
>> East Asia, and I appreciate the constant stream of excellent and
>> innovative contributions.
>> 
>> If you are one of those people, I encourage you to attend the
>> conference, or, better, submit a talk proposal. This is a great
>> opportunity to bring new people into the Calcite community, to
>> convince projects and companies based in China that Calcite is a good
>> core technology, or just to get to know each other better.
>> 
>> I have attended several ApacheCon conferences in the US and Canada,
>> and they have helped me gain an understanding of what Apache does, and
>> make some lasting friendships with people in other projects.
>> 
>> If you have an interesting talk about Calcite, please submit it!
>> 
>> I believe that there is travel assistance available for those who need
>> financial help getting to the event [2].
>> 
>> Julian
>> 
>> [1] https://apachecon.com/acasia2023/cfp.html
>> [2] https://tac.apache.org/
>> 
> 
> 
> -- 
> 
> Best,
> Benchao Li



Re: [Help Needed] Dissecting CTE out of Calcite LogicPlan

2023-07-26 Thread Julian Hyde
Your images did not come through (ASF mail server strips them from all emails) 
but I’ll try my best to answer.

As you note, we added the Spool operator to model the case where the RelNode 
tree is a DAG (one node has more than one consumer). But other than that, 
relational algebra doesn’t have a CTE concept. (In a sense every relational 
operator is a CTE. And in relational algebra the intermediate relations don’t 
have names.)

Do you intend to create CTEs only where a node has multiple consumers, or do 
you have some other criteria for ‘cutting’ the tree?

RelToSqlConverter (and its base class SqlImplementor) is very concerned with 
‘cutting’. Given an algebra tree, it tries to create the minimal number of 
SELECT blocks. The algebra scan-filter-aggregate-filter can be done in one 
block (FROM-WHERE-GROUP BY-HAVING) but scan-aggregate-filter-aggregate requires 
two blocks (because you can’t have multiple GROUP BY in one block).

Perhaps you need a version of RelToSqlConverter with a custom ‘cutting’ policy.

Julian

PS You and your colleague should subscribe to dev@ to avoid moderation delays 
and to receive replies.

> On Jul 26, 2023, at 12:58 AM, Ken Yang  wrote:
> 
> Dear Calcite Community,
> 
> We are trying to use Calcite to parse SQLs, by breaking complicated SQLs into 
> WITH-AS blocks. Particularly, our goal is to first parse an SQL clause into 
> Calcite Logical Plans, and then manually traverse the entire tree and use 
> WITH-AS to wrap any sub-trees with a LogicalProject root. By doing so we will 
> have an easy-to-understand SQL equivalent of the original SQL. After some 
> research, we came to believe that this feature is not readily available in 
> Calcite, and decide to reach out for help as our current approach has 
> blockers.
> 
> To illustrate our current approach and blocker, we consider the following 
> SQL, and its corresponding Logical Plan provided by Calcite:
> 
> 
> Here, with a RelNode logPlan as root, we would simply take out 
> logPlan.getInput(0) and logPlan.getInput(1), transform them back into 
> SqlNodes, and then into Strings. During the transformation process between 
> SqlNodes and Strings, we would manually wrap a "WITH SQ0 AS" and "WITH SQ1 
> AS" around the Statements, making them WITH-AS blocks as desired.
> 
> The blocker we face is that we do not know how to push SQ0 and SQ1 as tables 
> into the Logical Plan. The purpose of pushing them as LogicalTableScans into 
> the tree is we want their parent node, LogicalIntersect to result in 
> something like (SELECT * FROM SQ0) INTERSECT (SELECT * FROM SQ1) when parsed 
> back into SQL. If left untreated, the resulting SQL statement would simply 
> contain two WITH-AS blocks and the entire original SQL statement, as the 
> parser would take on the entire subquery of SQ0 and SQ1 without knowing that 
> they have acquired aliases. On the other hand, as SQ0 and SQ1 are dynamically 
> generated during the parsing of the tree, we are not sure if it is 
> appropriate to add them as new tables into the schemas while we are parsing.
> 
> With the problem clearly stated, we would sincerely appreciate if you could 
> provide us with some concrete guidance on how we can complete the task on 
> this simple case. We are aware that there are some potential solutions, e.g., 
> by pushing a Spool into the Logical Plan and override the RelToSqlConverter, 
> as suggested in 
> https://lists.apache.org/thread/k0or4xyfv4bbmgtrllg40ftysbg24y0h, but we are 
> unfortunately incapable of realizing these solutions without further concrete 
> guidance.
> 
> Please kindly respond to this thread if you know how to solve this problem. 
> We will be very grateful for your help.
> 
> Best,
> Yingxiang & Boyi
> 
> PhD, Electrical and Computer Engineering
> University of Illinois at Urbana-Champaign
> Coordinated Science Laboratory, Room 108
> Urbana, Illinois, 61801.
> 



Re: [VOTE] Release Apache Calcite 1.35.0 (release candidate 3)

2023-07-23 Thread Julian Hyde
+1

Downloaded; checked signatures; checked LICENSE, NOTICE; built and ran
tests on Linux using Gradle 7.4.2 and OpenJDK 18.0.2; ran rat.

Notes:
 * Please add your signature to KEYS. I had to import from xiong.asc manually.
 * The parser generator reports a choice conflict on the "TRUNCATE"
keyword; I presume this is a regression introduced by CALCITE-5688.
Not a show-stopper. but it needs to be fixed.

Julian



On Sun, Jul 23, 2023 at 7:37 AM Benchao Li  wrote:
>
> +1 (binding)
>
> - checked signature and checksum (OK)
> - diffed source with tag (OK)
> - build from source (OK, but with an unstable test, I've logged
> https://issues.apache.org/jira/browse/CALCITE-5868)
> - checked files in Nexus staging (OK)
> - checked copyright year in NOTICE (OK)
>
> Francis Chuang  于2023年7月23日周日 15:02写道:
>
> > My vote is: +1 (binding)
> >
> > - Verified GPG signature - OK
> > - Verified SHA512 - OK
> > - Diffed source release and git repository - OK
> > - Checked release notes on tag
> > (
> > https://github.com/apache/calcite/blob/calcite-1.35.0-rc3/site/_docs/history.md)
> >
> > - OK
> > - Checked year and versions in NOTICE, README and HOWTO - OK
> > - Ran tests (gradle check) - OK
> > - Spot checked Nexus artifacts - OK
> >
> > Environment:
> > Eclipse-temurin:19-jammy docker container in WSL2 (Ubuntu 22.04.2) on
> > Windows 11 22h2
> >
> > $ docker version
> > Client: Docker Engine - Community
> >   Cloud integration: v1.0.35
> >   Version:   24.0.2
> >   API version:   1.43
> >   Go version:go1.20.4
> >   Git commit:cb74dfc
> >   Built: Thu May 25 21:52:17 2023
> >   OS/Arch:   linux/amd64
> >   Context:   default
> >
> > Server: Docker Desktop
> >   Engine:
> >Version:  24.0.2
> >API version:  1.43 (minimum version 1.12)
> >Go version:   go1.20.4
> >Git commit:   659604f
> >Built:Thu May 25 21:52:17 2023
> >OS/Arch:  linux/amd64
> >Experimental: false
> >   containerd:
> >Version:  1.6.21
> >GitCommit:3dce8eb055cbb6872793272b4f20ed16117344f8
> >   runc:
> >Version:  1.1.7
> >GitCommit:v1.1.7-0-g860f061
> >   docker-init:
> >Version:  0.19.0
> >GitCommit:de40ad0
> >
> > $ gradle -v
> >
> > 
> > Gradle 7.6.1
> > 
> >
> > Build time:   2023-02-24 13:54:42 UTC
> > Revision: 3905fe8ac072bbd925c70ddbf4463341f4b4
> >
> > Kotlin:   1.7.10
> > Groovy:   3.0.13
> > Ant:  Apache Ant(TM) version 1.10.11 compiled on July 10 2021
> > JVM:  19.0.2 (Eclipse Adoptium 19.0.2+7)
> > OS:   Linux 5.15.90.1-microsoft-standard-WSL2 amd64
> >
> > $ java --version
> > openjdk 19.0.2 2023-01-17
> > OpenJDK Runtime Environment Temurin-19.0.2+7 (build 19.0.2+7)
> > OpenJDK 64-Bit Server VM Temurin-19.0.2+7 (build 19.0.2+7, mixed mode,
> > sharing)
> >
> > Francis
> >
> > On 22/07/2023 9:40 am, Xiong Duan wrote:
> > > Hi all,
> > >
> > >
> > > I have created a build for Apache Calcite 1.35.0, release
> > >
> > > candidate 3.
> > >
> > >
> > > Thanks to everyone who has contributed to this release.
> > >
> > >
> > > You can read the release notes here:
> > >
> > >
> > https://github.com/apache/calcite/blob/calcite-1.35.0-rc3/site/_docs/history.md
> > >
> > >
> > > The commit to be voted upon:
> > >
> > >
> > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=75750b78b5ac692caa654f506fc1515d4d3991d6
> > >
> > >
> > > Its hash is 75750b78b5ac692caa654f506fc1515d4d3991d6
> > >
> > >
> > > Tag:
> > >
> > > https://github.com/apache/calcite/tree/calcite-1.35.0-rc3
> > >
> > >
> > > The artifacts to be voted on are located here:
> > >
> > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.35.0-rc3
> > >
> > > (revision 63131)
> > >
> > >
> > > The hashes of the artifacts are as follows:
> > >
> > >
> > a19e0e73dfaa4d54d4d27db24c1b5674979e5461ed4c0bd1408aaec72ff82e832b3e4dee13c9d6552d3b5a9db86c1f11b9a960521e76dd65fc661565e160f63e
> > >
> > > *apache-calcite-1.35.0-src.tar.gz
> > >
> > >
> > > A staged Maven repository is available for review at:
> > >
> > >
> > https://repository.apache.org/content/repositories/orgapachecalcite-1216/org/apache/calcite/
> > >
> > >
> > > Release artifacts are signed with the following key:
> > >
> > > https://people.apache.org/keys/committer/xiong.asc
> > >
> > > https://www.apache.org/dist/calcite/KEYS
> > >
> > >
> > > To create the jars and test Apache Calcite: "gradle build"
> > >
> > > (requires an appropriate Gradle/JDK installation)
> > >
> > >
> > > Please vote on releasing this package as Apache Calcite 1.35.0.
> > >
> > >
> > > The vote is open for the next 72 hours and passes if a majority of at
> > least
> > > three +1 PMC votes are cast.
> > >
> > >
> > > [ ] +1 Release this package as Apache Calcite 1.35.0
> > >
> > > [ ] 

Re: Window functions frame_exclusion

2023-07-16 Thread Julian Hyde
It would be good to have this as a feature. Are you proposing to fully 
implement the feature (I.e. add to parser, validator, and Enumerable 
convention, so that we can execute such queries) or just add it to the parser?

I think our users might be confused or frustrated if the parser supports the 
syntax but we cannot execute it. 

Julian

> On Jul 16, 2023, at 10:35 AM, Itiel Sadeh  
> wrote:
> 
> Hello All,
> PostgreSQL supports "frame exclusion" in window function, e.g.: "SELECT
> sum(x) OVER (PARTITION BY y ORDER BY z ROWS BETWEEN UNBOUNDED PRECEDING and
> UNBOUNDED FOLLOWING EXCLUDE CURRENT ROW) from t", (you can see here
> )
> which let you ignore some rows from the selected scope.
> Calcite however does not support it.
> Can I add it to the parser? Do we use conformance for such cases or other
> mechanisms?
> 
> Thanks,
> Itiel


Re: [VOTE] Release Apache Calcite 1.35.0 (release candidate 0)

2023-07-16 Thread Julian Hyde
I don’t think the short links are an improvement to the vote email. 

The release notes could be improved. I think CONVERT counts as a new feature. 
Puffin and PairList do not (they’re utility classes). 

There are so many new functions this time, how about adding a new section, and 
sorting them alphabetically?

Can you convert contributors names to first last name? E.g. rubenql appeared in 
previous releases as Ruben Quesada Lopez. 

Julian

> On Jul 14, 2023, at 7:33 AM, xiong duan  wrote:
> 
> Hi all,
> 
> 
> I have created a build for Apache Calcite 1.35.0, release
> 
> candidate 0.
> 
> 
> Thanks to everyone who has contributed to this release.
> 
> 
> You can read the release notes here:
> 
> https://s.apache.org/p3rl9
> 
> 
> The commit to be voted upon:
> 
> https://s.apache.org/8io6t
> 
> 
> Its hash is bd4cac4ee30b4c275f5a229eb1704524c3dbc376
> 
> 
> Tag:
> 
> https://github.com/apache/calcite/tree/calcite-1.35.0-rc0
> 
> 
> The artifacts to be voted on are located here:
> 
> https://s.apache.org/c95f1
> 
> (revision 62985)
> 
> 
> The hashes of the artifacts are as follows:
> 
> b2eb9b25727d1213f889e8d1a547a1f84c66af3821523ebe19a36ae5bc595daa74b4b4584914012bffca0a24be19a523c45dbaf72064d90f50feffc0de8a5e6e
> 
> *apache-calcite-1.35.0-src.tar.gz
> 
> 
> A staged Maven repository is available for review at:
> 
> https://repository.apache.org/content/repositories/orgapachecalcite-1206/org/apache/calcite/
> 
> 
> Release artifacts are signed with the following key:
> 
> https://people.apache.org/keys/committer/xiong.asc
> 
> https://www.apache.org/dist/calcite/KEYS
> 
> 
> To create the jars and test Apache Calcite: "gradle build"
> 
> (requires an appropriate Gradle/JDK installation)
> 
> 
> Please vote on releasing this package as Apache Calcite 1.35.0.
> 
> 
> The vote is open for the next 72 hours and passes if a majority of at
> 
> least three +1 PMC votes are cast.
> 
> 
> [ ] +1 Release this package as Apache Calcite 1.35.0
> 
> [ ]  0 I don't feel strongly about it, but I'm okay with the release
> 
> [ ] -1 Do not release this package because...
> 
> 
> Here is my vote:
> 
> 
> +1 (binding)


Re: Developing a new SqlDialect

2023-07-11 Thread Julian Hyde
I see those problems too.

A major problem is that when you start a new dialect, there are no tests. And 
that’s because our dialect test does not execute queries, only compares 
generated SQL (and those SQL fragments need to be entered by hand). There’s a 
good reason that our dialect test does not execute queries: connections to 
external databases are expensive (you need a license, and cloud resources) and 
flaky (valid tests will fail from time to time).

The moonshot to address this is 
https://issues.apache.org/jira/browse/CALCITE-5529.

I have reached the point where I can execute all queries in 
RelToSqlConverterTest against any connection you provide (and I plan to run 
them against a reference connection - say Postgres - so that I know that the 
results from your connection are valid). The problem is that I don’t have a 
means to initialize a connection (create tables and populate data) and that 
there are a huge number of errors.

I would appreciate help working on this. It’s a lot of work, but it will 
improve the quality of our dialect system.

Julian


> On Jul 11, 2023, at 6:54 AM, Joeri van Ruth 
>  wrote:
> 
> Hi all,
> 
> MonetDB is an open source columnar main-memory SQL database that has
> been around for quite a while.  I am a MonetDB developer and I would
> like to add support for MonetDB's SQL dialect to Calcite and
> contribute it to the Calcite project.
> 
> Looking at some of the existing Dialect classes I don't expect the
> end result to be a lot of code but I'm a bit concerned that the
> process of getting there will be an endless treadmill of users
> finding yet another deviation and me fixing it.
> 
> I'm hoping to ask you here for some guidance on how to develop a new
> dialect in a more systematic way and how to ensure it will not
> immediately fail when someone other than the developer tries to use
> it.
> 
> For example, I know you have an extensive test suite, will I be able
> to use that to determine if my implementation is ready, and to
> pinpoint the things I still need to do?  Are there particular parts
> of the rest of the code base I should first try to familiarize myself
> with to make the implementation go more smoothly?
> 
> Also, what would you expect to be a reasonable time frame for this
> project considering I have good knowledge fo MonetDB but I'm new to
> Calcite?  A week?  A month?
> 
> Any tips and pointers would be appreciated.
> 
> Joeri



Fwd: Data Engineering track for Community Over Code NA is calling for presentations

2023-07-07 Thread Julian Hyde
Calcite community members,

As the message below says, the CFP for Community Over Code North
America closes in a few days.

It would be great if there were talks about Calcite. Consider
submitting to the Data Engineering track (see below), the Big Data
track and the Geospatial track.

Also consider submitting a talk about your experiences in open source
- building community, growing your career and expertise, or solving
software engineering problems.

I plan to submit a talk, and I hope to see you there!

Julian

-- Forwarded message -
From: Jarek Potiuk 
Date: Fri, Jul 7, 2023 at 12:39 PM
Subject: Data Engineering track for Community Over Code NA is calling
for presentations
To: 
Cc: Ismaël Mejía 


Hello Hop community,

Just a reminder, that there are just 6 days left to submit your proposal
for The Community Over Code NA (former ApacheCon) conference.

This is the flagship event for the ASF in Halifax, Nova Scotia, Canada,
October 7-10, 2023 and together with Ismael, we want to encourage you to
submit talks for the Data Engineering track.

More info on the conference https://communityovercode.org/ and more info on
the Data Engineering track:
https://medium.com/@jarekpotiuk/data-engineering-community-over-code-conference-38e9677bb440

We hope to see you in Halifax in October!

J.


Re: Force push to calcite main

2023-07-07 Thread Julian Hyde
I didn’t know about —force-with-lease. I hoped that it existed — because 
otherwise there is an obvious race condition — but never discovered it. 

Thank you for teaching me something, Michael.

> On Jul 7, 2023, at 11:34 AM, Michael Mior  wrote:
> 
> I would also be +1 for keeping it in place. It certainly doesn't solve all
> the problems, but using --force-with-lease instead of --force is generally
> a good idea. It just makes sure no one else has pushed anything since your
> last update. If they have, it will block the force push. At that point, if
> I still wanted to force push, I would rebase to incorporate their changes
> if possible, and make sure I notify whoever pushed.
> 
> --
> Michael Mior
> mm...@apache.org
> 
> 
> On Fri, Jul 7, 2023 at 2:19 PM Julian Hyde  wrote:
> 
>>> I prefer keeping force push in place
>> 
>> +1. I use force push about a dozen times a year and hopefully no one
>> notices. It helps the commit history clean.
>> 
>> I know that force pushes can confuse CI systems and make things confusing
>> to other users, so I try to be judicious. I trust other committers to do
>> the same. (I suspect that this was a learning experience for Tanner.)
>> 
>> Julian
>> 
>> 
>>> On Jul 7, 2023, at 4:12 AM, Stamatis Zampetakis 
>> wrote:
>>> 
>>> @Tanner No worries we all did this at some point in time, thanks a lot
>>> for following up!
>>> 
>>> @Stanilovsky: I prefer keeping force push in place and avoiding messy
>>> reverts that are usually necessary in various situations where we make
>>> a mistake. All commits are archived so there is nothing that we can't
>>> fix (I think).
>>> 
>>> On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
>>>  wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> This was my mistake, my apologies. I will update the hashes. Sorry for
>> any
>>>> inconvenience.
>>>> 
>>>> Tanner
>>>> 
>>>> On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
>>>> estanilovs...@gridgain.com> wrote:
>>>> 
>>>>> I already told that community need to vote for prohibit force push.
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> It appears that there was a force push to main yesterday [1] rewriting
>>>>>> the history for a bunch of commits. I don't know if it was intentional
>>>>>> or not but it seems that now resolved JIRAs (after CALCITE-5810 I
>>>>>> think) are pointing to non-existent commits.
>>>>>> 
>>>>>> Can someone please update the JIRA tickets with the correct commit
>>>>>> hashes and also ensure that we didn't lose anything after the rebase?
>>>>>> 
>>>>>> Best,
>>>>>> Stamatis
>>>>>> 
>>>>>> [1] https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
>>>>> 
>> 
>> 



Re: Force push to calcite main

2023-07-07 Thread Julian Hyde
> I prefer keeping force push in place

+1. I use force push about a dozen times a year and hopefully no one notices. 
It helps the commit history clean.

I know that force pushes can confuse CI systems and make things confusing to 
other users, so I try to be judicious. I trust other committers to do the same. 
(I suspect that this was a learning experience for Tanner.)

Julian


> On Jul 7, 2023, at 4:12 AM, Stamatis Zampetakis  wrote:
> 
> @Tanner No worries we all did this at some point in time, thanks a lot
> for following up!
> 
> @Stanilovsky: I prefer keeping force push in place and avoiding messy
> reverts that are usually necessary in various situations where we make
> a mistake. All commits are archived so there is nothing that we can't
> fix (I think).
> 
> On Thu, Jul 6, 2023 at 3:45 PM Tanner Clary
>  wrote:
>> 
>> Hello,
>> 
>> This was my mistake, my apologies. I will update the hashes. Sorry for any
>> inconvenience.
>> 
>> Tanner
>> 
>> On Thu, Jul 6, 2023 at 3:34 AM stanilovsky evgeny <
>> estanilovs...@gridgain.com> wrote:
>> 
>>> I already told that community need to vote for prohibit force push.
>>> 
 Hello,
 
 It appears that there was a force push to main yesterday [1] rewriting
 the history for a bunch of commits. I don't know if it was intentional
 or not but it seems that now resolved JIRAs (after CALCITE-5810 I
 think) are pointing to non-existent commits.
 
 Can someone please update the JIRA tickets with the correct commit
 hashes and also ensure that we didn't lose anything after the rebase?
 
 Best,
 Stamatis
 
 [1] https://lists.apache.org/thread/7jjnbkkh9tv49sjcc5kg2tm7c54tj861
>>> 



Re: Java Doc about RexProgramBuilderBase

2023-07-07 Thread Julian Hyde
People refactoring code should remember that their IDE can move and rename 
fields but is not so good at changing documentation. (Maybe in a couple of 
years, with advances in generative AI?!)

And when documentation and code don’t line up, people don’t trust either. (I’m 
as guilty of this as anyone.)

> On Jul 7, 2023, at 5:31 AM, Benchao Li  wrote:
> 
> This usually happens. When javadoc and the implementation diverges, the
> javadoc is mostly possible wrong and need to be improved to match the real
> behavior.
> 
> For this specific case, I agree with you that the javadoc for
> `vParamNotNull` and `vDecimal(int arg)` are not correct, please fix them.
> (We can do this kind of trivial work without a Jira ticket)
> 
> Zhe Hu  于2023年7月7日周五 14:45写道:
> 
>> Hi community.
>> Recently, when I review CALCITE-5769(
>> https://github.com/apache/calcite/pull/3296), I found something a little
>> confusing.
>> 
>> First, the java doc in RexProgramBuilderBase.vParamNotNull(), which meant
>> to create non-nullable variable, but it’s returning description is
>> “nullable varchar variable”.
>> Second, we use vDecimal(int arg) to create nullable decimal variable, but
>> the RelDataType we pass in is “nonNullableDecimal”, which I think should be
>> “nullableDecimal”. So does the other vXxx() methods.
>> I’m not sure if I understand right here. If it’s something we can improve,
>> I’ll file a JIRA case to record and fix it.
>> 
>> 
>> Best regards,
>> Zhe Hu
>> 
>> 
> 
> -- 
> 
> Best,
> Benchao Li



Re: [ANNOUNCE] New committer: Dan Zou

2023-07-05 Thread Julian Hyde
Welcome, Dan! And thank you for your contributions so far.

> On Jul 4, 2023, at 7:12 PM, Jiajun Xie  wrote:
> 
> Congratulations, Dan!
> 
> On Tue, 4 Jul 2023 at 19:48, Runkang He  wrote:
> 
>> Congratulations, Dan!
>> 
>> Best,
>> Runkang He
>> 
>> Jacky Lau  于2023年7月4日周二 18:56写道:
>> 
>>> Congratulations, Dan!
>>> 
>>> Best Jacky!
>>> 
>>> Benchao Li  于2023年7月4日周二 17:55写道:
>>> 
 Congratulations, Dan!
 
 Zhe Hu  于2023年7月4日周二 17:26写道:
 
> Congrats! Dan.
>  Replied Message 
> | From | Francis Chuang |
> | Date | 07/04/2023 15:18 |
> | To | dev |
> | Subject | Re: [ANNOUNCE] New committer: Dan Zou |
> Congrats, Dan!
> 
> On 4/07/2023 5:16 pm, Stamatis Zampetakis wrote:
>> Apache Calcite's Project Management Committee (PMC) has invited Dan
>>> Zou
> to
>> become a committer, and we are pleased to announce that they have
> accepted.
>> 
>> Dan has been doing some great work for the project over the past
>> few
>> months. They implemented and enabled multiple new SQL functions for
>> BigQuery and MSSQL dialects, fixed some optimization rules, and
>> improved documentation and test code.
>> 
>> Dan, welcome, thank you for your contributions, and we look forward
>>> to
> your
>> further interactions with the community! If you wish, please feel
>>> free
> to tell
>> us more about yourself and what you are working on.
>> 
>> As your first commit, please add yourself to the contributors list
>>> [1]
>> and the community page will re-generate [2].
>> 
>> Stamatis (on behalf of the Apache Calcite PMC)
>> 
>> [1]
> 
>>> https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
>> [2] https://calcite.apache.org/community/#project-members
> 
 
 
 --
 
 Best,
 Benchao Li
 
>>> 
>> 



Re: [ANNOUNCE] New committer: TJ Banghart

2023-07-05 Thread Julian Hyde
Welcome, TJ! Thank you for your work on both Calcite and Avatica.

As you’re my colleague, I know what you do, but please feel free to introduce 
yourself to the community.

Julian


> On Jul 4, 2023, at 7:11 PM, Jiajun Xie  wrote:
> 
> Congratulations, TJ!
> 
> On Wed, 5 Jul 2023 at 01:03, Tanner Clary 
> wrote:
> 
>> Congrats and welcome, TJ!!
>> On Tue, Jul 4, 2023 at 8:26 AM Runkang He  wrote:
>> 
>>> Congratulations, TJ!
>>> 
>>> Best, Runkang He
>>> 
>>> xiong duan  于2023年7月4日周二 19:39写道:
>>> 
 Congratulations, TJ! Welcome!
 
 Jacky Lau  于2023年7月4日周二 18:56写道:
 
> Congratulations, TJ!
> 
> Best Jacky!
> 
> Benchao Li  于2023年7月4日周二 17:55写道:
> 
>> Congratulations, TJ!
>> 
>> Zhe Hu  于2023年7月4日周二 17:26写道:
>> 
>>> Congrats! TJ.
>>>  Replied Message 
>>> | From | Francis Chuang |
>>> | Date | 07/04/2023 15:18 |
>>> | To | dev |
>>> | Subject | Re: [ANNOUNCE] New committer: TJ Banghart |
>>> Congrats, TJ!
>>> 
>>> On 4/07/2023 5:17 pm, Stamatis Zampetakis wrote:
 Apache Calcite's Project Management Committee (PMC) has invited
>>> TJ
>>> Banghart to
 become a committer, and we are pleased to announce that they
>> have
>>> accepted.
 
 TJ has been contributing to the community for about a year now.
 They
 introduced many new SQL functions for parsing and formatting
>>> dates,
 times, and timestamps extending the capabilities of the
>> BigQuery
 dialect. Furthermore, they pushed various improvements around
>>> JSON
 serialization and deserialization, JDBC metadata, and lexical
 policies.
 
 TJ, welcome, thank you for your contributions, and we look
>>> forward
 to
>>> your
 further interactions with the community! If you wish, please
>> feel
> free
>>> to tell
 us more about yourself and what you are working on.
 
 As your first commit, please add yourself to the contributors
>>> list
> [1]
 and the community page will re-generate [2].
 
 Stamatis (on behalf of the Apache Calcite PMC)
 
 [1]
>>> 
> 
>>> https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
 [2] https://calcite.apache.org/community/#project-members
>>> 
>> 
>> 
>> --
>> 
>> Best,
>> Benchao Li
>> 
> 
 
>>> 
>> 



Re: Draft: board report for 2023 Q2

2023-07-05 Thread Julian Hyde
Looks good. Thanks, Stamatis.

And thank you to the top reviewers — Benchao, Ruben, Jiajun. Your efforts keep 
the project moving. Without your work, Calcite would be a lot less welcoming to 
new contributors.

Julian
 


> On Jul 5, 2023, at 5:10 AM, Benchao Li  wrote:
> 
> +1, it's very insightful.
> 
> Francis Chuang  于2023年7月5日周三 14:47写道:
> 
>> +1, excellent work, Stamatis!
>> 
>> On 4/07/2023 9:04 pm, Ruben Q L wrote:
>>> +1
>>> Thanks Stamatis for preparing it!
>>> 
>>> On Tue, Jul 4, 2023 at 10:18 AM Stamatis Zampetakis 
>>> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> Below you can find a draft of this quarter's board report. I plan to
>>>> submit it next Tuesday (July 11, 2023).
>>>> Please let me know if you have any additions or corrections.
>>>> 
>>>> Best regards,
>>>> Stamatis
>>>> -
>>>> 
>>>> ## Description:
>>>> Apache Calcite is a highly customizable framework for parsing and
>> planning
>>>> queries on data in a wide variety of formats. It allows database-like
>>>> access,
>>>> and in particular a SQL interface and advanced query optimization, for
>> data
>>>> not residing in a traditional database.
>>>> 
>>>> Avatica is a sub-project within Calcite and provides a framework for
>>>> building
>>>> local and remote JDBC and ODBC database drivers. Avatica has an
>> independent
>>>> release schedule and its own repository.
>>>> 
>>>> ## Project Status:
>>>> Current project status: ongoing
>>>> Issues for the board: none
>>>> 
>>>> ## Membership Data:
>>>> Apache Calcite was founded 2015-10-22 (8 years ago)
>>>> There are currently 67 committers and 27 PMC members in this project.
>>>> The Committer-to-PMC ratio is roughly 2:1.
>>>> 
>>>> Community changes, past quarter:
>>>> - No new PMC members. Last addition was Benchao Li on 2023-01-27.
>>>> - Jacky Lau was added as committer on 2023-06-28
>>>> - Oliver Lee was added as committer on 2023-06-13
>>>> - Tanner Clary was added as committer on 2023-05-25
>>>> - Zhe Hu was added as committer on 2023-06-28
>>>> 
>>>> ## Project Activity:
>>>> There were no releases during this quarter. The last release was Apache
>>>> Calcite 1.34.0 on 2023-03-14.
>>>> 
>>>> We are actively working towards Apache Calcite 1.35.0 and we plan to
>>>> release
>>>> the new version during July 2023.
>>>> 
>>>> On 2023-06-29, the Calcite community organised a Virtual key signing
>> party
>>>> for
>>>> expanding the Web of trust and empowering the cryptographic signatures
>> of
>>>> our
>>>> members and future release managers. 4 PMC members and 2 committers
>>>> attended
>>>> the event.
>>>> 
>>>> ## Community Health:
>>>> The project remains super healthy.
>>>> 
>>>> The traffic in JIRA, GitHub, issues@, commits, has increased overall,
>>>> while
>>>> dev@ has dropped by 18%. During this quarter there were more
>> discussions
>>>> and
>>>> exchanges under specific issues/tickets shifting the traffic from dev@
>> to
>>>> issues@ and other places.
>>>> 
>>>> The number of non-committer (contributor) commits per month:
>>>> +-+-+-+
>>>> |year |month| contributor_commits |
>>>> +-+-+-+
>>>> | 2023| 4   | 17  |
>>>> | 2023| 5   | 21  |
>>>> | 2023| 6   | 35  |
>>>> +-+-+-+
>>>> 
>>>> The number of active reviewers per month:
>>>> +-+-+-+
>>>> |year |month|  active_reviewers   |
>>>> +-+-+-+
>>>> | 2023| 4   | 6   |
>>>> | 

Re: [DISCUSS] Towards Calcite 1.35.0

2023-07-05 Thread Julian Hyde
I’ve been working on a few lint improvements in 
https://github.com/julianhyde/calcite/tree/-lint. I’ll merge those in 
(rules and fixes) before the release.

> On Jul 5, 2023, at 11:26 AM, Julian Hyde  wrote:
> 
> I don’t think CALCITE-5701 (named_struct) is ready. (I had imagined that it 
> would be syntactic sugar for ROW. Guillaume’s latest PR adds a new runtime 
> class so that the struct values can be introspected at runtime as beans, and 
> I think that is the wrong path. I’ll add more discussion in the Jira case.)
> 
>> On Jul 5, 2023, at 1:50 AM, stanilovsky evgeny  
>> wrote:
>> 
>> [1] Is ready for merging, approvals are obtained.
>> 
>> [1] https://github.com/apache/calcite/pull/3211
>> 
>> 
>>> I finished CALCITE-5701 Add NAMED_STRUCT function (enabled in Spark library)
>>> I hope it's not too late for the release:
>>> https://github.com/apache/calcite/pull/3295
>>> 
>>> The CI passes, it just needs a re-run on the last commit.
>>> 
>>> On Fri, Jun 30, 2023 at 5:26 PM Gian Merlino  wrote:
>>> 
>>>> Of the open tickets, I reviewed CALCITE-5708 and CALCITE-5727, which are
>>>> two I felt I understood well enough to review.
>>>> 
>>>> On 2023/06/28 19:36:16 Julian Hyde wrote:
>>>>> I have taken care of 1, 2, 3, 4.
>>>>> 
>>>>> Tanner, can you do 5.
>>>>> 
>>>>> Of the 12 open issues that are open and have fixVersion = 1.35, most
>>>> have PRs ready for review [1]. Can a few committers each take say 3 cases
>>>> each and review their PRs?
>>>>> 
>>>>> Julian
>>>>> 
>>>>> [1]
>>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20fixVersion%20%3D%201.35.0%20and%20status%20%3D%20open
>>>>> 
>>>>>> On Jun 28, 2023, at 6:25 AM, Ruben Q L  wrote:
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Thanks for checking Xiong Duan, I have taken care of 6,7,8.
>>>>>> 
>>>>>> Contributors, please remember that after completing a ticket, the Jira
>>>>>> needs to be set to "Resolved", not "Closed" (they will be moved to
>>>> Closed
>>>>>> by the Release Manager, once the next release is produced).
>>>>>> 
>>>>>> Apart from that, our dashboard [1] still shows 22 unresolved tickets
>>>> with
>>>>>> fixVersion=1.35
>>>>>> Except from the blocker ones, I guess the rest would need to be
>>>> resolved
>>>>>> shortly or otherwise moved to fixVersion=1.36 (or no fixVersion).
>>>>>> As a general rule, please do not set a fixVersion unless it is a
>>>> blocker
>>>>>> issue, or you're reasonably sure that the issue will be done for the
>>>> next
>>>>>> release.
>>>>>> 
>>>>>> Best,
>>>>>> Ruben
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 28, 2023 at 1:47 PM xiong duan 
>>>> wrote:
>>>>>> 
>>>>>>> In order to release 1.35.0, I reviewed all the fixed issues in this
>>>>>>> version, Here are some issue statuses that need to be resolved:
>>>>>>> 
>>>>>>> 
>>>>>>>  1. https://issues.apache.org/jira/browse/CALCITE-5764(The PR has
>>>>>>>  merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>>>>>>  2. https://issues.apache.org/jira/browse/CALCITE-5706(The PR has
>>>>>>>  merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>>>>>>  3. https://issues.apache.org/jira/browse/CALCITE-5765(The PR has
>>>>>>>  merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>>>>>>  4. https://issues.apache.org/jira/browse/CALCITE-5762(The PR has
>>>>>>>  merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>>>>>>  5. https://issues.apache.org/jira/browse/CALCITE-5747(The PR has
>>>>>>>  merge.But issue is NOT Resolve). (Assigine Tanner Clary)
>>>>>>>  6. https://issues.apache.org/jira/browse/CALCITE-5771

Re: [DISCUSS] Towards Calcite 1.35.0

2023-07-05 Thread Julian Hyde
I don’t think CALCITE-5701 (named_struct) is ready. (I had imagined that it 
would be syntactic sugar for ROW. Guillaume’s latest PR adds a new runtime 
class so that the struct values can be introspected at runtime as beans, and I 
think that is the wrong path. I’ll add more discussion in the Jira case.)

> On Jul 5, 2023, at 1:50 AM, stanilovsky evgeny  
> wrote:
> 
> [1] Is ready for merging, approvals are obtained.
> 
> [1] https://github.com/apache/calcite/pull/3211
> 
> 
>> I finished CALCITE-5701 Add NAMED_STRUCT function (enabled in Spark library)
>> I hope it's not too late for the release:
>> https://github.com/apache/calcite/pull/3295
>> 
>> The CI passes, it just needs a re-run on the last commit.
>> 
>> On Fri, Jun 30, 2023 at 5:26 PM Gian Merlino  wrote:
>> 
>>> Of the open tickets, I reviewed CALCITE-5708 and CALCITE-5727, which are
>>> two I felt I understood well enough to review.
>>> 
>>> On 2023/06/28 19:36:16 Julian Hyde wrote:
>>> > I have taken care of 1, 2, 3, 4.
>>> >
>>> > Tanner, can you do 5.
>>> >
>>> > Of the 12 open issues that are open and have fixVersion = 1.35, most
>>> have PRs ready for review [1]. Can a few committers each take say 3 cases
>>> each and review their PRs?
>>> >
>>> > Julian
>>> >
>>> > [1]
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20fixVersion%20%3D%201.35.0%20and%20status%20%3D%20open
>>> >
>>> > > On Jun 28, 2023, at 6:25 AM, Ruben Q L  wrote:
>>> > >
>>> > > Hello,
>>> > >
>>> > > Thanks for checking Xiong Duan, I have taken care of 6,7,8.
>>> > >
>>> > > Contributors, please remember that after completing a ticket, the Jira
>>> > > needs to be set to "Resolved", not "Closed" (they will be moved to
>>> Closed
>>> > > by the Release Manager, once the next release is produced).
>>> > >
>>> > > Apart from that, our dashboard [1] still shows 22 unresolved tickets
>>> with
>>> > > fixVersion=1.35
>>> > > Except from the blocker ones, I guess the rest would need to be
>>> resolved
>>> > > shortly or otherwise moved to fixVersion=1.36 (or no fixVersion).
>>> > > As a general rule, please do not set a fixVersion unless it is a
>>> blocker
>>> > > issue, or you're reasonably sure that the issue will be done for the
>>> next
>>> > > release.
>>> > >
>>> > > Best,
>>> > > Ruben
>>> > >
>>> > > [1]
>>> > >
>>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Wed, Jun 28, 2023 at 1:47 PM xiong duan 
>>> wrote:
>>> > >
>>> > >> In order to release 1.35.0, I reviewed all the fixed issues in this
>>> > >> version, Here are some issue statuses that need to be resolved:
>>> > >>
>>> > >>
>>> > >>   1. https://issues.apache.org/jira/browse/CALCITE-5764(The PR has
>>> > >>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>> > >>   2. https://issues.apache.org/jira/browse/CALCITE-5706(The PR has
>>> > >>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>> > >>   3. https://issues.apache.org/jira/browse/CALCITE-5765(The PR has
>>> > >>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>> > >>   4. https://issues.apache.org/jira/browse/CALCITE-5762(The PR has
>>> > >>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>> > >>   5. https://issues.apache.org/jira/browse/CALCITE-5747(The PR has
>>> > >>   merge.But issue is NOT Resolve). (Assigine Tanner Clary)
>>> > >>   6. https://issues.apache.org/jira/browse/CALCITE-5771(The Fix
>>> Version
>>> > >> is
>>> > >>   NONE, Need set it to 1.35.0)
>>> > >>   7. https://issues.apache.org/jira/browse/CALCITE-5757(The Fix
>>> Version
>>> > >> is
>>> > >>   NONE, Need set it to 1.35.0)
>>> > >>   8. https://issues.apache.org/jira/browse/CALCITE-4679(Resolution
>>> should
>>> > >>   be Fixed. But is Resolved)

Re: Optimal way to organize Joins in Calcite

2023-07-03 Thread Julian Hyde
The reason that there are two strategies is because of large joins. If your 
query joins 10 tables, the number of possible join orders is large (bounded by 
10 factorial I believe) and therefore would overwhelm the Volcano planner, 
which must construct each possibility. 

Therefore we have a heuristic algorithm that you should use for large joins. We 
gather the entire FROM clause into a data structure called MultiJoin, and a 
single rule call applies heuristics and spits out a join order that is probably 
close to optimal. 

When you are optimizing a query, you need to know whether you are in danger of 
being swallowed by the monster that is the complexity of large joins. If your 
query only joins 2 or 3 tables (and in some other situations too) you are not 
in danger and can safely exhaustively enumerate plans. 

> On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg  wrote:
> 
> Hi,
> 
> I'm presently working on optimizing the ordering of joins for queries and
> had a few questions about the optimal way to do that with Calcite.
> 
> I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
> and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
> and the rules related to MultiJoins. We're utilizing the volcano planner
> for optimization at the present moment but also have the freedom to
> customize the order and phases for the planner phases.
> 
> 1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
> rules? Or are JoinAssociate and JoinCommute still recommended as the
> standard way to handle reordering of joins?
> 2. Our system only supports performing the join over two inputs and we
> can't support MultiJoin as a physical operation. My understanding is that
> the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
> join but will still produce a MultiJoin. What's the appropriate way to
> convert a MultiJoin back to a set of joins?
> 3. My understanding is that MultiJoin rules aren't compatible with the
> volcano planner and should be run as part of a stage using the heuristic
> planner. Is this understanding correct?
> 
> Thank you for any help.
> 
> --Jonathan Sternberg


Re: [DISCUSS] Towards Calcite 1.35.0

2023-06-28 Thread Julian Hyde
I have taken care of 1, 2, 3, 4.

Tanner, can you do 5.

Of the 12 open issues that are open and have fixVersion = 1.35, most have PRs 
ready for review [1]. Can a few committers each take say 3 cases each and 
review their PRs?

Julian

[1] 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20CALCITE%20AND%20fixVersion%20%3D%201.35.0%20and%20status%20%3D%20open
 

> On Jun 28, 2023, at 6:25 AM, Ruben Q L  wrote:
> 
> Hello,
> 
> Thanks for checking Xiong Duan, I have taken care of 6,7,8.
> 
> Contributors, please remember that after completing a ticket, the Jira
> needs to be set to "Resolved", not "Closed" (they will be moved to Closed
> by the Release Manager, once the next release is produced).
> 
> Apart from that, our dashboard [1] still shows 22 unresolved tickets with
> fixVersion=1.35
> Except from the blocker ones, I guess the rest would need to be resolved
> shortly or otherwise moved to fixVersion=1.36 (or no fixVersion).
> As a general rule, please do not set a fixVersion unless it is a blocker
> issue, or you're reasonably sure that the issue will be done for the next
> release.
> 
> Best,
> Ruben
> 
> [1]
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
> 
> 
> 
> 
> On Wed, Jun 28, 2023 at 1:47 PM xiong duan  wrote:
> 
>> In order to release 1.35.0, I reviewed all the fixed issues in this
>> version, Here are some issue statuses that need to be resolved:
>> 
>> 
>>   1. https://issues.apache.org/jira/browse/CALCITE-5764(The PR has
>>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>   2. https://issues.apache.org/jira/browse/CALCITE-5706(The PR has
>>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>   3. https://issues.apache.org/jira/browse/CALCITE-5765(The PR has
>>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>   4. https://issues.apache.org/jira/browse/CALCITE-5762(The PR has
>>   merge.But issue is IN PROGRESS). (Assigine Julian Hyde)
>>   5. https://issues.apache.org/jira/browse/CALCITE-5747(The PR has
>>   merge.But issue is NOT Resolve). (Assigine Tanner Clary)
>>   6. https://issues.apache.org/jira/browse/CALCITE-5771(The Fix Version
>> is
>>   NONE, Need set it to 1.35.0)
>>   7. https://issues.apache.org/jira/browse/CALCITE-5757(The Fix Version
>> is
>>   NONE, Need set it to 1.35.0)
>>   8. https://issues.apache.org/jira/browse/CALCITE-4679(Resolution should
>>   be Fixed. But is Resolved)
>> 
>> If you are busy, Please tell me the real status of the ISSUE, and I will
>> handle it. (6,7,8 need one PMC to handle it).Thanks.
>> 
>> 
>> Jacky Lau  于2023年6月26日周一 14:24写道:
>> 
>>> hi @xiong duan:
>>>thanks for your review and merged very much, and i forgot there are
>>> also have one pr https://github.com/apache/calcite/pull/3262
>>>do you also have time to have a look?
>>> 
>>> 
>>> xiong duan  于2023年6月25日周日 19:45写道:
>>> 
>>>> I have reviewed:
>>>> https://github.com/apache/calcite/pull/3238
>>>> https://github.com/apache/calcite/pull/3263
>>>> If no other problem, I will merge it tomorrow.
>>>> 
>>>> Ruben Q L  于2023年6月25日周日 17:20写道:
>>>> 
>>>>> I'd like to include CALCITE-5789 in 1.35. I'll try to finalize it
>>>>> today/tomorrow.
>>>>> 
>>>>> 
>>>>> 
>>>>> El dom, 25 jun 2023, 6:33, Jacky Lau 
>> escribió:
>>>>> 
>>>>>> could we review this pr, so that it could be merged in 1.35. then
>> the
>>>>> 1.35
>>>>>> will almost have the full spark  collection function.
>>>>>> I would very appreciate a review. if someone has time.
>>>>>> 
>>>>>> https://github.com/apache/calcite/pull/3238
>>>>>> https://github.com/apache/calcite/pull/3263
>>>>>> 
>>>>>> xiong duan  于2023年6月25日周日 10:31写道:
>>>>>> 
>>>>>>> I have created CALCITE-5797 to release 1.35.0. If there are no
>>>>>> objections,
>>>>>>> I will create an RC in the following days.
>>>>>>> 
>>>>>>> [1]:https://issues.apache.org/jira/browse/CALCITE-5797
>>>>>>> 
>>>>>>> If there are other must fix for 1.35.0 please let us know so that
>>> we
>>>>> can
>>>>>>> plan accordingly.
>>>>>>> 
>>>>>>> Julian Hyde  于2

Re: [ANNOUNCE] New committer: Jacky Lau

2023-06-28 Thread Julian Hyde
Welcome, Jacky! Thanks for your excellent PRs.

Julian


> On Jun 28, 2023, at 8:51 AM, Michael Mior  wrote:
> 
> Congratulations and welcome Jacky!
> --
> Michael Mior
> mm...@apache.org
> 
> 
> On Wed, Jun 28, 2023 at 6:47 AM Stamatis Zampetakis 
> wrote:
> 
>> Apache Calcite's Project Management Committee (PMC) has invited Jacky Lau
>> to
>> become a committer, and we are pleased to announce that they have accepted.
>> 
>> Jacky has started contributing to the project very recently and in a
>> very short time they landed many notable improvements around ARRAY and
>> MAP functions with a particular focus on the Spark dialect.
>> 
>> Jacky, welcome, thank you for your contributions, and we look forward to
>> your
>> further interactions with the community! If you wish, please feel free to
>> tell
>> us more about yourself and what you are working on.
>> 
>> As your first commit, please add yourself to the contributors list [1]
>> and the community page will re-generate [2].
>> 
>> Stamatis (on behalf of the Apache Calcite PMC)
>> 
>> [1]
>> https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
>> [2] https://calcite.apache.org/community/#project-members
>> 



Re: [ANNOUNCE] New committer: Zhe Hu

2023-06-28 Thread Julian Hyde
Thanks for your contributions, and welcome to the team!

Julian


> On Jun 28, 2023, at 8:52 AM, Michael Mior  wrote:
> 
> Congratulations and welcome Zhe Hu!
> --
> Michael Mior
> mm...@apache.org
> 
> 
> On Wed, Jun 28, 2023 at 7:04 AM Stamatis Zampetakis 
> wrote:
> 
>> Apache Calcite's Project Management Committee (PMC) has invited Zhe Hu to
>> become a committer, and we are pleased to announce that they have accepted.
>> 
>> Zhe Hu has been contributing to the project for a while now. They
>> improved the stability of the ElasticSearch adapter, worked on
>> supporting new java versions, and landed various enhancements around
>> CONCAT and CONVERT functions.
>> 
>> Zhe Hu, welcome, thank you for your contributions, and we look forward to
>> your
>> further interactions with the community! If you wish, please feel free to
>> tell
>> us more about yourself and what you are working on.
>> 
>> As your first commit, please add yourself to the contributors list [1]
>> and the community page will re-generate [2].
>> 
>> Stamatis (on behalf of the Apache Calcite PMC)
>> 
>> [1]
>> https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
>> [2] https://calcite.apache.org/community/#project-members
>> 



[jira] [Created] (CALCITE-5802) In RelBuilder add method aggregateExtended, to allow aggregating complex expressions such as "1 + SUM(x + 2)"

2023-06-26 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5802:


 Summary: In RelBuilder add method aggregateExtended, to allow 
aggregating complex expressions such as "1 + SUM(x + 2)"
 Key: CALCITE-5802
 URL: https://issues.apache.org/jira/browse/CALCITE-5802
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


In {{RelBuilder}} add method {{{}aggregateExtended{}}}, to allow aggregating 
complex expressions such as "1 + SUM(x + 2)". These expressions are difficult 
because there is an expression ({{{}x + 2{}}}) before the aggregate, then the 
aggregate, then an expression ({{{}1 + sum{}}}) after the aggregate. For 
complex expressions such as this, the translation requires a {{Project}} 
followed by an {{Aggregate}} followed by a {{{}Project{}}}.

Aggregate functions are not conventionally represented as {{{}RexNode{}}}, but 
we allow them in the expression passed to {{{}aggregateExtended{}}}.

For example, to create the same effect as SQL
{code:java}
SELECT deptno,
deptno + 2 AS d2,
3 + SUM(4 + sal) AS s
FROM emp
GROUP BY deptno
{code}
we use the {{RelBuilder}} code
{code:java}
RelBuilder b;
b.scan("EMP")
.aggregateRex(b.groupKey(b.field("DEPTNO")),
b.field("DEPTNO"),
b.alias(
b.call(SqlStdOperatorTable.PLUS, b.field("DEPTNO"),
b.literal(2)),
"d2"),
b.alias(
b.call(SqlStdOperatorTable.PLUS, b.literal(3),
b.call(SqlStdOperatorTable.SUM,
b.call(SqlStdOperatorTable.PLUS, b.literal(4),
b.field("SAL",
"s"))
.build();
{code}
and the resulting relational expression is
{noformat}
LogicalProject(DEPTNO=[$0], d2=[+($0, 2)], s=[+(3, $1)])
  LogicalAggregate(group=[{0}], agg#0=[SUM($1)])
LogicalProject(DEPTNO=[$7], $f8=[+(4, $5)])
  LogicalTableScan(table=[[scott, EMP]])
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Towards Calcite 1.35.0

2023-06-23 Thread Julian Hyde
We said ‘mid-June’. Time to get this release rolling?

> On Jun 20, 2023, at 10:34 PM, Gian Merlino  wrote:
> 
> I committed the patch for CALCITE-5477. I would appreciate a review on 
> CALCITE-5479, if someone has a chance.
> 
> On 2023/06/07 09:53:08 Gian Merlino wrote:
>> Some more detail:
>> 
>> - the PR for CALCITE-5477 is https://github.com/apache/calcite/pull/3249
>> - the PR for CALCITE-5479 is https://github.com/apache/calcite/pull/3030
>> 
>> I've just pushed up some updates to reflect the new approach we discussed 
>> for guava backwards-compat in CALCITE-5477, and to sync up with latest main 
>> in CALCITE-5479.
>> 
>> We've been doing a bunch of work on the Druid side to be able to update our 
>> Calcite dependency (currently, we're on 1.21.0). From what we've seen so 
>> far, we believe if these two patches are part of 1.35.0, then we could 
>> target that version.
>> 
>> Gian
>> 
>> On 2023/06/07 06:32:34 Abhishek Agarwal wrote:
>>> We (Apache Druid) would also like
>>> https://issues.apache.org/jira/browse/CALCITE-5479 and
>>> https://issues.apache.org/jira/browse/CALCITE-5477 to be fixed in 1.35.0.
>>> There are PRs for both of these fixes. We are working on addressing review
>>> comments and just need a few days of time.
>>> 
>>> On Thu, Jun 1, 2023 at 3:32 AM Julian Hyde  wrote:
>>> 
>>>> There's one Jira case we should fix:
>>>> https://issues.apache.org/jira/browse/CALCITE-5737 (support JDK 19 and
>>>> 20). Can someone take that on?
>>>> 
>>>> Julian
>>>> 
>>>> On 2023/05/30 08:33:23 xiong duan wrote:
>>>>> Yes. I can release 1.35.0 when the PR is handled.
>>>>> 
>>>>> Julian Hyde  于2023年5月30日周二 05:46写道:
>>>>> 
>>>>>> Mid-June sounds like a good idea.
>>>>>> 
>>>>>> I would like to see the following Jira cases finished:
>>>>>> * 5701 NAMED_STRUCT (MasseGuillaume)
>>>>>> * 5640 SAFE_ADD (DanZou)
>>>>>> * 5625 SEARCH (olivrlee)
>>>>>> * 5626 Fully-qualified names (herunkang2018)
>>>>>> * 5615 SQL Logic Test (mbudiu)
>>>>>> * 5607 JSON serialization (olivrlee)
>>>>>> * 5564 PERCENTILE_CONT (tanclary)
>>>>>> * 5526 Unparsing literals (TJ Banghart)
>>>>>> 
>>>>>> If the authors have made changes they would like me to review, please
>>>>>> add a comment to the Jira case.
>>>>>> 
>>>>>> Any other cases, don't ask me personally, ask everyone on the dev
>>>>>> list. My capacity to review PRs is limited.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> On Fri, May 26, 2023 at 11:16 PM Benchao Li 
>>>> wrote:
>>>>>>> 
>>>>>>> It's been a bit more than 2 months since our last release [1] and
>>>> there
>>>>>> are
>>>>>>> currently 80+ new commits in master.
>>>>>>> 
>>>>>>> As usual, according to our Jira dashboard [2] and Github [3], there
>>>> are
>>>>>>> many pending issues that could / should be part of the release. I'd
>>>>>> propose
>>>>>>> to make a collective effort to try to clean up our 1.35 backlog and
>>>> merge
>>>>>>> the PRs which are in a good state. I'd propose to aim for **mid
>>>> June** to
>>>>>>> release 1.35.0, this will give us about 15-20 days to clean up
>>>> pending
>>>>>> PRs
>>>>>>> for next version. What do you think?
>>>>>>> 
>>>>>>> According to [4], the following release managers would be:
>>>>>>> - 1.35.0 Duan Xiong
>>>>>>> - 1.36.0 Benchao Li
>>>>>>> - 1.37.0 Sergey Nuyanzin
>>>>>>> - 1.38.0 Julian Hyde
>>>>>>> 
>>>>>>> @Duan Xiong, are you still available for being the release manager
>>>> for
>>>>>>> 1.35.0?
>>>>>>> 
>>>>>>> And contributors, if you have any work that is in good shape and
>>>> want to
>>>>>> be
>>>>>>> included in 1.35.0, please mark the fixVersion to 1.35.0, this will
>>>>>> inform
>>>>>>> the release manager, and we'll try our best to get it in.
>>>>>>> 
>>>>>>> [1] https://calcite.apache.org/docs/history.html#v1-34-0
>>>>>>> [2]
>>>>>>> 
>>>>>> 
>>>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
>>>>>>> [3] https://github.com/apache/calcite/pulls
>>>>>>> [4] https://lists.apache.org/thread/tm3t42qvpq3db24xtd2g468ofv83l6hk
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Benchao Li
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 



Re: VARIANT and JSON data types

2023-06-21 Thread Julian Hyde
https://issues.apache.org/jira/browse/CALCITE-4918 

> On Jun 21, 2023, at 6:41 PM,   wrote:
> 
> Hello,
> 
> 
> 
> I have a question about Calcite and fancier data types.
> 
> It looks like Calcite JSON operations are really implemented by treating
> JSON as a string. I expect that this is expensive, since the string needs to
> be parsed and serialized back for every operation.
> 
> 
> 
> Snowflake uses VARIANT types to represent the parsed JSON; a VARIANT is
> really a dynamically-typed union type:
> https://docs.snowflake.com/en/sql-reference/data-types-semistructured#label-
> data-type-variant
> 
> 
> 
> Postgres has native types JSON, JSONB, and JSONPATH:
> https://www.postgresql.org/docs/15/datatype-json.html
> 
> 
> 
> Is there a way to manipulate JSON more efficiently in Calcite than by just
> using strings?
> 
> If there isn't, is anyone working on such a topic?
> 
> 
> 
> If there isn't anyone working on this, the next question is whether the
> existing Calcite type system is rich enough to support a VARIANT-like  type,
> or does the type system need to be extended?
> 
> 
> 
> Thank you,
> 
> Mihai
> 



Re: Support Graph Query In Calcite

2023-06-21 Thread Julian Hyde
I’ll reiterate what Stamatis said: log Jira tickets. Most of us know only 
vaguely what a graph query is. A well-written Jira case (describing the 
problem, with a couple of good SQL examples) will educate the people who would 
be reviewing the PR.

> On Jun 20, 2023, at 10:42 PM, pzwpzw  
> wrote:
> 
> Thanks for your reply. 
> Currently we have implement a sub-set of the ISO/GQL draft with some our 
> extension. It is widely used in the AntGroup. And the project is also 
> released as an open source project: 
> https://github.com/TuGraph-family/tugraph-analytics.
> I'm going to investigate the way Babel parser and the calcite adapter work 
> and choose a better contribution way.
> Thanks.
> 
>> 2023年6月20日下午6:40,Stamatis Zampetakis  写道:
>> 
>> 
>> Hello,
>> 
>> We definitely welcome contributions that could improve the graph query
>> capabilities in Calcite. If something is part of the SQL standard then
>> it can be contributed to the core SQL parser of Calcite. If it is not
>> standard but still widely accepted/used it should probably land in the
>> babel parser. In every other case, we have the adapters and we can
>> accept new ones if necessary.
>> 
>> In general smaller contributions are easier to review and merge so if
>> you can divide your work into smaller parts and create JIRA tickets
>> for each we can reason about them individually.
>> 
>> Other than that the following threads/topics may be somewhat relevant
>> to what you are doing.
>> 
>> * https://lists.apache.org/thread/q3yx5tmoxzwwh1n8x4oct0vfxcfpcjwg
>> * https://lists.apache.org/thread/oohfsp1ddyytn8v13btv21trwkbl5y2f
>> * https://issues.apache.org/jira/browse/CALCITE-3679
>> * https://calcite.apache.org/docs/reference.html#match_recognize
>> 
>> Best,
>> Stamatis
>> 
>> On Mon, Jun 19, 2023 at 6:03 AM pzwpzw
>>  wrote:
>>> 
>>> Hi everyone, I have noticed that the SQL 2023 have proposed the Graph 
>>> Property Query support. 
>>> http://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-finished-here-is-whats-new
>>> which we can combine the graph query "Match" statement with the sql.
>>> 
>>> And also the ISO/GQL have proposed the GQL query. We have implement the SQL 
>>> extend which can combine SQL with the ISO/GQL query in our open source 
>>> stream graph engine 
>>> TuGraph-Analytics(https://github.com/TuGraph-family/tugraph-analytics) 
>>> based on calcite which can combine the SQL process with the graph query in 
>>> one streaming engine. We can see the syntax here: 
>>> https://github.com/TuGraph-family/tugraph-analytics/blob/master/docs/docs-en/application-development/dsl/overview.md
>>> 
>>> So I would like to know if the Calcite community plans to support graph 
>>> query language in the future. If possible, we would like to contribute our 
>>> graph query extension to the community.
>>> 
>>> Thanks.
> 



[jira] [Created] (CALCITE-5790) Query with "*" but no FROM clause should be invalid

2023-06-20 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5790:


 Summary: Query with "*" but no FROM clause should be invalid
 Key: CALCITE-5790
 URL: https://issues.apache.org/jira/browse/CALCITE-5790
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


The query
{code}
SELECT *
{code}
(which has a "*" indicating "all columns of all source tables" but no {{FROM}} 
clause to define source tables) should be invalid.

The validator should throw "SELECT * requires a FROM clause".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5788) Order of metadata handler methods is inconsistent in different java versions

2023-06-19 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5788:


 Summary: Order of metadata handler methods is inconsistent in 
different java versions
 Key: CALCITE-5788
 URL: https://issues.apache.org/jira/browse/CALCITE-5788
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


The order of metadata handler methods is inconsistent in different java 
versions. The effect is that if you add a new metadata class with two or more 
methods (such as {{BuiltInMetadata.Measure}} in 
[julianhyde/4496-measure.16|https://github.com/julianhyde/calcite/tree/4496-measure.16]),
 you may get assertion failures when running under JDK 8 or 11 but not running 
under JDK 17:

{noformat}
$ ./gradlew :babel:test --tests BabelTest.testInfixCast
JAVA_HOME is /usr/lib/jvm/jdk-11
java version "11.0.18" 2023-01-17 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.18+9-LTS-195)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.18+9-LTS-195, mixed mode)

> Configure project :
Building Apache Calcite 1.35.0-SNAPSHOT

> Task :core:compileJava
Could not load entry 6c654d63a2c430f60f4f91733541df12 from remote build cache: 
Bucket 'calcite-gradle-cache' not found
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :babel:test FAILED
FAILURE   0.8sec, org.apache.calcite.test.BabelTest > testInfixCast()
java.lang.AssertionError
at 
org.apache.calcite.rel.metadata.MetadataDef.(MetadataDef.java:53)
at org.apache.calcite.rel.metadata.MetadataDef.of(MetadataDef.java:63)
at 
org.apache.calcite.rel.metadata.BuiltInMetadata$Measure.(BuiltInMetadata.java:842)
at 
org.apache.calcite.rel.metadata.RelMdMeasure.getDef(RelMdMeasure.java:43)
at 
org.apache.calcite.rel.metadata.ReflectiveRelMetadataProvider.reflectiveSource(ReflectiveRelMetadataProvider.java:134)
at 
org.apache.calcite.rel.metadata.RelMdMeasure.(RelMdMeasure.java:40)
at 
org.apache.calcite.rel.metadata.DefaultRelMetadataProvider.(DefaultRelMetadataProvider.java:42)
at 
org.apache.calcite.rel.metadata.DefaultRelMetadataProvider.(DefaultRelMetadataProvider.java:28)
at org.apache.calcite.plan.RelOptCluster.(RelOptCluster.java:97)
at org.apache.calcite.plan.RelOptCluster.create(RelOptCluster.java:106)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.createCluster(CalcitePrepareImpl.java:414)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.getPreparingStmt(CalcitePrepareImpl.java:554)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.prepare_(CalcitePrepareImpl.java:523)
at 
org.apache.calcite.prepare.CalcitePrepareImpl.prepareSql(CalcitePrepareImpl.java:492)
at 
org.apache.calcite.jdbc.CalciteConnectionImpl.parseQuery(CalciteConnectionImpl.java:236)
at 
org.apache.calcite.jdbc.CalciteMetaImpl.prepareAndExecute(CalciteMetaImpl.java:630)
at 
org.apache.calcite.avatica.AvaticaConnection.prepareAndExecuteInternal(AvaticaConnection.java:677)
at 
org.apache.calcite.avatica.AvaticaStatement.executeInternal(AvaticaStatement.java:157)
at 
org.apache.calcite.avatica.AvaticaStatement.executeQuery(AvaticaStatement.java:228)
at org.apache.calcite.test.BabelTest.checkInfixCast(BabelTest.java:104)
at org.apache.calcite.test.BabelTest.testInfixCast(BabelTest.java:92)
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


OSS-Fuzz

2023-06-16 Thread Julian Hyde
Someone from Google logged a case offering to add Calcite to the
OSS-Fuzz program. (I work for Google but was not aware that we were
being considered.)

https://issues.apache.org/jira/browse/CALCITE-5781

How do people feel about participating in this program?

I think that it could improve our security significantly, but it will
take work. The fuzzer might generate a lot of false negatives. It
might also generate quite a few genuine security issues that we will
need to respond to appropriately. As an all-volunteer project it might
put a strain on us.

Julian


Re: PR 3258 Review Request

2023-06-16 Thread Julian Hyde
Nice work.

(The hardest thing we do, as software engineers, is to make our work easy for 
others to understand. The fact that several databases have functions called 
CONCAT, with subtly different behaviors, made this task particularly 
challenging. You produced a solution that makes the differences really clear.)

I left one comment on the PR but it’s really close.

Julian


> On Jun 15, 2023, at 8:18 AM, Zhe Hu  wrote:
> 
> Hi, community.
> I’ve submitted a PR for applying two different NULL semantics for CONCAT 
> function, because this function now returns NULL when any of the arguments is 
> NULL, however Postgresql and MSSQL have different behavior(they treat NULL as 
> empty string).
> BTW, CONCAT enabled in Oracle(we call it CONCAT2 in Calcite) has been 
> resolved in CALCITE-5745, if anyone needs to refer to.
> 
> 
> https://github.com/apache/calcite/pull/3258
> 
> 
> Best,
> ZheHu
> 
> 
> 



Re: [ANNOUNCE] New committer: Oliver Lee

2023-06-13 Thread Julian Hyde
Thanks for your contributions, Oliver! And welcome!

On Tue, Jun 13, 2023 at 4:31 AM Zhe Hu  wrote:
>
> Congrats Oliver!
>  Replied Message 
> | From | xiong duan |
> | Date | 06/13/2023 19:17 |
> | To | dev |
> | Subject | Re: [ANNOUNCE] New committer: Oliver Lee |
> Congratulations Oliver!
>
> Dan Zou  于2023年6月13日周二 19:16写道:
>
> > Congratulations Oliver!
> >
> > Best,
> > Dan Zou
> >
> >
> >
> >
> >
> > > 2023年6月13日 18:44,Stamatis Zampetakis  写道:
> > >
> > > Apache Calcite's Project Management Committee (PMC) has invited Oliver
> > > Lee to become a committer, and we are pleased to announce that they
> > > have accepted.
> > >
> > > Oliver started working with us around November 2022 and since then
> > > they contributed multiple SQL functions to the Calcite repository
> > > bringing lots of improvements to the BigQuery dialect. They improved
> > > the extensibility of the SQL validator and pushed various fixes in
> > > RelNode serialization to JSON.
> > >
> > > Oliver, welcome, thank you for your contributions, and we look forward
> > > to your further interactions with the community! If you wish, please
> > > feel free to tell us more about yourself and what you are working on.
> > >
> > > As your first commit, please add yourself to the contributors list [1]
> > > and the community page will re-generate [2].
> > >
> > > Stamatis (on behalf of the Apache Calcite PMC)
> > >
> > > [1]
> > https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
> > > [2] https://calcite.apache.org/community/#project-members
> >
> >


Community over Code (ApacheCon) East Asia

2023-06-11 Thread Julian Hyde
Community over Code (ApacheCon) East Asia is happening in August, and
the call for papers closes in LESS THAN A WEEK [1].

I'm proud of how many Calcite community members we have in China and
East Asia, and I appreciate the constant stream of excellent and
innovative contributions.

If you are one of those people, I encourage you to attend the
conference, or, better, submit a talk proposal. This is a great
opportunity to bring new people into the Calcite community, to
convince projects and companies based in China that Calcite is a good
core technology, or just to get to know each other better.

I have attended several ApacheCon conferences in the US and Canada,
and they have helped me gain an understanding of what Apache does, and
make some lasting friendships with people in other projects.

If you have an interesting talk about Calcite, please submit it!

I believe that there is travel assistance available for those who need
financial help getting to the event [2].

Julian

[1] https://apachecon.com/acasia2023/cfp.html
[2] https://tac.apache.org/


Re: PR 3250 (PairList, LintTest, Puffin, TestUnsafe)

2023-06-08 Thread Julian Hyde
I've merged this to main. (Thanks for the review, Zhe Hu!)

There are new lint checks on code (especially javadoc) and commit
messages. This may cause existing open PRs to give errors. Sorry about
that, but it should help us all write better commit messages. (And
reduce nagging from reviewers like me.)

Julian

On Wed, Jun 7, 2023 at 2:34 PM Julian Hyde  wrote:
>
> Can I please get a review for PR 3250? It has several commits that are
> loosely connected.
>
> CALCITE-5762 should be reviewed from a security perspective (because
> it deals with unsafe methods such as ProcessBuilder).
>
> CALCITE-5764 adds a fun class called Puffin that brings the scripting
> capabilities of 'awk' into Java.
>
> https://github.com/apache/calcite/pull/3250
>
> Julian


PR 3250 (PairList, LintTest, Puffin, TestUnsafe)

2023-06-07 Thread Julian Hyde
Can I please get a review for PR 3250? It has several commits that are
loosely connected.

CALCITE-5762 should be reviewed from a security perspective (because
it deals with unsafe methods such as ProcessBuilder).

CALCITE-5764 adds a fun class called Puffin that brings the scripting
capabilities of 'awk' into Java.

https://github.com/apache/calcite/pull/3250

Julian


[jira] [Created] (CALCITE-5765) Add LintTest, to apply custom lint rules to source code

2023-06-07 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5765:


 Summary: Add LintTest, to apply custom lint rules to source code
 Key: CALCITE-5765
 URL: https://issues.apache.org/jira/browse/CALCITE-5765
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Add {{LintTest}}, to apply custom lint rules to source code. If there are any 
violations, the test prints the warnings and fails.

Example lint rules:
 * Javadoc paragraphs must be separated by blank lines
 * Javadoc paragraphs must start with {{}}
 * In Javadoc, there must be a blank line between the description
  and the first {{@param}}

{{LintTest}} is built using Puffin (see CALCITE-5764), which allows regular 
expressions to be applied to individual lines and state that is carried from 
one line to the next. (The regular expressions provided by Autostyle are too 
cumbersome to match multi-line patterns.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5764) Puffin, an Awk for Java

2023-06-07 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5764:


 Summary: Puffin, an Awk for Java
 Key: CALCITE-5764
 URL: https://issues.apache.org/jira/browse/CALCITE-5764
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Create Puffin, which allows a programming model similar to the {{awk}} 
scripting language.

An {{awk}} program is a collection of rules, each of which is a pair: a 
predicate and an action. For each line in a file, the rules are applied in 
sequence, and if the predicate evaluates to true, the action is executed. Then 
{{awk}} goes on to the next file.

In {{Puffin}}, each predicate is a {{Predicate>}}, and each action is a 
{{Consumer}}. {{Line}} is a data structure that gives access to the text 
of the line, regular expression matching, and file-local and global state.

File-local state is allocated by a factory, and each file is processed in a 
single thread. This allows {{Puffin}} to be invoked on multiple files (or more 
generally sources, including URLs) and processed in parallel. Global state is 
shared, and rules must coordinate when they access it.

Here is a simple {{awk}} script that counts the number of non-comment lines in 
a file:

{code}
/^#/ { ++n; }
END { printf("counter: %d\n", n); }
{code}

Here is the equivalent Puffin program:
{code}
Puffin.Program program =
Puffin.builder(() -> Unit.INSTANCE, u -> new AtomicInteger())
.add(line -> !line.startsWith("#"),
line -> line.state().incrementAndGet())
.after(context ->
context.println("counter: " + context.state().get()))
.build();
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5763) Discontinue support for Guava < 20.0

2023-06-07 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5763:


 Summary: Discontinue support for Guava < 20.0
 Key: CALCITE-5763
 URL: https://issues.apache.org/jira/browse/CALCITE-5763
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Discontinue support for Guava versions before 20.0, and resume building on the 
latest Guava. This reverses CALCITE-5477, which changes the build from Guava 
31.1-jre to 19.0, and CALCITE-5428, which moves the minimum supported Guava 
version from 19.0 to 16.0.1.

This change will happen no earlier than "the first release after August", 
therefore can be merged to main no earlier than 2023-09-01. I recommend that it 
is merged very soon after that date. I have set fixVersion = 1.36 assuming that 
1.36 is the first release after August.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5762) Create class TestUnsafe, that contains unsafe methods used by tests

2023-06-07 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5762:


 Summary: Create class TestUnsafe, that contains unsafe methods 
used by tests
 Key: CALCITE-5762
 URL: https://issues.apache.org/jira/browse/CALCITE-5762
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Create {{class TestUnsafe}}, that contains unsafe methods used by tests. 
Similar to the existing {{class Unsafe}}, this will be excluded from 
forbidden-apis checks. But it contains sensitive commands (e.g. 
{{java.lang.Runtime#exec}})) that are safe to use in tests but must not be on 
the runtime class path.

Move some methods from {{class ConcurrentTestCommandScript}} to {{class 
TestUnsafe}}, and remove the former class from the forbidden-apis allow-list.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Implementing EXPLODE operator

2023-06-03 Thread Julian Hyde
I don’t know what EXPLODE is and you didn’t define it. UNNEST is in the SQL 
standard. 

Julian

> On Jun 3, 2023, at 09:41, Soumyadeep Mukhopadhyay  
> wrote:
> 
> Actually I am hoping to contribute, but only if it doesn’t seem redundant.
> That’s why I wanted to understand if the effort would even make sense. Was
> EXPLODE not included by design? Would you recommend against adding it?
> 
> Soumyadeep.
> 
> 
> 
>> On Sat, 3 Jun 2023 at 9:51 PM, Julian Hyde  wrote:
>> 
>> I can’t tell whether you are intending to contribute EXPLODE or do it on a
>> private branch. You don’t make a case for what EXPLODE could do, so I
>> presume the latter.
>> 
>> UNNEST is a unique function. Its implementation is (unfortunately but
>> necessarily) spread over many files. Copy-pasting it to make a new function
>> seems like a bad idea, because you would be multiplying a mess. I don’t
>> know what you’re trying to achieve but I would lean on the existing
>> facilities (UDFs, table functions, adding syntactic sugar if necessary).
>> 
>> Julian
>> 
>>> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay <
>> soumyamy...@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I was wondering if there's any virtue in creating an operator like UNNEST
>>> by having a similar implementation like that of SqlUnnestOperator.
>>> (SqlExplodeOperator maybe)
>>> 
>>> The functionalities are close but there are nuances that I have not
>>> discovered yet.
>>> 
>>> My approach was kind of a hack :
>>> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator
>> name
>>> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
>>> .nullable(true).build() in the inferReturnType to "$explode" -> then
>> check
>>> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
>>> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
>>> NOTE: I have also tried creating a new kind EXPLODE and then followed the
>>> above step, both of them work.
>>> 
>>> As Julian had pointed out about semantics and how everything within
>> Calcite
>>> should be in Calcite's dialect, I feel there could be antipatterns in my
>>> approach.
>>> Could any of you please point me to where I might be going wrong? How
>> could
>>> I make this more robust?
>>> 
>>> (If any of you are wondering why I am doing this then this is what I want
>>> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
>>> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
>>> Spark for example)
>>> 
>>> Thank you for your time!
>>> 
>>> Best,
>>> Soumyadeep.
>> 


Re: Implementing EXPLODE operator

2023-06-03 Thread Julian Hyde
I can’t tell whether you are intending to contribute EXPLODE or do it on a 
private branch. You don’t make a case for what EXPLODE could do, so I presume 
the latter. 

UNNEST is a unique function. Its implementation is (unfortunately but 
necessarily) spread over many files. Copy-pasting it to make a new function 
seems like a bad idea, because you would be multiplying a mess. I don’t know 
what you’re trying to achieve but I would lean on the existing facilities 
(UDFs, table functions, adding syntactic sugar if necessary).

Julian

> On Jun 3, 2023, at 2:45 AM, Soumyadeep Mukhopadhyay  
> wrote:
> 
> Hello all,
> 
> I was wondering if there's any virtue in creating an operator like UNNEST
> by having a similar implementation like that of SqlUnnestOperator.
> (SqlExplodeOperator maybe)
> 
> The functionalities are close but there are nuances that I have not
> discovered yet.
> 
> My approach was kind of a hack :
> Clone the class SqlUnnestOperator -> rename the SqlFunctionalOperator name
> as "EXPLODE" and rename return builder.add("$unnest", SqlTypeName.ANY)
> .nullable(true).build() in the inferReturnType to "$explode" -> then check
> if call.getKind() within unparseCall is UNNEST -> if yes then SqlUtil.
> unparseFunctionSyntax(SqlStdOperatorTable.EXPLODE, writer, call, false).
> NOTE: I have also tried creating a new kind EXPLODE and then followed the
> above step, both of them work.
> 
> As Julian had pointed out about semantics and how everything within Calcite
> should be in Calcite's dialect, I feel there could be antipatterns in my
> approach.
> Could any of you please point me to where I might be going wrong? How could
> I make this more robust?
> 
> (If any of you are wondering why I am doing this then this is what I want
> to achieve - "SELECT * FROM UNNEST(ARRAY['1', '2'])" can be written as
> "SELECT * FROM EXPLODE(ARRAY['1', '2'])" in an appropriate dialect, like
> Spark for example)
> 
> Thank you for your time!
> 
> Best,
> Soumyadeep.


Re: Algebraic optimizations rule

2023-06-03 Thread Julian Hyde
Be careful when applying algebraic optimizations to computer numbers. For 
example,

 a + (b - c) = (a + b) - c

doesn’t hold if a, b and c are floating point numbers, a and “b - c” are small, 
and b and c are large. “a + (b - c)” will experience less underflow and will 
deliver a more accurate result.

Julian
 

> On Jun 2, 2023, at 7:15 PM,   wrote:
> 
> I have seen the constant folding rules you mention in [2], but these seen to 
> require all operands to be compile-time known.
> Algebraic optimization rules can work even when only some of the operands are 
> constant, such as addition of an arbitrary expression with zero.
> 
> I will take a look at the simplification code to see whether it handles such 
> cases.
> 
> Thank you,
> Mihai
> 
> -Original Message-
> From: Jiajun Xie 
> Sent: Friday, June 02, 2023 7:08 PM
> To: dev@calcite.apache.org
> Subject: Re: Algebraic optimizations rule
> 
> Hello, mbudiu:
> 
> Based on my experience, constant folding can occur in two stages.
> - Enable it by `RelBuilder$Config$withSimplify`[1] in the convert stage
> - Add `RelOptRules$CONSTANT_REDUCTION_RULES`[2] in the optimize stage
> 
> If your RexNode is not complex, you can directly use RexSimplify. For example 
> ```[3]
>final RexSimplify simplify =
>new RexSimplify(rexBuilder, RelOptPredicateList.EMPTY,
> RexUtil.EXECUTOR)
>.withParanoid(true);
>return simplify.simplifyUnknownAs(e, RexUnknownAs.UNKNOWN); ```
> 
> 
> [1]
> https://github.com/apache/calcite/blob/8ea4160f10e95aca6c3b0029d505bbc56975a873/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L4841
> [2]
> https://github.com/apache/calcite/blob/8ea4160f10e95aca6c3b0029d505bbc56975a873/core/src/main/java/org/apache/calcite/plan/RelOptRules.java#L132
> [3]
> https://github.com/apache/calcite/blob/8ea4160f10e95aca6c3b0029d505bbc56975a873/core/src/test/java/org/apache/calcite/rex/RexProgramTestBase.java#L187
> 
> On Sat, 3 Jun 2023 at 04:33,  wrote:
> 
>> Hello,
>> 
>> 
>> 
>> I was looking for rules that apply algebraic optimizations to RexNode 
>> expressions, such as 0 + x = x, but I couldn't find any.
>> 
>> Is there such a rule?
>> 
>> 
>> 
>> Mihai
>> 
>> 
> 



[jira] [Created] (CALCITE-5746) Support JDK 19

2023-06-02 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5746:


 Summary: Support JDK 19
 Key: CALCITE-5746
 URL: https://issues.apache.org/jira/browse/CALCITE-5746
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde
Assignee: Sergey Nuyanzin
 Fix For: 1.35.0


Support JDK 19 in Calcite. Modify history.md and remove calls to deprecated 
Locale constructors. To be fixed before Calcite 1.35.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Towards Calcite 1.35.0

2023-05-31 Thread Julian Hyde
There's one Jira case we should fix: 
https://issues.apache.org/jira/browse/CALCITE-5737 (support JDK 19 and 20). Can 
someone take that on?

Julian

On 2023/05/30 08:33:23 xiong duan wrote:
> Yes. I can release 1.35.0 when the PR is handled.
> 
> Julian Hyde  于2023年5月30日周二 05:46写道:
> 
> > Mid-June sounds like a good idea.
> >
> > I would like to see the following Jira cases finished:
> >  * 5701 NAMED_STRUCT (MasseGuillaume)
> >  * 5640 SAFE_ADD (DanZou)
> >  * 5625 SEARCH (olivrlee)
> >  * 5626 Fully-qualified names (herunkang2018)
> >  * 5615 SQL Logic Test (mbudiu)
> >  * 5607 JSON serialization (olivrlee)
> >  * 5564 PERCENTILE_CONT (tanclary)
> >  * 5526 Unparsing literals (TJ Banghart)
> >
> > If the authors have made changes they would like me to review, please
> > add a comment to the Jira case.
> >
> > Any other cases, don't ask me personally, ask everyone on the dev
> > list. My capacity to review PRs is limited.
> >
> > Julian
> >
> > On Fri, May 26, 2023 at 11:16 PM Benchao Li  wrote:
> > >
> > > It's been a bit more than 2 months since our last release [1] and there
> > are
> > > currently 80+ new commits in master.
> > >
> > > As usual, according to our Jira dashboard [2] and Github [3], there are
> > > many pending issues that could / should be part of the release. I'd
> > propose
> > > to make a collective effort to try to clean up our 1.35 backlog and merge
> > > the PRs which are in a good state. I'd propose to aim for **mid June** to
> > > release 1.35.0, this will give us about 15-20 days to clean up pending
> > PRs
> > > for next version. What do you think?
> > >
> > > According to [4], the following release managers would be:
> > > - 1.35.0 Duan Xiong
> > > - 1.36.0 Benchao Li
> > > - 1.37.0 Sergey Nuyanzin
> > > - 1.38.0 Julian Hyde
> > >
> > > @Duan Xiong, are you still available for being the release manager for
> > > 1.35.0?
> > >
> > > And contributors, if you have any work that is in good shape and want to
> > be
> > > included in 1.35.0, please mark the fixVersion to 1.35.0, this will
> > inform
> > > the release manager, and we'll try our best to get it in.
> > >
> > > [1] https://calcite.apache.org/docs/history.html#v1-34-0
> > > [2]
> > >
> > https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
> > > [3] https://github.com/apache/calcite/pulls
> > > [4] https://lists.apache.org/thread/tm3t42qvpq3db24xtd2g468ofv83l6hk
> > >
> > >
> > > Best,
> > > Benchao Li
> >
> 


[jira] [Created] (CALCITE-5737) Support JDK 19, JDK 20

2023-05-31 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5737:


 Summary: Support JDK 19, JDK 20
 Key: CALCITE-5737
 URL: https://issues.apache.org/jira/browse/CALCITE-5737
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde
 Fix For: 1.35.0


Support JDK 19, JDK 20.

We currently support JDK (and OpenJDK) versions up to 18. JDK 20 is the latest. 
We should support it in Calcite 1.35.

This change would modify history.md (for the upcoming release), add JDK 20 to 
the GitHub CI, and fix the build. I see a couple of deprecation warnings 
compiling under JDK 19:
{noformat}
core/src/main/java/org/apache/calcite/util/Util.java:1723: warning: 
[deprecation] Locale(String) in Locale has been deprecated
  return new Locale(strings[0]);
 ^
core/src/main/java/org/apache/calcite/util/Util.java:1725: warning: 
[deprecation] Locale(String,String) in Locale has been deprecated
  return new Locale(strings[0], strings[1]);
 ^
core/src/main/java/org/apache/calcite/util/Util.java:1727: warning: 
[deprecation] Locale(String,String,String) in Locale has been deprecated
  return new Locale(strings[0], strings[1], strings[2]);
 ^
error: warnings found and -Werror specified
core/src/main/java/org/apache/calcite/sql/parser/SqlParserUtil.java:744: 
warning: [deprecation] Locale(String) in Locale has been deprecated
  locale = new Locale(localeParts[0]);
   ^
core/src/main/java/org/apache/calcite/sql/parser/SqlParserUtil.java:746: 
warning: [deprecation] Locale(String,String) in Locale has been deprecated
  locale = new Locale(localeParts[0], localeParts[1]);
   ^
core/src/main/java/org/apache/calcite/sql/parser/SqlParserUtil.java:748: 
warning: [deprecation] Locale(String,String,String) in Locale has been 
deprecated
  locale = new Locale(localeParts[0], localeParts[1], localeParts[2]);
   ^
{noformat}
These definitely need to be fixed. I don't know whether there's anything else.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Towards Calcite 1.35.0

2023-05-29 Thread Julian Hyde
Mid-June sounds like a good idea.

I would like to see the following Jira cases finished:
 * 5701 NAMED_STRUCT (MasseGuillaume)
 * 5640 SAFE_ADD (DanZou)
 * 5625 SEARCH (olivrlee)
 * 5626 Fully-qualified names (herunkang2018)
 * 5615 SQL Logic Test (mbudiu)
 * 5607 JSON serialization (olivrlee)
 * 5564 PERCENTILE_CONT (tanclary)
 * 5526 Unparsing literals (TJ Banghart)

If the authors have made changes they would like me to review, please
add a comment to the Jira case.

Any other cases, don't ask me personally, ask everyone on the dev
list. My capacity to review PRs is limited.

Julian

On Fri, May 26, 2023 at 11:16 PM Benchao Li  wrote:
>
> It's been a bit more than 2 months since our last release [1] and there are
> currently 80+ new commits in master.
>
> As usual, according to our Jira dashboard [2] and Github [3], there are
> many pending issues that could / should be part of the release. I'd propose
> to make a collective effort to try to clean up our 1.35 backlog and merge
> the PRs which are in a good state. I'd propose to aim for **mid June** to
> release 1.35.0, this will give us about 15-20 days to clean up pending PRs
> for next version. What do you think?
>
> According to [4], the following release managers would be:
> - 1.35.0 Duan Xiong
> - 1.36.0 Benchao Li
> - 1.37.0 Sergey Nuyanzin
> - 1.38.0 Julian Hyde
>
> @Duan Xiong, are you still available for being the release manager for
> 1.35.0?
>
> And contributors, if you have any work that is in good shape and want to be
> included in 1.35.0, please mark the fixVersion to 1.35.0, this will inform
> the release manager, and we'll try our best to get it in.
>
> [1] https://calcite.apache.org/docs/history.html#v1-34-0
> [2]
> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
> [3] https://github.com/apache/calcite/pulls
> [4] https://lists.apache.org/thread/tm3t42qvpq3db24xtd2g468ofv83l6hk
>
>
> Best,
> Benchao Li


Re: [ANNOUNCE] New committer: Tanner Clary

2023-05-28 Thread Julian Hyde
Welcome to the community, Tanner! You've earned it.

On Sun, May 28, 2023 at 5:11 AM Jacky Lau  wrote:
>
> Congratulations, Tanner!
>
> Best regards,
> Jacky
>
> Jiajun Xie  于2023年5月27日周六 11:28写道:
>
> > Congratulations and welcome, Tanner!
> >
> > On Sat, 27 May 2023 at 02:46, Sergey Nuyanzin  wrote:
> >
> > > Congratulations, Tanner!
> > >
> > > On Fri, May 26, 2023 at 8:40 PM Bertil Chapuis 
> > wrote:
> > >
> > > > Congratulations and welcome!
> > > >
> > > > Bertil
> > > >
> > > > > On 26 May 2023, at 18:58, Tanner Clary  > > .INVALID>
> > > > wrote:
> > > > >
> > > > > Hey everyone,
> > > > >
> > > > > Thank you all for the warm welcome! I have really enjoyed
> > contributing
> > > to
> > > > > the project so far and look forward to continuing to do so.
> > > > >
> > > > > About me: I graduated in May 2022 and since then I've been working on
> > > > > Looker @ Google. I'm eager to learn more about other areas of the
> > > project
> > > > > and hopefully collaborate with many of you along the way.
> > > > >
> > > > > Thanks again,
> > > > > Tanner Clary
> > > > >
> > > > >
> > > > > On Fri, May 26, 2023 at 5:12 AM Michael Mior 
> > wrote:
> > > > >
> > > > >> Congratulations and welcome Tanner!
> > > > >> --
> > > > >> Michael Mior
> > > > >> mm...@apache.org
> > > > >>
> > > > >>
> > > > >> On Fri, May 26, 2023 at 5:05 AM Stamatis Zampetakis <
> > > zabe...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Apache Calcite's Project Management Committee (PMC) has invited
> > > Tanner
> > > > >>> Clary to become a committer, and we are pleased to announce that
> > they
> > > > >>> have accepted.
> > > > >>>
> > > > >>> In less than 5 months, Tanner has introduced, enabled, and tested
> > > over
> > > > >>> 30 SQL functions in Calcite. They have been a driving force in
> > > > >>> improving the BigQuery dialect and by now an expert in library and
> > > > >>> parser changes.
> > > > >>>
> > > > >>> Tanner, welcome, thank you for your contributions, and we look
> > > forward
> > > > >>> to your further interactions with the community! If you wish,
> > please
> > > > >>> feel free to tell us more about yourself and what you are working
> > on.
> > > > >>>
> > > > >>> As your first commit, please add yourself to the contributors list
> > > [1]
> > > > >>> and the community page will re-generate [2].
> > > > >>>
> > > > >>> Stamatis (on behalf of the Apache Calcite PMC)
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > >
> > https://github.com/apache/calcite/blob/main/site/_data/contributors.yml
> > > > >>>
> > > > >>> [2] https://calcite.apache.org/community/#project-members
> > > > >>>
> > > > >>
> > > >
> > > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
> >


[jira] [Created] (CALCITE-5726) Canonize use of Hamcrest matchers in test code

2023-05-26 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5726:


 Summary: Canonize use of Hamcrest matchers in test code
 Key: CALCITE-5726
 URL: https://issues.apache.org/jira/browse/CALCITE-5726
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Canonize use of Hamcrest matchers in test code.

1. If there are multiple equivalent methods, use the canonical one:
* CoreMatchers.is  Is.is
* Matchers.hasToString  HasToString.hasToString
* Matchers.hasSize  IsCollectionWithSize.hasSize
* Matchers.aMapWithSize  IsMapWithSize.aMapWithSize

2. Use matchers for {{Object.toString()}}, {{Collection.size()}}, 
{{Map.size()}}:
* assertThat(map.size(), is\(n))  assertThat(map, aMapWithSize\(n));
* assertThat(list.size(), is\(n))  assertThat(map, 
IsCollectionWithSize.hasSize\(n));
* assertThat(o.toString(), is(s))  assertThat(o, hasToString(s));
* assertThat(o.toString(), equalTo(s))  assertThat(o, hasToString(s));

Require static import of Hamcrest methods.
 
The benefit is conciseness/uniformity of tests. Most of these transformations 
can be accomplished using autostyle rules in the Gradle {{build.gradle.kts}}, 
so people's code will be fixed using {{gradle autostyleApply}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Add new functions in existing dialects

2023-05-25 Thread Julian Hyde
You could do something that substitutes RexCalls or SqlCalls of unsupported 
function calls with calls to supported functions.

But be sure to remember that the semantics of functions does not depend on the 
target dialect. Function semantics are always in terms of Calcite’s dialect. 
This matters when two DBs have a function with the same name that have 
different semantics. Examples of this are the various SUBSTR_ORACLE, 
SUBSTR_BIG_QUERY, SUBSTR_MYSQL functions in SqlLibraryOperators, which all 
appear as “SUBSTR” in their respective DBs but have different semantics when 
length is negative or zero. 

Julian


> On May 25, 2023, at 10:09 AM, Soumyadeep Mukhopadhyay  
> wrote:
> 
> Hello Everyone,
> 
> I wanted to know how one can leverage existing dialects to add new
> functions or rename the occurrence of a specific function (or an operator).
> 
> For example, a query in Spark SQL - "SELECT EXPLODE(ARRAY['1', '2'])" will
> only work in Postgres if it is rewritten as  "SELECT UNNEST(ARRAY['1',
> '2'])". If I need to emulate this kind of substitution how should I
> approach it?
> 
> Another example that I was thinking of is, implementing functions that
> don't exist in the current dialect like ARRAY_AGG in Snowflake. How can I
> implement this in Calcite?
> 
> Any suggestions would be much appreciated. Thank you for your time! :)
> 
> With best regards,
> Soumydeep.



Re: Question | Referencing column alias in WHERE clause

2023-05-22 Thread Julian Hyde
It does not currently. See https://issues.apache.org/jira/browse/CALCITE-5248. 

> On May 22, 2023, at 4:12 AM, Zineeddine ZIDANE  wrote:
> 
> Does calcite support referencing column aliases in WHERE clause? I know SQL
> doesn't allow this but it is supported out of the box in databases such as
> redshift, bigquery etc..
> 
> The following query (for the sake of the example) fails with the error
> message below:
> 
> Query:
> 
> 
> 
> SELECT (CASE WHEN deptno = 20 THEN deptno ELSE NULL END) AS deptno_20
> FROM emp WHERE deptno_20 = 20;
> 
> 
> 
> Error:
> 
> 
> 
> org.apache.calcite.sql.validate.SqlValidatorException: Column 'W_20'
> not found in any table
>at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>at 
> java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
>at 
> org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:505)
>at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:599)
>at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:945)
>at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:930)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:5408)
>at 
> org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:286)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:6573)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:6553)
>at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:324)
>at 
> org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:134)
>at 
> org.apache.calcite.sql.util.SqlShuttle$CallCopyingArgHandler.visitChild(SqlShuttle.java:101)
>at org.apache.calcite.sql.SqlOperator.acceptCall(SqlOperator.java:954)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visitScoped(SqlValidatorImpl.java:6593)
>at 
> org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:54)
>at 
> org.apache.calcite.sql.validate.SqlScopedShuttle.visit(SqlScopedShuttle.java:37)
>at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:161)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.go(SqlValidatorImpl.java:6562)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:6144)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateWhereClause(SqlValidatorImpl.java:4450)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3737)
>at 
> org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:61)
>at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:88)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1114)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:1085)
>at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:248)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:1060)
>at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:766)
> 
> 
> Thank you for any leads on this,



Re: Translate join predicates into filters

2023-05-18 Thread Julian Hyde
I expect that your table works if you put the filter in the WHERE clause, e.g.

  SELECT *
  FROM Books AS b
  WHERE b.id  IN (1, 10, 27)

and it does so using FilterTableScanRule (which matches a Filter on top of a 
Scan of a FilterableTable). But you need a new planner rule that can convert a 
Join whose right input is a Scan of a FilterableTable into a NestedLoopsJoin 
that dynamically sets the filter for each row from the left. (In this query, 
config would be on the left, Books on the right.)

There could be a more efficient version that gathers all IDs from the left, 
then does one request to the right, and them something like a hash join.

Julian




> On May 17, 2023, at 8:41 AM, Luca Marchi  wrote:
> 
> Morning everyone, 
> in our company we are running a POC using Apache Calcite, and we would like 
> to collect some feedbacks from you for the scenario mentioned below.
> 
> There is a service API that allows retrieving some `Book`s, and we would like 
> to build a table adapter on top of this service; this API 
> only accepts a set of IDs, and if no IDs are provided, no result is returned.
> 
> ```
>  interface BookService {
>/** Returns the books matching the given IDs.
> *
> * If not IDs is provided, no result is returned.
> */
>List findBooksByIds(Set ids);
>  }
> 
>  record Book(String id, String title) {};
> ```
> 
> A requirement of this table is that it has to support join, and we would like 
> to support joining by ID in an efficient way.
> 
> The goal is to define a rule that forces the query planner to always push 
> down join predicates into a table scan.
> 
> Given the following `book` table:
> 
> ```java
> /** A table which represents books, queryable only by their ID. */
> final class BookTable extends AbstractTable implements FilterableTable {
>  private final BookService service;
> 
>  BookTable(BookService service) {
>this.service = service;
>  }
> 
>  @Override
>  public RelDataType getRowType(RelDataTypeFactory typeFactory) {
>return new RelDataTypeFactory.Builder(typeFactory)
>.add("id", SqlTypeName.VARCHAR)
>.add("title", SqlTypeName.VARCHAR)
>.build();
>  }
> 
>  @Override
>  public Enumerable scan(DataContext root, List filters) {
>Set bookIds = getBooksId(filters);
>List result = service.findBooksByIds(bookIds)
>.stream()
>.map(b -> new Object[]{b.id, b.title})
>.toList();
> 
>return Linq4j.asEnumerable(result);
>  }
> 
>  private static Set getBooksId(List filters) {
>if (filters.size() != 1) {
>  throw new IllegalArgumentException("Expected one filter to the ID, 
> found: %d".formatted(filters.size()));
>}
> 
>RexNode filter = filters.get(0);
>RexNode leftCondition = ((RexCall) filter).getOperands().get(0);
>RexNode rightCondition = ((RexCall) filter).getOperands().get(1);
> 
>if (leftCondition instanceof RexInputRef left
>&& rightCondition instanceof RexLiteral right
>// The index of the ID column is 1.
>&& left.getIndex() == 1) {
>  if (filter.isA(SqlKind.EQUALS)) {
>String bookId = right.getValue2().toString();
>return ImmutableSet.of(bookId);
>  }
>  if (filter.isA(SqlKind.SEARCH)) {
>@SuppressWarnings("unchecked")
>Sarg searchArguments = right.getValueAs(Sarg.class);
>return searchArguments.rangeSet.asRanges().stream()
>.map(Range::lowerEndpoint)
>.map(NlsString::getValue)
>.collect(toSet());
>  }
>}
>throw new IllegalArgumentException("Unexpected operator, found: 
> %s".formatted(filter.getKind()));
>  }
> }
> ```
> 
> The API of the `BookService` always expects a set of IDs, and in case of 
> query like (assuming an entry in `book` matching the ID `'a'`):
> ```sql
> WITH config (id, val) AS (
>VALUES ('a', 3), ('b', 5)
> )
> SELECT b.* FROM books b
> INNER JOIN config ON b.id = config.id
> WHERE config.val > 4
> ```
> 
> Calcite produces the following plan:
> ```
> EnumerableCalc(expr#0..2=[{inputs}], id=[$t1], title=[$t2])
>  EnumerableMergeJoin(condition=[=($0, $1)], joinType=[inner])
>EnumerableSort(sort0=[$0], dir0=[ASC])
>  EnumerableCalc(expr#0..1=[{inputs}], id=[$t0])
>EnumerableTableScan(table=[[books]])
>EnumerableSort(sort0=[$0], dir0=[ASC])
>  EnumerableTableScan(table=[[books]])
> ```
> 
> This means Calcite performs a full table scan of the `book` table, and since 
> the `RexNode` filters in the `scan` method are empty, no result is returned 
> (in this example we are using a value statement scoped views, but ideally the 
> solution we are looking for should be valid for other table).
> Under some defined circustances, Postgres generates Nested-Loop query plan 
> for join: it first selects the row of the table A matching a given condition, 
> then iterates over the retrieved rows and performs a scan of table B looking 
> for rows that match the join condition; and this seems 

LITERAL_AGG

2023-05-17 Thread Julian Hyde
Can someone please review https://github.com/apache/calcite/pull/3209.
It has fixes for https://issues.apache.org/jira/browse/CALCITE-4334
(LITERAL_AGG) and https://issues.apache.org/jira/browse/CALCITE-5697
(RelBuilder.convert).

I would especially like feedback on the strategy of avoiding a
spurious Project on top of an Aggregate just to project a constant
value.

This change is a step towards
https://issues.apache.org/jira/browse/CALCITE-4496 (measures) and
https://issues.apache.org/jira/browse/CALCITE-5692 (AT operator).
These are major changes to SQL semantics. If someone would like to
work with me on those I would very much appreciate it.

Julian


[jira] [Created] (CALCITE-5706) Add class PairList

2023-05-16 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5706:


 Summary: Add class PairList
 Key: CALCITE-5706
 URL: https://issues.apache.org/jira/browse/CALCITE-5706
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Add a {{class PairList}} which an implementation of {{List}} 
backed by a single list. It can be used to build two lists in parallel (e.g. a 
list of field types and field names that will be converted to a struct type); 
it can also be used to build maps.

It has a {{forEach(BiConsumer)}} method to allow the list to be 
deconstructed without creating intermediate entries.

Potentially also {{toImmutableMap}} and {{toHashMap}} methods.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: structs

2023-05-15 Thread Julian Hyde
I added some comments to the Jira case.

Thanks for finding PR 2573 (for 
https://issues.apache.org/jira/browse/CALCITE-4842). I think that case needs to 
be fixed first. That PR looks almost ready.

Julian


> On May 15, 2023, at 11:51 AM, Guillaume Masse 
>  wrote:
> 
> Hi All,
> 
> I created a ticket to implement an Apache Spark function called named_struct
> https://issues.apache.org/jira/browse/CALCITE-5701
> 
> I'm wondering if there is already a construct for creating nested
> structures and their types. I saw an open PR for nested row types:
> https://github.com/apache/calcite/pull/2573
> 
> any hints on the subject is gratefully received, I'm planning on working on
> this task this week.
> 
> -- 
> Guillaume Massé
> [Gee-OHM]
> (马赛卫)



[jira] [Created] (CALCITE-5697) RelBuilder.convert does not match nullability if top of stack is a Project

2023-05-10 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5697:


 Summary: RelBuilder.convert does not match nullability if top of 
stack is a Project
 Key: CALCITE-5697
 URL: https://issues.apache.org/jira/browse/CALCITE-5697
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


{{RelBuilder.convert}} does not match nullability if top of stack is a 
{{Project}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (CALCITE-5692) Add AT operator, for context-sensitive expressions

2023-05-06 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5692:


 Summary: Add AT operator, for context-sensitive expressions
 Key: CALCITE-5692
 URL: https://issues.apache.org/jira/browse/CALCITE-5692
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


In CALCITE-4496 we added measures, and the ability to evaluate a measure in a 
context determined by the GROUP BY clause. In this change, we add the ability 
to change that context using the AT operator.

When used as a clause of an aggregate function call, {{AT}} is similar to 
{{FILTER}}. For example, in the query
{code}
SELECT deptno,
  AVG(sal) FILTER (WHERE job <> 'MANAGER') AS a1,
  AVG(sal) AT (VISIBLE WHERE job <> 'MANAGER') AS a2,
  AVG(sal) AT (WHERE job <> 'MANAGER') AS a2
FROM emp
WHERE sal > 1000
GROUP BY deptno;
{code}
{{a1}} and {{a2}} always return the same value - the average salary of 
non-managers who earn more than $1000. But {{a3}} (similar to {{a2}} but 
missing the {{VISIBLE}} keyword) computes the average of all non-managers (not 
just those earning more than $1000).

Sub-clauses of {{AT}}:
 * {{VISIBLE}} adds a filter equivalent to the {{WHERE}} clause (has no effect 
when used in the {{WHERE}} or {{ON}} clauses);
 * {{WHERE}} adds the given predicate as a filter;
 * {{CLEAR GROUP}} clears the predicate of the current group, if any (e.g. 
"deptno = 10 AND job = 'ANALYST'" if we are in the SELECT clause of a query 
with "GROUP BY deptno, job" evaluating the row (10, 'ANALYST');
 * {{CLEAR WHERE}} clears the predicate of the {{WHERE}} clause (has no effect 
when used in the {{WHERE}} or {{ON}} clauses);
 * {{SET dimension = expression}} clears the predicate on {{dimension}} (if 
any) and adds a filter "dimension = expression".




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Authentication and authorizations in Calcite

2023-05-01 Thread Julian Hyde
Thanks for digging these up, Stamatis.

I have logged https://issues.apache.org/jira/browse/CALCITE-5681 for DDL (GRANT 
and REVOKE commands).

Contributions welcome.

Julian


> On Apr 29, 2023, at 6:53 AM, Stamatis Zampetakis  wrote:
> 
> Following up on the authorization part lots of things can be achieved
> through the use of views. Views can be used to restrict the columns
> and rows that are visible to certain users and it is a common way of
> handling permissions.
> 
> Here also two other JIRA tickets with discussion around this topic:
> https://issues.apache.org/jira/browse/CALCITE-2194
> https://issues.apache.org/jira/browse/CALCITE-5292
> 
> Best,
> Stamatis
> 
> 
> 
> On Sat, Apr 29, 2023 at 2:11 AM Julian Hyde  wrote:
>> 
>> I think Calcite should stay out of the authentication business. The 
>> container (not Calcite) should authenticate users and convert them to 
>> security principals that Calcite knows about. (Avatica does authentication 
>> [1] but it just delegates to a provider.)
>> 
>> Regarding authorization. I support adding a grants system to Calcite. Grants 
>> could be created via DDL (GRANT, REVOKE commands) or via APIs (like 
>> interface Schema) and the validator should enforce them. E.g. principal1 can 
>> see table1 but is not allowed to see its columns column2 and column3.
>> 
>> If you’re interested please log Jira case(s).
>> 
>> Julian
>> 
>> [1] https://issues.apache.org/jira/browse/CALCITE-643
>> 
>> 
>>> On Apr 26, 2023, at 9:39 AM, Joachim Bloche  wrote:
>>> 
>>> Hi,
>>> 
>>> I'm discovering Calcite after a customer asked me to enhance a proof of 
>>> concept they made with it.
>>> 
>>> I could get most things to work as needed and am very impressed with the 
>>> possibilities offered.  But completely stuck on a critical point : I can't 
>>> find any hint on how to manage authentication and authorization of users - 
>>> sorry if information is available and I somehow missed it, but I did my 
>>> best to research it.
>>> 
>>> The customer needs to allow access only to certain schemas or tables (or 
>>> columns as a bonus) based on user role.
>>> 
>>> Do you know of any standard way to implement this ? Ideally I would need to 
>>> implement it as low as possible in the stack as I'd like to use the same 
>>> authorization process for both front-end users and users connecting through 
>>> the Avatica JDBC driver.
>>> 
>>> With my still limited knowledge of Calcite, the only "off the shelf" 
>>> solution I could find is to create one Calcite model per role and start one 
>>> Avatica server per model. Obviously I'm not very proud of this one as it's 
>>> not really scalable nor elegant and there has to be a better solution.
>>> 
>>> Any hint to point me in the right direction would be hugely appreciated as 
>>> I would need the same for future projects.
>>> 
>>> 
>>> 
>>> Of course if some development is needed to implement this in Calcite this 
>>> could be an opportunity to contribute to the project.
>>> 
>>> 
>>> 
>>> Many thanks in advance and best regards,
>>> 
>>> 
>>> 
>>> Joachim
>>> 
>>> 
>> 



[jira] [Created] (CALCITE-5681) Support authorization via GRANT and REVOKE DDL commands

2023-05-01 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5681:


 Summary: Support authorization via GRANT and REVOKE DDL commands
 Key: CALCITE-5681
 URL: https://issues.apache.org/jira/browse/CALCITE-5681
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Support authorization via GRANT and REVOKE DDL commands.

While CALCITE-2194 describes how authorization could be built into the schema 
(e.g. user1 can see tables table1 and table2), it requires people to create 
their own Schema objects. This feature would add GRANT and REVOKE commands to 
the DDL parser in the "server" component.

The syntax is TBD but would look something like this:
{code:java}
GRANT SELECT ON TABLE table1, table2 TO user1;

REVOKE ALL ON table1 FROM user1; {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Running Sql Logic Tests for Calcite

2023-04-29 Thread Julian Hyde
An update on this, for those of you not following 
https://issues.apache.org/jira/browse/CALCITE-5615. We agreed to move most of 
Mihai's code (except for the Calcite-specific code) into a new project, 
sql-logic-test. Today I made release 0.1 of that project, and published the 
artifacts to Maven Central.

Thanks to Mihai and Stamatis for their contributions. The announcement is on 
Twitter: https://twitter.com/julianhyde/status/1652409133180817408

The next step will be to rework 5615 to use the net.hydromatic:sql-logic-test 
library in Calcite's test suite.

Julian



On 2023/04/17 17:34:41 Julian Hyde wrote:
> I agree with Stamatis that this has a similar “shape” to Quidem. I’d be happy 
> to host the project under github.com/hydromatic. (If the maven group is 
> net.hydromatic I can publish artifacts to Maven Central and Calcite could 
> depend on those artifacts.)
> 
> Regarding the frequency of testing. If we add it to CI and (say) 5% of the 
> tests fail, I would find that demoralizing, even though passing 95% of the 
> tests is actually a great achievement. So I would only deploy it as part of 
> CI if there is a way to exclude failing tests.
> 
> If the SqlLogicTest tool were defined in another repo, then there could be a 
> Calcite module under plus [1] similar to TpchTest.
> 
> Julian
> 
> [1] https://github.com/apache/calcite/tree/main/plus 
> 
> 
> 
> > On Apr 17, 2023, at 1:58 AM, Stamatis Zampetakis  wrote:
> > 
> > Hey Mihai,
> > 
> > Thanks for starting this discussion!
> > 
> > Let's focus on the first question for now:
> > 
> > Q1: Should the new slt module under PR-3145 [1] become part of Calcite
> > repo or get its own?
> > 
> > For those who have not followed the discussion under the CALCITE-5615
> > [2] let me try to summarize a few things as per my understanding;
> > Mihai can amend/correct things if necessary.
> > 
> > The new slt module resembles a port of sqllogictest utility [3] to
> > Java. It can parse and understand the test-script format used in
> > sqllogictest and can run this scripts over JDBC compliant databases.
> > It also accounts for extensions for Java engines without a JDBC
> > interface.
> > 
> > From my perspective, the code in [1] could perfectly stand on its own
> > in a separate repo; there are already ports of sqllogictest in other
> > languages such as Rust [4] and the latter appears to be quite popular.
> > The sqllocitest parser/runner presents some similarities with the
> > Quidem [5] executor that we are using for certain tests in Calcite.
> > The Quidem project has its own repo although we are making use of it
> > in Calcite.
> > If it becomes a separate repo then the test scripts could also become
> > part of the project making it more self-contained.
> > 
> > On the other hand, we already have a testkit module in Calcite so
> > bringing in new modules for testing purposes is relevant so why not
> > slt as well. If it becomes part of Calcite it can get more visibility
> > and facilitate maintenance since more people would be able to review
> > and merge changes (not only Mihai).
> > 
> > Since we are talking about a new module I would like to see some more
> > people share their opinion on the topic before I continue the review.
> > 
> > Best,
> > Stamatis
> > 
> > [1] https://github.com/apache/calcite/pull/3145
> > [2] https://issues.apache.org/jira/browse/CALCITE-5615
> > [3] https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki
> > [4] https://github.com/risinglightdb/sqllogictest-rs
> > [5] https://github.com/julianhyde/quidem
> > 
> > 
> > 
> > On Sat, Apr 15, 2023 at 11:31 AM Michael Mior  wrote:
> >> 
> >> Very cool! One approach could be to add set these tests to run periodically
> >> (daily/weekly) as opposed to being part of the CI pipeline. That way we
> >> still have a mechanism to keep tabs on bugs but the whole build isn't
> >> slow/broken until this is fixed.
> >> 
> >> On Fri, Apr 14, 2023, 15:20 Mihai Budiu  wrote:
> >> 
> >>> Hello all,
> >>> 
> >>> I have submitted a PR for Calcite with a standalone executable that runs
> >>> the Sql Logic Test suite of 7+ million tests from sqlite.
> >>> 
> >>> This is the JIRA case: https://issues.apache.org/jira/browse/CALCITE-5615
> >>> And this is the PR: https://github.com/apache/calcite/pull/3145
> >>> 
> >>> As Stamatis pointed out, the PR isn't really specific to Calcite, it is a
> >>> general framework in Java to run these tests on 

Re: Employ bloom filters in joins

2023-04-28 Thread Julian Hyde
It would be great to have such a rule. People who don’t want it can disable it; 
and people who enable it can use a cost function.

Some systems that use Bloom filters (and other probabilistic filters) don’t 
execute the query twice but use a side-channel to send the Bloom filter from 
one scan to the other. For example, suppose that the “dept" table is smaller 
and its scan finishes faster. When the scan has finished, it sends the Bloom 
filter to the “emp" scan, which is still under way. From that point, the “emp” 
scan can eliminate a fraction of its rows because it knows that their “deptno” 
values do not pass the filter.

Julian


> On Apr 28, 2023, at 9:01 AM, Zoltan Haindrich  wrote:
> 
> Hi,
> 
> I was wondering about the pros and cons of having a Calcite rule which could 
> rewrite a join to utilize bloom filters; something like:
> 
> select e.*
>   from emp e
>   join dept d on(e.deptno=d.deptno);
>   where d.dname='Sales';
> 
> into something like:
> 
> select e.*
>   from (
>   select e.* from emp e join (
>   select bloom_sketch(deptno) as sketch from dept 
> dname='Sales'
>   ) dept_agg on (bloom_contains(sketch,e.deptno)
>   ) e
>   join dept d on(e.deptno=d.deptno)
>   where d.dname='Sales';
> 
> Generally for the original query:
> * if "dept" is very small a mapjoin is used which is great
> * or possibly some nested loops with index usages on the big table
> * but if the execution engine decides to use a non-specialized approach like 
> merge-join or hash-join; it may move around a lot of data - and in those 
> cases this might be usefull
> 
> There are systems which handle this by introducing a bloom filter (Hive; 
> Spark) and transfer that in the background for the big-table readers - but 
> that's outside the scope of the planner. I was wondering if it would be 
> beneficial or not to introduce such a rule - so that using this can be a 
> cost-based decision during planning.
> 
> pro:
> * to enable an engine to support this optimization - it would only need to 
> implement a few UDFs
> * the rule could put the use of this optimization under cost-based decision
> 
> con:
> * an extra scan of the small table
> * it adds an extra join + aggregate computation
>  * exec engine will most likely exploit that its just a single row
> * I guess without proper stats this could even worsen things
> * it could put more stress on (join) planning - as it could introduce more 
> joins
> 
> What do you guys think?
> 
> cheers,
> Zoltan
> 
> 



Re: Authentication and authorizations in Calcite

2023-04-28 Thread Julian Hyde
I think Calcite should stay out of the authentication business. The container 
(not Calcite) should authenticate users and convert them to security principals 
that Calcite knows about. (Avatica does authentication [1] but it just 
delegates to a provider.)

Regarding authorization. I support adding a grants system to Calcite. Grants 
could be created via DDL (GRANT, REVOKE commands) or via APIs (like interface 
Schema) and the validator should enforce them. E.g. principal1 can see table1 
but is not allowed to see its columns column2 and column3.

If you’re interested please log Jira case(s).

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-643


> On Apr 26, 2023, at 9:39 AM, Joachim Bloche  wrote:
> 
> Hi,
> 
> I'm discovering Calcite after a customer asked me to enhance a proof of 
> concept they made with it.
> 
> I could get most things to work as needed and am very impressed with the 
> possibilities offered.  But completely stuck on a critical point : I can't 
> find any hint on how to manage authentication and authorization of users - 
> sorry if information is available and I somehow missed it, but I did my best 
> to research it.
> 
> The customer needs to allow access only to certain schemas or tables (or 
> columns as a bonus) based on user role.
> 
> Do you know of any standard way to implement this ? Ideally I would need to 
> implement it as low as possible in the stack as I'd like to use the same 
> authorization process for both front-end users and users connecting through 
> the Avatica JDBC driver.
> 
> With my still limited knowledge of Calcite, the only "off the shelf" solution 
> I could find is to create one Calcite model per role and start one Avatica 
> server per model. Obviously I'm not very proud of this one as it's not really 
> scalable nor elegant and there has to be a better solution.
> 
> Any hint to point me in the right direction would be hugely appreciated as I 
> would need the same for future projects.
> 
> 
> 
> Of course if some development is needed to implement this in Calcite this 
> could be an opportunity to contribute to the project.
> 
> 
> 
> Many thanks in advance and best regards,
> 
> 
> 
> Joachim
> 
> 



Re: Rewrite rule to convert self-joins into scans

2023-04-19 Thread Julian Hyde
Thank you, Stamatis. This is helpful. I have linked to this email thread in 
CALCITE-5631.

> On Apr 16, 2023, at 2:44 AM, Stamatis Zampetakis  wrote:
> 
> Few quick thoughts about this.
> 
> For to the problem of query minimization/redundant joins the simpler
> scenarios that I can think of are the following:
> 
> # Scenario A
> select e1.id from emp e1 inner join emp e2 on e1.name = e2.name;
> 
> If you know the name column is UNIQUE then you can drop the join on e2.
> 
> # Scenario B
> select e.name from emp e inner join dept d on e.dept = d.id;
> 
> If you know that e.dept and d.id is a foreign key relationship then
> you can drop the join on dept.
> 
> There are probably other cases to define and handle but we should move
> incrementally.
> 
> As Julian pointed out, the issue logged  in CALITE-5631 could also be
> addressed by employing common table expression related optimizations.
> CTE optimizations and query minimization are both interesting and
> powerful techniques to reduce the cost of the query (whatever that is
> speed, money, resources, etc).
> 
> I would suggest focusing on query minimization first since it is
> pretty well defined and we could come up with solutions much faster
> than CTEs. CTEs usually come up with decisions about materializing or
> not the common expressions which are closer to lower level ("physical
> plan") optimizations.
> 
> Most minimization techniques focus on select project join (SPJ)
> queries so I guess we would have to do some preprocessing to bring the
> plan in this format (only Scan, Project, Filter, and Join operators)
> before applying the rule. It would be a separate planning phase
> combining a bunch of existing rules followed by some new which is
> inline with what Julian was saying about bottom-up unification.
> 
> The new rule could be something similar to LoptOptimizeJoinRule that
> operates on a MultiJoin. I haven't checked if the MultiJoin operator
> is sufficient to express an SPJ query but I think the general idea of
> grouping joins together seems to be a promising direction for writing
> new rules.
> 
> Best,
> Stamatis
> 
> On Sun, Apr 16, 2023 at 2:27 AM Julian Hyde  wrote:
>> 
>> Ian Bertolacci recently logged
>> https://issues.apache.org/jira/browse/CALCITE-5631, to convert
>> 
>>  select
>> (select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
>> (select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
>>  from T895
>> 
>> into
>> 
>>  select agg.agg1,
>>  agg.agg2
>>  from T895
>>  left join (
>>select C5633_586,
>>numarrayagg(C5633_203) as agg1,
>>numarrayagg(C5633_170) as agg2
>>from T893
>>where C5633_586 is not null
>>group by C5633_586) as agg
>>  on agg.C5633_586 = T895.id
>> 
>> This seems to me an interesting and important problem. But it's also a
>> hard problem, and it's not clear to me which approach is the best.
>> Does anyone have any ideas for how to approach it?
>> 
>> Also, we could use more example queries that illustrate the general
>> pattern.  (Preferably in terms of simple databases such as EMP and
>> DEPT.)
>> 
>> In Calcite rewrite rules (RelRule) are usually the preferred approach.
>> Because the common relational expressions scans can be an arbitrary
>> distance apart in the RelNode tree, RelRule doesn't seem suitable.
>> 
>> There seem to be some similarities to algorithms to use materialized
>> views, which use bottom-up unification.
>> 
>> Ian's original query actually has correlated scalar sub-queries rather
>> than explicit joins. Would it be better to target common sub-queries
>> rather than joins?
>> 
>> Lastly, there are similarities with the WinMagic algorithm, which
>> converts correlated sub-queries into window aggregates. Is that a
>> useful direction? (My implementation of measures in CALCITE-4496
>> naturally creates correlated scalar sub-queries that can be inlined in
>> the enclosing query if simple, or converted to window aggregates if
>> more complex.)
>> 
>> Julian



Re: [DISCUSS] Running Sql Logic Tests for Calcite

2023-04-17 Thread Julian Hyde
I agree with Stamatis that this has a similar “shape” to Quidem. I’d be happy 
to host the project under github.com/hydromatic. (If the maven group is 
net.hydromatic I can publish artifacts to Maven Central and Calcite could 
depend on those artifacts.)

Regarding the frequency of testing. If we add it to CI and (say) 5% of the 
tests fail, I would find that demoralizing, even though passing 95% of the 
tests is actually a great achievement. So I would only deploy it as part of CI 
if there is a way to exclude failing tests.

If the SqlLogicTest tool were defined in another repo, then there could be a 
Calcite module under plus [1] similar to TpchTest.

Julian

[1] https://github.com/apache/calcite/tree/main/plus 



> On Apr 17, 2023, at 1:58 AM, Stamatis Zampetakis  wrote:
> 
> Hey Mihai,
> 
> Thanks for starting this discussion!
> 
> Let's focus on the first question for now:
> 
> Q1: Should the new slt module under PR-3145 [1] become part of Calcite
> repo or get its own?
> 
> For those who have not followed the discussion under the CALCITE-5615
> [2] let me try to summarize a few things as per my understanding;
> Mihai can amend/correct things if necessary.
> 
> The new slt module resembles a port of sqllogictest utility [3] to
> Java. It can parse and understand the test-script format used in
> sqllogictest and can run this scripts over JDBC compliant databases.
> It also accounts for extensions for Java engines without a JDBC
> interface.
> 
> From my perspective, the code in [1] could perfectly stand on its own
> in a separate repo; there are already ports of sqllogictest in other
> languages such as Rust [4] and the latter appears to be quite popular.
> The sqllocitest parser/runner presents some similarities with the
> Quidem [5] executor that we are using for certain tests in Calcite.
> The Quidem project has its own repo although we are making use of it
> in Calcite.
> If it becomes a separate repo then the test scripts could also become
> part of the project making it more self-contained.
> 
> On the other hand, we already have a testkit module in Calcite so
> bringing in new modules for testing purposes is relevant so why not
> slt as well. If it becomes part of Calcite it can get more visibility
> and facilitate maintenance since more people would be able to review
> and merge changes (not only Mihai).
> 
> Since we are talking about a new module I would like to see some more
> people share their opinion on the topic before I continue the review.
> 
> Best,
> Stamatis
> 
> [1] https://github.com/apache/calcite/pull/3145
> [2] https://issues.apache.org/jira/browse/CALCITE-5615
> [3] https://www.sqlite.org/sqllogictest/doc/trunk/about.wiki
> [4] https://github.com/risinglightdb/sqllogictest-rs
> [5] https://github.com/julianhyde/quidem
> 
> 
> 
> On Sat, Apr 15, 2023 at 11:31 AM Michael Mior  wrote:
>> 
>> Very cool! One approach could be to add set these tests to run periodically
>> (daily/weekly) as opposed to being part of the CI pipeline. That way we
>> still have a mechanism to keep tabs on bugs but the whole build isn't
>> slow/broken until this is fixed.
>> 
>> On Fri, Apr 14, 2023, 15:20 Mihai Budiu  wrote:
>> 
>>> Hello all,
>>> 
>>> I have submitted a PR for Calcite with a standalone executable that runs
>>> the Sql Logic Test suite of 7+ million tests from sqlite.
>>> 
>>> This is the JIRA case: https://issues.apache.org/jira/browse/CALCITE-5615
>>> And this is the PR: https://github.com/apache/calcite/pull/3145
>>> 
>>> As Stamatis pointed out, the PR isn't really specific to Calcite, it is a
>>> general framework in Java to run these tests on any JDBC compliant
>>> executor. So a question is whether this belongs to the Calcite project, or
>>> some place else. sqlite is a C project, I didn't see any Java in their
>>> source tree.
>>> 
>>> Please note that SQLite is in the public domain, so their licensing terms
>>> are not an obstacle to using the test scripts.
>>> 
>>> The submitted code runs Calcite in its default configuration, but the
>>> intent is for other projects that build Calcite-based compilers to be able
>>> to test them by subclassing the "TestExecutors". In our own project (
>>> https://github.com/vmware/sql-to-dbsp-compiler) we have done exactly that,
>>> and we are not using the JDBC API.
>>> 
>>> The testsuite does find bugs in Calcite, both crashes and incorrect
>>> results. So I think it's usefulness is not debated.
>>> 
>>> The second question is about the packaging of this program; right now it
>>> has a main() entry point and it prints the results to stderr for human
>>> consumption and triage. It is not clear to me how it should be inserted in
>>> a CI infrastructure, since running all 7 million tests could take a long
>>> time. One possible extension would be to have the program generate a
>>> regression test for Calcite for each bug it finds, but I haven't
>>> implemented this feature yet (and many failures could be due to the same
>>> bug). But even that 

Rewrite rule to convert self-joins into scans

2023-04-15 Thread Julian Hyde
Ian Bertolacci recently logged
https://issues.apache.org/jira/browse/CALCITE-5631, to convert

  select
 (select numarrayagg(C5633_203) from T893 where C5633_586 = T895.id),
 (select numarrayagg(C5633_170) from T893 where C5633_586 = T895.id)
  from T895

into

  select agg.agg1,
  agg.agg2
  from T895
  left join (
select C5633_586,
numarrayagg(C5633_203) as agg1,
numarrayagg(C5633_170) as agg2
from T893
where C5633_586 is not null
group by C5633_586) as agg
  on agg.C5633_586 = T895.id

This seems to me an interesting and important problem. But it's also a
hard problem, and it's not clear to me which approach is the best.
Does anyone have any ideas for how to approach it?

Also, we could use more example queries that illustrate the general
pattern.  (Preferably in terms of simple databases such as EMP and
DEPT.)

In Calcite rewrite rules (RelRule) are usually the preferred approach.
Because the common relational expressions scans can be an arbitrary
distance apart in the RelNode tree, RelRule doesn't seem suitable.

There seem to be some similarities to algorithms to use materialized
views, which use bottom-up unification.

Ian's original query actually has correlated scalar sub-queries rather
than explicit joins. Would it be better to target common sub-queries
rather than joins?

Lastly, there are similarities with the WinMagic algorithm, which
converts correlated sub-queries into window aggregates. Is that a
useful direction? (My implementation of measures in CALCITE-4496
naturally creates correlated scalar sub-queries that can be inlined in
the enclosing query if simple, or converted to window aggregates if
more complex.)

Julian


[jira] [Created] (CALCITE-5645) Correlated scalar sub-query returns incorrect results when the correlating variable is NULL

2023-04-13 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-5645:


 Summary: Correlated scalar sub-query returns incorrect results 
when the correlating variable is NULL
 Key: CALCITE-5645
 URL: https://issues.apache.org/jira/browse/CALCITE-5645
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Correlated scalar sub-query returns incorrect results when the correlating 
variable is NULL. For example, consider the following view and query:
{code}
!use scott
!set outputformat mysql
create view emp1 as 
  select empno, ename, job, sal,
  case when job = 'ANALYST' then null
   else deptno
  end as deptno
  from emp);
!ok

select ename, (select count(*) as c
   from emp1 as e2
   where e2.deptno is not distinct from e1.deptno) as c
from emp1 as e1;
!ok

select ename, deptno,
count(*) over (partition by deptno) as c
from emp1 as e1;
+++---+
| ENAME  | DEPTNO | C |
+++---+
| ADAMS  | 20 | 3 |
| ALLEN  | 30 | 6 |
| BLAKE  | 30 | 6 |
| CLARK  | 10 | 3 |
| FORD   || 2 |
| JAMES  | 30 | 6 |
| JONES  | 20 | 3 |
| KING   | 10 | 3 |
| MARTIN | 30 | 6 |
| MILLER | 10 | 3 |
| SCOTT  || 2 |
| SMITH  | 20 | 3 |
| TURNER | 30 | 6 |
| WARD   | 30 | 6 |
+++---+
(14 rows)

!ok

{code}

The {{emp1}} view just provides null values for the {{deptno}} column.

The two queries should be equivalent, but the first throws:
{noformat}
java.lang.AssertionError: RexInputRef index 7 out of range 0..4
at org.apache.calcite.util.Litmus.lambda$static$0(Litmus.java:31)
at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:125)
at org.apache.calcite.rex.RexChecker.visitInputRef(RexChecker.java:61)
at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:113)
at org.apache.calcite.rex.RexChecker.visitCall(RexChecker.java:144)
at org.apache.calcite.rex.RexChecker.visitCall(RexChecker.java:61)
at org.apache.calcite.rex.RexCall.accept(RexCall.java:189)
at org.apache.calcite.rel.core.Project.isValid(Project.java:262)
at org.apache.calcite.rel.core.Project.(Project.java:107)
at 
org.apache.calcite.rel.logical.LogicalProject.(LogicalProject.java:75)
at 
org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:166)
at 
org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:143)
at 
org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:199)
at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:2117)
at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1888)
at 
org.apache.calcite.tools.RelBuilder.projectNamed(RelBuilder.java:2209)
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelConverter.java:4638)
{noformat}

The simpler query
{code}
select ename, deptno,
(select count(*)
 from emp1 as e2
 where e1.deptno is null) as c
from emp1 as e1;
!ok
{code}
throws a similar exception.

This functionality is necessary for supporting measures (see CALCITE-4496) with 
NULL keys in the GROUP BY, or with GROUPING SETS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: How to utilize parser hooks such as SqlInsertKeywords

2023-04-10 Thread Julian Hyde
To research this, I would use 'git annotate' and identify the commits
and Jira cases that added those particular hooks.

On Tue, Apr 4, 2023 at 9:22 AM Jonathan Sternberg  wrote:
>
> Hi,
>
> We're utilizing Calcite's parser and need to customize it for a certain
> dialect. In the past, we've just been keeping a fork of the parser and
> making changes to the core template file. We wanted to try utilizing some
> of the hooks that Calcite provides to ease the maintenance burden when
> modifying the parser.
>
> We've noticed hooks such as this one:
> https://github.com/apache/calcite/blob/2d0b3acb11169b307dc165c1fae8b7c92b888ae9/core/src/main/codegen/templates/Parser.jj#L326-L333
>
> If we wanted to customize that production to add support for a keyword, how
> would we go about doing it? We've seen the configuration file for
> freemarker and have seen how to add new productions for new statements, but
> we haven't found a way to hook into the methods like these ones for
> modifying existing statements.
>
> Thanks.
>
> --Jonathan Sternberg


[DISCUSS] Sharing the load of reviewing PRs

2023-04-10 Thread Julian Hyde
I don't enjoy reviewing and merging PRs. And every time I do, I feel
like a sucker, because there are over a few dozen committers who are
enjoying the project and not doing the work. (There is a small group
of committers who regularly review and merge PRs. I don't know how
they feel about the task, but I am immensely grateful.)

I think I would review more PRs if I saw others doing the same.

Can we figure out a fairer way to distribute the load? For release
managers (approximately the same amount of work, but compressed into a
few hours or days) we have successfully run a rota for several years.
Could we do something similar with PRs?

I propose the following. For each calendar month, there is a PR
manager and 6 - 8 reviewers. The PR manager does not review PRs, but
assigns them to reviewers, and politely reminds reviews to keep the PR
moving.

The PR manager's goals are:
 * every non-draft PR is reviewed within 3 days of submission,
 * every PR is merged within 3 days of being done;
 * rotate duties so that no reviewer is asked to review more than 4
PRs per month;
 * email a report at the end of the month;
 * work down the backlog of historic PRs if it's a slow month.

The PR manager rotates every month. The reviewers can rotate if they
wish, but I suspect most will stay in the pool for several months,
because the reviewing load is not very heavy, and because they see
others doing the work.

Other notes:
 * Non-committers would be welcome to join the pool of reviews (and
that would be a good way to earn the committer bit) and a committer
could merge when the PR is approved.
 * If committers join the pool, that's a good way to earn PMC membership.
 * Committers who are not in the pool are welcome to review PRs and
assign PRs to themselves (but expect to be nagged by the PR manager if
you don't review in a timely manner).

What do you think? Would you join this scheme if we introduced it? If
you agree please +1; also happy to see revisions to this suggestion or
other ideas to share the work.

Julian


Re: calcite lsp server

2023-04-07 Thread Julian Hyde
I wrote SqlAdvisor [1] with the goal of enabling auto completion in SQL 
editors, but I never took the next step of wrapping it as a service. 

Julian

[1] 
https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/advise/SqlAdvisor.html

> On Apr 6, 2023, at 1:30 PM, Guillaume Masse 
>  wrote:
> 
> Hi,
> 
> I'm wondering if anyone has built a full fledge language server protocol
> using Calcite validator.
> 
> I saw Dreamio has an autocomplete service:
> https://github.com/dremio/dremio-oss/tree/master/services/autocomplete
> 
> -- 
> Guillaume Massé
> [Gee-OHM]
> (马赛卫)


Re: Using calcite to parse untrusted SQL queries and add implicit filters

2023-04-02 Thread Julian Hyde
+1 what Askar said

Also you could give users access to views. A particular user might have a view 
with “WHERE tenant = 100” so if they ask for any any other tenant is they would 
get no results. 

It’s also possible to use Calcite to check grants (table and column access) and 
fail a query if they access objects they are not allowed to see or use. We 
should add features to support this use case better. 

Julian

> On Apr 2, 2023, at 11:09, Askar Bozcan  wrote:
> 
> (EDIT)
> ... *For this case *there's also a Druid adapter which uses the native JSON
> intf.
> 
>> On Sun, 2 Apr 2023 at 21:08, Askar Bozcan  wrote:
>> 
>> What's your use case? Do you want to:
>> a) Use Calcite as a "frontend" DB of sorts, to accept all queries and send
>> the processed SQL query (from Calcite) only the accepted queries? For this
>> case
>> b) Just use Calcite as a query processor, and send the processed query
>> yourself?
>> 
>> - Askar
>> 
>>> On Sun, 2 Apr 2023 at 20:38, Sebastien Rosset  wrote:
>>> 
>>> Thank you for the quick response. I am new to Calcite, it's good to hear
>>> there might be a possibility. I will investigate. Regarding the tenant_id,
>>> if the input SQL statement is:
>>> SELECT a, b, c
>>> FROM datasource
>>> WHERE input_expression
>>> 
>>> Then the tool should add a "security filter" as shown below:
>>> 
>>> SELECT a, b, c
>>> FROM datasource
>>> WHERE tenant_id = 'abcd123' AND input_expression
>>> 
>>> 
>>> On Sun, Apr 2, 2023 at 9:56 AM Askar Bozcan 
>>> wrote:
>>> 
 Hello Sebastien,
 I'd say it's quite feasible for this purpose by having Calcite push-down
 pre-processed queries to Druid and being a kind of a "front" to
>>> underlying
 Druid DB.
 
 Regarding rejecting queries other than SELECT:
 1) Parse the query and get the SqlNode representing the root of the
>>> syntax
 tree.
 2) Extend SqlShuttle
 <
 
>>> https://calcite.apache.org/javadocAggregate/org/apache/calcite/sql/util/SqlShuttle.html
> 
 to check & manipulate the syntax tree, returning null for rejected
>>> nodes &
 recursively iterating for SELECT sub-queries.
 Note that you can use SqlSelect's getFrom, getGroup, etc to get all of
>>> the
 possible nodes which can contain subqueries. Also note that SqlSelect
 itself is a subtype of SqlCall.
 
 Regarding "tenant_id" in WHERE case, I'm not too sure as I'm neither
 familiar with Druid nor with Druid adapter in Calcite. Regardless, it
 should be doable on the adapter level AFAIK.
 
 Hope that was helpful!
 
 Regards,
 Askar Bozcan
 
 On Sun, 2 Apr 2023 at 18:07, Sebastien Rosset 
>>> wrote:
 
> Would it make sense to use Apache Calcite as a security mediator? Has
> calcite already been used for that purpose? The mediator would parse
> untrusted SQL queries, reject queries other than SELECT (including any
> sub-queries), and inject multi-tenancy WHERE filters in every SELECT
 query,
> including sub-queries?
> 
> More specifically, consider a Druid database which is configured to
> implement multi-tenancy with shared datasources:
> 
> 
 
>>> https://druid.apache.org/docs/latest/querying/multitenancy.html#partitioning-shared-datasources
> .
> Every Druid datasource would have a "tenant_id" attribute that can be
 used
> in a Druid-SQL WHERE clause.
> 
> Thank you. Sebastien
> 
 
>>> 
>> 


<    1   2   3   4   5   6   7   8   9   10   >