[jira] [Created] (CALCITE-4596) RelFieldTrimmer#trimFields fails if values row type is empty record

2021-05-05 Thread Konstantin Orlov (Jira)
Konstantin Orlov created CALCITE-4596:
-

 Summary: RelFieldTrimmer#trimFields fails if values row type is 
empty record
 Key: CALCITE-4596
 URL: https://issues.apache.org/jira/browse/CALCITE-4596
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.26.0
Reporter: Konstantin Orlov


Currently, an exception is thrown when a LogicalValues with no fields is passed 
to {{RelFieldTrimmer#trimFields(LogicalValues, ImmutableBitSet, 
Set<RelDataTypeField>)}}.

The reason is that, while trying to avoid producing an empty record, the code 
below does not expect that the row type of the input may already be an empty 
record.
{code:java}
  public TrimResult trimFields(
      LogicalValues values,
      ImmutableBitSet fieldsUsed,
      Set<RelDataTypeField> extraFields) {
    final RelDataType rowType = values.getRowType();
    final int fieldCount = rowType.getFieldCount();

    // If they are asking for no fields, we can't give them what they want,
    // because zero-column records are illegal. Give them the last field,
    // which is unlikely to be a system field.
    if (fieldsUsed.isEmpty()) {
      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
    }
{code}
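
A minimal sketch of one possible guard, not the actual Calcite fix. It uses {{java.util.BitSet}} as a stand-in for {{ImmutableBitSet}}, and the names {{TrimGuardSketch}} and {{fieldsToKeep}} are hypothetical. The idea is to fall back to the last field only when the input has at least one field, and otherwise to leave the selection empty.

```java
import java.util.BitSet;

/** Hypothetical stand-in for the field-selection step; not Calcite code. */
final class TrimGuardSketch {
  private TrimGuardSketch() {}

  /**
   * Returns the fields to keep. If no fields are requested, falls back to
   * the last field, but only when the input actually has a field to offer.
   */
  static BitSet fieldsToKeep(BitSet fieldsUsed, int fieldCount) {
    if (!fieldsUsed.isEmpty() || fieldCount == 0) {
      // Either the caller asked for specific fields, or the input row type
      // is already empty and there is no "last field" to select.
      return (BitSet) fieldsUsed.clone();
    }
    BitSet lastField = new BitSet(fieldCount);
    lastField.set(fieldCount - 1); // safe: fieldCount >= 1 on this path
    return lastField;
  }

  public static void main(String[] args) {
    System.out.println(fieldsToKeep(new BitSet(), 3)); // prints {2}
    System.out.println(fieldsToKeep(new BitSet(), 0)); // prints {}
  }
}
```

Whether returning an empty selection is acceptable depends on whether zero-field row types are considered legal, which is the open question in this issue.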
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: RelFieldTrimmer throws an exception in certain cases

2021-05-05 Thread Konstantin Orlov
> Konstantin, can you log it, please

Yes, sure. Here it is [1]

[1] https://issues.apache.org/jira/browse/CALCITE-4596 


-- 
Regards,
Konstantin Orlov



> On 4 May 2021, at 21:29, Julian Hyde  wrote:
> 
> Regardless of which direction we go (allowing zero-field record types, or 
> disallowing them), Konstantin has found a bug. Konstantin, can you log it, 
> please.
> 
> On 2021/04/29 14:25:27, Konstantin Orlov  wrote: 
>> Hi all.
>> 
>> I ran into a problem that prevents certain queries from being planned: 
>> RelFieldTrimmer throws an ArrayIndexOutOfBoundsException with the message 
>> "Index -1 out of bounds for length 0".
>> 
>> The problem is here [1]:
>> 
>>    // If they are asking for no fields, we can't give them what they want,
>>    // because zero-column records are illegal. Give them the last field,
>>    // which is unlikely to be a system field.
>>    if (fieldsUsed.isEmpty()) {
>>      fieldsUsed = ImmutableBitSet.range(fieldCount - 1, fieldCount);
>>    }
>> 
>> If fieldsUsed.isEmpty(), we return the last field, but it is currently 
>> possible that fieldCount == 0 as well.
>> 
>> After some investigation, I found that the cause is an empty record being 
>> derived as the row type for an Aggregate. This is possible when an aggregate 
>> has an empty group key and no aggregate calls.
>> 
>> So the question is: is an empty record a legal row type for an aggregation 
>> node?
>> 
>> Below is a reproducer for this problem; just put it in RelFieldTrimmerTest:
>> 
>>   @Test void test() {
>>     class ContextImpl implements Context {
>>       final Object target;
>> 
>>       ContextImpl(Object target) {
>>         this.target = Objects.requireNonNull(target, "target");
>>       }
>> 
>>       @Override public <T> @Nullable T unwrap(Class<T> clazz) {
>>         if (clazz.isInstance(target)) {
>>           return clazz.cast(target);
>>         }
>>         return null;
>>       }
>>     }
>> 
>>     // RelBuilder hides the problem when simplifyValues=true, hence we need
>>     // to disable it
>>     final RelBuilder builder = RelBuilder.create(config()
>>         .context(new ContextImpl(
>>             RelBuilder.Config.DEFAULT.withSimplifyValues(false)))
>>         .build());
>> 
>>     final RelNode root =
>>         builder.scan("EMP")
>>             .aggregate(builder.groupKey())
>>             .filter(builder.literal(false))
>>             .project(builder.literal(42))
>>             .build();
>> 
>>     final RelFieldTrimmer fieldTrimmer = new RelFieldTrimmer(null, builder);
>>     // Fails with ArrayIndexOutOfBoundsException:
>>     // Index -1 out of bounds for length 0
>>     fieldTrimmer.trim(root);
>>   }
>> 
>> 
>> [1] 
>> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/sql2rel/RelFieldTrimmer.java#L1197
>> 
>> -- 
>> Regards,
>> Konstantin Orlov
>> 
>> 
>> 
>> 
>> 



Re: Proposal to extend Calcite into a incremental query optimizer

2021-05-05 Thread Botong Huang
Hi Stamatis and all,

Thanks for the interest! Let's tentatively schedule the next meeting for next
Wednesday, May 12, 10pm-11pm PST. Please let us know if any new needs come
up.

Best,
Botong

On Sun, May 2, 2021 at 2:59 PM Stamatis Zampetakis 
wrote:

> Hello,
>
> I really regret missing the first meeting, sorry about that. I added my
> preferences in the document.
> I will make sure to attend the next one and help as much as I can.
>
> I didn't have the chance yet to go over the paper but will try to do it
> before the next meeting.
>
> For me the following dates are more convenient than others so it would be
> nice if we could arrange it then.
>
> Thu, May 6, 10pm PST
> Tue, May 12, 10pm PST
>
> Best,
> Stamatis
>
> On Sat, May 1, 2021 at 9:42 PM Julian Hyde  wrote:
>
> > I have added my time preferences to the doc [1]. I am generally
> > available any evening Mon - Thu. How about we meet Monday 10th May?
> >
> > Stamatis, Jesus, Given the complexity of this work, I would very much
> > appreciate your insight, as experts in optimizer theory. Could one of
> > you join the next meeting? Of course we should choose a time that
> > works for everyone's schedule.
> >
> > Julian
> >
> > [1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> >
> > On Wed, Apr 28, 2021 at 9:32 AM Botong Huang  wrote:
> > >
> > > We didn't record it; we will try to record the following meetings. Please
> > > add your time preference in the doc, so that we can find a meeting time
> > > that works for more people.
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina 
> > wrote:
> > >
> > > > Is there a recording available?
> > > > Viliam
> > > >
> > > > On Wed, 28 Apr 2021 at 00:15, Botong Huang  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > > The meeting yesterday was fun and productive. As discussed, this is the
> > > > > > call to schedule our second meeting.
> > > > > >
> > > > > > We encourage everyone to add their time preferences during 05/01 - 05/15
> > > > > > here:
> > > > > >
> > > > > > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > > >
> > > > > Thanks,
> > > > > Botong
> > > > >
> > > > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang 
> > wrote:
> > > > >
> > > > > > Hi all,
> > > > > > We've created a zoom meeting below for our meeting next Monday
> > > > > > (9pm-10:30pm PST on 04/26).
> > > > > > Talk to you all soon!
> > > > > >
> > > > > > Join Zoom Meeting
> > > > > > https://uci.zoom.us/j/91279732686
> > > > > > <
> > > > >
> > > >
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fj%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw2C5LoOmCaSLWSi-YvMmsOE
> > > > > >
> > > > > >
> > > > > > Meeting ID: 912 7973 2686
> > > > > > One tap mobile
> > > > > > +16699006833,,91279732686# US (San Jose)
> > > > > > +12532158782,,91279732686# US (Tacoma)
> > > > > >
> > > > > > Dial by your location
> > > > > > +1 669 900 6833 US (San Jose)
> > > > > > +1 253 215 8782 US (Tacoma)
> > > > > > +1 346 248 7799 US (Houston)
> > > > > > +1 301 715 8592 US (Washington DC)
> > > > > > +1 312 626 6799 US (Chicago)
> > > > > > +1 646 558 8656 US (New York)
> > > > > > Meeting ID: 912 7973 2686
> > > > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > > > > <
> > > > >
> > > >
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fu%2FaykHTkJBh&sa=D&source=calendar&usd=2&usg=AOvVaw0y_V5CisCHRyt9wsXLa9UM
> > > > > >
> > > > > >
> > > > > > Join by Skype for Business
> > > > > > https://uci.zoom.us/skype/91279732686
> > > > > > <
> > > > >
> > > >
> >
> https://www.google.com/url?q=https%3A%2F%2Fuci.zoom.us%2Fskype%2F91279732686&sa=D&source=calendar&usd=2&usg=AOvVaw3iQwsDViu3K7-Rb_Iy6Zsy
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Botong
> > > > > >
> > > > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang 
> > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > > >> According to the preferences collected, we are tentatively scheduling
> > > > > > >> our meeting for 9pm-10:30pm PST on Monday, 04/26.
> > > > > > >>
> > > > > > >> We will give a presentation about Tempura, followed by a free discussion.
> > > > > > >>
> > > > > > >> Please let us know if there are any other requests. A few days before
> > > > > > >> the meeting, I will send out a Zoom meeting link.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Botong
> > > > > >>
> > > > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang 
> > wrote:
> > > > > >>
> > > > > >>> Hi Julian and all,
> > > > > >>>
> > > > > > >>> We've posted the Tempura code base below. Feel free to take a quick peek
> > > > > > >>> at the last five commits.
> > > > > > >>>
> > > > > > >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > > > >>>
> > > > > >>> I've also opened a Jira (CALCITE-4568
> > > > > >>> <

Re: Trait propagation in heterogeneous plans

2021-05-05 Thread Vladimir Ozerov
Hi Vladimir, Julian,

I want to distinguish between two cases.

Some projects may decide to use Calcite's distribution trait. To my
knowledge, this is not a common pattern because it is not really integrated
into Calcite. It is not destroyed/adjusted in rules and operators as
needed, not integrated into EnumerableConvention.enforce, etc.

Other projects may decide to use a custom distribution trait. Examples are
Apache Flink, Hazelcast, and some other private projects we work on. There
are many reasons to do this. A couple of examples:
1. Calcite's distribution produces a logical exchange, while production-grade
optimizers are typically multi-phase and want the distribution convention to
produce physical exchanges in a dedicated physical phase (or phases).
2. Some systems may have custom requirements for distribution, such as
propagating the number of shards, supporting multiple equivalent keys, etc.

But in both cases, the bottom line is that Enumerable currently cannot work
with either built-in or custom distributions, because the associated code is
not implemented in Calcite's core. And even if we add full-fledged support
for the built-in distribution to Enumerable, many projects will continue
using custom distribution traits, because the exchange is a physical
operation with lots of backend-specific quirks, and any attempt to model it
abstractly in Calcite's core is unlikely to cover every edge case.

The same applies to any other custom trait that depends on columns -
Enumerable will not be able to process it correctly.

Therefore, instead of having definitely broken code, it might be better to
take a defensive approach in which the whole Enumerable backend provides a
clear and consistent contract: we support collation and reset everything
else. IMO it is better because it matches the current behavior and would
never cause strange bugs in user code. If in the future we invest in the
proper integration of the built-in distribution or figure out how to
"externalize" the trait propagation for Enumerable operators, we may relax
this statement.
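
As a rough illustration of the "support collation and reset everything else" contract, here is a minimal sketch. It models traits as plain string pairs rather than Calcite's trait classes, and every name in it (TraitResetSketch, propagate, the trait keys) is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of defensive trait propagation; not Calcite API. */
final class TraitResetSketch {
  private TraitResetSketch() {}

  /**
   * Keeps the input value for every trait the operator knows how to
   * propagate and resets every other trait to its default value.
   *
   * @param inputTraits trait name to trait value, as seen on the input
   * @param supported   trait names the operator propagates correctly
   * @param defaults    default value for each trait name
   */
  static Map<String, String> propagate(
      Map<String, String> inputTraits,
      Set<String> supported,
      Map<String, String> defaults) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> trait : inputTraits.entrySet()) {
      String name = trait.getKey();
      // Unknown traits are never passed through; they fall back to default.
      result.put(name,
          supported.contains(name) ? trait.getValue() : defaults.get(name));
    }
    return result;
  }
}
```

With "collation" as the only supported name, an input carrying a custom "distribution" value would come out with its collation intact and its distribution reset to the default.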

Please let me know if it makes any sense.

Regards,
Vladimir.

Tue, May 4, 2021 at 21:02, Julian Hyde :

> > I would say known in-core vs unknown trait is a reasonable approach to
> > distinguish traits.
>
> Easy, but not reasonable. It will make it very difficult to reuse
> existing rels and rules (e.g. Enumerable) in a downstream project that
> has defined its own traits.
>
> On Tue, May 4, 2021 at 10:44 AM Vladimir Sitnikov
>  wrote:
> >
> > > It seems arbitrary to include Collation but exclude other traits.
> >
> > I would say known in-core vs unknown trait is a reasonable approach to
> > distinguish traits.
> >
> > Vladimir
>


[jira] [Created] (CALCITE-4597) Allow RelNodes to have an empty row type (zero fields)

2021-05-05 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-4597:


 Summary: Allow RelNodes to have an empty row type (zero fields)
 Key: CALCITE-4597
 URL: https://issues.apache.org/jira/browse/CALCITE-4597
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


Add an option, {{EmptyRowTypePolicy}}, to allow creation of {{RelNode}}s whose 
row type is empty, that is, contains zero fields.

There are three values:
 * {{FORBIDDEN}} - Calcite prevents empty row types. (For example, planner and 
{{RelBuilder}} throw if they see one.) Rules must not produce empty row types. 
Rules can assume that they will not encounter empty row types.
 * {{DISCOURAGED}} - Empty row types are discouraged. (Planner and 
{{RelBuilder}} will not throw if they see one.) Rules must not fail if they 
encounter an empty row type. Rules should not produce empty row types (with 
reasonable exceptions, such as if the input has an empty row type).
 * {{ALLOWED}} - Empty row types are OK. All rules should handle {{RelNode}}s 
with empty row types, and it's OK if they generate {{RelNode}}s with empty row 
types.

The current policy is effectively {{DISCOURAGED}}. We try not to create 
{{RelNode}}s with empty row types, but we don't check, and they crop up 
occasionally.

After this change, and for a few releases, the policy will be {{DISCOURAGED}} 
by default, but we will run tests in all three modes. All rules must run in all 
modes.

At some point in the future, we will change the default policy to {{ALLOWED}}. 
All rules must continue to run in all modes.
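
The three values could be sketched roughly as follows. Only the enum name and its three constants come from this issue; the helper methods and the {{check}} hook are hypothetical and are meant only to restate the rules above in code.

```java
/** Sketch of the proposed option; helper methods are illustrative only. */
enum EmptyRowTypePolicy {
  FORBIDDEN,
  DISCOURAGED,
  ALLOWED;

  /** Whether the planner and RelBuilder should throw on a zero-field row type. */
  boolean rejectsEmptyRowType() {
    return this == FORBIDDEN;
  }

  /** Whether rules may freely generate zero-field row types. */
  boolean rulesMayProduceEmptyRowType() {
    return this == ALLOWED;
  }

  /** Hypothetical validation hook, called with a node's field count. */
  void check(int fieldCount) {
    if (fieldCount == 0 && rejectsEmptyRowType()) {
      throw new IllegalArgumentException(
          "Empty row type is not permitted under policy " + this);
    }
  }
}
```

Under this sketch, {{DISCOURAGED.check(0)}} returns normally while {{FORBIDDEN.check(0)}} throws, which matches the description of the current behavior versus the strict mode.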





Re: RelFieldTrimmer throws an exception in certain cases

2021-05-05 Thread Julian Hyde
Thanks, Konstantin.

I have logged https://issues.apache.org/jira/browse/CALCITE-4597 to make the 
policy configurable. Eventually I would like to allow empty row types 
throughout the system, but until then, rules and RelFieldTrimmer should follow 
Postel’s law [1] and accept empty row types but try not to produce them.

Julian

[1] https://en.wikipedia.org/wiki/Robustness_principle




Re: Trait propagation in heterogeneous plans

2021-05-05 Thread Julian Hyde
Vladimir,

You are arguing for pragmatism over idealism. I get that.

The problem with your argument is that you go on to say

> If in the future we invest in the
> proper integration 

That’s a big “If”. Who is the “we” who is going to do this work? Now you are 
the one being unrealistic.

Calcite is a sophisticated framework that has many high-level abstractions to 
support scenarios that are not tested in the core code base. We built those 
abstractions by being idealistic. We couldn’t possibly test them because we 
didn’t have the use case to exercise them.

How do these abstractions get fully baked into production quality? When the 
downstream projects that need them refine the features, and contribute fixes 
back.

It’s not in Calcite’s interests to make it easy for downstream projects to fork 
the code when they need to do the complex stuff. We need to use our 
abstractions (in this case, the idea that traits are pluggable) and if those 
abstractions are wrong or limiting, those downstream projects will come and fix 
them.

Julian






Question about Calcite History

2021-05-05 Thread Junwen Liu
Hi Mr. or Ms.,
I'm an engineer using Calcite as the optimizer in our project. Calcite is an
amazing framework that meets our needs. What I most want to know about is the
history of Calcite, since we also want to create an open-source project. So I
would like to know the following about Calcite:
1. When did you start this project?
2. What is the difference between Optiq and Calcite?
3. Where is Calcite headed, and what features will it support?