Re: [Discuss] Make flattening on Struct/Row optional

Julian Hyde Wed, 05 Sep 2018 11:59:58 -0700

It might not be minor, but it’s worth a try. At optimization time we treat all 
fields as fields, regardless of whether they have complex types (maps, arrays, 
multisets, records) so there should not be too many problems. The flattening 
was mainly for the benefit of the runtime.



> On Sep 5, 2018, at 11:32 AM, Rui Wang <ruw...@google.com.INVALID> wrote:
> 
> Thanks for your helpful response! It seems like disabling the flattening
> will at least affect some rules in optimization. It might not be a minor
> change.
> 
> 
> -Rui
> 
> On Wed, Sep 5, 2018 at 4:54 AM Stamatis Zampetakis <zabe...@gmail.com>
> wrote:
> 
>> Hi Rui,
>> 
>> Disabling flattening in some cases seems reasonable.
>> 
>> If I am not mistaken, even the existing code does not use it all the time,
>> so it makes sense to make it configurable.
>> For example, Calcite prepared statements (CalcitePrepareImpl) use the
>> flattener only for DDL operations that create materialized views (and this
>> is because that code at some point passes through PlannerImpl).
>> On the other hand, any query that uses the Planner will also pass through
>> the flattener.
>> 
>> Disabling the flattener does not mean that all rules will work without
>> problems. The Javadoc of the RelStructuredTypeFlattener at some point says
>> "This approach has the benefit that real optimizer and codegen rules never
>> have to deal with structured types.". Due to this, it is very likely that
>> some rules were written on the assumption that there are no structured
>> types.
>> 
>> Best,
>> Stamatis
>> 
>> 
>> On Wed, Sep 5, 2018 at 9:48 AM, Julian Hyde <jh...@apache.org>
>> wrote:
>> 
>>> Flattening was introduced mainly because the original engine used flat
>>> column-oriented storage. Now we have several ways of executing queries,
>>> including generating Java code.
>>> 
>>> Adding a mode to disable flattening might make sense.
>>> 
>>> On Tue, Sep 4, 2018 at 12:52 PM Rui Wang <ruw...@google.com.invalid>
>>> wrote:
>>>> 
>>>> Hi Community,
>>>> 
>>>> While trying to support the Row type in Apache Beam SQL on top of Calcite, I
>>>> realized that the Row-flattening logic loses the structure information of a
>>>> Row after projections. There is a use case where users want to mix the Beam
>>>> programming model with Beam SQL to process a dataset. The following is an
>>>> example of the use case:
>>>> 
>>>> dataset.apply(something user defined)
>>>>            .apply(SELECT ...)
>>>>            .apply(something user defined)
>>>> 
>>>> As you can see, after the SQL statement is applied, the data structure
>>>> should be preserved for further processing.
>>>> 
>>>> The most straightforward way, to me, is to make Struct flattening optional so
>>>> that I could choose to disable it and preserve the Row structure. Can I ask
>>>> whether it is feasible to make this happen? What could happen if Calcite just
>>>> doesn't flatten Structs in the flattener? (I tried to disable it but got
>>>> exceptions in the optimizer. I wasn't sure whether that was some minor thing
>>>> to fix or whether Struct flattening was a design choice, so that the impact
>>>> of the change would be huge.)
>>>> 
>>>> Additionally, if there is a way to keep information that I can use to
>>>> reconstruct the Row after projections, that might be OK as well. Does this
>>>> idea exist in Calcite? If it does not, how does it compare with disabling
>>>> Struct flattening?
>>>> 
>>>> Thanks,
>>>> Rui
>>> 
>>
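
For readers following the thread, here is a minimal sketch of the two ideas discussed above: flattening a nested Row/Struct into top-level fields, and rebuilding the nesting afterwards from information kept in the flattened field names. It is only an illustration under stated assumptions. It is not the code of Calcite's RelStructuredTypeFlattener and uses no Calcite or Beam API. Nested rows are modeled as plain java.util.Map values, and the class name FlattenSketch, its methods, and the "$" separator are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class FlattenSketch {
    // Hypothetical separator for flattened field names; an assumption made
    // for this sketch, not Calcite's actual naming scheme.
    static final String SEP = "$";

    /** Flattens nested maps (standing in for Row/Struct values) into one level. */
    public static Map<String, Object> flatten(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        flattenInto("", row, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void flattenInto(String prefix, Map<String, Object> row,
                                    Map<String, Object> out) {
        for (Map.Entry<String, Object> e : row.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + SEP + e.getKey();
            if (e.getValue() instanceof Map) {
                // A nested struct: recurse, so only leaf fields reach the output.
                flattenInto(name, (Map<String, Object>) e.getValue(), out);
            } else {
                out.put(name, e.getValue());
            }
        }
    }

    /**
     * Rebuilds the nested shape from the paths encoded in the flattened names.
     * Assumes original field names never contain SEP.
     */
    @SuppressWarnings("unchecked")
    public static Map<String, Object> unflatten(Map<String, Object> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : flat.entrySet()) {
            String[] path = e.getKey().split(Pattern.quote(SEP));
            Map<String, Object> cur = root;
            // Walk (and create as needed) the intermediate structs on the path.
            for (int i = 0; i < path.length - 1; i++) {
                cur = (Map<String, Object>) cur.computeIfAbsent(
                        path[i], k -> new LinkedHashMap<String, Object>());
            }
            cur.put(path[path.length - 1], e.getValue());
        }
        return root;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Athens");
        address.put("zip", "10431");
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("address", address);

        Map<String, Object> flat = flatten(row);
        System.out.println(flat);            // one level, no nested struct
        System.out.println(unflatten(flat)); // nesting restored from the names
    }
}
```

The sketch suggests why the two options in the original question differ. Reconstruction works only as long as every step between flattening and the final output preserves the path information (here, the field names), whereas a projection that renames or drops fields would lose it. Disabling flattening avoids that dependency, but it requires rules that can handle structured types.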
