Re: Nested data handling in Calcite

2020-06-19 Thread Slim Bouguerra
Thanks for all the help, everyone. At this point I am wondering whether I
should just write a UDF that does this or add the flag to Calcite. I might
try the UDF, since I am not a huge fan of config flags.

On Fri, Jun 19, 2020 at 11:41 AM Rui Wang  wrote:

> If it goes back to make struct flattening configurable (thus can be turned
> off), you could check this thread for some context:
>
> https://lists.apache.org/thread.html/6bc9fd2e4c8d09e71740b0544df0982acf0d64321b9d5dc68db6acf1%40%3Cdev.calcite.apache.org%3E
>
>
> -Rui
>
> On Fri, Jun 19, 2020 at 1:21 AM Danny Chan  wrote:
>
> > I tried your patch and I think you are right, there is some missing
> > feature for RelStructuredTypeFlattener,
> > When the field you want to project is a struct but not at top level, an
> > error throws.
> >
> > I tried to close the flattening and it works well, originally
> > the RelStructuredTypeFlattener was designed to extract nested fields so
> > that the work like de-correlation is much easier, but it also makes the
> > plan changed and hard to maintain, it is not updated frequently, if
> > possible, close the flattening.
> >
> > Best,
> > Danny Chan
> > On Jun 19, 2020 at 4:39 AM (+0800), Slim Bouguerra wrote:
> > > @Danny it is attached to the case CALCITE-4065
> > >
> > >
> >
> https://jira.apache.org/jira/secure/attachment/13005815/13005815_test_cases_CALCITE-4065.patch
> > > Thanks
> > >
> > > On Thu, Jun 18, 2020 at 12:29 AM Danny Chan 
> > wrote:
> > >
> > > > What diff, I didn’t see that ~
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Jun 16, 2020 at 11:52 PM (+0800), Slim Bouguerra wrote:
> > > > > Hi Danny I have run some test yesterday with
> > > > RelToSqlConverterStructsTest I have attached the diff, let me know
> > what you
> > > > think
> > > > >
> > > > >
> > > > > > On Tue, Jun 16, 2020 at 2:22 AM Danny Chan  >
> > > > wrote:
> > > > > > > Take
> > SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn
> > > > for an example, you should make every record type with
> > > > StructKind.PEEK_FIELDS, so that nested record type can be also
> accessed
> > > > with DOT.
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> > > > > > > > Hi Danny,
> > > > > > > > Thanks for the suggestion, but that did not solve the
> problem,
> > > > still
> > > > > > > > getting the same exception, Not sure If I am missing
> something
> > ?
> > > > Do you
> > > > > > > > have an example of this usage ?
> > > > > > > > Again the goal here is to select a Row for a Row as an
> example
> > > > this is the
> > > > > > > > column type sketch
> > > > > > > > outerRow(address_kind, address_inner_row(ZipCode,
> > StreetNum,))
> > > > > > > > SELECT outerRow.address_inner_row FROM table.
> > > > > > > >
> > > > > > > > FYI select outerRow.address_kind works because it is a scalar
> > and
> > > > after
> > > > > > > > adding your suggestion I see that select address_kind from
> > table.
> > > > > > > >
> > > > > > > > On Mon, Jun 15, 2020 at 7:21 PM Danny Chan <
> > yuzhao@gmail.com>
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hi, when you create a structure type, you should choose
> > > > > > > > > StructKind.PEEK_FIELDS instead, which let you to access the
> > > > nested fields
> > > > > > > > > with DOT, i.e. “a.b.c”.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > > > > > > > > I am using this thread since the question seems related.
> > > > > > > > > > As of now I can not say a way to project a nested record
> > (FYI
> > > > scalar
> > > > > > > > > > works). https://jira.apache.org/jira/browse/CALCITE-4065
> > > > > > > > > > @Igor any idea if this can be done without major work on
> > the
> > > > > > > > > > SqlRelToRelConverter ?
> > > > > > > > > > Also I am thinking about turning off the flatten stage
> but
> > not
> > > > sure this
> > > > > > > > > is
> > > > > > > > > > going to happen (seems like a pandora box kind of flag
> > where
> > > > you do not
> > > > > > > > > > know what to expect)
> > > > > > > > > >
> > > > > > > > > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko <
> > > > ihor.huzenko@gmail.com
> > > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello Naveen,
> > > > > > > > > > >
> > > > > > > > > > > 1. If I understand correctly, then yes you can extract
> > > > nested fields
> > > > > > > > > from
> > > > > > > > > > > struct type. The syntax depends on StructKind value for
> > your
> > > > data type,
> > > > > > > > > > > for example for FULLY_QUALIFIED struct you should first
> > > > > > > > > > > make alias for your table and then request nested field
> > like,
> > > > > > > > > > > table_alias.struct_column.nested_field. In rel tree
> such
> > > > expressions
> > > > > > > > > are
> > > > > > > > > > > presented as RexCall with SqlItemOperator operator.
> > > > > > > > > > > 2. Yes, this ability was implemente

Re: Nested data handling in Calcite

2020-06-19 Thread Rui Wang
If this comes back to making struct flattening configurable (so that it can
be turned off), you could check this thread for some context:
https://lists.apache.org/thread.html/6bc9fd2e4c8d09e71740b0544df0982acf0d64321b9d5dc68db6acf1%40%3Cdev.calcite.apache.org%3E


-Rui

On Fri, Jun 19, 2020 at 1:21 AM Danny Chan  wrote:

> I tried your patch and I think you are right, there is some missing
> feature for RelStructuredTypeFlattener,
> When the field you want to project is a struct but not at top level, an
> error throws.
>
> I tried to close the flattening and it works well, originally
> the RelStructuredTypeFlattener was designed to extract nested fields so
> that the work like de-correlation is much easier, but it also makes the
> plan changed and hard to maintain, it is not updated frequently, if
> possible, close the flattening.
>
> Best,
> Danny Chan
> On Jun 19, 2020 at 4:39 AM (+0800), Slim Bouguerra wrote:
> > @Danny it is attached to the case CALCITE-4065
> >
> >
> https://jira.apache.org/jira/secure/attachment/13005815/13005815_test_cases_CALCITE-4065.patch
> > Thanks
> >
> > On Thu, Jun 18, 2020 at 12:29 AM Danny Chan 
> wrote:
> >
> > > What diff, I didn’t see that ~
> > >
> > > Best,
> > > Danny Chan
> > > On Jun 16, 2020 at 11:52 PM (+0800), Slim Bouguerra wrote:
> > > > Hi Danny I have run some test yesterday with
> > > RelToSqlConverterStructsTest I have attached the diff, let me know
> what you
> > > think
> > > >
> > > >
> > > > > On Tue, Jun 16, 2020 at 2:22 AM Danny Chan 
> > > wrote:
> > > > > > Take
> SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn
> > > for an example, you should make every record type with
> > > StructKind.PEEK_FIELDS, so that nested record type can be also accessed
> > > with DOT.
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> > > > > > > Hi Danny,
> > > > > > > Thanks for the suggestion, but that did not solve the problem,
> > > still
> > > > > > > getting the same exception, Not sure If I am missing something
> ?
> > > Do you
> > > > > > > have an example of this usage ?
> > > > > > > Again the goal here is to select a Row for a Row as an example
> > > this is the
> > > > > > > column type sketch
> > > > > > > outerRow(address_kind, address_inner_row(ZipCode,
> StreetNum,))
> > > > > > > SELECT outerRow.address_inner_row FROM table.
> > > > > > >
> > > > > > > FYI select outerRow.address_kind works because it is a scalar
> and
> > > after
> > > > > > > adding your suggestion I see that select address_kind from
> table.
> > > > > > >
> > > > > > > On Mon, Jun 15, 2020 at 7:21 PM Danny Chan <
> yuzhao@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hi, when you create a structure type, you should choose
> > > > > > > > StructKind.PEEK_FIELDS instead, which let you to access the
> > > nested fields
> > > > > > > > with DOT, i.e. “a.b.c”.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > > > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > > > > > > > I am using this thread since the question seems related.
> > > > > > > > > As of now I can not say a way to project a nested record
> (FYI
> > > scalar
> > > > > > > > > works). https://jira.apache.org/jira/browse/CALCITE-4065
> > > > > > > > > @Igor any idea if this can be done without major work on
> the
> > > > > > > > > SqlRelToRelConverter ?
> > > > > > > > > Also I am thinking about turning off the flatten stage but
> not
> > > sure this
> > > > > > > > is
> > > > > > > > > going to happen (seems like a pandora box kind of flag
> where
> > > you do not
> > > > > > > > > know what to expect)
> > > > > > > > >
> > > > > > > > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko <
> > > ihor.huzenko@gmail.com
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hello Naveen,
> > > > > > > > > >
> > > > > > > > > > 1. If I understand correctly, then yes you can extract
> > > nested fields
> > > > > > > > from
> > > > > > > > > > struct type. The syntax depends on StructKind value for
> your
> > > data type,
> > > > > > > > > > for example for FULLY_QUALIFIED struct you should first
> > > > > > > > > > make alias for your table and then request nested field
> like,
> > > > > > > > > > table_alias.struct_column.nested_field. In rel tree such
> > > expressions
> > > > > > > > are
> > > > > > > > > > presented as RexCall with SqlItemOperator operator.
> > > > > > > > > > 2. Yes, this ability was implemented in CALCITE-3138
> [1]. It
> > > builds
> > > > > > > > call to
> > > > > > > > > > ROW type constructor function on top of flattened tree
> for
> > > necessary
> > > > > > > > > > columns.
> > > > > > > > > > 3. Yes, examples of such functions are ROW(...),
> > > ANY_VALUE(...) etc.
> > > > > > > > > >
> > > > > > > > > > In current implementation of flattener invocation of ROW
> > > constructor
> > > > > > > > > > function is done despite of null handling same issue
> exists
> > > for some
> > > > > >

Re: Nested data handling in Calcite

2020-06-19 Thread Danny Chan
I tried your patch and I think you are right: RelStructuredTypeFlattener is
missing a feature. When the field you want to project is a struct that is not
at the top level, an error is thrown.

I tried turning off the flattening and it works well. Originally,
RelStructuredTypeFlattener was designed to extract nested fields so that work
such as decorrelation is much easier, but it also changes the plan and is hard
to maintain, and it is not updated frequently. If possible, turn off the
flattening.
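
To make the flattening issue concrete, here is a minimal, self-contained Java
sketch of the idea. It is a toy model only: it does not use or reproduce
Calcite's RelStructuredTypeFlattener, and the class FlattenSketch and all of
its helpers are hypothetical names introduced for illustration. It flattens the
column type sketched in this thread (outerRow with a scalar address_kind and a
nested address_inner_row) into leaf columns, and shows that projecting a scalar
reads one flattened column, while projecting a nested struct reads a range of
columns that would then have to be reassembled into a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of struct flattening; this is NOT Calcite's RelStructuredTypeFlattener. */
class FlattenSketch {

  /** A field is a scalar leaf (no children) or a struct with named children. */
  static final class Field {
    final String name;
    final List<Field> children;

    Field(String name, Field... children) {
      this.name = name;
      this.children = Arrays.asList(children);
    }

    boolean isStruct() {
      return !children.isEmpty();
    }
  }

  /** Depth-first expansion of a field into the dotted paths of its leaf columns. */
  static List<String> flatten(Field field) {
    List<String> leaves = new ArrayList<>();
    collect(field, "", leaves);
    return leaves;
  }

  private static void collect(Field field, String prefix, List<String> out) {
    String path = prefix.isEmpty() ? field.name : prefix + "." + field.name;
    if (!field.isStruct()) {
      out.add(path);
      return;
    }
    for (Field child : field.children) {
      collect(child, path, out);
    }
  }

  /** Flattened leaf columns that a projection of the given dotted path has to read. */
  static List<String> leavesUnder(Field root, String dottedPath) {
    List<String> result = new ArrayList<>();
    for (String leaf : flatten(root)) {
      if (leaf.equals(dottedPath) || leaf.startsWith(dottedPath + ".")) {
        result.add(leaf);
      }
    }
    return result;
  }

  /** Hypothetical reconstruction of the column type sketched in this thread. */
  static Field exampleRow() {
    return new Field("outerRow",
        new Field("address_kind"),
        new Field("address_inner_row", new Field("ZipCode"), new Field("StreetNum")));
  }

  public static void main(String[] args) {
    Field row = exampleRow();
    // All leaf columns after flattening.
    System.out.println(flatten(row));
    // A scalar projection maps to exactly one flattened column.
    System.out.println(leavesUnder(row, "outerRow.address_kind"));
    // A nested-struct projection maps to several flattened columns.
    System.out.println(leavesUnder(row, "outerRow.address_inner_row"));
  }
}
```

In this model the scalar case is trivial, while the struct case needs an extra
step to rebuild a row from several leaves, which is the step that appears to be
missing for non-top-level structs.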

Best,
Danny Chan
On Jun 19, 2020 at 4:39 AM (+0800), Slim Bouguerra wrote:
> @Danny it is attached to the case CALCITE-4065
>
> https://jira.apache.org/jira/secure/attachment/13005815/13005815_test_cases_CALCITE-4065.patch
> Thanks
>
> On Thu, Jun 18, 2020 at 12:29 AM Danny Chan  wrote:
>
> > What diff, I didn’t see that ~
> >
> > Best,
> > Danny Chan
> > On Jun 16, 2020 at 11:52 PM (+0800), Slim Bouguerra wrote:
> > > Hi Danny I have run some test yesterday with
> > RelToSqlConverterStructsTest I have attached the diff, let me know what you
> > think
> > >
> > >
> > > > On Tue, Jun 16, 2020 at 2:22 AM Danny Chan 
> > wrote:
> > > > > Take SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn
> > for an example, you should make every record type with
> > StructKind.PEEK_FIELDS, so that nested record type can be also accessed
> > with DOT.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> > > > > > Hi Danny,
> > > > > > Thanks for the suggestion, but that did not solve the problem,
> > still
> > > > > > getting the same exception, Not sure If I am missing something ?
> > Do you
> > > > > > have an example of this usage ?
> > > > > > Again the goal here is to select a Row for a Row as an example
> > this is the
> > > > > > column type sketch
> > > > > > outerRow(address_kind, address_inner_row(ZipCode, StreetNum,))
> > > > > > SELECT outerRow.address_inner_row FROM table.
> > > > > >
> > > > > > FYI select outerRow.address_kind works because it is a scalar and
> > after
> > > > > > adding your suggestion I see that select address_kind from table.
> > > > > >
> > > > > > On Mon, Jun 15, 2020 at 7:21 PM Danny Chan 
> > wrote:
> > > > > >
> > > > > > > Hi, when you create a structure type, you should choose
> > > > > > > StructKind.PEEK_FIELDS instead, which let you to access the
> > nested fields
> > > > > > > with DOT, i.e. “a.b.c”.
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > > > > > > I am using this thread since the question seems related.
> > > > > > > > As of now I can not say a way to project a nested record (FYI
> > scalar
> > > > > > > > works). https://jira.apache.org/jira/browse/CALCITE-4065
> > > > > > > > @Igor any idea if this can be done without major work on the
> > > > > > > > SqlRelToRelConverter ?
> > > > > > > > Also I am thinking about turning off the flatten stage but not
> > sure this
> > > > > > > is
> > > > > > > > going to happen (seems like a pandora box kind of flag where
> > you do not
> > > > > > > > know what to expect)
> > > > > > > >
> > > > > > > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko <
> > ihor.huzenko@gmail.com
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hello Naveen,
> > > > > > > > >
> > > > > > > > > 1. If I understand correctly, then yes you can extract
> > nested fields
> > > > > > > from
> > > > > > > > > struct type. The syntax depends on StructKind value for your
> > data type,
> > > > > > > > > for example for FULLY_QUALIFIED struct you should first
> > > > > > > > > make alias for your table and then request nested field like,
> > > > > > > > > table_alias.struct_column.nested_field. In rel tree such
> > expressions
> > > > > > > are
> > > > > > > > > presented as RexCall with SqlItemOperator operator.
> > > > > > > > > 2. Yes, this ability was implemented in CALCITE-3138 [1]. It
> > builds
> > > > > > > call to
> > > > > > > > > ROW type constructor function on top of flattened tree for
> > necessary
> > > > > > > > > columns.
> > > > > > > > > 3. Yes, examples of such functions are ROW(...),
> > ANY_VALUE(...) etc.
> > > > > > > > >
> > > > > > > > > In current implementation of flattener invocation of ROW
> > constructor
> > > > > > > > > function is done despite of null handling same issue exists
> > for some
> > > > > > > > > aggregate function flattening, like COUNT(struct_column).
> > > > > > > > > Proper null handling is real pain for flattener, original
> > idea was to
> > > > > > > > > handle special null indicator for each flattened struct, but
> > in
> > > > > > > practice I
> > > > > > > > > recognized that it's really hard to deal with flattened
> > fields indices
> > > > > > > when
> > > > > > > > > related methods are called from very different points, so
> > for now the
> > > > > > > > > problem remains unsolved.
> > > > > > > > > If you can't avoid dealing with null values in your struc

Re: Nested data handling in Calcite

2020-06-18 Thread Slim Bouguerra
@Danny, it is attached to the issue CALCITE-4065:

https://jira.apache.org/jira/secure/attachment/13005815/13005815_test_cases_CALCITE-4065.patch
Thanks

On Thu, Jun 18, 2020 at 12:29 AM Danny Chan  wrote:

> What diff, I didn’t see that ~
>
> Best,
> Danny Chan
> On Jun 16, 2020 at 11:52 PM (+0800), Slim Bouguerra wrote:
> > Hi Danny I have run some test yesterday with
> RelToSqlConverterStructsTest I have attached the diff, let me know what you
> think
> >
> >
> > > On Tue, Jun 16, 2020 at 2:22 AM Danny Chan 
> wrote:
> > > > Take SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn
> for an example, you should make every record type with
> StructKind.PEEK_FIELDS, so that nested record type can be also accessed
> with DOT.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> > > > > Hi Danny,
> > > > > Thanks for the suggestion, but that did not solve the problem,
> still
> > > > > getting the same exception, Not sure If I am missing something ?
> Do you
> > > > > have an example of this usage ?
> > > > > Again the goal here is to select a Row for a Row as an example
> this is the
> > > > > column type sketch
> > > > > outerRow(address_kind, address_inner_row(ZipCode, StreetNum,))
> > > > > SELECT outerRow.address_inner_row FROM table.
> > > > >
> > > > > FYI select outerRow.address_kind works because it is a scalar and
> after
> > > > > adding your suggestion I see that select address_kind from table.
> > > > >
> > > > > On Mon, Jun 15, 2020 at 7:21 PM Danny Chan 
> wrote:
> > > > >
> > > > > > Hi, when you create a structure type, you should choose
> > > > > > StructKind.PEEK_FIELDS instead, which let you to access the
> nested fields
> > > > > > with DOT, i.e. “a.b.c”.
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > > > > > I am using this thread since the question seems related.
> > > > > > > As of now I can not say a way to project a nested record (FYI
> scalar
> > > > > > > works). https://jira.apache.org/jira/browse/CALCITE-4065
> > > > > > > @Igor any idea if this can be done without major work on the
> > > > > > > SqlRelToRelConverter ?
> > > > > > > Also I am thinking about turning off the flatten stage but not
> sure this
> > > > > > is
> > > > > > > going to happen (seems like a pandora box kind of flag where
> you do not
> > > > > > > know what to expect)
> > > > > > >
> > > > > > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko <
> ihor.huzenko@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hello Naveen,
> > > > > > > >
> > > > > > > > 1. If I understand correctly, then yes you can extract
> nested fields
> > > > > > from
> > > > > > > > struct type. The syntax depends on StructKind value for your
> data type,
> > > > > > > > for example for FULLY_QUALIFIED struct you should first
> > > > > > > > make alias for your table and then request nested field like,
> > > > > > > > table_alias.struct_column.nested_field. In rel tree such
> expressions
> > > > > > are
> > > > > > > > presented as RexCall with SqlItemOperator operator.
> > > > > > > > 2. Yes, this ability was implemented in CALCITE-3138 [1]. It
> builds
> > > > > > call to
> > > > > > > > ROW type constructor function on top of flattened tree for
> necessary
> > > > > > > > columns.
> > > > > > > > 3. Yes, examples of such functions are ROW(...),
> ANY_VALUE(...) etc.
> > > > > > > >
> > > > > > > > In current implementation of flattener invocation of ROW
> constructor
> > > > > > > > function is done despite of null handling same issue exists
> for some
> > > > > > > > aggregate function flattening, like COUNT(struct_column).
> > > > > > > > Proper null handling is real pain for flattener, original
> idea was to
> > > > > > > > handle special null indicator for each flattened struct, but
> in
> > > > > > practice I
> > > > > > > > recognized that it's really hard to deal with flattened
> fields indices
> > > > > > when
> > > > > > > > related methods are called from very different points, so
> for now the
> > > > > > > > problem remains unsolved.
> > > > > > > > If you can't avoid dealing with null values in your struct
> columns you
> > > > > > > > could try to avoid invocation to
> SqlToRelConverter.flattenTypes(...)
> > > > > > and
> > > > > > > > check whether final plan acceptable for you. As far as I know
> > > > > > > > there is no reading material for given topic, you can
> investigate
> > > > > > source
> > > > > > > > code by debugging RelStructuredTypeFlattener and reading
> some related
> > > > > > plans
> > > > > > > > in SqlToRelConverterTest.java and SqlToRelConverterTest.xml.
> > > > > > > >
> > > > > > > > [1] https://issues.apache.org/jira/browse/CALCITE-3138
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Igor
> > > > > > > >
> > > > > > > > On Thu, Oct 24, 2019 at 12:57 PM Naveen Kumar
> > > > > > > >  wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
>

Re: Nested data handling in Calcite

2020-06-18 Thread Danny Chan
What diff? I didn't see it.

Best,
Danny Chan
On Jun 16, 2020 at 11:52 PM (+0800), Slim Bouguerra wrote:
> Hi Danny I have run some test yesterday with RelToSqlConverterStructsTest I 
> have attached the diff, let me know what you think
>
>
> > On Tue, Jun 16, 2020 at 2:22 AM Danny Chan  wrote:
> > > Take SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn for 
> > > an example, you should make every record type with 
> > > StructKind.PEEK_FIELDS, so that nested record type can be also accessed 
> > > with DOT.
> > >
> > > Best,
> > > Danny Chan
> > > On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> > > > Hi Danny,
> > > > Thanks for the suggestion, but that did not solve the problem, still
> > > > getting the same exception, Not sure If I am missing something ? Do you
> > > > have an example of this usage ?
> > > > Again the goal here is to select a Row for a Row as an example this is 
> > > > the
> > > > column type sketch
> > > > outerRow(address_kind, address_inner_row(ZipCode, StreetNum,))
> > > > SELECT outerRow.address_inner_row FROM table.
> > > >
> > > > FYI select outerRow.address_kind works because it is a scalar and after
> > > > adding your suggestion I see that select address_kind from table.
> > > >
> > > > On Mon, Jun 15, 2020 at 7:21 PM Danny Chan  wrote:
> > > >
> > > > > Hi, when you create a structure type, you should choose
> > > > > StructKind.PEEK_FIELDS instead, which let you to access the nested 
> > > > > fields
> > > > > with DOT, i.e. “a.b.c”.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > > > > I am using this thread since the question seems related.
> > > > > > As of now I can not say a way to project a nested record (FYI scalar
> > > > > > works). https://jira.apache.org/jira/browse/CALCITE-4065
> > > > > > @Igor any idea if this can be done without major work on the
> > > > > > SqlRelToRelConverter ?
> > > > > > Also I am thinking about turning off the flatten stage but not sure 
> > > > > > this
> > > > > is
> > > > > > going to happen (seems like a pandora box kind of flag where you do 
> > > > > > not
> > > > > > know what to expect)
> > > > > >
> > > > > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko wrote:
> > > > > >
> > > > > > > Hello Naveen,
> > > > > > >
> > > > > > > 1. If I understand correctly, then yes you can extract nested 
> > > > > > > fields
> > > > > from
> > > > > > > struct type. The syntax depends on StructKind value for your data 
> > > > > > > type,
> > > > > > > for example for FULLY_QUALIFIED struct you should first
> > > > > > > make alias for your table and then request nested field like,
> > > > > > > table_alias.struct_column.nested_field. In rel tree such 
> > > > > > > expressions
> > > > > are
> > > > > > > presented as RexCall with SqlItemOperator operator.
> > > > > > > 2. Yes, this ability was implemented in CALCITE-3138 [1]. It 
> > > > > > > builds
> > > > > call to
> > > > > > > ROW type constructor function on top of flattened tree for 
> > > > > > > necessary
> > > > > > > columns.
> > > > > > > 3. Yes, examples of such functions are ROW(...), ANY_VALUE(...) 
> > > > > > > etc.
> > > > > > >
> > > > > > > In current implementation of flattener invocation of ROW 
> > > > > > > constructor
> > > > > > > function is done despite of null handling same issue exists for 
> > > > > > > some
> > > > > > > aggregate function flattening, like COUNT(struct_column).
> > > > > > > Proper null handling is real pain for flattener, original idea 
> > > > > > > was to
> > > > > > > handle special null indicator for each flattened struct, but in
> > > > > practice I
> > > > > > > recognized that it's really hard to deal with flattened fields 
> > > > > > > indices
> > > > > when
> > > > > > > related methods are called from very different points, so for now 
> > > > > > > the
> > > > > > > problem remains unsolved.
> > > > > > > If you can't avoid dealing with null values in your struct 
> > > > > > > columns you
> > > > > > > could try to avoid invocation to 
> > > > > > > SqlToRelConverter.flattenTypes(...)
> > > > > and
> > > > > > > check whether final plan acceptable for you. As far as I know
> > > > > > > there is no reading material for given topic, you can investigate
> > > > > source
> > > > > > > code by debugging RelStructuredTypeFlattener and reading some 
> > > > > > > related
> > > > > plans
> > > > > > > in SqlToRelConverterTest.java and SqlToRelConverterTest.xml.
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/CALCITE-3138
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Igor
> > > > > > >
> > > > > > > On Thu, Oct 24, 2019 at 12:57 PM Naveen Kumar
> > > > > > >  wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I work at Flipkart, we are using Calcite in our streaming 
> > > > > > > > platform.
> > > > > In
> > > > > > > most
> > > > > > > > of our use cases, input data is nested. I unde

Re: Nested data handling in Calcite

2020-06-16 Thread Slim Bouguerra
Hi Danny, I ran some tests yesterday with RelToSqlConverterStructsTest and
have attached the diff. Let me know what you think.


On Tue, Jun 16, 2020 at 2:22 AM Danny Chan  wrote:

> Take SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn for an
> example, you should make every record type with StructKind.PEEK_FIELDS, so
> that nested record type can be also accessed with DOT.
>
> Best,
> Danny Chan
> On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> > Hi Danny,
> > Thanks for the suggestion, but that did not solve the problem, still
> > getting the same exception, Not sure If I am missing something ? Do you
> > have an example of this usage ?
> > Again the goal here is to select a Row for a Row as an example this is
> the
> > column type sketch
> > outerRow(address_kind, address_inner_row(ZipCode, StreetNum,))
> > SELECT outerRow.address_inner_row FROM table.
> >
> > FYI select outerRow.address_kind works because it is a scalar and after
> > adding your suggestion I see that select address_kind from table.
> >
> > On Mon, Jun 15, 2020 at 7:21 PM Danny Chan  wrote:
> >
> > > Hi, when you create a structure type, you should choose
> > > StructKind.PEEK_FIELDS instead, which let you to access the nested
> fields
> > > with DOT, i.e. “a.b.c”.
> > >
> > > Best,
> > > Danny Chan
> > > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > > I am using this thread since the question seems related.
> > > > As of now I can not say a way to project a nested record (FYI scalar
> > > > works). https://jira.apache.org/jira/browse/CALCITE-4065
> > > > @Igor any idea if this can be done without major work on the
> > > > SqlRelToRelConverter ?
> > > > Also I am thinking about turning off the flatten stage but not sure
> this
> > > is
> > > > going to happen (seems like a pandora box kind of flag where you do
> not
> > > > know what to expect)
> > > >
> > > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko <
> ihor.huzenko@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hello Naveen,
> > > > >
> > > > > 1. If I understand correctly, then yes you can extract nested
> fields
> > > from
> > > > > struct type. The syntax depends on StructKind value for your data
> type,
> > > > > for example for FULLY_QUALIFIED struct you should first
> > > > > make alias for your table and then request nested field like,
> > > > > table_alias.struct_column.nested_field. In rel tree such
> expressions
> > > are
> > > > > presented as RexCall with SqlItemOperator operator.
> > > > > 2. Yes, this ability was implemented in CALCITE-3138 [1]. It builds
> > > call to
> > > > > ROW type constructor function on top of flattened tree for
> necessary
> > > > > columns.
> > > > > 3. Yes, examples of such functions are ROW(...), ANY_VALUE(...)
> etc.
> > > > >
> > > > > In current implementation of flattener invocation of ROW
> constructor
> > > > > function is done despite of null handling same issue exists for
> some
> > > > > aggregate function flattening, like COUNT(struct_column).
> > > > > Proper null handling is real pain for flattener, original idea was
> to
> > > > > handle special null indicator for each flattened struct, but in
> > > practice I
> > > > > recognized that it's really hard to deal with flattened fields
> indices
> > > when
> > > > > related methods are called from very different points, so for now
> the
> > > > > problem remains unsolved.
> > > > > If you can't avoid dealing with null values in your struct columns
> you
> > > > > could try to avoid invocation to
> SqlToRelConverter.flattenTypes(...)
> > > and
> > > > > check whether final plan acceptable for you. As far as I know
> > > > > there is no reading material for given topic, you can investigate
> > > source
> > > > > code by debugging RelStructuredTypeFlattener and reading some
> related
> > > plans
> > > > > in SqlToRelConverterTest.java and SqlToRelConverterTest.xml.
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/CALCITE-3138
> > > > >
> > > > > Thanks,
> > > > > Igor
> > > > >
> > > > > On Thu, Oct 24, 2019 at 12:57 PM Naveen Kumar
> > > > >  wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I work at Flipkart, where we are using Calcite in our streaming
> > > > > > platform. In most of our use cases, the input data is nested. I
> > > > > > understand that Calcite flattens structs in the scan and references
> > > > > > fields positionally.
> > > > > >
> > > > > > I had a few questions on handling nested data:
> > > > > >
> > > > > > 1. Can the RelNode DAG work with nested data (instead of flattened
> > > > > > fields) by referencing fields through their nested structure, e.g.
> > > > > > data.order.orderId?
> > > > > > 2. In the current flattened behavior, can the output of a query be a
> > > > > > struct? E.g., if *orderId, orderData.timestamp, orderData.category*
> > > > > > are the output of a select query, can I declaratively organise the
> > > > > > output into the JSON structure below?
> > > > > > 1.
> > > > > >
> > > > > >
> > > > > >
>

Re: Nested data handling in Calcite

2020-06-16 Thread Danny Chan
Take SqlToRelConverterTest#testAliasUnnestArrayPlanWithSingleColumn as an
example: you should create every record type with StructKind.PEEK_FIELDS, so
that nested record types can also be accessed with DOT.
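
As a rough illustration of the difference this makes to name resolution, here
is a small, self-contained Java sketch. It is a toy model, not Calcite's
validator: the class PeekFieldsSketch, its Kind enum (a simplified stand-in for
two of the StructKind values), and the resolve helper are all hypothetical
names. It assumes, as the StructKind documentation appears to describe, that
fields of a PEEK_FIELDS struct column can be referenced without qualifying them
by the column name, while a FULLY_QUALIFIED struct needs the full dotted path.
Table aliases are not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of dotted-name resolution; this is NOT Calcite's validator. */
class PeekFieldsSketch {

  /** Simplified stand-in for two of Calcite's StructKind values. */
  enum Kind { FULLY_QUALIFIED, PEEK_FIELDS }

  /** A column: scalars have no fields and a null kind; structs have both. */
  static final class Column {
    final String name;
    final Kind kind;
    final Map<String, Column> fields = new LinkedHashMap<>();

    Column(String name, Kind kind, Column... children) {
      this.name = name;
      this.kind = kind;
      for (Column child : children) {
        fields.put(child.name, child);
      }
    }
  }

  static Column scalar(String name) {
    return new Column(name, null);
  }

  /** Hypothetical table with the column type sketched in this thread. */
  static Map<String, Column> exampleTable(Kind kind) {
    Column inner = new Column("address_inner_row", kind,
        scalar("ZipCode"), scalar("StreetNum"));
    Column outer = new Column("outerRow", kind, scalar("address_kind"), inner);
    Map<String, Column> table = new LinkedHashMap<>();
    table.put(outer.name, outer);
    return table;
  }

  /** Returns the fully qualified path for an identifier, or null if it does not resolve. */
  static String resolve(Map<String, Column> table, String identifier) {
    String[] parts = identifier.split("\\.");
    Column first = table.get(parts[0]);
    if (first != null) {
      // The identifier starts with a top-level column name.
      return reaches(first, parts) ? identifier : null;
    }
    // Otherwise, peek into top-level struct columns that allow it.
    for (Column column : table.values()) {
      Column peeked = column.kind == Kind.PEEK_FIELDS ? column.fields.get(parts[0]) : null;
      if (peeked != null) {
        return reaches(peeked, parts) ? column.name + "." + identifier : null;
      }
    }
    return null;
  }

  /** Walks parts[1..] down from start; true if every step names an existing field. */
  private static boolean reaches(Column start, String[] parts) {
    Column current = start;
    for (int i = 1; i < parts.length; i++) {
      current = current.fields.get(parts[i]);
      if (current == null) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, Column> peek = exampleTable(Kind.PEEK_FIELDS);
    Map<String, Column> strict = exampleTable(Kind.FULLY_QUALIFIED);
    System.out.println(resolve(peek, "address_kind"));
    System.out.println(resolve(strict, "address_kind"));
    System.out.println(resolve(strict, "outerRow.address_kind"));
  }
}
```

Note that this only models how a name is looked up. Whether the resolved
expression can then be projected when it is itself a struct is a separate
question, handled later in the flattening stage.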

Best,
Danny Chan
On Jun 16, 2020 at 12:50 PM (+0800), Slim Bouguerra wrote:
> Hi Danny,
> Thanks for the suggestion, but that did not solve the problem; I am still
> getting the same exception. Am I missing something? Do you have an
> example of this usage?
> Again, the goal here is to select a Row from a Row. As an example, this is
> a sketch of the column type:
> outerRow(address_kind, address_inner_row(ZipCode, StreetNum,))
> and the query is:
> SELECT outerRow.address_inner_row FROM table
>
> FYI, SELECT outerRow.address_kind works because it is a scalar, and after
> adding your suggestion I see that SELECT address_kind FROM table works too.
>
> On Mon, Jun 15, 2020 at 7:21 PM Danny Chan  wrote:
>
> > Hi, when you create a structure type, you should choose
> > StructKind.PEEK_FIELDS instead, which lets you access the nested fields
> > with DOT, e.g. "a.b.c".
> >
> > Best,
> > Danny Chan
> > On Jun 16, 2020 at 4:21 AM (+0800), Slim Bouguerra wrote:
> > > I am using this thread since the question seems related.
> > > As of now I cannot see a way to project a nested record (FYI, scalars
> > > work). https://jira.apache.org/jira/browse/CALCITE-4065
> > > @Igor, any idea if this can be done without major work on the
> > > SqlToRelConverter?
> > > Also, I am thinking about turning off the flatten stage, but I am not
> > > sure this is going to happen (it seems like a Pandora's box kind of
> > > flag, where you do not know what to expect).
> > >
> > > On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko  > >
> > > wrote:
> > >
> > > > Hello Naveen,
> > > >
> > > > 1. If I understand correctly, then yes you can extract nested fields
> > from
> > > > struct type. The syntax depends on StructKind value for your data type,
> > > > for example for FULLY_QUALIFIED struct you should first
> > > > make alias for your table and then request nested field like,
> > > > table_alias.struct_column.nested_field. In rel tree such expressions
> > are
> > > > presented as RexCall with SqlItemOperator operator.
> > > > 2. Yes, this ability was implemented in CALCITE-3138 [1]. It builds
> > call to
> > > > ROW type constructor function on top of flattened tree for necessary
> > > > columns.
> > > > 3. Yes, examples of such functions are ROW(...), ANY_VALUE(...) etc.
> > > >
> > > > In current implementation of flattener invocation of ROW constructor
> > > > function is done despite of null handling same issue exists for some
> > > > aggregate function flattening, like COUNT(struct_column).
> > > > Proper null handling is real pain for flattener, original idea was to
> > > > handle special null indicator for each flattened struct, but in
> > practice I
> > > > recognized that it's really hard to deal with flattened fields indices
> > when
> > > > related methods are called from very different points, so for now the
> > > > problem remains unsolved.
> > > > If you can't avoid dealing with null values in your struct columns you
> > > > could try to avoid invocation to SqlToRelConverter.flattenTypes(...)
> > and
> > > > check whether final plan acceptable for you. As far as I know
> > > > there is no reading material for given topic, you can investigate
> > source
> > > > code by debugging RelStructuredTypeFlattener and reading some related
> > plans
> > > > in SqlToRelConverterTest.java and SqlToRelConverterTest.xml.
> > > >
> > > > [1] https://issues.apache.org/jira/browse/CALCITE-3138
> > > >
> > > > Thanks,
> > > > Igor
> > > >
> > > > On Thu, Oct 24, 2019 at 12:57 PM Naveen Kumar
> > > >  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I work at Flipkart, we are using Calcite in our streaming platform.
> > In
> > > > most
> > > > > of our use cases, input data is nested. I understand Calcite flattens
> > > > > structs in scan and references fields positionally.
> > > > >
> > > > > I had a few questions on handling nested data -
> > > > >
> > > > > 1. Can RelNode DAG work with nested data (instead of flattened
> > fields)
> > > > > by referencing fields through their nested structure eg,
> > > > > data.order.orderId
> > > > > 2. In the current flattened behavior, can output of a query be a
> > > > struct.
> > > > > Eg if *orderId, orderData.timestamp, orderData.category* are output
> > of
> > > > > select query, can I declaratively organise output to below json
> > > > > structure -
> > > > > 1.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > *{ "orderId": "order1", "orderData": { "timestamp": 1571904384814,
> > > > > "category": "shoes" } }*
> > > > > 3. Can output of a UDF be struct type
> > > > >
> > > > > Please point me to any reading material or example that would help
> > with
> > > > > these questions.
> > > > >
> > > > > Regards,
> > > > > Naveen
> > > > >
> > > > > --
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > *

Re: Nested data handling in Caclite

2020-06-15 Thread Slim Bouguerra
Hi Danny,
Thanks for the suggestion, but that did not solve the problem; I am still
getting the same exception. Not sure if I am missing something. Do you
have an example of this usage?
Again, the goal here is to select a Row from a Row. As an example, this is the
column type sketch:
outerRow(address_kind, address_inner_row(ZipCode, StreetNum,))
SELECT outerRow.address_inner_row FROM table.

FYI, select outerRow.address_kind works because it is a scalar, and after
adding your suggestion I see that select address_kind from table also works.
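
To make the shape concrete, the column type sketched above can be modelled
with plain Java maps. This is only an illustrative sketch under stated
assumptions: NestedRowSketch, resolve, sampleRecord and the sample values are
made up for the example, and no Calcite class is involved. It shows what the
two projections are expected to return (a scalar for address_kind, a whole
nested row for address_inner_row), not how Calcite plans them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the outerRow column from the sketch above; not Calcite code. */
final class NestedRowSketch {

  /** Follows a dotted path such as "outerRow.address_inner_row" through nested maps. */
  static Object resolve(Map<String, Object> row, String dottedPath) {
    Object current = row;
    for (String name : dottedPath.split("\\.")) {
      if (!(current instanceof Map)) {
        throw new IllegalArgumentException("'" + name + "' is applied to a scalar");
      }
      Map<?, ?> struct = (Map<?, ?>) current;
      if (!struct.containsKey(name)) {
        throw new IllegalArgumentException("no field named '" + name + "'");
      }
      current = struct.get(name);
    }
    return current;
  }

  /** Builds one sample record; all values are hypothetical. */
  static Map<String, Object> sampleRecord() {
    Map<String, Object> inner = new LinkedHashMap<>();
    inner.put("ZipCode", 94105);
    inner.put("StreetNum", 7);
    Map<String, Object> outer = new LinkedHashMap<>();
    outer.put("address_kind", "home");
    outer.put("address_inner_row", inner);
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("outerRow", outer);
    return record;
  }

  public static void main(String[] args) {
    Map<String, Object> record = sampleRecord();
    // Scalar projection: the case reported as working.
    System.out.println(resolve(record, "outerRow.address_kind"));
    // Row projection: the case that should return the nested row as a whole.
    System.out.println(resolve(record, "outerRow.address_inner_row"));
  }
}
```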

On Mon, Jun 15, 2020 at 7:21 PM Danny Chan  wrote:

> Hi, when you create a structure type, you should choose
> StructKind.PEEK_FIELDS instead, which lets you access the nested fields
> with DOT, e.g. “a.b.c”.
>
> Best,
> Danny Chan

Re: Nested data handling in Caclite

2020-06-15 Thread Danny Chan
Hi, when you create a structure type, you should choose StructKind.PEEK_FIELDS
instead, which lets you access the nested fields with DOT, e.g. “a.b.c”.

Best,
Danny Chan
On June 16, 2020 at 4:21 AM +0800, Slim Bouguerra wrote:
> I am using this thread since the question seems related.
> As of now I cannot see a way to project a nested record (FYI, scalars
> work). https://jira.apache.org/jira/browse/CALCITE-4065
> @Igor, any idea if this can be done without major work on the
> SqlToRelConverter?
> Also, I am thinking about turning off the flatten stage, but I am not sure this is
> going to happen (it seems like a Pandora's box kind of flag where you do not
> know what to expect).

Re: Nested data handling in Caclite

2020-06-15 Thread Rui Wang
>Also, I am thinking about turning off the flatten stage, but I am not sure this is
>going to happen (it seems like a Pandora's box kind of flag where you do not
>know what to expect)

Making the flattener optional by setting a flag is not going to happen (see
[1]). If there is anything that is not properly flattened and
reconstructed, I think the idea was to improve RelStructuredTypeFlattener
rather than to make it optional.


[1]: https://issues.apache.org/jira/browse/CALCITE-3582


-Rui

On Mon, Jun 15, 2020 at 1:21 PM Slim Bouguerra  wrote:

> I am using this thread since the question seems related.
> As of now I cannot see a way to project a nested record (FYI, scalars
> work). https://jira.apache.org/jira/browse/CALCITE-4065
> @Igor, any idea if this can be done without major work on the
> SqlToRelConverter?
> Also, I am thinking about turning off the flatten stage, but I am not sure this is
> going to happen (it seems like a Pandora's box kind of flag where you do not
> know what to expect).

Re: Nested data handling in Caclite

2020-06-15 Thread Slim Bouguerra
I am using this thread since the question seems related.
As of now I cannot see a way to project a nested record (FYI, scalars
work). https://jira.apache.org/jira/browse/CALCITE-4065
@Igor, any idea if this can be done without major work on the
SqlToRelConverter?
Also, I am thinking about turning off the flatten stage, but I am not sure this is
going to happen (it seems like a Pandora's box kind of flag where you do not
know what to expect).

On Thu, Oct 24, 2019 at 3:53 AM Igor Guzenko 
wrote:

> Hello Naveen,
>
> 1. If I understand correctly, then yes, you can extract nested fields from
> a struct type. The syntax depends on the StructKind value of your data type;
> for example, for a FULLY_QUALIFIED struct you should first
> define an alias for your table and then request the nested field as
> table_alias.struct_column.nested_field. In the rel tree such expressions are
> represented as a RexCall with the SqlItemOperator operator.
> 2. Yes, this ability was implemented in CALCITE-3138 [1]. It builds a call to
> the ROW type constructor function on top of the flattened tree for the
> necessary columns.
> 3. Yes, examples of such functions are ROW(...), ANY_VALUE(...), etc.
>
> In the current implementation of the flattener, the ROW constructor
> function is invoked regardless of null handling; the same issue exists for the
> flattening of some aggregate functions, like COUNT(struct_column).
> Proper null handling is a real pain for the flattener. The original idea was to
> maintain a special null indicator for each flattened struct, but in practice I
> found that it is really hard to deal with flattened field indices when
> the related methods are called from very different points, so for now the
> problem remains unsolved.
> If you can't avoid dealing with null values in your struct columns, you
> could try to avoid invoking SqlToRelConverter.flattenTypes(...) and
> check whether the final plan is acceptable for you. As far as I know,
> there is no reading material on this topic; you can investigate the source
> code by debugging RelStructuredTypeFlattener and reading some related plans
> in SqlToRelConverterTest.java and SqlToRelConverterTest.xml.
>
> [1] https://issues.apache.org/jira/browse/CALCITE-3138
>
> Thanks,
> Igor

Re: Nested data handling in Caclite

2019-10-24 Thread Igor Guzenko
Hello Naveen,

1. If I understand correctly, then yes, you can extract nested fields from
a struct type. The syntax depends on the StructKind value of your data type;
for example, for a FULLY_QUALIFIED struct you should first
define an alias for your table and then request the nested field as
table_alias.struct_column.nested_field. In the rel tree such expressions are
represented as a RexCall with the SqlItemOperator operator.
2. Yes, this ability was implemented in CALCITE-3138 [1]. It builds a call to
the ROW type constructor function on top of the flattened tree for the
necessary columns.
3. Yes, examples of such functions are ROW(...), ANY_VALUE(...), etc.
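
As a rough illustration of points 1 and 2, the sketch below flattens a nested
row into positional leaf columns and then rebuilds the nested row on top of
them, using the order example from the question. It is a toy model with plain
Java maps, not the RelStructuredTypeFlattener implementation; FlattenSketch,
flatten and rebuild are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of struct flattening and ROW reconstruction; not Calcite code. */
final class FlattenSketch {

  /** Flattens a nested row into its leaf values, in declaration order. */
  static List<Object> flatten(Map<String, Object> row) {
    List<Object> flat = new ArrayList<>();
    for (Object value : row.values()) {
      if (value instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, Object> nested = (Map<String, Object>) value;
        flat.addAll(flatten(nested)); // a struct contributes its leaves
      } else {
        flat.add(value);              // a scalar is one positional column
      }
    }
    return flat;
  }

  /**
   * Rebuilds a nested row from flat columns. The shape argument is any row of
   * the target type; only its field names and nesting are used.
   */
  static Map<String, Object> rebuild(Map<String, Object> shape, List<Object> flat) {
    return rebuild(shape, flat, new int[] {0});
  }

  private static Map<String, Object> rebuild(
      Map<String, Object> shape, List<Object> flat, int[] cursor) {
    Map<String, Object> row = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : shape.entrySet()) {
      if (field.getValue() instanceof Map) {
        @SuppressWarnings("unchecked")
        Map<String, Object> nestedShape = (Map<String, Object>) field.getValue();
        row.put(field.getKey(), rebuild(nestedShape, flat, cursor));
      } else {
        // Leaves are consumed strictly by position, like flattened field indices.
        row.put(field.getKey(), flat.get(cursor[0]++));
      }
    }
    return row;
  }

  public static void main(String[] args) {
    Map<String, Object> orderData = new LinkedHashMap<>();
    orderData.put("timestamp", 1571904384814L);
    orderData.put("category", "shoes");
    Map<String, Object> order = new LinkedHashMap<>();
    order.put("orderId", "order1");
    order.put("orderData", orderData);

    List<Object> flat = flatten(order);
    System.out.println(flat);
    System.out.println(rebuild(order, flat));
  }
}
```

Because the rebuilt row depends only on column positions, any step that
reorders or drops flat columns has to keep the indices in sync.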

In the current implementation of the flattener, the ROW constructor
function is invoked regardless of null handling; the same issue exists for the
flattening of some aggregate functions, like COUNT(struct_column).
Proper null handling is a real pain for the flattener. The original idea was to
maintain a special null indicator for each flattened struct, but in practice I
found that it is really hard to deal with flattened field indices when
the related methods are called from very different points, so for now the
problem remains unsolved.
If you can't avoid dealing with null values in your struct columns, you
could try to avoid invoking SqlToRelConverter.flattenTypes(...) and
check whether the final plan is acceptable for you. As far as I know,
there is no reading material on this topic; you can investigate the source
code by debugging RelStructuredTypeFlattener and reading some related plans
in SqlToRelConverterTest.java and SqlToRelConverterTest.xml.
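
The null-handling problem can be illustrated with a small stand-alone sketch.
It is a toy model, not Calcite code: NullStructSketch, Address and the
indicator-column variant are assumptions made for the example, loosely
following the "null indicator per flattened struct" idea mentioned above. It
shows that a NULL struct and a struct whose fields are all NULL flatten to the
same columns, so a ROW constructor applied on top cannot tell them apart
unless an extra indicator is carried along.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the null-struct problem in flattening; not Calcite code. */
final class NullStructSketch {

  /** A two-field struct; a null reference stands for a NULL struct value. */
  static final class Address {
    final Integer zipCode;
    final Integer streetNum;

    Address(Integer zipCode, Integer streetNum) {
      this.zipCode = zipCode;
      this.streetNum = streetNum;
    }
  }

  /** Plain flattening: a NULL struct becomes NULL in every leaf column. */
  static List<Object> flatten(Address a) {
    return a == null
        ? Arrays.<Object>asList(null, null)
        : Arrays.<Object>asList(a.zipCode, a.streetNum);
  }

  /** ROW(...) built on the flat columns, ignoring whether the struct was NULL. */
  static Address rebuild(List<Object> flat) {
    return new Address((Integer) flat.get(0), (Integer) flat.get(1));
  }

  /** Hypothetical variant with a leading null-indicator column for the struct. */
  static List<Object> flattenWithIndicator(Address a) {
    List<Object> flat = new ArrayList<>();
    flat.add(a == null); // indicator: true when the struct itself is NULL
    flat.addAll(flatten(a));
    return flat;
  }

  /** Rebuilds the struct, honouring the indicator column added above. */
  static Address rebuildWithIndicator(List<Object> flat) {
    boolean isNull = (Boolean) flat.get(0);
    return isNull ? null : rebuild(flat.subList(1, flat.size()));
  }

  public static void main(String[] args) {
    // The NULL struct comes back as a non-null struct whose fields are NULL.
    System.out.println(rebuild(flatten(null)) == null); // prints false
    // With the indicator column the NULL survives the round trip.
    System.out.println(rebuildWithIndicator(flattenWithIndicator(null)) == null); // prints true
  }
}
```

The indicator shifts every later column by one position, which hints at why
keeping flattened field indices consistent across call sites is hard.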

[1] https://issues.apache.org/jira/browse/CALCITE-3138

Thanks,
Igor

On Thu, Oct 24, 2019 at 12:57 PM Naveen Kumar
 wrote:

> Hi,
>
> I work at Flipkart, we are using Calcite in our streaming platform. In most
> of our use cases, input data is nested. I understand Calcite flattens
> structs in scan and references fields positionally.
>
> I had a few questions on handling nested data -
>
>    1. Can a RelNode DAG work with nested data (instead of flattened fields)
>       by referencing fields through their nested structure, e.g.
>       data.order.orderId?
>    2. In the current flattened behavior, can the output of a query be a struct?
>       E.g. if *orderId, orderData.timestamp, orderData.category* are the output
>       of a select query, can I declaratively organise the output into the JSON
>       structure below?
>       *{ "orderId": "order1", "orderData": { "timestamp": 1571904384814,
>       "category": "shoes" } }*
>    3. Can the output of a UDF be a struct type?
>
> Please point me to any reading material or example that would help with
> these questions.
>
> Regards,
> Naveen
>
> --
>
>
>
>
> *-*
>
>
> *This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they are
> addressed. If you have received this email in error, please notify the
> system manager. This message contains confidential information and is
> intended only for the individual named. If you are not the named
> addressee,
> you should not disseminate, distribute or copy this email. Please notify
> the sender immediately by email if you have received this email by mistake
> and delete this email from your system. If you are not the intended
> recipient, you are notified that disclosing, copying, distributing or
> taking any action in reliance on the contents of this information is
> strictly prohibited.*
>
>  
>
> *Any views or opinions presented in this
> email are solely those of the author and do not necessarily represent
> those
> of the organization. Any information on shares, debentures or similar
> instruments, recommended product pricing, valuations and the like are for
> information purposes only. It is not meant to be an instruction or
> recommendation, as the case may be, to buy or to sell securities,
> products,
> services nor an offer to buy or sell securities, products or services
> unless specifically stated to be so on behalf of the Flipkart group.
> Employees of the Flipkart group of companies are expressly required not to
> make defamatory statements and not to infringe or authorise any
> infringement of copyright or any other legal right by email
> communications.
> Any such communication is contrary to organizational policy and outside
> the
> scope of the employment of the individual concerned. The organization will
> not accept any liability in respect of such communication, and the
> employee
> responsible will be personally liable for any damages or other liability
> arising.*
>
>  
>
> *Our organization accepts no liability for the
> content of this email, or for the consequences of any actions taken on the
> basis of the information *provided,* unless that information is
> subsequently confirmed in writing. If you are not the intended recipient,
> you are notified that disclosing, copying, distributing or taking any
> actio