Re: is there a way to provide inline array metadata to inform the xml_reader?

2023-08-14 Thread Paul Rogers
IIRC, the syntax for the "provided schema" for arrays is "ARRAY<type>", such
as "ARRAY<INT>". This works, however, only if the XML reader uses the
(very complex) EVF framework and has a way to control parsing based on the
data type (and to set the data type based on parsing). The JSON reader has
such an integration. Charles, did you do the work to add that kind of
dynamic state machine to the XML parser?

- Paul
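
A minimal sketch of what such a query string might look like, assuming the
array type is written ARRAY<INT> inside the inline schema. The class name,
the helper buildQuery and the field `int_array_field` are hypothetical, and
whether the XML reader actually honors the array type is exactly the open
question above.

```java
class InlineArrayQuery {
  // Wraps an inline schema in the table-function form used elsewhere in
  // this thread. Only builds the SQL text; it does not contact Drill.
  static String buildQuery(String path, String inlineSchema) {
    return "SELECT * FROM table(cp.`" + path + "` "
        + "(type => 'xml', schema => 'inline=(" + inlineSchema + ")'))";
  }

  public static void main(String[] args) {
    // `int_array_field` is a hypothetical column, not one known to exist
    // in the test file from the thread.
    String schema = "`int_field` INT, `int_array_field` ARRAY<INT>";
    System.out.println(buildQuery("xml/simple_with_datatypes.xml", schema));
  }
}
```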

On Mon, Aug 14, 2023 at 6:28 PM Charles Givre  wrote:

> Hi Mike,
> It is theoretically possible but I don't have an example of the syntax.
> As you've probably figured out, Drill vectors have both a type and data
> mode.  The mode is either NULLABLE or REPEATED if I remember correctly.
> Thus, you could tell Drill via the inline schema that the data mode for a
> given field is REPEATED and that would be the Drill equivalent of an
> Array.  I've never actually done this, so I don't really know if it would
> work for inline schemata but I'd assume that it would.
>
> I'll do some digging to see whether I have any examples of this.
> Best,
> --C
>
>
>
>
>
> > On Aug 14, 2023, at 3:36 PM, Mike Beckerle  wrote:
> >
> > I'm trying to get my Drill SQL queries to produce the right thing from
> XML.
> >
> > A major thing that you can't easily infer from looking at just XML data
> is
> > what is an array. XML lacks an array starting indicator.
> >
> > Is there an inline schema notation in the Drill Query language for
> > array-ness, so that one can inform Drill what is an array?
> >
> > For example this provides simple types for all the fields directly in the
> > query.
> >
> > @Test
> > public void testSimpleProvidedSchema() throws Exception {
> >   String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` " +
> >       "(type => 'xml', schema => 'inline=(" +
> >       "`int_field` INT, `bigint_field` BIGINT, `float_field` FLOAT, " +
> >       "`double_field` DOUBLE, `boolean_field` BOOLEAN, `date_field` DATE, " +
> >       "`time_field` TIME, `timestamp_field` TIMESTAMP, " +
> >       "`string_field` VARCHAR, " +
> >       "`date2_field` DATE properties {`drill.format` = `MM/dd/`})'))";
> >   RowSet results = client.queryBuilder().sql(sql).rowSet();
> >   assertEquals(2, results.rowCount());
> > }
> >
> >
> > Can one also tell Drill what fields or child elements are arrays?
>
>


Re: is there a way to provide inline array metadata to inform the xml_reader?

2023-08-14 Thread Charles Givre
Hi Mike,
It is theoretically possible but I don't have an example of the syntax.  As 
you've probably figured out, Drill vectors have both a type and data mode.  The 
mode is either NULLABLE or REPEATED if I remember correctly.  Thus, you could 
tell Drill via the inline schema that the data mode for a given field is 
REPEATED and that would be the Drill equivalent of an Array.  I've never 
actually done this, so I don't really know if it would work for inline schemata 
but I'd assume that it would.

I'll do some digging to see whether I have any examples of this.
Best,
--C
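
For reference, Drill's DataMode has three values: OPTIONAL (nullable),
REQUIRED and REPEATED. Below is a rough sketch of how each mode might map to
a column definition in a provided schema, assuming REPEATED is written as
ARRAY<type> and REQUIRED as NOT NULL. The enum is a local stand-in and
columnDef is a hypothetical helper, not Drill code.

```java
class DataModeSketch {
  // Local stand-in mirroring the names of Drill's data modes.
  enum Mode { OPTIONAL, REQUIRED, REPEATED }

  // Renders one column definition for a provided schema (assumed syntax).
  static String columnDef(String name, String type, Mode mode) {
    String quoted = "`" + name + "`";
    switch (mode) {
      case REPEATED:
        return quoted + " ARRAY<" + type + ">";
      case REQUIRED:
        return quoted + " " + type + " NOT NULL";
      default:
        // OPTIONAL: a plain, nullable column.
        return quoted + " " + type;
    }
  }

  public static void main(String[] args) {
    for (Mode mode : Mode.values()) {
      System.out.println(columnDef("int_field", "INT", mode));
    }
  }
}
```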





> On Aug 14, 2023, at 3:36 PM, Mike Beckerle  wrote:
> 
> I'm trying to get my Drill SQL queries to produce the right thing from XML.
> 
> A major thing that you can't easily infer from looking at just XML data is
> what is an array. XML lacks an array starting indicator.
> 
> Is there an inline schema notation in the Drill Query language for
> array-ness, so that one can inform Drill what is an array?
> 
> For example this provides simple types for all the fields directly in the
> query.
> 
> @Test
> public void testSimpleProvidedSchema() throws Exception {
>   String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` " +
>       "(type => 'xml', schema => 'inline=(" +
>       "`int_field` INT, `bigint_field` BIGINT, `float_field` FLOAT, " +
>       "`double_field` DOUBLE, `boolean_field` BOOLEAN, `date_field` DATE, " +
>       "`time_field` TIME, `timestamp_field` TIMESTAMP, " +
>       "`string_field` VARCHAR, " +
>       "`date2_field` DATE properties {`drill.format` = `MM/dd/`})'))";
>   RowSet results = client.queryBuilder().sql(sql).rowSet();
>   assertEquals(2, results.rowCount());
> }
> 
> 
> Can one also tell Drill what fields or child elements are arrays?





is there a way to provide inline array metadata to inform the xml_reader?

2023-08-14 Thread Mike Beckerle
I'm trying to get my Drill SQL queries to produce the right thing from XML.

A major thing that you can't easily infer from looking at just XML data is
what is an array. XML lacks an array starting indicator.

Is there an inline schema notation in the Drill Query language for
array-ness, so that one can inform Drill what is an array?

For example this provides simple types for all the fields directly in the
query.

@Test
public void testSimpleProvidedSchema() throws Exception {
  String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` " +
      "(type => 'xml', schema => 'inline=(" +
      "`int_field` INT, `bigint_field` BIGINT, `float_field` FLOAT, " +
      "`double_field` DOUBLE, `boolean_field` BOOLEAN, `date_field` DATE, " +
      "`time_field` TIME, `timestamp_field` TIMESTAMP, " +
      "`string_field` VARCHAR, " +
      "`date2_field` DATE properties {`drill.format` = `MM/dd/`})'))";
  RowSet results = client.queryBuilder().sql(sql).rowSet();
  assertEquals(2, results.rowCount());
}


Can one also tell Drill what fields or child elements are arrays?
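
To illustrate the ambiguity described above, here is a small self-contained
sketch using only the JDK's DOM parser. The element names `row` and `value`
and the class name are made up for the example. A record with a single
`value` child looks exactly like a scalar field, so without a schema the
reader cannot tell that `value` is meant to be an array.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlArrayAmbiguity {
  // Counts the direct child elements of the root that have the given name.
  static int countChildren(String xml, String tag) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      NodeList children = doc.getDocumentElement().getChildNodes();
      int count = 0;
      for (int i = 0; i < children.getLength(); i++) {
        Node node = children.item(i);
        if (node.getNodeType() == Node.ELEMENT_NODE
            && node.getNodeName().equals(tag)) {
          count++;
        }
      }
      return count;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    String twoValues = "<row><value>1</value><value>2</value></row>";
    String oneValue = "<row><value>1</value></row>";
    // Repetition is the only hint of array-ness in the data itself.
    System.out.println(countChildren(twoValues, "value")); // prints 2
    // A one-element array is indistinguishable from a scalar field.
    System.out.println(countChildren(oneValue, "value"));  // prints 1
  }
}
```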