Hey Paul,
The XML reader was implemented using the EVF2 framework and, in theory, has
writers for repeated data types.  I'm not sure to what extent this has been
tested.
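
For what it's worth, here is a minimal, untested sketch of how a query string could be assembled using the ARRAY<type> syntax Paul mentions below, following the shape of Mike's inline-schema query. The file name `xml/readings.xml`, the field names `id` and `readings`, and the helper class `InlineSchemaQuery` are all made up for illustration. It only builds the SQL text; it does not show that the XML reader actually honors the array type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class InlineSchemaQuery {

  // Builds a Drill table-function query with an inline provided schema.
  // Keys are column names, values are type strings such as "INT" or
  // "ARRAY<DOUBLE>". This helper is hypothetical, not part of Drill.
  public static String build(String path, Map<String, String> columns) {
    String cols = columns.entrySet().stream()
        .map(e -> "`" + e.getKey() + "` " + e.getValue())
        .collect(Collectors.joining(", "));
    return "SELECT * FROM table(cp.`" + path + "` "
        + "(type => 'xml', schema => 'inline=(" + cols + ")'))";
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the columns in insertion order.
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("id", "INT");                  // assumed scalar field
    columns.put("readings", "ARRAY<DOUBLE>");  // assumed repeated field
    System.out.println(build("xml/readings.xml", columns));
  }
}
```

The resulting string could be passed to client.queryBuilder().sql(...) in the same way as in Mike's test, assuming a matching XML file is on the classpath.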
Best,
-- C

> On Aug 15, 2023, at 12:01 AM, Paul Rogers <par0...@gmail.com> wrote:
> 
> IIRC, the syntax for the "provided schema" for arrays is "ARRAY<type>" such
> as "ARRAY<DOUBLE>". This works, however, only if the XML reader uses the
> (very complex) EVF framework and has a way to control parsing based on the
> data type (and to set the data type based on parsing). The JSON reader has
> such an integration. Charles, did you do the work to add that kind of
> dynamic state machine to the XML parser?
> 
> - Paul
> 
> On Mon, Aug 14, 2023 at 6:28 PM Charles Givre <cgi...@gmail.com> wrote:
> 
>> Hi Mike,
>> It is theoretically possible but I don't have an example of the syntax.
>> As you've probably figured out, Drill vectors have both a type and data
>> mode.  The mode is REQUIRED, OPTIONAL (nullable), or REPEATED if I remember correctly.
>> Thus, you could tell Drill via the inline schema that the data mode for a
>> given field is REPEATED and that would be the Drill equivalent of an
>> Array.  I've never actually done this, so I don't really know if it would
>> work for inline schemata but I'd assume that it would.
>> 
>> I'll do some digging to see whether I have any examples of this.
>> Best,
>> --C
>>
>>> On Aug 14, 2023, at 3:36 PM, Mike Beckerle <mbecke...@apache.org> wrote:
>>> 
>>> I'm trying to get my Drill SQL queries to produce the right thing from XML.
>>>
>>> A major thing that you can't easily infer from looking at just XML data is
>>> what is an array. XML lacks an array starting indicator.
>>> 
>>> Is there an inline schema notation in the Drill Query language for
>>> array-ness, so that one can inform Drill what is an array?
>>> 
>>> For example this provides simple types for all the fields directly in the
>>> query.
>>> 
>>> @Test
>>> public void testSimpleProvidedSchema() throws Exception {
>>>   String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` (type => 'xml', schema " +
>>>     "=> 'inline=(`int_field` INT, `bigint_field` BIGINT, `float_field` FLOAT, `double_field` DOUBLE, `boolean_field` " +
>>>     "BOOLEAN, `date_field` DATE, `time_field` TIME, `timestamp_field` TIMESTAMP, `string_field`" +
>>>     " VARCHAR, `date2_field` DATE properties {`drill.format` = `MM/dd/yyyy`})'))";
>>>
>>>   RowSet results = client.queryBuilder().sql(sql).rowSet();
>>>   assertEquals(2, results.rowCount());
>>> 
>>> 
>>> Can one also tell Drill what fields or child elements are arrays?
>> 
>> 
