Hey Mike,
So it looks like I was wrong: the XML reader does not support arrays.
However, once DRILL-8450 is merged, I'll add the readers for arrays. The XML
reader itself still won't be able to detect them dynamically until we finish
the XSD support, but at least the infrastructure will be there.
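
For reference, here is a rough sketch of how the ARRAY<type> notation Paul
mentions would slot into an inline schema string. It only builds the query
text and does not talk to Drill. The class name, the file name
`xml/readings.xml`, and the field names `id` and `readings` are made up for
illustration, and the XML reader will not honor the array type yet.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class InlineSchemaSketch {

  // Builds a Drill table-function query with an inline provided schema.
  // Column names are wrapped in backticks, as in the test quoted below.
  public static String buildQuery(String path, Map<String, String> columns) {
    StringJoiner cols = new StringJoiner(", ");
    for (Map.Entry<String, String> e : columns.entrySet()) {
      cols.add("`" + e.getKey() + "` " + e.getValue());
    }
    return "SELECT * FROM table(cp.`" + path + "` "
        + "(type => 'xml', schema => 'inline=(" + cols + ")'))";
  }

  public static void main(String[] args) {
    // Hypothetical file and field names, for illustration only.
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("id", "INT");
    // Array-ness is declared in the type, per the ARRAY<type> syntax.
    columns.put("readings", "ARRAY<DOUBLE>");
    System.out.println(buildQuery("xml/readings.xml", columns));
  }
}
```

A LinkedHashMap is used so the columns come out in insertion order, which
keeps the generated schema string predictable.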
Best,
-- C


> On Aug 15, 2023, at 11:39 PM, Charles Givre <cgi...@gmail.com> wrote:
> 
> I stand corrected...  It does not look like the XML reader has any support 
> for arrays.
> -- C
> 
>> On Aug 15, 2023, at 12:01 AM, Paul Rogers <par0...@gmail.com> wrote:
>> 
>> IIRC, the syntax for the "provided schema" for arrays is "ARRAY<type>" such
>> as "ARRAY<DOUBLE>". This works, however, only if the XML reader uses the
>> (very complex) EVF framework and has a way to control parsing based on the
>> data type (and to set the data type based on parsing). The JSON reader has
>> such an integration. Charles, did you do the work to add that kind of
>> dynamic state machine to the XML parser?
>> 
>> - Paul
>> 
>> On Mon, Aug 14, 2023 at 6:28 PM Charles Givre <cgi...@gmail.com> wrote:
>> 
>>> Hi Mike,
>>> It is theoretically possible but I don't have an example of the syntax.
>>> As you've probably figured out, Drill vectors have both a type and a data
>>> mode.  The mode is REQUIRED, OPTIONAL (nullable), or REPEATED, if I
>>> remember correctly.
>>> Thus, you could tell Drill via the inline schema that the data mode for a
>>> given field is REPEATED and that would be the Drill equivalent of an
>>> Array.  I've never actually done this, so I don't really know if it would
>>> work for inline schemata but I'd assume that it would.
>>> 
>>> I'll do some digging to see whether I have any examples of this.
>>> Best,
>>> --C
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Aug 14, 2023, at 3:36 PM, Mike Beckerle <mbecke...@apache.org> wrote:
>>>> 
>>>> I'm trying to get my Drill SQL queries to produce the right thing from
>>>> XML.
>>>> 
>>>> A major thing that you can't easily infer from looking at just XML data
>>>> is what is an array. XML lacks an array starting indicator.
>>>> 
>>>> Is there an inline schema notation in the Drill Query language for
>>>> array-ness, so that one can inform Drill what is an array?
>>>> 
>>>> For example this provides simple types for all the fields directly in the
>>>> query.
>>>> 
>>>> @Test
>>>> public void testSimpleProvidedSchema() throws Exception {
>>>>   // Inline schema: every field gets an explicit scalar type.
>>>>   String sql = "SELECT * FROM table(cp.`xml/simple_with_datatypes.xml` " +
>>>>       "(type => 'xml', schema => 'inline=(`int_field` INT, " +
>>>>       "`bigint_field` BIGINT, `float_field` FLOAT, " +
>>>>       "`double_field` DOUBLE, `boolean_field` BOOLEAN, " +
>>>>       "`date_field` DATE, `time_field` TIME, " +
>>>>       "`timestamp_field` TIMESTAMP, `string_field` VARCHAR, " +
>>>>       "`date2_field` DATE properties {`drill.format` = `MM/dd/yyyy`})'))";
>>>>
>>>>   RowSet results = client.queryBuilder().sql(sql).rowSet();
>>>>   assertEquals(2, results.rowCount());
>>>> }
>>>> 
>>>> 
>>>> Can one also tell Drill what fields or child elements are arrays?
>>> 
>>> 
> 
