[jira] [Commented] (DRILL-8453) Add XSD Support to XML Reader (Part 1)

2023-08-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757205#comment-17757205
 ] 

ASF GitHub Bot commented on DRILL-8453:
---

cgivre opened a new pull request, #2824:
URL: https://github.com/apache/drill/pull/2824

   # [DRILL-8453](https://issues.apache.org/jira/browse/DRILL-8453): Add XSD Support to XML Reader (Part 1)
   
   ## Description
   This PR is part of a series that improves Drill's support for reading XML 
data.  One of the main challenges is that XML data alone gives Drill no 
reliable way to infer data types or to detect arrays.
   The only way to do this really well is to have a schema.  Some XML files 
link a schema definition (XSD) file to the data.  This PR adds the capability 
for Drill to map XSD schema files into Drill schemas.
   The current plan is as follows: Part 1 (this PR) adds only the XSD schema 
reader and no new user-detectable functionality.  Part 2 will integrate it 
with the XML reader.  Part 3 will add the ability to read arrays.
   
   ## Documentation
   No user facing changes.
   
   ## Testing
   Added new unit tests.




> Add XSD Support to XML Reader (Part 1)
> --
>
> Key: DRILL-8453
> URL: https://issues.apache.org/jira/browse/DRILL-8453
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.2



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (DRILL-8453) Add XSD Support to XML Reader (Part 1)

2023-08-21 Thread Charles Givre (Jira)
Charles Givre created DRILL-8453:


Summary: Add XSD Support to XML Reader (Part 1)
Key: DRILL-8453
URL: https://issues.apache.org/jira/browse/DRILL-8453
Project: Apache Drill
Issue Type: Improvement
Components: Format - XML
Affects Versions: 1.21.1
Reporter: Charles Givre
Assignee: Charles Givre
Fix For: 1.21.2


This PR is part of a series that improves Drill's support for reading XML 
data.  One of the main challenges is that XML data alone gives Drill no 
reliable way to infer data types or to detect arrays.

The only way to do this really well is to have a schema.  Some XML files link a 
schema definition (XSD) file to the data.  This PR adds the capability for 
Drill to map XSD schema files into Drill schemas.

The current plan is as follows: Part 1 (this PR) adds only the XSD schema 
reader and no new user-detectable functionality.  Part 2 will integrate it 
with the XML reader.  Part 3 will add the ability to read arrays.
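
To make the idea of mapping an XSD onto a column schema concrete, here is a minimal sketch that uses only the JDK's DOM parser. It is not the code from the pull request: the class name `XsdSketch`, the method names, and the type labels (`INT`, `DOUBLE`, `VARCHAR`, and so on) are hypothetical and chosen for illustration. It assumes that a `maxOccurs` value of `unbounded` or greater than 1 marks a repeated (array) field, and it ignores nesting entirely.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XsdSketch {
  static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

  // Hypothetical mapping from XSD built-in type names to column type labels.
  static String columnType(String xsdType) {
    // Drop a namespace prefix such as "xs:" if one is present.
    String local = xsdType.substring(xsdType.indexOf(':') + 1);
    switch (local) {
      case "int":
      case "integer":
        return "INT";
      case "long":
        return "BIGINT";
      case "float":
      case "double":
        return "DOUBLE";
      case "boolean":
        return "BOOLEAN";
      case "date":
        return "DATE";
      case "dateTime":
        return "TIMESTAMP";
      default:
        return "VARCHAR"; // Anything unrecognized is read as text.
    }
  }

  // Returns element name -> type label, in document order.
  // Elements without a "type" attribute (containers) are skipped.
  static Map<String, String> describe(String xsd) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      Document doc = factory.newDocumentBuilder()
          .parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
      NodeList elements = doc.getElementsByTagNameNS(XSD_NS, "element");
      Map<String, String> columns = new LinkedHashMap<>();
      for (int i = 0; i < elements.getLength(); i++) {
        Element e = (Element) elements.item(i);
        if (!e.hasAttribute("type")) {
          continue;
        }
        String max = e.getAttribute("maxOccurs"); // Empty string when absent.
        boolean repeated = max.equals("unbounded")
            || (!max.isEmpty() && Integer.parseInt(max) > 1);
        columns.put(e.getAttribute("name"),
            columnType(e.getAttribute("type")) + (repeated ? "[]" : ""));
      }
      return columns;
    } catch (Exception ex) {
      throw new IllegalStateException("Could not read XSD", ex);
    }
  }

  public static void main(String[] args) {
    String xsd =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"book\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"title\" type=\"xs:string\"/>"
        + "<xs:element name=\"price\" type=\"xs:double\"/>"
        + "<xs:element name=\"author\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    System.out.println(describe(xsd));
  }
}
```

The sketch shows why a schema helps: the `type` attribute supplies the data type and `maxOccurs` signals an array, neither of which can be recovered reliably from the XML data alone.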





[jira] [Updated] (DRILL-8452) Library upgrades

2023-08-21 Thread James Turton (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Turton updated DRILL-8452:

Description: 
* aircompressor.version -> 0.25
 * antlr.version -> -4.13.0- 4.9.3
 * asm.version -> 9.5
 * avro.version -> 1.11.2
 * commons.compress.version -> 1.23.0
 * commons.validator.version -> 1.7
 * hbase.version -> 2.5.5 (Hadoop 2 profile)
 * hbase.version -> 2.5.5-hadoop3
 * -hikari.version -> 5.0.1-
 * httpclient.version -> 4.5.14
 * httpdlog-parser.version -> 5.10.0
 * jersey.version -> 2.40
 * jetty -> 9.4.51.v20230217
 * jna.version -> 5.13.0
 * joda.version -> 2.12.5
 * libthrift.version -> 0.18.1
 * log4j.version -> 2.20.0
 * -maven.version -> 3.9.4-
 * metrics.version -> 4.2.19
 * protostuff.version -> 1.8.0
 * snakeyaml.version -> 2.1
 * surefire.version -> 3.1.2
 * testcontainers.version -> 1.18.3

  was:
- hbase.version -> 2.5.5-hadoop3
 - avro.version -> 1.11.2
 - metrics.version -> 4.2.19
 - jersey.version -> 2.40
 - asm.version -> 9.5
 - antlr.version -> -4.13.0- 4.9.3
 - -maven.version -> 3.9.4-
 - commons.validator.version -> 1.7
 - protostuff.version -> 1.8.0
 - joda.version -> 2.12.5
 - surefire.version -> 3.1.2
 - jna.version -> 5.13.0
 - commons.compress.version -> 1.23.0
 - -hikari.version -> 5.0.1-
 - httpclient.version -> 4.5.14
 - libthrift.version -> 0.18.1
 - snakeyaml.version -> 2.1
 - testcontainers.version -> 1.18.3
 - httpdlog-parser.version -> 5.10.0
 - log4j.version -> 2.20.0
 - aircompressor.version -> 0.25
 - hbase.version -> 2.5.5


> Library upgrades
> 
>
> Key: DRILL-8452
> URL: https://issues.apache.org/jira/browse/DRILL-8452
> Project: Apache Drill
> Issue Type: Improvement
> Components: library
> Affects Versions: 1.21.1
> Reporter: James Turton
> Assignee: James Turton
> Priority: Minor
> Fix For: 1.21.2


[jira] [Commented] (DRILL-8450) Add Data Type Inference to XML Format Plugin

2023-08-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757165#comment-17757165
 ] 

ASF GitHub Bot commented on DRILL-8450:
---

cgivre merged PR #2819:
URL: https://github.com/apache/drill/pull/2819




> Add Data Type Inference to XML Format Plugin
> 
>
> Key: DRILL-8450
> URL: https://issues.apache.org/jira/browse/DRILL-8450
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin.  As in other 
> plugins, it adds a new configuration parameter, allTextMode, which, when set 
> to true, reads all data as strings.  The default is true.
> Note that inference is limited to doubles, dates, timestamps, booleans, and 
> strings.
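
The behavior described above can be sketched as follows. This is an illustration only, not the plugin's implementation: the class `InferenceSketch` and the method `infer` are hypothetical, and the accepted formats (ISO-8601 dates and timestamps, `true`/`false` in any case) are assumptions, since the message does not say which patterns Drill recognizes.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

class InferenceSketch {

  // Returns the raw text when allTextMode is true; otherwise tries
  // boolean, double, timestamp, and date in turn, falling back to the string.
  static Object infer(String raw, boolean allTextMode) {
    if (allTextMode || raw == null) {
      return raw;
    }
    String s = raw.trim();
    if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false")) {
      return Boolean.parseBoolean(s);
    }
    try {
      // Note: parseDouble also accepts forms such as "NaN" or "1e3";
      // a real reader would likely be stricter.
      return Double.parseDouble(s);
    } catch (NumberFormatException ignored) {
      // Not a number; keep trying.
    }
    try {
      return LocalDateTime.parse(s); // Assumed format: 2023-08-21T10:15:30
    } catch (DateTimeParseException ignored) {
      // Not a timestamp; keep trying.
    }
    try {
      return LocalDate.parse(s); // Assumed format: 2023-08-21
    } catch (DateTimeParseException ignored) {
      // Not a date; fall through to string.
    }
    return raw;
  }

  public static void main(String[] args) {
    for (String value : new String[] {"42", "true", "2023-08-21", "hello"}) {
      Object inferred = infer(value, false);
      System.out.println(value + " -> " + inferred.getClass().getSimpleName());
    }
  }
}
```

Because the sketch tries each type in a fixed order, every number becomes a double, which matches the note that the inference does not produce integer types.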





[jira] [Commented] (DRILL-8450) Add Data Type Inference to XML Format Plugin

2023-08-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756975#comment-17756975
 ] 

ASF GitHub Bot commented on DRILL-8450:
---

jnturton commented on PR #2819:
URL: https://github.com/apache/drill/pull/2819#issuecomment-1686562600

   LGTM






[jira] [Commented] (DRILL-8450) Add Data Type Inference to XML Format Plugin

2023-08-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756962#comment-17756962
 ] 

ASF GitHub Bot commented on DRILL-8450:
---

cgivre commented on PR #2819:
URL: https://github.com/apache/drill/pull/2819#issuecomment-1686494732

   @mbeckerle @jnturton Are we ok to merge this?  I'll add support for arrays 
in a separate PR.



