[jira] [Commented] (DRILL-8453) Add XSD Support to XML Reader (Part 1)
[ https://issues.apache.org/jira/browse/DRILL-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757205#comment-17757205 ]

ASF GitHub Bot commented on DRILL-8453:
---

cgivre opened a new pull request, #2824:
URL: https://github.com/apache/drill/pull/2824

# [DRILL-8453](https://issues.apache.org/jira/browse/DRILL-8453): Add XSD Support to XML Reader (Part 1)

## Description
This PR is a part of a series to add better support for reading XML data to Drill. One of the main challenges is that XML data does not have a way of inferring data types, nor does it have a way of detecting arrays.

The only way to do this really well is to have a schema. Some XML files link a schema definition file to the data. This PR adds the capability for Drill to map XSD schema files into Drill schemas.

The current plan is as follows: Part 1 of this PR simply adds the reader but adds no new user detectable functionality. Part 2 will include the actual integration with the XML reader. Part 3 will include the ability to read arrays.

## Documentation
No user facing changes.

## Testing
Added new unit tests.

> Add XSD Support to XML Reader (Part 1)
> --
>
> Key: DRILL-8453
> URL: https://issues.apache.org/jira/browse/DRILL-8453
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.21.2
>
>
> This PR is a part of a series to add better support for reading XML data to
> Drill. One of the main challenges is that XML data does not have a way of
> inferring data types, nor does it have a way of detecting arrays.
> The only way to do this really well is to have a schema. Some XML files link
> a schema definition file to the data. This PR adds the capability for Drill
> to map XSD schema files into Drill schemas.
> The current plan is as follows: Part 1 of this PR simply adds the reader but
> adds no new user detectable functionality. Part 2 will include the actual
> integration with the XML reader. Part 3 will include the ability to read
> arrays.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
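Editorial illustration (not part of the original message or of PR #2824): the idea of mapping an XSD into a column schema can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function name `xsd_to_columns`, the `XSD_TO_DRILL` table, the choice of Drill-style type names, and the use of `maxOccurs` as the array signal. The actual reader in Drill is written in Java and may work quite differently.

```python
import xml.etree.ElementTree as ET

# Namespace prefix ElementTree uses for XML Schema elements.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical mapping from XSD built-in types to Drill-style type names.
# The table is illustrative only; it is not taken from the PR.
XSD_TO_DRILL = {
    "string": "VARCHAR",
    "int": "INT",
    "long": "BIGINT",
    "double": "FLOAT8",
    "boolean": "BIT",
    "date": "DATE",
    "dateTime": "TIMESTAMP",
}


def xsd_to_columns(xsd_text):
    """Return (name, type, is_array) for every typed element in the XSD."""
    root = ET.fromstring(xsd_text)
    columns = []
    for element in root.iter(XS + "element"):
        type_attr = element.get("type")
        if type_attr is None:
            # Element with an inline complex type: a nested structure,
            # which this sketch does not try to represent.
            continue
        local_name = type_attr.split(":")[-1]  # drop the "xs:" prefix
        # Assumption: an element that may occur more than once is an array.
        is_array = element.get("maxOccurs", "1") not in ("0", "1")
        drill_type = XSD_TO_DRILL.get(local_name, "VARCHAR")
        columns.append((element.get("name"), drill_type, is_array))
    return columns


SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:int"/>
        <xs:element name="placed" type="xs:dateTime"/>
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

for column in xsd_to_columns(SAMPLE_XSD):
    print(column)
```

The sketch shows why a schema helps with both problems named in the description: the `type` attribute supplies the data type, and `maxOccurs` tells the reader that `item` repeats, neither of which can be known reliably from the XML data alone.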
[jira] [Created] (DRILL-8453) Add XSD Support to XML Reader (Part 1)
Charles Givre created DRILL-8453:

Summary: Add XSD Support to XML Reader (Part 1)
Key: DRILL-8453
URL: https://issues.apache.org/jira/browse/DRILL-8453
Project: Apache Drill
Issue Type: Improvement
Components: Format - XML
Affects Versions: 1.21.1
Reporter: Charles Givre
Assignee: Charles Givre
Fix For: 1.21.2

This PR is a part of a series to add better support for reading XML data to Drill. One of the main challenges is that XML data does not have a way of inferring data types, nor does it have a way of detecting arrays.

The only way to do this really well is to have a schema. Some XML files link a schema definition file to the data. This PR adds the capability for Drill to map XSD schema files into Drill schemas.

The current plan is as follows: Part 1 of this PR simply adds the reader but adds no new user detectable functionality. Part 2 will include the actual integration with the XML reader. Part 3 will include the ability to read arrays.
[jira] [Updated] (DRILL-8452) Library upgrades
[ https://issues.apache.org/jira/browse/DRILL-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Turton updated DRILL-8452:

Description:
* aircompressor.version -> 0.25
* antlr.version -> -4.13.0- 4.9.3
* asm.version -> 9.5
* avro.version -> 1.11.2
* commons.compress.version -> 1.23.0
* commons.validator.version -> 1.7
* hbase.version -> 2.5.5 (Hadoop 2 profile)
* hbase.version -> 2.5.5-hadoop3
* -hikari.version -> 5.0.1-
* httpclient.version -> 4.5.14
* httpdlog-parser.version -> 5.10.0
* jersey.version -> 2.40
* jetty -> 9.4.51.v20230217
* jna.version -> 5.13.0
* joda.version -> 2.12.5
* libthrift.version -> 0.18.1
* log4j.version -> 2.20.0
* -maven.version -> 3.9.4-
* metrics.version -> 4.2.19
* protostuff.version -> 1.8.0
* snakeyaml.version -> 2.1
* surefire.version -> 3.1.2
* testcontainers.version -> 1.18.3

was:
- hbase.version -> 2.5.5-hadoop3
- avro.version -> 1.11.2
- metrics.version -> 4.2.19
- jersey.version -> 2.40
- asm.version -> 9.5
- antlr.version -> -4.13.0- 4.9.3
- -maven.version -> 3.9.4-
- commons.validator.version -> 1.7
- protostuff.version -> 1.8.0
- joda.version -> 2.12.5
- surefire.version -> 3.1.2
- jna.version -> 5.13.0
- commons.compress.version -> 1.23.0
- -hikari.version -> 5.0.1-
- httpclient.version -> 4.5.14
- libthrift.version -> 0.18.1
- snakeyaml.version -> 2.1
- testcontainers.version -> 1.18.3
- httpdlog-parser.version -> 5.10.0
- log4j.version -> 2.20.0
- aircompressor.version -> 0.25
- hbase.version -> 2.5.5

> Library upgrades
>
>
> Key: DRILL-8452
> URL: https://issues.apache.org/jira/browse/DRILL-8452
> Project: Apache Drill
> Issue Type: Improvement
> Components: library
> Affects Versions: 1.21.1
> Reporter: James Turton
> Assignee: James Turton
> Priority: Minor
> Fix For: 1.21.2
>
>
> * aircompressor.version -> 0.25
> * antlr.version -> -4.13.0- 4.9.3
> * asm.version -> 9.5
> * avro.version -> 1.11.2
> * commons.compress.version -> 1.23.0
> * commons.validator.version -> 1.7
> * hbase.version -> 2.5.5 (Hadoop 2 profile)
> * hbase.version -> 2.5.5-hadoop3
> * -hikari.version -> 5.0.1-
> * httpclient.version -> 4.5.14
> * httpdlog-parser.version -> 5.10.0
> * jersey.version -> 2.40
> * jetty -> 9.4.51.v20230217
> * jna.version -> 5.13.0
> * joda.version -> 2.12.5
> * libthrift.version -> 0.18.1
> * log4j.version -> 2.20.0
> * -maven.version -> 3.9.4-
> * metrics.version -> 4.2.19
> * protostuff.version -> 1.8.0
> * snakeyaml.version -> 2.1
> * surefire.version -> 3.1.2
> * testcontainers.version -> 1.18.3
[jira] [Commented] (DRILL-8450) Add Data Type Inference to XML Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757165#comment-17757165 ]

ASF GitHub Bot commented on DRILL-8450:
---

cgivre merged PR #2819:
URL: https://github.com/apache/drill/pull/2819

> Add Data Type Inference to XML Format Plugin
>
>
> Key: DRILL-8450
> URL: https://issues.apache.org/jira/browse/DRILL-8450
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin. In similar
> fashion to other plugins, it adds a new configuration parameter: allTextMode,
> which when set to true, reads all data as strings. The default is true.
> Note that the inference is limited to doubles, date, timestamps, boolean and
> strings.
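Editorial illustration (not part of the original message or of PR #2819): the behaviour described above, where `allTextMode` returns every value as a string and inference is otherwise limited to booleans, doubles, dates, timestamps and strings, can be sketched as follows. The function name `infer_value`, the order in which types are tried, and the accepted date and timestamp formats (ISO 8601 via the Python standard library) are all assumptions; the plugin itself is Java and its parsing rules may differ.

```python
from datetime import date, datetime


def infer_value(text, all_text_mode=True):
    """Convert one XML text value to a typed Python value.

    With all_text_mode=True (the default, mirroring the description above)
    the value is returned unchanged as a string.
    """
    if all_text_mode:
        return text

    stripped = text.strip()

    # Boolean: only the literal words, compared case-insensitively.
    if stripped.lower() in ("true", "false"):
        return stripped.lower() == "true"

    # Double: every numeric value becomes a float in this sketch.
    try:
        return float(stripped)
    except ValueError:
        pass

    # Date, then timestamp, both in ISO 8601 form (an assumption).
    try:
        return date.fromisoformat(stripped)
    except ValueError:
        pass
    try:
        return datetime.fromisoformat(stripped)
    except ValueError:
        pass

    # Anything else stays a string.
    return text


for raw in ["3.14", "true", "2023-08-20", "2023-08-20T10:15:30", "hello"]:
    print(repr(raw), "->", repr(infer_value(raw, all_text_mode=False)))
```

Trying the narrowest types first and falling back to a string means a value is never rejected; the worst case is that it stays text, which matches what `allTextMode` would have produced anyway.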
[jira] [Commented] (DRILL-8450) Add Data Type Inference to XML Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756975#comment-17756975 ]

ASF GitHub Bot commented on DRILL-8450:
---

jnturton commented on PR #2819:
URL: https://github.com/apache/drill/pull/2819#issuecomment-1686562600

LGTM
[jira] [Commented] (DRILL-8450) Add Data Type Inference to XML Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756962#comment-17756962 ]

ASF GitHub Bot commented on DRILL-8450:
---

cgivre commented on PR #2819:
URL: https://github.com/apache/drill/pull/2819#issuecomment-1686494732

@mbeckerle @jnturton Are we ok to merge this? I'll add support for arrays in a separate PR.