[
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756425#comment-17756425
]
ASF GitHub Bot commented on DRILL-8450:
---------------------------------------
cgivre commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1299285670
##########
contrib/storage-http/src/main/java/org/apache/drill/exec/store/http/HttpXmlOptions.java:
##########
@@ -111,7 +111,7 @@ public String toString() {
public static class HttpXmlOptionsBuilder {
private int dataLevel;
- private boolean allTextMode;
+ private Boolean allTextMode;
Review Comment:
@mbeckerle
In the JSON reader there are two parameters: `allTextMode` and
`readAllNumbersAsDouble`. Both are boolean. For the XML reader, I chose not
to implement the `readAllNumbersAsDouble` parameter because in practice, it
requires very clean data. From using Drill with clients, I can tell you from
a lot of personal experience that this was one of the biggest data challenges.
For instance, you'd get data where there was a DOUBLE field and then there
would be a row with zero denoted as `0`. This would then cause schema change
exceptions.
We have actually made significant improvements in Drill's implicit casting
rules which do prevent a lot of schema change exceptions and as a result, IMHO,
it makes distinguishing between INTs and DOUBLEs a lot less important. So...
out of laziness I decided it wasn't worth it. I can be convinced otherwise.
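
To make the `0`-in-a-DOUBLE-field problem concrete, here is a minimal sketch of naive per-value inference. It is not Drill's reader code: the class `NumericInferenceSketch`, the `Kind` enum, and both methods are hypothetical names made up for illustration, and only the `allTextMode` flag comes from the discussion above.

```java
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
final class NumericInferenceSketch {

  enum Kind { INT, DOUBLE, VARCHAR }

  // Infers a type from one text value, the way a naive per-value reader might.
  static Kind inferKind(String raw, boolean allTextMode) {
    if (allTextMode) {
      return Kind.VARCHAR; // all-text mode: every value is read as a string
    }
    String value = raw.trim();
    try {
      Long.parseLong(value);
      return Kind.INT; // "0" lands here, even in a column of doubles
    } catch (NumberFormatException ignored) {
      // not an integer, try a double next
    }
    try {
      Double.parseDouble(value);
      return Kind.DOUBLE;
    } catch (NumberFormatException ignored) {
      return Kind.VARCHAR;
    }
  }

  // Returns true if the inferred kind differs between values of one column.
  static boolean hasSchemaChange(List<String> column, boolean allTextMode) {
    Kind first = null;
    for (String raw : column) {
      Kind kind = inferKind(raw, allTextMode);
      if (first == null) {
        first = kind;
      } else if (kind != first) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> prices = List.of("1.5", "2.25", "0");
    System.out.println(hasSchemaChange(prices, false)); // true: DOUBLE, DOUBLE, INT
    System.out.println(hasSchemaChange(prices, true));  // false: all VARCHAR
  }
}
```

Reading every numeric value as a DOUBLE, which is what the type list in the issue description suggests, would avoid the mismatch in this sketch as well, since `0` would never be inferred as an INT.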
> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
> Key: DRILL-8450
> URL: https://issues.apache.org/jira/browse/DRILL-8450
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin. In similar
> fashion to other plugins, it adds a new configuration parameter: allTextMode,
> which when set to true, reads all data as strings. The default is true.
> Note that the inference is limited to doubles, dates, timestamps, booleans, and
> strings.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)