[ https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752087#comment-17752087 ]
ASF GitHub Bot commented on DRILL-8450:
---------------------------------------

mbeckerle commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1287251884


##########
contrib/format-xml/README.md:
##########
@@ -15,12 +15,15 @@ The default configuration is shown below:
   "extensions": [
     "xml"
   ],
+  "allTextMode": true,
   "dataLevel": 2
 }
 ```

 ## Data Types
-All fields are read as strings. Nested fields are read as maps. Future functionality could include support for lists.
+The XML reader has an `allTextMode` which, when set to `true` reads all data fields as strings.
+When set to `false`, Drill will attempt to infer data types.
+Nested fields are read as maps. Future functionality could include support for lists.

Review Comment:
   Not really part of this change set, but I don't know what you are suggesting by "future functionality could include support for lists." I'd like to understand that plan/idea just as part of grokking all of this XML mapping.



##########
common/src/main/java/org/apache/drill/common/Typifier.java:
##########
@@ -88,6 +96,40 @@ public class Typifier {
   // If a String contains any of these, try to evaluate it as an equation
   private static final char[] MathCharacters = new char[]{'+', '-', '/', '*', '='};

+  /**
+   * This function infers the Drill data type of unknown data.
+   * @param data The input text of unknown data type.
+   * @return A {@link MinorType} of the Drill data type.
+   */
+  public static MinorType typifyToDrill (String data) {
+    Entry<Class, String> result = Typifier.typify(data);
+    String dataType = result.getKey().getSimpleName();
+
+    // If the string is empty, return UNKNOWN

Review Comment:
   The next line of code contradicts this comment by returning VARCHAR. (Unless VARCHAR == UNKNOWN, which is news to me.)

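One way to make the comment and the code agree on the empty-string case is to have the comment state the fallback the code actually returns. The sketch below illustrates that idea only; it is not the PR's `typifyToDrill` and does not use `Typifier`. `TypeInferenceSketch`, `SketchType`, and `infer` are hypothetical names, `SketchType` is a stand-in for Drill's `MinorType` restricted to the types the issue lists, and the parsing rules (`Double.parseDouble`, ISO-8601 dates and timestamps) are assumptions chosen for brevity.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical illustration only; not the code under review.
final class TypeInferenceSketch {

  // Stand-in for Drill's MinorType, limited to the types named in the issue.
  enum SketchType { VARCHAR, FLOAT8, BIT, DATE, TIMESTAMP }

  static SketchType infer(String data) {
    // Empty or null input carries no type information: fall back to VARCHAR.
    // The comment names the value that is really returned.
    if (data == null || data.trim().isEmpty()) {
      return SketchType.VARCHAR;
    }
    String s = data.trim();

    if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false")) {
      return SketchType.BIT;
    }
    try {
      // Assumption: anything Double.parseDouble accepts counts as a double.
      // That includes inputs such as "NaN" or "1d", which a real reader may reject.
      Double.parseDouble(s);
      return SketchType.FLOAT8;
    } catch (NumberFormatException ignored) {
      // Not a number; try the next candidate type.
    }
    try {
      LocalDate.parse(s); // ISO-8601 date, e.g. 2023-08-08
      return SketchType.DATE;
    } catch (DateTimeParseException ignored) {
      // Not a date; try the next candidate type.
    }
    try {
      LocalDateTime.parse(s); // ISO-8601 timestamp, e.g. 2023-08-08T12:30:00
      return SketchType.TIMESTAMP;
    } catch (DateTimeParseException ignored) {
      // Not a timestamp; fall through to the string default.
    }
    return SketchType.VARCHAR;
  }
}
```

Ordering the checks from most to least specific and ending with VARCHAR means the empty-string case and the "nothing matched" case share a single documented fallback.
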
> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
>                 Key: DRILL-8450
>                 URL: https://issues.apache.org/jira/browse/DRILL-8450
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Format - XML
>    Affects Versions: 1.21.1
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin. In similar
> fashion to other plugins, it adds a new configuration parameter: allTextMode,
> which when set to true, reads all data as strings. The default is true.
> Note that the inference is limited to doubles, date, timestamps, boolean and
> strings.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)