mbeckerle commented on code in PR #2819: URL: https://github.com/apache/drill/pull/2819#discussion_r1287251884
##########
contrib/format-xml/README.md:
##########

    @@ -15,12 +15,15 @@ The default configuration is shown below:
       "extensions": [
         "xml"
       ],
    +  "allTextMode": true,
       "dataLevel": 2
     }
     ```

     ## Data Types
    -All fields are read as strings. Nested fields are read as maps. Future functionality could include support for lists.
    +The XML reader has an `allTextMode` which, when set to `true` reads all data fields as strings.
    +When set to `false`, Drill will attempt to infer data types.
    +Nested fields are read as maps. Future functionality could include support for lists.

Review Comment:
Not really part of this change set, but I don't know what you are suggesting by "future functionality could include support for lists." I'd like to understand that plan/idea just as part of grokking all of this XML mapping.

##########
common/src/main/java/org/apache/drill/common/Typifier.java:
##########

    @@ -88,6 +96,40 @@ public class Typifier {
       // If a String contains any of these, try to evaluate it as an equation
       private static final char[] MathCharacters = new char[]{'+', '-', '/', '*', '='};

    +  /**
    +   * This function infers the Drill data type of unknown data.
    +   * @param data The input text of unknown data type.
    +   * @return A {@link MinorType} of the Drill data type.
    +   */
    +  public static MinorType typifyToDrill (String data) {
    +    Entry<Class, String> result = Typifier.typify(data);
    +    String dataType = result.getKey().getSimpleName();
    +
    +    // If the string is empty, return UNKNOWN

Review Comment:
The next line of code contradicts this comment by returning VARCHAR. (Unless VARCHAR == UNKNOWN, which is news to me.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
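
Regarding the second review comment (the `// If the string is empty, return UNKNOWN` comment followed by a VARCHAR return): one way to resolve the mismatch is to make the comment say what the code does. The sketch below assumes the VARCHAR fallback is the intended behavior, which the PR itself does not confirm. `TypeInferenceSketch`, `SketchType`, and `inferType` are hypothetical names used only for illustration; they are not Drill's `Typifier` or `MinorType`, and the inference rules are deliberately simplified.

```java
import java.math.BigDecimal;

/** Illustrative only: a simplified stand-in, not Drill's Typifier. */
final class TypeInferenceSketch {

  /** Hypothetical stand-in for Drill's MinorType; only a few values are modeled. */
  enum SketchType { VARCHAR, BIGINT, FLOAT8, BIT }

  /**
   * Infers a type for raw text of unknown type.
   * Assumption: null or blank input falls back to VARCHAR.
   */
  static SketchType inferType(String data) {
    // If the string is null or blank, fall back to VARCHAR
    // (the comment now matches the value actually returned).
    if (data == null || data.trim().isEmpty()) {
      return SketchType.VARCHAR;
    }
    String text = data.trim();

    // Boolean literals map to BIT.
    if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false")) {
      return SketchType.BIT;
    }

    // Whole numbers that fit in a long map to BIGINT.
    try {
      Long.parseLong(text);
      return SketchType.BIGINT;
    } catch (NumberFormatException notALong) {
      // Not a long; try a decimal next.
    }

    // Any other decimal literal (including integers too large for a long)
    // maps to FLOAT8 in this simplified sketch.
    try {
      new BigDecimal(text);
      return SketchType.FLOAT8;
    } catch (NumberFormatException notANumber) {
      // Not numeric; fall through to the default.
    }

    // Everything else is treated as text.
    return SketchType.VARCHAR;
  }

  public static void main(String[] args) {
    for (String sample : new String[] {"", "42", "3.14", "true", "hello"}) {
      System.out.println("'" + sample + "' -> " + inferType(sample));
    }
  }
}
```

If UNKNOWN was the intended result for empty input, the alternative fix is to change the return value rather than the comment; either way the comment and the returned type should agree.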