[
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752087#comment-17752087
]
ASF GitHub Bot commented on DRILL-8450:
---------------------------------------
mbeckerle commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1287251884
##########
contrib/format-xml/README.md:
##########
@@ -15,12 +15,15 @@ The default configuration is shown below:
"extensions": [
"xml"
],
+ "allTextMode": true,
"dataLevel": 2
}
```
## Data Types
-All fields are read as strings. Nested fields are read as maps. Future
functionality could include support for lists.
+The XML reader has an `allTextMode` which, when set to `true` reads all data
fields as strings.
+When set to `false`, Drill will attempt to infer data types.
+Nested fields are read as maps. Future functionality could include support
for lists.
Review Comment:
Not really part of this change set, but I don't know what you are suggesting
by "future functionality could include support for lists." I'd like to
understand that plan/idea just as part of grokking all of this XML mapping.
##########
common/src/main/java/org/apache/drill/common/Typifier.java:
##########
@@ -88,6 +96,40 @@ public class Typifier {
// If a String contains any of these, try to evaluate it as an equation
private static final char[] MathCharacters = new char[]{'+', '-', '/', '*',
'='};
+ /**
+ * This function infers the Drill data type of unknown data.
+ * @param data The input text of unknown data type.
+ * @return A {@link MinorType} of the Drill data type.
+ */
+ public static MinorType typifyToDrill (String data) {
+ Entry<Class, String> result = Typifier.typify(data);
+ String dataType = result.getKey().getSimpleName();
+
+ // If the string is empty, return UNKNOWN
Review Comment:
The next line of code contradicts this comment by returning VARCHAR.
(Unless VARCHAR == UNKNOWN, which is news to me.)
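For illustration, here is a minimal, self-contained sketch of an inference routine whose comments agree with what it returns. It is not the code from the PR: `InferenceSketch`, `SketchType`, and `infer` are hypothetical names, `SketchType` is a stand-in for Drill's `MinorType`, and treating VARCHAR as the intended fallback for empty input is an assumption based on the behavior described above. It relies only on JDK parsers and covers the types named in the issue description (doubles, dates, timestamps, booleans, strings).

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

public class InferenceSketch {

  // Hypothetical stand-in for Drill's MinorType, limited to the types discussed here.
  public enum SketchType { VARCHAR, FLOAT8, BIT, DATE, TIMESTAMP }

  /**
   * Infers a type for text of unknown type.
   * Assumption: empty or null input falls back to VARCHAR.
   */
  public static SketchType infer(String data) {
    // If the string is null or blank, return VARCHAR (the comment matches the return).
    if (data == null || data.trim().isEmpty()) {
      return SketchType.VARCHAR;
    }
    String s = data.trim();

    if (s.equalsIgnoreCase("true") || s.equalsIgnoreCase("false")) {
      return SketchType.BIT;
    }

    // Note: Double.parseDouble also accepts forms such as "NaN" and "1e3".
    try {
      Double.parseDouble(s);
      return SketchType.FLOAT8;
    } catch (NumberFormatException e) {
      // Not a double; try the next candidate type.
    }

    // ISO-8601 local date, e.g. 2023-08-08
    try {
      LocalDate.parse(s);
      return SketchType.DATE;
    } catch (DateTimeParseException e) {
      // Not a date; try the next candidate type.
    }

    // ISO-8601 local date-time, e.g. 2023-08-08T12:30:00
    try {
      LocalDateTime.parse(s);
      return SketchType.TIMESTAMP;
    } catch (DateTimeParseException e) {
      // Not a timestamp; fall through to the string default.
    }

    return SketchType.VARCHAR;
  }
}
```

If UNKNOWN (or a nullable type) is what the PR actually intends for empty input, only the first branch and its comment would change; the point is that the two stay in sync.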
> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
> Key: DRILL-8450
> URL: https://issues.apache.org/jira/browse/DRILL-8450
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin. In similar
> fashion to other plugins, it adds a new configuration parameter, allTextMode,
> which, when set to true, reads all data as strings. The default is true.
> Note that the inference is limited to doubles, dates, timestamps, booleans,
> and strings.