[
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752112#comment-17752112
]
ASF GitHub Bot commented on DRILL-8450:
---------------------------------------
mbeckerle commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1287322034
##########
common/src/main/java/org/apache/drill/common/Typifier.java:
##########
@@ -88,6 +96,40 @@ public class Typifier {
// If a String contains any of these, try to evaluate it as an equation
private static final char[] MathCharacters = new char[]{'+', '-', '/', '*',
'='};
+ /**
+ * This function infers the Drill data type of unknown data.
+ * @param data The input text of unknown data type.
+ * @return A {@link MinorType} of the Drill data type.
+ */
+ public static MinorType typifyToDrill (String data) {
+ Entry<Class, String> result = Typifier.typify(data);
+ String dataType = result.getKey().getSimpleName();
+
+ // If the string is empty, return UNKNOWN
Review Comment:
Makes perfect sense.
For XML you need XSD to know what's potentially repeating.
Sometimes that is easy because of minOccurs/maxOccurs.
But there are also these "implied arrays".
```
<element name="a" type="xs:int"/><!
```
> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
> Key: DRILL-8450
> URL: https://issues.apache.org/jira/browse/DRILL-8450
> Project: Apache Drill
> Issue Type: Improvement
> Components: Format - XML
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin. As in other
> plugins, it adds a new configuration parameter, allTextMode, which, when set
> to true, reads all data as strings. The default is true.
> Note that the inference is limited to doubles, dates, timestamps, booleans and
> strings.
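
To make the description above concrete, here is a minimal sketch of text-based type inference over the five listed types. It is not the Typifier code from the PR: the class name `TypeInferenceSketch`, the `InferredType` enum, the `infer` method, and the choice of ISO-8601 date and timestamp formats are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

// Hypothetical sketch; names and parsing rules are assumptions, not Drill's API.
class TypeInferenceSketch {

  // The five outcomes named in the issue description.
  enum InferredType { BOOLEAN, DOUBLE, DATE, TIMESTAMP, STRING }

  // Plain decimal numbers with an optional sign, fraction and exponent.
  private static final Pattern NUMBER =
      Pattern.compile("[-+]?(\\d+\\.?\\d*|\\.\\d+)([eE][-+]?\\d+)?");

  static InferredType infer(String text, boolean allTextMode) {
    // In all-text mode, or with no usable text, skip inference entirely.
    // (Assumption: empty input is treated as STRING in this sketch.)
    if (allTextMode || text == null || text.trim().isEmpty()) {
      return InferredType.STRING;
    }
    String value = text.trim();
    if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
      return InferredType.BOOLEAN;
    }
    if (NUMBER.matcher(value).matches()) {
      return InferredType.DOUBLE;
    }
    try {
      LocalDate.parse(value); // ISO date, e.g. 2023-08-08
      return InferredType.DATE;
    } catch (DateTimeParseException e) {
      // Not a date; fall through to the next candidate.
    }
    try {
      LocalDateTime.parse(value); // ISO date-time, e.g. 2023-08-08T10:15:30
      return InferredType.TIMESTAMP;
    } catch (DateTimeParseException e) {
      // Not a timestamp; fall through to STRING.
    }
    return InferredType.STRING;
  }
}
```

Calling `TypeInferenceSketch.infer("2023-08-08", false)` yields `DATE`, while the same call with `allTextMode` set to `true` yields `STRING`. Checking the cheapest and least ambiguous candidates first keeps the fallback to `STRING` as the last step.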
--
This message was sent by Atlassian Jira
(v8.20.10#820010)