[ 
https://issues.apache.org/jira/browse/DRILL-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752112#comment-17752112
 ] 

ASF GitHub Bot commented on DRILL-8450:
---------------------------------------

mbeckerle commented on code in PR #2819:
URL: https://github.com/apache/drill/pull/2819#discussion_r1287322034


##########
common/src/main/java/org/apache/drill/common/Typifier.java:
##########
@@ -88,6 +96,40 @@ public class Typifier {
   // If a String contains any of these, try to evaluate it as an equation
   private static final char[] MathCharacters = new char[]{'+', '-', '/', '*', '='};
 
+  /**
+   * This function infers the Drill data type of unknown data.
+   * @param data The input text of unknown data type.
+   * @return A {@link MinorType} of the Drill data type.
+   */
+  public static MinorType typifyToDrill (String data) {
+    Entry<Class, String> result = Typifier.typify(data);
+    String dataType = result.getKey().getSimpleName();
+
+    // If the string is empty, return UNKNOWN

Review Comment:
   Makes perfect sense. 
   
   For XML you need XSD to know what's potentially repeating. 
   
   Sometimes that is easy because of minOccurs/maxOccurs.
   
   But there are also these "implied arrays".
   ```
   <element name="a" type="xs:int"/><!

> Add Data Type Inference to XML Format Plugin
> --------------------------------------------
>
>                 Key: DRILL-8450
>                 URL: https://issues.apache.org/jira/browse/DRILL-8450
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Format - XML
>    Affects Versions: 1.21.1
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 1.22.0
>
>
> This PR adds data type inference to the XML format plugin.  As with other
> plugins, it adds a new configuration parameter, allTextMode, which, when set
> to true, reads all data as strings.  The default is true.
> Note that the inference is limited to doubles, dates, timestamps, booleans,
> and strings.
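
For readers following the description above, here is a minimal sketch of what inference limited to those five types could look like. It is not the code in this PR and does not use Drill's Typifier or MinorType: the class TypeInferenceSketch, the InferredType enum, and the infer method are hypothetical names, and the ISO-8601 date and timestamp formats are an assumption. The UNKNOWN result for empty input mirrors the comment in the diff hunk.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical sketch only: not Drill's Typifier, and InferredType is not Drill's MinorType.
final class TypeInferenceSketch {

  enum InferredType { UNKNOWN, BOOLEAN, DOUBLE, DATE, TIMESTAMP, STRING }

  // allTextMode mirrors the configuration parameter described in the issue:
  // when true, every value is read as a string and no inference is attempted.
  static InferredType infer(String data, boolean allTextMode) {
    if (allTextMode) {
      return InferredType.STRING;
    }
    String text = data == null ? "" : data.trim();
    if (text.isEmpty()) {
      return InferredType.UNKNOWN;
    }
    if (text.equalsIgnoreCase("true") || text.equalsIgnoreCase("false")) {
      return InferredType.BOOLEAN;
    }
    try {
      // Lenient by design of Double.parseDouble: it also accepts forms such as "1d" or "NaN".
      Double.parseDouble(text);
      return InferredType.DOUBLE;
    } catch (NumberFormatException e) {
      // Not a number; fall through to the date checks.
    }
    try {
      LocalDate.parse(text); // assumes ISO-8601 dates, e.g. 2023-08-08
      return InferredType.DATE;
    } catch (DateTimeParseException e) {
      // Not a date; try a timestamp next.
    }
    try {
      LocalDateTime.parse(text); // assumes ISO-8601 timestamps, e.g. 2023-08-08T10:15:30
      return InferredType.TIMESTAMP;
    } catch (DateTimeParseException e) {
      // Nothing matched; treat the value as a string.
    }
    return InferredType.STRING;
  }
}
```

In this sketch integers such as "42" come back as DOUBLE, since the description lists doubles as the only numeric type. The order of the checks matters: the numeric check runs before the date checks so that a plain number is never offered to the date parsers.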



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
