jnturton commented on code in PR #2710:
URL: https://github.com/apache/drill/pull/2710#discussion_r1033143646


##########
contrib/format-xml/src/main/java/org/apache/drill/exec/store/xml/XMLReader.java:
##########
@@ -428,8 +435,67 @@ private void writeFieldData(String fieldName, String 
fieldValue, TupleWriter wri
       index = writer.addColumn(colSchema);
     }
     ScalarWriter colWriter = writer.scalar(index);
+    ColumnMetadata columnMetadata = writer.tupleSchema().metadata(index);
+    MinorType dataType = columnMetadata.schema().getType().getMinorType();
+    String dateFormat;
+
+    // Write the values depending on their data type.  This only applies to 
scalar fields.
     if (fieldValue != null && (currentState != xmlState.ROW_ENDED && 
currentState != xmlState.FIELD_ENDED)) {
-      colWriter.setString(fieldValue);
+      switch (dataType) {
+        case BIT:
+          colWriter.setBoolean(Boolean.parseBoolean(fieldValue));
+          break;
+        case TINYINT:
+        case SMALLINT:
+        case INT:
+          colWriter.setInt(Integer.parseInt(fieldValue));
+          break;
+        case BIGINT:
+          colWriter.setLong(Long.parseLong(fieldValue));
+          break;
+        case FLOAT4:
+        case FLOAT8:
+          colWriter.setDouble(Double.parseDouble(fieldValue));
+          break;
+        case DATE:
+          dateFormat = columnMetadata.property("drill.format");
+          LocalDate localDate;
+          if (Strings.isNullOrEmpty(dateFormat)) {
+            localDate = LocalDate.parse(fieldValue);
+          } else {
+            localDate = LocalDate.parse(fieldValue, 
DateTimeFormatter.ofPattern(dateFormat));
+          }
+          colWriter.setDate(localDate);
+          break;
+        case TIME:
+          dateFormat = columnMetadata.property("drill.format");
+          LocalTime localTime;
+          if (Strings.isNullOrEmpty(dateFormat)) {
+            localTime = LocalTime.parse(fieldValue);
+          } else {
+            localTime = LocalTime.parse(fieldValue, 
DateTimeFormatter.ofPattern(dateFormat));
+          }
+          colWriter.setTime(localTime);
+          break;
+        case TIMESTAMP:
+          dateFormat = columnMetadata.property("drill.format");
+          Instant timestamp = null;
+          if (Strings.isNullOrEmpty(dateFormat)) {
+            timestamp = Instant.parse(fieldValue);
+          } else {
+            try {
+              SimpleDateFormat simpleDateFormat = new 
SimpleDateFormat(dateFormat);
+              Date parsedDate = simpleDateFormat.parse(fieldValue);
+              timestamp = Instant.ofEpochMilli(parsedDate.getTime());
+            } catch (ParseException e) {

Review Comment:
   Not only dates and times but numerics and strings when invalid UTF-8 
sequences are encountered! I created 
[DRILL-8361](https://issues.apache.org/jira/browse/DRILL-8361) for this.
   
   I see we slipped up in the PDF plugin and we have two places where invalid 
data swallowing is present. Let's make both of them (XML and PDF) consistent 
with the rest of Drill (do not swallow invalid data) in this PR, and do the 
nulling of invalid data properly in 8361?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to