[ https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399049#comment-16399049 ]
Jiang Wu edited comment on DRILL-6242 at 3/14/18 6:24 PM: ---------------------------------------------------------- I think there are two causes for the above: 1) When outputting hierarchical data, Drill places the data inside JsonStringHashMap and JsonStringArrayList objects. Both of these classes use their own private static ObjectMapper to serialize the content to JSON string representation. However, the ObjectMapper does not have any configuration to serialize time based data types to their logical value. Instead, by using the default settings, the serialization will expose all the getters of the org.joda.time.DataTime class. 2) When the data is retrieved from Drill vectors for date, time, or timestamp, the code always use a org.joda.time.DateTime as the java object. See NullableDateVector.java:429 as an example. The result is that regardless of the underlying data being a date, a time, or a timestamp, the Java representation is always DateTime. This is not quite correct as it loses the actual type of the underlying data. Possible fixes: A) add a mixin for DateTime in 1) to output logical timestamp value. However, this won't fix 2) and the result is that all date, time, and timestamp will have the same output. B) do not use DateTime directly for 2). Instead, introduce 3 subclasses of DateTime for Drill Date, Drill Time, Drill Timestamp. These 3 subclass have built-in ser-de to output the right JSON representation. This is similar to how java.sql.Time, java.sql.Timestamp, java.sql.Date work. These 3 classes are subclass of java.util.Date, but with built-in serde for the proper string representation. was (Author: wu): I think there are two causes for the above: 1) When outputting hierarchical data, Drill places the data inside JsonStringHashMap and JsonStringArrayList objects. Both of these classes use their own private static ObjectMapper to serialize the content to JSON string representation. However, the ObjectMapper does not have any configuration to serialize time based data types to their logical value. Instead, by using the default settings, the serialization will expose all the getters of the org.joda.time.DataTime class. 2) When the data is retrieved from Drill vectors for date, time, or timestamp, the code always use a org.joda.time.DateTime as the java object. See NullableDateVector.java:429 as an example. The result is that regardless of the underlying data being a date, a time, or a timestamp, the Java representation is always DateTime. This is not quite correct as it loses the actual type of the underlying data. Possible fixes: A) add a mixin for DateTime in 1) to output logical timestamp value. However, this won't fix 2) and the result is that all date, time, and timestamp will have the same output. B) do not use DateTime directly for 2). Rather introduce 3 subclasses of DateTime for Drill Date, Drill Time, Drill Timestamp. These 3 subclass have built-in ser-de to output the right JSON representation. This is similar to how java.sql.Time, java.sql.Timestamp, java.sql.Date work. These 3 classes are subclass of java.util.Date, but with built-in serde for the proper string representation. > Output format for nested date, time, timestamp values in an object hierarchy > ---------------------------------------------------------------------------- > > Key: DRILL-6242 > URL: https://issues.apache.org/jira/browse/DRILL-6242 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Affects Versions: 1.12.0 > Reporter: Jiang Wu > Priority: Major > > Some storages (mapr db, mongo db, etc.) have hierarchical objects that > contain nested fields of date, time, timestamp types. When a query returns > these objects, the output format for the nested date, time, timestamp, are > showing the internal object (org.joda.time.DateTime), rather than the logical > data value. > For example. Suppose in MongoDB, we have a single object that looks like > this: > {code:java} > > db.test.findOne(); > { > "_id" : ObjectId("5aa8487d470dd39a635a12f5"), > "name" : "orange", > "context" : { > "date" : ISODate("2018-03-13T21:52:54.940Z"), > "user" : "jack" > } > } > {code} > Then connect Drill to the above MongoDB storage, and run the following query > within Drill: > {code:java} > > select t.context.`date`, t.context from test t; > +--------+---------+ > | EXPR$0 | context | > +--------+---------+ > | 2018-03-13 | > {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"} > | > {code} > We can see that from the above output, when the date field is retrieved as a > top level column, Drill outputs a logical date value. But when the same > field is within an object hierarchy, Drill outputs the internal object used > to hold the date value. > The expected output is the same display for whether the date field is shown > as a top level column or when it is within an object hierarchy: > {code:java} > > select t.context.`date`, t.context from test t; > +--------+---------+ > | EXPR$0 | context | > +--------+---------+ > | 2018-03-13 | {"date":"2018-03-13","user":"jack"} | > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)