[ 
https://issues.apache.org/jira/browse/DRILL-6242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399049#comment-16399049
 ] 

Jiang Wu edited comment on DRILL-6242 at 3/14/18 6:24 PM:
----------------------------------------------------------

I think there are two causes for the above:

1) When outputting hierarchical data, Drill places the data inside 
JsonStringHashMap and JsonStringArrayList objects.  Both of these classes use 
their own private static ObjectMapper to serialize the content to JSON string 
representation.  However, the ObjectMapper does not have any configuration to 
serialize time based data types to their logical value.  Instead, by using the 
default settings, the serialization will expose all the getters of the 
org.joda.time.DataTime class.

2) When the data is retrieved from Drill vectors for date, time, or timestamp, 
the code always use a org.joda.time.DateTime as the java object.  See 
NullableDateVector.java:429 as an example.  The result is that regardless of 
the underlying data being a date, a time, or a timestamp, the Java 
representation is always DateTime.  This is not quite correct as it loses the 
actual type of the underlying data.

Possible fixes:

A) add a mixin for DateTime in 1) to output logical timestamp value.  However, 
this won't fix 2) and the result is that all date, time, and timestamp will 
have the same output.

B) do not use DateTime directly for 2).  Instead, introduce 3 subclasses of 
DateTime for Drill Date, Drill Time, Drill Timestamp. These 3 subclass have 
built-in ser-de to output the right JSON representation.  This is similar to 
how java.sql.Time, java.sql.Timestamp, java.sql.Date work.  These 3 classes are 
subclass of java.util.Date, but with built-in serde for the proper string 
representation.

 


was (Author: wu):
I think there are two causes for the above:

1) When outputting hierarchical data, Drill places the data inside 
JsonStringHashMap and JsonStringArrayList objects.  Both of these classes use 
their own private static ObjectMapper to serialize the content to JSON string 
representation.  However, the ObjectMapper does not have any configuration to 
serialize time based data types to their logical value.  Instead, by using the 
default settings, the serialization will expose all the getters of the 
org.joda.time.DataTime class.

2) When the data is retrieved from Drill vectors for date, time, or timestamp, 
the code always use a org.joda.time.DateTime as the java object.  See 
NullableDateVector.java:429 as an example.  The result is that regardless of 
the underlying data being a date, a time, or a timestamp, the Java 
representation is always DateTime.  This is not quite correct as it loses the 
actual type of the underlying data.

Possible fixes:

A) add a mixin for DateTime in 1) to output logical timestamp value.  However, 
this won't fix 2) and the result is that all date, time, and timestamp will 
have the same output.

B) do not use DateTime directly for 2).  Rather introduce 3 subclasses of 
DateTime for Drill Date, Drill Time, Drill Timestamp. These 3 subclass have 
built-in ser-de to output the right JSON representation.  This is similar to 
how java.sql.Time, java.sql.Timestamp, java.sql.Date work.  These 3 classes are 
subclass of java.util.Date, but with built-in serde for the proper string 
representation.

 

> Output format for nested date, time, timestamp values in an object hierarchy
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-6242
>                 URL: https://issues.apache.org/jira/browse/DRILL-6242
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Jiang Wu
>            Priority: Major
>
> Some storages (mapr db, mongo db, etc.) have hierarchical objects that 
> contain nested fields of date, time, timestamp types.  When a query returns 
> these objects, the output format for the nested date, time, timestamp, are 
> showing the internal object (org.joda.time.DateTime), rather than the logical 
> data value.
> For example.  Suppose in MongoDB, we have a single object that looks like 
> this:
> {code:java}
> > db.test.findOne();
> {
>     "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>     "name" : "orange",
>     "context" : {
>         "date" : ISODate("2018-03-13T21:52:54.940Z"),
>         "user" : "jack"
>     }
> }
> {code}
> Then connect Drill to the above MongoDB storage, and run the following query 
> within Drill:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> +--------+---------+ 
> | EXPR$0 | context | 
> +--------+---------+ 
> | 2018-03-13 | 
> {"date":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,"era":1,"millisOfDay":78774940,"weekOfWeekyear":11,"weekyear":2018,"monthOfYear":3,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"millisOfSecond":940,"secondOfMinute":54,"secondOfDay":78774,"minuteOfHour":52,"minuteOfDay":1312,"hourOfDay":21,"zone":{"fixed":true,"id":"UTC"},"millis":1520977974940,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"afterNow":false,"beforeNow":true,"equalNow":false},"user":"jack"}
>  |
> {code}
> We can see that from the above output, when the date field is retrieved as a 
> top level column, Drill outputs a logical date value.  But when the same 
> field is within an object hierarchy, Drill outputs the internal object used 
> to hold the date value.
> The expected output is the same display for whether the date field is shown 
> as a top level column or when it is within an object hierarchy:
> {code:java}
> > select t.context.`date`, t.context from test t; 
> +--------+---------+ 
> | EXPR$0 | context | 
> +--------+---------+ 
> | 2018-03-13 | {"date":"2018-03-13","user":"jack"} |
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to