[ https://issues.apache.org/jira/browse/DRILL-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553744#comment-15553744 ]
ASF GitHub Bot commented on DRILL-4373:
---------------------------------------

Github user bitblender commented on a diff in the pull request:

    https://github.com/apache/drill/pull/600#discussion_r82314071

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetReaderUtility.java ---
    @@ -45,4 +53,34 @@ public static int getIntFromLEBytes(byte[] input, int start) {
         }
         return out;
       }
    +
    +  /**
    +   * Utilities for converting from parquet INT96 binary (impala, hive timestamp)
    +   * to date time value. This utilizes the Joda library.
    +   */
    +  public static class NanoTimeUtils {
    +
    +    public static final long NANOS_PER_DAY = TimeUnit.DAYS.toNanos(1);
    +    public static final long NANOS_PER_HOUR = TimeUnit.HOURS.toNanos(1);
    +    public static final long NANOS_PER_MINUTE = TimeUnit.MINUTES.toNanos(1);
    +    public static final long NANOS_PER_SECOND = TimeUnit.SECONDS.toNanos(1);
    +    public static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);
    +
    +    /**
    +     * @param binaryTimeStampValue
    +     *          hive, impala timestamp values with nanoseconds precision
    +     *          are stored in parquet Binary as INT96
    +     *
    +     * @return the number of milliseconds since January 1, 1970, 00:00:00 GMT
    +     *         represented by @param binaryTimeStampValue .
    +     */
    +    public static long getDateTimeValueFromBinary(Binary binaryTimeStampValue) {
    +      NanoTime nt = NanoTime.fromBinary(binaryTimeStampValue);
    +      int julianDay = nt.getJulianDay();
    +      long nanosOfDay = nt.getTimeOfDayNanos();
    +      return DateTimeUtils.fromJulianDay(julianDay-0.5d) + nanosOfDay/NANOS_PER_MILLISECOND;
    --- End diff --

    1. I would recommend not using Joda. Do the calculations directly, as in ConvertFromImpalaTimestamp. Joda uses non-standard, and hence confusing, terminology: what Joda calls and uses as JulianDay is actually the Julian Date. It seems you have identified this discrepancy and adjusted for it by subtracting 0.5 from _julianDay_.
    Note (I guess you have already figured this out): in ConvertFromImpalaTimestamp, the actual code and the Joda code in the comment are inconsistent. It took me a day to figure out the reason behind this! A bug should be opened to delete the comment.

    2. Please also leave a comment stating that 2440588 is the JDN (Julian Day Number) for the Unix Epoch.

    3. Please leave a comment stating that the order of the calls to get _julianDay_ and _nanosOfDay_ matters. You can do this by just stating how timestamps are stored in INT96, i.e. a 32-bit JDN followed by a 64-bit nanosOfDay.

    4. Consistent (single or no) spacing around the binary operators (+, -, /) used here would be nice. Single spacing would be preferable.


> Drill and Hive have incompatible timestamp representations in parquet
> ---------------------------------------------------------------------
>
>                 Key: DRILL-4373
>                 URL: https://issues.apache.org/jira/browse/DRILL-4373
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Hive, Storage - Parquet
>    Affects Versions: 1.8.0
>            Reporter: Rahul Challapalli
>            Assignee: Karthikeyan Manivannan
>              Labels: doc-impacting
>             Fix For: 1.9.0
>
>
> git.commit.id.abbrev=83d460c
> I created a parquet file with a timestamp type using Drill. Now if I define a
> hive table on top of the parquet file and use "timestamp" as the column type,
> drill fails to read the hive table through the hive storage plugin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
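
For illustration, the direct calculation recommended in point 1 of the review comment might look roughly like the sketch below. It assumes the Julian Day Number and the nanoseconds-of-day have already been extracted from the INT96 value. The class name, method name, and constant names are hypothetical and are not taken from the Drill code base. The value 2440588 is the JDN of the Unix Epoch, as stated in point 2.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not the code from the pull request.
public class Int96TimestampSketch {

  // Julian Day Number (JDN) of the Unix Epoch, 1970-01-01.
  public static final long JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH = 2440588L;
  public static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);
  public static final long NANOS_PER_MILLISECOND = TimeUnit.MILLISECONDS.toNanos(1);

  /**
   * Converts a (JDN, nanoseconds-of-day) pair to milliseconds since the
   * Unix Epoch without going through Joda. Sub-millisecond precision is
   * truncated by the integer division.
   */
  public static long toEpochMillis(int julianDay, long nanosOfDay) {
    return (julianDay - JULIAN_DAY_NUMBER_FOR_UNIX_EPOCH) * MILLIS_PER_DAY
        + nanosOfDay / NANOS_PER_MILLISECOND;
  }

  public static void main(String[] args) {
    // The epoch day itself, at midnight.
    System.out.println(toEpochMillis(2440588, 0L));          // prints 0
    // One day later, 1.5 ms after midnight (truncated to 1 ms).
    System.out.println(toEpochMillis(2440589, 1_500_000L));  // prints 86400001
  }
}
```

Working in whole days relative to the epoch JDN keeps the arithmetic in integers, so the 0.5-day adjustment needed with Joda's fromJulianDay does not arise.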