[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800394#comment-16800394 ]

ASF GitHub Bot commented on DRILL-7077:
---------------------------------------

cgivre commented on pull request #1680: DRILL-7077: Add Function to Facilitate Time Series Analysis
URL: https://github.com/apache/drill/pull/1680#discussion_r268484737
 
 

 ##########
 File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/NearestDateUtils.java
 ##########
 @@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.udfs;
+
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+
+import java.time.temporal.TemporalAdjusters;
+import java.time.LocalDateTime;
+import java.time.DayOfWeek;
+import java.time.temporal.ChronoUnit;
+import java.util.Arrays;
+
+public class NearestDateUtils {
+  /**
+   * Specifies the time grouping to be used with the nearest date function
+   */
+  private enum TimeInterval {
+    YEAR,
+    QUARTER,
+    MONTH,
+    WEEK_SUNDAY,
+    WEEK_MONDAY,
+    DAY,
+    HOUR,
+    HALF_HOUR,
+    QUARTER_HOUR,
+    MINUTE,
+    HALF_MINUTE,
+    QUARTER_MINUTE,
+    SECOND
+  }
+
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(NearestDateUtils.class);
+
+  /**
+   * Takes a Java LocalDateTime object and an interval string, and returns
+   * the start of the interval that contains that time. For instance, if you
+   * specify the date as 2018-05-04 and YEAR, the function will return 2018-01-01.
+   *
+   * @param d        the original datetime before adjustments
+   * @param interval the name of the time interval to round the supplied date down to
+   * @return the modified LocalDateTime
+   */
+  public static final LocalDateTime getDate(LocalDateTime d, String interval) {
+    LocalDateTime newDate = d;
+    int year = d.getYear();
+    int month = d.getMonth().getValue();
+    int day = d.getDayOfMonth();
+    int hour = d.getHour();
+    int minute = d.getMinute();
+    int second = d.getSecond();
+    TimeInterval adjustmentAmount;
+    try {
+      adjustmentAmount = TimeInterval.valueOf(interval.toUpperCase());
+    } catch (IllegalArgumentException e) {
+      throw new DrillRuntimeException(String.format(
+          "[%s] is not a valid time statement. Expecting: %s",
+          interval, Arrays.asList(TimeInterval.values())));
+    }
+    switch (adjustmentAmount) {
+      case YEAR:
+        newDate = LocalDateTime.of(year, 1, 1, 0, 0, 0);
+        break;
+      case QUARTER:
+        // getMonth().getValue() is 1-based; shift to 0-based so that March maps to January
+        newDate = LocalDateTime.of(year, ((month - 1) / 3) * 3 + 1, 1, 0, 0, 0);
+        break;
+      case MONTH:
+        newDate = LocalDateTime.of(year, month, 1, 0, 0, 0);
+        break;
+      case WEEK_SUNDAY:
+        newDate = newDate.with(TemporalAdjusters.previousOrSame(DayOfWeek.SUNDAY))
+                .truncatedTo(ChronoUnit.DAYS);
+        break;
+      case WEEK_MONDAY:
+        newDate = newDate.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY))
+                .truncatedTo(ChronoUnit.DAYS);
+        break;
+      case DAY:
+        newDate = LocalDateTime.of(year, month, day, 0, 0, 0);
+        break;
+      case HOUR:
+        newDate = LocalDateTime.of(year, month, day, hour, 0, 0);
+        break;
+      case HALF_HOUR:
+        if (minute >= 30) {
+          minute = 30;
+        } else {
+          minute = 0;
+        }
+        newDate = LocalDateTime.of(year, month, day, hour, minute, 0);
+        break;
+      case QUARTER_HOUR:
+        if (minute >= 45) {
+          minute = 45;
+        } else if (minute >= 30) {
+          minute = 30;
+        } else if (minute >= 15) {
+          minute = 15;
+        } else {
+          minute = 0;
+        }
+        newDate = LocalDateTime.of(year, month, day, hour, minute, 0);
+        break;
+      case MINUTE:
+        newDate = LocalDateTime.of(year, month, day, hour, minute, 0);
+        break;
+      case HALF_MINUTE:
+        if (second >= 30) {
+          second = 30;
+        } else {
+          second = 0;
+        }
+        newDate = LocalDateTime.of(year, month, day, hour, minute, second);
+        break;
+      case QUARTER_MINUTE:
+        if (second >= 45) {
 
 Review comment:
   What's the advantage of this? Wouldn't this add additional overhead from the function calls and loops?
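The overhead question can be weighed against a branch-free alternative: the HALF_HOUR, QUARTER_HOUR, HALF_MINUTE and QUARTER_MINUTE ladders in the diff all round one field down to a multiple of a fixed step, which a single integer division does with no loops and no extra branches. The quarter-start month follows the same pattern once the 1-based month is shifted to 0-based. This is a minimal sketch, not code from the PR; the class and method names (`BucketMath`, `floorToStep`, `quarterStartMonth`) are hypothetical.

```java
import java.time.LocalDateTime;

// Hypothetical helper illustrating arithmetic bucketing; not part of the PR.
public class BucketMath {

  // Rounds a non-negative field value down to the nearest multiple of step,
  // e.g. floorToStep(47, 15) == 45. Equivalent to the if/else ladders.
  public static int floorToStep(int value, int step) {
    return (value / step) * step;
  }

  // Maps a 1-based month (1-12) to the first month of its quarter (1, 4, 7, 10).
  public static int quarterStartMonth(int month) {
    return ((month - 1) / 3) * 3 + 1;
  }

  public static void main(String[] args) {
    LocalDateTime t = LocalDateTime.of(2019, 3, 24, 10, 47, 52);

    // QUARTER_HOUR: minutes rounded down to 0, 15, 30 or 45; seconds cleared
    LocalDateTime quarterHour = t.withMinute(floorToStep(t.getMinute(), 15))
        .withSecond(0).withNano(0);
    // QUARTER_MINUTE: seconds rounded down to 0, 15, 30 or 45
    LocalDateTime quarterMinute = t.withSecond(floorToStep(t.getSecond(), 15))
        .withNano(0);
    // QUARTER: midnight on the first day of the quarter's first month
    LocalDateTime quarter =
        LocalDateTime.of(t.getYear(), quarterStartMonth(t.getMonthValue()), 1, 0, 0);

    System.out.println(quarterHour);   // 2019-03-24T10:45
    System.out.println(quarterMinute); // 2019-03-24T10:47:45
    System.out.println(quarter);       // 2019-01-01T00:00
  }
}
```

Whether this is measurably faster than the ladders has not been benchmarked here; the clearer gain is that one helper covers every sub-unit bucket.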
 


> Add Function to Facilitate Time Series Analysis
> -----------------------------------------------
>
>                 Key: DRILL-7077
>                 URL: https://issues.apache.org/jira/browse/DRILL-7077
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.16.0
>
>
> When analyzing time-based data, you will often have to aggregate by time 
> grains. While some time grains are easy to calculate, others, such as 
> quarter, can be quite difficult. These functions enable a user to quickly and 
> easily aggregate data by various units of time. Usage is as follows:
> {code:java}
> SELECT <fields>
> FROM <data>
> GROUP BY nearestDate(<timestamp_column>, <time increment>){code}
> For example, if a user wanted to count the number of hits on a web server 
> per 15-minute interval, the query might look like this:
> {code:java}
> SELECT nearestDate(`eventDate`, 'QUARTER_HOUR') AS eventDate,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, 'QUARTER_HOUR'){code}
> Currently supports the following time units:
>  * YEAR
>  * QUARTER
>  * MONTH
>  * WEEK_SUNDAY
>  * WEEK_MONDAY
>  * DAY
>  * HOUR
>  * HALF_HOUR
>  * QUARTER_HOUR
>  * MINUTE
>  * HALF_MINUTE
>  * QUARTER_MINUTE
>  * SECOND
>  
>  
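To sanity-check query results outside Drill, the per-15-minute hit count from the example query can be mimicked in plain Java: round each event timestamp down to the start of its quarter hour, then count events per bucket. This is a minimal sketch assuming events are already parsed into `LocalDateTime` values; the class name `HitsPerQuarterHour` and the sample timestamps are hypothetical, and this is not how Drill executes the query.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of GROUP BY on a 15-minute bucket; not Drill code.
public class HitsPerQuarterHour {

  // Round down to the start of the enclosing 15-minute window.
  static LocalDateTime quarterHour(LocalDateTime t) {
    LocalDateTime hour = t.truncatedTo(ChronoUnit.HOURS);
    return hour.plusMinutes((t.getMinute() / 15) * 15);
  }

  // Count events per bucket; TreeMap keeps buckets sorted by start time.
  public static Map<LocalDateTime, Long> countByQuarterHour(List<LocalDateTime> events) {
    return events.stream()
        .collect(Collectors.groupingBy(HitsPerQuarterHour::quarterHour,
            TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    List<LocalDateTime> events = Arrays.asList(
        LocalDateTime.of(2019, 3, 24, 10, 2, 11),
        LocalDateTime.of(2019, 3, 24, 10, 14, 59),
        LocalDateTime.of(2019, 3, 24, 10, 15, 0),
        LocalDateTime.of(2019, 3, 24, 10, 47, 30));
    // {2019-03-24T10:00=2, 2019-03-24T10:15=1, 2019-03-24T10:45=1}
    System.out.println(countByQuarterHour(events));
  }
}
```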



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
