[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis

Bridget Bevens (JIRA) Tue, 09 Apr 2019 16:28:39 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813895#comment-16813895
 ]


Bridget Bevens commented on DRILL-7077:
---------------------------------------

Hi [~cgivre],

I'm trying this function and may be doing something wrong, but 15SECOND and 
30SECOND are not working for me:

select nearestdate(CAST(COLUMNS[2] as timestamp), '30SECOND') as nearest_second 
from dfs.samples.`/bee/time.csv`;

Error: SYSTEM ERROR: DrillRuntimeException: [30SECOND] is not a valid time 
statement. Expecting: [YEAR, QUARTER, MONTH, WEEK_SUNDAY, WEEK_MONDAY, DAY, 
HOUR, HALF_HOUR, QUARTER_HOUR, MINUTE, HALF_MINUTE, QUARTER_MINUTE, SECOND]

Fragment 0:0

Please, refer to logs for more information.

[Error Id: f119202e-ec24-4670-83c2-14b4a7f83ebf on doc23.lab:31010] 
(state=,code=0)

apache drill> select nearestdate(CAST(COLUMNS[2] as timestamp), 'SECOND') as 
nearest_second from dfs.samples.`/bee/time.csv`;
+-----------------------+
|    nearest_second     |
+-----------------------+
| 2018-01-01 05:10:15.0 |
| 2017-02-02 01:02:03.0 |
| 2003-04-06 07:11:11.0 |
+-----------------------+
3 rows selected (0.191 seconds)  

Are 15SECOND and 30SECOND supported?

Thanks,
Bridget


> Add Function to Facilitate Time Series Analysis
> -----------------------------------------------
>
>                 Key: DRILL-7077
>                 URL: https://issues.apache.org/jira/browse/DRILL-7077
>             Project: Apache Drill
>          Issue Type: New Feature
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.16.0
>
>
> When analyzing time based data, you will often have to aggregate by time 
> grains. While some time grains will be easy to calculate, others, such as 
> quarter, can be quite difficult. These functions enable a user to quickly and 
> easily aggregate data by various units of time. Usage is as follows:
> {code:java}
> SELECT <fields>
> FROM <data>
> GROUP BY nearestDate(<timestamp_column>, <time increment>{code}
> So let's say that a user wanted to count the number of hits on a web server 
> per 15 minute, the query might look like this:
> {code:java}
> SELECT nearestDate(`eventDate`, '15MINUTE' ) AS eventDate,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, '15MINUTE'){code}
> Currently supports the following time units:
>  * YEAR
>  * QUARTER
>  * MONTH
>  * WEEK_SUNDAY
>  * WEEK_MONDAY
>  * DAY
>  * HOUR
>  * HALF_HOUR / 30MIN
>  * QUARTER_HOUR / 15MIN
>  * MINUTE
>  * 30SECOND
>  * 15SECOND
>  * SECOND
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis

Reply via email to