[jira] [Comment Edited] (CALCITE-3737) HOP Table-valued Function

Rui Wang (Jira) Thu, 23 Jan 2020 14:16:26 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022551#comment-17022551
 ]


Rui Wang edited comment on CALCITE-3737 at 1/23/20 10:15 PM:
-------------------------------------------------------------

I addressed your comments and have responses to two of them:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and implement only the windowing part 
(computing window_start and window_end) separately. I later gave that up 
because hopping needs to call one function that returns a list of 
window_start and window_end pairs, and we do not know the size of that list in 
advance, so we cannot really write a for loop in Java. (Note that I need to 
build a list of linq4j expressions; you can check the discussion here: 
[link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).
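
To illustrate why the number of windows per input row is not fixed, here is a 
minimal plain-Java sketch of hopping-window assignment. It is not the code in 
the pull request: the class HopWindows, the method assign, and the assumption 
that window starts are aligned to multiples of the hop size (counted from 
zero) are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not Calcite code): assigns a timestamp to hopping windows. */
final class HopWindows {
  private HopWindows() {}

  /**
   * Returns {windowStart, windowEnd} pairs, earliest first, for every
   * half-open window [start, start + size) that contains ts.
   * Assumption: window starts are multiples of slide; all values share one unit.
   */
  static List<long[]> assign(long ts, long size, long slide) {
    if (size <= 0 || slide <= 0) {
      throw new IllegalArgumentException("size and slide must be positive");
    }
    List<long[]> windows = new ArrayList<>();
    // Latest aligned window start that is <= ts.
    long lastStart = ts - Math.floorMod(ts, slide);
    // Step back one slide at a time while the window still covers ts.
    for (long start = lastStart; start > ts - size; start -= slide) {
      windows.add(new long[] {start, start + size});
    }
    Collections.reverse(windows);
    return windows;
  }

  public static void main(String[] args) {
    // 8:07 expressed in minutes since midnight, dur = 10, hopsize = 5.
    for (long[] w : assign(8 * 60 + 7, 10, 5)) {
      System.out.println("[" + w[0] + ", " + w[1] + ")");
    }
    // prints "[480, 490)" then "[485, 495)", i.e. 8:00-8:10 and 8:05-8:15
  }
}
```

The returned list has roughly size / slide entries when windows overlap and 
can be empty when slide is larger than size, which is why the result size 
depends on the arguments and on the timestamp itself.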

Also, considering that I will later add per-key sessionization and 
bucket_gap_filling table functions, their code will be even more complicated 
and even less sharable. For example, per-key sessionization needs to see all 
of the data first and then sort it to find each window start and window end. I 
would therefore prefer to implement those the same way hopping is implemented 
(e.g. by providing an AbstractEnumerable<Object[]> implementation).
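
As a rough illustration of why sessionization has to see all rows of a key 
before it can emit anything, here is a minimal plain-Java sketch. The class 
SessionWindows, its methods, and the rule that an event joins the current 
session when it arrives strictly less than gap after the previous event are 
assumptions made for this example, not the planned Calcite implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not Calcite code): gap-based session windows. */
final class SessionWindows {
  private SessionWindows() {}

  /** Returns {sessionStart, sessionEnd} pairs for one key's timestamps. */
  static List<long[]> sessions(List<Long> timestamps, long gap) {
    // All timestamps must be known up front so they can be sorted.
    List<Long> sorted = new ArrayList<>(timestamps);
    Collections.sort(sorted);
    List<long[]> result = new ArrayList<>();
    long[] current = null;
    for (long ts : sorted) {
      if (current != null && ts < current[1]) {
        // Within gap of the session's last event: extend the session end.
        current[1] = ts + gap;
      } else {
        // Too far from the previous event (or first event): start a new session.
        current = new long[] {ts, ts + gap};
        result.add(current);
      }
    }
    return result;
  }

  /** Applies sessions() independently to each key. */
  static Map<String, List<long[]>> byKey(Map<String, List<Long>> timestampsByKey, long gap) {
    Map<String, List<long[]>> out = new LinkedHashMap<>();
    for (Map.Entry<String, List<Long>> e : timestampsByKey.entrySet()) {
      out.put(e.getKey(), sessions(e.getValue(), gap));
    }
    return out;
  }
}
```

Unlike hopping, where each row's windows depend only on that row, a session 
boundary here depends on the neighbouring rows of the same key, so there is 
little windowing logic the two could share.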

As I build more table functions and add support for streaming SQL, if I find a 
better way to unify the table function implementations, I will add patches for 
that.

>Changes to reference.md need some copy-editing.
I checked the changes in reference.md and made some edits. However, I am not a 
native English speaker, so I might not have fixed what you had in mind.




was (Author: amaliujia):
I addressed your comments and have responses to two of them:

> Can HOP and TUMBLE share implementation?
I tried to share most of the code and implement only the windowing part 
(computing window_start and window_end) separately. I later gave that up 
because hopping needs to call one function that returns a list of 
window_start and window_end pairs, and we do not know the size of that list in 
advance, so we cannot really write a for loop in Java. (Note that I need to 
build a list of linq4j expressions; you can check the discussion here: 
[link|https://lists.apache.org/thread.html/86e5aa132de0656419843cab6c1f4fbea5941d4401dbde36cc11827e%40%3Cdev.calcite.apache.org%3E]).

Also, considering that I will later add per-key sessionization and 
bucket_gap_filling table functions, which will need even more complicated 
code, I would prefer to implement those the same way hopping is implemented 
(e.g. by providing an AbstractEnumerable<Object[]> implementation).


>Changes to reference.md need some copy-editing.
I checked the changes in reference.md and made some edits. However, I am not a 
native English speaker, so I might not have fixed what you had in mind.



> HOP Table-valued Function
> -------------------------
>
>                 Key: CALCITE-3737
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3737
>             Project: Calcite
>          Issue Type: Sub-task
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Hopping windows place intervals of a fixed size evenly spaced across event 
> time. Most importantly, in the most common usage, a given event-time 
> timestamp will generally fall into more than one window.
> The table-valued function Hop may produce zero, one, or multiple rows 
> corresponding to each row of input.  Hop takes four required parameters and 
> one optional parameter. All parameters are analogous to those for Tumble 
> except for hopsize, which specifies the duration between the starting points 
> (and endpoints) of the hopping windows, allowing for overlapping windows 
> (hopsize < dur, common) or gaps in the data (hopsize > dur, rarely useful).
> {code:java}
> Hop (data , timecol , dur, hopsize)
> {code}
> The return value of Hop is a relation that includes all columns of data as 
> well as additional event time columns wstart and wend. Here is an example 
> (from https://s.apache.org/streaming-beam-sql ):
> {code:sql}
> SELECT *
>       FROM Hop (
>         data    => TABLE Bids ,
>         timecol => DESCRIPTOR ( bidtime ) ,
>         dur     => INTERVAL '10' MINUTES ,
>         hopsize => INTERVAL '5' MINUTES );
> ------------------------------------------
> | wstart | wend | bidtime | price | item |
> ------------------------------------------
> | 8:00   | 8:10 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:07    | $2    | A    |
> | 8:05   | 8:15 | 8:11    | $3    | B    |
> | 8:10   | 8:20 | 8:11    | $3    | B    |
> | 8:00   | 8:10 | 8:05    | $4    | C    |
> | 8:05   | 8:15 | 8:05    | $4    | C    |
> | 8:00   | 8:10 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:09    | $5    | D    |
> | 8:05   | 8:15 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:13    | $1    | E    |
> | 8:10   | 8:20 | 8:17    | $6    | F    |
> | 8:15   | 8:25 | 8:17    | $6    | F    |
> ------------------------------------------
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
