[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038095#comment-17038095 ]

John Mora commented on BEAM-9198:
---------------------------------

Hi.

I am John Mora, a student at UTPL, and I am interested in participating in the
GSoC program. I am currently a committer and PMC member of the Apache Gora
project, and I have some experience with distributed storage for data analytics
(e.g., Apache Kudu), Java programming, and SQL, so this issue caught my
attention. I was wondering if you could share more details about it.

I noticed that the Beam SQL extensions are only implemented for the Java SDK,
so this project would only involve working in that SDK, right?
According to the documentation, Beam supports two SQL dialects (Calcite SQL and
ZetaSQL). Will these new aggregation functions be implemented in both dialects?

Finally, are there other implementations of aggregation functions (or something
similar) in other SDKs that I could look at? I would really appreciate any
resources or examples that I could analyze.


Best regards, 
John.

> BeamSQL aggregation analytics functions 
> ----------------------------------------
>
>                 Key: BEAM-9198
>                 URL: https://issues.apache.org/jira/browse/BEAM-9198
>             Project: Beam
>          Issue Type: Task
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Priority: Major
>              Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of aggregation and analytic aggregation functions
> to support.
> To begin with, you will need to support this syntax:
> analytic_function_name ( [ argument_list ] )
>   OVER (
>     [ PARTITION BY partition_expression_list ]
>     [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
>     [ window_frame_clause ]
>   )
> This will require touching the core components of BeamSQL:
> 1. The SQL parser, to support the syntax above.
> 2. The SQL core, to implement the physical relational operators.
> 3. Distributed algorithms, to implement the list of functions in a distributed
> manner.
> 4. Benchmarks, to measure the performance of your implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
