[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Wang updated BEAM-9198:
---------------------------
    Labels: gsoc gsoc2020 mentor  (was: gsoc gsoc2020 mentor stale-assigned)

> BeamSQL aggregation analytics functionality
> --------------------------------------------
>
>                 Key: BEAM-9198
>                 URL: https://issues.apache.org/jira/browse/BEAM-9198
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Assignee: John Mora
>            Priority: P2
>              Labels: gsoc, gsoc2020, mentor
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Mentor email: ruw...@google.com. Feel free to email me with your questions.
> Project Information
> ---------------------
> BeamSQL has a long list of aggregation and analytic (window) functions
> still to support.
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
>     [ PARTITION BY partition_expression_list ]
>     [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
>     [ window_frame_clause ]
>   )
> {code}
> As there is a long list of analytic functions, a good starting point is to
> support rank() first.
> This will require touching the core components of BeamSQL:
> 1. SQL parser, to support the syntax above.
> 2. SQL core, to implement the physical relational operator.
> 3. Distributed algorithms, to implement the list of functions in a distributed
> manner.
> 4. Enabling the feature in the ZetaSQL dialect.
> To understand what SQL analytic functions are, you can check this
> explanation doc:
> https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.
> To learn about Beam's programming model, check:
> https://beam.apache.org/documentation/programming-guide/#overview

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
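
For readers unfamiliar with the rank() semantics that the issue names as a starting point, here is a minimal single-machine sketch in plain Python. It is not Beam or BeamSQL code and says nothing about how the feature is or will be implemented there; the function name `rank_over` and its parameters are hypothetical, chosen only for illustration. It shows what `RANK() OVER (PARTITION BY ... ORDER BY ...)` is expected to return: rows are grouped by the partition key, sorted within each group, and tied rows share a rank with a gap after them.

```python
from collections import defaultdict


def rank_over(rows, partition_key, order_key, descending=False):
    """Illustrative RANK() OVER (PARTITION BY partition_key ORDER BY order_key).

    Hypothetical helper, not part of any Beam API. Returns new row dicts
    with an extra "rank" field; ties share a rank and leave a gap after them.
    """
    # PARTITION BY: group rows by the partition key.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    ranked = []
    for key in sorted(partitions):
        # ORDER BY: sort rows inside each partition.
        part = sorted(partitions[key], key=lambda r: r[order_key],
                      reverse=descending)
        prev_value = object()  # sentinel that never equals a real value
        rank = 0
        for position, row in enumerate(part, start=1):
            # A new ordering value takes the current position as its rank;
            # an equal value reuses the previous rank (a tie).
            if row[order_key] != prev_value:
                rank = position
                prev_value = row[order_key]
            ranked.append({**row, "rank": rank})
    return ranked


rows = [
    {"dept": "a", "salary": 10},
    {"dept": "a", "salary": 20},
    {"dept": "a", "salary": 20},
    {"dept": "b", "salary": 5},
]
result = rank_over(rows, "dept", "salary", descending=True)
```

In a distributed setting the per-partition sort is the hard part, since a single partition may not fit on one worker; this sketch deliberately ignores that and only pins down the expected output.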