[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rui Wang updated BEAM-9198:
---------------------------
    Description:
Mentor email: ruw...@google.com. Feel free to send emails for your questions.

Project Information
---------------------
BeamSQL has a long list of aggregation/aggregation analytics functionalities to support.

To begin with, you will need to support this syntax:
{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
    [ PARTITION BY partition_expression_list ]
    [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
    [ window_frame_clause ]
  )
{code}
As there is a long list of analytic functions, a good starting point is to support rank() first.

This will require touching core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement the physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed manner.
4. Enable in the ZetaSQL dialect.

To understand what SQL analytic functionality is, you could check this explanation doc: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To learn about Beam's programming model, check: https://beam.apache.org/documentation/programming-guide/#overview

  was:
Mentor email: ruw...@google.com. Feel free to send emails for your questions.

Project Information
---------------------
BeamSQL has a long list of aggregation/aggregation analytics functionalities to support.

To begin with, you will need to support this syntax:
{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
    [ PARTITION BY partition_expression_list ]
    [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
    [ window_frame_clause ]
  )
{code}
As there is a long list of analytic functions, a good starting point is to support rank() first.

This will require touching core components of BeamSQL:
1. SQL parser to support the syntax above.
2. SQL core to implement the physical relational operator.
3. Distributed algorithms to implement a list of functions in a distributed manner.
4. Build benchmarks to measure performance of your implementation.

To understand what SQL analytic functionality is, you could check this explanation doc: https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts.

To learn about Beam's programming model, check: https://beam.apache.org/documentation/programming-guide/#overview

> BeamSQL aggregation analytics functionality
> --------------------------------------------
>
>                 Key: BEAM-9198
>                 URL: https://issues.apache.org/jira/browse/BEAM-9198
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql
>            Reporter: Rui Wang
>            Priority: Major
>              Labels: gsoc, gsoc2020, mentor

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
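
For readers new to analytic functions, the rank() semantics requested in the issue can be sketched in a few lines of plain Python. This is only a single-process illustration and not BeamSQL code: the helper name `rank_over`, its parameters, and the sample rows are all hypothetical. It assumes one PARTITION BY column and one ORDER BY column, and it ignores the window frame clause. A distributed implementation would presumably replace the in-memory grouping with a per-key grouping step (for example Beam's GroupByKey) before sorting each partition.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def rank_over(
    rows: Iterable[Row],
    partition_by: str,
    order_by: str,
    descending: bool = False,
) -> List[Tuple[Row, int]]:
    """Emulate RANK() OVER (PARTITION BY partition_by ORDER BY order_by).

    Hypothetical helper for illustration only: rows with equal ORDER BY
    values share a rank, and the next distinct value skips ahead (1, 1, 3).
    """
    # Step 1: split the input into partitions by the PARTITION BY column.
    partitions: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[partition_by]].append(row)

    ranked: List[Tuple[Row, int]] = []
    for partition_rows in partitions.values():
        # Step 2: order each partition by the ORDER BY column.
        ordered = sorted(
            partition_rows, key=lambda r: r[order_by], reverse=descending
        )
        # Step 3: assign ranks; ties keep the rank of the first tied row.
        rank = 0
        previous_value = None
        for position, row in enumerate(ordered, start=1):
            if position == 1 or row[order_by] != previous_value:
                rank = position
                previous_value = row[order_by]
            ranked.append((row, rank))
    return ranked


# Made-up sample data: rank players within each team by descending score.
sample_rows = [
    {"team": "a", "player": "p1", "score": 10},
    {"team": "a", "player": "p2", "score": 30},
    {"team": "a", "player": "p3", "score": 30},
    {"team": "b", "player": "p4", "score": 5},
    {"team": "a", "player": "p5", "score": 20},
]

for row, rank in rank_over(sample_rows, "team", "score", descending=True):
    print(row["team"], row["player"], row["score"], rank)
```

The SQL this mimics would look like `RANK() OVER (PARTITION BY team ORDER BY score DESC)`. Sorting each partition fully in memory is the simplest choice for a sketch; it would not scale to partitions larger than a single worker's memory.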