[
https://issues.apache.org/jira/browse/SAMZA-483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275998#comment-14275998
]
Milinda Lakmal Pathirage commented on SAMZA-483:
------------------------------------------------
Here is another attempt at rephrasing my comment above.
Let's say that, for Samza Streaming SQL, a user starts by entering a streaming
query like the one below into a REPL:
SELECT ISTREAM field1, count(*) FROM InputStream1
WHERE someField >= 3 && someField <= 6
GROUP BY field1
INTO OutputStream1;
The process of converting this into a set of Samza jobs would be:
1. Parse the query (with ANTLR or a similar tool; this generates an AST)
2. Semantic analysis (we implement the semantic analysis phase on top of the
AST or some other model generated from the AST)
3. Optimizations
4. Generate the execution plan (Samza job)
Given that we are going with a CQL-based execution model, the execution plan
would be several extended relational algebra expressions connected together,
depending on the query.
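As a rough illustration only, here is a minimal Python sketch of how the query
above might look as a chain of extended relational algebra operators. The node
names (Scan, Window, Select, Aggregate, IStream, Insert) and the explain()
helper are hypothetical and are not an existing Samza API. The unbounded window
is an assumption, based on CQL applying an unbounded window by default when a
stream is referenced without a window clause.

```python
from dataclasses import dataclass, fields
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    """Leaf node: read tuples from a named input stream."""
    stream: str


@dataclass(frozen=True)
class Window:
    """Stream-to-relation operator. 'unbounded' is an assumed default."""
    child: "Node"
    spec: str = "unbounded"


@dataclass(frozen=True)
class Select:
    """Relational selection (the WHERE clause), predicate kept as text."""
    child: "Node"
    predicate: str


@dataclass(frozen=True)
class Aggregate:
    """Relational grouping and aggregation (GROUP BY plus count(*))."""
    child: "Node"
    group_by: Tuple[str, ...]
    aggregates: Tuple[str, ...]


@dataclass(frozen=True)
class IStream:
    """Relation-to-stream operator: emit tuples inserted into the relation."""
    child: "Node"


@dataclass(frozen=True)
class Insert:
    """Root node: write the resulting stream to a named output stream."""
    child: "Node"
    stream: str


Node = Union[Scan, Window, Select, Aggregate, IStream, Insert]


def explain(node: Node, depth: int = 0) -> str:
    """Render the plan as an indented tree, root first."""
    args = ", ".join(
        f"{f.name}={getattr(node, f.name)!r}"
        for f in fields(node)
        if f.name != "child"
    )
    line = "  " * depth + f"{type(node).__name__}({args})"
    child = getattr(node, "child", None)
    if child is None:
        return line
    return line + "\n" + explain(child, depth + 1)


# One possible plan for the example query, written root first.
plan = Insert(
    stream="OutputStream1",
    child=IStream(
        Aggregate(
            group_by=("field1",),
            aggregates=("count(*)",),
            child=Select(
                predicate="someField >= 3 && someField <= 6",
                child=Window(Scan("InputStream1")),
            ),
        )
    ),
)

print(explain(plan))
```

A tree like this could be what the parser and semantic analysis produce, what
the optimizer rewrites, or only what the final plan generator emits, which is
the question raised next.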
So, considering the above, where is this common representation going to sit?
The best model for our case will depend on the answer to this question.
Others may have a different view, so please feel free to comment with it.
> A common representation of relational algebra for streaming SQL
> ----------------------------------------------------------------
>
> Key: SAMZA-483
> URL: https://issues.apache.org/jira/browse/SAMZA-483
> Project: Samza
> Issue Type: Sub-task
> Reporter: Yi Pan (Data Infrastructure)
> Priority: Minor
> Labels: project
>
> Per discussion with [~criccomini] and [~milinda], we agreed that it seems to
> be a good idea to define a common representation of relational algebra on top
> of the operators defined in the operator layer (see SAMZA-482), which can be
> the common base that we can use to generate the description/configuration of
> a Samza job.
> This common layer can also serve as the output of a parser for a DSL-like
> language, i.e. the result of parsing a DSL program.
> Some requirements beyond pure relational algebra:
> 1) the common representation should include window operators and stream
> operators (i.e. IStream/DStream/RStream)
> 2) the common representation should include a description of the parallelism
> of the jobs (i.e. how many partitions the resultant Samza job will use)
> Some references:
> http://web.cs.wpi.edu/~mukherab/i/DCAPE.pdf
> https://cs.uwaterloo.ca/~david/cs848/stream-cql.pdf
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/publications.htm
> http://davis.wpi.edu/dsrg/PROJECTS/CAPE/slides.htm
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)