[jira] [Commented] (BEAM-301) Add a Beam SQL DSL

Tyler Akidau (JIRA) Thu, 06 Apr 2017 12:52:02 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959621#comment-15959621
 ]


Tyler Akidau commented on BEAM-301:
-----------------------------------

I agree with what I think Mingmin is saying here in that it's important to 
clearly distinguish between streams and tables. Though it's easy to interchange 
between them, they should not be treated as 1:1 equivalents. In a situation 
where you clearly have a source of one type or the other, that should dictate 
which type of primitive you start out with.

That said, I agree with Julian that STREAM, if kept around, is probably just an 
alternative way of specifying an EMIT clause that emits upon every new 
INSERT/UPDATE/DELETE. I think EMIT EARLY is probably too general, though. EMIT 
CHANGELOG, EMIT INSERTS, or even EMIT STREAM (though only because of the 
history of the STREAM keyword) are all a little more specific. But we don't need 
to bikeshed terms here.
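
To make "emits upon every new INSERT/UPDATE/DELETE" concrete, here is a minimal 
Python sketch of the stream/table relationship. It is an illustration only, not 
Beam or Calcite code: the `Change` record, `table_to_changelog`, and 
`changelog_to_table` are hypothetical names, and the table is assumed to be a 
simple key/value map.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical changelog record: (operation, key, value).
# This is an assumption for illustration, not a Beam or Calcite type.
Change = Tuple[str, str, Optional[int]]


def table_to_changelog(
    updates: Iterable[Tuple[str, Optional[int]]]
) -> Iterator[Change]:
    """Apply writes to a table and emit one change record per modification.

    Each input is (key, value); a value of None means "delete this key".
    Writes that leave the table unchanged emit nothing.
    """
    table: Dict[str, int] = {}
    for key, value in updates:
        if value is None:
            if key in table:
                del table[key]
                yield ("DELETE", key, None)
        elif key in table:
            if table[key] != value:
                table[key] = value
                yield ("UPDATE", key, value)
        else:
            table[key] = value
            yield ("INSERT", key, value)


def changelog_to_table(changes: Iterable[Change]) -> Dict[str, Optional[int]]:
    """Rebuild the table by replaying a changelog stream from the start."""
    table: Dict[str, Optional[int]] = {}
    for op, key, value in changes:
        if op == "DELETE":
            table.pop(key, None)
        else:  # INSERT and UPDATE both set the current value for the key
            table[key] = value
    return table


updates = [("a", 1), ("b", 2), ("a", 3), ("b", None)]
changelog = list(table_to_changelog(updates))
final_table = changelog_to_table(changelog)
```

The changelog holds four records while the final table holds one row, which is 
one way to see why the two are interchangeable but not 1:1 equivalents.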

I also agree that we should come to some agreement on a full specification for 
EMIT before forging ahead with any implementations (and to be clear, that's 
independent from the core SQL DSL stuff you're already putting in the feature 
branch, Mingmin; that work can proceed unimpeded while we sort out unified 
model semantics). In that vein, I've dedicated two chapters of my upcoming 
streaming systems book to the topic (one for the necessary background, and one 
on SQL specifically) as I've tried to sort the question out for myself. The 
book won't be out until later this summer at the earliest, though, so maybe in 
parallel I should try to condense that all into a specification doc we could 
iterate on in public? I'm not saying what I have there is necessarily the right 
answer, but it incorporates everything we've more or less agreed upon so far 
and then extends it a little further, so I think it's probably a good place to 
start from. It has triggering semantics via EMIT, the semantics necessary for 
temporal joins, and also addresses things like CUBE in a clean fashion. If we 
can come to agreement on a way forward in both the Beam and Calcite camps, then 
we're probably in a good position to forge ahead with implementation details.

Sound reasonable?

> Add a Beam SQL DSL
> ------------------
>
>                 Key: BEAM-301
>                 URL: https://issues.apache.org/jira/browse/BEAM-301
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-ideas
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Xu Mingmin
>
> The SQL DSL helps developers build a Beam pipeline directly from a SQL 
> statement given as a String. 
> In Phase I, it supports INSERT/SELECT queries with FILTERs. An example SQL 
> statement is shown below:
> {code}
> INSERT INTO `SUB_USEREVENT` (`SITEID`, `PAGEID`, `PAGENAME`, `EVENTTIMESTAMP`)
> (SELECT STREAM `USEREVENT`.`SITEID`, `USEREVENT`.`PAGEID`, 
> `USEREVENT`.`PAGENAME`, `USEREVENT`.`EVENTTIMESTAMP`
> FROM `USEREVENT` AS `USEREVENT`
> WHERE `USEREVENT`.`SITEID` > 10)
> {code}
> A design doc is available at 
> https://docs.google.com/document/d/1Uc5xYTpO9qsLXtT38OfuoqSLimH_0a1Bz5BsCROMzCU/edit?usp=sharing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
