[jira] [Commented] (SAMZA-390) High-Level Language for Samza

Yi Pan (Data Infrastructure) (JIRA) Wed, 03 Dec 2014 15:17:43 -0800

    [ https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233659#comment-14233659 ]


Yi Pan (Data Infrastructure) commented on SAMZA-390:
----------------------------------------------------

[~criccomini], good summary on Azure. Here are some of my comments:
{quote}
I like the TIMESTAMP BY syntax in Azure. It seems more flexible than a rigid 
timestamp field enforced in the data model. It also means a single stream can 
have multiple timestamp fields, rather than having to re-materialize messages 
every time a new field should be used as the timestamp.
{quote}
Good point on the option to have multiple instantiations of the time sequence 
out of the same stream. I would still argue for a default, system-injected 
timestamp when TIMESTAMP BY is not specified. That way, we avoid always 
relying on an application field that the publisher has to fill in.
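A minimal Python sketch of this fallback, assuming a hypothetical `Envelope` type in which the system stamps each message with `system_ts` at ingestion. `Envelope`, `event_time`, and the field names are illustrative only and are not an existing Samza API:

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Envelope:
    """Hypothetical message wrapper: the application payload plus a
    timestamp that the system stamps on the message at ingestion."""
    payload: Dict[str, Any]
    system_ts: float = field(default_factory=time.time)


def event_time(env: Envelope, timestamp_by: Optional[str] = None) -> float:
    """Return the timestamp used to order `env` in time-based operators.

    With TIMESTAMP BY <column>, the named application field is used, so the
    same stream can be viewed under several different time sequences.
    Without it, fall back to the system-injected timestamp, so publishers
    are never required to fill in a timestamp field themselves.
    """
    if timestamp_by is None:
        return env.system_ts
    return float(env.payload[timestamp_by])


# One stream, three different time sequences.
msg = Envelope({"order_ts": 100.0, "ship_ts": 250.0}, system_ts=90.0)
print(event_time(msg))              # 90.0  (no TIMESTAMP BY)
print(event_time(msg, "order_ts"))  # 100.0
print(event_time(msg, "ship_ts"))   # 250.0
```

Both views in the quoted point are preserved: any column can still be chosen with TIMESTAMP BY, but a query that omits it still has a well-defined time order.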

{quote}
Azure's SELECT has an explicit PARTITION BY clause 
(http://msdn.microsoft.com/en-us/library/dn835022.aspx).
{quote}
Yes, we can borrow this. However, it still does not specify how many partitions 
the output of SELECT should be spread across. I would like to add that as an 
option, something like PARTITION BY <col_name> TO <num_partitions>.
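One way the proposed PARTITION BY <col_name> TO <num_partitions> could behave is sketched below. It assumes simple hash partitioning of the key column modulo the partition count. The function name is hypothetical, and CRC32 is an illustrative choice, not the partitioner Samza or Kafka actually uses:

```python
import zlib
from typing import Any, Dict, Iterable, List


def partition_by(rows: Iterable[Dict[str, Any]], col_name: str,
                 num_partitions: int) -> List[List[Dict[str, Any]]]:
    """Spread SELECT output over `num_partitions`, keyed on `col_name`.

    Rows with equal `col_name` values always land in the same partition.
    CRC32 is used as a stable hash (Python's built-in hash() is salted per
    process for strings); this choice is illustrative only.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: List[List[Dict[str, Any]]] = [[] for _ in range(num_partitions)]
    for row in rows:
        key = str(row[col_name]).encode("utf-8")
        partitions[zlib.crc32(key) % num_partitions].append(row)
    return partitions


# Roughly: SELECT * FROM clicks PARTITION BY member_id TO 4
rows = [{"member_id": i % 5, "clicks": i} for i in range(50)]
parts = partition_by(rows, "member_id", 4)
print([len(p) for p in parts])  # sizes depend on the hash values
```

The TO clause fixes the output fan-out independently of the input's partition count, which is the piece Azure's PARTITION BY leaves unspecified.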

{quote}
It's interesting that you can't SELECT * in a join 
(http://msdn.microsoft.com/en-us/library/dn835026.aspx). I haven't thought 
about why.
{quote}
There is also a very interesting point that Azure makes about joins: every join 
is time-bounded, which is in line with what we have discussed above.
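The following sketch shows why a time bound matters for a streaming join. It assumes both inputs arrive merged in non-decreasing timestamp order as hypothetical `(side, ts, key, value)` tuples, and that two events match when their keys are equal and their timestamps differ by at most `window`. Under these assumptions each side only has to buffer the last `window` of events:

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, List, Tuple

Event = Tuple[str, float, Any, Any]  # (side "L" or "R", timestamp, join key, value)


def time_bounded_join(events: Iterable[Event],
                      window: float) -> List[Tuple[Any, Any, Any]]:
    """Equi-join two streams, matching only events whose timestamps differ
    by at most `window`. Events must arrive in non-decreasing timestamp order.

    Because the join is time-bounded, each side keeps only recent events:
    anything older than `window` relative to the current event can never
    match a future event, so it is evicted.
    """
    buffers: Dict[str, Deque[Tuple[float, Any, Any]]] = {"L": deque(), "R": deque()}
    out: List[Tuple[Any, Any, Any]] = []
    for side, ts, key, value in events:
        other = "R" if side == "L" else "L"
        # Evict events on the other side that are too old to match this one.
        while buffers[other] and ts - buffers[other][0][0] > window:
            buffers[other].popleft()
        for _, o_key, o_value in buffers[other]:
            if o_key == key:
                # Always emit (key, left value, right value).
                out.append((key, value, o_value) if side == "L" else (key, o_value, value))
        buffers[side].append((ts, key, value))
    return out


events = [
    ("L", 0, "a", "l1"),
    ("R", 3, "a", "r1"),   # within 5 of l1: joins
    ("R", 20, "a", "r2"),  # too late for l1
    ("L", 22, "a", "l2"),  # within 5 of r2: joins
]
print(time_bounded_join(events, window=5))
# [('a', 'l1', 'r1'), ('a', 'l2', 'r2')]
```

Without the bound, both buffers would have to grow forever, which is why an unbounded stream-stream join is hard to support.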

I totally agree with the point that Azure should have some way of indicating 
whether an input is a stream or a relation (i.e., a table). In addition, from 
the few examples and experiments I did, Azure seems to implement only RStream 
output.
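For reference, RStream is one of CQL's three relation-to-stream operators, alongside IStream and DStream. The sketch below contrasts them. It assumes the relation is given as a list of snapshots, one per time instant (the list index is the instant), and uses sets instead of CQL's bags for simplicity:

```python
from typing import FrozenSet, Iterable, List, Tuple

Snapshot = FrozenSet[str]          # contents of the relation at one instant
Output = List[Tuple[int, str]]     # (instant, tuple) pairs


def rstream(snapshots: Iterable[Snapshot]) -> Output:
    """RStream: at every instant, emit the whole current relation."""
    return [(t, s) for t, rel in enumerate(snapshots) for s in sorted(rel)]


def istream(snapshots: Iterable[Snapshot]) -> Output:
    """IStream: emit only tuples inserted since the previous instant."""
    out: Output = []
    prev: Snapshot = frozenset()
    for t, rel in enumerate(snapshots):
        out.extend((t, s) for s in sorted(rel - prev))
        prev = rel
    return out


def dstream(snapshots: Iterable[Snapshot]) -> Output:
    """DStream: emit only tuples deleted since the previous instant."""
    out: Output = []
    prev: Snapshot = frozenset()
    for t, rel in enumerate(snapshots):
        out.extend((t, s) for s in sorted(prev - rel))
        prev = rel
    return out


snaps = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"b"})]
print(rstream(snaps))  # [(0, 'a'), (1, 'a'), (1, 'b'), (2, 'b')]
print(istream(snaps))  # [(0, 'a'), (1, 'b')]
print(dstream(snaps))  # [(2, 'a')]
```

RStream re-emits `'a'` at instants 0 and 1 because it outputs the full relation each time, while IStream and DStream emit only the changes.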

> High-Level Language for Samza
> -----------------------------
>
>                 Key: SAMZA-390
>                 URL: https://issues.apache.org/jira/browse/SAMZA-390
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Raul Castro Fernandez
>            Priority: Minor
>              Labels: project
>
> Discussion about high-level languages to define Samza queries. Queries are 
> defined in this language and transformed to a dataflow graph where the nodes 
> are Samza jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
