[ https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233659#comment-14233659 ]
Yi Pan (Data Infrastructure) commented on SAMZA-390:
----------------------------------------------------
[~criccomini], good summary on Azure. Here are some of my comments:
{quote}
I like the TIMESTAMP BY syntax in Azure. It seems more flexible than a rigid
timestamp field enforced in the data model. It also means a single stream can
have multiple timestamp fields, rather than having to re-materialize messages
every time a new field should be used as the timestamp.
{quote}
Good point about being able to instantiate multiple time sequences out of the
same single stream. I would still argue for a default, system-injected
timestamp when TIMESTAMP BY is not specified, so that we avoid always relying
on an application field that the publisher has to fill in.
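The fallback argued for above can be sketched as follows. This is a minimal Python illustration, not Samza code: the `Message` type, its `system_ts` field (assumed to be stamped by the system at ingestion), and `event_time` are hypothetical names. It also reflects the quoted point that one stream can be read under different TIMESTAMP BY fields without re-materializing messages.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Message:
    # Application payload, as filled in by the publisher.
    payload: Dict[str, Any]
    # Hypothetical timestamp injected by the system when the message is ingested.
    system_ts: float


def event_time(msg: Message, timestamp_by: Optional[str] = None) -> float:
    """Return the timestamp used to place msg in a time sequence."""
    if timestamp_by is None:
        # No TIMESTAMP BY clause: fall back to the system-injected timestamp.
        return msg.system_ts
    # TIMESTAMP BY <field>: use the application field named in the query.
    return float(msg.payload[timestamp_by])
```

For example, a message with `payload={"created_at": 100.0}` and `system_ts=300.0` is ordered at 300.0 when no TIMESTAMP BY is given and at 100.0 under `TIMESTAMP BY created_at`.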
{quote}
Azure's SELECT has an explicit PARTITION BY clause
(http://msdn.microsoft.com/en-us/library/dn835022.aspx).
{quote}
Yes, we can borrow this. However, it still does not specify how many
partitions the output of SELECT should be spread across. I would like to add
that as an option, something like PARTITION BY <col_name> TO <num_partitions>.
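One possible reading of the proposed PARTITION BY <col_name> TO <num_partitions> clause, assuming hash partitioning on the column value. The clause is only a proposal here; the CRC32 hash and the helper names `assign_partition` and `partition_output` are hypothetical choices for illustration.

```python
import zlib
from typing import Any, Dict, List

Row = Dict[str, Any]


def assign_partition(row: Row, col_name: str, num_partitions: int) -> int:
    """Map an output row to one of num_partitions partitions by hashing row[col_name]."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is stable across processes, unlike Python's randomized str hash.
    key = str(row[col_name]).encode("utf-8")
    return zlib.crc32(key) % num_partitions


def partition_output(rows: List[Row], col_name: str, num_partitions: int) -> List[List[Row]]:
    """Spread SELECT output rows across exactly num_partitions partitions."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[assign_partition(row, col_name, num_partitions)].append(row)
    return partitions
```

Rows sharing a `col_name` value always land in the same partition, and the TO part fixes the output partition count independently of how the input was partitioned.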
{quote}
It's interesting that you can't SELECT * in a join
(http://msdn.microsoft.com/en-us/library/dn835026.aspx). I haven't thought
about why.
{quote}
There is also a very interesting point that Azure makes about joins: every join
is time-bounded, which is in line with what we have discussed above.
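To make the time bound concrete, here is a minimal sketch of a time-bounded equi-join over two interleaved streams. It assumes events arrive in non-decreasing timestamp order and are tagged "L" or "R" for their input; `time_bounded_join` and the event layout are hypothetical, not Azure's or Samza's API.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
Event = Tuple[str, float, Record]  # (side "L" or "R", timestamp, record)


def time_bounded_join(events: Iterable[Event], key: str,
                      window: float) -> List[Tuple[Record, Record]]:
    """Join L/R records with equal `key` whose timestamps differ by at most `window`.

    Assumes `events` arrive in non-decreasing timestamp order.
    """
    buffers: Dict[str, Deque[Tuple[float, Record]]] = {"L": deque(), "R": deque()}
    results: List[Tuple[Record, Record]] = []
    for side, ts, record in events:
        other = "R" if side == "L" else "L"
        # Evict buffered events too old to match anything at or after ts.
        for buf in buffers.values():
            while buf and buf[0][0] < ts - window:
                buf.popleft()
        # Every remaining event on the other side is within `window` of ts.
        for _, other_record in buffers[other]:
            if other_record[key] == record[key]:
                results.append((record, other_record) if side == "L"
                               else (other_record, record))
        buffers[side].append((ts, record))
    return results
```

The bound is what keeps join state finite: each side only buffers the last `window` time units of events. Without it, any past event could still match a future one, so both buffers would grow without limit.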
I totally agree with the point that Azure should have some way of indicating
whether an input is a stream or a relation (i.e. a table). In addition, from
the few examples and experiments I did, Azure seems to implement only RStream
output.
> High-Level Language for Samza
> -----------------------------
>
> Key: SAMZA-390
> URL: https://issues.apache.org/jira/browse/SAMZA-390
> Project: Samza
> Issue Type: New Feature
> Reporter: Raul Castro Fernandez
> Priority: Minor
> Labels: project
>
> Discussion about high-level languages to define Samza queries. Queries are
> defined in this language and transformed to a dataflow graph where the nodes
> are Samza jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)