[jira] [Commented] (SAMZA-390) High-Level Language for Samza

Chris Riccomini (JIRA) Thu, 04 Dec 2014 16:53:48 -0800

    [ 
https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234856#comment-14234856
 ]


Chris Riccomini commented on SAMZA-390:
---------------------------------------

Notes on Tigon:

# Tigon SQL reference manual is 
[here|http://docs.cask.co/tigon/current/en/_downloads/Tigon_SQL_User_Manual_2014_v4.pdf].
# Data model is only slightly more sophisticated than Azure's: unsigned ints 
and longs, floats, booleans, and strings.
# Tigon SQL has a bunch of odd network-related operators and data types for bit 
shifting, IPv4, etc. This is because it was written by AT&T Research, and the 
first use case was network packet analysis.
# There doesn't seem to be a WINDOW operator in Tigon SQL. Instead, windows are 
denoted through WHERE clauses. So, a 5-minute window for a join would be written 
as `FROM a JOIN b WHERE ABS(a.timestamp\_seconds - b.timestamp\_seconds) < 
300;` There seem to be some restrictions on this, but abstractly, I think 
this is how it works. This is interesting.
# As far as I can tell, Tigon seems to rely on exact ordering of messages 
(timestamps never go backwards).
# Tigon SQL supports only 2-way joins.
# There is a CUBE operation, which allows you to compute aggregates over 
multiple groupings within a single query: `SELECT sourceIP, destIP, count(*) 
 FROM eth0.IPV4 T  GROUP BY T.time/60 as tb, Cube(sourceIP, destIP);` This 
seems kind of interesting. It sounds like it's mostly for efficiency.
# Lack of a WINDOW in GSQL's GROUP BY actually works fairly intuitively. 
`SELECT sourceIP, tb, count(*), max(offset) FROM eth0.IPV4 T WHERE T.protocol=1 
GROUP BY T.sourceIP, T.time/60 as tb` seems to make a lot of sense to me. GSQL 
just rolls over the window every time T.time/60 changes. Again, this relies on 
strongly ordered message arrival.
# In general, I find GSQL to be somewhat strange. Some of the operators, such as 
CLOSING\_WHEN, CLEANING\_BY, etc., feel a bit hacky. Similar to CQL, it 
introduces enough new concepts to make it diverge from standard SQL in a way 
that breaks with my expectations (as opposed to Azure Stream Analytics, 
which feels very natural and SQL-ish).
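
To make the GROUP BY rollover in the notes above concrete, here is a minimal Python sketch of the idea, not Tigon's actual implementation. It mirrors `GROUP BY T.sourceIP, T.time/60 as tb`: the bucket `tb` is the timestamp integer-divided by 60, and the current window is emitted as soon as `tb` changes. The record shape `(source_ip, time_seconds, offset)` and the function name `tumbling_counts` are assumptions made up for illustration. The sketch also shows why ordered arrival matters: once a bucket is flushed, a late record for it has nowhere to go, so it is rejected here.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shape (an assumption): (source_ip, time_seconds, offset).
Record = Tuple[str, int, int]
# Output row: (source_ip, tb, count, max_offset).
Row = Tuple[str, int, int, int]


def tumbling_counts(records: Iterable[Record], width: int = 60) -> Iterator[Row]:
    """Emit per-source counts each time the bucket time // width changes.

    Assumes buckets never go backwards, i.e. messages arrive in order.
    """
    current_tb = None
    groups: Dict[str, Tuple[int, int]] = {}  # source_ip -> (count, max_offset)

    for source_ip, time_s, offset in records:
        tb = time_s // width
        if current_tb is not None and tb < current_tb:
            # The window for this bucket was already closed and emitted.
            raise ValueError("record arrived for an already-closed bucket")
        if current_tb is not None and tb != current_tb:
            # Bucket changed: roll the window over and emit its groups.
            for ip, (count, max_off) in sorted(groups.items()):
                yield ip, current_tb, count, max_off
            groups = {}
        current_tb = tb
        count, max_off = groups.get(source_ip, (0, offset))
        groups[source_ip] = (count + 1, max(max_off, offset))

    # Finite input only: flush the last open window. A real stream would
    # keep it open until a record from a later bucket arrives.
    if current_tb is not None:
        for ip, (count, max_off) in sorted(groups.items()):
            yield ip, current_tb, count, max_off
```

Note that nothing in the loop ever looks at a clock or a timer; the window boundary is driven entirely by the data, which is what makes the query read naturally without a WINDOW clause.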

> High-Level Language for Samza
> -----------------------------
>
>                 Key: SAMZA-390
>                 URL: https://issues.apache.org/jira/browse/SAMZA-390
>             Project: Samza
>          Issue Type: New Feature
>            Reporter: Raul Castro Fernandez
>            Priority: Minor
>              Labels: project
>
> Discussion about high-level languages to define Samza queries. Queries are 
> defined in this language and transformed to a dataflow graph where the nodes 
> are Samza jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
