[
https://issues.apache.org/jira/browse/SAMZA-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234856#comment-14234856
]
Chris Riccomini commented on SAMZA-390:
---------------------------------------
Notes on Tigon:
# Tigon SQL reference manual is
[here|http://docs.cask.co/tigon/current/en/_downloads/Tigon_SQL_User_Manual_2014_v4.pdf].
# Data model is only slightly more sophisticated than Azure's. Unsigned ints
and longs, floats, booleans, and strings.
# Tigon SQL has a bunch of odd network-related operators and data types for bit
shifting, IPv4, etc. This is because it was written by AT&T Research, and the
first use case was network packet analysis.
# There doesn't seem to be a WINDOW operator in Tigon SQL. Instead, windows are
denoted through WHERE clauses. So, a 5-minute window for a join would be written
as `FROM a JOIN b WHERE ABS(a.timestamp\_seconds - b.timestamp\_seconds) <
300;` There seem to be some restrictions on this, but abstractly, I think
this is how it works. This is interesting.
# As far as I can tell, Tigon seems to rely on exact ordering of messages
(timestamps never go backwards).
# Tigon SQL supports only 2-way joins.
# There is a CUBE operation, which allows you to execute a group by against
multiple aggregations within a single query: `SELECT sourceIP, destIP, count(*)
FROM eth0.IPV4 T GROUP BY T.time/60 as tb, Cube(sourceIP, destIP);` This
seems kind of interesting. It sounds like it's mostly for efficiency.
# Lack of a WINDOW in GSQL's GROUP BY actually works fairly intuitively.
`SELECT sourceIP, tb, count(*), max(offset) FROM eth0.IPV4 T WHERE T.protocol=1
GROUP BY T.sourceIP, T.time/60 as tb` seems to make a lot of sense to me. GSQL
just rolls over the window every time T.time/60 changes. Again, this relies on
strongly ordered message arrival.
# In general, I find GSQL to be somewhat strange. Some of the operators such as
CLOSING\_WHEN, CLEANING\_BY, etc., feel a bit hacky. Similar to CQL, it
introduces enough new concepts to make it diverge from standard SQL in a way
that breaks with my expectations (as opposed to Azure streaming analytics,
which feels very natural and SQL-ish).
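
To make the two windowing notes above concrete (the WHERE-based join window and the GROUP BY on T.time/60), here is a minimal Python sketch. It only illustrates the semantics as described in these notes and is not Tigon's implementation. The function names `band_join` and `rolling_group_count`, the dict-based rows, and the assumption that timestamps are in seconds (so five minutes is 300) are all assumptions made for the example.

```python
from collections import Counter


def band_join(a_rows, b_rows, max_gap_seconds=300):
    """Pair rows whose timestamps differ by less than max_gap_seconds.

    Mirrors WHERE ABS(a.timestamp_seconds - b.timestamp_seconds) < max_gap.
    Uses a naive nested loop for clarity, not efficiency.
    """
    return [
        (a, b)
        for a in a_rows
        for b in b_rows
        if abs(a["timestamp_seconds"] - b["timestamp_seconds"]) < max_gap_seconds
    ]


def rolling_group_count(packets, bucket_seconds=60):
    """Yield (sourceIP, tb, count) each time the time bucket tb changes.

    Mirrors GROUP BY sourceIP, time/60 as tb with count(*).
    Assumes packets arrive in non-decreasing time order (hypothetical input
    shape: dicts with "sourceIP" and "time" keys).
    """
    current_tb = None
    counts = Counter()
    for p in packets:
        tb = p["time"] // bucket_seconds
        if current_tb is not None and tb != current_tb:
            # The bucket changed, so close the window and emit its groups.
            for ip, n in sorted(counts.items()):
                yield (ip, current_tb, n)
            counts.clear()
        current_tb = tb
        counts[p["sourceIP"]] += 1
    # Emit whatever is left in the final, still-open window.
    for ip, n in sorted(counts.items()):
        yield (ip, current_tb, n)
```

Because the rollover is triggered purely by a change in `time // 60`, a packet that arrives with an older timestamp would flush the current group early in this sketch, which is one way to see why the approach depends on ordered arrival.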
> High-Level Language for Samza
> -----------------------------
>
> Key: SAMZA-390
> URL: https://issues.apache.org/jira/browse/SAMZA-390
> Project: Samza
> Issue Type: New Feature
> Reporter: Raul Castro Fernandez
> Priority: Minor
> Labels: project
>
> Discussion about high-level languages to define Samza queries. Queries are
> defined in this language and transformed to a dataflow graph where the nodes
> are Samza jobs.