Yes, you got the semantics correct.

There are 6 kinds of events that can happen: create, update, delete of 
order_item, and create, update, delete of product. Some of these are not 
possible - e.g. creating a product with the same id after you have created an 
order - because of primary and foreign key semantics. And others are not likely 
- e.g. order_item might be overwhelmingly append-only, so update and delete 
rarely happen. 
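
To make the six cases concrete, here is a minimal sketch in Python. It is not taken from the paper or from any engine; the class StreamingJoin, its method names, and the "+"/"-" markers for records and retractions are all invented for illustration. It keeps both relations in memory and does not enforce the foreign key.

```python
class StreamingJoin:
    """Toy incremental join of order_item and product on product_id.

    Hypothetical illustration only: "+" marks an emitted record and
    "-" marks a retraction. Foreign keys are not enforced.
    """

    def __init__(self):
        self.products = {}   # product_id -> name
        self.items = {}      # product_id -> list of amounts
        self.out = []        # emitted (sign, (product_id, amount, name))

    def _emit(self, sign, product_id, amount, name):
        self.out.append((sign, (product_id, amount, name)))

    # --- the three order_item events ---
    def insert_order_item(self, product_id, amount):
        self.items.setdefault(product_id, []).append(amount)
        if product_id in self.products:
            self._emit("+", product_id, amount, self.products[product_id])

    def delete_order_item(self, product_id, amount):
        self.items[product_id].remove(amount)
        if product_id in self.products:
            self._emit("-", product_id, amount, self.products[product_id])

    def update_order_item(self, product_id, old_amount, new_amount):
        # An update is a retraction of the old row plus a new record.
        self.delete_order_item(product_id, old_amount)
        self.insert_order_item(product_id, new_amount)

    # --- the three product events ---
    def insert_product(self, product_id, name):
        if product_id in self.products:
            raise KeyError("primary key violation on product_id")
        self.products[product_id] = name
        for amount in self.items.get(product_id, []):
            self._emit("+", product_id, amount, name)

    def delete_product(self, product_id):
        name = self.products.pop(product_id)
        for amount in self.items.get(product_id, []):
            self._emit("-", product_id, amount, name)

    def update_product(self, product_id, new_name):
        old_name = self.products[product_id]
        self.products[product_id] = new_name
        # Every matching order_item gets a retraction and a new record.
        for amount in self.items.get(product_id, []):
            self._emit("-", product_id, amount, old_name)
            self._emit("+", product_id, amount, new_name)


def demo():
    j = StreamingJoin()
    j.insert_product(1, "apple")   # no order items yet: no output
    j.insert_order_item(1, 5)      # emits one joined record
    j.update_product(1, "pear")    # emits a retraction and a new record
    return j.out
```

Calling demo() returns one "+" row for the new order item, then a "-" row and a "+" row for the renamed product, which matches the behavior you described.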

You can optimize for the common events, and not use very much memory. For the 
rarer events, you can pay the cost of a disk I/O.
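
One way to picture that trade-off, as a rough sketch rather than a description of how any particular engine does it: assume order_item is append-only, keep product in memory, and append order items to a log on disk. The common event (a new order_item) then costs one append and one in-memory lookup, and only the rare event (a product update) has to scan the log. The class SpillingJoin, the CSV log format, and the file name are assumptions made up for this example.

```python
import csv
import os
import tempfile


class SpillingJoin:
    """Hypothetical sketch: products in memory, order items in a disk log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.products = {}               # product_id -> name (in memory)
        open(log_path, "a").close()      # make sure the log file exists

    def insert_product(self, product_id, name):
        self.products[product_id] = name

    def insert_order_item(self, product_id, amount):
        # Common event: append to the log, look up the product in memory.
        with open(self.log_path, "a", newline="") as f:
            csv.writer(f).writerow([product_id, amount])
        name = self.products.get(product_id)
        if name is None:
            return []
        return [("+", (product_id, amount, name))]

    def update_product(self, product_id, new_name):
        # Rare event: pay for a scan of the order items on disk.
        old_name = self.products[product_id]
        self.products[product_id] = new_name
        out = []
        with open(self.log_path, newline="") as f:
            for pid, amount in csv.reader(f):
                if int(pid) == product_id:
                    out.append(("-", (product_id, int(amount), old_name)))
                    out.append(("+", (product_id, int(amount), new_name)))
        return out


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        j = SpillingJoin(os.path.join(tmp, "order_item.log"))
        j.insert_product(1, "apple")
        out = j.insert_order_item(1, 5)
        out += j.insert_order_item(2, 9)    # no matching product: no output
        out += j.update_product(1, "pear")  # scans the log on disk
        return out
```

A real system would presumably index the log by product_id instead of scanning it, but the shape of the cost is the same: memory stays proportional to product, not to order_item.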

Folks who have not read the paper might find presentations (e.g. Fabian at 
Flink Forward) [1] or the slides [2] more accessible.

Julian

[1] https://www.youtube.com/watch?v=uJtqGkIxGhc

[2] https://s.apache.org/streaming-sql-apachecon-na-2019

> On Mar 23, 2020, at 8:41 AM, Viliam Durina <vil...@hazelcast.com> wrote:
> 
> I've read again parts of the paper and now I've finally got the idea, I
> hope. The semantics still works with plain relations, relational operators
> are applied as in instantaneous queries, just the resulting relation is
> "streamed", in case of EMIT STREAM clause in the form of records and
> retractions.
> 
> So I'll try to make up an example for the query, let's repeat it for
> clarity:
> 
>  CREATE TABLE order_item (product_id INT, amount INT, ...);
>  CREATE TABLE product (product_id INT PRIMARY KEY, name VARCHAR);
> 
>  SELECT *
>  FROM order_item o
>    JOIN product p USING(product_id)
>  EMIT STREAM;
> 
> When we execute the above query:
> - until any change occurs in either table, there's no output
> - if a new order_item is inserted:
>  -> a joined record is emitted with the order item and matching product
> - if a product name is updated
>  -> for every matching order_item a retraction and a new record is emitted
> 
> Therefore, to execute this query you need unbounded memory for the
> `order_item` and `product` relations, so it's not a good candidate for a
> streaming query. But let's put that aside; I'm interested in the semantics.
> 
> Did I put the example correctly?
> 
> Viliam
> 
> On Sat, 21 Mar 2020 at 01:00, Julian Hyde <jh...@apache.org> wrote:
> 
>> Our thinking in the "One SQL to rule them all" paper [1] is that there
>> are not "streams" and "tables". Both product and order_items are
>> time-varying relations (TVRs).
>> 
>> Whether it is a streaming query is determined by whether you specify
>> "EMIT STREAM" in the query, not by what objects are referenced in the
>> query.
>> 
>> (There is a strong analogy between streaming queries and the
>> differentiation operation in differential calculus. Consider the
>> product rule in calculus: (uv)' = u'v + uv'. If you want to compute
>> the join of two time-varying relations, then istream(u join v) =
>> (istream(u) join v) union (u join istream(v)). So you see that we are
>> using the 'stream' of each side. I find this symmetric treatment of
>> all TVRs to be compelling.)
>> 
>> Julian
>> 
>> [1] https://arxiv.org/pdf/1905.12133.pdf
>> 
>> On Fri, Mar 20, 2020 at 3:17 PM Viliam Durina <vil...@hazelcast.com>
>> wrote:
>>> 
>>>> Does it matter which table is a stream? If the "STREAM" query runs
>>>> continuously, the output (table) from the query is a stream, and likely
>>>> this stream gives you delta updates periodically.
>>> 
>>> 
>>> In my understanding, it does. If both tables are a stream, you get a
>>> change stream from both. You're joining two change streams. So if
>>> there's a change in product name, a change event will occur and the
>>> change event should be joined to all previous (and future) change
>>> events on order_items matching that product. Similarly if there's a
>>> new order_item, it should be joined with all previous change events
>>> on the matching product.
>>> 
>>> The paper doesn't discuss queries with joins at all. But it's unclear
>>> to me how it's supposed to work. Maybe if you could give an example
>>> for the above query and what happens when there's a change in
>>> order_item and when in product.
>>> 
>>> Viliam
>>> 
>>> --
>>> This message contains confidential information and is intended only
>>> for the individuals named. If you are not the named addressee you
>>> should not disseminate, distribute or copy this e-mail. Please notify
>>> the sender immediately by e-mail if you have received this e-mail by
>>> mistake and delete this e-mail from your system. E-mail transmission
>>> cannot be guaranteed to be secure or error-free as information could
>>> be intercepted, corrupted, lost, destroyed, arrive late or incomplete,
>>> or contain viruses. The sender therefore does not accept liability
>>> for any errors or omissions in the contents of this message, which
>>> arise as a result of e-mail transmission. If verification is
>>> required, please request a hard-copy version. -Hazelcast
>> 
> 
> 
> -- 
> Viliam Durina
> Jet Developer
>      hazelcast®
> 
>  <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 |
> USA
> +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>
