Re: [infinispan-dev] [infinispan-internal] Continuous Queries

Sanne Grinovero Fri, 18 Oct 2013 05:07:43 -0700

On 18 October 2013 12:12, Mircea Markus <[email protected]> wrote:
>
> On Oct 17, 2013, at 11:29 PM, Sanne Grinovero <[email protected]> wrote:
>
>> On 17 October 2013 20:19, Mircea Markus <[email protected]> wrote:
>>> let's keep this on -dev.
>>
>> +1
>>
>>> On Oct 17, 2013, at 6:24 PM, Sanne Grinovero <[email protected]> wrote:
>>>> ----- Original Message -----
>>>>>
>>>>> On Oct 17, 2013, at 2:28 PM, Sanne Grinovero <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> On Oct 17, 2013, at 1:31 PM, Sanne Grinovero <[email protected]> wrote:
>>>>>>>
>>>>>>>> With some custom coding it's certainly possible to define an event
>>>>>>>> listener which triggers when an entry is inserted/removed which
>>>>>>>> matches a certain Query.
>>>>>>>
>>>>>>> where would you hold the query result? a cache perhaps?
>>>>>>
>>>>>> Why do you need to hold on to the query result?
>>>>>> I was thinking to just send an event "newly stored X matches query Q1".
>>>>>
>>>>> You don't have a single process receive all the notifications then, but
>>>>> multiple processes in the cluster. It's up to the user to aggregate these
>>>>> results (that's why I mentioned a cache) but without aggregation this
>>>>> feature is pretty limiting.
>>>>
>>>> I have no idea if it's limiting. For the use case I understood, that's 
>>>> pretty decent.
>>>
>>> Here's my understanding of CQ[1]: a user queries a cache 10000000 (you add
>>> the rest of the 0s) times per second.
>>> Instead of executing the query every time (very resource consuming) the
>>> system caches the query result, updates it when the underlying data gets
>>> modified, and returns it to the user on every invocation. Optionally you
>>> can register a listener on the query result, but that's just API sugar.
>>
>> That's an implementation detail, I need a use case.
>>
>> Assuming you store a good amount of entries, you know, maybe so many
>> that I actually need a data grid instead of a simple HashMap or a USB
>> stick, as a Query user I don't think I would always want to actually
>> fetch locally all data, when all I need is maybe sound an alarm bell.
>>
>> A use case could be that I'm interested in some stock, specifically I
>> want to be notified ASAP for course changes for the stock traded on
>> market "Neverland", so I register a continuous query "from stock where
>> stock.market = 'Neverland' ".
>> Let's also assume that Neverland trades approximately 5,000 titles.
>>
>> My application starts and fetches all current values with a one-off
>> full query (using that same query), so I fetch all 5,000 locally. Next
>> step, I want to be notified ASAP when one of these changes value, so
>> that I can react to it.
>> Then I get my first notification! cool, my nice List API provides me
>> with the new value for 5,000 titles.. which one changed? let me find
>> out, I can scan on my previous results and find out..
>> (Note that I'm not even getting into the detail of how we got all
>> those titles locally: using deltas or not is irrelevant).
>>
>> That's certainly doable, but what if you have more than 5,000 titles..
>> it's degenerating. Of course you could wrap this "resultset" in some
>> more syntactic sugar, but essentially what you need to implement the
>> client side API is to receive the single events.
>>
>> I'm not focusing on the client side sugar because of Divya's original 
>> question:
>> "a feasible path to achieve this functionality via some custom
>> coding, even though it is not the most efficient path (because
>> Continuous Queries are not available out of the box)."
>>
>> From a very different perspective, look at it in terms of a scalable
>> architecture: when dealing with large amounts of data, the List
>> interface is conceptually not cutting it; I would expect you to ban
>> it, not to encourage it.
>> Assuming the client is also designed as a properly scalable system,
>> if you were to provide it with a List this would likely need to
>> iterate on it to forward each single element as a task to some
>> parallel executor. It's much simpler if you push them one by one: it
>> could still wrap each in a task, but you cut on the latency which you
>> would otherwise introduce to collect all single items and you can
>> allow users to insert a load balancer between your crazy scalable
>> event generator and the target of these notifications.
>>
>> (Because really if you set up such a feature on a large grid, it will
>> become a crazy scalable event generator)
>>
>>>>>> You could register multiple such listeners, getting the effect of "newly
>>>>>> stored entry X matches Query set {Q1, Q3, Q7}"
>>>>>
>>>>> The listeners would not be collocated.
>>>>
>>>> I'm not going to implement distributed listeners, I indeed expect you to 
>>>> register such a listener on each node.
>>>
>>> If I run a query, continuous or not, I'd expect to be able to get the
>>> whole result set of that query on the process on which I invoke it. Call
>>> me old-fashioned :-)
>>>
>>>>
>>>> I can show how to make Continuous Queries on the Query API to accomplish
>>>> this.
>>>
>>> I wouldn't name the problem your solution solves Continuous Query :-)
>>>
>>>> Anything else is out of scope for me :-) Technically I think it's out of 
>>>> scope for Infinispan too, it should delegate to a message bus.
>>>
>>> -1, for the reasons mentioned above.
>>>
>>> [1] http://coherence.oracle.com/display/COH31UG/Continuous+Query
>>
>> Do you realize this page is confirming a List is fundamentally wrong :-)
>> it's listing a bunch of fallacies to explain common errors, which all
>> boil down to an attempt of iterating on the entries, and then states:
>>
>> "The solution is to provide the listener during construction, and it
>> will receive one event for each item that is in the Continuous Query
>> Cache, whether it was there to begin with (because it was in the
>> query) or if it got added during or after the construction of the
>> cache"
>>
>> Finally, a consistency consideration on how to create such a list: if
>> you get multiple events in a short time, you'll never know which one is
>> correct because of interleaving of the notifications. There is no way
>> to iterate (search) a list of results in Infinispan in a consistent
>> transactional view, unless you want me to lock all entries and repeat
>> the query to confirm.
>
> For many, many users getting a snapshot result is good enough. After all,
> this is how relational databases are queried.
>
>> By NOT providing List access, you avoid the
>> problem of consistency and don't introduce contention points like
>> "aggregating it all in one placeholder".
>
> Well, Coherence supports both a List (the CQ Cache itself) and events,
> events being the preferred way when you don't want to miss any updates to
> the result set.
> Also very important, the mechanism you described doesn't offer this
> consistency guarantee (e.g. between the time the user runs the query and
> the time he registers the listeners things might change).


That's what I said: you can't make a List in that time, but the event
happened so it's fair to notify about it.
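
A minimal sketch of that event-per-match idea, in plain Java rather than
against the Infinispan API. Everything named here (MatchNotifier, onStore,
Stock) is hypothetical and only for illustration; the "query" is reduced to
a predicate evaluated on each write. The assumption is that each node runs
its own instance from a local write listener and pushes matches one by one,
without ever building a List:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical value type, for illustration only.
final class Stock {
    final String market;
    final double price;

    Stock(String market, double price) {
        this.market = market;
        this.price = price;
    }
}

// Hypothetical helper: evaluates the registered "queries" (plain predicates)
// against every stored value and pushes one event per match to the sink.
final class MatchNotifier<V> {
    private final Map<String, Predicate<V>> queries = new ConcurrentHashMap<>();
    private final BiConsumer<String, V> sink;

    MatchNotifier(BiConsumer<String, V> sink) {
        this.sink = sink;
    }

    void register(String queryId, Predicate<V> query) {
        queries.put(queryId, query);
    }

    // Assumed to be called from a write listener on the node storing the entry.
    void onStore(V value) {
        queries.forEach((id, query) -> {
            if (query.test(value)) {
                sink.accept(id, value);
            }
        });
    }
}

final class MatchNotifierDemo {
    public static void main(String[] args) {
        MatchNotifier<Stock> notifier = new MatchNotifier<>(
            (id, s) -> System.out.println(id + " matched: " + s.market + " @ " + s.price));
        notifier.register("Q1", s -> "Neverland".equals(s.market));
        notifier.onStore(new Stock("Neverland", 12.5)); // pushes one event for Q1
        notifier.onStore(new Stock("Elsewhere", 99.0)); // matches nothing, no event
    }
}
```

Registering several predicates gives the "newly stored X matches {Q1, Q3, Q7}"
effect as separate events; where the sink forwards them (a message bus, a
load balancer) is left open on purpose.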

> Another (fundamental IMO) limitation of the approach we can offer is
> the locality of the notifications: the initial query executes on node A
> and future notifications of other elements matching the query
> criteria are received on node B, C etc.
>
>> Also interesting from Coherence's wiki: they have their results
>> implement InvocableMap, essentially a representation of a conceptual
>> data partition on which you can then invoke operations, by moving
>> execution to the data. I think that's brilliant, and makes it quite
>> clear that no such list is sent to the client.
>
> Not really, the cache itself is the list :-)

That sounds very confusing to me, the cache is definitely not a list.
If you mean to point out that it "represents" a local view of all
data, that's fishy as it either contains a copy of all data (not nice
when it's large) or it's a proxy which will be extremely slow by
"lazy-loading" each entry. The InvocableMap approach sounds far more
interesting in terms of locality.
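
To illustrate the locality point, a rough sketch loosely modelled on the
"move execution to the data" idea. This is neither Coherence's InvocableMap
nor any Infinispan API: PartitionedStore and invokeAll are made-up names, and
each inner map merely stands in for one node's local data. The task runs
where the entry lives and only the small per-entry result comes back:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-in for a grid: each inner map plays one node's local data.
final class PartitionedStore<K, V> {
    private final List<Map<K, V>> partitions = new ArrayList<>();

    PartitionedStore(int nodes) {
        for (int i = 0; i < nodes; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Picks the owning "node" from the key's hash.
    void put(K key, V value) {
        int owner = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(owner).put(key, value);
    }

    // Runs 'task' against each matching entry in its own partition; only the
    // per-entry results are collected, never the entries themselves.
    <R> Map<K, R> invokeAll(Predicate<V> filter, Function<V, R> task) {
        Map<K, R> results = new HashMap<>();
        for (Map<K, V> local : partitions) {
            local.forEach((key, value) -> {
                if (filter.test(value)) {
                    results.put(key, task.apply(value));
                }
            });
        }
        return results;
    }
}

final class PartitionedStoreDemo {
    public static void main(String[] args) {
        PartitionedStore<String, Double> prices = new PartitionedStore<>(3);
        prices.put("ACME", 12.5);
        prices.put("PAN", 80.0);
        // Flag expensive titles where they are stored; only booleans come back.
        System.out.println(prices.invokeAll(p -> p > 50.0, p -> Boolean.TRUE));
    }
}
```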

>
> I don't think that with what we currently have we're that close to the CQ
> caches as the industry "defines" them. If this listener followed by
> distributed notifications can be useful, then very good. I would refrain
> from marketing this as CQ support, as it would create false expectations.

Happy to not do it!

Cheers,
Sanne

>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
