On 18 October 2013 12:12, Mircea Markus <[email protected]> wrote:
>
> On Oct 17, 2013, at 11:29 PM, Sanne Grinovero <[email protected]> wrote:
>
>> On 17 October 2013 20:19, Mircea Markus <[email protected]> wrote:
>>> let's keep this on -dev.
>>
>> +1
>>
>>> On Oct 17, 2013, at 6:24 PM, Sanne Grinovero <[email protected]> wrote:
>>>> ----- Original Message -----
>>>>>
>>>>> On Oct 17, 2013, at 2:28 PM, Sanne Grinovero <[email protected]> wrote:
>>>>>
>>>>>> ----- Original Message -----
>>>>>>> On Oct 17, 2013, at 1:31 PM, Sanne Grinovero <[email protected]> wrote:
>>>>>>>
>>>>>>>> With some custom coding it's certainly possible to define an
>>>>>>>> event listener which triggers when an entry is inserted/removed
>>>>>>>> which matches a certain Query.
>>>>>>>
>>>>>>> where would you hold the query result? a cache perhaps?
>>>>>>
>>>>>> Why do you need to hold on to the query result?
>>>>>> I was thinking to just send an event "newly stored X matches query Q1".
>>>>>
>>>>> You don't have a single process receive all the notifications then, but
>>>>> multiple processes in the cluster. It's up to the user to aggregate these
>>>>> results (that's why I mentioned a cache) but without aggregation this
>>>>> feature is pretty limiting.
>>>>
>>>> I have no idea if it's limiting. For the use case I understood, that's
>>>> pretty decent.
>>>
>>> Here's my understanding of CQ[1]: a user queries a cache 10000000 (you
>>> add the rest of the 0s) times per second.
>>> Instead of executing the query every time (very resource consuming) the
>>> system caches the query result, updates it when the underlying data gets
>>> modified, and returns it to the user on every invocation. Optionally you
>>> can register a listener on the query result, but that's just API sugar.
>>
>> That's an implementation detail, I need a use case.
>>
>> Assuming you store a good amount of entries, you know, maybe so many
>> that I actually need a data grid instead of a simple HashMap or a USB
>> stick, as a Query user I don't think I would always want to actually
>> fetch all data locally, when all I need is maybe to sound an alarm bell.
>>
>> A use case could be that I'm interested in some stock, specifically I
>> want to be notified ASAP of price changes for the stock traded on
>> market "Neverland", so I register a continuous query "from stock where
>> stock.market = 'Neverland' ".
>> Let's also assume that Neverland trades approximately 5,000 titles.
>>
>> My application starts and fetches all current values with a one-off
>> full query (using that same query), so I fetch all 5,000 locally. Next
>> step, I want to be notified ASAP when one of these changes value, so
>> that I can react to it.
>> Then I get my first notification! Cool, my nice List API provides me
>> with the new value for 5,000 titles.. which one changed? Let me find
>> out, I can scan my previous results and find out..
>> (Note that I'm not even getting into the detail of how we got all
>> those titles locally: using deltas or not is irrelevant).
>>
>> That's certainly doable, but what if you have more than 5,000 titles?
>> It degenerates. Of course you could wrap this "resultset" in some
>> more syntactic sugar, but essentially what you need to implement the
>> client side API is to receive the single events.
>>
>> I'm not focusing on the client side sugar because of Divya's original
>> question:
>> "a feasible path to achieve this functionality via some custom
>> coding, even though it is not the most efficient path (because
>> Continuous Queries are not available out of the box)."
>>
>> From a very different perspective, look at it in terms of a scalable
>> architecture: when dealing with large amounts of data, the List
>> interface is conceptually not cutting it; I would expect you to ban
>> it, not to encourage it.
>> Assuming the client is also designed as a properly scalable system,
>> if you were to provide it with a List it would likely need to
>> iterate over it to forward each single element as a task to some
>> parallel executor. It's much simpler if you push them one by one: it
>> could still wrap each in a task, but you cut the latency which you
>> would otherwise introduce to collect all single items and you can
>> allow users to insert a load balancer between your crazy scalable
>> event generator and the target of these notifications.
>>
>> (Because really if you set up such a feature on a large grid, it will
>> become a crazy scalable event generator)
>>
>>>>>> You could register multiple such listeners, getting the effect of
>>>>>> "newly stored entry X matches Query set {Q1, Q3, Q7}"
>>>>>
>>>>> The listeners would not be collocated.
>>>>
>>>> I'm not going to implement distributed listeners, I indeed expect you to
>>>> register such a listener on each node.
>>>
>>> If I run a query, continuous or not, I'd expect to be able to get all the
>>> result set of that query on the process on which I invoke it. Call me
>>> old-fashioned :-)
>>>
>>>>
>>>> I can show how to make Continuous Queries on the Query API to accomplish
>>>> this.
>>>
>>> I wouldn't name the problem your solution solves Continuous Query :-)
>>>
>>>> Anything else is out of scope for me :-) Technically I think it's out of
>>>> scope for Infinispan too, it should delegate to a message bus.
>>>
>>> -1, for the reasons mentioned above.
>>>
>>> [1] http://coherence.oracle.com/display/COH31UG/Continuous+Query
>>
>> Do you realize this page is confirming a List is fundamentally wrong :-)
>> It's listing a bunch of fallacies to explain common errors, which all
>> boil down to an attempt to iterate over the entries, and then states:
>>
>> "The solution is to provide the listener during construction, and it
>> will receive one event for each item that is in the Continuous Query
>> Cache, whether it was there to begin with (because it was in the
>> query) or if it got added during or after the construction of the
>> cache"
>>
>> Finally, a consistency consideration on how to create such a list: if
>> you get multiple events in a short time, you'll never know which one is
>> correct because of interleaving of the notifications. There is no way
>> to iterate (search) a list of results in Infinispan in a consistent
>> transactional view, unless you want me to lock all entries and repeat
>> the query to confirm.
>
> For many users, getting a snapshot result is good enough. After all,
> this is how relational databases are queried.
>
>> By NOT providing a List access, you avoid the
>> problem of consistency and don't introduce contention points like
>> "aggregating it all in one placeholder".
>
> Well, Coherence supports both List (the CQ Cache itself) and event-based
> access, events being the preferred way when you don't want to miss any
> updates to the result set.
> Also very important, the mechanism you described doesn't offer this
> consistency guarantee (e.g. between the time the user runs the query and
> registers the listeners, things might change).

That's what I said: you can't make a List in that time, but the event
happened so it's fair to notify about it.

> Another (fundamental IMO) limitation of the approach we can offer is
> the locality of the notifications: the initial query executes on node A,
> but future notifications of other elements matching the query criteria
> are received on nodes B, C, etc.
>
>> Also interesting from Coherence's wiki: they have their results
>> implement InvocableMap, essentially a representation of a conceptual
>> data partition on which you can then invoke operations, by moving
>> execution to the data. I think that's brilliant, and makes it quite
>> clear that no such list is sent to the client.
>
> Not really, the cache itself is the list :-)

That sounds very confusing to me, the cache is definitely not a list.
If you mean to point out that it "represents" a local view of all
data, that's fishy as it either contains a copy of all data (not nice
when it's large) or it's a proxy which will be extremely slow by
"lazy-loading" each entry.
The InvocableMap approach sounds far more interesting in terms of locality.

>
> I don't think that with what we currently have we're that close to the CQ
> caches as the industry "defines" them. If this listener followed by
> distributed notifications can be useful, then very good. I would refrain
> from marketing this as CQ support as it would create false expectations.

Happy to not do it!

Cheers,
Sanne

>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
> _______________________________________________
> infinispan-dev mailing list
> [email protected]
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
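
For illustration only, the per-entry notification idea discussed in this thread ("newly stored X matches query Q1") could be sketched in plain Java roughly as follows. This is a minimal single-process sketch, not Infinispan code: the names QueryMatchNotifier, MatchListener and Stock are hypothetical, a "query" is assumed to be an ordinary java.util.function.Predicate, and nothing here addresses distribution, ordering or consistency across nodes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical sketch: pushes one event per stored entry that matches a registered query. */
public class QueryMatchNotifier<K, V> {

    /** Hypothetical callback: receives a single matching entry, never a full result list. */
    @FunctionalInterface
    public interface MatchListener<K, V> {
        void onMatch(String queryName, K key, V value);
    }

    /** Example value type for the "Neverland" stock use case from the thread. */
    public static final class Stock {
        public final String market;
        public final double price;

        public Stock(String market, double price) {
            this.market = market;
            this.price = price;
        }
    }

    /** Pairs a query predicate with the listener to notify when it matches. */
    private static final class Registration<K, V> {
        final Predicate<V> predicate;
        final MatchListener<K, V> listener;

        Registration(Predicate<V> predicate, MatchListener<K, V> listener) {
            this.predicate = predicate;
            this.listener = listener;
        }
    }

    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Map<String, Registration<K, V>> queries = new ConcurrentHashMap<>();

    /** Registers a named query; several queries may be registered side by side. */
    public void register(String queryName, Predicate<V> predicate, MatchListener<K, V> listener) {
        queries.put(queryName, new Registration<>(predicate, listener));
    }

    /** Stores the entry, then notifies every registered query whose predicate it matches. */
    public void put(K key, V value) {
        store.put(key, value);
        queries.forEach((name, reg) -> {
            if (reg.predicate.test(value)) {
                reg.listener.onMatch(name, key, value);
            }
        });
    }

    public static void main(String[] args) {
        QueryMatchNotifier<String, Stock> grid = new QueryMatchNotifier<>();
        grid.register("Q1", s -> "Neverland".equals(s.market),
                (query, key, stock) ->
                        System.out.println(key + " matches " + query + " at " + stock.price));
        grid.put("ACME", new Stock("Neverland", 10.5));  // matches Q1, listener fires
        grid.put("OTHER", new Stock("Elsewhere", 3.0));  // no match, no event
    }
}
```

The listener only ever sees the one entry that changed, so the client never has to diff a 5,000-element list to find out which title moved. Aggregating events into a local view, if wanted, would be a separate layer on top.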
