On Oct 18, 2013, at 1:06 PM, Sanne Grinovero <[email protected]> wrote:
> On 18 October 2013 12:12, Mircea Markus <[email protected]> wrote:
>>
>> On Oct 17, 2013, at 11:29 PM, Sanne Grinovero <[email protected]> wrote:
>>
>>> On 17 October 2013 20:19, Mircea Markus <[email protected]> wrote:
>>>> let's keep this on -dev.
>>>
>>> +1
>>>
>>>> On Oct 17, 2013, at 6:24 PM, Sanne Grinovero <[email protected]> wrote:
>>>>> ----- Original Message -----
>>>>>>
>>>>>> On Oct 17, 2013, at 2:28 PM, Sanne Grinovero <[email protected]> wrote:
>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> On Oct 17, 2013, at 1:31 PM, Sanne Grinovero <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> With some custom coding it's certainly possible to define an event
>>>>>>>>> listener which triggers when an entry is inserted/removed that
>>>>>>>>> matches a certain Query.
>>>>>>>>
>>>>>>>> Where would you hold the query result? A cache perhaps?
>>>>>>>
>>>>>>> Why do you need to hold on to the query result?
>>>>>>> I was thinking to just send an event "newly stored X matches query Q1".
>>>>>>
>>>>>> You don't have a single process receiving all the notifications then, but
>>>>>> multiple processes in the cluster. It's up to the user to aggregate these
>>>>>> results (that's why I mentioned a cache), but without aggregation this
>>>>>> feature is pretty limiting.
>>>>>
>>>>> I have no idea if it's limiting. For the use case I understood, that's
>>>>> pretty decent.
>>>>
>>>> Here's my understanding of CQ [1]: a user queries a cache 10,000,000 times
>>>> per second (add as many zeros as you like). Instead of executing the query
>>>> every time (very resource consuming), the system caches the query result,
>>>> updates it when the underlying data gets modified, and returns it to the
>>>> user on every invocation. Optionally you can register a listener on the
>>>> query result, but that's just API sugar.
>>>
>>> That's an implementation detail, I need a use case.
>>>
>>> Assuming you store a good amount of entries, you know, maybe so many
>>> that I actually need a data grid instead of a simple HashMap or a USB
>>> stick, as a Query user I don't think I would always want to fetch all
>>> the data locally, when all I need is maybe to sound an alarm bell.
>>>
>>> A use case could be that I'm interested in some stock; specifically, I
>>> want to be notified ASAP of price changes for the stock traded on the
>>> market "Neverland", so I register a continuous query "from stock where
>>> stock.market = 'Neverland'".
>>> Let's also assume that Neverland trades approximately 5,000 titles.
>>>
>>> My application starts and fetches all current values with a one-off
>>> full query (using that same query), so I fetch all 5,000 locally. Next
>>> step, I want to be notified ASAP when one of these changes value, so
>>> that I can react to it.
>>> Then I get my first notification! Cool, my nice List API provides me
>>> with the new value for 5,000 titles... which one changed? Let me find
>>> out: I can scan my previous results and work it out...
>>> (Note that I'm not even getting into the detail of how we got all
>>> those titles locally: using deltas or not is irrelevant.)
>>>
>>> That's certainly doable, but what if you have more than 5,000 titles?
>>> It degenerates. Of course you could wrap this "resultset" in some
>>> more syntactic sugar, but essentially what you need to implement the
>>> client-side API is to receive the single events.
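For reference, a minimal sketch of the kind of custom listener described above, written against the standard Infinispan cache listener annotations. The Stock class, the "Neverland" check, and the onMatch() callback are purely illustrative stand-ins for a real query predicate; depending on the Infinispan version, creations may also require a separate @CacheEntryCreated handler to obtain the new value.

import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.CacheEntryModified;
import org.infinispan.notifications.cachelistener.event.CacheEntryModifiedEvent;

@Listener
public class MatchingEntryListener {

    // Fires around each store/update; event.getValue() is the new value
    // once isPre() is false.
    @CacheEntryModified
    public void entryModified(CacheEntryModifiedEvent<String, Stock> event) {
        if (event.isPre()) {
            return;
        }
        Stock stock = event.getValue();
        // The "query" expressed directly in Java, roughly equivalent to
        // "from stock where stock.market = 'Neverland'".
        if (stock != null && "Neverland".equals(stock.getMarket())) {
            onMatch(event.getKey(), stock);
        }
    }

    // Hypothetical application callback: one notification per matching entry,
    // no aggregated result set is ever built.
    private void onMatch(String key, Stock stock) {
        System.out.println("stock " + key + " changed on Neverland: " + stock);
    }
}

Such a listener would be registered on each node with cache.addListener(new MatchingEntryListener()), as suggested further down in the thread.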
>>>
>>> I'm not focusing on the client-side sugar because of Divya's original
>>> question:
>>> "a feasible path to achieve this functionality via some custom
>>> coding, even though it is not the most efficient path (because
>>> Continuous Queries are not available out of the box)."
>>>
>>> From a very different perspective, look at it in terms of a scalable
>>> architecture: when dealing with large amounts of data, the List
>>> interface is conceptually not cutting it; I would expect you to ban
>>> it, not to encourage it.
>>> Assuming the client is also designed as a properly scalable system,
>>> if you were to provide it with a List it would likely need to
>>> iterate on it to forward each single element as a task to some
>>> parallel executor. It's much simpler if you push them one by one: it
>>> could still wrap each in a task, but you cut the latency which you
>>> would otherwise introduce to collect all the single items, and you can
>>> allow users to insert a load balancer between your crazy scalable
>>> event generator and the target of these notifications.
>>>
>>> (Because really, if you set up such a feature on a large grid, it will
>>> become a crazy scalable event generator.)
>>>
>>>>>>> You could register multiple such listeners, getting the effect of "newly
>>>>>>> stored entry X matches Query set {Q1, Q3, Q7}".
>>>>>>
>>>>>> The listeners would not be collocated.
>>>>>
>>>>> I'm not going to implement distributed listeners; I indeed expect you to
>>>>> register such a listener on each node.
>>>>
>>>> If I run a query, continuous or not, I'd expect to be able to get the
>>>> whole result set of that query on the process on which I invoke it. Call
>>>> me old fashioned :-)
>>>>
>>>>>
>>>>> I can show how to make Continuous Queries on the Query API to accomplish
>>>>> this.
>>>>
>>>> I wouldn't name the problem your solution solves Continuous Query :-)
>>>>
>>>>> Anything else is out of scope for me :-) Technically I think it's out of
>>>>> scope for Infinispan too, it should delegate to a message bus.
>>>>
>>>> -1, for the reasons mentioned above.
>>>>
>>>> [1] http://coherence.oracle.com/display/COH31UG/Continuous+Query
>>>
>>> Do you realize this page is confirming a List is fundamentally wrong? :-)
>>> It's listing a bunch of fallacies to explain common errors, which all
>>> boil down to an attempt to iterate on the entries, and then states:
>>>
>>> "The solution is to provide the listener during construction, and it
>>> will receive one event for each item that is in the Continuous Query
>>> Cache, whether it was there to begin with (because it was in the
>>> query) or if it got added during or after the construction of the
>>> cache"
>>>
>>> Finally, a consistency consideration on how to create such a list: if
>>> you get multiple events in a short time, you'll never know which one is
>>> correct because of interleaving of the notifications. There is no way
>>> to iterate (search) a list of results in Infinispan in a consistent
>>> transactional view, unless you want me to lock all entries and repeat
>>> the query to confirm.
>>
>> For many, many users getting a snapshot result is good enough. After all,
>> this is how relational databases are queried.
>>
>>> By NOT providing List access, you avoid the consistency
>>> problem and don't introduce contention points like
>>> "aggregating it all in one placeholder".
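As a concrete illustration of the per-event consumption style argued for above, a client could wrap each single notification in a task and hand it to an executor, so no aggregated List of results is ever materialized. MatchDispatcher, MatchEvent, and handle() are illustrative names, not Infinispan API.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MatchDispatcher {

    // Illustrative carrier for a single "entry X matches query Q" notification.
    public static final class MatchEvent {
        final String key;
        final Object value;

        MatchEvent(String key, Object value) {
            this.key = key;
            this.value = value;
        }
    }

    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    // Called once per notification, e.g. from the listener sketched earlier.
    public void onMatch(String key, Object value) {
        final MatchEvent event = new MatchEvent(key, value);
        workers.submit(new Runnable() {
            @Override
            public void run() {
                handle(event);
            }
        });
    }

    // React to exactly one change; there is no result set to scan.
    private void handle(MatchEvent event) {
        System.out.println("changed: " + event.key + " -> " + event.value);
    }
}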
>>
>> Well, Coherence supports both a List (the CQ Cache itself) and event-based
>> notifications, events being the preferred way when you don't want to miss
>> any update to the result set.
>> Also very important, the mechanism you described doesn't offer this
>> consistency guarantee (e.g. between the time the user runs the query and
>> registers the listeners, things might change).
>
> That's what I said: you can't make a List in that time, but the event
> happened so it's fair to notify about it.
>
>> Another (fundamental IMO) limitation of the approach we can offer is the
>> locality of the notifications: the initial query executes on node A, while
>> future notifications of other elements matching the query criteria arrive
>> on nodes B, C, etc.
>>
>>> Also interesting from Coherence's wiki: they have their results
>>> implement InvocableMap, essentially a representation of a conceptual
>>> data partition on which you can then invoke operations, by moving
>>> execution to the data. I think that's brilliant, and it makes it quite
>>> clear that no such list is sent to the client.
>>
>> Not really, the cache itself is the list :-)
>
> That sounds very confusing to me, the cache is definitely not a list.
> If you mean to point out that it "represents" a local view of all
> data, yes :-)
> That's fishy, as it either contains a copy of all data (not nice
> when it's large)

Not if you only keep the set of keys locally and fetch the values (you might not even need them) on demand.

> or it's a proxy which will be extremely slow by
> "lazy-loading" each entry.

Indeed, you might need to get the value based on the key with an RPC. I wouldn't call that extremely slow; after all, it's just a cache lookup.

> The InvocableMap approach sounds far more
> interesting in terms of locality.

It's still something that will go remote on every invocation. If you need to do that very often (a few thousand times a second), it's better to cache the results locally.

>
>>
>> I don't think that with what we currently have we're that close to CQ
>> caches as the industry "defines" them. If this listener + distributed
>> notifications approach can be useful, then very good. I would refrain from
>> marketing this as CQ support, as it would create false expectations.
>
> Happy to not do it!

I don't think the query API extension you mention is critical here, as the filtering logic can be expressed directly in Java (which might actually be more convenient/flexible). Looking around, the CQ functionality that's missing in ISPN is:
- offer a way to receive all the notifications in the same VM
- offer a way to cache the result (might be keys only) in order to avoid executing the same query very often

Let's continue our chat on this next week ;)

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

_______________________________________________
infinispan-dev mailing list
[email protected]
https://lists.jboss.org/mailman/listinfo/infinispan-dev
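To make the keys-only idea from this exchange concrete, here is a sketch of a local view that keeps only the set of matching keys, maintained from per-entry notifications, and fetches values from the cache on demand. ContinuousKeyView and matches() are illustrative names; the piece that forwards notifications from other nodes into this VM is deliberately left out, since that is exactly what the thread identifies as missing.

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.infinispan.Cache;

public class ContinuousKeyView<K, V> {

    private final Cache<K, V> cache;
    // Keys currently matching the query; values are not copied locally.
    private final Set<K> matchingKeys =
            Collections.newSetFromMap(new ConcurrentHashMap<K, Boolean>());

    public ContinuousKeyView(Cache<K, V> cache) {
        this.cache = cache;
    }

    // Invoked for every entry change notification (local or forwarded).
    public void onEntryChanged(K key, V value) {
        if (value != null && matches(value)) {
            matchingKeys.add(key);
        } else {
            matchingKeys.remove(key);
        }
    }

    // Lazily fetch a value only when it is actually needed: at most one
    // cache lookup (possibly an RPC) per requested key.
    public V value(K key) {
        return matchingKeys.contains(key) ? cache.get(key) : null;
    }

    public Set<K> matchingKeys() {
        return Collections.unmodifiableSet(matchingKeys);
    }

    // Placeholder for the query predicate expressed directly in Java.
    protected boolean matches(V value) {
        return true;
    }
}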
