On Mar 7, 2014, at 3:21 PM, Sanne Grinovero <sa...@infinispan.org> wrote:
> On 7 March 2014 14:54, Mircea Markus <mmar...@redhat.com> wrote:
>>
>> On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard <emman...@hibernate.org> wrote:
>>
>>> On Wed 2014-03-05 17:16, Mircea Markus wrote:
>>>> Sanne came up with a good follow-up to this email, just some small
>>>> clarifications:
>>>>
>>>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emman...@hibernate.org> wrote:
>>>>
>>>>>>> If you have to do a map reduce for tasks as simple as age > 18, I think
>>>>>>> your system had better be prepared to run gazillions of M/R jobs.
>>>>>>
>>>>>> I want to run a simple M/R job in the evening to determine who turns 18
>>>>>> tomorrow, to congratulate them. Once a day, not gazillions of times, and
>>>>>> I don't need to index the age field just for that. Also when it comes to
>>>>>> Map/Reduce, the drawback of holding all the data in a single cache is
>>>>>> twofold:
>>>>>> - performance: you iterate over data that is not related to your
>>>>>> query.
>>>>>
>>>>> If the data are never related (query-wise), then we are in the database
>>>>> split category. Which is fine. But if some of your queries are related,
>>>>> what do you do? Deny the user the ability to do them?
>>>>
>>>> Here's where cross-site query would have been used. As Sanne suggested
>>>> (next post), these limitations outweigh the advantages.
>>>
>>> No. Cross-cache query, if implemented, will not support (efficiently
>>> enough) that kind of query. Cf. my wiki page.
>>
>> Yes, non-indexed joins would be exponential in the number of caches involved.
>
> Technically non-indexed joins would be exponential in the number of
> caches (joins) involved *and* in the number of entries you have
> stored: I know you weren't suggesting doing it, but to confirm it's
> even worse than a horrible idea ;-)
> And that's not even considering the subtle design catch of "load it
> all from all cachestores".. combined with "multiple times per join"..
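The cost of the naive, non-indexed join discussed in the quoted exchange can be made concrete with a small sketch. It is not Infinispan code: plain `java.util.Map`s stand in for caches, and `NaiveJoinCost`, `cacheOf` and `crossProductJoin` are hypothetical names. The join enumerates the full cross product (one entry from each cache) and only then tests the join condition, so with k caches of N entries each it examines N^k tuples: exponential in the number of caches, and the k-th power of the number of entries. It also assumes everything is already in memory, so it leaves out the cost of loading entries from cache stores, possibly several times per join.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each "cache" is just a Map from key to value.
public class NaiveJoinCost {

    // Builds a stand-in cache with keys 0..n-1.
    static Map<Integer, String> cacheOf(int n, String prefix) {
        Map<Integer, String> cache = new HashMap<>();
        for (int i = 0; i < n; i++) {
            cache.put(i, prefix + i);
        }
        return cache;
    }

    // Naive, non-indexed join: enumerate every combination of one entry per
    // cache (the full cross product) and only then test the join condition,
    // here "all keys are equal". Matching tuples are added to `out`.
    // Returns the number of tuples examined.
    static long crossProductJoin(List<Map<Integer, String>> caches, int depth,
                                 List<Integer> tuple, List<List<Integer>> out) {
        if (depth == caches.size()) {
            boolean allEqual = tuple.stream().distinct().count() == 1;
            if (allEqual) {
                out.add(new ArrayList<>(tuple));
            }
            return 1;
        }
        long examined = 0;
        for (Integer key : caches.get(depth).keySet()) { // full scan, no index
            tuple.add(key);
            examined += crossProductJoin(caches, depth + 1, tuple, out);
            tuple.remove(tuple.size() - 1); // remove(int index): backtrack
        }
        return examined;
    }

    public static void main(String[] args) {
        int n = 10;
        for (int k = 1; k <= 4; k++) {
            List<Map<Integer, String>> caches = new ArrayList<>();
            for (int c = 0; c < k; c++) {
                caches.add(cacheOf(n, "cache" + c + "-"));
            }
            List<List<Integer>> matches = new ArrayList<>();
            long examined = crossProductJoin(caches, 0, new ArrayList<>(), matches);
            System.out.println("k=" + k + ": " + examined + " tuples examined, "
                    + matches.size() + " matches");
        }
    }
}
```

With n = 10, the examined count printed for one to four caches is 10, 100, 1000 and 10000, while the number of matches stays at 10: almost all of the work is spent on tuples that can never match.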
I wasn't suggesting doing it, not only because of performance but also because of the limitations you mentioned in the previous emails.

>
>> Is it possible to use an index for x-cache joins with linear index update
>> time and query?
>
> Index update cost is not linear but LogN: approximates to a constant
> cost.

Are you counting RPCs here, or index seeks?

> And we could cut this constant by 4 orders of magnitude if only
> I could safely differentiate between a put of a new entry vs. an
> update -> something which we'll need to brainstorm about.
>
> Query time is also significantly sub-linear in practice, but specifics
> will vary with the query type.
>
> Yes, you could use indexes to improve x-cache joins, but you'll need an
> additional engine to coordinate that correctly, not least to manage
> data size buffers; essentially I think you'd need Teiid.
>
> Sanne
>
>>
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>>
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
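Sanne's LogN point, and the idea of using indexes for cross-cache joins, can be illustrated with a minimal in-memory sketch. It is a stand-in under stated assumptions, not Infinispan or Hibernate Search code: `SortedAgeIndex` is a hypothetical class in which a `HashMap` plays the cache and a `java.util.TreeMap` (a red-black tree) plays a sorted secondary index on an age field. Each index update costs O(log N). A range query such as "age > 18" finds its starting point in O(log N) and then walks only the matching buckets. An equi-join against another index does one O(log M) lookup per distinct age instead of scanning the other cache. Because `put` cannot know whether the key is new, it must check for and remove a stale index entry, which is the new-entry-vs-update distinction Sanne raises. A distributed, on-disk index has very different constants; this only shows the asymptotic shape.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a "cache" (person id -> age) plus a sorted index on age.
public class SortedAgeIndex {

    private final Map<String, Integer> cache = new HashMap<>();
    // TreeMap is a red-black tree, so each index update is O(log N).
    private final NavigableMap<Integer, Set<String>> ageIndex = new TreeMap<>();

    public void put(String id, int age) {
        Integer previous = cache.put(id, age);
        // We cannot tell in advance whether this is a new entry or an update,
        // so we must check for, and remove, a stale index entry.
        if (previous != null) {
            Set<String> ids = ageIndex.get(previous);
            ids.remove(id);
            if (ids.isEmpty()) {
                ageIndex.remove(previous);
            }
        }
        ageIndex.computeIfAbsent(age, a -> new HashSet<>()).add(id);
    }

    // Range query "age > threshold": O(log N) to locate the first key, then
    // proportional to the matching buckets, instead of a full scan.
    public Set<String> olderThan(int threshold) {
        Set<String> result = new HashSet<>();
        for (Set<String> ids : ageIndex.tailMap(threshold, false).values()) {
            result.addAll(ids);
        }
        return result;
    }

    // Index-assisted equi-join on age with another cache: for each distinct
    // age on this side, one O(log M) lookup in the other side's index.
    public int countSameAgePairs(SortedAgeIndex other) {
        int pairs = 0;
        for (Map.Entry<Integer, Set<String>> e : ageIndex.entrySet()) {
            Set<String> match = other.ageIndex.get(e.getKey());
            if (match != null) {
                pairs += e.getValue().size() * match.size();
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        SortedAgeIndex people = new SortedAgeIndex();
        people.put("alice", 17);
        people.put("bob", 30);
        people.put("carol", 18);
        people.put("alice", 18); // update: alice's stale entry under 17 is removed

        System.out.println(new TreeSet<>(people.olderThan(17))); // [alice, bob, carol]

        SortedAgeIndex employees = new SortedAgeIndex();
        employees.put("dave", 18);
        employees.put("erin", 45);
        System.out.println(people.countSameAgePairs(employees)); // 2
    }
}
```

If `put` skipped the stale-entry removal, alice would still be listed under age 17 and a join against anyone aged 17 would return a wrong pair, which is why an index that cannot distinguish inserts from updates has to pay for that extra lookup on every write.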