On Mar 7, 2014, at 3:21 PM, Sanne Grinovero <sa...@infinispan.org> wrote:

> On 7 March 2014 14:54, Mircea Markus <mmar...@redhat.com> wrote:
>> 
>> On Mar 6, 2014, at 9:21 AM, Emmanuel Bernard <emman...@hibernate.org> wrote:
>> 
>>> On Wed 2014-03-05 17:16, Mircea Markus wrote:
>>>> Sanne came up with a good follow-up to this email; just some small 
>>>> clarifications:
>>>> 
>>>> On Mar 4, 2014, at 6:02 PM, Emmanuel Bernard <emman...@hibernate.org> 
>>>> wrote:
>>>> 
>>>>>>> If you have to do a map/reduce for tasks as simple as age > 18, I think 
>>>>>>> your system had better be prepared to run gazillions of M/R jobs.
>>>>>> 
>>>>>> I want to run a simple M/R job in the evening to determine who turns 18 
>>>>>> tomorrow, to congratulate them. Once a day, not gazillions of times, and 
>>>>>> I don't need to index the age field just for that. Also, when it comes to 
>>>>>> Map/Reduce, the drawback of holding all the data in a single cache is 
>>>>>> twofold:
>>>>>> - performance: you iterate over the data that is not related to your 
>>>>>> query.
>>>>> 
>>>>> If the data are never related (query wise), then we are in the database 
>>>>> split category. Which is fine. But if some of your queries are related, 
>>>>> what do you do? Deny the user the ability to do them?
>>>> 
>>>> Here's where cross-site query would have been used. As Sanne suggested 
>>>> (next post), these limitations outweigh the advantages.
>>> 
>>> No. Cross-cache query if implemented will not support (efficiently
>>> enough) that kind of query. Cf my wiki page.
>> 
>> Yes, non-indexed joins would be exponential in the number of caches involved.
> 
> Technically, non-indexed joins would be exponential in the number of
> caches (joins) involved *and* in the number of entries you have
> stored: I know you weren't suggesting doing it, but to confirm, it's
> even worse than a horrible idea ;-)
> And that's not even considering the subtle design catch of "load it
> all from all cachestores".. combined with "multiple times per join"..

I wasn't suggesting doing it, not only for performance but also for the 
limitations you mentioned in the previous emails.
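
To make the cost argument concrete, here is a minimal sketch in Python. It is not Infinispan code: plain lists of dicts stand in for caches, and the function names and the `user_id` join key are hypothetical. It only illustrates that a join without an index examines on the order of n^k candidate combinations for k caches of n entries each, while a join that probes a per-cache index on the join key needs roughly n * (k - 1) lookups (ignoring the cost of building the indexes).

```python
from collections import defaultdict
from itertools import product


def non_indexed_join(caches, key):
    """Join without any index: examine every combination of entries."""
    candidates = 0
    matches = []
    for combo in product(*caches):
        candidates += 1
        # A combination matches when all entries share the same join key.
        if len({entry[key] for entry in combo}) == 1:
            matches.append(combo)
    return matches, candidates


def indexed_join(caches, key):
    """Join using a hash index on the join key for every cache but the first."""
    indexes = []
    for cache in caches[1:]:
        index = defaultdict(list)
        for entry in cache:
            index[entry[key]].append(entry)
        indexes.append(index)

    lookups = 0
    matches = []
    for entry in caches[0]:
        partial = [(entry,)]
        for index in indexes:
            lookups += 1
            found = index.get(entry[key], [])
            partial = [p + (m,) for p in partial for m in found]
        matches.extend(partial)
    return matches, lookups


# Three hypothetical caches of 50 entries each, joined on "user_id".
caches = [[{"user_id": i, "src": s} for i in range(50)] for s in "abc"]
slow, candidates = non_indexed_join(caches, "user_id")
fast, lookups = indexed_join(caches, "user_id")
print(candidates, lookups)  # 125000 candidates vs. 100 index lookups
```

Both functions return the same 50 matches; only the amount of work differs. The sketch keeps everything in memory, so it says nothing about the cache store loading or data size buffering concerns raised above.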

> 
>> Is it possible to use an index for x-cache joins with linear index update 
>> time and query?
> 
> Index update cost is not linear but log N, which approximates a constant
> cost.

Are you counting RPCs here, or index seeks?

> And we could cut this constant by 4 orders of magnitude if only
> I could safely differentiate between a put of a new entry vs. an
> update -> something which we'll need to brainstorm about.
> 
> Query time is also significantly sub-linear in practice, but specifics
> will vary on the query type.
> 
> Yes you could use indexes to improve x-cache joins, but you'll need an
> additional engine to coordinate that correctly, not least to manage
> data size buffers; essentially I think you'd need Teiid.
> 
> Sanne
> 
> 
>> 
>> Cheers,
>> --
>> Mircea Markus
>> Infinispan lead (www.infinispan.org)
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




