Re: [infinispan-dev] Design change in Infinispan Query

Mircea Markus Fri, 28 Feb 2014 13:18:14 -0800

Added a correction:

On Feb 28, 2014, at 9:14 PM, Mircea Markus <mmar...@redhat.com> wrote:


> 
>>>>> On 24 Feb 2014, at 17:39, Mircea Markus <mmar...@redhat.com> wrote:
>>>>> 
>>>>> 
>>>>>> On Feb 17, 2014, at 10:13 PM, Emmanuel Bernard <emman...@hibernate.org> 
>>>>>> wrote:
>>>>>> 
>>>>>> By the way, Mircea, Sanne and I had quite a long discussion about this 
>>>>>> one and the idea of one cache per entity. It turns out that the right 
>>>>>> (as in easy) solution does involve a higher level programming model like 
>>>>>> OGM provides. You can simulate it yourself using the Infinispan APIs but 
>>>>>> it is just cumbersome.
>>>>> 
>>>>> Curious to hear the whole story :-)
>>>>> We cannot require all users to use OGM, though; one of the reasons 
>>>>> is that OGM is not platform independent (Hot Rod). 
>>>> 
>>>> Then solve all the issues I have raised with a magic wand and come back to 
>>>> me when you have done it; I'm interested.
>>> 
>>> People are going to use Infinispan with one cache per entity, because it 
>>> makes sense:
>>> - different configurations (repl/dist | persistent/non-persistent) for 
>>> different data types
>>> - map/reduce tasks can run only on the Person entries, not on Dog as well, 
>>> when you want to select (Person) where age > 18
>>> I don't see a reason to forbid this; on the contrary. The way I see it, OGM 
>>> relates to ISPN the way Hibernate relates to JDBC. Indeed, OGM is a better 
>>> abstraction and should be recommended as such for Java clients, but 
>>> ultimately we're a general-purpose storage engine that is also available to 
>>> other platforms.
>>> 
>> 
>> I do disagree with your assessment.
>> I did write a whole essay on why I think your view is problematic - I was 
>> getting tired of repeating myself ;P
>> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity
> 
> Thanks for writing this up, it is a good taxonomy of data storage schemes and 
> querying.
> 
>> 
>> To answer your specific example anecdotally: yes, different configs for 
>> different entities are an interesting benefit, but it has to outweigh the 
>> drawbacks.
> 
> Using a single cache for all the types is practical at all :-) Just to expand 
> my idea, people prefer using different caches for many reasons:
                                          ^NOT 

> - security: the Account cache has different security requirements than the 
> News cache
> - data consistency: News is a non-transactional cache, while Account requires 
> pessimistic XA transactions
> - expiry: expire last year's news from the system. Not the same for Accounts
> - availability: I want the Accounts cache to be backed up to another site. I 
> don't want that for the News cache
> - logical data grouping: mixing Accounts with News doesn't make sense. I 
> might want to know which account appeared in the news, though.
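A minimal sketch of the list above, written as plain Java data rather than real cache configuration: each cache name maps to its own security, transaction, expiry and cross-site backup settings. The `CacheSettings` record, the `Transactions` enum and the role names are hypothetical and are not Infinispan's configuration API; they only illustrate that these knobs are chosen per cache, so two data types that need different values are hard to keep in one cache.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

public class PerCacheSettings {
    // Hypothetical types for illustration only; NOT Infinispan's configuration API.
    enum Transactions { NONE, PESSIMISTIC_XA }

    record CacheSettings(
            String requiredRole,        // security: role allowed to access the cache
            Transactions transactions,  // data consistency
            Duration lifespan,          // expiry: null means entries never expire
            boolean backupToOtherSite   // availability: cross-site backup on/off
    ) {}

    // One settings entry per cache; the values differ because the data types differ.
    static Map<String, CacheSettings> settings() {
        Map<String, CacheSettings> s = new LinkedHashMap<>();
        s.put("accounts", new CacheSettings("finance", Transactions.PESSIMISTIC_XA, null, true));
        s.put("news", new CacheSettings("public", Transactions.NONE, Duration.ofDays(365), false));
        return s;
    }

    public static void main(String[] args) {
        settings().forEach((name, cfg) -> System.out.println(name + " -> " + cfg));
    }
}
```

With a single shared cache there is only one value for each of these fields, so either News inherits the XA and backup cost of Accounts, or Accounts inherits the expiry of News.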
> 
>> If you have to run a Map/Reduce job for a task as simple as age > 18, I 
>> think your system had better be prepared to run gazillions of M/R jobs.
> 
> I want to run a simple M/R job in the evening to determine who turns 18 
> tomorrow, to congratulate them. Once a day, not gazillions of times, and I 
> don't need to index the age field just for that. Also, when it comes to 
> Map/Reduce, the drawback of holding all the data in a single cache is 
> twofold:
> - performance: you iterate over the data that is not related to your query. 
> - programming model: the Map/Reduce implementation has a dependency on both 
> Dog and Person. If I add Cats to the cache, I'll need to update the M/R code 
> to be aware of that as well. Same if I rename/remove Dog. Not nice.
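A local sketch of the nightly "who turns 18 tomorrow" job, using `java.util.stream` filter/map/collect as a stand-in for a distributed Map/Reduce task; this is not Infinispan's Map/Reduce API, and plain `Map`s stand in for caches. The `Person` and `Dog` records are hypothetical. With a per-entity cache the job only sees `Person` values; with a shared cache it must scan every entry and know which types to skip, which is where both drawbacks above show up.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TurnsEighteen {
    // Hypothetical entity types used only for this illustration.
    record Person(String name, LocalDate birthDate) {}
    record Dog(String name) {}

    // Per-entity cache: every value is a Person, no type checks needed.
    static List<String> fromPersonCache(Map<String, Person> persons, LocalDate tomorrow) {
        return persons.values().stream()
                .filter(p -> p.birthDate().plusYears(18).equals(tomorrow))
                .map(Person::name)
                .sorted()
                .collect(Collectors.toList());
    }

    // Shared cache: Dog entries are still iterated, and the job must know about
    // every type stored there (adding Cat or renaming Dog touches this code path).
    static List<String> fromMixedCache(Map<String, Object> all, LocalDate tomorrow) {
        return all.values().stream()
                .filter(v -> v instanceof Person)
                .map(Person.class::cast)
                .filter(p -> p.birthDate().plusYears(18).equals(tomorrow))
                .map(Person::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        LocalDate tomorrow = LocalDate.of(2014, 3, 1);
        Person ann = new Person("Ann", LocalDate.of(1996, 3, 1));
        Person bob = new Person("Bob", LocalDate.of(1990, 5, 20));
        Map<String, Person> persons = Map.of("p1", ann, "p2", bob);
        Map<String, Object> all = Map.of("p1", ann, "p2", bob, "d1", new Dog("Rex"));
        System.out.println(fromPersonCache(persons, tomorrow)); // [Ann]
        System.out.println(fromMixedCache(all, tomorrow));      // [Ann]
    }
}
```

Both variants return the same answer; the difference is how much data is scanned and how many types the job has to be aware of.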
> 
>> I think that Dogs, and domestic animals in general, are fundamentally 
>> related to humans - Person in your case. So queries involving both will be 
>> required - a cross-cache M/R is not doable today AFAIK, and even if it were, 
>> it's still M/R with all its drawbacks.
>> To me, the Cache API and Hot Rod are well suited for what I call a 
>> self-contained object graph (i.e. where Dog would be an embedded object of 
>> Person and not a separate Entity). In that situation, there is a single cache.
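A sketch of the self-contained object graph case, assuming a hypothetical `Person` value that embeds its `Dog`s as a list, with plain Java records and a `Map` standing in for the single cache. Because the dogs live inside their owner's value, a query touching both types needs only the one cache; the trade-off is that a dog has no key of its own, so it cannot be looked up, updated or shared between owners independently.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class EmbeddedGraph {
    // Hypothetical types: Dog has no identity of its own, it only exists inside a Person.
    record Dog(String name, String breed) {}
    record Person(String name, int age, List<Dog> dogs) {}

    // One cache, one value type: a query over owners and their dogs scans a single cache.
    static List<String> ownersOfBreed(Map<String, Person> cache, String breed) {
        return cache.values().stream()
                .filter(p -> p.dogs().stream().anyMatch(d -> d.breed().equals(breed)))
                .map(Person::name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Person> cache = Map.of(
                "p1", new Person("Ann", 18, List.of(new Dog("Rex", "beagle"))),
                "p2", new Person("Bob", 42, List.of()));
        System.out.println(ownersOfBreed(cache, "beagle")); // [Ann]
    }
}
```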
> 
> I see where you're coming from, but I don't think requiring people to use a 
> single cache for all the entities is an option. Besides a natural logical 
> separation, different data has different storage requirements: security, 
> access patterns, consistency, durability, availability etc. For most 
> non-trivial use cases, using a single cache just won't do. 
> 
>> One cache per entity does make sense for APIs that support what I call 
>> connected entities. Hibernate OGM specifically.
> 
> OGM does a great job covering this, but it is very specific: Java only and 
> OOP - our C/S mode, Hot Rod specifically, is language independent and not 
> OOP. Also, I would like to comment on the following statements:
> "I believe a cache API and Hot Rod are well suited to address up to the self 
> contained object graph use case with a couple of relations maintained 
> manually by the application but that cannot be queried. For the connected 
> entities use case, only a high level paradigm is suited like JPA."
> 
> I don't think storing object graphs should be under scrutiny here: Infinispan 
> C/S mode (and that's where most of the client focus is, BTW) has a schema 
> (protobuf) that does not support object graphs. I also think that expecting 
> people to use multiple caches for multiple data types is a solid assumption to 
> start from. And here's me speculating: these data types have logical 
> relations between them, so people will ask for querying. In order to query 
> across multiple data types, you can either merge them together (your 
> suggestion) or support some sort of new cross-cache indexing/querying API. 
> Cross-cache querying is more flexible and less restrictive than merging data 
> but, from what I understand from you, it has certain implementation 
> challenges. There's no pressure to take a decision now about supporting 
> queries spanning multiple caches - just something to keep an eye on when 
> dealing with use cases/users. ATM merging data is the only solution 
> available; let's wait and see if people ask for more.
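Short of merging, the quoted wiki text mentions relations "maintained manually by the application". A minimal sketch of what that looks like for "which account appeared in the news", assuming hypothetical `Account` and `News` types where a news item stores account keys rather than embedded accounts, and plain `Map`s standing in for the two caches. The join is done in application code: scan one cache, then do key lookups in the other.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

public class ManualCrossCacheJoin {
    // Hypothetical types: News references accounts by key, not by embedding them.
    record Account(String id, String holder) {}
    record News(String headline, List<String> mentionedAccountIds) {}

    // Application-side join: scan the news "cache", point-lookup the accounts "cache".
    static List<String> holdersInTheNews(Map<String, News> news, Map<String, Account> accounts) {
        return news.values().stream()
                .flatMap(n -> n.mentionedAccountIds().stream())
                .distinct()
                .map(accounts::get)
                .filter(Objects::nonNull)   // dangling keys: nothing enforces referential integrity
                .map(Account::holder)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Account> accounts = Map.of(
                "a1", new Account("a1", "Ann"),
                "a2", new Account("a2", "Bob"));
        Map<String, News> news = Map.of(
                "n1", new News("Local startup raises funds", List.of("a1", "a9")));
        System.out.println(holdersInTheNews(news, accounts)); // [Ann]
    }
}
```

This works for a handful of relations, but the filtering, deduplication and integrity checks all live in client code, which is the cost either cross-cache querying or merging the data would remove.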
> 
>> But please read the wiki page first before commenting. I did spend a lot of 
>> time on it
>> https://github.com/infinispan/infinispan/wiki/A-continuum-of-data-structure-and-query-complexity
> 
> I do read your comments and I really appreciate your feedback. We come from 
> slightly different worlds and look at things from different angles, but 
> discussions like this raise many good points.
> 
>> 
>> Emmanuel
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> Cheers,
> -- 
> Mircea Markus
> Infinispan lead (www.infinispan.org)
> 

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)




