FYI I've created a JIRA to track this: https://issues.jboss.org/browse/ISPN-2950

Whilst quite a performance issue, I don't think this is a critical/consistency issue for async stores: by using an async store you might lose data (expect inconsistencies) during a node crash anyway, so this behaviour just widens the inconsistency window.
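To illustrate why that window exists in the first place, here is a minimal, purely hypothetical write-behind sketch (this is not Infinispan's AsyncStore, just the concept): writes are acknowledged as soon as they are queued and only persisted later by a background thread, so anything still queued when the node crashes never reaches the store.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical write-behind store, for illustration only.
public class WriteBehindSketch {
   private final BlockingQueue<Map.Entry<String, byte[]>> pending = new LinkedBlockingQueue<>();
   private final ConcurrentMap<String, byte[]> backingStore = new ConcurrentHashMap<>();
   private final ExecutorService flusher = Executors.newSingleThreadExecutor();

   public WriteBehindSketch() {
      flusher.submit(() -> {
         try {
            // Drains the queue asynchronously; a crash at any point here loses
            // every entry still sitting in 'pending'.
            while (true) {
               Map.Entry<String, byte[]> e = pending.take();
               backingStore.put(e.getKey(), e.getValue());
            }
         } catch (InterruptedException stopped) {
            Thread.currentThread().interrupt();
         }
      });
   }

   // Returns as soon as the write is queued, not when it is persisted:
   // this is the inherent inconsistency window of any async store.
   public void put(String key, byte[] value) {
      pending.offer(new AbstractMap.SimpleImmutableEntry<>(key, value));
   }
}
```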
On 19 Mar 2013, at 16:30, Mircea Markus wrote:
>
> On 19 Mar 2013, at 16:15, Dan Berindei wrote:
>
>> Hi Sanne
>>
>> On Tue, Mar 19, 2013 at 4:12 PM, Sanne Grinovero <sa...@infinispan.org> wrote:
>> Mircea,
>> what I was most looking forward to was your comment on the interceptor
>> order generated for DIST+cachestores
>> - we don't think the ClusteredCacheLoader should be needed at all
>>
>> Agree, ClusteredCacheLoader should not be necessary.
>>
>> James, if you're still seeing problems with numOwners=1, could you create an
>> issue in JIRA?
>>
>> - each DIST node is loading from the CacheLoader (any) rather than
>> loading from its peer nodes for non-owned entries (!!)
>>
>> Sometimes loading stuff from a local disk is faster than going remote, e.g.
>> if you have numOwners=2 and both owners have to load the same entry from
>> disk and send it to the originator twice.
> the staggering of remote gets should overcome that.
>>
>> Still, most of the time the entry is going to be in memory on the owner
>> nodes, so the local load is slower (especially with a shared cache store,
>> where loading is over the network as well).
> +1
>>
>> This has come up on several threads now and I think it's critically
>> wrong; as I commented previously, this also introduces many
>> inconsistencies - as far as I understand it.
>>
>> Is there a JIRA for this already?
>>
>> Yes, loading a stale entry from the local cache store is definitely not a
>> good thing, but we actually delete the non-owned entries after the initial
>> state transfer. There may be some consistency issues if one uses a DIST_SYNC
>> cache with a shared async cache store, but fully sync configurations should
>> be fine.
>>
>> OTOH, if the cache store is not shared, the chances of finding the entry in
>> the local store on a non-owner are slim to none, so it doesn't make sense to
>> do the lookup.
>>
>> Implementation-wise, just changing the interceptor order is probably not
>> enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor
>> will still try to load it from the cache store after the remote lookup, so
>> we'll need a marker in the invocation context to avoid the extra cache
>> store load.
> if the key doesn't map to the local node it should trigger a remote get to
> owners (or allow the dist interceptor to do just that)
>> Actually, since this is just a performance issue, it could wait until we
>> implement tombstones everywhere.
> Hmm, not sure I see the correlation between this and tombstones?
>
>> BTW your gist wouldn't work, the metadata cache needs to load certain
>> elements too. But nice you spotted the need to potentially filter what
>> "preload" means in the scope of each cache, as the metadata one should
>> only preload metadata, while in the original configuration this data
>> would indeed be duplicated.
>> Opened: https://issues.jboss.org/browse/ISPN-2938
>>
>> Sanne
>>
>> On 19 March 2013 11:51, Mircea Markus <mmar...@redhat.com> wrote:
>>>
>>> On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
>>>
>>>> Hi Adrian,
>>>> let's forget about Lucene details and focus on DIST.
>>>> With numOwners=1 and having two nodes, the entries should be stored
>>>> roughly 50% on each node; I see nothing wrong with that
>>>> considering you don't need data failover in a read-only use case
>>>> having all the index available in the shared CacheLoader.
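For context, a minimal programmatic sketch of the kind of cache Sanne describes here: DIST with a single owner per key, so two nodes split the data roughly 50/50. The actual configuration is the XML in [1]; the fluent builder calls below are from the 5.x API and the cache store setup is omitted.

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class DistOneOwnerConfig {
   public static Configuration build() {
      // Distributed, synchronous cache with one owner per key: on a two-node
      // cluster each node ends up owning roughly half of the entries.
      return new ConfigurationBuilder()
            .clustering()
               .cacheMode(CacheMode.DIST_SYNC)
               .hash().numOwners(1)
            .build();
   }
}
```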
>>>>
>>>> In such a scenario, and having both nodes preloaded all data, in case
>>>> of a get() operation I would expect either:
>>>> A) to be the owner, hence retrieve the value from the local in-JVM reference
>>>> B) to not be the owner, so to forward the request to the other node,
>>>> having roughly 50% chance per key to be in case A or B.
>>>>
>>>> But when hitting case B) it seems that instead of loading from the
>>>> other node, it hits the CacheLoader to fetch the value.
>>>>
>>>> I had already asked James to verify with 4 nodes and numOwners=2; the
>>>> result is the same, so I suggested he ask here.
>>>> BTW I think numOwners=1 is perfectly valid and should work just as well;
>>>> the only reason I asked him to repeat
>>>> the test is that we don't have many tests on the numOwners=1 case and
>>>> I was assuming there might be some (wrong) assumptions
>>>> affecting this.
>>>>
>>>> Note that this is not "just" a critical performance problem: I'm
>>>> also suspecting it could produce inconsistent reads, in two classes of
>>>> problems:
>>>>
>>>> # non-shared CacheStore with stale entries
>>>> For non-owned keys it will hit the local CacheStore first, where
>>>> you might expect to not find anything, and then forward the request to
>>>> the right node. What if this node has been the owner in the past? It
>>>> might have an old entry locally stored, which would be returned
>>>> instead of the correct value owned by a different node.
>>>>
>>>> # shared CacheStore using write-behind
>>>> When using an async CacheStore, by definition the content of the
>>>> CacheStore is not trustworthy if you don't check on the owner first
>>>> for entries in memory.
>>>>
>>>> Both seem critical to me, but the performance impact is really bad too.
>>>>
>>>> I hoped to make some more tests myself but couldn't look at this yet;
>>>> any help from the core team would be appreciated.
>>> I think you have a fair point and reads/writes to the data should be
>>> coordinated through its owners, both for performance and (more importantly)
>>> correctness.
>>> Mind creating a JIRA for this?
>>>
>>>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>>>> someone else with a CacheLoader issue recently who had worked around
>>>> the problem by using a ClusterCacheLoader?
>>>> Do you remember what the scenario was?
>>>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>> On 15 March 2013 15:44, Adrian Nistor <anis...@redhat.com> wrote:
>>>>> Hi James,
>>>>>
>>>>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>>>>> the lucene-index cache is distributed with numOwners = 1. That means
>>>>> each cache entry is owned by just one cluster node and there's nowhere
>>>>> else to go in the cluster if the key is not available in local memory,
>>>>> thus it needs fetching from the cache store. This can be solved with
>>>>> numOwners > 1.
>>>>> Please let me know if this solves your problem.
>>>>>
>>>>> Cheers!
>>>>>
>>>>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>>>> Hey all,
>>>>>>
>>>>>> <OT>
>>>>>> Seeing as this is my first post, I wanted to just quickly thank you
>>>>>> all for Infinispan. So far I'm really enjoying working with it - great
>>>>>> product!
>>>>>> </OT>
>>>>>>
>>>>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>>>>> We use Lucene directly to build a search product, which has high read
>>>>>> requirements and likely very large indexes.
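For readers unfamiliar with InfinispanDirectory, a sketch of the kind of wiring James describes; the cache names and config file below are illustrative, and the builder shown is the infinispan-lucene-directory API of that era (DirectoryBuilder).

```java
import org.apache.lucene.store.Directory;
import org.infinispan.Cache;
import org.infinispan.lucene.directory.DirectoryBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class LuceneOverInfinispan {
   public static void main(String[] args) throws Exception {
      // Cache names and config file are illustrative; the real config is [1].
      DefaultCacheManager manager = new DefaultCacheManager("infinispan-config.xml");
      Cache<?, ?> metadata = manager.getCache("lucene-metadata");
      Cache<?, ?> chunks = manager.getCache("lucene-index");
      Cache<?, ?> locks = manager.getCache("lucene-locks");

      // One Directory per index name, backed by the three caches; hand it to
      // an IndexReader/IndexSearcher like any other Lucene Directory.
      Directory dir = DirectoryBuilder
            .newDirectoryInstance(metadata, chunks, locks, "myIndex")
            .create();
      System.out.println("Opened " + dir);
   }
}
```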
>>>>>> I'm hoping to make use of
>>>>>> a distribution mode cache to keep the whole index in memory across a
>>>>>> cluster of machines (the index will be too big for one server).
>>>>>>
>>>>>> The problem I'm having is that after loading a filesystem-based Lucene
>>>>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>>>>> retrieving data from the cluster - they instead look up keys in their
>>>>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>>>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>>>>> from there on read only from RAM (and network, of course). Is this
>>>>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>>>>>
>>>>>> To explain my observations in a little more detail:
>>>>>> * I start a cluster of two servers, using [1] as the cache config.
>>>>>> Both have a local copy of the Lucene index that will be loaded into
>>>>>> the InfinispanDirectory via the loader. This is a test configuration,
>>>>>> where I've set numOwners=1 so that I only need two servers for
>>>>>> distribution to happen.
>>>>>> * Upon startup, things look good. I see the memory usage of the JVM
>>>>>> reflect a pretty near 50/50 split of the data across both servers.
>>>>>> Logging indicates both servers are in the cluster view, all seems
>>>>>> fine.
>>>>>> * When I send a search query to either one of the nodes, I notice the
>>>>>> following:
>>>>>> - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>>>>> JVM process.
>>>>>> - no change in network activity between nodes (~300b/s, same as when
>>>>>> idle)
>>>>>> - memory usage on the node running the query increases dramatically,
>>>>>> and stays higher even after the query is finished.
>>>>>>
>>>>>> So it seemed to me like each node was favouring use of the CacheLoader
>>>>>> to retrieve keys that are not in memory, instead of using the cluster.
>>>>>> Does that seem reasonable? Is this the expected behaviour?
>>>>>>
>>>>>> I started to investigate this by turning on trace logging, and this
>>>>>> made me think perhaps the cause was that the CacheLoader's interceptor
>>>>>> is higher priority in the chain than the distribution interceptor?
>>>>>> I'm not at all familiar with the design in any level of detail - just
>>>>>> what I picked up in the last 24 hours from browsing the code, so I
>>>>>> could easily be way off. I've attached the log snippets I thought
>>>>>> relevant in [2].
>>>>>>
>>>>>> Any advice offered much appreciated.
>>>>>> Thanks!
>>>>>>
>>>>>> James.
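One cheap way to check the interceptor-order hypothesis above would be to dump the chain on a running node and see whether the cache-loader interceptor comes before the distribution one. getInterceptorChain() is the 5.x AdvancedCache call (later versions expose this differently), and the helper below is only an illustrative sketch.

```java
import org.infinispan.Cache;
import org.infinispan.interceptors.base.CommandInterceptor;

public class InterceptorOrderDump {
   // Prints the interceptor chain top to bottom; if the cache-loader
   // interceptor appears before the distribution/clustering interceptor,
   // loads hit the local store before any remote get is attempted.
   public static void dump(Cache<?, ?> cache) {
      for (CommandInterceptor interceptor : cache.getAdvancedCache().getInterceptorChain()) {
         System.out.println(interceptor.getClass().getName());
      }
   }
}
```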
>>>>>>
>>>>>> [1] https://www.refheap.com/paste/12531
>>>>>> [2] https://www.refheap.com/paste/12543

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev