> there should be no need to use a ClusterCacheLoader,

I agree. This looked consistent w/ what I saw a couple weeks ago in a
different thread. Use of ClusterCacheLoader didn't make sense to me either...
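For context, the behaviour everyone in this thread expects - a get() on a
non-owner resolving against the remote owner rather than the local loader -
is roughly the following. This is only an untested sketch against the
5.2-era programmatic API; the cache contents, key names, and the
duplicate-JMX-domains tweak (there only so both managers can share one JVM
for a quick test) are illustrative, not taken from James' configuration.

    import org.infinispan.Cache;
    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.configuration.global.GlobalConfigurationBuilder;
    import org.infinispan.manager.DefaultCacheManager;

    public class NonOwnerGetSketch {
        public static void main(String[] args) {
            // DIST_SYNC with numOwners=1, matching the scenario under discussion.
            ConfigurationBuilder cfg = new ConfigurationBuilder();
            cfg.clustering().cacheMode(CacheMode.DIST_SYNC);
            cfg.clustering().hash().numOwners(1);

            // Two clustered managers standing in for the two nodes.
            DefaultCacheManager node1 = newClusteredManager(cfg);
            DefaultCacheManager node2 = newClusteredManager(cfg);

            Cache<String, String> c1 = node1.getCache();
            Cache<String, String> c2 = node2.getCache();

            c1.put("segment-0", "some index bytes");
            // Exactly one node owns "segment-0"; the other should fetch it
            // from its peer over the network, never from a local CacheLoader.
            System.out.println("node2 sees: " + c2.get("segment-0"));

            node2.stop();
            node1.stop();
        }

        private static DefaultCacheManager newClusteredManager(ConfigurationBuilder cfg) {
            GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
            // Only needed because both managers share this JVM in the sketch.
            global.globalJmxStatistics().allowDuplicateDomains(true);
            return new DefaultCacheManager(global.build(), cfg.build());
        }
    }

If the CacheLoader-before-distribution behaviour discussed below is real, the
same read on node2 would instead show up as disk I/O from node2's loader.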
On Mar 18, 2013, at 5:55, Sanne Grinovero <sa...@infinispan.org> wrote:

> I'm glad you're finding a workaround for the disk I/O, but there should
> be no need to use a ClusterCacheLoader: the intention of that is to make
> it possible to chain multiple grids. This is a critical problem IMHO.
>
> It seems there are multiple other issues at hand; let me comment per bullet:
>
> On 18 March 2013 12:20, James Aley <james.a...@swiftkey.net> wrote:
>> Update:
>>
>> I tried again - I think I misconfigured that ClusterCacheLoader on my
>> last attempt. With this configuration [1] it actually appears to be
>> loading keys over the network from the peer node. I'm seeing a lot of
>> network I/O between the nodes when requesting from either one of them
>> (30-50 MB/s), and considerably less disk I/O than previously, though
>> still not negligible.
>>
>> I think, however, that both nodes are holding on to any data they
>> retrieve from the other node. Is this possible? The reasons I think
>> this is the case:
>> * I have a fairly large test index on disk, which the Lucene
>> CacheLoader loads into memory as soon as the cache is created. It's
>> about a 12GB index, and after a flurry of disk activity when the
>> processes start, I see about 5-6GB of heap usage on each node -- all
>> seems good.
>
> Agree: that phase looks good.
>
>> * When I send requests now (with this ClusterCacheLoader
>> configuration linked below), I see network activity between nodes,
>> plus some disk I/O.
>
> I would not expect any more disk I/O at this point.
> The only case in which I think a disk read could still be triggered is
> the Lucene logic attempting to load a non-existing key, for example a
> function attempting either:
> - to check if a file exists (disregarding the directory listing)
> - to load a data range outside the expected boundaries
>
> The reason I think that is that Infinispan does not "cache" null
> entries: we plan tombstones for 6.0, but for now if an entry doesn't
> exist the cache won't "remember" that it is null and will look it up
> again from the usual places, including the CacheLoader.
>
> I've opened ISPN-2932 to inspect this and add tests to cover it.
>
>> * After each query, each node grows in heap usage considerably.
>> Eventually they'll both be using about 11GB of RAM.
>
> Permanently, even after you close the IndexReader?
>
> I'm afraid this Directory might not be suited for your use case, as it
> expects the whole index to fit in a single JVM: the IndexReader might
> request (and keep references to) all segments. In most cases it will
> work on a subset of segments, so it could work for you, but if you need
> to iterate over the whole index you might need a custom Collector and
> some Infinispan custom commands. I can give you some pointers, as we
> have examples of an (experimental) distributed query execution in the
> infinispan query module, or we could look into combining Map/Reduce
> with index analysis. (In other words: send the computation to the data
> rather than downloading half of the index to the local JVM.)
>
> But if this is not leaking because of the IndexReader usage, then it's
> a leak we need to fix.
>
>> * At the point where both nodes have lots of data in RAM, the network
>> I/O has dropped hugely to ~100k/s
>
> That almost looks like you have L1 enabled? Could you check that?
> Or the IndexReader is buffering.
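[Interjecting on the L1 question above: the effective setting can be printed
from the running cache. Untested sketch against the 5.2-era API;
"infinispan.xml" and "lucene-index" are placeholders for the actual config
file and cache name used in [1].]

    import org.infinispan.Cache;
    import org.infinispan.manager.DefaultCacheManager;

    public class PrintL1Setting {
        public static void main(String[] args) throws Exception {
            // Load the same XML the nodes use, then inspect the cache's
            // effective configuration at runtime.
            DefaultCacheManager cm = new DefaultCacheManager("infinispan.xml");
            Cache<?, ?> cache = cm.getCache("lucene-index");
            System.out.println("L1 enabled: "
                + cache.getCacheConfiguration().clustering().l1().enabled());
            System.out.println("numOwners: "
                + cache.getCacheConfiguration().clustering().hash().numOwners());
            cm.stop();
        }
    }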
>> * If I repeat an identical query to either node, the response is
>> instant - O(10ms)
>
> Well, that would be good if only we could know why :)
> But this is not an option for you, right? I mean, you can't load all
> the production data in a single JVM?
>
> Sanne
>
>> I don't know if this is because they're lazily loading entries from
>> disk despite the preload=true setting (and the index just takes up far
>> more RAM when loaded as a Cache like this?), or if it's because
>> they're locally caching entries that should (by the consistent hash
>> and numOwners configuration, at least) only live in the remote node?
>>
>> Thanks!
>> James.
>>
>> [1] https://www.refheap.com/paste/12685
>>
>> On 16 March 2013 01:19, Sanne Grinovero <sa...@infinispan.org> wrote:
>>> Hi Adrian,
>>> let's forget about Lucene details and focus on DIST.
>>> With numOwners=1 and two nodes, the entries should be stored roughly
>>> 50% on each node. I see nothing wrong with that, considering you
>>> don't need data failover in a read-only use case with all of the
>>> index available in the shared CacheLoader.
>>>
>>> In such a scenario, with both nodes having preloaded all data, on a
>>> get() operation I would expect either:
>>> A) to be the owner, hence retrieve the value from a local in-JVM reference
>>> B) to not be the owner, so to forward the request to the other node,
>>> having roughly a 50% chance per key of being in case A or B.
>>>
>>> But when hitting case B) it seems that instead of loading from the
>>> other node, it hits the CacheLoader to fetch the value.
>>>
>>> I had already asked James to verify with 4 nodes and numOwners=2; the
>>> result is the same, so I suggested he ask here.
>>> BTW I think numOwners=1 is perfectly valid and should work; the only
>>> reason I asked him to repeat the test is that we don't have many
>>> tests for the numOwners=1 case and I was assuming there might be
>>> some (wrong) assumptions affecting this.
>>>
>>> Note that this is not "just" a critical performance problem: I also
>>> suspect it could produce inconsistent reads, in two classes of
>>> problems:
>>>
>>> # non-shared CacheStore with stale entries
>>> For non-owned keys it hits the local CacheStore first, where you
>>> might expect to find nothing, before forwarding the request to the
>>> right node. What if this node was the owner in the past? It might
>>> have an old entry stored locally, which would be returned instead of
>>> the correct value owned by a different node.
>>>
>>> # shared CacheStore using write-behind
>>> When using an async CacheStore, by definition the content of the
>>> CacheStore is not trustworthy unless you check the owner's in-memory
>>> entries first.
>>>
>>> Both seem critical to me, but the performance impact is really bad too.
>>>
>>> I had hoped to run some more tests myself but couldn't look at this
>>> yet; any help from the core team would be appreciated.
>>>
>>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>>> someone else with a CacheLoader issue recently who had worked around
>>> the problem by using a ClusterCacheLoader? Do you remember what the
>>> scenario was?
>>>
>>> Cheers,
>>> Sanne
>>>
>>> On 15 March 2013 15:44, Adrian Nistor <anis...@redhat.com> wrote:
>>>> Hi James,
>>>>
>>>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>>>> the lucene-index cache is distributed with numOwners = 1.
>>>> That means each cache entry is owned by just one cluster node, and
>>>> there's nowhere else to go in the cluster if the key is not available
>>>> in local memory, so it needs fetching from the cache store. This can
>>>> be solved with numOwners > 1.
>>>> Please let me know if this solves your problem.
>>>>
>>>> Cheers!
>>>>
>>>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>>> Hey all,
>>>>>
>>>>> <OT>
>>>>> Seeing as this is my first post, I wanted to quickly thank you all
>>>>> for Infinispan. So far I'm really enjoying working with it - great
>>>>> product!
>>>>> </OT>
>>>>>
>>>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>>>> We use Lucene directly to build a search product, which has high read
>>>>> requirements and likely very large indexes. I'm hoping to make use of
>>>>> a distribution-mode cache to keep the whole index in memory across a
>>>>> cluster of machines (the index will be too big for one server).
>>>>>
>>>>> The problem I'm having is that after loading a filesystem-based Lucene
>>>>> directory into the InfinispanDirectory via the LuceneCacheLoader, no
>>>>> nodes are retrieving data from the cluster - they instead look up keys
>>>>> in their local CacheLoaders, which involves lots of disk I/O and is
>>>>> very slow. I was hoping to use the CacheLoader just to initialize the
>>>>> caches, but from there on to read only from RAM (and the network, of
>>>>> course). Is this supported? Maybe I've misunderstood the purpose of
>>>>> the CacheLoader?
>>>>>
>>>>> To explain my observations in a little more detail:
>>>>> * I start a cluster of two servers, using [1] as the cache config.
>>>>> Both have a local copy of the Lucene index that will be loaded into
>>>>> the InfinispanDirectory via the loader. This is a test configuration,
>>>>> where I've set numOwners=1 so that I only need two servers for
>>>>> distribution to happen.
>>>>> * Upon startup, things look good. I see the memory usage of the JVMs
>>>>> reflect a pretty near 50/50 split of the data across both servers.
>>>>> Logging indicates both servers are in the cluster view; all seems
>>>>> fine.
>>>>> * When I send a search query to either one of the nodes, I notice the
>>>>> following:
>>>>> - iotop shows huge (~100MB/s) disk I/O on that node alone, from the
>>>>> JVM process.
>>>>> - no change in network activity between nodes (~300b/s, same as when
>>>>> idle)
>>>>> - memory usage on the node running the query increases dramatically,
>>>>> and stays higher even after the query is finished.
>>>>>
>>>>> So it seemed to me like each node was favouring use of the CacheLoader
>>>>> to retrieve keys that are not in memory, instead of using the cluster.
>>>>> Does that seem reasonable? Is this the expected behaviour?
>>>>>
>>>>> I started to investigate this by turning on trace logging, and this
>>>>> made me think perhaps the cause was that the CacheLoader's interceptor
>>>>> is higher priority in the chain than the distribution interceptor?
>>>>> I'm not at all familiar with the design in any level of detail - just
>>>>> what I picked up in the last 24 hours from browsing the code, so I
>>>>> could easily be way off. I've attached the log snippets I thought
>>>>> relevant in [2].
>>>>>
>>>>> Any advice offered much appreciated.
>>>>> Thanks!
>>>>>
>>>>> James.
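[Interjecting once more: one way to confirm whether a given get() should stay
local or go to the peer - and therefore whether it ever has a reason to touch
the local CacheLoader - is to log key ownership while reproducing a query.
Rough, untested sketch assuming the 5.2 DistributionManager API; the class
and helper names are illustrative.]

    import java.util.List;

    import org.infinispan.Cache;
    import org.infinispan.distribution.DistributionManager;
    import org.infinispan.remoting.transport.Address;

    public class OwnershipCheck {
        // Logs which node(s) own a key and whether it is local to this node.
        // For keys that are not local, a get() should be served by the remote
        // owner over the network, not by the local CacheLoader.
        public static void logOwners(Cache<?, ?> cache, Object key) {
            DistributionManager dm = cache.getAdvancedCache().getDistributionManager();
            List<Address> owners = dm.locate(key);
            boolean local = dm.getLocality(key).isLocal();
            System.out.println(key + " -> owners=" + owners + ", localToThisNode=" + local);
        }
    }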
>>>>>
>>>>> [1] https://www.refheap.com/paste/12531
>>>>> [2] https://www.refheap.com/paste/12543

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev