Hi Ray,

Yeah - I've tried with shared=true/false and preload=true/false. I'm
using org.infinispan.lucene.cachestore.LuceneCacheLoader.
Sorry, I also should have mentioned previously that I'm building from
master, as I need access to the Lucene v4 support.

James.

On 15 March 2013 15:31, Ray Tsang <saturn...@gmail.com> wrote:
> Was the cache loader shared? Which cache loader were you using?
>
> On Fri, Mar 15, 2013 at 8:03 AM, James Aley <james.a...@swiftkey.net> wrote:
>>
>> Hey all,
>>
>> <OT>
>> Seeing as this is my first post, I wanted to just quickly thank you
>> all for Infinispan. So far I'm really enjoying working with it - great
>> product!
>> </OT>
>>
>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>> We use Lucene directly to build a search product, which has high read
>> requirements and likely very large indexes. I'm hoping to make use of
>> a distribution mode cache to keep the whole index in memory across a
>> cluster of machines (the index will be too big for one server).
>>
>> The problem I'm having is that after loading a filesystem-based Lucene
>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>> retrieving data from the cluster - they instead look up keys in their
>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>> I was hoping to just use the CacheLoader to initialize the caches, but
>> from there on read only from RAM (and network, of course). Is this
>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>
>> To explain my observations in a little more detail:
>> * I start a cluster of two servers, using [1] as the cache config.
>> Both have a local copy of the Lucene index that will be loaded into
>> the InfinispanDirectory via the loader. This is a test configuration,
>> where I've set numOwners=1 so that I only need two servers for
>> distribution to happen.
>> * Upon startup, things look good. I see the memory usage of the JVM
>> reflect a pretty near 50/50 split of the data across both servers.
>> Logging indicates both servers are in the cluster view, all seems
>> fine.
>> * When I send a search query to either one of the nodes, I notice the
>> following:
>>  - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>    JVM process.
>>  - no change in network activity between nodes (~300b/s, same as when
>>    idle)
>>  - memory usage on the node running the query increases dramatically,
>>    and stays higher even after the query is finished.
>>
>> So it seemed to me like each node was favouring use of the CacheLoader
>> to retrieve keys that are not in memory, instead of using the cluster.
>> Does that seem reasonable? Is this the expected behaviour?
>>
>> I started to investigate this by turning on trace logging, and this
>> made me think perhaps the cause was that the CacheLoader's interceptor
>> is higher priority in the chain than the distribution interceptor?
>> I'm not at all familiar with the design in any level of detail - just
>> what I picked up in the last 24 hours from browsing the code, so I
>> could easily be way off. I've attached the log snippets I thought
>> relevant in [2].
>>
>> Any advice offered much appreciated.
>> Thanks!
>>
>> James.
>>
>>
>> [1] https://www.refheap.com/paste/12531
>> [2] https://www.refheap.com/paste/12543
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
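
The suspicion raised in the original post (a read consults the local CacheLoader before it asks the cluster) can be pictured with a small toy model. This is only a sketch, not Infinispan code: every name in it (LookupOrderSketch, Source, servedBy, the chunk keys) is made up for illustration, and it assumes nothing more than "a read walks a fixed list of sources and stops at the first hit". Whether the real interceptor chain behaves this way is exactly the open question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only. None of these names are Infinispan classes or APIs.
final class LookupOrderSketch {

    // One place a key can be read from: local memory, local disk, or a remote node.
    static final class Source {
        final String name;
        final Map<String, String> data = new HashMap<String, String>();

        Source(String name, String... keys) {
            this.name = name;
            for (String key : keys) {
                data.put(key, "bytes-of-" + key);
            }
        }
    }

    // Walk the sources in order and report which one answers first (null if none).
    static String servedBy(List<Source> chain, String key) {
        for (Source source : chain) {
            if (source.data.containsKey(key)) {
                return source.name;
            }
        }
        return null;
    }

    // Node A with numOwners=1: it holds chunk-1 in memory, node B owns chunk-2,
    // and A's on-disk copy of the index contains both chunks.
    static final Source MEMORY = new Source("local-memory", "chunk-1");
    static final Source LOADER = new Source("local-loader", "chunk-1", "chunk-2");
    static final Source REMOTE = new Source("remote-owner", "chunk-2");

    // Hypothetical order that would explain the disk I/O described in the post.
    static List<Source> suspectedOrder() {
        return Arrays.asList(MEMORY, LOADER, REMOTE);
    }

    // Hypothetical order that matches the hoped-for behaviour: cluster before disk.
    static List<Source> hopedForOrder() {
        return Arrays.asList(MEMORY, REMOTE, LOADER);
    }

    public static void main(String[] args) {
        System.out.println("suspected: " + servedBy(suspectedOrder(), "chunk-2"));
        System.out.println("hoped for: " + servedBy(hopedForOrder(), "chunk-2"));
    }
}
```

Running main prints "suspected: local-loader" and then "hoped for: remote-owner". In the first ordering a key owned by the other node is still answered from local disk, which would match the iotop and network observations; in the second the same key comes from the remote owner.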