Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread Ray Tsang
Was the cache loader shared?  Which cache loader were you using?

On Fri, Mar 15, 2013 at 8:03 AM, James Aley  wrote:

> Hey all,
>
> 
> Seeing as this is my first post, I wanted to just quickly thank you
> all for Infinispan. So far I'm really enjoying working with it - great
> product!
> 
>
> I'm using the InfinispanDirectory for a Lucene project at the moment.
> We use Lucene directly to build a search product, which has high read
> requirements and likely very large indexes. I'm hoping to make use of
> a distribution mode cache to keep the whole index in memory across a
> cluster of machines (the index will be too big for one server).
>
> The problem I'm having is that after loading a filesystem-based Lucene
> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
> retrieving data from the cluster - they instead look up keys in their
> local CacheLoaders, which involves lots of disk I/O and is very slow.
> I was hoping to just use the CacheLoader to initialize the caches, but
> from there on read only from RAM (and network, of course). Is this
> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>
> To explain my observations in a little more detail:
> * I start a cluster of two servers, using [1] as the cache config.
> Both have a local copy of the Lucene index that will be loaded into
> the InfinispanDirectory via the loader. This is a test configuration,
> where I've set numOwners=1 so that I only need two servers for
> distribution to happen.
> * Upon startup, things look good. I see the memory usage of the JVM
> reflect a pretty near 50/50 split of the data across both servers.
> Logging indicates both servers are in the cluster view, all seems
> fine.
> * When I send a search query to either one of the nodes, I notice the
> following:
>   - iotop shows huge (~100MB/s) disk I/O on that node alone from the
> JVM process.
>   - no change in network activity between nodes (~300b/s, same as when
> idle)
>   - memory usage on the node running the query increases dramatically,
> and stays higher even after the query is finished.
>
> So it seemed to me like each node was favouring use of the CacheLoader
> to retrieve keys that are not in memory, instead of using the cluster.
> Does that seem reasonable? Is this the expected behaviour?
>
> I started to investigate this by turning on trace logging, and this
> made me think perhaps the cause was that the CacheLoader's interceptor
> is higher priority in the chain than the distribution interceptor?
> I'm not at all familiar with the design in any level of detail - just
> what I picked up in the last 24 hours from browsing the code, so I
> could easily be way off. I've attached the log snippets I thought
> relevant in [2].
>
> Any advice offered much appreciated.
> Thanks!
>
> James.
>
>
> [1] https://www.refheap.com/paste/12531
> [2] https://www.refheap.com/paste/12543
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
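
For readers who cannot reach the refheap pastes: the setup James describes - a
distribution-mode cache holding the Lucene index, numOwners=1, preloaded from
disk through the LuceneCacheLoader - would look roughly like the sketch below.
This is written from memory against the Infinispan 5.x XML schema and is not
James' actual configuration; the "location" property and the path are assumptions.

  <namedCache name="lucene-index">
    <clustering mode="distribution">
      <!-- numOwners=1: each key has exactly one in-memory owner in the cluster -->
      <hash numOwners="1"/>
    </clustering>
    <!-- preload the on-disk index into memory when the cache starts -->
    <loaders preload="true" shared="false">
      <loader class="org.infinispan.lucene.cachestore.LuceneCacheLoader">
        <properties>
          <!-- hypothetical path to the filesystem-based Lucene index -->
          <property name="location" value="/path/to/lucene/index"/>
        </properties>
      </loader>
    </loaders>
  </namedCache>

The question in the thread is why a get() for a key this node does not own goes
back to this loader (i.e. to disk) instead of to the node that owns the key.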

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread James Aley
Hi Ray,

Yeah - I've tried with shared=true/false and preload=true/false. I'm
using org.infinispan.lucene.cachestore.LuceneCacheLoader.

Sorry, I also should have mentioned previously that I'm building from
master, as I need access to the Lucene v4 support.


James.

On 15 March 2013 15:31, Ray Tsang  wrote:
> Was the cache loader shared?  Which cache loader were you using?
>
> On Fri, Mar 15, 2013 at 8:03 AM, James Aley  wrote:
>>
>> Hey all,
>>
>> 
>> Seeing as this is my first post, I wanted to just quickly thank you
>> all for Infinispan. So far I'm really enjoying working with it - great
>> product!
>> 
>>
>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>> We use Lucene directly to build a search product, which has high read
>> requirements and likely very large indexes. I'm hoping to make use of
>> a distribution mode cache to keep the whole index in memory across a
>> cluster of machines (the index will be too big for one server).
>>
>> The problem I'm having is that after loading a filesystem-based Lucene
>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>> retrieving data from the cluster - they instead look up keys in their
>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>> I was hoping to just use the CacheLoader to initialize the caches, but
>> from there on read only from RAM (and network, of course). Is this
>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>
>> To explain my observations in a little more detail:
>> * I start a cluster of two servers, using [1] as the cache config.
>> Both have a local copy of the Lucene index that will be loaded into
>> the InfinispanDirectory via the loader. This is a test configuration,
>> where I've set numOwners=1 so that I only need two servers for
>> distribution to happen.
>> * Upon startup, things look good. I see the memory usage of the JVM
>> reflect a pretty near 50/50 split of the data across both servers.
>> Logging indicates both servers are in the cluster view, all seems
>> fine.
>> * When I send a search query to either one of the nodes, I notice the
>> following:
>>   - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>> JVM process.
>>   - no change in network activity between nodes (~300b/s, same as when
>> idle)
>>   - memory usage on the node running the query increases dramatically,
>> and stays higher even after the query is finished.
>>
>> So it seemed to me like each node was favouring use of the CacheLoader
>> to retrieve keys that are not in memory, instead of using the cluster.
>> Does that seem reasonable? Is this the expected behaviour?
>>
>> I started to investigate this by turning on trace logging, and this
>> made me think perhaps the cause was that the CacheLoader's interceptor
>> is higher priority in the chain than the distribution interceptor?
>> I'm not at all familiar with the design in any level of detail - just
>> what I picked up in the last 24 hours from browsing the code, so I
>> could easily be way off. I've attached the log snippets I thought
>> relevant in [2].
>>
>> Any advice offered much appreciated.
>> Thanks!
>>
>> James.
>>
>>
>> [1] https://www.refheap.com/paste/12531
>> [2] https://www.refheap.com/paste/12543
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread Adrian Nistor
Hi James,

I'm not an expert on InfinispanDirectory but I've noticed in [1] that 
the lucene-index cache is distributed with numOwners = 1. That means 
each cache entry is owned by just one cluster node and there's nowhere 
else to go in the cluster if the key is not available in local memory, 
thus it needs fetching from the cache store. This can be solved with 
numOwners > 1.
Please let me know if this solves your problem.

Cheers!

On 03/15/2013 05:03 PM, James Aley wrote:
> Hey all,
>
> 
> Seeing as this is my first post, I wanted to just quickly thank you
> all for Infinispan. So far I'm really enjoying working with it - great
> product!
> 
>
> I'm using the InfinispanDirectory for a Lucene project at the moment.
> We use Lucene directly to build a search product, which has high read
> requirements and likely very large indexes. I'm hoping to make use of
> a distribution mode cache to keep the whole index in memory across a
> cluster of machines (the index will be too big for one server).
>
> The problem I'm having is that after loading a filesystem-based Lucene
> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
> retrieving data from the cluster - they instead look up keys in their
> local CacheLoaders, which involves lots of disk I/O and is very slow.
> I was hoping to just use the CacheLoader to initialize the caches, but
> from there on read only from RAM (and network, of course). Is this
> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>
> To explain my observations in a little more detail:
> * I start a cluster of two servers, using [1] as the cache config.
> Both have a local copy of the Lucene index that will be loaded into
> the InfinispanDirectory via the loader. This is a test configuration,
> where I've set numOwners=1 so that I only need two servers for
> distribution to happen.
> * Upon startup, things look good. I see the memory usage of the JVM
> reflect a pretty near 50/50 split of the data across both servers.
> Logging indicates both servers are in the cluster view, all seems
> fine.
> * When I send a search query to either one of the nodes, I notice the 
> following:
>- iotop shows huge (~100MB/s) disk I/O on that node alone from the
> JVM process.
>- no change in network activity between nodes (~300b/s, same as when idle)
>- memory usage on the node running the query increases dramatically,
> and stays higher even after the query is finished.
>
> So it seemed to me like each node was favouring use of the CacheLoader
> to retrieve keys that are not in memory, instead of using the cluster.
> Does that seem reasonable? Is this the expected behaviour?
>
> I started to investigate this by turning on trace logging, and this
> made me think perhaps the cause was that the CacheLoader's interceptor
> is higher priority in the chain than the distribution interceptor?
> I'm not at all familiar with the design in any level of detail - just
> what I picked up in the last 24 hours from browsing the code, so I
> could easily be way off. I've attached the log snippets I thought
> relevant in [2].
>
> Any advice offered much appreciated.
> Thanks!
>
> James.
>
>
> [1] https://www.refheap.com/paste/12531
> [2] https://www.refheap.com/paste/12543
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
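
In configuration terms, Adrian's suggestion amounts to raising the numOwners
attribute on the distributed cache so that every key has a second in-memory copy
somewhere in the cluster - a sketch, using the same assumed 5.x-style elements as above:

  <clustering mode="distribution">
    <!-- two in-memory owners per key: a read can be served from either owner's memory -->
    <hash numOwners="2"/>
  </clustering>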


Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread James Aley
Apologies - forgot to copy list.

On 15 March 2013 15:48, James Aley  wrote:
> Hey Adrian,
>
> Thanks for the response. I was chatting to Sanne on IRC yesterday, and
> he suggested this to me. Actually the logging I attached was from a
> cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
> this actually, but I thought seeing as it didn't appear to make any
> difference that I'd just keep things simple in my previous email.
>
> While it seemed not to make a difference in this case... I can see why
> that would make sense. In future tests I guess I should probably stick
> with numOwners > 1.
>
>
> James.
>
> On 15 March 2013 15:44, Adrian Nistor  wrote:
>> Hi James,
>>
>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that the
>> lucene-index cache is distributed with numOwners = 1. That means each cache
>> entry is owned by just one cluster node and there's nowhere else to go in
>> the cluster if the key is not available in local memory, thus it needs
>> fetching from the cache store. This can be solved with numOwners > 1.
>> Please let me know if this solves your problem.
>>
>> Cheers!
>>
>>
>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>
>>> Hey all,
>>>
>>> 
>>> Seeing as this is my first post, I wanted to just quickly thank you
>>> all for Infinispan. So far I'm really enjoying working with it - great
>>> product!
>>> 
>>>
>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>> We use Lucene directly to build a search product, which has high read
>>> requirements and likely very large indexes. I'm hoping to make use of
>>> a distribution mode cache to keep the whole index in memory across a
>>> cluster of machines (the index will be too big for one server).
>>>
>>> The problem I'm having is that after loading a filesystem-based Lucene
>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>> retrieving data from the cluster - they instead look up keys in their
>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>> from there on read only from RAM (and network, of course). Is this
>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>>
>>> To explain my observations in a little more detail:
>>> * I start a cluster of two servers, using [1] as the cache config.
>>> Both have a local copy of the Lucene index that will be loaded into
>>> the InfinispanDirectory via the loader. This is a test configuration,
>>> where I've set numOwners=1 so that I only need two servers for
>>> distribution to happen.
>>> * Upon startup, things look good. I see the memory usage of the JVM
>>> reflect a pretty near 50/50 split of the data across both servers.
>>> Logging indicates both servers are in the cluster view, all seems
>>> fine.
>>> * When I send a search query to either one of the nodes, I notice the
>>> following:
>>>- iotop shows huge (~100MB/s) disk I/O on that node alone from the
>>> JVM process.
>>>- no change in network activity between nodes (~300b/s, same as when
>>> idle)
>>>- memory usage on the node running the query increases dramatically,
>>> and stays higher even after the query is finished.
>>>
>>> So it seemed to me like each node was favouring use of the CacheLoader
>>> to retrieve keys that are not in memory, instead of using the cluster.
>>> Does that seem reasonable? Is this the expected behaviour?
>>>
>>> I started to investigate this by turning on trace logging, and this
>>> made me think perhaps the cause was that the CacheLoader's interceptor
>>> is higher priority in the chain than the distribution interceptor?
>>> I'm not at all familiar with the design in any level of detail - just
>>> what I picked up in the last 24 hours from browsing the code, so I
>>> could easily be way off. I've attached the log snippets I thought
>>> relevant in [2].
>>>
>>> Any advice offered much appreciated.
>>> Thanks!
>>>
>>> James.
>>>
>>>
>>> [1] https://www.refheap.com/paste/12531
>>> [2] https://www.refheap.com/paste/12543
>>> ___
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread Ray Tsang
Can you try adding a ClusterCacheLoader to see if that helps?

Thanks,

On Fri, Mar 15, 2013 at 8:49 AM, James Aley  wrote:

> Apologies - forgot to copy list.
>
> On 15 March 2013 15:48, James Aley  wrote:
> > Hey Adrian,
> >
> > Thanks for the response. I was chatting to Sanne on IRC yesterday, and
> > he suggested this to me. Actually the logging I attached was from a
> > cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
> > this actually, but I thought seeing as it didn't appear to make any
> > difference that I'd just keep things simple in my previous email.
> >
> > While it seemed not to make a difference in this case... I can see why
> > that would make sense. In future tests I guess I should probably stick
> > with numOwners > 1.
> >
> >
> > James.
> >
> > On 15 March 2013 15:44, Adrian Nistor  wrote:
> >> Hi James,
> >>
> >> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
> the
> >> lucene-index cache is distributed with numOwners = 1. That means each
> cache
> >> entry is owned by just one cluster node and there's nowhere else to go
> in
> >> the cluster if the key is not available in local memory, thus it needs
> >> fetching from the cache store. This can be solved with numOwners > 1.
> >> Please let me know if this solves your problem.
> >>
> >> Cheers!
> >>
> >>
> >> On 03/15/2013 05:03 PM, James Aley wrote:
> >>>
> >>> Hey all,
> >>>
> >>> 
> >>> Seeing as this is my first post, I wanted to just quickly thank you
> >>> all for Infinispan. So far I'm really enjoying working with it - great
> >>> product!
> >>> 
> >>>
> >>> I'm using the InfinispanDirectory for a Lucene project at the moment.
> >>> We use Lucene directly to build a search product, which has high read
> >>> requirements and likely very large indexes. I'm hoping to make use of
> >>> a distribution mode cache to keep the whole index in memory across a
> >>> cluster of machines (the index will be too big for one server).
> >>>
> >>> The problem I'm having is that after loading a filesystem-based Lucene
> >>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
> >>> retrieving data from the cluster - they instead look up keys in their
> >>> local CacheLoaders, which involves lots of disk I/O and is very slow.
> >>> I was hoping to just use the CacheLoader to initialize the caches, but
> >>> from there on read only from RAM (and network, of course). Is this
> >>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
> >>>
> >>> To explain my observations in a little more detail:
> >>> * I start a cluster of two servers, using [1] as the cache config.
> >>> Both have a local copy of the Lucene index that will be loaded into
> >>> the InfinispanDirectory via the loader. This is a test configuration,
> >>> where I've set numOwners=1 so that I only need two servers for
> >>> distribution to happen.
> >>> * Upon startup, things look good. I see the memory usage of the JVM
> >>> reflect a pretty near 50/50 split of the data across both servers.
> >>> Logging indicates both servers are in the cluster view, all seems
> >>> fine.
> >>> * When I send a search query to either one of the nodes, I notice the
> >>> following:
> >>>- iotop shows huge (~100MB/s) disk I/O on that node alone from the
> >>> JVM process.
> >>>- no change in network activity between nodes (~300b/s, same as when
> >>> idle)
> >>>- memory usage on the node running the query increases dramatically,
> >>> and stays higher even after the query is finished.
> >>>
> >>> So it seemed to me like each node was favouring use of the CacheLoader
> >>> to retrieve keys that are not in memory, instead of using the cluster.
> >>> Does that seem reasonable? Is this the expected behaviour?
> >>>
> >>> I started to investigate this by turning on trace logging, and this
> >>> made me think perhaps the cause was that the CacheLoader's interceptor
> >>> is higher priority in the chain than the distribution interceptor?
> >>> I'm not at all familiar with the design in any level of detail - just
> >>> what I picked up in the last 24 hours from browsing the code, so I
> >>> could easily be way off. I've attached the log snippets I thought
> >>> relevant in [2].
> >>>
> >>> Any advice offered much appreciated.
> >>> Thanks!
> >>>
> >>> James.
> >>>
> >>>
> >>> [1] https://www.refheap.com/paste/12531
> >>> [2] https://www.refheap.com/paste/12543
> >>> ___
> >>> infinispan-dev mailing list
> >>> infinispan-dev@lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>
> >>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
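
For reference, "adding a ClusterCacheLoader" means putting the cluster loader into
the cache's loader chain, so that a key missing from local memory can be fetched
from the other nodes' memory rather than only from disk. A minimal sketch; the
class name is assumed to be the 5.x org.infinispan.loaders.cluster.ClusterCacheLoader
and may differ:

  <loaders>
    <!-- on a local miss, ask the other nodes' memory instead of going straight to disk -->
    <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader"/>
  </loaders>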

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread James Aley
Not sure if I've done exactly what you had in mind... here is my updated XML:
https://www.refheap.com/paste/12601

I added the loader to the lucene-index namedCache, which is the one
I'm using for distribution.

This didn't appear to change anything, as far as I can see. Still
seeing a lot of disk IO with every request.


James.


On 15 March 2013 15:54, Ray Tsang  wrote:
> Can you try adding a ClusterCacheLoader to see if that helps?
>
> Thanks,
>
>
> On Fri, Mar 15, 2013 at 8:49 AM, James Aley  wrote:
>>
>> Apologies - forgot to copy list.
>>
>> On 15 March 2013 15:48, James Aley  wrote:
>> > Hey Adrian,
>> >
>> > Thanks for the response. I was chatting to Sanne on IRC yesterday, and
>> > he suggested this to me. Actually the logging I attached was from a
>> > cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
>> > this actually, but I thought seeing as it didn't appear to make any
>> > difference that I'd just keep things simple in my previous email.
>> >
>> > While it seemed not to make a difference in this case... I can see why
>> > that would make sense. In future tests I guess I should probably stick
>> > with numOwners > 1.
>> >
>> >
>> > James.
>> >
>> > On 15 March 2013 15:44, Adrian Nistor  wrote:
>> >> Hi James,
>> >>
>> >> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>> >> the
>> >> lucene-index cache is distributed with numOwners = 1. That means each
>> >> cache
>> >> entry is owned by just one cluster node and there's nowhere else to go
>> >> in
>> >> the cluster if the key is not available in local memory, thus it needs
>> >> fetching from the cache store. This can be solved with numOwners > 1.
>> >> Please let me know if this solves your problem.
>> >>
>> >> Cheers!
>> >>
>> >>
>> >> On 03/15/2013 05:03 PM, James Aley wrote:
>> >>>
>> >>> Hey all,
>> >>>
>> >>> 
>> >>> Seeing as this is my first post, I wanted to just quickly thank you
>> >>> all for Infinispan. So far I'm really enjoying working with it - great
>> >>> product!
>> >>> 
>> >>>
>> >>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>> >>> We use Lucene directly to build a search product, which has high read
>> >>> requirements and likely very large indexes. I'm hoping to make use of
>> >>> a distribution mode cache to keep the whole index in memory across a
>> >>> cluster of machines (the index will be too big for one server).
>> >>>
>> >>> The problem I'm having is that after loading a filesystem-based Lucene
>> >>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>> >>> retrieving data from the cluster - they instead look up keys in their
>> >>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>> >>> I was hoping to just use the CacheLoader to initialize the caches, but
>> >>> from there on read only from RAM (and network, of course). Is this
>> >>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>> >>>
>> >>> To explain my observations in a little more detail:
>> >>> * I start a cluster of two servers, using [1] as the cache config.
>> >>> Both have a local copy of the Lucene index that will be loaded into
>> >>> the InfinispanDirectory via the loader. This is a test configuration,
>> >>> where I've set numOwners=1 so that I only need two servers for
>> >>> distribution to happen.
>> >>> * Upon startup, things look good. I see the memory usage of the JVM
>> >>> reflect a pretty near 50/50 split of the data across both servers.
>> >>> Logging indicates both servers are in the cluster view, all seems
>> >>> fine.
>> >>> * When I send a search query to either one of the nodes, I notice the
>> >>> following:
>> >>>- iotop shows huge (~100MB/s) disk I/O on that node alone from the
>> >>> JVM process.
>> >>>- no change in network activity between nodes (~300b/s, same as
>> >>> when
>> >>> idle)
>> >>>- memory usage on the node running the query increases
>> >>> dramatically,
>> >>> and stays higher even after the query is finished.
>> >>>
>> >>> So it seemed to me like each node was favouring use of the CacheLoader
>> >>> to retrieve keys that are not in memory, instead of using the cluster.
>> >>> Does that seem reasonable? Is this the expected behaviour?
>> >>>
>> >>> I started to investigate this by turning on trace logging, and this
>> >>> made me think perhaps the cause was that the CacheLoader's interceptor
>> >>> is higher priority in the chain than the distribution interceptor?
>> >>> I'm not at all familiar with the design in any level of detail - just
>> >>> what I picked up in the last 24 hours from browsing the code, so I
>> >>> could easily be way off. I've attached the log snippets I thought
>> >>> relevant in [2].
>> >>>
>> >>> Any advice offered much appreciated.
>> >>> Thanks!
>> >>>
>> >>> James.
>> >>>
>> >>>
>> >>> [1] https://www.refheap.com/paste/12531
>> >>> [2] https://www.refheap.com/paste/12543
>> >>> ___
>> >>> infin

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-15 Thread Sanne Grinovero
Hi Adrian,
let's forget about Lucene details and focus on DIST.
With numOwners=1 and two nodes, the entries should be stored
roughly 50% on each node; I see nothing wrong with that,
considering you don't need data failover in a read-only use case
with all the index available in the shared CacheLoader.

In such a scenario, and having both nodes preloaded all data, in case
of a get() operation I would expect
either:
 A) to be the owner, hence retrieve the value from local in-JVM reference
 B) to not be the owner, so to forward the request to the other node
having roughly 50% chance per key to be in case A or B.

But when hitting case B) it seems that instead of loading from the
other node, it hits the CacheLoader to fetch the value.

I had already asked James to verify with 4 nodes and numOwners=2; the
result is the same, so I suggested he ask here.
BTW I think numOwners=1 is perfectly valid and should work just as well as
numOwners > 1; the only reason I asked him to repeat
the test is that we don't have many tests covering the numOwners=1 case and
I was assuming there might be some (wrong) assumptions
affecting this.

Note that this is not "just" a critical performance problem: I'm
also suspecting it could produce inconsistent reads, in two classes of
problems:

# non-shared CacheStore with stale entries
For non-owned keys it will hit the local CacheStore first, where
you might expect not to find anything, and then forward the request to
the right node. But what if this node has been the owner in the past? It
might have an old entry stored locally, which would be returned
instead of the correct value owned on a different node.

# shared CacheStore using write-behind
When using an async CacheStore by definition the content of the
CacheStore is not trustworthy if you don't check on the owner first
for entries in memory.

Both seem critical to me, but the performance impact is really bad too.

I hoped to make some more tests myself but couldn't look at this yet,
any help from the core team would be appreciated.

@Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
someone else with a CacheLoader issue recently who had worked around
the problem by using a ClusterCacheLoader ?
Do you remember what the scenario was?

Cheers,
Sanne

On 15 March 2013 15:44, Adrian Nistor  wrote:
> Hi James,
>
> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
> the lucene-index cache is distributed with numOwners = 1. That means
> each cache entry is owned by just one cluster node and there's nowhere
> else to go in the cluster if the key is not available in local memory,
> thus it needs fetching from the cache store. This can be solved with
> numOwners > 1.
> Please let me know if this solves your problem.
>
> Cheers!
>
> On 03/15/2013 05:03 PM, James Aley wrote:
>> Hey all,
>>
>> 
>> Seeing as this is my first post, I wanted to just quickly thank you
>> all for Infinispan. So far I'm really enjoying working with it - great
>> product!
>> 
>>
>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>> We use Lucene directly to build a search product, which has high read
>> requirements and likely very large indexes. I'm hoping to make use of
>> a distribution mode cache to keep the whole index in memory across a
>> cluster of machines (the index will be too big for one server).
>>
>> The problem I'm having is that after loading a filesystem-based Lucene
>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>> retrieving data from the cluster - they instead look up keys in their
>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>> I was hoping to just use the CacheLoader to initialize the caches, but
>> from there on read only from RAM (and network, of course). Is this
>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>
>> To explain my observations in a little more detail:
>> * I start a cluster of two servers, using [1] as the cache config.
>> Both have a local copy of the Lucene index that will be loaded into
>> the InfinispanDirectory via the loader. This is a test configuration,
>> where I've set numOwners=1 so that I only need two servers for
>> distribution to happen.
>> * Upon startup, things look good. I see the memory usage of the JVM
>> reflect a pretty near 50/50 split of the data across both servers.
>> Logging indicates both servers are in the cluster view, all seems
>> fine.
>> * When I send a search query to either one of the nodes, I notice the 
>> following:
>>- iotop shows huge (~100MB/s) disk I/O on that node alone from the
>> JVM process.
>>- no change in network activity between nodes (~300b/s, same as when idle)
>>- memory usage on the node running the query increases dramatically,
>> and stays higher even after the query is finished.
>>
>> So it seemed to me like each node was favouring use of the CacheLoader
>> to retrieve keys that are not in memory, instead of using the cluster.

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-18 Thread James Aley
Update:

I tried again - I think I misconfigured that ClusterCacheLoader on my
last attempt. With this configuration [1] it actually appears to be
loading keys over the network from the peer node. I'm seeing a lot of
network IO between the nodes when requesting from either one of them
(30-50 MB/s), and considerably less disk I/O than previously, though
still not negligible.

I think, however, that both nodes are holding on to any data they
retrieve from the other node. Is this possible? The reason I think
this is the case:
 * I have a fairly large test index on disk, which the Lucene
CacheLoader loads into memory as soon as the cache is created. It's
about a 12GB index, and after a flurry of disk activity when they
processes start, I see about 5-6GB of heap usage on each node -- all
seems good.
 * When I send requests now (with this ClusterCacheLoader
configuration linked below), I see network activity between nodes,
plus some disk I/O.
 * After each query, each node grows in heap usage considerably.
Eventually they'll both be using about 11GB of RAM.
 * At the point where both nodes have lots of data in RAM, the network
I/O has dropped hugely to ~100k/s
 * If I repeat an identical query to either node, the response is
instant - O(10ms)

I don't know if this is because they're lazily loading entries from
disk despite the preload=true setting (and the index just takes up far
more RAM when loaded as a Cache like this?), or if it's because
they're locally caching entries that should (by the consistent hash
and numOwners configuration, at least) only live in the remote node?

Thanks!
James.

[1] https://www.refheap.com/paste/12685
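
The actual configuration is behind [1]; purely as an illustration of the kind of
chain being described at this point (cluster loader plus the disk-backed Lucene
loader on the lucene-index cache, with preload), a sketch in the same assumed
5.x syntax:

  <namedCache name="lucene-index">
    <clustering mode="distribution">
      <hash numOwners="1"/>
    </clustering>
    <loaders preload="true">
      <!-- resolve local misses from peer nodes' memory -->
      <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader"/>
      <!-- fall back to the on-disk Lucene index -->
      <loader class="org.infinispan.lucene.cachestore.LuceneCacheLoader">
        <properties>
          <property name="location" value="/path/to/lucene/index"/>
        </properties>
      </loader>
    </loaders>
  </namedCache>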


On 16 March 2013 01:19, Sanne Grinovero  wrote:
> Hi Adrian,
> let's forget about Lucene details and focus on DIST.
> With numOwners=1 and having two nodes the entries should be stored
> roughly 50% on each node, I see nothing wrong with that
> considering you don't need data failover in a read-only use case
> having all the index available in the shared CacheLoader.
>
> In such a scenario, and having both nodes preloaded all data, in case
> of a get() operation I would expect
> either:
>  A) to be the owner, hence retrieve the value from local in-JVM reference
>  B) to not be the owner, so to forward the request to the other node
> having roughly 50% chance per key to be in case A or B.
>
> But when hitting case B) it seems that instead of loading from the
> other node, it hits the CacheLoader to fetch the value.
>
> I had already asked James to verify with 4 nodes and numOwners=2; the
> result is the same, so I suggested he ask here.
> BTW I think numOwners=1 is perfectly valid and should work just as well as
> numOwners > 1; the only reason I asked him to repeat
> the test is that we don't have many tests covering the numOwners=1 case and
> I was assuming there might be some (wrong) assumptions
> affecting this.
>
> Note that this is not "just" a critical performance problem but I'm
> also suspecting it could provide inconsistent reads, in two classes of
> problems:
>
> # non-shared CacheStore with stale entries
> If for non-owned keys it will hit the local CacheStore first, where
> you might expect to not find anything, so to forward the request to
> the right node. What if this node has been the owner in the past? It
> might have an old entry locally stored, which would be returned
> instead of the correct value which is owned on a different node.
>
> # shared CacheStore using write-behind
> When using an async CacheStore by definition the content of the
> CacheStore is not trustworthy if you don't check on the owner first
> for entries in memory.
>
> Both seem critical to me, but the performance impact is really bad too.
>
> I hoped to make some more tests myself but couldn't look at this yet,
> any help from the core team would be appreciated.
>
> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
> someone else with a CacheLoader issue recently who had worked around
> the problem by using a ClusterCacheLoader ?
> Do you remember what the scenario was?
>
> Cheers,
> Sanne
>
> On 15 March 2013 15:44, Adrian Nistor  wrote:
>> Hi James,
>>
>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>> the lucene-index cache is distributed with numOwners = 1. That means
>> each cache entry is owned by just one cluster node and there's nowhere
>> else to go in the cluster if the key is not available in local memory,
>> thus it needs fetching from the cache store. This can be solved with
>> numOwners > 1.
>> Please let me know if this solves your problem.
>>
>> Cheers!
>>
>> On 03/15/2013 05:03 PM, James Aley wrote:
>>> Hey all,
>>>
>>> 
>>> Seeing as this is my first post, I wanted to just quickly thank you
>>> all for Infinispan. So far I'm really enjoying working with it - great
>>> product!
>>> 
>>>
>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>> We use Lucene directly to build a search product, which has high read
>>> requirements 

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-18 Thread Sanne Grinovero
I'm glad you're finding a workaround for the disk IO, but there should
be no need to use a ClusterCacheLoader;
the intention of that one is to be able to chain multiple grids.
This is a critical problem IMHO.

Seems there are multiple other issues at hand; let me comment per bullet:

On 18 March 2013 12:20, James Aley  wrote:
> Update:
>
> I tried again - I think I misconfigured that ClusterCacheLoader on my
> last attempt. With this configuration [1] it actually appears to be
> loading keys over the network from the peer node. I'm seeing a lot of
> network IO between the nodes when requesting from either one of them
> (30-50 MB/s), and considerably less disk I/O than previously, though
> still not negligible.
>
> I think, however, that both nodes are holding on to any data they
> retrieve from the other node. Is this possible? The reason I think
> this is the case:
>  * I have a fairly large test index on disk, which the Lucene
> CacheLoader loads into memory as soon as the cache is created. It's
> about a 12GB index, and after a flurry of disk activity when they
> processes start, I see about 5-6GB of heap usage on each node -- all
> seems good.

Agree: that phase looks good.

>  * When I send requests now (with this ClusterCacheLoader
> configuration linked below), I see network activity between nodes,
> plus some disk I/O.

I would not expect any more disk I/O to happen at this point.
The only case in which I think a disk event could be triggered is if the
Lucene logic attempted to load a non-existent key, for example a function
attempting either:
 - to check if a file exists (disregarding the directory listing)
 - to load a data range out of the expected boundaries

The reason I think that is that Infinispan is not "caching" null
entries: we plan tombstones for 6.0, but for now if an entry doesn't exist
it won't "remember" that the entry is null and will try to look it up again from
the usual places, including the CacheLoader.

I've opened ISPN-2932 to inspect this and add tests to cover it.

>  * After each query, each node grows in heap usage considerably.
> Eventually they'll both be using about 11GB of RAM.

Permanently even after you close the IndexReader?

I'm afraid this Directory is not suited for you, as it is expected that
you can fit the whole index in a single JVM: the IndexReader might
request (and keep references to) all segments; in most cases it will
work on a subset of segments so it could work for you, but if you
need to iterate it all you might need to use a custom Collector and play
with Infinispan custom commands; I can give you some pointers, as we
have examples of an (experimental) distributed Query execution in the
infinispan query module, or I think we could look into combining
Map/Reduce with index analysis. (In other words: send the computation
to the data rather than downloading half of the index to the local JVM).

But if this is not leaking because of the IndexReader usage, then it's a
leak we need to fix.

>  * At the point where both nodes have lots of data in RAM, the network
> I/O has dropped hugely to ~100k/s

Almost looks like you have L1 enabled? Could you check that?
Or the IndexReader is buffering.

>  * If I repeat an identical query to either node, the response is
> instant - O(10ms)

Well that would be good if only we could know why :)
But this is not an option for you right? I mean you can't load all the
production data in a single JVM?

Sanne
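
On the L1 question above: in distribution mode, L1 keeps a local copy of
remotely-owned entries a node has read, which would be consistent with both the
heap growth and the network traffic tailing off once everything has been read
once. Whether it is enabled depends on the configuration; a fragment to switch
it off explicitly, again sketched against the assumed 5.x schema:

  <clustering mode="distribution">
    <hash numOwners="1"/>
    <!-- L1 keeps local copies of remotely-owned entries after a read; disable it to rule this out -->
    <l1 enabled="false"/>
  </clustering>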


>
> I don't know if this is because they're lazily loading entries from
> disk despite the preload=true setting (and the index just takes up far
> more RAM when loaded as a Cache like this?), or if it's because
> they're locally caching entries that should (by the consistent hash
> and numOwners configuration, at least) only live in the remote node?
>
> Thanks!
> James.
>
> [1] https://www.refheap.com/paste/12685
>
>
> On 16 March 2013 01:19, Sanne Grinovero  wrote:
>> Hi Adrian,
>> let's forget about Lucene details and focus on DIST.
>> With numOwners=1 and having two nodes the entries should be stored
>> roughly 50% on each node, I see nothing wrong with that
>> considering you don't need data failover in a read-only use case
>> having all the index available in the shared CacheLoader.
>>
>> In such a scenario, and having both nodes preloaded all data, in case
>> of a get() operation I would expect
>> either:
>>  A) to be the owner, hence retrieve the value from local in-JVM reference
>>  B) to not be the owner, so to forward the request to the other node
>> having roughly 50% chance per key to be in case A or B.
>>
>> But when hitting case B) it seems that instead of loading from the
>> other node, it hits the CacheLoader to fetch the value.
>>
>> I had already asked James to verify with 4 nodes and numOwners=2; the
>> result is the same, so I suggested he ask here.
>> BTW I think numOwners=1 is perfectly valid and should work just as well
>> as numOwners > 1; the only reason I asked him to repeat
>>

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-18 Thread Ray Tsang
> there should
> be no need to use a ClusterCacheLoader,

I agree. This looked consistent w/ what I saw a couple weeks ago in a
different thread.  Use of ClusterCacheLoader didn't make sense to me
either...

On Mar 18, 2013, at 5:55, Sanne Grinovero  wrote:

> I'm glad you're finding a workaround for the disk IO but there should
> be no need to use a ClusterCacheLoader,
> the intention of that would be to be able to chain multiple grids;
> this is a critical problem IMHO.
>
> Seems there are multiple other issues at hand, let me comment per bullet:
>
> On 18 March 2013 12:20, James Aley  wrote:
>> Update:
>>
>> I tried again - I think I misconfigured that ClusterCacheLoader on my
>> last attempt. With this configuration [1] it actually appears to be
>> loading keys over the network from the peer node. I'm seeing a lot of
>> network IO between the nodes when requesting from either one of them
>> (30-50 MB/s), and considerably less disk I/O than previously, though
>> still not negligible.
>>
>> I think, however, that both nodes are holding on to any data they
>> retrieve from the other node. Is this possible? The reason I think
>> this is the case:
>> * I have a fairly large test index on disk, which the Lucene
>> CacheLoader loads into memory as soon as the cache is created. It's
>> about a 12GB index, and after a flurry of disk activity when they
>> processes start, I see about 5-6GB of heap usage on each node -- all
>> seems good.
>
> Agree: that phase looks good.
>
>> * When I send requests now (with this ClusterCacheLoader
>> configuration linked below), I see network activity between nodes,
>> plus some disk I/O.
>
> I would not expect any more disk I/O to happen anymore at this point.
> Only case I think a disk event could be triggered is if the Lucene logic
> would attempt to load a non-existing key, like if there was a function
> attempting either:
> - to check if a file exists (disregarding the directory listing)
> - to load a data range out of the expected boundaries
>
> The reason for me to think that is that Infinispan is not "caching" null
> entries: we plan tombstones for 6.0 but for now if an entry doesn't exist
> it won't "remember" the entry is null and will try to look it up again from
> the usual places, so including the CacheLoader.
>
> I've opened ISPN-2932 to inspect this and add tests to cover it.
>
>> * After each query, each node grows in heap usage considerably.
>> Eventually they'll both be using about 11GB of RAM.
>
> Permanently even after you close the IndexReader?
>
> I'm thinking I'm afraid this Directory is not suited for you as it is expected
> you can fit the whole index in a single JVM: the IndexReader might
> request (and keep references) to all segments; in most cases it will
> work on a subset of segments so it could work for you but in case you
> need to iterate it all you might need to use a custom Collector and play
> with Infinispan custom commands; I can give you some pointers as we
> have examples of an (experimental) distributed Query execution in the
> infinispan query module, or I think we could play into combining
> Map/Reduce with index analysis. (In other words: send the computation
> to the data rather than downloading half of the index to the local JVM).
>
> But if this is not leaking because of the IndexReader usage, then it's a
> leak we need to fix.
>
>> * At the point where both nodes have lots of data in RAM, the network
>> I/O has dropped hugely to ~100k/s
>
> Almost looks like you have L1 enabled? Could you check that?
> Or the IndexReader is buffering.
>
>> * If I repeat an identical query to either node, the response is
>> instant - O(10ms)
>
> Well that would be good if only we could know why :)
> But this is not an option for you right? I mean you can't load all the
> production data in a single JVM?
>
> Sanne
>
>
>>
>> I don't know if this is because they're lazily loading entries from
>> disk despite the preload=true setting (and the index just takes up far
>> more RAM when loaded as a Cache like this?), or if it's because
>> they're locally caching entries that should (by the consistent hash
>> and numOwners configuration, at least) only live in the remote node?
>>
>> Thanks!
>> James.
>>
>> [1] https://www.refheap.com/paste/12685
>>
>>
>> On 16 March 2013 01:19, Sanne Grinovero  wrote:
>>> Hi Adrian,
>>> let's forget about Lucene details and focus on DIST.
>>> With numOwners=1 and having two nodes the entries should be stored
>>> roughly 50% on each node, I see nothing wrong with that
>>> considering you don't need data failover in a read-only use case
>>> having all the index available in the shared CacheLoader.
>>>
>>> In such a scenario, and having both nodes preloaded all data, in case
>>> of a get() operation I would expect
>>> either:
>>> A) to be the owner, hence retrieve the value from local in-JVM reference
>>> B) to not be the owner, so to forward the request to the other node
>>> having roughly 50% chance per key to be in case A

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread James Aley
Hi all,

So, in my previous update it seems I had numOwners=2, but was only
using two servers. Therefore, what I was seeing made complete sense,
actually. After changing numOwners to 1, distribution appears to work
as expected with that clusterLoader added to the config as suggested.
Thanks for the help!

I'm now having other issues, seeing way more network traffic than I
can really explain, but that's another topic, which I need to
investigate more. Just wanted to let you know that I think we got to
the bottom of this one!


Thanks!
James.

On 18 March 2013 15:52, Ray Tsang  wrote:
>> there should
>> be no need to use a ClusterCacheLoader,
>
> I agree. This looked consistent w/ what I saw a couple weeks ago in a
> different thread.  Use of ClusterCacheLoader didn't make sense to me
> either...
>
> On Mar 18, 2013, at 5:55, Sanne Grinovero  wrote:
>
>> I'm glad you're finding a workaround for the disk IO but there should
>> be no need to use a ClusterCacheLoader,
>> the intention of that would be to be able to chain multiple grids;
>> this is a critical problem IMHO.
>>
>> Seems there are multiple other issues at hand, let me comment per bullet:
>>
>> On 18 March 2013 12:20, James Aley  wrote:
>>> Update:
>>>
>>> I tried again - I think I misconfigured that ClusterCacheLoader on my
>>> last attempt. With this configuration [1] it actually appears to be
>>> loading keys over the network from the peer node. I'm seeing a lot of
>>> network IO between the nodes when requesting from either one of them
>>> (30-50 MB/s), and considerably less disk I/O than previously, though
>>> still not negligible.
>>>
>>> I think, however, that both nodes are holding on to any data they
>>> retrieve from the other node. Is this possible? The reason I think
>>> this is the case:
>>> * I have a fairly large test index on disk, which the Lucene
>>> CacheLoader loads into memory as soon as the cache is created. It's
>>> about a 12GB index, and after a flurry of disk activity when they
>>> processes start, I see about 5-6GB of heap usage on each node -- all
>>> seems good.
>>
>> Agree: that phase looks good.
>>
>>> * When I send requests now (with this ClusterCacheLoader
>>> configuration linked below), I see network activity between nodes,
>>> plus some disk I/O.
>>
>> I would not expect any more disk I/O to happen anymore at this point.
>> Only case I think a disk event could be triggered is if the Lucene logic
>> would attempt to load a non-existing key, like if there was a function
>> attempting either:
>> - to check if a file exists (disregarding the directory listing)
>> - to load a data range out of the expected boundaries
>>
>> The reason for me to think that is that Infinispan is not "caching" null
>> entries: we plan tombstones for 6.0 but for now if an entry doesn't exist
>> it won't "remember" the entry is null and will try to look it up again from
>> the usual places, so including the CacheLoader.
>>
>> I've opened ISPN-2932 to inspect this and add tests to cover it.
>>
>>> * After each query, each node grows in heap usage considerably.
>>> Eventually they'll both be using about 11GB of RAM.
>>
>> Permanently even after you close the IndexReader?
>>
>> I'm thinking I'm afraid this Directory is not suited for you as it is 
>> expected
>> you can fit the whole index in a single JVM: the IndexReader might
>> request (and keep references) to all segments; in most cases it will
>> work on a subset of segments so it could work for you but in case you
>> need to iterate it all you might need to use a custom Collector and play
>> with Infinispan custom commands; I can give you some pointers as we
>> have examples of an (experimental) distributed Query execution in the
>> infinispan query module, or I think we could play into combining
>> Map/Reduce with index analysis. (In other words: send the computation
>> to the data rather than downloading half of the index to the local JVM).
>>
>> But if this is not leaking because of the IndexReader usage, then it's a
>> leak we need to fix.
>>
>>> * At the point where both nodes have lots of data in RAM, the network
>>> I/O has dropped hugely to ~100k/s
>>
>> Almost looks like you have L1 enabled? Could you check that?
>> Or the IndexReader is buffering.
>>
>>> * If I repeat an identical query to either node, the response is
>>> instant - O(10ms)
>>
>> Well that would be good if only we could know why :)
>> But this is not an option for you right? I mean you can't load all the
>> production data in a single JVM?
>>
>> Sanne
>>
>>
>>>
>>> I don't know if this is because they're lazily loading entries from
>>> disk despite the preload=true setting (and the index just takes up far
>>> more RAM when loaded as a Cache like this?), or if it's because
>>> they're locally caching entries that should (by the consistent hash
>>> and numOwners configuration, at least) only live in the remote node?
>>>
>>> Thanks!
>>> James.
>>>
>>> [1] https://www.refheap.com/paste/12685
>>>
>>>
>>> On 16 March 

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Mircea Markus
Hi James,

By specifying the LuceneCacheLoader as a loader for the default cache, it will be 
added to both the "lucene-index" cache (where it is needed) and the other two caches 
(lucene-metadata and lucene-locks) - where I don't think it is needed. I think 
it should only be configured for the "lucene-index" cache and removed from the 
default config.

On top of that you might want to add the cluster cache loader *before* the 
LuceneCacheLoader, otherwise it will always be the LuceneCacheLoader that would 
be queried first. The config I have in mind is [1]; would you mind giving it a 
try?

[1] https://gist.github.com/mmarkus/5195400
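
Mircea's actual config is in the gist above; as a rough outline of the two points
(keep the LuceneCacheLoader out of the default cache definition, and declare the
cluster loader first so it is queried before the disk-backed loader), using the
same assumed 5.x elements:

  <default>
    <!-- no LuceneCacheLoader here: only the lucene-index cache needs it -->
  </default>

  <namedCache name="lucene-index">
    <loaders>
      <!-- declared first, so queried first: fetch misses from the owning node's memory -->
      <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader"/>
      <!-- only consulted when the key is not in memory anywhere in the cluster -->
      <loader class="org.infinispan.lucene.cachestore.LuceneCacheLoader"/>
    </loaders>
  </namedCache>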


On 15 Mar 2013, at 16:22, James Aley wrote:

> Not sure if I've done exactly what you had in mind... here is my updated XML:
> https://www.refheap.com/paste/12601
> 
> I added the loader to the lucene-index namedCache, which is the one
> I'm using for distribution.
> 
> This didn't appear to change anything, as far as I can see. Still
> seeing a lot of disk IO with every request.
> 
> 
> James.
> 
> 
> On 15 March 2013 15:54, Ray Tsang  wrote:
>> Can you try adding a ClusterCacheLoader to see if that helps?
>> 
>> Thanks,
>> 
>> 
>> On Fri, Mar 15, 2013 at 8:49 AM, James Aley  wrote:
>>> 
>>> Apologies - forgot to copy list.
>>> 
>>> On 15 March 2013 15:48, James Aley  wrote:
 Hey Adrian,
 
 Thanks for the response. I was chatting to Sanne on IRC yesterday, and
 he suggested this to me. Actually the logging I attached was from a
 cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
 this actually, but I thought seeing as it didn't appear to make any
 difference that I'd just keep things simple in my previous email.
 
 While it seemed not to make a difference in this case... I can see why
 that would make sense. In future tests I guess I should probably stick
 with numOwners > 1.
 
 
 James.
 
 On 15 March 2013 15:44, Adrian Nistor  wrote:
> Hi James,
> 
> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
> the
> lucene-index cache is distributed with numOwners = 1. That means each
> cache
> entry is owned by just one cluster node and there's nowhere else to go
> in
> the cluster if the key is not available in local memory, thus it needs
> fetching from the cache store. This can be solved with numOwners > 1.
> Please let me know if this solves your problem.
> 
> Cheers!
> 
> 
> On 03/15/2013 05:03 PM, James Aley wrote:
>> 
>> Hey all,
>> 
>> 
>> Seeing as this is my first post, I wanted to just quickly thank you
>> all for Infinispan. So far I'm really enjoying working with it - great
>> product!
>> 
>> 
>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>> We use Lucene directly to build a search product, which has high read
>> requirements and likely very large indexes. I'm hoping to make use of
>> a distribution mode cache to keep the whole index in memory across a
>> cluster of machines (the index will be too big for one server).
>> 
>> The problem I'm having is that after loading a filesystem-based Lucene
>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>> retrieving data from the cluster - they instead look up keys in their
>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>> I was hoping to just use the CacheLoader to initialize the caches, but
>> from there on read only from RAM (and network, of course). Is this
>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>> 
>> To explain my observations in a little more detail:
>> * I start a cluster of two servers, using [1] as the cache config.
>> Both have a local copy of the Lucene index that will be loaded into
>> the InfinispanDirectory via the loader. This is a test configuration,
>> where I've set numOwners=1 so that I only need two servers for
>> distribution to happen.
>> * Upon startup, things look good. I see the memory usage of the JVM
>> reflect a pretty near 50/50 split of the data across both servers.
>> Logging indicates both servers are in the cluster view, all seems
>> fine.
>> * When I send a search query to either one of the nodes, I notice the
>> following:
>>   - iotop shows huge (~100MB/s) disk I/O on that node alone from the
>> JVM process.
>>   - no change in network activity between nodes (~300b/s, same as
>> when
>> idle)
>>   - memory usage on the node running the query increases
>> dramatically,
>> and stays higher even after the query is finished.
>> 
>> So it seemed to me like each node was favouring use of the CacheLoader
>> to retrieve keys that are not in memory, instead of using the cluster.
>> Does that seem reasonable? Is this the expected behaviour?
>> 
>>>

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread James Aley
Hi,

Thanks for the tips, but I think there will be a couple of issues:

* By mistake, I actually tried creating the lucene-metadata cache
without the loader to start with, and the Directory is unusable
without it, as it isn't able to list the index files when Lucene's
IndexReader asks for them. So, I'm pretty sure the metadata cache
needs to see the loader - maybe Sanne can confirm?

* Having the clusterLoader with shared=true and/or preload=true will
cause "unknown responses" exceptions when the cluster rebalances. It's
documented in ClusterCacheLoader that this is unsupported. That's why
I had to create two  elements - as I want those attributes
set for the lucene-index cache.

* The lucene-locks cache doesn't need to see the loader, but it seems
that having it available causes no harm, as this cache just doesn't
find any relevant keys in the loader when initialised.

With the config I linked previously, it seems distribution mode is
actually working OK. It only appeared not to be working because I had as many
owners as nodes; now things are working as expected, but for performance
tuning!


Thanks,
James.
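
As a sketch of the first bullet above - the lucene-metadata cache also needs to
see the on-disk index, otherwise the Directory cannot list the index files -
again using the assumed 5.x elements, with this cache's clustering settings
omitted since they are not described here:

  <namedCache name="lucene-metadata">
    <loaders preload="true">
      <!-- gives the metadata cache visibility of the file listing in the on-disk index -->
      <loader class="org.infinispan.lucene.cachestore.LuceneCacheLoader">
        <properties>
          <property name="location" value="/path/to/lucene/index"/>
        </properties>
      </loader>
    </loaders>
  </namedCache>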

On 19 March 2013 11:32, Mircea Markus  wrote:
> Hi James,
>
> By specifying the LuceneCacheLoader as a loader for the default cache, it 
> will be added to both the "lucene-index" cache (where it is needed) and the other two 
> caches (lucene-metadata and lucene-locks) - where I don't think it is needed. 
> I think it should only be configured for the "lucene-index" cache and removed 
> from the default config.
>
> On top of that you might want to add the cluster cache loader *before* the 
> LuceneCacheLoader, otherwise it will always be the LuceneCacheLoader that 
> gets queried first. The config I have in mind is [1]; would you mind 
> giving it a try?
>
> [1] https://gist.github.com/mmarkus/5195400
>
>
> On 15 Mar 2013, at 16:22, James Aley wrote:
>
>> Not sure if I've done exactly what you had in mind... here is my updated XML:
>> https://www.refheap.com/paste/12601
>>
>> I added the loader to the lucene-index namedCache, which is the one
>> I'm using for distribution.
>>
>> This didn't appear to change anything, as far as I can see. Still
>> seeing a lot of disk IO with every request.
>>
>>
>> James.
>>
>>
>> On 15 March 2013 15:54, Ray Tsang  wrote:
>>> Can you try adding a ClusterCacheLoader to see if that helps?
>>>
>>> Thanks,
>>>
>>>
>>> On Fri, Mar 15, 2013 at 8:49 AM, James Aley  wrote:

 Apologies - forgot to copy list.

 On 15 March 2013 15:48, James Aley  wrote:
> Hey Adrian,
>
> Thanks for the response. I was chatting to Sanne on IRC yesterday, and
> he suggested this to me. Actually the logging I attached was from a
> cluster of 4 servers with numOwners=2. Sorry, I should have mentioned
> this actually, but I thought seeing as it didn't appear to make any
> difference that I'd just keep things simple in my previous email.
>
> While it seemed not to make a difference in this case... I can see why
> that would make sense. In future tests I guess I should probably stick
> with numOwners > 1.
>
>
> James.
>
> On 15 March 2013 15:44, Adrian Nistor  wrote:
>> Hi James,
>>
>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>> the
>> lucene-index cache is distributed with numOwners = 1. That means each
>> cache
>> entry is owned by just one cluster node and there's nowhere else to go
>> in
>> the cluster if the key is not available in local memory, thus it needs
>> fetching from the cache store. This can be solved with numOwners > 1.
>> Please let me know if this solves your problem.
>>
>> Cheers!
>>
>>
>> On 03/15/2013 05:03 PM, James Aley wrote:
>>>
>>> Hey all,
>>>
>>> 
>>> Seeing as this is my first post, I wanted to just quickly thank you
>>> all for Infinispan. So far I'm really enjoying working with it - great
>>> product!
>>> 
>>>
>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>> We use Lucene directly to build a search product, which has high read
>>> requirements and likely very large indexes. I'm hoping to make use of
>>> a distribution mode cache to keep the whole index in memory across a
>>> cluster of machines (the index will be too big for one server).
>>>
>>> The problem I'm having is that after loading a filesystem-based Lucene
>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>> retrieving data from the cluster - they instead look up keys in their
>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>> from there on read only from RAM (and network, of course). Is this
>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>>
>>> To explain my observations in a little m

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Mircea Markus

On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:

> Hi Adrian,
> let's forget about Lucene details and focus on DIST.
> With numOwners=1 and having two nodes the entries should be stored
> roughly 50% on each node, I see nothing wrong with that
> considering you don't need data failover in a read-only use case
> having all the index available in the shared CacheLoader.
> 
> In such a scenario, and having both nodes preloaded all data, in case
> of a get() operation I would expect
> either:
> A) to be the owner, hence retrieve the value from local in-JVM reference
> B) to not be the owner, so to forward the request to the other node
> having roughly 50% chance per key to be in case A or B.
> 
> But when hitting case B) it seems that instead of loading from the
> other node, it hits the CacheLoader to fetch the value.
> 
> I already had asked James to verify with 4 nodes and numOwners=2, the
> result is the same so I suggested him to ask here;
> BTW I think numOwners=1 is perfectly valid and should work as with
> numOwners=1, the only reason I asked him to repeat
> the test is that we don't have much tests on the numOwners=1 case and
> I was assuming there might be some (wrong) assumptions
> affecting this.
> 
> Note that this is not "just" a critical performance problem but I'm
> also suspecting it could provide inconsistent reads, in two classes of
> problems:
> 
> # non-shared CacheStore with stale entries
> If for non-owned keys it will hit the local CacheStore first, where
> you might expect to not find anything, so to forward the request to
> the right node. What if this node has been the owner in the past? It
> might have an old entry locally stored, which would be returned
> instead of the correct value which is owned on a different node.
> 
> # shared CacheStore using write-behind
> When using an async CacheStore by definition the content of the
> CacheStore is not trustworthy if you don't check on the owner first
> for entries in memory.
> 
> Both seem critical to me, but the performance impact is really bad too.
> 
> I hoped to make some more tests myself but couldn't look at this yet,
> any help from the core team would be appreciated.
I think you have a fair point and reads/writes to the data should be 
coordinated through its owners both for performance and (more importantly) 
correctness.
Mind creating a JIRA for this?

> 
> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
> someone else with a CacheLoader issue recently who had worked around
> the problem by using a ClusterCacheLoader ?
> Do you remember what the scenario was?
> 
> Cheers,
> Sanne
> 
> On 15 March 2013 15:44, Adrian Nistor  wrote:
>> Hi James,
>> 
>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>> the lucene-index cache is distributed with numOwners = 1. That means
>> each cache entry is owned by just one cluster node and there's nowhere
>> else to go in the cluster if the key is not available in local memory,
>> thus it needs fetching from the cache store. This can be solved with
>> numOwners > 1.
>> Please let me know if this solves your problem.
>> 
>> Cheers!
>> 
>> On 03/15/2013 05:03 PM, James Aley wrote:
>>> Hey all,
>>> 
>>> 
>>> Seeing as this is my first post, I wanted to just quickly thank you
>>> all for Infinispan. So far I'm really enjoying working with it - great
>>> product!
>>> 
>>> 
>>> I'm using the InfinispanDirectory for a Lucene project at the moment.
>>> We use Lucene directly to build a search product, which has high read
>>> requirements and likely very large indexes. I'm hoping to make use of
>>> a distribution mode cache to keep the whole index in memory across a
>>> cluster of machines (the index will be too big for one server).
>>> 
>>> The problem I'm having is that after loading a filesystem-based Lucene
>>> directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
>>> retrieving data from the cluster - they instead look up keys in their
>>> local CacheLoaders, which involves lots of disk I/O and is very slow.
>>> I was hoping to just use the CacheLoader to initialize the caches, but
>>> from there on read only from RAM (and network, of course). Is this
>>> supported? Maybe I've misunderstood the purpose of the CacheLoader?
>>> 
>>> To explain my observations in a little more detail:
>>> * I start a cluster of two servers, using [1] as the cache config.
>>> Both have a local copy of the Lucene index that will be loaded into
>>> the InfinispanDirectory via the loader. This is a test configuration,
>>> where I've set numOwners=1 so that I only need two servers for
>>> distribution to happen.
>>> * Upon startup, things look good. I see the memory usage of the JVM
>>> reflect a pretty near 50/50 split of the data across both servers.
>>> Logging indicates both servers are in the cluster view, all seems
>>> fine.
>>> * When I send a search query to either one of the nodes, I notice the 
>>> following:
>>>   - iotop shows huge (~100MB/s) disk I/O 

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Sanne Grinovero
Mircea,
what I was most looking forward to was your comment on the interceptor
order generated for DIST+cachestores:
 - we don't think the ClusteredCacheLoader should be needed at all
 - each DIST node is loading from the CacheLoader (any) rather than
loading from its peer nodes for non-owned entries (!!)

This has come up on several threads now and I think it's critically
wrong; as I commented previously, this also introduces many
inconsistencies - as far as I understand it.

BTW your gist wouldn't work: the metadata cache needs to load certain
elements too. But it's nice you spotted the need to potentially filter what
"preload" means in the scope of each cache, as the metadata cache should
only preload metadata, while in the original configuration this data
would indeed be duplicated.
Opened: https://issues.jboss.org/browse/ISPN-2938

Sanne

On 19 March 2013 11:51, Mircea Markus  wrote:
>
> On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
>
>> Hi Adrian,
>> let's forget about Lucene details and focus on DIST.
>> With numOwners=1 and having two nodes the entries should be stored
>> roughly 50% on each node, I see nothing wrong with that
>> considering you don't need data failover in a read-only use case
>> having all the index available in the shared CacheLoader.
>>
>> In such a scenario, and having both nodes preloaded all data, in case
>> of a get() operation I would expect
>> either:
>> A) to be the owner, hence retrieve the value from local in-JVM reference
>> B) to not be the owner, so to forward the request to the other node
>> having roughly 50% chance per key to be in case A or B.
>>
>> But when hitting case B) it seems that instead of loading from the
>> other node, it hits the CacheLoader to fetch the value.
>>
>> I already had asked James to verify with 4 nodes and numOwners=2, the
>> result is the same so I suggested him to ask here;
>> BTW I think numOwners=1 is perfectly valid and should work as with
>> numOwners=1, the only reason I asked him to repeat
>> the test is that we don't have much tests on the numOwners=1 case and
>> I was assuming there might be some (wrong) assumptions
>> affecting this.
>>
>> Note that this is not "just" a critical performance problem but I'm
>> also suspecting it could provide inconsistent reads, in two classes of
>> problems:
>>
>> # non-shared CacheStore with stale entries
>> If for non-owned keys it will hit the local CacheStore first, where
>> you might expect to not find anything, so to forward the request to
>> the right node. What if this node has been the owner in the past? It
>> might have an old entry locally stored, which would be returned
>> instead of the correct value which is owned on a different node.
>>
>> # shared CacheStore using write-behind
>> When using an async CacheStore by definition the content of the
>> CacheStore is not trustworthy if you don't check on the owner first
>> for entries in memory.
>>
>> Both seem critical to me, but the performance impact is really bad too.
>>
>> I hoped to make some more tests myself but couldn't look at this yet,
>> any help from the core team would be appreciated.
> I think you have a fair point and reads/writes to the data should be 
> coordinated through its owners both for performance and (more importantly) 
> correctness.
> Mind creating a JIRA for this?
>
>>
>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>> someone else with a CacheLoader issue recently who had worked around
>> the problem by using a ClusterCacheLoader ?
>> Do you remember what the scenario was?
>>
>> Cheers,
>> Sanne
>>
>> On 15 March 2013 15:44, Adrian Nistor  wrote:
>>> Hi James,
>>>
>>> I'm not an expert on InfinispanDirectory but I've noticed in [1] that
>>> the lucene-index cache is distributed with numOwners = 1. That means
>>> each cache entry is owned by just one cluster node and there's nowhere
>>> else to go in the cluster if the key is not available in local memory,
>>> thus it needs fetching from the cache store. This can be solved with
>>> numOwners > 1.
>>> Please let me know if this solves your problem.
>>>
>>> Cheers!
>>>
>>> On 03/15/2013 05:03 PM, James Aley wrote:
 Hey all,

 
 Seeing as this is my first post, I wanted to just quickly thank you
 all for Infinispan. So far I'm really enjoying working with it - great
 product!
 

 I'm using the InfinispanDirectory for a Lucene project at the moment.
 We use Lucene directly to build a search product, which has high read
 requirements and likely very large indexes. I'm hoping to make use of
 a distribution mode cache to keep the whole index in memory across a
 cluster of machines (the index will be too big for one server).

 The problem I'm having is that after loading a filesystem-based Lucene
 directory into InfinispanDirectory via LuceneCacheLoader, no nodes are
 retrieving data from the cluster - they instead look up keys in their
 local CacheLoaders, which involves lots of di

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Sanne Grinovero
James, to work around ISPN-2938
you could use preload="true" on the "lucene-index" cache's loader
configuration, and preload="false" on the "lucene-metadata" one.
Not particularly critical, but it would save you a bunch of memory.
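
Roughly this, sketching only the relevant attribute (a sketch, not your
exact file - everything else stays as in your current config):

  <namedCache name="lucene-index">
    <loaders preload="true">
      <!-- LuceneCacheLoader config as before -->
    </loaders>
  </namedCache>

  <namedCache name="lucene-metadata">
    <loaders preload="false">
      <!-- LuceneCacheLoader config as before -->
    </loaders>
  </namedCache>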

Sanne

On 19 March 2013 14:12, Sanne Grinovero  wrote:
> Mircea,
> what I was most looking forward was to you comment on the interceptor
> order generated for DIST+cachestores
>  - we don't think the ClusteredCacheLoader should be needed at all
>  - each DIST node is loading from the CacheLoader (any) rather than
> loading from its peer nodes for non-owned entries (!!)
>
> This has come up on several threads now and I think it's critically
> wrong, as I commented previously this also introduces many
> inconsistencies - as far as I understand it.
>
> BTW your gist wouldn't work, the metadata cache needs to load certain
> elements too. But nice you spotted the need to potentially filter what
> "preload" means in the scope of each cache, as the metadata one should
> only preload metadata, while in the original configuration this data
> would indeed be duplicated.
> Opened: https://issues.jboss.org/browse/ISPN-2938
>
> Sanne
>
> On 19 March 2013 11:51, Mircea Markus  wrote:
>>
>> On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
>>
>>> Hi Adrian,
>>> let's forget about Lucene details and focus on DIST.
>>> With numOwners=1 and having two nodes the entries should be stored
>>> roughly 50% on each node, I see nothing wrong with that
>>> considering you don't need data failover in a read-only use case
>>> having all the index available in the shared CacheLoader.
>>>
>>> In such a scenario, and having both nodes preloaded all data, in case
>>> of a get() operation I would expect
>>> either:
>>> A) to be the owner, hence retrieve the value from local in-JVM reference
>>> B) to not be the owner, so to forward the request to the other node
>>> having roughly 50% chance per key to be in case A or B.
>>>
>>> But when hitting case B) it seems that instead of loading from the
>>> other node, it hits the CacheLoader to fetch the value.
>>>
>>> I already had asked James to verify with 4 nodes and numOwners=2, the
>>> result is the same so I suggested him to ask here;
>>> BTW I think numOwners=1 is perfectly valid and should work as with
>>> numOwners=1, the only reason I asked him to repeat
>>> the test is that we don't have much tests on the numOwners=1 case and
>>> I was assuming there might be some (wrong) assumptions
>>> affecting this.
>>>
>>> Note that this is not "just" a critical performance problem but I'm
>>> also suspecting it could provide inconsistent reads, in two classes of
>>> problems:
>>>
>>> # non-shared CacheStore with stale entries
>>> If for non-owned keys it will hit the local CacheStore first, where
>>> you might expect to not find anything, so to forward the request to
>>> the right node. What if this node has been the owner in the past? It
>>> might have an old entry locally stored, which would be returned
>>> instead of the correct value which is owned on a different node.
>>>
>>> # shared CacheStore using write-behind
>>> When using an async CacheStore by definition the content of the
>>> CacheStore is not trustworthy if you don't check on the owner first
>>> for entries in memory.
>>>
>>> Both seem critical to me, but the performance impact is really bad too.
>>>
>>> I hoped to make some more tests myself but couldn't look at this yet,
>>> any help from the core team would be appreciated.
>> I think you have a fair point and reads/writes to the data should be 
>> coordinated through its owners both for performance and (more importantly) 
>> correctness.
>> Mind creating a JIRA for this?
>>
>>>
>>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>>> someone else with a CacheLoader issue recently who had worked around
>>> the problem by using a ClusterCacheLoader ?
>>> Do you remember what the scenario was?
>>>
>>> Cheers,
>>> Sanne
>>>
>>> On 15 March 2013 15:44, Adrian Nistor  wrote:
 Hi James,

 I'm not an expert on InfinispanDirectory but I've noticed in [1] that
 the lucene-index cache is distributed with numOwners = 1. That means
 each cache entry is owned by just one cluster node and there's nowhere
 else to go in the cluster if the key is not available in local memory,
 thus it needs fetching from the cache store. This can be solved with
 numOwners > 1.
 Please let me know if this solves your problem.

 Cheers!

 On 03/15/2013 05:03 PM, James Aley wrote:
> Hey all,
>
> 
> Seeing as this is my first post, I wanted to just quickly thank you
> all for Infinispan. So far I'm really enjoying working with it - great
> product!
> 
>
> I'm using the InfinispanDirectory for a Lucene project at the moment.
> We use Lucene directly to build a search product, which has high read
> requirements and likely very large indexes. I'm hoping to make use of
> a distribution m

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Dan Berindei
Hi Sanne

On Tue, Mar 19, 2013 at 4:12 PM, Sanne Grinovero wrote:

> Mircea,
> what I was most looking forward was to you comment on the interceptor
> order generated for DIST+cachestores
>  - we don't think the ClusteredCacheLoader should be needed at all
>

Agree, ClusteredCacheLoader should not be necessary.

James, if you're still seeing problems with numOwners=1, could you create
an issue in JIRA?



>  - each DIST node is loading from the CacheLoader (any) rather than
> loading from its peer nodes for non-owned entries (!!)
>
>
Sometimes loading stuff from a local disk is faster than going remote, e.g.
if you have numOwners=2 and both owners have to load the same entry from
disk and send it to the originator twice.

Still, most of the time the entry is going to be in memory on the owner
nodes, so the local load is slower (especially with a shared cache store,
where loading is over the network as well).



> This has come up on several threads now and I think it's critically
> wrong, as I commented previously this also introduces many
> inconsistencies - as far as I understand it.
>
>
Is there a JIRA for this already?

Yes, loading a stale entry from the local cache store is definitely not a
good thing, but we actually delete the non-owned entries after the initial
state transfer. There may be some consistency issues if one uses a
DIST_SYNC cache with a shared async cache store, but fully sync
configurations should be fine.

OTOH, if the cache store is not shared, the chances of finding the entry in
the local store on a non-owner are slim to none, so it doesn't make sense
to do the lookup.

Implementation-wise, just changing the interceptor order is probably not
enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor
will still try to load it from the cache store after the remote lookup, so
we'll need a marker in the invocation context to avoid the extra cache
store load. Actually, since this is just a performance issue, it could wait
until we implement tombstones everywhere.



> BTW your gist wouldn't work, the metadata cache needs to load certain
> elements too. But nice you spotted the need to potentially filter what
> "preload" means in the scope of each cache, as the metadata one should
> only preload metadata, while in the original configuration this data
> would indeed be duplicated.
> Opened: https://issues.jboss.org/browse/ISPN-2938
>
> Sanne
>
> On 19 March 2013 11:51, Mircea Markus  wrote:
> >
> > On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
> >
> >> Hi Adrian,
> >> let's forget about Lucene details and focus on DIST.
> >> With numOwners=1 and having two nodes the entries should be stored
> >> roughly 50% on each node, I see nothing wrong with that
> >> considering you don't need data failover in a read-only use case
> >> having all the index available in the shared CacheLoader.
> >>
> >> In such a scenario, and having both nodes preloaded all data, in case
> >> of a get() operation I would expect
> >> either:
> >> A) to be the owner, hence retrieve the value from local in-JVM reference
> >> B) to not be the owner, so to forward the request to the other node
> >> having roughly 50% chance per key to be in case A or B.
> >>
> >> But when hitting case B) it seems that instead of loading from the
> >> other node, it hits the CacheLoader to fetch the value.
> >>
> >> I already had asked James to verify with 4 nodes and numOwners=2, the
> >> result is the same so I suggested him to ask here;
> >> BTW I think numOwners=1 is perfectly valid and should work as with
> >> numOwners=1, the only reason I asked him to repeat
> >> the test is that we don't have much tests on the numOwners=1 case and
> >> I was assuming there might be some (wrong) assumptions
> >> affecting this.
> >>
> >> Note that this is not "just" a critical performance problem but I'm
> >> also suspecting it could provide inconsistent reads, in two classes of
> >> problems:
> >>
> >> # non-shared CacheStore with stale entries
> >> If for non-owned keys it will hit the local CacheStore first, where
> >> you might expect to not find anything, so to forward the request to
> >> the right node. What if this node has been the owner in the past? It
> >> might have an old entry locally stored, which would be returned
> >> instead of the correct value which is owned on a different node.
> >>
> >> # shared CacheStore using write-behind
> >> When using an async CacheStore by definition the content of the
> >> CacheStore is not trustworthy if you don't check on the owner first
> >> for entries in memory.
> >>
> >> Both seem critical to me, but the performance impact is really bad too.
> >>
> >> I hoped to make some more tests myself but couldn't look at this yet,
> >> any help from the core team would be appreciated.
> > I think you have a fair point and reads/writes to the data should be
> coordinated through its owners both for performance and (more importantly)
> correctness.
> > Mind creating a JIRA for this?
> >
> >>

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Mircea Markus

On 19 Mar 2013, at 14:12, Sanne Grinovero wrote:

> Mircea,
> what I was most looking forward was to you comment on the interceptor
> order generated for DIST+cachestores
> - we don't think the ClusteredCacheLoader should be needed at all
> - each DIST node is loading from the CacheLoader (any) rather than
> loading from its peer nodes for non-owned entries (!!)
My intention was to comment precisely on these matters, sorry if the comment 
wasn't clear enough :-)
I think you have a fair point that the data should only be loaded from the cache 
store through the main owners: that's for both correctness and performance reasons.
Naturally the ClusterCacheLoader would then be deprecated. Would you mind 
creating a JIRA for this?

> 
> This has come up on several threads now and I think it's critically
> wrong, as I commented previously this also introduces many
> inconsistencies - as far as I understand it.
> 
> BTW your gist wouldn't work, the metadata cache needs to load certain
> elements too. But nice you spotted the need to potentially filter what
> "preload" means in the scope of each cache, as the metadata one should
> only preload metadata, while in the original configuration this data
> would indeed be duplicated.
> Opened: https://issues.jboss.org/browse/ISPN-2938
> 
> Sanne
> 
> On 19 March 2013 11:51, Mircea Markus  wrote:
>> 
>> On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
>> 
>>> Hi Adrian,
>>> let's forget about Lucene details and focus on DIST.
>>> With numOwners=1 and having two nodes the entries should be stored
>>> roughly 50% on each node, I see nothing wrong with that
>>> considering you don't need data failover in a read-only use case
>>> having all the index available in the shared CacheLoader.
>>> 
>>> In such a scenario, and having both nodes preloaded all data, in case
>>> of a get() operation I would expect
>>> either:
>>> A) to be the owner, hence retrieve the value from local in-JVM reference
>>> B) to not be the owner, so to forward the request to the other node
>>> having roughly 50% chance per key to be in case A or B.
>>> 
>>> But when hitting case B) it seems that instead of loading from the
>>> other node, it hits the CacheLoader to fetch the value.
>>> 
>>> I already had asked James to verify with 4 nodes and numOwners=2, the
>>> result is the same so I suggested him to ask here;
>>> BTW I think numOwners=1 is perfectly valid and should work as with
>>> numOwners=1, the only reason I asked him to repeat
>>> the test is that we don't have much tests on the numOwners=1 case and
>>> I was assuming there might be some (wrong) assumptions
>>> affecting this.
>>> 
>>> Note that this is not "just" a critical performance problem but I'm
>>> also suspecting it could provide inconsistent reads, in two classes of
>>> problems:
>>> 
>>> # non-shared CacheStore with stale entries
>>> If for non-owned keys it will hit the local CacheStore first, where
>>> you might expect to not find anything, so to forward the request to
>>> the right node. What if this node has been the owner in the past? It
>>> might have an old entry locally stored, which would be returned
>>> instead of the correct value which is owned on a different node.
>>> 
>>> # shared CacheStore using write-behind
>>> When using an async CacheStore by definition the content of the
>>> CacheStore is not trustworthy if you don't check on the owner first
>>> for entries in memory.
>>> 
>>> Both seem critical to me, but the performance impact is really bad too.
>>> 
>>> I hoped to make some more tests myself but couldn't look at this yet,
>>> any help from the core team would be appreciated.
>> I think you have a fair point and reads/writes to the data should be 
>> coordinated through its owners both for performance and (more importantly) 
>> correctness.
>> Mind creating a JIRA for this?
>> 
>>> 
>>> @Ray, thanks for mentioning the ClusterCacheLoader. Wasn't there
>>> someone else with a CacheLoader issue recently who had worked around
>>> the problem by using a ClusterCacheLoader ?
>>> Do you remember what the scenario was?
>>> 
>>> Cheers,
>>> Sanne
>>> 
>>> On 15 March 2013 15:44, Adrian Nistor  wrote:
 Hi James,
 
 I'm not an expert on InfinispanDirectory but I've noticed in [1] that
 the lucene-index cache is distributed with numOwners = 1. That means
 each cache entry is owned by just one cluster node and there's nowhere
 else to go in the cluster if the key is not available in local memory,
 thus it needs fetching from the cache store. This can be solved with
 numOwners > 1.
 Please let me know if this solves your problem.
 
 Cheers!
 
 On 03/15/2013 05:03 PM, James Aley wrote:
> Hey all,
> 
> 
> Seeing as this is my first post, I wanted to just quickly thank you
> all for Infinispan. So far I'm really enjoying working with it - great
> product!
> 
> 
> I'm using the InfinispanDirectory for a Lucene project at the moment.
> We use Lucen

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Mircea Markus

On 19 Mar 2013, at 16:15, Dan Berindei wrote:

> Hi Sanne
> 
> On Tue, Mar 19, 2013 at 4:12 PM, Sanne Grinovero  wrote:
> Mircea,
> what I was most looking forward was to you comment on the interceptor
> order generated for DIST+cachestores
>  - we don't think the ClusteredCacheLoader should be needed at all
> 
> Agree, ClusteredCacheLoader should not be necessary.
> 
> James, if you're still seeing problems with numOwners=1, could you create an 
> issue in JIRA?
> 
>  
>  - each DIST node is loading from the CacheLoader (any) rather than
> loading from its peer nodes for non-owned entries (!!)
> 
> 
> Sometimes loading stuff from a local disk is faster than going remote, e.g. 
> if you have numOwners=2 and both owners have to load the same entry from disk 
> and send it to the originator twice. 
the staggering of remote gets should overcome that. 
> 
> Still, most of the time the entry is going to be in memory on the owner 
> nodes, so the local load is slower (especially with a shared cache store, 
> where loading is over the network as well).
+1
> 
>  
> This has come up on several threads now and I think it's critically
> wrong, as I commented previously this also introduces many
> inconsistencies - as far as I understand it.
> 
> 
> Is there a JIRA for this already?
> 
> Yes, loading a stale entry from the local cache store is definitely not a 
> good thing, but we actually delete the non-owned entries after the initial 
> state transfer. There may be some consistency issues if one uses a DIST_SYNC 
> cache with a shared async cache store, but fully sync configurations should 
> be fine.
> 
> OTOH, if the cache store is not shared, the chances of finding the entry in 
> the local store on a non-owner are slim to none, so it doesn't make sense to 
> do the lookup.
> 
> Implementation-wise, just changing the interceptor order is probably not 
> enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor 
> will still try to load it from the cache store after the remote lookup, so 
> we'll need a marker  in the invocation context to avoid the extra cache store 
> load.
if the key doesn't map to the local node it should trigger a remote get to 
the owners (or allow the dist interceptor to do just that)
> Actually, since this is just a performance issue, it could wait until we 
> implement tombstones everywhere.
Hmm, not sure I see the correlation between this and tombstones?

> 
> BTW your gist wouldn't work, the metadata cache needs to load certain
> elements too. But nice you spotted the need to potentially filter what
> "preload" means in the scope of each cache, as the metadata one should
> only preload metadata, while in the original configuration this data
> would indeed be duplicated.
> Opened: https://issues.jboss.org/browse/ISPN-2938
> 
> Sanne
> 
> On 19 March 2013 11:51, Mircea Markus  wrote:
> >
> > On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
> >
> >> Hi Adrian,
> >> let's forget about Lucene details and focus on DIST.
> >> With numOwners=1 and having two nodes the entries should be stored
> >> roughly 50% on each node, I see nothing wrong with that
> >> considering you don't need data failover in a read-only use case
> >> having all the index available in the shared CacheLoader.
> >>
> >> In such a scenario, and having both nodes preloaded all data, in case
> >> of a get() operation I would expect
> >> either:
> >> A) to be the owner, hence retrieve the value from local in-JVM reference
> >> B) to not be the owner, so to forward the request to the other node
> >> having roughly 50% chance per key to be in case A or B.
> >>
> >> But when hitting case B) it seems that instead of loading from the
> >> other node, it hits the CacheLoader to fetch the value.
> >>
> >> I already had asked James to verify with 4 nodes and numOwners=2, the
> >> result is the same so I suggested him to ask here;
> >> BTW I think numOwners=1 is perfectly valid and should work as with
> >> numOwners=1, the only reason I asked him to repeat
> >> the test is that we don't have much tests on the numOwners=1 case and
> >> I was assuming there might be some (wrong) assumptions
> >> affecting this.
> >>
> >> Note that this is not "just" a critical performance problem but I'm
> >> also suspecting it could provide inconsistent reads, in two classes of
> >> problems:
> >>
> >> # non-shared CacheStore with stale entries
> >> If for non-owned keys it will hit the local CacheStore first, where
> >> you might expect to not find anything, so to forward the request to
> >> the right node. What if this node has been the owner in the past? It
> >> might have an old entry locally stored, which would be returned
> >> instead of the correct value which is owned on a different node.
> >>
> >> # shared CacheStore using write-behind
> >> When using an async CacheStore by definition the content of the
> >> CacheStore is not trustworthy if you don't check on the owner first
> >> for entries in memory.
> >>
> >> Both seem critica

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Dan Berindei
>

> > Implementation-wise, just changing the interceptor order is probably not
> enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor
> will still try to load it from the cache store after the remote lookup, so
> we'll need a marker  in the invocation context to avoid the extra cache
> store load.
> if the key does't map to the local node it should trigger a remote get to
> owners (or allow the dist interceptor to do just that)
> > Actually, since this is just a performance issue, it could wait until we
> implement tombstones everywhere.
> Hmm, not sure i see the correlation between this and tombstones?
>
>
If the key doesn't exist in the cache at all, on any node, then the remote
lookup will return null and the CacheLoaderInterceptor will try to load it
from the local cache store again (assuming we move CacheLoaderInterceptor
after DistributionInterceptor). If DistributionInterceptor put a tombstone
in the invocation context for that key, CacheLoaderInterceptor could avoid
that extra cache store lookup.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Mircea Markus

On 19 Mar 2013, at 17:38, Dan Berindei wrote:

> >
> > Implementation-wise, just changing the interceptor order is probably not 
> > enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor 
> > will still try to load it from the cache store after the remote lookup, so 
> > we'll need a marker  in the invocation context to avoid the extra cache 
> > store load.
> if the key does't map to the local node it should trigger a remote get to 
> owners (or allow the dist interceptor to do just that)
> > Actually, since this is just a performance issue, it could wait until we 
> > implement tombstones everywhere.
> Hmm, not sure i see the correlation between this and tombstones?
> 
> 
> If the key doesn't exist in the cache at all, on any node, then the remote 
> lookup will return null and the CacheLoaderInterceptor will try to load it 
> from the local cache store again (assuming we move CacheLoaderInterceptor 
> after DistributionInterceptor). If DistributionInterceptor put a tombstone in 
> the invocation context for that key, CacheLoaderInterceptor could avoid that 
> extra cache store lookup.
I think the rule for going to the cache store should be based on key locality 
- if the key does not map to the local node, then don't involve the store at 
all locally, but delegate the store interaction to the actual owner.

Cheers,
-- 
Mircea Markus
Infinispan lead (www.infinispan.org)





___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-20 Thread Mircea Markus
FYI I've created a JIRA to track this: https://issues.jboss.org/browse/ISPN-2950
Whilst quite a performance issue, I don't think that this is a 
critical consistency issue for async stores: by using an async store you might 
lose data (i.e. expect inconsistencies) during a node crash anyway, so what this 
behaviour does is just to increase the inconsistency window.
 

On 19 Mar 2013, at 16:30, Mircea Markus wrote:
> 
> On 19 Mar 2013, at 16:15, Dan Berindei wrote:
> 
>> Hi Sanne
>> 
>> On Tue, Mar 19, 2013 at 4:12 PM, Sanne Grinovero  
>> wrote:
>> Mircea,
>> what I was most looking forward was to you comment on the interceptor
>> order generated for DIST+cachestores
>> - we don't think the ClusteredCacheLoader should be needed at all
>> 
>> Agree, ClusteredCacheLoader should not be necessary.
>> 
>> James, if you're still seeing problems with numOwners=1, could you create an 
>> issue in JIRA?
>> 
>> 
>> - each DIST node is loading from the CacheLoader (any) rather than
>> loading from its peer nodes for non-owned entries (!!)
>> 
>> 
>> Sometimes loading stuff from a local disk is faster than going remote, e.g. 
>> if you have numOwners=2 and both owners have to load the same entry from 
>> disk and send it to the originator twice. 
> the staggering of remote gets should overcome that. 
>> 
>> Still, most of the time the entry is going to be in memory on the owner 
>> nodes, so the local load is slower (especially with a shared cache store, 
>> where loading is over the network as well).
> +1
>> 
>> 
>> This has come up on several threads now and I think it's critically
>> wrong, as I commented previously this also introduces many
>> inconsistencies - as far as I understand it.
>> 
>> 
>> Is there a JIRA for this already?
>> 
>> Yes, loading a stale entry from the local cache store is definitely not a 
>> good thing, but we actually delete the non-owned entries after the initial 
>> state transfer. There may be some consistency issues if one uses a DIST_SYNC 
>> cache with a shared async cache store, but fully sync configurations should 
>> be fine.
>> 
>> OTOH, if the cache store is not shared, the chances of finding the entry in 
>> the local store on a non-owner are slim to none, so it doesn't make sense to 
>> do the lookup.
>> 
>> Implementation-wise, just changing the interceptor order is probably not 
>> enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor 
>> will still try to load it from the cache store after the remote lookup, so 
>> we'll need a marker  in the invocation context to avoid the extra cache 
>> store load.
> if the key doesn't map to the local node it should trigger a remote get to 
> the owners (or allow the dist interceptor to do just that)
>> Actually, since this is just a performance issue, it could wait until we 
>> implement tombstones everywhere.
> Hmm, not sure I see the correlation between this and tombstones? 
> 
>> 
>> BTW your gist wouldn't work, the metadata cache needs to load certain
>> elements too. But nice you spotted the need to potentially filter what
>> "preload" means in the scope of each cache, as the metadata one should
>> only preload metadata, while in the original configuration this data
>> would indeed be duplicated.
>> Opened: https://issues.jboss.org/browse/ISPN-2938
>> 
>> Sanne
>> 
>> On 19 March 2013 11:51, Mircea Markus  wrote:
>>> 
>>> On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
>>> 
 Hi Adrian,
 let's forget about Lucene details and focus on DIST.
 With numOwners=1 and having two nodes the entries should be stored
 roughly 50% on each node, I see nothing wrong with that
 considering you don't need data failover in a read-only use case
 having all the index available in the shared CacheLoader.
 
 In such a scenario, and having both nodes preloaded all data, in case
 of a get() operation I would expect
 either:
 A) to be the owner, hence retrieve the value from local in-JVM reference
 B) to not be the owner, so to forward the request to the other node
 having roughly 50% chance per key to be in case A or B.
 
 But when hitting case B) it seems that instead of loading from the
 other node, it hits the CacheLoader to fetch the value.
 
 I already had asked James to verify with 4 nodes and numOwners=2, the
 result is the same so I suggested him to ask here;
 BTW I think numOwners=1 is perfectly valid and should work as with
 numOwners=1, the only reason I asked him to repeat
 the test is that we don't have much tests on the numOwners=1 case and
 I was assuming there might be some (wrong) assumptions
 affecting this.
 
 Note that this is not "just" a critical performance problem but I'm
 also suspecting it could provide inconsistent reads, in two classes of
 problems:
 
 # non-shared CacheStore with stale entries
 If for non-owned keys it will hit the local CacheStore first, where
 you might expect to not 

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-20 Thread James Aley
Hi all,

Thanks for the help with this issue. I thought I'd just clarify that
the situation is pretty much resolved (or worked around) for me now by
use of the clusterLoader. I'll watch the JIRA issue and be sure to try
again without a clusterLoader when that's taken care of at some point.

Best,
James.

On 20 March 2013 15:05, Mircea Markus  wrote:
> FYI I've created a JIRA to track this: 
> https://issues.jboss.org/browse/ISPN-2950
> Whilst quite a performance issue, I don't think that this is a 
> critical consistency issue for async stores: by using an async store you 
> might lose data (i.e. expect inconsistencies) during a node crash anyway, so what 
> this behaviour does is just to increase the inconsistency window.
>
>
> On 19 Mar 2013, at 16:30, Mircea Markus wrote:
>>
>> On 19 Mar 2013, at 16:15, Dan Berindei wrote:
>>
>>> Hi Sanne
>>>
>>> On Tue, Mar 19, 2013 at 4:12 PM, Sanne Grinovero  
>>> wrote:
>>> Mircea,
>>> what I was most looking forward was to you comment on the interceptor
>>> order generated for DIST+cachestores
>>> - we don't think the ClusteredCacheLoader should be needed at all
>>>
>>> Agree, ClusteredCacheLoader should not be necessary.
>>>
>>> James, if you're still seeing problems with numOwners=1, could you create 
>>> an issue in JIRA?
>>>
>>>
>>> - each DIST node is loading from the CacheLoader (any) rather than
>>> loading from its peer nodes for non-owned entries (!!)
>>>
>>>
>>> Sometimes loading stuff from a local disk is faster than going remote, e.g. 
>>> if you have numOwners=2 and both owners have to load the same entry from 
>>> disk and send it to the originator twice.
>> the staggering of remote gets should overcome that.
>>>
>>> Still, most of the time the entry is going to be in memory on the owner 
>>> nodes, so the local load is slower (especially with a shared cache store, 
>>> where loading is over the network as well).
>> +1
>>>
>>>
>>> This has come up on several threads now and I think it's critically
>>> wrong, as I commented previously this also introduces many
>>> inconsistencies - as far as I understand it.
>>>
>>>
>>> Is there a JIRA for this already?
>>>
>>> Yes, loading a stale entry from the local cache store is definitely not a 
>>> good thing, but we actually delete the non-owned entries after the initial 
>>> state transfer. There may be some consistency issues if one uses a 
>>> DIST_SYNC cache with a shared async cache store, but fully sync 
>>> configurations should be fine.
>>>
>>> OTOH, if the cache store is not shared, the chances of finding the entry in 
>>> the local store on a non-owner are slim to none, so it doesn't make sense 
>>> to do the lookup.
>>>
>>> Implementation-wise, just changing the interceptor order is probably not 
>>> enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor 
>>> will still try to load it from the cache store after the remote lookup, so 
>>> we'll need a marker  in the invocation context to avoid the extra cache 
>>> store load.
>> if the key doesn't map to the local node it should trigger a remote get to 
>> the owners (or allow the dist interceptor to do just that)
>>> Actually, since this is just a performance issue, it could wait until we 
>>> implement tombstones everywhere.
>> Hmm, not sure I see the correlation between this and tombstones?
>>
>>>
>>> BTW your gist wouldn't work, the metadata cache needs to load certain
>>> elements too. But nice you spotted the need to potentially filter what
>>> "preload" means in the scope of each cache, as the metadata one should
>>> only preload metadata, while in the original configuration this data
>>> would indeed be duplicated.
>>> Opened: https://issues.jboss.org/browse/ISPN-2938
>>>
>>> Sanne
>>>
>>> On 19 March 2013 11:51, Mircea Markus  wrote:

 On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:

> Hi Adrian,
> let's forget about Lucene details and focus on DIST.
> With numOwners=1 and having two nodes the entries should be stored
> roughly 50% on each node, I see nothing wrong with that
> considering you don't need data failover in a read-only use case
> having all the index available in the shared CacheLoader.
>
> In such a scenario, and having both nodes preloaded all data, in case
> of a get() operation I would expect
> either:
> A) to be the owner, hence retrieve the value from local in-JVM reference
> B) to not be the owner, so to forward the request to the other node
> having roughly 50% chance per key to be in case A or B.
>
> But when hitting case B) it seems that instead of loading from the
> other node, it hits the CacheLoader to fetch the value.
>
> I already had asked James to verify with 4 nodes and numOwners=2, the
> result is the same so I suggested him to ask here;
> BTW I think numOwners=1 is perfectly valid and should work as with
> numOwners=1, the only reason I asked him to repeat
> the test is that we don't have much tests