Re: [infinispan-dev] strictly not returning expired values

2011-10-19 Thread Sanne Grinovero
On 18 October 2011 18:45, Elias Ross gen...@noderunner.net wrote:
 On Tue, Oct 18, 2011 at 10:13 AM, Mircea Markus mircea.mar...@jboss.com 
 wrote:

 The way I see it a user might configure an expiry for two reasons:
 - to manage memory
  and/or
 - to control how much staleness the data it uses might have

 I've mostly used expiry to control data staleness. For example, I
 would hold a token for security validation. It wasn't entirely
 necessary to have a high degree of accuracy, maybe an order of a few
 seconds. But in many cases if the expiry thread runs every minute I
 suspect this would be too long.

 In my original implementation of expiration for JBoss TreeCache I used
 a heap (priority queue) to order entries. This kept the implementation
 very fast, at the expense of memory usage of course.

 I agree that doing removals during data access is a very bad idea.

 I would suggest having two user options: 1) Infrequent expiration
 scans 2) Or, use of a priority queue and more frequent scanning.

Thanks Elias, feedback from a real use case is exactly what I was needing most.

Security tokens is an interesting application; wouldn't it make sense
for you to store a timestamp in your value too, and check on that as
well? (in combination with automatic expiration).

Your suggestions are what I would do too; not sure about the priority
queue exactly, but the solution clearly is in making the eviction
process more efficient - a priority queue might be a way, or anything
else which might make the eviction process quicker. It seems we're
discussing implementation details in the other thread Time
measurement and expiry, so I'll give more details there but one of
the reasons for me to start these discussions is that if I'm allowed
to make these changes, then the eviction process will be much faster
as a consequence of not invoking the wall clock at every entry of the
cache.
How much faster, that's to be measured yet.. focusing this thread on
the use cases, what sort of precision would you expect? Should we try
to stay under a single second for an average heap (and how big is
that?) to be considered correct ?

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] strictly not returning expired values

2011-10-18 Thread Sanne Grinovero
Not because of the cost of reading the clock, my main motivation is
that we artificially create cache misses which might be very expensive
to the client code.

On top of that, the fact that this check also has a cost - not arguing
how low - makes me wonder even more.

As I see it, it's unavoidable for a cache to provide a cache miss when
the data is gone, like GC'd, and we're not able to return it. It's
also reasonable to expire values and remove them to make sure we don't
run out of memory - but if there's a chance in which the value was not
garbage collected and we're able to return it, we should return it.
This has a potential good effect on the client application, and as it
turns out checking time is expensive too we will save some cost at
every get operation, but this is just in addition to the greater
benefit of reducing the ratio of cache misses.

Sanne

On 18 October 2011 11:05, Galder Zamarreño gal...@redhat.com wrote:
 I'm not sure I'm understanding the motivation of this. Is it because you're 
 finding System.currentTimeMillis to be expensive?

 If so, how expensive and in which scenarios (set up, test…etc)

 On Oct 17, 2011, at 6:51 PM, Sanne Grinovero wrote:

 I've noticed that every time we perform a get() operation (or any
 other retrieval) the value is NOT returned to the client if it's
 expired, which is going to receive a null instead.

 It looks like that these checks are in place to prevent
 a) a race condition with the eviction process
 b) to not return very old values from a very large wakeUpInterval
 between eviction activity
 c) to load from cacheLoaders

 Now for the first case, as a user of a caching library I would prefer
 to get the value instead. Expiry is in place only to prevent me from
 wasting memory, but this entry wasn't collected yet so why is
 Infinispan denying me access to the value I want to retrieve? We're
 introducing unneeded cache misses.

 Since even with the current code we can not provide guarantees that we
 won't ever return an expired entry (between  the check and the actual
 return to user code we might be briefly suspended), I don't see why we
 should not make it clear that the expiry timeout is a best-effort time
 and relax our code checks a bit; there are many reasons to do so:

 1) avoid fictitious cache misses.

 2) Avoid the need to invoke a System.currentTimeMillis() for each
 retrieval operation - that's expensive!

 3) Some operations like size() or values() now explicitly iterate the
 whole result set twice to make sure all expired entries are removed.
 This is unnecessary following the reasoning above.

 Regarding the other cases:
 b) We don't need to care imo. If people need more precision they
 should use a lower wakeUpInterval.
 c) We should keep this logic in CacheLoader, just because having an
 aggressive wakeUpInterval might be very expensive in this case.

 I'm asking as I've built a quick POC for ISPN-1459 but since now the
 current time needs to be passed to the entry, I end up needing to
 invoke the wall clock even for immortal cache entries making many
 other use cases slower: not happy with that, I'd rather speed up both
 use cases.

 Sanne
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Galder Zamarreño
 Sr. Software Engineer
 Infinispan, JBoss Cache


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] strictly not returning expired values

2011-10-18 Thread Mircea Markus

On 18 Oct 2011, at 12:05, Sanne Grinovero wrote:

 Not because of the cost of reading the clock, my main motivation is
 that we artificially create cache misses which might be very expensive
 to the client code.
 
 On top of that, the fact that this check also has a cost - not arguing
 how low - makes me wonder even more.
 
 As I see it, it's unavoidable for a cache to provide a cache miss when
 the data is gone, like GC'd, and we're not able to return it. It's
 also reasonable to expire values and remove them to make sure we don't
 run out of memory - but if there's a chance in which the value was not
 garbage collected and we're able to return it, we should return it.
The way I see it a user might configure an expiry for two reasons: 
- to manage memory
 and/or
- to control how much staleness the data it uses might have

IMO this is something the user must decide on a use case by use case basis, 
hence I think a config would be better.


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] strictly not returning expired values

2011-10-18 Thread Elias Ross
On Tue, Oct 18, 2011 at 10:13 AM, Mircea Markus mircea.mar...@jboss.com wrote:

 The way I see it a user might configure an expiry for two reasons:
 - to manage memory
  and/or
 - to control how much staleness the data it uses might have

I've mostly used expiry to control data staleness. For example, I
would hold a token for security validation. It wasn't entirely
necessary to have a high degree of accuracy, maybe an order of a few
seconds. But in many cases if the expiry thread runs every minute I
suspect this would be too long.

In my original implementation of expiration for JBoss TreeCache I used
a heap (priority queue) to order entries. This kept the implementation
very fast, at the expense of memory usage of course.

I agree that doing removals during data access is a very bad idea.

I would suggest having two user options: 1) Infrequent expiration
scans 2) Or, use of a priority queue and more frequent scanning.

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] strictly not returning expired values

2011-10-17 Thread Dan Berindei
On Mon, Oct 17, 2011 at 7:51 PM, Sanne Grinovero sa...@infinispan.org wrote:
 I've noticed that every time we perform a get() operation (or any
 other retrieval) the value is NOT returned to the client if it's
 expired, which is going to receive a null instead.

 It looks like that these checks are in place to prevent
 a) a race condition with the eviction process
 b) to not return very old values from a very large wakeUpInterval
 between eviction activity
 c) to load from cacheLoaders

 Now for the first case, as a user of a caching library I would prefer
 to get the value instead. Expiry is in place only to prevent me from
 wasting memory, but this entry wasn't collected yet so why is
 Infinispan denying me access to the value I want to retrieve? We're
 introducing unneeded cache misses.

 Since even with the current code we can not provide guarantees that we
 won't ever return an expired entry (between  the check and the actual
 return to user code we might be briefly suspended), I don't see why we
 should not make it clear that the expiry timeout is a best-effort time
 and relax our code checks a bit; there are many reasons to do so:

 1) avoid fictitious cache misses.

 2) Avoid the need to invoke a System.currentTimeMillis() for each
 retrieval operation - that's expensive!

 3) Some operations like size() or values() now explicitly iterate the
 whole result set twice to make sure all expired entries are removed.
 This is unnecessary following the reasoning above.

 Regarding the other cases:
 b) We don't need to care imo. If people need more precision they
 should use a lower wakeUpInterval.

I don't think this is an option, as a 5 seconds wakeUpInterval was
deemed to short not long ago:
https://source.jboss.org/viewrep/Infinispan/core/src/main/java/org/infinispan/config/Configuration.java?r1=9fe9a8cb6e49308b9e45f09b2f5324a7daefdee3r2=9cda361dc6df7bcecde95abc31fcbab11f734360

 c) We should keep this logic in CacheLoader, just because having an
 aggressive wakeUpInterval might be very expensive in this case.

 I'm asking as I've built a quick POC for ISPN-1459 but since now the
 current time needs to be passed to the entry, I end up needing to
 invoke the wall clock even for immortal cache entries making many
 other use cases slower: not happy with that, I'd rather speed up both
 use cases.


I guess you could pass in a Clock that lazily calls
System.currentTimeMillis() and saves it for subsequent calls, but that
will complicate things again.

Dan


 Sanne
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] strictly not returning expired values

2011-10-17 Thread Mircea Markus

On 17 Oct 2011, at 18:21, Dan Berindei wrote:

 On Mon, Oct 17, 2011 at 7:51 PM, Sanne Grinovero sa...@infinispan.org wrote:
 I've noticed that every time we perform a get() operation (or any
 other retrieval) the value is NOT returned to the client if it's
 expired, which is going to receive a null instead.
 
 It looks like that these checks are in place to prevent
 a) a race condition with the eviction process
 b) to not return very old values from a very large wakeUpInterval
 between eviction activity
 c) to load from cacheLoaders
 
 Now for the first case, as a user of a caching library I would prefer
 to get the value instead. Expiry is in place only to prevent me from
 wasting memory, but this entry wasn't collected yet so why is
 Infinispan denying me access to the value I want to retrieve? We're
 introducing unneeded cache misses.
 
 Since even with the current code we can not provide guarantees that we
 won't ever return an expired entry (between  the check and the actual
 return to user code we might be briefly suspended), I don't see why we
 should not make it clear that the expiry timeout is a best-effort time
 and relax our code checks a bit; there are many reasons to do so:
 
 1) avoid fictitious cache misses.
 
 2) Avoid the need to invoke a System.currentTimeMillis() for each
 retrieval operation - that's expensive!
 
 3) Some operations like size() or values() now explicitly iterate the
 whole result set twice to make sure all expired entries are removed.
 This is unnecessary following the reasoning above.
 
 Regarding the other cases:
 b) We don't need to care imo. If people need more precision they
 should use a lower wakeUpInterval.
 
 I don't think this is an option, as a 5 seconds wakeUpInterval was
 deemed to short not long ago:
 https://source.jboss.org/viewrep/Infinispan/core/src/main/java/org/infinispan/config/Configuration.java?r1=9fe9a8cb6e49308b9e45f09b2f5324a7daefdee3r2=9cda361dc6df7bcecde95abc31fcbab11f734360
perhaps a configuration option disabled by default? 
 
 c) We should keep this logic in CacheLoader, just because having an
 aggressive wakeUpInterval might be very expensive in this case.
 
 I'm asking as I've built a quick POC for ISPN-1459 but since now the
 current time needs to be passed to the entry, I end up needing to
 invoke the wall clock even for immortal cache entries making many
 other use cases slower: not happy with that, I'd rather speed up both
 use cases.
 
 
 I guess you could pass in a Clock that lazily calls
 System.currentTimeMillis() and saves it for subsequent calls, but that
 will complicate things again.
Complicate by introducing an inaccuracy  on the exact time? Code-wise seems 
reasonably simple... 
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev