Re: [infinispan-dev] strictly not returning expired values
On 18 October 2011 18:45, Elias Ross gen...@noderunner.net wrote: On Tue, Oct 18, 2011 at 10:13 AM, Mircea Markus mircea.mar...@jboss.com wrote: The way I see it a user might configure an expiry for two reasons: - to manage memory and/or - to control how much staleness the data it uses might have I've mostly used expiry to control data staleness. For example, I would hold a token for security validation. It wasn't entirely necessary to have a high degree of accuracy, maybe an order of a few seconds. But in many cases if the expiry thread runs every minute I suspect this would be too long. In my original implementation of expiration for JBoss TreeCache I used a heap (priority queue) to order entries. This kept the implementation very fast, at the expense of memory usage of course. I agree that doing removals during data access is a very bad idea. I would suggest having two user options: 1) Infrequent expiration scans 2) Or, use of a priority queue and more frequent scanning. Thanks Elias, feedback from a real use case is exactly what I was needing most. Security tokens is an interesting application; wouldn't it make sense for you to store a timestamp in your value too, and check on that as well? (in combination with automatic expiration). Your suggestions are what I would do too; not sure about the priority queue exactly, but the solution clearly is in making the eviction process more efficient - a priority queue might be a way, or anything else which might make the eviction process quicker. It seems we're discussing implementation details in the other thread Time measurement and expiry, so I'll give more details there but one of the reasons for me to start these discussions is that if I'm allowed to make these changes, then the eviction process will be much faster as a consequence of not invoking the wall clock at every entry of the cache. How much faster, that's to be measured yet.. focusing this thread on the use cases, what sort of precision would you expect? Should we try to stay under a single second for an average heap (and how big is that?) to be considered correct ? ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] strictly not returning expired values
Not because of the cost of reading the clock, my main motivation is that we artificially create cache misses which might be very expensive to the client code. On top of that, the fact that this check also has a cost - not arguing how low - makes me wonder even more. As I see it, it's unavoidable for a cache to provide a cache miss when the data is gone, like GC'd, and we're not able to return it. It's also reasonable to expire values and remove them to make sure we don't run out of memory - but if there's a chance in which the value was not garbage collected and we're able to return it, we should return it. This has a potential good effect on the client application, and as it turns out checking time is expensive too we will save some cost at every get operation, but this is just in addition to the greater benefit of reducing the ratio of cache misses. Sanne On 18 October 2011 11:05, Galder Zamarreño gal...@redhat.com wrote: I'm not sure I'm understanding the motivation of this. Is it because you're finding System.currentTimeMillis to be expensive? If so, how expensive and in which scenarios (set up, test…etc) On Oct 17, 2011, at 6:51 PM, Sanne Grinovero wrote: I've noticed that every time we perform a get() operation (or any other retrieval) the value is NOT returned to the client if it's expired, which is going to receive a null instead. It looks like that these checks are in place to prevent a) a race condition with the eviction process b) to not return very old values from a very large wakeUpInterval between eviction activity c) to load from cacheLoaders Now for the first case, as a user of a caching library I would prefer to get the value instead. Expiry is in place only to prevent me from wasting memory, but this entry wasn't collected yet so why is Infinispan denying me access to the value I want to retrieve? We're introducing unneeded cache misses. Since even with the current code we can not provide guarantees that we won't ever return an expired entry (between the check and the actual return to user code we might be briefly suspended), I don't see why we should not make it clear that the expiry timeout is a best-effort time and relax our code checks a bit; there are many reasons to do so: 1) avoid fictitious cache misses. 2) Avoid the need to invoke a System.currentTimeMillis() for each retrieval operation - that's expensive! 3) Some operations like size() or values() now explicitly iterate the whole result set twice to make sure all expired entries are removed. This is unnecessary following the reasoning above. Regarding the other cases: b) We don't need to care imo. If people need more precision they should use a lower wakeUpInterval. c) We should keep this logic in CacheLoader, just because having an aggressive wakeUpInterval might be very expensive in this case. I'm asking as I've built a quick POC for ISPN-1459 but since now the current time needs to be passed to the entry, I end up needing to invoke the wall clock even for immortal cache entries making many other use cases slower: not happy with that, I'd rather speed up both use cases. Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] strictly not returning expired values
On 18 Oct 2011, at 12:05, Sanne Grinovero wrote: Not because of the cost of reading the clock, my main motivation is that we artificially create cache misses which might be very expensive to the client code. On top of that, the fact that this check also has a cost - not arguing how low - makes me wonder even more. As I see it, it's unavoidable for a cache to provide a cache miss when the data is gone, like GC'd, and we're not able to return it. It's also reasonable to expire values and remove them to make sure we don't run out of memory - but if there's a chance in which the value was not garbage collected and we're able to return it, we should return it. The way I see it a user might configure an expiry for two reasons: - to manage memory and/or - to control how much staleness the data it uses might have IMO this is something the user must decide on a use case by use case basis, hence I think a config would be better. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] strictly not returning expired values
On Tue, Oct 18, 2011 at 10:13 AM, Mircea Markus mircea.mar...@jboss.com wrote: The way I see it a user might configure an expiry for two reasons: - to manage memory and/or - to control how much staleness the data it uses might have I've mostly used expiry to control data staleness. For example, I would hold a token for security validation. It wasn't entirely necessary to have a high degree of accuracy, maybe an order of a few seconds. But in many cases if the expiry thread runs every minute I suspect this would be too long. In my original implementation of expiration for JBoss TreeCache I used a heap (priority queue) to order entries. This kept the implementation very fast, at the expense of memory usage of course. I agree that doing removals during data access is a very bad idea. I would suggest having two user options: 1) Infrequent expiration scans 2) Or, use of a priority queue and more frequent scanning. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] strictly not returning expired values
On Mon, Oct 17, 2011 at 7:51 PM, Sanne Grinovero sa...@infinispan.org wrote: I've noticed that every time we perform a get() operation (or any other retrieval) the value is NOT returned to the client if it's expired, which is going to receive a null instead. It looks like that these checks are in place to prevent a) a race condition with the eviction process b) to not return very old values from a very large wakeUpInterval between eviction activity c) to load from cacheLoaders Now for the first case, as a user of a caching library I would prefer to get the value instead. Expiry is in place only to prevent me from wasting memory, but this entry wasn't collected yet so why is Infinispan denying me access to the value I want to retrieve? We're introducing unneeded cache misses. Since even with the current code we can not provide guarantees that we won't ever return an expired entry (between the check and the actual return to user code we might be briefly suspended), I don't see why we should not make it clear that the expiry timeout is a best-effort time and relax our code checks a bit; there are many reasons to do so: 1) avoid fictitious cache misses. 2) Avoid the need to invoke a System.currentTimeMillis() for each retrieval operation - that's expensive! 3) Some operations like size() or values() now explicitly iterate the whole result set twice to make sure all expired entries are removed. This is unnecessary following the reasoning above. Regarding the other cases: b) We don't need to care imo. If people need more precision they should use a lower wakeUpInterval. I don't think this is an option, as a 5 seconds wakeUpInterval was deemed to short not long ago: https://source.jboss.org/viewrep/Infinispan/core/src/main/java/org/infinispan/config/Configuration.java?r1=9fe9a8cb6e49308b9e45f09b2f5324a7daefdee3r2=9cda361dc6df7bcecde95abc31fcbab11f734360 c) We should keep this logic in CacheLoader, just because having an aggressive wakeUpInterval might be very expensive in this case. I'm asking as I've built a quick POC for ISPN-1459 but since now the current time needs to be passed to the entry, I end up needing to invoke the wall clock even for immortal cache entries making many other use cases slower: not happy with that, I'd rather speed up both use cases. I guess you could pass in a Clock that lazily calls System.currentTimeMillis() and saves it for subsequent calls, but that will complicate things again. Dan Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] strictly not returning expired values
On 17 Oct 2011, at 18:21, Dan Berindei wrote: On Mon, Oct 17, 2011 at 7:51 PM, Sanne Grinovero sa...@infinispan.org wrote: I've noticed that every time we perform a get() operation (or any other retrieval) the value is NOT returned to the client if it's expired, which is going to receive a null instead. It looks like that these checks are in place to prevent a) a race condition with the eviction process b) to not return very old values from a very large wakeUpInterval between eviction activity c) to load from cacheLoaders Now for the first case, as a user of a caching library I would prefer to get the value instead. Expiry is in place only to prevent me from wasting memory, but this entry wasn't collected yet so why is Infinispan denying me access to the value I want to retrieve? We're introducing unneeded cache misses. Since even with the current code we can not provide guarantees that we won't ever return an expired entry (between the check and the actual return to user code we might be briefly suspended), I don't see why we should not make it clear that the expiry timeout is a best-effort time and relax our code checks a bit; there are many reasons to do so: 1) avoid fictitious cache misses. 2) Avoid the need to invoke a System.currentTimeMillis() for each retrieval operation - that's expensive! 3) Some operations like size() or values() now explicitly iterate the whole result set twice to make sure all expired entries are removed. This is unnecessary following the reasoning above. Regarding the other cases: b) We don't need to care imo. If people need more precision they should use a lower wakeUpInterval. I don't think this is an option, as a 5 seconds wakeUpInterval was deemed to short not long ago: https://source.jboss.org/viewrep/Infinispan/core/src/main/java/org/infinispan/config/Configuration.java?r1=9fe9a8cb6e49308b9e45f09b2f5324a7daefdee3r2=9cda361dc6df7bcecde95abc31fcbab11f734360 perhaps a configuration option disabled by default? c) We should keep this logic in CacheLoader, just because having an aggressive wakeUpInterval might be very expensive in this case. I'm asking as I've built a quick POC for ISPN-1459 but since now the current time needs to be passed to the entry, I end up needing to invoke the wall clock even for immortal cache entries making many other use cases slower: not happy with that, I'd rather speed up both use cases. I guess you could pass in a Clock that lazily calls System.currentTimeMillis() and saves it for subsequent calls, but that will complicate things again. Complicate by introducing an inaccuracy on the exact time? Code-wise seems reasonably simple... ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev