[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user cestella commented on the issue: https://github.com/apache/metron/pull/1015 +1 for sure ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/1015 lgtm, +1 ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 @mmiklavc bump. Did I answer your question? How is this looking? ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 In the latest, I removed the problematic test case. There is no need to test the eviction policy of the underlying cache, which is effectively what is happening. If anyone can find gaps in test coverage, please let me know. ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 > @ben-manes Youâre code shouldnât make assumptions on what is evicted as we will try to maximize the hit rate using various techniques. That makes a ton of sense. Thanks for the tips. ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user ottobackwards commented on the issue: https://github.com/apache/metron/pull/1015 Is there not some relevant Caffeine test we can ape for this? ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user ben-manes commented on the issue: https://github.com/apache/metron/pull/1015 Cache eviction is performed asynchronously by default using the FJP common pool. For testing you can set `executor(Runnable::run)` and it will be on the calling thread. The penalty is small so itâs sometimes used in practice anyway. The eviction is not LRU and has a random seed to guard against HashDoS attacks. This means that which entry is not predictable without forcing the seed, which is not exposed. Typically only the cacheâs test should do that. Youâre code shouldnât make assumptions on what is evicted as we will try to maximize the hit rate using various techniques. ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 Ok, now its really working. The second problem was that there is some variability in which items get expired from the cache according to the LRU algorithm. In about 1/250 runs the algorithm would not choose the expected item to expire. I imagine this is only a problem when running at such small scale with a cache size of 2. I updated the test case so that it looks into the cache to determine whether a hit should occur or not. The test passes, but I am not sure if the test is really useful. ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 Daft. Still a problem. In master, run repetitively the test fails easily 3/10 times for me. With the latest in this PR, it is failing 1/40 times or so, so I'm still not there. So even explicitly performing cache maintenance does not mean that the cache size will always be exactly what we expect. I am wondering if we even need such low level validation as what we have in testWithFullCache() now. ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 > @mmiklavc: Can you call out the lines relevant to the fix for reviewers? I provided some more color in the description this time. Also, copied below. > The core problem is that the test makes assumptions about the cache state which are not always true. Specifically, even though the cache max size is 2, the cache can exceed that size for short periods of time until cache maintenance is performed. > Since the cache can be larger than expected for brief periods of time, the expectations of what is a hit or miss were incorrect. This is why the test would fail intermittently. > To address this, I rewrote the test so that the cache stats are validated after each expression is run and I also perform cache maintenance between each execution and validation of cache state. > Without the additional logging, the ability to grab more detailed cache stats, and breaking down that test case, I wouldn't have been able to uncover the bug. ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user mmiklavc commented on the issue: https://github.com/apache/metron/pull/1015 Can you call out the lines relevant to the fix for reviewers? ---
[GitHub] metron issue #1015: METRON-1544 Flaky test: org.apache.metron.stellar.common...
Github user nickwallen commented on the issue: https://github.com/apache/metron/pull/1015 Triggering the CI build to ensure tests don't fail. ---