[ https://issues.apache.org/jira/browse/UNOMI-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Francois Gerthoffert updated UNOMI-670: --------------------------------------- Fix Version/s: unomi-2.0.1 > Unomi purge system is not working > --------------------------------- > > Key: UNOMI-670 > URL: https://issues.apache.org/jira/browse/UNOMI-670 > Project: Apache Unomi > Issue Type: Bug > Components: unomi(-core) > Affects Versions: unomi-2.0.0, unomi-1.6.1 > Reporter: Kevan Jahanshahi > Priority: Major > Fix For: unomi-2.0.1, unomi-1.6.2 > > > The purge system have been fixed recently in version 1.6.0: > [https://jira.jahia.org/browse/UNOMI-504] > The fix that was done, just fixed the query used for identifying the profiles > to be purged. > Actually we figured a bit late that the purge system contains more issues > than that. > Here is a non exhaustive list of the one already identified and that will > need fix/answers an even reflections/discussions for some of them > h2. Bug 1: disabling the profile purge will end up with all profiles to be > purge everyday > Using this conf: > {code:java} > org.apache.unomi.profile.purge.existTime = -1 > org.apache.unomi.profile.purge.inactiveTime = -1 > org.apache.unomi.monthly.index.purge.existTime= 12 {code} > To only enable events/session purge will have a bad side effect: all profiles > of the system will be removed everyday. > h2. bug 2: OOM issue for profile purge. > The persistence service is implementing a method: > {code:java} > /** > * Deletes items with the specified Item subclass matching the specified > {@link Condition}. > * > * @param <T> the type of the Item subclass we want to delete > * @param query a {@link Condition} identifying which elements we want to > delete > * @param clazz the {@link Item} subclass of the items we want to delete > * @return {@code true} if the deletion was successful, {@code false} > otherwise > */ > <T extends Item> boolean removeByQuery(Condition query, Class<T> clazz); > {code} > The issue with the implem is that it's doing the following: > * query profiles to get the IDs of documents > * building a batch request with a delete request per profile ID > * sending the batch. > So in case of a lot profiles to be purge at once, the batch can grow a lot, > and consume all the memory available. > We should not do like that, instead we should: > * just use the ES query: > [https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html] > directly with the query to identify the profiles > h2. Bug 3: What is the cache system implemented in the persistence service. > Seem's coming from: https://issues.apache.org/jira/browse/UNOMI-166 > It's used by the persistence service, and it explain why the removeByQuery > implem is bad. > The reason why the *removeByQuery* is querying all the documents IDs is to > invalidate the cache. > But it's strange, the cache system is not used at all: > {code:java} > <cm:property name="itemClassesToCache" value="" /> {code} > So I would say, we should remove this cache system, because: > * it's not tested > * nobody know what would be the effect if enabled > * it's not possible to activate it with ENV conf, or global unomi conf. > * it's not documented > * it's impacting implems like: removeByQuery, even if the cache system is > not used > * it's mostly dead code inside the core persistence of Unomi, witch is > dangerous to keep. > h2. Bug 4: Purge job seem's never unregistered. > We do register the purge job when the profile service is started, but in case > the profile service is stopped (stop the bundle for example). Then it's not > unregistered. > h2. Bug 5: what should we do in case of unomi cluster, should all of them do > the purge ? > In case of unomi cluster, all of them may try to trigger the purge, I dont > think it's an issue, but it look like something could be done to avoid doing > unecessary operations twice or more. > h2. Final: We should also add integrations tests to cover this feature. -- This message was sent by Atlassian Jira (v8.20.10#820010)