[ 
https://issues.apache.org/jira/browse/UNOMI-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevan Jahanshahi updated UNOMI-748:
-----------------------------------
    Description: 
Currently, the sessions/events *update* goes through the bulkProcessor and is 
asynchronous: we never know when the bulk will be performed.
 * +The benefit+: merge requests are fast. The request is not held up, since the 
bulk processor does the job in a separate thread.
 * +The cons+: *all previous sessions/events are first loaded in memory*, so 
when merging active profiles that contain a lot of past events/sessions, 
{{we could be exposed to OOM}}. _(We already had a similar case with the purge, 
which was loading all profiles in memory.)_

If we replace the *update (one item at a time)* with *updateByQuery*, the 
request will lose the asynchronous nature provided by the BulkProcessor.
 * +The benefit+: sessions and events are not loaded in memory, so no OOM is 
possible.
 * +The cons+: the request will be synchronous and {{we expose merge requests to 
timeouts on the client side}}. The merge is actually triggered by the login on 
the jExp side; adding extra time there could have bad impacts and side effects.

 
Since neither of these solutions seems OK, the ideal solution should combine the 
strengths of both:
 * Use *{{updateByQuery}}* in a separate thread to avoid holding up the merge 
request.
 ** We get the OOM protection by not loading all the past events/sessions.
 ** We get the asynchronous execution, done in a separate thread/job, to free 
the current request.
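
To make the proposal concrete, here is a minimal sketch of the idea, not the actual Unomi implementation. Every name in it ({{AsyncMergeSketch}}, {{ItemStore}}, {{reassignByQuery}}, {{mergeAsync}}) is hypothetical, and the in-memory store only stands in for a real update-by-query call so the example runs without Elasticsearch. It shows the merge request handing the query-based update to an executor and returning immediately, without ever loading the sessions/events of the merged profiles.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Unomi code: all names are invented for illustration.
final class AsyncMergeSketch {

    /** Hypothetical stand-in for the persistence layer's update-by-query call. */
    interface ItemStore {
        /** Reassigns every item of itemType owned by fromProfileId; returns the count. */
        long reassignByQuery(String itemType, String fromProfileId, String toProfileId);
    }

    /** In-memory fake so the sketch runs without Elasticsearch. */
    static final class InMemoryStore implements ItemStore {
        // key: itemType + ":" + itemId, value: id of the owning profile
        final Map<String, String> owners = new ConcurrentHashMap<>();

        @Override
        public long reassignByQuery(String itemType, String fromProfileId, String toProfileId) {
            long updated = 0;
            for (String key : owners.keySet()) {
                // replace(key, old, new) only swaps the owner when it still matches.
                if (key.startsWith(itemType + ":")
                        && owners.replace(key, fromProfileId, toProfileId)) {
                    updated++;
                }
            }
            return updated;
        }
    }

    /**
     * Schedules the reassignment on the executor and returns at once.
     * The caller never loads the sessions/events of the merged profiles.
     */
    static CompletableFuture<Long> mergeAsync(ItemStore store, ExecutorService executor,
                                              String masterProfileId,
                                              List<String> mergedProfileIds) {
        return CompletableFuture.supplyAsync(() -> {
            long total = 0;
            for (String profileId : mergedProfileIds) {
                total += store.reassignByQuery("session", profileId, masterProfileId);
                total += store.reassignByQuery("event", profileId, masterProfileId);
            }
            return total;
        }, executor);
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        store.owners.put("session:s1", "p2");
        store.owners.put("event:e1", "p2");
        store.owners.put("event:e2", "p3");

        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<Long> pending =
                    mergeAsync(store, executor, "p1", List.of("p2"));
            // A real merge request would return here; join() is only for the demo.
            System.out.println("items reassigned: " + pending.join());
        } finally {
            executor.shutdown();
        }
    }
}
```

One open point with this shape: because the update is no longer awaited, a failure inside the background task would have to be logged or retried by the task itself, since the merge request has already returned.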

  was:
Currently, the sessions/events *update* goes through the bulkProcessor and is 
asynchronous: we never know when the bulk will be performed.
 * +The benefit+: merge requests are fast. The request is not held up, since the 
bulk processor does the job in a separate thread.
 * +The cons+: *all previous sessions/events are loaded in memory*, so when 
merging active profiles that contain a lot of past events/sessions, 
{{we are exposed to OOM here}}. _(We already had a similar case with the purge, 
which was loading all profiles in memory.)_

If we replace the *update (one item at a time)* with *updateByQuery*, the 
request will lose the asynchronous nature provided by the BulkProcessor.
 * +The benefit+: sessions and events are not loaded in memory, so no OOM is 
possible.
 * +The cons+: the request will be synchronous and {{we expose merge requests to 
timeouts on the client side}}. The merge is actually triggered by the login on 
the jExp side; adding extra time there could have bad impacts and side effects.

Since neither of these solutions seems OK, the ideal solution should combine the 
strengths of both:
 * Use *{{updateByQuery}}* in a separate thread to avoid holding up the merge 
request.
 ** We get the OOM protection by not loading all the past events/sessions.
 ** We get the asynchronous execution, done in a separate thread/job, to free 
the current request.


> Unomi merge system is exposed to OOM
> ------------------------------------
>
>                 Key: UNOMI-748
>                 URL: https://issues.apache.org/jira/browse/UNOMI-748
>             Project: Apache Unomi
>          Issue Type: Improvement
>    Affects Versions: unomi-2.1.0
>            Reporter: Kevan Jahanshahi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
