[
https://issues.apache.org/jira/browse/UNOMI-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kevan Jahanshahi updated UNOMI-736:
-----------------------------------
Description:
We recently worked on reducing the number of ElasticSearch indices used by
Unomi, in order to lower ElasticSearch cost and memory overhead, most of which
comes from having many small indices.
The idea is to store as many Unomi item types as possible in the same index.
Here is the PoC PR: [https://github.com/apache/unomi/pull/573]
What remains to do in the PoC:
* Make the itemType/index name map configurable instead of hardcoded in the
Java Persistence Service ({*}to be discussed: changing the configuration while
the server is running could be dangerous{*}).
* Fix the groovy action id conflict:
** The rule resolution system currently tries to load an *ActionType* item
using the id of a groovy action. Since both are now stored in the same index,
it finds the groovy action and the deserialization fails. We should find a way
to avoid such a resolution.
* Fix tests if necessary (the current test scope should stay green, as
everything should continue to work as before).
* Having persona and profile share the same index could be interesting, but we
need to make sure nobody can use a personaID as a profileId in a context
request, and that we are still secure (same for session and personaSession).
I think it is really worth doing, as these objects are actually the same in
the end.
* Last but most important:
** Decide the final name of the index; in the PoC it is named *context-item*.
** Decide the final sharing strategy:
*** Separate metadata items or not? Personally, I don't think a separation
would make sense here.
*** Share between profile and persona? That would be a good way to optimize the
persona index, which generally doesn't contain much; same for sessions.
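One possible way to avoid the wrong-type resolution described above is to check the stored item type before deserializing. The sketch below is only an illustration under assumptions: {{SharedIndexSketch}}, {{StoredItem}} and the in-memory map are hypothetical stand-ins and are not part of Unomi's persistence API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for one shared index; not Unomi's persistence API.
class SharedIndexSketch {

    // Minimal stored document: the item type is kept next to the payload.
    static final class StoredItem {
        final String itemId;
        final String itemType;
        final String payload;

        StoredItem(String itemId, String itemType, String payload) {
            this.itemId = itemId;
            this.itemType = itemType;
            this.payload = payload;
        }
    }

    // Documents of several item types live side by side, keyed by id only.
    private final Map<String, StoredItem> documentsById = new HashMap<>();

    void save(StoredItem item) {
        documentsById.put(item.itemId, item);
    }

    // Returns the item only when its stored type matches the requested type,
    // so a lookup for one type never tries to deserialize another type.
    Optional<StoredItem> load(String itemId, String expectedItemType) {
        return Optional.ofNullable(documentsById.get(itemId))
                .filter(item -> item.itemType.equals(expectedItemType));
    }
}
```

With this guard, loading the id of a groovy action as an {{actionType}} yields an empty result instead of a deserialization failure, and the same check would reject a persona id passed where a profile id is expected. It does not address two items of different types sharing the same id in one index; that case would also need the item type to be part of the document id.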
Current sharing strategy from the PoC:
{code:java}
static {
    // metadata items
    itemTypeIndexNameMap.put("actionType", "item");
    itemTypeIndexNameMap.put("campaign", "item");
    itemTypeIndexNameMap.put("campaignevent", "item");
    itemTypeIndexNameMap.put("goal", "item");
    itemTypeIndexNameMap.put("userList", "item");
    itemTypeIndexNameMap.put("propertyType", "item");
    itemTypeIndexNameMap.put("scope", "item");
    itemTypeIndexNameMap.put("conditionType", "item");
    itemTypeIndexNameMap.put("rule", "item");
    itemTypeIndexNameMap.put("scoring", "item");
    itemTypeIndexNameMap.put("segment", "item");
    itemTypeIndexNameMap.put("groovyAction", "item");
    // direct Item implementations
    itemTypeIndexNameMap.put("topic", "item");
    itemTypeIndexNameMap.put("patch", "item");
    itemTypeIndexNameMap.put("jsonSchema", "item");
    itemTypeIndexNameMap.put("importConfig", "item");
    itemTypeIndexNameMap.put("exportConfig", "item");
    itemTypeIndexNameMap.put("rulestats", "item");

    itemTypeIndexNameMap.put("profile", "profile");
    itemTypeIndexNameMap.put("persona", "profile");
    itemTypeIndexNameMap.put("session", "session");
    itemTypeIndexNameMap.put("personaSession", "session");
}
{code}
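As a starting point for the discussion on making this map configurable, here is a minimal sketch. Everything in it is an assumption rather than PoC code: the {{ItemTypeIndexResolver}} class, the comma-separated "itemType=indexName" format, and the fallback to the item type's own name for unmapped types are all hypothetical. The map is parsed once and then frozen, which sidesteps the concern about changing the configuration while the server is running.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver; the class name and the config format are assumptions.
final class ItemTypeIndexResolver {

    private final Map<String, String> itemTypeIndexNameMap;

    // Parses entries such as "persona=profile,personaSession=session".
    ItemTypeIndexResolver(String configuredMapping) {
        Map<String, String> parsed = new HashMap<>();
        for (String entry : configuredMapping.split(",")) {
            if (entry.trim().isEmpty()) {
                continue; // tolerate empty input and trailing commas
            }
            String[] parts = entry.trim().split("=", 2);
            if (parts.length != 2 || parts[0].trim().isEmpty() || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Invalid mapping entry: " + entry);
            }
            parsed.put(parts[0].trim(), parts[1].trim());
        }
        // Read-only view of a private copy: the mapping cannot change after startup.
        this.itemTypeIndexNameMap = Collections.unmodifiableMap(parsed);
    }

    // Assumption: an unmapped item type keeps an index named after itself.
    String indexNameFor(String itemType) {
        return itemTypeIndexNameMap.getOrDefault(itemType, itemType);
    }
}
```

For example, {{new ItemTypeIndexResolver("profile=profile,persona=profile").indexNameFor("persona")}} resolves to the shared {{profile}} index, while a malformed entry fails fast at construction time instead of at the first lookup.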
Not in current ticket:
* handle migration
was:
Same description as above, except that the PoC PR link was
[https://github.com/apache/unomi/pull/571]
> Metadata Items: indices reduction PoC cleanup
> ---------------------------------------------
>
> Key: UNOMI-736
> URL: https://issues.apache.org/jira/browse/UNOMI-736
> Project: Apache Unomi
> Issue Type: Task
> Affects Versions: unomi-2.1.0
> Reporter: Kevan Jahanshahi
> Priority: Major
> Fix For: unomi-2.2.0