[ https://issues.apache.org/jira/browse/OOZIE-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julia Kinga Marton reassigned OOZIE-3336:
-----------------------------------------

    Assignee:     (was: Julia Kinga Marton)

> [persistence] Refactor entity classes to feature PK, FK, and UQ constraints
> ---------------------------------------------------------------------------
>
>                 Key: OOZIE-3336
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3336
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Priority: Major
>             Fix For: 5.2.0
>
>
> When an Oozie database grows substantially, say beyond a few hundred
> thousand {{WorkflowActionBean}} or {{CoordinatorActionBean}} instances, we
> face several performance issues. Here is an analysis of why.
> The current Oozie JPA {{@Entity}} usage, and the resulting database DDL,
> suffers from a few drawbacks from a performance point of view:
> * {{@Id}} fields are {{String}}:
> ** this leaves little room for database primary key indices to work
> effectively
> ** those values are calculated for {{WorkflowActionBean}},
> {{CoordinatorActionBean}}, and {{BundleActionBean}} instances
> * no foreign key constraint is set from {{WorkflowActionBean}} to
> {{WorkflowJobBean}}, from {{CoordinatorActionBean}} to
> {{CoordinatorJobBean}}, or from {{BundleActionBean}} to {{BundleJobBean}}
> instances:
> ** JPA queries that discover parent-child relationships have to be
> assembled by hand
> ** no database indices are created, and hence queries that contain any
> {{JOIN}} are slower
> * no unique constraints are used whatsoever
> * JPA queries are created by hand instead of relying on OpenJPA
> * JPA entities are filled by hand instead of relying on OpenJPA
> The following enhancements are necessary:
> # while keeping the existing {{String compositeId}} fields, break their
> contents down into the following new fields:
> ## {{@Id long id}} - an auto-increment value that is unique across the Oozie
> database
> ## {{long currentSequence}} - the sequence number of the current run
> since the last Oozie server restart; the first part of the {{compositeId}}
> ## {{Timestamp serverStartupTimestamp}} - the timestamp when the Oozie server
> was last started; the second part of the {{compositeId}}
> ## {{String serverName}} - the third part of the {{compositeId}}
> ## {{String name}} - the fourth and last part of the {{compositeId}}
> ## {{compositeId}} might be calculated when an entity is loaded / persisted,
> and then stored
> # FK constraints:
> ## {{@OneToMany}} fields holding the list of child references inside the
> parent
> ## {{@ManyToOne}} fields holding the parent reference inside the child
> ## pay attention to {{FetchType}}; most of the time {{LAZY}} will be needed
> ## the containment fields should not be {{@Transient}} anymore
> # UQ constraints:
> ## on {{currentSequence}} and {{serverStartupTimestamp}}
> ## on {{currentSequence}} and {{name}}
> # new JPQL queries:
> ## to cover the changed parent-child relationships
> ## to make use of each disassembled part of {{originalId}} when doing, e.g.,
> filtering > # let JPA fill entities instead performing this by hand > Following enhancements can be considered as nice-to-have: > * upgrade to an OpenJPA version that features JPA 2.1's composite indexing > capability > * see whether to have an optimistic locking field using {{@Version}} instead > of ZooKeeper based pessimistic locking would increase High Availability > characteristics > * refactor also SLA related entity classes > It's necessary to have performance benchmarks with some database types like > MySQL/MariaDB, and PostgreSQL before and after the changes for following use > cases: > * {{CoordinatorJobBean}} and {{WorkflowJobBean}} instances up to millions > * {{CoordinatorActionBean}} and {{WorkflowActionBean}} instances up to tens > of millions > * performance for JPQLs that get a list of entities > * performance of persisting a new entity > * performance of querying lists of entities based on popular / possible > filters like the ones used by {{VxJobsServlet}} -- This message was sent by Atlassian JIRA (v7.6.14#76016)