[ 
https://issues.apache.org/jira/browse/OOZIE-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated OOZIE-3336:
----------------------------------
    Fix Version/s:     (was: 5.2.0)
                   5.3.0

> [persistence] Refactor entity classes to feature PK, FK, and UQ constraints
> ---------------------------------------------------------------------------
>
>                 Key: OOZIE-3336
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3336
>             Project: Oozie
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 5.0.0
>            Reporter: Andras Piros
>            Priority: Major
>             Fix For: 5.3.0
>
>
> When an Oozie database grows substantially in size, say, beyond a few 
> hundred thousand {{WorkflowActionBean}} or {{CoordinatorActionBean}} 
> instances, we face a couple of performance issues. Here is an analysis of why.
> The current Oozie JPA {{@Entity}} usage, and the resulting database DDL, 
> suffer from a couple of drawbacks from a performance point of view:
> * {{@Id}} fields are {{String}}:
> ** leaving no room for database primary key indices to work effectively
> ** those values are computed for {{WorkflowActionBean}}, 
> {{CoordinatorActionBean}}, and {{BundleActionBean}} instances
> * no foreign key constraint is set from {{WorkflowActionBean}} to 
> {{WorkflowJobBean}}, from {{CoordinatorActionBean}} to 
> {{CoordinatorJobBean}}, or from {{BundleActionBean}} to {{BundleJobBean}} 
> instances:
> ** JPA queries that discover parent-child relationships have to be assessed 
> by hand
> ** no database indices are created, and hence queries that contain a 
> {{JOIN}} are slower
> * no use of unique constraints whatsoever
> * JPA queries are created by hand instead of relying on OpenJPA
> * JPA entities are filled by hand instead of relying on OpenJPA
> The following enhancements are necessary:
> # keeping the existing {{String compositeId}} fields, let's break their 
> contents down into the following new fields:
> ## {{@Id long id}} - an auto-increment value that is unique across the Oozie 
> database
> ## {{long currentSequence}} - the sequence number of the current run since 
> the last Oozie server restart. The first part of the {{compositeId}}
> ## {{Timestamp serverStartupTimestamp}} - the timestamp when the Oozie server 
> was last started. The second part of the {{compositeId}}
> ## {{String serverName}} - the third part of the {{compositeId}}
> ## {{String name}} - the fourth and last part of the {{compositeId}}
> ## {{compositeId}} might be calculated when an entity is loaded / persisted, 
> and then stored
> # FK constraints:
> ## {{@OneToMany}} fields where we have a list of child references inside 
> parent
> ## {{@ManyToOne}} fields where we have a parent reference inside child
> ## pay attention to {{FetchType}}, most of the time {{LAZY}} will be needed
> ## the containment fields should not be {{@Transient}} anymore
> # UQ constraints:
> ## on {{currentSequence}} and {{serverStartupTimestamp}}
> ## on {{currentSequence}} and {{name}}
> # new JPQL queries:
> ## to cover changed parent-child relationships
> ## to make use of each disassembled part of {{originalId}} when doing e.g. 
> filtering
> # let JPA fill entities instead of performing this by hand
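
For illustration only, the breakdown of {{compositeId}} listed above could be sketched in plain Java as follows. The ID layout (four parts joined by hyphens, a zero-padded seven-digit sequence, and a server name that may itself contain hyphens), the sample ID, and the class name CompositeIdParts are assumptions made for this sketch, not something the issue specifies. The startup timestamp is kept as a raw string here, whereas the proposal would store it as a {{Timestamp}}.

```java
/** Hypothetical holder for the four proposed parts of a compositeId. */
final class CompositeIdParts {
    final long currentSequence;           // first part
    final String serverStartupTimestamp;  // second part, kept as raw text in this sketch
    final String serverName;              // third part, may contain hyphens (assumption)
    final String name;                    // fourth and last part

    CompositeIdParts(long currentSequence, String serverStartupTimestamp,
                     String serverName, String name) {
        this.currentSequence = currentSequence;
        this.serverStartupTimestamp = serverStartupTimestamp;
        this.serverName = serverName;
        this.name = name;
    }

    /** Splits an ID of the assumed form sequence-timestamp-serverName-name. */
    static CompositeIdParts parse(String compositeId) {
        int first = compositeId.indexOf('-');
        int second = compositeId.indexOf('-', first + 1);
        int last = compositeId.lastIndexOf('-');
        // At least three hyphens are required to obtain four parts.
        if (first < 0 || second < 0 || last <= second) {
            throw new IllegalArgumentException("Unexpected ID layout: " + compositeId);
        }
        return new CompositeIdParts(
                Long.parseLong(compositeId.substring(0, first)),
                compositeId.substring(first + 1, second),
                compositeId.substring(second + 1, last),
                compositeId.substring(last + 1));
    }

    /** Recomputes the string form, e.g. when an entity is loaded or persisted. */
    String toCompositeId() {
        return String.format("%07d-%s-%s-%s",
                currentSequence, serverStartupTimestamp, serverName, name);
    }
}
```

With the parts stored in separate columns, a filter on the sequence number or the server name would no longer need string matching on the full ID.
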
> The following enhancements can be considered nice-to-have:
> * upgrade to an OpenJPA version that features JPA 2.1's composite indexing 
> capability
> * evaluate whether an optimistic locking field using {{@Version}}, instead 
> of ZooKeeper-based pessimistic locking, would improve High Availability 
> characteristics
> * also refactor the SLA-related entity classes
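
As a rough illustration of the optimistic locking idea mentioned above, the sketch below simulates a version check in plain Java. It is not JPA and not Oozie code: the class OptimisticStore, its in-memory map, and the status values are hypothetical stand-ins. With a real {{@Version}} field the JPA provider performs the comparison itself and signals a stale write with an OptimisticLockException rather than a boolean.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a table whose rows carry a version column. */
final class OptimisticStore {

    /** Immutable snapshot of a row: a status plus the version it was read at. */
    static final class Row {
        final String status;
        final long version;

        Row(String status, long version) {
            this.status = status;
            this.version = version;
        }
    }

    private final Map<Long, Row> rows = new HashMap<>();

    synchronized void insert(long id, String status) {
        rows.put(id, new Row(status, 0L));
    }

    synchronized Row read(long id) {
        return rows.get(id);
    }

    /**
     * Applies the update only if the caller's version is still current,
     * similar in spirit to UPDATE ... WHERE id = ? AND version = ?.
     * Returns false when the row is missing or was changed in the meantime.
     */
    synchronized boolean update(long id, String newStatus, long expectedVersion) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale write, the caller has to re-read and retry
        }
        rows.put(id, new Row(newStatus, expectedVersion + 1));
        return true;
    }
}
```

Two writers that read the same version can both attempt an update, but only the first succeeds; the second has to re-read the row. No lock is held between the read and the write, which is the property the item above proposes to evaluate.
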
> Performance benchmarks are needed on database types such as MySQL/MariaDB 
> and PostgreSQL, before and after the changes, for the following use 
> cases:
> * {{CoordinatorJobBean}} and {{WorkflowJobBean}} instances up to millions
> * {{CoordinatorActionBean}} and {{WorkflowActionBean}} instances up to tens 
> of millions
> * performance of JPQL queries that return a list of entities
> * performance of persisting a new entity
> * performance of querying lists of entities based on popular / possible 
> filters like the ones used by {{VxJobsServlet}}
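
Taken together, the PK, FK, and UQ proposals in the description might map to entities roughly like the sketch below. It is a mapping fragment, not Oozie code: the class, table, and column names are hypothetical, the choice of which entity carries which unique constraint is a guess, and it needs the JPA 2.0 API (javax.persistence) on the classpath to compile.

```java
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.List;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import javax.persistence.Table;
import javax.persistence.UniqueConstraint;

// Hypothetical parent entity; all names are illustrative only.
@Entity
@Table(name = "WF_JOBS_SKETCH", uniqueConstraints = @UniqueConstraint(
        columnNames = {"current_sequence", "server_startup_ts"}))
class WorkflowJobSketch {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // numeric auto-increment PK
    private long id;

    @Column(name = "current_sequence", nullable = false)
    private long currentSequence;

    @Column(name = "server_startup_ts", nullable = false)
    private Timestamp serverStartupTimestamp;

    // Child references inside the parent, loaded lazily and no longer @Transient.
    @OneToMany(mappedBy = "job", fetch = FetchType.LAZY)
    private List<WorkflowActionSketch> actions = new ArrayList<>();
}

// Hypothetical child entity holding the foreign key to its parent.
@Entity
@Table(name = "WF_ACTIONS_SKETCH", uniqueConstraints = @UniqueConstraint(
        columnNames = {"current_sequence", "name"}))
class WorkflowActionSketch {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private long id;

    @Column(name = "current_sequence", nullable = false)
    private long currentSequence;

    @Column(name = "name", nullable = false)
    private String name;

    // Parent reference inside the child; the join column becomes the FK.
    @ManyToOne(fetch = FetchType.LAZY, optional = false)
    @JoinColumn(name = "job_id", nullable = false)
    private WorkflowJobSketch job;
}
```

Whether the provider actually emits the foreign key and its index in the generated DDL depends on the OpenJPA version and schema settings, so that would need to be checked against the databases being benchmarked.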



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
