[ 
https://issues.apache.org/jira/browse/AMBARI-25576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmytro Grinenko updated AMBARI-25576:
-------------------------------------
    Assignee: Dmytro Grinenko
      Status: Patch Available  (was: Open)

> Primary key duplication error during flushing alerts from alerts cache
> ----------------------------------------------------------------------
>
>                 Key: AMBARI-25576
>                 URL: https://issues.apache.org/jira/browse/AMBARI-25576
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.7.5
>            Reporter: Dmytro Vitiuk
>            Assignee: Dmytro Grinenko
>            Priority: Major
>             Fix For: 2.7.6
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Commit errors sometimes occur on clusters with many hosts and alert 
> caching enabled:
> {code:java}
> 2020-10-09 19:53:14,444 ERROR [alert-event-bus-4] 
> AmbariJpaLocalTxnInterceptor:180 - [DETAILED ERROR] Rollback reason: 
> Local Exception Stack: 
> Exception [EclipseLink-4002] (Eclipse Persistence Services - 
> 2.6.2.v20151217-774c696): org.eclipse.persistence.exceptions.DatabaseException
> Internal Exception: java.sql.BatchUpdateException: Batch entry 1 INSERT INTO 
> alert_history (alert_id, alert_instance, alert_label, alert_state, 
> alert_text, alert_timestamp, cluster_id, component_name, host_name, 
> service_name, alert_definition_id) VALUES (15363461, NULL, 'DataNode Web UI', 
> 'OK', 'HTTP 200 response in 0.000s', 1602286496756, 2, 'DATANODE', 'host1', 
> 'HDFS', 53) was aborted: ERROR: duplicate key value violates unique 
> constraint "pk_alert_history"
>   Detail: Key (alert_id)=(15363461) already exists.  Call getNextException to 
> see other errors in the batch.
> Error Code: 0
> Call: INSERT INTO alert_history (alert_id, alert_instance, alert_label, 
> alert_state, alert_text, alert_timestamp, cluster_id, component_name, 
> host_name, service_name, alert_definition_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?)
>       bind => [11 parameters bound]
> {code}
> This issue does not occur often, but when it does it produces extensive 
> logging. It can also cause other rare problems, so it should be fixed.
>  The cause is a shared cache that can be updated with a just-merged value 
> before that value is actually committed to the DB. Another thread (from 
> CachedAlertFlushService or AlertEventPublisher) can then try to merge the 
> already merged entity again. 
>  For example, suppose we have created a new AlertHistoryEntity and set it on 
> an existing AlertCurrentEntity. A first thread starts a transaction, merges 
> the current entity into the persistence context, saves the merged value to 
> the cache, and pauses. A second thread then tries to merge the entire 
> contents of the cache, including the just-updated current entity. Now two 
> transactions both believe they should update the current entity and create 
> the new history entity, so one of them fails with a duplicate-key error.
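
The interleaving described above can be replayed in a small, single-threaded sketch. This is only an illustrative model, not Ambari code and not necessarily the approach taken in the attached patch: the class AlertCacheRaceSketch, the HistoryTable and CachedAlert types, the cache key format, and the "publish to the cache only after commit" ordering are all assumptions made for the example. The two "threads" are run as explicit steps on one thread so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AlertCacheRaceSketch {

    /** Hypothetical stand-in for the alert_history table and its primary key. */
    static final class HistoryTable {
        private final Set<Long> alertIds = new HashSet<>();

        /** Returns false when the key already exists (the duplicate-key case). */
        boolean insert(long alertId) {
            return alertIds.add(alertId);
        }
    }

    /** Hypothetical cache entry: the history id attached to a current alert. */
    static final class CachedAlert {
        final long historyId;
        final boolean committed;

        CachedAlert(long historyId, boolean committed) {
            this.historyId = historyId;
            this.committed = committed;
        }
    }

    /**
     * Replays the writer/flusher interleaving step by step.
     * Returns the number of inserts rejected as duplicates.
     */
    static int replay(boolean publishBeforeCommit) {
        HistoryTable table = new HistoryTable();
        Map<String, CachedAlert> cache = new LinkedHashMap<>();
        String key = "host1/DATANODE/DataNode Web UI"; // assumed key format
        long historyId = 15363461L;
        int duplicates = 0;

        // Writer, step 1: merge inside an open transaction; the insert is pending.
        List<Long> writerPending = new ArrayList<>();
        writerPending.add(historyId);
        if (publishBeforeCommit) {
            // Problematic order: the uncommitted value becomes visible to others.
            cache.put(key, new CachedAlert(historyId, false));
        }

        // Flusher runs while the writer is paused: it queues every cached
        // entry that is not known to be committed.
        List<Long> flusherPending = new ArrayList<>();
        for (CachedAlert alert : cache.values()) {
            if (!alert.committed) {
                flusherPending.add(alert.historyId);
            }
        }

        // Writer, step 2: commit, then (in the safe order) publish to the cache.
        for (long id : writerPending) {
            if (!table.insert(id)) {
                duplicates++;
            }
        }
        if (!publishBeforeCommit) {
            cache.put(key, new CachedAlert(historyId, true));
        }

        // Flusher commits whatever it queued earlier.
        for (long id : flusherPending) {
            if (!table.insert(id)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println("publish before commit: " + replay(true) + " duplicate(s)");
        System.out.println("publish after commit:  " + replay(false) + " duplicate(s)");
    }
}
```

In this model, publishing the merged value before the commit lets the flusher queue the same history id, and its insert is rejected as a duplicate. Publishing only after the commit leaves nothing for the flusher to re-insert. A real fix would also have to consider locking and transaction boundaries, which this sketch does not cover.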



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
