[ 
https://issues.apache.org/jira/browse/HIVE-11444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15956006#comment-15956006
 ] 

Eugene Koifman commented on HIVE-11444:
---------------------------------------

More generally, raise an alert
1. if there are too many open txns
2. if there are too many aborted txns - most likely caused by a misconfigured 
streaming ingest client.  The alert needs to include client info.
3. if there are a lot of entries in TXN_COMPONENTS - this means the compactor 
is not keeping up

In extreme cases these conditions can make the amount of metadata large enough 
to slow down metastore operations (TxnHandler/CompactionTxnHandler) and to use 
very large amounts of RAM (ValidTxnList).
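
A rough sketch of what the three checks could look like once the counts have 
been read from the metastore. Nothing here is existing Hive code: the class 
name, the threshold values and the "client" argument are assumptions made up 
for illustration, and real thresholds would presumably be configurable.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hive: turns raw counts into alert messages. */
final class TxnHealthCheck {
    // Assumed thresholds, chosen only for illustration.
    static final long MAX_OPEN_TXNS = 10_000;
    static final long MAX_ABORTED_TXNS = 1_000;
    static final long MAX_TXN_COMPONENTS = 100_000;

    /**
     * @param openTxns          number of currently open txns
     * @param abortedTxns       number of aborted txns not yet cleaned up
     * @param txnComponents     number of rows in TXN_COMPONENTS
     * @param topAbortingClient client info to attach to the aborted-txn alert
     * @return one message per threshold that was exceeded (possibly empty)
     */
    static List<String> check(long openTxns, long abortedTxns,
                              long txnComponents, String topAbortingClient) {
        List<String> alerts = new ArrayList<>();
        if (openTxns > MAX_OPEN_TXNS) {
            alerts.add("Too many open txns: " + openTxns);
        }
        if (abortedTxns > MAX_ABORTED_TXNS) {
            // Client info is included because a misconfigured ingest client is the likely cause.
            alerts.add("Too many aborted txns: " + abortedTxns
                + " (client: " + topAbortingClient + ")");
        }
        if (txnComponents > MAX_TXN_COMPONENTS) {
            alerts.add("TXN_COMPONENTS has " + txnComponents
                + " entries; compactor may not be keeping up");
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Example: only the aborted-txn threshold is exceeded.
        for (String alert : check(12, 5_000, 50, "ingest-host-17")) {
            System.out.println("WARN " + alert);
        }
    }
}
```

Returning the messages rather than logging them directly keeps the check easy 
to test and leaves the choice of log level (warn vs. error) to the caller.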


> ACID Compactor should generate stats/alerts
> -------------------------------------------
>
>                 Key: HIVE-11444
>                 URL: https://issues.apache.org/jira/browse/HIVE-11444
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> Compaction should generate stats about number of files it reads, min/max/avg 
> size etc.  It should also generate alerts if it looks like the system is not 
> configured correctly.
> For example, if there are lots of delta directories with very small files, 
> it's a good sign that the Streaming API is configured with batches that are 
> too small.
> Simplest idea is to add another periodic task to AcidHouseKeeperService to
> periodically do
>         select count(*), min(txnid), max(txnid), type from txns group by type
> and then
>         1. dump that to the log file at info
>         2. could also keep counts for last 10 min, hour, 6 hours, 24 hours, etc
>         2.2 if a large increase is detected - issue an alert (at least to the 
>             log for now) at warn/error
> Should also alert if there is ACID activity but no compactions running.
> One way to do this is to add logic to TxnHandler to periodically check the 
> contents of the COMPACTION_QUEUE table and keep a simple histogram of 
> compactions over the last few hours.
> Similarly, it can run a periodic check of transactions started (or 
> committed/aborted) and keep a simple histogram.  Then the two can be used to 
> detect that there is ACID write activity but no compaction activity.
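
The two histograms described above can be sketched as a single fixed-size 
window of samples. This is only an illustration: the class and method names 
are hypothetical, and it assumes some periodic task supplies, for each 
interval, the number of txns started and the number of compactions seen in 
COMPACTION_QUEUE.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch, not part of Hive: rolling window of txn and compaction counts. */
final class AcidActivityHistogram {
    private final int maxBuckets;
    // Each bucket holds {txnsStarted, compactionsRun} for one sampling interval.
    private final Deque<long[]> buckets = new ArrayDeque<>();

    AcidActivityHistogram(int maxBuckets) {
        if (maxBuckets <= 0) {
            throw new IllegalArgumentException("maxBuckets must be positive");
        }
        this.maxBuckets = maxBuckets;
    }

    /** Records one sampling interval, dropping the oldest one when the window is full. */
    void addSample(long txnsStarted, long compactionsRun) {
        if (buckets.size() == maxBuckets) {
            buckets.removeFirst();
        }
        buckets.addLast(new long[] {txnsStarted, compactionsRun});
    }

    /** True if the window saw at least minTxns txns but no compactions at all. */
    boolean writesWithoutCompaction(long minTxns) {
        long txns = 0;
        long compactions = 0;
        for (long[] bucket : buckets) {
            txns += bucket[0];
            compactions += bucket[1];
        }
        return txns >= minTxns && compactions == 0;
    }

    public static void main(String[] args) {
        AcidActivityHistogram h = new AcidActivityHistogram(6); // e.g. six hourly buckets
        h.addSample(400, 0);
        h.addSample(350, 0);
        if (h.writesWithoutCompaction(500)) {
            System.out.println("WARN ACID write activity but no compactions in window");
        }
    }
}
```

The minTxns cutoff is there so that an idle system with no compactions does 
not trigger the alert; what value makes sense would depend on the workload.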


