[ https://issues.apache.org/jira/browse/PHOENIX-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020365#comment-16020365 ]

Sergey Soldatov commented on PHOENIX-3871:
------------------------------------------

We collect statistics on all compactions that have COMPACT_DROP_DELETES. These are
not only major compactions but also minor compactions, if one of the default
compaction policies is used (RatioBasedCompactionPolicy marks a compaction
as major if all store file candidates are included in it). Running
statistics collection on upserts sounds like overkill.
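To illustrate how a compaction that starts out as minor can still end up dropping deletes (and therefore trigger stats collection), here is a simplified Python model. It is a sketch under stated assumptions, not HBase or Phoenix code: `select_files`, `collects_stats` and `Selection` are hypothetical names, the ratio rule is reduced to "skip an older file while it is larger than `ratio` times the combined size of all newer files", and the HBase max-files cap, min/max compact sizes and off-peak ratio are ignored. The defaults `ratio=1.2` and `min_files=3` are assumed to mirror HBase's usual settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Selection:
    files: List[int]    # sizes of the selected store files
    drop_deletes: bool  # True only when every store file was selected


def select_files(sizes: List[int], ratio: float = 1.2, min_files: int = 3) -> Selection:
    """Hypothetical, simplified ratio-based selection.

    `sizes` holds store file sizes ordered oldest first. Older files are
    skipped while they are larger than `ratio` times the total size of all
    newer files. Not HBase's actual implementation.
    """
    start = 0
    while len(sizes) - start >= min_files and sizes[start] > ratio * sum(sizes[start + 1:]):
        start += 1
    selected = sizes[start:]
    if len(selected) < min_files:
        selected = []  # too few files: no compaction is requested
    # If the selection covers every store file, the compaction is treated as
    # major and runs with drop-deletes semantics.
    all_files = bool(selected) and len(selected) == len(sizes)
    return Selection(files=selected, drop_deletes=all_files)


def collects_stats(sel: Selection) -> bool:
    # Modeled assumption: stats are gathered only on drop-deletes compactions.
    return sel.drop_deletes


if __name__ == "__main__":
    # One large old file is skipped, so only a partial (minor) compaction runs.
    print(collects_stats(select_files([100, 10, 10, 10])))
    # Similar-sized files are all selected, so deletes are dropped and stats run.
    print(collects_stats(select_files([20, 15, 10])))
```

In this model the second call collects statistics even though nobody requested a major compaction, which is the situation the comment describes.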

> Incremental stats collection
> ----------------------------
>
>                 Key: PHOENIX-3871
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3871
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Eli Levine
>
> Phoenix automatically gathers statistics at [major compaction 
> time|http://phoenix.apache.org/update_statistics.html]. While this is useful 
> and accurate, it also means that statistics can become stale due to the 
> infrequency of major compactions (can be days between major compactions), 
> reducing their usefulness. 
> This jira asks the question: Is it possible for Phoenix to collect 
> statistics at a more granular level, say for every (or a sampling of) UPSERT, 
> or minor compaction. Since statistics are always approximations, it is OK for 
> this incremental approach to not be 100% accurate.
> The current stats collection mechanism at major compaction time should be 
> kept to accurately "fix up" stats at major compaction time.
> [~jamestaylor], FYI. We talked about this in person a few weeks ago. Creating 
> this Jira for posterity. Please add anything that I missed. Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
