[ https://issues.apache.org/jira/browse/PHOENIX-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020365#comment-16020365 ]

Sergey Soldatov commented on PHOENIX-3871:
------------------------------------------

We collect statistics on all compactions that have COMPACT_DROP_DELETES. Those are not only major compactions, but also minor compactions if one of the default compaction policies is used (RatioBasedCompactionPolicy sets the compaction as major if all storefile candidates get into the compaction). Running statistics collection on upserts sounds like overkill.

> Incremental stats collection
> ----------------------------
>
>                 Key: PHOENIX-3871
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3871
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Eli Levine
>
> Phoenix automatically gathers statistics at [major compaction
> time|http://phoenix.apache.org/update_statistics.html]. While this is useful
> and accurate, it also means that statistics can become stale due to the
> infrequency of major compactions (can be days between major compactions),
> reducing their usefulness.
> This jira asks the question: Is it possible for Phoenix to collect
> statistics at a more granular level, say for every (or a sampling of) UPSERT,
> or minor compaction? Since statistics are always approximations, it is OK for
> this incremental approach to not be 100% accurate.
> The current stats collection mechanism at major compaction time should be
> kept to accurately "fix up" stats at major compaction time.
> [~jamestaylor], FYI. We talked about this in person a few weeks ago. Creating
> this Jira for posterity. Please add anything that I missed. Thanks!

--
This message was sent by Atlassian JIRA (v6.3.15#6346)