[ https://issues.apache.org/jira/browse/PHOENIX-7764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18059117#comment-18059117 ]

Emil Kleszcz commented on PHOENIX-7764:
---------------------------------------

Thanks a lot for the suggestion, [~tkhurana]. This was very helpful.

I tested this on our QA cluster (HBase 2.5.x / Phoenix 5.2.x) and can confirm 
that setting
_ALTER TABLE <table> SET "phoenix.table.ttl.enabled" = false_
does have a real and immediate effect.

After applying the flag:
 * The property is persisted at the HBase table descriptor level ({_}METADATA 
=> 'phoenix.table.ttl.enabled' = false{_}).

 * Regions are briefly closed and reopened, which is consistent with a 
table-descriptor update (an expected side effect, I assume).

 * Major compactions that previously took a very long time on the first CF now 
complete much faster and behave comparably to (or the same as) non-Phoenix CFs.

 * Tombstones are once again physically removed by major compaction, which 
matters for us because spikes of deletes in the table require us to run major 
compactions frequently.

 * Normal Phoenix operations (UPSERT, DELETE, scans, GROUP BY, MIN/MAX) 
continued to work correctly in spot checks.
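
As a side note, the descriptor change in the first bullet can be spot-checked over captured _describe_ output from the hbase shell. The helper below is only an illustrative sketch; the descriptor line format it assumes may differ slightly from real shell output:
{code:java}
import java.util.List;

// Illustrative sketch: scan captured `describe '<table>'` output from the
// hbase shell for the persisted flag. The quoting/format of the descriptor
// line here is an assumption and may not match real shell output exactly.
public class TtlFlagCheck {
    static boolean isTtlDisabled(List<String> describeOutput) {
        for (String line : describeOutput) {
            // Normalize whitespace before matching key => value.
            if (line.replace(" ", "").contains("'phoenix.table.ttl.enabled'=>'false'")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "TABLE_ATTRIBUTES => { METADATA => {",
            "  'phoenix.table.ttl.enabled' => 'false' } }");
        System.out.println(isTtlDisabled(sample)); // prints: true
    }
}
{code}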

In our production use case:
 * The table does not use Phoenix TTL ({_}PHOENIX_TTL{_} and _PHOENIX_TTL_HWM_ 
are NULL).

 * There are no TTL views.

 * Deletes are explicit user deletes, and we rely on HBase major compactions to 
reclaim space.

Given this, disabling _phoenix.table.ttl.enabled_ appears to restore sane 
compaction behavior for large, delete-heavy tables where TTL is not used. The 
only operational side effect observed so far is the one-time region reopen when 
the property is applied.

One clarification question. For a Phoenix table that does not use TTL at all, 
is it safe to keep _phoenix.table.ttl.enabled = false_ permanently in 
production?
Are there any less obvious side effects (e.g. on statistics, consistency, or 
future upgrades) that operators should be aware of when leaving this disabled 
long-term?
I could not find anything documented about this in the upstream documentation.

Regarding Phoenix 5.3: from reading the current source, the flag still exists 
and the TTL/compaction logic remains guarded by it. I don't see an indication 
that its semantics change in 5.3, but please correct me if that assumption is 
wrong. I was checking: 
https://github.com/apache/phoenix/blob/5.3/phoenix-core-client/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java#L480
There I can also see another flag for Phoenix compactions, 
_PHOENIX_COMPACTION_ENABLED_; according to the _isPhoenixCompactionEnabled_ 
method, setting it to false disables Phoenix's compaction-time processing: 
https://github.com/apache/phoenix/blob/5.3/phoenix-core-client/src/main/java/org/apache/phoenix/util/ScanUtil.java#L1217
I don't see this option in 5.2. Could you help clarify how these two flags 
relate, so I know how to proceed with the migration to 5.3 later on?
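
To make my reading of the 5.3 source concrete, here is a small standalone model (not Phoenix code) of how I currently understand the flags: both default to true, so Phoenix-level compaction work happens unless one is explicitly disabled. The key string for _PHOENIX_COMPACTION_ENABLED_ and the AND-combination of the two flags are my assumptions from reading the code; please correct me if either is wrong:
{code:java}
import java.util.Properties;

// Standalone model of my reading of the 5.3 flags; this is NOT Phoenix code.
// "phoenix.table.ttl.enabled" is the descriptor key discussed above; the key
// name for PHOENIX_COMPACTION_ENABLED below is an assumption to be verified
// in QueryServicesOptions, as is the AND-combination of the two flags.
public class CompactionGateSketch {
    static final String TTL_KEY = "phoenix.table.ttl.enabled";
    static final String COMPACTION_KEY = "phoenix.compaction.enabled"; // assumed key name

    // Both flags default to true, so Phoenix wraps major compactions
    // unless at least one is explicitly set to false.
    static boolean phoenixWrapsCompaction(Properties conf) {
        boolean ttlEnabled = Boolean.parseBoolean(conf.getProperty(TTL_KEY, "true"));
        boolean compactionEnabled = Boolean.parseBoolean(conf.getProperty(COMPACTION_KEY, "true"));
        return ttlEnabled && compactionEnabled;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(phoenixWrapsCompaction(conf)); // prints: true (both default on)
        conf.setProperty(TTL_KEY, "false");
        System.out.println(phoenixWrapsCompaction(conf)); // prints: false
    }
}
{code}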

Thanks again, this insight directly unblocked us operationally.

> Phoenix UngroupedAggregateRegionObserver causes extremely slow HBase major 
> compactions by forcing statistics recomputation
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-7764
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-7764
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 5.2.1
>            Reporter: Emil Kleszcz
>            Priority: Major
>
> On HBase 2.5.10 with Phoenix 5.2.1, major compactions become _orders of 
> magnitude slower_ when the Phoenix coprocessor
> _org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver_ is enabled 
> in a given table (by default).
> Compactions that normally complete in minutes instead run for tens of hours, 
> even when compacting only a few GB per column family.
> Thread dumps and logs show that Phoenix wraps HBase compaction with its own 
> scanner chain and recomputes Phoenix statistics (guideposts) during 
> compaction, dominating runtime.
> This makes large Phoenix tables effectively unmaintainable under heavy delete 
> or split workloads.
> *Environment*
>  * HBase: 2.5.10
>  * Phoenix: 5.2.1
>  * Hadoop: 3.3.6
>  * JDK: 11.0.24
>  * Table: multi-CF (A/B/C/D), billions of rows, heavy deletes
> *Observed behavior*
> Major compactions on CF A routinely take 20–30 hours for ~4–6 GB of 
> compressed region data (depending on the number of tombstones, number of 
> cells, and cell sizes):
> {code:java}
> Completed major compaction ... store A ... into size=3.9 G
> This selection was in queue for 58hrs, and took 27hrs, 14mins to 
> execute.{code}
> At the same time, compactions on other CFs of similar or larger size complete 
> in minutes.
> *Evidence: Phoenix on compaction hot path*
> 1. *Thread dumps during compaction*
> All long-running compaction threads are executing Phoenix code:
> {code:java}
> org.apache.phoenix.coprocessor.CompactionScanner$PhoenixLevelRowCompactor.compactRegionLevel
> org.apache.phoenix.schema.stats.StatisticsScanner.next
> org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction{code}
> 2. *RegionServer logs*
> {code:java}
> Starting CompactionScanner ... store A major compaction
> Closing CompactionScanner ... retained N of N cells phoenix level only
> {code}
> This shows Phoenix intercepting the HBase compaction and running a 
> Phoenix-level scan.
> 3. *HFile inspection*
> Large store files show hundreds of millions of delete markers and billions of 
> entries.
> Phoenix statistics recomputation during compaction requires scanning and 
> processing all rows, which dominates runtime.
> *Controlled experiment*
>  * Removing only _UngroupedAggregateRegionObserver_ from the table:
>  ** CF A major compactions complete in minutes (comparable to other CFs).
>  ** Normal point lookups, scans, joins still work.
>  ** Phoenix statistics collection still enabled globally.
>  * Side effect:
>  ** Ungrouped aggregate queries ({_}COUNT( * ){_}, {_}MIN/MAX{_}, _SUM_ 
> without {_}GROUP BY{_}) fail, because Phoenix does not fall back to 
> client-side aggregation and still plans {_}SERVER AGGREGATE INTO SINGLE 
> ROW{_}.
> This confirms:
>  * The coprocessor is the source of extreme compaction slowdown.
>  * Phoenix tightly couples aggregate execution and compaction-time statistics 
> recomputation.
> *Problem*
>  * Phoenix performs expensive statistics work during HBase major compaction, 
> a critical maintenance operation.
>  * This work is opaque, unavoidable, and not configurable.
>  * Large Phoenix tables with deletes/splits can remain under compaction for 
> weeks, causing:
>  ** prolonged compaction backlogs,
>  ** blocked balancing,
>  ** unpredictable query latency spikes.
> *Expected*
> One of the following (any would be acceptable):
> # A configuration to disable Phoenix statistics recomputation during 
> compaction.
> # A way to decouple {{UngroupedAggregateRegionObserver}} from compaction-time 
> scanning.
> # Clear documentation that Phoenix majorly alters HBase compaction cost, with 
> guidance for large tables.
> # A fix so Phoenix falls back to client-side aggregation when the coprocessor 
> is absent (so operators can safely remove it).
> At minimum, confirmation whether this behavior is expected and unavoidable in 
> Phoenix 5.2.x on HBase 2.5.x.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
