Emil Kleszcz created PHOENIX-7764:
-------------------------------------
Summary: Phoenix UngroupedAggregateRegionObserver causes extremely
slow HBase major compactions by forcing statistics recomputation
Key: PHOENIX-7764
URL: https://issues.apache.org/jira/browse/PHOENIX-7764
Project: Phoenix
Issue Type: Improvement
Affects Versions: 5.2.1
Reporter: Emil Kleszcz
On HBase 2.5.10 with Phoenix 5.2.1, major compactions become _orders of
magnitude slower_ when the Phoenix coprocessor
_org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver_ is attached to
a table (as it is by default).
Compactions that normally complete in minutes instead run for tens of hours,
even when compacting only a few GB per column family.
Thread dumps and logs show that Phoenix wraps HBase compaction with its own
scanner chain and recomputes Phoenix statistics (guideposts) during compaction,
dominating runtime.
This makes large Phoenix tables effectively unmaintainable under heavy delete
or split workloads.
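For context on the "scanner chain" above: in HBase 2.x a RegionObserver gets onto
the compaction path through the {{preCompact}} hook by returning a wrapping
scanner. The sketch below is only a minimal illustration of that mechanism; the
class name is invented and this is not Phoenix's actual
{{CompactionScanner}}/{{UngroupedAggregateRegionObserver}} code.
{code:java}
import java.io.IOException;
import java.util.List;
import java.util.Optional;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.RegionObserver;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.regionserver.ScanType;
import org.apache.hadoop.hbase.regionserver.ScannerContext;
import org.apache.hadoop.hbase.regionserver.Store;
import org.apache.hadoop.hbase.regionserver.compactions.CompactionLifeCycleTracker;
import org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest;

// Illustrative coprocessor that wraps the compaction scanner (hypothetical class).
// Every cell the compaction reads flows through the wrapper, so any per-cell work
// done here adds directly to compaction runtime.
public class CompactionWrappingObserver implements RegionCoprocessor, RegionObserver {

  @Override
  public Optional<RegionObserver> getRegionObserver() {
    return Optional.of(this);
  }

  @Override
  public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Store store, InternalScanner scanner, ScanType scanType,
      CompactionLifeCycleTracker tracker, CompactionRequest request) throws IOException {
    return new InternalScanner() {
      @Override
      public boolean next(List<Cell> result, ScannerContext scannerContext)
          throws IOException {
        boolean more = scanner.next(result, scannerContext);
        // per-cell bookkeeping (e.g. statistics collection) would happen here,
        // which is why the cost scales with the total number of cells compacted
        return more;
      }

      @Override
      public void close() throws IOException {
        scanner.close();
      }
    };
  }
}
{code}
Because the wrapper sits between the compactor and the store files, whatever
per-cell work it performs scales with the number of cells in the region, which
matches the behavior described above.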
*Environment*
* HBase: 2.5.10
* Phoenix: 5.2.1
* Hadoop: 3.3.6
* JDK: 11.0.24
* Table: multi-CF (A/B/C/D), billions of rows, heavy deletes
*Observed behavior*
Major compactions on CF A routinely take 20–30 hours for ~4–6 GB of compressed
region data (depending on the number of tombstones, number of cells, and cell
sizes):
{code:java}
Completed major compaction ... store A ... into size=3.9 G
This selection was in queue for 58hrs, and took 27hrs, 14mins to execute.{code}
At the same time, compactions on other CFs of similar or larger size complete
in minutes.
*Evidence: Phoenix on compaction hot path*
1. *Thread dumps during compaction*
All long-running compaction threads are executing Phoenix code:
{code:java}
org.apache.phoenix.coprocessor.CompactionScanner$PhoenixLevelRowCompactor.compactRegionLevel
org.apache.phoenix.schema.stats.StatisticsScanner.next
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction{code}
2. *RegionServer logs*
{code:java}
Starting CompactionScanner ... store A major compaction
Closing CompactionScanner ... retained N of N cells phoenix level only
{code}
This shows Phoenix intercepting the HBase compaction and running a
Phoenix-level scan.
3. *HFile inspection*
Large store files show hundreds of millions of delete markers and billions of
entries.
Phoenix statistics recomputation during compaction requires scanning and
processing all rows, which dominates runtime.
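For reference, the guideposts Phoenix maintains are stored in the
{{SYSTEM.STATS}} catalog table and can also be recomputed explicitly with
{{UPDATE STATISTICS}}, outside of compaction. A hedged JDBC sketch follows; the
ZooKeeper quorum in the URL and the table name {{MY_TABLE}} are placeholders.
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GuidepostStatsCheck {
  public static void main(String[] args) throws Exception {
    // Phoenix thick driver; quorum and table name are placeholders.
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181");
         Statement stmt = conn.createStatement()) {

      // Count the guideposts Phoenix currently holds for the table; this is the
      // statistics state that gets rebuilt when Phoenix rescans all rows.
      try (ResultSet rs = stmt.executeQuery(
          "SELECT COUNT(*) FROM SYSTEM.STATS WHERE PHYSICAL_NAME = 'MY_TABLE'")) {
        if (rs.next()) {
          System.out.println("guideposts for MY_TABLE: " + rs.getLong(1));
        }
      }

      // Statistics can also be refreshed explicitly, outside of compaction.
      stmt.execute("UPDATE STATISTICS MY_TABLE");
    }
  }
}
{code}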
*Controlled experiment*
* Removing only _UngroupedAggregateRegionObserver_ from the table (one way to do
this is sketched after this list):
** CF A major compactions complete in minutes (comparable to other CFs).
** Normal point lookups, scans, joins still work.
** Phoenix statistics collection still enabled globally.
* Side effect:
** Ungrouped aggregate queries ({_}COUNT( * ){_}, {_}MIN/MAX{_}, _SUM_ without
{_}GROUP BY{_}) fail, because Phoenix does not fall back to client-side
aggregation and still plans {_}SERVER AGGREGATE INTO SINGLE ROW{_} (see the
EXPLAIN sketch below).
This experiment confirms:
* The coprocessor is the source of extreme compaction slowdown.
* Phoenix tightly couples aggregate execution and compaction-time statistics
recomputation.
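To make the coupling concrete: an {{EXPLAIN}} over the Phoenix JDBC driver shows
the ungrouped aggregate still planning the server-side single-row aggregate
step, which is exactly what fails once the coprocessor is removed. A minimal
sketch (JDBC URL and table name are placeholders):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ExplainUngroupedAggregate {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3:2181");
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("EXPLAIN SELECT COUNT(*) FROM MY_TABLE")) {
      while (rs.next()) {
        // With the coprocessor present the plan includes a line such as
        // "SERVER AGGREGATE INTO SINGLE ROW"; with it removed, the same query
        // fails at execution time instead of falling back to the client.
        System.out.println(rs.getString(1));
      }
    }
  }
}
{code}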
*Problem*
* Phoenix performs expensive statistics work during HBase major compaction, a
critical maintenance operation.
* This work is opaque, unavoidable, and not configurable.
* Large Phoenix tables with deletes/splits can remain under compaction for
weeks, causing:
** prolonged compaction backlogs,
** blocked balancing,
** unpredictable query latency spikes.
*Expected*
One of the following (any would be acceptable):
# A configuration to disable Phoenix statistics recomputation during compaction.
# A way to decouple {{UngroupedAggregateRegionObserver}} from compaction-time
scanning.
# Clear documentation that Phoenix significantly increases HBase major compaction
cost, with guidance for large tables.
# A fix so Phoenix falls back to client-side aggregation when the coprocessor
is absent (so operators can safely remove it).
At minimum, confirmation of whether this behavior is expected and unavoidable in
Phoenix 5.2.x on HBase 2.5.x.