[
https://issues.apache.org/jira/browse/HBASE-30062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell updated HBASE-30062:
----------------------------------------
    Fix Version/s: 4.0.0-alpha-1
                   2.7.0
                   3.0.0-beta-2
           Status: Patch Available  (was: Open)
> Device layer simulator for MiniDFSCluster-based tests
> -----------------------------------------------------
>
> Key: HBASE-30062
> URL: https://issues.apache.org/jira/browse/HBASE-30062
> Project: HBase
> Issue Type: New Feature
> Components: HFile, integration tests, test, wal
> Reporter: Andrew Kyle Purtell
> Assignee: Andrew Kyle Purtell
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2
>
>
> On EBS-backed deployments in AWS, or equivalents in other cloud
> infrastructure providers, HBase compaction and replication throughput can be
> constrained by per-volume IOPS limits rather than bandwidth. A faithful
> device-level simulator within the test harness allows developers to
> reproduce, analyze, and validate fixes for such performance issues without
> requiring actual cloud infrastructure.
> This proposed change adds a test-only EBS device layer that operates at the
> DataNode storage level within {{MiniDFSCluster}} by replacing the
> {{FsDatasetSpi}} implementation via Hadoop's pluggable factory mechanism.
> This allows HBase integration tests to simulate realistic cloud block storage
> characteristics, such as per-volume bandwidth budgets, IOPS limits,
> sequential IO coalescing, and per-IO device latency, enabling identification
> and reproduction of IO bottlenecks.
> The simulator wraps the real {{FsDatasetImpl}} with a reflection proxy that
> intercepts the three SPI methods where DataNode local IO actually engages the
> underlying block device, without compile-time coupling to internal Hadoop
> classes.
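The interception idea can be sketched with {{java.lang.reflect.Proxy}}. Note this is an illustrative sketch, not the patch itself: {{DatasetSpi}} below is a hypothetical stand-in for Hadoop's {{FsDatasetSpi}}, and the handler matches the SPI method by name at runtime, so the wrapper has no compile-time dependency on the real implementation class.

```java
import java.io.InputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical stand-in for Hadoop's FsDatasetSpi; only the method name is
// matched at runtime, so the wrapper has no compile-time coupling to it.
interface DatasetSpi {
  InputStream getBlockInputStream(long blockId);
}

class InterceptSketch {
  // Wraps a real dataset with a dynamic proxy that intercepts the SPI
  // methods of interest by name.
  static DatasetSpi wrap(DatasetSpi delegate) {
    InvocationHandler handler = (proxy, method, args) -> {
      Object result = method.invoke(delegate, args);
      if ("getBlockInputStream".equals(method.getName())) {
        // The real patch would wrap the stream in a throttling decorator
        // here; this sketch only demonstrates the interception point.
        System.out.println("intercepted " + method.getName());
      }
      return result;
    };
    return (DatasetSpi) Proxy.newProxyInstance(
        DatasetSpi.class.getClassLoader(),
        new Class<?>[] { DatasetSpi.class }, handler);
  }

  public static void main(String[] args) throws Exception {
    DatasetSpi real = blockId -> InputStream.nullInputStream();
    DatasetSpi proxied = wrap(real);
    // Reads go through the proxy; an empty stream returns -1.
    System.out.println(proxied.getBlockInputStream(42L).read());
  }
}
```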
> On the read path, {{getBlockInputStream}} wraps the returned {{InputStream}}
> with {{ThrottledBlockInputStream}}, charging every byte against the volume's
> bandwidth and IOPS budgets, with sequential IO coalescing. On the write path,
> {{submitBackgroundSyncFileRangeRequest}} charges {{nbytes}} against the
> bandwidth and IOPS budgets, modeling the async
> {{sync_file_range(SYNC_FILE_RANGE_WRITE)}} that the DataNode issues to flush
> dirty pages from the operating system's page cache to the block device; and
> {{finalizeBlock}} charges the remaining unflushed delta (minus bytes already
> charged via {{sync_file_range}}) against the budgets, modeling the
> {{fsync()}} at block finalization.
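A minimal sketch of the read-path idea, assuming a hypothetical {{VolumeBudget}} in place of the patch's actual token buckets (the real {{ThrottledBlockInputStream}} would block the caller when a budget is exhausted rather than merely count):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical per-volume budget; the real simulator would use refilling
// token buckets and block the caller when the budget is exhausted.
class VolumeBudget {
  long bytesCharged;
  void charge(long n) { bytesCharged += n; }
}

// Read-path decorator: every byte read from the block is charged against
// the owning volume's budget before being returned to the caller.
class ThrottledStreamSketch extends FilterInputStream {
  private final VolumeBudget budget;

  ThrottledStreamSketch(InputStream in, VolumeBudget budget) {
    super(in);
    this.budget = budget;
  }

  @Override public int read() throws IOException {
    int b = super.read();
    if (b >= 0) budget.charge(1);
    return b;
  }

  @Override public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n > 0) budget.charge(n);
    return n;
  }

  public static void main(String[] args) throws IOException {
    VolumeBudget budget = new VolumeBudget();
    InputStream in = new ThrottledStreamSketch(
        new ByteArrayInputStream(new byte[8192]), budget);
    while (in.read(new byte[4096]) > 0) { /* drain the stream */ }
    System.out.println(budget.bytesCharged); // 8192: every byte was charged
  }
}
```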
> Each proxy gets its own set of {{EBSVolumeDevice}} instances with independent
> budgets. Block-to-volume resolution uses {{delegate.getVolume(block)}}, so the
> simulator sees real HDFS placement decisions. A single configuration applies
> to all volumes, but each volume maintains its own token buckets. This matches
> production, where all block devices attached to a host share the same SKU but
> have independent throughput budgets, and where the host itself caps maximum
> aggregate throughput.
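The per-volume bookkeeping might look like the following sketch, with illustrative names ({{DeviceConfig}}, {{VolumeBucket}}) rather than the patch's actual classes: one shared configuration, one independently stateful bucket per volume, resolved lazily by volume id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PerVolumeBudgets {
  // Shared config: every volume gets the same limits (the same "SKU").
  static class DeviceConfig {
    final long budgetBytesPerSec;
    final long budgetIops;
    DeviceConfig(long bytesPerSec, long iops) {
      this.budgetBytesPerSec = bytesPerSec;
      this.budgetIops = iops;
    }
  }

  // Independent per-volume state built from the shared config.
  static class VolumeBucket {
    final DeviceConfig config;
    long bytesCharged;  // real code: a refilling token bucket
    VolumeBucket(DeviceConfig config) { this.config = config; }
  }

  private final DeviceConfig config;
  private final Map<String, VolumeBucket> buckets = new ConcurrentHashMap<>();

  PerVolumeBudgets(DeviceConfig config) { this.config = config; }

  // One bucket per volume, created on first use; all share one config but
  // none share state, mirroring independently throttled attached devices.
  VolumeBucket forVolume(String volumeId) {
    return buckets.computeIfAbsent(volumeId, id -> new VolumeBucket(config));
  }

  public static void main(String[] args) {
    PerVolumeBudgets b = new PerVolumeBudgets(new DeviceConfig(500L << 20, 500));
    b.forVolume("vol-1").bytesCharged += 1024;
    // Charging vol-1 leaves vol-2's independent bucket untouched.
    System.out.println(b.forVolume("vol-2").bytesCharged); // 0
  }
}
```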
> EBS merges sequential IOs up to 1 MiB before counting them as a single IOPS
> token. The simulator tracks read streams and write streams independently.
> After each IOPS token consumption, the simulator sleeps for a configurable
> duration (default 1 ms), modeling physical device service time.
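The coalescing rule above can be illustrated with a small sketch (names are illustrative, not the patch's classes): sequential IOs accumulate in a merge window, and a new IOPS token is consumed only when an IO is non-sequential or the window would exceed 1 MiB.

```java
class CoalescingIopsSketch {
  static final long MAX_MERGE_BYTES = 1 << 20; // 1 MiB, per the EBS model

  long lastEndOffset = -1;  // end offset of the previous IO in this stream
  long mergedBytes = 0;     // bytes accumulated in the current merge window
  long iopsTokens = 0;      // IOPS tokens consumed so far

  // Account one IO of `len` bytes at `offset`; returns tokens consumed so far.
  long account(long offset, long len) {
    boolean sequential = offset == lastEndOffset;
    if (!sequential || mergedBytes + len > MAX_MERGE_BYTES) {
      // Random IO, or the merge window is full: start a new window and
      // consume one IOPS token for it. The real simulator would also sleep
      // here (default 1 ms) to model physical device service time.
      iopsTokens++;
      mergedBytes = 0;
    }
    mergedBytes += len;
    lastEndOffset = offset + len;
    return iopsTokens;
  }

  public static void main(String[] args) {
    CoalescingIopsSketch s = new CoalescingIopsSketch();
    long off = 0;
    for (int i = 0; i < 256; i++) { s.account(off, 4096); off += 4096; }
    System.out.println(s.iopsTokens); // 256 sequential 4 KiB reads = 1 token
  }
}
```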
> Some of the naming and concepts heavily favor Amazon's EBS, but these naming
> issues can be addressed during review.
> Test integration looks like:
> {noformat}
> Configuration conf = HBaseConfiguration.create();
> // Sets dfs.datanode.fsdataset.factory so that each DataNode started by
> // MiniDFSCluster wraps its real FsDatasetImpl with a throttling proxy
> // that intercepts block-level IO.
> EBSDevice.configure(conf, /*budgetMbps=*/500, /*budgetIops=*/500,
>     /*deviceLatencyUs=*/1000, /*maxIoSizeKb=*/1024, /*instanceMbps=*/1250);
> HBaseTestingUtility util = new HBaseTestingUtility(conf);
> util.startMiniZKCluster();
> MiniDFSCluster dfsCluster = new MiniDFSCluster.Builder(conf)
>     .numDataNodes(1)
>     .storagesPerDatanode(6)
>     .build();
> dfsCluster.waitClusterUp();
> util.setDFSCluster(dfsCluster);
> util.startMiniCluster(1);
> // ... run workload ...
> long bytesRead = EBSDevice.getTotalBytesRead();
> long deviceIops = EBSDevice.getDeviceReadOps();
> String perVolume = EBSDevice.getPerVolumeStats();
> EBSDevice.shutdown();
> {noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)