[ 
https://issues.apache.org/jira/browse/HBASE-30062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HBASE-30062:
-----------------------------------
    Labels: pull-request-available  (was: )

> Device layer simulator for MiniDFSCluster-based tests
> -----------------------------------------------------
>
>                 Key: HBASE-30062
>                 URL: https://issues.apache.org/jira/browse/HBASE-30062
>             Project: HBase
>          Issue Type: New Feature
>          Components: HFile, integration tests, test, wal
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Minor
>              Labels: pull-request-available
>
> On EBS-backed deployments in AWS, or equivalents in other cloud 
> infrastructure providers, HBase compaction and replication throughput can be 
> constrained by per-volume IOPS limits rather than bandwidth. A faithful 
> device-level simulator within the test harness allows developers to 
> reproduce, analyze, and validate fixes for such performance issues without 
> requiring actual cloud infrastructure.
> This proposed change adds a test-only EBS device layer that operates at the 
> DataNode storage level within {{MiniDFSCluster}} by replacing the 
> {{FsDatasetSpi}} implementation via Hadoop's pluggable factory mechanism. 
> This allows HBase integration tests to simulate realistic cloud block storage 
> characteristics, such as per-volume bandwidth budgets, IOPS limits, 
> sequential IO coalescing, and per-IO device latency, enabling identification 
> and reproduction of IO bottlenecks.
> The simulator wraps the real {{FsDatasetImpl}} with a reflection proxy that 
> intercepts the three SPI methods where DataNode local IO actually engages the 
> underlying block device, without compile-time coupling to internal Hadoop 
> classes.
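> As a rough illustration of the interception technique only (the {{Dataset}}
> interface and class names below are simplified stand-ins, not Hadoop's actual
> {{FsDatasetSpi}}), the wrapping could look like:

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// "Dataset" is a simplified stand-in for Hadoop's FsDatasetSpi, used here only
// to illustrate the proxying technique without depending on Hadoop classes.
interface Dataset {
  InputStream getBlockInputStream(long blockId) throws IOException;
}

public class ProxySketch {

  // Wraps every byte read; the real simulator charges bytes against the
  // owning volume's bandwidth and IOPS budgets at this point.
  static class ThrottledStream extends FilterInputStream {
    ThrottledStream(InputStream in) { super(in); }
    @Override public int read() throws IOException {
      // charge one byte against the volume budget here
      return super.read();
    }
  }

  // Returns a dynamic proxy that forwards all calls to the real dataset,
  // wrapping only the method where block-device reads actually happen.
  static Dataset wrap(Dataset real) {
    return (Dataset) Proxy.newProxyInstance(
        Dataset.class.getClassLoader(),
        new Class<?>[] { Dataset.class },
        (Object proxy, Method m, Object[] args) -> {
          Object result = m.invoke(real, args);
          if ("getBlockInputStream".equals(m.getName())) {
            return new ThrottledStream((InputStream) result);
          }
          return result;
        });
  }

  public static void main(String[] args) throws IOException {
    Dataset real = blockId -> new ByteArrayInputStream(new byte[] { 42 });
    InputStream in = wrap(real).getBlockInputStream(1L);
    System.out.println(in.read()); // prints 42
  }
}
```

> Only the named SPI methods are intercepted; every other call is forwarded
> untouched to the real delegate, which is what avoids compile-time coupling
> to Hadoop internals.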
> On the read path, {{getBlockInputStream}} wraps the returned {{InputStream}} 
> with {{ThrottledBlockInputStream}}, charging every byte against the 
> volume's BW and IOPS budgets with sequential IO coalescing. On the write 
> path, {{submitBackgroundSyncFileRangeRequest}} charges {{nbytes}} against BW 
> and IOPS budgets, modeling the async 
> {{sync_file_range(SYNC_FILE_RANGE_WRITE)}} that the DataNode issues to flush 
> dirty pages from the operating system's page cache to the block device. Then 
> {{finalizeBlock}} charges the remaining unflushed delta (minus bytes already 
> charged via sync_file_range) against the budgets, modeling the {{fsync()}} at 
> block finalization.
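> The write-path accounting above amounts to charging flushed bytes eagerly at
> each sync_file_range and charging only the unflushed remainder at block
> finalization, so no byte is counted twice. A minimal sketch (class and
> method names here are hypothetical, not from the patch):

```java
public class WriteChargeSketch {
  long bytesCharged = 0; // bytes already charged via sync_file_range

  // Models submitBackgroundSyncFileRangeRequest: charge nbytes immediately.
  void syncFileRange(long nbytes) {
    bytesCharged += nbytes; // the real simulator draws BW/IOPS tokens here
  }

  // Models finalizeBlock: charge only the unflushed delta, i.e. the block
  // length minus bytes already charged through sync_file_range, modeling
  // the fsync() at finalization.
  long finalizeBlock(long blockLength) {
    long remainder = Math.max(0, blockLength - bytesCharged);
    bytesCharged += remainder;
    return remainder;
  }

  public static void main(String[] args) {
    WriteChargeSketch s = new WriteChargeSketch();
    s.syncFileRange(8L * 1024 * 1024);                     // 8 MiB flushed early
    long fsyncCharge = s.finalizeBlock(10L * 1024 * 1024); // 10 MiB block
    System.out.println(fsyncCharge); // 2097152: only the remaining 2 MiB
  }
}
```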
> Each proxy gets its own set of {{EBSVolumeDevice}} instances with independent 
> budgets. Block-to-volume resolution uses {{delegate.getVolume(block)}}, 
> providing real HDFS placement decisions. A single configuration applies to 
> all volumes, but each volume maintains its own token buckets, matching 
> production, where all block devices attached to a host share the same SKU but 
> have independent throughput budgets, and where the host itself has a cap on 
> maximum aggregate throughput.
> EBS merges sequential IOs up to 1 MiB before counting them as a single IOPS 
> token. The simulator tracks read streams and write streams independently.
> After each IOPS token consumption, the simulator sleeps for a configurable 
> duration (default 1 ms), modeling physical device service time.
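> The coalescing and service-time behavior can be sketched roughly as follows
> (class and field names are hypothetical, and the sketch counts tokens
> directly rather than drawing from a real token bucket):

```java
public class CoalescingSketch {
  static final long MAX_IO_SIZE = 1024 * 1024; // 1 MiB coalescing limit

  long nextOffset = -1; // expected offset if the stream stays sequential
  long coalesced = 0;   // bytes merged into the current device IO so far
  long iopsTokens = 0;  // device IOs counted (token-bucket draws, in the real code)

  // Charge one read or write of `len` bytes at `offset` against the volume.
  void charge(long offset, long len) {
    boolean sequential = offset == nextOffset;
    if (!sequential || coalesced + len > MAX_IO_SIZE) {
      iopsTokens++;  // a new device IO starts: consume one IOPS token
      coalesced = 0;
      // the real simulator also sleeps here (default 1 ms) to model
      // physical device service time
    }
    coalesced += len;
    nextOffset = offset + len;
  }

  public static void main(String[] args) {
    CoalescingSketch s = new CoalescingSketch();
    // Four sequential 256 KiB IOs merge into one 1 MiB device IO: 1 token.
    for (int i = 0; i < 4; i++) s.charge(i * 262144L, 262144L);
    System.out.println(s.iopsTokens); // 1
    // A fifth sequential IO exceeds the 1 MiB merge limit: a second token.
    s.charge(4 * 262144L, 262144L);
    System.out.println(s.iopsTokens); // 2
  }
}
```

> In the real simulator, read streams and write streams would each carry their
> own tracking state of this kind, since they coalesce independently.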
> Some of the naming and concepts heavily favor Amazon's EBS; these can be 
> addressed during review.
> Test integration looks like:
> {noformat}
> Configuration conf = HBaseConfiguration.create();
> // Sets dfs.datanode.fsdataset.factory so that each DataNode started by
> // MiniDFSCluster wraps its real FsDatasetImpl with a throttling proxy
> // that intercepts block-level IO.
> EBSDevice.configure(conf, /*budgetMbps=*/500, /*budgetIops=*/500,
>     /*deviceLatencyUs=*/1000, /*maxIoSizeKb=*/1024, /*instanceMbps=*/1250);
> HBaseTestingUtility util = new HBaseTestingUtility(conf);
> util.startMiniZKCluster();
> MiniDFSCluster dfsCluster = new MiniDFSCluster.Builder(conf)
>     .numDataNodes(1)
>     .storagesPerDatanode(6)
>     .build();
> dfsCluster.waitClusterUp();
> util.setDFSCluster(dfsCluster);
> util.startMiniCluster(1);
> // ... run workload ...
> long bytesRead    = EBSDevice.getTotalBytesRead();
> long deviceIops   = EBSDevice.getDeviceReadOps();
> String perVolume  = EBSDevice.getPerVolumeStats();
> EBSDevice.shutdown();
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
