Aleksei Ieshin created HDDS-15586:
-------------------------------------
Summary: Add freon command to read a user-supplied list of existing keys
Key: HDDS-15586
URL: https://issues.apache.org/jira/browse/HDDS-15586
Project: Apache Ozone
Issue Type: Improvement
Components: freon
Reporter: Aleksei Ieshin
Assignee: Aleksei Ieshin
h2. Problem
freon's client read generators can only read keys they themselves generated:
* {{ockg}}/{{ockv}} use prefix+index naming, and {{ockv}} validates every
read against key-0's digest (assumes all keys have identical content).
* {{SameKeyReader}} ({{ocokr}}) reads one fixed key from many threads.
There is no freon command that points at an arbitrary, heterogeneous set of
existing keys (a real dataset already in a bucket) and measures read
throughput. This is needed for read-path
performance and capacity/scaling work, where freshly generated uniform keys
are page-cache-hot and not representative of production data.
h2. Proposed change
Add a freon subcommand {{OzoneClientKeyListReader}} ({{ocklr}}) that:
* takes {{--key-file <path>}} — a local file with one key name per line;
blank lines and {{#}} comments ignored;
* reuses {{BaseFreonGenerator}}: a warm shared {{OzoneClient}}, {{-t}}
threads, {{-n}} total reads (task i reads keys[i % keys.size()], so an {{-n}}
larger than the list loops over it), and a DropWizard timer;
* per read calls {{bucket.readKey(key)}}, drains the stream into a fixed
buffer and counts bytes (no content/digest assumptions); reports the
{{key-read}} timer plus an aggregate bytes/wall-time
MB/s line.
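The per-read logic above can be sketched in plain Java. This is a minimal illustration, not the proposed implementation: the class and method names are hypothetical, an in-memory stream stands in for {{bucket.readKey(key)}}, and it assumes that lines are trimmed, that {{#}} marks whole-line comments only, and that MB means 10^6 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names are illustrative, not taken from the patch.
class KeyListSketch {

  /** Keeps one key name per line; skips blank lines and '#' comment lines. */
  static List<String> parseKeys(List<String> lines) {
    List<String> keys = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim(); // assumption: surrounding whitespace is dropped
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      keys.add(trimmed);
    }
    return keys;
  }

  /** Task i reads keys[i % keys.size()], so more tasks than keys loops the list. */
  static String keyForTask(List<String> keys, long taskIndex) {
    return keys.get((int) (taskIndex % keys.size()));
  }

  /** Drains the stream into a fixed buffer and returns the number of bytes read. */
  static long drain(InputStream in, byte[] buffer) throws IOException {
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    List<String> keys =
        parseKeys(Arrays.asList("# dataset A", "dir/key-1", "", "dir/key-2"));
    byte[] buffer = new byte[4096];
    long totalBytes = 0;
    long start = System.nanoTime();
    for (long i = 0; i < 5; i++) {
      String key = keyForTask(keys, i);
      // Stand-in for bucket.readKey(key): the key name's own bytes as content.
      try (InputStream in =
          new ByteArrayInputStream(key.getBytes(StandardCharsets.UTF_8))) {
        totalBytes += drain(in, buffer);
      }
    }
    double seconds = (System.nanoTime() - start) / 1e9;
    System.out.printf("read %d bytes, %.2f MB/s%n",
        totalBytes, totalBytes / 1e6 / seconds);
  }
}
```

Because the drain loop only counts bytes, it makes no assumption about key size or content, which is what lets the command run against a heterogeneous dataset.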
It exercises the same end-to-end read path as {{ozone sh key get}} and the
FileSystem {{open()}} ({{readKey}} -> {{KeyInputStream}} ->
{{BlockInputStream}} -> {{ChunkInputStream}} -> datanode
{{ReadChunk}}), so results reflect the real client read stack. It also
separates client warmth (JIT + pooled datanode connections) from datanode
page-cache effects, and {{-t}} drives concurrency
to find where read throughput saturates.
h2. Example
{code}
ozone freon ocklr -v <volume> -b <bucket> --key-file /tmp/keys.txt -t 8 -n 160
{code}
h2. Implementation notes
* ~110 LOC in hadoop-ozone/tools, mirrors {{OzoneClientKeyValidator}};
registered via {{@MetaInfServices(FreonSubcommand.class)}}. No new
dependencies. Unit test for key-file parsing included.
* Possible refinements from the discussion: per-key MB/s (mean ± stddev), a
{{--buffer-size}} option, a thread-local read buffer, and/or routing the
throughput summary through freon's standard
report instead of a log line.
* Naming ({{ocklr}}) follows the {{ockv}}/{{ockg}}/{{ocokr}} pattern; open to
alternatives.
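As an illustration of the per-key MB/s (mean ± stddev) refinement, one possible calculation is sketched below. The class and method names are hypothetical, and it assumes MB means 10^6 bytes and the sample standard deviation (n - 1 denominator); the discussion does not fix either choice.

```java
// Hypothetical sketch of per-key throughput statistics; not part of the patch.
class ThroughputStatsSketch {

  /** Throughput of a single read in MB/s (MB = 10^6 bytes, an assumption). */
  static double mbPerSec(long bytes, long elapsedNanos) {
    return (bytes / 1e6) / (elapsedNanos / 1e9);
  }

  /** Arithmetic mean of the per-key rates. */
  static double mean(double[] values) {
    double sum = 0;
    for (double v : values) {
      sum += v;
    }
    return sum / values.length;
  }

  /** Sample standard deviation (n - 1); returns 0 for fewer than two values. */
  static double sampleStdDev(double[] values) {
    if (values.length < 2) {
      return 0.0;
    }
    double m = mean(values);
    double squares = 0;
    for (double v : values) {
      squares += (v - m) * (v - m);
    }
    return Math.sqrt(squares / (values.length - 1));
  }

  public static void main(String[] args) {
    // Made-up (bytes, nanoseconds) pairs for three reads, for illustration only.
    double[] rates = {
        mbPerSec(4_000_000L, 40_000_000L),
        mbPerSec(1_000_000L, 12_500_000L),
        mbPerSec(16_000_000L, 200_000_000L),
    };
    System.out.printf("per-key: %.2f +/- %.2f MB/s%n",
        mean(rates), sampleStdDev(rates));
  }
}
```

Per-key rates weight small and large keys equally, so this figure can differ from the aggregate bytes/wall-time line; reporting both would show whether a few large keys dominate the total.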
Discussed and supported on the community forum:
https://github.com/apache/ozone/discussions/10460
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
