Re: [DISCUSS] KIP-1299 Use key range in ProducerPerformance

黃竣陽 Tue, 26 May 2026 04:47:29 -0700

Hello PoAn,

Thanks for the feedback.


poan_00: In range mode, keys are generated by `recordIndex % keyRange`,
which is fully deterministic and not affected by `--random-seed`. In that
case the seed only controls the PRNG used for random payload generation.
The example is misleading, so I will remove it.
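
As a rough sketch of the point above (the class and helper names here are
hypothetical and not taken from the KIP's patch; the payload generation is
only an assumed stand-in), the key depends on the record index alone, while
only the payload bytes change with the seed:

```java
import java.util.SplittableRandom;

public class RangeKeySketch {
    // Hypothetical helper: range-mode key for the record at recordIndex.
    // No PRNG is involved, so the seed cannot influence the result.
    static long rangeKey(long recordIndex, long keyRange) {
        return recordIndex % keyRange;
    }

    // Hypothetical helper: random payload of uppercase letters.
    // This is the only part that depends on the seeded generator.
    static byte[] randomPayload(SplittableRandom random, int size) {
        byte[] payload = new byte[size];
        for (int i = 0; i < size; i++) {
            payload[i] = (byte) ('A' + random.nextInt(26));
        }
        return payload;
    }

    public static void main(String[] args) {
        long keyRange = 3;
        SplittableRandom seedZero = new SplittableRandom(0L);
        SplittableRandom seedOther = new SplittableRandom(42L);
        for (long i = 0; i < 6; i++) {
            // The key column is identical for both seeds; only payloads may differ.
            System.out.println("key=" + rangeKey(i, keyRange)
                + " payload(seed=0)=" + new String(randomPayload(seedZero, 4))
                + " payload(seed=42)=" + new String(randomPayload(seedOther, 4)));
        }
    }
}
```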

poan_01: According to the JDK documentation, `SplittableRandom` generates
uniformly distributed pseudorandom values. With a sufficiently large number
of records, each key in random mode appears roughly the same number of
times, so the partition distribution statistically converges toward
behavior similar to range mode.

The main difference is that random mode introduces short-term burstiness,
where the same key may appear consecutively for a period of time, while
range mode produces a perfectly even round-robin pattern. However, neither
mode inherently creates a truly skewed (hot-partition) distribution.
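
To illustrate the comparison, here is a minimal sketch (hypothetical class and
method names; it assumes random mode draws each key with
`SplittableRandom.nextInt(keyRange)`, which may differ from the final
implementation). It counts how often each key appears in both modes and how
often random mode repeats the previous key:

```java
import java.util.SplittableRandom;

public class KeyDistributionSketch {
    // Range mode: key = recordIndex % keyRange, a strict round-robin.
    static long[] rangeCounts(int records, int keyRange) {
        long[] counts = new long[keyRange];
        for (int i = 0; i < records; i++) {
            counts[i % keyRange]++;
        }
        return counts;
    }

    // Random mode (assumed): key drawn uniformly from [0, keyRange).
    static long[] randomCounts(int records, int keyRange, long seed) {
        SplittableRandom random = new SplittableRandom(seed);
        long[] counts = new long[keyRange];
        for (int i = 0; i < records; i++) {
            counts[random.nextInt(keyRange)]++;
        }
        return counts;
    }

    // Number of records in random mode whose key equals the previous record's key.
    static int consecutiveRepeats(int records, int keyRange, long seed) {
        SplittableRandom random = new SplittableRandom(seed);
        int repeats = 0;
        int previous = -1;
        for (int i = 0; i < records; i++) {
            int key = random.nextInt(keyRange);
            if (key == previous) {
                repeats++;
            }
            previous = key;
        }
        return repeats;
    }

    public static void main(String[] args) {
        int records = 1_000_000;
        int keyRange = 10;
        long[] range = rangeCounts(records, keyRange);
        long[] random = randomCounts(records, keyRange, 0L);
        for (int k = 0; k < keyRange; k++) {
            System.out.println("key " + k + ": range=" + range[k] + " random=" + random[k]);
        }
        System.out.println("consecutive repeats in random mode: "
            + consecutiveRepeats(records, keyRange, 0L));
    }
}
```

With a key range larger than 1, range mode never repeats a key back to back,
so the repeat count is specific to random mode; the per-key totals in both
modes should end up close to `records / keyRange`.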

I’ll update the motivation section to remove the hot-partition claim for
random mode.

Best Regards,
Jiunn-Yang

> On May 26, 2026, at 7:18 PM, PoAn Yang <[email protected]> wrote:
> 
> Hi Jiunn,
> 
> Thanks for the KIP.
> 
> poan_00: In the example usage, one case uses --key-distribution range
> together with --random-seed. Does the --random-seed parameter take effect
> in this case? If not, can we remove it?
> 
> poan_01: In the motivation, one use case for the random distribution is
> the hot-partition scenario. However, according to the JDK documentation,
> SplittableRandom is a generator of uniform pseudorandom values [0]. If the
> hot-partition scenario arises only from a small key range, can we achieve
> it with the range key distribution directly?
> 
> [0] https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/SplittableRandom.html
> 
> Best,
> PoAn
> 
>> On May 20, 2026, at 8:56 PM, 黃竣陽 <[email protected]> wrote:
>> 
>> Hi chia,
>> 
>> Thanks for the feedback.
>> 
>> chia_00: I have added a new optional argument --random-seed <SEED>
>> (default: 0) to let users set the seed manually. The default value of 0
>> ensures deterministic, reproducible benchmark runs by default.
>> 
>> chia_01: I have updated the Motivation section in the KIP to elaborate
>> on the practical use cases for each key distribution mode.
>> 
>> Best Regards,
>> Jiunn-Yang
>> 
>>> On May 20, 2026, at 11:48 AM, Chia-Ping Tsai <[email protected]> wrote:
>>> 
>>> hi Jiunn
>>> 
>>> thanks for this KIP!
>>> 
>>> chia_00: Regarding the random seed, what are your thoughts on its 
>>> initialization?
>>> 
>>> chia_01: Could you elaborate on the practical use cases for each key 
>>> distribution mode in the Motivation section?
>>> 
>>> Best,
>>> Chia-Ping
>>> 
>>> On 2026/03/30 13:06:05 黃竣陽 wrote:
>>>> Hello everyone, 
>>>> 
>>>> I would like to start a discussion on KIP-1299 Use key range in
>>>> ProducerPerformance <https://cwiki.apache.org/confluence/x/XpQ8G>
>>>> 
>>>> This proposal aims to add configurable key distribution support to
>>>> kafka-producer-perf-test. Currently, the tool always produces records
>>>> with null keys, which does not reflect real-world keyed workloads. This
>>>> KIP introduces two new arguments, --key-distribution and
>>>> --message-key-range, enabling engineers to benchmark with round-robin
>>>> or random key strategies over a bounded key space, providing more
>>>> realistic performance measurements.
>>>> 
>>>> Best regards,
>>>> Jiunn-Yang
>> 
>
