On 25/7/25 00:10, Henry, Andrew wrote:
> Yes, me too, I've also been in the IT industry since 1992, and understand 
> product spec sheet theoretical maximums.
:-) Old-timers then... Hehehehe.....
>
> ""dd" will always give you a sequential read or write workload which will 
> always trigger optimisation functions in the various controllers and HDD/SSD 
> firmware. The only thing it will show you is that your numbers will somewhat 
> align to the numbers in the datasheet."
>
> Which is what I'm initially trying to validate, but not seeing.
I thought your first post started off running on a hypervisor. In that 
case all bets are off and you're subject to whatever the host OS will 
allow you to do. That throws a whole other variable into the mix.
>
> " Furthermore, reading from /dev/zero to push a workload will for sure skew 
> numbers. The controllers are very smart and have been for a long time. If 
> they "see" a certain data characteristic they can change the write behaviour 
> to the physical platter or the NAND cells."
>
> I had considered this, but if that were the case, wouldn't it give *better* 
> performance?  If dedup identified a pattern, for instance, wouldn't I get 
> *better* values in my tests than what the hardware was capable of?  In which 
> circumstance would I see worse numbers when controller logic kicked in?
One of the reasons could be that wear-leveling logic in the SSD firmware 
may cause delays; that's not something you can control. Also, if the 
hypervisor does allow direct I/O and the guest then has to wait for real 
I/O completion, or if the hypervisor has settings applied that interfere 
with I/O completion, you will likely see skewed results. My experience is 
that whenever you want to validate a spec sheet you need to test under 
the same conditions, i.e. align your I/O profile to the one the hardware 
was specified against. It can be a massive time-waster if you can't 
automate the test logic and validate the outcomes against the datasheet 
numbers.
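To make that last point concrete: the "validate against the datasheet" step can be a trivial wrapper around whatever tool produces the measurement. A minimal sketch, where the 550 MB/s spec figure, the 15% margin and the function name are all made-up placeholders, not values from any real sheet:

```shell
# Hypothetical datasheet figure and tolerance -- substitute your own.
SPEC_MBS=550        # vendor-quoted sequential read, MB/s (placeholder)
TOLERANCE_PCT=15    # accept results within 15% of the quoted figure

# within_spec MEASURED_MBS -> prints PASS/FAIL, exit status to match
within_spec() {
    # integer floor: spec * (100 - tolerance) / 100
    floor=$(( SPEC_MBS * (100 - TOLERANCE_PCT) / 100 ))
    if [ "$1" -ge "$floor" ]; then
        echo "PASS: $1 MB/s (floor ${floor} MB/s)"
    else
        echo "FAIL: $1 MB/s (floor ${floor} MB/s)"
        return 1
    fi
}

within_spec 520   # prints: PASS: 520 MB/s (floor 467 MB/s)
```

Feed it the measured number from each run and you can at least loop the test unattended instead of eyeballing dd output.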
>
> /AH
>
>
> -----Original Message-----
> From: Erwin van Londen <[email protected]>
> Sent: 24 July 2025 15:57
> To: Henry, Andrew <[email protected]>; Zdenek Kabelac 
> <[email protected]>; [email protected]
> Subject: Re: striped LV and expected performance
>
>
> Having worked in the storage industry since around 1995 with DEC, Compaq, 
> Hewlett Packard and Hitachi Data Systems (Vantara) I've seen a fair amount of 
> spec sheets. Be aware, what you see on these sheets are indeed optimum values 
> measured against optimum characteristics for that piece of hardware. These 
> sheets are only partially written by engineers but will have a marketing 
> sauce added resulting in some potentially skewed information. Engineering 
> information will most often outline the conditions behind these numbers 
> whereas marketing people will mostly remove them as it simply looks better.
>
>    "dd" will always give you a sequential read or write workload which will 
> always trigger optimisation functions in the various controllers and HDD/SSD 
> firmware. The only thing it will show you is that your numbers will somewhat 
> align to the numbers in the datasheet. Various OS settings on the scheduler, 
> dm and filesystem can have a significant influence on these numbers. I'm 
> pretty sure that the information on datasheets is about the maximum you can 
> suck out of a piece of hardware.
> Everything you do on your side in the various OS layers will only negatively 
> impact the raw performance numbers, let alone having a representative 
> application workload pushed onto it.
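(For what it's worth, even a dd baseline is only meaningful if it isn't just measuring the page cache. A self-contained sketch; the scratch file is a stand-in, on real hardware you would point dd at the device, e.g. /dev/sdX, and add iflag=direct:)

```shell
# Scratch file so the example is self-contained; on real hardware use
# the device itself and add iflag=direct on the read, otherwise a warm
# page cache makes the "disk" look as fast as RAM.
dd if=/dev/urandom of=/tmp/ddtest.bin bs=1M count=64 conv=fsync 2>/dev/null

# Sequential read back; GNU dd reports the throughput on stderr.
dd if=/tmp/ddtest.bin of=/dev/null bs=1M 2>&1 | tail -n 1
```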
>
> "fio" gives you a few more options and parameters, however this will also 
> depend significantly on how the kernel and its IO layers, such as the device 
> mapper and filesystems, interact with the hardware. Using zones for example 
> would require insight into the way the hardware is built, especially on 
> HDDs. If you don't have that you may as well put a wet finger in the air and 
> go for a trial and error run.
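(This pinning-down of the profile is exactly what makes fio comparable to a datasheet line at all. A sketch of a job file; the device path and the numbers are placeholders to be replaced with whatever profile the vendor actually specified against, and double-check the filename, since fio will happily write to a device if asked:)

```ini
[global]
ioengine=libaio
direct=1            ; bypass the page cache
time_based
runtime=60
group_reporting

[randread-4k]
filename=/dev/sdX   ; placeholder -- verify before pointing fio at a device
rw=randread
bs=4k
iodepth=32
numjobs=4
```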
>
> Furthermore, reading from /dev/zero to push a workload will for sure skew 
> numbers. The controllers are very smart and have been for a long time. If 
> they "see" a certain data characteristic they can change the write behaviour 
> to the physical platter or the NAND cells.
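(Easy to see why: zeros are infinitely compressible and trivially dedupable, so anything content-aware in the path can acknowledge them without doing the real work. The asymmetry shows up even with plain gzip:)

```shell
# 1 MiB of zeros collapses to almost nothing under compression...
head -c 1048576 /dev/zero | gzip -c | wc -c      # a few KB at most

# ...while 1 MiB of random data barely shrinks at all.
head -c 1048576 /dev/urandom | gzip -c | wc -c   # roughly 1 MB

# A content-aware controller, thin pool or dedup layer exploits the
# same asymmetry when you feed it /dev/zero via dd.
```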
>
> Everything you do that does not reflect a real life workload is a "just for 
> shits and giggles" exercise but will not give any real meaningful outcome. 
> Believe me, I've been through this discussion more than once.
>
>>
>>
>> -----Original Message-----
>> From: Erwin van Londen <[email protected]>
>> Sent: 21 July 2025 05:26
>> To: Zdenek Kabelac <[email protected]>; Henry, Andrew 
>> <[email protected]>; [email protected]
>> Subject: Re: striped LV and expected performance
>>
>>
>> 3. Unreal cache optimisations. Using dd is by far the worst option to use 
>> for performance tests as it will never (Ok, almost never) align with real 
>> workloads. If you use dd for performance test you will find that this will 
>> backfire in most cases when a normal workload is applied. The main reason is 
>> that dd will always have a sequential workload unless you start a large 
>> number of dd instances to the same disk at once with different offsets. Even 
>> then you will see an obscure number coming back.
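(That multi-instance trick looks something like the following; the scratch file stands in for the disk, and note that skip= is in units of bs, so each job reads its own 16 MiB region:)

```shell
# Scratch file standing in for the device under test.
dd if=/dev/urandom of=/tmp/ddpar.bin bs=1M count=64 conv=fsync 2>/dev/null

# Four concurrent "sequential" readers at different offsets -- closer
# to a mixed workload than a single dd stream, but still synthetic.
for off in 0 16 32 48; do
    dd if=/tmp/ddpar.bin of=/dev/null bs=1M skip=$off count=16 2>/dev/null &
done
wait
echo "all readers done"
```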
>>
>


