On 2026/1/21 15:01, Andy Shevchenko wrote:
> On Wed, Jan 21, 2026 at 8:44 AM Feng Jiang <[email protected]> wrote:
>> On 2026/1/20 15:36, Andy Shevchenko wrote:
>>> On Tue, Jan 20, 2026 at 02:58:44PM +0800, Feng Jiang wrote:
>
> ...
>
>>>> word-at-a-time logic, showing significant gains as the string length
>>>> increases.
>>>
>>> Hmm... Have you tried to optimise the generic implementation to use
>>> word-at-a-time logic and compare?
>>
>> Regarding the generic implementation, even if we were to optimize the C code
>> to use word-at-a-time logic (the has_zero() style bit-manipulation), it still
>> wouldn't match the Zbb version's efficiency.
>>
>> The traditional C-based word-level detection requires a sequence of arithmetic
>> operations to identify NUL bytes. In contrast, the RISC-V orc.b instruction
>> collapses this entire check into a single hardware cycle. I've focused on this
>> architectural approach to fully leverage these specific Zbb features, which
>> provides a level of instruction density that generic C math cannot achieve.
>
> I understand that. My point is if we move the generic implementation
> to use word-at-a-time technique the difference should not go 4x,
> right? Perhaps 1.5x or so. I believe this will be a very useful
> exercise.
>
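
For reference, the has_zero() style check discussed above can be sketched
roughly as below. This is only an illustration under stated assumptions, not
the kernel's actual code: wordwise_strlen() is a hypothetical name, the word
size is fixed at 64 bits, and the caller is assumed to guarantee that the
aligned 8-byte word containing the terminating NUL is readable in full.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero if and only if at least one byte of w is 0x00. */
static inline uint64_t has_zero(uint64_t w)
{
	return (w - ONES) & ~w & HIGHS;
}

/*
 * Hypothetical helper, for illustration only.
 * Assumption: the aligned 8-byte word holding the NUL is fully readable.
 */
static size_t wordwise_strlen(const char *s)
{
	const char *p = s;

	/* Byte loop until p is 8-byte aligned. */
	while ((uintptr_t)p % sizeof(uint64_t)) {
		if (!*p)
			return (size_t)(p - s);
		p++;
	}

	/* Word loop: skip 8 bytes at a time while no byte is NUL. */
	for (;;) {
		uint64_t w;

		memcpy(&w, p, sizeof(w));
		if (has_zero(w))
			break;
		p += sizeof(w);
	}

	/* Byte loop inside the final word, so endianness does not matter. */
	while (*p)
		p++;

	return (size_t)(p - s);
}
```

The subtract, and-not, and mask sequence in has_zero() is the "sequence of
arithmetic operations" mentioned in the quoted text; the Zbb version replaces
that detection step with orc.b.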
That is a very insightful point, thanks for the suggestion. I'll look into
optimizing the generic string library as a follow-up task to see if we can
bring some improvements there as well.

Thanks again for the guidance.

--
With Best Regards,
Feng Jiang
