Hi Ravi, > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur > Sent: Friday, May 08, 2015 11:55 PM > To: Matt Laswell > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH v2] Implement memcmp using AVX/SSE > instructions. > > On Fri, May 8, 2015 at 3:29 PM, Matt Laswell <laswell at infiniteio.com> > wrote: > > > > > > > On Fri, May 8, 2015 at 4:19 PM, Ravi Kerur <rkerur at gmail.com> wrote: > > > >> This patch replaces memcmp in librte_hash with rte_memcmp which is > >> implemented with AVX/SSE instructions. > >> > >> +static inline int > >> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n) > >> +{ > >> + const uint8_t *src_1 = (const uint8_t *)_src_1; > >> + const uint8_t *src_2 = (const uint8_t *)_src_2; > >> + int ret = 0; > >> + > >> + if (n & 0x80) > >> + return rte_cmp128(src_1, src_2); > >> + > >> + if (n & 0x40) > >> + return rte_cmp64(src_1, src_2); > >> + > >> + if (n & 0x20) { > >> + ret = rte_cmp32(src_1, src_2); > >> + n -= 0x20; > >> + src_1 += 0x20; > >> + src_2 += 0x20; > >> + } > >> > >> > > Pardon me for butting in, but this seems incorrect for the first two cases > > listed above, as the function as written will only compare the first 128 or > > 64 bytes of each source and return the result. The pattern expressed in > > the 32 byte case appears more correct, as it compares the first 32 bytes > > and then lets later pieces of the function handle the smaller remaining > > bits of the sources. Also, if this function is to handle arbitrarily large > > source data, the 128 byte case needs to be in a loop. > > > > What am I missing? > > > > Current max hash key length supported is 64 bytes, hence no comparison is > done after 64 bytes. 128 bytes comparison is added to measure performance > only and there is no use-case as of now. With the current use-cases its not > required but if there is a need to handle large arbitrary data upto 128 > bytes it can be modified.
So on x86 let say rte_memcmp(k1, k2, 65) might produce invalid results, right? While on PPC will work as expected (as it calls memcpu underneath)? That looks really weird to me. If you plan to use rte_memcmp only for hash comparisons, then probably you should put it somewhere into librte_hash and name it accordingly: rte_hash_key_cmp() or something. And put a big comment around it, that it only works with particular lengths. If you want it to be a generic function inside EAL, then it probably need to handle different lengths properly on all supported architectures. Konstantin > > > > > -- > > Matt Laswell > > infinite io, inc. > > laswell at infiniteio.com > > > >

