On Wed, May 18, 2022 at 19:57, David Rowley <dgrowle...@gmail.com> wrote:

> On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier...@gmail.com> wrote:
> > That would initialize the content at compilation and not at runtime,
> > correct?
>
> Your mental model of compilation and run-time might be flawed here.
> There's no such thing as zeroing memory at compile time. There's only
> emitting instructions that perform those tasks at run-time.
> https://godbolt.org/ might help your understanding.
>
> > There are a lot of cases using MemSet (with struct variables), and on
> > 64-bit Windows a long is 4 (four) bytes.
> > So I believe that MemSet is less efficient on Windows than on Linux.
> > "The size of the '_vstart' buffer is not a multiple of the element size
> > of the type 'long'."
> > message from PVS-Studio static analysis tool.
>
> I've been wondering for a while if we really need to have the MemSet()
> macro. I see it was added in 8cb415449 (1997). I think compilers have
> evolved quite a bit in the past 25 years, so it could be time to
> revisit that.

+1
All current compilers have an optimized memset.

> Your comment on the sizeof(long) on win64 is certainly true. I wrote
> the attached C program to test the performance difference.
>
> (windows 64-bit)
> >cl memset.c /Ox
> >memset 200000000
> Running 200000000 loops
> MemSet: size 8: 1.833000 seconds
> MemSet: size 16: 1.841000 seconds
> MemSet: size 32: 1.838000 seconds
> MemSet: size 64: 1.851000 seconds
> MemSet: size 128: 3.228000 seconds
> MemSet: size 256: 5.278000 seconds
> MemSet: size 512: 3.943000 seconds
> memset: size 8: 0.065000 seconds
> memset: size 16: 0.131000 seconds
> memset: size 32: 0.262000 seconds
> memset: size 64: 0.530000 seconds
> memset: size 128: 1.169000 seconds
> memset: size 256: 2.950000 seconds
> memset: size 512: 3.191000 seconds
>
> It seems like there are no cases there where MemSet is faster than
> memset. I was careful to only provide MemSet() with inputs that
> result in it not using the memset fallback.
> I also provided constants
> so that the decision about which method to use was known at compile
> time.
>
> It's not clear to me why 512 is faster than 256.

Probably broken alignment with 256?
Another warning from PVS-Studio: [1]
"The pointer '_start' is cast to a more strictly aligned pointer type."
src/contrib/postgres_fdw/connection.c (Line 1690)
MemSet(values, 0, sizeof(values));

> I saw the same on a repeat run.
>
> Changing "long" to "long long" it looks like:
>
> >memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.066000 seconds
> MemSet: size 16: 1.978000 seconds
> MemSet: size 32: 1.982000 seconds
> MemSet: size 64: 1.973000 seconds
> MemSet: size 128: 1.970000 seconds
> MemSet: size 256: 3.225000 seconds
> MemSet: size 512: 5.366000 seconds
> memset: size 8: 0.069000 seconds
> memset: size 16: 0.132000 seconds
> memset: size 32: 0.265000 seconds
> memset: size 64: 0.527000 seconds
> memset: size 128: 1.161000 seconds
> memset: size 256: 2.976000 seconds
> memset: size 512: 3.179000 seconds
>
> The situation is a little different on my Linux machine:
>
> $ gcc memset.c -o memset -O2
> $ ./memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.000002 seconds
> MemSet: size 16: 0.000000 seconds
> MemSet: size 32: 0.094041 seconds
> MemSet: size 64: 0.184618 seconds
> MemSet: size 128: 1.781503 seconds
> MemSet: size 256: 2.547910 seconds
> MemSet: size 512: 4.005173 seconds
> memset: size 8: 0.046156 seconds
> memset: size 16: 0.046123 seconds
> memset: size 32: 0.092291 seconds
> memset: size 64: 0.184509 seconds
> memset: size 128: 1.781518 seconds
> memset: size 256: 2.577104 seconds
> memset: size 512: 4.004757 seconds
>
> It looks like part of the work might be getting optimised away in the
> 8-16 MemSet() calls.

On Linux, long is 8 bytes.
I'm still surprised that MemSet (8/16) is faster.

> clang seems to have the opposite for size 8.
>
> $ clang memset.c -o memset -O2
> $ ./memset 200000000
> Running 200000000 loops
> MemSet: size 8: 0.007653 seconds
> MemSet: size 16: 0.005771 seconds
> MemSet: size 32: 0.011539 seconds
> MemSet: size 64: 0.023095 seconds
> MemSet: size 128: 0.046130 seconds
> MemSet: size 256: 0.092269 seconds
> MemSet: size 512: 0.968564 seconds
> memset: size 8: 0.000000 seconds
> memset: size 16: 0.005776 seconds
> memset: size 32: 0.011559 seconds
> memset: size 64: 0.023069 seconds
> memset: size 128: 0.046129 seconds
> memset: size 256: 0.092243 seconds
> memset: size 512: 0.968534 seconds
>
> There does not seem to be any significant reduction in the size of the
> binary from changing the MemSet macro to directly use memset. It went
> from 9865008 bytes down to 9860800 bytes (4208 bytes less).

Anyway, I think that on 64-bit Windows it is very worthwhile to remove the
MemSet macro.

regards,
Ranier Vilela

[1] https://pvs-studio.com/en/docs/warnings/v1032/
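
P.S. To illustrate the sizeof(long) point for anyone reading the archive, here is a minimal sketch of a word-at-a-time zeroing loop. It is not the attached memset.c and not the real MemSet macro from c.h; zero_by_long and long_stores_needed are hypothetical names made up for this example. The idea is only that such a loop issues len / sizeof(long) stores, so a 4-byte long (64-bit Windows) needs twice as many stores as an 8-byte long (64-bit Linux) for the same buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * zero a buffer one long at a time when both the start address and the
 * length are multiples of sizeof(long); otherwise fall back to memset().
 * The caller is assumed to pass storage that may legally be accessed as
 * long (for example, an array of long).
 */
static void
zero_by_long(void *start, size_t len)
{
    if (((uintptr_t) start % sizeof(long)) == 0 &&
        (len % sizeof(long)) == 0)
    {
        long       *p = (long *) start;
        long       *stop = (long *) ((char *) start + len);

        while (p < stop)
            *p++ = 0;
    }
    else
        memset(start, 0, len);
}

/*
 * Number of stores the loop above performs for an aligned buffer of
 * "len" bytes. With sizeof(long) == 4 this is twice the count obtained
 * with sizeof(long) == 8.
 */
static size_t
long_stores_needed(size_t len)
{
    return len / sizeof(long);
}
```

For a 64-byte buffer, long_stores_needed(64) would be 16 where long is 4 bytes and 8 where long is 8 bytes. Whether the compiler turns either loop into something better is exactly what the timings above are measuring, so this sketch says nothing about the final performance by itself.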