On Sun, 2020-04-19 at 16:19 -0700, Peter Geoghegan wrote:
> Is it possible that the issue has something to do with what the
> compiler knows about the alignment of the tapes back when they were a
> flexible array vs. now, where it's a separate allocation? Perhaps I'm
> over reaching, but it occurs to me that MemSetAligned() is itself
> concerned about the alignment of data returned from palloc(). Could
> be a similar issue here, too.

The memcpy() is for the buffer, not the array of LogicalTapes, so I
don't really see how that would happen.

> Some guy on the internet says that microarchitectural issues can make
> __memcpy_avx_unaligned() a lot faster than the "rep movsq" instruction
> (which you mentioned was a factor on the other thread) in some cases
> [1]. This explanation sounds kind of plausible.
>
> [1] https://news.ycombinator.com/item?id=12050579

That raises another consideration: perhaps this is not uniformly a
regression, but actually faster in some situations? If so, what
situations?

Regards,
	Jeff Davis