On Mon, Nov 06, 2023 at 09:53:15PM -0800, Noah Misch wrote:
> On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote:
>> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
>> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of 
>> > choosing
>> > a function implementation based on the runtime CPU, without incurring 
>> > function
>> > pointer overhead.  I would not attempt to use AVX512 on non-glibc systems, 
>> > and
>> > I would use ifunc to select the desired popcount implementation on glibc:
>> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
>> 
>> Thanks, that seems promising for the function pointer cases.  I'll plan on
>> trying to convert one of the existing ones to use it.  BTW it looks like
>> LLVM has something similar [0].
>> 
>> IIUC this unfortunately wouldn't help for cases where we wanted to keep
>> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
>> unless we applied it to the calling functions, but that doesn't Ń•ound
>> particularly maintainable.
> 
> Agreed, it doesn't solve inline cases.  If the gains are big enough, we should
> move toward packages containing N CPU-specialized copies of the postgres
> binary, with bin/postgres just exec'ing the right one.

I performed a quick test with ifunc on my x86 machine that ordinarily uses
the runtime checks for the CRC32C code, and I actually see a consistent
3.5% regression for pg_waldump -z on 100M 65-byte records.  I've attached
the patch used for testing.

The multiple-copies-of-the-postgres-binary idea seems interesting.  That's
probably not something that could be enabled by default, but perhaps we
could add support for a build option.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/include/port/pg_crc32c.h b/src/include/port/pg_crc32c.h
index d085f1dc00..6db411ee29 100644
--- a/src/include/port/pg_crc32c.h
+++ b/src/include/port/pg_crc32c.h
@@ -78,7 +78,7 @@ extern pg_crc32c pg_comp_crc32c_loongarch(pg_crc32c crc, const void *data, size_
 #define FIN_CRC32C(crc) ((crc) ^= 0xFFFFFFFF)
 
 extern pg_crc32c pg_comp_crc32c_sb8(pg_crc32c crc, const void *data, size_t len);
-extern pg_crc32c (*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len);
+extern pg_crc32c pg_comp_crc32c(pg_crc32c crc, const void *data, size_t len);
 
 #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK
 extern pg_crc32c pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len);
diff --git a/src/port/pg_crc32c_sse42_choose.c b/src/port/pg_crc32c_sse42_choose.c
index 41ff4a35ad..62bb981ee8 100644
--- a/src/port/pg_crc32c_sse42_choose.c
+++ b/src/port/pg_crc32c_sse42_choose.c
@@ -51,14 +51,14 @@ pg_crc32c_sse42_available(void)
  * so that subsequent calls are routed directly to the chosen implementation.
  */
 static pg_crc32c
-pg_comp_crc32c_choose(pg_crc32c crc, const void *data, size_t len)
+(*pg_comp_crc32c_choose (void))(pg_crc32c crc, const void *data, size_t len)
 {
 	if (pg_crc32c_sse42_available())
-		pg_comp_crc32c = pg_comp_crc32c_sse42;
+		return pg_comp_crc32c_sse42;
 	else
-		pg_comp_crc32c = pg_comp_crc32c_sb8;
-
-	return pg_comp_crc32c(crc, data, len);
+		return pg_comp_crc32c_sb8;
 }
 
-pg_crc32c	(*pg_comp_crc32c) (pg_crc32c crc, const void *data, size_t len) = pg_comp_crc32c_choose;
+pg_crc32c
+pg_comp_crc32c(pg_crc32c crc, const void *data, size_t len)
+	__attribute__ ((ifunc ("pg_comp_crc32c_choose")));

Reply via email to