On 02/11/2015 11:28 AM, Abhijit Menon-Sen wrote:
1. I don't mind moving platform-specific tests for CRC32C instructions
    to configure, but if we only define PG_HAVE_CRC32C_INSTRUCTIONS, we
    would anyway have to reproduce all that platform-specific stuff in
    the header file. To do it properly, I think we should generate the
    right version of crc_instructions.h for the platform. Otherwise,
    what's the point? Might as well have only the complicated header.

I don't follow. I didn't change configure at all, compared to your patch.

2. At this point, I think we should stick with the _sse function rather
    than a generic _hw one to "drive" the platform-specific instructions.
    The structure of that function (n/8*(8bytes)+n%8*(bytes)) is specific
    to what works best for the SSE instructions. On another platform, we
    may need something very similar, or very different.

    With the proposed structure, _hw would inevitably become #ifdef'ed
    non-trivially, e.g. to run a single-byte loop at the start to align
    the main loop, but only for certain combinations of #defines.

    I think we should consider what makes the most sense when someone
    actually wants to add support for another platform/compiler, and
    not before.

Hmm, ok. The ARM CRC32C instruction is very similar to the SSE4.2 one, but I presume it does require alignment.

3. I dislike everything about pg_crc32_instructions_runtime_check()—the
    name, the separate PG_CRC32C_INSTRUCTIONS_NEED_RUNTIME_CHECK #define,
    the fact that you try to do the test "inline" rather than at startup…

    Maybe you're trying to avoid checking separately at startup so that a
    program that links with code from src/common doesn't need to do its
    own test. But I don't think the results are worth it.

As a case in point, with your patch pg_xlogdump would not have used the CRC instruction, because it never called pg_choose_crc_impl(). "Choose on first use" is much more convenient than requiring every program to call a function.

    As for omitting the slicing-by-8 tables, if there's a more compelling
    reason to do that than "Gentoo users might like that ;-)", I think it
    should be done with a straightforward -DUSE_CRC_SSE style arrangement
    rather than figuring out that you HAVE_CRC32C_INSTRUCTIONS but don't
    NEED_RUNTIME_CHECK. If you want to build special-purpose binaries,
    just pick whichever CRC implementation you like, done.

I stopped short of actually allowing you to force the use of SSE, e.g. with -DUSE_CRC_SSE, but my thinking was that that would force PG_HAVE_CRC32C_INSTRUCTIONS to be set, and NEEDS_RUNTIME_CHECK to be unset. I.e. those two #defines control what code is generated, but in the header file. I think what you're suggesting is that we'd instead have two mutually exclusive #defines, something like "USE_CRC_SSE_ALWAYS" and "USE_CRC_SSE_WITH_RUNTIME_CHECK". It wouldn't be much different, but would require some places to check for both.

It would be nice to use GCC's builtin intrinsics,
__builtin_ia32_crc32* where available, but I guess those are only
available if you build with -msse4.2, and that would defeat the
point of the runtime check.

Exactly.

Hmm. Perhaps we should check for the built-in functions, and if they're available, use them and not set NEEDS_RUNTIME_CHECK. If you compile with -msse4.2, the code won't run without SSE4.2 anyway (or at least isn't guaranteed to run).

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to