Re: [HACKERS] A population of population counts
On Sun, May 8, 2016 at 7:46 PM, Thomas Munrowrote: > My aim with this thread was mainly reducing code duplication and > needless code: perhaps at least the other ideas in the attached > sketch, namely using ffs instead of the rightmost_one_pos table loop > and consolidation of popcount into a reusable API (without trying to > get hardware support) could be worth polishing for the next CF? > Annoyingly, it seems Windows doesn't have POSIX/SUSv2 ffs, though > apparently it can reach that instruction with MSVC intrinsic > _BitScanReverse or MingW __builtin_ffs. I think my_log2() is the same thing as one of ffs() and fls() - I can never keep those straight. It seems like it wouldn't he hard to clean things up at least that much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A population of population counts
On Sat, May 7, 2016 at 4:26 PM, Tom Lanewrote: > David Rowley writes: >> I'd like to see us using those functions, when they're available and >> falling back on the array when they're not. Sounds like that would >> just be a new configure test. Perhaps a good home for some shared code >> would be numutils.c. > > Meh --- numutils.c is about numbers. Maybe "bitutils.c"? > > Another point here is that there might easily be a use-case for such > functionality in frontend code, so I'd lean towards putting the support in > src/common if we do this. I played around with this a bit on the weekend (see rough sketch attached). The trouble with __builtin_popcount and the POPCNT instruction is that you only get them if you ask for -msse4.2 or -mpopcnt, and then you get an illegal instruction trap instead of sensible fallback behaviour on ancient hardware, so no packager would be able to do that. So I guess we'd have to use a runtime test like we do for CRC32 hardware support (the test may actually be identical since both depend on the SSE4.2 instruction set) and maybe inline assembler rather the GCC builtin, and that seems a bit over the top unless someone can show that it's worth it. My aim with this thread was mainly reducing code duplication and needless code: perhaps at least the other ideas in the attached sketch, namely using ffs instead of the rightmost_one_pos table loop and consolidation of popcount into a reusable API (without trying to get hardware support) could be worth polishing for the next CF? Annoyingly, it seems Windows doesn't have POSIX/SUSv2 ffs, though apparently it can reach that instruction with MSVC intrinsic _BitScanReverse or MingW __builtin_ffs. -- Thomas Munro http://www.enterprisedb.com popcount-and-rightmostbitpos-table-cleanup.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A population of population counts
David Rowleywrites: > I'd like to see us using those functions, when they're available and > falling back on the array when they're not. Sounds like that would > just be a new configure test. Perhaps a good home for some shared code > would be numutils.c. Meh --- numutils.c is about numbers. Maybe "bitutils.c"? Another point here is that there might easily be a use-case for such functionality in frontend code, so I'd lean towards putting the support in src/common if we do this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A population of population counts
On 7 May 2016 at 12:41, Thomas Munrowrote: > Hi > > I noticed that we have three "number_of_ones" tables under contrib and > two under src, and some new specially masked variants for visibility > maps. > > Would it be an improvement if we just defined one table with external > linkage, and accessed it via a macros/functions popcount_uint8, and > wider versions _uint32, popcount_array(data, len) that sum the > popcounts of their component bytes? > > Then there would be less duplication, and future opportunities to use > fancy built-ins/assembler instructions/vectorisation in one central > place, and to work in larger sizes than bytes. > > Perhaps we could also get rid of the new special masked popcount > tables by masking the value we look up instead, eg walk through the > page calling popcount_uint64(value & FROZEN_BITMASK_64). > > As for the rightmost_one_pos table in bitmapset.c, couldn't the > various bms_XXX functions just use ffs(n) - 1 and work word-at-a-time? > That generates a bsf instruction at -O2 on this machine. > > The micro-optimisation opportunities probably don't matter, but I > wondered if it might at least be interesting to delete a bunch of > code, and re-use a standard interface for something that apparently > several modules need to do. I remember looking at GCC's __builtin_popcount() about 6 months ago with thoughts about using it for bms_num_members(). I did see performance improvements from micro-benchmarks to compare it to the number_of_ones[] array, and saw a small improvement. Likely the improvements I did see with those would actually have been better in a real world case, since not having a number_of_ones[] array in a CPU cache might be more useful for a workload with a more contended cache. I'd like to see us using those functions, when they're available and falling back on the array when they're not. Sounds like that would just be a new configure test. Perhaps a good home for some shared code would be numutils.c. I see the functions there declared in builtins.h which I see used in contrib/spi. I think it could all be made to work quite nicely with types like bitmapword just with some preprocessor trickery. I'd say go for it. Sounds like a like these would be some nice reusable functions that we already have a suitable home for. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] A population of population counts
Hi I noticed that we have three "number_of_ones" tables under contrib and two under src, and some new specially masked variants for visibility maps. Would it be an improvement if we just defined one table with external linkage, and accessed it via a macros/functions popcount_uint8, and wider versions _uint32, popcount_array(data, len) that sum the popcounts of their component bytes? Then there would be less duplication, and future opportunities to use fancy built-ins/assembler instructions/vectorisation in one central place, and to work in larger sizes than bytes. Perhaps we could also get rid of the new special masked popcount tables by masking the value we look up instead, eg walk through the page calling popcount_uint64(value & FROZEN_BITMASK_64). As for the rightmost_one_pos table in bitmapset.c, couldn't the various bms_XXX functions just use ffs(n) - 1 and work word-at-a-time? That generates a bsf instruction at -O2 on this machine. The micro-optimisation opportunities probably don't matter, but I wondered if it might at least be interesting to delete a bunch of code, and re-use a standard interface for something that apparently several modules need to do. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers