On Wed, Sep 10, 2014 at 12:43 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Sep 9, 2014 at 10:08 AM, Arthur Silva <arthur...@gmail.com> wrote:
> > I'm continuously studying Postgres codebase. Hopefully I'll be able to
> > make some contributions in the future.
> >
> > For now I'm intrigued about the extensive use of memory alignment. I'm
> > sure there's some legacy and some architecture that requires it
> > reasoning behind it.
> >
> > That aside, since it wastes space (a lot of space in some cases) there
> > must be a tipping point somewhere. I'm sure one can prove aligned
> > access is faster in a micro-benchmark but I'm not sure it's the case in
> > a DBMS like postgres, specially in the page/rows area.
> >
> > Just for the sake of comparison Mysql COMPACT storage (default and
> > recommended since 5.5) doesn't align data at all. Mysql NDB uses a
> > fixed 4-byte alignment. Not sure about Oracle and others.
> >
> > Is it worth the extra space in newer architectures (specially Intel)?
> > Do you guys think this is something worth looking at?
>
> Yes. At least in my opinion, though, it's not a good project for a
> beginner. If you get your changes to take effect, you'll find that a
> lot of things will break in places that are not easy to find or fix.
> You're getting into really low-level areas of the system that get
> touched infrequently and require a lot of expertise in how things work
> today to adjust.
>

I thought all memory alignment (or at least the bulk of it) was handled
using some codebase-wide macros/settings; otherwise, how could different
parts of the code interoperate? Poking at this area might suffice for some
initial testing to check whether it's worth any more attention.

Unaligned memory access has received a lot of attention on Intel hardware
since the Nehalem era, so it may very well pay off on Intel servers.
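
As an illustration of what "unaligned access" means at the C level, here is
a minimal sketch of a portable unaligned 8-byte load. The helper name
load_u64_unaligned is hypothetical and is not taken from the PostgreSQL
source. It relies only on memcpy, which is defined for any alignment; how
cheaply a compiler and CPU execute it depends on the platform.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): read a 64-bit value
 * from an address that may or may not be 8-byte aligned.
 *
 * Dereferencing a misaligned uint64_t pointer is undefined behavior in C,
 * so the bytes are copied into a properly aligned local instead.
 */
static inline uint64_t
load_u64_unaligned(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}

/*
 * Hypothetical counterpart: store a 64-bit value at an arbitrary address.
 */
static inline void
store_u64_unaligned(void *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Storing a value at buf + 1 with store_u64_unaligned and reading it back
with load_u64_unaligned returns the same value regardless of the buffer's
alignment, which makes this a convenient starting point for a
micro-benchmark comparing aligned and unaligned offsets.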

You might find this blog post and its comments and external links
interesting:
http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

I'm a newbie in the codebase, so please let me know if I'm saying anything
nonsensical.

> The idea I've had before is to try to reduce the widest alignment we
> ever require from 8 bytes to 4 bytes. That is, look for types with
> typalign = 'd', and rewrite them to have typalign = 'i' by having them
> use two 4-byte loads to load an eight-byte value. In practice, I
> think this would probably save a high percentage of what can be saved,
> because 8-byte alignment implies a maximum of 7 bytes of wasted space,
> while 4-byte alignment implies a maximum of 3 bytes of wasted space.
> And it would probably be pretty cheap, too, because any type with less
> than 8 byte alignment wouldn't be affected at all, and even those
> types that were affected would only be slightly slowed down by doing
> two loads instead of one. In contrast, getting rid of alignment
> requirements completely would save a little more space, but probably
> at the cost of a lot more slowdown: any type with alignment
> requirements would have to fetch the value byte-by-byte instead of
> pulling the whole thing out at once.
>

Does byte-by-byte access still hold true nowadays? I thought modern
processors would fetch memory at the very least in word-sized chunks
(4 or 8 bytes) and then merge and slice the result.

> But there are a couple of obvious problems with this idea, too, such as:
>
> 1. It's really complicated and a ton of work.
> 2. It would break pg_upgrade pretty darn badly unless we employed some
> even-more-complex strategy to mitigate that.
> 3. The savings might not be enough to justify the effort.
>

Very true.

> It might be interesting for someone to develop a tool measuring the
> number of bytes of alignment padding we lose per tuple or per page and
> gather some statistics on it on various databases. That would give us
> some sense as to the possible savings.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
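
To make the "7 bytes versus 3 bytes" arithmetic and the padding-measurement
idea concrete, here is a rough sketch of how padding could be counted for a
row of fixed-width columns. It is a simplified model and not PostgreSQL
code: the Column struct and both function names are hypothetical, and it
ignores the tuple header, the null bitmap, variable-length values and any
trailing padding.

```c
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t
align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical description of a fixed-width column. */
typedef struct
{
    size_t size;    /* width of the value in bytes */
    size_t align;   /* required alignment: 1, 2, 4 or 8 */
} Column;

/*
 * Count the padding bytes inserted between columns laid out in order,
 * with every alignment requirement capped at max_align. Passing 8 models
 * the current widest alignment; passing 4 models the proposal of
 * treating 8-byte-aligned types as 4-byte-aligned.
 */
static size_t
padding_bytes(const Column *cols, size_t ncols, size_t max_align)
{
    size_t offset = 0;
    size_t padding = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        size_t a = cols[i].align < max_align ? cols[i].align : max_align;
        size_t aligned = align_up(offset, a);

        padding += aligned - offset;
        offset = aligned + cols[i].size;
    }
    return padding;
}
```

For a row with columns of 1, 8, 4 and 8 bytes (each aligned to its own
size), this model gives 11 padding bytes with max_align = 8 and 3 padding
bytes with max_align = 4. A real measurement tool would have to walk actual
heap pages rather than rely on a model like this.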