On Wed, Sep 10, 2014 at 12:43 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Sep 9, 2014 at 10:08 AM, Arthur Silva <arthur...@gmail.com> wrote:
> > I'm continuously studying the Postgres codebase. Hopefully I'll be able
> > to make some contributions in the future.
> >
> > For now I'm intrigued about the extensive use of memory alignment. I'm
> > sure there's some legacy and some architecture-requires-it reasoning
> > behind it.
> >
> > That aside, since it wastes space (a lot of space in some cases) there
> > must be a tipping point somewhere. I'm sure one can prove aligned access
> > is faster in a micro-benchmark but I'm not sure it's the case in a DBMS
> > like postgres, especially in the page/rows area.
> >
> > Just for the sake of comparison MySQL COMPACT storage (default and
> > recommended since 5.5) doesn't align data at all. MySQL NDB uses a fixed
> > 4-byte alignment. Not sure about Oracle and others.
> >
> > Is it worth the extra space in newer architectures (especially Intel)?
> > Do you guys think this is something worth looking at?
>
> Yes.  At least in my opinion, though, it's not a good project for a
> beginner.  If you get your changes to take effect, you'll find that a
> lot of things will break in places that are not easy to find or fix.
> You're getting into really low-level areas of the system that get
> touched infrequently and require a lot of expertise in how things work
> today to adjust.
>

I thought all memory alignment (or at least the bulk of it) was handled
using some codebase-wide macros/settings; otherwise how could different
parts of the code interoperate? Poking at this area might suffice for some
initial testing to check whether it's worth any more attention.
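
To make the question concrete: the rounding itself does go through a small
family of macros in src/include/c.h (TYPEALIGN, MAXALIGN and friends). Below
is a minimal sketch of that idea. The SKETCH_ names and sketch_padding() are
hypothetical, modeled on those macros rather than copied from the tree, and
alignments are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round `len` up to the next multiple of `alignval` (a power of two).
 * Hypothetical name; modeled on the TYPEALIGN macro family, not copied.
 */
#define SKETCH_TYPEALIGN(alignval, len) \
    (((uintptr_t) (len) + ((alignval) - 1)) & ~((uintptr_t) ((alignval) - 1)))

#define SKETCH_INTALIGN(len)    SKETCH_TYPEALIGN(4, (len))
#define SKETCH_DOUBLEALIGN(len) SKETCH_TYPEALIGN(8, (len))

/* Bytes of padding inserted before a field that would start at `offset`. */
static size_t
sketch_padding(size_t offset, size_t alignval)
{
    /* the bit trick above only works for power-of-two alignments */
    assert(alignval != 0 && (alignval & (alignval - 1)) == 0);
    return (size_t) SKETCH_TYPEALIGN(alignval, offset) - offset;
}
```

With these definitions, a field that would start at offset 1 needs 7 bytes of
padding under 8-byte alignment and 3 bytes under 4-byte alignment.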

Unaligned memory access has received a lot of attention in Intel's
post-Nehalem era, so it may very well pay off on Intel servers. You might
find this blog post and its comments/external links interesting:
http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/

I'm a newbie in the codebase, so please let me know if I'm talking
nonsense.


> The idea I've had before is to try to reduce the widest alignment we
> ever require from 8 bytes to 4 bytes.  That is, look for types with
> typalign = 'd', and rewrite them to have typalign = 'i' by having them
> use two 4-byte loads to load an eight-byte value.  In practice, I
> think this would probably save a high percentage of what can be saved,
> because 8-byte alignment implies a maximum of 7 bytes of wasted space,
> while 4-byte alignment implies a maximum of 3 bytes of wasted space.
> And it would probably be pretty cheap, too, because any type with less
> than 8 byte alignment wouldn't be affected at all, and even those
> types that were affected would only be slightly slowed down by doing
> two loads instead of one.  In contrast, getting rid of alignment
> requirements completely would save a little more space, but probably
> at the cost of a lot more slowdown: any type with alignment
> requirements would have to fetch the value byte-by-byte instead of
> pulling the whole thing out at once.
>

Does byte-by-byte access still hold true nowadays? I thought modern
processors would fetch memory at the very least in "word"-sized chunks, so
4/8 bytes at a time, and then merge/slice.
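
To illustrate the three options being compared, here is a rough sketch of
loading an 8-byte value from a possibly unaligned address. The function names
are hypothetical (nothing here is from the Postgres tree), and a little-endian
host is assumed so that all three variants return the same value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Option 1: assemble the value one byte at a time (little-endian order). */
static uint64_t
load64_bytewise(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Option 2: two 4-byte loads merged into one 8-byte value.  memcpy keeps
 * this well-defined in C for any address; the point is that only 4-byte
 * granularity is needed.  Assumes a little-endian host.
 */
static uint64_t
load64_two_halves(const unsigned char *p)
{
    uint32_t lo, hi;

    memcpy(&lo, p, sizeof lo);
    memcpy(&hi, p + 4, sizeof hi);
    return ((uint64_t) hi << 32) | lo;
}

/* Option 3: one 8-byte copy; the compiler picks the load instructions. */
static uint64_t
load64_memcpy(const unsigned char *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether option 3 compiles down to a single unaligned load depends on the
compiler and target, so any timing comparison would need to be measured
rather than assumed.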


> But there are a couple of obvious problems with this idea, too, such as:
>
> 1. It's really complicated and a ton of work.
>
> 2. It would break pg_upgrade pretty darn badly unless we employed some
> even-more-complex strategy to mitigate that.
> 3. The savings might not be enough to justify the effort.
>

Very true.


> It might be interesting for someone to develop a tool measuring the
> number of bytes of alignment padding we lose per tuple or per page and
> gather some statistics on it on various databases.  That would give us
> some sense as to the possible savings.
>
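
As a starting point for that kind of measurement, here is a toy sketch that
counts padding bytes for a list of fixed-width columns under a given maximum
alignment. SketchColumn and sketch_tuple_padding() are hypothetical names; the
model ignores tuple headers, NULLs, and variable-length types, so it is an
illustration of the arithmetic and not a real measurement tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one fixed-width column. */
typedef struct SketchColumn
{
    size_t len;     /* storage size in bytes */
    size_t align;   /* required alignment, a power of two */
} SketchColumn;

/*
 * Total padding bytes inserted when laying out the columns in order,
 * with every alignment requirement capped at `max_align`.
 */
static size_t
sketch_tuple_padding(const SketchColumn *cols, size_t ncols, size_t max_align)
{
    size_t offset = 0;
    size_t padding = 0;

    for (size_t i = 0; i < ncols; i++)
    {
        size_t align = cols[i].align < max_align ? cols[i].align : max_align;
        size_t aligned;

        assert(align != 0 && (align & (align - 1)) == 0);
        aligned = (offset + align - 1) & ~(align - 1);
        padding += aligned - offset;
        offset = aligned + cols[i].len;
    }
    return padding;
}
```

For a row laid out as 1-byte, 8-byte, 4-byte, 8-byte columns, this model gives
11 bytes of padding with 8-byte maximum alignment and 3 bytes with the cap
lowered to 4.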

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
