GitHub user iyerr3 opened a pull request:
https://github.com/apache/madlib/pull/319
Allocator: Remove 16-byte alignment for pointers in GP6
Findings:
1. MADlib aligns pointers returned by palloc to a 16-byte boundary.
2. Postgres prepends a small header (usually 16 bytes) to every
allocation, immediately before the returned pointer. The header includes
a. the memory context and
b. the size of the memory allocation.
3. Greenplum 6+ tweaks that scheme a little: instead of the memory context,
the header tracks a "shared header" which points to another struct with
richer information (aside from the memory context).
4. Postgres calls MemoryContextContains on the result of both the final
function of an aggregate and the finalize function of a windowed
aggregate.
5. Currently, Postgres always concludes that the datum from MADlib was
allocated outside of the context and makes an extra copy. In Greenplum,
MemoryContextContains needs to dereference the shared header. This is a
problem because the pointer has been shifted, so the function
misinterprets the header.
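The interaction between findings 1, 2, and 5 can be illustrated with a
small self-contained model. This is only a sketch, not Postgres,
Greenplum, or MADlib code: ChunkHeader, toy_alloc, header_of, align_up,
and shifted_pointer_misreads_header are hypothetical names, malloc stands
in for palloc, and the real header layouts differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the header an allocator keeps in front of
// each chunk (the real Postgres/Greenplum layouts differ).
struct ChunkHeader {
    void*       context;  // owning memory context
    std::size_t size;     // requested allocation size
};

// Allocate `size` usable bytes preceded by a header, as in finding 2.
void* toy_alloc(void* context, std::size_t size) {
    unsigned char* raw =
        static_cast<unsigned char*>(std::malloc(sizeof(ChunkHeader) + size));
    if (raw == nullptr) return nullptr;
    ChunkHeader h{context, size};
    std::memcpy(raw, &h, sizeof h);
    return raw + sizeof(ChunkHeader);
}

// Recover the header the way a containment check would: by stepping
// back a fixed distance from the pointer it was given.
ChunkHeader header_of(const void* p) {
    ChunkHeader h;
    std::memcpy(&h, static_cast<const unsigned char*>(p) - sizeof(ChunkHeader),
                sizeof h);
    return h;
}

// Round a pointer up to the next multiple of `alignment` (a power of two).
void* align_up(void* p, std::size_t alignment) {
    std::uintptr_t v = reinterpret_cast<std::uintptr_t>(p);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<void*>((v + mask) & ~mask);
}

// Returns true when the header read through a shifted pointer no longer
// matches what the allocator wrote.
bool shifted_pointer_misreads_header() {
    int ctx = 0;                    // dummy "memory context"
    const std::size_t n = 64;
    // Over-allocate so the shifted pointer still has n usable bytes.
    unsigned char* p = static_cast<unsigned char*>(toy_alloc(&ctx, n + 16));
    if (p == nullptr) return false;
    std::memset(p, 0xAB, n + 16);   // recognizable payload bytes
    // Force a shift of 1..16 bytes to model an aligning wrapper.
    void* shifted = align_up(p + 1, 16);
    ChunkHeader good = header_of(p);
    ChunkHeader bad  = header_of(shifted);  // overlaps payload bytes
    bool mismatch = good.context == &ctx && good.size == n + 16 &&
                    bad.size != good.size;
    std::free(p - sizeof(ChunkHeader));
    return mismatch;
}
```

In this model the header read through the shifted pointer overlaps
payload bytes, so its fields are garbage. A check that merely compares
the context field would report "not contained", while a check that
follows a pointer stored in the header would dereference an invalid
address.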
In this commit, we disable the pointer alignment for GPDB 6+ to avoid
failure in this check. We also have to disable vectorization in Eigen,
since it does not work when pointers are not 16-byte aligned.
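A compile-time switch of this kind might look roughly like the sketch
below. The flag TOY_TARGET_IS_GPDB6_OR_LATER, the constants, and
finalize_address are hypothetical and are not taken from the MADlib
sources. EIGEN_DONT_VECTORIZE is a preprocessor macro documented by Eigen
for turning off explicit vectorization; whether this commit uses that
exact macro is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical build flag; a real build would derive it from the target
// database version.
#ifndef TOY_TARGET_IS_GPDB6_OR_LATER
#define TOY_TARGET_IS_GPDB6_OR_LATER 1
#endif

#if TOY_TARGET_IS_GPDB6_OR_LATER
// Must be defined before any Eigen header is included (none is included
// in this sketch).
#define EIGEN_DONT_VECTORIZE
constexpr bool kAlignPointers = false;
#else
constexpr bool kAlignPointers = true;
#endif

constexpr std::size_t kAlignment = 16;

// Decide which address to hand to the caller for a chunk whose
// allocator-returned address is `addr`.
std::uintptr_t finalize_address(std::uintptr_t addr) {
    if (!kAlignPointers) {
        // Return exactly what the allocator produced, so that the header
        // in front of it stays where the database expects it.
        return addr;
    }
    std::uintptr_t mask = static_cast<std::uintptr_t>(kAlignment) - 1;
    return (addr + mask) & ~mask;
}
```

With the flag set, the address passes through unchanged, which is why
vectorized code that assumes 16-byte alignment can no longer be used
safely.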
Co-authored-by: Jesse Zhang <[email protected]>
Co-authored-by: Nandish Jayaram <[email protected]>
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/madlib/madlib bugfix/pointer_alignment_fix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/madlib/pull/319.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #319
----
commit 5cd1b6b56f93b177df3cda93f18cbf76535b3a07
Author: Rahul Iyer <riyer@...>
Date: 2018-09-12T23:59:59Z
Allocator: Remove 16-byte alignment for pointers in GP6
----
---