GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/319

Allocator: Remove 16-byte alignment for pointers in GP6

Findings:
1. MADlib performs a 16-byte alignment for pointers returned by palloc.
2. Postgres prepends a small (usually 16-byte) header before every pointer, which includes
   a. the memory context and
   b. the size of the memory allocation.
3. Greenplum 6+ tweaks that scheme a little: instead of the memory context, the header tracks a "shared header" which points to another struct with richer information (aside from the memory context).
4. Postgres calls MemoryContextContains for both the final function of an aggregate and the finalize function of a windowed aggregate.
5. Currently Postgres always concludes that the datum from MADlib is allocated outside of the context, and makes an extra copy. In Greenplum, MemoryContextContains needs to dereference the shared header. This is a problem since the pointer has been shifted and the function is misinterpreting the header.

In this commit, we disable the pointer alignment for GPDB 6+ to avoid failure in this check. Further, we also have to disable vectorization in Eigen since it does not work when pointers are not 16-byte aligned.

Co-authored-by: Jesse Zhang <sbje...@gmail.com>
Co-authored-by: Nandish Jayaram <njaya...@apache.org>

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib bugfix/pointer_alignment_fix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #319

----
commit 5cd1b6b56f93b177df3cda93f18cbf76535b3a07
Author: Rahul Iyer <riyer@...>
Date:   2018-09-12T23:59:59Z

    Allocator: Remove 16-byte alignment for pointers in GP6

----
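
The header misinterpretation described in the findings can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `ChunkHeader`, `toy_alloc`, `align_up_16` and `toy_context_contains` are illustrative stand-ins, not the real Postgres, Greenplum or MADlib code, and the real header layouts differ. The sketch only shows the general mechanism: a check that reads the header directly in front of a pointer picks up the wrong bytes once the pointer has been shifted to a 16-byte boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an allocator chunk header. This is NOT the
// real Postgres or Greenplum struct; it only mimics "context + size".
struct ChunkHeader {
    std::uintptr_t context;  // identifies the owning memory context
    std::size_t size;        // size of the allocation
};

// Backing storage for the toy allocator, 16-byte aligned so that the
// offsets used below are deterministic.
alignas(16) static unsigned char g_arena[256];

// Toy allocation: writes a header directly in front of the returned
// pointer. The pointer is deliberately placed at arena offset 24, i.e.
// 8 bytes past a 16-byte boundary, so a later alignment step must move it.
void* toy_alloc(std::uintptr_t context, std::size_t size) {
    const std::size_t user_offset = 24;
    assert(user_offset + size + 16 <= sizeof(g_arena));
    std::memset(g_arena, 0, sizeof(g_arena));
    const ChunkHeader header{context, size};
    std::memcpy(g_arena + user_offset - sizeof(ChunkHeader), &header,
                sizeof(header));
    return g_arena + user_offset;
}

// Rounds a pointer up to the next 16-byte boundary (the kind of shift
// the findings describe, in simplified form).
void* align_up_16(void* p) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>((addr + 15) &
                                   ~static_cast<std::uintptr_t>(15));
}

// Simplified containment check: assume the header sits immediately
// before the pointer, read it, and compare the context field.
bool toy_context_contains(std::uintptr_t context, const void* user_ptr) {
    ChunkHeader header;
    std::memcpy(&header,
                static_cast<const unsigned char*>(user_ptr) -
                    sizeof(ChunkHeader),
                sizeof(header));
    return header.context == context;
}
```

With an arbitrary context value such as `0xC0FFEE`, `toy_context_contains(0xC0FFEE, p)` holds for the pointer `p` returned by `toy_alloc(0xC0FFEE, 64)`, but not for `align_up_16(p)`: the check then reads bytes that straddle the real header and the user data. In this toy the result is only a failed comparison; if the misread field were a pointer that the check dereferences, as the findings describe for the Greenplum shared header, the outcome could be worse than an extra copy.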