> On Jan 21, 2021, at 6:48 PM, Justin Pryzby <pry...@telsasoft.com> wrote:
>
> @cfbot: rebased
> <0001-Reorganize-pglz-compression-code.patch>
Review comments.
First, I installed a build from master without this patch, created a test
installation with lots of compressed text and array columns, upgraded the
binaries to a build with this patch included, and tried to find problems with
the data left over from the pre-patch binaries. Everything checks out. This
is on little-endian mac osx intel core i9, not on any ARM platform that you are
targeting with portions of the patch.
+/**************************************
+ * CPU Feature Detection *
+ **************************************/
+/* PGLZ_FORCE_MEMORY_ACCESS
+ * By default, access to unaligned memory is controlled by `memcpy()`, which
is safe and portable.
+ * Unfortunately, on some target/compiler combinations, the generated assembly
is sub-optimal.
+ * The below switch allow to select different access method for improved
performance.
+ * Method 0 (default) : use `memcpy()`. Safe and portable.
+ * Method 1 : direct access. This method is portable but violate C standard.
+ * It can generate buggy code on targets which assembly generation
depends on alignment.
+ * But in some circumstances, it's the only known way to get the most
performance (ie GCC + ARMv6)
+ * See
https://fastcompression.blogspot.fr/2015/08/accessing-unaligned-memory.html for
details.
+ * Prefer these methods in priority order (0 > 1)
+ */
The link to blogspot.fr has a lot more information than your summary in the
code comments. It might be hard to understand this comment if the blogspot
article were ever to disappear. Perhaps you could include a bit more of the
relevant details?
+#ifndef PGLZ_FORCE_MEMORY_ACCESS /* can be defined externally */
+#if defined(__GNUC__) && \
+ ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) ||
defined(__ARM_ARCH_6K__) \
+ || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) ||
defined(__ARM_ARCH_6T2__) )
+#define PGLZ_FORCE_MEMORY_ACCESS 1
+#endif
+#endif
I can understand wanting to set this on gcc + ARMv6, but doesn't this belong in
a configure script rather than directly in the compression code?
The blogspot article indicates that the author lied about alignment to the
compiler when using gcc on ARMv6, thereby generating a fast load instruction
which happens to work on ARMv6. You appear to be using that same approach.
Your #if defined(__GNUC__), seems to assume that all future versions of gcc
will generate the instructions that you want, and not start generating some
other set of instructions. Wouldn't you at least need a configure test to
verify that the version of gcc being used generates the desired assembly? Even
then, you'd be banking on gcc doing the same thing for the test code and for
the pglz code, which I guess might not be true. Have you considered using
inline assembly instead?
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company