On Mon, Dec 04, 2023 at 07:27:01AM +, Xiang Gao wrote:
> This is the latest patch. Looking forward to your feedback, thanks!
Thanks for the new patch. I am hoping to spend much more time on this in
the near future...
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Date: Thu, 30 Nov 2023 14:54:26PM -0600, Nathan Bossart wrote:
>>pg_crc32c_armv8.o: CFLAGS += ${CFLAGS_CRC} ${CFLAGS_CRYPTO}
>>
>> It does not work correctly. CFLAGS ='-march=armv8-a+crc,
>> -march=armv8-a+crypto', what actually works is '-march=armv8-a+crypto'.
>>
>> We set a new variable
On Thu, Nov 23, 2023 at 08:05:26AM +, Xiang Gao wrote:
> On Date: Wed, 22 Nov 2023 15:06:18PM -0600, Nathan Bossart wrote:
>>pg_crc32c_armv8.o: CFLAGS += ${CFLAGS_CRC} ${CFLAGS_CRYPTO}
>
> It does not work correctly. CFLAGS ='-march=armv8-a+crc,
> -march=armv8-a+crypto', what actually works
On Date: Wed, 22 Nov 2023 15:06:18PM -0600, Nathan Bossart wrote:
>> On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote:
>>>+__attribute__((target("+crc+crypto")))
>>>
>>>I'm not sure we can assume that all compilers will understand this, and I'm
>>>not sure we need it.
>>
>>
On Wed, Nov 22, 2023 at 10:16:44AM +, Xiang Gao wrote:
> On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote:
>>+__attribute__((target("+crc+crypto")))
>>
>>I'm not sure we can assume that all compilers will understand this, and I'm
>>not sure we need it.
>
> CFLAGS_CRC is
On Date: Fri, 10 Nov 2023 10:36:08AM -0600, Nathan Bossart wrote:
>-# all versions of pg_crc32c_armv8.o need CFLAGS_CRC
>-pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
>-pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
>-pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC)
>
>Why are these lines deleted?
>
>-
On Tue, Nov 07, 2023 at 08:05:45AM +, Xiang Gao wrote:
> I think I understand what you mean, this is the latest patch. Thank you!
Thanks for the new patch.
+# PGAC_ARMV8_VMULL_INTRINSICS
+#
+# Check if the compiler supports the vmull_p64
+# intrinsic functions.
On Mon, 6 Nov 2023 13:16:13PM -0600, Nathan Bossart wrote:
>>> The idea is that we don't want to start forcing runtime checks on builds
>>>where we aren't already doing runtime checks. IOW if the compiler can use
>>>the ARMv8 CRC instructions with the default compiler flags, we should only
>>>use
On Fri, Nov 03, 2023 at 10:46:57AM +, Xiang Gao wrote:
> On Date: Thu, 2 Nov 2023 09:35:50AM -0500, Nathan Bossart wrote:
>> The idea is that we don't want to start forcing runtime checks on builds
>> where we aren't already doing runtime checks. IOW if the compiler can use
>> the ARMv8 CRC
On Date: Thu, 2 Nov 2023 09:35:50AM -0500, Nathan Bossart wrote:
>On Thu, Nov 02, 2023 at 06:17:20AM +, Xiang Gao wrote:
>> After reading the discussion, I understand that in order to avoid performance
>> regression in some instances, we need to try our best to avoid runtime
>> checks.
> >I
On Thu, Nov 02, 2023 at 06:17:20AM +, Xiang Gao wrote:
> After reading the discussion, I understand that in order to avoid performance
> regression in some instances, we need to try our best to avoid runtime checks.
> I don't know if I understand it correctly.
The idea is that we don't want
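
A minimal sketch of the distinction being drawn here, not the patch's code: when the build already knows the CRC instructions are usable, the call binds directly to the fast routine; otherwise a function pointer is resolved on first use. All names below (HYPOTHETICAL_CRC_NO_RUNTIME_CHECK, crc_portable, crc_accelerated, cpu_has_crc, COMP_CRC) are made up for illustration, and the "accelerated" routine is only a stand-in that reuses the portable loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bitwise CRC-32C register update (reflected polynomial). */
static uint32_t
crc_portable(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Stand-in for a routine built on CRC instructions (assumption: real code
 * would use compiler intrinsics here). */
static uint32_t
crc_accelerated(uint32_t crc, const void *data, size_t len)
{
    return crc_portable(crc, data, len);
}

#ifdef HYPOTHETICAL_CRC_NO_RUNTIME_CHECK
/* Default compiler flags already allow the instructions: no runtime check. */
#define COMP_CRC(crc, data, len) crc_accelerated((crc), (data), (len))
#else
/* Otherwise pick an implementation the first time the pointer is called. */
static uint32_t crc_choose(uint32_t crc, const void *data, size_t len);
static uint32_t (*crc_impl) (uint32_t, const void *, size_t) = crc_choose;

/* Placeholder probe; a real one would query the CPU or the OS. */
static bool
cpu_has_crc(void)
{
    return false;
}

static uint32_t
crc_choose(uint32_t crc, const void *data, size_t len)
{
    crc_impl = cpu_has_crc() ? crc_accelerated : crc_portable;
    return crc_impl(crc, data, len);
}

#define COMP_CRC(crc, data, len) crc_impl((crc), (data), (len))
#endif
```

In the second branch every call goes through a pointer, which is the overhead the first branch avoids.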
On Tue, 31 Oct 2023 15:48:21PM -0500, Nathan Bossart wrote:
>> Thanks. I went ahead and split this prerequisite part out to a separate
>> thread [0] since it's sort-of unrelated to your proposal here. It's not
>> really a prerequisite, but I do think it will simplify things a bit.
>Per the
On Mon, Oct 30, 2023 at 11:21:43AM -0500, Nathan Bossart wrote:
> On Fri, Oct 27, 2023 at 07:01:10AM +, Xiang Gao wrote:
>> On Thu, 26 Oct, 2023 11:37:52AM -0500, Nathan Bossart wrote:
We consider that a runtime check needs to be done in any scenario.
Here we only confirm that the
On Fri, Oct 27, 2023 at 07:01:10AM +, Xiang Gao wrote:
> On Thu, 26 Oct, 2023 11:37:52AM -0500, Nathan Bossart wrote:
>>> We consider that a runtime check needs to be done in any scenario.
>>> Here we only confirm that the compilation can be successful.
>> >A runtime check will be done when
On Thu, 26 Oct, 2023 11:37:52AM -0500, Nathan Bossart wrote:
>> We consider that a runtime check needs to be done in any scenario.
>> Here we only confirm that the compilation can be successful.
> >A runtime check will be done when choosing which algorithm.
> >You can think of us as merging
On Thu, Oct 26, 2023 at 08:53:31AM +, Xiang Gao wrote:
> On Tue, 24 Oct, 2023 20:45:39PM -0500, Nathan Bossart wrote:
>>I tried this. pg_waldump on 2 million ~8kB records took around 8.1 seconds
>>without the patch and around 7.4 seconds with it (an 8% improvement).
>>pg_waldump on 1
On Thu, Oct 26, 2023 at 07:28:35AM +, Xiang Gao wrote:
> On Wed, 25 Oct, 2023 at 10:43:25 -0500, Nathan Bossart wrote:
>>+# Use ARM VMULL if available and ARM CRC32C intrinsic is available too.
>>+if test x"$USE_ARMV8_VMULL" = x"" && (test x"$USE_ARMV8_CRC32C" = x"1" ||
>>test
On Thu, Oct 26, 2023 at 2:23 PM Xiang Gao wrote:
>
> On Tue, 24 Oct, 2023 20:45:39PM -0500, Nathan Bossart wrote:
> >I tried this. pg_waldump on 2 million ~8kB records took around 8.1 seconds
> >without the patch and around 7.4 seconds with it (an 8% improvement).
> >pg_waldump on 1 million
On Tue, 24 Oct, 2023 20:45:39PM -0500, Nathan Bossart wrote:
>I tried this. pg_waldump on 2 million ~8kB records took around 8.1 seconds
>without the patch and around 7.4 seconds with it (an 8% improvement).
>pg_waldump on 1 million ~16kB records took around 3.2 seconds without the
>patch and
On Wed, 25 Oct, 2023 at 10:43:25 -0500, Nathan Bossart wrote:
>+pg_crc32c
>+pg_comp_crc32c_with_vmull_armv8(pg_crc32c crc, const void *data, size_t len)
>It looks like most of this function is duplicated from
>pg_comp_crc32c_armv8(). I understand that we probably need a separate
>function
+pg_crc32c
+pg_comp_crc32c_with_vmull_armv8(pg_crc32c crc, const void *data, size_t len)
It looks like most of this function is duplicated from
pg_comp_crc32c_armv8(). I understand that we probably need a separate
function because of the runtime check, but perhaps we could create a common
static
Thanks for your suggestion, this is the modified patch and two test files.
-----Original Message-----
From: Michael Paquier
Sent: Friday, October 20, 2023 4:19 PM
To: Xiang Gao
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: CRC32C Parallel Computation Optimization on ARM
On Fri, Oct 20
On Wed, Oct 25, 2023 at 07:17:55AM +0900, Michael Paquier wrote:
> If you are looking at computing the CRC of records with arbitrary
> sizes, why not just generating a series with
> pg_logical_emit_message() before doing a comparison with pg_waldump or
> a custom replay loop to go through the
On Wed, Oct 25, 2023 at 12:37:45AM +0300, Heikki Linnakangas wrote:
> On 25/10/2023 00:18, Nathan Bossart wrote:
>> Actually, since the pg_waldump benchmark likely only involves very small
>> WAL records, it would make sense that there isn't much difference.
>> *facepalm*
>
> No need to guess,
On 25/10/2023 00:18, Nathan Bossart wrote:
On Tue, Oct 24, 2023 at 04:09:54PM -0500, Nathan Bossart wrote:
I'm able to reproduce the speedup with the provided benchmark on an Apple
M1 Pro (which appears to have the required instructions). There was almost
no change for the 512-byte case, but
On Tue, Oct 24, 2023 at 04:09:54PM -0500, Nathan Bossart wrote:
> I'm able to reproduce the speedup with the provided benchmark on an Apple
> M1 Pro (which appears to have the required instructions). There was almost
> no change for the 512-byte case, but there was a ~60% speedup for the
>
On Fri, Oct 20, 2023 at 05:18:56PM +0900, Michael Paquier wrote:
> On Fri, Oct 20, 2023 at 07:08:58AM +, Xiang Gao wrote:
>> This patch uses a parallel computing optimization algorithm to
>> improve crc32c computing performance on ARM. The algorithm comes
>> from Intel whitepaper:
>>
On Fri, Oct 20, 2023 at 07:08:58AM +, Xiang Gao wrote:
> This patch uses a parallel computing optimization algorithm to
> improve crc32c computing performance on ARM. The algorithm comes
> from Intel whitepaper:
> crc-iscsi-polynomial-crc32-instruction-paper. Input data is divided
> into three
Hi all
This patch uses a parallel computing optimization algorithm to improve crc32c
computing performance on ARM. The algorithm comes from Intel whitepaper:
crc-iscsi-polynomial-crc32-instruction-paper. Input data is divided into three
equal-sized blocks. Three parallel blocks (crc0, crc1,
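
As a rough illustration of the three-block idea (a portable sketch, not the patch itself): the CRC register update is linear over GF(2), so three blocks can be processed independently and merged afterwards. The function names are hypothetical. The merge here advances a register by feeding zero bytes, which is slow and stands in for the carry-less multiplication by precomputed constants that a vmull_p64-based version would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Raw CRC-32C register update, one bit at a time (reflected polynomial). */
static uint32_t
crc32c_update(uint32_t crc, const unsigned char *p, size_t len)
{
    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Advance a register as if nbytes zero bytes had followed.  Slow stand-in
 * for multiplying by a precomputed constant. */
static uint32_t
crc32c_shift(uint32_t crc, size_t nbytes)
{
    static const unsigned char zero = 0;

    while (nbytes--)
        crc = crc32c_update(crc, &zero, 1);
    return crc;
}

/* Split the input into three equal blocks, compute crc0/crc1/crc2
 * independently, merge them, then handle the leftover tail serially. */
static uint32_t
crc32c_three_way(uint32_t crc, const unsigned char *p, size_t len)
{
    size_t   n = len / 3;
    uint32_t crc0 = crc32c_update(crc, p, n);       /* carries the input state */
    uint32_t crc1 = crc32c_update(0, p + n, n);     /* starts from zero */
    uint32_t crc2 = crc32c_update(0, p + 2 * n, n); /* starts from zero */

    /* crc0 is followed by 2n bytes of data, crc1 by n bytes. */
    crc = crc32c_shift(crc0, 2 * n) ^ crc32c_shift(crc1, n) ^ crc2;

    return crc32c_update(crc, p + 3 * n, len - 3 * n);
}
```

On hardware the three block loops are independent of each other, so the CRC instructions for crc0, crc1 and crc2 can overlap in the pipeline; this sketch only demonstrates that the merged result equals the serial one.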