Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Thu, 24 Nov 2022 at 02:27, Simon Riggs wrote: > > On Wed, 23 Nov 2022 at 13:04, David Rowley wrote: > > I'd rather see this solved like v4 is doing it. > > Please do. No further comments. Thanks for your help Thanks. I pushed the v4 patch with some minor comment adjustments and also renamed

Re: Hash index build performance tweak from sorting

2022-11-23 Thread Tomas Vondra
On 11/23/22 14:07, David Rowley wrote: > On Fri, 18 Nov 2022 at 03:34, Tomas Vondra > wrote: >> I did some simple benchmark with v2 and v3, using the attached script, >> which essentially just builds hash index on random data, with different >> data types and maintenance_work_mem values. And

Re: Hash index build performance tweak from sorting

2022-11-23 Thread Simon Riggs
On Wed, 23 Nov 2022 at 13:04, David Rowley wrote: > After getting rid of the HashInsertState code and just adding bool > sorted to _hash_doinsert() and _hash_pgaddtup(), the resulting patch > is much more simple: Seems good to me and I wouldn't argue with any of your comments. > and v4
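The simplification quoted in the message above, passing a boolean "sorted" flag down the insert path, can be illustrated with a small model. This is only a sketch under stated assumptions: `page_add_tuple` and `is_sorted` are hypothetical names, a page is modeled as a plain Python list of hash values kept in ascending order, and nothing here is taken from the PostgreSQL source for `_hash_doinsert()` or `_hash_pgaddtup()`.

```python
import bisect


def page_add_tuple(page, hashkey, is_sorted=False):
    """Add hashkey to page (a list kept in ascending order); return its offset.

    Hypothetical model only: when the caller promises that input arrives in
    hash order, the new entry can be appended without searching the page.
    """
    if is_sorted and (not page or page[-1] <= hashkey):
        # Fast path: the new key belongs at the end of the page.
        page.append(hashkey)
        return len(page) - 1
    # General path: binary search for the insertion point, then shift entries.
    offset = bisect.bisect_right(page, hashkey)
    page.insert(offset, hashkey)
    return offset


# Pre-sorted input takes the append path on every call.
sorted_page = []
for key in [1, 5, 9, 13]:
    page_add_tuple(sorted_page, key, is_sorted=True)

# Unsorted input goes through the binary search but ends in the same order.
unsorted_page = []
for key in [13, 5, 9, 1]:
    page_add_tuple(unsorted_page, key)
```

In this model the flag is only a hint: if a key would break the ordering, the function falls back to the binary search instead of corrupting the page order.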

Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Fri, 18 Nov 2022 at 03:34, Tomas Vondra wrote: > I did some simple benchmark with v2 and v3, using the attached script, > which essentially just builds hash index on random data, with different > data types and maintenance_work_mem values. And what I see is this > (median of 10 runs): > So to

Re: Hash index build performance tweak from sorting

2022-11-23 Thread David Rowley
On Wed, 16 Nov 2022 at 17:33, Simon Riggs wrote: > > Thanks for the review, apologies for the delay in acting upon your comments. > > My tests show the sorted and random tests are BOTH 4.6% faster with > the v3 changes using 5-test avg, but you'll be pleased to know your > kit is about 15.5%

Re: Hash index build performance tweak from sorting

2022-11-17 Thread Tomas Vondra
Hi, I did some simple benchmark with v2 and v3, using the attached script, which essentially just builds hash index on random data, with different data types and maintenance_work_mem values. And what I see is this (median of 10 runs): machine data type m_w_m master v2

Re: Hash index build performance tweak from sorting

2022-11-15 Thread Simon Riggs
On Wed, 21 Sept 2022 at 02:32, David Rowley wrote: > > I took this patch for a spin and saw a 2.5% performance increase using > the random INT test that Tom posted. The index took an average of > 7227.47 milliseconds on master and 7045.05 with the patch applied. Thanks for the review, apologies

Re: Hash index build performance tweak from sorting

2022-10-11 Thread Michael Paquier
On Wed, Sep 21, 2022 at 12:43:15PM +0100, Simon Riggs wrote: > Thanks for tests and review. I'm just jumping on a plane, so may not > respond in detail until next Mon. Okay. If you have time to address that by next CF, that would be interesting. For now I have marked the entry as returned with

Re: Hash index build performance tweak from sorting

2022-09-21 Thread Simon Riggs
On Wed, 21 Sept 2022 at 02:32, David Rowley wrote: > > On Tue, 2 Aug 2022 at 03:37, Simon Riggs wrote: > > Using the above test case, I'm getting a further 4-7% improvement on > > already committed code with the attached patch, which follows your > > proposal. > > > > The patch passes info via a

Re: Hash index build performance tweak from sorting

2022-09-20 Thread David Rowley
On Tue, 2 Aug 2022 at 03:37, Simon Riggs wrote: > Using the above test case, I'm getting a further 4-7% improvement on > already committed code with the attached patch, which follows your > proposal. > > The patch passes info via a state object, useful to avoid API churn in > later patches. Hi

Re: Hash index build performance tweak from sorting

2022-08-30 Thread Ranier Vilela
>It's a shame you only see 3%, but that's still worth it. Hi, I ran this test here: DROP TABLE hash_speed; CREATE unlogged TABLE hash_speed (x integer); INSERT INTO hash_speed SELECT random()*1000 FROM generate_series(1,1000) x; VACUUM Timing is on. CREATE INDEX ON hash_speed USING hash

Re: Hash index build performance tweak from sorting

2022-08-30 Thread Simon Riggs
On Fri, 5 Aug 2022 at 20:46, David Zhang wrote: > > On 2022-08-01 8:37 a.m., Simon Riggs wrote: > > Using the above test case, I'm getting a further 4-7% improvement on > > already committed code with the attached patch, which follows your > > proposal. > > I ran two test cases: for committed

Re: Hash index build performance tweak from sorting

2022-08-05 Thread David Zhang
On 2022-08-01 8:37 a.m., Simon Riggs wrote: Using the above test case, I'm getting a further 4-7% improvement on already committed code with the attached patch, which follows your proposal. I ran two test cases: for committed patch `hash_sort_by_hash.v3.patch`, I can see about 6 ~ 7%

Re: Hash index build performance tweak from sorting

2022-08-01 Thread Simon Riggs
On Fri, 29 Jul 2022 at 13:49, Simon Riggs wrote: > > On Thu, 28 Jul 2022 at 19:50, Tom Lane wrote: > > > > Simon Riggs writes: > > > Thanks for the nudge. New version attached. > > > > I also see a speed improvement from this > > --- > > DROP TABLE IF EXISTS hash_speed; > > CREATE unlogged

Re: Hash index build performance tweak from sorting

2022-07-29 Thread Simon Riggs
On Thu, 28 Jul 2022 at 19:50, Tom Lane wrote: > > Simon Riggs writes: > > Thanks for the nudge. New version attached. > > I also see a speed improvement from this, so pushed (after minor comment > editing). Thanks > I notice though that if I feed it random data, > > --- > DROP TABLE IF EXISTS

Re: Hash index build performance tweak from sorting

2022-07-28 Thread Tom Lane
Simon Riggs writes: > Thanks for the nudge. New version attached. I also see a speed improvement from this, so pushed (after minor comment editing). I notice though that if I feed it random data, --- DROP TABLE IF EXISTS hash_speed; CREATE unlogged TABLE hash_speed (x integer); INSERT INTO

Re: Hash index build performance tweak from sorting

2022-07-28 Thread Simon Riggs
On Wed, 27 Jul 2022 at 19:22, Tom Lane wrote: > > Simon Riggs writes: > > [ hash_sort_by_hash.v2.patch ] > > The cfbot says this no longer applies --- probably sideswiped by > Korotkov's sorting-related commits last night. Thanks for the nudge. New version attached. -- Simon Riggs

Re: Hash index build performance tweak from sorting

2022-07-27 Thread Tom Lane
Simon Riggs writes: > [ hash_sort_by_hash.v2.patch ] The cfbot says this no longer applies --- probably sideswiped by Korotkov's sorting-related commits last night. regards, tom lane

RE: Hash index build performance tweak from sorting

2022-07-21 Thread houzj.f...@fujitsu.com
On Monday, May 30, 2022 4:13 pm shiy.f...@fujitsu.com wrote: > > On Tue, May 10, 2022 5:43 PM Simon Riggs > wrote: > > > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila > > wrote: > > > > > > Few comments on the patch: > > > 1. I think it is better to use DatumGetUInt32 to fetch the hash key > > >

RE: Hash index build performance tweak from sorting

2022-05-30 Thread shiy.f...@fujitsu.com
On Tue, May 10, 2022 5:43 PM Simon Riggs wrote: > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila > wrote: > > > > Few comments on the patch: > > 1. I think it is better to use DatumGetUInt32 to fetch the hash key as > > the nearby code is using. > > 2. You may want to change the below comment in

Re: Hash index build performance tweak from sorting

2022-05-10 Thread Simon Riggs
On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > Few comments on the patch: > 1. I think it is better to use DatumGetUInt32 to fetch the hash key as > the nearby code is using. > 2. You may want to change the below comment in HSpool > /* > * We sort the hash keys based on the buckets they

Re: Hash index build performance tweak from sorting

2022-05-04 Thread Amit Kapila
On Mon, May 2, 2022 at 9:28 PM Simon Riggs wrote: > > On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > > > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs > > wrote: > > > > > > Hash index pages are stored in sorted order, but we don't prepare the > > > data correctly. > > > > > > We sort the

Re: Hash index build performance tweak from sorting

2022-05-02 Thread Simon Riggs
On Sat, 30 Apr 2022 at 12:12, Amit Kapila wrote: > > On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs > wrote: > > > > Hash index pages are stored in sorted order, but we don't prepare the > > data correctly. > > > > We sort the data as the first step of a hash index build, but we > > forget to sort

Re: Hash index build performance tweak from sorting

2022-04-30 Thread Amit Kapila
On Tue, Apr 19, 2022 at 3:05 AM Simon Riggs wrote: > > Hash index pages are stored in sorted order, but we don't prepare the > data correctly. > > We sort the data as the first step of a hash index build, but we > forget to sort the data by hash as well as by hash bucket. > I was looking into

Hash index build performance tweak from sorting

2022-04-18 Thread Simon Riggs
Hash index pages are stored in sorted order, but we don't prepare the data correctly. We sort the data as the first step of a hash index build, but we forget to sort the data by hash as well as by hash bucket. This causes the normal insert path to do extra pushups to put the data in the correct
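The idea in the original post, sorting the build input by hash value as well as by bucket, can be sketched with a toy model. The assumptions are loud ones: the bucket is taken as the low bits of the hash with a fixed, power-of-two `NUM_BUCKETS`, and all function names are hypothetical. The real bucket mapping in PostgreSQL is more involved and is not reproduced here.

```python
NUM_BUCKETS = 4  # assumed power of two, for illustration only


def bucket_of(hashkey):
    """Toy bucket mapping: the low bits of the hash value."""
    return hashkey & (NUM_BUCKETS - 1)


def sort_by_bucket_only(hashes):
    """Order tuples by bucket; order within a bucket is left as it arrived."""
    return sorted(hashes, key=bucket_of)


def sort_by_bucket_then_hash(hashes):
    """Order tuples by bucket, then by hash value within each bucket."""
    return sorted(hashes, key=lambda h: (bucket_of(h), h))


def out_of_order_inserts(ordered_hashes):
    """Count tuples whose hash is smaller than the largest one already
    placed in the same bucket, i.e. inserts that cannot simply be appended."""
    largest_seen = {}
    count = 0
    for h in ordered_hashes:
        b = bucket_of(h)
        if b in largest_seen and h < largest_seen[b]:
            count += 1
        largest_seen[b] = max(largest_seen.get(b, h), h)
    return count


SAMPLE_HASHES = [13, 5, 9, 1, 6, 2, 14, 10]

bucket_only = out_of_order_inserts(sort_by_bucket_only(SAMPLE_HASHES))
bucket_and_hash = out_of_order_inserts(sort_by_bucket_then_hash(SAMPLE_HASHES))
print(bucket_only, bucket_and_hash)
```

With the two-part sort key every tuple arrives in the order the page wants it, so the count of out-of-order inserts drops to zero in this model. How much time that saves in a real index build is what the benchmarks elsewhere in the thread measure.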