RE: alter table xxx set unlogged take long time

2022-07-28 Thread James Pang (chaolpan)
 Does  "wal_level=minimal" help reducing  wal emitting a lot for COPY and 
CREATE INDEX?   We plan to remove  "set unlogged/log" , instead , just set 
"wal_level=minimal" ,then COPY data in parallel, then create index.

   Thanks,

   James 
-Original Message-
From: Joe Conway  
Sent: Wednesday, July 27, 2022 11:02 PM
To: Tom Lane 
Cc: James Pang (chaolpan) ; Jim Mlodgenski 
; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time

On 7/27/22 10:46, Tom Lane wrote:
> Joe Conway  writes:
>> Then (completely untested) I *think* you could create the "partition" 
>> initially as a free standing unlogged table, load it, index it, 
>> switch to logged, and then attach it to the partitioned table.
> 
> I'm still of the opinion that this plan to load the data unlogged and 
> switch to logged later is a loser.  Sooner or later you have got to 
> write the data to WAL, and this approach doesn't eliminate that cost.  
> What it does do is create one whole extra cycle of writing the data to 
> disk and reading it back.  I don't think it's an oversight that no 
> such thing is suggested in our standard tips for bulk-loading data:

Yeah, agreed. I was mostly responding to the OP desire to use unlogged and not 
taking a stance on that.

> https://www.postgresql.org/docs/current/populate.html
> 
> What perhaps *is* an oversight is that we don't suggest use of COPY 
> FREEZE there.  AFAIK that doesn't reduce the initial data loading cost 
> directly, but it would save overhead later.

Oh, yes, very good point.


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


Re: alter table xxx set unlogged take long time

2022-07-28 Thread Joe Conway

On 7/28/22 03:47, James Pang (chaolpan) wrote:

Does  "wal_level=minimal" help reducing  wal emitting a lot for COPY
and CREATE INDEX?   We plan to remove  "set unlogged/log" , instead ,
just set "wal_level=minimal" ,then COPY data in parallel, then create
index.


(Note - please don't top post on these lists)

Yes, wal_level = minimal is a big help in my experience if you can 
tolerate it.


Similarly synchronous_commit = off might help as long as you are 
prepared to reload some data in the event of a crash (which generally is 
true when bulk loading). As noted in the docs:


  This parameter can be changed at any time; the
  behavior for any one transaction is determined by
  the setting in effect when it commits. It is
  therefore possible, and useful, to have some
  transactions commit synchronously and others
  asynchronously.


--
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com