subject:"RE\: Disable WAL logging to speed up data loading"

Re: Disable WAL logging to speed up data loading

2021-07-05 Thread Stephen Frost

Greetings,

* osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> On Monday, July 5, 2021 10:32 AM Michael Paquier wrote:
> > On Sun, Jul 04, 2021 at 11:02:01AM -0400, Stephen Frost wrote:
> > > Rather than RfC, the appropriate status seems like it should be
> > > Rejected, as otherwise it's just encouraging someone to ultimately
> > > waste their time rebasing and updating the patch when it isn't going
> > > to ever actually be committed.
> > 
> > Yes, agreed with that.
> Thanks for paying attention to this thread.
> 
> This cannot be helped but I've made the patch status as you suggested,
> because I judged the community would not accept this idea.

Thanks.  Hopefully this will encourage those interested in minimal WAL
logging for data loading performance to instead work on additional
optimizations for the existing 'minimal' WAL level.

Thanks again,

Stephen


signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2021-07-04 Thread osumi.takami...@fujitsu.com

On Monday, July 5, 2021 10:32 AM Michael Paquier wrote:
> On Sun, Jul 04, 2021 at 11:02:01AM -0400, Stephen Frost wrote:
> > Rather than RfC, the appropriate status seems like it should be
> > Rejected, as otherwise it's just encouraging someone to ultimately
> > waste their time rebasing and updating the patch when it isn't going
> > to ever actually be committed.
> 
> Yes, agreed with that.
Thanks for paying attention to this thread.

This cannot be helped but I've made the patch status as you suggested,
because I judged the community would not accept this idea.

Best Regards,
Takamichi Osumi

Re: Disable WAL logging to speed up data loading

2021-07-04 Thread Michael Paquier

On Sun, Jul 04, 2021 at 11:02:01AM -0400, Stephen Frost wrote:
> Rather than RfC, the appropriate status seems like it should be
> Rejected, as otherwise it's just encouraging someone to ultimately waste
> their time rebasing and updating the patch when it isn't going to ever
> actually be committed.

Yes, agreed with that.
--
Michael


signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2021-07-04 Thread Stephen Frost

Greetings,

* vignesh C (vignes...@gmail.com) wrote:
> On Wed, Apr 7, 2021 at 12:13 PM osumi.takami...@fujitsu.com
>  wrote:
> > Mainly affected by a commit 9de9294,
> > I've fixed minor things to rebase the patch.
> > All modifications I did are cosmetic changes and
> > a little bit of documentation updates.
> > Please have a look at attached v09.
> 
> Patch is not applying on Head, kindly post a rebased version. As this
> patch is in Ready for Committer state, it will help one of the
> committers to pick up this patch during commit fest.

Unless there's actually a committer who is interested and willing to
take this on (understanding that multiple committers have already said
they don't agree with this approach), I don't think it makes sense to
spend additional time on this.

Rather than RfC, the appropriate status seems like it should be
Rejected, as otherwise it's just encouraging someone to ultimately waste
their time rebasing and updating the patch when it isn't going to ever
actually be committed.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2021-07-03 Thread vignesh C

On Wed, Apr 7, 2021 at 12:13 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hi
>
>
> Mainly affected by a commit 9de9294,
> I've fixed minor things to rebase the patch.
> All modifications I did are cosmetic changes and
> a little bit of documentation updates.
> Please have a look at attached v09.
>

Patch is not applying on Head, kindly post a rebased version. As this
patch is in Ready for Committer state, it will help one of the
committers to pick up this patch during commit fest.

Regards,
Vignesh

RE: Disable WAL logging to speed up data loading

2021-04-07 Thread osumi.takami...@fujitsu.com

Hi


Mainly affected by a commit 9de9294,
I've fixed minor things to rebase the patch.
All modifications I did are cosmetic changes and
a little bit of documentation updates.
Please have a look at attached v09.

Best Regards,
Takamichi Osumi



disable_WAL_logging_v09.patch
Description: disable_WAL_logging_v09.patch

Re: Disable WAL logging to speed up data loading

2021-03-24 Thread Stephen Frost

Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> From: Stephen Frost 
> > * tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> > As for data loading tools, surely they support loading data into UNLOGGED
> > tables and it's certainly not hard to have a script run around and flip 
> > those
> > tables to LOGGED after they're loaded, and I do actually believe some of 
> > those
> > tools support building processes of which one step could be such a command
> > (I'm fairly confident Pentaho, in particular, does as I remember building 
> > such
> > pipelines myself...).
> 
> Oh, Pentaho has such a feature, doesn't it?  But isn't it a separate step 
> from the data output step?  Here, I assume ETL tools allow users to compose a 
> data loading job from multiple steps: data input, transformation, data 
> output, etc.  I guess the user can't directly incorporate ALTER TABLE into 
> the data output step, and has to add separate custom steps for ALTER TABLE.  
> That's burdonsome and forgettable, I think.

None of the arguments presented here has done anything to change my
opinion that adding a 'none' WAL level is a bad idea.

Thanks,

Stephen


signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2021-03-24 Thread tsunakawa.ta...@fujitsu.com

From: Stephen Frost 
> * tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> > As Laurenz-san kindly replied, the database server refuses to start with a
> clear message.  So, it's similarly very clear what happens.  The user will 
> never
> unknowingly resume operation with possibly corrupt data.
> 
> No, instead they'll just have a completely broken system that has to be 
> rebuilt or
> restored from a backup.  That doesn't strike me as a good result.

Your understanding is correct.  What I wanted to answer to confirm is that the 
behavior is clear: the server refuses to start, and the user knows he/she 
should restore the database from backup.

> > So, I understood the point boils down to elegance.  Could I ask what makes
> you feel ALTER TABLE UNLOGGED/LOGGED is (more) elegant?  I'm purely
> asking as a user.
> 
> The impact is localized to those specific tables.  The rest of the system 
> should
> come up cleanly and there won't be corruption, instead merely the lack of data
> in UNLOGGED tables.

So, I took your point as the ease and fast time of restore after a crash: the 
user just has to restore the lost table data using COPY FROM from files that 
was saved before the data loading job using COPY TO.

In that sense, the backup and restoration of the whole database is an option 
for users when they have some instant backup and restore feature available.

> > (I don't want to digress, but if we consider the number of options for
> > wal_level as an issue, I feel it's not elegant to have separate
> > "replica" and "logical".)
> 
> Do you know of a way to avoid having those two distinct levels while still 
> writing
> only the WAL needed depending on if a system is doing logical replication or
> not..?  If you've got suggestions on how to eliminate one of those levels, I'm
> sure there would be interest in doing so.  I don't see the fact that we have
> those two levels as justification for adding another spelling of 'minimal'.

Sorry, I have almost no knowledge of logical replication implementation.  So, 
being ignorant of its intricacies, I have felt like as a user "Why do I have to 
set wal_level = logical, because streaming replication and logical replication 
are both replication features?  If the implementation needs some additional WAL 
for logical replication, why doesn't the server automatically emit the WAL when 
the target table of DML statements is in a publication?"

> > The elegance of wal_level = none is that the user doesn't have to
> > remember to add ALTER TABLE to the data loading job when they add load
> > target tables/partitions.  If they build and use their own (shell)
> > scripts to load data, that won't be burdon or forgotten.  But what
> > would they have to do when they use ETL tools like Talend, Pentaho,
> > and Informatica Power Center?  Do those tools allow users to add
> > custom processing like ALTER TABLE to the data loading job steps for
> > each table?  (AFAIK, not.)
> 
> I don't buy the argument that having to 'remember' to do an ALTER TABLE is
> such a burden when it means that the database will still be consistent and
> operational after a crash.

That depends on whether an instant backup and restore feature is at hand.  If 
the user is comfortable with it, wal_level = none is easier and more 
attractive.  At least, I don't want the feature to be denied.

> As for data loading tools, surely they support loading data into UNLOGGED
> tables and it's certainly not hard to have a script run around and flip those
> tables to LOGGED after they're loaded, and I do actually believe some of those
> tools support building processes of which one step could be such a command
> (I'm fairly confident Pentaho, in particular, does as I remember building such
> pipelines myself...).

Oh, Pentaho has such a feature, doesn't it?  But isn't it a separate step from 
the data output step?  Here, I assume ETL tools allow users to compose a data 
loading job from multiple steps: data input, transformation, data output, etc.  
I guess the user can't directly incorporate ALTER TABLE into the data output 
step, and has to add separate custom steps for ALTER TABLE.  That's burdonsome 
and forgettable, I think.

Regards
TakayukiTsunakawa

Re: Disable WAL logging to speed up data loading

2021-03-23 Thread Stephen Frost

Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> From: Stephen Frost 
> > First- what are you expecting would actually happen during crash recovery in
> > this specific case with your proposed new WAL level?
> ...
> > I'm not suggesting it's somehow more crash safe- but it's at least very 
> > clear
> > what happens in such a case, to wit: the entire table is cleared on crash
> > recovery.
> 
> As Laurenz-san kindly replied, the database server refuses to start with a 
> clear message.  So, it's similarly very clear what happens.  The user will 
> never unknowingly resume operation with possibly corrupt data.

No, instead they'll just have a completely broken system that has to be
rebuilt or restored from a backup.  That doesn't strike me as a good
result.

> > We're talking about two different ways to accomplish essentially the same
> > thing- one which introduces a new WAL level, vs. one which adds an
> > optimization for a WAL level we already have.  That the second is more 
> > elegant
> > is more-or-less entirely the point I'm making here, so it seems pretty 
> > relevant.
> 
> So, I understood the point boils down to elegance.  Could I ask what makes 
> you feel ALTER TABLE UNLOGGED/LOGGED is (more) elegant?  I'm purely asking as 
> a user.

The impact is localized to those specific tables.  The rest of the
system should come up cleanly and there won't be corruption, instead
merely the lack of data in UNLOGGED tables.

> (I don't want to digress, but if we consider the number of options for 
> wal_level as an issue, I feel it's not elegant to have separate "replica" and 
> "logical".)

Do you know of a way to avoid having those two distinct levels while
still writing only the WAL needed depending on if a system is doing
logical replication or not..?  If you've got suggestions on how to
eliminate one of those levels, I'm sure there would be interest in doing
so.  I don't see the fact that we have those two levels as justification
for adding another spelling of 'minimal'.

> > Under the proposed 'none', you basically have to throw out the entire 
> > cluster on
> > a crash, all because you don't want to use 'UNLOGGED' when you created the
> > tables you want to load data into, or 'TRUNCATE' them in the transaction 
> > where
> > you start the data load, either of which gives us enough indication and 
> > which
> > we have infrastructure around dealing with in the event of a crash during 
> > the
> > load without everything else having to be tossed and everything restored 
> > from a
> > backup.  That's both a better user experience from the perspective of having
> > fewer WAL levels to understand and from just a general administration
> > perspective so you don't have to go all the way back to a backup to bring 
> > the
> > system back up.
> 
> The elegance of wal_level = none is that the user doesn't have to remember to 
> add ALTER TABLE to the data loading job when they add load target 
> tables/partitions.  If they build and use their own (shell) scripts to load 
> data, that won't be burdon or forgotten.  But what would they have to do when 
> they use ETL tools like Talend, Pentaho, and Informatica Power Center?  Do 
> those tools allow users to add custom processing like ALTER TABLE to the data 
> loading job steps for each table?  (AFAIK, not.)

I don't buy the argument that having to 'remember' to do an ALTER TABLE
is such a burden when it means that the database will still be
consistent and operational after a crash.

As for data loading tools, surely they support loading data into
UNLOGGED tables and it's certainly not hard to have a script run around
and flip those tables to LOGGED after they're loaded, and I do actually
believe some of those tools support building processes of which one step
could be such a command (I'm fairly confident Pentaho, in particular,
does as I remember building such pipelines myself...).

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2021-03-22 Thread Laurenz Albe

On Mon, 2021-03-22 at 13:22 -0400, Stephen Frost wrote:
> * Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> > On Mon, 2021-03-22 at 11:05 -0400, Stephen Frost wrote:
> > > > Perhaps allowing to set unlogged tables to logged ones without writing 
> > > > WAL
> > > > is a more elegant way to do that, but I cannot see how that would be any
> > > > more crash safe than this patch.  Besides, the feature doesn't exist 
> > > > yet.
> > > 
> > > I'm not suggesting it's somehow more crash safe- but it's at least very
> > > clear what happens in such a case, to wit: the entire table is cleared
> > > on crash recovery.
> > 
> > I didn't look at the patch, but are you saying that after changing the
> > table to LOGGED, it somehow remembers that it is not crash safe and is
> > emptied if there is a crash before the next checkpoint?
> 
> I'm not sure where the confusion is, but certainly once the table has
> been turned into LOGGED and that's been committed then it should be
> crash safe, even under 'minimal' with the optimization to avoid actually
> copying the entire table into the WAL.  I don't see any particular
> reason why that isn't possible to do.

The confusion is probably caused by my ignorance.
If the actual data files of the table are forced out to disk on COMMIT,
that should indeed work.  And if you fsync a file that has not been changed
since it was last persisted, that should be cheap.

So if I got that right, I agree with you that that would be a much better
way to achieve the goal than wal_level = 'none'.

The only drawback is that we don't have that feature yet...

Yours,
Laurenz Albe

RE: Disable WAL logging to speed up data loading

2021-03-22 Thread tsunakawa.ta...@fujitsu.com

From: Stephen Frost 
> First- what are you expecting would actually happen during crash recovery in
> this specific case with your proposed new WAL level?
...
> I'm not suggesting it's somehow more crash safe- but it's at least very clear
> what happens in such a case, to wit: the entire table is cleared on crash
> recovery.

As Laurenz-san kindly replied, the database server refuses to start with a 
clear message.  So, it's similarly very clear what happens.  The user will 
never unknowingly resume operation with possibly corrupt data.


> We're talking about two different ways to accomplish essentially the same
> thing- one which introduces a new WAL level, vs. one which adds an
> optimization for a WAL level we already have.  That the second is more elegant
> is more-or-less entirely the point I'm making here, so it seems pretty 
> relevant.

So, I understood the point boils down to elegance.  Could I ask what makes you 
feel ALTER TABLE UNLOGGED/LOGGED is (more) elegant?  I'm purely asking as a 
user.

(I don't want to digress, but if we consider the number of options for 
wal_level as an issue, I feel it's not elegant to have separate "replica" and 
"logical".)


> Under the proposed 'none', you basically have to throw out the entire cluster 
> on
> a crash, all because you don't want to use 'UNLOGGED' when you created the
> tables you want to load data into, or 'TRUNCATE' them in the transaction where
> you start the data load, either of which gives us enough indication and which
> we have infrastructure around dealing with in the event of a crash during the
> load without everything else having to be tossed and everything restored from 
> a
> backup.  That's both a better user experience from the perspective of having
> fewer WAL levels to understand and from just a general administration
> perspective so you don't have to go all the way back to a backup to bring the
> system back up.

The elegance of wal_level = none is that the user doesn't have to remember to 
add ALTER TABLE to the data loading job when they add load target 
tables/partitions.  If they build and use their own (shell) scripts to load 
data, that won't be burdon or forgotten.  But what would they have to do when 
they use ETL tools like Talend, Pentaho, and Informatica Power Center?  Do 
those tools allow users to add custom processing like ALTER TABLE to the data 
loading job steps for each table?  (AFAIK, not.)

wal_level = none is convenient and attractive for users who can backup and 
restore the entire database instantly with a storage or filesystem snapshot 
feature.


Regards
TakayukiTsunakawa

Re: Disable WAL logging to speed up data loading

2021-03-22 Thread Stephen Frost

Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Mon, 2021-03-22 at 11:05 -0400, Stephen Frost wrote:
> > > Perhaps allowing to set unlogged tables to logged ones without writing WAL
> > > is a more elegant way to do that, but I cannot see how that would be any
> > > more crash safe than this patch.  Besides, the feature doesn't exist yet.
> > 
> > I'm not suggesting it's somehow more crash safe- but it's at least very
> > clear what happens in such a case, to wit: the entire table is cleared
> > on crash recovery.
> 
> I didn't look at the patch, but are you saying that after changing the
> table to LOGGED, it somehow remembers that it is not crash safe and is
> emptied if there is a crash before the next checkpoint?

I'm not sure where the confusion is, but certainly once the table has
been turned into LOGGED and that's been committed then it should be
crash safe, even under 'minimal' with the optimization to avoid actually
copying the entire table into the WAL.  I don't see any particular
reason why that isn't possible to do.

Under the proposed 'none', you basically have to throw out the entire
cluster on a crash, all because you don't want to use 'UNLOGGED' when
you created the tables you want to load data into, or 'TRUNCATE' them in
the transaction where you start the data load, either of which gives us
enough indication and which we have infrastructure around dealing with
in the event of a crash during the load without everything else having
to be tossed and everything restored from a backup.  That's both a
better user experience from the perspective of having fewer WAL levels
to understand and from just a general administration perspective so you
don't have to go all the way back to a backup to bring the system back
up.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2021-03-22 Thread Laurenz Albe

On Mon, 2021-03-22 at 11:05 -0400, Stephen Frost wrote:
> > Perhaps allowing to set unlogged tables to logged ones without writing WAL
> > is a more elegant way to do that, but I cannot see how that would be any
> > more crash safe than this patch.  Besides, the feature doesn't exist yet.
> 
> I'm not suggesting it's somehow more crash safe- but it's at least very
> clear what happens in such a case, to wit: the entire table is cleared
> on crash recovery.

I didn't look at the patch, but are you saying that after changing the
table to LOGGED, it somehow remembers that it is not crash safe and is
emptied if there is a crash before the next checkpoint?

Wouldn't that cause corruption if you restore from an earlier backup?
At least with the feature in this thread we'd get an error on recovery.

Yours,
Laurenz Albe

Re: Disable WAL logging to speed up data loading

2021-03-22 Thread Stephen Frost

Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Mon, 2021-03-22 at 09:46 -0400, Stephen Frost wrote:
> > * tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> > > From: Stephen Frost 
> > > > The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> > > > inside the transaction before starting the 'COPY' command is 'too hard'.
> > > 
> > > No, we can't ask using TRUNCATE because the user wants to add data to a 
> > > table.
> > 
> > First- what are you expecting would actually happen during crash
> > recovery in this specific case with your proposed new WAL level?
> > 
> > Second, use partitioning, or unlogged tables (with the patch discussed
> > elsewhere to allow flipping them to logged without writing the entire
> > thing to WAL).
> 
> Perhaps allowing to set unlogged tables to logged ones without writing WAL
> is a more elegant way to do that, but I cannot see how that would be any
> more crash safe than this patch.  Besides, the feature doesn't exist yet.

I'm not suggesting it's somehow more crash safe- but it's at least very
clear what happens in such a case, to wit: the entire table is cleared
on crash recovery.

> So I think that argument doesn't carry much weight.

We're talking about two different ways to accomplish essentially the
same thing- one which introduces a new WAL level, vs. one which adds an
optimization for a WAL level we already have.  That the second is more
elegant is more-or-less entirely the point I'm making here, so it seems
pretty relevant.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2021-03-22 Thread Laurenz Albe

On Mon, 2021-03-22 at 09:46 -0400, Stephen Frost wrote:
> * tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> > From: Stephen Frost 
> > > The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> > > inside the transaction before starting the 'COPY' command is 'too hard'.
> > 
> > No, we can't ask using TRUNCATE because the user wants to add data to a 
> > table.
> 
> First- what are you expecting would actually happen during crash
> recovery in this specific case with your proposed new WAL level?
> 
> Second, use partitioning, or unlogged tables (with the patch discussed
> elsewhere to allow flipping them to logged without writing the entire
> thing to WAL).

Perhaps allowing to set unlogged tables to logged ones without writing WAL
is a more elegant way to do that, but I cannot see how that would be any
more crash safe than this patch.  Besides, the feature doesn't exist yet.

So I think that argument doesn't carry much weight.

Yours,
Laurenz Albe

Re: Disable WAL logging to speed up data loading

2021-03-22 Thread Stephen Frost

Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> From: Stephen Frost 
> > The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> > inside the transaction before starting the 'COPY' command is 'too hard'.
> 
> > I could be wrong and perhaps opinions will change in the future, but it 
> > really
> > doesn't seem like the there's much support for adding a new WAL level just 
> > to
> > avoid doing the TRUNCATE.  Appealing to what other database systems
> > support can be helpful in some cases but that's usually when we're looking 
> > at a
> > new capability which multiple other systems have implemented.  This isn't
> > actually a new capability at all- the WAL level that you're looking for 
> > already
> > exists and already gives the optimization you're looking for, as long as 
> > you issue
> > a TRUNCATE at the start of the transaction.
> 
> No, we can't ask using TRUNCATE because the user wants to add data to a table.

First- what are you expecting would actually happen during crash
recovery in this specific case with your proposed new WAL level?

Second, use partitioning, or unlogged tables (with the patch discussed
elsewhere to allow flipping them to logged without writing the entire
thing to WAL).

Thanks,

Stephen


signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2021-03-21 Thread tsunakawa.ta...@fujitsu.com

From: Stephen Frost 
> The argument here seems to stem from the idea that issueing a 'TRUNCATE'
> inside the transaction before starting the 'COPY' command is 'too hard'.


> I could be wrong and perhaps opinions will change in the future, but it really
> doesn't seem like the there's much support for adding a new WAL level just to
> avoid doing the TRUNCATE.  Appealing to what other database systems
> support can be helpful in some cases but that's usually when we're looking at 
> a
> new capability which multiple other systems have implemented.  This isn't
> actually a new capability at all- the WAL level that you're looking for 
> already
> exists and already gives the optimization you're looking for, as long as you 
> issue
> a TRUNCATE at the start of the transaction.

No, we can't ask using TRUNCATE because the user wants to add data to a table.


Regards
TakayukiTsunakawa

Re: Disable WAL logging to speed up data loading

2021-03-19 Thread Stephen Frost

Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> From: David Steele 
> > After reading through the thread (but not reading the patch) I am -1 on
> > this proposal.
> > 
> > The feature seems ripe for abuse and misunderstanding, and as has been
> > noted in the thread, there are a variety of alternatives that can
> > provide a similar effect.
> > 
> > It doesn't help that at several points along the way new WAL records
> > have been found that still need to be included even when wal_level =
> > none. It's not clear to me how we know when we have found them all.
> > 
> > The patch is marked Ready for Committer but as far as I can see there
> > are no committers in favor of it and quite a few who are not.
> 
> I can understand that people are worried about not having WAL.  But as far as 
> I remember, I'm afraid those concerns were emotional, not logical, i.e., 
> something like "something may happen.".  Regarding concrete concerns that 
> Stephen-san, Magnus-san, Horiguchi-san, Sawada-san and others raised, 
> Osumi-san addressed them based on their advice and review, both in this 
> thread and other threads.
> 
> I also understand we want to value people's emotion for worry-free 
> PostgreSQL.  At the same time, I'd like the emotion understood that we want 
> Postgres to have this convenient, easy-to-use feature.  MySQL recently 
> introduced this feature.  Why can't Postgres do it?

I don't see there being anything particularly 'emotional' around
suggesting that we already have multiple ways to avoid WAL writes when
populating tables and suggesting that we work to improve those instead
of adding yet another WAL level (note: we've been removing them of late,
for the good reason that having more is confusing and not helpful).

> > Perhaps it would be better to look at some of the more targeted
> > approaches mentioned in the thread and see if any of them can be
> > used/improved to achieve the desired result?
> 
> Other methods are not as easy-to-use, and more complex to implement.

The argument here seems to stem from the idea that issueing a 'TRUNCATE'
inside the transaction before starting the 'COPY' command is 'too hard'.
That's clearly a subjective question and perhaps it's my emotional
response to that which is at issue, but I don't think I'm alone in the
feeling that just isn't enough of a justification for adding a new WAL
level.

> What kind of destiny does this type of feature end up in?

I could be wrong and perhaps opinions will change in the future, but it
really doesn't seem like the there's much support for adding a new WAL
level just to avoid doing the TRUNCATE.  Appealing to what other
database systems support can be helpful in some cases but that's usually
when we're looking at a new capability which multiple other systems have
implemented.  This isn't actually a new capability at all- the WAL level
that you're looking for already exists and already gives the
optimization you're looking for, as long as you issue a TRUNCATE at the
start of the transaction.

Thanks,

Stephen

signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2021-03-19 Thread tsunakawa.ta...@fujitsu.com

From: Laurenz Albe 
> Now I'm not saying that this feature should not go in (I set it to
> "ready for committer", because I see no technical flaw with the
> implementation), but it remains debatable if we want the feature or not.

Oh, yes, thank you very much for supporting this and other relevant two threads!


> I certainly can see David's point of view.  And we don't view MySQL as
> a role model that we want to emulate.

Yes, what MySQL was over ten years ago would not be a role model for us.  OTOH, 
recent MySQL under Oracle should be improving much -- adopting InnoDB as a 
default storage engine, transactional data dictionary, etc.  (Somewhat 
offtopic, but their documentation quality is great.)


> All these things are annoying to users, but we'd rather take that than
> the complaints that a database got corrupted because somebody didn't read
> the documentation carefully.

Hmm, if that were the case, then some people would say the unlogged-table based 
approach is also be dangerous, saying "Users don't read the manual carefully 
and easily think that making all tables unlogged is good for performance."


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2021-03-19 Thread Laurenz Albe

On Fri, 2021-03-19 at 06:53 +, tsunakawa.ta...@fujitsu.com wrote:
> From: David Steele 
> > After reading through the thread (but not reading the patch) I am -1 on
> > this proposal.
> > 
> > The feature seems ripe for abuse and misunderstanding, and as has been
> > noted in the thread, there are a variety of alternatives that can
> > provide a similar effect.
> > 
> > It doesn't help that at several points along the way new WAL records
> > have been found that still need to be included even when wal_level =
> > none. It's not clear to me how we know when we have found them all.
> > 
> > The patch is marked Ready for Committer but as far as I can see there
> > are no committers in favor of it and quite a few who are not.
> 
> I can understand that people are worried about not having WAL.  But as far
>  as I remember, I'm afraid those concerns were emotional, not logical,
>  i.e., something like "something may happen.".  Regarding concrete concerns
>  that Stephen-san, Magnus-san, Horiguchi-san, Sawada-san and others raised,
>  Osumi-san addressed them based on their advice and review, both in this
>  thread and other threads.
> 
> I also understand we want to value people's emotion for worry-free PostgreSQL.
> At the same time, I'd like the emotion understood that we want Postgres to
>  have this convenient, easy-to-use feature.  MySQL recently introduced this
>  feature.  Why can't Postgres do it?

Part of what gives PostgreSQL the good name it has is that we strive to
make it very hard to mess up your data.  We are finicky about encoding
correctness, we don't allow you to drop a table used by a view, and so on.

All these things are annoying to users, but we'd rather take that than
the complaints that a database got corrupted because somebody didn't read
the documentation carefully.

Now I'm not saying that this feature should not go in (I set it to
"ready for committer", because I see no technical flaw with the
implementation), but it remains debatable if we want the feature or not.

I certainly can see David's point of view.  And we don't view MySQL as
a role model that we want to emulate.

> > Perhaps it would be better to look at some of the more targeted
> > approaches mentioned in the thread and see if any of them can be
> > used/improved to achieve the desired result?
> 
> Other methods are not as easy-to-use, and more complex to implement.

Fast and loose often takes less effort, yes.

Yours,
Laurenz Albe

RE: Disable WAL logging to speed up data loading

2021-03-19 Thread tsunakawa.ta...@fujitsu.com

From: David Steele 
> After reading through the thread (but not reading the patch) I am -1 on
> this proposal.
> 
> The feature seems ripe for abuse and misunderstanding, and as has been
> noted in the thread, there are a variety of alternatives that can
> provide a similar effect.
> 
> It doesn't help that at several points along the way new WAL records
> have been found that still need to be included even when wal_level =
> none. It's not clear to me how we know when we have found them all.
> 
> The patch is marked Ready for Committer but as far as I can see there
> are no committers in favor of it and quite a few who are not.

I can understand that people are worried about not having WAL.  But as far as I 
remember, I'm afraid those concerns were emotional, not logical, i.e., 
something like "something may happen.".  Regarding concrete concerns that 
Stephen-san, Magnus-san, Horiguchi-san, Sawada-san and others raised, Osumi-san 
addressed them based on their advice and review, both in this thread and other 
threads.

I also understand we want to value people's emotion for worry-free PostgreSQL.  
At the same time, I'd like the emotion understood that we want Postgres to have 
this convenient, easy-to-use feature.  MySQL recently introduced this feature.  
Why can't Postgres do it?


> Perhaps it would be better to look at some of the more targeted
> approaches mentioned in the thread and see if any of them can be
> used/improved to achieve the desired result?

Other methods are not as easy-to-use, and more complex to implement.

What kind of destiny does this type of feature end up in?


Regards
Takayuki Tsunakawa}

Re: Disable WAL logging to speed up data loading

2021-03-18 Thread David Steele


On 1/17/21 12:16 AM, tsunakawa.ta...@fujitsu.com wrote:

From: Osumi, Takamichi/大墨 昂道 

I updated my patch to take in those feedbacks !
Have a look at the latest patch.


Looks good to me (the status remains ready for committer.)


After reading through the thread (but not reading the patch) I am -1 on  
this proposal.


The feature seems ripe for abuse and misunderstanding, and as has been  
noted in the thread, there are a variety of alternatives that can  
provide a similar effect.


It doesn't help that at several points along the way new WAL records  
have been found that still need to be included even when wal_level =  
none. It's not clear to me how we know when we have found them all.


The patch is marked Ready for Committer but as far as I can see there  
are no committers in favor of it and quite a few who are not.


Perhaps it would be better to look at some of the more targeted  
approaches mentioned in the thread and see if any of them can be  
used/improved to achieve the desired result?


Regards,
--
-David
da...@pgmasters.net

RE: Disable WAL logging to speed up data loading

2021-01-16 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> I updated my patch to take in those feedbacks !
> Have a look at the latest patch.

Looks good to me (the status remains ready for committer.)


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2021-01-15 Thread osumi.takami...@fujitsu.com

Hi, Movead


Thanks for your comments.
> I read the patch and have two points:
> 
> 1. I do basebackup for database then switch wal level from logical to none to
> logical and of cause I archive the wal segments. Next I do PITR base on the
> basebackup, as a result it success startup with a waring said maybe data
> missed.
This applies to wal_level=minimal and is going to be addressed by
another thread [1].

> 
> Because the 'none' level is to bulkload data, do you think it's good that we 
> still
> support recover from a 'none' wal level.
> 
> 2. I just mark wal_level as 'none' but fail to startup, it success after I 
> drop the
> publication and it's subscription,mark max_wal_senders as 0, drop replicate 
> slot.
> I think it worth to write how we can startup a 'none' wal level database in
> document .
The documentation [2] touches this aspect.
It says "Also, wal_level must be set to replica or
higher to allow replication slots to be used." already.
It is documented and applies to wal_level=minimal also
because 'src/backend/replication/slot.c' has code that throws the error you got 
as below.

else if (wal_level < WAL_LEVEL_REPLICA)
ereport(FATAL,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("physical replication slot \"%s\" exists, but wal_level 
< replica",
NameStr(cp.slotdata.name)),
 errhint("Change wal_level to be replica or higher.")));

[1] - 
https://www.postgresql.org/message-id/OSBPR01MB4888CBE1DA08818FD2D90ED8EDF90%40OSBPR01MB4888.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/docs/13/runtime-config-replication.html

Best Regards,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2021-01-15 Thread osumi.takami...@fujitsu.com

Hi


Thank you everyone
On Thursday, January 14, 2021 9:27 AM Tsunakawa, Takayuki/綱川 貴之 
 wrote:
> From: Kyotaro Horiguchi 
> > > XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT |
> > XLOG_MARK_ESSENTIAL);
> > > XLogRegisterData((char *) , sizeof(dummy));
> > >
> > > (Here's a word play - unimportant but essential, what's that?)
> >
> > Hmm. Food may not be important to someone but it is essential for
> > survival.  I think this is somethig like that :p
> 
> Ah, that's a good answer.  I know a person around me who enjoys drinking
> alcohol but doesn't enjoy eating - he says he doesn't care about taste of 
> food.
> So food is unimportant but nutrition is essential for him.
> 
> 
> > Unfortunately, I prefer the latter as it is simple because it is in a
> > hot path.  As I think I mentioned upthread, I think the xlog stuff
> > should refrain from being conscious of things deeper than RMger-ID
> > level.  One of other reasons is that generally the issuer site is the
> > authoritative about the importance and essentiality of a record being
> > issued.  And I don't think it's very good to do the same thing in
> > different ways at the same time.  Fortunately each type of the recrods
> > has only few issuer places.
> 
> Agreed.
> 
> 
> > By the way, I noticed that pg_switch_wal() silently skips its task.
> > Desn't it need to give any sort of ERROR?
> 
> Osumi-san, can you check this?  (I thought we were aware of this according to
> Fujii-san's comment.)
I updated my patch to take in those feedbacks !
Have a look at the latest patch.


Best Regards,
Takamichi Osumi



disable_WAL_logging_v08.patch
Description: disable_WAL_logging_v08.patch

RE: Disable WAL logging to speed up data loading

2021-01-13 Thread tsunakawa.ta...@fujitsu.com

From: Kyotaro Horiguchi 
> > XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT |
> XLOG_MARK_ESSENTIAL);
> > XLogRegisterData((char *) , sizeof(dummy));
> >
> > (Here's a word play - unimportant but essential, what's that?)
> 
> Hmm. Food may not be important to someone but it is essential for
> survival.  I think this is somethig like that :p

Ah, that's a good answer.  I know a person around me who enjoys drinking 
alcohol but doesn't enjoy eating - he says he doesn't care about taste of food. 
 So food is unimportant but nutrition is essential for him.


> Unfortunately, I prefer the latter as it is simple because it is in a
> hot path.  As I think I mentioned upthread, I think the xlog stuff
> should refrain from being conscious of things deeper than RMger-ID
> level.  One of other reasons is that generally the issuer site is the
> authoritative about the importance and essentiality of a record being
> issued.  And I don't think it's very good to do the same thing in
> different ways at the same time.  Fortunately each type of the recrods
> has only few issuer places.

Agreed.


> By the way, I noticed that pg_switch_wal() silently skips its
> task. Desn't it need to give any sort of ERROR?

Osumi-san, can you check this?  (I thought we were aware of this according to 
Fujii-san's comment.)


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2021-01-13 Thread Kyotaro Horiguchi

At Wed, 13 Jan 2021 06:01:34 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Kyotaro Horiguchi 
> > XLogBeginInsert();
> > XLogSetRecrodFlags(XLOG_MARK_ESSENTIAL);  # new flag value
> > XLOGInsert();
> 
> Oh, sounds like a nice idea.  That's more flexible by allowing WAL-emitting 
> modules to specify which WAL records are mandatory even when wal_level is 
> none.
> 
> For example, gistXLogAssignLSN() adds the above flag like this:
> 
> XLogBeginInsert();
> XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT | XLOG_MARK_ESSENTIAL);
> XLogRegisterData((char *) , sizeof(dummy));
> 
> (Here's a word play - unimportant but essential, what's that?)

Hmm. Food may not be important to someone but it is essential for
survival.  I think this is somethig like that :p

> And the filter in XLogInsert() becomes:
> 
> + if (wal_level == WAL_LEVEL_NONE &&
> + !((rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN) ||
> +   (rmid == RM_XLOG_ID && info == XLOG_PARAMETER_CHANGE) ||
> +   (rmid == RM_XACT_ID && info == XLOG_XACT_PREPARE) ||
> +   (curinsert_flags & XLOG_MARK_ESSENTIAL)))
> 
> Or,
> 
> + if (wal_level == WAL_LEVEL_NONE &&
> +  !(curinsert_flags & XLOG_MARK_ESSENTIAL))
>
> and add the new flag when emitting XLOG_CHECKPOINT_ONLINE,

Yeah, I was going to comment about that.  Skipping the record makes
controlfile filled by a bogus checkpoint location.

> XLOG_PARAMETER_CHANGE and XLOG_PREPARE records.  I think both have
> good reasons: the former centralizes the handling of XACT and XLOG
> RM WAL records (as the current XLOG module does already), and the
> latter delegates the decision to each module.  Which would you
> prefer?  (I kind of like the former, but this is a weak opinion.)

Unfortunately, I prefer the latter as it is simple because it is in a
hot path.  As I think I mentioned upthread, I think the xlog stuff
should refrain from being conscious of things deeper than RMger-ID
level.  One of other reasons is that generally the issuer site is the
authoritative about the importance and essentiality of a record being
issued.  And I don't think it's very good to do the same thing in
different ways at the same time.  Fortunately each type of the recrods
has only few issuer places.

By the way, I noticed that pg_switch_wal() silently skips its
task. Desn't it need to give any sort of ERROR?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Disable WAL logging to speed up data loading

2021-01-12 Thread tsunakawa.ta...@fujitsu.com

From: Kyotaro Horiguchi 
> XLogBeginInsert();
> XLogSetRecrodFlags(XLOG_MARK_ESSENTIAL);  # new flag value
> XLOGInsert();

Oh, sounds like a nice idea.  That's more flexible by allowing WAL-emitting 
modules to specify which WAL records are mandatory even when wal_level is none.

For example, gistXLogAssignLSN() adds the above flag like this:

XLogBeginInsert();
XLogSetRecordFlags(XLOG_MARK_UNIMPORTANT | XLOG_MARK_ESSENTIAL);
XLogRegisterData((char *) , sizeof(dummy));

(Here's a word play - unimportant but essential, what's that?)

And the filter in XLogInsert() becomes:

+   if (wal_level == WAL_LEVEL_NONE &&
+   !((rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN) ||
+ (rmid == RM_XLOG_ID && info == XLOG_PARAMETER_CHANGE) ||
+ (rmid == RM_XACT_ID && info == XLOG_XACT_PREPARE) ||
+ (curinsert_flags & XLOG_MARK_ESSENTIAL)))

Or,

+   if (wal_level == WAL_LEVEL_NONE &&
+!(curinsert_flags & XLOG_MARK_ESSENTIAL))

and add the new flag when emitting XLOG_CHECKPOINT_ONLINE, 
XLOG_PARAMETER_CHANGE and XLOG_PREPARE records.  I think both have good 
reasons: the former centralizes the handling of XACT and XLOG RM WAL records 
(as the current XLOG module does already), and the latter delegates the 
decision to each module.  Which would you prefer?  (I kind of like the former, 
but this is a weak opinion.)


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2021-01-12 Thread movead li

I read the patch and have two points:

1. I do basebackup for database then switch wal level from logical to none to 
logical and
of cause I archive the wal segments. Next I do PITR base on the basebackup, as 
a result
it success startup with a waring said maybe data missed.

Because the 'none' level is to bulkload data, do you think it's good that we 
still support
recover from a 'none' wal level.

2. I just mark wal_level as 'none' but fail to startup, it success after I drop 
the publication and
it's subscription,mark max_wal_senders as 0, drop replicate slot. I think it 
worth to write how 
we can startup a 'none' wal level database in document .

Re: Disable WAL logging to speed up data loading

2021-01-12 Thread Kyotaro Horiguchi

At Tue, 12 Jan 2021 07:09:28 +, "osumi.takami...@fujitsu.com" 
 wrote in 
> On Tuesday, January 12, 2021 12:52 PM Takayuki/綱川 貴之 
>  wrote:
> > From: Osumi, Takamichi/大墨 昂道 
> > > I updated the patch following this discussion, and fixed the
> > > documentation as well.
> > 
> > 
> > + (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
> > + (rmid == RM_GENERIC_ID)))
> > 
> > info is wrong for GiST, and one pair of parenthesis is unnecessary.  The 
> > above
> > would be:
> > 
> > + (rmid == RM_GIST_ID && info ==
> > XLOG_GIST_ASSIGN_LSN) ||
> > + rmid == RM_GENERIC_ID))
> Oops, sorry for this careless mistake.
> Fixed. And again, regression test produces no failure.

@@ -449,6 +450,18 @@ XLogInsert(RmgrId rmid, uint8 info)
return EndPos;
}
 
+   /* Issues only limited types of WAL when wal logging is disabled */
+   if (wal_level == WAL_LEVEL_NONE &&
+   !((rmid == RM_XLOG_ID && info == XLOG_CHECKPOINT_SHUTDOWN) ||
+ (rmid == RM_XLOG_ID && info == XLOG_PARAMETER_CHANGE) ||
+ (rmid == RM_XACT_ID && info == XLOG_XACT_PREPARE) ||
+ (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) ||
+ rmid == RM_GENERIC_ID))

As Sawada-san mentioned, this prevents future index AMs from being
pluggable.  Providing an interface would work but seems a bit too
invasive.  The record insertion flags is there for this very purpose.

XLogBeginInsert();
XLogSetRecrodFlags(XLOG_MARK_ESSENTIAL);  # new flag value
XLOGInsert();

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Disable WAL logging to speed up data loading

2021-01-11 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> Oops, sorry for this careless mistake.
> Fixed. And again, regression test produces no failure.

Now correct.  (Remains ready for committer.)


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2021-01-11 Thread osumi.takami...@fujitsu.com

On Tuesday, January 12, 2021 12:52 PM Takayuki/綱川 貴之 
 wrote:
> From: Osumi, Takamichi/大墨 昂道 
> > I updated the patch following this discussion, and fixed the
> > documentation as well.
> 
> 
> +   (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
> +   (rmid == RM_GENERIC_ID)))
> 
> info is wrong for GiST, and one pair of parenthesis is unnecessary.  The above
> would be:
> 
> +   (rmid == RM_GIST_ID && info ==
> XLOG_GIST_ASSIGN_LSN) ||
> +   rmid == RM_GENERIC_ID))
Oops, sorry for this careless mistake.
Fixed. And again, regression test produces no failure.

Best Regards,
Takamichi Osumi


disable_WAL_logging_v07.patch
Description: disable_WAL_logging_v07.patch

RE: Disable WAL logging to speed up data loading

2021-01-11 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> I updated the patch following this discussion, and fixed the documentation as
> well.


+ (rmid == RM_GIST_ID && info == RM_GIST_ID) ||
+ (rmid == RM_GENERIC_ID)))

info is wrong for GiST, and one pair of parenthesis is unnecessary.  The above 
would be:

+ (rmid == RM_GIST_ID && info == XLOG_GIST_ASSIGN_LSN) ||
+ rmid == RM_GENERIC_ID))


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2021-01-11 Thread osumi.takami...@fujitsu.com

Hi

On Mon, Jan 11, 2021 9:14 AM Tsunakawa, Takayuki  
wrote:
> From: Masahiko Sawada 
> > I think it's better to have index AM (and perhaps table AM) control it
> > instead of filtering in XLogInsert(). Because otherwise third-party
> > access methods that use LSN like gist indexes cannot write WAL records
> > at all when wal_level = none even if they want.
> 
> Hm, third-party extensions use RM_GENERIC_ID, so it should probably be
> allowed in XLogInsert() as well instead of allowing control in some other way.
> It's not unnatural that WAL records for in-core AMs are filtered in 
> XLogInsert().
I updated the patch following this discussion,
and fixed the documentation as well.
No failure is found during make check-world.


Best Regards,
Takamichi Osumi


disable_WAL_logging_v06.patch
Description: disable_WAL_logging_v06.patch

RE: Disable WAL logging to speed up data loading

2021-01-10 Thread tsunakawa.ta...@fujitsu.com

From: Masahiko Sawada 
> I think it's better to have index AM (and perhaps table AM) control it
> instead of filtering in XLogInsert(). Because otherwise third-party
> access methods that use LSN like gist indexes cannot write WAL records
> at all when wal_level = none even if they want.

Hm, third-party extensions use RM_GENERIC_ID, so it should probably be allowed 
in XLogInsert() as well instead of allowing control in some other way.  It's 
not unnatural that WAL records for in-core AMs are filtered in XLogInsert().


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2021-01-08 Thread Masahiko Sawada

On Fri, Jan 8, 2021 at 2:12 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Robert Haas 
> > Were the issues that I mentioned regarding GIST (and maybe other AMs)
> > in the last paragraph of
> > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> > dmrev...@mail.gmail.com
> > addressed in some way? That seems like a pretty hard engineering
> > problem to me, and I don't see that there's been any discussion of it.
> > Those are correctness concerns separate from any wal_level tracking we
> > might want to do to avoid accidental mistakes.
>
> Thank you very much for reminding me of this.  I forgot I replied as follows:
>
>
> --
> Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged 
> temporary GiST indexes use backend-local sequence values.  Other unlogged and 
> temporary relations don't set LSNs on pages.  So, I think it's enough to call 
> GetFakeLSNForUnloggedRel() when wal_level = none as well.
> --
>
>
> But this is not correct.  We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) 
> WAL records to be emitted (by tweaking the filter in XLogInsert()), just like 
> those WAL records are emitted when wal_level = minimal and and other WAL 
> records are not emitted.

I think it's better to have index AM (and perhaps table AM) control it
instead of filtering in XLogInsert(). Because otherwise third-party
access methods that use LSN like gist indexes cannot write WAL records
at all when wal_level = none even if they want.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

Re: Disable WAL logging to speed up data loading

2021-01-07 Thread Kyotaro Horiguchi

At Fri, 8 Jan 2021 05:12:02 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Robert Haas 
> > Were the issues that I mentioned regarding GIST (and maybe other AMs)
> > in the last paragraph of
> > http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> > dmrev...@mail.gmail.com
> > addressed in some way? That seems like a pretty hard engineering
> > problem to me, and I don't see that there's been any discussion of it.
> > Those are correctness concerns separate from any wal_level tracking we
> > might want to do to avoid accidental mistakes.
> 
> Thank you very much for reminding me of this.  I forgot I replied as follows:
> 
> 
> --
> Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged 
> temporary GiST indexes use backend-local sequence values.  Other unlogged and 
> temporary relations don't set LSNs on pages.  So, I think it's enough to call 
> GetFakeLSNForUnloggedRel() when wal_level = none as well.
> --
> 
> 
> But this is not correct.  We have to allow (RM_GIST_ID,
> XLOG_GIST_ASSIGN_LSN) WAL records to be emitted (by tweaking the
> filter in XLogInsert()), just like those WAL records are emitted
> when wal_level = minimal and and other WAL records are not emitted.

Yeah, although LOGGED and UNLOGGED use incompatible LSNs, LOGGED
tables always uses real LSNs, maintained by that type of WAL record
while wal_level = minimal.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Disable WAL logging to speed up data loading

2021-01-07 Thread tsunakawa.ta...@fujitsu.com

From: Robert Haas 
> Were the issues that I mentioned regarding GIST (and maybe other AMs)
> in the last paragraph of
> http://postgr.es/m/CA+TgmoZEZ5RONS49C7mEpjhjndqMQtVrz_LCQUkpRW
> dmrev...@mail.gmail.com
> addressed in some way? That seems like a pretty hard engineering
> problem to me, and I don't see that there's been any discussion of it.
> Those are correctness concerns separate from any wal_level tracking we
> might want to do to avoid accidental mistakes.

Thank you very much for reminding me of this.  I forgot I replied as follows:


--
Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary 
GiST indexes use backend-local sequence values.  Other unlogged and temporary 
relations don't set LSNs on pages.  So, I think it's enough to call 
GetFakeLSNForUnloggedRel() when wal_level = none as well.
--


But this is not correct.  We have to allow (RM_GIST_ID, XLOG_GIST_ASSIGN_LSN) 
WAL records to be emitted (by tweaking the filter in XLogInsert()), just like 
those WAL records are emitted when wal_level = minimal and and other WAL 
records are not emitted.


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2021-01-07 Thread Robert Haas

On Wed, Dec 2, 2020 at 10:53 PM tsunakawa.ta...@fujitsu.com
 wrote:
> The code looks good, and the performance seems to be nice, so I marked this 
> ready for committer.

Were the issues that I mentioned regarding GIST (and maybe other AMs)
in the last paragraph of
http://postgr.es/m/ca+tgmozez5rons49c7mepjhjndqmqtvrz_lcqukprwdmrev...@mail.gmail.com
addressed in some way? That seems like a pretty hard engineering
problem to me, and I don't see that there's been any discussion of it.
Those are correctness concerns separate from any wal_level tracking we
might want to do to avoid accidental mistakes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

RE: Disable WAL logging to speed up data loading

2021-01-06 Thread osumi.takami...@fujitsu.com

Hi,


On Tuesday, January 5, 2021 5:45 PM
Masahiko Sawada  wrote:
> On Tue, Jan 5, 2021 at 10:54 AM osumi.takami...@fujitsu.com
>  wrote:
> > On Monday, December 28, 2020 7:12 PM Masahiko Sawada
>  wrote:
> > > On Mon, Dec 28, 2020 at 4:29 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
> > >  wrote:
> > > > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
> > > > >  wrote:
> > > > > >
> > > > > > I've made a new patch v05 that took in comments to filter out
> > > > > > WALs more strictly and addressed some minor fixes that were
> > > > > > discussed within past few days.
> > > > > > Also, I changed the documentations, considering those
> modifications.
> > > > >
> > > > > From a backup management tool perspective, how can they detect
> > > > > that wal_level has been changed to ‘none' (and back to the
> > > > > normal)? IIUC once we changed wal_level to none, old backups
> > > > > that are taken before setting to ‘none’ can be used only for
> > > > > restoring the database to the point before the LSN where setting
> > > > > 'wal_level = none'. The users can neither restore the database
> > > > > to any points in the term of 'wal_level = none' nor use an old
> > > > > backup to restore the database to the point after LSN where
> > > > > setting 'wal_level = none’. I think we might need to provide a
> > > > > way to detect the changes other than reading
> > > XLOG_PARAMETER_CHANGE.
> > > > In the past, we discussed the aspect of backup management tool in
> > > > [1] and concluded that this should be another patch separated from
> > > > this thread because to compare the wal_level changes between
> > > > snapshots applies to wal_level = minimal, too. Please have a look at the
> "second idea"
> > > > in the e-mail in the [1] and responses to it.
> > > >
> > >
> > > Thank you for telling me about the discussion!
> > >
> > > The discussion already started on another thread? I think it might
> > > be better to consider it in parallel, if not started yet. We can
> > > verify beforehand that the proposed solutions will really work well
> > > together with backup management tools. And from the user
> > > perspective, I wonder if they would like to see this feature in the
> > > same release where wal_level = none is introduced. Otherwise, the
> > > fact that some backup management tools won’t be able to work
> > > together with wal_level = none will be a big restriction for users. That 
> > > patch
> would probably not be a blocker of this patch but will facilitate the 
> discussion.
> > I don't think the new thread is created already.
> >
> > By the way, I checked documents and user manuals of backup management
> > tools like pgBackRest, EDB's BART and pg_probackup respectively.
> > These tools work when wal_level is higher than minimal because these
> > use physical online backup or wal archiving and thus they don't need
> > to work together with wal_level=minimal or none.
> 
> Thank you for checking! The use case I imagined is that the user temporarily
> changes wal_level to 'none' from 'replica' or 'logical' to speed up loading 
> and
> changes back to the normal. In this case, the backups taken before the
> wal_level change cannot be used to restore the database to the point after the
> wal_level change. So I think backup management tools would want to
> recognize the time or LSN when/where wal_level is changed to ‘none’ in order
> to do some actions such as invalidating older backups, re-calculating backup
> redundancy etc.
> 
> Actually the same is true when the user changes to ‘minimal’. The tools would
> need to recognize the time or LSN in this case too. Since temporarily changing
> wal_level has been an uncommon use case some tools perhaps are not aware
> of that yet. But since introducing wal_level = ’none’ could make the change
> common, I think we need to provide a way for the tools.
Sawada-san, thanks a lot for your explanation and
for sure it's about time to launch a new thread for us.

This discussion can be separated from this wal_level thread and I agree with
your idea that what we are talking now would not be a blocker of the new 
wal_level patch.
So, kindly don't think that the new thread damages the content of this 
wal_level=none discussion first of all.
I'll create a new one soon.

Best Regards,
Takamichi Osumi

Re: Disable WAL logging to speed up data loading

2021-01-05 Thread Masahiko Sawada

On Tue, Jan 5, 2021 at 10:54 AM osumi.takami...@fujitsu.com
 wrote:
>
> Hi, Sawada-San
>
>
> On Monday, December 28, 2020 7:12 PM Masahiko Sawada  
> wrote:
> > On Mon, Dec 28, 2020 at 4:29 PM osumi.takami...@fujitsu.com
> >  wrote:
> > > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
> >  wrote:
> > > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
> > > >  wrote:
> > > > >
> > > > > I've made a new patch v05 that took in comments to filter out WALs
> > > > > more strictly and addressed some minor fixes that were discussed
> > > > > within past few days.
> > > > > Also, I changed the documentations, considering those modifications.
> > > >
> > > > From a backup management tool perspective, how can they detect that
> > > > wal_level has been changed to ‘none' (and back to the normal)? IIUC
> > > > once we changed wal_level to none, old backups that are taken before
> > > > setting to ‘none’ can be used only for restoring the database to the
> > > > point before the LSN where setting 'wal_level = none'. The users can
> > > > neither restore the database to any points in the term of 'wal_level
> > > > = none' nor use an old backup to restore the database to the point
> > > > after LSN where setting 'wal_level = none’. I think we might need to
> > > > provide a way to detect the changes other than reading
> > XLOG_PARAMETER_CHANGE.
> > > In the past, we discussed the aspect of backup management tool in [1]
> > > and concluded that this should be another patch separated from this
> > > thread because to compare the wal_level changes between snapshots
> > > applies to wal_level = minimal, too. Please have a look at the "second 
> > > idea"
> > > in the e-mail in the [1] and responses to it.
> > >
> >
> > Thank you for telling me about the discussion!
> >
> > The discussion already started on another thread? I think it might be 
> > better to
> > consider it in parallel, if not started yet. We can verify beforehand that 
> > the
> > proposed solutions will really work well together with backup management
> > tools. And from the user perspective, I wonder if they would like to see 
> > this
> > feature in the same release where wal_level = none is introduced. Otherwise,
> > the fact that some backup management tools won’t be able to work together
> > with wal_level = none will be a big restriction for users. That patch would
> > probably not be a blocker of this patch but will facilitate the discussion.
> I don't think the new thread is created already.
>
> By the way, I checked documents and user manuals of backup management tools 
> like
> pgBackRest, EDB's BART and pg_probackup respectively.
> These tools work when wal_level is higher than minimal
> because these use physical online backup or wal archiving and thus
> they don't need to work together with wal_level=minimal or none.

Thank you for checking! The use case I imagined is that the user
temporarily changes wal_level to 'none' from 'replica' or 'logical' to
speed up loading and changes back to the normal. In this case, the
backups taken before the wal_level change cannot be used to restore
the database to the point after the wal_level change. So I think
backup management tools would want to recognize the time or LSN
when/where wal_level is changed to ‘none’ in order to do some actions
such as invalidating older backups, re-calculating backup redundancy
etc.

Actually the same is true when the user changes to ‘minimal’. The
tools would need to recognize the time or LSN in this case too. Since
temporarily changing wal_level has been an uncommon use case some
tools perhaps are not aware of that yet. But since introducing
wal_level = ’none’ could make the change common, I think we need to
provide a way for the tools.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

RE: Disable WAL logging to speed up data loading

2021-01-04 Thread osumi.takami...@fujitsu.com

Hi, Sawada-San


On Monday, December 28, 2020 7:12 PM Masahiko Sawada  
wrote:
> On Mon, Dec 28, 2020 at 4:29 PM osumi.takami...@fujitsu.com
>  wrote:
> > On Monday, December 28, 2020 2:29 PM Masahiko Sawada
>  wrote:
> > > On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
> > >  wrote:
> > > >
> > > > I've made a new patch v05 that took in comments to filter out WALs
> > > > more strictly and addressed some minor fixes that were discussed
> > > > within past few days.
> > > > Also, I changed the documentations, considering those modifications.
> > >
> > > From a backup management tool perspective, how can they detect that
> > > wal_level has been changed to ‘none' (and back to the normal)? IIUC
> > > once we changed wal_level to none, old backups that are taken before
> > > setting to ‘none’ can be used only for restoring the database to the
> > > point before the LSN where setting 'wal_level = none'. The users can
> > > neither restore the database to any points in the term of 'wal_level
> > > = none' nor use an old backup to restore the database to the point
> > > after LSN where setting 'wal_level = none’. I think we might need to
> > > provide a way to detect the changes other than reading
> XLOG_PARAMETER_CHANGE.
> > In the past, we discussed the aspect of backup management tool in [1]
> > and concluded that this should be another patch separated from this
> > thread because to compare the wal_level changes between snapshots
> > applies to wal_level = minimal, too. Please have a look at the "second idea"
> > in the e-mail in the [1] and responses to it.
> >
> 
> Thank you for telling me about the discussion!
> 
> The discussion already started on another thread? I think it might be better 
> to
> consider it in parallel, if not started yet. We can verify beforehand that the
> proposed solutions will really work well together with backup management
> tools. And from the user perspective, I wonder if they would like to see this
> feature in the same release where wal_level = none is introduced. Otherwise,
> the fact that some backup management tools won’t be able to work together
> with wal_level = none will be a big restriction for users. That patch would
> probably not be a blocker of this patch but will facilitate the discussion.
I don't think the new thread is created already.

By the way, I checked documents and user manuals of backup management tools like
pgBackRest, EDB's BART and pg_probackup respectively.
These tools work when wal_level is higher than minimal
because these use physical online backup or wal archiving and thus
they don't need to work together with wal_level=minimal or none.

> the fact that some backup management tools won’t be able to work together
> with wal_level = none will be a big restriction for users.
Accordingly, it turned out that current situation or introducing wal_level=none
doesn't become restriction for users from the backup management tools 
perspective.
Any comments ?

Best Regards,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-12-31 Thread tsunakawa.ta...@fujitsu.com

From: Simon Riggs 
> Agreed, it is a footgun. -1 to commit the patch as-is.
> 
> The patch to avoid WAL is simple but it is dangerous for both the user
> and the PostgreSQL project.
> 
> In my experience, people will use this option and when it crashes and
> they lose their data, they will claim PostgreSQL broke and that they
> were not stupid enough to use this option. Data loss has always been
> the most serious error for PostgreSQL and our reputation for
> protecting data has been hard won; it can easily be lost in a moment
> of madness. Please consider how the headlines will read, bearing in
> mind past incidents and negative press. Yes, we did think of this
> feature already and rejected it.

Could you share the negative press that blames Postgres due to the user's 
misuse of some feature like fsync?


> If we ever did allow such an option, it must contain these things (IMHO):
> * the option should be called "unsafe" or "allows_data_loss", not just
> "none" (anything less than "minimal" must be insufficient or
> unsafe...)

One idea I proposed is "wal_level = unrecoverable" because it clearly states 
the bad consequence and Oracle also uses UNRECOVERABLE as a data loading option 
(and I found it intuitive.)  But others here commented that "none" would be OK. 
 I don't have a strong opinion on naming, as I think what's important is warn 
the user in the documentation.


> * the option must be set in the control file and be part of the same
> patch, so users cannot easily edit things to hide their unsafe usage

wal_level setting is stored in the control file as before.  If the server 
crashes while wal_level is set to none, the server refuses to start saying 
that's because wal_level is set to none, which is like MySQL.  So, the user 
cannot hide their misuse.


> * we must not support the option of going down to "unsafe" and then
> back up again. It must be a one-way transition from "unsafe" to a
> higher level, so if people want to use this for temp reporting servers
> or initial loading, great, but they can't use it as a quick speed-up
> for databases containing needs-to-be-safe data. Possibly the state
> change might be "unsafe" -> "needs_backup" -> "minimal"... or some
> other way to signal to backup.

I'm afraid I don't get a clear image.  Could you elaborate on that?  But 
anyway, I think that could be an overreaction and a prominent caution would 
suffice (like the one in this patch.)


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2020-12-31 Thread tsunakawa.ta...@fujitsu.com

From: Michael Paquier 
> Something that has not been mentioned on this thread is that if you could also
> put pg_wal/ on a RAM disk.  That's similarly unsafe, of course, but it does 
> not
> require any extra upstream patching, and that should be really fast for the 
> case
> of this thread.  If you care about consistency, there are solutions like PMEM
> that we could look more into.

As noted in this thread, the customer is worried about and wants to avoid 
handling the possible high volume storage of WAL during the data loading.  
That's understandable.  In that regard, DRAM and PMEM are worse because their 
capacity is more limited than SSD.


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-12-30 Thread Simon Riggs

On Wed, 30 Dec 2020 at 13:04, Michael Paquier  wrote:

> > Yes, we did think of this feature already and rejected it.
>
> I don't recall seeing such a discussion.  Do you have some references?
> Just a matter of curiosity :)

Off-list discussions.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: Disable WAL logging to speed up data loading

2020-12-30 Thread Michael Paquier

On Mon, Dec 28, 2020 at 11:06:30AM +, Simon Riggs wrote:
> Agreed, it is a footgun. -1 to commit the patch as-is.
> 
> The patch to avoid WAL is simple but it is dangerous for both the user
> and the PostgreSQL project.

Something that has not been mentioned on this thread is that if you
could also put pg_wal/ on a RAM disk.  That's similarly unsafe, of
course, but it does not require any extra upstream patching, and that
should be really fast for the case of this thread.  If you care about
consistency, there are solutions like PMEM that we could look more 
into.

> In my experience, people will use this option and when it crashes and
> they lose their data, they will claim PostgreSQL broke and that they
> were not stupid enough to use this option. Data loss has always been
> the most serious error for PostgreSQL and our reputation for
> protecting data has been hard won; it can easily be lost in a moment
> of madness. Please consider how the headlines will read, bearing in
> mind past incidents and negative press.

Yeah..

> Yes, we did think of this feature already and rejected it.

I don't recall seeing such a discussion.  Do you have some references?
Just a matter of curiosity :)
--
Michael


signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2020-12-28 Thread Simon Riggs

On Fri, 25 Dec 2020 at 07:09, Michael Paquier  wrote:
>
> On Thu, Dec 03, 2020 at 03:52:47AM +, tsunakawa.ta...@fujitsu.com wrote:
> > The code looks good, and the performance seems to be nice, so I
> > marked this ready for committer.
>
> FWIW, I am extremely afraid of this proposal because this is basically
> a footgun able to corrupt customer instances, and I am ready to bet
> that people would switch wal_level to none because they see a lot of
> benefits in doing so at first sight, until the host running the server
> is plugged off and they need to use pg_resetwal in urgency to bring an
> instance online.

Agreed, it is a footgun. -1 to commit the patch as-is.

The patch to avoid WAL is simple but it is dangerous for both the user
and the PostgreSQL project.

In my experience, people will use this option and when it crashes and
they lose their data, they will claim PostgreSQL broke and that they
were not stupid enough to use this option. Data loss has always been
the most serious error for PostgreSQL and our reputation for
protecting data has been hard won; it can easily be lost in a moment
of madness. Please consider how the headlines will read, bearing in
mind past incidents and negative press. Yes, we did think of this
feature already and rejected it.

If we ever did allow such an option, it must contain these things (IMHO):
* the option should be called "unsafe" or "allows_data_loss", not just
"none" (anything less than "minimal" must be insufficient or
unsafe...)
* the option must be set in the control file and be part of the same
patch, so users cannot easily edit things to hide their unsafe usage
* we must not support the option of going down to "unsafe" and then
back up again. It must be a one-way transition from "unsafe" to a
higher level, so if people want to use this for temp reporting servers
or initial loading, great, but they can't use it as a quick speed-up
for databases containing needs-to-be-safe data. Possibly the state
change might be "unsafe" -> "needs_backup" -> "minimal"... or some
other way to signal to backup.

-- 
Simon Riggshttp://www.EnterpriseDB.com/

Re: Disable WAL logging to speed up data loading

2020-12-28 Thread Masahiko Sawada

On Mon, Dec 28, 2020 at 4:29 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hello Sawada-San
>
>
> On Monday, December 28, 2020 2:29 PM Masahiko Sawada  
> wrote:
> > On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > I've made a new patch v05 that took in comments to filter out WALs
> > > more strictly and addressed some minor fixes that were discussed
> > > within past few days.
> > > Also, I changed the documentations, considering those modifications.
> >
> > From a backup management tool perspective, how can they detect that
> > wal_level has been changed to ‘none' (and back to the normal)? IIUC once
> > we changed wal_level to none, old backups that are taken before setting to
> > ‘none’ can be used only for restoring the database to the point before the
> > LSN where setting 'wal_level = none'. The users can neither restore the
> > database to any points in the term of 'wal_level = none' nor use an old 
> > backup
> > to restore the database to the point after LSN where setting 'wal_level =
> > none’. I think we might need to provide a way to detect the changes other
> > than reading XLOG_PARAMETER_CHANGE.
> In the past, we discussed the aspect of backup management tool in [1]
> and concluded that this should be another patch separated from this thread
> because to compare the wal_level changes between snapshots
> applies to wal_level = minimal, too. Please have a look at the "second idea"
> in the e-mail in the [1] and responses to it.
>

Thank you for telling me about the discussion!

The discussion already started on another thread? I think it might be
better to consider it in parallel, if not started yet. We can verify
beforehand that the proposed solutions will really work well together
with backup management tools. And from the user perspective, I wonder
if they would like to see this feature in the same release where
wal_level = none is introduced. Otherwise, the fact that some backup
management tools won’t be able to work together with wal_level = none
will be a big restriction for users. That patch would probably not be
a blocker of this patch but will facilitate the discussion.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

RE: Disable WAL logging to speed up data loading

2020-12-27 Thread osumi.takami...@fujitsu.com

Hello Sawada-San


On Monday, December 28, 2020 2:29 PM Masahiko Sawada  
wrote:
> On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > I've made a new patch v05 that took in comments to filter out WALs
> > more strictly and addressed some minor fixes that were discussed
> > within past few days.
> > Also, I changed the documentations, considering those modifications.
> 
> From a backup management tool perspective, how can they detect that
> wal_level has been changed to ‘none' (and back to the normal)? IIUC once
> we changed wal_level to none, old backups that are taken before setting to
> ‘none’ can be used only for restoring the database to the point before the
> LSN where setting 'wal_level = none'. The users can neither restore the
> database to any points in the term of 'wal_level = none' nor use an old backup
> to restore the database to the point after LSN where setting 'wal_level =
> none’. I think we might need to provide a way to detect the changes other
> than reading XLOG_PARAMETER_CHANGE.
In the past, we discussed the aspect of backup management tool in [1]
and concluded that this should be another patch separated from this thread
because to compare the wal_level changes between snapshots
applies to wal_level = minimal, too. Please have a look at the "second idea"
in the e-mail in the [1] and responses to it.

 [1] - 
https://www.postgresql.org/message-id/OSBPR01MB4888B34B81A6E0DD46B5D063EDE00%40OSBPR01MB4888.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi

Re: Disable WAL logging to speed up data loading

2020-12-27 Thread Masahiko Sawada

On Thu, Dec 3, 2020 at 12:14 PM osumi.takami...@fujitsu.com
 wrote:
>
> Hello
>
>
> I've made a new patch v05 that took in comments
> to filter out WALs more strictly and addressed some minor fixes
> that were discussed within past few days.
> Also, I changed the documentations, considering those modifications.
>

>From a backup management tool perspective, how can they detect that
wal_level has been changed to ‘none' (and back to the normal)? IIUC
once we changed wal_level to none, old backups that are taken before
setting to ‘none’ can be used only for restoring the database to the
point before the LSN where setting 'wal_level = none'. The users can
neither restore the database to any points in the term of 'wal_level =
none' nor use an old backup to restore the database to the point after
LSN where setting 'wal_level = none’. I think we might need to provide
a way to detect the changes other than reading XLOG_PARAMETER_CHANGE.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

RE: Disable WAL logging to speed up data loading

2020-12-25 Thread osumi.takami...@fujitsu.com

Hi, Michael


Thank you for your attention to this thread.
On Friday, December 25, 2020 4:09 PM  Michael Paquier  
wrote:
> On Thu, Dec 03, 2020 at 03:52:47AM +, tsunakawa.ta...@fujitsu.com
> wrote:
> > The code looks good, and the performance seems to be nice, so I marked
> > this ready for committer.
> 
> FWIW, I am extremely afraid of this proposal because this is basically a
> footgun able to corrupt customer instances, and I am ready to bet that people
> would switch wal_level to none because they see a lot of benefits in doing so
> at first sight, until the host running the server is plugged off and they 
> need to
> use pg_resetwal in urgency to bring an instance online. 
In the patch, I added plenty of descriptions (especially cautions) to the 
documents.
At least, when users notice the existence of "none" parameter for wal_level,
they cannot avoid having a look at such cautions.
Accordingly, it's really impossible that they see only the merits.

In terms of pg_resetwal,
it should not happen that they use it to utilize the server
corrupted by failure of any operation during wal_level=none.

I documented clearly
this wal_level is designed to make
the server never start up again when any unexpected crash is detected
and thus users need to recreate the whole cluster again.

> Users have already
> the option to make things go bad, just by disabling full page writes or fsync,
> but I really don't think that we should put in their hands more options able 
> to
> break instances, nor should we try to spend more efforts in having more
> "protections" that would trigger only once the instance is already fried.
> 
> Perhaps this is something that Horiguchi-san pointed out upthread in [1]
> (last sentence of first paragraph), but did you consider that it is already
> possible to do bulk-loading with a minimal amount of WAL generated as long
> as you do the COPY within the transaction that created the table?  Quoting
> the docs in [2]:
> "COPY is fastest when used within the same transaction as an earlier
> CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
> written, because in case of an error, the files containing the newly loaded 
> data
> will be removed anyway. However, this consideration only applies when
> wal_level is minimal as all commands must write WAL otherwise."
> 
> Upgrade scenarios have been mentioned in this case as being a pain when it
> comes to take advantage of this optimization.  Wouldn't it be safer if we took
> a client approach instead, where restores are able to load the data with a
> cheap succession of commands by loading the data in a transaction done
> after a TRUNCATE?
In a data warehouse environment,
the environment of our scenario in this thread in mind,
I think that to truncate target table
beforehand in the same transaction is not always realistic solution.
Imagine an appended data loading case to execute bulk data load
into one specific table with a bunch of records. 

Best Regards,
Takamichi Osumi

Re: Disable WAL logging to speed up data loading

2020-12-24 Thread Michael Paquier

On Thu, Dec 03, 2020 at 03:52:47AM +, tsunakawa.ta...@fujitsu.com wrote:
> The code looks good, and the performance seems to be nice, so I
> marked this ready for committer.

FWIW, I am extremely afraid of this proposal because this is basically
a footgun able to corrupt customer instances, and I am ready to bet
that people would switch wal_level to none because they see a lot of
benefits in doing so at first sight, until the host running the server
is plugged off and they need to use pg_resetwal in urgency to bring an
instance online.  Users have already the option to make things go bad,
just by disabling full page writes or fsync, but I really don't think
that we should put in their hands more options able to break
instances, nor should we try to spend more efforts in having more
"protections" that would trigger only once the instance is already
fried.

Perhaps this is something that Horiguchi-san pointed out upthread in
[1] (last sentence of first paragraph), but did you consider that it
is already possible to do bulk-loading with a minimal amount of WAL
generated as long as you do the COPY within the transaction that
created the table?  Quoting the docs in [2]:
"COPY is fastest when used within the same transaction as an earlier
CREATE TABLE or TRUNCATE command. In such cases no WAL needs to be
written, because in case of an error, the files containing the newly
loaded data will be removed anyway. However, this consideration only
applies when wal_level is minimal as all commands must write WAL
otherwise."

Upgrade scenarios have been mentioned in this case as being a pain
when it comes to take advantage of this optimization.  Wouldn't it be
safer if we took a client approach instead, where restores are able to
load the data with a cheap succession of commands by loading the data
in a transaction done after a TRUNCATE?

[1]: 
https://www.postgresql.org/message-id/20201001.115136.288898409051085426.horikyota@gmail.com
[2]: https://www.postgresql.org/docs/current/populate.html#POPULATE-COPY-FROM
--
Michael

signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2020-12-02 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> I've made a new patch v05 that took in comments to filter out WALs more 
> strictly
> and addressed some minor fixes that were discussed within past few days.
> Also, I changed the documentations, considering those modifications.

The code looks good, and the performance seems to be nice, so I marked this 
ready for committer.


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2020-12-02 Thread osumi.takami...@fujitsu.com

Hello


I've made a new patch v05 that took in comments
to filter out WALs more strictly and addressed some minor fixes
that were discussed within past few days.
Also, I changed the documentations, considering those modifications.


Best,
Takamichi Osumi



disable_WAL_logging_v05.patch
Description: disable_WAL_logging_v05.patch

RE: Disable WAL logging to speed up data loading

2020-12-01 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> I executed each wal_level three times and calculated the average time
> and found that disabling WAL logging reduced about 73 % of the minimal's
> loading speed
> in this test. This speed-up came from the difference of generated WAL sizes.

So, it's 4x speedup when the WAL buffer and/or disk is likely to be saturated.  
That's nice!


> In this test, to load the data generated more than 10GB of WALs with
> wal_level=minimal
> while wal_level=none emits just 760 bytes of WALs.
> 
> I expect this size for none will become smaller when
> I take the modification to filter out the types of WAL which is discussed 
> above.

I don't expect so, because the WAL volume is already very small in this 
workload (probably only the commit records.)


> Also, I monitored numbers of iostat's 'util' and noticed that
> util's spike to use I/O reduced from twice to once when I changed the level
> from minimal to none, which should be the effect of the patch.

It's a bit mysterious.  I thought %util would be constantly 100% when wal_level 
= minimal.  That may be because wal_buffers is large.


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2020-12-01 Thread osumi.takami...@fujitsu.com

Hi

Sorry for being late.
On Tuesday, December 1, 2020 10:42 AM Tsunakawa, Takayuki 
 wrote:
> From: Kyotaro Horiguchi  
> > Yeah, although it's enough only to restrict non-harmful records
> > practically, if we find that only a few kinds of records are needed,
> > maybe it's cleaner to allow only required record type(s).
> >
> > Maybe it's right that if we can filter-out records looking only rmid,
> > since the xlog facility doesn't need to know about record types of a
> > resource manager.  But if we need to finer-grained control on the
> > record types, I'm afraid that that's wrong.  However, if we need only
> > the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> > XLogInsert filter records rather than inserting that filtering code to
> > all the caller sites.
> 
> Agreed.  As the kind of WAL records to be emitted is very limited, I think
> XLogInsert() can filter them where the current patch does.  
Yeah, I can do that.
I'll restrict the types of WAL in a strict manner,
which means filtering out the types of WAL in XLogInsert().
I'll modify this point in the next patch.

> > I don't dislike "none" since it seems to me practically "none".  It
> > seems rather correct if we actually need only the shutdown checkpoint
> > record.
> >
> > "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> > but seems somewhat alien being among "logical", "replica" and
> > "minimal".
> 
> OK, I'm happy with "none" too.  We can describe in the manual that some
> limited amount of WAL is emitted.
Sure, no problem. Let's keep adopting "none", then.

By the way, I've conducted one additional performance evaluation
to see whether reducing WAL could contribute to boost the speed of data loading 
or not.

I prepared a test VM with 24 cores, HDD and 128GB memory
and used a 9.3GB input file which can be generated by pgbench's scale factor 
1000.
The test table is a partitioned table whose number of child tables is 20.
The reason why I choose partitioned table is to remove the concern
that any delay of the loading time comes from lock contention of table.
Therefore, I divided the input file into 20 files and loaded each file
by different 20 sessions respectively toward each correspondent table.
I compared wal_level=minimal and wal_level=none this time
by using the server with the PG configurations below.

-
wal_level = minimal or minimal
archive_mode = off
max_prepared_transactions = 128
max_worker_processes = 128
max_parallel_workers = 128
max_wal_senders = 0
max_wal_size = 100GB
wal_buffers = 1GB
checkpoint_timeout = 1d
shared_buffers = 32GB # 25% of RAM
maintenance_work_mem = 26GB # 20% of RAM
work_mem = 26GB # 20% of RAM
--

I executed each wal_level three times and calculated the average time
and found that disabling WAL logging reduced about 73 % of the minimal's 
loading speed
in this test. This speed-up came from the difference of generated WAL sizes.
In this test, to load the data generated more than 10GB of WALs with 
wal_level=minimal
while wal_level=none emits just 760 bytes of WALs.

I double-checked this fact from pg_wal_lsn_diff() for each wal_level
and had a look at the folder size of pg_wal.
I expect this size for none will become smaller when
I take the modification to filter out the types of WAL which is discussed above.
Also, I monitored numbers of iostat's 'util' and noticed that
util's spike to use I/O reduced from twice to once when I changed the level
from minimal to none, which should be the effect of the patch.

Any comments ?

Best,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-11-30 Thread tsunakawa.ta...@fujitsu.com

From: Kyotaro Horiguchi 
> We're emitting only redo logs.  So I think theoretically we don't need
> anything other than the shutdown checkpoint record because we don't
> perform recovery and checkpoint record is required at startup.
> 
> RM_XLOG_ID:
>  XLOG_FPI_FOR_HINT  - not needed?
>  XLOG_FPI   - not needed?
> 
>  XLOG_CHECKPOINT_SHUTDOWN - must have
> 
> So how about the followings?
>  XLOG_CHECKPOINT_ONLINE
>  XLOG_NOOP
>  XLOG_NEXTOID
>  XLOG_SWITCH
>  XLOG_BACKUP_END
>  XLOG_PARAMETER_CHANGE
>  XLOG_RESTORE_POINT
>  XLOG_FPW_CHANGE
>  XLOG_END_OF_RECOVERY
> 
> 
> RM_XACT_ID:
>  XLOG_XACT_COMMIT
>  XLOG_XACT_PREPARE
>  XLOG_XACT_ABORT
>  XLOG_XACT_COMMIT_PREPARED
>  XLOG_XACT_ABORT_PREPARED
>  XLOG_XACT_ASSIGNMENT
>  XLOG_XACT_INVALIDATIONS
> 
> Do we need all of these?

What need to be emitted even when wal_level = none are:

RM_XLOG_ID:
- XLOG_CHECKPOINT_SHUTDOWN
- XLOG_PARAMETER_CHANGE

RM_XACT_ID:
-  XLOG_XACT_PREPARE

XLOG_PARAMETER_CHANGE is necessary to detect during archive recovery that 
wal_level was changed from >= replica to none, thus failing the archive 
recovery.  This is for the fix discussed in this thread to change the error 
level from WARNING to FATAL.


> Yeah, although it's enough only to restrict non-harmful records
> practically, if we find that only a few kinds of records are needed,
> maybe it's cleaner to allow only required record type(s).
> 
> Maybe it's right that if we can filter-out records looking only rmid,
> since the xlog facility doesn't need to know about record types of a
> resource manager.  But if we need to finer-grained control on the
> record types, I'm afraid that that's wrong.  However, if we need only
> the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> XLogInsert filter records rather than inserting that filtering code to
> all the caller sites.

Agreed.  As the kind of WAL records to be emitted is very limited, I think 
XLogInsert() can filter them where the current patch does.  (OTOH, the boot 
strap mode filters WAL coarsely as follows.  I thought we may follow this 
coarse RMID-based filtering as a pessimistic safeguard against new kind of WAL 
records that are necessary even in wal_level = none.)

/*
 * In bootstrap mode, we don't actually log anything but XLOG resources;
 * return a phony record pointer.
 */
if (IsBootstrapProcessingMode() && rmid != RM_XLOG_ID)
{
XLogResetInsertion();
EndPos = SizeOfXLogLongPHD; /* start of 1st chkpt record */
return EndPos;
}


> I don't dislike "none" since it seems to me practically "none".  It
> seems rather correct if we actually need only the shutdown checkpoint
> record.
> 
> "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> but seems somewhat alien being among "logical", "replica" and
> "minimal".

OK, I'm happy with "none" too.  We can describe in the manual that some limited 
amount of WAL is emitted.


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-29 Thread Masahiko Sawada

()

On Mon, Nov 30, 2020 at 2:39 PM Kyotaro Horiguchi
 wrote:
>
> At Fri, 27 Nov 2020 13:34:49 +, "osumi.takami...@fujitsu.com" 
>  wrote in
> > Thank you, Horiguchi-San
> >
> > > I haven't seen a criteria of whether a record is emitted or not for
> > > wal_leve=none.
> > >
> > > We're emitting only redo logs.  So I think theoretically we don't need 
> > > anything
> > > other than the shutdown checkpoint record because we don't perform
> > > recovery and checkpoint record is required at startup.
> > >
> > > RM_XLOG_ID:
> > >  XLOG_FPI_FOR_HINT  - not needed?
> > >  XLOG_FPI   - not needed?
> > >
> > >  XLOG_CHECKPOINT_SHUTDOWN - must have
> > >
> > > So how about the followings?
> > >  XLOG_CHECKPOINT_ONLINE
> > >  XLOG_NOOP
> > >  XLOG_NEXTOID
> > >  XLOG_SWITCH
> > >  XLOG_BACKUP_END
> > >  XLOG_PARAMETER_CHANGE
> > >  XLOG_RESTORE_POINT
> > >  XLOG_FPW_CHANGE
> > >  XLOG_END_OF_RECOVERY
> > >
> > >
> > > RM_XACT_ID:
> > >  XLOG_XACT_COMMIT
> > >  XLOG_XACT_PREPARE
> > >  XLOG_XACT_ABORT
> > >  XLOG_XACT_COMMIT_PREPARED
> > >  XLOG_XACT_ABORT_PREPARED
> > >  XLOG_XACT_ASSIGNMENT
> > >  XLOG_XACT_INVALIDATIONS
> > >
> > > Do we need all of these?
> > No. Strictly speaking, you are right.
> > We still have types of WAL that are not necessarily needed.
> > For example, XLOG_END_OF_RECOVERY is not useful
> > because wal_level=none doesn't recover from any accidents.
> > Or, XLOG_CHECKPOINT_ONLINE is used when we execute CHECKPOINT
> > not for shutting down. Thus we could eliminate more.
>
> Yeah, although it's enough only to restrict non-harmful records
> practically, if we find that only a few kinds of records are needed,
> maybe it's cleaner to allow only required record type(s).
>
> > > And, currenly what decides whether to emit a wal record according to
> > > wal_level is the caller of XLogInsert.
> > Yes.
> >
> > > So doing this at XLogInsert-level means
> > > that we bring the criteria of the necessity of wal-record into xlog layer 
> > > only for
> > > wal_level=none. I'm not sure it is the right direction.
> > I'm sorry. I didn't understand what "doing this" and "xlog layer" meant.
>
> "doing this" meant filtering record types.
>
> > Did you mean that fixing the caller side of XLogInsert (e.g. 
> > CreateCheckPoint)
> > is not the right direction ? Or, fixing the function of XLogInsert is not 
> > the right direction ?
>
> Maybe it's right that if we can filter-out records looking only rmid,
> since the xlog facility doesn't need to know about record types of a
> resource manager.  But if we need to finer-grained control on the
> record types, I'm afraid that that's wrong.

I also think we should have the caller of XLogInsert() control whether
or not to write a WAL record according to the setting of wal_level
like XLogStandbyInfoActive() and XLogLogicalInfoActive(). Since most
callers seem to assume the wal record will be written when calling to
XLogInsert() I think it's better not to break that assumption.

> However, if we need only
> the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
> XLogInsert filter records rather than inserting that filtering code to
> all the caller sites.
>
> > > At Fri, 27 Nov 2020 07:01:16 +, "tsunakawa.ta...@fujitsu.com"
> > >  wrote in
> > > > I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> > > RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> > > good name?  IIUC, "minimal" is named after the fact that the minimal
> > > amount of WAL necessary for crash recovery is generated.  "norecovery" or
> > > "unrecoverable"?
> > Lastly, I found another name which expresses the essential characteristic 
> > of this wal_level.
> > How about the name of wal_level="crash_unsafe" ?
> > What did you think ?
>
> I don't dislike "none" since it seems to me practically "none".

Me too.

> It seems rather correct if we actually need only the shutdown checkpoint
> record.
>
> "unrecoverable" is apparently misleading. "crash_unsafe" is precise
> but seems somewhat alien being among "logical", "replica" and
> "minimal".

Since the WAL protocol is for durability I'm concerned the name
'crash_unsafe' (and 'unrecoverable') leads to a contradiction.

Regards,

-- 
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

Re: Disable WAL logging to speed up data loading

2020-11-29 Thread Kyotaro Horiguchi

At Fri, 27 Nov 2020 13:34:49 +, "osumi.takami...@fujitsu.com" 
 wrote in 
> Thank you, Horiguchi-San
> 
> > I haven't seen a criteria of whether a record is emitted or not for
> > wal_leve=none.
> > 
> > We're emitting only redo logs.  So I think theoretically we don't need 
> > anything
> > other than the shutdown checkpoint record because we don't perform
> > recovery and checkpoint record is required at startup.
> > 
> > RM_XLOG_ID:
> >  XLOG_FPI_FOR_HINT  - not needed?
> >  XLOG_FPI   - not needed?
> > 
> >  XLOG_CHECKPOINT_SHUTDOWN - must have
> > 
> > So how about the followings?
> >  XLOG_CHECKPOINT_ONLINE
> >  XLOG_NOOP
> >  XLOG_NEXTOID
> >  XLOG_SWITCH
> >  XLOG_BACKUP_END
> >  XLOG_PARAMETER_CHANGE
> >  XLOG_RESTORE_POINT
> >  XLOG_FPW_CHANGE
> >  XLOG_END_OF_RECOVERY
> > 
> > 
> > RM_XACT_ID:
> >  XLOG_XACT_COMMIT
> >  XLOG_XACT_PREPARE
> >  XLOG_XACT_ABORT
> >  XLOG_XACT_COMMIT_PREPARED
> >  XLOG_XACT_ABORT_PREPARED
> >  XLOG_XACT_ASSIGNMENT
> >  XLOG_XACT_INVALIDATIONS
> > 
> > Do we need all of these?
> No. Strictly speaking, you are right.
> We still have types of WAL that are not necessarily needed.
> For example, XLOG_END_OF_RECOVERY is not useful
> because wal_level=none doesn't recover from any accidents.
> Or, XLOG_CHECKPOINT_ONLINE is used when we execute CHECKPOINT
> not for shutting down. Thus we could eliminate more.

Yeah, although it's enough only to restrict non-harmful records
practically, if we find that only a few kinds of records are needed,
maybe it's cleaner to allow only required record type(s).

> > And, currenly what decides whether to emit a wal record according to
> > wal_level is the caller of XLogInsert. 
> Yes.
> 
> > So doing this at XLogInsert-level means
> > that we bring the criteria of the necessity of wal-record into xlog layer 
> > only for
> > wal_level=none. I'm not sure it is the right direction.
> I'm sorry. I didn't understand what "doing this" and "xlog layer" meant.

"doing this" meant filtering record types.

> Did you mean that fixing the caller side of XLogInsert (e.g. CreateCheckPoint)
> is not the right direction ? Or, fixing the function of XLogInsert is not the 
> right direction ?

Maybe it's right that if we can filter-out records looking only rmid,
since the xlog facility doesn't need to know about record types of a
resource manager.  But if we need to finer-grained control on the
record types, I'm afraid that that's wrong.  However, if we need only
the XLOG_CHECKPOINT_SHUTDOWN record, it might be better to let
XLogInsert filter records rather than inserting that filtering code to
all the caller sites.

> > At Fri, 27 Nov 2020 07:01:16 +, "tsunakawa.ta...@fujitsu.com"
> >  wrote in
> > > I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> > RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> > good name?  IIUC, "minimal" is named after the fact that the minimal
> > amount of WAL necessary for crash recovery is generated.  "norecovery" or
> > "unrecoverable"?
> Lastly, I found another name which expresses the essential characteristic of 
> this wal_level.
> How about the name of wal_level="crash_unsafe" ?
> What did you think ?

I don't dislike "none" since it seems to me practically "none".  It
seems rather correct if we actually need only the shutdown checkpoint
record.

"unrecoverable" is apparently misleading. "crash_unsafe" is precise
but seems somewhat alien being among "logical", "replica" and
"minimal".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Disable WAL logging to speed up data loading

2020-11-27 Thread osumi.takami...@fujitsu.com

Thank you, Horiguchi-San

> I haven't seen a criteria of whether a record is emitted or not for
> wal_leve=none.
> 
> We're emitting only redo logs.  So I think theoretically we don't need 
> anything
> other than the shutdown checkpoint record because we don't perform
> recovery and checkpoint record is required at startup.
> 
> RM_XLOG_ID:
>  XLOG_FPI_FOR_HINT  - not needed?
>  XLOG_FPI   - not needed?
> 
>  XLOG_CHECKPOINT_SHUTDOWN - must have
> 
> So how about the followings?
>  XLOG_CHECKPOINT_ONLINE
>  XLOG_NOOP
>  XLOG_NEXTOID
>  XLOG_SWITCH
>  XLOG_BACKUP_END
>  XLOG_PARAMETER_CHANGE
>  XLOG_RESTORE_POINT
>  XLOG_FPW_CHANGE
>  XLOG_END_OF_RECOVERY
> 
> 
> RM_XACT_ID:
>  XLOG_XACT_COMMIT
>  XLOG_XACT_PREPARE
>  XLOG_XACT_ABORT
>  XLOG_XACT_COMMIT_PREPARED
>  XLOG_XACT_ABORT_PREPARED
>  XLOG_XACT_ASSIGNMENT
>  XLOG_XACT_INVALIDATIONS
> 
> Do we need all of these?
No. Strictly speaking, you are right.
We still have types of WAL that are not necessarily needed.
For example, XLOG_END_OF_RECOVERY is not useful
because wal_level=none doesn't recover from any accidents.
Or, XLOG_CHECKPOINT_ONLINE is used when we execute CHECKPOINT
not for shutting down. Thus we could eliminate more.

> And, currenly what decides whether to emit a wal record according to
> wal_level is the caller of XLogInsert. 
Yes.

> So doing this at XLogInsert-level means
> that we bring the criteria of the necessity of wal-record into xlog layer 
> only for
> wal_level=none. I'm not sure it is the right direction.
I'm sorry. I didn't understand what "doing this" and "xlog layer" meant.
Did you mean that fixing the caller side of XLogInsert (e.g. CreateCheckPoint)
is not the right direction ? Or, fixing the function of XLogInsert is not the 
right direction ?


> At Fri, 27 Nov 2020 07:01:16 +, "tsunakawa.ta...@fujitsu.com"
>  wrote in
> > I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> good name?  IIUC, "minimal" is named after the fact that the minimal
> amount of WAL necessary for crash recovery is generated.  "norecovery" or
> "unrecoverable"?
Lastly, I found another name which expresses the essential characteristic of 
this wal_level.
How about the name of wal_level="crash_unsafe" ?
What did you think ?


Best,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-11-27 Thread osumi.takami...@fujitsu.com

Hi, Tsunakawa-San


> I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and
> RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the
> good name?  IIUC, "minimal" is named after the fact that the minimal
> amount of WAL necessary for crash recovery is generated.  "norecovery" or
> "unrecoverable"?
Really good point !

Yes, it cannot be helped to generate tiny grain of WALs.
I prefer "unrecoverable" to "none",
which should work to make any users utilize this feature really really 
carefully.
If there's no strong objection, I'll change the name to it from next time.

Best,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-11-27 Thread osumi.takami...@fujitsu.com

Hello, Sawada-San

On Friday, November 27, 2020 3:08 PM Masahiko Sawada  
wrote:
> -   (errmsg("WAL was generated with wal_level=minimal,
> data may be missing"),
> +   (errmsg("WAL was generated with wal_level<=minimal,
> data may be missing"),
>  errhint("This happens if you temporarily set
> wal_level=minimal without taking a new base backup.")));
> 
> 'wal_level=minimal' in errhint also needs to be changed to
> 'wal_level<=minimal'?
Yeah, thanks. I'll fix this point in the next patch.


> While testing the patch on some workload, I realized that
> XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> none. IIUC that WAL record is not necessary during wal_level = none since
> the server cannot be the primary server and the server crash ends up requiring
> to restore the whole database.
That's right. Yeah, I'm aware of the fact that
we can refine the types of WAL. Basically, I thought the amount of 
WALs related to XLOG and XACT should be less than that of other types and
generating those doesn't have a serious impact on the peformance.
But anyway, thanks for your advice !


Best,
Takamichi Osumi

Re: Disable WAL logging to speed up data loading

2020-11-27 Thread Kyotaro Horiguchi

At Fri, 27 Nov 2020 07:01:16 +, "tsunakawa.ta...@fujitsu.com" 
 wrote in 
> From: Masahiko Sawada 
> > While testing the patch on some workload, I realized that
> > XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> > none. IIUC that WAL record is not necessary during wal_level = none
> > since the server cannot be the primary server and the server crash
> > ends up requiring to restore the whole database.
> 
> Nice catch!  XLOG_FPI_FOR_HINT and XLOG_FPI should be eliminated, otherwise 
> large amount of WAL may be written.  (It seems that other RMIDs than 
> RM_XLOG_ID and RM_XACT_ID do not have to be written.)
> 
> I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and 
> RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the good 
> name?  IIUC, "minimal" is named after the fact that the minimal amount of WAL 
> necessary for crash recovery is generated.  "norecovery" or "unrecoverable"?

I haven't seen a criteria of whether a record is emitted or not for
wal_leve=none.

We're emitting only redo logs.  So I think theoretically we don't need
anything other than the shutdown checkpoint record because we don't
perform recovery and checkpoint record is required at startup.

RM_XLOG_ID:
 XLOG_FPI_FOR_HINT  - not needed?
 XLOG_FPI   - not needed?

 XLOG_CHECKPOINT_SHUTDOWN - must have

So how about the followings?
 XLOG_CHECKPOINT_ONLINE
 XLOG_NOOP
 XLOG_NEXTOID
 XLOG_SWITCH
 XLOG_BACKUP_END
 XLOG_PARAMETER_CHANGE
 XLOG_RESTORE_POINT
 XLOG_FPW_CHANGE
 XLOG_END_OF_RECOVERY


RM_XACT_ID:
 XLOG_XACT_COMMIT
 XLOG_XACT_PREPARE
 XLOG_XACT_ABORT
 XLOG_XACT_COMMIT_PREPARED
 XLOG_XACT_ABORT_PREPARED
 XLOG_XACT_ASSIGNMENT
 XLOG_XACT_INVALIDATIONS

Do we need all of these?

And, currenly what decides whether to emit a wal record according to
wal_level is the caller of XLogInsert. So doing this at
XLogInsert-level means that we bring the criteria of the necessity of
wal-record into xlog layer only for wal_level=none. I'm not sure it is
the right direction.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Disable WAL logging to speed up data loading

2020-11-26 Thread tsunakawa.ta...@fujitsu.com

From: Masahiko Sawada 
> While testing the patch on some workload, I realized that
> XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> none. IIUC that WAL record is not necessary during wal_level = none
> since the server cannot be the primary server and the server crash
> ends up requiring to restore the whole database.

Nice catch!  XLOG_FPI_FOR_HINT and XLOG_FPI should be eliminated, otherwise 
large amount of WAL may be written.  (It seems that other RMIDs than RM_XLOG_ID 
and RM_XACT_ID do not have to be written.)

I'm afraid "none" doesn't represent the behavior because RM_XLOG_ID and 
RM_XACT_ID WAL records, except for XLOG_FPI_*, are emitted.  What's the good 
name?  IIUC, "minimal" is named after the fact that the minimal amount of WAL 
necessary for crash recovery is generated.  "norecovery" or "unrecoverable"?

Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-26 Thread Kyotaro Horiguchi

At Fri, 27 Nov 2020 15:07:34 +0900, Masahiko Sawada  
wrote in 
> On Tue, Nov 24, 2020 at 11:19 AM osumi.takami...@fujitsu.com
>  wrote:
> >
> > Hi
> >
> > On Monday, Nov 23, 2020 12:17 PM Tsunakawa, Takayuki 
> >  wrote:
> > > PREPARE TRANSACTION is the same as COMMIT in that it persists
> > > transaction updates.  A crash during wal_level = none loses both of them.
> > > So, I don't think PREPARE TRANSACTION needs special treatment.
> > Yeah, I got it. That makes sense.
> > Attached is the one I removed the special treatment.
> >
> >
> > > Does PREPARE TRANSACTION complete successfully?  I remember
> > > Fujii-san said it PANICed if --enable-asserts is used in configure.
> > Yes. That assertion failure Fujii-San reported in the past
> > was solved by having rmid != RM_XACT_ID in XLogInsert()
> > to add WAL generation for transaction completes.
> > That I missed the condition was the cause but fine now.
> >
> > Plus, PREPARE TRANSACTION and COMMIT PREPARED
> > works successfully at present when no happening occurs.
> >
> 
> Thank you for updating the patch.
> 
> -   (errmsg("WAL was generated with wal_level=minimal,
> data may be missing"),
> +   (errmsg("WAL was generated with wal_level<=minimal,
> data may be missing"),
>  errhint("This happens if you temporarily set
> wal_level=minimal without taking a new base backup.")));
> 
> 'wal_level=minimal' in errhint also needs to be changed to 
> 'wal_level<=minimal'?
> 
> While testing the patch on some workload, I realized that
> XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
> none. IIUC that WAL record is not necessary during wal_level = none
> since the server cannot be the primary server and the server crash
> ends up requiring to restore the whole database.

It seems to be on purpose.

@@ -449,6 +449,14 @@ XLogInsert(RmgrId rmid, uint8 info)
return EndPos;
}
 
+   /* Issues WAL related to XLOG resources and transactions only */
+   if (wal_level == WAL_LEVEL_NONE &&
+   rmid != RM_XLOG_ID && rmid != RM_XACT_ID)
+   {
+   XLogResetInsertion();
+   return GetXLogInsertRecPtr();

What is the reason for those kinds of records to be emitted?

I think we must emit at least the shutdown checkpoint record, which
has redo-LSN that points to the record itself.  I'm not sure what
other kinds of records are needed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Disable WAL logging to speed up data loading

2020-11-26 Thread Masahiko Sawada

On Tue, Nov 24, 2020 at 11:19 AM osumi.takami...@fujitsu.com
 wrote:
>
> Hi
>
> On Monday, Nov 23, 2020 12:17 PM Tsunakawa, Takayuki 
>  wrote:
> > PREPARE TRANSACTION is the same as COMMIT in that it persists
> > transaction updates.  A crash during wal_level = none loses both of them.
> > So, I don't think PREPARE TRANSACTION needs special treatment.
> Yeah, I got it. That makes sense.
> Attached is the one I removed the special treatment.
>
>
> > Does PREPARE TRANSACTION complete successfully?  I remember
> > Fujii-san said it PANICed if --enable-asserts is used in configure.
> Yes. That assertion failure Fujii-San reported in the past
> was solved by having rmid != RM_XACT_ID in XLogInsert()
> to add WAL generation for transaction completes.
> That I missed the condition was the cause but fine now.
>
> Plus, PREPARE TRANSACTION and COMMIT PREPARED
> works successfully at present when no happening occurs.
>

Thank you for updating the patch.

-   (errmsg("WAL was generated with wal_level=minimal,
data may be missing"),
+   (errmsg("WAL was generated with wal_level<=minimal,
data may be missing"),
 errhint("This happens if you temporarily set
wal_level=minimal without taking a new base backup.")));

'wal_level=minimal' in errhint also needs to be changed to 'wal_level<=minimal'?

While testing the patch on some workload, I realized that
XLOG_FPI_FOR_HINT record could still be emitted even when wal_level =
none. IIUC that WAL record is not necessary during wal_level = none
since the server cannot be the primary server and the server crash
ends up requiring to restore the whole database.

Regards,

--
Masahiko Sawada
EnterpriseDB:  https://www.enterprisedb.com/

RE: Disable WAL logging to speed up data loading

2020-11-23 Thread osumi.takami...@fujitsu.com

Hi

On Monday, Nov 23, 2020 12:17 PM Tsunakawa, Takayuki 
 wrote:
> PREPARE TRANSACTION is the same as COMMIT in that it persists
> transaction updates.  A crash during wal_level = none loses both of them.
> So, I don't think PREPARE TRANSACTION needs special treatment.
Yeah, I got it. That makes sense.
Attached is the one I removed the special treatment.


> Does PREPARE TRANSACTION complete successfully?  I remember
> Fujii-san said it PANICed if --enable-asserts is used in configure.
Yes. That assertion failure Fujii-San reported in the past
was solved by having rmid != RM_XACT_ID in XLogInsert()
to add WAL generation for transaction completes.
That I missed the condition was the cause but fine now.

Plus, PREPARE TRANSACTION and COMMIT PREPARED
works successfully at present when no happening occurs.

Best,
Takamichi Osumi



disable_WAL_logging_v04.patch
Description: disable_WAL_logging_v04.patch

RE: Disable WAL logging to speed up data loading

2020-11-22 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> This time, I updated my patch to address comments below only.

I forgot to mentionthis.  I confirmed all review comments are reflected 
correctly.


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2020-11-22 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> > case TRANS_STMT_PREPARE:
> > +   if (wal_level ==
> > WAL_LEVEL_NONE)
> > +   ereport(ERROR,
> > +
> > errmsg("cannot execute PREPARE TRANSACTION when WAL logging
> is
> > disabled"));
> Usually, this safeguard is not needed for PREPARE TRANSACTION.
> But, when an unexpected crash happens, user needs to recreate the cluster
> from the backup and execute the operations that are executed into the crashed
> server again if the user sets wal_level=none.
> 
> While 2PC guarantees that after issuing PREPARE TRANSACTION, the
> processing or the contents of the transaction becomes *secure*, setting
> wal_level=none makes the server never start up again if a database crash
> occurs.
> In other words, this safeguard of the new wal_level damages the important
> aspect of PREPARE TRANSACTION's functionality, in my opinion.
> 
> I don't think this is what it should be.
> 
> Therefore, I don't want users to utilize the PREPARE TRANSACTION still in
> order to prevent that user thinks that they can use PREPARE TRANSACTION
> safely during wal_level=none and execute it.
> Did it make sense ?

PREPARE TRANSACTION is the same as COMMIT in that it persists transaction 
updates.  A crash during wal_level = none loses both of them.  So, I don't 
think PREPARE TRANSACTION needs special treatment.

Does PREPARE TRANSACTION complete successfully?  I remember Fujii-san said it 
PANICed if --enable-asserts is used in configure.


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-20 Thread Stephen Frost

Greetings,

* osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> On Friday, Nov 20, 2020 9:33 AM Tsunakawa, Takayuki 
>  wrote:
> > From: Kyotaro Horiguchi 
> > > At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost 
> > > > * Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> > > > > I missed that this is only a warning when I looked at it before.
> > > > > Yes, it should be a fatal error.
> > > >
> > > > Yeah, the more that I think about it, the more that I tend to agree
> > > > with this.  Does anyone want to argue against changing this into a
> > FATAL..?
> > >
> > > I don't come up with a use case where someone needs to set
> > > wal_level=minimal for archive recovery. So +1 to change it to FATAL.
> It seems we reached the agreement for this fix.
> 
> > Thank you all.  I'd like to give Osumi-san an opportunity to write a patch 
> > and
> > showing the evidence of successful test if you don't mind.  He is very young
> > and newbie to Postgres, so even a small contribution would encourage him to
> > make further contributions.
> Sure. Let me do that, including the test.

That all sounds good to me.  I would recommend starting a new thread on
-hackers with the patch, once it's ready, since it's really independent
from the topic of this thread.

Thanks,

Stephen


signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2020-11-20 Thread osumi.takami...@fujitsu.com

Hi

This time, I updated my patch to address comments below only.
Please have a look at updated one.

On Thursday, Nov 19, 2020 4:51 PM Tsunakawa, Takayuki 

> (1)
>  #define RelationNeedsWAL(relation)
>   \
> + (wal_level != WAL_LEVEL_NONE &&
>   \
>   ((relation)->rd_rel->relpersistence ==
> RELPERSISTENCE_PERMANENT &&   \
>(XLogIsNeeded() ||
>   \
> (relation->rd_createSubid == InvalidSubTransactionId &&
>   \
> -relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> +relation->rd_firstRelfilenodeSubid ==
> InvalidSubTransactionId
> 
> You can remove one pair of parentheses for readability as follows:
> 
>  #define RelationNeedsWAL(relation)
>   \
> + (wal_level != WAL_LEVEL_NONE &&
>   \
>   (relation)->rd_rel->relpersistence ==
> RELPERSISTENCE_PERMANENT &&   \
>(XLogIsNeeded() ||
>   \
> (relation->rd_createSubid == InvalidSubTransactionId &&
>   \
> -relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
> +relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
Done. 

> (2)
>   /*
> +  * Detect if the server previously crashed under wal_level='none' or
> not.
> +  */
> + if (ControlFile->wal_level == WAL_LEVEL_NONE &&
> + (ControlFile->state != DB_SHUTDOWNED &&
> ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY))
> + ereport(ERROR,
> + (errmsg("detected an unexpected server
> shutdown when WAL logging was disabled"),
> +  errhint("It looks like you need to deploy a
> new cluster from your full backup again.")));
> +
> + /*
>* Check that contents look valid.
>*/
>   if (!XRecOffIsValid(ControlFile->checkPoint))
> 
> I think the above safeguard should be placed right before the following code,
> because it's better to first check control file contents and emit the message
> for the previous shutdown state.
> 
> #ifdef XLOG_REPLAY_DELAY
> if (ControlFile->state != DB_SHUTDOWNED)
> pg_usleep(6000L);
> #endif
Thank you. Perfectly makes sense. Fixed.


> (3)
> @@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
>   return EndPos;
>   }
> 
> + /* Issues WAL only for XLOG resources */
> + if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
> + {
> + XLogResetInsertion();
> + return GetLatestCheckPointLSN();
> + }
> 
> It should be correct to use GetXLogInsertRecPtr() instead of
> GetLatestCheckPointLSN(), because what XLogInsert() should return is the
> end position of last WAL record inserted (= start position of next WAL).
Ah, sorry. My patch was not correct.


> Why don't we add transaction completion WAL records?  That is, add
> "rmid != RM_XACT_ID" to the if condition.  What we really want to eliminate
> for performance is the data modification WAL records.  This might allow us
> to insert special checks to prevent PREPARE TRANSACTION and other stuff.
I apologize that you mentioned this point in your past review but I took it 
wrongly.
I fixed the condition and the comment for it.


> What happens without these checks? 
> (4)
> @@ -404,6 +404,10 @@
> pg_logical_emit_message_bytea(PG_FUNCTION_ARGS)
>   bytea  *data = PG_GETARG_BYTEA_PP(2);
>   XLogRecPtr  lsn;
> 
> + if (wal_level < WAL_LEVEL_REPLICA)
> + ereport(ERROR,
> + errmsg("propagating a message requires
> wal_level \"replica\" or \"logical\""));
> +
>   lsn = LogLogicalMessage(prefix, VARDATA_ANY(data),
> VARSIZE_ANY_EXHDR(data),
>   transactional);
>   PG_RETURN_LSN(lsn);
> 
> @@ -192,6 +192,10 @@ replorigin_check_prerequisites(bool check_slots,
> bool recoveryOK)
> 
>   (errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
>errmsg("cannot manipulate replication
> origins during recovery")));
> 
> + if (wal_level < WAL_LEVEL_REPLICA)
> + ereport(ERROR,
> + errmsg("creating replication origins requires
> wal_level \"replica\" or \"logical\""));
> +
>  }
Those functions doesn't do something meaningful when wal_level <= minimal.
For example, pg_logical_emit_message_bytea calls XLogInsert with 
RM_LOGICALMSG_ID rmid.
This doesn't generate WAL during wal_level=none,
which makes the usage of the function nonsense.

But, the latest patch I attached removed this part of modification,
because this can be applied to wal_level = minimal also and
I should make the patch focus on its purpose only.


> @@ -625,6 +625,9 @@ standard_ProcessUtility(PlannedStmt *pstmt,
>

RE: Disable WAL logging to speed up data loading

2020-11-19 Thread osumi.takami...@fujitsu.com

Hello


On Friday, Nov 20, 2020 9:33 AM Tsunakawa, Takayuki 
 wrote:
> From: Kyotaro Horiguchi 
> > At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost 
> > > * Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> > > > I missed that this is only a warning when I looked at it before.
> > > > Yes, it should be a fatal error.
> > >
> > > Yeah, the more that I think about it, the more that I tend to agree
> > > with this.  Does anyone want to argue against changing this into a
> FATAL..?
> >
> > I don't come up with a use case where someone needs to set
> > wal_level=minimal for archive recovery. So +1 to change it to FATAL.
It seems we reached the agreement for this fix.

> Thank you all.  I'd like to give Osumi-san an opportunity to write a patch and
> showing the evidence of successful test if you don't mind.  He is very young
> and newbie to Postgres, so even a small contribution would encourage him to
> make further contributions.
Sure. Let me do that, including the test.

Best,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-11-19 Thread tsunakawa.ta...@fujitsu.com

From: Kyotaro Horiguchi 
> At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost 
> > * Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> > > I missed that this is only a warning when I looked at it before.
> > > Yes, it should be a fatal error.
> >
> > Yeah, the more that I think about it, the more that I tend to agree with
> > this.  Does anyone want to argue against changing this into a FATAL..?
> 
> I don't come up with a use case where someone needs to set
> wal_level=minimal for archive recovery. So +1 to change it to FATAL.

Thank you all.  I'd like to give Osumi-san an opportunity to write a patch and 
showing the evidence of successful test if you don't mind.  He is very young 
and newbie to Postgres, so even a small contribution would encourage him to 
make further contributions.

Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-19 Thread Kyotaro Horiguchi

At Thu, 19 Nov 2020 11:04:17 -0500, Stephen Frost  wrote in 
> Greetings,
> 
> * Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> > On Thu, 2020-11-19 at 05:24 +, osumi.takami...@fujitsu.com wrote:
> > > > > > ereport(WARNING,
> > > > > >  (errmsg("WAL was generated with wal_level=minimal, data may
> > > > > > be missing"),
> > > > > >  errhint("This happens if you temporarily set
> > > > > > wal_level=minimal without taking a new base backup.")));
> > > > > > There's definitely a question about if a WARNING there is really
> > > > > > sufficient or not, considering that you could end up with 'logged'
> > > > > > tables on the replica that are missing data, but I'm not sure that
> > > > > > inventing a new, independent, mechanism for checking WAL level
> > > > > > changes makes
> > > > > sense.
> > > >
> > > > I don't know why WARNING was chosen.  I think it should be FATAL,
> > > > resulting in the standby shutdown, disabling restarting it, and urging 
> > > > the user
> > > > to rebuild the standby.  (I guess that's overreaction because the user 
> > > > may
> > > > not perform operations that lack WAL while wal_level is minimal.)
> > > 
> > > Yeah, I agree that WARNING is not sufficient.
> > 
> > I missed that this is only a warning when I looked at it before.
> > Yes, it should be a fatal error.
> 
> Yeah, the more that I think about it, the more that I tend to agree with
> this.  Does anyone want to argue against changing this into a FATAL..?

I don't come up with a use case where someone needs to set
wal_level=minimal for archive recovery. So +1 to change it to FATAL.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Disable WAL logging to speed up data loading

2020-11-19 Thread Stephen Frost

Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Thu, 2020-11-19 at 05:24 +, osumi.takami...@fujitsu.com wrote:
> > > > > ereport(WARNING,
> > > > >  (errmsg("WAL was generated with wal_level=minimal, data may
> > > > > be missing"),
> > > > >  errhint("This happens if you temporarily set
> > > > > wal_level=minimal without taking a new base backup.")));
> > > > > There's definitely a question about if a WARNING there is really
> > > > > sufficient or not, considering that you could end up with 'logged'
> > > > > tables on the replica that are missing data, but I'm not sure that
> > > > > inventing a new, independent, mechanism for checking WAL level
> > > > > changes makes
> > > > sense.
> > >
> > > I don't know why WARNING was chosen.  I think it should be FATAL,
> > > resulting in the standby shutdown, disabling restarting it, and urging 
> > > the user
> > > to rebuild the standby.  (I guess that's overreaction because the user may
> > > not perform operations that lack WAL while wal_level is minimal.)
> > 
> > Yeah, I agree that WARNING is not sufficient.
> 
> I missed that this is only a warning when I looked at it before.
> Yes, it should be a fatal error.

Yeah, the more that I think about it, the more that I tend to agree with
this.  Does anyone want to argue against changing this into a FATAL..?

Thanks,

Stephen


signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2020-11-19 Thread osumi.takami...@fujitsu.com

Hello, Laurenz



On Thursday, Nov19, 2020 4:50 PM Laurenz Albe  wrote
> On Thu, 2020-11-19 at 05:24 +, osumi.takami...@fujitsu.com wrote:
> > > > > ereport(WARNING,
> > > > >  (errmsg("WAL was generated with wal_level=minimal, data
> > > > > may be missing"),
> > > > >  errhint("This happens if you temporarily set
> > > > > wal_level=minimal without taking a new base backup."))); There's
> > > > > definitely a question about if a WARNING there is really
> > > > > sufficient or not, considering that you could end up with 'logged'
> > > > > tables on the replica that are missing data, but I'm not sure
> > > > > that inventing a new, independent, mechanism for checking WAL
> > > > > level changes makes
> > > > sense.
> > >
> > > I don't know why WARNING was chosen.  I think it should be FATAL,
> > > resulting in the standby shutdown, disabling restarting it, and
> > > urging the user to rebuild the standby.  (I guess that's
> > > overreaction because the user may not perform operations that lack
> > > WAL while wal_level is minimal.)
> >
> > Yeah, I agree that WARNING is not sufficient.
> 
> I missed that this is only a warning when I looked at it before.
> Yes, it should be a fatal error.
> 
> I think that there should two patches: one that turns this warning into a
> FATAL and should be backpatched.  If you change the test to
> 
>ControlFile->wal_level <= WAL_LEVEL_MINIMAL
> 
> it will automatically work for your new feature too.
> Then your new wal_level would be a second patch only for HEAD.
Yeah, this suggestion to divide the patch into two sounds really good.
Thank you. I'll post separated patches next time.

> 
> With that, the only remaining consideration with this patch is the danger that
> enabling wal_level=none without taking a backup before can lead to data loss.
> But that is intended, so I think that an unmistakable warning in the
> documentation would be good enough.
Yes, thank you for reviewing the documents in the patch.


Best,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-11-18 Thread tsunakawa.ta...@fujitsu.com

(1)
 #define RelationNeedsWAL(relation) 
\
+   (wal_level != WAL_LEVEL_NONE && 
\
((relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&  
\
 (XLogIsNeeded() || 
\
  (relation->rd_createSubid == InvalidSubTransactionId &&   
\
-  relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
+  relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId

You can remove one pair of parentheses for readability as follows:

 #define RelationNeedsWAL(relation) 
\
+   (wal_level != WAL_LEVEL_NONE && 
\
(relation)->rd_rel->relpersistence == RELPERSISTENCE_PERMANENT &&   
\
 (XLogIsNeeded() || 
\
  (relation->rd_createSubid == InvalidSubTransactionId &&   
\
-  relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))
+  relation->rd_firstRelfilenodeSubid == InvalidSubTransactionId)))


(2)
/*
+* Detect if the server previously crashed under wal_level='none' or 
not.
+*/
+   if (ControlFile->wal_level == WAL_LEVEL_NONE &&
+   (ControlFile->state != DB_SHUTDOWNED && ControlFile->state != 
DB_SHUTDOWNED_IN_RECOVERY))
+   ereport(ERROR,
+   (errmsg("detected an unexpected server shutdown 
when WAL logging was disabled"),
+errhint("It looks like you need to deploy a 
new cluster from your full backup again.")));
+
+   /*
 * Check that contents look valid.
 */
if (!XRecOffIsValid(ControlFile->checkPoint))

I think the above safeguard should be placed right before the following code, 
because it's better to first check control file contents and emit the message 
for the previous shutdown state.

#ifdef XLOG_REPLAY_DELAY
if (ControlFile->state != DB_SHUTDOWNED)
pg_usleep(6000L);
#endif


(3)
@@ -449,6 +449,13 @@ XLogInsert(RmgrId rmid, uint8 info)
return EndPos;
}
 
+   /* Issues WAL only for XLOG resources */
+   if (wal_level == WAL_LEVEL_NONE && rmid != RM_XLOG_ID)
+   {
+   XLogResetInsertion();
+   return GetLatestCheckPointLSN();
+   }

It should be correct to use GetXLogInsertRecPtr() instead of 
GetLatestCheckPointLSN(), because what XLogInsert() should return is the end 
position of last WAL record inserted (= start position of next WAL).

Why don't we add transaction completion WAL records?  That is, add "rmid != 
RM_XACT_ID" to the if condition.  What we really want to eliminate for 
performance is the data modification WAL records.  This might allow us to 
insert special checks to prevent PREPARE TRANSACTION and other stuff.


(4)
@@ -404,6 +404,10 @@ pg_logical_emit_message_bytea(PG_FUNCTION_ARGS)
bytea  *data = PG_GETARG_BYTEA_PP(2);
XLogRecPtr  lsn;
 
+   if (wal_level < WAL_LEVEL_REPLICA)
+   ereport(ERROR,
+   errmsg("propagating a message requires 
wal_level \"replica\" or \"logical\""));
+
lsn = LogLogicalMessage(prefix, VARDATA_ANY(data), 
VARSIZE_ANY_EXHDR(data),
transactional);
PG_RETURN_LSN(lsn);

@@ -192,6 +192,10 @@ replorigin_check_prerequisites(bool check_slots, bool 
recoveryOK)
(errcode(ERRCODE_READ_ONLY_SQL_TRANSACTION),
 errmsg("cannot manipulate replication origins 
during recovery")));
 
+   if (wal_level < WAL_LEVEL_REPLICA)
+   ereport(ERROR,
+   errmsg("creating replication origins requires 
wal_level \"replica\" or \"logical\""));
+
 }

@@ -625,6 +625,9 @@ standard_ProcessUtility(PlannedStmt *pstmt,
break;
 
case TRANS_STMT_PREPARE:
+   if (wal_level == WAL_LEVEL_NONE)
+   ereport(ERROR,
+   
errmsg("cannot execute PREPARE TRANSACTION when WAL logging is disabled"));

What happens without these checks?



Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-18 Thread Laurenz Albe

On Thu, 2020-11-19 at 05:24 +, osumi.takami...@fujitsu.com wrote:
> > > > ereport(WARNING,
> > > >  (errmsg("WAL was generated with wal_level=minimal, data may
> > > > be missing"),
> > > >  errhint("This happens if you temporarily set
> > > > wal_level=minimal without taking a new base backup.")));
> > > > There's definitely a question about if a WARNING there is really
> > > > sufficient or not, considering that you could end up with 'logged'
> > > > tables on the replica that are missing data, but I'm not sure that
> > > > inventing a new, independent, mechanism for checking WAL level
> > > > changes makes
> > > sense.
> >
> > I don't know why WARNING was chosen.  I think it should be FATAL,
> > resulting in the standby shutdown, disabling restarting it, and urging the 
> > user
> > to rebuild the standby.  (I guess that's overreaction because the user may
> > not perform operations that lack WAL while wal_level is minimal.)
> 
> Yeah, I agree that WARNING is not sufficient.

I missed that this is only a warning when I looked at it before.
Yes, it should be a fatal error.

I think that there should two patches: one that turns this warning into
a FATAL and should be backpatched.  If you change the test to

   ControlFile->wal_level <= WAL_LEVEL_MINIMAL

it will automatically work for your new feature too.

Then your new wal_level would be a second patch only for HEAD.

With that, the only remaining consideration with this patch is the danger
that enabling wal_level=none without taking a backup before can lead to
data loss.  But that is intended, so I think that an unmistakable warning
in the documentation would be good enough.

Yours,
Laurenz Albe

RE: Disable WAL logging to speed up data loading

2020-11-18 Thread osumi.takami...@fujitsu.com

Hello


On Thursday, Nov 19, 2020 12:45 PM Tsunakawa, Takayuki/綱川 貴之 

> > On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost 
> > wrote:
> > > Checking the WAL level certainly seems critical to anything that's
> > > reading the WAL.  We certainly do this already when running as a
> > > replica:
> > >
> > > ereport(WARNING,
> > > (errmsg("WAL was generated with wal_level=minimal, data may
> > > be missing"),
> > > errhint("This happens if you temporarily set
> > > wal_level=minimal without taking a new base backup.")));
> > >
> > > There's definitely a question about if a WARNING there is really
> > > sufficient or not, considering that you could end up with 'logged'
> > > tables on the replica that are missing data, but I'm not sure that
> > > inventing a new, independent, mechanism for checking WAL level
> > > changes makes
> > sense.
> I don't know why WARNING was chosen.  I think it should be FATAL,
> resulting in the standby shutdown, disabling restarting it, and urging the 
> user
> to rebuild the standby.  (I guess that's overreaction because the user may
> not perform operations that lack WAL while wal_level is minimal.)
Yeah, I agree that WARNING is not sufficient.


> > The second idea is incremental counter that indicates drop of
> > wal_level from replica to minimal (or like from minimal to none).
> > Its purpose is to compare the wal_level changes between snapshots.
> > When any monitoring tools detect any difference of the counter, we can
> > predict something happened immediately without checking WAL in
> > between.
> 
> Let's depict the situation.  I may be misunderstanding, so any correction
> would be much welcome.  Here, I call the new field
> wal_level_change_counter, which should be changed to a proper name.
> 
> 1. wal_level = replica.  Take a base backup and store it in
> $BACKUPDIR/20201119/.
>   wal_level_change_counter = 0
> 
> 2. Set wal_level = minimal or none, and restart the instance.  Perform some
> operations.
>   wal_level_change_counter = 1
> 
> 3. Set wal_level = replica, and restart the instance.
>   wal_level_change_counter = 1
> 
> 4. Some monitoring system compares the values of
> wal_level_change_counter in $BACKUPDIR/20201119/ and $PGDATA/, and
> notices the difference (0 and 1 respectively.)
> It warns the user that he/she should take a full backup because some WAL
> may be missing to recover the latest data from the last backup in
> $BACKUPDIR/20201119/. 
My understanding is completely same as the description Tsunakawa-San wrote 
above.

> But I think this is a separate patch, because the issue already applies to
> wal_level = minimal.
OK. The range of this patch's responsibility was obscure, but now I'm fine.
I understand that I don't need to incorporate those mechanisms in the new 
wal_level patch.


Best,
Takamichi Osumi

RE: Disable WAL logging to speed up data loading

2020-11-18 Thread tsunakawa.ta...@fujitsu.com

From: Osumi, Takamichi/大墨 昂道 
> there were some ideas to trace the change of wal_level,
> in other words, *stronger mechanism* to check wal_level.
> I agree with the idea to have a new monitoring item
> and would like to implement those kind of, or one of those ideas for the next
> patch.
> But I'm not sure where is the right place to write the information.
> Was the best place the control file ?

Yes, I think so (I find nowhere else.)


> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost 
> wrote:
> > Checking the WAL level certainly seems critical to anything that's reading 
> > the
> > WAL.  We certainly do this already when running as a
> > replica:
> >
> > ereport(WARNING,
> > (errmsg("WAL was generated with wal_level=minimal, data may be
> > missing"),
> > errhint("This happens if you temporarily set wal_level=minimal
> > without taking a new base backup.")));
> >
> > There's definitely a question about if a WARNING there is really sufficient 
> > or
> > not, considering that you could end up with 'logged'
> > tables on the replica that are missing data, but I'm not sure that 
> > inventing a
> > new, independent, mechanism for checking WAL level changes makes
> sense.
> >
> > While pgbackrest backups of a primary wouldn't be impacted, it does
> support
> > backing up from a replica (as do other backup systems, of
> > course) and if data that's supposed to be on the replica isn't there because
> > someone restarted PG with wal_level=minimal and then flipped it back to
> > replica and got the replica to move past that part of the WAL by turning 
> > off hot
> > standby, replaying, and then turning it back on, then the backup is going to
> > also be missing that data.
> >
> > Perhaps that's where we need to have a stronger mechanism though- that is,
> > if we hit the above WARNING, ever, set a flag somewhere that tools can check
> > and which we could also check and throw further warnings about.  In other
> > words, every time this replica is started, we could check for this flag and
> > throw the same warning above about how data may be missing, and we could
> > have pg_basebackup flat out refuse to back up from a replica where this flag
> > has been set (maybe with some kind of override mechanism...  maybe not).
> > After all, that's where the real issue here is, isn't it?
> The first idea is warning that means replica could miss some data.
> This is to notify some tools that taking a backup from replica is dangerous.
> This can be applied to a change from replica to minimal(or none) and useful
> because having the wal_level lowered to minimal does not mean
> unclean shutdown or unexpected stoppage but it means a replica might get
> corrupted.

I don't know why WARNING was chosen.  I think it should be FATAL, resulting in 
the standby shutdown, disabling restarting it, and urging the user to rebuild 
the standby.  (I guess that's overreaction because the user may not perform 
operations that lack WAL while wal_level is minimal.)


> On Tuesday, Nov 3, 2020 12:28 AM Robert Haas 
> wrote:
> > I'm not exactly sure what that would look like, but suppose we had a feature
> > where every time wal_level drops below replica, a counter gets incremented
> by
> > 1, and that counter is saved in the control file. Or maybe when wal_level
> drops
> > below minimal to none. Or maybe there are two counters. Anyway, the idea is
> > that if you have a snapshot of the cluster at one time and a snapshot at
> > another time, you can see whether anything scary has happened in the
> middle
> > without needing all of the WAL in between.
> The second idea is incremental counter that indicates drop of wal_level
> from replica to minimal (or like from minimal to none).
> Its purpose is to compare the wal_level changes between snapshots.
> When any monitoring tools detect any difference of the counter,
> we can predict something happened immediately without checking WAL in
> between.
> 
> > > This would also be something that should be exposed as monitoring
> > > points (which it could be if it's in pg_control). That is, I can
> > > imagine a *lot* of installations that would definitely want an alert
> > > to fire if the cluster has ever been started up in a wal_level=none or
> > > wal_level=minimal, at least up until the point where somebody has run
> > > a new full backup.
> >
> > I agree with the general idea that it'd be good for anything we do here to 
> > be
> > made available for monitoring tools to be able to see.
> The last one is notification of permanent damage from initial cluster 
> parameter,
> although this may be similar to the second one.
> This is an indication that shows some part of data are not recovered,
> if the cluster was initialized and loaded some data during wal_level=none.
> 
> Exposing the monitoring item for tools sounds reasonable for me of course.
> Was the control file the best place for the information to implement those 
> kind
> of ideas ?
> What I'm concerned about is that

RE: Disable WAL logging to speed up data loading

2020-11-18 Thread osumi.takami...@fujitsu.com

Hello


In the past discussion of wal_level=none,
there were some ideas to trace the change of wal_level,
in other words, *stronger mechanism* to check wal_level.
I agree with the idea to have a new monitoring item
and would like to implement those kind of, or one of those ideas for the next 
patch.
But I'm not sure where is the right place to write the information.
Was the best place the control file ?
On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost  wrote:
> Checking the WAL level certainly seems critical to anything that's reading the
> WAL.  We certainly do this already when running as a
> replica:
> 
> ereport(WARNING,
> (errmsg("WAL was generated with wal_level=minimal, data may be
> missing"),
> errhint("This happens if you temporarily set wal_level=minimal
> without taking a new base backup.")));
> 
> There's definitely a question about if a WARNING there is really sufficient or
> not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that inventing a
> new, independent, mechanism for checking WAL level changes makes sense.
> 
> While pgbackrest backups of a primary wouldn't be impacted, it does support
> backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there because
> someone restarted PG with wal_level=minimal and then flipped it back to
> replica and got the replica to move past that part of the WAL by turning off 
> hot
> standby, replaying, and then turning it back on, then the backup is going to
> also be missing that data.
> 
> Perhaps that's where we need to have a stronger mechanism though- that is,
> if we hit the above WARNING, ever, set a flag somewhere that tools can check
> and which we could also check and throw further warnings about.  In other
> words, every time this replica is started, we could check for this flag and
> throw the same warning above about how data may be missing, and we could
> have pg_basebackup flat out refuse to back up from a replica where this flag
> has been set (maybe with some kind of override mechanism...  maybe not).
> After all, that's where the real issue here is, isn't it?
The first idea is warning that means replica could miss some data.
This is to notify some tools that taking a backup from replica is dangerous.
This can be applied to a change from replica to minimal(or none) and useful
because having the wal_level lowered to minimal does not mean
unclean shutdown or unexpected stoppage but it means a replica might get 
corrupted.

On Tuesday, Nov 3, 2020 12:28 AM Robert Haas  wrote:
> I'm not exactly sure what that would look like, but suppose we had a feature
> where every time wal_level drops below replica, a counter gets incremented by
> 1, and that counter is saved in the control file. Or maybe when wal_level 
> drops
> below minimal to none. Or maybe there are two counters. Anyway, the idea is
> that if you have a snapshot of the cluster at one time and a snapshot at
> another time, you can see whether anything scary has happened in the middle
> without needing all of the WAL in between.
The second idea is incremental counter that indicates drop of wal_level
from replica to minimal (or like from minimal to none).
Its purpose is to compare the wal_level changes between snapshots.
When any monitoring tools detect any difference of the counter,
we can predict something happened immediately without checking WAL in between.

> > This would also be something that should be exposed as monitoring
> > points (which it could be if it's in pg_control). That is, I can
> > imagine a *lot* of installations that would definitely want an alert
> > to fire if the cluster has ever been started up in a wal_level=none or
> > wal_level=minimal, at least up until the point where somebody has run
> > a new full backup.
> 
> I agree with the general idea that it'd be good for anything we do here to be
> made available for monitoring tools to be able to see.
The last one is notification of permanent damage from initial cluster parameter,
although this may be similar to the second one.
This is an indication that shows some part of data are not recovered,
if the cluster was initialized and loaded some data during wal_level=none.

Exposing the monitoring item for tools sounds reasonable for me of course.
Was the control file the best place for the information to implement those kind 
of ideas ?
What I'm concerned about is that those three items are not necessarily 
meaningful for all users.
Also, it seems better the size of the control file should be saved
because the size is limited to less than 512 bytes for atomic writes, which is 
written in a comment.
Here, I didn't say that I'll take all of those ideas into the patch
but this perspective doesn't change whenever we change the content of control 
file, I would say.
Another aspect unique to the last idea is that
the value will be no use after taking a new full backup.

RE: Disable WAL logging to speed up data loading

2020-11-16 Thread tsunakawa.ta...@fujitsu.com

# It'd be helpful if you could send mails in text format, not HTML.

From: David G. Johnston 
> For this case the fundamental feature that would seem to be required is an 
> ability for a transaction commit to return only after the system has ensured 
> that all of the new pages added to the relation during the scope of the 
> transaction have made it to disk.  Something like:
>
> BEGIN UNLOGGED TRANSACTION FOR table1, table2;
> -- locking probably allows reads, definitely disallows concurrent writes, to 
> the named tables
> -- Disallow updates and deletes, do not use dead tuple space, for the tables 
> named.  Should be able to do normal stuff for other tables?
> -- Always create new pages
> COPY TO table1;
> COPY TO table2;
> COMMIT; -- wait here until data files for table1 and table2 are completely 
> written and the transaction alive flag is committed to the WAL.
>
> I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL 
> TABLES" and you'd get the initial database population optimization capability.
>
> If the commit doesn't complete all of the newly created pages are junk.  
> Otherwise, you have a crash-recoverable state for those tables as regards 
> those specific pages.

As Steven-san said, I don't want to go this complicated direction.  Plus, 
putting my feet in the user's shoes, I want to try to avoid introducing a new 
SQL syntax for this kind of performance boost, which requires applications and 
maintenance scripts and testing.


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2020-11-16 Thread tsunakawa.ta...@fujitsu.com

From: Robert Haas 
> I'm also concerned about the way that this proposed feature interacts with
> incremental backup capabilities that already exist in tools like pgBackRest,
> EDB's BART, pg_probackup, and future things we might want to introduce into
> core, along the lines of what I have previously proposed. Now, I think
> pgBackRest uses only timestamps and checksums, so it probably doesn't care,
> but some of the other solutions rely on WAL-scanning to gather a list of
> changed blocks. I guess there's no reason that they can't notice the wal_level
> being changed and do the right thing; they should probably have that kind of
> capability already. Still, it strikes me that it might be useful if we had a 
> stronger
> mechanism.

Having a quick look, those backup tools seem to require setting wal_level to 
replica or higher.  That's no wonder, because recovering the database needs WAL 
for non-relation resources such as pg_control and relation map.  So, I think 
wal_level = none won't introduce new issues (compared to wal_level = minimal, 
which also can lack WAL records for some data updates.)


> By the way, another problem here is that some AMs - e.g. GiST, IIRC - use LSNs
> to figure out whether a block has changed. For temporary and unlogged tables,
> we use "fake" LSNs that are generated using a counter, but that approach only
> works because such relations are never really WAL-logged. Mixing fake LSNs
> and real LSNs will break stuff, and not bumping the LSN when the page
> changes probably will, too.

Unlogged GiST indexes use fake LSNs that are instance-wide.  Unlogged temporary 
GiST indexes use backend-local sequence values.  Other unlogged and temporary 
relations don't set LSNs on pages.  So, I think it's enough to call 
GetFakeLSNForUnloggedRel() when wal_level = none as well.


Regards
Takayuki Tsunakawa

RE: Disable WAL logging to speed up data loading

2020-11-15 Thread osumi.takami...@fujitsu.com

Hi

On Thursday, October 29, 2020 11:42 AM Fujii Masao 
 wrote:
> BTW, with the patch, I observed that PREPARE TRANSACTION and COMMIT
> PREPARED caused assertion failure in my env, as I pointed upthread.
> 
> How does the patch handle other feature depending on the existence of WAL,
> e.g., pg_logical_emit_message()?

I've updated the first patch, based on comments
from both Tsunakwa-San and Fujii-San mainly.
I'll take other comments in the next patch.
(Note that this patch passes make check-world but
doesn't pass installcheck-world yet.)

The assertion failure Fujii-San reported in the past has been protected by
adding a check to detect whether PREPARE TRANSACTION is issued
when wal_level=none or not in the v02.
Meanwhile, I didn't change the code for COMMIT PREPARED
because prohibiting usage of PREPARE TRANSACTION
means user cannot use COMMIT PREPARED as well.
I learnt this way of modification from the case that when 
max_prepared_transaction
is set to zero, PREPARE TRANSACTION cannot be used because of wa_level check,
while issuing COMMIT TRANSACTION itself doesn't claim wal_level.
Just prohibiting PREPARE TRANSACTION seemed OK for me.

As for pg_resetwal (and other commands like pg_rewind),
I didn't make any change. pg_resetwal is used to make corrupted server
start up again as a *last* resort by guessing the content of the control file,
while wal_level=none is designed never to make
the server start again when any unexpected crash is detected.
Here, the documentation in the patch about wal_level=none 
requests user to recreate the whole cluster again
when such a corruption of the cluster happens. Thus, it's not natural to think
that user of wal_level=none will continue to use the coruppted cluster forcibly
by applying pg_resetwal. This is the reason I made no fix of pg_resetwal.
In terms of other commands, I don't think there was a need to fix.

By the way, I'm not sure why functions related to replication origin
(e.g. pg_replication_origin_create) doesn't have a check of
wal_level. It can be used when wal_level < logical even though
their purposes are to safely keep track of replication progress.
Did it mean something special to execute such function when wal_level < logical 
?
In the patch, I added an error check on this point.

Another similar case is that
there's no test to check wal_level for pg_logical_emit_message.
In my humble opinion, pg_logical_emit_message should not be used
when wal_level <= minimal. I didn't think that
without setting up a slot for logical replication
(with the restriction that user cannot create a slot by 
pg_create_physical_replication_slot when wal_level < replica),
plugins will utilize message from pg_logical_emit_message.
Or, if there is a slot that has been created before
restarting the server to turn on wal_level=none,
I cannot get the worth to execute the function
because other kinds of WAL are not issued.
Thus, I prohibited the usage of pg_logical_emit_message this time.
Did it make sense ?

Best,
Takamichi Osumi

disable_WAL_logging_v02.patch
Description: disable_WAL_logging_v02.patch

RE: Disable WAL logging to speed up data loading

2020-11-11 Thread tsunakawa.ta...@fujitsu.com

From: Stephen Frost 
> I'm not sure that I see this as really being much of an issue.  Perhaps there 
> are
> some things we can do, as I mentioned before, to make it easier for users to
> have tables be created as unlogged from the start, or to be able to ALTER
> TABLE a bunch of tables at once (using all in tablespace, or maybe having an
> ALTER TABLE on a partitioned table cascade to the partitions), but overall the
> risk here seems very low- clearly whatever processing is running to load the
> data into a particular table knows what the table is and adding an ALTER
> TABLE into it would be trivial.

Thank you for your opinion.  I'm glad to see great people like Robert, Magnus 
and you have given comments (honestly, I didn't expect this trivial feature 
wouldn't get attention.)  I'll read those mails from the oldest including this, 
and reply to them if needed.


> > Likewise, can't we do ALTER TABLE SET UNLOGGED/LOGGED against a
> partitioned table?  Currently, the statement succeeds but none of the
> partitioned table nor its partitions is set unlogged (pg_class.relpersistence
> remains 'p').  Is this intended?  If it's a bug, I'm willing to fix it so 
> that it reports
> an eror.  Of course, it's good to make all partitions unlogged at once.
> 
> I agree that this doesn't seem quite right and considering the way other
> commands work like CREATE INDEX, I would think that doing such an ALTER
> TABLE would recurse to the individual partitions (skipping over any which are
> already set to the persistance desired..).

OK, I'll try to propagate the alteration to partitions.  (I wish the fix will 
be easy, but I'm afraid some devil is lurking...)  I'd appreciate it if someone 
could kindly point out the devil if he/she knows the past history.


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-11 Thread Stephen Frost

Greetings,

* tsunakawa.ta...@fujitsu.com (tsunakawa.ta...@fujitsu.com) wrote:
> * ALTER TABLE SET UNLOGGED/LOGGED without data copy
> Good:
> - Does not require server restart (if this feature can be used in all 
> wal_level settings).
> 
> Bad:
> - The user has to maintain and modify some scripts to use ALTER TABLE when 
> adding or removing the tables/partitions to load data into.  For example, if 
> the data loading job specifies a partitioned table, he may forget to add 
> ALTER TABLE for new partitions, resulting in slow data loading.

I'm not sure that I see this as really being much of an issue.  Perhaps
there are some things we can do, as I mentioned before, to make it
easier for users to have tables be created as unlogged from the start,
or to be able to ALTER TABLE a bunch of tables at once (using all in
tablespace, or maybe having an ALTER TABLE on a partitioned table
cascade to the partitions), but overall the risk here seems very low-
clearly whatever processing is running to load the data into a
particular table knows what the table is and adding an ALTER TABLE into
it would be trivial.

Specifically, for a partitioned table, I would think the load would go
something like:

CREATE UNLOGGED TABLE ...
load all of the data
ALTER TABLE ... SET LOGGED
ALTER TABLE ... ATTACH PARTITION

If the data load could all be done in a single transaction then you
wouldn't even need to create the table as UNLOGGED or issue the SET
LOGGED, with wal_level=minimal, you just need to create the table in the
same transaction that you do the data load in.

> * wal_level = none
> Good:
> - Easy to use.  The user does not have to be aware of what tables are loaded. 
>  This can come in handy when migrating from an older version or another DBMS, 
> building test databases, and consolidating databases.

A GUC that allowed users to set a default for newly created tables to be
unlogged would also address this.

> Bad:
> - Requires server restart.

Introducing yet another wal_level strikes me as a very large step, one
that the arguments presented here for why it'd be worth it don't come
anywhere near justifying that step.

> I expect both features will be able to meet our customer's needs.  The worst 
> scenario (I don't want to imagine!) is that neither feature fails to be 
> committed.  So, let us continue both features.  I'll join Horiguchi-san's new 
> thread, and please help us here too.  (I'll catch up with the recent 
> discussion in this thread and reply.)

While you're certainly welcome to spend your time where you wish to, if,
as you say, making these changes to how tables can be switched from
unlogged to logged with wal_level=minimal meets this use-case then that
strikes me as definitely the right approach and removes any
justification for adding another wal_level.

> > Couldn't we have something like the following?
> > 
> >  ALTER TABLE table1, table2, table3 SET UNLOGGED;
> > 
> > That is, multiple target object specification in ALTER TABLE sttatement.
> 
> Likewise, can't we do ALTER TABLE SET UNLOGGED/LOGGED against a partitioned 
> table?  Currently, the statement succeeds but none of the partitioned table 
> nor its partitions is set unlogged (pg_class.relpersistence remains 'p').  Is 
> this intended?  If it's a bug, I'm willing to fix it so that it reports an 
> eror.  Of course, it's good to make all partitions unlogged at once.

I agree that this doesn't seem quite right and considering the way other
commands work like CREATE INDEX, I would think that doing such an ALTER
TABLE would recurse to the individual partitions (skipping over any
which are already set to the persistance desired..).

Thanks,

Stephen

signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2020-11-11 Thread tsunakawa.ta...@fujitsu.com

From: Kyotaro Horiguchi 
> Thanks!  Since this feature is different from the feature that is
> being proposed in this thhead, I started another thread for this
> feature.
> 
> https://www.postgresql.org/message-id/2020.173317.460890039962481
> 381.horikyota@gmail.com

Thank you, Horiguchi-san(^^)  I was just about to ask you to separate the 
thread, because I think both features are good:

* ALTER TABLE SET UNLOGGED/LOGGED without data copy
Good:
- Does not require server restart (if this feature can be used in all wal_level 
settings).

Bad:
- The user has to maintain and modify some scripts to use ALTER TABLE when 
adding or removing the tables/partitions to load data into.  For example, if 
the data loading job specifies a partitioned table, he may forget to add ALTER 
TABLE for new partitions, resulting in slow data loading.


* wal_level = none
Good:
- Easy to use.  The user does not have to be aware of what tables are loaded.  
This can come in handy when migrating from an older version or another DBMS, 
building test databases, and consolidating databases.

Bad:
- Requires server restart.


* Both features
Bad:
- Requires taking database backup.
- Requires rebuilding the standby.


Sadly, the new thread's title includes a spelling mistake of extra r -- 
"In-placre".  (I hope this won't prevent the search hit in the mail archive.)

I expect both features will be able to meet our customer's needs.  The worst 
scenario (I don't want to imagine!) is that neither feature fails to be 
committed.  So, let us continue both features.  I'll join Horiguchi-san's new 
thread, and please help us here too.  (I'll catch up with the recent discussion 
in this thread and reply.)


> Couldn't we have something like the following?
> 
>  ALTER TABLE table1, table2, table3 SET UNLOGGED;
> 
> That is, multiple target object specification in ALTER TABLE sttatement.

Likewise, can't we do ALTER TABLE SET UNLOGGED/LOGGED against a partitioned 
table?  Currently, the statement succeeds but none of the partitioned table nor 
its partitions is set unlogged (pg_class.relpersistence remains 'p').  Is this 
intended?  If it's a bug, I'm willing to fix it so that it reports an eror.  Of 
course, it's good to make all partitions unlogged at once.


Regards
Takayuki Tsunakawa

Re: Disable WAL logging to speed up data loading

2020-11-11 Thread Kyotaro Horiguchi

At Tue, 10 Nov 2020 09:33:12 -0500, Stephen Frost  wrote in 
> Greetings,
> 
> * Kyotaro Horiguchi (horikyota@gmail.com) wrote:
> > For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> > ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> > trials of several ways, I drifted to the following way after poking
> > several ways.
> > 
> > 1. Flip BM_PERMANENT of active buffers
> > 2. adding/removing init fork
> > 3. sync files,
> > 4. Flip pg_class.relpersistence.
> > 
> > It always skips table copy in the SET UNLOGGED case, and only when
> > wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> > working by some brief testing by hand.
> 
> Somehow missed that this patch more-or-less does what I was referring to
> down-thread, but I did want to mention that it looks like it's missing a
> necessary FlushRelationBuffers() call before the sync, otherwise there
> could be dirty buffers for the relation that's being set to LOGGED (with
> wal_level=minimal), which wouldn't be good.  See the comments above
> smgrimmedsync().
>
> > Of course, I haven't performed intensive test on it.
> 
> Reading through the thread, it didn't seem very clear, but we should
> definitely make sure that it does the right thing on replicas when going
> between unlogged and logged (and between logged and unlogged too), of
> course.
> 
> Thanks,

Thanks!  Since this feature is different from the feature that is
being proposed in this thhead, I started another thread for this
feature.

https://www.postgresql.org/message-id/2020.173317.460890039962481381.horikyota@gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Disable WAL logging to speed up data loading

2020-11-10 Thread osumi.takami...@fujitsu.com

Horiguchi-San


On Tuesday, November 10, 2020 10:00 AM  Kyotaro Horiguchi 
 wrote:
> FWIW, the following is that, I think it works not only when wal_level=minimal
> for SET UNLOGGED, and only works when minimal for SET LOGGED.
> 
> https://www.postgresql.org/message-id/20201002.100621.166891875652013
> 6893.horikyota@gmail.com
> 
> | For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> | ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After
> some
> | trials of several ways, I drifted to the following way after poking
> | several ways.
> |
> | 1. Flip BM_PERMANENT of active buffers 2. adding/removing init fork 3.
> | sync files, 4. Flip pg_class.relpersistence.
> |
> | It always skips table copy in the SET UNLOGGED case, and only when
> | wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> | working by some brief testing by hand.
> I agree to this aspect of the in-place flipping of UNLOGGED.
> 
> Couldn't we have something like the following?
>  ALTER TABLE table1, table2, table3 SET UNLOGGED;
> 
> That is, multiple target object specification in ALTER TABLE sttatement.
I *really* appreciate the 1st patch.
Did you register this patch with the current or next commitfest ?
I cannot find it. We should avoid that this patch misses people's attention.
When I'm available, I'd like to join the review for the patch as well.

Also, I supposed that as long as
something impossible to implement for wal_level=none is not founded,
we could develop both patches in parallel, because both have good points
described in the previous e-mails of this thread.

Best,
Takamichi Osumi

Re: Disable WAL logging to speed up data loading

2020-11-10 Thread Stephen Frost

Greetings,

* Kyotaro Horiguchi (horikyota@gmail.com) wrote:
> For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> trials of several ways, I drifted to the following way after poking
> several ways.
> 
> 1. Flip BM_PERMANENT of active buffers
> 2. adding/removing init fork
> 3. sync files,
> 4. Flip pg_class.relpersistence.
> 
> It always skips table copy in the SET UNLOGGED case, and only when
> wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> working by some brief testing by hand.

Somehow missed that this patch more-or-less does what I was referring to
down-thread, but I did want to mention that it looks like it's missing a
necessary FlushRelationBuffers() call before the sync, otherwise there
could be dirty buffers for the relation that's being set to LOGGED (with
wal_level=minimal), which wouldn't be good.  See the comments above
smgrimmedsync().

> Of course, I haven't performed intensive test on it.

Reading through the thread, it didn't seem very clear, but we should
definitely make sure that it does the right thing on replicas when going
between unlogged and logged (and between logged and unlogged too), of
course.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2020-11-10 Thread Stephen Frost

Greetings,

* Kyotaro Horiguchi (horikyota@gmail.com) wrote:
> At Mon, 9 Nov 2020 10:18:08 -0500, Stephen Frost  wrote 
> in 
> > Greetings,
> > 
> > * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> > > When I consider the use case is the system of data warehouse
> > > as described upthread, the size of each table can be large.
> > > Thus, changing the status from unlogged to logged (recoverable)
> > > takes much time under the current circumstances, which was discussed 
> > > before.
> > 
> > Ok- so the issue is that, today, we dump all of the table into the WAL
> > when we go from unlogged to logged, but as I outlined previously,
> > perhaps that's because we're missing a trick there when
> > wal_level=minimal.  If wal_level=minimal, then it would seem like we
> > could lock the table, then sync it and then mark is as logged, which is
> 
> FWIW, the following is that, I think it works not only when
> wal_level=minimal for SET UNLOGGED, and only works when minimal for
> SET LOGGED.
> 
> https://www.postgresql.org/message-id/20201002.100621.1668918756520136893.horikyota@gmail.com

Oh, nice!  Very cool that this was already being worked on, apologies
for not seeing that thread.

> | For fuel(?) of the discussion, I tried a very-quick PoC for in-place
> | ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
> | trials of several ways, I drifted to the following way after poking
> | several ways.
> | 
> | 1. Flip BM_PERMANENT of active buffers
> | 2. adding/removing init fork
> | 3. sync files,
> | 4. Flip pg_class.relpersistence.
> | 
> | It always skips table copy in the SET UNLOGGED case, and only when
> | wal_level=minimal in the SET LOGGED case.  Crash recovery seems
> | working by some brief testing by hand.

Looking over that patch- isn't it missing a necessary FlushBuffers(), to
make sure that any dirty buffers have been written out before the sync
is being done?

Will try to find that thread and review/comment on it there too, if I
can, buf figured I'd point it out here while I have your attention too.

:)

> > more-or-less what you're asking to have be effectively done with the
> > proposed wal_level=none, but this would be an optimization for all
> > existing users of wal_level=minimal who have unlogged tables that they
> > want to change to logged, and this works on a per-table basis instead,
> > which seems like a better approach than a cluster-wide setting.
> > 
> > > By having the limited window of time,
> > > during wal_level=none, I'd like to make wal_level=none work to
> > > localize and minimize the burden to guarantee all commands are
> > > repeatable. To achieve this, after switching wal_level from none to 
> > > higher ones,
> > > the patch must ensure crash recovery, though.
> > 
> > Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
> > to marked a bunch of unlogged tables over to being logged would be good
> > to add too.
> 
> I agree to this aspect of the in-place flipping of UNLOGGED.

Yeah, seems like it'd be useful and probably not hard to implement.

> > > Sorry that my current patch doesn't complete this aspect fully at present
> > > but, may I have your opinion about this ?
> > 
> > Presently, my feeling is that we could address this use-case without
> > having to introduce a new cluster-wide WAL level, and that's the
> > direction I'd want to see this going.  Perhaps I'm missing something
> > about why the approach I've set forth above wouldn't work, and
> > wal_level=none would, but I've not seen it yet.
> 
> Couldn't we have something like the following?
> 
>  ALTER TABLE table1, table2, table3 SET UNLOGGED;
> 
> That is, multiple target object specification in ALTER TABLE sttatement.

That requires knowing all the tables ahead of time though, or writing
some pl/pgsql to build up that list.  Not hard in general, but having
the 'all in tablespace' would make it a bit easier if you wanted to do
this for effectively all tables in a given database.  Having ALTER TABLE
support being given an explicit list of tables does seem like it could
be useful though, so certainly not against that if someone wanted to
implement it.

Thanks,

Stephen


signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2020-11-09 Thread Kyotaro Horiguchi

At Mon, 9 Nov 2020 10:18:08 -0500, Stephen Frost  wrote in 
> Greetings,
> 
> * osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> > When I consider the use case is the system of data warehouse
> > as described upthread, the size of each table can be large.
> > Thus, changing the status from unlogged to logged (recoverable)
> > takes much time under the current circumstances, which was discussed before.
> 
> Ok- so the issue is that, today, we dump all of the table into the WAL
> when we go from unlogged to logged, but as I outlined previously,
> perhaps that's because we're missing a trick there when
> wal_level=minimal.  If wal_level=minimal, then it would seem like we
> could lock the table, then sync it and then mark is as logged, which is

FWIW, the following is that, I think it works not only when
wal_level=minimal for SET UNLOGGED, and only works when minimal for
SET LOGGED.

https://www.postgresql.org/message-id/20201002.100621.1668918756520136893.horikyota@gmail.com

| For fuel(?) of the discussion, I tried a very-quick PoC for in-place
| ALTER TABLE SET LOGGED/UNLOGGED and resulted as attached. After some
| trials of several ways, I drifted to the following way after poking
| several ways.
| 
| 1. Flip BM_PERMANENT of active buffers
| 2. adding/removing init fork
| 3. sync files,
| 4. Flip pg_class.relpersistence.
| 
| It always skips table copy in the SET UNLOGGED case, and only when
| wal_level=minimal in the SET LOGGED case.  Crash recovery seems
| working by some brief testing by hand.

> more-or-less what you're asking to have be effectively done with the
> proposed wal_level=none, but this would be an optimization for all
> existing users of wal_level=minimal who have unlogged tables that they
> want to change to logged, and this works on a per-table basis instead,
> which seems like a better approach than a cluster-wide setting.
> 
> > By having the limited window of time,
> > during wal_level=none, I'd like to make wal_level=none work to
> > localize and minimize the burden to guarantee all commands are
> > repeatable. To achieve this, after switching wal_level from none to higher 
> > ones,
> > the patch must ensure crash recovery, though.
> 
> Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
> to marked a bunch of unlogged tables over to being logged would be good
> to add too.

I agree to this aspect of the in-place flipping of UNLOGGED.

> > Sorry that my current patch doesn't complete this aspect fully at present
> > but, may I have your opinion about this ?
> 
> Presently, my feeling is that we could address this use-case without
> having to introduce a new cluster-wide WAL level, and that's the
> direction I'd want to see this going.  Perhaps I'm missing something
> about why the approach I've set forth above wouldn't work, and
> wal_level=none would, but I've not seen it yet.

Couldn't we have something like the following?

 ALTER TABLE table1, table2, table3 SET UNLOGGED;

That is, multiple target object specification in ALTER TABLE sttatement.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Disable WAL logging to speed up data loading

2020-11-09 Thread Stephen Frost

Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Mon, Nov 9, 2020 at 10:36 AM Stephen Frost  wrote:
> > * David G. Johnston (david.g.johns...@gmail.com) wrote:
> > > If the commit doesn't complete all of the newly created pages are junk.
> > > Otherwise, you have a crash-recoverable state for those tables as regards
> > > those specific pages.
> >
> > How would we track that and know which pages are junk?
> 
> Every one of those pages would have a single dead transaction id contained
> within it.  If there is bookkeeping that needs to happen that could be wal
> logged - the goal here being not to avoid all wal but to avoid data wal.  I
> don't know enough about the internals here to be more specific.

Yeah, not sure how well that would end up working..  Maybe there's a way
to get there, but definitely a fair bit of complicated work.

> > > Conceptually, we need an ability to perform a partial CHECKPOINT that
> > names
> > > specific tables, and make sure the crash-recovery works for those tables
> > > while figuring out what amount of effort to expend on informing the dba
> > and
> > > alerting/preventing features that require wal from using those tables.
> >
> > Yeah, seems pretty complicated.
> >
> > Did you see an issue with the basic idea I proposed earlier, whereby an
> > unlogged table could become 'logged', while we are at wal_level=minimal,
> > by essentially checkpointing it (locking it, forcing out any buffers we
> > have associated with it, and then fsync'ing it- not sure how much of
> > that is already done in the unlogged->logged process but I would guess
> > most of it) while not actually writing it into the WAL?
> 
> That is basically half of what is described above - the part at commit when
> the relation is persisted to disk.

Right, think I agree with you there.

> What your earlier description seems to
> be missing is the part about temporarily making a logged relation
> unlogged.

While I get that there may be use-cases for that, it seems that the
use-case being described here doesn't require it- there just needs to be
a way to take the unlogged table and make it into a 'logged' table
without having to actually write it all into the WAL when wal_level is
minimal.

There may be another option to addressing the use-case as you're looking
at it though- by using partitioning.  That is, there's a partitioned
table which has logged tables in it, a new unlogged table is created and
added to the partitioned table, it's then loaded, and then it's
converted to being logged (or maybe it's loaded first and then added to
the partitioned table, either way).  This would also have the advantage
that you'd be able to continue making changes to the partitioned table
as you normally do, in general.

> I envision that as being part of a transaction as opposed to a
> permanent attribute of the table.  I envision a storage parameter that
> allows individual relations to be considered as having wal_level='minimal'
> even if the system as a whole has, e.g., wal_level='replication'.  Only
> those could be forced into this temporarily unlogged mode.

I mean ... we have a way of saying that individual relations have a
lower WAL level than others- they're UNLOGGED.  I'm still seeing this as
an opportunity to build on that and improve that, rather than invent
something new.

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2020-11-09 Thread David G. Johnston

On Mon, Nov 9, 2020 at 10:36 AM Stephen Frost  wrote:

> * David G. Johnston (david.g.johns...@gmail.com) wrote:
>
> > If the commit doesn't complete all of the newly created pages are junk.
> > Otherwise, you have a crash-recoverable state for those tables as regards
> > those specific pages.
>
> How would we track that and know which pages are junk?
>

Every one of those pages would have a single dead transaction id contained
within it.  If there is bookkeeping that needs to happen that could be wal
logged - the goal here being not to avoid all wal but to avoid data wal.  I
don't know enough about the internals here to be more specific.

> > Conceptually, we need an ability to perform a partial CHECKPOINT that
> names
> > specific tables, and make sure the crash-recovery works for those tables
> > while figuring out what amount of effort to expend on informing the dba
> and
> > alerting/preventing features that require wal from using those tables.
>
> Yeah, seems pretty complicated.
>
> Did you see an issue with the basic idea I proposed earlier, whereby an
> unlogged table could become 'logged', while we are at wal_level=minimal,
> by essentially checkpointing it (locking it, forcing out any buffers we
> have associated with it, and then fsync'ing it- not sure how much of
> that is already done in the unlogged->logged process but I would guess
> most of it) while not actually writing it into the WAL?
>

That is basically half of what is described above - the part at commit when
the relation is persisted to disk.  What your earlier description seems to
be missing is the part about temporarily making a logged relation
unlogged.  I envision that as being part of a transaction as opposed to a
permanent attribute of the table.  I envision a storage parameter that
allows individual relations to be considered as having wal_level='minimal'
even if the system as a whole has, e.g., wal_level='replication'.  Only
those could be forced into this temporarily unlogged mode.

One part I hadn't given thought to is indexes and how those would interact
with this plan. Mostly due to lack of internals knowledge.

David J.

Re: Disable WAL logging to speed up data loading

2020-11-09 Thread Stephen Frost

Greetings,

* David G. Johnston (david.g.johns...@gmail.com) wrote:
> On Mon, Nov 9, 2020 at 8:18 AM Stephen Frost  wrote:
> > Presently, my feeling is that we could address this use-case without
> > having to introduce a new cluster-wide WAL level, and that's the
> > direction I'd want to see this going.  Perhaps I'm missing something
> > about why the approach I've set forth above wouldn't work, and
> > wal_level=none would, but I've not seen it yet.
>
> +1
> 
> We are trying to address a performance optimization for an insert-only
> scenario on a limited set of tables by placing the entire cluster in a
> dangerous state.  The "copy table unlogged" solution is definitely closer
> to what we want - this is
> demonstrably worse.

Yeah, agreed.

> For this case the fundamental feature that would seem to be required is an
> ability for a transaction commit to return only after the system has
> ensured that all of the new pages added to the relation during the scope of
> the transaction have made it to disk.  Something like:
> 
> BEGIN UNLOGGED TRANSACTION FOR table1, table2;
> -- locking probably allows reads, definitely disallows concurrent writes,
> to the named tables
> -- Disallow updates and deletes, do not use dead tuple space, for the
> tables named.  Should be able to do normal stuff for other tables?
> -- Always create new pages
> COPY TO table1;
> COPY TO table2;
> COMMIT; -- wait here until data files for table1 and table2 are completely
> written and the transaction alive flag is committed to the WAL.

That's certainly an interesting idea, but seems like a much larger step
than just making some improvements to how UNLOGGED tables work today,
and then perhaps some helper options to make it easier to create
UNLOGGED tables and change them from unlogged to logged when the
wal_level is set to 'minimal'.

Also- I don't think this would end up working for normally logged
relations at a wal_level higher than 'minimal', since if we don't
log those pages then they won't get to replicas.

> I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL
> TABLES" and you'd get the initial database population optimization
> capability.

Or just 'BEGIN UNLOGGED TRANSACTION'..  I wonder if we'd have to run
around and lock all tables as you're suggesting above or if we could
just lock them as they get used..

> If the commit doesn't complete all of the newly created pages are junk.
> Otherwise, you have a crash-recoverable state for those tables as regards
> those specific pages.

How would we track that and know which pages are junk?

> Conceptually, we need an ability to perform a partial CHECKPOINT that names
> specific tables, and make sure the crash-recovery works for those tables
> while figuring out what amount of effort to expend on informing the dba and
> alerting/preventing features that require wal from using those tables.

Yeah, seems pretty complicated.

Did you see an issue with the basic idea I proposed earlier, whereby an
unlogged table could become 'logged', while we are at wal_level=minimal,
by essentially checkpointing it (locking it, forcing out any buffers we
have associated with it, and then fsync'ing it- not sure how much of
that is already done in the unlogged->logged process but I would guess
most of it) while not actually writing it into the WAL?

Thanks,

Stephen

signature.asc
Description: PGP signature

Re: Disable WAL logging to speed up data loading

2020-11-09 Thread David G. Johnston

On Mon, Nov 9, 2020 at 8:18 AM Stephen Frost  wrote:

> Presently, my feeling is that we could address this use-case without
>
having to introduce a new cluster-wide WAL level, and that's the
> direction I'd want to see this going.  Perhaps I'm missing something
> about why the approach I've set forth above wouldn't work, and
> wal_level=none would, but I've not seen it yet.
>
>
+1

We are trying to address a performance optimization for an insert-only
scenario on a limited set of tables by placing the entire cluster in a
dangerous state.  The "copy table unlogged" solution is definitely closer
to what we want - this is
demonstrably worse.

For this case the fundamental feature that would seem to be required is an
ability for a transaction commit to return only after the system has
ensured that all of the new pages added to the relation during the scope of
the transaction have made it to disk.  Something like:

BEGIN UNLOGGED TRANSACTION FOR table1, table2;
-- locking probably allows reads, definitely disallows concurrent writes,
to the named tables
-- Disallow updates and deletes, do not use dead tuple space, for the
tables named.  Should be able to do normal stuff for other tables?
-- Always create new pages
COPY TO table1;
COPY TO table2;
COMMIT; -- wait here until data files for table1 and table2 are completely
written and the transaction alive flag is committed to the WAL.

I suppose the above could be written "BEGIN UNLOGGED TRANSACTION FOR ALL
TABLES" and you'd get the initial database population optimization
capability.

If the commit doesn't complete all of the newly created pages are junk.
Otherwise, you have a crash-recoverable state for those tables as regards
those specific pages.

Conceptually, we need an ability to perform a partial CHECKPOINT that names
specific tables, and make sure the crash-recovery works for those tables
while figuring out what amount of effort to expend on informing the dba and
alerting/preventing features that require wal from using those tables.

David J.

Re: Disable WAL logging to speed up data loading

2020-11-09 Thread Stephen Frost

Greetings,

* osumi.takami...@fujitsu.com (osumi.takami...@fujitsu.com) wrote:
> On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost  wrote:
> > I'm not sure that wal_level=none is really the right way to address this
> > use-case.  We already have unlogged tables and that's pretty clean and
> > meets the "we want to load data fast without having to pay for WAL" use 
> > case.
> > The argument here seems to be that to take advantage of unlogged tables
> > requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> > TABLE' command instead of a 'CREATE TABLE' command.  That doesn't
> > seem like a huge leap, but we could make it easier by just adding a
> > 'table_log_default' or such GUC that could be set on the data loading role 
> > to
> > have all tables created by it be unlogged.
> I'm afraid to say that in the case to setup all tables as unlogged,
> the user are forced to be under tension to
> back up *all* commands from application, in preparation for unexpected crash.
> This is because whenever the server crashes,
> the unlogged tables are truncated and the DBA needs to
> input the processings after the last backup again without exception.
> I didn't think that this was easy and satisfied the user.

You'll need to explain how this is different from the proposed
'wal_level = none' option, since it sounded like that would be exactly
the same case..?

> In addition, as long as the tables are unlogged, the user cannot be released 
> from
> this condition or (requirement ?) to back up all commands or
> to guarantee that all commands are repeatable for the DBA.

They can change the table to be logged though, if they wish to.

> When I consider the use case is the system of data warehouse
> as described upthread, the size of each table can be large.
> Thus, changing the status from unlogged to logged (recoverable)
> takes much time under the current circumstances, which was discussed before.

Ok- so the issue is that, today, we dump all of the table into the WAL
when we go from unlogged to logged, but as I outlined previously,
perhaps that's because we're missing a trick there when
wal_level=minimal.  If wal_level=minimal, then it would seem like we
could lock the table, then sync it and then mark is as logged, which is
more-or-less what you're asking to have be effectively done with the
proposed wal_level=none, but this would be an optimization for all
existing users of wal_level=minimal who have unlogged tables that they
want to change to logged, and this works on a per-table basis instead,
which seems like a better approach than a cluster-wide setting.

> By having the limited window of time,
> during wal_level=none, I'd like to make wal_level=none work to
> localize and minimize the burden to guarantee all commands are
> repeatable. To achieve this, after switching wal_level from none to higher 
> ones,
> the patch must ensure crash recovery, though.

Perhaps a helper command could be added to ALTER TABLE ALL IN TABLESPACE
to marked a bunch of unlogged tables over to being logged would be good
to add too.

> Sorry that my current patch doesn't complete this aspect fully at present
> but, may I have your opinion about this ?

Presently, my feeling is that we could address this use-case without
having to introduce a new cluster-wide WAL level, and that's the
direction I'd want to see this going.  Perhaps I'm missing something
about why the approach I've set forth above wouldn't work, and
wal_level=none would, but I've not seen it yet.

Thanks,

Stephen

signature.asc
Description: PGP signature

RE: Disable WAL logging to speed up data loading

2020-11-08 Thread osumi.takami...@fujitsu.com

Hello, Stephen


On Tuesday, Nov 3, 2020 3:02 AM Stephen Frost  wrote:
> * Magnus Hagander (mag...@hagander.net) wrote:
> > On Mon, Nov 2, 2020 at 4:28 PM Robert Haas 
> wrote:
> > > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao
>  wrote:
> > > > Yes. What I meant was such a safe guard needs to be implemented.
> > > >
> > > > This may mean that if we want to recover the database from that
> > > > backup, we need to specify the recovery target so that the archive
> > > > recovery stops just before the WAL record indicating wal_level change.
> > >
> > > Yeah, I think we need these kinds of safeguards, for sure.
> > >
> > > I'm also concerned about the way that this proposed feature
> > > interacts with incremental backup capabilities that already exist in
> > > tools like pgBackRest, EDB's BART, pg_probackup, and future things
> > > we might want to introduce into core, along the lines of what I have
> > > previously proposed. Now, I think pgBackRest uses only timestamps
> > > and checksums, so it probably doesn't care, but some of the other
> > > solutions rely on WAL-scanning to gather a list of changed blocks. I
> > > guess there's no reason that they can't notice the wal_level being
> > > changed and do the right thing; they should probably have that kind
> > > of capability already. Still, it strikes me that it might be useful
> > > if we had a stronger mechanism.
> 
> Checking the WAL level certainly seems critical to anything that's reading the
> WAL.  We certainly do this already when running as a
> replica:
> 
> ereport(WARNING,
> (errmsg("WAL was generated with wal_level=minimal, data may be
> missing"),
> errhint("This happens if you temporarily set wal_level=minimal
> without taking a new base backup.")));
> 
> There's definitely a question about if a WARNING there is really sufficient or
> not, considering that you could end up with 'logged'
> tables on the replica that are missing data, but I'm not sure that inventing a
> new, independent, mechanism for checking WAL level changes makes sense.
> 
> While pgbackrest backups of a primary wouldn't be impacted, it does support
> backing up from a replica (as do other backup systems, of
> course) and if data that's supposed to be on the replica isn't there because
> someone restarted PG with wal_level=minimal and then flipped it back to
> replica and got the replica to move past that part of the WAL by turning off 
> hot
> standby, replaying, and then turning it back on, then the backup is going to
> also be missing that data.
> 
> Perhaps that's where we need to have a stronger mechanism though- that is,
> if we hit the above WARNING, ever, set a flag somewhere that tools can check
> and which we could also check and throw further warnings about.  In other
> words, every time this replica is started, we could check for this flag and
> throw the same warning above about how data may be missing, and we could
> have pg_basebackup flat out refuse to back up from a replica where this flag
> has been set (maybe with some kind of override mechanism...  maybe not).
> After all, that's where the real issue here is, isn't it?
> 
> > > I'm not exactly sure what that would look like, but suppose we had a
> > > feature where every time wal_level drops below replica, a counter
> > > gets incremented by 1, and that counter is saved in the control
> > > file. Or maybe when wal_level drops below minimal to none. Or maybe
> > > there are two counters. Anyway, the idea is that if you have a
> > > snapshot of the cluster at one time and a snapshot at another time,
> > > you can see whether anything scary has happened in the middle
> > > without needing all of the WAL in between.
> > >
> > > Maybe this is off-topic for this thread or not really needed, but
> > > I'm not sure. I don't think wal_level=none is a bad idea
> > > intrinsically, but I think it would be easy to implement it poorly
> > > and end up harming a lot of users. I have no problem with giving
> > > people a way to do dangerous things, but we should do our best to
> > > let people know how much danger they've incurred.
> 
> I'm not sure that wal_level=none is really the right way to address this
> use-case.  We already have unlogged tables and that's pretty clean and
> meets the "we want to load data fast without having to pay for WAL" use case.
> The argument here seems to be that to take advantage of unlogged tables
> requires the tools using PG to know how to issue a 'CREATE UNLOGGED
> TABLE' command instead of a 'CREATE TABLE' command.  That doesn't
> seem like a huge leap, but we could make it easier by just adding a
> 'table_log_default' or such GUC that could be set on the data loading role to
> have all tables created by it be unlogged.
I'm afraid to say that in the case to setup all tables as unlogged,
the user are forced to be under tension to
back up *all* commands from application, in preparation for unexpected crash.
This is because whenever the server crashes,
the unlogged

Re: Disable WAL logging to speed up data loading

2020-11-02 Thread Stephen Frost

Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Mon, Nov 2, 2020 at 4:28 PM Robert Haas  wrote:
> >
> > On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao  
> > wrote:
> > > Yes. What I meant was such a safe guard needs to be implemented.
> > >
> > > This may mean that if we want to recover the database from that backup,
> > > we need to specify the recovery target so that the archive recovery stops
> > > just before the WAL record indicating wal_level change.
> >
> > Yeah, I think we need these kinds of safeguards, for sure.
> >
> > I'm also concerned about the way that this proposed feature interacts
> > with incremental backup capabilities that already exist in tools like
> > pgBackRest, EDB's BART, pg_probackup, and future things we might want
> > to introduce into core, along the lines of what I have previously
> > proposed. Now, I think pgBackRest uses only timestamps and checksums,
> > so it probably doesn't care, but some of the other solutions rely on
> > WAL-scanning to gather a list of changed blocks. I guess there's no
> > reason that they can't notice the wal_level being changed and do the
> > right thing; they should probably have that kind of capability
> > already. Still, it strikes me that it might be useful if we had a
> > stronger mechanism.

Checking the WAL level certainly seems critical to anything that's
reading the WAL.  We certainly do this already when running as a
replica:

ereport(WARNING,
(errmsg("WAL was generated with wal_level=minimal, data may be 
missing"),
errhint("This happens if you temporarily set wal_level=minimal without 
taking a new base backup.")));

There's definitely a question about if a WARNING there is really
sufficient or not, considering that you could end up with 'logged'
tables on the replica that are missing data, but I'm not sure that
inventing a new, independent, mechanism for checking WAL level changes
makes sense.

While pgbackrest backups of a primary wouldn't be impacted, it does
support backing up from a replica (as do other backup systems, of
course) and if data that's supposed to be on the replica isn't there
because someone restarted PG with wal_level=minimal and then flipped it
back to replica and got the replica to move past that part of the WAL by
turning off hot standby, replaying, and then turning it back on, then
the backup is going to also be missing that data.

Perhaps that's where we need to have a stronger mechanism though- that
is, if we hit the above WARNING, ever, set a flag somewhere that tools
can check and which we could also check and throw further warnings
about.  In other words, every time this replica is started, we could
check for this flag and throw the same warning above about how data may
be missing, and we could have pg_basebackup flat out refuse to back up
from a replica where this flag has been set (maybe with some kind of
override mechanism...  maybe not).  After all, that's where the real
issue here is, isn't it?

> > I'm not exactly sure what that would look like, but suppose we had a
> > feature where every time wal_level drops below replica, a counter gets
> > incremented by 1, and that counter is saved in the control file. Or
> > maybe when wal_level drops below minimal to none. Or maybe there are
> > two counters. Anyway, the idea is that if you have a snapshot of the
> > cluster at one time and a snapshot at another time, you can see
> > whether anything scary has happened in the middle without needing all
> > of the WAL in between.
> >
> > Maybe this is off-topic for this thread or not really needed, but I'm
> > not sure. I don't think wal_level=none is a bad idea intrinsically,
> > but I think it would be easy to implement it poorly and end up harming
> > a lot of users. I have no problem with giving people a way to do
> > dangerous things, but we should do our best to let people know how
> > much danger they've incurred.

I'm not sure that wal_level=none is really the right way to address this
use-case.  We already have unlogged tables and that's pretty clean and
meets the "we want to load data fast without having to pay for WAL" use
case.  The argument here seems to be that to take advantage of unlogged
tables requires the tools using PG to know how to issue a 'CREATE
UNLOGGED TABLE' command instead of a 'CREATE TABLE' command.  That
doesn't seem like a huge leap, but we could make it easier by just
adding a 'table_log_default' or such GUC that could be set on the data
loading role to have all tables created by it be unlogged.

Looking back at the thread, it seems that perhaps the one other area
where this isn't ideal is when the user ultimately wants the data to be
*considered* logged even when it wasn't (ie: change from wal_level none,
or minimal, back to replica).  On a quick look, it seems that we're
missing a trick there for UNLOGGED -> LOGGED changes since, while I
don't think we'll WAL the table during that if wal_level is set to
minimal, I do think we'll still end up

Re: Disable WAL logging to speed up data loading

2020-11-02 Thread Magnus Hagander

On Mon, Nov 2, 2020 at 4:28 PM Robert Haas  wrote:
>
> On Thu, Oct 29, 2020 at 4:00 PM Fujii Masao  
> wrote:
> > Yes. What I meant was such a safe guard needs to be implemented.
> >
> > This may mean that if we want to recover the database from that backup,
> > we need to specify the recovery target so that the archive recovery stops
> > just before the WAL record indicating wal_level change.
>
> Yeah, I think we need these kinds of safeguards, for sure.
>
> I'm also concerned about the way that this proposed feature interacts
> with incremental backup capabilities that already exist in tools like
> pgBackRest, EDB's BART, pg_probackup, and future things we might want
> to introduce into core, along the lines of what I have previously
> proposed. Now, I think pgBackRest uses only timestamps and checksums,
> so it probably doesn't care, but some of the other solutions rely on
> WAL-scanning to gather a list of changed blocks. I guess there's no
> reason that they can't notice the wal_level being changed and do the
> right thing; they should probably have that kind of capability
> already. Still, it strikes me that it might be useful if we had a
> stronger mechanism.
>
> I'm not exactly sure what that would look like, but suppose we had a
> feature where every time wal_level drops below replica, a counter gets
> incremented by 1, and that counter is saved in the control file. Or
> maybe when wal_level drops below minimal to none. Or maybe there are
> two counters. Anyway, the idea is that if you have a snapshot of the
> cluster at one time and a snapshot at another time, you can see
> whether anything scary has happened in the middle without needing all
> of the WAL in between.
>
> Maybe this is off-topic for this thread or not really needed, but I'm
> not sure. I don't think wal_level=none is a bad idea intrinsically,
> but I think it would be easy to implement it poorly and end up harming
> a lot of users. I have no problem with giving people a way to do
> dangerous things, but we should do our best to let people know how
> much danger they've incurred.

I definitely think this is something that should be thought out and
included in a patch like this, so it's definitely on-topic for this
thread.

Having the ability to turn things off can certainly be very useful.
Having the risk of having done so without realizing the damage caused
is a *big* foot-gun, and we need to do our best to protect against it.

This is not entirely unlike the idea that we've discussed before of
having basically a "tainted" flag in pg_control if the system has ever
been started up in say fsync=off, just to make sure that we have a
record of it. This wouldn't be the same flag of course, but it's a
similar problem, where even temporarily starting the cluster up with a
certain set of flags can do permanent damage which is not necessarily
fixed by changing it back and restarting.

This would also be something that should be exposed as monitoring
points (which it could be if it's in pg_control). That is, I can
imagine a *lot* of installations that would definitely want an alert
to fire if the cluster has ever been started up in a wal_level=none or
wal_level=minimal, at least up until the point where somebody has run
a new full backup.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

1 2 >

1 - 100 of 131 matches

Mail list logo