Re: [GENERAL] Silent data loss in its pure form

Alex Ignatov Mon, 30 May 2016 15:30:00 -0700


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On 31.05.2016 0:12, Alex Ignatov wrote:


_____________________________

From: David G. Johnston <[email protected]>
Sent: Monday, May 30, 2016 23:44
Subject: Re: [GENERAL] Silent data loss in its pure form
To: Alex Ignatov <[email protected]>
Cc: <[email protected]>, Scott Marlowe <[email protected]>

On Mon, May 30, 2016 at 4:22 PM, Alex Ignatov <[email protected]> wrote:



    _____________________________
    From: Scott Marlowe <[email protected]>
    Sent: Monday, May 30, 2016 20:14
    Subject: Re: [GENERAL] Silent data loss in its pure form
    To: Alex Ignatov <[email protected]>
    Cc: <[email protected]>



    On Mon, May 30, 2016 at 10:57 AM, Alex Ignatov <[email protected]> wrote:
    > Following this bug report from Red Hat
    > https://bugzilla.redhat.com/show_bug.cgi?id=845233
    >
    > it raises a dangerous issue:
    >
    > If for any reason your data file is zeroed after a power loss (the
    > best-known issue on XFS in the past), then when you do
    > select count(*) from you_table you get zero if your table was in one
    > 1GB (default) file, or some other number != count(*) from you_table
    > before the power loss.
    > No errors, nothing suspicious in the logs. No checksum errors. Nothing.
    >
    > Silent data loss in its pure form.
    >
    > And thank all the gods if you notice it before the backups that still
    > contain good data are recycled.
    > Keep this in mind while checking your "backups" in any form (pg_dump or
    > the more dangerous and short-spoken PITR file backup).
    >
    > Your data is always in danger with the "zeroed data file is a normal
    > file" paradigm.

    That bug shows as having been fixed in 2012. Are there any modern,
    supported distros that would still have it? It sounds really bad btw.

    It is not about modern distros; it is about possible silent data
    loss on old distros. We have replication, we have some form of data
    checksumming, but we are powerless in the face of this XFS bug just
    because "a zeroed file is your good friend in Postgres".
    With the "zero file is a good file" paradigm and this XFS bug, PG
    as it is now is a "colossus with feet of clay". It can do many
    things, but it can't even tell us that we have trouble with our
    precious data.
    There is no need for prevention or some other AI magic when the
    zero doomsday has come.
    What we need now is an error report about a suspicious zeroed
    file, so that we know something went wrong and we have to do
    recovery.
    Today PG "power loss" recovery and this XFS bug poison our
    assurance that recovery went well. It went well even with a zeroed
    file. That is not healthy behavior. It is like walking through a
    minefield with eyes closed.
    I think it is a very dangerous view of data to have data files
    without any header in them and without any file checking, at least
    on PG start.
    With this known XFS bug it can lead to undetected and
    unavoidable loss of data.

For those not following -general, this is basically an extension of the following thread.

"Deleting a table file does not raise an error when the table is touched afterwards, why?"


https://www.postgresql.org/message-id/flat/[email protected]#[email protected]

David J.

It is not an extension of that thread; it is about the XFS bug and how PG ignores a zeroed file even during power-loss recovery. That thread is just a topic starter on such an important theme as how to silently lose your data with broken XFS and PG. The key word is "silently", without any human intervention, together with the "zero-length file is a good file" paradigm. It is not even like unlinking files by hand.
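
The kind of report asked for here (an alarm when a data file that used to hold data comes back empty) can be approximated from outside the server. Below is a minimal sketch, not a PostgreSQL feature: it assumes relation files under the data directory have all-digit names with an optional ".N" segment suffix, and the function names snapshot_sizes and suspicious_files are hypothetical. A zero-length file is also perfectly legitimate after TRUNCATE, a vacuum that truncates the table, or for a table that was always empty, so the output is a list of candidates to investigate, not proof of loss.

```python
import os
import re

# Assumption: relation data files are named with digits only, optionally
# followed by a ".N" segment number (e.g. "16384", "16384.1"). Fork files
# such as "16384_fsm" do not match and are ignored.
RELFILE = re.compile(r"^\d+(\.\d+)?$")


def snapshot_sizes(data_dir):
    """Return {relative_path: size_in_bytes} for relation-like files."""
    sizes = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if RELFILE.match(name):
                path = os.path.join(root, name)
                sizes[os.path.relpath(path, data_dir)] = os.path.getsize(path)
    return sizes


def suspicious_files(before, after):
    """Files that were non-empty in `before` but are empty or missing in `after`."""
    return sorted(
        path
        for path, old_size in before.items()
        if old_size > 0 and after.get(path, 0) == 0
    )
```

The idea is to store a snapshot taken while the cluster is known to be healthy, take another one after an unclean shutdown, and look at every path the comparison returns before trusting the recovered cluster or recycling older backups.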



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

It can also happen on ext4 with delayed allocation.
http://www.pointsoftware.ch/en/4-ext4-vs-ext3-filesystem-and-why-delayed-allocation-is-bad/
So the issue becomes more serious than just the "XFS constantly wiped my file" meme.

So in total we have at least two filesystems that can wipe files to zero length after power loss. One can do it "by design" with the "wrong" delayed allocation mount option, the other just because it had a bug in an old kernel.
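
For context on the delayed allocation case: as that behaviour is usually described, the files that come back with zero length are ones that were rewritten and renamed without an fsync before the crash. A minimal sketch of the usual application-side pattern follows. It is an illustration for POSIX systems only, it is not PostgreSQL's own code, and the function name durable_write and the ".tmp" suffix are hypothetical.

```python
import os


def durable_write(path, data):
    """Replace `path` with `data` (bytes) so that a crash leaves either the
    old content or the new content, not an empty file."""
    tmp = path + ".tmp"  # hypothetical temporary name next to the target
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to put the data on stable storage
    os.replace(tmp, path)      # atomic rename over the target
    # fsync the containing directory so the rename itself is durable
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This does nothing about the detection problem raised in the thread; it only shows why a missing fsync, not the rename alone, is what exposes a file to coming back empty.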



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
