Following a number of kernel crashes caused by memory corruption, pg_dump 
started failing with "ERROR: invalid memory alloc request size 8589934587". We 
assume this was caused by corruption of a PostgreSQL page, due to the same 
memory problems that caused the crashes. Fortunately the impact was low: we 
could remove the single problematic row through the application, and pg_dump 
now runs OK.
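For context, a rough check of why that size looks like garbage rather than a real value. PostgreSQL rejects ordinary palloc() requests larger than MaxAllocSize, defined in src/include/utils/memutils.h as 0x3fffffff (1 GB - 1), and no single field value can exceed that. 8589934587 is 0x1fffffffb, about eight times the limit, which is consistent with a damaged length word. The Python below only mirrors that comparison for illustration; `is_valid_alloc` is a hypothetical stand-in for the AllocSizeIsValid() macro, not PostgreSQL code.

```python
# Constant taken from PostgreSQL's src/include/utils/memutils.h:
# the largest request an ordinary palloc() accepts.
MAX_ALLOC_SIZE = 0x3FFFFFFF  # 1 GB - 1 = 1073741823 bytes

# Size reported in the pg_dump error message.
requested = 8589934587


def is_valid_alloc(size: int) -> bool:
    """Hypothetical mirror of AllocSizeIsValid(): size must not exceed MaxAllocSize."""
    return size <= MAX_ALLOC_SIZE


print(f"requested {requested} bytes = {hex(requested)}")
print("valid request" if is_valid_alloc(requested)
      else "rejected: invalid memory alloc request size")
```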


But recently we hit a different problem when restoring a pg_dump, seemingly 
caused by NULL values (\N) in NOT NULL columns. From the pg_dump output:

...

CREATE TABLE xxxxx (
    col_a bigint NOT NULL,
    col_b bytea,
    col_c integer NOT NULL,
    col_d bytea,
    col_e integer NOT NULL,
    last_modified timestamp without time zone NOT NULL
);

COPY xxxxx (col_a, col_b, col_c, col_d, col_e, last_modified) FROM stdin;
....
4675632 \\x     0       \\x     0       2017-09-27 10:34:38.109677
4675633 \\x     0       \\x     0       2017-09-27 10:34:38.113812
4675634 \\x     0       \\x     0       2017-09-27 10:34:38.118072
\N      \N      \N      \N      \N      \N
4675636 \\x     0       \\x     0       2017-09-27 10:34:38.128796
4675637 \\x     0       \\x     0       2017-09-27 10:34:38.132003
4675638 \\x     0       \\x     0       2017-09-27 10:34:38.134197
....
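As an illustration, a short Python sketch that scans the data lines of a COPY block from a plain-format dump and reports rows carrying `\N` (COPY's default NULL marker) in columns declared NOT NULL, so such rows can be found before a restore fails. The function name `find_null_violations` and the hard-coded column lists are hypothetical, copied from the table definition above; it assumes the tab-separated COPY text format with the default NULL string, and the line numbers it reports are relative to the lines passed in.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

NULL_MARKER = r"\N"  # default NULL representation in COPY text format


def find_null_violations(
    lines: Iterable[str],
    columns: Sequence[str],
    not_null: Sequence[str],
) -> Iterator[Tuple[int, str, List[str]]]:
    """Yield (line_number, raw_line, offending_columns) for each COPY data
    line that has the NULL marker in a column declared NOT NULL."""
    required = [columns.index(c) for c in not_null]
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line == "\\.":  # end-of-data marker closes the COPY block
            break
        # Literal tabs separate fields; tabs inside data are escaped as \t.
        fields = line.split("\t")
        bad = [columns[i] for i in required
               if i < len(fields) and fields[i] == NULL_MARKER]
        if bad:
            yield lineno, line, bad


# Usage with rows shaped like the dump excerpt (empty bytea is \\x in COPY).
columns = ["col_a", "col_b", "col_c", "col_d", "col_e", "last_modified"]
not_null = ["col_a", "col_c", "col_e", "last_modified"]
sample = [
    "4675634\t\\\\x\t0\t\\\\x\t0\t2017-09-27 10:34:38.118072\n",
    "\\N\t\\N\t\\N\t\\N\t\\N\t\\N\n",
    "4675636\t\\\\x\t0\t\\\\x\t0\t2017-09-27 10:34:38.128796\n",
    "\\.\n",
]
for lineno, raw, bad in find_null_violations(sample, columns, not_null):
    # prints: line 2: NULL in NOT NULL column(s) col_a, col_c, col_e, last_modified
    print(f"line {lineno}: NULL in NOT NULL column(s) {', '.join(bad)}")
```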


We then looked at rows 4675634 and 4675635 directly:


db=# select * from xxxxx where col_a = 4675634;
 col_a   | col_b       | col_c           | col_d     | col_e         |       last_modified
---------+-------------+-----------------+-----------+---------------+----------------------------
 4675634 | \x          |               0 | \x        |             0 | 2017-09-27 10:34:38.118072
(1 row)

db=# select * from xxxxx where col_a = 4675635;
 col_a   | col_b       | col_c           | col_d     | col_e         |  last_modified
--------+-------------+-----------------+-----------+---------------+---------------
        |             |                 |           |               |
(1 row)


Row 4675635 is very odd: every column comes back NULL (including col_a, which 
is declared NOT NULL), yet the row is found by a query on col_a = 4675635.


Can this be explained by anything other than corruption, or could it be some 
other behaviour that we should worry about?


(We've never seen anything like this in test, only on the system that has had 
the problems. The previous problem row was col_a=4675656: same table, perhaps 
the same page, and so perhaps the same corruption.)


Fortunately the impact was again low: we were able to delete the row through 
the application.


Thanks.

