On Wed, Oct 26, 2016 at 11:35 AM, Merlin Moncure <mmonc...@gmail.com> wrote: > On Tue, Oct 25, 2016 at 3:08 PM, Merlin Moncure <mmonc...@gmail.com> wrote: >> Confirmation of problem re-occurrence will come in a few days. I'm >> much more likely to believe 6+sigma occurrence (storage, freak bug, >> etc) should it prove the problem goes away post rebuild. > > ok, no major reported outage yet, but just got: > > 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR: invalid page in > block 12 of relation base/203883/1259
*) still on 9.5.2 :( not in control of the package update process *) getting more of these: [root@rcdylsdbmpf001 data]# cat /var/lib/pgsql/9.5/data_checksum/pg_log/postgresql-26.log | grep "invalid page" 2016-10-26 11:26:42 CDT [postgres@castaging]: ERROR: invalid page in block 10 of relation base/203883/1259 2016-10-26 11:27:55 CDT [postgres@castaging]: ERROR: invalid page in block 12 of relation base/203883/1259 2016-10-26 12:16:44 CDT [postgres@castaging]: ERROR: invalid page in block 13 of relation base/203883/1259 2016-10-26 12:18:58 CDT [postgres@castaging]: ERROR: invalid page in block 15 of relation base/203883/1259 2016-10-26 12:19:12 CDT [postgres@castaging]: ERROR: invalid page in block 7 of relation base/203883/2662 castaging=# select relname from pg_class where oid in(1259, 2662); relname ──────────────────── pg_class_oid_index pg_class *) only one database in the cluster is damaged. I do not suspect storage at this point *) Currently I can execute pg_dump with no problems. So far impact is not severe but soon I have to do dump/restore *) age(xmin) from pg_class gives all reasonable values (approx 757k) that aren't 2147483647 *) All cases of statement executing getting this error in a pl/pgsql routine that does the following: BEGIN ... <several postgres_fdw querying same 1 record 1 column tabel> DROP TABLE IF EXISTS foo; CREATE TEMP TABLE foo ON COMMIT DROP AS ... SELECT <sqsh function push sql script to sql sever> ... INTO Result; DROP TABLE IF EXISTS bar; CREATE TEMP TABLE bar ON COMMIT DROP AS ... PERFORM <sqsh function removing script>; EXCEPTION WHEN OTHERS THEN PERFORM <sqsh function removing script>; RAISE; END; *) page verification miss is numerically close [root@rcdylsdbmpf001 ~]# cat /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page verification" 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37251 but expected 37244 2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37249 but expected 37244 2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 44363 but expected 44364 2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 49525 but expected 49539 2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37345 but expected 37340 *) sample log leading into the first error (all queries are logged) 2016-10-26 11:26:40 CDT [postgres@castaging]: LOG: execute <unnamed>: SELECT PushMarketSample($1::TEXT) AS published 2016-10-26 11:26:40 CDT [postgres@castaging]: DETAIL: parameters: $1 = '8840' 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: statement: SAVEPOINT s2 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: execute <unnamed>: DECLARE c1 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:40 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c1 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: statement: CLOSE c1 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: execute <unnamed>: DECLARE c2 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c2 2016-10-26 11:26:41 CDT [postgres@mpf2]: LOG: statement: CLOSE c2 2016-10-26 11:26:41 CDT [rms@mpf2]: ERROR: relation "tblagent" does not exist at character 15 2016-10-26 11:26:41 CDT [rms@mpf2]: STATEMENT: select * from tblagent 2016-10-26 11:26:41 CDT [rms@mpf2]: LOG: execute <unnamed>: SELECT CURRENT_SCHEMA() 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: execute <unnamed>: DECLARE c3 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c3 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: CLOSE c3 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: execute <unnamed>: DECLARE c4 CURSOR FOR SELECT period FROM public.dataentryperiod 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: FETCH 100 FROM c4 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: CLOSE c4 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING: page verification failed, calculated checksum 37251 but expected 37244 2016-10-26 11:26:42 CDT [postgres@castaging]: CONTEXT: SQL statement "CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS SELECT ApartmentId, TotalUnits, SubmarketId, SUM(Rent * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Rent, SUM(Occupancy * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Occupancy FROM AptSample GROUP BY 1, 2, 3" PL/pgSQL function pushmarketsample(text,date,integer) line 103 at SQL statement 2016-10-26 11:26:42 CDT [postgres@mpf2]: LOG: statement: ROLLBACK TO SAVEPOINT s2; RELEASE SAVEPOINT s2 2016-10-26 11:26:42 CDT [postgres@castaging]: ERROR: invalid page in block 10 of relation base/203883/1259 2016-10-26 11:26:42 CDT [postgres@castaging]: CONTEXT: SQL statement "CREATE TEMP TABLE ApartmentPublishBySubmkt ON COMMIT DROP AS SELECT ApartmentId, TotalUnits, SubmarketId, SUM(Rent * UnitCount) / NULLIF(SUM(UnitCount), 0) AS Rent, merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers