Re: [GENERAL] ltree gist index errors and fill factor questions
The database is not crashing, thankfully. We are waiting for the errors to recur so we can turn up logging, in the hope of capturing a reproducible set.
Re: [GENERAL] ltree gist index errors and fill factor questions
I will do my best to provide a reproducible test case. Is there any more information I can supply in the meantime that would help?
[GENERAL] ltree gist index errors and fill factor questions
Hello,

We have an ltree column (tree_path) with a GiST index (index_nodes_on_tree_path). This is in a 9.3.5 database. Recently, errors started appearing in the postgres log on some updates to this table:

LOG: fixing incomplete split in index "index_nodes_on_tree_path", block 2358
STATEMENT: UPDATE "nodes" SET "parent_id" = $1, "contents_id" = $2, "contents_type" = $3, "trough_code" = $4, "live_flag" = $5, "ordering" = $6, "id" = $7, "tree_path" = $8 WHERE "nodes"."id" = 127695
ERROR: failed to add item to index page in "index_nodes_on_tree_path"

Reindexing index_nodes_on_tree_path fixes the problem temporarily, but it returns. I'm wondering whether this is a bug (googling the incomplete split error returns results about a similar bug in b-tree indexes) or a problem with an inadequate fillfactor setting. It doesn't look like there is a specified fillfactor for this index, and I'm not sure what the GiST default is:

CREATE INDEX index_nodes_on_tree_path ON nodes USING gist (tree_path)

The table in question has about 94k rows; an example of the widest tree_path value is 69066.69090.69091.69094.

Any advice is appreciated. Happy new year!

Mike
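For reference, GiST indexes accept an explicit fillfactor storage parameter, and the GiST default is 90. A sketch of the two remediations discussed in the message, using the index and table names it gives; the value 75 is only an illustrative choice, not a recommendation:

```sql
-- Rebuild the index to clear the incomplete-split state:
REINDEX INDEX index_nodes_on_tree_path;

-- Or recreate it with an explicit, lower fillfactor so each
-- index page is left with more free space (GiST default is 90):
DROP INDEX index_nodes_on_tree_path;
CREATE INDEX index_nodes_on_tree_path
    ON nodes USING gist (tree_path)
    WITH (fillfactor = 75);
```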
Re: [GENERAL] question about checksum in 9.3
Thanks for the responses, that clarifies the checksum feature for me. FWIW, my pgbench tests between a 9.2 server and a 9.3 server with checksums showed very similar performance characteristics and system resource utilization. I'm going to set up another load test with our actual application to see if that reveals any noticeable performance difference.

Thanks again
Mike

On Mon, Jan 13, 2014 at 7:11 PM, Michael Paquier wrote:
> On Tue, Jan 14, 2014 at 1:50 AM, Mike Broers wrote:
> > Hello, I am in the process of planning a 9.3 migration of postgres and I am
> > curious about the checksum features available. In my test 9.3 instance it
> > seemed like this feature provides a log entry of the exact database/oid of
> > the corrupt object when it is accessed, but not much else. I can't find
> > much documentation on anything else this feature provides.
> Few things:
> - The only way to know if a server is using data checksums is to use
> pg_controldata.
> - Be aware as well of the potential performance impact on your CPU;
> checksums are checked each time a page is read, and recalculated each
> time a page is updated.
> - ignore_checksum_failure can be used to ignore failures. Don't use
> that on a production system.
>
> > Is there a built-in method of scanning the server to check for corruption or
> > will I have to wait for a corrupt object to be accessed to see the log
> > entry?
> You can as well access tables manually, with for example a
> sequential scan, to check whether blocks are broken or not.
>
> > Is there a relation that stores last checksum status or any way of
> > reporting on what objects are identified by postgres as corrupt or not
> > corrupt?
> No, though you could build one with a background worker that scans
> relation pages and registers the failing blocks.
>
> > Are there any other features of the checksum I am missing besides the log
> > entry?
> 9.4 has a new GUC parameter called data_checksums that allows you to
> check with a psql client whether checksums are used on a server.
>
> Regards,
> --
> Michael
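Following Michael's first point, checking whether a cluster was initialized with checksums might look like the sketch below. The `$PGDATA` path and the exact label text in the output are assumptions; on 9.4 or later the same answer is available from SQL:

```
$ pg_controldata "$PGDATA" | grep -i checksum
Data page checksum version:           1     <- 0 means checksums are disabled

-- on 9.4 or later, from psql:
SHOW data_checksums;
```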
[GENERAL] question about checksum in 9.3
Hello, I am in the process of planning a 9.3 migration of postgres and I am curious about the checksum features available. In my test 9.3 instance it seemed like this feature provides a log entry with the exact database/oid of the corrupt object when it is accessed, but not much else. I can't find much documentation on anything else this feature provides.

Is there a built-in method of scanning the server to check for corruption, or will I have to wait for a corrupt object to be accessed to see the log entry? Is there a relation that stores the last checksum status, or any way of reporting on which objects postgres has identified as corrupt or not corrupt? Are there any other features of the checksum I am missing besides the log entry?

Thanks
Mike
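There is no built-in whole-cluster scan in 9.3. One hedged workaround, along the lines of the sequential-scan suggestion in the reply in this thread, is to generate a full read of every ordinary table so that each heap page's checksum is verified on read (indexes are not covered; they would need a REINDEX or separate scans):

```sql
-- Generate one full-scan statement per ordinary table; running the
-- resulting script forces every heap page to be read, so any page
-- that fails checksum verification is reported in the server log.
-- Run the script with enable_indexscan off so the planner chooses
-- sequential scans rather than index-only scans.
COPY (
  SELECT 'SELECT count(*) FROM ' || oid::regclass || ';'
    FROM pg_class
   WHERE relkind = 'r'
) TO '/tmp/scan_all.sql';
```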
Re: [GENERAL] Re: corruption issue after server crash - ERROR: unexpected chunk number 0
The restore of a post-crash production backup worked as hoped, and the 2nd replication slave is back in its happy hot standby state.

So if this problem replicated to our standby servers, does that indicate that the potentially problematic fsync occurred during a pg_xlog write? Would breaking replication at the time of the crash have prevented this from cascading, or was it already too late at that point?

Thanks again for the input, it's been very helpful!

Mike

On Mon, Nov 25, 2013 at 12:20 PM, Mike Broers wrote:
> Thanks Shaun,
>
> I'm planning to schedule a time to do the vacuum freeze suggested previously. So far the extent of the problem seems limited to the one session table and the one session row that was being used by a heavy bot scan at the time of the crash. Currently I'm testing a recovery of a production backup from today to rebase one of the replication targets that I was using to test fixes last week. Hopefully that validates the current backups, and I can proceed to inquire with our managed services provider about the false notification of the disk write and ways to prevent that going forward.
>
> I'll update the list if I uncover anything interesting in the process and/or need more advice. Thanks again for your input - it's much appreciated as always. Nothing like a little crash corruption to get the blood flowing!
>
> Mike
>
> On Mon, Nov 25, 2013 at 10:29 AM, Shaun Thomas wrote:
>> > Update - I have two hot replication slaves of this db, both have the problem.
>> > I took one out of recovery and ran REINDEX table session_session and it
>> > fixed the errors about this row. Now I'm going to run vacuum and see if
>> > there are other tables that complain, but I'm guessing if so I will need to see
>> > if there is a way to force vacuum to continue on error; worst case I might
>> > have to script a table-by-table vacuum, I guess. If anyone has a better
>> > suggestion for determining the extent of the damage I'd appreciate it.
>>
>> Oh man. I'm sorry, Mike.
>>
>> One of the cardinal rules I have is to disconnect any replication following a database crash. It's just too easy for damaged replicated rows to be propagated unless you're on 9.3 and have checksums enabled. If you want to perform a table-by-table check, don't vacuum the database, but the individual tables. I'd go with a DO loop and have it raise notices into the log so you can investigate further:
>>
>> COPY (
>>   SELECT 'VACUUM ' || oid::regclass::text || ';'
>>     FROM pg_class
>>    WHERE relkind = 'r'
>> ) TO '/tmp/vac_all.sql';
>>
>> Run /tmp/vac_all.sql through psql and pipe the contents into a log file. Any table that doesn't vacuum successfully will need to be repaired manually. One way you can do this, if there are dupes, is by checking the ctid value after disabling index scans:
>>
>> SET enable_indexscan TO False;
>>
>> SELECT ctid, * FROM [broken_table] WHERE ...;
>>
>> Just construct the WHERE clause based on the error output, and you should get all rows if there are dupes. You'll need to figure out which row to keep, then delete the bad row based on the ctid. Do this as many times as it takes, then reindex to make sure the proper row versions are indexed.
>>
>> It's also a good idea to dump any table that came back with an error, just in case.
>>
>> After you've done all of that, you should re-base your replicas once you've determined your production system is usable. In the meantime, I highly recommend you set up a VIP you can assign to one of your replicas if your production system dies again, and remove any autostart code. If your production system crashes, switch the VIP immediately to a replica, and invalidate your old production system. Data corruption is insidious when streaming replication is involved.
>>
>> Look into tools like repmgr to handle managing your replicas as a cluster, to make forced invalidation and re-basing easier.
>>
>> Good luck!
>>
>> --
>> Shaun Thomas
>> OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
>> 312-676-8870
>> stho...@optionshouse.com
>>
>> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
Re: [GENERAL] Re: corruption issue after server crash - ERROR: unexpected chunk number 0
Thanks Shaun,

I'm planning to schedule a time to do the vacuum freeze suggested previously. So far the extent of the problem seems limited to the one session table and the one session row that was being used by a heavy bot scan at the time of the crash. Currently I'm testing a recovery of a production backup from today to rebase one of the replication targets that I was using to test fixes last week. Hopefully that validates the current backups, and I can proceed to inquire with our managed services provider about the false notification of the disk write and ways to prevent that going forward.

I'll update the list if I uncover anything interesting in the process and/or need more advice. Thanks again for your input - it's much appreciated as always. Nothing like a little crash corruption to get the blood flowing!

Mike

On Mon, Nov 25, 2013 at 10:29 AM, Shaun Thomas wrote:
> > Update - I have two hot replication slaves of this db, both have the problem.
> > I took one out of recovery and ran REINDEX table session_session and it
> > fixed the errors about this row. Now I'm going to run vacuum and see if
> > there are other tables that complain, but I'm guessing if so I will need to see
> > if there is a way to force vacuum to continue on error; worst case I might
> > have to script a table-by-table vacuum, I guess. If anyone has a better
> > suggestion for determining the extent of the damage I'd appreciate it.
>
> Oh man. I'm sorry, Mike.
>
> One of the cardinal rules I have is to disconnect any replication following a database crash. It's just too easy for damaged replicated rows to be propagated unless you're on 9.3 and have checksums enabled. If you want to perform a table-by-table check, don't vacuum the database, but the individual tables. I'd go with a DO loop and have it raise notices into the log so you can investigate further:
>
> COPY (
>   SELECT 'VACUUM ' || oid::regclass::text || ';'
>     FROM pg_class
>    WHERE relkind = 'r'
> ) TO '/tmp/vac_all.sql';
>
> Run /tmp/vac_all.sql through psql and pipe the contents into a log file. Any table that doesn't vacuum successfully will need to be repaired manually. One way you can do this, if there are dupes, is by checking the ctid value after disabling index scans:
>
> SET enable_indexscan TO False;
>
> SELECT ctid, * FROM [broken_table] WHERE ...;
>
> Just construct the WHERE clause based on the error output, and you should get all rows if there are dupes. You'll need to figure out which row to keep, then delete the bad row based on the ctid. Do this as many times as it takes, then reindex to make sure the proper row versions are indexed.
>
> It's also a good idea to dump any table that came back with an error, just in case.
>
> After you've done all of that, you should re-base your replicas once you've determined your production system is usable. In the meantime, I highly recommend you set up a VIP you can assign to one of your replicas if your production system dies again, and remove any autostart code. If your production system crashes, switch the VIP immediately to a replica, and invalidate your old production system. Data corruption is insidious when streaming replication is involved.
>
> Look into tools like repmgr to handle managing your replicas as a cluster, to make forced invalidation and re-basing easier.
>
> Good luck!
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd | Suite 500 | Chicago IL, 60604
> 312-676-8870
> stho...@optionshouse.com
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
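To make the ctid procedure above concrete, a hypothetical repair pass on the session table from this thread. The ctid value '(1234,5)' is purely illustrative; the real one comes from the SELECT output:

```sql
-- Disable index scans so the SELECT walks the heap and shows every
-- row version, including duplicates the index would hide:
SET enable_indexscan TO false;

SELECT ctid, * FROM session_session
 WHERE session_key = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3';

-- After deciding which version to keep, delete the bad one by its
-- physical address (illustrative ctid shown):
DELETE FROM session_session WHERE ctid = '(1234,5)';

-- Then rebuild the indexes so only the surviving version is indexed:
REINDEX TABLE session_session;
```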
Re: [GENERAL] corruption issue after server crash - ERROR: unexpected chunk number 0
vacuumdb -avz, pg_dumpall, and VACUUM FREEZE ANALYZE on the former standby database that received the corruption via replication all came back without errors. Is the vacuum freeze intended to potentially fix problems, or just to reveal whether other tables may have corruption? I'm trying to decide if this needs to be run in production.

On Thu, Nov 21, 2013 at 5:09 PM, Kevin Grittner wrote:
> Mike Broers wrote:
>
> > Is there anything I should look out for with vacuum freeze?
>
> Just check the logs and the vacuum output for errors and warnings.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Re: [GENERAL] corruption issue after server crash - ERROR: unexpected chunk number 0
Thanks. After this pg_dumpall I am going to see what kind of impact I can expect from running VACUUM FREEZE ANALYZE (normally I just run vacuumdb -avz nightly via a cron job) and schedule time to run this in production against all the tables in the database. Is there anything I should look out for with vacuum freeze?

Much appreciated,
Mike

On Thu, Nov 21, 2013 at 4:51 PM, Kevin Grittner wrote:
> Mike Broers wrote:
>
> > Thanks for the response. fsync and full_page_writes are both on.
>
> [ corruption appeared following power loss on the machine hosting
> the VM running PostgreSQL ]
>
> That leaves three possibilities:
>
> (1) fsync doesn't actually guarantee persistence in your stack.
> (2) There is a hardware problem which has not been recognized.
> (3) There is a so-far unrecognized bug in PostgreSQL.
>
> Based on my personal experience, those are listed in descending order of probability. I seem to recall reports of some VM for which an fsync did not force data all the way to persistent storage, but I don't recall which one. You might want to talk to your service provider about what guarantees they make in this regard.
>
> > Is there something else I can run to confirm we are more or less
> > ok at the database level after the pg_dumpall, or is there no way
> > to be sure and a fresh initdb is required?
>
> Given that you had persistence options in their default state of "on", and the corruption appeared after a power failure in a VM environment, I would guess that the damage is probably limited. That said, damage from this sort of event can remain hidden and cause data loss later. Unfortunately we do not yet have a consistency checker that can root out such problems. If you can arrange a maintenance window to dump and load to a fresh initdb, that would eliminate the possibility that some hidden corruption is lurking. If that is not possible, running VACUUM FREEZE ANALYZE will reduce the number of things that can go wrong, without requiring down time.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
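A sketch of the freeze pass Kevin suggests, plus a quick follow-up check of which tables still carry the oldest unfrozen transaction IDs (the LIMIT is arbitrary; VERBOSE is optional but surfaces per-table warnings):

```sql
-- Freeze and re-read every tuple; errors or warnings in the output
-- point at tables that need closer inspection.
VACUUM FREEZE VERBOSE ANALYZE;

-- Afterwards, inspect which tables retain the oldest unfrozen XIDs:
SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY xid_age DESC
 LIMIT 10;
```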
Re: [GENERAL] corruption issue after server crash - ERROR: unexpected chunk number 0
Thanks for the response. fsync and full_page_writes are both on. Our database runs on a managed hosting provider's vmhost server/SAN; I can possibly request that they provide some hardware test results - do you have any specific diagnostics in mind?

The crash was apparently due to our vmhost suddenly losing power. The only row it has complained about with the chunk error also migrated into both standby servers, and as previously stated was fixed with a reindex of the parent table in one of the standby servers after taking it out of recovery. The vacuumdb -avz on this test copy didn't have any errors or warnings; I'm also going to run a pg_dumpall on this host to see if any other rows are problematic.

Is there something else I can run to confirm we are more or less ok at the database level after the pg_dumpall, or is there no way to be sure and a fresh initdb is required? I am planning on running the reindex in actual production tonight during our maintenance window, but was hoping that if that worked we would be out of the woods.

On Thu, Nov 21, 2013 at 3:56 PM, Kevin Grittner wrote:
> Mike Broers wrote:
>
> > Hello, we are running postgres 9.2.5 on RHEL6. Our production
> > server crashed hard and when it came back up our logs were
> > flooded with:
>
> > ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122
>
> Your database is corrupted. Unless you were running with fsync = off or full_page_writes = off, that should not happen. It is likely to be caused by a hardware problem (bad RAM, a bad disk drive, or network problems if your storage is across a network).
>
> If it were me, I would stop the database service and copy the full data directory tree.
>
> http://wiki.postgresql.org/wiki/Corruption
>
> If fsync or full_page_writes were off, your best bet is probably to go to your backup. If you don't go to a backup, you should try to get to a point where you can run pg_dump, and dump and load to a freshly initdb'd cluster.
>
> If fsync and full_page_writes were both on, you should run hardware diagnostics at your earliest opportunity. When hardware starts to fail, the first episode is rarely the last or the most severe.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
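For completeness, the persistence settings Kevin asks about can be confirmed from any session; this uses only stock postgres catalog views, no assumptions beyond that:

```sql
SHOW fsync;
SHOW full_page_writes;

-- or, with information about where each value was set:
SELECT name, setting, source
  FROM pg_settings
 WHERE name IN ('fsync', 'full_page_writes');
```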
[GENERAL] Re: corruption issue after server crash - ERROR: unexpected chunk number 0
Update - I have two hot replication slaves of this db, and both have the problem. I took one out of recovery and ran REINDEX TABLE session_session, and it fixed the errors about this row. Now I'm going to run vacuum and see if there are other tables that complain, but I'm guessing if so I will need to see if there is a way to force vacuum to continue on error; worst case I might have to script a table-by-table vacuum, I guess. If anyone has a better suggestion for determining the extent of the damage I'd appreciate it.

On Thu, Nov 21, 2013 at 2:10 PM, Mike Broers wrote:
> Hello, we are running postgres 9.2.5 on RHEL6. Our production server crashed hard and when it came back up our logs were flooded with:
>
> STATEMENT: SELECT "session_session"."session_key", "session_session"."session_data", "session_session"."expire_date", "session_session"."nonce" FROM "session_session" WHERE ("session_session"."session_key" = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3' AND "session_session"."expire_date" > '2013-11-21 13:27:33.107913' )
>
> ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122
>
> We restarted the application and whatever session was constantly hitting that row stopped, but I'm concerned about remediation. When I attempt to read from that row the error occurs:
>
> select * from session_session where session_key = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3';
> ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122
>
> When I attempt to delete this row I get this error:
>
> delete from session_session where session_key = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3';
> ERROR: tuple concurrently updated
>
> We happen to have a maintenance window tonight, so I will have some time when the app is down to run some database fixes. I saw other threads suggesting a reindex of the toast table, but this is a 14GB table and I'm not sure how long that will take or if it will even be successful. We also have a full db vacuum/analyze scheduled nightly for 2am, so I am expecting to learn if there are other impacted tables, but it's troubling if I don't know what the remediation is. This particular table could be truncated if necessary, if that is an option, but I'm not sure about other tables.
>
> Any suggestions for how to handle the tuple concurrently updated error? Or if a reindex is likely to help with the unexpected chunk error?
>
> Thanks
> Mike
[GENERAL] corruption issue after server crash - ERROR: unexpected chunk number 0
Hello, we are running postgres 9.2.5 on RHEL6. Our production server crashed hard and when it came back up our logs were flooded with:

STATEMENT: SELECT "session_session"."session_key", "session_session"."session_data", "session_session"."expire_date", "session_session"."nonce" FROM "session_session" WHERE ("session_session"."session_key" = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3' AND "session_session"."expire_date" > '2013-11-21 13:27:33.107913' )

ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122

We restarted the application and whatever session was constantly hitting that row stopped, but I'm concerned about remediation. When I attempt to read from that row the error occurs:

select * from session_session where session_key = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3';
ERROR: unexpected chunk number 0 (expected 1) for toast value 117927127 in pg_toast_19122

When I attempt to delete this row I get this error:

delete from session_session where session_key = 'gk9aap5d7btp6tzquh0kf73gpfmik5w3';
ERROR: tuple concurrently updated

We happen to have a maintenance window tonight, so I will have some time when the app is down to run some database fixes. I saw other threads suggesting a reindex of the toast table, but this is a 14GB table and I'm not sure how long that will take or if it will even be successful. We also have a full db vacuum/analyze scheduled nightly for 2am, so I am expecting to learn if there are other impacted tables, but it's troubling if I don't know what the remediation is. This particular table could be truncated if necessary, if that is an option, but I'm not sure about other tables.

Any suggestions for how to handle the tuple concurrently updated error? Or if a reindex is likely to help with the unexpected chunk error?

Thanks
Mike
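The reindex remediation mentioned above, spelled out with names taken from the error messages in this thread; whether it succeeds against a corrupted 14GB toast table is exactly the open question here:

```sql
-- Rebuild just the toast table's index (toast table name from the
-- error message); toast tables live in the pg_toast schema:
REINDEX TABLE pg_toast.pg_toast_19122;

-- Or reindex the parent table, which rebuilds its toast index too.
-- This is what reportedly fixed the row on the standby:
REINDEX TABLE session_session;
```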
[GENERAL] 8.3.8 question about backup/recovery behavior
We take nightly backups using the start-backup, copy the data directory and archived logs, then stop-backup method. Today I tested the recoverability of the backup by mounting the backup directory on a different server, copying the 3 hours of transaction logs from after last night's backup up until this morning to that server, and starting up postgres there from the backup. I want to verify that what I experienced was normal behavior.

My first attempt was to see if I could start the postmaster without setting up recovery.conf. It worked, to my surprise: pg looked in the pg_xlog folder, noted in the log that the last transaction was at the time of the last available transaction log (9:15), and opened the database for connections. I guess I was surprised that it automatically rolled forward through all the logs up to the last one before opening, instead of just opening the database at the time of the stop-backup command, which was about 3 hours prior to the last available transaction log.

Here is what I saw in the log:

2010-09-14 10:48:43.626 CDT <@> [6848] LOG: 0: database system was not properly shut down; automatic recovery in progress
2010-09-14 10:48:43.626 CDT <@> [6848] LOCATION: StartupXLOG, xlog.c:5006
2010-09-14 10:48:43.679 CDT <@> [6848] LOG: 0: redo starts at 15A/4E01A5D8
2010-09-14 10:48:43.679 CDT <@> [6848] LOCATION: StartupXLOG, xlog.c:5068
2010-09-14 10:49:01.059 CDT <@> [6848] LOG: 58P01: could not open file "pg_xlog/000000010000015A000000C6" (log file 346, segment 198): No such file or directory
2010-09-14 10:49:01.059 CDT <@> [6848] LOCATION: XLogFileRead, xlog.c:2365
2010-09-14 10:49:01.059 CDT <@> [6848] LOG: 0: redo done at 15A/C5FFF1F8
2010-09-14 10:49:01.059 CDT <@> [6848] LOCATION: StartupXLOG, xlog.c:5138
2010-09-14 10:49:01.097 CDT <@> [6848] LOG: 0: last completed transaction was at log time 2010-09-14 09:15:07.450314-05
2010-09-14 10:49:01.097 CDT <@> [6848] LOCATION: StartupXLOG, xlog.c:5142

The database was open and I could log in, but I couldn't determine a way to see whether the database was brought up to 9:15am or some earlier time.

My second attempt was to shut down, set up recovery.conf, and start the database in recovery. This worked as expected as well.

Can anyone verify that the first attempt is valid or expected behavior? I guess I'm so used to Oracle's very specific and strict backup/recovery methods that this shocked me a little.

Thanks
Mike
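For the second attempt, an 8.3-era recovery.conf would look something like the fragment below. The archive path is a placeholder, and recovery_target_time is optional, shown only to illustrate stopping replay at a known point instead of the end of available WAL:

```
# recovery.conf - placed in the data directory before startup
restore_command = 'cp /path/to/archive/%f "%p"'

# optional: stop replay at a specific timestamp instead of
# rolling forward through every available transaction log
recovery_target_time = '2010-09-14 09:15:00 CDT'
```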