Hi again, I was able to get the database back to a normal functional state using the zero_damaged_pages flag. However, after getting everything working and starting to use the database again, I am again getting "invalid page header" errors on a certain table.
Does this imply there is a hardware issue on my machine? Is there anything else that could be causing this to come back? thanks, Cory On Fri, Dec 26, 2014 at 5:15 PM, Cory Zue <c...@dimagi.com> wrote: > Hi Chiru, > > I am trying to pg_dump the database to have a snapshot of the current > state. I've turned on 'zero_damaged_pages' but pg_dump is still failing > with an "invalid page header" error - this time from what looks like a > sequence object that is auto-setting IDs on a table. Any advice on how to > remove this error? > > Here is the full query that's failing: > > SELECT sequence_name, start_value, last_value, increment_by, CASE WHEN > increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN > increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS > max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN > increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE > min_value END AS min_value, cache_value, is_cycled, is_called from > unfinishedsubmissionstub_id_seq > > On Fri, Dec 26, 2014 at 2:35 PM, chiru r <chir...@gmail.com> wrote: > >> Hi Cory, >> >> After recovering table turn off *zero_damaged_pages *parameter. >> >> >> On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue <c...@dimagi.com> wrote: >> >>> Hi all, >>> >>> Thanks for the responses. Chiru, I'm looking into your suggestion. >>> >>> Sameer, here is the kernel version info: >>> >>> Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT >>> 2014 x86_64 x86_64 x86_64 GNU/Linux >>> >>> Does that seem like it could be a problematic version? >>> >>> More generally - I'm still wondering whether I should chalk this failure >>> up to a transient/random issue, or whether I should be more worried about >>> the hardware on the machine. According to our diagnostic tools, disk and >>> memory are fine, but it's still not clear to me how it got into this state. >>> Any general bits of information regarding the potential causes of these >>> types of issues would be much appreciated. >>> >>> thanks, >>> Cory >>> >>> >>> On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar <sameer.ku...@ashnik.com> >>> wrote: >>> >>>> On 23 Dec 2014 12:05, "Cory Zue" <c...@dimagi.com> wrote: >>>> > >>>> > Hi all, >>>> > >>>> > Our postgres instance on one of our production machines has recently >>>> been returning errors of the form "DatabaseError: invalid page header in >>>> block 1 of relation base/16384/76623" from normal queries. I've been >>>> reading that these are often linked to hardware errors, but I would like to >>>> better understand what else it could be or how to determine that for sure. >>>> I've filled out the standard issue reporting template below. Any feedback >>>> or troubleshooting instructions would be much appreciated. >>>> > >>>> > --- >>>> > A description of what you are trying to achieve and what results you >>>> expect.: >>>> > >>>> > Intermittent queries are failing with the error "DatabaseError: >>>> invalid page header in block 1 of relation base/16384/76623" >>>> > >>>> > PostgreSQL version number you are running: >>>> > >>>> > PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc >>>> (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit >>>> > >>>> > How you installed PostgreSQL: >>>> > >>>> > from standard package installer >>>> > >>>> > Changes made to the settings in the postgresql.conf file: >>>> > >>>> > >>>> > name | current_setting | >>>> source >>>> > >>>> ------------------------------+-----------------------------+---------------------- >>>> > checkpoint_completion_target | 0.9 | >>>> configuration file >>>> > checkpoint_segments | 32 | >>>> configuration file >>>> > checkpoint_timeout | 15min | >>>> configuration file >>>> > DateStyle | ISO, MDY | >>>> configuration file >>>> > default_text_search_config | pg_catalog.english | >>>> configuration file >>>> > effective_cache_size | 1GB | >>>> configuration file >>>> > lc_messages | en_US.UTF-8 | >>>> configuration file >>>> > lc_monetary | en_US.UTF-8 | >>>> configuration file >>>> > lc_numeric | en_US.UTF-8 | >>>> configuration file >>>> > lc_time | en_US.UTF-8 | >>>> configuration file >>>> > log_checkpoints | on | >>>> configuration file >>>> > log_connections | off | >>>> configuration file >>>> > log_destination | csvlog | >>>> configuration file >>>> > log_directory | /opt/data/pgsql/data/pg_log | >>>> configuration file >>>> > log_disconnections | off | >>>> configuration file >>>> > log_duration | on | >>>> configuration file >>>> > log_filename | postgres-%Y-%m-%d_%H%M%S | >>>> configuration file >>>> > log_lock_waits | on | >>>> configuration file >>>> > log_min_duration_statement | 250ms | >>>> configuration file >>>> > log_rotation_age | 1d | >>>> configuration file >>>> > log_rotation_size | 1GB | >>>> configuration file >>>> > log_temp_files | 0 | >>>> configuration file >>>> > log_timezone | Asia/Kolkata | command >>>> line >>>> > log_truncate_on_rotation | on | >>>> configuration file >>>> > logging_collector | on | >>>> configuration file >>>> > maintenance_work_mem | 768MB | >>>> configuration file >>>> > max_connections | 500 | >>>> configuration file >>>> > max_stack_depth | 2MB | >>>> environment variable >>>> > port | 5432 | command >>>> line >>>> > shared_buffers | 4GB | >>>> configuration file >>>> > ssl | on | >>>> configuration file >>>> > TimeZone | Asia/Kolkata | command >>>> line >>>> > timezone_abbreviations | Default | command >>>> line >>>> > wal_buffers | 16MB | >>>> configuration file >>>> > work_mem | 48MB | >>>> configuration file >>>> > >>>> > It's also probably worth noting that postgres is installed on an >>>> encrypted volume which is mounted using ecryptfs. >>>> > >>>> > Operating system and version: >>>> > >>>> > RedHatEnterpriseServer, version 6.6 >>>> > >>>> > What program you're using to connect to PostgreSQL: >>>> > >>>> > Python (django) >>>> > >>>> > Is there anything relevant or unusual in the PostgreSQL server logs?: >>>> > >>>> > I see lots of instances of this error (and similar). I'm not sure >>>> what else I should be looking for. >>>> > >>>> > What you were doing when the error happened / how to cause the error: >>>> > >>>> > I haven't explicitly tried to reproduce it, but it seems to >>>> consistently happen with certain queries. However, the system was rebooted >>>> shortly before the errors started occuring. The system was rebooted because >>>> another database (elasticsearch) was having problems on the same machine >>>> and the reboot was to attempt to resolve things. >>>> > >>>> > The EXACT TEXT of the error message you're getting, if there is one: >>>> > >>>> > DatabaseError: invalid page header in block 1 of relation >>>> base/16384/76623 >>>> > >>>> > (block and relation numbers change) >>>> > >>>> > Unfortunately, I'm not completely familiar with the CPU and disk/RAID >>>> configurations used on the server. However it is storing to a (software) >>>> encrypted volume as mentioned above. >>>> > >>>> > Have you ever set fsync=off in the postgresql config file? >>>> > No >>>> > Have you had any unexpected power loss lately? Replaced a failed RAID >>>> disk? Had an operating system crash? >>>> > Not recently, though the system did reboot normally as described >>>> above. >>>> > Have you run a file system check? (chkdsk / fsck) >>>> > No. >>>> > Are there any error messages in the system logs? >>>> (unix/linux: dmesg, /var/log/syslog ; >>>> > I haven't seen anything obvious but I wasn't sure what to look for. >>>> > >>>> >>>> I guess you missed to provide the details and kernel version (rhel >>>> version and kernel level). >>>> This will give you kernel patch level- >>>> >>>> uname -a >>>> >>>> I had once faced this issue and I was on a buggy patch of Linux kernel. >>>> I just had to update to latest patch. That worked for me. >>>> >>> >>> >> >