Hi Cory, After recovering table turn off *zero_damaged_pages *parameter.
On Fri, Dec 26, 2014 at 9:13 PM, Cory Zue <c...@dimagi.com> wrote: > Hi all, > > Thanks for the responses. Chiru, I'm looking into your suggestion. > > Sameer, here is the kernel version info: > > Linux dimagi 2.6.32-431.20.5.el6.x86_64 #1 SMP Wed Jul 16 05:26:53 EDT > 2014 x86_64 x86_64 x86_64 GNU/Linux > > Does that seem like it could be a problematic version? > > More generally - I'm still wondering whether I should chalk this failure > up to a transient/random issue, or whether I should be more worried about > the hardware on the machine. According to our diagnostic tools, disk and > memory are fine, but it's still not clear to me how it got into this state. > Any general bits of information regarding the potential causes of these > types of issues would be much appreciated. > > thanks, > Cory > > > On Fri, Dec 26, 2014 at 6:55 AM, Sameer Kumar <sameer.ku...@ashnik.com> > wrote: > >> On 23 Dec 2014 12:05, "Cory Zue" <c...@dimagi.com> wrote: >> > >> > Hi all, >> > >> > Our postgres instance on one of our production machines has recently >> been returning errors of the form "DatabaseError: invalid page header in >> block 1 of relation base/16384/76623" from normal queries. I've been >> reading that these are often linked to hardware errors, but I would like to >> better understand what else it could be or how to determine that for sure. >> I've filled out the standard issue reporting template below. Any feedback >> or troubleshooting instructions would be much appreciated. >> > >> > --- >> > A description of what you are trying to achieve and what results you >> expect.: >> > >> > Intermittent queries are failing with the error "DatabaseError: invalid >> page header in block 1 of relation base/16384/76623" >> > >> > PostgreSQL version number you are running: >> > >> > PostgreSQL 8.4.13 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) >> 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit >> > >> > How you installed PostgreSQL: >> > >> > from standard package installer >> > >> > Changes made to the settings in the postgresql.conf file: >> > >> > >> > name | current_setting | >> source >> > >> ------------------------------+-----------------------------+---------------------- >> > checkpoint_completion_target | 0.9 | >> configuration file >> > checkpoint_segments | 32 | >> configuration file >> > checkpoint_timeout | 15min | >> configuration file >> > DateStyle | ISO, MDY | >> configuration file >> > default_text_search_config | pg_catalog.english | >> configuration file >> > effective_cache_size | 1GB | >> configuration file >> > lc_messages | en_US.UTF-8 | >> configuration file >> > lc_monetary | en_US.UTF-8 | >> configuration file >> > lc_numeric | en_US.UTF-8 | >> configuration file >> > lc_time | en_US.UTF-8 | >> configuration file >> > log_checkpoints | on | >> configuration file >> > log_connections | off | >> configuration file >> > log_destination | csvlog | >> configuration file >> > log_directory | /opt/data/pgsql/data/pg_log | >> configuration file >> > log_disconnections | off | >> configuration file >> > log_duration | on | >> configuration file >> > log_filename | postgres-%Y-%m-%d_%H%M%S | >> configuration file >> > log_lock_waits | on | >> configuration file >> > log_min_duration_statement | 250ms | >> configuration file >> > log_rotation_age | 1d | >> configuration file >> > log_rotation_size | 1GB | >> configuration file >> > log_temp_files | 0 | >> configuration file >> > log_timezone | Asia/Kolkata | command >> line >> > log_truncate_on_rotation | on | >> configuration file >> > logging_collector | on | >> configuration file >> > maintenance_work_mem | 768MB | >> configuration file >> > max_connections | 500 | >> configuration file >> > max_stack_depth | 2MB | >> environment variable >> > port | 5432 | command >> line >> > shared_buffers | 4GB | >> configuration file >> > ssl | on | >> configuration file >> > TimeZone | Asia/Kolkata | command >> line >> > timezone_abbreviations | Default | command >> line >> > wal_buffers | 16MB | >> configuration file >> > work_mem | 48MB | >> configuration file >> > >> > It's also probably worth noting that postgres is installed on an >> encrypted volume which is mounted using ecryptfs. >> > >> > Operating system and version: >> > >> > RedHatEnterpriseServer, version 6.6 >> > >> > What program you're using to connect to PostgreSQL: >> > >> > Python (django) >> > >> > Is there anything relevant or unusual in the PostgreSQL server logs?: >> > >> > I see lots of instances of this error (and similar). I'm not sure what >> else I should be looking for. >> > >> > What you were doing when the error happened / how to cause the error: >> > >> > I haven't explicitly tried to reproduce it, but it seems to >> consistently happen with certain queries. However, the system was rebooted >> shortly before the errors started occuring. The system was rebooted because >> another database (elasticsearch) was having problems on the same machine >> and the reboot was to attempt to resolve things. >> > >> > The EXACT TEXT of the error message you're getting, if there is one: >> > >> > DatabaseError: invalid page header in block 1 of relation >> base/16384/76623 >> > >> > (block and relation numbers change) >> > >> > Unfortunately, I'm not completely familiar with the CPU and disk/RAID >> configurations used on the server. However it is storing to a (software) >> encrypted volume as mentioned above. >> > >> > Have you ever set fsync=off in the postgresql config file? >> > No >> > Have you had any unexpected power loss lately? Replaced a failed RAID >> disk? Had an operating system crash? >> > Not recently, though the system did reboot normally as described above. >> > Have you run a file system check? (chkdsk / fsck) >> > No. >> > Are there any error messages in the system logs? >> (unix/linux: dmesg, /var/log/syslog ; >> > I haven't seen anything obvious but I wasn't sure what to look for. >> > >> >> I guess you missed to provide the details and kernel version (rhel >> version and kernel level). >> This will give you kernel patch level- >> >> uname -a >> >> I had once faced this issue and I was on a buggy patch of Linux kernel. I >> just had to update to latest patch. That worked for me. >> > >