Hi, I suspect I am having a case of data corruption. Here are the details:
I am running postgres 9.4.8: postgresql94-9.4.8-1PGDG.rhel7.x86_64 postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64 postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64 postgresql94-server-9.4.8-1PGDG.rhel7.x86_64 on CentOS Linux release 7.2.1511 (Core) This is happening in a production environment but luckily on the reporting database. I have a cluster of 3 databases, db1 and db2 are masters and replicate between each other and also replicate to db3 (db1 <-> db2, db1 -> db3, db2 -> db3). For replication I am using Bucardo. The problem I am having is that one record in a table it shows a wrong value for one single field: select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315811, 75315815) order by transactionid; gameplayid | transactionid | encodedplay ------------+---------------+-------------- 160019239 | 75315811 | mix:5,2,7 160019237 | 75315811 | mix:5,4,8 160019235 | 75315811 | mix:6,2,9 160019233 | 75315811 | mix:1,9,8 160019271 | 75315815 | mix:9,0,9 160019269 | 75315815 | mix:9,8,9 160019267 | 75315815 | mix:9,2,2 160019265 | 75315815 | mix:2,2,8 160019263 | *75315811* | backup:1,9,1 160019261 | 75315815 | backup:2,0,9 select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315815) order by transactionid; gameplayid | transactionid | encodedplay ------------+---------------+-------------- 160019271 | 75315815 | mix:9,0,9 160019269 | 75315815 | mix:9,8,9 160019267 | 75315815 | mix:9,2,2 160019265 | 75315815 | mix:2,2,8 160019263 | *75315811* | backup:1,9,1 160019261 | 75315815 | backup:2,0,9 select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315811) order by transactionid; gameplayid | transactionid | encodedplay ------------+---------------+-------------- 160019239 | 75315811 | mix:5,2,7 160019237 | 75315811 | mix:5,4,8 160019235 | 75315811 | mix:6,2,9 160019233 | 75315811 | mix:1,9,8 So the record with gameplayid = 160019263 have a wrong transactionid, 75315811 instead of 75315815. The correct value is 75315815 and that I know because of the following facts: - on db1 and db2 transactionid = 75315815 for gameplayid = 160019263, - this table gets mostly inserts, very rare updates and only on other 2 fields not this one. - there is another parent table that shows the number of records in this table which is 4 for transactionid =75315811 and 6 for transactionid = 7531581. This table has an index by transactionid and that index seem correct because the filtering and the ordering are fine (like the filed has the correct value)... What puzzles me is that the value that shows in this field is a real value from another record... I only caught this issue because I have a script that runs in the night that compares the databases ... By now I updated the field with the correct value and everything seem stable. Postgres logs don't have any information about file corruption or any other kind of error. I also checked other logs on the system and I could not find any traces of corruption. select name, setting from pg_settings order by 1; name | setting -------------------------------------+----------------------------------------------------------------------------------------------------------------- allow_system_table_mods | off application_name | psql archive_command | test -f /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=60 -atz %p stldrdb:/data01/wal_files/%f archive_mode | on archive_timeout | 60 array_nulls | on authentication_timeout | 60 autovacuum | on autovacuum_analyze_scale_factor | 0.1 autovacuum_analyze_threshold | 50 autovacuum_freeze_max_age | 200000000 autovacuum_max_workers | 3 autovacuum_multixact_freeze_max_age | 400000000 autovacuum_naptime | 60 autovacuum_vacuum_cost_delay | 20 autovacuum_vacuum_cost_limit | -1 autovacuum_vacuum_scale_factor | 0.2 autovacuum_vacuum_threshold | 50 autovacuum_work_mem | -1 backslash_quote | safe_encoding bgwriter_delay | 200 bgwriter_lru_maxpages | 100 bgwriter_lru_multiplier | 2 block_size | 8192 bonjour | off bonjour_name | bytea_output | hex check_function_bodies | on checkpoint_completion_target | 0.5 checkpoint_segments | 16 checkpoint_timeout | 300 checkpoint_warning | 30 client_encoding | UTF8 client_min_messages | error commit_delay | 0 commit_siblings | 5 config_file | /data01/postgres/postgresql.conf constraint_exclusion | partition cpu_index_tuple_cost | 0.005 cpu_operator_cost | 0.0025 cpu_tuple_cost | 0.01 cursor_tuple_fraction | 0.1 data_checksums | off data_directory | /data01/postgres DateStyle | ISO, MDY db_user_namespace | off deadlock_timeout | 1000 debug_assertions | off debug_pretty_print | on debug_print_parse | off debug_print_plan | off debug_print_rewritten | off default_statistics_target | 100 default_tablespace | default_text_search_config | pg_catalog.english default_transaction_deferrable | off default_transaction_isolation | read committed default_transaction_read_only | off default_with_oids | off dynamic_library_path | $libdir dynamic_shared_memory_type | posix effective_cache_size | 1048576 effective_io_concurrency | 1 enable_bitmapscan | on enable_hashagg | on enable_hashjoin | on enable_indexonlyscan | on enable_indexscan | on enable_material | on enable_mergejoin | on enable_nestloop | on enable_seqscan | off enable_sort | on enable_tidscan | on escape_string_warning | on event_source | PostgreSQL exit_on_error | off external_pid_file | extra_float_digits | 0 from_collapse_limit | 8 fsync | on full_page_writes | on geqo | on geqo_effort | 5 geqo_generations | 0 geqo_pool_size | 0 geqo_seed | 0 geqo_selection_bias | 2 geqo_threshold | 12 gin_fuzzy_search_limit | 0 hba_file | /data01/postgres/pg_hba.conf hot_standby | off hot_standby_feedback | off huge_pages | try ident_file | /data01/postgres/pg_ident.conf ignore_checksum_failure | off ignore_system_indexes | off integer_datetimes | on IntervalStyle | postgres join_collapse_limit | 8 krb_caseins_users | off krb_server_keyfile | FILE:/etc/sysconfig/pgsql/krb5.keytab lc_collate | en_US.UTF-8 lc_ctype | en_US.UTF-8 lc_messages | en_US.UTF-8 lc_monetary | en_US.UTF-8 lc_numeric | en_US.UTF-8 lc_time | en_US.UTF-8 listen_addresses | * local_preload_libraries | lock_timeout | 0 lo_compat_privileges | off log_autovacuum_min_duration | -1 log_checkpoints | off log_connections | on log_destination | csvlog log_directory | pg_log log_disconnections | on log_duration | off log_error_verbosity | default log_executor_stats | off log_file_mode | 0600 log_filename | postgresql-%d.log logging_collector | on log_hostname | off log_line_prefix | %t %d %u log_lock_waits | off log_min_duration_statement | 2000 log_min_error_statement | error log_min_messages | error log_parser_stats | off log_planner_stats | off log_rotation_age | 1440 log_rotation_size | 0 log_statement | none log_statement_stats | off log_temp_files | -1 log_timezone | America/St_Lucia log_truncate_on_rotation | on maintenance_work_mem | 131072 max_connections | 300 max_files_per_process | 1000 max_function_args | 100 max_identifier_length | 63 max_index_keys | 32 max_locks_per_transaction | 64 max_pred_locks_per_transaction | 64 max_prepared_transactions | 0 max_replication_slots | 0 max_stack_depth | 2048 max_standby_archive_delay | 30000 max_standby_streaming_delay | 30000 max_wal_senders | 0 max_worker_processes | 8 password_encryption | on port | 5432 post_auth_delay | 0 pre_auth_delay | 0 quote_all_identifiers | off random_page_cost | 4 restart_after_crash | on search_path | "$user",public,hstore,dblink,www_fdw segment_size | 131072 seq_page_cost | 1 server_encoding | UTF8 server_version | 9.4.8 server_version_num | 90408 session_preload_libraries | session_replication_role | origin shared_buffers | 262144 shared_preload_libraries | sql_inheritance | on ssl | off ssl_ca_file | ssl_cert_file | server.crt ssl_ciphers | HIGH:MEDIUM:+3DES:!aNULL ssl_crl_file | ssl_ecdh_curve | prime256v1 ssl_key_file | server.key ssl_prefer_server_ciphers | on ssl_renegotiation_limit | 0 standard_conforming_strings | on statement_timeout | 0 stats_temp_directory | pg_stat_tmp superuser_reserved_connections | 3 synchronize_seqscans | on synchronous_commit | on synchronous_standby_names | syslog_facility | local0 syslog_ident | postgres tcp_keepalives_count | 0 tcp_keepalives_idle | 0 tcp_keepalives_interval | 0 temp_buffers | 1024 temp_file_limit | -1 temp_tablespaces | TimeZone | America/St_Lucia timezone_abbreviations | Default trace_notify | off trace_recovery_messages | log trace_sort | off track_activities | on track_activity_query_size | 1024 track_counts | on track_functions | none track_io_timing | off transaction_deferrable | off transaction_isolation | read committed transaction_read_only | off transform_null_equals | off unix_socket_directories | /var/run/postgresql, /tmp unix_socket_group | unix_socket_permissions | 0777 update_process_title | on vacuum_cost_delay | 0 vacuum_cost_limit | 200 vacuum_cost_page_dirty | 20 vacuum_cost_page_hit | 1 vacuum_cost_page_miss | 10 vacuum_defer_cleanup_age | 0 vacuum_freeze_min_age | 50000000 vacuum_freeze_table_age | 150000000 vacuum_multixact_freeze_min_age | 5000000 vacuum_multixact_freeze_table_age | 150000000 wal_block_size | 8192 wal_buffers | 128 wal_keep_segments | 0 wal_level | hot_standby wal_log_hints | off wal_receiver_status_interval | 10 wal_receiver_timeout | 60000 wal_segment_size | 2048 wal_sender_timeout | 60000 wal_sync_method | fdatasync wal_writer_delay | 200 work_mem | 2048 xmlbinary | base64 xmloption | content zero_damaged_pages | off (239 rows) Any suggestions, thoughts?