On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 08/08/2016 09:11 AM, Ioana Danes wrote:
>
>> Hi,
>>
>> I suspect I am having a case of data corruption. Here are the details:
>>
>> I am running postgres 9.4.8:
>>
>> postgresql94-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
>>
>> on CentOS Linux release 7.2.1511 (Core)
>>
>> This is happening in a production environment but luckily on the
>> reporting database.
>> I have a cluster of 3 databases, db1 and db2 are masters and replicate
>> between each other and also replicate to db3 (db1 <-> db2, db1 -> db3,
>> db2 -> db3).
>> For replication I am using Bucardo.
>>
>
> I would say this is more a question for the Burcardo list:
>
> https://mail.endcrypt.com/mailman/listinfo/bucardo-general
>
> I am just not seeing that replicating two masters on to a single database
> is going to end well.
>

Only one master is active at one time the other one is in stand by that is
a topic for another discussion but in our case that works well.

That was my first assumption, that it is a kind of a race condition or a
bug on replication but I quickly ruled that out because that does not
explain why when I filtered the table by transactionid = 75315815 it shows
one record with transactionid 75315811...

select gameplayid, transactionid, encodedplay from abrazo.matchgameplay
where transactionid in (75315815) order by transactionid;;
 gameplayid | transactionid | encodedplay
------------+---------------+--------------
  160019271 |      75315815 | mix:9,0,9
  160019269 |      75315815 | mix:9,8,9
  160019267 |      75315815 | mix:9,2,2
  160019265 |      75315815 | mix:2,2,8
  160019263 |      *75315811 *| backup:1,9,1
  160019261 |      75315815 | backup:2,0,9

So I don't think it is a replication issue...



>
>
>> The problem I am having is that one record in a table it shows a wrong
>> value for one single field:
>>
>> select gameplayid, transactionid, encodedplay from mytable where
>> transactionid in (75315811, 75315815) order by transactionid;
>>
>>  gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>>   160019239 |      75315811 | mix:5,2,7
>>   160019237 |      75315811 | mix:5,4,8
>>   160019235 |      75315811 | mix:6,2,9
>>   160019233 |      75315811 | mix:1,9,8
>>   160019271 |      75315815 | mix:9,0,9
>>   160019269 |      75315815 | mix:9,8,9
>>   160019267 |      75315815 | mix:9,2,2
>>   160019265 |      75315815 | mix:2,2,8
>>   160019263 |      *75315811* | backup:1,9,1
>>   160019261 |      75315815 | backup:2,0,9
>>
>> select gameplayid, transactionid, encodedplay from mytable where
>> transactionid in (75315815) order by transactionid;
>>
>>  gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>>   160019271 |      75315815 | mix:9,0,9
>>   160019269 |      75315815 | mix:9,8,9
>>   160019267 |      75315815 | mix:9,2,2
>>   160019265 |      75315815 | mix:2,2,8
>>   160019263 |      *75315811* | backup:1,9,1
>>   160019261 |      75315815 | backup:2,0,9
>>
>> select gameplayid, transactionid, encodedplay from mytable where
>> transactionid in (75315811) order by transactionid;
>>
>>  gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>>   160019239 |      75315811 | mix:5,2,7
>>   160019237 |      75315811 | mix:5,4,8
>>   160019235 |      75315811 | mix:6,2,9
>>   160019233 |      75315811 | mix:1,9,8
>>
>> So the record with gameplayid = 160019263 have a wrong transactionid,
>> 75315811 instead of 75315815.
>> The correct value is 75315815 and that I know because of the following
>> facts:
>> - on db1 and db2 transactionid = 75315815 for gameplayid = 160019263,
>> - this table gets mostly inserts, very rare updates and only on other 2
>> fields not this one.
>> - there is another parent table that shows the number of records in this
>> table which is 4 for transactionid =75315811  and 6 for transactionid =
>> 7531581.
>>
>> This table has an index by transactionid and that index seem correct
>> because the filtering and the ordering are fine (like the filed has the
>> correct value)...
>>
>> What puzzles me is that the value that shows in this field is a real
>> value from another record...
>>
>> I only caught this issue because I have a script that runs in the night
>> that compares the databases ...
>>
>> By now I updated the field with the correct value and everything seem
>> stable.
>>
>> Postgres logs don't have any information about file corruption or any
>> other kind of error. I also checked other logs on the system and I could
>> not find any traces of corruption.
>>
>> select name, setting from pg_settings order by 1;
>>                 name
>> |                                                     setting
>> -------------------------------------+----------------------
>> ------------------------------------------------------------
>> -------------------------------
>>  allow_system_table_mods             | off
>>  application_name                    | psql
>>  archive_command                     | test -f
>> /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=60 -atz
>> %p stldrdb:/data01/wal_files/%f
>>  archive_mode                        | on
>>  archive_timeout                     | 60
>>  array_nulls                         | on
>>  authentication_timeout              | 60
>>  autovacuum                          | on
>>  autovacuum_analyze_scale_factor     | 0.1
>>  autovacuum_analyze_threshold        | 50
>>  autovacuum_freeze_max_age           | 200000000
>>  autovacuum_max_workers              | 3
>>  autovacuum_multixact_freeze_max_age | 400000000
>>  autovacuum_naptime                  | 60
>>  autovacuum_vacuum_cost_delay        | 20
>>  autovacuum_vacuum_cost_limit        | -1
>>  autovacuum_vacuum_scale_factor      | 0.2
>>  autovacuum_vacuum_threshold         | 50
>>  autovacuum_work_mem                 | -1
>>  backslash_quote                     | safe_encoding
>>  bgwriter_delay                      | 200
>>  bgwriter_lru_maxpages               | 100
>>  bgwriter_lru_multiplier             | 2
>>  block_size                          | 8192
>>  bonjour                             | off
>>  bonjour_name                        |
>>  bytea_output                        | hex
>>  check_function_bodies               | on
>>  checkpoint_completion_target        | 0.5
>>  checkpoint_segments                 | 16
>>  checkpoint_timeout                  | 300
>>  checkpoint_warning                  | 30
>>  client_encoding                     | UTF8
>>  client_min_messages                 | error
>>  commit_delay                        | 0
>>  commit_siblings                     | 5
>>  config_file                         | /data01/postgres/postgresql.conf
>>  constraint_exclusion                | partition
>>  cpu_index_tuple_cost                | 0.005
>>  cpu_operator_cost                   | 0.0025
>>  cpu_tuple_cost                      | 0.01
>>  cursor_tuple_fraction               | 0.1
>>  data_checksums                      | off
>>  data_directory                      | /data01/postgres
>>  DateStyle                           | ISO, MDY
>>  db_user_namespace                   | off
>>  deadlock_timeout                    | 1000
>>  debug_assertions                    | off
>>  debug_pretty_print                  | on
>>  debug_print_parse                   | off
>>  debug_print_plan                    | off
>>  debug_print_rewritten               | off
>>  default_statistics_target           | 100
>>  default_tablespace                  |
>>  default_text_search_config          | pg_catalog.english
>>  default_transaction_deferrable      | off
>>  default_transaction_isolation       | read committed
>>  default_transaction_read_only       | off
>>  default_with_oids                   | off
>>  dynamic_library_path                | $libdir
>>  dynamic_shared_memory_type          | posix
>>  effective_cache_size                | 1048576
>>  effective_io_concurrency            | 1
>>  enable_bitmapscan                   | on
>>  enable_hashagg                      | on
>>  enable_hashjoin                     | on
>>  enable_indexonlyscan                | on
>>  enable_indexscan                    | on
>>  enable_material                     | on
>>  enable_mergejoin                    | on
>>  enable_nestloop                     | on
>>  enable_seqscan                      | off
>>  enable_sort                         | on
>>  enable_tidscan                      | on
>>  escape_string_warning               | on
>>  event_source                        | PostgreSQL
>>  exit_on_error                       | off
>>  external_pid_file                   |
>>  extra_float_digits                  | 0
>>  from_collapse_limit                 | 8
>>  fsync                               | on
>>  full_page_writes                    | on
>>  geqo                                | on
>>  geqo_effort                         | 5
>>  geqo_generations                    | 0
>>  geqo_pool_size                      | 0
>>  geqo_seed                           | 0
>>  geqo_selection_bias                 | 2
>>  geqo_threshold                      | 12
>>  gin_fuzzy_search_limit              | 0
>>  hba_file                            | /data01/postgres/pg_hba.conf
>>  hot_standby                         | off
>>  hot_standby_feedback                | off
>>  huge_pages                          | try
>>  ident_file                          | /data01/postgres/pg_ident.conf
>>  ignore_checksum_failure             | off
>>  ignore_system_indexes               | off
>>  integer_datetimes                   | on
>>  IntervalStyle                       | postgres
>>  join_collapse_limit                 | 8
>>  krb_caseins_users                   | off
>>  krb_server_keyfile                  | FILE:/etc/sysconfig/pgsql/krb5
>> .keytab
>>  lc_collate                          | en_US.UTF-8
>>  lc_ctype                            | en_US.UTF-8
>>  lc_messages                         | en_US.UTF-8
>>  lc_monetary                         | en_US.UTF-8
>>  lc_numeric                          | en_US.UTF-8
>>  lc_time                             | en_US.UTF-8
>>  listen_addresses                    | *
>>  local_preload_libraries             |
>>  lock_timeout                        | 0
>>  lo_compat_privileges                | off
>>  log_autovacuum_min_duration         | -1
>>  log_checkpoints                     | off
>>  log_connections                     | on
>>  log_destination                     | csvlog
>>  log_directory                       | pg_log
>>  log_disconnections                  | on
>>  log_duration                        | off
>>  log_error_verbosity                 | default
>>  log_executor_stats                  | off
>>  log_file_mode                       | 0600
>>  log_filename                        | postgresql-%d.log
>>  logging_collector                   | on
>>  log_hostname                        | off
>>  log_line_prefix                     | %t %d %u
>>  log_lock_waits                      | off
>>  log_min_duration_statement          | 2000
>>  log_min_error_statement             | error
>>  log_min_messages                    | error
>>  log_parser_stats                    | off
>>  log_planner_stats                   | off
>>  log_rotation_age                    | 1440
>>  log_rotation_size                   | 0
>>  log_statement                       | none
>>  log_statement_stats                 | off
>>  log_temp_files                      | -1
>>  log_timezone                        | America/St_Lucia
>>  log_truncate_on_rotation            | on
>>  maintenance_work_mem                | 131072
>>  max_connections                     | 300
>>  max_files_per_process               | 1000
>>  max_function_args                   | 100
>>  max_identifier_length               | 63
>>  max_index_keys                      | 32
>>  max_locks_per_transaction           | 64
>>  max_pred_locks_per_transaction      | 64
>>  max_prepared_transactions           | 0
>>  max_replication_slots               | 0
>>  max_stack_depth                     | 2048
>>  max_standby_archive_delay           | 30000
>>  max_standby_streaming_delay         | 30000
>>  max_wal_senders                     | 0
>>  max_worker_processes                | 8
>>  password_encryption                 | on
>>  port                                | 5432
>>  post_auth_delay                     | 0
>>  pre_auth_delay                      | 0
>>  quote_all_identifiers               | off
>>  random_page_cost                    | 4
>>  restart_after_crash                 | on
>>  search_path                         | "$user",public,hstore,dblink,w
>> ww_fdw
>>  segment_size                        | 131072
>>  seq_page_cost                       | 1
>>  server_encoding                     | UTF8
>>  server_version                      | 9.4.8
>>  server_version_num                  | 90408
>>  session_preload_libraries           |
>>  session_replication_role            | origin
>>  shared_buffers                      | 262144
>>  shared_preload_libraries            |
>>  sql_inheritance                     | on
>>  ssl                                 | off
>>  ssl_ca_file                         |
>>  ssl_cert_file                       | server.crt
>>  ssl_ciphers                         | HIGH:MEDIUM:+3DES:!aNULL
>>  ssl_crl_file                        |
>>  ssl_ecdh_curve                      | prime256v1
>>  ssl_key_file                        | server.key
>>  ssl_prefer_server_ciphers           | on
>>  ssl_renegotiation_limit             | 0
>>  standard_conforming_strings         | on
>>  statement_timeout                   | 0
>>  stats_temp_directory                | pg_stat_tmp
>>  superuser_reserved_connections      | 3
>>  synchronize_seqscans                | on
>>  synchronous_commit                  | on
>>  synchronous_standby_names           |
>>  syslog_facility                     | local0
>>  syslog_ident                        | postgres
>>  tcp_keepalives_count                | 0
>>  tcp_keepalives_idle                 | 0
>>  tcp_keepalives_interval             | 0
>>  temp_buffers                        | 1024
>>  temp_file_limit                     | -1
>>  temp_tablespaces                    |
>>  TimeZone                            | America/St_Lucia
>>  timezone_abbreviations              | Default
>>  trace_notify                        | off
>>  trace_recovery_messages             | log
>>  trace_sort                          | off
>>  track_activities                    | on
>>  track_activity_query_size           | 1024
>>  track_counts                        | on
>>  track_functions                     | none
>>  track_io_timing                     | off
>>  transaction_deferrable              | off
>>  transaction_isolation               | read committed
>>  transaction_read_only               | off
>>  transform_null_equals               | off
>>  unix_socket_directories             | /var/run/postgresql, /tmp
>>  unix_socket_group                   |
>>  unix_socket_permissions             | 0777
>>  update_process_title                | on
>>  vacuum_cost_delay                   | 0
>>  vacuum_cost_limit                   | 200
>>  vacuum_cost_page_dirty              | 20
>>  vacuum_cost_page_hit                | 1
>>  vacuum_cost_page_miss               | 10
>>  vacuum_defer_cleanup_age            | 0
>>  vacuum_freeze_min_age               | 50000000
>>  vacuum_freeze_table_age             | 150000000
>>  vacuum_multixact_freeze_min_age     | 5000000
>>  vacuum_multixact_freeze_table_age   | 150000000
>>  wal_block_size                      | 8192
>>  wal_buffers                         | 128
>>  wal_keep_segments                   | 0
>>  wal_level                           | hot_standby
>>  wal_log_hints                       | off
>>  wal_receiver_status_interval        | 10
>>  wal_receiver_timeout                | 60000
>>  wal_segment_size                    | 2048
>>  wal_sender_timeout                  | 60000
>>  wal_sync_method                     | fdatasync
>>  wal_writer_delay                    | 200
>>  work_mem                            | 2048
>>  xmlbinary                           | base64
>>  xmloption                           | content
>>  zero_damaged_pages                  | off
>> (239 rows)
>>
>>
>> Any suggestions, thoughts?
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Reply via email to