Re: [BUGS] Completely broken replica after PANIC: WAL contains references to invalid pages

2013-03-29 Thread anara...@anarazel.de
Hi

Sergey Konoplev wrote:

>Hi all,
>
>A couple of days ago I found the replica stopped after the PANIC
>message:
>
>PANIC:  WAL contains references to invalid pages
>
>When I tried to restart it I got this FATAL:
>
>FATAL:  could not access status of transaction 280557568
>
>Below is the description of the server and information from PostgreSQL
>and system logs. After googling the problem I have found nothing like
>this.
>
>Any thoughts of what it could be and how to prevent it in the future?

I think I see what's going on. Do you still have the datadir available? If so, 
could you send the pg_controldata output?
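
For anyone collecting the same information, here is a minimal sketch of capturing the pg_controldata output and turning it into key/value pairs. It assumes pg_controldata is on the PATH and that the data directory is passed as the positional argument; the helper names parse_controldata and read_controldata are made up for illustration and are not part of PostgreSQL.

```python
import subprocess


def parse_controldata(text):
    """Split 'Label:   value' lines from pg_controldata into a dict.

    Only the first colon separates label and value, so timestamps
    that contain colons stay intact.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_controldata(datadir):
    """Run pg_controldata against datadir (hypothetical helper).

    Assumes the binary is on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["pg_controldata", datadir],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_controldata(result.stdout)
```

Calling read_controldata with the replica's data directory should return a dict whose keys are the labels pg_controldata prints, which makes it easy to paste only the relevant fields into a reply.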


Andres

>Hardware:
>
>IBM System x3650 M4, 148GB RAM, NAS
>
>Software:
>
>PostgreSQL 9.2.3, yum.postgresql.org
>CentOS 6.3, kernel 2.6.32-279.22.1.el6.x86_64
>
>Configuration:
>
>listen_addresses = '*'
>max_connections = 550
>shared_buffers = 35GB
>work_mem = 256MB
>maintenance_work_mem = 1GB
>bgwriter_delay = 10ms
>bgwriter_lru_multiplier = 10.0
>effective_io_concurrency = 32
>wal_level = hot_standby
>synchronous_commit = off
>checkpoint_segments = 1024
>checkpoint_timeout = 1h
>checkpoint_completion_target = 0.9
>checkpoint_warning = 5min
>max_wal_senders = 3
>wal_keep_segments = 2048
>hot_standby = on
>max_standby_streaming_delay = 5min
>hot_standby_feedback = on
>effective_cache_size = 133GB
>log_directory = '/var/log/pgsql'
>log_filename = 'postgresql-%Y-%m-%d.log'
>log_checkpoints = on
>log_line_prefix = '%t %p %u@%d from %h [vxid:%v txid:%x] [%i] '
>log_lock_waits = on
>log_statement = 'ddl'
>log_timezone = 'W-SU'
>track_activity_query_size = 4096
>autovacuum_max_workers = 5
>autovacuum_naptime = 5s
>autovacuum_vacuum_scale_factor = 0.05
>autovacuum_analyze_scale_factor = 0.05
>autovacuum_vacuum_cost_delay = 5ms
>datestyle = 'iso, dmy'
>timezone = 'W-SU'
>lc_messages = 'en_US.UTF-8'
>lc_monetary = 'ru_RU.UTF-8'
>lc_numeric = 'ru_RU.UTF-8'
>lc_time = 'ru_RU.UTF-8'
>default_text_search_config = 'pg_catalog.russian'
>
>System:
>
># Controls the maximum shared segment size, in bytes
>kernel.shmmax = 53287555072
>
># Controls the maximum number of shared memory segments, in pages
>kernel.shmall = 13009657
>
># Maximum number of file-handles
>fs.file-max = 65535
>
># pdflush tuning to prevent lag spikes
>vm.dirty_ratio = 10
>vm.dirty_background_ratio = 1
>vm.dirty_expire_centisecs = 499
>
># Prevent the scheduler breakdown
>kernel.sched_migration_cost = 500
>
># Turned off to provide more CPU to PostgreSQL
>kernel.sched_autogroup_enabled = 0
>
># Setup hugepages
>vm.hugetlb_shm_group = 26
>vm.hugepages_treat_as_movable = 0
>vm.nr_overcommit_hugepages = 512
>
># The Huge Page Size is 2048kB, so for 35GB shared buffers the number is 17920
>vm.nr_hugepages = 17920
>
># Turn off the NUMA local pages reclaim as it leads to wrong caching strategy for databases
>vm.zone_reclaim_mode = 0
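
The huge page and shmall figures quoted above can be cross-checked with a little arithmetic. This is only a sketch: it assumes that "35GB" means 35 GiB (1024-based units), and that the base page size is 4096 bytes, which is typical for x86_64 but is not stated in the report. The function names are made up for illustration.

```python
KIB = 1024
GIB = 1024 ** 3


def hugepages_needed(shared_bytes, hugepage_bytes):
    """Number of huge pages needed to cover shared_bytes (rounded up)."""
    return -(-shared_bytes // hugepage_bytes)


def shmall_pages(shmmax_bytes, page_bytes=4096):
    """kernel.shmall is counted in pages; page_bytes=4096 is an assumption."""
    return -(-shmmax_bytes // page_bytes)


# 35 GiB of shared_buffers with 2048 kB huge pages
print(hugepages_needed(35 * GIB, 2048 * KIB))  # 17920

# kernel.shmmax from the report, expressed in 4096-byte pages
print(shmall_pages(53287555072))  # 13009657
```

Both results match the quoted vm.nr_hugepages and kernel.shmall values. Note that the shared memory segment PostgreSQL requests is somewhat larger than shared_buffers alone, so 17920 pages covers only the buffers themselves.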
>
>Environment:
>
>HUGETLB_SHM=yes
>LD_PRELOAD='/usr/lib64/libhugetlbfs.so'
>export HUGETLB_SHM LD_PRELOAD
>
>When it is stopped:
>
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>restartpoint complete: wrote 1685004 buffers (36.7%); 0 transaction
>log file(s) added, 0 removed, 555 recycled; write=3237.402 s,
>sync=0.071 s, total=3237.507 s; sync files=2673, longest=0.008 s,
>average=0.000 s
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:  recovery
>restart point at 2538/6E154AC0
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] DETAIL:  last
>completed transaction was at log time 2013-03-26 11:50:31.613948+04
>2013-03-26 11:50:32 MSK 3775 @ from  [vxid: txid:0] [] LOG:
>restartpoint starting: xlog
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] WARNING:
>page 451 of relation base/16436/2686702648 is uninitialized
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>lastBlockVacuumed 0
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] PANIC:  WAL
>contains references to invalid pages
>2013-03-26 11:51:16 MSK 3773 @ from  [vxid:1/0 txid:0] [] CONTEXT:
>xlog redo vacuum: rel 1663/16436/2686702648; blk 2485,
>lastBlockVacuumed 0
>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:  startup
>process (PID 3773) was terminated by signal 6: Aborted
>2013-03-26 11:51:16 MSK 3770 @ from  [vxid: txid:0] [] LOG:
>terminating any other active server processes
>
>From /var/log/messages:
>
>Mar 26 10:50:52 tms2 kernel: : postmaster: page allocation failure.
>order:8, mode:0xd0
>Mar 26 10:50:52 tms2 kernel: : Pid: 3774, comm: postmaster Not tainted
>2.6.32-279.22.1.el6.x86_64 #1
>Mar 26 10:50:52 tms2 kernel: : Call Trace:
>Mar 26 10:50:52 tms2 kernel: : [] ?
>__alloc_pages_nodemask+0x77f/0x940
>Mar 26 10:50:52 tms2 kernel: : [] ?
>kmem_getpages+0x62/0x170
>Mar 26 10:50:52 tms2 kernel: : [] ?
>fallback_alloc+0x1ba/0x270
>Mar 26 10:50:52 tms2 kernel: : [] ?
>cache_grow+0x2cf/0x320
>Mar 26 10:50:52 tms2 kernel: : [] ?
>cache_alloc_node+0x99/0x160
>Mar 26 10:50:52 tms2 kernel: : [] ?
>dma_pin_

Re: [BUGS] BUG #6378: exceeding memory usage while creating index in pg-9.1.2

2012-01-04 Thread anara...@anarazel.de
This has been fixed since 9.1.2. If you browse the history and/or this mailing 
list you should find a reference.

I just have my mobile here, so it's a bit too hard to search for a reference.

Greetings,

Andres

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs