Re: [GENERAL] MultiXact member wraparound protections are disabled

2016-10-13 Thread AnandKumar, Karthik
Thank you for your help Alvaro - we really appreciate it.
The error in fact stopped this morning - we took downtime and ran a vacuum
across all of our tables, and saw increased autovacuum activity as well.

It looks like it bumped up the oldest multixid to something other than 1 now:

postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata 
/var/lib/pgsql/cmates/data | grep -i multi 
Latest checkpoint's NextMultiXactId: 785051 
Latest checkpoint's NextMultiOffset: 1446371 
Latest checkpoint's oldestMultiXid: 575211 
Latest checkpoint's oldestMulti's DB: 12998
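
For reference, the same horizon can be watched from SQL as autovacuum keeps
advancing it (a minimal sketch against the 9.4 catalogs; mxid_age() is not
available there, so only the raw value is shown):

-- per-database multixact horizon, the catalog counterpart of oldestMultiXid
SELECT datname, datminmxid FROM pg_database;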




Re: [GENERAL] MultiXact member wraparound protections are disabled

2016-10-13 Thread AnandKumar, Karthik
Thanks. We started seeing this error right after a SAN FC re-cable effort - so 
yes, that would make sense. 
We’ll do a little more digging to see if the file could have gotten removed.
If that’s an older file that we have in our filesystem backups, is it safe to 
restore from there?




On 10/13/16, 3:30 PM, "Alvaro Herrera"  wrote:

>AnandKumar, Karthik wrote:
>> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
>>   0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B  000C 
>>  000D  000E  000F  0010  0011  0012  0013  0014  0015  0016  0017  0018  
>> 0019  001A  001B
>> root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
>> 0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B
>
>> postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata 
>> /var/lib/pgsql/cmates/data
>
>> Latest checkpoint's NextMultiXactId:  784503
>> Latest checkpoint's NextMultiOffset:  1445264
>> Latest checkpoint's oldestMultiXid:   1
>> Latest checkpoint's oldestMulti's DB: 16457
>
>This looks perfectly normal, except that the pg_multixact/offsets/0000
>file is gone.  oldestMultiXid is 1, so I don't see how the file could have
>gotten removed.  Has this been upgraded recently from a previous
>9.3 or 9.4 version?  There have been bugs in this area but they've been
>fixed now for some time.
>
>The 0000 file could have been removed manually, perhaps?
>
>-- 
>Álvaro Herrera    https://www.2ndQuadrant.com/
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [GENERAL] MultiXact member wraparound protections are disabled

2016-10-12 Thread AnandKumar, Karthik
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/members
  0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B  000C  
000D  000E  000F  0010  0011  0012  0013  0014  0015  0016  0017  0018  0019  
001A  001B
root@site-db01a:/var/lib/pgsql/cmates/data # ls pg_multixact/offsets
0001  0002  0003  0004  0005  0006  0007  0008  0009  000A  000B



postgres@site-db01a:~ $ /usr/pgsql-9.4/bin/pg_controldata 
/var/lib/pgsql/cmates/data
pg_control version number:   942
Catalog version number:   201409291
Database system identifier:   6228991221455883206
Database cluster state:   in production
pg_control last modified: Wed 12 Oct 2016 05:22:45 PM PDT
Latest checkpoint location:   62D0/BDE939F8
Prior checkpoint location:   62CF/F039BFD0
Latest checkpoint's REDO location:   62D0/8A060220
Latest checkpoint's REDO WAL file:   000162D0008A
Latest checkpoint's TimeLineID:   1
Latest checkpoint's PrevTimeLineID:   1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID:  1/1834305762
Latest checkpoint's NextOID:  19540327
Latest checkpoint's NextMultiXactId:  784503
Latest checkpoint's NextMultiOffset:  1445264
Latest checkpoint's oldestXID:   226141373
Latest checkpoint's oldestXID's DB:   16457
Latest checkpoint's oldestActiveXID:  1834302410
Latest checkpoint's oldestMultiXid:   1
Latest checkpoint's oldestMulti's DB: 16457
Time of latest checkpoint:   Wed 12 Oct 2016 05:22:05 PM PDT
Fake LSN counter for unlogged rels:   0/1
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline:   0
Backup start location:0/0
Backup end location:  0/0
End-of-backup record required:no
Current wal_level setting:   hot_standby
Current wal_log_hints setting:   off
Current max_connections setting:  1500
Current max_worker_processes setting: 8
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   1000
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Size of a large-object chunk: 2048
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value
Data page checksum version:   0






On 10/13/16, 5:28 AM, "Alvaro Herrera"  wrote:

>AnandKumar, Karthik wrote:
>> Hi,
>> 
>> We run postgres 9.4.5.
>> 
>> Starting this morning, we started seeing messages like the below:
>> Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] 
>> app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled 
>> because oldest checkpointed MultiXact 1 does not exist on disk
>> Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] 
>> app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled 
>> because oldest checkpointed MultiXact 1 does not exist on disk
>> Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] 
>> app=,user=,db=,ip=LOG:  MultiXact member wraparound protections are disabled 
>> because oldest checkpointed MultiXact 1 does not exist on disk
>> 
>> Our autovacuum_freeze_max_age = 1750000000.
>> 
>> site=# SELECT datname, age(datfrozenxid) FROM pg_database;
>>  datname  |age
>> ---+
>> site  | 1645328344
>> template0 | 1274558807
>> bench | 1274558807
>> postgres  | 1324283514
>> template1 | 1274558807
>> 
>> So we’re about 100 mil transactions away before we start vacuuming to 
>> prevent wraparound.
>> 
>> We’re running precautionary vacuums on our largest offenders to try and drop 
>> our transaction ids
>> 
>> What I’d request some clarity on is the message above. What does it mean 
>> that "oldest checkpointed MultiXact does not exist on disk”? Would we lose 
>> data if we did have to wrap around?
>> 
>> Is this telling us we’re not vacuuming effectively enough?
>
>Ugh.  Can you share the output of pg_controldata and the list of files
>in pg_multixact/members and pg_multixact/offset?
>
>The problem here is that multixact vacuuming is separate from xid
>vacuuming, so you need to be looking at datminmulti rather than
>datfrozenxid.  It may be that multixact wrap around has already
>occurred.
>
>-- 
>Álvaro Herrera    https://www.2ndQuadrant.com/
>PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
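
As a side note on the "datminmulti" pointer above: the catalog columns are
pg_database.datminmxid and pg_class.relminmxid, and the multixact-specific
autovacuum trigger is autovacuum_multixact_freeze_max_age. A minimal sketch
against the 9.3/9.4 catalogs (the LIMIT is only illustrative):

-- the setting that drives anti-wraparound vacuums for multixacts
SHOW autovacuum_multixact_freeze_max_age;

-- per-table multixact horizon, the multixact analogue of relfrozenxid
SELECT relname, relminmxid
  FROM pg_class
 WHERE relkind = 'r'
 LIMIT 10;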



[GENERAL] MultiXact member wraparound protections are disabled

2016-10-12 Thread AnandKumar, Karthik
Hi,

We run postgres 9.4.5.

Starting this morning, we started seeing messages like the below:
Oct 12 14:07:15 site-db01a postgres[11253]: [106430-1] app=,user=,db=,ip=LOG:  
MultiXact member wraparound protections are disabled because oldest 
checkpointed MultiXact 1 does not exist on disk
Oct 12 14:09:26 site-db01a postgres[11253]: [106526-1] app=,user=,db=,ip=LOG:  
MultiXact member wraparound protections are disabled because oldest 
checkpointed MultiXact 1 does not exist on disk
Oct 12 14:14:18 site-db01a postgres[11253]: [106608-1] app=,user=,db=,ip=LOG:  
MultiXact member wraparound protections are disabled because oldest 
checkpointed MultiXact 1 does not exist on disk

Our autovacuum_freeze_max_age = 1750000000.

site=# SELECT datname, age(datfrozenxid) FROM pg_database;
 datname  |age
---+
site  | 1645328344
template0 | 1274558807
bench | 1274558807
postgres  | 1324283514
template1 | 1274558807

So we’re about 100 mil transactions away before we start vacuuming to prevent 
wraparound.

We’re running precautionary vacuums on our largest offenders to try and drop
our transaction IDs.
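
(A minimal sketch of one way to pick such offenders, ordering tables by how far
their relfrozenxid has fallen behind; the LIMIT is only illustrative:)

SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY age(relfrozenxid) DESC
 LIMIT 10;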

What I’d request some clarity on is the message above. What does it mean that 
"oldest checkpointed MultiXact does not exist on disk”? Would we lose data if 
we did have to wrap around?

Is this telling us we’re not vacuuming effectively enough?

Thanks,
Karthik


[GENERAL] Online backup of PostgreSQL data.

2015-09-17 Thread Raman, Karthik IN BLR STS
Hi All,

I need a small clarification. We are using PostgreSQL version 9.4.1

We have a requirement to take an online backup of a running PostgreSQL system
(without stopping the database).
As per the design / architecture of PostgreSQL, is it technically allowed to
dynamically copy the "data" folder of a running PostgreSQL system (without
using any replication mechanism)?
While restoring (we are allowed to stop the database!), can we directly restore
this data folder? Are there any potential problems we will come across by doing
it this way?
Or should we use something like pg_basebackup to achieve the same? (In our
case, we do not need replication to a different machine; it is sufficient for
us to take the backup on the same machine.)
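
For reference, a plain file copy of a running cluster is not consistent on its
own; the documented low-level approach brackets the copy with a backup mode and
relies on the WAL written while the copy runs. A minimal sketch, assuming
wal_level is at least 'archive' and WAL archiving is configured (the label is
arbitrary; the file copy itself happens outside SQL):

SELECT pg_start_backup('manual_copy', true);  -- second argument requests a fast checkpoint
-- now copy the whole data directory with any file-level tool (tar, rsync, ...)
SELECT pg_stop_backup();
-- the copy is only restorable together with the WAL archived between these calls

pg_basebackup automates exactly this sequence, so it is usually the simpler
option even when the backup stays on the same machine.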

Any suggestions / help is highly appreciated. Thanks in advance!

With best regards,
Karthik R






[GENERAL] live and dead tuples are zero on slave running on tmpfs

2015-08-19 Thread Karthik Viswanathan
Hello,

I have a master-slave (read replica) setup running pg 9.4.4. I'm
trying to find the dead tuple counts on both the master and the slave.

Here's what it looks like on master:

# select relname ,n_live_tup ,n_dead_tup from pg_stat_user_tables;
 relname  | n_live_tup | n_dead_tup
--++
 test_52 |4998366 |  0
 test_v2 |   25182728 |4086591
 test_1mrows |1000127 |  0

That seems legit because I did an update to ~4 million rows just before this.

Here's what it looks like on the slave, though:

 #select relname ,n_live_tup ,n_dead_tup from pg_stat_user_tables;
 relname  | n_live_tup | n_dead_tup
--++
 test_52 |  0 |  0
 test_v2 |  0 |  0
 test_1mrows |  0 |  0

The postgres data directory on the slave is configured on a tmpfs-mounted
store. Would this cause it to have zero live & dead tuples?

postgres version on master and slave:
PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu
4.8.2-19ubuntu1) 4.8.2, 64-bit

-- Karthik




[GENERAL] Question on session_replication_role

2015-02-02 Thread Anand Kumar, Karthik
Our set up:
* Db version: postgres 9.3.4
* OS: CentOS 5.6
* kernel Version - Linux 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 
EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
* memory - 256 GB
* We have slony replicating to this DB from the primary server
* Some of the DB parameters:
max_connections  - 1500
shared_buffers - 8GB
work_mem = 256MB
maintenance_work_mem = 1GB

Problem statement:
On Jan 27th 2015:
We inserted 77k records to a table with 780 million records by disabling 
triggers using SET session_replication_role = replica;

Table definition:
site=# \d reg_email_subscriptions
   Table "emailsubscription.reg_email_subscriptions"
  Column  |  Type  | Modifiers
--++---
registration_id  | bigint | not null
subscription_id  | bigint | not null
creation_date| timestamp(0) without time zone | not null
last_update_date | timestamp(0) without time zone | not null
Number of child tables: 8 (Use \d+ to list them.)
Tablespace: "emailsubscription"

Indexes/constraints on each of the 8 partitions:
"reg_email_subscriptions_p00_pkey" PRIMARY KEY, btree (registration_id, 
subscription_id), tablespace "emailsubscription"
"reg_email_subscriptions_p00_n1" btree (subscription_id), tablespace 
"emailsubscription"
"reg_email_subscriptions_p00_n20" btree (last_update_date), tablespace 
"emailsubscription"
"reg_email_subscriptions_p00_n3" btree (creation_date), tablespace 
“emailsubscription"

Soon after learning that using this setting also disables PK on the table, we 
manually deleted 117 duplicate records that got inserted.

On Jan 28th 2015:
The query that was taking 7 secs started taking over 30 seconds.
SELECT DISTINCT it.recipient_id  FROM  iru.iru_tags it WHERE it.recipient_id 
BETWEEN 758587587 and 968696896 AND   it.status = 0 AND   it.last_update_date 
>= date_trunc('day', now() - interval '90 days') AND EXISTS (SELECT DISTINCT 
res.registration_id FROM emailsubscription.reg_email_subscriptions res WHERE 
res.registration_id = it.recipient_id AND res.subscription_id = 200400);

Table definition of the other table involved in the query (iru.iru_tags): - No 
manual DML was done on this table.
site=# \d iru_tags
Table "iru.iru_tags"
  Column  |Type | Modifiers
--+-+---
recipient_id | bigint  | not null
tagger_id| bigint  | not null
tag_id   | integer | not null
batch_id | integer | not null
status   | integer |
creation_date| timestamp without time zone | not null
last_update_date | timestamp without time zone | not null
Indexes:
"iru_tags_pk" PRIMARY KEY, btree (recipient_id, tagger_id, batch_id, tag_id)
"iru_tags_n1" btree (recipient_id, tag_id)
"iru_tags_n2" btree (last_update_date)
"iru_tags_n3" btree (creation_date)
"iru_tags_n31" btree (status, recipient_id, last_update_date)
"iru_tags_n4" btree (tagger_id)


Observation:

1)  The query was taking 2-7 seconds consistently until the day before we 
inserted records by disabling the triggers.

2)  When we run the selects on each table separately, the query runs really 
fast. The moment we introduce the join (AND EXISTS), the sql takes over 30 
seconds.

3)  The explain plan of this query shows that the primary key on
reg_email_subscriptions and the unique index on the iru_tags table are being used.

site=# explain(analyze on, verbose on, costs on, buffers on, timing on) SELECT 
DISTINCT it.recipient_id  FROM  iru.iru_tags it WHERE it.recipient_id BETWEEN 
758587587 and 968696896 AND   it.status = 0 AND   it.last_update_date >= 
date_trunc('day', now() - interval '90 days') AND EXISTS (SELECT DISTINCT 
res.registration_id FROM emailsubscription.reg_email_subscriptions res WHERE 
res.registration_id = it.recipient_id AND res.subscription_id = 200400);


                                  QUERY PLAN
--------------------------------------------------------------------------------
 Unique  (cost=0.57..290191.08 rows=45 width=8) (actual time=89.536..89.536 rows=0 loops=1)
   Output: it.recipient_id
   Buffers: shared hit=9 read=1
   I/O Timings: read=47.097
   ->  Nested Loop Semi Join  (cost=0.57..290187.87 rows=1286 width=8) (actual time=89.534..89.534 rows=0 loops=1)
         Output: it.recipient_id
         Buffers: shared hit=9 read=1
         I/O Timings: read=47.097
         ->  Index Only Scan using iru_tags_n31 on iru.iru_tags it  (cost=0.57..319.84 rows=1286 width=8) (actual time=89.532..89.532 rows=0 loops=1

[GENERAL] vacuum full post 9.0 - reindex needed?

2015-01-28 Thread Anand Kumar, Karthik
I haven’t been able to find clear information online about what happens with 
existing indexes on a postgres post-9.0 database (we specifically use 9.3) 
after a vacuum full.

There is a lot of information on how a vacuum differs from a vacuum full, but 
my question is – is a re-index needed after a full vacuum?

We’ve seen from practice that indexes do work after vacuum fulls, but – will a 
reindex make them more optimal?

~Karthik


[GENERAL] Upgrade to 9.3 - performance issue ?

2014-08-08 Thread Karthik Iyer

Hello,

We were planning to upgrade from 9.0 to 9.3. While
investigating the performance side, we stumbled upon a link which
says there may be performance degradation on 9.3:


http://postgresql.1045698.n5.nabble.com/upgrade-from-9-2-x-to-9-3-causes-significant-performance-degradation-td5771288.html

So we are planning to do this after the upgrade:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag


We wanted to understand if there are any other such tweaks that we need to
take care of before we go to 9.3.


Some details:

OS: CentOS 6.2
Kernel: 2.6.32-431.20.3.el6.x86_64


Help here is appreciated.

Thanks in advance,

- Karthik








[GENERAL] Upgrade to 9.3

2014-07-21 Thread Karthik Iyer

Hello,

We are planning to upgrade Postgres from 9.0 to 9.3. I was wondering
if there are any serious changes that I have to look out for
(syntax/datatype changes) so that my code does not break.
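
One concrete example of the kind of change worth checking (not an exhaustive
list): the default of standard_conforming_strings changed to 'on' in 9.1, which
can break applications that rely on backslash escapes inside ordinary string
literals. A minimal check, with E'' literals as the portable alternative:

SHOW standard_conforming_strings;

-- escape-string literals keep their backslash behaviour regardless of the setting
SELECT E'tab\there' AS example;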


Thanks in advance.

- Karthik




Re: [GENERAL] Repeated semop calls

2014-06-27 Thread Anand Kumar, Karthik
>Are there a lot of tuples in the table that have been inserted or
updated by still-open transactions?


Yes, there are likely to be inserts. That table is a log capture table
used by our replication software, so essentially every
update/delete/insert will have a record inserted into the table. It has no
updates. The only thing that deletes from it is the process that had all
the semop calls.

We'll look into the postgres minor upgrade and recompile, thanks.





Re: [GENERAL] Repeated semop calls

2014-06-27 Thread Anand Kumar, Karthik

>1. disabling zone_reclaim (echo 0 > /proc/sys/vm/zone_reclaim_mode)
> 2. disabling transparent hugepage support - this has various names on 
> different kernel/distributions, but "find /sys | grep -i 
> transparent.*hugepage.*enable" will find it, and then just echo never there.

Thank you, yes, we have both zone_reclaim_mode and THP disabled




Re: [GENERAL] DATA corruption after promoting slave to master

2014-06-26 Thread Karthik Iyer


A full dump and restore would definitely help. I tend not to suggest 
that often because I work with very large databases that are usually 
extremely cumbersome to dump and restore.


But yeah, if you can get a successful pg_dump from your database, a 
restore should obviously clean up all of your data and index 
inconsistencies if you're willing to wait. 


Thanks a lot Shaun. Appreciate the help.

- Karthik




[GENERAL] Repeated semop calls

2014-06-26 Thread Anand Kumar, Karthik
er  postgres   [.] pgstat_init_function_usage

  0.18%  postmaster  libc-2.12.so   [.] __strlen_sse42

  0.17%  postmaster  postgres   [.] copyObject

  0.17%  postmaster  postgres   [.] s_lock

  0.17%  postmaster  postgres   [.] MemoryContextAllocZeroAligned

  0.17%  postmaster  postgres   [.] palloc

  0.17%  postmaster  [kernel.kallsyms]  [k] avc_has_perm_noaudit

  0.16%  postmaster  postgres   [.] core_yylex

  0.15%  postmaster  postgres   [.] pgstat_end_function_usage

  0.15%  postmaster  libc-2.12.so   [.] __strcmp_sse42

  0.15%  postmaster  [kernel.kallsyms]  [k] task_rq_lock

  0.14%  postmaster  postgres   [.] expression_tree_walker

  0.14%  postmaster  pg_stat_statements.so  [.] 0x24f3

  0.14%  postmaster  postgres   [.] FunctionCall2Coll

  0.12%  postmaster  postgres   [.] CheckForSerializableConflictOut

  0.12%  postmaster  [kernel.kallsyms]  [k] select_task_rq_fair

  0.12%  postmaster  [kernel.kallsyms]  [k] __audit_syscall_exit

  0.11%  postmaster  postgres   [.] nocachegetattr

  0.11%  postmaster  postgres   [.] pg_detoast_datum_packed

  0.10%  postmaster  [kernel.kallsyms]  [k] try_to_wake_up

  0.10%  postmaster  libc-2.12.so   [.] _int_free

  0.10%  postmaster  postgres   [.] ResourceOwnerEnlargeBuffers

  0.10%  postmaster  postgres   [.] slot_attisnull

  0.09%  postmaster  postgres   [.] ExecStoreVirtualTuple

  0.09%  postmaster  postgres   [.] slot_getsomeattrs

  0.08%  postmaster  [kernel.kallsyms]  [k] try_atomic_semop

  0.08%  postmaster  [kernel.kallsyms]  [k] tcp_ack

  0.08%  postmaster  postgres   [.] get_hash_value

  0.08%  postmaster  postgres   [.] BufTableLookup

  0.08%  postmaster  libc-2.12.so   [.] __memset_sse2

  0.08%  postmaster  [kernel.kallsyms]  [k] dequeue_entity

  0.08%  postmaster  postgres   [.] ScanKeywordLookup

  0.08%  postmaster  [kernel.kallsyms]  [k] kmem_cache_free

  0.08%  postmaster  [kernel.kallsyms]  [k] tcp_recvmsg

  0.08%  postmaster  [kernel.kallsyms]  [k] _spin_lock_irq

  0.07%  postmaster  libc-2.12.so   [.] malloc

  0.07%  postmaster  [kernel.kallsyms]  [k] idr_find

  0.07%  postmaster  [tg3]  [k] tg3_poll_work

  0.07%  postmaster  [kernel.kallsyms]  [k] enqueue_entity

  0.07%  postmaster  postgres   [.] PostgresMain

  0.07%  postmaster  [kernel.kallsyms]  [k] unroll_tree_refs

  0.07%  postmaster  postgres   [.] ExecCopySlotMinimalTuple

  0.07%  postmaster  [kernel.kallsyms]  [k] kfree

  0.06%  postmaster  postgres   [.] hashint8

  0.06%  postmaster  [kernel.kallsyms]  [k] __do_softirq

  0.06%  postmaster  [kernel.kallsyms]  [k] dequeue_task_fair

  0.06%  postmaster  postgres   [.] DirectFunctionCall1Coll

  0.06%  postmaster  [kernel.kallsyms]  [k] tcp_rcv_established

  0.06%  postmaster  [kernel.kallsyms]  [k] update_queue

  0.06%  postmaster  postgres   [.] pg_encoding_mbcliplen

  0.06%  postmaster  [kernel.kallsyms]  [k] resched_task

  0.06%  postmaster  [kernel.kallsyms]  [k] copy_user_generic_string

  0.06%  postmaster  postgres   [.] LockAcquireExtended

  0.06%  postmaster  [kernel.kallsyms]  [k] find_busiest_group

  0.06%  postmaster  postgres   [.] ResourceOwnerRememberBuffer

  0.05%  postmaster  [kernel.kallsyms]  [k] enqueue_task

  0.05%  postmaster  postgres   [.] mcv_selectivity

  0.05%  postmaster  [kernel.kallsyms]  [k] thread_return

  0.05%  postmaster  [kernel.kallsyms]  [k] pid_vnr

  0.05%  postmaster  [kernel.kallsyms]  [k] audit_syscall_entry

  0.05%  postmaster  [kernel.kallsyms]  [k] __local_bh_enable

  0.05%  postmaster  [kernel.kallsyms]  [k] rcu_process_gp_end


I ran an ipcs and was able to see the semaphore, and get its id:


root@site-db01a:~ # ipcs -c -s | grep 21561422

21561422   600postgres   postgres   postgres   postgres


But I’m not sure where to go next. Is there a resource contention that this is 
indicating, and if so, any way to tell what the contention is on?
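
One thing that can help narrow this down from inside the database (a minimal
sketch; before 9.6, pg_stat_activity exposes a boolean waiting column rather
than the later wait_event columns):

-- backends currently blocked on heavyweight locks
SELECT pid, waiting, state, query
  FROM pg_stat_activity
 WHERE waiting;

Lock waits also show up in pg_locks with granted = false; LWLock or spinlock
contention will not appear there, which at least narrows down which class of
contention is involved.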


~Karthik


Re: [GENERAL] DATA corruption after promoting slave to master

2014-06-26 Thread Karthik Iyer

Hi Shaun,




We reindexed all the primary and unique keys of all the tables, but we
did not reindex the tables. Do you think we should do that also?


Yes, you need to reindex. Part of the problem with this kind of table 
corruption, is that PostgreSQL has applied data and index page 
modifications to the binary files themselves. Due to this, there are 
probably index rows pointing to incorrect or invalid data pages.


Reindexing the primary and unique indexes is a very good start, but 
any queries that use regular indexes can still return invalid or 
mismatched rows when compared to the primary/unique results. Those 
kind of mismatches can result in unexpected behavior from applications.


Thanks, we will reindex the tables too.

Also, do you think we should do a clean dump/restore to eliminate all
data inconsistencies?


Any inputs here? Do you think a pg_dump and restore would help more?

TIA,

- Karthik










Re: [GENERAL] DATA corruption after promoting slave to master

2014-06-26 Thread Karthik Iyer

Thanks Shaun.

We reindexed all the primary and unique keys of all the tables, but we
did not reindex the tables. Do you think we should do that also?


Also, do you think we should do a clean dump/restore to eliminate all
data inconsistencies?


One more query :

We managed to get the old server up. We are trying to replay the
difference in data by checking the log files (statement logs). Do you think
there are any other easy alternatives?


Thanks in advance,

- Karthik


On Thursday 26 June 2014 01:09 AM, Shaun Thomas wrote:

On 06/25/2014 06:29 AM, Karthik Iyer wrote:


[2]  We also have a daemon process which copies the latest partial WAL
log file (which postgres is currently writing to, under pg_xlog/) every
3 secs to a different location.


No. No, no, no, no no. No.

Also, no. Partial WAL files are not valid for recovery. In fact, I'm 
surprised the standby node even applied it at all.



We are seeing these problems in the newly promoted master now:

 1. when we run queries on primary key, we don't get the rows even
if it exist in db. However if we force query not to use index, we get
those entries.
 2. there are duplicate values for primary keys


This is no surprise. Your slave has partial data commits, which means 
your table, index, or possibly both, are corrupt.


The first thing you need to do is back up any tables you've noticed 
are having this behavior. Second, try to reindex the tables that are 
having problems. The errors you are seeing are due to the data and 
indexes being out of sync. If you get an error that says the reindex 
fails due to duplicate values, you can do this:


SET enable_indexscan TO false;
SET enable_bitmapscan TO false;

SELECT primary_key, count(1)
  FROM broken_table
 GROUP BY 1
HAVING count(1) > 1;

For any ID that comes back, do this:

SELECT ctid, *
  FROM broken_table
 WHERE primary_key = [value(s) from above];

Then you need to delete one of the bad rows after deciding which. Use 
the CTID of the row you want to delete:


DELETE FROM broken_table
 WHERE ctid = 'whatever';

Then reindex the table so the correct values are properly covered. 
Doing this for all of your corrupt tables may take a while depending 
on how many there are.
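
(A minimal sketch of that last step, reusing the placeholder table name from
above:)

-- rebuilds every index on the table, not just the primary key
REINDEX TABLE broken_table;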








[GENERAL] DATA corruption after promoting slave to master

2014-06-25 Thread Karthik Iyer


Hello,

We are facing some DB inconsistencies while promoting the
slave to master.



[1] We have master-slave with Hot Standby Setup (log shipping).
[2]  We also have a daemon process which copies the latest partial WAL 
log file (which postgres is currently writing to, under pg_xlog/) every 
3 secs to a different location.


The master server crashed, and we decided to promote the slave to master.
Before creating the trigger file, we passed the partial WAL log file [2]
and made the slave replay the partial file as well (so that the data loss
is minimal). The following link has the errors that we saw at different stages:


http://pastebin.com/raw.php?i=Ekr0VrQA

We are seeing these problems in the newly promoted master now:

1. When we run queries on the primary key, we don't get the rows even
if they exist in the DB. However, if we force the query not to use the index,
we get those entries.

2. There are duplicate values for primary keys.


Postgres version that we use:

9.0.13

Postgres conf : http://pastebin.com/raw.php?i=B0diaitG

Any help here is appreciated.

Thanks in advance.

- Karthik Iyer




Re: [GENERAL] Oracle to PostgreSQL replication

2014-05-08 Thread Anand Kumar, Karthik
We use symmetricDS for this. Works pretty well.

http://www.symmetricds.org/


From: Serge Fonville <serge.fonvi...@gmail.com>
Date: Wednesday, May 7, 2014 at 2:49 AM
To: Geoff Montee <geoff.mon...@gmail.com>
Cc: Sameer Kumar <sameer.ku...@ashnik.com>, PostgreSQL General Discussion Forum 
<pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Oracle to PostgreSQL replication

Hi,

I need to setup a replication process for continuously replicating changes 
happening in an Oracle Database to a PostgreSQL database.

My Oracle Database is version 11.2 and setup as a cluster with RAC
My Postgres database version is 9.2

Oracle Database is running in Solaris and PostgreSQL is running on RHEL.

Is there any commercial or open source tool available to achieve this?

How about EnterpriseDB XDB 
replication?

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl


2014-05-07 11:44 GMT+02:00 Geoff Montee <geoff.mon...@gmail.com>:

On Wed, May 7, 2014 at 12:31 AM, Sameer Kumar <sameer.ku...@ashnik.com> wrote:
Hi,


I need to setup a replication process for continuously replicating changes 
happening in an Oracle Database to a PostgreSQL database.


My Oracle Database is version 11.2 and setup as a cluster with RAC
My Postgres database version is 9.2

Oracle Database is running in Solaris and PostgreSQL is running on RHEL.

Is there any commercial or open source tool available to achieve this?



Continuent's Tungsten Replicator apparently offers Oracle to MySQL replication. 
There's a wiki page that suggests PostgreSQL support was in development at one 
time. I'm not sure how far they got, or if they are still working on it.

http://www.continuent.com/solutions/replication

https://wiki.postgresql.org/wiki/Tungsten

Geoff Montee



Re: [GENERAL] Increase in max_connections

2014-03-26 Thread Anand Kumar, Karthik
Thanks Bruce. Really interesting, but zone reclaim is already
turned off on our system:

root@site-db01b:~ # numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 12 13 14 15 16 17
node 0 size: 393181 MB
node 0 free: 467 MB
node 1 cpus: 6 7 8 9 10 11 18 19 20 21 22 23
node 1 size: 393215 MB
node 1 free: 319 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

root@site-db01b:~ # cat /proc/sys/vm/zone_reclaim_mode
0



Thanks,
Karthik




On 3/26/14 1:54 PM, "Bruce Momjian"  wrote:

>On Wed, Mar 26, 2014 at 08:22:01PM +, Anand Kumar, Karthik wrote:
>> Looking a little deeper, I saw signs of memory being heavily fragmented:
>> 
>> root@site-db01b:/var/log # cat /proc/buddyinfo
>> Node 0, zone DMA 1 1 2 2 2 1 0 0 1 1 3
>> Node 0, zone DMA32 8 7 8 7 10 8 7 11 9 5 92
>> Node 0, zone Normal 13069 0 0 0 0 0 0 0 0 0 1
>> Node 1, zone Normal 652315 36885 1168 0 1 1 0 1 1 1 0
>> 
>> 
>> Node 0 has 13069 4k blocks, and zero 8k blocks available to use
>> Which is likely what caused the problem, I'd think.
>> 
>> A little while later though, buddyinfo changed and suddenly there was a
>> lot more memory in 8k blocks.
>> 
>> root@site-db01b:/proc # cat /proc/buddyinfo
>> Node 0, zone  DMA  1  1  2  2  2  1  0
>>  0  1  1  3
>> Node 0, zoneDMA32  8  7  8  7 10  8  7
>> 11  9  5 92
>> Node 0, zone   Normal   9645   5495   1115  0  0  0  0
>>  0  0  0  1
>> Node 1, zone   Normal 409734  10953  1  0  1  1  0
>>  1  1  1  0
>> 
>> (Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)
>> 
>> Anyone have any idea why memory was so fragmented, and what causes
>>memory
>> to be defragged? Is it something postgres does? Are there any kernel
>> specific settings that control it?
>
>If I had to take a guess, it is zone_reclaim;  see:
>
>   http://frosty-postgres.blogspot.com/2012/08/postgresql-numa-and-zone-reclaim-mode.html
>
>The fix is this sysctl:
>
>   vm.zone_reclaim_mode = 0
>
>-- 
>  Bruce Momjian  http://momjian.us
>  EnterpriseDB http://enterprisedb.com
>
>  + Everyone has their own god. +





Re: [GENERAL] Increase in max_connections

2014-03-26 Thread Anand Kumar, Karthik
Hi all,

We finally made some headway on this - we noticed messages like the below
in /var/log/messages whenever the issue happened:

Mar 26 07:39:58 site-db01b kernel: postmaster: page allocation failure.
order:1, mode:0x20
Mar 26 07:39:58 site-db01b kernel: Pid: 39066, comm: postmaster Not
tainted 2.6.32-279.el6.x86_64 #1
Mar 26 07:39:58 site-db01b kernel: Call Trace:
Mar 26 07:39:58 site-db01b kernel:   [] ?
__alloc_pages_nodemask+0x77f/0x940
Mar 26 07:39:58 site-db01b kernel: [] ?
fallback_alloc+0x1ba/0x270
Mar 26 07:39:58 site-db01b kernel: [] ?
kmem_getpages+0x62/0x170
Mar 26 07:39:58 site-db01b kernel: [] ?
cache_grow+0x2cf/0x320
Mar 26 07:39:58 site-db01b kernel: [] ?
cache_alloc_node+0x99/0x160
Mar 26 07:39:58 site-db01b kernel: [] ?
kmem_cache_alloc+0x11b/0x190
Mar 26 07:39:58 site-db01b kernel: [] ?
sk_clone+0x22/0x2e0
Mar 26 07:39:58 site-db01b kernel: [] ?
sk_prot_alloc+0x48/0x1c0
Mar 26 07:39:58 site-db01b kernel: [] ?
tcp_create_openreq_child+0x23/0x450
Mar 26 07:39:58 site-db01b kernel: [] ?
inet_csk_clone+0x16/0xd0
Mar 26 07:39:58 site-db01b kernel: [] ?
tcp_v4_rcv+0x4fe/0x8d0
Mar 26 07:39:58 site-db01b kernel: [] ?
ip_local_deliver_finish+0x0/0x2d0
Mar 26 07:39:58 site-db01b kernel: [] ?
ip_local_deliver_finish+0xdd/0x2d0
Mar 26 07:39:58 site-db01b kernel: [] ?
tcp_v4_syn_recv_sock+0x4d/0x310
Mar 26 07:39:58 site-db01b kernel: [] ?
tcp_check_req+0x226/0x460
Mar 26 07:39:58 site-db01b kernel: [] ?
tcp_v4_do_rcv+0x35b/0x430
Mar 26 07:39:58 site-db01b kernel: [] ?
tcp_rcv_established+0x38d/0x800
Mar 26 07:39:58 site-db01b kernel: [] ?
ip_rcv+0x275/0x350
Mar 26 07:39:58 site-db01b kernel: [] ?
ip_rcv_finish+0x12d/0x440
Mar 26 07:39:58 site-db01b kernel: [] ?
ip_local_deliver+0x98/0xa0
Mar 26 07:39:58 site-db01b kernel: [] ?
__netif_receive_skb+0x49b/0x6f0
Mar 26 07:39:58 site-db01b kernel: [] ?
tg3_poll_work+0x654/0xe30 [tg3]
Mar 26 07:39:58 site-db01b kernel: [] ?
tg3_poll_msix+0x4c/0x150 [tg3]
Mar 26 07:39:58 site-db01b kernel: [] ?
netif_receive_skb+0x58/0x60
Mar 26 07:39:58 site-db01b kernel: [] ?
__do_softirq+0xc1/0x1e0
Mar 26 07:39:58 site-db01b kernel: [] ?
napi_skb_finish+0x50/0x70
Mar 26 07:39:58 site-db01b kernel: [] ?
net_rx_action+0x103/0x2f0
Mar 26 07:39:58 site-db01b kernel: [] ?
napi_gro_receive+0x39/0x50
Mar 26 07:39:58 site-db01b kernel: [] ?
handle_IRQ_event+0x60/0x170
Mar 26 07:39:58 site-db01b kernel: [] ?
irq_exit+0x85/0x90
Mar 26 07:39:58 site-db01b kernel: [] ?
do_softirq+0x65/0xa0
Mar 26 07:39:58 site-db01b kernel: [] ?
__do_softirq+0x11f/0x1e0
Mar 26 07:39:58 site-db01b kernel: [] ?
call_softirq+0x1c/0x30

Doing some digging on that, we disabled TSO/LRO and GSO/GRO at the TCP layer
- and that seems to have helped.

$ sudo ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off



However, I'm looking for more information on what's happening: that stack
trace above seems to indicate that it was unable to allocate 2*4k pages
(8k) for the network stack. It's likely that was needed for GSO/GRO.
However, we're wondering why the kernel is unable to allocate just 8k - we have
a 768G RAM server, with over 54G in buffers/cache.

root@site-db01b:/proc # free -m
 total   used   free   shared   buffers   cached
Mem:775382 773354   2028  0   1403 738735
-/+ buffers/cache:  33215 742166
Swap:0  0  0


Looking a little deeper, I saw signs of memory being heavily fragmented:

root@site-db01b:/var/log # cat /proc/buddyinfo
Node 0, zone DMA 1 1 2 2 2 1 0 0 1 1 3
Node 0, zone DMA32 8 7 8 7 10 8 7 11 9 5 92
Node 0, zone Normal 13069 0 0 0 0 0 0 0 0 0 1
Node 1, zone Normal 652315 36885 1168 0 1 1 0 1 1 1 0


Node 0 has 13069 4k blocks and zero 8k blocks available to use,
which is likely what caused the problem, I'd think.

A little while later though, buddyinfo changed and suddenly there was a
lot more memory in 8k blocks.

root@site-db01b:/proc # cat /proc/buddyinfo
Node 0, zone  DMA  1  1  2  2  2  1  0
 0  1  1  3
Node 0, zoneDMA32  8  7  8  7 10  8  7
11  9  5 92
Node 0, zone   Normal   9645   5495   1115  0  0  0  0
 0  0  0  1
Node 1, zone   Normal 409734  10953  1  0  1  1  0
 1  1  1  0

(Note the change in the Node 0 line, 8k blocks went up from 0 to 5495)

Anyone have any idea why memory was so fragmented, and what causes memory
to be defragged? Is it something postgres does? Are there any kernel
specific settings that control it?


Thanks,
Karthik




On 3/14/14 3:37 PM, "Anand Kumar, Karthik" wrote:

>For anyone that's still following - we tried upgrading to postgres 9.3.3 -
>that hasn't helped.
>
>Runn

Re: [GENERAL] Increase in max_connections

2014-03-14 Thread Anand Kumar, Karthik
For anyone that's still following - we tried upgrading to postgres 9.3.3 -
that hasn't helped.

Running an strace on the pid that was consuming the highest CPU at the
time of the outage shows:

semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91881569, {{12, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(89325587, {{14, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(90964037, {{4, 1, 0}}, 1) = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(88866821, {{12, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(90439733, {{13, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(90341426, {{2, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(90308657, {{5, 1, 0}}, 1) = 0
semop(91881569, {{12, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(88866821, {{12, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91881569, {{12, 1, 0}}, 1)= 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(90865730, {{5, 1, 0}}, 1) = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(91521110, {{12, -1, 0}}, 1)   = 0
semop(90865730, {{5, 1, 0}}, 1) = 0

I've seen other people talk of this problem with a lot of semop calls, but
haven't yet found a clear solution.
Anyone have any ideas?

I've also downloaded the perf tool based on
http://rhaas.blogspot.com/2012/06/perf-good-bad-ugly.html - will see what
that has to show.

Thanks,
Karthik


On 3/11/14 1:06 PM, "John R Pierce"  wrote:

>On 3/11/2014 10:20 AM, Anand Kumar, Karthik wrote:
>> We typically see about 500-700 active queries at a time
>
>if these are primarily small/fast queries, like OLTP operations, and you
>DONT have 200-400 CPU cores on this server, you will likely find that if
>you use a queueing mechanism to only execute about 2X your CPU core
>count concurrently, you will get MORE total transactions/second than
>trying to do 500-700 at once.
>
>if your apps are using persistent connections, then the session pooling
>model won't do any good, you should use transaction pooling.  you want
>the actual active query count to be tunable, probably down around 2X the
>cpu core count, depending on various things.  Some folks say CPU
>cores/threads plus disk spindles is the optimal number.
>
>
>
>-- 
>john r pierce  37N 122W
>somewhere on the middle of the left coast
>
>
>





Re: [GENERAL] Increase in max_connections

2014-03-11 Thread Anand Kumar, Karthik
Thanks Jeff. We have scripts in place now to capture the incoming rate of 
requests. Waiting on the crash to happen to see if it spikes up :)

Re: min_log_duration – we *do* see a good number of requests in the log that 
hit our cap (of 100ms). Just that nothing stands out when we have the issue. 
Whatever queries we do see slow down seem to be after we start the CPU spike, 
and so an effect and not a cause.

We typically see about 500-700 active queries at a time – and that seems to
match how high the connection count goes. We tried pgbouncer; however, with
session-level pooling it slowed down our applications (they maintain persistent
connections once established, so any connection overhead slows them down), and
transaction-level pooling simply did not work.

Thanks,
Karthik

From: Jeff Janes <jeff.ja...@gmail.com>
Date: Tuesday, March 11, 2014 9:23 AM
To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Increase in max_connections

On Mon, Mar 10, 2014 at 6:04 PM, Anand Kumar, Karthik 
<karthik.anandku...@classmates.com> wrote:
Hi all,

We're running postgres 9.3.2, server configuration below.

Seemingly randomly, we will see the number of active queries in postgres go up 
until we hit max_connections. The DB will recover after a few minutes.

We had the issue a couple of times in Feb 2014. We then upgraded the postgres 
server from 9.1 to 9.3.2, and the occurrence has gone up significantly – to 
several times a day.

The user CPU goes up as well to a 100%, no increase in I/O or system CPU.
We have slow query logging, and there is no dramatic change in the slow queries 
either.


I think you might need to change your cutoff for what makes a slow query.  It 
must be the case that either your are seeing an abnormal spike in query 
requests, or that the queries are taking an abnormally long time (or both).  If 
not many queries are hitting log_min_duration_statement, that just means you 
can lower it further without causing too much log bloat.

Usually when people hit max_connections under load, it means they need to move 
the limit into a connection pooler (where it can be more graceful, by putting 
them in a queue for a free slot, rather than returning an error) and *lower* 
max_connections.


max_connections|1500|configuration file

 That is very high.

Cheers,

Jeff


Re: [GENERAL] Increase in max_connections

2014-03-11 Thread Anand Kumar, Karthik
No errors in the logs, except when we hit max_connections.
No shared memory problems – no associated spike in I/O or system CPU indicating
shared memory is either unused or overused. Sufficient memory in
cache/buffers, zero swapping, or anything else indicative of a memory problem.

The box is pretty beefy – 24 core, 768G RAM :) - so yes, an effective cache of 
568GB is normal, we arrived at it with months of tuning over time.

cpu_tuple_cost of 0.03 – yes, a lot of our settings are tweaked from the
defaults based on performance. I don't have the output now, but the 0.03 was
based on recommendations from postgres user groups, and via testing with
setting it up and running explain analyze on queries. None of the settings
changed when this problem began.

Thanks,
Karthik

From: Venkata Balaji Nagothi <vbn...@gmail.com>
Date: Monday, March 10, 2014 7:35 PM
To: "Anand Kumar, Karthik" <karthik.anandku...@classmates.com>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Increase in max_connections


On Tue, Mar 11, 2014 at 12:04 PM, Anand Kumar, Karthik 
<karthik.anandku...@classmates.com> wrote:
Hi all,

We're running postgres 9.3.2, server configuration below.

Seemingly randomly, we will see the number of active queries in postgres go up 
until we hit max_connections. The DB will recover after a few minutes.

We had the issue a couple of times in Feb 2014. We then upgraded the postgres 
server from 9.1 to 9.3.2, and the occurrence has gone up significantly – to 
several times a day.

The user CPU goes up as well to a 100%, no increase in I/O or system CPU.
We have slow query logging, and there is no dramatic change in the slow queries 
either.
There is a corresponding spike in shared locks, but that seems to be an effect 
not a cause – it corresponds to an increase in the number of running processes 
at the time.

We had a similar issue in the past – that was solved by disabling 
transparent_huge_pages – but the difference there was that we'd see queries 
slow down dramatically. Currently, we don't. Also, transparent_huge_pages is 
still disabled.

I do realize the issue could be caused by a spurt in incoming connections – we
do not yet have conclusive evidence on whether that's happening (active queries
climb up, however there is no conclusive proof on whether that's because of a
slowdown or because of an increase in traffic). Working on getting the
information, will update with that information as soon as we have it.

I thought I'd send a post out to the group before then, to see if anyone has 
run into anything similar.

Thanks,
Karthik

site=# SELECT version();
PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 
20120305 (Red Hat 4.4.6-4), 64-bit

site=# SELECT name, current_setting(name), source
site-#   FROM pg_settings
site-#   WHERE source NOT IN ('default', 'override');
application_name|psql|client
archive_command|/usr/bin/archiver.sh %f %p|configuration file
archive_mode|on|configuration file
autovacuum_freeze_max_age|25000|configuration file
autovacuum_max_workers|6|configuration file
bgwriter_lru_maxpages|1000|configuration file
bgwriter_lru_multiplier|4|configuration file
checkpoint_completion_target|0.8|configuration file
checkpoint_segments|250|configuration file
checkpoint_timeout|15min|configuration file
checkpoint_warning|6min|configuration file
client_encoding|UTF8|client
commit_siblings|25|configuration file
cpu_tuple_cost|0.03|configuration file
DateStyle|ISO, MDY|configuration file
default_statistics_target|300|configuration file
default_text_search_config|pg_catalog.english|configuration file
effective_cache_size|568GB|configuration file
fsync|on|configuration file
lc_messages|en_US.UTF-8|configuration file
lc_monetary|en_US.UTF-8|configuration file
lc_numeric|en_US.UTF-8|configuration file
lc_time|en_US.UTF-8|configuration file
listen_addresses|*|configuration file
log_autovacuum_min_duration|0|configuration file
log_checkpoints|on|configuration file
log_connections|on|configuration file
log_destination|syslog|configuration file
log_directory|pg_log|configuration file
log_filename|postgresql-%a.log|configuration file
log_line_prefix|user=%u,db=%d,ip=%h |configuration file
log_min_duration_statement|100ms|configuration file
log_min_messages|debug1|configuration file
log_rotation_age|1d|configuration file
log_rotation_size|0|configuration file
log_timezone|US/Pacific|configuration file
log_truncate_on_rotation|on|configuration file
logging_collector|off|configuration file
maintenance_work_mem|1GB|configuration file
max_connections|1500|configuration file
max_locks_per_transaction|1000|configuration file
max_stack_depth|2MB|environment variable
max_wal_senders|5|configuration file
port|5432|command line
random_page_cost|2|configuration file
shared_buffers|8

[GENERAL] Increase in max_connections

2014-03-10 Thread Anand Kumar, Karthik
Hi all,

We're running postgres 9.3.2, server configuration below.

Seemingly randomly, we will see the number of active queries in postgres go up 
until we hit max_connections. The DB will recover after a few minutes.

We had the issue a couple of times in Feb 2014. We then upgraded the postgres 
server from 9.1 to 9.3.2, and the occurrence has gone up significantly – to 
several times a day.

The user CPU goes up as well to a 100%, no increase in I/O or system CPU.
We have slow query logging, and there is no dramatic change in the slow queries 
either.
There is a corresponding spike in shared locks, but that seems to be an effect 
not a cause – it corresponds to an increase in the number of running processes 
at the time.

We had a similar issue in the past – that was solved by disabling 
transparent_huge_pages – but the difference there was that we'd see queries 
slow down dramatically. Currently, we don't. Also, transparent_huge_pages is 
still disabled.

I do realize the issue could be caused by a spurt in incoming connections – we
do not yet have conclusive evidence on whether that's happening (active queries
climb up, however there is no conclusive proof on whether that's because of a
slowdown or because of an increase in traffic). Working on getting the
information, will update with that information as soon as we have it.

I thought I'd send a post out to the group before then, to see if anyone has 
run into anything similar.

Thanks,
Karthik

site=# SELECT version();
PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 
20120305 (Red Hat 4.4.6-4), 64-bit

site=# SELECT name, current_setting(name), source
site-#   FROM pg_settings
site-#   WHERE source NOT IN ('default', 'override');
application_name|psql|client
archive_command|/usr/bin/archiver.sh %f %p|configuration file
archive_mode|on|configuration file
autovacuum_freeze_max_age|25000|configuration file
autovacuum_max_workers|6|configuration file
bgwriter_lru_maxpages|1000|configuration file
bgwriter_lru_multiplier|4|configuration file
checkpoint_completion_target|0.8|configuration file
checkpoint_segments|250|configuration file
checkpoint_timeout|15min|configuration file
checkpoint_warning|6min|configuration file
client_encoding|UTF8|client
commit_siblings|25|configuration file
cpu_tuple_cost|0.03|configuration file
DateStyle|ISO, MDY|configuration file
default_statistics_target|300|configuration file
default_text_search_config|pg_catalog.english|configuration file
effective_cache_size|568GB|configuration file
fsync|on|configuration file
lc_messages|en_US.UTF-8|configuration file
lc_monetary|en_US.UTF-8|configuration file
lc_numeric|en_US.UTF-8|configuration file
lc_time|en_US.UTF-8|configuration file
listen_addresses|*|configuration file
log_autovacuum_min_duration|0|configuration file
log_checkpoints|on|configuration file
log_connections|on|configuration file
log_destination|syslog|configuration file
log_directory|pg_log|configuration file
log_filename|postgresql-%a.log|configuration file
log_line_prefix|user=%u,db=%d,ip=%h |configuration file
log_min_duration_statement|100ms|configuration file
log_min_messages|debug1|configuration file
log_rotation_age|1d|configuration file
log_rotation_size|0|configuration file
log_timezone|US/Pacific|configuration file
log_truncate_on_rotation|on|configuration file
logging_collector|off|configuration file
maintenance_work_mem|1GB|configuration file
max_connections|1500|configuration file
max_locks_per_transaction|1000|configuration file
max_stack_depth|2MB|environment variable
max_wal_senders|5|configuration file
port|5432|command line
random_page_cost|2|configuration file
shared_buffers|8GB|configuration file
synchronous_commit|off|configuration file
syslog_facility|local0|configuration file
syslog_ident|postgres|configuration file
TimeZone|US/Pacific|configuration file
vacuum_freeze_table_age|0|configuration file
wal_buffers|32MB|configuration file
wal_keep_segments|250|configuration file
wal_level|hot_standby|configuration file
wal_sync_method|fsync|configuration file
work_mem|130MB|configuration file


[GENERAL] PQunescapebytea not reverse of PQescapebytea?

2014-03-01 Thread Karthik Segpi
Hi -
I have a 'bytea' column in the database, onto which my custom C application
is inserting encrypted data. Before inserting, I am calling
'PQescapebytea()' to escape the ciphertext. However, after SELECT, the data
needs to be 'un-escaped' before attempting to decrypt. I am trying to
'un-escape' using 'PQunescapebytea'. However, I am finding that
'PQunescapebytea' is not the exact inverse of 'PQescapebytea'. I saw
documentation and posts in the mailing lists alluding to this as well. As a
result, the decryption always fails.

Is there another C method out there that does the exact inverse of
'PQescapebytea'? If the answer is no, then what other options do I have? I
could think of the following:

1. Write my own C method that does exactly the inverse of 'PQescapebytea'
2. Instead of using 'PQescapebytea', base64 encode ciphertext before
inserting. Not sure how well this would work.
3.  Use 'text' data type for the column instead of 'bytea'.

Has anyone faced this scenario before?
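
If option 2 is explored, the base64 step can also be done server-side with the
built-in encode()/decode() functions, which keeps the SQL text free of bytea
escaping altogether. A minimal sketch; the table and column names are made up
for illustration:

-- hypothetical table holding the ciphertext
CREATE TABLE secrets (id int PRIMARY KEY, ciphertext bytea);

-- store: the client sends base64 text, the server decodes it to bytea
INSERT INTO secrets (id, ciphertext) VALUES (1, decode('3q2+7w==', 'base64'));

-- fetch: the server re-encodes to base64 text for the client to decode locally
SELECT id, encode(ciphertext, 'base64') AS ciphertext_b64 FROM secrets;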

Thanks-
Karthik


Re: [GENERAL] timezone change after upgrade from postgres 9.1 to 9.3?

2014-01-30 Thread Anand Kumar, Karthik
Thank you.

It's a 'yes' on 2 there. I did copy the config file from the 9.1
installation. Thank you for mentioning that, will fix it.

Thanks,
Karthik




On 1/30/14 4:17 PM, "Tom Lane"  wrote:

>Adrian Klaver  writes:
>> On 01/30/2014 03:17 PM, Anand Kumar, Karthik wrote:
>>> We just upgraded our postgres database from 9.1 to 9.3. And noticed
>>>that
>>> the timezone changed from PST to GMT.
>>> Is that known behavior? Has anyone else run into it, or am I just
>>> missing something?
>
>> Well there where changes in the way timezones are set in 9.2.
>
>Yeah.  Likely theories include
>
>1. You ran initdb in a different environment than you normally start the
>server in.
>
>2. You blindly copied the 9.1 postgresql.conf into the 9.3 installation,
>overwriting what initdb had done.  If there's no uncommented timezone
>setting in the .conf file, 9.3 will fall back to GMT, IIRC.  But that
>approach to configuration has a lot of pitfalls besides this one.
>
>In any case the fix is to set the zone you want in postgresql.conf.
>
>   regards, tom lane
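
A minimal way to confirm what the server actually ended up with after fixing
postgresql.conf:

SHOW timezone;

-- or, to see where each value came from:
SELECT name, setting, source
  FROM pg_settings
 WHERE name IN ('TimeZone', 'log_timezone');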





[GENERAL] timezone change after upgrade from postgres 9.1 to 9.3?

2014-01-30 Thread Anand Kumar, Karthik
Hi,

We just upgraded our postgres database from 9.1 to 9.3. And noticed that the 
timezone changed from PST to GMT.
Is that known behavior? Has anyone else run into it, or am I just missing 
something?

I've verified the server's timezone is right, and nothing in the postgres 
user's profile is changing the timezone at startup. The postgres start up 
scripts aren't setting the timezone either, from what I can see.

Thanks,
Karthik


Re: [GENERAL] Sudden slow down and spike in system CPU causes max_connections to get exhausted

2014-01-10 Thread Anand Kumar, Karthik
Thanks all for your suggestions. Looks like disabling transparent huge
pages fixed this issue for us. We haven't had it occur in two days now
after the change.

Thanks,
Karthik





Re: [GENERAL] Sudden slow down and spike in system CPU causes max_connections to get exhausted

2014-01-07 Thread Anand Kumar, Karthik
The reason we're on ext2 is to get around
http://www.postgresql.org/message-id/CED87E13.C57E7%karthik.anandkumar@memorylane.com

We had pretty severe index and table corruption that would occur randomly
- this was on ext3, and centos 5.6, 2.6.18 kernel. The problems got fixed
after we upgraded the kernel to 2.6.32, Centos 6.3. We also dropped down
to ext2 because we would see the filesystem go readonly, and wanted to get
the journal out of the way (yes, maybe overkill, but we desperately needed
to stop crashing)

We are getting a standby box up on CentOS 6.5 with xfs, and will move to that.
That's longer term though, and I'm hoping to be able to resolve this issue
before we get there.

We'll try reducing the number of backends, and disable transparent huge
pages. I'll update the thread with our results.

Thanks,
Karthik





[GENERAL] Sudden slow down and spike in system CPU causes max_connections to get exhausted

2014-01-06 Thread Anand Kumar, Karthik
Hi,

We run postgres 9.1.11 on CentOS 6.3, with an ext2 filesystem.

Everything will run along okay, and every few hours, for about a couple of 
minutes, postgres will slow way down. A "select 1" query takes between 10 and 
15 seconds to run, and the box in general gets lethargic.

This causes a pile up of connections at the DB, and we run out of 
max_connections.

This is accompanied by a steep spike in system CPU and load average. There is
no spike in user CPU or in I/O.

So far:
- We've ruled out check points as a cause.
- We have statement logging turned on, and no single statement seems to be
causing this. All statements slow down, including "select 1".
- There is no spike in incoming traffic that we can see.

We do typically have a lot of idle connections (1500 connections total, over
1000 idle at any given time). We're in the midst of installing pgbouncer to
try to mitigate the problem, but that still doesn't address the root cause.
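
The rough plan is a transaction-pooling setup along these lines (the
pgbouncer.ini below is only a sketch; the wildcard database entry and the
pool sizes are placeholders we haven't tuned yet):

[databases]
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 50
max_client_conn = 2000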

Anyone have any tips for why this might be occurring?

Thanks,
Karthik


Re: [GENERAL] Multi Master Replication

2013-12-23 Thread Anand Kumar, Karthik
We use SymmetricDS pretty extensively, across Oracle and Postgres databases.
It has its flaws and its strengths. It shines when there are multiple
database platforms involved and the volume of transactions is not too high,
and it supports multi-master. It's optimized for WAN topologies, so it's
great if you have small amounts of data over longer distances, and not as
great when there are several gigabytes needing transfer within the same
datacenter. It also has built-in insert vs. update conflict resolution.

On the flip side, as the volume of writes increases, it can quickly lead
to significant database bloat and high CPU usage. It's also not as fast as
some other tools (like Slony or ora2pg) for initial loads of tables.

It exposes all of its switches and knobs (and there are a *lot* of
tunables), which makes it very flexible but also prone to breakage if you
fiddle with it too much. It's in active development, and we've found the
community forums to be pretty helpful when we run into undocumented bugs.

Let me know if you need help with it.

Thanks,
Karthik



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index and table corruption

2013-12-22 Thread Anand Kumar, Karthik
>
>Thanks Shaun!
>
>Yes, we're getting synchronous_commit on right now.
>
>The log_min_duration was briefly set to 0 at the time I sent out the post,
>just to see what statements were logged right before everything went to
>hell. Didn't yield much since we very quickly realized we couldn't cope
>with the volume of logs.
>
>We also noticed that when trying to recover from a snapshot and replay
>archived wal logs, it would corrupt right away, in under an hour. When
>recovering from snapshots *without* replaying wal logs, we go on for a day
>or two without the problem, so it does seem like wal logs are probably not
>being flushed to disk as expected.
>
>Will update once we get onto the new h/w to see if that fixes it.

>>> FYI - the change of server hardware didn't help
>>> Also, I'm able to run a full pg_dump of the database (without output
>>>to /dev/null) and it completes okay.
>>> And then a few hours later the issue shows up again
>>> We run an ext3 filesystem with journaling=ordered, and kernel 2.6.18
>>> Currently testing an upgrade to the kernel and turning off journaling
>>>on the filesystem to see if that will help

>
>Thanks,
>Karthik
>
>
>
>-- 
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Best way to sync possibly corrupted data?

2013-12-20 Thread Anand Kumar, Karthik
Thank you. pg_dump is what we will use.

We did re-assign the LUN to a new system last night and are monitoring. It's
too early to say anything, but so far we haven't seen the corruption.

And yes, we will get the dump from the new system.

We do not have the hardware to move the backups to a different storage
area yet, but it's something we are working on. Thank you.

Thanks,
Karthik



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Best way to sync possibly corrupted data?

2013-12-19 Thread Anand Kumar, Karthik
Hi,

We have an issue with possibly corrupt data in our postgresql server. Errors 
like:

ERROR:  index "photos_p00_n2" contains unexpected zero page at block 0
ERROR:  invalid page header in block 12707 of relation 
pg_tblspc/5020557/PG_9.1_201105231/16393/9014673

Thanks for all the suggestions from this list. We are in the process of
moving our database out to a different server; we'll then set
zero_damaged_pages to on and run a full vacuum and reindex.
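
Roughly, per affected table, something like this (the table name below is
just an example):

SET zero_damaged_pages = on;
VACUUM FULL VERBOSE photos_p00;
REINDEX TABLE photos_p00;
RESET zero_damaged_pages;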

The question I have is: what is the best method to transfer the data over to
ensure we don't copy over bad/corrupt data? I would think a filesystem-based
copy (rsync, etc.) should be avoided, and a pg_dump into a freshly initdb'd
cluster is best?
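
In other words, something along these lines rather than an rsync of the data
directory (host and database names below are placeholders):

# fresh cluster on the new server
initdb -D /path/to/new/data

# dump roles/tablespaces and the database from the old server
pg_dumpall -g -h old-host > globals.sql
pg_dump -Fc -h old-host ourdb > ourdb.dump

# load into the new cluster
psql -h new-host -f globals.sql postgres
pg_restore -h new-host -C -d postgres ourdb.dump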

Thanks,
Karthik


Re: [GENERAL] index and table corruption

2013-12-19 Thread Anand Kumar, Karthik
Hi Jerry,

Thanks for the suggestion

Yes, until about a month ago we weren't wrapping our snapshots with
pg_start_backup and pg_stop_backup, for the same reason you mentioned: the
database would start up, "trivial checks" would be okay, and so we figured
"why write a script?"

However, we did change that a month or so ago, and have had the problem
since. Every snapshot we have tried to actually recover from has been
wrapped in pg_start_backup and pg_stop_backup, so we are leaning more
towards server/disk corruption at this time.
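
(For the archives, the wrapper around each snapshot is essentially the
following; the label passed to pg_start_backup is arbitrary:)

psql -c "SELECT pg_start_backup('san_snapshot', true);"
# ... trigger the NetApp volume snapshot here ...
psql -c "SELECT pg_stop_backup();"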

We also synced our snapshot to an alternate SAN and ran a script to update
every row of every table and do a full vacuum and reindex of every table;
there were no error messages about bad blocks.
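
(The vacuum/reindex half of that sweep is just a loop over pg_tables,
roughly as below; the update-every-row step needs per-table column
knowledge, so it isn't shown:)

psql -Atc "SELECT quote_ident(schemaname) || '.' || quote_ident(tablename)
           FROM pg_tables
           WHERE schemaname NOT IN ('pg_catalog', 'information_schema')" |
while read t; do
    psql -c "VACUUM FULL VERBOSE $t"
    psql -c "REINDEX TABLE $t"
done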

Thanks,
Karthik



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] index and table corruption

2013-12-19 Thread Anand Kumar, Karthik
Thanks Shaun!

Yes, we're getting synchronous_commit on right now.
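
(Concretely, that's one line in postgresql.conf followed by a reload; no
restart is needed. $PGDATA below stands in for our data directory:)

synchronous_commit = on

pg_ctl reload -D $PGDATA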

The log_min_duration was briefly set to 0 at the time I sent out the post,
just to see what statements were logged right before everything went to
hell. Didn't yield much since we very quickly realized we couldn't cope
with the volume of logs.

We also noticed that when trying to recover from a snapshot and replay
archived wal logs, it would corrupt right away, in under an hour. When
recovering from snapshots *without* replaying wal logs, we go on for a day
or two without the problem, so it does seem like wal logs are probably not
being flushed to disk as expected.

Will update once we get onto the new h/w to see if that fixes it.

Thanks,
Karthik



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] index and table corruption

2013-12-19 Thread Anand Kumar, Karthik
Hi,

We're looking for help with possible corruption of our indexes and tables.

Seemingly in the middle of normal operations, we will run into errors like
the below:

ERROR:  index "mv_visits_p03_n2" contains unexpected zero page at block
15939
ERROR:  invalid page header in block 344713 of relation
pg_tblspc/4376157/PG_9.1_201105231/16393/8367465

Following which, the database continues on, but I/O creeps up until finally
the server becomes unresponsive. The database has never 'crashed', though.

A majority of the tables are the same each time, although new ones will
come in, and old ones will go out. A total of about 84 out of 452 tables
have gotten this error so far.

We run postgres version 9.1.2, installed via the PGDG RPMs.
The server runs CentOS 5.6, and the disk backend is a NetApp-based SAN.
It's a 24-CPU box with 768G of RAM.
The database is about 1TB. It's a single database cluster.

Things we've tried so far:

- Every time we run into the error, we restore the database from a previous
snapshot (a block-level NetApp snapshot). Snapshots are taken with the
postgres hot backup mode enabled, and are clean. They are block level, so
ideally going back to a snapshot should remove any block-level corruption
that occurred on the device.

- We set zero_damaged_pages = on, ran a full vacuum and re-index of 4
tables. Both the full vacuum and reindex completed successfully, with no
errors. The same tables showed up when it failed again.

- We've had the sysadmins check for errors with the hardware; no errors
so far about any h/w problems, either on the box, with the SAN switches,
or on the filer. We are going to switch over to a different server on the
same SAN backend to see if that helps.

- We suspected it might have something to do with
http://wiki.postgresql.org/wiki/20120924updaterelease, and upgraded to
postgres 9.1.11, but that hasn't helped.

- We had shared_buffers set to 60G, and reduced that down to 8G and then
to 4G, suspecting problems with the background writer handling such high
shared_buffers; that hasn't helped either.

Our postgres configuration is:

Version: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc
(GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
Upgraded to: PostgreSQL 9.1.11 on x86_64-unknown-linux-gnu, compiled by
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit
name                          | current_setting
------------------------------+-----------------
application_name | psql
archive_command  | /usr/bin/archiver.sh %f %p
archive_mode | on
checkpoint_completion_target | 0.8
checkpoint_segments  | 25
checkpoint_timeout   | 10min
checkpoint_warning   | 2min
client_encoding  | UTF8
commit_siblings  | 25
custom_variable_classes  | symmetric
DateStyle| ISO, MDY
default_statistics_target| 300
default_text_search_config   | pg_catalog.english
effective_cache_size | 128GB
lc_messages  | en_US.UTF-8
lc_monetary  | en_US.UTF-8
lc_numeric   | en_US.UTF-8
lc_time  | en_US.UTF-8
listen_addresses | *
log_checkpoints  | on
log_destination  | syslog
log_directory| /var/lib/pgsql/cmates/admin
log_filename | postgresql-%a.log
log_line_prefix  | user=%u,db=%d,ip=%h
log_min_duration_statement   | 0
log_rotation_age | 1d
log_rotation_size| 0
log_timezone | US/Pacific
log_truncate_on_rotation | on
logging_collector| off
maintenance_work_mem | 32MB
max_connections  | 1500
max_locks_per_transaction| 1000
max_stack_depth  | 2MB
max_wal_senders  | 5
port | 5432
search_path  | activities, alert, announce, askme, audit, authentication,
book, btdt, bulletinboard, cache, cas, cdc, cmates, cmdba, collection, dep,
emailsubscription, emailvalidation, eventmail, feeds, friend, geo, inbox,
invitation, ir, kkumar, merge, myvisitor, people, photos, prepsports,
profile, provisioning, quiz, registrant_icons, registration, reunion,
school, schoolfeed, shortlist, socialauth, statspack, story, symmetricds,
target, yearbook, "$user", public
shared_buffers   | 8GB
synchronous_commit   | off
syslog_facility  | local0
syslog_ident | postgres
TimeZone | US/Pacific
vacuum_freeze_table_age  | 0
wal_buffers  | 16MB
wal_level| archive
wal_sync_method  | fsync
work_mem | 8MB
(47 rows)


Any help would be most appreciated!

Thanks,
Karthik




[GENERAL] MKDIR_P@: Command not found error in regression test

2013-05-01 Thread Karthik GP
Hi -
I installed postgres 9.2.2 from source successfully. I then made some minor
code changes to learn how to build postgres again. I am on Ubuntu Linux
(kernel 3.5.0-17). './configure' and 'make' gave no errors, but while running
'make check', I get the following error:

make -C ../backend submake-errcodes
make[3]: Entering directory `/home/user/pgsql/pgsql/src/backend'
make[3]: Nothing to be done for `submake-errcodes'.
make[3]: Leaving directory `/home/user/pgsql/pgsql/src/backend'
make[2]: Leaving directory `/home/user/pgsql/pgsql/src/port'
rm -rf ./testtablespace
mkdir ./testtablespace
../../../src/test/regress/pg_regress --inputdir=.
--temp-install=./tmp_check --top-builddir=../../..   --dlpath=.
--schedule=./parallel_schedule
== creating temporary installation==

pg_regress: installation failed
Examine /home/user/pgsql/pgsql/src/test/regress/log/install.log for the
reason.
Command was: "make" -C "../../.."
DESTDIR="/home/user/pgsql/pgsql/src/test/regress/./tmp_check/install"
install > "/home/user/pgsql/pgsql/src/test/regress/log/install.log" 2>&1
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/user/pgsql/pgsql/src/test/regress'
make: *** [check] Error 2



Checking the regress logs at 'src/test/regress/log/install.log', I see the
following messages at the bottom:

/bin/mkdir -p
'/home/user/pgsql/pgsql/src/test/regress/./tmp_check/install/usr/local/pgsql/lib/pgxs/src/test/regress'
/bin/sh ../../../config/install-sh -c  pg_regress
'/home/user/pgsql/pgsql/src/test/regress/./tmp_check/install/usr/local/pgsql/lib/pgxs/src/test/regress/pg_regress'
make[2]: Leaving directory `/home/user/pgsql/pgsql/src/test/regress'
make[1]: MKDIR_P@: Command not found
make[1]: *** [installdirs-local] Error 127
make[1]: Leaving directory `/home/user/pgsql/pgsql/src'
make: *** [install-src-recurse] Error 2
make: Leaving directory `/home/user/pgsql/pgsql'

Any clues on what's wrong? It's very unlikely that any of my code changes
are causing this.
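
For what it's worth, 'MKDIR_P@' looks like an autoconf substitution that
never got filled in; the values configure actually generated can be
cross-checked with something like:

grep -n MKDIR_P src/Makefile.global config.status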


Thanks
Karthik


Re: [GENERAL] Is there a way to start postgresql v907 as non daemon process

2012-05-25 Thread Karthik
Thank you Laurenz,

Even executing

$ postgres -D /data/directory

used to start the server in daemon mode. The reason was that postgresql.conf
had "silent_mode = on" (the FreeBSD port sets this). I solved it by passing
"silent_mode=off" on the command line, and with that my problem is solved.
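
(The startup line is now along the lines of:)

$ postgres -D /data/directory -c silent_mode=off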

Teaching my watchdog to look into postmaster.pid is also a good idea, but I
do not want to work on that, as things are working fine now.

Regards
Karthik

On 5/25/12 1:50 PM, "Albe Laurenz"  wrote:

>Karthik wrote:
>> I have observed by default posgresql 8.x onwards  starts as daemon
>process. Is there a way to avoid
>> it. I want it to run as normal process. The parent of the server
>process should be the process that
>> starts the server instead of being PID 1 ( init ). Till 7.4.2 ( Which
>was the older version used by us
>> ) was using running in normal mode. The same behavior is necessary as
>we have our watchdog failing to
>> identify that it server is actually running.
>> 
>> ( Transcript to help you what I am talking about )
>> 
>> c670e04:rkananth 16] /usr/local/bin/postgres --version
>> postgres (PostgreSQL) 9.0.7
>> 
>> c670e04:rkananth 11] postgres -D /data/db/postgres
>> c670e04:rkananth 12] ps -A | grep post
>> 65165  ??  Ss 0:00.01 /usr/local/bin/postgres
>> 65167  ??  Ss 0:00.00 postgres: writer process(postgres)
>> 65168  ??  Ss 0:00.00 postgres: wal writer process(postgres)
>> 65169  ??  Ss 0:00.00 postgres: autovacuum launcher process
>(postgres)
>> 65170  ??  Ss 0:00.00 postgres: stats collector process
>(postgres)
>> 65224   0  S+ 0:00.00 grep post
>> 
>> c670e04:rkananth 14] ps -p 65165 -o ppid
>>  PPID
>> 1
>> 
>> c670e04:rkananth 15] ps 1
>>   PID  TT  STAT  TIME COMMAND
>> 1  ??  SLs0:00.17 /sbin/init --
>> 
>> I would be happy to provide further information if required.
>
>If you really want that, start the PostgreSQL server this way:
>
>$ postgres -D /data/directory &
>
>But can't you teach your watchdog to read the postgresql.pid file?
>
>Yours,
>Laurenz Albe



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Is there a way to start postgresql v907 as non daemon process

2012-05-24 Thread Karthik
I have observed that, by default, PostgreSQL 8.x onwards starts as a daemon
process. Is there a way to avoid this? I want it to run as a normal process:
the parent of the server process should be the process that starts the
server, instead of PID 1 (init). Up to 7.4.2 (the older version we used), it
was running in normal mode. The same behavior is necessary because our
watchdog fails to identify that the server is actually running.

( Transcript to show what I am talking about )

c670e04:rkananth 16] /usr/local/bin/postgres --version
postgres (PostgreSQL) 9.0.7

c670e04:rkananth 11] postgres -D /data/db/postgres
c670e04:rkananth 12] ps -A | grep post
65165  ??  Ss 0:00.01 /usr/local/bin/postgres
65167  ??  Ss 0:00.00 postgres: writer process(postgres)
65168  ??  Ss 0:00.00 postgres: wal writer process(postgres)
65169  ??  Ss 0:00.00 postgres: autovacuum launcher process
(postgres)
65170  ??  Ss 0:00.00 postgres: stats collector process(postgres)
65224   0  S+ 0:00.00 grep post

c670e04:rkananth 14] ps -p 65165 -o ppid
 PPID
1

c670e04:rkananth 15] ps 1
  PID  TT  STAT  TIME COMMAND
1  ??  SLs0:00.17 /sbin/init --

I would be happy to provide further information if required.




[GENERAL] could not open relation:no such file or directory

2006-12-28 Thread karthik
Hello,

My name is Karthik.

I am facing a problem when trying to select values from a table in
PostgreSQL.

When I execute a query like "select title from itemsbytitle;" I get an
error:

   Error: could not open relation "itemsbytitle": no such file or directory.

Can anybody help me find an answer to this problem?


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings