Re: [PERFORM] Hanging query on a fresh restart

2015-11-14 Thread Kevin Grittner
On Friday, November 13, 2015 3:41 PM, Jim Nasby  
wrote:
> On 11/10/15 3:42 PM, Jason Jho wrote:

>> We using Postgres 9.3.10

>> What is the most confusing part in all of this is why a DROP
>> SCHEMA CASCADE and a fresh pg_restore would somehow fix the
>> problem. Even a fresh reboot didn't fix it.
>
> Without more info we're stuck guessing. You might try submitting
> a ticket with amazon, especially if you can reproduce this.

There have been occasional reports of corrupted indexes causing
endless loops which could cause these symptoms if one core was
pegged at 100% during the incident.  There are many possible causes
for such corruption -- see:

http://rhaas.blogspot.com/2012/03/why-is-my-database-corrupted.html

That said, there was a long-standing bug in btree index page
deletion (which could only happen during vacuum or autovacuum)
which was fixed in 9.4:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=efada2b8e920adfdf7418862e939925d2acd1b89

It was pretty hard to hit, and normally wouldn't cause these
symptoms, but if there was a "perfect storm" of events before the
problem self-corrected, I think it might possibly lead to this.  If
we could somehow confirm that this old bug was the cause, it might
justify pushing this patch back into older branches.  As the commit
message said:

| This bug is old, all supported versions are affected, but this patch is too
| big to back-patch (and changes the WAL record formats of related records).
| We have not heard any reports of the bug from users, so clearly it's not
| easy to bump into. Maybe backpatch later, after this has had some field
| testing.

Did you make a filesystem-level copy of the data directory?  If so,
the first step in checking this theory would be to restore a copy
and reindex all indexes used by the problem query to see if that
fixes it.  If it does, close examination of the corrupted index
might provide clues about how the corruption occurred.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hanging query on a fresh restart

2015-11-13 Thread Jim Nasby

On 11/10/15 3:42 PM, Jason Jho wrote:

On one particular day, this query hung for many hours and even while we
killed pids for running queries and any locks granted, the query would
never return.  Also no hints of blocking processes.  After some digging
through some I/O metrics, we didn't see any memory issues or unusual
spikes that would lead us to believe that we're running low on resources.


Did IO stats indicate IO was happening? Did you see a pegged CPU running 
the query?



There is 1 caveat, however:  there was a different schema that contained
a day-old copy of data that isn't normally present when the hang started
to occur.  However, since these are completely different schema
namespaces with no crossovers in the queries themselves, I don't see how
this is relevant.


If search_path wasn't what you thought it was you could have easily been 
running against the wrong set of tables.



We thought this might be possibly due to some internal vacuuming, but
this is unlikely since there are no real concurrent reads or updates
happening.  Auto-vacuum is also on with default settings.


There are other reasons why autovacuum could kick in, notably to prevent 
transaction ID wraparound.



What is the most confusing part in all of this is why a DROP SCHEMA
CASCADE and a fresh pg_restore would somehow fix the problem.  Even a
fresh reboot didn't fix it.


Without more info we're stuck guessing. You might try submitting a 
ticket with amazon, especially if you can reproduce this.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Hanging query on a fresh restart

2015-11-10 Thread Jason Jho
Hi,

We using Postgres 9.3.10 on Amazon RDS and running into some strange
behavior that has been tough to track down and debug (partially due to the
limited admin access from RDS).

We're running a read-only query that normally takes ~10-15 min., but also
runs concurrently with several other intensive queries (these queries
themselves, finish).

On one particular day, this query hung for many hours and even while we
killed pids for running queries and any locks granted, the query would
never return.  Also no hints of blocking processes.  After some digging
through some I/O metrics, we didn't see any memory issues or unusual spikes
that would lead us to believe that we're running low on resources.

There is 1 caveat, however:  there was a different schema that contained a
day-old copy of data that isn't normally present when the hang started to
occur.  However, since these are completely different schema namespaces
with no crossovers in the queries themselves, I don't see how this is
relevant.


 1) We ended up doing a full reboot of the RDS instance and ran the query
again, this time, no other queries are running off of a fresh boot-up (no
competing locks or transactions).  The query continued to hang.

 2) We then ran pg_dump to snapshot the current data and did a full
pg_restore (after dropping all schemas) of an older dataset where we knew
this query would run successfully.  As expected, the query ran fine.

 3) We then dropped all schemas again and pg_restored the previous dataset
that was causing the query to hang, and then to my surprise, the query ran
just fine.  No hangs.

We thought this might be possibly due to some internal vacuuming, but this
is unlikely since there are no real concurrent reads or updates happening.
Auto-vacuum is also on with default settings.

What is the most confusing part in all of this is why a DROP SCHEMA CASCADE
and a fresh pg_restore would somehow fix the problem.  Even a fresh reboot
didn't fix it.

Any ideas??


Re: [PERFORM] Hanging Query

2006-01-14 Thread Tom Lane
[EMAIL PROTECTED] writes:
> We have to inserts a records(15000- 2) into a table which also
> contains (15000-2) records, then after insertion, we have to delete
> the records according to a business rule.
> Above process is taking place in a transaction and we are using batches
> of 128 to insert records.
> Everything works fine on QA environment but somehow after inserts,
> delete query hangs in production environment. Delete query has some
> joins with other table and a self join. There is no exception as we
> have done enough exception handling. It simply hangs with no trace in
> application logs.

> When I do "ps aux" , I see
> postgres  5294 41.3  2.4 270120 38092 pts/4  R10:41  52:56
> postgres: nuuser nm 127.0.0.1 DELETE

That doesn't look to me like it's "hanging"; it's trying to process
some unreasonably long-running query.  If I were you I'd be taking
a closer look at that DELETE command.  It may contain an unconstrained
join (cross-product) or some such.  Try EXPLAINing the command and
look for unexpected table scans.

> Postgres 7.3.4 on Linux..

That's mighty ancient and has many known bugs.  Do yourself a favor
and update to some newer version --- at the very least, use the latest
7.3 branch release (we're up to 7.3.13 now).

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Hanging Query

2006-01-13 Thread vimal . gupta

We have to inserts a records(15000- 2) into a table which also
contains (15000-2) records, then after insertion, we have to delete
the records according to a business rule.
Above process is taking place in a transaction and we are using batches
of 128 to insert records.
Everything works fine on QA environment but somehow after inserts,
delete query hangs in production environment. Delete query has some
joins with other table and a self join. There is no exception as we
have done enough exception handling. It simply hangs with no trace in
application logs.

When I do "ps aux" , I see
postgres  5294 41.3  2.4 270120 38092 pts/4  R10:41  52:56
postgres: nuuser nm 127.0.0.1 DELETE

Postgres 7.3.4 on Linux..

Thanks for any help..

Vimal


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings