On 12/05/2014 12:54 PM, Josh Berkus wrote:
> Hackers,
> 
> This is not a complete enough report for a diagnosis.  I'm posting it
> here just in case someone else sees something like it, and having an
> additional report will help figure out the underlying issue.
> 
> * 700GB database with around 5,000 writes per second
> * 8 replicas handling around 10,000 read queries per second each
> * replicas are slammed (40-70% utilization)
> * replication produces lots of replication query cancels
> 
> In this scenario, a specific query against some of the less busy and
> fairly small tables would produce a segfault (signal 11) once every 1-4
> days randomly.  This query could have 100's of successful runs for every
> segfault. This was not reproduceable manually, and the segfaults never
> happened on the master.  Nor did we ever see a segfault based on any
> other query, including against the tables which were generally the
> source of the query cancels.
> 
> In case it's relevant, the query included use of regexp_split_to_array()
> and ORDER BY random(), neither of which are generally used in the user's
> other queries.
> 
> We made some changes which decreased query cancel (optimizing queries,
> turning on hot_standby_feedback) and we haven't seen a segfault since
> then.  As far as the user is concerned, this solves the problem, so I'm
> never going to get a trace or a core dump file.

Forgot a major piece of evidence as to why I think this is related to
query cancel:  in each case, the segfault was preceeded by a
multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
the backend running the query which segfaulted might have been the only
backend *not* cancelled due to query conflict concurrently.
Contradicting this, there are other multi-backend query cancels in the
logs which do NOT produce a segfault.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to