Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
That would need a replica of the data, probably, which is not possible (the tablespace is 4TB).

*Spiros Ioannou, IT Manager, inAccess (www.inaccess.com), M: +30 6973-903808, T: +30 210-6802-358*

On 30 July 2015 at 21:47, Scott Marlowe wrote:
> You might want to try pgreplay: http://laurenz.github.io/pgreplay/
>
> On Thu, Jul 30, 2015 at 7:23 AM, Spiros Ioannou wrote:
>> I'm very sorry, but we don't have a synthetic load generator for our
>> testing setup, only production, and that is on SLA. I would be happy to
>> test the next release though.
>>
>> On 29 July 2015 at 13:42, Heikki Linnakangas wrote:
>>> On 07/28/2015 11:36 PM, Heikki Linnakangas wrote:
>>>> A-ha, I succeeded in reproducing this now on my laptop, with pgbench!
>>>> It seems to be important to have a very large number of connections:
>>>>
>>>>   pgbench -n -c400 -j4 -T600 -P5
>>>>
>>>> That got stuck after a few minutes. I'm using commit_delay=100.
>>>>
>>>> Now that I have something to work with, I'll investigate this more
>>>> tomorrow.
>>>
>>> Ok, it seems that this is caused by the same issue that I found with my
>>> synthetic test case after all. It is possible to get a lockup because of
>>> it.
>>>
>>> For the archives, here's a hopefully easier-to-understand explanation of
>>> how the lockup happens. It involves three backends. A and C are inserting
>>> WAL records, while B is flushing the WAL with commit_delay. The byte
>>> positions 2000, 2100, 2200, and 2300 are offsets within a WAL page. 2000
>>> points to the beginning of the page, while the others are later positions
>>> on the same page. WaitToFinish() is an abbreviation for
>>> WaitXLogInsertionsToFinish(). "Update pos X" means a call to
>>> WALInsertLockUpdateInsertingAt(X). "Reserve A-B" means a call to
>>> ReserveXLogInsertLocation(), which returned StartPos A and EndPos B.
>>>
>>> Backend A:  Acquire InsertLock 2
>>>             Reserve 2100-2200
>>> Backend B:  Calls WaitToFinish();
>>>             reservedUpto is 2200;
>>>             sees that Lock 1 is free
>>> Backend C:  Acquire InsertLock 1
>>>             Reserve 2200-2300
>>>             GetXLogBuffer(2200): page not in cache
>>>             Update pos 2000
>>>             AdvanceXLInsertBuffer(): runs until it is about
>>>             to acquire WALWriteLock
>>> Backend A:  GetXLogBuffer(2100): page not in cache
>>>             Update pos 2000
>>>             AdvanceXLInsertBuffer(): Acquire WALWriteLock,
>>>             write out old page, initialize new page,
>>>             Release WALWriteLock
>>>             finishes insertion, releases InsertLock 2
>>> Backend B:  WaitToFinish() continues;
>>>             sees that Lock 2 is free. Returns 2200.
>>>             Acquire WALWriteLock
>>>             Call WaitToFinish(2200):
>>>             blocks on Lock 1, whose initializedUpto is 2000.
>>>
>>> At this point, there is a deadlock between B and C. B is waiting for C
>>> to release the lock or update its insertingAt value past 2200, while C is
>>> waiting for WALWriteLock, held by B.
>>>
>>> To fix that, let's fix GetXLogBuffer() to always advertise the exact
>>> position, not the beginning of the page (except when inserting the first
>>> record on the page, just after the page header; see comments).
>>>
>>> This fixes the problem for me. I've been running pgbench for about 30
>>> minutes without lockups now, while without the patch it locked up within
>>> a couple of minutes. Spiros, can you easily test this patch in your
>>> environment? It would be nice to get a confirmation that this fixes the
>>> problem for you too.
>>>
>>> - Heikki
>
> --
> To understand recursion, one must first understand recursion.
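The fix Heikki describes, advertising the exact position from GetXLogBuffer() rather than the page start, can be sketched roughly as follows. This is pseudocode modeled on the function names used in this thread, not the actual patch; `page_not_in_cache()` and `cached_page_for()` are hypothetical stand-ins for the real cache-lookup logic in xlog.c.

```c
/*
 * Pseudocode sketch of the fix described above; the real GetXLogBuffer()
 * does considerably more (cache lookup, page header handling).
 */
static char *
GetXLogBuffer(XLogRecPtr ptr)
{
    if (page_not_in_cache(ptr))         /* hypothetical helper */
    {
        /*
         * Before the fix: advertise the beginning of the page, losing
         * precision and allowing WaitXLogInsertionsToFinish() to move
         * backwards:
         *
         *     WALInsertLockUpdateInsertingAt(ptr - ptr % XLOG_BLCKSZ);
         *
         * After the fix: advertise the exact requested position (except
         * for the first record on a page, just after the page header).
         */
        WALInsertLockUpdateInsertingAt(ptr);
        AdvanceXLInsertBuffer(ptr, false);
    }
    return cached_page_for(ptr);        /* hypothetical helper */
}
```

With the exact position advertised, backend B in the timeline above would see C's insertingAt as 2200 rather than 2000 and would not block on it.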
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
> Hmm. With that CTE query, were there other queries running at the same
> time?

Yes, a multitude of INSERTs and one COMMIT, as I remember. I've noticed that in every stuck situation there was always one stuck COMMIT running. We do mainly INSERT and SELECT, very rarely UPDATE or DELETE, but the stuck queries happen also without the CTE. The trigger on insert runs non-stop, though.

I think I've noticed a different behaviour with 9.4 and the CTE, but I'm only 50% sure about that; it may or may not be relevant: in 9.3, while the CTE was running, I think the trigger-filled table was locked and no inserts could be made to the primary table (which copies all inserts to the trigger table). In 9.4, inserts can be made, at a lower rate, even while the CTE runs. As I said, I'm not 100% sure the behaviour was like that in 9.3.

> I was able to debug the synthetic test case I created, but unfortunately I
> don't think it explains the lockup you're seeing after all.
>
> It's possible for WaitXLogInsertionsToFinish() to move backwards, in this
> scenario:
>
> 1. Backend A acquires WALInsertLock 2, and reserves xlog between byte
> positions 2100 - 2200.
> 2. Backend B calls WaitXLogInsertionsToFinish(), which blocks on backend
> A, which hasn't advertised any location yet.
> 3. Backend C acquires WALInsertLock 1, and reserves xlog between byte
> positions 2200 - 2300.
> 4. Backend C calls GetXLogBuffer(), and sees that the page is not in cache
> yet. (It does not call WALInsertLockUpdateInsertingAt() yet, because it's
> a bit slow or context-switched out.)
> 5. Backend A initializes the page, completes inserting its WAL record, and
> releases its WALInsertLock.
> 6. Backend B gets unblocked, seeing that the lock held by A is now free.
> It calculated 2200 as the return value, which was the latest reserved WAL
> position. (Backend C started after it began, so it didn't have to wait for
> it.)
> 7. Backend C calls WALInsertLockUpdateInsertingAt(), with a WAL position
> pointing to the beginning of the page, 2000.
>
> If you now call WaitXLogInsertionsToFinish() again, it will return 2000,
> because backend C is the only backend holding a lock, and its advertised
> position is 2000. But the previous call calculated 2200.
> GetXLogBuffer() always advertises a WAL position at the beginning of the
> requested page, but that's a bit bogus.
>
> However, AFAICS that is actually harmless. Backend C is not blocked. The
> page it's looking for is certainly in cache at this point, so it can
> continue without blocking. So I don't think this explains your lockup.

Thanks for the investigation. The only thing I can offer is that it's still running fine with commit_delay = 0. Perhaps a warning should be put in the documentation until the cause is found. I've found a similar issue online:
http://dba.stackexchange.com/questions/96957/postgresql-9-4-1-stuck-all-queries-when-making-multi-updates

thanks
-Spiros
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
Well, so far with commit_delay=0, no problems. I will of course report back if something happens, but I believe the problem may indeed be solved (or masked) by that setting.

Rough description of our setup, or how to reproduce:

* Timeseries data in a table, say, "measurements"; size 3-4TB, about 1000 inserts/second.
* Table measurements also has a trigger on insert that inserts each row into measurements_a as well (for daily export purposes).

Just the above would cause a stuck query after a few days. For exporting we run the following CTE query (measurement_events_b is an empty table, measurement_events_a has about 5GB):

WITH d_rows AS (DELETE FROM measurement_events_a RETURNING *)
INSERT INTO measurement_events_b SELECT * FROM d_rows;

The above caused the problem to appear every time, after 10-20 minutes.

Regards,
-Spiros

On 20 July 2015 at 17:02, Andres Freund wrote:
> On 2015-07-20 17:00:52 +0300, Spiros Ioannou wrote:
>> FYI we have a 9.3.5 with commit_delay = 4000 and commit_siblings = 5 with
>> an 8TB dataset which seems fine. (Runs on different - faster - hardware
>> though.)
>
> 9.4 has a different xlog insertion algorithm (scaling much better), so
> that unfortunately doesn't say very much...
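The setup described above can be sketched in SQL as follows. This is a minimal sketch for the archives, with hypothetical column definitions; only the final CTE query is verbatim from this thread.

```sql
-- Minimal sketch of the reproduction setup (hypothetical columns).
CREATE TABLE measurement_events   (id bigserial PRIMARY KEY, payload text);
CREATE TABLE measurement_events_a (id bigint, payload text);  -- export copy
CREATE TABLE measurement_events_b (id bigint, payload text);

-- Trigger that copies every insert into the export table.
CREATE FUNCTION copy_to_export() RETURNS trigger AS $$
BEGIN
    INSERT INTO measurement_events_a VALUES (NEW.id, NEW.payload);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_events_export
    BEFORE INSERT ON measurement_events
    FOR EACH ROW EXECUTE PROCEDURE copy_to_export();

-- The export query that reliably triggered the lockup within 10-20 minutes,
-- while ~1000 inserts/second were hitting measurement_events:
WITH d_rows AS (DELETE FROM measurement_events_a RETURNING *)
INSERT INTO measurement_events_b SELECT * FROM d_rows;
```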
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
FYI we have a 9.3.5 with commit_delay = 4000 and commit_siblings = 5 with an 8TB dataset which seems fine. (Runs on different - faster - hardware though.)

On 20 July 2015 at 15:01, Andres Freund wrote:
> Heikki,
>
> On 2015-07-20 13:27:12 +0200, Andres Freund wrote:
>> On 2015-07-20 13:22:42 +0200, Andres Freund wrote:
>>> Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
>>> XLogFlush().
>>
>> These are the relevant stack traces:
>>
>> db9lock/debuglog-commit.txt
>> #2  0x7f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0x) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
>> #3  0x7f7405a0d3e6 in WaitXLogInsertionsToFinish (upto=121713318915952) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
>> #4  0x7f7405a0e1d3 in XLogFlush (record=121713318911056) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
>>
>> db9lock/debuglog-insert-8276.txt
>> #1  0x7f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, interruptOK=0 '\000') at pg_sema.c:421
>> #2  0x7f7405bd4849 in LWLockAcquireCommon (val=<optimized out>, valptr=<optimized out>, mode=<optimized out>, l=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
>> #3  LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
>> #4  0x7f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>, opportunistic=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
>> #5  0x7f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
>> #6  0x7f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>, StartPos=<optimized out>, rdata=0x71c21b90, isLogSwitch=<optimized out>, write_len=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
>> #7  XLogInsert (rmid=<optimized out>, info=<optimized out>, rdata=<optimized out>) at /tmp/buildd/postgre
>
> XLogFlush() has the following comment:
>
>     /*
>      * Re-check how far we can now flush the WAL. It's generally not
>      * safe to call WaitXLogInsertionsToFinish while holding
>      * WALWriteLock, because an in-progress insertion might need to
>      * also grab WALWriteLock to make progress. But we know that all
>      * the insertions up to insertpos have already finished, because
>      * that's what the earlier WaitXLogInsertionsToFinish() returned.
>      * We're only calling it again to allow insertpos to be moved
>      * further forward, not to actually wait for anyone.
>      */
>     insertpos = WaitXLogInsertionsToFinish(insertpos);
>
> but I don't think that's valid reasoning. WaitXLogInsertionsToFinish()
> calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if
> there's an exclusive locker and some backend doesn't yet have set
> initializedUpto. Which seems like a possible state?
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
Debian 6, x64, HP ProLiant, dbsize: 3TB on EMC AX-4i (iSCSI), 16 cores, 24GB RAM.
Kernel: Linux vserver11 2.6.32-5-xen-amd64 #1 SMP Sun Dec 7 22:52:42 UTC 2014 x86_64 GNU/Linux

I attach postgresql.conf.

On 20 July 2015 at 16:04, Andres Freund wrote:
> Hi,
>
> On 2015-07-20 15:58:33 +0300, Spiros Ioannou wrote:
>> Happened again, another backtrace from a COMMIT process. I changed the
>> commit_delay to 0 (it was 4000 to help with our storage) and will report
>> back.
>
> What hardware & OS is this happening on?
>
> Regards,
>
> Andres

postgresql.conf
Description: Binary data

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
Happened again; here is another backtrace from a COMMIT process. I changed the commit_delay to 0 (it was 4000 to help with our storage) and will report back. Already logging to debuglog.txt.

#0  0x7f47b0789ec7 in semop () from /lib/libc.so.6
#1  0x7f47b2513d91 in PGSemaphoreLock (sema=0x7f47abfe7ed0, interruptOK=0 '\000') at pg_sema.c:421
#2  0x7f47b25704f4 in LWLockWaitForVar (l=0x7f449f452680, valptr=0x7f449f4526a0, oldval=<optimized out>, newval=0x) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
#3  0x7f47b23a93e6 in WaitXLogInsertionsToFinish (upto=121780827906336) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
#4  0x7f47b23aa1d3 in XLogFlush (record=121780827898328) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
#5  0x7f47b2398f17 in RecordTransactionCommit () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:1196
#6  0x7f47b239921b in CommitTransaction () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:1953
#7  0x7f47b239a8e5 in CommitTransactionCommand () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xact.c:2732
#8  0x7f47b257c105 in finish_xact_command () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:2437
#9  0x7f47b2580a10 in exec_execute_message (max_rows=<optimized out>, portal_name=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:1974
#10 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=0x7f47b49b5730 "ifms_db", username=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/tcop/postgres.c:4137
#11 0x7f47b25280f5 in BackendRun (port=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:4164
#12 BackendStartup (port=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:3829
#13 ServerLoop () at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:1597
#14 PostmasterMain (argc=<optimized out>, argv=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/postmaster/postmaster.c:1244
#15 0x7f47b24ba474 in main (argc=5, argv=0x7f47b49b4920) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/main/main.c:228
Detaching from program: /usr/lib/postgresql/9.4/bin/postgres, process 15601

On 20 July 2015 at 15:21, Heikki Linnakangas wrote:
> On 07/20/2015 03:01 PM, Andres Freund wrote:
>> Heikki,
>>
>> On 2015-07-20 13:27:12 +0200, Andres Freund wrote:
>>> On 2015-07-20 13:22:42 +0200, Andres Freund wrote:
>>>> Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
>>>> XLogFlush().
>>>
>>> These are the relevant stack traces:
>>>
>>> db9lock/debuglog-commit.txt
>>> #2  0x7f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0, oldval=<optimized out>, newval=0x) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
>>> #3  0x7f7405a0d3e6 in WaitXLogInsertionsToFinish (upto=121713318915952) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
>>> #4  0x7f7405a0e1d3 in XLogFlush (record=121713318911056) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849
>>>
>>> db9lock/debuglog-insert-8276.txt
>>> #1  0x7f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, interruptOK=0 '\000') at pg_sema.c:421
>>> #2  0x7f7405bd4849 in LWLockAcquireCommon (val=<optimized out>, valptr=<optimized out>, mode=<optimized out>, l=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
>>> #3  LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
>>> #4  0x7f7405a0dcca in AdvanceXLInsertBuffer (upto=<optimized out>, opportunistic=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
>>> #5  0x7f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
>>> #6  0x7f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=<optimized out>, StartPos=<optimized out>, rdata=0x71c21b90, isLogSwitch=<optimized out>, write_len=<optimized out>) at /tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
>>> #7  XLogInsert (rmid=<optimized out>, info=<optimized out>, rdata=<optimized out>) at /tmp/buildd/postgre
>>
>> XLog
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
Hi Tom,
thank you for your input. The DB was stuck again; I attach all logs and stack traces: a stack trace from a COMMIT, an INSERT, an UPDATE, the wal writer, the writer, and a sequence.

Stracing the COMMIT showed it stuck at:
semop(3145761, {{12, -1, 0}}, 1

Please tell me what else I could do to help debugging. Is there a way to downgrade PostgreSQL?

On 20 July 2015 at 01:01, Tom Lane wrote:
> Spiros Ioannou writes:
>> We upgraded to 9.4.4 from 8.4. We have tried 9.3.x in the past with
>> success. Unfortunately 9.4 gets stuck 2-3 times a week.
>
>> Lots of queries seem stuck, that is they have waiting:false. Queries can
>> be COMMIT or INSERT. No blocking locks seem to exist.
>> strace reveals that all stuck processes are in semop:
>> semop(2064417, {{0, -1, 0}}, 1^C
>
> Stack traces of the stuck processes might help determine what's going on:
>
> https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
>
> Please try to capture traces for both the COMMIT and INSERT cases,
> as they're almost certainly different.
>
>             regards, tom lane

db9lock.tar.gz
Description: GNU Zip compressed data
Re: [GENERAL] Lots of stuck queries after upgrade to 9.4
OS is Debian 6.
[GENERAL] Lots of stuck queries after upgrade to 9.4
We upgraded to 9.4.4 from 8.4. We have tried 9.3.x in the past with success. Unfortunately 9.4 gets stuck 2-3 times a week.

Lots of queries seem stuck, that is they have waiting:false. Queries can be COMMIT or INSERT. No blocking locks seem to exist.

strace reveals that all stuck processes are in semop:
semop(2064417, {{0, -1, 0}}, 1^C

ipcs reveals no shortage of semaphores. Any ideas are very welcome; our service is now in a very sad state.
Re: [GENERAL] How to access large objects in Postgresql c-language functions?
Thanks, that's interesting, I will look into that. Regards, -Spiros
Re: [GENERAL] How to access large objects in Postgresql c-language functions?
But these need a connection to the PostgreSQL backend. It seems overkill for PostgreSQL to connect to itself; I would prefer something more direct. It seems for now that reading libpq's source code is the only way to go.

Regards,
Spiros
[GENERAL] How to access large objects in Postgresql c-language functions?
I'm writing a C-language user-defined function for PostgreSQL 9.3.5. I didn't use PL/pgSQL or PL/Python because part of it needs to be written in C for speed. The function gets the OID of an existing large object as an argument. How can I access and modify that large object's data from within the C-language function? I can't find any docs for that!

Thanks,
-Spiros
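For the archives, here is a sketch of what a server-side approach might look like using the backend's internal large-object API (inv_open() and friends, declared in storage/large_object.h). This is an assumption based on the backend headers rather than a documented public API, so the signatures should be verified against the PostgreSQL version in use; the function name `lo_first_bytes` is hypothetical.

```c
#include "postgres.h"
#include "fmgr.h"
#include "libpq/libpq-fs.h"        /* INV_READ, INV_WRITE flags */
#include "storage/large_object.h"  /* inv_open, inv_read, inv_close */

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(lo_first_bytes);

/* Return the first few bytes of the large object with the given OID. */
Datum
lo_first_bytes(PG_FUNCTION_ARGS)
{
    Oid              lobjId = PG_GETARG_OID(0);
    LargeObjectDesc *lod;
    char             buf[16];
    int              nread;
    bytea           *result;

    /* inv_open() must run inside a transaction; UDFs always do. */
    lod = inv_open(lobjId, INV_READ, CurrentMemoryContext);
    nread = inv_read(lod, buf, sizeof(buf));
    inv_close(lod);

    result = (bytea *) palloc(VARHDRSZ + nread);
    SET_VARSIZE(result, VARHDRSZ + nread);
    memcpy(VARDATA(result), buf, nread);
    PG_RETURN_BYTEA_P(result);
}
```

inv_write() and inv_seek() exist alongside inv_read() for modification, with the caveat that this internal interface can change between releases.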
Re: [GENERAL] partitioning query planner almost always scans all tables
Thank you for your input. When doing an "ORDER BY measurement_time DESC" I had hoped the query planner would first query the most recent "child" table, i.e. the table holding the current month's data, and then move on to the older tables, since it knows the partitioning is based on measurement_time. Instead it always queries the oldest table first and then moves on to the current one.

We did the partitioning to be able to drop old data, saving disk space, as you said, not for performance reasons, although we hoped that having smaller indexes would also help.

Best regards,
-Spiros

On 22 January 2015 at 17:37, Tom Lane wrote:
> Spiros Ioannou writes:
>> It is ORDER BY measurement_time, not measurement_id, and measurement_time
>> is used to create the partition. So the planner should know the correct
>> order, but instead it seems to query tables in the wrong order.
>
> The planner does not know that, and even if it attempted to figure it out
> by comparing the child tables' constraints, it could not generate a plan
> that considered only one child table as you incorrectly imagine. What if
> the "latest" table turned out to be empty at runtime?
>
> The obtained plan with a Merge Append atop Index Scan Backwards nodes
> seems perfectly reasonable to me. This will result in fetching only the
> latest row within each partition, so that the work involved is O(number of
> partitions) not O(total number of rows).
>
> If you're not happy with that, reconsider how many partitions you really
> need. Newbies almost invariably create far more partitions than is a good
> idea for performance. In my view, if you've got more than a couple dozen,
> you're doing it wrong. Partitioning is, in general, not a benefit for
> query performance (except in a few very narrow, specialized cases); and
> the more partitions you have the worse the penalty. Partitioning only
> helps for data management, in particular being able to drop old data in
> bulk rather than through expensive DELETE WHERE queries. How often do
> you do that, and do you really need to be able to do it at a small
> granularity?
>
>             regards, tom lane
Re: [GENERAL] partitioning query planner almost always scans all tables
>> EXPLAIN ANALYZE SELECT * FROM measurement_events WHERE
>> measurement_source_id='df86917e-8df0-11e1-8f8f-525400e76ceb' ORDER BY
>> measurement_time DESC LIMIT 1;
>>
>> This seems to fail, scanning all tables. Do you think this can be
>> improved at all? The query plan of the above query is as follows:
>
> The combination of sorting by measurement_source_id and limit
> hinders constraint exclusion, because the order of the column
> across the whole inheritance tree is not known to the planner. And the
> plan below also doesn't show whether constraint exclusion worked
> or not, for the same reason. But I suppose it worked.

It is ORDER BY measurement_time, not measurement_id, and measurement_time is used to create the partition. So the planner should know the correct order, but instead it seems to query tables in the wrong order.
Re: [GENERAL] partitioning query planner almost always scans all tables
@Rob Sargent: sorry Rob, not sure what you are asking.

@Kyotaro Horiguchi: thanks for your reply and time, Kyotaro.

Using the following query:

EXPLAIN ANALYZE SELECT * FROM measurement_events WHERE
measurement_source_id='df86917e-8df0-11e1-8f8f-525400e76ceb' AND
measurement_time >= '2015-01-01 00:00:00+0' LIMIT 1;

produces this plan:

 Limit  (cost=0.00..4.02 rows=1 width=67) (actual time=49.125..49.125 rows=1 loops=1)
   ->  Append  (cost=0.00..3644.05 rows=907 width=67) (actual time=49.122..49.122 rows=1 loops=1)
         ->  Seq Scan on measurement_events  (cost=0.00..0.00 rows=1 width=966) (actual time=0.003..0.003 rows=0 loops=1)
               Filter: ((measurement_time >= '2015-01-01 02:00:00+02'::timestamp with time zone) AND (measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid))
         ->  Bitmap Heap Scan on measurement_events_p2015_01  (cost=41.73..3546.10 rows=894 width=54) (actual time=49.119..49.119 rows=1 loops=1)
               Recheck Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01 02:00:00+02'::timestamp with time zone))
               ->  Bitmap Index Scan on measurement_events_p2015_01_pkey  (cost=0.00..41.51 rows=894 width=0) (actual time=41.836..41.836 rows=997 loops=1)
                     Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01 02:00:00+02'::timestamp with time zone))
         ->  Index Scan using measurement_events_p2015_02_pkey on measurement_events_p2015_02  (cost=0.14..8.16 rows=1 width=966) (never executed)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01 02:00:00+02'::timestamp with time zone))
         ->  Index Scan using measurement_events_p2015_03_pkey on measurement_events_p2015_03  (cost=0.14..8.16 rows=1 width=966) (never executed)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01 02:00:00+02'::timestamp with time zone))
         ... (cut for brevity)

1) Do you know if this means that the query will stop at the first match (since it is LIMIT 1), or will it search all tables regardless of results?

2) To improve on the above, do you (or anyone else) have any input on this: to get the latest value from all tables, we were using the following query (before partitioning):

EXPLAIN ANALYZE SELECT * FROM measurement_events WHERE
measurement_source_id='df86917e-8df0-11e1-8f8f-525400e76ceb' ORDER BY
measurement_time DESC LIMIT 1;

This seems to fail, scanning all tables. Do you think this can be improved at all? The query plan of the above query is as follows:

 Limit  (cost=5.57..9.99 rows=1 width=921) (actual time=5.361..5.361 rows=1 loops=1)
   ->  Merge Append  (cost=5.57..451374.16 rows=102155 width=921) (actual time=5.359..5.359 rows=1 loops=1)
         Sort Key: measurement_events.measurement_time
         ->  Index Scan Backward using measurement_events_pkey on measurement_events  (cost=0.12..8.14 rows=1 width=966) (actual time=0.004..0.004 rows=0 loops=1)
               Index Cond: (measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid)
         ->  Index Scan Backward using measurement_events_p2014_01_pkey on measurement_events_p2014_01  (cost=0.14..8.16 rows=1 width=966) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: (measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid)
         ->  Index Scan Backward using measurement_events_p2014_02_pkey on measurement_events_p2014_02  (cost=0.14..8.16 rows=1 width=966) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: (measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid)
         ->  Index Scan Backward using measurement_events_p2014_03_pkey on measurement_events_p2014_03  (cost=0.14..8.16 rows=1 width=966) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: (measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid)
         ... (cut for brevity)
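One possible mitigation for the "latest value" case, offered as an untested sketch rather than a recommendation: since the application knows which child table holds the current month, it can query that partition directly and only fall back to the full parent query when the partition has no row for the source in question.

```sql
-- Untested sketch: probe the newest partition first. Only if this returns
-- zero rows would the application re-run the query against the parent
-- table (all partitions). Table and source names are from this thread.
SELECT *
FROM measurement_events_p2015_03
WHERE measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'
ORDER BY measurement_time DESC
LIMIT 1;
```

This trades one extra round-trip in the worst case for touching a single index in the common case.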
[GENERAL] partitioning query planner almost always scans all tables
Hello group,
we have a timeseries table, and we tried to partition it by month (with pg_partman). It seems the query planner always reads all tables regardless of WHERE, except when WHERE is an equality.

The parent table:

ifms_db=# \dS measurement_events
                  Table "public.measurement_events"
        Column         |           Type           | Modifiers
-----------------------+--------------------------+-----------
 measurement_source_id | uuid                     | not null
 measurement_time      | timestamp with time zone | not null
 event_reception_time  | timestamp with time zone | not null
 measurement_value     | character varying(200)   | not null
 quality               | character varying(500)   | not null
Indexes:
    "measurement_events_pkey" PRIMARY KEY, btree (measurement_source_id, measurement_time)
Triggers:
    measurement_events_part_trig BEFORE INSERT ON measurement_events FOR EACH ROW EXECUTE PROCEDURE measurement_events_part_trig_func()
Number of child tables: 25 (Use \d+ to list them.)

One of the child tables (2014_03):

ifms_db=# \dS measurement_events_p2014_03
              Table "public.measurement_events_p2014_03"
        Column         |           Type           | Modifiers
-----------------------+--------------------------+-----------
 measurement_source_id | uuid                     | not null
 measurement_time      | timestamp with time zone | not null
 event_reception_time  | timestamp with time zone | not null
 measurement_value     | character varying(200)   | not null
 quality               | character varying(500)   | not null
Indexes:
    "measurement_events_p2014_03_pkey" PRIMARY KEY, btree (measurement_source_id, measurement_time)
Check constraints:
    "measurement_events_p2014_03_partition_check" CHECK (measurement_time >= '2014-03-01 00:00:00+02'::timestamp with time zone AND measurement_time < '2014-04-01 00:00:00+03'::timestamp with time zone)
Inherits: measurement_events

The query:

# explain analyze select * from measurement_events where measurement_source_id='df86917e-8df0-11e1-8f8f-525400e76ceb' AND measurement_time >= DATE '2015-01-01' limit 1;

 Limit  (cost=0.00..4.12 rows=1 width=87) (actual time=0.377..0.377 rows=1 loops=1)
   ->  Append  (cost=0.00..2696.08 rows=655 width=87) (actual time=0.376..0.376 rows=1 loops=1)
         ->  Seq Scan on measurement_events  (cost=0.00..0.00 rows=1 width=966) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: ((measurement_time >= '2015-01-01'::date) AND (measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid))
         ->  Index Scan using measurement_events_p2014_01_pkey on measurement_events_p2014_01  (cost=0.14..8.16 rows=1 width=966) (actual time=0.005..0.005 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan using measurement_events_p2014_02_pkey on measurement_events_p2014_02  (cost=0.14..8.16 rows=1 width=966) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan using measurement_events_p2014_03_pkey on measurement_events_p2014_03  (cost=0.14..8.16 rows=1 width=966) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan using measurement_events_p2014_04_pkey on measurement_events_p2014_04  (cost=0.14..8.16 rows=1 width=966) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan using measurement_events_p2014_05_pkey on measurement_events_p2014_05  (cost=0.14..8.16 rows=1 width=966) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan using measurement_events_p2014_06_pkey on measurement_events_p2014_06  (cost=0.14..8.16 rows=1 width=966) (actual time=0.002..0.002 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan using measurement_events_p2014_07_pkey on measurement_events_p2014_07  (cost=0.14..8.16 rows=1 width=966) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: ((measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'::uuid) AND (measurement_time >= '2015-01-01'::date))
         ->  Index Scan
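One detail worth checking in a setup like this, offered as a hedged suggestion rather than a diagnosis: the partitions' CHECK constraints are on a `timestamp with time zone` column, and plan-time constraint exclusion generally requires the WHERE clause to compare against a constant the planner can reduce to that type. A `DATE` literal goes through a timezone-dependent (stable, not immutable) cast, which can prevent exclusion, whereas a `timestamptz` literal can be excluded at plan time.

```sql
-- Hedged suggestion: ensure constraint exclusion is enabled for inheritance
-- children, and compare against a timestamptz constant matching the CHECK
-- constraints' type instead of a DATE literal.
SET constraint_exclusion = partition;  -- the default in recent releases

EXPLAIN SELECT *
FROM measurement_events
WHERE measurement_source_id = 'df86917e-8df0-11e1-8f8f-525400e76ceb'
  AND measurement_time >= TIMESTAMPTZ '2015-01-01 00:00:00+00'
LIMIT 1;
```

With the timestamptz constant, the 2014 partitions should disappear from the Append node entirely rather than appearing as (cheap but nonzero) index scans.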
Re: [GENERAL] Largely inconsistent query execution speed, involving psql_tmp
There are indices:

table "measurement_events": "measurement_events_pkey" PRIMARY KEY, btree (measurement_source_id, measurement_time)
table "measurement_sources": "measurement_sources_pkey" PRIMARY KEY, btree (measurement_source_id)

-Spiros

On 8 July 2014 18:10, Andy Colson wrote:
> On 7/8/2014 4:47 AM, Spiros Ioannou wrote:
>
>> While executing the following query through psql:
>>
>>   SELECT me.* FROM measurement_events me
>>   JOIN msrcs_timestamps mt ON me.measurement_source_id = mt.measurement_source_id
>>   WHERE measurement_time > last_update_time
>>
>> there are two behaviors observed with PostgreSQL (8.4):
>> 1) Either the query performs lots of reads on the database and completes
>>    in about 4 hours (that is the normal, expected behavior), or
>> 2) the query starts filling up pgsql_tmp, which causes large write I/O
>>    on the server, and the query never completes in a reasonable time
>>    (we stop it after 10h).
>>
>> For some strange reason, behavior 2 is always observed when running
>> psql through a bash script, while behavior 1 is only observed when
>> running psql interactively from the command line (but not always).
>>
>> explain:
>>
>> # explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt
>>   ON me.measurement_source_id = mt.measurement_source_id
>>   WHERE measurement_time > last_update_time;
>>
>>                                  QUERY PLAN
>> --------------------------------------------------------------------------
>>  Hash Join  (cost=10111.78..422893652.69 rows=2958929695 width=103)
>>    Hash Cond: (me.measurement_source_id = mt.measurement_source_id)
>>    Join Filter: (me.measurement_time > mt.last_update_time)
>>    ->  Seq Scan on measurement_events me  (cost=0.00..234251772.85 rows=8876789085 width=103)
>>    ->  Hash  (cost=5733.57..5733.57 rows=350257 width=24)
>>          ->  Seq Scan on msrcs_timestamps mt  (cost=0.00..5733.57 rows=350257 width=24)
>> (6 rows)
>>
>> We have tried fiddling with work_mem up to 512M so far - no difference.
>> Any suggestions?
>>
>> Thanks for any help,
>> -Spiros Ioannou
>> inaccess
>
> Is there any reason you don't have an index?
>
> One, or both, of these will help:
>
> create index measurement_events_pk on measurement_events(measurement_source_id);
>
> create index msrcs_timestamps_pk on msrcs_timestamps(measurement_source_id);
>
> measurement_events has 8 billion rows, so expect it to take a while, but
> it's a one-time cost and should _dramatically_ increase your query
> performance.
>
> -Andy
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
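[Editorial sketch] A quick way to check whether the hash join is what is filling pgsql_tmp is the batch count on the Hash node: on reasonably recent PostgreSQL versions, EXPLAIN ANALYZE prints "Buckets: ... Batches: ..." for each Hash, and more than one batch means the hash table spilled to temp files. A minimal sketch, assuming the table and column names from the thread (run it against a test copy, not the 8-billion-row production table):

```sql
-- Raise work_mem for this session only, so the hash of the small side
-- (msrcs_timestamps, ~350k rows) can fit in memory.
SET work_mem = '512MB';

-- EXPLAIN ANALYZE actually executes the query; the Hash node's
-- "Batches" figure shows whether it spilled to pgsql_tmp.
EXPLAIN ANALYZE
SELECT me.*
FROM measurement_events me
JOIN msrcs_timestamps mt
  ON me.measurement_source_id = mt.measurement_source_id
WHERE me.measurement_time > mt.last_update_time;
```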
[GENERAL] Largely inconsistent query execution speed, involving psql_tmp
While executing the following query through psql:

  SELECT me.* FROM measurement_events me
  JOIN msrcs_timestamps mt ON me.measurement_source_id = mt.measurement_source_id
  WHERE measurement_time > last_update_time

there are two behaviors observed with PostgreSQL (8.4):

1) Either the query performs lots of reads on the database and completes in
   about 4 hours (that is the normal, expected behavior), or
2) the query starts filling up pgsql_tmp, which causes large write I/O on
   the server, and the query never completes in a reasonable time (we stop
   it after 10h).

For some strange reason, behavior 2 is always observed when running psql
through a bash script, while behavior 1 is only observed when running psql
interactively from the command line (but not always).

explain:

# explain SELECT me.* FROM measurement_events me JOIN msrcs_timestamps mt
  ON me.measurement_source_id = mt.measurement_source_id
  WHERE measurement_time > last_update_time;

                                 QUERY PLAN
--------------------------------------------------------------------------
 Hash Join  (cost=10111.78..422893652.69 rows=2958929695 width=103)
   Hash Cond: (me.measurement_source_id = mt.measurement_source_id)
   Join Filter: (me.measurement_time > mt.last_update_time)
   ->  Seq Scan on measurement_events me  (cost=0.00..234251772.85 rows=8876789085 width=103)
   ->  Hash  (cost=5733.57..5733.57 rows=350257 width=24)
         ->  Seq Scan on msrcs_timestamps mt  (cost=0.00..5733.57 rows=350257 width=24)
(6 rows)

We have tried fiddling with work_mem up to 512M so far - no difference.
Any suggestions?

Thanks for any help,
-Spiros Ioannou
inaccess
Re: [GENERAL] While only running pg_dump, postgresql performs writes inside base/ directory. How can we stop this?
- autovacuum is not off in the pre-snapshot (production) database. It
  auto-runs every 2 weeks to avoid transaction ID wraparound.
- I tried modifying pg_dump to use a more relaxed transaction isolation
  level (READ UNCOMMITTED) instead of the default (serializable), but this
  didn't help.

Thanks for the info.

On 10 December 2013 21:13, Jeff Janes wrote:
> On Tue, Dec 10, 2013 at 12:43 AM, Spiros Ioannou wrote:
>
>> Hi Jeff,
>> autovacuum is off on the DB running on the filesystem snapshot.
>
> Which probably makes sense on the snapshot, but is it also off on the
> pre-snapshot database?
>
>> What "hint bits" do you suppose it is setting? It's running only one COPY
>> command for days. Do you have any suggestions to make it more "read only"?
>
> When a query sees a tuple that is still listed as part of an open
> transaction, it needs to figure out whether that transaction has now
> completed, and if so whether it committed or aborted. This can be quite a
> bit of work to do, so once complete it sets a hint bit locally on that
> tuple, so that the next visitor doesn't have to repeat the work. I don't
> believe that there is any way to turn this off, unless you want to run your
> own custom build of PostgreSQL.
>
> Cheers,
>
> Jeff
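[Editorial note] Two asides on the points above. First, PostgreSQL has no real READ UNCOMMITTED mode; it is accepted but behaves as READ COMMITTED, so changing pg_dump's isolation level would not be expected to affect hint-bit writes. Second, the "auto-runs every 2 weeks to avoid wraparound" behavior can be verified directly; a minimal sketch using the standard catalogs:

```sql
-- How close each database is to transaction ID wraparound.
-- When age(datfrozenxid) approaches autovacuum_freeze_max_age
-- (200 million by default), autovacuum launches anti-wraparound
-- vacuums even if autovacuum is otherwise disabled.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;
```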
Re: [GENERAL] While only running pg_dump, postgresql performs writes inside base/ directory. How can we stop this?
Hi Jeff,
autovacuum is off on the DB running on the filesystem snapshot.
What "hint bits" do you suppose it is setting? It's running only one COPY
command for days. Do you have any suggestions to make it more "read only"?

On 2 December 2013 18:14, Jeff Janes wrote:
> On Mon, Dec 2, 2013 at 6:22 AM, Spiros Ioannou wrote:
>
>> We are dumping a 4TB postgres database using pg_dump. The source is on a
>> copy-on-write snapshot filesystem, and the target is an external drive.
>>
>> While the pg_dump is running we get about 4-5MB/sec writes on the
>> filesystem where the postgresql data is (iostat), and specifically inside
>> the base/ directory.
>>
>> Since the only query running on this DB is the COPY initiated from
>> pg_dump, which only reads data, who is writing data to the postgres data
>> files?
>
> It is probably setting "hint bits" on the tuples it is visiting.
>
>> We need to find a way to make postgres stop writing, because the source
>> copy-on-write filesystem gets filled up before the dump is finished.
>
> Vacuum the system before taking the snapshot.
>
>> Postgresql version is 8.4, vacuum is not running (autovacuum = off), OS
>> is debian 6.
>
> Since autovacuum is off, what are you doing about vacuuming?
>
> Cheers,
>
> Jeff
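[Editorial sketch] Jeff's "vacuum before taking the snapshot" advice works because VACUUM visits every tuple and sets the hint bits itself, so the later pg_dump read pass finds nothing left to write. A minimal sketch of what to run on the production database before snapshotting (assumes you can afford the vacuum I/O there):

```sql
-- Run in the production database before taking the filesystem snapshot.
-- Plain VACUUM sets hint bits on all tuples it visits, so the dump
-- from the snapshot no longer needs to write them.
VACUUM ANALYZE;

-- Optionally, also freeze old transaction IDs so anti-wraparound
-- vacuums are not triggered for a long time afterwards:
-- VACUUM FREEZE;
```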
[GENERAL] While only running pg_dump, postgresql performs writes inside base/ directory. How can we stop this?
We are dumping a 4TB postgres database using pg_dump. The source is on a
copy-on-write snapshot filesystem, and the target is an external drive.

While the pg_dump is running we get about 4-5MB/sec writes on the filesystem
where the postgresql data is (iostat), and specifically inside the base/
directory.

Since the only query running on this DB is the COPY initiated from pg_dump,
which only reads data, who is writing data to the postgres data files?

We need to find a way to make postgres stop writing, because the source
copy-on-write filesystem gets filled up before the dump is finished.

Postgresql version is 8.4, vacuum is not running (autovacuum = off), OS is
debian 6.
[GENERAL] postgres slower than grep?
Hello all,
before you start reading, have in mind that this is not a post to advertise
one db over another; I just want to know what happens. That's why I did the
comparisons with another db.

To the point, this is my table:

              Table "table1"
 Attribute  |    Type     | Modifier
------------+-------------+----------
 ClientID   | integer     |
 firstname  | varchar(5)  |
 surname    | varchar(22) |
 area       | varchar(3)  |
 neigh      | varchar(27) |
 dimos      | varchar(50) |
 tel        | varchar(7)  |

The facts:

- The table exported to a text file is about 330MB, and contains about 5
  million lines. The same table takes 670MB in postgres. Why? (I tried
  mysql, and it uses 340MB.)
- Issuing the following query:
    select * from table1 where tel='7485842';
  takes about 1 minute 40 seconds to complete. (The query has one result.)
  Explain of the above returns:
    Seq Scan on table1  (cost=0.00..147835.01 rows=23 width=76)
- Issuing a grep on the text file takes about 25 seconds (for the whole
  grep to finish - worst case).
- Issuing the same query in mysql takes about 25 seconds.
- To test the disk speed, I doubled the size of the text file by copying it
  twice, and the same grep took 51 seconds (as expected).
- When creating an index, the query completes of course in no time.
- Yes, I tried vacuum.

Postgres is version 7.0, system is solaris 2.7, hard disk is 1rpm ultraSCSI,
cpu is UltraSparcIIi 333MHz, physical memory is 384MB.

And now the questions:

- Shouldn't postgres be at least as fast as the grep?
- Shouldn't the table data file be smaller, to reduce disk activity? Why is
  it double the size of the same mysql data file, or the text file? I also
  noticed that an index file for column "tel" is about 130MB large while
  mysql's one was 64MB; is this normal?

Thank you for your time,
Spiros Ioannou
e-mail: [EMAIL PROTECTED]
---
Image Video & Multimedia Systems Lab.
Department of Electrical & Computer Eng.
National Technical University of Athens
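[Editorial sketch] The poster already notes that "when creating an index the query completes of course in no time" - without an index on tel, every lookup is a full sequential scan of the 5-million-row table, which is exactly what the Seq Scan plan above shows. A minimal sketch of the fix, assuming the table1 definition from the post (the index name is illustrative):

```sql
-- An index on the column in the WHERE clause lets the planner
-- replace the Seq Scan with an index lookup.
CREATE INDEX table1_tel_idx ON table1 (tel);

-- Refresh planner statistics so the new index is considered.
VACUUM ANALYZE table1;

-- The plan should now show an Index Scan instead of a Seq Scan.
EXPLAIN SELECT * FROM table1 WHERE tel = '7485842';
```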