Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-08-02 Thread Heikki Linnakangas

On 07/30/2015 04:23 PM, Spiros Ioannou wrote:

I'm very sorry but we don't have a synthetic load generator for our testing
setup, only production and that is on SLA. I would be happy to test the
next release though.


Ok, no worries. I've pushed these changes; they will appear in the next 
release. Thanks for the report!


- Heikki



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-29 Thread Heikki Linnakangas

On 07/28/2015 11:36 PM, Heikki Linnakangas wrote:

A-ha, I succeeded in reproducing this now on my laptop, with pgbench! It
seems to be important to have a very large number of connections:

pgbench -n -c400 -j4 -T600 -P5

That got stuck after a few minutes. I'm using commit_delay=100.

Now that I have something to work with, I'll investigate this more tomorrow.


Ok, it seems that this is caused by the same issue that I found with my 
synthetic test case, after all. It is possible to get a lockup because 
of it.


For the archives, here's a hopefully easier-to-understand explanation of 
how the lockup happens. It involves three backends: A and C are 
inserting WAL records, while B is flushing the WAL with commit_delay. 
The byte positions 2000, 2100, 2200, and 2300 are positions in the WAL; 
2000 points to the beginning of a page, while the others are 
later positions on the same page. WaitToFinish() is an abbreviation for 
WaitXLogInsertionsToFinish(). "Update pos X" means a call to 
WALInsertLockUpdateInsertingAt(X). "Reserve A-B" means a call to 
ReserveXLogInsertLocation(), which returned StartPos A and EndPos B.


Backend A: Acquire InsertLock 2
Backend A: Reserve 2100-2200
Backend B: Calls WaitToFinish()
             reservedUpto is 2200
             sees that Lock 1 is free
Backend C: Acquire InsertLock 1
Backend C: Reserve 2200-2300
Backend C: GetXLogBuffer(2200)
             page not in cache
             Update pos 2000
             AdvanceXLInsertBuffer()
               runs until about to acquire WALWriteLock
Backend A: GetXLogBuffer(2100)
             page not in cache
             Update pos 2000
             AdvanceXLInsertBuffer()
               Acquire WALWriteLock
               write out old page
               initialize new page
               Release WALWriteLock
Backend A: finishes insertion
Backend A: release InsertLock 2
Backend B: WaitToFinish() continues
             sees that Lock 2 is free. Returns 2200.
Backend B: Acquire WALWriteLock
Backend B: Call WaitToFinish(2200)
             blocks on Lock 1, whose insertingAt is 2000.

At this point, there is a deadlock between B and C. B is waiting for C 
to release its insertion lock or update its insertingAt value past 2200, 
while C is waiting for WALWriteLock, which is held by B.


To fix that, let's change GetXLogBuffer() to always advertise the exact 
position, not the beginning of the page (except when inserting the first 
record on the page, just after the page header; see the comments in the 
patch).


This fixes the problem for me. I've been running pgbench for about 30 
minutes without lockups now, while without the patch it locked up within 
a couple of minutes. Spiros, can you easily test this patch in your 
environment? Would be nice to get a confirmation that this fixes the 
problem for you too.


- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8e9754c..307a04c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1839,11 +1839,32 @@ GetXLogBuffer(XLogRecPtr ptr)
 	endptr = XLogCtl->xlblocks[idx];
 	if (expectedEndPtr != endptr)
 	{
+		XLogRecPtr	initializedUpto;
+
 		/*
-		 * Let others know that we're finished inserting the record up to the
-		 * page boundary.
+		 * Before calling AdvanceXLInsertBuffer(), which can block, let others
+		 * know how far we're finished with inserting the record.
+		 *
+		 * NB: If 'ptr' points to just after the page header, advertise a
+		 * position at the beginning of the page rather than 'ptr' itself. If
+		 * there are no other insertions running, someone might try to flush
+		 * up to our advertised location. If we advertised a position after
+		 * the page header, someone might try to flush the page header, even
+		 * though page might actually not be initialized yet. As the first
+		 * inserter on the page, we are effectively responsible for making
+		 * sure that it's initialized, before we let insertingAt to move past
+		 * the page header.
 		 */
-		WALInsertLockUpdateInsertingAt(expectedEndPtr - XLOG_BLCKSZ);
+		if (ptr % XLOG_BLCKSZ == SizeOfXLogShortPHD &&
+			ptr % XLOG_SEG_SIZE > XLOG_BLCKSZ)
+			initializedUpto = ptr - SizeOfXLogShortPHD;
+		else if (ptr % XLOG_BLCKSZ == SizeOfXLogLongPHD &&
+			ptr % XLOG_SEG_SIZE < XLOG_BLCKSZ)
+			initializedUpto = ptr - SizeOfXLogLongPHD;
+		else
+			initializedUpto = ptr;
+
+		WALInsertLockUpdateInsertingAt(initializedUpto);

Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-28 Thread Heikki Linnakangas
A-ha, I succeeded in reproducing this now on my laptop, with pgbench! It 
seems to be important to have a very large number of connections:


pgbench -n -c400 -j4 -T600 -P5

That got stuck after a few minutes. I'm using commit_delay=100.

Now that I have something to work with, I'll investigate this more tomorrow.

- Heikki





Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-27 Thread Heikki Linnakangas

On 07/23/2015 02:36 PM, Heikki Linnakangas wrote:

On 07/23/2015 11:31 AM, Spiros Ioannou wrote:

Well, so far with commit_delay=0 no problems. I will report back of course
if something happens, but I believe that the problem may indeed be
solved/masked with that setting.

Rough description of our setup, or how to reproduce:
* Time-series data in a table, say, "measurements"; size: 3-4TB, about 1000
inserts/second
* table measurements also has a trigger on insert to also insert on
measurements_a (for daily export purposes)
Just the above would cause a stuck query after a few days.

Now for exporting we run the following CTE query (measurements_b is an
empty table, measurements_a has about 5GB)
* WITH d_rows AS (DELETE FROM measurement_events_a RETURNING * )  INSERT
INTO measurement_events_b SELECT * FROM d_rows;
The above caused the problem to appear every time, after 10-20 minutes.


Hmm. With that CTE query, were there other queries running at the same time?


I was able to reproduce something like this with pgbench, by running a
custom little module that calls WaitXLogInsertionsToFinish() in a
tight loop, and checks that the value it returns moves monotonically
forward. With commit_delay on, once every minute or so, it moves backwards.

I'll investigate why that happens...


I was able to debug the synthetic test case I created, but unfortunately 
I don't think it explains the lock up you're seeing after all.


It's possible for WaitXLogInsertionsToFinish() to move backwards, in 
this scenario:


1. Backend A acquires WALInsertLock 2, and reserves xlog between byte 
positions 2100 - 2200
2. Backend B calls WaitXLogInsertionsToFinish(), which blocks on backend 
A, which hasn't advertised any location yet.
3. Backend C acquires WALInsertLock 1, and reserves xlog between byte 
positions 2200 - 2300
4. Backend C calls GetXLogBuffer(), and sees that the page is not in 
cache yet. (It does not call WALInsertLockUpdateInsertingAt() yet, 
because it's a bit slow or context-switched out)
5. Backend A initializes the page, completes inserting its WAL record, 
and releases its WALInsertLock.
6. Backend B gets unblocked, seeing that the lock held by A is now free. 
It calculated 2200 as the return value; that was the latest reserved 
WAL position when it started waiting. (Backend C made its reservation 
only after that, so B didn't have to wait for it.)
7. Backend C calls WALInsertLockUpdateInsertingAt(), with a WAL position 
pointing to the beginning of the page, 2000.


If you now call WaitXLogInsertionsToFinish() again, it will return 
2000, because backend C is the only backend holding a lock, and its 
advertised position is 2000. But the previous call calculated 2200. 
GetXLogBuffer() always advertises a WAL position at the beginning of the 
requested page, which is a bit bogus.


However, AFAICS that is actually harmless. Backend C is not blocked. The 
page it's looking for is certainly in cache at this point, so it can 
continue without blocking. So I don't think this explains your lockup.




- Heikki





Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-23 Thread Heikki Linnakangas

On 07/23/2015 11:31 AM, Spiros Ioannou wrote:

Well, so far with commit_delay=0 no problems. I will report back of course
if something happens, but I believe that the problem may indeed be
solved/masked with that setting.

Rough description of our setup, or how to reproduce:
* Time-series data in a table, say, "measurements"; size: 3-4TB, about 1000
inserts/second
* table measurements also has a trigger on insert to also insert on
measurements_a (for daily export purposes)
Just the above would cause a stuck query after a few days.

Now for exporting we run the following CTE query (measurements_b is an
empty table, measurements_a has about 5GB)
* WITH d_rows AS (DELETE FROM measurement_events_a RETURNING * )  INSERT
INTO measurement_events_b SELECT * FROM d_rows;
The above caused the problem to appear every time, after 10-20 minutes.


I was able to reproduce something like this with pgbench, by running a 
custom little module that calls WaitXLogInsertionsToFinish() in a 
tight loop, and checks that the value it returns moves monotonically 
forward. With commit_delay on, once every minute or so, it moves backwards.


I'll investigate why that happens...

- Heikki





Re: [GENERAL] Lots of stuck queries after upgrade to 9.4

2015-07-20 Thread Heikki Linnakangas

On 07/20/2015 03:01 PM, Andres Freund wrote:

Heikki,

On 2015-07-20 13:27:12 +0200, Andres Freund wrote:

On 2015-07-20 13:22:42 +0200, Andres Freund wrote:

Hm. The problem seems to be the WaitXLogInsertionsToFinish() call in
XLogFlush().


These are the relevant stack traces:
db9lock/debuglog-commit.txt
#2  0x7f7405bd44f4 in LWLockWaitForVar (l=0x7f70f2ab6680, valptr=0x7f70f2ab66a0, 
oldval=, newval=0x) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:1011
#3  0x7f7405a0d3e6 in WaitXLogInsertionsToFinish (upto=121713318915952) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1755
#4  0x7f7405a0e1d3 in XLogFlush (record=121713318911056) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2849

db9lock/debuglog-insert-8276.txt
#1  0x7f7405b77d91 in PGSemaphoreLock (sema=0x7f73ff6531d0, interruptOK=0 
'\000') at pg_sema.c:421
#2  0x7f7405bd4849 in LWLockAcquireCommon (val=, valptr=, 
mode=, l=) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:626
#3  LWLockAcquire (l=0x7f70ecaaa1a0, mode=LW_EXCLUSIVE) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/storage/lmgr/lwlock.c:467
#4  0x7f7405a0dcca in AdvanceXLInsertBuffer (upto=, 
opportunistic=) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:2161
#5  0x7f7405a0e301 in GetXLogBuffer (ptr=121713318928384) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1848
#6  0x7f7405a0e9c9 in CopyXLogRecordToWAL (EndPos=, StartPos=, rdata=0x71c21b90, isLogSwitch=, write_len=) at 
/tmp/buildd/postgresql-9.4-9.4.4/build/../src/backend/access/transam/xlog.c:1494
#7  XLogInsert (rmid=, info=, rdata=) at /tmp/buildd/postgre



XLogFlush() has the following comment:
/*
 * Re-check how far we can now flush the WAL. It's 
generally not
 * safe to call WaitXLogInsertionsToFinish while holding
 * WALWriteLock, because an in-progress insertion might 
need to
 * also grab WALWriteLock to make progress. But we know 
that all
 * the insertions up to insertpos have already 
finished, because
 * that's what the earlier WaitXLogInsertionsToFinish() 
returned.
 * We're only calling it again to allow insertpos to be 
moved
 * further forward, not to actually wait for anyone.
 */
insertpos = WaitXLogInsertionsToFinish(insertpos);

but I don't think that's valid reasoning. WaitXLogInsertionsToFinish()
calls LWLockWaitForVar(oldval = InvalidXLogRecPtr), which will block if
there's an exclusive locker and some backend hasn't yet set
initializedUpto. Which seems like a possible state?


A backend always updates its insert position before sleeping on or 
acquiring another lock, by calling WALInsertLockUpdateInsertingAt. So 
even though another backend might indeed be in the 
initializedUpto==InvalidXLogRecPtr state, it will get out of that state, 
either by releasing the lock or by updating initializedUpto, before it 
in turn does anything that might deadlock.


Clearly there's *something* wrong here, though, given the bug report...

- Heikki





[GENERAL] Re: [HACKERS] Check that streaming replica received all data after master shutdown

2015-01-13 Thread Heikki Linnakangas

On 01/13/2015 12:11 PM, Vladimir Borodin wrote:


On 05 Jan 2015, at 18:15, Vladimir Borodin wrote:


Hi all.

I have a simple script for planned switchover of PostgreSQL (9.3 and 9.4) 
master to one of its replicas. This script checks a lot of things before doing 
it and one of them is that all data from master has been received by replica 
that is going to be promoted. Right now the check is done like below:

On the master:

postgres@pgtest03d ~ $ psql -t -A -c 'select pg_current_xlog_location();'
0/3390
postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_ctl stop -m fast
waiting for server to shut down done
server stopped
postgres@pgtest03d ~ $ /usr/pgsql-9.3/bin/pg_controldata | head
pg_control version number:937
Catalog version number:   201306121
Database system identifier:   6061800518091528182
Database cluster state:   shut down
pg_control last modified: Mon 05 Jan 2015 06:47:57 PM MSK
Latest checkpoint location:   0/3428
Prior checkpoint location:0/3328
Latest checkpoint's REDO location:0/3428
Latest checkpoint's REDO WAL file:001B0034
Latest checkpoint's TimeLineID:   27
postgres@pgtest03d ~ $

On the replica (after shutdown of master):

postgres@pgtest03g ~ $ psql -t -A -c "select 
pg_xlog_location_diff(pg_last_xlog_replay_location(), '0/3428');"
104
postgres@pgtest03g ~ $

These 104 bytes seem to be the size of the shutdown checkpoint record (as far 
as I can understand from pg_xlogdump output).

postgres@pgtest03g ~/9.3/data/pg_xlog $ /usr/pgsql-9.3/bin/pg_xlogdump -s 
0/3390 -t 27
rmgr: XLOGlen (rec/tot):  0/32, tx:  0, lsn: 
0/3390, prev 0/3328, bkp: , desc: xlog switch
rmgr: XLOGlen (rec/tot): 72/   104, tx:  0, lsn: 
0/3428, prev 0/3390, bkp: , desc: checkpoint: redo 0/3428; tli 
27; prev tli 27; fpw true; xid 0/6010; oid 54128; multi 1; offset 0; oldest xid 
1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL:  error in WAL record at 0/3428: record with zero length 
at 0/3490

postgres@pgtest03g ~/9.3/data/pg_xlog $

I’m not sure this difference will always be 104 bytes, so a strict 
equality check seems fragile. Could it change in the future? Or is there a better 
way to verify that a streaming replica received all data after master 
shutdown? Checking that pg_xlog_location_diff returns exactly 104 bytes seems a bit 
strange.


Don't rely on it being 104 bytes. It can vary across versions, and 
across different architectures.


You could simply check that the standby's pg_last_xlog_replay_location() 
> master's "Latest checkpoint location", and not care about the exact 
difference.


- Heikki




Re: [GENERAL] freeze cannot be finished

2013-11-18 Thread Heikki Linnakangas

On 14.11.2013 02:26, Jeff Janes wrote:

On Wed, Nov 13, 2013 at 3:53 PM, Sergey Burladyan wrote:


Jeff Janes  writes:

If I'm not mistaken, it looks like lazy_scan_heap(), called from
lazy_vacuum_rel() (see [1]), skips pages even when run with
scan_all == true: lazy_scan_heap() does not increment scanned_pages if
lazy_check_needs_freeze() returns false. If this happens during a
wraparound vacuum, it cannot update pg_class, because
pg_class is updated via this code:

 new_frozen_xid = FreezeLimit;
 if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
 new_frozen_xid = InvalidTransactionId;

 vac_update_relstats(onerel,
 new_rel_pages,
 new_rel_tuples,
 new_rel_allvisible,
 vacrelstats->hasindex,
 new_frozen_xid);

So I think that in our anti-wraparound vacuum, vacrelstats->scanned_pages
is always less than vacrelstats->rel_pages, and pg_class.relfrozenxid is
never updated.


Yeah, I think that that is a bug.  If the clean-up lock is unavailable but
the page is inspected without it and found not to need freezing, then the
page needs to be counted as scanned, but is not so counted.

commit bbb6e559c4ea0fb4c346beda76736451dc24eb4e
Date:   Mon Nov 7 21:39:40 2011 -0500

But this was introduced in 9.2.0, so unless the OP didn't upgrade to 9.2
until recently, I don't know why it just started happening.

It looks like a simple fix (to HEAD attached), but I don't know how to test
it.


Thanks, committed.

I was able to reproduce it by doing this:

-- Create and populate a test table
create table foo (i int4);
insert into foo select generate_series(1, 1);

-- Freeze it, and observe relfrozenxid.
vacuum freeze foo;
select relfrozenxid from pg_class where oid='foo'::regclass;

-- Do some transactions to bump current transaction ID
insert into foo values (-1);
insert into foo values (-1);
insert into foo values (-1);
insert into foo values (-1);

-- Now, in a second psql session, open a cursor on the table. It keeps 
the current page pinned, which causes the ConditionalLockBuffer() in 
vacuumlazy.c to fail to grab the lock:


begin; declare foocur cursor for select * from foo; fetch foocur;

-- Back to the original psql session, vacuum freeze again:
vacuum freeze foo;

-- Observe the new relfrozenxid. With the bug, it's the same as before. 
Vacuum freeze is not able to advance relfrozenxid because it skipped the 
page by the cursor. With the patch, it does advance.


select relfrozenxid from pg_class where oid='foo'::regclass;


Also, it seems like it might be worth issuing a warning if scan_all is true
but all pages were not scanned.


Hmm, the new definition of "scanned", with this patch, is that pages 
that were not vacuumed are still counted as scanned. I don't think a 
warning is appropriate, there isn't anything wrong with skipping pinned 
pages that don't need freezing, the amount of bloat left behind should 
be tiny. For diagnostic purposes, perhaps they should be counted 
separately in vacuum verbose output, though.


- Heikki




Re: [pgeu-general] [GENERAL] Call for design: PostgreSQL mugs

2013-09-05 Thread Heikki Linnakangas
Put a small elephant logo *inside* the mug. With a text: "also for 
embedded systems"


- Heikki




Re: [GENERAL] [pgeu-general] Replication failover

2013-05-23 Thread Heikki Linnakangas

On 23.05.2013 03:55, TJ wrote:


We have a few different sets of servers with different versions.
9.0.4
9.1.4
9.2.3

I recently tried to fail-over a set of 9.2.3 servers and server4 did
notice the timeline change but did not start following it.
We do not have the recovery_target_timeline set in the recovery.conf

ATM we are not using WAL archiving only streaming.

And server4 is behind server3.

So would we need to setup WAL archiving to achieve this?


Yep. Set up a WAL archive, set recovery_target_timeline='latest', and 
use 9.1 or above, and it should work.


- Heikki




Re: [GENERAL] [pgeu-general] Replication failover

2013-05-22 Thread Heikki Linnakangas
BTW, pgeu-general is not for technical questions, so moving to 
pgsql-general. (I didn't notice the mailing list this came from until 
after replying).


On 22.05.2013 18:22, Heikki Linnakangas wrote:

On 22.05.2013 10:23, TJ wrote:

I am looking to migrate my databases from one set of hardware to another
all the servers are running PGSQL 9.x


Which version exactly? There are differences in this area between 9.0,
9.1, and 9.2. If you can choose, use the latest 9.2.X version.


The problem i have is that when i fail over to server3 using the trigger
file it will increment the timeline which will stop the replication of
server4.


With 9.1 and above, it will notice the new timeline and start following
it, if you set up a WAL archive in addition to streaming replication and
set recovery_target_timeline='latest' in recovery.conf. Starting with
9.3, a standby can follow a timeline changes over streaming replication
too, so in 9.3 it should just work.

That's all assuming that server4 is behind server3; if it has already
replayed WAL beyond the point in the WAL where server3 switched to a new
timeline, it can't follow that timeline switch.

- Heikki





--
- Heikki




Re: [ODBC] [GENERAL] ODBC constructs

2013-05-20 Thread Heikki Linnakangas

On 21.05.2013 08:11, Dev Kumkar wrote:

On Mon, May 20, 2013 at 9:12 PM, Atri Sharma  wrote:



If you wish to work in C,then,I would suggest libpq.I would wait for more
replies on this,as I have little knowledge
about psqlODBC.



Thanks for the comments. Yes objective is to work in C and found libpq
useful but am not sure about psqlODBC.
It would be really great to get comments from community here regarding
comparison between both of them and also performance perspective. Based on
my reading so far libpq is good from performance perspective.


Generally speaking, libpq is better than psqlODBC. The advantage of ODBC 
is that, given a suitable driver, you can connect to other DBMSs as 
well, while libpq will only work with PostgreSQL. Unless you need such 
portability, go with libpq.


- Heikki




Re: [GENERAL] [pgeu-general] Alphanumeric natural order sorting

2013-03-22 Thread Heikki Linnakangas
(pgeu-general is not the right list for technical discussions, moving to 
pgsql-general)


On 20.03.2013 10:46, Albe Laurenz wrote:

Umashanker, Srividhya wrote:

I am looking for a solution the Alphanumeric sorting



I am expecting
1, bay1
2, bay2
10, bay10
11, bay11



We are working on a framework, where the client can

* call for sort on any column.
* The digits may or may not be there.
* The numeric part can be anywhere in the string.


That's easy then.
Just define exactly how you want the ordering to be,
and based on that definition you can write code for
sorting.

How would you sort
'bay10', 'ba1y0', 'ba10y', 'ba2y0'?


The OP is asking about "natural sort order". See 
http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html 
for example.


There are a few projects out there for doing that in various programming 
languages, but I'm not aware of anything for PostgreSQL. Maybe you could 
pick one of the existing functions listed in that blog post, for 
example, and write a PL function using them.


See also: 
http://blog.ringerc.id.au/2012/10/natural-sorting-example-of-utility-of.html


- Heikki




Re: [HACKERS] [GENERAL] Floating point error

2013-03-05 Thread Heikki Linnakangas

On 05.03.2013 15:59, Kevin Grittner wrote:

Daniel Farina  wrote:


This kind of change may have many practical problems that may
make it un-pragmatic to alter at this time (considering the
workaround is to set the extra float digits), but I can't quite
grasp the rationale for "well, the only program that cares about
the most precision available is pg_dump".  It seems like most
programs would care just as much.


Something to keep in mind is that when you store 0.01 into a double
precision column, the precise value stored, when written in
decimal, is:

0.01000000000000000020816681711721685132943093776702880859375

Of course, some values can't be precisely written in decimal with
so few digits.


It would be nice to have a base-2 text format to represent floats. It 
wouldn't be as human-friendly as base-10, but it could be used when you 
don't want to lose precision. pg_dump in particular.


- Heikki




[GENERAL] Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

2012-12-20 Thread Heikki Linnakangas

On 18.12.2012 13:42, hubert depesz lubaczewski wrote:

In pg_log on ubuntu2 I see:

2012-12-18 12:41:34.428 CET [unknown]@[unknown] 1685  LOG:  connection 
received: host=172.28.173.142 port=45842
2012-12-18 12:41:34.430 CET replication@[unknown] 1685 172.28.173.142(45842) 
LOG:  replication connection authorized: user=replication
2012-12-18 12:41:34.432 CET replication@[unknown] 1685 172.28.173.142(45842) 
ERROR:  requested WAL segment 00020015 has already been removed
2012-12-18 12:41:34.433 CET replication@[unknown] 1685 172.28.173.142(45842) 
LOG:  disconnection: session time: 0:00:00.005 user=replication database= 
host=172.28.173.142 port=45842

Something looks weird. To put it lightly.


Hmm, that's a different error than you got before. Thom also reported a 
"requested WAL segment ... has already been removed" error, but in his 
test case (as far as I could reproduce it) the error doesn't recur 
when the standby reconnects. In other words, it eventually worked 
despite that error. In any case, I just committed a fix for the scenario 
that Thom reported. Can you try again with a fresh checkout?


- Heikki




[GENERAL] Re: [COMMITTERS] pgsql: Allow a streaming replication standby to follow a timeline switc

2012-12-17 Thread Heikki Linnakangas

On 15.12.2012 17:06, hubert depesz lubaczewski wrote:

I might be missing something, but what exactly does that commit give us?

I mean - we were able, previously, to make slave switch to new master
- as Phil Sorber described in here:
http://philsorber.blogspot.com/2012/03/what-to-do-when-your-timeline-isnt.html

After some talk on IRC, I understood that this patch will make it
possible to switch to new master in plain SR replication, with no WAL
archive (because if you have wal archive, you can use the method Phil
described, which basically "just works").


Right, that's exactly the point of the patch. A WAL archive is no longer 
necessary for failover.



So I did setup three machines: master and two slaves.
Master had 2 IPs - its own, and a floating one.
Both slaves were connecting to the floating one, and recovery.conf
looked like:
-
standby_mode = 'on'
primary_conninfo = 'port=5920 user=replication host=172.28.173.253'
trigger_file = '/tmp/finish.replication'
recovery_target_timeline='latest'
-

After I verified that replication works to both slaves, I did failover one of
the slaves, shut down master, and did ip takeover of floating ip to the slave
that did takeover.


Hmm, is it possible that some WAL was generated in the old master, and 
streamed to the standby, after the new master was already promoted? It's 
important to kill the old master before promoting the new master. 
Otherwise the timelines diverge, so that you have some WAL on the old 
timeline that's not present in the new master, and some WAL in the new 
master's timeline that's not present in the old master. In that 
situation, if the standby has already replicated the WAL from the old 
master, it can no longer start to follow the new master. I think that 
would match the symptoms you're seeing.


I wouldn't rule out a bug in the patch either, though. Amit found a 
worrying number of bugs in his testing, and although we stamped out all 
the known ones, it wouldn't surprise me if there are more :-(..


- Heikki




Re: [GENERAL] [BUGS] Prepared Statement Name Truncation

2012-11-23 Thread Heikki Linnakangas

On 23.11.2012 17:53, Tom Lane wrote:

Euler Taveira  writes:

On 22-11-2012 04:27, Pavel Stehule wrote:

significantly larger catalog



Less than 5% of catalog columns? I don't buy your argument.


It's not about count, it's about size.  For instance, pg_attribute
currently requires 140 bytes per row (counting the tuple header and
line pointer), so adding 64 bytes would represent 45% bloat.  In
a database with lots of tables that would be painful.

We could avoid this problem if we were prepared to make type "name"
be varlena,  ...


It would actually be nice to do that because it would *reduce* the 
amount of space and memory used for the catalogs in the typical case, 
where the attribute names are much smaller than 64 bytes. I received a 
complaint just the other day that our backend processes consume a lot of 
memory, even when idle; the catalog caches are a large part of that.


- Heikki




[GENERAL] Anyone using silent_mode?

2011-06-27 Thread Heikki Linnakangas
Is anyone using silent_mode=on ? There's some discussion on 
pgsql-hackers [1] on whether it should be removed altogether in 9.2, 
since you can get the same functionality with "pg_ctl start", or nohup. 
If you're using silent_mode, what are you using it for?


[1] 
http://archives.postgresql.org/message-id/1308926157-sup-7...@alvh.no-ip.org


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] [JDBC] How to just "link" to some data feed

2008-06-04 Thread Heikki Linnakangas

Albretch Mueller wrote:

 import/export the data into/out of PG, so you will be essentially
duplicating the data and having to synch it. This is exactly what I am
trying to avoid, I would like for PG to handle the data right from the
data feed


You could write a set-returning function that reads the file, and 
perhaps a view on top of that. As the file is read on every invocation, 
that's only practical for small tables, though.


It's likely a better idea to just COPY the table into the database 
periodically, or perhaps write an external script that watches the 
modification timestamp on the file and triggers a reload whenever it 
changes.
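
The mtime-watching approach can be sketched in a few lines. The polling loop below is illustrative: the function and parameter names are made up, and reload_fn stands in for whatever actually runs the COPY.

```python
import os
import time

def watch_and_reload(path, reload_fn, poll_seconds=1.0, max_polls=None):
    """Poll path's modification time and call reload_fn(path) whenever
    it changes.

    In a real setup reload_fn would truncate the table and COPY the
    file back in; max_polls exists only so this sketch can terminate.
    """
    last = os.stat(path).st_mtime
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(poll_seconds)
        polls += 1
        mtime = os.stat(path).st_mtime
        if mtime != last:
            last = mtime
            reload_fn(path)
```

Polling the timestamp rather than using inotify keeps the script portable; for a feed that changes every few minutes, a one-second poll interval is more than enough.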


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [GENERAL] [HACKERS] Kill a Long Running Query

2007-04-25 Thread Heikki Linnakangas
Please don't cross-post to multiple mailing lists. And pgsql-hackers is 
not the correct list for basic usage questions. And long end-of-mail 
disclaimers are not generally appreciated.


Mageshwaran wrote:
Can anybody tell me how to kill a long-running query in PostgreSQL? Is 
there any statement to kill a query? 


See the user manual on administration functions, pg_cancel_backend in 
particular:


http://www.postgresql.org/docs/8.2/interactive/functions-admin.html

Basically you issue a "SELECT * FROM pg_stat_activity", or plain ps, to 
find out the pid of the backend executing the long-running query, and 
then use pg_cancel_backend (or kill -INT) to cancel it.
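
At the signal level, pg_cancel_backend(pid) and "kill -INT" do the same thing: deliver SIGINT to the backend process, which cancels the running query. A stand-in sketch of that mechanism (the child process here only imitates a backend; it is not one):

```python
import os
import signal
import subprocess
import sys
import time

# pg_cancel_backend(pid) sends SIGINT to the target backend process;
# "kill -INT <pid>" from the shell does the same. The child below is
# a stand-in for a backend stuck in a long-running query.
child = subprocess.Popen([sys.executable, "-c",
    "import signal, sys, time\n"
    "signal.signal(signal.SIGINT, lambda *a: sys.exit(42))\n"
    "time.sleep(30)\n"])           # stands in for the long-running query

time.sleep(0.5)                    # give the child time to install its handler
os.kill(child.pid, signal.SIGINT)  # the signal pg_cancel_backend sends
rc = child.wait()                  # the "query" is cancelled promptly
```

Note that pg_cancel_backend cancels only the current query; pg_terminate_backend (SIGTERM) ends the whole session.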



and also tell me how to log slow queries to a log file.


Use the log_min_duration_statement configuration variable.
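
A matching postgresql.conf fragment (the 500 ms threshold is just an example value):

```
# Log any statement that runs for this many milliseconds or longer;
# 0 logs every statement, -1 disables the feature.
log_min_duration_statement = 500
```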

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Kenneth Marshall wrote:

On Mon, Jan 22, 2007 at 06:42:09PM +, Simon Riggs wrote:

Hold that thought! Read Heikki's Piggyback VACUUM idea on new thread...


There may be other functions that could leverage a similar sort of
infrastructure. For example, a long DB mining query could be registered
with the system. Then as the pieces of the table/database are brought in
to shared memory during the normal daily DB activity they can be acquired
without forcing the DB to run a very I/O expensive query when waiting a
bit for the results would be acceptable. As long as we are thinking
piggyback.


Yeah, I had the same idea when we discussed synchronizing sequential 
scans. The biggest difference is that with queries, there's often a user 
waiting for the query to finish, but with vacuum we don't care so much 
how long it takes.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Gregory Stark wrote:

"Bruce Momjian" <[EMAIL PROTECTED]> writes:


I agree that index cleanup isn't > 50% of vacuum.  I was trying to figure
out how small, and it seems about 15% of the total table, which means if
we have bitmap vacuum, we can conceivably reduce vacuum load by perhaps
80%, assuming 5% of the table is scanned.


Actually no. A while back I did experiments to see how fast reading a file
sequentially was compared to reading the same file sequentially but skipping
x% of the blocks randomly. The results were surprising (to me) and depressing.
The breakeven point was about 7%.


Note that with uniformly random updates, you will have dirtied every 
page of the table before you get anywhere near 5% dead space. So we have 
to assume a non-uniform distribution of updates for the DSM to be of any 
help.


And if we assume non-uniform distribution, it's a good bet that the 
blocks that need vacuuming are also not randomly distributed. In fact, 
they might very well all be in one cluster, so that scanning that 
cluster is indeed sequential I/O.
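
The skip-reading experiment Gregory describes can be sketched as below. The block size and harness are illustrative, and on a file that fits in the OS cache the effect won't show; the original result (breakeven near 7% skipped) needs a file well beyond RAM on spinning disks.

```python
import os
import random
import time

BLOCK = 8192  # read unit; happens to match PostgreSQL's default page size

def timed_read(path, skip_fraction=0.0, seed=0):
    """Read path block by block, randomly skipping skip_fraction of the
    blocks, and return (elapsed_seconds, blocks_read).

    With a cold cache and rotating media, even a small skip fraction
    costs nearly as much as reading everything, because each skipped
    block turns the next read into a seek.
    """
    rng = random.Random(seed)
    start = time.perf_counter()
    blocks_read = 0
    with open(path, "rb") as f:
        nblocks = os.fstat(f.fileno()).st_size // BLOCK
        for i in range(nblocks):
            if rng.random() < skip_fraction:
                continue          # skip this block: forces a seek later
            f.seek(i * BLOCK)
            f.read(BLOCK)
            blocks_read += 1
    return time.perf_counter() - start, blocks_read
```

Comparing timed_read(path) against timed_read(path, skip_fraction=0.07) on a sufficiently large file is the shape of the comparison; the exact breakeven depends on the hardware.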


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.
I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is mostly cached and the index scan doesn't generate any I/O at all.


Are _all_ the indexes cached?  I would doubt that.


Well, depends on your schema, of course. In many applications, yes.


 Also, for typical
table, what percentage is the size of all indexes combined?


Well, there's no such thing as a typical table. As an anecdote, here are 
the ratios (total size of all indexes of a table)/(size of corresponding 
heap) for the bigger tables from a DBT-2 run I have at hand:


Stock:  1190470/68550 = 6%
Order_line: 950103/274372 = 29%
Customer:   629011 /(5711+20567) = 8%

In any case, for the statement "Index cleanup is the most expensive part 
of vacuum" to be true, your indexes would have to take up twice as much 
space as the heap, since the heap is scanned twice. I'm sure there are 
databases like that out there, but I don't think it's the common case.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-22 Thread Heikki Linnakangas

Russell Smith wrote:
2. Index cleanup is the most expensive part of vacuum.  So doing a 
partial vacuum actually means more I/O as you have to do index cleanup 
more often.


I don't think that's usually the case. Index(es) are typically only a 
fraction of the size of the table, and since 8.2 we do index vacuums in 
a single scan in physical order. In fact, in many applications the index 
is mostly cached and the index scan doesn't generate any I/O at all.


I believe the heap scans are the biggest issue at the moment.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [GENERAL] Autovacuum Improvements

2007-01-21 Thread Heikki Linnakangas

Russell Smith wrote:
Strange idea that I haven't researched: given that vacuum can't be run 
in a transaction, it is possible at a certain point to quit the current 
transaction and start another one.  There has been much chat, and now a 
TODO item, about allowing multiple vacuums so as not to starve small 
tables.  But if a big table has a long-running vacuum, the vacuum of the 
small table won't be effective anyway, will it?  If vacuum of a big 
table were done in multiple transactions, you could reduce the effect of 
the long-running vacuum.  I'm not sure how this affects the rest of the 
system, though.


That was fixed by Hannu Krosing's patch in 8.2, which made vacuum ignore 
other vacuums in the oldest-xmin calculation.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [GENERAL] ISO week dates

2006-10-12 Thread Heikki Linnakangas

Guillaume Lelarge wrote:
I've tried to work on the first one, the ISO day field. My patch is 
attached and is against CVS HEAD. It only takes care of the code; 
nothing is done on the documentation side. It works for me:


I haven't been following this thread, but I just wanted to point out 
that we prefer context diffs.


Please resend the patch as a context diff, using "diff -c" or "cvs diff 
-c", so that it's easier to review.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
