pg_update to a new machine?

2018-02-23 Thread Ron Johnson
The 2.8TB database must be moved to a new server in a new Data Center, and
upgraded from 8.4.17 to 9.6.6.


Will this work?

pg_upgrade
--old-datadir "CURSERVER://var/lib/pgsql/data"
--new-datadir "NEWSERVER://var/lib/pgsql/data"
--old-bindir "CURSERVER://usr/bin"
--new-bindir "NEWSERVER://usr/bin"

Or must I:
1. temporarily allocate 3TB of scratch space on the new server,
2. install 8.4 on the new server,
3. install 9.6.6 on the new server,
4. rsync CURSERVER://var/lib/pgsql/data to
   NEWSERVER://var/lib/pgsql/8.4/data, and then
5. pg_upgrade?

Are there better ways?  (The pipe from current DC to new DC will be 10Gbps.)
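
(Back-of-the-envelope, using only the numbers above: 2.8TB at a theoretical
10Gbps line rate is about 2.8e12 bytes x 8 / 1e10 bits/s, roughly 2,240 seconds
or about 40 minutes of raw transfer.  In practice rsync/ssh overhead and disk
throughput will stretch that, but the copy step should be measured in hours,
not days.)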

Thanks

--
Angular momentum makes the world go 'round.


Re: Given a set of daterange, finding the continuous range that includes a particular date (aggregates)

2018-02-23 Thread Ken Tanzer
On Fri, Feb 23, 2018 at 6:10 AM, mariusz wrote:


> i guess, you can easily get max continuous range for each row with
> something like this:
>
> CREATE OR REPLACE FUNCTION
> append_daterange(d1 daterange, d2 daterange)
> RETURNS daterange
> LANGUAGE sql
> AS
> $$
> SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
> $$;
>
> CREATE AGGREGATE agg_daterange (daterange) (
> sfunc = append_daterange,
> stype = daterange
> );
>
> SELECT dr,
>lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
>upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
>   FROM ...
>
> above example is simplified to selecting only daterange column "dr" for
> readability, which in your case should be something like
>
> daterange(staff_assign_date,staff_assign_date_end,'[)')
>
> please note that daterange would be normalized to [) format so upper()
> above returns exactly your max "staff_assign_date_end" for each
> continuous range when dateranges are created with '[)' format.
>
> the key point is ... ELSE d2 in append_daterange() which starts with new
> value each time that new value is discontinuous with agg's state value
> and order in which rows are processed (ASC for lower of daterange, DESC
> for upper of daterange).
>
> unfortunately this involves reading all rows for "client_id" and
> additional sorting for each window.
> i recall reading that you already pay the price of reading all rows for
> client_id anyway, so the only question is the cost of two additional
> sorts (maybe extracting dateranges to subset on which to do windows and
> rejoining result of continuous ranges to original set would help to
> lower the cost).
>
>
Thank you, and I wanted to follow up on this.  I couldn't quite get your
example working as described, but I also ended up trying something very
similar that got me very close but not quite there.  Basically, I can see
that it is correctly calculating the ranges (in the notices), but it is
only returning the last range for each client.  (Because I said PARTITION
BY client_id).

So I'm not sure if I should be calling this differently, or if the function
needs to work differently, or if this just isn't possible.  Do I need to
partition by something else, and if so what?  I don't see what I could
specify that would indicate a new episode.

Also, it's not clear to me how an aggregate might define/return different
values within a partition.  Although this must be possible, since functions
like rank() and row_number() seem to do it.
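
(An editorial aside, not part of the original message: the per-row behavior
comes from the ORDER BY inside the OVER clause.  Without one, the frame is the
whole partition and every row gets the aggregate's final value; with one, the
default frame runs from the partition start to the current row, so the value
can differ row by row.  A minimal sketch reusing the agg_daterange aggregate
defined below, against a hypothetical table t(client_id, dr daterange):)

-- Whole-partition vs. running frame (t and its columns are hypothetical):
SELECT client_id, dr,
       agg_daterange(dr) OVER (PARTITION BY client_id)             AS whole_partition, -- same final value on every row
       agg_daterange(dr) OVER (PARTITION BY client_id ORDER BY dr) AS running          -- accumulates row by row
FROM t;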

Hoping there is something easy that can be tweaked here.  See below for
copy/pastable test stuff.  It includes output from both functions.  Both
look to be returning the same results, which makes me wonder if my passing
in a start date was a waste of time, though it seems to me it would be
necessary.

Cheers,
Ken

BEGIN;
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- Just a label, for clarity
);

INSERT INTO sample_data VALUES
(1,'1990-01-01','1990-12-31',1),
(1,'1991-01-01','1991-12-31',1),

(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),

(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',1),
(2,'2015-02-01','2015-12-31',1),
(2,'2017-09-30','2018-01-31',2),
(2,'2018-02-01','2018-02-14',2)
;

CREATE OR REPLACE FUNCTION
append_daterange(d1 daterange, d2 daterange)
RETURNS daterange
LANGUAGE sql
AS
$$
SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
sfunc = append_daterange,
stype = daterange
);

CREATE OR REPLACE FUNCTION range_continuous_merge( daterange, daterange,
date ) RETURNS daterange AS $$

DECLARE
res daterange;

BEGIN
res:=  CASE
WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
WHEN $1 IS NULL AND $2 @> $3 THEN $2
WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
WHEN NOT $1 @> $3 THEN $2
ELSE $1
END;
RAISE NOTICE 'Inputs: %,%,%.  Returning %',$1::text,$2::text,$3::text,res;
RETURN res;
END;
$$ LANGUAGE plpgsql STABLE;

CREATE AGGREGATE range_continuous( daterange, date ) (

sfunc =  range_continuous_merge,
stype = daterange
--  initcond = '{0,0,0}'

);

SELECT
client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date)
OVER (PARTITION by client_id) FROM sample_data ;

SELECT
client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]'))
OVER (PARTITION by client_id) FROM sample_data ;



-- RANGE_CONTINUOUS_MERGE

NOTICE:  Inputs: ,[1990-01-01,1991-01-01),1990-01-01.  Returning
[1990-01-01,1991-01-01)
NOTICE:  Inputs:

Re: Given a set of daterange, finding the continuous range that includes a particular date

2018-02-23 Thread Ken Tanzer
On Fri, Feb 23, 2018 at 6:42 AM, Viktor Fougstedt wrote:

>
> Hi.
>
> This answer is perhaps useful if I understand your problem correctly. But
> I might have interpreted it wrongly. :-)
>
>
> I would probably start with merging intervals so that overlapping and
> adjacent intervals become single continuous intervals, then select from
> those merged intervals.
>
> We have an application with a lot of interval handling in PostgreSQL, and
> we use many variants of algorithms based on one by Itzik Ben Gan which he
> calls “packing intervals”. The post we started with was an old reader’s
> challenge from SQL Magazine. It has since been updated when MS-SQL started
> supporting window functions better (Itzik Ben Gan is a MS-SQL-guy).
>
> Basically, it is a few CTE:s which convert the intervals into “start” (+1)
> and “stop” (-1) events, then keeps a running sum of these, and finally
> creates new intervals by noting that the merged (or “packed”) intervals
> starts with events that had sum=0 before them and stops with events which
> have sum=0 after them.
>
> It involves both CTE:s and window functions and is quite a beautiful
> example of SQL, IMHO.
>
> I think it’s best to google it, he does a good job of explaining how it
> works.
>
> Hope that helps a bit at least.
>
> /Viktor
>

Hi.  Thanks for the many helpful responses!  Although I didn't end up with
exactly what I was looking for, I think I got to something that works well
enough for now.  I did it with CTEs, in a way I think similar to what you
were suggesting (my printer is out of ink, so I didn't actually get to look
at that book yet!)  I ended up having to do 4 passes:

1) Identify the starts and ends of continuous ranges
2) Eliminate the middle-point records (non stops/starts)
3) Merge the stop date in with the starts
4) Eliminate the stops

I couldn't see how to do it in fewer steps, but if there's a way...

I posted the query along with some sample data below.

Thanks again!

Ken

BEGIN;

CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- I won't actually have this info; just a label I added for clarity and testing
);

INSERT INTO sample_data VALUES


(1,'1990-01-01','1990-12-31',0),
(1,'1991-01-01','1991-12-31',0),
(1,'1995-01-01','1995-06-30',1),
(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),
(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',0),
(2,'2015-02-01','2015-12-31',0),
(2,'2017-09-30','2018-01-31',1),
(2,'2018-02-01','2018-02-14',1)
;

WITH peek3 AS (
WITH peek2 AS (
WITH peek AS (
SELECT
  client_id,
  episode,
  daterange(start_date,end_date,'[]') AS range,
  COALESCE(daterange(start_date,end_date+1,'[]') &&
  lead(daterange(start_date,end_date,'[]')) OVER (PARTITION BY client_id
ORDER BY start_date),false) AS continues,
  COALESCE(daterange(start_date,end_date,'[]') &&
  lag(daterange(start_date,end_date+1,'[]')) OVER (PARTITION BY client_id
ORDER BY start_date),false) AS is_continued
FROM
  sample_data
)
SELECT
  *
FROM peek
  WHERE NOT (is_continued AND continues)
)
SELECT client_id,episode,range,
daterange(lower(range),
  CASE WHEN lead(is_continued) OVER (PARTITION BY client_id ORDER BY range)
THEN
  lead(upper(range)) OVER (PARTITION BY client_id ORDER BY range)
  ELSE upper(range)
  END) AS full_range
,continues,is_continued
FROM peek2
)
SELECT * FROM peek3
WHERE NOT is_continued
;



-- 
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://demo.agency-software.org/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the discussion.


Re: Query error: could not resize shared memory segment

2018-02-23 Thread Thomas Munro
On Wed, Jan 3, 2018 at 6:34 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Wed, Jan 3, 2018 at 6:13 PM, Thuc Nguyen Canh
>>  wrote:
>>> mount | grep /dev/shm
>>> => shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k)
>
>> Bingo.  Somehow your container tech is limiting shared memory.
>
> If this is a common setup, maybe we're going to need a new section
> under or near
>
> https://www.postgresql.org/docs/current/static/kernel-resources.html#SYSVIPC
>
> It's pretty annoying to have sweated blood to get out from under the
> kernel's chosen-at-random SysV IPC limits, only to run into container
> tmpfs limits that seem equally chosen with the aid of a dartboard.

I ran into another couple of cases like this in the wild, and judging
by the popularity of Docker, I expect it to come up more with Parallel
Hash (and any future memory-gobbling parallel things... parallel
sorting, parallel repartitioning, ...) and with the increase in node
counts that partitioning can produce.  Here's an example of a public
discussion about this:

https://github.com/docker-library/postgres/issues/416

I don't know too much about Docker myself, but I see from the
documentation that it defaults to imposing no limit on regular
memory[1], but it always imposes a limit on /dev/shm[2].

It's actually quite hard to make a recommendation for --shm-size.  As
discussed elsewhere[3],  our executor node-based work_mem management
means that the peak space usage depends on the number of concurrent
queries * number of executor nodes * number of parallel processes
allowed * work_mem.  It's just the same for regular memory, but in
that case the Docker administrator doesn't have to set a separate
fixed limit -- it's just whatever the operating system will take.  To
achieve the same behaviour for DSM memory, I think you either need to
set --shm-size sky high or side-step the issue by mounting the host's
unlimited /dev/shm in the container (a trick I saw mentioned
elsewhere, not sure if it's a good idea).
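
To put rough, purely illustrative numbers on that formula: 10 concurrent
queries, each with 5 memory-hungry parallel-aware nodes and 4 workers allowed,
at work_mem = 64MB, could peak around 10 x 5 x 4 x 64MB, about 12.5GB of DSM
in /dev/shm, roughly 200 times the 64MB tmpfs that Docker mounts by default
(the size=65536k visible in the mount output quoted above).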

Anyway, it'd be good to hear from Docker/PostgreSQL users or experts.
Do we need to write some documentation here?

[1] https://docs.docker.com/config/containers/resource_constraints/
[2] https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources
[3] https://www.postgresql.org/message-id/CAEepm%3D38vXiZRgLe_Ks0pQEk8QSp%3D_YEUxht%3DWx%3D6%2B1qiaWOcA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Location to specify C compiler option in C extension

2018-02-23 Thread Demitri Muna
Hi Tom,

> PG_CPPFLAGS ought to work.  One point is that you need to set that before
> including pgxs.mk; maybe it's an ordering problem?

On closer inspection, it was working, but the pg Makefile was specifically 
including “-Wdeclaration-after-statement” which I think was overriding the 
‘-std=c99’ part. The line below fixed the problem.

PG_CPPFLAGS += -std=c99 -Wno-declaration-after-statement

Thanks, sorry for the noise.

Demitri



Re: Location to specify C compiler option in C extension

2018-02-23 Thread Tom Lane
Demitri Muna  writes:
> I’m writing a C extension and have successfully based my Makefile on existing 
> templates. This conveniently hides the details of the full Makefile provided 
> by "pg_config --pgxs”. Which variable would be the appropriate one to specify 
> the ‘-std=c99’ compiler flag? I’ve tried many and still get warnings that 
> this flag should suppress (specifically, "ISO C90 forbids mixed declarations 
> and code”). These are the variables I currently define:

PG_CPPFLAGS ought to work.  One point is that you need to set that before
including pgxs.mk; maybe it's an ordering problem?

regards, tom lane



Re: PQConsumeinput stuck on recv

2018-02-23 Thread Andres Freund
Hi,

On 2018-02-23 15:59:18 -0300, Andre Oliveira Freitas wrote:
> I'm using the libpq that comes with debian, however I can install the
> library from the official repository to be sure, I assume the one from
> the official repo has it enabled.

That should be fine. Could you get the backtrace again, after installing
debugging symbols? It'd certainly be helpful to see the exact path to
the blocking syscall.

Greetings,

Andres Freund



Re: PQConsumeinput stuck on recv

2018-02-23 Thread Andre Oliveira Freitas
2018-02-23 14:20 GMT-03:00 Andres Freund :
>
> Hi,
>
> On 2018-02-23 13:33:18 -0300, Andre Oliveira Freitas wrote:
> > Since it's been happening for a few weeks now, every time it freezes we
> > take a gcore dump and check it in gdb... and after a lot of hair pulling
> > and learning about the innards of the VoIP software we see that most often
> > the software is stuck in this call trace:
> >
> > #0 in __libc_recv (fd=409, buf=0x7f2c4802e6c0, n=16384, flags=1898970523)
> > at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
> > #1 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
> > #2 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
> > #3 in PQconsumeInput () from /usr/lib/x86_64-linux-gnu/libpq.so.5
>
> So it's just receiving data from the network. Have you verified whether
> the connection is actually stable? Any chance it's just waiting for the
> network to time out?  Might be worth configuring TCP timeouts, to make
> sure it's unrelated to that.
>
> What is the server showing as activity while the client is waiting?
> Could you show the corresponding pg_stat_activity row?

Sure,

datid;datname;pid;usesysid;usename;application_name;client_addr;client_hostname;client_port;backend_start;xact_start;query_start;state_change;wait_event_type;wait_event;state;backend_xid;backend_xmin;query
16,4;freeswitch;13,322;16,386;freeswitch;freeswitch;192.168.0.101;;43,276;2018-02-20
14:11:56;;2018-02-20 14:24:15;2018-02-20 14:24:15;;;idle;;;COMMIT
16,4;freeswitch;13,323;16,386;freeswitch;freeswitch;192.168.0.101;;43,328;2018-02-20
14:11:57;;2018-02-20 14:24:57;2018-02-20 14:24:57;;;idle;;;COMMIT
16,4;freeswitch;13,416;16,386;freeswitch;freeswitch;192.168.0.101;;51,681;2018-02-20
14:16:02;;2018-02-20 14:16:31;2018-02-20 14:16:31;;;idle;;;insert into
sip_authentication (nonce,expires,profile_name,hostname, last_nc)
values('363d02f6-cb9a-4791-9e05-d18473a18812', 1519147649, 'internal',
'srv-1', 0)
16,4;freeswitch;13,472;16,386;freeswitch;freeswitch;192.168.0.101;;33,481;2018-02-20
14:22:13;;2018-02-20 14:25:09;2018-02-20 14:25:09;;;idle;;;select
command from aliases where alias='show status'

The problematic connection is the third one; by checking the timings
on query_start it appears to be the last query executed by the stuck
connection before it became stuck. There are no outstanding locks in
any of the tables the VoIP software normally uses.

>
>
> > The software shares a database connection between threads, and controls its
> > access through a mutex, so once one thread that acquires the mutex gets
> > stuck in the location above, all other threads start piling up behind the
> > mutex, and that's apparently the reason the software stops responding for
> > most of its functions (while other functions that do not depend on the
> > database work normally).
>
> Hm, have you compiled libpq with threading support? Or use a
> distribution that compiles it with that? While I don't see an obvious
> connection to that stacktrace it seems worthwhile to verify.
>
> A mutex protecting individual connection, while also necessary if
> connections are shared, doesn't achieve the same.

I'm using the libpq that comes with debian, however I can install the
library from the official repository to be sure, I assume the one from
the official repo has it enabled.

>
> > I wonder if anyone has any tip on what to look for next...
>
> Any chance you're occasionally forking and then interacting with the
> connection in the forked process?

As far as I know, no. The software forks at the beginning but from
then on, it's just threads.

>
> Greetings,
>
> Andres Freund

If it is of any help, here is the link to the implementation that
calls PQconsumeInput:

https://freeswitch.org/stash/projects/FS/repos/freeswitch/browse/src/switch_pgsql.c#128

It is a function to check if the database connection is up before
running a query. I'm not a maintainer nor an expert in pg, but we
reviewed the implementation and it seems OK.

One thing that is bothering me by looking at the gdb backtraces is
that recv always seems to be receiving a non-zero value in flags, even
though libpq seems to pass zero. I don't know if it's of any relevance.

-- 

Thanks in advance,

Andre



Trigger file behavior with the standby

2018-02-23 Thread Keiko Oda
Hello,

I'm seeing the following behavior with a trigger file, which is very
confusing to me. I'd like to get some advice on what the expected
behavior of the trigger file is with a standby.

1. setup the replication, with the standby having the following
recovery.conf

  # we use wal-e
  restore_command = 'wal-e wal-fetch  "%f" "%p"'
  standby_mode = 'true'
  trigger_file = '/my/path/to/trigger-file/STANDBY_OFF'
  recovery_target_timeline = 'latest'
  primary_conninfo = 'host=myhost port=5432 user=foo
password=verysecurepassword'

2. create a trigger file while the standby has a "lag" (and replication
is not streaming, but file-based log-shipping at this point)
3. it looks like Postgres doesn't recognize the trigger file at all; the
standby keeps replaying/recovering WALs
  * tried to see if Postgres is doing anything with DEBUG5 log, but it
doesn't say anything about a trigger file
  * also tried to restart Postgres, sending SIGUSR1, etc. to see if it
helps, but it just keeps replaying WALs
4. once the standby has "caught up" with the leader (replayed all WALs and
is about to switch to streaming replication), Postgres finally realizes
that there is a trigger file, and does the failover

The doc (
https://www.postgresql.org/docs/current/static/warm-standby-failover.html)
says:

> To trigger failover of a log-shipping standby server, run pg_ctl promote
or create a trigger file with the file name and path specified by the
trigger_file setting in recovery.conf.

So, I'd expect that the standby will trigger a failover as soon as we
create a trigger file at step 2. However, the failover doesn't happen until
step 4 above, and the gap between step 2 and step 4 can sometimes take many hours.

I've reproduced this with Postgres 9.4 and 9.5, currently trying to
reproduce with 10.
Please let me know if there is any other information I could provide.

Thanks!
Keiko Oda


Location to specify C compiler option in C extension

2018-02-23 Thread Demitri Muna
Hi,

I’m writing a C extension and have successfully based my Makefile on existing 
templates. This conveniently hides the details of the full Makefile provided by 
"pg_config --pgxs”. Which variable would be the appropriate one to specify the 
‘-std=c99’ compiler flag? I’ve tried many and still get warnings that this flag 
should suppress (specifically, "ISO C90 forbids mixed declarations and code”). 
These are the variables I currently define:

EXTENSION
DATA
PGFILEDESC
MODULE_big
PG_CPPFLAGS (this one seemed like the right place)
SHLIB_LINK
OBJS
PG_CONFIG
PGXS

A workaround has been:

OBJS = file1.o file2.o CPPFLAGS+=-Wdeclaration-after-statement

but it’s not exactly the same thing.

Thanks,
Demitri




Re: Getting a primitive numeric value from "DatumGetNumeric"?

2018-02-23 Thread Demitri Muna
Hi,

On Feb 22, 2018, at 9:31 PM, Michael Paquier  wrote:

> PG_GETARG_NUMERIC(), no?

That function returns an object of datatype “Numeric” which still requires some 
(not immediately obvious) conversion to a double (or whatever primitive C 
type).

> When working on implementing your own data
> types or when hacking out functions which manipulate arguments of an
> existing datatype, looking at the input and output functions help a
> lot.  In your case, numeric_in and numeric_out in
> src/backend/utils/adt/numeric.c is full of hints.

I spent an hour diving into the code out of curiosity. I found useful functions 
like this:

double numeric_to_double_no_overflow(Numeric n)

They’re available from the PostgreSQL main source code, but not exposed in the 
public headers. (Maybe I was missing something.) There was enough there where I 
could see a way to copy/paste or otherwise link to those methods, but as Tom 
pointed out, implicit coercion handles what I need so I’ll stick with that.
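
(An editorial illustration, not from the thread: PostgreSQL's cast from numeric
to double precision is implicit, so a C function whose SQL-level signature is
declared with float8 arguments can simply use PG_GETARG_FLOAT8() and accept
numeric inputs without explicit conversion.  The built-in degrees(double
precision) stands in for such a function here; my_scaled is a hypothetical
name.)

-- Runnable demonstration: the numeric argument is implicitly cast to float8.
SELECT degrees(0.5::numeric);

-- For one's own extension the idea is the same (hypothetical function; the C
-- body, which would use PG_GETARG_FLOAT8(), is not shown):
-- CREATE FUNCTION my_scaled(double precision) RETURNS double precision
--     AS 'MODULE_PATHNAME', 'my_scaled'
--     LANGUAGE C IMMUTABLE STRICT;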

Cheers,
Demitri




Re: PQConsumeinput stuck on recv

2018-02-23 Thread Andres Freund
Hi,

On 2018-02-23 13:33:18 -0300, Andre Oliveira Freitas wrote:
> Since it's been happening for a few weeks now, every time it freezes we
> take a gcore dump and check it in gdb... and after a lot of hair pulling
> and learning about the innards of the VoIP software we see that most often
> the software is stuck in this call trace:
> 
> #0 in __libc_recv (fd=409, buf=0x7f2c4802e6c0, n=16384, flags=1898970523)
> at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
> #1 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
> #2 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
> #3 in PQconsumeInput () from /usr/lib/x86_64-linux-gnu/libpq.so.5

So it's just receiving data from the network. Have you verified whether
the connection is actually stable? Any chance it's just waiting for the
network to time out?  Might be worth configuring TCP timeouts, to make
sure it's unrelated to that.
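
(As a concrete, editorial illustration of the TCP-timeout suggestion, not
something from the original mail: the server exposes keepalive GUCs, and libpq
accepts matching keepalives, keepalives_idle, keepalives_interval and
keepalives_count connection parameters on the client side, which is where a
hang inside recv() matters.  Illustrative server-side values:)

-- Illustrative values only: detect dead TCP peers instead of waiting forever.
ALTER SYSTEM SET tcp_keepalives_idle = 60;      -- seconds idle before the first probe
ALTER SYSTEM SET tcp_keepalives_interval = 10;  -- seconds between probes
ALTER SYSTEM SET tcp_keepalives_count = 5;      -- unanswered probes before the connection is closed
SELECT pg_reload_conf();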

What is the server showing as activity while the client is waiting?
Could you show the corresponding pg_stat_activity row?

> The software shares a database connection between threads, and controls its
> access through a mutex, so once one thread that acquires the mutex gets
> stuck in the location above, all other threads start piling up behind the
> mutex, and that's apparently the reason the software stops responding for
> most of its functions (while other functions that do not depend on the
> database work normally).

Hm, have you compiled libpq with threading support? Or use a
distribution that compiles it with that? While I don't see an obvious
connection to that stacktrace it seems worthwhile to verify.

A mutex protecting individual connection, while also necessary if
connections are shared, doesn't achieve the same.

> I wonder if anyone has any tip on what to look for next...

Any chance you're occasionally forking and then interacting with the
connection in the forked process?

Greetings,

Andres Freund



PQConsumeinput stuck on recv

2018-02-23 Thread Andre Oliveira Freitas
Hi, I've been experiencing an issue. We use an open-source VoIP software
whose backend is PostgreSQL. Initially we had a twin-server setup (one
server running the VoIP software, another one running the pg instance). Due
to company growth we were running into performance issues, so we rolled out
a new architecture using multiple VoIP servers connected to the single pg
instance. Since then, the VoIP software started misbehaving - it randomly
stops responding, and only a restart gets it back up and running. It is random
throughout the servers, time-of-day, day-of-week... we haven't found a
correlation between it and any other metric like CPU usage, network traffic
and such.

Since it's been happening for a few weeks now, every time it freezes we
take a gcore dump and check it in gdb... and after a lot of hair pulling
and learning about the innards of the VoIP software we see that most often
the software is stuck in this call trace:

#0 in __libc_recv (fd=409, buf=0x7f2c4802e6c0, n=16384, flags=1898970523)
at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#2 in ?? () from /usr/lib/x86_64-linux-gnu/libpq.so.5
#3 in PQconsumeInput () from /usr/lib/x86_64-linux-gnu/libpq.so.5

The software shares a database connection between threads, and controls its
access through a mutex, so once one thread that acquires the mutex gets
stuck in the location above, all other threads start piling up behind the
mutex, and that's apparently the reason the software stops responding for
most of its functions (while other functions that do not depend on the
database work normally).

And it stays stuck on it forever... at one situation we took two gcore
dumps spaced 10 minutes apart, and they look almost identical, with the
same thread stuck on recv and all the others waiting for the lock to be
released.

I wonder if anyone has any tip on what to look for next... Besides the
implementation of the VoIP software itself, we are looking into network
issues (we are seeing a bunch of TCP retransmissions between some servers
and the db), but otherwise no other app running on those servers has
presented any weird behavior like this VoIP software. We don't understand
what would cause recv to get stuck like this.

BTW we're running debian 9, pg 9.6.3, and the VoIP software (along with most of
the other apps) uses a slightly older libpq version (9.4.15).

Thanks in advance.

Andre


Re: Given a set of daterange, finding the continuous range that includes a particular date

2018-02-23 Thread Viktor Fougstedt

Hi.

This answer is perhaps useful if I understand your problem correctly. But I 
might have interpreted it wrongly. :-)


I would probably start with merging intervals so that overlapping and adjacent 
intervals become single continuous intervals, then select from those merged 
intervals.

We have an application with a lot of interval handling in PostgreSQL, and we 
use many variants of algorithms based on one by Itzik Ben Gan which he calls 
“packing intervals”. The post we started with was an old reader’s challenge 
from SQL Magazine. It has since been updated when MS-SQL started supporting 
window functions better (Itzik Ben Gan is a MS-SQL-guy).

Basically, it is a few CTE:s which convert the intervals into “start” (+1) and 
“stop” (-1) events, then keeps a running sum of these, and finally creates new 
intervals by noting that the merged (or “packed”) intervals starts with events 
that had sum=0 before them and stops with events which have sum=0 after them.

It involves both CTE:s and window functions and is quite a beautiful example of 
SQL, IMHO.

I think it’s best to google it, he does a good job of explaining how it works.
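
(The following is an editorial sketch of that idea against the tbl_staff_assign
columns from Ken's mail, not code from the original post.  End dates are treated
as inclusive, so rows that touch on consecutive days merge; NULL end dates fall
out as open-ended ranges; other details are glossed over.)

WITH events AS (          -- +1 at each start; -1 one day after each (inclusive) end date
    SELECT client_id, staff_assign_date         AS d, +1 AS delta FROM tbl_staff_assign
    UNION ALL
    SELECT client_id, staff_assign_date_end + 1 AS d, -1 AS delta FROM tbl_staff_assign
), summed AS (            -- running count of intervals open at each event
    SELECT client_id, d, delta,
           sum(delta) OVER (PARTITION BY client_id
                            ORDER BY d, delta DESC
                            ROWS UNBOUNDED PRECEDING) AS open_count
    FROM events
), boundaries AS (        -- keep only events that open or close a packed range
    SELECT client_id, d, delta
    FROM summed
    WHERE (delta = +1 AND open_count = 1)   -- nothing was open before this start
       OR (delta = -1 AND open_count = 0)   -- nothing is left open after this end
), paired AS (            -- each qualifying start is immediately followed by its matching end
    SELECT client_id, delta, d AS packed_start,
           lead(d) OVER (PARTITION BY client_id ORDER BY d, delta DESC) AS packed_end
    FROM boundaries
)
SELECT client_id, packed_start, packed_end   -- packed_end is exclusive (day after the last covered day)
FROM paired
WHERE delta = +1
ORDER BY client_id, packed_start;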

Hope that helps a bit at least.

/Viktor

On 23 Feb 2018, at 01:44, Ken Tanzer wrote:

Hi, hoping to get some help with this.  I'm needing to take a specific date, a 
series of dateranges and, given a specific date, return a single continuous 
daterange that includes that date.

To elaborate a bit, I've got lots of tables that include start and end dates.  
For example:

CREATE TABLE tbl_staff_assign (
staff_assign_id SERIAL PRIMARY KEY,
client_id   INTEGER NOT NULL REFERENCES tbl_client (client_id),
staff_id    INTEGER REFERENCES tbl_staff(staff_id),
staff_assign_type_code  VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
staff_assign_date   DATE NOT NULL,
staff_assign_date_end   DATE,
...

So a client might leave a program and then return later, or they might simply 
switch to another staff_id.  (In which case one record will have an end date, 
and the next record will start on the next day.)  In this case I need to know 
"what period were they continuously in the program that includes X date?"  So 
I'd like to be able to do something like:

"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT 
array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa

I've done this before with procedures specific to a particular table, and 
working with the start and end dates.  I'm now wanting to try to do this once, 
generically, so that it will work for all my cases.  So I'm hoping to do this in a way 
that performance isn't horrible.  And it's a little unclear to me how much and 
how I might be able to use the daterange operators to accomplish this 
efficiently.

Any advice or suggestions or ways to go about this appreciated.  Thanks!

Ken

p.s.,  Another small wrinkle is these records aren't always perfect, and 
ideally I'd allow for an optional fudge factor that would allow small gaps to 
be ignored.  I could just add that in every query (start_date+2,end_date-2), 
but it might be nice to have the function do it, if it didn't badly hurt 
performance.


--
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://demo.agency-software.org/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the discussion.



Re: Given a set of daterange, finding the continuous range that includes a particular date

2018-02-23 Thread mariusz
On Thu, 2018-02-22 at 17:23 -0800, Ken Tanzer wrote:
> 
> On Thu, Feb 22, 2018 at 5:05 PM, Adrian Klaver wrote:
> On 02/22/2018 04:58 PM, Ken Tanzer wrote:
> On Thu, Feb 22, 2018 at 4:53 PM, Adrian Klaver wrote:
> On 02/22/2018 04:44 PM, Ken Tanzer wrote:
> 
> Hi, hoping to get some help with this.  I'm needing to take a specific
> date, a series of dateranges and, given a specific date, return a single
> continuous daterange that includes that date.
> 
> To elaborate a bit, I've got lots of tables that include start and end
> dates.  For example:
> 
> CREATE TABLE tbl_staff_assign (
>   staff_assign_id SERIAL PRIMARY KEY,
>   client_id   INTEGER NOT NULL REFERENCES tbl_client (client_id),
>   staff_id    INTEGER REFERENCES tbl_staff(staff_id),
>   staff_assign_type_code  VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
>   staff_assign_date   DATE NOT NULL,
>   staff_assign_date_end   DATE,
> ...
> 
> So a client might leave a program and then return later, or they might
> simply switch to another staff_id.  (In which case one record will have
> an end date, and the next record will start on the next day.)  In this
> case I need to know "what period were they continuously in the program
> that includes X date?"  So I'd like to be able to do something like:
> 
> "SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT
> array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
> ) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa
> 
> I've done this before with procedures specific to a particular table, and
> working with the start and end dates.  I'm now wanting to try to do this
> once, generically, so that it will work for all my cases.  So I'm hoping
> to do this in a way that performance isn't horrible.  And it's a little
> unclear to me how much and how I might be able to use the daterange
> operators to accomplish this efficiently.
> 
> The operator I use to solve similar problems:
> 
> https://www.postgresql.org/docs/10/static/functions-range.html
> 
> @>  contains element '[2011-01-01,2011-03-01)'::tsrange @> '2011-01-10'::timestamp   t
> 
> Thanks Adrian.  But how would you apply that to this situation, where I
> have a series of (quite possibly discontinuous) dateranges?
> 
> This is going to depend on a more formal definition of the problem with
> some sample data. Right now I am trying to reconcile "what period were
> they continuously in the program that includes X date?" with "where I
> have a series of (quite possibly discontinuous) dateranges?"  Maybe it's
> just me, I can't see how discontinuous can also be continuous.
> 
> Start   End
> -- Episode 1
> 1/1/16  3/30/16
> 4/1/16  4/30/16
> -- Episode 2
> 1/1/18  1/31/18
> 2/1/18  NULL
> 
> Given 1/15/18, should return 1/1/18-->NULL
> Given 

Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later

2018-02-23 Thread PT
On Fri, 23 Feb 2018 17:10:56 +1300
David Rowley  wrote:

> On 23 February 2018 at 04:00, Bill Moran  wrote:
> > 2) The negative impact of an unused index is tiny compared to the negative
> >  impact of a missing index.
> 
> I'd say that would entirely depend on the workload of the table and
> the entire cluster. Disk space and I/O to write WAL and index pages to
> is surely a finite resource. Not to mention the possibility of
> disallowing HOT-Updates in the heap.

I feel like you missed my point. You're absolutely right, but the
disagreement was not on whether or not an unused index could cause
problems, but on the likelihood that the OP was going to build the
simulation code to actually determine whether the index is needed
or not. Information from the original question led me to believe
that simulation was either beyond his skill level or beyond his
time allocation; so I provided a less good, but more likely to be
helpful (in my opinion) answer.

The pattern that almost ALL new ventures I've seen follow is:
1) Hack something together based on an idea for a product
2) If the product actually succeeds, experience tons of issues
 related to scaling and performance
3) Run around like a crazy person fixing all the scaling and
 performance issues
4) Sell the company to someone else who ultimately becomes responsible
 for maturing the software

In theory, there's no reason this pattern _has_ to be so prevalent,
yet it is. Probably because it appears to minimize the up-front cost,
which the people footing the bill just love.

Given that process, "shotgun" indexing is part of step 1 or step 3.
Whereas the building of load simulations and _real_ tuning of the
system is relegated to step 4.

Since investors tend to want to get out quickly if #2 isn't going to
happen, they don't want people doing work that they consider part
of step #4.

I'm a pragmatist. I'd love to see everyone build software in a sane,
well-considered manner. I'd also love to see government without
corruption. However, in the world I _actually_ see, those things aren't
prevalent.

> It would seem to me that anyone using the "shotgun" indexing method
> may end up having to learn more about indexing the hard way.  Learning
> the hard way is something I like to try to avoid, personally. Probably
> it all has to come down to how important it is that your app actually
> can handle the load vs devs/dba experience level divided by time, both
> of the latter two are also a finite resource. So, it probably all has
> to be carefully balanced and quite possibly a person's opinion
> strongly relates to their experience.  If you were in the air traffic
> control business, perhaps your opinion might not be the same!? ...
> Sorry, the aeroplane crashed because the replica instance lagged and
> the plane's location wasn't updated... Analysis shows that the DBA
> indexed every column in the table and the WAL volume was more than the
> network's bandwidth could handle over the holiday period.  (Note: I
> know nothing about air traffic control, but it does seem like
> something you'd want to make stable systems for, games on the
> internet, probably less so...).

I really hope that people writing ATC software have the experience to
do really good testing (including load simulation, etc.), but the Three Mile
Island accident happened partially because of faulty sensor design, so
there's no guarantee that's the case.

Interesting discussion.

-- 
Bill Moran