Re: Trouble Upgrading Postgres

2018-11-04 Thread Adrian Klaver

On 11/4/18 2:55 PM, Charles Martin wrote:

Yep, you called it:

Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438 
(postmaster) score 709 or sacrifice child
Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26, 
(postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB


So it's running out of memory when trying to dump this table. The "old" 
server has 4GB of ram, the "new" server 20GB.




In addition to the other suggestions, what is the exact pg_dump command 
you are using?



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Trouble Upgrading Postgres

2018-11-04 Thread Tom Lane
Charles Martin  writes:
> Yep, you called it:
> Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438
> (postmaster) score 709 or sacrifice child
> Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26,
> (postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB

> So it's running out of memory when trying to dump this table. The "old"
> server has 4GB of ram, the "new" server 20GB.

That's kind of odd: a COPY shouldn't really consume very much working
memory.  I suspect that much of the process's apparent VM consumption may
be shared buffers ... what have you got shared_buffers set to on the old
server?  If it's more than half a GB or so, maybe reducing it would help.

regards, tom lane



Re: Trouble Upgrading Postgres

2018-11-04 Thread Ron

Not enough swap space?

On 11/04/2018 04:55 PM, Charles Martin wrote:

Yep, you called it:

Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438 
(postmaster) score 709 or sacrifice child
Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26, 
(postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB


So it's running out of memory when trying to dump this table. The "old" 
server has 4GB of ram, the "new" server 20GB.



On Sun, Nov 4, 2018 at 3:13 PM Adrian Klaver > wrote:


On 11/4/18 8:38 AM, Charles Martin wrote:
>
> Adtrian said:
>>> pg_dump: Error message from server: server closed the connection
>>> unexpectedly
>
>  >Is this error the client reporting?
>  >Is this the same that is showing up in the server log?
>
> Yes, that's the client message, i.e. what appeared in the terminal
> window that gave the command. The server log shows:
>
> 2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG: server process
> (PID 30438) was terminated by signal 9: Killed
>
> 2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL: Failed process
> was running: COPY public.docfile (docfile_pkey, docfileoriginalname,
> ordernumber, versionnum, docfilecontents, docfilepath, d$
>
> 2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG: terminating any
> other active server processes
>
> 2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING:  terminating
> connection because of crash of another server process
>
> 2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL:  The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process exited abnor$
>
>
>
>>So where is the server located relative to the pg_dump client?
>>On the same machine?
>>If so is it a virtual machine e.g AWS?
>>Across a local or remote network?
>
>
>   I gave the command in a terminal session after SSHing to the server
> from the same network. It is not a virtual machine.
>
>
> Lsaurenz said:
>
>
>>You probably have a corrupted database.
>>You should get that fixed first, then you can upgrade.
>>Maybe you should hire a professional for that.
>
>
> I suspect this is is correct, both that there is corruption in the
table
> and that I need a professional to help. If someone here is available,
> I'm interested.

Given that this involves your largest table I would confirm that the
signal 9 kill was not coming from the system OOM killer. Take a look at
the system logs to see what they show over the same time period.

>
>
> Andreas said:
>
>
>  >which exact minor version please?
>
>
> PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-23), 64-bit
>
>


-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Angular momentum makes the world go 'round.


Re: Trouble Upgrading Postgres

2018-11-04 Thread Charles Martin
Yep, you called it:

Nov  2 20:30:45 localhost kernel: Out of memory: Kill process 30438
(postmaster) score 709 or sacrifice child
Nov  2 20:30:45 localhost kernel: Killed process 30438, UID 26,
(postmaster) total-vm:3068900kB, anon-rss:1695392kB, file-rss:1074692kB

So it's running out of memory when trying to dump this table. The "old"
server has 4GB of ram, the "new" server 20GB.


On Sun, Nov 4, 2018 at 3:13 PM Adrian Klaver 
wrote:

> On 11/4/18 8:38 AM, Charles Martin wrote:
> >
> > Adtrian said:
> >>> pg_dump: Error message from server: server closed the connection
> >>> unexpectedly
> >
> >  >Is this error the client reporting?
> >  >Is this the same that is showing up in the server log?
> >
> > Yes, that's the client message, i.e. what appeared in the terminal
> > window that gave the command. The server log shows:
> >
> > 2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG:  server process
> > (PID 30438) was terminated by signal 9: Killed
> >
> > 2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL:  Failed process
> > was running: COPY public.docfile (docfile_pkey, docfileoriginalname,
> > ordernumber, versionnum, docfilecontents, docfilepath, d$
> >
> > 2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG:  terminating any
> > other active server processes
> >
> > 2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING:  terminating
> > connection because of crash of another server process
> >
> > 2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL:  The
> > postmaster has commanded this server process to roll back the current
> > transaction and exit, because another server process exited abnor$
> >
> >
> >
> >>So where is the server located relative to the pg_dump client?
> >>On the same machine?
> >>If so is it a virtual machine e.g AWS?
> >>Across a local or remote network?
> >
> >
> >   I gave the command in a terminal session after SSHing to the server
> > from the same network. It is not a virtual machine.
> >
> >
> > Lsaurenz said:
> >
> >
> >>You probably have a corrupted database.
> >>You should get that fixed first, then you can upgrade.
> >>Maybe you should hire a professional for that.
> >
> >
> > I suspect this is is correct, both that there is corruption in the table
> > and that I need a professional to help. If someone here is available,
> > I'm interested.
>
> Given that this involves your largest table I would confirm that the
> signal 9 kill was not coming from the system OOM killer. Take a look at
> the system logs to see what they show over the same time period.
>
> >
> >
> > Andreas said:
> >
> >
> >  >which exact minor version please?
> >
> >
> > PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> > 20120313 (Red Hat 4.4.7-23), 64-bit
> >
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>


Re: Trouble Upgrading Postgres

2018-11-04 Thread Adrian Klaver

On 11/4/18 8:38 AM, Charles Martin wrote:


Adtrian said:
pg_dump: Error message from server: server closed the connection 
unexpectedly


 >Is this error the client reporting?
 >Is this the same that is showing up in the server log?

Yes, that's the client message, i.e. what appeared in the terminal 
window that gave the command. The server log shows:


2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG:  server process 
(PID 30438) was terminated by signal 9: Killed


2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL:  Failed process 
was running: COPY public.docfile (docfile_pkey, docfileoriginalname, 
ordernumber, versionnum, docfilecontents, docfilepath, d$


2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG:  terminating any 
other active server processes


2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING:  terminating 
connection because of crash of another server process


2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL:  The 
postmaster has commanded this server process to roll back the current 
transaction and exit, because another server process exited abnor$





So where is the server located relative to the pg_dump client?
On the same machine?
If so is it a virtual machine e.g AWS?
Across a local or remote network?



  I gave the command in a terminal session after SSHing to the server 
from the same network. It is not a virtual machine.



Lsaurenz said:



You probably have a corrupted database.
You should get that fixed first, then you can upgrade.
Maybe you should hire a professional for that.



I suspect this is is correct, both that there is corruption in the table 
and that I need a professional to help. If someone here is available, 
I'm interested.


Given that this involves your largest table I would confirm that the 
signal 9 kill was not coming from the system OOM killer. Take a look at 
the system logs to see what they show over the same time period.





Andreas said:


 >which exact minor version please?


PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (Red Hat 4.4.7-23), 64-bit






--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Trouble Upgrading Postgres

2018-11-04 Thread Andreas Kretschmer




Am 04.11.2018 um 17:38 schrieb Charles Martin:


Andreas said:


>which exact minor version please?


PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
20120313 (Red Hat 4.4.7-23), 64-bit




okay, i asked to just rule out a stale version. 9.6.7, for instance, 
contains some importand bug fixes.



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Trouble Upgrading Postgres

2018-11-04 Thread Charles Martin
Adtrian said:
>> pg_dump: Error message from server: server closed the connection
>> unexpectedly

>Is this error the client reporting?
>Is this the same that is showing up in the server log?

Yes, that's the client message, i.e. what appeared in the terminal window
that gave the command. The server log shows:

2018-11-02 20:30:46 EDT [20405]: [4-1] user=,db= LOG:  server process (PID
30438) was terminated by signal 9: Killed

2018-11-02 20:30:46 EDT [20405]: [5-1] user=,db= DETAIL:  Failed process
was running: COPY public.docfile (docfile_pkey, docfileoriginalname,
ordernumber, versionnum, docfilecontents, docfilepath, d$

2018-11-02 20:30:46 EDT [20405]: [6-1] user=,db= LOG:  terminating any
other active server processes

2018-11-02 20:30:46 EDT [20415]: [10-1] user=,db= WARNING:  terminating
connection because of crash of another server process

2018-11-02 20:30:46 EDT [20415]: [11-1] user=,db= DETAIL:  The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnor$



>So where is the server located relative to the pg_dump client?
>On the same machine?
>If so is it a virtual machine e.g AWS?
>Across a local or remote network?


 I gave the command in a terminal session after SSHing to the server from
the same network. It is not a virtual machine.


Lsaurenz said:


>You probably have a corrupted database.
>You should get that fixed first, then you can upgrade.
>Maybe you should hire a professional for that.


I suspect this is is correct, both that there is corruption in the table
and that I need a professional to help. If someone here is available, I'm
interested.


Andreas said:


>which exact minor version please?


PostgreSQL 9.6.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-23), 64-bit


Re: Trouble Upgrading Postgres

2018-11-04 Thread Andreas Kretschmer




Am 03.11.2018 um 23:47 schrieb Charles Martin:

When I do a pg_dump using PG 9.6, I got this:

pg_dump: Dumping the contents of table "docfile" failed: 
PQgetCopyData() failed.


pg_dump: Error message from server: server closed the connection 
unexpectedly


This probably means the server terminated abnormally

before or while processing the request.

pg_dump: The command was: COPY public.docfile (docfile_pkey, 
docfileoriginalname, ordernumber, versionnum, docfilecontents, 
docfilepath, docfileextension, enddatetime, endby, editnum, insby, 
insdatetime, modby, moddatetime, active, doc_fkey) TO stdout;




which exact minor version please?


Regards, Andreas
--

2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: Trouble Upgrading Postgres

2018-11-04 Thread Laurenz Albe
Charles Martin wrote:
> When I do a pg_dump using PG 9.6, I got this:
> 
> > pg_dump: Dumping the contents of table "docfile" failed: PQgetCopyData() 
> > failed.
> > pg_dump: Error message from server: server closed the connection 
> > unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > pg_dump: The command was: COPY public.docfile (docfile_pkey, 
> > docfileoriginalname,
> > ordernumber, versionnum, docfilecontents, docfilepath, docfileextension, 
> > enddatetime,
> > endby, editnum, insby, insdatetime, modby, moddatetime, active, doc_fkey) 
> > TO stdout;

You probably have a corrupted database.

You should get that fixed first, then you can upgrade.

Maybe you should hire a professional for that.

The other alternative is that you hava a buggy extension installed
that causes PostgreSQL to crash.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Logical replication hangs up.

2018-11-04 Thread Aleš Zelený
Hello,

wal_sender_timeout did not help (from default 60s to 600s), but I haven't
tested receiver timeout since it was sender complaining and receiver loops
over ( recvfrom with "Resource  temporarily unavailable" result, so i
thought it is waiting for master sender to provide some data), thanks for
the tip.

Since the behavior is all the time deterministic - works as expected until
sender reaches 429/69E9CC60 (  wal 000104290069 ) and it stuck
it seems to me, that the issue is somewhat related to content of this file.

Another consideration is, why restart_lsn ( flush_lsn   /  replay_lsn on
master ) never goes through 426/AAE55A68. May be, it is due to lack of my
knowledge, but I'd expect that within almost 11GB of wal data would be some
committed changes and it is lot behind maximum sent_lsn reported by wal
sender - I would expect reply_lsn to more/less follow sent_lsn but I have
no clue what what is root cause (stuck sender at certain sent_lsn or
logical replication bg worker not applying/reporting so keeping restart_lsn
constant). That is why I've tried to find whatever suspicious using
pg_waldump in wal files corresponding to restart_lsn and sent_lsn.

Kind regards Ales

so 3. 11. 2018 v 17:02 odesílatel Jerry Sievers 
napsal:

> Aleš Zelený  writes:
>
> > Hello,
> >
> > we are suing logical replication on 10.4  and it now hangs. After
> > some timeout it is retarted again, replaying 18GB of data and then
> > hangs (while 7GB of wals remains to be proceeded).
>
> Timeout...
>
> Have a look at the 2 setting wal sender/receiver timeout and you
> probably need to raise the sender timeout value.
>
> HTH
>
> >
> > The backlog of 18GB comes from a failed migration adding new table to
> > replication while replication user was not granted to select the
> > table. This was clear from log files and once resolved by adding
> > select privilege, I thought that all will work as usual (same
> > happened in test env. many times and adding missing grant for select
> > was sufficient to get it working... these were issues on tests).
> >
> > RDBMS Version:
> > PostgreSQL 10.4 (Ubuntu 10.4-2.pgdg16.04+1) on x86_64-pc-linux-gnu,
> > compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609,
> > 64-bit
> >
> > Publication (master) instance error message from log:
> >
> > 2018-11-02 18:34:14 UTC 7974 5bdc8d27.1f26 7 192.168.23.11(58884)
> > master_prod repusr 0 5/0 sub_eur idle [0]:LOG:  terminating
> > walsender process due to replication timeout
> >
> > Subscription instance log:
> >
> > 2018-11-02 18:34:14 UTC 8657 5bdc8d26.21d1 20 3/0
> > [XX000]:ERROR:  could not receive data from WAL stream: SSL
> > connection has been closed unexpectedly
> > 2018-11-02 18:34:14 UTC 24699 5b923b1c.607b 12093970
> > [0]:LOG:  worker process: logical replication worker for
> > subscription 37932 (PID 8657) exited with exit code 1
> >
> > While it seems to be a network issue, it is may be not - we have
> > checked the network and even monitoring, all the time some packets
> > were exchanged.
> >
> > We do have 3 subscriptions(thus repl. slots) for one publication, one
> > subscriber instance is within same datacenter as master, remainig
> > subscribers are remote.
> >
> >
> >>> select * from pg_replication_slots
> >
> ++--+---++-+---+++--+--+--+-+
> > | slot_name  | plugin   | slot_type | datoid | database|
> > temporary | active | active_pid | xmin | catalog_xmin | restart_lsn
> > | confirmed_flush_lsn |
> >
> ++--+---++-+---+++--+--+--+-+
> > | sub_usd| pgoutput | logical   | 16421  | master_prod |
> > False | True   | 16604  |  | 5536488  | 426/AAE55A68
> > | 426/AAE55A68|
> > | sub_cad| pgoutput | logical   | 16421  | master_prod |
> > False | True   | 22875  |  | 5536488  | 426/AAE55A68
> > | 426/AAE55A68|
> > | sub_eur| pgoutput | logical   | 16421  | master_prod |
> > False | True   | 16605  |  | 5536488  | 426/AAE55A68
> > | 426/AAE55A68|
> >
> ++--+---++-+---+++--+--+--+-+
> >
> >
> > Once  after the error occurs, wal senders are re/started and they
> > reads wal files until they reach restart_lsn wal file:
> >
> >>> select pg_walfile_name('426/AAE55A68')
> > +--+
> > | pg_walfile_name  |
> > +--+
> > | 0001042600AA |
> > +--+
> >
> > # changing file names until they reach this one:
> > root@master-db:/pgsql/pgcluster/10/master_prod# lsof -p 1560 -p 5758
> > -p 5790| grep pg_wal
> > postgres 1560 postgres   10r  REG  259,3  16777216 115766007 /
> > 

Unsuscribe

2018-11-04 Thread Christophe TAVERNE
Hi,

I would like to unsuscribe but I can't recover my account.

How can I process ?

Please, admin, could you remove my email from List?


Thanks

regards,

Chris




signature.asc
Description: OpenPGP digital signature