Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-04-13 Thread Justin Pryzby
On Wed, Apr 13, 2022 at 03:36:19PM +, l...@laurent-hasson.com wrote:
> After a lot of back and forth, someone in IT informed us that the database VM 
> is under a backup schedule using Veeam. Apparently, during the backup window, 
> Veeam creates a snapshot and that takes the VM offline for a couple of 
> minutes… And of course, they scheduled this right at the busiest time of the 
> day for this machine which is during our nightly ETL. Their backup doesn’t 
> perform very week either, which explained why the failure seemed to randomly 
> happen at various points during our ETL (which takes about 2h30mn).
> 
> They moved the schedule out and the issue has not happened again over the 
> past 3 weeks. This looks like it was the root cause and would explain (I 
> think) how the database and the client simultaneously reported a connection 
> timeout.
> 
> Thank you so much for all your help in trying to figure this out and 
> exonerate Postgres.

Great, thanks for letting us know.
This time it wasn't postgres' fault; you're 2 for 3 ;)

One issue I've seen is if a vmware snapshot is taken and then saved for a long
time.  It can be okay if VEEM takes a transient snapshot, copies its data, and
then destroys the snapshot.  But it can be bad if multiple snapshots are taken
and then left around for a long time to use as a backup themselves.

-- 
Justin




RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-04-13 Thread l...@laurent-hasson.com
>From: Ranier Vilela ranier...@gmail.com
>Sent: Thursday, March 03, 2022 13:22
>
>
>You are welcome.
>
>regards,
>Ranier Vilela



Hello all,

After a lot of back and forth, someone in IT informed us that the database VM 
is under a backup schedule using Veeam. Apparently, during the backup window, 
Veeam creates a snapshot and that takes the VM offline for a couple of minutes… 
And of course, they scheduled this right at the busiest time of the day for 
this machine which is during our nightly ETL. Their backup doesn’t perform very 
week either, which explained why the failure seemed to randomly happen at 
various points during our ETL (which takes about 2h30mn).

They moved the schedule out and the issue has not happened again over the past 
3 weeks. This looks like it was the root cause and would explain (I think) how 
the database and the client simultaneously reported a connection timeout.

Thank you so much for all your help in trying to figure this out and exonerate 
Postgres.

Thank you,
Laurent.




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Ranier Vilela
Em qui., 3 de mar. de 2022 às 15:19, l...@laurent-hasson.com <
l...@laurent-hasson.com> escreveu:

> I am also starting to feel that the issue being on the database’s side is
> less and less likely. There is something happening in between, or possibly
> on the client.
>
>
>
> Ranier, the only reason I was focusing on this at the PG level is that
> this issue started to show up several months ago shortly after I updated to
> PG13 from PG11. Had run PG11 for 2 years without ever seeing that issue at
> all. The ETL itself hasn’t changed either, except for upgrading the JDBC
> driver… But I did revert back to an older JDBC driver and the issue still
> did occur eventually.
>
>
>
> Of course, other things could have changed in the client’s IT
> infrastructure that I am not aware of, so I am pushing that angle as well
> more aggressively now. I am also pushing for WireShark to monitor the
> network more closely. Stay tuned!
>
>
>
> Thank you so much all for your support but at this time, I think the ball
> is in my camp and working out with it on some plan.
>
You are welcome.

regards,
Ranier Vilela


RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread l...@laurent-hasson.com
I am also starting to feel that the issue being on the database’s side is less 
and less likely. There is something happening in between, or possibly on the 
client.

Ranier, the only reason I was focusing on this at the PG level is that this 
issue started to show up several months ago shortly after I updated to PG13 
from PG11. Had run PG11 for 2 years without ever seeing that issue at all. The 
ETL itself hasn’t changed either, except for upgrading the JDBC driver… But I 
did revert back to an older JDBC driver and the issue still did occur 
eventually.

Of course, other things could have changed in the client’s IT infrastructure 
that I am not aware of, so I am pushing that angle as well more aggressively 
now. I am also pushing for WireShark to monitor the network more closely. Stay 
tuned!

Thank you so much all for your support but at this time, I think the ball is in 
my camp and working out with it on some plan.

Thank you,
Laurent.


From: Ranier Vilela 
Sent: Thursday, March 3, 2022 11:57
To: Justin Pryzby 
Cc: l...@laurent-hasson.com; pgsql-performa...@postgresql.org
Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)


Em qui., 3 de mar. de 2022 às 13:46, Justin Pryzby 
mailto:pry...@telsasoft.com>> escreveu:
On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> Sorry, but this is much more on the client side.

The client is reporting the problem, as is the server.
 Are you read the server log?
" 2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client: An 
existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection with 
an open transaction"


> Following the logs, it is understood that the client is dropping the
> connection.

The logs show that the client's connection *was* dropped.
And on the server, the same.
No, the log server shows that the client dropped the connection.


> So most likely the error could be from Pentaho or JDBC.
>
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> " This *SocketException* occurs on the server-side when the client closed
> the socket connection before the response could be returned over the
> socket."
>
> I suggest moving this thread to the Pentaho or JDBC support.

We don't know the source of the problem.
Yeah, but it is much more likely to be on the client.

  I still doubt it's in postgres,
Everything indicates not.

but I
don't think it's helpful to blame the client, just because the client reported
the problem.  If the server were to disconnect abruptly, I'd expect the client
to report that, too.

Laurent would just have to start the conversation over (and probably collect
the same diagnostic information anyway).  The client projects could blame
postgres with as much rationale as there is for us to blame the client.

Please don't add confusion here.
I just suggested, this is not an order.

regards,
Ranier Vilela


Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Ranier Vilela
Em qui., 3 de mar. de 2022 às 13:46, Justin Pryzby 
escreveu:

> On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> > Sorry, but this is much more on the client side.
>
> The client is reporting the problem, as is the server.
>
 Are you read the server log?
" 2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client:
An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection
with an open transaction"


> > Following the logs, it is understood that the client is dropping the
> > connection.
>
> The logs show that the client's connection *was* dropped.
> And on the server, the same.
>
No, the log server shows that the client dropped the connection.


>
> > So most likely the error could be from Pentaho or JDBC.
> >
> >
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> > " This *SocketException* occurs on the server-side when the client closed
> > the socket connection before the response could be returned over the
> > socket."
> >
> > I suggest moving this thread to the Pentaho or JDBC support.
>
> We don't know the source of the problem.

Yeah, but it is much more likely to be on the client.


>   I still doubt it's in postgres,

Everything indicates not.

but I
> don't think it's helpful to blame the client, just because the client
> reported
> the problem.  If the server were to disconnect abruptly, I'd expect the
> client
> to report that, too.
>
> Laurent would just have to start the conversation over (and probably
> collect
> the same diagnostic information anyway).  The client projects could blame
> postgres with as much rationale as there is for us to blame the client.
>
> Please don't add confusion here.

I just suggested, this is not an order.

regards,
Ranier Vilela


Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Justin Pryzby
On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> Sorry, but this is much more on the client side.

The client is reporting the problem, as is the server.

> Following the logs, it is understood that the client is dropping the
> connection.

The logs show that the client's connection *was* dropped.
And on the server, the same.

> So most likely the error could be from Pentaho or JDBC.
> 
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> " This *SocketException* occurs on the server-side when the client closed
> the socket connection before the response could be returned over the
> socket."
> 
> I suggest moving this thread to the Pentaho or JDBC support.

We don't know the source of the problem.  I still doubt it's in postgres, but I
don't think it's helpful to blame the client, just because the client reported
the problem.  If the server were to disconnect abruptly, I'd expect the client
to report that, too.

Laurent would just have to start the conversation over (and probably collect
the same diagnostic information anyway).  The client projects could blame
postgres with as much rationale as there is for us to blame the client.

Please don't add confusion here.  I made suggestions for how to collect more
information to better understand the source of the problem, and there's
probably not much else to say without that.

-- 
Justin




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Ranier Vilela
Em qui., 3 de mar. de 2022 às 11:55, l...@laurent-hasson.com <
l...@laurent-hasson.com> escreveu:

>
>
>>  -Original Message-
>>  From: Justin Pryzby 
>>  Sent: Tuesday, March 1, 2022 14:27
>>  To: l...@laurent-hasson.com
>>  Cc: pgsql-performa...@postgresql.org
>    >  Subject: Re: An I/O error occurred while sending to the backend (PG
> 13.4)
>>
>>  On Tue, Mar 01, 2022 at 04:28:31PM +, l...@laurent-hasson.com
>>  wrote:
>>  > Now, there is an additional component I think... Storage is on an
> array
>>  and I am not getting a clear answer as to where it is  Is it
> possible that
>>  something is happening at the storage layer? Could that be reported
> as a
>>  network issue vs a storage issue for Postgres?
>>
>>  No.  If there were an error with storage, it'd be reported as a
> local error,
>>  and the query would fail, rather than failing with client-server
>>  communication.
>>
>>  > Also, both machines are actually VMs. I forgot to mention that and
> not
>>  sure if that's relevant.
>>
>>  Are they running on the same hypervisor ?  Is that hyperv ?
>>  Lacking other good hypotheses, that does seem relevant.
>>
>>  --
>>  Justin
>
> Issue happened again last night. I did implement your recommendations but
> it didn't seem to prevent the issue:
>
> tcp_keepalives_idle=9   # TCP_KEEPIDLE, in seconds;
> # 0 selects the system default
> tcp_keepalives_interval=9   # TCP_KEEPINTVL, in seconds;
> # 0 selects the system default
> tcp_keepalives_count=0  # TCP_KEEPCNT;
> # 0 selects the system default
> #tcp_user_timeout = 0   # TCP_USER_TIMEOUT, in milliseconds;
> # 0 selects the system default
>
> On the client application, the exceptions are:
>
> 2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from
> 2017-05-16 17.18.02 by buildguy) : Unexpected error
> 2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from
> 2017-05-16 17.18.02 by buildguy) :
> org.pentaho.di.core.exception.KettleStepException:
> 2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop
> because of:
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
> 2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to
> the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> java.lang.Thread.run(Thread.java:745)
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by:
> org.pentaho.di.core.exception.KettleDatabaseException:
> 2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
> 2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to
> the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
> 2022/03/03 01:04:56 - Upsert2.0 -... 2 more
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by:
> org.postgresql.util.PSQLException: An I/O error occurred while sending to
> the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatem

RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread l...@laurent-hasson.com


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Tuesday, March 1, 2022 14:27
   >  To: l...@laurent-hasson.com
   >  Cc: pgsql-performa...@postgresql.org
   >  Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  On Tue, Mar 01, 2022 at 04:28:31PM +, l...@laurent-hasson.com
   >  wrote:
   >  > Now, there is an additional component I think... Storage is on an array
   >  and I am not getting a clear answer as to where it is  Is it possible 
that
   >  something is happening at the storage layer? Could that be reported as a
   >  network issue vs a storage issue for Postgres?
   >  
   >  No.  If there were an error with storage, it'd be reported as a local 
error,
   >  and the query would fail, rather than failing with client-server
   >  communication.
   >  
   >  > Also, both machines are actually VMs. I forgot to mention that and not
   >  sure if that's relevant.
   >  
   >  Are they running on the same hypervisor ?  Is that hyperv ?
   >  Lacking other good hypotheses, that does seem relevant.
   >  
   >  --
   >  Justin

Issue happened again last night. I did implement your recommendations but it 
didn't seem to prevent the issue:

tcp_keepalives_idle=9   # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
tcp_keepalives_interval=9   # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
tcp_keepalives_count=0  # TCP_KEEPCNT;
# 0 selects the system default
#tcp_user_timeout = 0   # TCP_USER_TIMEOUT, in milliseconds;
# 0 selects the system default

On the client application, the exceptions are:

2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 
2017-05-16 17.18.02 by buildguy) : Unexpected error
2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 
2017-05-16 17.18.02 by buildguy) : 
org.pentaho.di.core.exception.KettleStepException: 
2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop 
because of:
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the 
backend.
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2022/03/03 01:04:56 - Upsert2.0 -at java.lang.Thread.run(Thread.java:745)
2022/03/03 01:04:56 - Upsert2.0 - Caused by: 
org.pentaho.di.core.exception.KettleDatabaseException: 
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the 
backend.
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
2022/03/03 01:04:56 - Upsert2.0 -... 2 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.postgresql.util.PSQLException: 
An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
2022/03/03 01:04:56 - Upsert2.0 -... 6 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException: 
Connection reset
2022/03/03 01:04:56 - Upsert2.0 -at 
java.net.SocketInputStream.read(SocketInputStream.java:209)
2022/03/03 01:04:56 - Upsert2.0 -at 
java.net.SocketInputStream.read(SocketInputStream.java:141)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.core.VisibleBufferedInput

Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-01 Thread Ranier Vilela
Em seg., 28 de fev. de 2022 às 13:50, l...@laurent-hasson.com <
l...@laurent-hasson.com> escreveu:

>
>
> >From: Ranier Vilela 
> >Sent: Thursday, February 24, 2022 08:46
> >To: Justin Pryzby 
> >Cc: l...@laurent-hasson.com; pgsql-performa...@postgresql.org
> >Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
> >
> >Em qui., 24 de fev. de 2022 às 09:59, Justin Pryzby  pry...@telsasoft.com> escreveu:
> >On Thu, Feb 24, 2022 at 08:50:45AM -0300, Ranier Vilela wrote:
> >> I can't understand why you are still using 13.4?
> >> [1] There is a long discussion about the issue with 13.4, the project
> was
> >> made to fix a DLL bottleneck.
> >>
> >> Why you not use 13.6?
> >
> >That other problem (and its fix) were in the windows build environment,
> and not
> >an issue in some postgres version.
> >Yeah, correct.
> >But I think that it was very clear in the other thread that version 13.4,
> >on Windows, may have a slowdown, because of the DLL problem.
> >So it would be better to use the latest available version
> >that has this specific fix and many others.
> >
> >regards,
> >Ranier Vilela
>
>
> OK, absolutely. I was thinking about even moving to 14. I know migrations
> within a release are painless, but my experience with upgrading across
> releases has also been quite good (short of bugs that were found of
> course). Any opinion on 14.2?
>
Of course, 14.2 would be better than 13.6, but I think that there are
chances that this specific problem is not beneficial.
And both 13.6 and 14.2 still suffer from a Windows version specific issue
[1].
A solution has been proposed which has not yet been accepted.

But in general terms, you will benefit from adopting 14.2 for sure.

regards,
Ranie Vilela

[1]
https://www.postgresql.org/message-id/CAEudQAovOEM0haC4NbWZaYGW4ESmAE1j6_yr93tS8Xo8i7%2B54A%40mail.gmail.com


Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-01 Thread Justin Pryzby
On Tue, Mar 01, 2022 at 04:28:31PM +, l...@laurent-hasson.com wrote:
> Now, there is an additional component I think... Storage is on an array and I 
> am not getting a clear answer as to where it is  Is it possible that 
> something is happening at the storage layer? Could that be reported as a 
> network issue vs a storage issue for Postgres?

No.  If there were an error with storage, it'd be reported as a local error,
and the query would fail, rather than failing with client-server communication.

> Also, both machines are actually VMs. I forgot to mention that and not sure 
> if that's relevant.

Are they running on the same hypervisor ?  Is that hyperv ?
Lacking other good hypotheses, that does seem relevant.

-- 
Justin




RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-01 Thread l...@laurent-hasson.com


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Monday, February 28, 2022 17:05
   >  To: l...@laurent-hasson.com
   >  Cc: pgsql-performa...@postgresql.org
   >  Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  On Mon, Feb 28, 2022 at 09:43:09PM +, l...@laurent-hasson.com
   >  wrote:
   >  >On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
   >  >>  > And the aforementioned network trace.  You could set a capture
   >  filter on TCP
   >  >>  > SYN|RST so it's not absurdly large.  From my notes, it might 
look like
   >  this:
   >  >>  > (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)
   >  >>
   >  >>  I'd also add '|| icmp'.  My hunch is that you'll see some ICMP 
(not
   >  "ping")
   >  >>  being sent by an intermediate gateway, resulting in the connection
   >  being
   >  >>  reset.
   >  >
   >  > I am so sorry but I do not understand what you are asking me to do. I am
   >  unfamiliar with these commands. Is this a postgres configuration file? Is 
this
   >  something I just do once or something I leave on to hopefully catch it 
when
   >  the issue occurs? Is this something to do on the DB machine or the ETL
   >  machine? FYI:
   >  
   >  It's no problem.
   >  
   >  I suggest that you run wireshark with a capture filter to try to show 
*why*
   >  the connections are failing.  I think the capture filter might look like:
   >  
   >  (icmp || (tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin)!=0)) && host
   >  10.64.17.211
   >  
   >  With the "host" filtering for the IP address of the *remote* machine.
   >  
   >  You could run that on whichever machine is more convenient and leave it
   >  running for however long it takes for that error to happen.  You'll be 
able to
   >  save a .pcap file for inspection.  I suppose it'll show either a TCP RST 
or an
   >  ICMP.
   >  Whichever side sent that is where the problem is.  I still suspect the 
issue
   >  isn't in postgres.
   >  
   >  >   - My ETL machine is on 10.64.17.211
   >  >   - My DB machine is on 10.64.17.210
   >  >   - Both on Windows Server 2012 R2, x64
   >  
   >  These network details make my theory unlikely.
   >  
   >  They're on the same subnet with no intermediate gateways, and
   >  communicate directly via a hub/switch/crossover cable.  If that's true, 
then
   >  both will have each other's hardware address in ARP after pinging from one
   >  to the other.
   >  
   >  --
   >  Justin

Yes, the machines ARE on the same subnet. They actually even are on the same 
physical rack as per what I have been told. When I run a tracert, I get this:

Tracing route to PRODDB.xxx.int [10.64.17.210] over a maximum of 30 hops:
  1 1 ms<1 ms<1 ms  PRODDB.xxx.int [10.64.17.210]
Trace complete.

Now, there is an additional component I think... Storage is on an array and I 
am not getting a clear answer as to where it is  Is it possible that something 
is happening at the storage layer? Could that be reported as a network issue vs 
a storage issue for Postgres?

Also, both machines are actually VMs. I forgot to mention that and not sure if 
that's relevant.

Thank you,
Laurent.




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-28 Thread Justin Pryzby
On Mon, Feb 28, 2022 at 09:43:09PM +, l...@laurent-hasson.com wrote:
>On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
>>  > And the aforementioned network trace.  You could set a capture filter 
> on TCP
>>  > SYN|RST so it's not absurdly large.  From my notes, it might look 
> like this:
>>  > (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)
>>  
>>  I'd also add '|| icmp'.  My hunch is that you'll see some ICMP (not 
> "ping")
>>  being sent by an intermediate gateway, resulting in the connection being
>>  reset.
> 
> I am so sorry but I do not understand what you are asking me to do. I am 
> unfamiliar with these commands. Is this a postgres configuration file? Is 
> this something I just do once or something I leave on to hopefully catch it 
> when the issue occurs? Is this something to do on the DB machine or the ETL 
> machine? FYI:

It's no problem.

I suggest that you run wireshark with a capture filter to try to show *why* the
connections are failing.  I think the capture filter might look like:

(icmp || (tcp[tcpflags] & (tcp-rst|tcp-syn|tcp-fin)!=0)) && host 10.64.17.211

With the "host" filtering for the IP address of the *remote* machine.

You could run that on whichever machine is more convenient and leave it running
for however long it takes for that error to happen.  You'll be able to save a
.pcap file for inspection.  I suppose it'll show either a TCP RST or an ICMP.
Whichever side sent that is where the problem is.  I still suspect the issue
isn't in postgres.

>   - My ETL machine is on 10.64.17.211
>   - My DB machine is on 10.64.17.210
>   - Both on Windows Server 2012 R2, x64

These network details make my theory unlikely.

They're on the same subnet with no intermediate gateways, and communicate
directly via a hub/switch/crossover cable.  If that's true, then both will have
each other's hardware address in ARP after pinging from one to the other.

-- 
Justin




RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-28 Thread l...@laurent-hasson.com


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Friday, February 25, 2022 08:02
   >  To: l...@laurent-hasson.com
   >  Cc: pgsql-performa...@postgresql.org
   >  Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
   >  > And the aforementioned network trace.  You could set a capture filter
   >  > on TCP
   >  > SYN|RST so it's not absurdly large.  From my notes, it might look like 
this:
   >  > (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)
   >  
   >  I'd also add '|| icmp'.  My hunch is that you'll see some ICMP (not 
"ping")
   >  being sent by an intermediate gateway, resulting in the connection being
   >  reset.
   >  
   >  --
   >  Justin


Hello Justin,

I am so sorry but I do not understand what you are asking me to do. I am 
unfamiliar with these commands. Is this a postgres configuration file? Is this 
something I just do once or something I leave on to hopefully catch it when the 
issue occurs? Is this something to do on the DB machine or the ETL machine? FYI:

  - My ETL machine is on 10.64.17.211
  - My DB machine is on 10.64.17.210
  - Both on Windows Server 2012 R2, x64

So sorry for the bother,
Laurent.






RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-28 Thread l...@laurent-hasson.com


>From: Ranier Vilela  
>Sent: Thursday, February 24, 2022 08:46
>To: Justin Pryzby 
>Cc: l...@laurent-hasson.com; pgsql-performa...@postgresql.org
>Subject: Re: An I/O error occurred while sending to the backend (PG 13.4) 
>
>Em qui., 24 de fev. de 2022 às 09:59, Justin Pryzby 
><mailto:pry...@telsasoft.com> escreveu:
>On Thu, Feb 24, 2022 at 08:50:45AM -0300, Ranier Vilela wrote:
>> I can't understand why you are still using 13.4?
>> [1] There is a long discussion about the issue with 13.4, the project was
>> made to fix a DLL bottleneck.
>> 
>> Why you not use 13.6?
>
>That other problem (and its fix) were in the windows build environment, and not
>an issue in some postgres version.
>Yeah, correct.
>But I think that it was very clear in the other thread that version 13.4, 
>on Windows, may have a slowdown, because of the DLL problem.
>So it would be better to use the latest available version 
>that has this specific fix and many others.
>
>regards,
>Ranier Vilela


OK, absolutely. I was thinking about even moving to 14. I know migrations 
within a release are painless, but my experience with upgrading across releases 
has also been quite good (short of bugs that were found of course). Any opinion 
on 14.2?

Thank you, Laurent.




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-25 Thread Justin Pryzby
On Wed, Feb 23, 2022 at 07:04:15PM -0600, Justin Pryzby wrote:
> And the aforementioned network trace.  You could set a capture filter on TCP
> SYN|RST so it's not absurdly large.  From my notes, it might look like this:
> (tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)

I'd also add '|| icmp'.  My hunch is that you'll see some ICMP (not "ping")
being sent by an intermediate gateway, resulting in the connection being reset.

-- 
Justin




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-24 Thread Ranier Vilela
Em qui., 24 de fev. de 2022 às 09:59, Justin Pryzby 
escreveu:

> On Thu, Feb 24, 2022 at 08:50:45AM -0300, Ranier Vilela wrote:
> > I can't understand why you are still using 13.4?
> > [1] There is a long discussion about the issue with 13.4, the project was
> > made to fix a DLL bottleneck.
> >
> > Why you not use 13.6?
>
> That other problem (and its fix) were in the windows build environment,
> and not
> an issue in some postgres version.

Yeah, correct.
But I think that it was very clear in the other thread that version 13.4,
on Windows, may have a slowdown, because of the DLL problem.
So it would be better to use the latest available version
that has this specific fix and many others.

regards,
Ranier Vilela


Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-24 Thread Justin Pryzby
On Thu, Feb 24, 2022 at 08:50:45AM -0300, Ranier Vilela wrote:
> I can't understand why you are still using 13.4?
> [1] There is a long discussion about the issue with 13.4, the project was
> made to fix a DLL bottleneck.
> 
> Why you not use 13.6?

That other problem (and its fix) were in the windows build environment, and not
an issue in some postgres version.  It's still a good idea to schedule an
update.

-- 
Justin




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-24 Thread Ranier Vilela
Em qua., 23 de fev. de 2022 às 21:47, l...@laurent-hasson.com <
l...@laurent-hasson.com> escreveu:

>
>
>>  -Original Message-
>>  From: l...@laurent-hasson.com 
>>  Sent: Saturday, December 4, 2021 14:18
>>  To: Justin Pryzby 
>>  Cc: pgsql-performa...@postgresql.org
>    >  Subject: RE: An I/O error occurred while sending to the backend (PG
> 13.4)
>>
>>
>> >  -Original Message-
>> >  From: Justin Pryzby 
>> >  Sent: Saturday, December 4, 2021 12:59
>> >  To: l...@laurent-hasson.com
>    >     >  Cc: pgsql-performa...@postgresql.org
>> >  Subject: Re: An I/O error occurred while sending to the
> backend (PG
>> >  13.4)
>> >
>> >  On Sat, Dec 04, 2021 at 05:32:10PM +,
> l...@laurent-hasson.com
>> >  wrote:
>> >  > I have a data warehouse with a fairly complex ETL process
> that has
>> >  been running for years now across PG 9.6, 11.2 and now 13.4
> for the
>> >  past couple of months. I have been getting the error "An I/O
> error
>> >  occurred while sending to the backend" quite often under load
> in 13.4
>> >  which I never used to get on 11.2. I have applied some tricks,
>>  particularly
>> >  with the socketTimeout JDBC configuration.
>> >  >
>> >  > So my first question is whether anyone has any idea why this
> is
>> >  happening? My hardware and general PG configuration have not
>> >  changed between 11.2 and 13.4 and I NEVER experienced this on
> 11.2
>>  in
>> >  about 2y of production.
>> >  >
>> >  > Second, I have one stored procedure that takes a very long
> time to
>>  run
>> >  (40mn more or less), so obviously, I'd need to set
> socketTimeout to
>> >  something like 1h in order to call it and not timeout. That
> doesn't seem
>> >  reasonable?
>> >
>> >  Is the DB server local or remote (TCP/IP) to the client?
>> >
>> >  Could you collect the corresponding postgres query logs when
> this
>> >  happens ?
>> >
>> >  It'd be nice to see a network trace for this too.  Using
> tcpdump or
>> >  wireshark.
>> >  Preferably from the client side.
>> >
>> >  FWIW, I suspect the JDBC socketTimeout is a bad workaround.
>> >
>> >  --
>> >  Justin
>>
>>  It's a remote server, but all on a local network. Network
> performance is I
>>  am sure not the issue. Also, the system is on Windows Server. What
> are you
>>  expecting to see out of a tcpdump? I'll try to get PG logs on the
> failing query.
>>
>>  Thank you,
>>  Laurent.
>>
>>
>>
>>
>
> Hello Justin,
>
> It has been ages! The issue has been happening a bit more often recently,
> as much as once every 10 days or so. As a reminder, the set up is Postgres
> 13.4 on Windows Server with 16cores and 64GB memory.

I can't understand why you are still using 13.4?
[1] There is a long discussion about the issue with 13.4, the project was
made to fix a DLL bottleneck.

Why you not use 13.6?

regards,
Ranier Vilela

[1]
https://www.postgresql.org/message-id/MN2PR15MB2560BBB3EC911D973C2FE3F885A89%40MN2PR15MB2560.namprd15.prod.outlook.com


Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-23 Thread Justin Pryzby
You originally mailed about an error on the client, and now you found
corresponding server logs, which suggests a veritable network issue.

Are the postgres clients and server on the same subnet ?  If not, what are the
intermediate routers ?  Is there any NAT happening ?  Do those devices have any
interesting logs that correspond with the server/client connection failures ?

Have you tried enabling TCP keepalives ?  This might help to convince a NAT
device not to forget about your connection.

https://www.postgresql.org/docs/current/runtime-config-connection.html
tcp_keepalives_idle=9
tcp_keepalives_interval=9
tcp_keepalives_count=0
tcp_user_timeout=0 -- You apparently have this set, but it cannot work on 
windows, so just generates noise.

On linux, you can check the keepalive counters in "netstat -not" to be sure
that it's enabled.  A similar switch hopefully exists for windows.

-- 
Justin




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-23 Thread Justin Pryzby
On Thu, Feb 24, 2022 at 12:47:42AM +, l...@laurent-hasson.com wrote:
> On Sat, Dec 04, 2021 at 05:32:10PM +, l...@laurent-hasson.com wrote:
> >  I have a data warehouse with a fairly complex ETL process that has
> >  been running for years now across PG 9.6, 11.2 and now 13.4 for the
> >  past couple of months. I have been getting the error "An I/O error
> >  occurred while sending to the backend" quite often under load in 13.4
> >  which I never used to get on 11.2. I have applied some tricks, 
> particularly
> >  with the socketTimeout JDBC configuration.

>  It'd be nice to see a network trace for this too.  Using tcpdump or
>  wireshark.  Preferably from the client side.
> 
> Hello Justin,
> 
> It has been ages! The issue has been happening a bit more often recently, as 
> much as once every 10 days or so. As a reminder, the set up is Postgres 13.4 
> on Windows Server with 16cores and 64GB memory. The scenario where this 
> occurs is an ETL tool called Pentaho Kettle (V7) connecting to the DB for 
> DataWarehouse workloads. The tool is Java-based and connects via JDBC using 
> postgresql-42.2.5.jar. There are no particular settings besides the 
> socketTimeout setting mentioned above.
> 
> The workload has some steps being lots of quick transactions for dimension 
> tables for example, but some fact table calculations, especially large 
> pivots, can make queries run for 40mn up to over an hour (a few of those).
> 
> I caught these in the logs at the time of a failure but unsure what to make 
> of that:
> 
> 2022-02-21 02:10:43.605 EST [1368] LOG:  unexpected EOF on client connection 
> with an open transaction
> 2022-02-21 02:10:43.605 EST [3304] LOG:  could not receive data from client: 
> An existing connection was forcibly closed by the remote host.
>   
> 2022-02-21 02:10:43.605 EST [3304] LOG:  unexpected EOF on client connection 
> with an open transaction
> 2022-02-21 02:31:38.808 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
> supported

I suggest to enable CSV logging, which has many more columns of data.
Some of them might provide an insight - I'm not sure.
log_destination=csvlog (in addition to whatever else you have set).

And the aforementioned network trace.  You could set a capture filter on TCP
SYN|RST so it's not absurdly large.  From my notes, it might look like this:
(tcp[tcpflags]&(tcp-rst|tcp-syn|tcp-fin)!=0)

-- 
Justin




RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-02-23 Thread l...@laurent-hasson.com



   >  -Original Message-
   >  From: l...@laurent-hasson.com 
   >  Sent: Saturday, December 4, 2021 14:18
   >  To: Justin Pryzby 
   >  Cc: pgsql-performa...@postgresql.org
   >  Subject: RE: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  
   > >  -Original Message-
   > >  From: Justin Pryzby 
   > >  Sent: Saturday, December 4, 2021 12:59
   > >  To: l...@laurent-hasson.com
   > >  Cc: pgsql-performa...@postgresql.org
   > >  Subject: Re: An I/O error occurred while sending to the backend (PG
   > >  13.4)
   > >
   > >  On Sat, Dec 04, 2021 at 05:32:10PM +, l...@laurent-hasson.com
   > >  wrote:
   > >  > I have a data warehouse with a fairly complex ETL process that has
   > >  been running for years now across PG 9.6, 11.2 and now 13.4 for the
   > >  past couple of months. I have been getting the error "An I/O error
   > >  occurred while sending to the backend" quite often under load in 
13.4
   > >  which I never used to get on 11.2. I have applied some tricks,
   >  particularly
   > >  with the socketTimeout JDBC configuration.
   > >  >
   > >  > So my first question is whether anyone has any idea why this is
   > >  happening? My hardware and general PG configuration have not
   > >  changed between 11.2 and 13.4 and I NEVER experienced this on 11.2
   >  in
   > >  about 2y of production.
   > >  >
   > >  > Second, I have one stored procedure that takes a very long time to
   >  run
   > >  (40mn more or less), so obviously, I'd need to set socketTimeout to
   > >  something like 1h in order to call it and not timeout. That doesn't 
seem
   > >  reasonable?
   > >
   > >  Is the DB server local or remote (TCP/IP) to the client?
   > >
   > >  Could you collect the corresponding postgres query logs when this
   > >  happens ?
   > >
   > >  It'd be nice to see a network trace for this too.  Using tcpdump or
   > >  wireshark.
   > >  Preferably from the client side.
   > >
   > >  FWIW, I suspect the JDBC socketTimeout is a bad workaround.
   > >
   > >  --
   > >  Justin
   >  
   >  It's a remote server, but all on a local network. Network performance is I
   >  am sure not the issue. Also, the system is on Windows Server. What are you
   >  expecting to see out of a tcpdump? I'll try to get PG logs on the failing 
query.
   >  
   >  Thank you,
   >  Laurent.
   >  
   >  
   >  
   >  

Hello Justin,

It has been ages! The issue has been happening a bit more often recently, as 
much as once every 10 days or so. As a reminder, the set up is Postgres 13.4 on 
Windows Server with 16cores and 64GB memory. The scenario where this occurs is 
an ETL tool called Pentaho Kettle (V7) connecting to the DB for DataWarehouse 
workloads. The tool is Java-based and connects via JDBC using 
postgresql-42.2.5.jar. There are no particular settings besides the 
socketTimeout setting mentioned above.

The workload has some steps being lots of quick transactions for dimension 
tables for example, but some fact table calculations, especially large pivots, 
can make queries run for 40mn up to over an hour (a few of those).

I caught these in the logs at the time of a failure but unsure what to make of 
that:


2022-02-21 02:08:16.214 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:29.347 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.371 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.463 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.596 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.687 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.786 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.873 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:30.976 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:31.050 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:31.131 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:31.240 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:31.906 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:31.988 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:33.068 EST [1704] LOG:  setsockopt(TCP_USER_TIMEOUT) not 
supported
2022-02-21 02:08:

Re: An I/O error occurred while sending to the backend (PG 13.4)

2021-12-04 Thread Justin Pryzby
On Sat, Dec 04, 2021 at 07:18:06PM +, l...@laurent-hasson.com wrote:
> It's a remote server, but all on a local network. Network performance is I am 
> sure not the issue. Also, the system is on Windows Server. What are you 
> expecting to see out of a tcpdump? I'll try to get PG logs on the failing 
> query.

I'd want to know if postgres sent anything to the client, like TCP RST, or if
the client decided on its own that there had been an error.

-- 
Justin




RE: An I/O error occurred while sending to the backend (PG 13.4)

2021-12-04 Thread l...@laurent-hasson.com


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Saturday, December 4, 2021 12:59
   >  To: l...@laurent-hasson.com
   >  Cc: pgsql-performa...@postgresql.org
   >  Subject: Re: An I/O error occurred while sending to the backend (PG
   >  13.4)
   >  
   >  On Sat, Dec 04, 2021 at 05:32:10PM +, l...@laurent-hasson.com
   >  wrote:
   >  > I have a data warehouse with a fairly complex ETL process that has
   >  been running for years now across PG 9.6, 11.2 and now 13.4 for the
   >  past couple of months. I have been getting the error "An I/O error
   >  occurred while sending to the backend" quite often under load in 13.4
   >  which I never used to get on 11.2. I have applied some tricks, 
particularly
   >  with the socketTimeout JDBC configuration.
   >  >
   >  > So my first question is whether anyone has any idea why this is
   >  happening? My hardware and general PG configuration have not
   >  changed between 11.2 and 13.4 and I NEVER experienced this on 11.2 in
   >  about 2y of production.
   >  >
   >  > Second, I have one stored procedure that takes a very long time to run
   >  (40mn more or less), so obviously, I'd need to set socketTimeout to
   >  something like 1h in order to call it and not timeout. That doesn't seem
   >  reasonable?
   >  
   >  Is the DB server local or remote (TCP/IP) to the client?
   >  
   >  Could you collect the corresponding postgres query logs when this
   >  happens ?
   >  
   >  It'd be nice to see a network trace for this too.  Using tcpdump or
   >  wireshark.
   >  Preferably from the client side.
   >  
   >  FWIW, I suspect the JDBC socketTimeout is a bad workaround.
   >  
   >  --
   >  Justin

It's a remote server, but all on a local network. Network performance is I am 
sure not the issue. Also, the system is on Windows Server. What are you 
expecting to see out of a tcpdump? I'll try to get PG logs on the failing query.

Thank you,
Laurent.







Re: An I/O error occurred while sending to the backend (PG 13.4)

2021-12-04 Thread Justin Pryzby
On Sat, Dec 04, 2021 at 05:32:10PM +, l...@laurent-hasson.com wrote:
> I have a data warehouse with a fairly complex ETL process that has been 
> running for years now across PG 9.6, 11.2 and now 13.4 for the past couple of 
> months. I have been getting the error "An I/O error occurred while sending to 
> the backend" quite often under load in 13.4 which I never used to get on 
> 11.2. I have applied some tricks, particularly with the socketTimeout JDBC 
> configuration.
> 
> So my first question is whether anyone has any idea why this is happening? My 
> hardware and general PG configuration have not changed between 11.2 and 13.4 
> and I NEVER experienced this on 11.2 in about 2y of production.
> 
> Second, I have one stored procedure that takes a very long time to run (40mn 
> more or less), so obviously, I'd need to set socketTimeout to something like 
> 1h in order to call it and not timeout. That doesn't seem reasonable?

Is the DB server local or remote (TCP/IP) to the client?

Could you collect the corresponding postgres query logs when this happens ?

It'd be nice to see a network trace for this too.  Using tcpdump or wireshark.
Preferably from the client side.

FWIW, I suspect the JDBC socketTimeout is a bad workaround.

-- 
Justin