Re: Added to Jira

2021-11-16 Thread Dan Smith
Done! You should have access now.

-Dan

From: Kristen Oduca 
Sent: Tuesday, November 16, 2021 4:03 PM
To: dev@geode.apache.org 
Subject: Added to Jira

Hi,

Can I be added to the Geode Jira so I can assign myself tasks? My username is 
kris10.

Thanks,
Kristen Oduca


Added to Jira

2021-11-16 Thread Kristen Oduca
Hi,

Can I be added to the Geode Jira so I can assign myself tasks? My username is 
kris10.

Thanks,
Kristen Oduca


Re: Open socket handles build up over time (leaking?)

2021-11-16 Thread Leon Finker
Hi Darrel,

Thank you! I'll try to track it!

On Tue, Nov 16, 2021 at 2:42 PM Darrel Schneider  wrote:
>
> This link: 
> https://help.mulesoft.com/s/article/How-to-identify-leaked-file-descriptors-that-are-shown-only-as-cant-identify-protocol-in-lsof
>  points out that once the fd gets into this "can't identify protocol" state 
> you can no longer figure out things like what port(s) and addresses are 
> associated with the fd. They suggest running lsof periodically to catch that 
> fd (in your example output the first fd is 133u) when it was still healthy. 
> This would help track it down to an area of the geode code. For example, you 
> could see that it was one of the sockets using the port the cache server is 
> listening on.
>
> 
> From: Leon Finker 
> Sent: Tuesday, November 16, 2021 9:28 AM
> To: dev@geode.apache.org 
> Subject: Open socket handles build up over time (leaking?)
>
> Hi,
>
> We observe in our geode (1.14; the same was seen before in 1.13) cache
> server (which supports durable client sessions) an increase in half-open
> sockets. It seems there is a socket leak. Could someone
> recommend how to track the leak down? It's not obvious where it's
> leaking... I can only suspect AcceptorImpl.run, where it only
> handles IOException, but I wasn't able to reproduce it in the debugger
> yet...
>
> lsof -p 344|grep "can't"
>
> java    344  user  133u  sock  0,6  0t0  115956017  can't identify protocol
> java    344  user  142u  sock  0,6  0t0  113361870  can't identify protocol
> java    344  user  143u  sock  0,6  0t0  111979650  can't identify protocol
> java    344  user  156u  sock  0,6  0t0  117202529  can't identify protocol
> java    344  user  178u  sock  0,6  0t0  113357568  can't identify protocol
> ...
>
> lsof -p 344|grep "can't"|wc -l
> 934
>
> Thank you


Re: Open socket handles build up over time (leaking?)

2021-11-16 Thread Leon Finker
Hi Anthony,

>- What OS for client and server?
Server: CentOS release 6.10
Client: Windows 10 and CentOS 6.*

The cache is connected to by durable and non-durable clients. Durable
clients connect from Windows 10, and non-durable clients connect from
backend CentOS servers. The server cache has no custom networking
code/libraries; it's purely a geode cache.

>- What is the scenario?  Is it “normal” operation or is the client or
>server killed?

The server is not killed and runs continuously. Clients are also
normally connected, but they may lose connections and reconnect. The
issue is definitely caused by client-side connections to the server, but
I'm not able to isolate any specific exception from the geode log.

>- Does netstat give you any additional information about the sockets?  Are any 
>in TIME_WAIT status?

Ran "netstat -altnup" and all tcp connections have only LISTEN and
ESTABLISHED states for the server process.

>- Do you have a reproducible test case?

Unfortunately, no. I'm trying to isolate what could be causing the issue
from the geode logs.

>- Do you have a tcpdump of the socket?

No. I'll try it, but I'm not sure it will help; what should I look for?
Unfortunately, I'm not sure which connection causes the problem.

>- Are you seeing the sockets clean up over time or do they persist until a 
>reboot?
They seem to persist until JVM restart. Forcing GC has no effect on
their count; it slowly creeps up.

On Tue, Nov 16, 2021 at 12:44 PM Anthony Baker  wrote:
>
> Hi, thanks for this report.  Some questions to help us help you—
>
> - What OS for client and server?
> - Are you seeing the sockets clean up over time or do they persist until a 
> reboot?
> - Does netstat give you any additional information about the sockets?  Are 
> any in TIME_WAIT status?
> - Do you have a tcpdump of the socket?
> - What is the scenario?  Is it “normal” operation or is the client or server 
> killed?
> - Do you have a reproducible test case?
>
> Thanks,
> Anthony
>
>
>
>
> > On Nov 16, 2021, at 9:28 AM, Leon Finker  wrote:
> >
> > Hi,
> >
> > We observe in our geode (1.14; the same was seen before in 1.13) cache
> > server (which supports durable client sessions) an increase in half-open
> > sockets. It seems there is a socket leak. Could someone
> > recommend how to track the leak down? It's not obvious where it's
> > leaking... I can only suspect AcceptorImpl.run, where it only
> > handles IOException, but I wasn't able to reproduce it in the debugger
> > yet...
> >
> > lsof -p 344|grep "can't"
> >
> > java    344  user  133u  sock  0,6  0t0  115956017  can't identify protocol
> > java    344  user  142u  sock  0,6  0t0  113361870  can't identify protocol
> > java    344  user  143u  sock  0,6  0t0  111979650  can't identify protocol
> > java    344  user  156u  sock  0,6  0t0  117202529  can't identify protocol
> > java    344  user  178u  sock  0,6  0t0  113357568  can't identify protocol
> > ...
> >
> > lsof -p 344|grep "can't"|wc -l
> > 934
> >
> > Thank you
>


Re: Open socket handles build up over time (leaking?)

2021-11-16 Thread Darrel Schneider
This link: 
https://help.mulesoft.com/s/article/How-to-identify-leaked-file-descriptors-that-are-shown-only-as-cant-identify-protocol-in-lsof
 points out that once the fd gets into this "can't identify protocol" state you 
can no longer figure out things like what port(s) and addresses are associated 
with the fd. They suggest running lsof periodically to catch that fd (in your 
example output the first fd is 133u) when it was still healthy. This would help 
track it down to an area of the geode code. For example, you could see that it 
was one of the sockets using the port the cache server is listening on.
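
A minimal sketch of one way to automate that periodic capture (FdSnapshotLogger
is a hypothetical helper, not part of Geode; it assumes lsof is on the PATH and
writes timestamped dumps to an fd-snapshots directory):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: snapshots lsof output for one pid so a leaked fd
// (e.g. 133u) can be traced back to a dump taken while it still showed a
// real protocol, port, and peer address.
public class FdSnapshotLogger {
    public static void main(String[] args) throws IOException {
        long pid = args.length > 0 ? Long.parseLong(args[0])
                                   : ProcessHandle.current().pid();
        Path dir = Paths.get("fd-snapshots");
        Files.createDirectories(dir);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                String stamp = LocalDateTime.now()
                        .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
                Path out = dir.resolve("lsof-" + stamp + ".txt");
                new ProcessBuilder("lsof", "-p", Long.toString(pid))
                        .redirectErrorStream(true)
                        .redirectOutput(out.toFile())
                        .start()
                        .waitFor();
            } catch (IOException | InterruptedException e) {
                e.printStackTrace();
            }
        }, 0, 60, TimeUnit.SECONDS); // one snapshot per minute
    }
}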


From: Leon Finker 
Sent: Tuesday, November 16, 2021 9:28 AM
To: dev@geode.apache.org 
Subject: Open socket handles build up over time (leaking?)

Hi,

We observe in our geode (1.14; the same was seen before in 1.13) cache
server (which supports durable client sessions) an increase in half-open
sockets. It seems there is a socket leak. Could someone
recommend how to track the leak down? It's not obvious where it's
leaking... I can only suspect AcceptorImpl.run, where it only
handles IOException, but I wasn't able to reproduce it in the debugger
yet...

lsof -p 344|grep "can't"

java    344  user  133u  sock  0,6  0t0  115956017  can't identify protocol
java    344  user  142u  sock  0,6  0t0  113361870  can't identify protocol
java    344  user  143u  sock  0,6  0t0  111979650  can't identify protocol
java    344  user  156u  sock  0,6  0t0  117202529  can't identify protocol
java    344  user  178u  sock  0,6  0t0  113357568  can't identify protocol
...

lsof -p 344|grep "can't"|wc -l
934

Thank you


Re: Failed durable client connection initialization can sometimes leak client socket handle?

2021-11-16 Thread Darrel Schneider
The run method on AcceptorImpl is run in a LoggingThread instance (see 
AcceptorImpl.start()). So any exceptions thrown by AcceptorImpl.run() will be 
logged as a fatal log message containing "Uncaught exception in thread" by the 
LoggingThread. You can see the code that does this in 
LoggingUncaughtExceptionHandler.
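
The general shape of that pattern, as a rough illustration only (this is not
Geode's actual LoggingThread or LoggingUncaughtExceptionHandler source):

import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch of a thread whose uncaught exceptions are logged instead of
// silently killing the thread.
public class LoggingThreadSketch {
    private static final Logger logger = Logger.getLogger("server");

    public static Thread newLoggingThread(String name, Runnable target) {
        Thread t = new Thread(target, name);
        t.setUncaughtExceptionHandler((thread, throwable) ->
                logger.log(Level.SEVERE,
                        "Uncaught exception in thread " + thread.getName(), throwable));
        return t;
    }
}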

Also, the AcceptorImpl.run() method I see has a finally block in which it closes 
"serverSocket" if it is not null. It is on this close call that it catches and 
ignores IOException.
I think you may be talking about AcceptorImpl.ClientQueueInitializerTask.run(). 
That run method is called from an executor which is created in 
initializeClientQueueInitializerThreadPool. It uses CoreLoggingExecutors, so 
once again any unhandled exception should be logged as a fatal log message in 
the server log.

From: Leon Finker 
Sent: Wednesday, November 10, 2021 10:01 AM
To: dev@geode.apache.org 
Subject: Failed durable client connection initialization can sometimes leak 
client socket handle?

Hi,

In AcceptorImpl.run, the accepted client socket seems to only be
closed when there is an IOException. I can't prove it, but I think a
non-IOException can sometimes be thrown here as well, and then the client
socket will not be closed. Also, can we please add a catch for other
kinds of exceptions and at least log them as errors?
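
A minimal sketch of the kind of handling I mean (this is not the actual
AcceptorImpl code; handleNewClient and closeQuietly are made-up placeholders
for the real per-connection initialization and cleanup):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.logging.Level;
import java.util.logging.Logger;

public class AcceptLoopSketch {
    private static final Logger logger = Logger.getLogger("acceptor");

    // Whatever goes wrong after accept(), the client socket is closed so the
    // fd cannot leak, and unexpected exceptions are logged rather than swallowed.
    static void acceptLoop(ServerSocket serverSocket) throws IOException {
        while (!serverSocket.isClosed()) {
            Socket client = serverSocket.accept();
            try {
                handleNewClient(client);
            } catch (IOException e) {
                logger.log(Level.WARNING, "I/O error while initializing client", e);
                closeQuietly(client);
            } catch (RuntimeException e) {
                // the suspected case: a non-IOException would otherwise skip the close
                logger.log(Level.SEVERE, "Unexpected error while initializing client", e);
                closeQuietly(client);
            }
        }
    }

    private static void handleNewClient(Socket client) throws IOException {
        // placeholder for the real handshake/registration logic
    }

    private static void closeQuietly(Socket s) {
        try {
            s.close();
        } catch (IOException ignored) {
        }
    }
}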

The symptoms we have are like this:
1. Durable client has a connection problem during initialization.
2. Durable client ends up with an orphaned durable HA region (the one
prefixed with _gfe_durable_client_with_id_).
3. Now the client automatically reconnects and the geode server fails
to properly initialize the client, most likely because the region
already has an error. If we inspect the regions at runtime, we indeed
see a durable region for the client without a CacheClientProxy
properly created and added to the proxies collection.
4. We observe a pretty rapid (over a few days) memory leak and socket handle leak.
5. This leak stops as soon as we destroy that internal durable region
(partially through reflection) for the client, and the client can then
properly reconnect and initialize its region and proxy.

Does this ring any bells for anyone?

Thank you


Re: Failed durable client connection initialization can sometimes leak client socket handle?

2021-11-16 Thread Anthony Baker
I just responded to your latest question, but if you have logs or a test case 
you could share, that would be really helpful.

Thanks,
Anthony


> On Nov 10, 2021, at 10:01 AM, Leon Finker  wrote:
> 
> Hi,
> 
> In AcceptorImpl.run, the accepted client socket seems to only be
> closed when there is an IOException. I can't prove it, but I think a
> non-IOException can sometimes be thrown here as well, and then the client
> socket will not be closed. Also, can we please add a catch for other
> kinds of exceptions and at least log them as errors?
> 
> The symptoms we have are like this:
> 1. Durable client has a connection problem during initialization.
> 2. Durable client ends up with an orphaned durable HA region (the one
> prefixed with _gfe_durable_client_with_id_).
> 3. Now the client automatically reconnects and the geode server fails
> to properly initialize the client, most likely because the region
> already has an error. If we inspect the regions at runtime, we indeed
> see a durable region for the client without a CacheClientProxy
> properly created and added to the proxies collection.
> 4. We observe a pretty rapid (over a few days) memory leak and socket
> handle leak.
> 5. This leak stops as soon as we destroy that internal durable region
> (partially through reflection) for the client, and the client can then
> properly reconnect and initialize its region and proxy.
> 
> Does this ring any bells for anyone?
> 
> Thank you



Re: Open socket handles build up over time (leaking?)

2021-11-16 Thread Anthony Baker
Hi, thanks for this report.  Some questions to help us help you—

- What OS for client and server?
- Are you seeing the sockets clean up over time or do they persist until a 
reboot?
- Does netstat give you any additional information about the sockets?  Are any 
in TIME_WAIT status?
- Do you have a tcpdump of the socket? 
- What is the scenario?  Is it “normal” operation or is the client or server 
killed?
- Do you have a reproducible test case?

Thanks,
Anthony




> On Nov 16, 2021, at 9:28 AM, Leon Finker  wrote:
> 
> Hi,
> 
> We observe in our geode (1.14; the same was seen before in 1.13) cache
> server (which supports durable client sessions) an increase in half-open
> sockets. It seems there is a socket leak. Could someone
> recommend how to track the leak down? It's not obvious where it's
> leaking... I can only suspect AcceptorImpl.run, where it only
> handles IOException, but I wasn't able to reproduce it in the debugger
> yet...
> 
> lsof -p 344|grep "can't"
> 
> java    344  user  133u  sock  0,6  0t0  115956017  can't identify protocol
> java    344  user  142u  sock  0,6  0t0  113361870  can't identify protocol
> java    344  user  143u  sock  0,6  0t0  111979650  can't identify protocol
> java    344  user  156u  sock  0,6  0t0  117202529  can't identify protocol
> java    344  user  178u  sock  0,6  0t0  113357568  can't identify protocol
> ...
> 
> lsof -p 344|grep "can't"|wc -l
> 934
> 
> Thank you



Open socket handles build up over time (leaking?)

2021-11-16 Thread Leon Finker
Hi,

We observe in our geode (1.14; the same was seen before in 1.13) cache
server (which supports durable client sessions) an increase in half-open
sockets. It seems there is a socket leak. Could someone
recommend how to track the leak down? It's not obvious where it's
leaking... I can only suspect AcceptorImpl.run, where it only
handles IOException, but I wasn't able to reproduce it in the debugger
yet...

lsof -p 344|grep "can't"

java    344  user  133u  sock  0,6  0t0  115956017  can't identify protocol
java    344  user  142u  sock  0,6  0t0  113361870  can't identify protocol
java    344  user  143u  sock  0,6  0t0  111979650  can't identify protocol
java    344  user  156u  sock  0,6  0t0  117202529  can't identify protocol
java    344  user  178u  sock  0,6  0t0  113357568  can't identify protocol
...

lsof -p 344|grep "can't"|wc -l
934

Thank you