I ran a number of tests using the long-running-test code with servers running a mix of conserve-sockets settings, and they all worked correctly.
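(For anyone wanting to try this: conserve-sockets is a plain per-member property, so each server just sets its own value at startup. Here is a minimal sketch of one such server - the locator address and port handling are placeholders, not what the actual tests used:

import java.io.IOException;
import java.util.Properties;

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;
import org.apache.geode.cache.server.CacheServer;

public class MixedConserveSocketsServer {
  public static void main(String[] args) throws IOException {
    // conserve-sockets is a per-member property, so each server in the
    // cluster can legitimately be started with a different value.
    Properties props = new Properties();
    props.setProperty("conserve-sockets", args.length > 0 ? args[0] : "false");
    props.setProperty("locators", "localhost[10334]"); // placeholder locator

    Cache cache = new CacheFactory(props).create();
    CacheServer server = cache.addCacheServer();
    server.setPort(0); // ephemeral port; the real tests pinned specific ports
    server.start();
  }
}

)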
One set of tests had 6 servers - 3 with conserve-sockets=false and 3 with conserve-sockets=true. Another set of tests had 4 servers - 3 with conserve-sockets=false and 1 with conserve-sockets=true. In each case, the multi-threaded client did:

- puts
- gets
- destroys
- function updates
- OQL queries

One thing I found interesting was that the server where the operation originated dictated which thread was used on the remote server. If the server where the operation originated had conserve-sockets=false, then the remote server used an unshared P2P message reader to process the replication, no matter what its own conserve-sockets setting was. And if the server where the operation originated had conserve-sockets=true, then the remote server used a shared P2P message reader to process the replication, no matter what its own conserve-sockets setting was.

Here is some logging from a DistributionMessageObserver that shows that behavior (a sketch of such an observer follows Case 1).

Case 1: The server (server1) that processes the put operation from the client is primary and has conserve-sockets=false. The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=true.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 60802 Thread 4: TestDistributionMessageObserver operation=beforeSendMessage; time=1606929894787; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server-conserve-sockets1:58995)<v16>:41002]

2. An unshared P2P message reader in server2 handles the UpdateWithContextMessage even though conserve-sockets=true:

P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: DistributionMessage.schedule msg=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes))

P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606929894809; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

P2P message reader for 192.168.1.8(server1:58984)<v15>:41001 unshared ordered uid=11 dom #1 local port=58405 remote port=60860: TestDistributionMessageObserver operation=afterProcessMessage; time=1606929894810; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server1:58984)<v15>:41001; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]
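As an aside, TestDistributionMessageObserver is test-only code, not a product class. A rough sketch of such an observer, written against the internal DistributionMessageObserver hook (internal API, so the parameter types vary by version - older versions take DistributionManager instead of ClusterDistributionManager - and the real test observer also logged recipients):

import org.apache.geode.distributed.internal.ClusterDistributionManager;
import org.apache.geode.distributed.internal.DistributionMessage;
import org.apache.geode.distributed.internal.DistributionMessageObserver;

// Logs each send/process event with the current thread name, which is what
// exposes whether a shared or unshared P2P message reader handled the message.
public class TestDistributionMessageObserver extends DistributionMessageObserver {

  private void log(String operation, DistributionMessage message) {
    System.out.println(Thread.currentThread().getName()
        + ": TestDistributionMessageObserver operation=" + operation
        + "; time=" + System.currentTimeMillis()
        + "; message=" + message);
  }

  @Override
  public void beforeSendMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("beforeSendMessage", message);
  }

  @Override
  public void beforeProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("beforeProcessMessage", message);
  }

  @Override
  public void afterProcessMessage(ClusterDistributionManager dm, DistributionMessage message) {
    log("afterProcessMessage", message);
  }
}

Each server installs it with DistributionMessageObserver.setInstance(new TestDistributionMessageObserver()) before the client starts its operations.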
Case 2: The server (server1) that processes the put operation from the client is primary and has conserve-sockets=true. The server (server2) that handles the UpdateWithContextMessage has conserve-sockets=false.

1. A ServerConnection thread in server1 sends the UpdateWithContextMessage:

ServerConnection on port 61474 Thread 1: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400283; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[192.168.1.8(server1:63224)<v26>:41001]

2. The shared P2P message reader in server2 handles the UpdateWithContextMessage and sends the ReplyMessage even though conserve-sockets=false:

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400295; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=beforeSendMessage; time=1606932400296; message=ReplyMessage processorId=42 from null; recipients=[192.168.1.8(server-conserve-sockets1:63240)<v27>:41002]

P2P message reader for 192.168.1.8(server-conserve-sockets1:63240)<v27>:41002 shared ordered uid=4 local port=54619 remote port=61472: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=UpdateOperation$UpdateWithContextMessage(region path='/__PR/_B__data_48'; sender=192.168.1.8(server-conserve-sockets1:63240)<v27>:41002; op=UPDATE; key=0; newValue=(10485820 bytes)); recipients=[null]

3. The shared P2P message reader in server1 handles the ReplyMessage:

P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=beforeProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]

P2P message reader for 192.168.1.8(server1:63224)<v26>:41001 shared unordered uid=3 local port=47098 remote port=61467: TestDistributionMessageObserver operation=afterProcessMessage; time=1606932400296; message=ReplyMessage processorId=42 from 192.168.1.8(server1:63224)<v26>:41001; recipients=[null]

________________________________
From: Anthony Baker <bak...@vmware.com>
Sent: Monday, November 23, 2020 2:16 PM
To: dev@geode.apache.org <dev@geode.apache.org>
Subject: Re: [PROPOSAL] Change the default value of conserve-sockets to false

Udo, you're correct that individual servers can set the property independently. I was assuming this is more like the 'security-manager' property and others that require all cluster members to be in agreement. I'm not sure I understand the use case for allowing this setting to be per-member. That makes it pretty challenging to reason about what is happening in a cluster when doing root cause analysis.

There is even an API to change this value dynamically:
https://geode.apache.org/docs/guide/12/managing/monitor_tune/performance_controls_controlling_socket_use.html

…but I've only seen that used to make function threads/sockets follow the correct setting.

Anthony
On Nov 20, 2020, at 11:23 AM, Udo Kohlmeyer <u...@vmware.com> wrote:

@Anthony I cannot think of a single reason why the server should not start up, even in a rolling upgrade. This setting should not have an effect on the cluster (other than a potentially positive one). Also, if Geode were to enforce this setting across the cluster, then we would have seriously broken our "shared nothing" value here.
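For reference, the dynamic API Anthony mentioned above is the per-thread socket policy described on that docs page. Roughly, a thread that wants dedicated sockets regardless of the member-level conserve-sockets value does something like this (a sketch; the two method names are from the linked page):

import org.apache.geode.distributed.DistributedSystem;

public class PerThreadSocketPolicy {
  void doMessagingIntensiveWork() {
    // Give this thread its own sockets even if the member runs with
    // conserve-sockets=true...
    DistributedSystem.setThreadsSocketPolicy(false);
    try {
      // ... region operations / function execution on this thread ...
    } finally {
      // ...then return the sockets to the shared pool when finished.
      DistributedSystem.releaseThreadsSockets();
    }
  }
}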