Re: Tomcat not syncing existing sessions on restart
On 10/03/2024 16:59, Manak Bisht wrote:
> On Fri, Feb 9, 2024 at 4:45 PM Mark Thomas wrote:
>> Using 0.0.0.0 as the address for the receiver is going to cause
>> problems. I see similar issues with 11.0.x as 8.5.x. I haven't dug too
>> deeply into things as a) I am short of time and b) I'm not convinced
>> this should/could work anyway.
>>
>> What seems to happen is that the use of 0.0.0.0 confuses the cluster as
>> to which node is which - I think because multiple nodes are using
>> 0.0.0.0. That causes the failure of the initial state synchronisation.
>
> Yes, this was indeed the problem. I chose 0.0.0.0 because binding to the
> host's IP threw the following error -
>
> 01-Mar-2024 22:30:32.315 SEVERE [main] org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start cluster receiver
>  java.net.BindException: Cannot assign requested address
>
> The full stack trace is available in my previous mail. To identify the
> problem, I ran my application outside the container, where I did not
> encounter the above error. This led me to investigate the Docker side of
> things. By default, a Docker container uses a bridge network, so binding
> to the host's IP address from inside the container is simply not possible
> even when the receiver port has been correctly mapped. I was able to get
> it to work by passing the --network=host flag to my docker create
> command. This puts the container inside the host's network, essentially
> de-containerizing its networking. Although this works, it is not
> desirable because it exposes every port on the container, increasing the
> attack surface and complicating debugging.
>
> 0.0.0.0 is a natural choice and is used by a lot of applications running
> on Docker; even the official Tomcat image on Docker Hub does so.

There is no official Docker image provided by the Tomcat project.

> I am no expert on Docker or Tomcat; however, I don't think this is
> ideal. Docker has become so ubiquitous that I couldn't imagine deploying
> without it, but using clustering makes me lose some of its benefits.
>
> I have not looked into it, but this might also impact the BackupManager
> because it also requires a Receiver element.
>
> On Mon, Feb 12, 2024 at 8:52 PM Christopher Schultz <
> ch...@christopherschultz.net> wrote:
>> If this is known to essentially always not-work... should we log
>> something at startup?
>
> I think this is the least that we could do, and I am willing to work on
> this. However, I also think that this should be looked into more deeply
> to solve the actual problem.

Thinking about this a little more (although I am still short on time so haven't investigated) I wonder if the issue is that a node needs to advertise to other nodes what IP address it is listening on. If it advertises 0.0.0.0 the other nodes have no way to contact it. Further (and you can look at the acceptor unlock code for the details) trying to determine a valid IP address to provide to other nodes is non-trivial (and the acceptor case is only looking at localhost, not across a network).

> I understand that this discussion might be a better fit for the dev
> mailing list; please let me know if you think the above holds merit, and
> I will move it there.

You start to get into having to separate the IP address a node listens on and the IP address it advertises for other nodes to contact it (similar to HTTP or JMX behind a proxy).

I'm not a docker expert but it looks to me from a quick Google search that the expectation in this case is that you should use swarm mode, which provides an overlay network across the nodes.

Mark

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
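The failure mode discussed above is easy to reproduce outside Tomcat. Below is a minimal, self-contained sketch of the two bind cases; it is not from the thread. 192.0.2.1 (a TEST-NET documentation address) stands in for the Docker host's IP as seen from a bridge-networked container, i.e. an address that resolves but is not assigned to any local interface:

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class BindCheck {
    public static void main(String[] args) throws IOException {
        // Case 1: the wildcard address. This always binds, which is why
        // 0.0.0.0 "works" inside a container -- but the cluster then has no
        // usable address to advertise to other members.
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            ch.bind(new InetSocketAddress("0.0.0.0", 0)); // port 0 = any free port
            System.out.println("0.0.0.0 bound at " + ch.getLocalAddress());
        }

        // Case 2: an address not assigned to any local interface. This is
        // what a bridge-networked container sees when it tries to bind the
        // Docker host's IP: the JVM raises the same BindException the
        // NioReceiver logs ("Cannot assign requested address").
        try (ServerSocketChannel ch = ServerSocketChannel.open()) {
            ch.bind(new InetSocketAddress("192.0.2.1", 0));
            System.out.println("192.0.2.1 bound (unexpected)");
        } catch (BindException e) {
            System.out.println("192.0.2.1 failed to bind: " + e.getMessage());
        }
    }
}
```

Note that port mapping (e.g. -p 4000:4000) does not change case 2: publishing a port forwards traffic to the container's own address; it does not add the host's address to the container's interfaces.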
Re: Tomcat not syncing existing sessions on restart
On Fri, Feb 9, 2024 at 4:45 PM Mark Thomas wrote:
> Using 0.0.0.0 as the address for the receiver is going to cause
> problems. I see similar issues with 11.0.x as 8.5.x. I haven't dug too
> deeply into things as a) I am short of time and b) I'm not convinced
> this should/could work anyway.
>
> What seems to happen is that the use of 0.0.0.0 confuses the cluster as
> to which node is which - I think because multiple nodes are using
> 0.0.0.0. That causes the failure of the initial state synchronisation.

Yes, this was indeed the problem. I chose 0.0.0.0 because binding to the host's IP threw the following error -

01-Mar-2024 22:30:32.315 SEVERE [main] org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start cluster receiver
 java.net.BindException: Cannot assign requested address

The full stack trace is available in my previous mail. To identify the problem, I ran my application outside the container, where I did not encounter the above error. This led me to investigate the Docker side of things. By default, a Docker container uses a bridge network, so binding to the host's IP address from inside the container is simply not possible even when the receiver port has been correctly mapped. I was able to get it to work by passing the --network=host flag to my docker create command. This puts the container inside the host's network, essentially de-containerizing its networking. Although this works, it is not desirable because it exposes every port on the container, increasing the attack surface and complicating debugging.

0.0.0.0 is a natural choice and is used by a lot of applications running on Docker; even the official Tomcat image on Docker Hub does so. I am no expert on Docker or Tomcat; however, I don't think this is ideal. Docker has become so ubiquitous that I couldn't imagine deploying without it, but using clustering makes me lose some of its benefits.

I have not looked into it, but this might also impact the BackupManager because it also requires a Receiver element.

On Mon, Feb 12, 2024 at 8:52 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:
> If this is known to essentially always not-work... should we log
> something at startup?

I think this is the least that we could do, and I am willing to work on this. However, I also think that this should be looked into more deeply to solve the actual problem.

I understand that this discussion might be a better fit for the dev mailing list; please let me know if you think the above holds merit, and I will move it there.

Sincerely,
Manak Bisht
Re: [OT] Tomcat not syncing existing sessions on restart
I would suggest focusing on Docker networking rather than Tomcat. My guess is that how that works will inform your Tomcat configuration. You might also try first getting it to work with two Docker instances on a single machine.

-Terence Bandoian

On 3/1/2024 11:59 AM, Manak Bisht wrote:
> I am fairly certain now that the docker container is the problem. I am
> unable to replicate the issue without it. Using the hostname/IP address
> of the host (tomcat/ip) for the receiver always causes the following
> problem,
>
> 01-Mar-2024 22:30:32.315 INFO [main] org.apache.catalina.tribes.transport.ReceiverBase.bind Unable to bind server socket to:tomcat/ip:4000 throwing error.
> 01-Mar-2024 22:30:32.315 SEVERE [main] org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start cluster receiver
>  java.net.BindException: Cannot assign requested address
>
> <snip/>
Re: [OT] Tomcat not syncing existing sessions on restart
I am fairly certain now that the docker container is the problem. I am unable to replicate the issue without it. Using the hostname/IP address of the host (tomcat/ip) for the receiver always causes the following problem,

01-Mar-2024 22:30:32.315 INFO [main] org.apache.catalina.tribes.transport.ReceiverBase.bind Unable to bind server socket to:tomcat/ip:4000 throwing error.
01-Mar-2024 22:30:32.315 SEVERE [main] org.apache.catalina.tribes.transport.nio.NioReceiver.start Unable to start cluster receiver
 java.net.BindException: Cannot assign requested address
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
	at org.apache.catalina.tribes.transport.ReceiverBase.bind(ReceiverBase.java:184)
	at org.apache.catalina.tribes.transport.nio.NioReceiver.bind(NioReceiver.java:125)
	at org.apache.catalina.tribes.transport.nio.NioReceiver.start(NioReceiver.java:89)
	at org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:150)
	at org.apache.catalina.tribes.group.ChannelCoordinator.start(ChannelCoordinator.java:102)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor.start(StaticMembershipInterceptor.java:108)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor.start(TcpPingInterceptor.java:65)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.GroupChannel.start(GroupChannel.java:421)
	at org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:544)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:902)
	at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:262)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	at org.apache.catalina.core.StandardService.startInternal(StandardService.java:439)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:760)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	at org.apache.catalina.startup.Catalina.start(Catalina.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)

Either address binding does not work for any address inside the container, or just binding to the address of the host machine does not work. I am leaning towards the latter because the *Member* element has never exhibited this issue.

Here's what I have already tried/checked,
- The receiver/address port of the container is mapped to the same port on the host.
- The IP of the host is reachable via ping and telnet from the container.
- Running the following code from inside the container always works:

java.net.InetAddress bind = java.net.InetAddress.getByName("tomcat");
System.out.println(bind); // Output: tomcat/ip

I have read a lot of resources and tried a variety of solutions to no avail. Literature covering session replication with containerisation is also sparse. If someone has tried this before or has any ideas, please let me know, I would greatly appreciate it.

Sincerely,
Manak Bisht

On Mon, Feb 12, 2024 at 9:07 PM Christopher Schultz <
ch...@christopherschultz.net> wrote:
> Manak,
>
> On 2/12/24 10:33, Manak Bisht wrote:
> > Chris,
> >
> > On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
> > ch...@christopherschultz.net> wrote:
> >> I wouldn't refuse to configure, since anyone using
> >> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.
> >
> > I am using separate hosts (two docker containers on two different
> > machines) in my main deployment. I just reproduced the problem on the
> > same host to rule out network issues.
>
> Thanks for the clarification. For some reason, I thought this was two
> Docker containers on the same host.
>
> -chris
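The getByName check above shows the name resolves, but resolution alone does not prove the address is bindable. A small diagnostic (not from the thread; the host name "tomcat" is the placeholder used in these mails) goes one step further and asks whether the resolved address is actually assigned to a local interface, which is what bind() requires:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

public class LocalAddressCheck {
    // An address is bindable only if some local interface carries it;
    // NetworkInterface.getByInetAddress returns null otherwise.
    public static boolean isLocal(InetAddress addr) throws SocketException {
        return NetworkInterface.getByInetAddress(addr) != null;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        // Resolution can succeed (as in the test quoted above) ...
        InetAddress addr = InetAddress.getByName(host);
        // ... while the address is still not bindable from this network namespace.
        System.out.println(addr + " local=" + isLocal(addr));
    }
}
```

Run inside a bridge-networked container with the host's name as the argument, this would be expected to print local=false even though resolution succeeds, matching the ping-works-but-bind-fails behaviour described above.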
Re: [OT] Tomcat not syncing existing sessions on restart
Manak,

On 2/12/24 10:33, Manak Bisht wrote:
> Chris,
>
> On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
> ch...@christopherschultz.net> wrote:
>> I wouldn't refuse to configure, since anyone using
>> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.
>
> I am using separate hosts (two docker containers on two different
> machines) in my main deployment. I just reproduced the problem on the
> same host to rule out network issues.

Thanks for the clarification. For some reason, I thought this was two Docker containers on the same host.

-chris
Re: [OT] Tomcat not syncing existing sessions on restart
Chris,

On Mon, 12 Feb 2024, 20:52 Christopher Schultz, <
ch...@christopherschultz.net> wrote:
> I wouldn't refuse to configure, since anyone using
> 0.0.0.0 with /separate/ hosts wouldn't experience this problem.

I am using separate hosts (two docker containers on two different machines) in my main deployment. I just reproduced the problem on the same host to rule out network issues.

Sincerely,
Manak Bisht
Re: [OT] Tomcat not syncing existing sessions on restart
Mark,

On 2/9/24 06:14, Mark Thomas wrote:
> With the Receiver using address="0.0.0.0" I see the same issues you do.
> I'm not yet convinced that is a bug.

If this is known to essentially always not-work... should we log something at startup?

I wouldn't refuse to configure, since anyone using 0.0.0.0 with /separate/ hosts wouldn't experience this problem.

-chris
Re: Tomcat not syncing existing sessions on restart
On 09/02/2024 07:51, Manak Bisht wrote:
> On Fri, Feb 9, 2024 at 3:25 AM Mark Thomas wrote:
>> Same JRE?
>
> Yes, 8.0.402
>
>> Generally, I wouldn't use 0.0.0.0, I'd use a specific IP address. I'm
>> not sure how the clustering would behave with 0.0.0.0

Using 0.0.0.0 as the address for the receiver is going to cause problems. I see similar issues with 11.0.x as 8.5.x. I haven't dug too deeply into things as a) I am short of time and b) I'm not convinced this should/could work anyway.

What seems to happen is that the use of 0.0.0.0 confuses the cluster as to which node is which - I think because multiple nodes are using 0.0.0.0. That causes the failure of the initial state synchronisation.

> That's the problem really. Using the DNS name or IP address causes the
> following error -

I am as sure as I can be that the issue you are seeing is environmental. I have configured my test cluster with:
- your cluster configuration with changes to host names and IP addresses
- Java 8.0.402
- Tomcat 8.5.x

With the Receiver using address="0.0.0.0" I see the same issues you do. I'm not yet convinced that is a bug.

With the Receiver using address="hostname" the cluster starts but doesn't work. Examining the logs shows that is because the host name resolves to a loopback address. I'd class that as behaving as expected. I could always change the host's config if I wanted the name to resolve to the public IP.

With the Receiver using address="ip-address" the cluster starts and log messages show that cluster state is exchanged within a few milliseconds.

That leads me to conclude that the BindException you see is a configuration and/or environmental issue, although I don't see why your simple test works but clustering doesn't. Perhaps a conflict with something else in your Tomcat configuration? Something to try is starting Tomcat with the Receiver using 0.0.0.0 and then using netstat to see which address/port combinations are being used.

Mark
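The netstat suggestion checks what is actually bound; a complementary check, runnable from a JVM inside the container (this sketch is not from the thread), is to list the addresses the container's interfaces actually carry, so they can be compared against the address configured on the Receiver:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.util.Collections;

public class ListAddresses {
    public static void main(String[] args) throws Exception {
        // Print every address on every interface; the Receiver's address
        // attribute must match one of these (or be 0.0.0.0) for bind to succeed.
        for (NetworkInterface nif : Collections.list(NetworkInterface.getNetworkInterfaces())) {
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                System.out.println(nif.getName() + " -> " + addr.getHostAddress()
                        + (addr.isLoopbackAddress() ? " (loopback)" : ""));
            }
        }
    }
}
```

In a default bridge-network container this would typically show only the loopback address and the container's private bridge address, neither of which is the Docker host's IP.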
Re: Tomcat not syncing existing sessions on restart
On Fri, Feb 9, 2024 at 3:25 AM Mark Thomas wrote:
> Same JRE?

Yes, 8.0.402

> Generally, I wouldn't use 0.0.0.0, I'd use a specific IP address. I'm
> not sure how the clustering would behave with 0.0.0.0

That's the problem really. Using the DNS name or IP address causes the following error -

09-Feb-2024 13:08:32.440 SEVERE [main] org.apache.catalina.startup.Catalina.start The required Server component failed to start so Tomcat is unable to start.
 org.apache.catalina.LifecycleException: Failed to start component [StandardServer[8006]]
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
	at org.apache.catalina.startup.Catalina.start(Catalina.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)
Caused by: org.apache.catalina.LifecycleException: Failed to start component [StandardService[Catalina]]
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
	at org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:760)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	... 7 more
Caused by: org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina]]
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
	at org.apache.catalina.core.StandardService.startInternal(StandardService.java:439)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	... 9 more
Caused by: org.apache.catalina.LifecycleException: Failed to start component [org.apache.catalina.ha.tcp.SimpleTcpCluster[Catalina]]
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
	at org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:902)
	at org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:262)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	... 11 more
Caused by: org.apache.catalina.LifecycleException: org.apache.catalina.tribes.ChannelException: java.net.BindException: Cannot assign requested address; No faulty members identified.
	at org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:549)
	at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
	... 14 more
Caused by: org.apache.catalina.tribes.ChannelException: java.net.BindException: Cannot assign requested address; No faulty members identified.
	at org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:184)
	at org.apache.catalina.tribes.group.ChannelCoordinator.start(ChannelCoordinator.java:102)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor.start(StaticMembershipInterceptor.java:108)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor.start(TcpPingInterceptor.java:65)
	at org.apache.catalina.tribes.group.ChannelInterceptorBase.start(ChannelInterceptorBase.java:155)
	at org.apache.catalina.tribes.group.GroupChannel.start(GroupChannel.java:421)
	at org.apache.catalina.ha.tcp.SimpleTcpCluster.startInternal(SimpleTcpCluster.java:544)
	... 15 more
Caused by: java.net.BindException: Cannot assign requested address
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
	at org.apache.catalina.tribes.transport.ReceiverBase.bind(ReceiverBase.java:184)
	at org.apache.catalina.tribes.transport.nio.NioReceiver.bind(NioReceiver.java:125)
	at org.apache.catalina.tribes.transport.nio.NioReceiver.start(NioReceiver.java:89)
	at org.apache.catalina.tribes.group.ChannelCoordinator.internalStart(ChannelCoordinator.java:150)
	... 25 more

The *host* attribute of the *Member* element does not exhibit the same problem. The DNS/IP is also reachable via ping and telnet. I even wrote a simple test to check this, and it works successfully -

java.net.InetAddress bind = java.net.InetAddress.getByName("tomcat");
System.out.println(bind); // Output: tomcat/ip

Sincerely,
Manak Bisht
Re: Tomcat not syncing existing sessions on restart
On 07/02/2024 11:43, Manak Bisht wrote:
> I think I have narrowed down the problem. For Tomcat 9 (v9.0.85), using
> 0.0.0.0 for the local member and receiver works fine. However, the same
> does not work in Tomcat 8.5 (v8.5.98).

Same JRE?

Generally, I wouldn't use 0.0.0.0, I'd use a specific IP address. I'm not sure how the clustering would behave with 0.0.0.0

Mark

> Sincerely,
> Manak Bisht
>
> On Fri, Feb 2, 2024 at 9:41 PM Mark Thomas wrote:
>> On 31/01/2024 13:33, Manak Bisht wrote:
>>> I tried tweaking all the settings that I could think of but I am
>>> unable to sync sessions on restart even on a stock Tomcat 8.5.98
>>> installation using your provided war. I am unable to identify whether
>>> this is actually a bug or something wrong with my configuration (this
>>> is far more likely). Could you please share your server.xml? Did you
>>> make any other changes?
>>
>> Here is the cluster configuration from the first node of my test
>> environment:
Re: Tomcat not syncing existing sessions on restart
I think I have narrowed down the problem. For Tomcat 9 (v9.0.85), using 0.0.0.0 for the local member and receiver works fine. However, the same does not work in Tomcat 8.5 (v8.5.98).

Sincerely,
Manak Bisht

On Fri, Feb 2, 2024 at 9:41 PM Mark Thomas wrote:
> On 31/01/2024 13:33, Manak Bisht wrote:
> > I tried tweaking all the settings that I could think of but I am unable
> > to sync sessions on restart even on a stock Tomcat 8.5.98 installation
> > using your provided war. I am unable to identify whether this is
> > actually a bug or something wrong with my configuration (this is far
> > more likely). Could you please share your server.xml? Did you make any
> > other changes?
> >
> > Sincerely,
> > Manak Bisht
>
> Here is the cluster configuration from the first node of my test
> environment:
>
> <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
>          channelSendOptions="6">
>   <Manager className="org.apache.catalina.ha.session.DeltaManager"
>            expireSessionsOnShutdown="false"
>            notifyListenersOnReplication="true"/>
>   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
>     <Membership className="org.apache.catalina.tribes.membership.StaticMembershipService">
>       <Member className="org.apache.catalina.tribes.membership.StaticMember"
>               port="4000"
>               host="192.168.23.32"
>               uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}"/>
>       <Member className="org.apache.catalina.tribes.membership.StaticMember"
>               port="4000"
>               host="192.168.23.33"
>               uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2}"/>
>     </Membership>
>     <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
>               address="192.168.23.32"
>               port="4000"
>               autoBind="0"
>               selectorTimeout="5000"
>               maxThreads="6"/>
>     <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
>       <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
>     </Sender>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"
>                  performReadTest="true"/>
>     <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatchInterceptor"/>
>   </Channel>
>   <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
>             tempDir="cluster-temp"
>             deployDir="webapps"
>             watchDir="cluster-watch"
>             watchEnabled="true"/>
>   <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
>          filter=""/>
>   <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
>   <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
> </Cluster>
Re: Tomcat not syncing existing sessions on restart
On 31/01/2024 13:33, Manak Bisht wrote:
> I tried tweaking all the settings that I could think of but I am unable
> to sync sessions on restart even on a stock Tomcat 8.5.98 installation
> using your provided war. I am unable to identify whether this is
> actually a bug or something wrong with my configuration (this is far
> more likely). Could you please share your server.xml? Did you make any
> other changes?
>
> Sincerely,
> Manak Bisht

Here is the cluster configuration from the first node of my test environment:
Re: Tomcat not syncing existing sessions on restart
I tried tweaking all the settings that I could think of but I am unable to sync sessions on restart even on a stock Tomcat 8.5.98 installation using your provided war. I am unable to identify whether this is actually a bug or something wrong with my configuration (this is far more likely). Could you please share your server.xml? Did you make any other changes?

Sincerely,
Manak Bisht
Re: Tomcat not syncing existing sessions on restart
Hi Mark,

I tried running your *cluster-test* war example on a stock 8.5.98 installation; however, I am facing the same issue. Session sync does not trigger on restarting a node. Could you please share your configuration?

Sincerely,
Manak Bisht
Re: Tomcat not syncing existing sessions on restart
Thanks for going the extra mile to help me out on this. I really appreciate it.

As far as I am aware, the auto detection of the local member is only available post v9.0.17 and the LocalMember tag was added in v8.5.1. Unfortunately, I happen to be working in an environment where 8.5.0 is the highest non-EOL version available. I know I am playing very fast and loose with the definition of EOL when the current version is 8.5.98. Since the StaticMembershipInterceptor has been available for a long time, I thought I could make it work without those two features.

Sincerely,
Manak Bisht

On Tue, Jan 23, 2024 at 3:56 PM Mark Thomas wrote:
> The other difference is that you don't appear to have defined the local
> member of the cluster. You should define all members of the cluster,
> including the local member, on each node. The local member can be
> defined explicitly as LocalMember or as an ordinary Member and Tomcat
> will figure out it is the local one.
Re: Tomcat not syncing existing sessions on restart
I have configured my standard cluster test environment for a 2-node cluster, using DeltaManager and static membership. httpd is configured for non-sticky load-balancing. Each node has the Manager web application and my simple cluster-test deployed. https://people.apache.org/~markt/dev/cluster-test.war Starting both both nodes and connecting directly to each manager instance shows no sessions in cluster-test as expected. Requesting the cluster index page via httpd triggers the creation of a single session in cluster-test. Requests alternate between node 1 and node 2 as expected. Examining the session via the manager app shows that the changes to the session are being correctly replicated. Stopping node 2 causes further requests to be directed to node 1 only. Starting node 2 shows that the session is replicated correctly from node 1. I see the updated session in both nodes via the Manager app. Also the following test works: - create a session - stop node 2 - further requests (handled by node 1) - stop requests - start node 2 - stop node 1 - resume requests (handled by node 2) One difference is that I am using the StaticMembershipService rather than the StaticMembershipInterceptor. I don't think that will make any difference. The other difference is that you don't appear to have defined the local member of the cluster. You should define all members of the cluster, including the local member, on each node. The local member can be defined explicitly as LocalMember or as an ordinary Member and Tomcat will figure out it is the local one. Mark On 22/01/2024 08:39, Manak Bisht wrote: I thought that this https://marc.info/?l=tomcat-user&m=119376798217922&w=2 might be the problem. *"The uniqueId is used to be able to differentiate between the same node joining a cluster, then crashing and then rejoining again. 
if the uniqueId didn't change in between this, there is no way to tell the difference between a node going down, or just leaving the cluster and rejoining."* So, I tried creating a session when one of the nodes was down, but that did not sync either when the other node came online again. In that case, I would also expect org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions to proceed with no state sync rather than timing out.

I have also checked the time on both servers using the Linux date command and they seem to be in sync. The timezone flag passed to the JAVA_OPTS argument in catalina.sh is also the same. Please let me know if any more information is required to help debug this issue.

Sincerely,
Manak Bisht

On Sun, Jan 14, 2024 at 11:09 PM Manak Bisht wrote:
Hi,
I am using DeltaManager (static membership) with non-sticky load balancing on two nodes. I have observed even load, and requests with the same JSESSIONID being served successfully by both Tomcats. This leads me to conclude that session replication is working as expected when both nodes are up. However, when I restart either one of them, the newly restarted Tomcat is unable to serve requests from old sessions. The logs indicate that node discovery is working but the session sync times out. New logins/sessions work just fine though, implying that replication is working successfully again.
*tomcat1.log*
13-Jan-2024 14:16:35.713 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberDisappeared Received member disappeared:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]
13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberAdded Replication member added:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]
13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.performBasicCheck Suspect member, confirmed alive.[org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]]
*13-Jan-2024 14:45:24.354 WARNING [Tribes-Task-Receiver-4] org.apache.catalina.ha.session.DeltaManager.deserializeSessions overload existing session *

*tomcat2.log*
13-Jan-2024 14:45:24.290 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Register manager localhost# to cluster element Engine with name Catalina
13-Jan-2024 14:45:24.291 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Starting clustering manager at localhost#
13-Jan-2024 14:45:24.363 INFO [localhost-startStop-1] org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.report Thr
Re: Tomcat not syncing existing sessions on restart
I thought that this https://marc.info/?l=tomcat-user&m=119376798217922&w=2 might be the problem. *"The uniqueId is used to be able to differentiate between the same node joining a cluster, then crashing and then rejoining again. if the uniqueId didn't change in between this, there is no way to tell the difference between a node going down, or just leaving the cluster and rejoining."* So, I tried creating a session when one of the nodes was down, but that did not sync either when the other node came online again. In that case, I would also expect org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions to proceed with no state sync rather than timing out.

I have also checked the time on both servers using the Linux date command and they seem to be in sync. The timezone flag passed to the JAVA_OPTS argument in catalina.sh is also the same. Please let me know if any more information is required to help debug this issue.

Sincerely,
Manak Bisht

On Sun, Jan 14, 2024 at 11:09 PM Manak Bisht wrote:
> Hi,
> I am using DeltaManager (static membership) with non-sticky load balancing
> on two nodes. I have observed even load, and requests with the same
> JSESSIONID being served successfully by both Tomcats. This leads me to
> conclude that session replication is working as expected when both nodes
> are up.
>
> However, when I restart either one of them, the newly restarted Tomcat is
> unable to serve requests from old sessions. The logs indicate that node
> discovery is working but the session sync times out. New logins/sessions
> work just fine though, implying that replication is working successfully
> again.
>
> *tomcat1.log*
> 13-Jan-2024 14:16:35.713 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberDisappeared Received member disappeared:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]
> 13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberAdded Replication member added:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]
> 13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.performBasicCheck Suspect member, confirmed alive.[org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]]
> *13-Jan-2024 14:45:24.354 WARNING [Tribes-Task-Receiver-4] org.apache.catalina.ha.session.DeltaManager.deserializeSessions overload existing session *
>
> *tomcat2.log*
> 13-Jan-2024 14:45:24.290 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Register manager localhost# to cluster element Engine with name Catalina
> 13-Jan-2024 14:45:24.291 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Starting clustering manager at localhost#
> 13-Jan-2024 14:45:24.363 INFO [localhost-startStop-1] org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.report ThroughputInterceptor Report[
> Tx Msg:1 messages
> Sent:0.00 MB (total)
> Sent:0.00 MB (application)
> Time:0.06 seconds
> Tx Speed:0.01 MB/sec (total)
> TxSpeed:0.01 MB/sec (application)
> Error Msg:0
> Rx Msg:15 messages
> Rx Speed:0.00 MB/sec (since 1st msg)
> Received:0.00 MB]
>
> 13-Jan-2024 14:45:24.368 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager [localhost#], requesting session state from org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat1:8090,tomcat1,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
> *13-Jan-2024 14:46:24.459 SEVERE [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager [localhost#]: No session state send at 1/13/24 2:45 PM received, timing out after 60,167 ms.*
>
> There is also a warning, but I am unsure of its significance. I have tried setting sendAllSessions to false and increasing the stateTransferTimeout window, to no avail.
>
> This is my clustering config for tomcat1 (the config is the same for tomcat2 with the host as tomcat1 and uniqueId {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}) -
> [XML stripped by the archive; the recoverable attributes are:]
> Cluster: channelSendOptions="6" channelStartOptions="3"
> Receiver: className="org.apache.catalina.tribes.transport.nio.NioReceiver" address="0.0.0.0" port="8090"
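Since the archive stripped the XML tags from the quoted config, here is a hedged reconstruction of the kind of fragment being discussed. The element nesting follows the standard server.xml cluster layout, and everything beyond the attributes actually quoted in this thread (channelSendOptions, channelStartOptions, the NioReceiver className/address/port, the StaticMember host/port/uniqueId) is an illustrative assumption, not the poster's verbatim file:

```xml
<!-- Sketch reconstructed from attribute values quoted in this thread. -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="6" channelStartOptions="3">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="0.0.0.0"
              port="8090"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
      <!-- On tomcat1 this lists tomcat2; on tomcat2, the mirror-image entry. -->
      <Member className="org.apache.catalina.tribes.membership.StaticMember"
              host="tomcat2" port="8090"
              uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2}"/>
    </Interceptor>
  </Channel>
</Cluster>
```

As the later part of the thread establishes, the address="0.0.0.0" on the Receiver is the likely culprit: each node advertises 0.0.0.0 to its peers, so they cannot contact it for the initial state transfer.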
Tomcat not syncing existing sessions on restart
Hi,
I am using DeltaManager (static membership) with non-sticky load balancing on two nodes. I have observed even load, and requests with the same JSESSIONID being served successfully by both Tomcats. This leads me to conclude that session replication is working as expected when both nodes are up.

However, when I restart either one of them, the newly restarted Tomcat is unable to serve requests from old sessions. The logs indicate that node discovery is working but the session sync times out. New logins/sessions work just fine though, implying that replication is working successfully again.

*tomcat1.log*
13-Jan-2024 14:16:35.713 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberDisappeared Received member disappeared:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]
13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.ha.tcp.SimpleTcpCluster.memberAdded Replication member added:org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]
13-Jan-2024 14:44:16.457 INFO [GroupChannel-Heartbeat-1] org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.performBasicCheck Suspect member, confirmed alive.[org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat2:8090,tomcat2,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 }, payload={}, command={}, domain={}, ]]
*13-Jan-2024 14:45:24.354 WARNING [Tribes-Task-Receiver-4] org.apache.catalina.ha.session.DeltaManager.deserializeSessions overload existing session *

*tomcat2.log*
13-Jan-2024 14:45:24.290 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Register manager localhost# to cluster element Engine with name Catalina
13-Jan-2024 14:45:24.291 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.startInternal Starting clustering manager at localhost#
13-Jan-2024 14:45:24.363 INFO [localhost-startStop-1] org.apache.catalina.tribes.group.interceptors.ThroughputInterceptor.report ThroughputInterceptor Report[
Tx Msg:1 messages
Sent:0.00 MB (total)
Sent:0.00 MB (application)
Time:0.06 seconds
Tx Speed:0.01 MB/sec (total)
TxSpeed:0.01 MB/sec (application)
Error Msg:0
Rx Msg:15 messages
Rx Speed:0.00 MB/sec (since 1st msg)
Received:0.00 MB]

13-Jan-2024 14:45:24.368 INFO [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.getAllClusterSessions Manager [localhost#], requesting session state from org.apache.catalina.tribes.membership.StaticMember[tcp://tomcat1:8090,tomcat1,8090, alive=0, securePort=-1, UDP Port=-1, id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 }, payload={}, command={}, domain={}, ]. This operation will timeout if no session state has been received within 60 seconds.
*13-Jan-2024 14:46:24.459 SEVERE [localhost-startStop-1] org.apache.catalina.ha.session.DeltaManager.waitForSendAllSessions Manager [localhost#]: No session state send at 1/13/24 2:45 PM received, timing out after 60,167 ms.*

There is also a warning, but I am unsure of its significance. I have tried setting sendAllSessions to false and increasing the stateTransferTimeout window, to no avail.

This is my clustering config for tomcat1 (the config is the same for tomcat2 with the host as tomcat1 and uniqueId {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}) -

Any help would be greatly appreciated.

Sincerely,
Manak Bisht