Hello All,

I've been using Zookeeper successfully at my place of work for a few months now, but there is a lingering issue I haven't been able to solve cleanly. When using GDB with libzookeeper_mt, once you hit a breakpoint, the program you're running essentially has until the session timeout to continue onward or its session will be expired. This is a pain in the butt when using ephemeral znodes, and in my case those ephemeral znodes are tied to locks, which means losing them is bad news. I've tried a number of different ideas to solve this issue, with varying degrees of success.
The first idea I had was raising the session timeout, which obviously works. This extends the time you have at any given breakpoint to figure out the issue and move onward, but it comes at the expense of ephemeral znodes living much longer than they reasonably should when the program crashes (something that is likely to happen if you're using GDB). In the case of locking, znodes that linger after a crash keep holding the lock until the session expires, which hurts the performance of the system. This is how we currently deal with the issue.

The second idea was to instruct all developers at my job to use GDB's non-stop mode for debugging. This works, since in this mode GDB stops only the thread that hit the breakpoint, but it means changing the development habits of hundreds of engineers just to save myself the trouble. Ideally Zookeeper would function with GDB in whatever mode you felt like using.

The third idea was decidedly more intricate. I spawn a subprocess which uses the exact same session I do, but which only holds onto that session while the parent process is unresponsive (probably at a breakpoint). This keeps the session alive while at breakpoints and has no impact otherwise. The one caveat to this approach is the transition from being stopped at a breakpoint back to running. Since the server last saw the session in the subprocess, it doesn't send heartbeat messages to the parent process. It is therefore up to the parent process to send a PING to the server in order to reestablish the session, but the client only does this after 1/3 of the session timeout has elapsed, which is too long.

Whatever the case, a simple, generic solution would be ideal for this situation. It might be as simple as making the PING interval configurable (for the third solution), or it might be as frustrating as creating a Zookeeper service which runs outside of the process (thus bypassing GDB's breakpoints).

Any ideas?

Thanks,
Stephen Tyree
