I did another test and added a two-second sleep before attempting to connect to the APserver, and that removed the problem. Thus, I conclude that the issue is that the APserver doesn't have time to initialise before the parent tries to connect.
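To illustrate the race, here is a minimal Python sketch (not GNU APL code) of one way around it: poll the port until the server accepts, rather than sleeping for a fixed time. The function name `connect_with_retry` and the timeout and interval values are my own assumptions, chosen only for the example.

```python
import socket
import time


def connect_with_retry(host, port, timeout=2.0, interval=0.05):
    """Keep trying to connect until the server accepts or `timeout` expires.

    Hypothetical helper: returns a connected socket, or raises TimeoutError
    if nothing is listening on (host, port) within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Succeeds as soon as the server has called listen().
            return socket.create_connection((host, port), timeout=1.0)
        except OSError as err:  # covers ConnectionRefusedError
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no server on {host}:{port} after {timeout}s"
                ) from err
            time.sleep(interval)  # short pause, then try again


if __name__ == "__main__":
    # 16366 is the APserver port that appears in the logs quoted below.
    try:
        sock = connect_with_retry("127.0.0.1", 16366)
        print("connected:", sock.getpeername())
        sock.close()
    except TimeoutError as err:
        print("gave up:", err)
```

The advantage over a fixed sleep is that the parent waits only as long as the server actually needs, and a slow machine gets the full timeout instead of a spurious "Connection refused".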
I'd like to propose that APserver sends a message to the parent instead of having an arbitrary sleep (right now it's 20 ms, I believe). There are several ways of doing this. Here are a few:

- The parent redirects stdout and waits for a message from APserver.
- The same as above, but the apl session opens a named pipe and passes the name of the pipe to APserver, and the message is sent over that channel instead.
- APserver detaches and forks itself into the background once all initialisation has been performed. The parent apl session waits for the "parent" APserver to exit before attempting to connect.
- The apl session attempts multiple retries over a few seconds before giving up.

I'm sure there are other ways to handle it as well. At least we know what the problem is now. :-)

Regards,
Elias

On 31 July 2014 10:43, Elias Mårtenson <loke...@gmail.com> wrote:
> I've checked, and here are the results. I noticed that sometimes the
> APserver gets killed when I )OFF the interpreter, and sometimes it
> doesn't.
>
> $ *dist/bin/apl --silent -l 37*
> sizeof(Svar_record) is 328
> sizeof(Svar_partner) is 28
>
> initializing paths from argv[0] = dist/bin/apl
> initializing paths from $PWD = /home/emartenson/src/apl
> APL_bin_path is: ./dist/bin
> APL_bin_name is: apl
> Reading config file /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
> config file /home/emartenson/.config/gnu-apl/preferences is not present/readable
> 0 input files:
> Using TCP socket towards APserver...
> connecting to 127.0.0.1 TCP port 16366
> (this is expected to fail, unless APserver was started manually)
> forking new APserver listening on 127.0.0.1 TCP port 16366
> connecting to 127.0.0.1 TCP port 16366
> (this is supposed to succeed.)
> ::connect() to existing APserver failed: Connection refused
> PID is 22704
> argc: 4
> argv[0]: 'dist/bin/apl'
> argv[1]: '--silent'
> argv[2]: '-l'
> argv[3]: '37'
> uprefs.user_do_svars: 1
> uprefs.system_do_svars: 1
> uprefs.requested_id: 0
> uprefs.requested_par: 0
> Svar_DB not connected in Svar_DB::is_registered_id()
> id.proc: 1001 at ProcessorID.cc:77
> Processor ID was completely initialized: 1001:0:0
> system_do_svars is: 1
>
> Then, I check for listeners from another terminal:
>
> $ *netstat -an | grep 16366*
> tcp 0 0 127.0.0.1:16366 0.0.0.0:* LISTEN
> $ *ps -ef | grep AP*
> emarten+ 22712 1 0 10:34 pts/3 00:00:00 ./dist/bin/APserver --port 16366
> emarten+ 22733 28324 0 10:36 pts/1 00:00:00 grep AP
>
> I then quit the APL session:
>
> *)off*
>
> And then check connections again:
>
> $ *ps -ef | grep AP*
> emarten+ 22712 1 0 10:34 pts/3 00:00:00 ./dist/bin/APserver --port 16366
> emarten+ 22750 28324 0 10:38 pts/1 00:00:00 grep AP
> em-desktop$ *netstat -an | grep 16366*
> tcp 0 0 127.0.0.1:16366 0.0.0.0:* LISTEN
>
> As we can see, the APserver is still listening.
>
> I now try to start the APL interpreter again, and it properly connects to
> the *old* APserver:
>
> $ *dist/bin/apl --silent -l 37*
> sizeof(Svar_record) is 328
> sizeof(Svar_partner) is 28
>
> initializing paths from argv[0] = dist/bin/apl
> initializing paths from $PWD = /home/emartenson/src/apl
> APL_bin_path is: ./dist/bin
> APL_bin_name is: apl
> Reading config file /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
> config file /home/emartenson/.config/gnu-apl/preferences is not present/readable
> 0 input files:
> Using TCP socket towards APserver...
> connected to APserver, socket is 3
> using Svar_DB on APserver!
> PID is 22768
> argc: 4
> argv[0]: 'dist/bin/apl'
> argv[1]: '--silent'
> argv[2]: '-l'
> argv[3]: '37'
> uprefs.user_do_svars: 1
> uprefs.system_do_svars: 1
> uprefs.requested_id: 0
> uprefs.requested_par: 0
> id.proc: 1001 at ProcessorID.cc:77
> Processor ID was completely initialized: 1001:0:0
> system_do_svars is: 1
>
> We can see that it's actually connected by checking the APserver status
> again:
>
> $ *netstat -an | grep 16366*
> tcp 0 0 127.0.0.1:16366 0.0.0.0:* LISTEN
> tcp 0 0 127.0.0.1:44102 127.0.0.1:16366 ESTABLISHED
> tcp 0 0 127.0.0.1:16366 127.0.0.1:44102 ESTABLISHED
> em-desktop$ *ps -ef | grep AP*
> emarten+ 22712 1 0 10:34 pts/3 00:00:00 ./dist/bin/APserver --port 16366
> emarten+ 22782 28324 0 10:40 pts/1 00:00:00 grep AP
>
> Now, let's )OFF the interpreter which promptly kills the APserver that
> was originally started in the first invocation of apl:
>
> $ *netstat -an | grep 16366*
> tcp 0 0 127.0.0.1:44102 127.0.0.1:16366 TIME_WAIT
> em-desktop$ *ps -ef | grep AP*
> emarten+ 22790 28324 0 10:41 pts/1 00:00:00 grep AP
>
> Regards,
> Elias
>
>
> On 29 July 2014 21:17, Elias Mårtenson <loke...@gmail.com> wrote:
>
>> I will definitely check this when I get back to the office tomorrow. I'll
>> keep you posted.
>>
>> Thanks and regards,
>> Elias
>>
>>
>> On 29 July 2014 21:13, Juergen Sauermann <juergen.sauerm...@t-online.de>
>> wrote:
>>
>>> Hi,
>>>
>>> that makes me think that APserver is listening on a different socket
>>> type than the one apl is using.
>>> Therefore, netstat -l -p to see where APserver listens and apl -l 37 to
>>> see where apl tries to connect.
>>>
>>> /// Jürgen
>>>
>>>
>>>
>>>
>>> On 07/29/2014 03:07 PM, Elias Mårtenson wrote:
>>>
>>> I don't think so. The APserver is definitely started. Also, if I start
>>> another apl it's able to connect to the previous one.
>>>
>>> My theory is the same as before, I think that apl attempts to connect to
>>> APserver before it's ready to accept connections.
>>>
>>> Also, given the fact that apl never connects to APserver, it's not very
>>> strange that it's not killed when apl exits.
>>>
>>> In the case where I start a second apl that connects to the first
>>> APserver, it does get killed properly.
>>>
>>> Regards,
>>> Elias
>>> On 29 Jul 2014 21:02, "Juergen Sauermann" <juergen.sauerm...@t-online.de>
>>> wrote:
>>>
>>>> Hi Elias,
>>>>
>>>> looks like either no APserver is running or the APserver listens on
>>>> another socket.
>>>> Check with netstat -l -p. That should show a line like:
>>>>
>>>> tcp 0 0 localhost:16366 *:* LISTEN 2631/APserver
>>>>
>>>> If the APserver does not get killed then this is the problem I had
>>>> earlier but could not reproduce.
>>>> If you can reproduce it, please uncomment the *#define USE_POLL* at
>>>> the beginning of *APserver.cc*
>>>> and reinstall. That will tell us if *poll()* works better than
>>>> *select()*. If not, we could try tcp_keepalive to
>>>> see if that works better.
>>>>
>>>> /// Jürgen
>>>>
>>>> On 07/29/2014 05:27 AM, Elias Mårtenson wrote:
>>>>
>>>> The following happens on my Arch Linux system.
>>>>
>>>> When I start the apl binary (without Emacs) I'm getting a "connection
>>>> refused" error. The log with *-l 37* is reproduced below.
>>>>
>>>> The APserver is properly started (I can see it in the process
>>>> listing), but after I call )OFF, it doesn't get killed.
>>>>
>>>> Note that if I start APserver separately, I do not get any errors,
>>>> and everything seems to work correctly.
>>>>
>>>> Here's the output from -l 37 (errors highlighted in red):
>>>>
>>>> $ *dist/bin/apl -l 37 --silent*
>>>> sizeof(Svar_record) is 328
>>>> sizeof(Svar_partner) is 28
>>>>
>>>> initializing paths from argv[0] = dist/bin/apl
>>>> initializing paths from $PWD = /home/emartenson/src/apl
>>>> APL_bin_path is: ./dist/bin
>>>> APL_bin_name is: apl
>>>> Reading config file /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
>>>> config file /home/emartenson/.config/gnu-apl/preferences is not present/readable
>>>> 0 input files:
>>>> Using TCP socket towards APserver...
>>>> connecting to 127.0.0.1 TCP port 16366
>>>> (this is expected to fail, unless APserver was started manually)
>>>> forking new APserver listening on 127.0.0.1 TCP port 16366
>>>> connecting to 127.0.0.1 TCP port 16366
>>>> (this is supposed to succeed.)
>>>> ::connect() to existing APserver failed: Connection refused
>>>> PID is 24054
>>>> argc: 4
>>>> argv[0]: 'dist/bin/apl'
>>>> argv[1]: '-l'
>>>> argv[2]: '37'
>>>> argv[3]: '--silent'
>>>> uprefs.user_do_svars: 1
>>>> uprefs.system_do_svars: 1
>>>> uprefs.requested_id: 0
>>>> uprefs.requested_par: 0
>>>> Svar_DB not connected in Svar_DB::is_registered_id()
>>>> id.proc: 1001 at ProcessorID.cc:77
>>>> Processor ID was completely initialized: 1001:0:0
>>>> system_do_svars is: 1
>>>> * ⎕SVQ⍳0*
>>>> Svar_DB not connected in Svar_DB::get_offering_processors()
>>>> 100 210
>>>>
>>>> Regards.
>>>> Elias
>>>>
>>>>
>>>>
>>>
>>
>