I did another test and added a two-second sleep after attempting to connect
to the APserver, and that removed the problem. Thus, I conclude that the
issue is that the APserver doesn't have time to initialise before the
parent tries to connect.

I'd like to propose that APserver sends a message to the parent once it is
ready to accept connections, instead of relying on an arbitrary sleep (right
now it's 20 ms, I believe). There are several ways of doing this:

   - The parent redirects stdout and waits for a message from APserver
   - The same as above, but the apl session opens a named pipe and passes
   the name of the pipe to APserver, and the message is sent over that
   channel instead.
   - APserver detaches and forks itself into the background once all
   initialisation has been performed. The parent apl session waits for the
   "parent" APserver to exit before attempting to connect.
   - The apl session attempts multiple retries over a few seconds before
   giving up.
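
To make the first option concrete, here is a minimal sketch of a readiness
handshake over an anonymous pipe. It is not GNU APL code: the function name
spawn_and_wait_ready and the init/serve callbacks are hypothetical stand-ins.
In the real setup the child would exec APserver, and the write end of the
pipe would have to be handed to it (for example as its stdout).

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cerrno>
#include <functional>

// Hypothetical helper (not part of GNU APL).
// Forks a child that runs `init` (standing in for server start-up such as
// bind() + listen()), writes one byte to a pipe once init has succeeded,
// and then runs `serve`. The parent blocks until that byte arrives, so it
// never tries to connect before the child is ready.
// Returns the child's pid, or -1 if the child never reported readiness.
pid_t spawn_and_wait_ready(const std::function<bool()>& init,
                           const std::function<void()>& serve)
{
    int fds[2];
    if (::pipe(fds) != 0) return -1;

    pid_t pid = ::fork();
    if (pid < 0) {
        ::close(fds[0]);
        ::close(fds[1]);
        return -1;
    }

    if (pid == 0) {                    // child
        ::close(fds[0]);
        if (!init()) ::_exit(1);       // exiting closes the pipe: parent sees EOF
        const char ready = 'R';
        if (::write(fds[1], &ready, 1) != 1) ::_exit(1);
        ::close(fds[1]);
        serve();
        ::_exit(0);
    }

    ::close(fds[1]);                   // parent keeps only the read end
    char buf = 0;
    ssize_t n;
    do {
        n = ::read(fds[0], &buf, 1);   // blocks until 'R' or EOF
    } while (n < 0 && errno == EINTR);
    ::close(fds[0]);

    if (n == 1 && buf == 'R') return pid;

    ::waitpid(pid, nullptr, 0);        // reap the child that failed to start
    return -1;
}
```

The point of the pipe is that a failed start-up is also detected: if the
child dies before writing, the parent's read() returns end-of-file instead
of hanging.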

I'm sure there are other ways to handle it as well. At least we know what
the problem is now. :-)
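
For comparison, the retry option from the list could look roughly like the
sketch below. Again this is only an illustration under my own assumptions:
connect_with_retry, the attempt count and the delay are made-up names and
values, not anything that exists in the apl sources.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <chrono>
#include <thread>

// Hypothetical helper (not part of GNU APL).
// Tries to connect to a TCP server on 127.0.0.1:port, making up to
// max_attempts attempts and sleeping `delay` between failed attempts.
// Returns a connected socket descriptor, or -1 if every attempt failed.
int connect_with_retry(unsigned short port, int max_attempts,
                       std::chrono::milliseconds delay)
{
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        // Use a fresh socket per attempt: after a failed connect() the
        // state of the old socket is unspecified.
        int fd = ::socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) return -1;

        if (::connect(fd, reinterpret_cast<sockaddr*>(&addr),
                      sizeof(addr)) == 0)
            return fd;                 // server was ready

        ::close(fd);
        if (attempt + 1 < max_attempts)
            std::this_thread::sleep_for(delay);
    }
    return -1;                         // gave up
}
```

Something like connect_with_retry(16366, 20, std::chrono::milliseconds(100))
would keep trying for about two seconds. It is simpler than a handshake, but
it still guesses at a timeout rather than knowing when the server is ready.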

Regards,
Elias


On 31 July 2014 10:43, Elias Mårtenson <loke...@gmail.com> wrote:

> I've checked, and here are the results. I noticed that sometimes the
> APserver gets killed when I )OFF the interpreter, and sometimes it
> doesn't.
>
> $ *dist/bin/apl --silent -l 37*
> sizeof(Svar_record) is    328
> sizeof(Svar_partner) is   28
>
> initializing paths from argv[0] = dist/bin/apl
> initializing paths from  $PWD = /home/emartenson/src/apl
> APL_bin_path is: ./dist/bin
> APL_bin_name is: apl
> Reading config file
> /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
> config file /home/emartenson/.config/gnu-apl/preferences is not
> present/readable
> 0 input files:
> Using TCP socket towards APserver...
> connecting to 127.0.0.1 TCP port 16366
>     (this is expected to fail, unless APserver was started manually)
> forking new APserver listening on 127.0.0.1 TCP port 16366
> connecting to 127.0.0.1 TCP port 16366
>     (this is supposed to succeed.)
> ::connect() to existing APserver failed: Connection refused
> PID is 22704
> argc: 4
>   argv[0]: 'dist/bin/apl'
>   argv[1]: '--silent'
>   argv[2]: '-l'
>   argv[3]: '37'
> uprefs.user_do_svars:   1
> uprefs.system_do_svars: 1
> uprefs.requested_id:    0
> uprefs.requested_par:   0
> Svar_DB not connected in Svar_DB::is_registered_id()
> id.proc: 1001 at ProcessorID.cc:77
> Processor ID was completely initialized: 1001:0:0
> system_do_svars is: 1
>
> Then, I check for listeners from another terminal:
>
> $ *netstat -an | grep 16366*
> tcp        0      0 127.0.0.1:16366         0.0.0.0:*               LISTEN
> $ *ps -ef | grep AP*
> emarten+ 22712     1  0 10:34 pts/3    00:00:00 ./dist/bin/APserver --port 16366
> emarten+ 22733 28324  0 10:36 pts/1    00:00:00 grep AP
>
> I then quit the APL session:
>
> *)off*
>
> And then check connections again:
>
> $ *ps -ef | grep AP*
> emarten+ 22712     1  0 10:34 pts/3    00:00:00 ./dist/bin/APserver --port 16366
> emarten+ 22750 28324  0 10:38 pts/1    00:00:00 grep AP
> em-desktop$ *netstat -an | grep 16366*
> tcp        0      0 127.0.0.1:16366         0.0.0.0:*               LISTEN
>
> As we can see, the APserver is still listening.
>
> I now try to start the APL interpreter again, and it properly connects to
> the *old* APserver:
>
> $ *dist/bin/apl --silent -l 37*
> sizeof(Svar_record) is    328
> sizeof(Svar_partner) is   28
>
> initializing paths from argv[0] = dist/bin/apl
> initializing paths from  $PWD = /home/emartenson/src/apl
> APL_bin_path is: ./dist/bin
> APL_bin_name is: apl
> Reading config file
> /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
> config file /home/emartenson/.config/gnu-apl/preferences is not
> present/readable
> 0 input files:
> Using TCP socket towards APserver...
> connected to APserver, socket is 3
> using Svar_DB on APserver!
> PID is 22768
> argc: 4
>   argv[0]: 'dist/bin/apl'
>   argv[1]: '--silent'
>   argv[2]: '-l'
>   argv[3]: '37'
> uprefs.user_do_svars:   1
> uprefs.system_do_svars: 1
> uprefs.requested_id:    0
> uprefs.requested_par:   0
> id.proc: 1001 at ProcessorID.cc:77
> Processor ID was completely initialized: 1001:0:0
> system_do_svars is: 1
>
> We can see that it's actually connected by checking the APserver status
> again:
>
> $ *netstat -an | grep 16366*
> tcp        0      0 127.0.0.1:16366         0.0.0.0:*               LISTEN
> tcp        0      0 127.0.0.1:44102         127.0.0.1:16366         ESTABLISHED
> tcp        0      0 127.0.0.1:16366         127.0.0.1:44102         ESTABLISHED
> em-desktop$ *ps -ef | grep AP*
> emarten+ 22712     1  0 10:34 pts/3    00:00:00 ./dist/bin/APserver --port 16366
> emarten+ 22782 28324  0 10:40 pts/1    00:00:00 grep AP
>
> Now, let's )OFF the interpreter, which promptly kills the APserver that
> was originally started in the first invocation of apl:
>
> $ *netstat -an | grep 16366*
> tcp        0      0 127.0.0.1:44102         127.0.0.1:16366         TIME_WAIT
> em-desktop$ *ps -ef | grep AP*
> emarten+ 22790 28324  0 10:41 pts/1    00:00:00 grep AP
>
> Regards,
> Elias
>
>
> On 29 July 2014 21:17, Elias Mårtenson <loke...@gmail.com> wrote:
>
>> I will definitely check this when I get back to the office tomorrow. I'll
>> keep you posted.
>>
>> Thanks and regards,
>> Elias
>>
>>
>> On 29 July 2014 21:13, Juergen Sauermann <juergen.sauerm...@t-online.de>
>> wrote:
>>
>>>  Hi,
>>>
>>> that makes me think that APserver is listening on a different socket
>>> type than the one apl is using.
>>> Therefore, run netstat -l -p to see where APserver listens and apl -l 37
>>> to see where apl tries to connect.
>>>
>>> /// Jürgen
>>>
>>>
>>>
>>>
>>> On 07/29/2014 03:07 PM, Elias Mårtenson wrote:
>>>
>>> I don't think so. The APserver is definitely started. Also, if I start
>>> another apl it's able to connect to the previous one.
>>>
>>> My theory is the same as before: I think that apl attempts to connect to
>>> APserver before it's ready to accept connections.
>>>
>>> Also, given the fact that apl never connects to APserver, it's not very
>>> strange that it's not killed when apl exits.
>>>
>>> In the case where I start a second apl that connects to the first
>>> APserver, it does get killed properly.
>>>
>>> Regards,
>>> Elias
>>> On 29 Jul 2014 21:02, "Juergen Sauermann" <juergen.sauerm...@t-online.de>
>>> wrote:
>>>
>>>>  Hi Elias,
>>>>
>>>> looks like either no APserver is running or the APserver listens on
>>>> another socket.
>>>> Check with netstat -l -p. That should show a line like:
>>>>
>>>> tcp        0      0 localhost:16366         *:*                     LISTEN      2631/APserver
>>>>
>>>> If the APserver does not get killed then this is the problem I had
>>>> earlier but could not reproduce.
>>>> If you can reproduce it, please uncomment the *#define USE_POLL* at
>>>> the beginning of *APserver.cc*
>>>> and reinstall. That will tell us if *poll()* works better than
>>>> *select()*. If not, we could try tcp_keepalive to
>>>> see if that works better.
>>>>
>>>> /// Jürgen
>>>>
>>>> On 07/29/2014 05:27 AM, Elias Mårtenson wrote:
>>>>
>>>>  The following happens on my Arch Linux system.
>>>>
>>>>  When I start the apl binary (without Emacs) I'm getting a "connection
>>>> refused" error. The log with *-l 37* is reproduced below.
>>>>
>>>>  The APserver is properly started (I can see it in the process
>>>> listing), but after I call )OFF, it doesn't get killed.
>>>>
>>>>  Note that if I start APserver separately, I do not get any errors,
>>>> and everything seems to work correctly.
>>>>
>>>>  Here's the output from -l 37 (errors highlighted in red):
>>>>
>>>>  $ *dist/bin/apl -l 37 --silent*
>>>> sizeof(Svar_record) is    328
>>>> sizeof(Svar_partner) is   28
>>>>
>>>>  initializing paths from argv[0] = dist/bin/apl
>>>> initializing paths from  $PWD = /home/emartenson/src/apl
>>>> APL_bin_path is: ./dist/bin
>>>> APL_bin_name is: apl
>>>> Reading config file
>>>> /home/emartenson/src/apl/dist/etc/gnu-apl.d/preferences ...
>>>> config file /home/emartenson/.config/gnu-apl/preferences is not
>>>> present/readable
>>>> 0 input files:
>>>> Using TCP socket towards APserver...
>>>> connecting to 127.0.0.1 TCP port 16366
>>>>     (this is expected to fail, unless APserver was started manually)
>>>> forking new APserver listening on 127.0.0.1 TCP port 16366
>>>> connecting to 127.0.0.1 TCP port 16366
>>>>     (this is supposed to succeed.)
>>>> ::connect() to existing APserver failed: Connection refused
>>>> PID is 24054
>>>> argc: 4
>>>>   argv[0]: 'dist/bin/apl'
>>>>   argv[1]: '-l'
>>>>   argv[2]: '37'
>>>>   argv[3]: '--silent'
>>>> uprefs.user_do_svars:   1
>>>> uprefs.system_do_svars: 1
>>>> uprefs.requested_id:    0
>>>> uprefs.requested_par:   0
>>>> Svar_DB not connected in Svar_DB::is_registered_id()
>>>> id.proc: 1001 at ProcessorID.cc:77
>>>> Processor ID was completely initialized: 1001:0:0
>>>> system_do_svars is: 1
>>>> *      ⎕SVQ⍳0*
>>>> Svar_DB not connected in Svar_DB::get_offering_processors()
>>>> 100 210
>>>>
>>>>  Regards.
>>>> Elias
>>>>
>>>>
>>>>
>>>
>>
>
