Hi Pat,

From my point of view the issue is philosophical, because practically speaking I couldn't reproduce the orphans without doing something a little odd, namely ssh ... && sleep 1d. Although the fact that a user reported it suggests that it may occur in the real world as well. The question is this: should an application explicitly clean up the resources it allocates? Or should it rely on the user not only knowing that there is the potential for a resource leak, but also knowing enough to do the right thing to avoid it (e.g. ssh -tt ...)?

In my opinion, as a matter of principle, if PV spawns a process it should explicitly clean it up, and there should be no way it can become an orphan. In this case the fact that the orphan can hold ports open is particularly insidious, because further connection attempts on those ports fail with no helpful error information. Also, it is not very difficult to clean up a spawned process. What it comes down to is a little bookkeeping to hang on to the QProcess handle and a few lines of code called from the pqCommandServerStartup destructor to make certain it's cleaned up. This is from the patch I submitted when I filed the bug report:

+    // close running process
+    if (this->Process->state()==QProcess::Running)
+      {
+      this->Process->close();
+      }
+    // free the object
+    delete this->Process;
+    this->Process=NULL;

I think if the cluster admins out there knew which ssh options (GatewayPorts, etc.) are important for ParaView to work seamlessly, then they might be willing to open them up. It's my impression that the folks who build clusters want tools like PV to be easy to use, but they don't necessarily know all the ins and outs of configuring and running PV.

Thanks for looking at this again! The -tt option to ssh is indeed a good find.

Burlen

pat marion wrote:
Hi all!

I'm bringing this thread back: I have learned a couple of new things...

-----------------------
No more orphans:

Here is an easy way to create an orphan:

   $ ssh localhost sleep 1d
   $ <press control c>

The ssh process is cleaned up, but sshd orphans the sleep process. You can avoid this by adding '-t' to ssh:

  $ ssh -t localhost sleep 1d

Works like a charm! But then there is another problem... try the same command from ParaView (using QProcess) and it still leaves an orphan, doh! Go back and re-read ssh's man page and you have the solution: use '-t' twice, i.e. ssh -tt. A single -t only requests a tty when ssh itself has one; run from QProcess there is no local tty, so the allocation has to be forced.
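
To double-check, assuming the same sleep test as above, here is the fixed command plus one quick (suggested) way to verify nothing is left behind:

   $ ssh -tt localhost sleep 1d
   $ <press control c>
   $ ps -ef | grep sleep

No orphaned sleep should show up this time.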

-------------------------
GatewayPorts and portfwd workaround:

In this scenario we have 3 machines: workstation, service-node, and compute-node. I want to ssh from workstation to service-node and submit a job that will run pvserver on compute-node. When pvserver starts on compute-node I want it to reverse connect to service-node and I want service-node to forward the connection to workstation. So here I go:

   $ ssh -R 11111:localhost:11111 service-node qsub start_pvserver.sh

Oops, the qsub command returns immediately and closes my ssh tunnel. Let's pretend that the scheduler doesn't provide an easy way to keep the command alive, so I have resorted to using 'sleep 1d'. So here I go, using -tt to prevent orphans:

   $ ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh && sleep 1d"

Well, this will only work if GatewayPorts is enabled ("GatewayPorts yes") in sshd_config on service-node. If GatewayPorts is not enabled, the ssh tunnel will only accept connections from localhost; it will not accept a connection from compute-node. We can ask the sysadmin to enable GatewayPorts, or we can use portfwd. You can run portfwd on service-node to forward port 22222 to port 11111, then have compute-node connect to service-node:22222. So your job script would launch pvserver like this:

  pvserver -rc -ch=service-node -sp=22222

Problem solved! Also convenient: we can use portfwd to replace 'sleep 1d'. So the final command, executed by the ParaView client:

   $ ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh && portfwd -g -c fwd.cfg"

Where fwd.cfg contains (listen on TCP port 22222, forward to localhost:11111):

  tcp { 22222 { => localhost:11111 } }


Hope this helps!

Pat

On Fri, Feb 12, 2010 at 7:06 PM, burlen <burlen.lor...@gmail.com> wrote:


        Incidentally, this brings up an interesting point about
        ParaView with client/server. It doesn't try to clean up its
        child processes, AFAIK. For example, if you set up this ssh
        tunnel inside the ParaView GUI (e.g., using a command instead
        of a manual connection), and you cancel the connection, it
        will leave the ssh running. You have to track down the ssh
        process and kill it yourself. It's a minor thing, but it can
        also prevent future connections if you don't realize there's a
        zombie ssh that kept your ports open.

    I attempted to reproduce on my Kubuntu 9.10, Qt 4.5.2 system, with
    slightly different results, which may be Qt/distro/OS specific.

    On my system, as long as the process ParaView spawns finishes on
    its own there is no problem. That's usually how one would expect
    things to work out, since when the client disconnects the server
    exits, followed by ssh. But you are right that PV never explicitly
    kills or otherwise cleans up after the process it starts. So if the
    spawned process for some reason doesn't finish, orphan processes
    are introduced.

    I was able to produce orphan ssh processes by giving the PV client
    a server startup command that doesn't finish, e.g.

      ssh ... pvserver ... && sleep 100d

    I get the situation you described, which prevents further
    connections on the same ports. Once PV tries and fails to connect
    on the open ports, there is a crash soon after.

    I filed a bug report with a patch:
    http://www.paraview.org/Bug/view.php?id=10283



    Sean Ziegeler wrote:

        Most batch systems have an option to wait until the job is
        finished before the submit command returns.  I know PBS uses
        "-W block=true" and that SGE and LSF have similar options (but
        I don't recall the precise flags).
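
        For PBS, that would be something along these lines
        (illustrative, reusing the job script name from above):

          $ qsub -W block=true start_pvserver.sh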

        If your batch system doesn't provide that, I'd recommend
        adding some shell scripting to loop through checking the queue
        for job completion and not return until it's done.  The sleep
        thing would work, but wouldn't exit when the server finishes,
        leaving the ssh tunnels (and other things like portfwd if you
        put them in your scripts) lying around.
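
        A minimal sketch of that fallback, assuming a PBS-style qstat
        that stops listing the job once it has finished:

          jobid=$(qsub start_pvserver.sh)
          while qstat "$jobid" >/dev/null 2>&1; do
            sleep 30   # poll the queue every 30 seconds
          done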

        Incidentally, this brings up an interesting point about
        ParaView with client/server. It doesn't try to clean up its
        child processes, AFAIK. For example, if you set up this ssh
        tunnel inside the ParaView GUI (e.g., using a command instead
        of a manual connection), and you cancel the connection, it
        will leave the ssh running. You have to track down the ssh
        process and kill it yourself. It's a minor thing, but it can
        also prevent future connections if you don't realize there's a
        zombie ssh that kept your ports open.


        On 02/08/10 21:03, burlen wrote:

            I am curious to hear what Sean has to say.

            But, say the batch system returns right away after the job
            is submitted; I think we can doctor the command so that it
            will live for a while longer. What about something like
            this:

            ssh -R XXXX:localhost:YYYY remote_machine "submit_my_job.sh && sleep 100d"


            pat marion wrote:

                Hey, just checked out the wiki page, nice! One question:
                wouldn't this command hang up and close the tunnel after
                submitting the job?

                ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh

                Pat

                On Mon, Feb 8, 2010 at 8:12 PM, pat marion
                <pat.mar...@kitware.com> wrote:

                Actually I didn't write the notes at the hpc.mil link.

                Here is something, and maybe this is the problem that
                Sean refers to: in some cases, when I have set up a
                reverse ssh tunnel from login node to workstation
                (command executed from workstation), the forward does
                not work when the compute node connects to the login
                node. However, if I have the compute node connect to
                the login node on port 33333, then use portfwd to
                forward that to localhost:11111, where the ssh tunnel
                is listening on port 11111, it works like a charm. The
                portfwd tricks it into thinking the connection is
                coming from localhost, which allows the ssh tunnel to
                work. Hope that made a little sense...
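
                In portfwd's config syntax (the same form as the
                fwd.cfg shown above), that forward would presumably be:

                  tcp { 33333 { => localhost:11111 } }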

                Pat


                On Mon, Feb 8, 2010 at 6:29 PM, burlen
                <burlen.lor...@gmail.com> wrote:

                Nice, thanks for the clarification. I am guessing that
                your
                example should probably be the recommended approach rather
                than the portfwd method suggested on the PV wiki. :) I
                took
                the initiative to add it to the Wiki. KW let me know
                if this
                is not the case!

                
http://paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_connection_over_an_ssh_tunnel



                Would you mind taking a look to be sure I didn't miss
                anything
                or bollix it up?

                The sshd config options you mentioned may be why your
                method doesn't work on the Pleiades system; either
                that, or there is a firewall between the front ends and
                compute nodes. In either case I doubt the NAS sys
                admins are going to reconfigure for me :) So at least
                for now I'm stuck with the two-hop ssh tunnels and
                interactive batch jobs. If there were some way to
                script the ssh tunnel in my batch script I would be
                golden...

                By the way, I put the details of the two-hop ssh tunnel
                on the wiki as well, and a link to Pat's hpc.mil notes.
                I don't dare try to summarize them since I've never
                used portfwd and it refuses to compile on both my
                workstation and the cluster.

                Hopefully putting these notes on the Wiki will save future
                ParaView users some time and headaches.


                Sean Ziegeler wrote:

                Not quite: the pvsc calls ssh with both the tunnel options
                and the commands to submit the batch job. You don't even
                need a pvsc; it just makes the interface fancier. As long
                as you or PV executes something like this from your
                machine:
                ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh

                This means that port XXXX on remote_machine will be the
                port to which the server must connect. Port YYYY (e.g.,
                11111) on your client machine is the one on which PV
                listens. You'd have to tell the server (in the batch
                submission script, for example) the name of the node and
                port XXXX to which to connect.
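
                With the pvserver flags used elsewhere in this thread,
                telling it that would look something like:

                  pvserver -rc -ch=remote_machine -sp=XXXX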

                One caveat that might be causing you problems: port
                forwarding (and "gateway ports" if the server is running
                on a different node than the login node) must be enabled
                in the remote_machine's sshd_config. If not, no ssh
                tunnels will work at all (see: man ssh and man
                sshd_config). That's something that an administrator
                would need to set up for you.

                On 02/08/10 12:26, burlen wrote:

                So, to be sure about what you're saying: your .pvsc
                script ssh's to the front end and submits a batch job;
                when it's scheduled, your batch script creates a -R
                style tunnel and starts pvserver using PV reverse
                connection? Or are you using portfwd or a second ssh
                session to establish the tunnel?

                If you're doing this all from your .pvsc script without
                a second ssh session and/or portfwd, that's awesome! I
                haven't been able to script this; something about the
                batch system prevents the tunnel created within the
                batch job's ssh session from working. I don't know if
                that's particular to this system or a general fact of
                life about batch systems.

                Question: How are you creating the tunnel in your
                batch script?

                Sean Ziegeler wrote:

                Both ways will work for me in most cases, i.e. a
                "forward" connection
                with ssh -L or a reverse connection with ssh -R.
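
                For reference, with the same placeholder ports, the two
                variants look roughly like:

                  ssh -L YYYY:localhost:XXXX remote_machine   (forward)
                  ssh -R XXXX:localhost:YYYY remote_machine   (reverse)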

                However, I find that the reverse method is more
                scriptable. You can
                set up a .pvsc file that the client can load and
                will call ssh with
                the appropriate options and commands for the
                remote host, all from the
                GUI. The client will simply wait for the reverse
                connection from the
                server, whether it takes 5 seconds or 5 hours for
                the server to get
                through the batch queue.

                Using the forward connection method, if the server
                isn't started soon
                enough, the client will attempt to connect and
                then fail. I've always
                had to log in separately, wait for the server to
                start running, then
                tell my client to connect.

                -Sean

                On 02/06/10 12:58, burlen wrote:

                Hi Pat,

                My bad. I was looking at the PV wiki, and thought you
                were talking about doing this without an ssh tunnel,
                using only port forwarding and paraview's
                --reverse-connection option. Now that I am reading your
                hpc.mil post I see what you mean :)

                Burlen


                pat marion wrote:

                Maybe I'm misunderstanding what you mean
                by local firewall, but
                usually as long as you can ssh from your
                workstation to the login node
                you can use a reverse ssh tunnel.



_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the ParaView Wiki at: 
http://paraview.org/Wiki/ParaView

Follow this link to subscribe/unsubscribe:
http://www.paraview.org/mailman/listinfo/paraview
