Hey Burlen, on the bug report page for 10283, I think you need to fix the command line you are testing with:
$ ssh remote cmd1 && cmd2
will execute cmd1 on remote and cmd2 locally. It should be:
$ ssh remote "cmd1 && cmd2"
Pat
On Fri, Apr 30, 2010 at 9:12 AM, pat marion <pat.mar...@kitware.com> wrote:
I have applied your patch. I agree that paraview should explicitly close the child process. But what I am pointing out is that calling QProcess::close() does not help in this situation. What I am saying is that, even when paraview does kill the process, any commands run by ssh on the other side of the netpipe will be orphaned by sshd. Are you sure you can't reproduce it?
$ ssh localhost sleep 1d
$ < press control-c >
$ pidof sleep
$ # sleep is still running
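To see the same thing outside of paraview, here is a minimal standalone Qt sketch (an illustration, not ParaView's actual startup code):

#include <QCoreApplication>
#include <QProcess>

int main(int argc, char** argv)
{
  QCoreApplication app(argc, argv);

  // launch ssh the way a QProcess-based server startup would
  QProcess ssh;
  ssh.start("ssh", QStringList() << "localhost" << "sleep" << "1d");
  ssh.waitForStarted();

  // terminate the local ssh client; this is what QProcess::close() does
  ssh.close();

  // the local ssh process is gone, but sshd has already forked
  // 'sleep 1d' on the remote side; without a pty (-t/-tt) it never
  // receives a SIGHUP, so it is left running as an orphan
  return 0;
}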
Pat
On Fri, Apr 30, 2010 at 2:08 AM, burlen <burlen.lor...@gmail.com> wrote:
Hi Pat,
From my point of view the issue is philosophical, because practically speaking I couldn't reproduce the orphans without doing something a little odd, namely ssh ... && sleep 1d. Although the fact that a user reported it suggests that it may occur in the real world as well. The question is this: should an application explicitly clean up resources it allocates? Or should an application rely on the user not only knowing that there is the potential for a resource leak but also knowing enough to do the right thing to avoid it (e.g. ssh -tt ...)? In my opinion, as a matter of principle, if PV spawns a process it should explicitly clean it up and there should be no way it can become an orphan. In this case the fact that the orphan can hold ports open is particularly insidious, because further connection attempts on that port fail with no helpful error information. Also, it is not very difficult to clean up a spawned process. What it comes down to is a little bookkeeping to hang on to the QProcess handle and a few lines of code called from the pqCommandServerStartup destructor to make certain it's cleaned up. This is from the patch I submitted when I filed the bug report.
+  // close running process
+  if (this->Process->state() == QProcess::Running)
+    {
+    this->Process->close();
+    }
+  // free the object
+  delete this->Process;
+  this->Process = NULL;
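In context, the bookkeeping amounts to something like this in the destructor (a sketch of where the patch hooks in, not the verbatim source):

pqCommandServerStartup::~pqCommandServerStartup()
{
  if (this->Process)
    {
    // close running process
    if (this->Process->state() == QProcess::Running)
      {
      this->Process->close();
      }
    // free the object
    delete this->Process;
    this->Process = NULL;
    }
}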
I think if the cluster admins out there knew which ssh options (GatewayPorts etc.) are important for ParaView to work seamlessly, then they might be willing to open them up. It's my impression that the folks that build clusters want tools like PV to be easy to use, but they don't necessarily know all the ins and outs of configuring and running PV.
Thanks for looking at this again! The -tt option to ssh is indeed a good find.
Burlen
pat marion wrote:
Hi all!
I'm bringing this thread back- I have learned a couple of new things...
-----------------------
No more orphans:
Here is an easy way to create an orphan:
$ ssh localhost sleep 1d
$ <press control c>
The ssh process is cleaned up, but sshd orphans the sleep process. You can avoid this by adding '-t' to ssh:
$ ssh -t localhost sleep 1d
Works like a charm! But then there is another problem... run this command from paraview (using QProcess) and it still leaves an orphan, doh! Go back and re-read ssh's man page and you have the solution: use '-t' twice: ssh -tt
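To verify, repeat the earlier test with the extra flag:
$ ssh -tt localhost sleep 1d
$ <press control-c>
$ pidof sleep
$ # no output: with a pty allocated, sshd signals the remote command
$ # when the connection drops, so sleep dies instead of being orphaned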
-------------------------
GatewayPorts and portfwd workaround:
In this scenario we have 3 machines: workstation, service-node, and compute-node. I want to ssh from workstation to service-node and submit a job that will run pvserver on compute-node. When pvserver starts on compute-node I want it to reverse connect to service-node and I want service-node to forward the connection to workstation. So here I go:
$ ssh -R11111:localhost:11111 service-node qsub start_pvserver.sh
Oops, the qsub command returns immediately and closes my ssh tunnel. Let's pretend that the scheduler doesn't provide an easy way to keep the command alive, so I have resorted to using 'sleep 1d'. So here I go, using -tt to prevent orphans:
$ ssh -tt -R11111:localhost:11111 service-node "qsub start_pvserver.sh && sleep 1d"
Well, this will only work if GatewayPorts is enabled in sshd_config on service-node. If GatewayPorts is not enabled, the ssh tunnel will only accept connections from localhost; it will not accept a connection from compute-node. We can ask the sysadmin to enable GatewayPorts, or we could use portfwd.
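For reference, the admin-side fix is a one-line setting (see man sshd_config):
# in /etc/ssh/sshd_config on service-node
GatewayPorts yes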
You can run portfwd on service-node to forward port 22222 to port 11111, then have compute-node connect to service-node:22222. So your job script would launch pvserver like this:
pvserver -rc -ch=service-node -sp=22222
Problem solved! Also convenient: we can use portfwd to replace 'sleep 1d'. So the final command, executed by the paraview client:
ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh && portfwd -g -c fwd.cfg"
Where fwd.cfg contains:
tcp { 22222 { => localhost:11111 } }
Hope this helps!
Pat
On Fri, Feb 12, 2010 at 7:06 PM, burlen <burlen.lor...@gmail.com> wrote:
I attempted to reproduce this on my kubuntu 9.10, qt 4.5.2 system, with slightly different results, which may be qt/distro/os specific. On my system, as long as the process ParaView spawns finishes on its own there is no problem. That's usually how one would expect things to work out, since when the client disconnects the server closes, followed by ssh. But you are right that PV never explicitly kills or otherwise cleans up after the process it starts. So if the spawned process for some reason doesn't finish, orphan processes are introduced.
I was able to produce orphan ssh processes by giving the PV client a server startup command that doesn't finish, e.g.
ssh ... pvserver ... && sleep 100d
I get the situation you described, which prevents further connection on the same ports. Once PV tries and fails to connect on the open ports, there is a crash soon after.
I filed a bug report with a patch:
http://www.paraview.org/Bug/view.php?id=10283
Sean Ziegeler wrote:
Most batch systems have an option to wait until the job is finished before the submit command returns. I know PBS uses "-W block=true" and that SGE and LSF have similar options (but I don't recall the precise flags).
If your batch system doesn't provide that, I'd recommend adding some shell scripting to loop through checking the queue for job completion and not return until it's done. The sleep thing would work, but wouldn't exit when the server finishes, leaving the ssh tunnels (and other things like portfwd if you put them in your scripts) lying around.
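For example, a rough template of that polling idea for PBS (a sketch; job-id parsing and qstat behavior vary by site and scheduler):

#!/bin/sh
# submit the job and capture the job id that qsub prints
JOBID=$(qsub start_pvserver.sh)
# stay alive until the job is no longer in the queue
while qstat "$JOBID" > /dev/null 2>&1; do
    sleep 30
done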
Incidentally, this brings up an interesting point about ParaView with client/server. It doesn't try to clean up its child processes, AFAIK. For example, if you set up this ssh tunnel inside the ParaView GUI (e.g., using a command instead of a manual connection), and you cancel the connection, it will leave the ssh running. You have to track down the ssh process and kill it yourself. It's a minor thing, but it can also prevent future connections if you don't realize there's a zombie ssh that kept your ports open.
On 02/08/10 21:03, burlen wrote:
I am curious to hear what Sean has to say. But, say the batch system returns right away after the job is submitted; I think we can doctor the command so that it will live for a while longer. What about something like this:
ssh -R XXXX:localhost:YYYY remote_machine "submit_my_job.sh && sleep 100d"
pat marion wrote:
Hey just checked out the wiki page, nice! One question, wouldn't this command hang up and close the tunnel after submitting the job?
ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
Pat
On Mon, Feb 8, 2010 at 8:12 PM, pat marion <pat.mar...@kitware.com> wrote:
Actually I didn't write the notes at the hpc.mil link.
Here is something- and maybe this is the problem that Sean refers to- in some cases, when I have set up a reverse ssh tunnel from login node to workstation (command executed from workstation), the forward does not work when the compute node connects to the login node. However, if I have the compute node connect to the login node on port 33333, then use portfwd to forward that to localhost:11111, where the ssh tunnel is listening on port 11111, it works like a charm. The portfwd tricks it into thinking the connection is coming from localhost and allows the ssh tunnel to work. Hope that made a little sense...
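In portfwd's config syntax (same form as the fwd.cfg shown earlier in this thread; an untested sketch by analogy), that forward would be:
tcp { 33333 { => localhost:11111 } }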
Pat
On Mon, Feb 8, 2010 at 6:29 PM, burlen <burlen.lor...@gmail.com> wrote:
Nice, thanks for the clarification. I am guessing that your example should probably be the recommended approach rather than the portfwd method suggested on the PV wiki. :) I took the initiative to add it to the Wiki. KW, let me know if this is not the case!
http://paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_connection_over_an_ssh_tunnel
Would you mind taking a look to be sure I didn't miss anything or bollix it up?
The sshd config options you mentioned may be why your method doesn't work on the Pleiades system; either that or there is a firewall between the front ends and compute nodes. In either case I doubt the NAS sysadmins are going to reconfigure for me :) So at least for now I'm stuck with the two-hop ssh tunnels and interactive batch jobs. If there were some way to script the ssh tunnel in my batch script I would be golden...
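A two-hop tunnel of the kind mentioned above is typically chained like this (hostnames hypothetical; the second ssh runs on the front end):
$ ssh -t -L 11111:localhost:11111 front-end ssh -L 11111:localhost:11111 compute-node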
By the way, I put the details of the two-hop ssh tunnel on the wiki as well, and a link to Pat's hpc.mil notes. I don't dare try to summarize them since I've never used portfwd and it refuses to compile on both my workstation and the cluster.
Hopefully putting these notes on the Wiki will save future ParaView users some time and headaches.
Sean Ziegeler wrote:
Not quite- the pvsc calls ssh with both the tunnel options and the commands to submit the batch job. You don't even need a pvsc; it just makes the interface fancier. As long as you or PV executes something like this from your machine:
ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
This means that port XXXX on remote_machine will be the port to which the server must connect. Port YYYY (e.g., 11111) on your client machine is the one on which PV listens. You'd have to tell the server (in the batch submission script, for example) the name of the node and port XXXX to which to connect.
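Concretely, with XXXX=11111, the batch script would end up launching the server with something like the pvserver invocation used elsewhere in this thread (a sketch):
pvserver -rc -ch=remote_machine -sp=11111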
One caveat that might be causing you problems: port forwarding (and "gateway ports" if the server is running on a different node than the login node) must be enabled in the remote_machine's sshd_config. If not, no ssh tunnels will work at all (see: man ssh and man sshd_config). That's something that an administrator would need to set up for you.
On 02/08/10 12:26, burlen wrote:
So to be sure about what you're saying: your .pvsc script ssh's to the front end and submits a batch job, and when it's scheduled, your batch script creates a -R style tunnel and starts pvserver using PV reverse connection? Or are you using portfwd or a second ssh session to establish the tunnel?
If you're doing this all from your .pvsc script without a second ssh session and/or portfwd, that's awesome! I haven't been able to script this; something about the batch system prevents the tunnel created within the batch job's ssh session from working. I don't know if that's particular to this system or a general fact of life about batch systems.
Question: How are you creating the tunnel in your batch script?
Sean Ziegeler wrote:
Both ways will work for me in most cases, i.e. a "forward" connection with ssh -L or a reverse connection with ssh -R. However, I find that the reverse method is more scriptable. You can set up a .pvsc file that the client can load and will call ssh with the appropriate options and commands for the remote host, all from the GUI. The client will simply wait for the reverse connection from the server, whether it takes 5 seconds or 5 hours for the server to get through the batch queue.
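As a sketch, a .pvsc entry along those lines might look roughly like this (reconstructed from memory of the ParaView server-configuration format; check the ParaView wiki for the exact schema before using):
<Servers>
  <Server name="cluster-reverse" resource="csrc://localhost:11111">
    <CommandStartup>
      <Command exec="ssh" timeout="0" delay="0">
        <Arguments>
          <Argument value="-R"/>
          <Argument value="11111:localhost:11111"/>
          <Argument value="remote_machine"/>
          <Argument value="submit_my_job.sh"/>
        </Arguments>
      </Command>
    </CommandStartup>
  </Server>
</Servers>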
Using the forward connection method, if the server isn't started soon enough, the client will attempt to connect and then fail. I've always had to log in separately, wait for the server to start running, then tell my client to connect.
-Sean
On 02/06/10 12:58, burlen wrote:
Hi Pat,
My bad. I was looking at the PV wiki, and thought you were talking about doing this without an ssh tunnel, using only port forwarding and paraview's --reverse-connection option. Now that I am reading your hpc.mil post I see what you mean :)
Burlen
pat marion wrote:
Maybe I'm misunderstanding what you mean by local firewall, but usually as long as you can ssh from your workstation to the login node you can use a reverse ssh tunnel.