Bugs item #844704, was opened at 2003-11-18 13:29
Message generated for change (Comment added) made by bernardli
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=109368&aid=844704&group_id=9368

Category: Documentation
Group: 2.4
Status: Open
Resolution: None
Priority: 9
Submitted By: Jason Brechin (brechin)
Assigned to: Bernard Li (bernardli)
Summary: Install package fails

Initial Comment:
=============================================================================
== Running step 2 of the OSCAR wizard: Configure selected OSCAR packages
=============================================================================

--> About to run /opt/oscar/packages/kernel_picker/scripts/pre_configure for kernel_picker
warning: /tftpboot/rpm/redhat-release-9-3.i386.rpm: V3 DSA signature: NOKEY, key ID db42a60e
 [OSCAR::PackageBest :: Line 407] Reading package directory
 [OSCAR::PackageBest :: Line 419] Reading cache file.
 [OSCAR::PackageBest :: Line 432] Comparing cache to directory.
 [OSCAR::PackageBest :: Line 457] Writing new cache file.
62750 blocks
--> About to run /opt/oscar/packages/switcher/scripts/pre_configure for switcher
--> About to run /opt/oscar/packages/switcher/scripts/post_configure for switcher
Setting default for tag mpi ("lam-7.0")
Attribute successfully set; new attribute setting will be effective for
future shells
--> Step 2: Completed successfully
executing:/opt/c3-4/cexec --pipe c3cmd-filter hostname
oscar_cluster oscarnode1.ncsa.uiuc.edu: Warning: No xauth data; using fake authentication data for X11 forwarding.
Warning: sanity check failed.


----------------------------------------------------------------------

>Comment By: Bernard Li (bernardli)
Date: 2003-11-25 19:01

Message:
Logged In: YES 
user_id=879102

Added a blurb about using ssh -x in the release notes.

----------------------------------------------------------------------

Comment By: Jeff Squyres (jsquyres)
Date: 2003-11-25 11:47

Message:
Logged In: YES 
user_id=11722

Add a note in 3.0 release notes about this issue -- if ssh
is broken, then c3 will be broken, etc.  In particular, if
there are warnings/errors coming from ssh, then c3
functionality will be broken (e.g., package
install/uninstall may not run properly).  And perhaps using
"ssh -x" would help (e.g., setenv C3_RSH to "ssh -x").

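For sh-style shells, the workaround suggested above might look like
the following minimal sketch. It assumes C3 honors the C3_RSH
variable as described in this comment; the csh form is the setenv
call already mentioned.

```shell
# Ask C3 to invoke ssh with X11 forwarding disabled (-x), so that
# xauth warnings should no longer show up in the output of cexec.
# csh/tcsh equivalent: setenv C3_RSH "ssh -x"
C3_RSH="ssh -x"
export C3_RSH
```
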
----------------------------------------------------------------------

Comment By: Jason Brechin (brechin)
Date: 2003-11-20 12:05

Message:
Logged In: YES 
user_id=274641

Glad you mentioned that, since I wrote a tool to manage 
c3.conf named (go figure) c3_conf_manager.  It has the 
ability to list the clusters defined, list the nodes in a cluster 
(no dead nodes), add a cluster (with its nodes, and with or 
without 0-indexing), or remove a cluster.

This tool has been mentioned before, and is distributed with 
the gm and pvfs packages.  It effectively will give you the list 
of nodes that should be up.

----------------------------------------------------------------------

Comment By: John (muglerj)
Date: 2003-11-20 11:54

Message:
Logged In: YES 
user_id=505737


Yeah, I agree that the `cexec --pipe hostname | grep -c
'oscar_cluster'` idea would be a nice sanity check. But if you
think about it, the node listing that you can get from
systemimager may not match what is actually available or
running.

Also, there does not seem to be a c3 command capable of
giving back a listing of non 'dead' nodes from the c3.conf
file (cname, clist, and the like). If you have an idea for
listing out what is supposed to be up or online, send it. I
don't know if using a monitoring tool like clumon or ganglia
is appropriate here. I don't think it is.

----------------------------------------------------------------------

Comment By: Jason Brechin (brechin)
Date: 2003-11-20 09:31

Message:
Logged In: YES 
user_id=274641

Well... this may be a doc fix after all.  It seems that the 
xauth errors were due to me trying to do the install remotely, 
manually setting DISPLAY (instead of just tunneling X).  If I 
tunnel X, then it continues as expected.

Note that everything worked in this situation before.

Also note that the sanity check will fail due to the 
switcher errors as well, like "switcher:mpi: Cannot find modulefile for 
lam-7.0 -- skipping".

Neither of these situations really indicates a problem with the 
cluster.

As for a good method of checking whether nodes are alive, 
why not do something like what I suggested - check whether 
`cexec --pipe hostname | grep -c 'oscar_cluster'` is equal to 
the number of clients.  That not only checks to see whether 
commands can work, but whether or not ALL the nodes are 
up.  This doesn't solve everything since you'd still be using 
c3cmd-filter to see if the other commands you run succeed, 
but it would make for a good sanity check.
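
A minimal sketch of that check, written as a shell function that
reads the cexec output on stdin so it can be tried without a
cluster. The function name all_nodes_answered and the NUM_CLIENTS
variable are hypothetical; only the cexec/grep pipeline comes from
the suggestion above.

```shell
# Hypothetical helper: succeed only if the number of lines tagged
# with the cluster name equals the expected number of clients.
#   $1    = expected number of clients
#   stdin = output of "cexec --pipe hostname"
all_nodes_answered() {
    expected=$1
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the count without aborting under "set -e".
    answered=$(grep -c 'oscar_cluster' || true)
    [ "$answered" -eq "$expected" ]
}

# Usage (assumes cexec is on PATH and NUM_CLIENTS is set by the caller):
#   cexec --pipe hostname | all_nodes_answered "$NUM_CLIENTS" \
#       || echo "Warning: sanity check failed."
```

If a node prints extra lines that also carry the cluster name (an ssh
warning, for instance), the count would be inflated, so this only
works as a sanity check when ssh itself is quiet.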

----------------------------------------------------------------------

Comment By: Thomas Naughton (naughtont)
Date: 2003-11-19 14:49

Message:
Logged In: YES 
user_id=288102

Yes, indeed this warning causes problems with the heuristic
used to try and detect an error with remote cexec commands.

The problem in PackageInUn.pm actually comes from the way
eval_c3cmd_filter() determines an "error".  Output is
expected to have nothing on the right-hand side (RHS) of the
colon ":" when getting results from the c3cmd-filter.

The cases where "Warning:" is displayed in this RHS are false
errors.  I don't know whether it's worth patching with a
check to ignore certain results, e.g.,
next if( $output =~ /^Warning:/ );

The standard cases, however, are caught, and the actual problem
is not in c3cmd-filter; it's in the usage and assumptions
(heuristics) when using it.

I agree that the long-term fix is to have better error
reporting from C3, and that's something we (ORNL) are going
to have to address.  But in the meantime, what do folks
suggest?  This approach seemed like a reasonable workaround;
maybe it is not?
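
To make the heuristic concrete, here is a rough shell sketch of the
idea. It is not the code in PackageInUn.pm (which is Perl); the
function name find_c3_errors is hypothetical, and the line format
"<cluster> <node>: <text>" is an assumption based on the log in the
initial comment.

```shell
# Hypothetical helper: read c3cmd-filter-style lines on stdin and
# print only the lines that the heuristic would treat as errors.
find_c3_errors() {
    awk '{
        i = index($0, ":")           # first colon ends "<cluster> <node>"
        if (i == 0) next             # no colon: not a result line
        rhs = substr($0, i + 1)      # right-hand side of the colon
        sub(/^[ \t]+/, "", rhs)      # drop leading whitespace
        if (rhs == "") next          # empty RHS: command was silent, OK
        if (rhs ~ /^Warning:/) next  # proposed patch: ignore ssh warnings
        print                        # anything else counts as an error
    }'
}
```

With the "Warning:" line removed from the sketch, the xauth message
from this bug report is flagged as an error, which is the false
positive being discussed.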

----------------------------------------------------------------------

Comment By: John (muglerj)
Date: 2003-11-19 14:40

Message:
Logged In: YES 
user_id=505737

After looking at this for a while, I'm concluding that there
is no bug here. Jason hit a case where his ssh is not
configured properly and spits out warning messages. The sanity
check takes these warning messages and does the right thing:
it fails. Once ssh is properly configured on his test
system, the sanity check should succeed. 

I recommend that this bug be either removed or downgraded
unless there are further comments. 

NOTES:
1. I looked into possibly coding the sanity check another
way, and looking for positive "node alive" messages from a
cexec command. This turns out to be more difficult than it
first seems and may not be doable at all, unless i'm missing
something. 

2. It may be possible to catch specific ssh warning messages
and ignore them. This might not be such a good idea either,
and i think our best bet is to fail gracefully as we are doing. 

----------------------------------------------------------------------

Comment By: Jason Brechin (brechin)
Date: 2003-11-19 07:42

Message:
Logged In: YES 
user_id=274641

Yep... it returns successfully...

[EMAIL PROTECTED] oscar]# ssh oscarnode1 hostname
Warning: No xauth data; using fake authentication data for X11 forwarding.
oscarnode1.ncsa.uiuc.edu
[EMAIL PROTECTED] oscar]# echo $?
0


----------------------------------------------------------------------

Comment By: Benoit des Ligneris (bligneri)
Date: 2003-11-19 07:05

Message:
Logged In: YES 
user_id=179120

This SSH behavior is caused by the fact that the SSH key for
host oscarnode1.ncsa.uiuc.edu has changed and is not the
same as the one in ~/.ssh/known_hosts.

Can this happen because you added or deleted a node?

Anyway, we should remove all the keys when we remove a node
so that this cannot happen (I guess some grepping and
sedding of all the /home/*/.ssh/known_hosts files should do
the trick?).
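
A rough sketch of what that cleanup could look like, as a filter
that reads a known_hosts file on stdin and writes it back without
the entries for one host. The function name drop_host_keys is
hypothetical, and the sketch assumes plain (unhashed) entries of the
form "name1,name2 keytype key".

```shell
# Hypothetical helper: copy known_hosts from stdin to stdout, dropping
# every entry whose comma-separated host list contains exactly $1.
drop_host_keys() {
    awk -v host="$1" '
        /^[ \t]*#/ || /^[ \t]*$/ { print; next }  # keep comments, blanks
        {
            n = split($1, names, ",")             # first field: host names
            for (i = 1; i <= n; i++)
                if (names[i] == host) next        # matching entry: drop it
            print
        }'
}

# Usage (writes to a temporary file first, then replaces the original):
#   drop_host_keys oscarnode1 < ~/.ssh/known_hosts > known_hosts.new \
#       && mv known_hosts.new ~/.ssh/known_hosts
```

Where the installed OpenSSH provides it, "ssh-keygen -R hostname"
does the same job and also copes with hashed entries.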

Same problem if the user already has a known_hosts file
that conflicts with the real SSH key of the host.

Anyway, to reproduce this, simply alter the host key in
.ssh/known_hosts.

However, at some point, 

You can reproduce this ssh behavior

----------------------------------------------------------------------

Comment By: John (muglerj)
Date: 2003-11-18 21:21

Message:
Logged In: YES 
user_id=505737

Well, i do see a warning message:

Warning: No xauth data; using fake authentication data for X11 forwarding.

Does ssh spit back a success return code with this warning?
If it spits back something other than success, the  sanity
check is designed to fail. 

I cannot seem to reproduce this, although i've seen it
before with ssh. I tried zapping my .Xauthority file but no
luck with that. 

----------------------------------------------------------------------
