Thanks very much, Michael, for your detailed analysis!
I will have a look at /var/log/messages.
Here is the situation:
1) All of the nodes already have an operating system that was installed
earlier by an old OSCAR. I now want to reinstall with the new OSCAR because
one disk on the head node crashed.
2) Three days ago, before I had disabled the firewall on the head node, I
found that I could ping the other nodes from the head node once they had
booted from the hard disk (I logged in to each compute node to find its IP
address). However, I couldn't ping the head node from the compute nodes
because of the firewall.
3) As you can tell from the above, I have watched the compute nodes while
they boot. Each node first tries to boot from the first Ethernet card, and I
can see its MAC address on screen (these are the addresses I mentioned
before that I collected manually). The node waits for DHCP and, after a
while, reports that no DHCP server was found or that the DHCP proxy failed.
It then tries to boot from the second Ethernet card. All of the compute
nodes have two Ethernet cards, and I have connected only eth0 to the 3Com
3c17701 4924 switch; the other card is intended to be connected to another
switch to serve as the public network. That attempt fails at once because
those cards are not activated, so the node then boots from the hard disk,
which succeeds.
I agree with you that the problem is in the network setup. However, I am not
familiar with the switch or the network. Given the description above, could
you please give me some hints? Thanks in advance!
On 2/2/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
From looking at your log, it appears that the OSCAR bits are working
ok, and the DHCP server at least isn't dumping any errors to the OSCAR
log. You might check /var/log/messages to see if it is sending any
messages there if it appears to be failing. It will list a [FAILED]
message the first time it tries to shut down the server, because it
probably wasn't on to begin with, but after that it should go up and
down without any problems...
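One way to do that check is to filter the log for DHCP server lines. The following is only a minimal Python sketch: it assumes the DHCP daemon logs under the name "dhcpd" and writes "DHCPDISCOVER from <MAC> via <interface>" lines, and the sample lines, host name and MAC addresses are made up for illustration.

```python
import re

# Matches a colon-separated MAC address such as 00:11:22:33:44:55.
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def dhcp_lines(lines):
    """Return only the log lines that mention dhcpd (assumed daemon name)."""
    return [line.rstrip("\n") for line in lines if "dhcpd" in line]


def discover_macs(lines):
    """Collect the distinct MACs seen in DHCPDISCOVER lines, in first-seen order."""
    seen = []
    for line in dhcp_lines(lines):
        if "DHCPDISCOVER" not in line:
            continue
        match = MAC_RE.search(line)
        if match and match.group(0).lower() not in seen:
            seen.append(match.group(0).lower())
    return seen


# Made-up sample lines for illustration only; real log output will differ.
SAMPLE = [
    "Feb  2 10:15:01 headnode dhcpd: DHCPDISCOVER from 00:11:22:33:44:55 via eth0\n",
    "Feb  2 10:15:02 headnode sshd[1234]: Accepted password for root\n",
    "Feb  2 10:15:03 headnode dhcpd: DHCPDISCOVER from 00:11:22:33:44:55 via eth0\n",
    "Feb  2 10:15:09 headnode dhcpd: DHCPDISCOVER from 00:11:22:33:44:66 via eth0\n",
]

for mac in discover_macs(SAMPLE):
    print(mac)
```

To scan a real log, read the lines of /var/log/messages (as root) and pass them in place of SAMPLE. If no DHCPDISCOVER lines show up while a node is trying to network boot, the requests are probably not reaching the head node at all.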
I still suspect the problem is in the network setup somewhere...
You might want to boot one of the nodes somehow, maybe with a Knoppix
or Ubuntu live CD if you want to avoid installing a full OS on the node
for a simple test, and make sure you have network connectivity between
your nodes.
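A connectivity test of that kind could be scripted roughly as follows. This is a sketch, not part of OSCAR: it assumes the Linux iputils "ping" command is on the PATH, and the addresses in the usage note are hypothetical placeholders for your own head node and compute nodes.

```python
import subprocess


def ping_command(host, count=1, timeout_s=2):
    """Build a Linux ping command: -c is the packet count, -W the reply timeout in seconds."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(host):
    """Return True if a single ping to host gets a reply."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # No ping binary available, so treat the host as unreachable.
        return False
    return result.returncode == 0


def report(hosts):
    """Map each host to True (replied) or False (no reply)."""
    return {host: is_reachable(host) for host in hosts}
```

For example, report(["10.0.0.1", "10.0.0.2"]) run from a live CD on one node would show whether the head node and a neighbour answer. Remember that a firewall can block ping in one direction only, so it is worth running the check from both ends.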
At the very least, I would put a monitor on one of the nodes and watch
it as it boots. It should be trying to DHCP boot, and at this point
it should at least get an IP address since you have set up the head
node with a list of MAC addresses. It will probably even try to pull
the client image, though from what you have said before it sounds like
you still don't have one of those. Anyway, I find it very helpful to
be able to see what is going on in at least one node during setup; it
makes the problem much less of a mystery.
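To illustrate how a list of MAC addresses lets a DHCP server hand out fixed addresses, here is a small Python sketch that prints ISC dhcpd "host" declarations. OSCAR writes its own DHCP configuration, so this only shows the kind of mapping involved; the node names, the 10.0.0.x addresses and the MACs are assumptions made for the example.

```python
import ipaddress


def host_stanza(name, mac, ip):
    """Format one dhcpd host declaration that ties a MAC to a fixed IP."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"}}\n"
    )


def stanzas_for(macs, first_ip="10.0.0.2", prefix="node"):
    """Assign consecutive IPs, starting at first_ip, to the MACs in order."""
    start = ipaddress.IPv4Address(first_ip)
    return "".join(
        host_stanza(f"{prefix}{i + 1}", mac.lower(), start + i)
        for i, mac in enumerate(macs)
    )


# Hypothetical MAC addresses, for illustration only.
print(stanzas_for(["00:11:22:33:44:55", "00:11:22:33:44:66"]))
```

A node whose MAC appears in such a declaration should be offered the matching address when its DHCP request reaches the server, which is why seeing no offer at all points to the network path rather than to the MAC list.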
On 2/1/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> All you need to do to check is to attach it to a monitor and watch as
> it boots. It will say something (or most do anyway) about sending a
> DHCP request and pause to wait for a response.
>
> On 2/1/07, Zhimin Xiong <[EMAIL PROTECTED]> wrote:
> > I am sorry, but I don't know how to check whether a node sends a DHCP
> > request. I am sure, though, that I have set the BIOS of all nodes to
> > boot from the network.
> >
> > The installation log file is attached.
> >
> > On 2/1/07, Michael Edwards <[EMAIL PROTECTED]> wrote:
> > > If you have not successfully built a client image, then there is not
> > > much point in collecting the MAC addresses. I would at least run the
> > > start_over script, if not start from a clean OS install now that you
> > > are starting to get the hang of things.
> > >
> > > Until you get experience with what parts of OSCAR are more optional,
> > > it is best to follow all the steps in order until they have completed
> > > successfully. If you have problems, it is safest to run the
> > > start_over script and try again from the beginning. You also must log
> > > out and log back in again if you want to be sure that your environment
> > > is clean of the trash from the last install.
> > >
> > > Have you hooked a monitor to one of the nodes and made sure that it
> > > is actually sending out a DHCP request? It looks different depending on
> > > what BIOS is used, but it generally lists the MAC address on screen
> > > when it is sending out the DHCP request. If the node is not sending
> > > out a request, the head node cannot receive it...
> > >
> > > On 2/1/07, Zhimin Xiong < [EMAIL PROTECTED]> wrote:
> > > > Thanks a lot to Michael!
> > > >
> > > > Yes, I disabled the firewall. However, I thought the procedure was to
> > > > first collect MAC addresses, then enable install mode, then configure
> > > > the DHCP server, and lastly set up network boot. So maybe that is the
> > > > problem. But after I imported the MACs from the file and then clicked
> > > > "enable install mode" and "setup network boot", the nodes still
> > > > couldn't boot.
> > > >
> > > > Another question is about "Dynamic DHCP update" and "configure DHCP
> > > > server". By default, "Dynamic DHCP update" is checked, so should I
> > > > click "configure DHCP server"? When I clicked "configure DHCP server",
> > > > it seemed that the DHCP server couldn't be started.
> > > >
> > > > The other point is that I activated the public network, so maybe the
> > > > OSCAR head node is not the first DHCP server the nodes see.
> > > >
> > > > Lastly, after I failed to install it and shut down the head node, there
> > > > were some problems when I wanted to install OSCAR again. According to
> > > > the install manual, I need to execute /opt/oscar/scripts/start_over.
> > > > After doing that, I still have trouble with a second installation of
> > > > OSCAR, for example at step 4 (build OSCAR client) and step 5 (define
> > > > OSCAR clients). All in all, the question is whether I need to execute
> > > > the "start_over" script and, if not, whether I need to perform steps
> > > > 0 to 5 again.
> > > >
> > > > Thanks again!
> > > >
> > > > On 2/1/07, Michael Edwards < [EMAIL PROTECTED]> wrote:
> > > > > Did you push the "Setup Network Boot" button on the same window before
> > > > > you started collecting addresses? If this does not work, you can try
> > > > > making a boot CD if your nodes have CD drives.
> > > > >
> > > > > OSCAR's setup more or less assumes that your nodes are networked in
> > > > > such a way that the first DHCP server they can see is the OSCAR head
> > > > > node. If you aren't sure if this is true, try isolating the cluster
> > > > > from the outside network and try network booting again.
> > > > >
> > > > > Also double check and make sure the firewall on the head node is
> > > > > turned off and that there is a working network between your nodes and
> > > > > the head node. I occasionally forget to turn the switch power on, and
> > > > > silly things like that myself.
> > > > >
> > > > > If you want to post a zipped version of your oscarinstall.log file it
> > > > > might be helpful.
> > > > >
> > > > > On 1/31/07, Zhimin Xiong < [EMAIL PROTECTED]> wrote:
> > > > > > Hi, All!
> > > > > >
> > > > > > I am trying to install OSCAR 5 on Fedora Core 5. However, I
> > > > > > encountered a problem when collecting MAC addresses. I couldn't get
> > > > > > any MAC address, and none of the compute nodes could boot from the
> > > > > > network. From the output, it seems that DHCP on the head node has
> > > > > > difficulty starting. I don't know what the core problem is or how
> > > > > > to solve it. Any suggestion is appreciated!
> > > > > >
> > > > > > --
> > > > > > Zhimin Xiong
> > > > > >
> > > > > > State Key Lab. of Polymer Physics & Chemistry
> > > > > > Institute of Chemistry
> > > > > > Chinese Academy of Sciences
> > > > > >
> > > >
> >
-------------------------------------------------------------------------
> > > > > > Using Tomcat but need to do more? Need to support web
services,
> > > > security?
> > > > > > Get stuff done quickly with pre-integrated technology to make
your
> > job
> > > > > > easier.
> > > > > > Download IBM WebSphere Application Server v.1.0.1 based on
Apache
> > > > Geronimo
> > > > > >
> > > >
> >
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> > > > > > _______________________________________________
> > > > > > Oscar-users mailing list
> > > > > > [email protected]
> > > > > >
> > > >
> > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> >
-------------------------------------------------------------------------
> > > > > Using Tomcat but need to do more? Need to support web services,
> > security?
> > > > > Get stuff done quickly with pre-integrated technology to make
your job
> > > > easier.
> > > > > Download IBM WebSphere Application Server v.1.0.1 based on
Apache
> > Geronimo
> > > > >
> > > >
> >
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> > > > > _______________________________________________
> > > > > Oscar-users mailing list
> > > > > [email protected]
> > > > >
> > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Zhimin Xiong
> > > >
> > > > State Key Lab. of Polymer Physics & Chemistry
> > > > Institute of Chemistry
> > > > Chinese Academy of Sciences
> > > >
> >
-------------------------------------------------------------------------
> > > > Using Tomcat but need to do more? Need to support web services,
> > security?
> > > > Get stuff done quickly with pre-integrated technology to make your
job
> > > > easier.
> > > > Download IBM WebSphere Application Server v.1.0.1 based on Apache
> > Geronimo
> > > >
> >
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> > > > _______________________________________________
> > > > Oscar-users mailing list
> > > > [email protected]
> > > >
> > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > > >
> > > >
> > >
> > >
> >
-------------------------------------------------------------------------
> > > Using Tomcat but need to do more? Need to support web services,
security?
> > > Get stuff done quickly with pre-integrated technology to make your
job
> > easier.
> > > Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
> > >
> >
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> > > _______________________________________________
> > > Oscar-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/oscar-users
> > >
> >
> >
> >
> > --
> > Zhimin Xiong
> >
> > State Key Lab. of Polymer Physics & Chemistry
> > Institute of Chemistry
> > Chinese Academy of Sciences
> >
-------------------------------------------------------------------------
> > Using Tomcat but need to do more? Need to support web services,
security?
> > Get stuff done quickly with pre-integrated technology to make your job
> > easier.
> > Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
> >
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> > _______________________________________________
> > Oscar-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/oscar-users
> >
> >
> >
>
-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job
easier.
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Oscar-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/oscar-users
--
Zhimin Xiong
State Key Lab. of Polymer Physics & Chemistry
Institute of Chemistry
Chinese Academy of Sciences