Hi Srimanth,
  Thanks again for your valuable inputs.

a) We changed the JRE before running ambari-server setup.

    Below are the steps we followed:
        1. Installed Ambari.
        2. In ambari.properties, changed the port and set the JDK to 1.7.
        3. Ran ambari setup (did not install jdk1.6).
        4. Started Ambari and chose auto-install of agents, which failed with:

            INFO 2013-07-04 09:38:00,378 security.py:49 - SSL Connect being called.. connecting to the server
            INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect to: https://xxxxx:8441/agent/v1/register/xxxxxx

        5. Stopped the ambari-server, changed the JDK, and ran setup again.
        6. This time it installed jdk 1.6, and the agents registered successfully.

b) Yes, with this URL everything functioned normally. I was able to see all statuses and perform config changes, start/stop, etc. I guess it is a problem with the URL redirect. Extremely sorry about this, but we have removed our old setup and are starting fresh. We need to go to production soon, so we are experimenting rigorously.


c) As I mentioned in my first mail, the .ini was correct after the initial trial. We removed the rpm and retried, but still no luck. It seemed like the first value was cached somewhere.
* We added some logs in main.py, expecting to see them in the agent logs.
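
For point a), one way to tell a TLS problem apart from plain unreachability is to attempt a handshake against the registration port from the agent host. The following is only a rough sketch: the function name `probe_tls` is made up for illustration, port 8441 is taken from the log line above, and certificate verification is switched off on the assumption that the server certificate is self-signed.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Attempt a TLS handshake with host:port and return (ok, detail)."""
    context = ssl.create_default_context()
    # Diagnostic only: verification is disabled on the assumption that
    # the server presents a self-signed certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                # Handshake completed; report the negotiated protocol.
                return True, tls.version()
    except ssl.SSLError as exc:
        # TCP connected, but the TLS layer refused or failed.
        return False, "TLS handshake failed: %s" % exc
    except OSError as exc:
        # Nothing listening, host unreachable, timeout, etc.
        return False, "TCP connection failed: %s" % exc


if __name__ == "__main__":
    # "xxxxx" stands in for the server hostname, as in the log above.
    print(probe_tls("xxxxx", 8441))
```

If the result is a handshake failure rather than a connection failure, that would point towards the JDK/keystore side rather than the network.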


Another point we have noticed is that stopping the ambari server doesn't stop the agents. Not sure whether this is the usual behavior or some problem with our setup. The JDK issue is currently our major concern, as we have worked around the others. I will look more into the keystore point.


Thanks
Vivek


On Thursday 04 July 2013 09:26 PM, Srimanth Gunturi wrote:
Hi Vivek,

a) Did you change the JRE before or after setup+install? If you changed it after install, there might be ssh keys that are not in the keystore of the new VM.

b) When it does go back to the installer, which page does it go to, and does it have values pre-populated? Also, can you please go to http://ambari:port/api/v1/persist/CLUSTER_CURRENT_STATUS and provide the value of the "clusterState" key?

c) Ambari agents have /etc/ambari-agent/conf/ambari-agent.ini which points to server hostname/port. This might have been initialized with 'localhost' resulting in failure, till it was fixed by removal.
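
A quick way to confirm what an agent is pointing at is to read the file directly. This is a minimal sketch, assuming the file has a `[server]` section with `hostname` and `secured_url_port` keys; the helper names and the sample content are illustrative only, so compare against the actual file on the node.

```python
import configparser


def read_server_endpoint(ini_text):
    """Return (hostname, secured_port) from ambari-agent.ini content."""
    # Interpolation is disabled so stray '%' characters do not raise.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # Assumed layout: [server] section with these two keys.
    hostname = parser.get("server", "hostname", fallback=None)
    port = parser.get("server", "secured_url_port", fallback=None)
    return hostname, port


def looks_misconfigured(hostname):
    """True if the agent would try to register with itself."""
    if hostname is None:
        return True
    return hostname.strip().lower() in {"", "localhost", "127.0.0.1"}


# Illustrative content only, not copied from a real node.
SAMPLE = """\
[server]
hostname=localhost
url_port=8440
secured_url_port=8441
"""

if __name__ == "__main__":
    host, port = read_server_endpoint(SAMPLE)
    print(host, port, looks_misconfigured(host))
```

On a real node, the text would come from reading /etc/ambari-agent/conf/ambari-agent.ini instead of `SAMPLE`.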

Hope that helped.
Regards,
Srimanth






On Thu, Jul 4, 2013 at 7:09 AM, Vivek Padmanabhan <[email protected]> wrote:

    Hi Srimanth,

     Thanks for the response; my replies are below. I am using HDP-1.3.0.0.

    a) It was an https call made to the ambari server from the agent:

    INFO 2013-07-04 09:38:00,378 security.py:49 - SSL Connect being
    called.. connecting to the server
    INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect
    to: https://xxxxx:8441/agent/v1/register/xxxxxx

    If I change the JDK to 1.6, it starts working.

    b) When I manually access the URL, I can properly see the status and
    Ganglia, and do start/stop, config changes, etc.
    It doesn't jump back to the installer.

    c)
    Was the ambari-server started on localhost initially perhaps?
        This could be. But after we corrected the other machines, we ran
        ambari-server reset.
        The next time, it failed citing the same localhost, even though
        the conf was correct.
        Hence we removed the rpm, but that still did not help, and we
        finally deleted /etc/ambari-agent and /usr/lib/ambari*, which helped.

    (Every time, we retried the installation for that machine alone.)

    d) Sure, will have a look at the agent logs next time.


    Thanks
    Vivek



    On Thursday 04 July 2013 07:10 PM, Srimanth Gunturi wrote:
    Hi Vivek,
    Wanted to find out the version of Ambari you are using.

    a) What sort of communication failures were you seeing? Is there
    anything specific in the logs that you can share?

    b) UI jumping to installer after login means that the server says
    installation is not complete. Did you notice any errors during
    install? Also when it does go back to installer, which page of
    installer does it end on, and are any previous values populated?

    When you do manually go to http://xxx:5858/#/main/dashboard -
    does it stay there, or jump back to installer after a few clicks?

    c) Ambari server should be set up on a hostname (hostname -f) that
    the agent nodes can talk back to.
    Was the ambari-server started on localhost initially perhaps?
    When some agent hosts had server as localhost - did you install
    agent manually?

    d) Ganglia server component failed to install for some reason.
    The agent logs on that node should contain exceptions of why it
    failed. Fixing that issue should help.
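
For point c), a rough check of whether the server's reported name is one that remote agents could use. This is only a sketch: `is_usable_server_name` is a made-up helper, and treating "contains a dot" as "fully qualified" is a heuristic assumption rather than anything Ambari itself enforces.

```python
import socket

LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def is_usable_server_name(name):
    """Heuristic: True if remote agents could plausibly reach this name."""
    cleaned = name.strip().lower().rstrip(".")
    if not cleaned or cleaned in LOOPBACK_NAMES:
        return False
    if cleaned.startswith("127.") or cleaned == "::1":
        return False
    # Assumption: a fully qualified name contains at least one dot.
    return "." in cleaned


if __name__ == "__main__":
    # socket.getfqdn() is roughly what `hostname -f` reports.
    fqdn = socket.getfqdn()
    print(fqdn, is_usable_server_name(fqdn))
```

If this prints a loopback or short name on the server host, agents installed from it may end up configured with 'localhost'.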

    Regards,
    Srimanth




    On Wed, Jul 3, 2013 at 10:10 PM, Vivek Padmanabhan
    <[email protected]> wrote:

        Hi,
        I was trying out ambari to setup a cluster and we faced some
        of the below issues. Would be great if someone could throw
        some light on these;



        a) Is it possible to run ambari with jdk1.7? We are seeing
        some communication failures while using 1.7 for ambari.
        But prior to ambari, we tested our hadoop programs with
        1.7 and everything went well. And all of
        our code base is in 1.7. (We have no native apps.)





        b) After a cluster setup finishes successfully, we are able to
        see the dashboard etc. But after a few clicks, or if I access
        it from a different machine, it redirects me to the
        installation page again.

        I figured out that only manually entering the URL below
        helps (our port is 5858, and the browser cache is cleared):
        http://xxx:5858/#/main/dashboard




        c) During our hadoop deployment and installation,
        some servers failed (ssh access) and some passed.
        So we had to reset and start from the beginning. But this
        time, those which passed earlier are failing,
        since they think that the ambari server is 'localhost'.

        In the /etc/...ini file, the server IP property was
        correct. So we tried the following on the failed machines:

        * Remove rpm,reset ambari - This did not work on retry
        * Remove the rpm,delete /etc/ambari-agent, delete
        /usr/lib/ambari* , retry – It worked

        Does this mean that rpm -e did not remove all the files?
        Is there anything extra we need to take care of in such scenarios?





        d) Hadoop installation and deployment succeeds only on
        random retries. When it fails, the only message we saw was:
        ERROR ServiceComponentHostImpl:721 – Can’t handle
        ServiceComponentHostEvent event at current state,
        serviceComponentName=GANGLIA_SERVER,
        hostName=server233.xxxxxx, currentState=INSTALL_FAILED,
        eventType=HOST_SVCCOMP_OP_SUCCEEDED,
        event=EventType: HOST_SVCCOMP_OP_SUCCEEDED
        15:17:12,934 WARN HeartBeatHandler:233 – State machine exception
        org.apache.ambari.server.state.fsm.InvalidStateTransitionException:
        Invalid event: HOST_SVCCOMP_OP_SUCCEEDED at INSTALL_FAILED
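
When many such errors pile up, it can help to pull the component, host, state, and event out of each line so that failures can be grouped. The following is a minimal sketch that assumes the `key=value, key=value` layout seen in the message above; the regular expression and the function name are illustrative and not part of Ambari.

```python
import re

# Matches key=value pairs; a value ends at a comma or whitespace.
FIELD_RE = re.compile(r"(\w+)=([^,\s]+)")


def parse_fields(message):
    """Return the key=value pairs found in a server log message."""
    return dict(FIELD_RE.findall(message))


# The error reported above, joined onto one line.
LINE = (
    "ERROR ServiceComponentHostImpl:721 - Can't handle "
    "ServiceComponentHostEvent event at current state, "
    "serviceComponentName=GANGLIA_SERVER, "
    "hostName=server233.xxxxxx, currentState=INSTALL_FAILED, "
    "eventType=HOST_SVCCOMP_OP_SUCCEEDED, "
    "event=EventType: HOST_SVCCOMP_OP_SUCCEEDED"
)

if __name__ == "__main__":
    fields = parse_fields(LINE)
    print(fields["serviceComponentName"], fields["hostName"],
          fields["currentState"], fields["eventType"])
```

Read this way, the message says that a success event arrived for GANGLIA_SERVER on a host where the component was already marked INSTALL_FAILED.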





        Thanks
        Vivek




