GitHub user nickwallen commented on the issue: https://github.com/apache/incubator-metron/pull/436

I have been able to launch "Quick Dev" with the deployment report. Thanks for the fix, @dlyle65535.

I have been fighting a bit with the AWS deployment and ran into two issues.

(1) On one pass, the Ambari setup seems to have failed, but the deployment continued, which caused it to fail later on. To fix it, I manually logged into the host, ran the Ambari setup, and then re-ran the deployment, which addressed the problem. I am almost certain that I have seen this before, prior to the work in this PR.

```
$ ./run.sh
...
TASK [ambari_master : Setup ambari server] *************************************
...
"Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-8u77-linux-x64.tar.gz",
"Installing JDK to /usr/jdk64/",
"Successfully installed JDK to /usr/jdk64/",
"Downloading JCE Policy archive from http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-8.zip to /var/lib/ambari-server/resources/jce_policy-8.zip",
"",
"Successfully downloaded JCE Policy archive to /var/lib/ambari-server/resources/jce_policy-8.zip",
"Installing JCE policy...",
"Completing setup...",
"Configuring database...",
"Enter advanced database configuration [y/n] (n)? ",
"Configuring database...",
"Default properties detected. Using built-in database.",
"Configuring ambari database...",
"Checking PostgreSQL...",
"Running initdb: This may take up to a minute.",
"Initializing database: [ OK ]",
"",
"About to start PostgreSQL",
"Configuring local database...",
"Connecting to local database...connection timed out...retrying (1)",
"Connecting to local database...connection timed out...retrying (2)",
"Connecting to local database...unable to connect to database",
"ERROR: could not change directory to \"/home/centos\"",
"psql: FATAL: the database system is starting up",
"",
"ERROR: Exiting with exit code 2. ",
"REASON: Running database init script failed. Exiting."], "warnings": []}

$ ./run.sh
...
TASK [ambari_config : check if ambari-server is up on ec2-52-37-229-181.us-west-2.compute.amazonaws.com:8080] ***
fatal: [ec2-52-37-229-181.us-west-2.compute.amazonaws.com]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for ec2-52-37-229-181.us-west-2.compute.amazonaws.com:8080"}
```

(2) The second issue was more unexpected. On all but one of the 10 AWS nodes, the deployment went smoothly. At some point during the deployment, Ansible could not talk to one node, but it continued anyway. After the other 9 nodes were finished, Ambari listed all 10 nodes but showed that one in yellow, indicating that it was not receiving a heartbeat from it.

After Ansible was done with the 9 nodes, it seemed to almost start over on the last node. It rebuilt the source code, pushed out the RPMs, reinstalled the MPack, etc. That really confused the cluster, and it has not processed any data. I'm sure a little manual effort could fix up the cluster, but Ansible's behavior was weird.

When I've worked with the AWS deployment before, it would fail if any one node failed. Now it seems to retry failed nodes at a later point in time, which has negative implications when we expect actions like the build, the MPack install, etc. to occur only once. Not sure what to make of this issue.
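The concern in (2) is that steps like the build and the MPack install are not safe to repeat. One general way to make a one-time step safe to re-run is a marker-file guard. Below is a minimal, hypothetical sketch in shell: the `run_once` helper and the marker paths are illustrative assumptions, not part of the Metron deployment scripts. It does not stop Ansible from revisiting a node; it only turns a repeat of an already-completed step into a no-op.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Metron): run a one-time step only if its
# marker file is absent. The marker is written only after the command
# succeeds, so a failed step is attempted again on the next run, while a
# completed step is skipped.
run_once() {
  local marker="$1"
  shift
  if [ -e "$marker" ]; then
    echo "skip: $marker exists"
    return 0
  fi
  "$@"
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "failed (exit $rc): $*" >&2
    return "$rc"
  fi
  touch "$marker"
  echo "done: $marker"
}

# Example usage (hypothetical marker path and build script):
#   run_once /var/tmp/metron-build.done ./build.sh
```

Ansible's `command` and `shell` modules offer the same idea through their `creates` argument, which skips the task when the named file already exists; the sketch above just makes the mechanism explicit.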