Re: [EXTERNAL]Difference between installing from Apache vs Hortonworks

2019-11-04 Thread Jonathan Hurley
Yes, there is typically a slight difference between the x.y.z version from
Apache and that offered by Hortonworks. The Hortonworks version usually
contains several additional fixes that did not make it into the official
Apache release. Additionally, there are stacks shipped with the Hortonworks
version which are not shipped with the Apache version.

On Fri, Nov 1, 2019 at 7:13 PM Reed Villanueva 
wrote:

> I actually meant something more along the lines of: Is the underlying code
> any different when getting the same ambari version from Hortonworks vs
> building from source via the apache docs? I asked because of the differences
> in installation method, even though both are marketed as "Apache".
>
> And I agree that the package manager is much easier than via the maven
> build, but that was partly why I was asking if the actual underlying code
> had any difference.
>
> On Fri, Nov 1, 2019 at 10:51 AM Preston, Dale <
> dale.pres...@conocophillips.com> wrote:
>
>> The difference is huge.  From Hortonworks, you can install packages using
>> your package manager.  Building and deploying from Maven is not a light
>> undertaking.  I've done it a couple of times out of obstinacy and
>> stubbornness, just to say I had.  I was eventually successful, but I wouldn’t
>> want to do it again.
>>
>>
>>
>> *From:* Reed Villanueva 
>> *Sent:* Friday, November 1, 2019 3:03 PM
>> *To:* user@ambari.apache.org
>> *Subject:* [EXTERNAL]Difference between installing from Apache vs
>> Hortonworks
>>
>>
>>
>> Is there a difference between installing ambari via the apache docs
>> vs the Hortonworks docs?
>>
>>
>>
>> I assume that the end result is exactly the same since Hortonworks labels
>> the distribution in the docs as "Apache" and the repo they instruct to add
>> uses apache.org as the package URL:
>>
>> [root@HW001 ~]# yum info ambari-server
>> Installed Packages
>> Name: ambari-server
>> Arch: x86_64
>> Version : 2.7.3.0
>> Release : 139
>> Size: 418 M
>> Repo: installed
>> From repo   : ambari-2.7.3.0
>> Summary : Ambari Server
>> URL : http://www.apache.org
>> 
>> License : (c) Apache Software Foundation
>> Description : Maven Recipe: RPM Package.
>>
>>
>>
>> However, the installation instructions that the ambari project site links
>> to are different and only involve building from source via maven (and seem
>> to mention no installation options via package manager), which gives me pause
>> as to whether these are exactly the same.
>>
>>
>>
>> Could anyone with more experience here explain this a bit more to me?
>>
>>

Re: Ambari Kafka version

2019-01-11 Thread Jonathan Hurley
I believe that Kafka 1.0.0.3.0 is first supported in HDP 3.0. You will need 
Ambari 2.7 to use HDP 3.0. You can upgrade Ambari 2.6 to Ambari 2.7 and also 
HDP 2.6 to HDP 3.0. Or, you can do a new install. 
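
For reference, the package name discussed below encodes both versions at once; a
hand-drawn breakdown (my annotation, not Ambari output):

kafka_2_6_1_0_129 - 0.10.1 . 2.6.1.0-129 . noarch
      |               |          |
      |               |          +-- HDP stack version and build
      |               +-- Apache Kafka version
      +-- stack-suffixed package name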

On 1/11/19, 9:03 AM, "Jacek Szewczyk"  wrote:

Thanks Jonathan,

Is there a way to upgrade to most recent (at least 1.0) kafka with 2.6 
Ambari? If not which Ambari can give me out of the box 1.0 Kafka?

Jacek

> On Jan 11, 2019, at 14:56, Jonathan Hurley  
wrote:
> 
> HDP 2.6 uses Kafka 0.10.0:
> 
https://github.com/apache/ambari/blob/branch-2.6/ambari-server/src/main/resources/stacks/HDP/2.6/services/KAFKA/metainfo.xml#L23
> 
> The version number which you are seeing is a combination of the HDP stack 
version (2.6.1.0-129) along with the Apache version of Kafka which is included 
(0.10.1). If you were trying to install HDP 2.6.2.2, then you must have used 
the wrong repo since 2.6.1.0-129 was installed. Either way, you're still going 
to get Kafka 0.10.1
> 
> On 1/11/19, 6:48 AM, "Jacek Szewczyk"  wrote:
> 
>Hey !
> 
>I am confused by the Ambari Kafka version. After installing 2.6.2.2, Admin -> 
Stack and Versions shows Kafka 1.0.0, so I expected that to be the version of 
kafka installed, but in reality it is:
>kafka_2_6_1_0_129-0.10.1.2.6.1.0-129.noarch
> 
>Any ideas why that’s the case?
> 
>Thanks,
> 
>Jacek 
> 
> 
> 






Re: Cannot match package for regexp name

2018-06-14 Thread Jonathan Hurley
Hi,

Please uninstall hadoop_2_6_4_0_91-yarn.x86_64 from my-yum-local-repo. Then, 
remove this repo by searching /etc/yum.repos.d for my-yum-local-repo. After 
this, you can retry the install.
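
A sketch of those steps (assuming the repo file is literally named
my-yum-local-repo.repo; check the grep output for the real file name):

yum remove hadoop_2_6_4_0_91-yarn.x86_64      # uninstall the package from the wrong repo
grep -rl my-yum-local-repo /etc/yum.repos.d/  # locate the file defining my-yum-local-repo
rm /etc/yum.repos.d/my-yum-local-repo.repo    # remove that repo definition
yum clean all                                 # refresh metadata, then retry the install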

From: Lian Jiang 
Reply-To: "user@ambari.apache.org" 
Date: Wednesday, June 13, 2018 at 11:00 PM
To: "user@ambari.apache.org" 
Subject: Cannot match package for regexp name

Hi,
My cluster installed using ambari has two datanodes. One of them failed to 
install nodemanager due to error:


resource_management.core.exceptions.Fail: Cannot match package for regexp name 
hadoop_${stack_version}-yarn.

# for kafka broker which can be installed successfully, I see:

yum list installed | grep kafka
kafka_2_6_4_0_91.noarch   0.10.1.2.6.4.0-91 @HDP-2.6-repo-1

# for nodemanager which fails to install, I see:

yum list installed | grep yarn
hadoop_2_6_4_0_91-yarn.x86_64 2.7.3.2.6.4.0-91  
@my-yum-local-repo

I guess yarn is associated with the wrong repo. How can I fix this? Appreciate any 
help!


Re: unexpected version error - for custom components

2018-06-07 Thread Jonathan Hurley
I’m not sure if you meant mpack instead of mstack, but the alert below 
indicates that there are components installed which are not reporting the 
correct versions. It should tell you which components are wrong.

Normally, this happens with an mpack that has installed services which indicate 
they advertise a version when they really don’t. The fix here is to find the 
service’s metainfo.xml which was installed, look for 
<versionAdvertised>true</versionAdvertised>, and set it to false.
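
For illustration, the relevant metainfo.xml fragment would look roughly like this
(a sketch only; MY_SERVICE is a placeholder, and depending on the stack format the
tag may sit on the service itself or on an individual component):

<service>
  <name>MY_SERVICE</name>
  <!-- was true; false tells Ambari not to expect a reported version -->
  <versionAdvertised>false</versionAdvertised>
</service>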

From: Satyanarayana Jampa 
Reply-To: "user@ambari.apache.org" 
Date: Thursday, June 7, 2018 at 11:28 AM
To: "user@ambari.apache.org" 
Subject: unexpected version error - for custom components

Hi Team,

I have installed custom services using mstack and the below alert is observed:

“This alert is triggered if the server detects that there is a problem with the 
expected and reported version of a component. The alert is suppressed 
automatically during an upgrade.”

What needs to be fixed to disable this alert? Thanks in advance.

Thanks,
Satya.



Re: ${hdp.version} parameter in service client configs

2018-03-16 Thread Jonathan Hurley
This is expected for now. The problem is that client configs are run and 
rendered on the Ambari server itself, which might not even be a part of the 
cluster. Some properties, such as the ones you listed below, are rendered on a 
per-host basis, and can be different depending on the versions of the 
components which are installed.

We believe that the download client configs logic needs to be rewritten to 
allow you to specify the host on which you want to download them.
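
In the meantime, a workaround sketch after downloading the configs (assuming they
were extracted to ./conf and that this runs on a cluster host where hdp-select is
installed):

HDP_VERSION=$(hdp-select versions | tail -1)               # newest installed stack version
sed -i "s/\${hdp.version}/${HDP_VERSION}/g" ./conf/*.xml   # resolve the leftover token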

On Mar 16, 2018, at 5:35 AM, Gonzalo Herreros 
> wrote:

In the cluster nodes, it's normal to have hdp.version as a variable in all the 
configs, which gets resolved at runtime.
I think the ambari agents set it to the right value in the start scripts

However, it is a good point that if you want to download the config, it's 
normally because you want to use it on some client external to the node and 
thus shouldn't need that.

Gonzalo

On 15 March 2018 at 21:21, Juanjo Marron 
> wrote:
Hi all,

I am using the Download Service Client Configs feature in Ambari APIs and I 
realized that in some of the configuration files the ${hdp.version} parameter 
has not been resolved.

This token needs to be replaced by the right hdp version value in order to 
properly use some of the properties.
Is it a bug in Ambari or is it the expected behavior?
In case it is expected, how/where is the best way to obtain the value for 
${hdp.version}?

A good example is the mapred-site.xml configuration file from the MapReduce2 service, 
where multiple downloaded property values retain the parameter ${hdp.version}.

These are two of them:


<property>
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:/usr/hdp/current/ext/hadoop/*</value>
</property>

<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>
Also I have seen tokens that get replaced in the UI but not in the 
configuration file downloaded, for example:

The property yarn.nodemanager.aux-services.spark2_shuffle.classpath in advanced 
yarn-site.xml (YARN service) shows this value in the UI:

{{stack_root}}/${hdp.version}/spark2/aux/*

While in the client configurations obtained after download it appears as:


<property>
  <name>yarn.nodemanager.aux-services.spark2_shuffle.classpath</name>
  <value>/usr/hdp/${hdp.version}/spark2/aux/*</value>
</property>

{{stack_root}} gets properly replaced but ${hdp.version} remains as a string 
not resolved

How is that? Can the same logic not be applied to both parameters?


I would appreciate some answers and more details on this topic

Thanks




Re: Ambari api to restart all clients without specifying the list of SERVICE/components on all hosts

2017-09-18 Thread Jonathan Hurley
Depending on which version of Ambari you are on, a request like this might work:

POST api/v1/clusters//requests

{
  "RequestInfo": {
    "command": "RESTART",
    "context": "Restart all ZooKeeper Clients Across the Cluster",
    "operation_level": {
      "level": "HOST",
      "cluster_name": "c1"
    }
  },
  "Requests/resource_filters": [
    {
      "service_name": "ZOOKEEPER",
      "component_name": "ZOOKEEPER_CLIENT",
      "hosts_predicate": "HostRoles/component_name=ZOOKEEPER_CLIENT"
    }
  ]
}
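
For example, saving the body above as restart_clients.json, the request could be
submitted like this (a sketch assuming an admin user and a cluster named c1;
Ambari requires the X-Requested-By header on write operations):

curl -u admin:admin -H 'X-Requested-By: ambari' \
  -X POST -d @restart_clients.json \
  'http://ambari-host:8080/api/v1/clusters/c1/requests'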

On Sep 18, 2017, at 7:19 AM, Latha Appanna 
> wrote:

Hello,

Do we have an Ambari REST API to restart all clients / refresh all client configs 
without specifying the list of services/components?

https://cwiki.apache.org/confluence/display/AMBARI/Restarting+host+components+via+the+API
I know we already have this, but it requires one to specify all the required 
hostnames, service names and components.


Thanks & Regards,
Latha



Re: Error while creating database accessor

2017-09-18 Thread Jonathan Hurley
This is caused by the connection refused exception to your standalone postgres 
database. It means the Ambari Server can't connect to it. You can check the 
settings in ambari.properties to see if any of them are wrong:

grep jdbc /etc/ambari-server/conf/ambari.properties

And then adjust any which are wrong (database host, port, etc.). You should also 
try psql to ensure that you can communicate with the database from this host 
outside of Ambari Server.
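
For example (a sketch; substitute the host, port, database, and user values found
in ambari.properties):

psql -h db-host -p 5432 -U ambari -d ambari -c 'SELECT 1;'
# a "connection refused" here confirms the problem is outside Ambari Server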

On Sep 15, 2017, at 9:08 PM, mayank rathi 
> wrote:


Hello All,

I just finished setting up Ambari 2.5.1. I am using Standalone PostgreSQL 9.6

Setup finished successfully but I am getting error while starting Ambari server.

This is what I see on the Linux command line:

Ambari Server 'setup' completed successfully.
[kfkadm@**didkfkw ~]$ ambari-server start
Using python /usr/bin/python
Starting ambari-server
Organizing resource files at /var/lib/ambari-server/resources...
Unable to check firewall status when starting without root privileges.
Please do not forget to disable or adjust firewall if needed
Ambari database consistency check started...
Server PID at: /var/run/ambari-server/ambari-server.pid
Server out at: /var/log/ambari-server/ambari-server.out
Server log at: /var/log/ambari-server/ambari-server.log
/usr/bin/sh: line 0: ulimit: open files: cannot modify limit: Operation not 
permitted
Waiting for server start.Unable to determine server PID. Retrying...
..Unable to determine server PID. Retrying...
..Unable to determine server PID. Retrying...
ERROR: Exiting with exit code -1.
REASON: Ambari Server java process died with exitcode 1. Check 
/var/log/ambari-server/ambari-server.out for more information.

This is what I see in the ambari-server.out file:

Error injecting constructor, java.lang.RuntimeException: Error while creating 
database accessor
at org.apache.ambari.server.orm.DBAccessorImpl.<init>(DBAccessorImpl.java:85)
at org.apache.ambari.server.orm.DBAccessorImpl.class(DBAccessorImpl.java:73)
while locating org.apache.ambari.server.orm.DBAccessorImpl
while locating org.apache.ambari.server.orm.DBAccessor
for field at 
org.apache.ambari.server.orm.dao.DaoUtils.dbAccessor(DaoUtils.java:36)
at org.apache.ambari.server.orm.dao.DaoUtils.class(DaoUtils.java:36)
while locating org.apache.ambari.server.orm.dao.DaoUtils
for field at org.apache.ambari.server.orm.dao.UserDAO.daoUtils(UserDAO.java:45)
at org.apache.ambari.server.orm.dao.UserDAO.class(UserDAO.java:45)
while locating org.apache.ambari.server.orm.dao.UserDAO
for field at 
org.apache.ambari.server.controller.internal.ActiveWidgetLayoutResourceProvider.userDAO(ActiveWidgetLayoutResourceProvider.java:61)
Caused by: java.lang.RuntimeException: Error while creating database accessor
at org.apache.ambari.server.orm.DBAccessorImpl.<init>(DBAccessorImpl.java:118)
at 
org.apache.ambari.server.orm.DBAccessorImpl$$FastClassByGuice$$86dbc63e.newInstance()
at 
com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at 
com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at 
com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at 
com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:94)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at 
com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at 
com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at 

Re: Enabling SNMP alerts

2017-06-06 Thread Jonathan Hurley
No, it cannot. The script-approach for SNMP in Ambari 2.2 is meant as a way of 
providing your own custom behavior on top of Ambari. The core logic of Ambari 
only passes the fields that you see here (alert state, name, service, etc.).

You need to edit the script to provide your own host. I've seen people actually 
put several hosts in this script since they want to push to multiple SNMP 
managers for the same notification.
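
A sketch of that multi-host variant inside the script (snmp1/snmp2 are placeholder
SNMP managers; the trap arguments are the ones from the script quoted below):

for HOST in snmp1.example.com snmp2.example.com; do
  /usr/bin/snmptrap -v 2c -c "$COMMUNITY" "$HOST" '' \
    APACHE-AMBARI-MIB::apacheAmbariAlert alertDefinitionName s "$1" \
    alertName s "$2" alertText s "$5" alertState i $STATE alertService s "$3"
done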

Something else to mention: the web client in Ambari 2.2 didn't support editing 
script dispatchers. You shouldn't try to edit one from the web client, as it 
could change the type.

On Jun 6, 2017, at 2:09 AM, Satyanarayana Jampa 
> wrote:

Hi,
I am using Ambari 2.2 and I would like to enable the SNMP 
alerts.
For  enabling the alerts I followed the below link:

https://community.hortonworks.com/articles/74370/snmp-alert.html
The “snmp_mib_script.sh” has an snmptrap command, which takes 
localhost as the “HOST” parameter, as below:
Can this “Host” parameter be taken from the Ambari “Edit Notification” screen, 
just like the alert state, alertname etc?


HOST=localhost
COMMUNITY=public

STATE=0
if [ $4 == "OK" ]; then
  STATE=0
elif [ $4 == "UNKNOWN" ]; then
  STATE=1
elif [ $4 == "WARNING" ]; then
  STATE=2
elif [ $4 == "CRITICAL" ]; then
  STATE=3
fi

/usr/bin/snmptrap -v 2c -c $COMMUNITY $HOST '' 
APACHE-AMBARI-MIB::apacheAmbariAlert alertDefinitionName s "$1" alertName s 
"$2" alertText s "$5" alertState i $STATE alertService s "$3"

Thanks,
Satya.



Re: Ambari's "HBase Regionserver Process" alert thresholds

2017-03-24 Thread Jonathan Hurley
You're right that the AGGREGATE alert doesn't give you the host name of the 
affected host. You can query the alerts endpoint directly to discover the name 
of the host:
GET 
api/v1/clusters//alerts?Alert/state=CRITICAL&Alert/definition_name=hbase_regionserver_process
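
For example (a sketch assuming an admin user and a cluster named c1; each returned
alert should carry an Alert/host_name field identifying the affected host):

curl -u admin:admin \
  'http://ambari-host:8080/api/v1/clusters/c1/alerts?Alert/state=CRITICAL&Alert/definition_name=hbase_regionserver_process'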

On Mar 24, 2017, at 4:05 PM, Ganesh Viswanathan 
<gan...@gmail.com> wrote:

This API call worked to get the state for all regionservers:

/api/v1/clusters/cluster_name/services/HBASE/components/HBASE_REGIONSERVER?fields=host_components/HostRoles/state

I can filter out INSTALLED from this list to find the stopped one.

Thanks!


On Fri, Mar 24, 2017 at 12:34 PM, Ganesh Viswanathan 
<gan...@gmail.com> wrote:
Thanks, that explains the behavior when I shut down the regionserver process 
and see the CRITICAL alert.

What I am trying to do is setup a WARNING alert for the case when a single 
"HBase Regionserver Process" is down and CRITICAL alert when two or more  
regionservers are down. I am also trying to get the hostname where the 
regionserver is down in the warning case.

Only the "HBase Regionserver Process" alert gives the name of the host impacted 
(I don't get these from "RegionServers Health Summary" and "Percent 
RegionServers Available"), hence I am trying to suitably modify this alert for 
my use-case. Is there a better way to get the regionserver host impacted from 
Ambari API when RegionServers Health Summary fires at WARNING level?




On Fri, Mar 24, 2017 at 12:27 PM, Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
I'm not sure what you mean when you say "turn down" the process. If you are 
shutting down the process, then the port is released and the alert will not be 
able to make a socket connection. You will get a CRITICAL right away. The 
values in the alert are a round-trip-time coupled with a socket read time. For 
the warning, it will attempt to make a socket connection and if it succeeds and 
releases in under 1.5 seconds, then there's no warning. Because you set the 
CRITICAL value to 3600s but stopped the process, it's not going to wait 3600 
since it can detect much faster that the port is not open for a socket 
connection.

On Mar 24, 2017, at 2:40 PM, Ganesh Viswanathan 
<gan...@gmail.com> wrote:

I am using Ambari's "HBase Regionserver Process" alert with 1.5s as WARNING 
threshold and 3600s as CRITICAL threshold. However, when I test this by turning 
down the regionserver process, the alert fires off as CRITICAL directly. Is 
this a bug?

I am using HDP 2.4 with Ambari 2.2.1.0:
https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.0/bk_Ambari_Users_Guide/content/_hbase_service_alerts.html


Thanks,
Ganesh








Re: Ambari Metrics Collector Process alert - CRITICAL threshold rule

2016-10-28 Thread Jonathan Hurley
In your version of Ambari, the alert will trigger right away. In Ambari 2.4, we 
have the notion of "soft" and "hard" alerts. You can configure it so that it 
doesn't trigger alert notifications until n number of CRITICAL alerts have been 
received in a row.
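
In Ambari 2.4+ the global knob is, if I recall correctly, the
alerts_repeat_tolerance property in cluster-env (individual definitions can
override it). A sketch using the configs.sh helper shipped with Ambari Server,
assuming an admin user and a cluster named c1:

/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set ambari-host c1 cluster-env alerts_repeat_tolerance 3
# alerts stay "soft" until 3 consecutive CRITICAL results are received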

On Oct 28, 2016, at 4:07 PM, Ganesh Viswanathan 
<gan...@gmail.com> wrote:

Thanks Jonathan, that explains some of the behavior I'm seeing.

Two additional questions:
1)  How do I make sure the Ambari "Metrics Collector Process" does not alert 
immediately when the process is down? I am using Ambari 2.2.1.0 and it has a 
bug [1] which can trigger restarts of the process. The fix for 
AMBARI-15492 (http://issues.apache.org/jira/browse/AMBARI-15492) has been 
documented in that wiki as "comment out auto-recovery". But that would mean the 
process would not restart (when the bug hits) bringing down visibility into the 
cluster metrics. We have disabled the auto-restart count alert because of the 
bug, but what is a good way to say "if the metrics collector process has been 
down for 15mins, then alert".

2) Will restarting "Metrics Collector Process"  impact the other hbase or hdfs 
health alerts? Or is this process only for the Ambari-Metrics system 
(collecting usage and internal ambari metrics). I am trying to see if the 
Ambari Metrics Collector Process can be disabled while still keeping the other 
hbase and hdfs alerts.

[1] https://cwiki.apache.org/confluence/display/AMBARI/Known+Issues


-Ganesh


On Fri, Oct 28, 2016 at 12:36 PM, Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
It sounds like you're asking two different questions here. Let me see if I can 
address them:

Most "CRITICAL" thresholds do contain different text then their OK/WARNING 
counterparts. This is because there is different information which needs to be 
conveyed when an alert has gone CRITICAL. In the case of this alert, it's a 
port connection problem. When that happens, administrators are mostly 
interested in the error message and the attempted host:port combination. I'm 
not sure what you mean by "CRITICAL is a point in time alert". All alerts of 
the PORT/WEB variety are point-in-time alerts. They represent the connection 
state of a socket and the data returned over that socket at a specific point in 
time. The alert which gets recorded in Ambari's database maintains the time of 
the alert. This value is available via a tooltip hover in the UI.

The second part of your question is asking why increasing the timeout value to 
something large, like 600, doesn't prevent the alert from triggering. I believe 
this is how the python sockets are being used in that a failed connection is 
not limited to the same timeout restrictions as a socket which won't respond. 
If the machine is available and refuses the connection outright, then the 
timeout wouldn't take effect.



On Oct 28, 2016, at 1:37 PM, Ganesh Viswanathan 
<gan...@gmail.com> wrote:

Hello,

The Ambari "Metrics Collector Process" Alert has a different defintion for 
CRITICAL threshold vs. OK and WARNING thresholds. What is the reason for this?

In my tests, CRITICAL seems like a "point-in-time" alert and the value of that 
field is not being used. When the metrics collector process is killed or 
restarts, the alert fires in 1min or less even when I set the threshold value 
to 600s. This means the alert description of "This alert is triggered if the 
Metrics Collector cannot be confirmed to be up and listening on the configured 
port for number of seconds equal to threshold." NOT VALID for CRITICAL 
threshold. Is that true and what is the reason for this discrepancy? Has anyone 
else gotten false pages because of this and what is the fix?

"ok": {
"text": "TCP OK - {0:.3f}s response on port {1}"
},
"warning": {
"text": "TCP OK - {0:.3f}s response on port {1}",
"value": 1.5
},
"critical": {
"text": "Connection failed: {0} to {1}:{2}",
"value": 5.0
}

Ref:
https://github.com/apache/ambari/blob/2ad42074f1633c5c6f56cf979bdaa49440457566/ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/alerts.json#L102

Thanks,
Ganesh







Re: HDP install issues about hdp-select

2016-06-15 Thread Jonathan Hurley
I believe this is because you have a "hadoop" directory in /usr/hdp ... 
/usr/hdp should only contain versions and "current". If there's another 
directory, it would cause the hdp-select tool to fail.
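
A quick check-and-fix sketch (relocating rather than deleting the stray directory,
in case something still references it):

ls /usr/hdp                                      # should list only version dirs and 'current'
mv /usr/hdp/hadoop /root/hdp-hadoop.bak          # move the offending non-version directory aside
ambari-python-wrap /usr/bin/hdp-select versions  # should now print versions without a traceback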

On Jun 15, 2016, at 3:23 PM, Pawel Akonom 
> wrote:

Hi,

I have some problems with hadoop cluster scratch installation (not upgrade). 
Versions I am using:


Ambari version: 2.2.2.0
HDP stack version: 2.3.4.7


Zookeeper installation fails on this step:

Execute['ambari-sudo.sh /usr/bin/hdp-select set all `ambari-python-wrap 
/usr/bin/hdp-select versions | grep ^2.3 | tail -1`'] {'only_if': 'ls -d 
/usr/hdp/2.3*'}

When I execute this command manually in bash I get an error:

[root@hdp-vora-master ~]# ambari-python-wrap /usr/bin/hdp-select versions
Traceback (most recent call last):
  File "/usr/bin/hdp-select", line 378, in 
printVersions()
  File "/usr/bin/hdp-select", line 235, in printVersions
result[tuple(map(int, versionRegex.split(f)))] = f
ValueError: invalid literal for int() with base 10: 'hadoop'

The same problem is described in the Hortonworks community:
https://community.hortonworks.com/questions/5811/install-of-hdp-fails-with-valueerror-invalid-liter.html

A workaround for this problem is to edit the /usr/bin/hdp-select python script 
and modify the printVersions() function.

Function before:

# Print the installed packages
def printVersions():
  result = {}
  for f in os.listdir(root):
    if f not in [".", "..", "current", "share", "lost+found"]:
      result[tuple(map(int, versionRegex.split(f)))] = f
  keys = result.keys()
  keys.sort()
  for k in keys:
    print result[k]

Function after modification:

# Print the installed packages
def printVersions():
  result = {}
  for f in os.listdir(root):
    if f not in [".", "..", "current", "share", "lost+found"]:
      try:
        result[tuple(map(int, versionRegex.split(f)))] = f
      except:
        pass
  keys = result.keys()
  keys.sort()
  for k in keys:
    print result[k]

Hadoop cluster installation needs to be automated, so this workaround is not a 
solution. The /usr/bin/hdp-select script appears during installation and doesn't 
come with any rpm package. It could be a bug in the script, or maybe it fails 
only with specific python versions.

Do you know the problem? If so what can be the solution?

Thanks in advance,
Pawel



Re: NPE in AbstractResourceProvider

2016-05-31 Thread Jonathan Hurley
We'd need to know which version of Ambari you're using. This type of error can 
typically be seen in one of two scenarios:

- You're using MySQL with MyISAM as the database engine. MyISAM doesn't support 
transactions or foreign keys and can lead to a corrupted Ambari database.

- You're using an older build of Ambari which had issues with cached JPA 
entities. As a result, a restart usually fixes the issue.
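
To check the engine on MySQL, something like this (a sketch assuming the Ambari
database is named ambari):

mysql -u ambari -p -e "SELECT table_name, engine FROM information_schema.tables \
  WHERE table_schema = 'ambari' AND engine <> 'InnoDB';"
# any rows returned are tables on a non-transactional engine such as MyISAM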

On May 30, 2016, at 7:20 PM, Fay Wang 
> wrote:

Hi,
   We encountered this NPE when trying to log on to the Ambari UI. Any help is 
highly appreciated.

30 May 2016 22:55:48,883  INFO [qtp-client-57] MetricsPropertyProvider:526 - 
METRICS_COLLECTOR is not live. Skip populating resources with metrics.
30 May 2016 22:55:48,883  INFO [qtp-client-57] MetricsPropertyProvider:526 - 
METRICS_COLLECTOR is not live. Skip populating resources with metrics.
30 May 2016 22:55:48,884  INFO [qtp-client-57] MetricsPropertyProvider:526 - 
METRICS_COLLECTOR is not live. Skip populating resources with metrics.
30 May 2016 22:55:48,884  INFO [qtp-client-57] MetricsPropertyProvider:526 - 
METRICS_COLLECTOR is not live. Skip populating resources with metrics.
30 May 2016 22:55:48,885  INFO [qtp-client-57] MetricsPropertyProvider:526 - 
METRICS_COLLECTOR is not live. Skip populating resources with metrics.
30 May 2016 22:55:48,885  INFO [qtp-client-57] MetricsPropertyProvider:526 - 
METRICS_COLLECTOR is not live. Skip populating resources with metrics.
30 May 2016 22:55:52,354 ERROR [qtp-client-57] ReadHandler:91 - Caught a 
runtime exception executing a query
java.lang.NullPointerException
at 
org.apache.ambari.server.controller.internal.AlertResourceProvider.toResource(AlertResourceProvider.java:199)
at 
org.apache.ambari.server.controller.internal.AlertResourceProvider.getResources(AlertResourceProvider.java:179)
at 
org.apache.ambari.server.controller.internal.AlertResourceProvider.queryForResources(AlertResourceProvider.java:135)
at 
org.apache.ambari.server.controller.internal.ClusterControllerImpl$ExtendedResourceProviderWrapper.queryForResources(ClusterControllerImpl.java:945)
at 
org.apache.ambari.server.controller.internal.ClusterControllerImpl.getResources(ClusterControllerImpl.java:132)
at org.apache.ambari.server.api.query.QueryImpl.doQuery(QueryImpl.java:489)
at 
org.apache.ambari.server.api.query.QueryImpl.queryForResources(QueryImpl.java:388)
at org.apache.ambari.server.api.query.QueryImpl.execute(QueryImpl.java:224)




Re: Ambari-server schema upgrade failed -2.2.1.1

2016-05-25 Thread Jonathan Hurley
Ensure that this MySQL JAR file is specified in your 
/etc/ambari-server/conf/ambari.properties:

db.mysql.jdbc.name=/var/lib/ambari-server/resources/mysql-connector-java-5.1.36-bin.jar
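
If the property is missing or points at the wrong file, the driver can be
re-registered with the setup command (a sketch; point --jdbc-driver at wherever
your connector JAR actually lives):

ambari-server setup --jdbc-db=mysql \
  --jdbc-driver=/var/lib/ambari-server/resources/mysql-connector-java-5.1.36-bin.jar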

On May 25, 2016, at 2:23 PM, Anandha L Ranganathan 
> wrote:

I am trying to upgrade ambari-server from version 2.1.0 to 2.2.1.1.

1) First upgraded ambari-server  with "yum upgrade ambari-server"

Here is the message received.
Updated:
  ambari-server.x86_64 0:2.2.1.1-70

Complete!

2)  Upgrade ambari-database with "ambari-server upgrade"

This is the response I received.

[root@usw2stdpmn02 yum.repos.d]# ambari-server upgrade
Using python  /usr/bin/python
Upgrading ambari-server
Updating properties in ambari.properties ...
WARNING: Can not find ambari-env.sh.rpmsave file from previous version, 
skipping restore of environment settings
Fixing database objects owner
Ambari Server configured for MySQL. Confirm you have made a backup of the 
Ambari Server database [y/n] (y)? y
Upgrading database schema
Error output from schema upgrade command:
Exception in thread "main" java.lang.Exception: Unexpected error, upgrade failed
at 
org.apache.ambari.server.upgrade.SchemaUpgradeHelper.main(SchemaUpgradeHelper.java:319)
Caused by: com.google.inject.CreationException: Guice creation errors:

1) Error injecting constructor, java.lang.RuntimeException: 
java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
  at org.apache.ambari.server.orm.DBAccessorImpl.<init>(DBAccessorImpl.java:77)
  while locating org.apache.ambari.server.orm.DBAccessorImpl
  while locating org.apache.ambari.server.orm.DBAccessor
for field at 
org.apache.ambari.server.orm.dao.DaoUtils.dbAccessor(DaoUtils.java:36)
  at org.apache.ambari.server.orm.dao.DaoUtils.class(DaoUtils.java:36)
  while locating org.apache.ambari.server.orm.dao.DaoUtils
for field at 
org.apache.ambari.server.orm.dao.HostComponentStateDAO.daoUtils(HostComponentStateDAO.java:39)
  at 
org.apache.ambari.server.orm.dao.HostComponentStateDAO.class(HostComponentStateDAO.java:39)
  while locating org.apache.ambari.server.orm.dao.HostComponentStateDAO
for field at 
org.apache.ambari.server.orm.models.HostComponentSummary.hostComponentStateDao(HostComponentSummary.java:52)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: 
com.mysql.jdbc.Driver
at 
org.apache.ambari.server.orm.DBAccessorImpl.<init>(DBAccessorImpl.java:103)
at 
org.apache.ambari.server.orm.DBAccessorImpl$$FastClassByGuice$$86dbc63e.newInstance()
at 
com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at 
com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
at 
com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at 
com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:94)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at 
com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at 
com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
at 
com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:94)
at 
com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
at 
com.google.inject.internal.SingleFieldInjector.inject(SingleFieldInjector.java:53)
at 

Re: Ambari upgrade 2.4 - not yet finalized

2016-05-19 Thread Jonathan Hurley
You're hitting an instance of https://issues.apache.org/jira/browse/AMBARI-15482

I don't know of a way around this aside from:
- Finalizing the upgrade
- Starting NameNode manually from the command prompt

It's probably best to just finalize the upgrade and start NameNode from the web 
client after finalization.
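
If you do start NameNode by hand, the error below spells out the flag it expects;
roughly (a sketch; run as the hdfs service user, and sbin paths vary by install):

su -l hdfs -c "/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode -rollingUpgrade started"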

On May 18, 2016, at 10:02 PM, Anandha L Ranganathan 
> wrote:

I am running a rolling upgrade in a dev cluster. It is completed 100% but 
not yet finalized.
I was testing in the dev cluster and validating that everything was working fine. 
I was able to run a Hive query using the HS2 server.

I don't remember the reason, but I restarted all namenode services 
through the Ambari UI and started getting this error. It says to run with the 
--upgrade option. I thought the rolling upgrade would take care of it. Please 
help me with how to handle this. What steps should I take?


2016-05-19 01:42:38,561 INFO  util.GSet 
(LightWeightGSet.java:computeCapacity(356)) - 0.02999329447746% max memory 
1011.3 MB = 310.7 KB
2016-05-19 01:42:38,561 INFO  util.GSet 
(LightWeightGSet.java:computeCapacity(361)) - capacity  = 2^15 = 32768 
entries
2016-05-19 01:42:38,579 INFO  common.Storage (Storage.java:tryLock(715)) - Lock 
on /mnt/data/hadoop/hdfs/namenode/in_use.lock acquired by nodename 
13159@usw2dxdpma01.glassdoor.local
2016-05-19 01:42:38,651 WARN  namenode.FSNamesystem 
(FSNamesystem.java:loadFromDisk(690)) - Encountered exception loading fsimage
java.io.IOException:
File system image contains an old layout version -60.
An upgrade to version -63 is required.
Please restart NameNode with the "-rollingUpgrade started" option if a rolling 
upgrade is already started; or restart NameNode with the "-upgrade" option to 
start a new upgrade.
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:245)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:983)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:688)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:662)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:722)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:951)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:935)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1641)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1707)
2016-05-19 01:42:38,661 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
HttpServer2$SelectChannelConnectorWithSafeStartup@usw2dxdpma01.glassdoor.local:50070
2016-05-19 01:42:38,663 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(211)) - Stopping NameNode metrics system...
2016-05-19 01:42:38,664 INFO  impl.MetricsSinkAdapter 
(MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread 
interrupted.
2016-05-19 01:42:38,664 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:stop(217)) - NameNode metrics system stopped.
2016-05-19 01:42:38,664 INFO  impl.MetricsSystemImpl 
(MetricsSystemImpl.java:shutdown(607)) - NameNode metrics system shutdown 
complete.
2016-05-19 01:42:38,665 ERROR namenode.NameNode (NameNode.java:main(1712)) - 
Failed to start namenode.




Re: Redundant Ambari services

2016-05-09 Thread Jonathan Hurley
Running two Ambari servers concurrently is not going to work due to the nature 
of how the server uses JPA to interact with the database. You can keep a spare 
Ambari server ready to startup on another host and use a virtual IP so that the 
agents don't need to change who they talk to. But now you have all of your 
traffic being routed through a single, virtual IP, so that introduces another 
single point of failure.

On May 9, 2016, at 1:41 PM, David Robison 
> wrote:

Thanks, unfortunately, because of our clients, we cannot deploy to VMs but are 
deploying to physical machines. David

Best Regards,

David R Robison
Senior Systems Engineer


From: Andrew Stadtler [mailto:a...@phdata.io]
Sent: Monday, May 9, 2016 1:40 PM
To: user@ambari.apache.org
Subject: Re: Redundant Ambari services

David,

The simple solution if you have an existing Virtual Machine infrastructure is 
to put the Ambari-Server on a VM with HA that can be restarted automatically in 
the event of a hardware failure. This usually works best if you move the 
database to some type of highly available cluster (MySQL, Postgres, or Oracle 
RAC) too.


On May 9, 2016, at 12:34 PM, David Robison 
> wrote:

I am working on setting up a Hadoop cluster where we need to ensure no single 
point of failure. As part of this, the question is how best to deploy the 
Ambari services (e.g. configuration and monitoring) to provide automatic 
failover should one of the monitoring nodes fail. One thought was to use 
something like corosync and pacemaker to start the ambari-server on the failover 
server if the primary should fail. The other idea was to have the ambari-server 
running on both servers and use a virtual IP with failover to automatically 
switch traffic from one server to the other should one fail. We are deploying 
onto Ubuntu 14.04. Has anyone done anything like this? Any thoughts on how to 
proceed? Thanks, David

David R Robison
Senior Systems Engineer
O. +1 512 247 3700
M. +1 512 608 3173
david.robi...@psgglobal.net
www.psgglobal.net

Prometheus Security Group Global, Inc.
3019 Alvin Devane Boulevard
Building 4, Suite 450
Austin, TX 78741




Re: Ambari - Trouble Checking Status for Custom Application

2016-04-19 Thread Jonathan Hurley
Ah, I had forgotten this was a status command. The print statements will also 
appear in /var/log/ambari-agent/ambari-agent.out

On Apr 19, 2016, at 1:08 AM, Souvik Sarkhel 
<souvik.sark...@gmail.com> wrote:

When will the command run? As far as I know, Ambari automatically invokes the 
status function.

On Mon, Apr 18, 2016 at 7:50 PM, Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
When your command runs, it will show up in the UI as something like 
"command-123.json". You'll match this up to the "output" file on the agent:
/var/lib/ambari-agent/data/output-123.txt

You're not printing the value of dummy_master_pid_file in your example below; 
you'll want to print that as well to make sure it's rendering properly. Are you 
sure you have the directories correct? You're trying to use zoo/dataDir as the 
placeholder which renders to "/usr/share/zookeeper/tmp". You'll need to make 
sure that your "zoo" config has dataDir set to /usr/share/zookeeper/tmp

On Apr 18, 2016, at 9:44 AM, Souvik Sarkhel 
<souvik.sark...@gmail.com> wrote:

Hi Jonathan,
Earlier I was using
from resource_management import *
now I have added this import statement also
from resource_management.libraries.functions.format import format
and also given 777 permissions up to the intended pid file. Still it's not working.

Can you please tell me where I can see the print statements which I 
provide in the status function, so that I can debug it?

On Mon, Apr 18, 2016, 18:35 Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
What are your import statements? The "format" function provided by Ambari's 
common library has a naming conflict with a default python function named 
"format". If you don't import the right one, your format("...") command will 
fail silently. Make sure you are importing:

from resource_management.libraries.functions.format import format

On Apr 18, 2016, at 4:27 AM, Souvik Sarkhel 
<souvik.sark...@gmail.com> wrote:

Hi All,

I have created a custom service for Zookeeper and am using Ambari 2.1.0. In the 
status function of master.py, if it's defined in this way:
def status(self, env):
  config = Script.get_config()
  zkDataDir = config['configurations']['zoo']['dataDir']
  print 'Status of the Zookeeper Master'
  print '*'
  print zkDataDir
  dummy_master_pid_file = format("{zkDataDir}/zookeeper_server.pid")
  check_process_status(dummy_master_pid_file)

Ambari always shows the status of the application as stopped, but when I provide 
a constant path for the pid file, for example:

dummy_master_pid_file = "/usr/share/zookeeper/tmp/zookeeper_server.pid"

it starts working perfectly and Ambari is able to correctly show the status of 
the application. I need a variable pid file instead of a constant one. I would 
be thankful if someone could suggest a way out.

Thanking you in advance

--
Souvik Sarkhel





--
Souvik Sarkhel






Re: Changing the Alert Definitions

2016-04-06 Thread Jonathan Hurley
Alerts are automatically distributed to all hosts which match their service and 
component. So, if you created your alert definition with HDFS and NameNode, 
then Ambari will automatically push this alert definition to any host that's 
running NameNode. The host will begin running the alert automatically. There's 
really nothing that you need to do here; the alert framework handles everything 
for you.

On Apr 6, 2016, at 9:35 AM, Henning Kropp 
<hkr...@microlution.de> wrote:

Actually I added an alert definition (via REST), but it does not have any 
Service/Host attached, so I was wondering how hosts are "attached" to an alert 
definition?

It's an alert for HDFS, NAMENODE, so the definition on POST contained the 
component and service attributes, which would be enough information to 
distribute the alert to the corresponding hosts?

Sorry for the confusion. In my search for an answer I came across the 
host-only alerts and thought it was related.

Thanks again for your help.

Regards,
Henning

On 06/04/16 at 15:26, Jonathan Hurley wrote:
I think what you're asking about is a concept known as host-level alerts. These 
are alerts which are not scoped by any particular hadoop service. A good 
example of this is the disk usage alert. It's bound only to a host and will be 
distributed and run regardless of what components are installed on that host.

There are two ways to add a host alert:
1) Edit the alerts.json under /var/lib/ambari-server/resources and add your new 
alert to the "AMBARI_AGENT" component.
2) Use the REST APIs to create your new alert. The service should be "AMBARI" 
and the component should be "AMBARI_AGENT".

You can use the current agent alert (disk usage) as an example:
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/alerts.json#L31
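
For the REST route, creating such a definition could look roughly like this (a
sketch only; my_host_alert, its label, and the SCRIPT source are placeholders, and
the exact AlertDefinition fields can be checked by GETting an existing definition
first):

curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d '{"AlertDefinition": {"name": "my_host_alert", "label": "My Host Alert",
       "service_name": "AMBARI", "component_name": "AMBARI_AGENT",
       "scope": "HOST", "interval": 1, "enabled": true,
       "source": {"type": "SCRIPT", "path": "my_host_alert.py"}}}' \
  'http://ambari-host:8080/api/v1/clusters/c1/alert_definitions'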

On Apr 6, 2016, at 8:56 AM, Henning Kropp 
<hkr...@microlution.de> wrote:

How can an alert be added to a host?


On 05/04/16 at 18:41, Henning Kropp wrote:
Worked now. Thanks.

On 05/04/16 at 18:01, Jonathan Hurley wrote:
The alerts.json file is only to pick up brand new alerts that are not currently 
defined in the system. It's more of a way to quickly seed Ambari with a default 
set of alerts. If the alert has already been created, any updates for that 
alert made in alerts.json will not be brought in. You'll need to use the REST 
APIs to update existing definitions.

You are correct that the agents run the alerts. The definitions.json file on 
each agent shows what alerts it is trying to run.

On Apr 5, 2016, at 11:46 AM, Henning Kropp 
<hkr...@microlution.de> wrote:

Hi,

I am currently trying to change the alert definitions. I used the REST api to 
put a new definition for example for id /30 . I can see the changes when doing 
a GET.

Additionally I replaced the alerts.json of the service under ambari-server and 
ambari-agent. Still the changes are not reflected in 
/var/lib/ambari-agent/cache/alerts/definition.json and I suspect the alert is 
not working as expected because of this.

As I understand it, the definitions are broadcast with heartbeats by the server? 
And they are executed by the agent on the host where the service is running? Right?

What am I missing?

Thanks,
Henning












Re: Changing the Alert Definitions

2016-04-05 Thread Jonathan Hurley
The alerts.json file is only to pick up brand-new alerts that are not currently 
defined in the system. It's more of a way to quickly seed Ambari with a default 
set of alerts. If the alert has already been created, any updates for that 
alert made in alerts.json will not be brought in. You'll need to use the REST 
APIs to update existing definitions.

You are correct that the agents run the alerts. The definitions.json file on 
each agent shows what alerts it is trying to run. 
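
For example, an update to an existing definition is a PUT against its id. A 
minimal sketch, assuming admin credentials, a cluster named "c1", and the 
definition id 30 mentioned below (all placeholders):

# bump the run interval of an already-created alert definition
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{ "AlertDefinition" : { "interval" : 5 } }' \
  http://localhost:8080/api/v1/clusters/c1/alert_definitions/30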

> On Apr 5, 2016, at 11:46 AM, Henning Kropp  wrote:
> 
> Hi,
> 
> I am currently trying to change the alert definitions. I used the REST API to 
> put a new definition, for example for id /30. I can see the changes when 
> doing a GET.
> 
> Additionally, I replaced the alerts.json of the service under ambari-server and 
> ambari-agent. Still, the changes are not reflected in 
> /var/lib/ambari-agent/cache/alerts/definitions.json and I suspect the alert is 
> not working as expected because of this.
> 
> As I understand it, the definitions are broadcast with heartbeats by the 
> server, and executed on the host by the agent where the service is running. 
> Right?
> 
> What am I missing?
> 
> Thanks,
> Henning
> 
> 



Re: Ambari - error 500 - getAllRequests after upgrade

2016-03-08 Thread Jonathan Hurley
That's very odd, especially since the upgrade doesn't touch the topology 
tables. Are you using MySQL by any chance? If so, can you check to make sure 
that your database engine is InnoDB and not MyISAM. You have an integrity 
violation here which doesn't seem possible unless you're using a database which 
doesn't support foreign key constraints.

There's probably some SQL which you can run to insert an entry into the 
topology_logical_request table, but it's probably best to understand why this 
happened first.
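
For reference, a quick way to check the engines is below; this is a sketch that 
assumes a MySQL schema named "ambari" and a user with read access to 
information_schema:

# list the storage engine of every table in the ambari schema
mysql -u ambari -p -e \
  "SELECT table_name, engine FROM information_schema.tables WHERE table_schema = 'ambari';"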

On Mar 8, 2016, at 5:55 AM, cs user 
> wrote:

Hi All,

I've upgraded Ambari from version 2.1.2-377 to version 2.2.1.0-161.

After performing the upgrade on the server, agents, upgrading the database and 
starting everything up, I keep seeing the following error in the logs on the 
server:

08 Mar 2016 10:07:05,087  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,088  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 3
08 Mar 2016 10:07:05,120  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,120  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 5
08 Mar 2016 10:07:05,134  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,134  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 8
08 Mar 2016 10:07:05,147  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,148  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 7
08 Mar 2016 10:07:05,158  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,158  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 6
08 Mar 2016 10:07:05,170  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,170  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 2
08 Mar 2016 10:07:05,184  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,185  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 1
08 Mar 2016 10:07:05,194  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: Host Assignment 
Pending
08 Mar 2016 10:07:05,194  INFO [qtp-ambari-agent-55] LogicalRequest:420 - 
LogicalRequest.createHostRequests: created new outstanding host request ID = 4
08 Mar 2016 10:07:05,290  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-21.node.example
08 Mar 2016 10:07:05,328  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-51.node.example
08 Mar 2016 10:07:05,384  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-11.node.example
08 Mar 2016 10:07:05,428  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-41.node.example
08 Mar 2016 10:07:05,507  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-31.node.example
08 Mar 2016 10:07:05,575  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-53.node.example
08 Mar 2016 10:07:05,627  INFO [qtp-ambari-agent-55] HostRequest:125 - 
HostRequest: Successfully recovered host request for host: 
ambdevtestdc2host-group-52.node.example
08 Mar 2016 10:07:05,644  WARN [qtp-ambari-agent-55] ServletHandler:563 - 
/agent/v1/register/ambdevtestdc2host-group-51.node.example
java.lang.NullPointerException
at 
org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:157)
at 
org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:131)
at 

Re: Metrics build failure

2016-02-12 Thread Jonathan Hurley
Maven looks for dependencies in your local repositories (~/.m2). When you have 
another compiled project as a dependency (which ambari-server has on 
ambari-metrics), you need to "install" this dependency in your local repo. In 
the ambari-metrics subproject folder, you'll want to do a: mvn clean compile 
package install -DskipTests

The "install" part is what copies the dependencies into your local maven repo. 
Also, you'll probably need this for ambari-views as well since ambari-server 
also depends on ambari-views.
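
A sketch of that sequence, run from the root of the source checkout (the 
subproject directory names match the Ambari source tree; adjust flags to your 
build):

# install the metrics artifacts into ~/.m2 so ambari-server can resolve them
cd ambari-metrics
mvn clean compile package install -DskipTests
# ambari-server depends on the views framework as well
cd ../ambari-views
mvn clean compile package install -DskipTests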

On Feb 12, 2016, at 10:01 AM, Banias H 
> wrote:

My suggestion to get past this error is to comment out the repositories block 
in the ambari/pom.xml file. The error is:
Could not find artifact org.apache.ambari:ambari-metrics-common:jar:2.2.1.0 in 
oss.sonatype.org 
(https://oss.sonatype.org/content/groups/staging) -> [Help 1]

However, ambari-metrics-common has been successfully built locally:

[INFO] Ambari Metrics Common .. SUCCESS [  1.724 s]

Yet it is always trying to fetch from sonatype.org, which 
doesn't have ambari 2.2.1.0. If commenting out the repositories block in the pom 
file doesn't help, I would suggest going into ambari-metrics-common and doing a "mvn 
clean install". Then do an ambari build again.


On Fri, Feb 12, 2016 at 8:38 AM, rammohan ganapavarapu 
> wrote:
Has anyone faced this issue before? Please help me resolve it.

Thanks,
Ram

On Thu, Feb 11, 2016 at 9:12 PM, rammohan ganapavarapu 
> wrote:
I was able to fix that error by installing the python26-devel package. Now I am 
getting a server build error.


[INFO] ------------------------------------------------------------------------
[INFO] Building Ambari Server 2.2.1.0
[INFO] ------------------------------------------------------------------------
[INFO] Downloading: 
https://oss.sonatype.org/content/groups/staging/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.pom
[INFO] Downloading: 
http://repo.spring.io/milestone/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.pom
[INFO] Downloading: 
https://repository.apache.org/content/groups/staging/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.pom
[INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.pom
[WARNING] The POM for org.apache.ambari:ambari-metrics-common:jar:2.2.1.0 is 
missing, no dependency information available
[INFO] Downloading: 
https://oss.sonatype.org/content/groups/staging/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.jar
[INFO] Downloading: 
http://repo.spring.io/milestone/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.jar
[INFO] Downloading: 
https://repository.apache.org/content/groups/staging/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.jar
[INFO] Downloading: 
https://repo.maven.apache.org/maven2/org/apache/ambari/ambari-metrics-common/2.2.1.0/ambari-metrics-common-2.2.1.0.jar
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Skipping Ambari Main
[INFO] This project has been banned from the build due to previous failures.
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Ambari Main  SUCCESS [  3.544 s]
[INFO] Apache Ambari Project POM .. SUCCESS [  0.259 s]
[INFO] Ambari Web . SUCCESS [01:02 min]
[INFO] Ambari Views ... SUCCESS [  2.314 s]
[INFO] Ambari Admin View .. SUCCESS [ 39.650 s]
[INFO] ambari-metrics . SUCCESS [  0.886 s]
[INFO] Ambari Metrics Common .. SUCCESS [  1.724 s]
[INFO] Ambari Metrics Hadoop Sink . SUCCESS [  3.882 s]
[INFO] Ambari Metrics Flume Sink .. SUCCESS [  2.366 s]
[INFO] Ambari Metrics Kafka Sink .. SUCCESS [  1.515 s]
[INFO] Ambari Metrics Storm Sink .. SUCCESS [  3.873 s]
[INFO] Ambari Metrics Collector ... SUCCESS [01:21 min]
[INFO] Ambari Metrics Monitor . SUCCESS [  3.215 s]
[INFO] Ambari Metrics Assembly  SUCCESS [02:50 min]
[INFO] Ambari Server .. FAILURE [  6.231 s]
[INFO] Ambari Agent ... SKIPPED
[INFO] Ambari Client .. SKIPPED

Re: Ambari Blueprints - Re: properties [href, items] specified in the request or predicate are not supported

2015-11-18 Thread Jonathan Hurley
This all kind of depends on how you created your blueprint and what host groups 
you have defined. Assuming you have two host groups, here’s an example of what 
to POST. You also need to specify the blueprint name of the blueprint you 
created.

POST api/v1/clusters/indigo

{
  "blueprint": ,
  "default_password": "password",
  "host_groups": [
{
  "hosts": [
{
  "fqdn": 
"ip-10-4-148-160.us<http://ip-10-4-148-160.us>-west-2.compute.internal"
}
  ],
  "name": "host_group_1"
},
{
  "hosts": [
{
  "fqdn": 
"ip-10-4-148-49.us<http://ip-10-4-148-49.us>-west-2.compute.internal"
}
  ],
  "name": "host_group_2"
}
  ]
}
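
As a usage sketch, that body can be saved to a file and submitted with curl; the 
admin credentials and server address here are assumptions:

# create the cluster from the blueprint plus host mapping
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d @hostmapping.json http://localhost:8080/api/v1/clusters/indigo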

On Nov 18, 2015, at 11:04 AM, Naga Vijay 
<nagah...@gmail.com> wrote:

I tried removing the href and items, but am unable to POST the host mappings 
file.

Can you please provide the right format for the minimal host mappings file 
below?

{
  "href" : "http://10.4.148.160:8080/api/v1/clusters/indigo/hosts;,
  "items" : [
{
  "href" : 
"http://10.4.148.160:8080/api/v1/clusters/indigo/hosts/ip-10-4-148-160.us-west-2.compute.internal;,
  "Hosts" : {
"cluster_name" : "indigo",
"host_name" : 
"ip-10-4-148-160.us<http://ip-10-4-148-160.us>-west-2.compute.internal"
  }
},
{
  "href" : 
"http://10.4.148.160:8080/api/v1/clusters/indigo/hosts/ip-10-4-148-49.us-west-2.compute.internal;,
  "Hosts" : {
"cluster_name" : "indigo",
"host_name" : 
"ip-10-4-148-49.us<http://ip-10-4-148-49.us>-west-2.compute.internal"
  }
}
  ]
}

Thanks
Naga



On Mon, Nov 16, 2015 at 11:01 AM, Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
When you make a REST request to Ambari, it gives you back some JSON which 
contains the data along with some decorator information. The “href” and “items” 
elements are only for informational and structure purposes; you wouldn’t want 
to include them in a POST going back to the server. What are you trying to do 
with your cluster? Chances are you just need to omit the fields it’s 
complaining about.

On Nov 15, 2015, at 3:50 PM, Naga Vijay 
<nagah...@gmail.com> wrote:

Just prefixed "Ambari Blueprints" to the subject line above.

On Sun, Nov 15, 2015 at 12:48 PM, Naga Vijay 
<nagah...@gmail.com> wrote:
Hello,

I am using Ambari 2.1.2 and facing this issue when I POST the host mappings 
json file ...

"message" : "The properties [href, items] specified in the request or predicate 
are not supported for the resource type Cluster."

Has anyone encountered this?

If yes, may I know how you have overcome?

Thanks
Naga






Re: Ambari Blueprints - Re: properties [href, items] specified in the request or predicate are not supported

2015-11-18 Thread Jonathan Hurley
Unless I’m mistaken, blueprint installations require agents to already be 
bootstrapped and running on all hosts.
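
In practice that means installing and starting an agent on every host first. A 
sketch for one host; the package manager, repository setup, and the Ambari 
server hostname are assumptions for your environment:

# install the agent, point it at the Ambari server, and start it
yum install -y ambari-agent
sed -i 's/hostname=localhost/hostname=ambari.example.com/' \
  /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent start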

On Nov 18, 2015, at 2:58 PM, Naga Vijay 
<nagah...@gmail.com> wrote:

Thank you, that worked to the point of triggering the install.  I could see the 
background operation (Logical Request: Provision Cluster) running in Ambari UI. 
 But, the background operation is in a hung state.  I am wondering whether that 
is due to missing information (ssh key and login user, as we provide them 
during manual install using Ambari UI).  Can you please clarify?

Thanks
Naga


On Wed, Nov 18, 2015 at 8:43 AM, Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
This all kind of depends on how you created your blueprint and what host groups 
you have defined. Assuming you have two host groups, here’s an example of what 
to POST. You also need to specify the blueprint name of the blueprint you 
created.

POST api/v1/clusters/indigo

{
  "blueprint": ,
  "default_password": "password",
  "host_groups": [
{
  "hosts": [
{
  "fqdn": 
"ip-10-4-148-160.us<http://ip-10-4-148-160.us/>-west-2.compute.internal"
}
  ],
  "name": "host_group_1"
},
{
  "hosts": [
{
  "fqdn": 
"ip-10-4-148-49.us<http://ip-10-4-148-49.us/>-west-2.compute.internal"
}
  ],
  "name": "host_group_2"
}
  ]
}

On Nov 18, 2015, at 11:04 AM, Naga Vijay 
<nagah...@gmail.com> wrote:

I tried removing the href and items, but am unable to POST the host mappings 
file.

Can you please provide the right format for the minimal host mappings file 
below?

{
  "href" : "http://10.4.148.160:8080/api/v1/clusters/indigo/hosts;,
  "items" : [
{
  "href" : 
"http://10.4.148.160:8080/api/v1/clusters/indigo/hosts/ip-10-4-148-160.us-west-2.compute.internal;,
  "Hosts" : {
"cluster_name" : "indigo",
"host_name" : 
"ip-10-4-148-160.us<http://ip-10-4-148-160.us/>-west-2.compute.internal"
  }
},
{
  "href" : 
"http://10.4.148.160:8080/api/v1/clusters/indigo/hosts/ip-10-4-148-49.us-west-2.compute.internal;,
  "Hosts" : {
"cluster_name" : "indigo",
"host_name" : 
"ip-10-4-148-49.us<http://ip-10-4-148-49.us/>-west-2.compute.internal"
  }
}
  ]
}

Thanks
Naga



On Mon, Nov 16, 2015 at 11:01 AM, Jonathan Hurley 
<jhur...@hortonworks.com> wrote:
When you make a REST request to Ambari, it gives you back some JSON which 
contains the data along with some decorator information. The “href” and “items” 
elements are only for informational and structure purposes; you wouldn’t want 
to include them in a POST going back to the server. What are you trying to do 
with your cluster? Chances are you just need to omit the fields it’s 
complaining about.

On Nov 15, 2015, at 3:50 PM, Naga Vijay 
<nagah...@gmail.com> wrote:

Just prefixed "Ambari Blueprints" to the subject line above.

On Sun, Nov 15, 2015 at 12:48 PM, Naga Vijay 
<nagah...@gmail.com> wrote:
Hello,

I am using Ambari 2.1.2 and facing this issue when I POST the host mappings 
json file ...

"message" : "The properties [href, items] specified in the request or predicate 
are not supported for the resource type Cluster."

Has anyone encountered this?

If yes, may I know how you have overcome?

Thanks
Naga








Re: NPE doing rolling upgrade

2015-11-17 Thread Jonathan Hurley
This is a problem that happens when the host with Ambari is not also a part of 
the cluster. You should probably downgrade and make the Ambari server a part of 
the cluster by installing a simple client on it. Then you can try the upgrade 
again.

This is fixed in Ambari 2.1.3
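
A sketch of adding a client to the Ambari host over the REST API, once an agent 
is registered on it; the cluster name "dev" is taken from the API path below, 
while the hostname, credentials, and the choice of HDFS_CLIENT are assumptions:

# add the host to the cluster, attach a client component, then install it
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://localhost:8080/api/v1/clusters/dev/hosts/ambari.example.com
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://localhost:8080/api/v1/clusters/dev/hosts/ambari.example.com/host_components/HDFS_CLIENT
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{ "HostRoles" : { "state" : "INSTALLED" } }' \
  http://localhost:8080/api/v1/clusters/dev/hosts/ambari.example.com/host_components/HDFS_CLIENT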

On Nov 17, 2015, at 7:59 AM, Brian Jeltema 
> wrote:

I recently upgraded to Ambari 2.1.2.1 and am attempting to do a rolling upgrade
from HDP 2.2.8.0 to HDP 2.3.2.0.

About 15 seconds after the upgrade begins, the UI displays an error dialog:

  500 status code received on POST method for API: /api/v1/clusters/dev/upgrades

 Error message: Server Error

and the Ambari server log contains the two stack traces below. Does anyone know 
what might be
causing  this?

Thanks
Brian


17 Nov 2015 07:46:41,061  INFO [qtp-client-23] ConfigureTask:452 - Skipping 
property delete for hive-site/hive.server2.authentication.pam.services as the 
value NONE for hive-site/hive.server2.authentication is not equal to custom
17 Nov 2015 07:46:41,061  INFO [qtp-client-23] ConfigureTask:452 - Skipping 
property delete for hive-site/hive.server2.authentication.kerberos.keytab as 
the value NONE for hive-site/hive.server2.authentication is not equal to custom
17 Nov 2015 07:46:41,062  INFO [qtp-client-23] ConfigureTask:452 - Skipping 
property delete for hive-site/hive.server2.authentication.kerberos.principal as 
the value NONE for hive-site/hive.server2.authentication is not equal to custom
17 Nov 2015 07:46:41,380 ERROR [qtp-client-23] BaseManagementHandler:66 - 
Caught a runtime exception while attempting to create a resource
java.lang.NullPointerException
at 
org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.persistActions(ActionDBAccessorImpl.java:300)
at 
org.apache.ambari.server.orm.AmbariJpaLocalTxnInterceptor.invoke(AmbariJpaLocalTxnInterceptor.java:68)
at 
org.apache.ambari.server.actionmanager.ActionManager.sendActions(ActionManager.java:99)
at 
org.apache.ambari.server.controller.internal.RequestStageContainer.persist(RequestStageContainer.java:216)
at 
org.apache.ambari.server.controller.internal.UpgradeResourceProvider.createUpgrade(UpgradeResourceProvider.java:752)
at 
org.apache.ambari.server.controller.internal.UpgradeResourceProvider.access$100(UpgradeResourceProvider.java:116)
at 
org.apache.ambari.server.controller.internal.UpgradeResourceProvider$1.invoke(UpgradeResourceProvider.java:284)
at 
org.apache.ambari.server.controller.internal.UpgradeResourceProvider$1.invoke(UpgradeResourceProvider.java:274)
at 
org.apache.ambari.server.controller.internal.AbstractResourceProvider.createResources(AbstractResourceProvider.java:272)
at 
org.apache.ambari.server.controller.internal.UpgradeResourceProvider.createResources(UpgradeResourceProvider.java:274)
at 
org.apache.ambari.server.controller.internal.ClusterControllerImpl.createResources(ClusterControllerImpl.java:289)
at 
org.apache.ambari.server.api.services.persistence.PersistenceManagerImpl.create(PersistenceManagerImpl.java:76)
at 
org.apache.ambari.server.api.handlers.CreateHandler.persist(CreateHandler.java:36)
at 
org.apache.ambari.server.api.handlers.BaseManagementHandler.handleRequest(BaseManagementHandler.java:72)
at 
org.apache.ambari.server.api.services.BaseRequest.process(BaseRequest.java:135)
at 
org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:105)
at 
org.apache.ambari.server.api.services.BaseService.handleRequest(BaseService.java:74)
at 
org.apache.ambari.server.api.services.UpgradeService.createUpgrade(UpgradeService.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at 
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at 
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at 
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
at 
com.sun.jersey.server.impl.uri.rules.SubLocatorRule.accept(SubLocatorRule.java:137)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at 
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at 
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at 
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
at 

Re: Ambari server start takes too long

2015-11-16 Thread Jonathan Hurley
What this step is doing is loading classes which match an interface and binding 
them as individual alert dispatchers in Guice. I haven’t experienced any 
slowdown starting Ambari server - usually starts up in about 10 seconds total. 
Can you provide a jstack dump during your startup so we can see what the 
various threads are doing? I’m mainly concerned with the main Ambari thread 
that would be initializing this stuff.
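
A sketch of capturing one, assuming the standard pid file location and a jstack 
from the same JDK on the PATH:

# dump all ambari-server thread stacks while startup is hanging
jstack -l $(cat /var/run/ambari-server/ambari-server.pid) > /tmp/ambari-server-threads.txt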

> On Nov 16, 2015, at 3:52 AM, Constantine Yarovoy  wrote:
> 
> Hi all.
> 
> I'm developing my own stack for Ambari and I often need to change 
> master/slave component code in Python. And I have 2 questions regarding this:
> 
> 1. What is the fastest way to make Ambari understand that stack has changed 
> and to use updated code ? The only way it works for me now is to restart 
> server (service ambari-server restart)
> 
> Is it possible to do it any other way without restarting the server ?
> 
> 2. The Ambari server start procedure really takes too long. I'm using CentOS 7, 
> and after starting the service it takes about 4-8 minutes for it to 
> actually bind to port 8080 so that the web UI becomes available. Tailing -f 
> ambari-server.log, I've noticed that the biggest delay during start is on this 
> step:
> 
> 16 Nov 2015 08:47:13,392  INFO [main] ControllerModule:560 - Binding and 
> registering notification dispatcher class 
> org.apache.ambari.server.notifications.dispatchers.AlertScriptDispatcher
> 
> Maybe someone experienced the same behavior and there is a way to speed up 
> this step?
> 
> Thanks in advance.
> 
> --
> Kostiantyn Yarovyi
> 



Re: Ambari Blueprints - Re: properties [href, items] specified in the request or predicate are not supported

2015-11-16 Thread Jonathan Hurley
When you make a  REST request to Ambari, it gives you back some JSON which 
contains the data along with some decorator information. The “href” and “items” 
elements are only for informational and structure purposes; you wouldn’t want 
to include them in a POST going back to the server. What are you trying to do 
with your cluster? Chances are you just need to omit the fields it’s 
complaining about.

On Nov 15, 2015, at 3:50 PM, Naga Vijay 
> wrote:

Just prefixed "Ambari Blueprints" to the subject line above.

On Sun, Nov 15, 2015 at 12:48 PM, Naga Vijay 
> wrote:
Hello,

I am using Ambari 2.1.2 and facing this issue when I POST the host mappings 
json file ...

"message" : "The properties [href, items] specified in the request or predicate 
are not supported for the resource type Cluster."

Has anyone encountered this?

If yes, may I know how you have overcome?

Thanks
Naga




Re: Issue with Ambari Metrics Collector - Distributed mode

2015-10-23 Thread Jonathan Hurley
The ambari disk usage alerts are meant to check two things: that you have 
enough total space and enough percent free space in /usr/hdp for data created by 
hadoop and for installing versioned RPMs. Total free space alerts are something 
that you’ll probably want to fix since they mean you have less than a certain 
amount of total free space left.

It seems like you’re talking about percent free space. Those can be changed via 
the thresholds that the script uses. You can’t do this through the Ambari Web 
Client. You have two options:

- Use the Ambari APIs to adjust the threshold values - this command is rather 
long; let me know if you want to try this and I can paste the code to do it.

- Edit the script directly and set the defaults to higher limits: 
https://github.com/apache/ambari/blob/branch-2.1/ambari-server/src/main/resources/host_scripts/alert_disk_space.py#L36-L37
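
For the API route, the general shape of the call is below. This is a sketch: the 
credentials, cluster name, and definition id are placeholders, and the JSON body 
must carry the complete "source" block with every parameter included, since 
omitted source fields are not merged into the existing definition:

# push an updated definition body from a file
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d @alert_definition.json \
  http://localhost:8080/api/v1/clusters/<cluster-name>/alert_definitions/<definition-id>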


On Oct 23, 2015, at 9:26 AM, Vijaya Narayana Reddy Bhoomi Reddy 
> 
wrote:


Siddharth,

Thanks for your response. As ours was a 4 node cluster, I changed it to 
Embedded mode from distributed mode and it is working fine. However, I am facing 
another issue with regards to Ambari agent disk usage alerts. Earlier, I had 
three alerts for three machines where /usr/hdp is utilised more than 50%.

Initially, when I set up the cluster, I had multiple mount points listed under 
yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs; /usr/hdp was one 
among them. Later, I changed these values such that only one value is present for 
these (/export/hadoop/yarn/local and /export/hadoop/yarn/log) and restarted the 
required components.

However, I am still seeing the Ambari disk usage alert for /usr/hdp. Can you 
please let me know how to get rid of these alerts?

Thanks
Vijay


On 22 Oct 2015, at 19:02, Siddharth Wagle 
> wrote:

Hi Vijaya,

Please make sure all of the configs are accurate. 
(https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode)

Can you attach, your ams-site.xml and /etc/ams-hbase/conf/hbase-site.xml ?

- Sid


From: Vijaya Narayana Reddy Bhoomi Reddy 
>
Sent: Thursday, October 22, 2015 8:36 AM
To: user@ambari.apache.org
Subject: Issue with Ambari Metrics Collector - Distributed mode

Hi,

I am facing an issue while setting up Ambari Metrics in distributed mode. I am 
setting up HDP 2.3.x using Ambari 2.1.x. Initially, when I was setting up the 
cluster, I was shown a warning message that the volume/directory for the metrics 
service is the same as the one used by the datanode, and hence I was recommended to 
change it. So I went ahead and pointed it to hdfs, trying to set up the metrics 
service in distributed mode.

However, Ambari Metrics service is not set up properly and it timed out while 
setting up the cluster, showing a warning that Ambari Metrics service hasn’t 
started. I restarted the Metrics collector service multiple times, but it would 
stop again in a few seconds.

On further observation, I realised that in the ams-site.xml file, 
timeline.metrics.service.operation.mode was still pointing to “embedded", where 
as hbase-site.xml had all the required properties set correctly. So I changed 
the timeline.metrics.service.operation.mode property to “distributed” and 
restarted the required services as recommended by Ambari. However, the restart 
process got stuck at 68% and eventually timed out. It's not able to restart the 
Metrics Collector service. However, all the metrics monitor services are 
re-started without any issues.

Can anyone please throw light on why this is happening and what the solution is to 
fix this?
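
For reference, the property change described above can also be made from the 
command line with the configs.sh helper that ships with the server; a sketch, 
assuming it runs on the Ambari host against a cluster named "c1" with default 
admin credentials:

# switch AMS to distributed mode via ams-site
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set localhost c1 ams-site \
  "timeline.metrics.service.operation.mode" "distributed"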

Thanks
Vijay
--
The contents of this e-mail are confidential and for the exclusive use of
the intended recipient. If you receive this e-mail in error please delete
it from your system immediately and notify us either by e-mail or
telephone. You should not copy, forward or otherwise disclose the content
of the e-mail. The views expressed in this communication may not
necessarily be the view held by WHISHWORKS.






Re: Missing entries in oozie-site.xml after Upgrade Ambari 2.1.2

2015-10-19 Thread Jonathan Hurley
The upgrade should have preserved all of these properties. They are marked in 
the upgrade pack as “keep”, so if they existed before the upgrade, then they 
should have been present.

With that said, the values from the manual upgrade look correct. When you 
replace these values, what error does Oozie provide on startup?

On Oct 19, 2015, at 3:03 AM, Shaik M 
> wrote:

Hi Team,

I have recently upgraded my cluster to HDP 2.3 with Ambari 2.1.2.

I have verified the Oozie configuration in Ambari; in the Falcon - Oozie integration 
section all values are empty. Please find the attached screenshot for your 
reference.

I tried to fill in the values with the help of the link below, but after restarting 
Oozie it is not starting.

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/configuring_oozie_for_falcon.html

After I removed those values and filled them with a space (blank), I could bring the 
Oozie service back.

Please help me to get with correct values in missing properties.

Thanks,
Shaik




Re: ambari server startup halts for over five minutes

2015-10-15 Thread Jonathan Hurley
The ControllerModule is iterating over all classes which are instances of 
NotificationDispatcher and binding them to a singleton instance. At this point 
in the startup, the classes have been identified and the only real work being 
done is a conversion from a String to a Class instance before having guice 
create the bindings.

Can you produce a thread dump during those 6 minutes so we can see where the 
time is being spent?

On Oct 15, 2015, at 8:58 PM, Stephen Boesch 
> wrote:

I am building a new ambari cluster. Presently we just have the ambari binaries 
installed.

The following delays have been noted several times now: notice there is a six 
minute delay during which nothing appears to be occurring:

..
16 Oct 2015 00:45:48,471  INFO [main] Configuration:670 - Reading password from 
existing file
16 Oct 2015 00:45:48,481  INFO [main] Configuration:957 - Hosts Mapping File 
null
16 Oct 2015 00:45:48,481  INFO [main] HostsMap:60 - Using hostsmap file null
16 Oct 2015 00:45:48,800  INFO [main] ControllerModule:187 - Detected POSTGRES 
as the database type from the JDBC URL
16 Oct 2015 00:45:49,488  INFO [main] ControllerModule:564 - Binding and 
registering notification dispatcher class 
org.apache.ambari.server.notifications.dispatchers.AlertScriptDispatcher
16 Oct 2015 00:51:48,881  INFO [main] ControllerModule:564 - Binding and 
registering notification dispatcher class 
org.apache.ambari.server.notifications.dispatchers.SNMPDispatcher
16 Oct 2015 00:51:48,892  INFO [main] ControllerModule:564 - Binding and 
registering notification dispatcher class 
org.apache.ambari.server.notifications.dispatchers.EmailDispatcher
16 Oct 2015 00:51:51,047  INFO [main] AmbariServer:710 - Getting the controller
16 Oct 2015 00:51:51,907  INFO [main] StackManager:107 - Initializing the stack 
manager...
16 Oct 2015 00:51:51,908  INFO [main] StackManager:267 - Validating stack 
directory /var/lib/ambari-server/resources/stacks ...
16 Oct 2015 00:51:51,908  INFO [main] StackManager:243 - Validating common 
services directory /var/lib/ambari-server/resources/common-services ...
16 Oct 2015 00:51:52,271  INFO [main] StackDirectory:426 - Stack 
'/var/lib/ambari-server/resources/stacks/HDP/2.3.GlusterFS' doesn't contain an 
upgrade directory
 ..

What is going on here?



Re: Change Ambari Agent Disk Usage alert Threshold Limit

2015-10-12 Thread Jonathan Hurley
There is currently no way to change the directories which are checked by this 
alert. The alert is mostly concerned with the free space where stack components 
are installed (either /usr/hdp or /usr/lib).

You can easily create another alert script to check a different directory if 
desired.

On Oct 12, 2015, at 4:16 AM, Vijaya Narayana Reddy Bhoomi Reddy 
> 
wrote:

Hi,

Good Morning!!

I am also facing the same issue with the Ambari disk usage alert. 
However, I would like to understand the process well before the alerting, 
i.e. the allocation of disk space itself. Suppose I had 2 volumes 
mounted: /dev/sda1 which is 50GB and /dev/sda2 which is 1TB.

Now when I install HDP with these two volumes, HDFS space is correctly 
allocated i.e. /dev/sda2 is given for HDFS. However, I believe /dev/sda1 is 
used by Ambari for its internal storage. As in my case /dev/sda1 has other data 
as well, including OS, it is obviously less in disk space.

Can anyone please let me know how to configure Ambari to use disk space of my 
choice rather than auto selecting the smaller partition?

Thanks & Regards
Vijay

On 9 Oct 2015, at 12:44, Jeffrey Sposetti 
> wrote:

The Ambari Agent Disk Usage Alert is a SCRIPT type alert.

https://github.com/apache/ambari/blob/branch-2.1/ambari-server/src/main/resources/alerts.json#L31-L70

Its thresholds are not configurable in the Ambari Web UI, but you can modify them via 
the API.



From: Shaik M >
Sent: Thursday, October 08, 2015 11:41 PM
To: user@ambari.apache.org
Subject: Change Ambari Agent Disk Usage alert Threshold Limit

Hi,

I am looking to change the Ambari Agent Disk Usage alert threshold limit, but 
I couldn't find that kind of option.

I saw only description and time interval.

Please let us know how to change the Ambari Agent Disk Usage alert threshold 
limit as per our need.

Thanks,
Shaik


The contents of this e-mail are confidential and for the exclusive use of the 
intended recipient. If you receive this e-mail in error please delete it from 
your system immediately and notify us either by e-mail or telephone. You should 
not copy, forward or otherwise disclose the content of the e-mail. The views 
expressed in this communication may not necessarily be the view held by 
WHISHWORKS.



Re: Ambari 2.1.0-snap: Alert configuration location

2015-06-18 Thread Jonathan Hurley
You can update the values, but you’ll need to use the APIs to do this. When 
sending a new “source” element, you need to include all fields - it will not 
merge omitted source child elements in with the existing source:

PUT api/v1/clusters/cluster-name/alert_definitions/definition-id

{
  "AlertDefinition" : {
    "source" : {
      "type" : "SCRIPT",
      "path" : "alert_disk_space.py",
      "parameters" : [
        {
          "name" : "minimum.free.space",
          "display_name" : "Minimum Free Space",
          "value" : 50,
          "type" : "NUMERIC",
          "description" : "The overall amount of free disk space left before an alert is triggered.",
          "units" : "bytes",
          "threshold" : "WARNING"
        },
        {
          "name" : "percent.used.space.warning.threshold",
          "display_name" : "Warning",
          "value" : 0.5,
          "type" : "PERCENT",
          "description" : "The percent of disk space consumed before a warning is triggered.",
          "units" : "%",
          "threshold" : "WARNING"
        },
        {
          "name" : "percent.free.space.critical.threshold",
          "display_name" : "Critical",
          "value" : 0.8,
          "type" : "PERCENT",
          "description" : "The percent of disk space consumed before a critical alert is triggered.",
          "units" : "%",
          "threshold" : "CRITICAL"
        }
      ]
    }
  }



On Jun 18, 2015, at 10:05 AM, Sumit Mohanty 
<smoha...@hortonworks.com> wrote:

Does this have the details you need?
https://github.com/apache/ambari/blob/branch-2.0.0/ambari-server/docs/api/v1/alert-definitions.md

-Sumit

From: Eirik Thorsnes eirik.thors...@uni.no
Sent: Thursday, June 18, 2015 2:58 AM
To: user@ambari.apache.org
Subject: Ambari 2.1.0-snap: Alert configuration location

Hi,

I'm looking at Ambari 2.1.0 compiled from git.
In JIRA AMBARI-10816 the possibility was added to use configuration
instead of hard-coded values for alerts (as I understand it).

Where do I set these configurations?
For e.g.: the configuration percent.used.space.warning.threshold in the
alert_disk_space.py script.

Thanks,
Eirik

--
Eirik Thorsnes




Re: Ambari 2.1.0-snap: Alert configuration location

2015-06-18 Thread Jonathan Hurley
Yes, they should be there. Was this an upgrade from an earlier Ambari 2.0 
install? In that case they might not be there, since the 2.0 definitions didn’t 
have them. Also, my JSON below had a syntax error in it; there should be an 
extra closing }

 On Jun 18, 2015, at 11:12 AM, Eirik Thorsnes eirik.thors...@uni.no wrote:
 
 On June 18, 2015 at 16:53, Jonathan Hurley wrote:
 You can update the values, but you’ll need to use the APIs to do this.
 When sending a new “source” element, you need to include all fields - it
 will not merge omitted source child elements in with the existing source:
 
 PUT api/v1/clusters/cluster-name/alert_definitions/definition-id
 
 Thank you, I'll try that.
 
 Are the parameters section supposed to be already there in the definitions?
 
 If I issue a GET I only see the following:
 
 GET -H 'X-Requested-By: ambari'
 http://localhost:8080/api/v1/clusters/helm/alert_definitions/52
 {
   "href" : "http://localhost:8080/api/v1/clusters/helm/alert_definitions/52",
   "AlertDefinition" : {
     "cluster_name" : "helm",
     "component_name" : "AMBARI_AGENT",
     "description" : "This host-level alert is triggered if the amount of
 disk space used on a host goes above specific thresholds. The default
 values are 50% for WARNING and 80% for CRITICAL.",
     "enabled" : true,
     "id" : 52,
     "ignore_host" : false,
     "interval" : 1,
     "label" : "Ambari Agent Disk Usage",
     "name" : "ambari_agent_disk_usage",
     "scope" : "HOST",
     "service_name" : "AMBARI",
     "source" : {
       "path" : "alert_disk_space.py",
       "type" : "SCRIPT"
     }
   }
 }
 
 Regards,
 Eirik
 
 -- 
 Eirik Thorsnes
 
 



Re: Restarting nodes to avoid HTTP 403

2015-05-22 Thread Jonathan Hurley
What if you make a connection locally using wget; does tcpdump capture output 
then? Everything about this issue seems to indicate it’s environmental.

On May 19, 2015, at 10:11 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

With your command I have got the exact same result : tcpdump returns nothing.

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Tuesday, May 19, 2015 14:42
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

It might be because you’re using the loopback adapter. Your command should 
probably be

tcpdump -i eth0 -s0 -n dst port 8042

On May 18, 2015, at 11:44 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

First of all sorry for my (very) late answer.
I confirm that restarting the agents before installing the cluster fixes the 
issue. But what is slightly more complicated is the fact that there is no 
network trace.
The command "sudo tcpdump -i lo -l -s0 -w - tcp dst port 8042 | strings" 
executed on the host returning 403 returns absolutely nothing.

Does that help you ?
If you want any additional information, feel free to ask.

Regards,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Wednesday, May 13, 2015 23:39
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

Well, I’m really baffled by this. Restarting the agents before installing a 
cluster fixes the issue as well? So it seems like after agents are installed 
they are not able to make connections from python without getting a 403 
forbidden until they are restarted at least once. Is it possible to get a 
network trace of the agents when they encounter the 403 forbidden? That way we 
can see the communication path between the agent and a particular endpoint, 
like NameNode WebUI.

On May 13, 2015, at 8:31 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Additional information: the restart does not necessarily have to be AFTER the 
cluster has been deployed using a Blueprint. Restarting the ambari-agent 
before using a blueprint to deploy the cluster makes a perfectly clean cluster, 
without the 11 Warnings/Errors I mentioned in my previous emails.

Hope this will help understand where the problem comes from.
Have a nice day,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 18:53
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

All of this really seems to point to some kind of firewall/proxy issue on the 
agent hosts. From your agent named 
vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr, could you try the 
following python code:

 import urllib2
 response = urllib2.urlopen(
     "http://vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr:8042/ws/v1/node/info",
     timeout=10.0)
 print(response.code)

I’m really curious if executing that from your agent host results in a 200 or a 
403.

On May 7, 2015, at 11:45 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Here are the answers to your questions :

1) Which operating system are you running the agents on?
-- CentOS6

2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
-- There is a proxy, but there is no such setup (I mean “echo $http_proxy” 
returns nothing, and I do not remember setting Ambari properly in order to 
access to the Internet via the proxy)

3) Which stack are you deploying?
-- I am deploying HDP-2.2.4.2-2

Have a nice weekend,


Loïc


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 17:34
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

The logs indicate that the alerts are running correctly and are simply hitting 
a 403. Normally, you might encounter this kind of problem from Python when 
making web connections without specifying a known user agent header. Python’s 
default header sometimes causes issues since it’s not standard. However, that 
problem would continue to happen after you restarted the agents. The fact that 
a simple agent restart completely fixes the issue is baffling.

I’ve certainly never seen this type of behavior before. I’d like to know a few 
more details on your environment:
1) Which operating system are you running the agents on?
2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
3) Which stack are you deploying?

On May 7, 2015, at 4:05 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi

Re: Restarting nodes to avoid HTTP 403

2015-05-19 Thread Jonathan Hurley
It might be because you’re using the loopback adapter. Your command should 
probably be

tcpdump -i eth0 -s0 -n dst port 8042

On May 18, 2015, at 11:44 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

First of all sorry for my (very) late answer.
I confirm that restarting the agents before installing the cluster fixes the 
issue. But what is slightly more complicated is the fact that there is no 
network trace.
The command "sudo tcpdump -i lo -l -s0 -w - tcp dst port 8042 | strings" 
executed on the host returning 403 returns absolutely nothing.

Does that help you ?
If you want any additional information, feel free to ask.

Regards,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Wednesday, May 13, 2015 23:39
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

Well, I’m really baffled by this. Restarting the agents before installing a 
cluster fixes the issue as well? So it seems like after agents are installed 
they are not able to make connections from python without getting a 403 
forbidden until they are restarted at least once. Is it possible to get a 
network trace of the agents when they encounter the 403 forbidden? That way we 
can see the communication path between the agent and a particular endpoint, 
like NameNode WebUI.

On May 13, 2015, at 8:31 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Additional information: the restart does not necessarily have to be AFTER the 
cluster has been deployed using a Blueprint. Restarting the ambari-agent 
before using a blueprint to deploy the cluster makes a perfectly clean cluster, 
without the 11 Warnings/Errors I mentioned in my previous emails.

Hope this will help understand where the problem comes from.
Have a nice day,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 18:53
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

All of this really seems to point to some kind of firewall/proxy issue on the 
agent hosts. From your agent named 
vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr, could you try the 
following python code:

 import urllib2
 response = urllib2.urlopen(
     "http://vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr:8042/ws/v1/node/info",
     timeout=10.0)
 print(response.code)

I’m really curious if executing that from your agent host results in a 200 or a 
403.

On May 7, 2015, at 11:45 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Here are the answers to your questions :

1) Which operating system are you running the agents on?
-- CentOS6

2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
-- There is a proxy, but there is no such setup (I mean “echo $http_proxy” 
returns nothing, and I do not remember setting Ambari properly in order to 
access to the Internet via the proxy)

3) Which stack are you deploying?
-- I am deploying HDP-2.2.4.2-2

Have a nice weekend,


Loïc


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 17:34
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

The logs indicate that the alerts are running correctly and are simply hitting 
a 403. Normally, you might encounter this kind of problem from Python when 
making web connections without specifying a known user agent header. Python’s 
default header sometimes causes issues since it’s not standard. However, that 
problem would continue to happen after you restarted the agents. The fact that 
a simple agent restart completely fixes the issue is baffling.

I’ve certainly never seen this type of behavior before. I’d like to know a few 
more details on your environment:
1) Which operating system are you running the agents on?
2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
3) Which stack are you deploying?

On May 7, 2015, at 4:05 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

You will find in attachment what I get from the API you gave me the URL.
As far as the log file is concerned, here is an extract from the one on the 
NameNode :

WARNING 2015-05-07 10:01:47,221 base_alert.py:365 - 
[Alert][namenode_directory_status] HA nameservice value is present but there 
are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice
WARNING 2015-05-07 10:01:47,222 base_alert.py:365 - 
[Alert][datanode_health_summary] HA nameservice value is present but there are 
no aliases for {{hdfs-site

Re: Restarting nodes to avoid HTTP 403

2015-05-18 Thread Jonathan Hurley
That’s kind of what I expected. I believe that the web request isn’t even 
making it to the host; that there’s something in your environment redirecting 
your request and returning a 403. I’d instead do a packet capture the other 
way, from the agent, showing outbound requests on 8042.

On May 18, 2015, at 11:44 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

First of all sorry for my (very) late answer.
I confirm that restarting the agents before installing the cluster fixes the 
issue. But what is slightly more complicated is the fact that there is no 
network trace.
The command "sudo tcpdump -i lo -l -s0 -w - tcp dst port 8042 | strings" 
executed on the host returning 403 returns absolutely nothing.

Does that help you ?
If you want any additional information, feel free to ask.

Regards,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Wednesday, May 13, 2015 23:39
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

Well, I’m really baffled by this. Restarting the agents before installing a 
cluster fixes the issue as well? So it seems like after agents are installed 
they are not able to make connections from python without getting a 403 
forbidden until they are restarted at least once. Is it possible to get a 
network trace of the agents when they encounter the 403 forbidden? That way we 
can see the communication path between the agent and a particular endpoint, 
like NameNode WebUI.

On May 13, 2015, at 8:31 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Additional information: the restart does not necessarily have to be AFTER the 
cluster has been deployed using a Blueprint. Restarting the ambari-agent 
before using a blueprint to deploy the cluster makes a perfectly clean cluster, 
without the 11 Warnings/Errors I mentioned in my previous emails.

Hope this will help understand where the problem comes from.
Have a nice day,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 18:53
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

All of this really seems to point to some kind of firewall/proxy issue on the 
agent hosts. From your agent named 
vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr, could you try the 
following python code:

 import urllib2
 response = urllib2.urlopen(
     "http://vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr:8042/ws/v1/node/info",
     timeout=10.0)
 print(response.code)

I’m really curious if executing that from your agent host results in a 200 or a 
403.

On May 7, 2015, at 11:45 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Here are the answers to your questions :

1) Which operating system are you running the agents on?
-- CentOS6

2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
-- There is a proxy, but there is no such setup (I mean “echo $http_proxy” 
returns nothing, and I do not remember setting Ambari properly in order to 
access to the Internet via the proxy)

3) Which stack are you deploying?
-- I am deploying HDP-2.2.4.2-2

Have a nice weekend,


Loïc


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 17:34
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

The logs indicate that the alerts are running correctly and are simply hitting 
a 403. Normally, you might encounter this kind of problem from Python when 
making web connections without specifying a known user agent header. Python’s 
default header sometimes causes issues since it’s not standard. However, that 
problem would continue to happen after you restarted the agents. The fact that 
a simple agent restart completely fixes the issue is baffling.

I’ve certainly never seen this type of behavior before. I’d like to know a few 
more details on your environment:
1) Which operating system are you running the agents on?
2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
3) Which stack are you deploying?

On May 7, 2015, at 4:05 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

You will find in attachment what I get from the API you gave me the URL.
As far as the log file is concerned, here is an extract from the one on the 
NameNode :

WARNING 2015-05-07 10:01:47,221 base_alert.py:365 - 
[Alert][namenode_directory_status] HA nameservice value is present but there 
are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice

Re: Restarting nodes to avoid HTTP 403

2015-05-13 Thread Jonathan Hurley
Well, I’m really baffled by this. Restarting the agents before installing a 
cluster fixes the issue as well? So it seems like after agents are installed 
they are not able to make connections from python without getting a 403 
forbidden until they are restarted at least once. Is it possible to get a 
network trace of the agents when they encounter the 403 forbidden? That way we 
can see the communication path between the agent and a particular endpoint, 
like NameNode WebUI.

On May 13, 2015, at 8:31 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Additional information: the restart does not necessarily have to be AFTER the 
cluster has been deployed using a Blueprint. Restarting the ambari-agent 
before using a blueprint to deploy the cluster makes a perfectly clean cluster, 
without the 11 Warnings/Errors I mentioned in my previous emails.

Hope this will help understand where the problem comes from.
Have a nice day,


Loïc

From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 18:53
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

All of this really seems to point to some kind of firewall/proxy issue on the 
agent hosts. From your agent named 
vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr, could you try the 
following python code:

 import urllib2
 response = urllib2.urlopen(
     "http://vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr:8042/ws/v1/node/info",
     timeout=10.0)
 print(response.code)

I’m really curious if executing that from your agent host results in a 200 or a 
403.

On May 7, 2015, at 11:45 AM, Chanel Loïc 
<loic.cha...@worldline.com> wrote:

Hi Jonathan,

Here are the answers to your questions :

1) Which operating system are you running the agents on?
-- CentOS6

2) Is there a linux proxy setup, such as “export http_proxy=foo” - curl doesn’t 
respect this proxy setting but python’s urllib2 does, which has caused some 
issues before
-- There is a proxy, but there is no such setup (I mean “echo $http_proxy” 
returns nothing, and I do not remember setting Ambari properly in order to 
access to the Internet via the proxy)

3) Which stack are you deploying?
-- I am deploying HDP-2.2.4.2-2

Have a nice weekend,


Loïc


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 17:34
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

The logs indicate that the alerts are running correctly and are simply hitting a 403. Normally, you might encounter this kind of problem from Python when making web connections without specifying a known User-Agent header. Python’s default User-Agent sometimes causes issues since it’s not standard. However, that problem would continue to happen after you restarted the agents. The fact that a simple agent restart completely fixes the issue is baffling.
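
For completeness, a minimal Python 2 sketch of that kind of header override; the URL and agent string are illustrative, not what Ambari itself sends:

    import urllib2

    # Python 2 identifies itself as "Python-urllib/2.x", which some servers
    # reject; supplying an explicit User-Agent clears header-based 403s.
    request = urllib2.Request(
        "http://namenode.example.com:50070/jmx",
        headers={"User-Agent": "ambari-alert-check"})
    response = urllib2.urlopen(request, timeout=10.0)
    print(response.code)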

I’ve certainly never seen this type of behavior before. I’d like to know a few 
more details on your environment:
1) Which operating system are you running the agents on?
2) Is there a Linux proxy setup, such as “export http_proxy=foo”? curl doesn’t respect this proxy setting, but Python’s urllib2 does, which has caused some issues before
3) Which stack are you deploying?

On May 7, 2015, at 4:05 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi Jonathan,

You will find attached what I get from the API URL you gave me.
As far as the log file is concerned, here is an extract from the one on the NameNode:

WARNING 2015-05-07 10:01:47,221 base_alert.py:365 - 
[Alert][namenode_directory_status] HA nameservice value is present but there 
are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2015-05-07 10:01:47,222 base_alert.py:365 - 
[Alert][datanode_health_summary] HA nameservice value is present but there are 
no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2015-05-07 10:01:47,228 scheduler.py:509 - Running job 
e89e2c29-1f2c-4bf5-a37c-7a6c5b43433a (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.211937) (scheduled at 2015-05-07 10:01:47.211937)
INFO 2015-05-07 10:01:47,229 scheduler.py:509 - Running job 
ad0e0fd3-d5f3-471f-955e-ec10f53cdd5b (trigger: interval[0:01:00], next run at: 
2015-05-07 10:01:47.212773) (scheduled at 2015-05-07 10:01:47.212773)
WARNING 2015-05-07 10:01:47,230 base_alert.py:365 - [Alert][namenode_webui] HA 
nameservice value is present but there are no aliases for 
{{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2015-05-07 10:01:47,230 scheduler.py:527 - Job 
71988556-19e3-4871-92e0-e3c0a838df13 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.206667) executed successfully
INFO 2015-05-07 10:01:47,238 scheduler.py:509 - Running job 
2de4dc85-8993

Re: Restarting nodes to avoid HTTP 403

2015-05-07 Thread Jonathan Hurley
The logs indicate that the alerts are running correctly and are simply hitting a 403. Normally, you might encounter this kind of problem from Python when making web connections without specifying a known User-Agent header. Python’s default User-Agent sometimes causes issues since it’s not standard. However, that problem would continue to happen after you restarted the agents. The fact that a simple agent restart completely fixes the issue is baffling.

I’ve certainly never seen this type of behavior before. I’d like to know a few 
more details on your environment:
1) Which operating system are you running the agents on?
2) Is there a Linux proxy setup, such as “export http_proxy=foo”? curl doesn’t respect this proxy setting, but Python’s urllib2 does, which has caused some issues before
3) Which stack are you deploying?

On May 7, 2015, at 4:05 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi Jonathan,

You will find attached what I get from the API URL you gave me.
As far as the log file is concerned, here is an extract from the one on the NameNode:

WARNING 2015-05-07 10:01:47,221 base_alert.py:365 - 
[Alert][namenode_directory_status] HA nameservice value is present but there 
are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2015-05-07 10:01:47,222 base_alert.py:365 - 
[Alert][datanode_health_summary] HA nameservice value is present but there are 
no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2015-05-07 10:01:47,228 scheduler.py:509 - Running job 
e89e2c29-1f2c-4bf5-a37c-7a6c5b43433a (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.211937) (scheduled at 2015-05-07 10:01:47.211937)
INFO 2015-05-07 10:01:47,229 scheduler.py:509 - Running job 
ad0e0fd3-d5f3-471f-955e-ec10f53cdd5b (trigger: interval[0:01:00], next run at: 
2015-05-07 10:01:47.212773) (scheduled at 2015-05-07 10:01:47.212773)
WARNING 2015-05-07 10:01:47,230 base_alert.py:365 - [Alert][namenode_webui] HA 
nameservice value is present but there are no aliases for 
{{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2015-05-07 10:01:47,230 scheduler.py:527 - Job 
71988556-19e3-4871-92e0-e3c0a838df13 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.206667) executed successfully
INFO 2015-05-07 10:01:47,238 scheduler.py:509 - Running job 
2de4dc85-8993-4c68-9915-db25c4313d6e (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.213529) (scheduled at 2015-05-07 10:01:47.213529)
WARNING 2015-05-07 10:01:47,240 base_alert.py:140 - 
[Alert][datanode_health_summary] Unable to execute alert. HTTP Error 403: 
Forbidden
INFO 2015-05-07 10:01:47,242 scheduler.py:527 - Job 
5bcf7b5d-73e7-4d59-bc0a-f4772d4e3166 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.208176) executed successfully
INFO 2015-05-07 10:01:47,241 scheduler.py:509 - Running job 
ff1fc102-bdec-4858-baf5-8b60de4488e4 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:01:47.214826) (scheduled at 2015-05-07 10:01:47.214826)
WARNING 2015-05-07 10:01:47,244 base_alert.py:365 - 
[Alert][yarn_resourcemanager_webui] HA nameservice value is present but there 
are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
INFO 2015-05-07 10:01:47,240 scheduler.py:527 - Job 
72e5031e-4c2a-4236-b09e-6a749100bc9a (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.203349) executed successfully
INFO 2015-05-07 10:01:47,250 scheduler.py:509 - Running job 
a083e3c2-d4b2-430b-9b64-b60c783a06a5 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:01:47.218706) (scheduled at 2015-05-07 10:01:47.218706)
INFO 2015-05-07 10:01:47,250 scheduler.py:509 - Running job 
7e9470cb-043b-48a8-990a-0050f2c63311 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.217519) (scheduled at 2015-05-07 10:01:47.217519)
WARNING 2015-05-07 10:01:47,253 base_alert.py:140 - 
[Alert][namenode_directory_status] Unable to execute alert. HTTP Error 403: 
Forbidden

So yes, I can see some output in the logs corresponding to the alerts I get on 
the Ambari web app.
Please tell me if you need any complementary information about my problem,

Thanks,


Loïc


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Wednesday, May 6, 2015 16:24
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

OK, so I think I have a clear picture of how you get to this situation. I’d 
still like to know a few things:

1) When you have the warnings present in the web client, can you try the alerts URL I posted below to see the actual alerts coming back from the API? I’m mostly interested in whether any come back in the first place, and what the most recent timestamp was (indicating they are actually running and still reporting a warning status). A curl sketch follows these questions.

2) When the alerts are present in the web client, do you see any output in the agent log file that I mentioned for messages that start with “[Alert]”?
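
A sketch of check 1 with curl; the credentials, host, and cluster name are placeholders, and the field names are from the Alert resource as I recall it:

    curl -u admin:admin \
      'http://<server>:8080/api/v1/clusters/<cluster>/alerts?fields=Alert/state,Alert/latest_timestamp'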

On May 6, 2015

Re: Restarting nodes to avoid HTTP 403

2015-05-07 Thread Jonathan Hurley
All of this really seems to point to some kind of firewall/proxy issue on the agent hosts. From your agent named vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr, could you try the following Python code:

    import urllib2

    # Query the NodeManager REST endpoint directly from the agent host.
    response = urllib2.urlopen(
        "http://vm-03cfbc97-f027-46fe-8e65-cb8c54edf377.frida.priv.atos.fr:8042/ws/v1/node/info",
        timeout=10.0)
    print(response.code)

I’m really curious if executing that from your agent host results in a 200 or a 
403.

On May 7, 2015, at 11:45 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi Jonathan,

Here are the answers to your questions:

1) Which operating system are you running the agents on?
-- CentOS 6

2) Is there a Linux proxy setup, such as “export http_proxy=foo”? curl doesn’t respect this proxy setting, but Python’s urllib2 does, which has caused some issues before
-- There is a proxy, but there is no such setup (I mean “echo $http_proxy” returns nothing, and I do not remember configuring Ambari to access the Internet via the proxy)

3) Which stack are you deploying?
-- I am deploying HDP-2.2.4.2-2

Have a nice weekend,


Loïc


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Thursday, May 7, 2015 17:34
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

The logs indicate that the alerts are running correctly and are simply hitting a 403. Normally, you might encounter this kind of problem from Python when making web connections without specifying a known User-Agent header. Python’s default User-Agent sometimes causes issues since it’s not standard. However, that problem would continue to happen after you restarted the agents. The fact that a simple agent restart completely fixes the issue is baffling.

I’ve certainly never seen this type of behavior before. I’d like to know a few 
more details on your environment:
1) Which operating system are you running the agents on?
2) Is there a Linux proxy setup, such as “export http_proxy=foo”? curl doesn’t respect this proxy setting, but Python’s urllib2 does, which has caused some issues before
3) Which stack are you deploying?

On May 7, 2015, at 4:05 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi Jonathan,

You will find attached what I get from the API URL you gave me.
As far as the log file is concerned, here is an extract from the one on the NameNode:

WARNING 2015-05-07 10:01:47,221 base_alert.py:365 - 
[Alert][namenode_directory_status] HA nameservice value is present but there 
are no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
WARNING 2015-05-07 10:01:47,222 base_alert.py:365 - 
[Alert][datanode_health_summary] HA nameservice value is present but there are 
no aliases for {{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2015-05-07 10:01:47,228 scheduler.py:509 - Running job 
e89e2c29-1f2c-4bf5-a37c-7a6c5b43433a (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.211937) (scheduled at 2015-05-07 10:01:47.211937)
INFO 2015-05-07 10:01:47,229 scheduler.py:509 - Running job 
ad0e0fd3-d5f3-471f-955e-ec10f53cdd5b (trigger: interval[0:01:00], next run at: 
2015-05-07 10:01:47.212773) (scheduled at 2015-05-07 10:01:47.212773)
WARNING 2015-05-07 10:01:47,230 base_alert.py:365 - [Alert][namenode_webui] HA 
nameservice value is present but there are no aliases for 
{{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}
INFO 2015-05-07 10:01:47,230 scheduler.py:527 - Job 
71988556-19e3-4871-92e0-e3c0a838df13 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.206667) executed successfully
INFO 2015-05-07 10:01:47,238 scheduler.py:509 - Running job 
2de4dc85-8993-4c68-9915-db25c4313d6e (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.213529) (scheduled at 2015-05-07 10:01:47.213529)
WARNING 2015-05-07 10:01:47,240 base_alert.py:140 - 
[Alert][datanode_health_summary] Unable to execute alert. HTTP Error 403: 
Forbidden
INFO 2015-05-07 10:01:47,242 scheduler.py:527 - Job 
5bcf7b5d-73e7-4d59-bc0a-f4772d4e3166 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.208176) executed successfully
INFO 2015-05-07 10:01:47,241 scheduler.py:509 - Running job 
ff1fc102-bdec-4858-baf5-8b60de4488e4 (trigger: interval[0:01:00], next run at: 
2015-05-07 10:01:47.214826) (scheduled at 2015-05-07 10:01:47.214826)
WARNING 2015-05-07 10:01:47,244 base_alert.py:365 - 
[Alert][yarn_resourcemanager_webui] HA nameservice value is present but there 
are no aliases for {{yarn-site/yarn.resourcemanager.ha.rm-ids}}
INFO 2015-05-07 10:01:47,240 scheduler.py:527 - Job 
72e5031e-4c2a-4236-b09e-6a749100bc9a (trigger: interval[0:01:00], next run at: 
2015-05-07 10:02:47.203349) executed successfully
INFO 2015-05-07 10:01:47,250 scheduler.py:509 - Running job 
a083e3c2-d4b2-430b-9b64-b60c783a06a5 (trigger: interval[0:01:00], next run

Re: Restarting nodes to avoid HTTP 403

2015-05-06 Thread Jonathan Hurley
OK, so I think I have a clear picture of how you get to this situation. I’d 
still like to know a few things:

1) When you have the warnings present in the web client, can you try the alerts URL I posted below to see the actual alerts coming back from the API? I’m mostly interested in whether any come back in the first place, and what the most recent timestamp was (indicating they are actually running and still reporting a warning status)

2) When the alerts are present in the web client, do you see any output in the agent log file that I mentioned for messages that start with “[Alert]”?

On May 6, 2015, at 5:14 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

I encounter the warnings and critical alerts when deploying the cluster. I install Ambari server 2.0 on a VM and Ambari agent 2.0 on 4 other VMs. Then, I give the Ambari server a blueprint coming from a functioning cluster to instantiate my new cluster and get a quickstart configuration.

Then, when logging into the Ambari web application to ensure everything is running properly, I see these alerts concerning the HTTP 403 errors returned by all host VMs except the one that only handles Ambari Metrics.

As I am not sure my explanations are quite understandable, do not hesitate to 
tell me if something remains unclear.
Thanks,


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Tuesday, May 5, 2015 20:38
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

If restarting the agents fixes everything, can you explain when you first 
encounter the warnings? Is this only after a cluster deployment? You can also 
check to see if this is some sort of web client issue by issuing the following 
GET:

http://<server>/api/v1/clusters/<cluster>/alerts?fields=*&Alert/state.in(CRITICAL,WARNING)

This will show you alerts which are actually being returned from the agents in 
a warning or critical state.

For reference, you can also look in /var/log/ambari-agent/ambari-agent.log to see if you see any alert issues. The most important messages are prefixed with “[Alert]”.
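
For example, a quick way to pull just those lines out of the agent log:

    # Show only alert-related messages, most recent last
    grep '\[Alert\]' /var/log/ambari-agent/ambari-agent.log | tail -n 50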

On May 5, 2015, at 11:43 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Indeed, I did not give much information about my problem, sorry about that. Here are the answers:

1) Version of Ambari
-- I’m using Ambari 2.0

2) Whether the environment is kerberized
-- No, it’s not. Kerberos security is not enabled on this cluster.

3) Are you running the Ambari agent as root, or another user
-- I am running it as root

4) Any information from the ambari-agent.log file that might seem to indicate a problem
-- Here is another weird thing I did not mention: I could not find any logs referring to the problem.

5) You said that restarting the agents resolves the issue. Does it continue to happen after restarting? If so, how long before new warnings start to show up
-- Restarting the agent totally resolves the problem. It does not happen anymore, and everything runs quite normally.


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Tuesday, May 5, 2015 17:36
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

Can you provide some more information on your environment, such as:

1) Version of Ambari
2) Whether the environment is kerberized
3) Are you running the Ambari agent as root, or another user.
4) Any information from the ambari-agent.log file that might seem to indicate a problem
5) You said that restarting the agents resolves the issue. Does it continue to 
happen after restarting? If so, how long before new warnings start to show up.

I’m guessing you have a kerberized environment running Ambari 2.0. Ambari will 
use curl in this case to attempt to make a connection to the web endpoints. It 
uses the keytabs and principals defined on the alert definition. For NameNode, 
as an example, it would use:

hdfs-site/dfs.web.authentication.kerberos.keytab
hdfs-site/dfs.web.authentication.kerberos.principal

You’ll want to verify that these properties are correctly set and that the 
keytab file is accessible to the agent user.
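
One hedged way to exercise that same path by hand from an agent host; the keytab path and principal below are typical HDP defaults, not necessarily yours:

    # Obtain a ticket as the SPNEGO principal, then hit the NameNode web UI
    # with negotiate auth and print only the HTTP status code.
    kinit -kt /etc/security/keytabs/spnego.service.keytab HTTP/$(hostname -f)
    curl --negotiate -u : -s -o /dev/null -w '%{http_code}\n' 'http://<namenode-host>:50070/jmx'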

It could also be a cache problem, as the agents cache the kerberos credentials in the agent’s temp directory. How long it takes before alerts start producing warnings would help in determining if it’s a caching issue.


On May 5, 2015, at 10:49 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi,

I currently have some issues with cluster nodes. According to the Ambari web user interface, I have 11 alerts linked to the fact that all nodes (except the monitoring one) return an HTTP 403 response for services such as the NameNode web UI, DataNode web UI, or NodeManager Health.

The first weird thing is the fact that I can easily see that the so-called 
Forbidden ports are actually

Re: Restarting nodes to avoid HTTP 403

2015-05-05 Thread Jonathan Hurley
Can you provide some more information on your environment, such as:

1) Version of Ambari
2) Whether the environment is kerberized
3) Are you running the Ambari agent as root, or another user.
4) Any information from the ambari-agent.log file that might seem to indicate a problem
5) You said that restarting the agents resolves the issue. Does it continue to 
happen after restarting? If so, how long before new warnings start to show up.

I’m guessing you have a kerberized environment running Ambari 2.0. Ambari will 
use curl in this case to attempt to make a connection to the web endpoints. It 
uses the keytabs and principals defined on the alert definition. For NameNode, 
as an example, it would use:

hdfs-site/dfs.web.authentication.kerberos.keytab
hdfs-site/dfs.web.authentication.kerberos.principal

You’ll want to verify that these properties are correctly set and that the 
keytab file is accessible to the agent user.

It could also be a cache problem, as the agents cache the kerberos credentials in the agent’s temp directory. How long it takes before alerts start producing warnings would help in determining if it’s a caching issue.


On May 5, 2015, at 10:49 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi,

I currently have some issues with cluster nodes. According to the Ambari web user interface, I have 11 alerts linked to the fact that all nodes (except the monitoring one) return an HTTP 403 response for services such as the NameNode web UI, DataNode web UI, or NodeManager Health.

The first weird thing is the fact that I can easily see that the so-called forbidden ports are actually quite available (via cURL, for example) and indicate that the cluster is totally OK.
The second is the fact that the 11 alerts magically disappear when restarting the agent on the hosts related to the errors.

Does anyone know where these errors might come from?

Thanks,


Loïc






Re: Restarting nodes to avoid HTTP 403

2015-05-05 Thread Jonathan Hurley
If restarting the agents fixes everything, can you explain when you first 
encounter the warnings? Is this only after a cluster deployment? You can also 
check to see if this is some sort of web client issue by issuing the following 
GET:

http://<server>/api/v1/clusters/<cluster>/alerts?fields=*&Alert/state.in(CRITICAL,WARNING)

This will show you alerts which are actually being returned from the agents in 
a warning or critical state.

For reference, you can also look in /var/log/ambari-agent/ambari-agent.log to see if you see any alert issues. The most important messages are prefixed with “[Alert]”.

On May 5, 2015, at 11:43 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Indeed, I did not give much information about my problem, sorry about that. Here are the answers:

1) Version of Ambari
-- I’m using Ambari 2.0

2) Whether the environment is kerberized
-- No, it’s not. Kerberos security is not enabled on this cluster.

3) Are you running the Ambari agent as root, or another user
-- I am running it as root

4) Any information from the ambari-agent.log file that might seem to indicate a problem
-- Here is another weird thing I did not mention: I could not find any logs referring to the problem.

5) You said that restarting the agents resolves the issue. Does it continue to happen after restarting? If so, how long before new warnings start to show up
-- Restarting the agent totally resolves the problem. It does not happen anymore, and everything runs quite normally.


From: Jonathan Hurley [mailto:jhur...@hortonworks.com]
Sent: Tuesday, May 5, 2015 17:36
To: user@ambari.apache.org
Subject: Re: Restarting nodes to avoid HTTP 403

Can you provide some more information on your environment, such as:

1) Version of Ambari
2) Whether the environment is kerberized
3) Are you running the Ambari agent as root, or another user.
4) Any information from the ambari-agent.log file that might seem to indicate a problem
5) You said that restarting the agents resolves the issue. Does it continue to 
happen after restarting? If so, how long before new warnings start to show up.

I’m guessing you have a kerberized environment running Ambari 2.0. Ambari will 
use curl in this case to attempt to make a connection to the web endpoints. It 
uses the keytabs and principals defined on the alert definition. For NameNode, 
as an example, it would use:

hdfs-site/dfs.web.authentication.kerberos.keytab
hdfs-site/dfs.web.authentication.kerberos.principal

You’ll want to verify that these properties are correctly set and that the 
keytab file is accessible to the agent user.

It could also be a cache problem, as the agents cache the kerberos credentials in the agent’s temp directory. How long it takes before alerts start producing warnings would help in determining if it’s a caching issue.


On May 5, 2015, at 10:49 AM, Chanel Loïc <loic.cha...@worldline.com> wrote:

Hi,

I currently have some issues with cluster nodes. According to the Ambari web user interface, I have 11 alerts linked to the fact that all nodes (except the monitoring one) return an HTTP 403 response for services such as the NameNode web UI, DataNode web UI, or NodeManager Health.

The first weird thing is the fact that I can easily see that the so-called forbidden ports are actually quite available (via cURL, for example) and indicate that the cluster is totally OK.
The second is the fact that the 11 alerts magically disappear when restarting the agent on the hosts related to the errors.

Does anyone know where these errors might come from?

Thanks,


Loïc




Re: Ambari 2.0 Email Notifications not receiving

2015-04-23 Thread Jonathan Hurley
Then that indicates AMBARI-9823. A restart of Ambari Server should fix the 
issue. After restarting, new alerts that are triggered will show up there and 
should be dispatched.
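
For reference, on a package-based install that is simply:

    # Restart the Ambari Server daemon so new alert notices get dispatched
    ambari-server restart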

On Apr 23, 2015, at 11:12 AM, Jayesh Thakrar <j_thak...@yahoo.com> wrote:

Thanks for the additional info Jonathan.

I do not see any pending items as can be seen below-

$ curl --user admin:admin \
  'http://hostname:8080/api/v1/clusters/ord_flume_kafka/alert_notices?fields=*'
{
  "href" : "http://hostname:8080/api/v1/clusters/ord_flume_kafka/alert_notices?fields=*",
  "items" : [ ]
}
$


From: Jonathan Hurley <jhur...@hortonworks.com>
To: user@ambari.apache.org; Jayesh Thakrar <j_thak...@yahoo.com>
Sent: Thursday, April 23, 2015 9:42 AM
Subject: Re: Ambari 2.0 Email Notifications not receiving

It definitely sounds like AMBARI-9823. You can easily check if that’s the case 
by seeing if any outbound alerts are scheduled to be delivered:

http://<server>/api/v1/clusters/<cluster>/alert_notices?fields=*

This will show you all of the notices that are pending, delivered, or failed to be delivered. If you have entries here, then it’s not AMBARI-9823 and you should check your logs for dispatch errors.



On Apr 23, 2015, at 10:08 AM, Jayesh Thakrar <j_thak...@yahoo.com> wrote:

To add to Shaik's issue - email notifications don't even go out for the stock 
or out-of-the-box alerts and events.


From: Jeff Sposetti <j...@hortonworks.com>
To: user@ambari.apache.org
Sent: Thursday, April 23, 2015 6:44 AM
Subject: Re: Ambari 2.0 Email Notifications not receiving

Hi, Did you restart Ambari Server? Wonder if you are hitting this…

https://issues.apache.org/jira/browse/AMBARI-9823



From: Shaik M <munna.had...@gmail.com>
Reply-To: user@ambari.apache.org
Date: Thursday, April 23, 2015 at 6:00 AM
To: user@ambari.apache.org
Subject: Ambari 2.0 Email Notifications not receiving
Subject: Ambari 2.0 Email Notifications not receiving

Hi,

I have deployed Ambari 2.0 with HDP 2.2.4.

I have created a new Email notification and selected all Groups & Severities.

But I am not receiving any alert notifications at the provided email address when an alert is published.

Please help me to resolve this issue.

Thanks,
Shaik








Re: where is the python module ambari_agent located?

2015-02-18 Thread Jonathan Hurley
You can find all of the common and agent scripts in Python’s site-packages directory. For example:

ls -l /usr/lib/python2.6/site-packages/
drwxr-xr-x  4 root root 4096 Feb 18 18:21 ambari_agent
drwxr-xr-x  3 root root 4096 Feb 18 18:23 ambari_commons
lrwxrwxrwx  1 root root   46 Feb 18 18:22 resource_management -> /usr/lib/ambari-server/lib/resource_management
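
A quick way to confirm which copy Python actually resolves, for example:

    import ambari_agent

    # Prints the location of the imported package, e.g.
    # /usr/lib/python2.6/site-packages/ambari_agent/__init__.pyc
    print(ambari_agent.__file__)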

On Feb 18, 2015, at 2:27 PM, Yu, Tao <ta...@aware.com> wrote:

Hi,

Storm has been supported since HDP 2.1. In storm.py (under /var/lib/ambari-server/resources/stacks/HDP/2.1/services/STORM/package/scripts), I noticed that it imports the ambari_agent module, but I couldn’t find where this ambari_agent module is.

Does anyone know where this ambari_agent module is located?

Thanks,
-Tao