I'm fine with a flag, but I would prefer RCO to be the default, since the default behavior changed only within the last 6 months. It would be better to restore the v1 behavior.
regards,
Eric

On Mon, Mar 14, 2016 at 5:55 PM, Bhuvnesh Chaudhary <bchaudh...@pivotal.io> wrote:

> I have created a placeholder JIRA documenting the feature and if we all
> agree let's do it.
> https://issues.apache.org/jira/browse/AMBARI-15417
>
> Thanks,
> Bhuvnesh Chaudhary
> Email: bchaudh...@pivotal.io
> Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
>
> On Mon, Mar 14, 2016 at 11:17 AM, Alejandro Fernandez <afernan...@hortonworks.com> wrote:
>
> > I agree configuring this with a flag is ideal.
> >
> > Thanks,
> > Alejandro
> >
> > From: Bhuvnesh Chaudhary <bchaudh...@pivotal.io>
> > Date: Monday, March 14, 2016 at 11:06 AM
> > To: Ambari <dev@ambari.apache.org>
> > Cc: Sumit Mohanty <smoha...@hortonworks.com>, Alejandro Fernandez <afernan...@hortonworks.com>
> > Subject: Re: Blueprints - RCO - Related question.
> >
> > Thank you very much Robert for the detailed explanation. It helps
> > to understand the background.
> >
> > Regarding HAWQ to capitalize on retry: We can potentially do some
> > tweaks to verify if HAWQ has been initialized or not according to the
> > current behavior, and change the way of doing init so that it can
> > utilize retry.
> > Currently, it goes for retry, but it has certain prerequisites which
> > fail after the first failed install attempt, and the retry is also
> > not successful.
> > Will have to investigate it.
> >
> > Regarding alternatives:
> > Was the option to put a flag in blueprints enabling / disabling RCO
> > considered? Say, by default use_rco is true, and if someone wants
> > to override the behavior they can override that in the blueprint.
> >
> > As quoted by Eric in the above email, in some cases, the retry can
> > also cause increase in the amount of time required due to
> > 1) number of retries before it completes successfully, or it fails
> > completely
> > 2) Before retry there has to be some cleanup steps which may be
> > required for a service (for hawq currently), services must incorporate
> > that logic.
> >
> > Also with RCO, the sequence of startup is predictable and all the
> > dependencies will be met.
> >
> > So probably, making use of rco configurable in blueprints satisfies
> > both the worlds who want to use rco vs not use it.
> > Your thoughts ?
> >
> > Thanks,
> > Bhuvnesh Chaudhary
> > Email: bchaudh...@pivotal.io
> > Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
> >
> > On Mon, Mar 14, 2016 at 9:18 AM, Eric Yang <eric...@gmail.com> wrote:
> >
> >> We have a use case where a service depends on Sqoop, Hive Metastore,
> >> HBase Client, Hadoop Client on a worker node. We found that Hadoop
> >> Client is sometimes not yet installed when our service installation
> >> has already started. This looks like a big problem for our use case.
> >> Is there a way to keep RCO by using a flag? Parallel install with
> >> retries is Chef and Puppet approach of configuring distributed loosely
> >> coupled service that has no strong tight relationship between nodes.
> >> It doesn't solve the problem of virtual services where a component
> >> depends on availability of other services. We had been scratching our
> >> heads on this since August last year. It is good to know the problem
> >> so we can work out the kinks.
> >>
> >> If component is also monster size that it takes 60 minutes to download
> >> and install.
> >> We can bump up retries for Hadoop client to very large number,
> >> but does this mean that while the monster size component is retrying,
> >> Hadoop clients maybe installed in parallel, hence second attempt of
> >> the monster component could succeed? It seems like in this use case,
> >> the new optimization doesn't improve installation time because Ambari
> >> needs 120 minutes to complete second retry of installation frequently.
> >>
> >> regards,
> >> Eric
> >>
> >> On Mon, Mar 14, 2016 at 6:38 AM, Robert Nettleton <rnettle...@hortonworks.com> wrote:
> >>
> >> > Hi Bhuvnesh,
> >> >
> >> > You are correct. The Blueprints deployment mechanism in Ambari no
> >> > longer relies on Role-command ordering to install or start
> >> > components across the cluster.
> >> >
> >> > This change to Blueprints was actually implemented in Ambari 2.1.0,
> >> > so it has been around for several releases now. The new approach
> >> > was implemented to improve the performance times of cluster
> >> > deployments, and provide better support for dynamic scaling of
> >> > clusters.
> >> >
> >> > That being said, the new deployment mechanism does indeed remove
> >> > the guarantee of ordering, which can potentially cause some
> >> > problems for certain types of clusters. There were also changes
> >> > implemented on the Ambari Agent side to mitigate this problem of
> >> > ordering. The ambari-agent will now retry INSTALL and START
> >> > operations if those operations happen to fail. The START operation
> >> > is probably the most relevant in your case, and is also the
> >> > operation that does show the ordering issues you’ve mentioned in
> >> > some deployments.
> >> >
> >> > The idea is that the ambari-agent retries should help to resolve
> >> > any issues with services starting in an unexpected order.
> >> >
> >> > This ambari-agent feature is on by default, but can be configured
> >> > in a more fine-grained fashion by setting some properties in
> >> > “cluster-env” in your Blueprint or Cluster Creation Template.
> >> >
> >> > Unfortunately, this is not documented very well, but the three
> >> > properties in question are set by default in the
> >> > BlueprintConfigurationProcessor in the following method:
> >> >
> >> > org.apache.ambari.server.controller.internal.BlueprintConfigurationProcessor#setRetryConfiguration
> >> >
> >> > The properties set in this method allow control over the types of
> >> > operations that are retried, the max number of retries attempted,
> >> > and the maximum amount of time that the agent should attempt a
> >> > retry.
> >> >
> >> > We’ve seen many clusters using this new approach, and have not run
> >> > into that many problems with respect to ordering.
> >> >
> >> > One possible problem we’ve seen is in a small number of components
> >> > that launch services as a background command. In that case, the
> >> > ambari-agent cannot detect that a retry is required, and so cannot
> >> > attempt a restart of a failed service. This problem can usually be
> >> > resolved with component-specific retries.
> >> >
> >> > I don’t know much about the HAWQ component, but I would expect that
> >> > customizing the retry settings may help this problem. Do the HAWQ
> >> > components implement retry attempts when booting up?
> >> >
> >> > Hope this helps.
> >> >
> >> > Thanks,
> >> > Bob
> >> >
> >> > On Mar 11, 2016, at 7:18 PM, Alejandro Fernandez <afernan...@hortonworks.com> wrote:
> >> >
> >> > > +others who have more insight into BluePrints
> >> > >
> >> > > On 3/11/16, 3:24 PM, "Bhuvnesh Chaudhary" <bchaudh...@pivotal.io> wrote:
> >> > >
> >> > >> Hello Sebastian, Alejandro, Andrew,
> >> > >>
> >> > >> Referring to the discussion on RB:
> >> > >> https://reviews.apache.org/r/43948
> >> > >> <https://reviews.apache.org/r/43948/#review120537>, it appears
> >> > >> that while deploying clusters using Blueprints, RCO is not
> >> > >> honored. Please confirm if this understanding is correct.
> >> > >>
> >> > >> While running internal test suites for HAWQ, we deploy the
> >> > >> clusters using BP, and we need a specific order in which the
> >> > >> HAWQ components must be initialized / started.
> >> > >>
> >> > >> "HAWQ Standby" component should be initialized after "HAWQ
> >> > >> Master" component as it has to copy the contents from HAWQ
> >> > >> Master. However, since RCO is not honored, we often come across
> >> > >> issues as HAWQ Standby start / initialization before HAWQ
> >> > >> Master.
> >> > >>
> >> > >> Could you please let us know if there any work already going on
> >> > >> for bringing in RCO dependency for Blueprints, if not is there
> >> > >> any other alternative which can be used to enforce the
> >> > >> dependency locally, or something else which you suggest.
> >> > >>
> >> > >> Thanks,
> >> > >> Bhuvnesh Chaudhary
> >> > >> Email: bchaudh...@pivotal.io
> >> > >> Desk: +1-650-846-1696 | Mobile: +1-973-906-6976
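
For anyone following the thread, the trade-off between ordered startup (RCO) and unordered startup with retries can be sketched roughly as below. This is only an illustration, not Ambari code: the dependency map, the function names, and the retry model (one attempt per component per pass) are all assumptions made up for the example. The HAWQ Standby/Master dependency comes from the thread; the Hadoop Client dependency is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: component -> components that must be up first.
DEPENDS_ON = {
    "HADOOP_CLIENT": set(),
    "HAWQ_MASTER": {"HADOOP_CLIENT"},
    "HAWQ_STANDBY": {"HAWQ_MASTER"},
}


def ordered_start(depends_on):
    """RCO-style: every component is started exactly once, dependencies first."""
    return list(TopologicalSorter(depends_on).static_order())


def start_with_retries(arrival_order, depends_on, max_retries):
    """Retry-style: components are attempted in arbitrary order. An attempt
    fails if a dependency is not up yet and is retried on the next pass,
    up to max_retries retries after the first attempt."""
    started = []
    attempts = {name: 0 for name in arrival_order}
    pending = list(arrival_order)
    while pending:
        still_pending = []
        for name in pending:
            attempts[name] += 1
            if depends_on[name] <= set(started):
                started.append(name)
            elif attempts[name] > max_retries:
                raise RuntimeError(f"{name} failed after {max_retries} retries")
            else:
                still_pending.append(name)
        pending = still_pending
    return started, attempts


if __name__ == "__main__":
    print(ordered_start(DEPENDS_ON))
    worst_case = ["HAWQ_STANDBY", "HAWQ_MASTER", "HADOOP_CLIENT"]
    print(start_with_retries(worst_case, DEPENDS_ON, max_retries=3))
```

In this toy model the ordered path needs three attempts in total, while the worst-case arrival order needs six and fails outright if the retry budget is too small. That matches the concern raised above: when a single attempt is expensive, as with a 60-minute install, each extra attempt adds up quickly.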