Re: Review Request 30603: RU Hacks and Technical Debt - Namenode order of active/standby in code is flipped

Alejandro Fernandez Tue, 03 Feb 2015 18:55:57 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30603/#review70915
-----------------------------------------------------------




ambari-server/src/main/java/org/apache/ambari/server/serveraction/upgrades/FinalizeUpgradeAction.java
<https://reviews.apache.org/r/30603/#comment116382>

    Unrelated fix.



ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java
<https://reviews.apache.org/r/30603/#comment116383>

    This logic should not be here, since it prevents accurately calculating the 
states, and would require another restart, which can only be done through the 
API or the experimental flag.



ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py
<https://reviews.apache.org/r/30603/#comment116384>

    More debugging info.


- Alejandro Fernandez


On Feb. 4, 2015, 2:53 a.m., Alejandro Fernandez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30603/
> -----------------------------------------------------------
> 
> (Updated Feb. 4, 2015, 2:53 a.m.)
> 
> 
> Review request for Ambari, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, 
> and Yurii Shylov.
> 
> 
> Bugs: AMBARI-9467
>     https://issues.apache.org/jira/browse/AMBARI-9467
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> UpgradeHelper somehow calls the active Namenode first, but this ends up being 
> the standby namenode by the time it gets called; investigate why.
> 
> We will abide by the order in the runbook: upgrade the standby Namenode 
> first, then the active Namenode, which then causes a flip.
> In rare cases, if a Namenode fails for whatever reason, ZKFC will initiate a 
> failover, which explains why the order is sometimes flipped by the time the 
> Namenode prepare happens. However, the namenode_upgrade.py script works in 
> both cases (active first or standby first).
> There's another Jira to run the namenode_upgrade script as part of the 
> Pre-Cluster group to make the backup, which should reduce the likelihood of 
> a flip happening after the calculation was made.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/serveraction/upgrades/FinalizeUpgradeAction.java fceb44d 
>   ambari-server/src/main/java/org/apache/ambari/server/state/UpgradeHelper.java 0c6f68a 
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java db17109 
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/params.py 2484463 
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/service_check.py 338de32 
>   ambari-server/src/main/resources/common-services/ZOOKEEPER/3.4.5.2.0/package/scripts/zookeeper_server.py a7ca335 
> 
> Diff: https://reviews.apache.org/r/30603/diff/
> 
> 
> Testing
> -------
> 
> Verified Rolling Upgrade on a 3-node cluster with HDFS, ZK, and Namenode HA. 
> The flip happens rarely, but Ambari must be robust enough to handle it.
> 
> Unit tests are in progress.
> 
> 
> Thanks,
> 
> Alejandro Fernandez
> 
>
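
For reference, the standby-first ordering from the Description above could be sketched roughly as below. This is only an illustration, not the code in UpgradeHelper: the function name, the "active"/"standby" strings, and the host names are all hypothetical, and it assumes the HA state of each Namenode host has already been looked up.

```python
from typing import Dict, List

# Hypothetical state labels; the real Ambari code may represent HA state differently.
ACTIVE = "active"
STANDBY = "standby"


def order_namenodes(states: Dict[str, str]) -> List[str]:
    """Return Namenode hosts with standby hosts first, then active hosts.

    `states` maps a host name to its HA state at the time of the calculation.
    """
    # sorted() is stable, so hosts sharing a state keep their input order.
    return sorted(states, key=lambda host: 0 if states[host] == STANDBY else 1)


if __name__ == "__main__":
    # Hypothetical two-node HA pair.
    print(order_namenodes({"nn1.example.com": ACTIVE, "nn2.example.com": STANDBY}))
```

Since ZKFC can still fail over between this calculation and the actual Namenode prepare, the per-host step has to tolerate either order, which is what namenode_upgrade.py already does.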
