[ 
https://issues.apache.org/jira/browse/BIGTOP-635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421089#comment-13421089
 ] 

Stephen Chu commented on BIGTOP-635:
------------------------------------

Thanks, Sujay.
 
Some initial comments/questions:

{code}
+   <dependency>
+     <groupId>org.apache.bigtop.itest</groupId>
+     <artifactId>itest-common</artifactId>
+     <version>0.2.0-incubating</version>
+   </dependency>
{code}

Should this be 0.5.0-incubating instead? Bigtop trunk test artifacts are using 
0.5.0-incubating.

{code}
+     <dependency>
+         <groupId>org.apache.hadoop</groupId>
+         <artifactId>hadoop-mapreduce-client-core</artifactId>
+         <version>2.0.0-alpha</version>
+       </dependency>
+     <dependency>
+         <groupId>org.apache.hadoop</groupId>
+         <artifactId>hadoop-common</artifactId>
+         <version>2.0.0-alpha</version>
+     </dependency>
+     <dependency>
+         <groupId>org.apache.hadoop</groupId>
+         <artifactId>hadoop-common</artifactId>
+         <version>2.0.0-alpha</version>
+         <type>test-jar</type>
+     </dependency>
{code}

Bigtop trunk Hadoop tests are using 2.0.0-SNAPSHOT rather than 2.0.0-alpha.

{code}
+public interface ClusterAdapter {
+ /**
+  * Cluster Daemons: NameNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode, HRegionServer, HMaster
+  /**
{code}

The "Cluster Daemons:" comment seems unnecessary because the specific daemons 
are not referenced in the rest of the class.


{code}
+  /**
+   * Shuts down HBase cluster
+   */
+  void hbaseShutdown();
{code}

In HDFSAdapter, there are stopHDFSservice, startHDFSservice, and 
restartHDFSservice (MRAdapter follows the same style). For consistency, it 
seems we should have startHBaseService, stopHBaseService, and 
restartHBaseService as well. Also, should we shorten these names to 
stopHDFS/startHDFS? The "Service" suffix seems unnecessary; I think most 
people will know what is meant.
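
To make the naming suggestion concrete, here is a rough sketch of what the 
HBase side could look like with the shortened names. Everything in it is 
hypothetical: the HBaseAdapter interface, its method names, and the recording 
implementation are illustrations only and are not taken from the patch. The 
default method also assumes Java 8 or later, which the patch may not target.

```java
import java.util.ArrayList;
import java.util.List;

public class AdapterNamingSketch {

    /** Hypothetical adapter using the shortened start/stop/restart naming. */
    public interface HBaseAdapter {
        void startHBase();

        void stopHBase();

        /** Restart expressed as stop followed by start (an assumption). */
        default void restartHBase() {
            stopHBase();
            startHBase();
        }
    }

    /** Illustrative implementation that only records the calls it receives. */
    public static class RecordingHBaseAdapter implements HBaseAdapter {
        public final List<String> calls = new ArrayList<>();

        @Override
        public void startHBase() {
            calls.add("start");
        }

        @Override
        public void stopHBase() {
            calls.add("stop");
        }
    }
}
```

The same three-method shape would apply to the HDFS and MR adapters, so that 
all services follow one naming pattern.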

{code}
private LinkedList<Host> cluster = new LinkedList<Host>();
{code}

Perhaps rename to clusterHosts? If I'm reading "cluster" in other parts of the 
code, I might not quickly remember that it's a collection of Hosts.

{code}
+         //dataNode.refreshDaemons(); 
{code}

Remove if unnecessary.

{code}
+ public void waitUntilStarted(String daemon, Host hostname, long timeout) throws Exception {
+   assertTrue(hostname != null);
+   long endTime = System.currentTimeMillis() + timeout;
+   boolean done = false;
+     while (!done) {
+         if (System.currentTimeMillis() > endTime) {
+             throw new Exception("Timeout value reached");
+         }
+       for (Daemon d : hostname.getDaemons()) {
+         if (d.getName().equalsIgnoreCase(daemon)) {
+           done = true;
+         }
+       }
+     }
+ }
+ /**
+  * Stalls thread until specified daemon is stopped on specified machine or timeout value is reached.
+  * @throws Exception
+  */
+ public void waitUntilStopped(String daemon, Host hostname, long timeout) throws Exception {
+   assertTrue(hostname != null);
+   long endTime = System.currentTimeMillis() + timeout;
+   boolean done = false;
+     while (!done) {
+         if (System.currentTimeMillis() > endTime) {
+             throw new Exception("Timeout value reached");
+         }
+       boolean isStopped = true;
+       for (Daemon d : hostname.getDaemons()) {
+         if (d.getName().equalsIgnoreCase(daemon)) {
+           isStopped = false;
+         }
+       }
+       if (isStopped) {
+         done = true;
+       }
+     }
+ }
{code}

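It seems we can refactor these two methods, since they share most of their 
logic: both poll the host's daemon list until a timeout and differ only in 
whether the daemon should be present or absent.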
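
One possible shape for that refactoring, as a rough sketch only. The 
DaemonSource interface is a hypothetical stand-in for the patch's Host and 
Daemon types, and waitForState, isRunning, and the poll interval are names 
and choices made up for this illustration. Unlike the quoted code, the sketch 
sleeps between polls and throws TimeoutException instead of a bare Exception; 
both are assumptions, not something the patch does.

```java
import java.util.List;
import java.util.concurrent.TimeoutException;

public class DaemonWaiter {

    /** Hypothetical stand-in for Host: exposes the names of running daemons. */
    public interface DaemonSource {
        List<String> daemonNames();
    }

    /** Returns true if a daemon with the given name (case-insensitive) is listed. */
    public static boolean isRunning(String daemon, DaemonSource host) {
        for (String name : host.daemonNames()) {
            if (name.equalsIgnoreCase(daemon)) {
                return true;
            }
        }
        return false;
    }

    /** Shared polling loop: waits until the daemon's state matches wantRunning. */
    public static void waitForState(String daemon, DaemonSource host,
                                    boolean wantRunning, long timeoutMillis,
                                    long pollMillis)
            throws TimeoutException, InterruptedException {
        if (host == null) {
            throw new IllegalArgumentException("host must not be null");
        }
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (isRunning(daemon, host) != wantRunning) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("Timeout value reached waiting for " + daemon);
            }
            Thread.sleep(pollMillis); // assumption: the quoted code busy-loops instead
        }
    }

    public static void waitUntilStarted(String daemon, DaemonSource host, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        waitForState(daemon, host, true, timeoutMillis, 100);
    }

    public static void waitUntilStopped(String daemon, DaemonSource host, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        waitForState(daemon, host, false, timeoutMillis, 100);
    }
}
```

With this layout the two public methods become one-line wrappers, and the 
timeout handling lives in a single place.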

{code}
+   if (onNamenode) {
+
+   }
+   else {
+     runShellCommand("sudo -u hdfs hdfs haadmin -failover " + active + " " + standby, active_host, false, false);
+   }
{code}

I think we can just call shHDFS.exec("hdfs haadmin -failover " + active + " " + 
standby) here. If we can successfully get the hdfs user's shell on any node in 
the cluster, we should be able to perform the failover from it.

{code}
+++ bigtop-test-framework/src/main/groovy/org/apache/bigtop/itest/clustermanager/distributions/VersionAClusterManager.java
{code}

Should we start thinking of a different name for this? Maybe 
BigtopClusterManager like you mentioned before. 

{code}
+++ bigtop-test-framework/src/test/groovy/org/apache/bigtop/itest/clustermanager/HAMRBCMHelperThread.java
{code}
{code}
+++ bigtop-test-framework/src/test/groovy/org/apache/bigtop/itest/clustermanager/TestHAMRBCM.java
{code}

We should move these tests into the Bigtop Hadoop test artifacts.
                
> Implement a cluster-abstraction, discovery and manipulation framework for 
> iTest
> -------------------------------------------------------------------------------
>
>                 Key: BIGTOP-635
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-635
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Tests
>    Affects Versions: 0.4.0
>            Reporter: Roman Shaposhnik
>            Assignee: Sujay Rau
>             Fix For: 0.5.0
>
>         Attachments: BigtopClusterManager.zip, BigtopClusterManagerv2.zip, 
> ClusterManagerAPI.pdf, bigtop-635.patch
>
>
> We've come to a point where our tests need to have a uniform way of 
> interfacing with the cluster under test. It is no longer ok to assume that 
> the test can be executed on a particular node (and thus have access to 
> services running on it). It is also less than ideal for tests to assume a 
> particular type of interaction with the services since it tends to break in 
> different deployment scenarios. 
> A framework that needs to be put in place has to be capable of (regardless of 
> where a test using it is executed on):
>   # representing the abstract configuration of the cluster
>   # representing the abstract topology of the entire cluster (services 
> running on a cluster, nodes hosting the daemons, racks, etc).
>   # giving tests an ability to query this topology
>   # giving tests an ability to affect the nodes in that topology in a 
> particular way (refreshing configuration, restarting services, etc.)
> Of course, the ideal solution here would be to give Bigtop tests a 
> programmatic access to a Hadoop cluster management framework such as 
> Cloudera's CM or Apache Ambari. 
> As with any ideal solutions I don't think it is realistic though. Hence we 
> have to cook something up. At this point I'm really focused on getting the 
> API right and I'm totally fine with an implementation of that API to be 
> something as silly as a bunch of ssh-based scripts or something.
> This JIRA is primarily focused on coming up with such an API. Anybody who's 
> willing to help is welcome to.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
