Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better unit tests

Eric Yang Mon, 02 Oct 2017 16:55:43 -0700

Chris,

Here is a patch that uses PowerMock:
https://issues.apache.org/jira/secure/attachment/12889144/YARN-7202.yarn-native-services.002.patch
It was written to verify that when ServiceClient throws any of the exception 
types declared in its API while interacting with Hadoop, the REST API layer 
handles the error code correctly.  This helps simulate internal errors and 
safeguard the API against them.  It seems like a useful way to avoid setting 
up a full MiniYarnCluster, submitting a job, and generating actual failure 
situations in the backend.
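
A minimal sketch of the idea, not taken from the YARN-7202 patch: the client is 
replaced by a stub that throws on demand, and the test checks the status code 
returned by the REST-style layer.  It uses a hand-written stub instead of 
PowerMock so that it runs on a plain JDK.  Client, BackendException, 
handleSubmit and the status-code mapping are all hypothetical stand-ins, not 
the real ServiceClient API.

```java
import java.io.IOException;

public class FaultInjectionSketch {

    /** Hypothetical checked exception standing in for one declared by the client API. */
    static class BackendException extends Exception {
        BackendException(String message) {
            super(message);
        }
    }

    /** Hypothetical client interface; NOT the real ServiceClient. */
    interface Client {
        void submit(String serviceName) throws IOException, BackendException;
    }

    /**
     * Hypothetical REST-style handler: translates the outcome of the client
     * call into an HTTP status code. The mapping is an assumption made for
     * illustration only.
     */
    static int handleSubmit(Client client, String serviceName) {
        try {
            client.submit(serviceName);
            return 202; // request accepted
        } catch (IllegalArgumentException e) {
            return 400; // bad user input
        } catch (IOException | BackendException e) {
            return 500; // internal failure in the backend
        }
    }

    public static void main(String[] args) {
        // Each stub injects one failure mode without starting any cluster.
        Client healthy = name -> { };
        Client ioFailure = name -> { throw new IOException("simulated I/O failure"); };
        Client backendFailure = name -> { throw new BackendException("simulated backend failure"); };
        Client badInput = name -> { throw new IllegalArgumentException("simulated bad input"); };

        System.out.println("healthy        -> " + handleSubmit(healthy, "sleeper"));
        System.out.println("ioFailure      -> " + handleSubmit(ioFailure, "sleeper"));
        System.out.println("backendFailure -> " + handleSubmit(backendFailure, "sleeper"));
        System.out.println("badInput       -> " + handleSubmit(badInput, "sleeper"));
    }
}
```

With a mocking library the stubs would be generated rather than written by 
hand, but the shape of the negative test stays the same: inject the failure at 
the client boundary and assert on what the API layer returns.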


It looks like a useful way to cover negative test cases.  The positive case is 
fully exercised by another test case, TestYarnNativeServices, in the 
Hadoop-yarn-services-api project.
Without the ability to inject faults into the system, it is harder to test 
negative cases.  However, I found this difficult to attempt in the Hadoop code 
base.  Suggestions?

Regards,
Eric

On 10/2/17, 3:09 PM, "Chris Douglas" <[email protected]> wrote:

    Eric/Steve-
    
    Please pick a test- any test- and demonstrate why Powermock would
    improve- by any metric- testing in Hadoop. -C
    
    
    
    On Mon, Oct 2, 2017 at 2:12 PM, Eric Yang <[email protected]> wrote:
    > Mocking provides a tool chain for simulating the surroundings of a piece 
of code.  It helps prevent null pointer exceptions and reduces unexpected 
runtime exceptions.  When a piece of code ships with a well-defined unit test, 
the test gives great insight into the author’s intention and reasoning.  
However, everyone looks at code from a different perspective, and it is often 
easier to rewrite the code than to modify and update the tests.  The 
shortcoming of writing new code is that there is always a danger of losing the 
existing purpose or a workaround buried deep in the code.  On the other hand, 
if a test program is filled with several pages of initialization code and 
overrides, it is hard to get the context of the test case and easy to lose its 
original meaning.  Hence, there are drawbacks to using either mocks or full 
integration tests.
    >
    > Initially, I was in favor of using PowerMock because it gives users the 
ability to unit test a class and reduce external interference.  However, I 
quickly came to the realization that Hadoop's use of protocol buffer 
serialization and Java reflection-based serialization differ in ways that 
prevent PowerMock from working for certain Hadoop classes.
    >
    > Hadoop unit tests are written to cover more than one class, and 
frequently a mini-cluster is spawned to test 5-10 lines of code.  Any simple 
API test triggers a large portion of Hadoop code to be initialized.  The 
Hadoop code base would require too much effort to work with PowerMock.  
Programs outside of Hadoop can use a PowerMock annotation to prevent mocking 
Hadoop classes, such as: @PowerMockIgnore({"javax.management.*", 
"javax.xml.*", "org.w3c.*", "org.apache.hadoop.*", "com.sun.*"}) .  However, 
within the Hadoop code base this technique is not practical, because every 
class in Hadoop is prefixed with org.apache.hadoop.  Maintaining the list of 
package prefixes that cannot work with PowerMock reflection would be heavy 
upkeep.
    > Hence, I rest my case for re-opening this issue.
    >
    > Regards,
    > Eric
    >
    > From: Steve Loughran <[email protected]>
    > Date: Sunday, October 1, 2017 at 12:36 PM
    > To: Eric Yang <[email protected]>
    > Cc: Andrew Wang <[email protected]>, Chris Douglas 
<[email protected]>, "[email protected]" 
<[email protected]>
    > Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing 
better unit tests
    >
    >
    > On 29 Sep 2017, at 22:46, Eric Yang 
<[email protected]<mailto:[email protected]>> wrote:
    >
    > Hi Chris and Andrew,
    >
    > The intent is for new code to have better unit test cases without 
resorting to invoking miniHDFSCluster or miniYarnCluster.  Existing code 
doesn’t require refactoring if the test cases already have good coverage.  I 
am currently working on part of YARN to improve YARN and Docker integration.  
A lot of code gets triggered, from UGI and FileSystem objects through to YARN 
job submission.  My code is only responsible for checking the logic of the 
user input and the expected output prior to YarnClient job submission.  
Starting a miniCluster for this test case is excessive for such a small piece 
of validation code.  The submission code was imported from Slider for YARN 
native services, and a single class imports various Hadoop services.  In 
several failure cases, it is difficult to simulate the exact error conditions 
because the API is several layers deep.  Powermock provides an easy way to 
replace and stub return objects or throw the proper exception to simulate the 
failure conditions.  One can argue that the code should have been written to 
be easier to unit test, but Hadoop code density makes even simple 
initialization far from trivial.  Constructor suppression, inner class 
replacement and private method override are good tools from Powermock that can 
provide more accurate testing without losing sight of multi-stage API call 
tests, while keeping the test case localized to a small piece of the greater 
puzzle.  Hence, I would like to ask the community to rethink the improvement 
that Powermock can bring to the table.  Thank you for your consideration.
    >
    > I don't know enough about powermock to have opinions on the matter. I do 
know I don't like mocking in general 
https://www.slideshare.net/steve_l/i-hate-mocking , or at least in the one area 
where I find it most troublesome: maintaining code
    >
    >
    > I just find mock code tests to be very brittle to changes in the 
codepaths of the classes called, so whenever you change the implementation, 
tests fail. And it's not so much "your code has regressed and we correctly 
caught it"  failure as "the change in order of invocation caused our test to 
report a regression when it wasn't really" kind of failure. Which is bad, as 
you waste time working out that this is the cause, then often fix the problems 
by moving bits of the test around until it stops failing. Which can hide real 
regressions.
    >
    > Where mocking can be good is in that
    >
    > 1. You can make assertions about how things were invoked, though note 
we've moved in S3A towards actually instrumenting the code and asserting on 
that. This way our shipping code gets to enjoy better instrumentation. [Note, 
those assertions can be brittle to changes in implementation too]
    >
    > 2. You can simulate failure better. But for S3Guard/S3A we've gone and 
implemented an InconsistentS3Client which ships in the hadoop-aws JAR and so 
can be used downstream.
    >
    > 3. You can test things without needing so much support infra (e.g. in 
unit tests and on jenkins without needing logins, running services)
    >
    > 4. You can have faster tests, because there's no need to set up/tear down 
things like HDFS
    >
    > 5. You can isolate problems to the code under test, rather than looking 
at the logs of forked processes collected somewhere under target/
    >
    > I think Eric's looking at #4 & #5 which, for tests which need a MiniYARN 
cluster, is significant. If Powermock helps with this, I don't see why we 
should say "don't use it", as long as we are aware of the cost, which is the 
risk of creating tests which are brittle to changes in the implementation code.
    >
    >
    > FWIW, Mocking is why I couldn't make the init/start/stop methods of 
org.apache.hadoop.service.AbstractService final; the need to test with mocking 
can impact production code. Is that bad? Well, we do other things to code to 
aid testability,...
    >
    >
    > -Steve
    >
    >
