Mocking provides tool chains to run a simulation for a piece of code. It helps
to prevent null pointer exceptions and reduce unexpected runtime exceptions.
When a piece of code ships with a well-defined unit test, the test gives great
insight into the author's intention and reasoning in writing the code. However,
everyone looks at code from a different perspective, and it is often easier to
rewrite the code than to modify and update the tests. The shortcoming of
writing new code is that there is always a danger of losing an existing purpose
or workaround buried deep in the code. On the other hand, if a test program is
filled with several pages of initialization code and overrides, it is hard to
get the context of the test case and easy to lose its original meaning.
Hence, there are drawbacks to using either mocks or full integration tests.
Initially, I was in favor of using PowerMock because it gives users the ability
to unit test a class and reduce external interference. However, I quickly came
to the realization that Hadoop's use of protocol buffer serialization and Java
reflection-based serialization differ in ways that prevent PowerMock from
working for certain Hadoop classes.
Hadoop unit tests are written to cover more than one class, and frequently a
mini-cluster is spawned to test 5-10 lines of code. Any simple API test will
trigger a large portion of Hadoop code to be initialized. The Hadoop code base
would require too much effort to work with PowerMock. Programs outside of
Hadoop can use a PowerMock annotation to prevent mocking Hadoop classes, such as:
@PowerMockIgnore({"javax.management.*", "javax.xml.*", "org.w3c.*",
"org.apache.hadoop.*", "com.sun.*"}). However, when working in the Hadoop code
base, this technique is not practical because every class in Hadoop is prefixed
with org.apache.hadoop. It would be a heavy upkeep burden to maintain the list
of package prefixes that cannot work with PowerMock reflection.
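To illustrate why a package-prefix ignore list breaks down inside Hadoop
itself, here is a minimal sketch in plain Java. It is a simplified model of
wildcard matching, not PowerMock's actual implementation; the class
IgnoreListSketch, the method isIgnored and the com.example class name are
hypothetical, and the patterns assume the usual ".*" wildcard form.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of wildcard package matching; NOT PowerMock's real code.
final class IgnoreListSketch {
    // Patterns in the spirit of the annotation discussed above (assumed ".*" form).
    static final List<String> IGNORED = Arrays.asList(
        "javax.management.*", "javax.xml.*", "org.w3c.*",
        "org.apache.hadoop.*", "com.sun.*");

    // A trailing "*" means "any class whose name starts with this prefix".
    static boolean isIgnored(String className) {
        for (String pattern : IGNORED) {
            if (pattern.endsWith("*")) {
                String prefix = pattern.substring(0, pattern.length() - 1);
                if (className.startsWith(prefix)) {
                    return true;
                }
            } else if (className.equals(pattern)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Every Hadoop class falls under the same prefix, so all are ignored.
        System.out.println(isIgnored("org.apache.hadoop.fs.FileSystem"));
        System.out.println(isIgnored("org.apache.hadoop.yarn.client.api.YarnClient"));
        // A hypothetical class outside Hadoop is not ignored.
        System.out.println(isIgnored("com.example.InputValidator"));
    }
}
```

For a program outside Hadoop one broad prefix is enough. Inside Hadoop the same
prefix also covers the class under test, so the list would have to name
individual sub-packages instead.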
Hence, I rest my case for re-opening this issue.
Regards,
Eric
From: Steve Loughran <[email protected]>
Date: Sunday, October 1, 2017 at 12:36 PM
To: Eric Yang <[email protected]>
Cc: Andrew Wang <[email protected]>, Chris Douglas
<[email protected]>, "[email protected]"
<[email protected]>
Subject: Re: [DISCUSS] HADOOP-9122 Add power mock library for writing better
unit tests
On 29 Sep 2017, at 22:46, Eric Yang <[email protected]> wrote:
Hi Chris and Andrew,
The intent is for new code to have better unit test cases without resorting to
invoking miniHDFSCluster or miniYarnCluster. Existing code doesn't require
refactoring if the test cases already have good coverage. I am currently
working on part of YARN to improve YARN and Docker integration. A lot of code
gets triggered, from UGI and FileSystem objects to YARN job submission. My code
is only responsible for checking the logic of the user input and the expected
output prior to YarnClient job submission. Starting a miniCluster for this test
case is excessive for such a small piece of validation code.
The submission code was imported from Slider for YARN native services, and a
single class imports various Hadoop services. In several failure cases, it is
difficult to simulate the exact error conditions because the API is several
layers deep. PowerMock provides an easy way to replace and stub return objects
or throw the proper exception to simulate the failure conditions. One can argue
that the code should have been written to be easier to unit test, but Hadoop's
code density makes even simple initialization far from trivial. Constructor
suppression, inner class replacement and private method override are good tools
from PowerMock that can provide more accurate testing without losing sight of
multi-stage API call tests, while keeping the test case localized to a small
piece of the greater puzzle. Hence, I would like to request that the community
rethink the improvement that PowerMock can bring to the table. Thank you for
your consideration.
I don't know enough about powermock to have opinions on the matter. I do know I
don't like mocking in general https://www.slideshare.net/steve_l/i-hate-mocking
, or at least in the one area where I find it most troublesome: maintaining
code. I just find mock-based tests to be very brittle to changes in the
codepaths of the classes called, so whenever you change the implementation,
tests fail. And it's not so much "your code has regressed and we correctly
caught it" failure as "the change in order of invocation caused our test to
report a regression when it wasn't really" kind of failure. Which is bad, as
you waste time working out that this is the cause, then often fix the problems
by moving bits of the test around until it stops failing. Which can hide real
regressions.
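A minimal sketch of that brittleness, in plain Java without any mocking
library. Everything here is hypothetical and invented for illustration: the
Store interface, the RecordingStore test double, and the two save methods.
saveV2 is a refactoring of saveV1 that ends in the same state but calls the
collaborator in a different order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical collaborator interface, invented for this sketch.
interface Store {
    void put(String key, String value);
}

// Hand-written test double: records the call order and the resulting state.
final class RecordingStore implements Store {
    final List<String> calls = new ArrayList<>();
    final Map<String, String> state = new TreeMap<>();

    @Override
    public void put(String key, String value) {
        calls.add("put:" + key);
        state.put(key, value);
    }
}

final class BrittleMockSketch {
    // Original implementation: sets the queue first, then the job name.
    static void saveV1(Store s) {
        s.put("queue", "default");
        s.put("name", "job1");
    }

    // Refactored implementation: same end state, different call order.
    static void saveV2(Store s) {
        s.put("name", "job1");
        s.put("queue", "default");
    }

    // Interaction-style check: asserts the exact sequence of invocations.
    static boolean orderCheck(RecordingStore s) {
        return s.calls.equals(Arrays.asList("put:queue", "put:name"));
    }

    // State-style check: asserts only on the observable outcome.
    static boolean stateCheck(RecordingStore s) {
        return "default".equals(s.state.get("queue"))
            && "job1".equals(s.state.get("name"));
    }

    public static void main(String[] args) {
        RecordingStore v1 = new RecordingStore();
        saveV1(v1);
        RecordingStore v2 = new RecordingStore();
        saveV2(v2);
        System.out.println("v1 order=" + orderCheck(v1) + " state=" + stateCheck(v1));
        System.out.println("v2 order=" + orderCheck(v2) + " state=" + stateCheck(v2));
    }
}
```

The order check fails for saveV2 even though nothing has regressed, while the
state check passes for both versions.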
Where mocking can be good is in that
1. You can make assertions about how things were invoked, though note we've
moved in S3A towards actually instrumenting the code and asserting on that.
This way our shipping code gets to enjoy better instrumentation. [Note, those
assertions can be brittle to changes in implementation too]
2. You can simulate failure better. But for S3Guard/S3A we've gone and
implemented an InconsistentS3Client which ships in the hadoop-aws JAR and so
can be used downstream.
3. You can test things without needing so much support infra (e.g. in unit
tests and on jenkins without needing logins, running services)
4. You can have faster tests, because there's no need to set up/tear down
things like HDFS
5. You can isolate problems to the code under test, rather than looking at the
logs of forked processes collected somewhere under target/
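Points 1 and 2 in the list above can be sketched without a mocking framework:
a fault-injecting wrapper around a real client, plus a counter in the shipping
code that tests assert on. This is only an illustration of the idea and does
not reflect the actual InconsistentS3Client API; BlobClient, FailingBlobClient,
RetryingReader and the other names below are hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client interface, invented for this sketch.
interface BlobClient {
    String get(String key) throws IOException;
}

// Simple real implementation backed by a map.
final class InMemoryBlobClient implements BlobClient {
    private final Map<String, String> blobs = new HashMap<>();

    void put(String key, String value) {
        blobs.put(key, value);
    }

    @Override
    public String get(String key) throws IOException {
        String value = blobs.get(key);
        if (value == null) {
            throw new IOException("no such key: " + key);
        }
        return value;
    }
}

// Fault-injecting wrapper: fails the first N calls, then delegates.
final class FailingBlobClient implements BlobClient {
    private final BlobClient delegate;
    private int failuresLeft;

    FailingBlobClient(BlobClient delegate, int failures) {
        this.delegate = delegate;
        this.failuresLeft = failures;
    }

    @Override
    public String get(String key) throws IOException {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("injected failure");
        }
        return delegate.get(key);
    }
}

// Code under test: retries reads and counts failed attempts.
final class RetryingReader {
    // Instrumentation that ships with the code and can be asserted on in tests.
    final AtomicInteger failedAttempts = new AtomicInteger();
    private final BlobClient client;
    private final int maxAttempts;

    RetryingReader(BlobClient client, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.client = client;
        this.maxAttempts = maxAttempts;
    }

    String read(String key) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return client.get(key);
            } catch (IOException e) {
                failedAttempts.incrementAndGet();
                last = e;
            }
        }
        throw last; // non-null here because maxAttempts >= 1
    }
}

final class FaultInjectionSketch {
    // Runs one read against a client that fails the given number of times.
    static String tryRead(int injectedFailures, int maxAttempts) {
        InMemoryBlobClient real = new InMemoryBlobClient();
        real.put("k", "hello");
        RetryingReader reader = new RetryingReader(
            new FailingBlobClient(real, injectedFailures), maxAttempts);
        try {
            String value = reader.read("k");
            return "ok:" + value + " failedAttempts=" + reader.failedAttempts.get();
        } catch (IOException e) {
            return "failed:" + e.getMessage()
                + " failedAttempts=" + reader.failedAttempts.get();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead(2, 3)); // ok:hello failedAttempts=2
        System.out.println(tryRead(3, 3)); // failed:injected failure failedAttempts=3
    }
}
```

Because the wrapper implements the same interface as the real client, the test
asserts on the counter and the result rather than on which methods were called
in which order.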
I think Eric's looking at #4 and #5, which, for tests which need a MiniYARN
cluster, is significant. If Powermock helps this, I don't see why we should say
"don't use it", as long as we are aware of the cost, which is the risk of
creating tests which are brittle to changes in the implementation code.
FWIW, Mocking is why I couldn't make the init/start/stop methods of
org.apache.hadoop.service.AbstractService final; the need to test with mocking
can impact production code. Is that bad? Well, we do other things to code to
aid testability,...
-Steve