Thanks a lot for looking into HDFS-7722, Chris.

In HDFS-7722:
The TestDataNodeVolumeFailureXXX tests reset data dir permissions in tearDown().
TestDataNodeHotSwapVolumes resets permissions in a finally clause.
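
For reference, here is a minimal sketch of that cleanup pattern, assuming
JUnit 4 and Hadoop's FileUtil; the failedDataDirs field is illustrative,
not a name from the actual test suites:

import java.io.File;
import org.apache.hadoop.fs.FileUtil;
import org.junit.After;

// Sketch only: restore permissions in tearDown() so that "mvn clean" can
// delete the data dirs even if an assertion failed before the test's own
// cleanup code ran.
@After
public void tearDown() {
  for (File dir : failedDataDirs) {      // dirs whose perms the test removed
    FileUtil.setWritable(dir, true);
    FileUtil.setExecutable(dir, true);
  }
}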

Also, I ran mvn test several times on my machine, and all tests passed.

However, DiskChecker#checkDirAccess() only verifies that the path is a
directory before checking access through file methods:

private static void checkDirAccess(File dir) throws DiskErrorException {
  if (!dir.isDirectory()) {
    // A regular file at the data dir path fails fast here,
    // before any permission checks run.
    throw new DiskErrorException("Not a directory: "
                                 + dir.toString());
  }

  checkAccessByFileMethods(dir);
}

Since a regular file fails that isDirectory() check, one potentially safer
alternative is replacing the data dir with a regular file to simulate disk
failures: no permissions are changed, so there is nothing for cleanup to
restore.
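
A rough sketch of that alternative, assuming Hadoop's FileUtil
(injectDataDirFailure and restoreDataDir are illustrative names, not
existing helpers):

import java.io.File;
import java.io.IOException;
import org.apache.hadoop.fs.FileUtil;

// Sketch only: swap the data dir for a regular file so checkDirAccess()
// throws "Not a directory: ...". No permission bits change, so nothing
// can block "mvn clean" from deleting the tree afterwards.
private void injectDataDirFailure(File dataDir) throws IOException {
  FileUtil.fullyDelete(dataDir);           // remove the directory tree
  if (!dataDir.createNewFile()) {          // same path, now a regular file
    throw new IOException("Could not create " + dataDir);
  }
}

private void restoreDataDir(File dataDir) throws IOException {
  if (!dataDir.delete() || !dataDir.mkdirs()) {  // back to an empty dir
    throw new IOException("Could not restore " + dataDir);
  }
}

Even if the JUnit process dies between inject and restore, the leftover is
an ordinary file that the next build's clean step can delete.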

On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> TestDataNodeVolumeFailureReporting, and
> TestDataNodeVolumeFailureToleration all remove executable permissions from
> directories like the one Colin mentioned to simulate disk failures at data
> nodes.  I reviewed the code for all of those, and they all appear to be
> doing the necessary work to restore executable permissions at the end of
> the test.  The only recent uncommitted patch I've seen that makes changes
> in these test suites is HDFS-7722.  That patch still looks fine though.  I
> don't know if there are other uncommitted patches that changed these test
> suites.
>
> I suppose it's also possible that the JUnit process unexpectedly died
> after removing executable permissions but before restoring them.  That
> always would have been a weakness of these test suites, regardless of any
> recent changes.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>
>>Hey Colin,
>>
>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
>>these boxes. He took a look and concluded that some perms are being set in
>>those directories by our unit tests which are precluding those files from
>>getting deleted. He's going to clean up the boxes for us, but we should
>>expect this to keep happening until we can fix the test in question to
>>properly clean up after itself.
>>
>>To help narrow down which commit it was that started this, Andrew sent me
>>this info:
>>
>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has
>>500 perms, so I'm guessing that's the problem. Been that way since 9:32
>>UTC
>>on March 5th."
>>
>>--
>>Aaron T. Myers
>>Software Engineer, Cloudera
>>
>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org>
>>wrote:
>>
>>> Hi all,
>>>
>>> A very quick (and not thorough) survey shows that I can't find any
>>> jenkins jobs that succeeded from the last 24 hours.  Most of them seem
>>> to be failing with some variant of this message:
>>>
>>> [ERROR] Failed to execute goal
>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>
>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>> -> [Help 1]
>>>
>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>> permissions?
>>>
>>> Colin
>>>
>



-- 
Lei (Eddy) Xu
Software Engineer, Cloudera
